System: 1 GHz ARM Cortex-A8 (BeagleBoard-xM)

Compiler: gcc 4.4.5 (native)

Configure settings: --enable-single --enable-neon --enable-perf-events ARM_CPU_TYPE=cortex-a8

Comments: This is why NEON support matters: it delivers 5-8 times the performance over a wide range of FFT sizes. Peak performance is typically 600 MFLOPS to 1 GFLOPS. These results were obtained without using the fused multiply-add (FMA) instructions of the NEON FPU. An otherwise identical set of benchmarks with FMA enabled is available separately; it shows 5-10% lower performance.
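For readers unfamiliar with how FFT throughput figures like those quoted above are usually derived, here is a minimal Python sketch. It assumes the plots follow the nominal operation count commonly used for FFT benchmarks (5 N log2 N for a complex transform, half that for a real one, divided by the time in microseconds); the page itself does not state this. The function name `fft_mflops` and the timing values are hypothetical and are not measurements from these benchmarks.

```python
import math


def fft_mflops(n, microseconds, real=False):
    """Nominal FFT throughput in "mflops".

    Assumption: the conventional scaling 5*N*log2(N) / t(us) for complex
    transforms and 2.5*N*log2(N) / t(us) for real transforms, where N is the
    total number of points (the product of all dimensions for 2-D and 3-D).
    This is a normalized figure, not a count of instructions executed.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    if microseconds <= 0:
        raise ValueError("microseconds must be positive")
    scale = 2.5 if real else 5.0
    return scale * n * math.log2(n) / microseconds


# Hypothetical timing: a 1024-point transform taking 64 microseconds.
complex_rate = fft_mflops(1024, 64.0)         # 5 * 1024 * 10 / 64
real_rate = fft_mflops(1024, 64.0, real=True)  # 2.5 * 1024 * 10 / 64
print(complex_rate, real_rate)
```

Because the figure is normalized by a fixed operation count rather than by the operations actually performed, it is mainly useful for comparing builds on the same transform size, such as builds with and without NEON.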

Complex 1-D Powers of Two
Complex 1-D Non-powers of Two
Real 1-D Powers of Two
Real 1-D Non-powers of Two
Complex 2-D Powers of Two
Complex 2-D Non-powers of Two
Real 2-D Powers of Two
Real 2-D Non-powers of Two
Complex 3-D Powers of Two
Complex 3-D Non-powers of Two
Real 3-D Powers of Two
Real 3-D Non-powers of Two

Detailed NEON timing data for these cases

Copyright © 2010-11 Vesperix Corporation