System: 1 GHz ARM Cortex-A9 (Pandaboard)

Compiler: gcc 4.4.5 (native)

Configure settings: --enable-single --enable-neon --enable-fma --enable-armv7a-cycle-counter ARM_CPU_TYPE=cortex-a9

Comments: With NEON's fused multiply-add instructions, peak speeds of about 1 GFLOPS are reached. On average, however, the fused multiply-add build is 5-10% slower than a build that uses separate vector multiply and add instructions.
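For readers who want to relate a measured transform time to a speed figure like the one quoted above, here is a minimal sketch. It assumes the speeds follow the usual FFT benchmarking convention of a nominal 5 N log2(N) floating-point operations for a complex transform of N points (half that for a real transform); this page does not state which convention was used. The function name and the timing value are hypothetical and are not measurements from this system.

```python
import math


def fft_mflops(n, seconds, real=False):
    """Nominal MFLOPS for one transform of n points that took `seconds`.

    Assumption: nominal operation count of 5*n*log2(n) for complex data
    and half that for real data. This is a reporting convention, not the
    number of operations the library actually executes.
    """
    if n < 2 or seconds <= 0:
        raise ValueError("n must be >= 2 and seconds must be positive")
    flops = 5.0 * n * math.log2(n)
    if real:
        flops /= 2.0
    # Dividing by the time in microseconds gives MFLOPS.
    return flops / (seconds * 1e6)


if __name__ == "__main__":
    # Hypothetical timing: a 1024-point complex transform in 51.2 microseconds.
    print(f"{fft_mflops(1024, 51.2e-6):.1f} MFLOPS")
```

Under this convention, a 1024-point complex transform would need to finish in roughly 51 microseconds to reach about 1 GFLOPS. For multi-dimensional transforms, N would be the total number of points.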

Complex 1-D Powers of Two
Complex 1-D Non-powers of Two
Real 1-D Powers of Two
Real 1-D Non-powers of Two
Complex 2-D Powers of Two
Complex 2-D Non-powers of Two
Real 2-D Powers of Two
Real 2-D Non-powers of Two
Complex 3-D Powers of Two
Complex 3-D Non-powers of Two
Real 3-D Powers of Two
Real 3-D Non-powers of Two

Detailed NEON timing data for these cases

Copyright © 2010-11 Vesperix Corporation