System: 1 GHz ARM Cortex A9 (Pandaboard)

Compiler: gcc 4.4.5 (native)

Configure settings: --enable-single --enable-neon --enable-armv7a-cycle-counter ARM_CPU_TYPE=cortex-a9

Comments: Scalar performance is 4-7 times better on the Cortex A9 than on the Cortex A8, thanks to a much better (pipelined) VFP unit, while NEON performance improves only slightly. As a result, the typical speedup from NEON on the A9 is only 1.2-1.5x, with peak rates of around 1.0 gigaflops on a single core. As on the A8, it is faster not to use NEON's fused multiply-add instructions.

Benchmark plots: complex and real 1-D, 2-D, and 3-D transforms, each for power-of-two and non-power-of-two sizes.

Detailed NEON timing data for these cases

Copyright © 2010-11 Vesperix Corporation