Building FFTW-ARM

The latest version of the mainline FFTW distribution (FFTW 3.3.4) includes support for ARM NEON. It was developed independently by the original developers of FFTW, and is available from the FFTW download page.

We will continue to make FFTW-ARM available here for users too stubborn to change, but we strongly suggest transitioning to the mainline distribution since it will be supported by the original developers. FFTW-ARM is no longer supported or maintained.

The process for building FFTW-ARM is the same as for FFTW 3.2.2. If you have not done this before, please read the installation section of the FFTW manual and run through the build process with the unmodified FFTW 3.2.2 source first.

Summary for the Impatient:

  1. Get the source, untar it, cd into fftw-3.2.2-arm
  2. ./configure --enable-single --enable-neon (cycle counter options are also available, see below)
  3. make
  4. make check (please do not skip this step!)
  5. Let us know of any problems you find, or of successful builds on new system types

For additional configuration options and more detailed information, follow the steps below:

Step 1: Download and unpack the source code for FFTW-ARM

The source is available in tar.gz format or zip format.

Unpack it with tar xvzf fftw-3.2.2-arm.tar.gz or unzip fftw-3.2.2-arm.zip.

All the remaining steps happen in the source directory, so cd fftw-3.2.2-arm.
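
As a small illustration of the unpacking step, the sketch below picks the command that matches the archive format. The function name unpack_cmd is hypothetical and not part of FFTW-ARM; it only prints the command rather than running it.

```shell
# Hypothetical helper (not part of FFTW-ARM): print the unpack command
# that matches the archive's extension.
unpack_cmd() {
    case "$1" in
        *.tar.gz) echo "tar xvzf $1" ;;
        *.zip)    echo "unzip $1" ;;
        *)        echo "unsupported archive: $1" >&2; return 1 ;;
    esac
}

# Usage:
#   $(unpack_cmd fftw-3.2.2-arm.tar.gz) && cd fftw-3.2.2-arm
```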

Step 2: Determine your settings and run ./configure

It's very likely that you want to use the options --enable-single --enable-neon, since they're the whole reason for FFTW-ARM; additional options are described below. All of the standard configure settings for FFTW 3.2.2 are still available, of course, and they should all behave in the usual way. The full list of options can be shown using ./configure --help.

Generating NEON SIMD Code

Use the configure option --enable-neon (which requires --enable-single) to cause SIMD code for NEON to be generated. If you do not specify this option, FFTW-ARM should generate scalar code just like the standard FFTW 3.2.2.

Note that the gcc shipped with some distributions, including those based on Angstrom, requires explicit specification of the CPU and ABI before it will generate NEON code. Check the output of configure to be certain that it reports arm_neon.h as usable. If it does not, try passing the option ARM_FLOAT_ABI=softfp to configure.
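
One way to make the arm_neon.h check less error-prone is to save the configure output and search it. The sketch below assumes configure prints the usual autoconf-style line "checking arm_neon.h usability... yes"; the exact wording on your system may differ. The function name neon_header_usable and the log file name are hypothetical.

```shell
# Hypothetical helper: succeeds if a saved configure log reports
# arm_neon.h as usable. The matched line format is an assumption based
# on typical autoconf output; adjust it if your log reads differently.
neon_header_usable() {
    grep -qF 'checking arm_neon.h usability... yes' "$1"
}

# Usage (from the source directory):
#   ./configure --enable-single --enable-neon 2>&1 | tee configure.out
#   neon_header_usable configure.out || echo "try adding ARM_FLOAT_ABI=softfp"
```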

The default is to generate code that does not use fused multiply-add instructions, because we have found that using NEON FMA is 5-10% slower on Cortex A8 and A9 processors (see the benchmarks section for details). The NEON routines fully support FMA, and you can cause these instructions to be used with the standard FFTW 3.2.2 option --enable-fma. Please let us know if you find a system where FFTW-ARM is faster with FMA enabled.

Cycle Counters

FFTW can use a cycle counter to provide accurate timings of the alternative ways to compute an FFT of the desired length. The best way found is saved in a plan that should be reused for later FFTs of the same length. If you do not have a cycle counter, you can fall back on the standard --with-slow-timer option.

The timer comparisons in the benchmarks section show the speeds obtained with and without cycle counters. The standard --with-slow-timer gives performance as good as a cycle counter in almost all cases we've tested; it just takes much longer to generate a plan the first time you compute an FFT of a given length.

If you prefer or require a cycle counter, we have provided two alternatives, one using the perf events subsystem that has been added to the Linux kernel starting with version 2.6.31, and one using Mans Rullgard's earlier USER_PMON (userspace access to performance monitors) kernel patch. Unfortunately, there are a huge number of kernel variants used on ARMv7 processors, and neither of the cycle counter alternatives we provide is certain to be available out of the box.
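
As a quick first filter before trying the perf events counter, you can compare the running kernel's release string against 2.6.31. This is only a sketch: a new enough kernel is necessary but not sufficient, since perf events may still be disabled in your kernel's configuration. The function name kernel_at_least_2_6_31 is hypothetical.

```shell
# Hypothetical helper: succeeds if a kernel release string (as printed
# by `uname -r`) is version 2.6.31 or newer.
kernel_at_least_2_6_31() {
    ver=${1%%-*}                 # drop suffix: "2.6.35-22-omap" -> "2.6.35"
    old_ifs=$IFS
    IFS=.
    set -- $ver                  # split on dots into $1 $2 $3
    IFS=$old_ifs
    major=${1:-0}; minor=${2:-0}; patch=${3:-0}
    # Keep only the leading digits of each field (e.g. "31+" -> "31").
    major=${major%%[!0-9]*}; minor=${minor%%[!0-9]*}; patch=${patch%%[!0-9]*}
    major=${major:-0}; minor=${minor:-0}; patch=${patch:-0}
    if [ "$major" -ne 2 ]; then [ "$major" -gt 2 ]; return; fi
    if [ "$minor" -ne 6 ]; then [ "$minor" -gt 6 ]; return; fi
    [ "$patch" -ge 31 ]
}

# Usage:
#   kernel_at_least_2_6_31 "$(uname -r)" || echo "kernel predates perf events"
```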

We provide small test programs below to check whether your kernel supports the counters we provide, but if it does not, then you'll need to recompile your kernel to use either of these cycle counters. See the FAQ and your distribution's instructions on recompiling the kernel for more information.

Using the perf events cycle counter:
--enable-perf-events
You can compile and run test_perf_events.c to verify that your kernel supports hardware perf events. Note that some OMAP4 kernels (e.g. Ubuntu 10.10) only have software perf events enabled, which are no help to FFTW; the test program will tell you that the hardware cycle counter is not available. If this happens to you, try the USER_PMON patch below.
Using the ARMv7a cycle counter:
--enable-arm-v7a-cycle-counter
You can compile and run test_user_pmon.c to verify that your cycle counter is working after you have applied Mans Rullgard's USER_PMON patch (available here) and recompiled your kernel.
Using mach_absolute_time on iOS Systems:
On iOS systems like the iPhone and iPad, we believe (but have not yet confirmed) that the mach_absolute_time function, already supported by FFTW 3.2.2, should be detected by configure and used automatically as a cycle counter. You should not need to specify a configure option for this to happen. Please let us know if you can confirm whether mach_absolute_time works or fails on iOS ARM systems.

Additional Options

Two new command line arguments, ARM_CPU_TYPE and ARM_FLOAT_ABI, have been added to allow passing additional flags to gcc or any other compiler (such as clang) that accepts the same command-line argument syntax.

Specify the ARM CPU type:
ARM_CPU_TYPE=[any value supported by the -mcpu= option of your compiler]
For example: "ARM_CPU_TYPE=cortex-a9". If this is not specified, the compiler is passed march=armv7a instead. Some versions of gcc require a CPU type to be specified before they will generate NEON code, even when the NEON code is inline assembly (as our SIMD kernels are).
Specify the floating point ABI:
ARM_FLOAT_ABI=[any value supported by the -mfloat-abi= option of your compiler]
For example: "ARM_FLOAT_ABI=softfp" is required by some compilers (including those in most Angstrom-based distributions and the Code Sourcery cross-compilers) to generate NEON code. If this is not specified, no mfloat-abi specification is passed, and the compiler default is used.

All the other command line arguments supported by FFTW 3.2.2 (CC, CFLAGS, CPPFLAGS, etc.) should work as usual.

Examples:

Single precision, using NEON and no cycle counter:
./configure --enable-single --enable-neon
Single precision, using NEON and the perf events cycle counter:
./configure --enable-single --enable-neon --enable-perf-events
Single precision, using NEON and the ARMv7a cycle counter:
./configure --enable-single --enable-neon --enable-arm-v7a-cycle-counter
Single precision, using NEON and the standard FFTW 3.2.2 slow timer option:
./configure --enable-single --enable-neon --with-slow-timer
Single precision, using NEON and the ARMv7a cycle counter, on Angstrom or any other distribution where gcc requires explicit specification of CPU type and ABI, optimizing for the Cortex A9:
./configure --enable-single --enable-neon --enable-arm-v7a-cycle-counter ARM_CPU_TYPE=cortex-a9 ARM_FLOAT_ABI=softfp
Single precision, using NEON and the ARMv7a cycle counter, cross-compiling from an x86 Linux system with Code Sourcery G++, optimizing for the Cortex A9:
./configure --enable-single --enable-neon --enable-arm-v7a-cycle-counter --build=i686-pc-linux-gnu --host=arm-none-linux-gnueabi --disable-fortran ARM_CPU_TYPE=cortex-a9 ARM_FLOAT_ABI=softfp
Single precision, using NEON and the perf events cycle counter, compiling with Clang (which, unlike gcc, does not automatically set preprocessor flags or pass options to the assembler):
./configure --enable-single --enable-neon --enable-perf-events CC=clang CPPFLAGS="-D__ARM_NEON__" CFLAGS="-O3 -Wa,-mcpu=cortex-a8 -Wa,-mfpu=neon"
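
The native-build examples above follow a common pattern, which the sketch below captures: pick a timer, then optionally add a CPU type and float ABI. The function name fftw_arm_configure_cmd and the timer keywords (none, perf, pmon, slow) are hypothetical shorthand; the configure options themselves are the ones described in this document. The function only prints the command so you can inspect it before running it.

```shell
# Hypothetical helper: print a configure command line for FFTW-ARM.
#   $1  timer: none | perf | pmon | slow
#   $2  optional value for ARM_CPU_TYPE  (e.g. cortex-a9)
#   $3  optional value for ARM_FLOAT_ABI (e.g. softfp)
fftw_arm_configure_cmd() {
    cmd="./configure --enable-single --enable-neon"
    case "${1:-none}" in
        none) ;;
        perf) cmd="$cmd --enable-perf-events" ;;
        pmon) cmd="$cmd --enable-arm-v7a-cycle-counter" ;;
        slow) cmd="$cmd --with-slow-timer" ;;
        *)    echo "unknown timer: $1" >&2; return 1 ;;
    esac
    if [ -n "${2:-}" ]; then cmd="$cmd ARM_CPU_TYPE=$2"; fi
    if [ -n "${3:-}" ]; then cmd="$cmd ARM_FLOAT_ABI=$3"; fi
    echo "$cmd"
}

# Usage:
#   fftw_arm_configure_cmd pmon cortex-a9 softfp
```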

If the configure process fails, please take a look at its output. The configure script is designed to check whether the system has everything it needs to build FFTW, and the error messages can often tell you what part of the system is missing or misconfigured. If you cannot easily resolve the problem using this process, please let us know what happened, including the output of the script.

Step 3: Make the FFTW library

make

If the make process fails, please let us know, and include any messages given by make.

Step 4: Check the results

make check

If you are ready to leave for the day, please also run a more comprehensive set of tests. These will take several hours, but will help us make sure that the results are correct for a very wide range of problems:

cd tests
make bigcheck
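
Since reports of failures are most useful with the full output attached, it can help to capture each step in a log file. The sketch below is one way to do that; the function name run_logged and the log file names are hypothetical.

```shell
# Hypothetical helper: run a command, save its combined stdout/stderr
# to a log file, echo the log, and return the command's exit status.
#   $1  log file; remaining arguments: the command to run
run_logged() {
    log=$1
    shift
    if "$@" > "$log" 2>&1; then status=0; else status=$?; fi
    cat "$log"
    return "$status"
}

# Usage (from the source directory):
#   run_logged make.log make && run_logged check.log make check
```

If either step fails, the corresponding log file holds the messages to include in a report.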

Copyright © 2011-4 Vesperix Corporation