The latest version of the mainline FFTW distribution (FFTW 3.3.4) includes support for ARM NEON. It was developed independently by the original developers of FFTW, and is available from the FFTW download page.
We will continue to make FFTW-ARM available here for users too stubborn to change, but we strongly suggest transitioning to the mainline distribution since it will be supported by the original developers. FFTW-ARM is no longer supported or maintained.
The process for building FFTW-ARM is the same as FFTW 3.2.2; if you do not have experience doing this, please take a look at the installation section of the FFTW manual and run through the build process with the unmodified FFTW 3.2.2 source first.
For additional configuration options and more detailed information, follow the steps below:
Unpack the source archive with tar xvzf fftw-3.2.2-arm.tar.gz or unzip fftw-3.2.2-arm.zip.
All the remaining steps happen in the source directory, so cd fftw-3.2.2-arm.
It's very likely that you want to use the options --enable-single --enable-neon, since they're the whole reason for FFTW-ARM; additional options are described below. All of the standard configure settings for FFTW 3.2.2 are still available, of course, and they should all behave in the usual way. The full list of options can be shown using ./configure --help.
Use the configure option --enable-neon (which requires --enable-single) to cause SIMD code for NEON to be generated. If you do not specify this option, FFTW-ARM should generate scalar code just like the standard FFTW 3.2.2.
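Putting the steps so far together, a minimal build might look like the sketch below. It assumes the tar.gz archive is in the current directory and uses only the options already described; the final make is the usual FFTW 3.2.2 build step.

```shell
# Unpack the FFTW-ARM source and enter the source directory.
tar xvzf fftw-3.2.2-arm.tar.gz
cd fftw-3.2.2-arm

# Configure for single precision with NEON SIMD code.
# --enable-neon requires --enable-single.
./configure --enable-single --enable-neon

# Build as for the standard FFTW 3.2.2.
make
```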
Note that the gcc shipped with some distributions, including those based on Angstrom, requires explicit specification of the CPU and ABI before it will generate NEON code. Check the output of configure to be certain that it reports arm_neon.h as usable. If it does not, try passing the option ARM_FLOAT_ABI=softfp to configure.
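One way to run this check is to save the configure output and search it for the header name. The sketch below assumes configure reports the header on lines that contain arm_neon.h; the exact wording of those lines may differ on your system. The function name show_neon_check and the log file name configure.log are chosen here for illustration only.

```shell
# Print every line of a saved configure log that mentions arm_neon.h.
# Returns non-zero if the header is never mentioned.
show_neon_check() {
    grep 'arm_neon\.h' "$1"
}

# Typical use, from the source directory:
#   ./configure --enable-single --enable-neon > configure.log 2>&1
#   show_neon_check configure.log
# If the header is not reported as usable, try again with the ABI set:
#   ./configure --enable-single --enable-neon ARM_FLOAT_ABI=softfp > configure.log 2>&1
#   show_neon_check configure.log
```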
The default is to generate code that does not use fused multiply-add instructions, because we have found that using NEON FMA is 5-10% slower on Cortex A8 and A9 processors (see the benchmarks section for details). The NEON routines fully support FMA, and you can cause these instructions to be used with the standard FFTW 3.2.2 option --enable-fma. Please let us know if you find a system where FFTW-ARM is faster with FMA enabled.
FFTW can use a cycle counter to provide accurate timings of the alternative ways to compute an FFT of the desired length. The best way found is saved in a plan that should be reused for later FFTs of the same length. If you do not have a cycle counter you can either
The timer comparisons in the benchmarks section show the speeds obtained with and without cycle counters. The standard --with-slow-timer gives performance as good as a cycle counter in almost all cases we've tested; it just takes much longer to generate a plan the first time you compute an FFT of a given length.
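On a system without a cycle counter, the configure line would simply gain the slow-timer option. This is a sketch that combines it with the NEON options described earlier:

```shell
# Configure for NEON without a cycle counter; planning will take longer
# the first time an FFT of a given length is computed.
./configure --enable-single --enable-neon --with-slow-timer
```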
If you prefer or require a cycle counter, we have provided two alternatives, one using the perf events subsystem that has been added to the Linux kernel starting with version 2.6.31, and one using Mans Rullgard's earlier USER_PMON (userspace access to performance monitors) kernel patch. Unfortunately, there are a huge number of kernel variants used on ARMv7 processors, and neither of the cycle counter alternatives we provide is certain to be available out of the box.
We provide small test programs below to check whether your kernel supports the counters we provide, but if it does not, then you'll need to recompile your kernel to use either of these cycle counters. See the FAQ and your distribution's instructions on recompiling the kernel for more information.
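Before building anything, a coarse first check is whether the running kernel is at least version 2.6.31, the release in which perf events appeared. The sketch below is not a substitute for the test programs: a new enough kernel may still have been built without perf events. The helper name version_at_least is ours, not part of FFTW-ARM.

```shell
# Succeed if dotted version $1 is at least $2, comparing the first three
# numeric fields (for example 2.6.32 against 2.6.31).
version_at_least() {
    awk -v have="$1" -v want="$2" 'BEGIN {
        split(have, h, /[^0-9]+/)
        split(want, w, /[^0-9]+/)
        for (i = 1; i <= 3; i++) {
            if (h[i] + 0 > w[i] + 0) exit 0
            if (h[i] + 0 < w[i] + 0) exit 1
        }
        exit 0
    }'
}

# Report on the running kernel; "uname -r" gives strings such as 2.6.32-5-omap.
if version_at_least "$(uname -r)" 2.6.31; then
    echo "kernel is new enough for perf events; it may still need to be enabled"
else
    echo "kernel predates perf events (2.6.31)"
fi
```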
Two new command line arguments, ARM_CPU_TYPE and ARM_FLOAT_ABI, have been added to allow passing additional flags to gcc or any other compiler (such as clang) that accepts the same command-line argument syntax.
All the other command line arguments supported by FFTW 3.2.2 (CC, CFLAGS, CPPFLAGS, etc.) should work as usual.
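As an illustration, both arguments can be given on the configure command line alongside the NEON options. In the sketch below, softfp is the ABI value mentioned earlier in these instructions, while cortex-a8 is only an assumed example of a CPU name; use whatever your compiler expects for your processor.

```shell
# ARM_FLOAT_ABI=softfp is the value suggested for toolchains that need
# the ABI spelled out. ARM_CPU_TYPE=cortex-a8 is an ASSUMED example
# value; substitute the CPU name appropriate for your compiler and board.
./configure --enable-single --enable-neon \
    ARM_CPU_TYPE=cortex-a8 ARM_FLOAT_ABI=softfp
```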
If the configure process fails, please take a look at its output. The configure script is designed to check whether the system has everything it needs to build FFTW, and the error messages can often tell you what part of the system is missing or misconfigured. If you cannot easily resolve the problem using this process, please let us know what happened, including the output of the script.
If the make process fails, please let us know, and include any messages given by make.
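To make a report easier to put together, the output of configure and make can be captured to files as they run. The following is a minimal sketch; run_logged is a hypothetical helper defined here, not something shipped with FFTW-ARM, and the log file names are arbitrary.

```shell
# Run a command, saving its combined stdout and stderr to a log file,
# then print the log and return the command's own exit status.
run_logged() {
    log=$1
    shift
    if "$@" > "$log" 2>&1; then
        status=0
    else
        status=$?
    fi
    cat "$log"
    return "$status"
}
```

For example, run_logged configure.log ./configure --enable-single --enable-neon followed by run_logged make.log make leaves two files that can be attached to a report.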
If you are ready to leave for the day, please also run a more comprehensive set of tests. These will take several hours, but will help us make sure that the results are correct for a very wide range of problems:
Copyright © 2011-4 Vesperix Corporation