362 lines
		
	
	
		
			16 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
		
		
			
		
	
	
			362 lines
		
	
	
		
			16 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
|   | @node Installation and Customization, Acknowledgments, Upgrading from FFTW version 2, Top | ||
|  | @chapter Installation and Customization | ||
|  | @cindex installation | ||
|  | 
 | ||
|  | This chapter describes the installation and customization of FFTW, the | ||
|  | latest version of which may be downloaded from | ||
|  | @uref{http://www.fftw.org, the FFTW home page}. | ||
|  | 
 | ||
|  | In principle, FFTW should work on any system with an ANSI C compiler | ||
|  | (@code{gcc} is fine).  However, planner time is drastically reduced if | ||
|  | FFTW can exploit a hardware cycle counter; FFTW comes with cycle-counter | ||
|  | support for all modern general-purpose CPUs, but you may need to add a | ||
|  | couple of lines of code if your compiler is not yet supported | ||
|  | (@pxref{Cycle Counters}).  (On Unix, there will be a warning at the end | ||
|  | of the @code{configure} output if no cycle counter is found.) | ||
|  | @cindex cycle counter | ||
|  | @cindex compiler | ||
|  | @cindex portability | ||
|  | 
 | ||
|  | 
 | ||
|  | Installation of FFTW is simplest if you have a Unix or a GNU system, | ||
|  | such as GNU/Linux, and we describe this case in the first section below, | ||
|  | including the use of special configuration options to e.g. install | ||
|  | different precisions or exploit optimizations for particular | ||
|  | architectures (e.g. SIMD).  Compilation on non-Unix systems is a more | ||
|  | manual process, but we outline the procedure in the second section.  It | ||
|  | is also likely that pre-compiled binaries will be available for popular | ||
|  | systems. | ||
|  | 
 | ||
|  | Finally, we describe how you can customize FFTW for particular needs by | ||
|  | generating @emph{codelets} for fast transforms of sizes not supported | ||
|  | efficiently by the standard FFTW distribution. | ||
|  | @cindex codelet | ||
|  | 
 | ||
|  | @menu | ||
|  | * Installation on Unix:: | ||
|  | * Installation on non-Unix systems:: | ||
|  | * Cycle Counters:: | ||
|  | * Generating your own code:: | ||
|  | @end menu | ||
|  | 
 | ||
|  | @c ------------------------------------------------------------ | ||
|  | 
 | ||
|  | @node Installation on Unix, Installation on non-Unix systems, Installation and Customization, Installation and Customization | ||
|  | @section Installation on Unix | ||
|  | 
 | ||
|  | FFTW comes with a @code{configure} program in the GNU style. | ||
|  | Installation can be as simple as: | ||
|  | @fpindex configure | ||
|  | 
 | ||
|  | @example | ||
|  | ./configure | ||
|  | make | ||
|  | make install | ||
|  | @end example | ||
|  | 
 | ||
|  | This will build the uniprocessor complex and real transform libraries | ||
|  | along with the test programs.  (We recommend that you use GNU | ||
|  | @code{make} if it is available; on some systems it is called | ||
|  | @code{gmake}.)  The ``@code{make install}'' command installs the fftw | ||
|  | and rfftw libraries in standard places, and typically requires root | ||
|  | privileges (unless you specify a different install directory with the | ||
|  | @code{--prefix} flag to @code{configure}).  You can also type | ||
|  | ``@code{make check}'' to put the FFTW test programs through their paces. | ||
|  | If you have problems during configuration or compilation, you may want | ||
|  | to run ``@code{make distclean}'' before trying again; this ensures that | ||
|  | you don't have any stale files left over from previous compilation | ||
|  | attempts. | ||
|  | 
 | ||
|  | The @code{configure} script chooses the @code{gcc} compiler by default, | ||
|  | if it is available; you can select some other compiler with: | ||
|  | @example | ||
|  | ./configure CC="@r{@i{<the name of your C compiler>}}" | ||
|  | @end example | ||
|  | 
 | ||
|  | The @code{configure} script knows good @code{CFLAGS} (C compiler flags) | ||
|  | @cindex compiler flags | ||
|  | for a few systems.  If your system is not known, the @code{configure} | ||
|  | script will print out a warning.  In this case, you should re-configure | ||
|  | FFTW with the command | ||
|  | @example | ||
|  | ./configure CFLAGS="@r{@i{<write your CFLAGS here>}}" | ||
|  | @end example | ||
|  | and then compile as usual.  If you do find an optimal set of | ||
|  | @code{CFLAGS} for your system, please let us know what they are (along | ||
|  | with the output of @code{config.guess}) so that we can include them in | ||
|  | future releases. | ||
|  | 
 | ||
|  | @code{configure} supports all the standard flags defined by the GNU | ||
|  | Coding Standards; see the @code{INSTALL} file in FFTW or | ||
|  | @uref{http://www.gnu.org/prep/standards/html_node/index.html, the GNU web page}. | ||
|  | Note especially @code{--help} to list all flags and | ||
|  | @code{--enable-shared} to create shared, rather than static, libraries. | ||
|  | @code{configure} also accepts a few FFTW-specific flags, particularly: | ||
|  | 
 | ||
|  | @itemize @bullet | ||
|  | 
 | ||
|  | @item | ||
|  | @cindex precision | ||
|  | @code{--enable-float}: Produces a single-precision version of FFTW | ||
|  | (@code{float}) instead of the default double-precision (@code{double}). | ||
|  | @xref{Precision}. | ||
|  | 
 | ||
|  | @item | ||
|  | @cindex precision | ||
|  | @code{--enable-long-double}: Produces a long-double precision version of | ||
|  | FFTW (@code{long double}) instead of the default double-precision | ||
|  | (@code{double}).  The @code{configure} script will halt with an error | ||
|  | message if @code{long double} is the same size as @code{double} on your | ||
|  | machine/compiler.  @xref{Precision}. | ||
|  | 
 | ||
|  | @item | ||
|  | @cindex precision | ||
|  | @code{--enable-quad-precision}: Produces a quadruple-precision version | ||
|  | of FFTW using the nonstandard @code{__float128} type provided by | ||
|  | @code{gcc} 4.6 or later on x86, x86-64, and Itanium architectures, | ||
|  | instead of the default double-precision (@code{double}).  The | ||
|  | @code{configure} script will halt with an error message if the | ||
|  | compiler is not @code{gcc} version 4.6 or later or if @code{gcc}'s | ||
|  | @code{libquadmath} library is not installed.  @xref{Precision}. | ||
|  | 
 | ||
|  | @item | ||
|  | @cindex threads | ||
|  | @code{--enable-threads}: Enables compilation and installation of the | ||
|  | FFTW threads library (@pxref{Multi-threaded FFTW}), which provides a | ||
|  | simple interface to parallel transforms for SMP systems.  By default, | ||
|  | the threads routines are not compiled. | ||
|  | 
 | ||
|  | @item | ||
|  | @code{--enable-openmp}: Like @code{--enable-threads}, but using OpenMP | ||
|  | compiler directives in order to induce parallelism rather than | ||
|  | spawning its own threads directly, and installing an @samp{fftw3_omp} library | ||
|  | rather than an @samp{fftw3_threads} library (@pxref{Multi-threaded            | ||
|  | FFTW}).  You can use both @code{--enable-openmp} and @code{--enable-threads} | ||
|  | since they compile/install libraries with different names.  By default, | ||
|  | the OpenMP routines are not compiled. | ||
|  | 
 | ||
|  | @item | ||
|  | @code{--with-combined-threads}: By default, if @code{--enable-threads} | ||
|  | is used, the threads support is compiled into a separate library that | ||
|  | must be linked in addition to the main FFTW library.  This is so that | ||
|  | users of the serial library do not need to link the system threads | ||
|  | libraries.  If @code{--with-combined-threads} is specified, however, | ||
|  | then no separate threads library is created, and threads are included | ||
|  | in the main FFTW library.  This is mainly useful under Windows, where | ||
|  | no system threads library is required and inter-library dependencies | ||
|  | are problematic. | ||
|  | 
 | ||
|  | @item | ||
|  | @cindex MPI | ||
|  | @code{--enable-mpi}: Enables compilation and installation of the FFTW | ||
|  | MPI library (@pxref{Distributed-memory FFTW with MPI}), which provides | ||
|  | parallel transforms for distributed-memory systems with MPI.  (By | ||
|  | default, the MPI routines are not compiled.)  @xref{FFTW MPI | ||
|  | Installation}. | ||
|  | 
 | ||
|  | @item | ||
|  | @cindex Fortran-callable wrappers | ||
|  | @code{--disable-fortran}: Disables inclusion of legacy-Fortran | ||
|  | wrapper routines (@pxref{Calling FFTW from Legacy Fortran}) in the standard | ||
|  | FFTW libraries.  These wrapper routines increase the library size by | ||
|  | only a negligible amount, so they are included by default as long as | ||
|  | the @code{configure} script finds a Fortran compiler on your system. | ||
|  | (To specify a particular Fortran compiler @i{foo}, pass | ||
|  | @code{F77=}@i{foo} to @code{configure}.) | ||
|  | 
 | ||
|  | @item | ||
|  | @code{--with-g77-wrappers}: By default, when Fortran wrappers are | ||
|  | included, the wrappers employ the linking conventions of the Fortran | ||
|  | compiler detected by the @code{configure} script.  If this compiler is | ||
|  | GNU @code{g77}, however, then @emph{two} versions of the wrappers are | ||
|  | included: one with @code{g77}'s idiosyncratic convention of appending | ||
|  | two underscores to identifiers, and one with the more common | ||
|  | convention of appending only a single underscore.  This way, the same | ||
|  | FFTW library will work with both @code{g77} and other Fortran | ||
|  | compilers, such as GNU @code{gfortran}.  However, the converse is not | ||
|  | true: if you configure with a different compiler, then the | ||
|  | @code{g77}-compatible wrappers are not included.  By specifying | ||
|  | @code{--with-g77-wrappers}, the @code{g77}-compatible wrappers are | ||
|  | included in addition to wrappers for whatever Fortran compiler | ||
|  | @code{configure} finds. | ||
|  | @fpindex g77 | ||
|  | 
 | ||
|  | @item | ||
|  | @code{--with-slow-timer}: Disables the use of hardware cycle counters, | ||
|  | and falls back on @code{gettimeofday} or @code{clock}.  This greatly | ||
|  | worsens performance, and should generally not be used (unless you don't | ||
|  | have a cycle counter but still really want an optimized plan regardless | ||
|  | of the time).  @xref{Cycle Counters}. | ||
|  | 
 | ||
|  | @item | ||
|  | @code{--enable-sse} (single precision), | ||
|  | @code{--enable-sse2} (single, double), | ||
|  | @code{--enable-avx} (single, double), | ||
|  | @code{--enable-avx2} (single, double), | ||
|  | @code{--enable-avx512} (single, double), | ||
|  | @code{--enable-avx-128-fma}, | ||
|  | @code{--enable-kcvi} (single), | ||
|  | @code{--enable-altivec} (single), | ||
|  | @code{--enable-vsx} (single, double), | ||
|  | @code{--enable-neon} (single, double on aarch64), | ||
|  | @code{--enable-generic-simd128}, | ||
|  | and | ||
|  | @code{--enable-generic-simd256}: | ||
|  | 
 | ||
|  | Enable various SIMD instruction sets.  You need compiler that supports | ||
|  | the given SIMD extensions, but FFTW will try to detect at runtime | ||
|  | whether the CPU supports these extensions.  That is, you can compile | ||
|  | with@code{--enable-avx} and the code will still run on a CPU without AVX | ||
|  | support. | ||
|  | 
 | ||
|  | @itemize @minus | ||
|  | @item | ||
|  | These options require a compiler supporting SIMD extensions, and | ||
|  | compiler support is always a bit flaky: see the FFTW FAQ for a list of | ||
|  | compiler versions that have problems compiling FFTW. | ||
|  | @item | ||
|  | Because of the large variety of ARM processors and ABIs, FFTW | ||
|  | does not attempt to guess the correct @code{gcc} flags for generating | ||
|  | NEON code.  In general, you will have to provide them on the command line. | ||
|  | This command line is known to have worked at least once: | ||
|  | @example | ||
|  | ./configure --with-slow-timer --host=arm-linux-gnueabi \ | ||
|  |   --enable-single --enable-neon \ | ||
|  |   "CC=arm-linux-gnueabi-gcc -march=armv7-a -mfloat-abi=softfp" | ||
|  | @end example | ||
|  | @end itemize | ||
|  | 
 | ||
|  | @end itemize | ||
|  | 
 | ||
|  | @cindex compiler | ||
|  | To force @code{configure} to use a particular C compiler @i{foo} | ||
|  | (instead of the default, usually @code{gcc}), pass @code{CC=}@i{foo} to the  | ||
|  | @code{configure} script; you may also need to set the flags via the variable | ||
|  | @code{CFLAGS} as described above. | ||
|  | @cindex compiler flags | ||
|  | 
 | ||
|  | @c ------------------------------------------------------------ | ||
|  | @node Installation on non-Unix systems, Cycle Counters, Installation on Unix, Installation and Customization | ||
|  | @section Installation on non-Unix systems | ||
|  | 
 | ||
|  | It should be relatively straightforward to compile FFTW even on non-Unix | ||
|  | systems lacking the niceties of a @code{configure} script.  Basically, | ||
|  | you need to edit the @code{config.h} header (copy it from | ||
|  | @code{config.h.in}) to @code{#define} the various options and compiler | ||
|  | characteristics, and then compile all the @samp{.c} files in the | ||
|  | relevant directories.   | ||
|  | 
 | ||
|  | The @code{config.h} header contains about 100 options to set, each one | ||
|  | initially an @code{#undef}, each documented with a comment, and most of | ||
|  | them fairly obvious.  For most of the options, you should simply | ||
|  | @code{#define} them to @code{1} if they are applicable, although a few | ||
|  | options require a particular value (e.g. @code{SIZEOF_LONG_LONG} should | ||
|  | be defined to the size of the @code{long long} type, in bytes, or zero | ||
|  | if it is not supported).  We will likely post some sample | ||
|  | @code{config.h} files for various operating systems and compilers for | ||
|  | you to use (at least as a starting point).  Please let us know if you | ||
|  | have to hand-create a configuration file (and/or a pre-compiled binary) | ||
|  | that you want to share. | ||
|  | 
 | ||
|  | To create the FFTW library, you will then need to compile all of the | ||
|  | @samp{.c} files in the @code{kernel}, @code{dft}, @code{dft/scalar}, | ||
|  | @code{dft/scalar/codelets}, @code{rdft}, @code{rdft/scalar}, | ||
|  | @code{rdft/scalar/r2cf}, @code{rdft/scalar/r2cb}, | ||
|  | @code{rdft/scalar/r2r}, @code{reodft}, and @code{api} directories. | ||
|  | If you are compiling with SIMD support (e.g. you defined | ||
|  | @code{HAVE_SSE2} in @code{config.h}), then you also need to compile | ||
|  | the @code{.c} files in the @code{simd-support}, | ||
|  | @code{@{dft,rdft@}/simd}, @code{@{dft,rdft@}/simd/*} directories. | ||
|  | 
 | ||
|  | Once these files are all compiled, link them into a library, or a shared | ||
|  | library, or directly into your program. | ||
|  | 
 | ||
|  | To compile the FFTW test program, additionally compile the code in the | ||
|  | @code{libbench2/} directory, and link it into a library.  Then compile | ||
|  | the code in the @code{tests/} directory and link it to the | ||
|  | @code{libbench2} and FFTW libraries.  To compile the @code{fftw-wisdom} | ||
|  | (command-line) tool (@pxref{Wisdom Utilities}), compile | ||
|  | @code{tools/fftw-wisdom.c} and link it to the @code{libbench2} and FFTW | ||
|  | libraries | ||
|  | 
 | ||
|  | @c ------------------------------------------------------------ | ||
|  | @node Cycle Counters, Generating your own code, Installation on non-Unix systems, Installation and Customization | ||
|  | @section Cycle Counters | ||
|  | @cindex cycle counter | ||
|  | 
 | ||
|  | FFTW's planner actually executes and times different possible FFT | ||
|  | algorithms in order to pick the fastest plan for a given @math{n}.  In | ||
|  | order to do this in as short a time as possible, however, the timer must | ||
|  | have a very high resolution, and to accomplish this we employ the | ||
|  | hardware @dfn{cycle counters} that are available on most CPUs. | ||
|  | Currently, FFTW supports the cycle counters on x86, PowerPC/POWER, Alpha, | ||
|  | UltraSPARC (SPARC v9), IA64, PA-RISC, and MIPS processors. | ||
|  | 
 | ||
|  | @cindex compiler | ||
|  | Access to the cycle counters, unfortunately, is a compiler and/or | ||
|  | operating-system dependent task, often requiring inline assembly | ||
|  | language, and it may be that your compiler is not supported.  If you are | ||
|  | @emph{not} supported, FFTW will by default fall back on its estimator | ||
|  | (effectively using @code{FFTW_ESTIMATE} for all plans). | ||
|  | @ctindex FFTW_ESTIMATE | ||
|  | 
 | ||
|  | You can add support by editing the file @code{kernel/cycle.h}; normally, | ||
|  | this will involve adapting one of the examples already present in order | ||
|  | to use the inline-assembler syntax for your C compiler, and will only | ||
|  | require a couple of lines of code.  Anyone adding support for a new | ||
|  | system to @code{cycle.h} is encouraged to email us at @email{fftw@@fftw.org}. | ||
|  | 
 | ||
|  | If a cycle counter is not available on your system (e.g. some embedded | ||
|  | processor), and you don't want to use estimated plans, as a last resort | ||
|  | you can use the @code{--with-slow-timer} option to @code{configure} (on | ||
|  | Unix) or @code{#define WITH_SLOW_TIMER} in @code{config.h} (elsewhere). | ||
|  | This will use the much lower-resolution @code{gettimeofday} function, or even | ||
|  | @code{clock} if the former is unavailable, and planning will be | ||
|  | extremely slow. | ||
|  | 
 | ||
|  | @c ------------------------------------------------------------ | ||
|  | @node Generating your own code,  , Cycle Counters, Installation and Customization | ||
|  | @section Generating your own code | ||
|  | @cindex code generator | ||
|  | 
 | ||
|  | The directory @code{genfft} contains the programs that were used to | ||
|  | generate FFTW's ``codelets,'' which are hard-coded transforms of small | ||
|  | sizes. | ||
|  | @cindex codelet | ||
|  | We do not expect casual users to employ the generator, which is a rather | ||
|  | sophisticated program that generates directed acyclic graphs of FFT | ||
|  | algorithms and performs algebraic simplifications on them.  It was | ||
|  | written in Objective Caml, a dialect of ML, which is available at | ||
|  | @uref{http://caml.inria.fr/ocaml/index.en.html}. | ||
|  | @cindex Caml | ||
|  | 
 | ||
|  | 
 | ||
|  | If you have Objective Caml installed (along with recent versions of | ||
|  | GNU @code{autoconf}, @code{automake}, and @code{libtool}), then you | ||
|  | can change the set of codelets that are generated or play with the | ||
|  | generation options.  The set of generated codelets is specified by the | ||
|  | @code{@{dft,rdft@}/@{codelets,simd@}/*/Makefile.am} files.  For example, you can add | ||
|  | efficient REDFT codelets of small sizes by modifying | ||
|  | @code{rdft/codelets/r2r/Makefile.am}. | ||
|  | @cindex REDFT | ||
|  | After you modify any @code{Makefile.am} files, you can type @code{sh | ||
|  | bootstrap.sh} in the top-level directory followed by @code{make} to | ||
|  | re-generate the files. | ||
|  | 
 | ||
|  | We do not provide more details about the code-generation process, since | ||
|  | we do not expect that most users will need to generate their own code. | ||
|  | However, feel free to contact us at @email{fftw@@fftw.org} if | ||
|  | you are interested in the subject. | ||
|  | 
 | ||
|  | @cindex monadic programming | ||
|  | You might find it interesting to learn Caml and/or some modern | ||
|  | programming techniques that we used in the generator (including monadic | ||
|  | programming), especially if you heard the rumor that Java and | ||
|  | object-oriented programming are the latest advancement in the field. | ||
|  | The internal operation of the codelet generator is described in the | ||
|  | paper, ``A Fast Fourier Transform Compiler,'' by M. Frigo, which is | ||
|  | available from the @uref{http://www.fftw.org,FFTW home page} and also | ||
|  | appeared in the @cite{Proceedings of the 1999 ACM SIGPLAN Conference on | ||
|  | Programming Language Design and Implementation (PLDI)}. | ||
|  | 
 |