| @node Installation and Customization, Acknowledgments, Upgrading from FFTW version 2, Top
 | |
| @chapter Installation and Customization
 | |
| @cindex installation
 | |
| 
 | |
| This chapter describes the installation and customization of FFTW, the
 | |
| latest version of which may be downloaded from
 | |
| @uref{http://www.fftw.org, the FFTW home page}.
 | |
| 
 | |
| In principle, FFTW should work on any system with an ANSI C compiler
 | |
| (@code{gcc} is fine).  However, planner time is drastically reduced if
 | |
| FFTW can exploit a hardware cycle counter; FFTW comes with cycle-counter
 | |
| support for all modern general-purpose CPUs, but you may need to add a
 | |
| couple of lines of code if your compiler is not yet supported
 | |
| (@pxref{Cycle Counters}).  (On Unix, there will be a warning at the end
 | |
| of the @code{configure} output if no cycle counter is found.)
 | |
| @cindex cycle counter
 | |
| @cindex compiler
 | |
| @cindex portability
 | |
| 
 | |
| 
 | |
| Installation of FFTW is simplest if you have a Unix or a GNU system,
 | |
| such as GNU/Linux, and we describe this case in the first section below,
 | |
| including the use of special configuration options to e.g. install
 | |
| different precisions or exploit optimizations for particular
 | |
| architectures (e.g. SIMD).  Compilation on non-Unix systems is a more
 | |
| manual process, but we outline the procedure in the second section.  It
 | |
| is also likely that pre-compiled binaries will be available for popular
 | |
| systems.
 | |
| 
 | |
| Finally, we describe how you can customize FFTW for particular needs by
 | |
| generating @emph{codelets} for fast transforms of sizes not supported
 | |
| efficiently by the standard FFTW distribution.
 | |
| @cindex codelet
 | |
| 
 | |
| @menu
 | |
| * Installation on Unix::
 | |
| * Installation on non-Unix systems::
 | |
| * Cycle Counters::
 | |
| * Generating your own code::
 | |
| @end menu
 | |
| 
 | |
| @c ------------------------------------------------------------
 | |
| 
 | |
| @node Installation on Unix, Installation on non-Unix systems, Installation and Customization, Installation and Customization
 | |
| @section Installation on Unix
 | |
| 
 | |
| FFTW comes with a @code{configure} program in the GNU style.
 | |
| Installation can be as simple as:
 | |
| @fpindex configure
 | |
| 
 | |
| @example
 | |
| ./configure
 | |
| make
 | |
| make install
 | |
| @end example
 | |
| 
 | |
| This will build the uniprocessor complex and real transform libraries
 | |
| along with the test programs.  (We recommend that you use GNU
 | |
| @code{make} if it is available; on some systems it is called
 | |
| @code{gmake}.)  The ``@code{make install}'' command installs the FFTW
 | |
| libraries in standard places, and typically requires root
 | |
| privileges (unless you specify a different install directory with the
 | |
| @code{--prefix} flag to @code{configure}).  You can also type
 | |
| ``@code{make check}'' to put the FFTW test programs through their paces.
 | |
| If you have problems during configuration or compilation, you may want
 | |
| to run ``@code{make distclean}'' before trying again; this ensures that
 | |
| you don't have any stale files left over from previous compilation
 | |
| attempts.
 | |
| 
 | |
| The @code{configure} script chooses the @code{gcc} compiler by default,
 | |
| if it is available; you can select some other compiler with:
 | |
| @example
 | |
| ./configure CC="@r{@i{<the name of your C compiler>}}"
 | |
| @end example
 | |
| 
 | |
| The @code{configure} script knows good @code{CFLAGS} (C compiler flags)
 | |
| @cindex compiler flags
 | |
| for a few systems.  If your system is not known, the @code{configure}
 | |
| script will print out a warning.  In this case, you should re-configure
 | |
| FFTW with the command
 | |
| @example
 | |
| ./configure CFLAGS="@r{@i{<write your CFLAGS here>}}"
 | |
| @end example
 | |
| and then compile as usual.  If you do find an optimal set of
 | |
| @code{CFLAGS} for your system, please let us know what they are (along
 | |
| with the output of @code{config.guess}) so that we can include them in
 | |
| future releases.
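As a rough sketch, the compiler, flags, and install directory can be
chosen in a single @code{configure} run.  The compiler name
(@code{clang}), the flag (@code{-O2}), and the install directory below
are placeholder assumptions, not recommended settings; substitute
whatever suits your system.

```shell
#!/bin/sh
# Placeholder values for illustration only; substitute your own.
compiler="clang"        # assumed: any ANSI C compiler on your PATH
flags="-O2"             # assumed: a conservative starting point, not tuned
prefix="$HOME/fftw"     # hypothetical install directory (no root needed)

if [ -x ./configure ]; then
  # Run from the top of the FFTW source tree.
  ./configure CC="$compiler" CFLAGS="$flags" --prefix="$prefix" \
    && make && make check && make install
else
  # Outside the source tree, just show the command that would be run.
  echo "would run: ./configure CC=$compiler CFLAGS=$flags --prefix=$prefix"
fi
```

Installing under a directory you own avoids the need for root
privileges during ``@code{make install}''.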
 | |
| 
 | |
| @code{configure} supports all the standard flags defined by the GNU
 | |
| Coding Standards; see the @code{INSTALL} file in FFTW or
 | |
| @uref{http://www.gnu.org/prep/standards/html_node/index.html, the GNU web page}.
 | |
| Note especially @code{--help} to list all flags and
 | |
| @code{--enable-shared} to create shared, rather than static, libraries.
 | |
| @code{configure} also accepts a few FFTW-specific flags, particularly:
 | |
| 
 | |
| @itemize @bullet
 | |
| 
 | |
| @item
 | |
| @cindex precision
 | |
| @code{--enable-float}: Produces a single-precision version of FFTW
 | |
| (@code{float}) instead of the default double-precision (@code{double}).
 | |
| @xref{Precision}.
 | |
| 
 | |
| @item
 | |
| @cindex precision
 | |
| @code{--enable-long-double}: Produces a long-double precision version of
 | |
| FFTW (@code{long double}) instead of the default double-precision
 | |
| (@code{double}).  The @code{configure} script will halt with an error
 | |
| message if @code{long double} is the same size as @code{double} on your
 | |
| machine/compiler.  @xref{Precision}.
 | |
| 
 | |
| @item
 | |
| @cindex precision
 | |
| @code{--enable-quad-precision}: Produces a quadruple-precision version
 | |
| of FFTW using the nonstandard @code{__float128} type provided by
 | |
| @code{gcc} 4.6 or later on x86, x86-64, and Itanium architectures,
 | |
| instead of the default double-precision (@code{double}).  The
 | |
| @code{configure} script will halt with an error message if the
 | |
| compiler is not @code{gcc} version 4.6 or later or if @code{gcc}'s
 | |
| @code{libquadmath} library is not installed.  @xref{Precision}.
 | |
| 
 | |
| @item
 | |
| @cindex threads
 | |
| @code{--enable-threads}: Enables compilation and installation of the
 | |
| FFTW threads library (@pxref{Multi-threaded FFTW}), which provides a
 | |
| simple interface to parallel transforms for SMP systems.  By default,
 | |
| the threads routines are not compiled.
 | |
| 
 | |
| @item
 | |
| @code{--enable-openmp}: Like @code{--enable-threads}, but using OpenMP
 | |
| compiler directives in order to induce parallelism rather than
 | |
| spawning its own threads directly, and installing an @samp{fftw3_omp} library
 | |
| rather than an @samp{fftw3_threads} library (@pxref{Multi-threaded
 | |
| FFTW}).  You can use both @code{--enable-openmp} and @code{--enable-threads}
 | |
| since they compile/install libraries with different names.  By default,
 | |
| the OpenMP routines are not compiled.
 | |
| 
 | |
| @item
 | |
| @code{--with-combined-threads}: By default, if @code{--enable-threads}
 | |
| is used, the threads support is compiled into a separate library that
 | |
| must be linked in addition to the main FFTW library.  This is so that
 | |
| users of the serial library do not need to link the system threads
 | |
| libraries.  If @code{--with-combined-threads} is specified, however,
 | |
| then no separate threads library is created, and threads are included
 | |
| in the main FFTW library.  This is mainly useful under Windows, where
 | |
| no system threads library is required and inter-library dependencies
 | |
| are problematic.
 | |
| 
 | |
| @item
 | |
| @cindex MPI
 | |
| @code{--enable-mpi}: Enables compilation and installation of the FFTW
 | |
| MPI library (@pxref{Distributed-memory FFTW with MPI}), which provides
 | |
| parallel transforms for distributed-memory systems with MPI.  (By
 | |
| default, the MPI routines are not compiled.)  @xref{FFTW MPI
 | |
| Installation}.
 | |
| 
 | |
| @item
 | |
| @cindex Fortran-callable wrappers
 | |
| @code{--disable-fortran}: Disables inclusion of legacy-Fortran
 | |
| wrapper routines (@pxref{Calling FFTW from Legacy Fortran}) in the standard
 | |
| FFTW libraries.  These wrapper routines increase the library size by
 | |
| only a negligible amount, so they are included by default as long as
 | |
| the @code{configure} script finds a Fortran compiler on your system.
 | |
| (To specify a particular Fortran compiler @i{foo}, pass
 | |
| @code{F77=}@i{foo} to @code{configure}.)
 | |
| 
 | |
| @item
 | |
| @code{--with-g77-wrappers}: By default, when Fortran wrappers are
 | |
| included, the wrappers employ the linking conventions of the Fortran
 | |
| compiler detected by the @code{configure} script.  If this compiler is
 | |
| GNU @code{g77}, however, then @emph{two} versions of the wrappers are
 | |
| included: one with @code{g77}'s idiosyncratic convention of appending
 | |
| two underscores to identifiers, and one with the more common
 | |
| convention of appending only a single underscore.  This way, the same
 | |
| FFTW library will work with both @code{g77} and other Fortran
 | |
| compilers, such as GNU @code{gfortran}.  However, the converse is not
 | |
| true: if you configure with a different compiler, then the
 | |
| @code{g77}-compatible wrappers are not included.  By specifying
 | |
| @code{--with-g77-wrappers}, the @code{g77}-compatible wrappers are
 | |
| included in addition to wrappers for whatever Fortran compiler
 | |
| @code{configure} finds.
 | |
| @fpindex g77
 | |
| 
 | |
| @item
 | |
| @code{--with-slow-timer}: Disables the use of hardware cycle counters,
 | |
| and falls back on @code{gettimeofday} or @code{clock}.  This greatly
 | |
| increases planning time, and should generally not be used (unless you don't
 | |
| have a cycle counter but still really want an optimized plan regardless
 | |
| of the time).  @xref{Cycle Counters}.
 | |
| 
 | |
| @item
 | |
| @code{--enable-sse} (single precision),
 | |
| @code{--enable-sse2} (single, double),
 | |
| @code{--enable-avx} (single, double),
 | |
| @code{--enable-avx2} (single, double),
 | |
| @code{--enable-avx512} (single, double),
 | |
| @code{--enable-avx-128-fma},
 | |
| @code{--enable-kcvi} (single),
 | |
| @code{--enable-altivec} (single),
 | |
| @code{--enable-vsx} (single, double),
 | |
| @code{--enable-neon} (single, double on aarch64),
 | |
| @code{--enable-generic-simd128},
 | |
| and
 | |
| @code{--enable-generic-simd256}:
 | |
| 
 | |
| Enable various SIMD instruction sets.  You need a compiler that supports
 | |
| the given SIMD extensions, but FFTW will try to detect at runtime
 | |
| whether the CPU supports these extensions.  That is, you can compile
 | |
| with @code{--enable-avx} and the code will still run on a CPU without AVX
 | |
| support.
 | |
| 
 | |
| @itemize @minus
 | |
| @item
 | |
| These options require a compiler supporting SIMD extensions, and
 | |
| compiler support is always a bit flaky: see the FFTW FAQ for a list of
 | |
| compiler versions that have problems compiling FFTW.
 | |
| @item
 | |
| Because of the large variety of ARM processors and ABIs, FFTW
 | |
| does not attempt to guess the correct @code{gcc} flags for generating
 | |
| NEON code.  In general, you will have to provide them on the command line.
 | |
| This command line is known to have worked at least once:
 | |
| @example
 | |
| ./configure --with-slow-timer --host=arm-linux-gnueabi \
 | |
|   --enable-single --enable-neon \
 | |
|   "CC=arm-linux-gnueabi-gcc -march=armv7-a -mfloat-abi=softfp"
 | |
| @end example
 | |
| @end itemize
 | |
| 
 | |
| @end itemize
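The flags above can be combined.  The following is a minimal sketch,
not a recommended configuration: it builds the default double-precision
library and then the single-precision one, each with threads and AVX2
enabled, running ``@code{make distclean}'' between the two builds.  The
install directory is a hypothetical placeholder, and the particular
choice of flags is only an assumption for illustration.

```shell
#!/bin/sh
# Hypothetical install directory; must not contain spaces in this sketch,
# because the command string below is split on whitespace.
prefix="$HOME/fftw"
common="--enable-threads --enable-avx2 --prefix=$prefix"

# First pass: default double precision.  Second pass: single precision.
for precision in "" "--enable-float"; do
  cmd="./configure $common $precision"
  if [ -x ./configure ]; then
    # Clean between passes so no stale objects from one precision remain.
    $cmd && make && make install && make distclean || exit 1
  else
    echo "would run: $cmd"
  fi
done
```

Each precision is a separate build, so @code{configure} is run once per
precision; the resulting libraries are installed under different names
and can coexist in the same directory.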
 | |
| 
 | |
| @cindex compiler
 | |
| To force @code{configure} to use a particular C compiler @i{foo}
 | |
| (instead of the default, usually @code{gcc}), pass @code{CC=}@i{foo} to the 
 | |
| @code{configure} script; you may also need to set the flags via the variable
 | |
| @code{CFLAGS} as described above.
 | |
| @cindex compiler flags
 | |
| 
 | |
| @c ------------------------------------------------------------
 | |
| @node Installation on non-Unix systems, Cycle Counters, Installation on Unix, Installation and Customization
 | |
| @section Installation on non-Unix systems
 | |
| 
 | |
| It should be relatively straightforward to compile FFTW even on non-Unix
 | |
| systems lacking the niceties of a @code{configure} script.  Basically,
 | |
| you need to edit the @code{config.h} header (copy it from
 | |
| @code{config.h.in}) to @code{#define} the various options and compiler
 | |
| characteristics, and then compile all the @samp{.c} files in the
 | |
| relevant directories.  
 | |
| 
 | |
| The @code{config.h} header contains about 100 options to set, each one
 | |
| initially an @code{#undef}, each documented with a comment, and most of
 | |
| them fairly obvious.  For most of the options, you should simply
 | |
| @code{#define} them to @code{1} if they are applicable, although a few
 | |
| options require a particular value (e.g. @code{SIZEOF_LONG_LONG} should
 | |
| be defined to the size of the @code{long long} type, in bytes, or zero
 | |
| if it is not supported).  We will likely post some sample
 | |
| @code{config.h} files for various operating systems and compilers for
 | |
| you to use (at least as a starting point).  Please let us know if you
 | |
| have to hand-create a configuration file (and/or a pre-compiled binary)
 | |
| that you want to share.
 | |
| 
 | |
| To create the FFTW library, you will then need to compile all of the
 | |
| @samp{.c} files in the @code{kernel}, @code{dft}, @code{dft/scalar},
 | |
| @code{dft/scalar/codelets}, @code{rdft}, @code{rdft/scalar},
 | |
| @code{rdft/scalar/r2cf}, @code{rdft/scalar/r2cb},
 | |
| @code{rdft/scalar/r2r}, @code{reodft}, and @code{api} directories.
 | |
| If you are compiling with SIMD support (e.g. you defined
 | |
| @code{HAVE_SSE2} in @code{config.h}), then you also need to compile
 | |
| the @samp{.c} files in the @code{simd-support},
 | |
| @code{@{dft,rdft@}/simd}, and @code{@{dft,rdft@}/simd/*} directories.
 | |
| 
 | |
| Once these files are all compiled, link them into a library, or a shared
 | |
| library, or directly into your program.
 | |
| 
 | |
| To compile the FFTW test program, additionally compile the code in the
 | |
| @code{libbench2/} directory, and link it into a library.  Then compile
 | |
| the code in the @code{tests/} directory and link it to the
 | |
| @code{libbench2} and FFTW libraries.  To compile the @code{fftw-wisdom}
 | |
| (command-line) tool (@pxref{Wisdom Utilities}), compile
 | |
| @code{tools/fftw-wisdom.c} and link it to the @code{libbench2} and FFTW
 | |
| libraries.
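If a Bourne-style shell happens to be available (for instance on a
cross-compilation host), the list of files to compile can be gathered
mechanically.  The sketch below only collects the @samp{.c} files from
the directories named above into a file list; the output file name
@code{sources.txt} is a hypothetical choice, and how you feed the list
to your compiler and linker is left to your toolchain.

```shell
#!/bin/sh
# Directories named in the text, relative to the top of the FFTW source tree.
dirs="kernel dft dft/scalar dft/scalar/codelets rdft rdft/scalar
      rdft/scalar/r2cf rdft/scalar/r2cb rdft/scalar/r2r reodft api"

# Start with an empty list (sources.txt is a hypothetical file name).
: > sources.txt

for d in $dirs; do
  for f in "$d"/*.c; do
    # Skip the unexpanded pattern when a directory is missing or empty.
    if [ -f "$f" ]; then
      echo "$f" >> sources.txt
    fi
  done
done

echo "collected $(wc -l < sources.txt) source files in sources.txt"
```

For a SIMD build, the SIMD directories mentioned above would be
appended to @code{dirs} in the same way.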
 | |
| 
 | |
| @c ------------------------------------------------------------
 | |
| @node Cycle Counters, Generating your own code, Installation on non-Unix systems, Installation and Customization
 | |
| @section Cycle Counters
 | |
| @cindex cycle counter
 | |
| 
 | |
| FFTW's planner actually executes and times different possible FFT
 | |
| algorithms in order to pick the fastest plan for a given @math{n}.  In
 | |
| order to do this in as short a time as possible, however, the timer must
 | |
| have a very high resolution, and to accomplish this we employ the
 | |
| hardware @dfn{cycle counters} that are available on most CPUs.
 | |
| Currently, FFTW supports the cycle counters on x86, PowerPC/POWER, Alpha,
 | |
| UltraSPARC (SPARC v9), IA64, PA-RISC, and MIPS processors.
 | |
| 
 | |
| @cindex compiler
 | |
| Access to the cycle counters, unfortunately, is a compiler and/or
 | |
| operating-system dependent task, often requiring inline assembly
 | |
| language, and it may be that your compiler is not supported.  If it is
 | |
| @emph{not} supported, FFTW will by default fall back on its estimator
 | |
| (effectively using @code{FFTW_ESTIMATE} for all plans).
 | |
| @ctindex FFTW_ESTIMATE
 | |
| 
 | |
| You can add support by editing the file @code{kernel/cycle.h}; normally,
 | |
| this will involve adapting one of the examples already present in order
 | |
| to use the inline-assembler syntax for your C compiler, and will only
 | |
| require a couple of lines of code.  Anyone adding support for a new
 | |
| system to @code{cycle.h} is encouraged to email us at @email{fftw@@fftw.org}.
 | |
| 
 | |
| If a cycle counter is not available on your system (e.g. some embedded
 | |
| processor), and you don't want to use estimated plans, as a last resort
 | |
| you can use the @code{--with-slow-timer} option to @code{configure} (on
 | |
| Unix) or @code{#define WITH_SLOW_TIMER} in @code{config.h} (elsewhere).
 | |
| This will use the much lower-resolution @code{gettimeofday} function, or even
 | |
| @code{clock} if the former is unavailable, and planning will be
 | |
| extremely slow.
 | |
| 
 | |
| @c ------------------------------------------------------------
 | |
| @node Generating your own code,  , Cycle Counters, Installation and Customization
 | |
| @section Generating your own code
 | |
| @cindex code generator
 | |
| 
 | |
| The directory @code{genfft} contains the programs that were used to
 | |
| generate FFTW's ``codelets,'' which are hard-coded transforms of small
 | |
| sizes.
 | |
| @cindex codelet
 | |
| We do not expect casual users to employ the generator, which is a rather
 | |
| sophisticated program that generates directed acyclic graphs of FFT
 | |
| algorithms and performs algebraic simplifications on them.  It was
 | |
| written in Objective Caml, a dialect of ML, which is available at
 | |
| @uref{http://caml.inria.fr/ocaml/index.en.html}.
 | |
| @cindex Caml
 | |
| 
 | |
| 
 | |
| If you have Objective Caml installed (along with recent versions of
 | |
| GNU @code{autoconf}, @code{automake}, and @code{libtool}), then you
 | |
| can change the set of codelets that are generated or play with the
 | |
| generation options.  The set of generated codelets is specified by the
 | |
| @code{@{dft,rdft@}/@{scalar,simd@}/*/Makefile.am} files.  For example, you can add
 | |
| efficient REDFT codelets of small sizes by modifying
 | |
| @code{rdft/scalar/r2r/Makefile.am}.
 | |
| @cindex REDFT
 | |
| After you modify any @code{Makefile.am} files, you can type @code{sh
 | |
| bootstrap.sh} in the top-level directory followed by @code{make} to
 | |
| re-generate the files.
 | |
| 
 | |
| We do not provide more details about the code-generation process, since
 | |
| we do not expect that most users will need to generate their own code.
 | |
| However, feel free to contact us at @email{fftw@@fftw.org} if
 | |
| you are interested in the subject.
 | |
| 
 | |
| @cindex monadic programming
 | |
| You might find it interesting to learn Caml and/or some modern
 | |
| programming techniques that we used in the generator (including monadic
 | |
| programming), especially if you heard the rumor that Java and
 | |
| object-oriented programming are the latest advancement in the field.
 | |
| The internal operation of the codelet generator is described in the
 | |
| paper, ``A Fast Fourier Transform Compiler,'' by M. Frigo, which is
 | |
| available from the @uref{http://www.fftw.org,FFTW home page} and also
 | |
| appeared in the @cite{Proceedings of the 1999 ACM SIGPLAN Conference on
 | |
| Programming Language Design and Implementation (PLDI)}.
 | |
| 
 | 
