| This is fftw3.info, produced by makeinfo version 6.7 from fftw3.texi.
 | ||
| 
 | ||
| This manual is for FFTW (version 3.3.10, 10 December 2020).
 | ||
| 
 | ||
|    Copyright (C) 2003 Matteo Frigo.
 | ||
| 
 | ||
|    Copyright (C) 2003 Massachusetts Institute of Technology.
 | ||
| 
 | ||
|      Permission is granted to make and distribute verbatim copies of
 | ||
|      this manual provided the copyright notice and this permission
 | ||
|      notice are preserved on all copies.
 | ||
| 
 | ||
|      Permission is granted to copy and distribute modified versions of
 | ||
|      this manual under the conditions for verbatim copying, provided
 | ||
|      that the entire resulting derived work is distributed under the
 | ||
|      terms of a permission notice identical to this one.
 | ||
| 
 | ||
|      Permission is granted to copy and distribute translations of this
 | ||
|      manual into another language, under the above conditions for
 | ||
|      modified versions, except that this permission notice may be stated
 | ||
|      in a translation approved by the Free Software Foundation.
 | ||
| INFO-DIR-SECTION Development
 | ||
| START-INFO-DIR-ENTRY
 | ||
| * fftw3: (fftw3).	FFTW User's Manual.
 | ||
| END-INFO-DIR-ENTRY
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Top,  Next: Introduction,  Prev: (dir),  Up: (dir)
 | ||
| 
 | ||
| FFTW User Manual
 | ||
| ****************
 | ||
| 
 | ||
| Welcome to FFTW, the Fastest Fourier Transform in the West.  FFTW is a
 | ||
| collection of fast C routines to compute the discrete Fourier transform.
 | ||
| This manual documents FFTW version 3.3.10.
 | ||
| 
 | ||
| * Menu:
 | ||
| 
 | ||
| * Introduction::
 | ||
| * Tutorial::
 | ||
| * Other Important Topics::
 | ||
| * FFTW Reference::
 | ||
| * Multi-threaded FFTW::
 | ||
| * Distributed-memory FFTW with MPI::
 | ||
| * Calling FFTW from Modern Fortran::
 | ||
| * Calling FFTW from Legacy Fortran::
 | ||
| * Upgrading from FFTW version 2::
 | ||
| * Installation and Customization::
 | ||
| * Acknowledgments::
 | ||
| * License and Copyright::
 | ||
| * Concept Index::
 | ||
| * Library Index::
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Introduction,  Next: Tutorial,  Prev: Top,  Up: Top
 | ||
| 
 | ||
| 1 Introduction
 | ||
| **************
 | ||
| 
 | ||
| This manual documents version 3.3.10 of FFTW, the _Fastest Fourier
 | ||
| Transform in the West_.  FFTW is a comprehensive collection of fast C
 | ||
| routines for computing the discrete Fourier transform (DFT) and various
 | ||
| special cases thereof.
 | ||
|    * FFTW computes the DFT of complex data, real data, even- or
 | ||
|      odd-symmetric real data (these symmetric transforms are usually
 | ||
|      known as the discrete cosine or sine transform, respectively), and
 | ||
|      the discrete Hartley transform (DHT) of real data.
 | ||
| 
 | ||
|    * The input data can have arbitrary length.  FFTW employs O(n log n)
 | ||
|      algorithms for all lengths, including prime numbers.
 | ||
| 
 | ||
|    * FFTW supports arbitrary multi-dimensional data.
 | ||
| 
 | ||
|    * FFTW supports the SSE, SSE2, AVX, AVX2, AVX512, KCVI, Altivec, VSX,
 | ||
|      and NEON vector instruction sets.
 | ||
| 
 | ||
|    * FFTW includes parallel (multi-threaded) transforms for
 | ||
|      shared-memory systems.
 | ||
|    * Starting with version 3.3, FFTW includes distributed-memory
 | ||
|      parallel transforms using MPI.
 | ||
| 
 | ||
|    We assume herein that you are familiar with the properties and uses
 | ||
| of the DFT that are relevant to your application.  Otherwise, see e.g.
 | ||
| 'The Fast Fourier Transform and Its Applications' by E. O. Brigham
 | ||
| (Prentice-Hall, Englewood Cliffs, NJ, 1988).  Our web page
 | ||
| (http://www.fftw.org) also has links to FFT-related information online.
 | ||
| 
 | ||
|    In order to use FFTW effectively, you need to learn one basic concept
 | ||
| of FFTW's internal structure: FFTW does not use a fixed algorithm for
 | ||
| computing the transform, but instead it adapts the DFT algorithm to
 | ||
| details of the underlying hardware in order to maximize performance.
 | ||
| Hence, the computation of the transform is split into two phases.
 | ||
| First, FFTW's "planner" "learns" the fastest way to compute the
 | ||
| transform on your machine.  The planner produces a data structure called
 | ||
| a "plan" that contains this information.  Subsequently, the plan is
 | ||
| "executed" to transform the array of input data as dictated by the plan.
 | ||
| The plan can be reused as many times as needed.  In typical
 | ||
| high-performance applications, many transforms of the same size are
 | ||
| computed and, consequently, a relatively expensive initialization of
 | ||
| this sort is acceptable.  On the other hand, if you need a single
 | ||
| transform of a given size, the one-time cost of the planner becomes
 | ||
| significant.  For this case, FFTW provides fast planners based on
 | ||
| heuristics or on previously computed plans.
 | ||
| 
 | ||
|    FFTW supports transforms of data with arbitrary length, rank,
 | ||
| multiplicity, and a general memory layout.  In simple cases, however,
 | ||
| this generality may be unnecessary and confusing.  Consequently, we
 | ||
| organized the interface to FFTW into three levels of increasing
 | ||
| generality.
 | ||
|    * The "basic interface" computes a single transform of contiguous
 | ||
|      data.
 | ||
|    * The "advanced interface" computes transforms of multiple or strided
 | ||
|      arrays.
 | ||
|    * The "guru interface" supports the most general data layouts,
 | ||
|      multiplicities, and strides.
 | ||
|    We expect that most users will be best served by the basic interface,
 | ||
| whereas the guru interface requires careful attention to the
 | ||
| documentation to avoid problems.
 | ||
| 
 | ||
|    Besides the automatic performance adaptation performed by the
 | ||
| planner, it is also possible for advanced users to customize FFTW
 | ||
| manually.  For example, if code space is a concern, we provide a tool
 | ||
| that links only the subset of FFTW needed by your application.
 | ||
| Conversely, you may need to extend FFTW because the standard
 | ||
| distribution is not sufficient for your needs.  For example, the
 | ||
| standard FFTW distribution works most efficiently for arrays whose size
 | ||
| can be factored into small primes (2, 3, 5, and 7), and otherwise it
 | ||
| uses a slower general-purpose routine.  If you need efficient transforms
 | ||
| of other sizes, you can use FFTW's code generator, which produces fast C
 | ||
| programs ("codelets") for any particular array size you may care about.
 | ||
| For example, if you need transforms of size 513 = 19 x 3^3, you can
 | ||
| customize FFTW to support the factor 19 efficiently.
 | ||
| 
 | ||
|    For more information regarding FFTW, see the paper, "The Design and
 | ||
| Implementation of FFTW3," by M. Frigo and S. G. Johnson, which was an
 | ||
| invited paper in 'Proc. IEEE' 93 (2), p.  216 (2005).  The code
 | ||
| generator is described in the paper "A fast Fourier transform compiler",
 | ||
| by M. Frigo, in the 'Proceedings of the 1999 ACM SIGPLAN Conference on
 | ||
| Programming Language Design and Implementation (PLDI), Atlanta, Georgia,
 | ||
| May 1999'.  These papers, along with the latest version of FFTW, the
 | ||
| FAQ, benchmarks, and other links, are available at the FFTW home page
 | ||
| (http://www.fftw.org).
 | ||
| 
 | ||
|    The current version of FFTW incorporates many good ideas from the
 | ||
| past thirty years of FFT literature.  In one way or another, FFTW uses
 | ||
| the Cooley-Tukey algorithm, the prime factor algorithm, Rader's
 | ||
| algorithm for prime sizes, and a split-radix algorithm (with a
 | ||
| "conjugate-pair" variation pointed out to us by Dan Bernstein).  FFTW's
 | ||
| code generator also produces new algorithms that we do not completely
 | ||
| understand.  The reader is referred to the cited papers for the
 | ||
| appropriate references.
 | ||
| 
 | ||
|    The rest of this manual is organized as follows.  We first discuss
 | ||
| the sequential (single-processor) implementation.  We start by
 | ||
| describing the basic interface/features of FFTW in *note Tutorial::.
 | ||
| Next, *note Other Important Topics:: discusses data alignment (*note
 | ||
| SIMD alignment and fftw_malloc::), the storage scheme of
 | ||
| multi-dimensional arrays (*note Multi-dimensional Array Format::), and
 | ||
| FFTW's mechanism for storing plans on disk (*note Words of Wisdom-Saving
 | ||
| Plans::).  Next, *note FFTW Reference:: provides comprehensive
 | ||
| documentation of all FFTW's features.  Parallel transforms are discussed
 | ||
| in their own chapters: *note Multi-threaded FFTW:: and *note
 | ||
| Distributed-memory FFTW with MPI::.  Fortran programmers can also use
 | ||
| FFTW, as described in *note Calling FFTW from Legacy Fortran:: and *note
 | ||
| Calling FFTW from Modern Fortran::.  *note Installation and
 | ||
| Customization:: explains how to install FFTW in your computer system and
 | ||
| how to adapt FFTW to your needs.  License and copyright information is
 | ||
| given in *note License and Copyright::.  Finally, we thank all the
 | ||
| people who helped us in *note Acknowledgments::.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Tutorial,  Next: Other Important Topics,  Prev: Introduction,  Up: Top
 | ||
| 
 | ||
| 2 Tutorial
 | ||
| **********
 | ||
| 
 | ||
| * Menu:
 | ||
| 
 | ||
| * Complex One-Dimensional DFTs::
 | ||
| * Complex Multi-Dimensional DFTs::
 | ||
| * One-Dimensional DFTs of Real Data::
 | ||
| * Multi-Dimensional DFTs of Real Data::
 | ||
| * More DFTs of Real Data::
 | ||
| 
 | ||
| This chapter describes the basic usage of FFTW, i.e., how to compute the
 | ||
| Fourier transform of a single array.  This chapter tells the truth, but
 | ||
| not the _whole_ truth.  Specifically, FFTW implements additional
 | ||
| routines and flags that are not documented here, although in many cases
 | ||
| we try to indicate where added capabilities exist.  For more complete
 | ||
| information, see *note FFTW Reference::.  (Note that you need to compile
 | ||
| and install FFTW before you can use it in a program.  For the details of
 | ||
| the installation, see *note Installation and Customization::.)
 | ||
| 
 | ||
|    We recommend that you read this tutorial in order.(1)  At the least,
 | ||
| read the first section (*note Complex One-Dimensional DFTs::) before
 | ||
| reading any of the others, even if your main interest lies in one of the
 | ||
| other transform types.
 | ||
| 
 | ||
|    Users of FFTW version 2 and earlier may also want to read *note
 | ||
| Upgrading from FFTW version 2::.
 | ||
| 
 | ||
|    ---------- Footnotes ----------
 | ||
| 
 | ||
|    (1) You can read the tutorial in bit-reversed order after computing
 | ||
| your first transform.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Complex One-Dimensional DFTs,  Next: Complex Multi-Dimensional DFTs,  Prev: Tutorial,  Up: Tutorial
 | ||
| 
 | ||
| 2.1 Complex One-Dimensional DFTs
 | ||
| ================================
 | ||
| 
 | ||
|      Plan: To bother about the best method of accomplishing an
 | ||
|      accidental result.  [Ambrose Bierce, 'The Enlarged Devil's
 | ||
|      Dictionary'.]
 | ||
| 
 | ||
|    The basic usage of FFTW to compute a one-dimensional DFT of size 'N'
 | ||
| is simple, and it typically looks something like this code:
 | ||
| 
 | ||
|      #include <fftw3.h>
 | ||
|      ...
 | ||
|      {
 | ||
|          fftw_complex *in, *out;
 | ||
|          fftw_plan p;
 | ||
|          ...
 | ||
|          in = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
 | ||
|          out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
 | ||
|          p = fftw_plan_dft_1d(N, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
 | ||
|          ...
 | ||
|          fftw_execute(p); /* repeat as needed */
 | ||
|          ...
 | ||
|          fftw_destroy_plan(p);
 | ||
|          fftw_free(in); fftw_free(out);
 | ||
|      }
 | ||
| 
 | ||
|    You must link this code with the 'fftw3' library.  On Unix systems,
 | ||
| link with '-lfftw3 -lm'.
 | ||
| 
 | ||
|    The example code first allocates the input and output arrays.  You
 | ||
| can allocate them in any way that you like, but we recommend using
 | ||
| 'fftw_malloc', which behaves like 'malloc' except that it properly
 | ||
| aligns the array when SIMD instructions (such as SSE and Altivec) are
 | ||
| available (*note SIMD alignment and fftw_malloc::).  [Alternatively, we
 | ||
| provide a convenient wrapper function 'fftw_alloc_complex(N)' which has
 | ||
| the same effect.]
 | ||
| 
 | ||
|    The data is an array of type 'fftw_complex', which is by default a
 | ||
| 'double[2]' composed of the real ('in[i][0]') and imaginary ('in[i][1]')
 | ||
| parts of a complex number.
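
As a small illustration of this layout, the sketch below multiplies two complex numbers stored as pairs of doubles.  The type 'cpair' and the function 'cpair_mul' are hypothetical names introduced here so that the fragment compiles without FFTW; they are not part of the library.  With FFTW installed you would use 'fftw_complex' itself.

```c
#include <assert.h>

/* Hypothetical stand-in with the same layout as the default fftw_complex:
   element [0] is the real part, element [1] is the imaginary part. */
typedef double cpair[2];

/* out = a * b, computed on the separate real and imaginary parts.
   out must not alias a or b. */
void cpair_mul(const cpair a, const cpair b, cpair out)
{
    assert(out != a && out != b);
    out[0] = a[0] * b[0] - a[1] * b[1];  /* real part */
    out[1] = a[0] * b[1] + a[1] * b[0];  /* imaginary part */
}
```

For instance, multiplying 1+2i by 3-i this way leaves 5 in 'out[0]' and 5 in 'out[1]'.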
 | ||
| 
 | ||
|    The next step is to create a "plan", which is an object that contains
 | ||
| all the data that FFTW needs to compute the FFT. This function creates
 | ||
| the plan:
 | ||
| 
 | ||
|      fftw_plan fftw_plan_dft_1d(int n, fftw_complex *in, fftw_complex *out,
 | ||
|                                 int sign, unsigned flags);
 | ||
| 
 | ||
|    The first argument, 'n', is the size of the transform you are trying
 | ||
| to compute.  The size 'n' can be any positive integer, but sizes that
 | ||
| are products of small factors are transformed most efficiently (although
 | ||
| prime sizes still use an O(n log n) algorithm).
 | ||
| 
 | ||
|    The next two arguments are pointers to the input and output arrays of
 | ||
| the transform.  These pointers can be equal, indicating an "in-place"
 | ||
| transform.
 | ||
| 
 | ||
|    The fourth argument, 'sign', can be either 'FFTW_FORWARD' ('-1') or
 | ||
| 'FFTW_BACKWARD' ('+1'), and indicates the direction of the transform you
 | ||
| are interested in; technically, it is the sign of the exponent in the
 | ||
| transform.
 | ||
| 
 | ||
|    The 'flags' argument is usually either 'FFTW_MEASURE' or
 | ||
| 'FFTW_ESTIMATE'.  'FFTW_MEASURE' instructs FFTW to run and measure the
 | ||
| execution time of several FFTs in order to find the best way to compute
 | ||
| the transform of size 'n'.  This process takes some time (usually a few
 | ||
| seconds), depending on your machine and on the size of the transform.
 | ||
| 'FFTW_ESTIMATE', on the contrary, does not run any computation and just
 | ||
| builds a reasonable plan that is probably sub-optimal.  In short, if
 | ||
| your program performs many transforms of the same size and
 | ||
| initialization time is not important, use 'FFTW_MEASURE'; otherwise use
 | ||
| the estimate.
 | ||
| 
 | ||
|    _You must create the plan before initializing the input_, because
 | ||
| 'FFTW_MEASURE' overwrites the 'in'/'out' arrays.  (Technically,
 | ||
| 'FFTW_ESTIMATE' does not touch your arrays, but you should always create
 | ||
| plans first just to be sure.)
 | ||
| 
 | ||
|    Once the plan has been created, you can use it as many times as you
 | ||
| like for transforms on the specified 'in'/'out' arrays, computing the
 | ||
| actual transforms via 'fftw_execute(plan)':
 | ||
|      void fftw_execute(const fftw_plan plan);
 | ||
| 
 | ||
|    The DFT results are stored in-order in the array 'out', with the
 | ||
| zero-frequency (DC) component in 'out[0]'.  If 'in != out', the
 | ||
| transform is "out-of-place" and the input array 'in' is not modified.
 | ||
| Otherwise, the input array is overwritten with the transform.
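
"In-order" means that index 'k' of 'out' holds frequency 'k'.  Because the DFT is periodic in 'k', indices above 'n/2' can equally be read as negative frequencies 'k - n'.  The helper below is a minimal sketch of that mapping; the name 'signed_frequency' is hypothetical, and for even 'n' it reports the Nyquist index 'n/2' as positive, which is a choice made here rather than something FFTW prescribes.

```c
#include <assert.h>

/* Map output index k (0 <= k < n) of an in-order DFT of size n to the
   signed frequency it represents.  Indices up to n/2 are non-negative
   frequencies; the remaining indices wrap around to negative ones. */
int signed_frequency(int k, int n)
{
    assert(n > 0 && k >= 0 && k < n);
    return (k <= n / 2) ? k : k - n;
}
```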
 | ||
| 
 | ||
|    If you want to transform a _different_ array of the same size, you
 | ||
| can create a new plan with 'fftw_plan_dft_1d' and FFTW automatically
 | ||
| reuses the information from the previous plan, if possible.
 | ||
| Alternatively, with the "guru" interface you can apply a given plan to a
 | ||
| different array, if you are careful.  *Note FFTW Reference::.
 | ||
| 
 | ||
|    When you are done with the plan, you deallocate it by calling
 | ||
| 'fftw_destroy_plan(plan)':
 | ||
|      void fftw_destroy_plan(fftw_plan plan);
 | ||
|    If you allocate an array with 'fftw_malloc()' you must deallocate it
 | ||
| with 'fftw_free()'.  Do not use 'free()' or, heaven forbid, 'delete'.
 | ||
| 
 | ||
|    FFTW computes an _unnormalized_ DFT. Thus, computing a forward
 | ||
| followed by a backward transform (or vice versa) results in the original
 | ||
| array scaled by 'n'.  For the definition of the DFT, see *note What FFTW
 | ||
| Really Computes::.
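
To make the sign convention and the missing normalization concrete, here is a naive size-4 DFT written in plain C.  It is not FFTW code and does not use FFTW's algorithms: the size is fixed at 4 because the root of unity is then exactly 'sign * i', so no math library is needed.  The names 'cpair' and 'dft4' are hypothetical.

```c
#include <assert.h>

typedef double cpair[2];  /* hypothetical stand-in for fftw_complex */

/* Naive size-4 DFT: out[k] = sum_j in[j] * w^(j*k), where
   w = exp(sign * 2*pi*i / 4) = sign * i.
   sign is -1 (forward) or +1 (backward); no 1/n factor is applied.
   in and out must be distinct arrays. */
void dft4(cpair *in, cpair *out, int sign)
{
    /* real and imaginary parts of i^m for m = 0..3 */
    static const double c[4] = {1.0, 0.0, -1.0, 0.0};
    static const double s[4] = {0.0, 1.0, 0.0, -1.0};

    assert(sign == -1 || sign == 1);
    assert(in != out);
    for (int k = 0; k < 4; ++k) {
        double re = 0.0, im = 0.0;
        for (int j = 0; j < 4; ++j) {
            int m = (j * k) % 4;
            double wr = c[m];
            double wi = sign * s[m];      /* (sign*i)^m */
            re += in[j][0] * wr - in[j][1] * wi;
            im += in[j][0] * wi + in[j][1] * wr;
        }
        out[k][0] = re;
        out[k][1] = im;
    }
}
```

Applying 'dft4' with sign -1 and then with sign +1 to the input 1, 2, 3, 4 returns 4, 8, 12, 16, that is, the original data scaled by n = 4.  A caller who wants the original values back divides by 'n' afterwards.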
 | ||
| 
 | ||
|    If you have a C compiler, such as 'gcc', that supports the C99
 | ||
| standard, and you '#include <complex.h>' _before_ '<fftw3.h>', then
 | ||
| 'fftw_complex' is the native double-precision complex type and you can
 | ||
| manipulate it with ordinary arithmetic.  Otherwise, FFTW defines its own
 | ||
| complex type, which is bit-compatible with the C99 complex type.  *Note
 | ||
| Complex numbers::.  (The C++ '<complex>' template class may also be
 | ||
| usable via a typecast.)
 | ||
| 
 | ||
|    To use single or long-double precision versions of FFTW, replace the
 | ||
| 'fftw_' prefix by 'fftwf_' or 'fftwl_' and link with '-lfftw3f' or
 | ||
| '-lfftw3l', but use the _same_ '<fftw3.h>' header file.
 | ||
| 
 | ||
|    Many more flags exist besides 'FFTW_MEASURE' and 'FFTW_ESTIMATE'.
 | ||
| For example, use 'FFTW_PATIENT' if you're willing to wait even longer
 | ||
| for a possibly even faster plan (*note FFTW Reference::).  You can also
 | ||
| save plans for future use, as described by *note Words of Wisdom-Saving
 | ||
| Plans::.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Complex Multi-Dimensional DFTs,  Next: One-Dimensional DFTs of Real Data,  Prev: Complex One-Dimensional DFTs,  Up: Tutorial
 | ||
| 
 | ||
| 2.2 Complex Multi-Dimensional DFTs
 | ||
| ==================================
 | ||
| 
 | ||
| Multi-dimensional transforms work much the same way as one-dimensional
 | ||
| transforms: you allocate arrays of 'fftw_complex' (preferably using
 | ||
| 'fftw_malloc'), create an 'fftw_plan', execute it as many times as you
 | ||
| want with 'fftw_execute(plan)', and clean up with
 | ||
| 'fftw_destroy_plan(plan)' (and 'fftw_free').
 | ||
| 
 | ||
|    FFTW provides two routines for creating plans for 2d and 3d
 | ||
| transforms, and one routine for creating plans of arbitrary
 | ||
| dimensionality.  The 2d and 3d routines have the following signature:
 | ||
|      fftw_plan fftw_plan_dft_2d(int n0, int n1,
 | ||
|                                 fftw_complex *in, fftw_complex *out,
 | ||
|                                 int sign, unsigned flags);
 | ||
|      fftw_plan fftw_plan_dft_3d(int n0, int n1, int n2,
 | ||
|                                 fftw_complex *in, fftw_complex *out,
 | ||
|                                 int sign, unsigned flags);
 | ||
| 
 | ||
|    These routines create plans for 'n0' by 'n1' two-dimensional (2d)
 | ||
| transforms and 'n0' by 'n1' by 'n2' 3d transforms, respectively.  All of
 | ||
| these transforms operate on contiguous arrays in the C-standard
 | ||
| "row-major" order, so that the last dimension has the fastest-varying
 | ||
| index in the array.  This layout is described further in *note
 | ||
| Multi-dimensional Array Format::.
 | ||
| 
 | ||
|    FFTW can also compute transforms of higher dimensionality.  In order
 | ||
| to avoid confusion between the various meanings of the word
 | ||
| "dimension", we use the term _rank_ to denote the number of independent
 | ||
| indices in an array.(1)  For example, we say that a 2d transform has
 | ||
| rank 2, a 3d transform has rank 3, and so on.  You can plan transforms
 | ||
| of arbitrary rank by means of the following function:
 | ||
| 
 | ||
|      fftw_plan fftw_plan_dft(int rank, const int *n,
 | ||
|                              fftw_complex *in, fftw_complex *out,
 | ||
|                              int sign, unsigned flags);
 | ||
| 
 | ||
|    Here, 'n' is a pointer to an array 'n[rank]' denoting an 'n[0]' by
 | ||
| 'n[1]' by ... by 'n[rank-1]' transform.  Thus, for example, the call
 | ||
|      fftw_plan_dft_2d(n0, n1, in, out, sign, flags);
 | ||
|    is equivalent to the following code fragment:
 | ||
|      int n[2];
 | ||
|      n[0] = n0;
 | ||
|      n[1] = n1;
 | ||
|      fftw_plan_dft(2, n, in, out, sign, flags);
 | ||
|    'fftw_plan_dft' is not restricted to 2d and 3d transforms, however;
 | ||
| it can plan transforms of arbitrary rank.
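
For an array of arbitrary rank, the row-major layout determines where each element lives in memory.  The following sketch computes that position from the same 'n[rank]' array that is passed to the planner; the function name 'row_major_offset' and the index array 'idx' are hypothetical, introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element (idx[0], idx[1], ..., idx[rank-1]) in a contiguous
   row-major array with dimensions n[0] x n[1] x ... x n[rank-1].
   The last index varies fastest. */
size_t row_major_offset(int rank, const int *n, const int *idx)
{
    size_t off = 0;
    assert(rank >= 1);
    for (int d = 0; d < rank; ++d) {
        assert(idx[d] >= 0 && idx[d] < n[d]);
        off = off * (size_t)n[d] + (size_t)idx[d];
    }
    return off;
}
```

In a 5 by 12 by 27 array, for example, element (0, 1, 0) sits at offset 27, one full run of the last dimension past the start.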
 | ||
| 
 | ||
|    You may have noticed that all the planner routines described so far
 | ||
| have overlapping functionality.  For example, you can plan a 1d or 2d
 | ||
| transform by using 'fftw_plan_dft' with a 'rank' of '1' or '2', or even
 | ||
| by calling 'fftw_plan_dft_3d' with 'n0' and/or 'n1' equal to '1' (with
 | ||
| no loss in efficiency).  This pattern continues, and FFTW's planning
 | ||
| routines in general form a "partial order," sequences of interfaces with
 | ||
| strictly increasing generality but correspondingly greater complexity.
 | ||
| 
 | ||
|    'fftw_plan_dft' is the most general complex-DFT routine that we
 | ||
| describe in this tutorial, but there are also the advanced and guru
 | ||
| interfaces, which allow one to efficiently combine multiple/strided
 | ||
| transforms into a single FFTW plan, transform a subset of a larger
 | ||
| multi-dimensional array, and/or to handle more general complex-number
 | ||
| formats.  For more information, see *note FFTW Reference::.
 | ||
| 
 | ||
|    ---------- Footnotes ----------
 | ||
| 
 | ||
|    (1) The term "rank" is commonly used in the APL, FORTRAN, and Common
 | ||
| Lisp traditions, although it is not so common in the C world.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: One-Dimensional DFTs of Real Data,  Next: Multi-Dimensional DFTs of Real Data,  Prev: Complex Multi-Dimensional DFTs,  Up: Tutorial
 | ||
| 
 | ||
| 2.3 One-Dimensional DFTs of Real Data
 | ||
| =====================================
 | ||
| 
 | ||
| In many practical applications, the input data 'in[i]' are purely real
 | ||
| numbers, in which case the DFT output satisfies the "Hermitian"
 | ||
| redundancy: 'out[i]' is the conjugate of 'out[n-i]'.  It is possible to
 | ||
| take advantage of these circumstances in order to achieve roughly a
 | ||
| factor of two improvement in both speed and memory usage.
 | ||
| 
 | ||
|    In exchange for these speed and space advantages, the user sacrifices
 | ||
| some of the simplicity of FFTW's complex transforms.  First of all, the
 | ||
| input and output arrays are of _different sizes and types_: the input is
 | ||
| 'n' real numbers, while the output is 'n/2+1' complex numbers (the
 | ||
| non-redundant outputs); this also requires slight "padding" of the input
 | ||
| array for in-place transforms.  Second, the inverse transform (complex
 | ||
| to real) has the side-effect of _overwriting its input array_, by
 | ||
| default.  Neither of these inconveniences should pose a serious problem
 | ||
| for users, but it is important to be aware of them.
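
If some later step needs the full 'n'-point spectrum, the redundant half can be rebuilt from the 'n/2+1' stored outputs using the Hermitian relation stated above.  The sketch below assumes the pair-of-doubles complex layout; 'cpair' and 'hermitian_expand' are hypothetical names, not FFTW routines.

```c
#include <assert.h>

typedef double cpair[2];  /* hypothetical stand-in for fftw_complex */

/* Expand the n/2+1 non-redundant outputs of a real-input DFT of logical
   size n into the full n-point spectrum, using
   full[i] = conj(full[n-i]) for the missing entries. */
void hermitian_expand(int n, cpair *half, cpair *full)
{
    assert(n > 0);
    for (int i = 0; i <= n / 2; ++i) {
        full[i][0] = half[i][0];
        full[i][1] = half[i][1];
    }
    for (int i = n / 2 + 1; i < n; ++i) {
        full[i][0] =  half[n - i][0];   /* same real part */
        full[i][1] = -half[n - i][1];   /* negated imaginary part */
    }
}
```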
 | ||
| 
 | ||
|    The routines to perform real-data transforms are almost the same as
 | ||
| those for complex transforms: you allocate arrays of 'double' and/or
 | ||
| 'fftw_complex' (preferably using 'fftw_malloc' or 'fftw_alloc_complex'),
 | ||
| create an 'fftw_plan', execute it as many times as you want with
 | ||
| 'fftw_execute(plan)', and clean up with 'fftw_destroy_plan(plan)' (and
 | ||
| 'fftw_free').  The only differences are that the input (or output) is of
 | ||
| type 'double' and there are new routines to create the plan.  In one
 | ||
| dimension:
 | ||
| 
 | ||
|      fftw_plan fftw_plan_dft_r2c_1d(int n, double *in, fftw_complex *out,
 | ||
|                                     unsigned flags);
 | ||
|      fftw_plan fftw_plan_dft_c2r_1d(int n, fftw_complex *in, double *out,
 | ||
|                                     unsigned flags);
 | ||
| 
 | ||
|    for the real input to complex-Hermitian output ("r2c") and
 | ||
| complex-Hermitian input to real output ("c2r") transforms.  Unlike the
 | ||
| complex DFT planner, there is no 'sign' argument.  Instead, r2c DFTs are
 | ||
| always 'FFTW_FORWARD' and c2r DFTs are always 'FFTW_BACKWARD'.  (For
 | ||
| single/long-double precision 'fftwf' and 'fftwl', 'double' should be
 | ||
| replaced by 'float' and 'long double', respectively.)
 | ||
| 
 | ||
|    Here, 'n' is the "logical" size of the DFT, not necessarily the
 | ||
| physical size of the array.  In particular, the real ('double') array
 | ||
| has 'n' elements, while the complex ('fftw_complex') array has 'n/2+1'
 | ||
| elements (where the division is rounded down).  For an in-place
 | ||
| transform, 'in' and 'out' are aliased to the same array, which must be
 | ||
| big enough to hold both; so, the real array would actually have
 | ||
| '2*(n/2+1)' elements, where the elements beyond the first 'n' are unused
 | ||
| padding.  (Note that this is very different from the concept of
 | ||
| "zero-padding" a transform to a larger length, which changes the logical
 | ||
| size of the DFT by actually adding new input data.)  The kth element of
 | ||
| the complex array is exactly the same as the kth element of the
 | ||
| corresponding complex DFT. All positive 'n' are supported; products of
 | ||
| small factors are most efficient, but an O(n log n) algorithm is used
 | ||
| even for prime sizes.
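
The array sizes described in this paragraph reduce to two small formulas.  The helpers below are a sketch with hypothetical names; they only do the arithmetic and allocate nothing.

```c
#include <assert.h>
#include <stddef.h>

/* Number of complex elements in the output of an r2c transform of
   logical size n.  Integer division rounds down. */
size_t r2c_complex_len(int n)
{
    assert(n > 0);
    return (size_t)n / 2 + 1;
}

/* Number of doubles the shared array must hold for an in-place
   transform: enough for n/2+1 complex values, i.e. 2*(n/2+1).
   Only the first n doubles carry input data; the rest is padding. */
size_t r2c_inplace_real_len(int n)
{
    return 2 * r2c_complex_len(n);
}
```

For n = 8 this gives 5 complex outputs and a 10-element real array (two padding elements); for n = 7 it gives 4 complex outputs and an 8-element real array (one padding element).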
 | ||
| 
 | ||
|    As noted above, the c2r transform destroys its input array even for
 | ||
| out-of-place transforms.  This can be prevented, if necessary, by
 | ||
| including 'FFTW_PRESERVE_INPUT' in the 'flags', with unfortunately some
 | ||
| sacrifice in performance.  This flag is also not currently supported for
 | ||
| multi-dimensional real DFTs (next section).
 | ||
| 
 | ||
|    Readers familiar with DFTs of real data will recall that the 0th (the
 | ||
| "DC") and 'n/2'-th (the "Nyquist" frequency, when 'n' is even) elements
 | ||
| of the complex output are purely real.  Some implementations therefore
 | ||
| store the Nyquist element where the DC imaginary part would go, in order
 | ||
| to make the input and output arrays the same size.  Such packing,
 | ||
| however, does not generalize well to multi-dimensional transforms, and
 | ||
| the space savings are minuscule in any case; FFTW does not support it.
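
The reason these two elements are real can be seen directly from the forward DFT sum.  For output index 0 every exponential factor is 1, and for index 'n/2' (with 'n' even) the factor for input 'j' is (-1)^j, so both outputs are plain sums of real numbers.  The sketch below computes them that way; the function names are hypothetical and the code does not call FFTW.

```c
#include <assert.h>

/* DC element of the DFT of real data: the sum of the inputs. */
double dc_term(const double *in, int n)
{
    double sum = 0.0;
    assert(n > 0);
    for (int j = 0; j < n; ++j)
        sum += in[j];
    return sum;
}

/* Nyquist element (index n/2, n even): the alternating sum of the
   inputs, since exp(-2*pi*i * j * (n/2) / n) = (-1)^j. */
double nyquist_term(const double *in, int n)
{
    double sum = 0.0;
    assert(n > 0 && n % 2 == 0);
    for (int j = 0; j < n; ++j)
        sum += (j % 2 == 0) ? in[j] : -in[j];
    return sum;
}
```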
 | ||
| 
 | ||
|    An alternative interface for one-dimensional r2c and c2r DFTs can be
 | ||
| found in the 'r2r' interface (*note The Halfcomplex-format DFT::), with
 | ||
| "halfcomplex"-format output that _is_ the same size (and type) as the
 | ||
| input array.  That interface, although it is not very useful for
 | ||
| multi-dimensional transforms, may sometimes yield better performance.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Multi-Dimensional DFTs of Real Data,  Next: More DFTs of Real Data,  Prev: One-Dimensional DFTs of Real Data,  Up: Tutorial
 | ||
| 
 | ||
| 2.4 Multi-Dimensional DFTs of Real Data
 | ||
| =======================================
 | ||
| 
 | ||
| Multi-dimensional DFTs of real data use the following planner routines:
 | ||
| 
 | ||
|      fftw_plan fftw_plan_dft_r2c_2d(int n0, int n1,
 | ||
|                                     double *in, fftw_complex *out,
 | ||
|                                     unsigned flags);
 | ||
|      fftw_plan fftw_plan_dft_r2c_3d(int n0, int n1, int n2,
 | ||
|                                     double *in, fftw_complex *out,
 | ||
|                                     unsigned flags);
 | ||
|      fftw_plan fftw_plan_dft_r2c(int rank, const int *n,
 | ||
|                                  double *in, fftw_complex *out,
 | ||
|                                  unsigned flags);
 | ||
| 
 | ||
|    as well as the corresponding 'c2r' routines with the input/output
 | ||
| types swapped.  These routines work similarly to their complex
 | ||
| analogues, except for the fact that here the complex output array is cut
 | ||
| roughly in half and the real array requires padding for in-place
 | ||
| transforms (as in 1d, above).
 | ||
| 
 | ||
|    As before, 'n' is the logical size of the array, and the consequences
 | ||
| of this on the format of the complex arrays deserve careful
 | ||
| attention.  Suppose that the real data has dimensions n[0] x n[1] x n[2]
 | ||
| x ...  x n[d-1] (in row-major order).  Then, after an r2c transform, the
 | ||
| output is an n[0] x n[1] x n[2] x ...  x (n[d-1]/2 + 1) array of
 | ||
| 'fftw_complex' values in row-major order, corresponding to slightly over
 | ||
| half of the output of the corresponding complex DFT. (The division is
 | ||
| rounded down.)  The ordering of the data is otherwise exactly the same
 | ||
| as in the complex-DFT case.
 | ||
| 
 | ||
|    For out-of-place transforms, this is the end of the story: the real
 | ||
| data is stored as a row-major array of size n[0] x n[1] x n[2] x ...  x
 | ||
| n[d-1] and the complex data is stored as a row-major array of size n[0]
 | ||
| x n[1] x n[2] x ...  x (n[d-1]/2 + 1).
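
When allocating the complex array for an out-of-place multi-dimensional r2c transform, the element count follows from the dimensions above.  A minimal sketch, with the hypothetical name 'r2c_complex_count':

```c
#include <assert.h>
#include <stddef.h>

/* Number of complex elements in the output of a multi-dimensional r2c
   transform with logical real dimensions n[0] x ... x n[rank-1].
   Only the last dimension is cut to n[rank-1]/2 + 1 (rounded down). */
size_t r2c_complex_count(int rank, const int *n)
{
    size_t total = 1;
    assert(rank >= 1);
    for (int d = 0; d < rank - 1; ++d) {
        assert(n[d] > 0);
        total *= (size_t)n[d];
    }
    assert(n[rank - 1] > 0);
    return total * ((size_t)n[rank - 1] / 2 + 1);
}
```

A 4 by 6 real array therefore needs 4 * 4 = 16 complex outputs, and a 3 by 5 by 7 array needs 3 * 5 * 4 = 60.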
 | ||
| 
 | ||
|    For in-place transforms, however, extra padding of the real-data
 | ||
| array is necessary because the complex array is larger than the real
 | ||
| array, and the two arrays share the same memory locations.  Thus, for
 | ||
| in-place transforms, the final dimension of the real-data array must be
 | ||
| padded with extra values to accommodate the size of the complex
 | ||
| data--two values if the last dimension is even and one if it is odd.
 | ||
| That is, the last dimension of the real data must physically contain 2 *
 | ||
| (n[d-1]/2+1) 'double' values (exactly enough to hold the complex data).
 | ||
| This physical array size does not, however, change the _logical_ array
 | ||
| size--only n[d-1] values are actually stored in the last dimension, and
 | ||
| n[d-1] is the last dimension passed to the plan-creation routine.
 | ||
| 
 | ||
|    For example, consider the transform of a two-dimensional real array
 | ||
| of size 'n0' by 'n1'.  The output of the r2c transform is a
 | ||
| two-dimensional complex array of size 'n0' by 'n1/2+1', where the 'y'
 | ||
| dimension has been cut nearly in half because of redundancies in the
 | ||
| output.  Because 'fftw_complex' is twice the size of 'double', the
 | ||
| output array is slightly bigger than the input array.  Thus, if we want
 | ||
| to compute the transform in place, we must _pad_ the input array so that
 | ||
| it is of size 'n0' by '2*(n1/2+1)'.  If 'n1' is even, then there are two
 | ||
| padding elements at the end of each row (which need not be initialized,
 | ||
| as they are only used for output).
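
In practice the padding means that row 'i' of the real data starts at offset 'i * 2*(n1/2+1)' rather than 'i * n1'.  The sketch below copies an ordinary contiguous 'n0' by 'n1' array into such a padded buffer; 'padded_row_len' and 'pack_padded' are hypothetical helper names, and the padding elements are left untouched.

```c
#include <assert.h>
#include <stddef.h>

/* Physical length, in doubles, of one row of the padded real array
   used for an in-place 2d r2c transform with last dimension n1. */
size_t padded_row_len(int n1)
{
    assert(n1 > 0);
    return 2 * ((size_t)n1 / 2 + 1);
}

/* Copy a contiguous n0 x n1 real array (src) into the padded layout
   (dst), which must hold n0 * padded_row_len(n1) doubles. */
void pack_padded(int n0, int n1, const double *src, double *dst)
{
    size_t stride = padded_row_len(n1);
    for (int i = 0; i < n0; ++i)
        for (int j = 0; j < n1; ++j)
            dst[(size_t)i * stride + (size_t)j] =
                src[(size_t)i * (size_t)n1 + (size_t)j];
}
```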
 | ||
| 
 | ||
|    These transforms are unnormalized, so an r2c followed by a c2r
 | ||
| transform (or vice versa) will result in the original data scaled by the
 | ||
| number of real data elements--that is, the product of the (logical)
 | ||
| dimensions of the real data.
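
If you want the round trip to reproduce the original values, divide by that product afterwards.  The sketch below does so for a contiguous (unpadded, out-of-place) real array; the name 'rescale_round_trip' is hypothetical, and a padded in-place array would need the row stride of its physical layout instead.

```c
#include <assert.h>
#include <stddef.h>

/* Undo the scaling left by an r2c transform followed by a c2r transform:
   divide every element by the product of the logical dimensions.
   data is a contiguous array of n[0] * ... * n[rank-1] doubles. */
void rescale_round_trip(int rank, const int *n, double *data)
{
    size_t total = 1;
    assert(rank >= 1);
    for (int d = 0; d < rank; ++d) {
        assert(n[d] > 0);
        total *= (size_t)n[d];
    }
    for (size_t i = 0; i < total; ++i)
        data[i] /= (double)total;
}
```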
 | ||
| 
 | ||
|    (Because the last dimension is treated specially, if it is equal to
 | ||
| '1' the transform is _not_ equivalent to a lower-dimensional r2c/c2r
 | ||
| transform.  In that case, the last complex dimension also has size '1'
 | ||
| ('=1/2+1'), and no advantage is gained over the complex transforms.)
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: More DFTs of Real Data,  Prev: Multi-Dimensional DFTs of Real Data,  Up: Tutorial
 | ||
| 
 | ||
| 2.5 More DFTs of Real Data
 | ||
| ==========================
 | ||
| 
 | ||
| * Menu:
 | ||
| 
 | ||
| * The Halfcomplex-format DFT::
 | ||
| * Real even/odd DFTs (cosine/sine transforms)::
 | ||
| * The Discrete Hartley Transform::
 | ||
| 
 | ||
| FFTW supports several other transform types via a unified "r2r"
 | ||
| (real-to-real) interface, so called because it takes a real ('double')
 | ||
| array and outputs a real array of the same size.  These r2r transforms
 | ||
| currently fall into three categories: DFTs of real input and
 | ||
| complex-Hermitian output in halfcomplex format, DFTs of real input with
 | ||
| even/odd symmetry (a.k.a.  discrete cosine/sine transforms, DCTs/DSTs),
 | ||
| and discrete Hartley transforms (DHTs), all described in more detail by
 | ||
| the following sections.
 | ||
| 
 | ||
|    The r2r transforms follow the by now familiar interface of creating
 | ||
| an 'fftw_plan', executing it with 'fftw_execute(plan)', and destroying
 | ||
| it with 'fftw_destroy_plan(plan)'.  Furthermore, all r2r transforms
 | ||
| share the same planner interface:
 | ||
| 
 | ||
|      fftw_plan fftw_plan_r2r_1d(int n, double *in, double *out,
 | ||
|                                 fftw_r2r_kind kind, unsigned flags);
 | ||
|      fftw_plan fftw_plan_r2r_2d(int n0, int n1, double *in, double *out,
 | ||
|                                 fftw_r2r_kind kind0, fftw_r2r_kind kind1,
 | ||
|                                 unsigned flags);
 | ||
|      fftw_plan fftw_plan_r2r_3d(int n0, int n1, int n2,
 | ||
|                                 double *in, double *out,
 | ||
|                                 fftw_r2r_kind kind0,
 | ||
|                                 fftw_r2r_kind kind1,
 | ||
|                                 fftw_r2r_kind kind2,
 | ||
|                                 unsigned flags);
 | ||
|      fftw_plan fftw_plan_r2r(int rank, const int *n, double *in, double *out,
 | ||
|                              const fftw_r2r_kind *kind, unsigned flags);
 | ||
| 
 | ||
|    Just as for the complex DFT, these plan 1d/2d/3d/multi-dimensional
 | ||
| transforms for contiguous arrays in row-major order, transforming (real)
 | ||
| input to output of the same size, where 'n' specifies the _physical_
 | ||
| dimensions of the arrays.  All positive 'n' are supported (with the
 | ||
| exception of 'n=1' for the 'FFTW_REDFT00' kind, noted in the real-even
 | ||
| subsection below); products of small factors are most efficient
 | ||
| (factorizing 'n-1' and 'n+1' for 'FFTW_REDFT00' and 'FFTW_RODFT00'
 | ||
| kinds, described below), but an O(n log n) algorithm is used even for
 | ||
| prime sizes.
 | ||
| 
 | ||
|    Each dimension has a "kind" parameter, of type 'fftw_r2r_kind',
 | ||
| specifying the kind of r2r transform to be used for that dimension.  (In
 | ||
| the case of 'fftw_plan_r2r', this is an array 'kind[rank]' where
 | ||
| 'kind[i]' is the transform kind for the dimension 'n[i]'.)  The kind can
 | ||
| be one of a set of predefined constants, defined in the following
 | ||
| subsections.
 | ||
| 
 | ||
|    In other words, FFTW computes the separable product of the specified
 | ||
| r2r transforms over each dimension, which can be used e.g.  for partial
 | ||
| differential equations with mixed boundary conditions.  (For some r2r
 | ||
| kinds, notably the halfcomplex DFT and the DHT, such a separable product
 | ||
| is somewhat problematic in more than one dimension, however, as is
 | ||
| described below.)
 | ||
| 
 | ||
|    In the current version of FFTW, all r2r transforms except for the
 | ||
| halfcomplex type are computed via pre- or post-processing of halfcomplex
 | ||
| transforms, and they are therefore not as fast as they could be.  Since
 | ||
| most other general DCT/DST codes employ a similar algorithm, however,
 | ||
| FFTW's implementation should provide at least competitive performance.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: The Halfcomplex-format DFT,  Next: Real even/odd DFTs (cosine/sine transforms),  Prev: More DFTs of Real Data,  Up: More DFTs of Real Data
 | ||
| 
 | ||
| 2.5.1 The Halfcomplex-format DFT
 | ||
| --------------------------------
 | ||
| 
 | ||
| An r2r kind of 'FFTW_R2HC' ("r2hc") corresponds to an r2c DFT (*note
 | ||
| One-Dimensional DFTs of Real Data::) but with "halfcomplex" format
 | ||
| output, and may sometimes be faster and/or more convenient than the
 | ||
| latter.  The inverse "hc2r" transform is of kind 'FFTW_HC2R'.  This
 | ||
| consists of the non-redundant half of the complex output for a 1d
 | ||
| real-input DFT of size 'n', stored as a sequence of 'n' real numbers
 | ||
| ('double') in the format:
 | ||
| 
 | ||
|    r0, r1, r2, ..., r(n/2), i((n+1)/2-1), ..., i2, i1
 | ||
| 
 | ||
|    Here, rk is the real part of the kth output, and ik is the imaginary
 | ||
| part.  (Division by 2 is rounded down.)  For a halfcomplex array
 | ||
| 'hc[n]', the kth component thus has its real part in 'hc[k]' and its
 | ||
| imaginary part in 'hc[n-k]', with the exception of 'k == 0' or 'k == n/2'
 | ||
| (the latter only if 'n' is even)--in these two cases, the imaginary part
 | ||
| is zero due to symmetries of the real-input DFT, and is not stored.
 | ||
| Thus, the r2hc transform of 'n' real values is a halfcomplex array of
 | ||
| length 'n', and vice versa for hc2r.
 | ||
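As a rough sketch of the indexing rule just described, the following accessors read the kth complex component out of a halfcomplex array.  The names 'hc_real' and 'hc_imag' are hypothetical helpers, not FFTW functions, and 'k' is assumed to lie in the non-redundant range 0..n/2.

```c
#include <assert.h>

/* Real part of the k-th output (0 <= k <= n/2) of a halfcomplex array hc[n]. */
double hc_real(const double *hc, int n, int k)
{
    assert(n > 0 && k >= 0 && k <= n / 2);
    return hc[k];
}

/* Imaginary part of the k-th output.  It is zero, and not stored, for
   k == 0 and (when n is even) for k == n/2. */
double hc_imag(const double *hc, int n, int k)
{
    assert(n > 0 && k >= 0 && k <= n / 2);
    if (k == 0 || (n % 2 == 0 && k == n / 2))
        return 0.0;
    return hc[n - k];
}
```

For 'n = 6' the layout is r0, r1, r2, r3, i2, i1, so 'hc_imag(hc, 6, 2)' reads 'hc[4]'.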
| 
 | ||
|    Aside from the differing format, the output of
 | ||
| 'FFTW_R2HC'/'FFTW_HC2R' is otherwise exactly the same as for the
 | ||
| corresponding 1d r2c/c2r transform (i.e.  'FFTW_FORWARD'/'FFTW_BACKWARD'
 | ||
| transforms, respectively).  Recall that these transforms are
 | ||
| unnormalized, so r2hc followed by hc2r will result in the original data
 | ||
| multiplied by 'n'.  Furthermore, like the c2r transform, an out-of-place
 | ||
| hc2r transform will _destroy its input_ array.
 | ||
| 
 | ||
|    Although these halfcomplex transforms can be used with the
 | ||
| multi-dimensional r2r interface, the interpretation of such a separable
 | ||
| product of transforms along each dimension is problematic.  For example,
 | ||
| consider a two-dimensional 'n0' by 'n1', r2hc by r2hc transform planned
 | ||
| by 'fftw_plan_r2r_2d(n0, n1, in, out, FFTW_R2HC, FFTW_R2HC,
 | ||
| FFTW_MEASURE)'.  Conceptually, FFTW first transforms the rows (of size
 | ||
| 'n1') to produce halfcomplex rows, and then transforms the columns (of
 | ||
| size 'n0').  Half of these column transforms, however, are of imaginary
 | ||
| parts, and should therefore be multiplied by i and combined with the
 | ||
| r2hc transforms of the real columns to produce the 2d DFT amplitudes;
 | ||
| FFTW's r2r transform does _not_ perform this combination for you.  Thus,
 | ||
| if a multi-dimensional real-input/output DFT is required, we recommend
 | ||
| using the ordinary r2c/c2r interface (*note Multi-Dimensional DFTs of
 | ||
| Real Data::).
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Real even/odd DFTs (cosine/sine transforms),  Next: The Discrete Hartley Transform,  Prev: The Halfcomplex-format DFT,  Up: More DFTs of Real Data
 | ||
| 
 | ||
| 2.5.2 Real even/odd DFTs (cosine/sine transforms)
 | ||
| -------------------------------------------------
 | ||
| 
 | ||
| The Fourier transform of a real-even function f(-x) = f(x) is real-even,
 | ||
| and i times the Fourier transform of a real-odd function f(-x) = -f(x)
 | ||
| is real-odd.  Similar results hold for a discrete Fourier transform, and
 | ||
| thus for these symmetries the need for complex inputs/outputs is
 | ||
| entirely eliminated.  Moreover, one gains a factor of two in speed/space
 | ||
| from the fact that the data are real, and an additional factor of two
 | ||
| from the even/odd symmetry: only the non-redundant (first) half of the
 | ||
| array need be stored.  The result is the real-even DFT ("REDFT") and the
 | ||
| real-odd DFT ("RODFT"), also known as the discrete cosine and sine
 | ||
| transforms ("DCT" and "DST"), respectively.
 | ||
| 
 | ||
|    (In this section, we describe the 1d transforms; multi-dimensional
 | ||
| transforms are just a separable product of these transforms operating
 | ||
| along each dimension.)
 | ||
| 
 | ||
|    Because of the discrete sampling, one has an additional choice: is
 | ||
| the data even/odd around a sampling point, or around the point halfway
 | ||
| between two samples?  The latter corresponds to _shifting_ the samples
 | ||
| by _half_ an interval, and gives rise to several transform variants
 | ||
| denoted by REDFTab and RODFTab: a and b are 0 or 1, and indicate whether
 | ||
| the input (a) and/or output (b) are shifted by half a sample (1 means it
 | ||
| is shifted).  These are also known as types I-IV of the DCT and DST, and
 | ||
| all four types are supported by FFTW's r2r interface.(1)
 | ||
| 
 | ||
|    The r2r kinds for the various REDFT and RODFT types supported by
 | ||
| FFTW, along with the boundary conditions at both ends of the _input_
 | ||
| array ('n' real numbers 'in[j=0..n-1]'), are:
 | ||
| 
 | ||
|    * 'FFTW_REDFT00' (DCT-I): even around j=0 and even around j=n-1.
 | ||
| 
 | ||
|    * 'FFTW_REDFT10' (DCT-II, "the" DCT): even around j=-0.5 and even
 | ||
|      around j=n-0.5.
 | ||
| 
 | ||
|    * 'FFTW_REDFT01' (DCT-III, "the" IDCT): even around j=0 and odd
 | ||
|      around j=n.
 | ||
| 
 | ||
|    * 'FFTW_REDFT11' (DCT-IV): even around j=-0.5 and odd around j=n-0.5.
 | ||
| 
 | ||
|    * 'FFTW_RODFT00' (DST-I): odd around j=-1 and odd around j=n.
 | ||
| 
 | ||
|    * 'FFTW_RODFT10' (DST-II): odd around j=-0.5 and odd around j=n-0.5.
 | ||
| 
 | ||
|    * 'FFTW_RODFT01' (DST-III): odd around j=-1 and even around j=n-1.
 | ||
| 
 | ||
|    * 'FFTW_RODFT11' (DST-IV): odd around j=-0.5 and even around j=n-0.5.
 | ||
| 
 | ||
|    Note that these symmetries apply to the "logical" array being
 | ||
| transformed; *there are no constraints on your physical input data*.
 | ||
| So, for example, if you specify a size-5 REDFT00 (DCT-I) of the data
 | ||
| abcde, it corresponds to the DFT of the logical even array abcdedcb of
 | ||
| size 8.  A size-4 REDFT10 (DCT-II) of the data abcd corresponds to the
 | ||
| size-8 logical DFT of the even array abcddcba, shifted by half a sample.
 | ||
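To make the "logical array" idea concrete, here is a small sketch that builds the mirrored arrays from the two examples above.  It is illustration only: FFTW never materializes these arrays, and the function names 'logical_redft00' and 'logical_redft10' are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Logical array behind a size-n REDFT00 (DCT-I): even around j=0 and
   j=n-1, so the two endpoints are not repeated.
   Writes 2*(n-1) characters plus a terminating NUL into out. */
void logical_redft00(const char *in, size_t n, char *out)
{
    size_t j, len = n;
    assert(n >= 2);
    memcpy(out, in, n);
    for (j = n - 2; j >= 1; --j)
        out[len++] = in[j];
    out[len] = '\0';
}

/* Logical array behind a size-n REDFT10 (DCT-II): even around j=-0.5
   and j=n-0.5, so every sample is mirrored.
   Writes 2*n characters plus a terminating NUL into out. */
void logical_redft10(const char *in, size_t n, char *out)
{
    size_t j;
    assert(n >= 1);
    for (j = 0; j < n; ++j) {
        out[j] = in[j];
        out[2 * n - 1 - j] = in[j];
    }
    out[2 * n] = '\0';
}
```

Called on "abcde" with n=5, the first routine produces the size-8 array abcdedcb; called on "abcd" with n=4, the second produces abcddcba.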
| 
 | ||
|    All of these transforms are invertible.  The inverse of R*DFT00 is
 | ||
| R*DFT00; of R*DFT10 is R*DFT01 and vice versa (these are often called
 | ||
| simply "the" DCT and IDCT, respectively); and of R*DFT11 is R*DFT11.
 | ||
| However, the transforms computed by FFTW are unnormalized, exactly like
 | ||
| the corresponding real and complex DFTs, so computing a transform
 | ||
| followed by its inverse yields the original array scaled by N, where N
 | ||
| is the _logical_ DFT size.  For REDFT00, N=2(n-1); for RODFT00,
 | ||
| N=2(n+1); otherwise, N=2n.
 | ||
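The scale factor N can be computed from the physical size 'n' as in the sketch below.  The enum 'sym_kind' and both function names are hypothetical stand-ins chosen so the example compiles without 'fftw3.h'; real code would switch on the 'fftw_r2r_kind' value instead.

```c
#include <assert.h>

/* Hypothetical grouping of the eight REDFT/RODFT kinds by logical size. */
typedef enum { SYM_REDFT00, SYM_RODFT00, SYM_OTHER } sym_kind;

/* Logical DFT size N for a physical size n. */
long logical_dft_size(sym_kind kind, long n)
{
    assert(n >= 1);
    switch (kind) {
    case SYM_REDFT00: return 2 * (n - 1);
    case SYM_RODFT00: return 2 * (n + 1);
    default:          return 2 * n;
    }
}

/* Undo the scaling left by a transform followed by its inverse. */
void undo_scaling(double *data, long n, long N)
{
    long j;
    assert(N > 0);
    for (j = 0; j < n; ++j)
        data[j] /= (double)N;
}
```

The same N is the quantity whose factorization governs speed, so a size-65 REDFT00 (N=128) would be expected to behave better than a size-64 one (N=126).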
| 
 | ||
|    Note that the boundary conditions of the transform output array are
 | ||
| given by the input boundary conditions of the inverse transform.  Thus,
 | ||
| the above transforms are all inequivalent in terms of input/output
 | ||
| boundary conditions, even neglecting the 0.5 shift difference.
 | ||
| 
 | ||
|    FFTW is most efficient when N is a product of small factors; note
 | ||
| that this _differs_ from the factorization of the physical size 'n' for
 | ||
| REDFT00 and RODFT00!  There is another oddity: 'n=1' REDFT00 transforms
 | ||
| correspond to N=0, and so are _not defined_ (the planner will return
 | ||
| 'NULL').  Otherwise, any positive 'n' is supported.
 | ||
| 
 | ||
|    For the precise mathematical definitions of these transforms as used
 | ||
| by FFTW, see *note What FFTW Really Computes::.  (For people accustomed
 | ||
| to the DCT/DST, FFTW's definitions have a coefficient of 2 in front of
 | ||
| the cos/sin functions so that they correspond precisely to an even/odd
 | ||
| DFT of size N. Some authors also include additional multiplicative
 | ||
| factors of sqrt(2) for selected inputs and outputs; this makes the
 | ||
| transform orthogonal, but sacrifices the direct equivalence to a
 | ||
| symmetric DFT.)
 | ||
| 
 | ||
| Which type do you need?
 | ||
| .......................
 | ||
| 
 | ||
| Since the required flavor of even/odd DFT depends upon your problem, you
 | ||
| are the best judge of this choice, but we can make a few comments on
 | ||
| relative efficiency to help you in your selection.  In particular,
 | ||
| R*DFT01 and R*DFT10 tend to be slightly faster than R*DFT11 (especially
 | ||
| for odd sizes), while the R*DFT00 transforms are sometimes significantly
 | ||
| slower (especially for even sizes).(2)
 | ||
| 
 | ||
|    Thus, if only the boundary conditions on the transform inputs are
 | ||
| specified, we generally recommend R*DFT10 over R*DFT00 and R*DFT01 over
 | ||
| R*DFT11 (unless the half-sample shift or the self-inverse property is
 | ||
| significant for your problem).
 | ||
| 
 | ||
|    If performance is important to you and you are using only small sizes
 | ||
| (say n<200), e.g.  for multi-dimensional transforms, then you might
 | ||
| consider generating hard-coded transforms of those sizes and types that
 | ||
| you are interested in (*note Generating your own code::).
 | ||
| 
 | ||
|    We are interested in hearing what types of symmetric transforms you
 | ||
| find most useful.
 | ||
| 
 | ||
|    ---------- Footnotes ----------
 | ||
| 
 | ||
|    (1) There are also type V-VIII transforms, which correspond to a
 | ||
| logical DFT of _odd_ size N, independent of whether the physical size
 | ||
| 'n' is odd, but we do not support these variants.
 | ||
| 
 | ||
|    (2) R*DFT00 is sometimes slower in FFTW because we discovered that
 | ||
| the standard algorithm for computing this by a pre/post-processed real
 | ||
| DFT--the algorithm used in FFTPACK, Numerical Recipes, and other sources
 | ||
| for decades now--has serious numerical problems: it already loses
 | ||
| several decimal places of accuracy for 16k sizes.  There seem to be only
 | ||
| two alternatives in the literature that do not suffer similarly: a
 | ||
| recursive decomposition into smaller DCTs, which would require a large
 | ||
| set of codelets for efficiency and generality, or sacrificing a factor
 | ||
| of 2 in speed to use a real DFT of twice the size.  We currently employ
 | ||
| the latter technique for general n, as well as a limited form of the
 | ||
| former method: a split-radix decomposition when n is odd (N a multiple
 | ||
| of 4).  For N containing many factors of 2, the split-radix method seems
 | ||
| to recover most of the speed of the standard algorithm without the
 | ||
| accuracy tradeoff.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: The Discrete Hartley Transform,  Prev: Real even/odd DFTs (cosine/sine transforms),  Up: More DFTs of Real Data
 | ||
| 
 | ||
| 2.5.3 The Discrete Hartley Transform
 | ||
| ------------------------------------
 | ||
| 
 | ||
| If you are planning to use the DHT because you've heard that it is
 | ||
| "faster" than the DFT (FFT), *stop here*.  The DHT is not faster than
 | ||
| the DFT. That story is an old but enduring misconception that was
 | ||
| debunked in 1987.
 | ||
| 
 | ||
|    The discrete Hartley transform (DHT) is an invertible linear
 | ||
| transform closely related to the DFT. In the DFT, one multiplies each
 | ||
| input by cos - i * sin (a complex exponential), whereas in the DHT each
 | ||
| input is multiplied by simply cos + sin.  Thus, the DHT transforms 'n'
 | ||
| real numbers to 'n' real numbers, and has the convenient property of
 | ||
| being its own inverse.  In FFTW, a DHT (of any positive 'n') can be
 | ||
| specified by an r2r kind of 'FFTW_DHT'.
 | ||
| 
 | ||
|    Like the DFT, in FFTW the DHT is unnormalized, so computing a DHT of
 | ||
| size 'n' followed by another DHT of the same size will result in the
 | ||
| original array multiplied by 'n'.
 | ||
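The self-inverse property and the factor of 'n' can be checked by hand for a tiny case.  The sketch below is a naive O(n^2) DHT hard-wired to n=4, where cos + sin happens to take only the values 1 and -1, so integer arithmetic suffices.  It is not how FFTW computes the DHT, and 'dht4' is a hypothetical name.

```c
#include <assert.h>

/* cos(2*pi*m/4) + sin(2*pi*m/4) for m = 0..3. */
static const int cas4[4] = { 1, 1, -1, -1 };

/* Unnormalized size-4 DHT: out[k] = sum over j of in[j] * cas(2*pi*j*k/4).
   This sketch is out-of-place only. */
void dht4(const int in[4], int out[4])
{
    int j, k;
    assert(in != out);
    for (k = 0; k < 4; ++k) {
        out[k] = 0;
        for (j = 0; j < 4; ++j)
            out[k] += in[j] * cas4[(j * k) % 4];
    }
}
```

Applying 'dht4' to {1, 2, 3, 4} gives {10, -4, -2, 0}, and applying it again gives {4, 8, 12, 16}, the original data multiplied by 4.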
| 
 | ||
|    The DHT was originally proposed as a more efficient alternative to
 | ||
| the DFT for real data, but it was subsequently shown that a specialized
 | ||
| DFT (such as FFTW's r2hc or r2c transforms) could be just as fast.  In
 | ||
| FFTW, the DHT is actually computed by post-processing an r2hc transform,
 | ||
| so there is ordinarily no reason to prefer it from a performance
 | ||
| perspective.(1)  However, we have heard rumors that the DHT might be the
 | ||
| most appropriate transform in its own right for certain applications,
 | ||
| and we would be very interested to hear from anyone who finds it useful.
 | ||
| 
 | ||
|    If 'FFTW_DHT' is specified for multiple dimensions of a
 | ||
| multi-dimensional transform, FFTW computes the separable product of 1d
 | ||
| DHTs along each dimension.  Unfortunately, this is not quite the same
 | ||
| thing as a true multi-dimensional DHT; you can compute the latter, if
 | ||
| necessary, with at most 'rank-1' post-processing passes [see e.g.  H.
 | ||
| Hao and R. N. Bracewell, Proc.  IEEE 75, 264-266 (1987)].
 | ||
| 
 | ||
|    For the precise mathematical definition of the DHT as used by FFTW,
 | ||
| see *note What FFTW Really Computes::.
 | ||
| 
 | ||
|    ---------- Footnotes ----------
 | ||
| 
 | ||
|    (1) We provide the DHT mainly as a byproduct of some internal
 | ||
| algorithms.  FFTW computes a real input/output DFT of _prime_ size by
 | ||
| re-expressing it as a DHT plus post/pre-processing and then using
 | ||
| Rader's prime-DFT algorithm adapted to the DHT.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Other Important Topics,  Next: FFTW Reference,  Prev: Tutorial,  Up: Top
 | ||
| 
 | ||
| 3 Other Important Topics
 | ||
| ************************
 | ||
| 
 | ||
| * Menu:
 | ||
| 
 | ||
| * SIMD alignment and fftw_malloc::
 | ||
| * Multi-dimensional Array Format::
 | ||
| * Words of Wisdom-Saving Plans::
 | ||
| * Caveats in Using Wisdom::
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: SIMD alignment and fftw_malloc,  Next: Multi-dimensional Array Format,  Prev: Other Important Topics,  Up: Other Important Topics
 | ||
| 
 | ||
| 3.1 SIMD alignment and fftw_malloc
 | ||
| ==================================
 | ||
| 
 | ||
| SIMD, which stands for "Single Instruction Multiple Data," is a set of
 | ||
| special operations supported by some processors to perform a single
 | ||
| operation on several numbers (usually 2 or 4) simultaneously.  SIMD
 | ||
| floating-point instructions are available on several popular CPUs:
 | ||
| SSE/SSE2/AVX/AVX2/AVX512/KCVI on some x86/x86-64 processors, AltiVec and
 | ||
| VSX on some POWER/PowerPCs, NEON on some ARM models.  FFTW can be
 | ||
| compiled to support the SIMD instructions on any of these systems.
 | ||
| 
 | ||
|    A program linking to an FFTW library compiled with SIMD support can
 | ||
| obtain a nonnegligible speedup for most complex and r2c/c2r transforms.
 | ||
| In order to obtain this speedup, however, the arrays of complex (or
 | ||
| real) data passed to FFTW must be specially aligned in memory (typically
 | ||
| 16-byte aligned), and often this alignment is more stringent than that
 | ||
| provided by the usual 'malloc' (etc.)  allocation routines.
 | ||
| 
 | ||
|    In order to guarantee proper alignment for SIMD, therefore, in case
 | ||
| your program is ever linked against a SIMD-using FFTW, we recommend
 | ||
| allocating your transform data with 'fftw_malloc' and de-allocating it
 | ||
| with 'fftw_free'.  These have exactly the same interface and behavior as
 | ||
| 'malloc'/'free', except that for a SIMD FFTW they ensure that the
 | ||
| returned pointer has the necessary alignment (by calling 'memalign' or
 | ||
| its equivalent on your OS).
 | ||
| 
 | ||
|    You are not _required_ to use 'fftw_malloc'.  You can allocate your
 | ||
| data in any way that you like, from 'malloc' to 'new' (in C++) to a
 | ||
| fixed-size array declaration.  If the array happens not to be properly
 | ||
| aligned, FFTW will not use the SIMD extensions.
 | ||
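If you allocate arrays yourself and want to know whether they meet a given alignment, a check along the following lines may help.  This is a generic sketch, not an FFTW routine: 'is_aligned' is a hypothetical helper, and the 16-byte figure is only the typical requirement mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Nonzero if p is aligned to `alignment` bytes.
   The alignment must be a power of two. */
int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}
```

For example, 'is_aligned(in, 16)' on a 'malloc'-ed array tells you whether that particular pointer happens to satisfy a 16-byte requirement; it does not tell you which alignment your FFTW build actually needs.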
| 
 | ||
|    Since 'fftw_malloc' only ever needs to be used for real and complex
 | ||
| arrays, we provide two convenient wrapper routines 'fftw_alloc_real(N)'
 | ||
| and 'fftw_alloc_complex(N)' that are equivalent to
 | ||
| '(double*)fftw_malloc(sizeof(double) * N)' and
 | ||
| '(fftw_complex*)fftw_malloc(sizeof(fftw_complex) * N)', respectively (or
 | ||
| their equivalents in other precisions).
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Multi-dimensional Array Format,  Next: Words of Wisdom-Saving Plans,  Prev: SIMD alignment and fftw_malloc,  Up: Other Important Topics
 | ||
| 
 | ||
| 3.2 Multi-dimensional Array Format
 | ||
| ==================================
 | ||
| 
 | ||
| This section describes the format in which multi-dimensional arrays are
 | ||
| stored in FFTW. We felt that a detailed discussion of this topic was
 | ||
| necessary, since several different formats are common and the topic is
 | ||
| often a source of confusion.
 | ||
| 
 | ||
| * Menu:
 | ||
| 
 | ||
| * Row-major Format::
 | ||
| * Column-major Format::
 | ||
| * Fixed-size Arrays in C::
 | ||
| * Dynamic Arrays in C::
 | ||
| * Dynamic Arrays in C-The Wrong Way::
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Row-major Format,  Next: Column-major Format,  Prev: Multi-dimensional Array Format,  Up: Multi-dimensional Array Format
 | ||
| 
 | ||
| 3.2.1 Row-major Format
 | ||
| ----------------------
 | ||
| 
 | ||
| The multi-dimensional arrays passed to 'fftw_plan_dft' etcetera are
 | ||
| expected to be stored as a single contiguous block in "row-major" order
 | ||
| (sometimes called "C order").  Basically, this means that as you step
 | ||
| through adjacent memory locations, the first dimension's index varies
 | ||
| most slowly and the last dimension's index varies most quickly.
 | ||
| 
 | ||
|    To be more explicit, let us consider an array of rank d whose
 | ||
| dimensions are n[0] x n[1] x n[2] x ...  x n[d-1] .  Now, we specify a
 | ||
| location in the array by a sequence of d (zero-based) indices, one for
 | ||
| each dimension: (i[0], i[1], ..., i[d-1]).  If the array is stored in
 | ||
| row-major order, then this element is located at the position i[d-1] +
 | ||
| n[d-1] * (i[d-2] + n[d-2] * (...  + n[1] * i[0])).
 | ||
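The nested expression above is Horner's rule applied to the index tuple, which a short loop can evaluate for any rank.  The function name 'row_major_offset' is a hypothetical helper for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Row-major offset of element (i[0], ..., i[d-1]) in an array with
   dimensions n[0] x ... x n[d-1]. */
size_t row_major_offset(int d, const size_t *n, const size_t *i)
{
    size_t off = 0;
    int k;
    assert(d >= 1);
    for (k = 0; k < d; ++k) {
        assert(i[k] < n[k]);
        off = off * n[k] + i[k];   /* n[0] itself never multiplies anything */
    }
    return off;
}
```

For a rank-3 array this reduces to 'i[2] + n[2] * (i[1] + n[1] * i[0])'.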
| 
 | ||
|    Note that, for the ordinary complex DFT, each element of the array
 | ||
| must be of type 'fftw_complex'; i.e.  a (real, imaginary) pair of
 | ||
| (double-precision) numbers.
 | ||
| 
 | ||
|    In the advanced FFTW interface, the physical dimensions n from which
 | ||
| the indices are computed can be different from (larger than) the logical
 | ||
| dimensions of the transform to be computed, in order to transform a
 | ||
| subset of a larger array.  Note also that, in the advanced interface,
 | ||
| the expression above is multiplied by a "stride" to get the actual array
 | ||
| index--this is useful in situations where each element of the
 | ||
| multi-dimensional array is actually a data structure (or another array),
 | ||
| and you just want to transform a single field.  In the basic interface,
 | ||
| however, the stride is 1.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Column-major Format,  Next: Fixed-size Arrays in C,  Prev: Row-major Format,  Up: Multi-dimensional Array Format
 | ||
| 
 | ||
| 3.2.2 Column-major Format
 | ||
| -------------------------
 | ||
| 
 | ||
| Readers from the Fortran world are used to arrays stored in
 | ||
| "column-major" order (sometimes called "Fortran order").  This is
 | ||
| essentially the exact opposite of row-major order in that, here, the
 | ||
| _first_ dimension's index varies most quickly.
 | ||
| 
 | ||
|    If you have an array stored in column-major order and wish to
 | ||
| transform it using FFTW, it is quite easy to do.  When creating the
 | ||
| plan, simply pass the dimensions of the array to the planner in _reverse
 | ||
| order_.  For example, if your array is a rank three 'N x M x L' matrix
 | ||
| in column-major order, you should pass the dimensions of the array as if
 | ||
| it were an 'L x M x N' matrix (which it is, from the perspective of
 | ||
| FFTW). This is done for you _automatically_ by the FFTW legacy-Fortran
 | ||
| interface (*note Calling FFTW from Legacy Fortran::), but you must do it
 | ||
| manually with the modern Fortran interface (*note Reversing array
 | ||
| dimensions::).
 | ||
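Why reversing the dimensions works can be seen by comparing the two offset formulas directly.  In the sketch below (hypothetical helper names, no FFTW calls), a column-major offset computed with dimensions and indices in their natural order equals the row-major offset computed with both reversed.

```c
#include <assert.h>
#include <stddef.h>

/* Column-major (Fortran-order) offset: the first index varies fastest. */
size_t col_major_offset(int d, const size_t *n, const size_t *i)
{
    size_t off = 0;
    int k;
    assert(d >= 1);
    for (k = d - 1; k >= 0; --k)
        off = off * n[k] + i[k];
    return off;
}

/* Row-major (C-order) offset: the last index varies fastest. */
size_t offset_row_major(int d, const size_t *n, const size_t *i)
{
    size_t off = 0;
    int k;
    assert(d >= 1);
    for (k = 0; k < d; ++k)
        off = off * n[k] + i[k];
    return off;
}

/* Reverse a list of d dimensions (or indices) in place. */
void reverse_dims(int d, size_t *v)
{
    int lo, hi;
    for (lo = 0, hi = d - 1; lo < hi; ++lo, --hi) {
        size_t tmp = v[lo];
        v[lo] = v[hi];
        v[hi] = tmp;
    }
}
```

So an 'N x M x L' column-major array and an 'L x M x N' row-major array place every element at the same memory offset, which is why only the dimension list passed to the planner needs to change.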
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Fixed-size Arrays in C,  Next: Dynamic Arrays in C,  Prev: Column-major Format,  Up: Multi-dimensional Array Format
 | ||
| 
 | ||
| 3.2.3 Fixed-size Arrays in C
 | ||
| ----------------------------
 | ||
| 
 | ||
| A multi-dimensional array whose size is declared at compile time in C is
 | ||
| _already_ in row-major order.  You don't have to do anything special to
 | ||
| transform it.  For example:
 | ||
| 
 | ||
|      {
 | ||
|           fftw_complex data[N0][N1][N2];
 | ||
|           fftw_plan plan;
 | ||
|           ...
 | ||
|           plan = fftw_plan_dft_3d(N0, N1, N2, &data[0][0][0], &data[0][0][0],
 | ||
|                                   FFTW_FORWARD, FFTW_ESTIMATE);
 | ||
|           ...
 | ||
|      }
 | ||
| 
 | ||
|    This will plan a 3d in-place transform of size 'N0 x N1 x N2'.
 | ||
| Notice how we took the address of the zero-th element to pass to the
 | ||
| planner (we could also have used a typecast).
 | ||
| 
 | ||
|    However, we tend to _discourage_ users from declaring their arrays in
 | ||
| this way, for two reasons.  First, this allocates the array on the stack
 | ||
| ("automatic" storage), which has a very limited size on most operating
 | ||
| systems (declaring an array with more than a few thousand elements will
 | ||
| often cause a crash).  (You can get around this limitation on many
 | ||
| systems by declaring the array as 'static' and/or global, but that has
 | ||
| its own drawbacks.)  Second, it may not optimally align the array for
 | ||
| use with a SIMD FFTW (*note SIMD alignment and fftw_malloc::).  Instead,
 | ||
| we recommend using 'fftw_malloc', as described below.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Dynamic Arrays in C,  Next: Dynamic Arrays in C-The Wrong Way,  Prev: Fixed-size Arrays in C,  Up: Multi-dimensional Array Format
 | ||
| 
 | ||
| 3.2.4 Dynamic Arrays in C
 | ||
| -------------------------
 | ||
| 
 | ||
| We recommend allocating most arrays dynamically, with 'fftw_malloc'.
 | ||
| This isn't too hard to do, although it is not as straightforward for
 | ||
| multi-dimensional arrays as it is for one-dimensional arrays.
 | ||
| 
 | ||
|    Creating the array is simple: using a dynamic-allocation routine like
 | ||
| 'fftw_malloc', allocate an array big enough to store N 'fftw_complex'
 | ||
| values (for a complex DFT), where N is the product of the sizes of the
 | ||
| array dimensions (i.e.  the total number of complex values in the
 | ||
| array).  For example, here is code to allocate a 5 x 12 x 27 rank-3
 | ||
| array:
 | ||
| 
 | ||
|      fftw_complex *an_array;
 | ||
|      an_array = (fftw_complex*) fftw_malloc(5*12*27 * sizeof(fftw_complex));
 | ||
| 
 | ||
|    Accessing the array elements, however, is more tricky--you can't
 | ||
| simply use multiple applications of the '[]' operator like you could for
 | ||
| fixed-size arrays.  Instead, you have to explicitly compute the offset
 | ||
| into the array using the formula given earlier for row-major arrays.
 | ||
| For example, to reference the (i,j,k)-th element of the array allocated
 | ||
| above, you would use the expression 'an_array[k + 27 * (j + 12 * i)]'.
 | ||
| 
 | ||
|    This pain can be alleviated somewhat by defining appropriate macros,
 | ||
| or, in C++, creating a class and overloading the '()' operator.  The
 | ||
| C99 standard provides a way to reinterpret the dynamic array as a
 | ||
| "variable-length" multi-dimensional array amenable to '[]', but compiler
 | ||
| support for this feature varies (C11 made it optional).
 | ||
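Both approaches can be sketched briefly.  The macro 'IDX3' and the function 'vla_roundtrip' are hypothetical names, 'cplx' is a stand-in with the same 'double[2]' layout as the default 'fftw_complex', and plain 'malloc' is used only so that the example compiles without FFTW; the second approach assumes a compiler that supports C99 variable-length array types.

```c
#include <assert.h>
#include <stdlib.h>

typedef double cplx[2];   /* stand-in for the default fftw_complex layout */

/* Row-major offset of element (i,j,k) when the last two dimensions
   are n1 and n2. */
#define IDX3(i, j, k, n1, n2) ((k) + (n2) * ((j) + (n1) * (i)))

/* Write v to the real part of element (i,j,k) through a C99 VLA view,
   then read it back through the flat index to show they coincide. */
double vla_roundtrip(size_t n0, size_t n1, size_t n2,
                     size_t i, size_t j, size_t k, double v)
{
    cplx *flat = malloc(n0 * n1 * n2 * sizeof *flat);
    cplx (*a)[n1][n2] = (cplx (*)[n1][n2]) flat;   /* pointer to VLA */
    double seen;
    assert(flat != NULL && i < n0 && j < n1 && k < n2);
    a[i][j][k][0] = v;
    seen = flat[IDX3(i, j, k, n1, n2)][0];
    free(flat);
    return seen;
}
```

The pointer 'flat' is what would be handed to the planner; 'a' is merely a second view of the same contiguous block.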
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Dynamic Arrays in C-The Wrong Way,  Prev: Dynamic Arrays in C,  Up: Multi-dimensional Array Format
 | ||
| 
 | ||
| 3.2.5 Dynamic Arrays in C--The Wrong Way
 | ||
| ----------------------------------------
 | ||
| 
 | ||
| A different method for allocating multi-dimensional arrays in C is often
 | ||
| suggested that is incompatible with FFTW: _using it will cause FFTW to
 | ||
| die a painful death_.  We discuss the technique here, however, because
 | ||
| it is so commonly known and used.  This method is to create arrays of
 | ||
| pointers to arrays of pointers to ... etcetera.  For example, the
 | ||
| analogue in this method to the example above is:
 | ||
| 
 | ||
|      int i,j;
 | ||
|      fftw_complex ***a_bad_array;  /* another way to make a 5x12x27 array */
 | ||
| 
 | ||
|      a_bad_array = (fftw_complex ***) malloc(5 * sizeof(fftw_complex **));
 | ||
|      for (i = 0; i < 5; ++i) {
 | ||
|           a_bad_array[i] =
 | ||
|              (fftw_complex **) malloc(12 * sizeof(fftw_complex *));
 | ||
|           for (j = 0; j < 12; ++j)
 | ||
|                a_bad_array[i][j] =
 | ||
|                      (fftw_complex *) malloc(27 * sizeof(fftw_complex));
 | ||
|      }
 | ||
| 
 | ||
|    As you can see, this sort of array is inconvenient to allocate (and
 | ||
| deallocate).  On the other hand, it has the advantage that the
 | ||
| (i,j,k)-th element can be referenced simply by 'a_bad_array[i][j][k]'.
 | ||
| 
 | ||
|    If you like this technique and want to maximize convenience in
 | ||
| accessing the array, but still want to pass the array to FFTW, you can
 | ||
| use a hybrid method.  Allocate the array as one contiguous block, but
 | ||
| also declare an array of arrays of pointers that point to appropriate
 | ||
| places in the block.  That sort of trick is beyond the scope of this
 | ||
| documentation; for more information on multi-dimensional arrays in C,
 | ||
| see the 'comp.lang.c' FAQ (http://c-faq.com/aryptr/dynmuldimary.html).
 | ||
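For the two-dimensional case, the hybrid method might look like the following sketch.  It is one possible arrangement, not a recommendation from FFTW: 'alloc_rows' is a hypothetical helper, 'cplx' stands in for the default 'fftw_complex' layout, and 'malloc' is used only so that the example compiles without FFTW (real code would likely obtain the block from 'fftw_malloc').

```c
#include <assert.h>
#include <stdlib.h>

typedef double cplx[2];   /* stand-in for the default fftw_complex layout */

/* Allocate one contiguous n0-by-n1 block plus a table of n0 row pointers
   into it.  rows[i][j] addresses the same element as block[j + n1*i].
   Returns NULL on allocation failure. */
cplx **alloc_rows(size_t n0, size_t n1, cplx **block_out)
{
    cplx *block;
    cplx **rows;
    size_t i;
    assert(n0 > 0 && n1 > 0 && block_out != NULL);
    block = malloc(n0 * n1 * sizeof *block);
    rows = malloc(n0 * sizeof *rows);
    if (block == NULL || rows == NULL) {
        free(block);
        free(rows);
        return NULL;
    }
    for (i = 0; i < n0; ++i)
        rows[i] = block + i * n1;
    *block_out = block;
    return rows;
}
```

The contiguous 'block' is what gets passed to the planner, while 'rows' exists purely for convenient 'rows[i][j]' access; both must be freed separately.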
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Words of Wisdom-Saving Plans,  Next: Caveats in Using Wisdom,  Prev: Multi-dimensional Array Format,  Up: Other Important Topics
 | ||
| 
 | ||
| 3.3 Words of Wisdom--Saving Plans
 | ||
| =================================
 | ||
| 
 | ||
| FFTW implements a method for saving plans to disk and restoring them.
 | ||
| In fact, what FFTW does is more general than just saving and loading
 | ||
| plans.  The mechanism is called "wisdom".  Here, we describe this
 | ||
| feature at a high level.  *Note FFTW Reference::, for a less casual but
 | ||
| more complete discussion of how to use wisdom in FFTW.
 | ||
| 
 | ||
|    Plans created with the 'FFTW_MEASURE', 'FFTW_PATIENT', or
 | ||
| 'FFTW_EXHAUSTIVE' options produce near-optimal FFT performance, but may
 | ||
| require a long time to compute because FFTW must measure the runtime of
 | ||
| many possible plans and select the best one.  This setup is designed for
 | ||
| the situations where so many transforms of the same size must be
 | ||
| computed that the start-up time is irrelevant.  For short initialization
 | ||
| times, but slower transforms, we have provided 'FFTW_ESTIMATE'.  The
 | ||
| 'wisdom' mechanism is a way to get the best of both worlds: you compute
 | ||
| a good plan once, save it to disk, and later reload it as many times as
 | ||
| necessary.  The wisdom mechanism can actually save and reload many plans
 | ||
| at once, not just one.
 | ||
| 
 | ||
|    Whenever you create a plan, the FFTW planner accumulates wisdom,
 | ||
| which is information sufficient to reconstruct the plan.  After
 | ||
| planning, you can save this information to disk by means of the
 | ||
| function:
 | ||
|      int fftw_export_wisdom_to_filename(const char *filename);
 | ||
|    (This function returns non-zero on success.)
 | ||
| 
 | ||
|    The next time you run the program, you can restore the wisdom with
 | ||
| 'fftw_import_wisdom_from_filename' (which also returns non-zero on
 | ||
| success), and then recreate the plan using the same flags as before.
 | ||
|      int fftw_import_wisdom_from_filename(const char *filename);
 | ||
| 
 | ||
|    Wisdom is automatically used for any size to which it is applicable,
 | ||
| as long as the planner flags are not more "patient" than those with
 | ||
| which the wisdom was created.  For example, wisdom created with
 | ||
| 'FFTW_MEASURE' can be used if you later plan with 'FFTW_ESTIMATE' or
 | ||
| 'FFTW_MEASURE', but not with 'FFTW_PATIENT'.
 | ||
| 
 | ||
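The applicability rule above can be modeled as a comparison of ordered rigor levels.  This is only an illustrative sketch: the enum 'rigor_level' and the function 'wisdom_usable' are hypothetical names, not part of the FFTW API, and the numeric values of the real 'FFTW_*' flags should not be assumed to follow this ordering.

```c
/* Hypothetical ordering of planning rigor, from least to most patient.
   These names are illustrative stand-ins for the FFTW_* planner flags. */
enum rigor_level {
    RIGOR_ESTIMATE,
    RIGOR_MEASURE,
    RIGOR_PATIENT,
    RIGOR_EXHAUSTIVE
};

/* Return 1 if wisdom created at rigor `created` may be used when planning
   at rigor `requested`, i.e. the request is not more patient than the
   wisdom; return 0 otherwise. */
int wisdom_usable(enum rigor_level created, enum rigor_level requested)
{
    return requested <= created;
}
```

Under this model, wisdom created at the measure level is usable for estimate and measure requests, but not for patient ones, matching the example in the text.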
|    The 'wisdom' is cumulative, and is stored in a global, private data
 | ||
| structure managed internally by FFTW. The storage space required is
 | ||
| minimal, proportional to the logarithm of the sizes the wisdom was
 | ||
| generated from.  If memory usage is a concern, however, the wisdom can
 | ||
| be forgotten and its associated memory freed by calling:
 | ||
|      void fftw_forget_wisdom(void);
 | ||
| 
 | ||
|    Wisdom can be exported to a file, a string, or any other medium.  For
 | ||
| details, see *note Wisdom::.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Caveats in Using Wisdom,  Prev: Words of Wisdom-Saving Plans,  Up: Other Important Topics
 | ||
| 
 | ||
| 3.4 Caveats in Using Wisdom
 | ||
| ===========================
 | ||
| 
 | ||
|      For in much wisdom is much grief, and he that increaseth knowledge
 | ||
|      increaseth sorrow.  [Ecclesiastes 1:18]
 | ||
| 
 | ||
|    There are pitfalls to using wisdom, in that it can negate FFTW's
 | ||
| ability to adapt to changing hardware and other conditions.  For
 | ||
| example, it would be perfectly possible to export wisdom from a program
 | ||
| running on one processor and import it into a program running on another
 | ||
| processor.  Doing so, however, would mean that the second program would
 | ||
| use plans optimized for the first processor, instead of the one it is
 | ||
| running on.
 | ||
| 
 | ||
|    It should be safe to reuse wisdom as long as the hardware and program
 | ||
| binaries remain unchanged.  (Actually, the optimal plan may change even
 | ||
| between runs of the same binary on identical hardware, due to
 | ||
| differences in the virtual memory environment, etcetera.  Users
 | ||
| seriously interested in performance should worry about this problem,
 | ||
| too.)  It is likely that, if the same wisdom is used for two different
 | ||
| program binaries, even running on the same machine, the plans may be
 | ||
| sub-optimal because of differing code alignments.  It is therefore wise
 | ||
| to recreate wisdom every time an application is recompiled.  The more
 | ||
| the underlying hardware and software changes between the creation of
 | ||
| wisdom and its use, the greater grows the risk of sub-optimal plans.
 | ||
| 
 | ||
|    Nevertheless, if the choice is between using 'FFTW_ESTIMATE' or using
 | ||
| possibly-suboptimal wisdom (created on the same machine, but for a
 | ||
| different binary), the wisdom is likely to be better.  For this reason,
 | ||
| we provide a function to import wisdom from a standard system-wide
 | ||
| location ('/etc/fftw/wisdom' on Unix):
 | ||
| 
 | ||
|      int fftw_import_system_wisdom(void);
 | ||
| 
 | ||
|    FFTW also provides a standalone program, 'fftw-wisdom' (described by
 | ||
| its own 'man' page on Unix) with which users can create wisdom, e.g.
 | ||
| for a canonical set of sizes to store in the system wisdom file.  *Note
 | ||
| Wisdom Utilities::.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: FFTW Reference,  Next: Multi-threaded FFTW,  Prev: Other Important Topics,  Up: Top
 | ||
| 
 | ||
| 4 FFTW Reference
 | ||
| ****************
 | ||
| 
 | ||
| This chapter provides a complete reference for all sequential (i.e.,
 | ||
| one-processor) FFTW functions.  Parallel transforms are described in
 | ||
| later chapters.
 | ||
| 
 | ||
| * Menu:
 | ||
| 
 | ||
| * Data Types and Files::
 | ||
| * Using Plans::
 | ||
| * Basic Interface::
 | ||
| * Advanced Interface::
 | ||
| * Guru Interface::
 | ||
| * New-array Execute Functions::
 | ||
| * Wisdom::
 | ||
| * What FFTW Really Computes::
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Data Types and Files,  Next: Using Plans,  Prev: FFTW Reference,  Up: FFTW Reference
 | ||
| 
 | ||
| 4.1 Data Types and Files
 | ||
| ========================
 | ||
| 
 | ||
| All programs using FFTW should include its header file:
 | ||
| 
 | ||
|      #include <fftw3.h>
 | ||
| 
 | ||
|    You must also link to the FFTW library.  On Unix, this means adding
 | ||
| '-lfftw3 -lm' at the _end_ of the link command.
 | ||
| 
 | ||
| * Menu:
 | ||
| 
 | ||
| * Complex numbers::
 | ||
| * Precision::
 | ||
| * Memory Allocation::
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Complex numbers,  Next: Precision,  Prev: Data Types and Files,  Up: Data Types and Files
 | ||
| 
 | ||
| 4.1.1 Complex numbers
 | ||
| ---------------------
 | ||
| 
 | ||
| The default FFTW interface uses 'double' precision for all
 | ||
| floating-point numbers, and defines a 'fftw_complex' type to hold
 | ||
| complex numbers as:
 | ||
| 
 | ||
|      typedef double fftw_complex[2];
 | ||
| 
 | ||
|    Here, the '[0]' element holds the real part and the '[1]' element
 | ||
| holds the imaginary part.
 | ||
| 
 | ||
|    Alternatively, if you have a C compiler (such as 'gcc') that supports
 | ||
| the C99 revision of the ANSI C standard, you can use C's new native
 | ||
| complex type (which is binary-compatible with the typedef above).  In
 | ||
| particular, if you '#include <complex.h>' _before_ '<fftw3.h>', then
 | ||
| 'fftw_complex' is defined to be the native complex type and you can
 | ||
| manipulate it with ordinary arithmetic (e.g.  'x = y * (3+4*I)', where
 | ||
| 'x' and 'y' are 'fftw_complex' and 'I' is the standard symbol for the
 | ||
| imaginary unit).
 | ||
| 
 | ||
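The binary compatibility between the two-element array layout and the C99 native complex type can be sketched without FFTW itself.  In the following, 'pair_complex' is a hypothetical stand-in for the array typedef shown above, and 'times_3_plus_4i' is an illustrative helper; neither name belongs to FFTW.

```c
#include <complex.h>
#include <string.h>

/* Hypothetical stand-in for the array-style typedef: {real, imaginary}. */
typedef double pair_complex[2];

/* Multiply z by (3+4i) using native C99 complex arithmetic.  The bytes of
   the array are copied into a double complex and back, which relies on
   both types having the same representation. */
void times_3_plus_4i(pair_complex z)
{
    double complex c;
    memcpy(&c, z, sizeof c);
    c = c * (3.0 + 4.0 * I);
    memcpy(z, &c, sizeof c);
}
```

For instance, starting from the pair {1, 2}, the result is {-5, 10}, since (1+2i)(3+4i) = -5+10i.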
|    C++ has its own 'complex<T>' template class, defined in the standard
 | ||
| '<complex>' header file.  Since C++11, the standard requires the storage
 | ||
| format of this type to be binary-compatible with the C99 type, i.e.  an
 | ||
| array 'T[2]' with consecutive real '[0]' and imaginary '[1]' parts.
 | ||
| (See the original proposal,
 | ||
| <http://www.open-std.org/jtc1/sc22/WG21/docs/papers/2002/n1388.pdf
 | ||
| WG21/N1388>, which stated that: "This solution has been tested with all
 | ||
| current major implementations of the standard library and shown to be
 | ||
| working.")  Hence, if you have a variable 'complex<double> *x', you can
 | ||
| pass it directly to FFTW via 'reinterpret_cast<fftw_complex*>(x)'.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Precision,  Next: Memory Allocation,  Prev: Complex numbers,  Up: Data Types and Files
 | ||
| 
 | ||
| 4.1.2 Precision
 | ||
| ---------------
 | ||
| 
 | ||
| You can install single and long-double precision versions of FFTW, which
 | ||
| replace 'double' with 'float' and 'long double', respectively (*note
 | ||
| Installation and Customization::).  To use these interfaces, you:
 | ||
| 
 | ||
|    * Link to the single/long-double libraries; on Unix, '-lfftw3f' or
 | ||
|      '-lfftw3l' instead of (or in addition to) '-lfftw3'.  (You can link
 | ||
|      to the different-precision libraries simultaneously.)
 | ||
| 
 | ||
|    * Include the _same_ '<fftw3.h>' header file.
 | ||
| 
 | ||
|    * Replace all lowercase instances of 'fftw_' with 'fftwf_' or
 | ||
|      'fftwl_' for single or long-double precision, respectively.
 | ||
|      ('fftw_complex' becomes 'fftwf_complex', 'fftw_execute' becomes
 | ||
|      'fftwf_execute', etcetera.)
 | ||
| 
 | ||
|    * Uppercase names, i.e.  names beginning with 'FFTW_', remain the
 | ||
|      same.
 | ||
| 
 | ||
|    * Replace 'double' with 'float' or 'long double' for subroutine
 | ||
|      parameters.
 | ||
| 
 | ||
|    Depending upon your compiler and/or hardware, 'long double' may not
 | ||
| be any more precise than 'double' (or may not be supported at all,
 | ||
| although it is standard in C99).
 | ||
| 
 | ||
|    We also support using the nonstandard '__float128'
 | ||
| quadruple-precision type provided by recent versions of 'gcc' on 32- and
 | ||
| 64-bit x86 hardware (*note Installation and Customization::).  To use
 | ||
| this type, link with '-lfftw3q -lquadmath -lm' (the 'libquadmath'
 | ||
| library provided by 'gcc' is needed for quadruple-precision
 | ||
| trigonometric functions) and use 'fftwq_' identifiers.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Memory Allocation,  Prev: Precision,  Up: Data Types and Files
 | ||
| 
 | ||
| 4.1.3 Memory Allocation
 | ||
| -----------------------
 | ||
| 
 | ||
|      void *fftw_malloc(size_t n);
 | ||
|      void fftw_free(void *p);
 | ||
| 
 | ||
|    These are functions that behave identically to 'malloc' and 'free',
 | ||
| except that they guarantee that the returned pointer obeys any special
 | ||
| alignment restrictions imposed by any algorithm in FFTW (e.g.  for SIMD
 | ||
| acceleration).  *Note SIMD alignment and fftw_malloc::.
 | ||
| 
 | ||
|    Data allocated by 'fftw_malloc' _must_ be deallocated by 'fftw_free'
 | ||
| and not by the ordinary 'free'.
 | ||
| 
 | ||
|    These routines simply call through to your operating system's
 | ||
| 'malloc' or, if necessary, its aligned equivalent (e.g.  'memalign'), so
 | ||
| you normally need not worry about any significant time or space
 | ||
| overhead.  You are _not required_ to use them to allocate your data, but
 | ||
| we strongly recommend it.
 | ||
| 
 | ||
|    Note: in C++, just as with ordinary 'malloc', you must typecast the
 | ||
| output of 'fftw_malloc' to whatever pointer type you are allocating.
 | ||
| 
 | ||
|    We also provide the following two convenience functions to allocate
 | ||
| real and complex arrays with 'n' elements, which are equivalent to
 | ||
| '(double *) fftw_malloc(sizeof(double) * n)' and '(fftw_complex *)
 | ||
| fftw_malloc(sizeof(fftw_complex) * n)', respectively:
 | ||
| 
 | ||
|      double *fftw_alloc_real(size_t n);
 | ||
|      fftw_complex *fftw_alloc_complex(size_t n);
 | ||
| 
 | ||
|    The equivalent functions in other precisions allocate arrays of 'n'
 | ||
| elements in that precision.  For example, 'fftwf_alloc_real(n)' is
 | ||
| equivalent to '(float *) fftwf_malloc(sizeof(float) * n)'.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Using Plans,  Next: Basic Interface,  Prev: Data Types and Files,  Up: FFTW Reference
 | ||
| 
 | ||
| 4.2 Using Plans
 | ||
| ===============
 | ||
| 
 | ||
| Plans for all transform types in FFTW are stored as type 'fftw_plan' (an
 | ||
| opaque pointer type), and are created by one of the various planning
 | ||
| routines described in the following sections.  An 'fftw_plan' contains
 | ||
| all information necessary to compute the transform, including the
 | ||
| pointers to the input and output arrays.
 | ||
| 
 | ||
|      void fftw_execute(const fftw_plan plan);
 | ||
| 
 | ||
|    This executes the 'plan', to compute the corresponding transform on
 | ||
| the arrays for which it was planned (which must still exist).  The plan
 | ||
| is not modified, and 'fftw_execute' can be called as many times as
 | ||
| desired.
 | ||
| 
 | ||
|    To apply a given plan to a different array, you can use the new-array
 | ||
| execute interface.  *Note New-array Execute Functions::.
 | ||
| 
 | ||
|    'fftw_execute' (and equivalents) is the only function in FFTW
 | ||
| guaranteed to be thread-safe; see *note Thread safety::.
 | ||
| 
 | ||
|    This function:
 | ||
|      void fftw_destroy_plan(fftw_plan plan);
 | ||
|    deallocates the 'plan' and all its associated data.
 | ||
| 
 | ||
|    FFTW's planner saves some other persistent data, such as the
 | ||
| accumulated wisdom and a list of algorithms available in the current
 | ||
| configuration.  If you want to deallocate all of that and reset FFTW to
 | ||
| the pristine state it was in when you started your program, you can
 | ||
| call:
 | ||
| 
 | ||
|      void fftw_cleanup(void);
 | ||
| 
 | ||
|    After calling 'fftw_cleanup', all existing plans become undefined,
 | ||
| and you should not attempt to execute them nor to destroy them.  You can
 | ||
| however create and execute/destroy new plans, in which case FFTW starts
 | ||
| accumulating wisdom information again.
 | ||
| 
 | ||
|    'fftw_cleanup' does not deallocate your plans, however.  To prevent
 | ||
| memory leaks, you must still call 'fftw_destroy_plan' before executing
 | ||
| 'fftw_cleanup'.
 | ||
| 
 | ||
|    Occasionally, it may be useful to know FFTW's internal "cost" metric
 | ||
| that it uses to compare plans to one another; this cost is proportional
 | ||
| to an execution time of the plan, in undocumented units, if the plan was
 | ||
| created with the 'FFTW_MEASURE' or other timing-based options, or
 | ||
| alternatively is a heuristic cost function for 'FFTW_ESTIMATE' plans.
 | ||
| (The cost values of measured and estimated plans are not comparable,
 | ||
| being in different units.  Also, costs from different FFTW versions or
 | ||
| the same version compiled differently may not be in the same units.
 | ||
| Plans created from wisdom have a cost of 0 since no timing measurement
 | ||
| is performed for them.  Finally, certain problems for which only one
 | ||
| top-level algorithm was possible may have required no measurements of
 | ||
| the cost of the whole plan, in which case 'fftw_cost' will also return
 | ||
| 0.)  The cost metric for a given plan is returned by:
 | ||
| 
 | ||
|      double fftw_cost(const fftw_plan plan);
 | ||
| 
 | ||
|    The following two routines are provided purely for academic purposes
 | ||
| (that is, for entertainment).
 | ||
| 
 | ||
|      void fftw_flops(const fftw_plan plan,
 | ||
|                      double *add, double *mul, double *fma);
 | ||
| 
 | ||
|    Given a 'plan', set 'add', 'mul', and 'fma' to an exact count of the
 | ||
| number of floating-point additions, multiplications, and fused
 | ||
| multiply-add operations involved in the plan's execution.  The total
 | ||
| number of floating-point operations (flops) is 'add + mul + 2*fma', or
 | ||
| 'add + mul + fma' if the hardware supports fused multiply-add
 | ||
| instructions (although the number of FMA operations is only approximate
 | ||
| because of compiler voodoo).  (The number of operations should be an
 | ||
| integer, but we use 'double' to avoid overflowing 'int' for large
 | ||
| transforms; the arguments are of type 'double' even for single and
 | ||
| long-double precision versions of FFTW.)
 | ||
| 
 | ||
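The flop-count formula above can be written out as a small helper.  This is a sketch only: 'total_flops' and its 'hardware_has_fma' parameter are hypothetical names, and the three counts are assumed to have been obtained beforehand from 'fftw_flops'.

```c
/* Combine the three counts reported for a plan into a total flop count.
   Without FMA hardware each fused multiply-add costs two operations
   (one multiply, one add); with FMA hardware it counts as one. */
double total_flops(double add, double mul, double fma, int hardware_has_fma)
{
    return add + mul + (hardware_has_fma ? fma : 2.0 * fma);
}
```

For example, counts of 10 additions, 4 multiplications, and 3 fused multiply-adds give 20 flops without FMA hardware and 17 with it.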
|      void fftw_fprint_plan(const fftw_plan plan, FILE *output_file);
 | ||
|      void fftw_print_plan(const fftw_plan plan);
 | ||
|      char *fftw_sprint_plan(const fftw_plan plan);
 | ||
| 
 | ||
|    This outputs a "nerd-readable" representation of the 'plan' to the
 | ||
| given file, to 'stdout', or to a newly allocated NUL-terminated string
 | ||
| (which the caller is responsible for deallocating with 'free'),
 | ||
| respectively.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Basic Interface,  Next: Advanced Interface,  Prev: Using Plans,  Up: FFTW Reference
 | ||
| 
 | ||
| 4.3 Basic Interface
 | ||
| ===================
 | ||
| 
 | ||
| Recall that the FFTW API is divided into three parts(1): the "basic
 | ||
| interface" computes a single transform of contiguous data, the "advanced
 | ||
| interface" computes transforms of multiple or strided arrays, and the
 | ||
| "guru interface" supports the most general data layouts, multiplicities,
 | ||
| and strides.  This section describes the basic interface, which we
 | ||
| expect to satisfy the needs of most users.
 | ||
| 
 | ||
| * Menu:
 | ||
| 
 | ||
| * Complex DFTs::
 | ||
| * Planner Flags::
 | ||
| * Real-data DFTs::
 | ||
| * Real-data DFT Array Format::
 | ||
| * Real-to-Real Transforms::
 | ||
| * Real-to-Real Transform Kinds::
 | ||
| 
 | ||
|    ---------- Footnotes ----------
 | ||
| 
 | ||
|    (1) Gallia est omnis divisa in partes tres (Julius Caesar).
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Complex DFTs,  Next: Planner Flags,  Prev: Basic Interface,  Up: Basic Interface
 | ||
| 
 | ||
| 4.3.1 Complex DFTs
 | ||
| ------------------
 | ||
| 
 | ||
|      fftw_plan fftw_plan_dft_1d(int n0,
 | ||
|                                 fftw_complex *in, fftw_complex *out,
 | ||
|                                 int sign, unsigned flags);
 | ||
|      fftw_plan fftw_plan_dft_2d(int n0, int n1,
 | ||
|                                 fftw_complex *in, fftw_complex *out,
 | ||
|                                 int sign, unsigned flags);
 | ||
|      fftw_plan fftw_plan_dft_3d(int n0, int n1, int n2,
 | ||
|                                 fftw_complex *in, fftw_complex *out,
 | ||
|                                 int sign, unsigned flags);
 | ||
|      fftw_plan fftw_plan_dft(int rank, const int *n,
 | ||
|                              fftw_complex *in, fftw_complex *out,
 | ||
|                              int sign, unsigned flags);
 | ||
| 
 | ||
|    Plan a complex input/output discrete Fourier transform (DFT) in zero
 | ||
| or more dimensions, returning an 'fftw_plan' (*note Using Plans::).
 | ||
| 
 | ||
|    Once you have created a plan for a certain transform type and
 | ||
| parameters, then creating another plan of the same type and parameters,
 | ||
| but for different arrays, is fast and shares constant data with the
 | ||
| first plan (if it still exists).
 | ||
| 
 | ||
|    The planner returns 'NULL' if the plan cannot be created.  In the
 | ||
| standard FFTW distribution, the basic interface is guaranteed to return
 | ||
| a non-'NULL' plan.  A plan may be 'NULL', however, if you are using a
 | ||
| customized FFTW configuration supporting a restricted set of transforms.
 | ||
| 
 | ||
| Arguments
 | ||
| .........
 | ||
| 
 | ||
|    * 'rank' is the rank of the transform (it should be the size of the
 | ||
|      array '*n'), and can be any non-negative integer.  (*Note Complex
 | ||
|      Multi-Dimensional DFTs::, for the definition of "rank".)  The
 | ||
|      '_1d', '_2d', and '_3d' planners correspond to a 'rank' of '1',
 | ||
|      '2', and '3', respectively.  The rank may be zero, which is
 | ||
|      equivalent to a rank-1 transform of size 1, i.e.  a copy of one
 | ||
|      number from input to output.
 | ||
| 
 | ||
|    * 'n0', 'n1', 'n2', or 'n[0..rank-1]' (as appropriate for each
 | ||
|      routine) specify the size of the transform dimensions.  They can be
 | ||
|      any positive integer.
 | ||
| 
 | ||
|         - Multi-dimensional arrays are stored in row-major order with
 | ||
|           dimensions: 'n0' x 'n1'; or 'n0' x 'n1' x 'n2'; or 'n[0]' x
 | ||
|           'n[1]' x ...  x 'n[rank-1]'.  *Note Multi-dimensional Array
 | ||
|           Format::.
 | ||
|         - FFTW is best at handling sizes of the form 2^a 3^b 5^c 7^d
 | ||
|           11^e 13^f, where e+f is either 0 or 1, and the other exponents
 | ||
|           are arbitrary.  Other sizes are computed by means of a slow,
 | ||
|           general-purpose algorithm (which nevertheless retains O(n log
 | ||
|           n) performance even for prime sizes).  It is possible to
 | ||
|           customize FFTW for different array sizes; see *note
 | ||
|           Installation and Customization::.  Transforms whose sizes are
 | ||
|           powers of 2 are especially fast.
 | ||
| 
 | ||
|    * 'in' and 'out' point to the input and output arrays of the
 | ||
|      transform, which may be the same (yielding an in-place transform).
 | ||
|      These arrays are overwritten during planning, unless
 | ||
|      'FFTW_ESTIMATE' is used in the flags.  (The arrays need not be
 | ||
|      initialized, but they must be allocated.)
 | ||
| 
 | ||
|      If 'in == out', the transform is "in-place" and the input array is
 | ||
|      overwritten.  If 'in != out', the two arrays must not overlap (but
 | ||
|      FFTW does not check for this condition).
 | ||
| 
 | ||
|    * 'sign' is the sign of the exponent in the formula that defines the
 | ||
|      Fourier transform.  It can be -1 (= 'FFTW_FORWARD') or +1 (=
 | ||
|      'FFTW_BACKWARD').
 | ||
| 
 | ||
|    * 'flags' is a bitwise OR ('|') of zero or more planner flags, as
 | ||
|      defined in *note Planner Flags::.
 | ||
| 
 | ||
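The row-major convention mentioned in the argument list means that the last dimension varies fastest in memory.  As a sketch (the helper 'row_major_index' is a hypothetical name, not an FFTW function), the flat offset of an element in an 'n[0]' x 'n[1]' x ...  x 'n[rank-1]' array could be computed as:

```c
#include <stddef.h>

/* Flat offset of element idx[0..rank-1] in a row-major array whose
   dimensions are n[0..rank-1].  The last index varies fastest.
   For rank 0 the single element lives at offset 0. */
size_t row_major_index(int rank, const int *n, const int *idx)
{
    size_t offset = 0;
    for (int d = 0; d < rank; ++d)
        offset = offset * (size_t)n[d] + (size_t)idx[d];
    return offset;
}
```

For a 2 x 3 x 4 array, the element at index (1, 2, 3) is the last one, at offset (1*3 + 2)*4 + 3 = 23.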
|    FFTW computes an unnormalized transform: computing a forward followed
 | ||
| by a backward transform (or vice versa) will result in the original data
 | ||
| multiplied by the size of the transform (the product of the dimensions).
 | ||
| For more information, see *note What FFTW Really Computes::.
 | ||
| 
 | ||
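The effect of the unnormalized convention can be seen with a direct (non-fast) DFT of size 4, written here independently of FFTW.  The names 'cpx' and 'dft4' are hypothetical; size 4 is chosen only because its twiddle factors are exactly 1, i, -1, and -i, so no trigonometric functions are needed.

```c
/* Hypothetical array-style complex type: {real, imaginary}. */
typedef double cpx[2];

/* Direct size-4 DFT: out[k] = sum over j of in[j] * w^(j*k), where
   w = exp(sign * 2*pi*i / 4) = sign * i.  Pass sign = -1 for the
   forward transform and sign = +1 for the backward transform.
   No normalization is applied, mirroring the convention in the text. */
void dft4(cpx in[4], cpx out[4], int sign)
{
    /* Real and imaginary parts of w^0, w^1, w^2, w^3. */
    const double wr[4] = { 1.0, 0.0, -1.0, 0.0 };
    const double wi[4] = { 0.0, (double)sign, 0.0, (double)-sign };

    for (int k = 0; k < 4; ++k) {
        double re = 0.0, im = 0.0;
        for (int j = 0; j < 4; ++j) {
            int p = (j * k) % 4;  /* exponent of w, reduced modulo 4 */
            re += in[j][0] * wr[p] - in[j][1] * wi[p];
            im += in[j][0] * wi[p] + in[j][1] * wr[p];
        }
        out[k][0] = re;
        out[k][1] = im;
    }
}
```

Applying 'dft4' with sign -1 and then with sign +1 returns every input value multiplied by 4, so dividing by the transform size recovers the original data.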
| 
 | ||
| File: fftw3.info,  Node: Planner Flags,  Next: Real-data DFTs,  Prev: Complex DFTs,  Up: Basic Interface
 | ||
| 
 | ||
| 4.3.2 Planner Flags
 | ||
| -------------------
 | ||
| 
 | ||
| All of the planner routines in FFTW accept an integer 'flags' argument,
 | ||
| which is a bitwise OR ('|') of zero or more of the flag constants
 | ||
| defined below.  These flags control the rigor (and time) of the planning
 | ||
| process, and can also impose (or lift) restrictions on the type of
 | ||
| transform algorithm that is employed.
 | ||
| 
 | ||
|    _Important:_ the planner overwrites the input array during planning
 | ||
| unless a saved plan (*note Wisdom::) is available for that problem, so
 | ||
| you should initialize your input data after creating the plan.  The only
 | ||
| exceptions to this are the 'FFTW_ESTIMATE' and 'FFTW_WISDOM_ONLY' flags,
 | ||
| as mentioned below.
 | ||
| 
 | ||
|    In all cases, if wisdom is available for the given problem that was
 | ||
| created with equal-or-greater planning rigor, then the more rigorous
 | ||
| wisdom is used.  For example, in 'FFTW_ESTIMATE' mode any available
 | ||
| wisdom is used, whereas in 'FFTW_PATIENT' mode only wisdom created in
 | ||
| patient or exhaustive mode can be used.  *Note Words of Wisdom-Saving
 | ||
| Plans::.
 | ||
| 
 | ||
| Planning-rigor flags
 | ||
| ....................
 | ||
| 
 | ||
|    * 'FFTW_ESTIMATE' specifies that, instead of actual measurements of
 | ||
|      different algorithms, a simple heuristic is used to pick a
 | ||
|      (probably sub-optimal) plan quickly.  With this flag, the
 | ||
|      input/output arrays are not overwritten during planning.
 | ||
| 
 | ||
|    * 'FFTW_MEASURE' tells FFTW to find an optimized plan by actually
 | ||
|      _computing_ several FFTs and measuring their execution time.
 | ||
|      Depending on your machine, this can take some time (often a few
 | ||
|      seconds).  'FFTW_MEASURE' is the default planning option.
 | ||
| 
 | ||
|    * 'FFTW_PATIENT' is like 'FFTW_MEASURE', but considers a wider range
 | ||
|      of algorithms and often produces a faster plan, at the expense of
 | ||
|      a planning time that is several times longer.  Both effects are
 | ||
|      most pronounced for large transforms.
 | ||
| 
 | ||
|    * 'FFTW_EXHAUSTIVE' is like 'FFTW_PATIENT', but considers an even
 | ||
|      wider range of algorithms, including many that we think are
 | ||
|      unlikely to be fast, to produce the best plan it can find, but with
 | ||
|      a substantially increased planning time.
 | ||
| 
 | ||
|    * 'FFTW_WISDOM_ONLY' is a special planning mode in which the plan is
 | ||
|      only created if wisdom is available for the given problem, and
 | ||
|      otherwise a 'NULL' plan is returned.  This can be combined with
 | ||
|      other flags, e.g.  'FFTW_WISDOM_ONLY | FFTW_PATIENT' creates a plan
 | ||
|      only if wisdom is available that was created in 'FFTW_PATIENT' or
 | ||
|      'FFTW_EXHAUSTIVE' mode.  The 'FFTW_WISDOM_ONLY' flag is intended
 | ||
|      for users who need to detect whether wisdom is available; for
 | ||
|      example, if wisdom is not available one may wish to allocate new
 | ||
|      arrays for planning so that user data is not overwritten.
 | ||
| 
 | ||
| Algorithm-restriction flags
 | ||
| ...........................
 | ||
| 
 | ||
|    * 'FFTW_DESTROY_INPUT' specifies that an out-of-place transform is
 | ||
|      allowed to _overwrite its input_ array with arbitrary data; this
 | ||
|      can sometimes allow more efficient algorithms to be employed.
 | ||
| 
 | ||
|    * 'FFTW_PRESERVE_INPUT' specifies that an out-of-place transform must
 | ||
|      _not change its input_ array.  This is ordinarily the _default_,
 | ||
|      except for c2r and hc2r (i.e.  complex-to-real) transforms for
 | ||
|      which 'FFTW_DESTROY_INPUT' is the default.  In the latter cases,
 | ||
|      passing 'FFTW_PRESERVE_INPUT' will attempt to use algorithms that
 | ||
|      do not destroy the input, at the expense of worse performance; for
 | ||
|      multi-dimensional c2r transforms, however, no input-preserving
 | ||
|      algorithms are implemented and the planner will return 'NULL' if
 | ||
|      one is requested.
 | ||
| 
 | ||
|    * 'FFTW_UNALIGNED' specifies that the algorithm may not impose any
 | ||
|      unusual alignment requirements on the input/output arrays (i.e.  no
 | ||
|      SIMD may be used).  This flag is normally _not necessary_, since
 | ||
|      the planner automatically detects misaligned arrays.  The only use
 | ||
|      for this flag is if you want to use the new-array execute interface
 | ||
|      to execute a given plan on a different array that may not be
 | ||
|      aligned like the original.  (Using 'fftw_malloc' makes this flag
 | ||
|      unnecessary even then.  You can also use 'fftw_alignment_of' to
 | ||
|      detect whether two arrays are equivalently aligned.)
 | ||
| 
 | ||
| Limiting planning time
 | ||
| ......................
 | ||
| 
 | ||
|      extern void fftw_set_timelimit(double seconds);
 | ||
| 
 | ||
|    This function instructs FFTW to spend at most 'seconds' seconds
 | ||
| (approximately) in the planner.  If 'seconds == FFTW_NO_TIMELIMIT' (the
 | ||
| default value, which is negative), then planning time is unbounded.
 | ||
| Otherwise, FFTW plans with a progressively wider range of algorithms
 | ||
| until the given time limit is reached or the given range of algorithms
 | ||
| is explored, returning the best available plan.
 | ||
| 
 | ||
|    For example, specifying 'FFTW_PATIENT' first plans in 'FFTW_ESTIMATE'
 | ||
| mode, then in 'FFTW_MEASURE' mode, then finally (time permitting) in
 | ||
| 'FFTW_PATIENT'.  If 'FFTW_EXHAUSTIVE' is specified instead, the planner
 | ||
| will further progress to 'FFTW_EXHAUSTIVE' mode.
 | ||
| 
 | ||
|    Note that the 'seconds' argument specifies only a rough limit; in
 | ||
| practice, the planner may use somewhat more time if the time limit is
 | ||
| reached when the planner is in the middle of an operation that cannot be
 | ||
| interrupted.  At the very least, the planner will complete planning in
 | ||
| 'FFTW_ESTIMATE' mode (which is thus equivalent to a time limit of 0).
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Real-data DFTs,  Next: Real-data DFT Array Format,  Prev: Planner Flags,  Up: Basic Interface
 | ||
| 
 | ||
| 4.3.3 Real-data DFTs
 | ||
| --------------------
 | ||
| 
 | ||
|      fftw_plan fftw_plan_dft_r2c_1d(int n0,
 | ||
|                                     double *in, fftw_complex *out,
 | ||
|                                     unsigned flags);
 | ||
|      fftw_plan fftw_plan_dft_r2c_2d(int n0, int n1,
 | ||
|                                     double *in, fftw_complex *out,
 | ||
|                                     unsigned flags);
 | ||
|      fftw_plan fftw_plan_dft_r2c_3d(int n0, int n1, int n2,
 | ||
|                                     double *in, fftw_complex *out,
 | ||
|                                     unsigned flags);
 | ||
|      fftw_plan fftw_plan_dft_r2c(int rank, const int *n,
 | ||
|                                  double *in, fftw_complex *out,
 | ||
|                                  unsigned flags);
 | ||
| 
 | ||
|    Plan a real-input/complex-output discrete Fourier transform (DFT) in
 | ||
| zero or more dimensions, returning an 'fftw_plan' (*note Using Plans::).
 | ||
| 
 | ||
|    Once you have created a plan for a certain transform type and
 | ||
| parameters, then creating another plan of the same type and parameters,
 | ||
| but for different arrays, is fast and shares constant data with the
 | ||
| first plan (if it still exists).
 | ||
| 
 | ||
|    The planner returns 'NULL' if the plan cannot be created.  A
 | ||
| non-'NULL' plan is always returned by the basic interface unless you are
 | ||
| using a customized FFTW configuration supporting a restricted set of
 | ||
| transforms, or if you use the 'FFTW_PRESERVE_INPUT' flag with a
 | ||
| multi-dimensional out-of-place c2r transform (see below).
 | ||
| 
 | ||
| Arguments
 | ||
| .........
 | ||
| 
 | ||
|    * 'rank' is the rank of the transform (it should be the size of the
 | ||
|      array '*n'), and can be any non-negative integer.  (*Note Complex
 | ||
|      Multi-Dimensional DFTs::, for the definition of "rank".)  The
 | ||
|      '_1d', '_2d', and '_3d' planners correspond to a 'rank' of '1',
 | ||
|      '2', and '3', respectively.  The rank may be zero, which is
 | ||
|      equivalent to a rank-1 transform of size 1, i.e.  a copy of one
 | ||
|      real number (with zero imaginary part) from input to output.
 | ||
| 
 | ||
|    * 'n0', 'n1', 'n2', or 'n[0..rank-1]' (as appropriate for each
 | ||
|      routine) specify the size of the transform dimensions.  They can be
 | ||
|      any positive integer.  This is different in general from the
 | ||
|      _physical_ array dimensions, which are described in *note Real-data
 | ||
|      DFT Array Format::.
 | ||
| 
 | ||
|         - FFTW is best at handling sizes of the form 2^a 3^b 5^c 7^d
 | ||
|           11^e 13^f, where e+f is either 0 or 1, and the other exponents
 | ||
|           are arbitrary.  Other sizes are computed by means of a slow,
 | ||
|           general-purpose algorithm (which nevertheless retains O(n log
 | ||
|           n) performance even for prime sizes).  (It is possible to
 | ||
|           customize FFTW for different array sizes; see *note
 | ||
|           Installation and Customization::.)  Transforms whose sizes are
 | ||
|           powers of 2 are especially fast, and it is generally
 | ||
|           beneficial for the _last_ dimension of an r2c/c2r transform to
 | ||
|           be _even_.
 | ||
| 
 | ||
|    * 'in' and 'out' point to the input and output arrays of the
 | ||
|      transform, which may be the same (yielding an in-place transform).
 | ||
|      These arrays are overwritten during planning, unless
 | ||
|      'FFTW_ESTIMATE' is used in the flags.  (The arrays need not be
 | ||
|      initialized, but they must be allocated.)  For an in-place
 | ||
|      transform, it is important to remember that the real array will
 | ||
|      require padding, described in *note Real-data DFT Array Format::.
 | ||
| 
 | ||
|    * 'flags' is a bitwise OR ('|') of zero or more planner flags, as
 | ||
|      defined in *note Planner Flags::.
 | ||
| 
 | ||
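The size recommendation in the argument list can be checked mechanically.  The following sketch tests whether a size has the form 2^a 3^b 5^c 7^d 11^e 13^f with e+f equal to 0 or 1; 'is_fast_size' is a hypothetical helper, not an FFTW function, and it says nothing about sizes supported by a customized build.

```c
/* Return 1 if n = 2^a 3^b 5^c 7^d 11^e 13^f with e+f equal to 0 or 1,
   and 0 otherwise (including for n < 1). */
int is_fast_size(int n)
{
    static const int small_primes[4] = { 2, 3, 5, 7 };

    if (n < 1)
        return 0;

    /* Strip all factors of 2, 3, 5, and 7; their exponents are arbitrary. */
    for (int i = 0; i < 4; ++i)
        while (n % small_primes[i] == 0)
            n /= small_primes[i];

    /* What remains must be 1, or a single factor of 11 or 13. */
    return n == 1 || n == 11 || n == 13;
}
```

For example, 1024 and 88 (= 8 * 11) qualify, whereas 121 (= 11^2), 143 (= 11 * 13), and the prime 17 do not.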
|    The inverse transforms, taking complex input (storing the
 | ||
| non-redundant half of a logically Hermitian array) to real output, are
 | ||
| given by:
 | ||
| 
 | ||
|      fftw_plan fftw_plan_dft_c2r_1d(int n0,
 | ||
|                                     fftw_complex *in, double *out,
 | ||
|                                     unsigned flags);
 | ||
|      fftw_plan fftw_plan_dft_c2r_2d(int n0, int n1,
 | ||
|                                     fftw_complex *in, double *out,
 | ||
|                                     unsigned flags);
 | ||
|      fftw_plan fftw_plan_dft_c2r_3d(int n0, int n1, int n2,
 | ||
|                                     fftw_complex *in, double *out,
 | ||
|                                     unsigned flags);
 | ||
|      fftw_plan fftw_plan_dft_c2r(int rank, const int *n,
 | ||
|                                  fftw_complex *in, double *out,
 | ||
|                                  unsigned flags);
 | ||
| 
 | ||
|    The arguments are the same as for the r2c transforms, except that the
 | ||
| input and output data formats are reversed.
 | ||
| 
 | ||
|    FFTW computes an unnormalized transform: computing an r2c followed by
 | ||
| a c2r transform (or vice versa) will result in the original data
 | ||
| multiplied by the size of the transform (the product of the logical
 | ||
| dimensions).  An r2c transform produces the same output as a
 | ||
| 'FFTW_FORWARD' complex DFT of the same input, and a c2r transform is
 | ||
| correspondingly equivalent to 'FFTW_BACKWARD'.  For more information,
 | ||
| see *note What FFTW Really Computes::.
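Because the transforms are unnormalized, a round trip (r2c followed by c2r) has to be divided by the product of the logical dimensions to recover the original data. The following is a minimal sketch of that rescaling step; it does not call FFTW, and the helper names 'logical_size' and 'rescale_round_trip' are hypothetical, not part of the FFTW API.

```c
#include <assert.h>
#include <stddef.h>

/* Product of the logical dimensions n[0] x n[1] x ... x n[rank-1].
   Hypothetical helper for illustration; not an FFTW function. */
size_t logical_size(int rank, const int *n)
{
    size_t total = 1;
    for (int i = 0; i < rank; ++i) {
        assert(n[i] > 0);
        total *= (size_t)n[i];
    }
    return total;
}

/* Divide 'count' real values by the logical transform size, undoing the
   scaling introduced by an unnormalized r2c + c2r round trip. */
void rescale_round_trip(double *data, size_t count, int rank, const int *n)
{
    const double size = (double)logical_size(rank, n);
    for (size_t i = 0; i < count; ++i)
        data[i] /= size;
}
```

Note that the divisor is the logical size, not the physical (possibly padded) size of an in-place real array.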
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Real-data DFT Array Format,  Next: Real-to-Real Transforms,  Prev: Real-data DFTs,  Up: Basic Interface
 | ||
| 
 | ||
| 4.3.4 Real-data DFT Array Format
 | ||
| --------------------------------
 | ||
| 
 | ||
| The output of a DFT of real data (r2c) contains symmetries that, in
 | ||
| principle, make half of the outputs redundant (*note What FFTW Really
 | ||
| Computes::).  (Similarly for the input of an inverse c2r transform.)  In
 | ||
| practice, it is not possible to entirely realize these savings in an
 | ||
| efficient and understandable format that generalizes to
 | ||
| multi-dimensional transforms.  Instead, the output of the r2c transforms
 | ||
| is _slightly_ over half of the output of the corresponding complex
 | ||
| transform.  We do not "pack" the data in any way, but store it as an
 | ||
| ordinary array of 'fftw_complex' values.  In fact, this data is simply a
 | ||
| subsection of what would be the array in the corresponding complex
 | ||
| transform.
 | ||
| 
 | ||
|    Specifically, for a real transform of d (= 'rank') dimensions n[0] x
 | ||
| n[1] x n[2] x ...  x n[d-1] , the complex data is an n[0] x n[1] x n[2]
 | ||
| x ...  x (n[d-1]/2 + 1) array of 'fftw_complex' values in row-major
 | ||
| order (with the division rounded down).  That is, we only store the
 | ||
| _lower_ half (non-negative frequencies), plus one element, of the last
 | ||
| dimension of the data from the ordinary complex transform.  (We could
 | ||
| have instead taken half of any other dimension, but implementation turns
 | ||
| out to be simpler if the last, contiguous, dimension is used.)
 | ||
| 
 | ||
|    For an out-of-place transform, the real data is simply an array with
 | ||
| physical dimensions n[0] x n[1] x n[2] x ...  x n[d-1] in row-major
 | ||
| order.
 | ||
| 
 | ||
|    For an in-place transform, some complications arise since the complex
 | ||
| data is slightly larger than the real data.  In this case, the final
 | ||
| dimension of the real data must be _padded_ with extra values to
 | ||
| accommodate the size of the complex data--two extra if the last
 | ||
| dimension is even and one if it is odd.  That is, the last dimension of
 | ||
| the real data must physically contain 2 * (n[d-1]/2+1) 'double' values
 | ||
| (exactly enough to hold the complex data).  This physical array size
 | ||
| does not, however, change the _logical_ array size--only n[d-1] values
 | ||
| are actually stored in the last dimension, and n[d-1] is the last
 | ||
| dimension passed to the planner.
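The size rules above can be sketched as a few small helpers. This is only an illustration of the arithmetic, assuming rank >= 1; the function names are hypothetical and nothing here calls FFTW.

```c
#include <assert.h>
#include <stddef.h>

/* Number of complex elements in the r2c output:
   n[0] x ... x n[rank-2] x (n[rank-1]/2 + 1), division rounded down. */
size_t r2c_complex_count(int rank, const int *n)
{
    assert(rank >= 1);
    size_t count = 1;
    for (int i = 0; i < rank - 1; ++i)
        count *= (size_t)n[i];
    return count * (size_t)(n[rank - 1] / 2 + 1);
}

/* Number of 'double' values the real array must physically hold.
   In-place: the last dimension is padded to 2 * (n[rank-1]/2 + 1),
   which is exactly twice the complex element count. */
size_t r2c_real_count(int rank, const int *n, int in_place)
{
    assert(rank >= 1);
    if (in_place)
        return 2 * r2c_complex_count(rank, n);
    size_t count = 1;
    for (int i = 0; i < rank; ++i)
        count *= (size_t)n[i];
    return count;
}

/* Offset of logical real element (i, j) in a 2d n0 x n1 array.
   Only the physical row length changes for an in-place transform. */
size_t real_index_2d(int n1, int in_place, int i, int j)
{
    assert(0 <= j && j < n1);
    size_t row = in_place ? 2 * (size_t)(n1 / 2 + 1) : (size_t)n1;
    return (size_t)i * row + (size_t)j;
}
```

For example, a 4 x 6 in-place transform needs rows of 8 doubles, of which only the first 6 in each row carry logical data.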
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Real-to-Real Transforms,  Next: Real-to-Real Transform Kinds,  Prev: Real-data DFT Array Format,  Up: Basic Interface
 | ||
| 
 | ||
| 4.3.5 Real-to-Real Transforms
 | ||
| -----------------------------
 | ||
| 
 | ||
|      fftw_plan fftw_plan_r2r_1d(int n, double *in, double *out,
 | ||
|                                 fftw_r2r_kind kind, unsigned flags);
 | ||
|      fftw_plan fftw_plan_r2r_2d(int n0, int n1, double *in, double *out,
 | ||
|                                 fftw_r2r_kind kind0, fftw_r2r_kind kind1,
 | ||
|                                 unsigned flags);
 | ||
|      fftw_plan fftw_plan_r2r_3d(int n0, int n1, int n2,
 | ||
|                                 double *in, double *out,
 | ||
|                                 fftw_r2r_kind kind0,
 | ||
|                                 fftw_r2r_kind kind1,
 | ||
|                                 fftw_r2r_kind kind2,
 | ||
|                                 unsigned flags);
 | ||
|      fftw_plan fftw_plan_r2r(int rank, const int *n, double *in, double *out,
 | ||
|                              const fftw_r2r_kind *kind, unsigned flags);
 | ||
| 
 | ||
|    Plan a real input/output (r2r) transform of various kinds in zero or
 | ||
| more dimensions, returning an 'fftw_plan' (*note Using Plans::).
 | ||
| 
 | ||
|    Once you have created a plan for a certain transform type and
 | ||
| parameters, then creating another plan of the same type and parameters,
 | ||
| but for different arrays, is fast and shares constant data with the
 | ||
| first plan (if it still exists).
 | ||
| 
 | ||
|    The planner returns 'NULL' if the plan cannot be created.  A
 | ||
| non-'NULL' plan is always returned by the basic interface unless you are
 | ||
| using a customized FFTW configuration supporting a restricted set of
 | ||
| transforms, or for size-1 'FFTW_REDFT00' kinds (which are not defined).
 | ||
| 
 | ||
| Arguments
 | ||
| .........
 | ||
| 
 | ||
|    * 'rank' is the dimensionality of the transform (it should be the
 | ||
|      size of the arrays '*n' and '*kind'), and can be any non-negative
 | ||
|      integer.  The '_1d', '_2d', and '_3d' planners correspond to a
 | ||
|      'rank' of '1', '2', and '3', respectively.  A 'rank' of zero is
 | ||
|      equivalent to a copy of one number from input to output.
 | ||
| 
 | ||
|    * 'n', or 'n0'/'n1'/'n2', or 'n[rank]', respectively, gives the
 | ||
|      (physical) size of the transform dimensions.  They can be any
 | ||
|      positive integer.
 | ||
| 
 | ||
|         - Multi-dimensional arrays are stored in row-major order with
 | ||
|           dimensions: 'n0' x 'n1'; or 'n0' x 'n1' x 'n2'; or 'n[0]' x
 | ||
|           'n[1]' x ...  x 'n[rank-1]'.  *Note Multi-dimensional Array
 | ||
|           Format::.
 | ||
|         - FFTW is generally best at handling sizes of the form 2^a 3^b
 | ||
|           5^c 7^d 11^e 13^f, where e+f is either 0 or 1, and the other
 | ||
|           exponents are arbitrary.  Other sizes are computed by means of
 | ||
|           a slow, general-purpose algorithm (which nevertheless retains
 | ||
|           O(n log n) performance even for prime sizes).  (It is possible
 | ||
|           to customize FFTW for different array sizes; see *note
 | ||
|           Installation and Customization::.)  Transforms whose sizes are
 | ||
|           powers of 2 are especially fast.
 | ||
|         - For a 'REDFT00' or 'RODFT00' transform kind in a dimension of
 | ||
|           size n, it is n-1 or n+1, respectively, that should be
 | ||
|           factorizable in the above form.
 | ||
| 
 | ||
|    * 'in' and 'out' point to the input and output arrays of the
 | ||
|      transform, which may be the same (yielding an in-place transform).
 | ||
|      These arrays are overwritten during planning, unless
 | ||
|      'FFTW_ESTIMATE' is used in the flags.  (The arrays need not be
 | ||
|      initialized, but they must be allocated.)
 | ||
| 
 | ||
|    * 'kind', or 'kind0'/'kind1'/'kind2', or 'kind[rank]', is the kind of
 | ||
|      r2r transform used for the corresponding dimension.  The valid kind
 | ||
|      constants are described in *note Real-to-Real Transform Kinds::.
 | ||
|      In a multi-dimensional transform, what is computed is the separable
 | ||
|      product formed by taking each transform kind along the
 | ||
|      corresponding dimension, one dimension after another.
 | ||
| 
 | ||
|    * 'flags' is a bitwise OR ('|') of zero or more planner flags, as
 | ||
|      defined in *note Planner Flags::.
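The size guidance in the list above can be checked mechanically. The sketch below tests whether a size has the form 2^a 3^b 5^c 7^d 11^e 13^f with e+f equal to 0 or 1, and applies the n-1 / n+1 adjustment for 'REDFT00' and 'RODFT00'. The function names are hypothetical; FFTW itself exposes no such query, and a size that fails the test is still valid, only potentially slower.

```c
#include <limits.h>

/* Returns 1 if n = 2^a 3^b 5^c 7^d 11^e 13^f with e+f <= 1, else 0. */
int is_fast_size(int n)
{
    static const int small[] = { 2, 3, 5, 7 };
    if (n <= 0)
        return 0;
    for (int i = 0; i < 4; ++i)
        while (n % small[i] == 0)
            n /= small[i];
    /* After removing 2, 3, 5 and 7, at most one factor of 11 or 13
       may remain. */
    return n == 1 || n == 11 || n == 13;
}

/* For a REDFT00 dimension of size n, it is n-1 that should factor well.
   Size 1 is rejected because that transform is not defined. */
int is_fast_redft00_size(int n)
{
    return n > 1 && is_fast_size(n - 1);
}

/* For a RODFT00 dimension of size n, it is n+1 that should factor well. */
int is_fast_rodft00_size(int n)
{
    return n > 0 && n < INT_MAX && is_fast_size(n + 1);
}
```

A caller might use such a check to pick a padded size before planning, for instance choosing 129 rather than 128 points for a DCT-I.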
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Real-to-Real Transform Kinds,  Prev: Real-to-Real Transforms,  Up: Basic Interface
 | ||
| 
 | ||
| 4.3.6 Real-to-Real Transform Kinds
 | ||
| ----------------------------------
 | ||
| 
 | ||
| FFTW currently supports 11 different r2r transform kinds, specified by
 | ||
| one of the constants below.  For the precise definitions of these
 | ||
| transforms, see *note What FFTW Really Computes::.  For a more
 | ||
| colloquial introduction to these transform kinds, see *note More DFTs of
 | ||
| Real Data::.
 | ||
| 
 | ||
|    For a dimension of size 'n', there is a corresponding "logical"
 | ||
| dimension 'N' that determines the normalization (and the optimal
 | ||
| factorization); the formula for 'N' is given for each kind below.  Also,
 | ||
| with each transform kind is listed its corresponding inverse transform.
 | ||
| FFTW computes unnormalized transforms: a transform followed by its
 | ||
| inverse will result in the original data multiplied by 'N' (or the
 | ||
| product of the 'N''s for each dimension, in multi-dimensions).
 | ||
| 
 | ||
|    * 'FFTW_R2HC' computes a real-input DFT with output in "halfcomplex"
 | ||
|      format, i.e.  real and imaginary parts for a transform of size 'n'
 | ||
|      stored as: r0, r1, r2, ..., r(n/2), i((n+1)/2-1), ..., i2, i1.  (Logical
 | ||
|      'N=n', inverse is 'FFTW_HC2R'.)
 | ||
| 
 | ||
|    * 'FFTW_HC2R' computes the reverse of 'FFTW_R2HC', above.  (Logical
 | ||
|      'N=n', inverse is 'FFTW_R2HC'.)
 | ||
| 
 | ||
|    * 'FFTW_DHT' computes a discrete Hartley transform.  (Logical 'N=n',
 | ||
|      inverse is 'FFTW_DHT'.)
 | ||
| 
 | ||
|    * 'FFTW_REDFT00' computes an REDFT00 transform, i.e.  a DCT-I.
 | ||
|      (Logical 'N=2*(n-1)', inverse is 'FFTW_REDFT00'.)
 | ||
| 
 | ||
|    * 'FFTW_REDFT10' computes an REDFT10 transform, i.e.  a DCT-II
 | ||
|      (sometimes called "the" DCT). (Logical 'N=2*n', inverse is
 | ||
|      'FFTW_REDFT01'.)
 | ||
| 
 | ||
|    * 'FFTW_REDFT01' computes an REDFT01 transform, i.e.  a DCT-III
 | ||
|      (sometimes called "the" IDCT, being the inverse of DCT-II).
 | ||
|      (Logical 'N=2*n', inverse is 'FFTW_REDFT10'.)
 | ||
| 
 | ||
|    * 'FFTW_REDFT11' computes an REDFT11 transform, i.e.  a DCT-IV.
 | ||
|      (Logical 'N=2*n', inverse is 'FFTW_REDFT11'.)
 | ||
| 
 | ||
|    * 'FFTW_RODFT00' computes an RODFT00 transform, i.e.  a DST-I.
 | ||
|      (Logical 'N=2*(n+1)', inverse is 'FFTW_RODFT00'.)
 | ||
| 
 | ||
|    * 'FFTW_RODFT10' computes an RODFT10 transform, i.e.  a DST-II.
 | ||
|      (Logical 'N=2*n', inverse is 'FFTW_RODFT01'.)
 | ||
| 
 | ||
|    * 'FFTW_RODFT01' computes an RODFT01 transform, i.e.  a DST-III.
 | ||
|      (Logical 'N=2*n', inverse is 'FFTW_RODFT10'.)
 | ||
| 
 | ||
|    * 'FFTW_RODFT11' computes an RODFT11 transform, i.e.  a DST-IV.
 | ||
|      (Logical 'N=2*n', inverse is 'FFTW_RODFT11'.)
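The list above can be summarized in code. The sketch below uses a local enum as a stand-in for 'fftw_r2r_kind' so that it compiles without 'fftw3.h'; the enum, its constant names, and all function names are hypothetical. It tabulates the logical size 'N' and the inverse kind, and gives the array positions implied by the halfcomplex layout.

```c
#include <assert.h>

/* Local stand-in for fftw_r2r_kind (assumption: illustration only). */
typedef enum {
    KIND_R2HC, KIND_HC2R, KIND_DHT,
    KIND_REDFT00, KIND_REDFT10, KIND_REDFT01, KIND_REDFT11,
    KIND_RODFT00, KIND_RODFT10, KIND_RODFT01, KIND_RODFT11
} r2r_kind_sketch;

/* Logical size N for a dimension of physical size n. */
long logical_n(r2r_kind_sketch kind, int n)
{
    switch (kind) {
    case KIND_R2HC: case KIND_HC2R: case KIND_DHT:
        return n;
    case KIND_REDFT00:
        return 2L * (n - 1);
    case KIND_RODFT00:
        return 2L * (n + 1);
    default:                      /* all remaining DCT/DST kinds */
        return 2L * n;
    }
}

/* Kind that undoes 'kind' up to the factor N. */
r2r_kind_sketch inverse_kind(r2r_kind_sketch kind)
{
    switch (kind) {
    case KIND_R2HC:    return KIND_HC2R;
    case KIND_HC2R:    return KIND_R2HC;
    case KIND_REDFT10: return KIND_REDFT01;
    case KIND_REDFT01: return KIND_REDFT10;
    case KIND_RODFT10: return KIND_RODFT01;
    case KIND_RODFT01: return KIND_RODFT10;
    default:           return kind;   /* the other kinds are self-inverse */
    }
}

/* Halfcomplex layout: real part of frequency k (0 <= k <= n/2). */
int hc_real_index(int n, int k)
{
    assert(0 <= k && k <= n / 2);
    return k;
}

/* Halfcomplex layout: imaginary part of frequency k (0 < k < (n+1)/2);
   the imaginary parts are stored in reverse order at the end. */
int hc_imag_index(int n, int k)
{
    assert(0 < k && k < (n + 1) / 2);
    return n - k;
}
```

To normalize a multi-dimensional round trip, one would divide by the product of 'logical_n' over all dimensions.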
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Advanced Interface,  Next: Guru Interface,  Prev: Basic Interface,  Up: FFTW Reference
 | ||
| 
 | ||
| 4.4 Advanced Interface
 | ||
| ======================
 | ||
| 
 | ||
| FFTW's "advanced" interface supplements the basic interface with four
 | ||
| new planner routines, providing a new level of flexibility: you can plan
 | ||
| a transform of multiple arrays simultaneously, operate on non-contiguous
 | ||
| (strided) data, and transform a subset of a larger multi-dimensional
 | ||
| array.  Other than these additional features, the planner operates in
 | ||
| the same fashion as in the basic interface, and the resulting
 | ||
| 'fftw_plan' is used in the same way (*note Using Plans::).
 | ||
| 
 | ||
| * Menu:
 | ||
| 
 | ||
| * Advanced Complex DFTs::
 | ||
| * Advanced Real-data DFTs::
 | ||
| * Advanced Real-to-real Transforms::
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Advanced Complex DFTs,  Next: Advanced Real-data DFTs,  Prev: Advanced Interface,  Up: Advanced Interface
 | ||
| 
 | ||
| 4.4.1 Advanced Complex DFTs
 | ||
| ---------------------------
 | ||
| 
 | ||
|      fftw_plan fftw_plan_many_dft(int rank, const int *n, int howmany,
 | ||
|                                   fftw_complex *in, const int *inembed,
 | ||
|                                   int istride, int idist,
 | ||
|                                   fftw_complex *out, const int *onembed,
 | ||
|                                   int ostride, int odist,
 | ||
|                                   int sign, unsigned flags);
 | ||
| 
 | ||
|    This routine plans multiple multidimensional complex DFTs, and it
 | ||
| extends the 'fftw_plan_dft' routine (*note Complex DFTs::) to compute
 | ||
| 'howmany' transforms, each having rank 'rank' and size 'n'.  In
 | ||
| addition, the transform data need not be contiguous, but it may be laid
 | ||
| out in memory with an arbitrary stride.  To account for these
 | ||
| possibilities, 'fftw_plan_many_dft' adds the new parameters 'howmany',
 | ||
| {'i','o'}'nembed', {'i','o'}'stride', and {'i','o'}'dist'.  The FFTW
 | ||
| basic interface (*note Complex DFTs::) provides routines specialized for
 | ||
| ranks 1, 2, and 3, but the advanced interface handles only the
 | ||
| general-rank case.
 | ||
| 
 | ||
|    'howmany' is the (nonnegative) number of transforms to compute.  The
 | ||
| resulting plan computes 'howmany' transforms, where the input of the
 | ||
| 'k'-th transform is at location 'in+k*idist' (in C pointer arithmetic),
 | ||
| and its output is at location 'out+k*odist'.  Plans obtained in this way
 | ||
| can often be faster than calling FFTW multiple times for the individual
 | ||
| transforms.  The basic 'fftw_plan_dft' interface corresponds to
 | ||
| 'howmany=1' (in which case the 'dist' parameters are ignored).
 | ||
| 
 | ||
|    Each of the 'howmany' transforms has rank 'rank' and size 'n', as in
 | ||
| the basic interface.  In addition, the advanced interface allows the
 | ||
| input and output arrays of each transform to be row-major subarrays of
 | ||
| larger rank-'rank' arrays, described by 'inembed' and 'onembed'
 | ||
| parameters, respectively.  {'i','o'}'nembed' must be arrays of length
 | ||
| 'rank', and 'n' should be elementwise less than or equal to
 | ||
| {'i','o'}'nembed'.  Passing 'NULL' for an 'nembed' parameter is
 | ||
| equivalent to passing 'n' (i.e.  same physical and logical dimensions,
 | ||
| as in the basic interface).
 | ||
| 
 | ||
|    The 'stride' parameters indicate that the 'j'-th element of the input
 | ||
| or output arrays is located at 'j*istride' or 'j*ostride', respectively.
 | ||
| (For a multi-dimensional array, 'j' is the ordinary row-major index.)
 | ||
| When combined with the 'k'-th transform in a 'howmany' loop, from above,
 | ||
| this means that the ('j','k')-th element is at 'j*stride+k*dist'.  (The
 | ||
| basic 'fftw_plan_dft' interface corresponds to a stride of 1.)
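The addressing rule can be written out as a small helper. This sketch assumes that, when an 'nembed' array is given, 'j' is the row-major index within the larger embedding array (with 'nembed' equal to 'n' in the 'NULL' case). The function names are hypothetical and nothing here calls FFTW.

```c
#include <assert.h>
#include <stddef.h>

/* Ordinary row-major index of the multi-index idx[] within an array
   whose physical dimensions are nembed[0] x ... x nembed[rank-1]. */
size_t row_major_index(int rank, const int *nembed, const int *idx)
{
    size_t j = 0;
    for (int d = 0; d < rank; ++d) {
        assert(0 <= idx[d] && idx[d] < nembed[d]);
        j = j * (size_t)nembed[d] + (size_t)idx[d];
    }
    return j;
}

/* Offset, in array elements, of element idx[] of the k-th transform:
   j*stride + k*dist. */
ptrdiff_t many_offset(int rank, const int *nembed, const int *idx,
                      int stride, int dist, int k)
{
    ptrdiff_t j = (ptrdiff_t)row_major_index(rank, nembed, idx);
    return j * stride + (ptrdiff_t)k * dist;
}
```

For three contiguous 5 by 6 arrays stored back to back (stride 1, dist 30), element (1,2) of the third array is at offset 8 + 2*30 = 68.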
 | ||
| 
 | ||
|    For in-place transforms, the input and output 'stride' and 'dist'
 | ||
| parameters should be the same; otherwise, the planner may return 'NULL'.
 | ||
| 
 | ||
|    Arrays 'n', 'inembed', and 'onembed' are not used after this function
 | ||
| returns.  You can safely free or reuse them.
 | ||
| 
 | ||
|    *Examples*: One transform of one 5 by 6 array contiguous in memory:
 | ||
|         int rank = 2;
 | ||
|         int n[] = {5, 6};
 | ||
|         int howmany = 1;
 | ||
|         int idist = 0, odist = 0; /* unused because howmany = 1 */
 | ||
|         int istride = 1, ostride = 1; /* array is contiguous in memory */
 | ||
|         int *inembed = n, *onembed = n;
 | ||
| 
 | ||
|    Transform of three 5 by 6 arrays, each contiguous in memory, stored
 | ||
| in memory one after another:
 | ||
|         int rank = 2;
 | ||
|         int n[] = {5, 6};
 | ||
|         int howmany = 3;
 | ||
|         int idist = n[0]*n[1], odist = idist; /* = 30, the distance in memory
 | ||
|                                           between the first element
 | ||
|                                           of the first array and the
 | ||
|                                           first element of the second array */
 | ||
|         int istride = 1, ostride = 1; /* array is contiguous in memory */
 | ||
|         int *inembed = n, *onembed = n;
 | ||
| 
 | ||
|    Transform each column of a 2d array with 10 rows and 3 columns:
 | ||
|         int rank = 1; /* not 2: we are computing 1d transforms */
 | ||
|         int n[] = {10}; /* 1d transforms of length 10 */
 | ||
|         int howmany = 3;
 | ||
|         int idist = 1, odist = 1;
 | ||
|         int istride = 3, ostride = 3; /* distance between two elements in
 | ||
|                                       the same column */
 | ||
|         int *inembed = n, *onembed = n;
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Advanced Real-data DFTs,  Next: Advanced Real-to-real Transforms,  Prev: Advanced Complex DFTs,  Up: Advanced Interface
 | ||
| 
 | ||
| 4.4.2 Advanced Real-data DFTs
 | ||
| -----------------------------
 | ||
| 
 | ||
|      fftw_plan fftw_plan_many_dft_r2c(int rank, const int *n, int howmany,
 | ||
|                                       double *in, const int *inembed,
 | ||
|                                       int istride, int idist,
 | ||
|                                       fftw_complex *out, const int *onembed,
 | ||
|                                       int ostride, int odist,
 | ||
|                                       unsigned flags);
 | ||
|      fftw_plan fftw_plan_many_dft_c2r(int rank, const int *n, int howmany,
 | ||
|                                       fftw_complex *in, const int *inembed,
 | ||
|                                       int istride, int idist,
 | ||
|                                       double *out, const int *onembed,
 | ||
|                                       int ostride, int odist,
 | ||
|                                       unsigned flags);
 | ||
| 
 | ||
|    Like 'fftw_plan_many_dft', these two functions add 'howmany',
 | ||
| 'nembed', 'stride', and 'dist' parameters to the 'fftw_plan_dft_r2c' and
 | ||
| 'fftw_plan_dft_c2r' functions, but otherwise behave the same as the
 | ||
| basic interface.
 | ||
| 
 | ||
|    The interpretation of 'howmany', 'stride', and 'dist' is the same as
 | ||
| for 'fftw_plan_many_dft', above.  Note that the 'stride' and 'dist' for
 | ||
| the real array are in units of 'double', and for the complex array are
 | ||
| in units of 'fftw_complex'.
 | ||
| 
 | ||
|    If an 'nembed' parameter is 'NULL', it is interpreted as what it
 | ||
| would be in the basic interface, as described in *note Real-data DFT
 | ||
| Array Format::.  That is, for the complex array the size is assumed to
 | ||
| be the same as 'n', but with the last dimension cut roughly in half.
 | ||
| For the real array, the size is assumed to be 'n' if the transform is
 | ||
| out-of-place, or 'n' with the last dimension "padded" if the transform
 | ||
| is in-place.
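The 'NULL' defaults described above can be spelled out explicitly. The following sketch fills in the physical sizes that a 'NULL' 'nembed' stands for, for the real and the complex array; the function name and its 'in_place' argument are hypothetical, and rank >= 1 is assumed.

```c
#include <assert.h>

/* Write the default physical dimensions of the real and complex arrays
   of an r2c/c2r transform with logical size n[0] x ... x n[rank-1]. */
void default_r2c_embed(int rank, const int *n, int in_place,
                       int *real_embed, int *complex_embed)
{
    assert(rank >= 1);
    for (int d = 0; d < rank; ++d) {
        real_embed[d] = n[d];
        complex_embed[d] = n[d];
    }
    /* Complex array: last dimension cut roughly in half. */
    int half = n[rank - 1] / 2 + 1;
    complex_embed[rank - 1] = half;
    /* Real array: last dimension padded only when in-place. */
    if (in_place)
        real_embed[rank - 1] = 2 * half;
}
```

Any non-'NULL' 'nembed' passed to the planner should then be at least as large as these values in every dimension.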
 | ||
| 
 | ||
|    If an 'nembed' parameter is non-'NULL', it is interpreted as the
 | ||
| physical size of the corresponding array, in row-major order, just as
 | ||
| for 'fftw_plan_many_dft'.  In this case, each dimension of 'nembed'
 | ||
| should be '>=' what it would be in the basic interface (e.g.  the halved
 | ||
| or padded 'n').
 | ||
| 
 | ||
|    Arrays 'n', 'inembed', and 'onembed' are not used after this function
 | ||
| returns.  You can safely free or reuse them.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Advanced Real-to-real Transforms,  Prev: Advanced Real-data DFTs,  Up: Advanced Interface
 | ||
| 
 | ||
| 4.4.3 Advanced Real-to-real Transforms
 | ||
| --------------------------------------
 | ||
| 
 | ||
|      fftw_plan fftw_plan_many_r2r(int rank, const int *n, int howmany,
 | ||
|                                   double *in, const int *inembed,
 | ||
|                                   int istride, int idist,
 | ||
|                                   double *out, const int *onembed,
 | ||
|                                   int ostride, int odist,
 | ||
|                                   const fftw_r2r_kind *kind, unsigned flags);
 | ||
| 
 | ||
|    Like 'fftw_plan_many_dft', this function adds 'howmany', 'nembed',
 | ||
| 'stride', and 'dist' parameters to the 'fftw_plan_r2r' function, but
 | ||
| otherwise behaves the same as the basic interface.  The interpretation of
 | ||
| those additional parameters is the same as for 'fftw_plan_many_dft'.
 | ||
| (Of course, the 'stride' and 'dist' parameters are now in units of
 | ||
| 'double', not 'fftw_complex'.)
 | ||
| 
 | ||
|    Arrays 'n', 'inembed', 'onembed', and 'kind' are not used after this
 | ||
| function returns.  You can safely free or reuse them.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Guru Interface,  Next: New-array Execute Functions,  Prev: Advanced Interface,  Up: FFTW Reference
 | ||
| 
 | ||
| 4.5 Guru Interface
 | ||
| ==================
 | ||
| 
 | ||
| The "guru" interface to FFTW is intended to expose as much as possible
 | ||
| of the flexibility in the underlying FFTW architecture.  It allows one
 | ||
| to compute multi-dimensional "vectors" (loops) of multi-dimensional
 | ||
| transforms, where each vector/transform dimension has an independent
 | ||
| size and stride.  One can also use more general complex-number formats,
 | ||
| e.g.  separate real and imaginary arrays.
 | ||
| 
 | ||
|    For those users who require the flexibility of the guru interface, it
 | ||
| is important that they pay special attention to the documentation lest
 | ||
| they shoot themselves in the foot.
 | ||
| 
 | ||
| * Menu:
 | ||
| 
 | ||
| * Interleaved and split arrays::
 | ||
| * Guru vector and transform sizes::
 | ||
| * Guru Complex DFTs::
 | ||
| * Guru Real-data DFTs::
 | ||
| * Guru Real-to-real Transforms::
 | ||
| * 64-bit Guru Interface::
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Interleaved and split arrays,  Next: Guru vector and transform sizes,  Prev: Guru Interface,  Up: Guru Interface
 | ||
| 
 | ||
| 4.5.1 Interleaved and split arrays
 | ||
| ----------------------------------
 | ||
| 
 | ||
| The guru interface supports two representations of complex numbers,
 | ||
| which we call the interleaved and the split format.
 | ||
| 
 | ||
|    The "interleaved" format is the same one used by the basic and
 | ||
| advanced interfaces, and it is documented in *note Complex numbers::.
 | ||
| In the interleaved format, you provide pointers to the real part of a
 | ||
| complex number, and the imaginary part is understood to be stored in the
 | ||
| next memory location.
 | ||
| 
 | ||
|    The "split" format allows separate pointers to the real and imaginary
 | ||
| parts of a complex array.
 | ||
| 
 | ||
|    Technically, the interleaved format is redundant, because you can
 | ||
| always express an interleaved array in terms of a split array with
 | ||
| appropriate pointers and strides.  On the other hand, the interleaved
 | ||
| format is simpler to use, and it is common in practice.  Hence, FFTW
 | ||
| supports it as a special case.
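To make the redundancy argument concrete, the sketch below views an interleaved array through a pair of split pointers with a stride of 2 doubles. The type 'complex_pair' is a local stand-in that assumes the default 'double[2]' layout of 'fftw_complex' (real part first); 'split_view' and the accessor names are hypothetical.

```c
/* Stand-in for fftw_complex, assuming the double[2] (re, im) layout. */
typedef double complex_pair[2];

/* A split-format description of a complex array. */
typedef struct {
    double *re;   /* pointer to the first real part */
    double *im;   /* pointer to the first imaginary part */
    int stride;   /* separation of consecutive elements, in doubles */
} split_view;

/* Express an interleaved array as a split array: the imaginary parts
   start one double after the real parts, and both advance by 2. */
split_view split_view_of_interleaved(complex_pair *data)
{
    split_view v;
    v.re = (double *)data;
    v.im = (double *)data + 1;
    v.stride = 2;
    return v;
}

double split_re(split_view v, int j) { return v.re[j * v.stride]; }
double split_im(split_view v, int j) { return v.im[j * v.stride]; }
```

The same pointer and stride choice reappears when an interleaved array is handed to the split-format planners.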
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Guru vector and transform sizes,  Next: Guru Complex DFTs,  Prev: Interleaved and split arrays,  Up: Guru Interface
 | ||
| 
 | ||
| 4.5.2 Guru vector and transform sizes
 | ||
| -------------------------------------
 | ||
| 
 | ||
| The guru interface introduces one basic new data structure,
 | ||
| 'fftw_iodim', that is used to specify sizes and strides for
 | ||
| multi-dimensional transforms and vectors:
 | ||
| 
 | ||
|      typedef struct {
 | ||
|           int n;
 | ||
|           int is;
 | ||
|           int os;
 | ||
|      } fftw_iodim;
 | ||
| 
 | ||
|    Here, 'n' is the size of the dimension, and 'is' and 'os' are the
 | ||
| strides of that dimension for the input and output arrays.  (The stride
 | ||
| is the separation of consecutive elements along this dimension.)
 | ||
| 
 | ||
|    The meaning of the stride parameter depends on the type of the array
 | ||
| that the stride refers to.  _If the array is interleaved complex,
 | ||
| strides are expressed in units of complex numbers ('fftw_complex').  If
 | ||
| the array is split complex or real, strides are expressed in units of
 | ||
| real numbers ('double')._  This convention is consistent with the usual
 | ||
| pointer arithmetic in the C language.  An interleaved array is denoted
 | ||
| by a pointer 'p' to 'fftw_complex', so that 'p+1' points to the next
 | ||
| complex number.  Split arrays are denoted by pointers to 'double', in
 | ||
| which case pointer arithmetic operates in units of 'sizeof(double)'.
 | ||
| 
 | ||
|    The guru planner interfaces all take a ('rank', 'dims[rank]') pair
 | ||
| describing the transform size, and a ('howmany_rank',
 | ||
| 'howmany_dims[howmany_rank]') pair describing the "vector" size (a
 | ||
| multi-dimensional loop of transforms to perform), where 'dims' and
 | ||
| 'howmany_dims' are arrays of 'fftw_iodim'.  Each 'n' field must be
 | ||
| positive for 'dims' and nonnegative for 'howmany_dims', while both
 | ||
| 'rank' and 'howmany_rank' must be nonnegative.
 | ||
| 
 | ||
|    For example, the 'howmany' parameter in the advanced complex-DFT
 | ||
| interface corresponds to 'howmany_rank' = 1, 'howmany_dims[0].n' =
 | ||
| 'howmany', 'howmany_dims[0].is' = 'idist', and 'howmany_dims[0].os' =
 | ||
| 'odist'.  (To compute a single transform, you can just use
 | ||
| 'howmany_rank' = 0.)
 | ||
| 
 | ||
|    A row-major multidimensional array with dimensions 'n[rank]' (*note
 | ||
| Row-major Format::) corresponds to 'dims[i].n' = 'n[i]' and the
 | ||
| recurrence 'dims[i].is' = 'n[i+1] * dims[i+1].is' (similarly for 'os').
 | ||
| The stride of the last ('i=rank-1') dimension is the overall stride of
 | ||
| the array.  For example, to be equivalent to the advanced complex-DFT
 | ||
| interface, you would have 'dims[rank-1].is' = 'istride' and
 | ||
| 'dims[rank-1].os' = 'ostride'.
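The recurrence for row-major strides can be sketched as follows. The struct 'iodim_sketch' is a local mirror of the 'fftw_iodim' fields so that the example compiles without 'fftw3.h'; it and the function names are hypothetical.

```c
#include <assert.h>

/* Local mirror of fftw_iodim (same three fields, illustration only). */
typedef struct {
    int n;    /* size of the dimension */
    int is;   /* input stride */
    int os;   /* output stride */
} iodim_sketch;

/* Fill dims[0..rank-1] for a row-major array with dimensions n[] whose
   last dimension has the overall strides istride / ostride. */
void row_major_dims(int rank, const int *n, int istride, int ostride,
                    iodim_sketch *dims)
{
    for (int i = rank - 1; i >= 0; --i) {
        assert(n[i] > 0);
        dims[i].n = n[i];
        if (i == rank - 1) {
            dims[i].is = istride;
            dims[i].os = ostride;
        } else {
            /* dims[i].is = n[i+1] * dims[i+1].is, similarly for os */
            dims[i].is = n[i + 1] * dims[i + 1].is;
            dims[i].os = n[i + 1] * dims[i + 1].os;
        }
    }
}

/* The advanced interface's howmany loop as a single vector dimension. */
iodim_sketch howmany_dim(int howmany, int idist, int odist)
{
    iodim_sketch d;
    d.n = howmany;
    d.is = idist;
    d.os = odist;
    return d;
}
```

With real FFTW code the same values would be stored in an array of 'fftw_iodim' and passed as 'dims' and 'howmany_dims'.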
 | ||
| 
 | ||
|    In general, we only guarantee FFTW to return a non-'NULL' plan if the
 | ||
| vector and transform dimensions correspond to a set of distinct indices,
 | ||
| and for in-place transforms the input/output strides should be the same.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Guru Complex DFTs,  Next: Guru Real-data DFTs,  Prev: Guru vector and transform sizes,  Up: Guru Interface
 | ||
| 
 | ||
| 4.5.3 Guru Complex DFTs
 | ||
| -----------------------
 | ||
| 
 | ||
|      fftw_plan fftw_plan_guru_dft(
 | ||
|           int rank, const fftw_iodim *dims,
 | ||
|           int howmany_rank, const fftw_iodim *howmany_dims,
 | ||
|           fftw_complex *in, fftw_complex *out,
 | ||
|           int sign, unsigned flags);
 | ||
| 
 | ||
|      fftw_plan fftw_plan_guru_split_dft(
 | ||
|           int rank, const fftw_iodim *dims,
 | ||
|           int howmany_rank, const fftw_iodim *howmany_dims,
 | ||
|           double *ri, double *ii, double *ro, double *io,
 | ||
|           unsigned flags);
 | ||
| 
 | ||
|    These two functions plan a complex-data, multi-dimensional DFT for
 | ||
| the interleaved and split format, respectively.  Transform dimensions
 | ||
| are given by ('rank', 'dims') over a multi-dimensional vector (loop) of
 | ||
| dimensions ('howmany_rank', 'howmany_dims').  'dims' and 'howmany_dims'
 | ||
| should point to 'fftw_iodim' arrays of length 'rank' and 'howmany_rank',
 | ||
| respectively.
 | ||
| 
 | ||
|    'flags' is a bitwise OR ('|') of zero or more planner flags, as
 | ||
| defined in *note Planner Flags::.
 | ||
| 
 | ||
|    In the 'fftw_plan_guru_dft' function, the pointers 'in' and 'out'
 | ||
| point to the interleaved input and output arrays, respectively.  The
 | ||
| sign can be either -1 (= 'FFTW_FORWARD') or +1 (= 'FFTW_BACKWARD').  If
 | ||
| the pointers are equal, the transform is in-place.
 | ||
| 
 | ||
|    In the 'fftw_plan_guru_split_dft' function, 'ri' and 'ii' point to
 | ||
| the real and imaginary input arrays, and 'ro' and 'io' point to the real
 | ||
| and imaginary output arrays.  The input and output pointers may be the
 | ||
| same, indicating an in-place transform.  For example, for 'fftw_complex'
 | ||
| pointers 'in' and 'out', the corresponding parameters are:
 | ||
| 
 | ||
|      ri = (double *) in;
 | ||
|      ii = (double *) in + 1;
 | ||
|      ro = (double *) out;
 | ||
|      io = (double *) out + 1;
 | ||
| 
 | ||
|    Because 'fftw_plan_guru_split_dft' accepts split arrays, strides are
 | ||
| expressed in units of 'double'.  For a contiguous 'fftw_complex' array,
 | ||
| the overall stride of the transform should be 2, the distance between
 | ||
| consecutive real parts or between consecutive imaginary parts; see *note
 | ||
| Guru vector and transform sizes::.  Note that the dimension strides are
 | ||
| applied equally to the real and imaginary parts; real and imaginary
 | ||
| arrays with different strides are not supported.
 | ||
| 
 | ||
|    There is no 'sign' parameter in 'fftw_plan_guru_split_dft'.  This
 | ||
| function always plans for an 'FFTW_FORWARD' transform.  To plan for an
 | ||
| 'FFTW_BACKWARD' transform, you can exploit the identity that the
 | ||
| backwards DFT is equal to the forwards DFT with the real and imaginary
 | ||
| parts swapped.  For example, in the case of the 'fftw_complex' arrays
 | ||
| above, the 'FFTW_BACKWARD' transform is computed by the parameters:
 | ||
| 
 | ||
|      ri = (double *) in + 1;
 | ||
|      ii = (double *) in;
 | ||
|      ro = (double *) out + 1;
 | ||
|      io = (double *) out;
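The swap identity can be checked without FFTW on a tiny case. The sketch below implements a naive length-4 forward DFT in split format (the twiddle factors for size 4 are exactly 1, -i, -1, and i, so no math library is needed) and obtains the backward transform purely by exchanging the real and imaginary pointers. Both function names are hypothetical, and the routine is out-of-place only.

```c
/* Naive forward DFT of length 4 on split arrays (sign -1):
   Y[k] = sum_j X[j] * w^(j*k), with w = exp(-2*pi*i/4) = -i.
   Input and output arrays must not overlap. */
void dft4_forward_split(const double *ri, const double *ii,
                        double *ro, double *io)
{
    /* Real and imaginary parts of w^p for p = 0..3. */
    static const double wr[4] = { 1.0, 0.0, -1.0, 0.0 };
    static const double wi[4] = { 0.0, -1.0, 0.0, 1.0 };
    for (int k = 0; k < 4; ++k) {
        double sr = 0.0, si = 0.0;
        for (int j = 0; j < 4; ++j) {
            int p = (j * k) % 4;
            sr += ri[j] * wr[p] - ii[j] * wi[p];
            si += ri[j] * wi[p] + ii[j] * wr[p];
        }
        ro[k] = sr;
        io[k] = si;
    }
}

/* Backward DFT (sign +1) via the forward routine: swap the roles of
   the real and imaginary arrays on both input and output. */
void dft4_backward_split(const double *ri, const double *ii,
                         double *ro, double *io)
{
    dft4_forward_split(ii, ri, io, ro);
}
```

Applying the forward routine and then the swapped call returns the input multiplied by 4, consistent with the unnormalized convention.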
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Guru Real-data DFTs,  Next: Guru Real-to-real Transforms,  Prev: Guru Complex DFTs,  Up: Guru Interface
 | ||
| 
 | ||
| 4.5.4 Guru Real-data DFTs
 | ||
| -------------------------
 | ||
| 
 | ||
|      fftw_plan fftw_plan_guru_dft_r2c(
 | ||
|           int rank, const fftw_iodim *dims,
 | ||
|           int howmany_rank, const fftw_iodim *howmany_dims,
 | ||
|           double *in, fftw_complex *out,
 | ||
|           unsigned flags);
 | ||
| 
 | ||
|      fftw_plan fftw_plan_guru_split_dft_r2c(
 | ||
|           int rank, const fftw_iodim *dims,
 | ||
|           int howmany_rank, const fftw_iodim *howmany_dims,
 | ||
|           double *in, double *ro, double *io,
 | ||
|           unsigned flags);
 | ||
| 
 | ||
|      fftw_plan fftw_plan_guru_dft_c2r(
 | ||
|           int rank, const fftw_iodim *dims,
 | ||
|           int howmany_rank, const fftw_iodim *howmany_dims,
 | ||
|           fftw_complex *in, double *out,
 | ||
|           unsigned flags);
 | ||
| 
 | ||
|      fftw_plan fftw_plan_guru_split_dft_c2r(
 | ||
|           int rank, const fftw_iodim *dims,
 | ||
|           int howmany_rank, const fftw_iodim *howmany_dims,
 | ||
|           double *ri, double *ii, double *out,
 | ||
|           unsigned flags);
 | ||
| 
 | ||
|    Plan a real-input (r2c) or real-output (c2r), multi-dimensional DFT
 | ||
| with transform dimensions given by ('rank', 'dims') over a
 | ||
| multi-dimensional vector (loop) of dimensions ('howmany_rank',
 | ||
| 'howmany_dims').  'dims' and 'howmany_dims' should point to 'fftw_iodim'
 | ||
| arrays of length 'rank' and 'howmany_rank', respectively.  As for the
 | ||
| basic and advanced interfaces, an r2c transform is 'FFTW_FORWARD' and a
 | ||
| c2r transform is 'FFTW_BACKWARD'.
 | ||
| 
 | ||
|    The _last_ dimension of 'dims' is interpreted specially: that
 | ||
| dimension of the real array has size 'dims[rank-1].n', but that
 | ||
| dimension of the complex array has size 'dims[rank-1].n/2+1' (division
 | ||
| rounded down).  The strides, on the other hand, are taken to be exactly
 | ||
| as specified.  It is up to the user to specify the strides appropriately
 | ||
| for the peculiar dimensions of the data, and we do not guarantee that
 | ||
| the planner will succeed (return non-'NULL') for any dimensions other
 | ||
| than those described in *note Real-data DFT Array Format:: and
 | ||
| generalized in *note Advanced Real-data DFTs::.  (That is, for an
 | ||
| in-place transform, each individual dimension should be able to operate
 | ||
| in place.)
 | ||
| 
 | ||
|    'in' and 'out' point to the input and output arrays for r2c and c2r
 | ||
| transforms, respectively.  For split arrays, 'ri' and 'ii' point to the
 | ||
| real and imaginary input arrays for a c2r transform, and 'ro' and 'io'
 | ||
| point to the real and imaginary output arrays for an r2c transform.
 | ||
| 'in' and 'ro' or 'ri' and 'out' may be the same, indicating an in-place
 | ||
| transform.  (In-place transforms where 'in' and 'io' or 'ii' and 'out'
 | ||
| are the same are not currently supported.)
 | ||
| 
 | ||
|    'flags' is a bitwise OR ('|') of zero or more planner flags, as
 | ||
| defined in *note Planner Flags::.
 | ||
| 
 | ||
|    In-place transforms of rank greater than 1 are currently only
 | ||
| supported for interleaved arrays.  For split arrays, the planner will
 | ||
| return 'NULL'.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Guru Real-to-real Transforms,  Next: 64-bit Guru Interface,  Prev: Guru Real-data DFTs,  Up: Guru Interface
 | ||
| 
 | ||
| 4.5.5 Guru Real-to-real Transforms
 | ||
| ----------------------------------
 | ||
| 
 | ||
|      fftw_plan fftw_plan_guru_r2r(int rank, const fftw_iodim *dims,
 | ||
|                                   int howmany_rank,
 | ||
|                                   const fftw_iodim *howmany_dims,
 | ||
|                                   double *in, double *out,
 | ||
|                                   const fftw_r2r_kind *kind,
 | ||
|                                   unsigned flags);
 | ||
| 
 | ||
|    Plan a real-to-real (r2r) multi-dimensional transform
 | ||
| with transform dimensions given by ('rank', 'dims') over a
 | ||
| multi-dimensional vector (loop) of dimensions ('howmany_rank',
 | ||
| 'howmany_dims').  'dims' and 'howmany_dims' should point to 'fftw_iodim'
 | ||
| arrays of length 'rank' and 'howmany_rank', respectively.
 | ||
| 
 | ||
|    The transform kind of each dimension is given by the 'kind'
 | ||
| parameter, which should point to an array of length 'rank'.  Valid
 | ||
| 'fftw_r2r_kind' constants are given in *note Real-to-Real Transform
 | ||
| Kinds::.
 | ||
| 
 | ||
|    'in' and 'out' point to the real input and output arrays; they may be
 | ||
| the same, indicating an in-place transform.
 | ||
| 
 | ||
|    'flags' is a bitwise OR ('|') of zero or more planner flags, as
 | ||
| defined in *note Planner Flags::.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: 64-bit Guru Interface,  Prev: Guru Real-to-real Transforms,  Up: Guru Interface
 | ||
| 
 | ||
| 4.5.6 64-bit Guru Interface
 | ||
| ---------------------------
 | ||
| 
 | ||
| When compiled in 64-bit mode on a 64-bit architecture (where addresses
 | ||
| are 64 bits wide), FFTW uses 64-bit quantities internally for all
 | ||
| transform sizes, strides, and so on--you don't have to do anything
 | ||
| special to exploit this.  However, in the ordinary FFTW interfaces, you
 | ||
| specify the transform size by an 'int' quantity, which is normally only
 | ||
| 32 bits wide.  This means that, even though FFTW is using 64-bit sizes
 | ||
| internally, you cannot specify a single transform dimension larger than
 | ||
| 2^31-1 numbers.
 | ||
| 
 | ||
|    We expect that few users will require transforms larger than this,
 | ||
| but, for those who do, we provide a 64-bit version of the guru interface
 | ||
| in which all sizes are specified as integers of type 'ptrdiff_t' instead
 | ||
| of 'int'.  ('ptrdiff_t' is a signed integer type defined by the C
 | ||
| standard to be wide enough to represent address differences, and thus
 | ||
| must be at least 64 bits wide on a 64-bit machine.)  We stress that
 | ||
| there is _no performance advantage_ to using this interface--the same
 | ||
| internal FFTW code is employed regardless--and it is only necessary if
 | ||
| you want to specify very large transform sizes.
 | ||
| 
 | ||
|    In particular, the 64-bit guru interface is a set of planner routines
 | ||
| that are exactly the same as the guru planner routines, except that they
 | ||
| are named with 'guru64' instead of 'guru' and they take arguments of
 | ||
| type 'fftw_iodim64' instead of 'fftw_iodim'.  For example, instead of
 | ||
| 'fftw_plan_guru_dft', we have 'fftw_plan_guru64_dft'.
 | ||
| 
 | ||
|      fftw_plan fftw_plan_guru64_dft(
 | ||
|           int rank, const fftw_iodim64 *dims,
 | ||
|           int howmany_rank, const fftw_iodim64 *howmany_dims,
 | ||
|           fftw_complex *in, fftw_complex *out,
 | ||
|           int sign, unsigned flags);
 | ||
| 
 | ||
|    The 'fftw_iodim64' type is similar to 'fftw_iodim', with the same
 | ||
| interpretation, except that it uses type 'ptrdiff_t' instead of type
 | ||
| 'int'.
 | ||
| 
 | ||
|      typedef struct {
 | ||
|           ptrdiff_t n;
 | ||
|           ptrdiff_t is;
 | ||
|           ptrdiff_t os;
 | ||
|      } fftw_iodim64;
 | ||
| 
 | ||
|    Every other 'fftw_plan_guru' function also has a 'fftw_plan_guru64'
 | ||
| equivalent, but we do not repeat their documentation here since they are
 | ||
| identical to the 32-bit versions except as noted above.
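   The sketch below illustrates, under stated assumptions, when the
64-bit interface becomes necessary.  'iodim64_sketch' is a hypothetical
mirror of the 'fftw_iodim64' layout shown above, and 'needs_guru64' and
'contiguous_dim' are illustrative helpers of our own, not FFTW functions.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical mirror of fftw_iodim64 (same three fields as documented). */
typedef struct {
    ptrdiff_t n;   /* dimension size */
    ptrdiff_t is;  /* input stride */
    ptrdiff_t os;  /* output stride */
} iodim64_sketch;

/* Returns 1 if the size n cannot be represented as an int. */
int needs_guru64(ptrdiff_t n)
{
    return n > (ptrdiff_t)INT_MAX;
}

/* Describe one contiguous dimension of length n (unit strides). */
iodim64_sketch contiguous_dim(ptrdiff_t n)
{
    iodim64_sketch d;
    d.n = n;
    d.is = 1;
    d.os = 1;
    return d;
}
```

   Sizes that fit in an 'int' gain nothing from the 64-bit routines, which
is why the check only matters for dimensions beyond 2^31-1.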
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: New-array Execute Functions,  Next: Wisdom,  Prev: Guru Interface,  Up: FFTW Reference
 | ||
| 
 | ||
| 4.6 New-array Execute Functions
 | ||
| ===============================
 | ||
| 
 | ||
| Normally, one executes a plan for the arrays with which the plan was
 | ||
| created, by calling 'fftw_execute(plan)' as described in *note Using
 | ||
| Plans::.  However, it is possible for sophisticated users to apply a
 | ||
| given plan to a _different_ array using the "new-array execute"
 | ||
| functions detailed below, provided that the following conditions are
 | ||
| met:
 | ||
| 
 | ||
|    * The array size, strides, etcetera are the same (since those are set
 | ||
|      by the plan).
 | ||
| 
 | ||
|    * The input and output arrays are the same (in-place) or different
 | ||
|      (out-of-place) if the plan was originally created to be in-place or
 | ||
|      out-of-place, respectively.
 | ||
| 
 | ||
|    * For split arrays, the separations between the real and imaginary
 | ||
|      parts, 'ii-ri' and 'io-ro', are the same as they were for the input
 | ||
|      and output arrays when the plan was created.  (This condition is
 | ||
|      automatically satisfied for interleaved arrays.)
 | ||
| 
 | ||
|    * The "alignment" of the new input/output arrays is the same as that
 | ||
|      of the input/output arrays when the plan was created, unless the
 | ||
|      plan was created with the 'FFTW_UNALIGNED' flag.  Here, the
 | ||
|      alignment is a platform-dependent quantity (for example, it is the
 | ||
|      address modulo 16 if SSE SIMD instructions are used, but the
 | ||
|      address modulo 4 for non-SIMD single-precision FFTW on the same
 | ||
|      machine).  In general, only arrays allocated with 'fftw_malloc' are
 | ||
|      guaranteed to be equally aligned (*note SIMD alignment and
 | ||
|      fftw_malloc::).
 | ||
| 
 | ||
|    The alignment issue is especially critical, because if you don't use
 | ||
| 'fftw_malloc' then you may have little control over the alignment of
 | ||
| arrays in memory.  For example, neither the C++ 'new' operator nor the
 | ||
| Fortran 'allocate' statement provides strong enough guarantees about data
 | ||
| alignment.  If you don't use 'fftw_malloc', therefore, you probably have
 | ||
| to use 'FFTW_UNALIGNED' (which disables most SIMD support).  If
 | ||
| possible, it is probably better for you to simply create multiple plans
 | ||
| (creating a new plan is quick once one exists for a given size), or
 | ||
| better yet re-use the same array for your transforms.
 | ||
| 
 | ||
|    For rare circumstances in which you cannot control the alignment of
 | ||
| allocated memory, but wish to determine whether a given array is aligned
 | ||
| like the original array for which a plan was created, you can use the
 | ||
| 'fftw_alignment_of' function:
 | ||
|      int fftw_alignment_of(double *p);
 | ||
|    Two arrays have equivalent alignment (for the purposes of applying a
 | ||
| plan) if and only if 'fftw_alignment_of' returns the same value for the
 | ||
| corresponding pointers to their data (typecast to 'double*' if
 | ||
| necessary).
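   To make the notion of "equivalent alignment" concrete, here is a
minimal sketch that compares addresses modulo 16, the SSE example quoted
above.  It is only a stand-in: 'alignment_mod16' and 'same_alignment' are
hypothetical names, the modulus 16 is an assumption, and real code should
call 'fftw_alignment_of', whose result is platform-dependent.

```c
#include <stdint.h>

/* Hypothetical stand-in for an alignment query: the address modulo 16. */
int alignment_mod16(const void *p)
{
    return (int)((uintptr_t)p % 16u);
}

/* Two pointers are interchangeable (in this sketch) if the values match. */
int same_alignment(const void *a, const void *b)
{
    return alignment_mod16(a) == alignment_mod16(b);
}
```

   With 8-byte doubles, elements two apart in the same array compare as
equally aligned under this rule, while adjacent elements do not.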
 | ||
| 
 | ||
|    If you are tempted to use the new-array execute interface because you
 | ||
| want to transform a known bunch of arrays of the same size, you should
 | ||
| probably use the advanced interface instead (*note Advanced
 | ||
| Interface::).
 | ||
| 
 | ||
|    The new-array execute functions are:
 | ||
| 
 | ||
|      void fftw_execute_dft(
 | ||
|           const fftw_plan p,
 | ||
|           fftw_complex *in, fftw_complex *out);
 | ||
| 
 | ||
|      void fftw_execute_split_dft(
 | ||
|           const fftw_plan p,
 | ||
|           double *ri, double *ii, double *ro, double *io);
 | ||
| 
 | ||
|      void fftw_execute_dft_r2c(
 | ||
|           const fftw_plan p,
 | ||
|           double *in, fftw_complex *out);
 | ||
| 
 | ||
|      void fftw_execute_split_dft_r2c(
 | ||
|           const fftw_plan p,
 | ||
|           double *in, double *ro, double *io);
 | ||
| 
 | ||
|      void fftw_execute_dft_c2r(
 | ||
|           const fftw_plan p,
 | ||
|           fftw_complex *in, double *out);
 | ||
| 
 | ||
|      void fftw_execute_split_dft_c2r(
 | ||
|           const fftw_plan p,
 | ||
|           double *ri, double *ii, double *out);
 | ||
| 
 | ||
|      void fftw_execute_r2r(
 | ||
|           const fftw_plan p,
 | ||
|           double *in, double *out);
 | ||
| 
 | ||
|    These execute the 'plan' to compute the corresponding transform on
 | ||
| the input/output arrays specified by the subsequent arguments.  The
 | ||
| input/output array arguments have the same meanings as the ones passed
 | ||
| to the guru planner routines in the preceding sections.  The 'plan' is
 | ||
| not modified, and these routines can be called as many times as desired,
 | ||
| or intermixed with calls to the ordinary 'fftw_execute'.
 | ||
| 
 | ||
|    The 'plan' _must_ have been created for the transform type
 | ||
| corresponding to the execute function, e.g.  it must be a complex-DFT
 | ||
| plan for 'fftw_execute_dft'.  Any of the planner routines for that
 | ||
| transform type, from the basic to the guru interface, could have been
 | ||
| used to create the plan, however.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Wisdom,  Next: What FFTW Really Computes,  Prev: New-array Execute Functions,  Up: FFTW Reference
 | ||
| 
 | ||
| 4.7 Wisdom
 | ||
| ==========
 | ||
| 
 | ||
| This section documents the FFTW mechanism for saving and restoring plans
 | ||
| from disk.  This mechanism is called "wisdom".
 | ||
| 
 | ||
| * Menu:
 | ||
| 
 | ||
| * Wisdom Export::
 | ||
| * Wisdom Import::
 | ||
| * Forgetting Wisdom::
 | ||
| * Wisdom Utilities::
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Wisdom Export,  Next: Wisdom Import,  Prev: Wisdom,  Up: Wisdom
 | ||
| 
 | ||
| 4.7.1 Wisdom Export
 | ||
| -------------------
 | ||
| 
 | ||
|      int fftw_export_wisdom_to_filename(const char *filename);
 | ||
|      void fftw_export_wisdom_to_file(FILE *output_file);
 | ||
|      char *fftw_export_wisdom_to_string(void);
 | ||
|      void fftw_export_wisdom(void (*write_char)(char c, void *), void *data);
 | ||
| 
 | ||
|    These functions allow you to export all currently accumulated wisdom
 | ||
| in a form from which it can be later imported and restored, even during
 | ||
| a separate run of the program.  (*Note Words of Wisdom-Saving Plans::.)
 | ||
| The current store of wisdom is not affected by calling any of these
 | ||
| routines.
 | ||
| 
 | ||
|    'fftw_export_wisdom' exports the wisdom to any output medium, as
 | ||
| specified by the callback function 'write_char'.  'write_char' is a
 | ||
| 'putc'-like function that writes the character 'c' to some output; its
 | ||
| second parameter is the 'data' pointer passed to 'fftw_export_wisdom'.
 | ||
| For convenience, the following three "wrapper" routines are provided:
 | ||
| 
 | ||
|    'fftw_export_wisdom_to_filename' writes wisdom to a file named
 | ||
| 'filename' (which is created or overwritten), returning '1' on success
 | ||
| and '0' on failure.  A lower-level function, which requires you to open
 | ||
| and close the file yourself (e.g.  if you want to write wisdom to a
 | ||
| portion of a larger file) is 'fftw_export_wisdom_to_file'.  This writes
 | ||
| the wisdom to the current position in 'output_file', which should be
 | ||
| open with write permission; upon exit, the file remains open and is
 | ||
| positioned at the end of the wisdom data.
 | ||
| 
 | ||
|    'fftw_export_wisdom_to_string' returns a pointer to a
 | ||
| NUL-terminated string holding the wisdom data.  This string is
 | ||
| dynamically allocated, and it is the responsibility of the caller to
 | ||
| deallocate it with 'free' when it is no longer needed.
 | ||
| 
 | ||
|    All of these routines export the wisdom in the same format, which we
 | ||
| will not document here except to say that it is LISP-like ASCII text
 | ||
| that is insensitive to white space.
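   As an illustration of the 'write_char' callback described above, the
following sketch collects characters into a growable in-memory buffer.
The 'char_sink' type and 'sink_write_char' function are hypothetical
names of our own; only the callback's shape (a character plus the 'data'
pointer) comes from the prototype of 'fftw_export_wisdom'.

```c
#include <stdlib.h>
#include <string.h>

/* Growable, always NUL-terminated character buffer. */
typedef struct {
    char *buf;
    size_t len;
    size_t cap;
    int failed;   /* set to 1 if an allocation failed */
} char_sink;

/* putc-like callback: appends c to the sink passed through data. */
void sink_write_char(char c, void *data)
{
    char_sink *s = (char_sink *)data;
    if (s->failed)
        return;
    if (s->len + 1 >= s->cap) {            /* keep room for the NUL */
        size_t ncap = s->cap ? 2 * s->cap : 64;
        char *nbuf = realloc(s->buf, ncap);
        if (!nbuf) {
            s->failed = 1;
            return;
        }
        s->buf = nbuf;
        s->cap = ncap;
    }
    s->buf[s->len++] = c;
    s->buf[s->len] = '\0';
}
```

   A zero-initialized 'char_sink' would be passed as the 'data' argument
and 'sink_write_char' as the callback; the caller frees 'buf' afterwards
and should check 'failed' before using the result.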
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Wisdom Import,  Next: Forgetting Wisdom,  Prev: Wisdom Export,  Up: Wisdom
 | ||
| 
 | ||
| 4.7.2 Wisdom Import
 | ||
| -------------------
 | ||
| 
 | ||
|      int fftw_import_system_wisdom(void);
 | ||
|      int fftw_import_wisdom_from_filename(const char *filename);
 | ||
|      int fftw_import_wisdom_from_file(FILE *input_file);
 | ||
|      int fftw_import_wisdom_from_string(const char *input_string);
 | ||
|      int fftw_import_wisdom(int (*read_char)(void *), void *data);
 | ||
| 
 | ||
|    These functions import wisdom into a program from data stored by the
 | ||
| 'fftw_export_wisdom' functions above.  (*Note Words of Wisdom-Saving
 | ||
| Plans::.)  The imported wisdom replaces any wisdom already accumulated
 | ||
| by the running program.
 | ||
| 
 | ||
|    'fftw_import_wisdom' imports wisdom from any input medium, as
 | ||
| specified by the callback function 'read_char'.  'read_char' is a
 | ||
| 'getc'-like function that returns the next character in the input; its
 | ||
| parameter is the 'data' pointer passed to 'fftw_import_wisdom'.  If the
 | ||
| end of the input data is reached (which should never happen for valid
 | ||
| data), 'read_char' should return 'EOF' (as defined in '<stdio.h>').  For
 | ||
| convenience, the following three "wrapper" routines are provided:
 | ||
| 
 | ||
|    'fftw_import_wisdom_from_filename' reads wisdom from a file named
 | ||
| 'filename'.  A lower-level function, which requires you to open and
 | ||
| close the file yourself (e.g.  if you want to read wisdom from a portion
 | ||
| of a larger file) is 'fftw_import_wisdom_from_file'.  This reads wisdom
 | ||
| from the current position in 'input_file' (which should be open with
 | ||
| read permission); upon exit, the file remains open, but the position of
 | ||
| the read pointer is unspecified.
 | ||
| 
 | ||
|    'fftw_import_wisdom_from_string' reads wisdom from the
 | ||
| NUL-terminated string 'input_string'.
 | ||
| 
 | ||
|    'fftw_import_system_wisdom' reads wisdom from an
 | ||
| implementation-defined standard file ('/etc/fftw/wisdom' on Unix and GNU
 | ||
| systems).
 | ||
| 
 | ||
|    The return value of these import routines is '1' if the wisdom was
 | ||
| read successfully and '0' otherwise.  Note that, in all of these
 | ||
| functions, any data in the input stream past the end of the wisdom data
 | ||
| is simply ignored.
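   The 'read_char' callback described above can be sketched as a reader
over an in-memory string.  'string_source' and 'source_read_char' are
hypothetical names used only for illustration; the contract they follow
(return the next character, or 'EOF' at the end of the data) is the one
stated for 'fftw_import_wisdom'.

```c
#include <stdio.h>   /* EOF */

/* Cursor over a NUL-terminated string. */
typedef struct {
    const char *next;
} string_source;

/* getc-like callback: returns the next character, or EOF when exhausted. */
int source_read_char(void *data)
{
    string_source *s = (string_source *)data;
    if (*s->next == '\0')
        return EOF;
    return (unsigned char)*s->next++;
}
```

   The cast to 'unsigned char' mirrors 'getc', so that no valid character
can be mistaken for 'EOF'.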
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Forgetting Wisdom,  Next: Wisdom Utilities,  Prev: Wisdom Import,  Up: Wisdom
 | ||
| 
 | ||
| 4.7.3 Forgetting Wisdom
 | ||
| -----------------------
 | ||
| 
 | ||
|      void fftw_forget_wisdom(void);
 | ||
| 
 | ||
|    Calling 'fftw_forget_wisdom' causes all accumulated 'wisdom' to be
 | ||
| discarded and its associated memory to be freed.  (New 'wisdom' can
 | ||
| still be gathered subsequently, however.)
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Wisdom Utilities,  Prev: Forgetting Wisdom,  Up: Wisdom
 | ||
| 
 | ||
| 4.7.4 Wisdom Utilities
 | ||
| ----------------------
 | ||
| 
 | ||
| FFTW includes two standalone utility programs that deal with wisdom.  We
 | ||
| merely summarize them here, since they come with their own 'man' pages
 | ||
| for Unix and GNU systems (with HTML versions on our web site).
 | ||
| 
 | ||
|    The first program is 'fftw-wisdom' (or 'fftwf-wisdom' in single
 | ||
| precision, etcetera), which can be used to create a wisdom file
 | ||
| containing plans for any of the transform sizes and types supported by
 | ||
| FFTW. It is preferable to create wisdom directly from your executable
 | ||
| (*note Caveats in Using Wisdom::), but this program is useful for
 | ||
| creating global wisdom files for 'fftw_import_system_wisdom'.
 | ||
| 
 | ||
|    The second program is 'fftw-wisdom-to-conf', which takes a wisdom
 | ||
| file as input and produces a "configuration routine" as output.  The
 | ||
| latter is a C subroutine that you can compile and link into your
 | ||
| program, replacing a routine of the same name in the FFTW library, that
 | ||
| determines which parts of FFTW are callable by your program.
 | ||
| 'fftw-wisdom-to-conf' produces a configuration routine that links to
 | ||
| only those parts of FFTW needed by the saved plans in the wisdom,
 | ||
| greatly reducing the size of statically linked executables (which should
 | ||
| only attempt to create plans corresponding to those in the wisdom,
 | ||
| however).
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: What FFTW Really Computes,  Prev: Wisdom,  Up: FFTW Reference
 | ||
| 
 | ||
| 4.8 What FFTW Really Computes
 | ||
| =============================
 | ||
| 
 | ||
| In this section, we provide precise mathematical definitions for the
 | ||
| transforms that FFTW computes.  These transform definitions are fairly
 | ||
| standard, but some authors follow slightly different conventions for the
 | ||
| normalization of the transform (the constant factor in front) and the
 | ||
| sign of the complex exponent.  We begin by presenting the
 | ||
| one-dimensional (1d) transform definitions, and then give the
 | ||
| straightforward extension to multi-dimensional transforms.
 | ||
| 
 | ||
| * Menu:
 | ||
| 
 | ||
| * The 1d Discrete Fourier Transform (DFT)::
 | ||
| * The 1d Real-data DFT::
 | ||
| * 1d Real-even DFTs (DCTs)::
 | ||
| * 1d Real-odd DFTs (DSTs)::
 | ||
| * 1d Discrete Hartley Transforms (DHTs)::
 | ||
| * Multi-dimensional Transforms::
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: The 1d Discrete Fourier Transform (DFT),  Next: The 1d Real-data DFT,  Prev: What FFTW Really Computes,  Up: What FFTW Really Computes
 | ||
| 
 | ||
| 4.8.1 The 1d Discrete Fourier Transform (DFT)
 | ||
| ---------------------------------------------
 | ||
| 
 | ||
| The forward ('FFTW_FORWARD') discrete Fourier transform (DFT) of a 1d
 | ||
| complex array X of size n computes an array Y, where:
 | ||
|  Y[k] = sum for j = 0 to (n - 1) of X[j] * exp(-2 pi j k sqrt(-1)/n) .
 | ||
|    The backward ('FFTW_BACKWARD') DFT computes:
 | ||
|  Y[k] = sum for j = 0 to (n - 1) of X[j] * exp(2 pi j k sqrt(-1)/n) .
 | ||
| 
 | ||
|    FFTW computes an unnormalized transform, in that there is no
 | ||
| coefficient in front of the summation in the DFT. In other words,
 | ||
| applying the forward and then the backward transform will multiply the
 | ||
| input by n.
 | ||
| 
 | ||
|    From above, an 'FFTW_FORWARD' transform corresponds to a sign of -1
 | ||
| in the exponent of the DFT. Note also that we use the standard
 | ||
| "in-order" output ordering--the k-th output corresponds to the frequency
 | ||
| k/n (or k/T, where T is your total sampling period).  For those who like
 | ||
| to think in terms of positive and negative frequencies, this means that
 | ||
| the positive frequencies are stored in the first half of the output and
 | ||
| the negative frequencies are stored in backwards order in the second
 | ||
| half of the output.  (The frequency -k/n is the same as the frequency
 | ||
| (n-k)/n.)
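   The in-order output convention can be summarized by a small index
mapping.  The helper below is a hypothetical illustration; for even n it
arbitrarily reports the bin k = n/2 as positive, although that bin
represents both +1/2 and -1/2 cycles per sample.

```c
/*
 * Signed frequency index m for output bin k of an n-point DFT,
 * so that bin k corresponds to the frequency m/n.
 * Bins 0..n/2 map to themselves; later bins map to k - n (negative).
 */
int signed_frequency_index(int k, int n)
{
    return (k <= n / 2) ? k : k - n;
}
```

   For n = 8, bins 5, 6, and 7 map to -3, -2, and -1, respectively.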
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: The 1d Real-data DFT,  Next: 1d Real-even DFTs (DCTs),  Prev: The 1d Discrete Fourier Transform (DFT),  Up: What FFTW Really Computes
 | ||
| 
 | ||
| 4.8.2 The 1d Real-data DFT
 | ||
| --------------------------
 | ||
| 
 | ||
| The real-input (r2c) DFT in FFTW computes the _forward_ transform Y of
 | ||
| the size 'n' real array X, exactly as defined above, i.e.
 | ||
|  Y[k] = sum for j = 0 to (n - 1) of X[j] * exp(-2 pi j k sqrt(-1)/n) .
 | ||
|    This output array Y can easily be shown to possess the "Hermitian"
 | ||
| symmetry Y[k] = Y[n-k]*, where we take Y to be periodic so that Y[n] =
 | ||
| Y[0].
 | ||
| 
 | ||
|    As a result of this symmetry, half of the output Y is redundant
 | ||
| (being the complex conjugate of the other half), and so the 1d r2c
 | ||
| transforms only output elements 0...n/2 of Y (n/2+1 complex numbers),
 | ||
| where the division by 2 is rounded down.
 | ||
| 
 | ||
|    Moreover, the Hermitian symmetry implies that Y[0] and, if n is even,
 | ||
| the Y[n/2] element, are purely real.  So, for the 'R2HC' r2r transform,
 | ||
| the halfcomplex format does not store the imaginary parts of these
 | ||
| elements.
 | ||
| 
 | ||
|    The c2r and 'HC2R' r2r transforms compute the backward DFT of the
 | ||
| _complex_ array X with Hermitian symmetry, stored in the r2c/'R2HC'
 | ||
| output formats, respectively, where the backward transform is defined
 | ||
| exactly as for the complex case:
 | ||
|  Y[k] = sum for j = 0 to (n - 1) of X[j] * exp(2 pi j k sqrt(-1)/n) .
 | ||
|    The outputs 'Y' of this transform can easily be seen to be purely
 | ||
| real, and are stored as an array of real numbers.
 | ||
| 
 | ||
|    Like FFTW's complex DFT, these transforms are unnormalized.  In other
 | ||
| words, applying the real-to-complex (forward) and then the
 | ||
| complex-to-real (backward) transform will multiply the input by n.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: 1d Real-even DFTs (DCTs),  Next: 1d Real-odd DFTs (DSTs),  Prev: The 1d Real-data DFT,  Up: What FFTW Really Computes
 | ||
| 
 | ||
| 4.8.3 1d Real-even DFTs (DCTs)
 | ||
| ------------------------------
 | ||
| 
 | ||
| The Real-even symmetry DFTs in FFTW are exactly equivalent to the
 | ||
| unnormalized forward (and backward) DFTs as defined above, where the
 | ||
| input array X of length N is purely real and also has "even" symmetry.
 | ||
| In this case, the output array is likewise real with even symmetry.
 | ||
| 
 | ||
|    For the case of 'REDFT00', this even symmetry means that X[j] =
 | ||
| X[N-j], where we take X to be periodic so that X[N] = X[0].  Because of
 | ||
| this redundancy, only the first n real numbers are actually stored,
 | ||
| where N = 2(n-1).
 | ||
| 
 | ||
|    The proper definition of even symmetry for 'REDFT10', 'REDFT01', and
 | ||
| 'REDFT11' transforms is somewhat more intricate because of the shifts by
 | ||
| 1/2 of the input and/or output, although the corresponding boundary
 | ||
| conditions are given in *note Real even/odd DFTs (cosine/sine
 | ||
| transforms)::.  Because of the even symmetry, however, the sine terms in
 | ||
| the DFT all cancel and the remaining cosine terms are written explicitly
 | ||
| below.  This formulation often leads people to call such a transform a
 | ||
| "discrete cosine transform" (DCT), although it is really just a special
 | ||
| case of the DFT.
 | ||
| 
 | ||
|    In each of the definitions below, we transform a real array X of
 | ||
| length n to a real array Y of length n:
 | ||
| 
 | ||
| REDFT00 (DCT-I)
 | ||
| ...............
 | ||
| 
 | ||
| An 'REDFT00' transform (type-I DCT) in FFTW is defined by: Y[k] = X[0] +
 | ||
| (-1)^k X[n-1] + 2 (sum for j = 1 to n-2 of X[j] cos(pi jk /(n-1))).
 | ||
| Note that this transform is not defined for n=1.  For n=2, the summation
 | ||
| term above is dropped as you might expect.
 | ||
| 
 | ||
| REDFT10 (DCT-II)
 | ||
| ................
 | ||
| 
 | ||
| An 'REDFT10' transform (type-II DCT, sometimes called "the" DCT) in FFTW
 | ||
| is defined by: Y[k] = 2 (sum for j = 0 to n-1 of X[j] cos(pi (j+1/2) k /
 | ||
| n)).
 | ||
| 
 | ||
| REDFT01 (DCT-III)
 | ||
| .................
 | ||
| 
 | ||
| An 'REDFT01' transform (type-III DCT) in FFTW is defined by: Y[k] = X[0]
 | ||
| + 2 (sum for j = 1 to n-1 of X[j] cos(pi j (k+1/2) / n)).  In the case
 | ||
| of n=1, this reduces to Y[0] = X[0].  Up to a scale factor (see below),
 | ||
| this is the inverse of 'REDFT10' ("the" DCT), and so the 'REDFT01'
 | ||
| (DCT-III) is sometimes called the "IDCT".
 | ||
| 
 | ||
| REDFT11 (DCT-IV)
 | ||
| ................
 | ||
| 
 | ||
| An 'REDFT11' transform (type-IV DCT) in FFTW is defined by: Y[k] = 2
 | ||
| (sum for j = 0 to n-1 of X[j] cos(pi (j+1/2) (k+1/2) / n)).
 | ||
| 
 | ||
| Inverses and Normalization
 | ||
| ..........................
 | ||
| 
 | ||
| These definitions correspond directly to the unnormalized DFTs used
 | ||
| elsewhere in FFTW (hence the factors of 2 in front of the summations).
 | ||
| The unnormalized inverse of 'REDFT00' is 'REDFT00', of 'REDFT10' is
 | ||
| 'REDFT01' and vice versa, and of 'REDFT11' is 'REDFT11'.  Each
 | ||
| unnormalized inverse results in the original array multiplied by N,
 | ||
| where N is the _logical_ DFT size.  For 'REDFT00', N=2(n-1) (note that
 | ||
| n=1 is not defined); otherwise, N=2n.
 | ||
| 
 | ||
|    In defining the discrete cosine transform, some authors also include
 | ||
| additional factors of sqrt(2) (or its inverse) multiplying selected
 | ||
| inputs and/or outputs.  This is a mostly cosmetic change that makes the
 | ||
| transform orthogonal, but sacrifices the direct equivalence to a
 | ||
| symmetric DFT.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: 1d Real-odd DFTs (DSTs),  Next: 1d Discrete Hartley Transforms (DHTs),  Prev: 1d Real-even DFTs (DCTs),  Up: What FFTW Really Computes
 | ||
| 
 | ||
| 4.8.4 1d Real-odd DFTs (DSTs)
 | ||
| -----------------------------
 | ||
| 
 | ||
| The Real-odd symmetry DFTs in FFTW are exactly equivalent to the
 | ||
| unnormalized forward (and backward) DFTs as defined above, where the
 | ||
| input array X of length N is purely real and also has "odd" symmetry.  In
 | ||
| this case, the output has odd symmetry and is purely imaginary.
 | ||
| 
 | ||
|    For the case of 'RODFT00', this odd symmetry means that X[j] =
 | ||
| -X[N-j], where we take X to be periodic so that X[N] = X[0].  Because of
 | ||
| this redundancy, only the first n real numbers starting at j=1 are
 | ||
| actually stored (the j=0 element is zero), where N = 2(n+1).
 | ||
| 
 | ||
|    The proper definition of odd symmetry for 'RODFT10', 'RODFT01', and
 | ||
| 'RODFT11' transforms is somewhat more intricate because of the shifts by
 | ||
| 1/2 of the input and/or output, although the corresponding boundary
 | ||
| conditions are given in *note Real even/odd DFTs (cosine/sine
 | ||
| transforms)::.  Because of the odd symmetry, however, the cosine terms
 | ||
| in the DFT all cancel and the remaining sine terms are written
 | ||
| explicitly below.  This formulation often leads people to call such a
 | ||
| transform a "discrete sine transform" (DST), although it is really just
 | ||
| a special case of the DFT.
 | ||
| 
 | ||
|    In each of the definitions below, we transform a real array X of
 | ||
| length n to a real array Y of length n:
 | ||
| 
 | ||
| RODFT00 (DST-I)
 | ||
| ...............
 | ||
| 
 | ||
| An 'RODFT00' transform (type-I DST) in FFTW is defined by: Y[k] = 2 (sum
 | ||
| for j = 0 to n-1 of X[j] sin(pi (j+1)(k+1) / (n+1))).
 | ||
| 
 | ||
| RODFT10 (DST-II)
 | ||
| ................
 | ||
| 
 | ||
| An 'RODFT10' transform (type-II DST) in FFTW is defined by: Y[k] = 2
 | ||
| (sum for j = 0 to n-1 of X[j] sin(pi (j+1/2) (k+1) / n)).
 | ||
| 
 | ||
| RODFT01 (DST-III)
 | ||
| .................
 | ||
| 
 | ||
| An 'RODFT01' transform (type-III DST) in FFTW is defined by: Y[k] =
 | ||
| (-1)^k X[n-1] + 2 (sum for j = 0 to n-2 of X[j] sin(pi (j+1) (k+1/2) /
 | ||
| n)).  In the case of n=1, this reduces to Y[0] = X[0].
 | ||
| 
 | ||
| RODFT11 (DST-IV)
 | ||
| ................
 | ||
| 
 | ||
| An 'RODFT11' transform (type-IV DST) in FFTW is defined by: Y[k] = 2
 | ||
| (sum for j = 0 to n-1 of X[j] sin(pi (j+1/2) (k+1/2) / n)).
 | ||
| 
 | ||
| Inverses and Normalization
 | ||
| ..........................
 | ||
| 
 | ||
| These definitions correspond directly to the unnormalized DFTs used
 | ||
| elsewhere in FFTW (hence the factors of 2 in front of the summations).
 | ||
| The unnormalized inverse of 'RODFT00' is 'RODFT00', of 'RODFT10' is
 | ||
| 'RODFT01' and vice versa, and of 'RODFT11' is 'RODFT11'.  Each
 | ||
| unnormalized inverse results in the original array multiplied by N,
 | ||
| where N is the _logical_ DFT size.  For 'RODFT00', N=2(n+1); otherwise,
 | ||
| N=2n.
 | ||
| 
 | ||
|    In defining the discrete sine transform, some authors also include
 | ||
| additional factors of sqrt(2) (or its inverse) multiplying selected
 | ||
| inputs and/or outputs.  This is a mostly cosmetic change that makes the
 | ||
| transform orthogonal, but sacrifices the direct equivalence to an
 | ||
| antisymmetric DFT.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: 1d Discrete Hartley Transforms (DHTs),  Next: Multi-dimensional Transforms,  Prev: 1d Real-odd DFTs (DSTs),  Up: What FFTW Really Computes
 | ||
| 
 | ||
| 4.8.5 1d Discrete Hartley Transforms (DHTs)
 | ||
| -------------------------------------------
 | ||
| 
 | ||
| The discrete Hartley transform (DHT) of a 1d real array X of size n
 | ||
| computes a real array Y of the same size, where:
 | ||
| Y[k] = sum for j = 0 to (n - 1) of X[j] * [cos(2 pi j k / n) + sin(2 pi j k / n)].
 | ||
| 
 | ||
|    FFTW computes an unnormalized transform, in that there is no
 | ||
| coefficient in front of the summation in the DHT. In other words,
 | ||
| applying the transform twice (the DHT is its own inverse) will multiply
 | ||
| the input by n.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Multi-dimensional Transforms,  Prev: 1d Discrete Hartley Transforms (DHTs),  Up: What FFTW Really Computes
 | ||
| 
 | ||
| 4.8.6 Multi-dimensional Transforms
 | ||
| ----------------------------------
 | ||
| 
 | ||
| The multi-dimensional transforms of FFTW, in general, compute simply the
 | ||
| separable product of the given 1d transform along each dimension of the
 | ||
| array.  Since each of these transforms is unnormalized, computing the
 | ||
| forward followed by the backward/inverse multi-dimensional transform
 | ||
| will result in the original array scaled by the product of the
 | ||
| normalization factors for each dimension (e.g.  the product of the
 | ||
| dimension sizes, for a multi-dimensional DFT).
 | ||
| 
 | ||
|    The definition of FFTW's multi-dimensional DFT of real data (r2c)
 | ||
| deserves special attention.  In this case, we logically compute the full
 | ||
| multi-dimensional DFT of the input data; since the input data are purely
 | ||
| real, the output data have the Hermitian symmetry and therefore only one
 | ||
| non-redundant half need be stored.  More specifically, for an n[0] x
 | ||
| n[1] x n[2] x ...  x n[d-1] multi-dimensional real-input DFT, the full
 | ||
| (logical) complex output array Y[k[0], k[1], ..., k[d-1]] has the
 | ||
| symmetry: Y[k[0], k[1], ..., k[d-1]] = Y[n[0] - k[0], n[1] - k[1], ...,
 | ||
| n[d-1] - k[d-1]]* (where each dimension is periodic).  Because of this
 | ||
| symmetry, we only store the k[d-1] = 0...n[d-1]/2 elements of the _last_
 | ||
| dimension (division by 2 is rounded down).  (We could instead have cut
 | ||
| any other dimension in half, but the last dimension proved
 | ||
| computationally convenient.)  This results in the peculiar array format
 | ||
| described in more detail by *note Real-data DFT Array Format::.
 | ||
| 
 | ||
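The storage rule and the symmetry above can be sketched in a few lines of C. This is an illustration only: `r2c_stored_count` and `hermitian_partner` are hypothetical helper names, not FFTW routines, and `rank >= 1` with `0 <= k[i] < n[i]` is assumed.

```c
#include <stddef.h>

/* Number of complex values stored for an r2c transform of shape
   n[0] x ... x n[rank-1]: only k[rank-1] = 0 ... n[rank-1]/2 is kept
   in the last dimension (integer division rounds down). */
size_t r2c_stored_count(int rank, const int *n)
{
    size_t count = 1;
    for (int i = 0; i < rank - 1; ++i)
        count *= (size_t)n[i];
    return count * (size_t)(n[rank - 1] / 2 + 1);
}

/* Index of the Hermitian partner of k, with each dimension periodic:
   partner[i] = (n[i] - k[i]) mod n[i].  The logical output satisfies
   Y[k] = conj(Y[partner]). */
void hermitian_partner(int rank, const int *n, const int *k, int *partner)
{
    for (int i = 0; i < rank; ++i)
        partner[i] = (n[i] - k[i]) % n[i];
}
```

An output element whose last index exceeds `n[rank-1]/2` is not stored; it can be recovered by conjugating the element at the partner index, whose last index falls in the stored half.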
|    The multi-dimensional c2r transform is simply the unnormalized
 | ||
| inverse of the r2c transform; that is, it is the same as FFTW's complex
 | ||
| backward multi-dimensional DFT, operating on a Hermitian input array in
 | ||
| the peculiar format mentioned above and outputting a real array (since
 | ||
| the DFT output is purely real).
 | ||
| 
 | ||
|    We should remind the user that the separable product of 1d transforms
 | ||
| along each dimension, as computed by FFTW, is not always the same thing
 | ||
| as the usual multi-dimensional transform.  A multi-dimensional 'R2HC'
 | ||
| (or 'HC2R') transform is not identical to the multi-dimensional DFT,
 | ||
| requiring some post-processing to combine the requisite real and
 | ||
| imaginary parts, as was described in *note The Halfcomplex-format DFT::.
 | ||
| Likewise, FFTW's multidimensional 'FFTW_DHT' r2r transform is not the
 | ||
| same thing as the logical multi-dimensional discrete Hartley transform
 | ||
| defined in the literature, as discussed in *note The Discrete Hartley
 | ||
| Transform::.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Multi-threaded FFTW,  Next: Distributed-memory FFTW with MPI,  Prev: FFTW Reference,  Up: Top
 | ||
| 
 | ||
| 5 Multi-threaded FFTW
 | ||
| *********************
 | ||
| 
 | ||
| In this chapter we document the parallel FFTW routines for shared-memory
 | ||
| parallel hardware.  These routines, which support parallel one- and
 | ||
| multi-dimensional transforms of both real and complex data, are the
 | ||
| easiest way to take advantage of multiple processors with FFTW. They
 | ||
| work just like the corresponding uniprocessor transform routines, except
 | ||
| that you have an extra initialization routine to call, and there is a
 | ||
| routine to set the number of threads to employ.  Any program that uses
 | ||
| the uniprocessor FFTW can therefore be trivially modified to use the
 | ||
| multi-threaded FFTW.
 | ||
| 
 | ||
|    A shared-memory machine is one in which all CPUs can directly access
 | ||
| the same main memory, and such machines are now common due to the
 | ||
| ubiquity of multi-core CPUs.  FFTW's multi-threading support allows you
 | ||
| to utilize these additional CPUs transparently from a single program.
 | ||
| However, this does not necessarily translate into performance
 | ||
| gains--when multiple threads/CPUs are employed, there is an overhead
 | ||
| required for synchronization that may outweigh the computational
 | ||
| parallelism.  Therefore, you can only benefit from threads if your
 | ||
| problem is sufficiently large.
 | ||
| 
 | ||
| * Menu:
 | ||
| 
 | ||
| * Installation and Supported Hardware/Software::
 | ||
| * Usage of Multi-threaded FFTW::
 | ||
| * How Many Threads to Use?::
 | ||
| * Thread safety::
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Installation and Supported Hardware/Software,  Next: Usage of Multi-threaded FFTW,  Prev: Multi-threaded FFTW,  Up: Multi-threaded FFTW
 | ||
| 
 | ||
| 5.1 Installation and Supported Hardware/Software
 | ||
| ================================================
 | ||
| 
 | ||
| All of the FFTW threads code is located in the 'threads' subdirectory of
 | ||
| the FFTW package.  On Unix systems, the FFTW threads libraries and
 | ||
| header files can be automatically configured, compiled, and installed
 | ||
| along with the uniprocessor FFTW libraries simply by including
 | ||
| '--enable-threads' in the flags to the 'configure' script (*note
 | ||
| Installation on Unix::), or '--enable-openmp' to use OpenMP
 | ||
| (http://www.openmp.org) threads.
 | ||
| 
 | ||
|    The threads routines require your operating system to have some sort
 | ||
| of shared-memory threads support.  Specifically, the FFTW threads
 | ||
| package works with POSIX threads (available on most Unix variants, from
 | ||
| GNU/Linux to MacOS X) and Win32 threads.  OpenMP threads, which are
 | ||
| supported in many common compilers (e.g., gcc), are also supported, and
 | ||
| may give better performance on some systems.  (OpenMP threads are also
 | ||
| useful if you are employing OpenMP in your own code, in order to
 | ||
| minimize conflicts between threading models.)  If you have a
 | ||
| shared-memory machine that uses a different threads API, it should be a
 | ||
| simple matter of programming to include support for it; see the file
 | ||
| 'threads/threads.c' for more detail.
 | ||
| 
 | ||
|    You can compile FFTW with _both_ '--enable-threads' and
 | ||
| '--enable-openmp' at the same time, since they install libraries with
 | ||
| different names ('fftw3_threads' and 'fftw3_omp', as described below).
 | ||
| However, your programs may only link to _one_ of these two libraries at
 | ||
| a time.
 | ||
| 
 | ||
|    Ideally, of course, you should also have multiple processors in order
 | ||
| to get any benefit from the threaded transforms.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Usage of Multi-threaded FFTW,  Next: How Many Threads to Use?,  Prev: Installation and Supported Hardware/Software,  Up: Multi-threaded FFTW
 | ||
| 
 | ||
| 5.2 Usage of Multi-threaded FFTW
 | ||
| ================================
 | ||
| 
 | ||
| Here, it is assumed that the reader is already familiar with the usage
 | ||
| of the uniprocessor FFTW routines, described elsewhere in this manual.
 | ||
| We only describe what one has to change in order to use the
 | ||
| multi-threaded routines.
 | ||
| 
 | ||
|    First, programs using the parallel transforms should be
 | ||
| linked with '-lfftw3_threads -lfftw3 -lm' on Unix, or '-lfftw3_omp
 | ||
| -lfftw3 -lm' if you compiled with OpenMP. You will also need to link
 | ||
| with whatever library is responsible for threads on your system (e.g.
 | ||
| '-lpthread' on GNU/Linux) or include whatever compiler flag enables
 | ||
| OpenMP (e.g.  '-fopenmp' with gcc).
 | ||
| 
 | ||
|    Second, before calling _any_ FFTW routines, you should call the
 | ||
| function:
 | ||
| 
 | ||
|      int fftw_init_threads(void);
 | ||
| 
 | ||
|    This function, which need only be called once, performs any one-time
 | ||
| initialization required to use threads on your system.  It returns zero
 | ||
| if there was some error (which should not happen under normal
 | ||
| circumstances) and a non-zero value otherwise.
 | ||
| 
 | ||
|    Third, before creating a plan that you want to parallelize, you
 | ||
| should call:
 | ||
| 
 | ||
|      void fftw_plan_with_nthreads(int nthreads);
 | ||
| 
 | ||
|    The 'nthreads' argument indicates the number of threads you want FFTW
 | ||
| to use (or actually, the maximum number).  All plans subsequently
 | ||
| created with any planner routine will use that many threads.  You can
 | ||
| call 'fftw_plan_with_nthreads', create some plans, call
 | ||
| 'fftw_plan_with_nthreads' again with a different argument, and create
 | ||
| some more plans for a new number of threads.  Plans already created
 | ||
| before a call to 'fftw_plan_with_nthreads' are unaffected.  If you pass
 | ||
| an 'nthreads' argument of '1' (the default), threads are disabled for
 | ||
| subsequent plans.
 | ||
| 
 | ||
|    You can determine the current number of threads that the planner can
 | ||
| use by calling:
 | ||
| 
 | ||
|      int fftw_planner_nthreads(void);
 | ||
| 
 | ||
|    With OpenMP, to configure FFTW to use all of the currently running
 | ||
| OpenMP threads (set by 'omp_set_num_threads(nthreads)' or by the
 | ||
| 'OMP_NUM_THREADS' environment variable), you can do:
 | ||
| 'fftw_plan_with_nthreads(omp_get_max_threads())'.  (The 'omp_' OpenMP
 | ||
| functions are declared via '#include <omp.h>'.)
 | ||
| 
 | ||
|    Given a plan, you then execute it as usual with 'fftw_execute(plan)',
 | ||
| and the execution will use the number of threads specified when the plan
 | ||
| was created.  When done, you destroy it as usual with
 | ||
| 'fftw_destroy_plan'.  As described in *note Thread safety::, plan
 | ||
| _execution_ is thread-safe, but plan creation and destruction are _not_:
 | ||
| you should create/destroy plans only from a single thread, but can
 | ||
| safely execute multiple plans in parallel.
 | ||
| 
 | ||
|    There is one additional routine: if you want to get rid of all memory
 | ||
| and other resources allocated internally by FFTW, you can call:
 | ||
| 
 | ||
|      void fftw_cleanup_threads(void);
 | ||
| 
 | ||
|    which is much like the 'fftw_cleanup()' function except that it also
 | ||
| gets rid of threads-related data.  You must _not_ execute any previously
 | ||
| created plans after calling this function.
 | ||
| 
 | ||
|    We should also mention one other restriction: if you save wisdom from
 | ||
| a program using the multi-threaded FFTW, that wisdom _cannot be used_ by
 | ||
| a program using only the single-threaded FFTW (i.e.  not calling
 | ||
| 'fftw_init_threads').  *Note Words of Wisdom-Saving Plans::.
 | ||
| 
 | ||
|    Finally, FFTW provides an optional callback interface that allows you
 | ||
| to replace its parallel threading backend at runtime:
 | ||
| 
 | ||
|      void fftw_threads_set_callback(
 | ||
|          void (*parallel_loop)(void *(*work)(char *), char *jobdata, size_t elsize, int njobs, void *data),
 | ||
|          void *data);
 | ||
| 
 | ||
|    This routine (which is _not_ threadsafe and should generally be
 | ||
| called before creating any FFTW plans) allows you to provide a function
 | ||
| 'parallel_loop' that executes parallel work for FFTW: it should call the
 | ||
| function 'work(jobdata + elsize*i)' for 'i' from '0' to 'njobs-1',
 | ||
| possibly in parallel.  (The 'data' pointer supplied to
 | ||
| 'fftw_threads_set_callback' is passed through to your 'parallel_loop'
 | ||
| function.)  For example, if you link to an FFTW threads library built to
 | ||
| use POSIX threads, but you want it to use OpenMP instead (because you
 | ||
| are using OpenMP elsewhere in your program and want to avoid competing
 | ||
| threads), you can call 'fftw_threads_set_callback' with the callback
 | ||
| function:
 | ||
| 
 | ||
|      void parallel_loop(void *(*work)(char *), char *jobdata, size_t elsize, int njobs, void *data)
 | ||
|      {
 | ||
|      #pragma omp parallel for
 | ||
|          for (int i = 0; i < njobs; ++i)
 | ||
|              work(jobdata + elsize * i);
 | ||
|      }
 | ||
| 
 | ||
|    The same mechanism could be used in order to make FFTW use a
 | ||
| threading backend implemented via Intel TBB, Apple GCD, or Cilk, for
 | ||
| example.
 | ||
| 
 | ||
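To make the callback contract concrete without any particular threading backend, here is a minimal sketch of a serial loop with the same signature as the callback example above. It calls `work` once per job, in order. The names `serial_loop`, `demo_job`, and `demo_work` are hypothetical stand-ins; in real use FFTW supplies `work` and `jobdata`.

```c
#include <stddef.h>

/* Serial fallback with the parallel_loop signature: calls
   work(jobdata + elsize*i) for i = 0 ... njobs-1, one after another. */
void serial_loop(void *(*work)(char *), char *jobdata, size_t elsize,
                 int njobs, void *data)
{
    (void)data;  /* user pointer passed through by the caller; unused here */
    for (int i = 0; i < njobs; ++i)
        work(jobdata + elsize * (size_t)i);
}

/* Hypothetical job record and work function, standing in for FFTW's
   internal ones so that the loop can be exercised on its own. */
struct demo_job { int input; int output; };

void *demo_work(char *job)
{
    struct demo_job *j = (struct demo_job *)job;
    j->output = 2 * j->input;
    return NULL;
}
```

Calling `serial_loop(demo_work, (char *)jobs, sizeof jobs[0], 3, NULL)` on an array of three `demo_job` records fills in each `output` field. A parallel backend has the same shape, with the loop body dispatched to its own workers.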
| 
 | ||
| File: fftw3.info,  Node: How Many Threads to Use?,  Next: Thread safety,  Prev: Usage of Multi-threaded FFTW,  Up: Multi-threaded FFTW
 | ||
| 
 | ||
| 5.3 How Many Threads to Use?
 | ||
| ============================
 | ||
| 
 | ||
| There is a fair amount of overhead involved in synchronizing threads, so
 | ||
| the optimal number of threads to use depends upon the size of the
 | ||
| transform as well as on the number of processors you have.
 | ||
| 
 | ||
|    As a general rule, you don't want to use more threads than you have
 | ||
| processors.  (Using more threads will work, but there will be extra
 | ||
| overhead with no benefit.)  In fact, if the problem size is too small,
 | ||
| you may want to use fewer threads than you have processors.
 | ||
| 
 | ||
|    You will have to experiment with your system to see what level of
 | ||
| parallelization is best for your problem size.  Typically, the problem
 | ||
| will have to involve at least a few thousand data points before threads
 | ||
| become beneficial.  If you plan with 'FFTW_PATIENT', it will
 | ||
| automatically disable threads for sizes that don't benefit from
 | ||
| parallelization.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Thread safety,  Prev: How Many Threads to Use?,  Up: Multi-threaded FFTW
 | ||
| 
 | ||
| 5.4 Thread safety
 | ||
| =================
 | ||
| 
 | ||
| Users writing multi-threaded programs (including OpenMP) must concern
 | ||
| themselves with the "thread safety" of the libraries they use--that is,
 | ||
| whether it is safe to call routines in parallel from multiple threads.
 | ||
| FFTW can be used in such an environment, but some care must be taken
 | ||
| because the planner routines share data (e.g.  wisdom and trigonometric
 | ||
| tables) between calls and plans.
 | ||
| 
 | ||
|    The upshot is that the only thread-safe routine in FFTW is
 | ||
| 'fftw_execute' (and the new-array variants thereof).  All other routines
 | ||
| (e.g.  the planner) should only be called from one thread at a time.
 | ||
| So, for example, you can wrap a semaphore lock around any calls to the
 | ||
| planner; even more simply, you can just create all of your plans from
 | ||
| one thread.  We do not think this should be an important restriction
 | ||
| (FFTW is designed for the situation where the only performance-sensitive
 | ||
| code is the actual execution of the transform), and the benefits of
 | ||
| shared data between plans are great.
 | ||
| 
 | ||
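One way to wrap a lock around planner calls, as suggested above, is sketched below with a POSIX mutex. This is an assumption-laden illustration, not an FFTW facility: `with_planner_lock` is a hypothetical helper, and `planner_call` stands for a user-written function that would call the planner (for example to create or destroy a plan). `count_call` is a stand-in used only to show the calling convention.

```c
#include <pthread.h>

/* One process-wide lock guarding every call into the planner. */
static pthread_mutex_t planner_lock = PTHREAD_MUTEX_INITIALIZER;

/* Run planner_call(arg) while holding the lock and return its result.
   planner_call is a hypothetical user function wrapping the planner
   routines that must not run concurrently. */
void *with_planner_lock(void *(*planner_call)(void *), void *arg)
{
    pthread_mutex_lock(&planner_lock);
    void *result = planner_call(arg);
    pthread_mutex_unlock(&planner_lock);
    return result;
}

/* Stand-in for a planner call: counts how often it was invoked. */
void *count_call(void *arg)
{
    ++*(int *)arg;
    return arg;
}
```

Plan execution does not need to go through such a wrapper, since it is the thread-safe part; only plan creation, destruction, and the other non-execute routines would be routed through it.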
|    Note also that, since the plan is not modified by 'fftw_execute', it
 | ||
| is safe to execute the _same plan_ in parallel by multiple threads.
 | ||
| However, since a given plan operates by default on a fixed array, you
 | ||
| need to use one of the new-array execute functions (*note New-array
 | ||
| Execute Functions::) so that different threads compute the transform of
 | ||
| different data.
 | ||
| 
 | ||
|    (Users should note that these comments only apply to programs using
 | ||
| shared-memory threads or OpenMP. Parallelism using MPI or forked
 | ||
| processes involves a separate address-space and global variables for
 | ||
| each process, and is not susceptible to problems of this sort.)
 | ||
| 
 | ||
|    The FFTW planner is intended to be called from a single thread.  If
 | ||
| you really must call it from multiple threads, you are expected to grab
 | ||
| whatever lock makes sense for your application, with the understanding
 | ||
| that you may be holding that lock for a long time, which is undesirable.
 | ||
| 
 | ||
|    Neither strategy works, however, in the following situation.  The
 | ||
| "application" is structured as a set of "plugins" which are unaware of
 | ||
| each other, and for whatever reason the "plugins" cannot coordinate on
 | ||
| grabbing the lock.  (This is not a technical problem, but an
 | ||
| organizational one.  The "plugins" are written by independent agents,
 | ||
| and from the perspective of each plugin's author, each plugin is using
 | ||
| FFTW correctly from a single thread.)  To cope with this situation,
 | ||
| starting from FFTW-3.3.5, FFTW supports an API to make the planner
 | ||
| thread-safe:
 | ||
| 
 | ||
|      void fftw_make_planner_thread_safe(void);
 | ||
| 
 | ||
|    This call operates by brute force: It just installs a hook that wraps
 | ||
| a lock (chosen by us) around all planner calls.  So there is no magic
 | ||
| and you get the worst of all worlds.  The planner is still
 | ||
| single-threaded, but you cannot choose which lock to use.  The planner
 | ||
| still holds the lock for a long time, but you cannot impose a timeout on
 | ||
| lock acquisition.  As of FFTW-3.3.5 and FFTW-3.3.6, this call does not
 | ||
| work when using OpenMP as threading substrate.  (Suggestions on what to
 | ||
| do about this bug are welcome.)  _Do not use
 | ||
| 'fftw_make_planner_thread_safe' unless there is no other choice,_ such
 | ||
| as in the application/plugin situation.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Distributed-memory FFTW with MPI,  Next: Calling FFTW from Modern Fortran,  Prev: Multi-threaded FFTW,  Up: Top
 | ||
| 
 | ||
| 6 Distributed-memory FFTW with MPI
 | ||
| **********************************
 | ||
| 
 | ||
| In this chapter we document the parallel FFTW routines for parallel
 | ||
| systems supporting the MPI message-passing interface.  Unlike the
 | ||
| shared-memory threads described in the previous chapter, MPI allows you
 | ||
| to use _distributed-memory_ parallelism, where each CPU has its own
 | ||
| separate memory, and which can scale up to clusters of many thousands of
 | ||
| processors.  This capability comes at a price, however: each process
 | ||
| only stores a _portion_ of the data to be transformed, which means that
 | ||
| the data structures and programming-interface are quite different from
 | ||
| the serial or threads versions of FFTW.
 | ||
| 
 | ||
|    Distributed-memory parallelism is especially useful when you are
 | ||
| transforming arrays so large that they do not fit into the memory of a
 | ||
| single processor.  The storage per-process required by FFTW's MPI
 | ||
| routines is proportional to the total array size divided by the number
 | ||
| of processes.  Conversely, distributed-memory parallelism can easily
 | ||
| pose an unacceptably high communications overhead for small problems;
 | ||
| the threshold problem size for which parallelism becomes advantageous
 | ||
| will depend on the precise problem you are interested in, your hardware,
 | ||
| and your MPI implementation.
 | ||
| 
 | ||
|    A note on terminology: in MPI, you divide the data among a set of
 | ||
| "processes" which each run in their own memory address space.
 | ||
| Generally, each process runs on a different physical processor, but this
 | ||
| is not required.  A set of processes in MPI is described by an opaque
 | ||
| data structure called a "communicator," the most common of which is the
 | ||
| predefined communicator 'MPI_COMM_WORLD' which refers to _all_
 | ||
| processes.  For more information on these and other concepts common to
 | ||
| all MPI programs, we refer the reader to the documentation at the MPI
 | ||
| home page (http://www.mcs.anl.gov/research/projects/mpi/).
 | ||
| 
 | ||
|    We assume in this chapter that the reader is familiar with the usage
 | ||
| of the serial (uniprocessor) FFTW, and focus only on the concepts new to
 | ||
| the MPI interface.
 | ||
| 
 | ||
| * Menu:
 | ||
| 
 | ||
| * FFTW MPI Installation::
 | ||
| * Linking and Initializing MPI FFTW::
 | ||
| * 2d MPI example::
 | ||
| * MPI Data Distribution::
 | ||
| * Multi-dimensional MPI DFTs of Real Data::
 | ||
| * Other Multi-dimensional Real-data MPI Transforms::
 | ||
| * FFTW MPI Transposes::
 | ||
| * FFTW MPI Wisdom::
 | ||
| * Avoiding MPI Deadlocks::
 | ||
| * FFTW MPI Performance Tips::
 | ||
| * Combining MPI and Threads::
 | ||
| * FFTW MPI Reference::
 | ||
| * FFTW MPI Fortran Interface::
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: FFTW MPI Installation,  Next: Linking and Initializing MPI FFTW,  Prev: Distributed-memory FFTW with MPI,  Up: Distributed-memory FFTW with MPI
 | ||
| 
 | ||
| 6.1 FFTW MPI Installation
 | ||
| =========================
 | ||
| 
 | ||
| All of the FFTW MPI code is located in the 'mpi' subdirectory of the
 | ||
| FFTW package.  On Unix systems, the FFTW MPI libraries and header files
 | ||
| are automatically configured, compiled, and installed along with the
 | ||
| uniprocessor FFTW libraries simply by including '--enable-mpi' in the
 | ||
| flags to the 'configure' script (*note Installation on Unix::).
 | ||
| 
 | ||
|    Any implementation of the MPI standard, version 1 or later, should
 | ||
| work with FFTW. The 'configure' script will attempt to automatically
 | ||
| detect how to compile and link code using your MPI implementation.  In
 | ||
| some cases, especially if you have multiple different MPI
 | ||
| implementations installed or have an unusual MPI software package, you
 | ||
| may need to provide this information explicitly.
 | ||
| 
 | ||
|    Most commonly, one compiles MPI code by invoking a special compiler
 | ||
| command, typically 'mpicc' for C code.  The 'configure' script knows the
 | ||
| most common names for this command, but you can specify the MPI
 | ||
| compilation command explicitly by setting the 'MPICC' variable, as in
 | ||
| './configure MPICC=mpicc ...'.
 | ||
| 
 | ||
|    If, instead of a special compiler command, you need to link a certain
 | ||
| library, you can specify the link command via the 'MPILIBS' variable, as
 | ||
| in './configure MPILIBS=-lmpi ...'.  Note that if your MPI library is
 | ||
| installed in a non-standard location (one the compiler does not know
 | ||
| about by default), you may also have to specify the location of the
 | ||
| library and header files via 'LDFLAGS' and 'CPPFLAGS' variables,
 | ||
| respectively, as in './configure LDFLAGS=-L/path/to/mpi/libs
 | ||
| CPPFLAGS=-I/path/to/mpi/include ...'.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Linking and Initializing MPI FFTW,  Next: 2d MPI example,  Prev: FFTW MPI Installation,  Up: Distributed-memory FFTW with MPI
 | ||
| 
 | ||
| 6.2 Linking and Initializing MPI FFTW
 | ||
| =====================================
 | ||
| 
 | ||
| Programs using the MPI FFTW routines should be linked with '-lfftw3_mpi
 | ||
| -lfftw3 -lm' on Unix in double precision, '-lfftw3f_mpi -lfftw3f -lm' in
 | ||
| single precision, and so on (*note Precision::).  You will also need to
 | ||
| link with whatever library is responsible for MPI on your system; in
 | ||
| most MPI implementations, there is a special compiler alias named
 | ||
| 'mpicc' to compile and link MPI code.
 | ||
| 
 | ||
|    Before calling any FFTW routines except possibly 'fftw_init_threads'
 | ||
| (*note Combining MPI and Threads::), but after calling 'MPI_Init', you
 | ||
| should call the function:
 | ||
| 
 | ||
|      void fftw_mpi_init(void);
 | ||
| 
 | ||
|    If, at the end of your program, you want to get rid of all memory and
 | ||
| other resources allocated internally by FFTW, for both the serial and
 | ||
| MPI routines, you can call:
 | ||
| 
 | ||
|      void fftw_mpi_cleanup(void);
 | ||
| 
 | ||
|    which is much like the 'fftw_cleanup()' function except that it also
 | ||
| gets rid of FFTW's MPI-related data.  You must _not_ execute any
 | ||
| previously created plans after calling this function.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: 2d MPI example,  Next: MPI Data Distribution,  Prev: Linking and Initializing MPI FFTW,  Up: Distributed-memory FFTW with MPI
 | ||
| 
 | ||
| 6.3 2d MPI example
 | ||
| ==================
 | ||
| 
 | ||
| Before we document the FFTW MPI interface in detail, we begin with a
 | ||
| simple example outlining how one would perform a two-dimensional 'N0' by
 | ||
| 'N1' complex DFT.
 | ||
| 
 | ||
|      #include <fftw3-mpi.h>
 | ||
| 
 | ||
|      int main(int argc, char **argv)
 | ||
|      {
 | ||
|          const ptrdiff_t N0 = ..., N1 = ...;
 | ||
|          fftw_plan plan;
 | ||
|          fftw_complex *data;
 | ||
|          ptrdiff_t alloc_local, local_n0, local_0_start, i, j;
 | ||
| 
 | ||
|          MPI_Init(&argc, &argv);
 | ||
|          fftw_mpi_init();
 | ||
| 
 | ||
|          /* get local data size and allocate */
 | ||
|          alloc_local = fftw_mpi_local_size_2d(N0, N1, MPI_COMM_WORLD,
 | ||
|                                               &local_n0, &local_0_start);
 | ||
|          data = fftw_alloc_complex(alloc_local);
 | ||
| 
 | ||
|          /* create plan for in-place forward DFT */
 | ||
|          plan = fftw_mpi_plan_dft_2d(N0, N1, data, data, MPI_COMM_WORLD,
 | ||
|                                      FFTW_FORWARD, FFTW_ESTIMATE);
 | ||
| 
 | ||
|          /* initialize data to some function my_function(x,y) */
 | ||
|          for (i = 0; i < local_n0; ++i) for (j = 0; j < N1; ++j)
 | ||
|             data[i*N1 + j] = my_function(local_0_start + i, j);
 | ||
| 
 | ||
|          /* compute transforms, in-place, as many times as desired */
 | ||
|          fftw_execute(plan);
 | ||
| 
 | ||
|          fftw_destroy_plan(plan);
 | ||
| 
 | ||
|          MPI_Finalize();
 | ||
|      }
 | ||
| 
 | ||
|    As can be seen above, the MPI interface follows the same basic style
 | ||
| of allocate/plan/execute/destroy as the serial FFTW routines.  All of
 | ||
| the MPI-specific routines are prefixed with 'fftw_mpi_' instead of
 | ||
| 'fftw_'.  There are a few important differences, however:
 | ||
| 
 | ||
|    First, we must call 'fftw_mpi_init()' after calling 'MPI_Init'
 | ||
| (required in all MPI programs) and before calling any other 'fftw_mpi_'
 | ||
| routine.
 | ||
| 
 | ||
|    Second, when we create the plan with 'fftw_mpi_plan_dft_2d',
 | ||
| analogous to 'fftw_plan_dft_2d', we pass an additional argument: the
 | ||
| communicator, indicating which processes will participate in the
 | ||
| transform (here 'MPI_COMM_WORLD', indicating all processes).  Whenever
 | ||
| you create, execute, or destroy a plan for an MPI transform, you must
 | ||
| call the corresponding FFTW routine on _all_ processes in the
 | ||
| communicator for that transform.  (That is, these are _collective_
 | ||
| calls.)  Note that the plan for the MPI transform uses the standard
 | ||
| 'fftw_execute' and 'fftw_destroy_plan' routines (on the other hand, there are
 | ||
| MPI-specific new-array execute functions documented below).
 | ||
| 
 | ||
|    Third, all of the FFTW MPI routines take 'ptrdiff_t' arguments
 | ||
| instead of 'int' as for the serial FFTW. 'ptrdiff_t' is a standard C
 | ||
| integer type which is (at least) 32 bits wide on a 32-bit machine and 64
 | ||
| bits wide on a 64-bit machine.  This is to make it easy to specify very
 | ||
| large parallel transforms on a 64-bit machine.  (You can specify 64-bit
 | ||
| transform sizes in the serial FFTW, too, but only by using the 'guru64'
 | ||
| planner interface.  *Note 64-bit Guru Interface::.)
 | ||
| 
 | ||
|    Fourth, and most importantly, you don't allocate the entire
 | ||
| two-dimensional array on each process.  Instead, you call
 | ||
| 'fftw_mpi_local_size_2d' to find out what _portion_ of the array resides
 | ||
| on each process, and how much space to allocate.  Here, the portion of
 | ||
| the array on each process is a 'local_n0' by 'N1' slice of the total
 | ||
| array, starting at index 'local_0_start'.  The total number of
 | ||
| 'fftw_complex' numbers to allocate is given by the 'alloc_local' return
 | ||
| value, which _may_ be greater than 'local_n0 * N1' (in case some
 | ||
| intermediate calculations require additional storage).  The data
 | ||
| distribution in FFTW's MPI interface is described in more detail by the
 | ||
| next section.
 | ||
| 
 | ||
|    Given the portion of the array that resides on the local process, it
 | ||
| is straightforward to initialize the data (here to a function
 | ||
| 'my_function') and otherwise manipulate it.  Of course, at the end of the
 | ||
| program you may want to output the data somehow, but synchronizing this
 | ||
| output is up to you and is beyond the scope of this manual.  (One good
 | ||
| way to output a large multi-dimensional distributed array in MPI to a
 | ||
| portable binary file is to use the free HDF5 library; see the HDF home
 | ||
| page (http://www.hdfgroup.org/).)
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: MPI Data Distribution,  Next: Multi-dimensional MPI DFTs of Real Data,  Prev: 2d MPI example,  Up: Distributed-memory FFTW with MPI
 | ||
| 
 | ||
| 6.4 MPI Data Distribution
 | ||
| =========================
 | ||
| 
 | ||
| The most important concept to understand in using FFTW's MPI interface
 | ||
| is the data distribution.  With a serial or multithreaded FFT, all of
 | ||
| the inputs and outputs are stored as a single contiguous chunk of
 | ||
| memory.  With a distributed-memory FFT, the inputs and outputs are
 | ||
| broken into disjoint blocks, one per process.
 | ||
| 
 | ||
|    In particular, FFTW uses a _1d block distribution_ of the data,
 | ||
| distributed along the _first dimension_.  For example, if you want to
 | ||
| perform a 100 x 200 complex DFT, distributed over 4 processes, each
 | ||
| process will get a 25 x 200 slice of the data.  That is, process 0 will
 | ||
| get rows 0 through 24, process 1 will get rows 25 through 49, process 2
 | ||
| will get rows 50 through 74, and process 3 will get rows 75 through 99.
 | ||
| If you take the same array but distribute it over 3 processes, then it
 | ||
| is not evenly divisible so the different processes will have unequal
 | ||
| chunks.  FFTW's default choice in this case is to assign 34 rows to
 | ||
| processes 0 and 1, and 32 rows to process 2.
 | ||
| 
 | ||
|    FFTW provides several 'fftw_mpi_local_size' routines that you can
 | ||
| call to find out what portion of an array is stored on the current
 | ||
| process.  In most cases, you should use the default block sizes picked
 | ||
| by FFTW, but it is also possible to specify your own block size.  For
 | ||
| example, with a 100 x 200 array on three processes, you can tell FFTW to
 | ||
| use a block size of 40, which would assign 40 rows to processes 0 and 1,
 | ||
| and 20 rows to process 2.  FFTW's default is to divide the data equally
 | ||
| among the processes if possible, and as best it can otherwise.  The rows
 | ||
| are always assigned in "rank order," i.e.  process 0 gets the first
 | ||
| block of rows, then process 1, and so on.  (You can change this by using
 | ||
| 'MPI_Comm_split' to create a new communicator with re-ordered
 | ||
| processes.)  However, you should always call the 'fftw_mpi_local_size'
 | ||
| routines, if possible, rather than trying to predict FFTW's distribution
 | ||
| choices.
 | ||
| 
 | ||
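The row counts quoted above follow from simple block arithmetic, sketched here in C. This only illustrates the numbers in the text: it assumes the default block is the row count divided by the number of processes, rounded up. `default_block` and `block_slice` are hypothetical names, and real programs should still obtain these values from the 'fftw_mpi_local_size' routines.

```c
#include <stddef.h>

/* Assumed default block size: n0 / nprocs, rounded up. */
ptrdiff_t default_block(ptrdiff_t n0, int nprocs)
{
    return (n0 + nprocs - 1) / nprocs;
}

/* Rows held by process 'rank' when n0 rows are dealt out in rank order
   in blocks of 'block' rows.  The start index is only meaningful when
   *local_n0 is greater than zero. */
void block_slice(ptrdiff_t n0, ptrdiff_t block, int rank,
                 ptrdiff_t *local_n0, ptrdiff_t *local_0_start)
{
    ptrdiff_t start = block * rank;
    ptrdiff_t left = n0 - start;   /* rows not claimed by lower ranks */
    *local_0_start = start;
    *local_n0 = left <= 0 ? 0 : (left < block ? left : block);
}
```

With 100 rows on 3 processes the assumed default block is 34, giving 34, 34, and 32 rows; with a block size of 40 the split is 40, 40, and 20, matching the examples in the text.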
|    In particular, it is critical that you allocate the storage size that
 | ||
| is returned by 'fftw_mpi_local_size', which is _not_ necessarily the
 | ||
| size of the local slice of the array.  The reason is that intermediate
 | ||
| steps of FFTW's algorithms involve transposing the array and
 | ||
| redistributing the data, so at these intermediate steps FFTW may require
 | ||
| more local storage space (albeit always proportional to the total size
 | ||
| divided by the number of processes).  The 'fftw_mpi_local_size'
 | ||
| functions know how much storage is required for these intermediate steps
 | ||
| and tell you the correct amount to allocate.
 | ||
| 
 | ||
| * Menu:
 | ||
| 
 | ||
| * Basic and advanced distribution interfaces::
 | ||
| * Load balancing::
 | ||
| * Transposed distributions::
 | ||
| * One-dimensional distributions::
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Basic and advanced distribution interfaces,  Next: Load balancing,  Prev: MPI Data Distribution,  Up: MPI Data Distribution
 | ||
| 
 | ||
| 6.4.1 Basic and advanced distribution interfaces
 | ||
| ------------------------------------------------
 | ||
| 
 | ||
| As with the planner interface, the 'fftw_mpi_local_size' distribution
 | ||
| interface is broken into basic and advanced ('_many') interfaces, where
 | ||
| the latter allows you to specify the block size manually and also to
 | ||
| request block sizes when computing multiple transforms simultaneously.
 | ||
| These functions are documented more exhaustively by the FFTW MPI
 | ||
| Reference, but we summarize the basic ideas here using a couple of
 | ||
| two-dimensional examples.
 | ||
| 
 | ||
|    For the 100 x 200 complex-DFT example, above, we would find the
 | ||
| distribution by calling the following function in the basic interface:
 | ||
| 
 | ||
|      ptrdiff_t fftw_mpi_local_size_2d(ptrdiff_t n0, ptrdiff_t n1, MPI_Comm comm,
 | ||
|                                       ptrdiff_t *local_n0, ptrdiff_t *local_0_start);
 | ||
| 
 | ||
|    Given the total size of the data to be transformed (here, 'n0 = 100'
 | ||
| and 'n1 = 200') and an MPI communicator ('comm'), this function provides
 | ||
| three numbers.
 | ||
| 
 | ||
|    First, it describes the shape of the local data: the current process
 | ||
| should store a 'local_n0' by 'n1' slice of the overall dataset, in
 | ||
| row-major order ('n1' dimension contiguous), starting at index
 | ||
| 'local_0_start'.  That is, if the total dataset is viewed as a 'n0' by
 | ||
| 'n1' matrix, the current process should store the rows 'local_0_start'
 | ||
| to 'local_0_start+local_n0-1'.  Obviously, if you are running with only
 | ||
| a single MPI process, that process will store the entire array:
 | ||
| 'local_0_start' will be zero and 'local_n0' will be 'n0'.  *Note
 | ||
| Row-major Format::.
 | ||
| 
 | ||
|    Second, the return value is the total number of data elements (e.g.,
 | ||
| complex numbers for a complex DFT) that should be allocated for the
 | ||
| input and output arrays on the current process (ideally with
 | ||
| 'fftw_malloc' or an 'fftw_alloc' function, to ensure optimal alignment).
 | ||
| It might seem that this should always be equal to 'local_n0 * n1', but
 | ||
| this is _not_ the case.  FFTW's distributed FFT algorithms require data
 | ||
| redistributions at intermediate stages of the transform, and in some
 | ||
| circumstances this may require slightly larger local storage.  This is
 | ||
| discussed in more detail below, under *note Load balancing::.
 | ||
| 
 | ||
|    The advanced-interface 'local_size' function for multidimensional
 | ||
| transforms returns the same three things ('local_n0', 'local_0_start',
 | ||
| and the total number of elements to allocate), but takes more inputs:
 | ||
| 
 | ||
|      ptrdiff_t fftw_mpi_local_size_many(int rnk, const ptrdiff_t *n,
 | ||
|                                         ptrdiff_t howmany,
 | ||
|                                         ptrdiff_t block0,
 | ||
|                                         MPI_Comm comm,
 | ||
|                                         ptrdiff_t *local_n0,
 | ||
|                                         ptrdiff_t *local_0_start);
 | ||
| 
 | ||
|    The two-dimensional case above corresponds to 'rnk = 2' and an array
 | ||
| 'n' of length 2 with 'n[0] = n0' and 'n[1] = n1'.  This routine is for
 | ||
| any 'rnk > 1'; one-dimensional transforms have their own interface
 | ||
| because they work slightly differently, as discussed below.
 | ||
| 
 | ||
|    First, the advanced interface allows you to perform multiple
 | ||
| transforms at once, of interleaved data, as specified by the 'howmany'
 | ||
| parameter.  ('howmany' is 1 for a single transform.)
 | ||
| 
 | ||
|    Second, here you can specify your desired block size in the 'n0'
 | ||
| dimension, 'block0'.  To use FFTW's default block size, pass
 | ||
| 'FFTW_MPI_DEFAULT_BLOCK' (0) for 'block0'.  Otherwise, on 'P' processes,
 | ||
| FFTW will return 'local_n0' equal to 'block0' on the first 'n0 / block0'
 | ||
| processes (rounded down), return 'local_n0' equal to 'n0 - block0 * (n0 /
 | ||
| block0)' on the next process, and 'local_n0' equal to zero on any
 | ||
| remaining processes.  In general, we recommend using the default block
 | ||
| size (which corresponds to 'n0 / P', rounded up).
 | ||
| 
 | ||
|    For example, suppose you have 'P = 4' processes and 'n0 = 21'.  The
 | ||
| default will be a block size of '6', which will give 'local_n0 = 6' on
 | ||
| the first three processes and 'local_n0 = 3' on the last process.
 | ||
| Instead, however, you could specify 'block0 = 5' if you wanted, which
 | ||
| would give 'local_n0 = 5' on processes 0 to 2, 'local_n0 = 6' on process
 | ||
| 3.  (This choice, while it may look superficially more "balanced," has
 | ||
| the same critical path as FFTW's default but requires more
 | ||
| communications.)
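
The block rule can be sketched in plain C without calling FFTW.  This is only an illustration: the helper names ('default_block', 'block_local_start', 'block_local_n0') are hypothetical and are not part of the FFTW API, and the sketch assumes that each process holds up to 'block' consecutive rows and that 'block * P >= n0', so that every row has an owner.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the default block size, n0 / P rounded up. */
ptrdiff_t default_block(ptrdiff_t n0, ptrdiff_t P)
{
    assert(n0 > 0 && P > 0);
    return (n0 + P - 1) / P;
}

/* First global row stored on process `rank`, clamped to n0 so that
   processes past the end of the data own an empty range. */
ptrdiff_t block_local_start(ptrdiff_t n0, ptrdiff_t block, ptrdiff_t rank)
{
    assert(n0 > 0 && block > 0 && rank >= 0);
    ptrdiff_t start = rank * block;
    return start < n0 ? start : n0;
}

/* Number of rows stored on process `rank`: a full block, the
   remainder, or zero. */
ptrdiff_t block_local_n0(ptrdiff_t n0, ptrdiff_t block, ptrdiff_t rank)
{
    ptrdiff_t start = block_local_start(n0, block, rank);
    ptrdiff_t left = n0 - start;
    return left < block ? left : block;
}
```

For 'n0 = 21' on 'P = 4' processes, 'default_block' gives 6, and 'block_local_n0' gives 6, 6, 6 and 3 rows for ranks 0 to 3, which matches the default distribution described above.  In a real program these values should come from 'fftw_mpi_local_size_many' rather than from such a helper.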
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Load balancing,  Next: Transposed distributions,  Prev: Basic and advanced distribution interfaces,  Up: MPI Data Distribution
 | ||
| 
 | ||
| 6.4.2 Load balancing
 | ||
| --------------------
 | ||
| 
 | ||
| Ideally, when you parallelize a transform over some P processes, each
 | ||
| process should end up with work that takes equal time.  Otherwise, all
 | ||
| of the processes end up waiting on whichever process is slowest.  This
 | ||
| goal is known as "load balancing."  In this section, we describe the
 | ||
| circumstances under which FFTW is able to load-balance well, and in
 | ||
| particular how you should choose your transform size in order to load
 | ||
| balance.
 | ||
| 
 | ||
|    Load balancing is especially difficult when you are parallelizing
 | ||
| over heterogeneous machines; for example, if one of your processors is an
 | ||
| old 486 and another is a Pentium IV, obviously you should give the
 | ||
| Pentium more work to do than the 486 since the latter is much slower.
 | ||
| FFTW does not deal with this problem, however--it assumes that your
 | ||
| processes run on hardware of comparable speed, and that the goal is
 | ||
| therefore to divide the problem as equally as possible.
 | ||
| 
 | ||
|    For a multi-dimensional complex DFT, FFTW can divide the problem
 | ||
| equally among the processes if: (i) the _first_ dimension 'n0' is
 | ||
| divisible by P; and (ii), the _product_ of the subsequent dimensions is
 | ||
| divisible by P. (For the advanced interface, where you can specify
 | ||
| multiple simultaneous transforms via some "vector" length 'howmany', a
 | ||
| factor of 'howmany' is included in the product of the subsequent
 | ||
| dimensions.)
 | ||
| 
 | ||
|    For a one-dimensional complex DFT, the length 'N' of the data should
 | ||
| be divisible by P _squared_ to be able to divide the problem equally
 | ||
| among the processes.
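
The divisibility conditions above are easy to check before choosing a transform size.  The following is a minimal sketch with hypothetical helper names ('balanced_multidim', 'balanced_1d'); it only restates the two conditions and does not query FFTW.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Multi-dimensional case: n[0] must be divisible by P, and so must
   howmany * n[1] * ... * n[rnk-1].  The product is reduced modulo P
   at each step to avoid overflow for large sizes. */
bool balanced_multidim(int rnk, const ptrdiff_t *n, ptrdiff_t howmany,
                       ptrdiff_t P)
{
    assert(rnk > 1 && howmany > 0 && P > 0);
    ptrdiff_t rest = howmany % P;
    for (int d = 1; d < rnk; ++d)
        rest = (rest * (n[d] % P)) % P;
    return n[0] % P == 0 && rest == 0;
}

/* One-dimensional case: N must be divisible by P squared. */
bool balanced_1d(ptrdiff_t N, ptrdiff_t P)
{
    assert(N > 0 && P > 0);
    return N % (P * P) == 0;
}
```

For instance, a 100 x 200 transform satisfies both conditions on 4 processes but not on 3, since 100 is not divisible by 3.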
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Transposed distributions,  Next: One-dimensional distributions,  Prev: Load balancing,  Up: MPI Data Distribution
 | ||
| 
 | ||
| 6.4.3 Transposed distributions
 | ||
| ------------------------------
 | ||
| 
 | ||
| Internally, FFTW's MPI transform algorithms work by first computing
 | ||
| transforms of the data local to each process, then by globally
 | ||
| _transposing_ the data in some fashion to redistribute the data among
 | ||
| the processes, transforming the new data local to each process, and
 | ||
| transposing back.  For example, a two-dimensional 'n0' by 'n1' array,
 | ||
| distributed across the 'n0' dimension, is transformed by: (i)
 | ||
| transforming the 'n1' dimension, which is local to each process; (ii)
 | ||
| transposing to an 'n1' by 'n0' array, distributed across the 'n1'
 | ||
| dimension; (iii) transforming the 'n0' dimension, which is now local to
 | ||
| each process; (iv) transposing back.
 | ||
| 
 | ||
|    However, in many applications it is acceptable to compute a
 | ||
| multidimensional DFT whose results are produced in transposed order
 | ||
| (e.g., 'n1' by 'n0' in two dimensions).  This provides a significant
 | ||
| performance advantage, because it means that the final transposition
 | ||
| step can be omitted.  FFTW supports this optimization, which you specify
 | ||
| by passing the flag 'FFTW_MPI_TRANSPOSED_OUT' to the planner routines.
 | ||
| To compute the inverse transform of transposed output, you specify
 | ||
| 'FFTW_MPI_TRANSPOSED_IN' to tell it that the input is transposed.  In
 | ||
| this section, we explain how to interpret the output format of such a
 | ||
| transform.
 | ||
| 
 | ||
|    Suppose you are transforming multi-dimensional data with (at
 | ||
| least two) dimensions n[0] x n[1] x n[2] x ...  x n[d-1] .  As always,
 | ||
| it is distributed along the first dimension n[0] .  Now, if we compute
 | ||
| its DFT with the 'FFTW_MPI_TRANSPOSED_OUT' flag, the resulting output
 | ||
| data are stored with the first _two_ dimensions transposed: n[1] x n[0]
 | ||
| x n[2] x ...  x n[d-1] , distributed along the n[1] dimension.
 | ||
| Conversely, if we take the n[1] x n[0] x n[2] x ...  x n[d-1] data and
 | ||
| transform it with the 'FFTW_MPI_TRANSPOSED_IN' flag, then the format
 | ||
| goes back to the original n[0] x n[1] x n[2] x ...  x n[d-1] array.
 | ||
| 
 | ||
|    There are two ways to find the portion of the transposed array that
 | ||
| resides on the current process.  First, you can simply call the
 | ||
| appropriate 'local_size' function, passing n[1] x n[0] x n[2] x ...  x
 | ||
| n[d-1] (the transposed dimensions).  This would mean calling the
 | ||
| 'local_size' function twice, once for the transposed and once for the
 | ||
| non-transposed dimensions.  Alternatively, you can call one of the
 | ||
| 'local_size_transposed' functions, which returns both the non-transposed
 | ||
| and transposed data distribution from a single call.  For example, for a
 | ||
| 3d transform with transposed output (or input), you might call:
 | ||
| 
 | ||
|      ptrdiff_t fftw_mpi_local_size_3d_transposed(
 | ||
|                      ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2, MPI_Comm comm,
 | ||
|                      ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
 | ||
|                      ptrdiff_t *local_n1, ptrdiff_t *local_1_start);
 | ||
| 
 | ||
|    Here, 'local_n0' and 'local_0_start' give the size and starting index
 | ||
| of the 'n0' dimension for the _non_-transposed data, as in the previous
 | ||
| sections.  For _transposed_ data (e.g.  the output for
 | ||
| 'FFTW_MPI_TRANSPOSED_OUT'), 'local_n1' and 'local_1_start' give the size
 | ||
| and starting index of the 'n1' dimension, which is the first dimension
 | ||
| of the transposed data ('n1' by 'n0' by 'n2').
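
To make the two layouts concrete, here is a small sketch of the local array offsets for a 3d transform.  The function names are hypothetical; the sketch assumes row-major storage as described above and that the caller has already checked that the requested global index lies in the range owned by the current process.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of global element (i0, i1, i2) in the non-transposed local
   array, which is local_n0 x n1 x n2 in row-major order. */
ptrdiff_t offset_normal(ptrdiff_t i0, ptrdiff_t i1, ptrdiff_t i2,
                        ptrdiff_t local_0_start,
                        ptrdiff_t n1, ptrdiff_t n2)
{
    assert(i0 >= local_0_start);
    return ((i0 - local_0_start) * n1 + i1) * n2 + i2;
}

/* Offset of the same element in the transposed local array, which is
   local_n1 x n0 x n2 in row-major order. */
ptrdiff_t offset_transposed(ptrdiff_t i0, ptrdiff_t i1, ptrdiff_t i2,
                            ptrdiff_t local_1_start,
                            ptrdiff_t n0, ptrdiff_t n2)
{
    assert(i1 >= local_1_start);
    return ((i1 - local_1_start) * n0 + i0) * n2 + i2;
}
```

The only difference between the two is which of the first two indices is made local and which extent ('n1' or 'n0') multiplies it; the trailing 'n2' dimension is untouched by the transposition.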
 | ||
| 
 | ||
|    (Note that 'FFTW_MPI_TRANSPOSED_IN' is completely equivalent to
 | ||
| performing 'FFTW_MPI_TRANSPOSED_OUT' and passing the first two
 | ||
| dimensions to the planner in reverse order, or vice versa.  If you pass
 | ||
| _both_ the 'FFTW_MPI_TRANSPOSED_IN' and 'FFTW_MPI_TRANSPOSED_OUT' flags,
 | ||
| it is equivalent to swapping the first two dimensions passed to the
 | ||
| planner and passing _neither_ flag.)
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: One-dimensional distributions,  Prev: Transposed distributions,  Up: MPI Data Distribution
 | ||
| 
 | ||
| 6.4.4 One-dimensional distributions
 | ||
| -----------------------------------
 | ||
| 
 | ||
| For one-dimensional distributed DFTs using FFTW, matters are slightly
 | ||
| more complicated because the data distribution is more closely tied to
 | ||
| how the algorithm works.  In particular, you can no longer pass an
 | ||
| arbitrary block size and must accept FFTW's default; also, the block
 | ||
| sizes may be different for input and output.  Moreover, the data
 | ||
| distribution depends on the flags and transform direction, in order for
 | ||
| forward and backward transforms to work correctly.
 | ||
| 
 | ||
|      ptrdiff_t fftw_mpi_local_size_1d(ptrdiff_t n0, MPI_Comm comm,
 | ||
|                      int sign, unsigned flags,
 | ||
|                      ptrdiff_t *local_ni, ptrdiff_t *local_i_start,
 | ||
|                      ptrdiff_t *local_no, ptrdiff_t *local_o_start);
 | ||
| 
 | ||
|    This function computes the data distribution for a 1d transform of
 | ||
| size 'n0' with the given transform 'sign' and 'flags'.  Both input and
 | ||
| output data use block distributions.  The input on the current process
 | ||
| will consist of 'local_ni' numbers starting at index 'local_i_start';
 | ||
| e.g.  if only a single process is used, then 'local_ni' will be 'n0' and
 | ||
| 'local_i_start' will be '0'.  Similarly for the output, with 'local_no'
 | ||
| numbers starting at index 'local_o_start'.  The return value of
 | ||
| 'fftw_mpi_local_size_1d' will be the total number of elements to
 | ||
| allocate on the current process (which might be slightly larger than the
 | ||
| local size due to intermediate steps in the algorithm).
 | ||
| 
 | ||
|    As mentioned above (*note Load balancing::), the data will be divided
 | ||
| equally among the processes if 'n0' is divisible by the _square_ of the
 | ||
| number of processes.  In this case, 'local_ni' will equal 'local_no'.
 | ||
| Otherwise, they may be different.
 | ||
| 
 | ||
|    For some applications, such as convolutions, the order of the output
 | ||
| data is irrelevant.  In this case, performance can be improved by
 | ||
| specifying that the output data be stored in an FFTW-defined "scrambled"
 | ||
| format.  (In particular, this is the analogue of transposed output in
 | ||
| the multidimensional case: scrambled output saves a communications
 | ||
| step.)  If you pass 'FFTW_MPI_SCRAMBLED_OUT' in the flags, then the
 | ||
| output is stored in this (undocumented) scrambled order.  Conversely, to
 | ||
| perform the inverse transform of data in scrambled order, pass the
 | ||
| 'FFTW_MPI_SCRAMBLED_IN' flag.
 | ||
| 
 | ||
|    In MPI FFTW, only composite sizes 'n0' can be parallelized; we have
 | ||
| not yet implemented a parallel algorithm for large prime sizes.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Multi-dimensional MPI DFTs of Real Data,  Next: Other Multi-dimensional Real-data MPI Transforms,  Prev: MPI Data Distribution,  Up: Distributed-memory FFTW with MPI
 | ||
| 
 | ||
| 6.5 Multi-dimensional MPI DFTs of Real Data
 | ||
| ===========================================
 | ||
| 
 | ||
| FFTW's MPI interface also supports multi-dimensional DFTs of real data,
 | ||
| similar to the serial r2c and c2r interfaces.  (Parallel one-dimensional
 | ||
| real-data DFTs are not currently supported; you must use a complex
 | ||
| transform and set the imaginary parts of the inputs to zero.)
 | ||
| 
 | ||
|    The key points to understand for r2c and c2r MPI transforms (compared
 | ||
| to the MPI complex DFTs or the serial r2c/c2r transforms), are:
 | ||
| 
 | ||
|    * Just as for serial transforms, r2c/c2r DFTs transform n[0] x n[1] x
 | ||
|      n[2] x ...  x n[d-1] real data to/from n[0] x n[1] x n[2] x ...  x
 | ||
|      (n[d-1]/2 + 1) complex data: the last dimension of the complex data
 | ||
|      is cut in half (rounded down), plus one.  As for the serial
 | ||
|      transforms, the sizes you pass to the 'plan_dft_r2c' and
 | ||
|      'plan_dft_c2r' are the n[0] x n[1] x n[2] x ...  x n[d-1]
 | ||
|      dimensions of the real data.
 | ||
| 
 | ||
|    * Although the real data is _conceptually_ n[0] x n[1] x n[2] x ...
 | ||
|      x n[d-1] , it is _physically_ stored as an n[0] x n[1] x n[2] x ...
 | ||
|      x [2 (n[d-1]/2 + 1)] array, where the last dimension has been
 | ||
|      _padded_ to make it the same size as the complex output.  This is
 | ||
|      much like the in-place serial r2c/c2r interface (*note
 | ||
|      Multi-Dimensional DFTs of Real Data::), except that in MPI the
 | ||
|      padding is required even for out-of-place data.  The extra padding
 | ||
|      numbers are ignored by FFTW (they are _not_ like zero-padding the
 | ||
|      transform to a larger size); they are only used to determine the
 | ||
|      data layout.
 | ||
| 
 | ||
|    * The data distribution in MPI for _both_ the real and complex data
 | ||
|      is determined by the shape of the _complex_ data.  That is, you
 | ||
|      call the appropriate 'local_size' function for the n[0] x n[1] x
 | ||
|      n[2] x ...  x (n[d-1]/2 + 1) complex data, and then use the _same_
 | ||
|      distribution for the real data except that the last complex
 | ||
|      dimension is replaced by a (padded) real dimension of twice the
 | ||
|      length.
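
The size bookkeeping in the points above can be written down as a few one-line helpers.  The names are hypothetical and the code is only a sketch of the arithmetic (integer division rounds down, as in the text); it does not call FFTW.

```c
#include <assert.h>
#include <stddef.h>

/* Last dimension of the complex data: n/2 + 1, with n/2 rounded down. */
ptrdiff_t r2c_complex_last_dim(ptrdiff_t n_last)
{
    assert(n_last > 0);
    return n_last / 2 + 1;
}

/* Last dimension of the padded real array: twice the complex length. */
ptrdiff_t r2c_padded_real_last_dim(ptrdiff_t n_last)
{
    return 2 * r2c_complex_last_dim(n_last);
}

/* Number of unused padding values at the end of each real row. */
ptrdiff_t r2c_padding(ptrdiff_t n_last)
{
    return r2c_padded_real_last_dim(n_last) - n_last;
}
```

With these definitions, a last dimension of 8 gives 5 complex values and a padded real row of 10 (two padding values), while a last dimension of 7 gives 4 complex values and a padded real row of 8 (one padding value).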
 | ||
| 
 | ||
|    For example, suppose we are performing an out-of-place r2c transform
 | ||
| of L x M x N real data [padded to L x M x 2(N/2+1) ], resulting in L x M
 | ||
| x (N/2+1) complex data.  Similar to the example in *note 2d MPI example::,
 | ||
| we might do something like:
 | ||
| 
 | ||
|      #include <fftw3-mpi.h>
 | ||
| 
 | ||
|      int main(int argc, char **argv)
 | ||
|      {
 | ||
|          const ptrdiff_t L = ..., M = ..., N = ...;
 | ||
|          fftw_plan plan;
 | ||
|          double *rin;
 | ||
|          fftw_complex *cout;
 | ||
|          ptrdiff_t alloc_local, local_n0, local_0_start, i, j, k;
 | ||
| 
 | ||
|          MPI_Init(&argc, &argv);
 | ||
|          fftw_mpi_init();
 | ||
| 
 | ||
|          /* get local data size and allocate */
 | ||
|          alloc_local = fftw_mpi_local_size_3d(L, M, N/2+1, MPI_COMM_WORLD,
 | ||
|                                               &local_n0, &local_0_start);
 | ||
|          rin = fftw_alloc_real(2 * alloc_local);
 | ||
|          cout = fftw_alloc_complex(alloc_local);
 | ||
| 
 | ||
|          /* create plan for out-of-place r2c DFT */
 | ||
|          plan = fftw_mpi_plan_dft_r2c_3d(L, M, N, rin, cout, MPI_COMM_WORLD,
 | ||
|                                          FFTW_MEASURE);
 | ||
| 
 | ||
|          /* initialize rin to some function my_func(x,y,z) */
 | ||
|          for (i = 0; i < local_n0; ++i)
 | ||
|             for (j = 0; j < M; ++j)
 | ||
|               for (k = 0; k < N; ++k)
 | ||
|                 rin[(i*M + j) * (2*(N/2+1)) + k] = my_func(local_0_start+i, j, k);
 | ||
| 
 | ||
|          /* compute transforms as many times as desired */
 | ||
|          fftw_execute(plan);
 | ||
| 
 | ||
|          fftw_destroy_plan(plan);
 | ||
| 
 | ||
|          MPI_Finalize();
 | ||
|      }
 | ||
| 
 | ||
|    Note that we allocated 'rin' using 'fftw_alloc_real' with an argument
 | ||
| of '2 * alloc_local': since 'alloc_local' is the number of _complex_
 | ||
| values to allocate, the number of _real_ values is twice as many.  The
 | ||
| 'rin' array is then local_n0 x M x 2(N/2+1) in row-major order, so its
 | ||
| '(i,j,k)' element is at the index '(i*M + j) * (2*(N/2+1)) + k' (*note
 | ||
| Multi-dimensional Array Format::).
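
The padded indexing can be isolated in a small helper, which is convenient when copying existing unpadded data into the layout that the MPI r2c transform expects.  This is a sketch under the layout described above; 'padded_real_offset' and 'pack_padded' are hypothetical names, and the unpadded source array is assumed to be local_n0 x M x N in row-major order.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of local element (i, j, k) in the padded real array, which
   is local_n0 x M x 2(N/2+1) in row-major order. */
ptrdiff_t padded_real_offset(ptrdiff_t i, ptrdiff_t j, ptrdiff_t k,
                             ptrdiff_t M, ptrdiff_t N)
{
    assert(i >= 0 && 0 <= j && j < M && 0 <= k && k < N);
    return (i * M + j) * (2 * (N / 2 + 1)) + k;
}

/* Copy an unpadded local_n0 x M x N array into the padded layout.
   The padding values at the end of each row are left untouched. */
void pack_padded(const double *dense, double *padded,
                 ptrdiff_t local_n0, ptrdiff_t M, ptrdiff_t N)
{
    for (ptrdiff_t i = 0; i < local_n0; ++i)
        for (ptrdiff_t j = 0; j < M; ++j)
            for (ptrdiff_t k = 0; k < N; ++k)
                padded[padded_real_offset(i, j, k, M, N)] =
                    dense[(i * M + j) * N + k];
}
```

Note that the loop over 'k' runs only to 'N', not to the padded length, so the padding entries are never written.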
 | ||
| 
 | ||
|    As for the complex transforms, improved performance can be obtained
 | ||
| by specifying that the output is the transpose of the input or vice
 | ||
| versa (*note Transposed distributions::).  In our L x M x N r2c example,
 | ||
| including 'FFTW_MPI_TRANSPOSED_OUT' in the flags means that the input would
 | ||
| be a padded L x M x 2(N/2+1) real array distributed over the 'L'
 | ||
| dimension, while the output would be an M x L x (N/2+1) complex array
 | ||
| distributed over the 'M' dimension.  To perform the inverse c2r
 | ||
| transform with the same data distributions, you would use the
 | ||
| 'FFTW_MPI_TRANSPOSED_IN' flag.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Other Multi-dimensional Real-data MPI Transforms,  Next: FFTW MPI Transposes,  Prev: Multi-dimensional MPI DFTs of Real Data,  Up: Distributed-memory FFTW with MPI
 | ||
| 
 | ||
| 6.6 Other multi-dimensional Real-Data MPI Transforms
 | ||
| ====================================================
 | ||
| 
 | ||
| FFTW's MPI interface also supports multi-dimensional 'r2r' transforms of
 | ||
| all kinds supported by the serial interface (e.g.  discrete cosine and
 | ||
| sine transforms, discrete Hartley transforms, etc.).  Only
 | ||
| multi-dimensional 'r2r' transforms, not one-dimensional transforms, are
 | ||
| currently parallelized.
 | ||
| 
 | ||
|    These are used much like the multidimensional complex DFTs discussed
 | ||
| above, except that the data is real rather than complex, and one needs
 | ||
| to pass an r2r transform kind ('fftw_r2r_kind') for each dimension as in
 | ||
| the serial FFTW (*note More DFTs of Real Data::).
 | ||
| 
 | ||
|    For example, one might perform a two-dimensional L x M transform that
 | ||
| is an REDFT10 (DCT-II) in the first dimension and an RODFT10 (DST-II) in
 | ||
| the second dimension with code like:
 | ||
| 
 | ||
|          const ptrdiff_t L = ..., M = ...;
 | ||
|          fftw_plan plan;
 | ||
|          double *data;
 | ||
|          ptrdiff_t alloc_local, local_n0, local_0_start, i, j;
 | ||
| 
 | ||
|          /* get local data size and allocate */
 | ||
|          alloc_local = fftw_mpi_local_size_2d(L, M, MPI_COMM_WORLD,
 | ||
|                                               &local_n0, &local_0_start);
 | ||
|          data = fftw_alloc_real(alloc_local);
 | ||
| 
 | ||
|          /* create plan for in-place REDFT10 x RODFT10 */
 | ||
|          plan = fftw_mpi_plan_r2r_2d(L, M, data, data, MPI_COMM_WORLD,
 | ||
|                                      FFTW_REDFT10, FFTW_RODFT10, FFTW_MEASURE);
 | ||
| 
 | ||
|          /* initialize data to some function my_function(x,y) */
 | ||
|          for (i = 0; i < local_n0; ++i) for (j = 0; j < M; ++j)
 | ||
|             data[i*M + j] = my_function(local_0_start + i, j);
 | ||
| 
 | ||
|          /* compute transforms, in-place, as many times as desired */
 | ||
|          fftw_execute(plan);
 | ||
| 
 | ||
|          fftw_destroy_plan(plan);
 | ||
| 
 | ||
|    Notice that we use the same 'local_size' functions as we did for
 | ||
| complex data, only now we interpret the sizes in terms of real rather
 | ||
| than complex values, and correspondingly use 'fftw_alloc_real'.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: FFTW MPI Transposes,  Next: FFTW MPI Wisdom,  Prev: Other Multi-dimensional Real-data MPI Transforms,  Up: Distributed-memory FFTW with MPI
 | ||
| 
 | ||
| 6.7 FFTW MPI Transposes
 | ||
| =======================
 | ||
| 
 | ||
| FFTW's MPI Fourier transforms rely on one or more _global
 | ||
| transposition_ steps for their communications.  For example, the
 | ||
| multidimensional transforms work by transforming along some dimensions,
 | ||
| then transposing to make the first dimension local and transforming
 | ||
| that, then transposing back.  Because global transposition of a
 | ||
| block-distributed matrix has many other potential uses besides FFTs,
 | ||
| FFTW's transpose routines can be called directly, as documented in this
 | ||
| section.
 | ||
| 
 | ||
| * Menu:
 | ||
| 
 | ||
| * Basic distributed-transpose interface::
 | ||
| * Advanced distributed-transpose interface::
 | ||
| * An improved replacement for MPI_Alltoall::
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Basic distributed-transpose interface,  Next: Advanced distributed-transpose interface,  Prev: FFTW MPI Transposes,  Up: FFTW MPI Transposes
 | ||
| 
 | ||
| 6.7.1 Basic distributed-transpose interface
 | ||
| -------------------------------------------
 | ||
| 
 | ||
| In particular, suppose that we have an 'n0' by 'n1' array in row-major
 | ||
| order, block-distributed across the 'n0' dimension.  To transpose this
 | ||
| into an 'n1' by 'n0' array block-distributed across the 'n1' dimension,
 | ||
| we would create a plan by calling the following function:
 | ||
| 
 | ||
|      fftw_plan fftw_mpi_plan_transpose(ptrdiff_t n0, ptrdiff_t n1,
 | ||
|                                        double *in, double *out,
 | ||
|                                        MPI_Comm comm, unsigned flags);
 | ||
| 
 | ||
|    The input and output arrays ('in' and 'out') can be the same.  The
 | ||
| transpose is actually executed by calling 'fftw_execute' on the plan, as
 | ||
| usual.
 | ||
| 
 | ||
|    The 'flags' are the usual FFTW planner flags, but support two
 | ||
| additional flags: 'FFTW_MPI_TRANSPOSED_OUT' and/or
 | ||
| 'FFTW_MPI_TRANSPOSED_IN'.  What these flags indicate, for transpose
 | ||
| plans, is that the output and/or input, respectively, are _locally_
 | ||
| transposed.  That is, on each process input data is normally stored as a
 | ||
| 'local_n0' by 'n1' array in row-major order, but for an
 | ||
| 'FFTW_MPI_TRANSPOSED_IN' plan the input data is stored as 'n1' by
 | ||
| 'local_n0' in row-major order.  Similarly, 'FFTW_MPI_TRANSPOSED_OUT'
 | ||
| means that the output is 'n0' by 'local_n1' instead of 'local_n1' by
 | ||
| 'n0'.
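
The effect of 'FFTW_MPI_TRANSPOSED_IN' on the local input layout can be sketched as two offset functions.  The names are hypothetical; '(i, j)' denotes local row 'i' (out of 'local_n0') and column 'j' (out of 'n1') of the logical input block on one process.

```c
#include <assert.h>
#include <stddef.h>

/* Default input layout: local_n0 x n1 in row-major order. */
ptrdiff_t in_offset_default(ptrdiff_t i, ptrdiff_t j,
                            ptrdiff_t local_n0, ptrdiff_t n1)
{
    assert(0 <= i && i < local_n0 && 0 <= j && j < n1);
    return i * n1 + j;
}

/* Layout with FFTW_MPI_TRANSPOSED_IN: the same logical element is
   stored in an n1 x local_n0 row-major array. */
ptrdiff_t in_offset_transposed(ptrdiff_t i, ptrdiff_t j,
                               ptrdiff_t local_n0, ptrdiff_t n1)
{
    assert(0 <= i && i < local_n0 && 0 <= j && j < n1);
    return j * local_n0 + i;
}
```

The output case with 'FFTW_MPI_TRANSPOSED_OUT' is analogous, with 'local_n1' and 'n0' in place of 'local_n0' and 'n1'.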
 | ||
| 
 | ||
|    To determine the local size of the array on each process before and
 | ||
| after the transpose, as well as the amount of storage that must be
 | ||
| allocated, one should call 'fftw_mpi_local_size_2d_transposed', just as
 | ||
| for a 2d DFT as described in the previous section:
 | ||
| 
 | ||
|      ptrdiff_t fftw_mpi_local_size_2d_transposed
 | ||
|                      (ptrdiff_t n0, ptrdiff_t n1, MPI_Comm comm,
 | ||
|                       ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
 | ||
|                       ptrdiff_t *local_n1, ptrdiff_t *local_1_start);
 | ||
| 
 | ||
|    Again, the return value is the local storage to allocate, which in
 | ||
| this case is the number of _real_ ('double') values rather than complex
 | ||
| numbers as in the previous examples.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Advanced distributed-transpose interface,  Next: An improved replacement for MPI_Alltoall,  Prev: Basic distributed-transpose interface,  Up: FFTW MPI Transposes
 | ||
| 
 | ||
| 6.7.2 Advanced distributed-transpose interface
 | ||
| ----------------------------------------------
 | ||
| 
 | ||
| The above routines are for a transpose of a matrix of numbers (of type
 | ||
| 'double'), using FFTW's default block sizes.  More generally, one can
 | ||
| perform transposes of _tuples_ of numbers, with user-specified block
 | ||
| sizes for the input and output:
 | ||
| 
 | ||
|      fftw_plan fftw_mpi_plan_many_transpose
 | ||
|                      (ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t howmany,
 | ||
|                       ptrdiff_t block0, ptrdiff_t block1,
 | ||
|                       double *in, double *out, MPI_Comm comm, unsigned flags);
 | ||
| 
 | ||
|    In this case, one is transposing an 'n0' by 'n1' matrix of
 | ||
| 'howmany'-tuples (e.g.  'howmany = 2' for complex numbers).  The input
 | ||
| is distributed along the 'n0' dimension with block size 'block0', and
 | ||
| the 'n1' by 'n0' output is distributed along the 'n1' dimension with
 | ||
| block size 'block1'.  If 'FFTW_MPI_DEFAULT_BLOCK' (0) is passed for a
 | ||
| block size then FFTW uses its default block size.  To get the local size
 | ||
| of the data on each process, you should then call
 | ||
| 'fftw_mpi_local_size_many_transposed'.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: An improved replacement for MPI_Alltoall,  Prev: Advanced distributed-transpose interface,  Up: FFTW MPI Transposes
 | ||
| 
 | ||
| 6.7.3 An improved replacement for MPI_Alltoall
 | ||
| ----------------------------------------------
 | ||
| 
 | ||
| We close this section by noting that FFTW's MPI transpose routines can
 | ||
| be thought of as a generalization of the 'MPI_Alltoall' function
 | ||
| (albeit only for floating-point types), and in some circumstances can
 | ||
| function as an improved replacement.
 | ||
| 
 | ||
|    'MPI_Alltoall' is defined by the MPI standard as:
 | ||
| 
 | ||
|      int MPI_Alltoall(void *sendbuf, int sendcount, MPI_Datatype sendtype,
 | ||
|                       void *recvbuf, int recvcnt, MPI_Datatype recvtype,
 | ||
|                       MPI_Comm comm);
 | ||
| 
 | ||
|    In particular, for 'double*' arrays 'in' and 'out', consider the
 | ||
| call:
 | ||
| 
 | ||
|      MPI_Alltoall(in, howmany, MPI_DOUBLE, out, howmany, MPI_DOUBLE, comm);
 | ||
| 
 | ||
|    This is completely equivalent to:
 | ||
| 
 | ||
|      MPI_Comm_size(comm, &P);
 | ||
|      plan = fftw_mpi_plan_many_transpose(P, P, howmany, 1, 1, in, out, comm, FFTW_ESTIMATE);
 | ||
|      fftw_execute(plan);
 | ||
|      fftw_destroy_plan(plan);
 | ||
| 
 | ||
|    That is, computing a P x P transpose on 'P' processes, with a block
 | ||
| size of 1, is just a standard all-to-all communication.
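
The equivalence can be seen in a serial model that involves no MPI at all.  In the sketch below (a hypothetical helper, not an FFTW or MPI routine), the buffers of all 'P' processes are laid end to end as one P x P array of 'howmany'-tuples, with row 'p' holding the buffer of process 'p'; an all-to-all exchange is then exactly a transpose of that array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Serial model of an all-to-all exchange.  `in` and `out` are
   P x P x howmany arrays in row-major order; row p is the send
   (respectively receive) buffer of process p.  Block q of process p
   is delivered to process q, which stores it as its block p. */
void alltoall_model(const double *in, double *out,
                    ptrdiff_t P, ptrdiff_t howmany)
{
    assert(P > 0 && howmany > 0 && in != out);
    for (ptrdiff_t p = 0; p < P; ++p)
        for (ptrdiff_t q = 0; q < P; ++q)
            memcpy(out + (q * P + p) * howmany,
                   in + (p * P + q) * howmany,
                   (size_t)howmany * sizeof(double));
}
```

With 'P = 2' and 'howmany = 2', the input '{0,1, 2,3, 4,5, 6,7}' becomes '{0,1, 4,5, 2,3, 6,7}': the two off-diagonal blocks trade places, as in a 2 x 2 transpose of pairs.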
 | ||
| 
 | ||
|    However, using the FFTW routine instead of 'MPI_Alltoall' may have
 | ||
| certain advantages.  First of all, FFTW's routine can operate in-place
 | ||
| ('in == out') whereas 'MPI_Alltoall' can only operate out-of-place.
 | ||
| 
 | ||
|    Second, even for out-of-place plans, FFTW's routine may be faster,
 | ||
| especially if you need to perform the all-to-all communication many
 | ||
| times and can afford to use 'FFTW_MEASURE' or 'FFTW_PATIENT'.  It should
 | ||
| certainly be no slower, not including the time to create the plan, since
 | ||
| one of the possible algorithms that FFTW uses for an out-of-place
 | ||
| transpose _is_ simply to call 'MPI_Alltoall'.  However, FFTW also
 | ||
| considers several other possible algorithms that, depending on your MPI
 | ||
| implementation and your hardware, may be faster.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: FFTW MPI Wisdom,  Next: Avoiding MPI Deadlocks,  Prev: FFTW MPI Transposes,  Up: Distributed-memory FFTW with MPI
 | ||
| 
 | ||
| 6.8 FFTW MPI Wisdom
 | ||
| ===================
 | ||
| 
 | ||
| FFTW's "wisdom" facility (*note Words of Wisdom-Saving Plans::) can be
 | ||
| used to save MPI plans as well as to save uniprocessor plans.  However,
 | ||
| for MPI there are several unavoidable complications.
 | ||
| 
 | ||
|    First, the MPI standard does not guarantee that every process can
 | ||
| perform file I/O (at least, not using C stdio routines)--in general, we
 | ||
| may only assume that process 0 is capable of I/O.(1) So, if we want to
 | ||
| export the wisdom from a single process to a file, we must first export
 | ||
| the wisdom to a string, then send it to process 0, then write it to a
 | ||
| file.
 | ||
| 
 | ||
|    Second, in principle we may want to have separate wisdom for every
 | ||
| process, since in general the processes may run on different hardware
 | ||
| even for a single MPI program.  However, in practice FFTW's MPI code is
 | ||
| designed for the case of homogeneous hardware (*note Load balancing::),
 | ||
| and in this case it is convenient to use the same wisdom for every
 | ||
| process.  Thus, we need a mechanism to synchronize the wisdom.
 | ||
| 
 | ||
|    To address both of these problems, FFTW provides the following two
 | ||
| functions:
 | ||
| 
 | ||
|      void fftw_mpi_broadcast_wisdom(MPI_Comm comm);
 | ||
|      void fftw_mpi_gather_wisdom(MPI_Comm comm);
 | ||
| 
 | ||
|    Given a communicator 'comm', 'fftw_mpi_broadcast_wisdom' will
 | ||
| broadcast the wisdom from process 0 to all other processes.  Conversely,
 | ||
| 'fftw_mpi_gather_wisdom' will collect wisdom from all processes onto
 | ||
| process 0.  (If the plans created for the same problem by different
 | ||
| processes are not the same, 'fftw_mpi_gather_wisdom' will arbitrarily
 | ||
| choose one of the plans.)  Both of these functions may result in
 | ||
| suboptimal plans for different processes if the processes are running on
 | ||
| non-identical hardware.  Both of these functions are _collective_ calls,
 | ||
| which means that they must be executed by all processes in the
 | ||
| communicator.
 | ||
| 
 | ||
|    So, for example, a typical code snippet to import wisdom from a file
 | ||
| and use it on all processes would be:
 | ||
| 
 | ||
|      {
 | ||
|          int rank;
 | ||
| 
 | ||
|          fftw_mpi_init();
 | ||
|          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
 | ||
|          if (rank == 0) fftw_import_wisdom_from_filename("mywisdom");
 | ||
|          fftw_mpi_broadcast_wisdom(MPI_COMM_WORLD);
 | ||
|      }
 | ||
| 
 | ||
|    (Note that we must call 'fftw_mpi_init' before importing any wisdom
 | ||
| that might contain MPI plans.)  Similarly, a typical code snippet to
 | ||
| export wisdom from all processes to a file is:
 | ||
| 
 | ||
|      {
 | ||
|          int rank;
 | ||
| 
 | ||
|          fftw_mpi_gather_wisdom(MPI_COMM_WORLD);
 | ||
|          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
 | ||
|          if (rank == 0) fftw_export_wisdom_to_filename("mywisdom");
 | ||
|      }
 | ||
| 
 | ||
|    ---------- Footnotes ----------
 | ||
| 
 | ||
|    (1) In fact, even this assumption is not technically guaranteed by
 | ||
| the standard, although it seems to be universal in actual MPI
 | ||
| implementations and is widely assumed by MPI-using software.
 | ||
| Technically, you need to query the 'MPI_IO' attribute of
 | ||
| 'MPI_COMM_WORLD' with 'MPI_Attr_get'.  If this attribute is
 | ||
| 'MPI_PROC_NULL', no I/O is possible.  If it is 'MPI_ANY_SOURCE', any
 | ||
| process can perform I/O. Otherwise, it is the rank of a process that can
 | ||
| perform I/O ...  but since it is not guaranteed to yield the _same_ rank
 | ||
| on all processes, you have to do an 'MPI_Allreduce' of some kind if you
 | ||
| want all processes to agree about which is going to do I/O. And even
 | ||
| then, the standard only guarantees that this process can perform output,
 | ||
| but not input.  See e.g.  'Parallel Programming with MPI' by P. S.
 | ||
| Pacheco, section 8.1.3.  Needless to say, in our experience virtually no
 | ||
| MPI programmers worry about this.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Avoiding MPI Deadlocks,  Next: FFTW MPI Performance Tips,  Prev: FFTW MPI Wisdom,  Up: Distributed-memory FFTW with MPI
 | ||
| 
 | ||
| 6.9 Avoiding MPI Deadlocks
 | ||
| ==========================
 | ||
| 
 | ||
| An MPI program can _deadlock_ if one process is waiting for a message
 | ||
| from another process that never gets sent.  To avoid deadlocks when
 | ||
| using FFTW's MPI routines, it is important to know which functions are
 | ||
| _collective_: that is, which functions must _always_ be called in the
 | ||
| _same order_ from _every_ process in a given communicator.  ('MPI_Barrier'
 | ||
| is the canonical example of a collective function in the MPI standard.)
 | ||
| 
 | ||
|    The functions in FFTW that are _always_ collective are: every
 | ||
| function beginning with 'fftw_mpi_plan', as well as
 | ||
| 'fftw_mpi_broadcast_wisdom' and 'fftw_mpi_gather_wisdom'.  Also, the
 | ||
| following functions from the ordinary FFTW interface are collective when
 | ||
| they are applied to a plan created by an 'fftw_mpi_plan' function:
 | ||
| 'fftw_execute', 'fftw_destroy_plan', and 'fftw_flops'.
 | ||
| 
 | ||
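One way to reason about this rule is to compare, for each process, the ordered list of collective calls it makes.  The plain-C sketch below is a toy model only: the integer call codes, the function name 'first_divergence', and the idea of recording a trace are all hypothetical, and nothing here calls MPI or FFTW.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model (not part of FFTW or MPI): traces is a flat nranks x ncalls
   array, where traces[r * ncalls + i] is a code for the i-th collective
   call made by rank r.  Returns the index of the first call at which some
   rank differs from rank 0, or -1 if every rank makes the same calls in
   the same order. */
ptrdiff_t first_divergence(int nranks, size_t ncalls, const int *traces)
{
    size_t i;
    int r;
    assert(nranks > 0 && traces != NULL);
    for (i = 0; i < ncalls; ++i)
        for (r = 1; r < nranks; ++r)
            if (traces[(size_t)r * ncalls + i] != traces[i])
                return (ptrdiff_t)i;
    return -1;
}
```

If rank 0 records the codes 1, 2, 3 while rank 1 records 1, 3, 2, the model reports index 1, which is the point at which a real program could hang.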
| 
 | ||
| File: fftw3.info,  Node: FFTW MPI Performance Tips,  Next: Combining MPI and Threads,  Prev: Avoiding MPI Deadlocks,  Up: Distributed-memory FFTW with MPI
 | ||
| 
 | ||
| 6.10 FFTW MPI Performance Tips
 | ||
| ==============================
 | ||
| 
 | ||
| In this section, we collect a few tips on getting the best performance
 | ||
| out of FFTW's MPI transforms.
 | ||
| 
 | ||
|    First, because of the 1d block distribution, FFTW's parallelization
 | ||
| is currently limited by the size of the first dimension.
 | ||
| (Multidimensional block distributions may be supported by a future
 | ||
| version.)  More generally, you should ideally arrange the dimensions so
 | ||
| that FFTW can divide them equally among the processes.  *Note Load
 | ||
| balancing::.
 | ||
| 
 | ||
|    Second, if it is not too inconvenient, you should consider working
 | ||
| with transposed output for multidimensional plans, as this saves a
 | ||
| considerable amount of communications.  *Note Transposed
 | ||
| distributions::.
 | ||
| 
 | ||
|    Third, the fastest choices are generally either an in-place transform
 | ||
| or an out-of-place transform with the 'FFTW_DESTROY_INPUT' flag (which
 | ||
| allows the input array to be used as scratch space).  In-place is
 | ||
| especially beneficial if the amount of data per process is large.
 | ||
| 
 | ||
|    Fourth, if you have multiple arrays to transform at once, rather than
 | ||
| calling FFTW's MPI transforms several times it usually seems to be
 | ||
| faster to interleave the data and use the advanced interface.  (This
 | ||
| groups the communications together instead of requiring separate
 | ||
| messages for each transform.)
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Combining MPI and Threads,  Next: FFTW MPI Reference,  Prev: FFTW MPI Performance Tips,  Up: Distributed-memory FFTW with MPI
 | ||
| 
 | ||
| 6.11 Combining MPI and Threads
 | ||
| ==============================
 | ||
| 
 | ||
| In certain cases, it may be advantageous to combine MPI
 | ||
| (distributed-memory) and threads (shared-memory) parallelization.  FFTW
 | ||
| supports this, with certain caveats.  For example, if you have a cluster
 | ||
| of 4-processor shared-memory nodes, you may want to use threads within
 | ||
| the nodes and MPI between the nodes, instead of MPI for all
 | ||
| parallelization.
 | ||
| 
 | ||
|    In particular, it is possible to seamlessly combine the MPI FFTW
 | ||
| routines with the multi-threaded FFTW routines (*note Multi-threaded
 | ||
| FFTW::).  However, some care must be taken in the initialization code,
 | ||
| which should look something like this:
 | ||
| 
 | ||
|      int threads_ok;
 | ||
| 
 | ||
|      int main(int argc, char **argv)
 | ||
|      {
 | ||
|          int provided;
 | ||
|          MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
 | ||
|          threads_ok = provided >= MPI_THREAD_FUNNELED;
 | ||
| 
 | ||
|          if (threads_ok) threads_ok = fftw_init_threads();
 | ||
|          fftw_mpi_init();
 | ||
| 
 | ||
|          ...
 | ||
|          if (threads_ok) fftw_plan_with_nthreads(...);
 | ||
|          ...
 | ||
| 
 | ||
|          MPI_Finalize();
 | ||
|      }
 | ||
| 
 | ||
|    First, note that instead of calling 'MPI_Init', you should call
 | ||
| 'MPI_Init_thread', which is the initialization routine defined by the
 | ||
| MPI-2 standard to indicate to MPI that your program will be
 | ||
| multithreaded.  We pass 'MPI_THREAD_FUNNELED', which indicates that we
 | ||
| will only call MPI routines from the main thread.  (FFTW will launch
 | ||
| additional threads internally, but the extra threads will not call MPI
 | ||
| code.)  (You may also pass 'MPI_THREAD_SERIALIZED' or
 | ||
| 'MPI_THREAD_MULTIPLE', which requests additional multithreading support
 | ||
| from the MPI implementation, but this is not required by FFTW.)  The
 | ||
| 'provided' parameter returns the level of thread support actually
 | ||
| provided by your MPI implementation; this _must_ be at least
 | ||
| 'MPI_THREAD_FUNNELED' if you want to call the FFTW threads routines, so
 | ||
| we define a global variable 'threads_ok' to record this.  You should
 | ||
| only call 'fftw_init_threads' or 'fftw_plan_with_nthreads' if
 | ||
| 'threads_ok' is true.  For more information on thread safety in MPI, see
 | ||
| the MPI and Threads
 | ||
| (http://www.mpi-forum.org/docs/mpi-20-html/node162.htm) section of the
 | ||
| MPI-2 standard.
 | ||
| 
 | ||
|    Second, we must call 'fftw_init_threads' _before_ 'fftw_mpi_init'.
 | ||
| This is critical for technical reasons having to do with how FFTW
 | ||
| initializes its list of algorithms.
 | ||
| 
 | ||
|    Then, if you call 'fftw_plan_with_nthreads(N)', _every_ MPI process
 | ||
| will launch (up to) 'N' threads to parallelize its transforms.
 | ||
| 
 | ||
|    For example, in the hypothetical cluster of 4-processor nodes, you
 | ||
| might wish to launch only a single MPI process per node, and then call
 | ||
| 'fftw_plan_with_nthreads(4)' on each process to use all processors in
 | ||
| the nodes.
 | ||
| 
 | ||
|    This may or may not be faster than simply using as many MPI processes
 | ||
| as you have processors, however.  On the one hand, using threads within
 | ||
| a node eliminates the need for explicit message passing within the node.
 | ||
| On the other hand, FFTW's transpose routines are not multi-threaded, and
 | ||
| this means that the communications that do take place will not benefit
 | ||
| from parallelization within the node.  Moreover, many MPI
 | ||
| implementations already have optimizations to exploit shared memory when
 | ||
| it is available, so adding the multithreaded FFTW on top of this may be
 | ||
| superfluous.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: FFTW MPI Reference,  Next: FFTW MPI Fortran Interface,  Prev: Combining MPI and Threads,  Up: Distributed-memory FFTW with MPI
 | ||
| 
 | ||
| 6.12 FFTW MPI Reference
 | ||
| =======================
 | ||
| 
 | ||
| This section provides a complete reference to all FFTW MPI functions,
 | ||
| datatypes, and constants.  See also *note FFTW Reference:: for
 | ||
| information on functions and types in common with the serial interface.
 | ||
| 
 | ||
| * Menu:
 | ||
| 
 | ||
| * MPI Files and Data Types::
 | ||
| * MPI Initialization::
 | ||
| * Using MPI Plans::
 | ||
| * MPI Data Distribution Functions::
 | ||
| * MPI Plan Creation::
 | ||
| * MPI Wisdom Communication::
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: MPI Files and Data Types,  Next: MPI Initialization,  Prev: FFTW MPI Reference,  Up: FFTW MPI Reference
 | ||
| 
 | ||
| 6.12.1 MPI Files and Data Types
 | ||
| -------------------------------
 | ||
| 
 | ||
| All programs using FFTW's MPI support should include its header file:
 | ||
| 
 | ||
|      #include <fftw3-mpi.h>
 | ||
| 
 | ||
|    Note that this header file includes the serial-FFTW 'fftw3.h' header
 | ||
| file, and also the 'mpi.h' header file for MPI, so you need not include
 | ||
| those files separately.
 | ||
| 
 | ||
|    You must also link to _both_ the FFTW MPI library and to the serial
 | ||
| FFTW library.  On Unix, this means adding '-lfftw3_mpi -lfftw3 -lm' at
 | ||
| the end of the link command.
 | ||
| 
 | ||
|    Different precisions are handled as in the serial interface: *Note
 | ||
| Precision::.  That is, 'fftw_' functions become 'fftwf_' (in single
 | ||
| precision) etcetera, and the libraries become '-lfftw3f_mpi -lfftw3f
 | ||
| -lm' etcetera on Unix.  Long-double precision is supported in MPI, but
 | ||
| quad precision ('fftwq_') is not, due to the lack of MPI support for this
 | ||
| type.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: MPI Initialization,  Next: Using MPI Plans,  Prev: MPI Files and Data Types,  Up: FFTW MPI Reference
 | ||
| 
 | ||
| 6.12.2 MPI Initialization
 | ||
| -------------------------
 | ||
| 
 | ||
| Before calling any other FFTW MPI ('fftw_mpi_') function, and before
 | ||
| importing any wisdom for MPI problems, you must call:
 | ||
| 
 | ||
|      void fftw_mpi_init(void);
 | ||
| 
 | ||
|    If FFTW threads support is used, however, 'fftw_mpi_init' should be
 | ||
| called _after_ 'fftw_init_threads' (*note Combining MPI and Threads::).
 | ||
| Calling 'fftw_mpi_init' additional times (before 'fftw_mpi_cleanup') has
 | ||
| no effect.
 | ||
| 
 | ||
|    If you want to deallocate all persistent data and reset FFTW to the
 | ||
| pristine state it was in when you started your program, you can call:
 | ||
| 
 | ||
|      void fftw_mpi_cleanup(void);
 | ||
| 
 | ||
|    (This calls 'fftw_cleanup', so you need not call the serial cleanup
 | ||
| routine too, although it is safe to do so.)  After calling
 | ||
| 'fftw_mpi_cleanup', all existing plans become undefined, and you should
 | ||
| not attempt to execute or destroy them.  You must call 'fftw_mpi_init'
 | ||
| again after 'fftw_mpi_cleanup' if you want to resume using the MPI FFTW
 | ||
| routines.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Using MPI Plans,  Next: MPI Data Distribution Functions,  Prev: MPI Initialization,  Up: FFTW MPI Reference
 | ||
| 
 | ||
| 6.12.3 Using MPI Plans
 | ||
| ----------------------
 | ||
| 
 | ||
| Once an MPI plan is created, you can execute and destroy it using
 | ||
| 'fftw_execute', 'fftw_destroy_plan', and the other functions in the
 | ||
| serial interface that operate on generic plans (*note Using Plans::).
 | ||
| 
 | ||
|    The 'fftw_execute' and 'fftw_destroy_plan' functions, applied to MPI
 | ||
| plans, are _collective_ calls: they must be called by all processes in
 | ||
| the communicator that was used to create the plan.
 | ||
| 
 | ||
|    You must _not_ use the serial new-array plan-execution functions
 | ||
| 'fftw_execute_dft' and so on (*note New-array Execute Functions::) with
 | ||
| MPI plans.  Such functions are specialized to the problem type, and
 | ||
| there are specific new-array execute functions for MPI plans:
 | ||
| 
 | ||
|      void fftw_mpi_execute_dft(fftw_plan p, fftw_complex *in, fftw_complex *out);
 | ||
|      void fftw_mpi_execute_dft_r2c(fftw_plan p, double *in, fftw_complex *out);
 | ||
|      void fftw_mpi_execute_dft_c2r(fftw_plan p, fftw_complex *in, double *out);
 | ||
|      void fftw_mpi_execute_r2r(fftw_plan p, double *in, double *out);
 | ||
| 
 | ||
|    These functions have the same restrictions as those of the serial
 | ||
| new-array execute functions.  They are _always_ safe to apply to the
 | ||
| _same_ 'in' and 'out' arrays that were used to create the plan.  They
 | ||
| can only be applied to new arrays if those arrays have the same types,
 | ||
| dimensions, in-placeness, and alignment as the original arrays, where
 | ||
| the best way to ensure the same alignment is to use FFTW's 'fftw_malloc'
 | ||
| and related allocation functions for all arrays (*note Memory
 | ||
| Allocation::).  Note that distributed transposes (*note FFTW MPI
 | ||
| Transposes::) use 'fftw_mpi_execute_r2r', since they count as rank-zero
 | ||
| r2r plans from FFTW's perspective.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: MPI Data Distribution Functions,  Next: MPI Plan Creation,  Prev: Using MPI Plans,  Up: FFTW MPI Reference
 | ||
| 
 | ||
| 6.12.4 MPI Data Distribution Functions
 | ||
| --------------------------------------
 | ||
| 
 | ||
| As described above (*note MPI Data Distribution::), in order to allocate
 | ||
| your arrays, _before_ creating a plan, you must first call one of the
 | ||
| following routines to determine the required allocation size and the
 | ||
| portion of the array locally stored on a given process.  The 'MPI_Comm'
 | ||
| communicator passed here must be equivalent to the communicator used
 | ||
| below for plan creation.
 | ||
| 
 | ||
|    The basic interface for multidimensional transforms consists of the
 | ||
| functions:
 | ||
| 
 | ||
|      ptrdiff_t fftw_mpi_local_size_2d(ptrdiff_t n0, ptrdiff_t n1, MPI_Comm comm,
 | ||
|                                       ptrdiff_t *local_n0, ptrdiff_t *local_0_start);
 | ||
|      ptrdiff_t fftw_mpi_local_size_3d(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
 | ||
|                                       MPI_Comm comm,
 | ||
|                                       ptrdiff_t *local_n0, ptrdiff_t *local_0_start);
 | ||
|      ptrdiff_t fftw_mpi_local_size(int rnk, const ptrdiff_t *n, MPI_Comm comm,
 | ||
|                                    ptrdiff_t *local_n0, ptrdiff_t *local_0_start);
 | ||
| 
 | ||
|      ptrdiff_t fftw_mpi_local_size_2d_transposed(ptrdiff_t n0, ptrdiff_t n1, MPI_Comm comm,
 | ||
|                                                  ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
 | ||
|                                                  ptrdiff_t *local_n1, ptrdiff_t *local_1_start);
 | ||
|      ptrdiff_t fftw_mpi_local_size_3d_transposed(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
 | ||
|                                                  MPI_Comm comm,
 | ||
|                                                  ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
 | ||
|                                                  ptrdiff_t *local_n1, ptrdiff_t *local_1_start);
 | ||
|      ptrdiff_t fftw_mpi_local_size_transposed(int rnk, const ptrdiff_t *n, MPI_Comm comm,
 | ||
|                                               ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
 | ||
|                                               ptrdiff_t *local_n1, ptrdiff_t *local_1_start);
 | ||
| 
 | ||
|    These functions return the number of elements to allocate (complex
 | ||
| numbers for DFT/r2c/c2r plans, real numbers for r2r plans), whereas the
 | ||
| 'local_n0' and 'local_0_start' return the portion ('local_0_start' to
 | ||
| 'local_0_start + local_n0 - 1') of the first dimension of an n[0] x n[1]
 | ||
| x n[2] x ...  x n[d-1] array that is stored on the local process.  *Note
 | ||
| Basic and advanced distribution interfaces::.  For
 | ||
| 'FFTW_MPI_TRANSPOSED_OUT' plans, the '_transposed' variants are useful
 | ||
| in order to also return the local portion of the first dimension in the
 | ||
| n[1] x n[0] x n[2] x ...  x n[d-1] transposed output.  *Note Transposed
 | ||
| distributions::.  The advanced interface for multidimensional transforms
 | ||
| is:
 | ||
| 
 | ||
|      ptrdiff_t fftw_mpi_local_size_many(int rnk, const ptrdiff_t *n, ptrdiff_t howmany,
 | ||
|                                         ptrdiff_t block0, MPI_Comm comm,
 | ||
|                                         ptrdiff_t *local_n0, ptrdiff_t *local_0_start);
 | ||
|      ptrdiff_t fftw_mpi_local_size_many_transposed(int rnk, const ptrdiff_t *n, ptrdiff_t howmany,
 | ||
|                                                    ptrdiff_t block0, ptrdiff_t block1, MPI_Comm comm,
 | ||
|                                                    ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
 | ||
|                                                    ptrdiff_t *local_n1, ptrdiff_t *local_1_start);
 | ||
| 
 | ||
|    These differ from the basic interface in only two ways.  First, they
 | ||
| allow you to specify block sizes 'block0' and 'block1' (the latter for
 | ||
| the transposed output); you can pass 'FFTW_MPI_DEFAULT_BLOCK' to use
 | ||
| FFTW's default block size as in the basic interface.  Second, you can
 | ||
| pass a 'howmany' parameter, corresponding to the advanced planning
 | ||
| interface below: this is for transforms of contiguous 'howmany'-tuples
 | ||
| of numbers ('howmany = 1' in the basic interface).
 | ||
| 
 | ||
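As a rough illustration of what 'local_n0' and 'local_0_start' mean, the following plain-C sketch models a 1d block distribution without calling FFTW or MPI.  It assumes that the default block size is the ceiling of n0 divided by the number of processes, which this section does not state; the helper names 'block_default' and 'block_local_range' are hypothetical, and real code should always use the values returned by the 'fftw_mpi_local_size' functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: assumed default block size, ceil(n0 / nprocs). */
ptrdiff_t block_default(ptrdiff_t n0, int nprocs)
{
    assert(n0 > 0 && nprocs > 0);
    return (n0 + nprocs - 1) / nprocs;
}

/* Hypothetical helper: rows *local_0_start to
   *local_0_start + *local_n0 - 1 of the first dimension that a given rank
   would own under a 1d block distribution with the given block size.
   Ranks past the end of the array own zero rows. */
void block_local_range(ptrdiff_t n0, ptrdiff_t block, int rank,
                       ptrdiff_t *local_n0, ptrdiff_t *local_0_start)
{
    ptrdiff_t start = block * rank;
    assert(block > 0 && rank >= 0);
    if (start >= n0) {
        *local_n0 = 0;
        *local_0_start = 0;   /* arbitrary in this model: nothing is stored */
    } else {
        *local_0_start = start;
        *local_n0 = (n0 - start < block) ? n0 - start : block;
    }
}
```

With n0 = 100 and 16 processes, this model gives a block of 7: ranks 0 to 13 hold 7 rows each, rank 14 holds the remaining 2, and rank 15 holds none.  Dimensions that divide evenly by the process count avoid this imbalance.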
|    The corresponding basic and advanced routines for one-dimensional
 | ||
| transforms (currently only complex DFTs) are:
 | ||
| 
 | ||
|      ptrdiff_t fftw_mpi_local_size_1d(
 | ||
|                   ptrdiff_t n0, MPI_Comm comm, int sign, unsigned flags,
 | ||
|                   ptrdiff_t *local_ni, ptrdiff_t *local_i_start,
 | ||
|                   ptrdiff_t *local_no, ptrdiff_t *local_o_start);
 | ||
|      ptrdiff_t fftw_mpi_local_size_many_1d(
 | ||
|                   ptrdiff_t n0, ptrdiff_t howmany,
 | ||
|                   MPI_Comm comm, int sign, unsigned flags,
 | ||
|                   ptrdiff_t *local_ni, ptrdiff_t *local_i_start,
 | ||
|                   ptrdiff_t *local_no, ptrdiff_t *local_o_start);
 | ||
| 
 | ||
|    As above, the return value is the number of elements to allocate
 | ||
| (complex numbers, for complex DFTs).  The 'local_ni' and 'local_i_start'
 | ||
| arguments return the portion ('local_i_start' to 'local_i_start +
 | ||
| local_ni - 1') of the 1d array that is stored on this process for the
 | ||
| transform _input_, and 'local_no' and 'local_o_start' are the
 | ||
| corresponding quantities for the output.  The 'sign' ('FFTW_FORWARD' or
 | ||
| 'FFTW_BACKWARD') and 'flags' must match the arguments passed when
 | ||
| creating a plan.  Although the inputs and outputs have different data
 | ||
| distributions in general, it is guaranteed that the _output_ data
 | ||
| distribution of an 'FFTW_FORWARD' plan will match the _input_ data
 | ||
| distribution of an 'FFTW_BACKWARD' plan and vice versa; similarly for
 | ||
| the 'FFTW_MPI_SCRAMBLED_OUT' and 'FFTW_MPI_SCRAMBLED_IN' flags.  *Note
 | ||
| One-dimensional distributions::.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: MPI Plan Creation,  Next: MPI Wisdom Communication,  Prev: MPI Data Distribution Functions,  Up: FFTW MPI Reference
 | ||
| 
 | ||
| 6.12.5 MPI Plan Creation
 | ||
| ------------------------
 | ||
| 
 | ||
| Complex-data MPI DFTs
 | ||
| .....................
 | ||
| 
 | ||
| Plans for complex-data DFTs (*note 2d MPI example::) are created by:
 | ||
| 
 | ||
|      fftw_plan fftw_mpi_plan_dft_1d(ptrdiff_t n0, fftw_complex *in, fftw_complex *out,
 | ||
|                                     MPI_Comm comm, int sign, unsigned flags);
 | ||
|      fftw_plan fftw_mpi_plan_dft_2d(ptrdiff_t n0, ptrdiff_t n1,
 | ||
|                                     fftw_complex *in, fftw_complex *out,
 | ||
|                                     MPI_Comm comm, int sign, unsigned flags);
 | ||
|      fftw_plan fftw_mpi_plan_dft_3d(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
 | ||
|                                     fftw_complex *in, fftw_complex *out,
 | ||
|                                     MPI_Comm comm, int sign, unsigned flags);
 | ||
|      fftw_plan fftw_mpi_plan_dft(int rnk, const ptrdiff_t *n,
 | ||
|                                  fftw_complex *in, fftw_complex *out,
 | ||
|                                  MPI_Comm comm, int sign, unsigned flags);
 | ||
|      fftw_plan fftw_mpi_plan_many_dft(int rnk, const ptrdiff_t *n,
 | ||
|                                       ptrdiff_t howmany, ptrdiff_t block, ptrdiff_t tblock,
 | ||
|                                       fftw_complex *in, fftw_complex *out,
 | ||
|                                       MPI_Comm comm, int sign, unsigned flags);
 | ||
| 
 | ||
|    These are similar to their serial counterparts (*note Complex DFTs::)
 | ||
| in specifying the dimensions, sign, and flags of the transform.  The
 | ||
| 'comm' argument gives an MPI communicator that specifies the set of
 | ||
| processes to participate in the transform; plan creation is a collective
 | ||
| function that must be called by all processes in the communicator.  The
 | ||
| 'in' and 'out' pointers refer only to a portion of the overall transform
 | ||
| data (*note MPI Data Distribution::) as specified by the 'local_size'
 | ||
| functions in the previous section.  Unless 'flags' contains
 | ||
| 'FFTW_ESTIMATE', these arrays are overwritten during plan creation as
 | ||
| for the serial interface.  For multi-dimensional transforms, any
 | ||
| dimensions '> 1' are supported; for one-dimensional transforms, only
 | ||
| composite (non-prime) 'n0' are currently supported (unlike the serial
 | ||
| FFTW). Requesting an unsupported transform size will yield a 'NULL'
 | ||
| plan.  (As in the serial interface, highly composite sizes generally
 | ||
| yield the best performance.)
 | ||
| 
 | ||
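Because a request for an unsupported size yields a 'NULL' plan, it can be convenient to check a one-dimensional size before planning.  The sketch below is a plain-C trial-division test for composite numbers; the function name 'is_composite_size' is hypothetical and not part of FFTW, and checking the returned plan against 'NULL' remains the authoritative test.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of FFTW): returns 1 if n0 is composite,
   that is, n0 = a * b with a, b > 1, and 0 if n0 is prime or less than 4. */
int is_composite_size(ptrdiff_t n0)
{
    ptrdiff_t d;
    assert(n0 >= 0);
    if (n0 < 4) return 0;              /* 0, 1, 2, 3 are not composite */
    for (d = 2; d <= n0 / d; ++d)      /* d up to sqrt(n0), no overflow */
        if (n0 % d == 0) return 1;
    return 0;
}
```

A caller might fall back to a different size, or to a serial transform, when this returns 0 for the requested 'n0'.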
|    The advanced-interface 'fftw_mpi_plan_many_dft' additionally allows
 | ||
| you to specify the block sizes for the first dimension ('block') of the
 | ||
| n[0] x n[1] x n[2] x ...  x n[d-1] input data and the first dimension
 | ||
| ('tblock') of the n[1] x n[0] x n[2] x ...  x n[d-1] transposed data (at
 | ||
| intermediate steps of the transform, and for the output if
 | ||
| 'FFTW_MPI_TRANSPOSED_OUT' is specified in 'flags').  These must be the same
 | ||
| block sizes as were passed to the corresponding 'local_size' function;
 | ||
| you can pass 'FFTW_MPI_DEFAULT_BLOCK' to use FFTW's default block size
 | ||
| as in the basic interface.  Also, the 'howmany' parameter specifies that
 | ||
| the transform is of contiguous 'howmany'-tuples rather than individual
 | ||
| complex numbers; this corresponds to the same parameter in the serial
 | ||
| advanced interface (*note Advanced Complex DFTs::) with 'stride =
 | ||
| howmany' and 'dist = 1'.
 | ||
| 
 | ||
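To make the 'howmany'-tuple layout concrete, here is a plain-C sketch of the offset of one element in a locally stored, row-major 2d block of interleaved tuples.  The function name 'tuple_offset' and its parameters are hypothetical; the only facts taken from the text are that consecutive points of one transform are 'howmany' elements apart (the stride) and that successive transforms start one element apart (the dist).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: offset, in elements, of component k
   (0 <= k < howmany) at local row i0 and column i1 of a row-major block
   with n1 columns, where each grid point stores a contiguous
   howmany-tuple. */
ptrdiff_t tuple_offset(ptrdiff_t i0, ptrdiff_t i1, ptrdiff_t n1,
                       ptrdiff_t howmany, ptrdiff_t k)
{
    assert(0 <= i1 && i1 < n1 && 0 <= k && k < howmany);
    return (i0 * n1 + i1) * howmany + k;
}
```

Moving to the next point of the same transform changes the offset by 'howmany', and moving to the next transform at the same point changes it by 1.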
| MPI flags
 | ||
| .........
 | ||
| 
 | ||
| The 'flags' can be any of those for the serial FFTW (*note Planner
 | ||
| Flags::), and in addition may include one or more of the following
 | ||
| MPI-specific flags, which improve performance at the cost of changing
 | ||
| the output or input data formats.
 | ||
| 
 | ||
|    * 'FFTW_MPI_SCRAMBLED_OUT', 'FFTW_MPI_SCRAMBLED_IN': valid for 1d
 | ||
|      transforms only, these flags indicate that the output/input of the
 | ||
|      transform are in an undocumented "scrambled" order.  A forward
 | ||
|      'FFTW_MPI_SCRAMBLED_OUT' transform can be inverted by a backward
 | ||
|      'FFTW_MPI_SCRAMBLED_IN' (times the usual 1/N normalization).  *Note
 | ||
|      One-dimensional distributions::.
 | ||
| 
 | ||
|    * 'FFTW_MPI_TRANSPOSED_OUT', 'FFTW_MPI_TRANSPOSED_IN': valid for
 | ||
|      multidimensional ('rnk > 1') transforms only, these flags specify
 | ||
|      that the output or input of an n[0] x n[1] x n[2] x ...  x n[d-1]
 | ||
|      transform is transposed to n[1] x n[0] x n[2] x ...  x n[d-1] .
 | ||
|      *Note Transposed distributions::.
 | ||
| 
 | ||
| Real-data MPI DFTs
 | ||
| ..................
 | ||
| 
 | ||
| Plans for real-input/output (r2c/c2r) DFTs (*note Multi-dimensional MPI
 | ||
| DFTs of Real Data::) are created by:
 | ||
| 
 | ||
|      fftw_plan fftw_mpi_plan_dft_r2c_2d(ptrdiff_t n0, ptrdiff_t n1,
 | ||
|                                         double *in, fftw_complex *out,
 | ||
|                                         MPI_Comm comm, unsigned flags);
 | ||
|      fftw_plan fftw_mpi_plan_dft_r2c_3d(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
 | ||
|                                         double *in, fftw_complex *out,
 | ||
|                                         MPI_Comm comm, unsigned flags);
 | ||
|      fftw_plan fftw_mpi_plan_dft_r2c(int rnk, const ptrdiff_t *n,
 | ||
|                                      double *in, fftw_complex *out,
 | ||
|                                      MPI_Comm comm, unsigned flags);
 | ||
|      fftw_plan fftw_mpi_plan_dft_c2r_2d(ptrdiff_t n0, ptrdiff_t n1,
 | ||
|                                         fftw_complex *in, double *out,
 | ||
|                                         MPI_Comm comm, unsigned flags);
 | ||
|      fftw_plan fftw_mpi_plan_dft_c2r_3d(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
 | ||
|                                         fftw_complex *in, double *out,
 | ||
|                                         MPI_Comm comm, unsigned flags);
 | ||
|      fftw_plan fftw_mpi_plan_dft_c2r(int rnk, const ptrdiff_t *n,
 | ||
|                                      fftw_complex *in, double *out,
 | ||
|                                      MPI_Comm comm, unsigned flags);
 | ||
| 
 | ||
|    Similar to the serial interface (*note Real-data DFTs::), these
 | ||
| transform logically n[0] x n[1] x n[2] x ...  x n[d-1] real data to/from
 | ||
| n[0] x n[1] x n[2] x ...  x (n[d-1]/2 + 1) complex data, representing
 | ||
| the non-redundant half of the conjugate-symmetry output of a real-input
 | ||
| DFT (*note Multi-dimensional Transforms::).  However, the real array
 | ||
| must be stored within a padded n[0] x n[1] x n[2] x ...  x [2 (n[d-1]/2
 | ||
| + 1)] array (much like the in-place serial r2c transforms, but here for
 | ||
| out-of-place transforms as well).  Currently, only multi-dimensional
 | ||
| ('rnk > 1') r2c/c2r transforms are supported (requesting a plan for 'rnk
 | ||
| = 1' will yield 'NULL').  As explained above (*note Multi-dimensional
 | ||
| MPI DFTs of Real Data::), the data distribution of both the real and
 | ||
| complex arrays is given by the 'local_size' function called for the
 | ||
| dimensions of the _complex_ array.  Similar to the other planning
 | ||
| functions, the input and output arrays are overwritten when the plan is
 | ||
| created except in 'FFTW_ESTIMATE' mode.
 | ||
| 
 | ||
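The padding rule can be sketched in plain C as follows.  The helper names are hypothetical and nothing here calls FFTW; the arithmetic is just the n[d-1]/2 + 1 and 2 (n[d-1]/2 + 1) sizes quoted above, using integer division.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: length of the last dimension of the complex array,
   n_last/2 + 1 (integer division rounds down). */
ptrdiff_t r2c_complex_last_dim(ptrdiff_t n_last)
{
    assert(n_last > 0);
    return n_last / 2 + 1;
}

/* Hypothetical helper: length of the last dimension of the padded real
   array, 2 * (n_last/2 + 1) doubles per row. */
ptrdiff_t r2c_padded_real_last_dim(ptrdiff_t n_last)
{
    return 2 * r2c_complex_last_dim(n_last);
}

/* Hypothetical helper: offset of real element (i0, i1) in a locally
   stored, row-major 2d block whose rows are padded as above. */
ptrdiff_t r2c_real_offset(ptrdiff_t i0, ptrdiff_t i1, ptrdiff_t n1)
{
    assert(0 <= i1 && i1 < n1);
    return i0 * r2c_padded_real_last_dim(n1) + i1;
}
```

For n[d-1] = 8 each row holds 10 doubles, of which the last 2 are padding; for n[d-1] = 7 each row holds 8 doubles, with 1 padding element.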
|    As for the complex DFTs above, there is an advanced interface that
 | ||
| allows you to manually specify block sizes and to transform contiguous
 | ||
| 'howmany'-tuples of real/complex numbers:
 | ||
| 
 | ||
|      fftw_plan fftw_mpi_plan_many_dft_r2c
 | ||
|                    (int rnk, const ptrdiff_t *n, ptrdiff_t howmany,
 | ||
|                     ptrdiff_t iblock, ptrdiff_t oblock,
 | ||
|                     double *in, fftw_complex *out,
 | ||
|                     MPI_Comm comm, unsigned flags);
 | ||
|      fftw_plan fftw_mpi_plan_many_dft_c2r
 | ||
|                    (int rnk, const ptrdiff_t *n, ptrdiff_t howmany,
 | ||
|                     ptrdiff_t iblock, ptrdiff_t oblock,
 | ||
|                     fftw_complex *in, double *out,
 | ||
|                     MPI_Comm comm, unsigned flags);
 | ||
| 
 | ||
| MPI r2r transforms
 | ||
| ..................
 | ||
| 
 | ||
| There are corresponding plan-creation routines for r2r transforms (*note
 | ||
| More DFTs of Real Data::), currently supporting multidimensional ('rnk >
 | ||
| 1') transforms only ('rnk = 1' will yield a 'NULL' plan):
 | ||
| 
 | ||
|      fftw_plan fftw_mpi_plan_r2r_2d(ptrdiff_t n0, ptrdiff_t n1,
 | ||
|                                     double *in, double *out,
 | ||
|                                     MPI_Comm comm,
 | ||
|                                     fftw_r2r_kind kind0, fftw_r2r_kind kind1,
 | ||
|                                     unsigned flags);
 | ||
|      fftw_plan fftw_mpi_plan_r2r_3d(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
 | ||
|                                     double *in, double *out,
 | ||
|                                     MPI_Comm comm,
 | ||
|                                     fftw_r2r_kind kind0, fftw_r2r_kind kind1, fftw_r2r_kind kind2,
 | ||
|                                     unsigned flags);
 | ||
|      fftw_plan fftw_mpi_plan_r2r(int rnk, const ptrdiff_t *n,
 | ||
|                                  double *in, double *out,
 | ||
|                                  MPI_Comm comm, const fftw_r2r_kind *kind,
 | ||
|                                  unsigned flags);
 | ||
|      fftw_plan fftw_mpi_plan_many_r2r(int rnk, const ptrdiff_t *n, ptrdiff_t howmany,
 | ||
|                                       ptrdiff_t iblock, ptrdiff_t oblock,
 | ||
|                                       double *in, double *out,
 | ||
|                                       MPI_Comm comm, const fftw_r2r_kind *kind,
 | ||
|                                       unsigned flags);
 | ||
| 
 | ||
|    The parameters are much the same as for the complex DFTs above,
 | ||
| except that the arrays are of real numbers (and hence the outputs of the
 | ||
| 'local_size' data-distribution functions should be interpreted as counts
 | ||
| of real rather than complex numbers).  Also, the 'kind' parameters
 | ||
| specify the r2r kinds along each dimension as for the serial interface
 | ||
| (*note Real-to-Real Transform Kinds::).  *Note Other Multi-dimensional
 | ||
| Real-data MPI Transforms::.
 | ||
| 
 | ||
| MPI transposition
 | ||
| .................
 | ||
| 
 | ||
| FFTW also provides routines to plan a transpose of a distributed 'n0' by
 | ||
| 'n1' array of real numbers, or an array of 'howmany'-tuples of real
 | ||
| numbers with specified block sizes (*note FFTW MPI Transposes::):
 | ||
| 
 | ||
|      fftw_plan fftw_mpi_plan_transpose(ptrdiff_t n0, ptrdiff_t n1,
 | ||
|                                        double *in, double *out,
 | ||
|                                        MPI_Comm comm, unsigned flags);
 | ||
|      fftw_plan fftw_mpi_plan_many_transpose
 | ||
|                      (ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t howmany,
 | ||
|                       ptrdiff_t block0, ptrdiff_t block1,
 | ||
|                       double *in, double *out, MPI_Comm comm, unsigned flags);
 | ||
| 
 | ||
|    These plans are used with the 'fftw_mpi_execute_r2r' new-array
 | ||
| execute function (*note Using MPI Plans::), since they count as (rank
 | ||
| zero) r2r plans from FFTW's perspective.
 | ||
| 
 | ||
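For intuition about what such a plan computes, the following serial plain-C sketch transposes an 'n0' by 'n1' row-major array of 'howmany'-tuples into an 'n1' by 'n0' array.  It is a single-process model with a hypothetical function name; it does not use FFTW or MPI and says nothing about how the distributed version communicates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical serial model: out (n1 x n0, row-major) receives the
   transpose of in (n0 x n1, row-major); every entry is a contiguous
   howmany-tuple.  The two arrays must not overlap. */
void transpose_tuples(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t howmany,
                      const double *in, double *out)
{
    ptrdiff_t i, j, k;
    assert(in != out);
    for (i = 0; i < n0; ++i)
        for (j = 0; j < n1; ++j)
            for (k = 0; k < howmany; ++k)
                out[(j * n0 + i) * howmany + k] =
                    in[(i * n1 + j) * howmany + k];
}
```

Applied to the 2 by 3 array 1 2 3 / 4 5 6 with 'howmany = 1', this produces the 3 by 2 array 1 4 / 2 5 / 3 6.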
| 
 | ||
| File: fftw3.info,  Node: MPI Wisdom Communication,  Prev: MPI Plan Creation,  Up: FFTW MPI Reference
 | ||
| 
 | ||
| 6.12.6 MPI Wisdom Communication
 | ||
| -------------------------------
 | ||
| 
 | ||
| To facilitate synchronizing wisdom among the different MPI processes, we
 | ||
| provide two functions:
 | ||
| 
 | ||
|      void fftw_mpi_gather_wisdom(MPI_Comm comm);
 | ||
|      void fftw_mpi_broadcast_wisdom(MPI_Comm comm);
 | ||
| 
 | ||
|    The 'fftw_mpi_gather_wisdom' function gathers all wisdom in the given
 | ||
| communicator 'comm' to the process of rank 0 in the communicator: that
 | ||
| process obtains the union of all wisdom on all the processes.  As a side
 | ||
| effect, some other processes will gain additional wisdom from other
 | ||
| processes, but only process 0 will gain the complete union.
 | ||
| 
 | ||
|    The 'fftw_mpi_broadcast_wisdom' function does the reverse: it exports wisdom
 | ||
| from process 0 in 'comm' to all other processes in the communicator,
 | ||
| replacing any wisdom they currently have.
 | ||
| 
 | ||
|    *Note FFTW MPI Wisdom::.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: FFTW MPI Fortran Interface,  Prev: FFTW MPI Reference,  Up: Distributed-memory FFTW with MPI
 | ||
| 
 | ||
| 6.13 FFTW MPI Fortran Interface
 | ||
| ===============================
 | ||
| 
 | ||
| The FFTW MPI interface is callable from modern Fortran compilers
 | ||
| supporting the Fortran 2003 'iso_c_binding' standard for calling C
 | ||
| functions.  As described in *note Calling FFTW from Modern Fortran::,
 | ||
| this means that you can directly call FFTW's C interface from Fortran
 | ||
| with only minor changes in syntax.  There are, however, a few things
 | ||
| specific to the MPI interface to keep in mind:
 | ||
| 
 | ||
|    * Instead of including 'fftw3.f03' as in *note Overview of Fortran
 | ||
|      interface::, you should 'include 'fftw3-mpi.f03'' (after 'use,
 | ||
|      intrinsic :: iso_c_binding' as before).  The 'fftw3-mpi.f03' file
 | ||
|      includes 'fftw3.f03', so you should _not_ 'include' them both
 | ||
|      yourself.  (You will also want to include the MPI header file,
 | ||
|      usually via 'include 'mpif.h'' or similar, although this is
 | ||
|      not needed by 'fftw3-mpi.f03' per se.)  (To use the 'fftwl_' 'long
 | ||
|      double' extended-precision routines in supporting compilers, you
 | ||
|      should include 'fftw3l-mpi.f03' in _addition_ to 'fftw3-mpi.f03'.
 | ||
|      *Note Extended and quadruple precision in Fortran::.)
 | ||
| 
 | ||
|    * Because of the different storage conventions between C and Fortran,
 | ||
|      you reverse the order of your array dimensions when passing them to
 | ||
|      FFTW (*note Reversing array dimensions::).  This is merely a
 | ||
|      difference in notation and incurs no performance overhead.
 | ||
|      However, it means that, whereas in C the _first_ dimension is
 | ||
|      distributed, in Fortran the _last_ dimension of your array is
 | ||
|      distributed.
 | ||
| 
 | ||
|    * In Fortran, communicators are stored as 'integer' types; there is
 | ||
|      no 'MPI_Comm' type, nor is there any way to access a C 'MPI_Comm'.
 | ||
|      Fortunately, this is taken care of for you by the FFTW Fortran
 | ||
|      interface: whenever the C interface expects an 'MPI_Comm' type, you
 | ||
|      should pass the Fortran communicator as an 'integer'.(1)
 | ||
| 
 | ||
|    * Because you need to call the 'local_size' function to find out how
 | ||
|      much space to allocate, and this may be _larger_ than the local
 | ||
|      portion of the array (*note MPI Data Distribution::), you should
 | ||
|      _always_ allocate your arrays dynamically using FFTW's allocation
 | ||
|      routines as described in *note Allocating aligned memory in
 | ||
|      Fortran::.  (Coincidentally, this also provides the best
 | ||
|      performance by guaranteeing proper data alignment.)
 | ||
| 
 | ||
|    * Because all sizes in the MPI FFTW interface are declared as
 | ||
|      'ptrdiff_t' in C, you should use 'integer(C_INTPTR_T)' in Fortran
 | ||
|      (*note FFTW Fortran type reference::).
 | ||
| 
 | ||
|    * In Fortran, because of the language semantics, we generally
 | ||
|      recommend using the new-array execute functions for all plans, even
 | ||
|      in the common case where you are executing the plan on the same
 | ||
|      arrays for which the plan was created (*note Plan execution in
 | ||
|      Fortran::).  However, note that in the MPI interface these
 | ||
|      functions are changed: 'fftw_execute_dft' becomes
 | ||
|      'fftw_mpi_execute_dft', etcetera.  *Note Using MPI Plans::.
 | ||
| 
 | ||
|    For example, here is a Fortran code snippet to perform a distributed
 | ||
| L x M complex DFT in-place.  (This assumes you have already initialized
 | ||
| MPI with 'MPI_init' and have also performed 'call fftw_mpi_init'.)
 | ||
| 
 | ||
|        use, intrinsic :: iso_c_binding
 | ||
|        include 'fftw3-mpi.f03'
 | ||
|        integer(C_INTPTR_T), parameter :: L = ...
 | ||
|        integer(C_INTPTR_T), parameter :: M = ...
 | ||
|        type(C_PTR) :: plan, cdata
 | ||
|        complex(C_DOUBLE_COMPLEX), pointer :: data(:,:)
 | ||
|        integer(C_INTPTR_T) :: i, j, alloc_local, local_M, local_j_offset
 | ||
| 
 | ||
|      !   get local data size and allocate (note dimension reversal)
 | ||
|        alloc_local = fftw_mpi_local_size_2d(M, L, MPI_COMM_WORLD, &
 | ||
|                                             local_M, local_j_offset)
 | ||
|        cdata = fftw_alloc_complex(alloc_local)
 | ||
|        call c_f_pointer(cdata, data, [L,local_M])
 | ||
| 
 | ||
|      !   create MPI plan for in-place forward DFT (note dimension reversal)
 | ||
|        plan = fftw_mpi_plan_dft_2d(M, L, data, data, MPI_COMM_WORLD, &
 | ||
|                                    FFTW_FORWARD, FFTW_MEASURE)
 | ||
| 
 | ||
|      ! initialize data to some function my_function(i,j)
 | ||
|        do j = 1, local_M
 | ||
|          do i = 1, L
 | ||
|            data(i, j) = my_function(i, j + local_j_offset)
 | ||
|          end do
 | ||
|        end do
 | ||
| 
 | ||
|      ! compute transform (as many times as desired)
 | ||
|        call fftw_mpi_execute_dft(plan, data, data)
 | ||
| 
 | ||
|        call fftw_destroy_plan(plan)
 | ||
|        call fftw_free(cdata)
 | ||
| 
 | ||
|    Note that we called 'fftw_mpi_local_size_2d' and
 | ||
| 'fftw_mpi_plan_dft_2d' with the dimensions in reversed order, since an L
 | ||
| x M Fortran array is viewed by FFTW in C as an M x L array.  This means
 | ||
| that the array was distributed over the 'M' dimension, the local portion
 | ||
| of which is an L x local_M array in Fortran.  (You must _not_ use an
 | ||
| 'allocate' statement to allocate an L x local_M array, however; you must
 | ||
| allocate 'alloc_local' complex numbers, which may be greater than 'L *
 | ||
| local_M', in order to reserve space for intermediate steps of the
 | ||
| transform.)  Finally, we mention that because C's array indices are
 | ||
| zero-based, the 'local_j_offset' argument can conveniently be
 | ||
| interpreted as an offset in the 1-based 'j' index (rather than as a
 | ||
| starting index as in C).
 | ||
| 
 | ||
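   The offset arithmetic can be sketched in C as follows.  The block
layout used here (ceiling division of the distributed dimension among
the processes) is only an assumption made for illustration, and the
names 'block_layout' and 'global_index_1based' are hypothetical; real
code should rely on the values that 'fftw_mpi_local_size_2d' reports.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical block layout, NOT taken from FFTW: each process owns up
   to ceil(n / nprocs) consecutive slices of the distributed dimension. */
void block_layout(ptrdiff_t n, int nprocs, int rank,
                  ptrdiff_t *local_n, ptrdiff_t *local_offset)
{
    assert(n >= 0 && nprocs > 0 && rank >= 0 && rank < nprocs);
    ptrdiff_t block = (n + nprocs - 1) / nprocs;  /* ceiling division */
    ptrdiff_t start = block * rank;
    if (start > n) start = n;       /* trailing ranks may own nothing */
    ptrdiff_t end = start + block;
    if (end > n) end = n;
    *local_n = end - start;
    *local_offset = start;          /* zero-based starting index, as in C */
}

/* Global 1-based index for the local 1-based index j (Fortran style):
   the zero-based C start doubles as an offset for 1-based indices. */
ptrdiff_t global_index_1based(ptrdiff_t j, ptrdiff_t local_offset)
{
    return j + local_offset;
}
```

   For instance, under this assumed layout with 10 slices on 4
processes, the last process owns a single slice at offset 9, and its
local index 1 corresponds to the global 1-based index 10.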
|    If instead you had used the 'ior(FFTW_MEASURE,
 | ||
| FFTW_MPI_TRANSPOSED_OUT)' flag, the output of the transform would be a
 | ||
| transposed M x local_L array, associated with the _same_ 'cdata'
 | ||
| allocation (since the transform is in-place), and which you could
 | ||
| declare with:
 | ||
| 
 | ||
|        complex(C_DOUBLE_COMPLEX), pointer :: tdata(:,:)
 | ||
|        ...
 | ||
|        call c_f_pointer(cdata, tdata, [M,local_L])
 | ||
| 
 | ||
|    where 'local_L' would have been obtained by changing the
 | ||
| 'fftw_mpi_local_size_2d' call to:
 | ||
| 
 | ||
|        alloc_local = fftw_mpi_local_size_2d_transposed(M, L, MPI_COMM_WORLD, &
 | ||
|                                 local_M, local_j_offset, local_L, local_i_offset)
 | ||
| 
 | ||
|    ---------- Footnotes ----------
 | ||
| 
 | ||
|    (1) Technically, this is because you aren't actually calling the C
 | ||
| functions directly.  You are calling wrapper functions that translate
 | ||
| the communicator with 'MPI_Comm_f2c' before calling the ordinary C
 | ||
| interface.  This is all done transparently, however, since the
 | ||
| 'fftw3-mpi.f03' interface file renames the wrappers so that they are
 | ||
| called in Fortran with the same names as the C interface functions.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Calling FFTW from Modern Fortran,  Next: Calling FFTW from Legacy Fortran,  Prev: Distributed-memory FFTW with MPI,  Up: Top
 | ||
| 
 | ||
| 7 Calling FFTW from Modern Fortran
 | ||
| **********************************
 | ||
| 
 | ||
| Fortran 2003 standardized ways for Fortran code to call C libraries, and
 | ||
| this allows us to support a direct translation of the FFTW C API into
 | ||
| Fortran.  Compared to the legacy Fortran 77 interface (*note Calling
 | ||
| FFTW from Legacy Fortran::), this direct interface offers many
 | ||
| advantages, especially compile-time type-checking and aligned memory
 | ||
| allocation.  As of this writing, support for these C interoperability
 | ||
| features seems widespread, having been implemented in nearly all major
 | ||
| Fortran compilers (e.g.  GNU, Intel, IBM, Oracle/Solaris, Portland
 | ||
| Group, NAG).
 | ||
| 
 | ||
|    This chapter documents that interface.  For the most part, since this
 | ||
| interface allows Fortran to call the C interface directly, the usage is
 | ||
| identical to C translated to Fortran syntax.  However, there are a few
 | ||
| subtle points such as memory allocation, wisdom, and data types that
 | ||
| deserve closer attention.
 | ||
| 
 | ||
| * Menu:
 | ||
| 
 | ||
| * Overview of Fortran interface::
 | ||
| * Reversing array dimensions::
 | ||
| * FFTW Fortran type reference::
 | ||
| * Plan execution in Fortran::
 | ||
| * Allocating aligned memory in Fortran::
 | ||
| * Accessing the wisdom API from Fortran::
 | ||
| * Defining an FFTW module::
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Overview of Fortran interface,  Next: Reversing array dimensions,  Prev: Calling FFTW from Modern Fortran,  Up: Calling FFTW from Modern Fortran
 | ||
| 
 | ||
| 7.1 Overview of Fortran interface
 | ||
| =================================
 | ||
| 
 | ||
| FFTW provides a file 'fftw3.f03' that defines Fortran 2003 interfaces
 | ||
| for all of its C routines except the MPI routines, which are described
 | ||
| elsewhere.  This file can be found in the same directory as 'fftw3.h'
 | ||
| (the C header file).  In any Fortran subroutine where you want to use
 | ||
| FFTW functions, you should begin with:
 | ||
| 
 | ||
|        use, intrinsic :: iso_c_binding
 | ||
|        include 'fftw3.f03'
 | ||
| 
 | ||
|    This includes the interface definitions and the standard
 | ||
| 'iso_c_binding' module (which defines the equivalents of C types).  You
 | ||
| can also put the FFTW functions into a module if you prefer (*note
 | ||
| Defining an FFTW module::).
 | ||
| 
 | ||
|    At this point, you can now call anything in the FFTW C interface
 | ||
| directly, almost exactly as in C other than minor changes in syntax.
 | ||
| For example:
 | ||
| 
 | ||
|        type(C_PTR) :: plan
 | ||
|        complex(C_DOUBLE_COMPLEX), dimension(1024,1000) :: in, out
 | ||
|        plan = fftw_plan_dft_2d(1000,1024, in,out, FFTW_FORWARD,FFTW_ESTIMATE)
 | ||
|        ...
 | ||
|        call fftw_execute_dft(plan, in, out)
 | ||
|        ...
 | ||
|        call fftw_destroy_plan(plan)
 | ||
| 
 | ||
|    A few important things to keep in mind are:
 | ||
| 
 | ||
|    * FFTW plans are 'type(C_PTR)'.  Other C types are mapped in the
 | ||
|      obvious way via the 'iso_c_binding' standard: 'int' turns into
 | ||
|      'integer(C_INT)', 'fftw_complex' turns into
 | ||
|      'complex(C_DOUBLE_COMPLEX)', 'double' turns into 'real(C_DOUBLE)',
 | ||
|      and so on.  *Note FFTW Fortran type reference::.
 | ||
| 
 | ||
|    * Functions in C become functions in Fortran if they have a return
 | ||
|      value, and subroutines in Fortran otherwise.
 | ||
| 
 | ||
|    * The ordering of the Fortran array dimensions must be _reversed_
 | ||
|      when they are passed to the FFTW plan creation, thanks to
 | ||
|      differences in array indexing conventions (*note Multi-dimensional
 | ||
|      Array Format::).  This is _unlike_ the legacy Fortran interface
 | ||
|      (*note Fortran-interface routines::), which reversed the dimensions
 | ||
|      for you.  *Note Reversing array dimensions::.
 | ||
| 
 | ||
|    * Using ordinary Fortran array declarations like this works, but may
 | ||
|      yield suboptimal performance because the data may not be
 | ||
|      aligned to exploit SIMD instructions on modern processors (*note
 | ||
|      SIMD alignment and fftw_malloc::).  Better performance will often
 | ||
|      be obtained by allocating with 'fftw_alloc'.  *Note Allocating
 | ||
|      aligned memory in Fortran::.
 | ||
| 
 | ||
|    * Similar to the legacy Fortran interface (*note FFTW Execution in
 | ||
|      Fortran::), we currently recommend _not_ using 'fftw_execute' but
 | ||
|      rather using the more specialized functions like 'fftw_execute_dft'
 | ||
|      (*note New-array Execute Functions::).  However, you should execute
 | ||
|      the plan on the 'same arrays' as the ones for which you created the
 | ||
|      plan, unless you are especially careful.  *Note Plan execution in
 | ||
|      Fortran::.  To prevent you from using 'fftw_execute' by mistake,
 | ||
|      the 'fftw3.f03' file does not provide an 'fftw_execute' interface
 | ||
|      declaration.
 | ||
| 
 | ||
|    * Multiple planner flags are combined with 'ior' (equivalent to '|'
 | ||
|      in C).  For example, 'FFTW_MEASURE | FFTW_DESTROY_INPUT' becomes
 | ||
|      'ior(FFTW_MEASURE, FFTW_DESTROY_INPUT)'.  (You can also use '+' as
 | ||
|      long as you don't try to include a given flag more than once.)
 | ||
| 
 | ||
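   The remark about '+' versus 'ior' in the last item can be checked
with a small C sketch.  The flag names and values below are hypothetical
single-bit constants chosen for illustration; they are not FFTW's actual
planner flags.

```c
#include <assert.h>

/* Illustrative single-bit flags; names and values are hypothetical and
   are not FFTW's actual constants. */
enum { FLAG_A = 1u << 0, FLAG_B = 1u << 1 };

/* Combine flags with bitwise OR (Fortran: ior).  Repeating a flag is
   harmless because OR is idempotent. */
unsigned combine_or(unsigned x, unsigned y)
{
    return x | y;
}

/* Combine flags with addition.  This matches OR only when x and y have
   no bits in common, which the assertion enforces. */
unsigned combine_add(unsigned x, unsigned y)
{
    assert((x & y) == 0);  /* a repeated flag would carry into another bit */
    return x + y;
}
```

   Adding 'FLAG_A' to itself yields the bit pattern of 'FLAG_B', which
is the kind of silent corruption that combining with 'ior' avoids.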
| * Menu:
 | ||
| 
 | ||
| * Extended and quadruple precision in Fortran::
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Extended and quadruple precision in Fortran,  Prev: Overview of Fortran interface,  Up: Overview of Fortran interface
 | ||
| 
 | ||
| 7.1.1 Extended and quadruple precision in Fortran
 | ||
| -------------------------------------------------
 | ||
| 
 | ||
| If FFTW is compiled in 'long double' (extended) precision (*note
 | ||
| Installation and Customization::), you may be able to call the resulting
 | ||
| 'fftwl_' routines (*note Precision::) from Fortran if your compiler
 | ||
| supports the 'C_LONG_DOUBLE_COMPLEX' type code.
 | ||
| 
 | ||
|    Because some Fortran compilers do not support
 | ||
| 'C_LONG_DOUBLE_COMPLEX', the 'fftwl_' declarations are segregated into a
 | ||
| separate interface file 'fftw3l.f03', which you should include _in
 | ||
| addition_ to 'fftw3.f03' (which declares precision-independent 'FFTW_'
 | ||
| constants):
 | ||
| 
 | ||
|        use, intrinsic :: iso_c_binding
 | ||
|        include 'fftw3.f03'
 | ||
|        include 'fftw3l.f03'
 | ||
| 
 | ||
|    We also support using the nonstandard '__float128'
 | ||
| quadruple-precision type provided by recent versions of 'gcc' on 32- and
 | ||
| 64-bit x86 hardware (*note Installation and Customization::), using the
 | ||
| corresponding 'real(16)' and 'complex(16)' types supported by
 | ||
| 'gfortran'.  The quadruple-precision 'fftwq_' functions (*note
 | ||
| Precision::) are declared in a 'fftw3q.f03' interface file, which should
 | ||
| be included in addition to 'fftw3.f03', as above.  You should also link
 | ||
| with '-lfftw3q -lquadmath -lm' as in C.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Reversing array dimensions,  Next: FFTW Fortran type reference,  Prev: Overview of Fortran interface,  Up: Calling FFTW from Modern Fortran
 | ||
| 
 | ||
| 7.2 Reversing array dimensions
 | ||
| ==============================
 | ||
| 
 | ||
| A minor annoyance in calling FFTW from Fortran is that FFTW's array
 | ||
| dimensions are defined in the C convention (row-major order), while
 | ||
| Fortran's array dimensions use the opposite convention (column-major
 | ||
| order).  *Note Multi-dimensional Array Format::.  This is just a
 | ||
| bookkeeping difference, with no effect on performance.  The only
 | ||
| consequence of this is that, whenever you create an FFTW plan for a
 | ||
| multi-dimensional transform, you must always _reverse the ordering of
 | ||
| the dimensions_.
 | ||
| 
 | ||
|    For example, consider the three-dimensional (L x M x N) arrays:
 | ||
| 
 | ||
|        complex(C_DOUBLE_COMPLEX), dimension(L,M,N) :: in, out
 | ||
| 
 | ||
|    To plan a DFT for these arrays using 'fftw_plan_dft_3d', you could
 | ||
| do:
 | ||
| 
 | ||
|        plan = fftw_plan_dft_3d(N,M,L, in,out, FFTW_FORWARD,FFTW_ESTIMATE)
 | ||
| 
 | ||
|    That is, from FFTW's perspective this is a N x M x L array.  _No data
 | ||
| transposition need occur_, as this is _only notation_.  Similarly, to
 | ||
| use the more generic routine 'fftw_plan_dft' with the same arrays, you
 | ||
| could do:
 | ||
| 
 | ||
|        integer(C_INT), dimension(3) :: dims = [N,M,L]
 | ||
|        plan = fftw_plan_dft(3, dims, in,out, FFTW_FORWARD,FFTW_ESTIMATE)
 | ||
| 
 | ||
|    Note, by the way, that this is different from the legacy Fortran
 | ||
| interface (*note Fortran-interface routines::), which automatically
 | ||
| reverses the order of the array dimensions for you.  Here, you are
 | ||
| calling the C interface directly, so there is no "translation" layer.
 | ||
| 
 | ||
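   To see why the reversal is only notation, the following C sketch
compares the memory offset of element (i,j,k) in a column-major L x M x N
array with the offset of element [k-1][j-1][i-1] in a row-major N x M x
L array.  The helper names are hypothetical and the code does not call
FFTW; it only checks the index arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element (i,j,k), 1-based, in a column-major (Fortran)
   array declared dimension(L,M,N). */
size_t fortran_offset(size_t i, size_t j, size_t k, size_t L, size_t M)
{
    assert(i >= 1 && j >= 1 && k >= 1);
    return (i - 1) + L * ((j - 1) + M * (k - 1));
}

/* Offset of element [a][b][c], 0-based, in a row-major (C) array with
   dimensions n0 x n1 x n2 (n0 does not enter the formula). */
size_t c_offset(size_t a, size_t b, size_t c, size_t n1, size_t n2)
{
    return (a * n1 + b) * n2 + c;
}

/* Returns 1 if every element of an L x M x N Fortran array sits at the
   same offset as the matching element of an N x M x L C array. */
int layouts_agree(size_t L, size_t M, size_t N)
{
    for (size_t k = 1; k <= N; ++k)
        for (size_t j = 1; j <= M; ++j)
            for (size_t i = 1; i <= L; ++i)
                if (fortran_offset(i, j, k, L, M) !=
                    c_offset(k - 1, j - 1, i - 1, M, L))
                    return 0;
    return 1;
}
```

   Because the two offsets agree for every element, passing the
dimensions in reversed order describes the same memory layout and no
data needs to be moved.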
|    An important thing to keep in mind is the implication of this for
 | ||
| multidimensional real-to-complex transforms (*note Multi-Dimensional
 | ||
| DFTs of Real Data::).  In C, a multidimensional real-to-complex DFT
 | ||
| chops the last dimension roughly in half (N x M x L real input goes to N
 | ||
| x M x L/2+1 complex output).  In Fortran, because the array dimension
 | ||
| notation is reversed, the _first_ dimension of the complex data is
 | ||
| chopped roughly in half.  For example, consider the 'r2c' transform of L
 | ||
| x M x N real input in Fortran:
 | ||
| 
 | ||
|        type(C_PTR) :: plan
 | ||
|        real(C_DOUBLE), dimension(L,M,N) :: in
 | ||
|        complex(C_DOUBLE_COMPLEX), dimension(L/2+1,M,N) :: out
 | ||
|        plan = fftw_plan_dft_r2c_3d(N,M,L, in,out, FFTW_ESTIMATE)
 | ||
|        ...
 | ||
|        call fftw_execute_dft_r2c(plan, in, out)
 | ||
| 
 | ||
|    Alternatively, for an in-place r2c transform, as described in the C
 | ||
| documentation we must _pad_ the _first_ dimension of the real input with
 | ||
| extra entries (two if L is even, one if odd; FFTW ignores them) to leave enough
 | ||
| space for the complex output.  The input is _allocated_ as a 2[L/2+1] x
 | ||
| M x N array, even though only L x M x N of it is actually used.  In this
 | ||
| example, we will allocate the array as a pointer type, using
 | ||
| 'fftw_alloc' to ensure aligned memory for maximum performance (*note
 | ||
| Allocating aligned memory in Fortran::); this also makes it easy to
 | ||
| reference the same memory as both a real array and a complex array.
 | ||
| 
 | ||
|        real(C_DOUBLE), pointer :: in(:,:,:)
 | ||
|        complex(C_DOUBLE_COMPLEX), pointer :: out(:,:,:)
 | ||
|        type(C_PTR) :: plan, data
 | ||
|        data = fftw_alloc_complex(int((L/2+1) * M * N, C_SIZE_T))
 | ||
|        call c_f_pointer(data, in, [2*(L/2+1),M,N])
 | ||
|        call c_f_pointer(data, out, [L/2+1,M,N])
 | ||
|        plan = fftw_plan_dft_r2c_3d(N,M,L, in,out, FFTW_ESTIMATE)
 | ||
|        ...
 | ||
|        call fftw_execute_dft_r2c(plan, in, out)
 | ||
|        ...
 | ||
|        call fftw_destroy_plan(plan)
 | ||
|        call fftw_free(data)
 | ||
| 
 | ||
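   The size bookkeeping for the in-place case can be sketched in C as
follows.  The function names are hypothetical; the formulas are the ones
used in the example above, with L the dimension that is cut roughly in
half.

```c
#include <assert.h>
#include <stddef.h>

/* Number of complex outputs along the halved dimension of length len. */
size_t r2c_complex_len(size_t len)
{
    return len / 2 + 1;
}

/* Padded length of the halved dimension when the real input shares
   storage with the complex output (two reals per complex number). */
size_t r2c_padded_real_len(size_t len)
{
    return 2 * r2c_complex_len(len);
}

/* Number of complex elements to allocate for an in-place r2c transform
   of an L x M x N Fortran array. */
size_t r2c_inplace_alloc(size_t L, size_t M, size_t N)
{
    assert(L > 0 && M > 0 && N > 0);
    return r2c_complex_len(L) * M * N;
}
```

   For L = 8 the padded real length is 10, and for L = 7 it is 8, so the
amount of padding depends on whether L is even or odd.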
| 
 | ||
| File: fftw3.info,  Node: FFTW Fortran type reference,  Next: Plan execution in Fortran,  Prev: Reversing array dimensions,  Up: Calling FFTW from Modern Fortran
 | ||
| 
 | ||
| 7.3 FFTW Fortran type reference
 | ||
| ===============================
 | ||
| 
 | ||
| The following are the most important type correspondences between the C
 | ||
| interface and Fortran:
 | ||
| 
 | ||
|    * Plans ('fftw_plan' and variants) are 'type(C_PTR)' (i.e.  an opaque
 | ||
|      pointer).
 | ||
| 
 | ||
|    * The C floating-point types 'double', 'float', and 'long double'
 | ||
|      correspond to 'real(C_DOUBLE)', 'real(C_FLOAT)', and
 | ||
|      'real(C_LONG_DOUBLE)', respectively.  The C complex types
 | ||
|      'fftw_complex', 'fftwf_complex', and 'fftwl_complex' correspond in
 | ||
|      Fortran to 'complex(C_DOUBLE_COMPLEX)', 'complex(C_FLOAT_COMPLEX)',
 | ||
|      and 'complex(C_LONG_DOUBLE_COMPLEX)', respectively.  Just as in C
 | ||
|      (*note Precision::), the FFTW subroutines and types are prefixed
 | ||
|      with 'fftw_', 'fftwf_', and 'fftwl_' for the different precisions,
 | ||
|      and link to different libraries ('-lfftw3', '-lfftw3f', and
 | ||
|      '-lfftw3l' on Unix), but use the _same_ include file 'fftw3.f03'
 | ||
|      and the _same_ constants (all of which begin with 'FFTW_').  The
 | ||
|      exception is 'long double' precision, for which you should _also_
 | ||
|      include 'fftw3l.f03' (*note Extended and quadruple precision in
 | ||
|      Fortran::).
 | ||
| 
 | ||
|    * The C integer types 'int' and 'unsigned' (used for planner flags)
 | ||
|      become 'integer(C_INT)'.  The C integer type 'ptrdiff_t' (e.g.  in
 | ||
|      the *note 64-bit Guru Interface::) becomes 'integer(C_INTPTR_T)',
 | ||
|      and 'size_t' (in 'fftw_malloc' etc.)  becomes 'integer(C_SIZE_T)'.
 | ||
| 
 | ||
|    * The 'fftw_r2r_kind' type (*note Real-to-Real Transform Kinds::)
 | ||
|      becomes 'integer(C_FFTW_R2R_KIND)'.  The various constant values of
 | ||
|      the C enumerated type ('FFTW_R2HC' etc.)  become simply integer
 | ||
|      constants of the same names in Fortran.
 | ||
| 
 | ||
|    * Numeric array pointer arguments (e.g.  'double *') become
 | ||
|      'dimension(*), intent(out)' arrays of the same type, or
 | ||
|      'dimension(*), intent(in)' if they are pointers to constant data
 | ||
|      (e.g.  'const int *').  There are a few exceptions where numeric
 | ||
|      pointers refer to scalar outputs (e.g.  for 'fftw_flops'), in which
 | ||
|      case they are 'intent(out)' scalar arguments in Fortran too.  For
 | ||
|      the new-array execute functions (*note New-array Execute
 | ||
|      Functions::), the input arrays are declared 'dimension(*),
 | ||
|      intent(inout)', since they can be modified in the case of in-place
 | ||
|      or 'FFTW_DESTROY_INPUT' transforms.
 | ||
| 
 | ||
|    * Pointer _return_ values (e.g.  'double *') become 'type(C_PTR)'.  (If
 | ||
|      they are pointers to arrays, as for 'fftw_alloc_real', you can
 | ||
|      convert them back to Fortran array pointers with the standard
 | ||
|      intrinsic function 'c_f_pointer'.)
 | ||
| 
 | ||
|    * The 'fftw_iodim' type in the guru interface (*note Guru vector and
 | ||
|      transform sizes::) becomes 'type(fftw_iodim)' in Fortran, a derived
 | ||
|      data type (the Fortran analogue of C's 'struct') with three
 | ||
|      'integer(C_INT)' components: 'n', 'is', and 'os', with the same
 | ||
|      meanings as in C. The 'fftw_iodim64' type in the 64-bit guru
 | ||
|      interface (*note 64-bit Guru Interface::) is the same, except that
 | ||
|      its components are of type 'integer(C_INTPTR_T)'.
 | ||
| 
 | ||
|    * Using the wisdom import/export functions from Fortran is a bit
 | ||
|      tricky, and is discussed in *note Accessing the wisdom API from
 | ||
|      Fortran::.  In brief, the 'FILE *' arguments map to 'type(C_PTR)',
 | ||
|      'const char *' to 'character(C_CHAR), dimension(*), intent(in)'
 | ||
|      (null-terminated!), and the generic read-char/write-char functions
 | ||
|      map to 'type(C_FUNPTR)'.
 | ||
| 
 | ||
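   As an illustration of the three 'fftw_iodim' components mentioned
above, the following C sketch fills in sizes and strides for a
contiguous row-major array.  The struct 'iodim_sketch' is a stand-in
that mirrors the fields 'n', 'is', and 'os' described in the text; it is
not the definition from the FFTW header, and 'fill_row_major' is a
hypothetical helper.

```c
#include <assert.h>

/* Stand-in with the three int fields the text describes for fftw_iodim
   (size, input stride, output stride).  Not the FFTW definition. */
typedef struct {
    int n;
    int is;
    int os;
} iodim_sketch;

/* Fill dims[0..rank-1] for a contiguous row-major array whose sizes are
   given slowest-varying first, using the same stride for input and
   output. */
void fill_row_major(int rank, const int *sizes, iodim_sketch *dims)
{
    assert(rank >= 0);
    int stride = 1;
    for (int d = rank - 1; d >= 0; --d) {
        dims[d].n = sizes[d];
        dims[d].is = stride;
        dims[d].os = stride;
        stride *= sizes[d];  /* the next-slower dimension skips this many */
    }
}
```

   For a Fortran array declared 'dimension(L,M,N)', the sizes would be
passed in the reversed order N, M, L, which gives the first Fortran
dimension a stride of 1.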
|    You may be wondering if you need to search-and-replace
 | ||
| 'real(kind(0.0d0))' (or whatever your favorite Fortran spelling of
 | ||
| "double precision" is) with 'real(C_DOUBLE)' everywhere in your program,
 | ||
| and similarly for 'complex' and 'integer' types.  The answer is no; you
 | ||
| can still use your existing types.  As long as these types match their C
 | ||
| counterparts, things should work without a hitch.  The worst that can
 | ||
| happen, e.g.  in the (unlikely) event of a system where
 | ||
| 'real(kind(0.0d0))' is different from 'real(C_DOUBLE)', is that the
 | ||
| compiler will give you a type-mismatch error.  That is, if you don't use
 | ||
| the 'iso_c_binding' kinds you need to accept at least the theoretical
 | ||
| possibility of having to change your code in response to compiler errors
 | ||
| on some future machine, but you don't need to worry about silently
 | ||
| compiling incorrect code that yields runtime errors.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Plan execution in Fortran,  Next: Allocating aligned memory in Fortran,  Prev: FFTW Fortran type reference,  Up: Calling FFTW from Modern Fortran
 | ||
| 
 | ||
| 7.4 Plan execution in Fortran
 | ||
| =============================
 | ||
| 
 | ||
| In C, in order to use a plan, one normally calls 'fftw_execute', which
 | ||
| executes the plan to perform the transform on the input/output arrays
 | ||
| passed when the plan was created (*note Using Plans::).  The
 | ||
| corresponding subroutine call in modern Fortran is:
 | ||
|       call fftw_execute(plan)
 | ||
| 
 | ||
|    However, we have had reports that this causes problems with some
 | ||
| recent optimizing Fortran compilers.  The problem is, because the
 | ||
| input/output arrays are not passed as explicit arguments to
 | ||
| 'fftw_execute', the semantics of Fortran (unlike C) allow the compiler
 | ||
| to assume that the input/output arrays are not changed by
 | ||
| 'fftw_execute'.  As a consequence, certain compilers end up
 | ||
| repositioning the call to 'fftw_execute', assuming incorrectly that it
 | ||
| does nothing to the arrays.
 | ||
| 
 | ||
|    There are various workarounds to this, but the safest and simplest
 | ||
| thing is to not use 'fftw_execute' in Fortran.  Instead, use the
 | ||
| functions described in *note New-array Execute Functions::, which take
 | ||
| the input/output arrays as explicit arguments.  For example, if the plan
 | ||
| is for a complex-data DFT and was created for the arrays 'in' and 'out',
 | ||
| you would do:
 | ||
|       call fftw_execute_dft(plan, in, out)
 | ||
| 
 | ||
|    There are a few things to be careful of, however:
 | ||
| 
 | ||
|    * You must use the correct type of execute function, matching the way
 | ||
|      the plan was created.  Complex DFT plans should use
 | ||
|      'fftw_execute_dft', real-input (r2c) DFT plans should use
 | ||
|      'fftw_execute_dft_r2c', and real-output (c2r) DFT plans should use
 | ||
|      'fftw_execute_dft_c2r'.  The various r2r plans should use
 | ||
|      'fftw_execute_r2r'.  Fortunately, if you use the wrong one you will
 | ||
|      get a compile-time type-mismatch error (unlike legacy Fortran).
 | ||
| 
 | ||
|    * You should normally pass the same input/output arrays that were
 | ||
|      used when creating the plan.  This is always safe.
 | ||
| 
 | ||
|    * _If_ you pass _different_ input/output arrays compared to those
 | ||
|      used when creating the plan, you must abide by all the restrictions
 | ||
|      of the new-array execute functions (*note New-array Execute
 | ||
|      Functions::).  The trickiest of these is the requirement that the
 | ||
|      new arrays have the same alignment as the original arrays; the best
 | ||
|      (and possibly only) way to guarantee this is to use the
 | ||
|      'fftw_alloc' functions to allocate your arrays (*note Allocating
 | ||
|      aligned memory in Fortran::).  Alternatively, you can use the
 | ||
|      'FFTW_UNALIGNED' flag when creating the plan, in which case the
 | ||
|      plan does not depend on the alignment, but this may sacrifice
 | ||
|      substantial performance on architectures (like x86) with SIMD
 | ||
|      instructions (*note SIMD alignment and fftw_malloc::).
 | ||
| 
 | ||
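   The alignment requirement in the last item can be made concrete with
a short C sketch that compares how far two addresses lie from an
alignment boundary.  The helper names are hypothetical, and the boundary
is a caller-supplied assumption (16 bytes is a common choice); the
boundary that actually matters depends on the SIMD instruction set FFTW
was built for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte distance of p past the previous multiple of `boundary`. */
size_t misalignment(const void *p, size_t boundary)
{
    assert(boundary > 0);
    return (size_t)((uintptr_t)p % boundary);
}

/* Returns 1 if the two buffers have the same alignment relative to
   `boundary`, which is the property the new arrays must share with the
   arrays used at planning time. */
int same_alignment(const void *a, const void *b, size_t boundary)
{
    return misalignment(a, boundary) == misalignment(b, boundary);
}
```

   Two buffers returned by the same aligned allocator pass this check
trivially, which is one reason the 'fftw_alloc' functions are the
simplest way to satisfy the requirement.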
| 
 | ||
| File: fftw3.info,  Node: Allocating aligned memory in Fortran,  Next: Accessing the wisdom API from Fortran,  Prev: Plan execution in Fortran,  Up: Calling FFTW from Modern Fortran
 | ||
| 
 | ||
| 7.5 Allocating aligned memory in Fortran
 | ||
| ========================================
 | ||
| 
 | ||
| In order to obtain maximum performance in FFTW, you should store your
 | ||
| data in arrays that have been specially aligned in memory (*note SIMD
 | ||
| alignment and fftw_malloc::).  Enforcing alignment also permits you to
 | ||
| safely use the new-array execute functions (*note New-array Execute
 | ||
| Functions::) to apply a given plan to more than one pair of in/out
 | ||
| arrays.  Unfortunately, standard Fortran arrays do _not_ provide any
 | ||
| alignment guarantees.  The _only_ way to allocate aligned memory in
 | ||
| standard Fortran is to allocate it with an external C function, like the
 | ||
| 'fftw_alloc_real' and 'fftw_alloc_complex' functions.  Fortunately,
 | ||
| Fortran 2003 provides a simple way to associate such allocated memory
 | ||
| with a standard Fortran array pointer that you can then use normally.
 | ||
| 
 | ||
|    We therefore recommend allocating all your input/output arrays using
 | ||
| the following technique:
 | ||
| 
 | ||
|   1. Declare a 'pointer', 'arr', to your array of the desired type and
 | ||
|      dimensions.  For example, 'real(C_DOUBLE), pointer :: arr(:,:)' for
 | ||
|      a 2d real array, or 'complex(C_DOUBLE_COMPLEX), pointer :: arr(:,:,:)'
 | ||
|      for a 3d complex array.
 | ||
| 
 | ||
|   2. The number of elements to allocate must be an 'integer(C_SIZE_T)'.
 | ||
|      You can either declare a variable of this type, e.g.
 | ||
|      'integer(C_SIZE_T) :: sz', to store the number of elements to
 | ||
|      allocate, or you can use the 'int(..., C_SIZE_T)' intrinsic
 | ||
|      function.  For example, set 'sz = L * M * N' or use 'int(L * M * N,
 | ||
|      C_SIZE_T)' for an L x M x N array.
 | ||
| 
 | ||
|   3. Declare a 'type(C_PTR) :: p' to hold the return value from FFTW's
 | ||
|      allocation routine.  Set 'p = fftw_alloc_real(sz)' for a real
 | ||
|      array, or 'p = fftw_alloc_complex(sz)' for a complex array.
 | ||
| 
 | ||
|   4. Associate your pointer 'arr' with the allocated memory 'p' using
 | ||
|      the standard 'c_f_pointer' subroutine: 'call c_f_pointer(p, arr,
 | ||
|      [...dimensions...])', where '[...dimensions...]' is an array of
 | ||
|      the dimensions of the array (in the usual Fortran order), e.g.
 | ||
|      'call c_f_pointer(p, arr, [L,M,N])' for an L x M x N array.
 | ||
|      (The dimensions argument is required whenever 'arr' is an array
 | ||
|      pointer; it can be omitted only for a scalar pointer.)  You can now
 | ||
|      use 'arr' as a usual multidimensional array.
 | ||
| 
 | ||
|   5. When you are done using the array, deallocate the memory with
 | ||
|      'call fftw_free(p)'.
 | ||
| 
 | ||
|    For example, here is how we would allocate an L x M 2d real array:
 | ||
| 
 | ||
|        real(C_DOUBLE), pointer :: arr(:,:)
 | ||
|        type(C_PTR) :: p
 | ||
|        p = fftw_alloc_real(int(L * M, C_SIZE_T))
 | ||
|        call c_f_pointer(p, arr, [L,M])
 | ||
|        ! ...use arr and arr(i,j) as usual...
 | ||
|        call fftw_free(p)
 | ||
| 
 | ||
|    and here is an L x M x N 3d complex array:
 | ||
| 
 | ||
|        complex(C_DOUBLE_COMPLEX), pointer :: arr(:,:,:)
 | ||
|        type(C_PTR) :: p
 | ||
|        p = fftw_alloc_complex(int(L * M * N, C_SIZE_T))
 | ||
|        call c_f_pointer(p, arr, [L,M,N])
 | ||
|        ! ...use arr and arr(i,j,k) as usual...
 | ||
|        call fftw_free(p)
 | ||
| 
 | ||
|    See *note Reversing array dimensions:: for an example allocating a
 | ||
| single array and associating both real and complex array pointers with
 | ||
| it, for in-place real-to-complex transforms.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Accessing the wisdom API from Fortran,  Next: Defining an FFTW module,  Prev: Allocating aligned memory in Fortran,  Up: Calling FFTW from Modern Fortran
 | ||
| 
 | ||
| 7.6 Accessing the wisdom API from Fortran
 | ||
| =========================================
 | ||
| 
 | ||
| As explained in *note Words of Wisdom-Saving Plans::, FFTW provides a
 | ||
| "wisdom" API for saving plans to disk so that they can be recreated
 | ||
| quickly.  The C API for exporting (*note Wisdom Export::) and importing
 | ||
| (*note Wisdom Import::) wisdom is somewhat tricky to use from Fortran,
 | ||
| however, because of differences in file I/O and string types between C
 | ||
| and Fortran.
 | ||
| 
 | ||
| * Menu:
 | ||
| 
 | ||
| * Wisdom File Export/Import from Fortran::
 | ||
| * Wisdom String Export/Import from Fortran::
 | ||
| * Wisdom Generic Export/Import from Fortran::
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Wisdom File Export/Import from Fortran,  Next: Wisdom String Export/Import from Fortran,  Prev: Accessing the wisdom API from Fortran,  Up: Accessing the wisdom API from Fortran
 | ||
| 
 | ||
| 7.6.1 Wisdom File Export/Import from Fortran
 | ||
| --------------------------------------------
 | ||
| 
 | ||
| The easiest way to export and import wisdom is to do so using
 | ||
| 'fftw_export_wisdom_to_filename' and 'fftw_import_wisdom_from_filename'.  The
 | ||
| only trick is that these require you to pass a C string, which is an
 | ||
| array of type 'CHARACTER(C_CHAR)' that is terminated by 'C_NULL_CHAR'.
 | ||
| You can call them like this:
 | ||
| 
 | ||
|        integer(C_INT) :: ret
 | ||
|        ret = fftw_export_wisdom_to_filename(C_CHAR_'my_wisdom.dat' // C_NULL_CHAR)
 | ||
|        if (ret .eq. 0) stop 'error exporting wisdom to file'
 | ||
|        ret = fftw_import_wisdom_from_filename(C_CHAR_'my_wisdom.dat' // C_NULL_CHAR)
 | ||
|        if (ret .eq. 0) stop 'error importing wisdom from file'
 | ||
| 
 | ||
|    Note that prepending 'C_CHAR_' is needed to specify that the literal
 | ||
| string is of kind 'C_CHAR', and we null-terminate the string by
 | ||
| appending '// C_NULL_CHAR'.  These functions return an 'integer(C_INT)'
 | ||
| ('ret') which is '0' if an error occurred during export/import and
 | ||
| nonzero otherwise.
 | ||
| 
 | ||
|    It is also possible to use the lower-level routines
 | ||
| 'fftw_export_wisdom_to_file' and 'fftw_import_wisdom_from_file', which
 | ||
| accept parameters of the C type 'FILE*', expressed in Fortran as
 | ||
| 'type(C_PTR)'.  However, you are then responsible for creating the
 | ||
| 'FILE*' yourself.  You can do this by using 'iso_c_binding' to define
 | ||
| Fortran interfaces for the C library functions 'fopen' and 'fclose',
 | ||
| which is a bit strange in Fortran but workable.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Wisdom String Export/Import from Fortran,  Next: Wisdom Generic Export/Import from Fortran,  Prev: Wisdom File Export/Import from Fortran,  Up: Accessing the wisdom API from Fortran
 | ||
| 
 | ||
| 7.6.2 Wisdom String Export/Import from Fortran
 | ||
| ----------------------------------------------
 | ||
| 
 | ||
| Dealing with FFTW's C string export/import is a bit more painful.  In
 | ||
| particular, the 'fftw_export_wisdom_to_string' function requires you to
 | ||
| deal with a dynamically allocated C string.  To get its length, you must
 | ||
| define an interface to the C 'strlen' function, and to deallocate it you
 | ||
| must define an interface to C 'free':
 | ||
| 
 | ||
|        use, intrinsic :: iso_c_binding
 | ||
|        interface
 | ||
|          integer(C_SIZE_T) function strlen(s) bind(C, name='strlen')
 | ||
|            import
 | ||
|            type(C_PTR), value :: s
 | ||
|          end function strlen
 | ||
|          subroutine free(p) bind(C, name='free')
 | ||
|            import
 | ||
|            type(C_PTR), value :: p
 | ||
|          end subroutine free
 | ||
|        end interface
 | ||
| 
 | ||
|    Given these definitions, you can then export wisdom to a Fortran
 | ||
| character array:
 | ||
| 
 | ||
|        character(C_CHAR), pointer :: s(:)
 | ||
|        integer(C_SIZE_T) :: slen
 | ||
|        type(C_PTR) :: p
 | ||
|        p = fftw_export_wisdom_to_string()
 | ||
|        if (.not. c_associated(p)) stop 'error exporting wisdom'
 | ||
|        slen = strlen(p)
 | ||
|        call c_f_pointer(p, s, [slen+1])
 | ||
|        ...
 | ||
|        call free(p)
 | ||
| 
 | ||
|    Note that 'slen' is the length of the C string, but the length of the
 | ||
| array is 'slen+1' because it includes the terminating null character.
 | ||
| (You can omit the '+1' if you don't want Fortran to know about the null
 | ||
| character.)  The standard 'c_associated' function checks whether 'p' is
 | ||
| a null pointer, which is returned by 'fftw_export_wisdom_to_string' if
 | ||
| there was an error.
 | ||
| 
 | ||
|    To import wisdom from a string, use 'fftw_import_wisdom_from_string'
 | ||
| as usual; note that the argument of this function must be a
 | ||
| 'character(C_CHAR)' array that is terminated by the 'C_NULL_CHAR' character,
 | ||
| like the 's' array above.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Wisdom Generic Export/Import from Fortran,  Prev: Wisdom String Export/Import from Fortran,  Up: Accessing the wisdom API from Fortran
 | ||
| 
 | ||
| 7.6.3 Wisdom Generic Export/Import from Fortran
 | ||
| -----------------------------------------------
 | ||
| 
 | ||
| The most generic wisdom export/import functions allow you to provide an
 | ||
| arbitrary callback function to read/write one character at a time in any
 | ||
| way you want.  However, your callback function must be written in a
 | ||
| special way, using the 'bind(C)' attribute to be passed to a C
 | ||
| interface.
 | ||
| 
 | ||
|    In particular, to call the generic wisdom export function
 | ||
| 'fftw_export_wisdom', you would write a callback subroutine of the form:
 | ||
| 
 | ||
|        subroutine my_write_char(c, p) bind(C)
 | ||
|          use, intrinsic :: iso_c_binding
 | ||
|          character(C_CHAR), value :: c
 | ||
|          type(C_PTR), value :: p
 | ||
|          _...write c..._
 | ||
|        end subroutine my_write_char
 | ||
| 
 | ||
|    Given such a subroutine (along with the corresponding interface
 | ||
| definition), you could then export wisdom using:
 | ||
| 
 | ||
|        call fftw_export_wisdom(c_funloc(my_write_char), p)
 | ||
| 
 | ||
|    The standard 'c_funloc' intrinsic converts a Fortran 'bind(C)'
 | ||
| subroutine into a C function pointer.  The parameter 'p' is a
 | ||
| 'type(C_PTR)' to any arbitrary data that you want to pass to
 | ||
| 'my_write_char' (or 'C_NULL_PTR' if none).  (Note that you can get a C
 | ||
| pointer to Fortran data using the intrinsic 'c_loc', and convert it back
 | ||
| to a Fortran pointer in 'my_write_char' using 'c_f_pointer'.)
 | ||
| 
 | ||
|    Similarly, to use the generic 'fftw_import_wisdom', you would define
 | ||
| a callback function of the form:
 | ||
| 
 | ||
|        integer(C_INT) function my_read_char(p) bind(C)
 | ||
|          use, intrinsic :: iso_c_binding
 | ||
|          type(C_PTR), value :: p
 | ||
|          character :: c
 | ||
|          _...read a character c..._
 | ||
|          my_read_char = ichar(c, C_INT)
 | ||
|        end function my_read_char
 | ||
| 
 | ||
|        ....
 | ||
| 
 | ||
|        integer(C_INT) :: ret
 | ||
|        ret = fftw_import_wisdom(c_funloc(my_read_char), p)
 | ||
|        if (ret .eq. 0) stop 'error importing wisdom'
 | ||
| 
 | ||
|    Your function can return '-1' if the end of the input is reached.
 | ||
| Again, 'p' is an arbitrary 'type(C_PTR)' that is passed through to your
 | ||
| function.  'fftw_import_wisdom' returns '0' if an error occurred and
 | ||
| nonzero otherwise.
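The two callback contracts described in this section can be seen more directly from the C side.  The following is a minimal sketch, not FFTW code: 'demo_buffer', 'demo_write_char', 'demo_read_char', and 'demo_export' are hypothetical names, and 'demo_export' merely stands in for the library so that the example is self-contained.  It assumes the callbacks have the shapes described above (a character plus an opaque pointer for writing; an opaque pointer in and an integer out, with '-1' at end of input, for reading).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state handed to the callbacks through the opaque pointer p. */
struct demo_buffer {
    char   data[64];
    size_t len;   /* number of characters stored so far */
    size_t pos;   /* index of the next character to read */
};

/* Same shape as the export callback: one character plus the pointer p.
   Characters beyond the buffer capacity are silently dropped here. */
void demo_write_char(char c, void *p)
{
    struct demo_buffer *b = (struct demo_buffer *) p;
    if (b->len < sizeof b->data)
        b->data[b->len++] = c;
}

/* Same shape as the import callback: returns the next character as an
   int, or -1 once the input is exhausted. */
int demo_read_char(void *p)
{
    struct demo_buffer *b = (struct demo_buffer *) p;
    if (b->pos >= b->len)
        return -1;
    return (unsigned char) b->data[b->pos++];
}

/* Stand-in for the library: pushes each character of text through the
   supplied write callback, passing p along unchanged. */
void demo_export(void (*write_char)(char, void *), void *p, const char *text)
{
    for (; *text; ++text)
        write_char(*text, p);
}
```

A Fortran callback declared with 'bind(C)' plays the same role as 'demo_write_char' or 'demo_read_char' here, with 'p' recovered via 'c_f_pointer' instead of a C cast.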
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Defining an FFTW module,  Prev: Accessing the wisdom API from Fortran,  Up: Calling FFTW from Modern Fortran
 | ||
| 
 | ||
| 7.7 Defining an FFTW module
 | ||
| ===========================
 | ||
| 
 | ||
| Rather than using the 'include' statement to include the 'fftw3.f03'
 | ||
| interface file in any subroutine where you want to use FFTW, you might
 | ||
| prefer to define an FFTW Fortran module.  FFTW does not install itself
 | ||
| as a module, primarily because 'fftw3.f03' can be shared between
 | ||
| different Fortran compilers while modules (in general) cannot.  However,
 | ||
| it is trivial to define your own FFTW module if you want.  Just create a
 | ||
| file containing:
 | ||
| 
 | ||
|        module FFTW3
 | ||
|          use, intrinsic :: iso_c_binding
 | ||
|          include 'fftw3.f03'
 | ||
|        end module
 | ||
| 
 | ||
|    Compile this file into a module as usual for your compiler (e.g.
 | ||
| with 'gfortran -c' you will get a file 'fftw3.mod').  Now, instead of
 | ||
| 'include 'fftw3.f03'', whenever you want to use FFTW routines you can
 | ||
| just do:
 | ||
| 
 | ||
|        use FFTW3
 | ||
| 
 | ||
|    as usual for Fortran modules.  (You still need to link to the FFTW
 | ||
| library, of course.)
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Calling FFTW from Legacy Fortran,  Next: Upgrading from FFTW version 2,  Prev: Calling FFTW from Modern Fortran,  Up: Top
 | ||
| 
 | ||
| 8 Calling FFTW from Legacy Fortran
 | ||
| **********************************
 | ||
| 
 | ||
| This chapter describes the interface to FFTW callable by Fortran code in
 | ||
| older compilers not supporting the Fortran 2003 C interoperability
 | ||
| features (*note Calling FFTW from Modern Fortran::).  This interface has
 | ||
| the major disadvantage that it is not type-checked, so if you get the
 | ||
| argument types or ordering wrong, your program will compile without
 | ||
| errors but will likely crash at runtime.  So, greater care is
 | ||
| needed.  Also, technically interfacing older Fortran versions to C is
 | ||
| nonstandard, but in practice we have found that the techniques used in
 | ||
| this chapter have worked with all known Fortran compilers for many
 | ||
| years.
 | ||
| 
 | ||
|    The legacy Fortran interface differs from the C interface only in the
 | ||
| prefix ('dfftw_' instead of 'fftw_' in double precision) and a few other
 | ||
| minor details.  This Fortran interface is included in the FFTW libraries
 | ||
| by default, unless a Fortran compiler isn't found on your system or
 | ||
| '--disable-fortran' is included in the 'configure' flags.  We assume
 | ||
| here that the reader is already familiar with the usage of FFTW in C, as
 | ||
| described elsewhere in this manual.
 | ||
| 
 | ||
|    The MPI parallel interface to FFTW is _not_ currently available to
 | ||
| legacy Fortran.
 | ||
| 
 | ||
| * Menu:
 | ||
| 
 | ||
| * Fortran-interface routines::
 | ||
| * FFTW Constants in Fortran::
 | ||
| * FFTW Execution in Fortran::
 | ||
| * Fortran Examples::
 | ||
| * Wisdom of Fortran?::
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Fortran-interface routines,  Next: FFTW Constants in Fortran,  Prev: Calling FFTW from Legacy Fortran,  Up: Calling FFTW from Legacy Fortran
 | ||
| 
 | ||
| 8.1 Fortran-interface routines
 | ||
| ==============================
 | ||
| 
 | ||
| Nearly all of the FFTW functions have Fortran-callable equivalents.  The
 | ||
| name of the legacy Fortran routine is the same as that of the
 | ||
| corresponding C routine, but with the 'fftw_' prefix replaced by
 | ||
| 'dfftw_'.(1)  The single and long-double precision versions use 'sfftw_'
 | ||
| and 'lfftw_', respectively, instead of 'fftwf_' and 'fftwl_'; quadruple
 | ||
| precision ('real*16') is available on some systems as 'fftwq_' (*note
 | ||
| Precision::).  (Note that 'long double' on x86 hardware is usually at
 | ||
| most 80-bit extended precision, _not_ quadruple precision.)
 | ||
| 
 | ||
|    For the most part, all of the arguments to the functions are the
 | ||
| same, with the following exceptions:
 | ||
| 
 | ||
|    * 'plan' variables (what would be of type 'fftw_plan' in C) must be
 | ||
|      declared as a type that is at least as big as a pointer (address)
 | ||
|      on your machine.  We recommend using 'integer*8' everywhere, since
 | ||
|      this should always be big enough.
 | ||
| 
 | ||
|    * Any function that returns a value (e.g.  'fftw_plan_dft') is
 | ||
|      converted into a _subroutine_.  The return value is converted into
 | ||
|      an additional _first_ parameter of this subroutine.(2)
 | ||
| 
 | ||
|    * The Fortran routines expect multi-dimensional arrays to be in
 | ||
|      _column-major_ order, which is the ordinary format of Fortran
 | ||
|      arrays (*note Multi-dimensional Array Format::).  They do this
 | ||
|      transparently and costlessly simply by reversing the order of the
 | ||
|      dimensions passed to FFTW, but this has one important consequence
 | ||
|      for multi-dimensional real-complex transforms, discussed below.
 | ||
| 
 | ||
|    * Wisdom import and export is somewhat more tricky because one cannot
 | ||
|      easily pass files or strings between C and Fortran; see *note
 | ||
|      Wisdom of Fortran?::.
 | ||
| 
 | ||
|    * Legacy Fortran cannot use the 'fftw_malloc' dynamic-allocation
 | ||
|      routine.  If you want to exploit the SIMD FFTW (*note SIMD
 | ||
|      alignment and fftw_malloc::), you'll need to figure out some other
 | ||
|      way to ensure that your arrays are at least 16-byte aligned.
 | ||
| 
 | ||
|    * Since Fortran 77 does not have data structures, the 'fftw_iodim'
 | ||
|      structure from the guru interface (*note Guru vector and transform
 | ||
|      sizes::) must be split into separate arguments.  In particular, any
 | ||
|      'fftw_iodim' array arguments in the C guru interface become three
 | ||
|      integer array arguments ('n', 'is', and 'os') in the Fortran guru
 | ||
|      interface, all of whose lengths should be equal to the
 | ||
|      corresponding 'rank' argument.
 | ||
| 
 | ||
|    * The guru planner interface in Fortran does _not_ do any automatic
 | ||
|      translation between column-major and row-major; you are responsible
 | ||
|      for setting the strides etcetera to correspond to your Fortran
 | ||
|      arrays.  However, as a slight bug that we are preserving for
 | ||
|      backwards compatibility, the 'plan_guru_r2r' in Fortran _does_
 | ||
|      reverse the order of its 'kind' array parameter, so the 'kind'
 | ||
|      array of that routine should be in the reverse of the order of the
 | ||
|      iodim arrays (see above).
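The dimension reversal mentioned in the list above works because a column-major L x M x N array occupies memory exactly like a row-major N x M x L array.  The following C sketch checks that equivalence by brute force; the helper names 'colmajor_offset', 'rowmajor_offset', and 'layouts_agree' are hypothetical and are not part of FFTW.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of the Fortran element arr(i,j,k) (1-based indices) in an array
   declared as arr(L,M,N): column-major, first index varies fastest. */
size_t colmajor_offset(size_t i, size_t j, size_t k, size_t L, size_t M)
{
    return (i - 1) + L * ((j - 1) + M * (k - 1));
}

/* Offset of the C element arr[a][b][c] (0-based indices) in an array
   declared as arr[n0][n1][n2]: row-major, last index varies fastest. */
size_t rowmajor_offset(size_t a, size_t b, size_t c, size_t n1, size_t n2)
{
    return c + n2 * (b + n1 * a);
}

/* Returns 1 if every Fortran arr(i,j,k) of shape (L,M,N) sits at the same
   offset as the C element [k-1][j-1][i-1] of shape [N][M][L], else 0. */
int layouts_agree(size_t L, size_t M, size_t N)
{
    for (size_t k = 1; k <= N; ++k)
        for (size_t j = 1; j <= M; ++j)
            for (size_t i = 1; i <= L; ++i)
                if (colmajor_offset(i, j, k, L, M)
                    != rowmajor_offset(k - 1, j - 1, i - 1, M, L))
                    return 0;
    return 1;
}
```

Because the two layouts coincide, reversing the dimension list is enough and no data needs to be moved.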
 | ||
| 
 | ||
|    In general, you should take care to use Fortran data types that
 | ||
| correspond to (i.e.  are the same size as) the C types used by FFTW. In
 | ||
| practice, this correspondence is usually straightforward (i.e.
 | ||
| 'integer' corresponds to 'int', 'real' corresponds to 'float',
 | ||
| etcetera).  The native Fortran double/single-precision complex type
 | ||
| should be compatible with 'fftw_complex'/'fftwf_complex'.  Such simple
 | ||
| correspondences are assumed in the examples below.
 | ||
| 
 | ||
|    ---------- Footnotes ----------
 | ||
| 
 | ||
|    (1) Technically, Fortran 77 identifiers are not allowed to have more
 | ||
| than 6 characters, nor may they contain underscores.  Any compiler that
 | ||
| enforces this limitation doesn't deserve to link to FFTW.
 | ||
| 
 | ||
|    (2) The reason for this is that some Fortran implementations seem to
 | ||
| have trouble with C function return values, and vice versa.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: FFTW Constants in Fortran,  Next: FFTW Execution in Fortran,  Prev: Fortran-interface routines,  Up: Calling FFTW from Legacy Fortran
 | ||
| 
 | ||
| 8.2 FFTW Constants in Fortran
 | ||
| =============================
 | ||
| 
 | ||
| When creating plans in FFTW, a number of constants are used to specify
 | ||
| options, such as 'FFTW_MEASURE' or 'FFTW_ESTIMATE'.  The same constants
 | ||
| must be used with the wrapper routines, but of course the C header files
 | ||
| where the constants are defined can't be incorporated directly into
 | ||
| Fortran code.
 | ||
| 
 | ||
|    Instead, we have placed Fortran equivalents of the FFTW constant
 | ||
| definitions in the file 'fftw3.f', which can be found in the same
 | ||
| directory as 'fftw3.h'.  If your Fortran compiler supports a
 | ||
| preprocessor of some sort, you should be able to 'include' or '#include'
 | ||
| this file; otherwise, you can paste it directly into your code.
 | ||
| 
 | ||
|    In C, you combine different flags (like 'FFTW_PRESERVE_INPUT' and
 | ||
| 'FFTW_MEASURE') using the ''|'' operator; in Fortran you should just use
 | ||
| ''+''.  (Take care not to add in the same flag more than once, though.
 | ||
| Alternatively, you can use the 'ior' intrinsic function standardized in
 | ||
| Fortran 95.)
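The caution about adding the same flag twice follows from the flags being distinct bits.  The C sketch below illustrates this with stand-in constants; the 'DEMO_' values are an assumption believed to mirror the definitions in 'fftw3.h' for FFTW 3.3.x, and real code should take the constants from 'fftw3.h' or 'fftw3.f' rather than copying them.

```c
#include <assert.h>

/* Stand-in flag values (assumed to mirror fftw3.h in FFTW 3.3.x).
   They are defined locally only so the example compiles on its own. */
#define DEMO_MEASURE        (0U)
#define DEMO_PRESERVE_INPUT (1U << 4)
#define DEMO_ESTIMATE       (1U << 6)

/* Bitwise OR: the C way of combining flags (ior in Fortran). */
unsigned combine_with_or(unsigned a, unsigned b)
{
    return a | b;
}

/* Plain addition: equivalent to OR only when no bit is set in both
   operands, i.e. when the same flag is not added twice. */
unsigned combine_with_plus(unsigned a, unsigned b)
{
    return a + b;
}
```

For distinct flags the two functions agree, but adding 'DEMO_ESTIMATE' to itself carries into the next bit and yields a different flag value, whereas OR leaves it unchanged.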
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: FFTW Execution in Fortran,  Next: Fortran Examples,  Prev: FFTW Constants in Fortran,  Up: Calling FFTW from Legacy Fortran
 | ||
| 
 | ||
| 8.3 FFTW Execution in Fortran
 | ||
| =============================
 | ||
| 
 | ||
| In C, in order to use a plan, one normally calls 'fftw_execute', which
 | ||
| executes the plan to perform the transform on the input/output arrays
 | ||
| passed when the plan was created (*note Using Plans::).  The
 | ||
| corresponding subroutine call in legacy Fortran is:
 | ||
|              call dfftw_execute(plan)
 | ||
| 
 | ||
|    However, we have had reports that this causes problems with some
 | ||
| recent optimizing Fortran compilers.  The problem is, because the
 | ||
| input/output arrays are not passed as explicit arguments to
 | ||
| 'dfftw_execute', the semantics of Fortran (unlike C) allow the compiler
 | ||
| to assume that the input/output arrays are not changed by
 | ||
| 'dfftw_execute'.  As a consequence, certain compilers end up optimizing
 | ||
| out or repositioning the call to 'dfftw_execute', assuming incorrectly
 | ||
| that it does nothing.
 | ||
| 
 | ||
|    There are various workarounds to this, but the safest and simplest
 | ||
| thing is to not use 'dfftw_execute' in Fortran.  Instead, use the
 | ||
| functions described in *note New-array Execute Functions::, which take
 | ||
| the input/output arrays as explicit arguments.  For example, if the plan
 | ||
| is for a complex-data DFT and was created for the arrays 'in' and 'out',
 | ||
| you would do:
 | ||
|              call dfftw_execute_dft(plan, in, out)
 | ||
| 
 | ||
|    There are a few things to be careful of, however:
 | ||
| 
 | ||
|    * You must use the correct type of execute function, matching the way
 | ||
|      the plan was created.  Complex DFT plans should use
 | ||
|      'dfftw_execute_dft', real-input (r2c) DFT plans should use
 | ||
|      'dfftw_execute_dft_r2c', and real-output (c2r) DFT plans should use
 | ||
|      'dfftw_execute_dft_c2r'.  The various r2r plans should use
 | ||
|      'dfftw_execute_r2r'.
 | ||
| 
 | ||
|    * You should normally pass the same input/output arrays that were
 | ||
|      used when creating the plan.  This is always safe.
 | ||
| 
 | ||
|    * _If_ you pass _different_ input/output arrays compared to those
 | ||
|      used when creating the plan, you must abide by all the restrictions
 | ||
|      of the new-array execute functions (*note New-array Execute
 | ||
|      Functions::).  The most difficult of these, in Fortran, is the
 | ||
|      requirement that the new arrays have the same alignment as the
 | ||
|      original arrays, because there seems to be no way in legacy Fortran
 | ||
|      to obtain guaranteed-aligned arrays (analogous to 'fftw_malloc' in
 | ||
|      C). You can, of course, use the 'FFTW_UNALIGNED' flag when creating
 | ||
|      the plan, in which case the plan does not depend on the alignment,
 | ||
|      but this may sacrifice substantial performance on architectures
 | ||
|      (like x86) with SIMD instructions (*note SIMD alignment and
 | ||
|      fftw_malloc::).
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Fortran Examples,  Next: Wisdom of Fortran?,  Prev: FFTW Execution in Fortran,  Up: Calling FFTW from Legacy Fortran
 | ||
| 
 | ||
| 8.4 Fortran Examples
 | ||
| ====================
 | ||
| 
 | ||
| In C, you might have something like the following to transform a
 | ||
| one-dimensional complex array:
 | ||
| 
 | ||
|              fftw_complex in[N], out[N];
 | ||
|              fftw_plan plan;
 | ||
| 
 | ||
|              plan = fftw_plan_dft_1d(N,in,out,FFTW_FORWARD,FFTW_ESTIMATE);
 | ||
|              fftw_execute(plan);
 | ||
|              fftw_destroy_plan(plan);
 | ||
| 
 | ||
|    In Fortran, you would use the following to accomplish the same thing:
 | ||
| 
 | ||
|              double complex in, out
 | ||
|              dimension in(N), out(N)
 | ||
|              integer*8 plan
 | ||
| 
 | ||
|              call dfftw_plan_dft_1d(plan,N,in,out,FFTW_FORWARD,FFTW_ESTIMATE)
 | ||
|              call dfftw_execute_dft(plan, in, out)
 | ||
|              call dfftw_destroy_plan(plan)
 | ||
| 
 | ||
|    Notice how all routines are called as Fortran subroutines, and the
 | ||
| plan is returned via the first argument to 'dfftw_plan_dft_1d'.  Notice
 | ||
| also that we changed 'fftw_execute' to 'dfftw_execute_dft' (*note FFTW
 | ||
| Execution in Fortran::).  To do the same thing, but using 8 threads in
 | ||
| parallel (*note Multi-threaded FFTW::), you would simply prefix these
 | ||
| calls with:
 | ||
| 
 | ||
|              integer iret
 | ||
|              call dfftw_init_threads(iret)
 | ||
|              call dfftw_plan_with_nthreads(8)
 | ||
| 
 | ||
|    (You might want to check the value of 'iret': if it is zero, it
 | ||
| indicates an unlikely error during thread initialization.)
 | ||
| 
 | ||
|    To check the number of threads currently being used by the planner,
 | ||
| you can do the following:
 | ||
| 
 | ||
|              integer iret
 | ||
|              call dfftw_planner_nthreads(iret)
 | ||
| 
 | ||
|    To transform a three-dimensional array in-place with C, you might do:
 | ||
| 
 | ||
|              fftw_complex arr[L][M][N];
 | ||
|              fftw_plan plan;
 | ||
| 
 | ||
|              plan = fftw_plan_dft_3d(L,M,N, arr,arr,
 | ||
|                                      FFTW_FORWARD, FFTW_ESTIMATE);
 | ||
|              fftw_execute(plan);
 | ||
|              fftw_destroy_plan(plan);
 | ||
| 
 | ||
|    In Fortran, you would use this instead:
 | ||
| 
 | ||
|              double complex arr
 | ||
|              dimension arr(L,M,N)
 | ||
|              integer*8 plan
 | ||
| 
 | ||
|              call dfftw_plan_dft_3d(plan, L,M,N, arr,arr,
 | ||
|             &                       FFTW_FORWARD, FFTW_ESTIMATE)
 | ||
|              call dfftw_execute_dft(plan, arr, arr)
 | ||
|              call dfftw_destroy_plan(plan)
 | ||
| 
 | ||
|    Note that we pass the array dimensions in the "natural" order in both
 | ||
| C and Fortran.
 | ||
| 
 | ||
|    To transform a one-dimensional real array in Fortran, you might do:
 | ||
| 
 | ||
|              double precision in
 | ||
|              dimension in(N)
 | ||
|              double complex out
 | ||
|              dimension out(N/2 + 1)
 | ||
|              integer*8 plan
 | ||
| 
 | ||
|              call dfftw_plan_dft_r2c_1d(plan,N,in,out,FFTW_ESTIMATE)
 | ||
|              call dfftw_execute_dft_r2c(plan, in, out)
 | ||
|              call dfftw_destroy_plan(plan)
 | ||
| 
 | ||
|    To transform a two-dimensional real array, out of place, you might
 | ||
| use the following:
 | ||
| 
 | ||
|              double precision in
 | ||
|              dimension in(M,N)
 | ||
|              double complex out
 | ||
|              dimension out(M/2 + 1, N)
 | ||
|              integer*8 plan
 | ||
| 
 | ||
|              call dfftw_plan_dft_r2c_2d(plan,M,N,in,out,FFTW_ESTIMATE)
 | ||
|              call dfftw_execute_dft_r2c(plan, in, out)
 | ||
|              call dfftw_destroy_plan(plan)
 | ||
| 
 | ||
|    *Important:* Notice that it is the _first_ dimension of the complex
 | ||
| output array that is cut in half in Fortran, rather than the last
 | ||
| dimension as in C. This is a consequence of the interface routines
 | ||
| reversing the order of the array dimensions passed to FFTW so that the
 | ||
| Fortran program can use its ordinary column-major order.
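As a small sanity check of this rule, the C sketch below counts the complex outputs of a two-dimensional r2c transform both ways.  The function names are hypothetical; the only assumption is the rule stated above, namely that Fortran halves the first dimension while C halves the last, with the dimensions passed in reverse order.

```c
#include <assert.h>
#include <stddef.h>

/* Number of complex outputs for a Fortran real array in(M,N): the FIRST
   dimension is halved, giving out(M/2 + 1, N). */
size_t r2c_out_count_fortran(size_t M, size_t N)
{
    return (M / 2 + 1) * N;
}

/* The same transform seen from C: the dimensions arrive reversed
   (n0 = N, n1 = M) and the LAST dimension is halved, giving
   out[n0][n1/2 + 1]. */
size_t r2c_out_count_c(size_t n0, size_t n1)
{
    return n0 * (n1 / 2 + 1);
}
```

For example, a Fortran array 'in(8,5)' needs 'out(5,5)', i.e. 25 complex elements, which matches the C view of a 5 x 8 real array producing a 5 x 5 complex array.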
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Wisdom of Fortran?,  Prev: Fortran Examples,  Up: Calling FFTW from Legacy Fortran
 | ||
| 
 | ||
| 8.5 Wisdom of Fortran?
 | ||
| ======================
 | ||
| 
 | ||
| In this section, we discuss how one can import/export FFTW wisdom (saved
 | ||
| plans) to/from a Fortran program; we assume that the reader is already
 | ||
| familiar with wisdom, as described in *note Words of Wisdom-Saving
 | ||
| Plans::.
 | ||
| 
 | ||
|    The basic problem is that it is difficult to (portably) pass files and
 | ||
| strings between Fortran and C, so we cannot provide a direct Fortran
 | ||
| equivalent to the 'fftw_export_wisdom_to_file', etcetera, functions.
 | ||
| Fortran interfaces _are_ provided for the functions that do not take
 | ||
| file/string arguments, however: 'dfftw_import_system_wisdom',
 | ||
| 'dfftw_import_wisdom', 'dfftw_export_wisdom', and 'dfftw_forget_wisdom'.
 | ||
| 
 | ||
|    So, for example, to import the system-wide wisdom, you would do:
 | ||
| 
 | ||
|              integer isuccess
 | ||
|              call dfftw_import_system_wisdom(isuccess)
 | ||
| 
 | ||
|    As usual, the C return value is turned into a first parameter;
 | ||
| 'isuccess' is non-zero on success and zero on failure (e.g.  if there is
 | ||
| no system wisdom installed).
 | ||
| 
 | ||
|    If you want to import/export wisdom from/to an arbitrary file or
 | ||
| elsewhere, you can employ the generic 'dfftw_import_wisdom' and
 | ||
| 'dfftw_export_wisdom' functions, for which you must supply a subroutine
 | ||
| to read/write one character at a time.  The FFTW package contains an
 | ||
| example file 'doc/f77_wisdom.f' demonstrating how to implement
 | ||
| 'import_wisdom_from_file' and 'export_wisdom_to_file' subroutines in
 | ||
| this way.  (These routines cannot be compiled into the FFTW library
 | ||
| itself, lest all FFTW-using programs be required to link with the
 | ||
| Fortran I/O library.)
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Upgrading from FFTW version 2,  Next: Installation and Customization,  Prev: Calling FFTW from Legacy Fortran,  Up: Top
 | ||
| 
 | ||
| 9 Upgrading from FFTW version 2
 | ||
| *******************************
 | ||
| 
 | ||
| In this chapter, we outline the process for updating codes designed for
 | ||
| the older FFTW 2 interface to work with FFTW 3.  The interface for FFTW
 | ||
| 3 is not backwards-compatible with the interface for FFTW 2 and earlier
 | ||
| versions; codes written to use those versions will fail to link with
 | ||
| FFTW 3.  Nor is it possible to write "compatibility wrappers" to bridge
 | ||
| the gap (at least not efficiently), because FFTW 3 has different
 | ||
| semantics from previous versions.  However, upgrading should be a
 | ||
| straightforward process because the data formats are identical and the
 | ||
| overall style of planning/execution is essentially the same.
 | ||
| 
 | ||
|    Unlike FFTW 2, there are no separate header files for real and
 | ||
| complex transforms (or even for different precisions) in FFTW 3; all
 | ||
| interfaces are defined in the '<fftw3.h>' header file.
 | ||
| 
 | ||
| Numeric Types
 | ||
| =============
 | ||
| 
 | ||
| The main difference in data types is that 'fftw_complex' in FFTW 2 was
 | ||
| defined as a 'struct' with macros 'c_re' and 'c_im' for accessing the
 | ||
| real/imaginary parts.  (This is binary-compatible with FFTW 3 on any
 | ||
| machine except perhaps for some older Crays in single precision.)  The
 | ||
| equivalent macros for FFTW 3 are:
 | ||
| 
 | ||
|      #define c_re(c) ((c)[0])
 | ||
|      #define c_im(c) ((c)[1])
 | ||
| 
 | ||
|    This does not work if you are using the C99 complex type, however,
 | ||
| unless you insert a 'double*' typecast into the above macros (*note
 | ||
| Complex numbers::).
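To make the two cases concrete, here is a minimal C sketch of both macro variants.  It is an illustration under stated assumptions: 'demo_complex' is a local stand-in for the default 'fftw_complex' (a 'double[2]'), and 'c99_re'/'c99_im' are hypothetical names for the cast-based variant, chosen only so that both can coexist in one file.

```c
#include <assert.h>
#include <complex.h>

/* Default FFTW 3 layout: a complex number is a double[2].
   demo_complex is a stand-in for fftw_complex. */
typedef double demo_complex[2];
#define c_re(c) ((c)[0])
#define c_im(c) ((c)[1])

/* Variant for C99 complex lvalues: take the address and view it as a
   double*, relying on C99 laying out a complex as {real, imaginary}. */
#define c99_re(c) (((double *) &(c))[0])
#define c99_im(c) (((double *) &(c))[1])

/* Uses the array variant for both reading and writing. */
double sum_parts_array(void)
{
    demo_complex z = {3.0, 4.0};
    c_im(z) = 5.0;                 /* the macros are assignable */
    return c_re(z) + c_im(z);
}

/* Uses the cast variant on a C99 double complex. */
double sum_parts_c99(void)
{
    double complex z = 3.0 + 4.0 * I;
    c99_im(z) = 5.0;
    return c99_re(z) + c99_im(z);
}
```

Both helper functions return 8.0, showing that either variant can read and assign the real and imaginary parts.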
 | ||
| 
 | ||
|    Also, FFTW 2 had an 'fftw_real' typedef that was an alias for
 | ||
| 'double' (in double precision).  In FFTW 3 you should just use 'double'
 | ||
| (or whatever precision you are employing).
 | ||
| 
 | ||
| Plans
 | ||
| =====
 | ||
| 
 | ||
| The major difference between FFTW 2 and FFTW 3 is in the
 | ||
| planning/execution division of labor.  In FFTW 2, plans were found for a
 | ||
| given transform size and type, and then could be applied to _any_ arrays
 | ||
| and for _any_ multiplicity/stride parameters.  In FFTW 3, you specify
 | ||
| the particular arrays, stride parameters, etcetera when creating the
 | ||
| plan, and the plan is then executed for _those_ arrays (unless the guru
 | ||
| interface is used) and _those_ parameters _only_.  (FFTW 2 had "specific
 | ||
| planner" routines that planned for a particular array and stride, but
 | ||
| the plan could still be used for other arrays and strides.)  That is,
 | ||
| much of the information that was formerly specified at execution time is
 | ||
| now specified at planning time.
 | ||
| 
 | ||
|    Like FFTW 2's specific planner routines, the FFTW 3 planner
 | ||
| overwrites the input/output arrays unless you use 'FFTW_ESTIMATE'.
 | ||
| 
 | ||
|    FFTW 2 had separate data types 'fftw_plan', 'fftwnd_plan',
 | ||
| 'rfftw_plan', and 'rfftwnd_plan' for complex and real one- and
 | ||
| multi-dimensional transforms, and each type had its own 'destroy'
 | ||
| function.  In FFTW 3, all plans are of type 'fftw_plan' and all are
 | ||
| destroyed by 'fftw_destroy_plan(plan)'.
 | ||
| 
 | ||
|    Where you formerly used 'fftw_create_plan' and 'fftw_one' to plan and
 | ||
| compute a single 1d transform, you would now use 'fftw_plan_dft_1d' to
 | ||
| plan the transform.  If you used the generic 'fftw' function to execute
 | ||
| the transform with multiplicity ('howmany') and stride parameters, you
 | ||
| would now use the advanced interface 'fftw_plan_many_dft' to specify
 | ||
| those parameters.  The plans are now executed with 'fftw_execute(plan)',
 | ||
| which takes all of its parameters (including the input/output arrays)
 | ||
| from the plan.
 | ||
| 
 | ||
|    In-place transforms no longer interpret their output argument as
 | ||
| scratch space, nor is there an 'FFTW_IN_PLACE' flag.  You simply pass
 | ||
| the same pointer for both the input and output arguments.  (Previously,
 | ||
| the output 'ostride' and 'odist' parameters were ignored for in-place
 | ||
| transforms; now, if they are specified via the advanced interface, they
 | ||
| are significant even in the in-place case, although they should normally
 | ||
| equal the corresponding input parameters.)
 | ||
| 
 | ||
|    The 'FFTW_ESTIMATE' and 'FFTW_MEASURE' flags have the same meaning as
 | ||
| before, although the planning time will differ.  You may also consider
 | ||
| using 'FFTW_PATIENT', which is like 'FFTW_MEASURE' except that it takes
 | ||
| more time in order to consider a wider variety of algorithms.
 | ||
| 
 | ||
|    For multi-dimensional complex DFTs, instead of 'fftwnd_create_plan'
 | ||
| (or 'fftw2d_create_plan' or 'fftw3d_create_plan'), followed by
 | ||
| 'fftwnd_one', you would use 'fftw_plan_dft' (or 'fftw_plan_dft_2d' or
 | ||
| 'fftw_plan_dft_3d'), followed by 'fftw_execute'.  If you used 'fftwnd'
 | ||
| to specify strides etcetera, you would instead specify these via
 | ||
| 'fftw_plan_many_dft'.
 | ||
| 
 | ||
|    The analogues to 'rfftw_create_plan' and 'rfftw_one' with
 | ||
| 'FFTW_REAL_TO_COMPLEX' or 'FFTW_COMPLEX_TO_REAL' directions are
 | ||
| 'fftw_plan_r2r_1d' with kind 'FFTW_R2HC' or 'FFTW_HC2R', followed by
 | ||
| 'fftw_execute'.  The stride etcetera arguments of 'rfftw' are now in
 | ||
| 'fftw_plan_many_r2r'.
 | ||
| 
 | ||
|    Instead of 'rfftwnd_create_plan' (or 'rfftw2d_create_plan' or
 | ||
| 'rfftw3d_create_plan') followed by 'rfftwnd_one_real_to_complex' or
 | ||
| 'rfftwnd_one_complex_to_real', you now use 'fftw_plan_dft_r2c' (or
 | ||
| 'fftw_plan_dft_r2c_2d' or 'fftw_plan_dft_r2c_3d') or 'fftw_plan_dft_c2r'
 | ||
| (or 'fftw_plan_dft_c2r_2d' or 'fftw_plan_dft_c2r_3d'), respectively,
 | ||
| followed by 'fftw_execute'.  As usual, the strides etcetera of
 | ||
| 'rfftwnd_real_to_complex' or 'rfftwnd_complex_to_real' are now specified
 | ||
| in the advanced planner routines, 'fftw_plan_many_dft_r2c' or
 | ||
| 'fftw_plan_many_dft_c2r'.
 | ||
| 
 | ||
| Wisdom
 | ||
| ======
 | ||
| 
 | ||
| In FFTW 2, you had to supply the 'FFTW_USE_WISDOM' flag in order to use
 | ||
| wisdom; in FFTW 3, wisdom is always used.  (You could simulate the FFTW
 | ||
| 2 wisdom-less behavior by calling 'fftw_forget_wisdom' after every
 | ||
| planner call.)
 | ||
| 
 | ||
|    The FFTW 3 wisdom import/export routines are almost the same as
 | ||
| before (although the storage format is entirely different).  There is
 | ||
| one significant difference, however.  In FFTW 2, the import routines
 | ||
| would never read past the end of the wisdom, so you could store extra
 | ||
| data beyond the wisdom in the same file, for example.  In FFTW 3, the
 | ||
| file-import routine may read up to a few hundred bytes past the end of
 | ||
| the wisdom, so you cannot store other data just beyond it.(1)
 | ||
| 
 | ||
|    Wisdom has been enhanced by additional humility in FFTW 3: whereas
 | ||
| FFTW 2 would re-use wisdom for a given transform size regardless of the
 | ||
| stride etc., in FFTW 3 wisdom is only used with the strides etc.  for
 | ||
| which it was created.  Unfortunately, this means FFTW 3 has to create
 | ||
| new plans from scratch more often than FFTW 2 (in FFTW 2, planning e.g.
 | ||
| one transform of size 1024 also created wisdom for all smaller powers of
 | ||
| 2, but this no longer occurs).
 | ||
| 
 | ||
|    FFTW 3 also has the new routine 'fftw_import_system_wisdom' to import
 | ||
| wisdom from a standard system-wide location.
 | ||
| 
 | ||
| Memory allocation
 | ||
| =================
 | ||
| 
 | ||
| In FFTW 3, we recommend allocating your arrays with 'fftw_malloc' and
 | ||
| deallocating them with 'fftw_free'; this is not required, but allows
 | ||
| optimal performance when SIMD acceleration is used.  (Those two
 | ||
| functions actually existed in FFTW 2, and worked the same way, but were
 | ||
| not documented.)
 | ||
| 
 | ||
|    In FFTW 2, there were 'fftw_malloc_hook' and 'fftw_free_hook'
 | ||
| functions that allowed the user to replace FFTW's memory-allocation
 | ||
| routines (e.g.  to implement different error-handling, since by default
 | ||
| FFTW prints an error message and calls 'exit' to abort the program if
 | ||
| 'malloc' returns 'NULL').  These hooks are not supported in FFTW 3;
 | ||
| those few users who require this functionality can just directly modify
 | ||
| the memory-allocation routines in FFTW (they are defined in
 | ||
| 'kernel/alloc.c').
 | ||
| 
 | ||
| Fortran interface
 | ||
| =================
 | ||
| 
 | ||
| In FFTW 2, the subroutine names were obtained by replacing 'fftw_' with
 | ||
| 'fftw_f77_'; in FFTW 3, you replace 'fftw_' with 'dfftw_' (or 'sfftw_' or
 | ||
| 'lfftw_', depending upon the precision).
 | ||
| 
 | ||
|    In FFTW 3, we have begun recommending that you always declare the
 | ||
| type used to store plans as 'integer*8'.  (Too many people didn't notice
 | ||
| our instruction to switch from 'integer' to 'integer*8' for 64-bit
 | ||
| machines.)
 | ||
| 
 | ||
|    In FFTW 3, we provide a 'fftw3.f' "header file" to include in your
 | ||
| code (and which is officially installed on Unix systems).  (In FFTW 2,
 | ||
| we supplied a 'fftw_f77.i' file, but it was not installed.)
 | ||
| 
 | ||
|    Otherwise, the C-Fortran interface relationship is much the same as
 | ||
| it was before (e.g.  return values become initial parameters, and
 | ||
| multi-dimensional arrays are in column-major order).  Unlike FFTW 2, we
 | ||
| do provide some support for wisdom import/export in Fortran (*note
 | ||
| Wisdom of Fortran?::).
 | ||
| 
 | ||
| Threads
 | ||
| =======
 | ||
| 
 | ||
| As in FFTW 2, only the execution routines are thread-safe.  All planner
 | ||
| routines, etcetera, should be called by only a single thread at a time
 | ||
| (*note Thread safety::).  _Unlike_ FFTW 2, there is no special
 | ||
| 'FFTW_THREADSAFE' flag for the planner to allow a given plan to be
 | ||
| usable by multiple threads in parallel; this is now the case by default.
 | ||
| 
 | ||
|    The multi-threaded version of FFTW 2 required you to pass the number
 | ||
| of threads each time you execute the transform.  The number of threads
 | ||
| is now stored in the plan, and is specified before the planner is called
 | ||
| by 'fftw_plan_with_nthreads'.  The threads initialization routine used
 | ||
| to be called 'fftw_threads_init' and would return zero on success; the
 | ||
| new routine is called 'fftw_init_threads' and returns zero on failure.
 | ||
| The current number of threads used by the planner can be checked with
 | ||
| 'fftw_planner_nthreads'.  *Note Multi-threaded FFTW::.
 | ||
| 
 | ||
|    There is no separate threads header file in FFTW 3; all the function
 | ||
| prototypes are in '<fftw3.h>'.  However, you still have to link to a
 | ||
| separate library ('-lfftw3_threads -lfftw3 -lm' on Unix), as well as to
 | ||
| the threading library (e.g.  POSIX threads on Unix).
 | ||
| 
 | ||
|    ---------- Footnotes ----------
 | ||
| 
 | ||
|    (1) We do our own buffering because GNU libc I/O routines are
 | ||
| horribly slow for single-character I/O, apparently for thread-safety
 | ||
| reasons (whether you are using threads or not).
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Installation and Customization,  Next: Acknowledgments,  Prev: Upgrading from FFTW version 2,  Up: Top
 | ||
| 
 | ||
| 10 Installation and Customization
 | ||
| *********************************
 | ||
| 
 | ||
| This chapter describes the installation and customization of FFTW, the
 | ||
| latest version of which may be downloaded from the FFTW home page
 | ||
| (http://www.fftw.org).
 | ||
| 
 | ||
|    In principle, FFTW should work on any system with an ANSI C compiler
 | ||
| ('gcc' is fine).  However, planner time is drastically reduced if FFTW
 | ||
| can exploit a hardware cycle counter; FFTW comes with cycle-counter
 | ||
| support for all modern general-purpose CPUs, but you may need to add a
 | ||
| couple of lines of code if your compiler is not yet supported (*note
 | ||
| Cycle Counters::).  (On Unix, there will be a warning at the end of the
 | ||
| 'configure' output if no cycle counter is found.)
 | ||
| 
 | ||
|    Installation of FFTW is simplest if you have a Unix or a GNU system,
 | ||
| such as GNU/Linux, and we describe this case in the first section below,
 | ||
| including the use of special configuration options to e.g.  install
 | ||
| different precisions or exploit optimizations for particular
 | ||
| architectures (e.g.  SIMD). Compilation on non-Unix systems is a more
 | ||
| manual process, but we outline the procedure in the second section.  It
 | ||
| is also likely that pre-compiled binaries will be available for popular
 | ||
| systems.
 | ||
| 
 | ||
|    Finally, we describe how you can customize FFTW for particular needs
 | ||
| by generating _codelets_ for fast transforms of sizes not supported
 | ||
| efficiently by the standard FFTW distribution.
 | ||
| 
 | ||
| * Menu:
 | ||
| 
 | ||
| * Installation on Unix::
 | ||
| * Installation on non-Unix systems::
 | ||
| * Cycle Counters::
 | ||
| * Generating your own code::
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Installation on Unix,  Next: Installation on non-Unix systems,  Prev: Installation and Customization,  Up: Installation and Customization
 | ||
| 
 | ||
| 10.1 Installation on Unix
 | ||
| =========================
 | ||
| 
 | ||
| FFTW comes with a 'configure' program in the GNU style.  Installation
 | ||
| can be as simple as:
 | ||
| 
 | ||
|      ./configure
 | ||
|      make
 | ||
|      make install
 | ||
| 
 | ||
|    This will build the uniprocessor complex and real transform libraries
 | ||
| along with the test programs.  (We recommend that you use GNU 'make' if
 | ||
| it is available; on some systems it is called 'gmake'.)  The "'make
 | ||
| install'" command installs the FFTW libraries in standard
 | ||
| places, and typically requires root privileges (unless you specify a
 | ||
| different install directory with the '--prefix' flag to 'configure').
 | ||
| You can also type "'make check'" to put the FFTW test programs through
 | ||
| their paces.  If you have problems during configuration or compilation,
 | ||
| you may want to run "'make distclean'" before trying again; this ensures
 | ||
| that you don't have any stale files left over from previous compilation
 | ||
| attempts.
 | ||
| 
 | ||
|    The 'configure' script chooses the 'gcc' compiler by default, if it
 | ||
| is available; you can select some other compiler with:
 | ||
|      ./configure CC="<the name of your C compiler>"
 | ||
| 
 | ||
|    The 'configure' script knows good 'CFLAGS' (C compiler flags) for a
 | ||
| few systems.  If your system is not known, the 'configure' script will
 | ||
| print out a warning.  In this case, you should re-configure FFTW with
 | ||
| the command
 | ||
|      ./configure CFLAGS="<write your CFLAGS here>"
 | ||
|    and then compile as usual.  If you do find an optimal set of 'CFLAGS'
 | ||
| for your system, please let us know what they are (along with the output
 | ||
| of 'config.guess') so that we can include them in future releases.
 | ||
| 
 | ||
|    'configure' supports all the standard flags defined by the GNU Coding
 | ||
| Standards; see the 'INSTALL' file in FFTW or the GNU web page
 | ||
| (http://www.gnu.org/prep/standards/html_node/index.html).  Note
 | ||
| especially '--help' to list all flags and '--enable-shared' to create
 | ||
| shared, rather than static, libraries.  'configure' also accepts a few
 | ||
| FFTW-specific flags, particularly:
 | ||
| 
 | ||
|    * '--enable-float': Produces a single-precision version of FFTW
 | ||
|      ('float') instead of the default double-precision ('double').
 | ||
|      *Note Precision::.
 | ||
| 
 | ||
|    * '--enable-long-double': Produces a long-double precision version of
 | ||
|      FFTW ('long double') instead of the default double-precision
 | ||
|      ('double').  The 'configure' script will halt with an error message
 | ||
|      if 'long double' is the same size as 'double' on your
 | ||
|      machine/compiler.  *Note Precision::.
 | ||
| 
 | ||
|    * '--enable-quad-precision': Produces a quadruple-precision version
 | ||
|      of FFTW using the nonstandard '__float128' type provided by 'gcc'
 | ||
|      4.6 or later on x86, x86-64, and Itanium architectures, instead of
 | ||
|      the default double-precision ('double').  The 'configure' script
 | ||
|      will halt with an error message if the compiler is not 'gcc'
 | ||
|      version 4.6 or later or if 'gcc''s 'libquadmath' library is not
 | ||
|      installed.  *Note Precision::.
 | ||
| 
 | ||
|    * '--enable-threads': Enables compilation and installation of the
 | ||
|      FFTW threads library (*note Multi-threaded FFTW::), which provides
 | ||
|      a simple interface to parallel transforms for SMP systems.  By
 | ||
|      default, the threads routines are not compiled.
 | ||
| 
 | ||
|    * '--enable-openmp': Like '--enable-threads', but using OpenMP
 | ||
|      compiler directives in order to induce parallelism rather than
 | ||
|      spawning its own threads directly, and installing an 'fftw3_omp'
 | ||
|      library rather than an 'fftw3_threads' library (*note
 | ||
|      Multi-threaded FFTW::).  You can use both '--enable-openmp' and
 | ||
|      '--enable-threads' since they compile/install libraries with
 | ||
|      different names.  By default, the OpenMP routines are not compiled.
 | ||
| 
 | ||
|    * '--with-combined-threads': By default, if '--enable-threads' is
 | ||
|      used, the threads support is compiled into a separate library that
 | ||
|      must be linked in addition to the main FFTW library.  This is so
 | ||
|      that users of the serial library do not need to link the system
 | ||
|      threads libraries.  If '--with-combined-threads' is specified,
 | ||
|      however, then no separate threads library is created, and threads
 | ||
|      are included in the main FFTW library.  This is mainly useful under
 | ||
|      Windows, where no system threads library is required and
 | ||
|      inter-library dependencies are problematic.
 | ||
| 
 | ||
|    * '--enable-mpi': Enables compilation and installation of the FFTW
 | ||
|      MPI library (*note Distributed-memory FFTW with MPI::), which
 | ||
|      provides parallel transforms for distributed-memory systems with
 | ||
|      MPI. (By default, the MPI routines are not compiled.)  *Note FFTW
 | ||
|      MPI Installation::.
 | ||
| 
 | ||
|    * '--disable-fortran': Disables inclusion of legacy-Fortran wrapper
 | ||
|      routines (*note Calling FFTW from Legacy Fortran::) in the standard
 | ||
|      FFTW libraries.  These wrapper routines increase the library size
 | ||
|      by only a negligible amount, so they are included by default as
 | ||
|      long as the 'configure' script finds a Fortran compiler on your
 | ||
|      system.  (To specify a particular Fortran compiler foo, pass
 | ||
|      'F77='foo to 'configure'.)
 | ||
| 
 | ||
|    * '--with-g77-wrappers': By default, when Fortran wrappers are
 | ||
|      included, the wrappers employ the linking conventions of the
 | ||
|      Fortran compiler detected by the 'configure' script.  If this
 | ||
|      compiler is GNU 'g77', however, then _two_ versions of the wrappers
 | ||
|      are included: one with 'g77''s idiosyncratic convention of
 | ||
|      appending two underscores to identifiers, and one with the more
 | ||
|      common convention of appending only a single underscore.  This way,
 | ||
|      the same FFTW library will work with both 'g77' and other Fortran
 | ||
|      compilers, such as GNU 'gfortran'.  However, the converse is not
 | ||
|      true: if you configure with a different compiler, then the
 | ||
|      'g77'-compatible wrappers are not included.  By specifying
 | ||
|      '--with-g77-wrappers', the 'g77'-compatible wrappers are included
 | ||
|      in addition to wrappers for whatever Fortran compiler 'configure'
 | ||
|      finds.
 | ||
| 
 | ||
|    * '--with-slow-timer': Disables the use of hardware cycle counters,
 | ||
|      and falls back on 'gettimeofday' or 'clock'.  This greatly slows
 | ||
|      planning, and should generally not be used (unless you don't
 | ||
|      have a cycle counter but still really want an optimized plan
 | ||
|      regardless of the time).  *Note Cycle Counters::.
 | ||
| 
 | ||
|    * '--enable-sse' (single precision), '--enable-sse2' (single,
 | ||
|      double), '--enable-avx' (single, double), '--enable-avx2' (single,
 | ||
|      double), '--enable-avx512' (single, double),
 | ||
|      '--enable-avx-128-fma', '--enable-kcvi' (single),
 | ||
|      '--enable-altivec' (single), '--enable-vsx' (single, double),
 | ||
|      '--enable-neon' (single, double on aarch64),
 | ||
|      '--enable-generic-simd128', and '--enable-generic-simd256':
 | ||
| 
 | ||
|      Enable various SIMD instruction sets.  You need a compiler that
 | ||
|      supports the given SIMD extensions, but FFTW will try to detect at
 | ||
|      runtime whether the CPU supports these extensions.  That is, you
 | ||
|      can compile with '--enable-avx' and the code will still run on a CPU
 | ||
|      without AVX support.
 | ||
| 
 | ||
|         - These options require a compiler supporting SIMD extensions,
 | ||
|           and compiler support is always a bit flaky: see the FFTW FAQ
 | ||
|           for a list of compiler versions that have problems compiling
 | ||
|           FFTW.
 | ||
|         - Because of the large variety of ARM processors and ABIs, FFTW
 | ||
|           does not attempt to guess the correct 'gcc' flags for
 | ||
|           generating NEON code.  In general, you will have to provide
 | ||
|           them on the command line.  This command line is known to have
 | ||
|           worked at least once:
 | ||
|                ./configure --with-slow-timer --host=arm-linux-gnueabi \
 | ||
|                  --enable-single --enable-neon \
 | ||
|                  "CC=arm-linux-gnueabi-gcc -march=armv7-a -mfloat-abi=softfp"
 | ||
| 
 | ||
|    To force 'configure' to use a particular C compiler foo (instead of
 | ||
| the default, usually 'gcc'), pass 'CC='foo to the 'configure' script;
 | ||
| you may also need to set the flags via the variable 'CFLAGS' as
 | ||
| described above.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Installation on non-Unix systems,  Next: Cycle Counters,  Prev: Installation on Unix,  Up: Installation and Customization
 | ||
| 
 | ||
| 10.2 Installation on non-Unix systems
 | ||
| =====================================
 | ||
| 
 | ||
| It should be relatively straightforward to compile FFTW even on non-Unix
 | ||
| systems lacking the niceties of a 'configure' script.  Basically, you
 | ||
| need to edit the 'config.h' header (copy it from 'config.h.in') to
 | ||
| '#define' the various options and compiler characteristics, and then
 | ||
| compile all the '.c' files in the relevant directories.
 | ||
| 
 | ||
|    The 'config.h' header contains about 100 options to set, each one
 | ||
| initially an '#undef', each documented with a comment, and most of them
 | ||
| fairly obvious.  For most of the options, you should simply '#define'
 | ||
| them to '1' if they are applicable, although a few options require a
 | ||
| particular value (e.g.  'SIZEOF_LONG_LONG' should be defined to the size
 | ||
| of the 'long long' type, in bytes, or zero if it is not supported).  We
 | ||
| will likely post some sample 'config.h' files for various operating
 | ||
| systems and compilers for you to use (at least as a starting point).
 | ||
| Please let us know if you have to hand-create a configuration file
 | ||
| (and/or a pre-compiled binary) that you want to share.
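To illustrate the two kinds of entries, here is a sketch in the style of a hand-edited 'config.h', followed by a small sanity check. The option names are the ones mentioned in this chapter; the values are assumptions for a typical 64-bit x86 toolchain, and the function 'long_long_size_is_consistent' is a hypothetical helper, not part of FFTW.

```c
#include <assert.h>
#include <stddef.h>

/* Excerpt in the style of a hand-edited config.h.  The values below
   are assumptions and must be checked against the real target. */
#define HAVE_SSE2 1          /* an applicable option: define it to 1 */
#define SIZEOF_LONG_LONG 8   /* an option that needs a particular value */

/* Returns 1 if SIZEOF_LONG_LONG matches what the compiler reports,
   0 otherwise. */
int long_long_size_is_consistent(void)
{
    return sizeof(long long) == (size_t) SIZEOF_LONG_LONG;
}
```

A check of this kind costs nothing and catches a wrong hand-entered size before it causes trouble in the build.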
 | ||
| 
 | ||
|    To create the FFTW library, you will then need to compile all of the
 | ||
| '.c' files in the 'kernel', 'dft', 'dft/scalar', 'dft/scalar/codelets',
 | ||
| 'rdft', 'rdft/scalar', 'rdft/scalar/r2cf', 'rdft/scalar/r2cb',
 | ||
| 'rdft/scalar/r2r', 'reodft', and 'api' directories.  If you are
 | ||
| compiling with SIMD support (e.g.  you defined 'HAVE_SSE2' in
 | ||
| 'config.h'), then you also need to compile the '.c' files in the
 | ||
| 'simd-support', '{dft,rdft}/simd', '{dft,rdft}/simd/*' directories.
 | ||
| 
 | ||
|    Once these files are all compiled, link them into a library, or a
 | ||
| shared library, or directly into your program.
 | ||
| 
 | ||
|    To compile the FFTW test program, additionally compile the code in
 | ||
| the 'libbench2/' directory, and link it into a library.  Then compile
 | ||
| the code in the 'tests/' directory and link it to the 'libbench2' and
 | ||
| FFTW libraries.  To compile the 'fftw-wisdom' (command-line) tool (*note
 | ||
| Wisdom Utilities::), compile 'tools/fftw-wisdom.c' and link it to the
 | ||
| 'libbench2' and FFTW libraries.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Cycle Counters,  Next: Generating your own code,  Prev: Installation on non-Unix systems,  Up: Installation and Customization
 | ||
| 
 | ||
| 10.3 Cycle Counters
 | ||
| ===================
 | ||
| 
 | ||
| FFTW's planner actually executes and times different possible FFT
 | ||
| algorithms in order to pick the fastest plan for a given n.  In order to
 | ||
| do this in as short a time as possible, however, the timer must have a
 | ||
| very high resolution, and to accomplish this we employ the hardware
 | ||
| "cycle counters" that are available on most CPUs.  Currently, FFTW
 | ||
| supports the cycle counters on x86, PowerPC/POWER, Alpha, UltraSPARC
 | ||
| (SPARC v9), IA64, PA-RISC, and MIPS processors.
 | ||
| 
 | ||
|    Access to the cycle counters, unfortunately, is a compiler and/or
 | ||
| operating-system dependent task, often requiring inline assembly
 | ||
| language, and it may be that your compiler is not supported.  If you are
 | ||
| _not_ supported, FFTW will by default fall back on its estimator
 | ||
| (effectively using 'FFTW_ESTIMATE' for all plans).
 | ||
| 
 | ||
|    You can add support by editing the file 'kernel/cycle.h'; normally,
 | ||
| this will involve adapting one of the examples already present in order
 | ||
| to use the inline-assembler syntax for your C compiler, and will only
 | ||
| require a couple of lines of code.  Anyone adding support for a new
 | ||
| system to 'cycle.h' is encouraged to email us at <fftw@fftw.org>.
 | ||
| 
 | ||
|    If a cycle counter is not available on your system (e.g.  some
 | ||
| embedded processor), and you don't want to use estimated plans, as a
 | ||
| last resort you can use the '--with-slow-timer' option to 'configure'
 | ||
| (on Unix) or '#define WITH_SLOW_TIMER' in 'config.h' (elsewhere).  This
 | ||
| will use the much lower-resolution 'gettimeofday' function, or even
 | ||
| 'clock' if the former is unavailable, and planning will be extremely
 | ||
| slow.
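To get a feeling for why the fallback timer makes planning slow, the following sketch measures the smallest step that successive 'gettimeofday' readings can resolve. It assumes a POSIX system, the function name is hypothetical, and it is not how FFTW itself performs its timing.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Spin until gettimeofday() reports a new value, 'samples' times, and
   return the smallest positive step seen, in microseconds.
   Returns -1 if no positive step was observed. */
long gettimeofday_step_usec(int samples)
{
    long best = -1;
    int i;

    for (i = 0; i < samples; ++i) {
        struct timeval t0, t1;
        long step;

        gettimeofday(&t0, NULL);
        do {
            gettimeofday(&t1, NULL);
        } while (t1.tv_sec == t0.tv_sec && t1.tv_usec == t0.tv_usec);

        step = (long) (t1.tv_sec - t0.tv_sec) * 1000000L
             + (long) (t1.tv_usec - t0.tv_usec);
        /* Ignore backward jumps caused by clock adjustments. */
        if (step > 0 && (best < 0 || step < best))
            best = step;
    }
    return best;
}
```

The result can never be smaller than one microsecond, which is on the order of a thousand cycles on a 1 GHz processor. A small transform therefore has to be repeated many times before such a timer can measure it reliably, whereas a cycle counter can time a single short run.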
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Generating your own code,  Prev: Cycle Counters,  Up: Installation and Customization
 | ||
| 
 | ||
| 10.4 Generating your own code
 | ||
| =============================
 | ||
| 
 | ||
| The directory 'genfft' contains the programs that were used to generate
 | ||
| FFTW's "codelets," which are hard-coded transforms of small sizes.  We
 | ||
| do not expect casual users to employ the generator, which is a rather
 | ||
| sophisticated program that generates directed acyclic graphs of FFT
 | ||
| algorithms and performs algebraic simplifications on them.  It was
 | ||
| written in Objective Caml, a dialect of ML, which is available at
 | ||
| <http://caml.inria.fr/ocaml/index.en.html>.
 | ||
| 
 | ||
|    If you have Objective Caml installed (along with recent versions of
 | ||
| GNU 'autoconf', 'automake', and 'libtool'), then you can change the set
 | ||
| of codelets that are generated or play with the generation options.  The
 | ||
| set of generated codelets is specified by the
 | ||
| '{dft,rdft}/{codelets,simd}/*/Makefile.am' files.  For example, you can
 | ||
| add efficient REDFT codelets of small sizes by modifying
 | ||
| 'rdft/codelets/r2r/Makefile.am'.  After you modify any 'Makefile.am'
 | ||
| files, you can type 'sh bootstrap.sh' in the top-level directory
 | ||
| followed by 'make' to re-generate the files.
 | ||
| 
 | ||
|    We do not provide more details about the code-generation process,
 | ||
| since we do not expect that most users will need to generate their own
 | ||
| code.  However, feel free to contact us at <fftw@fftw.org> if you are
 | ||
| interested in the subject.
 | ||
| 
 | ||
|    You might find it interesting to learn Caml and/or some modern
 | ||
| programming techniques that we used in the generator (including monadic
 | ||
| programming), especially if you heard the rumor that Java and
 | ||
| object-oriented programming are the latest advancement in the field.
 | ||
| The internal operation of the codelet generator is described in the
 | ||
| paper, "A Fast Fourier Transform Compiler," by M. Frigo, which is
 | ||
| available from the FFTW home page (http://www.fftw.org) and also
 | ||
| appeared in the 'Proceedings of the 1999 ACM SIGPLAN Conference on
 | ||
| Programming Language Design and Implementation (PLDI)'.
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: Acknowledgments,  Next: License and Copyright,  Prev: Installation and Customization,  Up: Top
 | ||
| 
 | ||
| 11 Acknowledgments
 | ||
| ******************
 | ||
| 
 | ||
| Matteo Frigo was supported in part by the Special Research Program SFB
 | ||
| F011 "AURORA" of the Austrian Science Fund FWF and by MIT Lincoln
 | ||
| Laboratory.  For previous versions of FFTW, he was supported in part by
 | ||
| the Defense Advanced Research Projects Agency (DARPA), under Grants
 | ||
| N00014-94-1-0985 and F30602-97-1-0270, and by a Digital Equipment
 | ||
| Corporation Fellowship.
 | ||
| 
 | ||
|    Steven G. Johnson was supported in part by a Dept. of Defense NDSEG
 | ||
| Fellowship, an MIT Karl Taylor Compton Fellowship, and by the Materials
 | ||
| Research Science and Engineering Center program of the National Science
 | ||
| Foundation under award DMR-9400334.
 | ||
| 
 | ||
|    Code for the Cell Broadband Engine was graciously donated to the FFTW
 | ||
| project by the IBM Austin Research Lab and included in fftw-3.2.  (This
 | ||
| code was removed in fftw-3.3.)
 | ||
| 
 | ||
|    Code for the MIPS paired-single SIMD support was graciously donated
 | ||
| to the FFTW project by CodeSourcery, Inc.
 | ||
| 
 | ||
|    We are grateful to Sun Microsystems Inc. for its donation of a
 | ||
| cluster of 9 8-processor Ultra HPC 5000 SMPs (24 Gflops peak).  These
 | ||
| machines served as the primary platform for the development of early
 | ||
| versions of FFTW.
 | ||
| 
 | ||
|    We thank Intel Corporation for donating a four-processor Pentium Pro
 | ||
| machine.  We thank the GNU/Linux community for giving us a decent OS to
 | ||
| run on that machine.
 | ||
| 
 | ||
|    We are thankful to the AMD corporation for donating an AMD Athlon XP
 | ||
| 1700+ computer to the FFTW project.
 | ||
| 
 | ||
|    We thank the Compaq/HP testdrive program and VA Software Corporation
 | ||
| (SourceForge.net) for providing remote access to machines that were used
 | ||
| to test FFTW.
 | ||
| 
 | ||
|    The 'genfft' suite of code generators was written using Objective
 | ||
| Caml, a dialect of ML. Objective Caml is a small and elegant language
 | ||
| developed by Xavier Leroy.  The implementation is available from
 | ||
| 'http://caml.inria.fr/' (http://caml.inria.fr/).  In previous releases
 | ||
| of FFTW, 'genfft' was written in Caml Light, by the same authors.  An
 | ||
| even earlier implementation of 'genfft' was written in Scheme, but Caml
 | ||
| is definitely better for this kind of application.
 | ||
| 
 | ||
|    FFTW uses many tools from the GNU project, including 'automake',
 | ||
| 'texinfo', and 'libtool'.
 | ||
| 
 | ||
|    Prof. Charles E. Leiserson of MIT provided continuous support and
 | ||
| encouragement.  This program would not exist without him.  Charles also
 | ||
| proposed the name "codelets" for the basic FFT blocks.
 | ||
| 
 | ||
|    Prof. John D. Joannopoulos of MIT demonstrated continuing tolerance
 | ||
| of Steven's "extra-curricular" computer-science activities, as well as
 | ||
| remarkable creativity in working them into his grant proposals.
 | ||
| Steven's physics degree would not exist without him.
 | ||
| 
 | ||
|    Franz Franchetti wrote SIMD extensions to FFTW 2, which eventually
 | ||
| led to the SIMD support in FFTW 3.
 | ||
| 
 | ||
|    Stefan Kral wrote most of the K7 code generator distributed with FFTW
 | ||
| 3.0.x and 3.1.x.
 | ||
| 
 | ||
|    Andrew Sterian contributed the Windows timing code in FFTW 2.
 | ||
| 
 | ||
|    Didier Miras reported a bug in the test procedure used in FFTW 1.2.
 | ||
| We now use a completely different test algorithm by Funda Ergun that
 | ||
| does not require a separate FFT program to compare against.
 | ||
| 
 | ||
|    Wolfgang Reimer contributed the Pentium cycle counter and a few fixes
 | ||
| that help portability.
 | ||
| 
 | ||
|    Ming-Chang Liu uncovered a well-hidden bug in the complex transforms
 | ||
| of FFTW 2.0 and supplied a patch to correct it.
 | ||
| 
 | ||
|    The FFTW FAQ was written in 'bfnn' (Bizarre Format With No Name) and
 | ||
| formatted using the tools developed by Ian Jackson for the Linux FAQ.
 | ||
| 
 | ||
|    _We are especially thankful to all of our users for their continuing
 | ||
| support, feedback, and interest during our development of FFTW._
 | ||
| 
 | ||
| 
 | ||
| File: fftw3.info,  Node: License and Copyright,  Next: Concept Index,  Prev: Acknowledgments,  Up: Top
 | ||
| 
 | ||
| 12 License and Copyright
 | ||
| ************************
 | ||
| 
 | ||
| FFTW is Copyright (C) 2003, 2007-11 Matteo Frigo, Copyright (C) 2003,
 | ||
| 2007-11 Massachusetts Institute of Technology.
 | ||
| 
 | ||
|    FFTW is free software; you can redistribute it and/or modify it under
 | ||
| the terms of the GNU General Public License as published by the Free
 | ||
| Software Foundation; either version 2 of the License, or (at your
 | ||
| option) any later version.
 | ||
| 
 | ||
|    This program is distributed in the hope that it will be useful, but
 | ||
| WITHOUT ANY WARRANTY; without even the implied warranty of
 | ||
| MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
 | ||
| Public License for more details.
 | ||
| 
 | ||
|    You should have received a copy of the GNU General Public License
 | ||
| along with this program; if not, write to the Free Software Foundation,
 | ||
| Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.  You can
 | ||
| also find the GPL on the GNU web site
 | ||
| (http://www.gnu.org/licenses/gpl-2.0.html).
 | ||
| 
 | ||
|    In addition, we kindly ask you to acknowledge FFTW and its authors in
 | ||
| any program or publication in which you use FFTW. (You are not
 | ||
| _required_ to do so; it is up to your common sense to decide whether you
 | ||
| want to comply with this request or not.)  For general publications, we
 | ||
| suggest referencing: Matteo Frigo and Steven G. Johnson, "The design and
 | ||
| implementation of FFTW3," Proc.  IEEE 93 (2), 216-231 (2005).
 | ||
| 
 | ||
|    Non-free versions of FFTW are available under terms different from
 | ||
| those of the General Public License.  (For example, they do not require you to
 | ||
| accompany any object code using FFTW with the corresponding source
 | ||
| code.)  For these alternative terms you must purchase a license from
 | ||
| MIT's Technology Licensing Office.  Users interested in such a license
 | ||
| should contact us (<fftw@fftw.org>) for more information.
 | ||
| 
 | 
