6305 lines
		
	
	
		
			293 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
		
		
			
		
	
	
			6305 lines
		
	
	
		
			293 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
| 
								 | 
							
								This is fftw3.info, produced by makeinfo version 6.7 from fftw3.texi.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								This manual is for FFTW (version 3.3.10, 10 December 2020).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Copyright (C) 2003 Matteo Frigo.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Copyright (C) 2003 Massachusetts Institute of Technology.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     Permission is granted to make and distribute verbatim copies of
							 | 
						|||
| 
								 | 
							
								     this manual provided the copyright notice and this permission
							 | 
						|||
| 
								 | 
							
								     notice are preserved on all copies.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     Permission is granted to copy and distribute modified versions of
							 | 
						|||
| 
								 | 
							
								     this manual under the conditions for verbatim copying, provided
							 | 
						|||
| 
								 | 
							
								     that the entire resulting derived work is distributed under the
							 | 
						|||
| 
								 | 
							
								     terms of a permission notice identical to this one.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     Permission is granted to copy and distribute translations of this
							 | 
						|||
| 
								 | 
							
								     manual into another language, under the above conditions for
							 | 
						|||
| 
								 | 
							
								     modified versions, except that this permission notice may be stated
							 | 
						|||
| 
								 | 
							
								     in a translation approved by the Free Software Foundation.
							 | 
						|||
| 
								 | 
							
								INFO-DIR-SECTION Development
							 | 
						|||
| 
								 | 
							
								START-INFO-DIR-ENTRY
							 | 
						|||
| 
								 | 
							
								* fftw3: (fftw3).	FFTW User's Manual.
							 | 
						|||
| 
								 | 
							
								END-INFO-DIR-ENTRY
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Top,  Next: Introduction,  Prev: (dir),  Up: (dir)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								FFTW User Manual
							 | 
						|||
| 
								 | 
							
								****************
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								Welcome to FFTW, the Fastest Fourier Transform in the West.  FFTW is a
							 | 
						|||
| 
								 | 
							
								collection of fast C routines to compute the discrete Fourier transform.
							 | 
						|||
| 
								 | 
							
								This manual documents FFTW version 3.3.10.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								* Menu:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								* Introduction::
							 | 
						|||
| 
								 | 
							
								* Tutorial::
							 | 
						|||
| 
								 | 
							
								* Other Important Topics::
							 | 
						|||
| 
								 | 
							
								* FFTW Reference::
							 | 
						|||
| 
								 | 
							
								* Multi-threaded FFTW::
							 | 
						|||
| 
								 | 
							
								* Distributed-memory FFTW with MPI::
							 | 
						|||
| 
								 | 
							
								* Calling FFTW from Modern Fortran::
							 | 
						|||
| 
								 | 
							
								* Calling FFTW from Legacy Fortran::
							 | 
						|||
| 
								 | 
							
								* Upgrading from FFTW version 2::
							 | 
						|||
| 
								 | 
							
								* Installation and Customization::
							 | 
						|||
| 
								 | 
							
								* Acknowledgments::
							 | 
						|||
| 
								 | 
							
								* License and Copyright::
							 | 
						|||
| 
								 | 
							
								* Concept Index::
							 | 
						|||
| 
								 | 
							
								* Library Index::
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Introduction,  Next: Tutorial,  Prev: Top,  Up: Top
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								1 Introduction
							 | 
						|||
| 
								 | 
							
								**************
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								This manual documents version 3.3.10 of FFTW, the _Fastest Fourier
							 | 
						|||
| 
								 | 
							
								Transform in the West_.  FFTW is a comprehensive collection of fast C
							 | 
						|||
| 
								 | 
							
								routines for computing the discrete Fourier transform (DFT) and various
							 | 
						|||
| 
								 | 
							
								special cases thereof.
							 | 
						|||
| 
								 | 
							
								   * FFTW computes the DFT of complex data, real data, even- or
							 | 
						|||
| 
								 | 
							
								     odd-symmetric real data (these symmetric transforms are usually
							 | 
						|||
| 
								 | 
							
								     known as the discrete cosine or sine transform, respectively), and
							 | 
						|||
| 
								 | 
							
								     the discrete Hartley transform (DHT) of real data.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * The input data can have arbitrary length.  FFTW employs O(n log n)
							 | 
						|||
| 
								 | 
							
								     algorithms for all lengths, including prime numbers.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * FFTW supports arbitrary multi-dimensional data.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * FFTW supports the SSE, SSE2, AVX, AVX2, AVX512, KCVI, Altivec, VSX,
							 | 
						|||
| 
								 | 
							
								     and NEON vector instruction sets.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * FFTW includes parallel (multi-threaded) transforms for
							 | 
						|||
| 
								 | 
							
								     shared-memory systems.
							 | 
						|||
| 
								 | 
							
								   * Starting with version 3.3, FFTW includes distributed-memory
							 | 
						|||
| 
								 | 
							
								     parallel transforms using MPI.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   We assume herein that you are familiar with the properties and uses
							 | 
						|||
| 
								 | 
							
								of the DFT that are relevant to your application.  Otherwise, see e.g.
							 | 
						|||
| 
								 | 
							
								'The Fast Fourier Transform and Its Applications' by E. O. Brigham
							 | 
						|||
| 
								 | 
							
								(Prentice-Hall, Englewood Cliffs, NJ, 1988).  Our web page
							 | 
						|||
| 
								 | 
							
								(http://www.fftw.org) also has links to FFT-related information online.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   In order to use FFTW effectively, you need to learn one basic concept
							 | 
						|||
| 
								 | 
							
								of FFTW's internal structure: FFTW does not use a fixed algorithm for
							 | 
						|||
| 
								 | 
							
								computing the transform, but instead it adapts the DFT algorithm to
							 | 
						|||
| 
								 | 
							
								details of the underlying hardware in order to maximize performance.
							 | 
						|||
| 
								 | 
							
								Hence, the computation of the transform is split into two phases.
							 | 
						|||
| 
								 | 
							
								First, FFTW's "planner" "learns" the fastest way to compute the
							 | 
						|||
| 
								 | 
							
								transform on your machine.  The planner produces a data structure called
							 | 
						|||
| 
								 | 
							
								a "plan" that contains this information.  Subsequently, the plan is
							 | 
						|||
| 
								 | 
							
								"executed" to transform the array of input data as dictated by the plan.
							 | 
						|||
| 
								 | 
							
								The plan can be reused as many times as needed.  In typical
							 | 
						|||
| 
								 | 
							
								high-performance applications, many transforms of the same size are
							 | 
						|||
| 
								 | 
							
								computed and, consequently, a relatively expensive initialization of
							 | 
						|||
| 
								 | 
							
								this sort is acceptable.  On the other hand, if you need a single
							 | 
						|||
| 
								 | 
							
								transform of a given size, the one-time cost of the planner becomes
							 | 
						|||
| 
								 | 
							
								significant.  For this case, FFTW provides fast planners based on
							 | 
						|||
| 
								 | 
							
								heuristics or on previously computed plans.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   FFTW supports transforms of data with arbitrary length, rank,
							 | 
						|||
| 
								 | 
							
								multiplicity, and a general memory layout.  In simple cases, however,
							 | 
						|||
| 
								 | 
							
								this generality may be unnecessary and confusing.  Consequently, we
							 | 
						|||
| 
								 | 
							
								organized the interface to FFTW into three levels of increasing
							 | 
						|||
| 
								 | 
							
								generality.
							 | 
						|||
| 
								 | 
							
								   * The "basic interface" computes a single transform of contiguous
							 | 
						|||
| 
								 | 
							
								     data.
							 | 
						|||
| 
								 | 
							
								   * The "advanced interface" computes transforms of multiple or strided
							 | 
						|||
| 
								 | 
							
								     arrays.
							 | 
						|||
| 
								 | 
							
								   * The "guru interface" supports the most general data layouts,
							 | 
						|||
| 
								 | 
							
								     multiplicities, and strides.
							 | 
						|||
| 
								 | 
							
								   We expect that most users will be best served by the basic interface,
							 | 
						|||
| 
								 | 
							
								whereas the guru interface requires careful attention to the
							 | 
						|||
| 
								 | 
							
								documentation to avoid problems.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Besides the automatic performance adaptation performed by the
							 | 
						|||
| 
								 | 
							
								planner, it is also possible for advanced users to customize FFTW
							 | 
						|||
| 
								 | 
							
								manually.  For example, if code space is a concern, we provide a tool
							 | 
						|||
| 
								 | 
							
								that links only the subset of FFTW needed by your application.
							 | 
						|||
| 
								 | 
							
								Conversely, you may need to extend FFTW because the standard
							 | 
						|||
| 
								 | 
							
								distribution is not sufficient for your needs.  For example, the
							 | 
						|||
| 
								 | 
							
								standard FFTW distribution works most efficiently for arrays whose size
							 | 
						|||
| 
								 | 
							
								can be factored into small primes (2, 3, 5, and 7), and otherwise it
							 | 
						|||
| 
								 | 
							
								uses a slower general-purpose routine.  If you need efficient transforms
							 | 
						|||
| 
								 | 
							
								of other sizes, you can use FFTW's code generator, which produces fast C
							 | 
						|||
| 
								 | 
							
								programs ("codelets") for any particular array size you may care about.
							 | 
						|||
| 
								 | 
							
								For example, if you need transforms of size 513 = 19 x 3^3, you can
							 | 
						|||
| 
								 | 
							
								customize FFTW to support the factor 19 efficiently.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   For more information regarding FFTW, see the paper, "The Design and
							 | 
						|||
| 
								 | 
							
								Implementation of FFTW3," by M. Frigo and S. G. Johnson, which was an
							 | 
						|||
| 
								 | 
							
								invited paper in 'Proc. IEEE' 93 (2), p.  216 (2005).  The code
							 | 
						|||
| 
								 | 
							
								generator is described in the paper "A fast Fourier transform compiler",
							 | 
						|||
| 
								 | 
							
								by M. Frigo, in the 'Proceedings of the 1999 ACM SIGPLAN Conference on
							 | 
						|||
| 
								 | 
							
								Programming Language Design and Implementation (PLDI), Atlanta, Georgia,
							 | 
						|||
| 
								 | 
							
								May 1999'.  These papers, along with the latest version of FFTW, the
							 | 
						|||
| 
								 | 
							
								FAQ, benchmarks, and other links, are available at the FFTW home page
							 | 
						|||
| 
								 | 
							
								(http://www.fftw.org).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The current version of FFTW incorporates many good ideas from the
							 | 
						|||
| 
								 | 
							
								past thirty years of FFT literature.  In one way or another, FFTW uses
							 | 
						|||
| 
								 | 
							
								the Cooley-Tukey algorithm, the prime factor algorithm, Rader's
							 | 
						|||
| 
								 | 
							
								algorithm for prime sizes, and a split-radix algorithm (with a
							 | 
						|||
| 
								 | 
							
								"conjugate-pair" variation pointed out to us by Dan Bernstein).  FFTW's
							 | 
						|||
| 
								 | 
							
								code generator also produces new algorithms that we do not completely
							 | 
						|||
| 
								 | 
							
								understand.  The reader is referred to the cited papers for the
							 | 
						|||
| 
								 | 
							
								appropriate references.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The rest of this manual is organized as follows.  We first discuss
							 | 
						|||
| 
								 | 
							
								the sequential (single-processor) implementation.  We start by
							 | 
						|||
| 
								 | 
							
								describing the basic interface/features of FFTW in *note Tutorial::.
							 | 
						|||
| 
								 | 
							
								Next, *note Other Important Topics:: discusses data alignment (*note
							 | 
						|||
| 
								 | 
							
								SIMD alignment and fftw_malloc::), the storage scheme of
							 | 
						|||
| 
								 | 
							
								multi-dimensional arrays (*note Multi-dimensional Array Format::), and
							 | 
						|||
| 
								 | 
							
								FFTW's mechanism for storing plans on disk (*note Words of Wisdom-Saving
							 | 
						|||
| 
								 | 
							
								Plans::).  Next, *note FFTW Reference:: provides comprehensive
							 | 
						|||
| 
								 | 
							
								documentation of all FFTW's features.  Parallel transforms are discussed
							 | 
						|||
| 
								 | 
							
								in their own chapters: *note Multi-threaded FFTW:: and *note
							 | 
						|||
| 
								 | 
							
								Distributed-memory FFTW with MPI::.  Fortran programmers can also use
							 | 
						|||
| 
								 | 
							
								FFTW, as described in *note Calling FFTW from Legacy Fortran:: and *note
							 | 
						|||
| 
								 | 
							
								Calling FFTW from Modern Fortran::.  *note Installation and
							 | 
						|||
| 
								 | 
							
								Customization:: explains how to install FFTW in your computer system and
							 | 
						|||
| 
								 | 
							
								how to adapt FFTW to your needs.  License and copyright information is
							 | 
						|||
| 
								 | 
							
								given in *note License and Copyright::.  Finally, we thank all the
							 | 
						|||
| 
								 | 
							
								people who helped us in *note Acknowledgments::.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Tutorial,  Next: Other Important Topics,  Prev: Introduction,  Up: Top
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								2 Tutorial
							 | 
						|||
| 
								 | 
							
								**********
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								* Menu:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								* Complex One-Dimensional DFTs::
							 | 
						|||
| 
								 | 
							
								* Complex Multi-Dimensional DFTs::
							 | 
						|||
| 
								 | 
							
								* One-Dimensional DFTs of Real Data::
							 | 
						|||
| 
								 | 
							
								* Multi-Dimensional DFTs of Real Data::
							 | 
						|||
| 
								 | 
							
								* More DFTs of Real Data::
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								This chapter describes the basic usage of FFTW, i.e., how to compute the
							 | 
						|||
| 
								 | 
							
								Fourier transform of a single array.  This chapter tells the truth, but
							 | 
						|||
| 
								 | 
							
								not the _whole_ truth.  Specifically, FFTW implements additional
							 | 
						|||
| 
								 | 
							
								routines and flags that are not documented here, although in many cases
							 | 
						|||
| 
								 | 
							
								we try to indicate where added capabilities exist.  For more complete
							 | 
						|||
| 
								 | 
							
								information, see *note FFTW Reference::.  (Note that you need to compile
							 | 
						|||
| 
								 | 
							
								and install FFTW before you can use it in a program.  For the details of
							 | 
						|||
| 
								 | 
							
								the installation, see *note Installation and Customization::.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   We recommend that you read this tutorial in order.(1)  At the least,
							 | 
						|||
| 
								 | 
							
								read the first section (*note Complex One-Dimensional DFTs::) before
							 | 
						|||
| 
								 | 
							
								reading any of the others, even if your main interest lies in one of the
							 | 
						|||
| 
								 | 
							
								other transform types.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Users of FFTW version 2 and earlier may also want to read *note
							 | 
						|||
| 
								 | 
							
								Upgrading from FFTW version 2::.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   ---------- Footnotes ----------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   (1) You can read the tutorial in bit-reversed order after computing
							 | 
						|||
| 
								 | 
							
								your first transform.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Complex One-Dimensional DFTs,  Next: Complex Multi-Dimensional DFTs,  Prev: Tutorial,  Up: Tutorial
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								2.1 Complex One-Dimensional DFTs
							 | 
						|||
| 
								 | 
							
								================================
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     Plan: To bother about the best method of accomplishing an
							 | 
						|||
| 
								 | 
							
								     accidental result.  [Ambrose Bierce, 'The Enlarged Devil's
							 | 
						|||
| 
								 | 
							
								     Dictionary'.]
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The basic usage of FFTW to compute a one-dimensional DFT of size 'N'
							 | 
						|||
| 
								 | 
							
								is simple, and it typically looks something like this code:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     #include <fftw3.h>
							 | 
						|||
| 
								 | 
							
								     ...
							 | 
						|||
| 
								 | 
							
								     {
							 | 
						|||
| 
								 | 
							
								         fftw_complex *in, *out;
							 | 
						|||
| 
								 | 
							
								         fftw_plan p;
							 | 
						|||
| 
								 | 
							
								         ...
							 | 
						|||
| 
								 | 
							
								         in = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
							 | 
						|||
| 
								 | 
							
								         out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
							 | 
						|||
| 
								 | 
							
								         p = fftw_plan_dft_1d(N, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
							 | 
						|||
| 
								 | 
							
								         ...
							 | 
						|||
| 
								 | 
							
								         fftw_execute(p); /* repeat as needed */
							 | 
						|||
| 
								 | 
							
								         ...
							 | 
						|||
| 
								 | 
							
								         fftw_destroy_plan(p);
							 | 
						|||
| 
								 | 
							
								         fftw_free(in); fftw_free(out);
							 | 
						|||
| 
								 | 
							
								     }
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   You must link this code with the 'fftw3' library.  On Unix systems,
							 | 
						|||
| 
								 | 
							
								link with '-lfftw3 -lm'.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The example code first allocates the input and output arrays.  You
							 | 
						|||
| 
								 | 
							
								can allocate them in any way that you like, but we recommend using
							 | 
						|||
| 
								 | 
							
								'fftw_malloc', which behaves like 'malloc' except that it properly
							 | 
						|||
| 
								 | 
							
								aligns the array when SIMD instructions (such as SSE and Altivec) are
							 | 
						|||
| 
								 | 
							
								available (*note SIMD alignment and fftw_malloc::).  [Alternatively, we
							 | 
						|||
| 
								 | 
							
								provide a convenient wrapper function 'fftw_alloc_complex(N)' which has
							 | 
						|||
| 
								 | 
							
								the same effect.]
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The data is an array of type 'fftw_complex', which is by default a
							 | 
						|||
| 
								 | 
							
								'double[2]' composed of the real ('in[i][0]') and imaginary ('in[i][1]')
							 | 
						|||
| 
								 | 
							
								parts of a complex number.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The next step is to create a "plan", which is an object that contains
							 | 
						|||
| 
								 | 
							
								all the data that FFTW needs to compute the FFT. This function creates
							 | 
						|||
| 
								 | 
							
								the plan:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_plan_dft_1d(int n, fftw_complex *in, fftw_complex *out,
							 | 
						|||
| 
								 | 
							
								                                int sign, unsigned flags);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The first argument, 'n', is the size of the transform you are trying
							 | 
						|||
| 
								 | 
							
								to compute.  The size 'n' can be any positive integer, but sizes that
							 | 
						|||
| 
								 | 
							
								are products of small factors are transformed most efficiently (although
							 | 
						|||
| 
								 | 
							
								prime sizes still use an O(n log n) algorithm).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The next two arguments are pointers to the input and output arrays of
							 | 
						|||
| 
								 | 
							
								the transform.  These pointers can be equal, indicating an "in-place"
							 | 
						|||
| 
								 | 
							
								transform.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The fourth argument, 'sign', can be either 'FFTW_FORWARD' ('-1') or
							 | 
						|||
| 
								 | 
							
								'FFTW_BACKWARD' ('+1'), and indicates the direction of the transform you
							 | 
						|||
| 
								 | 
							
								are interested in; technically, it is the sign of the exponent in the
							 | 
						|||
| 
								 | 
							
								transform.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The 'flags' argument is usually either 'FFTW_MEASURE' or
							 | 
						|||
| 
								 | 
							
								'FFTW_ESTIMATE'.  'FFTW_MEASURE' instructs FFTW to run and measure the
							 | 
						|||
| 
								 | 
							
								execution time of several FFTs in order to find the best way to compute
							 | 
						|||
| 
								 | 
							
								the transform of size 'n'.  This process takes some time (usually a few
							 | 
						|||
| 
								 | 
							
								seconds), depending on your machine and on the size of the transform.
							 | 
						|||
| 
								 | 
							
								'FFTW_ESTIMATE', on the contrary, does not run any computation and just
							 | 
						|||
| 
								 | 
							
								builds a reasonable plan that is probably sub-optimal.  In short, if
							 | 
						|||
| 
								 | 
							
								your program performs many transforms of the same size and
							 | 
						|||
| 
								 | 
							
								initialization time is not important, use 'FFTW_MEASURE'; otherwise use
							 | 
						|||
| 
								 | 
							
								the estimate.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   _You must create the plan before initializing the input_, because
							 | 
						|||
| 
								 | 
							
								'FFTW_MEASURE' overwrites the 'in'/'out' arrays.  (Technically,
							 | 
						|||
| 
								 | 
							
								'FFTW_ESTIMATE' does not touch your arrays, but you should always create
							 | 
						|||
| 
								 | 
							
								plans first just to be sure.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Once the plan has been created, you can use it as many times as you
							 | 
						|||
| 
								 | 
							
								like for transforms on the specified 'in'/'out' arrays, computing the
							 | 
						|||
| 
								 | 
							
								actual transforms via 'fftw_execute(plan)':
							 | 
						|||
| 
								 | 
							
								     void fftw_execute(const fftw_plan plan);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The DFT results are stored in-order in the array 'out', with the
							 | 
						|||
| 
								 | 
							
								zero-frequency (DC) component in 'out[0]'.  If 'in != out', the
							 | 
						|||
| 
								 | 
							
								transform is "out-of-place" and the input array 'in' is not modified.
							 | 
						|||
| 
								 | 
							
								Otherwise, the input array is overwritten with the transform.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   If you want to transform a _different_ array of the same size, you
							 | 
						|||
| 
								 | 
							
								can create a new plan with 'fftw_plan_dft_1d' and FFTW automatically
							 | 
						|||
| 
								 | 
							
								reuses the information from the previous plan, if possible.
							 | 
						|||
| 
								 | 
							
								Alternatively, with the "guru" interface you can apply a given plan to a
							 | 
						|||
| 
								 | 
							
								different array, if you are careful.  *Note FFTW Reference::.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   When you are done with the plan, you deallocate it by calling
							 | 
						|||
| 
								 | 
							
								'fftw_destroy_plan(plan)':
							 | 
						|||
| 
								 | 
							
								     void fftw_destroy_plan(fftw_plan plan);
							 | 
						|||
| 
								 | 
							
								   If you allocate an array with 'fftw_malloc()' you must deallocate it
							 | 
						|||
| 
								 | 
							
								with 'fftw_free()'.  Do not use 'free()' or, heaven forbid, 'delete'.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   FFTW computes an _unnormalized_ DFT. Thus, computing a forward
							 | 
						|||
| 
								 | 
							
								followed by a backward transform (or vice versa) results in the original
							 | 
						|||
| 
								 | 
							
								array scaled by 'n'.  For the definition of the DFT, see *note What FFTW
							 | 
						|||
| 
								 | 
							
								Really Computes::.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   If you have a C compiler, such as 'gcc', that supports the C99
							 | 
						|||
| 
								 | 
							
								standard, and you '#include <complex.h>' _before_ '<fftw3.h>', then
							 | 
						|||
| 
								 | 
							
								'fftw_complex' is the native double-precision complex type and you can
							 | 
						|||
| 
								 | 
							
								manipulate it with ordinary arithmetic.  Otherwise, FFTW defines its own
							 | 
						|||
| 
								 | 
							
								complex type, which is bit-compatible with the C99 complex type.  *Note
							 | 
						|||
| 
								 | 
							
								Complex numbers::.  (The C++ '<complex>' template class may also be
							 | 
						|||
| 
								 | 
							
								usable via a typecast.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   To use single or long-double precision versions of FFTW, replace the
							 | 
						|||
| 
								 | 
							
								'fftw_' prefix by 'fftwf_' or 'fftwl_' and link with '-lfftw3f' or
							 | 
						|||
| 
								 | 
							
								'-lfftw3l', but use the _same_ '<fftw3.h>' header file.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Many more flags exist besides 'FFTW_MEASURE' and 'FFTW_ESTIMATE'.
							 | 
						|||
| 
								 | 
							
								For example, use 'FFTW_PATIENT' if you're willing to wait even longer
							 | 
						|||
| 
								 | 
							
								for a possibly even faster plan (*note FFTW Reference::).  You can also
							 | 
						|||
| 
								 | 
							
								save plans for future use, as described by *note Words of Wisdom-Saving
							 | 
						|||
| 
								 | 
							
								Plans::.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Complex Multi-Dimensional DFTs,  Next: One-Dimensional DFTs of Real Data,  Prev: Complex One-Dimensional DFTs,  Up: Tutorial
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								2.2 Complex Multi-Dimensional DFTs
							 | 
						|||
| 
								 | 
							
								==================================
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								Multi-dimensional transforms work much the same way as one-dimensional
							 | 
						|||
| 
								 | 
							
								transforms: you allocate arrays of 'fftw_complex' (preferably using
							 | 
						|||
| 
								 | 
							
								'fftw_malloc'), create an 'fftw_plan', execute it as many times as you
							 | 
						|||
| 
								 | 
							
								want with 'fftw_execute(plan)', and clean up with
							 | 
						|||
| 
								 | 
							
								'fftw_destroy_plan(plan)' (and 'fftw_free').
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   FFTW provides two routines for creating plans for 2d and 3d
							 | 
						|||
| 
								 | 
							
								transforms, and one routine for creating plans of arbitrary
							 | 
						|||
| 
								 | 
							
								dimensionality.  The 2d and 3d routines have the following signature:
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_plan_dft_2d(int n0, int n1,
							 | 
						|||
| 
								 | 
							
								                                fftw_complex *in, fftw_complex *out,
							 | 
						|||
| 
								 | 
							
								                                int sign, unsigned flags);
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_plan_dft_3d(int n0, int n1, int n2,
							 | 
						|||
| 
								 | 
							
								                                fftw_complex *in, fftw_complex *out,
							 | 
						|||
| 
								 | 
							
								                                int sign, unsigned flags);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   These routines create plans for 'n0' by 'n1' two-dimensional (2d)
							 | 
						|||
| 
								 | 
							
								transforms and 'n0' by 'n1' by 'n2' 3d transforms, respectively.  All of
							 | 
						|||
| 
								 | 
							
								these transforms operate on contiguous arrays in the C-standard
							 | 
						|||
| 
								 | 
							
								"row-major" order, so that the last dimension has the fastest-varying
							 | 
						|||
| 
								 | 
							
								index in the array.  This layout is described further in *note
							 | 
						|||
| 
								 | 
							
								Multi-dimensional Array Format::.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   FFTW can also compute transforms of higher dimensionality.  In order
							 | 
						|||
| 
								 | 
							
								to avoid confusion between the various meanings of the the word
							 | 
						|||
| 
								 | 
							
								"dimension", we use the term _rank_ to denote the number of independent
							 | 
						|||
| 
								 | 
							
								indices in an array.(1)  For example, we say that a 2d transform has
							 | 
						|||
| 
								 | 
							
								rank 2, a 3d transform has rank 3, and so on.  You can plan transforms
							 | 
						|||
| 
								 | 
							
								of arbitrary rank by means of the following function:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_plan_dft(int rank, const int *n,
							 | 
						|||
| 
								 | 
							
								                             fftw_complex *in, fftw_complex *out,
							 | 
						|||
| 
								 | 
							
								                             int sign, unsigned flags);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Here, 'n' is a pointer to an array 'n[rank]' denoting an 'n[0]' by
							 | 
						|||
| 
								 | 
							
								'n[1]' by ... by 'n[rank-1]' transform.  Thus, for example, the call
							 | 
						|||
| 
								 | 
							
								     fftw_plan_dft_2d(n0, n1, in, out, sign, flags);
							 | 
						|||
| 
								 | 
							
								   is equivalent to the following code fragment:
							 | 
						|||
| 
								 | 
							
								     int n[2];
							 | 
						|||
| 
								 | 
							
								     n[0] = n0;
							 | 
						|||
| 
								 | 
							
								     n[1] = n1;
							 | 
						|||
| 
								 | 
							
								     fftw_plan_dft(2, n, in, out, sign, flags);
							 | 
						|||
| 
								 | 
							
								   'fftw_plan_dft' is not restricted to 2d and 3d transforms, however,
							 | 
						|||
| 
								 | 
							
								but it can plan transforms of arbitrary rank.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   You may have noticed that all the planner routines described so far
							 | 
						|||
| 
								 | 
							
								have overlapping functionality.  For example, you can plan a 1d or 2d
							 | 
						|||
| 
								 | 
							
								transform by using 'fftw_plan_dft' with a 'rank' of '1' or '2', or even
							 | 
						|||
| 
								 | 
							
								by calling 'fftw_plan_dft_3d' with 'n0' and/or 'n1' equal to '1' (with
							 | 
						|||
| 
								 | 
							
								no loss in efficiency).  This pattern continues, and FFTW's planning
							 | 
						|||
| 
								 | 
							
								routines in general form a "partial order," sequences of interfaces with
							 | 
						|||
| 
								 | 
							
								strictly increasing generality but correspondingly greater complexity.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   'fftw_plan_dft' is the most general complex-DFT routine that we
							 | 
						|||
| 
								 | 
							
								describe in this tutorial, but there are also the advanced and guru
							 | 
						|||
| 
								 | 
							
								interfaces, which allow one to efficiently combine multiple/strided
							 | 
						|||
| 
								 | 
							
								transforms into a single FFTW plan, transform a subset of a larger
							 | 
						|||
| 
								 | 
							
								multi-dimensional array, and/or to handle more general complex-number
							 | 
						|||
| 
								 | 
							
								formats.  For more information, see *note FFTW Reference::.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   ---------- Footnotes ----------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   (1) The term "rank" is commonly used in the APL, FORTRAN, and Common
							 | 
						|||
| 
								 | 
							
								Lisp traditions, although it is not so common in the C world.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: One-Dimensional DFTs of Real Data,  Next: Multi-Dimensional DFTs of Real Data,  Prev: Complex Multi-Dimensional DFTs,  Up: Tutorial
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								2.3 One-Dimensional DFTs of Real Data
							 | 
						|||
| 
								 | 
							
								=====================================
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								In many practical applications, the input data 'in[i]' are purely real
							 | 
						|||
| 
								 | 
							
								numbers, in which case the DFT output satisfies the "Hermitian"
							 | 
						|||
| 
								 | 
							
								redundancy: 'out[i]' is the conjugate of 'out[n-i]'.  It is possible to
							 | 
						|||
| 
								 | 
							
								take advantage of these circumstances in order to achieve roughly a
							 | 
						|||
| 
								 | 
							
								factor of two improvement in both speed and memory usage.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   In exchange for these speed and space advantages, the user sacrifices
							 | 
						|||
| 
								 | 
							
								some of the simplicity of FFTW's complex transforms.  First of all, the
							 | 
						|||
| 
								 | 
							
								input and output arrays are of _different sizes and types_: the input is
							 | 
						|||
| 
								 | 
							
								'n' real numbers, while the output is 'n/2+1' complex numbers (the
							 | 
						|||
| 
								 | 
							
								non-redundant outputs); this also requires slight "padding" of the input
							 | 
						|||
| 
								 | 
							
								array for in-place transforms.  Second, the inverse transform (complex
							 | 
						|||
| 
								 | 
							
								to real) has the side-effect of _overwriting its input array_, by
							 | 
						|||
| 
								 | 
							
								default.  Neither of these inconveniences should pose a serious problem
							 | 
						|||
| 
								 | 
							
								for users, but it is important to be aware of them.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The routines to perform real-data transforms are almost the same as
							 | 
						|||
| 
								 | 
							
								those for complex transforms: you allocate arrays of 'double' and/or
							 | 
						|||
| 
								 | 
							
								'fftw_complex' (preferably using 'fftw_malloc' or 'fftw_alloc_complex'),
							 | 
						|||
| 
								 | 
							
								create an 'fftw_plan', execute it as many times as you want with
							 | 
						|||
| 
								 | 
							
								'fftw_execute(plan)', and clean up with 'fftw_destroy_plan(plan)' (and
							 | 
						|||
| 
								 | 
							
								'fftw_free').  The only differences are that the input (or output) is of
							 | 
						|||
| 
								 | 
							
								type 'double' and there are new routines to create the plan.  In one
							 | 
						|||
| 
								 | 
							
								dimension:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_plan_dft_r2c_1d(int n, double *in, fftw_complex *out,
							 | 
						|||
| 
								 | 
							
								                                    unsigned flags);
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_plan_dft_c2r_1d(int n, fftw_complex *in, double *out,
							 | 
						|||
| 
								 | 
							
								                                    unsigned flags);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   for the real input to complex-Hermitian output ("r2c") and
							 | 
						|||
| 
								 | 
							
								complex-Hermitian input to real output ("c2r") transforms.  Unlike the
							 | 
						|||
| 
								 | 
							
								complex DFT planner, there is no 'sign' argument.  Instead, r2c DFTs are
							 | 
						|||
| 
								 | 
							
								always 'FFTW_FORWARD' and c2r DFTs are always 'FFTW_BACKWARD'.  (For
							 | 
						|||
| 
								 | 
							
								single/long-double precision 'fftwf' and 'fftwl', 'double' should be
							 | 
						|||
| 
								 | 
							
								replaced by 'float' and 'long double', respectively.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Here, 'n' is the "logical" size of the DFT, not necessarily the
							 | 
						|||
| 
								 | 
							
								physical size of the array.  In particular, the real ('double') array
							 | 
						|||
| 
								 | 
							
								has 'n' elements, while the complex ('fftw_complex') array has 'n/2+1'
							 | 
						|||
| 
								 | 
							
								elements (where the division is rounded down).  For an in-place
							 | 
						|||
| 
								 | 
							
								transform, 'in' and 'out' are aliased to the same array, which must be
							 | 
						|||
| 
								 | 
							
								big enough to hold both; so, the real array would actually have
							 | 
						|||
| 
								 | 
							
								'2*(n/2+1)' elements, where the elements beyond the first 'n' are unused
							 | 
						|||
| 
								 | 
							
								padding.  (Note that this is very different from the concept of
							 | 
						|||
| 
								 | 
							
								"zero-padding" a transform to a larger length, which changes the logical
							 | 
						|||
| 
								 | 
							
								size of the DFT by actually adding new input data.)  The kth element of
							 | 
						|||
| 
								 | 
							
								the complex array is exactly the same as the kth element of the
							 | 
						|||
| 
								 | 
							
								corresponding complex DFT. All positive 'n' are supported; products of
							 | 
						|||
| 
								 | 
							
								small factors are most efficient, but an O(n log n) algorithm is used
							 | 
						|||
| 
								 | 
							
								even for prime sizes.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   As noted above, the c2r transform destroys its input array even for
							 | 
						|||
| 
								 | 
							
								out-of-place transforms.  This can be prevented, if necessary, by
							 | 
						|||
| 
								 | 
							
								including 'FFTW_PRESERVE_INPUT' in the 'flags', with unfortunately some
							 | 
						|||
| 
								 | 
							
								sacrifice in performance.  This flag is also not currently supported for
							 | 
						|||
| 
								 | 
							
								multi-dimensional real DFTs (next section).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Readers familiar with DFTs of real data will recall that the 0th (the
							 | 
						|||
| 
								 | 
							
								"DC") and 'n/2'-th (the "Nyquist" frequency, when 'n' is even) elements
							 | 
						|||
| 
								 | 
							
								of the complex output are purely real.  Some implementations therefore
							 | 
						|||
| 
								 | 
							
								store the Nyquist element where the DC imaginary part would go, in order
							 | 
						|||
| 
								 | 
							
								to make the input and output arrays the same size.  Such packing,
							 | 
						|||
| 
								 | 
							
								however, does not generalize well to multi-dimensional transforms, and
							 | 
						|||
| 
								 | 
							
								the space savings are miniscule in any case; FFTW does not support it.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   An alternative interface for one-dimensional r2c and c2r DFTs can be
							 | 
						|||
| 
								 | 
							
								found in the 'r2r' interface (*note The Halfcomplex-format DFT::), with
							 | 
						|||
| 
								 | 
							
								"halfcomplex"-format output that _is_ the same size (and type) as the
							 | 
						|||
| 
								 | 
							
								input array.  That interface, although it is not very useful for
							 | 
						|||
| 
								 | 
							
								multi-dimensional transforms, may sometimes yield better performance.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Multi-Dimensional DFTs of Real Data,  Next: More DFTs of Real Data,  Prev: One-Dimensional DFTs of Real Data,  Up: Tutorial
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								2.4 Multi-Dimensional DFTs of Real Data
							 | 
						|||
| 
								 | 
							
								=======================================
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								Multi-dimensional DFTs of real data use the following planner routines:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_plan_dft_r2c_2d(int n0, int n1,
							 | 
						|||
| 
								 | 
							
								                                    double *in, fftw_complex *out,
							 | 
						|||
| 
								 | 
							
								                                    unsigned flags);
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_plan_dft_r2c_3d(int n0, int n1, int n2,
							 | 
						|||
| 
								 | 
							
								                                    double *in, fftw_complex *out,
							 | 
						|||
| 
								 | 
							
								                                    unsigned flags);
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_plan_dft_r2c(int rank, const int *n,
							 | 
						|||
| 
								 | 
							
								                                 double *in, fftw_complex *out,
							 | 
						|||
| 
								 | 
							
								                                 unsigned flags);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   as well as the corresponding 'c2r' routines with the input/output
							 | 
						|||
| 
								 | 
							
								types swapped.  These routines work similarly to their complex
							 | 
						|||
| 
								 | 
							
								analogues, except for the fact that here the complex output array is cut
							 | 
						|||
| 
								 | 
							
								roughly in half and the real array requires padding for in-place
							 | 
						|||
| 
								 | 
							
								transforms (as in 1d, above).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   As before, 'n' is the logical size of the array, and the consequences
							 | 
						|||
| 
								 | 
							
								of this on the the format of the complex arrays deserve careful
							 | 
						|||
| 
								 | 
							
								attention.  Suppose that the real data has dimensions n[0] x n[1] x n[2]
							 | 
						|||
| 
								 | 
							
								x ...  x n[d-1] (in row-major order).  Then, after an r2c transform, the
							 | 
						|||
| 
								 | 
							
								output is an n[0] x n[1] x n[2] x ...  x (n[d-1]/2 + 1) array of
							 | 
						|||
| 
								 | 
							
								'fftw_complex' values in row-major order, corresponding to slightly over
							 | 
						|||
| 
								 | 
							
								half of the output of the corresponding complex DFT. (The division is
							 | 
						|||
| 
								 | 
							
								rounded down.)  The ordering of the data is otherwise exactly the same
							 | 
						|||
| 
								 | 
							
								as in the complex-DFT case.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   For out-of-place transforms, this is the end of the story: the real
							 | 
						|||
| 
								 | 
							
								data is stored as a row-major array of size n[0] x n[1] x n[2] x ...  x
							 | 
						|||
| 
								 | 
							
								n[d-1] and the complex data is stored as a row-major array of size n[0]
							 | 
						|||
| 
								 | 
							
								x n[1] x n[2] x ...  x (n[d-1]/2 + 1) .
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   For in-place transforms, however, extra padding of the real-data
							 | 
						|||
| 
								 | 
							
								array is necessary because the complex array is larger than the real
							 | 
						|||
| 
								 | 
							
								array, and the two arrays share the same memory locations.  Thus, for
							 | 
						|||
| 
								 | 
							
								in-place transforms, the final dimension of the real-data array must be
							 | 
						|||
| 
								 | 
							
								padded with extra values to accommodate the size of the complex
							 | 
						|||
| 
								 | 
							
								data--two values if the last dimension is even and one if it is odd.
							 | 
						|||
| 
								 | 
							
								That is, the last dimension of the real data must physically contain 2 *
							 | 
						|||
| 
								 | 
							
								(n[d-1]/2+1) 'double' values (exactly enough to hold the complex data).
							 | 
						|||
| 
								 | 
							
								This physical array size does not, however, change the _logical_ array
							 | 
						|||
| 
								 | 
							
								size--only n[d-1] values are actually stored in the last dimension, and
							 | 
						|||
| 
								 | 
							
								n[d-1] is the last dimension passed to the plan-creation routine.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   For example, consider the transform of a two-dimensional real array
							 | 
						|||
| 
								 | 
							
								of size 'n0' by 'n1'.  The output of the r2c transform is a
							 | 
						|||
| 
								 | 
							
								two-dimensional complex array of size 'n0' by 'n1/2+1', where the 'y'
							 | 
						|||
| 
								 | 
							
								dimension has been cut nearly in half because of redundancies in the
							 | 
						|||
| 
								 | 
							
								output.  Because 'fftw_complex' is twice the size of 'double', the
							 | 
						|||
| 
								 | 
							
								output array is slightly bigger than the input array.  Thus, if we want
							 | 
						|||
| 
								 | 
							
								to compute the transform in place, we must _pad_ the input array so that
							 | 
						|||
| 
								 | 
							
								it is of size 'n0' by '2*(n1/2+1)'.  If 'n1' is even, then there are two
							 | 
						|||
| 
								 | 
							
								padding elements at the end of each row (which need not be initialized,
							 | 
						|||
| 
								 | 
							
								as they are only used for output).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   These transforms are unnormalized, so an r2c followed by a c2r
							 | 
						|||
| 
								 | 
							
								transform (or vice versa) will result in the original data scaled by the
							 | 
						|||
| 
								 | 
							
								number of real data elements--that is, the product of the (logical)
							 | 
						|||
| 
								 | 
							
								dimensions of the real data.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   (Because the last dimension is treated specially, if it is equal to
							 | 
						|||
| 
								 | 
							
								'1' the transform is _not_ equivalent to a lower-dimensional r2c/c2r
							 | 
						|||
| 
								 | 
							
								transform.  In that case, the last complex dimension also has size '1'
							 | 
						|||
| 
								 | 
							
								('=1/2+1'), and no advantage is gained over the complex transforms.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: More DFTs of Real Data,  Prev: Multi-Dimensional DFTs of Real Data,  Up: Tutorial
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								2.5 More DFTs of Real Data
							 | 
						|||
| 
								 | 
							
								==========================
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								* Menu:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								* The Halfcomplex-format DFT::
							 | 
						|||
| 
								 | 
							
								* Real even/odd DFTs (cosine/sine transforms)::
							 | 
						|||
| 
								 | 
							
								* The Discrete Hartley Transform::
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								FFTW supports several other transform types via a unified "r2r"
							 | 
						|||
| 
								 | 
							
								(real-to-real) interface, so called because it takes a real ('double')
							 | 
						|||
| 
								 | 
							
								array and outputs a real array of the same size.  These r2r transforms
							 | 
						|||
| 
								 | 
							
								currently fall into three categories: DFTs of real input and
							 | 
						|||
| 
								 | 
							
								complex-Hermitian output in halfcomplex format, DFTs of real input with
							 | 
						|||
| 
								 | 
							
								even/odd symmetry (a.k.a.  discrete cosine/sine transforms, DCTs/DSTs),
							 | 
						|||
| 
								 | 
							
								and discrete Hartley transforms (DHTs), all described in more detail by
							 | 
						|||
| 
								 | 
							
								the following sections.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The r2r transforms follow the by now familiar interface of creating
							 | 
						|||
| 
								 | 
							
								an 'fftw_plan', executing it with 'fftw_execute(plan)', and destroying
							 | 
						|||
| 
								 | 
							
								it with 'fftw_destroy_plan(plan)'.  Furthermore, all r2r transforms
							 | 
						|||
| 
								 | 
							
								share the same planner interface:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_plan_r2r_1d(int n, double *in, double *out,
							 | 
						|||
| 
								 | 
							
								                                fftw_r2r_kind kind, unsigned flags);
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_plan_r2r_2d(int n0, int n1, double *in, double *out,
							 | 
						|||
| 
								 | 
							
								                                fftw_r2r_kind kind0, fftw_r2r_kind kind1,
							 | 
						|||
| 
								 | 
							
								                                unsigned flags);
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_plan_r2r_3d(int n0, int n1, int n2,
							 | 
						|||
| 
								 | 
							
								                                double *in, double *out,
							 | 
						|||
| 
								 | 
							
								                                fftw_r2r_kind kind0,
							 | 
						|||
| 
								 | 
							
								                                fftw_r2r_kind kind1,
							 | 
						|||
| 
								 | 
							
								                                fftw_r2r_kind kind2,
							 | 
						|||
| 
								 | 
							
								                                unsigned flags);
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_plan_r2r(int rank, const int *n, double *in, double *out,
							 | 
						|||
| 
								 | 
							
								                             const fftw_r2r_kind *kind, unsigned flags);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Just as for the complex DFT, these plan 1d/2d/3d/multi-dimensional
							 | 
						|||
| 
								 | 
							
								transforms for contiguous arrays in row-major order, transforming (real)
							 | 
						|||
| 
								 | 
							
								input to output of the same size, where 'n' specifies the _physical_
							 | 
						|||
| 
								 | 
							
								dimensions of the arrays.  All positive 'n' are supported (with the
							 | 
						|||
| 
								 | 
							
								exception of 'n=1' for the 'FFTW_REDFT00' kind, noted in the real-even
							 | 
						|||
| 
								 | 
							
								subsection below); products of small factors are most efficient
							 | 
						|||
| 
								 | 
							
								(factorizing 'n-1' and 'n+1' for 'FFTW_REDFT00' and 'FFTW_RODFT00'
							 | 
						|||
| 
								 | 
							
								kinds, described below), but an O(n log n) algorithm is used even for
							 | 
						|||
| 
								 | 
							
								prime sizes.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Each dimension has a "kind" parameter, of type 'fftw_r2r_kind',
							 | 
						|||
| 
								 | 
							
								specifying the kind of r2r transform to be used for that dimension.  (In
							 | 
						|||
| 
								 | 
							
								the case of 'fftw_plan_r2r', this is an array 'kind[rank]' where
							 | 
						|||
| 
								 | 
							
								'kind[i]' is the transform kind for the dimension 'n[i]'.)  The kind can
							 | 
						|||
| 
								 | 
							
								be one of a set of predefined constants, defined in the following
							 | 
						|||
| 
								 | 
							
								subsections.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   In other words, FFTW computes the separable product of the specified
							 | 
						|||
| 
								 | 
							
								r2r transforms over each dimension, which can be used e.g.  for partial
							 | 
						|||
| 
								 | 
							
								differential equations with mixed boundary conditions.  (For some r2r
							 | 
						|||
| 
								 | 
							
								kinds, notably the halfcomplex DFT and the DHT, such a separable product
							 | 
						|||
| 
								 | 
							
								is somewhat problematic in more than one dimension, however, as is
							 | 
						|||
| 
								 | 
							
								described below.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   In the current version of FFTW, all r2r transforms except for the
							 | 
						|||
| 
								 | 
							
								halfcomplex type are computed via pre- or post-processing of halfcomplex
							 | 
						|||
| 
								 | 
							
								transforms, and they are therefore not as fast as they could be.  Since
							 | 
						|||
| 
								 | 
							
								most other general DCT/DST codes employ a similar algorithm, however,
							 | 
						|||
| 
								 | 
							
								FFTW's implementation should provide at least competitive performance.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: The Halfcomplex-format DFT,  Next: Real even/odd DFTs (cosine/sine transforms),  Prev: More DFTs of Real Data,  Up: More DFTs of Real Data
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								2.5.1 The Halfcomplex-format DFT
							 | 
						|||
| 
								 | 
							
								--------------------------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								An r2r kind of 'FFTW_R2HC' ("r2hc") corresponds to an r2c DFT (*note
							 | 
						|||
| 
								 | 
							
								One-Dimensional DFTs of Real Data::) but with "halfcomplex" format
							 | 
						|||
| 
								 | 
							
								output, and may sometimes be faster and/or more convenient than the
							 | 
						|||
| 
								 | 
							
								latter.  The inverse "hc2r" transform is of kind 'FFTW_HC2R'.  This
							 | 
						|||
| 
								 | 
							
								consists of the non-redundant half of the complex output for a 1d
							 | 
						|||
| 
								 | 
							
								real-input DFT of size 'n', stored as a sequence of 'n' real numbers
							 | 
						|||
| 
								 | 
							
								('double') in the format:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   r0, r1, r2, r(n/2), i((n+1)/2-1), ..., i2, i1
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Here, rk is the real part of the kth output, and ik is the imaginary
							 | 
						|||
| 
								 | 
							
								part.  (Division by 2 is rounded down.)  For a halfcomplex array
							 | 
						|||
| 
								 | 
							
								'hc[n]', the kth component thus has its real part in 'hc[k]' and its
							 | 
						|||
| 
								 | 
							
								imaginary part in 'hc[n-k]', with the exception of 'k' '==' '0' or 'n/2'
							 | 
						|||
| 
								 | 
							
								(the latter only if 'n' is even)--in these two cases, the imaginary part
							 | 
						|||
| 
								 | 
							
								is zero due to symmetries of the real-input DFT, and is not stored.
							 | 
						|||
| 
								 | 
							
								Thus, the r2hc transform of 'n' real values is a halfcomplex array of
							 | 
						|||
| 
								 | 
							
								length 'n', and vice versa for hc2r.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Aside from the differing format, the output of
							 | 
						|||
| 
								 | 
							
								'FFTW_R2HC'/'FFTW_HC2R' is otherwise exactly the same as for the
							 | 
						|||
| 
								 | 
							
								corresponding 1d r2c/c2r transform (i.e.  'FFTW_FORWARD'/'FFTW_BACKWARD'
							 | 
						|||
| 
								 | 
							
								transforms, respectively).  Recall that these transforms are
							 | 
						|||
| 
								 | 
							
								unnormalized, so r2hc followed by hc2r will result in the original data
							 | 
						|||
| 
								 | 
							
								multiplied by 'n'.  Furthermore, like the c2r transform, an out-of-place
							 | 
						|||
| 
								 | 
							
								hc2r transform will _destroy its input_ array.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Although these halfcomplex transforms can be used with the
							 | 
						|||
| 
								 | 
							
								multi-dimensional r2r interface, the interpretation of such a separable
							 | 
						|||
| 
								 | 
							
								product of transforms along each dimension is problematic.  For example,
							 | 
						|||
| 
								 | 
							
								consider a two-dimensional 'n0' by 'n1', r2hc by r2hc transform planned
							 | 
						|||
| 
								 | 
							
								by 'fftw_plan_r2r_2d(n0, n1, in, out, FFTW_R2HC, FFTW_R2HC,
							 | 
						|||
| 
								 | 
							
								FFTW_MEASURE)'.  Conceptually, FFTW first transforms the rows (of size
							 | 
						|||
| 
								 | 
							
								'n1') to produce halfcomplex rows, and then transforms the columns (of
							 | 
						|||
| 
								 | 
							
								size 'n0').  Half of these column transforms, however, are of imaginary
							 | 
						|||
| 
								 | 
							
								parts, and should therefore be multiplied by i and combined with the
							 | 
						|||
| 
								 | 
							
								r2hc transforms of the real columns to produce the 2d DFT amplitudes;
							 | 
						|||
| 
								 | 
							
								FFTW's r2r transform does _not_ perform this combination for you.  Thus,
							 | 
						|||
| 
								 | 
							
								if a multi-dimensional real-input/output DFT is required, we recommend
							 | 
						|||
| 
								 | 
							
								using the ordinary r2c/c2r interface (*note Multi-Dimensional DFTs of
							 | 
						|||
| 
								 | 
							
								Real Data::).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Real even/odd DFTs (cosine/sine transforms),  Next: The Discrete Hartley Transform,  Prev: The Halfcomplex-format DFT,  Up: More DFTs of Real Data
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								2.5.2 Real even/odd DFTs (cosine/sine transforms)
							 | 
						|||
| 
								 | 
							
								-------------------------------------------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								The Fourier transform of a real-even function f(-x) = f(x) is real-even,
							 | 
						|||
| 
								 | 
							
								and i times the Fourier transform of a real-odd function f(-x) = -f(x)
							 | 
						|||
| 
								 | 
							
								is real-odd.  Similar results hold for a discrete Fourier transform, and
							 | 
						|||
| 
								 | 
							
								thus for these symmetries the need for complex inputs/outputs is
							 | 
						|||
| 
								 | 
							
								entirely eliminated.  Moreover, one gains a factor of two in speed/space
							 | 
						|||
| 
								 | 
							
								from the fact that the data are real, and an additional factor of two
							 | 
						|||
| 
								 | 
							
								from the even/odd symmetry: only the non-redundant (first) half of the
							 | 
						|||
| 
								 | 
							
								array need be stored.  The result is the real-even DFT ("REDFT") and the
							 | 
						|||
| 
								 | 
							
								real-odd DFT ("RODFT"), also known as the discrete cosine and sine
							 | 
						|||
| 
								 | 
							
								transforms ("DCT" and "DST"), respectively.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   (In this section, we describe the 1d transforms; multi-dimensional
							 | 
						|||
| 
								 | 
							
								transforms are just a separable product of these transforms operating
							 | 
						|||
| 
								 | 
							
								along each dimension.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Because of the discrete sampling, one has an additional choice: is
							 | 
						|||
| 
								 | 
							
								the data even/odd around a sampling point, or around the point halfway
							 | 
						|||
| 
								 | 
							
								between two samples?  The latter corresponds to _shifting_ the samples
							 | 
						|||
| 
								 | 
							
								by _half_ an interval, and gives rise to several transform variants
							 | 
						|||
| 
								 | 
							
								denoted by REDFTab and RODFTab: a and b are 0 or 1, and indicate whether
							 | 
						|||
| 
								 | 
							
								the input (a) and/or output (b) are shifted by half a sample (1 means it
							 | 
						|||
| 
								 | 
							
								is shifted).  These are also known as types I-IV of the DCT and DST, and
							 | 
						|||
| 
								 | 
							
								all four types are supported by FFTW's r2r interface.(1)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The r2r kinds for the various REDFT and RODFT types supported by
							 | 
						|||
| 
								 | 
							
								FFTW, along with the boundary conditions at both ends of the _input_
							 | 
						|||
| 
								 | 
							
								array ('n' real numbers 'in[j=0..n-1]'), are:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * 'FFTW_REDFT00' (DCT-I): even around j=0 and even around j=n-1.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * 'FFTW_REDFT10' (DCT-II, "the" DCT): even around j=-0.5 and even
							 | 
						|||
| 
								 | 
							
								     around j=n-0.5.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * 'FFTW_REDFT01' (DCT-III, "the" IDCT): even around j=0 and odd
							 | 
						|||
| 
								 | 
							
								     around j=n.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * 'FFTW_REDFT11' (DCT-IV): even around j=-0.5 and odd around j=n-0.5.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * 'FFTW_RODFT00' (DST-I): odd around j=-1 and odd around j=n.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * 'FFTW_RODFT10' (DST-II): odd around j=-0.5 and odd around j=n-0.5.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * 'FFTW_RODFT01' (DST-III): odd around j=-1 and even around j=n-1.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * 'FFTW_RODFT11' (DST-IV): odd around j=-0.5 and even around j=n-0.5.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Note that these symmetries apply to the "logical" array being
							 | 
						|||
| 
								 | 
							
								transformed; *there are no constraints on your physical input data*.
							 | 
						|||
| 
								 | 
							
								So, for example, if you specify a size-5 REDFT00 (DCT-I) of the data
							 | 
						|||
| 
								 | 
							
								abcde, it corresponds to the DFT of the logical even array abcdedcb of
							 | 
						|||
| 
								 | 
							
								size 8.  A size-4 REDFT10 (DCT-II) of the data abcd corresponds to the
							 | 
						|||
| 
								 | 
							
								size-8 logical DFT of the even array abcddcba, shifted by half a sample.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   All of these transforms are invertible.  The inverse of R*DFT00 is
							 | 
						|||
| 
								 | 
							
								R*DFT00; of R*DFT10 is R*DFT01 and vice versa (these are often called
							 | 
						|||
| 
								 | 
							
								simply "the" DCT and IDCT, respectively); and of R*DFT11 is R*DFT11.
							 | 
						|||
| 
								 | 
							
								However, the transforms computed by FFTW are unnormalized, exactly like
							 | 
						|||
| 
								 | 
							
								the corresponding real and complex DFTs, so computing a transform
							 | 
						|||
| 
								 | 
							
								followed by its inverse yields the original array scaled by N, where N
							 | 
						|||
| 
								 | 
							
								is the _logical_ DFT size.  For REDFT00, N=2(n-1); for RODFT00,
							 | 
						|||
| 
								 | 
							
								N=2(n+1); otherwise, N=2n.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Note that the boundary conditions of the transform output array are
							 | 
						|||
| 
								 | 
							
								given by the input boundary conditions of the inverse transform.  Thus,
							 | 
						|||
| 
								 | 
							
								the above transforms are all inequivalent in terms of input/output
							 | 
						|||
| 
								 | 
							
								boundary conditions, even neglecting the 0.5 shift difference.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   FFTW is most efficient when N is a product of small factors; note
							 | 
						|||
| 
								 | 
							
								that this _differs_ from the factorization of the physical size 'n' for
							 | 
						|||
| 
								 | 
							
								REDFT00 and RODFT00!  There is another oddity: 'n=1' REDFT00 transforms
							 | 
						|||
| 
								 | 
							
								correspond to N=0, and so are _not defined_ (the planner will return
							 | 
						|||
| 
								 | 
							
								'NULL').  Otherwise, any positive 'n' is supported.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   For the precise mathematical definitions of these transforms as used
							 | 
						|||
| 
								 | 
							
								by FFTW, see *note What FFTW Really Computes::.  (For people accustomed
							 | 
						|||
| 
								 | 
							
								to the DCT/DST, FFTW's definitions have a coefficient of 2 in front of
							 | 
						|||
| 
								 | 
							
								the cos/sin functions so that they correspond precisely to an even/odd
							 | 
						|||
| 
								 | 
							
								DFT of size N. Some authors also include additional multiplicative
							 | 
						|||
| 
								 | 
							
								factors of sqrt(2) for selected inputs and outputs; this makes the
							 | 
						|||
| 
								 | 
							
								transform orthogonal, but sacrifices the direct equivalence to a
							 | 
						|||
| 
								 | 
							
								symmetric DFT.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								Which type do you need?
							 | 
						|||
| 
								 | 
							
								.......................
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								Since the required flavor of even/odd DFT depends upon your problem, you
							 | 
						|||
| 
								 | 
							
								are the best judge of this choice, but we can make a few comments on
							 | 
						|||
| 
								 | 
							
								relative efficiency to help you in your selection.  In particular,
							 | 
						|||
| 
								 | 
							
								R*DFT01 and R*DFT10 tend to be slightly faster than R*DFT11 (especially
							 | 
						|||
| 
								 | 
							
								for odd sizes), while the R*DFT00 transforms are sometimes significantly
							 | 
						|||
| 
								 | 
							
								slower (especially for even sizes).(2)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Thus, if only the boundary conditions on the transform inputs are
							 | 
						|||
| 
								 | 
							
								specified, we generally recommend R*DFT10 over R*DFT00 and R*DFT01 over
							 | 
						|||
| 
								 | 
							
								R*DFT11 (unless the half-sample shift or the self-inverse property is
							 | 
						|||
| 
								 | 
							
								significant for your problem).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   If performance is important to you and you are using only small sizes
							 | 
						|||
| 
								 | 
							
								(say n<200), e.g.  for multi-dimensional transforms, then you might
							 | 
						|||
| 
								 | 
							
								consider generating hard-coded transforms of those sizes and types that
							 | 
						|||
| 
								 | 
							
								you are interested in (*note Generating your own code::).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   We are interested in hearing what types of symmetric transforms you
							 | 
						|||
| 
								 | 
							
								find most useful.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   ---------- Footnotes ----------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   (1) There are also type V-VIII transforms, which correspond to a
							 | 
						|||
| 
								 | 
							
								logical DFT of _odd_ size N, independent of whether the physical size
							 | 
						|||
| 
								 | 
							
								'n' is odd, but we do not support these variants.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   (2) R*DFT00 is sometimes slower in FFTW because we discovered that
							 | 
						|||
| 
								 | 
							
								the standard algorithm for computing this by a pre/post-processed real
							 | 
						|||
| 
								 | 
							
								DFT--the algorithm used in FFTPACK, Numerical Recipes, and other sources
							 | 
						|||
| 
								 | 
							
								for decades now--has serious numerical problems: it already loses
							 | 
						|||
| 
								 | 
							
								several decimal places of accuracy for 16k sizes.  There seem to be only
							 | 
						|||
| 
								 | 
							
								two alternatives in the literature that do not suffer similarly: a
							 | 
						|||
| 
								 | 
							
								recursive decomposition into smaller DCTs, which would require a large
							 | 
						|||
| 
								 | 
							
								set of codelets for efficiency and generality, or sacrificing a factor
							 | 
						|||
| 
								 | 
							
								of 2 in speed to use a real DFT of twice the size.  We currently employ
							 | 
						|||
| 
								 | 
							
								the latter technique for general n, as well as a limited form of the
							 | 
						|||
| 
								 | 
							
								former method: a split-radix decomposition when n is odd (N a multiple
							 | 
						|||
| 
								 | 
							
								of 4).  For N containing many factors of 2, the split-radix method seems
							 | 
						|||
| 
								 | 
							
								to recover most of the speed of the standard algorithm without the
							 | 
						|||
| 
								 | 
							
								accuracy tradeoff.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: The Discrete Hartley Transform,  Prev: Real even/odd DFTs (cosine/sine transforms),  Up: More DFTs of Real Data
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								2.5.3 The Discrete Hartley Transform
							 | 
						|||
| 
								 | 
							
								------------------------------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								If you are planning to use the DHT because you've heard that it is
							 | 
						|||
| 
								 | 
							
								"faster" than the DFT (FFT), *stop here*.  The DHT is not faster than
							 | 
						|||
| 
								 | 
							
								the DFT. That story is an old but enduring misconception that was
							 | 
						|||
| 
								 | 
							
								debunked in 1987.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The discrete Hartley transform (DHT) is an invertible linear
							 | 
						|||
| 
								 | 
							
								transform closely related to the DFT. In the DFT, one multiplies each
							 | 
						|||
| 
								 | 
							
								input by cos - i * sin (a complex exponential), whereas in the DHT each
							 | 
						|||
| 
								 | 
							
								input is multiplied by simply cos + sin.  Thus, the DHT transforms 'n'
							 | 
						|||
| 
								 | 
							
								real numbers to 'n' real numbers, and has the convenient property of
							 | 
						|||
| 
								 | 
							
								being its own inverse.  In FFTW, a DHT (of any positive 'n') can be
							 | 
						|||
| 
								 | 
							
								specified by an r2r kind of 'FFTW_DHT'.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Like the DFT, in FFTW the DHT is unnormalized, so computing a DHT of
							 | 
						|||
| 
								 | 
							
								size 'n' followed by another DHT of the same size will result in the
							 | 
						|||
| 
								 | 
							
								original array multiplied by 'n'.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The DHT was originally proposed as a more efficient alternative to
							 | 
						|||
| 
								 | 
							
								the DFT for real data, but it was subsequently shown that a specialized
							 | 
						|||
| 
								 | 
							
								DFT (such as FFTW's r2hc or r2c transforms) could be just as fast.  In
							 | 
						|||
| 
								 | 
							
								FFTW, the DHT is actually computed by post-processing an r2hc transform,
							 | 
						|||
| 
								 | 
							
								so there is ordinarily no reason to prefer it from a performance
							 | 
						|||
| 
								 | 
							
								perspective.(1)  However, we have heard rumors that the DHT might be the
							 | 
						|||
| 
								 | 
							
								most appropriate transform in its own right for certain applications,
							 | 
						|||
| 
								 | 
							
								and we would be very interested to hear from anyone who finds it useful.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   If 'FFTW_DHT' is specified for multiple dimensions of a
							 | 
						|||
| 
								 | 
							
								multi-dimensional transform, FFTW computes the separable product of 1d
							 | 
						|||
| 
								 | 
							
								DHTs along each dimension.  Unfortunately, this is not quite the same
							 | 
						|||
| 
								 | 
							
								thing as a true multi-dimensional DHT; you can compute the latter, if
							 | 
						|||
| 
								 | 
							
								necessary, with at most 'rank-1' post-processing passes [see e.g.  H.
							 | 
						|||
| 
								 | 
							
								Hao and R. N. Bracewell, Proc.  IEEE 75, 264-266 (1987)].
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   For the precise mathematical definition of the DHT as used by FFTW,
							 | 
						|||
| 
								 | 
							
								see *note What FFTW Really Computes::.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   ---------- Footnotes ----------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   (1) We provide the DHT mainly as a byproduct of some internal
							 | 
						|||
| 
								 | 
							
								algorithms.  FFTW computes a real input/output DFT of _prime_ size by
							 | 
						|||
| 
								 | 
							
								re-expressing it as a DHT plus post/pre-processing and then using
							 | 
						|||
| 
								 | 
							
								Rader's prime-DFT algorithm adapted to the DHT.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Other Important Topics,  Next: FFTW Reference,  Prev: Tutorial,  Up: Top
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								3 Other Important Topics
							 | 
						|||
| 
								 | 
							
								************************
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								* Menu:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								* SIMD alignment and fftw_malloc::
							 | 
						|||
| 
								 | 
							
								* Multi-dimensional Array Format::
							 | 
						|||
| 
								 | 
							
								* Words of Wisdom-Saving Plans::
							 | 
						|||
| 
								 | 
							
								* Caveats in Using Wisdom::
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: SIMD alignment and fftw_malloc,  Next: Multi-dimensional Array Format,  Prev: Other Important Topics,  Up: Other Important Topics
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								3.1 SIMD alignment and fftw_malloc
							 | 
						|||
| 
								 | 
							
								==================================
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								SIMD, which stands for "Single Instruction Multiple Data," is a set of
							 | 
						|||
| 
								 | 
							
								special operations supported by some processors to perform a single
							 | 
						|||
| 
								 | 
							
								operation on several numbers (usually 2 or 4) simultaneously.  SIMD
							 | 
						|||
| 
								 | 
							
								floating-point instructions are available on several popular CPUs:
							 | 
						|||
| 
								 | 
							
								SSE/SSE2/AVX/AVX2/AVX512/KCVI on some x86/x86-64 processors, AltiVec and
							 | 
						|||
| 
								 | 
							
								VSX on some POWER/PowerPCs, NEON on some ARM models.  FFTW can be
							 | 
						|||
| 
								 | 
							
								compiled to support the SIMD instructions on any of these systems.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   A program linking to an FFTW library compiled with SIMD support can
							 | 
						|||
| 
								 | 
							
								obtain a nonnegligible speedup for most complex and r2c/c2r transforms.
							 | 
						|||
| 
								 | 
							
								In order to obtain this speedup, however, the arrays of complex (or
							 | 
						|||
| 
								 | 
							
								real) data passed to FFTW must be specially aligned in memory (typically
							 | 
						|||
| 
								 | 
							
								16-byte aligned), and often this alignment is more stringent than that
							 | 
						|||
| 
								 | 
							
								provided by the usual 'malloc' (etc.)  allocation routines.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   In order to guarantee proper alignment for SIMD, therefore, in case
							 | 
						|||
| 
								 | 
							
								your program is ever linked against a SIMD-using FFTW, we recommend
							 | 
						|||
| 
								 | 
							
								allocating your transform data with 'fftw_malloc' and de-allocating it
							 | 
						|||
| 
								 | 
							
								with 'fftw_free'.  These have exactly the same interface and behavior as
							 | 
						|||
| 
								 | 
							
								'malloc'/'free', except that for a SIMD FFTW they ensure that the
							 | 
						|||
| 
								 | 
							
								returned pointer has the necessary alignment (by calling 'memalign' or
							 | 
						|||
| 
								 | 
							
								its equivalent on your OS).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   You are not _required_ to use 'fftw_malloc'.  You can allocate your
							 | 
						|||
| 
								 | 
							
								data in any way that you like, from 'malloc' to 'new' (in C++) to a
							 | 
						|||
| 
								 | 
							
								fixed-size array declaration.  If the array happens not to be properly
							 | 
						|||
| 
								 | 
							
								aligned, FFTW will not use the SIMD extensions.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Since 'fftw_malloc' only ever needs to be used for real and complex
							 | 
						|||
| 
								 | 
							
								arrays, we provide two convenient wrapper routines 'fftw_alloc_real(N)'
							 | 
						|||
| 
								 | 
							
								and 'fftw_alloc_complex(N)' that are equivalent to
							 | 
						|||
| 
								 | 
							
								'(double*)fftw_malloc(sizeof(double) * N)' and
							 | 
						|||
| 
								 | 
							
								'(fftw_complex*)fftw_malloc(sizeof(fftw_complex) * N)', respectively (or
							 | 
						|||
| 
								 | 
							
								their equivalents in other precisions).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Multi-dimensional Array Format,  Next: Words of Wisdom-Saving Plans,  Prev: SIMD alignment and fftw_malloc,  Up: Other Important Topics
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								3.2 Multi-dimensional Array Format
							 | 
						|||
| 
								 | 
							
								==================================
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								This section describes the format in which multi-dimensional arrays are
							 | 
						|||
| 
								 | 
							
								stored in FFTW. We felt that a detailed discussion of this topic was
							 | 
						|||
| 
								 | 
							
								necessary.  Since several different formats are common, this topic is
							 | 
						|||
| 
								 | 
							
								often a source of confusion.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								* Menu:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								* Row-major Format::
							 | 
						|||
| 
								 | 
							
								* Column-major Format::
							 | 
						|||
| 
								 | 
							
								* Fixed-size Arrays in C::
							 | 
						|||
| 
								 | 
							
								* Dynamic Arrays in C::
							 | 
						|||
| 
								 | 
							
								* Dynamic Arrays in C-The Wrong Way::
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Row-major Format,  Next: Column-major Format,  Prev: Multi-dimensional Array Format,  Up: Multi-dimensional Array Format
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								3.2.1 Row-major Format
							 | 
						|||
| 
								 | 
							
								----------------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								The multi-dimensional arrays passed to 'fftw_plan_dft' etcetera are
							 | 
						|||
| 
								 | 
							
								expected to be stored as a single contiguous block in "row-major" order
							 | 
						|||
| 
								 | 
							
								(sometimes called "C order").  Basically, this means that as you step
							 | 
						|||
| 
								 | 
							
								through adjacent memory locations, the first dimension's index varies
							 | 
						|||
| 
								 | 
							
								most slowly and the last dimension's index varies most quickly.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   To be more explicit, let us consider an array of rank d whose
							 | 
						|||
| 
								 | 
							
								dimensions are n[0] x n[1] x n[2] x ...  x n[d-1] .  Now, we specify a
							 | 
						|||
| 
								 | 
							
								location in the array by a sequence of d (zero-based) indices, one for
							 | 
						|||
| 
								 | 
							
								each dimension: (i[0], i[1], ..., i[d-1]).  If the array is stored in
							 | 
						|||
| 
								 | 
							
								row-major order, then this element is located at the position i[d-1] +
							 | 
						|||
| 
								 | 
							
								n[d-1] * (i[d-2] + n[d-2] * (...  + n[1] * i[0])).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Note that, for the ordinary complex DFT, each element of the array
							 | 
						|||
| 
								 | 
							
								must be of type 'fftw_complex'; i.e.  a (real, imaginary) pair of
							 | 
						|||
| 
								 | 
							
								(double-precision) numbers.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   In the advanced FFTW interface, the physical dimensions n from which
							 | 
						|||
| 
								 | 
							
								the indices are computed can be different from (larger than) the logical
							 | 
						|||
| 
								 | 
							
								dimensions of the transform to be computed, in order to transform a
							 | 
						|||
| 
								 | 
							
								subset of a larger array.  Note also that, in the advanced interface,
							 | 
						|||
| 
								 | 
							
								the expression above is multiplied by a "stride" to get the actual array
							 | 
						|||
| 
								 | 
							
								index--this is useful in situations where each element of the
							 | 
						|||
| 
								 | 
							
								multi-dimensional array is actually a data structure (or another array),
							 | 
						|||
| 
								 | 
							
								and you just want to transform a single field.  In the basic interface,
							 | 
						|||
| 
								 | 
							
								however, the stride is 1.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Column-major Format,  Next: Fixed-size Arrays in C,  Prev: Row-major Format,  Up: Multi-dimensional Array Format
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								3.2.2 Column-major Format
							 | 
						|||
| 
								 | 
							
								-------------------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								Readers from the Fortran world are used to arrays stored in
							 | 
						|||
| 
								 | 
							
								"column-major" order (sometimes called "Fortran order").  This is
							 | 
						|||
| 
								 | 
							
								essentially the exact opposite of row-major order in that, here, the
							 | 
						|||
| 
								 | 
							
								_first_ dimension's index varies most quickly.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   If you have an array stored in column-major order and wish to
							 | 
						|||
| 
								 | 
							
								transform it using FFTW, it is quite easy to do.  When creating the
							 | 
						|||
| 
								 | 
							
								plan, simply pass the dimensions of the array to the planner in _reverse
							 | 
						|||
| 
								 | 
							
								order_.  For example, if your array is a rank three 'N x M x L' matrix
							 | 
						|||
| 
								 | 
							
								in column-major order, you should pass the dimensions of the array as if
							 | 
						|||
| 
								 | 
							
								it were an 'L x M x N' matrix (which it is, from the perspective of
							 | 
						|||
| 
								 | 
							
								FFTW). This is done for you _automatically_ by the FFTW legacy-Fortran
							 | 
						|||
| 
								 | 
							
								interface (*note Calling FFTW from Legacy Fortran::), but you must do it
							 | 
						|||
| 
								 | 
							
								manually with the modern Fortran interface (*note Reversing array
							 | 
						|||
| 
								 | 
							
								dimensions::).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Fixed-size Arrays in C,  Next: Dynamic Arrays in C,  Prev: Column-major Format,  Up: Multi-dimensional Array Format
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								3.2.3 Fixed-size Arrays in C
							 | 
						|||
| 
								 | 
							
								----------------------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								A multi-dimensional array whose size is declared at compile time in C is
							 | 
						|||
| 
								 | 
							
								_already_ in row-major order.  You don't have to do anything special to
							 | 
						|||
| 
								 | 
							
								transform it.  For example:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     {
							 | 
						|||
| 
								 | 
							
								          fftw_complex data[N0][N1][N2];
							 | 
						|||
| 
								 | 
							
								          fftw_plan plan;
							 | 
						|||
| 
								 | 
							
								          ...
							 | 
						|||
| 
								 | 
							
								          plan = fftw_plan_dft_3d(N0, N1, N2, &data[0][0][0], &data[0][0][0],
							 | 
						|||
| 
								 | 
							
								                                  FFTW_FORWARD, FFTW_ESTIMATE);
							 | 
						|||
| 
								 | 
							
								          ...
							 | 
						|||
| 
								 | 
							
								     }
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   This will plan a 3d in-place transform of size 'N0 x N1 x N2'.
							 | 
						|||
| 
								 | 
							
								Notice how we took the address of the zero-th element to pass to the
							 | 
						|||
| 
								 | 
							
								planner (we could also have used a typecast).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   However, we tend to _discourage_ users from declaring their arrays in
							 | 
						|||
| 
								 | 
							
								this way, for two reasons.  First, this allocates the array on the stack
							 | 
						|||
| 
								 | 
							
								("automatic" storage), which has a very limited size on most operating
							 | 
						|||
| 
								 | 
							
								systems (declaring an array with more than a few thousand elements will
							 | 
						|||
| 
								 | 
							
								often cause a crash).  (You can get around this limitation on many
							 | 
						|||
| 
								 | 
							
								systems by declaring the array as 'static' and/or global, but that has
							 | 
						|||
| 
								 | 
							
								its own drawbacks.)  Second, it may not optimally align the array for
							 | 
						|||
| 
								 | 
							
								use with a SIMD FFTW (*note SIMD alignment and fftw_malloc::).  Instead,
							 | 
						|||
| 
								 | 
							
								we recommend using 'fftw_malloc', as described below.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Dynamic Arrays in C,  Next: Dynamic Arrays in C-The Wrong Way,  Prev: Fixed-size Arrays in C,  Up: Multi-dimensional Array Format
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								3.2.4 Dynamic Arrays in C
							 | 
						|||
| 
								 | 
							
								-------------------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								We recommend allocating most arrays dynamically, with 'fftw_malloc'.
							 | 
						|||
| 
								 | 
							
								This isn't too hard to do, although it is not as straightforward for
							 | 
						|||
| 
								 | 
							
								multi-dimensional arrays as it is for one-dimensional arrays.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Creating the array is simple: using a dynamic-allocation routine like
							 | 
						|||
| 
								 | 
							
								'fftw_malloc', allocate an array big enough to store N 'fftw_complex'
							 | 
						|||
| 
								 | 
							
								values (for a complex DFT), where N is the product of the sizes of the
							 | 
						|||
| 
								 | 
							
								array dimensions (i.e.  the total number of complex values in the
							 | 
						|||
| 
								 | 
							
								array).  For example, here is code to allocate a 5 x 12 x 27 rank-3
							 | 
						|||
| 
								 | 
							
								array:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     fftw_complex *an_array;
							 | 
						|||
| 
								 | 
							
								     an_array = (fftw_complex*) fftw_malloc(5*12*27 * sizeof(fftw_complex));
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Accessing the array elements, however, is more tricky--you can't
							 | 
						|||
| 
								 | 
							
								simply use multiple applications of the '[]' operator like you could for
							 | 
						|||
| 
								 | 
							
								fixed-size arrays.  Instead, you have to explicitly compute the offset
							 | 
						|||
| 
								 | 
							
								into the array using the formula given earlier for row-major arrays.
							 | 
						|||
| 
								 | 
							
								For example, to reference the (i,j,k)-th element of the array allocated
							 | 
						|||
| 
								 | 
							
								above, you would use the expression 'an_array[k + 27 * (j + 12 * i)]'.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   This pain can be alleviated somewhat by defining appropriate macros,
							 | 
						|||
| 
								 | 
							
								or, in C++, creating a class and overloading the '()' operator.  The
							 | 
						|||
| 
								 | 
							
								recent C99 standard provides a way to reinterpret the dynamic array as a
							 | 
						|||
| 
								 | 
							
								"variable-length" multi-dimensional array amenable to '[]', but this
							 | 
						|||
| 
								 | 
							
								feature is not yet widely supported by compilers.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Dynamic Arrays in C-The Wrong Way,  Prev: Dynamic Arrays in C,  Up: Multi-dimensional Array Format
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								3.2.5 Dynamic Arrays in C--The Wrong Way
							 | 
						|||
| 
								 | 
							
								----------------------------------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								A different method for allocating multi-dimensional arrays in C is often
							 | 
						|||
| 
								 | 
							
								suggested that is incompatible with FFTW: _using it will cause FFTW to
							 | 
						|||
| 
								 | 
							
								die a painful death_.  We discuss the technique here, however, because
							 | 
						|||
| 
								 | 
							
								it is so commonly known and used.  This method is to create arrays of
							 | 
						|||
| 
								 | 
							
								pointers of arrays of pointers of ...etcetera.  For example, the
							 | 
						|||
| 
								 | 
							
								analogue in this method to the example above is:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     int i,j;
							 | 
						|||
| 
								 | 
							
								     fftw_complex ***a_bad_array;  /* another way to make a 5x12x27 array */
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     a_bad_array = (fftw_complex ***) malloc(5 * sizeof(fftw_complex **));
							 | 
						|||
| 
								 | 
							
								     for (i = 0; i < 5; ++i) {
							 | 
						|||
| 
								 | 
							
								          a_bad_array[i] =
							 | 
						|||
| 
								 | 
							
								             (fftw_complex **) malloc(12 * sizeof(fftw_complex *));
							 | 
						|||
| 
								 | 
							
								          for (j = 0; j < 12; ++j)
							 | 
						|||
| 
								 | 
							
								               a_bad_array[i][j] =
							 | 
						|||
| 
								 | 
							
								                     (fftw_complex *) malloc(27 * sizeof(fftw_complex));
							 | 
						|||
| 
								 | 
							
								     }
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   As you can see, this sort of array is inconvenient to allocate (and
							 | 
						|||
| 
								 | 
							
								deallocate).  On the other hand, it has the advantage that the
							 | 
						|||
| 
								 | 
							
								(i,j,k)-th element can be referenced simply by 'a_bad_array[i][j][k]'.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   If you like this technique and want to maximize convenience in
							 | 
						|||
| 
								 | 
							
								accessing the array, but still want to pass the array to FFTW, you can
							 | 
						|||
| 
								 | 
							
								use a hybrid method.  Allocate the array as one contiguous block, but
							 | 
						|||
| 
								 | 
							
								also declare an array of arrays of pointers that point to appropriate
							 | 
						|||
| 
								 | 
							
								places in the block.  That sort of trick is beyond the scope of this
							 | 
						|||
| 
								 | 
							
								documentation; for more information on multi-dimensional arrays in C,
							 | 
						|||
| 
								 | 
							
								see the 'comp.lang.c' FAQ (http://c-faq.com/aryptr/dynmuldimary.html).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Words of Wisdom-Saving Plans,  Next: Caveats in Using Wisdom,  Prev: Multi-dimensional Array Format,  Up: Other Important Topics
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								3.3 Words of Wisdom--Saving Plans
							 | 
						|||
| 
								 | 
							
								=================================
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								FFTW implements a method for saving plans to disk and restoring them.
							 | 
						|||
| 
								 | 
							
								In fact, what FFTW does is more general than just saving and loading
							 | 
						|||
| 
								 | 
							
								plans.  The mechanism is called "wisdom".  Here, we describe this
							 | 
						|||
| 
								 | 
							
								feature at a high level.  *Note FFTW Reference::, for a less casual but
							 | 
						|||
| 
								 | 
							
								more complete discussion of how to use wisdom in FFTW.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Plans created with the 'FFTW_MEASURE', 'FFTW_PATIENT', or
							 | 
						|||
| 
								 | 
							
								'FFTW_EXHAUSTIVE' options produce near-optimal FFT performance, but may
							 | 
						|||
| 
								 | 
							
								require a long time to compute because FFTW must measure the runtime of
							 | 
						|||
| 
								 | 
							
								many possible plans and select the best one.  This setup is designed for
							 | 
						|||
| 
								 | 
							
								the situations where so many transforms of the same size must be
							 | 
						|||
| 
								 | 
							
								computed that the start-up time is irrelevant.  For short initialization
							 | 
						|||
| 
								 | 
							
								times, but slower transforms, we have provided 'FFTW_ESTIMATE'.  The
							 | 
						|||
| 
								 | 
							
								'wisdom' mechanism is a way to get the best of both worlds: you compute
							 | 
						|||
| 
								 | 
							
								a good plan once, save it to disk, and later reload it as many times as
							 | 
						|||
| 
								 | 
							
								necessary.  The wisdom mechanism can actually save and reload many plans
							 | 
						|||
| 
								 | 
							
								at once, not just one.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Whenever you create a plan, the FFTW planner accumulates wisdom,
							 | 
						|||
| 
								 | 
							
								which is information sufficient to reconstruct the plan.  After
							 | 
						|||
| 
								 | 
							
								planning, you can save this information to disk by means of the
							 | 
						|||
| 
								 | 
							
								function:
							 | 
						|||
| 
								 | 
							
								     int fftw_export_wisdom_to_filename(const char *filename);
							 | 
						|||
| 
								 | 
							
								   (This function returns non-zero on success.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The next time you run the program, you can restore the wisdom with
							 | 
						|||
| 
								 | 
							
								'fftw_import_wisdom_from_filename' (which also returns non-zero on
							 | 
						|||
| 
								 | 
							
								success), and then recreate the plan using the same flags as before.
							 | 
						|||
| 
								 | 
							
								     int fftw_import_wisdom_from_filename(const char *filename);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Wisdom is automatically used for any size to which it is applicable,
							 | 
						|||
| 
								 | 
							
								as long as the planner flags are not more "patient" than those with
							 | 
						|||
| 
								 | 
							
								which the wisdom was created.  For example, wisdom created with
							 | 
						|||
| 
								 | 
							
								'FFTW_MEASURE' can be used if you later plan with 'FFTW_ESTIMATE' or
							 | 
						|||
| 
								 | 
							
								'FFTW_MEASURE', but not with 'FFTW_PATIENT'.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The 'wisdom' is cumulative, and is stored in a global, private data
							 | 
						|||
| 
								 | 
							
								structure managed internally by FFTW. The storage space required is
							 | 
						|||
| 
								 | 
							
								minimal, proportional to the logarithm of the sizes the wisdom was
							 | 
						|||
| 
								 | 
							
								generated from.  If memory usage is a concern, however, the wisdom can
							 | 
						|||
| 
								 | 
							
								be forgotten and its associated memory freed by calling:
							 | 
						|||
| 
								 | 
							
								     void fftw_forget_wisdom(void);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Wisdom can be exported to a file, a string, or any other medium.  For
							 | 
						|||
| 
								 | 
							
								details, see *note Wisdom::.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Caveats in Using Wisdom,  Prev: Words of Wisdom-Saving Plans,  Up: Other Important Topics
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								3.4 Caveats in Using Wisdom
							 | 
						|||
| 
								 | 
							
								===========================
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     For in much wisdom is much grief, and he that increaseth knowledge
							 | 
						|||
| 
								 | 
							
								     increaseth sorrow.  [Ecclesiastes 1:18]
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   There are pitfalls to using wisdom, in that it can negate FFTW's
							 | 
						|||
| 
								 | 
							
								ability to adapt to changing hardware and other conditions.  For
							 | 
						|||
| 
								 | 
							
								example, it would be perfectly possible to export wisdom from a program
							 | 
						|||
| 
								 | 
							
								running on one processor and import it into a program running on another
							 | 
						|||
| 
								 | 
							
								processor.  Doing so, however, would mean that the second program would
							 | 
						|||
| 
								 | 
							
								use plans optimized for the first processor, instead of the one it is
							 | 
						|||
| 
								 | 
							
								running on.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   It should be safe to reuse wisdom as long as the hardware and program
							 | 
						|||
| 
								 | 
							
								binaries remain unchanged.  (Actually, the optimal plan may change even
							 | 
						|||
| 
								 | 
							
								between runs of the same binary on identical hardware, due to
							 | 
						|||
| 
								 | 
							
								differences in the virtual memory environment, etcetera.  Users
							 | 
						|||
| 
								 | 
							
								seriously interested in performance should worry about this problem,
							 | 
						|||
| 
								 | 
							
								too.)  It is likely that, if the same wisdom is used for two different
							 | 
						|||
| 
								 | 
							
								program binaries, even running on the same machine, the plans may be
							 | 
						|||
| 
								 | 
							
								sub-optimal because of differing code alignments.  It is therefore wise
							 | 
						|||
| 
								 | 
							
								to recreate wisdom every time an application is recompiled.  The more
							 | 
						|||
| 
								 | 
							
								the underlying hardware and software changes between the creation of
							 | 
						|||
| 
								 | 
							
								wisdom and its use, the greater grows the risk of sub-optimal plans.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Nevertheless, if the choice is between using 'FFTW_ESTIMATE' or using
							 | 
						|||
| 
								 | 
							
								possibly-suboptimal wisdom (created on the same machine, but for a
							 | 
						|||
| 
								 | 
							
								different binary), the wisdom is likely to be better.  For this reason,
							 | 
						|||
| 
								 | 
							
								we provide a function to import wisdom from a standard system-wide
							 | 
						|||
| 
								 | 
							
								location ('/etc/fftw/wisdom' on Unix):
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     int fftw_import_system_wisdom(void);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   FFTW also provides a standalone program, 'fftw-wisdom' (described by
							 | 
						|||
| 
								 | 
							
								its own 'man' page on Unix) with which users can create wisdom, e.g.
							 | 
						|||
| 
								 | 
							
								for a canonical set of sizes to store in the system wisdom file.  *Note
							 | 
						|||
| 
								 | 
							
								Wisdom Utilities::.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: FFTW Reference,  Next: Multi-threaded FFTW,  Prev: Other Important Topics,  Up: Top
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								4 FFTW Reference
							 | 
						|||
| 
								 | 
							
								****************
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								This chapter provides a complete reference for all sequential (i.e.,
							 | 
						|||
| 
								 | 
							
								one-processor) FFTW functions.  Parallel transforms are described in
							 | 
						|||
| 
								 | 
							
								later chapters.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								* Menu:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								* Data Types and Files::
							 | 
						|||
| 
								 | 
							
								* Using Plans::
							 | 
						|||
| 
								 | 
							
								* Basic Interface::
							 | 
						|||
| 
								 | 
							
								* Advanced Interface::
							 | 
						|||
| 
								 | 
							
								* Guru Interface::
							 | 
						|||
| 
								 | 
							
								* New-array Execute Functions::
							 | 
						|||
| 
								 | 
							
								* Wisdom::
							 | 
						|||
| 
								 | 
							
								* What FFTW Really Computes::
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Data Types and Files,  Next: Using Plans,  Prev: FFTW Reference,  Up: FFTW Reference
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								4.1 Data Types and Files
							 | 
						|||
| 
								 | 
							
								========================
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								All programs using FFTW should include its header file:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     #include <fftw3.h>
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   You must also link to the FFTW library.  On Unix, this means adding
							 | 
						|||
| 
								 | 
							
								'-lfftw3 -lm' at the _end_ of the link command.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								* Menu:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								* Complex numbers::
							 | 
						|||
| 
								 | 
							
								* Precision::
							 | 
						|||
| 
								 | 
							
								* Memory Allocation::
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Complex numbers,  Next: Precision,  Prev: Data Types and Files,  Up: Data Types and Files
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								4.1.1 Complex numbers
							 | 
						|||
| 
								 | 
							
								---------------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								The default FFTW interface uses 'double' precision for all
							 | 
						|||
| 
								 | 
							
								floating-point numbers, and defines a 'fftw_complex' type to hold
							 | 
						|||
| 
								 | 
							
								complex numbers as:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     typedef double fftw_complex[2];
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Here, the '[0]' element holds the real part and the '[1]' element
							 | 
						|||
| 
								 | 
							
								holds the imaginary part.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Alternatively, if you have a C compiler (such as 'gcc') that supports
							 | 
						|||
| 
								 | 
							
								the C99 revision of the ANSI C standard, you can use C's new native
							 | 
						|||
| 
								 | 
							
								complex type (which is binary-compatible with the typedef above).  In
							 | 
						|||
| 
								 | 
							
								particular, if you '#include <complex.h>' _before_ '<fftw3.h>', then
							 | 
						|||
| 
								 | 
							
								'fftw_complex' is defined to be the native complex type and you can
							 | 
						|||
| 
								 | 
							
								manipulate it with ordinary arithmetic (e.g.  'x = y * (3+4*I)', where
							 | 
						|||
| 
								 | 
							
								'x' and 'y' are 'fftw_complex' and 'I' is the standard symbol for the
							 | 
						|||
| 
								 | 
							
								imaginary unit);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   C++ has its own 'complex<T>' template class, defined in the standard
							 | 
						|||
| 
								 | 
							
								'<complex>' header file.  Reportedly, the C++ standards committee has
							 | 
						|||
| 
								 | 
							
								recently agreed to mandate that the storage format used for this type be
							 | 
						|||
| 
								 | 
							
								binary-compatible with the C99 type, i.e.  an array 'T[2]' with
							 | 
						|||
| 
								 | 
							
								consecutive real '[0]' and imaginary '[1]' parts.  (See report
							 | 
						|||
| 
								 | 
							
								<http://www.open-std.org/jtc1/sc22/WG21/docs/papers/2002/n1388.pdf
							 | 
						|||
| 
								 | 
							
								WG21/N1388>.)  Although not part of the official standard as of this
							 | 
						|||
| 
								 | 
							
								writing, the proposal stated that: "This solution has been tested with
							 | 
						|||
| 
								 | 
							
								all current major implementations of the standard library and shown to
							 | 
						|||
| 
								 | 
							
								be working."  To the extent that this is true, if you have a variable
							 | 
						|||
| 
								 | 
							
								'complex<double> *x', you can pass it directly to FFTW via
							 | 
						|||
| 
								 | 
							
								'reinterpret_cast<fftw_complex*>(x)'.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Precision,  Next: Memory Allocation,  Prev: Complex numbers,  Up: Data Types and Files
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								4.1.2 Precision
							 | 
						|||
| 
								 | 
							
								---------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								You can install single and long-double precision versions of FFTW, which
							 | 
						|||
| 
								 | 
							
								replace 'double' with 'float' and 'long double', respectively (*note
							 | 
						|||
| 
								 | 
							
								Installation and Customization::).  To use these interfaces, you:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * Link to the single/long-double libraries; on Unix, '-lfftw3f' or
							 | 
						|||
| 
								 | 
							
								     '-lfftw3l' instead of (or in addition to) '-lfftw3'.  (You can link
							 | 
						|||
| 
								 | 
							
								     to the different-precision libraries simultaneously.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * Include the _same_ '<fftw3.h>' header file.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * Replace all lowercase instances of 'fftw_' with 'fftwf_' or
							 | 
						|||
| 
								 | 
							
								     'fftwl_' for single or long-double precision, respectively.
							 | 
						|||
| 
								 | 
							
								     ('fftw_complex' becomes 'fftwf_complex', 'fftw_execute' becomes
							 | 
						|||
| 
								 | 
							
								     'fftwf_execute', etcetera.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * Uppercase names, i.e.  names beginning with 'FFTW_', remain the
							 | 
						|||
| 
								 | 
							
								     same.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * Replace 'double' with 'float' or 'long double' for subroutine
							 | 
						|||
| 
								 | 
							
								     parameters.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Depending upon your compiler and/or hardware, 'long double' may not
							 | 
						|||
| 
								 | 
							
								be any more precise than 'double' (or may not be supported at all,
							 | 
						|||
| 
								 | 
							
								although it is standard in C99).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   We also support using the nonstandard '__float128'
							 | 
						|||
| 
								 | 
							
								quadruple-precision type provided by recent versions of 'gcc' on 32- and
							 | 
						|||
| 
								 | 
							
								64-bit x86 hardware (*note Installation and Customization::).  To use
							 | 
						|||
| 
								 | 
							
								this type, link with '-lfftw3q -lquadmath -lm' (the 'libquadmath'
							 | 
						|||
| 
								 | 
							
								library provided by 'gcc' is needed for quadruple-precision
							 | 
						|||
| 
								 | 
							
								trigonometric functions) and use 'fftwq_' identifiers.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Memory Allocation,  Prev: Precision,  Up: Data Types and Files
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								4.1.3 Memory Allocation
							 | 
						|||
| 
								 | 
							
								-----------------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     void *fftw_malloc(size_t n);
							 | 
						|||
| 
								 | 
							
								     void fftw_free(void *p);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   These are functions that behave identically to 'malloc' and 'free',
							 | 
						|||
| 
								 | 
							
								except that they guarantee that the returned pointer obeys any special
							 | 
						|||
| 
								 | 
							
								alignment restrictions imposed by any algorithm in FFTW (e.g.  for SIMD
							 | 
						|||
| 
								 | 
							
								acceleration).  *Note SIMD alignment and fftw_malloc::.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Data allocated by 'fftw_malloc' _must_ be deallocated by 'fftw_free'
							 | 
						|||
| 
								 | 
							
								and not by the ordinary 'free'.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   These routines simply call through to your operating system's
							 | 
						|||
| 
								 | 
							
								'malloc' or, if necessary, its aligned equivalent (e.g.  'memalign'), so
							 | 
						|||
| 
								 | 
							
								you normally need not worry about any significant time or space
							 | 
						|||
| 
								 | 
							
								overhead.  You are _not required_ to use them to allocate your data, but
							 | 
						|||
| 
								 | 
							
								we strongly recommend it.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Note: in C++, just as with ordinary 'malloc', you must typecast the
							 | 
						|||
| 
								 | 
							
								output of 'fftw_malloc' to whatever pointer type you are allocating.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   We also provide the following two convenience functions to allocate
							 | 
						|||
| 
								 | 
							
								real and complex arrays with 'n' elements, which are equivalent to
							 | 
						|||
| 
								 | 
							
								'(double *) fftw_malloc(sizeof(double) * n)' and '(fftw_complex *)
							 | 
						|||
| 
								 | 
							
								fftw_malloc(sizeof(fftw_complex) * n)', respectively:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     double *fftw_alloc_real(size_t n);
							 | 
						|||
| 
								 | 
							
								     fftw_complex *fftw_alloc_complex(size_t n);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The equivalent functions in other precisions allocate arrays of 'n'
							 | 
						|||
| 
								 | 
							
								elements in that precision.  e.g.  'fftwf_alloc_real(n)' is equivalent
							 | 
						|||
| 
								 | 
							
								to '(float *) fftwf_malloc(sizeof(float) * n)'.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Using Plans,  Next: Basic Interface,  Prev: Data Types and Files,  Up: FFTW Reference
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								4.2 Using Plans
							 | 
						|||
| 
								 | 
							
								===============
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								Plans for all transform types in FFTW are stored as type 'fftw_plan' (an
							 | 
						|||
| 
								 | 
							
								opaque pointer type), and are created by one of the various planning
							 | 
						|||
| 
								 | 
							
								routines described in the following sections.  An 'fftw_plan' contains
							 | 
						|||
| 
								 | 
							
								all information necessary to compute the transform, including the
							 | 
						|||
| 
								 | 
							
								pointers to the input and output arrays.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     void fftw_execute(const fftw_plan plan);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   This executes the 'plan', to compute the corresponding transform on
							 | 
						|||
| 
								 | 
							
								the arrays for which it was planned (which must still exist).  The plan
							 | 
						|||
| 
								 | 
							
								is not modified, and 'fftw_execute' can be called as many times as
							 | 
						|||
| 
								 | 
							
								desired.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   To apply a given plan to a different array, you can use the new-array
							 | 
						|||
| 
								 | 
							
								execute interface.  *Note New-array Execute Functions::.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   'fftw_execute' (and equivalents) is the only function in FFTW
							 | 
						|||
| 
								 | 
							
								guaranteed to be thread-safe; see *note Thread safety::.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   This function:
							 | 
						|||
| 
								 | 
							
								     void fftw_destroy_plan(fftw_plan plan);
							 | 
						|||
| 
								 | 
							
								   deallocates the 'plan' and all its associated data.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   FFTW's planner saves some other persistent data, such as the
							 | 
						|||
| 
								 | 
							
								accumulated wisdom and a list of algorithms available in the current
							 | 
						|||
| 
								 | 
							
								configuration.  If you want to deallocate all of that and reset FFTW to
							 | 
						|||
| 
								 | 
							
								the pristine state it was in when you started your program, you can
							 | 
						|||
| 
								 | 
							
								call:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     void fftw_cleanup(void);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   After calling 'fftw_cleanup', all existing plans become undefined,
							 | 
						|||
| 
								 | 
							
								and you should not attempt to execute them nor to destroy them.  You can
							 | 
						|||
| 
								 | 
							
								however create and execute/destroy new plans, in which case FFTW starts
							 | 
						|||
| 
								 | 
							
								accumulating wisdom information again.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   'fftw_cleanup' does not deallocate your plans, however.  To prevent
							 | 
						|||
| 
								 | 
							
								memory leaks, you must still call 'fftw_destroy_plan' before executing
							 | 
						|||
| 
								 | 
							
								'fftw_cleanup'.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Occasionally, it may useful to know FFTW's internal "cost" metric
							 | 
						|||
| 
								 | 
							
								that it uses to compare plans to one another; this cost is proportional
							 | 
						|||
| 
								 | 
							
								to an execution time of the plan, in undocumented units, if the plan was
							 | 
						|||
| 
								 | 
							
								created with the 'FFTW_MEASURE' or other timing-based options, or
							 | 
						|||
| 
								 | 
							
								alternatively is a heuristic cost function for 'FFTW_ESTIMATE' plans.
							 | 
						|||
| 
								 | 
							
								(The cost values of measured and estimated plans are not comparable,
							 | 
						|||
| 
								 | 
							
								being in different units.  Also, costs from different FFTW versions or
							 | 
						|||
| 
								 | 
							
								the same version compiled differently may not be in the same units.
							 | 
						|||
| 
								 | 
							
								Plans created from wisdom have a cost of 0 since no timing measurement
							 | 
						|||
| 
								 | 
							
								is performed for them.  Finally, certain problems for which only one
							 | 
						|||
| 
								 | 
							
								top-level algorithm was possible may have required no measurements of
							 | 
						|||
| 
								 | 
							
								the cost of the whole plan, in which case 'fftw_cost' will also return
							 | 
						|||
| 
								 | 
							
								0.)  The cost metric for a given plan is returned by:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     double fftw_cost(const fftw_plan plan);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The following two routines are provided purely for academic purposes
							 | 
						|||
| 
								 | 
							
								(that is, for entertainment).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     void fftw_flops(const fftw_plan plan,
							 | 
						|||
| 
								 | 
							
								                     double *add, double *mul, double *fma);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Given a 'plan', set 'add', 'mul', and 'fma' to an exact count of the
							 | 
						|||
| 
								 | 
							
								number of floating-point additions, multiplications, and fused
							 | 
						|||
| 
								 | 
							
								multiply-add operations involved in the plan's execution.  The total
							 | 
						|||
| 
								 | 
							
								number of floating-point operations (flops) is 'add + mul + 2*fma', or
							 | 
						|||
| 
								 | 
							
								'add + mul + fma' if the hardware supports fused multiply-add
							 | 
						|||
| 
								 | 
							
								instructions (although the number of FMA operations is only approximate
							 | 
						|||
| 
								 | 
							
								because of compiler voodoo).  (The number of operations should be an
							 | 
						|||
| 
								 | 
							
								integer, but we use 'double' to avoid overflowing 'int' for large
							 | 
						|||
| 
								 | 
							
								transforms; the arguments are of type 'double' even for single and
							 | 
						|||
| 
								 | 
							
								long-double precision versions of FFTW.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     void fftw_fprint_plan(const fftw_plan plan, FILE *output_file);
							 | 
						|||
| 
								 | 
							
								     void fftw_print_plan(const fftw_plan plan);
							 | 
						|||
| 
								 | 
							
								     char *fftw_sprint_plan(const fftw_plan plan);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   This outputs a "nerd-readable" representation of the 'plan' to the
							 | 
						|||
| 
								 | 
							
								given file, to 'stdout', or two a newly allocated NUL-terminated string
							 | 
						|||
| 
								 | 
							
								(which the caller is responsible for deallocating with 'free'),
							 | 
						|||
| 
								 | 
							
								respectively.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Basic Interface,  Next: Advanced Interface,  Prev: Using Plans,  Up: FFTW Reference
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								4.3 Basic Interface
							 | 
						|||
| 
								 | 
							
								===================
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								Recall that the FFTW API is divided into three parts(1): the "basic
							 | 
						|||
| 
								 | 
							
								interface" computes a single transform of contiguous data, the "advanced
							 | 
						|||
| 
								 | 
							
								interface" computes transforms of multiple or strided arrays, and the
							 | 
						|||
| 
								 | 
							
								"guru interface" supports the most general data layouts, multiplicities,
							 | 
						|||
| 
								 | 
							
								and strides.  This section describes the basic interface, which we
							 | 
						|||
| 
								 | 
							
								expect to satisfy the needs of most users.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								* Menu:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								* Complex DFTs::
							 | 
						|||
| 
								 | 
							
								* Planner Flags::
							 | 
						|||
| 
								 | 
							
								* Real-data DFTs::
							 | 
						|||
| 
								 | 
							
								* Real-data DFT Array Format::
							 | 
						|||
| 
								 | 
							
								* Real-to-Real Transforms::
							 | 
						|||
| 
								 | 
							
								* Real-to-Real Transform Kinds::
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   ---------- Footnotes ----------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   (1) Gallia est omnis divisa in partes tres (Julius Caesar).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Complex DFTs,  Next: Planner Flags,  Prev: Basic Interface,  Up: Basic Interface
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								4.3.1 Complex DFTs
							 | 
						|||
| 
								 | 
							
								------------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_plan_dft_1d(int n0,
							 | 
						|||
| 
								 | 
							
								                                fftw_complex *in, fftw_complex *out,
							 | 
						|||
| 
								 | 
							
								                                int sign, unsigned flags);
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_plan_dft_2d(int n0, int n1,
							 | 
						|||
| 
								 | 
							
								                                fftw_complex *in, fftw_complex *out,
							 | 
						|||
| 
								 | 
							
								                                int sign, unsigned flags);
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_plan_dft_3d(int n0, int n1, int n2,
							 | 
						|||
| 
								 | 
							
								                                fftw_complex *in, fftw_complex *out,
							 | 
						|||
| 
								 | 
							
								                                int sign, unsigned flags);
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_plan_dft(int rank, const int *n,
							 | 
						|||
| 
								 | 
							
								                             fftw_complex *in, fftw_complex *out,
							 | 
						|||
| 
								 | 
							
								                             int sign, unsigned flags);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Plan a complex input/output discrete Fourier transform (DFT) in zero
							 | 
						|||
| 
								 | 
							
								or more dimensions, returning an 'fftw_plan' (*note Using Plans::).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Once you have created a plan for a certain transform type and
							 | 
						|||
| 
								 | 
							
								parameters, then creating another plan of the same type and parameters,
							 | 
						|||
| 
								 | 
							
								but for different arrays, is fast and shares constant data with the
							 | 
						|||
| 
								 | 
							
								first plan (if it still exists).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The planner returns 'NULL' if the plan cannot be created.  In the
							 | 
						|||
| 
								 | 
							
								standard FFTW distribution, the basic interface is guaranteed to return
							 | 
						|||
| 
								 | 
							
								a non-'NULL' plan.  A plan may be 'NULL', however, if you are using a
							 | 
						|||
| 
								 | 
							
								customized FFTW configuration supporting a restricted set of transforms.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								Arguments
							 | 
						|||
| 
								 | 
							
								.........
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * 'rank' is the rank of the transform (it should be the size of the
							 | 
						|||
| 
								 | 
							
								     array '*n'), and can be any non-negative integer.  (*Note Complex
							 | 
						|||
| 
								 | 
							
								     Multi-Dimensional DFTs::, for the definition of "rank".)  The
							 | 
						|||
| 
								 | 
							
								     '_1d', '_2d', and '_3d' planners correspond to a 'rank' of '1',
							 | 
						|||
| 
								 | 
							
								     '2', and '3', respectively.  The rank may be zero, which is
							 | 
						|||
| 
								 | 
							
								     equivalent to a rank-1 transform of size 1, i.e.  a copy of one
							 | 
						|||
| 
								 | 
							
								     number from input to output.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * 'n0', 'n1', 'n2', or 'n[0..rank-1]' (as appropriate for each
							 | 
						|||
| 
								 | 
							
								     routine) specify the size of the transform dimensions.  They can be
							 | 
						|||
| 
								 | 
							
								     any positive integer.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								        - Multi-dimensional arrays are stored in row-major order with
							 | 
						|||
| 
								 | 
							
								          dimensions: 'n0' x 'n1'; or 'n0' x 'n1' x 'n2'; or 'n[0]' x
							 | 
						|||
| 
								 | 
							
								          'n[1]' x ...  x 'n[rank-1]'.  *Note Multi-dimensional Array
							 | 
						|||
| 
								 | 
							
								          Format::.
							 | 
						|||
| 
								 | 
							
								        - FFTW is best at handling sizes of the form 2^a 3^b 5^c 7^d
							 | 
						|||
| 
								 | 
							
								          11^e 13^f, where e+f is either 0 or 1, and the other exponents
							 | 
						|||
| 
								 | 
							
								          are arbitrary.  Other sizes are computed by means of a slow,
							 | 
						|||
| 
								 | 
							
								          general-purpose algorithm (which nevertheless retains O(n log
							 | 
						|||
| 
								 | 
							
								          n) performance even for prime sizes).  It is possible to
							 | 
						|||
| 
								 | 
							
								          customize FFTW for different array sizes; see *note
							 | 
						|||
| 
								 | 
							
								          Installation and Customization::.  Transforms whose sizes are
							 | 
						|||
| 
								 | 
							
								          powers of 2 are especially fast.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * 'in' and 'out' point to the input and output arrays of the
							 | 
						|||
| 
								 | 
							
								     transform, which may be the same (yielding an in-place transform).
							 | 
						|||
| 
								 | 
							
								     These arrays are overwritten during planning, unless
							 | 
						|||
| 
								 | 
							
								     'FFTW_ESTIMATE' is used in the flags.  (The arrays need not be
							 | 
						|||
| 
								 | 
							
								     initialized, but they must be allocated.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     If 'in == out', the transform is "in-place" and the input array is
							 | 
						|||
| 
								 | 
							
								     overwritten.  If 'in != out', the two arrays must not overlap (but
							 | 
						|||
| 
								 | 
							
								     FFTW does not check for this condition).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * 'sign' is the sign of the exponent in the formula that defines the
							 | 
						|||
| 
								 | 
							
								     Fourier transform.  It can be -1 (= 'FFTW_FORWARD') or +1 (=
							 | 
						|||
| 
								 | 
							
								     'FFTW_BACKWARD').
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * 'flags' is a bitwise OR ('|') of zero or more planner flags, as
							 | 
						|||
| 
								 | 
							
								     defined in *note Planner Flags::.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   FFTW computes an unnormalized transform: computing a forward followed
							 | 
						|||
| 
								 | 
							
								by a backward transform (or vice versa) will result in the original data
							 | 
						|||
| 
								 | 
							
								multiplied by the size of the transform (the product of the dimensions).
							 | 
						|||
| 
								 | 
							
								For more information, see *note What FFTW Really Computes::.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Planner Flags,  Next: Real-data DFTs,  Prev: Complex DFTs,  Up: Basic Interface
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								4.3.2 Planner Flags
							 | 
						|||
| 
								 | 
							
								-------------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								All of the planner routines in FFTW accept an integer 'flags' argument,
							 | 
						|||
| 
								 | 
							
								which is a bitwise OR ('|') of zero or more of the flag constants
							 | 
						|||
| 
								 | 
							
								defined below.  These flags control the rigor (and time) of the planning
							 | 
						|||
| 
								 | 
							
								process, and can also impose (or lift) restrictions on the type of
							 | 
						|||
| 
								 | 
							
								transform algorithm that is employed.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   _Important:_ the planner overwrites the input array during planning
							 | 
						|||
| 
								 | 
							
								unless a saved plan (*note Wisdom::) is available for that problem, so
							 | 
						|||
| 
								 | 
							
								you should initialize your input data after creating the plan.  The only
							 | 
						|||
| 
								 | 
							
								exceptions to this are the 'FFTW_ESTIMATE' and 'FFTW_WISDOM_ONLY' flags,
							 | 
						|||
| 
								 | 
							
								as mentioned below.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   In all cases, if wisdom is available for the given problem that was
							 | 
						|||
| 
								 | 
							
								created with equal-or-greater planning rigor, then the more rigorous
							 | 
						|||
| 
								 | 
							
								wisdom is used.  For example, in 'FFTW_ESTIMATE' mode any available
							 | 
						|||
| 
								 | 
							
								wisdom is used, whereas in 'FFTW_PATIENT' mode only wisdom created in
							 | 
						|||
| 
								 | 
							
								patient or exhaustive mode can be used.  *Note Words of Wisdom-Saving
							 | 
						|||
| 
								 | 
							
								Plans::.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								Planning-rigor flags
							 | 
						|||
| 
								 | 
							
								....................
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * 'FFTW_ESTIMATE' specifies that, instead of actual measurements of
							 | 
						|||
| 
								 | 
							
								     different algorithms, a simple heuristic is used to pick a
							 | 
						|||
| 
								 | 
							
								     (probably sub-optimal) plan quickly.  With this flag, the
							 | 
						|||
| 
								 | 
							
								     input/output arrays are not overwritten during planning.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * 'FFTW_MEASURE' tells FFTW to find an optimized plan by actually
							 | 
						|||
| 
								 | 
							
								     _computing_ several FFTs and measuring their execution time.
							 | 
						|||
| 
								 | 
							
								     Depending on your machine, this can take some time (often a few
							 | 
						|||
| 
								 | 
							
								     seconds).  'FFTW_MEASURE' is the default planning option.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * 'FFTW_PATIENT' is like 'FFTW_MEASURE', but considers a wider range
							 | 
						|||
| 
								 | 
							
								     of algorithms and often produces a "more optimal" plan (especially
							 | 
						|||
| 
								 | 
							
								     for large transforms), but at the expense of several times longer
							 | 
						|||
| 
								 | 
							
								     planning time (especially for large transforms).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * 'FFTW_EXHAUSTIVE' is like 'FFTW_PATIENT', but considers an even
							 | 
						|||
| 
								 | 
							
								     wider range of algorithms, including many that we think are
							 | 
						|||
| 
								 | 
							
								     unlikely to be fast, to produce the most optimal plan but with a
							 | 
						|||
| 
								 | 
							
								     substantially increased planning time.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * 'FFTW_WISDOM_ONLY' is a special planning mode in which the plan is
							 | 
						|||
| 
								 | 
							
								     only created if wisdom is available for the given problem, and
							 | 
						|||
| 
								 | 
							
								     otherwise a 'NULL' plan is returned.  This can be combined with
							 | 
						|||
| 
								 | 
							
								     other flags, e.g.  'FFTW_WISDOM_ONLY | FFTW_PATIENT' creates a plan
							 | 
						|||
| 
								 | 
							
								     only if wisdom is available that was created in 'FFTW_PATIENT' or
							 | 
						|||
| 
								 | 
							
								     'FFTW_EXHAUSTIVE' mode.  The 'FFTW_WISDOM_ONLY' flag is intended
							 | 
						|||
| 
								 | 
							
								     for users who need to detect whether wisdom is available; for
							 | 
						|||
| 
								 | 
							
								     example, if wisdom is not available one may wish to allocate new
							 | 
						|||
| 
								 | 
							
								     arrays for planning so that user data is not overwritten.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								Algorithm-restriction flags
							 | 
						|||
| 
								 | 
							
								...........................
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * 'FFTW_DESTROY_INPUT' specifies that an out-of-place transform is
							 | 
						|||
| 
								 | 
							
								     allowed to _overwrite its input_ array with arbitrary data; this
							 | 
						|||
| 
								 | 
							
								     can sometimes allow more efficient algorithms to be employed.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * 'FFTW_PRESERVE_INPUT' specifies that an out-of-place transform must
							 | 
						|||
| 
								 | 
							
								     _not change its input_ array.  This is ordinarily the _default_,
							 | 
						|||
| 
								 | 
							
								     except for c2r and hc2r (i.e.  complex-to-real) transforms for
							 | 
						|||
| 
								 | 
							
								     which 'FFTW_DESTROY_INPUT' is the default.  In the latter cases,
							 | 
						|||
| 
								 | 
							
								     passing 'FFTW_PRESERVE_INPUT' will attempt to use algorithms that
							 | 
						|||
| 
								 | 
							
								     do not destroy the input, at the expense of worse performance; for
							 | 
						|||
| 
								 | 
							
								     multi-dimensional c2r transforms, however, no input-preserving
							 | 
						|||
| 
								 | 
							
								     algorithms are implemented and the planner will return 'NULL' if
							 | 
						|||
| 
								 | 
							
								     one is requested.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * 'FFTW_UNALIGNED' specifies that the algorithm may not impose any
							 | 
						|||
| 
								 | 
							
								     unusual alignment requirements on the input/output arrays (i.e.  no
							 | 
						|||
| 
								 | 
							
								     SIMD may be used).  This flag is normally _not necessary_, since
							 | 
						|||
| 
								 | 
							
								     the planner automatically detects misaligned arrays.  The only use
							 | 
						|||
| 
								 | 
							
								     for this flag is if you want to use the new-array execute interface
							 | 
						|||
| 
								 | 
							
								     to execute a given plan on a different array that may not be
							 | 
						|||
| 
								 | 
							
								     aligned like the original.  (Using 'fftw_malloc' makes this flag
							 | 
						|||
| 
								 | 
							
								     unnecessary even then.  You can also use 'fftw_alignment_of' to
							 | 
						|||
| 
								 | 
							
								     detect whether two arrays are equivalently aligned.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								Limiting planning time
							 | 
						|||
| 
								 | 
							
								......................
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     extern void fftw_set_timelimit(double seconds);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   This function instructs FFTW to spend at most 'seconds' seconds
							 | 
						|||
| 
								 | 
							
								(approximately) in the planner.  If 'seconds == FFTW_NO_TIMELIMIT' (the
							 | 
						|||
| 
								 | 
							
								default value, which is negative), then planning time is unbounded.
							 | 
						|||
| 
								 | 
							
								Otherwise, FFTW plans with a progressively wider range of algorithms
							 | 
						|||
| 
								 | 
							
								until the given time limit is reached or the given range of algorithms
							 | 
						|||
| 
								 | 
							
								is explored, returning the best available plan.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   For example, specifying 'FFTW_PATIENT' first plans in 'FFTW_ESTIMATE'
							 | 
						|||
| 
								 | 
							
								mode, then in 'FFTW_MEASURE' mode, then finally (time permitting) in
							 | 
						|||
| 
								 | 
							
								'FFTW_PATIENT'.  If 'FFTW_EXHAUSTIVE' is specified instead, the planner
							 | 
						|||
| 
								 | 
							
								will further progress to 'FFTW_EXHAUSTIVE' mode.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Note that the 'seconds' argument specifies only a rough limit; in
							 | 
						|||
| 
								 | 
							
								practice, the planner may use somewhat more time if the time limit is
							 | 
						|||
| 
								 | 
							
								reached when the planner is in the middle of an operation that cannot be
							 | 
						|||
| 
								 | 
							
								interrupted.  At the very least, the planner will complete planning in
							 | 
						|||
| 
								 | 
							
								'FFTW_ESTIMATE' mode (which is thus equivalent to a time limit of 0).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Real-data DFTs,  Next: Real-data DFT Array Format,  Prev: Planner Flags,  Up: Basic Interface
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								4.3.3 Real-data DFTs
							 | 
						|||
| 
								 | 
							
								--------------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_plan_dft_r2c_1d(int n0,
							 | 
						|||
| 
								 | 
							
								                                    double *in, fftw_complex *out,
							 | 
						|||
| 
								 | 
							
								                                    unsigned flags);
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_plan_dft_r2c_2d(int n0, int n1,
							 | 
						|||
| 
								 | 
							
								                                    double *in, fftw_complex *out,
							 | 
						|||
| 
								 | 
							
								                                    unsigned flags);
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_plan_dft_r2c_3d(int n0, int n1, int n2,
							 | 
						|||
| 
								 | 
							
								                                    double *in, fftw_complex *out,
							 | 
						|||
| 
								 | 
							
								                                    unsigned flags);
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_plan_dft_r2c(int rank, const int *n,
							 | 
						|||
| 
								 | 
							
								                                 double *in, fftw_complex *out,
							 | 
						|||
| 
								 | 
							
								                                 unsigned flags);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Plan a real-input/complex-output discrete Fourier transform (DFT) in
							 | 
						|||
| 
								 | 
							
								zero or more dimensions, returning an 'fftw_plan' (*note Using Plans::).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Once you have created a plan for a certain transform type and
							 | 
						|||
| 
								 | 
							
								parameters, then creating another plan of the same type and parameters,
							 | 
						|||
| 
								 | 
							
								but for different arrays, is fast and shares constant data with the
							 | 
						|||
| 
								 | 
							
								first plan (if it still exists).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The planner returns 'NULL' if the plan cannot be created.  A
							 | 
						|||
| 
								 | 
							
								non-'NULL' plan is always returned by the basic interface unless you are
							 | 
						|||
| 
								 | 
							
								using a customized FFTW configuration supporting a restricted set of
							 | 
						|||
| 
								 | 
							
								transforms, or if you use the 'FFTW_PRESERVE_INPUT' flag with a
							 | 
						|||
| 
								 | 
							
								multi-dimensional out-of-place c2r transform (see below).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								Arguments
							 | 
						|||
| 
								 | 
							
								.........
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * 'rank' is the rank of the transform (it should be the size of the
							 | 
						|||
| 
								 | 
							
								     array '*n'), and can be any non-negative integer.  (*Note Complex
							 | 
						|||
| 
								 | 
							
								     Multi-Dimensional DFTs::, for the definition of "rank".)  The
							 | 
						|||
| 
								 | 
							
								     '_1d', '_2d', and '_3d' planners correspond to a 'rank' of '1',
							 | 
						|||
| 
								 | 
							
								     '2', and '3', respectively.  The rank may be zero, which is
							 | 
						|||
| 
								 | 
							
								     equivalent to a rank-1 transform of size 1, i.e.  a copy of one
							 | 
						|||
| 
								 | 
							
								     real number (with zero imaginary part) from input to output.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * 'n0', 'n1', 'n2', or 'n[0..rank-1]', (as appropriate for each
							 | 
						|||
| 
								 | 
							
								     routine) specify the size of the transform dimensions.  They can be
							 | 
						|||
| 
								 | 
							
								     any positive integer.  This is different in general from the
							 | 
						|||
| 
								 | 
							
								     _physical_ array dimensions, which are described in *note Real-data
							 | 
						|||
| 
								 | 
							
								     DFT Array Format::.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								        - FFTW is best at handling sizes of the form 2^a 3^b 5^c 7^d
							 | 
						|||
| 
								 | 
							
								          11^e 13^f, where e+f is either 0 or 1, and the other exponents
							 | 
						|||
| 
								 | 
							
								          are arbitrary.  Other sizes are computed by means of a slow,
							 | 
						|||
| 
								 | 
							
								          general-purpose algorithm (which nevertheless retains O(n log
							 | 
						|||
| 
								 | 
							
								          n) performance even for prime sizes).  (It is possible to
							 | 
						|||
| 
								 | 
							
								          customize FFTW for different array sizes; see *note
							 | 
						|||
| 
								 | 
							
								          Installation and Customization::.)  Transforms whose sizes are
							 | 
						|||
| 
								 | 
							
								          powers of 2 are especially fast, and it is generally
							 | 
						|||
| 
								 | 
							
								          beneficial for the _last_ dimension of an r2c/c2r transform to
							 | 
						|||
| 
								 | 
							
								          be _even_.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * 'in' and 'out' point to the input and output arrays of the
							 | 
						|||
| 
								 | 
							
								     transform, which may be the same (yielding an in-place transform).
							 | 
						|||
| 
								 | 
							
								     These arrays are overwritten during planning, unless
							 | 
						|||
| 
								 | 
							
								     'FFTW_ESTIMATE' is used in the flags.  (The arrays need not be
							 | 
						|||
| 
								 | 
							
								     initialized, but they must be allocated.)  For an in-place
							 | 
						|||
| 
								 | 
							
								     transform, it is important to remember that the real array will
							 | 
						|||
| 
								 | 
							
								     require padding, described in *note Real-data DFT Array Format::.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * 'flags' is a bitwise OR ('|') of zero or more planner flags, as
							 | 
						|||
| 
								 | 
							
								     defined in *note Planner Flags::.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The inverse transforms, taking complex input (storing the
							 | 
						|||
| 
								 | 
							
								non-redundant half of a logically Hermitian array) to real output, are
							 | 
						|||
| 
								 | 
							
								given by:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_plan_dft_c2r_1d(int n0,
							 | 
						|||
| 
								 | 
							
								                                    fftw_complex *in, double *out,
							 | 
						|||
| 
								 | 
							
								                                    unsigned flags);
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_plan_dft_c2r_2d(int n0, int n1,
							 | 
						|||
| 
								 | 
							
								                                    fftw_complex *in, double *out,
							 | 
						|||
| 
								 | 
							
								                                    unsigned flags);
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_plan_dft_c2r_3d(int n0, int n1, int n2,
							 | 
						|||
| 
								 | 
							
								                                    fftw_complex *in, double *out,
							 | 
						|||
| 
								 | 
							
								                                    unsigned flags);
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_plan_dft_c2r(int rank, const int *n,
							 | 
						|||
| 
								 | 
							
								                                 fftw_complex *in, double *out,
							 | 
						|||
| 
								 | 
							
								                                 unsigned flags);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The arguments are the same as for the r2c transforms, except that the
							 | 
						|||
| 
								 | 
							
								input and output data formats are reversed.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   FFTW computes an unnormalized transform: computing an r2c followed by
							 | 
						|||
| 
								 | 
							
								a c2r transform (or vice versa) will result in the original data
							 | 
						|||
| 
								 | 
							
								multiplied by the size of the transform (the product of the logical
							 | 
						|||
| 
								 | 
							
								dimensions).  An r2c transform produces the same output as a
							 | 
						|||
| 
								 | 
							
								'FFTW_FORWARD' complex DFT of the same input, and a c2r transform is
							 | 
						|||
| 
								 | 
							
								correspondingly equivalent to 'FFTW_BACKWARD'.  For more information,
							 | 
						|||
| 
								 | 
							
								see *note What FFTW Really Computes::.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Real-data DFT Array Format,  Next: Real-to-Real Transforms,  Prev: Real-data DFTs,  Up: Basic Interface
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								4.3.4 Real-data DFT Array Format
							 | 
						|||
| 
								 | 
							
								--------------------------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								The output of a DFT of real data (r2c) contains symmetries that, in
							 | 
						|||
| 
								 | 
							
								principle, make half of the outputs redundant (*note What FFTW Really
							 | 
						|||
| 
								 | 
							
								Computes::).  (Similarly for the input of an inverse c2r transform.)  In
							 | 
						|||
| 
								 | 
							
								practice, it is not possible to entirely realize these savings in an
							 | 
						|||
| 
								 | 
							
								efficient and understandable format that generalizes to
							 | 
						|||
| 
								 | 
							
								multi-dimensional transforms.  Instead, the output of the r2c transforms
							 | 
						|||
| 
								 | 
							
								is _slightly_ over half of the output of the corresponding complex
							 | 
						|||
| 
								 | 
							
								transform.  We do not "pack" the data in any way, but store it as an
							 | 
						|||
| 
								 | 
							
								ordinary array of 'fftw_complex' values.  In fact, this data is simply a
							 | 
						|||
| 
								 | 
							
								subsection of what would be the array in the corresponding complex
							 | 
						|||
| 
								 | 
							
								transform.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Specifically, for a real transform of d (= 'rank') dimensions n[0] x
							 | 
						|||
| 
								 | 
							
								n[1] x n[2] x ...  x n[d-1] , the complex data is an n[0] x n[1] x n[2]
							 | 
						|||
| 
								 | 
							
								x ...  x (n[d-1]/2 + 1) array of 'fftw_complex' values in row-major
							 | 
						|||
| 
								 | 
							
								order (with the division rounded down).  That is, we only store the
							 | 
						|||
| 
								 | 
							
								_lower_ half (non-negative frequencies), plus one element, of the last
							 | 
						|||
| 
								 | 
							
								dimension of the data from the ordinary complex transform.  (We could
							 | 
						|||
| 
								 | 
							
								have instead taken half of any other dimension, but implementation turns
							 | 
						|||
| 
								 | 
							
								out to be simpler if the last, contiguous, dimension is used.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   For an out-of-place transform, the real data is simply an array with
							 | 
						|||
| 
								 | 
							
								physical dimensions n[0] x n[1] x n[2] x ...  x n[d-1] in row-major
							 | 
						|||
| 
								 | 
							
								order.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   For an in-place transform, some complications arise since the complex
							 | 
						|||
| 
								 | 
							
								data is slightly larger than the real data.  In this case, the final
							 | 
						|||
| 
								 | 
							
								dimension of the real data must be _padded_ with extra values to
							 | 
						|||
| 
								 | 
							
								accommodate the size of the complex data--two extra if the last
							 | 
						|||
| 
								 | 
							
								dimension is even and one if it is odd.  That is, the last dimension of
							 | 
						|||
| 
								 | 
							
								the real data must physically contain 2 * (n[d-1]/2+1) 'double' values
							 | 
						|||
| 
								 | 
							
								(exactly enough to hold the complex data).  This physical array size
							 | 
						|||
| 
								 | 
							
								does not, however, change the _logical_ array size--only n[d-1] values
							 | 
						|||
| 
								 | 
							
								are actually stored in the last dimension, and n[d-1] is the last
							 | 
						|||
| 
								 | 
							
								dimension passed to the planner.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Real-to-Real Transforms,  Next: Real-to-Real Transform Kinds,  Prev: Real-data DFT Array Format,  Up: Basic Interface
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								4.3.5 Real-to-Real Transforms
							 | 
						|||
| 
								 | 
							
								-----------------------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_plan_r2r_1d(int n, double *in, double *out,
							 | 
						|||
| 
								 | 
							
								                                fftw_r2r_kind kind, unsigned flags);
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_plan_r2r_2d(int n0, int n1, double *in, double *out,
							 | 
						|||
| 
								 | 
							
								                                fftw_r2r_kind kind0, fftw_r2r_kind kind1,
							 | 
						|||
| 
								 | 
							
								                                unsigned flags);
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_plan_r2r_3d(int n0, int n1, int n2,
							 | 
						|||
| 
								 | 
							
								                                double *in, double *out,
							 | 
						|||
| 
								 | 
							
								                                fftw_r2r_kind kind0,
							 | 
						|||
| 
								 | 
							
								                                fftw_r2r_kind kind1,
							 | 
						|||
| 
								 | 
							
								                                fftw_r2r_kind kind2,
							 | 
						|||
| 
								 | 
							
								                                unsigned flags);
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_plan_r2r(int rank, const int *n, double *in, double *out,
							 | 
						|||
| 
								 | 
							
								                             const fftw_r2r_kind *kind, unsigned flags);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Plan a real input/output (r2r) transform of various kinds in zero or
							 | 
						|||
| 
								 | 
							
								more dimensions, returning an 'fftw_plan' (*note Using Plans::).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Once you have created a plan for a certain transform type and
							 | 
						|||
| 
								 | 
							
								parameters, then creating another plan of the same type and parameters,
							 | 
						|||
| 
								 | 
							
								but for different arrays, is fast and shares constant data with the
							 | 
						|||
| 
								 | 
							
								first plan (if it still exists).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The planner returns 'NULL' if the plan cannot be created.  A
							 | 
						|||
| 
								 | 
							
								non-'NULL' plan is always returned by the basic interface unless you are
							 | 
						|||
| 
								 | 
							
								using a customized FFTW configuration supporting a restricted set of
							 | 
						|||
| 
								 | 
							
								transforms, or for size-1 'FFTW_REDFT00' kinds (which are not defined).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								Arguments
							 | 
						|||
| 
								 | 
							
								.........
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * 'rank' is the dimensionality of the transform (it should be the
							 | 
						|||
| 
								 | 
							
								     size of the arrays '*n' and '*kind'), and can be any non-negative
							 | 
						|||
| 
								 | 
							
								     integer.  The '_1d', '_2d', and '_3d' planners correspond to a
							 | 
						|||
| 
								 | 
							
								     'rank' of '1', '2', and '3', respectively.  A 'rank' of zero is
							 | 
						|||
| 
								 | 
							
								     equivalent to a copy of one number from input to output.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * 'n', or 'n0'/'n1'/'n2', or 'n[rank]', respectively, gives the
							 | 
						|||
| 
								 | 
							
								     (physical) size of the transform dimensions.  They can be any
							 | 
						|||
| 
								 | 
							
								     positive integer.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								        - Multi-dimensional arrays are stored in row-major order with
							 | 
						|||
| 
								 | 
							
								          dimensions: 'n0' x 'n1'; or 'n0' x 'n1' x 'n2'; or 'n[0]' x
							 | 
						|||
| 
								 | 
							
								          'n[1]' x ...  x 'n[rank-1]'.  *Note Multi-dimensional Array
							 | 
						|||
| 
								 | 
							
								          Format::.
							 | 
						|||
| 
								 | 
							
								        - FFTW is generally best at handling sizes of the form 2^a 3^b
							 | 
						|||
| 
								 | 
							
								          5^c 7^d 11^e 13^f, where e+f is either 0 or 1, and the other
							 | 
						|||
| 
								 | 
							
								          exponents are arbitrary.  Other sizes are computed by means of
							 | 
						|||
| 
								 | 
							
								          a slow, general-purpose algorithm (which nevertheless retains
							 | 
						|||
| 
								 | 
							
								          O(n log n) performance even for prime sizes).  (It is possible
							 | 
						|||
| 
								 | 
							
								          to customize FFTW for different array sizes; see *note
							 | 
						|||
| 
								 | 
							
								          Installation and Customization::.)  Transforms whose sizes are
							 | 
						|||
| 
								 | 
							
								          powers of 2 are especially fast.
							 | 
						|||
| 
								 | 
							
								        - For a 'REDFT00' or 'RODFT00' transform kind in a dimension of
							 | 
						|||
| 
								 | 
							
								          size n, it is n-1 or n+1, respectively, that should be
							 | 
						|||
| 
								 | 
							
								          factorizable in the above form.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * 'in' and 'out' point to the input and output arrays of the
							 | 
						|||
| 
								 | 
							
								     transform, which may be the same (yielding an in-place transform).
							 | 
						|||
| 
								 | 
							
								     These arrays are overwritten during planning, unless
							 | 
						|||
| 
								 | 
							
								     'FFTW_ESTIMATE' is used in the flags.  (The arrays need not be
							 | 
						|||
| 
								 | 
							
								     initialized, but they must be allocated.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * 'kind', or 'kind0'/'kind1'/'kind2', or 'kind[rank]', is the kind of
							 | 
						|||
| 
								 | 
							
								     r2r transform used for the corresponding dimension.  The valid kind
							 | 
						|||
| 
								 | 
							
								     constants are described in *note Real-to-Real Transform Kinds::.
							 | 
						|||
| 
								 | 
							
								     In a multi-dimensional transform, what is computed is the separable
							 | 
						|||
| 
								 | 
							
								     product formed by taking each transform kind along the
							 | 
						|||
| 
								 | 
							
								     corresponding dimension, one dimension after another.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * 'flags' is a bitwise OR ('|') of zero or more planner flags, as
							 | 
						|||
| 
								 | 
							
								     defined in *note Planner Flags::.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Real-to-Real Transform Kinds,  Prev: Real-to-Real Transforms,  Up: Basic Interface
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								4.3.6 Real-to-Real Transform Kinds
							 | 
						|||
| 
								 | 
							
								----------------------------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								FFTW currently supports 11 different r2r transform kinds, specified by
							 | 
						|||
| 
								 | 
							
								one of the constants below.  For the precise definitions of these
							 | 
						|||
| 
								 | 
							
								transforms, see *note What FFTW Really Computes::.  For a more
							 | 
						|||
| 
								 | 
							
								colloquial introduction to these transform kinds, see *note More DFTs of
							 | 
						|||
| 
								 | 
							
								Real Data::.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   For dimension of size 'n', there is a corresponding "logical"
							 | 
						|||
| 
								 | 
							
								dimension 'N' that determines the normalization (and the optimal
							 | 
						|||
| 
								 | 
							
								factorization); the formula for 'N' is given for each kind below.  Also,
							 | 
						|||
| 
								 | 
							
								with each transform kind is listed its corrsponding inverse transform.
							 | 
						|||
| 
								 | 
							
								FFTW computes unnormalized transforms: a transform followed by its
							 | 
						|||
| 
								 | 
							
								inverse will result in the original data multiplied by 'N' (or the
							 | 
						|||
| 
								 | 
							
								product of the 'N''s for each dimension, in multi-dimensions).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * 'FFTW_R2HC' computes a real-input DFT with output in "halfcomplex"
							 | 
						|||
| 
								 | 
							
								     format, i.e.  real and imaginary parts for a transform of size 'n'
							 | 
						|||
| 
								 | 
							
								     stored as: r0, r1, r2, r(n/2), i((n+1)/2-1), ..., i2, i1 (Logical
							 | 
						|||
| 
								 | 
							
								     'N=n', inverse is 'FFTW_HC2R'.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * 'FFTW_HC2R' computes the reverse of 'FFTW_R2HC', above.  (Logical
							 | 
						|||
| 
								 | 
							
								     'N=n', inverse is 'FFTW_R2HC'.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * 'FFTW_DHT' computes a discrete Hartley transform.  (Logical 'N=n',
							 | 
						|||
| 
								 | 
							
								     inverse is 'FFTW_DHT'.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * 'FFTW_REDFT00' computes an REDFT00 transform, i.e.  a DCT-I.
							 | 
						|||
| 
								 | 
							
								     (Logical 'N=2*(n-1)', inverse is 'FFTW_REDFT00'.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * 'FFTW_REDFT10' computes an REDFT10 transform, i.e.  a DCT-II
							 | 
						|||
| 
								 | 
							
								     (sometimes called "the" DCT). (Logical 'N=2*n', inverse is
							 | 
						|||
| 
								 | 
							
								     'FFTW_REDFT01'.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * 'FFTW_REDFT01' computes an REDFT01 transform, i.e.  a DCT-III
							 | 
						|||
| 
								 | 
							
								     (sometimes called "the" IDCT, being the inverse of DCT-II).
							 | 
						|||
| 
								 | 
							
								     (Logical 'N=2*n', inverse is 'FFTW_REDFT=10'.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * 'FFTW_REDFT11' computes an REDFT11 transform, i.e.  a DCT-IV.
							 | 
						|||
| 
								 | 
							
								     (Logical 'N=2*n', inverse is 'FFTW_REDFT11'.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * 'FFTW_RODFT00' computes an RODFT00 transform, i.e.  a DST-I.
							 | 
						|||
| 
								 | 
							
								     (Logical 'N=2*(n+1)', inverse is 'FFTW_RODFT00'.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * 'FFTW_RODFT10' computes an RODFT10 transform, i.e.  a DST-II.
							 | 
						|||
| 
								 | 
							
								     (Logical 'N=2*n', inverse is 'FFTW_RODFT01'.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * 'FFTW_RODFT01' computes an RODFT01 transform, i.e.  a DST-III.
							 | 
						|||
| 
								 | 
							
								     (Logical 'N=2*n', inverse is 'FFTW_RODFT=10'.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * 'FFTW_RODFT11' computes an RODFT11 transform, i.e.  a DST-IV.
							 | 
						|||
| 
								 | 
							
								     (Logical 'N=2*n', inverse is 'FFTW_RODFT11'.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Advanced Interface,  Next: Guru Interface,  Prev: Basic Interface,  Up: FFTW Reference
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								4.4 Advanced Interface
							 | 
						|||
| 
								 | 
							
								======================
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								FFTW's "advanced" interface supplements the basic interface with four
							 | 
						|||
| 
								 | 
							
								new planner routines, providing a new level of flexibility: you can plan
							 | 
						|||
| 
								 | 
							
								a transform of multiple arrays simultaneously, operate on non-contiguous
							 | 
						|||
| 
								 | 
							
								(strided) data, and transform a subset of a larger multi-dimensional
							 | 
						|||
| 
								 | 
							
								array.  Other than these additional features, the planner operates in
							 | 
						|||
| 
								 | 
							
								the same fashion as in the basic interface, and the resulting
							 | 
						|||
| 
								 | 
							
								'fftw_plan' is used in the same way (*note Using Plans::).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								* Menu:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								* Advanced Complex DFTs::
							 | 
						|||
| 
								 | 
							
								* Advanced Real-data DFTs::
							 | 
						|||
| 
								 | 
							
								* Advanced Real-to-real Transforms::
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Advanced Complex DFTs,  Next: Advanced Real-data DFTs,  Prev: Advanced Interface,  Up: Advanced Interface
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								4.4.1 Advanced Complex DFTs
							 | 
						|||
| 
								 | 
							
								---------------------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_plan_many_dft(int rank, const int *n, int howmany,
							 | 
						|||
| 
								 | 
							
								                                  fftw_complex *in, const int *inembed,
							 | 
						|||
| 
								 | 
							
								                                  int istride, int idist,
							 | 
						|||
| 
								 | 
							
								                                  fftw_complex *out, const int *onembed,
							 | 
						|||
| 
								 | 
							
								                                  int ostride, int odist,
							 | 
						|||
| 
								 | 
							
								                                  int sign, unsigned flags);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   This routine plans multiple multidimensional complex DFTs, and it
							 | 
						|||
| 
								 | 
							
								extends the 'fftw_plan_dft' routine (*note Complex DFTs::) to compute
							 | 
						|||
| 
								 | 
							
								'howmany' transforms, each having rank 'rank' and size 'n'.  In
							 | 
						|||
| 
								 | 
							
								addition, the transform data need not be contiguous, but it may be laid
							 | 
						|||
| 
								 | 
							
								out in memory with an arbitrary stride.  To account for these
							 | 
						|||
| 
								 | 
							
								possibilities, 'fftw_plan_many_dft' adds the new parameters 'howmany',
							 | 
						|||
| 
								 | 
							
								{'i','o'}'nembed', {'i','o'}'stride', and {'i','o'}'dist'.  The FFTW
							 | 
						|||
| 
								 | 
							
								basic interface (*note Complex DFTs::) provides routines specialized for
							 | 
						|||
| 
								 | 
							
								ranks 1, 2, and 3, but the advanced interface handles only the
							 | 
						|||
| 
								 | 
							
								general-rank case.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   'howmany' is the (nonnegative) number of transforms to compute.  The
							 | 
						|||
| 
								 | 
							
								resulting plan computes 'howmany' transforms, where the input of the
							 | 
						|||
| 
								 | 
							
								'k'-th transform is at location 'in+k*idist' (in C pointer arithmetic),
							 | 
						|||
| 
								 | 
							
								and its output is at location 'out+k*odist'.  Plans obtained in this way
							 | 
						|||
| 
								 | 
							
								can often be faster than calling FFTW multiple times for the individual
							 | 
						|||
| 
								 | 
							
								transforms.  The basic 'fftw_plan_dft' interface corresponds to
							 | 
						|||
| 
								 | 
							
								'howmany=1' (in which case the 'dist' parameters are ignored).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Each of the 'howmany' transforms has rank 'rank' and size 'n', as in
							 | 
						|||
| 
								 | 
							
								the basic interface.  In addition, the advanced interface allows the
							 | 
						|||
| 
								 | 
							
								input and output arrays of each transform to be row-major subarrays of
							 | 
						|||
| 
								 | 
							
								larger rank-'rank' arrays, described by 'inembed' and 'onembed'
							 | 
						|||
| 
								 | 
							
								parameters, respectively.  {'i','o'}'nembed' must be arrays of length
							 | 
						|||
| 
								 | 
							
								'rank', and 'n' should be elementwise less than or equal to
							 | 
						|||
| 
								 | 
							
								{'i','o'}'nembed'.  Passing 'NULL' for an 'nembed' parameter is
							 | 
						|||
| 
								 | 
							
								equivalent to passing 'n' (i.e.  same physical and logical dimensions,
							 | 
						|||
| 
								 | 
							
								as in the basic interface.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The 'stride' parameters indicate that the 'j'-th element of the input
							 | 
						|||
| 
								 | 
							
								or output arrays is located at 'j*istride' or 'j*ostride', respectively.
							 | 
						|||
| 
								 | 
							
								(For a multi-dimensional array, 'j' is the ordinary row-major index.)
							 | 
						|||
| 
								 | 
							
								When combined with the 'k'-th transform in a 'howmany' loop, from above,
							 | 
						|||
| 
								 | 
							
								this means that the ('j','k')-th element is at 'j*stride+k*dist'.  (The
							 | 
						|||
| 
								 | 
							
								basic 'fftw_plan_dft' interface corresponds to a stride of 1.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   For in-place transforms, the input and output 'stride' and 'dist'
							 | 
						|||
| 
								 | 
							
								parameters should be the same; otherwise, the planner may return 'NULL'.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Arrays 'n', 'inembed', and 'onembed' are not used after this function
							 | 
						|||
| 
								 | 
							
								returns.  You can safely free or reuse them.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   *Examples*: One transform of one 5 by 6 array contiguous in memory:
							 | 
						|||
| 
								 | 
							
								        int rank = 2;
							 | 
						|||
| 
								 | 
							
								        int n[] = {5, 6};
							 | 
						|||
| 
								 | 
							
								        int howmany = 1;
							 | 
						|||
| 
								 | 
							
								        int idist = odist = 0; /* unused because howmany = 1 */
							 | 
						|||
| 
								 | 
							
								        int istride = ostride = 1; /* array is contiguous in memory */
							 | 
						|||
| 
								 | 
							
								        int *inembed = n, *onembed = n;
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Transform of three 5 by 6 arrays, each contiguous in memory, stored
							 | 
						|||
| 
								 | 
							
								in memory one after another:
							 | 
						|||
| 
								 | 
							
								        int rank = 2;
							 | 
						|||
| 
								 | 
							
								        int n[] = {5, 6};
							 | 
						|||
| 
								 | 
							
								        int howmany = 3;
							 | 
						|||
| 
								 | 
							
								        int idist = odist = n[0]*n[1]; /* = 30, the distance in memory
							 | 
						|||
| 
								 | 
							
								                                          between the first element
							 | 
						|||
| 
								 | 
							
								                                          of the first array and the
							 | 
						|||
| 
								 | 
							
								                                          first element of the second array */
							 | 
						|||
| 
								 | 
							
								        int istride = ostride = 1; /* array is contiguous in memory */
							 | 
						|||
| 
								 | 
							
								        int *inembed = n, *onembed = n;
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Transform each column of a 2d array with 10 rows and 3 columns:
							 | 
						|||
| 
								 | 
							
								        int rank = 1; /* not 2: we are computing 1d transforms */
							 | 
						|||
| 
								 | 
							
								        int n[] = {10}; /* 1d transforms of length 10 */
							 | 
						|||
| 
								 | 
							
								        int howmany = 3;
							 | 
						|||
| 
								 | 
							
								        int idist = odist = 1;
							 | 
						|||
| 
								 | 
							
								        int istride = ostride = 3; /* distance between two elements in
							 | 
						|||
| 
								 | 
							
								                                      the same column */
							 | 
						|||
| 
								 | 
							
								        int *inembed = n, *onembed = n;
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Advanced Real-data DFTs,  Next: Advanced Real-to-real Transforms,  Prev: Advanced Complex DFTs,  Up: Advanced Interface
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								4.4.2 Advanced Real-data DFTs
							 | 
						|||
| 
								 | 
							
								-----------------------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_plan_many_dft_r2c(int rank, const int *n, int howmany,
							 | 
						|||
| 
								 | 
							
								                                      double *in, const int *inembed,
							 | 
						|||
| 
								 | 
							
								                                      int istride, int idist,
							 | 
						|||
| 
								 | 
							
								                                      fftw_complex *out, const int *onembed,
							 | 
						|||
| 
								 | 
							
								                                      int ostride, int odist,
							 | 
						|||
| 
								 | 
							
								                                      unsigned flags);
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_plan_many_dft_c2r(int rank, const int *n, int howmany,
							 | 
						|||
| 
								 | 
							
								                                      fftw_complex *in, const int *inembed,
							 | 
						|||
| 
								 | 
							
								                                      int istride, int idist,
							 | 
						|||
| 
								 | 
							
								                                      double *out, const int *onembed,
							 | 
						|||
| 
								 | 
							
								                                      int ostride, int odist,
							 | 
						|||
| 
								 | 
							
								                                      unsigned flags);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Like 'fftw_plan_many_dft', these two functions add 'howmany',
							 | 
						|||
| 
								 | 
							
								'nembed', 'stride', and 'dist' parameters to the 'fftw_plan_dft_r2c' and
							 | 
						|||
| 
								 | 
							
								'fftw_plan_dft_c2r' functions, but otherwise behave the same as the
							 | 
						|||
| 
								 | 
							
								basic interface.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The interpretation of 'howmany', 'stride', and 'dist' are the same as
							 | 
						|||
| 
								 | 
							
								for 'fftw_plan_many_dft', above.  Note that the 'stride' and 'dist' for
							 | 
						|||
| 
								 | 
							
								the real array are in units of 'double', and for the complex array are
							 | 
						|||
| 
								 | 
							
								in units of 'fftw_complex'.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   If an 'nembed' parameter is 'NULL', it is interpreted as what it
							 | 
						|||
| 
								 | 
							
								would be in the basic interface, as described in *note Real-data DFT
							 | 
						|||
| 
								 | 
							
								Array Format::.  That is, for the complex array the size is assumed to
							 | 
						|||
| 
								 | 
							
								be the same as 'n', but with the last dimension cut roughly in half.
							 | 
						|||
| 
								 | 
							
								For the real array, the size is assumed to be 'n' if the transform is
							 | 
						|||
| 
								 | 
							
								out-of-place, or 'n' with the last dimension "padded" if the transform
							 | 
						|||
| 
								 | 
							
								is in-place.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   If an 'nembed' parameter is non-'NULL', it is interpreted as the
							 | 
						|||
| 
								 | 
							
								physical size of the corresponding array, in row-major order, just as
							 | 
						|||
| 
								 | 
							
								for 'fftw_plan_many_dft'.  In this case, each dimension of 'nembed'
							 | 
						|||
| 
								 | 
							
								should be '>=' what it would be in the basic interface (e.g.  the halved
							 | 
						|||
| 
								 | 
							
								or padded 'n').
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Arrays 'n', 'inembed', and 'onembed' are not used after this function
							 | 
						|||
| 
								 | 
							
								returns.  You can safely free or reuse them.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Advanced Real-to-real Transforms,  Prev: Advanced Real-data DFTs,  Up: Advanced Interface
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								4.4.3 Advanced Real-to-real Transforms
							 | 
						|||
| 
								 | 
							
								--------------------------------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_plan_many_r2r(int rank, const int *n, int howmany,
							 | 
						|||
| 
								 | 
							
								                                  double *in, const int *inembed,
							 | 
						|||
| 
								 | 
							
								                                  int istride, int idist,
							 | 
						|||
| 
								 | 
							
								                                  double *out, const int *onembed,
							 | 
						|||
| 
								 | 
							
								                                  int ostride, int odist,
							 | 
						|||
| 
								 | 
							
								                                  const fftw_r2r_kind *kind, unsigned flags);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Like 'fftw_plan_many_dft', this functions adds 'howmany', 'nembed',
							 | 
						|||
| 
								 | 
							
								'stride', and 'dist' parameters to the 'fftw_plan_r2r' function, but
							 | 
						|||
| 
								 | 
							
								otherwise behave the same as the basic interface.  The interpretation of
							 | 
						|||
| 
								 | 
							
								those additional parameters are the same as for 'fftw_plan_many_dft'.
							 | 
						|||
| 
								 | 
							
								(Of course, the 'stride' and 'dist' parameters are now in units of
							 | 
						|||
| 
								 | 
							
								'double', not 'fftw_complex'.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Arrays 'n', 'inembed', 'onembed', and 'kind' are not used after this
							 | 
						|||
| 
								 | 
							
								function returns.  You can safely free or reuse them.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Guru Interface,  Next: New-array Execute Functions,  Prev: Advanced Interface,  Up: FFTW Reference
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								4.5 Guru Interface
							 | 
						|||
| 
								 | 
							
								==================
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								The "guru" interface to FFTW is intended to expose as much as possible
							 | 
						|||
| 
								 | 
							
								of the flexibility in the underlying FFTW architecture.  It allows one
							 | 
						|||
| 
								 | 
							
								to compute multi-dimensional "vectors" (loops) of multi-dimensional
							 | 
						|||
| 
								 | 
							
								transforms, where each vector/transform dimension has an independent
							 | 
						|||
| 
								 | 
							
								size and stride.  One can also use more general complex-number formats,
							 | 
						|||
| 
								 | 
							
								e.g.  separate real and imaginary arrays.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   For those users who require the flexibility of the guru interface, it
							 | 
						|||
| 
								 | 
							
								is important that they pay special attention to the documentation lest
							 | 
						|||
| 
								 | 
							
								they shoot themselves in the foot.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								* Menu:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								* Interleaved and split arrays::
							 | 
						|||
| 
								 | 
							
								* Guru vector and transform sizes::
							 | 
						|||
| 
								 | 
							
								* Guru Complex DFTs::
							 | 
						|||
| 
								 | 
							
								* Guru Real-data DFTs::
							 | 
						|||
| 
								 | 
							
								* Guru Real-to-real Transforms::
							 | 
						|||
| 
								 | 
							
								* 64-bit Guru Interface::
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Interleaved and split arrays,  Next: Guru vector and transform sizes,  Prev: Guru Interface,  Up: Guru Interface
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								4.5.1 Interleaved and split arrays
							 | 
						|||
| 
								 | 
							
								----------------------------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								The guru interface supports two representations of complex numbers,
							 | 
						|||
| 
								 | 
							
								which we call the interleaved and the split format.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The "interleaved" format is the same one used by the basic and
							 | 
						|||
| 
								 | 
							
								advanced interfaces, and it is documented in *note Complex numbers::.
							 | 
						|||
| 
								 | 
							
								In the interleaved format, you provide pointers to the real part of a
							 | 
						|||
| 
								 | 
							
								complex number, and the imaginary part understood to be stored in the
							 | 
						|||
| 
								 | 
							
								next memory location.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The "split" format allows separate pointers to the real and imaginary
							 | 
						|||
| 
								 | 
							
								parts of a complex array.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Technically, the interleaved format is redundant, because you can
							 | 
						|||
| 
								 | 
							
								always express an interleaved array in terms of a split array with
							 | 
						|||
| 
								 | 
							
								appropriate pointers and strides.  On the other hand, the interleaved
							 | 
						|||
| 
								 | 
							
								format is simpler to use, and it is common in practice.  Hence, FFTW
							 | 
						|||
| 
								 | 
							
								supports it as a special case.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Guru vector and transform sizes,  Next: Guru Complex DFTs,  Prev: Interleaved and split arrays,  Up: Guru Interface
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								4.5.2 Guru vector and transform sizes
							 | 
						|||
| 
								 | 
							
								-------------------------------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								The guru interface introduces one basic new data structure,
							 | 
						|||
| 
								 | 
							
								'fftw_iodim', that is used to specify sizes and strides for
							 | 
						|||
| 
								 | 
							
								multi-dimensional transforms and vectors:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     typedef struct {
							 | 
						|||
| 
								 | 
							
								          int n;
							 | 
						|||
| 
								 | 
							
								          int is;
							 | 
						|||
| 
								 | 
							
								          int os;
							 | 
						|||
| 
								 | 
							
								     } fftw_iodim;
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Here, 'n' is the size of the dimension, and 'is' and 'os' are the
							 | 
						|||
| 
								 | 
							
								strides of that dimension for the input and output arrays.  (The stride
							 | 
						|||
| 
								 | 
							
								is the separation of consecutive elements along this dimension.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The meaning of the stride parameter depends on the type of the array
							 | 
						|||
| 
								 | 
							
								that the stride refers to.  _If the array is interleaved complex,
							 | 
						|||
| 
								 | 
							
								strides are expressed in units of complex numbers ('fftw_complex').  If
							 | 
						|||
| 
								 | 
							
								the array is split complex or real, strides are expressed in units of
							 | 
						|||
| 
								 | 
							
								real numbers ('double')._  This convention is consistent with the usual
							 | 
						|||
| 
								 | 
							
								pointer arithmetic in the C language.  An interleaved array is denoted
							 | 
						|||
| 
								 | 
							
								by a pointer 'p' to 'fftw_complex', so that 'p+1' points to the next
							 | 
						|||
| 
								 | 
							
								complex number.  Split arrays are denoted by pointers to 'double', in
							 | 
						|||
| 
								 | 
							
								which case pointer arithmetic operates in units of 'sizeof(double)'.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The guru planner interfaces all take a ('rank', 'dims[rank]') pair
							 | 
						|||
| 
								 | 
							
								describing the transform size, and a ('howmany_rank',
							 | 
						|||
| 
								 | 
							
								'howmany_dims[howmany_rank]') pair describing the "vector" size (a
							 | 
						|||
| 
								 | 
							
								multi-dimensional loop of transforms to perform), where 'dims' and
							 | 
						|||
| 
								 | 
							
								'howmany_dims' are arrays of 'fftw_iodim'.  Each 'n' field must be
							 | 
						|||
| 
								 | 
							
								positive for 'dims' and nonnegative for 'howmany_dims', while both
							 | 
						|||
| 
								 | 
							
								'rank' and 'howmany_rank' must be nonnegative.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   For example, the 'howmany' parameter in the advanced complex-DFT
							 | 
						|||
| 
								 | 
							
								interface corresponds to 'howmany_rank' = 1, 'howmany_dims[0].n' =
							 | 
						|||
| 
								 | 
							
								'howmany', 'howmany_dims[0].is' = 'idist', and 'howmany_dims[0].os' =
							 | 
						|||
| 
								 | 
							
								'odist'.  (To compute a single transform, you can just use
							 | 
						|||
| 
								 | 
							
								'howmany_rank' = 0.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   A row-major multidimensional array with dimensions 'n[rank]' (*note
							 | 
						|||
| 
								 | 
							
								Row-major Format::) corresponds to 'dims[i].n' = 'n[i]' and the
							 | 
						|||
| 
								 | 
							
								recurrence 'dims[i].is' = 'n[i+1] * dims[i+1].is' (similarly for 'os').
							 | 
						|||
| 
								 | 
							
								The stride of the last ('i=rank-1') dimension is the overall stride of
							 | 
						|||
| 
								 | 
							
								the array.  e.g.  to be equivalent to the advanced complex-DFT
							 | 
						|||
| 
								 | 
							
								interface, you would have 'dims[rank-1].is' = 'istride' and
							 | 
						|||
| 
								 | 
							
								'dims[rank-1].os' = 'ostride'.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   In general, we only guarantee FFTW to return a non-'NULL' plan if the
							 | 
						|||
| 
								 | 
							
								vector and transform dimensions correspond to a set of distinct indices,
							 | 
						|||
| 
								 | 
							
								and for in-place transforms the input/output strides should be the same.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Guru Complex DFTs,  Next: Guru Real-data DFTs,  Prev: Guru vector and transform sizes,  Up: Guru Interface
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								4.5.3 Guru Complex DFTs
							 | 
						|||
| 
								 | 
							
								-----------------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_plan_guru_dft(
							 | 
						|||
| 
								 | 
							
								          int rank, const fftw_iodim *dims,
							 | 
						|||
| 
								 | 
							
								          int howmany_rank, const fftw_iodim *howmany_dims,
							 | 
						|||
| 
								 | 
							
								          fftw_complex *in, fftw_complex *out,
							 | 
						|||
| 
								 | 
							
								          int sign, unsigned flags);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_plan_guru_split_dft(
							 | 
						|||
| 
								 | 
							
								          int rank, const fftw_iodim *dims,
							 | 
						|||
| 
								 | 
							
								          int howmany_rank, const fftw_iodim *howmany_dims,
							 | 
						|||
| 
								 | 
							
								          double *ri, double *ii, double *ro, double *io,
							 | 
						|||
| 
								 | 
							
								          unsigned flags);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   These two functions plan a complex-data, multi-dimensional DFT for
							 | 
						|||
| 
								 | 
							
								the interleaved and split format, respectively.  Transform dimensions
							 | 
						|||
| 
								 | 
							
								are given by ('rank', 'dims') over a multi-dimensional vector (loop) of
							 | 
						|||
| 
								 | 
							
								dimensions ('howmany_rank', 'howmany_dims').  'dims' and 'howmany_dims'
							 | 
						|||
| 
								 | 
							
								should point to 'fftw_iodim' arrays of length 'rank' and 'howmany_rank',
							 | 
						|||
| 
								 | 
							
								respectively.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   'flags' is a bitwise OR ('|') of zero or more planner flags, as
							 | 
						|||
| 
								 | 
							
								defined in *note Planner Flags::.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   In the 'fftw_plan_guru_dft' function, the pointers 'in' and 'out'
							 | 
						|||
| 
								 | 
							
								point to the interleaved input and output arrays, respectively.  The
							 | 
						|||
| 
								 | 
							
								sign can be either -1 (= 'FFTW_FORWARD') or +1 (= 'FFTW_BACKWARD').  If
							 | 
						|||
| 
								 | 
							
								the pointers are equal, the transform is in-place.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   In the 'fftw_plan_guru_split_dft' function, 'ri' and 'ii' point to
							 | 
						|||
| 
								 | 
							
								the real and imaginary input arrays, and 'ro' and 'io' point to the real
							 | 
						|||
| 
								 | 
							
								and imaginary output arrays.  The input and output pointers may be the
							 | 
						|||
| 
								 | 
							
								same, indicating an in-place transform.  For example, for 'fftw_complex'
							 | 
						|||
| 
								 | 
							
								pointers 'in' and 'out', the corresponding parameters are:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     ri = (double *) in;
							 | 
						|||
| 
								 | 
							
								     ii = (double *) in + 1;
							 | 
						|||
| 
								 | 
							
								     ro = (double *) out;
							 | 
						|||
| 
								 | 
							
								     io = (double *) out + 1;
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Because 'fftw_plan_guru_split_dft' accepts split arrays, strides are
							 | 
						|||
| 
								 | 
							
								expressed in units of 'double'.  For a contiguous 'fftw_complex' array,
							 | 
						|||
| 
								 | 
							
								the overall stride of the transform should be 2, the distance between
							 | 
						|||
| 
								 | 
							
								consecutive real parts or between consecutive imaginary parts; see *note
							 | 
						|||
| 
								 | 
							
								Guru vector and transform sizes::.  Note that the dimension strides are
							 | 
						|||
| 
								 | 
							
								applied equally to the real and imaginary parts; real and imaginary
							 | 
						|||
| 
								 | 
							
								arrays with different strides are not supported.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   There is no 'sign' parameter in 'fftw_plan_guru_split_dft'.  This
							 | 
						|||
| 
								 | 
							
								function always plans for an 'FFTW_FORWARD' transform.  To plan for an
							 | 
						|||
| 
								 | 
							
								'FFTW_BACKWARD' transform, you can exploit the identity that the
							 | 
						|||
| 
								 | 
							
								backwards DFT is equal to the forwards DFT with the real and imaginary
							 | 
						|||
| 
								 | 
							
								parts swapped.  For example, in the case of the 'fftw_complex' arrays
							 | 
						|||
| 
								 | 
							
								above, the 'FFTW_BACKWARD' transform is computed by the parameters:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     ri = (double *) in + 1;
							 | 
						|||
| 
								 | 
							
								     ii = (double *) in;
							 | 
						|||
| 
								 | 
							
								     ro = (double *) out + 1;
							 | 
						|||
| 
								 | 
							
								     io = (double *) out;
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Guru Real-data DFTs,  Next: Guru Real-to-real Transforms,  Prev: Guru Complex DFTs,  Up: Guru Interface
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								4.5.4 Guru Real-data DFTs
							 | 
						|||
| 
								 | 
							
								-------------------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_plan_guru_dft_r2c(
							 | 
						|||
| 
								 | 
							
								          int rank, const fftw_iodim *dims,
							 | 
						|||
| 
								 | 
							
								          int howmany_rank, const fftw_iodim *howmany_dims,
							 | 
						|||
| 
								 | 
							
								          double *in, fftw_complex *out,
							 | 
						|||
| 
								 | 
							
								          unsigned flags);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_plan_guru_split_dft_r2c(
							 | 
						|||
| 
								 | 
							
								          int rank, const fftw_iodim *dims,
							 | 
						|||
| 
								 | 
							
								          int howmany_rank, const fftw_iodim *howmany_dims,
							 | 
						|||
| 
								 | 
							
								          double *in, double *ro, double *io,
							 | 
						|||
| 
								 | 
							
								          unsigned flags);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_plan_guru_dft_c2r(
							 | 
						|||
| 
								 | 
							
								          int rank, const fftw_iodim *dims,
							 | 
						|||
| 
								 | 
							
								          int howmany_rank, const fftw_iodim *howmany_dims,
							 | 
						|||
| 
								 | 
							
								          fftw_complex *in, double *out,
							 | 
						|||
| 
								 | 
							
								          unsigned flags);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_plan_guru_split_dft_c2r(
							 | 
						|||
| 
								 | 
							
								          int rank, const fftw_iodim *dims,
							 | 
						|||
| 
								 | 
							
								          int howmany_rank, const fftw_iodim *howmany_dims,
							 | 
						|||
| 
								 | 
							
								          double *ri, double *ii, double *out,
							 | 
						|||
| 
								 | 
							
								          unsigned flags);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Plan a real-input (r2c) or real-output (c2r), multi-dimensional DFT
							 | 
						|||
| 
								 | 
							
								with transform dimensions given by ('rank', 'dims') over a
							 | 
						|||
| 
								 | 
							
								multi-dimensional vector (loop) of dimensions ('howmany_rank',
							 | 
						|||
| 
								 | 
							
								'howmany_dims').  'dims' and 'howmany_dims' should point to 'fftw_iodim'
							 | 
						|||
| 
								 | 
							
								arrays of length 'rank' and 'howmany_rank', respectively.  As for the
							 | 
						|||
| 
								 | 
							
								basic and advanced interfaces, an r2c transform is 'FFTW_FORWARD' and a
							 | 
						|||
| 
								 | 
							
								c2r transform is 'FFTW_BACKWARD'.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The _last_ dimension of 'dims' is interpreted specially: that
							 | 
						|||
| 
								 | 
							
								dimension of the real array has size 'dims[rank-1].n', but that
							 | 
						|||
| 
								 | 
							
								dimension of the complex array has size 'dims[rank-1].n/2+1' (division
							 | 
						|||
| 
								 | 
							
								rounded down).  The strides, on the other hand, are taken to be exactly
							 | 
						|||
| 
								 | 
							
								as specified.  It is up to the user to specify the strides appropriately
							 | 
						|||
| 
								 | 
							
								for the peculiar dimensions of the data, and we do not guarantee that
							 | 
						|||
| 
								 | 
							
								the planner will succeed (return non-'NULL') for any dimensions other
							 | 
						|||
| 
								 | 
							
								than those described in *note Real-data DFT Array Format:: and
							 | 
						|||
| 
								 | 
							
								generalized in *note Advanced Real-data DFTs::.  (That is, for an
							 | 
						|||
| 
								 | 
							
								in-place transform, each individual dimension should be able to operate
							 | 
						|||
| 
								 | 
							
								in place.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   'in' and 'out' point to the input and output arrays for r2c and c2r
							 | 
						|||
| 
								 | 
							
								transforms, respectively.  For split arrays, 'ri' and 'ii' point to the
							 | 
						|||
| 
								 | 
							
								real and imaginary input arrays for a c2r transform, and 'ro' and 'io'
							 | 
						|||
| 
								 | 
							
								point to the real and imaginary output arrays for an r2c transform.
							 | 
						|||
| 
								 | 
							
								'in' and 'ro' or 'ri' and 'out' may be the same, indicating an in-place
							 | 
						|||
| 
								 | 
							
								transform.  (In-place transforms where 'in' and 'io' or 'ii' and 'out'
							 | 
						|||
| 
								 | 
							
								are the same are not currently supported.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   'flags' is a bitwise OR ('|') of zero or more planner flags, as
							 | 
						|||
| 
								 | 
							
								defined in *note Planner Flags::.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   In-place transforms of rank greater than 1 are currently only
							 | 
						|||
| 
								 | 
							
								supported for interleaved arrays.  For split arrays, the planner will
							 | 
						|||
| 
								 | 
							
								return 'NULL'.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Guru Real-to-real Transforms,  Next: 64-bit Guru Interface,  Prev: Guru Real-data DFTs,  Up: Guru Interface
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								4.5.5 Guru Real-to-real Transforms
							 | 
						|||
| 
								 | 
							
								----------------------------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_plan_guru_r2r(int rank, const fftw_iodim *dims,
							 | 
						|||
| 
								 | 
							
								                                  int howmany_rank,
							 | 
						|||
| 
								 | 
							
								                                  const fftw_iodim *howmany_dims,
							 | 
						|||
| 
								 | 
							
								                                  double *in, double *out,
							 | 
						|||
| 
								 | 
							
								                                  const fftw_r2r_kind *kind,
							 | 
						|||
| 
								 | 
							
								                                  unsigned flags);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Plan a real-to-real (r2r) multi-dimensional 'FFTW_FORWARD' transform
							 | 
						|||
| 
								 | 
							
								with transform dimensions given by ('rank', 'dims') over a
							 | 
						|||
| 
								 | 
							
								multi-dimensional vector (loop) of dimensions ('howmany_rank',
							 | 
						|||
| 
								 | 
							
								'howmany_dims').  'dims' and 'howmany_dims' should point to 'fftw_iodim'
							 | 
						|||
| 
								 | 
							
								arrays of length 'rank' and 'howmany_rank', respectively.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The transform kind of each dimension is given by the 'kind'
							 | 
						|||
| 
								 | 
							
								parameter, which should point to an array of length 'rank'.  Valid
							 | 
						|||
| 
								 | 
							
								'fftw_r2r_kind' constants are given in *note Real-to-Real Transform
							 | 
						|||
| 
								 | 
							
								Kinds::.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   'in' and 'out' point to the real input and output arrays; they may be
							 | 
						|||
| 
								 | 
							
								the same, indicating an in-place transform.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   'flags' is a bitwise OR ('|') of zero or more planner flags, as
							 | 
						|||
| 
								 | 
							
								defined in *note Planner Flags::.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: 64-bit Guru Interface,  Prev: Guru Real-to-real Transforms,  Up: Guru Interface
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								4.5.6 64-bit Guru Interface
							 | 
						|||
| 
								 | 
							
								---------------------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								When compiled in 64-bit mode on a 64-bit architecture (where addresses
							 | 
						|||
| 
								 | 
							
								are 64 bits wide), FFTW uses 64-bit quantities internally for all
							 | 
						|||
| 
								 | 
							
								transform sizes, strides, and so on--you don't have to do anything
							 | 
						|||
| 
								 | 
							
								special to exploit this.  However, in the ordinary FFTW interfaces, you
							 | 
						|||
| 
								 | 
							
								specify the transform size by an 'int' quantity, which is normally only
							 | 
						|||
| 
								 | 
							
								32 bits wide.  This means that, even though FFTW is using 64-bit sizes
							 | 
						|||
| 
								 | 
							
								internally, you cannot specify a single transform dimension larger than
							 | 
						|||
| 
								 | 
							
								2^31-1 numbers.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   We expect that few users will require transforms larger than this,
							 | 
						|||
| 
								 | 
							
								but, for those who do, we provide a 64-bit version of the guru interface
							 | 
						|||
| 
								 | 
							
								in which all sizes are specified as integers of type 'ptrdiff_t' instead
							 | 
						|||
| 
								 | 
							
								of 'int'.  ('ptrdiff_t' is a signed integer type defined by the C
							 | 
						|||
| 
								 | 
							
								standard to be wide enough to represent address differences, and thus
							 | 
						|||
| 
								 | 
							
								must be at least 64 bits wide on a 64-bit machine.)  We stress that
							 | 
						|||
| 
								 | 
							
								there is _no performance advantage_ to using this interface--the same
							 | 
						|||
| 
								 | 
							
								internal FFTW code is employed regardless--and it is only necessary if
							 | 
						|||
| 
								 | 
							
								you want to specify very large transform sizes.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   In particular, the 64-bit guru interface is a set of planner routines
							 | 
						|||
| 
								 | 
							
								that are exactly the same as the guru planner routines, except that they
							 | 
						|||
| 
								 | 
							
								are named with 'guru64' instead of 'guru' and they take arguments of
							 | 
						|||
| 
								 | 
							
								type 'fftw_iodim64' instead of 'fftw_iodim'.  For example, instead of
							 | 
						|||
| 
								 | 
							
								'fftw_plan_guru_dft', we have 'fftw_plan_guru64_dft'.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_plan_guru64_dft(
							 | 
						|||
| 
								 | 
							
								          int rank, const fftw_iodim64 *dims,
							 | 
						|||
| 
								 | 
							
								          int howmany_rank, const fftw_iodim64 *howmany_dims,
							 | 
						|||
| 
								 | 
							
								          fftw_complex *in, fftw_complex *out,
							 | 
						|||
| 
								 | 
							
								          int sign, unsigned flags);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The 'fftw_iodim64' type is similar to 'fftw_iodim', with the same
							 | 
						|||
| 
								 | 
							
								interpretation, except that it uses type 'ptrdiff_t' instead of type
							 | 
						|||
| 
								 | 
							
								'int'.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     typedef struct {
							 | 
						|||
| 
								 | 
							
								          ptrdiff_t n;
							 | 
						|||
| 
								 | 
							
								          ptrdiff_t is;
							 | 
						|||
| 
								 | 
							
								          ptrdiff_t os;
							 | 
						|||
| 
								 | 
							
								     } fftw_iodim64;
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Every other 'fftw_plan_guru' function also has a 'fftw_plan_guru64'
							 | 
						|||
| 
								 | 
							
								equivalent, but we do not repeat their documentation here since they are
							 | 
						|||
| 
								 | 
							
								identical to the 32-bit versions except as noted above.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: New-array Execute Functions,  Next: Wisdom,  Prev: Guru Interface,  Up: FFTW Reference
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								4.6 New-array Execute Functions
							 | 
						|||
| 
								 | 
							
								===============================
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								Normally, one executes a plan for the arrays with which the plan was
							 | 
						|||
| 
								 | 
							
								created, by calling 'fftw_execute(plan)' as described in *note Using
							 | 
						|||
| 
								 | 
							
								Plans::.  However, it is possible for sophisticated users to apply a
							 | 
						|||
| 
								 | 
							
								given plan to a _different_ array using the "new-array execute"
							 | 
						|||
| 
								 | 
							
								functions detailed below, provided that the following conditions are
							 | 
						|||
| 
								 | 
							
								met:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * The array size, strides, etcetera are the same (since those are set
							 | 
						|||
| 
								 | 
							
								     by the plan).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * The input and output arrays are the same (in-place) or different
							 | 
						|||
| 
								 | 
							
								     (out-of-place) if the plan was originally created to be in-place or
							 | 
						|||
| 
								 | 
							
								     out-of-place, respectively.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * For split arrays, the separations between the real and imaginary
							 | 
						|||
| 
								 | 
							
								     parts, 'ii-ri' and 'io-ro', are the same as they were for the input
							 | 
						|||
| 
								 | 
							
								     and output arrays when the plan was created.  (This condition is
							 | 
						|||
| 
								 | 
							
								     automatically satisfied for interleaved arrays.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * The "alignment" of the new input/output arrays is the same as that
							 | 
						|||
| 
								 | 
							
								     of the input/output arrays when the plan was created, unless the
							 | 
						|||
| 
								 | 
							
								     plan was created with the 'FFTW_UNALIGNED' flag.  Here, the
							 | 
						|||
| 
								 | 
							
								     alignment is a platform-dependent quantity (for example, it is the
							 | 
						|||
| 
								 | 
							
								     address modulo 16 if SSE SIMD instructions are used, but the
							 | 
						|||
| 
								 | 
							
								     address modulo 4 for non-SIMD single-precision FFTW on the same
							 | 
						|||
| 
								 | 
							
								     machine).  In general, only arrays allocated with 'fftw_malloc' are
							 | 
						|||
| 
								 | 
							
								     guaranteed to be equally aligned (*note SIMD alignment and
							 | 
						|||
| 
								 | 
							
								     fftw_malloc::).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The alignment issue is especially critical, because if you don't use
							 | 
						|||
| 
								 | 
							
								'fftw_malloc' then you may have little control over the alignment of
							 | 
						|||
| 
								 | 
							
								arrays in memory.  For example, neither the C++ 'new' function nor the
							 | 
						|||
| 
								 | 
							
								Fortran 'allocate' statement provide strong enough guarantees about data
							 | 
						|||
| 
								 | 
							
								alignment.  If you don't use 'fftw_malloc', therefore, you probably have
							 | 
						|||
| 
								 | 
							
								to use 'FFTW_UNALIGNED' (which disables most SIMD support).  If
							 | 
						|||
| 
								 | 
							
								possible, it is probably better for you to simply create multiple plans
							 | 
						|||
| 
								 | 
							
								(creating a new plan is quick once one exists for a given size), or
							 | 
						|||
| 
								 | 
							
								better yet re-use the same array for your transforms.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   For rare circumstances in which you cannot control the alignment of
							 | 
						|||
| 
								 | 
							
								allocated memory, but wish to determine where a given array is aligned
							 | 
						|||
| 
								 | 
							
								like the original array for which a plan was created, you can use the
							 | 
						|||
| 
								 | 
							
								'fftw_alignment_of' function:
							 | 
						|||
| 
								 | 
							
								     int fftw_alignment_of(double *p);
							 | 
						|||
| 
								 | 
							
								   Two arrays have equivalent alignment (for the purposes of applying a
							 | 
						|||
| 
								 | 
							
								plan) if and only if 'fftw_alignment_of' returns the same value for the
							 | 
						|||
| 
								 | 
							
								corresponding pointers to their data (typecast to 'double*' if
							 | 
						|||
| 
								 | 
							
								necessary).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   If you are tempted to use the new-array execute interface because you
							 | 
						|||
| 
								 | 
							
								want to transform a known bunch of arrays of the same size, you should
							 | 
						|||
| 
								 | 
							
								probably go use the advanced interface instead (*note Advanced
							 | 
						|||
| 
								 | 
							
								Interface::)).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The new-array execute functions are:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     void fftw_execute_dft(
							 | 
						|||
| 
								 | 
							
								          const fftw_plan p,
							 | 
						|||
| 
								 | 
							
								          fftw_complex *in, fftw_complex *out);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     void fftw_execute_split_dft(
							 | 
						|||
| 
								 | 
							
								          const fftw_plan p,
							 | 
						|||
| 
								 | 
							
								          double *ri, double *ii, double *ro, double *io);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     void fftw_execute_dft_r2c(
							 | 
						|||
| 
								 | 
							
								          const fftw_plan p,
							 | 
						|||
| 
								 | 
							
								          double *in, fftw_complex *out);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     void fftw_execute_split_dft_r2c(
							 | 
						|||
| 
								 | 
							
								          const fftw_plan p,
							 | 
						|||
| 
								 | 
							
								          double *in, double *ro, double *io);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     void fftw_execute_dft_c2r(
							 | 
						|||
| 
								 | 
							
								          const fftw_plan p,
							 | 
						|||
| 
								 | 
							
								          fftw_complex *in, double *out);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     void fftw_execute_split_dft_c2r(
							 | 
						|||
| 
								 | 
							
								          const fftw_plan p,
							 | 
						|||
| 
								 | 
							
								          double *ri, double *ii, double *out);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     void fftw_execute_r2r(
							 | 
						|||
| 
								 | 
							
								          const fftw_plan p,
							 | 
						|||
| 
								 | 
							
								          double *in, double *out);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   These execute the 'plan' to compute the corresponding transform on
							 | 
						|||
| 
								 | 
							
								the input/output arrays specified by the subsequent arguments.  The
							 | 
						|||
| 
								 | 
							
								input/output array arguments have the same meanings as the ones passed
							 | 
						|||
| 
								 | 
							
								to the guru planner routines in the preceding sections.  The 'plan' is
							 | 
						|||
| 
								 | 
							
								not modified, and these routines can be called as many times as desired,
							 | 
						|||
| 
								 | 
							
								or intermixed with calls to the ordinary 'fftw_execute'.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The 'plan' _must_ have been created for the transform type
							 | 
						|||
| 
								 | 
							
								corresponding to the execute function, e.g.  it must be a complex-DFT
							 | 
						|||
| 
								 | 
							
								plan for 'fftw_execute_dft'.  Any of the planner routines for that
							 | 
						|||
| 
								 | 
							
								transform type, from the basic to the guru interface, could have been
							 | 
						|||
| 
								 | 
							
								used to create the plan, however.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Wisdom,  Next: What FFTW Really Computes,  Prev: New-array Execute Functions,  Up: FFTW Reference
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								4.7 Wisdom
							 | 
						|||
| 
								 | 
							
								==========
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								This section documents the FFTW mechanism for saving and restoring plans
							 | 
						|||
| 
								 | 
							
								from disk.  This mechanism is called "wisdom".
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								* Menu:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								* Wisdom Export::
							 | 
						|||
| 
								 | 
							
								* Wisdom Import::
							 | 
						|||
| 
								 | 
							
								* Forgetting Wisdom::
							 | 
						|||
| 
								 | 
							
								* Wisdom Utilities::
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Wisdom Export,  Next: Wisdom Import,  Prev: Wisdom,  Up: Wisdom
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								4.7.1 Wisdom Export
							 | 
						|||
| 
								 | 
							
								-------------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     int fftw_export_wisdom_to_filename(const char *filename);
							 | 
						|||
| 
								 | 
							
								     void fftw_export_wisdom_to_file(FILE *output_file);
							 | 
						|||
| 
								 | 
							
								     char *fftw_export_wisdom_to_string(void);
							 | 
						|||
| 
								 | 
							
								     void fftw_export_wisdom(void (*write_char)(char c, void *), void *data);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   These functions allow you to export all currently accumulated wisdom
							 | 
						|||
| 
								 | 
							
								in a form from which it can be later imported and restored, even during
							 | 
						|||
| 
								 | 
							
								a separate run of the program.  (*Note Words of Wisdom-Saving Plans::.)
							 | 
						|||
| 
								 | 
							
								The current store of wisdom is not affected by calling any of these
							 | 
						|||
| 
								 | 
							
								routines.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   'fftw_export_wisdom' exports the wisdom to any output medium, as
							 | 
						|||
| 
								 | 
							
								specified by the callback function 'write_char'.  'write_char' is a
							 | 
						|||
| 
								 | 
							
								'putc'-like function that writes the character 'c' to some output; its
							 | 
						|||
| 
								 | 
							
								second parameter is the 'data' pointer passed to 'fftw_export_wisdom'.
							 | 
						|||
| 
								 | 
							
								For convenience, the following three "wrapper" routines are provided:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   'fftw_export_wisdom_to_filename' writes wisdom to a file named
							 | 
						|||
| 
								 | 
							
								'filename' (which is created or overwritten), returning '1' on success
							 | 
						|||
| 
								 | 
							
								and '0' on failure.  A lower-level function, which requires you to open
							 | 
						|||
| 
								 | 
							
								and close the file yourself (e.g.  if you want to write wisdom to a
							 | 
						|||
| 
								 | 
							
								portion of a larger file) is 'fftw_export_wisdom_to_file'.  This writes
							 | 
						|||
| 
								 | 
							
								the wisdom to the current position in 'output_file', which should be
							 | 
						|||
| 
								 | 
							
								open with write permission; upon exit, the file remains open and is
							 | 
						|||
| 
								 | 
							
								positioned at the end of the wisdom data.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   'fftw_export_wisdom_to_string' returns a pointer to a
							 | 
						|||
| 
								 | 
							
								'NULL'-terminated string holding the wisdom data.  This string is
							 | 
						|||
| 
								 | 
							
								dynamically allocated, and it is the responsibility of the caller to
							 | 
						|||
| 
								 | 
							
								deallocate it with 'free' when it is no longer needed.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   All of these routines export the wisdom in the same format, which we
							 | 
						|||
| 
								 | 
							
								will not document here except to say that it is LISP-like ASCII text
							 | 
						|||
| 
								 | 
							
								that is insensitive to white space.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Wisdom Import,  Next: Forgetting Wisdom,  Prev: Wisdom Export,  Up: Wisdom
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								4.7.2 Wisdom Import
							 | 
						|||
| 
								 | 
							
								-------------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     int fftw_import_system_wisdom(void);
							 | 
						|||
| 
								 | 
							
								     int fftw_import_wisdom_from_filename(const char *filename);
							 | 
						|||
| 
								 | 
							
								     int fftw_import_wisdom_from_string(const char *input_string);
							 | 
						|||
| 
								 | 
							
								     int fftw_import_wisdom(int (*read_char)(void *), void *data);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   These functions import wisdom into a program from data stored by the
							 | 
						|||
| 
								 | 
							
								'fftw_export_wisdom' functions above.  (*Note Words of Wisdom-Saving
							 | 
						|||
| 
								 | 
							
								Plans::.)  The imported wisdom replaces any wisdom already accumulated
							 | 
						|||
| 
								 | 
							
								by the running program.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   'fftw_import_wisdom' imports wisdom from any input medium, as
							 | 
						|||
| 
								 | 
							
								specified by the callback function 'read_char'.  'read_char' is a
							 | 
						|||
| 
								 | 
							
								'getc'-like function that returns the next character in the input; its
							 | 
						|||
| 
								 | 
							
								parameter is the 'data' pointer passed to 'fftw_import_wisdom'.  If the
							 | 
						|||
| 
								 | 
							
								end of the input data is reached (which should never happen for valid
							 | 
						|||
| 
								 | 
							
								data), 'read_char' should return 'EOF' (as defined in '<stdio.h>').  For
							 | 
						|||
| 
								 | 
							
								convenience, the following three "wrapper" routines are provided:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   'fftw_import_wisdom_from_filename' reads wisdom from a file named
							 | 
						|||
| 
								 | 
							
								'filename'.  A lower-level function, which requires you to open and
							 | 
						|||
| 
								 | 
							
								close the file yourself (e.g.  if you want to read wisdom from a portion
							 | 
						|||
| 
								 | 
							
								of a larger file) is 'fftw_import_wisdom_from_file'.  This reads wisdom
							 | 
						|||
| 
								 | 
							
								from the current position in 'input_file' (which should be open with
							 | 
						|||
| 
								 | 
							
								read permission); upon exit, the file remains open, but the position of
							 | 
						|||
| 
								 | 
							
								the read pointer is unspecified.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   'fftw_import_wisdom_from_string' reads wisdom from the
							 | 
						|||
| 
								 | 
							
								'NULL'-terminated string 'input_string'.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   'fftw_import_system_wisdom' reads wisdom from an
							 | 
						|||
| 
								 | 
							
								implementation-defined standard file ('/etc/fftw/wisdom' on Unix and GNU
							 | 
						|||
| 
								 | 
							
								systems).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The return value of these import routines is '1' if the wisdom was
							 | 
						|||
| 
								 | 
							
								read successfully and '0' otherwise.  Note that, in all of these
							 | 
						|||
| 
								 | 
							
								functions, any data in the input stream past the end of the wisdom data
							 | 
						|||
| 
								 | 
							
								is simply ignored.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Forgetting Wisdom,  Next: Wisdom Utilities,  Prev: Wisdom Import,  Up: Wisdom
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								4.7.3 Forgetting Wisdom
							 | 
						|||
| 
								 | 
							
								-----------------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     void fftw_forget_wisdom(void);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Calling 'fftw_forget_wisdom' causes all accumulated 'wisdom' to be
							 | 
						|||
| 
								 | 
							
								discarded and its associated memory to be freed.  (New 'wisdom' can
							 | 
						|||
| 
								 | 
							
								still be gathered subsequently, however.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Wisdom Utilities,  Prev: Forgetting Wisdom,  Up: Wisdom
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								4.7.4 Wisdom Utilities
							 | 
						|||
| 
								 | 
							
								----------------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								FFTW includes two standalone utility programs that deal with wisdom.  We
							 | 
						|||
| 
								 | 
							
								merely summarize them here, since they come with their own 'man' pages
							 | 
						|||
| 
								 | 
							
								for Unix and GNU systems (with HTML versions on our web site).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The first program is 'fftw-wisdom' (or 'fftwf-wisdom' in single
							 | 
						|||
| 
								 | 
							
								precision, etcetera), which can be used to create a wisdom file
							 | 
						|||
| 
								 | 
							
								containing plans for any of the transform sizes and types supported by
							 | 
						|||
| 
								 | 
							
								FFTW. It is preferable to create wisdom directly from your executable
							 | 
						|||
| 
								 | 
							
								(*note Caveats in Using Wisdom::), but this program is useful for
							 | 
						|||
| 
								 | 
							
								creating global wisdom files for 'fftw_import_system_wisdom'.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The second program is 'fftw-wisdom-to-conf', which takes a wisdom
							 | 
						|||
| 
								 | 
							
								file as input and produces a "configuration routine" as output.  The
							 | 
						|||
| 
								 | 
							
								latter is a C subroutine that you can compile and link into your
							 | 
						|||
| 
								 | 
							
								program, replacing a routine of the same name in the FFTW library, that
							 | 
						|||
| 
								 | 
							
								determines which parts of FFTW are callable by your program.
							 | 
						|||
| 
								 | 
							
								'fftw-wisdom-to-conf' produces a configuration routine that links to
							 | 
						|||
| 
								 | 
							
								only those parts of FFTW needed by the saved plans in the wisdom,
							 | 
						|||
| 
								 | 
							
								greatly reducing the size of statically linked executables (which should
							 | 
						|||
| 
								 | 
							
								only attempt to create plans corresponding to those in the wisdom,
							 | 
						|||
| 
								 | 
							
								however).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: What FFTW Really Computes,  Prev: Wisdom,  Up: FFTW Reference
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								4.8 What FFTW Really Computes
							 | 
						|||
| 
								 | 
							
								=============================
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								In this section, we provide precise mathematical definitions for the
							 | 
						|||
| 
								 | 
							
								transforms that FFTW computes.  These transform definitions are fairly
							 | 
						|||
| 
								 | 
							
								standard, but some authors follow slightly different conventions for the
							 | 
						|||
| 
								 | 
							
								normalization of the transform (the constant factor in front) and the
							 | 
						|||
| 
								 | 
							
								sign of the complex exponent.  We begin by presenting the
							 | 
						|||
| 
								 | 
							
								one-dimensional (1d) transform definitions, and then give the
							 | 
						|||
| 
								 | 
							
								straightforward extension to multi-dimensional transforms.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								* Menu:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								* The 1d Discrete Fourier Transform (DFT)::
							 | 
						|||
| 
								 | 
							
								* The 1d Real-data DFT::
							 | 
						|||
| 
								 | 
							
								* 1d Real-even DFTs (DCTs)::
							 | 
						|||
| 
								 | 
							
								* 1d Real-odd DFTs (DSTs)::
							 | 
						|||
| 
								 | 
							
								* 1d Discrete Hartley Transforms (DHTs)::
							 | 
						|||
| 
								 | 
							
								* Multi-dimensional Transforms::
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: The 1d Discrete Fourier Transform (DFT),  Next: The 1d Real-data DFT,  Prev: What FFTW Really Computes,  Up: What FFTW Really Computes
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								4.8.1 The 1d Discrete Fourier Transform (DFT)
							 | 
						|||
| 
								 | 
							
								---------------------------------------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								The forward ('FFTW_FORWARD') discrete Fourier transform (DFT) of a 1d
							 | 
						|||
| 
								 | 
							
								complex array X of size n computes an array Y, where:
							 | 
						|||
| 
								 | 
							
								 Y[k] = sum for j = 0 to (n - 1) of X[j] * exp(-2 pi j k sqrt(-1)/n) .
							 | 
						|||
| 
								 | 
							
								   The backward ('FFTW_BACKWARD') DFT computes:
							 | 
						|||
| 
								 | 
							
								 Y[k] = sum for j = 0 to (n - 1) of X[j] * exp(2 pi j k sqrt(-1)/n) .
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   FFTW computes an unnormalized transform, in that there is no
							 | 
						|||
| 
								 | 
							
								coefficient in front of the summation in the DFT. In other words,
							 | 
						|||
| 
								 | 
							
								applying the forward and then the backward transform will multiply the
							 | 
						|||
| 
								 | 
							
								input by n.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   From above, an 'FFTW_FORWARD' transform corresponds to a sign of -1
							 | 
						|||
| 
								 | 
							
								in the exponent of the DFT. Note also that we use the standard
							 | 
						|||
| 
								 | 
							
								"in-order" output ordering--the k-th output corresponds to the frequency
							 | 
						|||
| 
								 | 
							
								k/n (or k/T, where T is your total sampling period).  For those who like
							 | 
						|||
| 
								 | 
							
								to think in terms of positive and negative frequencies, this means that
							 | 
						|||
| 
								 | 
							
								the positive frequencies are stored in the first half of the output and
							 | 
						|||
| 
								 | 
							
								the negative frequencies are stored in backwards order in the second
							 | 
						|||
| 
								 | 
							
								half of the output.  (The frequency -k/n is the same as the frequency
							 | 
						|||
| 
								 | 
							
								(n-k)/n.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: The 1d Real-data DFT,  Next: 1d Real-even DFTs (DCTs),  Prev: The 1d Discrete Fourier Transform (DFT),  Up: What FFTW Really Computes
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								4.8.2 The 1d Real-data DFT
							 | 
						|||
| 
								 | 
							
								--------------------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								The real-input (r2c) DFT in FFTW computes the _forward_ transform Y of
							 | 
						|||
| 
								 | 
							
								the size 'n' real array X, exactly as defined above, i.e.
							 | 
						|||
| 
								 | 
							
								 Y[k] = sum for j = 0 to (n - 1) of X[j] * exp(-2 pi j k sqrt(-1)/n) .
							 | 
						|||
| 
								 | 
							
								   This output array Y can easily be shown to possess the "Hermitian"
							 | 
						|||
| 
								 | 
							
								symmetry Y[k] = Y[n-k]*, where we take Y to be periodic so that Y[n] =
							 | 
						|||
| 
								 | 
							
								Y[0].
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   As a result of this symmetry, half of the output Y is redundant
							 | 
						|||
| 
								 | 
							
								(being the complex conjugate of the other half), and so the 1d r2c
							 | 
						|||
| 
								 | 
							
								transforms only output elements 0...n/2 of Y (n/2+1 complex numbers),
							 | 
						|||
| 
								 | 
							
								where the division by 2 is rounded down.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Moreover, the Hermitian symmetry implies that Y[0] and, if n is even,
							 | 
						|||
| 
								 | 
							
								the Y[n/2] element, are purely real.  So, for the 'R2HC' r2r transform,
							 | 
						|||
| 
								 | 
							
								the halfcomplex format does not store the imaginary parts of these
							 | 
						|||
| 
								 | 
							
								elements.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The c2r and 'H2RC' r2r transforms compute the backward DFT of the
							 | 
						|||
| 
								 | 
							
								_complex_ array X with Hermitian symmetry, stored in the r2c/'R2HC'
							 | 
						|||
| 
								 | 
							
								output formats, respectively, where the backward transform is defined
							 | 
						|||
| 
								 | 
							
								exactly as for the complex case:
							 | 
						|||
| 
								 | 
							
								 Y[k] = sum for j = 0 to (n - 1) of X[j] * exp(2 pi j k sqrt(-1)/n) .
							 | 
						|||
| 
								 | 
							
								   The outputs 'Y' of this transform can easily be seen to be purely
							 | 
						|||
| 
								 | 
							
								real, and are stored as an array of real numbers.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Like FFTW's complex DFT, these transforms are unnormalized.  In other
							 | 
						|||
| 
								 | 
							
								words, applying the real-to-complex (forward) and then the
							 | 
						|||
| 
								 | 
							
								complex-to-real (backward) transform will multiply the input by n.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: 1d Real-even DFTs (DCTs),  Next: 1d Real-odd DFTs (DSTs),  Prev: The 1d Real-data DFT,  Up: What FFTW Really Computes
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								4.8.3 1d Real-even DFTs (DCTs)
							 | 
						|||
| 
								 | 
							
								------------------------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								The Real-even symmetry DFTs in FFTW are exactly equivalent to the
							 | 
						|||
| 
								 | 
							
								unnormalized forward (and backward) DFTs as defined above, where the
							 | 
						|||
| 
								 | 
							
								input array X of length N is purely real and is also "even" symmetry.
							 | 
						|||
| 
								 | 
							
								In this case, the output array is likewise real and even symmetry.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   For the case of 'REDFT00', this even symmetry means that X[j] =
							 | 
						|||
| 
								 | 
							
								X[N-j], where we take X to be periodic so that X[N] = X[0].  Because of
							 | 
						|||
| 
								 | 
							
								this redundancy, only the first n real numbers are actually stored,
							 | 
						|||
| 
								 | 
							
								where N = 2(n-1).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The proper definition of even symmetry for 'REDFT10', 'REDFT01', and
							 | 
						|||
| 
								 | 
							
								'REDFT11' transforms is somewhat more intricate because of the shifts by
							 | 
						|||
| 
								 | 
							
								1/2 of the input and/or output, although the corresponding boundary
							 | 
						|||
| 
								 | 
							
								conditions are given in *note Real even/odd DFTs (cosine/sine
							 | 
						|||
| 
								 | 
							
								transforms)::.  Because of the even symmetry, however, the sine terms in
							 | 
						|||
| 
								 | 
							
								the DFT all cancel and the remaining cosine terms are written explicitly
							 | 
						|||
| 
								 | 
							
								below.  This formulation often leads people to call such a transform a
							 | 
						|||
| 
								 | 
							
								"discrete cosine transform" (DCT), although it is really just a special
							 | 
						|||
| 
								 | 
							
								case of the DFT.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   In each of the definitions below, we transform a real array X of
							 | 
						|||
| 
								 | 
							
								length n to a real array Y of length n:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								REDFT00 (DCT-I)
							 | 
						|||
| 
								 | 
							
								...............
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								An 'REDFT00' transform (type-I DCT) in FFTW is defined by: Y[k] = X[0] +
							 | 
						|||
| 
								 | 
							
								(-1)^k X[n-1] + 2 (sum for j = 1 to n-2 of X[j] cos(pi jk /(n-1))).
							 | 
						|||
| 
								 | 
							
								Note that this transform is not defined for n=1.  For n=2, the summation
							 | 
						|||
| 
								 | 
							
								term above is dropped as you might expect.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								REDFT10 (DCT-II)
							 | 
						|||
| 
								 | 
							
								................
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								An 'REDFT10' transform (type-II DCT, sometimes called "the" DCT) in FFTW
							 | 
						|||
| 
								 | 
							
								is defined by: Y[k] = 2 (sum for j = 0 to n-1 of X[j] cos(pi (j+1/2) k /
							 | 
						|||
| 
								 | 
							
								n)).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								REDFT01 (DCT-III)
							 | 
						|||
| 
								 | 
							
								.................
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								An 'REDFT01' transform (type-III DCT) in FFTW is defined by: Y[k] = X[0]
							 | 
						|||
| 
								 | 
							
								+ 2 (sum for j = 1 to n-1 of X[j] cos(pi j (k+1/2) / n)).  In the case
							 | 
						|||
| 
								 | 
							
								of n=1, this reduces to Y[0] = X[0].  Up to a scale factor (see below),
							 | 
						|||
| 
								 | 
							
								this is the inverse of 'REDFT10' ("the" DCT), and so the 'REDFT01'
							 | 
						|||
| 
								 | 
							
								(DCT-III) is sometimes called the "IDCT".
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								REDFT11 (DCT-IV)
							 | 
						|||
| 
								 | 
							
								................
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								An 'REDFT11' transform (type-IV DCT) in FFTW is defined by: Y[k] = 2
							 | 
						|||
| 
								 | 
							
								(sum for j = 0 to n-1 of X[j] cos(pi (j+1/2) (k+1/2) / n)).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								Inverses and Normalization
							 | 
						|||
| 
								 | 
							
								..........................
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								These definitions correspond directly to the unnormalized DFTs used
							 | 
						|||
| 
								 | 
							
								elsewhere in FFTW (hence the factors of 2 in front of the summations).
							 | 
						|||
| 
								 | 
							
								The unnormalized inverse of 'REDFT00' is 'REDFT00', of 'REDFT10' is
							 | 
						|||
| 
								 | 
							
								'REDFT01' and vice versa, and of 'REDFT11' is 'REDFT11'.  Each
							 | 
						|||
| 
								 | 
							
								unnormalized inverse results in the original array multiplied by N,
							 | 
						|||
| 
								 | 
							
								where N is the _logical_ DFT size.  For 'REDFT00', N=2(n-1) (note that
							 | 
						|||
| 
								 | 
							
								n=1 is not defined); otherwise, N=2n.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   In defining the discrete cosine transform, some authors also include
							 | 
						|||
| 
								 | 
							
								additional factors of sqrt(2) (or its inverse) multiplying selected
							 | 
						|||
| 
								 | 
							
								inputs and/or outputs.  This is a mostly cosmetic change that makes the
							 | 
						|||
| 
								 | 
							
								transform orthogonal, but sacrifices the direct equivalence to a
							 | 
						|||
| 
								 | 
							
								symmetric DFT.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: 1d Real-odd DFTs (DSTs),  Next: 1d Discrete Hartley Transforms (DHTs),  Prev: 1d Real-even DFTs (DCTs),  Up: What FFTW Really Computes
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								4.8.4 1d Real-odd DFTs (DSTs)
							 | 
						|||
| 
								 | 
							
								-----------------------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								The Real-odd symmetry DFTs in FFTW are exactly equivalent to the
							 | 
						|||
| 
								 | 
							
								unnormalized forward (and backward) DFTs as defined above, where the
							 | 
						|||
| 
								 | 
							
								input array X of length N is purely real and is also "odd" symmetry.  In
							 | 
						|||
| 
								 | 
							
								this case, the output is odd symmetry and purely imaginary.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   For the case of 'RODFT00', this odd symmetry means that X[j] =
							 | 
						|||
| 
								 | 
							
								-X[N-j], where we take X to be periodic so that X[N] = X[0].  Because of
							 | 
						|||
| 
								 | 
							
								this redundancy, only the first n real numbers starting at j=1 are
							 | 
						|||
| 
								 | 
							
								actually stored (the j=0 element is zero), where N = 2(n+1).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The proper definition of odd symmetry for 'RODFT10', 'RODFT01', and
							 | 
						|||
| 
								 | 
							
								'RODFT11' transforms is somewhat more intricate because of the shifts by
							 | 
						|||
| 
								 | 
							
								1/2 of the input and/or output, although the corresponding boundary
							 | 
						|||
| 
								 | 
							
								conditions are given in *note Real even/odd DFTs (cosine/sine
							 | 
						|||
| 
								 | 
							
								transforms)::.  Because of the odd symmetry, however, the cosine terms
							 | 
						|||
| 
								 | 
							
								in the DFT all cancel and the remaining sine terms are written
							 | 
						|||
| 
								 | 
							
								explicitly below.  This formulation often leads people to call such a
							 | 
						|||
| 
								 | 
							
								transform a "discrete sine transform" (DST), although it is really just
							 | 
						|||
| 
								 | 
							
								a special case of the DFT.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   In each of the definitions below, we transform a real array X of
							 | 
						|||
| 
								 | 
							
								length n to a real array Y of length n:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								RODFT00 (DST-I)
							 | 
						|||
| 
								 | 
							
								...............
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								An 'RODFT00' transform (type-I DST) in FFTW is defined by: Y[k] = 2 (sum
							 | 
						|||
| 
								 | 
							
								for j = 0 to n-1 of X[j] sin(pi (j+1)(k+1) / (n+1))).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								RODFT10 (DST-II)
							 | 
						|||
| 
								 | 
							
								................
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								An 'RODFT10' transform (type-II DST) in FFTW is defined by: Y[k] = 2
							 | 
						|||
| 
								 | 
							
								(sum for j = 0 to n-1 of X[j] sin(pi (j+1/2) (k+1) / n)).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								RODFT01 (DST-III)
							 | 
						|||
| 
								 | 
							
								.................
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								An 'RODFT01' transform (type-III DST) in FFTW is defined by: Y[k] =
							 | 
						|||
| 
								 | 
							
								(-1)^k X[n-1] + 2 (sum for j = 0 to n-2 of X[j] sin(pi (j+1) (k+1/2) /
							 | 
						|||
| 
								 | 
							
								n)).  In the case of n=1, this reduces to Y[0] = X[0].
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								RODFT11 (DST-IV)
							 | 
						|||
| 
								 | 
							
								................
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								An 'RODFT11' transform (type-IV DST) in FFTW is defined by: Y[k] = 2
							 | 
						|||
| 
								 | 
							
								(sum for j = 0 to n-1 of X[j] sin(pi (j+1/2) (k+1/2) / n)).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								Inverses and Normalization
							 | 
						|||
| 
								 | 
							
								..........................
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								These definitions correspond directly to the unnormalized DFTs used
							 | 
						|||
| 
								 | 
							
								elsewhere in FFTW (hence the factors of 2 in front of the summations).
							 | 
						|||
| 
								 | 
							
								The unnormalized inverse of 'RODFT00' is 'RODFT00', of 'RODFT10' is
							 | 
						|||
| 
								 | 
							
								'RODFT01' and vice versa, and of 'RODFT11' is 'RODFT11'.  Each
							 | 
						|||
| 
								 | 
							
								unnormalized inverse results in the original array multiplied by N,
							 | 
						|||
| 
								 | 
							
								where N is the _logical_ DFT size.  For 'RODFT00', N=2(n+1); otherwise,
							 | 
						|||
| 
								 | 
							
								N=2n.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   In defining the discrete sine transform, some authors also include
							 | 
						|||
| 
								 | 
							
								additional factors of sqrt(2) (or its inverse) multiplying selected
							 | 
						|||
| 
								 | 
							
								inputs and/or outputs.  This is a mostly cosmetic change that makes the
							 | 
						|||
| 
								 | 
							
								transform orthogonal, but sacrifices the direct equivalence to an
							 | 
						|||
| 
								 | 
							
								antisymmetric DFT.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: 1d Discrete Hartley Transforms (DHTs),  Next: Multi-dimensional Transforms,  Prev: 1d Real-odd DFTs (DSTs),  Up: What FFTW Really Computes
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								4.8.5 1d Discrete Hartley Transforms (DHTs)
							 | 
						|||
| 
								 | 
							
								-------------------------------------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								The discrete Hartley transform (DHT) of a 1d real array X of size n
							 | 
						|||
| 
								 | 
							
								computes a real array Y of the same size, where:
							 | 
						|||
| 
								 | 
							
								Y[k] = sum for j = 0 to (n - 1) of X[j] * [cos(2 pi j k / n) + sin(2 pi j k / n)].
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   FFTW computes an unnormalized transform, in that there is no
							 | 
						|||
| 
								 | 
							
								coefficient in front of the summation in the DHT. In other words,
							 | 
						|||
| 
								 | 
							
								applying the transform twice (the DHT is its own inverse) will multiply
							 | 
						|||
| 
								 | 
							
								the input by n.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Multi-dimensional Transforms,  Prev: 1d Discrete Hartley Transforms (DHTs),  Up: What FFTW Really Computes
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								4.8.6 Multi-dimensional Transforms
							 | 
						|||
| 
								 | 
							
								----------------------------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								The multi-dimensional transforms of FFTW, in general, compute simply the
							 | 
						|||
| 
								 | 
							
								separable product of the given 1d transform along each dimension of the
							 | 
						|||
| 
								 | 
							
								array.  Since each of these transforms is unnormalized, computing the
							 | 
						|||
| 
								 | 
							
								forward followed by the backward/inverse multi-dimensional transform
							 | 
						|||
| 
								 | 
							
								will result in the original array scaled by the product of the
							 | 
						|||
| 
								 | 
							
								normalization factors for each dimension (e.g.  the product of the
							 | 
						|||
| 
								 | 
							
								dimension sizes, for a multi-dimensional DFT).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The definition of FFTW's multi-dimensional DFT of real data (r2c)
							 | 
						|||
| 
								 | 
							
								deserves special attention.  In this case, we logically compute the full
							 | 
						|||
| 
								 | 
							
								multi-dimensional DFT of the input data; since the input data are purely
							 | 
						|||
| 
								 | 
							
								real, the output data have the Hermitian symmetry and therefore only one
							 | 
						|||
| 
								 | 
							
								non-redundant half need be stored.  More specifically, for an n[0] x
							 | 
						|||
| 
								 | 
							
								n[1] x n[2] x ...  x n[d-1] multi-dimensional real-input DFT, the full
							 | 
						|||
| 
								 | 
							
								(logical) complex output array Y[k[0], k[1], ..., k[d-1]] has the
							 | 
						|||
| 
								 | 
							
								symmetry: Y[k[0], k[1], ..., k[d-1]] = Y[n[0] - k[0], n[1] - k[1], ...,
							 | 
						|||
| 
								 | 
							
								n[d-1] - k[d-1]]* (where each dimension is periodic).  Because of this
							 | 
						|||
| 
								 | 
							
								symmetry, we only store the k[d-1] = 0...n[d-1]/2 elements of the _last_
							 | 
						|||
| 
								 | 
							
								dimension (division by 2 is rounded down).  (We could instead have cut
							 | 
						|||
| 
								 | 
							
								any other dimension in half, but the last dimension proved
							 | 
						|||
| 
								 | 
							
								computationally convenient.)  This results in the peculiar array format
							 | 
						|||
| 
								 | 
							
								described in more detail by *note Real-data DFT Array Format::.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The multi-dimensional c2r transform is simply the unnormalized
							 | 
						|||
| 
								 | 
							
								inverse of the r2c transform.  i.e.  it is the same as FFTW's complex
							 | 
						|||
| 
								 | 
							
								backward multi-dimensional DFT, operating on a Hermitian input array in
							 | 
						|||
| 
								 | 
							
								the peculiar format mentioned above and outputting a real array (since
							 | 
						|||
| 
								 | 
							
								the DFT output is purely real).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   We should remind the user that the separable product of 1d transforms
							 | 
						|||
| 
								 | 
							
								along each dimension, as computed by FFTW, is not always the same thing
							 | 
						|||
| 
								 | 
							
								as the usual multi-dimensional transform.  A multi-dimensional 'R2HC'
							 | 
						|||
| 
								 | 
							
								(or 'HC2R') transform is not identical to the multi-dimensional DFT,
							 | 
						|||
| 
								 | 
							
								requiring some post-processing to combine the requisite real and
							 | 
						|||
| 
								 | 
							
								imaginary parts, as was described in *note The Halfcomplex-format DFT::.
							 | 
						|||
| 
								 | 
							
								Likewise, FFTW's multidimensional 'FFTW_DHT' r2r transform is not the
							 | 
						|||
| 
								 | 
							
								same thing as the logical multi-dimensional discrete Hartley transform
							 | 
						|||
| 
								 | 
							
								defined in the literature, as discussed in *note The Discrete Hartley
							 | 
						|||
| 
								 | 
							
								Transform::.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Multi-threaded FFTW,  Next: Distributed-memory FFTW with MPI,  Prev: FFTW Reference,  Up: Top
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								5 Multi-threaded FFTW
							 | 
						|||
| 
								 | 
							
								*********************
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								In this chapter we document the parallel FFTW routines for shared-memory
							 | 
						|||
| 
								 | 
							
								parallel hardware.  These routines, which support parallel one- and
							 | 
						|||
| 
								 | 
							
								multi-dimensional transforms of both real and complex data, are the
							 | 
						|||
| 
								 | 
							
								easiest way to take advantage of multiple processors with FFTW. They
							 | 
						|||
| 
								 | 
							
								work just like the corresponding uniprocessor transform routines, except
							 | 
						|||
| 
								 | 
							
								that you have an extra initialization routine to call, and there is a
							 | 
						|||
| 
								 | 
							
								routine to set the number of threads to employ.  Any program that uses
							 | 
						|||
| 
								 | 
							
								the uniprocessor FFTW can therefore be trivially modified to use the
							 | 
						|||
| 
								 | 
							
								multi-threaded FFTW.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   A shared-memory machine is one in which all CPUs can directly access
							 | 
						|||
| 
								 | 
							
								the same main memory, and such machines are now common due to the
							 | 
						|||
| 
								 | 
							
								ubiquity of multi-core CPUs.  FFTW's multi-threading support allows you
							 | 
						|||
| 
								 | 
							
								to utilize these additional CPUs transparently from a single program.
							 | 
						|||
| 
								 | 
							
								However, this does not necessarily translate into performance
							 | 
						|||
| 
								 | 
							
								gains--when multiple threads/CPUs are employed, there is an overhead
							 | 
						|||
| 
								 | 
							
								required for synchronization that may outweigh the computatational
							 | 
						|||
| 
								 | 
							
								parallelism.  Therefore, you can only benefit from threads if your
							 | 
						|||
| 
								 | 
							
								problem is sufficiently large.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								* Menu:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								* Installation and Supported Hardware/Software::
							 | 
						|||
| 
								 | 
							
								* Usage of Multi-threaded FFTW::
							 | 
						|||
| 
								 | 
							
								* How Many Threads to Use?::
							 | 
						|||
| 
								 | 
							
								* Thread safety::
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Installation and Supported Hardware/Software,  Next: Usage of Multi-threaded FFTW,  Prev: Multi-threaded FFTW,  Up: Multi-threaded FFTW
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								5.1 Installation and Supported Hardware/Software
							 | 
						|||
| 
								 | 
							
								================================================
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								All of the FFTW threads code is located in the 'threads' subdirectory of
							 | 
						|||
| 
								 | 
							
								the FFTW package.  On Unix systems, the FFTW threads libraries and
							 | 
						|||
| 
								 | 
							
								header files can be automatically configured, compiled, and installed
							 | 
						|||
| 
								 | 
							
								along with the uniprocessor FFTW libraries simply by including
							 | 
						|||
| 
								 | 
							
								'--enable-threads' in the flags to the 'configure' script (*note
							 | 
						|||
| 
								 | 
							
								Installation on Unix::), or '--enable-openmp' to use OpenMP
							 | 
						|||
| 
								 | 
							
								(http://www.openmp.org) threads.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The threads routines require your operating system to have some sort
							 | 
						|||
| 
								 | 
							
								of shared-memory threads support.  Specifically, the FFTW threads
							 | 
						|||
| 
								 | 
							
								package works with POSIX threads (available on most Unix variants, from
							 | 
						|||
| 
								 | 
							
								GNU/Linux to MacOS X) and Win32 threads.  OpenMP threads, which are
							 | 
						|||
| 
								 | 
							
								supported in many common compilers (e.g.  gcc) are also supported, and
							 | 
						|||
| 
								 | 
							
								may give better performance on some systems.  (OpenMP threads are also
							 | 
						|||
| 
								 | 
							
								useful if you are employing OpenMP in your own code, in order to
							 | 
						|||
| 
								 | 
							
								minimize conflicts between threading models.)  If you have a
							 | 
						|||
| 
								 | 
							
								shared-memory machine that uses a different threads API, it should be a
							 | 
						|||
| 
								 | 
							
								simple matter of programming to include support for it; see the file
							 | 
						|||
| 
								 | 
							
								'threads/threads.c' for more detail.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   You can compile FFTW with _both_ '--enable-threads' and
							 | 
						|||
| 
								 | 
							
								'--enable-openmp' at the same time, since they install libraries with
							 | 
						|||
| 
								 | 
							
								different names ('fftw3_threads' and 'fftw3_omp', as described below).
							 | 
						|||
| 
								 | 
							
								However, your programs may only link to _one_ of these two libraries at
							 | 
						|||
| 
								 | 
							
								a time.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Ideally, of course, you should also have multiple processors in order
							 | 
						|||
| 
								 | 
							
								to get any benefit from the threaded transforms.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Usage of Multi-threaded FFTW,  Next: How Many Threads to Use?,  Prev: Installation and Supported Hardware/Software,  Up: Multi-threaded FFTW
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								5.2 Usage of Multi-threaded FFTW
							 | 
						|||
| 
								 | 
							
								================================
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								Here, it is assumed that the reader is already familiar with the usage
							 | 
						|||
| 
								 | 
							
								of the uniprocessor FFTW routines, described elsewhere in this manual.
							 | 
						|||
| 
								 | 
							
								We only describe what one has to change in order to use the
							 | 
						|||
| 
								 | 
							
								multi-threaded routines.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   First, programs using the parallel complex transforms should be
							 | 
						|||
| 
								 | 
							
								linked with '-lfftw3_threads -lfftw3 -lm' on Unix, or '-lfftw3_omp
							 | 
						|||
| 
								 | 
							
								-lfftw3 -lm' if you compiled with OpenMP. You will also need to link
							 | 
						|||
| 
								 | 
							
								with whatever library is responsible for threads on your system (e.g.
							 | 
						|||
| 
								 | 
							
								'-lpthread' on GNU/Linux) or include whatever compiler flag enables
							 | 
						|||
| 
								 | 
							
								OpenMP (e.g.  '-fopenmp' with gcc).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Second, before calling _any_ FFTW routines, you should call the
							 | 
						|||
| 
								 | 
							
								function:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     int fftw_init_threads(void);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   This function, which need only be called once, performs any one-time
							 | 
						|||
| 
								 | 
							
								initialization required to use threads on your system.  It returns zero
							 | 
						|||
| 
								 | 
							
								if there was some error (which should not happen under normal
							 | 
						|||
| 
								 | 
							
								circumstances) and a non-zero value otherwise.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Third, before creating a plan that you want to parallelize, you
							 | 
						|||
| 
								 | 
							
								should call:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     void fftw_plan_with_nthreads(int nthreads);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The 'nthreads' argument indicates the number of threads you want FFTW
							 | 
						|||
| 
								 | 
							
								to use (or actually, the maximum number).  All plans subsequently
							 | 
						|||
| 
								 | 
							
								created with any planner routine will use that many threads.  You can
							 | 
						|||
| 
								 | 
							
								call 'fftw_plan_with_nthreads', create some plans, call
							 | 
						|||
| 
								 | 
							
								'fftw_plan_with_nthreads' again with a different argument, and create
							 | 
						|||
| 
								 | 
							
								some more plans for a new number of threads.  Plans already created
							 | 
						|||
| 
								 | 
							
								before a call to 'fftw_plan_with_nthreads' are unaffected.  If you pass
							 | 
						|||
| 
								 | 
							
								an 'nthreads' argument of '1' (the default), threads are disabled for
							 | 
						|||
| 
								 | 
							
								subsequent plans.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   You can determine the current number of threads that the planner can
							 | 
						|||
| 
								 | 
							
								use by calling:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     int fftw_planner_nthreads(void);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   With OpenMP, to configure FFTW to use all of the currently running
							 | 
						|||
| 
								 | 
							
								OpenMP threads (set by 'omp_set_num_threads(nthreads)' or by the
							 | 
						|||
| 
								 | 
							
								'OMP_NUM_THREADS' environment variable), you can do:
							 | 
						|||
| 
								 | 
							
								'fftw_plan_with_nthreads(omp_get_max_threads())'.  (The 'omp_' OpenMP
							 | 
						|||
| 
								 | 
							
								functions are declared via '#include <omp.h>'.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Given a plan, you then execute it as usual with 'fftw_execute(plan)',
							 | 
						|||
| 
								 | 
							
								and the execution will use the number of threads specified when the plan
							 | 
						|||
| 
								 | 
							
								was created.  When done, you destroy it as usual with
							 | 
						|||
| 
								 | 
							
								'fftw_destroy_plan'.  As described in *note Thread safety::, plan
							 | 
						|||
| 
								 | 
							
								_execution_ is thread-safe, but plan creation and destruction are _not_:
							 | 
						|||
| 
								 | 
							
								you should create/destroy plans only from a single thread, but can
							 | 
						|||
| 
								 | 
							
								safely execute multiple plans in parallel.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   There is one additional routine: if you want to get rid of all memory
							 | 
						|||
| 
								 | 
							
								and other resources allocated internally by FFTW, you can call:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     void fftw_cleanup_threads(void);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   which is much like the 'fftw_cleanup()' function except that it also
							 | 
						|||
| 
								 | 
							
								gets rid of threads-related data.  You must _not_ execute any previously
							 | 
						|||
| 
								 | 
							
								created plans after calling this function.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   We should also mention one other restriction: if you save wisdom from
							 | 
						|||
| 
								 | 
							
								a program using the multi-threaded FFTW, that wisdom _cannot be used_ by
							 | 
						|||
| 
								 | 
							
								a program using only the single-threaded FFTW (i.e.  not calling
							 | 
						|||
| 
								 | 
							
								'fftw_init_threads').  *Note Words of Wisdom-Saving Plans::.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Finally, FFTW provides a optional callback interface that allows you
							 | 
						|||
| 
								 | 
							
								to replace its parallel threading backend at runtime:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     void fftw_threads_set_callback(
							 | 
						|||
| 
								 | 
							
								         void (*parallel_loop)(void *(*work)(void *), char *jobdata, size_t elsize, int njobs, void *data),
							 | 
						|||
| 
								 | 
							
								         void *data);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   This routine (which is _not_ threadsafe and should generally be
							 | 
						|||
| 
								 | 
							
								called before creating any FFTW plans) allows you to provide a function
							 | 
						|||
| 
								 | 
							
								'parallel_loop' that executes parallel work for FFTW: it should call the
							 | 
						|||
| 
								 | 
							
								function 'work(jobdata + elsize*i)' for 'i' from '0' to 'njobs-1',
							 | 
						|||
| 
								 | 
							
								possibly in parallel.  (The 'data' pointer supplied to
							 | 
						|||
| 
								 | 
							
								'fftw_threads_set_callback' is passed through to your 'parallel_loop'
							 | 
						|||
| 
								 | 
							
								function.)  For example, if you link to an FFTW threads library built to
							 | 
						|||
| 
								 | 
							
								use POSIX threads, but you want it to use OpenMP instead (because you
							 | 
						|||
| 
								 | 
							
								are using OpenMP elsewhere in your program and want to avoid competing
							 | 
						|||
| 
								 | 
							
								threads), you can call 'fftw_threads_set_callback' with the callback
							 | 
						|||
| 
								 | 
							
								function:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     void parallel_loop(void *(*work)(char *), char *jobdata, size_t elsize, int njobs, void *data)
							 | 
						|||
| 
								 | 
							
								     {
							 | 
						|||
| 
								 | 
							
								     #pragma omp parallel for
							 | 
						|||
| 
								 | 
							
								         for (int i = 0; i < njobs; ++i)
							 | 
						|||
| 
								 | 
							
								             work(jobdata + elsize * i);
							 | 
						|||
| 
								 | 
							
								     }
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The same mechanism could be used in order to make FFTW use a
							 | 
						|||
| 
								 | 
							
								threading backend implemented via Intel TBB, Apple GCD, or Cilk, for
							 | 
						|||
| 
								 | 
							
								example.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: How Many Threads to Use?,  Next: Thread safety,  Prev: Usage of Multi-threaded FFTW,  Up: Multi-threaded FFTW
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								5.3 How Many Threads to Use?
							 | 
						|||
| 
								 | 
							
								============================
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								There is a fair amount of overhead involved in synchronizing threads, so
							 | 
						|||
| 
								 | 
							
								the optimal number of threads to use depends upon the size of the
							 | 
						|||
| 
								 | 
							
								transform as well as on the number of processors you have.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   As a general rule, you don't want to use more threads than you have
							 | 
						|||
| 
								 | 
							
								processors.  (Using more threads will work, but there will be extra
							 | 
						|||
| 
								 | 
							
								overhead with no benefit.)  In fact, if the problem size is too small,
							 | 
						|||
| 
								 | 
							
								you may want to use fewer threads than you have processors.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   You will have to experiment with your system to see what level of
							 | 
						|||
| 
								 | 
							
								parallelization is best for your problem size.  Typically, the problem
							 | 
						|||
| 
								 | 
							
								will have to involve at least a few thousand data points before threads
							 | 
						|||
| 
								 | 
							
								become beneficial.  If you plan with 'FFTW_PATIENT', it will
							 | 
						|||
| 
								 | 
							
								automatically disable threads for sizes that don't benefit from
							 | 
						|||
| 
								 | 
							
								parallelization.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Thread safety,  Prev: How Many Threads to Use?,  Up: Multi-threaded FFTW
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								5.4 Thread safety
							 | 
						|||
| 
								 | 
							
								=================
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								Users writing multi-threaded programs (including OpenMP) must concern
							 | 
						|||
| 
								 | 
							
								themselves with the "thread safety" of the libraries they use--that is,
							 | 
						|||
| 
								 | 
							
								whether it is safe to call routines in parallel from multiple threads.
							 | 
						|||
| 
								 | 
							
								FFTW can be used in such an environment, but some care must be taken
							 | 
						|||
| 
								 | 
							
								because the planner routines share data (e.g.  wisdom and trigonometric
							 | 
						|||
| 
								 | 
							
								tables) between calls and plans.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The upshot is that the only thread-safe routine in FFTW is
							 | 
						|||
| 
								 | 
							
								'fftw_execute' (and the new-array variants thereof).  All other routines
							 | 
						|||
| 
								 | 
							
								(e.g.  the planner) should only be called from one thread at a time.
							 | 
						|||
| 
								 | 
							
								So, for example, you can wrap a semaphore lock around any calls to the
							 | 
						|||
| 
								 | 
							
								planner; even more simply, you can just create all of your plans from
							 | 
						|||
| 
								 | 
							
								one thread.  We do not think this should be an important restriction
							 | 
						|||
| 
								 | 
							
								(FFTW is designed for the situation where the only performance-sensitive
							 | 
						|||
| 
								 | 
							
								code is the actual execution of the transform), and the benefits of
							 | 
						|||
| 
								 | 
							
								shared data between plans are great.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Note also that, since the plan is not modified by 'fftw_execute', it
							 | 
						|||
| 
								 | 
							
								is safe to execute the _same plan_ in parallel by multiple threads.
							 | 
						|||
| 
								 | 
							
								However, since a given plan operates by default on a fixed array, you
							 | 
						|||
| 
								 | 
							
								need to use one of the new-array execute functions (*note New-array
							 | 
						|||
| 
								 | 
							
								Execute Functions::) so that different threads compute the transform of
							 | 
						|||
| 
								 | 
							
								different data.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   (Users should note that these comments only apply to programs using
							 | 
						|||
| 
								 | 
							
								shared-memory threads or OpenMP. Parallelism using MPI or forked
							 | 
						|||
| 
								 | 
							
								processes involves a separate address-space and global variables for
							 | 
						|||
| 
								 | 
							
								each process, and is not susceptible to problems of this sort.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The FFTW planner is intended to be called from a single thread.  If
							 | 
						|||
| 
								 | 
							
								you really must call it from multiple threads, you are expected to grab
							 | 
						|||
| 
								 | 
							
								whatever lock makes sense for your application, with the understanding
							 | 
						|||
| 
								 | 
							
								that you may be holding that lock for a long time, which is undesirable.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Neither strategy works, however, in the following situation.  The
							 | 
						|||
| 
								 | 
							
								"application" is structured as a set of "plugins" which are unaware of
							 | 
						|||
| 
								 | 
							
								each other, and for whatever reason the "plugins" cannot coordinate on
							 | 
						|||
| 
								 | 
							
								grabbing the lock.  (This is not a technical problem, but an
							 | 
						|||
| 
								 | 
							
								organizational one.  The "plugins" are written by independent agents,
							 | 
						|||
| 
								 | 
							
								and from the perspective of each plugin's author, each plugin is using
							 | 
						|||
| 
								 | 
							
								FFTW correctly from a single thread.)  To cope with this situation,
							 | 
						|||
| 
								 | 
							
								starting from FFTW-3.3.5, FFTW supports an API to make the planner
							 | 
						|||
| 
								 | 
							
								thread-safe:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     void fftw_make_planner_thread_safe(void);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   This call operates by brute force: It just installs a hook that wraps
							 | 
						|||
| 
								 | 
							
								a lock (chosen by us) around all planner calls.  So there is no magic
							 | 
						|||
| 
								 | 
							
								and you get the worst of all worlds.  The planner is still
							 | 
						|||
| 
								 | 
							
								single-threaded, but you cannot choose which lock to use.  The planner
							 | 
						|||
| 
								 | 
							
								still holds the lock for a long time, but you cannot impose a timeout on
							 | 
						|||
| 
								 | 
							
								lock acquisition.  As of FFTW-3.3.5 and FFTW-3.3.6, this call does not
							 | 
						|||
| 
								 | 
							
								work when using OpenMP as threading substrate.  (Suggestions on what to
							 | 
						|||
| 
								 | 
							
								do about this bug are welcome.)  _Do not use
							 | 
						|||
| 
								 | 
							
								'fftw_make_planner_thread_safe' unless there is no other choice,_ such
							 | 
						|||
| 
								 | 
							
								as in the application/plugin situation.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Distributed-memory FFTW with MPI,  Next: Calling FFTW from Modern Fortran,  Prev: Multi-threaded FFTW,  Up: Top
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								6 Distributed-memory FFTW with MPI
							 | 
						|||
| 
								 | 
							
								**********************************
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								In this chapter we document the parallel FFTW routines for parallel
							 | 
						|||
| 
								 | 
							
								systems supporting the MPI message-passing interface.  Unlike the
							 | 
						|||
| 
								 | 
							
								shared-memory threads described in the previous chapter, MPI allows you
							 | 
						|||
| 
								 | 
							
								to use _distributed-memory_ parallelism, where each CPU has its own
							 | 
						|||
| 
								 | 
							
								separate memory, and which can scale up to clusters of many thousands of
							 | 
						|||
| 
								 | 
							
								processors.  This capability comes at a price, however: each process
							 | 
						|||
| 
								 | 
							
								only stores a _portion_ of the data to be transformed, which means that
							 | 
						|||
| 
								 | 
							
								the data structures and programming-interface are quite different from
							 | 
						|||
| 
								 | 
							
								the serial or threads versions of FFTW.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Distributed-memory parallelism is especially useful when you are
							 | 
						|||
| 
								 | 
							
								transforming arrays so large that they do not fit into the memory of a
							 | 
						|||
| 
								 | 
							
								single processor.  The storage per-process required by FFTW's MPI
							 | 
						|||
| 
								 | 
							
								routines is proportional to the total array size divided by the number
							 | 
						|||
| 
								 | 
							
								of processes.  Conversely, distributed-memory parallelism can easily
							 | 
						|||
| 
								 | 
							
								pose an unacceptably high communications overhead for small problems;
							 | 
						|||
| 
								 | 
							
								the threshold problem size for which parallelism becomes advantageous
							 | 
						|||
| 
								 | 
							
								will depend on the precise problem you are interested in, your hardware,
							 | 
						|||
| 
								 | 
							
								and your MPI implementation.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   A note on terminology: in MPI, you divide the data among a set of
							 | 
						|||
| 
								 | 
							
								"processes" which each run in their own memory address space.
							 | 
						|||
| 
								 | 
							
								Generally, each process runs on a different physical processor, but this
							 | 
						|||
| 
								 | 
							
								is not required.  A set of processes in MPI is described by an opaque
							 | 
						|||
| 
								 | 
							
								data structure called a "communicator," the most common of which is the
							 | 
						|||
| 
								 | 
							
								predefined communicator 'MPI_COMM_WORLD' which refers to _all_
							 | 
						|||
| 
								 | 
							
								processes.  For more information on these and other concepts common to
							 | 
						|||
| 
								 | 
							
								all MPI programs, we refer the reader to the documentation at the MPI
							 | 
						|||
| 
								 | 
							
								home page (http://www.mcs.anl.gov/research/projects/mpi/).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   We assume in this chapter that the reader is familiar with the usage
							 | 
						|||
| 
								 | 
							
								of the serial (uniprocessor) FFTW, and focus only on the concepts new to
							 | 
						|||
| 
								 | 
							
								the MPI interface.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								* Menu:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								* FFTW MPI Installation::
							 | 
						|||
| 
								 | 
							
								* Linking and Initializing MPI FFTW::
							 | 
						|||
| 
								 | 
							
								* 2d MPI example::
							 | 
						|||
| 
								 | 
							
								* MPI Data Distribution::
							 | 
						|||
| 
								 | 
							
								* Multi-dimensional MPI DFTs of Real Data::
							 | 
						|||
| 
								 | 
							
								* Other Multi-dimensional Real-data MPI Transforms::
							 | 
						|||
| 
								 | 
							
								* FFTW MPI Transposes::
							 | 
						|||
| 
								 | 
							
								* FFTW MPI Wisdom::
							 | 
						|||
| 
								 | 
							
								* Avoiding MPI Deadlocks::
							 | 
						|||
| 
								 | 
							
								* FFTW MPI Performance Tips::
							 | 
						|||
| 
								 | 
							
								* Combining MPI and Threads::
							 | 
						|||
| 
								 | 
							
								* FFTW MPI Reference::
							 | 
						|||
| 
								 | 
							
								* FFTW MPI Fortran Interface::
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: FFTW MPI Installation,  Next: Linking and Initializing MPI FFTW,  Prev: Distributed-memory FFTW with MPI,  Up: Distributed-memory FFTW with MPI
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								6.1 FFTW MPI Installation
							 | 
						|||
| 
								 | 
							
								=========================
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								All of the FFTW MPI code is located in the 'mpi' subdirectory of the
							 | 
						|||
| 
								 | 
							
								FFTW package.  On Unix systems, the FFTW MPI libraries and header files
							 | 
						|||
| 
								 | 
							
								are automatically configured, compiled, and installed along with the
							 | 
						|||
| 
								 | 
							
								uniprocessor FFTW libraries simply by including '--enable-mpi' in the
							 | 
						|||
| 
								 | 
							
								flags to the 'configure' script (*note Installation on Unix::).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Any implementation of the MPI standard, version 1 or later, should
							 | 
						|||
| 
								 | 
							
								work with FFTW. The 'configure' script will attempt to automatically
							 | 
						|||
| 
								 | 
							
								detect how to compile and link code using your MPI implementation.  In
							 | 
						|||
| 
								 | 
							
								some cases, especially if you have multiple different MPI
							 | 
						|||
| 
								 | 
							
								implementations installed or have an unusual MPI software package, you
							 | 
						|||
| 
								 | 
							
								may need to provide this information explicitly.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Most commonly, one compiles MPI code by invoking a special compiler
							 | 
						|||
| 
								 | 
							
								command, typically 'mpicc' for C code.  The 'configure' script knows the
							 | 
						|||
| 
								 | 
							
								most common names for this command, but you can specify the MPI
							 | 
						|||
| 
								 | 
							
								compilation command explicitly by setting the 'MPICC' variable, as in
							 | 
						|||
| 
								 | 
							
								'./configure MPICC=mpicc ...'.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   If, instead of a special compiler command, you need to link a certain
							 | 
						|||
| 
								 | 
							
								library, you can specify the link command via the 'MPILIBS' variable, as
							 | 
						|||
| 
								 | 
							
								in './configure MPILIBS=-lmpi ...'.  Note that if your MPI library is
							 | 
						|||
| 
								 | 
							
								installed in a non-standard location (one the compiler does not know
							 | 
						|||
| 
								 | 
							
								about by default), you may also have to specify the location of the
							 | 
						|||
| 
								 | 
							
								library and header files via 'LDFLAGS' and 'CPPFLAGS' variables,
							 | 
						|||
| 
								 | 
							
								respectively, as in './configure LDFLAGS=-L/path/to/mpi/libs
							 | 
						|||
| 
								 | 
							
								CPPFLAGS=-I/path/to/mpi/include ...'.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Linking and Initializing MPI FFTW,  Next: 2d MPI example,  Prev: FFTW MPI Installation,  Up: Distributed-memory FFTW with MPI
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								6.2 Linking and Initializing MPI FFTW
							 | 
						|||
| 
								 | 
							
								=====================================
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								Programs using the MPI FFTW routines should be linked with '-lfftw3_mpi
							 | 
						|||
| 
								 | 
							
								-lfftw3 -lm' on Unix in double precision, '-lfftw3f_mpi -lfftw3f -lm' in
							 | 
						|||
| 
								 | 
							
								single precision, and so on (*note Precision::).  You will also need to
							 | 
						|||
| 
								 | 
							
								link with whatever library is responsible for MPI on your system; in
							 | 
						|||
| 
								 | 
							
								most MPI implementations, there is a special compiler alias named
							 | 
						|||
| 
								 | 
							
								'mpicc' to compile and link MPI code.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Before calling any FFTW routines except possibly 'fftw_init_threads'
							 | 
						|||
| 
								 | 
							
								(*note Combining MPI and Threads::), but after calling 'MPI_Init', you
							 | 
						|||
| 
								 | 
							
								should call the function:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     void fftw_mpi_init(void);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   If, at the end of your program, you want to get rid of all memory and
							 | 
						|||
| 
								 | 
							
								other resources allocated internally by FFTW, for both the serial and
							 | 
						|||
| 
								 | 
							
								MPI routines, you can call:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     void fftw_mpi_cleanup(void);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   which is much like the 'fftw_cleanup()' function except that it also
							 | 
						|||
| 
								 | 
							
								gets rid of FFTW's MPI-related data.  You must _not_ execute any
							 | 
						|||
| 
								 | 
							
								previously created plans after calling this function.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: 2d MPI example,  Next: MPI Data Distribution,  Prev: Linking and Initializing MPI FFTW,  Up: Distributed-memory FFTW with MPI
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								6.3 2d MPI example
							 | 
						|||
| 
								 | 
							
								==================
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								Before we document the FFTW MPI interface in detail, we begin with a
							 | 
						|||
| 
								 | 
							
								simple example outlining how one would perform a two-dimensional 'N0' by
							 | 
						|||
| 
								 | 
							
								'N1' complex DFT.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     #include <fftw3-mpi.h>
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     int main(int argc, char **argv)
							 | 
						|||
| 
								 | 
							
								     {
							 | 
						|||
| 
								 | 
							
								         const ptrdiff_t N0 = ..., N1 = ...;
							 | 
						|||
| 
								 | 
							
								         fftw_plan plan;
							 | 
						|||
| 
								 | 
							
								         fftw_complex *data;
							 | 
						|||
| 
								 | 
							
								         ptrdiff_t alloc_local, local_n0, local_0_start, i, j;
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								         MPI_Init(&argc, &argv);
							 | 
						|||
| 
								 | 
							
								         fftw_mpi_init();
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								         /* get local data size and allocate */
							 | 
						|||
| 
								 | 
							
								         alloc_local = fftw_mpi_local_size_2d(N0, N1, MPI_COMM_WORLD,
							 | 
						|||
| 
								 | 
							
								                                              &local_n0, &local_0_start);
							 | 
						|||
| 
								 | 
							
								         data = fftw_alloc_complex(alloc_local);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								         /* create plan for in-place forward DFT */
							 | 
						|||
| 
								 | 
							
								         plan = fftw_mpi_plan_dft_2d(N0, N1, data, data, MPI_COMM_WORLD,
							 | 
						|||
| 
								 | 
							
								                                     FFTW_FORWARD, FFTW_ESTIMATE);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								         /* initialize data to some function my_function(x,y) */
							 | 
						|||
| 
								 | 
							
								         for (i = 0; i < local_n0; ++i) for (j = 0; j < N1; ++j)
							 | 
						|||
| 
								 | 
							
								            data[i*N1 + j] = my_function(local_0_start + i, j);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								         /* compute transforms, in-place, as many times as desired */
							 | 
						|||
| 
								 | 
							
								         fftw_execute(plan);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								         fftw_destroy_plan(plan);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								         MPI_Finalize();
							 | 
						|||
| 
								 | 
							
								     }
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   As can be seen above, the MPI interface follows the same basic style
							 | 
						|||
| 
								 | 
							
								of allocate/plan/execute/destroy as the serial FFTW routines.  All of
							 | 
						|||
| 
								 | 
							
								the MPI-specific routines are prefixed with 'fftw_mpi_' instead of
							 | 
						|||
| 
								 | 
							
								'fftw_'.  There are a few important differences, however:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   First, we must call 'fftw_mpi_init()' after calling 'MPI_Init'
							 | 
						|||
| 
								 | 
							
								(required in all MPI programs) and before calling any other 'fftw_mpi_'
							 | 
						|||
| 
								 | 
							
								routine.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Second, when we create the plan with 'fftw_mpi_plan_dft_2d',
							 | 
						|||
| 
								 | 
							
								analogous to 'fftw_plan_dft_2d', we pass an additional argument: the
							 | 
						|||
| 
								 | 
							
								communicator, indicating which processes will participate in the
							 | 
						|||
| 
								 | 
							
								transform (here 'MPI_COMM_WORLD', indicating all processes).  Whenever
							 | 
						|||
| 
								 | 
							
								you create, execute, or destroy a plan for an MPI transform, you must
							 | 
						|||
| 
								 | 
							
								call the corresponding FFTW routine on _all_ processes in the
							 | 
						|||
| 
								 | 
							
								communicator for that transform.  (That is, these are _collective_
							 | 
						|||
| 
								 | 
							
								calls.)  Note that the plan for the MPI transform uses the standard
							 | 
						|||
| 
								 | 
							
								'fftw_execute' and 'fftw_destroy' routines (on the other hand, there are
							 | 
						|||
| 
								 | 
							
								MPI-specific new-array execute functions documented below).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Third, all of the FFTW MPI routines take 'ptrdiff_t' arguments
							 | 
						|||
| 
								 | 
							
								instead of 'int' as for the serial FFTW. 'ptrdiff_t' is a standard C
							 | 
						|||
| 
								 | 
							
								integer type which is (at least) 32 bits wide on a 32-bit machine and 64
							 | 
						|||
| 
								 | 
							
								bits wide on a 64-bit machine.  This is to make it easy to specify very
							 | 
						|||
| 
								 | 
							
								large parallel transforms on a 64-bit machine.  (You can specify 64-bit
							 | 
						|||
| 
								 | 
							
								transform sizes in the serial FFTW, too, but only by using the 'guru64'
							 | 
						|||
| 
								 | 
							
								planner interface.  *Note 64-bit Guru Interface::.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Fourth, and most importantly, you don't allocate the entire
							 | 
						|||
| 
								 | 
							
								two-dimensional array on each process.  Instead, you call
							 | 
						|||
| 
								 | 
							
								'fftw_mpi_local_size_2d' to find out what _portion_ of the array resides
							 | 
						|||
| 
								 | 
							
								on each processor, and how much space to allocate.  Here, the portion of
							 | 
						|||
| 
								 | 
							
								the array on each process is a 'local_n0' by 'N1' slice of the total
							 | 
						|||
| 
								 | 
							
								array, starting at index 'local_0_start'.  The total number of
							 | 
						|||
| 
								 | 
							
								'fftw_complex' numbers to allocate is given by the 'alloc_local' return
							 | 
						|||
| 
								 | 
							
								value, which _may_ be greater than 'local_n0 * N1' (in case some
							 | 
						|||
| 
								 | 
							
								intermediate calculations require additional storage).  The data
							 | 
						|||
| 
								 | 
							
								distribution in FFTW's MPI interface is described in more detail by the
							 | 
						|||
| 
								 | 
							
								next section.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Given the portion of the array that resides on the local process, it
							 | 
						|||
| 
								 | 
							
								is straightforward to initialize the data (here to a function
							 | 
						|||
| 
								 | 
							
								'myfunction') and otherwise manipulate it.  Of course, at the end of the
							 | 
						|||
| 
								 | 
							
								program you may want to output the data somehow, but synchronizing this
							 | 
						|||
| 
								 | 
							
								output is up to you and is beyond the scope of this manual.  (One good
							 | 
						|||
| 
								 | 
							
								way to output a large multi-dimensional distributed array in MPI to a
							 | 
						|||
| 
								 | 
							
								portable binary file is to use the free HDF5 library; see the HDF home
							 | 
						|||
| 
								 | 
							
								page (http://www.hdfgroup.org/).)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: MPI Data Distribution,  Next: Multi-dimensional MPI DFTs of Real Data,  Prev: 2d MPI example,  Up: Distributed-memory FFTW with MPI
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								6.4 MPI Data Distribution
							 | 
						|||
| 
								 | 
							
								=========================
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								The most important concept to understand in using FFTW's MPI interface
							 | 
						|||
| 
								 | 
							
								is the data distribution.  With a serial or multithreaded FFT, all of
							 | 
						|||
| 
								 | 
							
								the inputs and outputs are stored as a single contiguous chunk of
							 | 
						|||
| 
								 | 
							
								memory.  With a distributed-memory FFT, the inputs and outputs are
							 | 
						|||
| 
								 | 
							
								broken into disjoint blocks, one per process.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   In particular, FFTW uses a _1d block distribution_ of the data,
							 | 
						|||
| 
								 | 
							
								distributed along the _first dimension_.  For example, if you want to
							 | 
						|||
| 
								 | 
							
								perform a 100 x 200 complex DFT, distributed over 4 processes, each
							 | 
						|||
| 
								 | 
							
								process will get a 25 x 200 slice of the data.  That is, process 0 will
							 | 
						|||
| 
								 | 
							
								get rows 0 through 24, process 1 will get rows 25 through 49, process 2
							 | 
						|||
| 
								 | 
							
								will get rows 50 through 74, and process 3 will get rows 75 through 99.
							 | 
						|||
| 
								 | 
							
								If you take the same array but distribute it over 3 processes, then it
							 | 
						|||
| 
								 | 
							
								is not evenly divisible so the different processes will have unequal
							 | 
						|||
| 
								 | 
							
								chunks.  FFTW's default choice in this case is to assign 34 rows to
							 | 
						|||
| 
								 | 
							
								processes 0 and 1, and 32 rows to process 2.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   FFTW provides several 'fftw_mpi_local_size' routines that you can
							 | 
						|||
| 
								 | 
							
								call to find out what portion of an array is stored on the current
							 | 
						|||
| 
								 | 
							
								process.  In most cases, you should use the default block sizes picked
							 | 
						|||
| 
								 | 
							
								by FFTW, but it is also possible to specify your own block size.  For
							 | 
						|||
| 
								 | 
							
								example, with a 100 x 200 array on three processes, you can tell FFTW to
							 | 
						|||
| 
								 | 
							
								use a block size of 40, which would assign 40 rows to processes 0 and 1,
							 | 
						|||
| 
								 | 
							
								and 20 rows to process 2.  FFTW's default is to divide the data equally
							 | 
						|||
| 
								 | 
							
								among the processes if possible, and as best it can otherwise.  The rows
							 | 
						|||
| 
								 | 
							
								are always assigned in "rank order," i.e.  process 0 gets the first
							 | 
						|||
| 
								 | 
							
								block of rows, then process 1, and so on.  (You can change this by using
							 | 
						|||
| 
								 | 
							
								'MPI_Comm_split' to create a new communicator with re-ordered
							 | 
						|||
| 
								 | 
							
								processes.)  However, you should always call the 'fftw_mpi_local_size'
							 | 
						|||
| 
								 | 
							
								routines, if possible, rather than trying to predict FFTW's distribution
							 | 
						|||
| 
								 | 
							
								choices.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   In particular, it is critical that you allocate the storage size that
							 | 
						|||
| 
								 | 
							
								is returned by 'fftw_mpi_local_size', which is _not_ necessarily the
							 | 
						|||
| 
								 | 
							
								size of the local slice of the array.  The reason is that intermediate
							 | 
						|||
| 
								 | 
							
								steps of FFTW's algorithms involve transposing the array and
							 | 
						|||
| 
								 | 
							
								redistributing the data, so at these intermediate steps FFTW may require
							 | 
						|||
| 
								 | 
							
								more local storage space (albeit always proportional to the total size
							 | 
						|||
| 
								 | 
							
								divided by the number of processes).  The 'fftw_mpi_local_size'
							 | 
						|||
| 
								 | 
							
								functions know how much storage is required for these intermediate steps
							 | 
						|||
| 
								 | 
							
								and tell you the correct amount to allocate.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								* Menu:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								* Basic and advanced distribution interfaces::
							 | 
						|||
| 
								 | 
							
								* Load balancing::
							 | 
						|||
| 
								 | 
							
								* Transposed distributions::
							 | 
						|||
| 
								 | 
							
								* One-dimensional distributions::
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Basic and advanced distribution interfaces,  Next: Load balancing,  Prev: MPI Data Distribution,  Up: MPI Data Distribution
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								6.4.1 Basic and advanced distribution interfaces
							 | 
						|||
| 
								 | 
							
								------------------------------------------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								As with the planner interface, the 'fftw_mpi_local_size' distribution
							 | 
						|||
| 
								 | 
							
								interface is broken into basic and advanced ('_many') interfaces, where
							 | 
						|||
| 
								 | 
							
								the latter allows you to specify the block size manually and also to
							 | 
						|||
| 
								 | 
							
								request block sizes when computing multiple transforms simultaneously.
							 | 
						|||
| 
								 | 
							
								These functions are documented more exhaustively by the FFTW MPI
							 | 
						|||
| 
								 | 
							
								Reference, but we summarize the basic ideas here using a couple of
							 | 
						|||
| 
								 | 
							
								two-dimensional examples.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   For the 100 x 200 complex-DFT example, above, we would find the
							 | 
						|||
| 
								 | 
							
								distribution by calling the following function in the basic interface:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     ptrdiff_t fftw_mpi_local_size_2d(ptrdiff_t n0, ptrdiff_t n1, MPI_Comm comm,
							 | 
						|||
| 
								 | 
							
								                                      ptrdiff_t *local_n0, ptrdiff_t *local_0_start);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Given the total size of the data to be transformed (here, 'n0 = 100'
							 | 
						|||
| 
								 | 
							
								and 'n1 = 200') and an MPI communicator ('comm'), this function provides
							 | 
						|||
| 
								 | 
							
								three numbers.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   First, it describes the shape of the local data: the current process
							 | 
						|||
| 
								 | 
							
								should store a 'local_n0' by 'n1' slice of the overall dataset, in
							 | 
						|||
| 
								 | 
							
								row-major order ('n1' dimension contiguous), starting at index
							 | 
						|||
| 
								 | 
							
								'local_0_start'.  That is, if the total dataset is viewed as a 'n0' by
							 | 
						|||
| 
								 | 
							
								'n1' matrix, the current process should store the rows 'local_0_start'
							 | 
						|||
| 
								 | 
							
								to 'local_0_start+local_n0-1'.  Obviously, if you are running with only
							 | 
						|||
| 
								 | 
							
								a single MPI process, that process will store the entire array:
							 | 
						|||
| 
								 | 
							
								'local_0_start' will be zero and 'local_n0' will be 'n0'.  *Note
							 | 
						|||
| 
								 | 
							
								Row-major Format::.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Second, the return value is the total number of data elements (e.g.,
							 | 
						|||
| 
								 | 
							
								complex numbers for a complex DFT) that should be allocated for the
							 | 
						|||
| 
								 | 
							
								input and output arrays on the current process (ideally with
							 | 
						|||
| 
								 | 
							
								'fftw_malloc' or an 'fftw_alloc' function, to ensure optimal alignment).
							 | 
						|||
| 
								 | 
							
								It might seem that this should always be equal to 'local_n0 * n1', but
							 | 
						|||
| 
								 | 
							
								this is _not_ the case.  FFTW's distributed FFT algorithms require data
							 | 
						|||
| 
								 | 
							
								redistributions at intermediate stages of the transform, and in some
							 | 
						|||
| 
								 | 
							
								circumstances this may require slightly larger local storage.  This is
							 | 
						|||
| 
								 | 
							
								discussed in more detail below, under *note Load balancing::.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The advanced-interface 'local_size' function for multidimensional
							 | 
						|||
| 
								 | 
							
								transforms returns the same three things ('local_n0', 'local_0_start',
							 | 
						|||
| 
								 | 
							
								and the total number of elements to allocate), but takes more inputs:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     ptrdiff_t fftw_mpi_local_size_many(int rnk, const ptrdiff_t *n,
							 | 
						|||
| 
								 | 
							
								                                        ptrdiff_t howmany,
							 | 
						|||
| 
								 | 
							
								                                        ptrdiff_t block0,
							 | 
						|||
| 
								 | 
							
								                                        MPI_Comm comm,
							 | 
						|||
| 
								 | 
							
								                                        ptrdiff_t *local_n0,
							 | 
						|||
| 
								 | 
							
								                                        ptrdiff_t *local_0_start);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The two-dimensional case above corresponds to 'rnk = 2' and an array
							 | 
						|||
| 
								 | 
							
								'n' of length 2 with 'n[0] = n0' and 'n[1] = n1'.  This routine is for
							 | 
						|||
| 
								 | 
							
								any 'rnk > 1'; one-dimensional transforms have their own interface
							 | 
						|||
| 
								 | 
							
								because they work slightly differently, as discussed below.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   First, the advanced interface allows you to perform multiple
							 | 
						|||
| 
								 | 
							
								transforms at once, of interleaved data, as specified by the 'howmany'
							 | 
						|||
| 
								 | 
							
								parameter.  ('hoamany' is 1 for a single transform.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Second, here you can specify your desired block size in the 'n0'
							 | 
						|||
| 
								 | 
							
								dimension, 'block0'.  To use FFTW's default block size, pass
							 | 
						|||
| 
								 | 
							
								'FFTW_MPI_DEFAULT_BLOCK' (0) for 'block0'.  Otherwise, on 'P' processes,
							 | 
						|||
| 
								 | 
							
								FFTW will return 'local_n0' equal to 'block0' on the first 'P / block0'
							 | 
						|||
| 
								 | 
							
								processes (rounded down), return 'local_n0' equal to 'n0 - block0 * (P /
							 | 
						|||
| 
								 | 
							
								block0)' on the next process, and 'local_n0' equal to zero on any
							 | 
						|||
| 
								 | 
							
								remaining processes.  In general, we recommend using the default block
							 | 
						|||
| 
								 | 
							
								size (which corresponds to 'n0 / P', rounded up).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   For example, suppose you have 'P = 4' processes and 'n0 = 21'.  The
							 | 
						|||
| 
								 | 
							
								default will be a block size of '6', which will give 'local_n0 = 6' on
							 | 
						|||
| 
								 | 
							
								the first three processes and 'local_n0 = 3' on the last process.
							 | 
						|||
| 
								 | 
							
								Instead, however, you could specify 'block0 = 5' if you wanted, which
							 | 
						|||
| 
								 | 
							
								would give 'local_n0 = 5' on processes 0 to 2, 'local_n0 = 6' on process
							 | 
						|||
| 
								 | 
							
								3.  (This choice, while it may look superficially more "balanced," has
							 | 
						|||
| 
								 | 
							
								the same critical path as FFTW's default but requires more
							 | 
						|||
| 
								 | 
							
								communications.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Load balancing,  Next: Transposed distributions,  Prev: Basic and advanced distribution interfaces,  Up: MPI Data Distribution
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								6.4.2 Load balancing
							 | 
						|||
| 
								 | 
							
								--------------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								Ideally, when you parallelize a transform over some P processes, each
							 | 
						|||
| 
								 | 
							
								process should end up with work that takes equal time.  Otherwise, all
							 | 
						|||
| 
								 | 
							
								of the processes end up waiting on whichever process is slowest.  This
							 | 
						|||
| 
								 | 
							
								goal is known as "load balancing."  In this section, we describe the
							 | 
						|||
| 
								 | 
							
								circumstances under which FFTW is able to load-balance well, and in
							 | 
						|||
| 
								 | 
							
								particular how you should choose your transform size in order to load
							 | 
						|||
| 
								 | 
							
								balance.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Load balancing is especially difficult when you are parallelizing
							 | 
						|||
| 
								 | 
							
								over heterogeneous machines; for example, if one of your processors is a
							 | 
						|||
| 
								 | 
							
								old 486 and another is a Pentium IV, obviously you should give the
							 | 
						|||
| 
								 | 
							
								Pentium more work to do than the 486 since the latter is much slower.
							 | 
						|||
| 
								 | 
							
								FFTW does not deal with this problem, however--it assumes that your
							 | 
						|||
| 
								 | 
							
								processes run on hardware of comparable speed, and that the goal is
							 | 
						|||
| 
								 | 
							
								therefore to divide the problem as equally as possible.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   For a multi-dimensional complex DFT, FFTW can divide the problem
							 | 
						|||
| 
								 | 
							
								equally among the processes if: (i) the _first_ dimension 'n0' is
							 | 
						|||
| 
								 | 
							
								divisible by P; and (ii), the _product_ of the subsequent dimensions is
							 | 
						|||
| 
								 | 
							
								divisible by P. (For the advanced interface, where you can specify
							 | 
						|||
| 
								 | 
							
								multiple simultaneous transforms via some "vector" length 'howmany', a
							 | 
						|||
| 
								 | 
							
								factor of 'howmany' is included in the product of the subsequent
							 | 
						|||
| 
								 | 
							
								dimensions.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   For a one-dimensional complex DFT, the length 'N' of the data should
							 | 
						|||
| 
								 | 
							
								be divisible by P _squared_ to be able to divide the problem equally
							 | 
						|||
| 
								 | 
							
								among the processes.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Transposed distributions,  Next: One-dimensional distributions,  Prev: Load balancing,  Up: MPI Data Distribution
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								6.4.3 Transposed distributions
							 | 
						|||
| 
								 | 
							
								------------------------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								Internally, FFTW's MPI transform algorithms work by first computing
							 | 
						|||
| 
								 | 
							
								transforms of the data local to each process, then by globally
							 | 
						|||
| 
								 | 
							
								_transposing_ the data in some fashion to redistribute the data among
							 | 
						|||
| 
								 | 
							
								the processes, transforming the new data local to each process, and
							 | 
						|||
| 
								 | 
							
								transposing back.  For example, a two-dimensional 'n0' by 'n1' array,
							 | 
						|||
| 
								 | 
							
								distributed across the 'n0' dimension, is transformd by: (i)
							 | 
						|||
| 
								 | 
							
								transforming the 'n1' dimension, which are local to each process; (ii)
							 | 
						|||
| 
								 | 
							
								transposing to an 'n1' by 'n0' array, distributed across the 'n1'
							 | 
						|||
| 
								 | 
							
								dimension; (iii) transforming the 'n0' dimension, which is now local to
							 | 
						|||
| 
								 | 
							
								each process; (iv) transposing back.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   However, in many applications it is acceptable to compute a
							 | 
						|||
| 
								 | 
							
								multidimensional DFT whose results are produced in transposed order
							 | 
						|||
| 
								 | 
							
								(e.g., 'n1' by 'n0' in two dimensions).  This provides a significant
							 | 
						|||
| 
								 | 
							
								performance advantage, because it means that the final transposition
							 | 
						|||
| 
								 | 
							
								step can be omitted.  FFTW supports this optimization, which you specify
							 | 
						|||
| 
								 | 
							
								by passing the flag 'FFTW_MPI_TRANSPOSED_OUT' to the planner routines.
							 | 
						|||
| 
								 | 
							
								To compute the inverse transform of transposed output, you specify
							 | 
						|||
| 
								 | 
							
								'FFTW_MPI_TRANSPOSED_IN' to tell it that the input is transposed.  In
							 | 
						|||
| 
								 | 
							
								this section, we explain how to interpret the output format of such a
							 | 
						|||
| 
								 | 
							
								transform.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Suppose you have are transforming multi-dimensional data with (at
							 | 
						|||
| 
								 | 
							
								least two) dimensions n[0] x n[1] x n[2] x ...  x n[d-1] .  As always,
							 | 
						|||
| 
								 | 
							
								it is distributed along the first dimension n[0] .  Now, if we compute
							 | 
						|||
| 
								 | 
							
								its DFT with the 'FFTW_MPI_TRANSPOSED_OUT' flag, the resulting output
							 | 
						|||
| 
								 | 
							
								data are stored with the first _two_ dimensions transposed: n[1] x n[0]
							 | 
						|||
| 
								 | 
							
								x n[2] x ...  x n[d-1] , distributed along the n[1] dimension.
							 | 
						|||
| 
								 | 
							
								Conversely, if we take the n[1] x n[0] x n[2] x ...  x n[d-1] data and
							 | 
						|||
| 
								 | 
							
								transform it with the 'FFTW_MPI_TRANSPOSED_IN' flag, then the format
							 | 
						|||
| 
								 | 
							
								goes back to the original n[0] x n[1] x n[2] x ...  x n[d-1] array.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   There are two ways to find the portion of the transposed array that
							 | 
						|||
| 
								 | 
							
								resides on the current process.  First, you can simply call the
							 | 
						|||
| 
								 | 
							
								appropriate 'local_size' function, passing n[1] x n[0] x n[2] x ...  x
							 | 
						|||
| 
								 | 
							
								n[d-1] (the transposed dimensions).  This would mean calling the
							 | 
						|||
| 
								 | 
							
								'local_size' function twice, once for the transposed and once for the
							 | 
						|||
| 
								 | 
							
								non-transposed dimensions.  Alternatively, you can call one of the
							 | 
						|||
| 
								 | 
							
								'local_size_transposed' functions, which returns both the non-transposed
							 | 
						|||
| 
								 | 
							
								and transposed data distribution from a single call.  For example, for a
							 | 
						|||
| 
								 | 
							
								3d transform with transposed output (or input), you might call:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     ptrdiff_t fftw_mpi_local_size_3d_transposed(
							 | 
						|||
| 
								 | 
							
								                     ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2, MPI_Comm comm,
							 | 
						|||
| 
								 | 
							
								                     ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
							 | 
						|||
| 
								 | 
							
								                     ptrdiff_t *local_n1, ptrdiff_t *local_1_start);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Here, 'local_n0' and 'local_0_start' give the size and starting index
							 | 
						|||
| 
								 | 
							
								of the 'n0' dimension for the _non_-transposed data, as in the previous
							 | 
						|||
| 
								 | 
							
								sections.  For _transposed_ data (e.g.  the output for
							 | 
						|||
| 
								 | 
							
								'FFTW_MPI_TRANSPOSED_OUT'), 'local_n1' and 'local_1_start' give the size
							 | 
						|||
| 
								 | 
							
								and starting index of the 'n1' dimension, which is the first dimension
							 | 
						|||
| 
								 | 
							
								of the transposed data ('n1' by 'n0' by 'n2').
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   (Note that 'FFTW_MPI_TRANSPOSED_IN' is completely equivalent to
							 | 
						|||
| 
								 | 
							
								performing 'FFTW_MPI_TRANSPOSED_OUT' and passing the first two
							 | 
						|||
| 
								 | 
							
								dimensions to the planner in reverse order, or vice versa.  If you pass
							 | 
						|||
| 
								 | 
							
								_both_ the 'FFTW_MPI_TRANSPOSED_IN' and 'FFTW_MPI_TRANSPOSED_OUT' flags,
							 | 
						|||
| 
								 | 
							
								it is equivalent to swapping the first two dimensions passed to the
							 | 
						|||
| 
								 | 
							
								planner and passing _neither_ flag.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: One-dimensional distributions,  Prev: Transposed distributions,  Up: MPI Data Distribution
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								6.4.4 One-dimensional distributions
							 | 
						|||
| 
								 | 
							
								-----------------------------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								For one-dimensional distributed DFTs using FFTW, matters are slightly
							 | 
						|||
| 
								 | 
							
								more complicated because the data distribution is more closely tied to
							 | 
						|||
| 
								 | 
							
								how the algorithm works.  In particular, you can no longer pass an
							 | 
						|||
| 
								 | 
							
								arbitrary block size and must accept FFTW's default; also, the block
							 | 
						|||
| 
								 | 
							
								sizes may be different for input and output.  Also, the data
							 | 
						|||
| 
								 | 
							
								distribution depends on the flags and transform direction, in order for
							 | 
						|||
| 
								 | 
							
								forward and backward transforms to work correctly.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     ptrdiff_t fftw_mpi_local_size_1d(ptrdiff_t n0, MPI_Comm comm,
							 | 
						|||
| 
								 | 
							
								                     int sign, unsigned flags,
							 | 
						|||
| 
								 | 
							
								                     ptrdiff_t *local_ni, ptrdiff_t *local_i_start,
							 | 
						|||
| 
								 | 
							
								                     ptrdiff_t *local_no, ptrdiff_t *local_o_start);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   This function computes the data distribution for a 1d transform of
							 | 
						|||
| 
								 | 
							
								size 'n0' with the given transform 'sign' and 'flags'.  Both input and
							 | 
						|||
| 
								 | 
							
								output data use block distributions.  The input on the current process
							 | 
						|||
| 
								 | 
							
								will consist of 'local_ni' numbers starting at index 'local_i_start';
							 | 
						|||
| 
								 | 
							
								e.g.  if only a single process is used, then 'local_ni' will be 'n0' and
							 | 
						|||
| 
								 | 
							
								'local_i_start' will be '0'.  Similarly for the output, with 'local_no'
							 | 
						|||
| 
								 | 
							
								numbers starting at index 'local_o_start'.  The return value of
							 | 
						|||
| 
								 | 
							
								'fftw_mpi_local_size_1d' will be the total number of elements to
							 | 
						|||
| 
								 | 
							
								allocate on the current process (which might be slightly larger than the
							 | 
						|||
| 
								 | 
							
								local size due to intermediate steps in the algorithm).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   As mentioned above (*note Load balancing::), the data will be divided
							 | 
						|||
| 
								 | 
							
								equally among the processes if 'n0' is divisible by the _square_ of the
							 | 
						|||
| 
								 | 
							
								number of processes.  In this case, 'local_ni' will equal 'local_no'.
							 | 
						|||
| 
								 | 
							
								Otherwise, they may be different.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   For some applications, such as convolutions, the order of the output
							 | 
						|||
| 
								 | 
							
								data is irrelevant.  In this case, performance can be improved by
							 | 
						|||
| 
								 | 
							
								specifying that the output data be stored in an FFTW-defined "scrambled"
							 | 
						|||
| 
								 | 
							
								format.  (In particular, this is the analogue of transposed output in
							 | 
						|||
| 
								 | 
							
								the multidimensional case: scrambled output saves a communications
							 | 
						|||
| 
								 | 
							
								step.)  If you pass 'FFTW_MPI_SCRAMBLED_OUT' in the flags, then the
							 | 
						|||
| 
								 | 
							
								output is stored in this (undocumented) scrambled order.  Conversely, to
							 | 
						|||
| 
								 | 
							
								perform the inverse transform of data in scrambled order, pass the
							 | 
						|||
| 
								 | 
							
								'FFTW_MPI_SCRAMBLED_IN' flag.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   In MPI FFTW, only composite sizes 'n0' can be parallelized; we have
							 | 
						|||
| 
								 | 
							
								not yet implemented a parallel algorithm for large prime sizes.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Multi-dimensional MPI DFTs of Real Data,  Next: Other Multi-dimensional Real-data MPI Transforms,  Prev: MPI Data Distribution,  Up: Distributed-memory FFTW with MPI
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								6.5 Multi-dimensional MPI DFTs of Real Data
							 | 
						|||
| 
								 | 
							
								===========================================
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								FFTW's MPI interface also supports multi-dimensional DFTs of real data,
							 | 
						|||
| 
								 | 
							
								similar to the serial r2c and c2r interfaces.  (Parallel one-dimensional
							 | 
						|||
| 
								 | 
							
								real-data DFTs are not currently supported; you must use a complex
							 | 
						|||
| 
								 | 
							
								transform and set the imaginary parts of the inputs to zero.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The key points to understand for r2c and c2r MPI transforms (compared
							 | 
						|||
| 
								 | 
							
								to the MPI complex DFTs or the serial r2c/c2r transforms), are:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * Just as for serial transforms, r2c/c2r DFTs transform n[0] x n[1] x
							 | 
						|||
| 
								 | 
							
								     n[2] x ...  x n[d-1] real data to/from n[0] x n[1] x n[2] x ...  x
							 | 
						|||
| 
								 | 
							
								     (n[d-1]/2 + 1) complex data: the last dimension of the complex data
							 | 
						|||
| 
								 | 
							
								     is cut in half (rounded down), plus one.  As for the serial
							 | 
						|||
| 
								 | 
							
								     transforms, the sizes you pass to the 'plan_dft_r2c' and
							 | 
						|||
| 
								 | 
							
								     'plan_dft_c2r' are the n[0] x n[1] x n[2] x ...  x n[d-1]
							 | 
						|||
| 
								 | 
							
								     dimensions of the real data.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * Although the real data is _conceptually_ n[0] x n[1] x n[2] x ...
							 | 
						|||
| 
								 | 
							
								     x n[d-1] , it is _physically_ stored as an n[0] x n[1] x n[2] x ...
							 | 
						|||
| 
								 | 
							
								     x [2 (n[d-1]/2 + 1)] array, where the last dimension has been
							 | 
						|||
| 
								 | 
							
								     _padded_ to make it the same size as the complex output.  This is
							 | 
						|||
| 
								 | 
							
								     much like the in-place serial r2c/c2r interface (*note
							 | 
						|||
| 
								 | 
							
								     Multi-Dimensional DFTs of Real Data::), except that in MPI the
							 | 
						|||
| 
								 | 
							
								     padding is required even for out-of-place data.  The extra padding
							 | 
						|||
| 
								 | 
							
								     numbers are ignored by FFTW (they are _not_ like zero-padding the
							 | 
						|||
| 
								 | 
							
								     transform to a larger size); they are only used to determine the
							 | 
						|||
| 
								 | 
							
								     data layout.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * The data distribution in MPI for _both_ the real and complex data
							 | 
						|||
| 
								 | 
							
								     is determined by the shape of the _complex_ data.  That is, you
							 | 
						|||
| 
								 | 
							
								     call the appropriate 'local size' function for the n[0] x n[1] x
							 | 
						|||
| 
								 | 
							
								     n[2] x ...  x (n[d-1]/2 + 1) complex data, and then use the _same_
							 | 
						|||
| 
								 | 
							
								     distribution for the real data except that the last complex
							 | 
						|||
| 
								 | 
							
								     dimension is replaced by a (padded) real dimension of twice the
							 | 
						|||
| 
								 | 
							
								     length.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   For example suppose we are performing an out-of-place r2c transform
							 | 
						|||
| 
								 | 
							
								of L x M x N real data [padded to L x M x 2(N/2+1) ], resulting in L x M
							 | 
						|||
| 
								 | 
							
								x N/2+1 complex data.  Similar to the example in *note 2d MPI example::,
							 | 
						|||
| 
								 | 
							
								we might do something like:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     #include <fftw3-mpi.h>
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     int main(int argc, char **argv)
							 | 
						|||
| 
								 | 
							
								     {
							 | 
						|||
| 
								 | 
							
								         const ptrdiff_t L = ..., M = ..., N = ...;
							 | 
						|||
| 
								 | 
							
								         fftw_plan plan;
							 | 
						|||
| 
								 | 
							
								         double *rin;
							 | 
						|||
| 
								 | 
							
								         fftw_complex *cout;
							 | 
						|||
| 
								 | 
							
								         ptrdiff_t alloc_local, local_n0, local_0_start, i, j, k;
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								         MPI_Init(&argc, &argv);
							 | 
						|||
| 
								 | 
							
								         fftw_mpi_init();
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								         /* get local data size and allocate */
							 | 
						|||
| 
								 | 
							
								         alloc_local = fftw_mpi_local_size_3d(L, M, N/2+1, MPI_COMM_WORLD,
							 | 
						|||
| 
								 | 
							
								                                              &local_n0, &local_0_start);
							 | 
						|||
| 
								 | 
							
								         rin = fftw_alloc_real(2 * alloc_local);
							 | 
						|||
| 
								 | 
							
								         cout = fftw_alloc_complex(alloc_local);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								         /* create plan for out-of-place r2c DFT */
							 | 
						|||
| 
								 | 
							
								         plan = fftw_mpi_plan_dft_r2c_3d(L, M, N, rin, cout, MPI_COMM_WORLD,
							 | 
						|||
| 
								 | 
							
								                                         FFTW_MEASURE);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								         /* initialize rin to some function my_func(x,y,z) */
							 | 
						|||
| 
								 | 
							
								         for (i = 0; i < local_n0; ++i)
							 | 
						|||
| 
								 | 
							
								            for (j = 0; j < M; ++j)
							 | 
						|||
| 
								 | 
							
								              for (k = 0; k < N; ++k)
							 | 
						|||
| 
								 | 
							
								            rin[(i*M + j) * (2*(N/2+1)) + k] = my_func(local_0_start+i, j, k);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								         /* compute transforms as many times as desired */
							 | 
						|||
| 
								 | 
							
								         fftw_execute(plan);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								         fftw_destroy_plan(plan);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								         MPI_Finalize();
							 | 
						|||
| 
								 | 
							
								     }
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Note that we allocated 'rin' using 'fftw_alloc_real' with an argument
							 | 
						|||
| 
								 | 
							
								of '2 * alloc_local': since 'alloc_local' is the number of _complex_
							 | 
						|||
| 
								 | 
							
								values to allocate, the number of _real_ values is twice as many.  The
							 | 
						|||
| 
								 | 
							
								'rin' array is then local_n0 x M x 2(N/2+1) in row-major order, so its
							 | 
						|||
| 
								 | 
							
								'(i,j,k)' element is at the index '(i*M + j) * (2*(N/2+1)) + k' (*note
							 | 
						|||
| 
								 | 
							
								Multi-dimensional Array Format::).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   As for the complex transforms, improved performance can be obtained
							 | 
						|||
| 
								 | 
							
								by specifying that the output is the transpose of the input or vice
							 | 
						|||
| 
								 | 
							
								versa (*note Transposed distributions::).  In our L x M x N r2c example,
							 | 
						|||
| 
								 | 
							
								including 'FFTW_TRANSPOSED_OUT' in the flags means that the input would
							 | 
						|||
| 
								 | 
							
								be a padded L x M x 2(N/2+1) real array distributed over the 'L'
							 | 
						|||
| 
								 | 
							
								dimension, while the output would be a M x L x N/2+1 complex array
							 | 
						|||
| 
								 | 
							
								distributed over the 'M' dimension.  To perform the inverse c2r
							 | 
						|||
| 
								 | 
							
								transform with the same data distributions, you would use the
							 | 
						|||
| 
								 | 
							
								'FFTW_TRANSPOSED_IN' flag.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Other Multi-dimensional Real-data MPI Transforms,  Next: FFTW MPI Transposes,  Prev: Multi-dimensional MPI DFTs of Real Data,  Up: Distributed-memory FFTW with MPI
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								6.6 Other multi-dimensional Real-Data MPI Transforms
							 | 
						|||
| 
								 | 
							
								====================================================
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								FFTW's MPI interface also supports multi-dimensional 'r2r' transforms of
							 | 
						|||
| 
								 | 
							
								all kinds supported by the serial interface (e.g.  discrete cosine and
							 | 
						|||
| 
								 | 
							
								sine transforms, discrete Hartley transforms, etc.).  Only
							 | 
						|||
| 
								 | 
							
								multi-dimensional 'r2r' transforms, not one-dimensional transforms, are
							 | 
						|||
| 
								 | 
							
								currently parallelized.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   These are used much like the multidimensional complex DFTs discussed
							 | 
						|||
| 
								 | 
							
								above, except that the data is real rather than complex, and one needs
							 | 
						|||
| 
								 | 
							
								to pass an r2r transform kind ('fftw_r2r_kind') for each dimension as in
							 | 
						|||
| 
								 | 
							
								the serial FFTW (*note More DFTs of Real Data::).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   For example, one might perform a two-dimensional L x M that is an
							 | 
						|||
| 
								 | 
							
								REDFT10 (DCT-II) in the first dimension and an RODFT10 (DST-II) in the
							 | 
						|||
| 
								 | 
							
								second dimension with code like:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								         const ptrdiff_t L = ..., M = ...;
							 | 
						|||
| 
								 | 
							
								         fftw_plan plan;
							 | 
						|||
| 
								 | 
							
								         double *data;
							 | 
						|||
| 
								 | 
							
								         ptrdiff_t alloc_local, local_n0, local_0_start, i, j;
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								         /* get local data size and allocate */
							 | 
						|||
| 
								 | 
							
								         alloc_local = fftw_mpi_local_size_2d(L, M, MPI_COMM_WORLD,
							 | 
						|||
| 
								 | 
							
								                                              &local_n0, &local_0_start);
							 | 
						|||
| 
								 | 
							
								         data = fftw_alloc_real(alloc_local);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								         /* create plan for in-place REDFT10 x RODFT10 */
							 | 
						|||
| 
								 | 
							
								         plan = fftw_mpi_plan_r2r_2d(L, M, data, data, MPI_COMM_WORLD,
							 | 
						|||
| 
								 | 
							
								                                     FFTW_REDFT10, FFTW_RODFT10, FFTW_MEASURE);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								         /* initialize data to some function my_function(x,y) */
							 | 
						|||
| 
								 | 
							
								         for (i = 0; i < local_n0; ++i) for (j = 0; j < M; ++j)
							 | 
						|||
| 
								 | 
							
								            data[i*M + j] = my_function(local_0_start + i, j);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								         /* compute transforms, in-place, as many times as desired */
							 | 
						|||
| 
								 | 
							
								         fftw_execute(plan);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								         fftw_destroy_plan(plan);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Notice that we use the same 'local_size' functions as we did for
							 | 
						|||
| 
								 | 
							
								complex data, only now we interpret the sizes in terms of real rather
							 | 
						|||
| 
								 | 
							
								than complex values, and correspondingly use 'fftw_alloc_real'.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: FFTW MPI Transposes,  Next: FFTW MPI Wisdom,  Prev: Other Multi-dimensional Real-data MPI Transforms,  Up: Distributed-memory FFTW with MPI
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								6.7 FFTW MPI Transposes
							 | 
						|||
| 
								 | 
							
								=======================
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								The FFTW's MPI Fourier transforms rely on one or more _global
							 | 
						|||
| 
								 | 
							
								transposition_ step for their communications.  For example, the
							 | 
						|||
| 
								 | 
							
								multidimensional transforms work by transforming along some dimensions,
							 | 
						|||
| 
								 | 
							
								then transposing to make the first dimension local and transforming
							 | 
						|||
| 
								 | 
							
								that, then transposing back.  Because global transposition of a
							 | 
						|||
| 
								 | 
							
								block-distributed matrix has many other potential uses besides FFTs,
							 | 
						|||
| 
								 | 
							
								FFTW's transpose routines can be called directly, as documented in this
							 | 
						|||
| 
								 | 
							
								section.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								* Menu:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								* Basic distributed-transpose interface::
							 | 
						|||
| 
								 | 
							
								* Advanced distributed-transpose interface::
							 | 
						|||
| 
								 | 
							
								* An improved replacement for MPI_Alltoall::
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Basic distributed-transpose interface,  Next: Advanced distributed-transpose interface,  Prev: FFTW MPI Transposes,  Up: FFTW MPI Transposes
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								6.7.1 Basic distributed-transpose interface
							 | 
						|||
| 
								 | 
							
								-------------------------------------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								In particular, suppose that we have an 'n0' by 'n1' array in row-major
							 | 
						|||
| 
								 | 
							
								order, block-distributed across the 'n0' dimension.  To transpose this
							 | 
						|||
| 
								 | 
							
								into an 'n1' by 'n0' array block-distributed across the 'n1' dimension,
							 | 
						|||
| 
								 | 
							
								we would create a plan by calling the following function:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_mpi_plan_transpose(ptrdiff_t n0, ptrdiff_t n1,
							 | 
						|||
| 
								 | 
							
								                                       double *in, double *out,
							 | 
						|||
| 
								 | 
							
								                                       MPI_Comm comm, unsigned flags);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The input and output arrays ('in' and 'out') can be the same.  The
							 | 
						|||
| 
								 | 
							
								transpose is actually executed by calling 'fftw_execute' on the plan, as
							 | 
						|||
| 
								 | 
							
								usual.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The 'flags' are the usual FFTW planner flags, but support two
							 | 
						|||
| 
								 | 
							
								additional flags: 'FFTW_MPI_TRANSPOSED_OUT' and/or
							 | 
						|||
| 
								 | 
							
								'FFTW_MPI_TRANSPOSED_IN'.  What these flags indicate, for transpose
							 | 
						|||
| 
								 | 
							
								plans, is that the output and/or input, respectively, are _locally_
							 | 
						|||
| 
								 | 
							
								transposed.  That is, on each process input data is normally stored as a
							 | 
						|||
| 
								 | 
							
								'local_n0' by 'n1' array in row-major order, but for an
							 | 
						|||
| 
								 | 
							
								'FFTW_MPI_TRANSPOSED_IN' plan the input data is stored as 'n1' by
							 | 
						|||
| 
								 | 
							
								'local_n0' in row-major order.  Similarly, 'FFTW_MPI_TRANSPOSED_OUT'
							 | 
						|||
| 
								 | 
							
								means that the output is 'n0' by 'local_n1' instead of 'local_n1' by
							 | 
						|||
| 
								 | 
							
								'n0'.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   To determine the local size of the array on each process before and
							 | 
						|||
| 
								 | 
							
								after the transpose, as well as the amount of storage that must be
							 | 
						|||
| 
								 | 
							
								allocated, one should call 'fftw_mpi_local_size_2d_transposed', just as
							 | 
						|||
| 
								 | 
							
								for a 2d DFT as described in the previous section:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     ptrdiff_t fftw_mpi_local_size_2d_transposed
							 | 
						|||
| 
								 | 
							
								                     (ptrdiff_t n0, ptrdiff_t n1, MPI_Comm comm,
							 | 
						|||
| 
								 | 
							
								                      ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
							 | 
						|||
| 
								 | 
							
								                      ptrdiff_t *local_n1, ptrdiff_t *local_1_start);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Again, the return value is the local storage to allocate, which in
							 | 
						|||
| 
								 | 
							
								this case is the number of _real_ ('double') values rather than complex
							 | 
						|||
| 
								 | 
							
								numbers as in the previous examples.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Advanced distributed-transpose interface,  Next: An improved replacement for MPI_Alltoall,  Prev: Basic distributed-transpose interface,  Up: FFTW MPI Transposes
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								6.7.2 Advanced distributed-transpose interface
							 | 
						|||
| 
								 | 
							
								----------------------------------------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								The above routines are for a transpose of a matrix of numbers (of type
							 | 
						|||
| 
								 | 
							
								'double'), using FFTW's default block sizes.  More generally, one can
							 | 
						|||
| 
								 | 
							
								perform transposes of _tuples_ of numbers, with user-specified block
							 | 
						|||
| 
								 | 
							
								sizes for the input and output:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_mpi_plan_many_transpose
							 | 
						|||
| 
								 | 
							
								                     (ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t howmany,
							 | 
						|||
| 
								 | 
							
								                      ptrdiff_t block0, ptrdiff_t block1,
							 | 
						|||
| 
								 | 
							
								                      double *in, double *out, MPI_Comm comm, unsigned flags);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   In this case, one is transposing an 'n0' by 'n1' matrix of
							 | 
						|||
| 
								 | 
							
								'howmany'-tuples (e.g.  'howmany = 2' for complex numbers).  The input
							 | 
						|||
| 
								 | 
							
								is distributed along the 'n0' dimension with block size 'block0', and
							 | 
						|||
| 
								 | 
							
								the 'n1' by 'n0' output is distributed along the 'n1' dimension with
							 | 
						|||
| 
								 | 
							
								block size 'block1'.  If 'FFTW_MPI_DEFAULT_BLOCK' (0) is passed for a
							 | 
						|||
| 
								 | 
							
								block size then FFTW uses its default block size.  To get the local size
							 | 
						|||
| 
								 | 
							
								of the data on each process, you should then call
							 | 
						|||
| 
								 | 
							
								'fftw_mpi_local_size_many_transposed'.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: An improved replacement for MPI_Alltoall,  Prev: Advanced distributed-transpose interface,  Up: FFTW MPI Transposes
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								6.7.3 An improved replacement for MPI_Alltoall
							 | 
						|||
| 
								 | 
							
								----------------------------------------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								We close this section by noting that FFTW's MPI transpose routines can
							 | 
						|||
| 
								 | 
							
								be thought of as a generalization for the 'MPI_Alltoall' function
							 | 
						|||
| 
								 | 
							
								(albeit only for floating-point types), and in some circumstances can
							 | 
						|||
| 
								 | 
							
								function as an improved replacement.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   'MPI_Alltoall' is defined by the MPI standard as:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     int MPI_Alltoall(void *sendbuf, int sendcount, MPI_Datatype sendtype,
							 | 
						|||
| 
								 | 
							
								                      void *recvbuf, int recvcnt, MPI_Datatype recvtype,
							 | 
						|||
| 
								 | 
							
								                      MPI_Comm comm);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   In particular, for 'double*' arrays 'in' and 'out', consider the
							 | 
						|||
| 
								 | 
							
								call:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     MPI_Alltoall(in, howmany, MPI_DOUBLE, out, howmany MPI_DOUBLE, comm);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   This is completely equivalent to:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     MPI_Comm_size(comm, &P);
							 | 
						|||
| 
								 | 
							
								     plan = fftw_mpi_plan_many_transpose(P, P, howmany, 1, 1, in, out, comm, FFTW_ESTIMATE);
							 | 
						|||
| 
								 | 
							
								     fftw_execute(plan);
							 | 
						|||
| 
								 | 
							
								     fftw_destroy_plan(plan);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   That is, computing a P x P transpose on 'P' processes, with a block
							 | 
						|||
| 
								 | 
							
								size of 1, is just a standard all-to-all communication.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   However, using the FFTW routine instead of 'MPI_Alltoall' may have
							 | 
						|||
| 
								 | 
							
								certain advantages.  First of all, FFTW's routine can operate in-place
							 | 
						|||
| 
								 | 
							
								('in == out') whereas 'MPI_Alltoall' can only operate out-of-place.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Second, even for out-of-place plans, FFTW's routine may be faster,
							 | 
						|||
| 
								 | 
							
								especially if you need to perform the all-to-all communication many
							 | 
						|||
| 
								 | 
							
								times and can afford to use 'FFTW_MEASURE' or 'FFTW_PATIENT'.  It should
							 | 
						|||
| 
								 | 
							
								certainly be no slower, not including the time to create the plan, since
							 | 
						|||
| 
								 | 
							
								one of the possible algorithms that FFTW uses for an out-of-place
							 | 
						|||
| 
								 | 
							
								transpose _is_ simply to call 'MPI_Alltoall'.  However, FFTW also
							 | 
						|||
| 
								 | 
							
								considers several other possible algorithms that, depending on your MPI
							 | 
						|||
| 
								 | 
							
								implementation and your hardware, may be faster.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: FFTW MPI Wisdom,  Next: Avoiding MPI Deadlocks,  Prev: FFTW MPI Transposes,  Up: Distributed-memory FFTW with MPI
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								6.8 FFTW MPI Wisdom
							 | 
						|||
| 
								 | 
							
								===================
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								FFTW's "wisdom" facility (*note Words of Wisdom-Saving Plans::) can be
							 | 
						|||
| 
								 | 
							
								used to save MPI plans as well as to save uniprocessor plans.  However,
							 | 
						|||
| 
								 | 
							
								for MPI there are several unavoidable complications.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   First, the MPI standard does not guarantee that every process can
							 | 
						|||
| 
								 | 
							
								perform file I/O (at least, not using C stdio routines)--in general, we
							 | 
						|||
| 
								 | 
							
								may only assume that process 0 is capable of I/O.(1) So, if we want to
							 | 
						|||
| 
								 | 
							
								export the wisdom from a single process to a file, we must first export
							 | 
						|||
| 
								 | 
							
								the wisdom to a string, then send it to process 0, then write it to a
							 | 
						|||
| 
								 | 
							
								file.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Second, in principle we may want to have separate wisdom for every
							 | 
						|||
| 
								 | 
							
								process, since in general the processes may run on different hardware
							 | 
						|||
| 
								 | 
							
								even for a single MPI program.  However, in practice FFTW's MPI code is
							 | 
						|||
| 
								 | 
							
								designed for the case of homogeneous hardware (*note Load balancing::),
							 | 
						|||
| 
								 | 
							
								and in this case it is convenient to use the same wisdom for every
							 | 
						|||
| 
								 | 
							
								process.  Thus, we need a mechanism to synchronize the wisdom.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   To address both of these problems, FFTW provides the following two
							 | 
						|||
| 
								 | 
							
								functions:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     void fftw_mpi_broadcast_wisdom(MPI_Comm comm);
							 | 
						|||
| 
								 | 
							
								     void fftw_mpi_gather_wisdom(MPI_Comm comm);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Given a communicator 'comm', 'fftw_mpi_broadcast_wisdom' will
							 | 
						|||
| 
								 | 
							
								broadcast the wisdom from process 0 to all other processes.  Conversely,
							 | 
						|||
| 
								 | 
							
								'fftw_mpi_gather_wisdom' will collect wisdom from all processes onto
							 | 
						|||
| 
								 | 
							
								process 0.  (If the plans created for the same problem by different
							 | 
						|||
| 
								 | 
							
								processes are not the same, 'fftw_mpi_gather_wisdom' will arbitrarily
							 | 
						|||
| 
								 | 
							
								choose one of the plans.)  Both of these functions may result in
							 | 
						|||
| 
								 | 
							
								suboptimal plans for different processes if the processes are running on
							 | 
						|||
| 
								 | 
							
								non-identical hardware.  Both of these functions are _collective_ calls,
							 | 
						|||
| 
								 | 
							
								which means that they must be executed by all processes in the
							 | 
						|||
| 
								 | 
							
								communicator.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   So, for example, a typical code snippet to import wisdom from a file
							 | 
						|||
| 
								 | 
							
								and use it on all processes would be:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     {
							 | 
						|||
| 
								 | 
							
								         int rank;
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								         fftw_mpi_init();
							 | 
						|||
| 
								 | 
							
								         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
							 | 
						|||
| 
								 | 
							
								         if (rank == 0) fftw_import_wisdom_from_filename("mywisdom");
							 | 
						|||
| 
								 | 
							
								         fftw_mpi_broadcast_wisdom(MPI_COMM_WORLD);
							 | 
						|||
| 
								 | 
							
								     }
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   (Note that we must call 'fftw_mpi_init' before importing any wisdom
							 | 
						|||
| 
								 | 
							
								that might contain MPI plans.)  Similarly, a typical code snippet to
							 | 
						|||
| 
								 | 
							
								export wisdom from all processes to a file is:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     {
							 | 
						|||
| 
								 | 
							
								         int rank;
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								         fftw_mpi_gather_wisdom(MPI_COMM_WORLD);
							 | 
						|||
| 
								 | 
							
								         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
							 | 
						|||
| 
								 | 
							
								         if (rank == 0) fftw_export_wisdom_to_filename("mywisdom");
							 | 
						|||
| 
								 | 
							
								     }
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   ---------- Footnotes ----------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   (1) In fact, even this assumption is not technically guaranteed by
							 | 
						|||
| 
								 | 
							
								the standard, although it seems to be universal in actual MPI
							 | 
						|||
| 
								 | 
							
								implementations and is widely assumed by MPI-using software.
							 | 
						|||
| 
								 | 
							
								Technically, you need to query the 'MPI_IO' attribute of
							 | 
						|||
| 
								 | 
							
								'MPI_COMM_WORLD' with 'MPI_Attr_get'.  If this attribute is
							 | 
						|||
| 
								 | 
							
								'MPI_PROC_NULL', no I/O is possible.  If it is 'MPI_ANY_SOURCE', any
							 | 
						|||
| 
								 | 
							
								process can perform I/O. Otherwise, it is the rank of a process that can
							 | 
						|||
| 
								 | 
							
								perform I/O ...  but since it is not guaranteed to yield the _same_ rank
							 | 
						|||
| 
								 | 
							
								on all processes, you have to do an 'MPI_Allreduce' of some kind if you
							 | 
						|||
| 
								 | 
							
								want all processes to agree about which is going to do I/O. And even
							 | 
						|||
| 
								 | 
							
								then, the standard only guarantees that this process can perform output,
							 | 
						|||
| 
								 | 
							
								but not input.  See e.g.  'Parallel Programming with MPI' by P. S.
							 | 
						|||
| 
								 | 
							
								Pacheco, section 8.1.3.  Needless to say, in our experience virtually no
							 | 
						|||
| 
								 | 
							
								MPI programmers worry about this.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Avoiding MPI Deadlocks,  Next: FFTW MPI Performance Tips,  Prev: FFTW MPI Wisdom,  Up: Distributed-memory FFTW with MPI
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								6.9 Avoiding MPI Deadlocks
							 | 
						|||
| 
								 | 
							
								==========================
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								An MPI program can _deadlock_ if one process is waiting for a message
							 | 
						|||
| 
								 | 
							
								from another process that never gets sent.  To avoid deadlocks when
							 | 
						|||
| 
								 | 
							
								using FFTW's MPI routines, it is important to know which functions are
							 | 
						|||
| 
								 | 
							
								_collective_: that is, which functions must _always_ be called in the
							 | 
						|||
| 
								 | 
							
								_same order_ from _every_ process in a given communicator.  (For
							 | 
						|||
| 
								 | 
							
								example, 'MPI_Barrier' is the canonical example of a collective function
							 | 
						|||
| 
								 | 
							
								in the MPI standard.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The functions in FFTW that are _always_ collective are: every
							 | 
						|||
| 
								 | 
							
								function beginning with 'fftw_mpi_plan', as well as
							 | 
						|||
| 
								 | 
							
								'fftw_mpi_broadcast_wisdom' and 'fftw_mpi_gather_wisdom'.  Also, the
							 | 
						|||
| 
								 | 
							
								following functions from the ordinary FFTW interface are collective when
							 | 
						|||
| 
								 | 
							
								they are applied to a plan created by an 'fftw_mpi_plan' function:
							 | 
						|||
| 
								 | 
							
								'fftw_execute', 'fftw_destroy_plan', and 'fftw_flops'.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: FFTW MPI Performance Tips,  Next: Combining MPI and Threads,  Prev: Avoiding MPI Deadlocks,  Up: Distributed-memory FFTW with MPI
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								6.10 FFTW MPI Performance Tips
							 | 
						|||
| 
								 | 
							
								==============================
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								In this section, we collect a few tips on getting the best performance
							 | 
						|||
| 
								 | 
							
								out of FFTW's MPI transforms.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   First, because of the 1d block distribution, FFTW's parallelization
							 | 
						|||
| 
								 | 
							
								is currently limited by the size of the first dimension.
							 | 
						|||
| 
								 | 
							
								(Multidimensional block distributions may be supported by a future
							 | 
						|||
| 
								 | 
							
								version.)  More generally, you should ideally arrange the dimensions so
							 | 
						|||
| 
								 | 
							
								that FFTW can divide them equally among the processes.  *Note Load
							 | 
						|||
| 
								 | 
							
								balancing::.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Second, if it is not too inconvenient, you should consider working
							 | 
						|||
| 
								 | 
							
								with transposed output for multidimensional plans, as this saves a
							 | 
						|||
| 
								 | 
							
								considerable amount of communications.  *Note Transposed
							 | 
						|||
| 
								 | 
							
								distributions::.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Third, the fastest choices are generally either an in-place transform
							 | 
						|||
| 
								 | 
							
								or an out-of-place transform with the 'FFTW_DESTROY_INPUT' flag (which
							 | 
						|||
| 
								 | 
							
								allows the input array to be used as scratch space).  In-place is
							 | 
						|||
| 
								 | 
							
								especially beneficial if the amount of data per process is large.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Fourth, if you have multiple arrays to transform at once, rather than
							 | 
						|||
| 
								 | 
							
								calling FFTW's MPI transforms several times it usually seems to be
							 | 
						|||
| 
								 | 
							
								faster to interleave the data and use the advanced interface.  (This
							 | 
						|||
| 
								 | 
							
								groups the communications together instead of requiring separate
							 | 
						|||
| 
								 | 
							
								messages for each transform.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Combining MPI and Threads,  Next: FFTW MPI Reference,  Prev: FFTW MPI Performance Tips,  Up: Distributed-memory FFTW with MPI
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								6.11 Combining MPI and Threads
							 | 
						|||
| 
								 | 
							
								==============================
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								In certain cases, it may be advantageous to combine MPI
							 | 
						|||
| 
								 | 
							
								(distributed-memory) and threads (shared-memory) parallelization.  FFTW
							 | 
						|||
| 
								 | 
							
								supports this, with certain caveats.  For example, if you have a cluster
							 | 
						|||
| 
								 | 
							
								of 4-processor shared-memory nodes, you may want to use threads within
							 | 
						|||
| 
								 | 
							
								the nodes and MPI between the nodes, instead of MPI for all
							 | 
						|||
| 
								 | 
							
								parallelization.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   In particular, it is possible to seamlessly combine the MPI FFTW
							 | 
						|||
| 
								 | 
							
								routines with the multi-threaded FFTW routines (*note Multi-threaded
							 | 
						|||
| 
								 | 
							
								FFTW::).  However, some care must be taken in the initialization code,
							 | 
						|||
| 
								 | 
							
								which should look something like this:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     int threads_ok;
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     int main(int argc, char **argv)
							 | 
						|||
| 
								 | 
							
								     {
							 | 
						|||
| 
								 | 
							
								         int provided;
							 | 
						|||
| 
								 | 
							
								         MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
							 | 
						|||
| 
								 | 
							
								         threads_ok = provided >= MPI_THREAD_FUNNELED;
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								         if (threads_ok) threads_ok = fftw_init_threads();
							 | 
						|||
| 
								 | 
							
								         fftw_mpi_init();
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								         ...
							 | 
						|||
| 
								 | 
							
								         if (threads_ok) fftw_plan_with_nthreads(...);
							 | 
						|||
| 
								 | 
							
								         ...
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								         MPI_Finalize();
							 | 
						|||
| 
								 | 
							
								     }
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   First, note that instead of calling 'MPI_Init', you should call
							 | 
						|||
| 
								 | 
							
								'MPI_Init_threads', which is the initialization routine defined by the
							 | 
						|||
| 
								 | 
							
								MPI-2 standard to indicate to MPI that your program will be
							 | 
						|||
| 
								 | 
							
								multithreaded.  We pass 'MPI_THREAD_FUNNELED', which indicates that we
							 | 
						|||
| 
								 | 
							
								will only call MPI routines from the main thread.  (FFTW will launch
							 | 
						|||
| 
								 | 
							
								additional threads internally, but the extra threads will not call MPI
							 | 
						|||
| 
								 | 
							
								code.)  (You may also pass 'MPI_THREAD_SERIALIZED' or
							 | 
						|||
| 
								 | 
							
								'MPI_THREAD_MULTIPLE', which requests additional multithreading support
							 | 
						|||
| 
								 | 
							
								from the MPI implementation, but this is not required by FFTW.) The
							 | 
						|||
| 
								 | 
							
								'provided' parameter returns what level of threads support is actually
							 | 
						|||
| 
								 | 
							
								supported by your MPI implementation; this _must_ be at least
							 | 
						|||
| 
								 | 
							
								'MPI_THREAD_FUNNELED' if you want to call the FFTW threads routines, so
							 | 
						|||
| 
								 | 
							
								we define a global variable 'threads_ok' to record this.  You should
							 | 
						|||
| 
								 | 
							
								only call 'fftw_init_threads' or 'fftw_plan_with_nthreads' if
							 | 
						|||
| 
								 | 
							
								'threads_ok' is true.  For more information on thread safety in MPI, see
							 | 
						|||
| 
								 | 
							
								the MPI and Threads
							 | 
						|||
| 
								 | 
							
								(http://www.mpi-forum.org/docs/mpi-20-html/node162.htm) section of the
							 | 
						|||
| 
								 | 
							
								MPI-2 standard.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Second, we must call 'fftw_init_threads' _before_ 'fftw_mpi_init'.
							 | 
						|||
| 
								 | 
							
								This is critical for technical reasons having to do with how FFTW
							 | 
						|||
| 
								 | 
							
								initializes its list of algorithms.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Then, if you call 'fftw_plan_with_nthreads(N)', _every_ MPI process
							 | 
						|||
| 
								 | 
							
								will launch (up to) 'N' threads to parallelize its transforms.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   For example, in the hypothetical cluster of 4-processor nodes, you
							 | 
						|||
| 
								 | 
							
								might wish to launch only a single MPI process per node, and then call
							 | 
						|||
| 
								 | 
							
								'fftw_plan_with_nthreads(4)' on each process to use all processors in
							 | 
						|||
| 
								 | 
							
								the nodes.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   This may or may not be faster than simply using as many MPI processes
							 | 
						|||
| 
								 | 
							
								as you have processors, however.  On the one hand, using threads within
							 | 
						|||
| 
								 | 
							
								a node eliminates the need for explicit message passing within the node.
							 | 
						|||
| 
								 | 
							
								On the other hand, FFTW's transpose routines are not multi-threaded, and
							 | 
						|||
| 
								 | 
							
								this means that the communications that do take place will not benefit
							 | 
						|||
| 
								 | 
							
								from parallelization within the node.  Moreover, many MPI
							 | 
						|||
| 
								 | 
							
								implementations already have optimizations to exploit shared memory when
							 | 
						|||
| 
								 | 
							
								it is available, so adding the multithreaded FFTW on top of this may be
							 | 
						|||
| 
								 | 
							
								superfluous.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: FFTW MPI Reference,  Next: FFTW MPI Fortran Interface,  Prev: Combining MPI and Threads,  Up: Distributed-memory FFTW with MPI
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								6.12 FFTW MPI Reference
							 | 
						|||
| 
								 | 
							
								=======================
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								This chapter provides a complete reference to all FFTW MPI functions,
							 | 
						|||
| 
								 | 
							
								datatypes, and constants.  See also *note FFTW Reference:: for
							 | 
						|||
| 
								 | 
							
								information on functions and types in common with the serial interface.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								* Menu:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								* MPI Files and Data Types::
							 | 
						|||
| 
								 | 
							
								* MPI Initialization::
							 | 
						|||
| 
								 | 
							
								* Using MPI Plans::
							 | 
						|||
| 
								 | 
							
								* MPI Data Distribution Functions::
							 | 
						|||
| 
								 | 
							
								* MPI Plan Creation::
							 | 
						|||
| 
								 | 
							
								* MPI Wisdom Communication::
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: MPI Files and Data Types,  Next: MPI Initialization,  Prev: FFTW MPI Reference,  Up: FFTW MPI Reference
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								6.12.1 MPI Files and Data Types
							 | 
						|||
| 
								 | 
							
								-------------------------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								All programs using FFTW's MPI support should include its header file:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     #include <fftw3-mpi.h>
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Note that this header file includes the serial-FFTW 'fftw3.h' header
							 | 
						|||
| 
								 | 
							
								file, and also the 'mpi.h' header file for MPI, so you need not include
							 | 
						|||
| 
								 | 
							
								those files separately.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   You must also link to _both_ the FFTW MPI library and to the serial
							 | 
						|||
| 
								 | 
							
								FFTW library.  On Unix, this means adding '-lfftw3_mpi -lfftw3 -lm' at
							 | 
						|||
| 
								 | 
							
								the end of the link command.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Different precisions are handled as in the serial interface: *Note
							 | 
						|||
| 
								 | 
							
								Precision::.  That is, 'fftw_' functions become 'fftwf_' (in single
							 | 
						|||
| 
								 | 
							
								precision) etcetera, and the libraries become '-lfftw3f_mpi -lfftw3f
							 | 
						|||
| 
								 | 
							
								-lm' etcetera on Unix.  Long-double precision is supported in MPI, but
							 | 
						|||
| 
								 | 
							
								quad precision ('fftwq_') is not due to the lack of MPI support for this
							 | 
						|||
| 
								 | 
							
								type.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: MPI Initialization,  Next: Using MPI Plans,  Prev: MPI Files and Data Types,  Up: FFTW MPI Reference
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								6.12.2 MPI Initialization
							 | 
						|||
| 
								 | 
							
								-------------------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								Before calling any other FFTW MPI ('fftw_mpi_') function, and before
							 | 
						|||
| 
								 | 
							
								importing any wisdom for MPI problems, you must call:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     void fftw_mpi_init(void);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   If FFTW threads support is used, however, 'fftw_mpi_init' should be
							 | 
						|||
| 
								 | 
							
								called _after_ 'fftw_init_threads' (*note Combining MPI and Threads::).
							 | 
						|||
| 
								 | 
							
								Calling 'fftw_mpi_init' additional times (before 'fftw_mpi_cleanup') has
							 | 
						|||
| 
								 | 
							
								no effect.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   If you want to deallocate all persistent data and reset FFTW to the
							 | 
						|||
| 
								 | 
							
								pristine state it was in when you started your program, you can call:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     void fftw_mpi_cleanup(void);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   (This calls 'fftw_cleanup', so you need not call the serial cleanup
							 | 
						|||
| 
								 | 
							
								routine too, although it is safe to do so.)  After calling
							 | 
						|||
| 
								 | 
							
								'fftw_mpi_cleanup', all existing plans become undefined, and you should
							 | 
						|||
| 
								 | 
							
								not attempt to execute or destroy them.  You must call 'fftw_mpi_init'
							 | 
						|||
| 
								 | 
							
								again after 'fftw_mpi_cleanup' if you want to resume using the MPI FFTW
							 | 
						|||
| 
								 | 
							
								routines.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Using MPI Plans,  Next: MPI Data Distribution Functions,  Prev: MPI Initialization,  Up: FFTW MPI Reference
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								6.12.3 Using MPI Plans
							 | 
						|||
| 
								 | 
							
								----------------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								Once an MPI plan is created, you can execute and destroy it using
							 | 
						|||
| 
								 | 
							
								'fftw_execute', 'fftw_destroy_plan', and the other functions in the
							 | 
						|||
| 
								 | 
							
								serial interface that operate on generic plans (*note Using Plans::).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The 'fftw_execute' and 'fftw_destroy_plan' functions, applied to MPI
							 | 
						|||
| 
								 | 
							
								plans, are _collective_ calls: they must be called for all processes in
							 | 
						|||
| 
								 | 
							
								the communicator that was used to create the plan.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   You must _not_ use the serial new-array plan-execution functions
							 | 
						|||
| 
								 | 
							
								'fftw_execute_dft' and so on (*note New-array Execute Functions::) with
							 | 
						|||
| 
								 | 
							
								MPI plans.  Such functions are specialized to the problem type, and
							 | 
						|||
| 
								 | 
							
								there are specific new-array execute functions for MPI plans:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     void fftw_mpi_execute_dft(fftw_plan p, fftw_complex *in, fftw_complex *out);
							 | 
						|||
| 
								 | 
							
								     void fftw_mpi_execute_dft_r2c(fftw_plan p, double *in, fftw_complex *out);
							 | 
						|||
| 
								 | 
							
								     void fftw_mpi_execute_dft_c2r(fftw_plan p, fftw_complex *in, double *out);
							 | 
						|||
| 
								 | 
							
								     void fftw_mpi_execute_r2r(fftw_plan p, double *in, double *out);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   These functions have the same restrictions as those of the serial
							 | 
						|||
| 
								 | 
							
								new-array execute functions.  They are _always_ safe to apply to the
							 | 
						|||
| 
								 | 
							
								_same_ 'in' and 'out' arrays that were used to create the plan.  They
							 | 
						|||
| 
								 | 
							
								can only be applied to new arrarys if those arrays have the same types,
							 | 
						|||
| 
								 | 
							
								dimensions, in-placeness, and alignment as the original arrays, where
							 | 
						|||
| 
								 | 
							
								the best way to ensure the same alignment is to use FFTW's 'fftw_malloc'
							 | 
						|||
| 
								 | 
							
								and related allocation functions for all arrays (*note Memory
							 | 
						|||
| 
								 | 
							
								Allocation::).  Note that distributed transposes (*note FFTW MPI
							 | 
						|||
| 
								 | 
							
								Transposes::) use 'fftw_mpi_execute_r2r', since they count as rank-zero
							 | 
						|||
| 
								 | 
							
								r2r plans from FFTW's perspective.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: MPI Data Distribution Functions,  Next: MPI Plan Creation,  Prev: Using MPI Plans,  Up: FFTW MPI Reference
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								6.12.4 MPI Data Distribution Functions
							 | 
						|||
| 
								 | 
							
								--------------------------------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								As described above (*note MPI Data Distribution::), in order to allocate
							 | 
						|||
| 
								 | 
							
								your arrays, _before_ creating a plan, you must first call one of the
							 | 
						|||
| 
								 | 
							
								following routines to determine the required allocation size and the
							 | 
						|||
| 
								 | 
							
								portion of the array locally stored on a given process.  The 'MPI_Comm'
							 | 
						|||
| 
								 | 
							
								communicator passed here must be equivalent to the communicator used
							 | 
						|||
| 
								 | 
							
								below for plan creation.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The basic interface for multidimensional transforms consists of the
							 | 
						|||
| 
								 | 
							
								functions:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     ptrdiff_t fftw_mpi_local_size_2d(ptrdiff_t n0, ptrdiff_t n1, MPI_Comm comm,
							 | 
						|||
| 
								 | 
							
								                                      ptrdiff_t *local_n0, ptrdiff_t *local_0_start);
							 | 
						|||
| 
								 | 
							
								     ptrdiff_t fftw_mpi_local_size_3d(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
							 | 
						|||
| 
								 | 
							
								                                      MPI_Comm comm,
							 | 
						|||
| 
								 | 
							
								                                      ptrdiff_t *local_n0, ptrdiff_t *local_0_start);
							 | 
						|||
| 
								 | 
							
								     ptrdiff_t fftw_mpi_local_size(int rnk, const ptrdiff_t *n, MPI_Comm comm,
							 | 
						|||
| 
								 | 
							
								                                   ptrdiff_t *local_n0, ptrdiff_t *local_0_start);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     ptrdiff_t fftw_mpi_local_size_2d_transposed(ptrdiff_t n0, ptrdiff_t n1, MPI_Comm comm,
							 | 
						|||
| 
								 | 
							
								                                                 ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
							 | 
						|||
| 
								 | 
							
								                                                 ptrdiff_t *local_n1, ptrdiff_t *local_1_start);
							 | 
						|||
| 
								 | 
							
								     ptrdiff_t fftw_mpi_local_size_3d_transposed(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
							 | 
						|||
| 
								 | 
							
								                                                 MPI_Comm comm,
							 | 
						|||
| 
								 | 
							
								                                                 ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
							 | 
						|||
| 
								 | 
							
								                                                 ptrdiff_t *local_n1, ptrdiff_t *local_1_start);
							 | 
						|||
| 
								 | 
							
								     ptrdiff_t fftw_mpi_local_size_transposed(int rnk, const ptrdiff_t *n, MPI_Comm comm,
							 | 
						|||
| 
								 | 
							
								                                              ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
							 | 
						|||
| 
								 | 
							
								                                              ptrdiff_t *local_n1, ptrdiff_t *local_1_start);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   These functions return the number of elements to allocate (complex
							 | 
						|||
| 
								 | 
							
								numbers for DFT/r2c/c2r plans, real numbers for r2r plans), whereas the
							 | 
						|||
| 
								 | 
							
								'local_n0' and 'local_0_start' return the portion ('local_0_start' to
							 | 
						|||
| 
								 | 
							
								'local_0_start + local_n0 - 1') of the first dimension of an n[0] x n[1]
							 | 
						|||
| 
								 | 
							
								x n[2] x ...  x n[d-1] array that is stored on the local process.  *Note
							 | 
						|||
| 
								 | 
							
								Basic and advanced distribution interfaces::.  For
							 | 
						|||
| 
								 | 
							
								'FFTW_MPI_TRANSPOSED_OUT' plans, the '_transposed' variants are useful
							 | 
						|||
| 
								 | 
							
								in order to also return the local portion of the first dimension in the
							 | 
						|||
| 
								 | 
							
								n[1] x n[0] x n[2] x ...  x n[d-1] transposed output.  *Note Transposed
							 | 
						|||
| 
								 | 
							
								distributions::.  The advanced interface for multidimensional transforms
							 | 
						|||
| 
								 | 
							
								is:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     ptrdiff_t fftw_mpi_local_size_many(int rnk, const ptrdiff_t *n, ptrdiff_t howmany,
							 | 
						|||
| 
								 | 
							
								                                        ptrdiff_t block0, MPI_Comm comm,
							 | 
						|||
| 
								 | 
							
								                                        ptrdiff_t *local_n0, ptrdiff_t *local_0_start);
							 | 
						|||
| 
								 | 
							
								     ptrdiff_t fftw_mpi_local_size_many_transposed(int rnk, const ptrdiff_t *n, ptrdiff_t howmany,
							 | 
						|||
| 
								 | 
							
								                                                   ptrdiff_t block0, ptrdiff_t block1, MPI_Comm comm,
							 | 
						|||
| 
								 | 
							
								                                                   ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
							 | 
						|||
| 
								 | 
							
								                                                   ptrdiff_t *local_n1, ptrdiff_t *local_1_start);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   These differ from the basic interface in only two ways.  First, they
							 | 
						|||
| 
								 | 
							
								allow you to specify block sizes 'block0' and 'block1' (the latter for
							 | 
						|||
| 
								 | 
							
								the transposed output); you can pass 'FFTW_MPI_DEFAULT_BLOCK' to use
							 | 
						|||
| 
								 | 
							
								FFTW's default block size as in the basic interface.  Second, you can
							 | 
						|||
| 
								 | 
							
								pass a 'howmany' parameter, corresponding to the advanced planning
							 | 
						|||
| 
								 | 
							
								interface below: this is for transforms of contiguous 'howmany'-tuples
							 | 
						|||
| 
								 | 
							
								of numbers ('howmany = 1' in the basic interface).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The corresponding basic and advanced routines for one-dimensional
							 | 
						|||
| 
								 | 
							
								transforms (currently only complex DFTs) are:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     ptrdiff_t fftw_mpi_local_size_1d(
							 | 
						|||
| 
								 | 
							
								                  ptrdiff_t n0, MPI_Comm comm, int sign, unsigned flags,
							 | 
						|||
| 
								 | 
							
								                  ptrdiff_t *local_ni, ptrdiff_t *local_i_start,
							 | 
						|||
| 
								 | 
							
								                  ptrdiff_t *local_no, ptrdiff_t *local_o_start);
							 | 
						|||
| 
								 | 
							
								     ptrdiff_t fftw_mpi_local_size_many_1d(
							 | 
						|||
| 
								 | 
							
								                  ptrdiff_t n0, ptrdiff_t howmany,
							 | 
						|||
| 
								 | 
							
								                  MPI_Comm comm, int sign, unsigned flags,
							 | 
						|||
| 
								 | 
							
								                  ptrdiff_t *local_ni, ptrdiff_t *local_i_start,
							 | 
						|||
| 
								 | 
							
								                  ptrdiff_t *local_no, ptrdiff_t *local_o_start);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   As above, the return value is the number of elements to allocate
							 | 
						|||
| 
								 | 
							
								(complex numbers, for complex DFTs).  The 'local_ni' and 'local_i_start'
							 | 
						|||
| 
								 | 
							
								arguments return the portion ('local_i_start' to 'local_i_start +
							 | 
						|||
| 
								 | 
							
								local_ni - 1') of the 1d array that is stored on this process for the
							 | 
						|||
| 
								 | 
							
								transform _input_, and 'local_no' and 'local_o_start' are the
							 | 
						|||
| 
								 | 
							
								corresponding quantities for the input.  The 'sign' ('FFTW_FORWARD' or
							 | 
						|||
| 
								 | 
							
								'FFTW_BACKWARD') and 'flags' must match the arguments passed when
							 | 
						|||
| 
								 | 
							
								creating a plan.  Although the inputs and outputs have different data
							 | 
						|||
| 
								 | 
							
								distributions in general, it is guaranteed that the _output_ data
							 | 
						|||
| 
								 | 
							
								distribution of an 'FFTW_FORWARD' plan will match the _input_ data
							 | 
						|||
| 
								 | 
							
								distribution of an 'FFTW_BACKWARD' plan and vice versa; similarly for
							 | 
						|||
| 
								 | 
							
								the 'FFTW_MPI_SCRAMBLED_OUT' and 'FFTW_MPI_SCRAMBLED_IN' flags.  *Note
							 | 
						|||
| 
								 | 
							
								One-dimensional distributions::.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: MPI Plan Creation,  Next: MPI Wisdom Communication,  Prev: MPI Data Distribution Functions,  Up: FFTW MPI Reference
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								6.12.5 MPI Plan Creation
							 | 
						|||
| 
								 | 
							
								------------------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								Complex-data MPI DFTs
							 | 
						|||
| 
								 | 
							
								.....................
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								Plans for complex-data DFTs (*note 2d MPI example::) are created by:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_mpi_plan_dft_1d(ptrdiff_t n0, fftw_complex *in, fftw_complex *out,
							 | 
						|||
| 
								 | 
							
								                                    MPI_Comm comm, int sign, unsigned flags);
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_mpi_plan_dft_2d(ptrdiff_t n0, ptrdiff_t n1,
							 | 
						|||
| 
								 | 
							
								                                    fftw_complex *in, fftw_complex *out,
							 | 
						|||
| 
								 | 
							
								                                    MPI_Comm comm, int sign, unsigned flags);
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_mpi_plan_dft_3d(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
							 | 
						|||
| 
								 | 
							
								                                    fftw_complex *in, fftw_complex *out,
							 | 
						|||
| 
								 | 
							
								                                    MPI_Comm comm, int sign, unsigned flags);
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_mpi_plan_dft(int rnk, const ptrdiff_t *n,
							 | 
						|||
| 
								 | 
							
								                                 fftw_complex *in, fftw_complex *out,
							 | 
						|||
| 
								 | 
							
								                                 MPI_Comm comm, int sign, unsigned flags);
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_mpi_plan_many_dft(int rnk, const ptrdiff_t *n,
							 | 
						|||
| 
								 | 
							
								                                      ptrdiff_t howmany, ptrdiff_t block, ptrdiff_t tblock,
							 | 
						|||
| 
								 | 
							
								                                      fftw_complex *in, fftw_complex *out,
							 | 
						|||
| 
								 | 
							
								                                      MPI_Comm comm, int sign, unsigned flags);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   These are similar to their serial counterparts (*note Complex DFTs::)
							 | 
						|||
| 
								 | 
							
								in specifying the dimensions, sign, and flags of the transform.  The
							 | 
						|||
| 
								 | 
							
								'comm' argument gives an MPI communicator that specifies the set of
							 | 
						|||
| 
								 | 
							
								processes to participate in the transform; plan creation is a collective
							 | 
						|||
| 
								 | 
							
								function that must be called for all processes in the communicator.  The
							 | 
						|||
| 
								 | 
							
								'in' and 'out' pointers refer only to a portion of the overall transform
							 | 
						|||
| 
								 | 
							
								data (*note MPI Data Distribution::) as specified by the 'local_size'
							 | 
						|||
| 
								 | 
							
								functions in the previous section.  Unless 'flags' contains
							 | 
						|||
| 
								 | 
							
								'FFTW_ESTIMATE', these arrays are overwritten during plan creation as
							 | 
						|||
| 
								 | 
							
								for the serial interface.  For multi-dimensional transforms, any
							 | 
						|||
| 
								 | 
							
								dimensions '> 1' are supported; for one-dimensional transforms, only
							 | 
						|||
| 
								 | 
							
								composite (non-prime) 'n0' are currently supported (unlike the serial
							 | 
						|||
| 
								 | 
							
								FFTW). Requesting an unsupported transform size will yield a 'NULL'
							 | 
						|||
| 
								 | 
							
								plan.  (As in the serial interface, highly composite sizes generally
							 | 
						|||
| 
								 | 
							
								yield the best performance.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The advanced-interface 'fftw_mpi_plan_many_dft' additionally allows
							 | 
						|||
| 
								 | 
							
								you to specify the block sizes for the first dimension ('block') of the
							 | 
						|||
| 
								 | 
							
								n[0] x n[1] x n[2] x ...  x n[d-1] input data and the first dimension
							 | 
						|||
| 
								 | 
							
								('tblock') of the n[1] x n[0] x n[2] x ...  x n[d-1] transposed data (at
							 | 
						|||
| 
								 | 
							
								intermediate steps of the transform, and for the output if
							 | 
						|||
| 
								 | 
							
								'FFTW_TRANSPOSED_OUT' is specified in 'flags').  These must be the same
							 | 
						|||
| 
								 | 
							
								block sizes as were passed to the corresponding 'local_size' function;
							 | 
						|||
| 
								 | 
							
								you can pass 'FFTW_MPI_DEFAULT_BLOCK' to use FFTW's default block size
							 | 
						|||
| 
								 | 
							
								as in the basic interface.  Also, the 'howmany' parameter specifies that
							 | 
						|||
| 
								 | 
							
								the transform is of contiguous 'howmany'-tuples rather than individual
							 | 
						|||
| 
								 | 
							
								complex numbers; this corresponds to the same parameter in the serial
							 | 
						|||
| 
								 | 
							
								advanced interface (*note Advanced Complex DFTs::) with 'stride =
							 | 
						|||
| 
								 | 
							
								howmany' and 'dist = 1'.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								MPI flags
							 | 
						|||
| 
								 | 
							
								.........
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								The 'flags' can be any of those for the serial FFTW (*note Planner
							 | 
						|||
| 
								 | 
							
								Flags::), and in addition may include one or more of the following
							 | 
						|||
| 
								 | 
							
								MPI-specific flags, which improve performance at the cost of changing
							 | 
						|||
| 
								 | 
							
								the output or input data formats.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * 'FFTW_MPI_SCRAMBLED_OUT', 'FFTW_MPI_SCRAMBLED_IN': valid for 1d
							 | 
						|||
| 
								 | 
							
								     transforms only, these flags indicate that the output/input of the
							 | 
						|||
| 
								 | 
							
								     transform are in an undocumented "scrambled" order.  A forward
							 | 
						|||
| 
								 | 
							
								     'FFTW_MPI_SCRAMBLED_OUT' transform can be inverted by a backward
							 | 
						|||
| 
								 | 
							
								     'FFTW_MPI_SCRAMBLED_IN' (times the usual 1/N normalization).  *Note
							 | 
						|||
| 
								 | 
							
								     One-dimensional distributions::.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * 'FFTW_MPI_TRANSPOSED_OUT', 'FFTW_MPI_TRANSPOSED_IN': valid for
							 | 
						|||
| 
								 | 
							
								     multidimensional ('rnk > 1') transforms only, these flags specify
							 | 
						|||
| 
								 | 
							
								     that the output or input of an n[0] x n[1] x n[2] x ...  x n[d-1]
							 | 
						|||
| 
								 | 
							
								     transform is transposed to n[1] x n[0] x n[2] x ...  x n[d-1] .
							 | 
						|||
| 
								 | 
							
								     *Note Transposed distributions::.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								Real-data MPI DFTs
							 | 
						|||
| 
								 | 
							
								..................
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								Plans for real-input/output (r2c/c2r) DFTs (*note Multi-dimensional MPI
							 | 
						|||
| 
								 | 
							
								DFTs of Real Data::) are created by:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_mpi_plan_dft_r2c_2d(ptrdiff_t n0, ptrdiff_t n1,
							 | 
						|||
| 
								 | 
							
								                                        double *in, fftw_complex *out,
							 | 
						|||
| 
								 | 
							
								                                        MPI_Comm comm, unsigned flags);
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_mpi_plan_dft_r2c_2d(ptrdiff_t n0, ptrdiff_t n1,
							 | 
						|||
| 
								 | 
							
								                                        double *in, fftw_complex *out,
							 | 
						|||
| 
								 | 
							
								                                        MPI_Comm comm, unsigned flags);
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_mpi_plan_dft_r2c_3d(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
							 | 
						|||
| 
								 | 
							
								                                        double *in, fftw_complex *out,
							 | 
						|||
| 
								 | 
							
								                                        MPI_Comm comm, unsigned flags);
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_mpi_plan_dft_r2c(int rnk, const ptrdiff_t *n,
							 | 
						|||
| 
								 | 
							
								                                     double *in, fftw_complex *out,
							 | 
						|||
| 
								 | 
							
								                                     MPI_Comm comm, unsigned flags);
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_mpi_plan_dft_c2r_2d(ptrdiff_t n0, ptrdiff_t n1,
							 | 
						|||
| 
								 | 
							
								                                        fftw_complex *in, double *out,
							 | 
						|||
| 
								 | 
							
								                                        MPI_Comm comm, unsigned flags);
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_mpi_plan_dft_c2r_2d(ptrdiff_t n0, ptrdiff_t n1,
							 | 
						|||
| 
								 | 
							
								                                        fftw_complex *in, double *out,
							 | 
						|||
| 
								 | 
							
								                                        MPI_Comm comm, unsigned flags);
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_mpi_plan_dft_c2r_3d(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
							 | 
						|||
| 
								 | 
							
								                                        fftw_complex *in, double *out,
							 | 
						|||
| 
								 | 
							
								                                        MPI_Comm comm, unsigned flags);
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_mpi_plan_dft_c2r(int rnk, const ptrdiff_t *n,
							 | 
						|||
| 
								 | 
							
								                                     fftw_complex *in, double *out,
							 | 
						|||
| 
								 | 
							
								                                     MPI_Comm comm, unsigned flags);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Similar to the serial interface (*note Real-data DFTs::), these
							 | 
						|||
| 
								 | 
							
								transform logically n[0] x n[1] x n[2] x ...  x n[d-1] real data to/from
							 | 
						|||
| 
								 | 
							
								n[0] x n[1] x n[2] x ...  x (n[d-1]/2 + 1) complex data, representing
							 | 
						|||
| 
								 | 
							
								the non-redundant half of the conjugate-symmetry output of a real-input
							 | 
						|||
| 
								 | 
							
								DFT (*note Multi-dimensional Transforms::).  However, the real array
							 | 
						|||
| 
								 | 
							
								must be stored within a padded n[0] x n[1] x n[2] x ...  x [2 (n[d-1]/2
							 | 
						|||
| 
								 | 
							
								+ 1)] array (much like the in-place serial r2c transforms, but here for
							 | 
						|||
| 
								 | 
							
								out-of-place transforms as well).  Currently, only multi-dimensional
							 | 
						|||
| 
								 | 
							
								('rnk > 1') r2c/c2r transforms are supported (requesting a plan for 'rnk
							 | 
						|||
| 
								 | 
							
								= 1' will yield 'NULL').  As explained above (*note Multi-dimensional
							 | 
						|||
| 
								 | 
							
								MPI DFTs of Real Data::), the data distribution of both the real and
							 | 
						|||
| 
								 | 
							
								complex arrays is given by the 'local_size' function called for the
							 | 
						|||
| 
								 | 
							
								dimensions of the _complex_ array.  Similar to the other planning
							 | 
						|||
| 
								 | 
							
								functions, the input and output arrays are overwritten when the plan is
							 | 
						|||
| 
								 | 
							
								created except in 'FFTW_ESTIMATE' mode.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   As for the complex DFTs above, there is an advance interface that
							 | 
						|||
| 
								 | 
							
								allows you to manually specify block sizes and to transform contiguous
							 | 
						|||
| 
								 | 
							
								'howmany'-tuples of real/complex numbers:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_mpi_plan_many_dft_r2c
							 | 
						|||
| 
								 | 
							
								                   (int rnk, const ptrdiff_t *n, ptrdiff_t howmany,
							 | 
						|||
| 
								 | 
							
								                    ptrdiff_t iblock, ptrdiff_t oblock,
							 | 
						|||
| 
								 | 
							
								                    double *in, fftw_complex *out,
							 | 
						|||
| 
								 | 
							
								                    MPI_Comm comm, unsigned flags);
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_mpi_plan_many_dft_c2r
							 | 
						|||
| 
								 | 
							
								                   (int rnk, const ptrdiff_t *n, ptrdiff_t howmany,
							 | 
						|||
| 
								 | 
							
								                    ptrdiff_t iblock, ptrdiff_t oblock,
							 | 
						|||
| 
								 | 
							
								                    fftw_complex *in, double *out,
							 | 
						|||
| 
								 | 
							
								                    MPI_Comm comm, unsigned flags);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								MPI r2r transforms
							 | 
						|||
| 
								 | 
							
								..................
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								There are corresponding plan-creation routines for r2r transforms (*note
							 | 
						|||
| 
								 | 
							
								More DFTs of Real Data::), currently supporting multidimensional ('rnk >
							 | 
						|||
| 
								 | 
							
								1') transforms only ('rnk = 1' will yield a 'NULL' plan):
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_mpi_plan_r2r_2d(ptrdiff_t n0, ptrdiff_t n1,
							 | 
						|||
| 
								 | 
							
								                                    double *in, double *out,
							 | 
						|||
| 
								 | 
							
								                                    MPI_Comm comm,
							 | 
						|||
| 
								 | 
							
								                                    fftw_r2r_kind kind0, fftw_r2r_kind kind1,
							 | 
						|||
| 
								 | 
							
								                                    unsigned flags);
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_mpi_plan_r2r_3d(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
							 | 
						|||
| 
								 | 
							
								                                    double *in, double *out,
							 | 
						|||
| 
								 | 
							
								                                    MPI_Comm comm,
							 | 
						|||
| 
								 | 
							
								                                    fftw_r2r_kind kind0, fftw_r2r_kind kind1, fftw_r2r_kind kind2,
							 | 
						|||
| 
								 | 
							
								                                    unsigned flags);
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_mpi_plan_r2r(int rnk, const ptrdiff_t *n,
							 | 
						|||
| 
								 | 
							
								                                 double *in, double *out,
							 | 
						|||
| 
								 | 
							
								                                 MPI_Comm comm, const fftw_r2r_kind *kind,
							 | 
						|||
| 
								 | 
							
								                                 unsigned flags);
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_mpi_plan_many_r2r(int rnk, const ptrdiff_t *n,
							 | 
						|||
| 
								 | 
							
								                                      ptrdiff_t iblock, ptrdiff_t oblock,
							 | 
						|||
| 
								 | 
							
								                                      double *in, double *out,
							 | 
						|||
| 
								 | 
							
								                                      MPI_Comm comm, const fftw_r2r_kind *kind,
							 | 
						|||
| 
								 | 
							
								                                      unsigned flags);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The parameters are much the same as for the complex DFTs above,
							 | 
						|||
| 
								 | 
							
								except that the arrays are of real numbers (and hence the outputs of the
							 | 
						|||
| 
								 | 
							
								'local_size' data-distribution functions should be interpreted as counts
							 | 
						|||
| 
								 | 
							
								of real rather than complex numbers).  Also, the 'kind' parameters
							 | 
						|||
| 
								 | 
							
								specify the r2r kinds along each dimension as for the serial interface
							 | 
						|||
| 
								 | 
							
								(*note Real-to-Real Transform Kinds::).  *Note Other Multi-dimensional
							 | 
						|||
| 
								 | 
							
								Real-data MPI Transforms::.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								MPI transposition
							 | 
						|||
| 
								 | 
							
								.................
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								FFTW also provides routines to plan a transpose of a distributed 'n0' by
							 | 
						|||
| 
								 | 
							
								'n1' array of real numbers, or an array of 'howmany'-tuples of real
							 | 
						|||
| 
								 | 
							
								numbers with specified block sizes (*note FFTW MPI Transposes::):
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_mpi_plan_transpose(ptrdiff_t n0, ptrdiff_t n1,
							 | 
						|||
| 
								 | 
							
								                                       double *in, double *out,
							 | 
						|||
| 
								 | 
							
								                                       MPI_Comm comm, unsigned flags);
							 | 
						|||
| 
								 | 
							
								     fftw_plan fftw_mpi_plan_many_transpose
							 | 
						|||
| 
								 | 
							
								                     (ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t howmany,
							 | 
						|||
| 
								 | 
							
								                      ptrdiff_t block0, ptrdiff_t block1,
							 | 
						|||
| 
								 | 
							
								                      double *in, double *out, MPI_Comm comm, unsigned flags);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   These plans are used with the 'fftw_mpi_execute_r2r' new-array
							 | 
						|||
| 
								 | 
							
								execute function (*note Using MPI Plans::), since they count as (rank
							 | 
						|||
| 
								 | 
							
								zero) r2r plans from FFTW's perspective.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: MPI Wisdom Communication,  Prev: MPI Plan Creation,  Up: FFTW MPI Reference
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								6.12.6 MPI Wisdom Communication
							 | 
						|||
| 
								 | 
							
								-------------------------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								To facilitate synchronizing wisdom among the different MPI processes, we
							 | 
						|||
| 
								 | 
							
								provide two functions:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     void fftw_mpi_gather_wisdom(MPI_Comm comm);
							 | 
						|||
| 
								 | 
							
								     void fftw_mpi_broadcast_wisdom(MPI_Comm comm);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The 'fftw_mpi_gather_wisdom' function gathers all wisdom in the given
							 | 
						|||
| 
								 | 
							
								communicator 'comm' to the process of rank 0 in the communicator: that
							 | 
						|||
| 
								 | 
							
								process obtains the union of all wisdom on all the processes.  As a side
							 | 
						|||
| 
								 | 
							
								effect, some other processes will gain additional wisdom from other
							 | 
						|||
| 
								 | 
							
								processes, but only process 0 will gain the complete union.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The 'fftw_mpi_broadcast_wisdom' does the reverse: it exports wisdom
							 | 
						|||
| 
								 | 
							
								from process 0 in 'comm' to all other processes in the communicator,
							 | 
						|||
| 
								 | 
							
								replacing any wisdom they currently have.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   *Note FFTW MPI Wisdom::.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: FFTW MPI Fortran Interface,  Prev: FFTW MPI Reference,  Up: Distributed-memory FFTW with MPI
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								6.13 FFTW MPI Fortran Interface
							 | 
						|||
| 
								 | 
							
								===============================
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								The FFTW MPI interface is callable from modern Fortran compilers
							 | 
						|||
| 
								 | 
							
								supporting the Fortran 2003 'iso_c_binding' standard for calling C
							 | 
						|||
| 
								 | 
							
								functions.  As described in *note Calling FFTW from Modern Fortran::,
							 | 
						|||
| 
								 | 
							
								this means that you can directly call FFTW's C interface from Fortran
							 | 
						|||
| 
								 | 
							
								with only minor changes in syntax.  There are, however, a few things
							 | 
						|||
| 
								 | 
							
								specific to the MPI interface to keep in mind:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * Instead of including 'fftw3.f03' as in *note Overview of Fortran
							 | 
						|||
| 
								 | 
							
								     interface::, you should 'include 'fftw3-mpi.f03'' (after 'use,
							 | 
						|||
| 
								 | 
							
								     intrinsic :: iso_c_binding' as before).  The 'fftw3-mpi.f03' file
							 | 
						|||
| 
								 | 
							
								     includes 'fftw3.f03', so you should _not_ 'include' them both
							 | 
						|||
| 
								 | 
							
								     yourself.  (You will also want to include the MPI header file,
							 | 
						|||
| 
								 | 
							
								     usually via 'include 'mpif.h'' or similar, although though this is
							 | 
						|||
| 
								 | 
							
								     not needed by 'fftw3-mpi.f03' per se.)  (To use the 'fftwl_' 'long
							 | 
						|||
| 
								 | 
							
								     double' extended-precision routines in supporting compilers, you
							 | 
						|||
| 
								 | 
							
								     should include 'fftw3f-mpi.f03' in _addition_ to 'fftw3-mpi.f03'.
							 | 
						|||
| 
								 | 
							
								     *Note Extended and quadruple precision in Fortran::.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * Because of the different storage conventions between C and Fortran,
							 | 
						|||
| 
								 | 
							
								     you reverse the order of your array dimensions when passing them to
							 | 
						|||
| 
								 | 
							
								     FFTW (*note Reversing array dimensions::).  This is merely a
							 | 
						|||
| 
								 | 
							
								     difference in notation and incurs no performance overhead.
							 | 
						|||
| 
								 | 
							
								     However, it means that, whereas in C the _first_ dimension is
							 | 
						|||
| 
								 | 
							
								     distributed, in Fortran the _last_ dimension of your array is
							 | 
						|||
| 
								 | 
							
								     distributed.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * In Fortran, communicators are stored as 'integer' types; there is
							 | 
						|||
| 
								 | 
							
								     no 'MPI_Comm' type, nor is there any way to access a C 'MPI_Comm'.
							 | 
						|||
| 
								 | 
							
								     Fortunately, this is taken care of for you by the FFTW Fortran
							 | 
						|||
| 
								 | 
							
								     interface: whenever the C interface expects an 'MPI_Comm' type, you
							 | 
						|||
| 
								 | 
							
								     should pass the Fortran communicator as an 'integer'.(1)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * Because you need to call the 'local_size' function to find out how
							 | 
						|||
| 
								 | 
							
								     much space to allocate, and this may be _larger_ than the local
							 | 
						|||
| 
								 | 
							
								     portion of the array (*note MPI Data Distribution::), you should
							 | 
						|||
| 
								 | 
							
								     _always_ allocate your arrays dynamically using FFTW's allocation
							 | 
						|||
| 
								 | 
							
								     routines as described in *note Allocating aligned memory in
							 | 
						|||
| 
								 | 
							
								     Fortran::.  (Coincidentally, this also provides the best
							 | 
						|||
| 
								 | 
							
								     performance by guaranteeding proper data alignment.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * Because all sizes in the MPI FFTW interface are declared as
							 | 
						|||
| 
								 | 
							
								     'ptrdiff_t' in C, you should use 'integer(C_INTPTR_T)' in Fortran
							 | 
						|||
| 
								 | 
							
								     (*note FFTW Fortran type reference::).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * In Fortran, because of the language semantics, we generally
							 | 
						|||
| 
								 | 
							
								     recommend using the new-array execute functions for all plans, even
							 | 
						|||
| 
								 | 
							
								     in the common case where you are executing the plan on the same
							 | 
						|||
| 
								 | 
							
								     arrays for which the plan was created (*note Plan execution in
							 | 
						|||
| 
								 | 
							
								     Fortran::).  However, note that in the MPI interface these
							 | 
						|||
| 
								 | 
							
								     functions are changed: 'fftw_execute_dft' becomes
							 | 
						|||
| 
								 | 
							
								     'fftw_mpi_execute_dft', etcetera.  *Note Using MPI Plans::.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   For example, here is a Fortran code snippet to perform a distributed
							 | 
						|||
| 
								 | 
							
								L x M complex DFT in-place.  (This assumes you have already initialized
							 | 
						|||
| 
								 | 
							
								MPI with 'MPI_init' and have also performed 'call fftw_mpi_init'.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								       use, intrinsic :: iso_c_binding
							 | 
						|||
| 
								 | 
							
								       include 'fftw3-mpi.f03'
							 | 
						|||
| 
								 | 
							
								       integer(C_INTPTR_T), parameter :: L = ...
							 | 
						|||
| 
								 | 
							
								       integer(C_INTPTR_T), parameter :: M = ...
							 | 
						|||
| 
								 | 
							
								       type(C_PTR) :: plan, cdata
							 | 
						|||
| 
								 | 
							
								       complex(C_DOUBLE_COMPLEX), pointer :: data(:,:)
							 | 
						|||
| 
								 | 
							
								       integer(C_INTPTR_T) :: i, j, alloc_local, local_M, local_j_offset
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     !   get local data size and allocate (note dimension reversal)
							 | 
						|||
| 
								 | 
							
								       alloc_local = fftw_mpi_local_size_2d(M, L, MPI_COMM_WORLD, &
							 | 
						|||
| 
								 | 
							
								                                            local_M, local_j_offset)
							 | 
						|||
| 
								 | 
							
								       cdata = fftw_alloc_complex(alloc_local)
							 | 
						|||
| 
								 | 
							
								       call c_f_pointer(cdata, data, [L,local_M])
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     !   create MPI plan for in-place forward DFT (note dimension reversal)
							 | 
						|||
| 
								 | 
							
								       plan = fftw_mpi_plan_dft_2d(M, L, data, data, MPI_COMM_WORLD, &
							 | 
						|||
| 
								 | 
							
								                                   FFTW_FORWARD, FFTW_MEASURE)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     ! initialize data to some function my_function(i,j)
							 | 
						|||
| 
								 | 
							
								       do j = 1, local_M
							 | 
						|||
| 
								 | 
							
								         do i = 1, L
							 | 
						|||
| 
								 | 
							
								           data(i, j) = my_function(i, j + local_j_offset)
							 | 
						|||
| 
								 | 
							
								         end do
							 | 
						|||
| 
								 | 
							
								       end do
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     ! compute transform (as many times as desired)
							 | 
						|||
| 
								 | 
							
								       call fftw_mpi_execute_dft(plan, data, data)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								       call fftw_destroy_plan(plan)
							 | 
						|||
| 
								 | 
							
								       call fftw_free(cdata)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Note that when we called 'fftw_mpi_local_size_2d' and
							 | 
						|||
| 
								 | 
							
								'fftw_mpi_plan_dft_2d' with the dimensions in reversed order, since a L
							 | 
						|||
| 
								 | 
							
								x M Fortran array is viewed by FFTW in C as a M x L array.  This means
							 | 
						|||
| 
								 | 
							
								that the array was distributed over the 'M' dimension, the local portion
							 | 
						|||
| 
								 | 
							
								of which is a L x local_M array in Fortran.  (You must _not_ use an
							 | 
						|||
| 
								 | 
							
								'allocate' statement to allocate an L x local_M array, however; you must
							 | 
						|||
| 
								 | 
							
								allocate 'alloc_local' complex numbers, which may be greater than 'L *
							 | 
						|||
| 
								 | 
							
								local_M', in order to reserve space for intermediate steps of the
							 | 
						|||
| 
								 | 
							
								transform.)  Finally, we mention that because C's array indices are
							 | 
						|||
| 
								 | 
							
								zero-based, the 'local_j_offset' argument can conveniently be
							 | 
						|||
| 
								 | 
							
								interpreted as an offset in the 1-based 'j' index (rather than as a
							 | 
						|||
| 
								 | 
							
								starting index as in C).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   If instead you had used the 'ior(FFTW_MEASURE,
							 | 
						|||
| 
								 | 
							
								FFTW_MPI_TRANSPOSED_OUT)' flag, the output of the transform would be a
							 | 
						|||
| 
								 | 
							
								transposed M x local_L array, associated with the _same_ 'cdata'
							 | 
						|||
| 
								 | 
							
								allocation (since the transform is in-place), and which you could
							 | 
						|||
| 
								 | 
							
								declare with:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								       complex(C_DOUBLE_COMPLEX), pointer :: tdata(:,:)
							 | 
						|||
| 
								 | 
							
								       ...
							 | 
						|||
| 
								 | 
							
								       call c_f_pointer(cdata, tdata, [M,local_L])
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   where 'local_L' would have been obtained by changing the
							 | 
						|||
| 
								 | 
							
								'fftw_mpi_local_size_2d' call to:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								       alloc_local = fftw_mpi_local_size_2d_transposed(M, L, MPI_COMM_WORLD, &
							 | 
						|||
| 
								 | 
							
								                                local_M, local_j_offset, local_L, local_i_offset)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   ---------- Footnotes ----------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   (1) Technically, this is because you aren't actually calling the C
							 | 
						|||
| 
								 | 
							
								functions directly.  You are calling wrapper functions that translate
							 | 
						|||
| 
								 | 
							
								the communicator with 'MPI_Comm_f2c' before calling the ordinary C
							 | 
						|||
| 
								 | 
							
								interface.  This is all done transparently, however, since the
							 | 
						|||
| 
								 | 
							
								'fftw3-mpi.f03' interface file renames the wrappers so that they are
							 | 
						|||
| 
								 | 
							
								called in Fortran with the same names as the C interface functions.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Calling FFTW from Modern Fortran,  Next: Calling FFTW from Legacy Fortran,  Prev: Distributed-memory FFTW with MPI,  Up: Top
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								7 Calling FFTW from Modern Fortran
							 | 
						|||
| 
								 | 
							
								**********************************
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								Fortran 2003 standardized ways for Fortran code to call C libraries, and
							 | 
						|||
| 
								 | 
							
								this allows us to support a direct translation of the FFTW C API into
							 | 
						|||
| 
								 | 
							
								Fortran.  Compared to the legacy Fortran 77 interface (*note Calling
							 | 
						|||
| 
								 | 
							
								FFTW from Legacy Fortran::), this direct interface offers many
							 | 
						|||
| 
								 | 
							
								advantages, especially compile-time type-checking and aligned memory
							 | 
						|||
| 
								 | 
							
								allocation.  As of this writing, support for these C interoperability
							 | 
						|||
| 
								 | 
							
								features seems widespread, having been implemented in nearly all major
							 | 
						|||
| 
								 | 
							
								Fortran compilers (e.g.  GNU, Intel, IBM, Oracle/Solaris, Portland
							 | 
						|||
| 
								 | 
							
								Group, NAG).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   This chapter documents that interface.  For the most part, since this
							 | 
						|||
| 
								 | 
							
								interface allows Fortran to call the C interface directly, the usage is
							 | 
						|||
| 
								 | 
							
								identical to C translated to Fortran syntax.  However, there are a few
							 | 
						|||
| 
								 | 
							
								subtle points such as memory allocation, wisdom, and data types that
							 | 
						|||
| 
								 | 
							
								deserve closer attention.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								* Menu:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								* Overview of Fortran interface::
							 | 
						|||
| 
								 | 
							
								* Reversing array dimensions::
							 | 
						|||
| 
								 | 
							
								* FFTW Fortran type reference::
							 | 
						|||
| 
								 | 
							
								* Plan execution in Fortran::
							 | 
						|||
| 
								 | 
							
								* Allocating aligned memory in Fortran::
							 | 
						|||
| 
								 | 
							
								* Accessing the wisdom API from Fortran::
							 | 
						|||
| 
								 | 
							
								* Defining an FFTW module::
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Overview of Fortran interface,  Next: Reversing array dimensions,  Prev: Calling FFTW from Modern Fortran,  Up: Calling FFTW from Modern Fortran
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								7.1 Overview of Fortran interface
							 | 
						|||
| 
								 | 
							
								=================================
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								FFTW provides a file 'fftw3.f03' that defines Fortran 2003 interfaces
							 | 
						|||
| 
								 | 
							
								for all of its C routines, except for the MPI routines described
							 | 
						|||
| 
								 | 
							
								elsewhere, which can be found in the same directory as 'fftw3.h' (the C
							 | 
						|||
| 
								 | 
							
								header file).  In any Fortran subroutine where you want to use FFTW
							 | 
						|||
| 
								 | 
							
								functions, you should begin with:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								       use, intrinsic :: iso_c_binding
							 | 
						|||
| 
								 | 
							
								       include 'fftw3.f03'
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   This includes the interface definitions and the standard
							 | 
						|||
| 
								 | 
							
								'iso_c_binding' module (which defines the equivalents of C types).  You
							 | 
						|||
| 
								 | 
							
								can also put the FFTW functions into a module if you prefer (*note
							 | 
						|||
| 
								 | 
							
								Defining an FFTW module::).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   At this point, you can now call anything in the FFTW C interface
							 | 
						|||
| 
								 | 
							
								directly, almost exactly as in C other than minor changes in syntax.
							 | 
						|||
| 
								 | 
							
								For example:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								       type(C_PTR) :: plan
							 | 
						|||
| 
								 | 
							
								       complex(C_DOUBLE_COMPLEX), dimension(1024,1000) :: in, out
							 | 
						|||
| 
								 | 
							
								       plan = fftw_plan_dft_2d(1000,1024, in,out, FFTW_FORWARD,FFTW_ESTIMATE)
							 | 
						|||
| 
								 | 
							
								       ...
							 | 
						|||
| 
								 | 
							
								       call fftw_execute_dft(plan, in, out)
							 | 
						|||
| 
								 | 
							
								       ...
							 | 
						|||
| 
								 | 
							
								       call fftw_destroy_plan(plan)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   A few important things to keep in mind are:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * FFTW plans are 'type(C_PTR)'.  Other C types are mapped in the
							 | 
						|||
| 
								 | 
							
								     obvious way via the 'iso_c_binding' standard: 'int' turns into
							 | 
						|||
| 
								 | 
							
								     'integer(C_INT)', 'fftw_complex' turns into
							 | 
						|||
| 
								 | 
							
								     'complex(C_DOUBLE_COMPLEX)', 'double' turns into 'real(C_DOUBLE)',
							 | 
						|||
| 
								 | 
							
								     and so on.  *Note FFTW Fortran type reference::.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * Functions in C become functions in Fortran if they have a return
							 | 
						|||
| 
								 | 
							
								     value, and subroutines in Fortran otherwise.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * The ordering of the Fortran array dimensions must be _reversed_
							 | 
						|||
| 
								 | 
							
								     when they are passed to the FFTW plan creation, thanks to
							 | 
						|||
| 
								 | 
							
								     differences in array indexing conventions (*note Multi-dimensional
							 | 
						|||
| 
								 | 
							
								     Array Format::).  This is _unlike_ the legacy Fortran interface
							 | 
						|||
| 
								 | 
							
								     (*note Fortran-interface routines::), which reversed the dimensions
							 | 
						|||
| 
								 | 
							
								     for you.  *Note Reversing array dimensions::.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * Using ordinary Fortran array declarations like this works, but may
							 | 
						|||
| 
								 | 
							
								     yield suboptimal performance because the data may not be not
							 | 
						|||
| 
								 | 
							
								     aligned to exploit SIMD instructions on modern proessors (*note
							 | 
						|||
| 
								 | 
							
								     SIMD alignment and fftw_malloc::).  Better performance will often
							 | 
						|||
| 
								 | 
							
								     be obtained by allocating with 'fftw_alloc'.  *Note Allocating
							 | 
						|||
| 
								 | 
							
								     aligned memory in Fortran::.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * Similar to the legacy Fortran interface (*note FFTW Execution in
							 | 
						|||
| 
								 | 
							
								     Fortran::), we currently recommend _not_ using 'fftw_execute' but
							 | 
						|||
| 
								 | 
							
								     rather using the more specialized functions like 'fftw_execute_dft'
							 | 
						|||
| 
								 | 
							
								     (*note New-array Execute Functions::).  However, you should execute
							 | 
						|||
| 
								 | 
							
								     the plan on the 'same arrays' as the ones for which you created the
							 | 
						|||
| 
								 | 
							
								     plan, unless you are especially careful.  *Note Plan execution in
							 | 
						|||
| 
								 | 
							
								     Fortran::.  To prevent you from using 'fftw_execute' by mistake,
							 | 
						|||
| 
								 | 
							
								     the 'fftw3.f03' file does not provide an 'fftw_execute' interface
							 | 
						|||
| 
								 | 
							
								     declaration.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * Multiple planner flags are combined with 'ior' (equivalent to '|'
							 | 
						|||
| 
								 | 
							
								     in C). e.g.  'FFTW_MEASURE | FFTW_DESTROY_INPUT' becomes
							 | 
						|||
| 
								 | 
							
								     'ior(FFTW_MEASURE, FFTW_DESTROY_INPUT)'.  (You can also use '+' as
							 | 
						|||
| 
								 | 
							
								     long as you don't try to include a given flag more than once.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								* Menu:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								* Extended and quadruple precision in Fortran::
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Extended and quadruple precision in Fortran,  Prev: Overview of Fortran interface,  Up: Overview of Fortran interface
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								7.1.1 Extended and quadruple precision in Fortran
							 | 
						|||
| 
								 | 
							
								-------------------------------------------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								If FFTW is compiled in 'long double' (extended) precision (*note
							 | 
						|||
| 
								 | 
							
								Installation and Customization::), you may be able to call the resulting
							 | 
						|||
| 
								 | 
							
								'fftwl_' routines (*note Precision::) from Fortran if your compiler
							 | 
						|||
| 
								 | 
							
								supports the 'C_LONG_DOUBLE_COMPLEX' type code.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Because some Fortran compilers do not support
							 | 
						|||
| 
								 | 
							
								'C_LONG_DOUBLE_COMPLEX', the 'fftwl_' declarations are segregated into a
							 | 
						|||
| 
								 | 
							
								separate interface file 'fftw3l.f03', which you should include _in
							 | 
						|||
| 
								 | 
							
								addition_ to 'fftw3.f03' (which declares precision-independent 'FFTW_'
							 | 
						|||
| 
								 | 
							
								constants):
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								       use, intrinsic :: iso_c_binding
							 | 
						|||
| 
								 | 
							
								       include 'fftw3.f03'
							 | 
						|||
| 
								 | 
							
								       include 'fftw3l.f03'
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   We also support using the nonstandard '__float128'
							 | 
						|||
| 
								 | 
							
								quadruple-precision type provided by recent versions of 'gcc' on 32- and
							 | 
						|||
| 
								 | 
							
								64-bit x86 hardware (*note Installation and Customization::), using the
							 | 
						|||
| 
								 | 
							
								corresponding 'real(16)' and 'complex(16)' types supported by
							 | 
						|||
| 
								 | 
							
								'gfortran'.  The quadruple-precision 'fftwq_' functions (*note
							 | 
						|||
| 
								 | 
							
								Precision::) are declared in a 'fftw3q.f03' interface file, which should
							 | 
						|||
| 
								 | 
							
								be included in addition to 'fftw3.f03', as above.  You should also link
							 | 
						|||
| 
								 | 
							
								with '-lfftw3q -lquadmath -lm' as in C.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Reversing array dimensions,  Next: FFTW Fortran type reference,  Prev: Overview of Fortran interface,  Up: Calling FFTW from Modern Fortran
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								7.2 Reversing array dimensions
							 | 
						|||
| 
								 | 
							
								==============================
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								A minor annoyance in calling FFTW from Fortran is that FFTW's array
							 | 
						|||
| 
								 | 
							
								dimensions are defined in the C convention (row-major order), while
							 | 
						|||
| 
								 | 
							
								Fortran's array dimensions are the opposite convention (column-major
							 | 
						|||
| 
								 | 
							
								order).  *Note Multi-dimensional Array Format::.  This is just a
							 | 
						|||
| 
								 | 
							
								bookkeeping difference, with no effect on performance.  The only
							 | 
						|||
| 
								 | 
							
								consequence of this is that, whenever you create an FFTW plan for a
							 | 
						|||
| 
								 | 
							
								multi-dimensional transform, you must always _reverse the ordering of
							 | 
						|||
| 
								 | 
							
								the dimensions_.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   For example, consider the three-dimensional (L x M x N ) arrays:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								       complex(C_DOUBLE_COMPLEX), dimension(L,M,N) :: in, out
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   To plan a DFT for these arrays using 'fftw_plan_dft_3d', you could
							 | 
						|||
| 
								 | 
							
								do:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								       plan = fftw_plan_dft_3d(N,M,L, in,out, FFTW_FORWARD,FFTW_ESTIMATE)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   That is, from FFTW's perspective this is a N x M x L array.  _No data
							 | 
						|||
| 
								 | 
							
								transposition need occur_, as this is _only notation_.  Similarly, to
							 | 
						|||
| 
								 | 
							
								use the more generic routine 'fftw_plan_dft' with the same arrays, you
							 | 
						|||
| 
								 | 
							
								could do:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								       integer(C_INT), dimension(3) :: n = [N,M,L]
							 | 
						|||
| 
								 | 
							
								       plan = fftw_plan_dft_3d(3, n, in,out, FFTW_FORWARD,FFTW_ESTIMATE)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Note, by the way, that this is different from the legacy Fortran
							 | 
						|||
| 
								 | 
							
								interface (*note Fortran-interface routines::), which automatically
							 | 
						|||
| 
								 | 
							
								reverses the order of the array dimension for you.  Here, you are
							 | 
						|||
| 
								 | 
							
								calling the C interface directly, so there is no "translation" layer.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   An important thing to keep in mind is the implication of this for
							 | 
						|||
| 
								 | 
							
								multidimensional real-to-complex transforms (*note Multi-Dimensional
							 | 
						|||
| 
								 | 
							
								DFTs of Real Data::).  In C, a multidimensional real-to-complex DFT
							 | 
						|||
| 
								 | 
							
								chops the last dimension roughly in half (N x M x L real input goes to N
							 | 
						|||
| 
								 | 
							
								x M x L/2+1 complex output).  In Fortran, because the array dimension
							 | 
						|||
| 
								 | 
							
								notation is reversed, the _first_ dimension of the complex data is
							 | 
						|||
| 
								 | 
							
								chopped roughly in half.  For example consider the 'r2c' transform of L
							 | 
						|||
| 
								 | 
							
								x M x N real input in Fortran:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								       type(C_PTR) :: plan
							 | 
						|||
| 
								 | 
							
								       real(C_DOUBLE), dimension(L,M,N) :: in
							 | 
						|||
| 
								 | 
							
								       complex(C_DOUBLE_COMPLEX), dimension(L/2+1,M,N) :: out
							 | 
						|||
| 
								 | 
							
								       plan = fftw_plan_dft_r2c_3d(N,M,L, in,out, FFTW_ESTIMATE)
							 | 
						|||
| 
								 | 
							
								       ...
							 | 
						|||
| 
								 | 
							
								       call fftw_execute_dft_r2c(plan, in, out)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Alternatively, for an in-place r2c transform, as described in the C
							 | 
						|||
| 
								 | 
							
								documentation we must _pad_ the _first_ dimension of the real input with
							 | 
						|||
| 
								 | 
							
								an extra two entries (which are ignored by FFTW) so as to leave enough
							 | 
						|||
| 
								 | 
							
								space for the complex output.  The input is _allocated_ as a 2[L/2+1] x
							 | 
						|||
| 
								 | 
							
								M x N array, even though only L x M x N of it is actually used.  In this
							 | 
						|||
| 
								 | 
							
								example, we will allocate the array as a pointer type, using
							 | 
						|||
| 
								 | 
							
								'fftw_alloc' to ensure aligned memory for maximum performance (*note
							 | 
						|||
| 
								 | 
							
								Allocating aligned memory in Fortran::); this also makes it easy to
							 | 
						|||
| 
								 | 
							
								reference the same memory as both a real array and a complex array.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								       real(C_DOUBLE), pointer :: in(:,:,:)
							 | 
						|||
| 
								 | 
							
								       complex(C_DOUBLE_COMPLEX), pointer :: out(:,:,:)
							 | 
						|||
| 
								 | 
							
								       type(C_PTR) :: plan, data
							 | 
						|||
| 
								 | 
							
								       data = fftw_alloc_complex(int((L/2+1) * M * N, C_SIZE_T))
							 | 
						|||
| 
								 | 
							
								       call c_f_pointer(data, in, [2*(L/2+1),M,N])
							 | 
						|||
| 
								 | 
							
								       call c_f_pointer(data, out, [L/2+1,M,N])
							 | 
						|||
| 
								 | 
							
								       plan = fftw_plan_dft_r2c_3d(N,M,L, in,out, FFTW_ESTIMATE)
							 | 
						|||
| 
								 | 
							
								       ...
							 | 
						|||
| 
								 | 
							
								       call fftw_execute_dft_r2c(plan, in, out)
							 | 
						|||
| 
								 | 
							
								       ...
							 | 
						|||
| 
								 | 
							
								       call fftw_destroy_plan(plan)
							 | 
						|||
| 
								 | 
							
								       call fftw_free(data)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: FFTW Fortran type reference,  Next: Plan execution in Fortran,  Prev: Reversing array dimensions,  Up: Calling FFTW from Modern Fortran
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								7.3 FFTW Fortran type reference
							 | 
						|||
| 
								 | 
							
								===============================
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								The following are the most important type correspondences between the C
							 | 
						|||
| 
								 | 
							
								interface and Fortran:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * Plans ('fftw_plan' and variants) are 'type(C_PTR)' (i.e.  an opaque
							 | 
						|||
| 
								 | 
							
								     pointer).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * The C floating-point types 'double', 'float', and 'long double'
							 | 
						|||
| 
								 | 
							
								     correspond to 'real(C_DOUBLE)', 'real(C_FLOAT)', and
							 | 
						|||
| 
								 | 
							
								     'real(C_LONG_DOUBLE)', respectively.  The C complex types
							 | 
						|||
| 
								 | 
							
								     'fftw_complex', 'fftwf_complex', and 'fftwl_complex' correspond in
							 | 
						|||
| 
								 | 
							
								     Fortran to 'complex(C_DOUBLE_COMPLEX)', 'complex(C_FLOAT_COMPLEX)',
							 | 
						|||
| 
								 | 
							
								     and 'complex(C_LONG_DOUBLE_COMPLEX)', respectively.  Just as in C
							 | 
						|||
| 
								 | 
							
								     (*note Precision::), the FFTW subroutines and types are prefixed
							 | 
						|||
| 
								 | 
							
								     with 'fftw_', 'fftwf_', and 'fftwl_' for the different precisions,
							 | 
						|||
| 
								 | 
							
								     and link to different libraries ('-lfftw3', '-lfftw3f', and
							 | 
						|||
| 
								 | 
							
								     '-lfftw3l' on Unix), but use the _same_ include file 'fftw3.f03'
							 | 
						|||
| 
								 | 
							
								     and the _same_ constants (all of which begin with 'FFTW_').  The
							 | 
						|||
| 
								 | 
							
								     exception is 'long double' precision, for which you should _also_
							 | 
						|||
| 
								 | 
							
								     include 'fftw3l.f03' (*note Extended and quadruple precision in
							 | 
						|||
| 
								 | 
							
								     Fortran::).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * The C integer types 'int' and 'unsigned' (used for planner flags)
							 | 
						|||
| 
								 | 
							
								     become 'integer(C_INT)'.  The C integer type 'ptrdiff_t' (e.g.  in
							 | 
						|||
| 
								 | 
							
								     the *note 64-bit Guru Interface::) becomes 'integer(C_INTPTR_T)',
							 | 
						|||
| 
								 | 
							
								     and 'size_t' (in 'fftw_malloc' etc.)  becomes 'integer(C_SIZE_T)'.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * The 'fftw_r2r_kind' type (*note Real-to-Real Transform Kinds::)
							 | 
						|||
| 
								 | 
							
								     becomes 'integer(C_FFTW_R2R_KIND)'.  The various constant values of
							 | 
						|||
| 
								 | 
							
								     the C enumerated type ('FFTW_R2HC' etc.)  become simply integer
							 | 
						|||
| 
								 | 
							
								     constants of the same names in Fortran.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * Numeric array pointer arguments (e.g.  'double *') become
							 | 
						|||
| 
								 | 
							
								     'dimension(*), intent(out)' arrays of the same type, or
							 | 
						|||
| 
								 | 
							
								     'dimension(*), intent(in)' if they are pointers to constant data
							 | 
						|||
| 
								 | 
							
								     (e.g.  'const int *').  There are a few exceptions where numeric
							 | 
						|||
| 
								 | 
							
								     pointers refer to scalar outputs (e.g.  for 'fftw_flops'), in which
							 | 
						|||
| 
								 | 
							
								     case they are 'intent(out)' scalar arguments in Fortran too.  For
							 | 
						|||
| 
								 | 
							
								     the new-array execute functions (*note New-array Execute
							 | 
						|||
| 
								 | 
							
								     Functions::), the input arrays are declared 'dimension(*),
							 | 
						|||
| 
								 | 
							
								     intent(inout)', since they can be modified in the case of in-place
							 | 
						|||
| 
								 | 
							
								     or 'FFTW_DESTROY_INPUT' transforms.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * Pointer _return_ values (e.g 'double *') become 'type(C_PTR)'.  (If
							 | 
						|||
| 
								 | 
							
								     they are pointers to arrays, as for 'fftw_alloc_real', you can
							 | 
						|||
| 
								 | 
							
								     convert them back to Fortran array pointers with the standard
							 | 
						|||
| 
								 | 
							
								     intrinsic function 'c_f_pointer'.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * The 'fftw_iodim' type in the guru interface (*note Guru vector and
							 | 
						|||
| 
								 | 
							
								     transform sizes::) becomes 'type(fftw_iodim)' in Fortran, a derived
							 | 
						|||
| 
								 | 
							
								     data type (the Fortran analogue of C's 'struct') with three
							 | 
						|||
| 
								 | 
							
								     'integer(C_INT)' components: 'n', 'is', and 'os', with the same
							 | 
						|||
| 
								 | 
							
								     meanings as in C. The 'fftw_iodim64' type in the 64-bit guru
							 | 
						|||
| 
								 | 
							
								     interface (*note 64-bit Guru Interface::) is the same, except that
							 | 
						|||
| 
								 | 
							
								     its components are of type 'integer(C_INTPTR_T)'.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * Using the wisdom import/export functions from Fortran is a bit
							 | 
						|||
| 
								 | 
							
								     tricky, and is discussed in *note Accessing the wisdom API from
							 | 
						|||
| 
								 | 
							
								     Fortran::.  In brief, the 'FILE *' arguments map to 'type(C_PTR)',
							 | 
						|||
| 
								 | 
							
								     'const char *' to 'character(C_CHAR), dimension(*), intent(in)'
							 | 
						|||
| 
								 | 
							
								     (null-terminated!), and the generic read-char/write-char functions
							 | 
						|||
| 
								 | 
							
								     map to 'type(C_FUNPTR)'.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   You may be wondering if you need to search-and-replace
							 | 
						|||
| 
								 | 
							
								'real(kind(0.0d0))' (or whatever your favorite Fortran spelling of
							 | 
						|||
| 
								 | 
							
								"double precision" is) with 'real(C_DOUBLE)' everywhere in your program,
							 | 
						|||
| 
								 | 
							
								and similarly for 'complex' and 'integer' types.  The answer is no; you
							 | 
						|||
| 
								 | 
							
								can still use your existing types.  As long as these types match their C
							 | 
						|||
| 
								 | 
							
								counterparts, things should work without a hitch.  The worst that can
							 | 
						|||
| 
								 | 
							
								happen, e.g.  in the (unlikely) event of a system where
							 | 
						|||
| 
								 | 
							
								'real(kind(0.0d0))' is different from 'real(C_DOUBLE)', is that the
							 | 
						|||
| 
								 | 
							
								compiler will give you a type-mismatch error.  That is, if you don't use
							 | 
						|||
| 
								 | 
							
								the 'iso_c_binding' kinds you need to accept at least the theoretical
							 | 
						|||
| 
								 | 
							
								possibility of having to change your code in response to compiler errors
							 | 
						|||
| 
								 | 
							
								on some future machine, but you don't need to worry about silently
							 | 
						|||
| 
								 | 
							
								compiling incorrect code that yields runtime errors.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Plan execution in Fortran,  Next: Allocating aligned memory in Fortran,  Prev: FFTW Fortran type reference,  Up: Calling FFTW from Modern Fortran
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								7.4 Plan execution in Fortran
							 | 
						|||
| 
								 | 
							
								=============================
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								In C, in order to use a plan, one normally calls 'fftw_execute', which
							 | 
						|||
| 
								 | 
							
								executes the plan to perform the transform on the input/output arrays
							 | 
						|||
| 
								 | 
							
								passed when the plan was created (*note Using Plans::).  The
							 | 
						|||
| 
								 | 
							
								corresponding subroutine call in modern Fortran is:
							 | 
						|||
| 
								 | 
							
								      call fftw_execute(plan)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   However, we have had reports that this causes problems with some
							 | 
						|||
| 
								 | 
							
								recent optimizing Fortran compilers.  The problem is, because the
							 | 
						|||
| 
								 | 
							
								input/output arrays are not passed as explicit arguments to
							 | 
						|||
| 
								 | 
							
								'fftw_execute', the semantics of Fortran (unlike C) allow the compiler
							 | 
						|||
| 
								 | 
							
								to assume that the input/output arrays are not changed by
							 | 
						|||
| 
								 | 
							
								'fftw_execute'.  As a consequence, certain compilers end up
							 | 
						|||
| 
								 | 
							
								repositioning the call to 'fftw_execute', assuming incorrectly that it
							 | 
						|||
| 
								 | 
							
								does nothing to the arrays.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   There are various workarounds to this, but the safest and simplest
							 | 
						|||
| 
								 | 
							
								thing is to not use 'fftw_execute' in Fortran.  Instead, use the
							 | 
						|||
| 
								 | 
							
								functions described in *note New-array Execute Functions::, which take
							 | 
						|||
| 
								 | 
							
								the input/output arrays as explicit arguments.  For example, if the plan
							 | 
						|||
| 
								 | 
							
								is for a complex-data DFT and was created for the arrays 'in' and 'out',
							 | 
						|||
| 
								 | 
							
								you would do:
							 | 
						|||
| 
								 | 
							
								      call fftw_execute_dft(plan, in, out)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   There are a few things to be careful of, however:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * You must use the correct type of execute function, matching the way
							 | 
						|||
| 
								 | 
							
								     the plan was created.  Complex DFT plans should use
							 | 
						|||
| 
								 | 
							
								     'fftw_execute_dft', Real-input (r2c) DFT plans should use use
							 | 
						|||
| 
								 | 
							
								     'fftw_execute_dft_r2c', and real-output (c2r) DFT plans should use
							 | 
						|||
| 
								 | 
							
								     'fftw_execute_dft_c2r'.  The various r2r plans should use
							 | 
						|||
| 
								 | 
							
								     'fftw_execute_r2r'.  Fortunately, if you use the wrong one you will
							 | 
						|||
| 
								 | 
							
								     get a compile-time type-mismatch error (unlike legacy Fortran).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * You should normally pass the same input/output arrays that were
							 | 
						|||
| 
								 | 
							
								     used when creating the plan.  This is always safe.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * _If_ you pass _different_ input/output arrays compared to those
							 | 
						|||
| 
								 | 
							
								     used when creating the plan, you must abide by all the restrictions
							 | 
						|||
| 
								 | 
							
								     of the new-array execute functions (*note New-array Execute
							 | 
						|||
| 
								 | 
							
								     Functions::).  The most tricky of these is the requirement that the
							 | 
						|||
| 
								 | 
							
								     new arrays have the same alignment as the original arrays; the best
							 | 
						|||
| 
								 | 
							
								     (and possibly only) way to guarantee this is to use the
							 | 
						|||
| 
								 | 
							
								     'fftw_alloc' functions to allocate your arrays (*note Allocating
							 | 
						|||
| 
								 | 
							
								     aligned memory in Fortran::).  Alternatively, you can use the
							 | 
						|||
| 
								 | 
							
								     'FFTW_UNALIGNED' flag when creating the plan, in which case the
							 | 
						|||
| 
								 | 
							
								     plan does not depend on the alignment, but this may sacrifice
							 | 
						|||
| 
								 | 
							
								     substantial performance on architectures (like x86) with SIMD
							 | 
						|||
| 
								 | 
							
								     instructions (*note SIMD alignment and fftw_malloc::).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Allocating aligned memory in Fortran,  Next: Accessing the wisdom API from Fortran,  Prev: Plan execution in Fortran,  Up: Calling FFTW from Modern Fortran
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								7.5 Allocating aligned memory in Fortran
							 | 
						|||
| 
								 | 
							
								========================================
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								In order to obtain maximum performance in FFTW, you should store your
							 | 
						|||
| 
								 | 
							
								data in arrays that have been specially aligned in memory (*note SIMD
							 | 
						|||
| 
								 | 
							
								alignment and fftw_malloc::).  Enforcing alignment also permits you to
							 | 
						|||
| 
								 | 
							
								safely use the new-array execute functions (*note New-array Execute
							 | 
						|||
| 
								 | 
							
								Functions::) to apply a given plan to more than one pair of in/out
							 | 
						|||
| 
								 | 
							
								arrays.  Unfortunately, standard Fortran arrays do _not_ provide any
							 | 
						|||
| 
								 | 
							
								alignment guarantees.  The _only_ way to allocate aligned memory in
							 | 
						|||
| 
								 | 
							
								standard Fortran is to allocate it with an external C function, like the
							 | 
						|||
| 
								 | 
							
								'fftw_alloc_real' and 'fftw_alloc_complex' functions.  Fortunately,
							 | 
						|||
| 
								 | 
							
								Fortran 2003 provides a simple way to associate such allocated memory
							 | 
						|||
| 
								 | 
							
								with a standard Fortran array pointer that you can then use normally.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   We therefore recommend allocating all your input/output arrays using
							 | 
						|||
| 
								 | 
							
								the following technique:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								  1. Declare a 'pointer', 'arr', to your array of the desired type and
							 | 
						|||
| 
								 | 
							
								     dimensions.  For example, 'real(C_DOUBLE), pointer :: a(:,:)' for a
							 | 
						|||
| 
								 | 
							
								     2d real array, or 'complex(C_DOUBLE_COMPLEX), pointer :: a(:,:,:)'
							 | 
						|||
| 
								 | 
							
								     for a 3d complex array.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								  2. The number of elements to allocate must be an 'integer(C_SIZE_T)'.
							 | 
						|||
| 
								 | 
							
								     You can either declare a variable of this type, e.g.
							 | 
						|||
| 
								 | 
							
								     'integer(C_SIZE_T) :: sz', to store the number of elements to
							 | 
						|||
| 
								 | 
							
								     allocate, or you can use the 'int(..., C_SIZE_T)' intrinsic
							 | 
						|||
| 
								 | 
							
								     function.  e.g.  set 'sz = L * M * N' or use 'int(L * M * N,
							 | 
						|||
| 
								 | 
							
								     C_SIZE_T)' for an L x M x N array.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								  3. Declare a 'type(C_PTR) :: p' to hold the return value from FFTW's
							 | 
						|||
| 
								 | 
							
								     allocation routine.  Set 'p = fftw_alloc_real(sz)' for a real
							 | 
						|||
| 
								 | 
							
								     array, or 'p = fftw_alloc_complex(sz)' for a complex array.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								  4. Associate your pointer 'arr' with the allocated memory 'p' using
							 | 
						|||
| 
								 | 
							
								     the standard 'c_f_pointer' subroutine: 'call c_f_pointer(p, arr,
							 | 
						|||
| 
								 | 
							
								     [...dimensions...])', where '[...dimensions...])' are an array of
							 | 
						|||
| 
								 | 
							
								     the dimensions of the array (in the usual Fortran order).  e.g.
							 | 
						|||
| 
								 | 
							
								     'call c_f_pointer(p, arr, [L,M,N])' for an L x M x N array.
							 | 
						|||
| 
								 | 
							
								     (Alternatively, you can omit the dimensions argument if you
							 | 
						|||
| 
								 | 
							
								     specified the shape explicitly when declaring 'arr'.)  You can now
							 | 
						|||
| 
								 | 
							
								     use 'arr' as a usual multidimensional array.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								  5. When you are done using the array, deallocate the memory by 'call
							 | 
						|||
| 
								 | 
							
								     fftw_free(p)' on 'p'.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   For example, here is how we would allocate an L x M 2d real array:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								       real(C_DOUBLE), pointer :: arr(:,:)
							 | 
						|||
| 
								 | 
							
								       type(C_PTR) :: p
							 | 
						|||
| 
								 | 
							
								       p = fftw_alloc_real(int(L * M, C_SIZE_T))
							 | 
						|||
| 
								 | 
							
								       call c_f_pointer(p, arr, [L,M])
							 | 
						|||
| 
								 | 
							
								       _...use arr and arr(i,j) as usual..._
							 | 
						|||
| 
								 | 
							
								       call fftw_free(p)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   and here is an L x M x N 3d complex array:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								       complex(C_DOUBLE_COMPLEX), pointer :: arr(:,:,:)
							 | 
						|||
| 
								 | 
							
								       type(C_PTR) :: p
							 | 
						|||
| 
								 | 
							
								       p = fftw_alloc_complex(int(L * M * N, C_SIZE_T))
							 | 
						|||
| 
								 | 
							
								       call c_f_pointer(p, arr, [L,M,N])
							 | 
						|||
| 
								 | 
							
								       _...use arr and arr(i,j,k) as usual..._
							 | 
						|||
| 
								 | 
							
								       call fftw_free(p)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   See *note Reversing array dimensions:: for an example allocating a
							 | 
						|||
| 
								 | 
							
								single array and associating both real and complex array pointers with
							 | 
						|||
| 
								 | 
							
								it, for in-place real-to-complex transforms.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Accessing the wisdom API from Fortran,  Next: Defining an FFTW module,  Prev: Allocating aligned memory in Fortran,  Up: Calling FFTW from Modern Fortran
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								7.6 Accessing the wisdom API from Fortran
							 | 
						|||
| 
								 | 
							
								=========================================
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								As explained in *note Words of Wisdom-Saving Plans::, FFTW provides a
							 | 
						|||
| 
								 | 
							
								"wisdom" API for saving plans to disk so that they can be recreated
							 | 
						|||
| 
								 | 
							
								quickly.  The C API for exporting (*note Wisdom Export::) and importing
							 | 
						|||
| 
								 | 
							
								(*note Wisdom Import::) wisdom is somewhat tricky to use from Fortran,
							 | 
						|||
| 
								 | 
							
								however, because of differences in file I/O and string types between C
							 | 
						|||
| 
								 | 
							
								and Fortran.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								* Menu:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								* Wisdom File Export/Import from Fortran::
							 | 
						|||
| 
								 | 
							
								* Wisdom String Export/Import from Fortran::
							 | 
						|||
| 
								 | 
							
								* Wisdom Generic Export/Import from Fortran::
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Wisdom File Export/Import from Fortran,  Next: Wisdom String Export/Import from Fortran,  Prev: Accessing the wisdom API from Fortran,  Up: Accessing the wisdom API from Fortran
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								7.6.1 Wisdom File Export/Import from Fortran
							 | 
						|||
| 
								 | 
							
								--------------------------------------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								The easiest way to export and import wisdom is to do so using
							 | 
						|||
| 
								 | 
							
								'fftw_export_wisdom_to_filename' and 'fftw_wisdom_from_filename'.  The
							 | 
						|||
| 
								 | 
							
								only trick is that these require you to pass a C string, which is an
							 | 
						|||
| 
								 | 
							
								array of type 'CHARACTER(C_CHAR)' that is terminated by 'C_NULL_CHAR'.
							 | 
						|||
| 
								 | 
							
								You can call them like this:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								       integer(C_INT) :: ret
							 | 
						|||
| 
								 | 
							
								       ret = fftw_export_wisdom_to_filename(C_CHAR_'my_wisdom.dat' // C_NULL_CHAR)
							 | 
						|||
| 
								 | 
							
								       if (ret .eq. 0) stop 'error exporting wisdom to file'
							 | 
						|||
| 
								 | 
							
								       ret = fftw_import_wisdom_from_filename(C_CHAR_'my_wisdom.dat' // C_NULL_CHAR)
							 | 
						|||
| 
								 | 
							
								       if (ret .eq. 0) stop 'error importing wisdom from file'
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Note that prepending 'C_CHAR_' is needed to specify that the literal
							 | 
						|||
| 
								 | 
							
								string is of kind 'C_CHAR', and we null-terminate the string by
							 | 
						|||
| 
								 | 
							
								appending '// C_NULL_CHAR'.  These functions return an 'integer(C_INT)'
							 | 
						|||
| 
								 | 
							
								('ret') which is '0' if an error occurred during export/import and
							 | 
						|||
| 
								 | 
							
								nonzero otherwise.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   It is also possible to use the lower-level routines
							 | 
						|||
| 
								 | 
							
								'fftw_export_wisdom_to_file' and 'fftw_import_wisdom_from_file', which
							 | 
						|||
| 
								 | 
							
								accept parameters of the C type 'FILE*', expressed in Fortran as
							 | 
						|||
| 
								 | 
							
								'type(C_PTR)'.  However, you are then responsible for creating the
							 | 
						|||
| 
								 | 
							
								'FILE*' yourself.  You can do this by using 'iso_c_binding' to define
							 | 
						|||
| 
								 | 
							
								Fortran intefaces for the C library functions 'fopen' and 'fclose',
							 | 
						|||
| 
								 | 
							
								which is a bit strange in Fortran but workable.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Wisdom String Export/Import from Fortran,  Next: Wisdom Generic Export/Import from Fortran,  Prev: Wisdom File Export/Import from Fortran,  Up: Accessing the wisdom API from Fortran
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								7.6.2 Wisdom String Export/Import from Fortran
							 | 
						|||
| 
								 | 
							
								----------------------------------------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								Dealing with FFTW's C string export/import is a bit more painful.  In
							 | 
						|||
| 
								 | 
							
								particular, the 'fftw_export_wisdom_to_string' function requires you to
							 | 
						|||
| 
								 | 
							
								deal with a dynamically allocated C string.  To get its length, you must
							 | 
						|||
| 
								 | 
							
								define an interface to the C 'strlen' function, and to deallocate it you
							 | 
						|||
| 
								 | 
							
								must define an interface to C 'free':
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								       use, intrinsic :: iso_c_binding
							 | 
						|||
| 
								 | 
							
								       interface
							 | 
						|||
| 
								 | 
							
								         integer(C_INT) function strlen(s) bind(C, name='strlen')
							 | 
						|||
| 
								 | 
							
								           import
							 | 
						|||
| 
								 | 
							
								           type(C_PTR), value :: s
							 | 
						|||
| 
								 | 
							
								         end function strlen
							 | 
						|||
| 
								 | 
							
								         subroutine free(p) bind(C, name='free')
							 | 
						|||
| 
								 | 
							
								           import
							 | 
						|||
| 
								 | 
							
								           type(C_PTR), value :: p
							 | 
						|||
| 
								 | 
							
								         end subroutine free
							 | 
						|||
| 
								 | 
							
								       end interface
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Given these definitions, you can then export wisdom to a Fortran
							 | 
						|||
| 
								 | 
							
								character array:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								       character(C_CHAR), pointer :: s(:)
							 | 
						|||
| 
								 | 
							
								       integer(C_SIZE_T) :: slen
							 | 
						|||
| 
								 | 
							
								       type(C_PTR) :: p
							 | 
						|||
| 
								 | 
							
								       p = fftw_export_wisdom_to_string()
							 | 
						|||
| 
								 | 
							
								       if (.not. c_associated(p)) stop 'error exporting wisdom'
							 | 
						|||
| 
								 | 
							
								       slen = strlen(p)
							 | 
						|||
| 
								 | 
							
								       call c_f_pointer(p, s, [slen+1])
							 | 
						|||
| 
								 | 
							
								       ...
							 | 
						|||
| 
								 | 
							
								       call free(p)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Note that 'slen' is the length of the C string, but the length of the
							 | 
						|||
| 
								 | 
							
								array is 'slen+1' because it includes the terminating null character.
							 | 
						|||
| 
								 | 
							
								(You can omit the '+1' if you don't want Fortran to know about the null
							 | 
						|||
| 
								 | 
							
								character.)  The standard 'c_associated' function checks whether 'p' is
							 | 
						|||
| 
								 | 
							
								a null pointer, which is returned by 'fftw_export_wisdom_to_string' if
							 | 
						|||
| 
								 | 
							
								there was an error.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   To import wisdom from a string, use 'fftw_import_wisdom_from_string'
							 | 
						|||
| 
								 | 
							
								as usual; note that the argument of this function must be a
							 | 
						|||
| 
								 | 
							
								'character(C_CHAR)' that is terminated by the 'C_NULL_CHAR' character,
							 | 
						|||
| 
								 | 
							
								like the 's' array above.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Wisdom Generic Export/Import from Fortran,  Prev: Wisdom String Export/Import from Fortran,  Up: Accessing the wisdom API from Fortran
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								7.6.3 Wisdom Generic Export/Import from Fortran
							 | 
						|||
| 
								 | 
							
								-----------------------------------------------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								The most generic wisdom export/import functions allow you to provide an
							 | 
						|||
| 
								 | 
							
								arbitrary callback function to read/write one character at a time in any
							 | 
						|||
| 
								 | 
							
								way you want.  However, your callback function must be written in a
							 | 
						|||
| 
								 | 
							
								special way, using the 'bind(C)' attribute to be passed to a C
							 | 
						|||
| 
								 | 
							
								interface.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   In particular, to call the generic wisdom export function
							 | 
						|||
| 
								 | 
							
								'fftw_export_wisdom', you would write a callback subroutine of the form:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								       subroutine my_write_char(c, p) bind(C)
							 | 
						|||
| 
								 | 
							
								         use, intrinsic :: iso_c_binding
							 | 
						|||
| 
								 | 
							
								         character(C_CHAR), value :: c
							 | 
						|||
| 
								 | 
							
								         type(C_PTR), value :: p
							 | 
						|||
| 
								 | 
							
								         _...write c..._
							 | 
						|||
| 
								 | 
							
								       end subroutine my_write_char
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Given such a subroutine (along with the corresponding interface
							 | 
						|||
| 
								 | 
							
								definition), you could then export wisdom using:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								       call fftw_export_wisdom(c_funloc(my_write_char), p)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The standard 'c_funloc' intrinsic converts a Fortran 'bind(C)'
							 | 
						|||
| 
								 | 
							
								subroutine into a C function pointer.  The parameter 'p' is a
							 | 
						|||
| 
								 | 
							
								'type(C_PTR)' to any arbitrary data that you want to pass to
							 | 
						|||
| 
								 | 
							
								'my_write_char' (or 'C_NULL_PTR' if none).  (Note that you can get a C
							 | 
						|||
| 
								 | 
							
								pointer to Fortran data using the intrinsic 'c_loc', and convert it back
							 | 
						|||
| 
								 | 
							
								to a Fortran pointer in 'my_write_char' using 'c_f_pointer'.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Similarly, to use the generic 'fftw_import_wisdom', you would define
							 | 
						|||
| 
								 | 
							
								a callback function of the form:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								       integer(C_INT) function my_read_char(p) bind(C)
							 | 
						|||
| 
								 | 
							
								         use, intrinsic :: iso_c_binding
							 | 
						|||
| 
								 | 
							
								         type(C_PTR), value :: p
							 | 
						|||
| 
								 | 
							
								         character :: c
							 | 
						|||
| 
								 | 
							
								         _...read a character c..._
							 | 
						|||
| 
								 | 
							
								         my_read_char = ichar(c, C_INT)
							 | 
						|||
| 
								 | 
							
								       end function my_read_char
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								       ....
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								       integer(C_INT) :: ret
							 | 
						|||
| 
								 | 
							
								       ret = fftw_import_wisdom(c_funloc(my_read_char), p)
							 | 
						|||
| 
								 | 
							
								       if (ret .eq. 0) stop 'error importing wisdom'
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Your function can return '-1' if the end of the input is reached.
							 | 
						|||
| 
								 | 
							
								Again, 'p' is an arbitrary 'type(C_PTR' that is passed through to your
							 | 
						|||
| 
								 | 
							
								function.  'fftw_import_wisdom' returns '0' if an error occurred and
							 | 
						|||
| 
								 | 
							
								nonzero otherwise.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Defining an FFTW module,  Prev: Accessing the wisdom API from Fortran,  Up: Calling FFTW from Modern Fortran
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								7.7 Defining an FFTW module
							 | 
						|||
| 
								 | 
							
								===========================
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								Rather than using the 'include' statement to include the 'fftw3.f03'
							 | 
						|||
| 
								 | 
							
								interface file in any subroutine where you want to use FFTW, you might
							 | 
						|||
| 
								 | 
							
								prefer to define an FFTW Fortran module.  FFTW does not install itself
							 | 
						|||
| 
								 | 
							
								as a module, primarily because 'fftw3.f03' can be shared between
							 | 
						|||
| 
								 | 
							
								different Fortran compilers while modules (in general) cannot.  However,
							 | 
						|||
| 
								 | 
							
								it is trivial to define your own FFTW module if you want.  Just create a
							 | 
						|||
| 
								 | 
							
								file containing:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								       module FFTW3
							 | 
						|||
| 
								 | 
							
								         use, intrinsic :: iso_c_binding
							 | 
						|||
| 
								 | 
							
								         include 'fftw3.f03'
							 | 
						|||
| 
								 | 
							
								       end module
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Compile this file into a module as usual for your compiler (e.g.
							 | 
						|||
| 
								 | 
							
								with 'gfortran -c' you will get a file 'fftw3.mod').  Now, instead of
							 | 
						|||
| 
								 | 
							
								'include 'fftw3.f03'', whenever you want to use FFTW routines you can
							 | 
						|||
| 
								 | 
							
								just do:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								       use FFTW3
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   as usual for Fortran modules.  (You still need to link to the FFTW
							 | 
						|||
| 
								 | 
							
								library, of course.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Calling FFTW from Legacy Fortran,  Next: Upgrading from FFTW version 2,  Prev: Calling FFTW from Modern Fortran,  Up: Top
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								8 Calling FFTW from Legacy Fortran
							 | 
						|||
| 
								 | 
							
								**********************************
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								This chapter describes the interface to FFTW callable by Fortran code in
							 | 
						|||
| 
								 | 
							
								older compilers not supporting the Fortran 2003 C interoperability
							 | 
						|||
| 
								 | 
							
								features (*note Calling FFTW from Modern Fortran::).  This interface has
							 | 
						|||
| 
								 | 
							
								the major disadvantage that it is not type-checked, so if you mistake
							 | 
						|||
| 
								 | 
							
								the argument types or ordering then your program will not have any
							 | 
						|||
| 
								 | 
							
								compiler errors, and will likely crash at runtime.  So, greater care is
							 | 
						|||
| 
								 | 
							
								needed.  Also, technically interfacing older Fortran versions to C is
							 | 
						|||
| 
								 | 
							
								nonstandard, but in practice we have found that the techniques used in
							 | 
						|||
| 
								 | 
							
								this chapter have worked with all known Fortran compilers for many
							 | 
						|||
| 
								 | 
							
								years.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The legacy Fortran interface differs from the C interface only in the
							 | 
						|||
| 
								 | 
							
								prefix ('dfftw_' instead of 'fftw_' in double precision) and a few other
							 | 
						|||
| 
								 | 
							
								minor details.  This Fortran interface is included in the FFTW libraries
							 | 
						|||
| 
								 | 
							
								by default, unless a Fortran compiler isn't found on your system or
							 | 
						|||
| 
								 | 
							
								'--disable-fortran' is included in the 'configure' flags.  We assume
							 | 
						|||
| 
								 | 
							
								here that the reader is already familiar with the usage of FFTW in C, as
							 | 
						|||
| 
								 | 
							
								described elsewhere in this manual.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The MPI parallel interface to FFTW is _not_ currently available to
							 | 
						|||
| 
								 | 
							
								legacy Fortran.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								* Menu:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								* Fortran-interface routines::
							 | 
						|||
| 
								 | 
							
								* FFTW Constants in Fortran::
							 | 
						|||
| 
								 | 
							
								* FFTW Execution in Fortran::
							 | 
						|||
| 
								 | 
							
								* Fortran Examples::
							 | 
						|||
| 
								 | 
							
								* Wisdom of Fortran?::
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Fortran-interface routines,  Next: FFTW Constants in Fortran,  Prev: Calling FFTW from Legacy Fortran,  Up: Calling FFTW from Legacy Fortran
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								8.1 Fortran-interface routines
							 | 
						|||
| 
								 | 
							
								==============================
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								Nearly all of the FFTW functions have Fortran-callable equivalents.  The
							 | 
						|||
| 
								 | 
							
								name of the legacy Fortran routine is the same as that of the
							 | 
						|||
| 
								 | 
							
								corresponding C routine, but with the 'fftw_' prefix replaced by
							 | 
						|||
| 
								 | 
							
								'dfftw_'.(1)  The single and long-double precision versions use 'sfftw_'
							 | 
						|||
| 
								 | 
							
								and 'lfftw_', respectively, instead of 'fftwf_' and 'fftwl_'; quadruple
							 | 
						|||
| 
								 | 
							
								precision ('real*16') is available on some systems as 'fftwq_' (*note
							 | 
						|||
| 
								 | 
							
								Precision::).  (Note that 'long double' on x86 hardware is usually at
							 | 
						|||
| 
								 | 
							
								most 80-bit extended precision, _not_ quadruple precision.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   For the most part, all of the arguments to the functions are the
							 | 
						|||
| 
								 | 
							
								same, with the following exceptions:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * 'plan' variables (what would be of type 'fftw_plan' in C), must be
							 | 
						|||
| 
								 | 
							
								     declared as a type that is at least as big as a pointer (address)
							 | 
						|||
| 
								 | 
							
								     on your machine.  We recommend using 'integer*8' everywhere, since
							 | 
						|||
| 
								 | 
							
								     this should always be big enough.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * Any function that returns a value (e.g.  'fftw_plan_dft') is
							 | 
						|||
| 
								 | 
							
								     converted into a _subroutine_.  The return value is converted into
							 | 
						|||
| 
								 | 
							
								     an additional _first_ parameter of this subroutine.(2)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * The Fortran routines expect multi-dimensional arrays to be in
							 | 
						|||
| 
								 | 
							
								     _column-major_ order, which is the ordinary format of Fortran
							 | 
						|||
| 
								 | 
							
								     arrays (*note Multi-dimensional Array Format::).  They do this
							 | 
						|||
| 
								 | 
							
								     transparently and costlessly simply by reversing the order of the
							 | 
						|||
| 
								 | 
							
								     dimensions passed to FFTW, but this has one important consequence
							 | 
						|||
| 
								 | 
							
								     for multi-dimensional real-complex transforms, discussed below.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * Wisdom import and export is somewhat more tricky because one cannot
							 | 
						|||
| 
								 | 
							
								     easily pass files or strings between C and Fortran; see *note
							 | 
						|||
| 
								 | 
							
								     Wisdom of Fortran?::.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * Legacy Fortran cannot use the 'fftw_malloc' dynamic-allocation
							 | 
						|||
| 
								 | 
							
								     routine.  If you want to exploit the SIMD FFTW (*note SIMD
							 | 
						|||
| 
								 | 
							
								     alignment and fftw_malloc::), you'll need to figure out some other
							 | 
						|||
| 
								 | 
							
								     way to ensure that your arrays are at least 16-byte aligned.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * Since Fortran 77 does not have data structures, the 'fftw_iodim'
							 | 
						|||
| 
								 | 
							
								     structure from the guru interface (*note Guru vector and transform
							 | 
						|||
| 
								 | 
							
								     sizes::) must be split into separate arguments.  In particular, any
							 | 
						|||
| 
								 | 
							
								     'fftw_iodim' array arguments in the C guru interface become three
							 | 
						|||
| 
								 | 
							
								     integer array arguments ('n', 'is', and 'os') in the Fortran guru
							 | 
						|||
| 
								 | 
							
								     interface, all of whose lengths should be equal to the
							 | 
						|||
| 
								 | 
							
								     corresponding 'rank' argument.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * The guru planner interface in Fortran does _not_ do any automatic
							 | 
						|||
| 
								 | 
							
								     translation between column-major and row-major; you are responsible
							 | 
						|||
| 
								 | 
							
								     for setting the strides etcetera to correspond to your Fortran
							 | 
						|||
| 
								 | 
							
								     arrays.  However, as a slight bug that we are preserving for
							 | 
						|||
| 
								 | 
							
								     backwards compatibility, the 'plan_guru_r2r' in Fortran _does_
							 | 
						|||
| 
								 | 
							
								     reverse the order of its 'kind' array parameter, so the 'kind'
							 | 
						|||
| 
								 | 
							
								     array of that routine should be in the reverse of the order of the
							 | 
						|||
| 
								 | 
							
								     iodim arrays (see above).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   In general, you should take care to use Fortran data types that
							 | 
						|||
| 
								 | 
							
								correspond to (i.e.  are the same size as) the C types used by FFTW. In
							 | 
						|||
| 
								 | 
							
								practice, this correspondence is usually straightforward (i.e.
							 | 
						|||
| 
								 | 
							
								'integer' corresponds to 'int', 'real' corresponds to 'float',
							 | 
						|||
| 
								 | 
							
								etcetera).  The native Fortran double/single-precision complex type
							 | 
						|||
| 
								 | 
							
								should be compatible with 'fftw_complex'/'fftwf_complex'.  Such simple
							 | 
						|||
| 
								 | 
							
								correspondences are assumed in the examples below.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   ---------- Footnotes ----------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   (1) Technically, Fortran 77 identifiers are not allowed to have more
							 | 
						|||
| 
								 | 
							
								than 6 characters, nor may they contain underscores.  Any compiler that
							 | 
						|||
| 
								 | 
							
								enforces this limitation doesn't deserve to link to FFTW.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   (2) The reason for this is that some Fortran implementations seem to
							 | 
						|||
| 
								 | 
							
								have trouble with C function return values, and vice versa.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: FFTW Constants in Fortran,  Next: FFTW Execution in Fortran,  Prev: Fortran-interface routines,  Up: Calling FFTW from Legacy Fortran
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								8.2 FFTW Constants in Fortran
							 | 
						|||
| 
								 | 
							
								=============================
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								When creating plans in FFTW, a number of constants are used to specify
							 | 
						|||
| 
								 | 
							
								options, such as 'FFTW_MEASURE' or 'FFTW_ESTIMATE'.  The same constants
							 | 
						|||
| 
								 | 
							
								must be used with the wrapper routines, but of course the C header files
							 | 
						|||
| 
								 | 
							
								where the constants are defined can't be incorporated directly into
							 | 
						|||
| 
								 | 
							
								Fortran code.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Instead, we have placed Fortran equivalents of the FFTW constant
							 | 
						|||
| 
								 | 
							
								definitions in the file 'fftw3.f', which can be found in the same
							 | 
						|||
| 
								 | 
							
								directory as 'fftw3.h'.  If your Fortran compiler supports a
							 | 
						|||
| 
								 | 
							
								preprocessor of some sort, you should be able to 'include' or '#include'
							 | 
						|||
| 
								 | 
							
								this file; otherwise, you can paste it directly into your code.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   In C, you combine different flags (like 'FFTW_PRESERVE_INPUT' and
							 | 
						|||
| 
								 | 
							
								'FFTW_MEASURE') using the ''|'' operator; in Fortran you should just use
							 | 
						|||
| 
								 | 
							
								''+''.  (Take care not to add in the same flag more than once, though.
							 | 
						|||
| 
								 | 
							
								Alternatively, you can use the 'ior' intrinsic function standardized in
							 | 
						|||
| 
								 | 
							
								Fortran 95.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: FFTW Execution in Fortran,  Next: Fortran Examples,  Prev: FFTW Constants in Fortran,  Up: Calling FFTW from Legacy Fortran
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								8.3 FFTW Execution in Fortran
							 | 
						|||
| 
								 | 
							
								=============================
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								In C, in order to use a plan, one normally calls 'fftw_execute', which
							 | 
						|||
| 
								 | 
							
								executes the plan to perform the transform on the input/output arrays
							 | 
						|||
| 
								 | 
							
								passed when the plan was created (*note Using Plans::).  The
							 | 
						|||
| 
								 | 
							
								corresponding subroutine call in legacy Fortran is:
							 | 
						|||
| 
								 | 
							
								             call dfftw_execute(plan)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   However, we have had reports that this causes problems with some
							 | 
						|||
| 
								 | 
							
								recent optimizing Fortran compilers.  The problem is, because the
							 | 
						|||
| 
								 | 
							
								input/output arrays are not passed as explicit arguments to
							 | 
						|||
| 
								 | 
							
								'dfftw_execute', the semantics of Fortran (unlike C) allow the compiler
							 | 
						|||
| 
								 | 
							
								to assume that the input/output arrays are not changed by
							 | 
						|||
| 
								 | 
							
								'dfftw_execute'.  As a consequence, certain compilers end up optimizing
							 | 
						|||
| 
								 | 
							
								out or repositioning the call to 'dfftw_execute', assuming incorrectly
							 | 
						|||
| 
								 | 
							
								that it does nothing.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   There are various workarounds to this, but the safest and simplest
							 | 
						|||
| 
								 | 
							
								thing is to not use 'dfftw_execute' in Fortran.  Instead, use the
							 | 
						|||
| 
								 | 
							
								functions described in *note New-array Execute Functions::, which take
							 | 
						|||
| 
								 | 
							
								the input/output arrays as explicit arguments.  For example, if the plan
							 | 
						|||
| 
								 | 
							
								is for a complex-data DFT and was created for the arrays 'in' and 'out',
							 | 
						|||
| 
								 | 
							
								you would do:
							 | 
						|||
| 
								 | 
							
								             call dfftw_execute_dft(plan, in, out)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   There are a few things to be careful of, however:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * You must use the correct type of execute function, matching the way
							 | 
						|||
| 
								 | 
							
								     the plan was created.  Complex DFT plans should use
							 | 
						|||
| 
								 | 
							
								     'dfftw_execute_dft', Real-input (r2c) DFT plans should use use
							 | 
						|||
| 
								 | 
							
								     'dfftw_execute_dft_r2c', and real-output (c2r) DFT plans should use
							 | 
						|||
| 
								 | 
							
								     'dfftw_execute_dft_c2r'.  The various r2r plans should use
							 | 
						|||
| 
								 | 
							
								     'dfftw_execute_r2r'.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * You should normally pass the same input/output arrays that were
							 | 
						|||
| 
								 | 
							
								     used when creating the plan.  This is always safe.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * _If_ you pass _different_ input/output arrays compared to those
							 | 
						|||
| 
								 | 
							
								     used when creating the plan, you must abide by all the restrictions
							 | 
						|||
| 
								 | 
							
								     of the new-array execute functions (*note New-array Execute
							 | 
						|||
| 
								 | 
							
								     Functions::).  The most difficult of these, in Fortran, is the
							 | 
						|||
| 
								 | 
							
								     requirement that the new arrays have the same alignment as the
							 | 
						|||
| 
								 | 
							
								     original arrays, because there seems to be no way in legacy Fortran
							 | 
						|||
| 
								 | 
							
								     to obtain guaranteed-aligned arrays (analogous to 'fftw_malloc' in
							 | 
						|||
| 
								 | 
							
								     C). You can, of course, use the 'FFTW_UNALIGNED' flag when creating
							 | 
						|||
| 
								 | 
							
								     the plan, in which case the plan does not depend on the alignment,
							 | 
						|||
| 
								 | 
							
								     but this may sacrifice substantial performance on architectures
							 | 
						|||
| 
								 | 
							
								     (like x86) with SIMD instructions (*note SIMD alignment and
							 | 
						|||
| 
								 | 
							
								     fftw_malloc::).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Fortran Examples,  Next: Wisdom of Fortran?,  Prev: FFTW Execution in Fortran,  Up: Calling FFTW from Legacy Fortran
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								8.4 Fortran Examples
							 | 
						|||
| 
								 | 
							
								====================
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								In C, you might have something like the following to transform a
							 | 
						|||
| 
								 | 
							
								one-dimensional complex array:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								             fftw_complex in[N], out[N];
							 | 
						|||
| 
								 | 
							
								             fftw_plan plan;
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								             plan = fftw_plan_dft_1d(N,in,out,FFTW_FORWARD,FFTW_ESTIMATE);
							 | 
						|||
| 
								 | 
							
								             fftw_execute(plan);
							 | 
						|||
| 
								 | 
							
								             fftw_destroy_plan(plan);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   In Fortran, you would use the following to accomplish the same thing:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								             double complex in, out
							 | 
						|||
| 
								 | 
							
								             dimension in(N), out(N)
							 | 
						|||
| 
								 | 
							
								             integer*8 plan
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								             call dfftw_plan_dft_1d(plan,N,in,out,FFTW_FORWARD,FFTW_ESTIMATE)
							 | 
						|||
| 
								 | 
							
								             call dfftw_execute_dft(plan, in, out)
							 | 
						|||
| 
								 | 
							
								             call dfftw_destroy_plan(plan)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Notice how all routines are called as Fortran subroutines, and the
							 | 
						|||
| 
								 | 
							
								plan is returned via the first argument to 'dfftw_plan_dft_1d'.  Notice
							 | 
						|||
| 
								 | 
							
								also that we changed 'fftw_execute' to 'dfftw_execute_dft' (*note FFTW
							 | 
						|||
| 
								 | 
							
								Execution in Fortran::).  To do the same thing, but using 8 threads in
							 | 
						|||
| 
								 | 
							
								parallel (*note Multi-threaded FFTW::), you would simply prefix these
							 | 
						|||
| 
								 | 
							
								calls with:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								             integer iret
							 | 
						|||
| 
								 | 
							
								             call dfftw_init_threads(iret)
							 | 
						|||
| 
								 | 
							
								             call dfftw_plan_with_nthreads(8)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   (You might want to check the value of 'iret': if it is zero, it
							 | 
						|||
| 
								 | 
							
								indicates an unlikely error during thread initialization.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   To check the number of threads currently being used by the planner,
							 | 
						|||
| 
								 | 
							
								you can do the following:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								             integer iret
							 | 
						|||
| 
								 | 
							
								             call dfftw_planner_nthreads(iret)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   To transform a three-dimensional array in-place with C, you might do:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								             fftw_complex arr[L][M][N];
							 | 
						|||
| 
								 | 
							
								             fftw_plan plan;
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								             plan = fftw_plan_dft_3d(L,M,N, arr,arr,
							 | 
						|||
| 
								 | 
							
								                                     FFTW_FORWARD, FFTW_ESTIMATE);
							 | 
						|||
| 
								 | 
							
								             fftw_execute(plan);
							 | 
						|||
| 
								 | 
							
								             fftw_destroy_plan(plan);
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   In Fortran, you would use this instead:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								             double complex arr
							 | 
						|||
| 
								 | 
							
								             dimension arr(L,M,N)
							 | 
						|||
| 
								 | 
							
								             integer*8 plan
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								             call dfftw_plan_dft_3d(plan, L,M,N, arr,arr,
							 | 
						|||
| 
								 | 
							
								            &                       FFTW_FORWARD, FFTW_ESTIMATE)
							 | 
						|||
| 
								 | 
							
								             call dfftw_execute_dft(plan, arr, arr)
							 | 
						|||
| 
								 | 
							
								             call dfftw_destroy_plan(plan)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Note that we pass the array dimensions in the "natural" order in both
							 | 
						|||
| 
								 | 
							
								C and Fortran.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   To transform a one-dimensional real array in Fortran, you might do:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								             double precision in
							 | 
						|||
| 
								 | 
							
								             dimension in(N)
							 | 
						|||
| 
								 | 
							
								             double complex out
							 | 
						|||
| 
								 | 
							
								             dimension out(N/2 + 1)
							 | 
						|||
| 
								 | 
							
								             integer*8 plan
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								             call dfftw_plan_dft_r2c_1d(plan,N,in,out,FFTW_ESTIMATE)
							 | 
						|||
| 
								 | 
							
								             call dfftw_execute_dft_r2c(plan, in, out)
							 | 
						|||
| 
								 | 
							
								             call dfftw_destroy_plan(plan)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   To transform a two-dimensional real array, out of place, you might
							 | 
						|||
| 
								 | 
							
								use the following:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								             double precision in
							 | 
						|||
| 
								 | 
							
								             dimension in(M,N)
							 | 
						|||
| 
								 | 
							
								             double complex out
							 | 
						|||
| 
								 | 
							
								             dimension out(M/2 + 1, N)
							 | 
						|||
| 
								 | 
							
								             integer*8 plan
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								             call dfftw_plan_dft_r2c_2d(plan,M,N,in,out,FFTW_ESTIMATE)
							 | 
						|||
| 
								 | 
							
								             call dfftw_execute_dft_r2c(plan, in, out)
							 | 
						|||
| 
								 | 
							
								             call dfftw_destroy_plan(plan)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   *Important:* Notice that it is the _first_ dimension of the complex
							 | 
						|||
| 
								 | 
							
								output array that is cut in half in Fortran, rather than the last
							 | 
						|||
| 
								 | 
							
								dimension as in C. This is a consequence of the interface routines
							 | 
						|||
| 
								 | 
							
								reversing the order of the array dimensions passed to FFTW so that the
							 | 
						|||
| 
								 | 
							
								Fortran program can use its ordinary column-major order.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Wisdom of Fortran?,  Prev: Fortran Examples,  Up: Calling FFTW from Legacy Fortran
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								8.5 Wisdom of Fortran?
							 | 
						|||
| 
								 | 
							
								======================
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								In this section, we discuss how one can import/export FFTW wisdom (saved
							 | 
						|||
| 
								 | 
							
								plans) to/from a Fortran program; we assume that the reader is already
							 | 
						|||
| 
								 | 
							
								familiar with wisdom, as described in *note Words of Wisdom-Saving
							 | 
						|||
| 
								 | 
							
								Plans::.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The basic problem is that is difficult to (portably) pass files and
							 | 
						|||
| 
								 | 
							
								strings between Fortran and C, so we cannot provide a direct Fortran
							 | 
						|||
| 
								 | 
							
								equivalent to the 'fftw_export_wisdom_to_file', etcetera, functions.
							 | 
						|||
| 
								 | 
							
								Fortran interfaces _are_ provided for the functions that do not take
							 | 
						|||
| 
								 | 
							
								file/string arguments, however: 'dfftw_import_system_wisdom',
							 | 
						|||
| 
								 | 
							
								'dfftw_import_wisdom', 'dfftw_export_wisdom', and 'dfftw_forget_wisdom'.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   So, for example, to import the system-wide wisdom, you would do:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								             integer isuccess
							 | 
						|||
| 
								 | 
							
								             call dfftw_import_system_wisdom(isuccess)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   As usual, the C return value is turned into a first parameter;
							 | 
						|||
| 
								 | 
							
								'isuccess' is non-zero on success and zero on failure (e.g.  if there is
							 | 
						|||
| 
								 | 
							
								no system wisdom installed).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   If you want to import/export wisdom from/to an arbitrary file or
							 | 
						|||
| 
								 | 
							
								elsewhere, you can employ the generic 'dfftw_import_wisdom' and
							 | 
						|||
| 
								 | 
							
								'dfftw_export_wisdom' functions, for which you must supply a subroutine
							 | 
						|||
| 
								 | 
							
								to read/write one character at a time.  The FFTW package contains an
							 | 
						|||
| 
								 | 
							
								example file 'doc/f77_wisdom.f' demonstrating how to implement
							 | 
						|||
| 
								 | 
							
								'import_wisdom_from_file' and 'export_wisdom_to_file' subroutines in
							 | 
						|||
| 
								 | 
							
								this way.  (These routines cannot be compiled into the FFTW library
							 | 
						|||
| 
								 | 
							
								itself, lest all FFTW-using programs be required to link with the
							 | 
						|||
| 
								 | 
							
								Fortran I/O library.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Upgrading from FFTW version 2,  Next: Installation and Customization,  Prev: Calling FFTW from Legacy Fortran,  Up: Top
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								9 Upgrading from FFTW version 2
							 | 
						|||
| 
								 | 
							
								*******************************
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								In this chapter, we outline the process for updating codes designed for
							 | 
						|||
| 
								 | 
							
								the older FFTW 2 interface to work with FFTW 3.  The interface for FFTW
							 | 
						|||
| 
								 | 
							
								3 is not backwards-compatible with the interface for FFTW 2 and earlier
							 | 
						|||
| 
								 | 
							
								versions; codes written to use those versions will fail to link with
							 | 
						|||
| 
								 | 
							
								FFTW 3.  Nor is it possible to write "compatibility wrappers" to bridge
							 | 
						|||
| 
								 | 
							
								the gap (at least not efficiently), because FFTW 3 has different
							 | 
						|||
| 
								 | 
							
								semantics from previous versions.  However, upgrading should be a
							 | 
						|||
| 
								 | 
							
								straightforward process because the data formats are identical and the
							 | 
						|||
| 
								 | 
							
								overall style of planning/execution is essentially the same.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Unlike FFTW 2, there are no separate header files for real and
							 | 
						|||
| 
								 | 
							
								complex transforms (or even for different precisions) in FFTW 3; all
							 | 
						|||
| 
								 | 
							
								interfaces are defined in the '<fftw3.h>' header file.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								Numeric Types
							 | 
						|||
| 
								 | 
							
								=============
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								The main difference in data types is that 'fftw_complex' in FFTW 2 was
							 | 
						|||
| 
								 | 
							
								defined as a 'struct' with macros 'c_re' and 'c_im' for accessing the
							 | 
						|||
| 
								 | 
							
								real/imaginary parts.  (This is binary-compatible with FFTW 3 on any
							 | 
						|||
| 
								 | 
							
								machine except perhaps for some older Crays in single precision.)  The
							 | 
						|||
| 
								 | 
							
								equivalent macros for FFTW 3 are:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     #define c_re(c) ((c)[0])
							 | 
						|||
| 
								 | 
							
								     #define c_im(c) ((c)[1])
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   This does not work if you are using the C99 complex type, however,
							 | 
						|||
| 
								 | 
							
								unless you insert a 'double*' typecast into the above macros (*note
							 | 
						|||
| 
								 | 
							
								Complex numbers::).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Also, FFTW 2 had an 'fftw_real' typedef that was an alias for
							 | 
						|||
| 
								 | 
							
								'double' (in double precision).  In FFTW 3 you should just use 'double'
							 | 
						|||
| 
								 | 
							
								(or whatever precision you are employing).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								Plans
							 | 
						|||
| 
								 | 
							
								=====
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								The major difference between FFTW 2 and FFTW 3 is in the
							 | 
						|||
| 
								 | 
							
								planning/execution division of labor.  In FFTW 2, plans were found for a
							 | 
						|||
| 
								 | 
							
								given transform size and type, and then could be applied to _any_ arrays
							 | 
						|||
| 
								 | 
							
								and for _any_ multiplicity/stride parameters.  In FFTW 3, you specify
							 | 
						|||
| 
								 | 
							
								the particular arrays, stride parameters, etcetera when creating the
							 | 
						|||
| 
								 | 
							
								plan, and the plan is then executed for _those_ arrays (unless the guru
							 | 
						|||
| 
								 | 
							
								interface is used) and _those_ parameters _only_.  (FFTW 2 had "specific
							 | 
						|||
| 
								 | 
							
								planner" routines that planned for a particular array and stride, but
							 | 
						|||
| 
								 | 
							
								the plan could still be used for other arrays and strides.)  That is,
							 | 
						|||
| 
								 | 
							
								much of the information that was formerly specified at execution time is
							 | 
						|||
| 
								 | 
							
								now specified at planning time.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Like FFTW 2's specific planner routines, the FFTW 3 planner
							 | 
						|||
| 
								 | 
							
								overwrites the input/output arrays unless you use 'FFTW_ESTIMATE'.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   FFTW 2 had separate data types 'fftw_plan', 'fftwnd_plan',
							 | 
						|||
| 
								 | 
							
								'rfftw_plan', and 'rfftwnd_plan' for complex and real one- and
							 | 
						|||
| 
								 | 
							
								multi-dimensional transforms, and each type had its own 'destroy'
							 | 
						|||
| 
								 | 
							
								function.  In FFTW 3, all plans are of type 'fftw_plan' and all are
							 | 
						|||
| 
								 | 
							
								destroyed by 'fftw_destroy_plan(plan)'.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Where you formerly used 'fftw_create_plan' and 'fftw_one' to plan and
							 | 
						|||
| 
								 | 
							
								compute a single 1d transform, you would now use 'fftw_plan_dft_1d' to
							 | 
						|||
| 
								 | 
							
								plan the transform.  If you used the generic 'fftw' function to execute
							 | 
						|||
| 
								 | 
							
								the transform with multiplicity ('howmany') and stride parameters, you
							 | 
						|||
| 
								 | 
							
								would now use the advanced interface 'fftw_plan_many_dft' to specify
							 | 
						|||
| 
								 | 
							
								those parameters.  The plans are now executed with 'fftw_execute(plan)',
							 | 
						|||
| 
								 | 
							
								which takes all of its parameters (including the input/output arrays)
							 | 
						|||
| 
								 | 
							
								from the plan.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   In-place transforms no longer interpret their output argument as
							 | 
						|||
| 
								 | 
							
								scratch space, nor is there an 'FFTW_IN_PLACE' flag.  You simply pass
							 | 
						|||
| 
								 | 
							
								the same pointer for both the input and output arguments.  (Previously,
							 | 
						|||
| 
								 | 
							
								the output 'ostride' and 'odist' parameters were ignored for in-place
							 | 
						|||
| 
								 | 
							
								transforms; now, if they are specified via the advanced interface, they
							 | 
						|||
| 
								 | 
							
								are significant even in the in-place case, although they should normally
							 | 
						|||
| 
								 | 
							
								equal the corresponding input parameters.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The 'FFTW_ESTIMATE' and 'FFTW_MEASURE' flags have the same meaning as
							 | 
						|||
| 
								 | 
							
								before, although the planning time will differ.  You may also consider
							 | 
						|||
| 
								 | 
							
								using 'FFTW_PATIENT', which is like 'FFTW_MEASURE' except that it takes
							 | 
						|||
| 
								 | 
							
								more time in order to consider a wider variety of algorithms.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   For multi-dimensional complex DFTs, instead of 'fftwnd_create_plan'
							 | 
						|||
| 
								 | 
							
								(or 'fftw2d_create_plan' or 'fftw3d_create_plan'), followed by
							 | 
						|||
| 
								 | 
							
								'fftwnd_one', you would use 'fftw_plan_dft' (or 'fftw_plan_dft_2d' or
							 | 
						|||
| 
								 | 
							
								'fftw_plan_dft_3d').  followed by 'fftw_execute'.  If you used 'fftwnd'
							 | 
						|||
| 
								 | 
							
								to to specify strides etcetera, you would instead specify these via
							 | 
						|||
| 
								 | 
							
								'fftw_plan_many_dft'.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The analogues to 'rfftw_create_plan' and 'rfftw_one' with
							 | 
						|||
| 
								 | 
							
								'FFTW_REAL_TO_COMPLEX' or 'FFTW_COMPLEX_TO_REAL' directions are
							 | 
						|||
| 
								 | 
							
								'fftw_plan_r2r_1d' with kind 'FFTW_R2HC' or 'FFTW_HC2R', followed by
							 | 
						|||
| 
								 | 
							
								'fftw_execute'.  The stride etcetera arguments of 'rfftw' are now in
							 | 
						|||
| 
								 | 
							
								'fftw_plan_many_r2r'.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Instead of 'rfftwnd_create_plan' (or 'rfftw2d_create_plan' or
							 | 
						|||
| 
								 | 
							
								'rfftw3d_create_plan') followed by 'rfftwnd_one_real_to_complex' or
							 | 
						|||
| 
								 | 
							
								'rfftwnd_one_complex_to_real', you now use 'fftw_plan_dft_r2c' (or
							 | 
						|||
| 
								 | 
							
								'fftw_plan_dft_r2c_2d' or 'fftw_plan_dft_r2c_3d') or 'fftw_plan_dft_c2r'
							 | 
						|||
| 
								 | 
							
								(or 'fftw_plan_dft_c2r_2d' or 'fftw_plan_dft_c2r_3d'), respectively,
							 | 
						|||
| 
								 | 
							
								followed by 'fftw_execute'.  As usual, the strides etcetera of
							 | 
						|||
| 
								 | 
							
								'rfftwnd_real_to_complex' or 'rfftwnd_complex_to_real' are no specified
							 | 
						|||
| 
								 | 
							
								in the advanced planner routines, 'fftw_plan_many_dft_r2c' or
							 | 
						|||
| 
								 | 
							
								'fftw_plan_many_dft_c2r'.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								Wisdom
							 | 
						|||
| 
								 | 
							
								======
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								In FFTW 2, you had to supply the 'FFTW_USE_WISDOM' flag in order to use
							 | 
						|||
| 
								 | 
							
								wisdom; in FFTW 3, wisdom is always used.  (You could simulate the FFTW
							 | 
						|||
| 
								 | 
							
								2 wisdom-less behavior by calling 'fftw_forget_wisdom' after every
							 | 
						|||
| 
								 | 
							
								planner call.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The FFTW 3 wisdom import/export routines are almost the same as
							 | 
						|||
| 
								 | 
							
								before (although the storage format is entirely different).  There is
							 | 
						|||
| 
								 | 
							
								one significant difference, however.  In FFTW 2, the import routines
							 | 
						|||
| 
								 | 
							
								would never read past the end of the wisdom, so you could store extra
							 | 
						|||
| 
								 | 
							
								data beyond the wisdom in the same file, for example.  In FFTW 3, the
							 | 
						|||
| 
								 | 
							
								file-import routine may read up to a few hundred bytes past the end of
							 | 
						|||
| 
								 | 
							
								the wisdom, so you cannot store other data just beyond it.(1)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Wisdom has been enhanced by additional humility in FFTW 3: whereas
							 | 
						|||
| 
								 | 
							
								FFTW 2 would re-use wisdom for a given transform size regardless of the
							 | 
						|||
| 
								 | 
							
								stride etc., in FFTW 3 wisdom is only used with the strides etc.  for
							 | 
						|||
| 
								 | 
							
								which it was created.  Unfortunately, this means FFTW 3 has to create
							 | 
						|||
| 
								 | 
							
								new plans from scratch more often than FFTW 2 (in FFTW 2, planning e.g.
							 | 
						|||
| 
								 | 
							
								one transform of size 1024 also created wisdom for all smaller powers of
							 | 
						|||
| 
								 | 
							
								2, but this no longer occurs).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   FFTW 3 also has the new routine 'fftw_import_system_wisdom' to import
							 | 
						|||
| 
								 | 
							
								wisdom from a standard system-wide location.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								Memory allocation
							 | 
						|||
| 
								 | 
							
								=================
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								In FFTW 3, we recommend allocating your arrays with 'fftw_malloc' and
							 | 
						|||
| 
								 | 
							
								deallocating them with 'fftw_free'; this is not required, but allows
							 | 
						|||
| 
								 | 
							
								optimal performance when SIMD acceleration is used.  (Those two
							 | 
						|||
| 
								 | 
							
								functions actually existed in FFTW 2, and worked the same way, but were
							 | 
						|||
| 
								 | 
							
								not documented.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   In FFTW 2, there were 'fftw_malloc_hook' and 'fftw_free_hook'
							 | 
						|||
| 
								 | 
							
								functions that allowed the user to replace FFTW's memory-allocation
							 | 
						|||
| 
								 | 
							
								routines (e.g.  to implement different error-handling, since by default
							 | 
						|||
| 
								 | 
							
								FFTW prints an error message and calls 'exit' to abort the program if
							 | 
						|||
| 
								 | 
							
								'malloc' returns 'NULL').  These hooks are not supported in FFTW 3;
							 | 
						|||
| 
								 | 
							
								those few users who require this functionality can just directly modify
							 | 
						|||
| 
								 | 
							
								the memory-allocation routines in FFTW (they are defined in
							 | 
						|||
| 
								 | 
							
								'kernel/alloc.c').
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								Fortran interface
							 | 
						|||
| 
								 | 
							
								=================
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								In FFTW 2, the subroutine names were obtained by replacing 'fftw_' with
							 | 
						|||
| 
								 | 
							
								'fftw_f77'; in FFTW 3, you replace 'fftw_' with 'dfftw_' (or 'sfftw_' or
							 | 
						|||
| 
								 | 
							
								'lfftw_', depending upon the precision).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   In FFTW 3, we have begun recommending that you always declare the
							 | 
						|||
| 
								 | 
							
								type used to store plans as 'integer*8'.  (Too many people didn't notice
							 | 
						|||
| 
								 | 
							
								our instruction to switch from 'integer' to 'integer*8' for 64-bit
							 | 
						|||
| 
								 | 
							
								machines.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   In FFTW 3, we provide a 'fftw3.f' "header file" to include in your
							 | 
						|||
| 
								 | 
							
								code (and which is officially installed on Unix systems).  (In FFTW 2,
							 | 
						|||
| 
								 | 
							
								we supplied a 'fftw_f77.i' file, but it was not installed.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Otherwise, the C-Fortran interface relationship is much the same as
							 | 
						|||
| 
								 | 
							
								it was before (e.g.  return values become initial parameters, and
							 | 
						|||
| 
								 | 
							
								multi-dimensional arrays are in column-major order).  Unlike FFTW 2, we
							 | 
						|||
| 
								 | 
							
								do provide some support for wisdom import/export in Fortran (*note
							 | 
						|||
| 
								 | 
							
								Wisdom of Fortran?::).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								Threads
							 | 
						|||
| 
								 | 
							
								=======
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								Like FFTW 2, only the execution routines are thread-safe.  All planner
							 | 
						|||
| 
								 | 
							
								routines, etcetera, should be called by only a single thread at a time
							 | 
						|||
| 
								 | 
							
								(*note Thread safety::).  _Unlike_ FFTW 2, there is no special
							 | 
						|||
| 
								 | 
							
								'FFTW_THREADSAFE' flag for the planner to allow a given plan to be
							 | 
						|||
| 
								 | 
							
								usable by multiple threads in parallel; this is now the case by default.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The multi-threaded version of FFTW 2 required you to pass the number
							 | 
						|||
| 
								 | 
							
								of threads each time you execute the transform.  The number of threads
							 | 
						|||
| 
								 | 
							
								is now stored in the plan, and is specified before the planner is called
							 | 
						|||
| 
								 | 
							
								by 'fftw_plan_with_nthreads'.  The threads initialization routine used
							 | 
						|||
| 
								 | 
							
								to be called 'fftw_threads_init' and would return zero on success; the
							 | 
						|||
| 
								 | 
							
								new routine is called 'fftw_init_threads' and returns zero on failure.
							 | 
						|||
| 
								 | 
							
								The current number of threads used by the planner can be checked with
							 | 
						|||
| 
								 | 
							
								'fftw_planner_nthreads'.  *Note Multi-threaded FFTW::.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   There is no separate threads header file in FFTW 3; all the function
							 | 
						|||
| 
								 | 
							
								prototypes are in '<fftw3.h>'.  However, you still have to link to a
							 | 
						|||
| 
								 | 
							
								separate library ('-lfftw3_threads -lfftw3 -lm' on Unix), as well as to
							 | 
						|||
| 
								 | 
							
								the threading library (e.g.  POSIX threads on Unix).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   ---------- Footnotes ----------
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   (1) We do our own buffering because GNU libc I/O routines are
							 | 
						|||
| 
								 | 
							
								horribly slow for single-character I/O, apparently for thread-safety
							 | 
						|||
| 
								 | 
							
								reasons (whether you are using threads or not).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Installation and Customization,  Next: Acknowledgments,  Prev: Upgrading from FFTW version 2,  Up: Top
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								10 Installation and Customization
							 | 
						|||
| 
								 | 
							
								*********************************
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								This chapter describes the installation and customization of FFTW, the
							 | 
						|||
| 
								 | 
							
								latest version of which may be downloaded from the FFTW home page
							 | 
						|||
| 
								 | 
							
								(http://www.fftw.org).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   In principle, FFTW should work on any system with an ANSI C compiler
							 | 
						|||
| 
								 | 
							
								('gcc' is fine).  However, planner time is drastically reduced if FFTW
							 | 
						|||
| 
								 | 
							
								can exploit a hardware cycle counter; FFTW comes with cycle-counter
							 | 
						|||
| 
								 | 
							
								support for all modern general-purpose CPUs, but you may need to add a
							 | 
						|||
| 
								 | 
							
								couple of lines of code if your compiler is not yet supported (*note
							 | 
						|||
| 
								 | 
							
								Cycle Counters::).  (On Unix, there will be a warning at the end of the
							 | 
						|||
| 
								 | 
							
								'configure' output if no cycle counter is found.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Installation of FFTW is simplest if you have a Unix or a GNU system,
							 | 
						|||
| 
								 | 
							
								such as GNU/Linux, and we describe this case in the first section below,
							 | 
						|||
| 
								 | 
							
								including the use of special configuration options to e.g.  install
							 | 
						|||
| 
								 | 
							
								different precisions or exploit optimizations for particular
							 | 
						|||
| 
								 | 
							
								architectures (e.g.  SIMD). Compilation on non-Unix systems is a more
							 | 
						|||
| 
								 | 
							
								manual process, but we outline the procedure in the second section.  It
							 | 
						|||
| 
								 | 
							
								is also likely that pre-compiled binaries will be available for popular
							 | 
						|||
| 
								 | 
							
								systems.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Finally, we describe how you can customize FFTW for particular needs
							 | 
						|||
| 
								 | 
							
								by generating _codelets_ for fast transforms of sizes not supported
							 | 
						|||
| 
								 | 
							
								efficiently by the standard FFTW distribution.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								* Menu:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								* Installation on Unix::
							 | 
						|||
| 
								 | 
							
								* Installation on non-Unix systems::
							 | 
						|||
| 
								 | 
							
								* Cycle Counters::
							 | 
						|||
| 
								 | 
							
								* Generating your own code::
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Installation on Unix,  Next: Installation on non-Unix systems,  Prev: Installation and Customization,  Up: Installation and Customization
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								10.1 Installation on Unix
							 | 
						|||
| 
								 | 
							
								=========================
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								FFTW comes with a 'configure' program in the GNU style.  Installation
							 | 
						|||
| 
								 | 
							
								can be as simple as:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     ./configure
							 | 
						|||
| 
								 | 
							
								     make
							 | 
						|||
| 
								 | 
							
								     make install
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   This will build the uniprocessor complex and real transform libraries
							 | 
						|||
| 
								 | 
							
								along with the test programs.  (We recommend that you use GNU 'make' if
							 | 
						|||
| 
								 | 
							
								it is available; on some systems it is called 'gmake'.)  The "'make
							 | 
						|||
| 
								 | 
							
								install'" command installs the fftw and rfftw libraries in standard
							 | 
						|||
| 
								 | 
							
								places, and typically requires root privileges (unless you specify a
							 | 
						|||
| 
								 | 
							
								different install directory with the '--prefix' flag to 'configure').
							 | 
						|||
| 
								 | 
							
								You can also type "'make check'" to put the FFTW test programs through
							 | 
						|||
| 
								 | 
							
								their paces.  If you have problems during configuration or compilation,
							 | 
						|||
| 
								 | 
							
								you may want to run "'make distclean'" before trying again; this ensures
							 | 
						|||
| 
								 | 
							
								that you don't have any stale files left over from previous compilation
							 | 
						|||
| 
								 | 
							
								attempts.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The 'configure' script chooses the 'gcc' compiler by default, if it
							 | 
						|||
| 
								 | 
							
								is available; you can select some other compiler with:
							 | 
						|||
| 
								 | 
							
								     ./configure CC="<the name of your C compiler>"
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The 'configure' script knows good 'CFLAGS' (C compiler flags) for a
							 | 
						|||
| 
								 | 
							
								few systems.  If your system is not known, the 'configure' script will
							 | 
						|||
| 
								 | 
							
								print out a warning.  In this case, you should re-configure FFTW with
							 | 
						|||
| 
								 | 
							
								the command
							 | 
						|||
| 
								 | 
							
								     ./configure CFLAGS="<write your CFLAGS here>"
							 | 
						|||
| 
								 | 
							
								   and then compile as usual.  If you do find an optimal set of 'CFLAGS'
							 | 
						|||
| 
								 | 
							
								for your system, please let us know what they are (along with the output
							 | 
						|||
| 
								 | 
							
								of 'config.guess') so that we can include them in future releases.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   'configure' supports all the standard flags defined by the GNU Coding
							 | 
						|||
| 
								 | 
							
								Standards; see the 'INSTALL' file in FFTW or the GNU web page
							 | 
						|||
| 
								 | 
							
								(http://www.gnu.org/prep/standards/html_node/index.html).  Note
							 | 
						|||
| 
								 | 
							
								especially '--help' to list all flags and '--enable-shared' to create
							 | 
						|||
| 
								 | 
							
								shared, rather than static, libraries.  'configure' also accepts a few
							 | 
						|||
| 
								 | 
							
								FFTW-specific flags, particularly:
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * '--enable-float': Produces a single-precision version of FFTW
							 | 
						|||
| 
								 | 
							
								     ('float') instead of the default double-precision ('double').
							 | 
						|||
| 
								 | 
							
								     *Note Precision::.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * '--enable-long-double': Produces a long-double precision version of
							 | 
						|||
| 
								 | 
							
								     FFTW ('long double') instead of the default double-precision
							 | 
						|||
| 
								 | 
							
								     ('double').  The 'configure' script will halt with an error message
							 | 
						|||
| 
								 | 
							
								     if 'long double' is the same size as 'double' on your
							 | 
						|||
| 
								 | 
							
								     machine/compiler.  *Note Precision::.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * '--enable-quad-precision': Produces a quadruple-precision version
							 | 
						|||
| 
								 | 
							
								     of FFTW using the nonstandard '__float128' type provided by 'gcc'
							 | 
						|||
| 
								 | 
							
								     4.6 or later on x86, x86-64, and Itanium architectures, instead of
							 | 
						|||
| 
								 | 
							
								     the default double-precision ('double').  The 'configure' script
							 | 
						|||
| 
								 | 
							
								     will halt with an error message if the compiler is not 'gcc'
							 | 
						|||
| 
								 | 
							
								     version 4.6 or later or if 'gcc''s 'libquadmath' library is not
							 | 
						|||
| 
								 | 
							
								     installed.  *Note Precision::.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * '--enable-threads': Enables compilation and installation of the
							 | 
						|||
| 
								 | 
							
								     FFTW threads library (*note Multi-threaded FFTW::), which provides
							 | 
						|||
| 
								 | 
							
								     a simple interface to parallel transforms for SMP systems.  By
							 | 
						|||
| 
								 | 
							
								     default, the threads routines are not compiled.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * '--enable-openmp': Like '--enable-threads', but using OpenMP
							 | 
						|||
| 
								 | 
							
								     compiler directives in order to induce parallelism rather than
							 | 
						|||
| 
								 | 
							
								     spawning its own threads directly, and installing an 'fftw3_omp'
							 | 
						|||
| 
								 | 
							
								     library rather than an 'fftw3_threads' library (*note
							 | 
						|||
| 
								 | 
							
								     Multi-threaded FFTW::).  You can use both '--enable-openmp' and
							 | 
						|||
| 
								 | 
							
								     '--enable-threads' since they compile/install libraries with
							 | 
						|||
| 
								 | 
							
								     different names.  By default, the OpenMP routines are not compiled.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * '--with-combined-threads': By default, if '--enable-threads' is
							 | 
						|||
| 
								 | 
							
								     used, the threads support is compiled into a separate library that
							 | 
						|||
| 
								 | 
							
								     must be linked in addition to the main FFTW library.  This is so
							 | 
						|||
| 
								 | 
							
								     that users of the serial library do not need to link the system
							 | 
						|||
| 
								 | 
							
								     threads libraries.  If '--with-combined-threads' is specified,
							 | 
						|||
| 
								 | 
							
								     however, then no separate threads library is created, and threads
							 | 
						|||
| 
								 | 
							
								     are included in the main FFTW library.  This is mainly useful under
							 | 
						|||
| 
								 | 
							
								     Windows, where no system threads library is required and
							 | 
						|||
| 
								 | 
							
								     inter-library dependencies are problematic.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * '--enable-mpi': Enables compilation and installation of the FFTW
							 | 
						|||
| 
								 | 
							
								     MPI library (*note Distributed-memory FFTW with MPI::), which
							 | 
						|||
| 
								 | 
							
								     provides parallel transforms for distributed-memory systems with
							 | 
						|||
| 
								 | 
							
								     MPI. (By default, the MPI routines are not compiled.)  *Note FFTW
							 | 
						|||
| 
								 | 
							
								     MPI Installation::.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * '--disable-fortran': Disables inclusion of legacy-Fortran wrapper
							 | 
						|||
| 
								 | 
							
								     routines (*note Calling FFTW from Legacy Fortran::) in the standard
							 | 
						|||
| 
								 | 
							
								     FFTW libraries.  These wrapper routines increase the library size
							 | 
						|||
| 
								 | 
							
								     by only a negligible amount, so they are included by default as
							 | 
						|||
| 
								 | 
							
								     long as the 'configure' script finds a Fortran compiler on your
							 | 
						|||
| 
								 | 
							
								     system.  (To specify a particular Fortran compiler foo, pass
							 | 
						|||
| 
								 | 
							
								     'F77='foo to 'configure'.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * '--with-g77-wrappers': By default, when Fortran wrappers are
							 | 
						|||
| 
								 | 
							
								     included, the wrappers employ the linking conventions of the
							 | 
						|||
| 
								 | 
							
								     Fortran compiler detected by the 'configure' script.  If this
							 | 
						|||
| 
								 | 
							
								     compiler is GNU 'g77', however, then _two_ versions of the wrappers
							 | 
						|||
| 
								 | 
							
								     are included: one with 'g77''s idiosyncratic convention of
							 | 
						|||
| 
								 | 
							
								     appending two underscores to identifiers, and one with the more
							 | 
						|||
| 
								 | 
							
								     common convention of appending only a single underscore.  This way,
							 | 
						|||
| 
								 | 
							
								     the same FFTW library will work with both 'g77' and other Fortran
							 | 
						|||
| 
								 | 
							
								     compilers, such as GNU 'gfortran'.  However, the converse is not
							 | 
						|||
| 
								 | 
							
								     true: if you configure with a different compiler, then the
							 | 
						|||
| 
								 | 
							
								     'g77'-compatible wrappers are not included.  By specifying
							 | 
						|||
| 
								 | 
							
								     '--with-g77-wrappers', the 'g77'-compatible wrappers are included
							 | 
						|||
| 
								 | 
							
								     in addition to wrappers for whatever Fortran compiler 'configure'
							 | 
						|||
| 
								 | 
							
								     finds.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * '--with-slow-timer': Disables the use of hardware cycle counters,
							 | 
						|||
| 
								 | 
							
								     and falls back on 'gettimeofday' or 'clock'.  This greatly worsens
							 | 
						|||
| 
								 | 
							
								     performance, and should generally not be used (unless you don't
							 | 
						|||
| 
								 | 
							
								     have a cycle counter but still really want an optimized plan
							 | 
						|||
| 
								 | 
							
								     regardless of the time).  *Note Cycle Counters::.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   * '--enable-sse' (single precision), '--enable-sse2' (single,
							 | 
						|||
| 
								 | 
							
								     double), '--enable-avx' (single, double), '--enable-avx2' (single,
							 | 
						|||
| 
								 | 
							
								     double), '--enable-avx512' (single, double),
							 | 
						|||
| 
								 | 
							
								     '--enable-avx-128-fma', '--enable-kcvi' (single),
							 | 
						|||
| 
								 | 
							
								     '--enable-altivec' (single), '--enable-vsx' (single, double),
							 | 
						|||
| 
								 | 
							
								     '--enable-neon' (single, double on aarch64),
							 | 
						|||
| 
								 | 
							
								     '--enable-generic-simd128', and '--enable-generic-simd256':
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								     Enable various SIMD instruction sets.  You need compiler that
							 | 
						|||
| 
								 | 
							
								     supports the given SIMD extensions, but FFTW will try to detect at
							 | 
						|||
| 
								 | 
							
								     runtime whether the CPU supports these extensions.  That is, you
							 | 
						|||
| 
								 | 
							
								     can compile with'--enable-avx' and the code will still run on a CPU
							 | 
						|||
| 
								 | 
							
								     without AVX support.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								        - These options require a compiler supporting SIMD extensions,
							 | 
						|||
| 
								 | 
							
								          and compiler support is always a bit flaky: see the FFTW FAQ
							 | 
						|||
| 
								 | 
							
								          for a list of compiler versions that have problems compiling
							 | 
						|||
| 
								 | 
							
								          FFTW.
							 | 
						|||
| 
								 | 
							
								        - Because of the large variety of ARM processors and ABIs, FFTW
							 | 
						|||
| 
								 | 
							
								          does not attempt to guess the correct 'gcc' flags for
							 | 
						|||
| 
								 | 
							
								          generating NEON code.  In general, you will have to provide
							 | 
						|||
| 
								 | 
							
								          them on the command line.  This command line is known to have
							 | 
						|||
| 
								 | 
							
								          worked at least once:
							 | 
						|||
| 
								 | 
							
								               ./configure --with-slow-timer --host=arm-linux-gnueabi \
							 | 
						|||
| 
								 | 
							
								                 --enable-single --enable-neon \
							 | 
						|||
| 
								 | 
							
								                 "CC=arm-linux-gnueabi-gcc -march=armv7-a -mfloat-abi=softfp"
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   To force 'configure' to use a particular C compiler foo (instead of
							 | 
						|||
| 
								 | 
							
								the default, usually 'gcc'), pass 'CC='foo to the 'configure' script;
							 | 
						|||
| 
								 | 
							
								you may also need to set the flags via the variable 'CFLAGS' as
							 | 
						|||
| 
								 | 
							
								described above.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Installation on non-Unix systems,  Next: Cycle Counters,  Prev: Installation on Unix,  Up: Installation and Customization
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								10.2 Installation on non-Unix systems
							 | 
						|||
| 
								 | 
							
								=====================================
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								It should be relatively straightforward to compile FFTW even on non-Unix
							 | 
						|||
| 
								 | 
							
								systems lacking the niceties of a 'configure' script.  Basically, you
							 | 
						|||
| 
								 | 
							
								need to edit the 'config.h' header (copy it from 'config.h.in') to
							 | 
						|||
| 
								 | 
							
								'#define' the various options and compiler characteristics, and then
							 | 
						|||
| 
								 | 
							
								compile all the '.c' files in the relevant directories.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The 'config.h' header contains about 100 options to set, each one
							 | 
						|||
| 
								 | 
							
								initially an '#undef', each documented with a comment, and most of them
							 | 
						|||
| 
								 | 
							
								fairly obvious.  For most of the options, you should simply '#define'
							 | 
						|||
| 
								 | 
							
								them to '1' if they are applicable, although a few options require a
							 | 
						|||
| 
								 | 
							
								particular value (e.g.  'SIZEOF_LONG_LONG' should be defined to the size
							 | 
						|||
| 
								 | 
							
								of the 'long long' type, in bytes, or zero if it is not supported).  We
							 | 
						|||
| 
								 | 
							
								will likely post some sample 'config.h' files for various operating
							 | 
						|||
| 
								 | 
							
								systems and compilers for you to use (at least as a starting point).
							 | 
						|||
| 
								 | 
							
								Please let us know if you have to hand-create a configuration file
							 | 
						|||
| 
								 | 
							
								(and/or a pre-compiled binary) that you want to share.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   To create the FFTW library, you will then need to compile all of the
							 | 
						|||
| 
								 | 
							
								'.c' files in the 'kernel', 'dft', 'dft/scalar', 'dft/scalar/codelets',
							 | 
						|||
| 
								 | 
							
								'rdft', 'rdft/scalar', 'rdft/scalar/r2cf', 'rdft/scalar/r2cb',
							 | 
						|||
| 
								 | 
							
								'rdft/scalar/r2r', 'reodft', and 'api' directories.  If you are
							 | 
						|||
| 
								 | 
							
								compiling with SIMD support (e.g.  you defined 'HAVE_SSE2' in
							 | 
						|||
| 
								 | 
							
								'config.h'), then you also need to compile the '.c' files in the
							 | 
						|||
| 
								 | 
							
								'simd-support', '{dft,rdft}/simd', '{dft,rdft}/simd/*' directories.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Once these files are all compiled, link them into a library, or a
							 | 
						|||
| 
								 | 
							
								shared library, or directly into your program.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   To compile the FFTW test program, additionally compile the code in
							 | 
						|||
| 
								 | 
							
								the 'libbench2/' directory, and link it into a library.  Then compile
							 | 
						|||
| 
								 | 
							
								the code in the 'tests/' directory and link it to the 'libbench2' and
							 | 
						|||
| 
								 | 
							
								FFTW libraries.  To compile the 'fftw-wisdom' (command-line) tool (*note
							 | 
						|||
| 
								 | 
							
								Wisdom Utilities::), compile 'tools/fftw-wisdom.c' and link it to the
							 | 
						|||
| 
								 | 
							
								'libbench2' and FFTW libraries
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Cycle Counters,  Next: Generating your own code,  Prev: Installation on non-Unix systems,  Up: Installation and Customization
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								10.3 Cycle Counters
							 | 
						|||
| 
								 | 
							
								===================
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								FFTW's planner actually executes and times different possible FFT
							 | 
						|||
| 
								 | 
							
								algorithms in order to pick the fastest plan for a given n.  In order to
							 | 
						|||
| 
								 | 
							
								do this in as short a time as possible, however, the timer must have a
							 | 
						|||
| 
								 | 
							
								very high resolution, and to accomplish this we employ the hardware
							 | 
						|||
| 
								 | 
							
								"cycle counters" that are available on most CPUs.  Currently, FFTW
							 | 
						|||
| 
								 | 
							
								supports the cycle counters on x86, PowerPC/POWER, Alpha, UltraSPARC
							 | 
						|||
| 
								 | 
							
								(SPARC v9), IA64, PA-RISC, and MIPS processors.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Access to the cycle counters, unfortunately, is a compiler and/or
							 | 
						|||
| 
								 | 
							
								operating-system dependent task, often requiring inline assembly
							 | 
						|||
| 
								 | 
							
								language, and it may be that your compiler is not supported.  If you are
							 | 
						|||
| 
								 | 
							
								_not_ supported, FFTW will by default fall back on its estimator
							 | 
						|||
| 
								 | 
							
								(effectively using 'FFTW_ESTIMATE' for all plans).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   You can add support by editing the file 'kernel/cycle.h'; normally,
							 | 
						|||
| 
								 | 
							
								this will involve adapting one of the examples already present in order
							 | 
						|||
| 
								 | 
							
								to use the inline-assembler syntax for your C compiler, and will only
							 | 
						|||
| 
								 | 
							
								require a couple of lines of code.  Anyone adding support for a new
							 | 
						|||
| 
								 | 
							
								system to 'cycle.h' is encouraged to email us at <fftw@fftw.org>.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   If a cycle counter is not available on your system (e.g.  some
							 | 
						|||
| 
								 | 
							
								embedded processor), and you don't want to use estimated plans, as a
							 | 
						|||
| 
								 | 
							
								last resort you can use the '--with-slow-timer' option to 'configure'
							 | 
						|||
| 
								 | 
							
								(on Unix) or '#define WITH_SLOW_TIMER' in 'config.h' (elsewhere).  This
							 | 
						|||
| 
								 | 
							
								will use the much lower-resolution 'gettimeofday' function, or even
							 | 
						|||
| 
								 | 
							
								'clock' if the former is unavailable, and planning will be extremely
							 | 
						|||
| 
								 | 
							
								slow.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Generating your own code,  Prev: Cycle Counters,  Up: Installation and Customization
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								10.4 Generating your own code
							 | 
						|||
| 
								 | 
							
								=============================
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								The directory 'genfft' contains the programs that were used to generate
							 | 
						|||
| 
								 | 
							
								FFTW's "codelets," which are hard-coded transforms of small sizes.  We
							 | 
						|||
| 
								 | 
							
								do not expect casual users to employ the generator, which is a rather
							 | 
						|||
| 
								 | 
							
								sophisticated program that generates directed acyclic graphs of FFT
							 | 
						|||
| 
								 | 
							
								algorithms and performs algebraic simplifications on them.  It was
							 | 
						|||
| 
								 | 
							
								written in Objective Caml, a dialect of ML, which is available at
							 | 
						|||
| 
								 | 
							
								<http://caml.inria.fr/ocaml/index.en.html>.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   If you have Objective Caml installed (along with recent versions of
							 | 
						|||
| 
								 | 
							
								GNU 'autoconf', 'automake', and 'libtool'), then you can change the set
							 | 
						|||
| 
								 | 
							
								of codelets that are generated or play with the generation options.  The
							 | 
						|||
| 
								 | 
							
								set of generated codelets is specified by the
							 | 
						|||
| 
								 | 
							
								'{dft,rdft}/{codelets,simd}/*/Makefile.am' files.  For example, you can
							 | 
						|||
| 
								 | 
							
								add efficient REDFT codelets of small sizes by modifying
							 | 
						|||
| 
								 | 
							
								'rdft/codelets/r2r/Makefile.am'.  After you modify any 'Makefile.am'
							 | 
						|||
| 
								 | 
							
								files, you can type 'sh bootstrap.sh' in the top-level directory
							 | 
						|||
| 
								 | 
							
								followed by 'make' to re-generate the files.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   We do not provide more details about the code-generation process,
							 | 
						|||
| 
								 | 
							
								since we do not expect that most users will need to generate their own
							 | 
						|||
| 
								 | 
							
								code.  However, feel free to contact us at <fftw@fftw.org> if you are
							 | 
						|||
| 
								 | 
							
								interested in the subject.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   You might find it interesting to learn Caml and/or some modern
							 | 
						|||
| 
								 | 
							
								programming techniques that we used in the generator (including monadic
							 | 
						|||
| 
								 | 
							
								programming), especially if you heard the rumor that Java and
							 | 
						|||
| 
								 | 
							
								object-oriented programming are the latest advancement in the field.
							 | 
						|||
| 
								 | 
							
								The internal operation of the codelet generator is described in the
							 | 
						|||
| 
								 | 
							
								paper, "A Fast Fourier Transform Compiler," by M. Frigo, which is
							 | 
						|||
| 
								 | 
							
								available from the FFTW home page (http://www.fftw.org) and also
							 | 
						|||
| 
								 | 
							
								appeared in the 'Proceedings of the 1999 ACM SIGPLAN Conference on
							 | 
						|||
| 
								 | 
							
								Programming Language Design and Implementation (PLDI)'.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: Acknowledgments,  Next: License and Copyright,  Prev: Installation and Customization,  Up: Top
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								11 Acknowledgments
							 | 
						|||
| 
								 | 
							
								******************
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								Matteo Frigo was supported in part by the Special Research Program SFB
							 | 
						|||
| 
								 | 
							
								F011 "AURORA" of the Austrian Science Fund FWF and by MIT Lincoln
							 | 
						|||
| 
								 | 
							
								Laboratory.  For previous versions of FFTW, he was supported in part by
							 | 
						|||
| 
								 | 
							
								the Defense Advanced Research Projects Agency (DARPA), under Grants
							 | 
						|||
| 
								 | 
							
								N00014-94-1-0985 and F30602-97-1-0270, and by a Digital Equipment
							 | 
						|||
| 
								 | 
							
								Corporation Fellowship.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Steven G. Johnson was supported in part by a Dept. of Defense NDSEG
							 | 
						|||
| 
								 | 
							
								Fellowship, an MIT Karl Taylor Compton Fellowship, and by the Materials
							 | 
						|||
| 
								 | 
							
								Research Science and Engineering Center program of the National Science
							 | 
						|||
| 
								 | 
							
								Foundation under award DMR-9400334.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Code for the Cell Broadband Engine was graciously donated to the FFTW
							 | 
						|||
| 
								 | 
							
								project by the IBM Austin Research Lab and included in fftw-3.2.  (This
							 | 
						|||
| 
								 | 
							
								code was removed in fftw-3.3.)
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Code for the MIPS paired-single SIMD support was graciously donated
							 | 
						|||
| 
								 | 
							
								to the FFTW project by CodeSourcery, Inc.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   We are grateful to Sun Microsystems Inc. for its donation of a
							 | 
						|||
| 
								 | 
							
								cluster of 9 8-processor Ultra HPC 5000 SMPs (24 Gflops peak).  These
							 | 
						|||
| 
								 | 
							
								machines served as the primary platform for the development of early
							 | 
						|||
| 
								 | 
							
								versions of FFTW.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   We thank Intel Corporation for donating a four-processor Pentium Pro
							 | 
						|||
| 
								 | 
							
								machine.  We thank the GNU/Linux community for giving us a decent OS to
							 | 
						|||
| 
								 | 
							
								run on that machine.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   We are thankful to the AMD corporation for donating an AMD Athlon XP
							 | 
						|||
| 
								 | 
							
								1700+ computer to the FFTW project.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   We thank the Compaq/HP testdrive program and VA Software Corporation
							 | 
						|||
| 
								 | 
							
								(SourceForge.net) for providing remote access to machines that were used
							 | 
						|||
| 
								 | 
							
								to test FFTW.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The 'genfft' suite of code generators was written using Objective
							 | 
						|||
| 
								 | 
							
								Caml, a dialect of ML. Objective Caml is a small and elegant language
							 | 
						|||
| 
								 | 
							
								developed by Xavier Leroy.  The implementation is available from
							 | 
						|||
| 
								 | 
							
								'http://caml.inria.fr/' (http://caml.inria.fr/).  In previous releases
							 | 
						|||
| 
								 | 
							
								of FFTW, 'genfft' was written in Caml Light, by the same authors.  An
							 | 
						|||
| 
								 | 
							
								even earlier implementation of 'genfft' was written in Scheme, but Caml
							 | 
						|||
| 
								 | 
							
								is definitely better for this kind of application.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   FFTW uses many tools from the GNU project, including 'automake',
							 | 
						|||
| 
								 | 
							
								'texinfo', and 'libtool'.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Prof. Charles E. Leiserson of MIT provided continuous support and
							 | 
						|||
| 
								 | 
							
								encouragement.  This program would not exist without him.  Charles also
							 | 
						|||
| 
								 | 
							
								proposed the name "codelets" for the basic FFT blocks.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Prof. John D. Joannopoulos of MIT demonstrated continuing tolerance
							 | 
						|||
| 
								 | 
							
								of Steven's "extra-curricular" computer-science activities, as well as
							 | 
						|||
| 
								 | 
							
								remarkable creativity in working them into his grant proposals.
							 | 
						|||
| 
								 | 
							
								Steven's physics degree would not exist without him.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Franz Franchetti wrote SIMD extensions to FFTW 2, which eventually
							 | 
						|||
| 
								 | 
							
								led to the SIMD support in FFTW 3.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Stefan Kral wrote most of the K7 code generator distributed with FFTW
							 | 
						|||
| 
								 | 
							
								3.0.x and 3.1.x.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Andrew Sterian contributed the Windows timing code in FFTW 2.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Didier Miras reported a bug in the test procedure used in FFTW 1.2.
							 | 
						|||
| 
								 | 
							
								We now use a completely different test algorithm by Funda Ergun that
							 | 
						|||
| 
								 | 
							
								does not require a separate FFT program to compare against.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Wolfgang Reimer contributed the Pentium cycle counter and a few fixes
							 | 
						|||
| 
								 | 
							
								that help portability.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Ming-Chang Liu uncovered a well-hidden bug in the complex transforms
							 | 
						|||
| 
								 | 
							
								of FFTW 2.0 and supplied a patch to correct it.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   The FFTW FAQ was written in 'bfnn' (Bizarre Format With No Name) and
							 | 
						|||
| 
								 | 
							
								formatted using the tools developed by Ian Jackson for the Linux FAQ.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   _We are especially thankful to all of our users for their continuing
							 | 
						|||
| 
								 | 
							
								support, feedback, and interest during our development of FFTW._
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								File: fftw3.info,  Node: License and Copyright,  Next: Concept Index,  Prev: Acknowledgments,  Up: Top
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								12 License and Copyright
							 | 
						|||
| 
								 | 
							
								************************
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								FFTW is Copyright (C) 2003, 2007-11 Matteo Frigo, Copyright (C) 2003,
							 | 
						|||
| 
								 | 
							
								2007-11 Massachusetts Institute of Technology.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   FFTW is free software; you can redistribute it and/or modify it under
							 | 
						|||
| 
								 | 
							
								the terms of the GNU General Public License as published by the Free
							 | 
						|||
| 
								 | 
							
								Software Foundation; either version 2 of the License, or (at your
							 | 
						|||
| 
								 | 
							
								option) any later version.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   This program is distributed in the hope that it will be useful, but
							 | 
						|||
| 
								 | 
							
								WITHOUT ANY WARRANTY; without even the implied warranty of
							 | 
						|||
| 
								 | 
							
								MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
							 | 
						|||
| 
								 | 
							
								Public License for more details.
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   You should have received a copy of the GNU General Public License
							 | 
						|||
| 
								 | 
							
								along with this program; if not, write to the Free Software Foundation,
							 | 
						|||
| 
								 | 
							
								Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA You can
							 | 
						|||
| 
								 | 
							
								also find the GPL on the GNU web site
							 | 
						|||
| 
								 | 
							
								(http://www.gnu.org/licenses/gpl-2.0.html).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   In addition, we kindly ask you to acknowledge FFTW and its authors in
							 | 
						|||
| 
								 | 
							
								any program or publication in which you use FFTW. (You are not
							 | 
						|||
| 
								 | 
							
								_required_ to do so; it is up to your common sense to decide whether you
							 | 
						|||
| 
								 | 
							
								want to comply with this request or not.)  For general publications, we
							 | 
						|||
| 
								 | 
							
								suggest referencing: Matteo Frigo and Steven G. Johnson, "The design and
							 | 
						|||
| 
								 | 
							
								implementation of FFTW3," Proc.  IEEE 93 (2), 216-231 (2005).
							 | 
						|||
| 
								 | 
							
								
							 | 
						|||
| 
								 | 
							
								   Non-free versions of FFTW are available under terms different from
							 | 
						|||
| 
								 | 
							
								those of the General Public License.  (e.g.  they do not require you to
							 | 
						|||
| 
								 | 
							
								accompany any object code using FFTW with the corresponding source
							 | 
						|||
| 
								 | 
							
								code.)  For these alternative terms you must purchase a license from
							 | 
						|||
| 
								 | 
							
								MIT's Technology Licensing Office.  Users interested in such a license
							 | 
						|||
| 
								 | 
							
								should contact us (<fftw@fftw.org>) for more information.
							 | 
						|||
| 
								 | 
							
								
							 |