400 lines
		
	
	
		
			16 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
			
		
		
	
	
			400 lines
		
	
	
		
			16 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
| @node Other Important Topics, FFTW Reference, Tutorial, Top
 | |
| @chapter Other Important Topics
 | |
| @menu
 | |
| * SIMD alignment and fftw_malloc::
 | |
| * Multi-dimensional Array Format::
 | |
| * Words of Wisdom-Saving Plans::
 | |
| * Caveats in Using Wisdom::
 | |
| @end menu
 | |
| 
 | |
| @c ------------------------------------------------------------
 | |
| @node SIMD alignment and fftw_malloc, Multi-dimensional Array Format, Other Important Topics, Other Important Topics
 | |
| @section SIMD alignment and fftw_malloc
 | |
| 
 | |
| SIMD, which stands for ``Single Instruction Multiple Data,'' is a set of
 | |
| special operations supported by some processors to perform a single
 | |
| operation on several numbers (usually 2 or 4) simultaneously.  SIMD
 | |
| floating-point instructions are available on several popular CPUs:
 | |
| SSE/SSE2/AVX/AVX2/AVX512/KCVI on some x86/x86-64 processors, AltiVec and
 | |
| VSX on some POWER/PowerPCs, NEON on some ARM models.  FFTW can be
 | |
| compiled to support the SIMD instructions on any of these systems.
 | |
| @cindex SIMD
 | |
| @cindex SSE
 | |
| @cindex SSE2
 | |
| @cindex AVX
 | |
| @cindex AVX2
 | |
| @cindex AVX512
 | |
| @cindex AltiVec
 | |
| @cindex VSX
 | |
| @cindex precision
 | |
| 
 | |
| 
 | |
| A program linking to an FFTW library compiled with SIMD support can
 | |
| obtain a nonnegligible speedup for most complex and r2c/c2r
 | |
| transforms.  In order to obtain this speedup, however, the arrays of
 | |
| complex (or real) data passed to FFTW must be specially aligned in
 | |
| memory (typically 16-byte aligned), and often this alignment is more
 | |
| stringent than that provided by the usual @code{malloc} (etc.)
 | |
| allocation routines.
 | |
| 
 | |
| @cindex portability
 | |
| In order to guarantee proper alignment for SIMD, therefore, in case
 | |
| your program is ever linked against a SIMD-using FFTW, we recommend
 | |
| allocating your transform data with @code{fftw_malloc} and
 | |
| de-allocating it with @code{fftw_free}.
 | |
| @findex fftw_malloc
 | |
| @findex fftw_free
 | |
| These have exactly the same interface and behavior as
 | |
| @code{malloc}/@code{free}, except that for a SIMD FFTW they ensure
 | |
| that the returned pointer has the necessary alignment (by calling
 | |
| @code{memalign} or its equivalent on your OS).
 | |
| 
 | |
| You are not @emph{required} to use @code{fftw_malloc}.  You can
 | |
| allocate your data in any way that you like, from @code{malloc} to
 | |
| @code{new} (in C++) to a fixed-size array declaration.  If the array
 | |
| happens not to be properly aligned, FFTW will not use the SIMD
 | |
| extensions.
 | |
| @cindex C++
 | |
| 
 | |
| @findex fftw_alloc_real
 | |
| @findex fftw_alloc_complex
 | |
| Since @code{fftw_malloc} only ever needs to be used for real and
 | |
| complex arrays, we provide two convenient wrapper routines
 | |
| @code{fftw_alloc_real(N)} and @code{fftw_alloc_complex(N)} that are
 | |
| equivalent to @code{(double*)fftw_malloc(sizeof(double) * N)} and
 | |
| @code{(fftw_complex*)fftw_malloc(sizeof(fftw_complex) * N)},
 | |
| respectively (or their equivalents in other precisions).
 | |
| 
 | |
| @c ------------------------------------------------------------
 | |
| @node Multi-dimensional Array Format, Words of Wisdom-Saving Plans, SIMD alignment and fftw_malloc, Other Important Topics
 | |
| @section Multi-dimensional Array Format
 | |
| 
 | |
| This section describes the format in which multi-dimensional arrays
 | |
| are stored in FFTW.  We felt that a detailed discussion of this topic
 | |
| was necessary.  Since several different formats are common, this topic
 | |
| is often a source of confusion.
 | |
| 
 | |
| @menu
 | |
| * Row-major Format::
 | |
| * Column-major Format::
 | |
| * Fixed-size Arrays in C::
 | |
| * Dynamic Arrays in C::
 | |
| * Dynamic Arrays in C-The Wrong Way::
 | |
| @end menu
 | |
| 
 | |
| @c =========>
 | |
| @node Row-major Format, Column-major Format, Multi-dimensional Array Format, Multi-dimensional Array Format
 | |
| @subsection Row-major Format
 | |
| @cindex row-major
 | |
| 
 | |
| The multi-dimensional arrays passed to @code{fftw_plan_dft} etcetera
 | |
| are expected to be stored as a single contiguous block in
 | |
| @dfn{row-major} order (sometimes called ``C order'').  Basically, this
 | |
| means that as you step through adjacent memory locations, the first
 | |
| dimension's index varies most slowly and the last dimension's index
 | |
| varies most quickly.
 | |
| 
 | |
| To be more explicit, let us consider an array of rank @math{d} whose
 | |
| dimensions are @ndims{}. Now, we specify a location in the array by a
 | |
| sequence of @math{d} (zero-based) indices, one for each dimension:
 | |
| @tex
 | |
| $(i_0, i_1, i_2, \ldots, i_{d-1})$.
 | |
| @end tex
 | |
| @ifinfo
 | |
| (i[0], i[1], ..., i[d-1]).
 | |
| @end ifinfo
 | |
| @html
 | |
| (i<sub>0</sub>, i<sub>1</sub>, i<sub>2</sub>,..., i<sub>d-1</sub>).
 | |
| @end html
 | |
| If the array is stored in row-major
 | |
| order, then this element is located at the position
 | |
| @tex
 | |
| $i_{d-1} + n_{d-1} (i_{d-2} + n_{d-2} (\ldots + n_1 i_0))$.
 | |
| @end tex
 | |
| @ifinfo
 | |
| i[d-1] + n[d-1] * (i[d-2] + n[d-2] * (... + n[1] * i[0])).
 | |
| @end ifinfo
 | |
| @html
 | |
| i<sub>d-1</sub> + n<sub>d-1</sub> * (i<sub>d-2</sub> + n<sub>d-2</sub> * (... + n<sub>1</sub> * i<sub>0</sub>)).
 | |
| @end html
 | |
| 
 | |
| Note that, for the ordinary complex DFT, each element of the array
 | |
| must be of type @code{fftw_complex}; i.e. a (real, imaginary) pair of
 | |
| (double-precision) numbers. 
 | |
| 
 | |
| In the advanced FFTW interface, the physical dimensions @math{n} from
 | |
| which the indices are computed can be different from (larger than)
 | |
| the logical dimensions of the transform to be computed, in order to
 | |
| transform a subset of a larger array.
 | |
| @cindex advanced interface
 | |
| Note also that, in the advanced interface, the expression above is
 | |
| multiplied by a @dfn{stride} to get the actual array index---this is
 | |
| useful in situations where each element of the multi-dimensional array
 | |
| is actually a data structure (or another array), and you just want to
 | |
| transform a single field. In the basic interface, however, the stride
 | |
| is 1.
 | |
| @cindex stride
 | |
| 
 | |
| @c =========>
 | |
| @node Column-major Format, Fixed-size Arrays in C, Row-major Format, Multi-dimensional Array Format
 | |
| @subsection Column-major Format
 | |
| @cindex column-major
 | |
| 
 | |
| Readers from the Fortran world are used to arrays stored in
 | |
| @dfn{column-major} order (sometimes called ``Fortran order'').  This is
 | |
| essentially the exact opposite of row-major order in that, here, the
 | |
| @emph{first} dimension's index varies most quickly.
 | |
| 
 | |
| If you have an array stored in column-major order and wish to
 | |
| transform it using FFTW, it is quite easy to do.  When creating the
 | |
| plan, simply pass the dimensions of the array to the planner in
 | |
| @emph{reverse order}.  For example, if your array is a rank three
 | |
| @code{N x M x L} matrix in column-major order, you should pass the
 | |
| dimensions of the array as if it were an @code{L x M x N} matrix
 | |
| (which it is, from the perspective of FFTW).  This is done for you
 | |
| @emph{automatically} by the FFTW legacy-Fortran interface
 | |
| (@pxref{Calling FFTW from Legacy Fortran}), but you must do it
 | |
| manually with the modern Fortran interface (@pxref{Reversing array
 | |
| dimensions}).
 | |
| @cindex Fortran interface
 | |
| 
 | |
| @c =========>
 | |
| @node Fixed-size Arrays in C, Dynamic Arrays in C, Column-major Format, Multi-dimensional Array Format
 | |
| @subsection Fixed-size Arrays in C
 | |
| @cindex C multi-dimensional arrays
 | |
| 
 | |
| A multi-dimensional array whose size is declared at compile time in C
 | |
| is @emph{already} in row-major order.  You don't have to do anything
 | |
| special to transform it.  For example:
 | |
| 
 | |
| @example
 | |
| @{
 | |
|      fftw_complex data[N0][N1][N2];
 | |
|      fftw_plan plan;
 | |
|      ...
 | |
|      plan = fftw_plan_dft_3d(N0, N1, N2, &data[0][0][0], &data[0][0][0],
 | |
|                              FFTW_FORWARD, FFTW_ESTIMATE);
 | |
|      ...
 | |
| @}
 | |
| @end example
 | |
| 
 | |
| This will plan a 3d in-place transform of size @code{N0 x N1 x N2}.
 | |
| Notice how we took the address of the zero-th element to pass to the
 | |
| planner (we could also have used a typecast).
 | |
| 
 | |
| However, we tend to @emph{discourage} users from declaring their
 | |
| arrays in this way, for two reasons.  First, this allocates the array
 | |
| on the stack (``automatic'' storage), which has a very limited size on
 | |
| most operating systems (declaring an array with more than a few
 | |
| thousand elements will often cause a crash).  (You can get around this
 | |
| limitation on many systems by declaring the array as
 | |
| @code{static} and/or global, but that has its own drawbacks.)
 | |
| Second, it may not optimally align the array for use with a SIMD
 | |
| FFTW (@pxref{SIMD alignment and fftw_malloc}).  Instead, we recommend
 | |
| using @code{fftw_malloc}, as described below.
 | |
| 
 | |
| @c =========>
 | |
| @node Dynamic Arrays in C, Dynamic Arrays in C-The Wrong Way, Fixed-size Arrays in C, Multi-dimensional Array Format
 | |
| @subsection Dynamic Arrays in C
 | |
| 
 | |
| We recommend allocating most arrays dynamically, with
 | |
| @code{fftw_malloc}.  This isn't too hard to do, although it is not as
 | |
| straightforward for multi-dimensional arrays as it is for
 | |
| one-dimensional arrays.
 | |
| 
 | |
| Creating the array is simple: using a dynamic-allocation routine like
 | |
| @code{fftw_malloc}, allocate an array big enough to store N
 | |
| @code{fftw_complex} values (for a complex DFT), where N is the product
 | |
| of the sizes of the array dimensions (i.e. the total number of complex
 | |
| values in the array).  For example, here is code to allocate a
 | |
| @threedims{5,12,27} rank-3 array:
 | |
| @findex fftw_malloc
 | |
| 
 | |
| @example
 | |
| fftw_complex *an_array;
 | |
| an_array = (fftw_complex*) fftw_malloc(5*12*27 * sizeof(fftw_complex));
 | |
| @end example
 | |
| 
 | |
| Accessing the array elements, however, is more tricky---you can't
 | |
| simply use multiple applications of the @samp{[]} operator like you
 | |
| could for fixed-size arrays.  Instead, you have to explicitly compute
 | |
| the offset into the array using the formula given earlier for
 | |
| row-major arrays.  For example, to reference the @math{(i,j,k)}-th
 | |
| element of the array allocated above, you would use the expression
 | |
| @code{an_array[k + 27 * (j + 12 * i)]}.
 | |
| 
 | |
| This pain can be alleviated somewhat by defining appropriate macros,
 | |
| or, in C++, creating a class and overloading the @samp{()} operator.
 | |
| The recent C99 standard provides a way to reinterpret the dynamic
 | |
| array as a ``variable-length'' multi-dimensional array amenable to
 | |
| @samp{[]}, but this feature is not yet widely supported by compilers.
 | |
| @cindex C99
 | |
| @cindex C++
 | |
| 
 | |
| @c =========>
 | |
| @node Dynamic Arrays in C-The Wrong Way,  , Dynamic Arrays in C, Multi-dimensional Array Format
 | |
| @subsection Dynamic Arrays in C---The Wrong Way
 | |
| 
 | |
| A different method for allocating multi-dimensional arrays in C is
 | |
| often suggested that is incompatible with FFTW: @emph{using it will
 | |
| cause FFTW to die a painful death}.  We discuss the technique here,
 | |
| however, because it is so commonly known and used.  This method is to
 | |
| create arrays of pointers of arrays of pointers of @dots{}etcetera.
 | |
| For example, the analogue in this method to the example above is:
 | |
| 
 | |
| @example
 | |
| int i,j;
 | |
| fftw_complex ***a_bad_array;  /* @r{another way to make a 5x12x27 array} */
 | |
| 
 | |
| a_bad_array = (fftw_complex ***) malloc(5 * sizeof(fftw_complex **));
 | |
| for (i = 0; i < 5; ++i) @{
 | |
|      a_bad_array[i] = 
 | |
|         (fftw_complex **) malloc(12 * sizeof(fftw_complex *));
 | |
|      for (j = 0; j < 12; ++j)
 | |
|           a_bad_array[i][j] =
 | |
|                 (fftw_complex *) malloc(27 * sizeof(fftw_complex));
 | |
| @}
 | |
| @end example
 | |
| 
 | |
| As you can see, this sort of array is inconvenient to allocate (and
 | |
| deallocate).  On the other hand, it has the advantage that the
 | |
| @math{(i,j,k)}-th element can be referenced simply by
 | |
| @code{a_bad_array[i][j][k]}.
 | |
| 
 | |
| If you like this technique and want to maximize convenience in accessing
 | |
| the array, but still want to pass the array to FFTW, you can use a
 | |
| hybrid method.  Allocate the array as one contiguous block, but also
 | |
| declare an array of arrays of pointers that point to appropriate places
 | |
| in the block.  That sort of trick is beyond the scope of this
 | |
| documentation; for more information on multi-dimensional arrays in C,
 | |
| see the @code{comp.lang.c}
 | |
| @uref{http://c-faq.com/aryptr/dynmuldimary.html, FAQ}.
 | |
| 
 | |
| @c ------------------------------------------------------------
 | |
| @node Words of Wisdom-Saving Plans, Caveats in Using Wisdom, Multi-dimensional Array Format, Other Important Topics
 | |
| @section Words of Wisdom---Saving Plans
 | |
| @cindex wisdom
 | |
| @cindex saving plans to disk
 | |
| 
 | |
| FFTW implements a method for saving plans to disk and restoring them.
 | |
| In fact, what FFTW does is more general than just saving and loading
 | |
| plans.  The mechanism is called @dfn{wisdom}.  Here, we describe
 | |
| this feature at a high level. @xref{FFTW Reference}, for a less casual
 | |
| but more complete discussion of how to use wisdom in FFTW.
 | |
| 
 | |
| Plans created with the @code{FFTW_MEASURE}, @code{FFTW_PATIENT}, or
 | |
| @code{FFTW_EXHAUSTIVE} options produce near-optimal FFT performance,
 | |
| but may require a long time to compute because FFTW must measure the
 | |
| runtime of many possible plans and select the best one.  This setup is
 | |
| designed for the situations where so many transforms of the same size
 | |
| must be computed that the start-up time is irrelevant.  For short
 | |
| initialization times, but slower transforms, we have provided
 | |
| @code{FFTW_ESTIMATE}.  The @code{wisdom} mechanism is a way to get the
 | |
| best of both worlds: you compute a good plan once, save it to
 | |
| disk, and later reload it as many times as necessary.  The wisdom
 | |
| mechanism can actually save and reload many plans at once, not just
 | |
| one.
 | |
| @ctindex FFTW_MEASURE
 | |
| @ctindex FFTW_PATIENT
 | |
| @ctindex FFTW_EXHAUSTIVE
 | |
| @ctindex FFTW_ESTIMATE
 | |
| 
 | |
| 
 | |
| Whenever you create a plan, the FFTW planner accumulates wisdom, which
 | |
| is information sufficient to reconstruct the plan.  After planning,
 | |
| you can save this information to disk by means of the function:
 | |
| @example
 | |
| int fftw_export_wisdom_to_filename(const char *filename);
 | |
| @end example
 | |
| @findex fftw_export_wisdom_to_filename
 | |
| (This function returns non-zero on success.)
 | |
| 
 | |
| The next time you run the program, you can restore the wisdom with
 | |
| @code{fftw_import_wisdom_from_filename} (which also returns non-zero on success),
 | |
| and then recreate the plan using the same flags as before.
 | |
| @example
 | |
| int fftw_import_wisdom_from_filename(const char *filename);
 | |
| @end example
 | |
| @findex fftw_import_wisdom_from_filename
 | |
| 
 | |
| Wisdom is automatically used for any size to which it is applicable, as
 | |
| long as the planner flags are not more ``patient'' than those with which
 | |
| the wisdom was created.  For example, wisdom created with
 | |
| @code{FFTW_MEASURE} can be used if you later plan with
 | |
| @code{FFTW_ESTIMATE} or @code{FFTW_MEASURE}, but not with
 | |
| @code{FFTW_PATIENT}.
 | |
| 
 | |
| The @code{wisdom} is cumulative, and is stored in a global, private
 | |
| data structure managed internally by FFTW.  The storage space required
 | |
| is minimal, proportional to the logarithm of the sizes the wisdom was
 | |
| generated from.  If memory usage is a concern, however, the wisdom can
 | |
| be forgotten and its associated memory freed by calling:
 | |
| @example
 | |
| void fftw_forget_wisdom(void);
 | |
| @end example
 | |
| @findex fftw_forget_wisdom
 | |
| 
 | |
| Wisdom can be exported to a file, a string, or any other medium.
 | |
| For details, see @ref{Wisdom}.
 | |
| 
 | |
| @node Caveats in Using Wisdom,  , Words of Wisdom-Saving Plans, Other Important Topics
 | |
| @section Caveats in Using Wisdom
 | |
| @cindex wisdom, problems with
 | |
| 
 | |
| @quotation
 | |
| @html
 | |
| <i>
 | |
| @end html
 | |
| For in much wisdom is much grief, and he that increaseth knowledge
 | |
| increaseth sorrow.
 | |
| @html
 | |
| </i>
 | |
| @end html
 | |
| [Ecclesiastes 1:18]
 | |
| @cindex Ecclesiastes
 | |
| @end quotation
 | |
| @iftex
 | |
| @medskip
 | |
| @end iftex
 | |
| 
 | |
| @cindex portability
 | |
| There are pitfalls to using wisdom, in that it can negate FFTW's
 | |
| ability to adapt to changing hardware and other conditions. For
 | |
| example, it would be perfectly possible to export wisdom from a
 | |
| program running on one processor and import it into a program running
 | |
| on another processor.  Doing so, however, would mean that the second
 | |
| program would use plans optimized for the first processor, instead of
 | |
| the one it is running on.
 | |
| 
 | |
| It should be safe to reuse wisdom as long as the hardware and program
 | |
| binaries remain unchanged. (Actually, the optimal plan may change even
 | |
| between runs of the same binary on identical hardware, due to
 | |
| differences in the virtual memory environment, etcetera.  Users
 | |
| seriously interested in performance should worry about this problem,
 | |
| too.)  It is likely that, if the same wisdom is used for two
 | |
| different program binaries, even running on the same machine, the
 | |
| plans may be sub-optimal because of differing code alignments.  It is
 | |
| therefore wise to recreate wisdom every time an application is
 | |
| recompiled.  The more the underlying hardware and software changes
 | |
| between the creation of wisdom and its use, the greater grows
 | |
| the risk of sub-optimal plans.
 | |
| 
 | |
| Nevertheless, if the choice is between using @code{FFTW_ESTIMATE} or
 | |
| using possibly-suboptimal wisdom (created on the same machine, but for a
 | |
| different binary), the wisdom is likely to be better.  For this reason,
 | |
| we provide a function to import wisdom from a standard system-wide
 | |
| location (@code{/etc/fftw/wisdom} on Unix):
 | |
| @cindex wisdom, system-wide
 | |
| 
 | |
| @example
 | |
| int fftw_import_system_wisdom(void);
 | |
| @end example
 | |
| @findex fftw_import_system_wisdom
 | |
| 
 | |
| FFTW also provides a standalone program, @code{fftw-wisdom} (described
 | |
| by its own @code{man} page on Unix) with which users can create wisdom,
 | |
| e.g. for a canonical set of sizes to store in the system wisdom file.
 | |
| @xref{Wisdom Utilities}.
 | |
| @cindex fftw-wisdom utility
 | |
| 
 | 
