FFTW 3.3.10:

* Fix bug that would cause 2-way SIMD (notably SSE2 in double precision)
  to attempt unaligned accesses in certain obscure cases, causing
  segfaults.

  The following test triggers the bug (SSE2, double precision):

    ./tests/bench -oexhaustive r4*2:5:3

  This test computes a pair of length-4 real->complex transforms where
  the second input is 5 real numbers away from the first input.  That
  is, there is a gap of one real number between the first and second
  input array.  The -oexhaustive level allows FFTW to attempt to
  compute this transform by reducing it to a pair of complex
  transforms of length 2, but now the second input is not aligned to a
  complex-number boundary.  The fact that 5 is odd is the problem.

  The bug cannot occur in complex->complex transforms because the
  complex interface accepts strides in units of complex numbers, so
  strides are aligned by construction.

  This bug has been around at least since fftw-3.1.2 (July 2006), and
  probably since fftw-3.0 (2003).

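The alignment arithmetic behind the 3.3.10 fix can be sketched as
follows.  This is an illustration in Python, not FFTW code: the helper
name is_complex_aligned is hypothetical, and the sketch assumes 8-byte
doubles and a first input that itself starts on a complex-number
boundary.

```python
# Illustration only (not FFTW code); assumes 8-byte doubles.
SIZEOF_DOUBLE = 8                    # bytes per double-precision real
SIZEOF_COMPLEX = 2 * SIZEOF_DOUBLE   # a complex number is a (re, im) pair


def is_complex_aligned(offset_in_reals: int) -> bool:
    """Return True if an array that starts `offset_in_reals` doubles past
    an aligned base address still starts on a complex-number boundary."""
    return (offset_in_reals * SIZEOF_DOUBLE) % SIZEOF_COMPLEX == 0


# The test above places the second input 5 reals after the first one.
print("first input aligned: ", is_complex_aligned(0))
print("second input aligned:", is_complex_aligned(5))
```

Any odd distance in reals leaves the second array halfway into a
complex number, which is why an even distance would not have triggered
the problem.
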
FFTW 3.3.9:

* New API fftw_planner_nthreads() returns the number of threads
  currently being used by the planner.

* Fix incorrect math in 128-bit generic SIMD.

* Fix wisdom for avx512.

  The avx512 alignment requirement was set to 64 bytes, but this is
  wrong.  Alignment requirements are a property of the platform (e.g.,
  x86) and not of the instruction set (e.g., AVX).  Among other
  things, this broke wisdom with avx512.

  Note that avx512 support is still experimental because the FFTW
  authors have no avx512 hardware available for testing.

* New fftw_threads_set_callback function to change the threading backend
  at runtime.

FFTW 3.3.8:

* Fixed AVX, AVX2 for gcc-8.

  By default, FFTW 3.3.7 was broken with gcc-8.  AVX and AVX2 code
  assumed that the compiler honors the distinction between +0 and -0,
  but gcc-8 -ffast-math does not.  The default CFLAGS included -ffast-math.
  This release ensures that FFTW works with gcc-8 -ffast-math, and
  removes -ffast-math from the default CFLAGS for good measure.

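The +0/-0 distinction mentioned for 3.3.8 can be illustrated outside
FFTW.  The sketch below is Python, not FFTW's AVX code.  Using -0.0 as
a sign-bit mask is a common SIMD idiom and is assumed here purely for
illustration; bits and flip_sign are hypothetical helpers.

```python
import struct


def bits(x: float) -> int:
    """Return the IEEE-754 binary64 bit pattern of x as an integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def flip_sign(x: float, mask_value: float = -0.0) -> float:
    """Negate x by XOR-ing its bits with the bits of mask_value."""
    flipped = bits(x) ^ bits(mask_value)
    return struct.unpack("<d", struct.pack("<Q", flipped))[0]


# +0 and -0 compare equal, yet only -0 has the sign bit set.
print(0.0 == -0.0, hex(bits(0.0)), hex(bits(-0.0)))
# With the -0 mask the sign flips; if the mask is treated as +0,
# the XOR changes nothing.
print(flip_sign(1.5), flip_sign(1.5, mask_value=0.0))
```

Code that relies on the bit pattern of -0 therefore breaks silently if
a compiler is allowed to treat the two zeros as interchangeable.
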
FFTW 3.3.7:

* Experimental support for CMake.

  The primary build mechanism for FFTW remains GNU autoconf/automake.
  CMake support is meant to offer an easy way to compile FFTW on
  Windows, and as such it does not cover all the features of the
  automake build system, such as exotic cycle counters,
  cross-compiling, or build of binaries for a mixture of ISAs
  (e.g., amd64 vs amd64+avx vs amd64+avx2).  Patches are welcome.

* Fixes for armv7a cycle counter.
* Official support for aarch64, now that we have hardware to test it.
* Tweak usage of FMA instructions in a way that favors newer processors
  (Skylake and Ryzen) over older processors (Haswell).
* tests/bench: use 64-bit precision to compute mflops.

FFTW 3.3.6-pl2:

* Bugfix: MPI Fortran-03 headers were missing in FFTW 3.3.6-pl1.

FFTW 3.3.6-pl1:

* Bugfix: FFTW 3.3.6 had the wrong libtool version number, and generated
  shared libraries of the form libfftw3.so.2.6.6 instead of
  libfftw3.so.3.*.

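How a libtool version triple turns into a file name such as
libfftw3.so.2.6.6 can be sketched as follows, assuming libtool's usual
Linux naming rule (major = current - age, followed by age and
revision).  The function name and the two current:revision:age triples
are hypothetical: they were chosen to reproduce the names in the note
above and are not taken from the FFTW build files.

```python
def linux_version_suffix(current: int, revision: int, age: int) -> str:
    """Map a libtool current:revision:age triple to the suffix that
    follows '.so.' on Linux (assumed rule: major = current - age)."""
    major = current - age
    return f"{major}.{age}.{revision}"


# Hypothetical triples: one that yields the broken name, and one that
# yields a name of the intended libfftw3.so.3.* form.
print("libfftw3.so." + linux_version_suffix(8, 6, 6))
print("libfftw3.so." + linux_version_suffix(8, 6, 5))
```

Under this rule a wrong age alone is enough to change the major
number, and with it the soname that programs link against.
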
FFTW 3.3.6:

* The fftw_make_planner_thread_safe() API introduced in 3.3.5 didn't
  work, and 3.3.6 fixes it.  Sorry about that.
* compilation fixes for IBM XLC
* compilation fixes for threads on Windows
* fix SIMD autodetection on amd64 when (_MSC_VER > 1500)

FFTW 3.3.5:

* New SIMD support:
  - Power8 VSX instructions in single and double precision.
    To use, add --enable-vsx to configure.
  - Support for AVX2 (256-bit FMA instructions).
    To use, add --enable-avx2 to configure.
  - Experimental support for AVX512 and KCVI. (--enable-avx512, --enable-kcvi)
    This code is expected to work but the FFTW maintainers do not have
    hardware to test it.
  - Support for AVX128/FMA (for some AMD machines) (--enable-avx128-fma)
  - Double precision Neon SIMD for aarch64.
    This code is expected to work but the FFTW maintainers do not have
    hardware to test it.
  - generic SIMD support using gcc vector intrinsics
* Add fftw_make_planner_thread_safe() API
* fix #18 (disable float128 for CUDACC)
* fix #19: missing Fortran interface for fftwq_alloc_real
* fix #21 (don't use float128 on Portland compilers, which pretend to be gcc)
* fix: Avoid segfaults due to double free in MPI transpose

* Special note for distribution maintainers: Although FFTW supports a
  zillion SIMD instruction sets, enabling them all at the same time is
  a bad idea, because it increases the planning time for minimal gain.
  We recommend that general-purpose x86 distributions only enable SSE2
  and perhaps AVX.  Users who care about the last ounce of performance
  should recompile FFTW themselves.

FFTW 3.3.4

* New functions fftw_alignment_of (to check whether two arrays are
  equally aligned for the purposes of applying a plan) and fftw_sprint_plan
  (to output a description of a plan to a string).

* Bugfix in fftw-wisdom-to-conf; thanks to Florian Oppermann for the
  bug report.

* Fixed manual to work with texinfo-5.

* Increased timing interval on x86_64 to reduce timing errors.

* Default to Win32 threads, not pthreads, if both are present.

* Various build-script fixes.

FFTW 3.3.3

* Fix deadlock bug in MPI transforms (thanks to Michael Pippig for the
  bug report and patch, and to Graham Dennis for the bug report).

* Use 128-bit ARM NEON instructions instead of 64-bit ones.  This change
  appears to speed up even ARM processors with a 64-bit NEON pipe.

* Speed improvements for single-precision AVX.

* Speed up planner on machines without "official" cycle counters, such as ARM.

FFTW 3.3.2

* Removed an archaic stack-alignment hack that was failing with
  gcc-4.7/i386.

* Added stack-alignment hack necessary for gcc on Windows/i386.  We
  will regret this in ten years (see previous change).

* Fix incompatibility with Intel icc which pretends to be gcc
  but does not support quad precision.

* make libfftw{threads,mpi} depend upon libfftw when using libtool;
  this is consistent with most other libraries and simplifies the life
  of various distributors of GNU/Linux.

FFTW 3.3.1

* Changes since 3.3.1-beta1:

  - Reduced planning time in estimate mode for sizes with large
    prime factors.

  - Added AVX autodetection under Visual Studio.  Thanks Carsten
    Steger for submitting the necessary code.

  - Modern Fortran interface now uses a separate fftw3l.f03 interface
    file for the long double interface, which is not supported by
    some Fortran compilers.  Provided new fftw3q.f03 interface file
    to access the quadruple-precision FFTW routines with recent
    versions of gcc/gfortran.

* Added support for the NEON extensions to the ARM ISA.  (Note to beta
  users: an ARM cycle counter is not yet implemented; please contact
  fftw@fftw.org if you know how to do it right.)

* MPI code now compiles even if mpicc is a C++ compiler; thanks to
  Kyle Spyksma for the bug report.

FFTW 3.3

* Changes since 3.3-beta1:

  - Compiling OpenMP support (--enable-openmp) now installs a
    fftw3_omp library, instead of fftw3_threads, so that OpenMP
    and POSIX threads (--enable-threads) libraries can be built
    and installed at the same time.

  - Various minor compilation fixes, corrections of manual typos, and
    improvements to the benchmark test program.

* Add support for the AVX extensions to x86 and x86-64.  The AVX code
  works with 16-byte alignment (as opposed to 32-byte alignment),
  so there is no ABI change compared to FFTW 3.2.2.

* Added Fortran 2003 interface, which should be usable on most modern
  Fortran compilers (e.g. gfortran) and provides type-checked access
  to the C FFTW interface.  (The legacy Fortran-77 interface is
  still included also.)

* Added MPI distributed-memory transforms.  Compared to 3.3alpha,
  the major changes in the MPI transforms are:
    - Fixed some deadlock and crashing bugs.
    - Added Fortran 2003 interface.
    - Added new-array execute functions for MPI plans.
    - Eliminated use of large MPI tags, since Cray MPI requires tags < 2^24;
      thanks to Jonathan Bentz for the bug report.
    - Expanded documentation.
    - 'make check' now runs MPI tests
    - Some ABI changes - not binary-compatible with 3.3alpha MPI.

* Add support for quad-precision __float128 in gcc 4.6 or later (on x86,
  x86-64, and Itanium).  The new routines use the fftwq_ prefix.

* Removed support for MIPS paired-single instructions due to lack of
  available hardware for testing.  Users who want this functionality
  should continue using FFTW 3.2.x.  (Note that FFTW 3.3 still works
  on MIPS; this only concerns special instructions available on some
  MIPS chips.)

* Removed support for the Cell Broadband Engine.  Cell users should
  use FFTW 3.2.x.

* New convenience functions fftw_alloc_real and fftw_alloc_complex
  to use fftw_malloc for real and complex arrays without typecasts
  or sizeof.

* New convenience functions fftw_export_wisdom_to_filename and
  fftw_import_wisdom_from_filename that export/import wisdom
  to/from a file without requiring you to open/close the file yourself.

* New function fftw_cost to return FFTW's internal cost metric for
  a given plan; thanks to Rhys Ulerich and Nathanael Schaeffer for the
  suggestion.

* The --enable-sse2 configure flag now works in both double and single
  precision (and is equivalent to --enable-sse in the latter case).

* Remove --enable-portable-binary flag: we now produce portable binaries
  by default.

* Remove the automatic detection of native architecture flag for gcc
  which was introduced in fftw-3.1, since new gcc supports -mtune=native.
  Remove the --with-gcc-arch flag; if you want to specify a particular
  arch to configure, use ./configure CC="gcc -mtune=...".

* --with-our-malloc16 configure flag is now renamed --with-our-malloc.

* Fixed build failure when srand48 declaration is missing;
  thanks to Ralf Wildenhues for the bug report.

* Fixed bug in fftw_set_timelimit: ensure that a negative timelimit
  is equivalent to no timelimit in all cases.  Thanks to William Andrew
  Burnson for the bug report.

* Fixed stack-overflow problem on OpenBSD caused by using alloca with
  too large a buffer.

FFTW 3.2.2

* Improve performance of some copy operations of complex arrays on
  x86 machines.

* Add configure flag to disable alloca(), which is broken in mingw64.

* Planning in FFTW_ESTIMATE mode for r2r transforms became slower
  between fftw-3.1.3 and 3.2.  This regression has now been fixed.

FFTW 3.2.1

* Performance improvements for some multidimensional r2c/c2r transforms;
  thanks to Eugene Miloslavsky for his benchmark reports.

* Compile with icc on MacOS X, use better icc compiler flags.

* Compilation fixes for systems where snprintf is defined as a macro;
  thanks to Marcus Mae for the bug report.

* Fortran documentation now recommends not using dfftw_execute,
  because of reports of problems with various Fortran compilers;
  it is better to use dfftw_execute_dft etcetera.

* Some documentation clarifications, e.g. of the fact that --enable-openmp
  and --enable-threads are mutually exclusive (thanks to Long To),
  and document slightly odd behavior of plan_guru_r2r in Fortran
  (thanks to Alexander Pozdneev).

* FAQ was accidentally omitted from 3.2 tarball.

* Remove some extraneous (harmless) files accidentally included in
  a subdirectory of the 3.2 tarball.

FFTW 3.2

* Worked around apparent glibc bug that leads to rare hangs when freeing
  semaphores.

* Fixed segfault due to unaligned access in certain obscure problems
  that use SSE and multiple threads.

* MPI transforms not included, as they are still in alpha; the alpha
  versions of the MPI transforms have been moved to FFTW 3.3alpha1.

FFTW 3.2alpha3

* Performance improvements for sizes with factors of 5 and 10.

* Documented FFTW_WISDOM_ONLY flag, at the suggestion of Mario
  Emmenlauer and Phil Dumont.

* Port Cell code to SDK2.1 (libspe2), as opposed to the old libspe1 code.

* Performance improvements in Cell code for N < 32k, thanks to Jan Wagner
  for the suggestions.

* Cycle counter for Sun x86_64 compiler, and compilation fix in cycle
  counter for AIX/xlc (thanks to Jeff Haferman for the bug report).

* Fixed incorrect type prefix in MPI code that prevented wisdom routines
  from working in single precision (thanks to Eric A. Borisch for the report).

* Added 'make check' for MPI code (which still fails in a couple of corner
  cases, but should be much better than in alpha2).

* Many other small fixes.

FFTW 3.2alpha2

* Support for the Cell processor, donated by IBM Research; see README.Cell
  and the Cell section of the manual.

* New 64-bit API: for every "plan_guru" function there is a new "plan_guru64"
  function with the same semantics, but which takes fftw_iodim64 instead of
  fftw_iodim.  fftw_iodim64 is the same as fftw_iodim, except that its
  fields have type ptrdiff_t, which is a 64-bit type on 64-bit machines.
  This is only useful for specifying very large transforms
  on 64-bit machines.  (Internally, FFTW uses ptrdiff_t everywhere
  regardless of what API you choose.)

* Experimental MPI support.  Complex one- and multi-dimensional FFTs,
  multi-dimensional r2r, multi-dimensional r2c/c2r transforms, and
  distributed transpose operations, with 1d block distributions.
  (This is an alpha preview: routines have not been exhaustively
  tested, documentation is incomplete, and some functionality is
  missing, e.g. Fortran support.)  See mpi/README and also the MPI
  section of the manual.

* Significantly faster r2c/c2r transforms, especially on machines with SIMD.

* Rewritten multi-threaded support for better performance by
  re-using a fixed pool of threads rather than continually
  respawning and joining (which nowadays is much slower).

* Support for MIPS paired-single SIMD instructions, donated by
  CodeSourcery.

* FFTW_WISDOM_ONLY planner flag, to create plan only if wisdom is
  available and return NULL otherwise.

* Removed k7 support, which only worked in 32-bit mode and is
  becoming obsolete.  Use --enable-sse instead.

* Added --with-g77-wrappers configure option to force inclusion
  of g77 wrappers, in addition to whatever is needed for the
  detected Fortran compilers.  This is mainly intended for GNU/Linux
  distros switching to gfortran that wish to include both
  gfortran and g77 support in FFTW.

* In manual, renamed "guru execute" functions to "new-array execute"
  functions, to reduce confusion with the guru planner interface.
  (The programming interface is unchanged.)

* Add missing __declspec attribute to threads API functions when compiling
  for Windows; thanks to Robert O. Morris for the bug report.

* Fixed missing return value from dfftw_init_threads in Fortran;
  thanks to Markus Wetzstein for the bug report.

FFTW 3.1.3

* Bug fix: FFTW computes incorrect results when the user plans both
  REDFT11 and RODFT11 transforms of certain sizes.  The bug is caused
  by incorrect sharing of twiddle-factor tables between the two
  transforms, and only occurs when both are used.  Thanks to Paul
  A. Valiant for the bug report.

FFTW 3.1.2

* Correct bug in configure script: --enable-portable-binary option was ignored!
  Thanks to Andrew Salamon for the bug report.

* Threads compilation fix on AIX: prefer xlc_r to cc_r, and don't use
  either if we are using gcc.  Thanks to Guy Moebs for the bug report.

* Updated FAQ to note that Apple gcc 4.0.1 on MacOS/Intel is broken,
  and suggest a workaround.  configure script now detects Core/Duo arch.

* Use -maltivec when checking for altivec.h.  Fixes Gentoo bug #129304,
  thanks to Markus Dittrich.

FFTW 3.1.1

* Performance improvements for Intel EM64T.

* Performance improvements for large-size transforms with SIMD.

* Cycle counter support for Intel icc and Visual C++ on x86-64.

* In fftw-wisdom tool, replaced obsolete --impatient with --measure.

* Fixed compilation failure with AIX/xlc; thanks to Joseph Thomas.

* Windows DLL support for Fortran API (added missing __declspec(dllexport)).

* SSE/SSE2 code works properly (i.e. disables itself) on older 386 and 486
  CPUs lacking a CPUID instruction; thanks to Eric Korpela.

FFTW 3.1

* Faster FFTW_ESTIMATE planner.

* New (faster) algorithm for REDFT00/RODFT00 (type-I DCT/DST) of odd size.

* "4-step" algorithm for faster FFTs of very large sizes (> 2^18).

* Faster in-place real-data DFTs (for R2HC and HC2R r2r formats).

* Faster in-place non-square transpositions (FFTW uses these internally
  for in-place FFTs, and you can also perform them explicitly using
  the guru interface).

* Faster prime-size DFTs: implemented Bluestein's algorithm, as well
  as a zero-padded Rader variant to limit recursive use of Rader's algorithm.

* SIMD support for split complex arrays.

* Much faster Altivec/VMX performance.

* New fftw_set_timelimit function to specify a (rough) upper bound to the
  planning time (does not affect ESTIMATE mode).

* Removed --enable-3dnow support; use --enable-k7 instead.

* FMA (fused multiply-add) version is now included in "standard" FFTW,
  and is enabled with --enable-fma (the default on PowerPC and Itanium).

* Automatic detection of native architecture flag for gcc.  New
  configure options: --enable-portable-binary and --with-gcc-arch=<arch>,
  for people distributing compiled binaries of FFTW (see manual).

* Automatic detection of Altivec under Linux with gcc 3.4 (so that
  the same binary should work on both Altivec and non-Altivec PowerPCs).

* Compiler-specific tweaks/flags/workarounds for gcc 3.4, xlc, HP/UX,
  Solaris/Intel.

* Various documentation clarifications.

* 64-bit clean.  (Fixes a bug affecting the split guru planner on
  64-bit machines, reported by David Necas.)

* Fixed Debian bug #259612: inadvertent use of SSE instructions on
  non-SSE machines (causing a crash) for --enable-sse binaries.

* Fixed bug that caused HC2R transforms to destroy the input in
  certain cases, even if the user specified FFTW_PRESERVE_INPUT.

* Fixed bug where wisdom would be lost under rare circumstances,
  causing excessive planning time.

* FAQ notes bug in gcc-3.4.[1-3] that causes FFTW to crash with SSE/SSE2.

* Fixed accidentally exported symbol that prohibited simultaneous
  linking to double/single multithreaded FFTW (thanks to Alessio Massaro).

* Support Win32 threads under MinGW (thanks to Alessio Massaro).

* Fixed problem with building DLL under Cygwin; thanks to Stephane Fillod.

* Fix build failure if no Fortran compiler is found (thanks to Charles
  Radley for the bug report).

* Fixed compilation failure with icc 8.0 and SSE/SSE2.  Automatic
  detection of icc architecture flag (e.g. -xW).

* Fixed compilation with OpenMP on AIX (thanks to Greg Bauer).

* Fixed compilation failure on x86-64 with gcc (thanks to Orion Poplawski).

* Incorporated patch from FreeBSD ports (FreeBSD does not have memalign,
  but its malloc is 16-byte aligned).

* Cycle-counter compilation fixes for Itanium, Alpha, x86-64, Sparc,
  MacOS (thanks to Matt Boman, John Bowman, and James A. Treacy for
  reports/fixes).  Added x86-64 cycle counter for PGI compilers,
  courtesy Cristiano Calonaci.

* Fix compilation problem in test program due to C99 conflict.

* Portability fix for import_system_wisdom with djgpp (thanks to Juan
  Manuel Guerrero).

* Fixed compilation failure on MacOS 10.3 due to getopt conflict.

* Work around Visual C++ (version 6/7) bug in SSE compilation;
  thanks to Eddie Yee for his detailed report.

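The "4-step" algorithm listed among the FFTW 3.1 changes above splits
one long transform into many short ones.  The following is a textbook
sketch in Python and not FFTW's implementation: the function names are
hypothetical, and a naive O(n^2) DFT stands in for the short
transforms to keep the example self-contained.

```python
import cmath


def dft(x):
    """Naive DFT, used here both as building block and as reference."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n)
                for j in range(n))
            for k in range(n)]


def four_step_fft(x, rows, cols):
    """DFT of x (length rows*cols), viewing x as a row-major
    rows-by-cols matrix."""
    n = rows * cols
    if len(x) != n:
        raise ValueError("len(x) must equal rows * cols")
    # Step 1: a length-`rows` DFT down each column b.
    col_dfts = [dft([x[cols * a + b] for a in range(rows)])
                for b in range(cols)]
    # Step 2: multiply entry (c, b) by the twiddle factor exp(-2*pi*i*b*c/n).
    for b in range(cols):
        for c in range(rows):
            col_dfts[b][c] *= cmath.exp(-2j * cmath.pi * b * c / n)
    out = [0j] * n
    for c in range(rows):
        # Step 3: a length-`cols` DFT along each row c.
        row_dft = dft([col_dfts[b][c] for b in range(cols)])
        # Step 4: transposed write, output index k = c + rows * d.
        for d in range(cols):
            out[c + rows * d] = row_dft[d]
    return out


x = [complex(i, i % 3) for i in range(12)]
error = max(abs(a - b) for a, b in zip(dft(x), four_step_fft(x, 3, 4)))
print("max deviation from the naive DFT:", error)
```

The appeal for very large sizes is that each short transform touches a
small, contiguous working set once the data is laid out suitably; the
sketch does not attempt to model that memory behavior.
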
Changes from FFTW 3.1 beta 2:

* Several minor compilation fixes.

* Eliminate FFTW_TIMELIMIT flag and replace fftw_timelimit global with
  fftw_set_timelimit function.  Make wisdom work with time-limited plans.

Changes from FFTW 3.1 beta 1:

* Fixes for creating DLLs under Windows; thanks to John Pavel for his feedback.

* Fixed more 64-bit problems, thanks to John Pavel for the bug report.

* Further speed improvements for Altivec/VMX.

* Further speed improvements for non-square transpositions.

* Many minor tweaks.

FFTW 3.0.1

* Some speed improvements in SIMD code.

* --without-cycle-counter option is removed.  If no cycle counter is found,
  then the estimator is always used.  A --with-slow-timer option is provided
  to force the use of lower-resolution timers.

* Several fixes for compilation under Visual C++, with help from Stefane Ruel.

* Added x86 cycle counter for Visual C++, with help from Morten Nissov.

* Added S390 cycle counter, courtesy of James Treacy.

* Added missing static keyword that prevented simultaneous linkage
  of different-precision versions; thanks to Rasmus Larsen for the bug report.

* Corrected accidental omission of f77_wisdom.f file; thanks to Alan Watson.

* Support -xopenmp flag for SunOS; thanks to John Lou for the bug report.

* Compilation with HP/UX cc requires -Wp,-H128000 flag to increase
  preprocessor limits; thanks to Peter Vouras for the bug report.

* Removed non-portable use of 'tempfile' in fftw-wisdom-to-conf script;
  thanks to Nicolas Decoster for the patch.

* Added 'make smallcheck' target in tests/ directory, at the request of
  James Treacy.

FFTW 3.0

Major goals of this release:

* Speed: often 20% or more faster than FFTW 2.x, even without SIMD (see below).

* Complete rewrite, to make it easier to add new algorithms and transforms.

* New API, to support more general semantics.

Other enhancements:

* SIMD acceleration on supporting CPUs (SSE, SSE2, 3DNow!, and AltiVec).
  (With special thanks to Franz Franchetti for many experimental prototypes
  and to Stefan Kral for the vectorizing generator from fftwgel.)

* True in-place 1d transforms of large sizes (as well as compressed
  twiddle tables for additional memory/cache savings).

* More arbitrary placement of real & imaginary data, e.g. including
  interleaved (as in FFTW 2.x) as well as separate real/imag arrays.

* Efficient prime-size transforms of real data.

* Multidimensional transforms can operate on a subset of a larger matrix,
  and/or transform selected dimensions of a multidimensional array.

* By popular demand, simultaneous linking to double precision (fftw),
  single precision (fftwf), and long-double precision (fftwl) versions
  of FFTW is now supported.

* Cycle counters (on all modern CPUs) are exploited to speed planning.

* Efficient transforms of real even/odd arrays, a.k.a. discrete
  cosine/sine transforms (types I-IV).  (Currently work via pre/post
  processing of real transforms, a la FFTPACK, so are not optimal.)

* DHTs (Discrete Hartley Transforms), again via post-processing
  of real transforms (and thus suboptimal, for now).

* Support for linking to just those parts of FFTW that you need,
  greatly reducing the size of statically linked programs when
  only a limited set of transform sizes/types are required.

* Canonical global wisdom file (/etc/fftw/wisdom) on Unix, along
  with a command-line tool (fftw-wisdom) to generate/update it.

* Fortran API can be used with both g77 and non-g77 compilers
  simultaneously.

* Multi-threaded version has optional OpenMP support.

* Authors' good looks have greatly improved with age.

Changes from 3.0beta3:

* Separate FMA distribution to better exploit fused multiply-add instructions
  on PowerPC (and possibly other) architectures.

* Performance improvements via some inlining tweaks.

* fftw_flops now returns double arguments, not int, to avoid overflows
  for large sizes.

* Workarounds for automake bugs.

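The overflow that motivated the fftw_flops change in the list above is
easy to reproduce with back-of-the-envelope numbers.  The sketch below
is Python and makes two assumptions that are not from these notes: a
32-bit C int, and the conventional 5*n*log2(n) estimate for a complex
FFT, which is not the exact count that fftw_flops reports.  Both
function names are hypothetical.

```python
import math

INT_MAX = 2**31 - 1  # largest value of a 32-bit signed int (assumed width)


def estimated_flops(n: int) -> float:
    """Rough 5*n*log2(n) operation estimate for a length-n complex FFT."""
    return 5.0 * n * math.log2(n)


def fits_in_int(value: float) -> bool:
    """True if the count could be stored in a 32-bit signed int."""
    return value <= INT_MAX


for log2_n in (20, 27):
    count = estimated_flops(2**log2_n)
    print(f"n = 2^{log2_n}: about {count:.3g} flops, fits in int: "
          f"{fits_in_int(count)}")
```

A double represents counts of this size exactly, because integers up
to 2^53 are exactly representable in IEEE double precision.
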
Changes from 3.0beta2:

* The standard REDFT00/RODFT00 (DCT-I/DST-I) algorithm (used in
  FFTPACK, NR, etcetera) turns out to have poor numerical accuracy, so
  we replaced it with a slower routine that is more accurate.

* The guru planner and execute functions now have two variants, one that
  takes complex arguments and one that takes separate real/imag pointers.

* Execute and planner routines now automatically align the stack on x86,
  in case the calling program is misaligned.

* README file for test program.

* Fixed bugs in the combination of SIMD with multi-threaded transforms.

* Eliminated internal fftw_threads_init function, which some people were
  calling accidentally instead of the fftw_init_threads API function.

* Check for -openmp flag (Intel C compiler) when --enable-openmp is used.

* Support AMD x86-64 SIMD and cycle counter.

* Support SSE2 intrinsics in forthcoming gcc 3.3.

Changes from 3.0beta1:

* Faster in-place 1d transforms of non-power-of-two sizes.

* SIMD improvements for in-place, multi-dimensional, and/or non-FFTW_PATIENT
  transforms.

* Added support for hard-coded DCT/DST/DHT codelets of small sizes; the
  default distribution only includes hard-coded size-8 DCT-II/III, however.

* Many minor improvements to the manual.  Added section on using the
  codelet generator to customize and enhance FFTW.

* The default 'make check' should now only take a few minutes; for more
  strenuous tests (which may take a day or so), do 'cd tests; make bigcheck'.

* fftw_print_plan is split into fftw_fprint_plan and fftw_print_plan, where
  the latter uses stdout.

* Fixed ability to compile with a C++ compiler.

* Fixed support for C99 complex type under glibc.

* Fixed problems with alloca under MinGW, AIX.

* Workaround for gcc/SPARC bug.

* Fixed multi-threaded initialization failure on IRIX due to lack of
  user-accessible PTHREAD_SCOPE_SYSTEM there.