Version 2.1.2

  * Fixed incompatibility between our MPI test programs and MPICH with
    the p4 device (TCP/IP).  (The 2.1.1 transforms worked, but the test
    programs crashed.)

  * Added missing fftw_f77_threads_init function to the Fortran wrappers
    for the multi-threaded transforms.  Thanks to V. Sundararajan for
    the bug report.

  * The codelet generator can now output efficient hard-coded DCT/DST
    transforms.  As a side effect of this work, we slightly reduced the
    code size of rfftw.

  * Test programs now support GNU-style long options when used with glibc.

  * Added some more ideas to our TODO list.

  * Improved codelet generator speed.
 
Version 2.1.1

  * Fixed bug in the complex transforms for certain sizes with
    intermediate-length prime factors (17-97), which under some
    (hopefully rare) circumstances could cause incorrect results.
    Thanks to Ming-Chang Liu for the bug report and patch.  (The test
    program will now catch this sort of problem when it is run in
    paranoid mode.)

Version 2.1

  * Added Fortran-callable wrapper routines for the multi-threaded
    transforms.

  * Documentation fixes and improvements.
 
Version 2.1-beta1

  * The --enable-type-prefix option to configure makes it easy to install
    both single- and double-precision versions of FFTW on the same
    (Unix) system.  (See the installation section of the manual.)

  * The MPI FFTW routines now include parallel one-dimensional transforms
    for complex data.  (See the fftw_mpi documentation in the FFTW
    manual.)

  * The MPI FFTW routines now include parallel multi-dimensional transforms
    specialized for real data.  (See the rfftwnd_mpi documentation in the
    FFTW manual.)

  * The MPI FFTW routines are now documented in the main
    manual (in the doc directory).  On Unix systems, they are also
    automatically configured, compiled, and installed along with the main
    FFTW library when you include --enable-mpi in the flags to the
    configure script.  (See the FFTW manual.)

  * Largely-rewritten MPI code.  It is now cleaner and (sometimes) faster.
    It also supports the option of a user-supplied workspace for (often)
    greater performance (using the MPI_Alltoall primitive).  Beware that
    the interfaces have changed slightly, however.

  * The multi-threaded FFTW routines now include parallel one- and
    multi-dimensional transforms of real data.  (See the rfftw_threads
    documentation in the FFTW manual.)

  * The multi-threaded FFTW routines are now documented in the main
    manual (in the doc directory).  On Unix systems, they are also
    automatically configured, compiled, and installed along with the main
    FFTW library when you include --enable-threads in the flags to the
    configure script.  (See the FFTW manual.)

  * The multi-threaded FFTW routines now include support for Mach C
    threads (used, for example, in Apple's MacOS X).

  * The Fortran-callable wrapper routines are now incorporated into
    the ordinary FFTW libraries by default (although you can
    disable this with the --disable-fortran option to configure) and
    are documented in the main FFTW manual.

  * Added an illustration of the data layout to the rfftwnd tutorial
    section of the manual, in the hope of preventing future confusion
    on this subject.

  * The test programs now allow you to specify multidimensional sizes
    (e.g. 128x54x81) for the -c and -s correctness and speed test options.
 
Version 2.0.1

  * (bug fix) Due to a poorly-parenthesized expression, rfftwnd overflowed
    32-bit integer precision for rank > 1 transforms with a final
    dimension >= 65536.  This is now fixed.  (Thanks to Walter Brisken
    for the bug report.)

  * (bug fix) Added definition of FFTW_OUT_OF_PLACE to fftw.h.  The
    flag is mentioned several times in the documentation, but its
    definition was accidentally omitted since FFTW_OUT_OF_PLACE is the
    default behavior.

  * Corrected various small errors in the documentation.  Thanks to
    Geir Thomassen and Jeremy Buhler for their comments.

  * Improved speed of the codelet generator by orders of magnitude,
    since a user needed a hard-coded FFT of size 101.

  * Modified buffering in multidimensional transforms for some speed
    improvements (only when fftwnd_create_plan_specific is used).
    Thanks to Geert van Kempen for his tips.

  * Added Andrew Sterian's patch to allow FFTW to be used as a shared
    library more easily on Win32.
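
As a rough illustration of the rfftwnd overflow fix listed above: these
notes do not show the faulty expression, so the one below is invented.
The Python sketch emulates signed 32-bit arithmetic (wrap_int32 and both
scaled_* helpers are hypothetical names, not FFTW code) to show how the
placement of parentheses decides whether an intermediate product exceeds
32 bits once a dimension reaches 65536.

```python
def wrap_int32(x):
    """Reduce x to a signed 32-bit two's-complement value."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x


def scaled_badly(n, k, d):
    """Hypothetical expression (n * k) / d: the product is formed first
    and can wrap around before the division shrinks it.
    Assumes non-negative inputs."""
    return wrap_int32(wrap_int32(n * k) // d)


def scaled_safely(n, k, d):
    """Hypothetical expression n * (k / d): dividing first keeps the
    intermediate small (exact when d divides k).
    Assumes non-negative inputs."""
    return wrap_int32(n * wrap_int32(k // d))


if __name__ == "__main__":
    print(scaled_badly(65536, 65536, 4))   # product 2**32 wraps to 0
    print(scaled_safely(65536, 65536, 4))  # 2**30, fits in 32 bits
```

For small sizes the two forms agree, which may be why a bug of this kind
can go unnoticed until a dimension of 65536 or more is used.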
 
Version 2.0

  * Completely rewritten real-complex transforms, now using
    specialized codelets and an inherently real-complex algorithm for
    greatly increased speed.  Also, rfftw can now handle odd sizes and
    strided transforms.  Beware that the output format for 1D rfftw
    transforms has changed.  See the manual for more details.

  * The complex transforms now use a fast algorithm for large prime
    factors, working in O(N lg N) time even for prime sizes.
    (Previously, the complexity contained an O(p^2) term, where p is
    the largest prime factor of N.  This is still the case for the
    rfftw transforms.)  Small prime factors are still more efficient,
    however.

  * Added functions fftw_one, fftwnd_one, rfftw_one, etc., to
    simplify and clarify the use of FFTW for single, unit-stride
    transforms.

  * Renamed FFTW_COMPLEX, FFTW_REAL to fftw_complex, fftw_real (for
    greater consistency in capitalization).  The all-caps names will
    continue to be supported indefinitely, but are deprecated.  (Also,
    support for the COMPLEX and REAL types from FFTW 1.0 is now
    disabled by default.)

  * There are now Fortran-callable wrappers for the rfftw real-complex
    transforms.

  * New section of the manual discussing the use of FFTW with multiple
    threads, and a new FFTW_THREADSAFE flag (described therein).

  * Added shared library support.  Use configure --enable-shared to
    produce a shared library instead of a static library (the default).

  * Dropped support for the operation-count (*_op_count) routines
    introduced in v1.3, as these were little-used and were a pain to
    keep up-to-date as FFTW changed internally.

  * Made it easier to support floating-point types other than float
    and double (e.g. long double).  (See the file fftw-int.h.)
 
153
Version 1.3
154
 
155
  * Multi-dimensional transforms contain significant performance
156
    improvements for dimensions >= 3.
157
 
158
  * Performance improvements in multi-dimensional transforms
159
    with howmany > 1 and stride > dist.
160
 
161
  * Improved parallelization and performance in the threads
162
    code for dimensions >= 3.
163
 
164
  * Changed the wisdom import/export format (the new wisdom remembers
165
    the stride of the plan that generated it, for use with the new
166
    create_plan_specific functions).  (You should regenerate any stored
167
    wisdom you have anyway, since this is a new version of FFTW.)
168
 
169
  * Several small fixes to aid compilation on some systems.
170
 
Version 1.3b1

  * Fixed a bug in the MPI transform (in the transpose routine) that
    caused errors for some array sizes.

  * Fixed the (hopefully) last few things causing problems with C++
    compilers.

  * Hack for x86/gcc to properly align local double-precision variables.

  * Completely rewritten codelet generator.  Now it produces
    better code for non-powers of 2, and is ready to produce
    real->complex transforms.

  * Testing algorithm is now more robust, and has a more rigorous
    theoretical foundation.  (Bugs in testing large transforms or
    in single precision are now fixed--these bugs were only in the
    test programs and not in the FFTW library itself.)

  * Added "specific" planners, which allow plan optimization for a
    specific array/stride.  They also reduce the memory requirements
    of the planner, and permit new optimizations in the multi-dimensional
    case.  (See the *_create_plan_specific functions.)

  * FFTW can now compute a count of the number of arithmetic operations
    it requires, which is useful for some academic purposes.  (See the
    *_count_plan_ops functions.)

  * Adapted for use with GNU autoconf to aid installation on UNIX systems.
    (Installation on non-UNIX systems should be the same as before.)

  * Now uses the gettimeofday function if available.  (This function
    typically has much higher accuracy than clock(), permitting plans
    to be created much more quickly than before on many machines.)

  * Made timing algorithm (hopefully) more robust in the face of
    system interrupts, etc.

  * Added wrapper routines for calling FFTW from MATLAB (in the
    matlab/ directory).

  * Added wrapper routines for calling FFTW from Fortran (in the
    fortran/ directory).  (These were available separately before.)
 
Version 1.2.1

  * Fixed a third bug in the MPI transpose routines (sheesh!) that
    could cause problems when re-using a transpose plan.  Thanks
    to Eric Skyllingstad for the bug reports.

  * Fixed another bug in the MPI transpose routines.  This bug produced
    a memory leak and also occasionally tried to free a null pointer,
    which causes problems on some systems.  The MPI transpose/FFT routines
    now pass all of our malloc paranoia tests.

  * Fixed bug in MPI transpose routines, where wrong results
    could be given for some large 2D arrays.
 
Version 1.2

  * Added a FAQ (in the FAQ/ directory).

  * Fixed bug in rfftwnd routines where a block was accidentally
    allocated to be too small, causing random memory to be
    overwritten (yikes!).  (Amazingly, this bug only caused the
    test program to fail on one system that we could find.  Our
    test suite can now catch this sort of bug.)

  * Abstracted taking differences of times (with fftw_time_diff
    macro/function) to allow more general timer data structures.

  * Added "wisdom" mechanism for saving plans & related info.

  * Made timing mechanism more robust and maintainable.  (Instead of
    using a fixed number of iterations, we now repeatedly double
    the number of iterations until a specified time interval
    (FFTW_TIME_MIN) is reached.)

  * Fixed header files to prevent difficulties when a mix of C and
    C++ compilers is used, and to prevent problems with multiple
    inclusions.

  * Added experimental distributed-memory transforms using MPI.

  * Fixed memory leak in fftwnd_destroy_plan (reported by Richard
    Sullivan).  Our test programs now all check for leaks.
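
The iteration-doubling timing scheme mentioned above can be sketched
roughly as follows.  This is a Python illustration of the idea only, not
FFTW's C implementation: time_with_doubling, work, and clock are
hypothetical names, and time_min merely stands in for FFTW_TIME_MIN.

```python
import time


def time_with_doubling(work, time_min, clock=time.perf_counter):
    """Run work() in batches of 1, 2, 4, ... calls until one batch takes
    at least time_min seconds; return (iterations, elapsed) for that batch.

    time_min plays the role of FFTW_TIME_MIN (an assumption; the real
    value and units are not given in these notes).
    """
    iterations = 1
    while True:
        start = clock()
        for _ in range(iterations):
            work()
        elapsed = clock() - start
        if elapsed >= time_min:
            return iterations, elapsed
        iterations *= 2  # batch too short to measure reliably: double it


if __name__ == "__main__":
    n, t = time_with_doubling(lambda: sum(range(1000)), 0.01)
    print("seconds per call: about", t / n)
```

Dividing the elapsed time by the iteration count gives a per-call
estimate.  Doubling means fast routines are repeated often enough to
rise above the timer's resolution, while slow ones run only a few times.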
 
Version 1.1

  * Improved speed (yes!) [Some clever tricks with twiddle factors
    and better code generator].

  * Renamed `blocks' to `codelets', just to be fashionable.

  * Rewritten planner and executor--much simpler and more readable
    code.  Reference-counting garbage collection employed throughout.

  * Much improved codelet generator.  The ML code should now be
    readable by humans, and easier to modify.

  * Support for Prime Factor transforms in the codelet generator.

  * Renamed COMPLEX -> FFTW_COMPLEX to avoid clashes with
    existing packages.  COMPLEX is still supported
    for compatibility with 1.0.

  * Added experimental real->complex transform (quick hack,
    use at your own risk).

  * Added experimental parallel transforms using Cilk.

  * Added experimental parallel transforms using threads (currently,
    POSIX threads and Solaris threads are implemented and tested).

  * Added DOS support, in the sense that we now support 8.3 filenames.

Version 1.0:  First release