Version 2.1.2

* Fixed incompatibility between our MPI test programs and MPICH with
  the p4 device (TCP/IP). (The 2.1.1 transforms worked, but the test
  programs crashed.)

* Added missing fftw_f77_threads_init function to the Fortran wrappers
  for the multi-threaded transforms. Thanks to V. Sundararajan for
  the bug report.

* The codelet generator can now output efficient hard-coded DCT/DST
  transforms. As a side effect of this work, we slightly reduced the
  code size of rfftw.

* Test programs now support GNU-style long options when used with glibc.

* Added some more ideas to our TODO list.

* Improved codelet generator speed.

Version 2.1.1

* Fixed bug in the complex transforms for certain sizes with
  intermediate-length prime factors (17-97), which under some
  (hopefully rare) circumstances could cause incorrect results.
  Thanks to Ming-Chang Liu for the bug report and patch. (The test
  program will now catch this sort of problem when it is run in
  paranoid mode.)

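To check whether a transform size belongs to the class of sizes named in the 2.1.1 entry, one can factor it and look for a prime factor between 17 and 97. The sketch below is only an illustration: has_factor_in_range is a hypothetical helper and not part of FFTW. The entry speaks of "certain sizes", so a positive result does not mean that a given size actually triggered the bug.

```c
/* Hypothetical helper (not an FFTW function): returns 1 if n has a prime
   factor p with lo <= p <= hi, and 0 otherwise.  Uses trial division. */
static int has_factor_in_range(unsigned long n, unsigned long lo,
                               unsigned long hi)
{
    unsigned long p;
    for (p = 2; p * p <= n; ++p) {
        if (n % p == 0) {
            if (p >= lo && p <= hi)
                return 1;
            while (n % p == 0)   /* strip every copy of this prime */
                n /= p;
        }
    }
    /* Whatever remains (if > 1) is itself a prime factor. */
    return n > 1 && n >= lo && n <= hi;
}
```

For example, has_factor_in_range(34, 17, 97) reports a match because 34 = 2 * 17, while a power of two such as 1024 does not match.
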
Version 2.1

* Added Fortran-callable wrapper routines for the multi-threaded
  transforms.

* Documentation fixes and improvements.

Version 2.1-beta1

* The --enable-type-prefix option to configure makes it easy to install
  both single- and double-precision versions of FFTW on the same
  (Unix) system. (See the installation section of the manual.)

* The MPI FFTW routines now include parallel one-dimensional transforms
  for complex data. (See the fftw_mpi documentation in the FFTW
  manual.)

* The MPI FFTW routines now include parallel multi-dimensional transforms
  specialized for real data. (See the rfftwnd_mpi documentation in the
  FFTW manual.)

* The MPI FFTW routines are now documented in the main
  manual (in the doc directory). On Unix systems, they are also
  automatically configured, compiled, and installed along with the main
  FFTW library when you include --enable-mpi in the flags to the
  configure script. (See the FFTW manual.)

* Largely-rewritten MPI code. It is now cleaner and (sometimes) faster.
  It also supports the option of a user-supplied workspace for (often)
  greater performance (using the MPI_Alltoall primitive). Beware that
  the interfaces have changed slightly, however.

* The multi-threaded FFTW routines now include parallel one- and
  multi-dimensional transforms of real data. (See the rfftw_threads
  documentation in the FFTW manual.)

* The multi-threaded FFTW routines are now documented in the main
  manual (in the doc directory). On Unix systems, they are also
  automatically configured, compiled, and installed along with the main
  FFTW library when you include --enable-threads in the flags to the
  configure script. (See the FFTW manual.)

* The multi-threaded FFTW routines now include support for Mach C
  threads (used, for example, in Apple's MacOS X).

* The Fortran-callable wrapper routines are now incorporated into
  the ordinary FFTW libraries by default (although you can
  disable this with the --disable-fortran option to configure) and
  are documented in the main FFTW manual.

* Added an illustration of the data layout to the rfftwnd tutorial
  section of the manual, in the hope of preventing future confusion
  on this subject.

* The test programs now allow you to specify multidimensional sizes
  (e.g. 128x54x81) for the -c and -s correctness and speed test options.

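The multidimensional size syntax mentioned in the last item (e.g. 128x54x81) is easy to parse by hand. The following is a minimal sketch of one way to do it, not the code used by the FFTW test programs; parse_dims and its arguments are hypothetical names introduced for illustration.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical parser (not the FFTW test program's code): splits a size
   string such as "128x54x81" into dims[0..rank-1].  Returns the rank, or
   -1 on malformed input or if more than max_rank dimensions are given. */
static int parse_dims(const char *s, int *dims, int max_rank)
{
    int rank = 0;
    for (;;) {
        char *end;
        long v;
        if (*s < '0' || *s > '9')   /* each dimension must start with a digit */
            return -1;
        errno = 0;
        v = strtol(s, &end, 10);
        if (errno == ERANGE || v <= 0 || v > INT_MAX || rank >= max_rank)
            return -1;
        dims[rank++] = (int) v;
        if (*end == '\0')           /* end of string: all dimensions read */
            return rank;
        if (*end != 'x' && *end != 'X')
            return -1;
        s = end + 1;                /* skip the separator */
    }
}
```

Calling parse_dims("128x54x81", dims, 4) fills dims with 128, 54 and 81 and returns 3; a trailing separator, an empty string or a zero dimension is rejected.
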
Version 2.0.1

* (bug fix) Due to a poorly-parenthesized expression, rfftwnd overflowed
  32-bit integer precision for rank > 1 transforms with a final
  dimension >= 65536. This is now fixed. (Thanks to Walter Brisken
  for the bug report.)

* (bug fix) Added definition of FFTW_OUT_OF_PLACE to fftw.h. The
  flag is mentioned several times in the documentation, but its
  definition was accidentally omitted since FFTW_OUT_OF_PLACE is the
  default behavior.

* Corrected various small errors in the documentation. Thanks to
  Geir Thomassen and Jeremy Buhler for their comments.

* Improved speed of the codelet generator by orders of magnitude,
  since a user needed a hard-coded FFT of size 101.

* Modified buffering in multidimensional transforms for some speed
  improvements (only when fftwnd_create_plan_specific is used).
  Thanks to Geert van Kempen for his tips.

* Added Andrew Sterian's patch to allow FFTW to be used as a shared
  library more easily on Win32.

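The first 2.0.1 item blames the overflow on parenthesization. The actual rfftwnd expression is not shown in this file, so the following is a purely hypothetical illustration of how the grouping of operations decides whether an intermediate result still fits in 32 bits. It uses uint32_t so that the wraparound is well defined (signed overflow is undefined behavior in C), and it assumes the usual case of a 32-bit int.

```c
#include <stdint.h>

/* Hypothetical illustration, not the rfftwnd source: for even n1 both
   functions compute n0 * n1 / 2 mathematically, but they group the
   operations differently. */

/* Multiplies first: the intermediate n0 * n1 can exceed 32 bits. */
static uint32_t half_product_bad(uint32_t n0, uint32_t n1)
{
    return (n0 * n1) / 2;
}

/* Divides first: the intermediate stays within range for longer. */
static uint32_t half_product_good(uint32_t n0, uint32_t n1)
{
    return n0 * (n1 / 2);
}
```

With both arguments equal to 65536, the first version wraps to 0 because 65536 * 65536 = 2^32, while the second returns 2^31 as intended. For small inputs the two agree, which is why this kind of bug tends to appear only at large sizes.
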
Version 2.0

* Completely rewritten real-complex transforms, now using
  specialized codelets and an inherently real-complex algorithm for
  greatly increased speed. Also, rfftw can now handle odd sizes and
  strided transforms. Beware that the output format for 1D rfftw
  transforms has changed. See the manual for more details.

* The complex transforms now use a fast algorithm for large prime
  factors, working in O(N lg N) time even for prime sizes.
  (Previously, the complexity contained an O(p^2) term, where p is
  the largest prime factor of N. This is still the case for the
  rfftw transforms.) Small prime factors are still more efficient,
  however.

* Added functions fftw_one, fftwnd_one, rfftw_one, etc., to
  simplify and clarify the use of FFTW for single, unit-stride
  transforms.

* Renamed FFTW_COMPLEX, FFTW_REAL to fftw_complex, fftw_real (for
  greater consistency in capitalization). The all-caps names will
  continue to be supported indefinitely, but are deprecated. (Also,
  support for the COMPLEX and REAL types from FFTW 1.0 is now
  disabled by default.)

* There are now Fortran-callable wrappers for the rfftw real-complex
  transforms.

* New section of the manual discussing the use of FFTW with multiple
  threads, and a new FFTW_THREADSAFE flag (described therein).

* Added shared library support. Use configure --enable-shared to
  produce a shared library instead of a static library (the default).

* Dropped support for the operation-count (*_op_count) routines
  introduced in v1.3, as these were little-used and were a pain to
  keep up-to-date as FFTW changed internally.

* Made it easier to support floating-point types other than float
  and double (e.g. long double). (See the file fftw-int.h.)

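The rename of FFTW_COMPLEX and FFTW_REAL keeps the old spellings working. One common way to achieve that in C is to map the old names onto the new types with macros, as sketched below. This is an assumption about the technique, not a copy of fftw.h: the double precision, the re/im field names and the squared_magnitude function are chosen only for this illustration.

```c
/* Hypothetical sketch of keeping deprecated all-caps names alive after a
   rename; the real definitions live in fftw.h and may differ. */
typedef double fftw_real;            /* assumed precision for this sketch */

typedef struct {
    fftw_real re, im;                /* assumed field names */
} fftw_complex;

/* Old spellings map onto the new types, so existing code still compiles. */
#define FFTW_REAL    fftw_real
#define FFTW_COMPLEX fftw_complex

/* Code written against the old names keeps working unchanged. */
static FFTW_REAL squared_magnitude(FFTW_COMPLEX z)
{
    return z.re * z.re + z.im * z.im;
}
```

Because the macros expand to the new type names, old and new spellings denote the same types and can be mixed freely within one program.
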
Version 1.3

* Multi-dimensional transforms contain significant performance
  improvements for dimensions >= 3.

* Performance improvements in multi-dimensional transforms
  with howmany > 1 and stride > dist.

* Improved parallelization and performance in the threads
  code for dimensions >= 3.

* Changed the wisdom import/export format (the new wisdom remembers
  the stride of the plan that generated it, for use with the new
  create_plan_specific functions). (You should regenerate any stored
  wisdom you have anyway, since this is a new version of FFTW.)

* Several small fixes to aid compilation on some systems.

Version 1.3b1

* Fixed a bug in the MPI transform (in the transpose routine) that
  caused errors for some array sizes.

* Fixed the (hopefully) last few things causing problems with C++
  compilers.

* Hack for x86/gcc to properly align local double-precision variables.

* Completely rewritten codelet generator. Now it produces
  better code for non-powers of 2, and is ready to produce
  real->complex transforms.

* Testing algorithm is now more robust, and has a more rigorous
  theoretical foundation. (Bugs in testing large transforms or
  in single precision are now fixed--these bugs were only in the
  test programs and not in the FFTW library itself.)

* Added "specific" planners, which allow plan optimization for a
  specific array/stride. They also reduce the memory requirements
  of the planner, and permit new optimizations in the multi-dimensional
  case. (See the *_create_plan_specific functions.)

* FFTW can now compute a count of the number of arithmetic operations
  it requires, which is useful for some academic purposes. (See the
  *_count_plan_ops functions.)

* Adapted for use with GNU autoconf to aid installation on UNIX systems.
  (Installation on non-UNIX systems should be the same as before.)

* Now uses the gettimeofday function if available. (This function
  typically has much higher accuracy than clock(), permitting plans to
  be created much more quickly than before on many machines.)

* Made timing algorithm (hopefully) more robust in the face of
  system interrupts, etc.

* Added wrapper routines for calling FFTW from MATLAB (in the
  matlab/ directory).

* Added wrapper routines for calling FFTW from Fortran (in the
  fortran/ directory). (These were available separately before.)

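The gettimeofday item above can be illustrated with a small wall-clock timer. This is a minimal sketch for POSIX systems, not FFTW's own timer code; timeval_diff_seconds and seconds_since are hypothetical helper names.

```c
#include <stddef.h>     /* NULL */
#include <sys/time.h>   /* gettimeofday, struct timeval (POSIX) */

/* Hypothetical helper, not FFTW's timer code: elapsed seconds between
   two gettimeofday() readings.  The microsecond difference may be
   negative; adding it to the whole-second difference handles that. */
static double timeval_diff_seconds(struct timeval start, struct timeval end)
{
    return (double) (end.tv_sec - start.tv_sec)
         + (double) (end.tv_usec - start.tv_usec) * 1.0e-6;
}

/* Wall-clock seconds elapsed since an earlier gettimeofday() reading. */
static double seconds_since(struct timeval start)
{
    struct timeval now;
    gettimeofday(&now, NULL);
    return timeval_diff_seconds(start, now);
}
```

A caller would take one reading with gettimeofday before the code under test and pass it to seconds_since afterwards. The struct timeval readings carry microsecond fields, whereas clock() ticks are often much coarser, which is the advantage the entry refers to.
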
Version 1.2.1

* Fixed a third bug in the mpi transpose routines (sheesh!) that
  could cause problems when re-using a transpose plan. Thanks
  to Eric Skyllingstad for the bug reports.

* Fixed another bug in the mpi transpose routines. This bug produced
  a memory leak and also occasionally tried to free a null pointer,
  which causes problems on some systems. The mpi transpose/fft routines
  now pass all of our malloc paranoia tests.

* Fixed bug in mpi transpose routines, where wrong results
  could be given for some large 2D arrays.

Version 1.2:

* Added a FAQ (in the FAQ/ directory).

* Fixed bug in rfftwnd routines where a block was accidentally
  allocated to be too small, causing random memory to be
  overwritten (yikes!). (Amazingly, this bug only caused the
  test program to fail on one system that we could find. Our
  test suite can now catch this sort of bug.)

* Abstracted the computation of time differences (with the
  fftw_time_diff macro/function) to allow more general timer data
  structures.

* Added "wisdom" mechanism for saving plans & related info.

* Made timing mechanism more robust and maintainable. (Instead of
  using a fixed number of iterations, we now repeatedly double
  the number of iterations until a specified time interval
  (FFTW_TIME_MIN) is reached.)

* Fixed header files to prevent difficulties when a mix of C and
  C++ compilers is used, and to prevent problems with multiple
  inclusions.

* Added experimental distributed-memory transforms using MPI.

* Fixed memory leak in fftwnd_destroy_plan (reported by Richard
  Sullivan). Our test programs now all check for leaks.

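The doubling strategy described in the timing item can be sketched as follows. This is an illustration under stated assumptions, not FFTW's timer code: TIME_MIN_SKETCH stands in for FFTW_TIME_MIN (whose actual value is not given here), and timed_run_fn, time_per_iteration and fake_run are hypothetical names. The fake_run function replaces a real timed loop so that the example behaves deterministically.

```c
#include <limits.h>

/* Stand-in for FFTW_TIME_MIN; the real value is not stated in this file. */
#define TIME_MIN_SKETCH 0.1

/* run(n, data) must execute the operation n times and return the elapsed
   time in seconds. */
typedef double (*timed_run_fn)(long iterations, void *data);

/* Doubles the iteration count until one run takes at least time_min
   seconds, then returns the estimated time per iteration. */
static double time_per_iteration(timed_run_fn run, void *data,
                                 double time_min, long *iterations_used)
{
    long n = 1;
    double elapsed = run(n, data);
    while (elapsed < time_min && n <= LONG_MAX / 2) {
        n *= 2;
        elapsed = run(n, data);
    }
    if (iterations_used)
        *iterations_used = n;
    return elapsed / (double) n;
}

/* Deterministic stand-in for a real timed loop: pretends each iteration
   costs *(double *) data seconds. */
static double fake_run(long iterations, void *data)
{
    return (double) iterations * *(double *) data;
}
```

With a simulated cost of 0.001 s per iteration, the loop tries 1, 2, 4, ... iterations and stops at 128, the first count whose total time reaches 0.1 s. Compared with a fixed iteration count, this keeps the measured interval well above the timer resolution for both fast and slow operations.
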
Version 1.1:

* Improved speed (yes!) [Some clever tricks with twiddle factors
  and better code generator]

* Renamed `blocks' to `codelets', just to be fashionable.

* Rewritten planner and executor--much simpler and more readable
  code. Reference-counter garbage collection employed throughout.

* Much improved codelet generator. The ML code should now be
  readable by humans, and easier to modify.

* Support for Prime Factor transforms in the codelet generator.

* Renamed COMPLEX -> FFTW_COMPLEX to avoid clashes with
  existing packages. COMPLEX is still supported
  for compatibility with 1.0.

* Added experimental real->complex transform (quick hack,
  use at your own risk).

* Added experimental parallel transforms using Cilk.

* Added experimental parallel transforms using threads (currently,
  POSIX threads and Solaris threads are implemented and tested).

* Added DOS support, in the sense that we now support 8.3 filenames.

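The reference-counting approach mentioned for the rewritten planner and executor can be sketched generically. The type and function names below (shared_table, table_create, table_retain, table_release) are illustrative only and do not come from the FFTW sources; the sketch shows the general technique of freeing an object when its last owner releases it.

```c
#include <stdlib.h>

/* Hypothetical sketch of reference counting; names are illustrative and
   not taken from FFTW. */
typedef struct {
    int refcount;    /* number of live owners */
    int n;           /* payload length */
    double *data;    /* payload shared by several owners */
} shared_table;

/* Creates a table (n > 0) owned by one reference; NULL on failure. */
static shared_table *table_create(int n)
{
    shared_table *t = malloc(sizeof *t);
    if (!t)
        return NULL;
    t->data = calloc((size_t) n, sizeof *t->data);
    if (!t->data) {
        free(t);
        return NULL;
    }
    t->refcount = 1;
    t->n = n;
    return t;
}

/* Registers an additional owner. */
static shared_table *table_retain(shared_table *t)
{
    ++t->refcount;
    return t;
}

/* Drops one owner; frees the table when none remain.  Returns 1 if the
   table was destroyed, 0 otherwise. */
static int table_release(shared_table *t)
{
    if (--t->refcount > 0)
        return 0;
    free(t->data);
    free(t);
    return 1;
}
```

Each owner calls table_retain when it starts sharing the table and table_release when it is done, so the memory is reclaimed exactly once regardless of the order in which owners finish.
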
Version 1.0: First release