Version 2.1.2

* Fixed incompatibility between our MPI test programs and MPICH with
  the p4 device (TCP/IP). (The 2.1.1 transforms worked, but the test
  programs crashed.)

* Added missing fftw_f77_threads_init function to the Fortran wrappers
  for the multi-threaded transforms. Thanks to V. Sundararajan for
  the bug report.

* The codelet generator can now output efficient hard-coded DCT/DST
  transforms. As a side effect of this work, we slightly reduced the
  code size of rfftw.

* Test programs now support GNU-style long options when used with glibc.

* Added some more ideas to our TODO list.

* Improved codelet generator speed.

Version 2.1.1

* Fixed bug in the complex transforms for certain sizes with
  intermediate-length prime factors (17-97), which under some
  (hopefully rare) circumstances could cause incorrect results.
  Thanks to Ming-Chang Liu for the bug report and patch. (The test
  program will now catch this sort of problem when it is run in
  paranoid mode.)

Version 2.1

* Added Fortran-callable wrapper routines for the multi-threaded
  transforms.

* Documentation fixes and improvements.

Version 2.1-beta1

* The --enable-type-prefix option to configure makes it easy to install
  both single- and double-precision versions of FFTW on the same
  (Unix) system. (See the installation section of the manual.)

* The MPI FFTW routines now include parallel one-dimensional transforms
  for complex data. (See the fftw_mpi documentation in the FFTW
  manual.)

* The MPI FFTW routines now include parallel multi-dimensional transforms
  specialized for real data. (See the rfftwnd_mpi documentation in the
  FFTW manual.)

* The MPI FFTW routines are now documented in the main
  manual (in the doc directory). On Unix systems, they are also
  automatically configured, compiled, and installed along with the main
  FFTW library when you include --enable-mpi in the flags to the
  configure script. (See the FFTW manual.)

* Largely-rewritten MPI code. It is now cleaner and (sometimes) faster.
  It also supports the option of a user-supplied workspace for (often)
  greater performance (using the MPI_Alltoall primitive). Beware that
  the interfaces have changed slightly, however.

* The multi-threaded FFTW routines now include parallel one- and
  multi-dimensional transforms of real data. (See the rfftw_threads
  documentation in the FFTW manual.)

* The multi-threaded FFTW routines are now documented in the main
  manual (in the doc directory). On Unix systems, they are also
  automatically configured, compiled, and installed along with the main
  FFTW library when you include --enable-threads in the flags to the
  configure script. (See the FFTW manual.)

* The multi-threaded FFTW routines now include support for Mach C
  threads (used, for example, in Apple's MacOS X).

* The Fortran-callable wrapper routines are now incorporated into
  the ordinary FFTW libraries by default (although you can
  disable this with the --disable-fortran option to configure) and
  are documented in the main FFTW manual.

* Added an illustration of the data layout to the rfftwnd tutorial
  section of the manual, in the hope of preventing future confusion
  on this subject.

* The test programs now allow you to specify multidimensional sizes
  (e.g. 128x54x81) for the -c and -s correctness and speed test options.
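
As an illustration of the multidimensional size syntax in the last item above, here is a minimal C sketch that splits a string such as 128x54x81 into its dimensions. It is not the test programs' parser: the name parse_dims and the MAX_RANK limit are assumptions made for this example.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

#define MAX_RANK 8  /* hypothetical upper bound on the number of dimensions */

/* Parse a size string such as "128x54x81" into dims[].
   Returns the rank (number of dimensions), or -1 on malformed input.
   Hypothetical helper, not part of FFTW. */
int parse_dims(const char *s, int dims[MAX_RANK])
{
    int rank = 0;

    assert(s != NULL && dims != NULL);
    while (rank < MAX_RANK) {
        char *end;
        long n;

        errno = 0;
        n = strtol(s, &end, 10);
        if (end == s || errno != 0 || n <= 0 || n > INT_MAX)
            return -1;          /* no digits, out of range, or non-positive */
        dims[rank++] = (int) n;

        if (*end == '\0')
            return rank;        /* consumed the whole string */
        if (*end != 'x' && *end != 'X')
            return -1;          /* unexpected separator */
        s = end + 1;            /* skip the 'x' and parse the next dimension */
    }
    return -1;                  /* more than MAX_RANK dimensions */
}
```

For example, parse_dims("128x54x81", dims) returns 3 and fills dims with 128, 54, 81, while a malformed string such as "128x" yields -1.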

Version 2.0.1

* (bug fix) Due to a poorly-parenthesized expression, rfftwnd overflowed
  32-bit integer precision for rank > 1 transforms with a final
  dimension >= 65536. This is now fixed. (Thanks to Walter Brisken
  for the bug report.)

* (bug fix) Added definition of FFTW_OUT_OF_PLACE to fftw.h. The
  flag is mentioned several times in the documentation, but its
  definition was accidentally omitted since FFTW_OUT_OF_PLACE is the
  default behavior.

* Corrected various small errors in the documentation. Thanks to
  Geir Thomassen and Jeremy Buhler for their comments.

* Improved speed of the codelet generator by orders of magnitude,
  since a user needed a hard-coded FFT of size 101.

* Modified buffering in multidimensional transforms for some speed
  improvements (only when fftwnd_create_plan_specific is used).
  Thanks to Geert van Kempen for his tips.

* Added Andrew Sterian's patch to allow FFTW to be used as a shared
  library more easily on Win32.
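
The rfftwnd overflow in the first item above is an instance of a common pitfall: the grouping of an integer expression decides whether an intermediate result fits in 32 bits. The sketch below is only an illustration under assumed names (half_square_mul_first, half_square_div_first); it is not the expression that was fixed in rfftwnd. Unsigned arithmetic is used so that the wraparound is well defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only: NOT the expression from rfftwnd.
   Both functions compute n_last * n_last / 2 for an even n_last,
   but they group the operations differently. */

/* Multiplies first: the 32-bit intermediate product wraps modulo 2^32
   once n_last reaches 65536. */
uint32_t half_square_mul_first(uint32_t n_last)
{
    uint32_t product = n_last * n_last;   /* stored in 32 bits, may wrap */
    return product / 2;
}

/* Divides first: the intermediate values stay within 32 bits for a
   larger range of sizes. */
uint32_t half_square_div_first(uint32_t n_last)
{
    assert(n_last % 2 == 0);   /* dividing first is exact only for even sizes */
    return n_last * (n_last / 2);
}
```

With n_last = 65536, the first function returns 0 because the product wraps, while the second returns 2147483648. For small sizes the two agree.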

Version 2.0

* Completely rewritten real-complex transforms, now using
  specialized codelets and an inherently real-complex algorithm for
  greatly increased speed. Also, rfftw can now handle odd sizes and
  strided transforms. Beware that the output format for 1D rfftw
  transforms has changed. See the manual for more details.

* The complex transforms now use a fast algorithm for large prime
  factors, working in O(N lg N) time even for prime sizes.
  (Previously, the complexity contained an O(p^2) term, where p is
  the largest prime factor of N. This is still the case for the
  rfftw transforms.) Small prime factors are still more efficient,
  however.

* Added functions fftw_one, fftwnd_one, rfftw_one, etc., to
  simplify and clarify the use of FFTW for single, unit-stride
  transforms.

* Renamed FFTW_COMPLEX, FFTW_REAL to fftw_complex, fftw_real (for
  greater consistency in capitalization). The all-caps names will
  continue to be supported indefinitely, but are deprecated. (Also,
  support for the COMPLEX and REAL types from FFTW 1.0 is now
  disabled by default.)

* There are now Fortran-callable wrappers for the rfftw real-complex
  transforms.

* New section of the manual discussing the use of FFTW with multiple
  threads, and a new FFTW_THREADSAFE flag (described therein).

* Added shared library support. Use configure --enable-shared to
  produce a shared library instead of a static library (the default).

* Dropped support for the operation-count (*_op_count) routines
  introduced in v1.3, as these were little-used and were a pain to
  keep up-to-date as FFTW changed internally.

* Made it easier to support floating-point types other than float
  and double (e.g. long double). (See the file fftw-int.h.)

Version 1.3

* Multi-dimensional transforms contain significant performance
  improvements for dimensions >= 3.

* Performance improvements in multi-dimensional transforms
  with howmany > 1 and stride > dist.

* Improved parallelization and performance in the threads
  code for dimensions >= 3.

* Changed the wisdom import/export format (the new wisdom remembers
  the stride of the plan that generated it, for use with the new
  create_plan_specific functions). (You should regenerate any stored
  wisdom you have anyway, since this is a new version of FFTW.)

* Several small fixes to aid compilation on some systems.

Version 1.3b1

* Fixed a bug in the MPI transform (in the transpose routine) that
  caused errors for some array sizes.

* Fixed the (hopefully) last few things causing problems with C++
  compilers.

* Hack for x86/gcc to properly align local double-precision variables.

* Completely rewritten codelet generator. Now it produces
  better code for non-powers of 2, and is ready to produce
  real->complex transforms.

* Testing algorithm is now more robust, and has a more rigorous
  theoretical foundation. (Bugs in testing large transforms or
  in single precision are now fixed--these bugs were only in the
  test programs and not in the FFTW library itself.)

* Added "specific" planners, which allow plan optimization for a
  specific array/stride. They also reduce the memory requirements
  of the planner, and permit new optimizations in the multi-dimensional
  case. (See the *_create_plan_specific functions.)

* FFTW can now compute a count of the number of arithmetic operations
  it requires, which is useful for some academic purposes. (See the
  *_count_plan_ops functions.)

* Adapted for use with GNU autoconf to aid installation on UNIX systems.
  (Installation on non-UNIX systems should be the same as before.)

* Now uses the gettimeofday function if available. (This function
  typically has much higher accuracy than clock(), permitting plans to
  be created much more quickly than before on many machines.)

* Made timing algorithm (hopefully) more robust in the face of
  system interrupts, etc.

* Added wrapper routines for calling FFTW from MATLAB (in the
  matlab/ directory).

* Added wrapper routines for calling FFTW from Fortran (in the
  fortran/ directory). (These were available separately before.)

Version 1.2.1

* Fixed a third bug in the MPI transpose routines (sheesh!) that
  could cause problems when re-using a transpose plan. Thanks
  to Eric Skyllingstad for the bug reports.

* Fixed another bug in the MPI transpose routines. This bug produced
  a memory leak and also occasionally tried to free a null pointer,
  which causes problems on some systems. The MPI transpose/fft routines
  now pass all of our malloc paranoia tests.

* Fixed bug in MPI transpose routines, where wrong results
  could be given for some large 2D arrays.

Version 1.2

* Added a FAQ (in the FAQ/ directory).

* Fixed bug in rfftwnd routines where a block was accidentally
  allocated to be too small, causing random memory to be
  overwritten (yikes!). (Amazingly, this bug only caused the
  test program to fail on one system that we could find. Our
  test suite can now catch this sort of bug.)

* Abstracted taking differences of times (with fftw_time_diff
  macro/function) to allow more general timer data structures.

* Added "wisdom" mechanism for saving plans & related info.

* Made timing mechanism more robust and maintainable. (Instead of
  using a fixed number of iterations, we now repeatedly double
  the number of iterations until a specified time interval
  (FFTW_TIME_MIN) is reached.)

* Fixed header files to prevent difficulties when a mix of C and
  C++ compilers is used, and to prevent problems with multiple
  inclusions.

* Added experimental distributed-memory transforms using MPI.

* Fixed memory leak in fftwnd_destroy_plan (reported by Richard
  Sullivan). Our test programs now all check for leaks.
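
The iteration-doubling scheme and the time-difference abstraction described in the timing items above can be sketched as follows. This is a simplified illustration, not FFTW's timing code: sketch_time, sketch_time_diff, and sketch_time_per_iteration are hypothetical names, the time_min parameter plays the role of FFTW_TIME_MIN, and the demo_* functions simulate a clock so that the result is deterministic.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical timer abstraction: an opaque timestamp type plus a
   function that takes the difference of two timestamps in seconds. */
typedef struct {
    double seconds;
} sketch_time;

typedef sketch_time (*sketch_clock_fn)(void);

static double sketch_time_diff(sketch_time t1, sketch_time t0)
{
    return t1.seconds - t0.seconds;
}

/* Time `work` by repeatedly doubling the iteration count until one batch
   takes at least `time_min` seconds. Returns the time per iteration and
   stores the final batch size in *iters_out, or returns -1.0 if the
   clock never advances far enough. */
double sketch_time_per_iteration(void (*work)(void), sketch_clock_fn now,
                                 double time_min, long *iters_out)
{
    long iters = 1;

    assert(work != NULL && now != NULL && iters_out != NULL);
    assert(time_min > 0.0);
    for (;;) {
        sketch_time start, stop;
        double elapsed;
        long i;

        start = now();
        for (i = 0; i < iters; ++i)
            work();
        stop = now();

        elapsed = sketch_time_diff(stop, start);
        if (elapsed >= time_min) {
            *iters_out = iters;
            return elapsed / (double) iters;
        }
        if (iters > LONG_MAX / 2)
            return -1.0;    /* give up rather than overflow the counter */
        iters *= 2;         /* batch too short to measure reliably: double it */
    }
}

/* Deterministic stand-ins for demonstration: each call to demo_work
   advances a simulated clock by one millisecond. */
static long demo_ticks = 0;

void demo_work(void)
{
    ++demo_ticks;
}

sketch_time demo_clock(void)
{
    sketch_time t;
    t.seconds = (double) demo_ticks * 0.001;
    return t;
}
```

Calling sketch_time_per_iteration(demo_work, demo_clock, 0.1, &iters) with the simulated clock doubles the batch size from 1 up to 128 iterations, which is the first batch lasting at least 0.1 simulated seconds.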

Version 1.1

* Improved speed (yes!), thanks to some clever tricks with twiddle
  factors and a better code generator.

* Renamed `blocks' to `codelets', just to be fashionable.

* Rewritten planner and executor--much simpler and more readable
  code. Reference-counter garbage collection employed throughout.

* Much improved codelet generator. The ML code should now be
  readable by humans, and easier to modify.

* Support for Prime Factor transforms in the codelet generator.

* Renamed COMPLEX -> FFTW_COMPLEX to avoid clashes with
  existing packages. COMPLEX is still supported
  for compatibility with 1.0.

* Added experimental real->complex transform (quick hack,
  use at your own risk).

* Added experimental parallel transforms using Cilk.

* Added experimental parallel transforms using threads (currently,
  POSIX threads and Solaris threads are implemented and tested).

* Added DOS support, in the sense that we now support 8.3 filenames.
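
Reference counting, mentioned in the planner and executor item above, can be sketched in a few lines of C. The code below is a generic illustration of the technique, not FFTW's implementation; every name in it (sketch_node, sketch_node_new, sketch_node_use, sketch_node_release, sketch_live_count) is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted node; not taken from the FFTW sources. */
typedef struct sketch_node {
    int refcnt;                  /* number of owners holding this node */
    struct sketch_node *child;   /* optional sub-node owned by this node */
} sketch_node;

static int sketch_live_nodes = 0;   /* bookkeeping so leaks can be detected */

/* Allocate a node holding one reference. The new node takes over one
   reference to `child` (which may be NULL). */
sketch_node *sketch_node_new(sketch_node *child)
{
    sketch_node *n = malloc(sizeof *n);

    assert(n != NULL);
    n->refcnt = 1;
    n->child = child;
    ++sketch_live_nodes;
    return n;
}

/* Register an additional owner of the node. */
sketch_node *sketch_node_use(sketch_node *n)
{
    assert(n != NULL && n->refcnt > 0);
    ++n->refcnt;
    return n;
}

/* Drop one reference; when none remain, release the child and free the node. */
void sketch_node_release(sketch_node *n)
{
    if (n == NULL)
        return;
    assert(n->refcnt > 0);
    if (--n->refcnt == 0) {
        sketch_node_release(n->child);
        free(n);
        --sketch_live_nodes;
    }
}

/* Number of nodes currently allocated; zero means nothing has leaked. */
int sketch_live_count(void)
{
    return sketch_live_nodes;
}
```

A node shared by two parents is freed only when the second parent is released, and a live-node counter of this kind is one simple way for a test program to check for leaks.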

Version 1.0: First release