Rev 3 | Details | Compare with Previous | Last modification | View Log | RSS feed
Rev | Author | Line No. | Line |
---|---|---|---|
2 | pj | 1 | \input texinfo @c -*- texinfo -*- |
2 | @c % $Id: fftw.tex,v 1.1.1.1 2002-03-29 14:12:55 pj Exp $ |
||
3 | @c %**start of header |
||
4 | @setfilename fftw.info |
||
5 | @settitle FFTW |
||
6 | @c %**end of header |
||
7 | |||
8 | @include version.texi |
||
9 | @setchapternewpage odd |
||
10 | @c define constant index (ct) |
||
11 | @defcodeindex ct |
||
12 | @syncodeindex ct fn |
||
13 | @syncodeindex vr fn |
||
14 | @syncodeindex pg fn |
||
15 | @syncodeindex tp fn |
||
16 | @c define foreign function index (ff) |
||
17 | @defcodeindex ff |
||
18 | @syncodeindex ff cp |
||
19 | @c define foreign constant index (fc) |
||
20 | @defcodeindex fc |
||
21 | @syncodeindex fc cp |
||
22 | @c define foreign program index (fp) |
||
23 | @defcodeindex fp |
||
24 | @syncodeindex fp cp |
||
25 | |||
26 | @ifinfo |
||
27 | This is the FFTW User's manual. |
||
28 | |||
29 | Copyright @copyright{} 1997--1999 Massachusetts Institute of Technology |
||
30 | |||
31 | Permission is granted to make and distribute verbatim copies of this |
||
32 | manual provided the copyright notice and this permission notice are |
||
33 | preserved on all copies. |
||
34 | |||
35 | Permission is granted to copy and distribute modified versions of this |
||
36 | manual under the conditions for verbatim copying, provided that the |
||
37 | entire resulting derived work is distributed under the terms of a |
||
38 | permission notice identical to this one. |
||
39 | |||
40 | Permission is granted to copy and distribute translations of this manual |
||
41 | into another language, under the above conditions for modified versions, |
||
42 | except that this permission notice may be stated in a translation |
||
43 | approved by the Free Software Foundation. |
||
44 | @end ifinfo |
||
45 | |||
46 | @titlepage |
||
47 | @sp 10 |
||
48 | @comment The title is printed in a large font. |
||
49 | @title{FFTW User's Manual} |
||
50 | @subtitle For version @value{VERSION}, @value{UPDATED} |
||
51 | @author{Matteo Frigo} |
||
52 | @author{Steven G. Johnson} |
||
53 | |||
54 | @c The following two commands start the copyright page. |
||
55 | @page |
||
56 | @vskip 0pt plus 1filll |
||
57 | Copyright @copyright{} 1997--1999 Massachusetts Institute of Technology. |
||
58 | |||
59 | Permission is granted to make and distribute verbatim copies of this |
||
60 | manual provided the copyright notice and this permission notice are |
||
61 | preserved on all copies. |
||
62 | |||
63 | Permission is granted to copy and distribute modified versions of this |
||
64 | manual under the conditions for verbatim copying, provided that the |
||
65 | entire resulting derived work is distributed under the terms of a |
||
66 | permission notice identical to this one. |
||
67 | |||
68 | Permission is granted to copy and distribute translations of this manual |
||
69 | into another language, under the above conditions for modified versions, |
||
70 | except that this permission notice may be stated in a translation |
||
71 | approved by the Free Software Foundation. |
||
72 | @end titlepage |
||
73 | |||
74 | @node Top, Introduction, (dir), (dir) |
||
75 | @ifinfo |
||
76 | @top FFTW User Manual |
||
77 | Welcome to FFTW, the Fastest Fourier Transform in the West. FFTW is a |
||
78 | collection of fast C routines to compute the discrete Fourier transform. |
||
79 | This manual documents FFTW version @value{VERSION}. |
||
80 | @end ifinfo |
||
81 | |||
82 | @menu |
||
83 | * Introduction:: |
||
84 | * Tutorial:: |
||
85 | * FFTW Reference:: |
||
86 | * Parallel FFTW:: |
||
87 | * Calling FFTW from Fortran:: |
||
88 | * Installation and Customization:: |
||
89 | * Acknowledgments:: |
||
90 | * License and Copyright:: |
||
91 | * Concept Index:: |
||
92 | * Library Index:: |
||
93 | |||
94 | @detailmenu --- The Detailed Node Listing --- |
||
95 | |||
96 | Tutorial |
||
97 | |||
98 | * Complex One-dimensional Transforms Tutorial:: |
||
99 | * Complex Multi-dimensional Transforms Tutorial:: |
||
100 | * Real One-dimensional Transforms Tutorial:: |
||
101 | * Real Multi-dimensional Transforms Tutorial:: |
||
102 | * Multi-dimensional Array Format:: |
||
103 | * Words of Wisdom:: |
||
104 | |||
105 | Multi-dimensional Array Format |
||
106 | |||
107 | * Row-major Format:: |
||
108 | * Column-major Format:: |
||
109 | * Static Arrays in C:: |
||
110 | * Dynamic Arrays in C:: |
||
111 | * Dynamic Arrays in C-The Wrong Way:: |
||
112 | |||
113 | Words of Wisdom |
||
114 | |||
115 | * Caveats in Using Wisdom:: What you should worry about in using wisdom |
||
116 | * Importing and Exporting Wisdom:: I/O of wisdom to disk and other media |
||
117 | |||
118 | FFTW Reference |
||
119 | |||
120 | * Data Types:: real, complex, and halfcomplex numbers |
||
121 | * One-dimensional Transforms Reference:: |
||
122 | * Multi-dimensional Transforms Reference:: |
||
123 | * Real One-dimensional Transforms Reference:: |
||
124 | * Real Multi-dimensional Transforms Reference:: |
||
125 | * Wisdom Reference:: |
||
126 | * Memory Allocator Reference:: |
||
127 | * Thread safety:: |
||
128 | |||
129 | One-dimensional Transforms Reference |
||
130 | |||
131 | * fftw_create_plan:: Plan Creation |
||
132 | * Discussion on Specific Plans:: |
||
133 | * fftw:: Plan Execution |
||
134 | * fftw_destroy_plan:: Plan Destruction |
||
135 | * What FFTW Really Computes:: Definition of the DFT. |
||
136 | |||
137 | Multi-dimensional Transforms Reference |
||
138 | |||
139 | * fftwnd_create_plan:: Plan Creation |
||
140 | * fftwnd:: Plan Execution |
||
141 | * fftwnd_destroy_plan:: Plan Destruction |
||
142 | * What FFTWND Really Computes:: |
||
143 | |||
144 | Real One-dimensional Transforms Reference |
||
145 | |||
146 | * rfftw_create_plan:: Plan Creation |
||
147 | * rfftw:: Plan Execution |
||
148 | * rfftw_destroy_plan:: Plan Destruction |
||
149 | * What RFFTW Really Computes:: |
||
150 | |||
151 | Real Multi-dimensional Transforms Reference |
||
152 | |||
153 | * rfftwnd_create_plan:: Plan Creation |
||
154 | * rfftwnd:: Plan Execution |
||
155 | * Array Dimensions for Real Multi-dimensional Transforms:: |
||
156 | * Strides in In-place RFFTWND:: |
||
157 | * rfftwnd_destroy_plan:: Plan Destruction |
||
158 | * What RFFTWND Really Computes:: |
||
159 | |||
160 | Wisdom Reference |
||
161 | |||
162 | * fftw_export_wisdom:: |
||
163 | * fftw_import_wisdom:: |
||
164 | * fftw_forget_wisdom:: |
||
165 | |||
166 | Parallel FFTW |
||
167 | |||
168 | * Multi-threaded FFTW:: |
||
169 | * MPI FFTW:: |
||
170 | |||
171 | Multi-threaded FFTW |
||
172 | |||
173 | * Installation and Supported Hardware/Software:: |
||
174 | * Usage of Multi-threaded FFTW:: |
||
175 | * How Many Threads to Use?:: |
||
176 | * Using Multi-threaded FFTW in a Multi-threaded Program:: |
||
177 | * Tips for Optimal Threading:: |
||
178 | |||
179 | MPI FFTW |
||
180 | |||
181 | * MPI FFTW Installation:: |
||
182 | * Usage of MPI FFTW for Complex Multi-dimensional Transforms:: |
||
183 | * MPI Data Layout:: |
||
184 | * Usage of MPI FFTW for Real Multi-dimensional Transforms:: |
||
185 | * Usage of MPI FFTW for Complex One-dimensional Transforms:: |
||
186 | * MPI Tips:: |
||
187 | |||
188 | Calling FFTW from Fortran |
||
189 | |||
190 | * Wrapper Routines:: |
||
191 | * FFTW Constants in Fortran:: |
||
192 | * Fortran Examples:: |
||
193 | |||
194 | Installation and Customization |
||
195 | |||
196 | * Installation on Unix:: |
||
197 | * Installation on non-Unix Systems:: |
||
198 | * Installing FFTW in both single and double precision:: |
||
199 | * gcc and Pentium/PentiumPro hacks:: |
||
200 | * Customizing the timer:: |
||
201 | * Generating your own code:: |
||
202 | |||
203 | @end detailmenu |
||
204 | @end menu |
||
205 | |||
206 | @c ************************************************************ |
||
207 | @node Introduction, Tutorial, Top, Top |
||
208 | @chapter Introduction |
||
209 | This manual documents version @value{VERSION} of FFTW, the @emph{Fastest |
||
210 | Fourier Transform in the West}. FFTW is a comprehensive collection of |
||
211 | fast C routines for computing the discrete Fourier transform (DFT) in |
||
212 | one or more dimensions, of both real and complex data, and of arbitrary |
||
213 | input size. FFTW also includes parallel transforms for both shared- and |
||
214 | distributed-memory systems. We assume herein that the reader is already |
||
215 | familiar with the properties and uses of the DFT that are relevant to |
||
216 | her application. Otherwise, see e.g. @cite{The Fast Fourier Transform} |
||
217 | by E. O. Brigham (Prentice-Hall, Englewood Cliffs, NJ, 1974). |
||
218 | @uref{http://theory.lcs.mit.edu/~fftw, Our web page} also has links to |
||
219 | FFT-related information online. |
||
220 | @cindex FFTW |
||
221 | |||
222 | FFTW is usually faster (and sometimes much faster) than all other |
||
223 | freely-available Fourier transform programs found on the Net. For |
||
224 | transforms whose size is a power of two, it compares favorably with the |
||
225 | FFT codes in Sun's Performance Library and IBM's ESSL library, which are |
||
226 | targeted at specific machines. Moreover, FFTW's performance is |
||
227 | @emph{portable}. Indeed, FFTW is unique in that it automatically adapts |
||
228 | itself to your machine, your cache, the size of your memory, the number |
||
229 | of registers, and all the other factors that normally make it impossible |
||
230 | to optimize a program for more than one machine. An extensive |
||
231 | comparison of FFTW's performance with that of other Fourier transform |
||
232 | codes has been made. The results are available on the Web at |
||
233 | @uref{http://theory.lcs.mit.edu/~benchfft, the benchFFT home page}. |
||
234 | @cindex benchmark |
||
235 | @fpindex benchfft |
||
236 | |||
237 | In order to use FFTW effectively, you need to understand one basic |
||
238 | concept of FFTW's internal structure. FFTW does not used a fixed |
||
239 | algorithm for computing the transform, but it can adapt the DFT |
||
240 | algorithm to details of the underlying hardware in order to achieve best |
||
241 | performance. Hence, the computation of the transform is split into two |
||
242 | phases. First, FFTW's @dfn{planner} is called, which ``learns'' the |
||
243 | @cindex plan |
||
244 | fastest way to compute the transform on your machine. The planner |
||
245 | @cindex planner |
||
246 | produces a data structure called a @dfn{plan} that contains this |
||
247 | information. Subsequently, the plan is passed to FFTW's @dfn{executor}, |
||
248 | @cindex executor |
||
249 | along with an array of input data. The executor computes the actual |
||
250 | transform, as dictated by the plan. The plan can be reused as many |
||
251 | times as needed. In typical high-performance applications, many |
||
252 | transforms of the same size are computed, and consequently a |
||
253 | relatively-expensive initialization of this sort is acceptable. On the |
||
254 | other hand, if you need a single transform of a given size, the one-time |
||
255 | cost of the planner becomes significant. For this case, FFTW provides |
||
256 | fast planners based on heuristics or on previously computed plans. |
||
257 | |||
258 | The pattern of planning/execution applies to all four operation modes of |
||
259 | FFTW, that is, @w{I) one-dimensional} complex transforms (FFTW), @w{II) |
||
260 | multi-dimensional} complex transforms (FFTWND), @w{III) one-dimensional} |
||
261 | transforms of real data (RFFTW), @w{IV) multi-dimensional} transforms of |
||
262 | real data (RFFTWND). Each mode comes with its own planner and executor. |
||
263 | |||
264 | Besides the automatic performance adaptation performed by the planner, |
||
265 | it is also possible for advanced users to customize FFTW for their |
||
266 | special needs. As distributed, FFTW works most efficiently for arrays |
||
267 | whose size can be factored into small primes (@math{2}, @math{3}, |
||
268 | @math{5}, and @math{7}), and uses a slower general-purpose routine for |
||
269 | other factors. FFTW, however, comes with a code generator that can |
||
270 | produce fast C programs for any particular array size you may care |
||
271 | about. |
||
272 | @cindex code generator |
||
273 | For example, if you need transforms of size |
||
274 | @ifinfo |
||
275 | @math{513 = 19 x 3^3}, |
||
276 | @end ifinfo |
||
277 | @iftex |
||
278 | @tex |
||
279 | $513 = 19 \cdot 3^3$, |
||
280 | @end tex |
||
281 | @end iftex |
||
282 | @ifhtml |
||
283 | 513 = 19*3<sup>3</sup>, |
||
284 | @end ifhtml |
||
285 | you can customize FFTW to support the factor @math{19} efficiently. |
||
286 | |||
287 | FFTW can exploit multiple processors if you have them. FFTW comes with |
||
288 | a shared-memory implementation on top of POSIX (and similar) threads, as |
||
289 | well as a distributed-memory implementation based on MPI. |
||
290 | @cindex parallel transform |
||
291 | @cindex threads |
||
292 | @cindex MPI |
||
293 | We also provide an experimental parallel implementation written in Cilk, |
||
294 | @emph{the superior programming tool of choice for discriminating |
||
295 | hackers} (Olin Shivers). (See @uref{http://supertech.lcs.mit.edu/cilk, |
||
296 | the Cilk home page}.) |
||
297 | @cindex Cilk |
||
298 | |||
299 | For more information regarding FFTW, see the paper, ``The Fastest |
||
300 | Fourier Transform in the West,'' by M. Frigo and S. G. Johnson, which is |
||
301 | the technical report MIT-LCS-TR-728 (Sep. '97). See also, ``FFTW: An |
||
302 | Adaptive Software Architecture for the FFT,'' by M. Frigo and |
||
303 | S. G. Johnson, which appeared in the 23rd International Conference on |
||
304 | Acoustics, Speech, and Signal Processing (@cite{Proc. ICASSP 1998} |
||
305 | @b{3}, p. 1381). The code generator is described in the paper ``A Fast |
||
306 | Fourier Transform Compiler'', |
||
307 | @cindex compiler |
||
308 | by M. Frigo, to appear in the @cite{Proceedings of the 1999 ACM SIGPLAN |
||
309 | Conference on Programming Language Design and Implementation (PLDI), |
||
310 | Atlanta, Georgia, May 1999}. These papers, along with the latest |
||
311 | version of FFTW, the FAQ, benchmarks, and other links, are available at |
||
312 | @uref{http://theory.lcs.mit.edu/~fftw, the FFTW home page}. The current |
||
313 | version of FFTW incorporates many good ideas from the past thirty years |
||
314 | of FFT literature. In one way or another, FFTW uses the Cooley-Tukey |
||
315 | algorithm, the Prime Factor algorithm, Rader's algorithm for prime |
||
316 | sizes, and the split-radix algorithm (with a variation due to Dan |
||
317 | Bernstein). Our code generator also produces new algorithms that we do |
||
318 | not yet completely understand. |
||
319 | @cindex algorithm |
||
320 | The reader is referred to the cited papers for the appropriate |
||
321 | references. |
||
322 | |||
323 | The rest of this manual is organized as follows. We first discuss the |
||
324 | sequential (one-processor) implementation. We start by describing the |
||
325 | basic features of FFTW in @ref{Tutorial}. This discussion includes the |
||
326 | storage scheme of multi-dimensional arrays (@ref{Multi-dimensional Array |
||
327 | Format}) and FFTW's mechanisms for storing plans on disk (@ref{Words of |
||
328 | Wisdom}). Next, @ref{FFTW Reference} provides comprehensive |
||
329 | documentation of all FFTW's features. Parallel transforms are discussed |
||
330 | in their own chapter @ref{Parallel FFTW}. Fortran programmers can also |
||
331 | use FFTW, as described in @ref{Calling FFTW from Fortran}. |
||
332 | @ref{Installation and Customization} explains how to install FFTW in |
||
333 | your computer system and how to adapt FFTW to your needs. License and |
||
334 | copyright information is given in @ref{License and Copyright}. Finally, |
||
335 | we thank all the people who helped us in @ref{Acknowledgments}. |
||
336 | |||
337 | @c ************************************************************ |
||
338 | @node Tutorial, FFTW Reference, Introduction, Top |
||
339 | @chapter Tutorial |
||
340 | @cindex Tutorial |
||
341 | This chapter describes the basic usage of FFTW, i.e., how to compute the |
||
342 | Fourier transform of a single array. This chapter tells the truth, but |
||
343 | not the @emph{whole} truth. Specifically, FFTW implements additional |
||
344 | routines and flags, providing extra functionality, that are not |
||
345 | documented here. @xref{FFTW Reference}, for more complete information. |
||
346 | (Note that you need to compile and install FFTW before you can use it in |
||
347 | a program. @xref{Installation and Customization}, for the details of |
||
348 | the installation.) |
||
349 | |||
350 | This tutorial chapter is structured as follows. @ref{Complex |
||
351 | One-dimensional Transforms Tutorial} describes the basic usage of the |
||
352 | one-dimensional transform of complex data. @ref{Complex |
||
353 | Multi-dimensional Transforms Tutorial} describes the basic usage of the |
||
354 | multi-dimensional transform of complex data. @ref{Real One-dimensional |
||
355 | Transforms Tutorial} describes the one-dimensional transform of real |
||
356 | data and its inverse. Finally, @ref{Real Multi-dimensional Transforms |
||
357 | Tutorial} describes the multi-dimensional transform of real data and its |
||
358 | inverse. We recommend that you read these sections in the order that |
||
359 | they are presented. We then discuss two topics in detail. In |
||
360 | @ref{Multi-dimensional Array Format}, we discuss the various |
||
361 | alternatives for storing multi-dimensional arrays in memory. @ref{Words |
||
362 | of Wisdom} shows how you can save FFTW's plans for future use. |
||
363 | |||
364 | @menu |
||
365 | * Complex One-dimensional Transforms Tutorial:: |
||
366 | * Complex Multi-dimensional Transforms Tutorial:: |
||
367 | * Real One-dimensional Transforms Tutorial:: |
||
368 | * Real Multi-dimensional Transforms Tutorial:: |
||
369 | * Multi-dimensional Array Format:: |
||
370 | * Words of Wisdom:: |
||
371 | @end menu |
||
372 | |||
373 | @node Complex One-dimensional Transforms Tutorial, Complex Multi-dimensional Transforms Tutorial, Tutorial, Tutorial |
||
374 | @section Complex One-dimensional Transforms Tutorial |
||
375 | @cindex complex one-dimensional transform |
||
376 | @cindex complex transform |
||
377 | |||
378 | The basic usage of FFTW is simple. A typical call to FFTW looks like: |
||
379 | |||
380 | @example |
||
381 | #include <fftw.h> |
||
382 | ... |
||
383 | @{ |
||
384 | fftw_complex in[N], out[N]; |
||
385 | fftw_plan p; |
||
386 | ... |
||
387 | p = fftw_create_plan(N, FFTW_FORWARD, FFTW_ESTIMATE); |
||
388 | ... |
||
389 | fftw_one(p, in, out); |
||
390 | ... |
||
391 | fftw_destroy_plan(p); |
||
392 | @} |
||
393 | @end example |
||
394 | |||
395 | The first thing we do is to create a @dfn{plan}, which is an object |
||
396 | @cindex plan |
||
397 | that contains all the data that FFTW needs to compute the FFT, using the |
||
398 | following function: |
||
399 | |||
400 | @example |
||
401 | fftw_plan fftw_create_plan(int n, fftw_direction dir, int flags); |
||
402 | @end example |
||
403 | @findex fftw_create_plan |
||
404 | @findex fftw_direction |
||
405 | @tindex fftw_plan |
||
406 | |||
407 | The first argument, @code{n}, is the size of the transform you are |
||
408 | trying to compute. The size @code{n} can be any positive integer, but |
||
409 | sizes that are products of small factors are transformed most |
||
410 | efficiently. The second argument, @code{dir}, can be either |
||
411 | @code{FFTW_FORWARD} or @code{FFTW_BACKWARD}, and indicates the direction |
||
412 | of the transform you |
||
413 | @ctindex FFTW_FORWARD |
||
414 | @ctindex FFTW_BACKWARD |
||
415 | are interested in. Alternatively, you can use the sign of the exponent |
||
416 | in the transform, @math{-1} or @math{+1}, which corresponds to |
||
417 | @code{FFTW_FORWARD} or @code{FFTW_BACKWARD} respectively. The |
||
418 | @code{flags} argument is either @code{FFTW_MEASURE} or |
||
419 | @cindex flags |
||
420 | @code{FFTW_ESTIMATE}. @code{FFTW_MEASURE} means that FFTW actually runs |
||
421 | @ctindex FFTW_MEASURE |
||
422 | and measures the execution time of several FFTs in order to find the |
||
423 | best way to compute the transform of size @code{n}. This may take some |
||
424 | time, depending on your installation and on the precision of the timer |
||
425 | in your machine. @code{FFTW_ESTIMATE}, on the contrary, does not run |
||
426 | any computation, and just builds a |
||
427 | @ctindex FFTW_ESTIMATE |
||
428 | reasonable plan, which may be sub-optimal. In other words, if your |
||
429 | program performs many transforms of the same size and initialization |
||
430 | time is not important, use @code{FFTW_MEASURE}; otherwise use the |
||
431 | estimate. (A compromise between these two extremes exists. @xref{Words |
||
432 | of Wisdom}.) |
||
433 | |||
434 | Once the plan has been created, you can use it as many times as you like |
||
435 | for transforms on arrays of the same size. When you are done with the |
||
436 | plan, you deallocate it by calling @code{fftw_destroy_plan(plan)}. |
||
437 | @findex fftw_destroy_plan |
||
438 | |||
439 | The transform itself is computed by passing the plan along with the |
||
440 | input and output arrays to @code{fftw_one}: |
||
441 | |||
442 | @example |
||
443 | void fftw_one(fftw_plan plan, fftw_complex *in, fftw_complex *out); |
||
444 | @end example |
||
445 | @findex fftw_one |
||
446 | |||
447 | Note that the transform is out of place: @code{in} and @code{out} must |
||
448 | point to distinct arrays. It operates on data of type |
||
449 | @code{fftw_complex}, a data structure with real (@code{in[i].re}) and |
||
450 | imaginary (@code{in[i].im}) floating-point components. The @code{in} |
||
451 | and @code{out} arrays should have the length specified when the plan was |
||
452 | created. An alternative function, @code{fftw}, allows you to |
||
453 | efficiently perform multiple and/or strided transforms (@pxref{FFTW |
||
454 | Reference}). |
||
455 | @tindex fftw_complex |
||
456 | |||
457 | The DFT results are stored in-order in the array @code{out}, with the |
||
458 | zero-frequency (DC) component in @code{out[0]}. |
||
459 | @cindex frequency |
||
460 | The array @code{in} is not modified. Users should note that FFTW |
||
461 | computes an unnormalized DFT, the sign of whose exponent is given by the |
||
462 | @code{dir} parameter of @code{fftw_create_plan}. Thus, computing a |
||
463 | forward followed by a backward transform (or vice versa) results in the |
||
464 | original array scaled by @code{n}. @xref{What FFTW Really Computes}, |
||
465 | for the definition of DFT. |
||
466 | @cindex normalization |
||
467 | |||
468 | A program using FFTW should be linked with @code{-lfftw -lm} on Unix |
||
469 | systems, or with the FFTW and standard math libraries in general. |
||
470 | @cindex linking on Unix |
||
471 | |||
472 | @node Complex Multi-dimensional Transforms Tutorial, Real One-dimensional Transforms Tutorial, Complex One-dimensional Transforms Tutorial, Tutorial |
||
473 | @section Complex Multi-dimensional Transforms Tutorial |
||
474 | @cindex complex multi-dimensional transform |
||
475 | @cindex multi-dimensional transform |
||
476 | |||
477 | FFTW can also compute transforms of any number of dimensions |
||
478 | (@dfn{rank}). The syntax is similar to that for the one-dimensional |
||
479 | @cindex rank |
||
480 | transforms, with @samp{fftw_} replaced by @samp{fftwnd_} (which stands |
||
481 | for ``@code{fftw} in @code{N} dimensions''). |
||
482 | |||
483 | As before, we @code{#include <fftw.h>} and create a plan for the |
||
484 | transforms, this time of type @code{fftwnd_plan}: |
||
485 | |||
486 | @example |
||
487 | fftwnd_plan fftwnd_create_plan(int rank, const int *n, |
||
488 | fftw_direction dir, int flags); |
||
489 | @end example |
||
490 | @tindex fftwnd_plan |
||
491 | @tindex fftw_direction |
||
492 | @findex fftwnd_create_plan |
||
493 | |||
494 | @code{rank} is the dimensionality of the array, and can be any |
||
495 | non-negative integer. The next argument, @code{n}, is a pointer to an |
||
496 | integer array of length @code{rank} containing the (positive) sizes of |
||
497 | each dimension of the array. (Note that the array will be stored in |
||
498 | row-major order. @xref{Multi-dimensional Array Format}, for information |
||
499 | on row-major order.) The last two parameters are the same as in |
||
500 | @code{fftw_create_plan}. We now, however, have an additional possible |
||
501 | flag, @code{FFTW_IN_PLACE}, since @code{fftwnd} supports true in-place |
||
502 | @cindex flags |
||
503 | @ctindex FFTW_IN_PLACE |
||
504 | @findex fftwnd |
||
505 | transforms. Multiple flags are combined using a bitwise @dfn{or} |
||
506 | (@samp{|}). (An @dfn{in-place} transform is one in which the output |
||
507 | data overwrite the input data. It thus requires half as much memory |
||
508 | as---and is often faster than---its opposite, an @dfn{out-of-place} |
||
509 | transform.) |
||
510 | @cindex in-place transform |
||
511 | @cindex out-of-place transform |
||
512 | |||
513 | For two- and three-dimensional transforms, FFTWND provides alternative |
||
514 | routines that accept the sizes of each dimension directly, rather than |
||
515 | indirectly through a rank and an array of sizes. These are otherwise |
||
516 | identical to @code{fftwnd_create_plan}, and are sometimes more |
||
517 | convenient: |
||
518 | |||
519 | @example |
||
520 | fftwnd_plan fftw2d_create_plan(int nx, int ny, |
||
521 | fftw_direction dir, int flags); |
||
522 | fftwnd_plan fftw3d_create_plan(int nx, int ny, int nz, |
||
523 | fftw_direction dir, int flags); |
||
524 | @end example |
||
525 | @findex fftw2d_create_plan |
||
526 | @findex fftw3d_create_plan |
||
527 | |||
528 | Once the plan has been created, you can use it any number of times for |
||
529 | transforms of the same size. When you do not need a plan anymore, you |
||
530 | can deallocate the plan by calling @code{fftwnd_destroy_plan(plan)}. |
||
531 | @findex fftwnd_destroy_plan |
||
532 | |||
533 | Given a plan, you can compute the transform of an array of data by |
||
534 | calling: |
||
535 | |||
536 | @example |
||
537 | void fftwnd_one(fftwnd_plan plan, fftw_complex *in, fftw_complex *out); |
||
538 | @end example |
||
539 | @findex fftwnd_one |
||
540 | |||
541 | Here, @code{in} and @code{out} point to multi-dimensional arrays in |
||
542 | row-major order, of the size specified when the plan was created. In |
||
543 | the case of an in-place transform, the @code{out} parameter is ignored |
||
544 | and the output data are stored in the @code{in} array. The results are |
||
545 | stored in-order, unnormalized, with the zero-frequency component in |
||
546 | @code{out[0]}. |
||
547 | @cindex frequency |
||
548 | A forward followed by a backward transform (or vice-versa) yields the |
||
549 | original data multiplied by the size of the array (i.e. the product of |
||
550 | the dimensions). @xref{What FFTWND Really Computes}, for a discussion |
||
551 | of what FFTWND computes. |
||
552 | @cindex normalization |
||
553 | |||
554 | For example, code to perform an in-place FFT of a three-dimensional |
||
555 | array might look like: |
||
556 | |||
557 | @example |
||
558 | #include <fftw.h> |
||
559 | ... |
||
560 | @{ |
||
561 | fftw_complex in[L][M][N]; |
||
562 | fftwnd_plan p; |
||
563 | ... |
||
564 | p = fftw3d_create_plan(L, M, N, FFTW_FORWARD, |
||
565 | FFTW_MEASURE | FFTW_IN_PLACE); |
||
566 | ... |
||
567 | fftwnd_one(p, &in[0][0][0], NULL); |
||
568 | ... |
||
569 | fftwnd_destroy_plan(p); |
||
570 | @} |
||
571 | @end example |
||
572 | |||
573 | Note that @code{in} is a statically-declared array, which is |
||
574 | automatically in row-major order, but we must take the address of the |
||
575 | first element in order to fit the type expected by @code{fftwnd_one}. |
||
576 | (@xref{Multi-dimensional Array Format}.) |
||
577 | |||
578 | @node Real One-dimensional Transforms Tutorial, Real Multi-dimensional Transforms Tutorial, Complex Multi-dimensional Transforms Tutorial, Tutorial |
||
579 | @section Real One-dimensional Transforms Tutorial |
||
580 | @cindex real transform |
||
581 | @cindex complex to real transform |
||
582 | @cindex RFFTW |
||
583 | |||
584 | If the input data are purely real, you can save roughly a factor of two |
||
585 | in both time and storage by using the @dfn{rfftw} transforms, which are |
||
586 | FFTs specialized for real data. The output of a such a transform is a |
||
587 | @dfn{halfcomplex} array, which consists of only half of the complex DFT |
||
588 | amplitudes (since the negative-frequency amplitudes for real data are |
||
589 | the complex conjugate of the positive-frequency amplitudes). |
||
590 | @cindex halfcomplex array |
||
591 | |||
592 | In exchange for these speed and space advantages, the user sacrifices |
||
593 | some of the simplicity of FFTW's complex transforms. First of all, to |
||
594 | allow maximum performance, the output format of the one-dimensional real |
||
595 | transforms is different from that used by the multi-dimensional |
||
596 | transforms. Second, the inverse transform (halfcomplex to real) has the |
||
597 | side-effect of destroying its input array. Neither of these |
||
598 | inconveniences should pose a serious problem for users, but it is |
||
599 | important to be aware of them. (Both the inconvenient output format |
||
600 | and the side-effect of the inverse transform can be ameliorated for |
||
601 | one-dimensional transforms, at the expense of some performance, by using |
||
602 | instead the multi-dimensional transform routines with a rank of one.) |
||
603 | |||
604 | The computation of the plan is similar to that for the complex |
||
605 | transforms. First, you @code{#include <rfftw.h>}. Then, you create a |
||
606 | plan (of type @code{rfftw_plan}) by calling: |
||
607 | |||
608 | @example |
||
609 | rfftw_plan rfftw_create_plan(int n, fftw_direction dir, int flags); |
||
610 | @end example |
||
611 | @tindex rfftw_plan |
||
612 | @tindex fftw_direction |
||
613 | @findex rfftw_create_plan |
||
614 | |||
615 | @code{n} is the length of the @emph{real} array in the transform (even |
||
616 | for halfcomplex-to-real transforms), and can be any positive integer |
||
617 | (although sizes with small factors are transformed more efficiently). |
||
618 | @code{dir} is either @code{FFTW_REAL_TO_COMPLEX} or |
||
619 | @code{FFTW_COMPLEX_TO_REAL}. |
||
620 | @ctindex FFTW_REAL_TO_COMPLEX |
||
621 | @ctindex FFTW_COMPLEX_TO_REAL |
||
622 | The @code{flags} parameter is the same as in @code{fftw_create_plan}. |
||
623 | |||
624 | Once created, a plan can be used for any number of transforms, and is |
||
625 | deallocated when you are done with it by calling |
||
626 | @code{rfftw_destroy_plan(plan)}. |
||
627 | @findex rfftw_destroy_plan |
||
628 | |||
629 | Given a plan, a real-to-complex or complex-to-real transform is computed |
||
630 | by calling: |
||
631 | |||
632 | @example |
||
633 | void rfftw_one(rfftw_plan plan, fftw_real *in, fftw_real *out); |
||
634 | @end example |
||
635 | @findex rfftw_one |
||
636 | |||
637 | (Note that @code{fftw_real} is an alias for the floating-point type for |
||
638 | which FFTW was compiled.) Depending upon the direction of the plan, |
||
639 | either the input or the output array is halfcomplex, and is stored in |
||
640 | the following format: |
||
641 | @cindex halfcomplex array |
||
642 | |||
643 | @iftex |
||
644 | @tex |
||
645 | $$ |
||
646 | r_0, r_1, r_2, \ldots, r_{n/2}, i_{(n+1)/2-1}, \ldots, i_2, i_1 |
||
647 | $$ |
||
648 | @end tex |
||
649 | @end iftex |
||
650 | @ifinfo |
||
651 | r0, r1, r2, r(n/2), i((n+1)/2-1), ..., i2, i1 |
||
652 | @end ifinfo |
||
653 | @ifhtml |
||
654 | <p align=center> |
||
655 | r<sub>0</sub>, r<sub>1</sub>, r<sub>2</sub>, ..., r<sub>n/2</sub>, i<sub>(n+1)/2-1</sub>, ..., i<sub>2</sub>, i<sub>1</sub> |
||
656 | </p> |
||
657 | @end ifhtml |
||
658 | |||
659 | Here, |
||
660 | @ifinfo |
||
661 | rk |
||
662 | @end ifinfo |
||
663 | @iftex |
||
664 | @tex |
||
665 | $r_k$ |
||
666 | @end tex |
||
667 | @end iftex |
||
668 | @ifhtml |
||
669 | r<sub>k</sub> |
||
670 | @end ifhtml |
||
671 | is the real part of the @math{k}th output, and |
||
672 | @ifinfo |
||
673 | ik |
||
674 | @end ifinfo |
||
675 | @iftex |
||
676 | @tex |
||
677 | $i_k$ |
||
678 | @end tex |
||
679 | @end iftex |
||
680 | @ifhtml |
||
681 | i<sub>k</sub> |
||
682 | @end ifhtml |
||
683 | is the imaginary part. (We follow here the C convention that integer |
||
684 | division is rounded down, e.g. @math{7 / 2 = 3}.) For a halfcomplex |
||
685 | array @code{hc[]}, the @math{k}th component has its real part in |
||
686 | @code{hc[k]} and its imaginary part in @code{hc[n-k]}, with the |
||
687 | exception of @code{k} @code{==} @code{0} or @code{n/2} (the latter only |
||
688 | if n is even)---in these two cases, the imaginary part is zero due to |
||
689 | symmetries of the real-complex transform, and is not stored. Thus, the |
||
690 | transform of @code{n} real values is a halfcomplex array of length |
||
691 | @code{n}, and vice versa. @footnote{The output for the |
||
692 | multi-dimensional rfftw is a more-conventional array of |
||
693 | @code{fftw_complex} values, but the format here permitted us greater |
||
694 | efficiency in one dimension.} This is actually only half of the DFT |
||
695 | spectrum of the data. Although the other half can be obtained by |
||
696 | complex conjugation, it is not required by many applications such as |
||
697 | convolution and filtering. |
||
698 | |||
699 | Like the complex transforms, the RFFTW transforms are unnormalized, so a |
||
700 | forward followed by a backward transform (or vice-versa) yields the |
||
701 | original data scaled by the length of the array, @code{n}. |
||
702 | @cindex normalization |
||
703 | |||
704 | Let us reiterate here our warning that an @code{FFTW_COMPLEX_TO_REAL} |
||
705 | transform has the side-effect of destroying its (halfcomplex) input. |
||
706 | The @code{FFTW_REAL_TO_COMPLEX} transform, however, leaves its (real) |
||
707 | input untouched, just as you would hope. |
||
708 | |||
709 | As an example, here is an outline of how you might use RFFTW to compute |
||
710 | the power spectrum of a real array (i.e. the squares of the absolute |
||
711 | values of the DFT amplitudes): |
||
712 | @cindex power spectrum |
||
713 | |||
714 | @example |
||
715 | #include <rfftw.h> |
||
716 | ... |
||
717 | @{ |
||
718 | fftw_real in[N], out[N], power_spectrum[N/2+1]; |
||
719 | rfftw_plan p; |
||
720 | int k; |
||
721 | ... |
||
722 | p = rfftw_create_plan(N, FFTW_REAL_TO_COMPLEX, FFTW_ESTIMATE); |
||
723 | ... |
||
724 | rfftw_one(p, in, out); |
||
725 | power_spectrum[0] = out[0]*out[0]; /* DC component */ |
||
726 | for (k = 1; k < (N+1)/2; ++k) /* (k < N/2 rounded up) */ |
||
727 | power_spectrum[k] = out[k]*out[k] + out[N-k]*out[N-k]; |
||
728 | if (N % 2 == 0) /* N is even */ |
||
729 | power_spectrum[N/2] = out[N/2]*out[N/2]; /* Nyquist freq. */ |
||
730 | ... |
||
731 | rfftw_destroy_plan(p); |
||
732 | @} |
||
733 | @end example |
||
734 | |||
735 | Programs using RFFTW should link with @code{-lrfftw -lfftw -lm} on Unix, |
||
736 | or with the FFTW, RFFTW, and math libraries in general. |
||
737 | @cindex linking on Unix |
||
738 | |||
739 | @node Real Multi-dimensional Transforms Tutorial, Multi-dimensional Array Format, Real One-dimensional Transforms Tutorial, Tutorial |
||
740 | @section Real Multi-dimensional Transforms Tutorial |
||
741 | @cindex real multi-dimensional transform |
||
742 | |||
743 | FFTW includes multi-dimensional transforms for real data of any rank. |
||
744 | As with the one-dimensional real transforms, they save roughly a factor |
||
745 | of two in time and storage over complex transforms of the same size. |
||
746 | Also as in one dimension, these gains come at the expense of some |
||
747 | increase in complexity---the output format is different from the |
||
748 | one-dimensional RFFTW (and is more similar to that of the complex FFTW) |
||
749 | and the inverse (complex to real) transforms have the side-effect of |
||
750 | overwriting their input data. |
||
751 | |||
752 | To use the real multi-dimensional transforms, you first @code{#include |
||
753 | <rfftw.h>} and then create a plan for the size and direction of |
||
754 | transform that you are interested in: |
||
755 | |||
756 | @example |
||
757 | rfftwnd_plan rfftwnd_create_plan(int rank, const int *n, |
||
758 | fftw_direction dir, int flags); |
||
759 | @end example |
||
760 | @tindex rfftwnd_plan |
||
761 | @findex rfftwnd_create_plan |
||
762 | |||
763 | The first two parameters describe the size of the real data (not the |
||
764 | halfcomplex data, which will have different dimensions). The last two |
||
765 | parameters are the same as those for @code{rfftw_create_plan}. Just as |
||
766 | for fftwnd, there are two alternate versions of this routine, |
||
767 | @code{rfftw2d_create_plan} and @code{rfftw3d_create_plan}, that are |
||
768 | sometimes more convenient for two- and three-dimensional transforms. |
||
769 | @findex rfftw2d_create_plan |
||
770 | @findex rfftw3d_create_plan |
||
771 | Also as in fftwnd, rfftwnd supports true in-place transforms, specified |
||
772 | by including @code{FFTW_IN_PLACE} in the flags. |
||
773 | |||
774 | Once created, a plan can be used for any number of transforms, and is |
||
775 | deallocated by calling @code{rfftwnd_destroy_plan(plan)}. |
||
776 | |||
777 | Given a plan, the transform is computed by calling one of the following |
||
778 | two routines: |
||
779 | |||
780 | @example |
||
781 | void rfftwnd_one_real_to_complex(rfftwnd_plan plan, |
||
782 | fftw_real *in, fftw_complex *out); |
||
783 | void rfftwnd_one_complex_to_real(rfftwnd_plan plan, |
||
784 | fftw_complex *in, fftw_real *out); |
||
785 | @end example |
||
786 | @findex rfftwnd_one_real_to_complex |
||
787 | @findex rfftwnd_one_complex_to_real |
||
788 | |||
789 | As is clear from their names and parameter types, the former function is |
||
790 | for @code{FFTW_REAL_TO_COMPLEX} transforms and the latter is for |
||
791 | @code{FFTW_COMPLEX_TO_REAL} transforms. (We could have used only a |
||
792 | single routine, since the direction of the transform is encoded in the |
||
793 | plan, but we wanted to correctly express the datatypes of the |
||
794 | parameters.) The latter routine, as we discuss elsewhere, has the |
||
795 | side-effect of overwriting its input (except when the rank of the array |
||
796 | is one). In both cases, the @code{out} parameter is ignored for |
||
797 | in-place transforms. |
||
798 | |||
799 | The format of the complex arrays deserves careful attention. |
||
800 | @cindex rfftwnd array format |
||
801 | Suppose that the real data has dimensions |
||
802 | @iftex |
||
803 | @tex |
||
804 | $n_1 \times n_2 \times \cdots \times n_d$ |
||
805 | @end tex |
||
806 | @end iftex |
||
807 | @ifinfo |
||
808 | n1 x n2 x ... x nd |
||
809 | @end ifinfo |
||
810 | @ifhtml |
||
811 | n<sub>1</sub> x n<sub>2</sub> x ... x n<sub>d</sub> |
||
812 | @end ifhtml |
||
813 | (in row-major order). Then, after a real-to-complex transform, the |
||
814 | output is an |
||
815 | @iftex |
||
816 | @tex |
||
817 | $n_1 \times n_2 \times \cdots \times (n_d/2+1)$ |
||
818 | @end tex |
||
819 | @end iftex |
||
820 | @ifinfo |
||
821 | n1 x n2 x ... x (nd/2+1) |
||
822 | @end ifinfo |
||
823 | @ifhtml |
||
824 | n<sub>1</sub> x n<sub>2</sub> x ... x (n<sub>d</sub>/2+1) |
||
825 | @end ifhtml |
||
826 | array of @code{fftw_complex} values in row-major order, corresponding to |
||
827 | slightly over half of the output of the corresponding complex transform. |
||
828 | (Note that the division is rounded down.) The ordering of the data is |
||
829 | otherwise exactly the same as in the complex case. (In principle, the |
||
830 | output could be exactly half the size of the complex transform output, |
||
831 | but in more than one dimension this requires too complicated a format to |
||
832 | be practical.) Note that, unlike the one-dimensional RFFTW, the real |
||
833 | and imaginary parts of the DFT amplitudes are here stored together in |
||
834 | the natural way. |
||
835 | |||
836 | Since the complex data is slightly larger than the real data, some |
||
837 | complications arise for in-place transforms. In this case, the final |
||
838 | dimension of the real data must be padded with extra values to |
||
839 | accommodate the size of the complex data---two extra if the last |
||
840 | dimension is even and one if it is odd. |
||
841 | @cindex padding |
||
842 | That is, the last dimension of the real data must physically contain |
||
843 | @iftex |
||
844 | @tex |
||
845 | $2 (n_d/2+1)$ |
||
846 | @end tex |
||
847 | @end iftex |
||
848 | @ifinfo |
||
849 | 2 * (nd/2+1) |
||
850 | @end ifinfo |
||
851 | @ifhtml |
||
852 | 2 * (n<sub>d</sub>/2+1) |
||
853 | @end ifhtml |
||
854 | @code{fftw_real} values (exactly enough to hold the complex data). |
||
855 | This physical array size does not, however, change the @emph{logical} |
||
856 | array size---only |
||
857 | @iftex |
||
858 | @tex |
||
859 | $n_d$ |
||
860 | @end tex |
||
861 | @end iftex |
||
862 | @ifinfo |
||
863 | nd |
||
864 | @end ifinfo |
||
865 | @ifhtml |
||
866 | n<sub>d</sub> |
||
867 | @end ifhtml |
||
868 | values are actually stored in the last dimension, and |
||
869 | @iftex |
||
870 | @tex |
||
871 | $n_d$ |
||
872 | @end tex |
||
873 | @end iftex |
||
874 | @ifinfo |
||
875 | nd |
||
876 | @end ifinfo |
||
877 | @ifhtml |
||
878 | n<sub>d</sub> |
||
879 | @end ifhtml |
||
880 | is the last dimension passed to @code{rfftwnd_create_plan}. |
||
881 | |||
882 | For example, consider the transform of a two-dimensional real array of |
||
883 | size @code{nx} by @code{ny}. The output of the @code{rfftwnd} transform |
||
884 | is a two-dimensional real array of size @code{nx} by @code{ny/2+1}, |
||
885 | where the @code{y} dimension has been cut nearly in half because of |
||
886 | redundancies in the output. Because @code{fftw_complex} is twice the |
||
887 | size of @code{fftw_real}, the output array is slightly bigger than the |
||
888 | input array. Thus, if we want to compute the transform in place, we |
||
889 | must @emph{pad} the input array so that it is of size @code{nx} by |
||
890 | @code{2*(ny/2+1)}. If @code{ny} is even, then there are two padding |
||
891 | elements at the end of each row (which need not be initialized, as they |
||
892 | are only used for output). |
||
893 | @ifhtml |
||
894 | The following illustration depicts the input and output arrays just |
||
895 | described, for both the out-of-place and in-place transforms (with the |
||
896 | arrows indicating consecutive memory locations): |
||
897 | |||
898 | <p align=center><img src="rfftwnd.gif" width=389 height=583> |
||
899 | @end ifhtml |
||
900 | |||
901 | @iftex |
||
902 | Figure 1 depicts the input and output arrays just |
||
903 | described, for both the out-of-place and in-place transforms (with the |
||
904 | arrows indicating consecutive memory locations). |
||
905 | |||
906 | @tex |
||
907 | { |
||
908 | \pageinsert |
||
909 | \vfill |
||
910 | \vskip405pt |
||
911 | \hskip40pt |
||
912 | \special{psfile="rfftwnd.eps" |
||
913 | } |
||
914 | \vskip 24pt |
||
915 | Figure 1: Illustration of the data layout for real to complex |
||
916 | transforms. |
||
917 | \vfill |
||
918 | \endinsert} |
||
919 | @end tex |
||
920 | @end iftex |
||
921 | |||
922 | The RFFTWND transforms are unnormalized, so a forward followed by a |
||
923 | backward transform will result in the original data scaled by the number |
||
924 | of real data elements---that is, the product of the (logical) dimensions |
||
925 | of the real data. |
||
926 | @cindex normalization |
||
927 | |||
928 | Below, we illustrate the use of RFFTWND by showing how you might use it |
||
929 | to compute the (cyclic) convolution of two-dimensional real arrays |
||
930 | @code{a} and @code{b} (using the identity that a convolution corresponds |
||
931 | to a pointwise product of the Fourier transforms). For variety, |
||
932 | in-place transforms are used for the forward FFTs and an out-of-place |
||
933 | transform is used for the inverse transform. |
||
934 | @cindex convolution |
||
935 | @cindex cyclic convolution |
||
936 | |||
937 | @example |
||
938 | #include <rfftw.h> |
||
939 | ... |
||
940 | @{ |
||
941 | fftw_real a[M][2*(N/2+1)], b[M][2*(N/2+1)], c[M][N]; |
||
942 | fftw_complex *A, *B, C[M][N/2+1]; |
||
943 | rfftwnd_plan p, pinv; |
||
944 | fftw_real scale = 1.0 / (M * N); |
||
945 | int i, j; |
||
946 | ... |
||
947 | p = rfftw2d_create_plan(M, N, FFTW_REAL_TO_COMPLEX, |
||
948 | FFTW_ESTIMATE | FFTW_IN_PLACE); |
||
949 | pinv = rfftw2d_create_plan(M, N, FFTW_COMPLEX_TO_REAL, |
||
950 | FFTW_ESTIMATE); |
||
951 | |||
952 | /* aliases for accessing complex transform outputs: */ |
||
953 | A = (fftw_complex*) &a[0][0]; |
||
954 | B = (fftw_complex*) &b[0][0]; |
||
955 | ... |
||
956 | for (i = 0; i < M; ++i) |
||
957 | for (j = 0; j < N; ++j) @{ |
||
958 | a[i][j] = ... ; |
||
959 | b[i][j] = ... ; |
||
960 | @} |
||
961 | ... |
||
962 | rfftwnd_one_real_to_complex(p, &a[0][0], NULL); |
||
963 | rfftwnd_one_real_to_complex(p, &b[0][0], NULL); |
||
964 | |||
965 | for (i = 0; i < M; ++i) |
||
966 | for (j = 0; j < N/2+1; ++j) @{ |
||
967 | int ij = i*(N/2+1) + j; |
||
968 | C[i][j].re = (A[ij].re * B[ij].re |
||
969 | - A[ij].im * B[ij].im) * scale; |
||
970 | C[i][j].im = (A[ij].re * B[ij].im |
||
971 | + A[ij].im * B[ij].re) * scale; |
||
972 | @} |
||
973 | |||
974 | /* inverse transform to get c, the convolution of a and b; |
||
975 | this has the side effect of overwriting C */ |
||
976 | rfftwnd_one_complex_to_real(pinv, &C[0][0], &c[0][0]); |
||
977 | ... |
||
978 | rfftwnd_destroy_plan(p); |
||
979 | rfftwnd_destroy_plan(pinv); |
||
980 | @} |
||
981 | @end example |
||
982 | |||
983 | We access the complex outputs of the in-place transforms by casting |
||
984 | each real array to a @code{fftw_complex} pointer. Because this is a |
||
985 | ``flat'' pointer, we have to compute the row-major index @code{ij} |
||
986 | explicitly in the convolution product loop. |
||
987 | @cindex row-major |
||
988 | In order to normalize the convolution, we must multiply by a scale |
||
989 | factor---we can do so either before or after the inverse transform, and |
||
990 | choose the former because it obviates the necessity of an additional |
||
991 | loop. |
||
992 | @cindex normalization |
||
993 | Notice the limits of the loops and the dimensions of the various arrays. |
||
994 | |||
995 | As with the one-dimensional RFFTW, an out-of-place |
||
996 | @code{FFTW_COMPLEX_TO_REAL} transform has the side-effect of overwriting |
||
997 | its input array. (The real-to-complex transform, on the other hand, |
||
998 | leaves its input array untouched.) If you use RFFTWND for a rank-one |
||
999 | transform, however, this side-effect does not occur. Because of this |
||
1000 | fact (and the simpler output format), users may find the RFFTWND |
||
1001 | interface more convenient than RFFTW for one-dimensional transforms. |
||
1002 | However, RFFTWND in one dimension is slightly slower than RFFTW because |
||
1003 | RFFTWND uses an extra buffer array internally. |
||
1004 | |||
1005 | @c ------------------------------------------------------------ |
||
1006 | @node Multi-dimensional Array Format, Words of Wisdom, Real Multi-dimensional Transforms Tutorial, Tutorial |
||
1007 | @section Multi-dimensional Array Format |
||
1008 | |||
1009 | This section describes the format in which multi-dimensional arrays are |
||
1010 | stored. We felt that a detailed discussion of this topic was necessary, |
||
1011 | since it is often a source of confusion among users and several |
||
1012 | different formats are common. Although the comments below refer to |
||
1013 | @code{fftwnd}, they are also applicable to the @code{rfftwnd} routines. |
||
1014 | |||
1015 | @menu |
||
1016 | * Row-major Format:: |
||
1017 | * Column-major Format:: |
||
1018 | * Static Arrays in C:: |
||
1019 | * Dynamic Arrays in C:: |
||
1020 | * Dynamic Arrays in C-The Wrong Way:: |
||
1021 | @end menu |
||
1022 | |||
1023 | @node Row-major Format, Column-major Format, Multi-dimensional Array Format, Multi-dimensional Array Format |
||
1024 | @subsection Row-major Format |
||
1025 | @cindex row-major |
||
1026 | |||
1027 | The multi-dimensional arrays passed to @code{fftwnd} are expected to be |
||
1028 | stored as a single contiguous block in @dfn{row-major} order (sometimes |
||
1029 | called ``C order''). Basically, this means that as you step through |
||
1030 | adjacent memory locations, the first dimension's index varies most |
||
1031 | slowly and the last dimension's index varies most quickly. |
||
1032 | |||
1033 | To be more explicit, let us consider an array of rank @math{d} whose |
||
1034 | dimensions are |
||
1035 | @iftex |
||
1036 | @tex |
||
1037 | $n_1 \times n_2 \times n_3 \times \cdots \times n_d$. |
||
1038 | @end tex |
||
1039 | @end iftex |
||
1040 | @ifinfo |
||
1041 | n1 x n2 x n3 x ... x nd. |
||
1042 | @end ifinfo |
||
1043 | @ifhtml |
||
1044 | n<sub>1</sub> x n<sub>2</sub> x n<sub>3</sub> x ... x n<sub>d</sub>. |
||
1045 | @end ifhtml |
||
1046 | Now, we specify a location in the array by a sequence of (zero-based) indices, |
||
1047 | one for each dimension: |
||
1048 | @iftex |
||
1049 | @tex |
||
1050 | $(i_1, i_2, i_3, \ldots, i_d)$. |
||
1051 | @end tex |
||
1052 | @end iftex |
||
1053 | @ifinfo |
||
1054 | (i1, i2, ..., id). |
||
1055 | @end ifinfo |
||
1056 | @ifhtml |
||
1057 | (i<sub>1</sub>, i<sub>2</sub>, i<sub>3</sub>,..., i<sub>d</sub>). |
||
1058 | @end ifhtml |
||
1059 | If the array is stored in row-major |
||
1060 | order, then this element is located at the position |
||
1061 | @iftex |
||
1062 | @tex |
||
1063 | $i_d + n_d (i_{d-1} + n_{d-1} (\ldots + n_2 i_1))$. |
||
1064 | @end tex |
||
1065 | @end iftex |
||
1066 | @ifinfo |
||
1067 | id + nd * (id-1 + nd-1 * (... + n2 * i1)). |
||
1068 | @end ifinfo |
||
1069 | @ifhtml |
||
1070 | i<sub>d</sub> + n<sub>d</sub> * (i<sub>d-1</sub> + n<sub>d-1</sub> * (... + n<sub>2</sub> * i<sub>1</sub>)). |
||
1071 | @end ifhtml |
||
1072 | |||
1073 | Note that each element of the array must be of type @code{fftw_complex}; |
||
1074 | i.e. a (real, imaginary) pair of (double-precision) numbers. Note also |
||
1075 | that, in @code{fftwnd}, the expression above is multiplied by the stride |
||
1076 | to get the actual array index---this is useful in situations where each |
||
1077 | element of the multi-dimensional array is actually a data structure or |
||
1078 | another array, and you just want to transform a single field. In most |
||
1079 | cases, however, you use a stride of 1. |
||
1080 | @cindex stride |
||
1081 | |||
1082 | @node Column-major Format, Static Arrays in C, Row-major Format, Multi-dimensional Array Format |
||
1083 | @subsection Column-major Format |
||
1084 | @cindex column-major |
||
1085 | |||
1086 | Readers from the Fortran world are used to arrays stored in |
||
1087 | @dfn{column-major} order (sometimes called ``Fortran order''). This is |
||
1088 | essentially the exact opposite of row-major order in that, here, the |
||
1089 | @emph{first} dimension's index varies most quickly. |
||
1090 | |||
1091 | If you have an array stored in column-major order and wish to transform |
||
1092 | it using @code{fftwnd}, it is quite easy to do. When creating the plan, |
||
1093 | simply pass the dimensions of the array to @code{fftwnd_create_plan} in |
||
1094 | @emph{reverse order}. For example, if your array is a rank three |
||
1095 | @code{N x M x L} matrix in column-major order, you should pass the |
||
1096 | dimensions of the array as if it were an @code{L x M x N} matrix (which |
||
1097 | it is, from perspective of @code{fftwnd}). This is done for you |
||
1098 | automatically by the FFTW Fortran wrapper routines (@pxref{Calling FFTW |
||
1099 | from Fortran}). |
||
1100 | @cindex Fortran-callable wrappers |
||
1101 | |||
1102 | @node Static Arrays in C, Dynamic Arrays in C, Column-major Format, Multi-dimensional Array Format |
||
1103 | @subsection Static Arrays in C |
||
1104 | @cindex C multi-dimensional arrays |
||
1105 | |||
1106 | Multi-dimensional arrays declared statically (that is, at compile time, |
||
1107 | not necessarily with the @code{static} keyword) in C are @emph{already} |
||
1108 | in row-major order. You don't have to do anything special to transform |
||
1109 | them. (@xref{Complex Multi-dimensional Transforms Tutorial}, for an |
||
1110 | example of this sort of code.) |
||
1111 | |||
1112 | @node Dynamic Arrays in C, Dynamic Arrays in C-The Wrong Way, Static Arrays in C, Multi-dimensional Array Format |
||
1113 | @subsection Dynamic Arrays in C |
||
1114 | |||
1115 | Often, especially for large arrays, it is desirable to allocate the |
||
1116 | arrays dynamically, at runtime. This isn't too hard to do, although it |
||
1117 | is not as straightforward for multi-dimensional arrays as it is for |
||
1118 | one-dimensional arrays. |
||
1119 | |||
1120 | Creating the array is simple: using a dynamic-allocation routine like |
||
1121 | @code{malloc}, allocate an array big enough to store N @code{fftw_complex} |
||
1122 | values, where N is the product of the sizes of the array dimensions |
||
1123 | (i.e. the total number of complex values in the array). For example, |
||
1124 | here is code to allocate a 5x12x27 rank 3 array: |
||
1125 | @ffindex malloc |
||
1126 | |||
1127 | @example |
||
1128 | fftw_complex *an_array; |
||
1129 | |||
1130 | an_array = (fftw_complex *) malloc(5 * 12 * 27 * sizeof(fftw_complex)); |
||
1131 | @end example |
||
1132 | |||
1133 | Accessing the array elements, however, is more tricky---you can't simply |
||
1134 | use multiple applications of the @samp{[]} operator like you could for |
||
1135 | static arrays. Instead, you have to explicitly compute the offset into |
||
1136 | the array using the formula given earlier for row-major arrays. For |
||
1137 | example, to reference the @math{(i,j,k)}-th element of the array |
||
1138 | allocated above, you would use the expression @code{an_array[k + 27 * (j |
||
1139 | + 12 * i)]}. |
||
1140 | |||
1141 | This pain can be alleviated somewhat by defining appropriate macros, or, |
||
1142 | in C++, creating a class and overloading the @samp{()} operator. |
||
1143 | |||
1144 | @node Dynamic Arrays in C-The Wrong Way, , Dynamic Arrays in C, Multi-dimensional Array Format |
||
1145 | @subsection Dynamic Arrays in C---The Wrong Way |
||
1146 | |||
1147 | A different method for allocating multi-dimensional arrays in C is often |
||
1148 | suggested that is incompatible with @code{fftwnd}: @emph{using it will |
||
1149 | cause FFTW to die a painful death}. We discuss the technique here, |
||
1150 | however, because it is so commonly known and used. This method is to |
||
1151 | create arrays of pointers of arrays of pointers of @dots{}etcetera. For |
||
1152 | example, the analogue in this method to the example above is: |
||
1153 | |||
1154 | @example |
||
1155 | int i,j; |
||
1156 | fftw_complex ***a_bad_array; /* another way to make a 5x12x27 array */ |
||
1157 | |||
1158 | a_bad_array = (fftw_complex ***) malloc(5 * sizeof(fftw_complex **)); |
||
1159 | for (i = 0; i < 5; ++i) @{ |
||
1160 | a_bad_array[i] = |
||
1161 | (fftw_complex **) malloc(12 * sizeof(fftw_complex *)); |
||
1162 | for (j = 0; j < 12; ++j) |
||
1163 | a_bad_array[i][j] = |
||
1164 | (fftw_complex *) malloc(27 * sizeof(fftw_complex)); |
||
1165 | @} |
||
1166 | @end example |
||
1167 | |||
1168 | As you can see, this sort of array is inconvenient to allocate (and |
||
1169 | deallocate). On the other hand, it has the advantage that the |
||
1170 | @math{(i,j,k)}-th element can be referenced simply by |
||
1171 | @code{a_bad_array[i][j][k]}. |
||
1172 | |||
1173 | If you like this technique and want to maximize convenience in accessing |
||
1174 | the array, but still want to pass the array to FFTW, you can use a |
||
1175 | hybrid method. Allocate the array as one contiguous block, but also |
||
1176 | declare an array of arrays of pointers that point to appropriate places |
||
1177 | in the block. That sort of trick is beyond the scope of this |
||
1178 | documentation; for more information on multi-dimensional arrays in C, |
||
1179 | see the @code{comp.lang.c} |
||
1180 | @uref{http://www.eskimo.com/~scs/C-faq/s6.html, FAQ}. |
||
1181 | |||
1182 | @c ------------------------------------------------------------ |
||
1183 | @node Words of Wisdom, , Multi-dimensional Array Format, Tutorial |
||
1184 | @section Words of Wisdom |
||
1185 | @cindex wisdom |
||
1186 | @cindex saving plans to disk |
||
1187 | |||
1188 | FFTW implements a method for saving plans to disk and restoring them. |
||
1189 | In fact, what FFTW does is more general than just saving and loading |
||
1190 | plans. The mechanism is called @dfn{@code{wisdom}}. Here, we describe |
||
1191 | this feature at a high level. @xref{FFTW Reference}, for a less casual |
||
1192 | (but more complete) discussion of how to use @code{wisdom} in FFTW. |
||
1193 | |||
1194 | Plans created with the @code{FFTW_MEASURE} option produce near-optimal |
||
1195 | FFT performance, but it can take a long time to compute a plan because |
||
1196 | FFTW must actually measure the runtime of many possible plans and select |
||
1197 | the best one. This is designed for the situations where so many |
||
1198 | transforms of the same size must be computed that the start-up time is |
||
1199 | irrelevant. For short initialization times but slightly slower |
||
1200 | transforms, we have provided @code{FFTW_ESTIMATE}. The @code{wisdom} |
||
1201 | mechanism is a way to get the best of both worlds. There are, however, |
||
1202 | certain caveats that the user must be aware of in using @code{wisdom}. |
||
1203 | For this reason, @code{wisdom} is an optional feature which is not |
||
1204 | enabled by default. |
||
1205 | |||
1206 | At its simplest, @code{wisdom} provides a way of saving plans to disk so |
||
1207 | that they can be reused in other program runs. You create a plan with |
||
1208 | the flags @code{FFTW_MEASURE} and @code{FFTW_USE_WISDOM}, and then save |
||
1209 | the @code{wisdom} using @code{fftw_export_wisdom}: |
||
1210 | @ctindex FFTW_USE_WISDOM |
||
1211 | |||
1212 | @example |
||
1213 | plan = fftw_create_plan(..., ... | FFTW_MEASURE | FFTW_USE_WISDOM); |
||
1214 | fftw_export_wisdom(...); |
||
1215 | @end example |
||
1216 | @findex fftw_export_wisdom |
||
1217 | |||
1218 | The next time you run the program, you can restore the @code{wisdom} |
||
1219 | with @code{fftw_import_wisdom}, and then recreate the plan using the |
||
1220 | same flags as before. This time, however, the same optimal plan will be |
||
1221 | created very quickly without measurements. (FFTW still needs some time |
||
1222 | to compute trigonometric tables, however.) The basic outline is: |
||
1223 | |||
1224 | @example |
||
1225 | fftw_import_wisdom(...); |
||
1226 | plan = fftw_create_plan(..., ... | FFTW_USE_WISDOM); |
||
1227 | @end example |
||
1228 | @findex fftw_import_wisdom |
||
1229 | |||
1230 | Wisdom is more than mere rote memorization, however. FFTW's |
||
1231 | @code{wisdom} encompasses all of the knowledge and measurements that |
||
1232 | were used to create the plan for a given size. Therefore, existing |
||
1233 | @code{wisdom} is also applied to the creation of other plans of |
||
1234 | different sizes. |
||
1235 | |||
1236 | Whenever a plan is created with the @code{FFTW_MEASURE} and |
||
1237 | @code{FFTW_USE_WISDOM} flags, @code{wisdom} is generated. Thereafter, |
||
1238 | plans for any transform with a similar factorization will be computed |
||
1239 | more quickly, so long as they use the @code{FFTW_USE_WISDOM} flag. In |
||
1240 | fact, for transforms with the same factors and of equal or lesser size, |
||
1241 | no measurements at all need to be made and an optimal plan can be |
||
1242 | created with negligible delay! |
||
1243 | |||
1244 | For example, suppose that you create a plan for |
||
1245 | @iftex |
||
1246 | @tex |
||
1247 | $N = 2^{16}$. |
||
1248 | @end tex |
||
1249 | @end iftex |
||
1250 | @ifinfo |
||
1251 | N = 2^16. |
||
1252 | @end ifinfo |
||
1253 | @ifhtml |
||
1254 | N = 2<sup>16</sup>. |
||
1255 | @end ifhtml |
||
1256 | Then, for any equal or smaller power of two, FFTW can create a |
||
1257 | plan (with the same direction and flags) quickly, using the |
||
1258 | precomputed @code{wisdom}. Even for larger powers of two, or sizes that |
||
1259 | are a power of two times some other prime factors, plans will be |
||
1260 | computed more quickly than they would otherwise (although some |
||
1261 | measurements still have to be made). |
||
1262 | |||
1263 | The @code{wisdom} is cumulative, and is stored in a global, private data |
||
1264 | structure managed internally by FFTW. The storage space required is |
||
1265 | minimal, proportional to the logarithm of the sizes the @code{wisdom} was |
||
1266 | generated from. The @code{wisdom} can be forgotten (and its associated |
||
1267 | memory freed) by a call to @code{fftw_forget_wisdom()}; otherwise, it is |
||
1268 | remembered until the program terminates. It can also be exported to a |
||
1269 | file, a string, or any other medium using @code{fftw_export_wisdom} and |
||
1270 | restored during a subsequent execution of the program (or a different |
||
1271 | program) using @code{fftw_import_wisdom} (these functions are described |
||
1272 | below). |
||
1273 | |||
1274 | Because @code{wisdom} is incorporated into FFTW at a very low level, the |
||
1275 | same @code{wisdom} can be used for one-dimensional transforms, |
||
1276 | multi-dimensional transforms, and even the parallel extensions to FFTW. |
||
1277 | Just include @code{FFTW_USE_WISDOM} in the flags for whatever plans you |
||
1278 | create (i.e., always plan wisely). |
||
1279 | |||
1280 | Plans created with the @code{FFTW_ESTIMATE} plan can use @code{wisdom}, |
||
1281 | but cannot generate it; only @code{FFTW_MEASURE} plans actually produce |
||
1282 | @code{wisdom}. Also, plans can only use @code{wisdom} generated from |
||
1283 | plans created with the same direction and flags. For example, a size |
||
1284 | @code{42} @code{FFTW_BACKWARD} transform will not use @code{wisdom} |
||
1285 | produced by a size @code{42} @code{FFTW_FORWARD} transform. The only |
||
1286 | exception to this rule is that @code{FFTW_ESTIMATE} plans can use |
||
1287 | @code{wisdom} from @code{FFTW_MEASURE} plans. |
||
1288 | |||
1289 | @menu |
||
1290 | * Caveats in Using Wisdom:: What you should worry about in using wisdom |
||
1291 | * Importing and Exporting Wisdom:: I/O of wisdom to disk and other media |
||
1292 | @end menu |
||
1293 | |||
1294 | @node Caveats in Using Wisdom, Importing and Exporting Wisdom, Words of Wisdom, Words of Wisdom |
||
1295 | @subsection Caveats in Using Wisdom |
||
1296 | @cindex wisdom, problems with |
||
1297 | |||
1298 | @quotation |
||
1299 | @ifhtml |
||
1300 | <i> |
||
1301 | @end ifhtml |
||
1302 | For in much wisdom is much grief, and he that increaseth knowledge |
||
1303 | increaseth sorrow. |
||
1304 | @ifhtml |
||
1305 | </i> |
||
1306 | @end ifhtml |
||
1307 | [Ecclesiastes 1:18] |
||
1308 | @cindex Ecclesiastes |
||
1309 | @end quotation |
||
1310 | |||
1311 | There are pitfalls to using @code{wisdom}, in that it can negate FFTW's |
||
1312 | ability to adapt to changing hardware and other conditions. For example, |
||
1313 | it would be perfectly possible to export @code{wisdom} from a program |
||
1314 | running on one processor and import it into a program running on another |
||
1315 | processor. Doing so, however, would mean that the second program would |
||
1316 | use plans optimized for the first processor, instead of the one it is |
||
1317 | running on. |
||
1318 | |||
1319 | It should be safe to reuse @code{wisdom} as long as the hardware and |
||
1320 | program binaries remain unchanged. (Actually, the optimal plan may |
||
1321 | change even between runs of the same binary on identical hardware, due |
||
1322 | to differences in the virtual memory environment, etcetera. Users |
||
1323 | seriously interested in performance should worry about this problem, |
||
1324 | too.) It is likely that, if the same @code{wisdom} is used for two |
||
1325 | different program binaries, even running on the same machine, the plans |
||
1326 | may be sub-optimal because of differing code alignments. It is |
||
1327 | therefore wise to recreate @code{wisdom} every time an application is |
||
1328 | recompiled. The more the underlying hardware and software changes |
||
1329 | between the creation of @code{wisdom} and its use, the greater grows the |
||
1330 | risk of sub-optimal plans. |
||
1331 | |||
1332 | @node Importing and Exporting Wisdom, , Caveats in Using Wisdom, Words of Wisdom |
||
1333 | @subsection Importing and Exporting Wisdom |
||
1334 | @cindex wisdom, import and export |
||
1335 | |||
1336 | @example |
||
1337 | void fftw_export_wisdom_to_file(FILE *output_file); |
||
1338 | fftw_status fftw_import_wisdom_from_file(FILE *input_file); |
||
1339 | @end example |
||
1340 | @findex fftw_export_wisdom_to_file |
||
1341 | @findex fftw_import_wisdom_from_file |
||
1342 | |||
1343 | @code{fftw_export_wisdom_to_file} writes the @code{wisdom} to |
||
1344 | @code{output_file}, which must be a file open for |
||
1345 | writing. @code{fftw_import_wisdom_from_file} reads the @code{wisdom} |
||
1346 | from @code{input_file}, which must be a file open for reading, and |
||
1347 | returns @code{FFTW_SUCCESS} if successful and @code{FFTW_FAILURE} |
||
1348 | otherwise. In both cases, the file is left open and must be closed by |
||
1349 | the caller. It is perfectly fine if other data lie before or after the |
||
1350 | @code{wisdom} in the file, as long as the file is positioned at the |
||
1351 | beginning of the @code{wisdom} data before import. |
||
1352 | |||
1353 | @example |
||
1354 | char *fftw_export_wisdom_to_string(void); |
||
1355 | fftw_status fftw_import_wisdom_from_string(const char *input_string) |
||
1356 | @end example |
||
1357 | @findex fftw_export_wisdom_to_string |
||
1358 | @findex fftw_import_wisdom_from_string |
||
1359 | |||
1360 | @code{fftw_export_wisdom_to_string} allocates a string, exports the |
||
1361 | @code{wisdom} to it in @code{NULL}-terminated format, and returns a |
||
1362 | pointer to the string. If there is an error in allocating or writing |
||
1363 | the data, it returns @code{NULL}. The caller is responsible for |
||
1364 | deallocating the string (with @code{fftw_free}) when she is done with |
||
1365 | it. @code{fftw_import_wisdom_from_string} imports the @code{wisdom} from |
||
1366 | @code{input_string}, returning @code{FFTW_SUCCESS} if successful and |
||
1367 | @code{FFTW_FAILURE} otherwise. |
||
1368 | |||
1369 | Exporting @code{wisdom} does not affect the store of @code{wisdom}. Imported |
||
1370 | @code{wisdom} supplements the current store rather than replacing it |
||
1371 | (except when there is conflicting @code{wisdom}, in which case the older |
||
1372 | @code{wisdom} is discarded). The format of the exported @code{wisdom} is |
||
1373 | ``nerd-readable'' LISP-like ASCII text; we will not document it here |
||
1374 | except to note that it is insensitive to white space (interested users |
||
1375 | can contact us for more details). |
||
1376 | @cindex LISP |
||
1377 | @cindex nerd-readable text |
||
1378 | |||
1379 | @xref{FFTW Reference}, for more information, and for a description of |
||
1380 | how you can implement @code{wisdom} import/export for other media |
||
1381 | besides files and strings. |
||
1382 | |||
1383 | The following is a brief example in which the @code{wisdom} is read from |
||
1384 | a file, a plan is created (possibly generating more @code{wisdom}), and |
||
1385 | then the @code{wisdom} is exported to a string and printed to |
||
1386 | @code{stdout}. |
||
1387 | |||
1388 | @example |
||
1389 | @{ |
||
1390 | fftw_plan plan; |
||
1391 | char *wisdom_string; |
||
1392 | FILE *input_file; |
||
1393 | |||
1394 | /* open file to read wisdom from */ |
||
1395 | input_file = fopen("sample.wisdom", "r"); |
||
1396 | if (FFTW_FAILURE == fftw_import_wisdom_from_file(input_file)) |
||
1397 | printf("Error reading wisdom!\n"); |
||
1398 | fclose(input_file); /* be sure to close the file! */ |
||
1399 | |||
1400 | /* create a plan for N=64, possibly creating and/or using wisdom */ |
||
1401 | plan = fftw_create_plan(64,FFTW_FORWARD, |
||
1402 | FFTW_MEASURE | FFTW_USE_WISDOM); |
||
1403 | |||
1404 | /* ... do some computations with the plan ... */ |
||
1405 | |||
1406 | /* always destroy plans when you are done */ |
||
1407 | fftw_destroy_plan(plan); |
||
1408 | |||
1409 | /* write the wisdom to a string */ |
||
1410 | wisdom_string = fftw_export_wisdom_to_string(); |
||
1411 | if (wisdom_string != NULL) @{ |
||
1412 | printf("Accumulated wisdom: %s\n",wisdom_string); |
||
1413 | |||
1414 | /* Just for fun, destroy and restore the wisdom */ |
||
1415 | fftw_forget_wisdom(); /* all gone! */ |
||
1416 | fftw_import_wisdom_from_string(wisdom_string); |
||
1417 | /* wisdom is back! */ |
||
1418 | |||
1419 | fftw_free(wisdom_string); /* deallocate it since we're done */ |
||
1420 | @} |
||
1421 | @} |
||
1422 | @end example |
||
1423 | |||
1424 | @c ************************************************************ |
||
1425 | @node FFTW Reference, Parallel FFTW, Tutorial, Top |
||
1426 | @chapter FFTW Reference |
||
1427 | |||
1428 | This chapter provides a complete reference for all sequential (i.e., |
||
1429 | one-processor) FFTW functions. We first define the data types upon |
||
1430 | which FFTW operates, that is, real, complex, and ``halfcomplex'' numbers |
||
1431 | (@pxref{Data Types}). Then, in four sections, we explain the FFTW |
||
1432 | program interface for complex one-dimensional transforms |
||
1433 | (@pxref{One-dimensional Transforms Reference}), complex |
||
1434 | multi-dimensional transforms (@pxref{Multi-dimensional Transforms |
||
1435 | Reference}), and real one-dimensional transforms (@pxref{Real |
||
1436 | One-dimensional Transforms Reference}), real multi-dimensional |
||
1437 | transforms (@pxref{Real Multi-dimensional Transforms Reference}). |
||
1438 | @ref{Wisdom Reference} describes the @code{wisdom} mechanism for |
||
1439 | exporting and importing plans. Finally, @ref{Memory Allocator |
||
1440 | Reference} describes how to change FFTW's default memory allocator. |
||
1441 | For parallel transforms, @xref{Parallel FFTW}. |
||
1442 | |||
1443 | @menu |
||
1444 | * Data Types:: real, complex, and halfcomplex numbers |
||
1445 | * One-dimensional Transforms Reference:: |
||
1446 | * Multi-dimensional Transforms Reference:: |
||
1447 | * Real One-dimensional Transforms Reference:: |
||
1448 | * Real Multi-dimensional Transforms Reference:: |
||
1449 | * Wisdom Reference:: |
||
1450 | * Memory Allocator Reference:: |
||
1451 | * Thread safety:: |
||
1452 | @end menu |
||
1453 | |||
1454 | @c ------------------------------------------------------- |
||
1455 | @node Data Types, One-dimensional Transforms Reference, FFTW Reference, FFTW Reference |
||
1456 | @section Data Types |
||
1457 | @cindex real number |
||
1458 | @cindex complex number |
||
1459 | @cindex halfcomplex array |
||
1460 | |||
1461 | The routines in the FFTW package use three main kinds of data types. |
||
1462 | @dfn{Real} and @dfn{complex} numbers should be already known to the |
||
1463 | reader. We also use the term @dfn{halfcomplex} to describe complex |
||
1464 | arrays in a special packed format used by the one-dimensional real |
||
1465 | transforms (taking advantage of the @dfn{hermitian} symmetry that arises |
||
1466 | in those cases). |
||
1467 | |||
1468 | By including @code{<fftw.h>} or @code{<rfftw.h>}, you will have access |
||
1469 | to the following definitions: |
||
1470 | |||
1471 | @example |
||
1472 | typedef double fftw_real; |
||
1473 | |||
1474 | typedef struct @{ |
||
1475 | fftw_real re, im; |
||
1476 | @} fftw_complex; |
||
1477 | |||
1478 | #define c_re(c) ((c).re) |
||
1479 | #define c_im(c) ((c).im) |
||
1480 | @end example |
||
1481 | @tindex fftw_real |
||
1482 | @tindex fftw_complex |
||
1483 | |||
1484 | All FFTW operations are performed on the @code{fftw_real} and |
||
1485 | @code{fftw_complex} data types. For @code{fftw_complex} numbers, the |
||
1486 | two macros @code{c_re} and @code{c_im} retrieve, respectively, the real |
||
1487 | and imaginary parts of the number. |
||
1488 | |||
1489 | A @dfn{real array} is an array of real numbers. A @dfn{complex array} |
||
1490 | is an array of complex numbers. A one-dimensional array @math{X} of |
||
1491 | @math{n} complex numbers is @dfn{hermitian} if the following property |
||
1492 | holds: |
||
1493 | @iftex |
||
1494 | @tex |
||
1495 | for all $0 \leq i < n$, we have $X_i = X^{*}_{n-i}$, where |
||
1496 | $x^*$ denotes the complex conjugate of $x$. |
||
1497 | @end tex |
||
1498 | @end iftex |
||
1499 | @ifinfo |
||
1500 | for all @math{0 <= i < n}, we have @math{X[i] = conj(X[n-i])}. |
||
1501 | @end ifinfo |
||
1502 | @ifhtml |
||
1503 | for all 0 <= i < n, we have X<sub>i</sub> = conj(X<sub>n-i</sub>)}. |
||
1504 | @end ifhtml |
||
1505 | Hermitian arrays are relevant to FFTW because the Fourier transform of a |
||
1506 | real array is hermitian. |
||
1507 | |||
1508 | Because of its symmetry, a hermitian array can be stored in half the |
||
1509 | space of a complex array of the same size. FFTW's one-dimensional real |
||
1510 | transforms store hermitian arrays as @dfn{halfcomplex} arrays. A |
||
1511 | halfcomplex array of size @math{n} is |
||
1512 | @cindex hermitian array |
||
1513 | a one-dimensional array of @math{n} @code{fftw_real} numbers. A |
||
1514 | hermitian array @math{X} in stored into a halfcomplex array @math{Y} as |
||
1515 | follows. |
||
1516 | @iftex |
||
1517 | @tex |
||
1518 | For all integers $i$ such that $0 \leq i \leq n / 2$, we have $Y_i := |
||
1519 | \hbox{Re}(X_i)$. For all integers $i$ such that $0 < i < n / 2$, |
||
1520 | we have $Y_{n - i} := \hbox{Im}(X_i)$. |
||
1521 | @end tex |
||
1522 | @end iftex |
||
1523 | @ifinfo |
||
1524 | For all integers @math{i} such that @math{0 <= i <= n / 2}, we have |
||
1525 | @math{Y[i] = Re(X[i])}. For all integers @math{i} such that @math{0 < |
||
1526 | i < n / 2}, we have @math{Y[n-i] = Im(X[i])}. |
||
1527 | @end ifinfo |
||
1528 | @ifhtml |
||
1529 | For all integers i such that 0 <= i <= n / 2, we have |
||
1530 | Y<sub>i</sub> = Re(X<sub>i</sub>). For all integers i such that 0 |
||
1531 | < i < n / 2, we have Y<sub>n-i</sub> = Im(X<sub>i</sub>). |
||
1532 | @end ifhtml |
||
1533 | |||
1534 | We now illustrate halfcomplex storage for @math{n = 4} and @math{n = 5}, |
||
1535 | since the scheme depends on the parity of @math{n}. Let @math{n = 4}. |
||
1536 | In this case, we have |
||
1537 | @iftex |
||
1538 | @tex |
||
1539 | $Y_0 := \hbox{Re}(X_0)$, $Y_1 := \hbox{Re}(X_1)$, |
||
1540 | $Y_2 := \hbox{Re}(X_2)$, and $Y_3 := \hbox{Im}(X_1)$. |
||
1541 | @end tex |
||
1542 | @end iftex |
||
1543 | @ifinfo |
||
1544 | @math{Y[0] = Re(X[0])}, @math{Y[1] = Re(X[1])}, |
||
1545 | @math{Y[2] = Re(X[2])}, and @math{Y[3] = Im(X[1])}. |
||
1546 | @end ifinfo |
||
1547 | @ifhtml |
||
1548 | Y<sub>0</sub> = Re(X<sub>0</sub>), Y<sub>1</sub> = Re(X<sub>1</sub>), |
||
1549 | Y<sub>2</sub> = Re(X<sub>2</sub>), and Y<sub>3</sub> = Im(X<sub>1</sub>). |
||
1550 | @end ifhtml |
||
1551 | Let now @math{n = 5}. In this case, we have |
||
1552 | @iftex |
||
1553 | @tex |
||
1554 | $Y_0 := \hbox{Re}(X_0)$, $Y_1 := \hbox{Re}(X_1)$, |
||
1555 | $Y_2 := \hbox{Re}(X_2)$, $Y_3 := \hbox{Im}(X_2)$, and |
||
1556 | $Y_4 := \hbox{Im}(X_1)$. |
||
1557 | @end tex |
||
1558 | @end iftex |
||
1559 | @ifinfo |
||
1560 | @math{Y[0] = Re(X[0])}, @math{Y[1] = Re(X[1])}, |
||
1561 | @math{Y[2] = Re(X[2])}, @math{Y[3] = Im(X[2])}, and |
||
1562 | @math{Y[4] = Im(X[1])}. |
||
1563 | @end ifinfo |
||
1564 | @ifhtml |
||
1565 | Y<sub>0</sub> = Re(X<sub>0</sub>), Y<sub>1</sub> = Re(X<sub>1</sub>), |
||
1566 | Y<sub>2</sub> = Re(X<sub>2</sub>), Y<sub>3</sub> = Im(X<sub>2</sub>), |
||
1567 | and Y<sub>4</sub> = Im(X<sub>1</sub>). |
||
1568 | @end ifhtml |
||
1569 | |||
1570 | @cindex floating-point precision |
||
1571 | By default, the type @code{fftw_real} equals the C type @code{double}. |
||
1572 | To work in single precision rather than double precision, @code{#define} |
||
1573 | the symbol @code{FFTW_ENABLE_FLOAT} in @code{fftw.h} and then recompile |
||
1574 | the library. On Unix systems, you can instead use @code{configure |
||
1575 | --enable-float} at installation time (@pxref{Installation and |
||
1576 | Customization}). |
||
1577 | @fpindex configure |
||
1578 | @ctindex FFTW_ENABLE_FLOAT |
||
1579 | |||
1580 | In version 1 of FFTW, the data types were called @code{FFTW_REAL} and |
||
1581 | @code{FFTW_COMPLEX}. We changed the capitalization for consistency with |
||
1582 | the rest of FFTW's conventions. The old names are still supported, but |
||
1583 | their use is deprecated. |
||
1584 | @tindex FFTW_REAL |
||
1585 | @tindex FFTW_COMPLEX |
||
1586 | |||
1587 | @c ------------------------------------------------------- |
||
1588 | @node One-dimensional Transforms Reference, Multi-dimensional Transforms Reference, Data Types, FFTW Reference |
||
1589 | @section One-dimensional Transforms Reference |
||
1590 | |||
1591 | The one-dimensional complex routines are generally prefixed with |
||
1592 | @code{fftw_}. Programs using FFTW should be linked with @code{-lfftw |
||
1593 | -lm} on Unix systems, or with the FFTW and standard math libraries in |
||
1594 | general. |
||
1595 | |||
1596 | @menu |
||
1597 | * fftw_create_plan:: Plan Creation |
||
1598 | * Discussion on Specific Plans:: |
||
1599 | * fftw:: Plan Execution |
||
1600 | * fftw_destroy_plan:: Plan Destruction |
||
1601 | * What FFTW Really Computes:: Definition of the DFT. |
||
1602 | @end menu |
||
1603 | |||
1604 | @node fftw_create_plan, Discussion on Specific Plans, One-dimensional Transforms Reference, One-dimensional Transforms Reference |
||
1605 | @subsection Plan Creation for One-dimensional Transforms |
||
1606 | |||
1607 | @example |
||
1608 | #include <fftw.h> |
||
1609 | |||
1610 | fftw_plan fftw_create_plan(int n, fftw_direction dir, |
||
1611 | int flags); |
||
1612 | |||
1613 | fftw_plan fftw_create_plan_specific(int n, fftw_direction dir, |
||
1614 | int flags, |
||
1615 | fftw_complex *in, int istride, |
||
1616 | fftw_complex *out, int ostride); |
||
1617 | @end example |
||
1618 | @tindex fftw_plan |
||
1619 | @tindex fftw_direction |
||
1620 | @findex fftw_create_plan |
||
1621 | @findex fftw_create_plan_specific |
||
1622 | |||
1623 | The function @code{fftw_create_plan} creates a plan, which is |
||
1624 | a data structure containing all the information that @code{fftw} |
||
1625 | needs in order to compute the 1D Fourier transform. You can |
||
1626 | create as many plans as you need, but only one plan for a given |
||
1627 | array size is required (a plan can be reused many times). |
||
1628 | |||
1629 | @code{fftw_create_plan} returns a valid plan, or @code{NULL} |
||
1630 | if, for some reason, the plan can't be created. In the |
||
1631 | default installation, this cannot happen, but it is possible |
||
1632 | to configure FFTW in such a way that some input sizes are |
||
1633 | forbidden, and FFTW cannot create a plan. |
||
1634 | |||
1635 | The @code{fftw_create_plan_specific} variant takes as additional |
||
1636 | arguments specific input/output arrays and their strides. For the last |
||
1637 | four arguments, you should pass the arrays and strides that you will |
||
1638 | eventually be passing to @code{fftw}. The resulting plans will be |
||
1639 | optimized for those arrays and strides, although they may be used on |
||
1640 | other arrays as well. Note: the contents of the in and out arrays are |
||
1641 | @emph{destroyed} by the specific planner (the initial contents are |
||
1642 | ignored, so the arrays need not have been initialized). |
||
1643 | |||
1644 | @subsubheading Arguments |
||
1645 | @itemize @bullet |
||
1646 | @item |
||
1647 | @code{n} is the size of the transform. It can be |
||
1648 | any positive integer. |
||
1649 | |||
1650 | @itemize @minus |
||
1651 | @item |
||
1652 | FFTW is best at handling sizes of the form |
||
1653 | @ifinfo |
||
1654 | @math{2^a 3^b 5^c 7^d 11^e 13^f}, |
||
1655 | @end ifinfo |
||
1656 | @iftex |
||
1657 | @tex |
||
1658 | $2^a 3^b 5^c 7^d 11^e 13^f$, |
||
1659 | @end tex |
||
1660 | @end iftex |
||
1661 | @ifhtml |
||
1662 | 2<SUP>a</SUP> 3<SUP>b</SUP> 5<SUP>c</SUP> 7<SUP>d</SUP> |
||
1663 | 11<SUP>e</SUP> 13<SUP>f</SUP>, |
||
1664 | @end ifhtml |
||
1665 | where @math{e+f} is either @math{0} or |
||
1666 | @math{1}, and the other exponents are arbitrary. Other sizes are |
||
1667 | computed by means of a slow, general-purpose routine (which nevertheless |
||
1668 | retains |
||
1669 | @iftex |
||
1670 | @tex |
||
1671 | $O(n \log n)$ |
||
1672 | @end tex |
||
1673 | @end iftex |
||
1674 | @ifinfo |
||
1675 | O(n lg n) |
||
1676 | @end ifinfo |
||
1677 | @ifhtml |
||
1678 | O(n lg n) |
||
1679 | @end ifhtml |
||
1680 | performance, even for prime sizes). (It is |
||
1681 | possible to customize FFTW for different array sizes. |
||
1682 | @xref{Installation and Customization} for more information.) Transforms |
||
1683 | whose sizes are powers of @math{2} are especially fast. |
||
1684 | @end itemize |
||
1685 | |||
1686 | @item |
||
1687 | @code{dir} is the sign of the exponent in the formula that |
||
1688 | defines the Fourier transform. It can be @math{-1} or @math{+1}. |
||
1689 | The aliases @code{FFTW_FORWARD} and @code{FFTW_BACKWARD} |
||
1690 | are provided, where @code{FFTW_FORWARD} stands for @math{-1}. |
||
1691 | |||
1692 | @item |
||
1693 | @cindex flags |
||
1694 | @code{flags} is a boolean OR (@samp{|}) of zero or more of the following: |
||
1695 | @itemize @minus |
||
1696 | @item |
||
1697 | @code{FFTW_MEASURE}: this flag tells FFTW to find the optimal plan by |
||
1698 | actually @emph{computing} several FFTs and measuring their |
||
1699 | execution time. Depending on the installation, this can take some |
||
1700 | time. @footnote{The basic problem is the resolution of the clock: |
||
1701 | FFTW needs to run for a certain time for the clock to be reliable.} |
||
1702 | |||
1703 | @item |
||
1704 | @code{FFTW_ESTIMATE}: do not run any FFT and provide a ``reasonable'' |
||
1705 | plan (for a RISC processor with many registers). If neither |
||
1706 | @code{FFTW_ESTIMATE} nor @code{FFTW_MEASURE} is provided, the default is |
||
1707 | @code{FFTW_ESTIMATE}. |
||
1708 | |||
1709 | @item |
||
1710 | @code{FFTW_OUT_OF_PLACE}: produce a plan assuming that the input and |
||
1711 | output arrays will be distinct (this is the default). |
||
1712 | @ctindex FFTW_OUT_OF_PLACE |
||
1713 | |||
1714 | @item |
||
1715 | @cindex in-place transform |
||
1716 | @code{FFTW_IN_PLACE}: produce a plan assuming that you want the output |
||
1717 | in the input array. The algorithm used is not necessarily in place: |
||
1718 | FFTW is able to compute true in-place transforms only for small values |
||
1719 | of @code{n}. If FFTW is not able to compute the transform in-place, it |
||
1720 | will allocate a temporary array (unless you provide one yourself), |
||
1721 | compute the transform out of place, and copy the result back. |
||
1722 | @emph{Warning: This option changes the meaning of some parameters of |
||
1723 | @code{fftw}} (@pxref{fftw,,Computing the One-dimensional Transform}). |
||
1724 | |||
1725 | The in-place option is mainly provided for people who want to write |
||
1726 | their own in-place multi-dimensional Fourier transform, using FFTW as a |
||
1727 | base. For example, consider a three-dimensional @code{n * n * n} |
||
1728 | transform. An out-of-place algorithm will need another array (which may |
||
1729 | be huge). However, FFTW can compute the in-place transform along |
||
1730 | each dimension using only a temporary array of size @code{n}. |
||
1731 | Moreover, if FFTW happens to be able to compute the transform truly |
||
1732 | in-place, no temporary array and no copying are needed. As distributed, |
||
1733 | FFTW `knows' how to compute in-place transforms of size 1, 2, 3, 4, 5, 6, |
||
1734 | 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 32 and 64. |
||
1735 | |||
1736 | The default mode of operation is @code{FFTW_OUT_OF_PLACE}. |
||
1737 | |||
1738 | @item |
||
1739 | @cindex wisdom |
||
1740 | @code{FFTW_USE_WISDOM}: use any @code{wisdom} that is available to help |
||
1741 | in the creation of the plan. (@xref{Words of Wisdom}.) |
||
1742 | This can greatly speed the creation of plans, especially with the |
||
1743 | @code{FFTW_MEASURE} option. @code{FFTW_ESTIMATE} plans can also take |
||
1744 | advantage of @code{wisdom} to produce a more optimal plan (based on past |
||
1745 | measurements) than the estimation heuristic would normally |
||
1746 | generate. When the @code{FFTW_MEASURE} option is used, new @code{wisdom} |
||
1747 | will also be generated if the current transform size is not completely |
||
1748 | understood by existing @code{wisdom}. |
||
1749 | |||
1750 | @end itemize |
||
1751 | |||
1752 | @item |
||
1753 | @code{in}, @code{out}, @code{istride}, @code{ostride} (only for |
||
1754 | @code{fftw_create_plan_specific}): see corresponding arguments in the |
||
1755 | description of @code{fftw}. (@xref{fftw,,Computing the One-dimensional |
||
1756 | Transform}.) In particular, the @code{out} and @code{ostride} |
||
1757 | parameters have the same special meaning for @code{FFTW_IN_PLACE} |
||
1758 | transforms as they have for @code{fftw}. |
||
1759 | |||
1760 | @end itemize |
||
1761 | |||
1762 | @node Discussion on Specific Plans, fftw, fftw_create_plan, One-dimensional Transforms Reference |
||
1763 | @subsection Discussion on Specific Plans |
||
1764 | @cindex specific planner |
||
1765 | We recommend the use of the specific planners, even in cases where you |
||
1766 | will be transforming arrays different from those passed to the specific |
||
1767 | planners, as they confer the following advantages: |
||
1768 | |||
1769 | @itemize @bullet |
||
1770 | |||
1771 | @item |
||
1772 | The resulting plans will be optimized for your specific arrays and |
||
1773 | strides. This may or may not make a significant difference, but it |
||
1774 | certainly doesn't hurt. (The ordinary planner does its planning based |
||
1775 | upon a stride-one temporary array that it allocates.) |
||
1776 | |||
1777 | @item |
||
1778 | Less intermediate storage is required during the planning process. (The |
||
1779 | ordinary planner uses O(@code{N}) temporary storage, where @code{N} is |
||
1780 | the maximum dimension, while it is creating the plan.) |
||
1781 | |||
1782 | @item |
||
1783 | For multi-dimensional transforms, new parameters become accessible for |
||
1784 | optimization by the planner. (Since multi-dimensional arrays can be |
||
1785 | very large, we don't dare to allocate one in the ordinary planner for |
||
1786 | experimentation. This prevents us from doing certain optimizations |
||
1787 | that can yield dramatic improvements in some cases.) |
||
1788 | |||
1789 | @end itemize |
||
1790 | |||
1791 | On the other hand, note that @emph{the specific planner destroys the |
||
1792 | contents of the @code{in} and @code{out} arrays}. |
||
1793 | |||
1794 | @node fftw, fftw_destroy_plan, Discussion on Specific Plans, One-dimensional Transforms Reference |
||
1795 | @subsection Computing the One-dimensional Transform |
||
1796 | |||
1797 | @example |
||
1798 | #include <fftw.h> |
||
1799 | |||
1800 | void fftw(fftw_plan plan, int howmany, |
||
1801 | fftw_complex *in, int istride, int idist, |
||
1802 | fftw_complex *out, int ostride, int odist); |
||
1803 | |||
1804 | void fftw_one(fftw_plan plan, fftw_complex *in, |
||
1805 | fftw_complex *out); |
||
1806 | @end example |
||
1807 | @findex fftw |
||
1808 | @findex fftw_one |
||
1809 | |||
1810 | The function @code{fftw} computes the one-dimensional Fourier transform, |
||
1811 | using a plan created by @code{fftw_create_plan} (@xref{fftw_create_plan, |
||
1812 | , Plan Creation for One-dimensional Transforms}.) The function |
||
1813 | @code{fftw_one} provides a simplified interface for the common case of |
||
1814 | single input array of stride 1. |
||
1815 | @cindex stride |
||
1816 | |||
1817 | @subsubheading Arguments |
||
1818 | @itemize @bullet |
||
1819 | @item |
||
1820 | @code{plan} is the plan created by @code{fftw_create_plan} |
||
1821 | (@pxref{fftw_create_plan,,Plan Creation for One-dimensional Transforms}). |
||
1822 | |||
1823 | @item |
||
1824 | @code{howmany} is the number of transforms @code{fftw} will compute. |
||
1825 | It is faster to tell FFTW to compute many transforms, instead of |
||
1826 | simply calling @code{fftw} many times. |
||
1827 | |||
1828 | @item |
||
1829 | @code{in}, @code{istride} and @code{idist} describe the input array(s). |
||
1830 | There are @code{howmany} input arrays; the first one is pointed to by |
||
1831 | @code{in}, the second one is pointed to by @code{in + idist}, and so on, |
||
1832 | up to @code{in + (howmany - 1) * idist}. Each input array consists of |
||
1833 | complex numbers (@pxref{Data Types}), which are not necessarily |
||
1834 | contiguous in memory. Specifically, @code{in[0]} is the first element |
||
1835 | of the first array, @code{in[istride]} is the second element of the |
||
1836 | first array, and so on. In general, the @code{i}-th element of the |
||
1837 | @code{j}-th input array will be in position @code{in[i * istride + j * |
||
1838 | idist]}. |
||
1839 | |||
1840 | @item |
||
1841 | @code{out}, @code{ostride} and @code{odist} describe the output |
||
1842 | array(s). The format is the same as for the input array. |
||
1843 | |||
1844 | @itemize @minus |
||
1845 | @item @emph{In-place transforms}: |
||
1846 | @cindex in-place transform |
||
1847 | If the @code{plan} specifies an in-place transform, @code{ostride} and |
||
1848 | @code{odist} are always ignored. If @code{out} is @code{NULL}, |
||
1849 | @code{out} is ignored, too. Otherwise, @code{out} is interpreted as a |
||
1850 | pointer to an array of @code{n} complex numbers, that FFTW will use as |
||
1851 | temporary space to perform the in-place computation. @code{out} is used |
||
1852 | as scratch space and its contents destroyed. In this case, @code{out} |
||
1853 | must be an ordinary array whose elements are contiguous in memory (no |
||
1854 | striding). |
||
1855 | @end itemize |
||
1856 | |||
1857 | @end itemize |
||
1858 | |||
1859 | The function @code{fftw_one} transforms a single, contiguous input array |
||
1860 | to a contiguous output array. By definition, the call |
||
1861 | @example |
||
1862 | fftw_one(plan, in, out) |
||
1863 | @end example |
||
1864 | is equivalent to |
||
1865 | @example |
||
1866 | fftw(plan, 1, in, 1, 1, out, 1, 1) |
||
1867 | @end example |
||
1868 | |||
1869 | @node fftw_destroy_plan, What FFTW Really Computes, fftw, One-dimensional Transforms Reference |
||
1870 | @subsection Destroying a One-dimensional Plan |
||
1871 | |||
1872 | @example |
||
1873 | #include <fftw.h> |
||
1874 | |||
1875 | void fftw_destroy_plan(fftw_plan plan); |
||
1876 | @end example |
||
1877 | @tindex fftw_destroy_plan |
||
1878 | |||
1879 | The function @code{fftw_destroy_plan} frees the plan @code{plan} and |
||
1880 | releases all the memory associated with it. After destruction, a plan |
||
1881 | is no longer valid. |
||
1882 | |||
1883 | @node What FFTW Really Computes, , fftw_destroy_plan, One-dimensional Transforms Reference |
||
1884 | @subsection What FFTW Really Computes |
||
1885 | @cindex Discrete Fourier Transform |
||
1886 | In this section, we define precisely what FFTW computes. Please be |
||
1887 | warned that different authors and software packages might employ |
||
1888 | different conventions than FFTW does. |
||
1889 | |||
1890 | The forward transform of a complex array @math{X} of size |
||
1891 | @math{n} computes an array @math{Y}, where |
||
1892 | @iftex |
||
1893 | @tex |
||
1894 | $$ |
||
1895 | Y_i = \sum_{j = 0}^{n - 1} X_j e^{-2\pi i j \sqrt{-1}/n} \ . |
||
1896 | $$ |
||
1897 | @end tex |
||
1898 | @end iftex |
||
1899 | @ifinfo |
||
1900 | @center Y[i] = sum for j = 0 to (n - 1) of X[j] * exp(-2 pi i j sqrt(-1)/n) . |
||
1901 | @end ifinfo |
||
1902 | @ifhtml |
||
1903 | <center><IMG SRC="equation-1.gif" ALIGN="top"></center> |
||
1904 | @end ifhtml |
||
1905 | |||
1906 | The backward transform computes |
||
1907 | @iftex |
||
1908 | @tex |
||
1909 | $$ |
||
1910 | Y_i = \sum_{j = 0}^{n - 1} X_j e^{2\pi i j \sqrt{-1}/n} \ . |
||
1911 | $$ |
||
1912 | @end tex |
||
1913 | @end iftex |
||
1914 | @ifinfo |
||
1915 | @center Y[i] = sum for j = 0 to (n - 1) of X[j] * exp(2 pi i j sqrt(-1)/n) . |
||
1916 | @end ifinfo |
||
1917 | @ifhtml |
||
1918 | <center><IMG SRC="equation-2.gif" ALIGN="top"></center> |
||
1919 | @end ifhtml |
||
1920 | |||
1921 | @cindex normalization |
||
1922 | FFTW computes an unnormalized transform, that is, the equation |
||
1923 | @math{IFFT(FFT(X)) = n X} holds. In other words, applying the forward |
||
1924 | and then the backward transform will multiply the input by @math{n}. |
||
1925 | |||
1926 | @cindex frequency |
||
1927 | An @code{FFTW_FORWARD} transform corresponds to a sign of @math{-1} in |
||
1928 | the exponent of the DFT. Note also that we use the standard |
||
1929 | ``in-order'' output ordering---the @math{k}-th output corresponds to the |
||
1930 | frequency @math{k/n} (or @math{k/T}, where @math{T} is your total |
||
1931 | sampling period). For those who like to think in terms of positive and |
||
1932 | negative frequencies, this means that the positive frequencies are |
||
1933 | stored in the first half of the output and the negative frequencies are |
||
1934 | stored in backwards order in the second half of the output. (The |
||
1935 | frequency @math{-k/n} is the same as the frequency @math{(n-k)/n}.) |
||
1936 | |||
1937 | @c ------------------------------------------------------- |
||
1938 | @node Multi-dimensional Transforms Reference, Real One-dimensional Transforms Reference, One-dimensional Transforms Reference, FFTW Reference |
||
1939 | @section Multi-dimensional Transforms Reference |
||
1940 | @cindex complex multi-dimensional transform |
||
1941 | @cindex multi-dimensional transform |
||
1942 | The multi-dimensional complex routines are generally prefixed with |
||
1943 | @code{fftwnd_}. Programs using FFTWND should be linked with @code{-lfftw |
||
1944 | -lm} on Unix systems, or with the FFTW and standard math libraries in |
||
1945 | general. |
||
1946 | @cindex FFTWND |
||
1947 | |||
1948 | @menu |
||
1949 | * fftwnd_create_plan:: Plan Creation |
||
1950 | * fftwnd:: Plan Execution |
||
1951 | * fftwnd_destroy_plan:: Plan Destruction |
||
1952 | * What FFTWND Really Computes:: |
||
1953 | @end menu |
||
1954 | |||
1955 | @node fftwnd_create_plan, fftwnd, Multi-dimensional Transforms Reference, Multi-dimensional Transforms Reference |
||
1956 | @subsection Plan Creation for Multi-dimensional Transforms |
||
1957 | |||
1958 | @example |
||
1959 | #include <fftw.h> |
||
1960 | |||
1961 | fftwnd_plan fftwnd_create_plan(int rank, const int *n, |
||
1962 | fftw_direction dir, int flags); |
||
1963 | |||
1964 | fftwnd_plan fftw2d_create_plan(int nx, int ny, |
||
1965 | fftw_direction dir, int flags); |
||
1966 | |||
1967 | fftwnd_plan fftw3d_create_plan(int nx, int ny, int nz, |
||
1968 | fftw_direction dir, int flags); |
||
1969 | |||
1970 | fftwnd_plan fftwnd_create_plan_specific(int rank, const int *n, |
||
1971 | fftw_direction dir, |
||
1972 | int flags, |
||
1973 | fftw_complex *in, int istride, |
||
1974 | fftw_complex *out, int ostride); |
||
1975 | |||
1976 | fftwnd_plan fftw2d_create_plan_specific(int nx, int ny, |
||
1977 | fftw_direction dir, |
||
1978 | int flags, |
||
1979 | fftw_complex *in, int istride, |
||
1980 | fftw_complex *out, int ostride); |
||
1981 | |||
1982 | fftwnd_plan fftw3d_create_plan_specific(int nx, int ny, int nz, |
||
1983 | fftw_direction dir, int flags, |
||
1984 | fftw_complex *in, int istride, |
||
1985 | fftw_complex *out, int ostride); |
||
1986 | @end example |
||
1987 | @tindex fftwnd_plan |
||
1988 | @tindex fftw_direction |
||
1989 | @findex fftwnd_create_plan |
||
1990 | @findex fftw2d_create_plan |
||
1991 | @findex fftw3d_create_plan |
||
1992 | @findex fftwnd_create_plan_specific |
||
1993 | @findex fftw2d_create_plan_specific |
||
1994 | @findex fftw3d_create_plan_specific |
||
1995 | |||
1996 | The function @code{fftwnd_create_plan} creates a plan, which is a data |
||
1997 | structure containing all the information that @code{fftwnd} needs in |
||
1998 | order to compute a multi-dimensional Fourier transform. You can create |
||
1999 | as many plans as you need, but only one plan for a given array size is |
||
2000 | required (a plan can be reused many times). The functions |
||
2001 | @code{fftw2d_create_plan} and @code{fftw3d_create_plan} are optional, |
||
2002 | alternative interfaces to @code{fftwnd_create_plan} for two and three |
||
2003 | dimensions, respectively. |
||
2004 | |||
2005 | @code{fftwnd_create_plan} returns a valid plan, or @code{NULL} if, for |
||
2006 | some reason, the plan can't be created. This can happen if memory runs |
||
2007 | out or if the arguments are invalid in some way (e.g. if @code{rank} < |
||
2008 | 0). |
||
2009 | |||
2010 | The @code{create_plan_specific} variants take as additional arguments |
||
2011 | specific input/output arrays and their strides. For the last four |
||
2012 | arguments, you should pass the arrays and strides that you will |
||
2013 | eventually be passing to @code{fftwnd}. The resulting plans will be |
||
2014 | optimized for those arrays and strides, although they may be used on |
||
2015 | other arrays as well. Note: the contents of the in and out arrays are |
||
2016 | @emph{destroyed} by the specific planner (the initial contents are |
||
2017 | ignored, so the arrays need not have been initialized). |
||
2018 | @xref{Discussion on Specific Plans}, for a discussion on specific plans. |
||
2019 | |||
2020 | @subsubheading Arguments |
||
2021 | @itemize @bullet |
||
2022 | @item |
||
2023 | @code{rank} is the dimensionality of the arrays to be transformed. It |
||
2024 | can be any non-negative integer. |
||
2025 | |||
2026 | @item |
||
2027 | @code{n} is a pointer to an array of @code{rank} integers, giving the |
||
2028 | size of each dimension of the arrays to be transformed. These sizes, |
||
2029 | which must be positive integers, correspond to the dimensions of |
||
2030 | @cindex row-major |
||
2031 | row-major arrays---i.e. @code{n[0]} is the size of the dimension whose |
||
2032 | indices vary most slowly, and so on. (@xref{Multi-dimensional Array |
||
2033 | Format}, for more information on row-major storage.) |
||
2034 | @xref{fftw_create_plan,,Plan Creation for One-dimensional Transforms}, |
||
2035 | for more information regarding optimal array sizes. |
||
2036 | |||
2037 | @item |
||
2038 | @code{nx} and @code{ny} in @code{fftw2d_create_plan} are positive |
||
2039 | integers specifying the dimensions of the rank 2 array to be |
||
2040 | transformed. i.e. they specify that the transform will operate on |
||
2041 | @code{nx x ny} arrays in row-major order, where @code{nx} is the number |
||
2042 | of rows and @code{ny} is the number of columns. |
||
2043 | |||
2044 | @item |
||
2045 | @code{nx}, @code{ny} and @code{nz} in @code{fftw3d_create_plan} are |
||
2046 | positive integers specifying the dimensions of the rank 3 array to be |
||
2047 | transformed. i.e. they specify that the transform will operate on |
||
2048 | @code{nx x ny x nz} arrays in row-major order. |
||
2049 | |||
2050 | @item |
||
2051 | @code{dir} is the sign of the exponent in the formula that defines the |
||
2052 | Fourier transform. It can be @math{-1} or @math{+1}. The aliases |
||
2053 | @code{FFTW_FORWARD} and @code{FFTW_BACKWARD} are provided, where |
||
2054 | @code{FFTW_FORWARD} stands for @math{-1}. |
||
2055 | |||
2056 | @item |
||
2057 | @cindex flags |
||
2058 | @code{flags} is a boolean OR (@samp{|}) of zero or more of the following: |
||
2059 | @itemize @minus |
||
2060 | @item |
||
2061 | @code{FFTW_MEASURE}: this flag tells FFTW to find the optimal plan by |
||
2062 | actually @emph{computing} several FFTs and measuring their execution |
||
2063 | time. |
||
2064 | |||
2065 | @item |
||
2066 | @code{FFTW_ESTIMATE}: do not run any FFT and provide a ``reasonable'' |
||
2067 | plan (for a RISC processor with many registers). If neither |
||
2068 | @code{FFTW_ESTIMATE} nor @code{FFTW_MEASURE} is provided, the default is |
||
2069 | @code{FFTW_ESTIMATE}. |
||
2070 | |||
2071 | @item |
||
2072 | @code{FFTW_OUT_OF_PLACE}: produce a plan assuming that the input |
||
2073 | and output arrays will be distinct (this is the default). |
||
2074 | |||
2075 | @item |
||
2076 | @code{FFTW_IN_PLACE}: produce a plan assuming that you want to perform |
||
2077 | the transform in-place. (Unlike the one-dimensional transform, this |
||
2078 | ``really'' @footnote{@code{fftwnd} actually may use some temporary |
||
2079 | storage (hidden in the plan), but this storage space is only the size of |
||
2080 | the largest dimension of the array, rather than being as big as the |
||
2081 | entire array. (Unless you use @code{fftwnd} to perform one-dimensional |
||
2082 | transforms, in which case the temporary storage required for in-place |
||
2083 | transforms @emph{is} as big as the entire array.)} performs the |
||
2084 | transform in-place.) Note that, if you want to perform in-place |
||
2085 | transforms, you @emph{must} use a plan created with this option. |
||
2086 | |||
2087 | The default mode of operation is @code{FFTW_OUT_OF_PLACE}. |
||
2088 | |||
2089 | @item |
||
2090 | @cindex wisdom |
||
2091 | @code{FFTW_USE_WISDOM}: use any @code{wisdom} that is available to help |
||
2092 | in the creation of the plan. (@xref{Words of Wisdom}.) This can greatly |
||
2093 | speed the creation of plans, especially with the @code{FFTW_MEASURE} |
||
2094 | option. @code{FFTW_ESTIMATE} plans can also take advantage of |
||
2095 | @code{wisdom} to produce a more optimal plan (based on past |
||
2096 | measurements) than the estimation heuristic would normally |
||
2097 | generate. When the @code{FFTW_MEASURE} option is used, new @code{wisdom} |
||
2098 | will also be generated if the current transform size is not completely |
||
2099 | understood by existing @code{wisdom}. Note that the same @code{wisdom} |
||
2100 | is shared between one-dimensional and multi-dimensional transforms. |
||
2101 | |||
2102 | @end itemize |
||
2103 | |||
2104 | @item |
||
2105 | @code{in}, @code{out}, @code{istride}, @code{ostride} (only for the |
||
2106 | @code{_create_plan_specific} variants): see corresponding arguments in |
||
2107 | the description of @code{fftwnd}. (@xref{fftwnd,,Computing the |
||
2108 | Multi-dimensional Transform}.) |
||
2109 | |||
2110 | @end itemize |
||
2111 | |||
2112 | @node fftwnd, fftwnd_destroy_plan, fftwnd_create_plan, Multi-dimensional Transforms Reference |
||
2113 | @subsection Computing the Multi-dimensional Transform |
||
2114 | |||
2115 | @example |
||
2116 | #include <fftw.h> |
||
2117 | |||
2118 | void fftwnd(fftwnd_plan plan, int howmany, |
||
2119 | fftw_complex *in, int istride, int idist, |
||
2120 | fftw_complex *out, int ostride, int odist); |
||
2121 | |||
2122 | void fftwnd_one(fftwnd_plan p, fftw_complex *in, |
||
2123 | fftw_complex *out); |
||
2124 | @end example |
||
2125 | @findex fftwnd |
||
2126 | @findex fftwnd_one |
||
2127 | |||
2128 | The function @code{fftwnd} computes the multi-dimensional Fourier |
||
2129 | Transform, using a plan created by @code{fftwnd_create_plan} |
||
2130 | (@pxref{fftwnd_create_plan,,Plan Creation for Multi-dimensional |
||
2131 | Transforms}). (Note that the plan determines the rank and dimensions of |
||
2132 | the array to be transformed.) The function @code{fftwnd_one} provides a |
||
2133 | simplified interface for the common case of single input array of stride |
||
2134 | 1. |
||
2135 | @cindex stride |
||
2136 | |||
2137 | @subsubheading Arguments |
||
2138 | @itemize @bullet |
||
2139 | @item |
||
2140 | @code{plan} is the plan created by @code{fftwnd_create_plan}. |
||
2141 | (@pxref{fftwnd_create_plan,,Plan Creation for Multi-dimensional |
||
2142 | Transforms}). In the case of two and three-dimensional transforms, it |
||
2143 | could also have been created by @code{fftw2d_create_plan} or |
||
2144 | @code{fftw3d_create_plan}, respectively. |
||
2145 | |||
2146 | @item |
||
2147 | @code{howmany} is the number of transforms @code{fftwnd} will compute. |
||
2148 | |||
2149 | @item |
||
2150 | @code{in}, @code{istride} and @code{idist} describe the input array(s). |
||
2151 | There are @code{howmany} input arrays; the first one is pointed to by |
||
2152 | @code{in}, the second one is pointed to by @code{in + idist}, and so on, |
||
2153 | up to @code{in + (howmany - 1) * idist}. Each input array consists of |
||
2154 | complex numbers (@pxref{Data Types}), stored in row-major format |
||
2155 | (@pxref{Multi-dimensional Array Format}), which are not necessarily |
||
2156 | contiguous in memory. Specifically, @code{in[0]} is the first element |
||
2157 | of the first array, @code{in[istride]} is the second element of the |
||
2158 | first array, and so on. In general, the @code{i}-th element of the |
||
2159 | @code{j}-th input array will be in position @code{in[i * istride + j * |
||
2160 | idist]}. Note that, here, @code{i} refers to an index into the row-major |
||
2161 | format for the multi-dimensional array, rather than an index in any |
||
2162 | particular dimension. |
||
2163 | |||
2164 | @itemize @minus |
||
2165 | @item @emph{In-place transforms}: |
||
2166 | @cindex in-place transform |
||
2167 | For plans created with the @code{FFTW_IN_PLACE} option, the transform is |
||
2168 | computed in-place---the output is returned in the @code{in} array, using |
||
2169 | the same strides, etcetera, as were used in the input. |
||
2170 | @end itemize |
||
2171 | |||
2172 | @item |
||
2173 | @code{out}, @code{ostride} and @code{odist} describe the output array(s). |
||
2174 | The format is the same as for the input array. |
||
2175 | |||
2176 | @itemize @minus |
||
2177 | @item @emph{In-place transforms}: |
||
2178 | These parameters are ignored for plans created with the |
||
2179 | @code{FFTW_IN_PLACE} option. |
||
2180 | @end itemize |
||
2181 | |||
2182 | @end itemize |
||
2183 | |||
2184 | The function @code{fftwnd_one} transforms a single, contiguous input |
||
2185 | array to a contiguous output array. By definition, the call |
||
2186 | @example |
||
2187 | fftwnd_one(plan, in, out) |
||
2188 | @end example |
||
2189 | is equivalent to |
||
2190 | @example |
||
2191 | fftwnd(plan, 1, in, 1, 1, out, 1, 1) |
||
2192 | @end example |
||
2193 | |||
2194 | @node fftwnd_destroy_plan, What FFTWND Really Computes, fftwnd, Multi-dimensional Transforms Reference |
||
2195 | @subsection Destroying a Multi-dimensional Plan |
||
2196 | |||
2197 | @example |
||
2198 | #include <fftw.h> |
||
2199 | |||
2200 | void fftwnd_destroy_plan(fftwnd_plan plan); |
||
2201 | @end example |
||
2202 | @findex fftwnd_destroy_plan |
||
2203 | |||
2204 | The function @code{fftwnd_destroy_plan} frees the plan @code{plan} |
||
2205 | and releases all the memory associated with it. After destruction, |
||
2206 | a plan is no longer valid. |
||
2207 | |||
2208 | @node What FFTWND Really Computes, , fftwnd_destroy_plan, Multi-dimensional Transforms Reference |
||
2209 | @subsection What FFTWND Really Computes |
||
2210 | @cindex Discrete Fourier Transform |
||
2211 | |||
2212 | The conventions that we follow for the multi-dimensional transform are |
||
2213 | analogous to those for the one-dimensional transform. In particular, the |
||
2214 | forward transform has a negative sign in the exponent and neither the |
||
2215 | forward nor the backward transforms will perform any normalization. |
||
2216 | Computing the backward transform of the forward transform will multiply |
||
2217 | the array by the product of its dimensions. The output is in-order, and |
||
2218 | the zeroth element of the output is the amplitude of the zero frequency |
||
2219 | component. |
||
2220 | |||
2221 | @iftex |
||
2222 | @tex |
||
2223 | The exact mathematical definition of our multi-dimensional transform |
||
2224 | follows. Let $X$ be a $d$-dimensional complex array whose elements are |
||
2225 | $X[j_1, j_2, \ldots, j_d]$, where $0 \leq j_s < n_s$ for all~$s \in \{ |
||
2226 | 1, 2, \ldots, d \}$. Let also $\omega_s = e^{2\pi \sqrt{-1}/n_s}$, for |
||
2227 | all ~$s \in \{ 1, 2, \ldots, d \}$. |
||
2228 | |||
2229 | The forward transform computes a complex array~$Y$, whose |
||
2230 | structure is the same as that of~$X$, defined by |
||
2231 | |||
2232 | $$ |
||
2233 | Y[i_1, i_2, \ldots, i_d] = |
||
2234 | \sum_{j_1 = 0}^{n_1 - 1} |
||
2235 | \sum_{j_2 = 0}^{n_2 - 1} |
||
2236 | \cdots |
||
2237 | \sum_{j_d = 0}^{n_d - 1} |
||
2238 | X[j_1, j_2, \ldots, j_d] |
||
2239 | \omega_1^{-i_1 j_1} |
||
2240 | \omega_2^{-i_2 j_2} |
||
2241 | \cdots |
||
2242 | \omega_d^{-i_d j_d} \ . |
||
2243 | $$ |
||
2244 | |||
2245 | The backward transform computes |
||
2246 | $$ |
||
2247 | Y[i_1, i_2, \ldots, i_d] = |
||
2248 | \sum_{j_1 = 0}^{n_1 - 1} |
||
2249 | \sum_{j_2 = 0}^{n_2 - 1} |
||
2250 | \cdots |
||
2251 | \sum_{j_d = 0}^{n_d - 1} |
||
2252 | X[j_1, j_2, \ldots, j_d] |
||
2253 | \omega_1^{i_1 j_1} |
||
2254 | \omega_2^{i_2 j_2} |
||
2255 | \cdots |
||
2256 | \omega_d^{i_d j_d} \ . |
||
2257 | $$ |
||
2258 | |||
2259 | Computing the forward transform followed by the backward transform |
||
2260 | will multiply the array by $\prod_{s=1}^{d} n_d$. |
||
2261 | @end tex |
||
2262 | @end iftex |
||
2263 | @ifinfo |
||
2264 | The @TeX{} version of this manual contains the exact definition of the |
||
2265 | @math{n}-dimensional transform FFTW uses. It is not possible to |
||
2266 | display the definition on a ASCII terminal properly. |
||
2267 | @end ifinfo |
||
2268 | @ifhtml |
||
2269 | The Gods forbade using HTML to display mathematical formulas. Please |
||
2270 | see the TeX or Postscript version of this manual for the proper |
||
2271 | definition of the n-dimensional Fourier transform that FFTW |
||
2272 | uses. For completeness, we include a bitmap of the TeX output below: |
||
2273 | <P><center><IMG SRC="equation-3.gif" ALIGN="top"></center> |
||
2274 | @end ifhtml |
||
2275 | |||
2276 | @c ------------------------------------------------------- |
||
2277 | @node Real One-dimensional Transforms Reference, Real Multi-dimensional Transforms Reference, Multi-dimensional Transforms Reference, FFTW Reference |
||
2278 | @section Real One-dimensional Transforms Reference |
||
2279 | |||
2280 | The one-dimensional real routines are generally prefixed with |
||
2281 | @code{rfftw_}. @footnote{The etymologically-correct spelling would be |
||
2282 | @code{frftw_}, but it is hard to remember.} Programs using RFFTW |
||
2283 | should be linked with @code{-lrfftw -lfftw -lm} on Unix systems, or with |
||
2284 | the RFFTW, the FFTW, and the standard math libraries in general. |
||
2285 | @cindex RFFTW |
||
2286 | @cindex real transform |
||
2287 | @cindex complex to real transform |
||
2288 | |||
2289 | @menu |
||
2290 | * rfftw_create_plan:: Plan Creation |
||
2291 | * rfftw:: Plan Execution |
||
2292 | * rfftw_destroy_plan:: Plan Destruction |
||
2293 | * What RFFTW Really Computes:: |
||
2294 | @end menu |
||
2295 | |||
2296 | @node rfftw_create_plan, rfftw, Real One-dimensional Transforms Reference, Real One-dimensional Transforms Reference |
||
2297 | @subsection Plan Creation for Real One-dimensional Transforms |
||
2298 | |||
2299 | @example |
||
2300 | #include <rfftw.h> |
||
2301 | |||
2302 | rfftw_plan rfftw_create_plan(int n, fftw_direction dir, int flags); |
||
2303 | |||
2304 | rfftw_plan rfftw_create_plan_specific(int n, fftw_direction dir, |
||
2305 | int flags, fftw_real *in, int istride, |
||
2306 | fftw_real *out, int ostride); |
||
2307 | @end example |
||
2308 | @tindex rfftw_plan |
||
2309 | @findex rfftw_create_plan |
||
2310 | @findex rfftw_create_plan_specific |
||
2311 | |||
2312 | The function @code{rfftw_create_plan} creates a plan, which is a data |
||
2313 | structure containing all the information that @code{rfftw} needs in |
||
2314 | order to compute the 1D real Fourier transform. You can create as many |
||
2315 | plans as you need, but only one plan for a given array size is required |
||
2316 | (a plan can be reused many times). |
||
2317 | |||
2318 | @code{rfftw_create_plan} returns a valid plan, or @code{NULL} if, for |
||
2319 | some reason, the plan can't be created. In the default installation, |
||
2320 | this cannot happen, but it is possible to configure RFFTW in such a way |
||
2321 | that some input sizes are forbidden, and RFFTW cannot create a plan. |
||
2322 | |||
2323 | The @code{rfftw_create_plan_specific} variant takes as additional |
||
2324 | arguments specific input/output arrays and their strides. For the last |
||
2325 | four arguments, you should pass the arrays and strides that you will |
||
2326 | eventually be passing to @code{rfftw}. The resulting plans will be |
||
2327 | optimized for those arrays and strides, although they may be used on |
||
2328 | other arrays as well. Note: the contents of the in and out arrays are |
||
2329 | @emph{destroyed} by the specific planner (the initial contents are |
||
2330 | ignored, so the arrays need not have been initialized). |
||
2331 | @xref{Discussion on Specific Plans}, for a discussion on specific plans. |
||
2332 | |||
2333 | @subsubheading Arguments |
||
2334 | @itemize @bullet |
||
2335 | @item |
||
2336 | @code{n} is the size of the transform. It can be |
||
2337 | any positive integer. |
||
2338 | |||
2339 | @itemize @minus |
||
2340 | @item |
||
2341 | RFFTW is best at handling sizes of the form |
||
2342 | @ifinfo |
||
2343 | @math{2^a 3^b 5^c 7^d 11^e 13^f}, |
||
2344 | @end ifinfo |
||
2345 | @iftex |
||
2346 | @tex |
||
2347 | $2^a 3^b 5^c 7^d 11^e 13^f$, |
||
2348 | @end tex |
||
2349 | @end iftex |
||
2350 | @ifhtml |
||
2351 | 2<SUP>a</SUP> 3<SUP>b</SUP> 5<SUP>c</SUP> 7<SUP>d</SUP> |
||
2352 | 11<SUP>e</SUP> 13<SUP>f</SUP>, |
||
2353 | @end ifhtml |
||
2354 | where @math{e+f} is either @math{0} or |
||
2355 | @math{1}, and the other exponents are arbitrary. Other sizes are |
||
2356 | computed by means of a slow, general-purpose routine (reducing to |
||
2357 | @ifinfo |
||
2358 | @math{O(n^2)} |
||
2359 | @end ifinfo |
||
2360 | @iftex |
||
2361 | @tex |
||
2362 | $O(n^2)$ |
||
2363 | @end tex |
||
2364 | @end iftex |
||
2365 | @ifhtml |
||
2366 | O(n<sup>2</sup>) |
||
2367 | @end ifhtml |
||
2368 | performance for prime sizes). (It is possible to customize RFFTW for |
||
2369 | different array sizes. @xref{Installation and Customization}, for more |
||
2370 | information.) Transforms whose sizes are powers of @math{2} are |
||
2371 | especially fast. |
||
2372 | @end itemize |
||
2373 | |||
2374 | @item |
||
2375 | @code{dir} is the direction of the desired transform, either |
||
2376 | @code{FFTW_REAL_TO_COMPLEX} or @code{FFTW_COMPLEX_TO_REAL}, |
||
2377 | corresponding to @code{FFTW_FORWARD} or @code{FFTW_BACKWARD}, |
||
2378 | respectively. |
||
2379 | @ctindex FFTW_REAL_TO_COMPLEX |
||
2380 | @ctindex FFTW_COMPLEX_TO_REAL |
||
2381 | |||
2382 | @item |
||
2383 | @cindex flags |
||
2384 | @code{flags} is a boolean OR (@samp{|}) of zero or more of the following: |
||
2385 | @itemize @minus |
||
2386 | @item |
||
2387 | @code{FFTW_MEASURE}: this flag tells RFFTW to find the optimal plan by |
||
2388 | actually @emph{computing} several FFTs and measuring their |
||
2389 | execution time. Depending on the installation, this can take some |
||
2390 | time. |
||
2391 | |||
2392 | @item |
||
2393 | @code{FFTW_ESTIMATE}: do not run any FFT and provide a ``reasonable'' |
||
2394 | plan (for a RISC processor with many registers). If neither |
||
2395 | @code{FFTW_ESTIMATE} nor @code{FFTW_MEASURE} is provided, the default is |
||
2396 | @code{FFTW_ESTIMATE}. |
||
2397 | |||
2398 | @item |
||
2399 | @code{FFTW_OUT_OF_PLACE}: produce a plan assuming that the input |
||
2400 | and output arrays will be distinct (this is the default). |
||
2401 | |||
2402 | @item |
||
2403 | @code{FFTW_IN_PLACE}: produce a plan assuming that you want the output |
||
2404 | in the input array. The algorithm used is not necessarily in place: |
||
2405 | RFFTW is able to compute true in-place transforms only for small values |
||
2406 | of @code{n}. If RFFTW is not able to compute the transform in-place, it |
||
2407 | will allocate a temporary array (unless you provide one yourself), |
||
2408 | compute the transform out of place, and copy the result back. |
||
2409 | @emph{Warning: This option changes the meaning of some parameters of |
||
2410 | @code{rfftw}} (@pxref{rfftw,,Computing the Real One-dimensional Transform}). |
||
2411 | |||
2412 | The default mode of operation is @code{FFTW_OUT_OF_PLACE}. |
||
2413 | |||
2414 | @item |
||
2415 | @code{FFTW_USE_WISDOM}: use any @code{wisdom} that is available to help |
||
2416 | in the creation of the plan. (@xref{Words of Wisdom}.) |
||
2417 | This can greatly speed the creation of plans, especially with the |
||
2418 | @code{FFTW_MEASURE} option. @code{FFTW_ESTIMATE} plans can also take |
||
2419 | advantage of @code{wisdom} to produce a more optimal plan (based on past |
||
2420 | measurements) than the estimation heuristic would normally |
||
2421 | generate. When the @code{FFTW_MEASURE} option is used, new @code{wisdom} |
||
2422 | will also be generated if the current transform size is not completely |
||
2423 | understood by existing @code{wisdom}. |
||
2424 | |||
2425 | @end itemize |
||
2426 | |||
2427 | @item |
||
2428 | @code{in}, @code{out}, @code{istride}, @code{ostride} (only for |
||
2429 | @code{rfftw_create_plan_specific}): see corresponding arguments in the |
||
2430 | description of @code{rfftw}. (@xref{rfftw,,Computing the Real |
||
2431 | One-dimensional Transform}.) In particular, the @code{out} and |
||
2432 | @code{ostride} parameters have the same special meaning for |
||
2433 | @code{FFTW_IN_PLACE} transforms as they have for @code{rfftw}. |
||
2434 | |||
2435 | @end itemize |
||
2436 | |||
2437 | @node rfftw, rfftw_destroy_plan, rfftw_create_plan, Real One-dimensional Transforms Reference |
||
2438 | @subsection Computing the Real One-dimensional Transform |
||
2439 | |||
2440 | @example |
||
2441 | #include <rfftw.h> |
||
2442 | |||
2443 | void rfftw(rfftw_plan plan, int howmany, |
||
2444 | fftw_real *in, int istride, int idist, |
||
2445 | fftw_real *out, int ostride, int odist); |
||
2446 | |||
2447 | void rfftw_one(rfftw_plan plan, fftw_real *in, fftw_real *out); |
||
2448 | @end example |
||
2449 | @findex rfftw |
||
2450 | @findex rfftw_one |
||
2451 | |||
2452 | The function @code{rfftw} computes the Real One-dimensional Fourier |
||
2453 | Transform, using a plan created by @code{rfftw_create_plan} |
||
2454 | (@pxref{rfftw_create_plan,,Plan Creation for Real One-dimensional |
||
2455 | Transforms}). The function @code{rfftw_one} provides a simplified |
||
2456 | interface for the common case of single input array of stride 1. |
||
2457 | @cindex stride |
||
2458 | |||
2459 | @emph{Important:} When invoked for an out-of-place, |
||
2460 | @code{FFTW_COMPLEX_TO_REAL} transform, the input array is overwritten |
||
2461 | with scratch values by these routines. The input array is not modified |
||
2462 | for @code{FFTW_REAL_TO_COMPLEX} transforms. |
||
2463 | |||
2464 | @subsubheading Arguments |
||
2465 | @itemize @bullet |
||
2466 | @item |
||
2467 | @code{plan} is the plan created by @code{rfftw_create_plan} |
||
2468 | (@pxref{rfftw_create_plan,,Plan Creation for Real One-dimensional |
||
2469 | Transforms}). |
||
2470 | |||
2471 | @item |
||
2472 | @code{howmany} is the number of transforms @code{rfftw} will compute. |
||
2473 | It is faster to tell RFFTW to compute many transforms, instead of |
||
2474 | simply calling @code{rfftw} many times. |
||
2475 | |||
2476 | @item |
||
2477 | @code{in}, @code{istride} and @code{idist} describe the input array(s). |
||
2478 | There are two cases. If the @code{plan} defines a |
||
2479 | @code{FFTW_REAL_TO_COMPLEX} transform, @code{in} is a real array. |
||
2480 | Otherwise, for @code{FFTW_COMPLEX_TO_REAL} transforms, @code{in} is a |
||
2481 | halfcomplex array @emph{whose contents will be destroyed}. |
||
2482 | |||
2483 | @item |
||
2484 | @code{out}, @code{ostride} and @code{odist} describe the output |
||
2485 | array(s), and have the same meaning as the corresponding parameters for |
||
2486 | the input array. |
||
2487 | |||
2488 | @itemize @minus |
||
2489 | @item @emph{In-place transforms}: |
||
2490 | If the @code{plan} specifies an in-place transform, @code{ostride} and |
||
2491 | @code{odist} are always ignored. If @code{out} is @code{NULL}, |
||
2492 | @code{out} is ignored, too. Otherwise, @code{out} is interpreted as a |
||
2493 | pointer to an array of @code{n} complex numbers, that FFTW will use as |
||
2494 | temporary space to perform the in-place computation. @code{out} is used |
||
2495 | as scratch space and its contents destroyed. In this case, @code{out} |
||
2496 | must be an ordinary array whose elements are contiguous in memory (no |
||
2497 | striding). |
||
2498 | @end itemize |
||
2499 | |||
2500 | @end itemize |
||
2501 | |||
2502 | The function @code{rfftw_one} transforms a single, contiguous input array |
||
2503 | to a contiguous output array. By definition, the call |
||
2504 | @example |
||
2505 | rfftw_one(plan, in, out) |
||
2506 | @end example |
||
2507 | is equivalent to |
||
2508 | @example |
||
2509 | rfftw(plan, 1, in, 1, 1, out, 1, 1) |
||
2510 | @end example |
||
2511 | |||
2512 | @node rfftw_destroy_plan, What RFFTW Really Computes, rfftw, Real One-dimensional Transforms Reference |
||
2513 | @subsection Destroying a Real One-dimensional Plan |
||
2514 | |||
2515 | @example |
||
2516 | #include <rfftw.h> |
||
2517 | |||
2518 | void rfftw_destroy_plan(rfftw_plan plan); |
||
2519 | @end example |
||
2520 | @findex rfftw_destroy_plan |
||
2521 | |||
2522 | The function @code{rfftw_destroy_plan} frees the plan @code{plan} and |
||
2523 | releases all the memory associated with it. After destruction, a plan |
||
2524 | is no longer valid. |
||
2525 | |||
2526 | @node What RFFTW Really Computes, , rfftw_destroy_plan, Real One-dimensional Transforms Reference |
||
2527 | @subsection What RFFTW Really Computes |
||
2528 | @cindex Discrete Fourier Transform |
||
2529 | In this section, we define precisely what RFFTW computes. |
||
2530 | |||
2531 | The real to complex (@code{FFTW_REAL_TO_COMPLEX}) transform of a real |
||
2532 | array @math{X} of size @math{n} computes an hermitian array @math{Y}, |
||
2533 | where |
||
2534 | @iftex |
||
2535 | @tex |
||
2536 | $$ |
||
2537 | Y_i = \sum_{j = 0}^{n - 1} X_j e^{-2\pi i j \sqrt{-1}/n} |
||
2538 | $$ |
||
2539 | @end tex |
||
2540 | @end iftex |
||
2541 | @ifinfo |
||
2542 | @center Y[i] = sum for j = 0 to (n - 1) of X[j] * exp(-2 pi i j sqrt(-1)/n) |
||
2543 | @end ifinfo |
||
2544 | @ifhtml |
||
2545 | <center><IMG SRC="equation-1.gif" ALIGN="top"></center> |
||
2546 | @end ifhtml |
||
2547 | (That @math{Y} is a hermitian array is not intended to be obvious, |
||
2548 | although the proof is easy.) The hermitian array @math{Y} is stored in |
||
2549 | halfcomplex order (@pxref{Data Types}). Currently, RFFTW provides no |
||
2550 | way to compute a real to complex transform with a positive sign in the |
||
2551 | exponent. |
||
2552 | |||
2553 | The complex to real (@code{FFTW_COMPLEX_TO_REAL}) transform of a hermitian |
||
2554 | array @math{X} of size @math{n} computes a real array @math{Y}, where |
||
2555 | @iftex |
||
2556 | @tex |
||
2557 | $$ |
||
2558 | Y_i = \sum_{j = 0}^{n - 1} X_j e^{2\pi i j \sqrt{-1}/n} |
||
2559 | $$ |
||
2560 | @end tex |
||
2561 | @end iftex |
||
2562 | @ifinfo |
||
2563 | @center Y[i] = sum for j = 0 to (n - 1) of X[j] * exp(2 pi i j sqrt(-1)/n) |
||
2564 | @end ifinfo |
||
2565 | @ifhtml |
||
2566 | <center><IMG SRC="equation-2.gif" ALIGN="top"></center> |
||
2567 | @end ifhtml |
||
2568 | (That @math{Y} is a real array is not intended to be obvious, although |
||
2569 | the proof is easy.) The hermitian input array @math{X} is stored in |
||
2570 | halfcomplex order (@pxref{Data Types}). Currently, RFFTW provides no |
||
2571 | way to compute a complex to real transform with a negative sign in the |
||
2572 | exponent. |
||
2573 | |||
2574 | @cindex normalization |
||
2575 | Like FFTW, RFFTW computes an unnormalized transform. In other words, |
||
2576 | applying the real to complex (forward) and then the complex to real |
||
2577 | (backward) transform will multiply the input by @math{n}. |
||
2578 | |||
2579 | @c ------------------------------------------------------- |
||
2580 | @node Real Multi-dimensional Transforms Reference, Wisdom Reference, Real One-dimensional Transforms Reference, FFTW Reference |
||
2581 | @section Real Multi-dimensional Transforms Reference |
||
2582 | @cindex real multi-dimensional transform |
||
2583 | @cindex multi-dimensional transform |
||
2584 | |||
2585 | The multi-dimensional real routines are generally prefixed with |
||
2586 | @code{rfftwnd_}. Programs using RFFTWND should be linked with |
||
2587 | @code{-lrfftw -lfftw -lm} on Unix systems, or with the FFTW, RFFTW, and |
||
2588 | standard math libraries in general. |
||
2589 | @cindex RFFTWND |
||
2590 | |||
2591 | @menu |
||
2592 | * rfftwnd_create_plan:: Plan Creation |
||
2593 | * rfftwnd:: Plan Execution |
||
2594 | * Array Dimensions for Real Multi-dimensional Transforms:: |
||
2595 | * Strides in In-place RFFTWND:: |
||
2596 | * rfftwnd_destroy_plan:: Plan Destruction |
||
2597 | * What RFFTWND Really Computes:: |
||
2598 | @end menu |
||
2599 | |||
2600 | @node rfftwnd_create_plan, rfftwnd, Real Multi-dimensional Transforms Reference, Real Multi-dimensional Transforms Reference |
||
2601 | @subsection Plan Creation for Real Multi-dimensional Transforms |
||
2602 | |||
2603 | @example |
||
2604 | #include <rfftw.h> |
||
2605 | |||
2606 | rfftwnd_plan rfftwnd_create_plan(int rank, const int *n, |
||
2607 | fftw_direction dir, int flags); |
||
2608 | |||
2609 | rfftwnd_plan rfftw2d_create_plan(int nx, int ny, |
||
2610 | fftw_direction dir, int flags); |
||
2611 | |||
2612 | rfftwnd_plan rfftw3d_create_plan(int nx, int ny, int nz, |
||
2613 | fftw_direction dir, int flags); |
||
2614 | @end example |
||
2615 | @tindex rfftwnd_plan |
||
2616 | @tindex fftw_direction |
||
2617 | @findex rfftwnd_create_plan |
||
2618 | @findex rfftw2d_create_plan |
||
2619 | @findex rfftw3d_create_plan |
||
2620 | |||
2621 | The function @code{rfftwnd_create_plan} creates a plan, which is a data |
||
2622 | structure containing all the information that @code{rfftwnd} needs in |
||
2623 | order to compute a multi-dimensional real Fourier transform. You can |
||
2624 | create as many plans as you need, but only one plan for a given array |
||
2625 | size is required (a plan can be reused many times). The functions |
||
2626 | @code{rfftw2d_create_plan} and @code{rfftw3d_create_plan} are optional, |
||
2627 | alternative interfaces to @code{rfftwnd_create_plan} for two and three |
||
2628 | dimensions, respectively. |
||
2629 | |||
2630 | @code{rfftwnd_create_plan} returns a valid plan, or @code{NULL} if, for |
||
2631 | some reason, the plan can't be created. This can happen if the |
||
2632 | arguments are invalid in some way (e.g. if @code{rank} < 0). |
||
2633 | |||
2634 | @subsubheading Arguments |
||
2635 | @itemize @bullet |
||
2636 | @item |
||
2637 | @code{rank} is the dimensionality of the arrays to be transformed. It |
||
2638 | can be any non-negative integer. |
||
2639 | |||
2640 | @item |
||
2641 | @code{n} is a pointer to an array of @code{rank} integers, giving the |
||
2642 | size of each dimension of the arrays to be transformed. Note that these |
||
2643 | are always the dimensions of the @emph{real} arrays; the complex arrays |
||
2644 | have different dimensions (@pxref{Array Dimensions for Real |
||
2645 | Multi-dimensional Transforms}). These sizes, which must be positive |
||
2646 | integers, correspond to the dimensions of row-major |
||
2647 | arrays---i.e. @code{n[0]} is the size of the dimension whose indices |
||
2648 | vary most slowly, and so on. (@xref{Multi-dimensional Array Format}, for |
||
2649 | more information.) |
||
2650 | @itemize @minus |
||
2651 | @item |
||
2652 | @xref{rfftw_create_plan,,Plan Creation for Real One-dimensional Transforms}, |
||
2653 | for more information regarding optimal array sizes. |
||
2654 | @end itemize |
||
2655 | |||
2656 | @item |
||
2657 | @code{nx} and @code{ny} in @code{rfftw2d_create_plan} are positive |
||
2658 | integers specifying the dimensions of the rank 2 array to be |
||
2659 | transformed. i.e. they specify that the transform will operate on |
||
2660 | @code{nx x ny} arrays in row-major order, where @code{nx} is the number |
||
2661 | of rows and @code{ny} is the number of columns. |
||
2662 | |||
2663 | @item |
||
2664 | @code{nx}, @code{ny} and @code{nz} in @code{rfftw3d_create_plan} are |
||
2665 | positive integers specifying the dimensions of the rank 3 array to be |
||
2666 | transformed. i.e. they specify that the transform will operate on |
||
2667 | @code{nx x ny x nz} arrays in row-major order. |
||
2668 | |||
2669 | @item |
||
2670 | @code{dir} is the direction of the desired transform, either |
||
2671 | @code{FFTW_REAL_TO_COMPLEX} or @code{FFTW_COMPLEX_TO_REAL}, |
||
2672 | corresponding to @code{FFTW_FORWARD} or @code{FFTW_BACKWARD}, |
||
2673 | respectively. |
||
2674 | |||
2675 | @item |
||
2676 | @cindex flags |
||
2677 | @code{flags} is a boolean OR (@samp{|}) of zero or more of the following: |
||
2678 | @itemize @minus |
||
2679 | @item |
||
2680 | @code{FFTW_MEASURE}: this flag tells FFTW to find the optimal plan by |
||
2681 | actually @emph{computing} several FFTs and measuring their execution |
||
2682 | time. |
||
2683 | |||
2684 | @item |
||
2685 | @code{FFTW_ESTIMATE}: do not run any FFT and provide a ``reasonable'' |
||
2686 | plan (for a RISC processor with many registers). If neither |
||
2687 | @code{FFTW_ESTIMATE} nor @code{FFTW_MEASURE} is provided, the default is |
||
2688 | @code{FFTW_ESTIMATE}. |
||
2689 | |||
2690 | @item |
||
2691 | @code{FFTW_OUT_OF_PLACE}: produce a plan assuming that the input |
||
2692 | and output arrays will be distinct (this is the default). |
||
2693 | |||
2694 | @item |
||
2695 | @cindex in-place transform |
||
2696 | @code{FFTW_IN_PLACE}: produce a plan assuming that you want to perform |
||
2697 | the transform in-place. (Unlike the one-dimensional transform, this |
||
2698 | ``really'' performs the transform in-place.) Note that, if you want to |
||
2699 | perform in-place transforms, you @emph{must} use a plan created with |
||
2700 | this option. The use of this option has important implications for the |
||
2701 | size of the input/output array (@pxref{rfftwnd,,Computing the Real |
||
2702 | Multi-dimensional Transform}). |
||
2703 | |||
2704 | The default mode of operation is @code{FFTW_OUT_OF_PLACE}. |
||
2705 | |||
2706 | @item |
||
2707 | @cindex wisdom |
||
2708 | @code{FFTW_USE_WISDOM}: use any @code{wisdom} that is available to help |
||
2709 | in the creation of the plan. (@xref{Words of Wisdom}.) This can greatly |
||
2710 | speed the creation of plans, especially with the @code{FFTW_MEASURE} |
||
2711 | option. @code{FFTW_ESTIMATE} plans can also take advantage of |
||
2712 | @code{wisdom} to produce a more optimal plan (based on past |
||
2713 | measurements) than the estimation heuristic would normally |
||
2714 | generate. When the @code{FFTW_MEASURE} option is used, new @code{wisdom} |
||
2715 | will also be generated if the current transform size is not completely |
||
2716 | understood by existing @code{wisdom}. Note that the same @code{wisdom} |
||
2717 | is shared between one-dimensional and multi-dimensional transforms. |
||
2718 | |||
2719 | @end itemize |
||
2720 | |||
2721 | @end itemize |
||
2722 | |||
2723 | @node rfftwnd, Array Dimensions for Real Multi-dimensional Transforms, rfftwnd_create_plan, Real Multi-dimensional Transforms Reference |
||
2724 | @subsection Computing the Real Multi-dimensional Transform |
||
2725 | |||
2726 | @example |
||
2727 | #include <rfftw.h> |
||
2728 | |||
2729 | void rfftwnd_real_to_complex(rfftwnd_plan plan, int howmany, |
||
2730 | fftw_real *in, int istride, int idist, |
||
2731 | fftw_complex *out, int ostride, int odist); |
||
2732 | void rfftwnd_complex_to_real(rfftwnd_plan plan, int howmany, |
||
2733 | fftw_complex *in, int istride, int idist, |
||
2734 | fftw_real *out, int ostride, int odist); |
||
2735 | |||
2736 | void rfftwnd_one_real_to_complex(rfftwnd_plan p, fftw_real *in, |
||
2737 | fftw_complex *out); |
||
2738 | void rfftwnd_one_complex_to_real(rfftwnd_plan p, fftw_complex *in, |
||
2739 | fftw_real *out); |
||
2740 | @end example |
||
2741 | @findex rfftwnd_real_to_complex |
||
2742 | @findex rfftwnd_complex_to_real |
||
2743 | @findex rfftwnd_one_real_to_complex |
||
2744 | @findex rfftwnd_one_complex_to_real |
||
2745 | |||
2746 | These functions compute the real multi-dimensional Fourier Transform, |
||
2747 | using a plan created by @code{rfftwnd_create_plan} |
||
2748 | (@pxref{rfftwnd_create_plan,,Plan Creation for Real Multi-dimensional |
||
2749 | Transforms}). (Note that the plan determines the rank and dimensions of |
||
2750 | the array to be transformed.) The @samp{@code{rfftwnd_one_}} functions |
||
2751 | provide a simplified interface for the common case of single input array |
||
2752 | of stride 1. Unlike other transform routines in FFTW, we here use |
||
2753 | separate functions for the two directions of the transform in order to |
||
2754 | correctly express the datatypes of the parameters. |
||
2755 | |||
2756 | @emph{Important:} When invoked for an out-of-place, |
||
2757 | @code{FFTW_COMPLEX_TO_REAL} transform with @code{rank > 1}, the input |
||
2758 | array is overwritten with scratch values by these routines. The input |
||
2759 | array is not modified for @code{FFTW_REAL_TO_COMPLEX} transforms or for |
||
2760 | @code{FFTW_COMPLEX_TO_REAL} with @code{rank == 1}. |
||
2761 | |||
2762 | @subsubheading Arguments |
||
2763 | @itemize @bullet |
||
2764 | @item |
||
2765 | @code{plan} is the plan created by @code{rfftwnd_create_plan}. |
||
2766 | (@pxref{rfftwnd_create_plan,,Plan Creation for Real Multi-dimensional |
||
2767 | Transforms}). In the case of two and three-dimensional transforms, it |
||
2768 | could also have been created by @code{rfftw2d_create_plan} or |
||
2769 | @code{rfftw3d_create_plan}, respectively. |
||
2770 | |||
2771 | @code{FFTW_REAL_TO_COMPLEX} plans must be used with the |
||
2772 | @samp{@code{real_to_complex}} functions, and @code{FFTW_COMPLEX_TO_REAL} |
||
2773 | plans must be used with the @samp{@code{complex_to_real}} functions. It |
||
2774 | is an error to mismatch the plan direction and the transform function. |
||
2775 | |||
2776 | @item |
||
2777 | @code{howmany} is the number of transforms to be computed. |
||
2778 | |||
2779 | @item |
||
2780 | @cindex stride |
||
2781 | @code{in}, @code{istride} and @code{idist} describe the input array(s). |
||
2782 | There are @code{howmany} input arrays; the first one is pointed to by |
||
2783 | @code{in}, the second one is pointed to by @code{in + idist}, and so on, |
||
2784 | up to @code{in + (howmany - 1) * idist}. Each input array is stored in |
||
2785 | row-major format (@pxref{Multi-dimensional Array Format}), and is not |
||
2786 | necessarily contiguous in memory. Specifically, @code{in[0]} is the |
||
2787 | first element of the first array, @code{in[istride]} is the second |
||
2788 | element of the first array, and so on. In general, the @code{i}-th |
||
2789 | element of the @code{j}-th input array will be in position @code{in[i * |
||
2790 | istride + j * idist]}. Note that, here, @code{i} refers to an index into |
||
2791 | the row-major format for the multi-dimensional array, rather than an |
||
2792 | index in any particular dimension. |
||
2793 | |||
2794 | The dimensions of the arrays are different for real and complex data, |
||
2795 | and are discussed in more detail below (@pxref{Array Dimensions for Real |
||
2796 | Multi-dimensional Transforms}). |
||
2797 | |||
2798 | @itemize @minus |
||
2799 | @item @emph{In-place transforms}: |
||
2800 | For plans created with the @code{FFTW_IN_PLACE} option, the transform is |
||
2801 | computed in-place---the output is returned in the @code{in} array. The |
||
2802 | meaning of the @code{stride} and @code{dist} parameters in this case is |
||
2803 | subtle and is discussed below (@pxref{Strides in In-place RFFTWND}). |
||
2804 | @end itemize |
||
2805 | |||
2806 | @item |
||
2807 | @code{out}, @code{ostride} and @code{odist} describe the output |
||
2808 | array(s). The format is the same as that for the input array. See |
||
2809 | below for a discussion of the dimensions of the output array for real |
||
2810 | and complex data. |
||
2811 | |||
2812 | @itemize @minus |
||
2813 | @item @emph{In-place transforms}: |
||
2814 | These parameters are ignored for plans created with the |
||
2815 | @code{FFTW_IN_PLACE} option. |
||
2816 | @end itemize |
||
2817 | |||
2818 | @end itemize |
||
2819 | |||
2820 | The function @code{rfftwnd_one} transforms a single, contiguous input |
||
2821 | array to a contiguous output array. By definition, the call |
||
2822 | @example |
||
2823 | rfftwnd_one_...(plan, in, out) |
||
2824 | @end example |
||
2825 | is equivalent to |
||
2826 | @example |
||
2827 | rfftwnd_...(plan, 1, in, 1, 1, out, 1, 1) |
||
2828 | @end example |
||
2829 | |||
2830 | @node Array Dimensions for Real Multi-dimensional Transforms, Strides in In-place RFFTWND, rfftwnd, Real Multi-dimensional Transforms Reference |
||
2831 | @subsection Array Dimensions for Real Multi-dimensional Transforms |
||
2832 | |||
2833 | @cindex rfftwnd array format |
||
2834 | The output of a multi-dimensional transform of real data contains |
||
2835 | symmetries that, in principle, make half of the outputs redundant |
||
2836 | (@pxref{What RFFTWND Really Computes}). In practice, it is not |
||
2837 | possible to entirely realize these savings in an efficient and |
||
2838 | understandable format. Instead, the output of the rfftwnd transforms is |
||
2839 | @emph{slightly} over half of the output of the corresponding complex |
||
2840 | transform. We do not ``pack'' the data in any way, but store it as an |
||
2841 | ordinary array of @code{fftw_complex} values. In fact, this data is |
||
2842 | simply a subsection of what would be the array in the corresponding |
||
2843 | complex transform. |
||
2844 | |||
2845 | Specifically, for a real transform of dimensions |
||
2846 | @iftex |
||
2847 | @tex |
||
2848 | $n_1 \times n_2 \times \cdots \times n_d$, |
||
2849 | @end tex |
||
2850 | @end iftex |
||
2851 | @ifinfo |
||
2852 | n1 x n2 x ... x nd, |
||
2853 | @end ifinfo |
||
2854 | @ifhtml |
||
2855 | n<sub>1</sub> x n<sub>2</sub> x ... x n<sub>d</sub>, |
||
2856 | @end ifhtml |
||
2857 | the complex data is an |
||
2858 | @iftex |
||
2859 | @tex |
||
2860 | $n_1 \times n_2 \times \cdots \times (n_d/2+1)$ |
||
2861 | @end tex |
||
2862 | @end iftex |
||
2863 | @ifinfo |
||
2864 | n1 x n2 x ... x (nd/2+1) |
||
2865 | @end ifinfo |
||
2866 | @ifhtml |
||
2867 | n<sub>1</sub> x n<sub>2</sub> x ... x (n<sub>d</sub>/2+1) |
||
2868 | @end ifhtml |
||
2869 | array of @code{fftw_complex} values in row-major order (with the |
||
2870 | division rounded down). That is, we only store the lower half (plus one |
||
2871 | element) of the last dimension of the data from the ordinary complex |
||
2872 | transform. (We could have instead taken half of any other dimension, |
||
2873 | but implementation turns out to be simpler if the last, contiguous, |
||
2874 | dimension is used.) |
||
2875 | |||
2876 | @cindex in-place transform |
||
2877 | @cindex padding |
||
2878 | Since the complex data is slightly larger than the real data, some |
||
2879 | complications arise for in-place transforms. In this case, the final |
||
2880 | dimension of the real data must be padded with extra values to |
||
2881 | accommodate the size of the complex data---two extra if the last |
||
2882 | dimension is even and one if it is odd. That is, the last dimension of |
||
2883 | the real data must physically contain |
||
2884 | @iftex |
||
2885 | @tex |
||
2886 | $2 (n_d/2+1)$ |
||
2887 | @end tex |
||
2888 | @end iftex |
||
2889 | @ifinfo |
||
2890 | 2 * (nd/2+1) |
||
2891 | @end ifinfo |
||
2892 | @ifhtml |
||
2893 | 2 * (n<sub>d</sub>/2+1) |
||
2894 | @end ifhtml |
||
2895 | @code{fftw_real} values (exactly enough to hold the complex data). |
||
2896 | This physical array size does not, however, change the @emph{logical} |
||
2897 | array size---only |
||
2898 | @iftex |
||
2899 | @tex |
||
2900 | $n_d$ |
||
2901 | @end tex |
||
2902 | @end iftex |
||
2903 | @ifinfo |
||
2904 | nd |
||
2905 | @end ifinfo |
||
2906 | @ifhtml |
||
2907 | n<sub>d</sub> |
||
2908 | @end ifhtml |
||
2909 | values are actually stored in the last dimension, and |
||
2910 | @iftex |
||
2911 | @tex |
||
2912 | $n_d$ |
||
2913 | @end tex |
||
2914 | @end iftex |
||
2915 | @ifinfo |
||
2916 | nd |
||
2917 | @end ifinfo |
||
2918 | @ifhtml |
||
2919 | n<sub>d</sub> |
||
2920 | @end ifhtml |
||
2921 | is the last dimension passed to @code{rfftwnd_create_plan}. |
||
2922 | |||
2923 | @node Strides in In-place RFFTWND, rfftwnd_destroy_plan, Array Dimensions for Real Multi-dimensional Transforms, Real Multi-dimensional Transforms Reference |
||
2924 | @subsection Strides in In-place RFFTWND |
||
2925 | |||
2926 | @cindex rfftwnd array format |
||
2927 | @cindex stride |
||
2928 | The fact that the input and output datatypes are different for rfftwnd |
||
2929 | complicates the meaning of the @code{stride} and @code{dist} parameters |
||
2930 | of in-place transforms---are they in units of @code{fftw_real} or |
||
2931 | @code{fftw_complex} elements? When reading the input, they are |
||
2932 | interpreted in units of the datatype of the input data. When writing |
||
2933 | the output, the @code{istride} and @code{idist} are translated to the |
||
2934 | output datatype's ``units'' in one of two ways, corresponding to the two |
||
2935 | most common situations in which @code{stride} and @code{dist} parameters |
||
2936 | are useful. Below, we refer to these ``translated'' parameters as |
||
2937 | @code{ostride_t} and @code{odist_t}. (Note that these are computed |
||
2938 | internally by rfftwnd; the actual @code{ostride} and @code{odist} |
||
2939 | parameters are ignored for in-place transforms.) |
||
2940 | |||
2941 | First, there is the case where you are transforming a number of |
||
2942 | contiguous arrays located one after another in memory. In this |
||
2943 | situation, @code{istride} is @code{1} and @code{idist} is the product of |
||
2944 | the physical dimensions of the array. @code{ostride_t} and |
||
2945 | @code{odist_t} are then chosen so that the output arrays are contiguous |
||
2946 | and lie on top of the input arrays. @code{ostride_t} is therefore |
||
2947 | @code{1}. For a real-to-complex transform, @code{odist_t} is |
||
2948 | @code{idist/2}; for a complex-to-real transform, @code{odist_t} is |
||
2949 | @code{idist*2}. |
||
2950 | |||
2951 | The second case is when you have an array in which each element has |
||
2952 | @code{nc} components (e.g. a structure with @code{nc} numeric fields), |
||
2953 | and you want to transform all of the components at once. Here, |
||
2954 | @code{istride} is @code{nc} and @code{idist} is @code{1}. For this |
||
2955 | case, it is natural to want the output to also have @code{nc} |
||
2956 | consecutive components, now of the output data type; this is exactly |
||
2957 | what rfftwnd does. Specifically, it uses an @code{ostride_t} equal to |
||
2958 | @code{istride}, and an @code{odist_t} of @code{1}. (Astute readers will |
||
2959 | realize that some extra buffer space is required in order to perform |
||
2960 | such a transform; this is handled automatically by rfftwnd.) |
||
2961 | |||
2962 | The general rule is as follows. @code{ostride_t} equals @code{istride}. |
||
2963 | If @code{idist} is @code{1}, then @code{odist_t} is @code{1}. |
||
2964 | Otherwise, for a real-to-complex transform @code{odist_t} is |
||
2965 | @code{idist/2} and for a complex-to-real transform @code{odist_t} is |
||
2966 | @code{idist*2}. |
||
2967 | |||
2968 | @node rfftwnd_destroy_plan, What RFFTWND Really Computes, Strides in In-place RFFTWND, Real Multi-dimensional Transforms Reference |
||
2969 | @subsection Destroying a Multi-dimensional Plan |
||
2970 | |||
2971 | @example |
||
2972 | #include <rfftw.h> |
||
2973 | |||
2974 | void rfftwnd_destroy_plan(rfftwnd_plan plan); |
||
2975 | @end example |
||
2976 | @findex rfftwnd_destroy_plan |
||
2977 | |||
2978 | The function @code{rfftwnd_destroy_plan} frees the plan @code{plan} |
||
2979 | and releases all the memory associated with it. After destruction, |
||
2980 | a plan is no longer valid. |
||
2981 | |||
2982 | @node What RFFTWND Really Computes, , rfftwnd_destroy_plan, Real Multi-dimensional Transforms Reference |
||
2983 | @subsection What RFFTWND Really Computes |
||
2984 | @cindex Discrete Fourier Transform |
||
2985 | |||
2986 | The conventions that we follow for the real multi-dimensional transform |
||
2987 | are analogous to those for the complex multi-dimensional transform. In |
||
2988 | particular, the forward transform has a negative sign in the exponent |
||
2989 | and neither the forward nor the backward transforms will perform any |
||
2990 | normalization. Computing the backward transform of the forward |
||
2991 | transform will multiply the array by the product of its dimensions (that |
||
2992 | is, the logical dimensions of the real data). The forward transform is |
||
2993 | real-to-complex and the backward transform is complex-to-real. |
||
2994 | |||
2995 | @cindex Discrete Fourier Transform |
||
2996 | @cindex hermitian array |
||
2997 | @iftex |
||
2998 | @tex |
||
2999 | The exact mathematical definition of our real multi-dimensional |
||
3000 | transform follows. |
||
3001 | |||
3002 | @noindent@emph{Real to complex (forward) transform.} |
||
3003 | Let $X$ be a $d$-dimensional real array whose elements are $X[j_1, j_2, |
||
3004 | \ldots, j_d]$, where $0 \leq j_s < n_s$ for all~$s \in \{ 1, 2, \ldots, |
||
3005 | d \}$. Let also $\omega_s = e^{2\pi \sqrt{-1}/n_s}$, for all ~$s \in \{ |
||
3006 | 1, 2, \ldots, d \}$. |
||
3007 | |||
3008 | The real to complex transform computes a complex array~$Y$, whose |
||
3009 | structure is the same as that of~$X$, defined by |
||
3010 | |||
3011 | $$ |
||
3012 | Y[i_1, i_2, \ldots, i_d] = |
||
3013 | \sum_{j_1 = 0}^{n_1 - 1} |
||
3014 | \sum_{j_2 = 0}^{n_2 - 1} |
||
3015 | \cdots |
||
3016 | \sum_{j_d = 0}^{n_d - 1} |
||
3017 | X[j_1, j_2, \ldots, j_d] |
||
3018 | \omega_1^{-i_1 j_1} |
||
3019 | \omega_2^{-i_2 j_2} |
||
3020 | \cdots |
||
3021 | \omega_d^{-i_d j_d} \ . |
||
3022 | $$ |
||
3023 | |||
3024 | The output array $Y$ enjoys a multidimensional hermitian symmetry, that |
||
3025 | is, the identity $Y[i_1, i_2, \ldots, i_d] = Y[n_1-i_1, n_2-i_2, \ldots, |
||
3026 | n_d - i_d]^{*}$ holds for all $0 \leq i_s < n_s$. Because of this |
||
3027 | symmetry, $Y$ is stored in the peculiar way described in @ref{Array |
||
3028 | Dimensions for Real Multi-dimensional Transforms}. |
||
3029 | @cindex hermitian array |
||
3030 | |||
3031 | @noindent@emph{Complex to real (backward) transform.} Let $X$ be a |
||
3032 | $d$-dimensional complex array whose elements are $X[j_1, j_2, \ldots, |
||
3033 | j_d]$, where $0 \leq j_s < n_s$ for all~$s \in \{ 1, 2, \ldots, d \}$. |
||
3034 | The array $X$ must be hermitian, that is, the identity $X[j_1, j_2, |
||
3035 | \ldots, j_d] = X[n_1-j_1, n_2-j_2, \ldots, n_d - j_d]^{*}$ must hold for all |
||
3036 | $0 \leq j_s < n_s$. Moreover, $X$ must be stored in memory in the |
||
3037 | peculiar way described in @ref{Array Dimensions for Real |
||
3038 | Multi-dimensional Transforms}. |
||
3039 | |||
3040 | Let $\omega_s = e^{2\pi \sqrt{-1}/n_s}$, for all ~$s \in \{ 1, 2, |
||
3041 | \ldots, d \}$. The complex to real transform computes a real array~$Y$, whose |
||
3042 | structure is the same as that of~$X$, defined by |
||
3043 | |||
3044 | $$ |
||
3045 | Y[i_1, i_2, \ldots, i_d] = |
||
3046 | \sum_{j_1 = 0}^{n_1 - 1} |
||
3047 | \sum_{j_2 = 0}^{n_2 - 1} |
||
3048 | \cdots |
||
3049 | \sum_{j_d = 0}^{n_d - 1} |
||
3050 | X[j_1, j_2, \ldots, j_d] |
||
3051 | \omega_1^{i_1 j_1} |
||
3052 | \omega_2^{i_2 j_2} |
||
3053 | \cdots |
||
3054 | \omega_d^{i_d j_d} \ . |
||
3055 | $$ |
||
3056 | |||
3057 | (That $Y$ is real is not meant to be obvious, although the proof is |
||
3058 | easy.) |
||
3059 | |||
3060 | Computing the forward transform followed by the backward transform |
||
3061 | will multiply the array by $\prod_{s=1}^{d} n_d$. |
||
3062 | @end tex |
||
3063 | @end iftex |
||
3064 | @ifinfo |
||
3065 | The @TeX{} version of this manual contains the exact definition of the |
||
3066 | @math{n}-dimensional transform RFFTWND uses. It is not possible to |
||
3067 | display the definition on a ASCII terminal properly. |
||
3068 | @end ifinfo |
||
3069 | @ifhtml |
||
3070 | The Gods forbade using HTML to display mathematical formulas. Please |
||
3071 | see the TeX or Postscript version of this manual for the proper |
||
3072 | definition of the n-dimensional real Fourier transform that RFFTW |
||
3073 | uses. For completeness, we include a bitmap of the TeX output below: |
||
3074 | <P><center><IMG SRC="equation-4.gif" ALIGN="top"></center> |
||
3075 | @end ifhtml |
||
3076 | |||
3077 | |||
3078 | @c ------------------------------------------------------- |
||
3079 | @node Wisdom Reference, Memory Allocator Reference, Real Multi-dimensional Transforms Reference, FFTW Reference |
||
3080 | @section Wisdom Reference |
||
3081 | @menu |
||
3082 | * fftw_export_wisdom:: |
||
3083 | * fftw_import_wisdom:: |
||
3084 | * fftw_forget_wisdom:: |
||
3085 | @end menu |
||
3086 | |||
3087 | @cindex wisdom |
||
3088 | @node fftw_export_wisdom, fftw_import_wisdom, Wisdom Reference, Wisdom Reference |
||
3089 | @subsection Exporting Wisdom |
||
3090 | |||
3091 | @example |
||
3092 | #include <fftw.h> |
||
3093 | |||
3094 | void fftw_export_wisdom(void (*emitter)(char c, void *), void *data); |
||
3095 | void fftw_export_wisdom_to_file(FILE *output_file); |
||
3096 | char *fftw_export_wisdom_to_string(void); |
||
3097 | @end example |
||
3098 | @findex fftw_export_wisdom |
||
3099 | @findex fftw_export_wisdom_to_file |
||
3100 | @findex fftw_export_wisdom_to_string |
||
3101 | |||
3102 | These functions allow you to export all currently accumulated |
||
3103 | @code{wisdom} in a form from which it can be later imported and |
||
3104 | restored, even during a separate run of the program. (@xref{Words of Wisdom}.) The current store of @code{wisdom} is not |
||
3105 | affected by calling any of these routines. |
||
3106 | |||
3107 | @code{fftw_export_wisdom} exports the @code{wisdom} to any output |
||
3108 | medium, as specified by the callback function |
||
3109 | @code{emitter}. @code{emitter} is a @code{putc}-like function that |
||
3110 | writes the character @code{c} to some output; its second parameter is |
||
3111 | the @code{data} pointer passed to @code{fftw_export_wisdom}. For |
||
3112 | convenience, the following two ``wrapper'' routines are provided: |
||
3113 | |||
3114 | @code{fftw_export_wisdom_to_file} writes the @code{wisdom} to the |
||
3115 | current position in @code{output_file}, which should be open with write |
||
3116 | permission. Upon exit, the file remains open and is positioned at the |
||
3117 | end of the @code{wisdom} data. |
||
3118 | |||
3119 | @code{fftw_export_wisdom_to_string} returns a pointer to a |
||
3120 | @code{NULL}-terminated string holding the @code{wisdom} data. This |
||
3121 | string is dynamically allocated, and it is the responsibility of the |
||
3122 | caller to deallocate it with @code{fftw_free} when it is no longer |
||
3123 | needed. |
||
3124 | |||
3125 | All of these routines export the wisdom in the same format, which we |
||
3126 | will not document here except to say that it is LISP-like ASCII text |
||
3127 | that is insensitive to white space. |
||
3128 | |||
3129 | @node fftw_import_wisdom, fftw_forget_wisdom, fftw_export_wisdom, Wisdom Reference |
||
3130 | @subsection Importing Wisdom |
||
3131 | |||
3132 | @example |
||
3133 | #include <fftw.h> |
||
3134 | |||
3135 | fftw_status fftw_import_wisdom(int (*get_input)(void *), void *data); |
||
3136 | fftw_status fftw_import_wisdom_from_file(FILE *input_file); |
||
3137 | fftw_status fftw_import_wisdom_from_string(const char *input_string); |
||
3138 | @end example |
||
3139 | @findex fftw_import_wisdom |
||
3140 | @findex fftw_import_wisdom_from_file |
||
3141 | @findex fftw_import_wisdom_from_string |
||
3142 | |||
3143 | These functions import @code{wisdom} into a program from data stored by |
||
3144 | the @code{fftw_export_wisdom} functions above. (@xref{Words of Wisdom}.) |
||
3145 | The imported @code{wisdom} supplements rather than replaces any |
||
3146 | @code{wisdom} already accumulated by the running program (except when |
||
3147 | there is conflicting @code{wisdom}, in which case the existing wisdom is |
||
3148 | replaced). |
||
3149 | |||
3150 | @code{fftw_import_wisdom} imports @code{wisdom} from any input medium, |
||
3151 | as specified by the callback function @code{get_input}. @code{get_input} |
||
3152 | is a @code{getc}-like function that returns the next character in the |
||
3153 | input; its parameter is the @code{data} pointer passed to |
||
3154 | @code{fftw_import_wisdom}. If the end of the input data is reached |
||
3155 | (which should never happen for valid data), it may return either |
||
3156 | @code{NULL} (ASCII 0) or @code{EOF} (as defined in @code{<stdio.h>}). |
||
3157 | For convenience, the following two ``wrapper'' routines are provided: |
||
3158 | |||
3159 | @code{fftw_import_wisdom_from_file} reads @code{wisdom} from the |
||
3160 | current position in @code{input_file}, which should be open with read |
||
3161 | permission. Upon exit, the file remains open and is positioned at the |
||
3162 | end of the @code{wisdom} data. |
||
3163 | |||
3164 | @code{fftw_import_wisdom_from_string} reads @code{wisdom} from the |
||
3165 | @code{NULL}-terminated string @code{input_string}. |
||
3166 | |||
3167 | The return value of these routines is @code{FFTW_SUCCESS} if the wisdom |
||
3168 | was read successfully, and @code{FFTW_FAILURE} otherwise. Note that, in |
||
3169 | all of these functions, any data in the input stream past the end of the |
||
3170 | @code{wisdom} data is simply ignored (it is not even read if the |
||
3171 | @code{wisdom} data is well-formed). |
||
3172 | |||
3173 | @node fftw_forget_wisdom, , fftw_import_wisdom, Wisdom Reference |
||
3174 | @subsection Forgetting Wisdom |
||
3175 | |||
3176 | @example |
||
3177 | #include <fftw.h> |
||
3178 | |||
3179 | void fftw_forget_wisdom(void); |
||
3180 | @end example |
||
3181 | @findex fftw_forget_wisdom |
||
3182 | |||
3183 | Calling @code{fftw_forget_wisdom} causes all accumulated @code{wisdom} |
||
3184 | to be discarded and its associated memory to be freed. (New |
||
3185 | @code{wisdom} can still be gathered subsequently, however.) |
||
3186 | |||
3187 | @c ------------------------------------------------------- |
||
3188 | @node Memory Allocator Reference, Thread safety, Wisdom Reference, FFTW Reference |
||
3189 | @section Memory Allocator Reference |
||
3190 | |||
3191 | @example |
||
3192 | #include <fftw.h> |
||
3193 | |||
3194 | void *(*fftw_malloc_hook) (size_t n); |
||
3195 | void (*fftw_free_hook) (void *p); |
||
3196 | @end example |
||
3197 | @vindex fftw_malloc_hook |
||
3198 | @findex fftw_malloc |
||
3199 | @ffindex malloc |
||
3200 | @vindex fftw_free_hook |
||
3201 | |||
3202 | Whenever it has to allocate and release memory, FFTW ordinarily calls |
||
3203 | @code{malloc} and @code{free}. |
||
3204 | If @code{malloc} fails, FFTW prints an error message and exits. This |
||
3205 | behavior may be undesirable in some applications. Also, special |
||
3206 | memory-handling functions may be necessary in certain |
||
3207 | environments. Consequently, FFTW provides means by which you can install |
||
3208 | your own memory allocator and take whatever error-correcting action you |
||
3209 | find appropriate. The variables @code{fftw_malloc_hook} and |
||
3210 | @code{fftw_free_hook} are pointers to functions, and they are normally |
||
3211 | @code{NULL}. If you set those variables to point to other functions, |
||
3212 | then FFTW will use your routines instead of @code{malloc} and |
||
3213 | @code{free}. @code{fftw_malloc_hook} must point to a @code{malloc}-like |
||
3214 | function, and @code{fftw_free_hook} must point to a @code{free}-like |
||
3215 | function. |
||
3216 | |||
3217 | @c ------------------------------------------------------- |
||
3218 | @node Thread safety, , Memory Allocator Reference, FFTW Reference |
||
3219 | @section Thread safety |
||
3220 | |||
3221 | @cindex threads |
||
3222 | @cindex thread safety |
||
3223 | Users writing multi-threaded programs must concern themselves with the |
||
3224 | @dfn{thread safety} of the libraries they use---that is, whether it is |
||
3225 | safe to call routines in parallel from multiple threads. FFTW can be |
||
3226 | used in such an environment, but some care must be taken because certain |
||
3227 | parts of FFTW use private global variables to share data between calls. |
||
3228 | In particular, the plan-creation functions share trigonometric tables |
||
3229 | and accumulated @code{wisdom}. (Users should note that these comments |
||
3230 | only apply to programs using shared-memory threads. Parallelism using |
||
3231 | MPI or forked processes involves a separate address-space and global |
||
3232 | variables for each process, and is not susceptible to problems of this |
||
3233 | sort.) |
||
3234 | |||
3235 | The central restriction of FFTW is that it is not safe to create |
||
3236 | multiple plans in parallel. You must either create all of your plans |
||
3237 | from a single thread, or instead use a semaphore, mutex, or other |
||
3238 | mechanism to ensure that different threads don't attempt to create plans |
||
3239 | at the same time. The same restriction also holds for destruction of |
||
3240 | plans and importing/forgetting @code{wisdom}. Once created, a plan may |
||
3241 | safely be used in any thread. |
||
3242 | |||
3243 | The actual transform routines in FFTW (@code{fftw_one}, etcetera) are |
||
3244 | re-entrant and thread-safe, so it is fine to call them simultaneously |
||
3245 | from multiple threads. Another question arises, however---is it safe to |
||
3246 | use the @emph{same plan} for multiple transforms in parallel? (It would |
||
3247 | be unsafe if, for example, the plan were modified in some way by the |
||
3248 | transform.) We address this question by defining an additional planner |
||
3249 | flag, @code{FFTW_THREADSAFE}. |
||
3250 | @ctindex FFTW_THREADSAFE |
||
3251 | When included in the flags for any of the plan-creation routines, |
||
3252 | @code{FFTW_THREADSAFE} guarantees that the resulting plan will be |
||
3253 | read-only and safe to use in parallel by multiple threads. |
||
3254 | |||
3255 | @c ************************************************************ |
||
3256 | @node Parallel FFTW, Calling FFTW from Fortran, FFTW Reference, Top |
||
3257 | @chapter Parallel FFTW |
||
3258 | |||
3259 | @cindex parallel transform |
||
3260 | In this chapter we discuss the use of FFTW in a parallel environment, |
||
3261 | documenting the different parallel libraries that we have provided. |
||
3262 | (Users calling FFTW from a multi-threaded program should also consult |
||
3263 | @ref{Thread safety}.) The FFTW package currently contains three parallel |
||
3264 | transform implementations that leverage the uniprocessor FFTW code: |
||
3265 | |||
3266 | @itemize @bullet |
||
3267 | |||
3268 | @item |
||
3269 | @cindex threads |
||
3270 | The first set of routines utilizes shared-memory threads for parallel |
||
3271 | one- and multi-dimensional transforms of both real and complex data. |
||
3272 | Any program using FFTW can be trivially modified to use the |
||
3273 | multi-threaded routines. This code can use any common threads |
||
3274 | implementation, including POSIX threads. (POSIX threads are available |
||
3275 | on most Unix variants, including Linux.) These routines are located in |
||
3276 | the @code{threads} directory, and are documented in @ref{Multi-threaded |
||
3277 | FFTW}. |
||
3278 | |||
3279 | @item |
||
3280 | @cindex MPI |
||
3281 | @cindex distributed memory |
||
3282 | The @code{mpi} directory contains multi-dimensional transforms |
||
3283 | of real and complex data for parallel machines supporting MPI. It also |
||
3284 | includes parallel one-dimensional transforms for complex data. The main |
||
3285 | feature of this code is that it supports distributed-memory transforms, |
||
3286 | so it runs on everything from workstation clusters to massively-parallel |
||
3287 | supercomputers. More information on MPI can be found at the |
||
3288 | @uref{http://www.mcs.anl.gov/mpi, MPI home page}. The FFTW MPI routines |
||
3289 | are documented in @ref{MPI FFTW}. |
||
3290 | |||
3291 | @item |
||
3292 | @cindex Cilk |
||
3293 | We also have an experimental parallel implementation written in Cilk, a |
||
3294 | C-like parallel language developed at MIT and currently available for |
||
3295 | several SMP platforms. For more information on Cilk see |
||
3296 | @uref{http://supertech.lcs.mit.edu/cilk, the Cilk home page}. The FFTW |
||
3297 | Cilk code can be found in the @code{cilk} directory, with parallelized |
||
3298 | one- and multi-dimensional transforms of complex data. The Cilk FFTW |
||
3299 | routines are documented in @code{cilk/README}. |
||
3300 | |||
3301 | @end itemize |
||
3302 | |||
3303 | @menu |
||
3304 | * Multi-threaded FFTW:: |
||
3305 | * MPI FFTW:: |
||
3306 | @end menu |
||
3307 | |||
3308 | @c ------------------------------------------------------------ |
||
3309 | @node Multi-threaded FFTW, MPI FFTW, Parallel FFTW, Parallel FFTW |
||
3310 | @section Multi-threaded FFTW |
||
3311 | |||
3312 | @cindex threads |
||
3313 | In this section we document the parallel FFTW routines for shared-memory |
||
3314 | threads on SMP hardware. These routines, which support parellel one- |
||
3315 | and multi-dimensional transforms of both real and complex data, are the |
||
3316 | easiest way to take advantage of multiple processors with FFTW. They |
||
3317 | work just like the corresponding uniprocessor transform routines, except |
||
3318 | that they take the number of parallel threads to use as an extra |
||
3319 | parameter. Any program that uses the uniprocessor FFTW can be trivially |
||
3320 | modified to use the multi-threaded FFTW. |
||
3321 | |||
3322 | @menu |
||
3323 | * Installation and Supported Hardware/Software:: |
||
3324 | * Usage of Multi-threaded FFTW:: |
||
3325 | * How Many Threads to Use?:: |
||
3326 | * Using Multi-threaded FFTW in a Multi-threaded Program:: |
||
3327 | * Tips for Optimal Threading:: |
||
3328 | @end menu |
||
3329 | |||
3330 | @c ------------------------------------------------------- |
||
3331 | @node Installation and Supported Hardware/Software, Usage of Multi-threaded FFTW, Multi-threaded FFTW, Multi-threaded FFTW |
||
3332 | @subsection Installation and Supported Hardware/Software |
||
3333 | |||
3334 | All of the FFTW threads code is located in the @code{threads} |
||
3335 | subdirectory of the FFTW package. On Unix systems, the FFTW threads |
||
3336 | libraries and header files can be automatically configured, compiled, |
||
3337 | and installed along with the uniprocessor FFTW libraries simply by |
||
3338 | including @code{--enable-threads} in the flags to the @code{configure} |
||
3339 | script (@pxref{Installation on Unix}). (Note also that the threads |
||
3340 | routines, when enabled, are automatically tested by the @samp{@code{make |
||
3341 | check}} self-tests.) |
||
3342 | @fpindex configure |
||
3343 | |||
3344 | The threads routines require your operating system to have some sort of |
||
3345 | shared-memory threads support. Specifically, the FFTW threads package |
||
3346 | works with POSIX threads (available on most Unix variants, including |
||
3347 | Linux), Solaris threads, @uref{http://www.be.com,BeOS} threads (tested |
||
3348 | on BeOS DR8.2), Mach C threads (reported to work by users), and Win32 |
||
3349 | threads (reported to work by users). (There is also untested code to |
||
3350 | use MacOS MP threads.) If you have a shared-memory machine that uses a |
||
3351 | different threads API, it should be a simple matter of programming to |
||
3352 | include support for it; see the file @code{fftw_threads-int.h} for more |
||
3353 | detail. |
||
3354 | |||
3355 | SMP hardware is not required, although of course you need multiple |
||
3356 | processors to get any benefit from the multithreaded transforms. |
||
3357 | |||
3358 | @c ------------------------------------------------------- |
||
3359 | @node Usage of Multi-threaded FFTW, How Many Threads to Use?, Installation and Supported Hardware/Software, Multi-threaded FFTW |
||
3360 | @subsection Usage of Multi-threaded FFTW |
||
3361 | |||
3362 | Here, it is assumed that the reader is already familiar with the usage |
||
3363 | of the uniprocessor FFTW routines, described elsewhere in this manual. |
||
3364 | We only describe what one has to change in order to use the |
||
3365 | multi-threaded routines. |
||
3366 | |||
3367 | First, instead of including @code{<fftw.h>} or @code{<rfftw.h>}, you |
||
3368 | should include the files @code{<fftw_threads.h>} or |
||
3369 | @code{<rfftw_threads.h>}, respectively. |
||
3370 | |||
3371 | Second, before calling any FFTW routines, you should call the function: |
||
3372 | |||
3373 | @example |
||
3374 | int fftw_threads_init(void); |
||
3375 | @end example |
||
3376 | @findex fftw_threads_init |
||
3377 | |||
3378 | This function, which should only be called once (probably in your |
||
3379 | @code{main()} function), performs any one-time initialization required |
||
3380 | to use threads on your system. It returns zero if successful, and a |
||
3381 | non-zero value if there was an error (in which case, something is |
||
3382 | seriously wrong and you should probably exit the program). |
||
3383 | |||
3384 | Third, when you want to actually compute the transform, you should use |
||
3385 | one of the following transform routines instead of the ordinary FFTW |
||
3386 | functions: |
||
3387 | |||
3388 | @example |
||
3389 | fftw_threads(nthreads, plan, howmany, in, istride, |
||
3390 | idist, out, ostride, odist); |
||
3391 | @findex fftw_threads |
||
3392 | |||
3393 | fftw_threads_one(nthreads, plan, in, out); |
||
3394 | @findex fftw_threads_one |
||
3395 | |||
3396 | fftwnd_threads(nthreads, plan, howmany, in, istride, |
||
3397 | idist, out, ostride, odist); |
||
3398 | @findex fftwnd_threads |
||
3399 | |||
3400 | fftwnd_threads_one(nthreads, plan, in, out); |
||
3401 | @findex fftwnd_threads_one |
||
3402 | |||
3403 | rfftw_threads(nthreads, plan, howmany, in, istride, |
||
3404 | idist, out, ostride, odist); |
||
3405 | @findex rfftw_threads |
||
3406 | |||
3407 | rfftw_threads_one(nthreads, plan, in, out); |
||
3408 | @findex rfftw_threads_one |
||
3409 | |||
3410 | rfftwnd_threads_real_to_complex(nthreads, plan, howmany, in, |
||
3411 | istride, idist, out, ostride, odist); |
||
3412 | @findex rfftwnd_threads_real_to_complex |
||
3413 | |||
3414 | rfftwnd_threads_one_real_to_complex(nthreads, plan, in, out); |
||
3415 | @findex rfftwnd_threads_one_real_to_complex |
||
3416 | |||
3417 | rfftwnd_threads_complex_to_real(nthreads, plan, howmany, in, |
||
3418 | istride, idist, out, ostride, odist); |
||
3419 | @findex rfftwnd_threads_complex_to_real |
||
3420 | |||
3421 | rfftwnd_threads_one_real_to_complex(nthreads, plan, in, out); |
||
3422 | @findex rfftwnd_threads_one_real_to_complex |
||
3423 | |||
3424 | rfftwnd_threads_one_complex_to_real(nthreads, plan, in, out); |
||
3425 | @findex rfftwnd_threads_one_complex_to_real |
||
3426 | @end example |
||
3427 | |||
3428 | All of these routines take exactly the same arguments and have exactly |
||
3429 | the same effects as their uniprocessor counterparts (i.e. without the |
||
3430 | @samp{@code{_threads}}) @emph{except} that they take one extra |
||
3431 | parameter, @code{nthreads} (of type @code{int}), before the normal |
||
3432 | parameters.@footnote{There is one exception: when performing |
||
3433 | one-dimensional in-place transforms, the @code{out} parameter is always |
||
3434 | ignored by the multi-threaded routines, instead of being used as a |
||
3435 | workspace if it is non-@code{NULL} as in the uniprocessor routines. The |
||
3436 | multi-threaded routines always allocate their own workspace (the size of |
||
3437 | which depends upon the number of threads).} The @code{nthreads} |
||
3438 | parameter specifies the number of threads of execution to use when |
||
3439 | performing the transform (actually, the maximum number of threads). |
||
3440 | @cindex number of threads |
||
3441 | |||
3442 | For example, to parallelize a single one-dimensional transform of |
||
3443 | complex data, instead of calling the uniprocessor @code{fftw_one(plan, |
||
3444 | in, out)}, you would call @code{fftw_threads_one(nthreads, plan, in, |
||
3445 | out)}. Passing an @code{nthreads} of @code{1} means to use only one |
||
3446 | thread (the main thread), and is equivalent to calling the uniprocessor |
||
3447 | routine. Passing an @code{nthreads} of @code{2} means that the |
||
3448 | transform is potentially parallelized over two threads (and two |
||
3449 | processors, if you have them), and so on. |
||
3450 | |||
3451 | These are the only changes you need to make to your source code. Calls |
||
3452 | to all other FFTW routines (plan creation, destruction, wisdom, |
||
3453 | etcetera) are not parallelized and remain the same. (The same plans and |
||
3454 | wisdom are used by both uniprocessor and multi-threaded transforms.) |
||
3455 | Your arrays are allocated and formatted in the same way, and so on. |
||
3456 | |||
3457 | Programs using the parallel complex transforms should be linked with |
||
3458 | @code{-lfftw_threads -lfftw -lm} on Unix. Programs using the parallel |
||
3459 | real transforms should be linked with @code{-lrfftw_threads |
||
3460 | -lfftw_threads -lrfftw -lfftw -lm}. You will also need to link with |
||
3461 | whatever library is responsible for threads on your system |
||
3462 | (e.g. @code{-lpthread} on Linux). |
||
3463 | @cindex linking on Unix |
||
3464 | |||
3465 | @c ------------------------------------------------------- |
||
3466 | @node How Many Threads to Use?, Using Multi-threaded FFTW in a Multi-threaded Program, Usage of Multi-threaded FFTW, Multi-threaded FFTW |
||
3467 | @subsection How Many Threads to Use? |
||
3468 | |||
3469 | @cindex number of threads |
||
3470 | There is a fair amount of overhead involved in spawning and synchronizing |
||
3471 | threads, so the optimal number of threads to use depends upon the size |
||
3472 | of the transform as well as on the number of processors you have. |
||
3473 | |||
3474 | As a general rule, you don't want to use more threads than you have |
||
3475 | processors. (Using more threads will work, but there will be extra |
||
3476 | overhead with no benefit.) In fact, if the problem size is too small, |
||
3477 | you may want to use fewer threads than you have processors. |
||
3478 | |||
3479 | You will have to experiment with your system to see what level of |
||
3480 | parallelization is best for your problem size. Useful tools to help you |
||
3481 | do this are the test programs that are automatically compiled along with |
||
3482 | the threads libraries, @code{fftw_threads_test} and |
||
3483 | @code{rfftw_threads_test} (in the @code{threads} subdirectory). These |
||
3484 | @pindex fftw_threads_test |
||
3485 | @pindex rfftw_threads_test |
||
3486 | take the same arguments as the other FFTW test programs (see |
||
3487 | @code{tests/README}), except that they also take the number of threads |
||
3488 | to use as a first argument, and report the parallel speedup in speed |
||
3489 | tests. For example, |
||
3490 | |||
3491 | @example |
||
3492 | fftw_threads_test 2 -s 128x128 |
||
3493 | @end example |
||
3494 | |||
3495 | will benchmark complex 128x128 transforms using two threads and report |
||
3496 | the speedup relative to the uniprocessor transform. |
||
3497 | @cindex benchmark |
||
3498 | |||
3499 | For instance, on a 4-processor 200MHz Pentium Pro system running Linux |
||
3500 | 2.2.0, we found that the "crossover" point at which 2 threads became |
||
3501 | beneficial for complex transforms was about 4k points, while 4 threads |
||
3502 | became beneficial at 8k points. |
||
3503 | |||
3504 | @c ------------------------------------------------------- |
||
3505 | @node Using Multi-threaded FFTW in a Multi-threaded Program, Tips for Optimal Threading, How Many Threads to Use?, Multi-threaded FFTW |
||
3506 | @subsection Using Multi-threaded FFTW in a Multi-threaded Program |
||
3507 | |||
3508 | @cindex thread safety |
||
3509 | It is perfectly possible to use the multi-threaded FFTW routines from a |
||
3510 | multi-threaded program (e.g. have multiple threads computing |
||
3511 | multi-threaded transforms simultaneously). If you have the processors, |
||
3512 | more power to you! However, the same restrictions apply as for the |
||
3513 | uniprocessor FFTW routines (@pxref{Thread safety}). In particular, you |
||
3514 | should recall that you may not create or destroy plans in parallel. |
||
3515 | |||
3516 | @c ------------------------------------------------------- |
||
3517 | @node Tips for Optimal Threading, , Using Multi-threaded FFTW in a Multi-threaded Program, Multi-threaded FFTW |
||
3518 | @subsection Tips for Optimal Threading |
||
3519 | |||
3520 | Not all transforms are equally well-parallelized by the multi-threaded |
||
3521 | FFTW routines. (This is merely a consequence of laziness on the part of |
||
3522 | the implementors, and is not inherent to the algorithms employed.) |
||
3523 | Mainly, the limitations are in the parallel one-dimensional transforms. |
||
3524 | The things to avoid if you want optimal parallelization are as follows: |
||
3525 | |||
3526 | @subsection Parallelization deficiencies in one-dimensional transforms |
||
3527 | |||
3528 | @itemize @bullet |
||
3529 | |||
3530 | @item |
||
3531 | Large prime factors can sometimes parallelize poorly. Of course, you |
||
3532 | should avoid these anyway if you want high performance. |
||
3533 | |||
3534 | @item |
||
3535 | @cindex in-place transform |
||
3536 | Single in-place transforms don't parallelize completely. (Multiple |
||
3537 | in-place transforms, i.e. @code{howmany > 1}, are fine.) Again, you |
||
3538 | should avoid these in any case if you want high performance, as they |
||
3539 | require transforming to a scratch array and copying back. |
||
3540 | |||
3541 | @item |
||
3542 | Single real-complex (@code{rfftw}) transforms don't parallelize |
||
3543 | completely. This is unfortunate, but parallelizing this correctly would |
||
3544 | have involved a lot of extra code (and a much larger library). You |
||
3545 | still get some benefit from additional processors, but if you have a |
||
3546 | very large number of processors you will probably be better off using |
||
3547 | the parallel complex (@code{fftw}) transforms. Note that |
||
3548 | multi-dimensional real transforms or multiple one-dimensional real |
||
3549 | transforms are fine. |
||
3550 | |||
3551 | @end itemize |
||
3552 | |||
3553 | @c ------------------------------------------------------------ |
||
3554 | @node MPI FFTW, , Multi-threaded FFTW, Parallel FFTW |
||
3555 | @section MPI FFTW |
||
3556 | |||
3557 | @cindex MPI |
||
3558 | This section describes the MPI FFTW routines for distributed-memory (and |
||
3559 | shared-memory) machines supporting MPI (Message Passing Interface). The |
||
3560 | MPI routines are significantly different from the ordinary FFTW because |
||
3561 | the transform data here are @emph{distributed} over multiple processes, |
||
3562 | so that each process gets only a portion of the array. |
||
3563 | @cindex distributed memory |
||
3564 | Currently, multi-dimensional transforms of both real and complex data, |
||
3565 | as well as one-dimensional transforms of complex data, are supported. |
||
3566 | |||
3567 | @menu |
||
3568 | * MPI FFTW Installation:: |
||
3569 | * Usage of MPI FFTW for Complex Multi-dimensional Transforms:: |
||
3570 | * MPI Data Layout:: |
||
3571 | * Usage of MPI FFTW for Real Multi-dimensional Transforms:: |
||
3572 | * Usage of MPI FFTW for Complex One-dimensional Transforms:: |
||
3573 | * MPI Tips:: |
||
3574 | @end menu |
||
3575 | |||
3576 | @c ------------------------------------------------------- |
||
3577 | @node MPI FFTW Installation, Usage of MPI FFTW for Complex Multi-dimensional Transforms, MPI FFTW, MPI FFTW |
||
3578 | @subsection MPI FFTW Installation |
||
3579 | |||
3580 | The FFTW MPI library code is all located in the @code{mpi} subdirectoy |
||
3581 | of the FFTW package (along with source code for test programs). On Unix |
||
3582 | systems, the FFTW MPI libraries and header files can be automatically |
||
3583 | configured, compiled, and installed along with the uniprocessor FFTW |
||
3584 | libraries simply by including @code{--enable-mpi} in the flags to the |
||
3585 | @code{configure} script (@pxref{Installation on Unix}). |
||
3586 | @fpindex configure |
||
3587 | |||
3588 | The only requirement of the FFTW MPI code is that you have the standard |
||
3589 | MPI 1.1 (or later) libraries and header files installed on your system. |
||
3590 | A free implementation of MPI is available from |
||
3591 | @uref{http://www-unix.mcs.anl.gov/mpi/mpich/,the MPICH home page}. |
||
3592 | |||
3593 | Previous versions of the FFTW MPI routines have had an unfortunate |
||
3594 | tendency to expose bugs in MPI implementations. The current version has |
||
3595 | been largely rewritten, and hopefully avoids some of the problems. If |
||
3596 | you run into difficulties, try passing the optional workspace to |
||
3597 | @code{(r)fftwnd_mpi} (see below), as this allows us to use the standard |
||
3598 | (and hopefully well-tested) @code{MPI_Alltoall} primitive for |
||
3599 | @ffindex MPI_Alltoall |
||
3600 | communications. Please let us know (@email{fftw@@theory.lcs.mit.edu}) |
||
3601 | how things work out. |
||
3602 | |||
3603 | @pindex fftw_mpi_test |
||
3604 | @pindex rfftw_mpi_test |
||
3605 | Several test programs are included in the @code{mpi} directory. The |
||
3606 | ones most useful to you are probably the @code{fftw_mpi_test} and |
||
3607 | @code{rfftw_mpi_test} programs, which are run just like an ordinary MPI |
||
3608 | program and accept the same parameters as the other FFTW test programs |
||
3609 | (c.f. @code{tests/README}). For example, @code{mpirun @i{...params...} |
||
3610 | fftw_mpi_test -r 0} will run non-terminating complex-transform |
||
3611 | correctness tests of random dimensions. They can also do performance |
||
3612 | benchmarks. |
||
3613 | |||
3614 | @c ------------------------------------------------------- |
||
3615 | @node Usage of MPI FFTW for Complex Multi-dimensional Transforms, MPI Data Layout, MPI FFTW Installation, MPI FFTW |
||
3616 | @subsection Usage of MPI FFTW for Complex Multi-dimensional Transforms |
||
3617 | |||
3618 | Usage of the MPI FFTW routines is similar to that of the uniprocessor |
||
3619 | FFTW. We assume that the reader already understands the usage of the |
||
3620 | uniprocessor FFTW routines, described elsewhere in this manual. Some |
||
3621 | familiarity with MPI is also helpful. |
||
3622 | |||
3623 | A typical program performing a complex two-dimensional MPI transform |
||
3624 | might look something like: |
||
3625 | |||
3626 | @example |
||
3627 | #include <fftw_mpi.h> |
||
3628 | |||
3629 | int main(int argc, char **argv) |
||
3630 | @{ |
||
3631 | const int NX = ..., NY = ...; |
||
3632 | fftwnd_mpi_plan plan; |
||
3633 | fftw_complex *data; |
||
3634 | |||
3635 | MPI_Init(&argc,&argv); |
||
3636 | |||
3637 | plan = fftw2d_mpi_create_plan(MPI_COMM_WORLD, |
||
3638 | NX, NY, |
||
3639 | FFTW_FORWARD, FFTW_ESTIMATE); |
||
3640 | |||
3641 | ...allocate and initialize data... |
||
3642 | |||
3643 | fftwnd_mpi(p, 1, data, NULL, FFTW_NORMAL_ORDER); |
||
3644 | |||
3645 | ... |
||
3646 | |||
3647 | fftwnd_mpi_destroy_plan(plan); |
||
3648 | MPI_Finalize(); |
||
3649 | @} |
||
3650 | @end example |
||
3651 | |||
3652 | The calls to @code{MPI_Init} and @code{MPI_Finalize} are required in all |
||
3653 | @ffindex MPI_Init |
||
3654 | @ffindex MPI_Finalize |
||
3655 | MPI programs; see the @uref{http://www.mcs.anl.gov/mpi/,MPI home page} |
||
3656 | for more information. Note that all of your processes run the program |
||
3657 | in parallel, as a group; there is no explicit launching of |
||
3658 | threads/processes in an MPI program. |
||
3659 | |||
3660 | @cindex plan |
||
3661 | As in the ordinary FFTW, the first thing we do is to create a plan (of |
||
3662 | type @code{fftwnd_mpi_plan}), using: |
||
3663 | |||
3664 | @example |
||
3665 | fftwnd_mpi_plan fftw2d_mpi_create_plan(MPI_Comm comm, |
||
3666 | int nx, int ny, |
||
3667 | fftw_direction dir, int flags); |
||
3668 | @end example |
||
3669 | @findex fftw2d_mpi_create_plan |
||
3670 | @tindex fftwnd_mpi_plan |
||
3671 | |||
3672 | Except for the first argument, the parameters are identical to those of |
||
3673 | @code{fftw2d_create_plan}. (There are also analogous |
||
3674 | @code{fftwnd_mpi_create_plan} and @code{fftw3d_mpi_create_plan} |
||
3675 | functions. Transforms of any rank greater than one are supported.) |
||
3676 | @findex fftwnd_mpi_create_plan |
||
3677 | @findex fftw3d_mpi_create_plan |
||
3678 | The first argument is an MPI @dfn{communicator}, which specifies the |
||
3679 | group of processes that are to be involved in the transform; the |
||
3680 | standard constant @code{MPI_COMM_WORLD} indicates all available |
||
3681 | processes. |
||
3682 | @fcindex MPI_COMM_WORLD |
||
3683 | |||
3684 | Next, one has to allocate and initialize the data. This is somewhat |
||
3685 | tricky, because the transform data is distributed across the processes |
||
3686 | involved in the transform. It is discussed in detail by the next |
||
3687 | section (@pxref{MPI Data Layout}). |
||
3688 | |||
3689 | The actual computation of the transform is performed by the function |
||
3690 | @code{fftwnd_mpi}, which differs somewhat from its uniprocessor |
||
3691 | equivalent and is described by: |
||
3692 | |||
3693 | @example |
||
3694 | void fftwnd_mpi(fftwnd_mpi_plan p, |
||
3695 | int n_fields, |
||
3696 | fftw_complex *local_data, fftw_complex *work, |
||
3697 | fftwnd_mpi_output_order output_order); |
||
3698 | @end example |
||
3699 | @findex fftwnd_mpi |
||
3700 | |||
3701 | There are several things to notice here: |
||
3702 | |||
3703 | @itemize @bullet |
||
3704 | |||
3705 | @item |
||
3706 | @cindex in-place transform |
||
3707 | First of all, all @code{fftw_mpi} transforms are in-place: the output is |
||
3708 | in the @code{local_data} parameter, and there is no need to specify |
||
3709 | @code{FFTW_IN_PLACE} in the plan flags. |
||
3710 | |||
3711 | @item |
||
3712 | @cindex n_fields |
||
3713 | @cindex stride |
||
3714 | The MPI transforms also only support a limited subset of the |
||
3715 | @code{howmany}/@code{stride}/@code{dist} functionality of the |
||
3716 | uniprocessor routines: the @code{n_fields} parameter is equivalent to |
||
3717 | @code{howmany=n_fields}, @code{stride=n_fields}, and @code{dist=1}. |
||
3718 | (Conceptually, the @code{n_fields} parameter allows you to transform an |
||
3719 | array of contiguous vectors, each with length @code{n_fields}.) |
||
3720 | @code{n_fields} is @code{1} if you are only transforming a single, |
||
3721 | ordinary array. |
||
3722 | |||
3723 | @item |
||
3724 | The @code{work} parameter is an optional workspace. If it is not |
||
3725 | @code{NULL}, it should be exactly the same size as the @code{local_data} |
||
3726 | array. If it is provided, FFTW is able to use the built-in |
||
3727 | @code{MPI_Alltoall} primitive for (often) greater efficiency at the |
||
3728 | @ffindex MPI_Alltoall |
||
3729 | expense of extra storage space. |
||
3730 | |||
3731 | @item |
||
3732 | Finally, the last parameter specifies whether the output data has the |
||
3733 | same ordering as the input data (@code{FFTW_NORMAL_ORDER}), or if it is |
||
3734 | transposed (@code{FFTW_TRANSPOSED_ORDER}). Leaving the data transposed |
||
3735 | @ctindex FFTW_NORMAL_ORDER |
||
3736 | @ctindex FFTW_TRANSPOSED_ORDER |
||
3737 | results in significant performance improvements due to a saved |
||
3738 | communication step (needed to un-transpose the data). Specifically, the |
||
3739 | first two dimensions of the array are transposed, as is described in |
||
3740 | more detail by the next section. |
||
3741 | |||
3742 | @end itemize |
||
3743 | |||
3744 | @cindex normalization |
||
3745 | The output of @code{fftwnd_mpi} is identical to that of the |
||
3746 | corresponding uniprocessor transform. In particular, you should recall |
||
3747 | our conventions for normalization and the sign of the transform |
||
3748 | exponent. |
||
3749 | |||
3750 | The same plan can be used to compute many transforms of the same size. |
||
3751 | After you are done with it, you should deallocate it by calling |
||
3752 | @code{fftwnd_mpi_destroy_plan}. |
||
3753 | @findex fftwnd_mpi_destroy_plan |
||
3754 | |||
3755 | @cindex blocking |
||
3756 | @ffindex MPI_Barrier |
||
3757 | @b{Important:} The FFTW MPI routines must be called in the same order by |
||
3758 | all processes involved in the transform. You should assume that they |
||
3759 | all are blocking, as if each contained a call to @code{MPI_Barrier}. |
||
3760 | |||
3761 | Programs using the FFTW MPI routines should be linked with |
||
3762 | @code{-lfftw_mpi -lfftw -lm} on Unix, in addition to whatever libraries |
||
3763 | are required for MPI. |
||
3764 | @cindex linking on Unix |
||
3765 | |||
3766 | @c ------------------------------------------------------- |
||
3767 | @node MPI Data Layout, Usage of MPI FFTW for Real Multi-dimensional Transforms, Usage of MPI FFTW for Complex Multi-dimensional Transforms, MPI FFTW |
||
3768 | @subsection MPI Data Layout |
||
3769 | |||
3770 | @cindex distributed memory |
||
3771 | @cindex distributed array format |
||
3772 | The transform data used by the MPI FFTW routines is @dfn{distributed}: a |
||
3773 | distinct portion of it resides with each process involved in the |
||
3774 | transform. This allows the transform to be parallelized, for example, |
||
3775 | over a cluster of workstations, each with its own separate memory, so |
||
3776 | that you can take advantage of the total memory of all the processors |
||
3777 | you are parallelizing over. |
||
3778 | |||
3779 | In particular, the array is divided according to the rows (first |
||
3780 | dimension) of the data: each process gets a subset of the rows of the |
||
3781 | @cindex slab decomposition |
||
3782 | data. (This is sometimes called a ``slab decomposition.'') One |
||
3783 | consequence of this is that you can't take advantage of more processors |
||
3784 | than you have rows (e.g. @code{64x64x64} matrix can at most use 64 |
||
3785 | processors). This isn't usually much of a limitation, however, as each |
||
3786 | processor needs a fair amount of data in order for the |
||
3787 | parallel-computation benefits to outweight the communications costs. |
||
3788 | |||
3789 | Below, the first dimension of the data will be referred to as |
||
3790 | @samp{@code{x}} and the second dimension as @samp{@code{y}}. |
||
3791 | |||
3792 | FFTW supplies a routine to tell you exactly how much data resides on the |
||
3793 | current process: |
||
3794 | |||
3795 | @example |
||
3796 | void fftwnd_mpi_local_sizes(fftwnd_mpi_plan p, |
||
3797 | int *local_nx, |
||
3798 | int *local_x_start, |
||
3799 | int *local_ny_after_transpose, |
||
3800 | int *local_y_start_after_transpose, |
||
3801 | int *total_local_size); |
||
3802 | @end example |
||
3803 | @findex fftwnd_mpi_local_sizes |
||
3804 | |||
3805 | Given a plan @code{p}, the other parameters of this routine are set to |
||
3806 | values describing the required data layout, described below. |
||
3807 | |||
3808 | @code{total_local_size} is the number of @code{fftw_complex} elements |
||
3809 | that you must allocate for your local data (and workspace, if you |
||
3810 | choose). (This value should, of course, be multiplied by |
||
3811 | @code{n_fields} if that parameter to @code{fftwnd_mpi} is not @code{1}.) |
||
3812 | |||
3813 | The data on the current process has @code{local_nx} rows, starting at |
||
3814 | row @code{local_x_start}. If @code{fftwnd_mpi} is called with |
||
3815 | @code{FFTW_TRANSPOSED_ORDER} output, then @code{y} will be the first |
||
3816 | dimension of the output, and the local @code{y} extent will be given by |
||
3817 | @code{local_ny_after_transpose} and |
||
3818 | @code{local_y_start_after_transpose}. Otherwise, the output has the |
||
3819 | same dimensions and layout as the input. |
||
3820 | |||
3821 | For instance, suppose you want to transform three-dimensional data of |
||
3822 | size @code{nx x ny x nz}. Then, the current process will store a subset |
||
3823 | of this data, of size @code{local_nx x ny x nz}, where the @code{x} |
||
3824 | indices correspond to the range @code{local_x_start} to |
||
3825 | @code{local_x_start+local_nx-1} in the ``real'' (i.e. logical) array. |
||
3826 | If @code{fftwnd_mpi} is called with @code{FFTW_TRANSPOSED_ORDER} output, |
||
3827 | @ctindex FFTW_TRANSPOSED_ORDER |
||
3828 | then the result will be a @code{ny x nx x nz} array, of which a |
||
3829 | @code{local_ny_after_transpose x nx x nz} subset is stored on the |
||
3830 | current process (corresponding to @code{y} values starting at |
||
3831 | @code{local_y_start_after_transpose}). |
||
3832 | |||
3833 | The following is an example of allocating such a three-dimensional array |
||
3834 | array (@code{local_data}) before the transform and initializing it to |
||
3835 | some function @code{f(x,y,z)}: |
||
3836 | |||
3837 | @example |
||
3838 | fftwnd_mpi_local_sizes(plan, &local_nx, &local_x_start, |
||
3839 | &local_ny_after_transpose, |
||
3840 | &local_y_start_after_transpose, |
||
3841 | &total_local_size); |
||
3842 | |||
3843 | local_data = (fftw_complex*) malloc(sizeof(fftw_complex) * |
||
3844 | total_local_size); |
||
3845 | |||
3846 | for (x = 0; x < local_nx; ++x) |
||
3847 | for (y = 0; y < ny; ++y) |
||
3848 | for (z = 0; z < nz; ++z) |
||
3849 | local_data[(x*ny + y)*nz + z] |
||
3850 | = f(x + local_x_start, y, z); |
||
3851 | @end example |
||
3852 | |||
3853 | Some important things to remember: |
||
3854 | |||
3855 | @itemize @bullet |
||
3856 | |||
3857 | @item |
||
3858 | Although the local data is of dimensions @code{local_nx x ny x nz} in |
||
3859 | the above example, do @emph{not} allocate the array to be of size |
||
3860 | @code{local_nx*ny*nz}. Use @code{total_local_size} instead. |
||
3861 | |||
3862 | @item |
||
3863 | The amount of data on each process will not necessarily be the same; in |
||
3864 | fact, @code{local_nx} may even be zero for some processes. (For |
||
3865 | example, suppose you are doing a @code{6x6} transform on four |
||
3866 | processors. There is no way to effectively use the fourth processor in |
||
3867 | a slab decomposition, so we leave it empty. Proof left as an exercise |
||
3868 | for the reader.) |
||
3869 | |||
3870 | @item |
||
3871 | @cindex row-major |
||
3872 | All arrays are, of course, in row-major order (@pxref{Multi-dimensional |
||
3873 | Array Format}). |
||
3874 | |||
3875 | @item |
||
3876 | If you want to compute the inverse transform of the output of |
||
3877 | @code{fftwnd_mpi}, the dimensions of the inverse transform are given by |
||
3878 | the dimensions of the output of the forward transform. For example, if |
||
3879 | you are using @code{FFTW_TRANSPOSED_ORDER} output in the above example, |
||
3880 | then the inverse plan should be created with dimensions @code{ny x nx x |
||
3881 | nz}. |
||
3882 | |||
3883 | @item |
||
3884 | The data layout only depends upon the dimensions of the array, not on |
||
3885 | the plan, so you are guaranteed that different plans for the same size |
||
3886 | (or inverse plans) will use the same (consistent) data layouts. |
||
3887 | |||
3888 | @end itemize |
||
3889 | |||
3890 | @c ------------------------------------------------------- |
||
3891 | @node Usage of MPI FFTW for Real Multi-dimensional Transforms, Usage of MPI FFTW for Complex One-dimensional Transforms, MPI Data Layout, MPI FFTW |
||
3892 | @subsection Usage of MPI FFTW for Real Multi-dimensional Transforms |
||
3893 | |||
3894 | MPI transforms specialized for real data are also available, similiar to |
||
3895 | the uniprocessor @code{rfftwnd} transforms. Just as in the uniprocessor |
||
3896 | case, the real-data MPI functions gain roughly a factor of two in speed |
||
3897 | (and save a factor of two in space) at the expense of more complicated |
||
3898 | data formats in the calling program. Before reading this section, you |
||
3899 | should definitely understand how to call the uniprocessor @code{rfftwnd} |
||
3900 | functions and also the complex MPI FFTW functions. |
||
3901 | |||
3902 | The following is an example of a program using @code{rfftwnd_mpi}. It |
||
3903 | computes the size @code{nx x ny x nz} transform of a real function |
||
3904 | @code{f(x,y,z)}, multiplies the imaginary part by @code{2} for fun, then |
||
3905 | computes the inverse transform. (We'll also use |
||
3906 | @code{FFTW_TRANSPOSED_ORDER} output for the transform, and additionally |
||
3907 | supply the optional workspace parameter to @code{rfftwnd_mpi}, just to |
||
3908 | add a little spice.) |
||
3909 | |||
3910 | @example |
||
3911 | #include <rfftw_mpi.h> |
||
3912 | |||
3913 | int main(int argc, char **argv) |
||
3914 | @{ |
||
3915 | const int nx = ..., ny = ..., nz = ...; |
||
3916 | int local_nx, local_x_start, local_ny_after_transpose, |
||
3917 | local_y_start_after_transpose, total_local_size; |
||
3918 | int x, y, z; |
||
3919 | rfftwnd_mpi_plan plan, iplan; |
||
3920 | fftw_real *data, *work; |
||
3921 | fftw_complex *cdata; |
||
3922 | |||
3923 | MPI_Init(&argc,&argv); |
||
3924 | |||
3925 | /* create the forward and backward plans: */ |
||
3926 | plan = rfftw3d_mpi_create_plan(MPI_COMM_WORLD, |
||
3927 | nx, ny, nz, |
||
3928 | FFTW_REAL_TO_COMPLEX, |
||
3929 | FFTW_ESTIMATE); |
||
3930 | @findex rfftw3d_mpi_create_plan |
||
3931 | iplan = rfftw3d_mpi_create_plan(MPI_COMM_WORLD, |
||
3932 | /* dim.'s of REAL data --> */ nx, ny, nz, |
||
3933 | FFTW_COMPLEX_TO_REAL, |
||
3934 | FFTW_ESTIMATE); |
||
3935 | |||
3936 | rfftwnd_mpi_local_sizes(plan, &local_nx, &local_x_start, |
||
3937 | &local_ny_after_transpose, |
||
3938 | &local_y_start_after_transpose, |
||
3939 | &total_local_size); |
||
3940 | @findex rfftwnd_mpi_local_sizes |
||
3941 | |||
3942 | data = (fftw_real*) malloc(sizeof(fftw_real) * total_local_size); |
||
3943 | |||
3944 | /* workspace is the same size as the data: */ |
||
3945 | work = (fftw_real*) malloc(sizeof(fftw_real) * total_local_size); |
||
3946 | |||
3947 | /* initialize data to f(x,y,z): */ |
||
3948 | for (x = 0; x < local_nx; ++x) |
||
3949 | for (y = 0; y < ny; ++y) |
||
3950 | for (z = 0; z < nz; ++z) |
||
3951 | data[(x*ny + y) * (2*(nz/2+1)) + z] |
||
3952 | = f(x + local_x_start, y, z); |
||
3953 | |||
3954 | /* Now, compute the forward transform: */ |
||
3955 | rfftwnd_mpi(plan, 1, data, work, FFTW_TRANSPOSED_ORDER); |
||
3956 | @findex rfftwnd_mpi |
||
3957 | |||
3958 | /* the data is now complex, so typecast a pointer: */ |
||
3959 | cdata = (fftw_complex*) data; |
||
3960 | |||
3961 | /* multiply imaginary part by 2, for fun: |
||
3962 | (note that the data is transposed) */ |
||
3963 | for (y = 0; y < local_ny_after_transpose; ++y) |
||
3964 | for (x = 0; x < nx; ++x) |
||
3965 | for (z = 0; z < (nz/2+1); ++z) |
||
3966 | cdata[(y*nx + x) * (nz/2+1) + z].im |
||
3967 | *= 2.0; |
||
3968 | |||
3969 | /* Finally, compute the inverse transform; the result |
||
3970 | is transposed back to the original data layout: */ |
||
3971 | rfftwnd_mpi(iplan, 1, data, work, FFTW_TRANSPOSED_ORDER); |
||
3972 | |||
3973 | free(data); |
||
3974 | free(work); |
||
3975 | rfftwnd_mpi_destroy_plan(plan); |
||
3976 | @findex rfftwnd_mpi_destroy_plan |
||
3977 | rfftwnd_mpi_destroy_plan(iplan); |
||
3978 | MPI_Finalize(); |
||
3979 | @} |
||
3980 | @end example |
||
3981 | |||
3982 | There's a lot of stuff in this example, but it's all just what you would |
||
3983 | have guessed, right? We replaced all the @code{fftwnd_mpi*} functions |
||
3984 | by @code{rfftwnd_mpi*}, but otherwise the parameters were pretty much |
||
3985 | the same. The data layout distributed among the processes just like for |
||
3986 | the complex transforms (@pxref{MPI Data Layout}), but in addition the |
||
3987 | final dimension is padded just like it is for the uniprocessor in-place |
||
3988 | real transforms (@pxref{Array Dimensions for Real Multi-dimensional |
||
3989 | Transforms}). |
||
3990 | @cindex padding |
||
3991 | In particular, the @code{z} dimension of the real input data is padded |
||
3992 | to a size @code{2*(nz/2+1)}, and after the transform it contains |
||
3993 | @code{nz/2+1} complex values. |
||
3994 | @cindex distributed array format |
||
3995 | @cindex rfftwnd array format |
||
3996 | |||
3997 | Some other important things to know about the real MPI transforms: |
||
3998 | |||
3999 | @itemize @bullet |
||
4000 | |||
4001 | @item |
||
4002 | As for the uniprocessor @code{rfftwnd_create_plan}, the dimensions |
||
4003 | passed for the @code{FFTW_COMPLEX_TO_REAL} plan are those of the |
||
4004 | @emph{real} data. In particular, even when @code{FFTW_TRANSPOSED_ORDER} |
||
4005 | @ctindex FFTW_COMPLEX_TO_REAL |
||
4006 | @ctindex FFTW_TRANSPOSED_ORDER |
||
4007 | is used as in this case, the dimensions are those of the (untransposed) |
||
4008 | real output, not the (transposed) complex input. (For the complex MPI |
||
4009 | transforms, on the other hand, the dimensions are always those of the |
||
4010 | input array.) |
||
4011 | |||
4012 | @item |
||
4013 | The output ordering of the transform (@code{FFTW_TRANSPOSED_ORDER} or |
||
4014 | @code{FFTW_TRANSPOSED_ORDER}) @emph{must} be the same for both forward |
||
4015 | and backward transforms. (This is not required in the complex case.) |
||
4016 | |||
4017 | @item |
||
4018 | @code{total_local_size} is the required size in @code{fftw_real} values, |
||
4019 | not @code{fftw_complex} values as it is for the complex transforms. |
||
4020 | |||
4021 | @item |
||
4022 | @code{local_ny_after_transpose} and @code{local_y_start_after_transpose} |
||
4023 | describe the portion of the array after the transform; that is, they are |
||
4024 | indices in the complex array for an @code{FFTW_REAL_TO_COMPLEX} transform |
||
4025 | and in the real array for an @code{FFTW_COMPLEX_TO_REAL} transform. |
||
4026 | |||
4027 | @item |
||
4028 | @code{rfftwnd_mpi} always expects @code{fftw_real*} array arguments, but |
||
4029 | of course these pointers can refer to either real or complex arrays, |
||
4030 | depending upon which side of the transform you are on. Just as for |
||
4031 | in-place uniprocessor real transforms (and also in the example above), |
||
4032 | this is most easily handled by typecasting to a complex pointer when |
||
4033 | handling the complex data. |
||
4034 | |||
4035 | @item |
||
4036 | As with the complex transforms, there are also |
||
4037 | @code{rfftwnd_create_plan} and @code{rfftw2d_create_plan} functions, and |
||
4038 | any rank greater than one is supported. |
||
4039 | @findex rfftwnd_create_plan |
||
4040 | @findex rfftw2d_create_plan |
||
4041 | |||
4042 | @end itemize |
||
4043 | |||
4044 | Programs using the MPI FFTW real transforms should link with |
||
4045 | @code{-lrfftw_mpi -lfftw_mpi -lrfftw -lfftw -lm} on Unix. |
||
4046 | @cindex linking on Unix |
||
4047 | |||
4048 | @c ------------------------------------------------------- |
||
4049 | @node Usage of MPI FFTW for Complex One-dimensional Transforms, MPI Tips, Usage of MPI FFTW for Real Multi-dimensional Transforms, MPI FFTW |
||
4050 | @subsection Usage of MPI FFTW for Complex One-dimensional Transforms |
||
4051 | |||
4052 | The MPI FFTW also includes routines for parallel one-dimensional |
||
4053 | transforms of complex data (only). Although the speedup is generally |
||
4054 | worse than it is for the multi-dimensional routines,@footnote{The 1D |
||
4055 | transforms require much more communication. All the communication in |
||
4056 | our FFT routines takes the form of an all-to-all communication: the |
||
4057 | multi-dimensional transforms require two all-to-all communications (or |
||
4058 | one, if you use @code{FFTW_TRANSPOSED_ORDER}), while the one-dimensional |
||
4059 | transforms require @emph{three} (or two, if you use scrambled input or |
||
4060 | output).} these distributed-memory one-dimensional transforms are |
||
4061 | especially useful for performing one-dimensional transforms that don't |
||
4062 | fit into the memory of a single machine. |
||
4063 | |||
4064 | The usage of these routines is straightforward, and is similar to that |
||
4065 | of the multi-dimensional MPI transform functions. You first include the |
||
4066 | header @code{<fftw_mpi.h>} and then create a plan by calling: |
||
4067 | |||
4068 | @example |
||
4069 | fftw_mpi_plan fftw_mpi_create_plan(MPI_Comm comm, int n, |
||
4070 | fftw_direction dir, int flags); |
||
4071 | @end example |
||
4072 | @findex fftw_mpi_create_plan |
||
4073 | @tindex fftw_mpi_plan |
||
4074 | |||
4075 | The last three arguments are the same as for @code{fftw_create_plan} |
||
4076 | (except that all MPI transforms are automatically @code{FFTW_IN_PLACE}). |
||
4077 | The first argument specifies the group of processes you are using, and |
||
4078 | is usually @code{MPI_COMM_WORLD} (all processes). |
||
4079 | @fcindex MPI_COMM_WORLD |
||
4080 | A plan can be used for many transforms of the same size, and is |
||
4081 | destroyed when you are done with it by calling |
||
4082 | @code{fftw_mpi_destroy_plan(plan)}. |
||
4083 | @findex fftw_mpi_destroy_plan |
||
4084 | |||
4085 | If you don't care about the ordering of the input or output data of the |
||
4086 | transform, you can include @code{FFTW_SCRAMBLED_INPUT} and/or |
||
4087 | @code{FFTW_SCRAMBLED_OUTPUT} in the @code{flags}. |
||
4088 | @ctindex FFTW_SCRAMBLED_INPUT |
||
4089 | @ctindex FFTW_SCRAMBLED_OUTPUT |
||
4090 | @cindex flags |
||
4091 | These save some communications at the expense of having the input and/or |
||
4092 | output reordered in an undocumented way. For example, if you are |
||
4093 | performing an FFT-based convolution, you might use |
||
4094 | @code{FFTW_SCRAMBLED_OUTPUT} for the forward transform and |
||
4095 | @code{FFTW_SCRAMBLED_INPUT} for the inverse transform. |
||
4096 | |||
4097 | The transform itself is computed by: |
||
4098 | |||
4099 | @example |
||
4100 | void fftw_mpi(fftw_mpi_plan p, int n_fields, |
||
4101 | fftw_complex *local_data, fftw_complex *work); |
||
4102 | @end example |
||
4103 | @findex fftw_mpi |
||
4104 | |||
4105 | @cindex n_fields |
||
4106 | @code{n_fields}, as in @code{fftwnd_mpi}, is equivalent to |
||
4107 | @code{howmany=n_fields}, @code{stride=n_fields}, and @code{dist=1}, and |
||
4108 | should be @code{1} when you are computing the transform of a single |
||
4109 | array. @code{local_data} contains the portion of the array local to the |
||
4110 | current process, described below. @code{work} is either @code{NULL} or |
||
4111 | an array exactly the same size as @code{local_data}; in the latter case, |
||
4112 | FFTW can use the @code{MPI_Alltoall} communications primitive which is |
||
4113 | (usually) faster at the expense of extra storage. Upon return, |
||
4114 | @code{local_data} contains the portion of the output local to the |
||
4115 | current process (see below). |
||
4116 | @ffindex MPI_Alltoall |
||
4117 | |||
4118 | @cindex distributed array format |
||
4119 | To find out what portion of the array is stored local to the current |
||
4120 | process, you call the following routine: |
||
4121 | |||
4122 | @example |
||
4123 | void fftw_mpi_local_sizes(fftw_mpi_plan p, |
||
4124 | int *local_n, int *local_start, |
||
4125 | int *local_n_after_transform, |
||
4126 | int *local_start_after_transform, |
||
4127 | int *total_local_size); |
||
4128 | @end example |
||
4129 | @findex fftw_mpi_local_sizes |
||
4130 | |||
4131 | @code{total_local_size} is the number of @code{fftw_complex} elements |
||
4132 | you should actually allocate for @code{local_data} (and @code{work}). |
||
4133 | @code{local_n} and @code{local_start} indicate that the current process |
||
4134 | stores @code{local_n} elements corresponding to the indices |
||
4135 | @code{local_start} to @code{local_start+local_n-1} in the ``real'' |
||
4136 | array. @emph{After the transform, the process may store a different |
||
4137 | portion of the array.} The portion of the data stored on the process |
||
4138 | after the transform is given by @code{local_n_after_transform} and |
||
4139 | @code{local_start_after_transform}. This data is exactly the same as a |
||
4140 | contiguous segment of the corresponding uniprocessor transform output |
||
4141 | (i.e. an in-order sequence of sequential frequency bins). |
||
4142 | |||
4143 | Note that, if you compute both a forward and a backward transform of the |
||
4144 | same size, the local sizes are guaranteed to be consistent. That is, |
||
4145 | the local size after the forward transform will be the same as the local |
||
4146 | size before the backward transform, and vice versa. |
||
4147 | |||
4148 | Programs using the FFTW MPI routines should be linked with |
||
4149 | @code{-lfftw_mpi -lfftw -lm} on Unix, in addition to whatever libraries |
||
4150 | are required for MPI. |
||
4151 | @cindex linking on Unix |
||
4152 | |||
4153 | @c ------------------------------------------------------- |
||
4154 | @node MPI Tips, , Usage of MPI FFTW for Complex One-dimensional Transforms, MPI FFTW |
||
4155 | @subsection MPI Tips |
||
4156 | |||
4157 | There are several things you should consider in order to get the best |
||
4158 | performance out of the MPI FFTW routines. |
||
4159 | |||
4160 | @cindex load-balancing |
||
4161 | First, if possible, the first and second dimensions of your data should |
||
4162 | be divisible by the number of processes you are using. (If only one can |
||
4163 | be divisible, then you should choose the first dimension.) This allows |
||
4164 | the computational load to be spread evenly among the processes, and also |
||
4165 | reduces the communications complexity and overhead. In the |
||
4166 | one-dimensional transform case, the size of the transform should ideally |
||
4167 | be divisible by the @emph{square} of the number of processors. |
||
4168 | |||
4169 | @ctindex FFTW_TRANSPOSED_ORDER |
||
4170 | Second, you should consider using the @code{FFTW_TRANSPOSED_ORDER} |
||
4171 | output format if it is not too burdensome. The speed gains from |
||
4172 | communications savings are usually substantial. |
||
4173 | |||
4174 | Third, you should consider allocating a workspace for |
||
4175 | @code{(r)fftw(nd)_mpi}, as this can often |
||
4176 | (but not always) improve performance (at the cost of extra storage). |
||
4177 | |||
4178 | Fourth, you should experiment with the best number of processors to use |
||
4179 | for your problem. (There comes a point of diminishing returns, when the |
||
4180 | communications costs outweigh the computational benefits.@footnote{An |
||
4181 | FFT is particularly hard on communications systems, as it requires an |
||
4182 | @dfn{all-to-all} communication, which is more or less the worst possible |
||
4183 | case.}) The @code{fftw_mpi_test} program can output helpful performance |
||
4184 | benchmarks. |
||
4185 | @pindex fftw_mpi_test |
||
4186 | @cindex benchmark |
||
4187 | It accepts the same parameters as the uniprocessor test programs |
||
4188 | (c.f. @code{tests/README}) and is run like an ordinary MPI program. For |
||
4189 | example, @code{mpirun -np 4 fftw_mpi_test -s 128x128x128} will benchmark |
||
4190 | a @code{128x128x128} transform on four processors, reporting timings and |
||
4191 | parallel speedups for all variants of @code{fftwnd_mpi} (transposed, |
||
4192 | with workspace, etcetera). (Note also that there is the |
||
4193 | @code{rfftw_mpi_test} program for the real transforms.) |
||
4194 | @pindex rfftw_mpi_test |
||
4195 | |||
4196 | |||
4197 | @c ************************************************************ |
||
4198 | @node Calling FFTW from Fortran, Installation and Customization, Parallel FFTW, Top |
||
4199 | @chapter Calling FFTW from Fortran |
||
4200 | |||
4201 | @cindex Fortran-callable wrappers |
||
4202 | The standard FFTW libraries include special wrapper functions that allow |
||
4203 | Fortran programs to call FFTW subroutines. This chapter describes how |
||
4204 | those functions may be employed to use FFTW from Fortran. We assume |
||
4205 | here that the reader is already familiar with the usage of FFTW in C, as |
||
4206 | described elsewhere in this manual. |
||
4207 | |||
4208 | In general, it is not possible to call C functions directly from |
||
4209 | Fortran, due to Fortran's inability to pass arguments by value and also |
||
4210 | because Fortran compilers typically expect identifiers to be mangled |
||
4211 | @cindex compiler |
||
4212 | somehow for linking. However, if C functions are written in a special |
||
4213 | way, they @emph{are} callable from Fortran, and we have employed this |
||
4214 | technique to create Fortran-callable ``wrapper'' functions around the |
||
4215 | main FFTW routines. These wrapper functions are included in the FFTW |
||
4216 | libraries by default, unless a Fortran compiler isn't found on your |
||
4217 | system or @code{--disable-fortran} is included in the @code{configure} |
||
4218 | flags. |
||
4219 | |||
4220 | As a result, calling FFTW from Fortran requires little more than |
||
4221 | appending @samp{@code{_f77}} to the function names and then linking |
||
4222 | normally with the FFTW libraries. There are a few wrinkles, however, as |
||
4223 | we shall discuss below. |
||
4224 | |||
4225 | @menu |
||
4226 | * Wrapper Routines:: |
||
4227 | * FFTW Constants in Fortran:: |
||
4228 | * Fortran Examples:: |
||
4229 | @end menu |
||
4230 | |||
4231 | @c ------------------------------------------------------- |
||
4232 | @node Wrapper Routines, FFTW Constants in Fortran, Calling FFTW from Fortran, Calling FFTW from Fortran |
||
4233 | @section Wrapper Routines |
||
4234 | |||
4235 | All of the uniprocessor and multi-threaded transform routines have |
||
4236 | Fortran-callable wrappers, except for the wisdom import/export functions |
||
4237 | (since it is not possible to exchange string and file arguments portably |
||
4238 | with Fortran). The name of the wrapper routine is the same as that of |
||
4239 | the corresponding C routine, but with @code{fftw/fftwnd/rfftw/rfftwnd} |
||
4240 | replaced by @code{fftw_f77/fftwnd_f77/rfftw_f77/rfftwnd_f77}. For |
||
4241 | example, in Fortran, instead of calling @code{fftw_one} you would call |
||
4242 | @code{fftw_f77_one}.@footnote{Technically, Fortran 77 identifiers are |
||
4243 | not allowed to have more than 6 characters, nor may they contain |
||
4244 | underscores. Any compiler that enforces this limitation doesn't deserve |
||
4245 | to link to FFTW.} |
||
4246 | @findex fftw_f77_one |
||
4247 | For the most part, all of the arguments to the functions are the same, |
||
4248 | with the following exceptions: |
||
4249 | |||
4250 | @itemize @bullet |
||
4251 | |||
4252 | @item |
||
4253 | Any function that returns a value (e.g. @code{fftw_create_plan}) is |
||
4254 | converted into a subroutine. The return value is converted into an |
||
4255 | additional (first) parameter of the wrapper subroutine. (The reason for |
||
4256 | this is that some Fortran implementations seem to have trouble with C |
||
4257 | function return values.) |
||
4258 | |||
4259 | @item |
||
4260 | @cindex in-place transform |
||
4261 | When performing one-dimensional @code{FFTW_IN_PLACE} transforms, you |
||
4262 | don't have the option of passing @code{NULL} for the @code{out} argument |
||
4263 | (since there is no way to pass @code{NULL} from Fortran). Therefore, |
||
4264 | when performing such transforms, you @emph{must} allocate and pass a |
||
4265 | contiguous scratch array of the same size as the transform. Note that |
||
4266 | for in-place multi-dimensional (@code{(r)fftwnd}) transforms, the |
||
4267 | @code{out} argument is ignored, so you can pass anything for that |
||
4268 | parameter. |
||
4269 | |||
4270 | @item |
||
4271 | @cindex column-major |
||
4272 | The wrapper routines expect multi-dimensional arrays to be in |
||
4273 | column-major order, which is the ordinary format of Fortran arrays. |
||
4274 | They do this transparently and costlessly simply by reversing the order |
||
4275 | of the dimensions passed to FFTW, but this has one important consequence |
||
4276 | for multi-dimensional real-complex transforms, discussed below. |
||
4277 | |||
4278 | @item |
||
4279 | @code{plan} variables (what would be of type @code{fftw_plan}, |
||
4280 | @code{rfftwnd_plan}, etcetera, in C), must be declared as a type that is |
||
4281 | the same size as a pointer (address) on your machine. (Fortran has no |
||
4282 | generic pointer type.) The Fortran @code{integer} type is usually the |
||
4283 | same size as a pointer, but you need to be wary (especially on 64-bit |
||
4284 | machines). (You could also use @code{integer*4} on a 32-bit machine and |
||
4285 | @code{integer*8} on a 64-bit machine.) Ugh. (@code{g77} has a special |
||
4286 | type, @code{integer(kind=7)}, that is defined to be the same size as a |
||
4287 | pointer.) |
||
4288 | |||
4289 | @end itemize |
||
4290 | |||
4291 | @cindex floating-point precision |
||
4292 | In general, you should take care to use Fortran data types that |
||
4293 | correspond to (i.e. are the same size as) the C types used by FFTW. If |
||
4294 | your C and Fortran compilers are made by the same vendor, the |
||
4295 | correspondence is usually straightforward (i.e. @code{integer} |
||
4296 | corresponds to @code{int}, @code{real} corresponds to @code{float}, |
||
4297 | etcetera). Such simple correspondences are assumed in the examples |
||
4298 | below. The examples also assume that FFTW was compiled in |
||
4299 | double precision (the default). |
||
4300 | |||
4301 | @c ------------------------------------------------------- |
||
4302 | @node FFTW Constants in Fortran, Fortran Examples, Wrapper Routines, Calling FFTW from Fortran |
||
4303 | @section FFTW Constants in Fortran |
||
4304 | |||
4305 | When creating plans in FFTW, a number of constants are used to specify |
||
4306 | options, such as @code{FFTW_FORWARD} or @code{FFTW_USE_WISDOM}. The |
||
4307 | same constants must be used with the wrapper routines, but of course the |
||
4308 | C header files where the constants are defined can't be incorporated |
||
4309 | directly into Fortran code. |
||
4310 | |||
4311 | Instead, we have placed Fortran equivalents of the FFTW constant |
||
4312 | definitions in the file @code{fortran/fftw_f77.i} of the FFTW package. |
||
4313 | If your Fortran compiler supports a preprocessor, you can use that to |
||
4314 | incorporate this file into your code whenever you need to call FFTW. |
||
4315 | Otherwise, you will have to paste the constant definitions in directly. |
||
4316 | They are: |
||
4317 | |||
4318 | @example |
||
4319 | integer FFTW_FORWARD,FFTW_BACKWARD |
||
4320 | parameter (FFTW_FORWARD=-1,FFTW_BACKWARD=1) |
||
4321 | |||
4322 | integer FFTW_REAL_TO_COMPLEX,FFTW_COMPLEX_TO_REAL |
||
4323 | parameter (FFTW_REAL_TO_COMPLEX=-1,FFTW_COMPLEX_TO_REAL=1) |
||
4324 | |||
4325 | integer FFTW_ESTIMATE,FFTW_MEASURE |
||
4326 | parameter (FFTW_ESTIMATE=0,FFTW_MEASURE=1) |
||
4327 | |||
4328 | integer FFTW_OUT_OF_PLACE,FFTW_IN_PLACE,FFTW_USE_WISDOM |
||
4329 | parameter (FFTW_OUT_OF_PLACE=0) |
||
4330 | parameter (FFTW_IN_PLACE=8,FFTW_USE_WISDOM=16) |
||
4331 | |||
4332 | integer FFTW_THREADSAFE |
||
4333 | parameter (FFTW_THREADSAFE=128) |
||
4334 | @end example |
||
4335 | |||
4336 | @cindex flags |
||
4337 | In C, you combine different flags (like @code{FFTW_USE_WISDOM} and |
||
4338 | @code{FFTW_MEASURE}) using the @samp{@code{|}} operator; in Fortran you |
||
4339 | should just use @samp{@code{+}}. |
||
4340 | |||
4341 | @c ------------------------------------------------------- |
||
4342 | @node Fortran Examples, , FFTW Constants in Fortran, Calling FFTW from Fortran |
||
4343 | @section Fortran Examples |
||
4344 | |||
4345 | In C you might have something like the following to transform a |
||
4346 | one-dimensional complex array: |
||
4347 | |||
4348 | @example |
||
4349 | fftw_complex in[N], *out[N]; |
||
4350 | fftw_plan plan; |
||
4351 | |||
4352 | plan = fftw_create_plan(N,FFTW_FORWARD,FFTW_ESTIMATE); |
||
4353 | fftw_one(plan,in,out); |
||
4354 | fftw_destroy_plan(plan); |
||
4355 | @end example |
||
4356 | |||
4357 | In Fortran, you use the following to accomplish the same thing: |
||
4358 | |||
4359 | @example |
||
4360 | double complex in, out |
||
4361 | dimension in(N), out(N) |
||
4362 | integer plan |
||
4363 | |||
4364 | call fftw_f77_create_plan(plan,N,FFTW_FORWARD,FFTW_ESTIMATE) |
||
4365 | call fftw_f77_one(plan,in,out) |
||
4366 | call fftw_f77_destroy_plan(plan) |
||
4367 | @end example |
||
4368 | @findex fftw_f77_create_plan |
||
4369 | @findex fftw_f77_one |
||
4370 | @findex fftw_f77_destroy_plan |
||
4371 | |||
4372 | Notice how all routines are called as Fortran subroutines, and the plan |
||
4373 | is returned via the first argument to @code{fftw_f77_create_plan}. To |
||
4374 | do the same thing, but using 8 threads in parallel |
||
4375 | (@pxref{Multi-threaded FFTW}), you would simply replace the call to |
||
4376 | @code{fftw_f77_one} with: |
||
4377 | |||
4378 | @example |
||
4379 | call fftw_f77_threads_one(8,plan,in,out) |
||
4380 | @end example |
||
4381 | @findex fftw_f77_threads_one |
||
4382 | |||
4383 | To transform a three-dimensional array in-place with C, you might do: |
||
4384 | |||
4385 | @example |
||
4386 | fftw_complex arr[L][M][N]; |
||
4387 | fftwnd_plan plan; |
||
4388 | int n[3] = @{L,M,N@}; |
||
4389 | |||
4390 | plan = fftwnd_create_plan(3,n,FFTW_FORWARD, |
||
4391 | FFTW_ESTIMATE | FFTW_IN_PLACE); |
||
4392 | fftwnd_one(plan, arr, 0); |
||
4393 | fftwnd_destroy_plan(plan); |
||
4394 | @end example |
||
4395 | |||
4396 | In Fortran, you would use this instead: |
||
4397 | |||
4398 | @example |
||
4399 | double complex arr |
||
4400 | dimension arr(L,M,N) |
||
4401 | integer n |
||
4402 | dimension n(3) |
||
4403 | integer plan |
||
4404 | |||
4405 | n(1) = L |
||
4406 | n(2) = M |
||
4407 | n(3) = N |
||
4408 | call fftwnd_f77_create_plan(plan,3,n,FFTW_FORWARD, |
||
4409 | + FFTW_ESTIMATE + FFTW_IN_PLACE) |
||
4410 | call fftwnd_f77_one(plan, arr, 0) |
||
4411 | call fftwnd_f77_destroy_plan(plan) |
||
4412 | @end example |
||
4413 | @findex fftwnd_f77_create_plan |
||
4414 | @findex fftwnd_f77_one |
||
4415 | @findex fftwnd_f77_destroy_plan |
||
4416 | |||
4417 | Instead of calling @code{fftwnd_f77_create_plan(plan,3,n,...)}, we could |
||
4418 | also have called @code{fftw3d_f77_create_plan(plan,L,M,N,...)}. |
||
4419 | @findex fftw3d_f77_create_plan |
||
4420 | |||
4421 | Note that we pass the array dimensions in the "natural" order; also |
||
4422 | note that the last argument to @code{fftwnd_f77} is ignored since the |
||
4423 | transform is @code{FFTW_IN_PLACE}. |
||
4424 | |||
4425 | To transform a one-dimensional real array in Fortran, you might do: |
||
4426 | |||
4427 | @example |
||
4428 | double precision in, out |
||
4429 | dimension in(N), out(N) |
||
4430 | integer plan |
||
4431 | |||
4432 | call rfftw_f77_create_plan(plan,N,FFTW_REAL_TO_COMPLEX, |
||
4433 | + FFTW_ESTIMATE) |
||
4434 | call rfftw_f77_one(plan,in,out) |
||
4435 | call rfftw_f77_destroy_plan(plan) |
||
4436 | @end example |
||
4437 | @findex rfftw_f77_create_plan |
||
4438 | @findex rfftw_f77_one |
||
4439 | @findex rfftw_f77_destroy_plan |
||
4440 | |||
4441 | To transform a two-dimensional real array, out of place, you might use |
||
4442 | the following: |
||
4443 | |||
4444 | @example |
||
4445 | double precision in |
||
4446 | double complex out |
||
4447 | dimension in(M,N), out(M/2 + 1, N) |
||
4448 | integer plan |
||
4449 | |||
4450 | call rfftw2d_f77_create_plan(plan,M,N,FFTW_REAL_TO_COMPLEX, |
||
4451 | + FFTW_ESTIMATE) |
||
4452 | call rfftwnd_f77_one_real_to_complex(plan, in, out) |
||
4453 | call rfftwnd_f77_destroy_plan(plan) |
||
4454 | @end example |
||
4455 | @findex rfftw2d_f77_create_plan |
||
4456 | @findex rfftwnd_f77_one_real_to_complex |
||
4457 | @findex rfftwnd_f77_destroy_plan |
||
4458 | |||
4459 | @b{Important:} Notice that it is the @emph{first} dimension of the |
||
4460 | complex output array that is cut in half in Fortran, rather than the |
||
4461 | last dimension as in C. This is a consequence of the wrapper routines |
||
4462 | reversing the order of the array dimensions passed to FFTW so that the |
||
4463 | Fortran program can use its ordinary column-major order. |
||
4464 | @cindex column-major |
||
4465 | @cindex rfftwnd array format |
||
4466 | |||
4467 | |||
4468 | @c ************************************************************ |
||
4469 | @node Installation and Customization, Acknowledgments, Calling FFTW from Fortran, Top |
||
4470 | @chapter Installation and Customization |
||
4471 | |||
4472 | This chapter describes the installation and customization of FFTW, the |
||
4473 | latest version of which may be downloaded from |
||
4474 | @uref{http://theory.lcs.mit.edu/~fftw, the FFTW home page}. |
||
4475 | |||
4476 | As distributed, FFTW makes very few assumptions about your system. All |
||
4477 | you need is an ANSI C compiler (@code{gcc} is fine, although |
||
4478 | vendor-provided compilers often produce faster code). |
||
4479 | @cindex compiler |
||
4480 | However, installation of FFTW is somewhat simpler if you have a Unix or |
||
4481 | a GNU system, such as Linux. In this chapter, we first describe the |
||
4482 | installation of FFTW on Unix and non-Unix systems. We then describe how |
||
4483 | you can customize FFTW to achieve better performance. Specifically, you |
||
4484 | can I) enable @code{gcc}/x86-specific hacks that improve performance on |
||
4485 | Pentia and PentiumPro's; II) adapt FFTW to use the high-resolution clock |
||
4486 | of your machine, if any; III) produce code (@emph{codelets}) to support |
||
4487 | fast transforms of sizes that are not supported efficiently by the |
||
4488 | standard FFTW distribution. |
||
4489 | @cindex installation |
||
4490 | |||
4491 | @menu |
||
4492 | * Installation on Unix:: |
||
4493 | * Installation on non-Unix Systems:: |
||
4494 | * Installing FFTW in both single and double precision:: |
||
4495 | * gcc and Pentium/PentiumPro hacks:: |
||
4496 | * Customizing the timer:: |
||
4497 | * Generating your own code:: |
||
4498 | @end menu |
||
4499 | |||
4500 | @node Installation on Unix, Installation on non-Unix Systems, Installation and Customization, Installation and Customization |
||
4501 | @section Installation on Unix |
||
4502 | |||
4503 | FFTW comes with a @code{configure} program in the GNU style. |
||
4504 | Installation can be as simple as: |
||
4505 | @fpindex configure |
||
4506 | |||
4507 | @example |
||
4508 | ./configure |
||
4509 | make |
||
4510 | make install |
||
4511 | @end example |
||
4512 | |||
4513 | This will build the uniprocessor complex and real transform libraries |
||
4514 | along with the test programs. We strongly recommend that you use GNU |
||
4515 | @code{make} if it is available; on some systems it is called |
||
4516 | @code{gmake}. The ``@code{make install}'' command installs the fftw and |
||
4517 | rfftw libraries in standard places, and typically requires root |
||
4518 | privileges (unless you specify a different install directory with the |
||
4519 | @code{--prefix} flag to @code{configure}). You can also type |
||
4520 | ``@code{make check}'' to put the FFTW test programs through their paces. |
||
4521 | If you have problems during configuration or compilation, you may want |
||
4522 | to run ``@code{make distclean}'' before trying again; this ensures that |
||
4523 | you don't have any stale files left over from previous compilation |
||
4524 | attempts. |
||
4525 | |||
4526 | The @code{configure} script knows good @code{CFLAGS} (C compiler flags) |
||
4527 | @cindex compiler flags |
||
4528 | for a few systems. If your system is not known, the @code{configure} |
||
4529 | script will print out a warning. @footnote{Each version of @code{cc} |
||
4530 | seems to have its own magic incantation to get the fastest code most of |
||
4531 | the time---you'd think that people would have agreed upon some |
||
4532 | convention, e.g. "@code{-Omax}", by now.} In this case, you can compile |
||
4533 | FFTW with the command |
||
4534 | @example |
||
4535 | make CFLAGS="<write your CFLAGS here>" |
||
4536 | @end example |
||
4537 | If you do find an optimal set of @code{CFLAGS} for your system, please |
||
4538 | let us know what they are (along with the output of @code{config.guess}) |
||
4539 | so that we can include them in future releases. |
||
4540 | |||
4541 | The @code{configure} program supports all the standard flags defined by |
||
4542 | the GNU Coding Standards; see the @code{INSTALL} file in FFTW or |
||
4543 | @uref{http://www.gnu.org/prep/standards_toc.html, the GNU web page}. |
||
4544 | Note especially @code{--help} to list all flags and |
||
4545 | @code{--enable-shared} to create shared, rather than static, libraries. |
||
4546 | @code{configure} also accepts a few FFTW-specific flags, particularly: |
||
4547 | |||
4548 | @itemize @bullet |
||
4549 | |||
4550 | @item |
||
4551 | @cindex floating-point precision |
||
4552 | @code{--enable-float} Produces a single-precision version of FFTW |
||
4553 | (@code{float}) instead of the default double-precision (@code{double}). |
||
4554 | @xref{Installing FFTW in both single and double precision}. |
||
4555 | |||
4556 | @item |
||
4557 | @code{--enable-type-prefix} Adds a @samp{d} or @samp{s} prefix to all |
||
4558 | installed libraries and header files to indicate the floating-point |
||
4559 | precision. @xref{Installing FFTW in both single and double |
||
4560 | precision}. (@code{--enable-type-prefix=<prefix>} lets you add an |
||
4561 | arbitrary prefix.) By default, no prefix is used. |
||
4562 | |||
4563 | @item |
||
4564 | @cindex threads |
||
4565 | @code{--enable-threads} Enables compilation and installation of the FFTW |
||
4566 | threads library (@pxref{Multi-threaded FFTW}), which provides a |
||
4567 | simple interface to parallel transforms for SMP systems. (By default, |
||
4568 | the threads routines are not compiled.) |
||
4569 | |||
4570 | @item |
||
4571 | @cindex MPI |
||
4572 | @code{--enable-mpi} Enables compilation and installation of the FFTW MPI |
||
4573 | library (@pxref{MPI FFTW}), which provides parallel transforms for |
||
4574 | distributed-memory systems with MPI. (By default, the MPI routines are |
||
4575 | not compiled.) |
||
4576 | |||
4577 | @item |
||
4578 | @cindex Fortran-callable wrappers |
||
4579 | @code{--disable-fortran} Disables inclusion of Fortran-callable wrapper |
||
4580 | routines (@pxref{Calling FFTW from Fortran}) in the standard FFTW |
||
4581 | libraries. These wrapper routines increase the library size by only a |
||
4582 | negligible amount, so they are included by default as long as the |
||
4583 | @code{configure} script finds a Fortran compiler on your system. |
||
4584 | |||
4585 | @item |
||
4586 | @code{--with-gcc} Enables the use of @code{gcc}. By default, FFTW uses |
||
4587 | the vendor-supplied @code{cc} compiler if present. Unfortunately, |
||
4588 | @code{gcc} produces slower code than @code{cc} on many systems. |
||
4589 | |||
4590 | @item |
||
4591 | @code{--enable-i386-hacks} See below. |
||
4592 | |||
4593 | @item |
||
4594 | @code{--enable-pentium-timer} See below. |
||
4595 | |||
4596 | @end itemize |
||
4597 | |||
4598 | To force @code{configure} to use a particular C compiler (instead of the |
||
4599 | @cindex compiler |
||
4600 | default, usually @code{cc}), set the environment variable @code{CC} to |
||
4601 | the name of the desired compiler before running @code{configure}; you |
||
4602 | may also need to set the flags via the variable @code{CFLAGS}. |
||
4603 | @cindex compiler flags |
||
4604 | |||
4605 | @node Installation on non-Unix Systems, Installing FFTW in both single and double precision, Installation on Unix, Installation and Customization |
||
4606 | @section Installation on non-Unix Systems |
||
4607 | |||
4608 | It is quite straightforward to install FFTW even on non-Unix systems |
||
4609 | lacking the niceties of the @code{configure} script. The FFTW Home Page |
||
4610 | may include some FFTW packages preconfigured for particular |
||
4611 | systems/compilers, and also contains installation notes sent in by |
||
4612 | @cindex compiler |
||
4613 | users. All you really need to do, though, is to compile all of the |
||
4614 | @code{.c} files in the appropriate directories of the FFTW package. |
||
4615 | (You needn't worry about the many extraneous files lying around.) |
||
4616 | |||
4617 | For the complex transforms, compile all of the @code{.c} files in the |
||
4618 | @code{fftw} directory and link them into a library. Similarly, for the |
||
4619 | real transforms, compile all of the @code{.c} files in the @code{rfftw} |
||
4620 | directory into a library. Note that these sources @code{#include} |
||
4621 | various files in the @code{fftw} and @code{rfftw} directories, so you |
||
4622 | may need to set up the @code{#include} paths for your compiler |
||
4623 | appropriately. Be sure to enable the highest-possible level of |
||
4624 | optimization in your compiler. |
||
4625 | |||
4626 | @cindex floating-point precision |
||
4627 | By default, FFTW is compiled for double-precision transforms. To work |
||
4628 | in single precision rather than double precision, @code{#define} the |
||
4629 | symbol @code{FFTW_ENABLE_FLOAT} in @code{fftw.h} (in the @code{fftw} |
||
4630 | directory) and (re)compile FFTW. |
||
4631 | |||
4632 | These libraries should be linked with any program that uses the |
||
4633 | corresponding transforms. The required header files, @code{fftw.h} and |
||
4634 | @code{rfftw.h}, are located in the @code{fftw} and @code{rfftw} |
||
4635 | directories respectively; you may want to put them with the libraries, |
||
4636 | or wherever header files normally go on your system. |
||
4637 | |||
4638 | FFTW includes test programs, @code{fftw_test} and @code{rfftw_test}, in |
||
4639 | @pindex fftw_test |
||
4640 | @pindex rfftw_test |
||
4641 | the @code{tests} directory. These are compiled and linked like any |
||
4642 | program using FFTW, except that they use additional header files located |
||
4643 | in the @code{fftw} and @code{rfftw} directories, so you will need to set |
||
4644 | your compiler @code{#include} paths appropriately. @code{fftw_test} is |
||
4645 | compiled from @code{fftw_test.c} and @code{test_main.c}, while |
||
4646 | @code{rfftw_test} is compiled from @code{rfftw_test.c} and |
||
4647 | @code{test_main.c}. When you run these programs, you will be prompted |
||
4648 | interactively for various possible tests to perform; see also |
||
4649 | @code{tests/README} for more information. |
||
4650 | |||
4651 | @node Installing FFTW in both single and double precision, gcc and Pentium/PentiumPro hacks, Installation on non-Unix Systems, Installation and Customization |
||
4652 | @section Installing FFTW in both single and double precision |
||
4653 | |||
4654 | @cindex floating-point precision |
||
4655 | It is often useful to install both single- and double-precision versions |
||
4656 | of the FFTW libraries on the same machine, and we provide a convenient |
||
4657 | mechanism for achieving this on Unix systems. |
||
4658 | |||
4659 | @fpindex configure |
||
4660 | When the @code{--enable-type-prefix} option of configure is used, the |
||
4661 | FFTW libraries and header files are installed with a prefix of @samp{d} |
||
4662 | or @samp{s}, depending upon whether you compiled in double or single |
||
4663 | precision. Then, instead of linking your program with @code{-lrfftw |
||
4664 | -lfftw}, for example, you would link with @code{-ldrfftw -ldfftw} to use |
||
4665 | the double-precision version or with @code{-lsrfftw -lsfftw} to use the |
||
4666 | single-precision version. Also, you would @code{#include} |
||
4667 | @code{<drfftw.h>} or @code{<srfftw.h>} instead of @code{<rfftw.h>}, and |
||
4668 | so on. |
||
4669 | |||
4670 | @emph{The names of FFTW functions, data types, and constants remain |
||
4671 | unchanged!} You still call, for instance, @code{fftw_one} and not |
||
4672 | @code{dfftw_one}. Only the names of header files and libraries are |
||
4673 | modified. One consequence of this is that @emph{you @b{cannot} use both |
||
4674 | the single- and double-precision FFTW libraries in the same program, |
||
4675 | simultaneously,} as the function names would conflict. |
||
4676 | |||
4677 | So, to install both the single- and double-precision libraries on the |
||
4678 | same machine, you would do: |
||
4679 | |||
4680 | @example |
||
4681 | ./configure --enable-type-prefix @i{[ other options ]} |
||
4682 | make |
||
4683 | make install |
||
4684 | make clean |
||
4685 | ./configure --enable-float --enable-type-prefix @i{[ other options ]} |
||
4686 | make |
||
4687 | make install |
||
4688 | @end example |
||
4689 | |||
4690 | @node gcc and Pentium/PentiumPro hacks, Customizing the timer, Installing FFTW in both single and double precision, Installation and Customization |
||
4691 | @section @code{gcc} and Pentium/PentiumPro hacks |
||
4692 | @cindex Pentium hack |
||
4693 | The @code{configure} option @code{--enable-i386-hacks} enables specific |
||
4694 | optimizations for @code{gcc} and Pentium/PentiumPro, which can |
||
4695 | significantly improve performance of double-precision transforms. |
||
4696 | Specifically, we have tested these hacks on Linux with @code{gcc} |
||
4697 | 2.[78] and versions of @code{egcs} since 1.0.3. These optimizations |
||
4698 | only affect the performance, not the correctness of FFTW (i.e. it is |
||
4699 | always safe to try them out). |
||
4700 | |||
4701 | These hacks provide a workaround to the incorrect alignment of local |
||
4702 | @code{double} variables in @code{gcc}. The compiler aligns these |
||
4703 | @cindex compiler |
||
4704 | variables to multiples of 4 bytes, but execution is much faster (on |
||
4705 | Pentium and PentiumPro) if @code{double}s are aligned to a multiple of 8 |
||
4706 | bytes. By carefully counting the number of variables allocated by the |
||
4707 | compiler in performance-critical regions of the code, we have been able |
||
4708 | to introduce dummy allocations (using @code{alloca}) that align the |
||
4709 | stack properly. The hack depends crucially on the compiler flags that |
||
4710 | are used. For example, it won't work without |
||
4711 | @code{-fomit-frame-pointer}. |
||
4712 | |||
4713 | The @code{fftw_test} program outputs speed measurements that you can use |
||
4714 | to see if these hacks are beneficial. |
||
4715 | @pindex fftw_test |
||
4716 | @cindex benchmark |
||
4717 | |||
4718 | The @code{configure} option @code{--enable-pentium-timer} enables the |
||
4719 | use of the Pentium and PentiumPro cycle counter for timing purposes. In |
||
4720 | order to get correct results, you must define @code{FFTW_CYCLES_PER_SEC} |
||
4721 | in @code{fftw/config.h} to be the clock speed of your processor; the |
||
4722 | resulting FFTW library will be nonportable. The use of this option is |
||
4723 | deprecated. On serious operating systems (such as Linux), FFTW uses |
||
4724 | @code{gettimeofday()}, which has enough resolution and is portable. |
||
4725 | (Note that Win32 has its own high-resolution timing routines as well. |
||
4726 | FFTW contains unsupported code to use these routines.) |
||
4727 | |||
4728 | @node Customizing the timer, Generating your own code, gcc and Pentium/PentiumPro hacks, Installation and Customization |
||
4729 | @section Customizing the timer |
||
4730 | @cindex timer, customization of |
||
4731 | |||
4732 | FFTW needs a reasonably-precise clock in order to find the optimal way |
||
4733 | to compute a transform. On Unix systems, @code{configure} looks for |
||
4734 | @code{gettimeofday} and other system-specific timers. If it does not |
||
4735 | find any high resolution clock, it defaults to using the @code{clock()} |
||
4736 | function, which is very portable, but forces FFTW to run for a long time |
||
4737 | in order to get reliable measurements. |
||
4738 | @ffindex gettimeofday |
||
4739 | @ffindex clock |
||
4740 | |||
4741 | If your machine supports a high-resolution clock not recognized by FFTW, |
||
4742 | it is therefore advisable to use it. You must edit |
||
4743 | @code{fftw/fftw-int.h}. There are a few macros you must redefine. The |
||
4744 | code is documented and should be self-explanatory. (By the way, |
||
4745 | @code{fftw-int} stands for @code{fftw-internal}, but for some |
||
4746 | inexplicable reason people are still using primitive systems with 8.3 |
||
4747 | filenames.) |
||
4748 | |||
4749 | Even if you don't install high-resolution timing code, we still |
||
4750 | recommend that you look at the @code{FFTW_TIME_MIN} constant in |
||
4751 | @ctindex FFTW_TIME_MIN |
||
4752 | @code{fftw/fftw-int.h}. This constant holds the minimum time interval (in |
||
4753 | seconds) required to get accurate timing measurements, and should be (at |
||
4754 | least) several hundred times the resolution of your clock. The default |
||
4755 | constants are on the conservative side, and may cause FFTW to take |
||
4756 | longer than necessary when you create a plan. Set @code{FFTW_TIME_MIN} |
||
4757 | to whatever is appropriate on your system (be sure to set the |
||
4758 | @emph{right} @code{FFTW_TIME_MIN}@dots{}there are several definitions in |
||
4759 | @code{fftw-int.h}, corresponding to different platforms and timers). |
||
4760 | |||
4761 | As an aid in checking the resolution of your clock, you can use the |
||
4762 | @code{tests/fftw_test} program with the @code{-t} option |
||
4763 | (c.f. @code{tests/README}). Remember, the mere fact that your clock |
||
4764 | reports times in, say, picoseconds, does not mean that it is actually |
||
4765 | @emph{accurate} to that resolution. |
||
4766 | |||
4767 | @node Generating your own code, , Customizing the timer, Installation and Customization |
||
4768 | @section Generating your own code |
||
4769 | @cindex Caml |
||
4770 | @cindex ML |
||
4771 | @cindex code generator |
||
4772 | |||
4773 | If you know that you will only use transforms of a certain size (say, |
||
4774 | powers of @math{2}) and want to reduce the size of the library, you can |
||
4775 | reconfigure FFTW to support only those sizes you are interested in. You |
||
4776 | may even generate code to enable efficient transforms of a size not |
||
4777 | supported by the default distribution. The default distribution |
||
4778 | supports transforms of any size, but not all sizes are equally fast. |
||
4779 | The default installation of FFTW is best at handling sizes of the form |
||
4780 | @ifinfo |
||
4781 | @math{2^a 3^b 5^c 7^d 11^e 13^f}, |
||
4782 | @end ifinfo |
||
4783 | @iftex |
||
4784 | @tex |
||
4785 | $2^a 3^b 5^c 7^d 11^e 13^f$, |
||
4786 | @end tex |
||
4787 | @end iftex |
||
4788 | @ifhtml |
||
4789 | 2<SUP>a</SUP> 3<SUP>b</SUP> 5<SUP>c</SUP> 7<SUP>d</SUP> |
||
4790 | 11<SUP>e</SUP> 13<SUP>f</SUP>, |
||
4791 | @end ifhtml |
||
4792 | where @math{e+f} is either @math{0} or |
||
4793 | @math{1}, and the other exponents are arbitrary. Other sizes are |
||
4794 | computed by means of a slow, general-purpose routine. However, if you |
||
4795 | have an application that requires fast transforms of size, say, |
||
4796 | @code{17}, there is a way to generate specialized code to handle that. |
||
4797 | |||
4798 | The directory @code{gensrc} contains all the programs and scripts that |
||
4799 | were used to generate FFTW. In particular, the program |
||
4800 | @code{gensrc/genfft.ml} was used to generate the code that FFTW uses to |
||
4801 | compute the transforms. We do not expect casual users to use it. |
||
4802 | @code{genfft} is a rather sophisticated program that generates directed |
||
4803 | acyclic graphs of FFT algorithms and performs algebraic simplifications |
||
4804 | on them. @code{genfft} is written in Objective Caml, a dialect of ML. |
||
4805 | Objective Caml is described at @uref{http://pauillac.inria.fr/ocaml/} |
||
4806 | and can be downloaded from from @uref{ftp://ftp.inria.fr/lang/caml-light}. |
||
4807 | @pindex genfft |
||
4808 | @cindex Caml |
||
4809 | |||
4810 | If you have Objective Caml installed, you can type @code{sh |
||
4811 | bootstrap.sh} in the top-level directory to re-generate the files. If |
||
4812 | you change the @code{gensrc/config} file, you can optimize FFTW for |
||
4813 | sizes that are not currently supported efficiently (say, 17 or 19). |
||
4814 | |||
4815 | We do not provide more details about the code-generation process, since |
||
4816 | we do not expect that users will need to generate their own code. |
||
4817 | However, feel free to contact us at @email{fftw@@theory.lcs.mit.edu} if |
||
4818 | you are interested in the subject. |
||
4819 | |||
4820 | @cindex monadic programming |
||
4821 | You might find it interesting to learn Caml and/or some modern |
||
4822 | programming techniques that we used in the generator (including monadic |
||
4823 | programming), especially if you heard the rumor that Java and |
||
4824 | object-oriented programming are the latest advancement in the field. |
||
4825 | The internal operation of the codelet generator is described in the |
||
4826 | paper, ``A Fast Fourier Transform Compiler,'' by M. Frigo, which is |
||
4827 | available from the @uref{http://theory.lcs.mit.edu/~fftw,FFTW home page} |
||
4828 | and will appear in the @cite{Proceedings of the 1999 ACM SIGPLAN |
||
4829 | Conference on Programming Language Design and Implementation (PLDI)}. |
||
4830 | |||
4831 | @c ************************************************************ |
||
4832 | @node Acknowledgments, License and Copyright, Installation and Customization, Top |
||
4833 | @chapter Acknowledgments |
||
4834 | |||
4835 | Matteo Frigo was supported in part by the Defense Advanced Research |
||
4836 | Projects Agency (DARPA) under Grants N00014-94-1-0985 and |
||
4837 | F30602-97-1-0270, and by a Digital Equipment Corporation Fellowship. |
||
4838 | Steven G. Johnson was supported in part by a DoD NDSEG Fellowship, an |
||
4839 | MIT Karl Taylor Compton Fellowship, and by the Materials Research |
||
4840 | Science and Engineering Center program of the National Science |
||
4841 | Foundation under award DMR-9400334. |
||
4842 | |||
4843 | Both authors were also supported in part by their respective |
||
4844 | girlfriends, by the letters ``Q'' and ``R'', and by the number 12. |
||
4845 | @cindex girlfriends |
||
4846 | |||
4847 | We are grateful to SUN Microsystems Inc.@ for its donation of a cluster |
||
4848 | of 9 8-processor Ultra HPC 5000 SMPs (24 Gflops peak). These machines |
||
4849 | served as the primary platform for the development of earlier versions |
||
4850 | of FFTW. |
||
4851 | |||
4852 | We thank Intel Corporation for donating a four-processor Pentium Pro |
||
4853 | machine. We thank the Linux community for giving us a decent OS to run |
||
4854 | on that machine. |
||
4855 | |||
4856 | The @code{genfft} program was written using Objective Caml, a dialect of |
||
4857 | ML. Objective Caml is a small and elegant language developed by Xavier |
||
4858 | Leroy. The implementation is available from @code{ftp.inria.fr} in the |
||
4859 | directory @code{lang/caml-light}. We used versions 1.07 and 2.00 of the |
||
4860 | software. In previous releases of FFTW, @code{genfft} was written in |
||
4861 | Caml Light, by the same authors. An even earlier implementation of |
||
4862 | @code{genfft} was written in Scheme, but Caml is definitely better for |
||
4863 | this kind of application. |
||
4864 | @pindex genfft |
||
4865 | @cindex Caml |
||
4866 | @cindex LISP |
||
4867 | |||
4868 | FFTW uses many tools from the GNU project, including @code{automake}, |
||
4869 | @code{texinfo}, and @code{libtool}. |
||
4870 | |||
4871 | Prof.@ Charles E.@ Leiserson of MIT provided continuous support and |
||
4872 | encouragement. This program would not exist without him. Charles also |
||
4873 | proposed the name ``codelets'' for the basic FFT blocks. |
||
4874 | |||
4875 | Prof.@ John D.@ Joannopoulos of MIT demonstrated continuing tolerance of |
||
4876 | Steven's ``extra-curricular'' computer-science activities. Steven's |
||
4877 | chances at a physics degree would not exist without him. |
||
4878 | |||
4879 | Andrew Sterian contributed the Windows timing code. |
||
4880 | |||
4881 | Didier Miras reported a bug in the test procedure used in FFTW 1.2. We |
||
4882 | now use a completely different test algorithm by Funda Ergun that does |
||
4883 | not require a separate FFT program to compare against. |
||
4884 | |||
4885 | Wolfgang Reimer contributed the Pentium cycle counter and a few fixes |
||
4886 | that help portability. |
||
4887 | |||
4888 | Ming-Chang Liu uncovered a well-hidden bug in the complex transforms of |
||
4889 | FFTW 2.0 and supplied a patch to correct it. |
||
4890 | |||
4891 | The FFTW FAQ was written in @code{bfnn} (Bizarre Format With No Name) |
||
4892 | and formatted using the tools developed by Ian Jackson for the Linux |
||
4893 | FAQ. |
||
4894 | |||
4895 | @emph{We are especially thankful to all of our users for their |
||
4896 | continuing support, feedback, and interest during our development of |
||
4897 | FFTW.} |
||
4898 | |||
4899 | @c ************************************************************ |
||
4900 | @node License and Copyright, Concept Index, Acknowledgments, Top |
||
4901 | @chapter License and Copyright |
||
4902 | |||
4903 | FFTW is copyright @copyright{} 1997--1999 Massachusetts Institute of |
||
4904 | Technology. |
||
4905 | |||
4906 | FFTW is free software; you can redistribute it and/or modify |
||
4907 | it under the terms of the GNU General Public License as published by |
||
4908 | the Free Software Foundation; either version 2 of the License, or |
||
4909 | (at your option) any later version. |
||
4910 | |||
4911 | This program is distributed in the hope that it will be useful, |
||
4912 | but WITHOUT ANY WARRANTY; without even the implied warranty of |
||
4913 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||
4914 | GNU General Public License for more details. |
||
4915 | |||
4916 | You should have received a copy of the GNU General Public License along |
||
4917 | with this program; if not, write to the Free Software Foundation, Inc., |
||
4918 | 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA. You can also |
||
4919 | find the @uref{http://www.gnu.org/copyleft/gpl.html, GPL on the GNU web |
||
4920 | site}. |
||
4921 | |||
4922 | In addition, we kindly ask you to acknowledge FFTW and its authors in |
||
4923 | any program or publication in which you use FFTW. (You are not |
||
4924 | @emph{required} to do so; it is up to your common sense to decide |
||
4925 | whether you want to comply with this request or not.) |
||
4926 | |||
4927 | Non-free versions of FFTW are available under terms different than the |
||
4928 | General Public License. (e.g. they do not require you to accompany any |
||
4929 | object code using FFTW with the corresponding source code.) For these |
||
4930 | alternate terms you must purchase a license from MIT's Technology |
||
4931 | Licensing Office. Users interested in such a license should contact us |
||
4932 | (@email{fftw@@theory.lcs.mit.edu}) for more information. |
||
4933 | |||
4934 | @node Concept Index, Library Index, License and Copyright, Top |
||
4935 | @chapter Concept Index |
||
4936 | @printindex cp |
||
4937 | |||
4938 | @node Library Index, , Concept Index, Top |
||
4939 | @chapter Library Index |
||
4940 | @printindex fn |
||
4941 | |||
4942 | @c ************************************************************ |
||
4943 | @contents |
||
4944 | |||
4945 | @bye |
||
4946 | @c LocalWords: texinfo fftw texi Exp setfilename settitle setchapternewpage |
||
4947 | @c LocalWords: syncodeindex fn cp vr pg tp ifinfo titlepage sp Matteo Frigo |
||
4948 | @c LocalWords: vskip pt filll dir detailmenu halfcomplex DFT fftwnd rfftw gcc |
||
4949 | @c LocalWords: rfftwnd Pentium PentiumPro NJ FFT cindex ESSL FFTW's emph uref |
||
4950 | @c LocalWords: http lcs mit edu benchfft benchFFT pindex dfn iftex tex cdot |
||
4951 | @c LocalWords: ifhtml nbsp TR Sep Proc ICASSP Tukey Rader's ref xref int FFTs |
||
4952 | @c LocalWords: findex tindex vindex im strided pxref DC lfftw lm samp const |
||
4953 | @c LocalWords: nx ny nz ldots rk ik hc freq lrfftw datatypes cdots pointwise |
||
4954 | @c LocalWords: pinv ij rote increaseth Ecclesiastes nerd stdout fopen printf |
||
4955 | @c LocalWords: fclose dimension's malloc sizeof comp lang www eskimo com scs |
||
4956 | @c LocalWords: faq html README Cilk SMP cilk POSIX Solaris BeOS MacOS mpi mcs |
||
4957 | @c LocalWords: anl gov mutex THREADSAFE struct leq conj lt hbox istride lg rt |
||
4958 | @c LocalWords: ostride subsubheading howmany idist odist exp IMG SRC gif IFFT |
||
4959 | @c LocalWords: frftw dist datatype datatype's nc noindent callback putc getc |
||
4960 | @c LocalWords: EOF Pentia PentiumPro's codelets CFLAGS cc Omax config org toc |
||
4961 | @c LocalWords: pentium preconfigured egcs alloca fomit SEC gettimeofday MIN |
||
4962 | @c LocalWords: picoseconds Caml gensrc genfft pauillac inria fr ocaml ftp sh |
||
4963 | @c LocalWords: caml DoD NDSEG DMR HPC SMPs Gflops automake libtool Leiserson |
||
4964 | @c LocalWords: Sterian Didier Miras Funda Ergun Reimer bfnn copyleft gpl |
||
4965 | @c LocalWords: printindex LocalWords |