Subversion Repositories shark

Rev

Details | Last modification | View Log | RSS feed

Rev Author Line No. Line
2 pj 1
\input texinfo   @c -*- texinfo -*-
2
@c % $Id: fftw.tex,v 1.1.1.1 2002-03-29 14:12:55 pj Exp $
3
@c %**start of header
4
@setfilename fftw.info
5
@settitle FFTW
6
@c %**end of header
7
 
8
@include version.texi
9
@setchapternewpage odd
10
@c define constant index (ct)
11
@defcodeindex ct
12
@syncodeindex ct fn
13
@syncodeindex vr fn
14
@syncodeindex pg fn
15
@syncodeindex tp fn
16
@c define foreign function index (ff)
17
@defcodeindex ff
18
@syncodeindex ff cp
19
@c define foreign constant index (fc)
20
@defcodeindex fc
21
@syncodeindex fc cp
22
@c define foreign program index (fp)
23
@defcodeindex fp
24
@syncodeindex fp cp
25
 
26
@ifinfo
27
This is the FFTW User's manual.
28
 
29
Copyright @copyright{} 1997--1999 Massachusetts Institute of Technology
30
 
31
Permission is granted to make and distribute verbatim copies of this
32
manual provided the copyright notice and this permission notice are
33
preserved on all copies.
34
 
35
Permission is granted to copy and distribute modified versions of this
36
manual under the conditions for verbatim copying, provided that the
37
entire resulting derived work is distributed under the terms of a
38
permission notice identical to this one.
39
 
40
Permission is granted to copy and distribute translations of this manual
41
into another language, under the above conditions for modified versions,
42
except that this permission notice may be stated in a translation
43
approved by the Free Software Foundation.
44
@end ifinfo
45
 
46
@titlepage
47
@sp 10
48
@comment The title is printed in a large font.
49
@title{FFTW User's Manual}
50
@subtitle For version @value{VERSION}, @value{UPDATED}
51
@author{Matteo Frigo}
52
@author{Steven G. Johnson}
53
 
54
@c The following two commands start the copyright page.
55
@page
56
@vskip 0pt plus 1filll
57
Copyright @copyright{} 1997--1999 Massachusetts Institute of Technology.
58
 
59
Permission is granted to make and distribute verbatim copies of this
60
manual provided the copyright notice and this permission notice are
61
preserved on all copies.
62
 
63
Permission is granted to copy and distribute modified versions of this
64
manual under the conditions for verbatim copying, provided that the
65
entire resulting derived work is distributed under the terms of a
66
permission notice identical to this one.
67
 
68
Permission is granted to copy and distribute translations of this manual
69
into another language, under the above conditions for modified versions,
70
except that this permission notice may be stated in a translation
71
approved by the Free Software Foundation.
72
@end titlepage
73
 
74
@node    Top, Introduction, (dir), (dir)
75
@ifinfo
76
@top FFTW User Manual
77
Welcome to FFTW, the Fastest Fourier Transform in the West.  FFTW is a
78
collection of fast C routines to compute the discrete Fourier transform.
79
This manual documents FFTW version @value{VERSION}.
80
@end ifinfo
81
 
82
@menu
83
* Introduction::                
84
* Tutorial::                    
85
* FFTW Reference::              
86
* Parallel FFTW::              
87
* Calling FFTW from Fortran::  
88
* Installation and Customization::  
89
* Acknowledgments::            
90
* License and Copyright::      
91
* Concept Index::              
92
* Library Index::              
93
 
94
@detailmenu --- The Detailed Node Listing ---
95
 
96
Tutorial
97
 
98
* Complex One-dimensional Transforms Tutorial::  
99
* Complex Multi-dimensional Transforms Tutorial::  
100
* Real One-dimensional Transforms Tutorial::  
101
* Real Multi-dimensional Transforms Tutorial::  
102
* Multi-dimensional Array Format::  
103
* Words of Wisdom::            
104
 
105
Multi-dimensional Array Format
106
 
107
* Row-major Format::            
108
* Column-major Format::        
109
* Static Arrays in C::          
110
* Dynamic Arrays in C::        
111
* Dynamic Arrays in C-The Wrong Way::  
112
 
113
Words of Wisdom
114
 
115
* Caveats in Using Wisdom::     What you should worry about in using wisdom
116
* Importing and Exporting Wisdom::  I/O of wisdom to disk and other media
117
 
118
FFTW Reference
119
 
120
* Data Types::                  real, complex, and halfcomplex numbers
121
* One-dimensional Transforms Reference::  
122
* Multi-dimensional Transforms Reference::  
123
* Real One-dimensional Transforms Reference::  
124
* Real Multi-dimensional Transforms Reference::  
125
* Wisdom Reference::            
126
* Memory Allocator Reference::  
127
* Thread safety::              
128
 
129
One-dimensional Transforms Reference
130
 
131
* fftw_create_plan::            Plan Creation
132
* Discussion on Specific Plans::  
133
* fftw::                        Plan Execution
134
* fftw_destroy_plan::           Plan Destruction
135
* What FFTW Really Computes::   Definition of the DFT.
136
 
137
Multi-dimensional Transforms Reference
138
 
139
* fftwnd_create_plan::          Plan Creation
140
* fftwnd::                      Plan Execution
141
* fftwnd_destroy_plan::         Plan Destruction
142
* What FFTWND Really Computes::  
143
 
144
Real One-dimensional Transforms Reference
145
 
146
* rfftw_create_plan::           Plan Creation  
147
* rfftw::                       Plan Execution  
148
* rfftw_destroy_plan::          Plan Destruction
149
* What RFFTW Really Computes::  
150
 
151
Real Multi-dimensional Transforms Reference
152
 
153
* rfftwnd_create_plan::         Plan Creation
154
* rfftwnd::                     Plan Execution
155
* Array Dimensions for Real Multi-dimensional Transforms::  
156
* Strides in In-place RFFTWND::  
157
* rfftwnd_destroy_plan::        Plan Destruction
158
* What RFFTWND Really Computes::  
159
 
160
Wisdom Reference
161
 
162
* fftw_export_wisdom::          
163
* fftw_import_wisdom::          
164
* fftw_forget_wisdom::          
165
 
166
Parallel FFTW
167
 
168
* Multi-threaded FFTW::        
169
* MPI FFTW::                    
170
 
171
Multi-threaded FFTW
172
 
173
* Installation and Supported Hardware/Software::  
174
* Usage of Multi-threaded FFTW::  
175
* How Many Threads to Use?::    
176
* Using Multi-threaded FFTW in a Multi-threaded Program::  
177
* Tips for Optimal Threading::  
178
 
179
MPI FFTW
180
 
181
* MPI FFTW Installation::      
182
* Usage of MPI FFTW for Complex Multi-dimensional Transforms::  
183
* MPI Data Layout::            
184
* Usage of MPI FFTW for Real Multi-dimensional Transforms::  
185
* Usage of MPI FFTW for Complex One-dimensional Transforms::  
186
* MPI Tips::                    
187
 
188
Calling FFTW from Fortran
189
 
190
* Wrapper Routines::            
191
* FFTW Constants in Fortran::  
192
* Fortran Examples::            
193
 
194
Installation and Customization
195
 
196
* Installation on Unix::        
197
* Installation on non-Unix Systems::  
198
* Installing FFTW in both single and double precision::  
199
* gcc and Pentium/PentiumPro hacks::  
200
* Customizing the timer::      
201
* Generating your own code::    
202
 
203
@end detailmenu
204
@end menu
205
 
206
@c ************************************************************
207
@node    Introduction, Tutorial, Top, Top
208
@chapter Introduction
209
This manual documents version @value{VERSION} of FFTW, the @emph{Fastest
210
Fourier Transform in the West}.  FFTW is a comprehensive collection of
211
fast C routines for computing the discrete Fourier transform (DFT) in
212
one or more dimensions, of both real and complex data, and of arbitrary
213
input size.  FFTW also includes parallel transforms for both shared- and
214
distributed-memory systems.  We assume herein that the reader is already
215
familiar with the properties and uses of the DFT that are relevant to
216
her application.  Otherwise, see e.g. @cite{The Fast Fourier Transform}
217
by E. O. Brigham (Prentice-Hall, Englewood Cliffs, NJ, 1974).
218
@uref{http://theory.lcs.mit.edu/~fftw, Our web page} also has links to
219
FFT-related information online.
220
@cindex FFTW
221
 
222
FFTW is usually faster (and sometimes much faster) than all other
223
freely-available Fourier transform programs found on the Net.  For
224
transforms whose size is a power of two, it compares favorably with the
225
FFT codes in Sun's Performance Library and IBM's ESSL library, which are
226
targeted at specific machines.  Moreover, FFTW's performance is
227
@emph{portable}.  Indeed, FFTW is unique in that it automatically adapts
228
itself to your machine, your cache, the size of your memory, the number
229
of registers, and all the other factors that normally make it impossible
230
to optimize a program for more than one machine.  An extensive
231
comparison of FFTW's performance with that of other Fourier transform
232
codes has been made. The results are available on the Web at
233
@uref{http://theory.lcs.mit.edu/~benchfft, the benchFFT home page}.
234
@cindex benchmark
235
@fpindex benchfft
236
 
237
In order to use FFTW effectively, you need to understand one basic
238
concept of FFTW's internal structure.  FFTW does not used a fixed
239
algorithm for computing the transform, but it can adapt the DFT
240
algorithm to details of the underlying hardware in order to achieve best
241
performance.  Hence, the computation of the transform is split into two
242
phases.  First, FFTW's @dfn{planner} is called, which ``learns'' the
243
@cindex plan
244
fastest way to compute the transform on your machine.  The planner
245
@cindex planner
246
produces a data structure called a @dfn{plan} that contains this
247
information.  Subsequently, the plan is passed to FFTW's @dfn{executor},
248
@cindex executor
249
along with an array of input data.  The executor computes the actual
250
transform, as dictated by the plan.  The plan can be reused as many
251
times as needed.  In typical high-performance applications, many
252
transforms of the same size are computed, and consequently a
253
relatively-expensive initialization of this sort is acceptable.  On the
254
other hand, if you need a single transform of a given size, the one-time
255
cost of the planner becomes significant.  For this case, FFTW provides
256
fast planners based on heuristics or on previously computed plans.
257
 
258
The pattern of planning/execution applies to all four operation modes of
259
FFTW, that is, @w{I) one-dimensional} complex transforms (FFTW), @w{II)
260
multi-dimensional} complex transforms (FFTWND), @w{III) one-dimensional}
261
transforms of real data (RFFTW), @w{IV) multi-dimensional} transforms of
262
real data (RFFTWND).  Each mode comes with its own planner and executor.
263
 
264
Besides the automatic performance adaptation performed by the planner,
265
it is also possible for advanced users to customize FFTW for their
266
special needs.  As distributed, FFTW works most efficiently for arrays
267
whose size can be factored into small primes (@math{2}, @math{3},
268
@math{5}, and @math{7}), and uses a slower general-purpose routine for
269
other factors.  FFTW, however, comes with a code generator that can
270
produce fast C programs for any particular array size you may care
271
about.
272
@cindex code generator
273
For example, if you need transforms of size
274
@ifinfo
275
@math{513 = 19 x 3^3},
276
@end ifinfo
277
@iftex
278
@tex
279
$513 = 19 \cdot 3^3$,
280
@end tex
281
@end iftex
282
@ifhtml
283
513&nbsp;=&nbsp;19*3<sup>3</sup>,
284
@end ifhtml
285
you can customize FFTW to support the factor @math{19} efficiently.
286
 
287
FFTW can exploit multiple processors if you have them.  FFTW comes with
288
a shared-memory implementation on top of POSIX (and similar) threads, as
289
well as a distributed-memory implementation based on MPI.
290
@cindex parallel transform
291
@cindex threads
292
@cindex MPI
293
We also provide an experimental parallel implementation written in Cilk,
294
@emph{the superior programming tool of choice for discriminating
295
hackers} (Olin Shivers).  (See @uref{http://supertech.lcs.mit.edu/cilk,
296
the Cilk home page}.)
297
@cindex Cilk
298
 
299
For more information regarding FFTW, see the paper, ``The Fastest
300
Fourier Transform in the West,'' by M. Frigo and S. G. Johnson, which is
301
the technical report MIT-LCS-TR-728 (Sep. '97).  See also, ``FFTW: An
302
Adaptive Software Architecture for the FFT,'' by M. Frigo and
303
S. G. Johnson, which appeared in the 23rd International Conference on
304
Acoustics, Speech, and Signal Processing (@cite{Proc. ICASSP 1998}
305
@b{3}, p. 1381).  The code generator is described in the paper ``A Fast
306
Fourier Transform Compiler'',
307
@cindex compiler
308
by M. Frigo, to appear in the @cite{Proceedings of the 1999 ACM SIGPLAN
309
Conference on Programming Language Design and Implementation (PLDI),
310
Atlanta, Georgia, May 1999}.  These papers, along with the latest
311
version of FFTW, the FAQ, benchmarks, and other links, are available at
312
@uref{http://theory.lcs.mit.edu/~fftw, the FFTW home page}.  The current
313
version of FFTW incorporates many good ideas from the past thirty years
314
of FFT literature.  In one way or another, FFTW uses the Cooley-Tukey
315
algorithm, the Prime Factor algorithm, Rader's algorithm for prime
316
sizes, and the split-radix algorithm (with a variation due to Dan
317
Bernstein).  Our code generator also produces new algorithms that we do
318
not yet completely understand.
319
@cindex algorithm
320
The reader is referred to the cited papers for the appropriate
321
references.
322
 
323
The rest of this manual is organized as follows.  We first discuss the
324
sequential (one-processor) implementation.  We start by describing the
325
basic features of FFTW in @ref{Tutorial}.  This discussion includes the
326
storage scheme of multi-dimensional arrays (@ref{Multi-dimensional Array
327
Format}) and FFTW's mechanisms for storing plans on disk (@ref{Words of
328
Wisdom}).  Next, @ref{FFTW Reference} provides comprehensive
329
documentation of all FFTW's features.  Parallel transforms are discussed
330
in their own chapter @ref{Parallel FFTW}.  Fortran programmers can also
331
use FFTW, as described in @ref{Calling FFTW from Fortran}.
332
@ref{Installation and Customization} explains how to install FFTW in
333
your computer system and how to adapt FFTW to your needs.  License and
334
copyright information is given in @ref{License and Copyright}.  Finally,
335
we thank all the people who helped us in @ref{Acknowledgments}.
336
 
337
@c ************************************************************
338
@node  Tutorial, FFTW Reference, Introduction, Top
339
@chapter Tutorial
340
@cindex Tutorial
341
This chapter describes the basic usage of FFTW, i.e., how to compute the
342
Fourier transform of a single array.  This chapter tells the truth, but
343
not the @emph{whole} truth. Specifically, FFTW implements additional
344
routines and flags, providing extra functionality, that are not
345
documented here.  @xref{FFTW Reference}, for more complete information.
346
(Note that you need to compile and install FFTW before you can use it in
347
a program.  @xref{Installation and Customization}, for the details of
348
the installation.)
349
 
350
This tutorial chapter is structured as follows.  @ref{Complex
351
One-dimensional Transforms Tutorial} describes the basic usage of the
352
one-dimensional transform of complex data.  @ref{Complex
353
Multi-dimensional Transforms Tutorial} describes the basic usage of the
354
multi-dimensional transform of complex data.  @ref{Real One-dimensional
355
Transforms Tutorial} describes the one-dimensional transform of real
356
data and its inverse.  Finally, @ref{Real Multi-dimensional Transforms
357
Tutorial} describes the multi-dimensional transform of real data and its
358
inverse.  We recommend that you read these sections in the order that
359
they are presented.  We then discuss two topics in detail.  In
360
@ref{Multi-dimensional Array Format}, we discuss the various
361
alternatives for storing multi-dimensional arrays in memory.  @ref{Words
362
of Wisdom} shows how you can save FFTW's plans for future use.
363
 
364
@menu
365
* Complex One-dimensional Transforms Tutorial::  
366
* Complex Multi-dimensional Transforms Tutorial::  
367
* Real One-dimensional Transforms Tutorial::  
368
* Real Multi-dimensional Transforms Tutorial::  
369
* Multi-dimensional Array Format::  
370
* Words of Wisdom::            
371
@end menu
372
 
373
@node  Complex One-dimensional Transforms Tutorial, Complex Multi-dimensional Transforms Tutorial, Tutorial, Tutorial
374
@section Complex One-dimensional Transforms Tutorial
375
@cindex complex one-dimensional transform
376
@cindex complex transform
377
 
378
The basic usage of FFTW is simple.  A typical call to FFTW looks like:
379
 
380
@example
381
#include <fftw.h>
382
...
383
@{
384
     fftw_complex in[N], out[N];
385
     fftw_plan p;
386
     ...
387
     p = fftw_create_plan(N, FFTW_FORWARD, FFTW_ESTIMATE);
388
     ...
389
     fftw_one(p, in, out);
390
     ...
391
     fftw_destroy_plan(p);  
392
@}
393
@end example
394
 
395
The first thing we do is to create a @dfn{plan}, which is an object
396
@cindex plan
397
that contains all the data that FFTW needs to compute the FFT, using the
398
following function:
399
 
400
@example
401
fftw_plan fftw_create_plan(int n, fftw_direction dir, int flags);
402
@end example
403
@findex fftw_create_plan
404
@findex fftw_direction
405
@tindex fftw_plan
406
 
407
The first argument, @code{n}, is the size of the transform you are
408
trying to compute.  The size @code{n} can be any positive integer, but
409
sizes that are products of small factors are transformed most
410
efficiently.  The second argument, @code{dir}, can be either
411
@code{FFTW_FORWARD} or @code{FFTW_BACKWARD}, and indicates the direction
412
of the transform you
413
@ctindex FFTW_FORWARD
414
@ctindex FFTW_BACKWARD
415
are interested in.  Alternatively, you can use the sign of the exponent
416
in the transform, @math{-1} or @math{+1}, which corresponds to
417
@code{FFTW_FORWARD} or @code{FFTW_BACKWARD} respectively.  The
418
@code{flags} argument is either @code{FFTW_MEASURE} or
419
@cindex flags
420
@code{FFTW_ESTIMATE}.  @code{FFTW_MEASURE} means that FFTW actually runs
421
@ctindex FFTW_MEASURE
422
and measures the execution time of several FFTs in order to find the
423
best way to compute the transform of size @code{n}.  This may take some
424
time, depending on your installation and on the precision of the timer
425
in your machine.  @code{FFTW_ESTIMATE}, on the contrary, does not run
426
any computation, and just builds a
427
@ctindex FFTW_ESTIMATE
428
reasonable plan, which may be sub-optimal.  In other words, if your
429
program performs many transforms of the same size and initialization
430
time is not important, use @code{FFTW_MEASURE}; otherwise use the
431
estimate.  (A compromise between these two extremes exists.  @xref{Words
432
of Wisdom}.)
433
 
434
Once the plan has been created, you can use it as many times as you like
435
for transforms on arrays of the same size.  When you are done with the
436
plan, you deallocate it by calling @code{fftw_destroy_plan(plan)}.
437
@findex fftw_destroy_plan
438
 
439
The transform itself is computed by passing the plan along with the
440
input and output arrays to @code{fftw_one}:
441
 
442
@example
443
void fftw_one(fftw_plan plan, fftw_complex *in, fftw_complex *out);
444
@end example
445
@findex fftw_one
446
 
447
Note that the transform is out of place: @code{in} and @code{out} must
448
point to distinct arrays. It operates on data of type
449
@code{fftw_complex}, a data structure with real (@code{in[i].re}) and
450
imaginary (@code{in[i].im}) floating-point components.  The @code{in}
451
and @code{out} arrays should have the length specified when the plan was
452
created.  An alternative function, @code{fftw}, allows you to
453
efficiently perform multiple and/or strided transforms (@pxref{FFTW
454
Reference}).
455
@tindex fftw_complex
456
 
457
The DFT results are stored in-order in the array @code{out}, with the
458
zero-frequency (DC) component in @code{out[0]}.
459
@cindex frequency
460
The array @code{in} is not modified.  Users should note that FFTW
461
computes an unnormalized DFT, the sign of whose exponent is given by the
462
@code{dir} parameter of @code{fftw_create_plan}.  Thus, computing a
463
forward followed by a backward transform (or vice versa) results in the
464
original array scaled by @code{n}.  @xref{What FFTW Really Computes},
465
for the definition of DFT.
466
@cindex normalization
467
 
468
A program using FFTW should be linked with @code{-lfftw -lm} on Unix
469
systems, or with the FFTW and standard math libraries in general.
470
@cindex linking on Unix
471
 
472
@node Complex Multi-dimensional Transforms Tutorial, Real One-dimensional Transforms Tutorial, Complex One-dimensional Transforms Tutorial, Tutorial
473
@section Complex Multi-dimensional Transforms Tutorial
474
@cindex complex multi-dimensional transform
475
@cindex multi-dimensional transform
476
 
477
FFTW can also compute transforms of any number of dimensions
478
(@dfn{rank}).  The syntax is similar to that for the one-dimensional
479
@cindex rank
480
transforms, with @samp{fftw_} replaced by @samp{fftwnd_} (which stands
481
for ``@code{fftw} in @code{N} dimensions'').
482
 
483
As before, we @code{#include <fftw.h>} and create a plan for the
484
transforms, this time of type @code{fftwnd_plan}:
485
 
486
@example
487
fftwnd_plan fftwnd_create_plan(int rank, const int *n,
488
                               fftw_direction dir, int flags);
489
@end example
490
@tindex fftwnd_plan
491
@tindex fftw_direction
492
@findex fftwnd_create_plan
493
 
494
@code{rank} is the dimensionality of the array, and can be any
495
non-negative integer.  The next argument, @code{n}, is a pointer to an
496
integer array of length @code{rank} containing the (positive) sizes of
497
each dimension of the array.  (Note that the array will be stored in
498
row-major order. @xref{Multi-dimensional Array Format}, for information
499
on row-major order.)  The last two parameters are the same as in
500
@code{fftw_create_plan}.  We now, however, have an additional possible
501
flag, @code{FFTW_IN_PLACE}, since @code{fftwnd} supports true in-place
502
@cindex flags
503
@ctindex FFTW_IN_PLACE
504
@findex fftwnd
505
transforms.  Multiple flags are combined using a bitwise @dfn{or}
506
(@samp{|}).  (An @dfn{in-place} transform is one in which the output
507
data overwrite the input data.  It thus requires half as much memory
508
as---and is often faster than---its opposite, an @dfn{out-of-place}
509
transform.)
510
@cindex in-place transform
511
@cindex out-of-place transform
512
 
513
For two- and three-dimensional transforms, FFTWND provides alternative
514
routines that accept the sizes of each dimension directly, rather than
515
indirectly through a rank and an array of sizes.  These are otherwise
516
identical to @code{fftwnd_create_plan}, and are sometimes more
517
convenient:
518
 
519
@example
520
fftwnd_plan fftw2d_create_plan(int nx, int ny,
521
                               fftw_direction dir, int flags);
522
fftwnd_plan fftw3d_create_plan(int nx, int ny, int nz,
523
                               fftw_direction dir, int flags);
524
@end example
525
@findex fftw2d_create_plan
526
@findex fftw3d_create_plan
527
 
528
Once the plan has been created, you can use it any number of times for
529
transforms of the same size.  When you do not need a plan anymore, you
530
can deallocate the plan by calling @code{fftwnd_destroy_plan(plan)}.
531
@findex fftwnd_destroy_plan
532
 
533
Given a plan, you can compute the transform of an array of data by
534
calling:
535
 
536
@example
537
void fftwnd_one(fftwnd_plan plan, fftw_complex *in, fftw_complex *out);
538
@end example
539
@findex fftwnd_one
540
 
541
Here, @code{in} and @code{out} point to multi-dimensional arrays in
542
row-major order, of the size specified when the plan was created.  In
543
the case of an in-place transform, the @code{out} parameter is ignored
544
and the output data are stored in the @code{in} array.  The results are
545
stored in-order, unnormalized, with the zero-frequency component in
546
@code{out[0]}.
547
@cindex frequency
548
A forward followed by a backward transform (or vice-versa) yields the
549
original data multiplied by the size of the array (i.e. the product of
550
the dimensions).  @xref{What FFTWND Really Computes}, for a discussion
551
of what FFTWND computes.
552
@cindex normalization
553
 
554
For example, code to perform an in-place FFT of a three-dimensional
555
array might look like:
556
 
557
@example
558
#include <fftw.h>
559
...
560
@{
561
     fftw_complex in[L][M][N];
562
     fftwnd_plan p;
563
     ...
564
     p = fftw3d_create_plan(L, M, N, FFTW_FORWARD,
565
                            FFTW_MEASURE | FFTW_IN_PLACE);
566
     ...
567
     fftwnd_one(p, &in[0][0][0], NULL);
568
     ...
569
     fftwnd_destroy_plan(p);  
570
@}
571
@end example
572
 
573
Note that @code{in} is a statically-declared array, which is
574
automatically in row-major order, but we must take the address of the
575
first element in order to fit the type expected by @code{fftwnd_one}.
576
(@xref{Multi-dimensional Array Format}.)
577
 
578
@node Real One-dimensional Transforms Tutorial, Real Multi-dimensional Transforms Tutorial, Complex Multi-dimensional Transforms Tutorial, Tutorial
579
@section Real One-dimensional Transforms Tutorial
580
@cindex real transform
581
@cindex complex to real transform
582
@cindex RFFTW
583
 
584
If the input data are purely real, you can save roughly a factor of two
585
in both time and storage by using the @dfn{rfftw} transforms, which are
586
FFTs specialized for real data.  The output of a such a transform is a
587
@dfn{halfcomplex} array, which consists of only half of the complex DFT
588
amplitudes (since the negative-frequency amplitudes for real data are
589
the complex conjugate of the positive-frequency amplitudes).
590
@cindex halfcomplex array
591
 
592
In exchange for these speed and space advantages, the user sacrifices
593
some of the simplicity of FFTW's complex transforms.  First of all, to
594
allow maximum performance, the output format of the one-dimensional real
595
transforms is different from that used by the multi-dimensional
596
transforms.  Second, the inverse transform (halfcomplex to real) has the
597
side-effect of destroying its input array.  Neither of these
598
inconveniences should pose a serious problem for users, but it is
599
important to be aware of them.  (Both the inconvenient output format
600
and the side-effect of the inverse transform can be ameliorated for
601
one-dimensional transforms, at the expense of some performance, by using
602
instead the multi-dimensional transform routines with a rank of one.)
603
 
604
The computation of the plan is similar to that for the complex
605
transforms.  First, you @code{#include <rfftw.h>}.  Then, you create a
606
plan (of type @code{rfftw_plan}) by calling:
607
 
608
@example
609
rfftw_plan rfftw_create_plan(int n, fftw_direction dir, int flags);
610
@end example
611
@tindex rfftw_plan
612
@tindex fftw_direction
613
@findex rfftw_create_plan
614
 
615
@code{n} is the length of the @emph{real} array in the transform (even
616
for halfcomplex-to-real transforms), and can be any positive integer
617
(although sizes with small factors are transformed more efficiently).
618
@code{dir} is either @code{FFTW_REAL_TO_COMPLEX} or
619
@code{FFTW_COMPLEX_TO_REAL}.
620
@ctindex FFTW_REAL_TO_COMPLEX
621
@ctindex FFTW_COMPLEX_TO_REAL
622
The @code{flags} parameter is the same as in @code{fftw_create_plan}.
623
 
624
Once created, a plan can be used for any number of transforms, and is
625
deallocated when you are done with it by calling
626
@code{rfftw_destroy_plan(plan)}.
627
@findex rfftw_destroy_plan
628
 
629
Given a plan, a real-to-complex or complex-to-real transform is computed
630
by calling:
631
 
632
@example
633
void rfftw_one(rfftw_plan plan, fftw_real *in, fftw_real *out);
634
@end example
635
@findex rfftw_one
636
 
637
(Note that @code{fftw_real} is an alias for the floating-point type for
638
which FFTW was compiled.)  Depending upon the direction of the plan,
639
either the input or the output array is halfcomplex, and is stored in
640
the following format:
641
@cindex halfcomplex array
642
 
643
@iftex
644
@tex
645
$$
646
r_0, r_1, r_2, \ldots, r_{n/2}, i_{(n+1)/2-1}, \ldots, i_2, i_1
647
$$
648
@end tex
649
@end iftex
650
@ifinfo
651
r0, r1, r2, r(n/2), i((n+1)/2-1), ..., i2, i1
652
@end ifinfo
653
@ifhtml
654
<p align=center>
655
r<sub>0</sub>, r<sub>1</sub>, r<sub>2</sub>, ..., r<sub>n/2</sub>, i<sub>(n+1)/2-1</sub>, ..., i<sub>2</sub>, i<sub>1</sub>
656
</p>
657
@end ifhtml
658
 
659
Here,
660
@ifinfo
661
rk
662
@end ifinfo
663
@iftex
664
@tex
665
$r_k$
666
@end tex
667
@end iftex
668
@ifhtml
669
r<sub>k</sub>
670
@end ifhtml
671
is the real part of the @math{k}th output, and
672
@ifinfo
673
ik
674
@end ifinfo
675
@iftex
676
@tex
677
$i_k$
678
@end tex
679
@end iftex
680
@ifhtml
681
i<sub>k</sub>
682
@end ifhtml
683
is the imaginary part.  (We follow here the C convention that integer
684
division is rounded down, e.g. @math{7 / 2 = 3}.) For a halfcomplex
685
array @code{hc[]}, the @math{k}th component has its real part in
686
@code{hc[k]} and its imaginary part in @code{hc[n-k]}, with the
687
exception of @code{k} @code{==} @code{0} or @code{n/2} (the latter only
688
if n is even)---in these two cases, the imaginary part is zero due to
689
symmetries of the real-complex transform, and is not stored.  Thus, the
690
transform of @code{n} real values is a halfcomplex array of length
691
@code{n}, and vice versa.  @footnote{The output for the
692
multi-dimensional rfftw is a more-conventional array of
693
@code{fftw_complex} values, but the format here permitted us greater
694
efficiency in one dimension.}  This is actually only half of the DFT
695
spectrum of the data.  Although the other half can be obtained by
696
complex conjugation, it is not required by many applications such as
697
convolution and filtering.
698
 
699
Like the complex transforms, the RFFTW transforms are unnormalized, so a
700
forward followed by a backward transform (or vice-versa) yields the
701
original data scaled by the length of the array, @code{n}.
702
@cindex normalization
703
 
704
Let us reiterate here our warning that an @code{FFTW_COMPLEX_TO_REAL}
705
transform has the side-effect of destroying its (halfcomplex) input.
706
The @code{FFTW_REAL_TO_COMPLEX} transform, however, leaves its (real)
707
input untouched, just as you would hope.
708
 
709
As an example, here is an outline of how you might use RFFTW to compute
710
the power spectrum of a real array (i.e. the squares of the absolute
711
values of the DFT amplitudes):
712
@cindex power spectrum
713
 
714
@example
715
#include <rfftw.h>
716
...
717
@{
718
     fftw_real in[N], out[N], power_spectrum[N/2+1];
719
     rfftw_plan p;
720
     int k;
721
     ...
722
     p = rfftw_create_plan(N, FFTW_REAL_TO_COMPLEX, FFTW_ESTIMATE);
723
     ...
724
     rfftw_one(p, in, out);
725
     power_spectrum[0] = out[0]*out[0];  /* DC component */
726
     for (k = 1; k < (N+1)/2; ++k)  /* (k < N/2 rounded up) */
727
          power_spectrum[k] = out[k]*out[k] + out[N-k]*out[N-k];
728
     if (N % 2 == 0) /* N is even */
729
          power_spectrum[N/2] = out[N/2]*out[N/2];  /* Nyquist freq. */
730
     ...
731
     rfftw_destroy_plan(p);
732
@}
733
@end example
734
 
735
Programs using RFFTW should link with @code{-lrfftw -lfftw -lm} on Unix,
736
or with the FFTW, RFFTW, and math libraries in general.
737
@cindex linking on Unix
738
 
739
@node Real Multi-dimensional Transforms Tutorial, Multi-dimensional Array Format, Real One-dimensional Transforms Tutorial, Tutorial
740
@section Real Multi-dimensional Transforms Tutorial
741
@cindex real multi-dimensional transform
742
 
743
FFTW includes multi-dimensional transforms for real data of any rank.
744
As with the one-dimensional real transforms, they save roughly a factor
745
of two in time and storage over complex transforms of the same size.
746
Also as in one dimension, these gains come at the expense of some
747
increase in complexity---the output format is different from the
748
one-dimensional RFFTW (and is more similar to that of the complex FFTW)
749
and the inverse (complex to real) transforms have the side-effect of
750
overwriting their input data.  
751
 
752
To use the real multi-dimensional transforms, you first @code{#include
753
<rfftw.h>} and then create a plan for the size and direction of
754
transform that you are interested in:
755
 
756
@example
757
rfftwnd_plan rfftwnd_create_plan(int rank, const int *n,
758
                                 fftw_direction dir, int flags);
759
@end example
760
@tindex rfftwnd_plan
761
@findex rfftwnd_create_plan
762
 
763
The first two parameters describe the size of the real data (not the
764
halfcomplex data, which will have different dimensions).  The last two
765
parameters are the same as those for @code{rfftw_create_plan}.  Just as
766
for fftwnd, there are two alternate versions of this routine,
767
@code{rfftw2d_create_plan} and @code{rfftw3d_create_plan}, that are
768
sometimes more convenient for two- and three-dimensional transforms.
769
@findex rfftw2d_create_plan
770
@findex rfftw3d_create_plan
771
Also as in fftwnd, rfftwnd supports true in-place transforms, specified
772
by including @code{FFTW_IN_PLACE} in the flags.
773
 
774
Once created, a plan can be used for any number of transforms, and is
775
deallocated by calling @code{rfftwnd_destroy_plan(plan)}.
776
 
777
Given a plan, the transform is computed by calling one of the following
778
two routines:
779
 
780
@example
781
void rfftwnd_one_real_to_complex(rfftwnd_plan plan,
782
                                 fftw_real *in, fftw_complex *out);
783
void rfftwnd_one_complex_to_real(rfftwnd_plan plan,
784
                                 fftw_complex *in, fftw_real *out);
785
@end example
786
@findex rfftwnd_one_real_to_complex
787
@findex rfftwnd_one_complex_to_real
788
 
789
As is clear from their names and parameter types, the former function is
790
for @code{FFTW_REAL_TO_COMPLEX} transforms and the latter is for
791
@code{FFTW_COMPLEX_TO_REAL} transforms.  (We could have used only a
792
single routine, since the direction of the transform is encoded in the
793
plan, but we wanted to correctly express the datatypes of the
794
parameters.)  The latter routine, as we discuss elsewhere, has the
795
side-effect of overwriting its input (except when the rank of the array
796
is one).  In both cases, the @code{out} parameter is ignored for
797
in-place transforms.
798
 
799
The format of the complex arrays deserves careful attention.
800
@cindex rfftwnd array format
801
Suppose that the real data has dimensions
802
@iftex
803
@tex
804
$n_1 \times n_2 \times \cdots \times n_d$
805
@end tex
806
@end iftex
807
@ifinfo
808
n1 x n2 x ... x nd
809
@end ifinfo
810
@ifhtml
811
n<sub>1</sub> x n<sub>2</sub> x ... x n<sub>d</sub>
812
@end ifhtml
813
(in row-major order).  Then, after a real-to-complex transform, the
814
output is an
815
@iftex
816
@tex
817
$n_1 \times n_2 \times \cdots \times (n_d/2+1)$
818
@end tex
819
@end iftex
820
@ifinfo
821
n1 x n2 x ... x (nd/2+1)
822
@end ifinfo
823
@ifhtml
824
n<sub>1</sub> x n<sub>2</sub> x ... x (n<sub>d</sub>/2+1)
825
@end ifhtml
826
array of @code{fftw_complex} values in row-major order, corresponding to
827
slightly over half of the output of the corresponding complex transform.
828
(Note that the division is rounded down.)  The ordering of the data is
829
otherwise exactly the same as in the complex case.  (In principle, the
830
output could be exactly half the size of the complex transform output,
831
but in more than one dimension this requires too complicated a format to
832
be practical.)  Note that, unlike the one-dimensional RFFTW, the real
833
and imaginary parts of the DFT amplitudes are here stored together in
834
the natural way.
835
 
836
Since the complex data is slightly larger than the real data, some
837
complications arise for in-place transforms.  In this case, the final
838
dimension of the real data must be padded with extra values to
839
accommodate the size of the complex data---two extra if the last
840
dimension is even and one if it is odd.
841
@cindex padding
842
That is, the last dimension of the real data must physically contain
843
@iftex
844
@tex
845
$2 (n_d/2+1)$
846
@end tex
847
@end iftex
848
@ifinfo
849
2 * (nd/2+1)
850
@end ifinfo
851
@ifhtml
852
2 * (n<sub>d</sub>/2+1)
853
@end ifhtml
854
@code{fftw_real} values (exactly enough to hold the complex data).
855
This physical array size does not, however, change the @emph{logical}
856
array size---only
857
@iftex
858
@tex
859
$n_d$
860
@end tex
861
@end iftex
862
@ifinfo
863
nd
864
@end ifinfo
865
@ifhtml
866
n<sub>d</sub>
867
@end ifhtml
868
values are actually stored in the last dimension, and
869
@iftex
870
@tex
871
$n_d$
872
@end tex
873
@end iftex
874
@ifinfo
875
nd
876
@end ifinfo
877
@ifhtml
878
n<sub>d</sub>
879
@end ifhtml
880
is the last dimension passed to @code{rfftwnd_create_plan}.
881
 
882
For example, consider the transform of a two-dimensional real array of
883
size @code{nx} by @code{ny}.  The output of the @code{rfftwnd} transform
884
is a two-dimensional real array of size @code{nx} by @code{ny/2+1},
885
where the @code{y} dimension has been cut nearly in half because of
886
redundancies in the output.  Because @code{fftw_complex} is twice the
887
size of @code{fftw_real}, the output array is slightly bigger than the
888
input array.  Thus, if we want to compute the transform in place, we
889
must @emph{pad} the input array so that it is of size @code{nx} by
890
@code{2*(ny/2+1)}.  If @code{ny} is even, then there are two padding
891
elements at the end of each row (which need not be initialized, as they
892
are only used for output).
893
@ifhtml
894
The following illustration depicts the input and output arrays just
895
described, for both the out-of-place and in-place transforms (with the
896
arrows indicating consecutive memory locations):
897
 
898
<p align=center><img src="rfftwnd.gif" width=389 height=583>
899
@end ifhtml
900
 
901
@iftex
902
Figure 1 depicts the input and output arrays just
903
described, for both the out-of-place and in-place transforms (with the
904
arrows indicating consecutive memory locations).
905
 
906
@tex
907
{
908
\pageinsert
909
\vfill
910
\vskip405pt
911
\hskip40pt
912
\special{psfile="rfftwnd.eps"
913
}
914
\vskip 24pt
915
Figure 1: Illustration of the data layout for real to complex
916
transforms.  
917
\vfill
918
\endinsert}
919
@end tex
920
@end iftex
921
 
922
The RFFTWND transforms are unnormalized, so a forward followed by a
923
backward transform will result in the original data scaled by the number
924
of real data elements---that is, the product of the (logical) dimensions
925
of the real data.
926
@cindex normalization
927
 
928
Below, we illustrate the use of RFFTWND by showing how you might use it
929
to compute the (cyclic) convolution of two-dimensional real arrays
930
@code{a} and @code{b} (using the identity that a convolution corresponds
931
to a pointwise product of the Fourier transforms).  For variety,
932
in-place transforms are used for the forward FFTs and an out-of-place
933
transform is used for the inverse transform.
934
@cindex convolution
935
@cindex cyclic convolution
936
 
937
@example
938
#include <rfftw.h>
939
...
940
@{
941
     fftw_real a[M][2*(N/2+1)], b[M][2*(N/2+1)], c[M][N];
942
     fftw_complex *A, *B, C[M][N/2+1];
943
     rfftwnd_plan p, pinv;
944
     fftw_real scale = 1.0 / (M * N);
945
     int i, j;
946
     ...
947
     p    = rfftw2d_create_plan(M, N, FFTW_REAL_TO_COMPLEX,
948
                                FFTW_ESTIMATE | FFTW_IN_PLACE);
949
     pinv = rfftw2d_create_plan(M, N, FFTW_COMPLEX_TO_REAL,
950
                                FFTW_ESTIMATE);
951
 
952
     /* aliases for accessing complex transform outputs: */
953
     A = (fftw_complex*) &a[0][0];
954
     B = (fftw_complex*) &b[0][0];
955
     ...
956
     for (i = 0; i < M; ++i)
957
          for (j = 0; j < N; ++j) @{
958
               a[i][j] = ... ;
959
               b[i][j] = ... ;
960
          @}
961
     ...
962
     rfftwnd_one_real_to_complex(p, &a[0][0], NULL);
963
     rfftwnd_one_real_to_complex(p, &b[0][0], NULL);
964
 
965
     for (i = 0; i < M; ++i)
966
          for (j = 0; j < N/2+1; ++j) @{
967
               int ij = i*(N/2+1) + j;
968
               C[i][j].re = (A[ij].re * B[ij].re
969
                             - A[ij].im * B[ij].im) * scale;
970
               C[i][j].im = (A[ij].re * B[ij].im
971
                             + A[ij].im * B[ij].re) * scale;
972
          @}
973
 
974
     /* inverse transform to get c, the convolution of a and b;
975
        this has the side effect of overwriting C */
976
     rfftwnd_one_complex_to_real(pinv, &C[0][0], &c[0][0]);
977
     ...
978
     rfftwnd_destroy_plan(p);
979
     rfftwnd_destroy_plan(pinv);
980
@}
981
@end example
982
 
983
We access the complex outputs of the in-place transforms by casting
984
each real array to a @code{fftw_complex} pointer.  Because this is a
985
``flat'' pointer, we have to compute the row-major index @code{ij}
986
explicitly in the convolution product loop.
987
@cindex row-major
988
In order to normalize the convolution, we must multiply by a scale
989
factor---we can do so either before or after the inverse transform, and
990
choose the former because it obviates the necessity of an additional
991
loop.
992
@cindex normalization
993
Notice the limits of the loops and the dimensions of the various arrays.
994
 
995
As with the one-dimensional RFFTW, an out-of-place
996
@code{FFTW_COMPLEX_TO_REAL} transform has the side-effect of overwriting
997
its input array.  (The real-to-complex transform, on the other hand,
998
leaves its input array untouched.)  If you use RFFTWND for a rank-one
999
transform, however, this side-effect does not occur.  Because of this
1000
fact (and the simpler output format), users may find the RFFTWND
1001
interface more convenient than RFFTW for one-dimensional transforms.
1002
However, RFFTWND in one dimension is slightly slower than RFFTW because
1003
RFFTWND uses an extra buffer array internally.
1004
 
1005
@c ------------------------------------------------------------
1006
@node Multi-dimensional Array Format, Words of Wisdom, Real Multi-dimensional Transforms Tutorial, Tutorial
1007
@section Multi-dimensional Array Format
1008
 
1009
This section describes the format in which multi-dimensional arrays are
1010
stored.  We felt that a detailed discussion of this topic was necessary,
1011
since it is often a source of confusion among users and several
1012
different formats are common.  Although the comments below refer to
1013
@code{fftwnd}, they are also applicable to the @code{rfftwnd} routines.
1014
 
1015
@menu
1016
* Row-major Format::            
1017
* Column-major Format::        
1018
* Static Arrays in C::          
1019
* Dynamic Arrays in C::        
1020
* Dynamic Arrays in C-The Wrong Way::  
1021
@end menu
1022
 
1023
@node Row-major Format, Column-major Format, Multi-dimensional Array Format, Multi-dimensional Array Format
1024
@subsection Row-major Format
1025
@cindex row-major
1026
 
1027
The multi-dimensional arrays passed to @code{fftwnd} are expected to be
1028
stored as a single contiguous block in @dfn{row-major} order (sometimes
1029
called ``C order'').  Basically, this means that as you step through
1030
adjacent memory locations, the first dimension's index varies most
1031
slowly and the last dimension's index varies most quickly.
1032
 
1033
To be more explicit, let us consider an array of rank @math{d} whose
1034
dimensions are
1035
@iftex
1036
@tex
1037
$n_1 \times n_2 \times n_3 \times \cdots \times n_d$.
1038
@end tex
1039
@end iftex
1040
@ifinfo
1041
n1 x n2 x n3 x ... x nd.
1042
@end ifinfo
1043
@ifhtml
1044
n<sub>1</sub> x n<sub>2</sub> x n<sub>3</sub> x ... x n<sub>d</sub>.
1045
@end ifhtml
1046
Now, we specify a location in the array by a sequence of (zero-based) indices,
1047
one for each dimension:
1048
@iftex
1049
@tex
1050
$(i_1, i_2, i_3, \ldots, i_d)$.
1051
@end tex
1052
@end iftex
1053
@ifinfo
1054
(i1, i2, ..., id).
1055
@end ifinfo
1056
@ifhtml
1057
(i<sub>1</sub>, i<sub>2</sub>, i<sub>3</sub>,..., i<sub>d</sub>).
1058
@end ifhtml
1059
If the array is stored in row-major
1060
order, then this element is located at the position
1061
@iftex
1062
@tex
1063
$i_d + n_d (i_{d-1} + n_{d-1} (\ldots + n_2 i_1))$.
1064
@end tex
1065
@end iftex
1066
@ifinfo
1067
id + nd * (id-1 + nd-1 * (... + n2 * i1)).
1068
@end ifinfo
1069
@ifhtml
1070
i<sub>d</sub> + n<sub>d</sub> * (i<sub>d-1</sub> + n<sub>d-1</sub> * (... + n<sub>2</sub> * i<sub>1</sub>)).
1071
@end ifhtml
1072
 
1073
Note that each element of the array must be of type @code{fftw_complex};
1074
i.e. a (real, imaginary) pair of (double-precision) numbers. Note also
1075
that, in @code{fftwnd}, the expression above is multiplied by the stride
1076
to get the actual array index---this is useful in situations where each
1077
element of the multi-dimensional array is actually a data structure or
1078
another array, and you just want to transform a single field. In most
1079
cases, however, you use a stride of 1.
1080
@cindex stride
1081
 
1082
@node Column-major Format, Static Arrays in C, Row-major Format, Multi-dimensional Array Format
1083
@subsection Column-major Format
1084
@cindex column-major
1085
 
1086
Readers from the Fortran world are used to arrays stored in
1087
@dfn{column-major} order (sometimes called ``Fortran order'').  This is
1088
essentially the exact opposite of row-major order in that, here, the
1089
@emph{first} dimension's index varies most quickly.
1090
 
1091
If you have an array stored in column-major order and wish to transform
1092
it using @code{fftwnd}, it is quite easy to do.  When creating the plan,
1093
simply pass the dimensions of the array to @code{fftwnd_create_plan} in
1094
@emph{reverse order}.  For example, if your array is a rank three
1095
@code{N x M x L} matrix in column-major order, you should pass the
1096
dimensions of the array as if it were an @code{L x M x N} matrix (which
1097
it is, from perspective of @code{fftwnd}).  This is done for you
1098
automatically by the FFTW Fortran wrapper routines (@pxref{Calling FFTW
1099
from Fortran}).
1100
@cindex Fortran-callable wrappers
1101
 
1102
@node Static Arrays in C, Dynamic Arrays in C, Column-major Format, Multi-dimensional Array Format
1103
@subsection Static Arrays in C
1104
@cindex C multi-dimensional arrays
1105
 
1106
Multi-dimensional arrays declared statically (that is, at compile time,
1107
not necessarily with the @code{static} keyword) in C are @emph{already}
1108
in row-major order.  You don't have to do anything special to transform
1109
them.  (@xref{Complex Multi-dimensional Transforms Tutorial}, for an
1110
example of this sort of code.)
1111
 
1112
@node Dynamic Arrays in C, Dynamic Arrays in C-The Wrong Way, Static Arrays in C, Multi-dimensional Array Format
1113
@subsection Dynamic Arrays in C
1114
 
1115
Often, especially for large arrays, it is desirable to allocate the
1116
arrays dynamically, at runtime.  This isn't too hard to do, although it
1117
is not as straightforward for multi-dimensional arrays as it is for
1118
one-dimensional arrays.
1119
 
1120
Creating the array is simple: using a dynamic-allocation routine like
1121
@code{malloc}, allocate an array big enough to store N @code{fftw_complex}
1122
values, where N is the product of the sizes of the array dimensions
1123
(i.e. the total number of complex values in the array).  For example,
1124
here is code to allocate a 5x12x27 rank 3 array:
1125
@ffindex malloc
1126
 
1127
@example
1128
fftw_complex *an_array;
1129
 
1130
an_array = (fftw_complex *) malloc(5 * 12 * 27 * sizeof(fftw_complex));
1131
@end example
1132
 
1133
Accessing the array elements, however, is more tricky---you can't simply
1134
use multiple applications of the @samp{[]} operator like you could for
1135
static arrays.  Instead, you have to explicitly compute the offset into
1136
the array using the formula given earlier for row-major arrays.  For
1137
example, to reference the @math{(i,j,k)}-th element of the array
1138
allocated above, you would use the expression @code{an_array[k + 27 * (j
1139
+ 12 * i)]}.
1140
 
1141
This pain can be alleviated somewhat by defining appropriate macros, or,
1142
in C++, creating a class and overloading the @samp{()} operator.
1143
 
1144
@node Dynamic Arrays in C-The Wrong Way,  , Dynamic Arrays in C, Multi-dimensional Array Format
1145
@subsection Dynamic Arrays in C---The Wrong Way
1146
 
1147
A different method for allocating multi-dimensional arrays in C is often
1148
suggested that is incompatible with @code{fftwnd}: @emph{using it will
1149
cause FFTW to die a painful death}.  We discuss the technique here,
1150
however, because it is so commonly known and used.  This method is to
1151
create arrays of pointers of arrays of pointers of @dots{}etcetera.  For
1152
example, the analogue in this method to the example above is:
1153
 
1154
@example
1155
int i,j;
1156
fftw_complex ***a_bad_array;  /* another way to make a 5x12x27 array */
1157
 
1158
a_bad_array = (fftw_complex ***) malloc(5 * sizeof(fftw_complex **));
1159
for (i = 0; i < 5; ++i) @{
1160
     a_bad_array[i] =
1161
        (fftw_complex **) malloc(12 * sizeof(fftw_complex *));
1162
     for (j = 0; j < 12; ++j)
1163
          a_bad_array[i][j] =
1164
                (fftw_complex *) malloc(27 * sizeof(fftw_complex));
1165
@}
1166
@end example
1167
 
1168
As you can see, this sort of array is inconvenient to allocate (and
1169
deallocate).  On the other hand, it has the advantage that the
1170
@math{(i,j,k)}-th element can be referenced simply by
1171
@code{a_bad_array[i][j][k]}.
1172
 
1173
If you like this technique and want to maximize convenience in accessing
1174
the array, but still want to pass the array to FFTW, you can use a
1175
hybrid method.  Allocate the array as one contiguous block, but also
1176
declare an array of arrays of pointers that point to appropriate places
1177
in the block.  That sort of trick is beyond the scope of this
1178
documentation; for more information on multi-dimensional arrays in C,
1179
see the @code{comp.lang.c}
1180
@uref{http://www.eskimo.com/~scs/C-faq/s6.html, FAQ}.
1181
 
1182
@c ------------------------------------------------------------
1183
@node Words of Wisdom,  , Multi-dimensional Array Format, Tutorial
1184
@section Words of Wisdom
1185
@cindex wisdom
1186
@cindex saving plans to disk
1187
 
1188
FFTW implements a method for saving plans to disk and restoring them.
1189
In fact, what FFTW does is more general than just saving and loading
1190
plans.  The mechanism is called @dfn{@code{wisdom}}.  Here, we describe
1191
this feature at a high level. @xref{FFTW Reference}, for a less casual
1192
(but more complete) discussion of how to use @code{wisdom} in FFTW.
1193
 
1194
Plans created with the @code{FFTW_MEASURE} option produce near-optimal
1195
FFT performance, but it can take a long time to compute a plan because
1196
FFTW must actually measure the runtime of many possible plans and select
1197
the best one.  This is designed for the situations where so many
1198
transforms of the same size must be computed that the start-up time is
1199
irrelevant.  For short initialization times but slightly slower
1200
transforms, we have provided @code{FFTW_ESTIMATE}.  The @code{wisdom}
1201
mechanism is a way to get the best of both worlds.  There are, however,
1202
certain caveats that the user must be aware of in using @code{wisdom}.
1203
For this reason, @code{wisdom} is an optional feature which is not
1204
enabled by default.
1205
 
1206
At its simplest, @code{wisdom} provides a way of saving plans to disk so
1207
that they can be reused in other program runs.  You create a plan with
1208
the flags @code{FFTW_MEASURE} and @code{FFTW_USE_WISDOM}, and then save
1209
the @code{wisdom} using @code{fftw_export_wisdom}:
1210
@ctindex FFTW_USE_WISDOM
1211
 
1212
@example
1213
     plan = fftw_create_plan(..., ... | FFTW_MEASURE | FFTW_USE_WISDOM);
1214
     fftw_export_wisdom(...);
1215
@end example
1216
@findex fftw_export_wisdom
1217
 
1218
The next time you run the program, you can restore the @code{wisdom}
1219
with @code{fftw_import_wisdom}, and then recreate the plan using the
1220
same flags as before.  This time, however, the same optimal plan will be
1221
created very quickly without measurements. (FFTW still needs some time
1222
to compute trigonometric tables, however.)  The basic outline is:
1223
 
1224
@example
1225
     fftw_import_wisdom(...);
1226
     plan = fftw_create_plan(..., ... | FFTW_USE_WISDOM);
1227
@end example
1228
@findex fftw_import_wisdom
1229
 
1230
Wisdom is more than mere rote memorization, however.  FFTW's
1231
@code{wisdom} encompasses all of the knowledge and measurements that
1232
were used to create the plan for a given size.  Therefore, existing
1233
@code{wisdom} is also applied to the creation of other plans of
1234
different sizes.
1235
 
1236
Whenever a plan is created with the @code{FFTW_MEASURE} and
1237
@code{FFTW_USE_WISDOM} flags, @code{wisdom} is generated.  Thereafter,
1238
plans for any transform with a similar factorization will be computed
1239
more quickly, so long as they use the @code{FFTW_USE_WISDOM} flag.  In
1240
fact, for transforms with the same factors and of equal or lesser size,
1241
no measurements at all need to be made and an optimal plan can be
1242
created with negligible delay!
1243
 
1244
For example, suppose that you create a plan for
1245
@iftex
1246
@tex
1247
$N = 2^{16}$.
1248
@end tex
1249
@end iftex
1250
@ifinfo
1251
N = 2^16.
1252
@end ifinfo
1253
@ifhtml
1254
N&nbsp;=&nbsp;2<sup>16</sup>.
1255
@end ifhtml
1256
Then, for any equal or smaller power of two, FFTW can create a
1257
plan (with the same direction and flags) quickly, using the
1258
precomputed @code{wisdom}. Even for larger powers of two, or sizes that
1259
are a power of two times some other prime factors, plans will be
1260
computed more quickly than they would otherwise (although some
1261
measurements still have to be made).
1262
 
1263
The @code{wisdom} is cumulative, and is stored in a global, private data
1264
structure managed internally by FFTW.  The storage space required is
1265
minimal, proportional to the logarithm of the sizes the @code{wisdom} was
1266
generated from.  The @code{wisdom} can be forgotten (and its associated
1267
memory freed) by a call to @code{fftw_forget_wisdom()}; otherwise, it is
1268
remembered until the program terminates.  It can also be exported to a
1269
file, a string, or any other medium using @code{fftw_export_wisdom} and
1270
restored during a subsequent execution of the program (or a different
1271
program) using @code{fftw_import_wisdom} (these functions are described
1272
below).
1273
 
1274
Because @code{wisdom} is incorporated into FFTW at a very low level, the
1275
same @code{wisdom} can be used for one-dimensional transforms,
1276
multi-dimensional transforms, and even the parallel extensions to FFTW.
1277
Just include @code{FFTW_USE_WISDOM} in the flags for whatever plans you
1278
create (i.e., always plan wisely).
1279
 
1280
Plans created with the @code{FFTW_ESTIMATE} plan can use @code{wisdom},
1281
but cannot generate it;  only @code{FFTW_MEASURE} plans actually produce
1282
@code{wisdom}.  Also, plans can only use @code{wisdom} generated from
1283
plans created with the same direction and flags.  For example, a size
1284
@code{42} @code{FFTW_BACKWARD} transform will not use @code{wisdom}
1285
produced by a size @code{42} @code{FFTW_FORWARD} transform.  The only
1286
exception to this rule is that @code{FFTW_ESTIMATE} plans can use
1287
@code{wisdom} from @code{FFTW_MEASURE} plans.
1288
 
1289
@menu
1290
* Caveats in Using Wisdom::     What you should worry about in using wisdom
1291
* Importing and Exporting Wisdom::  I/O of wisdom to disk and other media
1292
@end menu
1293
 
1294
@node Caveats in Using Wisdom, Importing and Exporting Wisdom, Words of Wisdom, Words of Wisdom
1295
@subsection Caveats in Using Wisdom
1296
@cindex wisdom, problems with
1297
 
1298
@quotation
1299
@ifhtml
1300
<i>
1301
@end ifhtml
1302
For in much wisdom is much grief, and he that increaseth knowledge
1303
increaseth sorrow.
1304
@ifhtml
1305
</i>
1306
@end ifhtml
1307
[Ecclesiastes 1:18]
1308
@cindex Ecclesiastes
1309
@end quotation
1310
 
1311
There are pitfalls to using @code{wisdom}, in that it can negate FFTW's
1312
ability to adapt to changing hardware and other conditions. For example,
1313
it would be perfectly possible to export @code{wisdom} from a program
1314
running on one processor and import it into a program running on another
1315
processor.  Doing so, however, would mean that the second program would
1316
use plans optimized for the first processor, instead of the one it is
1317
running on.
1318
 
1319
It should be safe to reuse @code{wisdom} as long as the hardware and
1320
program binaries remain unchanged. (Actually, the optimal plan may
1321
change even between runs of the same binary on identical hardware, due
1322
to differences in the virtual memory environment, etcetera.  Users
1323
seriously interested in performance should worry about this problem,
1324
too.)  It is likely that, if the same @code{wisdom} is used for two
1325
different program binaries, even running on the same machine, the plans
1326
may be sub-optimal because of differing code alignments.  It is
1327
therefore wise to recreate @code{wisdom} every time an application is
1328
recompiled.  The more the underlying hardware and software changes
1329
between the creation of @code{wisdom} and its use, the greater grows the
1330
risk of sub-optimal plans.
1331
 
1332
@node Importing and Exporting Wisdom,  , Caveats in Using Wisdom, Words of Wisdom
1333
@subsection Importing and Exporting Wisdom
1334
@cindex wisdom, import and export
1335
 
1336
@example
1337
void fftw_export_wisdom_to_file(FILE *output_file);
1338
fftw_status fftw_import_wisdom_from_file(FILE *input_file);
1339
@end example
1340
@findex fftw_export_wisdom_to_file
1341
@findex fftw_import_wisdom_from_file
1342
 
1343
@code{fftw_export_wisdom_to_file} writes the @code{wisdom} to
1344
@code{output_file}, which must be a file open for
1345
writing. @code{fftw_import_wisdom_from_file} reads the @code{wisdom}
1346
from @code{input_file}, which must be a file open for reading, and
1347
returns @code{FFTW_SUCCESS} if successful and @code{FFTW_FAILURE}
1348
otherwise. In both cases, the file is left open and must be closed by
1349
the caller.  It is perfectly fine if other data lie before or after the
1350
@code{wisdom} in the file, as long as the file is positioned at the
1351
beginning of the @code{wisdom} data before import.
1352
 
1353
@example
1354
char *fftw_export_wisdom_to_string(void);
1355
fftw_status fftw_import_wisdom_from_string(const char *input_string)
1356
@end example
1357
@findex fftw_export_wisdom_to_string
1358
@findex fftw_import_wisdom_from_string
1359
 
1360
@code{fftw_export_wisdom_to_string} allocates a string, exports the
1361
@code{wisdom} to it in @code{NULL}-terminated format, and returns a
1362
pointer to the string.  If there is an error in allocating or writing
1363
the data, it returns @code{NULL}.  The caller is responsible for
1364
deallocating the string (with @code{fftw_free}) when she is done with
1365
it. @code{fftw_import_wisdom_from_string} imports the @code{wisdom} from
1366
@code{input_string}, returning @code{FFTW_SUCCESS} if successful and
1367
@code{FFTW_FAILURE} otherwise.
1368
 
1369
Exporting @code{wisdom} does not affect the store of @code{wisdom}. Imported
1370
@code{wisdom} supplements the current store rather than replacing it
1371
(except when there is conflicting @code{wisdom}, in which case the older
1372
@code{wisdom} is discarded). The format of the exported @code{wisdom} is
1373
``nerd-readable'' LISP-like ASCII text; we will not document it here
1374
except to note that it is insensitive to white space (interested users
1375
can contact us for more details).
1376
@cindex LISP
1377
@cindex nerd-readable text
1378
 
1379
@xref{FFTW Reference}, for more information, and for a description of
1380
how you can implement @code{wisdom} import/export for other media
1381
besides files and strings.
1382
 
1383
The following is a brief example in which the @code{wisdom} is read from
1384
a file, a plan is created (possibly generating more @code{wisdom}), and
1385
then the @code{wisdom} is exported to a string and printed to
1386
@code{stdout}.
1387
 
1388
@example
1389
@{
1390
     fftw_plan plan;
1391
     char *wisdom_string;
1392
     FILE *input_file;
1393
 
1394
     /* open file to read wisdom from */
1395
     input_file = fopen("sample.wisdom", "r");
1396
     if (FFTW_FAILURE == fftw_import_wisdom_from_file(input_file))
1397
          printf("Error reading wisdom!\n");
1398
     fclose(input_file); /* be sure to close the file! */
1399
 
1400
     /* create a plan for N=64, possibly creating and/or using wisdom */
1401
     plan = fftw_create_plan(64,FFTW_FORWARD,
1402
                             FFTW_MEASURE | FFTW_USE_WISDOM);
1403
 
1404
     /* ... do some computations with the plan ... */
1405
 
1406
     /* always destroy plans when you are done */
1407
     fftw_destroy_plan(plan);
1408
 
1409
     /* write the wisdom to a string */
1410
     wisdom_string = fftw_export_wisdom_to_string();
1411
     if (wisdom_string != NULL) @{
1412
          printf("Accumulated wisdom: %s\n",wisdom_string);
1413
 
1414
          /* Just for fun, destroy and restore the wisdom */
1415
          fftw_forget_wisdom(); /* all gone! */
1416
          fftw_import_wisdom_from_string(wisdom_string);
1417
          /* wisdom is back! */
1418
 
1419
          fftw_free(wisdom_string); /* deallocate it since we're done */
1420
     @}
1421
@}
1422
@end example
1423
 
1424
@c ************************************************************
1425
@node FFTW Reference, Parallel FFTW, Tutorial, Top
1426
@chapter FFTW Reference
1427
 
1428
This chapter provides a complete reference for all sequential (i.e.,
1429
one-processor) FFTW functions.  We first define the data types upon
1430
which FFTW operates, that is, real, complex, and ``halfcomplex'' numbers
1431
(@pxref{Data Types}).  Then, in four sections, we explain the FFTW
1432
program interface for complex one-dimensional transforms
1433
(@pxref{One-dimensional Transforms Reference}), complex
1434
multi-dimensional transforms (@pxref{Multi-dimensional Transforms
1435
Reference}), and real one-dimensional transforms (@pxref{Real
1436
One-dimensional Transforms Reference}), real multi-dimensional
1437
transforms (@pxref{Real Multi-dimensional Transforms Reference}).
1438
@ref{Wisdom Reference} describes the @code{wisdom} mechanism for
1439
exporting and importing plans.  Finally, @ref{Memory Allocator
1440
Reference} describes how to change FFTW's default memory allocator.
1441
For parallel transforms, @xref{Parallel FFTW}.
1442
 
1443
@menu
1444
* Data Types::                  real, complex, and halfcomplex numbers
1445
* One-dimensional Transforms Reference::  
1446
* Multi-dimensional Transforms Reference::  
1447
* Real One-dimensional Transforms Reference::  
1448
* Real Multi-dimensional Transforms Reference::  
1449
* Wisdom Reference::            
1450
* Memory Allocator Reference::  
1451
* Thread safety::              
1452
@end menu
1453
 
1454
@c -------------------------------------------------------
1455
@node Data Types, One-dimensional Transforms Reference, FFTW Reference, FFTW Reference
1456
@section Data Types
1457
@cindex real number
1458
@cindex complex number
1459
@cindex halfcomplex array
1460
 
1461
The routines in the FFTW package use three main kinds of data types.
1462
@dfn{Real} and @dfn{complex} numbers should be already known to the
1463
reader.  We also use the term @dfn{halfcomplex} to describe complex
1464
arrays in a special packed format used by the one-dimensional real
1465
transforms (taking advantage of the @dfn{hermitian} symmetry that arises
1466
in those cases).
1467
 
1468
By including @code{<fftw.h>} or @code{<rfftw.h>}, you will have access
1469
to the following definitions:
1470
 
1471
@example
1472
typedef double fftw_real;
1473
 
1474
typedef struct @{
1475
     fftw_real re, im;
1476
@} fftw_complex;
1477
 
1478
#define c_re(c)  ((c).re)
1479
#define c_im(c)  ((c).im)
1480
@end example
1481
@tindex fftw_real
1482
@tindex fftw_complex
1483
 
1484
All FFTW operations are performed on the @code{fftw_real} and
1485
@code{fftw_complex} data types.  For @code{fftw_complex} numbers, the
1486
two macros @code{c_re} and @code{c_im} retrieve, respectively, the real
1487
and imaginary parts of the number.
1488
 
1489
A @dfn{real array} is an array of real numbers.  A @dfn{complex array}
1490
is an array of complex numbers.  A one-dimensional array @math{X} of
1491
@math{n} complex numbers is @dfn{hermitian} if the following property
1492
holds:
1493
@iftex
1494
@tex
1495
for all $0 \leq i < n$, we have $X_i = X^{*}_{n-i}$, where
1496
$x^*$ denotes the complex conjugate of $x$.
1497
@end tex
1498
@end iftex
1499
@ifinfo
1500
for all @math{0 <= i < n}, we have @math{X[i] = conj(X[n-i])}.
1501
@end ifinfo
1502
@ifhtml
1503
for all 0 &lt;= i &lt; n, we have X<sub>i</sub> = conj(X<sub>n-i</sub>)}.
1504
@end ifhtml
1505
Hermitian arrays are relevant to FFTW because the Fourier transform of a
1506
real array is hermitian.
1507
 
1508
Because of its symmetry, a hermitian array can be stored in half the
1509
space of a complex array of the same size.  FFTW's one-dimensional real
1510
transforms store hermitian arrays as @dfn{halfcomplex} arrays.  A
1511
halfcomplex array of size @math{n} is
1512
@cindex hermitian array
1513
a one-dimensional array of @math{n} @code{fftw_real} numbers.  A
1514
hermitian array @math{X} in stored into a halfcomplex array @math{Y} as
1515
follows.
1516
@iftex
1517
@tex
1518
For all integers $i$ such that $0 \leq i \leq n / 2$, we have $Y_i :=
1519
\hbox{Re}(X_i)$.  For all integers $i$ such that $0 < i < n / 2$,
1520
we have $Y_{n - i} := \hbox{Im}(X_i)$.
1521
@end tex
1522
@end iftex
1523
@ifinfo
1524
For all integers @math{i} such that @math{0 <= i <= n / 2}, we have
1525
@math{Y[i] = Re(X[i])}.  For all integers @math{i} such that @math{0 <
1526
i < n / 2}, we have @math{Y[n-i] = Im(X[i])}.
1527
@end ifinfo
1528
@ifhtml
1529
For all integers i such that 0 &lt;= i &lt;= n / 2, we have
1530
Y<sub>i</sub> = Re(X<sub>i</sub>).  For all integers i such that 0
1531
&lt; i &lt; n / 2, we have Y<sub>n-i</sub> = Im(X<sub>i</sub>).
1532
@end ifhtml
1533
 
1534
We now illustrate halfcomplex storage for @math{n = 4} and @math{n = 5},
1535
since the scheme depends on the parity of @math{n}.  Let @math{n = 4}.
1536
In this case, we have
1537
@iftex
1538
@tex
1539
$Y_0 := \hbox{Re}(X_0)$, $Y_1 := \hbox{Re}(X_1)$,
1540
$Y_2 := \hbox{Re}(X_2)$, and $Y_3 := \hbox{Im}(X_1)$.
1541
@end tex
1542
@end iftex
1543
@ifinfo
1544
@math{Y[0] = Re(X[0])}, @math{Y[1] = Re(X[1])},
1545
@math{Y[2] = Re(X[2])}, and  @math{Y[3] = Im(X[1])}.
1546
@end ifinfo
1547
@ifhtml
1548
Y<sub>0</sub> = Re(X<sub>0</sub>), Y<sub>1</sub> = Re(X<sub>1</sub>),
1549
Y<sub>2</sub> = Re(X<sub>2</sub>), and  Y<sub>3</sub> = Im(X<sub>1</sub>).
1550
@end ifhtml
1551
Let now @math{n = 5}.  In this case, we have
1552
@iftex
1553
@tex
1554
$Y_0 := \hbox{Re}(X_0)$, $Y_1 := \hbox{Re}(X_1)$,
1555
$Y_2 := \hbox{Re}(X_2)$, $Y_3 := \hbox{Im}(X_2)$, and
1556
$Y_4 := \hbox{Im}(X_1)$.
1557
@end tex
1558
@end iftex
1559
@ifinfo
1560
@math{Y[0] = Re(X[0])}, @math{Y[1] = Re(X[1])},
1561
@math{Y[2] = Re(X[2])}, @math{Y[3] = Im(X[2])}, and
1562
@math{Y[4] = Im(X[1])}.
1563
@end ifinfo
1564
@ifhtml
1565
Y<sub>0</sub> = Re(X<sub>0</sub>), Y<sub>1</sub> = Re(X<sub>1</sub>),
1566
Y<sub>2</sub> = Re(X<sub>2</sub>), Y<sub>3</sub> = Im(X<sub>2</sub>),
1567
and Y<sub>4</sub> = Im(X<sub>1</sub>).
1568
@end ifhtml
1569
 
1570
@cindex floating-point precision
1571
By default, the type @code{fftw_real} equals the C type @code{double}.
1572
To work in single precision rather than double precision, @code{#define}
1573
the symbol @code{FFTW_ENABLE_FLOAT} in @code{fftw.h} and then recompile
1574
the library.  On Unix systems, you can instead use @code{configure
1575
--enable-float} at installation time (@pxref{Installation and
1576
Customization}).
1577
@fpindex configure
1578
@ctindex FFTW_ENABLE_FLOAT
1579
 
1580
In version 1 of FFTW, the data types were called @code{FFTW_REAL} and
1581
@code{FFTW_COMPLEX}.  We changed the capitalization for consistency with
1582
the rest of FFTW's conventions.  The old names are still supported, but
1583
their use is deprecated.
1584
@tindex FFTW_REAL
1585
@tindex FFTW_COMPLEX
1586
 
1587
@c -------------------------------------------------------
1588
@node One-dimensional Transforms Reference, Multi-dimensional Transforms Reference, Data Types, FFTW Reference
1589
@section One-dimensional Transforms Reference
1590
 
1591
The one-dimensional complex routines are generally prefixed with
1592
@code{fftw_}.  Programs using FFTW should be linked with @code{-lfftw
1593
-lm} on Unix systems, or with the FFTW and standard math libraries in
1594
general.
1595
 
1596
@menu
1597
* fftw_create_plan::            Plan Creation
1598
* Discussion on Specific Plans::  
1599
* fftw::                        Plan Execution
1600
* fftw_destroy_plan::           Plan Destruction
1601
* What FFTW Really Computes::   Definition of the DFT.
1602
@end menu
1603
 
1604
@node    fftw_create_plan, Discussion on Specific Plans, One-dimensional Transforms Reference, One-dimensional Transforms Reference
1605
@subsection Plan Creation for One-dimensional Transforms
1606
 
1607
@example
1608
#include <fftw.h>
1609
 
1610
fftw_plan fftw_create_plan(int n, fftw_direction dir,
1611
                           int flags);
1612
 
1613
fftw_plan fftw_create_plan_specific(int n, fftw_direction dir,
1614
                                    int flags,
1615
                                    fftw_complex *in, int istride,
1616
                                    fftw_complex *out, int ostride);
1617
@end example
1618
@tindex fftw_plan
1619
@tindex fftw_direction
1620
@findex fftw_create_plan
1621
@findex fftw_create_plan_specific
1622
 
1623
The function @code{fftw_create_plan} creates a plan, which is
1624
a data structure containing all the information that @code{fftw}
1625
needs in order to compute the 1D Fourier transform. You can
1626
create as many plans as you need, but only one plan for a given
1627
array size is required (a plan can be reused many times).
1628
 
1629
@code{fftw_create_plan} returns a valid plan, or @code{NULL}
1630
if, for some reason, the plan can't be created.  In the
1631
default installation, this cannot happen, but it is possible
1632
to configure FFTW in such a way that some input sizes are
1633
forbidden, and FFTW cannot create a plan.
1634
 
1635
The @code{fftw_create_plan_specific} variant takes as additional
1636
arguments specific input/output arrays and their strides.  For the last
1637
four arguments, you should pass the arrays and strides that you will
1638
eventually be passing to @code{fftw}.  The resulting plans will be
1639
optimized for those arrays and strides, although they may be used on
1640
other arrays as well.  Note: the contents of the in and out arrays are
1641
@emph{destroyed} by the specific planner (the initial contents are
1642
ignored, so the arrays need not have been initialized).
1643
 
1644
@subsubheading Arguments
1645
@itemize @bullet
1646
@item
1647
@code{n} is the size of the transform.  It can be
1648
 any positive integer.
1649
 
1650
@itemize @minus
1651
@item
1652
FFTW is best at handling sizes of the form
1653
@ifinfo
1654
@math{2^a 3^b 5^c 7^d 11^e 13^f},
1655
@end ifinfo
1656
@iftex
1657
@tex
1658
$2^a 3^b 5^c 7^d 11^e 13^f$,
1659
@end tex
1660
@end iftex
1661
@ifhtml
1662
2<SUP>a</SUP> 3<SUP>b</SUP> 5<SUP>c</SUP> 7<SUP>d</SUP>
1663
        11<SUP>e</SUP> 13<SUP>f</SUP>,
1664
@end ifhtml
1665
where @math{e+f} is either @math{0} or
1666
@math{1}, and the other exponents are arbitrary.  Other sizes are
1667
computed by means of a slow, general-purpose routine (which nevertheless
1668
retains
1669
@iftex
1670
@tex
1671
$O(n \log n)$
1672
@end tex
1673
@end iftex
1674
@ifinfo
1675
O(n lg n)
1676
@end ifinfo
1677
@ifhtml
1678
O(n lg n)
1679
@end ifhtml
1680
performance, even for prime sizes).  (It is
1681
possible to customize FFTW for different array sizes.
1682
@xref{Installation and Customization} for more information.)  Transforms
1683
whose sizes are powers of @math{2} are especially fast.
1684
@end itemize
1685
 
1686
@item
1687
@code{dir} is the sign of the exponent in the formula that
1688
defines the Fourier transform.  It can be @math{-1} or @math{+1}.
1689
The aliases @code{FFTW_FORWARD} and @code{FFTW_BACKWARD}
1690
are provided, where @code{FFTW_FORWARD} stands for @math{-1}.
1691
 
1692
@item
1693
@cindex flags
1694
@code{flags} is a boolean OR (@samp{|}) of zero or more of the following:
1695
@itemize @minus
1696
@item
1697
@code{FFTW_MEASURE}: this flag tells FFTW to find the optimal plan by
1698
actually @emph{computing} several FFTs and measuring their
1699
execution time.  Depending on the installation, this can take some
1700
time. @footnote{The basic problem is the resolution of the clock:
1701
FFTW needs to run for a certain time for the clock to be reliable.}
1702
 
1703
@item
1704
@code{FFTW_ESTIMATE}: do not run any FFT and provide a ``reasonable''
1705
plan (for a RISC processor with many registers).  If neither
1706
@code{FFTW_ESTIMATE} nor @code{FFTW_MEASURE} is provided, the default is
1707
@code{FFTW_ESTIMATE}.
1708
 
1709
@item
1710
@code{FFTW_OUT_OF_PLACE}: produce a plan assuming that the input and
1711
output arrays will be distinct (this is the default).
1712
@ctindex FFTW_OUT_OF_PLACE
1713
 
1714
@item
1715
@cindex in-place transform
1716
@code{FFTW_IN_PLACE}: produce a plan assuming that you want the output
1717
in the input array.  The algorithm used is not necessarily in place:
1718
FFTW is able to compute true in-place transforms only for small values
1719
of @code{n}.  If FFTW is not able to compute the transform in-place, it
1720
will allocate a temporary array (unless you provide one yourself),
1721
compute the transform out of place, and copy the result back.
1722
@emph{Warning: This option changes the meaning of some parameters of
1723
@code{fftw}} (@pxref{fftw,,Computing the One-dimensional Transform}).
1724
 
1725
The in-place option is mainly provided for people who want to write
1726
their own in-place multi-dimensional Fourier transform, using FFTW as a
1727
base.  For example, consider a three-dimensional @code{n * n * n}
1728
transform.  An out-of-place algorithm will need another array (which may
1729
be huge).  However, FFTW can compute the in-place transform along
1730
each dimension using only a temporary array of size @code{n}.
1731
Moreover, if FFTW happens to be able to compute the transform truly
1732
in-place, no temporary array and no copying are needed.  As distributed,
1733
FFTW `knows' how to compute in-place transforms of size 1, 2, 3, 4, 5, 6,
1734
7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 32 and 64.
1735
 
1736
The default mode of operation is @code{FFTW_OUT_OF_PLACE}.
1737
 
1738
@item
1739
@cindex wisdom
1740
@code{FFTW_USE_WISDOM}: use any @code{wisdom} that is available to help
1741
in the creation of the plan. (@xref{Words of Wisdom}.)
1742
This can greatly speed the creation of plans, especially with the
1743
@code{FFTW_MEASURE} option. @code{FFTW_ESTIMATE} plans can also take
1744
advantage of @code{wisdom} to produce a more optimal plan (based on past
1745
measurements) than the estimation heuristic would normally
1746
generate. When the @code{FFTW_MEASURE} option is used, new @code{wisdom}
1747
will also be generated if the current transform size is not completely
1748
understood by existing @code{wisdom}.
1749
 
1750
@end itemize
1751
 
1752
@item
1753
@code{in}, @code{out}, @code{istride}, @code{ostride} (only for
1754
@code{fftw_create_plan_specific}): see corresponding arguments in the
1755
description of @code{fftw}.  (@xref{fftw,,Computing the One-dimensional
1756
Transform}.)  In particular, the @code{out} and @code{ostride}
1757
parameters have the same special meaning for @code{FFTW_IN_PLACE}
1758
transforms as they have for @code{fftw}.
1759
 
1760
@end itemize
1761
 
1762
@node Discussion on Specific Plans, fftw, fftw_create_plan, One-dimensional Transforms Reference
1763
@subsection Discussion on Specific Plans
1764
@cindex specific planner
1765
We recommend the use of the specific planners, even in cases where you
1766
will be transforming arrays different from those passed to the specific
1767
planners, as they confer the following advantages:
1768
 
1769
@itemize @bullet
1770
 
1771
@item
1772
The resulting plans will be optimized for your specific arrays and
1773
strides.  This may or may not make a significant difference, but it
1774
certainly doesn't hurt.  (The ordinary planner does its planning based
1775
upon a stride-one temporary array that it allocates.)
1776
 
1777
@item
1778
Less intermediate storage is required during the planning process.  (The
1779
ordinary planner uses O(@code{N}) temporary storage, where @code{N} is
1780
the maximum dimension, while it is creating the plan.)
1781
 
1782
@item
1783
For multi-dimensional transforms, new parameters become accessible for
1784
optimization by the planner.  (Since multi-dimensional arrays can be
1785
very large, we don't dare to allocate one in the ordinary planner for
1786
experimentation.  This prevents us from doing certain optimizations
1787
that can yield dramatic improvements in some cases.)
1788
 
1789
@end itemize
1790
 
1791
On the other hand, note that @emph{the specific planner destroys the
1792
contents of the @code{in} and @code{out} arrays}.
1793
 
1794
@node    fftw, fftw_destroy_plan, Discussion on Specific Plans, One-dimensional Transforms Reference
1795
@subsection Computing the One-dimensional Transform
1796
 
1797
@example
1798
#include <fftw.h>
1799
 
1800
void fftw(fftw_plan plan, int howmany,
1801
          fftw_complex *in, int istride, int idist,
1802
          fftw_complex *out, int ostride, int odist);
1803
 
1804
void fftw_one(fftw_plan plan, fftw_complex *in,
1805
          fftw_complex *out);
1806
@end example
1807
@findex fftw
1808
@findex fftw_one
1809
 
1810
The function @code{fftw} computes the one-dimensional Fourier transform,
1811
using a plan created by @code{fftw_create_plan} (@xref{fftw_create_plan,
1812
, Plan Creation for One-dimensional Transforms}.)  The function
1813
@code{fftw_one} provides a simplified interface for the common case of
1814
single input array of stride 1.
1815
@cindex stride
1816
 
1817
@subsubheading Arguments
1818
@itemize @bullet
1819
@item
1820
@code{plan} is the plan created by @code{fftw_create_plan}
1821
(@pxref{fftw_create_plan,,Plan Creation for One-dimensional Transforms}).
1822
 
1823
@item
1824
@code{howmany} is the number of transforms @code{fftw} will compute.
1825
It is faster to tell FFTW to compute many transforms, instead of
1826
simply calling @code{fftw} many times.
1827
 
1828
@item
1829
@code{in}, @code{istride} and @code{idist} describe the input array(s).
1830
There are @code{howmany} input arrays; the first one is pointed to by
1831
@code{in}, the second one is pointed to by @code{in + idist}, and so on,
1832
up to @code{in + (howmany - 1) * idist}.  Each input array consists of
1833
complex numbers (@pxref{Data Types}), which are not necessarily
1834
contiguous in memory.  Specifically, @code{in[0]} is the first element
1835
of the first array, @code{in[istride]} is the second element of the
1836
first array, and so on.  In general, the @code{i}-th element of the
1837
@code{j}-th input array will be in position @code{in[i * istride + j *
1838
idist]}.
1839
 
1840
@item
1841
@code{out}, @code{ostride} and @code{odist} describe the output
1842
array(s).  The format is the same as for the input array.
1843
 
1844
@itemize @minus
1845
@item @emph{In-place transforms}:
1846
@cindex in-place transform
1847
If the @code{plan} specifies an in-place transform, @code{ostride} and
1848
@code{odist} are always ignored.  If @code{out} is @code{NULL},
1849
@code{out} is ignored, too.  Otherwise, @code{out} is interpreted as a
1850
pointer to an array of @code{n} complex numbers, that FFTW will use as
1851
temporary space to perform the in-place computation.  @code{out} is used
1852
as scratch space and its contents destroyed.  In this case, @code{out}
1853
must be an ordinary array whose elements are contiguous in memory (no
1854
striding).
1855
@end itemize
1856
 
1857
@end itemize
1858
 
1859
The function @code{fftw_one} transforms a single, contiguous input array
1860
to a contiguous output array.  By definition, the call
1861
@example
1862
fftw_one(plan, in, out)
1863
@end example
1864
is equivalent to
1865
@example
1866
fftw(plan, 1, in, 1, 1, out, 1, 1)
1867
@end example
1868
 
1869
@node    fftw_destroy_plan, What FFTW Really Computes, fftw, One-dimensional Transforms Reference
1870
@subsection Destroying a One-dimensional Plan
1871
 
1872
@example
1873
#include <fftw.h>
1874
 
1875
void fftw_destroy_plan(fftw_plan plan);
1876
@end example
1877
@tindex fftw_destroy_plan
1878
 
1879
The function @code{fftw_destroy_plan} frees the plan @code{plan} and
1880
releases all the memory associated with it.  After destruction, a plan
1881
is no longer valid.
1882
 
1883
@node What FFTW Really Computes,  , fftw_destroy_plan, One-dimensional Transforms Reference
1884
@subsection What FFTW Really Computes
1885
@cindex Discrete Fourier Transform
1886
In this section, we define precisely what FFTW computes.  Please be
1887
warned that different authors and software packages might employ
1888
different conventions than FFTW does.
1889
 
1890
The forward transform of a complex array @math{X} of size
1891
@math{n} computes an array @math{Y}, where
1892
@iftex
1893
@tex
1894
$$
1895
Y_i = \sum_{j = 0}^{n - 1} X_j e^{-2\pi i j \sqrt{-1}/n} \ .
1896
$$
1897
@end tex
1898
@end iftex
1899
@ifinfo
1900
@center Y[i] = sum for j = 0 to (n - 1) of X[j] * exp(-2 pi i j sqrt(-1)/n) .
1901
@end ifinfo
1902
@ifhtml
1903
<center><IMG SRC="equation-1.gif" ALIGN="top"></center>
1904
@end ifhtml
1905
 
1906
The backward transform computes
1907
@iftex
1908
@tex
1909
$$
1910
Y_i = \sum_{j = 0}^{n - 1} X_j e^{2\pi i j \sqrt{-1}/n} \ .
1911
$$
1912
@end tex
1913
@end iftex
1914
@ifinfo
1915
@center Y[i] = sum for j = 0 to (n - 1) of X[j] * exp(2 pi i j sqrt(-1)/n) .
1916
@end ifinfo
1917
@ifhtml
1918
<center><IMG SRC="equation-2.gif" ALIGN="top"></center>
1919
@end ifhtml
1920
 
1921
@cindex normalization
1922
FFTW computes an unnormalized transform, that is, the equation
1923
@math{IFFT(FFT(X)) = n X} holds.  In other words, applying the forward
1924
and then the backward transform will multiply the input by @math{n}.
1925
 
1926
@cindex frequency
1927
An @code{FFTW_FORWARD} transform corresponds to a sign of @math{-1} in
1928
the exponent of the DFT.  Note also that we use the standard
1929
``in-order'' output ordering---the @math{k}-th output corresponds to the
1930
frequency @math{k/n} (or @math{k/T}, where @math{T} is your total
1931
sampling period).  For those who like to think in terms of positive and
1932
negative frequencies, this means that the positive frequencies are
1933
stored in the first half of the output and the negative frequencies are
1934
stored in backwards order in the second half of the output.  (The
1935
frequency @math{-k/n} is the same as the frequency @math{(n-k)/n}.)
1936
 
1937
@c -------------------------------------------------------
1938
@node Multi-dimensional Transforms Reference, Real One-dimensional Transforms Reference, One-dimensional Transforms Reference, FFTW Reference
1939
@section Multi-dimensional Transforms Reference
1940
@cindex complex multi-dimensional transform
1941
@cindex multi-dimensional transform
1942
The multi-dimensional complex routines are generally prefixed with
1943
@code{fftwnd_}.  Programs using FFTWND should be linked with @code{-lfftw
1944
-lm} on Unix systems, or with the FFTW and standard math libraries in
1945
general.
1946
@cindex FFTWND
1947
 
1948
@menu
1949
* fftwnd_create_plan::          Plan Creation
1950
* fftwnd::                      Plan Execution
1951
* fftwnd_destroy_plan::         Plan Destruction
1952
* What FFTWND Really Computes::  
1953
@end menu
1954
 
1955
@node   fftwnd_create_plan, fftwnd, Multi-dimensional Transforms Reference, Multi-dimensional Transforms Reference
1956
@subsection Plan Creation for Multi-dimensional Transforms
1957
 
1958
@example
1959
#include <fftw.h>
1960
 
1961
fftwnd_plan fftwnd_create_plan(int rank, const int *n,
1962
                               fftw_direction dir, int flags);
1963
 
1964
fftwnd_plan fftw2d_create_plan(int nx, int ny,
1965
                               fftw_direction dir, int flags);
1966
 
1967
fftwnd_plan fftw3d_create_plan(int nx, int ny, int nz,
1968
                               fftw_direction dir, int flags);
1969
 
1970
fftwnd_plan fftwnd_create_plan_specific(int rank, const int *n,
1971
                                        fftw_direction dir,
1972
                                        int flags,
1973
                                        fftw_complex *in, int istride,
1974
                                        fftw_complex *out, int ostride);
1975
 
1976
fftwnd_plan fftw2d_create_plan_specific(int nx, int ny,
1977
                                        fftw_direction dir,
1978
                                        int flags,
1979
                                        fftw_complex *in, int istride,
1980
                                        fftw_complex *out, int ostride);
1981
 
1982
fftwnd_plan fftw3d_create_plan_specific(int nx, int ny, int nz,
1983
                                        fftw_direction dir, int flags,
1984
                                        fftw_complex *in, int istride,
1985
                                        fftw_complex *out, int ostride);
1986
@end example
1987
@tindex fftwnd_plan
1988
@tindex fftw_direction
1989
@findex fftwnd_create_plan
1990
@findex fftw2d_create_plan
1991
@findex fftw3d_create_plan
1992
@findex fftwnd_create_plan_specific
1993
@findex fftw2d_create_plan_specific
1994
@findex fftw3d_create_plan_specific
1995
 
1996
The function @code{fftwnd_create_plan} creates a plan, which is a data
1997
structure containing all the information that @code{fftwnd} needs in
1998
order to compute a multi-dimensional Fourier transform.  You can create
1999
as many plans as you need, but only one plan for a given array size is
2000
required (a plan can be reused many times).  The functions
2001
@code{fftw2d_create_plan} and @code{fftw3d_create_plan} are optional,
2002
alternative interfaces to @code{fftwnd_create_plan} for two and three
2003
dimensions, respectively.
2004
 
2005
@code{fftwnd_create_plan} returns a valid plan, or @code{NULL} if, for
2006
some reason, the plan can't be created.  This can happen if memory runs
2007
out or if the arguments are invalid in some way (e.g.  if @code{rank} <
2008
0).
2009
 
2010
The @code{create_plan_specific} variants take as additional arguments
2011
specific input/output arrays and their strides.  For the last four
2012
arguments, you should pass the arrays and strides that you will
2013
eventually be passing to @code{fftwnd}.  The resulting plans will be
2014
optimized for those arrays and strides, although they may be used on
2015
other arrays as well.  Note: the contents of the in and out arrays are
2016
@emph{destroyed} by the specific planner (the initial contents are
2017
ignored, so the arrays need not have been initialized).
2018
@xref{Discussion on Specific Plans}, for a discussion on specific plans.
2019
 
2020
@subsubheading Arguments
2021
@itemize @bullet
2022
@item
2023
@code{rank} is the dimensionality of the arrays to be transformed.  It
2024
can be any non-negative integer.
2025
 
2026
@item
2027
@code{n} is a pointer to an array of @code{rank} integers, giving the
2028
size of each dimension of the arrays to be transformed.  These sizes,
2029
which must be positive integers, correspond to the dimensions of
2030
@cindex row-major
2031
row-major arrays---i.e. @code{n[0]} is the size of the dimension whose
2032
indices vary most slowly, and so on. (@xref{Multi-dimensional Array
2033
Format}, for more information on row-major storage.)
2034
@xref{fftw_create_plan,,Plan Creation for One-dimensional Transforms},
2035
for more information regarding optimal array sizes.
2036
 
2037
@item
2038
@code{nx} and @code{ny} in @code{fftw2d_create_plan} are positive
2039
integers specifying the dimensions of the rank 2 array to be
2040
transformed. i.e. they specify that the transform will operate on
2041
@code{nx x ny} arrays in row-major order, where @code{nx} is the number
2042
of rows and @code{ny} is the number of columns.
2043
 
2044
@item
2045
@code{nx}, @code{ny} and @code{nz} in @code{fftw3d_create_plan} are
2046
positive integers specifying the dimensions of the rank 3 array to be
2047
transformed. i.e. they specify that the transform will operate on
2048
@code{nx x ny x nz} arrays in row-major order.
2049
 
2050
@item
2051
@code{dir} is the sign of the exponent in the formula that defines the
2052
Fourier transform.  It can be @math{-1} or @math{+1}.  The aliases
2053
@code{FFTW_FORWARD} and @code{FFTW_BACKWARD} are provided, where
2054
@code{FFTW_FORWARD} stands for @math{-1}.
2055
 
2056
@item
2057
@cindex flags
2058
@code{flags} is a boolean OR (@samp{|}) of zero or more of the following:
2059
@itemize @minus
2060
@item
2061
@code{FFTW_MEASURE}: this flag tells FFTW to find the optimal plan by
2062
actually @emph{computing} several FFTs and measuring their execution
2063
time.
2064
 
2065
@item
2066
@code{FFTW_ESTIMATE}: do not run any FFT and provide a ``reasonable''
2067
plan (for a RISC processor with many registers).  If neither
2068
@code{FFTW_ESTIMATE} nor @code{FFTW_MEASURE} is provided, the default is
2069
@code{FFTW_ESTIMATE}.
2070
 
2071
@item
2072
@code{FFTW_OUT_OF_PLACE}: produce a plan assuming that the input
2073
  and output arrays will be distinct (this is the default).
2074
 
2075
@item
2076
@code{FFTW_IN_PLACE}: produce a plan assuming that you want to perform
2077
the transform in-place.  (Unlike the one-dimensional transform, this
2078
``really'' @footnote{@code{fftwnd} actually may use some temporary
2079
storage (hidden in the plan), but this storage space is only the size of
2080
the largest dimension of the array, rather than being as big as the
2081
entire array.  (Unless you use @code{fftwnd} to perform one-dimensional
2082
transforms, in which case the temporary storage required for in-place
2083
transforms @emph{is} as big as the entire array.)} performs the
2084
transform in-place.) Note that, if you want to perform in-place
2085
transforms, you @emph{must} use a plan created with this option.
2086
 
2087
The default mode of operation is @code{FFTW_OUT_OF_PLACE}.
2088
 
2089
@item
2090
@cindex wisdom
2091
@code{FFTW_USE_WISDOM}: use any @code{wisdom} that is available to help
2092
in the creation of the plan. (@xref{Words of Wisdom}.)  This can greatly
2093
speed the creation of plans, especially with the @code{FFTW_MEASURE}
2094
option. @code{FFTW_ESTIMATE} plans can also take advantage of
2095
@code{wisdom} to produce a more optimal plan (based on past
2096
measurements) than the estimation heuristic would normally
2097
generate. When the @code{FFTW_MEASURE} option is used, new @code{wisdom}
2098
will also be generated if the current transform size is not completely
2099
understood by existing @code{wisdom}. Note that the same @code{wisdom}
2100
is shared between one-dimensional and multi-dimensional transforms.
2101
 
2102
@end itemize
2103
 
2104
@item
2105
@code{in}, @code{out}, @code{istride}, @code{ostride} (only for the
2106
@code{_create_plan_specific} variants): see corresponding arguments in
2107
the description of @code{fftwnd}.  (@xref{fftwnd,,Computing the
2108
Multi-dimensional Transform}.)
2109
 
2110
@end itemize
2111
 
2112
@node    fftwnd, fftwnd_destroy_plan, fftwnd_create_plan, Multi-dimensional Transforms Reference
2113
@subsection Computing the Multi-dimensional Transform
2114
 
2115
@example
2116
#include <fftw.h>
2117
 
2118
void fftwnd(fftwnd_plan plan, int howmany,
2119
            fftw_complex *in, int istride, int idist,
2120
            fftw_complex *out, int ostride, int odist);
2121
 
2122
void fftwnd_one(fftwnd_plan p, fftw_complex *in,
2123
                fftw_complex *out);
2124
@end example
2125
@findex fftwnd
2126
@findex fftwnd_one
2127
 
2128
The function @code{fftwnd} computes the multi-dimensional Fourier
2129
Transform, using a plan created by @code{fftwnd_create_plan}
2130
(@pxref{fftwnd_create_plan,,Plan Creation for Multi-dimensional
2131
Transforms}). (Note that the plan determines the rank and dimensions of
2132
the array to be transformed.)  The function @code{fftwnd_one} provides a
2133
simplified interface for the common case of single input array of stride
2134
1.
2135
@cindex stride
2136
 
2137
@subsubheading Arguments
2138
@itemize @bullet
2139
@item
2140
@code{plan} is the plan created by @code{fftwnd_create_plan}.
2141
(@pxref{fftwnd_create_plan,,Plan Creation for Multi-dimensional
2142
Transforms}). In the case of two and three-dimensional transforms, it
2143
could also have been created by @code{fftw2d_create_plan} or
2144
@code{fftw3d_create_plan}, respectively.
2145
 
2146
@item
2147
@code{howmany} is the number of transforms @code{fftwnd} will compute.
2148
 
2149
@item
2150
@code{in}, @code{istride} and @code{idist} describe the input array(s).
2151
There are @code{howmany} input arrays; the first one is pointed to by
2152
@code{in}, the second one is pointed to by @code{in + idist}, and so on,
2153
up to @code{in + (howmany - 1) * idist}.  Each input array consists of
2154
complex numbers (@pxref{Data Types}), stored in row-major format
2155
(@pxref{Multi-dimensional Array Format}), which are not necessarily
2156
contiguous in memory.  Specifically, @code{in[0]} is the first element
2157
of the first array, @code{in[istride]} is the second element of the
2158
first array, and so on.  In general, the @code{i}-th element of the
2159
@code{j}-th input array will be in position @code{in[i * istride + j *
2160
idist]}. Note that, here, @code{i} refers to an index into the row-major
2161
format for the multi-dimensional array, rather than an index in any
2162
particular dimension.
2163
 
2164
@itemize @minus
2165
@item @emph{In-place transforms}:
2166
@cindex in-place transform
2167
For plans created with the @code{FFTW_IN_PLACE} option, the transform is
2168
computed in-place---the output is returned in the @code{in} array, using
2169
the same strides, etcetera, as were used in the input.
2170
@end itemize
2171
 
2172
@item
2173
@code{out}, @code{ostride} and @code{odist} describe the output array(s).
2174
The format is the same as for the input array.
2175
 
2176
@itemize @minus
2177
@item @emph{In-place transforms}:
2178
These parameters are ignored for plans created with the
2179
@code{FFTW_IN_PLACE} option.
2180
@end itemize
2181
 
2182
@end itemize
2183
 
2184
The function @code{fftwnd_one} transforms a single, contiguous input
2185
array to a contiguous output array.  By definition, the call
2186
@example
2187
fftwnd_one(plan, in, out)
2188
@end example
2189
is equivalent to
2190
@example
2191
fftwnd(plan, 1, in, 1, 1, out, 1, 1)
2192
@end example
2193
 
2194
@node    fftwnd_destroy_plan, What FFTWND Really Computes, fftwnd, Multi-dimensional Transforms Reference
2195
@subsection Destroying a Multi-dimensional Plan
2196
 
2197
@example
2198
#include <fftw.h>
2199
 
2200
void fftwnd_destroy_plan(fftwnd_plan plan);
2201
@end example
2202
@findex fftwnd_destroy_plan
2203
 
2204
The function @code{fftwnd_destroy_plan} frees the plan @code{plan}
2205
and releases all the memory associated with it.  After destruction,
2206
a plan is no longer valid.
2207
 
2208
@node What FFTWND Really Computes,  , fftwnd_destroy_plan, Multi-dimensional Transforms Reference
2209
@subsection What FFTWND Really Computes
2210
@cindex Discrete Fourier Transform
2211
 
2212
The conventions that we follow for the multi-dimensional transform are
2213
analogous to those for the one-dimensional transform. In particular, the
2214
forward transform has a negative sign in the exponent and neither the
2215
forward nor the backward transforms will perform any normalization.
2216
Computing the backward transform of the forward transform will multiply
2217
the array by the product of its dimensions.  The output is in-order, and
2218
the zeroth element of the output is the amplitude of the zero frequency
2219
component.
2220
 
2221
@iftex
2222
@tex
2223
The exact mathematical definition of our multi-dimensional transform
2224
follows.  Let $X$ be a $d$-dimensional complex array whose elements are
2225
$X[j_1, j_2, \ldots, j_d]$, where $0 \leq j_s < n_s$ for all~$s \in \{
2226
1, 2, \ldots, d \}$.  Let also $\omega_s = e^{2\pi \sqrt{-1}/n_s}$, for
2227
all ~$s \in \{ 1, 2, \ldots, d \}$.
2228
 
2229
The forward transform computes a complex array~$Y$, whose
2230
structure is the same as that of~$X$, defined by
2231
 
2232
$$
2233
Y[i_1, i_2, \ldots, i_d] =
2234
    \sum_{j_1 = 0}^{n_1 - 1}
2235
        \sum_{j_2 = 0}^{n_2 - 1}
2236
           \cdots
2237
              \sum_{j_d = 0}^{n_d - 1}
2238
                  X[j_1, j_2, \ldots, j_d]
2239
                      \omega_1^{-i_1 j_1}
2240
                      \omega_2^{-i_2 j_2}
2241
                      \cdots
2242
                      \omega_d^{-i_d j_d} \ .
2243
$$
2244
 
2245
The backward transform computes
2246
$$
2247
Y[i_1, i_2, \ldots, i_d] =
2248
    \sum_{j_1 = 0}^{n_1 - 1}
2249
        \sum_{j_2 = 0}^{n_2 - 1}
2250
           \cdots
2251
              \sum_{j_d = 0}^{n_d - 1}
2252
                  X[j_1, j_2, \ldots, j_d]
2253
                      \omega_1^{i_1 j_1}
2254
                      \omega_2^{i_2 j_2}
2255
                      \cdots
2256
                      \omega_d^{i_d j_d} \ .
2257
$$
2258
 
2259
Computing the forward transform followed by the backward transform
2260
will multiply the array by $\prod_{s=1}^{d} n_d$.
2261
@end tex
2262
@end iftex
2263
@ifinfo
2264
The @TeX{} version of this manual contains the exact definition of the
2265
@math{n}-dimensional transform FFTW uses.  It is not possible to
2266
display the definition on a ASCII terminal properly.
2267
@end ifinfo
2268
@ifhtml
2269
The Gods forbade using HTML to display mathematical formulas.  Please
2270
see the TeX or Postscript version of this manual for the proper
2271
definition of the n-dimensional Fourier transform that FFTW
2272
uses.  For completeness, we include a bitmap of the TeX output below:
2273
<P><center><IMG SRC="equation-3.gif" ALIGN="top"></center>
2274
@end ifhtml
2275
 
2276
@c -------------------------------------------------------
2277
@node Real One-dimensional Transforms Reference, Real Multi-dimensional Transforms Reference, Multi-dimensional Transforms Reference, FFTW Reference
2278
@section Real One-dimensional Transforms Reference
2279
 
2280
The one-dimensional real routines are generally prefixed with
2281
@code{rfftw_}. @footnote{The etymologically-correct spelling would be
2282
@code{frftw_}, but it is hard to remember.}  Programs using RFFTW
2283
should be linked with @code{-lrfftw -lfftw -lm} on Unix systems, or with
2284
the RFFTW, the FFTW, and the standard math libraries in general.
2285
@cindex RFFTW
2286
@cindex real transform
2287
@cindex complex to real transform
2288
 
2289
@menu
2290
* rfftw_create_plan::           Plan Creation  
2291
* rfftw::                       Plan Execution  
2292
* rfftw_destroy_plan::          Plan Destruction
2293
* What RFFTW Really Computes::  
2294
@end menu
2295
 
2296
@node    rfftw_create_plan, rfftw, Real One-dimensional Transforms Reference, Real One-dimensional Transforms Reference
2297
@subsection Plan Creation for Real One-dimensional Transforms
2298
 
2299
@example
2300
#include <rfftw.h>
2301
 
2302
rfftw_plan rfftw_create_plan(int n, fftw_direction dir, int flags);
2303
 
2304
rfftw_plan rfftw_create_plan_specific(int n, fftw_direction dir,
2305
            int flags, fftw_real *in, int istride,
2306
            fftw_real *out, int ostride);
2307
@end example
2308
@tindex rfftw_plan
2309
@findex rfftw_create_plan
2310
@findex rfftw_create_plan_specific
2311
 
2312
The function @code{rfftw_create_plan} creates a plan, which is a data
2313
structure containing all the information that @code{rfftw} needs in
2314
order to compute the 1D real Fourier transform. You can create as many
2315
plans as you need, but only one plan for a given array size is required
2316
(a plan can be reused many times).
2317
 
2318
@code{rfftw_create_plan} returns a valid plan, or @code{NULL} if, for
2319
some reason, the plan can't be created.  In the default installation,
2320
this cannot happen, but it is possible to configure RFFTW in such a way
2321
that some input sizes are forbidden, and RFFTW cannot create a plan.
2322
 
2323
The @code{rfftw_create_plan_specific} variant takes as additional
2324
arguments specific input/output arrays and their strides.  For the last
2325
four arguments, you should pass the arrays and strides that you will
2326
eventually be passing to @code{rfftw}.  The resulting plans will be
2327
optimized for those arrays and strides, although they may be used on
2328
other arrays as well.  Note: the contents of the in and out arrays are
2329
@emph{destroyed} by the specific planner (the initial contents are
2330
ignored, so the arrays need not have been initialized).
2331
@xref{Discussion on Specific Plans}, for a discussion on specific plans.
2332
 
2333
@subsubheading Arguments
2334
@itemize @bullet
2335
@item
2336
@code{n} is the size of the transform.  It can be
2337
 any positive integer.
2338
 
2339
@itemize @minus
2340
@item
2341
RFFTW is best at handling sizes of the form
2342
@ifinfo
2343
@math{2^a 3^b 5^c 7^d 11^e 13^f},
2344
@end ifinfo
2345
@iftex
2346
@tex
2347
$2^a 3^b 5^c 7^d 11^e 13^f$,
2348
@end tex
2349
@end iftex
2350
@ifhtml
2351
2<SUP>a</SUP> 3<SUP>b</SUP> 5<SUP>c</SUP> 7<SUP>d</SUP>
2352
        11<SUP>e</SUP> 13<SUP>f</SUP>,
2353
@end ifhtml
2354
where @math{e+f} is either @math{0} or
2355
@math{1}, and the other exponents are arbitrary.  Other sizes are
2356
computed by means of a slow, general-purpose routine (reducing to
2357
@ifinfo
2358
@math{O(n^2)}
2359
@end ifinfo
2360
@iftex
2361
@tex
2362
$O(n^2)$
2363
@end tex
2364
@end iftex
2365
@ifhtml
2366
O(n<sup>2</sup>)
2367
@end ifhtml
2368
performance for prime sizes).  (It is possible to customize RFFTW for
2369
different array sizes.  @xref{Installation and Customization}, for more
2370
information.)  Transforms whose sizes are powers of @math{2} are
2371
especially fast.
2372
@end itemize
2373
 
2374
@item
2375
@code{dir} is the direction of the desired transform, either
2376
@code{FFTW_REAL_TO_COMPLEX} or @code{FFTW_COMPLEX_TO_REAL},
2377
corresponding to @code{FFTW_FORWARD} or @code{FFTW_BACKWARD},
2378
respectively.
2379
@ctindex FFTW_REAL_TO_COMPLEX
2380
@ctindex FFTW_COMPLEX_TO_REAL
2381
 
2382
@item
2383
@cindex flags
2384
@code{flags} is a boolean OR (@samp{|}) of zero or more of the following:
2385
@itemize @minus
2386
@item
2387
@code{FFTW_MEASURE}: this flag tells RFFTW to find the optimal plan by
2388
actually @emph{computing} several FFTs and measuring their
2389
execution time.  Depending on the installation, this can take some
2390
time.
2391
 
2392
@item
2393
@code{FFTW_ESTIMATE}: do not run any FFT and provide a ``reasonable''
2394
plan (for a RISC processor with many registers).  If neither
2395
@code{FFTW_ESTIMATE} nor @code{FFTW_MEASURE} is provided, the default is
2396
@code{FFTW_ESTIMATE}.
2397
 
2398
@item
2399
@code{FFTW_OUT_OF_PLACE}: produce a plan assuming that the input
2400
  and output arrays will be distinct (this is the default).
2401
 
2402
@item
2403
@code{FFTW_IN_PLACE}: produce a plan assuming that you want the output
2404
in the input array.  The algorithm used is not necessarily in place:
2405
RFFTW is able to compute true in-place transforms only for small values
2406
of @code{n}.  If RFFTW is not able to compute the transform in-place, it
2407
will allocate a temporary array (unless you provide one yourself),
2408
compute the transform out of place, and copy the result back.
2409
@emph{Warning: This option changes the meaning of some parameters of
2410
@code{rfftw}} (@pxref{rfftw,,Computing the Real One-dimensional Transform}).
2411
 
2412
The default mode of operation is @code{FFTW_OUT_OF_PLACE}.
2413
 
2414
@item
2415
@code{FFTW_USE_WISDOM}: use any @code{wisdom} that is available to help
2416
in the creation of the plan. (@xref{Words of Wisdom}.)
2417
This can greatly speed the creation of plans, especially with the
2418
@code{FFTW_MEASURE} option. @code{FFTW_ESTIMATE} plans can also take
2419
advantage of @code{wisdom} to produce a more optimal plan (based on past
2420
measurements) than the estimation heuristic would normally
2421
generate. When the @code{FFTW_MEASURE} option is used, new @code{wisdom}
2422
will also be generated if the current transform size is not completely
2423
understood by existing @code{wisdom}.
2424
 
2425
@end itemize
2426
 
2427
@item
2428
@code{in}, @code{out}, @code{istride}, @code{ostride} (only for
2429
@code{rfftw_create_plan_specific}): see corresponding arguments in the
2430
description of @code{rfftw}.  (@xref{rfftw,,Computing the Real
2431
One-dimensional Transform}.)  In particular, the @code{out} and
2432
@code{ostride} parameters have the same special meaning for
2433
@code{FFTW_IN_PLACE} transforms as they have for @code{rfftw}.
2434
 
2435
@end itemize
2436
 
2437
@node    rfftw, rfftw_destroy_plan, rfftw_create_plan, Real One-dimensional Transforms Reference
2438
@subsection Computing the Real One-dimensional Transform
2439
 
2440
@example
2441
#include <rfftw.h>
2442
 
2443
void rfftw(rfftw_plan plan, int howmany,
2444
           fftw_real *in, int istride, int idist,
2445
           fftw_real *out, int ostride, int odist);
2446
 
2447
void rfftw_one(rfftw_plan plan, fftw_real *in, fftw_real *out);
2448
@end example
2449
@findex rfftw
2450
@findex rfftw_one
2451
 
2452
The function @code{rfftw} computes the Real One-dimensional Fourier
2453
Transform, using a plan created by @code{rfftw_create_plan}
2454
(@pxref{rfftw_create_plan,,Plan Creation for Real One-dimensional
2455
Transforms}).  The function @code{rfftw_one} provides a simplified
2456
interface for the common case of single input array of stride 1.
2457
@cindex stride
2458
 
2459
@emph{Important:} When invoked for an out-of-place,
2460
@code{FFTW_COMPLEX_TO_REAL} transform, the input array is overwritten
2461
with scratch values by these routines.  The input array is not modified
2462
for @code{FFTW_REAL_TO_COMPLEX} transforms.
2463
 
2464
@subsubheading Arguments
2465
@itemize @bullet
2466
@item
2467
@code{plan} is the plan created by @code{rfftw_create_plan}
2468
(@pxref{rfftw_create_plan,,Plan Creation for Real One-dimensional
2469
Transforms}).
2470
 
2471
@item
2472
@code{howmany} is the number of transforms @code{rfftw} will compute.
2473
It is faster to tell RFFTW to compute many transforms, instead of
2474
simply calling @code{rfftw} many times.
2475
 
2476
@item
2477
@code{in}, @code{istride} and @code{idist} describe the input array(s).
2478
There are two cases.  If the @code{plan} defines a
2479
@code{FFTW_REAL_TO_COMPLEX} transform, @code{in} is a real array.
2480
Otherwise, for @code{FFTW_COMPLEX_TO_REAL} transforms, @code{in} is a
2481
halfcomplex array @emph{whose contents will be destroyed}.
2482
 
2483
@item
2484
@code{out}, @code{ostride} and @code{odist} describe the output
2485
array(s), and have the same meaning as the corresponding parameters for
2486
the input array.
2487
 
2488
@itemize @minus
2489
@item @emph{In-place transforms}:
2490
If the @code{plan} specifies an in-place transform, @code{ostride} and
2491
@code{odist} are always ignored.  If @code{out} is @code{NULL},
2492
@code{out} is ignored, too.  Otherwise, @code{out} is interpreted as a
2493
pointer to an array of @code{n} complex numbers, that FFTW will use as
2494
temporary space to perform the in-place computation.  @code{out} is used
2495
as scratch space and its contents destroyed.  In this case, @code{out}
2496
must be an ordinary array whose elements are contiguous in memory (no
2497
striding).
2498
@end itemize
2499
 
2500
@end itemize
2501
 
2502
The function @code{rfftw_one} transforms a single, contiguous input array
2503
to a contiguous output array.  By definition, the call
2504
@example
2505
rfftw_one(plan, in, out)
2506
@end example
2507
is equivalent to
2508
@example
2509
rfftw(plan, 1, in, 1, 1, out, 1, 1)
2510
@end example
2511
 
2512
@node    rfftw_destroy_plan, What RFFTW Really Computes, rfftw, Real One-dimensional Transforms Reference
2513
@subsection Destroying a Real One-dimensional Plan
2514
 
2515
@example
2516
#include <rfftw.h>
2517
 
2518
void rfftw_destroy_plan(rfftw_plan plan);
2519
@end example
2520
@findex rfftw_destroy_plan
2521
 
2522
The function @code{rfftw_destroy_plan} frees the plan @code{plan} and
2523
releases all the memory associated with it.  After destruction, a plan
2524
is no longer valid.
2525
 
2526
@node What RFFTW Really Computes,  , rfftw_destroy_plan, Real One-dimensional Transforms Reference
2527
@subsection What RFFTW Really Computes
2528
@cindex Discrete Fourier Transform
2529
In this section, we define precisely what RFFTW computes.
2530
 
2531
The real to complex (@code{FFTW_REAL_TO_COMPLEX}) transform of a real
2532
array @math{X} of size @math{n} computes an hermitian array @math{Y},
2533
where
2534
@iftex
2535
@tex
2536
$$
2537
Y_i = \sum_{j = 0}^{n - 1} X_j e^{-2\pi i j \sqrt{-1}/n}
2538
$$
2539
@end tex
2540
@end iftex
2541
@ifinfo
2542
@center Y[i] = sum for j = 0 to (n - 1) of X[j] * exp(-2 pi i j sqrt(-1)/n)
2543
@end ifinfo
2544
@ifhtml
2545
<center><IMG SRC="equation-1.gif" ALIGN="top"></center>
2546
@end ifhtml
2547
(That @math{Y} is a hermitian array is not intended to be obvious,
2548
although the proof is easy.)  The hermitian array @math{Y} is stored in
2549
halfcomplex order (@pxref{Data Types}).  Currently, RFFTW provides no
2550
way to compute a real to complex transform with a positive sign in the
2551
exponent.
2552
 
2553
The complex to real (@code{FFTW_COMPLEX_TO_REAL}) transform of a hermitian
2554
array @math{X} of size @math{n} computes a real array @math{Y}, where
2555
@iftex
2556
@tex
2557
$$
2558
Y_i = \sum_{j = 0}^{n - 1} X_j e^{2\pi i j \sqrt{-1}/n}
2559
$$
2560
@end tex
2561
@end iftex
2562
@ifinfo
2563
@center Y[i] = sum for j = 0 to (n - 1) of X[j] * exp(2 pi i j sqrt(-1)/n)
2564
@end ifinfo
2565
@ifhtml
2566
<center><IMG SRC="equation-2.gif" ALIGN="top"></center>
2567
@end ifhtml
2568
(That @math{Y} is a real array is not intended to be obvious, although
2569
the proof is easy.)  The hermitian input array @math{X} is stored in
2570
halfcomplex order (@pxref{Data Types}).  Currently, RFFTW provides no
2571
way to compute a complex to real transform with a negative sign in the
2572
exponent.
2573
 
2574
@cindex normalization
2575
Like FFTW, RFFTW computes an unnormalized transform.  In other words,
2576
applying the real to complex (forward) and then the complex to real
2577
(backward) transform will multiply the input by @math{n}.
2578
 
2579
@c -------------------------------------------------------
2580
@node Real Multi-dimensional Transforms Reference, Wisdom Reference, Real One-dimensional Transforms Reference, FFTW Reference
2581
@section Real Multi-dimensional Transforms Reference
2582
@cindex real multi-dimensional transform
2583
@cindex multi-dimensional transform
2584
 
2585
The multi-dimensional real routines are generally prefixed with
2586
@code{rfftwnd_}.  Programs using RFFTWND should be linked with
2587
@code{-lrfftw -lfftw -lm} on Unix systems, or with the FFTW, RFFTW, and
2588
standard math libraries in general.
2589
@cindex RFFTWND
2590
 
2591
@menu
2592
* rfftwnd_create_plan::         Plan Creation
2593
* rfftwnd::                     Plan Execution
2594
* Array Dimensions for Real Multi-dimensional Transforms::  
2595
* Strides in In-place RFFTWND::  
2596
* rfftwnd_destroy_plan::        Plan Destruction
2597
* What RFFTWND Really Computes::  
2598
@end menu
2599
 
2600
@node   rfftwnd_create_plan, rfftwnd, Real Multi-dimensional Transforms Reference, Real Multi-dimensional Transforms Reference
2601
@subsection Plan Creation for Real Multi-dimensional Transforms
2602
 
2603
@example
2604
#include <rfftw.h>
2605
 
2606
rfftwnd_plan rfftwnd_create_plan(int rank, const int *n,
2607
                                 fftw_direction dir, int flags);
2608
 
2609
rfftwnd_plan rfftw2d_create_plan(int nx, int ny,
2610
                                 fftw_direction dir, int flags);
2611
 
2612
rfftwnd_plan rfftw3d_create_plan(int nx, int ny, int nz,
2613
                                 fftw_direction dir, int flags);
2614
@end example
2615
@tindex rfftwnd_plan
2616
@tindex fftw_direction
2617
@findex rfftwnd_create_plan
2618
@findex rfftw2d_create_plan
2619
@findex rfftw3d_create_plan
2620
 
2621
The function @code{rfftwnd_create_plan} creates a plan, which is a data
2622
structure containing all the information that @code{rfftwnd} needs in
2623
order to compute a multi-dimensional real Fourier transform.  You can
2624
create as many plans as you need, but only one plan for a given array
2625
size is required (a plan can be reused many times).  The functions
2626
@code{rfftw2d_create_plan} and @code{rfftw3d_create_plan} are optional,
2627
alternative interfaces to @code{rfftwnd_create_plan} for two and three
2628
dimensions, respectively.
2629
 
2630
@code{rfftwnd_create_plan} returns a valid plan, or @code{NULL} if, for
2631
some reason, the plan can't be created.  This can happen if the
2632
arguments are invalid in some way (e.g. if @code{rank} < 0).
2633
 
2634
@subsubheading Arguments
2635
@itemize @bullet
2636
@item
2637
@code{rank} is the dimensionality of the arrays to be transformed.  It
2638
can be any non-negative integer.
2639
 
2640
@item
2641
@code{n} is a pointer to an array of @code{rank} integers, giving the
2642
size of each dimension of the arrays to be transformed.  Note that these
2643
are always the dimensions of the @emph{real} arrays; the complex arrays
2644
have different dimensions (@pxref{Array Dimensions for Real
2645
Multi-dimensional Transforms}).  These sizes, which must be positive
2646
integers, correspond to the dimensions of row-major
2647
arrays---i.e. @code{n[0]} is the size of the dimension whose indices
2648
vary most slowly, and so on. (@xref{Multi-dimensional Array Format}, for
2649
more information.)
2650
@itemize @minus
2651
@item
2652
@xref{rfftw_create_plan,,Plan Creation for Real One-dimensional Transforms},
2653
for more information regarding optimal array sizes.
2654
@end itemize
2655
 
2656
@item
2657
@code{nx} and @code{ny} in @code{rfftw2d_create_plan} are positive
2658
integers specifying the dimensions of the rank 2 array to be
2659
transformed. i.e. they specify that the transform will operate on
2660
@code{nx x ny} arrays in row-major order, where @code{nx} is the number
2661
of rows and @code{ny} is the number of columns.
2662
 
2663
@item
2664
@code{nx}, @code{ny} and @code{nz} in @code{rfftw3d_create_plan} are
2665
positive integers specifying the dimensions of the rank 3 array to be
2666
transformed. i.e. they specify that the transform will operate on
2667
@code{nx x ny x nz} arrays in row-major order.
2668
 
2669
@item
2670
@code{dir} is the direction of the desired transform, either
2671
@code{FFTW_REAL_TO_COMPLEX} or @code{FFTW_COMPLEX_TO_REAL},
2672
corresponding to @code{FFTW_FORWARD} or @code{FFTW_BACKWARD},
2673
respectively.
2674
 
2675
@item
2676
@cindex flags
2677
@code{flags} is a boolean OR (@samp{|}) of zero or more of the following:
2678
@itemize @minus
2679
@item
2680
@code{FFTW_MEASURE}: this flag tells FFTW to find the optimal plan by
2681
actually @emph{computing} several FFTs and measuring their execution
2682
time.
2683
 
2684
@item
2685
@code{FFTW_ESTIMATE}: do not run any FFT and provide a ``reasonable''
2686
plan (for a RISC processor with many registers).  If neither
2687
@code{FFTW_ESTIMATE} nor @code{FFTW_MEASURE} is provided, the default is
2688
@code{FFTW_ESTIMATE}.
2689
 
2690
@item
2691
@code{FFTW_OUT_OF_PLACE}: produce a plan assuming that the input
2692
  and output arrays will be distinct (this is the default).
2693
 
2694
@item
2695
@cindex in-place transform
2696
@code{FFTW_IN_PLACE}: produce a plan assuming that you want to perform
2697
the transform in-place.  (Unlike the one-dimensional transform, this
2698
``really'' performs the transform in-place.) Note that, if you want to
2699
perform in-place transforms, you @emph{must} use a plan created with
2700
this option.  The use of this option has important implications for the
2701
size of the input/output array (@pxref{rfftwnd,,Computing the Real
2702
Multi-dimensional Transform}).
2703
 
2704
The default mode of operation is @code{FFTW_OUT_OF_PLACE}.
2705
 
2706
@item
2707
@cindex wisdom
2708
@code{FFTW_USE_WISDOM}: use any @code{wisdom} that is available to help
2709
in the creation of the plan. (@xref{Words of Wisdom}.)  This can greatly
2710
speed the creation of plans, especially with the @code{FFTW_MEASURE}
2711
option. @code{FFTW_ESTIMATE} plans can also take advantage of
2712
@code{wisdom} to produce a more optimal plan (based on past
2713
measurements) than the estimation heuristic would normally
2714
generate. When the @code{FFTW_MEASURE} option is used, new @code{wisdom}
2715
will also be generated if the current transform size is not completely
2716
understood by existing @code{wisdom}. Note that the same @code{wisdom}
2717
is shared between one-dimensional and multi-dimensional transforms.
2718
 
2719
@end itemize
2720
 
2721
@end itemize
2722
 
2723
@node    rfftwnd, Array Dimensions for Real Multi-dimensional Transforms, rfftwnd_create_plan, Real Multi-dimensional Transforms Reference
2724
@subsection Computing the Real Multi-dimensional Transform
2725
 
2726
@example
2727
#include <rfftw.h>
2728
 
2729
void rfftwnd_real_to_complex(rfftwnd_plan plan, int howmany,
2730
                             fftw_real *in, int istride, int idist,
2731
                             fftw_complex *out, int ostride, int odist);
2732
void rfftwnd_complex_to_real(rfftwnd_plan plan, int howmany,
2733
                             fftw_complex *in, int istride, int idist,
2734
                             fftw_real *out, int ostride, int odist);
2735
 
2736
void rfftwnd_one_real_to_complex(rfftwnd_plan p, fftw_real *in,
2737
                                 fftw_complex *out);
2738
void rfftwnd_one_complex_to_real(rfftwnd_plan p, fftw_complex *in,
2739
                                 fftw_real *out);
2740
@end example
2741
@findex rfftwnd_real_to_complex
2742
@findex rfftwnd_complex_to_real
2743
@findex rfftwnd_one_real_to_complex
2744
@findex rfftwnd_one_complex_to_real
2745
 
2746
These functions compute the real multi-dimensional Fourier Transform,
2747
using a plan created by @code{rfftwnd_create_plan}
2748
(@pxref{rfftwnd_create_plan,,Plan Creation for Real Multi-dimensional
2749
Transforms}). (Note that the plan determines the rank and dimensions of
2750
the array to be transformed.)  The @samp{@code{rfftwnd_one_}} functions
2751
provide a simplified interface for the common case of single input array
2752
of stride 1.  Unlike other transform routines in FFTW, we here use
2753
separate functions for the two directions of the transform in order to
2754
correctly express the datatypes of the parameters.
2755
 
2756
@emph{Important:} When invoked for an out-of-place,
2757
@code{FFTW_COMPLEX_TO_REAL} transform with @code{rank > 1}, the input
2758
array is overwritten with scratch values by these routines.  The input
2759
array is not modified for @code{FFTW_REAL_TO_COMPLEX} transforms or for
2760
@code{FFTW_COMPLEX_TO_REAL} with @code{rank == 1}.
2761
 
2762
@subsubheading Arguments
2763
@itemize @bullet
2764
@item
2765
@code{plan} is the plan created by @code{rfftwnd_create_plan}.
2766
(@pxref{rfftwnd_create_plan,,Plan Creation for Real Multi-dimensional
2767
Transforms}). In the case of two and three-dimensional transforms, it
2768
could also have been created by @code{rfftw2d_create_plan} or
2769
@code{rfftw3d_create_plan}, respectively.
2770
 
2771
@code{FFTW_REAL_TO_COMPLEX} plans must be used with the
2772
@samp{@code{real_to_complex}} functions, and @code{FFTW_COMPLEX_TO_REAL}
2773
plans must be used with the @samp{@code{complex_to_real}} functions.  It
2774
is an error to mismatch the plan direction and the transform function.
2775
 
2776
@item
2777
@code{howmany} is the number of transforms to be computed.
2778
 
2779
@item
2780
@cindex stride
2781
@code{in}, @code{istride} and @code{idist} describe the input array(s).
2782
There are @code{howmany} input arrays; the first one is pointed to by
2783
@code{in}, the second one is pointed to by @code{in + idist}, and so on,
2784
up to @code{in + (howmany - 1) * idist}.  Each input array is stored in
2785
row-major format (@pxref{Multi-dimensional Array Format}), and is not
2786
necessarily contiguous in memory.  Specifically, @code{in[0]} is the
2787
first element of the first array, @code{in[istride]} is the second
2788
element of the first array, and so on.  In general, the @code{i}-th
2789
element of the @code{j}-th input array will be in position @code{in[i *
2790
istride + j * idist]}. Note that, here, @code{i} refers to an index into
2791
the row-major format for the multi-dimensional array, rather than an
2792
index in any particular dimension.
2793
 
2794
The dimensions of the arrays are different for real and complex data,
2795
and are discussed in more detail below (@pxref{Array Dimensions for Real
2796
Multi-dimensional Transforms}).
2797
 
2798
@itemize @minus
2799
@item @emph{In-place transforms}:
2800
For plans created with the @code{FFTW_IN_PLACE} option, the transform is
2801
computed in-place---the output is returned in the @code{in} array.  The
2802
meaning of the @code{stride} and @code{dist} parameters in this case is
2803
subtle and is discussed below (@pxref{Strides in In-place RFFTWND}).
2804
@end itemize
2805
 
2806
@item
2807
@code{out}, @code{ostride} and @code{odist} describe the output
2808
array(s).  The format is the same as that for the input array.  See
2809
below for a discussion of the dimensions of the output array for real
2810
and complex data.
2811
 
2812
@itemize @minus
2813
@item @emph{In-place transforms}:
2814
These parameters are ignored for plans created with the
2815
@code{FFTW_IN_PLACE} option.
2816
@end itemize
2817
 
2818
@end itemize
2819
 
2820
The function @code{rfftwnd_one} transforms a single, contiguous input
2821
array to a contiguous output array.  By definition, the call
2822
@example
2823
rfftwnd_one_...(plan, in, out)
2824
@end example
2825
is equivalent to
2826
@example
2827
rfftwnd_...(plan, 1, in, 1, 1, out, 1, 1)
2828
@end example
2829
 
2830
@node Array Dimensions for Real Multi-dimensional Transforms, Strides in In-place RFFTWND, rfftwnd, Real Multi-dimensional Transforms Reference
2831
@subsection Array Dimensions for Real Multi-dimensional Transforms
2832
 
2833
@cindex rfftwnd array format
2834
The output of a multi-dimensional transform of real data contains
2835
symmetries that, in principle, make half of the outputs redundant
2836
(@pxref{What RFFTWND Really Computes}).  In practice, it is not
2837
possible to entirely realize these savings in an efficient and
2838
understandable format.  Instead, the output of the rfftwnd transforms is
2839
@emph{slightly} over half of the output of the corresponding complex
2840
transform.  We do not ``pack'' the data in any way, but store it as an
2841
ordinary array of @code{fftw_complex} values.  In fact, this data is
2842
simply a subsection of what would be the array in the corresponding
2843
complex transform.
2844
 
2845
Specifically, for a real transform of dimensions
2846
@iftex
2847
@tex
2848
$n_1 \times n_2 \times \cdots \times n_d$,
2849
@end tex
2850
@end iftex
2851
@ifinfo
2852
n1 x n2 x ... x nd,
2853
@end ifinfo
2854
@ifhtml
2855
n<sub>1</sub> x n<sub>2</sub> x ... x n<sub>d</sub>,
2856
@end ifhtml
2857
the complex data is an
2858
@iftex
2859
@tex
2860
$n_1 \times n_2 \times \cdots \times (n_d/2+1)$
2861
@end tex
2862
@end iftex
2863
@ifinfo
2864
n1 x n2 x ... x (nd/2+1)
2865
@end ifinfo
2866
@ifhtml
2867
n<sub>1</sub> x n<sub>2</sub> x ... x (n<sub>d</sub>/2+1)
2868
@end ifhtml
2869
array of @code{fftw_complex} values in row-major order (with the
2870
division rounded down).  That is, we only store the lower half (plus one
2871
element) of the last dimension of the data from the ordinary complex
2872
transform.  (We could have instead taken half of any other dimension,
2873
but implementation turns out to be simpler if the last, contiguous,
2874
dimension is used.)
2875
 
2876
@cindex in-place transform
2877
@cindex padding
2878
Since the complex data is slightly larger than the real data, some
2879
complications arise for in-place transforms.  In this case, the final
2880
dimension of the real data must be padded with extra values to
2881
accommodate the size of the complex data---two extra if the last
2882
dimension is even and one if it is odd.  That is, the last dimension of
2883
the real data must physically contain
2884
@iftex
2885
@tex
2886
$2 (n_d/2+1)$
2887
@end tex
2888
@end iftex
2889
@ifinfo
2890
2 * (nd/2+1)
2891
@end ifinfo
2892
@ifhtml
2893
2 * (n<sub>d</sub>/2+1)
2894
@end ifhtml
2895
@code{fftw_real} values (exactly enough to hold the complex data).
2896
This physical array size does not, however, change the @emph{logical}
2897
array size---only
2898
@iftex
2899
@tex
2900
$n_d$
2901
@end tex
2902
@end iftex
2903
@ifinfo
2904
nd
2905
@end ifinfo
2906
@ifhtml
2907
n<sub>d</sub>
2908
@end ifhtml
2909
values are actually stored in the last dimension, and
2910
@iftex
2911
@tex
2912
$n_d$
2913
@end tex
2914
@end iftex
2915
@ifinfo
2916
nd
2917
@end ifinfo
2918
@ifhtml
2919
n<sub>d</sub>
2920
@end ifhtml
2921
is the last dimension passed to @code{rfftwnd_create_plan}.
2922
 
2923
@node Strides in In-place RFFTWND, rfftwnd_destroy_plan, Array Dimensions for Real Multi-dimensional Transforms, Real Multi-dimensional Transforms Reference
2924
@subsection Strides in In-place RFFTWND
2925
 
2926
@cindex rfftwnd array format
2927
@cindex stride
2928
The fact that the input and output datatypes are different for rfftwnd
2929
complicates the meaning of the @code{stride} and @code{dist} parameters
2930
of in-place transforms---are they in units of @code{fftw_real} or
2931
@code{fftw_complex} elements?  When reading the input, they are
2932
interpreted in units of the datatype of the input data.  When writing
2933
the output, the @code{istride} and @code{idist} are translated to the
2934
output datatype's ``units'' in one of two ways, corresponding to the two
2935
most common situations in which @code{stride} and @code{dist} parameters
2936
are useful.  Below, we refer to these ``translated'' parameters as
2937
@code{ostride_t} and @code{odist_t}.  (Note that these are computed
2938
internally by rfftwnd; the actual @code{ostride} and @code{odist}
2939
parameters are ignored for in-place transforms.)
2940
 
2941
First, there is the case where you are transforming a number of
2942
contiguous arrays located one after another in memory.  In this
2943
situation, @code{istride} is @code{1} and @code{idist} is the product of
2944
the physical dimensions of the array.  @code{ostride_t} and
2945
@code{odist_t} are then chosen so that the output arrays are contiguous
2946
and lie on top of the input arrays.  @code{ostride_t} is therefore
2947
@code{1}.  For a real-to-complex transform, @code{odist_t} is
2948
@code{idist/2}; for a complex-to-real transform, @code{odist_t} is
2949
@code{idist*2}.
2950
 
2951
The second case is when you have an array in which each element has
2952
@code{nc} components (e.g. a structure with @code{nc} numeric fields),
2953
and you want to transform all of the components at once.  Here,
2954
@code{istride} is @code{nc} and @code{idist} is @code{1}.  For this
2955
case, it is natural to want the output to also have @code{nc}
2956
consecutive components, now of the output data type; this is exactly
2957
what rfftwnd does.  Specifically, it uses an @code{ostride_t} equal to
2958
@code{istride}, and an @code{odist_t} of @code{1}.  (Astute readers will
2959
realize that some extra buffer space is required in order to perform
2960
such a transform; this is handled automatically by rfftwnd.)
2961
 
2962
The general rule is as follows.  @code{ostride_t} equals @code{istride}.
2963
If @code{idist} is @code{1}, then @code{odist_t} is @code{1}.
2964
Otherwise, for a real-to-complex transform @code{odist_t} is
2965
@code{idist/2} and for a complex-to-real transform @code{odist_t} is
2966
@code{idist*2}.
2967
 
2968
@node    rfftwnd_destroy_plan, What RFFTWND Really Computes, Strides in In-place RFFTWND, Real Multi-dimensional Transforms Reference
2969
@subsection Destroying a Multi-dimensional Plan
2970
 
2971
@example
2972
#include <rfftw.h>
2973
 
2974
void rfftwnd_destroy_plan(rfftwnd_plan plan);
2975
@end example
2976
@findex rfftwnd_destroy_plan
2977
 
2978
The function @code{rfftwnd_destroy_plan} frees the plan @code{plan}
2979
and releases all the memory associated with it.  After destruction,
2980
a plan is no longer valid.
2981
 
2982
@node What RFFTWND Really Computes,  , rfftwnd_destroy_plan, Real Multi-dimensional Transforms Reference
2983
@subsection What RFFTWND Really Computes
2984
@cindex Discrete Fourier Transform
2985
 
2986
The conventions that we follow for the real multi-dimensional transform
2987
are analogous to those for the complex multi-dimensional transform. In
2988
particular, the forward transform has a negative sign in the exponent
2989
and neither the forward nor the backward transforms will perform any
2990
normalization.  Computing the backward transform of the forward
2991
transform will multiply the array by the product of its dimensions (that
2992
is, the logical dimensions of the real data).  The forward transform is
2993
real-to-complex and the backward transform is complex-to-real.
2994
 
2995
@cindex Discrete Fourier Transform
2996
@cindex hermitian array
2997
@iftex
2998
@tex
2999
The exact mathematical definition of our real multi-dimensional
3000
transform follows.
3001
 
3002
@noindent@emph{Real to complex (forward) transform.}
3003
Let $X$ be a $d$-dimensional real array whose elements are $X[j_1, j_2,
3004
\ldots, j_d]$, where $0 \leq j_s < n_s$ for all~$s \in \{ 1, 2, \ldots,
3005
d \}$.  Let also $\omega_s = e^{2\pi \sqrt{-1}/n_s}$, for all ~$s \in \{
3006
1, 2, \ldots, d \}$.
3007
 
3008
The real to complex transform computes a complex array~$Y$, whose
3009
structure is the same as that of~$X$, defined by
3010
 
3011
$$
3012
Y[i_1, i_2, \ldots, i_d] =
3013
    \sum_{j_1 = 0}^{n_1 - 1}
3014
        \sum_{j_2 = 0}^{n_2 - 1}
3015
           \cdots
3016
              \sum_{j_d = 0}^{n_d - 1}
3017
                  X[j_1, j_2, \ldots, j_d]
3018
                      \omega_1^{-i_1 j_1}
3019
                      \omega_2^{-i_2 j_2}
3020
                      \cdots
3021
                      \omega_d^{-i_d j_d} \ .
3022
$$
3023
 
3024
The output array $Y$ enjoys a multidimensional hermitian symmetry, that
3025
is, the identity $Y[i_1, i_2, \ldots, i_d] = Y[n_1-i_1, n_2-i_2, \ldots,
3026
n_d - i_d]^{*}$ holds for all $0 \leq i_s < n_s$.  Because of this
3027
symmetry, $Y$ is stored in the peculiar way described in @ref{Array
3028
Dimensions for Real Multi-dimensional Transforms}.
3029
@cindex hermitian array
3030
 
3031
@noindent@emph{Complex to real (backward) transform.}  Let $X$ be a
3032
$d$-dimensional complex array whose elements are $X[j_1, j_2, \ldots,
3033
j_d]$, where $0 \leq j_s < n_s$ for all~$s \in \{ 1, 2, \ldots, d \}$.
3034
The array $X$ must be hermitian, that is, the identity $X[j_1, j_2,
3035
\ldots, j_d] = X[n_1-j_1, n_2-j_2, \ldots, n_d - j_d]^{*}$ must hold for all
3036
$0 \leq j_s < n_s$.  Moreover, $X$ must be stored in memory in the
3037
peculiar way described in @ref{Array Dimensions for Real
3038
Multi-dimensional Transforms}.
3039
 
3040
Let $\omega_s = e^{2\pi \sqrt{-1}/n_s}$, for all ~$s \in \{ 1, 2,
3041
\ldots, d \}$.  The complex to real transform computes a real array~$Y$, whose
3042
structure is the same as that of~$X$, defined by
3043
 
3044
$$
3045
Y[i_1, i_2, \ldots, i_d] =
3046
    \sum_{j_1 = 0}^{n_1 - 1}
3047
        \sum_{j_2 = 0}^{n_2 - 1}
3048
           \cdots
3049
              \sum_{j_d = 0}^{n_d - 1}
3050
                  X[j_1, j_2, \ldots, j_d]
3051
                      \omega_1^{i_1 j_1}
3052
                      \omega_2^{i_2 j_2}
3053
                      \cdots
3054
                      \omega_d^{i_d j_d} \ .
3055
$$
3056
 
3057
(That $Y$ is real is not meant to be obvious, although the proof is
3058
easy.)
3059
 
3060
Computing the forward transform followed by the backward transform
3061
will multiply the array by $\prod_{s=1}^{d} n_d$.
3062
@end tex
3063
@end iftex
3064
@ifinfo
3065
The @TeX{} version of this manual contains the exact definition of the
3066
@math{n}-dimensional transform RFFTWND uses.  It is not possible to
3067
display the definition on a ASCII terminal properly.
3068
@end ifinfo
3069
@ifhtml
3070
The Gods forbade using HTML to display mathematical formulas.  Please
3071
see the TeX or Postscript version of this manual for the proper
3072
definition of the n-dimensional real Fourier transform that RFFTW
3073
uses.  For completeness, we include a bitmap of the TeX output below:
3074
<P><center><IMG SRC="equation-4.gif" ALIGN="top"></center>
3075
@end ifhtml
3076
 
3077
 
3078
@c -------------------------------------------------------
3079
@node Wisdom Reference, Memory Allocator Reference, Real Multi-dimensional Transforms Reference, FFTW Reference
3080
@section Wisdom Reference
3081
@menu
3082
* fftw_export_wisdom::          
3083
* fftw_import_wisdom::          
3084
* fftw_forget_wisdom::          
3085
@end menu
3086
 
3087
@cindex wisdom
3088
@node    fftw_export_wisdom, fftw_import_wisdom, Wisdom Reference, Wisdom Reference
3089
@subsection Exporting Wisdom
3090
 
3091
@example
3092
#include <fftw.h>
3093
 
3094
void fftw_export_wisdom(void (*emitter)(char c, void *), void *data);
3095
void fftw_export_wisdom_to_file(FILE *output_file);
3096
char *fftw_export_wisdom_to_string(void);
3097
@end example
3098
@findex fftw_export_wisdom
3099
@findex fftw_export_wisdom_to_file
3100
@findex fftw_export_wisdom_to_string
3101
 
3102
These functions allow you to export all currently accumulated
3103
@code{wisdom} in a form from which it can be later imported and
3104
restored, even during a separate run of the program. (@xref{Words of Wisdom}.)  The current store of @code{wisdom} is not
3105
affected by calling any of these routines.
3106
 
3107
@code{fftw_export_wisdom} exports the @code{wisdom} to any output
3108
medium, as specified by the callback function
3109
@code{emitter}. @code{emitter} is a @code{putc}-like function that
3110
writes the character @code{c} to some output; its second parameter is
3111
the @code{data} pointer passed to @code{fftw_export_wisdom}.  For
3112
convenience, the following two ``wrapper'' routines are provided:
3113
 
3114
@code{fftw_export_wisdom_to_file} writes the @code{wisdom} to the
3115
current position in @code{output_file}, which should be open with write
3116
permission.  Upon exit, the file remains open and is positioned at the
3117
end of the @code{wisdom} data.
3118
 
3119
@code{fftw_export_wisdom_to_string} returns a pointer to a
3120
@code{NULL}-terminated string holding the @code{wisdom} data. This
3121
string is dynamically allocated, and it is the responsibility of the
3122
caller to deallocate it with @code{fftw_free} when it is no longer
3123
needed.
3124
 
3125
All of these routines export the wisdom in the same format, which we
3126
will not document here except to say that it is LISP-like ASCII text
3127
that is insensitive to white space.
3128
 
3129
@node    fftw_import_wisdom, fftw_forget_wisdom, fftw_export_wisdom, Wisdom Reference
3130
@subsection Importing Wisdom
3131
 
3132
@example
3133
#include <fftw.h>
3134
 
3135
fftw_status fftw_import_wisdom(int (*get_input)(void *), void *data);
3136
fftw_status fftw_import_wisdom_from_file(FILE *input_file);
3137
fftw_status fftw_import_wisdom_from_string(const char *input_string);
3138
@end example
3139
@findex fftw_import_wisdom
3140
@findex fftw_import_wisdom_from_file
3141
@findex fftw_import_wisdom_from_string
3142
 
3143
These functions import @code{wisdom} into a program from data stored by
3144
the @code{fftw_export_wisdom} functions above. (@xref{Words of Wisdom}.)
3145
The imported @code{wisdom} supplements rather than replaces any
3146
@code{wisdom} already accumulated by the running program (except when
3147
there is conflicting @code{wisdom}, in which case the existing wisdom is
3148
replaced).
3149
 
3150
@code{fftw_import_wisdom} imports @code{wisdom} from any input medium,
3151
as specified by the callback function @code{get_input}. @code{get_input}
3152
is a @code{getc}-like function that returns the next character in the
3153
input; its parameter is the @code{data} pointer passed to
3154
@code{fftw_import_wisdom}. If the end of the input data is reached
3155
(which should never happen for valid data), it may return either
3156
@code{NULL} (ASCII 0) or @code{EOF} (as defined in @code{<stdio.h>}).
3157
For convenience, the following two ``wrapper'' routines are provided:
3158
 
3159
@code{fftw_import_wisdom_from_file} reads @code{wisdom} from the
3160
current position in @code{input_file}, which should be open with read
3161
permission.  Upon exit, the file remains open and is positioned at the
3162
end of the @code{wisdom} data.
3163
 
3164
@code{fftw_import_wisdom_from_string} reads @code{wisdom} from the
3165
@code{NULL}-terminated string @code{input_string}.
3166
 
3167
The return value of these routines is @code{FFTW_SUCCESS} if the wisdom
3168
was read successfully, and @code{FFTW_FAILURE} otherwise. Note that, in
3169
all of these functions, any data in the input stream past the end of the
3170
@code{wisdom} data is simply ignored (it is not even read if the
3171
@code{wisdom} data is well-formed).
3172
 
3173
@node    fftw_forget_wisdom,  , fftw_import_wisdom, Wisdom Reference
3174
@subsection Forgetting Wisdom
3175
 
3176
@example
3177
#include <fftw.h>
3178
 
3179
void fftw_forget_wisdom(void);
3180
@end example
3181
@findex fftw_forget_wisdom
3182
 
3183
Calling @code{fftw_forget_wisdom} causes all accumulated @code{wisdom}
3184
to be discarded and its associated memory to be freed. (New
3185
@code{wisdom} can still be gathered subsequently, however.)
3186
 
3187
@c -------------------------------------------------------
3188
@node    Memory Allocator Reference, Thread safety, Wisdom Reference, FFTW Reference
3189
@section Memory Allocator Reference
3190
 
3191
@example
3192
#include <fftw.h>
3193
 
3194
void *(*fftw_malloc_hook) (size_t n);
3195
void (*fftw_free_hook) (void *p);
3196
@end example
3197
@vindex fftw_malloc_hook
3198
@findex fftw_malloc
3199
@ffindex malloc
3200
@vindex fftw_free_hook
3201
 
3202
Whenever it has to allocate and release memory, FFTW ordinarily calls
3203
@code{malloc} and @code{free}.  
3204
If @code{malloc} fails, FFTW prints an error message and exits.  This
3205
behavior may be undesirable in some applications. Also, special
3206
memory-handling functions may be necessary in certain
3207
environments. Consequently, FFTW provides means by which you can install
3208
your own memory allocator and take whatever error-correcting action you
3209
find appropriate.  The variables @code{fftw_malloc_hook} and
3210
@code{fftw_free_hook} are pointers to functions, and they are normally
3211
@code{NULL}.  If you set those variables to point to other functions,
3212
then FFTW will use your routines instead of @code{malloc} and
3213
@code{free}.  @code{fftw_malloc_hook} must point to a @code{malloc}-like
3214
function, and @code{fftw_free_hook} must point to a @code{free}-like
3215
function.
3216
 
3217
@c -------------------------------------------------------
3218
@node Thread safety,  , Memory Allocator Reference, FFTW Reference
3219
@section Thread safety
3220
 
3221
@cindex threads
3222
@cindex thread safety
3223
Users writing multi-threaded programs must concern themselves with the
3224
@dfn{thread safety} of the libraries they use---that is, whether it is
3225
safe to call routines in parallel from multiple threads.  FFTW can be
3226
used in such an environment, but some care must be taken because certain
3227
parts of FFTW use private global variables to share data between calls.
3228
In particular, the plan-creation functions share trigonometric tables
3229
and accumulated @code{wisdom}.  (Users should note that these comments
3230
only apply to programs using shared-memory threads.  Parallelism using
3231
MPI or forked processes involves a separate address-space and global
3232
variables for each process, and is not susceptible to problems of this
3233
sort.)
3234
 
3235
The central restriction of FFTW is that it is not safe to create
3236
multiple plans in parallel.  You must either create all of your plans
3237
from a single thread, or instead use a semaphore, mutex, or other
3238
mechanism to ensure that different threads don't attempt to create plans
3239
at the same time.  The same restriction also holds for destruction of
3240
plans and importing/forgetting @code{wisdom}.  Once created, a plan may
3241
safely be used in any thread.
3242
 
3243
The actual transform routines in FFTW (@code{fftw_one}, etcetera) are
3244
re-entrant and thread-safe, so it is fine to call them simultaneously
3245
from multiple threads.  Another question arises, however---is it safe to
3246
use the @emph{same plan} for multiple transforms in parallel?  (It would
3247
be unsafe if, for example, the plan were modified in some way by the
3248
transform.)  We address this question by defining an additional planner
3249
flag, @code{FFTW_THREADSAFE}.
3250
@ctindex FFTW_THREADSAFE
3251
When included in the flags for any of the plan-creation routines,
3252
@code{FFTW_THREADSAFE} guarantees that the resulting plan will be
3253
read-only and safe to use in parallel by multiple threads.
3254
 
3255
@c ************************************************************
3256
@node Parallel FFTW, Calling FFTW from Fortran, FFTW Reference, Top
3257
@chapter Parallel FFTW
3258
 
3259
@cindex parallel transform
3260
In this chapter we discuss the use of FFTW in a parallel environment,
3261
documenting the different parallel libraries that we have provided.
3262
(Users calling FFTW from a multi-threaded program should also consult
3263
@ref{Thread safety}.)  The FFTW package currently contains three parallel
3264
transform implementations that leverage the uniprocessor FFTW code:
3265
 
3266
@itemize @bullet
3267
 
3268
@item
3269
@cindex threads
3270
The first set of routines utilizes shared-memory threads for parallel
3271
one- and multi-dimensional transforms of both real and complex data.
3272
Any program using FFTW can be trivially modified to use the
3273
multi-threaded routines.  This code can use any common threads
3274
implementation, including POSIX threads.  (POSIX threads are available
3275
on most Unix variants, including Linux.)  These routines are located in
3276
the @code{threads} directory, and are documented in @ref{Multi-threaded
3277
FFTW}.
3278
 
3279
@item
3280
@cindex MPI
3281
@cindex distributed memory
3282
The @code{mpi} directory contains multi-dimensional transforms
3283
of real and complex data for parallel machines supporting MPI.  It also
3284
includes parallel one-dimensional transforms for complex data.  The main
3285
feature of this code is that it supports distributed-memory transforms,
3286
so it runs on everything from workstation clusters to massively-parallel
3287
supercomputers.  More information on MPI can be found at the
3288
@uref{http://www.mcs.anl.gov/mpi, MPI home page}.  The FFTW MPI routines
3289
are documented in @ref{MPI FFTW}.
3290
 
3291
@item
3292
@cindex Cilk
3293
We also have an experimental parallel implementation written in Cilk, a
3294
C-like parallel language developed at MIT and currently available for
3295
several SMP platforms.  For more information on Cilk see
3296
@uref{http://supertech.lcs.mit.edu/cilk, the Cilk home page}.  The FFTW
3297
Cilk code can be found in the @code{cilk} directory, with parallelized
3298
one- and multi-dimensional transforms of complex data.  The Cilk FFTW
3299
routines are documented in @code{cilk/README}.
3300
 
3301
@end itemize
3302
 
3303
@menu
3304
* Multi-threaded FFTW::        
3305
* MPI FFTW::                    
3306
@end menu
3307
 
3308
@c ------------------------------------------------------------
3309
@node Multi-threaded FFTW, MPI FFTW, Parallel FFTW, Parallel FFTW
3310
@section Multi-threaded FFTW
3311
 
3312
@cindex threads
3313
In this section we document the parallel FFTW routines for shared-memory
3314
threads on SMP hardware.  These routines, which support parellel one-
3315
and multi-dimensional transforms of both real and complex data, are the
3316
easiest way to take advantage of multiple processors with FFTW.  They
3317
work just like the corresponding uniprocessor transform routines, except
3318
that they take the number of parallel threads to use as an extra
3319
parameter.  Any program that uses the uniprocessor FFTW can be trivially
3320
modified to use the multi-threaded FFTW.
3321
 
3322
@menu
3323
* Installation and Supported Hardware/Software::  
3324
* Usage of Multi-threaded FFTW::  
3325
* How Many Threads to Use?::    
3326
* Using Multi-threaded FFTW in a Multi-threaded Program::  
3327
* Tips for Optimal Threading::  
3328
@end menu
3329
 
3330
@c -------------------------------------------------------
3331
@node Installation and Supported Hardware/Software, Usage of Multi-threaded FFTW, Multi-threaded FFTW, Multi-threaded FFTW
3332
@subsection Installation and Supported Hardware/Software
3333
 
3334
All of the FFTW threads code is located in the @code{threads}
3335
subdirectory of the FFTW package.  On Unix systems, the FFTW threads
3336
libraries and header files can be automatically configured, compiled,
3337
and installed along with the uniprocessor FFTW libraries simply by
3338
including @code{--enable-threads} in the flags to the @code{configure}
3339
script (@pxref{Installation on Unix}).  (Note also that the threads
3340
routines, when enabled, are automatically tested by the @samp{@code{make
3341
check}} self-tests.)
3342
@fpindex configure
3343
 
3344
The threads routines require your operating system to have some sort of
3345
shared-memory threads support.  Specifically, the FFTW threads package
3346
works with POSIX threads (available on most Unix variants, including
3347
Linux), Solaris threads, @uref{http://www.be.com,BeOS} threads (tested
3348
on BeOS DR8.2), Mach C threads (reported to work by users), and Win32
3349
threads (reported to work by users).  (There is also untested code to
3350
use MacOS MP threads.)  If you have a shared-memory machine that uses a
3351
different threads API, it should be a simple matter of programming to
3352
include support for it; see the file @code{fftw_threads-int.h} for more
3353
detail.
3354
 
3355
SMP hardware is not required, although of course you need multiple
3356
processors to get any benefit from the multithreaded transforms.
3357
 
3358
@c -------------------------------------------------------
3359
@node Usage of Multi-threaded FFTW, How Many Threads to Use?, Installation and Supported Hardware/Software, Multi-threaded FFTW
3360
@subsection Usage of Multi-threaded FFTW
3361
 
3362
Here, it is assumed that the reader is already familiar with the usage
3363
of the uniprocessor FFTW routines, described elsewhere in this manual.
3364
We only describe what one has to change in order to use the
3365
multi-threaded routines.
3366
 
3367
First, instead of including @code{<fftw.h>} or @code{<rfftw.h>}, you
3368
should include the files @code{<fftw_threads.h>} or
3369
@code{<rfftw_threads.h>}, respectively.
3370
 
3371
Second, before calling any FFTW routines, you should call the function:
3372
 
3373
@example
3374
int fftw_threads_init(void);
3375
@end example
3376
@findex fftw_threads_init
3377
 
3378
This function, which should only be called once (probably in your
3379
@code{main()} function), performs any one-time initialization required
3380
to use threads on your system.  It returns zero if successful, and a
3381
non-zero value if there was an error (in which case, something is
3382
seriously wrong and you should probably exit the program).
3383
 
3384
Third, when you want to actually compute the transform, you should use
3385
one of the following transform routines instead of the ordinary FFTW
3386
functions:
3387
 
3388
@example
3389
fftw_threads(nthreads, plan, howmany, in, istride,
3390
             idist, out, ostride, odist);
3391
@findex fftw_threads
3392
 
3393
fftw_threads_one(nthreads, plan, in, out);
3394
@findex fftw_threads_one
3395
 
3396
fftwnd_threads(nthreads, plan, howmany, in, istride,
3397
               idist, out, ostride, odist);
3398
@findex fftwnd_threads
3399
 
3400
fftwnd_threads_one(nthreads, plan, in, out);
3401
@findex fftwnd_threads_one
3402
 
3403
rfftw_threads(nthreads, plan, howmany, in, istride,
3404
              idist, out, ostride, odist);
3405
@findex rfftw_threads
3406
 
3407
rfftw_threads_one(nthreads, plan, in, out);
3408
@findex rfftw_threads_one
3409
 
3410
rfftwnd_threads_real_to_complex(nthreads, plan, howmany, in,
3411
                                istride, idist, out, ostride, odist);
3412
@findex rfftwnd_threads_real_to_complex
3413
 
3414
rfftwnd_threads_one_real_to_complex(nthreads, plan, in, out);
3415
@findex rfftwnd_threads_one_real_to_complex
3416
 
3417
rfftwnd_threads_complex_to_real(nthreads, plan, howmany, in,
3418
                                istride, idist, out, ostride, odist);
3419
@findex rfftwnd_threads_complex_to_real
3420
 
3421
rfftwnd_threads_one_real_to_complex(nthreads, plan, in, out);
3422
@findex rfftwnd_threads_one_real_to_complex
3423
 
3424
rfftwnd_threads_one_complex_to_real(nthreads, plan, in, out);
3425
@findex rfftwnd_threads_one_complex_to_real
3426
@end example
3427
 
3428
All of these routines take exactly the same arguments and have exactly
3429
the same effects as their uniprocessor counterparts (i.e. without the
3430
@samp{@code{_threads}}) @emph{except} that they take one extra
3431
parameter, @code{nthreads} (of type @code{int}), before the normal
3432
parameters.@footnote{There is one exception: when performing
3433
one-dimensional in-place transforms, the @code{out} parameter is always
3434
ignored by the multi-threaded routines, instead of being used as a
3435
workspace if it is non-@code{NULL} as in the uniprocessor routines.  The
3436
multi-threaded routines always allocate their own workspace (the size of
3437
which depends upon the number of threads).}  The @code{nthreads}
3438
parameter specifies the number of threads of execution to use when
3439
performing the transform (actually, the maximum number of threads).
3440
@cindex number of threads
3441
 
3442
For example, to parallelize a single one-dimensional transform of
3443
complex data, instead of calling the uniprocessor @code{fftw_one(plan,
3444
in, out)}, you would call @code{fftw_threads_one(nthreads, plan, in,
3445
out)}.  Passing an @code{nthreads} of @code{1} means to use only one
3446
thread (the main thread), and is equivalent to calling the uniprocessor
3447
routine.  Passing an @code{nthreads} of @code{2} means that the
3448
transform is potentially parallelized over two threads (and two
3449
processors, if you have them), and so on.
3450
 
3451
These are the only changes you need to make to your source code.  Calls
3452
to all other FFTW routines (plan creation, destruction, wisdom,
3453
etcetera) are not parallelized and remain the same.  (The same plans and
3454
wisdom are used by both uniprocessor and multi-threaded transforms.)
3455
Your arrays are allocated and formatted in the same way, and so on.
3456
 
3457
Programs using the parallel complex transforms should be linked with
3458
@code{-lfftw_threads -lfftw -lm} on Unix.  Programs using the parallel
3459
real transforms should be linked with @code{-lrfftw_threads
3460
-lfftw_threads -lrfftw -lfftw -lm}.  You will also need to link with
3461
whatever library is responsible for threads on your system
3462
(e.g. @code{-lpthread} on Linux).
3463
@cindex linking on Unix
3464
 
3465
@c -------------------------------------------------------
3466
@node How Many Threads to Use?, Using Multi-threaded FFTW in a Multi-threaded Program, Usage of Multi-threaded FFTW, Multi-threaded FFTW
3467
@subsection How Many Threads to Use?
3468
 
3469
@cindex number of threads
3470
There is a fair amount of overhead involved in spawning and synchronizing
3471
threads, so the optimal number of threads to use depends upon the size
3472
of the transform as well as on the number of processors you have.
3473
 
3474
As a general rule, you don't want to use more threads than you have
3475
processors.  (Using more threads will work, but there will be extra
3476
overhead with no benefit.)  In fact, if the problem size is too small,
3477
you may want to use fewer threads than you have processors.
3478
 
3479
You will have to experiment with your system to see what level of
3480
parallelization is best for your problem size.  Useful tools to help you
3481
do this are the test programs that are automatically compiled along with
3482
the threads libraries, @code{fftw_threads_test} and
3483
@code{rfftw_threads_test} (in the @code{threads} subdirectory).  These
3484
@pindex fftw_threads_test
3485
@pindex rfftw_threads_test
3486
take the same arguments as the other FFTW test programs (see
3487
@code{tests/README}), except that they also take the number of threads
3488
to use as a first argument, and report the parallel speedup in speed
3489
tests.  For example,
3490
 
3491
@example
3492
fftw_threads_test 2 -s 128x128
3493
@end example
3494
 
3495
will benchmark complex 128x128 transforms using two threads and report
3496
the speedup relative to the uniprocessor transform.
3497
@cindex benchmark
3498
 
3499
For instance, on a 4-processor 200MHz Pentium Pro system running Linux
3500
2.2.0, we found that the "crossover" point at which 2 threads became
3501
beneficial for complex transforms was about 4k points, while 4 threads
3502
became beneficial at 8k points.
3503
 
3504
@c -------------------------------------------------------
3505
@node Using Multi-threaded FFTW in a Multi-threaded Program, Tips for Optimal Threading, How Many Threads to Use?, Multi-threaded FFTW
3506
@subsection Using Multi-threaded FFTW in a Multi-threaded Program
3507
 
3508
@cindex thread safety
3509
It is perfectly possible to use the multi-threaded FFTW routines from a
3510
multi-threaded program (e.g. have multiple threads computing
3511
multi-threaded transforms simultaneously).  If you have the processors,
3512
more power to you!  However, the same restrictions apply as for the
3513
uniprocessor FFTW routines (@pxref{Thread safety}).  In particular, you
3514
should recall that you may not create or destroy plans in parallel.
3515
 
3516
@c -------------------------------------------------------
3517
@node Tips for Optimal Threading,  , Using Multi-threaded FFTW in a Multi-threaded Program, Multi-threaded FFTW
3518
@subsection Tips for Optimal Threading
3519
 
3520
Not all transforms are equally well-parallelized by the multi-threaded
3521
FFTW routines.  (This is merely a consequence of laziness on the part of
3522
the implementors, and is not inherent to the algorithms employed.)
3523
Mainly, the limitations are in the parallel one-dimensional transforms.
3524
The things to avoid if you want optimal parallelization are as follows:
3525
 
3526
@subsection Parallelization deficiencies in one-dimensional transforms
3527
 
3528
@itemize @bullet
3529
 
3530
@item
3531
Large prime factors can sometimes parallelize poorly.  Of course, you
3532
should avoid these anyway if you want high performance.
3533
 
3534
@item
3535
@cindex in-place transform
3536
Single in-place transforms don't parallelize completely.  (Multiple
3537
in-place transforms, i.e. @code{howmany > 1}, are fine.)  Again, you
3538
should avoid these in any case if you want high performance, as they
3539
require transforming to a scratch array and copying back.
3540
 
3541
@item
3542
Single real-complex (@code{rfftw}) transforms don't parallelize
3543
completely.  This is unfortunate, but parallelizing this correctly would
3544
have involved a lot of extra code (and a much larger library).  You
3545
still get some benefit from additional processors, but if you have a
3546
very large number of processors you will probably be better off using
3547
the parallel complex (@code{fftw}) transforms.  Note that
3548
multi-dimensional real transforms or multiple one-dimensional real
3549
transforms are fine.
3550
 
3551
@end itemize
3552
 
3553
@c ------------------------------------------------------------
3554
@node MPI FFTW,  , Multi-threaded FFTW, Parallel FFTW
3555
@section MPI FFTW
3556
 
3557
@cindex MPI
3558
This section describes the MPI FFTW routines for distributed-memory (and
3559
shared-memory) machines supporting MPI (Message Passing Interface).  The
3560
MPI routines are significantly different from the ordinary FFTW because
3561
the transform data here are @emph{distributed} over multiple processes,
3562
so that each process gets only a portion of the array.
3563
@cindex distributed memory
3564
Currently, multi-dimensional transforms of both real and complex data,
3565
as well as one-dimensional transforms of complex data, are supported.
3566
 
3567
@menu
3568
* MPI FFTW Installation::      
3569
* Usage of MPI FFTW for Complex Multi-dimensional Transforms::  
3570
* MPI Data Layout::            
3571
* Usage of MPI FFTW for Real Multi-dimensional Transforms::  
3572
* Usage of MPI FFTW for Complex One-dimensional Transforms::  
3573
* MPI Tips::                    
3574
@end menu
3575
 
3576
@c -------------------------------------------------------
3577
@node MPI FFTW Installation, Usage of MPI FFTW for Complex Multi-dimensional Transforms, MPI FFTW, MPI FFTW
3578
@subsection MPI FFTW Installation
3579
 
3580
The FFTW MPI library code is all located in the @code{mpi} subdirectoy
3581
of the FFTW package (along with source code for test programs).  On Unix
3582
systems, the FFTW MPI libraries and header files can be automatically
3583
configured, compiled, and installed along with the uniprocessor FFTW
3584
libraries simply by including @code{--enable-mpi} in the flags to the
3585
@code{configure} script (@pxref{Installation on Unix}).
3586
@fpindex configure
3587
 
3588
The only requirement of the FFTW MPI code is that you have the standard
3589
MPI 1.1 (or later) libraries and header files installed on your system.
3590
A free implementation of MPI is available from
3591
@uref{http://www-unix.mcs.anl.gov/mpi/mpich/,the MPICH home page}.
3592
 
3593
Previous versions of the FFTW MPI routines have had an unfortunate
3594
tendency to expose bugs in MPI implementations.  The current version has
3595
been largely rewritten, and hopefully avoids some of the problems.  If
3596
you run into difficulties, try passing the optional workspace to
3597
@code{(r)fftwnd_mpi} (see below), as this allows us to use the standard
3598
(and hopefully well-tested) @code{MPI_Alltoall} primitive for
3599
@ffindex MPI_Alltoall
3600
communications.  Please let us know (@email{fftw@@theory.lcs.mit.edu})
3601
how things work out.
3602
 
3603
@pindex fftw_mpi_test
3604
@pindex rfftw_mpi_test
3605
Several test programs are included in the @code{mpi} directory.  The
3606
ones most useful to you are probably the @code{fftw_mpi_test} and
3607
@code{rfftw_mpi_test} programs, which are run just like an ordinary MPI
3608
program and accept the same parameters as the other FFTW test programs
3609
(c.f. @code{tests/README}).  For example, @code{mpirun @i{...params...}
3610
fftw_mpi_test -r 0} will run non-terminating complex-transform
3611
correctness tests of random dimensions.  They can also do performance
3612
benchmarks.
3613
 
3614
@c -------------------------------------------------------
3615
@node Usage of MPI FFTW for Complex Multi-dimensional Transforms, MPI Data Layout, MPI FFTW Installation, MPI FFTW
3616
@subsection Usage of MPI FFTW for Complex Multi-dimensional Transforms
3617
 
3618
Usage of the MPI FFTW routines is similar to that of the uniprocessor
3619
FFTW.  We assume that the reader already understands the usage of the
3620
uniprocessor FFTW routines, described elsewhere in this manual.  Some
3621
familiarity with MPI is also helpful.
3622
 
3623
A typical program performing a complex two-dimensional MPI transform
3624
might look something like:
3625
 
3626
@example
3627
#include <fftw_mpi.h>
3628
 
3629
int main(int argc, char **argv)
3630
@{
3631
      const int NX = ..., NY = ...;
3632
      fftwnd_mpi_plan plan;
3633
      fftw_complex *data;
3634
 
3635
      MPI_Init(&argc,&argv);
3636
 
3637
      plan = fftw2d_mpi_create_plan(MPI_COMM_WORLD,
3638
                                    NX, NY,
3639
                                    FFTW_FORWARD, FFTW_ESTIMATE);
3640
 
3641
      ...allocate and initialize data...
3642
 
3643
      fftwnd_mpi(p, 1, data, NULL, FFTW_NORMAL_ORDER);
3644
 
3645
      ...
3646
 
3647
      fftwnd_mpi_destroy_plan(plan);
3648
      MPI_Finalize();
3649
@}
3650
@end example
3651
 
3652
The calls to @code{MPI_Init} and @code{MPI_Finalize} are required in all
3653
@ffindex MPI_Init
3654
@ffindex MPI_Finalize
3655
MPI programs; see the @uref{http://www.mcs.anl.gov/mpi/,MPI home page}
3656
for more information.  Note that all of your processes run the program
3657
in parallel, as a group; there is no explicit launching of
3658
threads/processes in an MPI program.
3659
 
3660
@cindex plan
3661
As in the ordinary FFTW, the first thing we do is to create a plan (of
3662
type @code{fftwnd_mpi_plan}), using:
3663
 
3664
@example
3665
fftwnd_mpi_plan fftw2d_mpi_create_plan(MPI_Comm comm,
3666
                                       int nx, int ny,
3667
                                       fftw_direction dir, int flags);
3668
@end example
3669
@findex fftw2d_mpi_create_plan
3670
@tindex fftwnd_mpi_plan
3671
 
3672
Except for the first argument, the parameters are identical to those of
3673
@code{fftw2d_create_plan}.  (There are also analogous
3674
@code{fftwnd_mpi_create_plan} and @code{fftw3d_mpi_create_plan}
3675
functions.  Transforms of any rank greater than one are supported.)
3676
@findex fftwnd_mpi_create_plan
3677
@findex fftw3d_mpi_create_plan
3678
The first argument is an MPI @dfn{communicator}, which specifies the
3679
group of processes that are to be involved in the transform; the
3680
standard constant @code{MPI_COMM_WORLD} indicates all available
3681
processes.
3682
@fcindex MPI_COMM_WORLD
3683
 
3684
Next, one has to allocate and initialize the data.  This is somewhat
3685
tricky, because the transform data is distributed across the processes
3686
involved in the transform.  It is discussed in detail by the next
3687
section (@pxref{MPI Data Layout}).
3688
 
3689
The actual computation of the transform is performed by the function
3690
@code{fftwnd_mpi}, which differs somewhat from its uniprocessor
3691
equivalent and is described by:
3692
 
3693
@example
3694
void fftwnd_mpi(fftwnd_mpi_plan p,
3695
                int n_fields,
3696
                fftw_complex *local_data, fftw_complex *work,
3697
                fftwnd_mpi_output_order output_order);
3698
@end example
3699
@findex fftwnd_mpi
3700
 
3701
There are several things to notice here:
3702
 
3703
@itemize @bullet
3704
 
3705
@item
3706
@cindex in-place transform
3707
First of all, all @code{fftw_mpi} transforms are in-place: the output is
3708
in the @code{local_data} parameter, and there is no need to specify
3709
@code{FFTW_IN_PLACE} in the plan flags.
3710
 
3711
@item
3712
@cindex n_fields
3713
@cindex stride
3714
The MPI transforms also only support a limited subset of the
3715
@code{howmany}/@code{stride}/@code{dist} functionality of the
3716
uniprocessor routines: the @code{n_fields} parameter is equivalent to
3717
@code{howmany=n_fields}, @code{stride=n_fields}, and @code{dist=1}.
3718
(Conceptually, the @code{n_fields} parameter allows you to transform an
3719
array of contiguous vectors, each with length @code{n_fields}.)
3720
@code{n_fields} is @code{1} if you are only transforming a single,
3721
ordinary array.
3722
 
3723
@item
3724
The @code{work} parameter is an optional workspace.  If it is not
3725
@code{NULL}, it should be exactly the same size as the @code{local_data}
3726
array.  If it is provided, FFTW is able to use the built-in
3727
@code{MPI_Alltoall} primitive for (often) greater efficiency at the
3728
@ffindex MPI_Alltoall
3729
expense of extra storage space.
3730
 
3731
@item
3732
Finally, the last parameter specifies whether the output data has the
3733
same ordering as the input data (@code{FFTW_NORMAL_ORDER}), or if it is
3734
transposed (@code{FFTW_TRANSPOSED_ORDER}).  Leaving the data transposed
3735
@ctindex FFTW_NORMAL_ORDER
3736
@ctindex FFTW_TRANSPOSED_ORDER
3737
results in significant performance improvements due to a saved
3738
communication step (needed to un-transpose the data).  Specifically, the
3739
first two dimensions of the array are transposed, as is described in
3740
more detail by the next section.
3741
 
3742
@end itemize
3743
 
3744
@cindex normalization
3745
The output of @code{fftwnd_mpi} is identical to that of the
3746
corresponding uniprocessor transform.  In particular, you should recall
3747
our conventions for normalization and the sign of the transform
3748
exponent.
3749
 
3750
The same plan can be used to compute many transforms of the same size.
3751
After you are done with it, you should deallocate it by calling
3752
@code{fftwnd_mpi_destroy_plan}.
3753
@findex fftwnd_mpi_destroy_plan
3754
 
3755
@cindex blocking
3756
@ffindex MPI_Barrier
3757
@b{Important:} The FFTW MPI routines must be called in the same order by
3758
all processes involved in the transform.  You should assume that they
3759
all are blocking, as if each contained a call to @code{MPI_Barrier}.
3760
 
3761
Programs using the FFTW MPI routines should be linked with
3762
@code{-lfftw_mpi -lfftw -lm} on Unix, in addition to whatever libraries
3763
are required for MPI.
3764
@cindex linking on Unix
3765
 
3766
@c -------------------------------------------------------
3767
@node MPI Data Layout, Usage of MPI FFTW for Real Multi-dimensional Transforms, Usage of MPI FFTW for Complex Multi-dimensional Transforms, MPI FFTW
3768
@subsection MPI Data Layout
3769
 
3770
@cindex distributed memory
3771
@cindex distributed array format
3772
The transform data used by the MPI FFTW routines is @dfn{distributed}: a
3773
distinct portion of it resides with each process involved in the
3774
transform.  This allows the transform to be parallelized, for example,
3775
over a cluster of workstations, each with its own separate memory, so
3776
that you can take advantage of the total memory of all the processors
3777
you are parallelizing over.
3778
 
3779
In particular, the array is divided according to the rows (first
3780
dimension) of the data: each process gets a subset of the rows of the
3781
@cindex slab decomposition
3782
data.  (This is sometimes called a ``slab decomposition.'')  One
3783
consequence of this is that you can't take advantage of more processors
3784
than you have rows (e.g. @code{64x64x64} matrix can at most use 64
3785
processors).  This isn't usually much of a limitation, however, as each
3786
processor needs a fair amount of data in order for the
3787
parallel-computation benefits to outweight the communications costs.
3788
 
3789
Below, the first dimension of the data will be referred to as
3790
@samp{@code{x}} and the second dimension as @samp{@code{y}}.
3791
 
3792
FFTW supplies a routine to tell you exactly how much data resides on the
3793
current process:
3794
 
3795
@example
3796
void fftwnd_mpi_local_sizes(fftwnd_mpi_plan p,
3797
                            int *local_nx,
3798
                            int *local_x_start,
3799
                            int *local_ny_after_transpose,
3800
                            int *local_y_start_after_transpose,
3801
                            int *total_local_size);
3802
@end example
3803
@findex fftwnd_mpi_local_sizes
3804
 
3805
Given a plan @code{p}, the other parameters of this routine are set to
3806
values describing the required data layout, described below.
3807
 
3808
@code{total_local_size} is the number of @code{fftw_complex} elements
3809
that you must allocate for your local data (and workspace, if you
3810
choose).  (This value should, of course, be multiplied by
3811
@code{n_fields} if that parameter to @code{fftwnd_mpi} is not @code{1}.)
3812
 
3813
The data on the current process has @code{local_nx} rows, starting at
3814
row @code{local_x_start}.  If @code{fftwnd_mpi} is called with
3815
@code{FFTW_TRANSPOSED_ORDER} output, then @code{y} will be the first
3816
dimension of the output, and the local @code{y} extent will be given by
3817
@code{local_ny_after_transpose} and
3818
@code{local_y_start_after_transpose}.  Otherwise, the output has the
3819
same dimensions and layout as the input.
3820
 
3821
For instance, suppose you want to transform three-dimensional data of
3822
size @code{nx x ny x nz}.  Then, the current process will store a subset
3823
of this data, of size @code{local_nx x ny x nz}, where the @code{x}
3824
indices correspond to the range @code{local_x_start} to
3825
@code{local_x_start+local_nx-1} in the ``real'' (i.e. logical) array.
3826
If @code{fftwnd_mpi} is called with @code{FFTW_TRANSPOSED_ORDER} output,
3827
@ctindex FFTW_TRANSPOSED_ORDER
3828
then the result will be a @code{ny x nx x nz} array, of which a
3829
@code{local_ny_after_transpose x nx x nz} subset is stored on the
3830
current process (corresponding to @code{y} values starting at
3831
@code{local_y_start_after_transpose}).
3832
 
3833
The following is an example of allocating such a three-dimensional array
3834
array (@code{local_data}) before the transform and initializing it to
3835
some function @code{f(x,y,z)}:
3836
 
3837
@example
3838
        fftwnd_mpi_local_sizes(plan, &local_nx, &local_x_start,
3839
                               &local_ny_after_transpose,
3840
                               &local_y_start_after_transpose,
3841
                               &total_local_size);
3842
 
3843
        local_data = (fftw_complex*) malloc(sizeof(fftw_complex) *
3844
                                            total_local_size);
3845
 
3846
        for (x = 0; x < local_nx; ++x)
3847
                for (y = 0; y < ny; ++y)
3848
                        for (z = 0; z < nz; ++z)
3849
                                local_data[(x*ny + y)*nz + z]
3850
                                        = f(x + local_x_start, y, z);
3851
@end example
3852
 
3853
Some important things to remember:
3854
 
3855
@itemize @bullet
3856
 
3857
@item
3858
Although the local data is of dimensions @code{local_nx x ny x nz} in
3859
the above example, do @emph{not} allocate the array to be of size
3860
@code{local_nx*ny*nz}.  Use @code{total_local_size} instead.
3861
 
3862
@item
3863
The amount of data on each process will not necessarily be the same; in
3864
fact, @code{local_nx} may even be zero for some processes.  (For
3865
example, suppose you are doing a @code{6x6} transform on four
3866
processors.  There is no way to effectively use the fourth processor in
3867
a slab decomposition, so we leave it empty.  Proof left as an exercise
3868
for the reader.)
3869
 
3870
@item
3871
@cindex row-major
3872
All arrays are, of course, in row-major order (@pxref{Multi-dimensional
3873
Array Format}).
3874
 
3875
@item
3876
If you want to compute the inverse transform of the output of
3877
@code{fftwnd_mpi}, the dimensions of the inverse transform are given by
3878
the dimensions of the output of the forward transform.  For example, if
3879
you are using @code{FFTW_TRANSPOSED_ORDER} output in the above example,
3880
then the inverse plan should be created with dimensions @code{ny x nx x
3881
nz}.
3882
 
3883
@item
3884
The data layout only depends upon the dimensions of the array, not on
3885
the plan, so you are guaranteed that different plans for the same size
3886
(or inverse plans) will use the same (consistent) data layouts.
3887
 
3888
@end itemize
3889
 
3890
@c -------------------------------------------------------
3891
@node Usage of MPI FFTW for Real Multi-dimensional Transforms, Usage of MPI FFTW for Complex One-dimensional Transforms, MPI Data Layout, MPI FFTW
3892
@subsection Usage of MPI FFTW for Real Multi-dimensional Transforms
3893
 
3894
MPI transforms specialized for real data are also available, similiar to
3895
the uniprocessor @code{rfftwnd} transforms.  Just as in the uniprocessor
3896
case, the real-data MPI functions gain roughly a factor of two in speed
3897
(and save a factor of two in space) at the expense of more complicated
3898
data formats in the calling program.  Before reading this section, you
3899
should definitely understand how to call the uniprocessor @code{rfftwnd}
3900
functions and also the complex MPI FFTW functions.
3901
 
3902
The following is an example of a program using @code{rfftwnd_mpi}.  It
3903
computes the size @code{nx x ny x nz} transform of a real function
3904
@code{f(x,y,z)}, multiplies the imaginary part by @code{2} for fun, then
3905
computes the inverse transform.  (We'll also use
3906
@code{FFTW_TRANSPOSED_ORDER} output for the transform, and additionally
3907
supply the optional workspace parameter to @code{rfftwnd_mpi}, just to
3908
add a little spice.)
3909
 
3910
@example
3911
#include <rfftw_mpi.h>
3912
 
3913
int main(int argc, char **argv)
3914
@{
3915
     const int nx = ..., ny = ..., nz = ...;
3916
     int local_nx, local_x_start, local_ny_after_transpose,
3917
         local_y_start_after_transpose, total_local_size;
3918
     int x, y, z;
3919
     rfftwnd_mpi_plan plan, iplan;
3920
     fftw_real *data, *work;
3921
     fftw_complex *cdata;
3922
 
3923
     MPI_Init(&argc,&argv);
3924
 
3925
     /* create the forward and backward plans: */
3926
     plan = rfftw3d_mpi_create_plan(MPI_COMM_WORLD,
3927
                                    nx, ny, nz,
3928
                                    FFTW_REAL_TO_COMPLEX,
3929
                                    FFTW_ESTIMATE);
3930
@findex rfftw3d_mpi_create_plan
3931
     iplan = rfftw3d_mpi_create_plan(MPI_COMM_WORLD,
3932
      /* dim.'s of REAL data --> */  nx, ny, nz,
3933
                                     FFTW_COMPLEX_TO_REAL,
3934
                                     FFTW_ESTIMATE);
3935
 
3936
     rfftwnd_mpi_local_sizes(plan, &local_nx, &local_x_start,
3937
                            &local_ny_after_transpose,
3938
                            &local_y_start_after_transpose,
3939
                            &total_local_size);
3940
@findex rfftwnd_mpi_local_sizes
3941
 
3942
     data = (fftw_real*) malloc(sizeof(fftw_real) * total_local_size);
3943
 
3944
     /* workspace is the same size as the data: */
3945
     work = (fftw_real*) malloc(sizeof(fftw_real) * total_local_size);
3946
 
3947
     /* initialize data to f(x,y,z): */
3948
     for (x = 0; x < local_nx; ++x)
3949
             for (y = 0; y < ny; ++y)
3950
                     for (z = 0; z < nz; ++z)
3951
                             data[(x*ny + y) * (2*(nz/2+1)) + z]
3952
                                     = f(x + local_x_start, y, z);
3953
 
3954
     /* Now, compute the forward transform: */
3955
     rfftwnd_mpi(plan, 1, data, work, FFTW_TRANSPOSED_ORDER);
3956
@findex rfftwnd_mpi
3957
 
3958
     /* the data is now complex, so typecast a pointer: */
3959
     cdata = (fftw_complex*) data;
3960
 
3961
     /* multiply imaginary part by 2, for fun:
3962
        (note that the data is transposed) */
3963
     for (y = 0; y < local_ny_after_transpose; ++y)
3964
             for (x = 0; x < nx; ++x)
3965
                     for (z = 0; z < (nz/2+1); ++z)
3966
                             cdata[(y*nx + x) * (nz/2+1) + z].im
3967
                                     *= 2.0;
3968
 
3969
     /* Finally, compute the inverse transform; the result
3970
        is transposed back to the original data layout: */
3971
     rfftwnd_mpi(iplan, 1, data, work, FFTW_TRANSPOSED_ORDER);
3972
 
3973
     free(data);
3974
     free(work);
3975
     rfftwnd_mpi_destroy_plan(plan);
3976
@findex rfftwnd_mpi_destroy_plan
3977
     rfftwnd_mpi_destroy_plan(iplan);
3978
     MPI_Finalize();
3979
@}
3980
@end example
3981
 
3982
There's a lot of stuff in this example, but it's all just what you would
3983
have guessed, right?  We replaced all the @code{fftwnd_mpi*} functions
3984
by @code{rfftwnd_mpi*}, but otherwise the parameters were pretty much
3985
the same.  The data layout distributed among the processes just like for
3986
the complex transforms (@pxref{MPI Data Layout}), but in addition the
3987
final dimension is padded just like it is for the uniprocessor in-place
3988
real transforms (@pxref{Array Dimensions for Real Multi-dimensional
3989
Transforms}).
3990
@cindex padding
3991
In particular, the @code{z} dimension of the real input data is padded
3992
to a size @code{2*(nz/2+1)}, and after the transform it contains
3993
@code{nz/2+1} complex values.
3994
@cindex distributed array format
3995
@cindex rfftwnd array format
3996
 
3997
Some other important things to know about the real MPI transforms:
3998
 
3999
@itemize @bullet
4000
 
4001
@item
4002
As for the uniprocessor @code{rfftwnd_create_plan}, the dimensions
4003
passed for the @code{FFTW_COMPLEX_TO_REAL} plan are those of the
4004
@emph{real} data.  In particular, even when @code{FFTW_TRANSPOSED_ORDER}
4005
@ctindex FFTW_COMPLEX_TO_REAL
4006
@ctindex FFTW_TRANSPOSED_ORDER
4007
is used as in this case, the dimensions are those of the (untransposed)
4008
real output, not the (transposed) complex input.  (For the complex MPI
4009
transforms, on the other hand, the dimensions are always those of the
4010
input array.)
4011
 
4012
@item
4013
The output ordering of the transform (@code{FFTW_TRANSPOSED_ORDER} or
4014
@code{FFTW_TRANSPOSED_ORDER}) @emph{must} be the same for both forward
4015
and backward transforms.  (This is not required in the complex case.)
4016
 
4017
@item
4018
@code{total_local_size} is the required size in @code{fftw_real} values,
4019
not @code{fftw_complex} values as it is for the complex transforms.
4020
 
4021
@item
4022
@code{local_ny_after_transpose} and @code{local_y_start_after_transpose}
4023
describe the portion of the array after the transform; that is, they are
4024
indices in the complex array for an @code{FFTW_REAL_TO_COMPLEX} transform
4025
and in the real array for an @code{FFTW_COMPLEX_TO_REAL} transform.
4026
 
4027
@item
4028
@code{rfftwnd_mpi} always expects @code{fftw_real*} array arguments, but
4029
of course these pointers can refer to either real or complex arrays,
4030
depending upon which side of the transform you are on.  Just as for
4031
in-place uniprocessor real transforms (and also in the example above),
4032
this is most easily handled by typecasting to a complex pointer when
4033
handling the complex data.
4034
 
4035
@item
4036
As with the complex transforms, there are also
4037
@code{rfftwnd_create_plan} and @code{rfftw2d_create_plan} functions, and
4038
any rank greater than one is supported.
4039
@findex rfftwnd_create_plan
4040
@findex rfftw2d_create_plan
4041
 
4042
@end itemize
4043
 
4044
Programs using the MPI FFTW real transforms should link with
4045
@code{-lrfftw_mpi -lfftw_mpi -lrfftw -lfftw -lm} on Unix.
4046
@cindex linking on Unix
4047
 
4048
@c -------------------------------------------------------
4049
@node Usage of MPI FFTW for Complex One-dimensional Transforms, MPI Tips, Usage of MPI FFTW for Real Multi-dimensional Transforms, MPI FFTW
4050
@subsection Usage of MPI FFTW for Complex One-dimensional Transforms
4051
 
4052
The MPI FFTW also includes routines for parallel one-dimensional
4053
transforms of complex data (only).  Although the speedup is generally
4054
worse than it is for the multi-dimensional routines,@footnote{The 1D
4055
transforms require much more communication.  All the communication in
4056
our FFT routines takes the form of an all-to-all communication: the
4057
multi-dimensional transforms require two all-to-all communications (or
4058
one, if you use @code{FFTW_TRANSPOSED_ORDER}), while the one-dimensional
4059
transforms require @emph{three} (or two, if you use scrambled input or
4060
output).} these distributed-memory one-dimensional transforms are
4061
especially useful for performing one-dimensional transforms that don't
4062
fit into the memory of a single machine.
4063
 
4064
The usage of these routines is straightforward, and is similar to that
4065
of the multi-dimensional MPI transform functions.  You first include the
4066
header @code{<fftw_mpi.h>} and then create a plan by calling:
4067
 
4068
@example
4069
fftw_mpi_plan fftw_mpi_create_plan(MPI_Comm comm, int n,
4070
                                   fftw_direction dir, int flags);
4071
@end example
4072
@findex fftw_mpi_create_plan
4073
@tindex fftw_mpi_plan
4074
 
4075
The last three arguments are the same as for @code{fftw_create_plan}
4076
(except that all MPI transforms are automatically @code{FFTW_IN_PLACE}).
4077
The first argument specifies the group of processes you are using, and
4078
is usually @code{MPI_COMM_WORLD} (all processes).
4079
@fcindex MPI_COMM_WORLD
4080
A plan can be used for many transforms of the same size, and is
4081
destroyed when you are done with it by calling
4082
@code{fftw_mpi_destroy_plan(plan)}.
4083
@findex fftw_mpi_destroy_plan
4084
 
4085
If you don't care about the ordering of the input or output data of the
4086
transform, you can include @code{FFTW_SCRAMBLED_INPUT} and/or
4087
@code{FFTW_SCRAMBLED_OUTPUT} in the @code{flags}.
4088
@ctindex FFTW_SCRAMBLED_INPUT
4089
@ctindex FFTW_SCRAMBLED_OUTPUT
4090
@cindex flags
4091
These save some communications at the expense of having the input and/or
4092
output reordered in an undocumented way.  For example, if you are
4093
performing an FFT-based convolution, you might use
4094
@code{FFTW_SCRAMBLED_OUTPUT} for the forward transform and
4095
@code{FFTW_SCRAMBLED_INPUT} for the inverse transform.
4096
 
4097
The transform itself is computed by:
4098
 
4099
@example
4100
void fftw_mpi(fftw_mpi_plan p, int n_fields,
4101
              fftw_complex *local_data, fftw_complex *work);
4102
@end example
4103
@findex fftw_mpi
4104
 
4105
@cindex n_fields
4106
@code{n_fields}, as in @code{fftwnd_mpi}, is equivalent to
4107
@code{howmany=n_fields}, @code{stride=n_fields}, and @code{dist=1}, and
4108
should be @code{1} when you are computing the transform of a single
4109
array.  @code{local_data} contains the portion of the array local to the
4110
current process, described below.  @code{work} is either @code{NULL} or
4111
an array exactly the same size as @code{local_data}; in the latter case,
4112
FFTW can use the @code{MPI_Alltoall} communications primitive which is
4113
(usually) faster at the expense of extra storage.  Upon return,
4114
@code{local_data} contains the portion of the output local to the
4115
current process (see below).
4116
@ffindex MPI_Alltoall
4117
 
4118
@cindex distributed array format
4119
To find out what portion of the array is stored local to the current
4120
process, you call the following routine:
4121
 
4122
@example
4123
void fftw_mpi_local_sizes(fftw_mpi_plan p,
4124
                          int *local_n, int *local_start,
4125
                          int *local_n_after_transform,
4126
                          int *local_start_after_transform,
4127
                          int *total_local_size);
4128
@end example
4129
@findex fftw_mpi_local_sizes
4130
 
4131
@code{total_local_size} is the number of @code{fftw_complex} elements
4132
you should actually allocate for @code{local_data} (and @code{work}).
4133
@code{local_n} and @code{local_start} indicate that the current process
4134
stores @code{local_n} elements corresponding to the indices
4135
@code{local_start} to @code{local_start+local_n-1} in the ``real''
4136
array.  @emph{After the transform, the process may store a different
4137
portion of the array.}  The portion of the data stored on the process
4138
after the transform is given by @code{local_n_after_transform} and
4139
@code{local_start_after_transform}.  This data is exactly the same as a
4140
contiguous segment of the corresponding uniprocessor transform output
4141
(i.e. an in-order sequence of sequential frequency bins).
4142
 
4143
Note that, if you compute both a forward and a backward transform of the
4144
same size, the local sizes are guaranteed to be consistent.  That is,
4145
the local size after the forward transform will be the same as the local
4146
size before the backward transform, and vice versa.
4147
 
4148
Programs using the FFTW MPI routines should be linked with
4149
@code{-lfftw_mpi -lfftw -lm} on Unix, in addition to whatever libraries
4150
are required for MPI.
4151
@cindex linking on Unix
4152
 
4153
@c -------------------------------------------------------
4154
@node MPI Tips,  , Usage of MPI FFTW for Complex One-dimensional Transforms, MPI FFTW
4155
@subsection MPI Tips
4156
 
4157
There are several things you should consider in order to get the best
4158
performance out of the MPI FFTW routines.
4159
 
4160
@cindex load-balancing
4161
First, if possible, the first and second dimensions of your data should
4162
be divisible by the number of processes you are using.  (If only one can
4163
be divisible, then you should choose the first dimension.)  This allows
4164
the computational load to be spread evenly among the processes, and also
4165
reduces the communications complexity and overhead.  In the
4166
one-dimensional transform case, the size of the transform should ideally
4167
be divisible by the @emph{square} of the number of processors.
4168
 
4169
@ctindex FFTW_TRANSPOSED_ORDER
4170
Second, you should consider using the @code{FFTW_TRANSPOSED_ORDER}
4171
output format if it is not too burdensome.  The speed gains from
4172
communications savings are usually substantial.
4173
 
4174
Third, you should consider allocating a workspace for
4175
@code{(r)fftw(nd)_mpi}, as this can often
4176
(but not always) improve performance (at the cost of extra storage).
4177
 
4178
Fourth, you should experiment with the best number of processors to use
4179
for your problem.  (There comes a point of diminishing returns, when the
4180
communications costs outweigh the computational benefits.@footnote{An
4181
FFT is particularly hard on communications systems, as it requires an
4182
@dfn{all-to-all} communication, which is more or less the worst possible
4183
case.})  The @code{fftw_mpi_test} program can output helpful performance
4184
benchmarks.
4185
@pindex fftw_mpi_test
4186
@cindex benchmark
4187
It accepts the same parameters as the uniprocessor test programs
4188
(c.f. @code{tests/README}) and is run like an ordinary MPI program.  For
4189
example, @code{mpirun -np 4 fftw_mpi_test -s 128x128x128} will benchmark
4190
a @code{128x128x128} transform on four processors, reporting timings and
4191
parallel speedups for all variants of @code{fftwnd_mpi} (transposed,
4192
with workspace, etcetera).  (Note also that there is the
4193
@code{rfftw_mpi_test} program for the real transforms.)
4194
@pindex rfftw_mpi_test
4195
 
4196
 
4197
@c ************************************************************
4198
@node Calling FFTW from Fortran, Installation and Customization, Parallel FFTW, Top
4199
@chapter Calling FFTW from Fortran
4200
 
4201
@cindex Fortran-callable wrappers
4202
The standard FFTW libraries include special wrapper functions that allow
4203
Fortran programs to call FFTW subroutines.  This chapter describes how
4204
those functions may be employed to use FFTW from Fortran.  We assume
4205
here that the reader is already familiar with the usage of FFTW in C, as
4206
described elsewhere in this manual.
4207
 
4208
In general, it is not possible to call C functions directly from
4209
Fortran, due to Fortran's inability to pass arguments by value and also
4210
because Fortran compilers typically expect identifiers to be mangled
4211
@cindex compiler
4212
somehow for linking.  However, if C functions are written in a special
4213
way, they @emph{are} callable from Fortran, and we have employed this
4214
technique to create Fortran-callable ``wrapper'' functions around the
4215
main FFTW routines.  These wrapper functions are included in the FFTW
4216
libraries by default, unless a Fortran compiler isn't found on your
4217
system or @code{--disable-fortran} is included in the @code{configure}
4218
flags.
4219
 
4220
As a result, calling FFTW from Fortran requires little more than
4221
appending @samp{@code{_f77}} to the function names and then linking
4222
normally with the FFTW libraries.  There are a few wrinkles, however, as
4223
we shall discuss below.
4224
 
4225
@menu
4226
* Wrapper Routines::            
4227
* FFTW Constants in Fortran::  
4228
* Fortran Examples::            
4229
@end menu
4230
 
4231
@c -------------------------------------------------------
4232
@node Wrapper Routines, FFTW Constants in Fortran, Calling FFTW from Fortran, Calling FFTW from Fortran
4233
@section Wrapper Routines
4234
 
4235
All of the uniprocessor and multi-threaded transform routines have
4236
Fortran-callable wrappers, except for the wisdom import/export functions
4237
(since it is not possible to exchange string and file arguments portably
4238
with Fortran).  The name of the wrapper routine is the same as that of
4239
the corresponding C routine, but with @code{fftw/fftwnd/rfftw/rfftwnd}
4240
replaced by @code{fftw_f77/fftwnd_f77/rfftw_f77/rfftwnd_f77}.  For
4241
example, in Fortran, instead of calling @code{fftw_one} you would call
4242
@code{fftw_f77_one}.@footnote{Technically, Fortran 77 identifiers are
4243
not allowed to have more than 6 characters, nor may they contain
4244
underscores.  Any compiler that enforces this limitation doesn't deserve
4245
to link to FFTW.}
4246
@findex fftw_f77_one
4247
For the most part, all of the arguments to the functions are the same,
4248
with the following exceptions:
4249
 
4250
@itemize @bullet
4251
 
4252
@item
4253
Any function that returns a value (e.g. @code{fftw_create_plan}) is
4254
converted into a subroutine.  The return value is converted into an
4255
additional (first) parameter of the wrapper subroutine.  (The reason for
4256
this is that some Fortran implementations seem to have trouble with C
4257
function return values.)
4258
 
4259
@item
4260
@cindex in-place transform
4261
When performing one-dimensional @code{FFTW_IN_PLACE} transforms, you
4262
don't have the option of passing @code{NULL} for the @code{out} argument
4263
(since there is no way to pass @code{NULL} from Fortran).  Therefore,
4264
when performing such transforms, you @emph{must} allocate and pass a
4265
contiguous scratch array of the same size as the transform.  Note that
4266
for in-place multi-dimensional (@code{(r)fftwnd}) transforms, the
4267
@code{out} argument is ignored, so you can pass anything for that
4268
parameter.
4269
 
4270
@item
4271
@cindex column-major
4272
The wrapper routines expect multi-dimensional arrays to be in
4273
column-major order, which is the ordinary format of Fortran arrays.
4274
They do this transparently and costlessly simply by reversing the order
4275
of the dimensions passed to FFTW, but this has one important consequence
4276
for multi-dimensional real-complex transforms, discussed below.
4277
 
4278
@item
4279
@code{plan} variables (what would be of type @code{fftw_plan},
4280
@code{rfftwnd_plan}, etcetera, in C), must be declared as a type that is
4281
the same size as a pointer (address) on your machine.  (Fortran has no
4282
generic pointer type.)  The Fortran @code{integer} type is usually the
4283
same size as a pointer, but you need to be wary (especially on 64-bit
4284
machines).  (You could also use @code{integer*4} on a 32-bit machine and
4285
@code{integer*8} on a 64-bit machine.)  Ugh.  (@code{g77} has a special
4286
type, @code{integer(kind=7)}, that is defined to be the same size as a
4287
pointer.)
4288
 
4289
@end itemize
4290
 
4291
@cindex floating-point precision
4292
In general, you should take care to use Fortran data types that
4293
correspond to (i.e. are the same size as) the C types used by FFTW.  If
4294
your C and Fortran compilers are made by the same vendor, the
4295
correspondence is usually straightforward (i.e. @code{integer}
4296
corresponds to @code{int}, @code{real} corresponds to @code{float},
4297
etcetera).  Such simple correspondences are assumed in the examples
4298
below.  The examples also assume that FFTW was compiled in
4299
double precision (the default).
4300
 
4301
@c -------------------------------------------------------
4302
@node  FFTW Constants in Fortran, Fortran Examples, Wrapper Routines, Calling FFTW from Fortran
4303
@section FFTW Constants in Fortran
4304
 
4305
When creating plans in FFTW, a number of constants are used to specify
4306
options, such as @code{FFTW_FORWARD} or @code{FFTW_USE_WISDOM}.  The
4307
same constants must be used with the wrapper routines, but of course the
4308
C header files where the constants are defined can't be incorporated
4309
directly into Fortran code.
4310
 
4311
Instead, we have placed Fortran equivalents of the FFTW constant
4312
definitions in the file @code{fortran/fftw_f77.i} of the FFTW package.
4313
If your Fortran compiler supports a preprocessor, you can use that to
4314
incorporate this file into your code whenever you need to call FFTW.
4315
Otherwise, you will have to paste the constant definitions in directly.
4316
They are:
4317
 
4318
@example
4319
      integer FFTW_FORWARD,FFTW_BACKWARD
4320
      parameter (FFTW_FORWARD=-1,FFTW_BACKWARD=1)
4321
 
4322
      integer FFTW_REAL_TO_COMPLEX,FFTW_COMPLEX_TO_REAL
4323
      parameter (FFTW_REAL_TO_COMPLEX=-1,FFTW_COMPLEX_TO_REAL=1)
4324
 
4325
      integer FFTW_ESTIMATE,FFTW_MEASURE
4326
      parameter (FFTW_ESTIMATE=0,FFTW_MEASURE=1)
4327
 
4328
      integer FFTW_OUT_OF_PLACE,FFTW_IN_PLACE,FFTW_USE_WISDOM
4329
      parameter (FFTW_OUT_OF_PLACE=0)
4330
      parameter (FFTW_IN_PLACE=8,FFTW_USE_WISDOM=16)
4331
 
4332
      integer FFTW_THREADSAFE
4333
      parameter (FFTW_THREADSAFE=128)
4334
@end example
4335
 
4336
@cindex flags
4337
In C, you combine different flags (like @code{FFTW_USE_WISDOM} and
4338
@code{FFTW_MEASURE}) using the @samp{@code{|}} operator; in Fortran you
4339
should just use @samp{@code{+}}.
4340
 
4341
@c -------------------------------------------------------
4342
@node Fortran Examples,  , FFTW Constants in Fortran, Calling FFTW from Fortran
4343
@section Fortran Examples
4344
 
4345
In C you might have something like the following to transform a
4346
one-dimensional complex array:
4347
 
4348
@example
4349
        fftw_complex in[N], *out[N];
4350
        fftw_plan plan;
4351
 
4352
        plan = fftw_create_plan(N,FFTW_FORWARD,FFTW_ESTIMATE);
4353
        fftw_one(plan,in,out);
4354
        fftw_destroy_plan(plan);
4355
@end example
4356
 
4357
In Fortran, you use the following to accomplish the same thing:
4358
 
4359
@example
4360
        double complex in, out
4361
        dimension in(N), out(N)
4362
        integer plan
4363
 
4364
        call fftw_f77_create_plan(plan,N,FFTW_FORWARD,FFTW_ESTIMATE)
4365
        call fftw_f77_one(plan,in,out)
4366
        call fftw_f77_destroy_plan(plan)
4367
@end example
4368
@findex fftw_f77_create_plan
4369
@findex fftw_f77_one
4370
@findex fftw_f77_destroy_plan
4371
 
4372
Notice how all routines are called as Fortran subroutines, and the plan
4373
is returned via the first argument to @code{fftw_f77_create_plan}.  To
4374
do the same thing, but using 8 threads in parallel
4375
(@pxref{Multi-threaded FFTW}), you would simply replace the call to
4376
@code{fftw_f77_one} with:
4377
 
4378
@example
4379
        call fftw_f77_threads_one(8,plan,in,out)
4380
@end example
4381
@findex fftw_f77_threads_one
4382
 
4383
To transform a three-dimensional array in-place with C, you might do:
4384
 
4385
@example
4386
        fftw_complex arr[L][M][N];
4387
        fftwnd_plan plan;
4388
        int n[3] = @{L,M,N@};
4389
 
4390
        plan = fftwnd_create_plan(3,n,FFTW_FORWARD,
4391
                                  FFTW_ESTIMATE | FFTW_IN_PLACE);
4392
        fftwnd_one(plan, arr, 0);
4393
        fftwnd_destroy_plan(plan);
4394
@end example
4395
 
4396
In Fortran, you would use this instead:
4397
 
4398
@example
4399
        double complex arr
4400
        dimension arr(L,M,N)
4401
        integer n
4402
        dimension n(3)
4403
        integer plan
4404
 
4405
        n(1) = L
4406
        n(2) = M
4407
        n(3) = N
4408
        call fftwnd_f77_create_plan(plan,3,n,FFTW_FORWARD,
4409
       +                            FFTW_ESTIMATE + FFTW_IN_PLACE)
4410
        call fftwnd_f77_one(plan, arr, 0)
4411
        call fftwnd_f77_destroy_plan(plan)
4412
@end example
4413
@findex fftwnd_f77_create_plan
4414
@findex fftwnd_f77_one
4415
@findex fftwnd_f77_destroy_plan
4416
 
4417
Instead of calling @code{fftwnd_f77_create_plan(plan,3,n,...)}, we could
4418
also have called @code{fftw3d_f77_create_plan(plan,L,M,N,...)}.
4419
@findex fftw3d_f77_create_plan
4420
 
4421
Note that we pass the array dimensions in the "natural" order; also
4422
note that the last argument to @code{fftwnd_f77} is ignored since the
4423
transform is @code{FFTW_IN_PLACE}.
4424
 
4425
To transform a one-dimensional real array in Fortran, you might do:
4426
 
4427
@example
4428
        double precision in, out
4429
        dimension in(N), out(N)
4430
        integer plan
4431
 
4432
        call rfftw_f77_create_plan(plan,N,FFTW_REAL_TO_COMPLEX,
4433
       +                           FFTW_ESTIMATE)
4434
        call rfftw_f77_one(plan,in,out)
4435
        call rfftw_f77_destroy_plan(plan)
4436
@end example
4437
@findex rfftw_f77_create_plan
4438
@findex rfftw_f77_one
4439
@findex rfftw_f77_destroy_plan
4440
 
4441
To transform a two-dimensional real array, out of place, you might use
4442
the following:
4443
 
4444
@example
4445
        double precision in
4446
        double complex out
4447
        dimension in(M,N), out(M/2 + 1, N)
4448
        integer plan
4449
 
4450
        call rfftw2d_f77_create_plan(plan,M,N,FFTW_REAL_TO_COMPLEX,
4451
       +                             FFTW_ESTIMATE)
4452
        call rfftwnd_f77_one_real_to_complex(plan, in, out)
4453
        call rfftwnd_f77_destroy_plan(plan)
4454
@end example
4455
@findex rfftw2d_f77_create_plan
4456
@findex rfftwnd_f77_one_real_to_complex
4457
@findex rfftwnd_f77_destroy_plan
4458
 
4459
@b{Important:} Notice that it is the @emph{first} dimension of the
4460
complex output array that is cut in half in Fortran, rather than the
4461
last dimension as in C.  This is a consequence of the wrapper routines
4462
reversing the order of the array dimensions passed to FFTW so that the
4463
Fortran program can use its ordinary column-major order.
4464
@cindex column-major
4465
@cindex rfftwnd array format
4466
 
4467
 
4468
@c ************************************************************
4469
@node Installation and Customization, Acknowledgments, Calling FFTW from Fortran, Top
4470
@chapter Installation and Customization
4471
 
4472
This chapter describes the installation and customization of FFTW, the
4473
latest version of which may be downloaded from
4474
@uref{http://theory.lcs.mit.edu/~fftw, the FFTW home page}.
4475
 
4476
As distributed, FFTW makes very few assumptions about your system.  All
4477
you need is an ANSI C compiler (@code{gcc} is fine, although
4478
vendor-provided compilers often produce faster code).
4479
@cindex compiler
4480
However, installation of FFTW is somewhat simpler if you have a Unix or
4481
a GNU system, such as Linux.  In this chapter, we first describe the
4482
installation of FFTW on Unix and non-Unix systems.  We then describe how
4483
you can customize FFTW to achieve better performance.  Specifically, you
4484
can I) enable @code{gcc}/x86-specific hacks that improve performance on
4485
Pentia and PentiumPro's; II) adapt FFTW to use the high-resolution clock
4486
of your machine, if any; III) produce code (@emph{codelets}) to support
4487
fast transforms of sizes that are not supported efficiently by the
4488
standard FFTW distribution.
4489
@cindex installation
4490
 
4491
@menu
4492
* Installation on Unix::        
4493
* Installation on non-Unix Systems::  
4494
* Installing FFTW in both single and double precision::  
4495
* gcc and Pentium/PentiumPro hacks::  
4496
* Customizing the timer::      
4497
* Generating your own code::    
4498
@end menu
4499
 
4500
@node Installation on Unix, Installation on non-Unix Systems, Installation and Customization, Installation and Customization
4501
@section Installation on Unix
4502
 
4503
FFTW comes with a @code{configure} program in the GNU style.
4504
Installation can be as simple as:
4505
@fpindex configure
4506
 
4507
@example
4508
./configure
4509
make
4510
make install
4511
@end example
4512
 
4513
This will build the uniprocessor complex and real transform libraries
4514
along with the test programs.  We strongly recommend that you use GNU
4515
@code{make} if it is available; on some systems it is called
4516
@code{gmake}.  The ``@code{make install}'' command installs the fftw and
4517
rfftw libraries in standard places, and typically requires root
4518
privileges (unless you specify a different install directory with the
4519
@code{--prefix} flag to @code{configure}).  You can also type
4520
``@code{make check}'' to put the FFTW test programs through their paces.
4521
If you have problems during configuration or compilation, you may want
4522
to run ``@code{make distclean}'' before trying again; this ensures that
4523
you don't have any stale files left over from previous compilation
4524
attempts.
4525
 
4526
The @code{configure} script knows good @code{CFLAGS} (C compiler flags)
4527
@cindex compiler flags
4528
for a few systems.  If your system is not known, the @code{configure}
4529
script will print out a warning.  @footnote{Each version of @code{cc}
4530
seems to have its own magic incantation to get the fastest code most of
4531
the time---you'd think that people would have agreed upon some
4532
convention, e.g. "@code{-Omax}", by now.}  In this case, you can compile
4533
FFTW with the command
4534
@example
4535
make CFLAGS="<write your CFLAGS here>"
4536
@end example
4537
If you do find an optimal set of @code{CFLAGS} for your system, please
4538
let us know what they are (along with the output of @code{config.guess})
4539
so that we can include them in future releases.
4540
 
4541
The @code{configure} program supports all the standard flags defined by
4542
the GNU Coding Standards; see the @code{INSTALL} file in FFTW or
4543
@uref{http://www.gnu.org/prep/standards_toc.html, the GNU web page}.
4544
Note especially @code{--help} to list all flags and
4545
@code{--enable-shared} to create shared, rather than static, libraries.
4546
@code{configure} also accepts a few FFTW-specific flags, particularly:
4547
 
4548
@itemize @bullet
4549
 
4550
@item
4551
@cindex floating-point precision
4552
@code{--enable-float} Produces a single-precision version of FFTW
4553
(@code{float}) instead of the default double-precision (@code{double}).
4554
@xref{Installing FFTW in both single and double precision}.
4555
 
4556
@item
4557
@code{--enable-type-prefix} Adds a @samp{d} or @samp{s} prefix to all
4558
installed libraries and header files to indicate the floating-point
4559
precision.  @xref{Installing FFTW in both single and double
4560
precision}.  (@code{--enable-type-prefix=<prefix>} lets you add an
4561
arbitrary prefix.)  By default, no prefix is used.
4562
 
4563
@item
4564
@cindex threads
4565
@code{--enable-threads} Enables compilation and installation of the FFTW
4566
threads library (@pxref{Multi-threaded FFTW}), which provides a
4567
simple interface to parallel transforms for SMP systems.  (By default,
4568
the threads routines are not compiled.)
4569
 
4570
@item
4571
@cindex MPI
4572
@code{--enable-mpi} Enables compilation and installation of the FFTW MPI
4573
library (@pxref{MPI FFTW}), which provides parallel transforms for
4574
distributed-memory systems with MPI.  (By default, the MPI routines are
4575
not compiled.)
4576
 
4577
@item
4578
@cindex Fortran-callable wrappers
4579
@code{--disable-fortran} Disables inclusion of Fortran-callable wrapper
4580
routines (@pxref{Calling FFTW from Fortran}) in the standard FFTW
4581
libraries.  These wrapper routines increase the library size by only a
4582
negligible amount, so they are included by default as long as the
4583
@code{configure} script finds a Fortran compiler on your system.
4584
 
4585
@item
4586
@code{--with-gcc} Enables the use of @code{gcc}.  By default, FFTW uses
4587
the vendor-supplied @code{cc} compiler if present.  Unfortunately,
4588
@code{gcc} produces slower code than @code{cc} on many systems.
4589
 
4590
@item
4591
@code{--enable-i386-hacks}  See below.
4592
 
4593
@item
4594
@code{--enable-pentium-timer}  See below.
4595
 
4596
@end itemize
4597
 
4598
To force @code{configure} to use a particular C compiler (instead of the
4599
@cindex compiler
4600
default, usually @code{cc}), set the environment variable @code{CC} to
4601
the name of the desired compiler before running @code{configure}; you
4602
may also need to set the flags via the variable @code{CFLAGS}.
4603
@cindex compiler flags
4604
 
4605
@node Installation on non-Unix Systems, Installing FFTW in both single and double precision, Installation on Unix, Installation and Customization
4606
@section Installation on non-Unix Systems
4607
 
4608
It is quite straightforward to install FFTW even on non-Unix systems
4609
lacking the niceties of the @code{configure} script.  The FFTW Home Page
4610
may include some FFTW packages preconfigured for particular
4611
systems/compilers, and also contains installation notes sent in by
4612
@cindex compiler
4613
users.  All you really need to do, though, is to compile all of the
4614
@code{.c} files in the appropriate directories of the FFTW package.
4615
(You needn't worry about the many extraneous files lying around.)
4616
 
4617
For the complex transforms, compile all of the @code{.c} files in the
4618
@code{fftw} directory and link them into a library.  Similarly, for the
4619
real transforms, compile all of the @code{.c} files in the @code{rfftw}
4620
directory into a library.  Note that these sources @code{#include}
4621
various files in the @code{fftw} and @code{rfftw} directories, so you
4622
may need to set up the @code{#include} paths for your compiler
4623
appropriately.  Be sure to enable the highest-possible level of
4624
optimization in your compiler.
4625
 
4626
@cindex floating-point precision
4627
By default, FFTW is compiled for double-precision transforms.  To work
4628
in single precision rather than double precision, @code{#define} the
4629
symbol @code{FFTW_ENABLE_FLOAT} in @code{fftw.h} (in the @code{fftw}
4630
directory) and (re)compile FFTW.
4631
 
4632
These libraries should be linked with any program that uses the
4633
corresponding transforms.  The required header files, @code{fftw.h} and
4634
@code{rfftw.h}, are located in the @code{fftw} and @code{rfftw}
4635
directories respectively; you may want to put them with the libraries,
4636
or wherever header files normally go on your system.
4637
 
4638
FFTW includes test programs, @code{fftw_test} and @code{rfftw_test}, in
4639
@pindex fftw_test
4640
@pindex rfftw_test
4641
the @code{tests} directory.  These are compiled and linked like any
4642
program using FFTW, except that they use additional header files located
4643
in the @code{fftw} and @code{rfftw} directories, so you will need to set
4644
your compiler @code{#include} paths appropriately.  @code{fftw_test} is
4645
compiled from @code{fftw_test.c} and @code{test_main.c}, while
4646
@code{rfftw_test} is compiled from @code{rfftw_test.c} and
4647
@code{test_main.c}.  When you run these programs, you will be prompted
4648
interactively for various possible tests to perform; see also
4649
@code{tests/README} for more information.
4650
 
4651
@node Installing FFTW in both single and double precision, gcc and Pentium/PentiumPro hacks, Installation on non-Unix Systems, Installation and Customization
4652
@section Installing FFTW in both single and double precision
4653
 
4654
@cindex floating-point precision
4655
It is often useful to install both single- and double-precision versions
4656
of the FFTW libraries on the same machine, and we provide a convenient
4657
mechanism for achieving this on Unix systems.
4658
 
4659
@fpindex configure
4660
When the @code{--enable-type-prefix} option of configure is used, the
4661
FFTW libraries and header files are installed with a prefix of @samp{d}
4662
or @samp{s}, depending upon whether you compiled in double or single
4663
precision.  Then, instead of linking your program with @code{-lrfftw
4664
-lfftw}, for example, you would link with @code{-ldrfftw -ldfftw} to use
4665
the double-precision version or with @code{-lsrfftw -lsfftw} to use the
4666
single-precision version.  Also, you would @code{#include}
4667
@code{<drfftw.h>} or @code{<srfftw.h>} instead of @code{<rfftw.h>}, and
4668
so on.
4669
 
4670
@emph{The names of FFTW functions, data types, and constants remain
4671
unchanged!}  You still call, for instance, @code{fftw_one} and not
4672
@code{dfftw_one}.  Only the names of header files and libraries are
4673
modified.  One consequence of this is that @emph{you @b{cannot} use both
4674
the single- and double-precision FFTW libraries in the same program,
4675
simultaneously,} as the function names would conflict.
4676
 
4677
So, to install both the single- and double-precision libraries on the
4678
same machine, you would do:
4679
 
4680
@example
4681
./configure --enable-type-prefix @i{[ other options ]}
4682
make
4683
make install
4684
make clean
4685
./configure --enable-float --enable-type-prefix @i{[ other options ]}
4686
make
4687
make install
4688
@end example
4689
 
4690
@node gcc and Pentium/PentiumPro hacks, Customizing the timer, Installing FFTW in both single and double precision, Installation and Customization
4691
@section @code{gcc} and Pentium/PentiumPro hacks
4692
@cindex Pentium hack
4693
The @code{configure} option @code{--enable-i386-hacks} enables specific
4694
optimizations for @code{gcc} and Pentium/PentiumPro, which can
4695
significantly improve performance of double-precision transforms.
4696
Specifically, we have tested these hacks on Linux with @code{gcc}
4697
2.[78] and versions of @code{egcs} since 1.0.3.  These optimizations
4698
only affect the performance, not the correctness of FFTW (i.e. it is
4699
always safe to try them out).
4700
 
4701
These hacks provide a workaround to the incorrect alignment of local
4702
@code{double} variables in @code{gcc}.  The compiler aligns these
4703
@cindex compiler
4704
variables to multiples of 4 bytes, but execution is much faster (on
4705
Pentium and PentiumPro) if @code{double}s are aligned to a multiple of 8
4706
bytes.  By carefully counting the number of variables allocated by the
4707
compiler in performance-critical regions of the code, we have been able
4708
to introduce dummy allocations (using @code{alloca}) that align the
4709
stack properly.  The hack depends crucially on the compiler flags that
4710
are used.  For example, it won't work without
4711
@code{-fomit-frame-pointer}.
4712
 
4713
The @code{fftw_test} program outputs speed measurements that you can use
4714
to see if these hacks are beneficial.
4715
@pindex fftw_test
4716
@cindex benchmark
4717
 
4718
The @code{configure} option @code{--enable-pentium-timer} enables the
4719
use of the Pentium and PentiumPro cycle counter for timing purposes.  In
4720
order to get correct results, you must define @code{FFTW_CYCLES_PER_SEC}
4721
in @code{fftw/config.h} to be the clock speed of your processor; the
4722
resulting FFTW library will be nonportable.  The use of this option is
4723
deprecated.  On serious operating systems (such as Linux), FFTW uses
4724
@code{gettimeofday()}, which has enough resolution and is portable.
4725
(Note that Win32 has its own high-resolution timing routines as well.
4726
FFTW contains unsupported code to use these routines.)
4727
 
4728
@node Customizing the timer, Generating your own code, gcc and Pentium/PentiumPro hacks, Installation and Customization
4729
@section Customizing the timer
4730
@cindex timer, customization of
4731
 
4732
FFTW needs a reasonably-precise clock in order to find the optimal way
4733
to compute a transform.  On Unix systems, @code{configure} looks for
4734
@code{gettimeofday} and other system-specific timers.  If it does not
4735
find any high resolution clock, it defaults to using the @code{clock()}
4736
function, which is very portable, but forces FFTW to run for a long time
4737
in order to get reliable measurements.
4738
@ffindex gettimeofday
4739
@ffindex clock
4740
 
4741
If your machine supports a high-resolution clock not recognized by FFTW,
4742
it is therefore advisable to use it.  You must edit
4743
@code{fftw/fftw-int.h}.  There are a few macros you must redefine.  The
4744
code is documented and should be self-explanatory.  (By the way,
4745
@code{fftw-int} stands for @code{fftw-internal}, but for some
4746
inexplicable reason people are still using primitive systems with 8.3
4747
filenames.)
4748
 
4749
Even if you don't install high-resolution timing code, we still
4750
recommend that you look at the @code{FFTW_TIME_MIN} constant in
4751
@ctindex FFTW_TIME_MIN
4752
@code{fftw/fftw-int.h}. This constant holds the minimum time interval (in
4753
seconds) required to get accurate timing measurements, and should be (at
4754
least) several hundred times the resolution of your clock.  The default
4755
constants are on the conservative side, and may cause FFTW to take
4756
longer than necessary when you create a plan. Set @code{FFTW_TIME_MIN}
4757
to whatever is appropriate on your system (be sure to set the
4758
@emph{right} @code{FFTW_TIME_MIN}@dots{}there are several definitions in
4759
@code{fftw-int.h}, corresponding to different platforms and timers).
4760
 
4761
As an aid in checking the resolution of your clock, you can use the
4762
@code{tests/fftw_test} program with the @code{-t} option
4763
(c.f. @code{tests/README}). Remember, the mere fact that your clock
4764
reports times in, say, picoseconds, does not mean that it is actually
4765
@emph{accurate} to that resolution.
4766
 
4767
@node Generating your own code,  , Customizing the timer, Installation and Customization
4768
@section Generating your own code
4769
@cindex Caml
4770
@cindex ML
4771
@cindex code generator
4772
 
4773
If you know that you will only use transforms of a certain size (say,
4774
powers of @math{2}) and want to reduce the size of the library, you can
4775
reconfigure FFTW to support only those sizes you are interested in.  You
4776
may even generate code to enable efficient transforms of a size not
4777
supported by the default distribution.  The default distribution
4778
supports transforms of any size, but not all sizes are equally fast.
4779
The default installation of FFTW is best at handling sizes of the form
4780
@ifinfo
4781
@math{2^a 3^b 5^c 7^d 11^e 13^f},
4782
@end ifinfo
4783
@iftex
4784
@tex
4785
$2^a 3^b 5^c 7^d 11^e 13^f$,
4786
@end tex
4787
@end iftex
4788
@ifhtml
4789
2<SUP>a</SUP> 3<SUP>b</SUP> 5<SUP>c</SUP> 7<SUP>d</SUP>
4790
        11<SUP>e</SUP> 13<SUP>f</SUP>,
4791
@end ifhtml
4792
where @math{e+f} is either @math{0} or
4793
@math{1}, and the other exponents are arbitrary.  Other sizes are
4794
computed by means of a slow, general-purpose routine.  However, if you
4795
have an application that requires fast transforms of size, say,
4796
@code{17}, there is a way to generate specialized code to handle that.
4797
 
4798
The directory @code{gensrc} contains all the programs and scripts that
4799
were used to generate FFTW.  In particular, the program
4800
@code{gensrc/genfft.ml} was used to generate the code that FFTW uses to
4801
compute the transforms.  We do not expect casual users to use it.
4802
@code{genfft} is a rather sophisticated program that generates directed
4803
acyclic graphs of FFT algorithms and performs algebraic simplifications
4804
on them.  @code{genfft} is written in Objective Caml, a dialect of ML.
4805
Objective Caml is described at @uref{http://pauillac.inria.fr/ocaml/}
4806
and can be downloaded from from @uref{ftp://ftp.inria.fr/lang/caml-light}.
4807
@pindex genfft
4808
@cindex Caml
4809
 
4810
If you have Objective Caml installed, you can type @code{sh
4811
bootstrap.sh} in the top-level directory to re-generate the files.  If
4812
you change the @code{gensrc/config} file, you can optimize FFTW for
4813
sizes that are not currently supported efficiently (say, 17 or 19).
4814
 
4815
We do not provide more details about the code-generation process, since
4816
we do not expect that users will need to generate their own code.
4817
However, feel free to contact us at @email{fftw@@theory.lcs.mit.edu} if
4818
you are interested in the subject.  
4819
 
4820
@cindex monadic programming
4821
You might find it interesting to learn Caml and/or some modern
4822
programming techniques that we used in the generator (including monadic
4823
programming), especially if you heard the rumor that Java and
4824
object-oriented programming are the latest advancement in the field.
4825
The internal operation of the codelet generator is described in the
4826
paper, ``A Fast Fourier Transform Compiler,'' by M. Frigo, which is
4827
available from the @uref{http://theory.lcs.mit.edu/~fftw,FFTW home page}
4828
and will appear in the @cite{Proceedings of the 1999 ACM SIGPLAN
4829
Conference on Programming Language Design and Implementation (PLDI)}.
4830
 
4831
@c ************************************************************
4832
@node Acknowledgments, License and Copyright, Installation and Customization, Top
4833
@chapter Acknowledgments
4834
 
4835
Matteo Frigo was supported in part by the Defense Advanced Research
4836
Projects Agency (DARPA) under Grants N00014-94-1-0985 and
4837
F30602-97-1-0270, and by a Digital Equipment Corporation Fellowship.
4838
Steven G. Johnson was supported in part by a DoD NDSEG Fellowship, an
4839
MIT Karl Taylor Compton Fellowship, and by the Materials Research
4840
Science and Engineering Center program of the National Science
4841
Foundation under award DMR-9400334.
4842
 
4843
Both authors were also supported in part by their respective
4844
girlfriends, by the letters ``Q'' and ``R'', and by the number 12.
4845
@cindex girlfriends
4846
 
4847
We are grateful to SUN Microsystems Inc.@ for its donation of a cluster
4848
of 9 8-processor Ultra HPC 5000 SMPs (24 Gflops peak). These machines
4849
served as the primary platform for the development of earlier versions
4850
of FFTW.
4851
 
4852
We thank Intel Corporation for donating a four-processor Pentium Pro
4853
machine.  We thank the Linux community for giving us a decent OS to run
4854
on that machine.
4855
 
4856
The @code{genfft} program was written using Objective Caml, a dialect of
4857
ML.  Objective Caml is a small and elegant language developed by Xavier
4858
Leroy.  The implementation is available from @code{ftp.inria.fr} in the
4859
directory @code{lang/caml-light}.  We used versions 1.07 and 2.00 of the
4860
software.  In previous releases of FFTW, @code{genfft} was written in
4861
Caml Light, by the same authors.  An even earlier implementation of
4862
@code{genfft} was written in Scheme, but Caml is definitely better for
4863
this kind of application.
4864
@pindex genfft
4865
@cindex Caml
4866
@cindex LISP
4867
 
4868
FFTW uses many tools from the GNU project, including @code{automake},
4869
@code{texinfo}, and @code{libtool}.
4870
 
4871
Prof.@ Charles E.@ Leiserson of MIT provided continuous support and
4872
encouragement.  This program would not exist without him.  Charles also
4873
proposed the name ``codelets'' for the basic FFT blocks.
4874
 
4875
Prof.@ John D.@ Joannopoulos of MIT demonstrated continuing tolerance of
4876
Steven's ``extra-curricular'' computer-science activities.  Steven's
4877
chances at a physics degree would not exist without him.
4878
 
4879
Andrew Sterian contributed the Windows timing code.  
4880
 
4881
Didier Miras reported a bug in the test procedure used in FFTW 1.2.  We
4882
now use a completely different test algorithm by Funda Ergun that does
4883
not require a separate FFT program to compare against.
4884
 
4885
Wolfgang Reimer contributed the Pentium cycle counter and a few fixes
4886
that help portability.
4887
 
4888
Ming-Chang Liu uncovered a well-hidden bug in the complex transforms of
4889
FFTW 2.0 and supplied a patch to correct it.
4890
 
4891
The FFTW FAQ was written in @code{bfnn} (Bizarre Format With No Name)
4892
and formatted using the tools developed by Ian Jackson for the Linux
4893
FAQ.
4894
 
4895
@emph{We are especially thankful to all of our users for their
4896
continuing support, feedback, and interest during our development of
4897
FFTW.}
4898
 
4899
@c ************************************************************
4900
@node License and Copyright, Concept Index, Acknowledgments, Top
4901
@chapter License and Copyright
4902
 
4903
FFTW is copyright @copyright{} 1997--1999 Massachusetts Institute of
4904
Technology.
4905
 
4906
FFTW is free software; you can redistribute it and/or modify
4907
it under the terms of the GNU General Public License as published by
4908
the Free Software Foundation; either version 2 of the License, or
4909
(at your option) any later version.
4910
 
4911
This program is distributed in the hope that it will be useful,
4912
but WITHOUT ANY WARRANTY; without even the implied warranty of
4913
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
4914
GNU General Public License for more details.
4915
 
4916
You should have received a copy of the GNU General Public License along
4917
with this program; if not, write to the Free Software Foundation, Inc.,
4918
59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.  You can also
4919
find the @uref{http://www.gnu.org/copyleft/gpl.html, GPL on the GNU web
4920
site}.
4921
 
4922
In addition, we kindly ask you to acknowledge FFTW and its authors in
4923
any program or publication in which you use FFTW.  (You are not
4924
@emph{required} to do so; it is up to your common sense to decide
4925
whether you want to comply with this request or not.)
4926
 
4927
Non-free versions of FFTW are available under terms different than the
4928
General Public License. (e.g. they do not require you to accompany any
4929
object code using FFTW with the corresponding source code.)  For these
4930
alternate terms you must purchase a license from MIT's Technology
4931
Licensing Office.  Users interested in such a license should contact us
4932
(@email{fftw@@theory.lcs.mit.edu}) for more information.
4933
 
4934
@node Concept Index, Library Index, License and Copyright, Top
4935
@chapter Concept Index
4936
@printindex cp
4937
 
4938
@node Library Index,  , Concept Index, Top
4939
@chapter Library Index
4940
@printindex fn
4941
 
4942
@c ************************************************************
4943
@contents
4944
 
4945
@bye
4946
@c  LocalWords:  texinfo fftw texi Exp setfilename settitle setchapternewpage
4947
@c  LocalWords:  syncodeindex fn cp vr pg tp ifinfo titlepage sp Matteo Frigo
4948
@c  LocalWords:  vskip pt filll dir detailmenu halfcomplex DFT fftwnd rfftw gcc
4949
@c  LocalWords:  rfftwnd Pentium PentiumPro NJ FFT cindex ESSL FFTW's emph uref
4950
@c  LocalWords:  http lcs mit edu benchfft benchFFT pindex dfn iftex tex cdot
4951
@c  LocalWords:  ifhtml nbsp TR Sep Proc ICASSP Tukey Rader's ref xref int FFTs
4952
@c  LocalWords:  findex tindex vindex im strided pxref DC lfftw lm samp const
4953
@c  LocalWords:  nx ny nz ldots rk ik hc freq lrfftw datatypes cdots pointwise
4954
@c  LocalWords:  pinv ij rote increaseth Ecclesiastes nerd stdout fopen printf
4955
@c  LocalWords:  fclose dimension's malloc sizeof comp lang www eskimo com scs
4956
@c  LocalWords:  faq html README Cilk SMP cilk POSIX Solaris BeOS MacOS mpi mcs
4957
@c  LocalWords:  anl gov mutex THREADSAFE struct leq conj lt hbox istride lg rt
4958
@c  LocalWords:  ostride subsubheading howmany idist odist exp IMG SRC gif IFFT
4959
@c  LocalWords:  frftw dist datatype datatype's nc noindent callback putc getc
4960
@c  LocalWords:  EOF Pentia PentiumPro's codelets CFLAGS cc Omax config org toc
4961
@c  LocalWords:  pentium preconfigured egcs alloca fomit SEC gettimeofday MIN
4962
@c  LocalWords:  picoseconds Caml gensrc genfft pauillac inria fr ocaml ftp sh
4963
@c  LocalWords:  caml DoD NDSEG DMR HPC SMPs Gflops automake libtool Leiserson
4964
@c  LocalWords:  Sterian Didier Miras Funda Ergun Reimer bfnn copyleft gpl
4965
@c  LocalWords:  printindex LocalWords