Subversion Repositories shark

Rev

Details | Last modification | View Log | RSS feed

Rev Author Line No. Line
2 pj 1
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
2
<HTML>
3
<HEAD>
4
<!-- This HTML file has been created by texi2html 1.52
5
     from fftw.texi on 18 May 1999 -->
6
 
7
<TITLE>FFTW - Introduction</TITLE>
8
</HEAD>
9
<BODY TEXT="#000000" BGCOLOR="#FFFFFF">
10
Go to the first, previous, <A HREF="fftw_2.html">next</A>, <A HREF="fftw_10.html">last</A> section, <A HREF="fftw_toc.html">table of contents</A>.
11
<P><HR><P>
12
 
13
 
14
<H1><A NAME="SEC1">Introduction</A></H1>
15
<P>
16
This manual documents version 2.1.2 of FFTW, the <EM>Fastest
17
Fourier Transform in the West</EM>.  FFTW is a comprehensive collection of
18
fast C routines for computing the discrete Fourier transform (DFT) in
19
one or more dimensions, of both real and complex data, and of arbitrary
20
input size.  FFTW also includes parallel transforms for both shared- and
21
distributed-memory systems.  We assume herein that the reader is already
22
familiar with the properties and uses of the DFT that are relevant to
23
her application.  Otherwise, see e.g. <CITE>The Fast Fourier Transform</CITE>
24
by E. O. Brigham (Prentice-Hall, Englewood Cliffs, NJ, 1974).
25
<A HREF="http://theory.lcs.mit.edu/~fftw">Our web page</A> also has links to
26
FFT-related information online.
27
<A NAME="IDX1"></A>
28
 
29
 
30
<P>
31
FFTW is usually faster (and sometimes much faster) than all other
32
freely-available Fourier transform programs found on the Net.  For
33
transforms whose size is a power of two, it compares favorably with the
34
FFT codes in Sun's Performance Library and IBM's ESSL library, which are
35
targeted at specific machines.  Moreover, FFTW's performance is
36
<EM>portable</EM>.  Indeed, FFTW is unique in that it automatically adapts
37
itself to your machine, your cache, the size of your memory, the number
38
of registers, and all the other factors that normally make it impossible
39
to optimize a program for more than one machine.  An extensive
40
comparison of FFTW's performance with that of other Fourier transform
41
codes has been made. The results are available on the Web at
42
<A HREF="http://theory.lcs.mit.edu/~benchfft">the benchFFT home page</A>.
43
<A NAME="IDX2"></A>
44
<A NAME="IDX3"></A>
45
 
46
 
47
<P>
48
In order to use FFTW effectively, you need to understand one basic
49
concept of FFTW's internal structure.  FFTW does not used a fixed
50
algorithm for computing the transform, but it can adapt the DFT
51
algorithm to details of the underlying hardware in order to achieve best
52
performance.  Hence, the computation of the transform is split into two
53
phases.  First, FFTW's <EM>planner</EM> is called, which "learns" the
54
<A NAME="IDX4"></A>
55
fastest way to compute the transform on your machine.  The planner
56
<A NAME="IDX5"></A>
57
produces a data structure called a <EM>plan</EM> that contains this
58
information.  Subsequently, the plan is passed to FFTW's <EM>executor</EM>,
59
<A NAME="IDX6"></A>
60
along with an array of input data.  The executor computes the actual
61
transform, as dictated by the plan.  The plan can be reused as many
62
times as needed.  In typical high-performance applications, many
63
transforms of the same size are computed, and consequently a
64
relatively-expensive initialization of this sort is acceptable.  On the
65
other hand, if you need a single transform of a given size, the one-time
66
cost of the planner becomes significant.  For this case, FFTW provides
67
fast planners based on heuristics or on previously computed plans.
68
 
69
 
70
<P>
71
The pattern of planning/execution applies to all four operation modes of
72
FFTW, that is, I) one-dimensional complex transforms (FFTW), II)
73
multi-dimensional complex transforms (FFTWND), III) one-dimensional
74
transforms of real data (RFFTW), IV) multi-dimensional transforms of
75
real data (RFFTWND).  Each mode comes with its own planner and executor.
76
 
77
 
78
<P>
79
Besides the automatic performance adaptation performed by the planner,
80
it is also possible for advanced users to customize FFTW for their
81
special needs.  As distributed, FFTW works most efficiently for arrays
82
whose size can be factored into small primes (2, 3,
83
5, and 7), and uses a slower general-purpose routine for
84
other factors.  FFTW, however, comes with a code generator that can
85
produce fast C programs for any particular array size you may care
86
about.
87
<A NAME="IDX7"></A>
88
For example, if you need transforms of size
89
513&nbsp;=&nbsp;19*3<sup>3</sup>,
90
you can customize FFTW to support the factor 19 efficiently.
91
 
92
 
93
<P>
94
FFTW can exploit multiple processors if you have them.  FFTW comes with
95
a shared-memory implementation on top of POSIX (and similar) threads, as
96
well as a distributed-memory implementation based on MPI.
97
<A NAME="IDX8"></A>
98
<A NAME="IDX9"></A>
99
<A NAME="IDX10"></A>
100
We also provide an experimental parallel implementation written in Cilk,
101
<EM>the superior programming tool of choice for discriminating
102
hackers</EM> (Olin Shivers).  (See <A HREF="http://supertech.lcs.mit.edu/cilk">the Cilk home page</A>.)
103
<A NAME="IDX11"></A>
104
 
105
 
106
<P>
107
For more information regarding FFTW, see the paper, "The Fastest
108
Fourier Transform in the West," by M. Frigo and S. G. Johnson, which is
109
the technical report MIT-LCS-TR-728 (Sep. '97).  See also, "FFTW: An
110
Adaptive Software Architecture for the FFT," by M. Frigo and
111
S. G. Johnson, which appeared in the 23rd International Conference on
112
Acoustics, Speech, and Signal Processing (<CITE>Proc. ICASSP 1998</CITE>
113
<B>3</B>, p. 1381).  The code generator is described in the paper "A Fast
114
Fourier Transform Compiler",
115
<A NAME="IDX12"></A>
116
by M. Frigo, to appear in the <CITE>Proceedings of the 1999 ACM SIGPLAN
117
Conference on Programming Language Design and Implementation (PLDI),
118
Atlanta, Georgia, May 1999</CITE>.  These papers, along with the latest
119
version of FFTW, the FAQ, benchmarks, and other links, are available at
120
<A HREF="http://theory.lcs.mit.edu/~fftw">the FFTW home page</A>.  The current
121
version of FFTW incorporates many good ideas from the past thirty years
122
of FFT literature.  In one way or another, FFTW uses the Cooley-Tukey
123
algorithm, the Prime Factor algorithm, Rader's algorithm for prime
124
sizes, and the split-radix algorithm (with a variation due to Dan
125
Bernstein).  Our code generator also produces new algorithms that we do
126
not yet completely understand.
127
<A NAME="IDX13"></A>
128
The reader is referred to the cited papers for the appropriate
129
references.
130
 
131
 
132
<P>
133
The rest of this manual is organized as follows.  We first discuss the
134
sequential (one-processor) implementation.  We start by describing the
135
basic features of FFTW in Section <A HREF="fftw_2.html#SEC2">Tutorial</A>.  This discussion includes the
136
storage scheme of multi-dimensional arrays (Section <A HREF="fftw_2.html#SEC7">Multi-dimensional Array Format</A>) and FFTW's mechanisms for storing plans on disk (Section <A HREF="fftw_2.html#SEC13">Words of Wisdom</A>).  Next, Section <A HREF="fftw_3.html#SEC16">FFTW Reference</A> provides comprehensive
137
documentation of all FFTW's features.  Parallel transforms are discussed
138
in their own chapter Section <A HREF="fftw_4.html#SEC47">Parallel FFTW</A>.  Fortran programmers can also
139
use FFTW, as described in Section <A HREF="fftw_5.html#SEC62">Calling FFTW from Fortran</A>.
140
Section <A HREF="fftw_6.html#SEC66">Installation and Customization</A> explains how to install FFTW in
141
your computer system and how to adapt FFTW to your needs.  License and
142
copyright information is given in Section <A HREF="fftw_8.html#SEC74">License and Copyright</A>.  Finally,
143
we thank all the people who helped us in Section <A HREF="fftw_7.html#SEC73">Acknowledgments</A>.
144
 
145
 
146
<P><HR><P>
147
Go to the first, previous, <A HREF="fftw_2.html">next</A>, <A HREF="fftw_10.html">last</A> section, <A HREF="fftw_toc.html">table of contents</A>.
148
</BODY>
149
</HTML>