Rev 3 | Details | Compare with Previous | Last modification | View Log | RSS feed
Rev | Author | Line No. | Line |
---|---|---|---|
2 | pj | 1 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN"> |
2 | <HTML> |
||
3 | <HEAD> |
||
4 | <!-- This HTML file has been created by texi2html 1.52 |
||
5 | from fftw.texi on 18 May 1999 --> |
||
6 | |||
7 | <TITLE>FFTW - Introduction</TITLE> |
||
8 | </HEAD> |
||
9 | <BODY TEXT="#000000" BGCOLOR="#FFFFFF"> |
||
10 | Go to the first, previous, <A HREF="fftw_2.html">next</A>, <A HREF="fftw_10.html">last</A> section, <A HREF="fftw_toc.html">table of contents</A>. |
||
11 | <P><HR><P> |
||
12 | |||
13 | |||
14 | <H1><A NAME="SEC1">Introduction</A></H1> |
||
15 | <P> |
||
16 | This manual documents version 2.1.2 of FFTW, the <EM>Fastest |
||
17 | Fourier Transform in the West</EM>. FFTW is a comprehensive collection of |
||
18 | fast C routines for computing the discrete Fourier transform (DFT) in |
||
19 | one or more dimensions, of both real and complex data, and of arbitrary |
||
20 | input size. FFTW also includes parallel transforms for both shared- and |
||
21 | distributed-memory systems. We assume herein that the reader is already |
||
22 | familiar with the properties and uses of the DFT that are relevant to |
||
23 | her application. Otherwise, see e.g. <CITE>The Fast Fourier Transform</CITE> |
||
24 | by E. O. Brigham (Prentice-Hall, Englewood Cliffs, NJ, 1974). |
||
25 | <A HREF="http://theory.lcs.mit.edu/~fftw">Our web page</A> also has links to |
||
26 | FFT-related information online. |
||
27 | <A NAME="IDX1"></A> |
||
28 | |||
29 | |||
30 | <P> |
||
31 | FFTW is usually faster (and sometimes much faster) than all other |
||
32 | freely-available Fourier transform programs found on the Net. For |
||
33 | transforms whose size is a power of two, it compares favorably with the |
||
34 | FFT codes in Sun's Performance Library and IBM's ESSL library, which are |
||
35 | targeted at specific machines. Moreover, FFTW's performance is |
||
36 | <EM>portable</EM>. Indeed, FFTW is unique in that it automatically adapts |
||
37 | itself to your machine, your cache, the size of your memory, the number |
||
38 | of registers, and all the other factors that normally make it impossible |
||
39 | to optimize a program for more than one machine. An extensive |
||
40 | comparison of FFTW's performance with that of other Fourier transform |
||
41 | codes has been made. The results are available on the Web at |
||
42 | <A HREF="http://theory.lcs.mit.edu/~benchfft">the benchFFT home page</A>. |
||
43 | <A NAME="IDX2"></A> |
||
44 | <A NAME="IDX3"></A> |
||
45 | |||
46 | |||
47 | <P> |
||
48 | In order to use FFTW effectively, you need to understand one basic |
||
49 | concept of FFTW's internal structure. FFTW does not used a fixed |
||
50 | algorithm for computing the transform, but it can adapt the DFT |
||
51 | algorithm to details of the underlying hardware in order to achieve best |
||
52 | performance. Hence, the computation of the transform is split into two |
||
53 | phases. First, FFTW's <EM>planner</EM> is called, which "learns" the |
||
54 | <A NAME="IDX4"></A> |
||
55 | fastest way to compute the transform on your machine. The planner |
||
56 | <A NAME="IDX5"></A> |
||
57 | produces a data structure called a <EM>plan</EM> that contains this |
||
58 | information. Subsequently, the plan is passed to FFTW's <EM>executor</EM>, |
||
59 | <A NAME="IDX6"></A> |
||
60 | along with an array of input data. The executor computes the actual |
||
61 | transform, as dictated by the plan. The plan can be reused as many |
||
62 | times as needed. In typical high-performance applications, many |
||
63 | transforms of the same size are computed, and consequently a |
||
64 | relatively-expensive initialization of this sort is acceptable. On the |
||
65 | other hand, if you need a single transform of a given size, the one-time |
||
66 | cost of the planner becomes significant. For this case, FFTW provides |
||
67 | fast planners based on heuristics or on previously computed plans. |
||
68 | |||
69 | |||
70 | <P> |
||
71 | The pattern of planning/execution applies to all four operation modes of |
||
72 | FFTW, that is, I) one-dimensional complex transforms (FFTW), II) |
||
73 | multi-dimensional complex transforms (FFTWND), III) one-dimensional |
||
74 | transforms of real data (RFFTW), IV) multi-dimensional transforms of |
||
75 | real data (RFFTWND). Each mode comes with its own planner and executor. |
||
76 | |||
77 | |||
78 | <P> |
||
79 | Besides the automatic performance adaptation performed by the planner, |
||
80 | it is also possible for advanced users to customize FFTW for their |
||
81 | special needs. As distributed, FFTW works most efficiently for arrays |
||
82 | whose size can be factored into small primes (2, 3, |
||
83 | 5, and 7), and uses a slower general-purpose routine for |
||
84 | other factors. FFTW, however, comes with a code generator that can |
||
85 | produce fast C programs for any particular array size you may care |
||
86 | about. |
||
87 | <A NAME="IDX7"></A> |
||
88 | For example, if you need transforms of size |
||
89 | 513 = 19*3<sup>3</sup>, |
||
90 | you can customize FFTW to support the factor 19 efficiently. |
||
91 | |||
92 | |||
93 | <P> |
||
94 | FFTW can exploit multiple processors if you have them. FFTW comes with |
||
95 | a shared-memory implementation on top of POSIX (and similar) threads, as |
||
96 | well as a distributed-memory implementation based on MPI. |
||
97 | <A NAME="IDX8"></A> |
||
98 | <A NAME="IDX9"></A> |
||
99 | <A NAME="IDX10"></A> |
||
100 | We also provide an experimental parallel implementation written in Cilk, |
||
101 | <EM>the superior programming tool of choice for discriminating |
||
102 | hackers</EM> (Olin Shivers). (See <A HREF="http://supertech.lcs.mit.edu/cilk">the Cilk home page</A>.) |
||
103 | <A NAME="IDX11"></A> |
||
104 | |||
105 | |||
106 | <P> |
||
107 | For more information regarding FFTW, see the paper, "The Fastest |
||
108 | Fourier Transform in the West," by M. Frigo and S. G. Johnson, which is |
||
109 | the technical report MIT-LCS-TR-728 (Sep. '97). See also, "FFTW: An |
||
110 | Adaptive Software Architecture for the FFT," by M. Frigo and |
||
111 | S. G. Johnson, which appeared in the 23rd International Conference on |
||
112 | Acoustics, Speech, and Signal Processing (<CITE>Proc. ICASSP 1998</CITE> |
||
113 | <B>3</B>, p. 1381). The code generator is described in the paper "A Fast |
||
114 | Fourier Transform Compiler", |
||
115 | <A NAME="IDX12"></A> |
||
116 | by M. Frigo, to appear in the <CITE>Proceedings of the 1999 ACM SIGPLAN |
||
117 | Conference on Programming Language Design and Implementation (PLDI), |
||
118 | Atlanta, Georgia, May 1999</CITE>. These papers, along with the latest |
||
119 | version of FFTW, the FAQ, benchmarks, and other links, are available at |
||
120 | <A HREF="http://theory.lcs.mit.edu/~fftw">the FFTW home page</A>. The current |
||
121 | version of FFTW incorporates many good ideas from the past thirty years |
||
122 | of FFT literature. In one way or another, FFTW uses the Cooley-Tukey |
||
123 | algorithm, the Prime Factor algorithm, Rader's algorithm for prime |
||
124 | sizes, and the split-radix algorithm (with a variation due to Dan |
||
125 | Bernstein). Our code generator also produces new algorithms that we do |
||
126 | not yet completely understand. |
||
127 | <A NAME="IDX13"></A> |
||
128 | The reader is referred to the cited papers for the appropriate |
||
129 | references. |
||
130 | |||
131 | |||
132 | <P> |
||
133 | The rest of this manual is organized as follows. We first discuss the |
||
134 | sequential (one-processor) implementation. We start by describing the |
||
135 | basic features of FFTW in Section <A HREF="fftw_2.html#SEC2">Tutorial</A>. This discussion includes the |
||
136 | storage scheme of multi-dimensional arrays (Section <A HREF="fftw_2.html#SEC7">Multi-dimensional Array Format</A>) and FFTW's mechanisms for storing plans on disk (Section <A HREF="fftw_2.html#SEC13">Words of Wisdom</A>). Next, Section <A HREF="fftw_3.html#SEC16">FFTW Reference</A> provides comprehensive |
||
137 | documentation of all FFTW's features. Parallel transforms are discussed |
||
138 | in their own chapter Section <A HREF="fftw_4.html#SEC47">Parallel FFTW</A>. Fortran programmers can also |
||
139 | use FFTW, as described in Section <A HREF="fftw_5.html#SEC62">Calling FFTW from Fortran</A>. |
||
140 | Section <A HREF="fftw_6.html#SEC66">Installation and Customization</A> explains how to install FFTW in |
||
141 | your computer system and how to adapt FFTW to your needs. License and |
||
142 | copyright information is given in Section <A HREF="fftw_8.html#SEC74">License and Copyright</A>. Finally, |
||
143 | we thank all the people who helped us in Section <A HREF="fftw_7.html#SEC73">Acknowledgments</A>. |
||
144 | |||
145 | |||
146 | <P><HR><P> |
||
147 | Go to the first, previous, <A HREF="fftw_2.html">next</A>, <A HREF="fftw_10.html">last</A> section, <A HREF="fftw_toc.html">table of contents</A>. |
||
148 | </BODY> |
||
149 | </HTML> |