<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.52
     from fftw.texi on 18 May 1999 -->

<TITLE>FFTW - Installation and Customization</TITLE>
</HEAD>
<BODY TEXT="#000000" BGCOLOR="#FFFFFF">
Go to the <A HREF="fftw_1.html">first</A>, <A HREF="fftw_5.html">previous</A>, <A HREF="fftw_7.html">next</A>, <A HREF="fftw_10.html">last</A> section, <A HREF="fftw_toc.html">table of contents</A>.
<P><HR><P>


<H1><A NAME="SEC66">Installation and Customization</A></H1>

<P>
This chapter describes the installation and customization of FFTW, the
latest version of which may be downloaded from
<A HREF="http://theory.lcs.mit.edu/~fftw">the FFTW home page</A>.


<P>
As distributed, FFTW makes very few assumptions about your system.  All
you need is an ANSI C compiler (<CODE>gcc</CODE> is fine, although
vendor-provided compilers often produce faster code).
<A NAME="IDX313"></A>
However, installation of FFTW is somewhat simpler if you have a Unix or
a GNU system, such as Linux.  In this chapter, we first describe the
installation of FFTW on Unix and non-Unix systems.  We then describe how
you can customize FFTW to achieve better performance.  Specifically, you
can I) enable <CODE>gcc</CODE>/x86-specific hacks that improve performance on
Pentium and PentiumPro processors; II) adapt FFTW to use the high-resolution
clock of your machine, if any; and III) produce code (<EM>codelets</EM>) to
support fast transforms of sizes that are not supported efficiently by the
standard FFTW distribution.
<A NAME="IDX314"></A>




<H2><A NAME="SEC67">Installation on Unix</A></H2>

<P>
FFTW comes with a <CODE>configure</CODE> program in the GNU style.
Installation can be as simple as:
<A NAME="IDX315"></A>



<PRE>
./configure
make
make install
</PRE>

<P>
This will build the uniprocessor complex and real transform libraries
along with the test programs.  We strongly recommend that you use GNU
<CODE>make</CODE> if it is available; on some systems it is called
<CODE>gmake</CODE>.  The "<CODE>make install</CODE>" command installs the fftw and
rfftw libraries in standard places, and typically requires root
privileges (unless you specify a different install directory with the
<CODE>--prefix</CODE> flag to <CODE>configure</CODE>).  You can also type
"<CODE>make check</CODE>" to put the FFTW test programs through their paces.
If you have problems during configuration or compilation, you may want
to run "<CODE>make distclean</CODE>" before trying again; this ensures that
you don't have any stale files left over from previous compilation
attempts.


<P>
The <CODE>configure</CODE> script knows good <CODE>CFLAGS</CODE> (C compiler flags)
<A NAME="IDX316"></A>
for a few systems.  If your system is not known, the <CODE>configure</CODE>
script will print out a warning.  <A NAME="DOCF9" HREF="fftw_foot.html#FOOT9">(9)</A>  In this case, you can compile
FFTW with the command

<PRE>
make CFLAGS="&#60;write your CFLAGS here&#62;"
</PRE>

<P>
If you do find an optimal set of <CODE>CFLAGS</CODE> for your system, please
let us know what they are (along with the output of <CODE>config.guess</CODE>)
so that we can include them in future releases.


<P>
The <CODE>configure</CODE> program supports all the standard flags defined by
the GNU Coding Standards; see the <CODE>INSTALL</CODE> file in FFTW or
<A HREF="http://www.gnu.org/prep/standards_toc.html">the GNU web page</A>.
Note especially <CODE>--help</CODE> to list all flags and
<CODE>--enable-shared</CODE> to create shared, rather than static, libraries.
<CODE>configure</CODE> also accepts a few FFTW-specific flags, particularly:



<UL>

<LI>

<A NAME="IDX317"></A>
<CODE>--enable-float</CODE> Produces a single-precision version of FFTW
(<CODE>float</CODE>) instead of the default double-precision (<CODE>double</CODE>).
See Section <A HREF="fftw_6.html#SEC69">Installing FFTW in both single and double precision</A>.

<LI>

<CODE>--enable-type-prefix</CODE> Adds a <SAMP>`d'</SAMP> or <SAMP>`s'</SAMP> prefix to all
installed libraries and header files to indicate the floating-point
precision.  See Section <A HREF="fftw_6.html#SEC69">Installing FFTW in both single and double precision</A>.  (<CODE>--enable-type-prefix=&#60;prefix&#62;</CODE> lets you add an
arbitrary prefix.)  By default, no prefix is used.

<LI>

<A NAME="IDX318"></A>
<CODE>--enable-threads</CODE> Enables compilation and installation of the FFTW
threads library (see Section <A HREF="fftw_4.html#SEC48">Multi-threaded FFTW</A>), which provides a
simple interface to parallel transforms for SMP systems.  (By default,
the threads routines are not compiled.)

<LI>

<A NAME="IDX319"></A>
<CODE>--enable-mpi</CODE> Enables compilation and installation of the FFTW MPI
library (see Section <A HREF="fftw_4.html#SEC55">MPI FFTW</A>), which provides parallel transforms for
distributed-memory systems with MPI.  (By default, the MPI routines are
not compiled.)

<LI>

<A NAME="IDX320"></A>
<CODE>--disable-fortran</CODE> Disables inclusion of Fortran-callable wrapper
routines (see Section <A HREF="fftw_5.html#SEC62">Calling FFTW from Fortran</A>) in the standard FFTW
libraries.  These wrapper routines increase the library size by only a
negligible amount, so they are included by default as long as the
<CODE>configure</CODE> script finds a Fortran compiler on your system.

<LI>

<CODE>--with-gcc</CODE> Enables the use of <CODE>gcc</CODE>.  By default, FFTW uses
the vendor-supplied <CODE>cc</CODE> compiler if present.  Unfortunately,
<CODE>gcc</CODE> produces slower code than <CODE>cc</CODE> on many systems.

<LI>

<CODE>--enable-i386-hacks</CODE>  See below.

<LI>

<CODE>--enable-pentium-timer</CODE>  See below.

</UL>

<P>
To force <CODE>configure</CODE> to use a particular C compiler (instead of the
<A NAME="IDX321"></A>
default, usually <CODE>cc</CODE>), set the environment variable <CODE>CC</CODE> to
the name of the desired compiler before running <CODE>configure</CODE>; you
may also need to set the flags via the variable <CODE>CFLAGS</CODE>.
<A NAME="IDX322"></A>




<H2><A NAME="SEC68">Installation on non-Unix Systems</A></H2>

<P>
It is quite straightforward to install FFTW even on non-Unix systems
lacking the niceties of the <CODE>configure</CODE> script.  The FFTW Home Page
may include some FFTW packages preconfigured for particular
systems/compilers, and also contains installation notes sent in by
<A NAME="IDX323"></A>
users.  All you really need to do, though, is to compile all of the
<CODE>.c</CODE> files in the appropriate directories of the FFTW package.
(You needn't worry about the many extraneous files lying around.)


<P>
For the complex transforms, compile all of the <CODE>.c</CODE> files in the
<CODE>fftw</CODE> directory and link them into a library.  Similarly, for the
real transforms, compile all of the <CODE>.c</CODE> files in the <CODE>rfftw</CODE>
directory into a library.  Note that these sources <CODE>#include</CODE>
various files in the <CODE>fftw</CODE> and <CODE>rfftw</CODE> directories, so you
may need to set up the <CODE>#include</CODE> paths for your compiler
appropriately.  Be sure to enable the highest-possible level of
optimization in your compiler.


<P>
<A NAME="IDX324"></A>
By default, FFTW is compiled for double-precision transforms.  To work
in single precision rather than double precision, <CODE>#define</CODE> the
symbol <CODE>FFTW_ENABLE_FLOAT</CODE> in <CODE>fftw.h</CODE> (in the <CODE>fftw</CODE>
directory) and (re)compile FFTW.


<P>
These libraries should be linked with any program that uses the
corresponding transforms.  The required header files, <CODE>fftw.h</CODE> and
<CODE>rfftw.h</CODE>, are located in the <CODE>fftw</CODE> and <CODE>rfftw</CODE>
directories respectively; you may want to put them with the libraries,
or wherever header files normally go on your system.


<P>
FFTW includes test programs, <CODE>fftw_test</CODE> and <CODE>rfftw_test</CODE>, in
<A NAME="IDX325"></A>
<A NAME="IDX326"></A>
the <CODE>tests</CODE> directory.  These are compiled and linked like any
program using FFTW, except that they use additional header files located
in the <CODE>fftw</CODE> and <CODE>rfftw</CODE> directories, so you will need to set
your compiler <CODE>#include</CODE> paths appropriately.  <CODE>fftw_test</CODE> is
compiled from <CODE>fftw_test.c</CODE> and <CODE>test_main.c</CODE>, while
<CODE>rfftw_test</CODE> is compiled from <CODE>rfftw_test.c</CODE> and
<CODE>test_main.c</CODE>.  When you run these programs, you will be prompted
interactively for various possible tests to perform; see also
<CODE>tests/README</CODE> for more information.




<H2><A NAME="SEC69">Installing FFTW in both single and double precision</A></H2>

<P>
<A NAME="IDX327"></A>
It is often useful to install both single- and double-precision versions
of the FFTW libraries on the same machine, and we provide a convenient
mechanism for achieving this on Unix systems.


<P>
<A NAME="IDX328"></A>
When the <CODE>--enable-type-prefix</CODE> option of <CODE>configure</CODE> is used, the
FFTW libraries and header files are installed with a prefix of <SAMP>`d'</SAMP>
or <SAMP>`s'</SAMP>, depending upon whether you compiled in double or single
precision.  Then, instead of linking your program with <CODE>-lrfftw
-lfftw</CODE>, for example, you would link with <CODE>-ldrfftw -ldfftw</CODE> to use
the double-precision version or with <CODE>-lsrfftw -lsfftw</CODE> to use the
single-precision version.  Likewise, you would <CODE>#include</CODE>
<CODE>&#60;drfftw.h&#62;</CODE> or <CODE>&#60;srfftw.h&#62;</CODE> instead of <CODE>&#60;rfftw.h&#62;</CODE>, and
so on.


<P>
<EM>The names of FFTW functions, data types, and constants remain
unchanged!</EM>  You still call, for instance, <CODE>fftw_one</CODE> and not
<CODE>dfftw_one</CODE>.  Only the names of header files and libraries are
modified.  One consequence of this is that <EM>you <B>cannot</B> use both
the single- and double-precision FFTW libraries in the same program
simultaneously,</EM> as the function names would conflict.


<P>
So, to install both the single- and double-precision libraries on the
same machine, you would do:



<PRE>
./configure --enable-type-prefix <I>[ other options ]</I>
make
make install
make clean
./configure --enable-float --enable-type-prefix <I>[ other options ]</I>
make
make install
</PRE>



<H2><A NAME="SEC70"><CODE>gcc</CODE> and Pentium/PentiumPro hacks</A></H2>
<P>
<A NAME="IDX329"></A>
The <CODE>configure</CODE> option <CODE>--enable-i386-hacks</CODE> enables specific
optimizations for <CODE>gcc</CODE> and Pentium/PentiumPro, which can
significantly improve performance of double-precision transforms.
Specifically, we have tested these hacks on Linux with <CODE>gcc</CODE>
2.7 and 2.8 and with versions of <CODE>egcs</CODE> since 1.0.3.  These
optimizations affect only the performance, not the correctness, of FFTW
(i.e., it is always safe to try them out).


<P>
These hacks work around the incorrect alignment of local
<CODE>double</CODE> variables in <CODE>gcc</CODE>.  The compiler aligns these
<A NAME="IDX330"></A>
variables to multiples of 4 bytes, but execution is much faster (on
Pentium and PentiumPro) if <CODE>double</CODE>s are aligned to a multiple of 8
bytes.  By carefully counting the number of variables allocated by the
compiler in performance-critical regions of the code, we have been able
to introduce dummy allocations (using <CODE>alloca</CODE>) that align the
stack properly.  The hack depends crucially on the compiler flags that
are used.  For example, it won't work without
<CODE>-fomit-frame-pointer</CODE>.
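
<P>
To make the alignment condition concrete, here is a minimal sketch of how
one might test whether an address is a multiple of 8 bytes.  It is an
illustration only, not part of FFTW and not the <CODE>alloca</CODE> hack
itself; the helper name <CODE>is_aligned</CODE> is hypothetical, and the
code assumes a C99 compiler that provides <CODE>uintptr_t</CODE>.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an FFTW function): returns 1 if the address p
   is a multiple of `alignment` bytes, and 0 otherwise.  `alignment` must
   be nonzero. */
int is_aligned(const void *p, size_t alignment)
{
    /* Convert the pointer to an integer so that we can take a remainder. */
    return ((uintptr_t)p % alignment) == 0;
}
```

<P>
Calling <CODE>is_aligned(&#38;x, 8)</CODE> on a local <CODE>double x</CODE>
reports whether that particular variable happens to sit on an 8-byte
boundary, which is the property the hacks try to guarantee.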


<P>
The <CODE>fftw_test</CODE> program outputs speed measurements that you can use
to see if these hacks are beneficial.
<A NAME="IDX331"></A>
<A NAME="IDX332"></A>


<P>
The <CODE>configure</CODE> option <CODE>--enable-pentium-timer</CODE> enables the
use of the Pentium and PentiumPro cycle counter for timing purposes.  In
order to get correct results, you must define <CODE>FFTW_CYCLES_PER_SEC</CODE>
in <CODE>fftw/config.h</CODE> to be the clock speed of your processor; the
resulting FFTW library will be nonportable.  The use of this option is
deprecated.  On serious operating systems (such as Linux), FFTW uses
<CODE>gettimeofday()</CODE>, which has enough resolution and is portable.
(Note that Win32 has its own high-resolution timing routines as well.
FFTW contains unsupported code to use these routines.)
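
<P>
The reason the clock speed matters is simple arithmetic: a cycle count
becomes a time only after division by the number of cycles per second.
The sketch below illustrates that conversion under stated assumptions; it
is not FFTW's timer code, and the function name
<CODE>cycles_to_seconds</CODE> and its parameters are hypothetical.

```c
/* Hypothetical helper (not an FFTW function): converts an elapsed cycle
   count to seconds, given the processor clock speed in cycles per second.
   If cycles_per_sec does not match the real clock speed, the result is
   off by the same factor. */
double cycles_to_seconds(unsigned long cycles, double cycles_per_sec)
{
    return (double)cycles / cycles_per_sec;
}
```

<P>
For instance, 400 million cycles on a 200 MHz processor correspond to 2
seconds; with the clock speed mistakenly given as 400 MHz, the same count
would be reported as 1 second.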




<H2><A NAME="SEC71">Customizing the timer</A></H2>
<P>
<A NAME="IDX333"></A>


<P>
FFTW needs a reasonably-precise clock in order to find the optimal way
to compute a transform.  On Unix systems, <CODE>configure</CODE> looks for
<CODE>gettimeofday</CODE> and other system-specific timers.  If it does not
find any high resolution clock, it defaults to using the <CODE>clock()</CODE>
function, which is very portable, but forces FFTW to run for a long time
in order to get reliable measurements.
<A NAME="IDX334"></A>
<A NAME="IDX335"></A>


<P>
If your machine supports a high-resolution clock not recognized by FFTW,
it is therefore advisable to use it.  To do so, edit
<CODE>fftw/fftw-int.h</CODE> and redefine a few macros there.  The
code is documented and should be self-explanatory.  (By the way,
<CODE>fftw-int</CODE> stands for <CODE>fftw-internal</CODE>, but for some
inexplicable reason people are still using primitive systems with 8.3
filenames.)


<P>
Even if you don't install high-resolution timing code, we still
recommend that you look at the <CODE>FFTW_TIME_MIN</CODE> constant in
<A NAME="IDX336"></A>
<CODE>fftw/fftw-int.h</CODE>.  This constant holds the minimum time interval (in
seconds) required to get accurate timing measurements, and should be (at
least) several hundred times the resolution of your clock.  The default
constants are on the conservative side, and may cause FFTW to take
longer than necessary when you create a plan.  Set <CODE>FFTW_TIME_MIN</CODE>
to whatever is appropriate on your system (be sure to set the
<EM>right</EM> <CODE>FFTW_TIME_MIN</CODE>; there are several definitions in
<CODE>fftw-int.h</CODE>, corresponding to different platforms and timers).


<P>
As an aid in checking the resolution of your clock, you can use the
<CODE>tests/fftw_test</CODE> program with the <CODE>-t</CODE> option
(cf. <CODE>tests/README</CODE>).  Remember, the mere fact that your clock
reports times in, say, picoseconds, does not mean that it is actually
<EM>accurate</EM> to that resolution.
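
<P>
Independently of <CODE>fftw_test</CODE>, the granularity of the portable
<CODE>clock()</CODE> function can be estimated with a few lines of standard
C.  The following is a rough sketch, not FFTW code: the function names
<CODE>estimate_clock_resolution</CODE> and <CODE>suggested_time_min</CODE>
are hypothetical, and the measured step is only an estimate of the
smallest increment <CODE>clock()</CODE> reports, not a statement about its
accuracy.

```c
#include <time.h>

/* Hypothetical helper (not an FFTW function): estimates the granularity
   of clock() in seconds by spinning until the reported value changes.
   Returns -1.0 if clock() is not available. */
double estimate_clock_resolution(void)
{
    clock_t start = clock();
    clock_t now;

    if (start == (clock_t)-1)
        return -1.0;

    /* First wait for a tick boundary, so the next interval is a full tick. */
    do {
        now = clock();
    } while (now == start);
    start = now;

    /* Then measure the distance to the following tick. */
    do {
        now = clock();
    } while (now == start);

    return (double)(now - start) / CLOCKS_PER_SEC;
}

/* Hypothetical helper: a candidate minimum timing interval, following the
   rule of thumb that it should be several hundred times the resolution.
   The factor is left to the caller. */
double suggested_time_min(double resolution, double factor)
{
    return resolution * factor;
}
```

<P>
With a measured resolution of, say, 0.01 seconds and a factor of 300, the
suggested interval would be 3 seconds, which shows why a coarse clock
forces long planning times.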




<H2><A NAME="SEC72">Generating your own code</A></H2>
<P>
<A NAME="IDX337"></A>
<A NAME="IDX338"></A>
<A NAME="IDX339"></A>


<P>
If you know that you will only use transforms of certain sizes (say,
powers of 2) and want to reduce the size of the library, you can
reconfigure FFTW to support only those sizes you are interested in.  You
may even generate code to enable efficient transforms of a size not
supported by the default distribution.  The default distribution
supports transforms of any size, but not all sizes are equally fast.
The default installation of FFTW is best at handling sizes of the form
2<SUP>a</SUP> 3<SUP>b</SUP> 5<SUP>c</SUP> 7<SUP>d</SUP>
        11<SUP>e</SUP> 13<SUP>f</SUP>,
where e+f is either 0 or
1, and the other exponents are arbitrary.  Other sizes are
computed by means of a slow, general-purpose routine.  However, if you
have an application that requires fast transforms of size, say,
<CODE>17</CODE>, there is a way to generate specialized code to handle that.
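
<P>
To check whether a given size has the form described above, one can strip
the allowed prime factors and see whether anything remains.  The function
below is a small sketch of that test; it is not part of FFTW, and the name
<CODE>is_fast_default_size</CODE> is hypothetical.

```c
/* Hypothetical helper (not an FFTW function): returns 1 if n can be
   written as 2^a 3^b 5^c 7^d 11^e 13^f with e + f equal to 0 or 1,
   and 0 otherwise. */
int is_fast_default_size(int n)
{
    static const int small_primes[] = {2, 3, 5, 7};
    int i;

    if (n < 1)
        return 0;

    /* The exponents of 2, 3, 5 and 7 are arbitrary: divide them all out. */
    for (i = 0; i < 4; ++i) {
        while (n % small_primes[i] == 0)
            n /= small_primes[i];
    }

    /* At most one factor of 11 or 13 is allowed in total. */
    if (n % 11 == 0)
        n /= 11;
    else if (n % 13 == 0)
        n /= 13;

    /* Anything left over is a factor outside the supported form. */
    return n == 1;
}
```

<P>
Under this test, 1024 and 91 (7 times 13) qualify, whereas 17, 121 (11
squared) and 143 (11 times 13) do not.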


<P>
The directory <CODE>gensrc</CODE> contains all the programs and scripts that
were used to generate FFTW.  In particular, the program
<CODE>gensrc/genfft.ml</CODE> was used to generate the code that FFTW uses to
compute the transforms.  We do not expect casual users to use it.
<CODE>genfft</CODE> is a rather sophisticated program that generates directed
acyclic graphs of FFT algorithms and performs algebraic simplifications
on them.  <CODE>genfft</CODE> is written in Objective Caml, a dialect of ML.
Objective Caml is described at <A HREF="http://pauillac.inria.fr/ocaml/">http://pauillac.inria.fr/ocaml/</A>
and can be downloaded from <A HREF="ftp://ftp.inria.fr/lang/caml-light">ftp://ftp.inria.fr/lang/caml-light</A>.
<A NAME="IDX340"></A>
<A NAME="IDX341"></A>


<P>
If you have Objective Caml installed, you can type <CODE>sh
bootstrap.sh</CODE> in the top-level directory to re-generate the files.  If
you change the <CODE>gensrc/config</CODE> file, you can optimize FFTW for
sizes that are not currently supported efficiently (say, 17 or 19).


<P>
We do not provide more details about the code-generation process, since
we do not expect that users will need to generate their own code.
However, feel free to contact us at <A HREF="mailto:fftw@theory.lcs.mit.edu">fftw@theory.lcs.mit.edu</A> if
you are interested in the subject.


<P>
<A NAME="IDX342"></A>
You might find it interesting to learn Caml and/or some modern
programming techniques that we used in the generator (including monadic
programming), especially if you have heard the rumor that Java and
object-oriented programming are the latest advancement in the field.
The internal operation of the codelet generator is described in the
paper, "A Fast Fourier Transform Compiler," by M. Frigo, which is
available from the <A HREF="http://theory.lcs.mit.edu/~fftw">FFTW home page</A>
and will appear in the <CITE>Proceedings of the 1999 ACM SIGPLAN
Conference on Programming Language Design and Implementation (PLDI)</CITE>.


<P><HR><P>
Go to the <A HREF="fftw_1.html">first</A>, <A HREF="fftw_5.html">previous</A>, <A HREF="fftw_7.html">next</A>, <A HREF="fftw_10.html">last</A> section, <A HREF="fftw_toc.html">table of contents</A>.
</BODY>
</HTML>