Rev | Author | Line No. | Line |
---|---|---|---|
2 | pj | 1 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN"> |
2 | <HTML> |
||
3 | <HEAD> |
||
4 | <!-- This HTML file has been created by texi2html 1.52 |
||
5 | from fftw.texi on 18 May 1999 --> |
||
6 | |||
7 | <TITLE>FFTW - Installation and Customization</TITLE> |
||
8 | </HEAD> |
||
9 | <BODY TEXT="#000000" BGCOLOR="#FFFFFF"> |
||
10 | Go to the <A HREF="fftw_1.html">first</A>, <A HREF="fftw_5.html">previous</A>, <A HREF="fftw_7.html">next</A>, <A HREF="fftw_10.html">last</A> section, <A HREF="fftw_toc.html">table of contents</A>. |
||
11 | <P><HR><P> |
||
12 | |||
13 | |||
14 | <H1><A NAME="SEC66">Installation and Customization</A></H1> |
||
15 | |||
16 | <P> |
||
17 | This chapter describes the installation and customization of FFTW, the |
||
18 | latest version of which may be downloaded from |
||
19 | <A HREF="http://theory.lcs.mit.edu/~fftw">the FFTW home page</A>. |
||
20 | |||
21 | |||
22 | <P> |
||
23 | As distributed, FFTW makes very few assumptions about your system. All |
||
24 | you need is an ANSI C compiler (<CODE>gcc</CODE> is fine, although |
||
25 | vendor-provided compilers often produce faster code). |
||
26 | <A NAME="IDX313"></A> |
||
27 | However, installation of FFTW is somewhat simpler if you have a Unix or |
||
28 | a GNU system, such as Linux. In this chapter, we first describe the |
||
29 | installation of FFTW on Unix and non-Unix systems. We then describe how |
||
30 | you can customize FFTW to achieve better performance. Specifically, you |
||
31 | can I) enable <CODE>gcc</CODE>/x86-specific hacks that improve performance on |
||
32 | Pentium and PentiumPro processors; II) adapt FFTW to use the high-resolution clock |
||
33 | of your machine, if any; III) produce code (<EM>codelets</EM>) to support |
||
34 | fast transforms of sizes that are not supported efficiently by the |
||
35 | standard FFTW distribution. |
||
36 | <A NAME="IDX314"></A> |
||
37 | |||
38 | |||
39 | |||
40 | |||
41 | <H2><A NAME="SEC67">Installation on Unix</A></H2> |
||
42 | |||
43 | <P> |
||
44 | FFTW comes with a <CODE>configure</CODE> program in the GNU style. |
||
45 | Installation can be as simple as: |
||
46 | <A NAME="IDX315"></A> |
||
47 | |||
48 | |||
49 | |||
50 | <PRE> |
||
51 | ./configure |
||
52 | make |
||
53 | make install |
||
54 | </PRE> |
||
55 | |||
56 | <P> |
||
57 | This will build the uniprocessor complex and real transform libraries |
||
58 | along with the test programs. We strongly recommend that you use GNU |
||
59 | <CODE>make</CODE> if it is available; on some systems it is called |
||
60 | <CODE>gmake</CODE>. The "<CODE>make install</CODE>" command installs the fftw and |
||
61 | rfftw libraries in standard places, and typically requires root |
||
62 | privileges (unless you specify a different install directory with the |
||
63 | <CODE>--prefix</CODE> flag to <CODE>configure</CODE>). You can also type |
||
64 | "<CODE>make check</CODE>" to put the FFTW test programs through their paces. |
||
65 | If you have problems during configuration or compilation, you may want |
||
66 | to run "<CODE>make distclean</CODE>" before trying again; this ensures that |
||
67 | you don't have any stale files left over from previous compilation |
||
68 | attempts. |
||
69 | |||
70 | |||
71 | <P> |
||
72 | The <CODE>configure</CODE> script knows good <CODE>CFLAGS</CODE> (C compiler flags) |
||
73 | <A NAME="IDX316"></A> |
||
74 | for a few systems. If your system is not known, the <CODE>configure</CODE> |
||
75 | script will print out a warning. <A NAME="DOCF9" HREF="fftw_foot.html#FOOT9">(9)</A> In this case, you can compile |
||
76 | FFTW with the command |
||
77 | |||
78 | <PRE> |
||
79 | make CFLAGS="&lt;write your CFLAGS here&gt;" |
||
80 | </PRE> |
||
81 | |||
82 | <P> |
||
83 | If you do find an optimal set of <CODE>CFLAGS</CODE> for your system, please |
||
84 | let us know what they are (along with the output of <CODE>config.guess</CODE>) |
||
85 | so that we can include them in future releases. |
||
86 | |||
87 | |||
88 | <P> |
||
89 | The <CODE>configure</CODE> program supports all the standard flags defined by |
||
90 | the GNU Coding Standards; see the <CODE>INSTALL</CODE> file in FFTW or |
||
91 | <A HREF="http://www.gnu.org/prep/standards_toc.html">the GNU web page</A>. |
||
92 | Note especially <CODE>--help</CODE> to list all flags and |
||
93 | <CODE>--enable-shared</CODE> to create shared, rather than static, libraries. |
||
94 | <CODE>configure</CODE> also accepts a few FFTW-specific flags, particularly: |
||
95 | |||
96 | |||
97 | |||
98 | <UL> |
||
99 | |||
100 | <LI> |
||
101 | |||
102 | <A NAME="IDX317"></A> |
||
103 | <CODE>--enable-float</CODE> Produces a single-precision version of FFTW |
||
104 | (<CODE>float</CODE>) instead of the default double-precision (<CODE>double</CODE>). |
||
105 | See Section <A HREF="fftw_6.html#SEC69">Installing FFTW in both single and double precision</A>. |
||
106 | |||
107 | <LI> |
||
108 | |||
109 | <CODE>--enable-type-prefix</CODE> Adds a <SAMP>`d'</SAMP> or <SAMP>`s'</SAMP> prefix to all |
||
110 | installed libraries and header files to indicate the floating-point |
||
111 | precision. See Section <A HREF="fftw_6.html#SEC69">Installing FFTW in both single and double precision</A>. (<CODE>--enable-type-prefix=&lt;prefix&gt;</CODE> lets you add an |
||
112 | arbitrary prefix.) By default, no prefix is used. |
||
113 | |||
114 | <LI> |
||
115 | |||
116 | <A NAME="IDX318"></A> |
||
117 | <CODE>--enable-threads</CODE> Enables compilation and installation of the FFTW |
||
118 | threads library (see Section <A HREF="fftw_4.html#SEC48">Multi-threaded FFTW</A>), which provides a |
||
119 | simple interface to parallel transforms for SMP systems. (By default, |
||
120 | the threads routines are not compiled.) |
||
121 | |||
122 | <LI> |
||
123 | |||
124 | <A NAME="IDX319"></A> |
||
125 | <CODE>--enable-mpi</CODE> Enables compilation and installation of the FFTW MPI |
||
126 | library (see Section <A HREF="fftw_4.html#SEC55">MPI FFTW</A>), which provides parallel transforms for |
||
127 | distributed-memory systems with MPI. (By default, the MPI routines are |
||
128 | not compiled.) |
||
129 | |||
130 | <LI> |
||
131 | |||
132 | <A NAME="IDX320"></A> |
||
133 | <CODE>--disable-fortran</CODE> Disables inclusion of Fortran-callable wrapper |
||
134 | routines (see Section <A HREF="fftw_5.html#SEC62">Calling FFTW from Fortran</A>) in the standard FFTW |
||
135 | libraries. These wrapper routines increase the library size by only a |
||
136 | negligible amount, so they are included by default as long as the |
||
137 | <CODE>configure</CODE> script finds a Fortran compiler on your system. |
||
138 | |||
139 | <LI> |
||
140 | |||
141 | <CODE>--with-gcc</CODE> Enables the use of <CODE>gcc</CODE>. By default, FFTW uses |
||
142 | the vendor-supplied <CODE>cc</CODE> compiler if present. Unfortunately, |
||
143 | <CODE>gcc</CODE> produces slower code than <CODE>cc</CODE> on many systems. |
||
144 | |||
145 | <LI> |
||
146 | |||
147 | <CODE>--enable-i386-hacks</CODE> See below. |
||
148 | |||
149 | <LI> |
||
150 | |||
151 | <CODE>--enable-pentium-timer</CODE> See below. |
||
152 | |||
153 | </UL> |
||
154 | |||
155 | <P> |
||
156 | To force <CODE>configure</CODE> to use a particular C compiler (instead of the |
||
157 | <A NAME="IDX321"></A> |
||
158 | default, usually <CODE>cc</CODE>), set the environment variable <CODE>CC</CODE> to |
||
159 | the name of the desired compiler before running <CODE>configure</CODE>; you |
||
160 | may also need to set the flags via the variable <CODE>CFLAGS</CODE>. |
||
161 | <A NAME="IDX322"></A> |
||
162 | |||
163 | |||
164 | |||
165 | |||
166 | <H2><A NAME="SEC68">Installation on non-Unix Systems</A></H2> |
||
167 | |||
168 | <P> |
||
169 | It is quite straightforward to install FFTW even on non-Unix systems |
||
170 | lacking the niceties of the <CODE>configure</CODE> script. The FFTW Home Page |
||
171 | may include some FFTW packages preconfigured for particular |
||
172 | systems/compilers, and also contains installation notes sent in by |
||
173 | <A NAME="IDX323"></A> |
||
174 | users. All you really need to do, though, is to compile all of the |
||
175 | <CODE>.c</CODE> files in the appropriate directories of the FFTW package. |
||
176 | (You needn't worry about the many extraneous files lying around.) |
||
177 | |||
178 | |||
179 | <P> |
||
180 | For the complex transforms, compile all of the <CODE>.c</CODE> files in the |
||
181 | <CODE>fftw</CODE> directory and link them into a library. Similarly, for the |
||
182 | real transforms, compile all of the <CODE>.c</CODE> files in the <CODE>rfftw</CODE> |
||
183 | directory into a library. Note that these sources <CODE>#include</CODE> |
||
184 | various files in the <CODE>fftw</CODE> and <CODE>rfftw</CODE> directories, so you |
||
185 | may need to set up the <CODE>#include</CODE> paths for your compiler |
||
186 | appropriately. Be sure to enable the highest-possible level of |
||
187 | optimization in your compiler. |
||
188 | |||
189 | |||
190 | <P> |
||
191 | <A NAME="IDX324"></A> |
||
192 | By default, FFTW is compiled for double-precision transforms. To work |
||
193 | in single precision rather than double precision, <CODE>#define</CODE> the |
||
194 | symbol <CODE>FFTW_ENABLE_FLOAT</CODE> in <CODE>fftw.h</CODE> (in the <CODE>fftw</CODE> |
||
195 | directory) and (re)compile FFTW. |
||
196 | |||
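<P>
As a rough illustration of what defining such a symbol does, the sketch below switches a floating-point typedef on a preprocessor macro. The names <CODE>DEMO_ENABLE_FLOAT</CODE>, <CODE>demo_real</CODE>, and <CODE>demo_real_size</CODE> are hypothetical stand-ins invented for this example, not FFTW identifiers; consult <CODE>fftw.h</CODE> itself for the actual definitions.

```c
#include <stddef.h>

/* Hypothetical stand-in for a precision switch such as FFTW_ENABLE_FLOAT.
 * Uncomment the next line (or pass -DDEMO_ENABLE_FLOAT to the compiler)
 * to select single precision. */
/* #define DEMO_ENABLE_FLOAT */

#ifdef DEMO_ENABLE_FLOAT
typedef float demo_real;   /* single precision */
#else
typedef double demo_real;  /* double precision (the default here) */
#endif

/* Reports the size in bytes of the selected floating-point type. */
size_t demo_real_size(void)
{
    return sizeof(demo_real);
}
```

Because the choice is made at compile time, every source file of the library must be recompiled after the symbol is changed.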
197 | |||
198 | <P> |
||
199 | These libraries should be linked with any program that uses the |
||
200 | corresponding transforms. The required header files, <CODE>fftw.h</CODE> and |
||
201 | <CODE>rfftw.h</CODE>, are located in the <CODE>fftw</CODE> and <CODE>rfftw</CODE> |
||
202 | directories respectively; you may want to put them with the libraries, |
||
203 | or wherever header files normally go on your system. |
||
204 | |||
205 | |||
206 | <P> |
||
207 | FFTW includes test programs, <CODE>fftw_test</CODE> and <CODE>rfftw_test</CODE>, in |
||
208 | <A NAME="IDX325"></A> |
||
209 | <A NAME="IDX326"></A> |
||
210 | the <CODE>tests</CODE> directory. These are compiled and linked like any |
||
211 | program using FFTW, except that they use additional header files located |
||
212 | in the <CODE>fftw</CODE> and <CODE>rfftw</CODE> directories, so you will need to set |
||
213 | your compiler <CODE>#include</CODE> paths appropriately. <CODE>fftw_test</CODE> is |
||
214 | compiled from <CODE>fftw_test.c</CODE> and <CODE>test_main.c</CODE>, while |
||
215 | <CODE>rfftw_test</CODE> is compiled from <CODE>rfftw_test.c</CODE> and |
||
216 | <CODE>test_main.c</CODE>. When you run these programs, you will be prompted |
||
217 | interactively for various possible tests to perform; see also |
||
218 | <CODE>tests/README</CODE> for more information. |
||
219 | |||
220 | |||
221 | |||
222 | |||
223 | <H2><A NAME="SEC69">Installing FFTW in both single and double precision</A></H2> |
||
224 | |||
225 | <P> |
||
226 | <A NAME="IDX327"></A> |
||
227 | It is often useful to install both single- and double-precision versions |
||
228 | of the FFTW libraries on the same machine, and we provide a convenient |
||
229 | mechanism for achieving this on Unix systems. |
||
230 | |||
231 | |||
232 | <P> |
||
233 | <A NAME="IDX328"></A> |
||
234 | When the <CODE>--enable-type-prefix</CODE> option of <CODE>configure</CODE> is used, the |
||
235 | FFTW libraries and header files are installed with a prefix of <SAMP>`d'</SAMP> |
||
236 | or <SAMP>`s'</SAMP>, depending upon whether you compiled in double or single |
||
237 | precision. Then, instead of linking your program with <CODE>-lrfftw |
||
238 | -lfftw</CODE>, for example, you would link with <CODE>-ldrfftw -ldfftw</CODE> to use |
||
239 | the double-precision version or with <CODE>-lsrfftw -lsfftw</CODE> to use the |
||
240 | single-precision version. Also, you would <CODE>#include</CODE> |
||
241 | <CODE>&lt;drfftw.h&gt;</CODE> or <CODE>&lt;srfftw.h&gt;</CODE> instead of <CODE>&lt;rfftw.h&gt;</CODE>, and |
||
242 | so on. |
||
243 | |||
244 | |||
245 | <P> |
||
246 | <EM>The names of FFTW functions, data types, and constants remain |
||
247 | unchanged!</EM> You still call, for instance, <CODE>fftw_one</CODE> and not |
||
248 | <CODE>dfftw_one</CODE>. Only the names of header files and libraries are |
||
249 | modified. One consequence of this is that <EM>you <B>cannot</B> use both |
||
250 | the single- and double-precision FFTW libraries in the same program, |
||
251 | simultaneously,</EM> as the function names would conflict. |
||
252 | |||
253 | |||
254 | <P> |
||
255 | So, to install both the single- and double-precision libraries on the |
||
256 | same machine, you would do: |
||
257 | |||
258 | |||
259 | |||
260 | <PRE> |
||
261 | ./configure --enable-type-prefix <I>[ other options ]</I> |
||
262 | make |
||
263 | make install |
||
264 | make clean |
||
265 | ./configure --enable-float --enable-type-prefix <I>[ other options ]</I> |
||
266 | make |
||
267 | make install |
||
268 | </PRE> |
||
269 | |||
270 | |||
271 | |||
272 | <H2><A NAME="SEC70"><CODE>gcc</CODE> and Pentium/PentiumPro hacks</A></H2> |
||
273 | <P> |
||
274 | <A NAME="IDX329"></A> |
||
275 | The <CODE>configure</CODE> option <CODE>--enable-i386-hacks</CODE> enables specific |
||
276 | optimizations for <CODE>gcc</CODE> and Pentium/PentiumPro, which can |
||
277 | significantly improve performance of double-precision transforms. |
||
278 | Specifically, we have tested these hacks on Linux with <CODE>gcc</CODE> |
||
279 | 2.[78] and versions of <CODE>egcs</CODE> since 1.0.3. These optimizations |
||
280 | only affect the performance, not the correctness of FFTW (i.e., it is |
||
281 | always safe to try them out). |
||
282 | |||
283 | |||
284 | <P> |
||
285 | These hacks provide a workaround to the incorrect alignment of local |
||
286 | <CODE>double</CODE> variables in <CODE>gcc</CODE>. The compiler aligns these |
||
287 | <A NAME="IDX330"></A> |
||
288 | variables to multiples of 4 bytes, but execution is much faster (on |
||
289 | Pentium and PentiumPro) if <CODE>double</CODE>s are aligned to a multiple of 8 |
||
290 | bytes. By carefully counting the number of variables allocated by the |
||
291 | compiler in performance-critical regions of the code, we have been able |
||
292 | to introduce dummy allocations (using <CODE>alloca</CODE>) that align the |
||
293 | stack properly. The hack depends crucially on the compiler flags that |
||
294 | are used. For example, it won't work without |
||
295 | <CODE>-fomit-frame-pointer</CODE>. |
||
296 | |||
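<P>
The arithmetic behind such a padding allocation can be sketched as follows. This is not the code FFTW uses: <CODE>is_aligned</CODE> and <CODE>padding_to_align</CODE> are hypothetical helper names, and the sketch assumes a C99 compiler that provides <CODE>uintptr_t</CODE>. It only shows how many bytes must be skipped to bring an address up to a multiple of 8.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the address p is a multiple of `alignment`, else 0.
 * `alignment` must be nonzero. */
int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}

/* Returns the number of bytes that must be skipped, starting at p,
 * to reach the next address that is a multiple of `alignment`
 * (0 if p is already aligned). */
size_t padding_to_align(const void *p, size_t alignment)
{
    size_t rem = (size_t)((uintptr_t)p % alignment);
    return rem == 0 ? 0 : alignment - rem;
}
```

For example, a buffer that starts at an address congruent to 4 modulo 8 needs 4 bytes of padding before a <CODE>double</CODE> stored in it is 8-byte aligned.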
297 | |||
298 | <P> |
||
299 | The <CODE>fftw_test</CODE> program outputs speed measurements that you can use |
||
300 | to see if these hacks are beneficial. |
||
301 | <A NAME="IDX331"></A> |
||
302 | <A NAME="IDX332"></A> |
||
303 | |||
304 | |||
305 | <P> |
||
306 | The <CODE>configure</CODE> option <CODE>--enable-pentium-timer</CODE> enables the |
||
307 | use of the Pentium and PentiumPro cycle counter for timing purposes. In |
||
308 | order to get correct results, you must define <CODE>FFTW_CYCLES_PER_SEC</CODE> |
||
309 | in <CODE>fftw/config.h</CODE> to be the clock speed of your processor; the |
||
310 | resulting FFTW library will be nonportable. The use of this option is |
||
311 | deprecated. On serious operating systems (such as Linux), FFTW uses |
||
312 | <CODE>gettimeofday()</CODE>, which has enough resolution and is portable. |
||
313 | (Note that Win32 has its own high-resolution timing routines as well. |
||
314 | FFTW contains unsupported code to use these routines.) |
||
315 | |||
316 | |||
317 | |||
318 | |||
319 | <H2><A NAME="SEC71">Customizing the timer</A></H2> |
||
320 | <P> |
||
321 | <A NAME="IDX333"></A> |
||
322 | |||
323 | |||
324 | <P> |
||
325 | FFTW needs a reasonably precise clock in order to find the optimal way |
||
326 | to compute a transform. On Unix systems, <CODE>configure</CODE> looks for |
||
327 | <CODE>gettimeofday</CODE> and other system-specific timers. If it does not |
||
328 | find any high resolution clock, it defaults to using the <CODE>clock()</CODE> |
||
329 | function, which is very portable, but forces FFTW to run for a long time |
||
330 | in order to get reliable measurements. |
||
331 | <A NAME="IDX334"></A> |
||
332 | <A NAME="IDX335"></A> |
||
333 | |||
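<P>
To see why a coarse clock forces long runs, consider the following minimal sketch of a timing loop built on the standard <CODE>clock()</CODE> function. It is an illustration under stated assumptions, not FFTW's planner code: <CODE>seconds_per_call</CODE> and <CODE>demo_workload</CODE> are hypothetical names, and the loop simply doubles the repetition count until the total measured time reaches a chosen minimum interval.

```c
#include <time.h>

/* Hypothetical workload to be timed; the volatile sink keeps the
 * compiler from discarding the computation. */
static volatile double demo_sink;
static void demo_workload(void)
{
    double s = 0.0;
    int i;
    for (i = 0; i < 1000; ++i)
        s += i * 0.5;
    demo_sink = s;
}

/* Calls f repeatedly, doubling the repetition count, until the total
 * CPU time reported by clock() is at least time_min seconds.
 * Returns the average number of seconds per call.
 * Assumes clock() is available and time_min is modest, so that the
 * repetition count does not overflow. */
double seconds_per_call(void (*f)(void), double time_min)
{
    unsigned long iters = 1;
    for (;;) {
        unsigned long i;
        clock_t start = clock();
        for (i = 0; i < iters; ++i)
            f();
        double elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
        if (elapsed >= time_min)
            return elapsed / (double)iters;
        iters *= 2;
    }
}
```

The coarser the clock, the larger <CODE>time_min</CODE> must be for the result to be trustworthy, and the longer each measurement takes.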
334 | |||
335 | <P> |
||
336 | If your machine supports a high-resolution clock not recognized by FFTW, |
||
337 | it is therefore advisable to use it. To do so, edit |
||
338 | <CODE>fftw/fftw-int.h</CODE> and redefine a few macros there. The |
||
339 | code is documented and should be self-explanatory. (By the way, |
||
340 | <CODE>fftw-int</CODE> stands for <CODE>fftw-internal</CODE>, but for some |
||
341 | inexplicable reason people are still using primitive systems with 8.3 |
||
342 | filenames.) |
||
343 | |||
344 | |||
345 | <P> |
||
346 | Even if you don't install high-resolution timing code, we still |
||
347 | recommend that you look at the <CODE>FFTW_TIME_MIN</CODE> constant in |
||
348 | <A NAME="IDX336"></A> |
||
349 | <CODE>fftw/fftw-int.h</CODE>. This constant holds the minimum time interval (in |
||
350 | seconds) required to get accurate timing measurements, and should be (at |
||
351 | least) several hundred times the resolution of your clock. The default |
||
352 | constants are on the conservative side, and may cause FFTW to take |
||
353 | longer than necessary when you create a plan. Set <CODE>FFTW_TIME_MIN</CODE> |
||
354 | to whatever is appropriate on your system (be sure to set the |
||
355 | <EM>right</EM> <CODE>FFTW_TIME_MIN</CODE>...there are several definitions in |
||
356 | <CODE>fftw-int.h</CODE>, corresponding to different platforms and timers). |
||
357 | |||
358 | |||
359 | <P> |
||
360 | As an aid in checking the resolution of your clock, you can use the |
||
361 | <CODE>tests/fftw_test</CODE> program with the <CODE>-t</CODE> option |
||
362 | (see <CODE>tests/README</CODE>). Remember, the mere fact that your clock |
||
363 | reports times in, say, picoseconds, does not mean that it is actually |
||
364 | <EM>accurate</EM> to that resolution. |
||
365 | |||
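<P>
Independently of <CODE>fftw_test</CODE>, the observable granularity of a clock can be estimated by spinning until its reading changes. The sketch below does this for the standard <CODE>clock()</CODE> function and then scales the result by a safety factor to suggest a minimum timing interval. The function names are hypothetical, the factor is whatever you choose (the text above suggests several hundred), and the code assumes <CODE>clock()</CODE> is available and advances while the program spins.

```c
#include <time.h>

/* Spins until clock() changes and records the size of the jump, in
 * seconds; repeats `samples` times and returns the smallest jump seen.
 * Returns a negative value if samples <= 0. */
double observed_clock_resolution(int samples)
{
    double best = -1.0;
    int k;
    for (k = 0; k < samples; ++k) {
        clock_t t0 = clock();
        clock_t t1;
        do {
            t1 = clock();
        } while (t1 == t0);  /* wait for the next visible tick */
        double step = (double)(t1 - t0) / CLOCKS_PER_SEC;
        if (best < 0.0 || step < best)
            best = step;
    }
    return best;
}

/* Scales a measured resolution by a safety factor to obtain a
 * candidate minimum timing interval, in seconds. */
double suggested_time_min(double resolution, double factor)
{
    return resolution * factor;
}
```

A clock with an observed resolution of one millisecond and a factor of 500 would suggest a minimum interval of about half a second. The smallest observed jump is only an estimate and can exceed the true resolution on a busy machine.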
366 | |||
367 | |||
368 | |||
369 | <H2><A NAME="SEC72">Generating your own code</A></H2> |
||
370 | <P> |
||
371 | <A NAME="IDX337"></A> |
||
372 | <A NAME="IDX338"></A> |
||
373 | <A NAME="IDX339"></A> |
||
374 | |||
375 | |||
376 | <P> |
||
377 | If you know that you will only use transforms of certain sizes (say, |
||
378 | powers of 2) and want to reduce the size of the library, you can |
||
379 | reconfigure FFTW to support only those sizes you are interested in. You |
||
380 | may even generate code to enable efficient transforms of a size not |
||
381 | supported by the default distribution. The default distribution |
||
382 | supports transforms of any size, but not all sizes are equally fast. |
||
383 | The default installation of FFTW is best at handling sizes of the form |
||
384 | 2<SUP>a</SUP> 3<SUP>b</SUP> 5<SUP>c</SUP> 7<SUP>d</SUP> |
||
385 | 11<SUP>e</SUP> 13<SUP>f</SUP>, |
||
386 | where e+f is either 0 or |
||
387 | 1, and the other exponents are arbitrary. Other sizes are |
||
388 | computed by means of a slow, general-purpose routine. However, if you |
||
389 | have an application that requires fast transforms of size, say, |
||
390 | <CODE>17</CODE>, there is a way to generate specialized code to handle that. |
||
391 | |||
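<P>
The size rule stated above can be checked mechanically. The following sketch tests whether a size has the form 2<SUP>a</SUP> 3<SUP>b</SUP> 5<SUP>c</SUP> 7<SUP>d</SUP> 11<SUP>e</SUP> 13<SUP>f</SUP> with e+f equal to 0 or 1. The function name <CODE>is_fast_default_size</CODE> is hypothetical and not part of FFTW; the code only encodes the rule as written in this section.

```c
/* Returns 1 if n = 2^a 3^b 5^c 7^d 11^e 13^f with e + f <= 1,
 * and 0 otherwise (including n == 0). */
int is_fast_default_size(unsigned long n)
{
    static const unsigned long small[] = { 2, 3, 5, 7 };
    int i;

    if (n == 0)
        return 0;

    /* Strip every factor of 2, 3, 5 and 7; any exponent is allowed. */
    for (i = 0; i < 4; ++i)
        while (n % small[i] == 0)
            n /= small[i];

    /* Allow at most one factor taken from {11, 13}. */
    if (n % 11 == 0)
        n /= 11;
    else if (n % 13 == 0)
        n /= 13;

    /* Anything left over is a factor outside the rule. */
    return n == 1;
}
```

With this rule, 1024 and 208 (16 times 13) qualify, while 17 and 143 (11 times 13) do not.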
392 | |||
393 | <P> |
||
394 | The directory <CODE>gensrc</CODE> contains all the programs and scripts that |
||
395 | were used to generate FFTW. In particular, the program |
||
396 | <CODE>gensrc/genfft.ml</CODE> was used to generate the code that FFTW uses to |
||
397 | compute the transforms. We do not expect casual users to use it. |
||
398 | <CODE>genfft</CODE> is a rather sophisticated program that generates directed |
||
399 | acyclic graphs of FFT algorithms and performs algebraic simplifications |
||
400 | on them. <CODE>genfft</CODE> is written in Objective Caml, a dialect of ML. |
||
401 | Objective Caml is described at <A HREF="http://pauillac.inria.fr/ocaml/">http://pauillac.inria.fr/ocaml/</A> |
||
402 | and can be downloaded from <A HREF="ftp://ftp.inria.fr/lang/caml-light">ftp://ftp.inria.fr/lang/caml-light</A>. |
||
403 | <A NAME="IDX340"></A> |
||
404 | <A NAME="IDX341"></A> |
||
405 | |||
406 | |||
407 | <P> |
||
408 | If you have Objective Caml installed, you can type <CODE>sh |
||
409 | bootstrap.sh</CODE> in the top-level directory to re-generate the files. If |
||
410 | you change the <CODE>gensrc/config</CODE> file, you can optimize FFTW for |
||
411 | sizes that are not currently supported efficiently (say, 17 or 19). |
||
412 | |||
413 | |||
414 | <P> |
||
415 | We do not provide more details about the code-generation process, since |
||
416 | we do not expect that users will need to generate their own code. |
||
417 | However, feel free to contact us at <A HREF="mailto:fftw@theory.lcs.mit.edu">fftw@theory.lcs.mit.edu</A> if |
||
418 | you are interested in the subject. |
||
419 | |||
420 | |||
421 | <P> |
||
422 | <A NAME="IDX342"></A> |
||
423 | You might find it interesting to learn Caml and/or some modern |
||
424 | programming techniques that we used in the generator (including monadic |
||
425 | programming), especially if you heard the rumor that Java and |
||
426 | object-oriented programming are the latest advancement in the field. |
||
427 | The internal operation of the codelet generator is described in the |
||
428 | paper, "A Fast Fourier Transform Compiler," by M. Frigo, which is |
||
429 | available from the <A HREF="http://theory.lcs.mit.edu/~fftw">FFTW home page</A> |
||
430 | and will appear in the <CITE>Proceedings of the 1999 ACM SIGPLAN |
||
431 | Conference on Programming Language Design and Implementation (PLDI)</CITE>. |
||
432 | |||
433 | |||
434 | <P><HR><P> |
||
435 | Go to the <A HREF="fftw_1.html">first</A>, <A HREF="fftw_5.html">previous</A>, <A HREF="fftw_7.html">next</A>, <A HREF="fftw_10.html">last</A> section, <A HREF="fftw_toc.html">table of contents</A>. |
||
436 | </BODY> |
||
437 | </HTML> |