WebSVN - shark - Blame - Rev 1083 - /shark/tags/rel_1_5_4/ports/fftw/doc/fftw

Rev	Author	Line No.	Line
2	pj	1	<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
		2	<HTML>
		3	<HEAD>
		4	<!-- This HTML file has been created by texi2html 1.52
		5	from fftw.texi on 18 May 1999 -->
		6
		7	<TITLE>FFTW - Installation and Customization</TITLE>
		8	</HEAD>
		9	<BODY TEXT="#000000" BGCOLOR="#FFFFFF">
		10	Go to the <A HREF="fftw_1.html">first</A>, <A HREF="fftw_5.html">previous</A>, <A HREF="fftw_7.html">next</A>, <A HREF="fftw_10.html">last</A> section, <A HREF="fftw_toc.html">table of contents</A>.
		11	<P><HR><P>
		12
		13
		14	<H1><A NAME="SEC66">Installation and Customization</A></H1>
		15
		16	<P>
		17	This chapter describes the installation and customization of FFTW, the
		18	latest version of which may be downloaded from
		19	<A HREF="http://theory.lcs.mit.edu/~fftw">the FFTW home page</A>.
		20
		21
		22	<P>
		23	As distributed, FFTW makes very few assumptions about your system. All
		24	you need is an ANSI C compiler (<CODE>gcc</CODE> is fine, although
		25	vendor-provided compilers often produce faster code).
		26	<A NAME="IDX313"></A>
		27	However, installation of FFTW is somewhat simpler if you have a Unix or
		28	a GNU system, such as Linux. In this chapter, we first describe the
		29	installation of FFTW on Unix and non-Unix systems. We then describe how
		30	you can customize FFTW to achieve better performance. Specifically, you
		31	can I) enable <CODE>gcc</CODE>/x86-specific hacks that improve performance on
		32	Pentium and PentiumPro processors; II) adapt FFTW to use the high-resolution clock
		33	of your machine, if any; III) produce code (<EM>codelets</EM>) to support
		34	fast transforms of sizes that are not supported efficiently by the
		35	standard FFTW distribution.
		36	<A NAME="IDX314"></A>
		37
		38
		39
		40
		41	<H2><A NAME="SEC67">Installation on Unix</A></H2>
		42
		43	<P>
		44	FFTW comes with a <CODE>configure</CODE> program in the GNU style.
		45	Installation can be as simple as:
		46	<A NAME="IDX315"></A>
		47
		48
		49
		50	<PRE>
		51	./configure
		52	make
		53	make install
		54	</PRE>
		55
		56	<P>
		57	This will build the uniprocessor complex and real transform libraries
		58	along with the test programs. We strongly recommend that you use GNU
		59	<CODE>make</CODE> if it is available; on some systems it is called
		60	<CODE>gmake</CODE>. The "<CODE>make install</CODE>" command installs the fftw and
		61	rfftw libraries in standard places, and typically requires root
		62	privileges (unless you specify a different install directory with the
		63	<CODE>--prefix</CODE> flag to <CODE>configure</CODE>). You can also type
		64	"<CODE>make check</CODE>" to put the FFTW test programs through their paces.
		65	If you have problems during configuration or compilation, you may want
		66	to run "<CODE>make distclean</CODE>" before trying again; this ensures that
		67	you don't have any stale files left over from previous compilation
		68	attempts.
		69
		70
		71	<P>
		72	The <CODE>configure</CODE> script knows good <CODE>CFLAGS</CODE> (C compiler flags)
		73	<A NAME="IDX316"></A>
		74	for a few systems. If your system is not known, the <CODE>configure</CODE>
		75	script will print out a warning. <A NAME="DOCF9" HREF="fftw_foot.html#FOOT9">(9)</A> In this case, you can compile
		76	FFTW with the command
		77
		78	<PRE>
		79	make CFLAGS="&lt;write your CFLAGS here&gt;"
		80	</PRE>
		81
		82	<P>
		83	If you do find an optimal set of <CODE>CFLAGS</CODE> for your system, please
		84	let us know what they are (along with the output of <CODE>config.guess</CODE>)
		85	so that we can include them in future releases.
		86
		87
		88	<P>
		89	The <CODE>configure</CODE> program supports all the standard flags defined by
		90	the GNU Coding Standards; see the <CODE>INSTALL</CODE> file in FFTW or
		91	<A HREF="http://www.gnu.org/prep/standards_toc.html">the GNU web page</A>.
		92	Note especially <CODE>--help</CODE> to list all flags and
		93	<CODE>--enable-shared</CODE> to create shared, rather than static, libraries.
		94	<CODE>configure</CODE> also accepts a few FFTW-specific flags, particularly:
		95
		96
		97
		98	<UL>
		99
		100	<LI>
		101
		102	<A NAME="IDX317"></A>
		103	<CODE>--enable-float</CODE> Produces a single-precision version of FFTW
		104	(<CODE>float</CODE>) instead of the default double-precision (<CODE>double</CODE>).
		105	See Section <A HREF="fftw_6.html#SEC69">Installing FFTW in both single and double precision</A>.
		106
		107	<LI>
		108
		109	<CODE>--enable-type-prefix</CODE> Adds a <SAMP>`d'</SAMP> or <SAMP>`s'</SAMP> prefix to all
		110	installed libraries and header files to indicate the floating-point
		111	precision. See Section <A HREF="fftw_6.html#SEC69">Installing FFTW in both single and double precision</A>. (<CODE>--enable-type-prefix=&lt;prefix&gt;</CODE> lets you add an
		112	arbitrary prefix.) By default, no prefix is used.
		113
		114	<LI>
		115
		116	<A NAME="IDX318"></A>
		117	<CODE>--enable-threads</CODE> Enables compilation and installation of the FFTW
		118	threads library (see Section <A HREF="fftw_4.html#SEC48">Multi-threaded FFTW</A>), which provides a
		119	simple interface to parallel transforms for SMP systems. (By default,
		120	the threads routines are not compiled.)
		121
		122	<LI>
		123
		124	<A NAME="IDX319"></A>
		125	<CODE>--enable-mpi</CODE> Enables compilation and installation of the FFTW MPI
		126	library (see Section <A HREF="fftw_4.html#SEC55">MPI FFTW</A>), which provides parallel transforms for
		127	distributed-memory systems with MPI. (By default, the MPI routines are
		128	not compiled.)
		129
		130	<LI>
		131
		132	<A NAME="IDX320"></A>
		133	<CODE>--disable-fortran</CODE> Disables inclusion of Fortran-callable wrapper
		134	routines (see Section <A HREF="fftw_5.html#SEC62">Calling FFTW from Fortran</A>) in the standard FFTW
		135	libraries. These wrapper routines increase the library size by only a
		136	negligible amount, so they are included by default as long as the
		137	<CODE>configure</CODE> script finds a Fortran compiler on your system.
		138
		139	<LI>
		140
		141	<CODE>--with-gcc</CODE> Enables the use of <CODE>gcc</CODE>. By default, FFTW uses
		142	the vendor-supplied <CODE>cc</CODE> compiler if present. Unfortunately,
		143	<CODE>gcc</CODE> produces slower code than <CODE>cc</CODE> on many systems.
		144
		145	<LI>
		146
		147	<CODE>--enable-i386-hacks</CODE> See below.
		148
		149	<LI>
		150
		151	<CODE>--enable-pentium-timer</CODE> See below.
		152
		153	</UL>
		154
		155	<P>
		156	To force <CODE>configure</CODE> to use a particular C compiler (instead of the
		157	<A NAME="IDX321"></A>
		158	default, usually <CODE>cc</CODE>), set the environment variable <CODE>CC</CODE> to
		159	the name of the desired compiler before running <CODE>configure</CODE>; you
		160	may also need to set the flags via the variable <CODE>CFLAGS</CODE>.
		161	<A NAME="IDX322"></A>
		162
		163
		164
		165
		166	<H2><A NAME="SEC68">Installation on non-Unix Systems</A></H2>
		167
		168	<P>
		169	It is quite straightforward to install FFTW even on non-Unix systems
		170	lacking the niceties of the <CODE>configure</CODE> script. The FFTW Home Page
		171	may include some FFTW packages preconfigured for particular
		172	systems/compilers, and also contains installation notes sent in by
		173	<A NAME="IDX323"></A>
		174	users. All you really need to do, though, is to compile all of the
		175	<CODE>.c</CODE> files in the appropriate directories of the FFTW package.
		176	(You needn't worry about the many extraneous files lying around.)
		177
		178
		179	<P>
		180	For the complex transforms, compile all of the <CODE>.c</CODE> files in the
		181	<CODE>fftw</CODE> directory and link them into a library. Similarly, for the
		182	real transforms, compile all of the <CODE>.c</CODE> files in the <CODE>rfftw</CODE>
		183	directory into a library. Note that these sources <CODE>#include</CODE>
		184	various files in the <CODE>fftw</CODE> and <CODE>rfftw</CODE> directories, so you
		185	may need to set up the <CODE>#include</CODE> paths for your compiler
		186	appropriately. Be sure to enable the highest possible level of
		187	optimization in your compiler.
		188
		189
		190	<P>
		191	<A NAME="IDX324"></A>
		192	By default, FFTW is compiled for double-precision transforms. To work
		193	in single precision rather than double precision, <CODE>#define</CODE> the
		194	symbol <CODE>FFTW_ENABLE_FLOAT</CODE> in <CODE>fftw.h</CODE> (in the <CODE>fftw</CODE>
		195	directory) and (re)compile FFTW.
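<P>
As a rough illustration of what this switch amounts to, the following stand-alone C sketch selects the working type at compile time. The type name <CODE>fftw_real</CODE> is borrowed from FFTW, but the sketch does not include <CODE>fftw.h</CODE>; the helpers <CODE>precision_name</CODE> and <CODE>bytes_for_complex_array</CODE> are hypothetical and exist only for this example, and the assumption that a complex value is stored as two reals should be checked against the header in your copy of FFTW.

```c
#include <assert.h>
#include <stddef.h>

/* Compile-time precision switch, modeled on the one described above.
 * This is a stand-alone sketch; it does not include fftw.h. */
#ifdef FFTW_ENABLE_FLOAT
typedef float fftw_real;
#else
typedef double fftw_real;
#endif

/* Hypothetical helper: names the precision this file was compiled for. */
const char *precision_name(void)
{
    return sizeof(fftw_real) == sizeof(float) ? "single" : "double";
}

/* Hypothetical helper: bytes needed for n complex values, assuming each
 * complex value is stored as two reals (real and imaginary part). */
size_t bytes_for_complex_array(size_t n)
{
    assert(n > 0);
    return 2 * n * sizeof(fftw_real);
}
```

<P>
Compiling the sketch with <CODE>FFTW_ENABLE_FLOAT</CODE> defined changes every use of <CODE>fftw_real</CODE> at once, which is why the library has to be recompiled after the symbol is defined.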
		196
		197
		198	<P>
		199	These libraries should be linked with any program that uses the
		200	corresponding transforms. The required header files, <CODE>fftw.h</CODE> and
		201	<CODE>rfftw.h</CODE>, are located in the <CODE>fftw</CODE> and <CODE>rfftw</CODE>
		202	directories respectively; you may want to put them with the libraries,
		203	or wherever header files normally go on your system.
		204
		205
		206	<P>
		207	FFTW includes test programs, <CODE>fftw_test</CODE> and <CODE>rfftw_test</CODE>, in
		208	<A NAME="IDX325"></A>
		209	<A NAME="IDX326"></A>
		210	the <CODE>tests</CODE> directory. These are compiled and linked like any
		211	program using FFTW, except that they use additional header files located
		212	in the <CODE>fftw</CODE> and <CODE>rfftw</CODE> directories, so you will need to set
		213	your compiler <CODE>#include</CODE> paths appropriately. <CODE>fftw_test</CODE> is
		214	compiled from <CODE>fftw_test.c</CODE> and <CODE>test_main.c</CODE>, while
		215	<CODE>rfftw_test</CODE> is compiled from <CODE>rfftw_test.c</CODE> and
		216	<CODE>test_main.c</CODE>. When you run these programs, you will be prompted
		217	interactively for various possible tests to perform; see also
		218	<CODE>tests/README</CODE> for more information.
		219
		220
		221
		222
		223	<H2><A NAME="SEC69">Installing FFTW in both single and double precision</A></H2>
		224
		225	<P>
		226	<A NAME="IDX327"></A>
		227	It is often useful to install both single- and double-precision versions
		228	of the FFTW libraries on the same machine, and we provide a convenient
		229	mechanism for achieving this on Unix systems.
		230
		231
		232	<P>
		233	<A NAME="IDX328"></A>
		234	When the <CODE>--enable-type-prefix</CODE> option of configure is used, the
		235	FFTW libraries and header files are installed with a prefix of <SAMP>`d'</SAMP>
		236	or <SAMP>`s'</SAMP>, depending upon whether you compiled in double or single
		237	precision. Then, instead of linking your program with <CODE>-lrfftw
		238	-lfftw</CODE>, for example, you would link with <CODE>-ldrfftw -ldfftw</CODE> to use
		239	the double-precision version or with <CODE>-lsrfftw -lsfftw</CODE> to use the
		240	single-precision version. Also, you would <CODE>#include</CODE>
		241	<CODE>&lt;drfftw.h&gt;</CODE> or <CODE>&lt;srfftw.h&gt;</CODE> instead of <CODE>&lt;rfftw.h&gt;</CODE>, and
		242	so on.
		243
		244
		245	<P>
		246	<EM>The names of FFTW functions, data types, and constants remain
		247	unchanged!</EM> You still call, for instance, <CODE>fftw_one</CODE> and not
		248	<CODE>dfftw_one</CODE>. Only the names of header files and libraries are
		249	modified. One consequence of this is that <EM>you <B>cannot</B> use both
		250	the single- and double-precision FFTW libraries in the same program,
		251	simultaneously,</EM> as the function names would conflict.
		252
		253
		254	<P>
		255	So, to install both the single- and double-precision libraries on the
		256	same machine, you would do:
		257
		258
		259
		260	<PRE>
		261	./configure --enable-type-prefix <I>[ other options ]</I>
		262	make
		263	make install
		264	make clean
		265	./configure --enable-float --enable-type-prefix <I>[ other options ]</I>
		266	make
		267	make install
		268	</PRE>
		269
		270
		271
		272	<H2><A NAME="SEC70"><CODE>gcc</CODE> and Pentium/PentiumPro hacks</A></H2>
		273	<P>
		274	<A NAME="IDX329"></A>
		275	The <CODE>configure</CODE> option <CODE>--enable-i386-hacks</CODE> enables specific
		276	optimizations for <CODE>gcc</CODE> and Pentium/PentiumPro, which can
		277	significantly improve performance of double-precision transforms.
		278	Specifically, we have tested these hacks on Linux with <CODE>gcc</CODE>
		279	2.[78] and versions of <CODE>egcs</CODE> since 1.0.3. These optimizations
		280	only affect the performance, not the correctness of FFTW (i.e. it is
		281	always safe to try them out).
		282
		283
		284	<P>
		285	These hacks provide a workaround for the incorrect alignment of local
		286	<CODE>double</CODE> variables in <CODE>gcc</CODE>. The compiler aligns these
		287	<A NAME="IDX330"></A>
		288	variables to multiples of 4 bytes, but execution is much faster (on
		289	Pentium and PentiumPro) if <CODE>double</CODE>s are aligned to a multiple of 8
		290	bytes. By carefully counting the number of variables allocated by the
		291	compiler in performance-critical regions of the code, we have been able
		292	to introduce dummy allocations (using <CODE>alloca</CODE>) that align the
		293	stack properly. The hack depends crucially on the compiler flags that
		294	are used. For example, it won't work without
		295	<CODE>-fomit-frame-pointer</CODE>.
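<P>
The alignment arithmetic behind this idea can be sketched in a few lines of C. This is not the code FFTW uses (which relies on <CODE>alloca</CODE> and on counting the compiler's stack allocations); the helpers <CODE>is_aligned</CODE> and <CODE>padding_to_8</CODE> are hypothetical names that only show how to test an address and how many padding bytes would move it to the next multiple of 8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if p is a multiple of `alignment` bytes, and 0 otherwise. */
int is_aligned(const void *p, size_t alignment)
{
    assert(alignment > 0);
    return (uintptr_t)p % alignment == 0;
}

/* Number of padding bytes that must be added to the address `addr`
 * so that the result is a multiple of 8 (0 if it already is). */
size_t padding_to_8(uintptr_t addr)
{
    return (size_t)((8 - addr % 8) % 8);
}
```

<P>
An address that is a multiple of 4 but not of 8, which is the case the text describes, needs exactly 4 bytes of padding: <CODE>padding_to_8(20)</CODE> returns 4.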
		296
		297
		298	<P>
		299	The <CODE>fftw_test</CODE> program outputs speed measurements that you can use
		300	to see if these hacks are beneficial.
		301	<A NAME="IDX331"></A>
		302	<A NAME="IDX332"></A>
		303
		304
		305	<P>
		306	The <CODE>configure</CODE> option <CODE>--enable-pentium-timer</CODE> enables the
		307	use of the Pentium and PentiumPro cycle counter for timing purposes. In
		308	order to get correct results, you must define <CODE>FFTW_CYCLES_PER_SEC</CODE>
		309	in <CODE>fftw/config.h</CODE> to be the clock speed of your processor; the
		310	resulting FFTW library will be nonportable. The use of this option is
		311	deprecated. On serious operating systems (such as Linux), FFTW uses
		312	<CODE>gettimeofday()</CODE>, which has enough resolution and is portable.
		313	(Note that Win32 has its own high-resolution timing routines as well.
		314	FFTW contains unsupported code to use these routines.)
		315
		316
		317
		318
		319	<H2><A NAME="SEC71">Customizing the timer</A></H2>
		320	<P>
		321	<A NAME="IDX333"></A>
		322
		323
		324	<P>
		325	FFTW needs a reasonably precise clock in order to find the optimal way
		326	to compute a transform. On Unix systems, <CODE>configure</CODE> looks for
		327	<CODE>gettimeofday</CODE> and other system-specific timers. If it does not
		328	find any high resolution clock, it defaults to using the <CODE>clock()</CODE>
		329	function, which is very portable, but forces FFTW to run for a long time
		330	in order to get reliable measurements.
		331	<A NAME="IDX334"></A>
		332	<A NAME="IDX335"></A>
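<P>
To see why a coarse clock leads to long measurement runs, consider the following minimal C sketch of a generic timing loop. It is an illustration under stated assumptions, not FFTW's own timer code: <CODE>workload</CODE> is a hypothetical stand-in for one transform, and <CODE>time_per_call</CODE> is a hypothetical helper that doubles the repetition count until the total time measured with <CODE>clock()</CODE> reaches a chosen minimum interval.

```c
#include <assert.h>
#include <time.h>

/* Keeps the compiler from optimizing the workload away. */
volatile double sink;

/* Hypothetical stand-in for the operation being timed. */
void workload(void)
{
    int i;
    double s = 0.0;

    for (i = 0; i < 1000; ++i)
        s += (double)i * 0.5;
    sink = s;
}

/* Calls f repeatedly, doubling the repetition count until the total
 * CPU time reported by clock() reaches time_min seconds.
 * Returns the average number of seconds per call. */
double time_per_call(void (*f)(void), double time_min)
{
    unsigned long iters = 1;

    assert(time_min > 0.0);
    for (;;) {
        unsigned long i;
        clock_t start = clock();
        double elapsed;

        for (i = 0; i < iters; ++i)
            f();
        elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
        if (elapsed >= time_min)
            return elapsed / (double)iters;
        iters *= 2;
    }
}
```

<P>
The coarser the clock, the larger <CODE>time_min</CODE> has to be for the average to be meaningful, and the longer each measurement takes.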
		333
		334
		335	<P>
		336	If your machine supports a high-resolution clock not recognized by FFTW,
		337	it is therefore advisable to use it. To do so, edit
		338	<CODE>fftw/fftw-int.h</CODE> and redefine the few timer macros there. The
		339	code is documented and should be self-explanatory. (By the way,
		340	<CODE>fftw-int</CODE> stands for <CODE>fftw-internal</CODE>, but for some
		341	inexplicable reason people are still using primitive systems with 8.3
		342	filenames.)
		343
		344
		345	<P>
		346	Even if you don't install high-resolution timing code, we still
		347	recommend that you look at the <CODE>FFTW_TIME_MIN</CODE> constant in
		348	<A NAME="IDX336"></A>
		349	<CODE>fftw/fftw-int.h</CODE>. This constant holds the minimum time interval (in
		350	seconds) required to get accurate timing measurements, and should be (at
		351	least) several hundred times the resolution of your clock. The default
		352	constants are on the conservative side, and may cause FFTW to take
		353	longer than necessary when you create a plan. Set <CODE>FFTW_TIME_MIN</CODE>
		354	to whatever is appropriate on your system (be sure to set the
		355	<EM>right</EM> <CODE>FFTW_TIME_MIN</CODE>...there are several definitions in
		356	<CODE>fftw-int.h</CODE>, corresponding to different platforms and timers).
		357
		358
		359	<P>
		360	As an aid in checking the resolution of your clock, you can use the
		361	<CODE>tests/fftw_test</CODE> program with the <CODE>-t</CODE> option
		362	(see <CODE>tests/README</CODE>). Remember, the mere fact that your clock
		363	reports times in, say, picoseconds, does not mean that it is actually
		364	<EM>accurate</EM> to that resolution.
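<P>
Independently of the test program, the effective resolution of a clock can be estimated by watching for the smallest step it ever takes. The C sketch below does this for <CODE>gettimeofday</CODE>, assuming a POSIX system; <CODE>estimate_clock_resolution</CODE> and <CODE>suggest_time_min</CODE> are hypothetical helpers, and the result is only an upper bound on the true resolution because it includes the cost of the calls themselves.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Returns the smallest nonzero step of gettimeofday() observed over
 * `samples` trials, in seconds (0.0 if no forward step was seen). */
double estimate_clock_resolution(int samples)
{
    double best = 0.0;
    int i;

    assert(samples > 0);
    for (i = 0; i < samples; ++i) {
        struct timeval t0, t1;
        double step;

        gettimeofday(&t0, NULL);
        do {                    /* spin until the reported time changes */
            gettimeofday(&t1, NULL);
        } while (t1.tv_sec == t0.tv_sec && t1.tv_usec == t0.tv_usec);

        step = (double)(t1.tv_sec - t0.tv_sec)
             + 1.0e-6 * (double)(t1.tv_usec - t0.tv_usec);
        if (step > 0.0 && (best == 0.0 || step < best))
            best = step;
    }
    return best;
}

/* Candidate minimum timing interval: a chosen multiple of the
 * measured resolution. The multiple is the caller's judgment call. */
double suggest_time_min(double resolution, double multiple)
{
    assert(resolution > 0.0 && multiple > 0.0);
    return resolution * multiple;
}
```

<P>
A value such as <CODE>suggest_time_min(estimate_clock_resolution(100), 500.0)</CODE> gives a starting point for <CODE>FFTW_TIME_MIN</CODE>; the factor 500 is just one reading of "several hundred" and is not taken from the FFTW sources.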
		365
		366
		367
		368
		369	<H2><A NAME="SEC72">Generating your own code</A></H2>
		370	<P>
		371	<A NAME="IDX337"></A>
		372	<A NAME="IDX338"></A>
		373	<A NAME="IDX339"></A>
		374
		375
		376	<P>
		377	If you know that you will only use transforms of certain sizes (say,
		378	powers of 2) and want to reduce the size of the library, you can
		379	reconfigure FFTW to support only those sizes you are interested in. You
		380	may even generate code to enable efficient transforms of a size not
		381	supported by the default distribution. The default distribution
		382	supports transforms of any size, but not all sizes are equally fast.
		383	The default installation of FFTW is best at handling sizes of the form
		384	2<SUP>a</SUP> 3<SUP>b</SUP> 5<SUP>c</SUP> 7<SUP>d</SUP>
		385	11<SUP>e</SUP> 13<SUP>f</SUP>,
		386	where e+f is either 0 or
		387	1, and the other exponents are arbitrary. Other sizes are
		388	computed by means of a slow, general-purpose routine. However, if you
		389	have an application that requires fast transforms of size, say,
		390	<CODE>17</CODE>, there is a way to generate specialized code to handle that.
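<P>
The size rule stated above is easy to check mechanically. The following C sketch tests whether a size has the form 2<SUP>a</SUP> 3<SUP>b</SUP> 5<SUP>c</SUP> 7<SUP>d</SUP> 11<SUP>e</SUP> 13<SUP>f</SUP> with e+f equal to 0 or 1. The functions <CODE>strip_factor</CODE> and <CODE>is_fast_size</CODE> are hypothetical helpers written for this example and are not part of the FFTW library.

```c
#include <assert.h>

/* Divides n by p as often as possible and stores the exponent of p
 * in *count. Returns what is left of n. */
static unsigned long strip_factor(unsigned long n, unsigned long p, int *count)
{
    *count = 0;
    while (n % p == 0) {
        n /= p;
        ++*count;
    }
    return n;
}

/* Returns 1 if n = 2^a 3^b 5^c 7^d 11^e 13^f with e + f <= 1,
 * and 0 otherwise. */
int is_fast_size(unsigned long n)
{
    int ignored, e, f;

    assert(n > 0);              /* transform sizes are positive */
    n = strip_factor(n, 2, &ignored);
    n = strip_factor(n, 3, &ignored);
    n = strip_factor(n, 5, &ignored);
    n = strip_factor(n, 7, &ignored);
    n = strip_factor(n, 11, &e);
    n = strip_factor(n, 13, &f);
    return n == 1 && e + f <= 1;
}
```

<P>
For example, <CODE>is_fast_size(1024)</CODE> returns 1, while <CODE>is_fast_size(17)</CODE> and <CODE>is_fast_size(143)</CODE> (that is, 11 times 13) both return 0.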
		391
		392
		393	<P>
		394	The directory <CODE>gensrc</CODE> contains all the programs and scripts that
		395	were used to generate FFTW. In particular, the program
		396	<CODE>gensrc/genfft.ml</CODE> was used to generate the code that FFTW uses to
		397	compute the transforms. We do not expect casual users to use it.
		398	<CODE>genfft</CODE> is a rather sophisticated program that generates directed
		399	acyclic graphs of FFT algorithms and performs algebraic simplifications
		400	on them. <CODE>genfft</CODE> is written in Objective Caml, a dialect of ML.
		401	Objective Caml is described at <A HREF="http://pauillac.inria.fr/ocaml/">http://pauillac.inria.fr/ocaml/</A>
		402	and can be downloaded from <A HREF="ftp://ftp.inria.fr/lang/caml-light">ftp://ftp.inria.fr/lang/caml-light</A>.
		403	<A NAME="IDX340"></A>
		404	<A NAME="IDX341"></A>
		405
		406
		407	<P>
		408	If you have Objective Caml installed, you can type <CODE>sh
		409	bootstrap.sh</CODE> in the top-level directory to re-generate the files. If
		410	you change the <CODE>gensrc/config</CODE> file, you can optimize FFTW for
		411	sizes that are not currently supported efficiently (say, 17 or 19).
		412
		413
		414	<P>
		415	We do not provide more details about the code-generation process, since
		416	we do not expect that users will need to generate their own code.
		417	However, feel free to contact us at <A HREF="mailto:fftw@theory.lcs.mit.edu">fftw@theory.lcs.mit.edu</A> if
		418	you are interested in the subject.
		419
		420
		421	<P>
		422	<A NAME="IDX342"></A>
		423	You might find it interesting to learn Caml and/or some modern
		424	programming techniques that we used in the generator (including monadic
		425	programming), especially if you heard the rumor that Java and
		426	object-oriented programming are the latest advancement in the field.
		427	The internal operation of the codelet generator is described in the
		428	paper, "A Fast Fourier Transform Compiler," by M. Frigo, which is
		429	available from the <A HREF="http://theory.lcs.mit.edu/~fftw">FFTW home page</A>
		430	and will appear in the <CITE>Proceedings of the 1999 ACM SIGPLAN
		431	Conference on Programming Language Design and Implementation (PLDI)</CITE>.
		432
		433
		434	<P><HR><P>
		435	Go to the <A HREF="fftw_1.html">first</A>, <A HREF="fftw_5.html">previous</A>, <A HREF="fftw_7.html">next</A>, <A HREF="fftw_10.html">last</A> section, <A HREF="fftw_toc.html">table of contents</A>.
		436	</BODY>
		437	</HTML>
