WebSVN - shark - Blame - Rev 1618 - /shark/branches/xen/ports/fftw/doc/fftw.tex

Rev	Author	Line No.	Line
2	pj	1	\input texinfo @c -- texinfo --
		2	@c % $Id: fftw.tex,v 1.1.1.1 2002-03-29 14:12:55 pj Exp $
		3	@c %**start of header
		4	@setfilename fftw.info
		5	@settitle FFTW
		6	@c %**end of header
		7
		8	@include version.texi
		9	@setchapternewpage odd
		10	@c define constant index (ct)
		11	@defcodeindex ct
		12	@syncodeindex ct fn
		13	@syncodeindex vr fn
		14	@syncodeindex pg fn
		15	@syncodeindex tp fn
		16	@c define foreign function index (ff)
		17	@defcodeindex ff
		18	@syncodeindex ff cp
		19	@c define foreign constant index (fc)
		20	@defcodeindex fc
		21	@syncodeindex fc cp
		22	@c define foreign program index (fp)
		23	@defcodeindex fp
		24	@syncodeindex fp cp
		25
		26	@ifinfo
		27	This is the FFTW User's manual.
		28
		29	Copyright @copyright{} 1997--1999 Massachusetts Institute of Technology
		30
		31	Permission is granted to make and distribute verbatim copies of this
		32	manual provided the copyright notice and this permission notice are
		33	preserved on all copies.
		34
		35	Permission is granted to copy and distribute modified versions of this
		36	manual under the conditions for verbatim copying, provided that the
		37	entire resulting derived work is distributed under the terms of a
		38	permission notice identical to this one.
		39
		40	Permission is granted to copy and distribute translations of this manual
		41	into another language, under the above conditions for modified versions,
		42	except that this permission notice may be stated in a translation
		43	approved by the Free Software Foundation.
		44	@end ifinfo
		45
		46	@titlepage
		47	@sp 10
		48	@comment The title is printed in a large font.
		49	@title{FFTW User's Manual}
		50	@subtitle For version @value{VERSION}, @value{UPDATED}
		51	@author{Matteo Frigo}
		52	@author{Steven G. Johnson}
		53
		54	@c The following two commands start the copyright page.
		55	@page
		56	@vskip 0pt plus 1filll
		57	Copyright @copyright{} 1997--1999 Massachusetts Institute of Technology.
		58
		59	Permission is granted to make and distribute verbatim copies of this
		60	manual provided the copyright notice and this permission notice are
		61	preserved on all copies.
		62
		63	Permission is granted to copy and distribute modified versions of this
		64	manual under the conditions for verbatim copying, provided that the
		65	entire resulting derived work is distributed under the terms of a
		66	permission notice identical to this one.
		67
		68	Permission is granted to copy and distribute translations of this manual
		69	into another language, under the above conditions for modified versions,
		70	except that this permission notice may be stated in a translation
		71	approved by the Free Software Foundation.
		72	@end titlepage
		73
		74	@node Top, Introduction, (dir), (dir)
		75	@ifinfo
		76	@top FFTW User Manual
		77	Welcome to FFTW, the Fastest Fourier Transform in the West. FFTW is a
		78	collection of fast C routines to compute the discrete Fourier transform.
		79	This manual documents FFTW version @value{VERSION}.
		80	@end ifinfo
		81
		82	@menu
		83	* Introduction::
		84	* Tutorial::
		85	* FFTW Reference::
		86	* Parallel FFTW::
		87	* Calling FFTW from Fortran::
		88	* Installation and Customization::
		89	* Acknowledgments::
		90	* License and Copyright::
		91	* Concept Index::
		92	* Library Index::
		93
		94	@detailmenu --- The Detailed Node Listing ---
		95
		96	Tutorial
		97
		98	* Complex One-dimensional Transforms Tutorial::
		99	* Complex Multi-dimensional Transforms Tutorial::
		100	* Real One-dimensional Transforms Tutorial::
		101	* Real Multi-dimensional Transforms Tutorial::
		102	* Multi-dimensional Array Format::
		103	* Words of Wisdom::
		104
		105	Multi-dimensional Array Format
		106
		107	* Row-major Format::
		108	* Column-major Format::
		109	* Static Arrays in C::
		110	* Dynamic Arrays in C::
		111	* Dynamic Arrays in C-The Wrong Way::
		112
		113	Words of Wisdom
		114
		115	* Caveats in Using Wisdom:: What you should worry about in using wisdom
		116	* Importing and Exporting Wisdom:: I/O of wisdom to disk and other media
		117
		118	FFTW Reference
		119
		120	* Data Types:: real, complex, and halfcomplex numbers
		121	* One-dimensional Transforms Reference::
		122	* Multi-dimensional Transforms Reference::
		123	* Real One-dimensional Transforms Reference::
		124	* Real Multi-dimensional Transforms Reference::
		125	* Wisdom Reference::
		126	* Memory Allocator Reference::
		127	* Thread safety::
		128
		129	One-dimensional Transforms Reference
		130
		131	* fftw_create_plan:: Plan Creation
		132	* Discussion on Specific Plans::
		133	* fftw:: Plan Execution
		134	* fftw_destroy_plan:: Plan Destruction
		135	* What FFTW Really Computes:: Definition of the DFT.
		136
		137	Multi-dimensional Transforms Reference
		138
		139	* fftwnd_create_plan:: Plan Creation
		140	* fftwnd:: Plan Execution
		141	* fftwnd_destroy_plan:: Plan Destruction
		142	* What FFTWND Really Computes::
		143
		144	Real One-dimensional Transforms Reference
		145
		146	* rfftw_create_plan:: Plan Creation
		147	* rfftw:: Plan Execution
		148	* rfftw_destroy_plan:: Plan Destruction
		149	* What RFFTW Really Computes::
		150
		151	Real Multi-dimensional Transforms Reference
		152
		153	* rfftwnd_create_plan:: Plan Creation
		154	* rfftwnd:: Plan Execution
		155	* Array Dimensions for Real Multi-dimensional Transforms::
		156	* Strides in In-place RFFTWND::
		157	* rfftwnd_destroy_plan:: Plan Destruction
		158	* What RFFTWND Really Computes::
		159
		160	Wisdom Reference
		161
		162	* fftw_export_wisdom::
		163	* fftw_import_wisdom::
		164	* fftw_forget_wisdom::
		165
		166	Parallel FFTW
		167
		168	* Multi-threaded FFTW::
		169	* MPI FFTW::
		170
		171	Multi-threaded FFTW
		172
		173	* Installation and Supported Hardware/Software::
		174	* Usage of Multi-threaded FFTW::
		175	* How Many Threads to Use?::
		176	* Using Multi-threaded FFTW in a Multi-threaded Program::
		177	* Tips for Optimal Threading::
		178
		179	MPI FFTW
		180
		181	* MPI FFTW Installation::
		182	* Usage of MPI FFTW for Complex Multi-dimensional Transforms::
		183	* MPI Data Layout::
		184	* Usage of MPI FFTW for Real Multi-dimensional Transforms::
		185	* Usage of MPI FFTW for Complex One-dimensional Transforms::
		186	* MPI Tips::
		187
		188	Calling FFTW from Fortran
		189
		190	* Wrapper Routines::
		191	* FFTW Constants in Fortran::
		192	* Fortran Examples::
		193
		194	Installation and Customization
		195
		196	* Installation on Unix::
		197	* Installation on non-Unix Systems::
		198	* Installing FFTW in both single and double precision::
		199	* gcc and Pentium/PentiumPro hacks::
		200	* Customizing the timer::
		201	* Generating your own code::
		202
		203	@end detailmenu
		204	@end menu
		205
		206	@c ************************************************************
		207	@node Introduction, Tutorial, Top, Top
		208	@chapter Introduction
		209	This manual documents version @value{VERSION} of FFTW, the @emph{Fastest
		210	Fourier Transform in the West}. FFTW is a comprehensive collection of
		211	fast C routines for computing the discrete Fourier transform (DFT) in
		212	one or more dimensions, of both real and complex data, and of arbitrary
		213	input size. FFTW also includes parallel transforms for both shared- and
		214	distributed-memory systems. We assume herein that the reader is already
		215	familiar with the properties and uses of the DFT that are relevant to
		216	her application. Otherwise, see e.g. @cite{The Fast Fourier Transform}
		217	by E. O. Brigham (Prentice-Hall, Englewood Cliffs, NJ, 1974).
		218	@uref{http://theory.lcs.mit.edu/~fftw, Our web page} also has links to
		219	FFT-related information online.
		220	@cindex FFTW
		221
		222	FFTW is usually faster (and sometimes much faster) than all other
		223	freely-available Fourier transform programs found on the Net. For
		224	transforms whose size is a power of two, it compares favorably with the
		225	FFT codes in Sun's Performance Library and IBM's ESSL library, which are
		226	targeted at specific machines. Moreover, FFTW's performance is
		227	@emph{portable}. Indeed, FFTW is unique in that it automatically adapts
		228	itself to your machine, your cache, the size of your memory, the number
		229	of registers, and all the other factors that normally make it impossible
		230	to optimize a program for more than one machine. An extensive
		231	comparison of FFTW's performance with that of other Fourier transform
		232	codes has been made. The results are available on the Web at
		233	@uref{http://theory.lcs.mit.edu/~benchfft, the benchFFT home page}.
		234	@cindex benchmark
		235	@fpindex benchfft
		236
		237	In order to use FFTW effectively, you need to understand one basic
		238	concept of FFTW's internal structure. FFTW does not used a fixed
		239	algorithm for computing the transform, but it can adapt the DFT
		240	algorithm to details of the underlying hardware in order to achieve best
		241	performance. Hence, the computation of the transform is split into two
		242	phases. First, FFTW's @dfn{planner} is called, which ``learns'' the
		243	@cindex plan
		244	fastest way to compute the transform on your machine. The planner
		245	@cindex planner
		246	produces a data structure called a @dfn{plan} that contains this
		247	information. Subsequently, the plan is passed to FFTW's @dfn{executor},
		248	@cindex executor
		249	along with an array of input data. The executor computes the actual
		250	transform, as dictated by the plan. The plan can be reused as many
		251	times as needed. In typical high-performance applications, many
		252	transforms of the same size are computed, and consequently a
		253	relatively-expensive initialization of this sort is acceptable. On the
		254	other hand, if you need a single transform of a given size, the one-time
		255	cost of the planner becomes significant. For this case, FFTW provides
		256	fast planners based on heuristics or on previously computed plans.
		257
		258	The pattern of planning/execution applies to all four operation modes of
		259	FFTW, that is, @w{I) one-dimensional} complex transforms (FFTW), @w{II)
		260	multi-dimensional} complex transforms (FFTWND), @w{III) one-dimensional}
		261	transforms of real data (RFFTW), @w{IV) multi-dimensional} transforms of
		262	real data (RFFTWND). Each mode comes with its own planner and executor.
		263
		264	Besides the automatic performance adaptation performed by the planner,
		265	it is also possible for advanced users to customize FFTW for their
		266	special needs. As distributed, FFTW works most efficiently for arrays
		267	whose size can be factored into small primes (@math{2}, @math{3},
		268	@math{5}, and @math{7}), and uses a slower general-purpose routine for
		269	other factors. FFTW, however, comes with a code generator that can
		270	produce fast C programs for any particular array size you may care
		271	about.
		272	@cindex code generator
		273	For example, if you need transforms of size
		274	@ifinfo
		275	@math{513 = 19 x 3^3},
		276	@end ifinfo
		277	@iftex
		278	@tex
		279	$513 = 19 \cdot 3^3$,
		280	@end tex
		281	@end iftex
		282	@ifhtml
		283	513 = 19*3<sup>3</sup>,
		284	@end ifhtml
		285	you can customize FFTW to support the factor @math{19} efficiently.
		286
		287	FFTW can exploit multiple processors if you have them. FFTW comes with
		288	a shared-memory implementation on top of POSIX (and similar) threads, as
		289	well as a distributed-memory implementation based on MPI.
		290	@cindex parallel transform
		291	@cindex threads
		292	@cindex MPI
		293	We also provide an experimental parallel implementation written in Cilk,
		294	@emph{the superior programming tool of choice for discriminating
		295	hackers} (Olin Shivers). (See @uref{http://supertech.lcs.mit.edu/cilk,
		296	the Cilk home page}.)
		297	@cindex Cilk
		298
		299	For more information regarding FFTW, see the paper, ``The Fastest
		300	Fourier Transform in the West,'' by M. Frigo and S. G. Johnson, which is
		301	the technical report MIT-LCS-TR-728 (Sep. '97). See also, ``FFTW: An
		302	Adaptive Software Architecture for the FFT,'' by M. Frigo and
		303	S. G. Johnson, which appeared in the 23rd International Conference on
		304	Acoustics, Speech, and Signal Processing (@cite{Proc. ICASSP 1998}
		305	@b{3}, p. 1381). The code generator is described in the paper ``A Fast
		306	Fourier Transform Compiler'',
		307	@cindex compiler
		308	by M. Frigo, to appear in the @cite{Proceedings of the 1999 ACM SIGPLAN
		309	Conference on Programming Language Design and Implementation (PLDI),
		310	Atlanta, Georgia, May 1999}. These papers, along with the latest
		311	version of FFTW, the FAQ, benchmarks, and other links, are available at
		312	@uref{http://theory.lcs.mit.edu/~fftw, the FFTW home page}. The current
		313	version of FFTW incorporates many good ideas from the past thirty years
		314	of FFT literature. In one way or another, FFTW uses the Cooley-Tukey
		315	algorithm, the Prime Factor algorithm, Rader's algorithm for prime
		316	sizes, and the split-radix algorithm (with a variation due to Dan
		317	Bernstein). Our code generator also produces new algorithms that we do
		318	not yet completely understand.
		319	@cindex algorithm
		320	The reader is referred to the cited papers for the appropriate
		321	references.
		322
		323	The rest of this manual is organized as follows. We first discuss the
		324	sequential (one-processor) implementation. We start by describing the
		325	basic features of FFTW in @ref{Tutorial}. This discussion includes the
		326	storage scheme of multi-dimensional arrays (@ref{Multi-dimensional Array
		327	Format}) and FFTW's mechanisms for storing plans on disk (@ref{Words of
		328	Wisdom}). Next, @ref{FFTW Reference} provides comprehensive
		329	documentation of all FFTW's features. Parallel transforms are discussed
		330	in their own chapter @ref{Parallel FFTW}. Fortran programmers can also
		331	use FFTW, as described in @ref{Calling FFTW from Fortran}.
		332	@ref{Installation and Customization} explains how to install FFTW in
		333	your computer system and how to adapt FFTW to your needs. License and
		334	copyright information is given in @ref{License and Copyright}. Finally,
		335	we thank all the people who helped us in @ref{Acknowledgments}.
		336
		337	@c ************************************************************
		338	@node Tutorial, FFTW Reference, Introduction, Top
		339	@chapter Tutorial
		340	@cindex Tutorial
		341	This chapter describes the basic usage of FFTW, i.e., how to compute the
		342	Fourier transform of a single array. This chapter tells the truth, but
		343	not the @emph{whole} truth. Specifically, FFTW implements additional
		344	routines and flags, providing extra functionality, that are not
		345	documented here. @xref{FFTW Reference}, for more complete information.
		346	(Note that you need to compile and install FFTW before you can use it in
		347	a program. @xref{Installation and Customization}, for the details of
		348	the installation.)
		349
		350	This tutorial chapter is structured as follows. @ref{Complex
		351	One-dimensional Transforms Tutorial} describes the basic usage of the
		352	one-dimensional transform of complex data. @ref{Complex
		353	Multi-dimensional Transforms Tutorial} describes the basic usage of the
		354	multi-dimensional transform of complex data. @ref{Real One-dimensional
		355	Transforms Tutorial} describes the one-dimensional transform of real
		356	data and its inverse. Finally, @ref{Real Multi-dimensional Transforms
		357	Tutorial} describes the multi-dimensional transform of real data and its
		358	inverse. We recommend that you read these sections in the order that
		359	they are presented. We then discuss two topics in detail. In
		360	@ref{Multi-dimensional Array Format}, we discuss the various
		361	alternatives for storing multi-dimensional arrays in memory. @ref{Words
		362	of Wisdom} shows how you can save FFTW's plans for future use.
		363
		364	@menu
		365	* Complex One-dimensional Transforms Tutorial::
		366	* Complex Multi-dimensional Transforms Tutorial::
		367	* Real One-dimensional Transforms Tutorial::
		368	* Real Multi-dimensional Transforms Tutorial::
		369	* Multi-dimensional Array Format::
		370	* Words of Wisdom::
		371	@end menu
		372
		373	@node Complex One-dimensional Transforms Tutorial, Complex Multi-dimensional Transforms Tutorial, Tutorial, Tutorial
		374	@section Complex One-dimensional Transforms Tutorial
		375	@cindex complex one-dimensional transform
		376	@cindex complex transform
		377
		378	The basic usage of FFTW is simple. A typical call to FFTW looks like:
		379
		380	@example
		381	#include <fftw.h>
		382	...
		383	@{
		384	fftw_complex in[N], out[N];
		385	fftw_plan p;
		386	...
		387	p = fftw_create_plan(N, FFTW_FORWARD, FFTW_ESTIMATE);
		388	...
		389	fftw_one(p, in, out);
		390	...
		391	fftw_destroy_plan(p);
		392	@}
		393	@end example
		394
		395	The first thing we do is to create a @dfn{plan}, which is an object
		396	@cindex plan
		397	that contains all the data that FFTW needs to compute the FFT, using the
		398	following function:
		399
		400	@example
		401	fftw_plan fftw_create_plan(int n, fftw_direction dir, int flags);
		402	@end example
		403	@findex fftw_create_plan
		404	@findex fftw_direction
		405	@tindex fftw_plan
		406
		407	The first argument, @code{n}, is the size of the transform you are
		408	trying to compute. The size @code{n} can be any positive integer, but
		409	sizes that are products of small factors are transformed most
		410	efficiently. The second argument, @code{dir}, can be either
		411	@code{FFTW_FORWARD} or @code{FFTW_BACKWARD}, and indicates the direction
		412	of the transform you
		413	@ctindex FFTW_FORWARD
		414	@ctindex FFTW_BACKWARD
		415	are interested in. Alternatively, you can use the sign of the exponent
		416	in the transform, @math{-1} or @math{+1}, which corresponds to
		417	@code{FFTW_FORWARD} or @code{FFTW_BACKWARD} respectively. The
		418	@code{flags} argument is either @code{FFTW_MEASURE} or
		419	@cindex flags
		420	@code{FFTW_ESTIMATE}. @code{FFTW_MEASURE} means that FFTW actually runs
		421	@ctindex FFTW_MEASURE
		422	and measures the execution time of several FFTs in order to find the
		423	best way to compute the transform of size @code{n}. This may take some
		424	time, depending on your installation and on the precision of the timer
		425	in your machine. @code{FFTW_ESTIMATE}, on the contrary, does not run
		426	any computation, and just builds a
		427	@ctindex FFTW_ESTIMATE
		428	reasonable plan, which may be sub-optimal. In other words, if your
		429	program performs many transforms of the same size and initialization
		430	time is not important, use @code{FFTW_MEASURE}; otherwise use the
		431	estimate. (A compromise between these two extremes exists. @xref{Words
		432	of Wisdom}.)
		433
		434	Once the plan has been created, you can use it as many times as you like
		435	for transforms on arrays of the same size. When you are done with the
		436	plan, you deallocate it by calling @code{fftw_destroy_plan(plan)}.
		437	@findex fftw_destroy_plan
		438
		439	The transform itself is computed by passing the plan along with the
		440	input and output arrays to @code{fftw_one}:
		441
		442	@example
		443	void fftw_one(fftw_plan plan, fftw_complex in, fftw_complex out);
		444	@end example
		445	@findex fftw_one
		446
		447	Note that the transform is out of place: @code{in} and @code{out} must
		448	point to distinct arrays. It operates on data of type
		449	@code{fftw_complex}, a data structure with real (@code{in[i].re}) and
		450	imaginary (@code{in[i].im}) floating-point components. The @code{in}
		451	and @code{out} arrays should have the length specified when the plan was
		452	created. An alternative function, @code{fftw}, allows you to
		453	efficiently perform multiple and/or strided transforms (@pxref{FFTW
		454	Reference}).
		455	@tindex fftw_complex
		456
		457	The DFT results are stored in-order in the array @code{out}, with the
		458	zero-frequency (DC) component in @code{out[0]}.
		459	@cindex frequency
		460	The array @code{in} is not modified. Users should note that FFTW
		461	computes an unnormalized DFT, the sign of whose exponent is given by the
		462	@code{dir} parameter of @code{fftw_create_plan}. Thus, computing a
		463	forward followed by a backward transform (or vice versa) results in the
		464	original array scaled by @code{n}. @xref{What FFTW Really Computes},
		465	for the definition of DFT.
		466	@cindex normalization
		467
		468	A program using FFTW should be linked with @code{-lfftw -lm} on Unix
		469	systems, or with the FFTW and standard math libraries in general.
		470	@cindex linking on Unix
		471
		472	@node Complex Multi-dimensional Transforms Tutorial, Real One-dimensional Transforms Tutorial, Complex One-dimensional Transforms Tutorial, Tutorial
		473	@section Complex Multi-dimensional Transforms Tutorial
		474	@cindex complex multi-dimensional transform
		475	@cindex multi-dimensional transform
		476
		477	FFTW can also compute transforms of any number of dimensions
		478	(@dfn{rank}). The syntax is similar to that for the one-dimensional
		479	@cindex rank
		480	transforms, with @samp{fftw_} replaced by @samp{fftwnd_} (which stands
		481	for ``@code{fftw} in @code{N} dimensions'').
		482
		483	As before, we @code{#include <fftw.h>} and create a plan for the
		484	transforms, this time of type @code{fftwnd_plan}:
		485
		486	@example
		487	fftwnd_plan fftwnd_create_plan(int rank, const int *n,
		488	fftw_direction dir, int flags);
		489	@end example
		490	@tindex fftwnd_plan
		491	@tindex fftw_direction
		492	@findex fftwnd_create_plan
		493
		494	@code{rank} is the dimensionality of the array, and can be any
		495	non-negative integer. The next argument, @code{n}, is a pointer to an
		496	integer array of length @code{rank} containing the (positive) sizes of
		497	each dimension of the array. (Note that the array will be stored in
		498	row-major order. @xref{Multi-dimensional Array Format}, for information
		499	on row-major order.) The last two parameters are the same as in
		500	@code{fftw_create_plan}. We now, however, have an additional possible
		501	flag, @code{FFTW_IN_PLACE}, since @code{fftwnd} supports true in-place
		502	@cindex flags
		503	@ctindex FFTW_IN_PLACE
		504	@findex fftwnd
		505	transforms. Multiple flags are combined using a bitwise @dfn{or}
		506	(@samp{\|}). (An @dfn{in-place} transform is one in which the output
		507	data overwrite the input data. It thus requires half as much memory
		508	as---and is often faster than---its opposite, an @dfn{out-of-place}
		509	transform.)
		510	@cindex in-place transform
		511	@cindex out-of-place transform
		512
		513	For two- and three-dimensional transforms, FFTWND provides alternative
		514	routines that accept the sizes of each dimension directly, rather than
		515	indirectly through a rank and an array of sizes. These are otherwise
		516	identical to @code{fftwnd_create_plan}, and are sometimes more
		517	convenient:
		518
		519	@example
		520	fftwnd_plan fftw2d_create_plan(int nx, int ny,
		521	fftw_direction dir, int flags);
		522	fftwnd_plan fftw3d_create_plan(int nx, int ny, int nz,
		523	fftw_direction dir, int flags);
		524	@end example
		525	@findex fftw2d_create_plan
		526	@findex fftw3d_create_plan
		527
		528	Once the plan has been created, you can use it any number of times for
		529	transforms of the same size. When you do not need a plan anymore, you
		530	can deallocate the plan by calling @code{fftwnd_destroy_plan(plan)}.
		531	@findex fftwnd_destroy_plan
		532
		533	Given a plan, you can compute the transform of an array of data by
		534	calling:
		535
		536	@example
		537	void fftwnd_one(fftwnd_plan plan, fftw_complex in, fftw_complex out);
		538	@end example
		539	@findex fftwnd_one
		540
		541	Here, @code{in} and @code{out} point to multi-dimensional arrays in
		542	row-major order, of the size specified when the plan was created. In
		543	the case of an in-place transform, the @code{out} parameter is ignored
		544	and the output data are stored in the @code{in} array. The results are
		545	stored in-order, unnormalized, with the zero-frequency component in
		546	@code{out[0]}.
		547	@cindex frequency
		548	A forward followed by a backward transform (or vice-versa) yields the
		549	original data multiplied by the size of the array (i.e. the product of
		550	the dimensions). @xref{What FFTWND Really Computes}, for a discussion
		551	of what FFTWND computes.
		552	@cindex normalization
		553
		554	For example, code to perform an in-place FFT of a three-dimensional
		555	array might look like:
		556
		557	@example
		558	#include <fftw.h>
		559	...
		560	@{
		561	fftw_complex in[L][M][N];
		562	fftwnd_plan p;
		563	...
		564	p = fftw3d_create_plan(L, M, N, FFTW_FORWARD,
		565	FFTW_MEASURE \| FFTW_IN_PLACE);
		566	...
		567	fftwnd_one(p, &in[0][0][0], NULL);
		568	...
		569	fftwnd_destroy_plan(p);
		570	@}
		571	@end example
		572
		573	Note that @code{in} is a statically-declared array, which is
		574	automatically in row-major order, but we must take the address of the
		575	first element in order to fit the type expected by @code{fftwnd_one}.
		576	(@xref{Multi-dimensional Array Format}.)
		577
		578	@node Real One-dimensional Transforms Tutorial, Real Multi-dimensional Transforms Tutorial, Complex Multi-dimensional Transforms Tutorial, Tutorial
		579	@section Real One-dimensional Transforms Tutorial
		580	@cindex real transform
		581	@cindex complex to real transform
		582	@cindex RFFTW
		583
		584	If the input data are purely real, you can save roughly a factor of two
		585	in both time and storage by using the @dfn{rfftw} transforms, which are
		586	FFTs specialized for real data. The output of a such a transform is a
		587	@dfn{halfcomplex} array, which consists of only half of the complex DFT
		588	amplitudes (since the negative-frequency amplitudes for real data are
		589	the complex conjugate of the positive-frequency amplitudes).
		590	@cindex halfcomplex array
		591
		592	In exchange for these speed and space advantages, the user sacrifices
		593	some of the simplicity of FFTW's complex transforms. First of all, to
		594	allow maximum performance, the output format of the one-dimensional real
		595	transforms is different from that used by the multi-dimensional
		596	transforms. Second, the inverse transform (halfcomplex to real) has the
		597	side-effect of destroying its input array. Neither of these
		598	inconveniences should pose a serious problem for users, but it is
		599	important to be aware of them. (Both the inconvenient output format
		600	and the side-effect of the inverse transform can be ameliorated for
		601	one-dimensional transforms, at the expense of some performance, by using
		602	instead the multi-dimensional transform routines with a rank of one.)
		603
		604	The computation of the plan is similar to that for the complex
		605	transforms. First, you @code{#include <rfftw.h>}. Then, you create a
		606	plan (of type @code{rfftw_plan}) by calling:
		607
		608	@example
		609	rfftw_plan rfftw_create_plan(int n, fftw_direction dir, int flags);
		610	@end example
		611	@tindex rfftw_plan
		612	@tindex fftw_direction
		613	@findex rfftw_create_plan
		614
		615	@code{n} is the length of the @emph{real} array in the transform (even
		616	for halfcomplex-to-real transforms), and can be any positive integer
		617	(although sizes with small factors are transformed more efficiently).
		618	@code{dir} is either @code{FFTW_REAL_TO_COMPLEX} or
		619	@code{FFTW_COMPLEX_TO_REAL}.
		620	@ctindex FFTW_REAL_TO_COMPLEX
		621	@ctindex FFTW_COMPLEX_TO_REAL
		622	The @code{flags} parameter is the same as in @code{fftw_create_plan}.
		623
		624	Once created, a plan can be used for any number of transforms, and is
		625	deallocated when you are done with it by calling
		626	@code{rfftw_destroy_plan(plan)}.
		627	@findex rfftw_destroy_plan
		628
		629	Given a plan, a real-to-complex or complex-to-real transform is computed
		630	by calling:
		631
		632	@example
		633	void rfftw_one(rfftw_plan plan, fftw_real in, fftw_real out);
		634	@end example
		635	@findex rfftw_one
		636
		637	(Note that @code{fftw_real} is an alias for the floating-point type for
		638	which FFTW was compiled.) Depending upon the direction of the plan,
		639	either the input or the output array is halfcomplex, and is stored in
		640	the following format:
		641	@cindex halfcomplex array
		642
		643	@iftex
		644	@tex
		645	$$
		646	r_0, r_1, r_2, \ldots, r_{n/2}, i_{(n+1)/2-1}, \ldots, i_2, i_1
		647	$$
		648	@end tex
		649	@end iftex
		650	@ifinfo
		651	r0, r1, r2, r(n/2), i((n+1)/2-1), ..., i2, i1
		652	@end ifinfo
		653	@ifhtml
		654	<p align=center>
		655	r<sub>0</sub>, r<sub>1</sub>, r<sub>2</sub>, ..., r<sub>n/2</sub>, i<sub>(n+1)/2-1</sub>, ..., i<sub>2</sub>, i<sub>1</sub>
		656	</p>
		657	@end ifhtml
		658
		659	Here,
		660	@ifinfo
		661	rk
		662	@end ifinfo
		663	@iftex
		664	@tex
		665	$r_k$
		666	@end tex
		667	@end iftex
		668	@ifhtml
		669	r<sub>k</sub>
		670	@end ifhtml
		671	is the real part of the @math{k}th output, and
		672	@ifinfo
		673	ik
		674	@end ifinfo
		675	@iftex
		676	@tex
		677	$i_k$
		678	@end tex
		679	@end iftex
		680	@ifhtml
		681	i<sub>k</sub>
		682	@end ifhtml
		683	is the imaginary part. (We follow here the C convention that integer
		684	division is rounded down, e.g. @math{7 / 2 = 3}.) For a halfcomplex
		685	array @code{hc[]}, the @math{k}th component has its real part in
		686	@code{hc[k]} and its imaginary part in @code{hc[n-k]}, with the
		687	exception of @code{k} @code{==} @code{0} or @code{n/2} (the latter only
		688	if n is even)---in these two cases, the imaginary part is zero due to
		689	symmetries of the real-complex transform, and is not stored. Thus, the
		690	transform of @code{n} real values is a halfcomplex array of length
		691	@code{n}, and vice versa. @footnote{The output for the
		692	multi-dimensional rfftw is a more-conventional array of
		693	@code{fftw_complex} values, but the format here permitted us greater
		694	efficiency in one dimension.} This is actually only half of the DFT
		695	spectrum of the data. Although the other half can be obtained by
		696	complex conjugation, it is not required by many applications such as
		697	convolution and filtering.
		698
		699	Like the complex transforms, the RFFTW transforms are unnormalized, so a
		700	forward followed by a backward transform (or vice-versa) yields the
		701	original data scaled by the length of the array, @code{n}.
		702	@cindex normalization
		703
		704	Let us reiterate here our warning that an @code{FFTW_COMPLEX_TO_REAL}
		705	transform has the side-effect of destroying its (halfcomplex) input.
		706	The @code{FFTW_REAL_TO_COMPLEX} transform, however, leaves its (real)
		707	input untouched, just as you would hope.
		708
		709	As an example, here is an outline of how you might use RFFTW to compute
		710	the power spectrum of a real array (i.e. the squares of the absolute
		711	values of the DFT amplitudes):
		712	@cindex power spectrum
		713
		714	@example
		715	#include <rfftw.h>
		716	...
		717	@{
		718	fftw_real in[N], out[N], power_spectrum[N/2+1];
		719	rfftw_plan p;
		720	int k;
		721	...
		722	p = rfftw_create_plan(N, FFTW_REAL_TO_COMPLEX, FFTW_ESTIMATE);
		723	...
		724	rfftw_one(p, in, out);
		725	power_spectrum[0] = out[0]out[0]; / DC component */
		726	for (k = 1; k < (N+1)/2; ++k) /* (k < N/2 rounded up) */
		727	power_spectrum[k] = out[k]out[k] + out[N-k]out[N-k];
		728	if (N % 2 == 0) /* N is even */
		729	power_spectrum[N/2] = out[N/2]out[N/2]; / Nyquist freq. */
		730	...
		731	rfftw_destroy_plan(p);
		732	@}
		733	@end example
		734
		735	Programs using RFFTW should link with @code{-lrfftw -lfftw -lm} on Unix,
		736	or with the FFTW, RFFTW, and math libraries in general.
		737	@cindex linking on Unix
		738
		739	@node Real Multi-dimensional Transforms Tutorial, Multi-dimensional Array Format, Real One-dimensional Transforms Tutorial, Tutorial
		740	@section Real Multi-dimensional Transforms Tutorial
		741	@cindex real multi-dimensional transform
		742
		743	FFTW includes multi-dimensional transforms for real data of any rank.
		744	As with the one-dimensional real transforms, they save roughly a factor
		745	of two in time and storage over complex transforms of the same size.
		746	Also as in one dimension, these gains come at the expense of some
		747	increase in complexity---the output format is different from the
		748	one-dimensional RFFTW (and is more similar to that of the complex FFTW)
		749	and the inverse (complex to real) transforms have the side-effect of
		750	overwriting their input data.
		751
		752	To use the real multi-dimensional transforms, you first @code{#include
		753	<rfftw.h>} and then create a plan for the size and direction of
		754	transform that you are interested in:
		755
		756	@example
		757	rfftwnd_plan rfftwnd_create_plan(int rank, const int *n,
		758	fftw_direction dir, int flags);
		759	@end example
		760	@tindex rfftwnd_plan
		761	@findex rfftwnd_create_plan
		762
		763	The first two parameters describe the size of the real data (not the
		764	halfcomplex data, which will have different dimensions). The last two
		765	parameters are the same as those for @code{rfftw_create_plan}. Just as
		766	for fftwnd, there are two alternate versions of this routine,
		767	@code{rfftw2d_create_plan} and @code{rfftw3d_create_plan}, that are
		768	sometimes more convenient for two- and three-dimensional transforms.
		769	@findex rfftw2d_create_plan
		770	@findex rfftw3d_create_plan
		771	Also as in fftwnd, rfftwnd supports true in-place transforms, specified
		772	by including @code{FFTW_IN_PLACE} in the flags.
		773
		774	Once created, a plan can be used for any number of transforms, and is
		775	deallocated by calling @code{rfftwnd_destroy_plan(plan)}.
		776
		777	Given a plan, the transform is computed by calling one of the following
		778	two routines:
		779
		780	@example
		781	void rfftwnd_one_real_to_complex(rfftwnd_plan plan,
		782	fftw_real in, fftw_complex out);
		783	void rfftwnd_one_complex_to_real(rfftwnd_plan plan,
		784	fftw_complex in, fftw_real out);
		785	@end example
		786	@findex rfftwnd_one_real_to_complex
		787	@findex rfftwnd_one_complex_to_real
		788
		789	As is clear from their names and parameter types, the former function is
		790	for @code{FFTW_REAL_TO_COMPLEX} transforms and the latter is for
		791	@code{FFTW_COMPLEX_TO_REAL} transforms. (We could have used only a
		792	single routine, since the direction of the transform is encoded in the
		793	plan, but we wanted to correctly express the datatypes of the
		794	parameters.) The latter routine, as we discuss elsewhere, has the
		795	side-effect of overwriting its input (except when the rank of the array
		796	is one). In both cases, the @code{out} parameter is ignored for
		797	in-place transforms.
		798
		799	The format of the complex arrays deserves careful attention.
		800	@cindex rfftwnd array format
		801	Suppose that the real data has dimensions
		802	@iftex
		803	@tex
		804	$n_1 \times n_2 \times \cdots \times n_d$
		805	@end tex
		806	@end iftex
		807	@ifinfo
		808	n1 x n2 x ... x nd
		809	@end ifinfo
		810	@ifhtml
		811	n<sub>1</sub> x n<sub>2</sub> x ... x n<sub>d</sub>
		812	@end ifhtml
		813	(in row-major order). Then, after a real-to-complex transform, the
		814	output is an
		815	@iftex
		816	@tex
		817	$n_1 \times n_2 \times \cdots \times (n_d/2+1)$
		818	@end tex
		819	@end iftex
		820	@ifinfo
		821	n1 x n2 x ... x (nd/2+1)
		822	@end ifinfo
		823	@ifhtml
		824	n<sub>1</sub> x n<sub>2</sub> x ... x (n<sub>d</sub>/2+1)
		825	@end ifhtml
		826	array of @code{fftw_complex} values in row-major order, corresponding to
		827	slightly over half of the output of the corresponding complex transform.
		828	(Note that the division is rounded down.) The ordering of the data is
		829	otherwise exactly the same as in the complex case. (In principle, the
		830	output could be exactly half the size of the complex transform output,
		831	but in more than one dimension this requires too complicated a format to
		832	be practical.) Note that, unlike the one-dimensional RFFTW, the real
		833	and imaginary parts of the DFT amplitudes are here stored together in
		834	the natural way.
		835
		836	Since the complex data is slightly larger than the real data, some
		837	complications arise for in-place transforms. In this case, the final
		838	dimension of the real data must be padded with extra values to
		839	accommodate the size of the complex data---two extra if the last
		840	dimension is even and one if it is odd.
		841	@cindex padding
		842	That is, the last dimension of the real data must physically contain
		843	@iftex
		844	@tex
		845	$2 (n_d/2+1)$
		846	@end tex
		847	@end iftex
		848	@ifinfo
		849	2 * (nd/2+1)
		850	@end ifinfo
		851	@ifhtml
		852	2 * (n<sub>d</sub>/2+1)
		853	@end ifhtml
		854	@code{fftw_real} values (exactly enough to hold the complex data).
		855	This physical array size does not, however, change the @emph{logical}
		856	array size---only
		857	@iftex
		858	@tex
		859	$n_d$
		860	@end tex
		861	@end iftex
		862	@ifinfo
		863	nd
		864	@end ifinfo
		865	@ifhtml
		866	n<sub>d</sub>
		867	@end ifhtml
		868	values are actually stored in the last dimension, and
		869	@iftex
		870	@tex
		871	$n_d$
		872	@end tex
		873	@end iftex
		874	@ifinfo
		875	nd
		876	@end ifinfo
		877	@ifhtml
		878	n<sub>d</sub>
		879	@end ifhtml
		880	is the last dimension passed to @code{rfftwnd_create_plan}.
		881
		882	For example, consider the transform of a two-dimensional real array of
		883	size @code{nx} by @code{ny}. The output of the @code{rfftwnd} transform
		884	is a two-dimensional real array of size @code{nx} by @code{ny/2+1},
		885	where the @code{y} dimension has been cut nearly in half because of
		886	redundancies in the output. Because @code{fftw_complex} is twice the
		887	size of @code{fftw_real}, the output array is slightly bigger than the
		888	input array. Thus, if we want to compute the transform in place, we
		889	must @emph{pad} the input array so that it is of size @code{nx} by
		890	@code{2*(ny/2+1)}. If @code{ny} is even, then there are two padding
		891	elements at the end of each row (which need not be initialized, as they
		892	are only used for output).
		893	@ifhtml
		894	The following illustration depicts the input and output arrays just
		895	described, for both the out-of-place and in-place transforms (with the
		896	arrows indicating consecutive memory locations):
		897
		898	<p align=center><img src="rfftwnd.gif" width=389 height=583>
		899	@end ifhtml
		900
		901	@iftex
		902	Figure 1 depicts the input and output arrays just
		903	described, for both the out-of-place and in-place transforms (with the
		904	arrows indicating consecutive memory locations).
		905
		906	@tex
		907	{
		908	\pageinsert
		909	\vfill
		910	\vskip405pt
		911	\hskip40pt
		912	\special{psfile="rfftwnd.eps"
		913	}
		914	\vskip 24pt
		915	Figure 1: Illustration of the data layout for real to complex
		916	transforms.
		917	\vfill
		918	\endinsert}
		919	@end tex
		920	@end iftex
		921
		922	The RFFTWND transforms are unnormalized, so a forward followed by a
		923	backward transform will result in the original data scaled by the number
		924	of real data elements---that is, the product of the (logical) dimensions
		925	of the real data.
		926	@cindex normalization
		927
		928	Below, we illustrate the use of RFFTWND by showing how you might use it
		929	to compute the (cyclic) convolution of two-dimensional real arrays
		930	@code{a} and @code{b} (using the identity that a convolution corresponds
		931	to a pointwise product of the Fourier transforms). For variety,
		932	in-place transforms are used for the forward FFTs and an out-of-place
		933	transform is used for the inverse transform.
		934	@cindex convolution
		935	@cindex cyclic convolution
		936
		937	@example
		938	#include <rfftw.h>
		939	...
		940	@{
		941	fftw_real a[M][2(N/2+1)], b[M][2(N/2+1)], c[M][N];
		942	fftw_complex A, B, C[M][N/2+1];
		943	rfftwnd_plan p, pinv;
		944	fftw_real scale = 1.0 / (M * N);
		945	int i, j;
		946	...
		947	p = rfftw2d_create_plan(M, N, FFTW_REAL_TO_COMPLEX,
		948	FFTW_ESTIMATE \| FFTW_IN_PLACE);
		949	pinv = rfftw2d_create_plan(M, N, FFTW_COMPLEX_TO_REAL,
		950	FFTW_ESTIMATE);
		951
		952	/* aliases for accessing complex transform outputs: */
		953	A = (fftw_complex*) &a[0][0];
		954	B = (fftw_complex*) &b[0][0];
		955	...
		956	for (i = 0; i < M; ++i)
		957	for (j = 0; j < N; ++j) @{
		958	a[i][j] = ... ;
		959	b[i][j] = ... ;
		960	@}
		961	...
		962	rfftwnd_one_real_to_complex(p, &a[0][0], NULL);
		963	rfftwnd_one_real_to_complex(p, &b[0][0], NULL);
		964
		965	for (i = 0; i < M; ++i)
		966	for (j = 0; j < N/2+1; ++j) @{
		967	int ij = i*(N/2+1) + j;
		968	C[i][j].re = (A[ij].re * B[ij].re
		969	- A[ij].im * B[ij].im) * scale;
		970	C[i][j].im = (A[ij].re * B[ij].im
		971	+ A[ij].im * B[ij].re) * scale;
		972	@}
		973
		974	/* inverse transform to get c, the convolution of a and b;
		975	this has the side effect of overwriting C */
		976	rfftwnd_one_complex_to_real(pinv, &C[0][0], &c[0][0]);
		977	...
		978	rfftwnd_destroy_plan(p);
		979	rfftwnd_destroy_plan(pinv);
		980	@}
		981	@end example
		982
		983	We access the complex outputs of the in-place transforms by casting
		984	each real array to a @code{fftw_complex} pointer. Because this is a
		985	``flat'' pointer, we have to compute the row-major index @code{ij}
		986	explicitly in the convolution product loop.
		987	@cindex row-major
		988	In order to normalize the convolution, we must multiply by a scale
		989	factor---we can do so either before or after the inverse transform, and
		990	choose the former because it obviates the necessity of an additional
		991	loop.
		992	@cindex normalization
		993	Notice the limits of the loops and the dimensions of the various arrays.
		994
		995	As with the one-dimensional RFFTW, an out-of-place
		996	@code{FFTW_COMPLEX_TO_REAL} transform has the side-effect of overwriting
		997	its input array. (The real-to-complex transform, on the other hand,
		998	leaves its input array untouched.) If you use RFFTWND for a rank-one
		999	transform, however, this side-effect does not occur. Because of this
		1000	fact (and the simpler output format), users may find the RFFTWND
		1001	interface more convenient than RFFTW for one-dimensional transforms.
		1002	However, RFFTWND in one dimension is slightly slower than RFFTW because
		1003	RFFTWND uses an extra buffer array internally.
		1004
		1005	@c ------------------------------------------------------------
		1006	@node Multi-dimensional Array Format, Words of Wisdom, Real Multi-dimensional Transforms Tutorial, Tutorial
		1007	@section Multi-dimensional Array Format
		1008
		1009	This section describes the format in which multi-dimensional arrays are
		1010	stored. We felt that a detailed discussion of this topic was necessary,
		1011	since it is often a source of confusion among users and several
		1012	different formats are common. Although the comments below refer to
		1013	@code{fftwnd}, they are also applicable to the @code{rfftwnd} routines.
		1014
		1015	@menu
		1016	* Row-major Format::
		1017	* Column-major Format::
		1018	* Static Arrays in C::
		1019	* Dynamic Arrays in C::
		1020	* Dynamic Arrays in C-The Wrong Way::
		1021	@end menu
		1022
		1023	@node Row-major Format, Column-major Format, Multi-dimensional Array Format, Multi-dimensional Array Format
		1024	@subsection Row-major Format
		1025	@cindex row-major
		1026
		1027	The multi-dimensional arrays passed to @code{fftwnd} are expected to be
		1028	stored as a single contiguous block in @dfn{row-major} order (sometimes
		1029	called ``C order''). Basically, this means that as you step through
		1030	adjacent memory locations, the first dimension's index varies most
		1031	slowly and the last dimension's index varies most quickly.
		1032
		1033	To be more explicit, let us consider an array of rank @math{d} whose
		1034	dimensions are
		1035	@iftex
		1036	@tex
		1037	$n_1 \times n_2 \times n_3 \times \cdots \times n_d$.
		1038	@end tex
		1039	@end iftex
		1040	@ifinfo
		1041	n1 x n2 x n3 x ... x nd.
		1042	@end ifinfo
		1043	@ifhtml
		1044	n<sub>1</sub> x n<sub>2</sub> x n<sub>3</sub> x ... x n<sub>d</sub>.
		1045	@end ifhtml
		1046	Now, we specify a location in the array by a sequence of (zero-based) indices,
		1047	one for each dimension:
		1048	@iftex
		1049	@tex
		1050	$(i_1, i_2, i_3, \ldots, i_d)$.
		1051	@end tex
		1052	@end iftex
		1053	@ifinfo
		1054	(i1, i2, ..., id).
		1055	@end ifinfo
		1056	@ifhtml
		1057	(i<sub>1</sub>, i<sub>2</sub>, i<sub>3</sub>,..., i<sub>d</sub>).
		1058	@end ifhtml
		1059	If the array is stored in row-major
		1060	order, then this element is located at the position
		1061	@iftex
		1062	@tex
		1063	$i_d + n_d (i_{d-1} + n_{d-1} (\ldots + n_2 i_1))$.
		1064	@end tex
		1065	@end iftex
		1066	@ifinfo
		1067	id + nd * (id-1 + nd-1 * (... + n2 * i1)).
		1068	@end ifinfo
		1069	@ifhtml
		1070	i<sub>d</sub> + n<sub>d</sub> * (i<sub>d-1</sub> + n<sub>d-1</sub> * (... + n<sub>2</sub> * i<sub>1</sub>)).
		1071	@end ifhtml
		1072
		1073	Note that each element of the array must be of type @code{fftw_complex};
		1074	i.e. a (real, imaginary) pair of (double-precision) numbers. Note also
		1075	that, in @code{fftwnd}, the expression above is multiplied by the stride
		1076	to get the actual array index---this is useful in situations where each
		1077	element of the multi-dimensional array is actually a data structure or
		1078	another array, and you just want to transform a single field. In most
		1079	cases, however, you use a stride of 1.
		1080	@cindex stride
		1081
		1082	@node Column-major Format, Static Arrays in C, Row-major Format, Multi-dimensional Array Format
		1083	@subsection Column-major Format
		1084	@cindex column-major
		1085
		1086	Readers from the Fortran world are used to arrays stored in
		1087	@dfn{column-major} order (sometimes called ``Fortran order''). This is
		1088	essentially the exact opposite of row-major order in that, here, the
		1089	@emph{first} dimension's index varies most quickly.
		1090
		1091	If you have an array stored in column-major order and wish to transform
		1092	it using @code{fftwnd}, it is quite easy to do. When creating the plan,
		1093	simply pass the dimensions of the array to @code{fftwnd_create_plan} in
		1094	@emph{reverse order}. For example, if your array is a rank three
		1095	@code{N x M x L} matrix in column-major order, you should pass the
		1096	dimensions of the array as if it were an @code{L x M x N} matrix (which
		1097	it is, from perspective of @code{fftwnd}). This is done for you
		1098	automatically by the FFTW Fortran wrapper routines (@pxref{Calling FFTW
		1099	from Fortran}).
		1100	@cindex Fortran-callable wrappers
		1101
		1102	@node Static Arrays in C, Dynamic Arrays in C, Column-major Format, Multi-dimensional Array Format
		1103	@subsection Static Arrays in C
		1104	@cindex C multi-dimensional arrays
		1105
		1106	Multi-dimensional arrays declared statically (that is, at compile time,
		1107	not necessarily with the @code{static} keyword) in C are @emph{already}
		1108	in row-major order. You don't have to do anything special to transform
		1109	them. (@xref{Complex Multi-dimensional Transforms Tutorial}, for an
		1110	example of this sort of code.)
		1111
		1112	@node Dynamic Arrays in C, Dynamic Arrays in C-The Wrong Way, Static Arrays in C, Multi-dimensional Array Format
		1113	@subsection Dynamic Arrays in C
		1114
		1115	Often, especially for large arrays, it is desirable to allocate the
		1116	arrays dynamically, at runtime. This isn't too hard to do, although it
		1117	is not as straightforward for multi-dimensional arrays as it is for
		1118	one-dimensional arrays.
		1119
		1120	Creating the array is simple: using a dynamic-allocation routine like
		1121	@code{malloc}, allocate an array big enough to store N @code{fftw_complex}
		1122	values, where N is the product of the sizes of the array dimensions
		1123	(i.e. the total number of complex values in the array). For example,
		1124	here is code to allocate a 5x12x27 rank 3 array:
		1125	@ffindex malloc
		1126
		1127	@example
		1128	fftw_complex *an_array;
		1129
		1130	an_array = (fftw_complex ) malloc(5 12 * 27 * sizeof(fftw_complex));
		1131	@end example
		1132
		1133	Accessing the array elements, however, is more tricky---you can't simply
		1134	use multiple applications of the @samp{[]} operator like you could for
		1135	static arrays. Instead, you have to explicitly compute the offset into
		1136	the array using the formula given earlier for row-major arrays. For
		1137	example, to reference the @math{(i,j,k)}-th element of the array
		1138	allocated above, you would use the expression @code{an_array[k + 27 * (j
		1139	+ 12 * i)]}.
		1140
		1141	This pain can be alleviated somewhat by defining appropriate macros, or,
		1142	in C++, creating a class and overloading the @samp{()} operator.
		1143
		1144	@node Dynamic Arrays in C-The Wrong Way, , Dynamic Arrays in C, Multi-dimensional Array Format
		1145	@subsection Dynamic Arrays in C---The Wrong Way
		1146
		1147	A different method for allocating multi-dimensional arrays in C is often
		1148	suggested that is incompatible with @code{fftwnd}: @emph{using it will
		1149	cause FFTW to die a painful death}. We discuss the technique here,
		1150	however, because it is so commonly known and used. This method is to
		1151	create arrays of pointers of arrays of pointers of @dots{}etcetera. For
		1152	example, the analogue in this method to the example above is:
		1153
		1154	@example
		1155	int i,j;
		1156	fftw_complex **a_bad_array; / another way to make a 5x12x27 array */
		1157
		1158	a_bad_array = (fftw_complex **) malloc(5 sizeof(fftw_complex **));
		1159	for (i = 0; i < 5; ++i) @{
		1160	a_bad_array[i] =
		1161	(fftw_complex *) malloc(12 sizeof(fftw_complex *));
		1162	for (j = 0; j < 12; ++j)
		1163	a_bad_array[i][j] =
		1164	(fftw_complex ) malloc(27 sizeof(fftw_complex));
		1165	@}
		1166	@end example
		1167
		1168	As you can see, this sort of array is inconvenient to allocate (and
		1169	deallocate). On the other hand, it has the advantage that the
		1170	@math{(i,j,k)}-th element can be referenced simply by
		1171	@code{a_bad_array[i][j][k]}.
		1172
		1173	If you like this technique and want to maximize convenience in accessing
		1174	the array, but still want to pass the array to FFTW, you can use a
		1175	hybrid method. Allocate the array as one contiguous block, but also
		1176	declare an array of arrays of pointers that point to appropriate places
		1177	in the block. That sort of trick is beyond the scope of this
		1178	documentation; for more information on multi-dimensional arrays in C,
		1179	see the @code{comp.lang.c}
		1180	@uref{http://www.eskimo.com/~scs/C-faq/s6.html, FAQ}.
		1181
		1182	@c ------------------------------------------------------------
		1183	@node Words of Wisdom, , Multi-dimensional Array Format, Tutorial
		1184	@section Words of Wisdom
		1185	@cindex wisdom
		1186	@cindex saving plans to disk
		1187
		1188	FFTW implements a method for saving plans to disk and restoring them.
		1189	In fact, what FFTW does is more general than just saving and loading
		1190	plans. The mechanism is called @dfn{@code{wisdom}}. Here, we describe
		1191	this feature at a high level. @xref{FFTW Reference}, for a less casual
		1192	(but more complete) discussion of how to use @code{wisdom} in FFTW.
		1193
		1194	Plans created with the @code{FFTW_MEASURE} option produce near-optimal
		1195	FFT performance, but it can take a long time to compute a plan because
		1196	FFTW must actually measure the runtime of many possible plans and select
		1197	the best one. This is designed for the situations where so many
		1198	transforms of the same size must be computed that the start-up time is
		1199	irrelevant. For short initialization times but slightly slower
		1200	transforms, we have provided @code{FFTW_ESTIMATE}. The @code{wisdom}
		1201	mechanism is a way to get the best of both worlds. There are, however,
		1202	certain caveats that the user must be aware of in using @code{wisdom}.
		1203	For this reason, @code{wisdom} is an optional feature which is not
		1204	enabled by default.
		1205
		1206	At its simplest, @code{wisdom} provides a way of saving plans to disk so
		1207	that they can be reused in other program runs. You create a plan with
		1208	the flags @code{FFTW_MEASURE} and @code{FFTW_USE_WISDOM}, and then save
		1209	the @code{wisdom} using @code{fftw_export_wisdom}:
		1210	@ctindex FFTW_USE_WISDOM
		1211
		1212	@example
		1213	plan = fftw_create_plan(..., ... \| FFTW_MEASURE \| FFTW_USE_WISDOM);
		1214	fftw_export_wisdom(...);
		1215	@end example
		1216	@findex fftw_export_wisdom
		1217
		1218	The next time you run the program, you can restore the @code{wisdom}
		1219	with @code{fftw_import_wisdom}, and then recreate the plan using the
		1220	same flags as before. This time, however, the same optimal plan will be
		1221	created very quickly without measurements. (FFTW still needs some time
		1222	to compute trigonometric tables, however.) The basic outline is:
		1223
		1224	@example
		1225	fftw_import_wisdom(...);
		1226	plan = fftw_create_plan(..., ... \| FFTW_USE_WISDOM);
		1227	@end example
		1228	@findex fftw_import_wisdom
		1229
		1230	Wisdom is more than mere rote memorization, however. FFTW's
		1231	@code{wisdom} encompasses all of the knowledge and measurements that
		1232	were used to create the plan for a given size. Therefore, existing
		1233	@code{wisdom} is also applied to the creation of other plans of
		1234	different sizes.
		1235
		1236	Whenever a plan is created with the @code{FFTW_MEASURE} and
		1237	@code{FFTW_USE_WISDOM} flags, @code{wisdom} is generated. Thereafter,
		1238	plans for any transform with a similar factorization will be computed
		1239	more quickly, so long as they use the @code{FFTW_USE_WISDOM} flag. In
		1240	fact, for transforms with the same factors and of equal or lesser size,
		1241	no measurements at all need to be made and an optimal plan can be
		1242	created with negligible delay!
		1243
		1244	For example, suppose that you create a plan for
		1245	@iftex
		1246	@tex
		1247	$N = 2^{16}$.
		1248	@end tex
		1249	@end iftex
		1250	@ifinfo
		1251	N = 2^16.
		1252	@end ifinfo
		1253	@ifhtml
		1254	N = 2<sup>16</sup>.
		1255	@end ifhtml
		1256	Then, for any equal or smaller power of two, FFTW can create a
		1257	plan (with the same direction and flags) quickly, using the
		1258	precomputed @code{wisdom}. Even for larger powers of two, or sizes that
		1259	are a power of two times some other prime factors, plans will be
		1260	computed more quickly than they would otherwise (although some
		1261	measurements still have to be made).
		1262
		1263	The @code{wisdom} is cumulative, and is stored in a global, private data
		1264	structure managed internally by FFTW. The storage space required is
		1265	minimal, proportional to the logarithm of the sizes the @code{wisdom} was
		1266	generated from. The @code{wisdom} can be forgotten (and its associated
		1267	memory freed) by a call to @code{fftw_forget_wisdom()}; otherwise, it is
		1268	remembered until the program terminates. It can also be exported to a
		1269	file, a string, or any other medium using @code{fftw_export_wisdom} and
		1270	restored during a subsequent execution of the program (or a different
		1271	program) using @code{fftw_import_wisdom} (these functions are described
		1272	below).
		1273
		1274	Because @code{wisdom} is incorporated into FFTW at a very low level, the
		1275	same @code{wisdom} can be used for one-dimensional transforms,
		1276	multi-dimensional transforms, and even the parallel extensions to FFTW.
		1277	Just include @code{FFTW_USE_WISDOM} in the flags for whatever plans you
		1278	create (i.e., always plan wisely).
		1279
		1280	Plans created with the @code{FFTW_ESTIMATE} plan can use @code{wisdom},
		1281	but cannot generate it; only @code{FFTW_MEASURE} plans actually produce
		1282	@code{wisdom}. Also, plans can only use @code{wisdom} generated from
		1283	plans created with the same direction and flags. For example, a size
		1284	@code{42} @code{FFTW_BACKWARD} transform will not use @code{wisdom}
		1285	produced by a size @code{42} @code{FFTW_FORWARD} transform. The only
		1286	exception to this rule is that @code{FFTW_ESTIMATE} plans can use
		1287	@code{wisdom} from @code{FFTW_MEASURE} plans.
		1288
		1289	@menu
		1290	* Caveats in Using Wisdom:: What you should worry about in using wisdom
		1291	* Importing and Exporting Wisdom:: I/O of wisdom to disk and other media
		1292	@end menu
		1293
		1294	@node Caveats in Using Wisdom, Importing and Exporting Wisdom, Words of Wisdom, Words of Wisdom
		1295	@subsection Caveats in Using Wisdom
		1296	@cindex wisdom, problems with
		1297
		1298	@quotation
		1299	@ifhtml
		1300	<i>
		1301	@end ifhtml
		1302	For in much wisdom is much grief, and he that increaseth knowledge
		1303	increaseth sorrow.
		1304	@ifhtml
		1305	</i>
		1306	@end ifhtml
		1307	[Ecclesiastes 1:18]
		1308	@cindex Ecclesiastes
		1309	@end quotation
		1310
		1311	There are pitfalls to using @code{wisdom}, in that it can negate FFTW's
		1312	ability to adapt to changing hardware and other conditions. For example,
		1313	it would be perfectly possible to export @code{wisdom} from a program
		1314	running on one processor and import it into a program running on another
		1315	processor. Doing so, however, would mean that the second program would
		1316	use plans optimized for the first processor, instead of the one it is
		1317	running on.
		1318
		1319	It should be safe to reuse @code{wisdom} as long as the hardware and
		1320	program binaries remain unchanged. (Actually, the optimal plan may
		1321	change even between runs of the same binary on identical hardware, due
		1322	to differences in the virtual memory environment, etcetera. Users
		1323	seriously interested in performance should worry about this problem,
		1324	too.) It is likely that, if the same @code{wisdom} is used for two
		1325	different program binaries, even running on the same machine, the plans
		1326	may be sub-optimal because of differing code alignments. It is
		1327	therefore wise to recreate @code{wisdom} every time an application is
		1328	recompiled. The more the underlying hardware and software changes
		1329	between the creation of @code{wisdom} and its use, the greater grows the
		1330	risk of sub-optimal plans.
		1331
		1332	@node Importing and Exporting Wisdom, , Caveats in Using Wisdom, Words of Wisdom
		1333	@subsection Importing and Exporting Wisdom
		1334	@cindex wisdom, import and export
		1335
		1336	@example
		1337	void fftw_export_wisdom_to_file(FILE *output_file);
		1338	fftw_status fftw_import_wisdom_from_file(FILE *input_file);
		1339	@end example
		1340	@findex fftw_export_wisdom_to_file
		1341	@findex fftw_import_wisdom_from_file
		1342
		1343	@code{fftw_export_wisdom_to_file} writes the @code{wisdom} to
		1344	@code{output_file}, which must be a file open for
		1345	writing. @code{fftw_import_wisdom_from_file} reads the @code{wisdom}
		1346	from @code{input_file}, which must be a file open for reading, and
		1347	returns @code{FFTW_SUCCESS} if successful and @code{FFTW_FAILURE}
		1348	otherwise. In both cases, the file is left open and must be closed by
		1349	the caller. It is perfectly fine if other data lie before or after the
		1350	@code{wisdom} in the file, as long as the file is positioned at the
		1351	beginning of the @code{wisdom} data before import.
		1352
		1353	@example
		1354	char *fftw_export_wisdom_to_string(void);
		1355	fftw_status fftw_import_wisdom_from_string(const char *input_string)
		1356	@end example
		1357	@findex fftw_export_wisdom_to_string
		1358	@findex fftw_import_wisdom_from_string
		1359
		1360	@code{fftw_export_wisdom_to_string} allocates a string, exports the
		1361	@code{wisdom} to it in @code{NULL}-terminated format, and returns a
		1362	pointer to the string. If there is an error in allocating or writing
		1363	the data, it returns @code{NULL}. The caller is responsible for
		1364	deallocating the string (with @code{fftw_free}) when she is done with
		1365	it. @code{fftw_import_wisdom_from_string} imports the @code{wisdom} from
		1366	@code{input_string}, returning @code{FFTW_SUCCESS} if successful and
		1367	@code{FFTW_FAILURE} otherwise.
		1368
		1369	Exporting @code{wisdom} does not affect the store of @code{wisdom}. Imported
		1370	@code{wisdom} supplements the current store rather than replacing it
		1371	(except when there is conflicting @code{wisdom}, in which case the older
		1372	@code{wisdom} is discarded). The format of the exported @code{wisdom} is
		1373	``nerd-readable'' LISP-like ASCII text; we will not document it here
		1374	except to note that it is insensitive to white space (interested users
		1375	can contact us for more details).
		1376	@cindex LISP
		1377	@cindex nerd-readable text
		1378
		1379	@xref{FFTW Reference}, for more information, and for a description of
		1380	how you can implement @code{wisdom} import/export for other media
		1381	besides files and strings.
		1382
		1383	The following is a brief example in which the @code{wisdom} is read from
		1384	a file, a plan is created (possibly generating more @code{wisdom}), and
		1385	then the @code{wisdom} is exported to a string and printed to
		1386	@code{stdout}.
		1387
		1388	@example
		1389	@{
		1390	fftw_plan plan;
		1391	char *wisdom_string;
		1392	FILE *input_file;
		1393
		1394	/* open file to read wisdom from */
		1395	input_file = fopen("sample.wisdom", "r");
		1396	if (FFTW_FAILURE == fftw_import_wisdom_from_file(input_file))
		1397	printf("Error reading wisdom!\n");
		1398	fclose(input_file); /* be sure to close the file! */
		1399
		1400	/* create a plan for N=64, possibly creating and/or using wisdom */
		1401	plan = fftw_create_plan(64,FFTW_FORWARD,
		1402	FFTW_MEASURE \| FFTW_USE_WISDOM);
		1403
		1404	/* ... do some computations with the plan ... */
		1405
		1406	/* always destroy plans when you are done */
		1407	fftw_destroy_plan(plan);
		1408
		1409	/* write the wisdom to a string */
		1410	wisdom_string = fftw_export_wisdom_to_string();
		1411	if (wisdom_string != NULL) @{
		1412	printf("Accumulated wisdom: %s\n",wisdom_string);
		1413
		1414	/* Just for fun, destroy and restore the wisdom */
		1415	fftw_forget_wisdom(); /* all gone! */
		1416	fftw_import_wisdom_from_string(wisdom_string);
		1417	/* wisdom is back! */
		1418
		1419	fftw_free(wisdom_string); /* deallocate it since we're done */
		1420	@}
		1421	@}
		1422	@end example
		1423
		1424	@c ************************************************************
		1425	@node FFTW Reference, Parallel FFTW, Tutorial, Top
		1426	@chapter FFTW Reference
		1427
		1428	This chapter provides a complete reference for all sequential (i.e.,
		1429	one-processor) FFTW functions. We first define the data types upon
		1430	which FFTW operates, that is, real, complex, and ``halfcomplex'' numbers
		1431	(@pxref{Data Types}). Then, in four sections, we explain the FFTW
		1432	program interface for complex one-dimensional transforms
		1433	(@pxref{One-dimensional Transforms Reference}), complex
		1434	multi-dimensional transforms (@pxref{Multi-dimensional Transforms
		1435	Reference}), and real one-dimensional transforms (@pxref{Real
		1436	One-dimensional Transforms Reference}), real multi-dimensional
		1437	transforms (@pxref{Real Multi-dimensional Transforms Reference}).
		1438	@ref{Wisdom Reference} describes the @code{wisdom} mechanism for
		1439	exporting and importing plans. Finally, @ref{Memory Allocator
		1440	Reference} describes how to change FFTW's default memory allocator.
		1441	For parallel transforms, @xref{Parallel FFTW}.
		1442
		1443	@menu
		1444	* Data Types:: real, complex, and halfcomplex numbers
		1445	* One-dimensional Transforms Reference::
		1446	* Multi-dimensional Transforms Reference::
		1447	* Real One-dimensional Transforms Reference::
		1448	* Real Multi-dimensional Transforms Reference::
		1449	* Wisdom Reference::
		1450	* Memory Allocator Reference::
		1451	* Thread safety::
		1452	@end menu
		1453
		1454	@c -------------------------------------------------------
		1455	@node Data Types, One-dimensional Transforms Reference, FFTW Reference, FFTW Reference
		1456	@section Data Types
		1457	@cindex real number
		1458	@cindex complex number
		1459	@cindex halfcomplex array
		1460
		1461	The routines in the FFTW package use three main kinds of data types.
		1462	@dfn{Real} and @dfn{complex} numbers should be already known to the
		1463	reader. We also use the term @dfn{halfcomplex} to describe complex
		1464	arrays in a special packed format used by the one-dimensional real
		1465	transforms (taking advantage of the @dfn{hermitian} symmetry that arises
		1466	in those cases).
		1467
		1468	By including @code{<fftw.h>} or @code{<rfftw.h>}, you will have access
		1469	to the following definitions:
		1470
		1471	@example
		1472	typedef double fftw_real;
		1473
		1474	typedef struct @{
		1475	fftw_real re, im;
		1476	@} fftw_complex;
		1477
		1478	#define c_re(c) ((c).re)
		1479	#define c_im(c) ((c).im)
		1480	@end example
		1481	@tindex fftw_real
		1482	@tindex fftw_complex
		1483
		1484	All FFTW operations are performed on the @code{fftw_real} and
		1485	@code{fftw_complex} data types. For @code{fftw_complex} numbers, the
		1486	two macros @code{c_re} and @code{c_im} retrieve, respectively, the real
		1487	and imaginary parts of the number.
		1488
		1489	A @dfn{real array} is an array of real numbers. A @dfn{complex array}
		1490	is an array of complex numbers. A one-dimensional array @math{X} of
		1491	@math{n} complex numbers is @dfn{hermitian} if the following property
		1492	holds:
		1493	@iftex
		1494	@tex
		1495	for all $0 \leq i < n$, we have $X_i = X^{*}_{n-i}$, where
		1496	$x^*$ denotes the complex conjugate of $x$.
		1497	@end tex
		1498	@end iftex
		1499	@ifinfo
		1500	for all @math{0 <= i < n}, we have @math{X[i] = conj(X[n-i])}.
		1501	@end ifinfo
		1502	@ifhtml
		1503	for all 0 <= i < n, we have X<sub>i</sub> = conj(X<sub>n-i</sub>)}.
		1504	@end ifhtml
		1505	Hermitian arrays are relevant to FFTW because the Fourier transform of a
		1506	real array is hermitian.
		1507
		1508	Because of its symmetry, a hermitian array can be stored in half the
		1509	space of a complex array of the same size. FFTW's one-dimensional real
		1510	transforms store hermitian arrays as @dfn{halfcomplex} arrays. A
		1511	halfcomplex array of size @math{n} is
		1512	@cindex hermitian array
		1513	a one-dimensional array of @math{n} @code{fftw_real} numbers. A
		1514	hermitian array @math{X} in stored into a halfcomplex array @math{Y} as
		1515	follows.
		1516	@iftex
		1517	@tex
		1518	For all integers $i$ such that $0 \leq i \leq n / 2$, we have $Y_i :=
		1519	\hbox{Re}(X_i)$. For all integers $i$ such that $0 < i < n / 2$,
		1520	we have $Y_{n - i} := \hbox{Im}(X_i)$.
		1521	@end tex
		1522	@end iftex
		1523	@ifinfo
		1524	For all integers @math{i} such that @math{0 <= i <= n / 2}, we have
		1525	@math{Y[i] = Re(X[i])}. For all integers @math{i} such that @math{0 <
		1526	i < n / 2}, we have @math{Y[n-i] = Im(X[i])}.
		1527	@end ifinfo
		1528	@ifhtml
		1529	For all integers i such that 0 <= i <= n / 2, we have
		1530	Y<sub>i</sub> = Re(X<sub>i</sub>). For all integers i such that 0
		1531	< i < n / 2, we have Y<sub>n-i</sub> = Im(X<sub>i</sub>).
		1532	@end ifhtml
		1533
		1534	We now illustrate halfcomplex storage for @math{n = 4} and @math{n = 5},
		1535	since the scheme depends on the parity of @math{n}. Let @math{n = 4}.
		1536	In this case, we have
		1537	@iftex
		1538	@tex
		1539	$Y_0 := \hbox{Re}(X_0)$, $Y_1 := \hbox{Re}(X_1)$,
		1540	$Y_2 := \hbox{Re}(X_2)$, and $Y_3 := \hbox{Im}(X_1)$.
		1541	@end tex
		1542	@end iftex
		1543	@ifinfo
		1544	@math{Y[0] = Re(X[0])}, @math{Y[1] = Re(X[1])},
		1545	@math{Y[2] = Re(X[2])}, and @math{Y[3] = Im(X[1])}.
		1546	@end ifinfo
		1547	@ifhtml
		1548	Y<sub>0</sub> = Re(X<sub>0</sub>), Y<sub>1</sub> = Re(X<sub>1</sub>),
		1549	Y<sub>2</sub> = Re(X<sub>2</sub>), and Y<sub>3</sub> = Im(X<sub>1</sub>).
		1550	@end ifhtml
		1551	Let now @math{n = 5}. In this case, we have
		1552	@iftex
		1553	@tex
		1554	$Y_0 := \hbox{Re}(X_0)$, $Y_1 := \hbox{Re}(X_1)$,
		1555	$Y_2 := \hbox{Re}(X_2)$, $Y_3 := \hbox{Im}(X_2)$, and
		1556	$Y_4 := \hbox{Im}(X_1)$.
		1557	@end tex
		1558	@end iftex
		1559	@ifinfo
		1560	@math{Y[0] = Re(X[0])}, @math{Y[1] = Re(X[1])},
		1561	@math{Y[2] = Re(X[2])}, @math{Y[3] = Im(X[2])}, and
		1562	@math{Y[4] = Im(X[1])}.
		1563	@end ifinfo
		1564	@ifhtml
		1565	Y<sub>0</sub> = Re(X<sub>0</sub>), Y<sub>1</sub> = Re(X<sub>1</sub>),
		1566	Y<sub>2</sub> = Re(X<sub>2</sub>), Y<sub>3</sub> = Im(X<sub>2</sub>),
		1567	and Y<sub>4</sub> = Im(X<sub>1</sub>).
		1568	@end ifhtml
		1569
		1570	@cindex floating-point precision
		1571	By default, the type @code{fftw_real} equals the C type @code{double}.
		1572	To work in single precision rather than double precision, @code{#define}
		1573	the symbol @code{FFTW_ENABLE_FLOAT} in @code{fftw.h} and then recompile
		1574	the library. On Unix systems, you can instead use @code{configure
		1575	--enable-float} at installation time (@pxref{Installation and
		1576	Customization}).
		1577	@fpindex configure
		1578	@ctindex FFTW_ENABLE_FLOAT
		1579
		1580	In version 1 of FFTW, the data types were called @code{FFTW_REAL} and
		1581	@code{FFTW_COMPLEX}. We changed the capitalization for consistency with
		1582	the rest of FFTW's conventions. The old names are still supported, but
		1583	their use is deprecated.
		1584	@tindex FFTW_REAL
		1585	@tindex FFTW_COMPLEX
		1586
		1587	@c -------------------------------------------------------
		1588	@node One-dimensional Transforms Reference, Multi-dimensional Transforms Reference, Data Types, FFTW Reference
		1589	@section One-dimensional Transforms Reference
		1590
		1591	The one-dimensional complex routines are generally prefixed with
		1592	@code{fftw_}. Programs using FFTW should be linked with @code{-lfftw
		1593	-lm} on Unix systems, or with the FFTW and standard math libraries in
		1594	general.
		1595
		1596	@menu
		1597	* fftw_create_plan:: Plan Creation
		1598	* Discussion on Specific Plans::
		1599	* fftw:: Plan Execution
		1600	* fftw_destroy_plan:: Plan Destruction
		1601	* What FFTW Really Computes:: Definition of the DFT.
		1602	@end menu
		1603
		1604	@node fftw_create_plan, Discussion on Specific Plans, One-dimensional Transforms Reference, One-dimensional Transforms Reference
		1605	@subsection Plan Creation for One-dimensional Transforms
		1606
		1607	@example
		1608	#include <fftw.h>
		1609
		1610	fftw_plan fftw_create_plan(int n, fftw_direction dir,
		1611	int flags);
		1612
		1613	fftw_plan fftw_create_plan_specific(int n, fftw_direction dir,
		1614	int flags,
		1615	fftw_complex *in, int istride,
		1616	fftw_complex *out, int ostride);
		1617	@end example
		1618	@tindex fftw_plan
		1619	@tindex fftw_direction
		1620	@findex fftw_create_plan
		1621	@findex fftw_create_plan_specific
		1622
		1623	The function @code{fftw_create_plan} creates a plan, which is
		1624	a data structure containing all the information that @code{fftw}
		1625	needs in order to compute the 1D Fourier transform. You can
		1626	create as many plans as you need, but only one plan for a given
		1627	array size is required (a plan can be reused many times).
		1628
		1629	@code{fftw_create_plan} returns a valid plan, or @code{NULL}
		1630	if, for some reason, the plan can't be created. In the
		1631	default installation, this cannot happen, but it is possible
		1632	to configure FFTW in such a way that some input sizes are
		1633	forbidden, and FFTW cannot create a plan.
		1634
		1635	The @code{fftw_create_plan_specific} variant takes as additional
		1636	arguments specific input/output arrays and their strides. For the last
		1637	four arguments, you should pass the arrays and strides that you will
		1638	eventually be passing to @code{fftw}. The resulting plans will be
		1639	optimized for those arrays and strides, although they may be used on
		1640	other arrays as well. Note: the contents of the in and out arrays are
		1641	@emph{destroyed} by the specific planner (the initial contents are
		1642	ignored, so the arrays need not have been initialized).
		1643
		1644	@subsubheading Arguments
		1645	@itemize @bullet
		1646	@item
		1647	@code{n} is the size of the transform. It can be
		1648	any positive integer.
		1649
		1650	@itemize @minus
		1651	@item
		1652	FFTW is best at handling sizes of the form
		1653	@ifinfo
		1654	@math{2^a 3^b 5^c 7^d 11^e 13^f},
		1655	@end ifinfo
		1656	@iftex
		1657	@tex
		1658	$2^a 3^b 5^c 7^d 11^e 13^f$,
		1659	@end tex
		1660	@end iftex
		1661	@ifhtml
		1662	2<SUP>a</SUP> 3<SUP>b</SUP> 5<SUP>c</SUP> 7<SUP>d</SUP>
		1663	11<SUP>e</SUP> 13<SUP>f</SUP>,
		1664	@end ifhtml
		1665	where @math{e+f} is either @math{0} or
		1666	@math{1}, and the other exponents are arbitrary. Other sizes are
		1667	computed by means of a slow, general-purpose routine (which nevertheless
		1668	retains
		1669	@iftex
		1670	@tex
		1671	$O(n \log n)$
		1672	@end tex
		1673	@end iftex
		1674	@ifinfo
		1675	O(n lg n)
		1676	@end ifinfo
		1677	@ifhtml
		1678	O(n lg n)
		1679	@end ifhtml
		1680	performance, even for prime sizes). (It is
		1681	possible to customize FFTW for different array sizes.
		1682	@xref{Installation and Customization} for more information.) Transforms
		1683	whose sizes are powers of @math{2} are especially fast.
		1684	@end itemize
		1685
		1686	@item
		1687	@code{dir} is the sign of the exponent in the formula that
		1688	defines the Fourier transform. It can be @math{-1} or @math{+1}.
		1689	The aliases @code{FFTW_FORWARD} and @code{FFTW_BACKWARD}
		1690	are provided, where @code{FFTW_FORWARD} stands for @math{-1}.
		1691
		1692	@item
		1693	@cindex flags
		1694	@code{flags} is a boolean OR (@samp{\|}) of zero or more of the following:
		1695	@itemize @minus
		1696	@item
		1697	@code{FFTW_MEASURE}: this flag tells FFTW to find the optimal plan by
		1698	actually @emph{computing} several FFTs and measuring their
		1699	execution time. Depending on the installation, this can take some
		1700	time. @footnote{The basic problem is the resolution of the clock:
		1701	FFTW needs to run for a certain time for the clock to be reliable.}
		1702
		1703	@item
		1704	@code{FFTW_ESTIMATE}: do not run any FFT and provide a ``reasonable''
		1705	plan (for a RISC processor with many registers). If neither
		1706	@code{FFTW_ESTIMATE} nor @code{FFTW_MEASURE} is provided, the default is
		1707	@code{FFTW_ESTIMATE}.
		1708
		1709	@item
		1710	@code{FFTW_OUT_OF_PLACE}: produce a plan assuming that the input and
		1711	output arrays will be distinct (this is the default).
		1712	@ctindex FFTW_OUT_OF_PLACE
		1713
		1714	@item
		1715	@cindex in-place transform
		1716	@code{FFTW_IN_PLACE}: produce a plan assuming that you want the output
		1717	in the input array. The algorithm used is not necessarily in place:
		1718	FFTW is able to compute true in-place transforms only for small values
		1719	of @code{n}. If FFTW is not able to compute the transform in-place, it
		1720	will allocate a temporary array (unless you provide one yourself),
		1721	compute the transform out of place, and copy the result back.
		1722	@emph{Warning: This option changes the meaning of some parameters of
		1723	@code{fftw}} (@pxref{fftw,,Computing the One-dimensional Transform}).
		1724
		1725	The in-place option is mainly provided for people who want to write
		1726	their own in-place multi-dimensional Fourier transform, using FFTW as a
		1727	base. For example, consider a three-dimensional @code{n * n * n}
		1728	transform. An out-of-place algorithm will need another array (which may
		1729	be huge). However, FFTW can compute the in-place transform along
		1730	each dimension using only a temporary array of size @code{n}.
		1731	Moreover, if FFTW happens to be able to compute the transform truly
		1732	in-place, no temporary array and no copying are needed. As distributed,
		1733	FFTW `knows' how to compute in-place transforms of size 1, 2, 3, 4, 5, 6,
		1734	7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 32 and 64.
		1735
		1736	The default mode of operation is @code{FFTW_OUT_OF_PLACE}.
		1737
		1738	@item
		1739	@cindex wisdom
		1740	@code{FFTW_USE_WISDOM}: use any @code{wisdom} that is available to help
		1741	in the creation of the plan. (@xref{Words of Wisdom}.)
		1742	This can greatly speed the creation of plans, especially with the
		1743	@code{FFTW_MEASURE} option. @code{FFTW_ESTIMATE} plans can also take
		1744	advantage of @code{wisdom} to produce a more optimal plan (based on past
		1745	measurements) than the estimation heuristic would normally
		1746	generate. When the @code{FFTW_MEASURE} option is used, new @code{wisdom}
		1747	will also be generated if the current transform size is not completely
		1748	understood by existing @code{wisdom}.
		1749
		1750	@end itemize
		1751
		1752	@item
		1753	@code{in}, @code{out}, @code{istride}, @code{ostride} (only for
		1754	@code{fftw_create_plan_specific}): see corresponding arguments in the
		1755	description of @code{fftw}. (@xref{fftw,,Computing the One-dimensional
		1756	Transform}.) In particular, the @code{out} and @code{ostride}
		1757	parameters have the same special meaning for @code{FFTW_IN_PLACE}
		1758	transforms as they have for @code{fftw}.
		1759
		1760	@end itemize
		1761
		1762	@node Discussion on Specific Plans, fftw, fftw_create_plan, One-dimensional Transforms Reference
		1763	@subsection Discussion on Specific Plans
		1764	@cindex specific planner
		1765	We recommend the use of the specific planners, even in cases where you
		1766	will be transforming arrays different from those passed to the specific
		1767	planners, as they confer the following advantages:
		1768
		1769	@itemize @bullet
		1770
		1771	@item
		1772	The resulting plans will be optimized for your specific arrays and
		1773	strides. This may or may not make a significant difference, but it
		1774	certainly doesn't hurt. (The ordinary planner does its planning based
		1775	upon a stride-one temporary array that it allocates.)
		1776
		1777	@item
		1778	Less intermediate storage is required during the planning process. (The
		1779	ordinary planner uses O(@code{N}) temporary storage, where @code{N} is
		1780	the maximum dimension, while it is creating the plan.)
		1781
		1782	@item
		1783	For multi-dimensional transforms, new parameters become accessible for
		1784	optimization by the planner. (Since multi-dimensional arrays can be
		1785	very large, we don't dare to allocate one in the ordinary planner for
		1786	experimentation. This prevents us from doing certain optimizations
		1787	that can yield dramatic improvements in some cases.)
		1788
		1789	@end itemize
		1790
		1791	On the other hand, note that @emph{the specific planner destroys the
		1792	contents of the @code{in} and @code{out} arrays}.
		1793
		1794	@node fftw, fftw_destroy_plan, Discussion on Specific Plans, One-dimensional Transforms Reference
		1795	@subsection Computing the One-dimensional Transform
		1796
		1797	@example
		1798	#include <fftw.h>
		1799
		1800	void fftw(fftw_plan plan, int howmany,
		1801	fftw_complex *in, int istride, int idist,
		1802	fftw_complex *out, int ostride, int odist);
		1803
		1804	void fftw_one(fftw_plan plan, fftw_complex *in,
		1805	fftw_complex *out);
		1806	@end example
		1807	@findex fftw
		1808	@findex fftw_one
		1809
		1810	The function @code{fftw} computes the one-dimensional Fourier transform,
		1811	using a plan created by @code{fftw_create_plan} (@xref{fftw_create_plan,
		1812	, Plan Creation for One-dimensional Transforms}.) The function
		1813	@code{fftw_one} provides a simplified interface for the common case of
		1814	single input array of stride 1.
		1815	@cindex stride
		1816
		1817	@subsubheading Arguments
		1818	@itemize @bullet
		1819	@item
		1820	@code{plan} is the plan created by @code{fftw_create_plan}
		1821	(@pxref{fftw_create_plan,,Plan Creation for One-dimensional Transforms}).
		1822
		1823	@item
		1824	@code{howmany} is the number of transforms @code{fftw} will compute.
		1825	It is faster to tell FFTW to compute many transforms, instead of
		1826	simply calling @code{fftw} many times.
		1827
		1828	@item
		1829	@code{in}, @code{istride} and @code{idist} describe the input array(s).
		1830	There are @code{howmany} input arrays; the first one is pointed to by
		1831	@code{in}, the second one is pointed to by @code{in + idist}, and so on,
		1832	up to @code{in + (howmany - 1) * idist}. Each input array consists of
		1833	complex numbers (@pxref{Data Types}), which are not necessarily
		1834	contiguous in memory. Specifically, @code{in[0]} is the first element
		1835	of the first array, @code{in[istride]} is the second element of the
		1836	first array, and so on. In general, the @code{i}-th element of the
		1837	@code{j}-th input array will be in position @code{in[i * istride + j *
		1838	idist]}.
		1839
		1840	@item
		1841	@code{out}, @code{ostride} and @code{odist} describe the output
		1842	array(s). The format is the same as for the input array.
		1843
		1844	@itemize @minus
		1845	@item @emph{In-place transforms}:
		1846	@cindex in-place transform
		1847	If the @code{plan} specifies an in-place transform, @code{ostride} and
		1848	@code{odist} are always ignored. If @code{out} is @code{NULL},
		1849	@code{out} is ignored, too. Otherwise, @code{out} is interpreted as a
		1850	pointer to an array of @code{n} complex numbers, that FFTW will use as
		1851	temporary space to perform the in-place computation. @code{out} is used
		1852	as scratch space and its contents destroyed. In this case, @code{out}
		1853	must be an ordinary array whose elements are contiguous in memory (no
		1854	striding).
		1855	@end itemize
		1856
		1857	@end itemize
		1858
		1859	The function @code{fftw_one} transforms a single, contiguous input array
		1860	to a contiguous output array. By definition, the call
		1861	@example
		1862	fftw_one(plan, in, out)
		1863	@end example
		1864	is equivalent to
		1865	@example
		1866	fftw(plan, 1, in, 1, 1, out, 1, 1)
		1867	@end example
		1868
		1869	@node fftw_destroy_plan, What FFTW Really Computes, fftw, One-dimensional Transforms Reference
		1870	@subsection Destroying a One-dimensional Plan
		1871
		1872	@example
		1873	#include <fftw.h>
		1874
		1875	void fftw_destroy_plan(fftw_plan plan);
		1876	@end example
		1877	@tindex fftw_destroy_plan
		1878
		1879	The function @code{fftw_destroy_plan} frees the plan @code{plan} and
		1880	releases all the memory associated with it. After destruction, a plan
		1881	is no longer valid.
		1882
		1883	@node What FFTW Really Computes, , fftw_destroy_plan, One-dimensional Transforms Reference
		1884	@subsection What FFTW Really Computes
		1885	@cindex Discrete Fourier Transform
		1886	In this section, we define precisely what FFTW computes. Please be
		1887	warned that different authors and software packages might employ
		1888	different conventions than FFTW does.
		1889
		1890	The forward transform of a complex array @math{X} of size
		1891	@math{n} computes an array @math{Y}, where
		1892	@iftex
		1893	@tex
		1894	$$
		1895	Y_i = \sum_{j = 0}^{n - 1} X_j e^{-2\pi i j \sqrt{-1}/n} \ .
		1896	$$
		1897	@end tex
		1898	@end iftex
		1899	@ifinfo
		1900	@center Y[i] = sum for j = 0 to (n - 1) of X[j] * exp(-2 pi i j sqrt(-1)/n) .
		1901	@end ifinfo
		1902	@ifhtml
		1903	<center><IMG SRC="equation-1.gif" ALIGN="top"></center>
		1904	@end ifhtml
		1905
		1906	The backward transform computes
		1907	@iftex
		1908	@tex
		1909	$$
		1910	Y_i = \sum_{j = 0}^{n - 1} X_j e^{2\pi i j \sqrt{-1}/n} \ .
		1911	$$
		1912	@end tex
		1913	@end iftex
		1914	@ifinfo
		1915	@center Y[i] = sum for j = 0 to (n - 1) of X[j] * exp(2 pi i j sqrt(-1)/n) .
		1916	@end ifinfo
		1917	@ifhtml
		1918	<center><IMG SRC="equation-2.gif" ALIGN="top"></center>
		1919	@end ifhtml
		1920
		1921	@cindex normalization
		1922	FFTW computes an unnormalized transform, that is, the equation
		1923	@math{IFFT(FFT(X)) = n X} holds. In other words, applying the forward
		1924	and then the backward transform will multiply the input by @math{n}.
		1925
		1926	@cindex frequency
		1927	An @code{FFTW_FORWARD} transform corresponds to a sign of @math{-1} in
		1928	the exponent of the DFT. Note also that we use the standard
		1929	``in-order'' output ordering---the @math{k}-th output corresponds to the
		1930	frequency @math{k/n} (or @math{k/T}, where @math{T} is your total
		1931	sampling period). For those who like to think in terms of positive and
		1932	negative frequencies, this means that the positive frequencies are
		1933	stored in the first half of the output and the negative frequencies are
		1934	stored in backwards order in the second half of the output. (The
		1935	frequency @math{-k/n} is the same as the frequency @math{(n-k)/n}.)
		1936
		1937	@c -------------------------------------------------------
		1938	@node Multi-dimensional Transforms Reference, Real One-dimensional Transforms Reference, One-dimensional Transforms Reference, FFTW Reference
		1939	@section Multi-dimensional Transforms Reference
		1940	@cindex complex multi-dimensional transform
		1941	@cindex multi-dimensional transform
		1942	The multi-dimensional complex routines are generally prefixed with
		1943	@code{fftwnd_}. Programs using FFTWND should be linked with @code{-lfftw
		1944	-lm} on Unix systems, or with the FFTW and standard math libraries in
		1945	general.
		1946	@cindex FFTWND
		1947
		1948	@menu
		1949	* fftwnd_create_plan:: Plan Creation
		1950	* fftwnd:: Plan Execution
		1951	* fftwnd_destroy_plan:: Plan Destruction
		1952	* What FFTWND Really Computes::
		1953	@end menu
		1954
		1955	@node fftwnd_create_plan, fftwnd, Multi-dimensional Transforms Reference, Multi-dimensional Transforms Reference
		1956	@subsection Plan Creation for Multi-dimensional Transforms
		1957
		1958	@example
		1959	#include <fftw.h>
		1960
		1961	fftwnd_plan fftwnd_create_plan(int rank, const int *n,
		1962	fftw_direction dir, int flags);
		1963
		1964	fftwnd_plan fftw2d_create_plan(int nx, int ny,
		1965	fftw_direction dir, int flags);
		1966
		1967	fftwnd_plan fftw3d_create_plan(int nx, int ny, int nz,
		1968	fftw_direction dir, int flags);
		1969
		1970	fftwnd_plan fftwnd_create_plan_specific(int rank, const int *n,
		1971	fftw_direction dir,
		1972	int flags,
		1973	fftw_complex *in, int istride,
		1974	fftw_complex *out, int ostride);
		1975
		1976	fftwnd_plan fftw2d_create_plan_specific(int nx, int ny,
		1977	fftw_direction dir,
		1978	int flags,
		1979	fftw_complex *in, int istride,
		1980	fftw_complex *out, int ostride);
		1981
		1982	fftwnd_plan fftw3d_create_plan_specific(int nx, int ny, int nz,
		1983	fftw_direction dir, int flags,
		1984	fftw_complex *in, int istride,
		1985	fftw_complex *out, int ostride);
		1986	@end example
		1987	@tindex fftwnd_plan
		1988	@tindex fftw_direction
		1989	@findex fftwnd_create_plan
		1990	@findex fftw2d_create_plan
		1991	@findex fftw3d_create_plan
		1992	@findex fftwnd_create_plan_specific
		1993	@findex fftw2d_create_plan_specific
		1994	@findex fftw3d_create_plan_specific
		1995
		1996	The function @code{fftwnd_create_plan} creates a plan, which is a data
		1997	structure containing all the information that @code{fftwnd} needs in
		1998	order to compute a multi-dimensional Fourier transform. You can create
		1999	as many plans as you need, but only one plan for a given array size is
		2000	required (a plan can be reused many times). The functions
		2001	@code{fftw2d_create_plan} and @code{fftw3d_create_plan} are optional,
		2002	alternative interfaces to @code{fftwnd_create_plan} for two and three
		2003	dimensions, respectively.
		2004
		2005	@code{fftwnd_create_plan} returns a valid plan, or @code{NULL} if, for
		2006	some reason, the plan can't be created. This can happen if memory runs
		2007	out or if the arguments are invalid in some way (e.g. if @code{rank} <
		2008	0).
		2009
		2010	The @code{create_plan_specific} variants take as additional arguments
		2011	specific input/output arrays and their strides. For the last four
		2012	arguments, you should pass the arrays and strides that you will
		2013	eventually be passing to @code{fftwnd}. The resulting plans will be
		2014	optimized for those arrays and strides, although they may be used on
		2015	other arrays as well. Note: the contents of the in and out arrays are
		2016	@emph{destroyed} by the specific planner (the initial contents are
		2017	ignored, so the arrays need not have been initialized).
		2018	@xref{Discussion on Specific Plans}, for a discussion on specific plans.
		2019
		2020	@subsubheading Arguments
		2021	@itemize @bullet
		2022	@item
		2023	@code{rank} is the dimensionality of the arrays to be transformed. It
		2024	can be any non-negative integer.
		2025
		2026	@item
		2027	@code{n} is a pointer to an array of @code{rank} integers, giving the
		2028	size of each dimension of the arrays to be transformed. These sizes,
		2029	which must be positive integers, correspond to the dimensions of
		2030	@cindex row-major
		2031	row-major arrays---i.e. @code{n[0]} is the size of the dimension whose
		2032	indices vary most slowly, and so on. (@xref{Multi-dimensional Array
		2033	Format}, for more information on row-major storage.)
		2034	@xref{fftw_create_plan,,Plan Creation for One-dimensional Transforms},
		2035	for more information regarding optimal array sizes.
		2036
		2037	@item
		2038	@code{nx} and @code{ny} in @code{fftw2d_create_plan} are positive
		2039	integers specifying the dimensions of the rank 2 array to be
		2040	transformed. i.e. they specify that the transform will operate on
		2041	@code{nx x ny} arrays in row-major order, where @code{nx} is the number
		2042	of rows and @code{ny} is the number of columns.
		2043
		2044	@item
		2045	@code{nx}, @code{ny} and @code{nz} in @code{fftw3d_create_plan} are
		2046	positive integers specifying the dimensions of the rank 3 array to be
		2047	transformed. i.e. they specify that the transform will operate on
		2048	@code{nx x ny x nz} arrays in row-major order.
		2049
		2050	@item
		2051	@code{dir} is the sign of the exponent in the formula that defines the
		2052	Fourier transform. It can be @math{-1} or @math{+1}. The aliases
		2053	@code{FFTW_FORWARD} and @code{FFTW_BACKWARD} are provided, where
		2054	@code{FFTW_FORWARD} stands for @math{-1}.
		2055
		2056	@item
		2057	@cindex flags
		2058	@code{flags} is a boolean OR (@samp{\|}) of zero or more of the following:
		2059	@itemize @minus
		2060	@item
		2061	@code{FFTW_MEASURE}: this flag tells FFTW to find the optimal plan by
		2062	actually @emph{computing} several FFTs and measuring their execution
		2063	time.
		2064
		2065	@item
		2066	@code{FFTW_ESTIMATE}: do not run any FFT and provide a ``reasonable''
		2067	plan (for a RISC processor with many registers). If neither
		2068	@code{FFTW_ESTIMATE} nor @code{FFTW_MEASURE} is provided, the default is
		2069	@code{FFTW_ESTIMATE}.
		2070
		2071	@item
		2072	@code{FFTW_OUT_OF_PLACE}: produce a plan assuming that the input
		2073	and output arrays will be distinct (this is the default).
		2074
		2075	@item
		2076	@code{FFTW_IN_PLACE}: produce a plan assuming that you want to perform
		2077	the transform in-place. (Unlike the one-dimensional transform, this
		2078	``really'' @footnote{@code{fftwnd} actually may use some temporary
		2079	storage (hidden in the plan), but this storage space is only the size of
		2080	the largest dimension of the array, rather than being as big as the
		2081	entire array. (Unless you use @code{fftwnd} to perform one-dimensional
		2082	transforms, in which case the temporary storage required for in-place
		2083	transforms @emph{is} as big as the entire array.)} performs the
		2084	transform in-place.) Note that, if you want to perform in-place
		2085	transforms, you @emph{must} use a plan created with this option.
		2086
		2087	The default mode of operation is @code{FFTW_OUT_OF_PLACE}.
		2088
		2089	@item
		2090	@cindex wisdom
		2091	@code{FFTW_USE_WISDOM}: use any @code{wisdom} that is available to help
		2092	in the creation of the plan. (@xref{Words of Wisdom}.) This can greatly
		2093	speed the creation of plans, especially with the @code{FFTW_MEASURE}
		2094	option. @code{FFTW_ESTIMATE} plans can also take advantage of
		2095	@code{wisdom} to produce a more optimal plan (based on past
		2096	measurements) than the estimation heuristic would normally
		2097	generate. When the @code{FFTW_MEASURE} option is used, new @code{wisdom}
		2098	will also be generated if the current transform size is not completely
		2099	understood by existing @code{wisdom}. Note that the same @code{wisdom}
		2100	is shared between one-dimensional and multi-dimensional transforms.
		2101
		2102	@end itemize
		2103
		2104	@item
		2105	@code{in}, @code{out}, @code{istride}, @code{ostride} (only for the
		2106	@code{_create_plan_specific} variants): see corresponding arguments in
		2107	the description of @code{fftwnd}. (@xref{fftwnd,,Computing the
		2108	Multi-dimensional Transform}.)
		2109
		2110	@end itemize
		2111
		2112	@node fftwnd, fftwnd_destroy_plan, fftwnd_create_plan, Multi-dimensional Transforms Reference
		2113	@subsection Computing the Multi-dimensional Transform
		2114
		2115	@example
		2116	#include <fftw.h>
		2117
		2118	void fftwnd(fftwnd_plan plan, int howmany,
		2119	fftw_complex *in, int istride, int idist,
		2120	fftw_complex *out, int ostride, int odist);
		2121
		2122	void fftwnd_one(fftwnd_plan p, fftw_complex *in,
		2123	fftw_complex *out);
		2124	@end example
		2125	@findex fftwnd
		2126	@findex fftwnd_one
		2127
		2128	The function @code{fftwnd} computes the multi-dimensional Fourier
		2129	Transform, using a plan created by @code{fftwnd_create_plan}
		2130	(@pxref{fftwnd_create_plan,,Plan Creation for Multi-dimensional
		2131	Transforms}). (Note that the plan determines the rank and dimensions of
		2132	the array to be transformed.) The function @code{fftwnd_one} provides a
		2133	simplified interface for the common case of single input array of stride
		2134	1.
		2135	@cindex stride
		2136
		2137	@subsubheading Arguments
		2138	@itemize @bullet
		2139	@item
		2140	@code{plan} is the plan created by @code{fftwnd_create_plan}.
		2141	(@pxref{fftwnd_create_plan,,Plan Creation for Multi-dimensional
		2142	Transforms}). In the case of two and three-dimensional transforms, it
		2143	could also have been created by @code{fftw2d_create_plan} or
		2144	@code{fftw3d_create_plan}, respectively.
		2145
		2146	@item
		2147	@code{howmany} is the number of transforms @code{fftwnd} will compute.
		2148
		2149	@item
		2150	@code{in}, @code{istride} and @code{idist} describe the input array(s).
		2151	There are @code{howmany} input arrays; the first one is pointed to by
		2152	@code{in}, the second one is pointed to by @code{in + idist}, and so on,
		2153	up to @code{in + (howmany - 1) * idist}. Each input array consists of
		2154	complex numbers (@pxref{Data Types}), stored in row-major format
		2155	(@pxref{Multi-dimensional Array Format}), which are not necessarily
		2156	contiguous in memory. Specifically, @code{in[0]} is the first element
		2157	of the first array, @code{in[istride]} is the second element of the
		2158	first array, and so on. In general, the @code{i}-th element of the
		2159	@code{j}-th input array will be in position @code{in[i * istride + j *
		2160	idist]}. Note that, here, @code{i} refers to an index into the row-major
		2161	format for the multi-dimensional array, rather than an index in any
		2162	particular dimension.
		2163
		2164	@itemize @minus
		2165	@item @emph{In-place transforms}:
		2166	@cindex in-place transform
		2167	For plans created with the @code{FFTW_IN_PLACE} option, the transform is
		2168	computed in-place---the output is returned in the @code{in} array, using
		2169	the same strides, etcetera, as were used in the input.
		2170	@end itemize
		2171
		2172	@item
		2173	@code{out}, @code{ostride} and @code{odist} describe the output array(s).
		2174	The format is the same as for the input array.
		2175
		2176	@itemize @minus
		2177	@item @emph{In-place transforms}:
		2178	These parameters are ignored for plans created with the
		2179	@code{FFTW_IN_PLACE} option.
		2180	@end itemize
		2181
		2182	@end itemize
		2183
		2184	The function @code{fftwnd_one} transforms a single, contiguous input
		2185	array to a contiguous output array. By definition, the call
		2186	@example
		2187	fftwnd_one(plan, in, out)
		2188	@end example
		2189	is equivalent to
		2190	@example
		2191	fftwnd(plan, 1, in, 1, 1, out, 1, 1)
		2192	@end example
		2193
		2194	@node fftwnd_destroy_plan, What FFTWND Really Computes, fftwnd, Multi-dimensional Transforms Reference
		2195	@subsection Destroying a Multi-dimensional Plan
		2196
		2197	@example
		2198	#include <fftw.h>
		2199
		2200	void fftwnd_destroy_plan(fftwnd_plan plan);
		2201	@end example
		2202	@findex fftwnd_destroy_plan
		2203
		2204	The function @code{fftwnd_destroy_plan} frees the plan @code{plan}
		2205	and releases all the memory associated with it. After destruction,
		2206	a plan is no longer valid.
		2207
		2208	@node What FFTWND Really Computes, , fftwnd_destroy_plan, Multi-dimensional Transforms Reference
		2209	@subsection What FFTWND Really Computes
		2210	@cindex Discrete Fourier Transform
		2211
		2212	The conventions that we follow for the multi-dimensional transform are
		2213	analogous to those for the one-dimensional transform. In particular, the
		2214	forward transform has a negative sign in the exponent and neither the
		2215	forward nor the backward transforms will perform any normalization.
		2216	Computing the backward transform of the forward transform will multiply
		2217	the array by the product of its dimensions. The output is in-order, and
		2218	the zeroth element of the output is the amplitude of the zero frequency
		2219	component.
		2220
		2221	@iftex
		2222	@tex
		2223	The exact mathematical definition of our multi-dimensional transform
		2224	follows. Let $X$ be a $d$-dimensional complex array whose elements are
		2225	$X[j_1, j_2, \ldots, j_d]$, where $0 \leq j_s < n_s$ for all~$s \in \{
		2226	1, 2, \ldots, d \}$. Let also $\omega_s = e^{2\pi \sqrt{-1}/n_s}$, for
		2227	all ~$s \in \{ 1, 2, \ldots, d \}$.
		2228
		2229	The forward transform computes a complex array~$Y$, whose
		2230	structure is the same as that of~$X$, defined by
		2231
		2232	$$
		2233	Y[i_1, i_2, \ldots, i_d] =
		2234	\sum_{j_1 = 0}^{n_1 - 1}
		2235	\sum_{j_2 = 0}^{n_2 - 1}
		2236	\cdots
		2237	\sum_{j_d = 0}^{n_d - 1}
		2238	X[j_1, j_2, \ldots, j_d]
		2239	\omega_1^{-i_1 j_1}
		2240	\omega_2^{-i_2 j_2}
		2241	\cdots
		2242	\omega_d^{-i_d j_d} \ .
		2243	$$
		2244
		2245	The backward transform computes
		2246	$$
		2247	Y[i_1, i_2, \ldots, i_d] =
		2248	\sum_{j_1 = 0}^{n_1 - 1}
		2249	\sum_{j_2 = 0}^{n_2 - 1}
		2250	\cdots
		2251	\sum_{j_d = 0}^{n_d - 1}
		2252	X[j_1, j_2, \ldots, j_d]
		2253	\omega_1^{i_1 j_1}
		2254	\omega_2^{i_2 j_2}
		2255	\cdots
		2256	\omega_d^{i_d j_d} \ .
		2257	$$
		2258
		2259	Computing the forward transform followed by the backward transform
		2260	will multiply the array by $\prod_{s=1}^{d} n_d$.
		2261	@end tex
		2262	@end iftex
		2263	@ifinfo
		2264	The @TeX{} version of this manual contains the exact definition of the
		2265	@math{n}-dimensional transform FFTW uses. It is not possible to
		2266	display the definition on a ASCII terminal properly.
		2267	@end ifinfo
		2268	@ifhtml
		2269	The Gods forbade using HTML to display mathematical formulas. Please
		2270	see the TeX or Postscript version of this manual for the proper
		2271	definition of the n-dimensional Fourier transform that FFTW
		2272	uses. For completeness, we include a bitmap of the TeX output below:
		2273	<P><center><IMG SRC="equation-3.gif" ALIGN="top"></center>
		2274	@end ifhtml
		2275
		2276	@c -------------------------------------------------------
		2277	@node Real One-dimensional Transforms Reference, Real Multi-dimensional Transforms Reference, Multi-dimensional Transforms Reference, FFTW Reference
		2278	@section Real One-dimensional Transforms Reference
		2279
		2280	The one-dimensional real routines are generally prefixed with
		2281	@code{rfftw_}. @footnote{The etymologically-correct spelling would be
		2282	@code{frftw_}, but it is hard to remember.} Programs using RFFTW
		2283	should be linked with @code{-lrfftw -lfftw -lm} on Unix systems, or with
		2284	the RFFTW, the FFTW, and the standard math libraries in general.
		2285	@cindex RFFTW
		2286	@cindex real transform
		2287	@cindex complex to real transform
		2288
		2289	@menu
		2290	* rfftw_create_plan:: Plan Creation
		2291	* rfftw:: Plan Execution
		2292	* rfftw_destroy_plan:: Plan Destruction
		2293	* What RFFTW Really Computes::
		2294	@end menu
		2295
		2296	@node rfftw_create_plan, rfftw, Real One-dimensional Transforms Reference, Real One-dimensional Transforms Reference
		2297	@subsection Plan Creation for Real One-dimensional Transforms
		2298
		2299	@example
		2300	#include <rfftw.h>
		2301
		2302	rfftw_plan rfftw_create_plan(int n, fftw_direction dir, int flags);
		2303
		2304	rfftw_plan rfftw_create_plan_specific(int n, fftw_direction dir,
		2305	int flags, fftw_real *in, int istride,
		2306	fftw_real *out, int ostride);
		2307	@end example
		2308	@tindex rfftw_plan
		2309	@findex rfftw_create_plan
		2310	@findex rfftw_create_plan_specific
		2311
		2312	The function @code{rfftw_create_plan} creates a plan, which is a data
		2313	structure containing all the information that @code{rfftw} needs in
		2314	order to compute the 1D real Fourier transform. You can create as many
		2315	plans as you need, but only one plan for a given array size is required
		2316	(a plan can be reused many times).
		2317
		2318	@code{rfftw_create_plan} returns a valid plan, or @code{NULL} if, for
		2319	some reason, the plan can't be created. In the default installation,
		2320	this cannot happen, but it is possible to configure RFFTW in such a way
		2321	that some input sizes are forbidden, and RFFTW cannot create a plan.
		2322
		2323	The @code{rfftw_create_plan_specific} variant takes as additional
		2324	arguments specific input/output arrays and their strides. For the last
		2325	four arguments, you should pass the arrays and strides that you will
		2326	eventually be passing to @code{rfftw}. The resulting plans will be
		2327	optimized for those arrays and strides, although they may be used on
		2328	other arrays as well. Note: the contents of the in and out arrays are
		2329	@emph{destroyed} by the specific planner (the initial contents are
		2330	ignored, so the arrays need not have been initialized).
		2331	@xref{Discussion on Specific Plans}, for a discussion on specific plans.
		2332
		2333	@subsubheading Arguments
		2334	@itemize @bullet
		2335	@item
		2336	@code{n} is the size of the transform. It can be
		2337	any positive integer.
		2338
		2339	@itemize @minus
		2340	@item
		2341	RFFTW is best at handling sizes of the form
		2342	@ifinfo
		2343	@math{2^a 3^b 5^c 7^d 11^e 13^f},
		2344	@end ifinfo
		2345	@iftex
		2346	@tex
		2347	$2^a 3^b 5^c 7^d 11^e 13^f$,
		2348	@end tex
		2349	@end iftex
		2350	@ifhtml
		2351	2<SUP>a</SUP> 3<SUP>b</SUP> 5<SUP>c</SUP> 7<SUP>d</SUP>
		2352	11<SUP>e</SUP> 13<SUP>f</SUP>,
		2353	@end ifhtml
		2354	where @math{e+f} is either @math{0} or
		2355	@math{1}, and the other exponents are arbitrary. Other sizes are
		2356	computed by means of a slow, general-purpose routine (reducing to
		2357	@ifinfo
		2358	@math{O(n^2)}
		2359	@end ifinfo
		2360	@iftex
		2361	@tex
		2362	$O(n^2)$
		2363	@end tex
		2364	@end iftex
		2365	@ifhtml
		2366	O(n<sup>2</sup>)
		2367	@end ifhtml
		2368	performance for prime sizes). (It is possible to customize RFFTW for
		2369	different array sizes. @xref{Installation and Customization}, for more
		2370	information.) Transforms whose sizes are powers of @math{2} are
		2371	especially fast.
		2372	@end itemize
		2373
		2374	@item
		2375	@code{dir} is the direction of the desired transform, either
		2376	@code{FFTW_REAL_TO_COMPLEX} or @code{FFTW_COMPLEX_TO_REAL},
		2377	corresponding to @code{FFTW_FORWARD} or @code{FFTW_BACKWARD},
		2378	respectively.
		2379	@ctindex FFTW_REAL_TO_COMPLEX
		2380	@ctindex FFTW_COMPLEX_TO_REAL
		2381
		2382	@item
		2383	@cindex flags
		2384	@code{flags} is a boolean OR (@samp{\|}) of zero or more of the following:
		2385	@itemize @minus
		2386	@item
		2387	@code{FFTW_MEASURE}: this flag tells RFFTW to find the optimal plan by
		2388	actually @emph{computing} several FFTs and measuring their
		2389	execution time. Depending on the installation, this can take some
		2390	time.
		2391
		2392	@item
		2393	@code{FFTW_ESTIMATE}: do not run any FFT and provide a ``reasonable''
		2394	plan (for a RISC processor with many registers). If neither
		2395	@code{FFTW_ESTIMATE} nor @code{FFTW_MEASURE} is provided, the default is
		2396	@code{FFTW_ESTIMATE}.
		2397
		2398	@item
		2399	@code{FFTW_OUT_OF_PLACE}: produce a plan assuming that the input
		2400	and output arrays will be distinct (this is the default).
		2401
		2402	@item
		2403	@code{FFTW_IN_PLACE}: produce a plan assuming that you want the output
		2404	in the input array. The algorithm used is not necessarily in place:
		2405	RFFTW is able to compute true in-place transforms only for small values
		2406	of @code{n}. If RFFTW is not able to compute the transform in-place, it
		2407	will allocate a temporary array (unless you provide one yourself),
		2408	compute the transform out of place, and copy the result back.
		2409	@emph{Warning: This option changes the meaning of some parameters of
		2410	@code{rfftw}} (@pxref{rfftw,,Computing the Real One-dimensional Transform}).
		2411
		2412	The default mode of operation is @code{FFTW_OUT_OF_PLACE}.
		2413
		2414	@item
		2415	@code{FFTW_USE_WISDOM}: use any @code{wisdom} that is available to help
		2416	in the creation of the plan. (@xref{Words of Wisdom}.)
		2417	This can greatly speed the creation of plans, especially with the
		2418	@code{FFTW_MEASURE} option. @code{FFTW_ESTIMATE} plans can also take
		2419	advantage of @code{wisdom} to produce a more optimal plan (based on past
		2420	measurements) than the estimation heuristic would normally
		2421	generate. When the @code{FFTW_MEASURE} option is used, new @code{wisdom}
		2422	will also be generated if the current transform size is not completely
		2423	understood by existing @code{wisdom}.
		2424
		2425	@end itemize
		2426
		2427	@item
		2428	@code{in}, @code{out}, @code{istride}, @code{ostride} (only for
		2429	@code{rfftw_create_plan_specific}): see corresponding arguments in the
		2430	description of @code{rfftw}. (@xref{rfftw,,Computing the Real
		2431	One-dimensional Transform}.) In particular, the @code{out} and
		2432	@code{ostride} parameters have the same special meaning for
		2433	@code{FFTW_IN_PLACE} transforms as they have for @code{rfftw}.
		2434
		2435	@end itemize
		2436
		2437	@node rfftw, rfftw_destroy_plan, rfftw_create_plan, Real One-dimensional Transforms Reference
		2438	@subsection Computing the Real One-dimensional Transform
		2439
		2440	@example
		2441	#include <rfftw.h>
		2442
		2443	void rfftw(rfftw_plan plan, int howmany,
		2444	fftw_real *in, int istride, int idist,
		2445	fftw_real *out, int ostride, int odist);
		2446
		2447	void rfftw_one(rfftw_plan plan, fftw_real in, fftw_real out);
		2448	@end example
		2449	@findex rfftw
		2450	@findex rfftw_one
		2451
		2452	The function @code{rfftw} computes the Real One-dimensional Fourier
		2453	Transform, using a plan created by @code{rfftw_create_plan}
		2454	(@pxref{rfftw_create_plan,,Plan Creation for Real One-dimensional
		2455	Transforms}). The function @code{rfftw_one} provides a simplified
		2456	interface for the common case of single input array of stride 1.
		2457	@cindex stride
		2458
		2459	@emph{Important:} When invoked for an out-of-place,
		2460	@code{FFTW_COMPLEX_TO_REAL} transform, the input array is overwritten
		2461	with scratch values by these routines. The input array is not modified
		2462	for @code{FFTW_REAL_TO_COMPLEX} transforms.
		2463
		2464	@subsubheading Arguments
		2465	@itemize @bullet
		2466	@item
		2467	@code{plan} is the plan created by @code{rfftw_create_plan}
		2468	(@pxref{rfftw_create_plan,,Plan Creation for Real One-dimensional
		2469	Transforms}).
		2470
		2471	@item
		2472	@code{howmany} is the number of transforms @code{rfftw} will compute.
		2473	It is faster to tell RFFTW to compute many transforms, instead of
		2474	simply calling @code{rfftw} many times.
		2475
		2476	@item
		2477	@code{in}, @code{istride} and @code{idist} describe the input array(s).
		2478	There are two cases. If the @code{plan} defines a
		2479	@code{FFTW_REAL_TO_COMPLEX} transform, @code{in} is a real array.
		2480	Otherwise, for @code{FFTW_COMPLEX_TO_REAL} transforms, @code{in} is a
		2481	halfcomplex array @emph{whose contents will be destroyed}.
		2482
		2483	@item
		2484	@code{out}, @code{ostride} and @code{odist} describe the output
		2485	array(s), and have the same meaning as the corresponding parameters for
		2486	the input array.
		2487
		2488	@itemize @minus
		2489	@item @emph{In-place transforms}:
		2490	If the @code{plan} specifies an in-place transform, @code{ostride} and
		2491	@code{odist} are always ignored. If @code{out} is @code{NULL},
		2492	@code{out} is ignored, too. Otherwise, @code{out} is interpreted as a
		2493	pointer to an array of @code{n} complex numbers, that FFTW will use as
		2494	temporary space to perform the in-place computation. @code{out} is used
		2495	as scratch space and its contents destroyed. In this case, @code{out}
		2496	must be an ordinary array whose elements are contiguous in memory (no
		2497	striding).
		2498	@end itemize
		2499
		2500	@end itemize
		2501
		2502	The function @code{rfftw_one} transforms a single, contiguous input array
		2503	to a contiguous output array. By definition, the call
		2504	@example
		2505	rfftw_one(plan, in, out)
		2506	@end example
		2507	is equivalent to
		2508	@example
		2509	rfftw(plan, 1, in, 1, 1, out, 1, 1)
		2510	@end example
		2511
		2512	@node rfftw_destroy_plan, What RFFTW Really Computes, rfftw, Real One-dimensional Transforms Reference
		2513	@subsection Destroying a Real One-dimensional Plan
		2514
		2515	@example
		2516	#include <rfftw.h>
		2517
		2518	void rfftw_destroy_plan(rfftw_plan plan);
		2519	@end example
		2520	@findex rfftw_destroy_plan
		2521
		2522	The function @code{rfftw_destroy_plan} frees the plan @code{plan} and
		2523	releases all the memory associated with it. After destruction, a plan
		2524	is no longer valid.
		2525
		2526	@node What RFFTW Really Computes, , rfftw_destroy_plan, Real One-dimensional Transforms Reference
		2527	@subsection What RFFTW Really Computes
		2528	@cindex Discrete Fourier Transform
		2529	In this section, we define precisely what RFFTW computes.
		2530
		2531	The real to complex (@code{FFTW_REAL_TO_COMPLEX}) transform of a real
		2532	array @math{X} of size @math{n} computes an hermitian array @math{Y},
		2533	where
		2534	@iftex
		2535	@tex
		2536	$$
		2537	Y_i = \sum_{j = 0}^{n - 1} X_j e^{-2\pi i j \sqrt{-1}/n}
		2538	$$
		2539	@end tex
		2540	@end iftex
		2541	@ifinfo
		2542	@center Y[i] = sum for j = 0 to (n - 1) of X[j] * exp(-2 pi i j sqrt(-1)/n)
		2543	@end ifinfo
		2544	@ifhtml
		2545	<center><IMG SRC="equation-1.gif" ALIGN="top"></center>
		2546	@end ifhtml
		2547	(That @math{Y} is a hermitian array is not intended to be obvious,
		2548	although the proof is easy.) The hermitian array @math{Y} is stored in
		2549	halfcomplex order (@pxref{Data Types}). Currently, RFFTW provides no
		2550	way to compute a real to complex transform with a positive sign in the
		2551	exponent.
		2552
		2553	The complex to real (@code{FFTW_COMPLEX_TO_REAL}) transform of a hermitian
		2554	array @math{X} of size @math{n} computes a real array @math{Y}, where
		2555	@iftex
		2556	@tex
		2557	$$
		2558	Y_i = \sum_{j = 0}^{n - 1} X_j e^{2\pi i j \sqrt{-1}/n}
		2559	$$
		2560	@end tex
		2561	@end iftex
		2562	@ifinfo
		2563	@center Y[i] = sum for j = 0 to (n - 1) of X[j] * exp(2 pi i j sqrt(-1)/n)
		2564	@end ifinfo
		2565	@ifhtml
		2566	<center><IMG SRC="equation-2.gif" ALIGN="top"></center>
		2567	@end ifhtml
		2568	(That @math{Y} is a real array is not intended to be obvious, although
		2569	the proof is easy.) The hermitian input array @math{X} is stored in
		2570	halfcomplex order (@pxref{Data Types}). Currently, RFFTW provides no
		2571	way to compute a complex to real transform with a negative sign in the
		2572	exponent.
		2573
		2574	@cindex normalization
		2575	Like FFTW, RFFTW computes an unnormalized transform. In other words,
		2576	applying the real to complex (forward) and then the complex to real
		2577	(backward) transform will multiply the input by @math{n}.
		2578
		2579	@c -------------------------------------------------------
		2580	@node Real Multi-dimensional Transforms Reference, Wisdom Reference, Real One-dimensional Transforms Reference, FFTW Reference
		2581	@section Real Multi-dimensional Transforms Reference
		2582	@cindex real multi-dimensional transform
		2583	@cindex multi-dimensional transform
		2584
		2585	The multi-dimensional real routines are generally prefixed with
		2586	@code{rfftwnd_}. Programs using RFFTWND should be linked with
		2587	@code{-lrfftw -lfftw -lm} on Unix systems, or with the FFTW, RFFTW, and
		2588	standard math libraries in general.
		2589	@cindex RFFTWND
		2590
		2591	@menu
		2592	* rfftwnd_create_plan:: Plan Creation
		2593	* rfftwnd:: Plan Execution
		2594	* Array Dimensions for Real Multi-dimensional Transforms::
		2595	* Strides in In-place RFFTWND::
		2596	* rfftwnd_destroy_plan:: Plan Destruction
		2597	* What RFFTWND Really Computes::
		2598	@end menu
		2599
		2600	@node rfftwnd_create_plan, rfftwnd, Real Multi-dimensional Transforms Reference, Real Multi-dimensional Transforms Reference
		2601	@subsection Plan Creation for Real Multi-dimensional Transforms
		2602
		2603	@example
		2604	#include <rfftw.h>
		2605
		2606	rfftwnd_plan rfftwnd_create_plan(int rank, const int *n,
		2607	fftw_direction dir, int flags);
		2608
		2609	rfftwnd_plan rfftw2d_create_plan(int nx, int ny,
		2610	fftw_direction dir, int flags);
		2611
		2612	rfftwnd_plan rfftw3d_create_plan(int nx, int ny, int nz,
		2613	fftw_direction dir, int flags);
		2614	@end example
		2615	@tindex rfftwnd_plan
		2616	@tindex fftw_direction
		2617	@findex rfftwnd_create_plan
		2618	@findex rfftw2d_create_plan
		2619	@findex rfftw3d_create_plan
		2620
		2621	The function @code{rfftwnd_create_plan} creates a plan, which is a data
		2622	structure containing all the information that @code{rfftwnd} needs in
		2623	order to compute a multi-dimensional real Fourier transform. You can
		2624	create as many plans as you need, but only one plan for a given array
		2625	size is required (a plan can be reused many times). The functions
		2626	@code{rfftw2d_create_plan} and @code{rfftw3d_create_plan} are optional,
		2627	alternative interfaces to @code{rfftwnd_create_plan} for two and three
		2628	dimensions, respectively.
		2629
		2630	@code{rfftwnd_create_plan} returns a valid plan, or @code{NULL} if, for
		2631	some reason, the plan can't be created. This can happen if the
		2632	arguments are invalid in some way (e.g. if @code{rank} < 0).
		2633
		2634	@subsubheading Arguments
		2635	@itemize @bullet
		2636	@item
		2637	@code{rank} is the dimensionality of the arrays to be transformed. It
		2638	can be any non-negative integer.
		2639
		2640	@item
		2641	@code{n} is a pointer to an array of @code{rank} integers, giving the
		2642	size of each dimension of the arrays to be transformed. Note that these
		2643	are always the dimensions of the @emph{real} arrays; the complex arrays
		2644	have different dimensions (@pxref{Array Dimensions for Real
		2645	Multi-dimensional Transforms}). These sizes, which must be positive
		2646	integers, correspond to the dimensions of row-major
		2647	arrays---i.e. @code{n[0]} is the size of the dimension whose indices
		2648	vary most slowly, and so on. (@xref{Multi-dimensional Array Format}, for
		2649	more information.)
		2650	@itemize @minus
		2651	@item
		2652	@xref{rfftw_create_plan,,Plan Creation for Real One-dimensional Transforms},
		2653	for more information regarding optimal array sizes.
		2654	@end itemize
		2655
		2656	@item
		2657	@code{nx} and @code{ny} in @code{rfftw2d_create_plan} are positive
		2658	integers specifying the dimensions of the rank 2 array to be
		2659	transformed. i.e. they specify that the transform will operate on
		2660	@code{nx x ny} arrays in row-major order, where @code{nx} is the number
		2661	of rows and @code{ny} is the number of columns.
		2662
		2663	@item
		2664	@code{nx}, @code{ny} and @code{nz} in @code{rfftw3d_create_plan} are
		2665	positive integers specifying the dimensions of the rank 3 array to be
		2666	transformed. i.e. they specify that the transform will operate on
		2667	@code{nx x ny x nz} arrays in row-major order.
		2668
		2669	@item
		2670	@code{dir} is the direction of the desired transform, either
		2671	@code{FFTW_REAL_TO_COMPLEX} or @code{FFTW_COMPLEX_TO_REAL},
		2672	corresponding to @code{FFTW_FORWARD} or @code{FFTW_BACKWARD},
		2673	respectively.
		2674
		2675	@item
		2676	@cindex flags
		2677	@code{flags} is a boolean OR (@samp{\|}) of zero or more of the following:
		2678	@itemize @minus
		2679	@item
		2680	@code{FFTW_MEASURE}: this flag tells FFTW to find the optimal plan by
		2681	actually @emph{computing} several FFTs and measuring their execution
		2682	time.
		2683
		2684	@item
		2685	@code{FFTW_ESTIMATE}: do not run any FFT and provide a ``reasonable''
		2686	plan (for a RISC processor with many registers). If neither
		2687	@code{FFTW_ESTIMATE} nor @code{FFTW_MEASURE} is provided, the default is
		2688	@code{FFTW_ESTIMATE}.
		2689
		2690	@item
		2691	@code{FFTW_OUT_OF_PLACE}: produce a plan assuming that the input
		2692	and output arrays will be distinct (this is the default).
		2693
		2694	@item
		2695	@cindex in-place transform
		2696	@code{FFTW_IN_PLACE}: produce a plan assuming that you want to perform
		2697	the transform in-place. (Unlike the one-dimensional transform, this
		2698	``really'' performs the transform in-place.) Note that, if you want to
		2699	perform in-place transforms, you @emph{must} use a plan created with
		2700	this option. The use of this option has important implications for the
		2701	size of the input/output array (@pxref{rfftwnd,,Computing the Real
		2702	Multi-dimensional Transform}).
		2703
		2704	The default mode of operation is @code{FFTW_OUT_OF_PLACE}.
		2705
		2706	@item
		2707	@cindex wisdom
		2708	@code{FFTW_USE_WISDOM}: use any @code{wisdom} that is available to help
		2709	in the creation of the plan. (@xref{Words of Wisdom}.) This can greatly
		2710	speed the creation of plans, especially with the @code{FFTW_MEASURE}
		2711	option. @code{FFTW_ESTIMATE} plans can also take advantage of
		2712	@code{wisdom} to produce a more optimal plan (based on past
		2713	measurements) than the estimation heuristic would normally
		2714	generate. When the @code{FFTW_MEASURE} option is used, new @code{wisdom}
		2715	will also be generated if the current transform size is not completely
		2716	understood by existing @code{wisdom}. Note that the same @code{wisdom}
		2717	is shared between one-dimensional and multi-dimensional transforms.
		2718
		2719	@end itemize
		2720
		2721	@end itemize
		2722
		2723	@node rfftwnd, Array Dimensions for Real Multi-dimensional Transforms, rfftwnd_create_plan, Real Multi-dimensional Transforms Reference
		2724	@subsection Computing the Real Multi-dimensional Transform
		2725
		2726	@example
		2727	#include <rfftw.h>
		2728
		2729	void rfftwnd_real_to_complex(rfftwnd_plan plan, int howmany,
		2730	fftw_real *in, int istride, int idist,
		2731	fftw_complex *out, int ostride, int odist);
		2732	void rfftwnd_complex_to_real(rfftwnd_plan plan, int howmany,
		2733	fftw_complex *in, int istride, int idist,
		2734	fftw_real *out, int ostride, int odist);
		2735
		2736	void rfftwnd_one_real_to_complex(rfftwnd_plan p, fftw_real *in,
		2737	fftw_complex *out);
		2738	void rfftwnd_one_complex_to_real(rfftwnd_plan p, fftw_complex *in,
		2739	fftw_real *out);
		2740	@end example
		2741	@findex rfftwnd_real_to_complex
		2742	@findex rfftwnd_complex_to_real
		2743	@findex rfftwnd_one_real_to_complex
		2744	@findex rfftwnd_one_complex_to_real
		2745
		2746	These functions compute the real multi-dimensional Fourier Transform,
		2747	using a plan created by @code{rfftwnd_create_plan}
		2748	(@pxref{rfftwnd_create_plan,,Plan Creation for Real Multi-dimensional
		2749	Transforms}). (Note that the plan determines the rank and dimensions of
		2750	the array to be transformed.) The @samp{@code{rfftwnd_one_}} functions
		2751	provide a simplified interface for the common case of single input array
		2752	of stride 1. Unlike other transform routines in FFTW, we here use
		2753	separate functions for the two directions of the transform in order to
		2754	correctly express the datatypes of the parameters.
		2755
		2756	@emph{Important:} When invoked for an out-of-place,
		2757	@code{FFTW_COMPLEX_TO_REAL} transform with @code{rank > 1}, the input
		2758	array is overwritten with scratch values by these routines. The input
		2759	array is not modified for @code{FFTW_REAL_TO_COMPLEX} transforms or for
		2760	@code{FFTW_COMPLEX_TO_REAL} with @code{rank == 1}.
		2761
		2762	@subsubheading Arguments
		2763	@itemize @bullet
		2764	@item
		2765	@code{plan} is the plan created by @code{rfftwnd_create_plan}.
		2766	(@pxref{rfftwnd_create_plan,,Plan Creation for Real Multi-dimensional
		2767	Transforms}). In the case of two and three-dimensional transforms, it
		2768	could also have been created by @code{rfftw2d_create_plan} or
		2769	@code{rfftw3d_create_plan}, respectively.
		2770
		2771	@code{FFTW_REAL_TO_COMPLEX} plans must be used with the
		2772	@samp{@code{real_to_complex}} functions, and @code{FFTW_COMPLEX_TO_REAL}
		2773	plans must be used with the @samp{@code{complex_to_real}} functions. It
		2774	is an error to mismatch the plan direction and the transform function.
		2775
		2776	@item
		2777	@code{howmany} is the number of transforms to be computed.
		2778
		2779	@item
		2780	@cindex stride
		2781	@code{in}, @code{istride} and @code{idist} describe the input array(s).
		2782	There are @code{howmany} input arrays; the first one is pointed to by
		2783	@code{in}, the second one is pointed to by @code{in + idist}, and so on,
		2784	up to @code{in + (howmany - 1) * idist}. Each input array is stored in
		2785	row-major format (@pxref{Multi-dimensional Array Format}), and is not
		2786	necessarily contiguous in memory. Specifically, @code{in[0]} is the
		2787	first element of the first array, @code{in[istride]} is the second
		2788	element of the first array, and so on. In general, the @code{i}-th
		2789	element of the @code{j}-th input array will be in position @code{in[i *
		2790	istride + j * idist]}. Note that, here, @code{i} refers to an index into
		2791	the row-major format for the multi-dimensional array, rather than an
		2792	index in any particular dimension.
		2793
		2794	The dimensions of the arrays are different for real and complex data,
		2795	and are discussed in more detail below (@pxref{Array Dimensions for Real
		2796	Multi-dimensional Transforms}).
		2797
		2798	@itemize @minus
		2799	@item @emph{In-place transforms}:
		2800	For plans created with the @code{FFTW_IN_PLACE} option, the transform is
		2801	computed in-place---the output is returned in the @code{in} array. The
		2802	meaning of the @code{stride} and @code{dist} parameters in this case is
		2803	subtle and is discussed below (@pxref{Strides in In-place RFFTWND}).
		2804	@end itemize
		2805
		2806	@item
		2807	@code{out}, @code{ostride} and @code{odist} describe the output
		2808	array(s). The format is the same as that for the input array. See
		2809	below for a discussion of the dimensions of the output array for real
		2810	and complex data.
		2811
		2812	@itemize @minus
		2813	@item @emph{In-place transforms}:
		2814	These parameters are ignored for plans created with the
		2815	@code{FFTW_IN_PLACE} option.
		2816	@end itemize
		2817
		2818	@end itemize
		2819
		2820	The function @code{rfftwnd_one} transforms a single, contiguous input
		2821	array to a contiguous output array. By definition, the call
		2822	@example
		2823	rfftwnd_one_...(plan, in, out)
		2824	@end example
		2825	is equivalent to
		2826	@example
		2827	rfftwnd_...(plan, 1, in, 1, 1, out, 1, 1)
		2828	@end example
		2829
		2830	@node Array Dimensions for Real Multi-dimensional Transforms, Strides in In-place RFFTWND, rfftwnd, Real Multi-dimensional Transforms Reference
		2831	@subsection Array Dimensions for Real Multi-dimensional Transforms
		2832
		2833	@cindex rfftwnd array format
		2834	The output of a multi-dimensional transform of real data contains
		2835	symmetries that, in principle, make half of the outputs redundant
		2836	(@pxref{What RFFTWND Really Computes}). In practice, it is not
		2837	possible to entirely realize these savings in an efficient and
		2838	understandable format. Instead, the output of the rfftwnd transforms is
		2839	@emph{slightly} over half of the output of the corresponding complex
		2840	transform. We do not ``pack'' the data in any way, but store it as an
		2841	ordinary array of @code{fftw_complex} values. In fact, this data is
		2842	simply a subsection of what would be the array in the corresponding
		2843	complex transform.
		2844
		2845	Specifically, for a real transform of dimensions
		2846	@iftex
		2847	@tex
		2848	$n_1 \times n_2 \times \cdots \times n_d$,
		2849	@end tex
		2850	@end iftex
		2851	@ifinfo
		2852	n1 x n2 x ... x nd,
		2853	@end ifinfo
		2854	@ifhtml
		2855	n<sub>1</sub> x n<sub>2</sub> x ... x n<sub>d</sub>,
		2856	@end ifhtml
		2857	the complex data is an
		2858	@iftex
		2859	@tex
		2860	$n_1 \times n_2 \times \cdots \times (n_d/2+1)$
		2861	@end tex
		2862	@end iftex
		2863	@ifinfo
		2864	n1 x n2 x ... x (nd/2+1)
		2865	@end ifinfo
		2866	@ifhtml
		2867	n<sub>1</sub> x n<sub>2</sub> x ... x (n<sub>d</sub>/2+1)
		2868	@end ifhtml
		2869	array of @code{fftw_complex} values in row-major order (with the
		2870	division rounded down). That is, we only store the lower half (plus one
		2871	element) of the last dimension of the data from the ordinary complex
		2872	transform. (We could have instead taken half of any other dimension,
		2873	but implementation turns out to be simpler if the last, contiguous,
		2874	dimension is used.)
		2875
		2876	@cindex in-place transform
		2877	@cindex padding
		2878	Since the complex data is slightly larger than the real data, some
		2879	complications arise for in-place transforms. In this case, the final
		2880	dimension of the real data must be padded with extra values to
		2881	accommodate the size of the complex data---two extra if the last
		2882	dimension is even and one if it is odd. That is, the last dimension of
		2883	the real data must physically contain
		2884	@iftex
		2885	@tex
		2886	$2 (n_d/2+1)$
		2887	@end tex
		2888	@end iftex
		2889	@ifinfo
		2890	2 * (nd/2+1)
		2891	@end ifinfo
		2892	@ifhtml
		2893	2 * (n<sub>d</sub>/2+1)
		2894	@end ifhtml
		2895	@code{fftw_real} values (exactly enough to hold the complex data).
		2896	This physical array size does not, however, change the @emph{logical}
		2897	array size---only
		2898	@iftex
		2899	@tex
		2900	$n_d$
		2901	@end tex
		2902	@end iftex
		2903	@ifinfo
		2904	nd
		2905	@end ifinfo
		2906	@ifhtml
		2907	n<sub>d</sub>
		2908	@end ifhtml
		2909	values are actually stored in the last dimension, and
		2910	@iftex
		2911	@tex
		2912	$n_d$
		2913	@end tex
		2914	@end iftex
		2915	@ifinfo
		2916	nd
		2917	@end ifinfo
		2918	@ifhtml
		2919	n<sub>d</sub>
		2920	@end ifhtml
		2921	is the last dimension passed to @code{rfftwnd_create_plan}.
		2922
		2923	@node Strides in In-place RFFTWND, rfftwnd_destroy_plan, Array Dimensions for Real Multi-dimensional Transforms, Real Multi-dimensional Transforms Reference
		2924	@subsection Strides in In-place RFFTWND
		2925
		2926	@cindex rfftwnd array format
		2927	@cindex stride
		2928	The fact that the input and output datatypes are different for rfftwnd
		2929	complicates the meaning of the @code{stride} and @code{dist} parameters
		2930	of in-place transforms---are they in units of @code{fftw_real} or
		2931	@code{fftw_complex} elements? When reading the input, they are
		2932	interpreted in units of the datatype of the input data. When writing
		2933	the output, the @code{istride} and @code{idist} are translated to the
		2934	output datatype's ``units'' in one of two ways, corresponding to the two
		2935	most common situations in which @code{stride} and @code{dist} parameters
		2936	are useful. Below, we refer to these ``translated'' parameters as
		2937	@code{ostride_t} and @code{odist_t}. (Note that these are computed
		2938	internally by rfftwnd; the actual @code{ostride} and @code{odist}
		2939	parameters are ignored for in-place transforms.)
		2940
		2941	First, there is the case where you are transforming a number of
		2942	contiguous arrays located one after another in memory. In this
		2943	situation, @code{istride} is @code{1} and @code{idist} is the product of
		2944	the physical dimensions of the array. @code{ostride_t} and
		2945	@code{odist_t} are then chosen so that the output arrays are contiguous
		2946	and lie on top of the input arrays. @code{ostride_t} is therefore
		2947	@code{1}. For a real-to-complex transform, @code{odist_t} is
		2948	@code{idist/2}; for a complex-to-real transform, @code{odist_t} is
		2949	@code{idist*2}.
		2950
		2951	The second case is when you have an array in which each element has
		2952	@code{nc} components (e.g. a structure with @code{nc} numeric fields),
		2953	and you want to transform all of the components at once. Here,
		2954	@code{istride} is @code{nc} and @code{idist} is @code{1}. For this
		2955	case, it is natural to want the output to also have @code{nc}
		2956	consecutive components, now of the output data type; this is exactly
		2957	what rfftwnd does. Specifically, it uses an @code{ostride_t} equal to
		2958	@code{istride}, and an @code{odist_t} of @code{1}. (Astute readers will
		2959	realize that some extra buffer space is required in order to perform
		2960	such a transform; this is handled automatically by rfftwnd.)
		2961
		2962	The general rule is as follows. @code{ostride_t} equals @code{istride}.
		2963	If @code{idist} is @code{1}, then @code{odist_t} is @code{1}.
		2964	Otherwise, for a real-to-complex transform @code{odist_t} is
		2965	@code{idist/2} and for a complex-to-real transform @code{odist_t} is
		2966	@code{idist*2}.
		2967
		2968	@node rfftwnd_destroy_plan, What RFFTWND Really Computes, Strides in In-place RFFTWND, Real Multi-dimensional Transforms Reference
		2969	@subsection Destroying a Multi-dimensional Plan
		2970
		2971	@example
		2972	#include <rfftw.h>
		2973
		2974	void rfftwnd_destroy_plan(rfftwnd_plan plan);
		2975	@end example
		2976	@findex rfftwnd_destroy_plan
		2977
		2978	The function @code{rfftwnd_destroy_plan} frees the plan @code{plan}
		2979	and releases all the memory associated with it. After destruction,
		2980	a plan is no longer valid.
		2981
		2982	@node What RFFTWND Really Computes, , rfftwnd_destroy_plan, Real Multi-dimensional Transforms Reference
		2983	@subsection What RFFTWND Really Computes
		2984	@cindex Discrete Fourier Transform
		2985
		2986	The conventions that we follow for the real multi-dimensional transform
		2987	are analogous to those for the complex multi-dimensional transform. In
		2988	particular, the forward transform has a negative sign in the exponent
		2989	and neither the forward nor the backward transforms will perform any
		2990	normalization. Computing the backward transform of the forward
		2991	transform will multiply the array by the product of its dimensions (that
		2992	is, the logical dimensions of the real data). The forward transform is
		2993	real-to-complex and the backward transform is complex-to-real.
		2994
		2995	@cindex Discrete Fourier Transform
		2996	@cindex hermitian array
		2997	@iftex
		2998	@tex
		2999	The exact mathematical definition of our real multi-dimensional
		3000	transform follows.
		3001
		3002	@noindent@emph{Real to complex (forward) transform.}
		3003	Let $X$ be a $d$-dimensional real array whose elements are $X[j_1, j_2,
		3004	\ldots, j_d]$, where $0 \leq j_s < n_s$ for all~$s \in \{ 1, 2, \ldots,
		3005	d \}$. Let also $\omega_s = e^{2\pi \sqrt{-1}/n_s}$, for all ~$s \in \{
		3006	1, 2, \ldots, d \}$.
		3007
		3008	The real to complex transform computes a complex array~$Y$, whose
		3009	structure is the same as that of~$X$, defined by
		3010
		3011	$$
		3012	Y[i_1, i_2, \ldots, i_d] =
		3013	\sum_{j_1 = 0}^{n_1 - 1}
		3014	\sum_{j_2 = 0}^{n_2 - 1}
		3015	\cdots
		3016	\sum_{j_d = 0}^{n_d - 1}
		3017	X[j_1, j_2, \ldots, j_d]
		3018	\omega_1^{-i_1 j_1}
		3019	\omega_2^{-i_2 j_2}
		3020	\cdots
		3021	\omega_d^{-i_d j_d} \ .
		3022	$$
		3023
		3024	The output array $Y$ enjoys a multidimensional hermitian symmetry, that
		3025	is, the identity $Y[i_1, i_2, \ldots, i_d] = Y[n_1-i_1, n_2-i_2, \ldots,
		3026	n_d - i_d]^{*}$ holds for all $0 \leq i_s < n_s$. Because of this
		3027	symmetry, $Y$ is stored in the peculiar way described in @ref{Array
		3028	Dimensions for Real Multi-dimensional Transforms}.
		3029	@cindex hermitian array
		3030
		3031	@noindent@emph{Complex to real (backward) transform.} Let $X$ be a
		3032	$d$-dimensional complex array whose elements are $X[j_1, j_2, \ldots,
		3033	j_d]$, where $0 \leq j_s < n_s$ for all~$s \in \{ 1, 2, \ldots, d \}$.
		3034	The array $X$ must be hermitian, that is, the identity $X[j_1, j_2,
		3035	\ldots, j_d] = X[n_1-j_1, n_2-j_2, \ldots, n_d - j_d]^{*}$ must hold for all
		3036	$0 \leq j_s < n_s$. Moreover, $X$ must be stored in memory in the
		3037	peculiar way described in @ref{Array Dimensions for Real
		3038	Multi-dimensional Transforms}.
		3039
		3040	Let $\omega_s = e^{2\pi \sqrt{-1}/n_s}$, for all ~$s \in \{ 1, 2,
		3041	\ldots, d \}$. The complex to real transform computes a real array~$Y$, whose
		3042	structure is the same as that of~$X$, defined by
		3043
		3044	$$
		3045	Y[i_1, i_2, \ldots, i_d] =
		3046	\sum_{j_1 = 0}^{n_1 - 1}
		3047	\sum_{j_2 = 0}^{n_2 - 1}
		3048	\cdots
		3049	\sum_{j_d = 0}^{n_d - 1}
		3050	X[j_1, j_2, \ldots, j_d]
		3051	\omega_1^{i_1 j_1}
		3052	\omega_2^{i_2 j_2}
		3053	\cdots
		3054	\omega_d^{i_d j_d} \ .
		3055	$$
		3056
		3057	(That $Y$ is real is not meant to be obvious, although the proof is
		3058	easy.)
		3059
		3060	Computing the forward transform followed by the backward transform
		3061	will multiply the array by $\prod_{s=1}^{d} n_d$.
		3062	@end tex
		3063	@end iftex
		3064	@ifinfo
		3065	The @TeX{} version of this manual contains the exact definition of the
		3066	@math{n}-dimensional transform RFFTWND uses. It is not possible to
		3067	display the definition on a ASCII terminal properly.
		3068	@end ifinfo
		3069	@ifhtml
		3070	The Gods forbade using HTML to display mathematical formulas. Please
		3071	see the TeX or Postscript version of this manual for the proper
		3072	definition of the n-dimensional real Fourier transform that RFFTW
		3073	uses. For completeness, we include a bitmap of the TeX output below:
		3074	<P><center><IMG SRC="equation-4.gif" ALIGN="top"></center>
		3075	@end ifhtml
		3076
		3077
		3078	@c -------------------------------------------------------
		3079	@node Wisdom Reference, Memory Allocator Reference, Real Multi-dimensional Transforms Reference, FFTW Reference
		3080	@section Wisdom Reference
		3081	@menu
		3082	* fftw_export_wisdom::
		3083	* fftw_import_wisdom::
		3084	* fftw_forget_wisdom::
		3085	@end menu
		3086
		3087	@cindex wisdom
		3088	@node fftw_export_wisdom, fftw_import_wisdom, Wisdom Reference, Wisdom Reference
		3089	@subsection Exporting Wisdom
		3090
		3091	@example
		3092	#include <fftw.h>
		3093
		3094	void fftw_export_wisdom(void (emitter)(char c, void ), void *data);
		3095	void fftw_export_wisdom_to_file(FILE *output_file);
		3096	char *fftw_export_wisdom_to_string(void);
		3097	@end example
		3098	@findex fftw_export_wisdom
		3099	@findex fftw_export_wisdom_to_file
		3100	@findex fftw_export_wisdom_to_string
		3101
		3102	These functions allow you to export all currently accumulated
		3103	@code{wisdom} in a form from which it can be later imported and
		3104	restored, even during a separate run of the program. (@xref{Words of Wisdom}.) The current store of @code{wisdom} is not
		3105	affected by calling any of these routines.
		3106
		3107	@code{fftw_export_wisdom} exports the @code{wisdom} to any output
		3108	medium, as specified by the callback function
		3109	@code{emitter}. @code{emitter} is a @code{putc}-like function that
		3110	writes the character @code{c} to some output; its second parameter is
		3111	the @code{data} pointer passed to @code{fftw_export_wisdom}. For
		3112	convenience, the following two ``wrapper'' routines are provided:
		3113
		3114	@code{fftw_export_wisdom_to_file} writes the @code{wisdom} to the
		3115	current position in @code{output_file}, which should be open with write
		3116	permission. Upon exit, the file remains open and is positioned at the
		3117	end of the @code{wisdom} data.
		3118
		3119	@code{fftw_export_wisdom_to_string} returns a pointer to a
		3120	@code{NULL}-terminated string holding the @code{wisdom} data. This
		3121	string is dynamically allocated, and it is the responsibility of the
		3122	caller to deallocate it with @code{fftw_free} when it is no longer
		3123	needed.
		3124
		3125	All of these routines export the wisdom in the same format, which we
		3126	will not document here except to say that it is LISP-like ASCII text
		3127	that is insensitive to white space.
		3128
		3129	@node fftw_import_wisdom, fftw_forget_wisdom, fftw_export_wisdom, Wisdom Reference
		3130	@subsection Importing Wisdom
		3131
		3132	@example
		3133	#include <fftw.h>
		3134
		3135	fftw_status fftw_import_wisdom(int (get_input)(void ), void *data);
		3136	fftw_status fftw_import_wisdom_from_file(FILE *input_file);
		3137	fftw_status fftw_import_wisdom_from_string(const char *input_string);
		3138	@end example
		3139	@findex fftw_import_wisdom
		3140	@findex fftw_import_wisdom_from_file
		3141	@findex fftw_import_wisdom_from_string
		3142
		3143	These functions import @code{wisdom} into a program from data stored by
		3144	the @code{fftw_export_wisdom} functions above. (@xref{Words of Wisdom}.)
		3145	The imported @code{wisdom} supplements rather than replaces any
		3146	@code{wisdom} already accumulated by the running program (except when
		3147	there is conflicting @code{wisdom}, in which case the existing wisdom is
		3148	replaced).
		3149
		3150	@code{fftw_import_wisdom} imports @code{wisdom} from any input medium,
		3151	as specified by the callback function @code{get_input}. @code{get_input}
		3152	is a @code{getc}-like function that returns the next character in the
		3153	input; its parameter is the @code{data} pointer passed to
		3154	@code{fftw_import_wisdom}. If the end of the input data is reached
		3155	(which should never happen for valid data), it may return either
		3156	@code{NULL} (ASCII 0) or @code{EOF} (as defined in @code{<stdio.h>}).
		3157	For convenience, the following two ``wrapper'' routines are provided:
		3158
		3159	@code{fftw_import_wisdom_from_file} reads @code{wisdom} from the
		3160	current position in @code{input_file}, which should be open with read
		3161	permission. Upon exit, the file remains open and is positioned at the
		3162	end of the @code{wisdom} data.
		3163
		3164	@code{fftw_import_wisdom_from_string} reads @code{wisdom} from the
		3165	@code{NULL}-terminated string @code{input_string}.
		3166
		3167	The return value of these routines is @code{FFTW_SUCCESS} if the wisdom
		3168	was read successfully, and @code{FFTW_FAILURE} otherwise. Note that, in
		3169	all of these functions, any data in the input stream past the end of the
		3170	@code{wisdom} data is simply ignored (it is not even read if the
		3171	@code{wisdom} data is well-formed).
		3172
		3173	@node fftw_forget_wisdom, , fftw_import_wisdom, Wisdom Reference
		3174	@subsection Forgetting Wisdom
		3175
		3176	@example
		3177	#include <fftw.h>
		3178
		3179	void fftw_forget_wisdom(void);
		3180	@end example
		3181	@findex fftw_forget_wisdom
		3182
		3183	Calling @code{fftw_forget_wisdom} causes all accumulated @code{wisdom}
		3184	to be discarded and its associated memory to be freed. (New
		3185	@code{wisdom} can still be gathered subsequently, however.)
		3186
		3187	@c -------------------------------------------------------
		3188	@node Memory Allocator Reference, Thread safety, Wisdom Reference, FFTW Reference
		3189	@section Memory Allocator Reference
		3190
		3191	@example
		3192	#include <fftw.h>
		3193
		3194	void (fftw_malloc_hook) (size_t n);
		3195	void (fftw_free_hook) (void p);
		3196	@end example
		3197	@vindex fftw_malloc_hook
		3198	@findex fftw_malloc
		3199	@ffindex malloc
		3200	@vindex fftw_free_hook
		3201
		3202	Whenever it has to allocate and release memory, FFTW ordinarily calls
		3203	@code{malloc} and @code{free}.
		3204	If @code{malloc} fails, FFTW prints an error message and exits. This
		3205	behavior may be undesirable in some applications. Also, special
		3206	memory-handling functions may be necessary in certain
		3207	environments. Consequently, FFTW provides means by which you can install
		3208	your own memory allocator and take whatever error-correcting action you
		3209	find appropriate. The variables @code{fftw_malloc_hook} and
		3210	@code{fftw_free_hook} are pointers to functions, and they are normally
		3211	@code{NULL}. If you set those variables to point to other functions,
		3212	then FFTW will use your routines instead of @code{malloc} and
		3213	@code{free}. @code{fftw_malloc_hook} must point to a @code{malloc}-like
		3214	function, and @code{fftw_free_hook} must point to a @code{free}-like
		3215	function.
		3216
		3217	@c -------------------------------------------------------
		3218	@node Thread safety, , Memory Allocator Reference, FFTW Reference
		3219	@section Thread safety
		3220
		3221	@cindex threads
		3222	@cindex thread safety
		3223	Users writing multi-threaded programs must concern themselves with the
		3224	@dfn{thread safety} of the libraries they use---that is, whether it is
		3225	safe to call routines in parallel from multiple threads. FFTW can be
		3226	used in such an environment, but some care must be taken because certain
		3227	parts of FFTW use private global variables to share data between calls.
		3228	In particular, the plan-creation functions share trigonometric tables
		3229	and accumulated @code{wisdom}. (Users should note that these comments
		3230	only apply to programs using shared-memory threads. Parallelism using
		3231	MPI or forked processes involves a separate address-space and global
		3232	variables for each process, and is not susceptible to problems of this
		3233	sort.)
		3234
		3235	The central restriction of FFTW is that it is not safe to create
		3236	multiple plans in parallel. You must either create all of your plans
		3237	from a single thread, or instead use a semaphore, mutex, or other
		3238	mechanism to ensure that different threads don't attempt to create plans
		3239	at the same time. The same restriction also holds for destruction of
		3240	plans and importing/forgetting @code{wisdom}. Once created, a plan may
		3241	safely be used in any thread.
		3242
		3243	The actual transform routines in FFTW (@code{fftw_one}, etcetera) are
		3244	re-entrant and thread-safe, so it is fine to call them simultaneously
		3245	from multiple threads. Another question arises, however---is it safe to
		3246	use the @emph{same plan} for multiple transforms in parallel? (It would
		3247	be unsafe if, for example, the plan were modified in some way by the
		3248	transform.) We address this question by defining an additional planner
		3249	flag, @code{FFTW_THREADSAFE}.
		3250	@ctindex FFTW_THREADSAFE
		3251	When included in the flags for any of the plan-creation routines,
		3252	@code{FFTW_THREADSAFE} guarantees that the resulting plan will be
		3253	read-only and safe to use in parallel by multiple threads.
		3254
		3255	@c ************************************************************
		3256	@node Parallel FFTW, Calling FFTW from Fortran, FFTW Reference, Top
		3257	@chapter Parallel FFTW
		3258
		3259	@cindex parallel transform
		3260	In this chapter we discuss the use of FFTW in a parallel environment,
		3261	documenting the different parallel libraries that we have provided.
		3262	(Users calling FFTW from a multi-threaded program should also consult
		3263	@ref{Thread safety}.) The FFTW package currently contains three parallel
		3264	transform implementations that leverage the uniprocessor FFTW code:
		3265
		3266	@itemize @bullet
		3267
		3268	@item
		3269	@cindex threads
		3270	The first set of routines utilizes shared-memory threads for parallel
		3271	one- and multi-dimensional transforms of both real and complex data.
		3272	Any program using FFTW can be trivially modified to use the
		3273	multi-threaded routines. This code can use any common threads
		3274	implementation, including POSIX threads. (POSIX threads are available
		3275	on most Unix variants, including Linux.) These routines are located in
		3276	the @code{threads} directory, and are documented in @ref{Multi-threaded
		3277	FFTW}.
		3278
		3279	@item
		3280	@cindex MPI
		3281	@cindex distributed memory
		3282	The @code{mpi} directory contains multi-dimensional transforms
		3283	of real and complex data for parallel machines supporting MPI. It also
		3284	includes parallel one-dimensional transforms for complex data. The main
		3285	feature of this code is that it supports distributed-memory transforms,
		3286	so it runs on everything from workstation clusters to massively-parallel
		3287	supercomputers. More information on MPI can be found at the
		3288	@uref{http://www.mcs.anl.gov/mpi, MPI home page}. The FFTW MPI routines
		3289	are documented in @ref{MPI FFTW}.
		3290
		3291	@item
		3292	@cindex Cilk
		3293	We also have an experimental parallel implementation written in Cilk, a
		3294	C-like parallel language developed at MIT and currently available for
		3295	several SMP platforms. For more information on Cilk see
		3296	@uref{http://supertech.lcs.mit.edu/cilk, the Cilk home page}. The FFTW
		3297	Cilk code can be found in the @code{cilk} directory, with parallelized
		3298	one- and multi-dimensional transforms of complex data. The Cilk FFTW
		3299	routines are documented in @code{cilk/README}.
		3300
		3301	@end itemize
		3302
		3303	@menu
		3304	* Multi-threaded FFTW::
		3305	* MPI FFTW::
		3306	@end menu
		3307
		3308	@c ------------------------------------------------------------
		3309	@node Multi-threaded FFTW, MPI FFTW, Parallel FFTW, Parallel FFTW
		3310	@section Multi-threaded FFTW
		3311
		3312	@cindex threads
		3313	In this section we document the parallel FFTW routines for shared-memory
		3314	threads on SMP hardware. These routines, which support parellel one-
		3315	and multi-dimensional transforms of both real and complex data, are the
		3316	easiest way to take advantage of multiple processors with FFTW. They
		3317	work just like the corresponding uniprocessor transform routines, except
		3318	that they take the number of parallel threads to use as an extra
		3319	parameter. Any program that uses the uniprocessor FFTW can be trivially
		3320	modified to use the multi-threaded FFTW.
		3321
		3322	@menu
		3323	* Installation and Supported Hardware/Software::
		3324	* Usage of Multi-threaded FFTW::
		3325	* How Many Threads to Use?::
		3326	* Using Multi-threaded FFTW in a Multi-threaded Program::
		3327	* Tips for Optimal Threading::
		3328	@end menu
		3329
		3330	@c -------------------------------------------------------
		3331	@node Installation and Supported Hardware/Software, Usage of Multi-threaded FFTW, Multi-threaded FFTW, Multi-threaded FFTW
		3332	@subsection Installation and Supported Hardware/Software
		3333
		3334	All of the FFTW threads code is located in the @code{threads}
		3335	subdirectory of the FFTW package. On Unix systems, the FFTW threads
		3336	libraries and header files can be automatically configured, compiled,
		3337	and installed along with the uniprocessor FFTW libraries simply by
		3338	including @code{--enable-threads} in the flags to the @code{configure}
		3339	script (@pxref{Installation on Unix}). (Note also that the threads
		3340	routines, when enabled, are automatically tested by the @samp{@code{make
		3341	check}} self-tests.)
		3342	@fpindex configure
		3343
		3344	The threads routines require your operating system to have some sort of
		3345	shared-memory threads support. Specifically, the FFTW threads package
		3346	works with POSIX threads (available on most Unix variants, including
		3347	Linux), Solaris threads, @uref{http://www.be.com,BeOS} threads (tested
		3348	on BeOS DR8.2), Mach C threads (reported to work by users), and Win32
		3349	threads (reported to work by users). (There is also untested code to
		3350	use MacOS MP threads.) If you have a shared-memory machine that uses a
		3351	different threads API, it should be a simple matter of programming to
		3352	include support for it; see the file @code{fftw_threads-int.h} for more
		3353	detail.
		3354
		3355	SMP hardware is not required, although of course you need multiple
		3356	processors to get any benefit from the multithreaded transforms.
		3357
		3358	@c -------------------------------------------------------
		3359	@node Usage of Multi-threaded FFTW, How Many Threads to Use?, Installation and Supported Hardware/Software, Multi-threaded FFTW
		3360	@subsection Usage of Multi-threaded FFTW
		3361
		3362	Here, it is assumed that the reader is already familiar with the usage
		3363	of the uniprocessor FFTW routines, described elsewhere in this manual.
		3364	We only describe what one has to change in order to use the
		3365	multi-threaded routines.
		3366
		3367	First, instead of including @code{<fftw.h>} or @code{<rfftw.h>}, you
		3368	should include the files @code{<fftw_threads.h>} or
		3369	@code{<rfftw_threads.h>}, respectively.
		3370
		3371	Second, before calling any FFTW routines, you should call the function:
		3372
		3373	@example
		3374	int fftw_threads_init(void);
		3375	@end example
		3376	@findex fftw_threads_init
		3377
		3378	This function, which should only be called once (probably in your
		3379	@code{main()} function), performs any one-time initialization required
		3380	to use threads on your system. It returns zero if successful, and a
		3381	non-zero value if there was an error (in which case, something is
		3382	seriously wrong and you should probably exit the program).
		3383
		3384	Third, when you want to actually compute the transform, you should use
		3385	one of the following transform routines instead of the ordinary FFTW
		3386	functions:
		3387
		3388	@example
		3389	fftw_threads(nthreads, plan, howmany, in, istride,
		3390	idist, out, ostride, odist);
		3391	@findex fftw_threads
		3392
		3393	fftw_threads_one(nthreads, plan, in, out);
		3394	@findex fftw_threads_one
		3395
		3396	fftwnd_threads(nthreads, plan, howmany, in, istride,
		3397	idist, out, ostride, odist);
		3398	@findex fftwnd_threads
		3399
		3400	fftwnd_threads_one(nthreads, plan, in, out);
		3401	@findex fftwnd_threads_one
		3402
		3403	rfftw_threads(nthreads, plan, howmany, in, istride,
		3404	idist, out, ostride, odist);
		3405	@findex rfftw_threads
		3406
		3407	rfftw_threads_one(nthreads, plan, in, out);
		3408	@findex rfftw_threads_one
		3409
		3410	rfftwnd_threads_real_to_complex(nthreads, plan, howmany, in,
		3411	istride, idist, out, ostride, odist);
		3412	@findex rfftwnd_threads_real_to_complex
		3413
		3414	rfftwnd_threads_one_real_to_complex(nthreads, plan, in, out);
		3415	@findex rfftwnd_threads_one_real_to_complex
		3416
		3417	rfftwnd_threads_complex_to_real(nthreads, plan, howmany, in,
		3418	istride, idist, out, ostride, odist);
		3419	@findex rfftwnd_threads_complex_to_real
		3420
		3421	rfftwnd_threads_one_real_to_complex(nthreads, plan, in, out);
		3422	@findex rfftwnd_threads_one_real_to_complex
		3423
		3424	rfftwnd_threads_one_complex_to_real(nthreads, plan, in, out);
		3425	@findex rfftwnd_threads_one_complex_to_real
		3426	@end example
		3427
		3428	All of these routines take exactly the same arguments and have exactly
		3429	the same effects as their uniprocessor counterparts (i.e. without the
		3430	@samp{@code{_threads}}) @emph{except} that they take one extra
		3431	parameter, @code{nthreads} (of type @code{int}), before the normal
		3432	parameters.@footnote{There is one exception: when performing
		3433	one-dimensional in-place transforms, the @code{out} parameter is always
		3434	ignored by the multi-threaded routines, instead of being used as a
		3435	workspace if it is non-@code{NULL} as in the uniprocessor routines. The
		3436	multi-threaded routines always allocate their own workspace (the size of
		3437	which depends upon the number of threads).} The @code{nthreads}
		3438	parameter specifies the number of threads of execution to use when
		3439	performing the transform (actually, the maximum number of threads).
		3440	@cindex number of threads
		3441
		3442	For example, to parallelize a single one-dimensional transform of
		3443	complex data, instead of calling the uniprocessor @code{fftw_one(plan,
		3444	in, out)}, you would call @code{fftw_threads_one(nthreads, plan, in,
		3445	out)}. Passing an @code{nthreads} of @code{1} means to use only one
		3446	thread (the main thread), and is equivalent to calling the uniprocessor
		3447	routine. Passing an @code{nthreads} of @code{2} means that the
		3448	transform is potentially parallelized over two threads (and two
		3449	processors, if you have them), and so on.
		3450
		3451	These are the only changes you need to make to your source code. Calls
		3452	to all other FFTW routines (plan creation, destruction, wisdom,
		3453	etcetera) are not parallelized and remain the same. (The same plans and
		3454	wisdom are used by both uniprocessor and multi-threaded transforms.)
		3455	Your arrays are allocated and formatted in the same way, and so on.
		3456
		3457	Programs using the parallel complex transforms should be linked with
		3458	@code{-lfftw_threads -lfftw -lm} on Unix. Programs using the parallel
		3459	real transforms should be linked with @code{-lrfftw_threads
		3460	-lfftw_threads -lrfftw -lfftw -lm}. You will also need to link with
		3461	whatever library is responsible for threads on your system
		3462	(e.g. @code{-lpthread} on Linux).
		3463	@cindex linking on Unix
		3464
		3465	@c -------------------------------------------------------
		3466	@node How Many Threads to Use?, Using Multi-threaded FFTW in a Multi-threaded Program, Usage of Multi-threaded FFTW, Multi-threaded FFTW
		3467	@subsection How Many Threads to Use?
		3468
		3469	@cindex number of threads
		3470	There is a fair amount of overhead involved in spawning and synchronizing
		3471	threads, so the optimal number of threads to use depends upon the size
		3472	of the transform as well as on the number of processors you have.
		3473
		3474	As a general rule, you don't want to use more threads than you have
		3475	processors. (Using more threads will work, but there will be extra
		3476	overhead with no benefit.) In fact, if the problem size is too small,
		3477	you may want to use fewer threads than you have processors.
		3478
		3479	You will have to experiment with your system to see what level of
		3480	parallelization is best for your problem size. Useful tools to help you
		3481	do this are the test programs that are automatically compiled along with
		3482	the threads libraries, @code{fftw_threads_test} and
		3483	@code{rfftw_threads_test} (in the @code{threads} subdirectory). These
		3484	@pindex fftw_threads_test
		3485	@pindex rfftw_threads_test
		3486	take the same arguments as the other FFTW test programs (see
		3487	@code{tests/README}), except that they also take the number of threads
		3488	to use as a first argument, and report the parallel speedup in speed
		3489	tests. For example,
		3490
		3491	@example
		3492	fftw_threads_test 2 -s 128x128
		3493	@end example
		3494
		3495	will benchmark complex 128x128 transforms using two threads and report
		3496	the speedup relative to the uniprocessor transform.
		3497	@cindex benchmark
		3498
		3499	For instance, on a 4-processor 200MHz Pentium Pro system running Linux
		3500	2.2.0, we found that the "crossover" point at which 2 threads became
		3501	beneficial for complex transforms was about 4k points, while 4 threads
		3502	became beneficial at 8k points.
		3503
		3504	@c -------------------------------------------------------
		3505	@node Using Multi-threaded FFTW in a Multi-threaded Program, Tips for Optimal Threading, How Many Threads to Use?, Multi-threaded FFTW
		3506	@subsection Using Multi-threaded FFTW in a Multi-threaded Program
		3507
		3508	@cindex thread safety
		3509	It is perfectly possible to use the multi-threaded FFTW routines from a
		3510	multi-threaded program (e.g. have multiple threads computing
		3511	multi-threaded transforms simultaneously). If you have the processors,
		3512	more power to you! However, the same restrictions apply as for the
		3513	uniprocessor FFTW routines (@pxref{Thread safety}). In particular, you
		3514	should recall that you may not create or destroy plans in parallel.
		3515
		3516	@c -------------------------------------------------------
		3517	@node Tips for Optimal Threading, , Using Multi-threaded FFTW in a Multi-threaded Program, Multi-threaded FFTW
		3518	@subsection Tips for Optimal Threading
		3519
		3520	Not all transforms are equally well-parallelized by the multi-threaded
		3521	FFTW routines. (This is merely a consequence of laziness on the part of
		3522	the implementors, and is not inherent to the algorithms employed.)
		3523	Mainly, the limitations are in the parallel one-dimensional transforms.
		3524	The things to avoid if you want optimal parallelization are as follows:
		3525
		3526	@subsection Parallelization deficiencies in one-dimensional transforms
		3527
		3528	@itemize @bullet
		3529
		3530	@item
		3531	Large prime factors can sometimes parallelize poorly. Of course, you
		3532	should avoid these anyway if you want high performance.
		3533
		3534	@item
		3535	@cindex in-place transform
		3536	Single in-place transforms don't parallelize completely. (Multiple
		3537	in-place transforms, i.e. @code{howmany > 1}, are fine.) Again, you
		3538	should avoid these in any case if you want high performance, as they
		3539	require transforming to a scratch array and copying back.
		3540
		3541	@item
		3542	Single real-complex (@code{rfftw}) transforms don't parallelize
		3543	completely. This is unfortunate, but parallelizing this correctly would
		3544	have involved a lot of extra code (and a much larger library). You
		3545	still get some benefit from additional processors, but if you have a
		3546	very large number of processors you will probably be better off using
		3547	the parallel complex (@code{fftw}) transforms. Note that
		3548	multi-dimensional real transforms or multiple one-dimensional real
		3549	transforms are fine.
		3550
		3551	@end itemize
		3552
		3553	@c ------------------------------------------------------------
		3554	@node MPI FFTW, , Multi-threaded FFTW, Parallel FFTW
		3555	@section MPI FFTW
		3556
		3557	@cindex MPI
		3558	This section describes the MPI FFTW routines for distributed-memory (and
		3559	shared-memory) machines supporting MPI (Message Passing Interface). The
		3560	MPI routines are significantly different from the ordinary FFTW because
		3561	the transform data here are @emph{distributed} over multiple processes,
		3562	so that each process gets only a portion of the array.
		3563	@cindex distributed memory
		3564	Currently, multi-dimensional transforms of both real and complex data,
		3565	as well as one-dimensional transforms of complex data, are supported.
		3566
		3567	@menu
		3568	* MPI FFTW Installation::
		3569	* Usage of MPI FFTW for Complex Multi-dimensional Transforms::
		3570	* MPI Data Layout::
		3571	* Usage of MPI FFTW for Real Multi-dimensional Transforms::
		3572	* Usage of MPI FFTW for Complex One-dimensional Transforms::
		3573	* MPI Tips::
		3574	@end menu
		3575
		3576	@c -------------------------------------------------------
		3577	@node MPI FFTW Installation, Usage of MPI FFTW for Complex Multi-dimensional Transforms, MPI FFTW, MPI FFTW
		3578	@subsection MPI FFTW Installation
		3579
		3580	The FFTW MPI library code is all located in the @code{mpi} subdirectoy
		3581	of the FFTW package (along with source code for test programs). On Unix
		3582	systems, the FFTW MPI libraries and header files can be automatically
		3583	configured, compiled, and installed along with the uniprocessor FFTW
		3584	libraries simply by including @code{--enable-mpi} in the flags to the
		3585	@code{configure} script (@pxref{Installation on Unix}).
		3586	@fpindex configure
		3587
		3588	The only requirement of the FFTW MPI code is that you have the standard
		3589	MPI 1.1 (or later) libraries and header files installed on your system.
		3590	A free implementation of MPI is available from
		3591	@uref{http://www-unix.mcs.anl.gov/mpi/mpich/,the MPICH home page}.
		3592
		3593	Previous versions of the FFTW MPI routines have had an unfortunate
		3594	tendency to expose bugs in MPI implementations. The current version has
		3595	been largely rewritten, and hopefully avoids some of the problems. If
		3596	you run into difficulties, try passing the optional workspace to
		3597	@code{(r)fftwnd_mpi} (see below), as this allows us to use the standard
		3598	(and hopefully well-tested) @code{MPI_Alltoall} primitive for
		3599	@ffindex MPI_Alltoall
		3600	communications. Please let us know (@email{fftw@@theory.lcs.mit.edu})
		3601	how things work out.
		3602
		3603	@pindex fftw_mpi_test
		3604	@pindex rfftw_mpi_test
		3605	Several test programs are included in the @code{mpi} directory. The
		3606	ones most useful to you are probably the @code{fftw_mpi_test} and
		3607	@code{rfftw_mpi_test} programs, which are run just like an ordinary MPI
		3608	program and accept the same parameters as the other FFTW test programs
		3609	(c.f. @code{tests/README}). For example, @code{mpirun @i{...params...}
		3610	fftw_mpi_test -r 0} will run non-terminating complex-transform
		3611	correctness tests of random dimensions. They can also do performance
		3612	benchmarks.
		3613
		3614	@c -------------------------------------------------------
		3615	@node Usage of MPI FFTW for Complex Multi-dimensional Transforms, MPI Data Layout, MPI FFTW Installation, MPI FFTW
		3616	@subsection Usage of MPI FFTW for Complex Multi-dimensional Transforms
		3617
		3618	Usage of the MPI FFTW routines is similar to that of the uniprocessor
		3619	FFTW. We assume that the reader already understands the usage of the
		3620	uniprocessor FFTW routines, described elsewhere in this manual. Some
		3621	familiarity with MPI is also helpful.
		3622
		3623	A typical program performing a complex two-dimensional MPI transform
		3624	might look something like:
		3625
		3626	@example
		3627	#include <fftw_mpi.h>
		3628
		3629	int main(int argc, char **argv)
		3630	@{
		3631	const int NX = ..., NY = ...;
		3632	fftwnd_mpi_plan plan;
		3633	fftw_complex *data;
		3634
		3635	MPI_Init(&argc,&argv);
		3636
		3637	plan = fftw2d_mpi_create_plan(MPI_COMM_WORLD,
		3638	NX, NY,
		3639	FFTW_FORWARD, FFTW_ESTIMATE);
		3640
		3641	...allocate and initialize data...
		3642
		3643	fftwnd_mpi(p, 1, data, NULL, FFTW_NORMAL_ORDER);
		3644
		3645	...
		3646
		3647	fftwnd_mpi_destroy_plan(plan);
		3648	MPI_Finalize();
		3649	@}
		3650	@end example
		3651
		3652	The calls to @code{MPI_Init} and @code{MPI_Finalize} are required in all
		3653	@ffindex MPI_Init
		3654	@ffindex MPI_Finalize
		3655	MPI programs; see the @uref{http://www.mcs.anl.gov/mpi/,MPI home page}
		3656	for more information. Note that all of your processes run the program
		3657	in parallel, as a group; there is no explicit launching of
		3658	threads/processes in an MPI program.
		3659
		3660	@cindex plan
		3661	As in the ordinary FFTW, the first thing we do is to create a plan (of
		3662	type @code{fftwnd_mpi_plan}), using:
		3663
		3664	@example
		3665	fftwnd_mpi_plan fftw2d_mpi_create_plan(MPI_Comm comm,
		3666	int nx, int ny,
		3667	fftw_direction dir, int flags);
		3668	@end example
		3669	@findex fftw2d_mpi_create_plan
		3670	@tindex fftwnd_mpi_plan
		3671
		3672	Except for the first argument, the parameters are identical to those of
		3673	@code{fftw2d_create_plan}. (There are also analogous
		3674	@code{fftwnd_mpi_create_plan} and @code{fftw3d_mpi_create_plan}
		3675	functions. Transforms of any rank greater than one are supported.)
		3676	@findex fftwnd_mpi_create_plan
		3677	@findex fftw3d_mpi_create_plan
		3678	The first argument is an MPI @dfn{communicator}, which specifies the
		3679	group of processes that are to be involved in the transform; the
		3680	standard constant @code{MPI_COMM_WORLD} indicates all available
		3681	processes.
		3682	@fcindex MPI_COMM_WORLD
		3683
		3684	Next, one has to allocate and initialize the data. This is somewhat
		3685	tricky, because the transform data is distributed across the processes
		3686	involved in the transform. It is discussed in detail by the next
		3687	section (@pxref{MPI Data Layout}).
		3688
		3689	The actual computation of the transform is performed by the function
		3690	@code{fftwnd_mpi}, which differs somewhat from its uniprocessor
		3691	equivalent and is described by:
		3692
		3693	@example
		3694	void fftwnd_mpi(fftwnd_mpi_plan p,
		3695	int n_fields,
		3696	fftw_complex local_data, fftw_complex work,
		3697	fftwnd_mpi_output_order output_order);
		3698	@end example
		3699	@findex fftwnd_mpi
		3700
		3701	There are several things to notice here:
		3702
		3703	@itemize @bullet
		3704
		3705	@item
		3706	@cindex in-place transform
		3707	First of all, all @code{fftw_mpi} transforms are in-place: the output is
		3708	in the @code{local_data} parameter, and there is no need to specify
		3709	@code{FFTW_IN_PLACE} in the plan flags.
		3710
		3711	@item
		3712	@cindex n_fields
		3713	@cindex stride
		3714	The MPI transforms also only support a limited subset of the
		3715	@code{howmany}/@code{stride}/@code{dist} functionality of the
		3716	uniprocessor routines: the @code{n_fields} parameter is equivalent to
		3717	@code{howmany=n_fields}, @code{stride=n_fields}, and @code{dist=1}.
		3718	(Conceptually, the @code{n_fields} parameter allows you to transform an
		3719	array of contiguous vectors, each with length @code{n_fields}.)
		3720	@code{n_fields} is @code{1} if you are only transforming a single,
		3721	ordinary array.
		3722
		3723	@item
		3724	The @code{work} parameter is an optional workspace. If it is not
		3725	@code{NULL}, it should be exactly the same size as the @code{local_data}
		3726	array. If it is provided, FFTW is able to use the built-in
		3727	@code{MPI_Alltoall} primitive for (often) greater efficiency at the
		3728	@ffindex MPI_Alltoall
		3729	expense of extra storage space.
		3730
		3731	@item
		3732	Finally, the last parameter specifies whether the output data has the
		3733	same ordering as the input data (@code{FFTW_NORMAL_ORDER}), or if it is
		3734	transposed (@code{FFTW_TRANSPOSED_ORDER}). Leaving the data transposed
		3735	@ctindex FFTW_NORMAL_ORDER
		3736	@ctindex FFTW_TRANSPOSED_ORDER
		3737	results in significant performance improvements due to a saved
		3738	communication step (needed to un-transpose the data). Specifically, the
		3739	first two dimensions of the array are transposed, as is described in
		3740	more detail by the next section.
		3741
		3742	@end itemize
		3743
		3744	@cindex normalization
		3745	The output of @code{fftwnd_mpi} is identical to that of the
		3746	corresponding uniprocessor transform. In particular, you should recall
		3747	our conventions for normalization and the sign of the transform
		3748	exponent.
		3749
		3750	The same plan can be used to compute many transforms of the same size.
		3751	After you are done with it, you should deallocate it by calling
		3752	@code{fftwnd_mpi_destroy_plan}.
		3753	@findex fftwnd_mpi_destroy_plan
		3754
		3755	@cindex blocking
		3756	@ffindex MPI_Barrier
		3757	@b{Important:} The FFTW MPI routines must be called in the same order by
		3758	all processes involved in the transform. You should assume that they
		3759	all are blocking, as if each contained a call to @code{MPI_Barrier}.
		3760
		3761	Programs using the FFTW MPI routines should be linked with
		3762	@code{-lfftw_mpi -lfftw -lm} on Unix, in addition to whatever libraries
		3763	are required for MPI.
		3764	@cindex linking on Unix
		3765
		3766	@c -------------------------------------------------------
		3767	@node MPI Data Layout, Usage of MPI FFTW for Real Multi-dimensional Transforms, Usage of MPI FFTW for Complex Multi-dimensional Transforms, MPI FFTW
		3768	@subsection MPI Data Layout
		3769
		3770	@cindex distributed memory
		3771	@cindex distributed array format
		3772	The transform data used by the MPI FFTW routines is @dfn{distributed}: a
		3773	distinct portion of it resides with each process involved in the
		3774	transform. This allows the transform to be parallelized, for example,
		3775	over a cluster of workstations, each with its own separate memory, so
		3776	that you can take advantage of the total memory of all the processors
		3777	you are parallelizing over.
		3778
		3779	In particular, the array is divided according to the rows (first
		3780	dimension) of the data: each process gets a subset of the rows of the
		3781	@cindex slab decomposition
		3782	data. (This is sometimes called a ``slab decomposition.'') One
		3783	consequence of this is that you can't take advantage of more processors
		3784	than you have rows (e.g. @code{64x64x64} matrix can at most use 64
		3785	processors). This isn't usually much of a limitation, however, as each
		3786	processor needs a fair amount of data in order for the
		3787	parallel-computation benefits to outweight the communications costs.
		3788
		3789	Below, the first dimension of the data will be referred to as
		3790	@samp{@code{x}} and the second dimension as @samp{@code{y}}.
		3791
		3792	FFTW supplies a routine to tell you exactly how much data resides on the
		3793	current process:
		3794
		3795	@example
		3796	void fftwnd_mpi_local_sizes(fftwnd_mpi_plan p,
		3797	int *local_nx,
		3798	int *local_x_start,
		3799	int *local_ny_after_transpose,
		3800	int *local_y_start_after_transpose,
		3801	int *total_local_size);
		3802	@end example
		3803	@findex fftwnd_mpi_local_sizes
		3804
		3805	Given a plan @code{p}, the other parameters of this routine are set to
		3806	values describing the required data layout, described below.
		3807
		3808	@code{total_local_size} is the number of @code{fftw_complex} elements
		3809	that you must allocate for your local data (and workspace, if you
		3810	choose). (This value should, of course, be multiplied by
		3811	@code{n_fields} if that parameter to @code{fftwnd_mpi} is not @code{1}.)
		3812
		3813	The data on the current process has @code{local_nx} rows, starting at
		3814	row @code{local_x_start}. If @code{fftwnd_mpi} is called with
		3815	@code{FFTW_TRANSPOSED_ORDER} output, then @code{y} will be the first
		3816	dimension of the output, and the local @code{y} extent will be given by
		3817	@code{local_ny_after_transpose} and
		3818	@code{local_y_start_after_transpose}. Otherwise, the output has the
		3819	same dimensions and layout as the input.
		3820
		3821	For instance, suppose you want to transform three-dimensional data of
		3822	size @code{nx x ny x nz}. Then, the current process will store a subset
		3823	of this data, of size @code{local_nx x ny x nz}, where the @code{x}
		3824	indices correspond to the range @code{local_x_start} to
		3825	@code{local_x_start+local_nx-1} in the ``real'' (i.e. logical) array.
		3826	If @code{fftwnd_mpi} is called with @code{FFTW_TRANSPOSED_ORDER} output,
		3827	@ctindex FFTW_TRANSPOSED_ORDER
		3828	then the result will be a @code{ny x nx x nz} array, of which a
		3829	@code{local_ny_after_transpose x nx x nz} subset is stored on the
		3830	current process (corresponding to @code{y} values starting at
		3831	@code{local_y_start_after_transpose}).
		3832
		3833	The following is an example of allocating such a three-dimensional array
		3834	array (@code{local_data}) before the transform and initializing it to
		3835	some function @code{f(x,y,z)}:
		3836
		3837	@example
		3838	fftwnd_mpi_local_sizes(plan, &local_nx, &local_x_start,
		3839	&local_ny_after_transpose,
		3840	&local_y_start_after_transpose,
		3841	&total_local_size);
		3842
		3843	local_data = (fftw_complex) malloc(sizeof(fftw_complex)
		3844	total_local_size);
		3845
		3846	for (x = 0; x < local_nx; ++x)
		3847	for (y = 0; y < ny; ++y)
		3848	for (z = 0; z < nz; ++z)
		3849	local_data[(xny + y)nz + z]
		3850	= f(x + local_x_start, y, z);
		3851	@end example
		3852
		3853	Some important things to remember:
		3854
		3855	@itemize @bullet
		3856
		3857	@item
		3858	Although the local data is of dimensions @code{local_nx x ny x nz} in
		3859	the above example, do @emph{not} allocate the array to be of size
		3860	@code{local_nxnynz}. Use @code{total_local_size} instead.
		3861
		3862	@item
		3863	The amount of data on each process will not necessarily be the same; in
		3864	fact, @code{local_nx} may even be zero for some processes. (For
		3865	example, suppose you are doing a @code{6x6} transform on four
		3866	processors. There is no way to effectively use the fourth processor in
		3867	a slab decomposition, so we leave it empty. Proof left as an exercise
		3868	for the reader.)
		3869
		3870	@item
		3871	@cindex row-major
		3872	All arrays are, of course, in row-major order (@pxref{Multi-dimensional
		3873	Array Format}).
		3874
		3875	@item
		3876	If you want to compute the inverse transform of the output of
		3877	@code{fftwnd_mpi}, the dimensions of the inverse transform are given by
		3878	the dimensions of the output of the forward transform. For example, if
		3879	you are using @code{FFTW_TRANSPOSED_ORDER} output in the above example,
		3880	then the inverse plan should be created with dimensions @code{ny x nx x
		3881	nz}.
		3882
		3883	@item
		3884	The data layout only depends upon the dimensions of the array, not on
		3885	the plan, so you are guaranteed that different plans for the same size
		3886	(or inverse plans) will use the same (consistent) data layouts.
		3887
		3888	@end itemize
		3889
		3890	@c -------------------------------------------------------
		3891	@node Usage of MPI FFTW for Real Multi-dimensional Transforms, Usage of MPI FFTW for Complex One-dimensional Transforms, MPI Data Layout, MPI FFTW
		3892	@subsection Usage of MPI FFTW for Real Multi-dimensional Transforms
		3893
		3894	MPI transforms specialized for real data are also available, similiar to
		3895	the uniprocessor @code{rfftwnd} transforms. Just as in the uniprocessor
		3896	case, the real-data MPI functions gain roughly a factor of two in speed
		3897	(and save a factor of two in space) at the expense of more complicated
		3898	data formats in the calling program. Before reading this section, you
		3899	should definitely understand how to call the uniprocessor @code{rfftwnd}
		3900	functions and also the complex MPI FFTW functions.
		3901
		3902	The following is an example of a program using @code{rfftwnd_mpi}. It
		3903	computes the size @code{nx x ny x nz} transform of a real function
		3904	@code{f(x,y,z)}, multiplies the imaginary part by @code{2} for fun, then
		3905	computes the inverse transform. (We'll also use
		3906	@code{FFTW_TRANSPOSED_ORDER} output for the transform, and additionally
		3907	supply the optional workspace parameter to @code{rfftwnd_mpi}, just to
		3908	add a little spice.)
		3909
		3910	@example
		3911	#include <rfftw_mpi.h>
		3912
		3913	int main(int argc, char **argv)
		3914	@{
		3915	const int nx = ..., ny = ..., nz = ...;
		3916	int local_nx, local_x_start, local_ny_after_transpose,
		3917	local_y_start_after_transpose, total_local_size;
		3918	int x, y, z;
		3919	rfftwnd_mpi_plan plan, iplan;
		3920	fftw_real data, work;
		3921	fftw_complex *cdata;
		3922
		3923	MPI_Init(&argc,&argv);
		3924
		3925	/* create the forward and backward plans: */
		3926	plan = rfftw3d_mpi_create_plan(MPI_COMM_WORLD,
		3927	nx, ny, nz,
		3928	FFTW_REAL_TO_COMPLEX,
		3929	FFTW_ESTIMATE);
		3930	@findex rfftw3d_mpi_create_plan
		3931	iplan = rfftw3d_mpi_create_plan(MPI_COMM_WORLD,
		3932	/* dim.'s of REAL data --> */ nx, ny, nz,
		3933	FFTW_COMPLEX_TO_REAL,
		3934	FFTW_ESTIMATE);
		3935
		3936	rfftwnd_mpi_local_sizes(plan, &local_nx, &local_x_start,
		3937	&local_ny_after_transpose,
		3938	&local_y_start_after_transpose,
		3939	&total_local_size);
		3940	@findex rfftwnd_mpi_local_sizes
		3941
		3942	data = (fftw_real) malloc(sizeof(fftw_real) total_local_size);
		3943
		3944	/* workspace is the same size as the data: */
		3945	work = (fftw_real) malloc(sizeof(fftw_real) total_local_size);
		3946
		3947	/* initialize data to f(x,y,z): */
		3948	for (x = 0; x < local_nx; ++x)
		3949	for (y = 0; y < ny; ++y)
		3950	for (z = 0; z < nz; ++z)
		3951	data[(xny + y) (2*(nz/2+1)) + z]
		3952	= f(x + local_x_start, y, z);
		3953
		3954	/* Now, compute the forward transform: */
		3955	rfftwnd_mpi(plan, 1, data, work, FFTW_TRANSPOSED_ORDER);
		3956	@findex rfftwnd_mpi
		3957
		3958	/* the data is now complex, so typecast a pointer: */
		3959	cdata = (fftw_complex*) data;
		3960
		3961	/* multiply imaginary part by 2, for fun:
		3962	(note that the data is transposed) */
		3963	for (y = 0; y < local_ny_after_transpose; ++y)
		3964	for (x = 0; x < nx; ++x)
		3965	for (z = 0; z < (nz/2+1); ++z)
		3966	cdata[(ynx + x) (nz/2+1) + z].im
		3967	*= 2.0;
		3968
		3969	/* Finally, compute the inverse transform; the result
		3970	is transposed back to the original data layout: */
		3971	rfftwnd_mpi(iplan, 1, data, work, FFTW_TRANSPOSED_ORDER);
		3972
		3973	free(data);
		3974	free(work);
		3975	rfftwnd_mpi_destroy_plan(plan);
		3976	@findex rfftwnd_mpi_destroy_plan
		3977	rfftwnd_mpi_destroy_plan(iplan);
		3978	MPI_Finalize();
		3979	@}
		3980	@end example
		3981
		3982	There's a lot of stuff in this example, but it's all just what you would
		3983	have guessed, right? We replaced all the @code{fftwnd_mpi*} functions
		3984	by @code{rfftwnd_mpi*}, but otherwise the parameters were pretty much
		3985	the same. The data layout distributed among the processes just like for
		3986	the complex transforms (@pxref{MPI Data Layout}), but in addition the
		3987	final dimension is padded just like it is for the uniprocessor in-place
		3988	real transforms (@pxref{Array Dimensions for Real Multi-dimensional
		3989	Transforms}).
		3990	@cindex padding
		3991	In particular, the @code{z} dimension of the real input data is padded
		3992	to a size @code{2*(nz/2+1)}, and after the transform it contains
		3993	@code{nz/2+1} complex values.
		3994	@cindex distributed array format
		3995	@cindex rfftwnd array format
		3996
		3997	Some other important things to know about the real MPI transforms:
		3998
		3999	@itemize @bullet
		4000
		4001	@item
		4002	As for the uniprocessor @code{rfftwnd_create_plan}, the dimensions
		4003	passed for the @code{FFTW_COMPLEX_TO_REAL} plan are those of the
		4004	@emph{real} data. In particular, even when @code{FFTW_TRANSPOSED_ORDER}
		4005	@ctindex FFTW_COMPLEX_TO_REAL
		4006	@ctindex FFTW_TRANSPOSED_ORDER
		4007	is used as in this case, the dimensions are those of the (untransposed)
		4008	real output, not the (transposed) complex input. (For the complex MPI
		4009	transforms, on the other hand, the dimensions are always those of the
		4010	input array.)
		4011
		4012	@item
		4013	The output ordering of the transform (@code{FFTW_TRANSPOSED_ORDER} or
		4014	@code{FFTW_TRANSPOSED_ORDER}) @emph{must} be the same for both forward
		4015	and backward transforms. (This is not required in the complex case.)
		4016
		4017	@item
		4018	@code{total_local_size} is the required size in @code{fftw_real} values,
		4019	not @code{fftw_complex} values as it is for the complex transforms.
		4020
		4021	@item
		4022	@code{local_ny_after_transpose} and @code{local_y_start_after_transpose}
		4023	describe the portion of the array after the transform; that is, they are
		4024	indices in the complex array for an @code{FFTW_REAL_TO_COMPLEX} transform
		4025	and in the real array for an @code{FFTW_COMPLEX_TO_REAL} transform.
		4026
		4027	@item
		4028	@code{rfftwnd_mpi} always expects @code{fftw_real*} array arguments, but
		4029	of course these pointers can refer to either real or complex arrays,
		4030	depending upon which side of the transform you are on. Just as for
		4031	in-place uniprocessor real transforms (and also in the example above),
		4032	this is most easily handled by typecasting to a complex pointer when
		4033	handling the complex data.
		4034
		4035	@item
		4036	As with the complex transforms, there are also
		4037	@code{rfftwnd_create_plan} and @code{rfftw2d_create_plan} functions, and
		4038	any rank greater than one is supported.
		4039	@findex rfftwnd_create_plan
		4040	@findex rfftw2d_create_plan
		4041
		4042	@end itemize
		4043
		4044	Programs using the MPI FFTW real transforms should link with
		4045	@code{-lrfftw_mpi -lfftw_mpi -lrfftw -lfftw -lm} on Unix.
		4046	@cindex linking on Unix
		4047
		4048	@c -------------------------------------------------------
		4049	@node Usage of MPI FFTW for Complex One-dimensional Transforms, MPI Tips, Usage of MPI FFTW for Real Multi-dimensional Transforms, MPI FFTW
		4050	@subsection Usage of MPI FFTW for Complex One-dimensional Transforms
		4051
		4052	The MPI FFTW also includes routines for parallel one-dimensional
		4053	transforms of complex data (only). Although the speedup is generally
		4054	worse than it is for the multi-dimensional routines,@footnote{The 1D
		4055	transforms require much more communication. All the communication in
		4056	our FFT routines takes the form of an all-to-all communication: the
		4057	multi-dimensional transforms require two all-to-all communications (or
		4058	one, if you use @code{FFTW_TRANSPOSED_ORDER}), while the one-dimensional
		4059	transforms require @emph{three} (or two, if you use scrambled input or
		4060	output).} these distributed-memory one-dimensional transforms are
		4061	especially useful for performing one-dimensional transforms that don't
		4062	fit into the memory of a single machine.
		4063
		4064	The usage of these routines is straightforward, and is similar to that
		4065	of the multi-dimensional MPI transform functions. You first include the
		4066	header @code{<fftw_mpi.h>} and then create a plan by calling:
		4067
		4068	@example
		4069	fftw_mpi_plan fftw_mpi_create_plan(MPI_Comm comm, int n,
		4070	fftw_direction dir, int flags);
		4071	@end example
		4072	@findex fftw_mpi_create_plan
		4073	@tindex fftw_mpi_plan
		4074
		4075	The last three arguments are the same as for @code{fftw_create_plan}
		4076	(except that all MPI transforms are automatically @code{FFTW_IN_PLACE}).
		4077	The first argument specifies the group of processes you are using, and
		4078	is usually @code{MPI_COMM_WORLD} (all processes).
		4079	@fcindex MPI_COMM_WORLD
		4080	A plan can be used for many transforms of the same size, and is
		4081	destroyed when you are done with it by calling
		4082	@code{fftw_mpi_destroy_plan(plan)}.
		4083	@findex fftw_mpi_destroy_plan
		4084
		4085	If you don't care about the ordering of the input or output data of the
		4086	transform, you can include @code{FFTW_SCRAMBLED_INPUT} and/or
		4087	@code{FFTW_SCRAMBLED_OUTPUT} in the @code{flags}.
		4088	@ctindex FFTW_SCRAMBLED_INPUT
		4089	@ctindex FFTW_SCRAMBLED_OUTPUT
		4090	@cindex flags
		4091	These save some communications at the expense of having the input and/or
		4092	output reordered in an undocumented way. For example, if you are
		4093	performing an FFT-based convolution, you might use
		4094	@code{FFTW_SCRAMBLED_OUTPUT} for the forward transform and
		4095	@code{FFTW_SCRAMBLED_INPUT} for the inverse transform.
		4096
		4097	The transform itself is computed by:
		4098
		4099	@example
		4100	void fftw_mpi(fftw_mpi_plan p, int n_fields,
		4101	fftw_complex local_data, fftw_complex work);
		4102	@end example
		4103	@findex fftw_mpi
		4104
		4105	@cindex n_fields
		4106	@code{n_fields}, as in @code{fftwnd_mpi}, is equivalent to
		4107	@code{howmany=n_fields}, @code{stride=n_fields}, and @code{dist=1}, and
		4108	should be @code{1} when you are computing the transform of a single
		4109	array. @code{local_data} contains the portion of the array local to the
		4110	current process, described below. @code{work} is either @code{NULL} or
		4111	an array exactly the same size as @code{local_data}; in the latter case,
		4112	FFTW can use the @code{MPI_Alltoall} communications primitive which is
		4113	(usually) faster at the expense of extra storage. Upon return,
		4114	@code{local_data} contains the portion of the output local to the
		4115	current process (see below).
		4116	@ffindex MPI_Alltoall
		4117
		4118	@cindex distributed array format
		4119	To find out what portion of the array is stored local to the current
		4120	process, you call the following routine:
		4121
		4122	@example
		4123	void fftw_mpi_local_sizes(fftw_mpi_plan p,
		4124	int local_n, int local_start,
		4125	int *local_n_after_transform,
		4126	int *local_start_after_transform,
		4127	int *total_local_size);
		4128	@end example
		4129	@findex fftw_mpi_local_sizes
		4130
		4131	@code{total_local_size} is the number of @code{fftw_complex} elements
		4132	you should actually allocate for @code{local_data} (and @code{work}).
		4133	@code{local_n} and @code{local_start} indicate that the current process
		4134	stores @code{local_n} elements corresponding to the indices
		4135	@code{local_start} to @code{local_start+local_n-1} in the ``real''
		4136	array. @emph{After the transform, the process may store a different
		4137	portion of the array.} The portion of the data stored on the process
		4138	after the transform is given by @code{local_n_after_transform} and
		4139	@code{local_start_after_transform}. This data is exactly the same as a
		4140	contiguous segment of the corresponding uniprocessor transform output
		4141	(i.e. an in-order sequence of sequential frequency bins).
		4142
		4143	Note that, if you compute both a forward and a backward transform of the
		4144	same size, the local sizes are guaranteed to be consistent. That is,
		4145	the local size after the forward transform will be the same as the local
		4146	size before the backward transform, and vice versa.
		4147
		4148	Programs using the FFTW MPI routines should be linked with
		4149	@code{-lfftw_mpi -lfftw -lm} on Unix, in addition to whatever libraries
		4150	are required for MPI.
		4151	@cindex linking on Unix
		4152
		4153	@c -------------------------------------------------------
		4154	@node MPI Tips, , Usage of MPI FFTW for Complex One-dimensional Transforms, MPI FFTW
		4155	@subsection MPI Tips
		4156
		4157	There are several things you should consider in order to get the best
		4158	performance out of the MPI FFTW routines.
		4159
		4160	@cindex load-balancing
		4161	First, if possible, the first and second dimensions of your data should
		4162	be divisible by the number of processes you are using. (If only one can
		4163	be divisible, then you should choose the first dimension.) This allows
		4164	the computational load to be spread evenly among the processes, and also
		4165	reduces the communications complexity and overhead. In the
		4166	one-dimensional transform case, the size of the transform should ideally
		4167	be divisible by the @emph{square} of the number of processors.
		4168
		4169	@ctindex FFTW_TRANSPOSED_ORDER
		4170	Second, you should consider using the @code{FFTW_TRANSPOSED_ORDER}
		4171	output format if it is not too burdensome. The speed gains from
		4172	communications savings are usually substantial.
		4173
		4174	Third, you should consider allocating a workspace for
		4175	@code{(r)fftw(nd)_mpi}, as this can often
		4176	(but not always) improve performance (at the cost of extra storage).
		4177
		4178	Fourth, you should experiment with the best number of processors to use
		4179	for your problem. (There comes a point of diminishing returns, when the
		4180	communications costs outweigh the computational benefits.@footnote{An
		4181	FFT is particularly hard on communications systems, as it requires an
		4182	@dfn{all-to-all} communication, which is more or less the worst possible
		4183	case.}) The @code{fftw_mpi_test} program can output helpful performance
		4184	benchmarks.
		4185	@pindex fftw_mpi_test
		4186	@cindex benchmark
		4187	It accepts the same parameters as the uniprocessor test programs
		4188	(c.f. @code{tests/README}) and is run like an ordinary MPI program. For
		4189	example, @code{mpirun -np 4 fftw_mpi_test -s 128x128x128} will benchmark
		4190	a @code{128x128x128} transform on four processors, reporting timings and
		4191	parallel speedups for all variants of @code{fftwnd_mpi} (transposed,
		4192	with workspace, etcetera). (Note also that there is the
		4193	@code{rfftw_mpi_test} program for the real transforms.)
		4194	@pindex rfftw_mpi_test
		4195
		4196
		4197	@c ************************************************************
		4198	@node Calling FFTW from Fortran, Installation and Customization, Parallel FFTW, Top
		4199	@chapter Calling FFTW from Fortran
		4200
		4201	@cindex Fortran-callable wrappers
		4202	The standard FFTW libraries include special wrapper functions that allow
		4203	Fortran programs to call FFTW subroutines. This chapter describes how
		4204	those functions may be employed to use FFTW from Fortran. We assume
		4205	here that the reader is already familiar with the usage of FFTW in C, as
		4206	described elsewhere in this manual.
		4207
		4208	In general, it is not possible to call C functions directly from
		4209	Fortran, due to Fortran's inability to pass arguments by value and also
		4210	because Fortran compilers typically expect identifiers to be mangled
		4211	@cindex compiler
		4212	somehow for linking. However, if C functions are written in a special
		4213	way, they @emph{are} callable from Fortran, and we have employed this
		4214	technique to create Fortran-callable ``wrapper'' functions around the
		4215	main FFTW routines. These wrapper functions are included in the FFTW
		4216	libraries by default, unless a Fortran compiler isn't found on your
		4217	system or @code{--disable-fortran} is included in the @code{configure}
		4218	flags.
		4219
		4220	As a result, calling FFTW from Fortran requires little more than
		4221	appending @samp{@code{_f77}} to the function names and then linking
		4222	normally with the FFTW libraries. There are a few wrinkles, however, as
		4223	we shall discuss below.
		4224
		4225	@menu
		4226	* Wrapper Routines::
		4227	* FFTW Constants in Fortran::
		4228	* Fortran Examples::
		4229	@end menu
		4230
		4231	@c -------------------------------------------------------
		4232	@node Wrapper Routines, FFTW Constants in Fortran, Calling FFTW from Fortran, Calling FFTW from Fortran
		4233	@section Wrapper Routines
		4234
		4235	All of the uniprocessor and multi-threaded transform routines have
		4236	Fortran-callable wrappers, except for the wisdom import/export functions
		4237	(since it is not possible to exchange string and file arguments portably
		4238	with Fortran). The name of the wrapper routine is the same as that of
		4239	the corresponding C routine, but with @code{fftw/fftwnd/rfftw/rfftwnd}
		4240	replaced by @code{fftw_f77/fftwnd_f77/rfftw_f77/rfftwnd_f77}. For
		4241	example, in Fortran, instead of calling @code{fftw_one} you would call
		4242	@code{fftw_f77_one}.@footnote{Technically, Fortran 77 identifiers are
		4243	not allowed to have more than 6 characters, nor may they contain
		4244	underscores. Any compiler that enforces this limitation doesn't deserve
		4245	to link to FFTW.}
		4246	@findex fftw_f77_one
		4247	For the most part, all of the arguments to the functions are the same,
		4248	with the following exceptions:
		4249
		4250	@itemize @bullet
		4251
		4252	@item
		4253	Any function that returns a value (e.g. @code{fftw_create_plan}) is
		4254	converted into a subroutine. The return value is converted into an
		4255	additional (first) parameter of the wrapper subroutine. (The reason for
		4256	this is that some Fortran implementations seem to have trouble with C
		4257	function return values.)
		4258
		4259	@item
		4260	@cindex in-place transform
		4261	When performing one-dimensional @code{FFTW_IN_PLACE} transforms, you
		4262	don't have the option of passing @code{NULL} for the @code{out} argument
		4263	(since there is no way to pass @code{NULL} from Fortran). Therefore,
		4264	when performing such transforms, you @emph{must} allocate and pass a
		4265	contiguous scratch array of the same size as the transform. Note that
		4266	for in-place multi-dimensional (@code{(r)fftwnd}) transforms, the
		4267	@code{out} argument is ignored, so you can pass anything for that
		4268	parameter.
		4269
		4270	@item
		4271	@cindex column-major
		4272	The wrapper routines expect multi-dimensional arrays to be in
		4273	column-major order, which is the ordinary format of Fortran arrays.
		4274	They do this transparently and costlessly simply by reversing the order
		4275	of the dimensions passed to FFTW, but this has one important consequence
		4276	for multi-dimensional real-complex transforms, discussed below.
		4277
		4278	@item
		4279	@code{plan} variables (what would be of type @code{fftw_plan},
		4280	@code{rfftwnd_plan}, etcetera, in C), must be declared as a type that is
		4281	the same size as a pointer (address) on your machine. (Fortran has no
		4282	generic pointer type.) The Fortran @code{integer} type is usually the
		4283	same size as a pointer, but you need to be wary (especially on 64-bit
		4284	machines). (You could also use @code{integer*4} on a 32-bit machine and
		4285	@code{integer*8} on a 64-bit machine.) Ugh. (@code{g77} has a special
		4286	type, @code{integer(kind=7)}, that is defined to be the same size as a
		4287	pointer.)
		4288
		4289	@end itemize
		4290
		4291	@cindex floating-point precision
		4292	In general, you should take care to use Fortran data types that
		4293	correspond to (i.e. are the same size as) the C types used by FFTW. If
		4294	your C and Fortran compilers are made by the same vendor, the
		4295	correspondence is usually straightforward (i.e. @code{integer}
		4296	corresponds to @code{int}, @code{real} corresponds to @code{float},
		4297	etcetera). Such simple correspondences are assumed in the examples
		4298	below. The examples also assume that FFTW was compiled in
		4299	double precision (the default).
		4300
		4301	@c -------------------------------------------------------
		4302	@node FFTW Constants in Fortran, Fortran Examples, Wrapper Routines, Calling FFTW from Fortran
		4303	@section FFTW Constants in Fortran
		4304
		4305	When creating plans in FFTW, a number of constants are used to specify
		4306	options, such as @code{FFTW_FORWARD} or @code{FFTW_USE_WISDOM}. The
		4307	same constants must be used with the wrapper routines, but of course the
		4308	C header files where the constants are defined can't be incorporated
		4309	directly into Fortran code.
		4310
		4311	Instead, we have placed Fortran equivalents of the FFTW constant
		4312	definitions in the file @code{fortran/fftw_f77.i} of the FFTW package.
		4313	If your Fortran compiler supports a preprocessor, you can use that to
		4314	incorporate this file into your code whenever you need to call FFTW.
		4315	Otherwise, you will have to paste the constant definitions in directly.
		4316	They are:
		4317
		4318	@example
		4319	integer FFTW_FORWARD,FFTW_BACKWARD
		4320	parameter (FFTW_FORWARD=-1,FFTW_BACKWARD=1)
		4321
		4322	integer FFTW_REAL_TO_COMPLEX,FFTW_COMPLEX_TO_REAL
		4323	parameter (FFTW_REAL_TO_COMPLEX=-1,FFTW_COMPLEX_TO_REAL=1)
		4324
		4325	integer FFTW_ESTIMATE,FFTW_MEASURE
		4326	parameter (FFTW_ESTIMATE=0,FFTW_MEASURE=1)
		4327
		4328	integer FFTW_OUT_OF_PLACE,FFTW_IN_PLACE,FFTW_USE_WISDOM
		4329	parameter (FFTW_OUT_OF_PLACE=0)
		4330	parameter (FFTW_IN_PLACE=8,FFTW_USE_WISDOM=16)
		4331
		4332	integer FFTW_THREADSAFE
		4333	parameter (FFTW_THREADSAFE=128)
		4334	@end example
		4335
		4336	@cindex flags
		4337	In C, you combine different flags (like @code{FFTW_USE_WISDOM} and
		4338	@code{FFTW_MEASURE}) using the @samp{@code{\|}} operator; in Fortran you
		4339	should just use @samp{@code{+}}.
		4340
		4341	@c -------------------------------------------------------
		4342	@node Fortran Examples, , FFTW Constants in Fortran, Calling FFTW from Fortran
		4343	@section Fortran Examples
		4344
		4345	In C you might have something like the following to transform a
		4346	one-dimensional complex array:
		4347
		4348	@example
		4349	fftw_complex in[N], *out[N];
		4350	fftw_plan plan;
		4351
		4352	plan = fftw_create_plan(N,FFTW_FORWARD,FFTW_ESTIMATE);
		4353	fftw_one(plan,in,out);
		4354	fftw_destroy_plan(plan);
		4355	@end example
		4356
		4357	In Fortran, you use the following to accomplish the same thing:
		4358
		4359	@example
		4360	double complex in, out
		4361	dimension in(N), out(N)
		4362	integer plan
		4363
		4364	call fftw_f77_create_plan(plan,N,FFTW_FORWARD,FFTW_ESTIMATE)
		4365	call fftw_f77_one(plan,in,out)
		4366	call fftw_f77_destroy_plan(plan)
		4367	@end example
		4368	@findex fftw_f77_create_plan
		4369	@findex fftw_f77_one
		4370	@findex fftw_f77_destroy_plan
		4371
		4372	Notice how all routines are called as Fortran subroutines, and the plan
		4373	is returned via the first argument to @code{fftw_f77_create_plan}. To
		4374	do the same thing, but using 8 threads in parallel
		4375	(@pxref{Multi-threaded FFTW}), you would simply replace the call to
		4376	@code{fftw_f77_one} with:
		4377
		4378	@example
		4379	call fftw_f77_threads_one(8,plan,in,out)
		4380	@end example
		4381	@findex fftw_f77_threads_one
		4382
		4383	To transform a three-dimensional array in-place with C, you might do:
		4384
		4385	@example
		4386	fftw_complex arr[L][M][N];
		4387	fftwnd_plan plan;
		4388	int n[3] = @{L,M,N@};
		4389
		4390	plan = fftwnd_create_plan(3,n,FFTW_FORWARD,
		4391	FFTW_ESTIMATE \| FFTW_IN_PLACE);
		4392	fftwnd_one(plan, arr, 0);
		4393	fftwnd_destroy_plan(plan);
		4394	@end example
		4395
		4396	In Fortran, you would use this instead:
		4397
		4398	@example
		4399	double complex arr
		4400	dimension arr(L,M,N)
		4401	integer n
		4402	dimension n(3)
		4403	integer plan
		4404
		4405	n(1) = L
		4406	n(2) = M
		4407	n(3) = N
		4408	call fftwnd_f77_create_plan(plan,3,n,FFTW_FORWARD,
		4409	+ FFTW_ESTIMATE + FFTW_IN_PLACE)
		4410	call fftwnd_f77_one(plan, arr, 0)
		4411	call fftwnd_f77_destroy_plan(plan)
		4412	@end example
		4413	@findex fftwnd_f77_create_plan
		4414	@findex fftwnd_f77_one
		4415	@findex fftwnd_f77_destroy_plan
		4416
		4417	Instead of calling @code{fftwnd_f77_create_plan(plan,3,n,...)}, we could
		4418	also have called @code{fftw3d_f77_create_plan(plan,L,M,N,...)}.
		4419	@findex fftw3d_f77_create_plan
		4420
		4421	Note that we pass the array dimensions in the "natural" order; also
		4422	note that the last argument to @code{fftwnd_f77} is ignored since the
		4423	transform is @code{FFTW_IN_PLACE}.
		4424
		4425	To transform a one-dimensional real array in Fortran, you might do:
		4426
		4427	@example
		4428	double precision in, out
		4429	dimension in(N), out(N)
		4430	integer plan
		4431
		4432	call rfftw_f77_create_plan(plan,N,FFTW_REAL_TO_COMPLEX,
		4433	+ FFTW_ESTIMATE)
		4434	call rfftw_f77_one(plan,in,out)
		4435	call rfftw_f77_destroy_plan(plan)
		4436	@end example
		4437	@findex rfftw_f77_create_plan
		4438	@findex rfftw_f77_one
		4439	@findex rfftw_f77_destroy_plan
		4440
		4441	To transform a two-dimensional real array, out of place, you might use
		4442	the following:
		4443
		4444	@example
		4445	double precision in
		4446	double complex out
		4447	dimension in(M,N), out(M/2 + 1, N)
		4448	integer plan
		4449
		4450	call rfftw2d_f77_create_plan(plan,M,N,FFTW_REAL_TO_COMPLEX,
		4451	+ FFTW_ESTIMATE)
		4452	call rfftwnd_f77_one_real_to_complex(plan, in, out)
		4453	call rfftwnd_f77_destroy_plan(plan)
		4454	@end example
		4455	@findex rfftw2d_f77_create_plan
		4456	@findex rfftwnd_f77_one_real_to_complex
		4457	@findex rfftwnd_f77_destroy_plan
		4458
		4459	@b{Important:} Notice that it is the @emph{first} dimension of the
		4460	complex output array that is cut in half in Fortran, rather than the
		4461	last dimension as in C. This is a consequence of the wrapper routines
		4462	reversing the order of the array dimensions passed to FFTW so that the
		4463	Fortran program can use its ordinary column-major order.
		4464	@cindex column-major
		4465	@cindex rfftwnd array format
		4466
		4467
		4468	@c ************************************************************
		4469	@node Installation and Customization, Acknowledgments, Calling FFTW from Fortran, Top
		4470	@chapter Installation and Customization
		4471
		4472	This chapter describes the installation and customization of FFTW, the
		4473	latest version of which may be downloaded from
		4474	@uref{http://theory.lcs.mit.edu/~fftw, the FFTW home page}.
		4475
		4476	As distributed, FFTW makes very few assumptions about your system. All
		4477	you need is an ANSI C compiler (@code{gcc} is fine, although
		4478	vendor-provided compilers often produce faster code).
		4479	@cindex compiler
		4480	However, installation of FFTW is somewhat simpler if you have a Unix or
		4481	a GNU system, such as Linux. In this chapter, we first describe the
		4482	installation of FFTW on Unix and non-Unix systems. We then describe how
		4483	you can customize FFTW to achieve better performance. Specifically, you
		4484	can I) enable @code{gcc}/x86-specific hacks that improve performance on
		4485	Pentia and PentiumPro's; II) adapt FFTW to use the high-resolution clock
		4486	of your machine, if any; III) produce code (@emph{codelets}) to support
		4487	fast transforms of sizes that are not supported efficiently by the
		4488	standard FFTW distribution.
		4489	@cindex installation
		4490
		4491	@menu
		4492	* Installation on Unix::
		4493	* Installation on non-Unix Systems::
		4494	* Installing FFTW in both single and double precision::
		4495	* gcc and Pentium/PentiumPro hacks::
		4496	* Customizing the timer::
		4497	* Generating your own code::
		4498	@end menu
		4499
		4500	@node Installation on Unix, Installation on non-Unix Systems, Installation and Customization, Installation and Customization
		4501	@section Installation on Unix
		4502
		4503	FFTW comes with a @code{configure} program in the GNU style.
		4504	Installation can be as simple as:
		4505	@fpindex configure
		4506
		4507	@example
		4508	./configure
		4509	make
		4510	make install
		4511	@end example
		4512
		4513	This will build the uniprocessor complex and real transform libraries
		4514	along with the test programs. We strongly recommend that you use GNU
		4515	@code{make} if it is available; on some systems it is called
		4516	@code{gmake}. The ``@code{make install}'' command installs the fftw and
		4517	rfftw libraries in standard places, and typically requires root
		4518	privileges (unless you specify a different install directory with the
		4519	@code{--prefix} flag to @code{configure}). You can also type
		4520	``@code{make check}'' to put the FFTW test programs through their paces.
		4521	If you have problems during configuration or compilation, you may want
		4522	to run ``@code{make distclean}'' before trying again; this ensures that
		4523	you don't have any stale files left over from previous compilation
		4524	attempts.
		4525
		4526	The @code{configure} script knows good @code{CFLAGS} (C compiler flags)
		4527	@cindex compiler flags
		4528	for a few systems. If your system is not known, the @code{configure}
		4529	script will print out a warning. @footnote{Each version of @code{cc}
		4530	seems to have its own magic incantation to get the fastest code most of
		4531	the time---you'd think that people would have agreed upon some
		4532	convention, e.g. "@code{-Omax}", by now.} In this case, you can compile
		4533	FFTW with the command
		4534	@example
		4535	make CFLAGS="<write your CFLAGS here>"
		4536	@end example
		4537	If you do find an optimal set of @code{CFLAGS} for your system, please
		4538	let us know what they are (along with the output of @code{config.guess})
		4539	so that we can include them in future releases.
		4540
		4541	The @code{configure} program supports all the standard flags defined by
		4542	the GNU Coding Standards; see the @code{INSTALL} file in FFTW or
		4543	@uref{http://www.gnu.org/prep/standards_toc.html, the GNU web page}.
		4544	Note especially @code{--help} to list all flags and
		4545	@code{--enable-shared} to create shared, rather than static, libraries.
		4546	@code{configure} also accepts a few FFTW-specific flags, particularly:
		4547
		4548	@itemize @bullet
		4549
		4550	@item
		4551	@cindex floating-point precision
		4552	@code{--enable-float} Produces a single-precision version of FFTW
		4553	(@code{float}) instead of the default double-precision (@code{double}).
		4554	@xref{Installing FFTW in both single and double precision}.
		4555
		4556	@item
		4557	@code{--enable-type-prefix} Adds a @samp{d} or @samp{s} prefix to all
		4558	installed libraries and header files to indicate the floating-point
		4559	precision. @xref{Installing FFTW in both single and double
		4560	precision}. (@code{--enable-type-prefix=<prefix>} lets you add an
		4561	arbitrary prefix.) By default, no prefix is used.
		4562
		4563	@item
		4564	@cindex threads
		4565	@code{--enable-threads} Enables compilation and installation of the FFTW
		4566	threads library (@pxref{Multi-threaded FFTW}), which provides a
		4567	simple interface to parallel transforms for SMP systems. (By default,
		4568	the threads routines are not compiled.)
		4569
		4570	@item
		4571	@cindex MPI
		4572	@code{--enable-mpi} Enables compilation and installation of the FFTW MPI
		4573	library (@pxref{MPI FFTW}), which provides parallel transforms for
		4574	distributed-memory systems with MPI. (By default, the MPI routines are
		4575	not compiled.)
		4576
		4577	@item
		4578	@cindex Fortran-callable wrappers
		4579	@code{--disable-fortran} Disables inclusion of Fortran-callable wrapper
		4580	routines (@pxref{Calling FFTW from Fortran}) in the standard FFTW
		4581	libraries. These wrapper routines increase the library size by only a
		4582	negligible amount, so they are included by default as long as the
		4583	@code{configure} script finds a Fortran compiler on your system.
		4584
		4585	@item
		4586	@code{--with-gcc} Enables the use of @code{gcc}. By default, FFTW uses
		4587	the vendor-supplied @code{cc} compiler if present. Unfortunately,
		4588	@code{gcc} produces slower code than @code{cc} on many systems.
		4589
		4590	@item
		4591	@code{--enable-i386-hacks} See below.
		4592
		4593	@item
		4594	@code{--enable-pentium-timer} See below.
		4595
		4596	@end itemize
		4597
		4598	To force @code{configure} to use a particular C compiler (instead of the
		4599	@cindex compiler
		4600	default, usually @code{cc}), set the environment variable @code{CC} to
		4601	the name of the desired compiler before running @code{configure}; you
		4602	may also need to set the flags via the variable @code{CFLAGS}.
		4603	@cindex compiler flags
		4604
		4605	@node Installation on non-Unix Systems, Installing FFTW in both single and double precision, Installation on Unix, Installation and Customization
		4606	@section Installation on non-Unix Systems
		4607
		4608	It is quite straightforward to install FFTW even on non-Unix systems
		4609	lacking the niceties of the @code{configure} script. The FFTW Home Page
		4610	may include some FFTW packages preconfigured for particular
		4611	systems/compilers, and also contains installation notes sent in by
		4612	@cindex compiler
		4613	users. All you really need to do, though, is to compile all of the
		4614	@code{.c} files in the appropriate directories of the FFTW package.
		4615	(You needn't worry about the many extraneous files lying around.)
		4616
		4617	For the complex transforms, compile all of the @code{.c} files in the
		4618	@code{fftw} directory and link them into a library. Similarly, for the
		4619	real transforms, compile all of the @code{.c} files in the @code{rfftw}
		4620	directory into a library. Note that these sources @code{#include}
		4621	various files in the @code{fftw} and @code{rfftw} directories, so you
		4622	may need to set up the @code{#include} paths for your compiler
		4623	appropriately. Be sure to enable the highest-possible level of
		4624	optimization in your compiler.
		4625
		4626	@cindex floating-point precision
		4627	By default, FFTW is compiled for double-precision transforms. To work
		4628	in single precision rather than double precision, @code{#define} the
		4629	symbol @code{FFTW_ENABLE_FLOAT} in @code{fftw.h} (in the @code{fftw}
		4630	directory) and (re)compile FFTW.
		4631
		4632	These libraries should be linked with any program that uses the
		4633	corresponding transforms. The required header files, @code{fftw.h} and
		4634	@code{rfftw.h}, are located in the @code{fftw} and @code{rfftw}
		4635	directories respectively; you may want to put them with the libraries,
		4636	or wherever header files normally go on your system.
		4637
		4638	FFTW includes test programs, @code{fftw_test} and @code{rfftw_test}, in
		4639	@pindex fftw_test
		4640	@pindex rfftw_test
		4641	the @code{tests} directory. These are compiled and linked like any
		4642	program using FFTW, except that they use additional header files located
		4643	in the @code{fftw} and @code{rfftw} directories, so you will need to set
		4644	your compiler @code{#include} paths appropriately. @code{fftw_test} is
		4645	compiled from @code{fftw_test.c} and @code{test_main.c}, while
		4646	@code{rfftw_test} is compiled from @code{rfftw_test.c} and
		4647	@code{test_main.c}. When you run these programs, you will be prompted
		4648	interactively for various possible tests to perform; see also
		4649	@code{tests/README} for more information.
		4650
		4651	@node Installing FFTW in both single and double precision, gcc and Pentium/PentiumPro hacks, Installation on non-Unix Systems, Installation and Customization
		4652	@section Installing FFTW in both single and double precision
		4653
		4654	@cindex floating-point precision
		4655	It is often useful to install both single- and double-precision versions
		4656	of the FFTW libraries on the same machine, and we provide a convenient
		4657	mechanism for achieving this on Unix systems.
		4658
		4659	@fpindex configure
		4660	When the @code{--enable-type-prefix} option of configure is used, the
		4661	FFTW libraries and header files are installed with a prefix of @samp{d}
		4662	or @samp{s}, depending upon whether you compiled in double or single
		4663	precision. Then, instead of linking your program with @code{-lrfftw
		4664	-lfftw}, for example, you would link with @code{-ldrfftw -ldfftw} to use
		4665	the double-precision version or with @code{-lsrfftw -lsfftw} to use the
		4666	single-precision version. Also, you would @code{#include}
		4667	@code{<drfftw.h>} or @code{<srfftw.h>} instead of @code{<rfftw.h>}, and
		4668	so on.
		4669
		4670	@emph{The names of FFTW functions, data types, and constants remain
		4671	unchanged!} You still call, for instance, @code{fftw_one} and not
		4672	@code{dfftw_one}. Only the names of header files and libraries are
		4673	modified. One consequence of this is that @emph{you @b{cannot} use both
		4674	the single- and double-precision FFTW libraries in the same program,
		4675	simultaneously,} as the function names would conflict.
		4676
		4677	So, to install both the single- and double-precision libraries on the
		4678	same machine, you would do:
		4679
		4680	@example
		4681	./configure --enable-type-prefix @i{[ other options ]}
		4682	make
		4683	make install
		4684	make clean
		4685	./configure --enable-float --enable-type-prefix @i{[ other options ]}
		4686	make
		4687	make install
		4688	@end example
		4689
		4690	@node gcc and Pentium/PentiumPro hacks, Customizing the timer, Installing FFTW in both single and double precision, Installation and Customization
		4691	@section @code{gcc} and Pentium/PentiumPro hacks
		4692	@cindex Pentium hack
		4693	The @code{configure} option @code{--enable-i386-hacks} enables specific
		4694	optimizations for @code{gcc} and Pentium/PentiumPro, which can
		4695	significantly improve performance of double-precision transforms.
		4696	Specifically, we have tested these hacks on Linux with @code{gcc}
		4697	2.[78] and versions of @code{egcs} since 1.0.3. These optimizations
		4698	only affect the performance, not the correctness of FFTW (i.e. it is
		4699	always safe to try them out).
		4700
		4701	These hacks provide a workaround to the incorrect alignment of local
		4702	@code{double} variables in @code{gcc}. The compiler aligns these
		4703	@cindex compiler
		4704	variables to multiples of 4 bytes, but execution is much faster (on
		4705	Pentium and PentiumPro) if @code{double}s are aligned to a multiple of 8
		4706	bytes. By carefully counting the number of variables allocated by the
		4707	compiler in performance-critical regions of the code, we have been able
		4708	to introduce dummy allocations (using @code{alloca}) that align the
		4709	stack properly. The hack depends crucially on the compiler flags that
		4710	are used. For example, it won't work without
		4711	@code{-fomit-frame-pointer}.
		4712
		4713	The @code{fftw_test} program outputs speed measurements that you can use
		4714	to see if these hacks are beneficial.
		4715	@pindex fftw_test
		4716	@cindex benchmark
		4717
		4718	The @code{configure} option @code{--enable-pentium-timer} enables the
		4719	use of the Pentium and PentiumPro cycle counter for timing purposes. In
		4720	order to get correct results, you must define @code{FFTW_CYCLES_PER_SEC}
		4721	in @code{fftw/config.h} to be the clock speed of your processor; the
		4722	resulting FFTW library will be nonportable. The use of this option is
		4723	deprecated. On serious operating systems (such as Linux), FFTW uses
		4724	@code{gettimeofday()}, which has enough resolution and is portable.
		4725	(Note that Win32 has its own high-resolution timing routines as well.
		4726	FFTW contains unsupported code to use these routines.)
		4727
		4728	@node Customizing the timer, Generating your own code, gcc and Pentium/PentiumPro hacks, Installation and Customization
		4729	@section Customizing the timer
		4730	@cindex timer, customization of
		4731
		4732	FFTW needs a reasonably-precise clock in order to find the optimal way
		4733	to compute a transform. On Unix systems, @code{configure} looks for
		4734	@code{gettimeofday} and other system-specific timers. If it does not
		4735	find any high resolution clock, it defaults to using the @code{clock()}
		4736	function, which is very portable, but forces FFTW to run for a long time
		4737	in order to get reliable measurements.
		4738	@ffindex gettimeofday
		4739	@ffindex clock
		4740
		4741	If your machine supports a high-resolution clock not recognized by FFTW,
		4742	it is therefore advisable to use it. You must edit
		4743	@code{fftw/fftw-int.h}. There are a few macros you must redefine. The
		4744	code is documented and should be self-explanatory. (By the way,
		4745	@code{fftw-int} stands for @code{fftw-internal}, but for some
		4746	inexplicable reason people are still using primitive systems with 8.3
		4747	filenames.)
		4748
		4749	Even if you don't install high-resolution timing code, we still
		4750	recommend that you look at the @code{FFTW_TIME_MIN} constant in
		4751	@ctindex FFTW_TIME_MIN
		4752	@code{fftw/fftw-int.h}. This constant holds the minimum time interval (in
		4753	seconds) required to get accurate timing measurements, and should be (at
		4754	least) several hundred times the resolution of your clock. The default
		4755	constants are on the conservative side, and may cause FFTW to take
		4756	longer than necessary when you create a plan. Set @code{FFTW_TIME_MIN}
		4757	to whatever is appropriate on your system (be sure to set the
		4758	@emph{right} @code{FFTW_TIME_MIN}@dots{}there are several definitions in
		4759	@code{fftw-int.h}, corresponding to different platforms and timers).
		4760
		4761	As an aid in checking the resolution of your clock, you can use the
		4762	@code{tests/fftw_test} program with the @code{-t} option
		4763	(c.f. @code{tests/README}). Remember, the mere fact that your clock
		4764	reports times in, say, picoseconds, does not mean that it is actually
		4765	@emph{accurate} to that resolution.
		4766
		4767	@node Generating your own code, , Customizing the timer, Installation and Customization
		4768	@section Generating your own code
		4769	@cindex Caml
		4770	@cindex ML
		4771	@cindex code generator
		4772
		4773	If you know that you will only use transforms of a certain size (say,
		4774	powers of @math{2}) and want to reduce the size of the library, you can
		4775	reconfigure FFTW to support only those sizes you are interested in. You
		4776	may even generate code to enable efficient transforms of a size not
		4777	supported by the default distribution. The default distribution
		4778	supports transforms of any size, but not all sizes are equally fast.
		4779	The default installation of FFTW is best at handling sizes of the form
		4780	@ifinfo
		4781	@math{2^a 3^b 5^c 7^d 11^e 13^f},
		4782	@end ifinfo
		4783	@iftex
		4784	@tex
		4785	$2^a 3^b 5^c 7^d 11^e 13^f$,
		4786	@end tex
		4787	@end iftex
		4788	@ifhtml
		4789	2<SUP>a</SUP> 3<SUP>b</SUP> 5<SUP>c</SUP> 7<SUP>d</SUP>
		4790	11<SUP>e</SUP> 13<SUP>f</SUP>,
		4791	@end ifhtml
		4792	where @math{e+f} is either @math{0} or
		4793	@math{1}, and the other exponents are arbitrary. Other sizes are
		4794	computed by means of a slow, general-purpose routine. However, if you
		4795	have an application that requires fast transforms of size, say,
		4796	@code{17}, there is a way to generate specialized code to handle that.
		4797
		4798	The directory @code{gensrc} contains all the programs and scripts that
		4799	were used to generate FFTW. In particular, the program
		4800	@code{gensrc/genfft.ml} was used to generate the code that FFTW uses to
		4801	compute the transforms. We do not expect casual users to use it.
		4802	@code{genfft} is a rather sophisticated program that generates directed
		4803	acyclic graphs of FFT algorithms and performs algebraic simplifications
		4804	on them. @code{genfft} is written in Objective Caml, a dialect of ML.
		4805	Objective Caml is described at @uref{http://pauillac.inria.fr/ocaml/}
		4806	and can be downloaded from from @uref{ftp://ftp.inria.fr/lang/caml-light}.
		4807	@pindex genfft
		4808	@cindex Caml
		4809
		4810	If you have Objective Caml installed, you can type @code{sh
		4811	bootstrap.sh} in the top-level directory to re-generate the files. If
		4812	you change the @code{gensrc/config} file, you can optimize FFTW for
		4813	sizes that are not currently supported efficiently (say, 17 or 19).
		4814
		4815	We do not provide more details about the code-generation process, since
		4816	we do not expect that users will need to generate their own code.
		4817	However, feel free to contact us at @email{fftw@@theory.lcs.mit.edu} if
		4818	you are interested in the subject.
		4819
		4820	@cindex monadic programming
		4821	You might find it interesting to learn Caml and/or some modern
		4822	programming techniques that we used in the generator (including monadic
		4823	programming), especially if you heard the rumor that Java and
		4824	object-oriented programming are the latest advancement in the field.
		4825	The internal operation of the codelet generator is described in the
		4826	paper, ``A Fast Fourier Transform Compiler,'' by M. Frigo, which is
		4827	available from the @uref{http://theory.lcs.mit.edu/~fftw,FFTW home page}
		4828	and will appear in the @cite{Proceedings of the 1999 ACM SIGPLAN
		4829	Conference on Programming Language Design and Implementation (PLDI)}.
		4830
		4831	@c ************************************************************
		4832	@node Acknowledgments, License and Copyright, Installation and Customization, Top
		4833	@chapter Acknowledgments
		4834
		4835	Matteo Frigo was supported in part by the Defense Advanced Research
		4836	Projects Agency (DARPA) under Grants N00014-94-1-0985 and
		4837	F30602-97-1-0270, and by a Digital Equipment Corporation Fellowship.
		4838	Steven G. Johnson was supported in part by a DoD NDSEG Fellowship, an
		4839	MIT Karl Taylor Compton Fellowship, and by the Materials Research
		4840	Science and Engineering Center program of the National Science
		4841	Foundation under award DMR-9400334.
		4842
		4843	Both authors were also supported in part by their respective
		4844	girlfriends, by the letters ``Q'' and ``R'', and by the number 12.
		4845	@cindex girlfriends
		4846
		4847	We are grateful to SUN Microsystems Inc.@ for its donation of a cluster
		4848	of 9 8-processor Ultra HPC 5000 SMPs (24 Gflops peak). These machines
		4849	served as the primary platform for the development of earlier versions
		4850	of FFTW.
		4851
		4852	We thank Intel Corporation for donating a four-processor Pentium Pro
		4853	machine. We thank the Linux community for giving us a decent OS to run
		4854	on that machine.
		4855
		4856	The @code{genfft} program was written using Objective Caml, a dialect of
		4857	ML. Objective Caml is a small and elegant language developed by Xavier
		4858	Leroy. The implementation is available from @code{ftp.inria.fr} in the
		4859	directory @code{lang/caml-light}. We used versions 1.07 and 2.00 of the
		4860	software. In previous releases of FFTW, @code{genfft} was written in
		4861	Caml Light, by the same authors. An even earlier implementation of
		4862	@code{genfft} was written in Scheme, but Caml is definitely better for
		4863	this kind of application.
		4864	@pindex genfft
		4865	@cindex Caml
		4866	@cindex LISP
		4867
		4868	FFTW uses many tools from the GNU project, including @code{automake},
		4869	@code{texinfo}, and @code{libtool}.
		4870
		4871	Prof.@ Charles E.@ Leiserson of MIT provided continuous support and
		4872	encouragement. This program would not exist without him. Charles also
		4873	proposed the name ``codelets'' for the basic FFT blocks.
		4874
		4875	Prof.@ John D.@ Joannopoulos of MIT demonstrated continuing tolerance of
		4876	Steven's ``extra-curricular'' computer-science activities. Steven's
		4877	chances at a physics degree would not exist without him.
		4878
		4879	Andrew Sterian contributed the Windows timing code.
		4880
		4881	Didier Miras reported a bug in the test procedure used in FFTW 1.2. We
		4882	now use a completely different test algorithm by Funda Ergun that does
		4883	not require a separate FFT program to compare against.
		4884
		4885	Wolfgang Reimer contributed the Pentium cycle counter and a few fixes
		4886	that help portability.
		4887
		4888	Ming-Chang Liu uncovered a well-hidden bug in the complex transforms of
		4889	FFTW 2.0 and supplied a patch to correct it.
		4890
		4891	The FFTW FAQ was written in @code{bfnn} (Bizarre Format With No Name)
		4892	and formatted using the tools developed by Ian Jackson for the Linux
		4893	FAQ.
		4894
		4895	@emph{We are especially thankful to all of our users for their
		4896	continuing support, feedback, and interest during our development of
		4897	FFTW.}
		4898
		4899	@c ************************************************************
		4900	@node License and Copyright, Concept Index, Acknowledgments, Top
		4901	@chapter License and Copyright
		4902
		4903	FFTW is copyright @copyright{} 1997--1999 Massachusetts Institute of
		4904	Technology.
		4905
		4906	FFTW is free software; you can redistribute it and/or modify
		4907	it under the terms of the GNU General Public License as published by
		4908	the Free Software Foundation; either version 2 of the License, or
		4909	(at your option) any later version.
		4910
		4911	This program is distributed in the hope that it will be useful,
		4912	but WITHOUT ANY WARRANTY; without even the implied warranty of
		4913	MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
		4914	GNU General Public License for more details.
		4915
		4916	You should have received a copy of the GNU General Public License along
		4917	with this program; if not, write to the Free Software Foundation, Inc.,
		4918	59 Temple Place, Suite 330, Boston, MA 02111-1307 USA. You can also
		4919	find the @uref{http://www.gnu.org/copyleft/gpl.html, GPL on the GNU web
		4920	site}.
		4921
		4922	In addition, we kindly ask you to acknowledge FFTW and its authors in
		4923	any program or publication in which you use FFTW. (You are not
		4924	@emph{required} to do so; it is up to your common sense to decide
		4925	whether you want to comply with this request or not.)
		4926
		4927	Non-free versions of FFTW are available under terms different than the
		4928	General Public License. (e.g. they do not require you to accompany any
		4929	object code using FFTW with the corresponding source code.) For these
		4930	alternate terms you must purchase a license from MIT's Technology
		4931	Licensing Office. Users interested in such a license should contact us
		4932	(@email{fftw@@theory.lcs.mit.edu}) for more information.
		4933
		4934	@node Concept Index, Library Index, License and Copyright, Top
		4935	@chapter Concept Index
		4936	@printindex cp
		4937
		4938	@node Library Index, , Concept Index, Top
		4939	@chapter Library Index
		4940	@printindex fn
		4941
		4942	@c ************************************************************
		4943	@contents
		4944
		4945	@bye
		4946	@c LocalWords: texinfo fftw texi Exp setfilename settitle setchapternewpage
		4947	@c LocalWords: syncodeindex fn cp vr pg tp ifinfo titlepage sp Matteo Frigo
		4948	@c LocalWords: vskip pt filll dir detailmenu halfcomplex DFT fftwnd rfftw gcc
		4949	@c LocalWords: rfftwnd Pentium PentiumPro NJ FFT cindex ESSL FFTW's emph uref
		4950	@c LocalWords: http lcs mit edu benchfft benchFFT pindex dfn iftex tex cdot
		4951	@c LocalWords: ifhtml nbsp TR Sep Proc ICASSP Tukey Rader's ref xref int FFTs
		4952	@c LocalWords: findex tindex vindex im strided pxref DC lfftw lm samp const
		4953	@c LocalWords: nx ny nz ldots rk ik hc freq lrfftw datatypes cdots pointwise
		4954	@c LocalWords: pinv ij rote increaseth Ecclesiastes nerd stdout fopen printf
		4955	@c LocalWords: fclose dimension's malloc sizeof comp lang www eskimo com scs
		4956	@c LocalWords: faq html README Cilk SMP cilk POSIX Solaris BeOS MacOS mpi mcs
		4957	@c LocalWords: anl gov mutex THREADSAFE struct leq conj lt hbox istride lg rt
		4958	@c LocalWords: ostride subsubheading howmany idist odist exp IMG SRC gif IFFT
		4959	@c LocalWords: frftw dist datatype datatype's nc noindent callback putc getc
		4960	@c LocalWords: EOF Pentia PentiumPro's codelets CFLAGS cc Omax config org toc
		4961	@c LocalWords: pentium preconfigured egcs alloca fomit SEC gettimeofday MIN
		4962	@c LocalWords: picoseconds Caml gensrc genfft pauillac inria fr ocaml ftp sh
		4963	@c LocalWords: caml DoD NDSEG DMR HPC SMPs Gflops automake libtool Leiserson
		4964	@c LocalWords: Sterian Didier Miras Funda Ergun Reimer bfnn copyleft gpl
		4965	@c LocalWords: printindex LocalWords

Subversion Repositories shark

(root)/shark/branches/xen/ports/fftw/doc/fftw.tex - Rev 1618