WebSVN - shark - Blame - Rev 1618 - /shark/branches/xen/ports/fftw/todo

Rev	Author	Line No.	Line
2	pj	1	The following are a number of ideas for future work that we have
		2	thought of, or which have been suggested to us. Let us know
		3	(fftw@theory.lcs.mit.edu) if you have other proposals, or if there is
		4	something that you want to work on.
		5
		6	* Implement some sort of Prime Factor algorithm (Temperton's?) (PFA is
		7	now used in the codelets.)
		8
		9	* Try the Winograd blocks for the base cases. (We now use Rader's
		10	algorithm for prime size codelets.)
		11
		12	* Try on-the-fly generation of twiddle factors, to save space and
		13	cache. (Done. However, not yet enabled in the standard distribution.
		14	The codelet generator is capable of generating code that either loads
		15	or computes the twiddle factors, and the FFTW C code supports both
		16	ways. We do not have enough experimental numbers to determine which
		17	way is faster, however)
		18
		19	* Since we now have "strided wisdom," it would be nice to keep the
		20	stride into account when planning 1D transform recursively. We should
		21	eliminate the planner table altogether, and just use the wisdom table
		22	for planning.
		23
		24	* Implement fast DCT and DST codes (cosine and sine transforms);
		25	equivalently, implement fast algorithms for transforms of real/even
		26	and real/odd data. There are two parts to this: (i) modify the
		27	codelet generator to output hard-coded transforms of small sizes [this
		28	is done], and (ii) figure out & implement a recursive framework for
		29	combining these codelets to achieve transforms of general lengths.
		30	(Once this is done, implement multi-dimensional transforms, etcetera.)
		31
		32	* Implement a library of convolution routines, windowing, filters,
		33	etcetera based on FFTW. As DSP isn't our field (we are interested in
		34	FFTs for other reasons), this sort of thing is probably best left to
		35	others. Let us know if you're interested in writing such a thing,
		36	though, and we'll be happy to link to your site and give you feedback.
		37
		38	* Generate multi-dimensional codelets for use in two/three-dimensional
		39	transforms. (i.e. implement what is sometimes called a "vector-radix"
		40	algorithm.) There are potential cache benefits to this.
		41
		42	* Take advantage of the vector instructions on the Pentium-III and
		43	forthcoming PowerPC architectures. (Coming from the old Cray vector
		44	supercomputers and the horrible coding they encouraged, this seems
		45	suspiciously like a giant step backwards in computer architectures...)
		46	We'd like to see better gcc support before we do anything along these
		47	lines, though.
		48
		49	* In rfftw, implement a fast O(n lg n) algorithm for prime sizes and
		50	large prime factors (currently, only the complex FFTW has fast
		51	algorithms for prime sizes). The basic problem is that we don't know
		52	of any such algorithm specialized for real data; suggestions and/or
		53	references are welcome.
		54
		55	* In the MPI transforms, implement a parallel 1D transform for real
		56	data (i.e. rfftw_mpi). (Currently, there are only parallel 1D
		57	transforms for complex data in the MPI code.)
		58
		59	* In the MPI transforms, implement more sophisticated (i.e. faster)
		60	in-place and out-of-place transpose routines for the in-process
		61	transposes (used as subroutines by the distributed transpose). The
		62	current routines are quite simplistic, although it is not clear how
		63	much they hurt performance.

Subversion Repositories shark

(root)/shark/branches/xen/ports/fftw/todo - Rev 1618