# Subversion Repositoriesshark

Rev
Rev Author Line No. Line
2 pj 1
`The following are a number of ideas for future work that we have`
2
`thought of, or which have been suggested to us.  Let us know`
3
`(fftw@theory.lcs.mit.edu) if you have other proposals, or if there is`
4
`something that you want to work on.`
5
` `
6
`* Implement some sort of Prime Factor algorithm (Temperton's?)  (PFA is`
7
`now used in the codelets.)`
8
` `
9
`* Try the Winograd blocks for the base cases. (We now use Rader's`
10
`algorithm for prime size codelets.)`
11
` `
12
`* Try on-the-fly generation of twiddle factors, to save space and`
13
`cache. (Done.  However, not yet enabled in the standard distribution.`
14
`The codelet generator is capable of generating code that either loads`
15
`or computes the twiddle factors, and the FFTW C code supports both`
16
`ways.  We do not have enough experimental numbers to determine which`
17
`way is faster, however)`
18
` `
19
`* Since we now have "strided wisdom," it would be nice to keep the`
20
`stride into account when planning 1D transform recursively.  We should`
21
`eliminate the planner table altogether, and just use the wisdom table`
22
`for planning.`
23
` `
24
`* Implement fast DCT and DST codes (cosine and sine transforms);`
25
`equivalently, implement fast algorithms for transforms of real/even`
26
`and real/odd data.  There are two parts to this: (i) modify the`
27
`codelet generator to output hard-coded transforms of small sizes [this`
28
`is done], and (ii) figure out & implement a recursive framework for`
29
`combining these codelets to achieve transforms of general lengths.`
30
`(Once this is done, implement multi-dimensional transforms, etcetera.)`
31
` `
32
`* Implement a library of convolution routines, windowing, filters,`
33
`etcetera based on FFTW.  As DSP isn't our field (we are interested in`
34
`FFTs for other reasons), this sort of thing is probably best left to`
35
`others.  Let us know if you're interested in writing such a thing,`
36
`though, and we'll be happy to link to your site and give you feedback.`
37
` `
38
`* Generate multi-dimensional codelets for use in two/three-dimensional`
39
`transforms.  (i.e. implement what is sometimes called a "vector-radix"`
40
`algorithm.)  There are potential cache benefits to this.`
41
` `
42
`* Take advantage of the vector instructions on the Pentium-III and`
43
`forthcoming PowerPC architectures.  (Coming from the old Cray vector`
44
`supercomputers and the horrible coding they encouraged, this seems`
45
`suspiciously like a giant step backwards in computer architectures...)`
46
`We'd like to see better gcc support before we do anything along these`
47
`lines, though.`
48
` `
49
`* In rfftw, implement a fast O(n lg n) algorithm for prime sizes and`
50
`large prime factors (currently, only the complex FFTW has fast`
51
`algorithms for prime sizes).  The basic problem is that we don't know`
52
`of any such algorithm specialized for real data; suggestions and/or`
53
`references are welcome.`
54
` `
55
`* In the MPI transforms, implement a parallel 1D transform for real`
56
`data (i.e. rfftw_mpi).  (Currently, there are only parallel 1D`
57
`transforms for complex data in the MPI code.)`
58
` `
59
`* In the MPI transforms, implement more sophisticated (i.e. faster)`
60
`in-place and out-of-place transpose routines for the in-process`
61
`transposes (used as subroutines by the distributed transpose).  The`
62
`current routines are quite simplistic, although it is not clear how`
63
`much they hurt performance.`