andrei at erdani.com
Mon Aug 2 10:52:05 PDT 2010
I'll defer answer to this to others, as I haven't used FFT for a long time.
I do remember, however, that the discrete cosine transform was actually
more popular in the circles I frequented. Would it be difficult to adapt
your implementation to offer dct?
David Simcha wrote:
> BTW, I've started thinking a little more about big picture issues here,
> and I'm debating whether it's a higher priority to improve performance
> on power of 2 sizes, or to try to support other radix values.
> There are two use cases for an FFT that I'm familiar with. The
> power-of-two limitation isn't severe in either of them.
> 1. Getting an idea of what the spectrum of a signal looks like. Here,
> it's common to pad with zeros because the plots become clearer looking,
> even if your FFT lib doesn't require it.
> 2. Computing a convolution. Here, padding with zeros is necessary
> anyhow to prevent the signal from being "interpreted" as periodic.
> Are there any major use cases where the power of two limitation is
> severe, or should I just focus on optimizing powers of 2 and call it a day?
> On Mon, Aug 2, 2010 at 10:23 AM, Don Clugston <dclugston at googlemail.com
> <mailto:dclugston at googlemail.com>> wrote:
> On 2 August 2010 15:41, David Simcha <dsimcha at gmail.com
> <mailto:dsimcha at gmail.com>> wrote:
> > Unfortunately I just downloaded the benchmark program for FFTW
> and my FFT is
> > a ton slower, depending on how you look at it. Using size 2^20 as my
> > benchmark, FFTW takes about 131 seconds to create its plan, even
> > -oestimate, the fastest planner. However, the plan can be reused if
> > performing multiple FFTs of the same size, and once the plan is
> created, it
> > can do an FFT of size 2^20 in about 53 milliseconds (which I find
> > unbelievable because even sorting an array of size 2^20 using a
> > well-optimized quick sort takes almost that long, and FFT seems
> like it
> > should should have a much larger constant than quick sort),
> compared to my
> > latest improvements to around 730 on the hardware I'm
> benchmarking on.
> > On the other hand, for one-off use cases, the lack of needing to
> create a
> > plan is a big win, both from a speed and a simplicity of API
> point of view.
> > Benchmarking against R, which doesn't appear to use plans,
> making the
> > comparison somewhat more relevant, things look better for my FFT:
> R takes
> > about 610 milliseconds for a size 2^20 pure real FFT.
> All you're seeing is the L2 cache. Did you see the link I posted to
> the NG about the internals of FFTW? The graph at the top shows FFTW is
> 40 times faster than the 'numerical recipes' code for 2^^20. So your
> factor of 13 isn't so bad. Based on that graph, if you reduce it to
> say 2^^15, the factor should drop significantly. Adding a little bit
> of cache awareness (using core.cpuid) should be able to avoid the
> performance cliff.
> Also, DMD's floating point optimiser is so primitive, you lose up to a
> factor of two immediately.
> phobos mailing list
> phobos at puremagic.com <mailto:phobos at puremagic.com>
> phobos mailing list
> phobos at puremagic.com
More information about the phobos