[phobos] FFT

Mon Aug 2 10:43:10 PDT 2010

BTW, I've started thinking a little more about big picture issues here, and
I'm debating whether it's a higher priority to improve performance on power
of 2 sizes, or to try to support other radix values.

There are two use cases for an FFT that I'm familiar with.  The power-of-two
limitation isn't severe in either of them.

1.  Getting an idea of what the spectrum of a signal looks like.  Here, it's
common to pad with zeros because the plots become clearer looking, even if
your FFT lib doesn't require it.

2.  Computing a convolution.  Here, padding with zeros is necessary anyhow
to prevent the signal from being "interpreted" as periodic.

Are there any major use cases where the power of two limitation is severe,
or should I just focus on optimizing powers of 2 and call it a day?

On Mon, Aug 2, 2010 at 10:23 AM, Don Clugston <dclugston at googlemail.com>wrote:

> On 2 August 2010 15:41, David Simcha <dsimcha at gmail.com> wrote:
> > Unfortunately I just downloaded the benchmark program for FFTW and my FFT
> is
> > a ton slower, depending on how you look at it.  Using size 2^20 as my
> > benchmark, FFTW takes about 131 seconds to create its plan, even using
> > -oestimate, the fastest planner.  However, the plan can be reused if
> > performing multiple FFTs of the same size, and once the plan is created,
> it
> > can do an FFT of size 2^20 in about 53 milliseconds (which I find almost
> > unbelievable because even sorting an array of size 2^20 using a
> > well-optimized quick sort takes almost that long, and FFT seems like it
> > should should have a much larger constant than quick sort), compared to
> my
> > latest improvements to around 730 on the hardware I'm benchmarking on.
> >
> > On the other hand, for one-off use cases, the lack of needing to create a
> > plan is a big win, both from a speed and a simplicity of API point of
> view.
> >  Benchmarking against R, which doesn't appear to use plans, making the
> > comparison somewhat more relevant, things look better for my FFT:  R
> takes
> > about 610 milliseconds for a size 2^20 pure real FFT.
>
> All you're seeing is the L2 cache. Did you see the link I posted to
> the NG about the internals of FFTW? The graph at the top shows FFTW is
> 40 times faster than the 'numerical recipes' code for 2^^20. So your
> factor of 13 isn't so bad. Based on that graph, if you reduce it to
> say 2^^15, the factor should drop significantly. Adding a little bit
> of cache awareness (using core.cpuid) should be able to avoid the
> performance cliff.
> Also, DMD's floating point optimiser is so primitive, you lose up to a
> factor of two immediately.
> _______________________________________________
> phobos mailing list
> phobos at puremagic.com
> http://lists.puremagic.com/mailman/listinfo/phobos
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.puremagic.com/pipermail/phobos/attachments/20100802/36a79e68/attachment.html>