[phobos] FFT

Mon Aug 2 10:52:05 PDT 2010

I'll defer answer to this to others, as I haven't used FFT for a long time.

I do remember, however, that the discrete cosine transform was actually 
more popular in the circles I frequented. Would it be difficult to adapt 
your implementation to offer dct?

Andrei

David Simcha wrote:
> BTW, I've started thinking a little more about big picture issues here, 
> and I'm debating whether it's a higher priority to improve performance 
> on power of 2 sizes, or to try to support other radix values.
> 
> There are two use cases for an FFT that I'm familiar with.  The 
> power-of-two limitation isn't severe in either of them.
> 
> 1.  Getting an idea of what the spectrum of a signal looks like.  Here, 
> it's common to pad with zeros because the plots become clearer looking, 
> even if your FFT lib doesn't require it.
> 
> 2.  Computing a convolution.  Here, padding with zeros is necessary 
> anyhow to prevent the signal from being "interpreted" as periodic.
> 
> Are there any major use cases where the power of two limitation is 
> severe, or should I just focus on optimizing powers of 2 and call it a day?
> 
> On Mon, Aug 2, 2010 at 10:23 AM, Don Clugston <dclugston at googlemail.com 
> <mailto:dclugston at googlemail.com>> wrote:
> 
>     On 2 August 2010 15:41, David Simcha <dsimcha at gmail.com
>     <mailto:dsimcha at gmail.com>> wrote:
>      > Unfortunately I just downloaded the benchmark program for FFTW
>     and my FFT is
>      > a ton slower, depending on how you look at it.  Using size 2^20 as my
>      > benchmark, FFTW takes about 131 seconds to create its plan, even
>     using
>      > -oestimate, the fastest planner.  However, the plan can be reused if
>      > performing multiple FFTs of the same size, and once the plan is
>     created, it
>      > can do an FFT of size 2^20 in about 53 milliseconds (which I find
>     almost
>      > unbelievable because even sorting an array of size 2^20 using a
>      > well-optimized quick sort takes almost that long, and FFT seems
>     like it
>      > should should have a much larger constant than quick sort),
>     compared to my
>      > latest improvements to around 730 on the hardware I'm
>     benchmarking on.
>      >
>      > On the other hand, for one-off use cases, the lack of needing to
>     create a
>      > plan is a big win, both from a speed and a simplicity of API
>     point of view.
>      >  Benchmarking against R, which doesn't appear to use plans,
>     making the
>      > comparison somewhat more relevant, things look better for my FFT:
>      R takes
>      > about 610 milliseconds for a size 2^20 pure real FFT.
> 
>     All you're seeing is the L2 cache. Did you see the link I posted to
>     the NG about the internals of FFTW? The graph at the top shows FFTW is
>     40 times faster than the 'numerical recipes' code for 2^^20. So your
>     factor of 13 isn't so bad. Based on that graph, if you reduce it to
>     say 2^^15, the factor should drop significantly. Adding a little bit
>     of cache awareness (using core.cpuid) should be able to avoid the
>     performance cliff.
>     Also, DMD's floating point optimiser is so primitive, you lose up to a
>     factor of two immediately.
>     _______________________________________________
>     phobos mailing list
>     phobos at puremagic.com <mailto:phobos at puremagic.com>
>     http://lists.puremagic.com/mailman/listinfo/phobos
> 
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> phobos mailing list
> phobos at puremagic.com
> http://lists.puremagic.com/mailman/listinfo/phobos