[phobos] FFT

David Simcha dsimcha at gmail.com
Mon Aug 2 10:49:41 PDT 2010


Oh, also, I don't think that cache effects are the main bottleneck because
switching to single-precision floats for both input and output has a
negligible effect on performance even though it cuts the size of the working
set in half.

On Mon, Aug 2, 2010 at 10:23 AM, Don Clugston <dclugston at googlemail.com>wrote:

> On 2 August 2010 15:41, David Simcha <dsimcha at gmail.com> wrote:
> > Unfortunately I just downloaded the benchmark program for FFTW and my FFT
> is
> > a ton slower, depending on how you look at it.  Using size 2^20 as my
> > benchmark, FFTW takes about 131 seconds to create its plan, even using
> > -oestimate, the fastest planner.  However, the plan can be reused if
> > performing multiple FFTs of the same size, and once the plan is created,
> it
> > can do an FFT of size 2^20 in about 53 milliseconds (which I find almost
> > unbelievable because even sorting an array of size 2^20 using a
> > well-optimized quick sort takes almost that long, and FFT seems like it
> > should should have a much larger constant than quick sort), compared to
> my
> > latest improvements to around 730 on the hardware I'm benchmarking on.
> >
> > On the other hand, for one-off use cases, the lack of needing to create a
> > plan is a big win, both from a speed and a simplicity of API point of
> view.
> >  Benchmarking against R, which doesn't appear to use plans, making the
> > comparison somewhat more relevant, things look better for my FFT:  R
> takes
> > about 610 milliseconds for a size 2^20 pure real FFT.
>
> All you're seeing is the L2 cache. Did you see the link I posted to
> the NG about the internals of FFTW? The graph at the top shows FFTW is
> 40 times faster than the 'numerical recipes' code for 2^^20. So your
> factor of 13 isn't so bad. Based on that graph, if you reduce it to
> say 2^^15, the factor should drop significantly. Adding a little bit
> of cache awareness (using core.cpuid) should be able to avoid the
> performance cliff.
> Also, DMD's floating point optimiser is so primitive, you lose up to a
> factor of two immediately.
> _______________________________________________
> phobos mailing list
> phobos at puremagic.com
> http://lists.puremagic.com/mailman/listinfo/phobos
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.puremagic.com/pipermail/phobos/attachments/20100802/4a96ba4f/attachment.html>


More information about the phobos mailing list