FFT in D (using SIMD) and benchmarks

Manu turkeyman at gmail.com
Wed Jan 25 04:54:44 PST 2012


Can you paste disassemblies of the GDC code and the G++ code?
I imagine there's something trivial in the scheduler that GDC has missed.
Like the other day I noticed GDC was unnecessarily generating a stack frame
for leaf functions, which Iain already fixed.

I'd also be interested to try out my experimental std.simd (portable)
library in the context of your FFT... might give that a shot, I think it'll
work well.


On 25 January 2012 02:04, a <a at a.com> wrote:

> Since SIMD types were added to D I've ported an FFT that I was writing in
> C++ to D. The code is here:
>
> https://github.com/jerro/pfft
>
> Because DMD currently doesn't have an intrinsic for the SHUFPS instruction,
> I've included a version block with some GDC-specific code (this gave me a
> speedup of up to 80%). I've benchmarked the scalar and SSE versions of the
> code compiled with both DMD and GDC, and also the C++ code using SSE. The
> results are below. The left column is the base-two logarithm of the array
> size and the right column is GFLOPS, defined as the number of floating
> point operations that the most basic FFT algorithm would perform divided
> by the time taken (the algorithm I used performs slightly fewer
> operations):
>
> GFLOPS = 5 n log2(n) / (time for one FFT in nanoseconds)   (I took that
> definition from http://www.fftw.org/speed/ )
>
> Chart: http://cloud.github.com/downloads/jerro/pfft/image.png
>
> Results:
>
> GDC SSE:
>
> 2       0.833648
> 3       1.23383
> 4       6.92712
> 5       8.93348
> 6       10.9212
> 7       11.9306
> 8       12.5338
> 9       13.4025
> 10      13.5835
> 11      13.6992
> 12      13.493
> 13      12.7082
> 14      9.32621
> 15      9.15256
> 16      9.31431
> 17      8.38154
> 18      8.267
> 19      7.61852
> 20      7.14305
> 21      7.01786
> 22      6.58934
>
> G++ SSE:
>
> 2       1.65933
> 3       1.96071
> 4       7.09683
> 5       9.66308
> 6       11.1498
> 7       11.9315
> 8       12.5712
> 9       13.4241
> 10      13.4907
> 11      13.6524
> 12      13.4215
> 13      12.6472
> 14      9.62755
> 15      9.24289
> 16      9.64412
> 17      8.88006
> 18      8.66819
> 19      8.28623
> 20      7.74581
> 21      7.6395
> 22      7.33506
>
> GDC scalar:
>
> 2       0.808422
> 3       1.20835
> 4       2.66921
> 5       2.81166
> 6       2.99551
> 7       3.26423
> 8       3.61477
> 9       3.90741
> 10      4.04009
> 11      4.20405
> 12      4.21491
> 13      4.30896
> 14      3.79835
> 15      3.80497
> 16      3.94784
> 17      3.98417
> 18      3.58506
> 19      3.33992
> 20      3.42309
> 21      3.21923
> 22      3.25673
>
> DMD SSE:
>
> 2       0.497946
> 3       0.773551
> 4       3.79912
> 5       3.78027
> 6       3.85155
> 7       4.06491
> 8       4.30895
> 9       4.53038
> 10      4.61006
> 11      4.82098
> 12      4.7455
> 13      4.85332
> 14      3.37768
> 15      3.44962
> 16      3.54049
> 17      3.40236
> 18      3.47339
> 19      3.40212
> 20      3.15997
> 21      3.32644
> 22      3.22767
>
> DMD scalar:
>
> 2       0.478998
> 3       0.772341
> 4       1.6106
> 5       1.68516
> 6       1.7083
> 7       1.70625
> 8       1.68684
> 9       1.66931
> 10      1.66125
> 11      1.63756
> 12      1.61885
> 13      1.60459
> 14      1.402
> 15      1.39665
> 16      1.37894
> 17      1.36306
> 18      1.27189
> 19      1.21033
> 20      1.25719
> 21      1.21315
> 22      1.21606
>
> SIMD gives a speedup of between 2x and 3.5x for GDC-compiled code and
> between 2.5x and 3x for DMD. Code compiled with GDC is only slightly slower
> than G++ (and only for some values of n), which is really nice.
>
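
For reference, the GFLOPS definition quoted above (5 n log2(n) divided by the time for one FFT in nanoseconds) can be sketched in a few lines of C++. This is only an illustration of the formula, not code from pfft: the function names nominal_flops, gflops and time_ns_from_gflops are hypothetical, and the left column of the tables is assumed to be log2(n) as the post states.

```cpp
#include <cassert>
#include <cstdint>

// Nominal operation count of a basic FFT of size n = 2^log2_n, using the
// 5 n log2(n) convention from the quoted definition. Hypothetical helper.
double nominal_flops(unsigned log2_n) {
    assert(log2_n < 63);  // keep the shift below within 64 bits
    const double n = static_cast<double>(std::uint64_t{1} << log2_n);
    return 5.0 * n * static_cast<double>(log2_n);
}

// GFLOPS for one transform that took time_ns nanoseconds.
// One operation per nanosecond is 1e9 operations per second, i.e. 1 GFLOPS,
// so no extra scaling factor is needed.
double gflops(unsigned log2_n, double time_ns) {
    assert(time_ns > 0.0);
    return nominal_flops(log2_n) / time_ns;
}

// Inverse: the time per transform (in ns) implied by a reported GFLOPS value.
double time_ns_from_gflops(unsigned log2_n, double reported_gflops) {
    assert(reported_gflops > 0.0);
    return nominal_flops(log2_n) / reported_gflops;
}
```

Read this way, the GDC SSE figure of 13.5835 at log2(n) = 10 corresponds to roughly 3.8 microseconds per 1024-point transform. Note that the metric uses the nominal operation count, so an algorithm that performs fewer operations still gets credited with 5 n log2(n).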