FFT in D (using SIMD) and benchmarks
Manu
turkeyman at gmail.com
Wed Jan 25 04:54:44 PST 2012
Can you paste disassemblies of the GDC code and the G++ code?
I imagine there's something trivial in the scheduler that GDC has missed.
Like the other day I noticed GDC was unnecessarily generating a stack frame
for leaf functions, which Iain already fixed.
I'd also be interested to try out my experimental std.simd (portable)
library in the context of your FFT... might give that a shot, I think it'll
work well.
On 25 January 2012 02:04, a <a at a.com> wrote:
> Since SIMD types were added to D I've ported an FFT that I was writing in
> C++ to D. The code is here:
>
> https://github.com/jerro/pfft
>
> Because DMD currently doesn't have an intrinsic for the SHUFPS instruction,
> I've included a version block with some GDC-specific code (this gave me a
> speedup of up to 80%). I've benchmarked the scalar and SSE versions of the
> code compiled with both DMD and GDC, and also the C++ code using SSE. The
> results are below. The left column is the base-two logarithm of the array
> size and the right column is GFLOPS, defined as the number of floating point
> operations that the most basic FFT algorithm would perform divided by the
> time taken (the algorithm I used performs slightly fewer operations):
>
> GFLOPS = 5 n log2(n) / (time for one FFT in nanoseconds) (I took that
> definition from http://www.fftw.org/speed/ )
>
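As a small illustration of the GFLOPS definition quoted above, here is a minimal Python sketch. The function names `fft_gflops` and `fft_time_ns` and the 4000 ns timing are made up for the example; they are not part of pfft or of the benchmark.

```python
import math


def fft_gflops(n, time_ns):
    """GFLOPS as defined in the post: 5 * n * log2(n) / (time in ns)."""
    return 5.0 * n * math.log2(n) / time_ns


def fft_time_ns(n, gflops):
    """Invert the definition: time for one FFT in ns, given a GFLOPS figure."""
    return 5.0 * n * math.log2(n) / gflops


# Hypothetical timing (not a measured value): a 1024-point FFT in 4000 ns.
n = 1024
print(f"{fft_gflops(n, 4000.0):.3f} GFLOPS")

# Going the other way with a figure from the results (GDC SSE, log2(n) = 10).
print(f"{fft_time_ns(n, 13.5835):.1f} ns per FFT")
```

Because the operation count is a nominal one, the figure is a normalized speed measure and not a count of the operations actually executed.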
> Chart: http://cloud.github.com/downloads/jerro/pfft/image.png
>
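For readers unfamiliar with the SHUFPS instruction mentioned in the post, the sketch below emulates its lane selection in plain Python. It is not taken from pfft: the helper name `shufps` and the sample data are invented for illustration, and whether pfft uses the instruction for exactly this kind of deinterleaving is an assumption.

```python
def shufps(a, b, imm8):
    """Emulate SHUFPS on two 4-float vectors given as lists (lane 0 first).

    The two low result lanes are picked from `a`, the two high lanes from `b`;
    each 2-bit field of imm8 selects the source lane.
    """
    return [
        a[imm8 & 3],
        a[(imm8 >> 2) & 3],
        b[(imm8 >> 4) & 3],
        b[(imm8 >> 6) & 3],
    ]


# Two vectors holding interleaved complex numbers: re0, im0, re1, im1, ...
v0 = [1.0, 10.0, 2.0, 20.0]
v1 = [3.0, 30.0, 4.0, 40.0]

# Select lanes 0 and 2 of each input to gather the real parts,
# and lanes 1 and 3 to gather the imaginary parts.
reals = shufps(v0, v1, 0b10001000)
imags = shufps(v0, v1, 0b11011101)
print(reals, imags)
```

Without such a shuffle, the same rearrangement has to be built from several other operations, which is consistent with the speedup the post reports from the GDC-specific version block.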
> Results:
>
> GDC SSE:
>
> 2 0.833648
> 3 1.23383
> 4 6.92712
> 5 8.93348
> 6 10.9212
> 7 11.9306
> 8 12.5338
> 9 13.4025
> 10 13.5835
> 11 13.6992
> 12 13.493
> 13 12.7082
> 14 9.32621
> 15 9.15256
> 16 9.31431
> 17 8.38154
> 18 8.267
> 19 7.61852
> 20 7.14305
> 21 7.01786
> 22 6.58934
>
> G++ SSE:
>
> 2 1.65933
> 3 1.96071
> 4 7.09683
> 5 9.66308
> 6 11.1498
> 7 11.9315
> 8 12.5712
> 9 13.4241
> 10 13.4907
> 11 13.6524
> 12 13.4215
> 13 12.6472
> 14 9.62755
> 15 9.24289
> 16 9.64412
> 17 8.88006
> 18 8.66819
> 19 8.28623
> 20 7.74581
> 21 7.6395
> 22 7.33506
>
> GDC scalar:
>
> 2 0.808422
> 3 1.20835
> 4 2.66921
> 5 2.81166
> 6 2.99551
> 7 3.26423
> 8 3.61477
> 9 3.90741
> 10 4.04009
> 11 4.20405
> 12 4.21491
> 13 4.30896
> 14 3.79835
> 15 3.80497
> 16 3.94784
> 17 3.98417
> 18 3.58506
> 19 3.33992
> 20 3.42309
> 21 3.21923
> 22 3.25673
>
> DMD SSE:
>
> 2 0.497946
> 3 0.773551
> 4 3.79912
> 5 3.78027
> 6 3.85155
> 7 4.06491
> 8 4.30895
> 9 4.53038
> 10 4.61006
> 11 4.82098
> 12 4.7455
> 13 4.85332
> 14 3.37768
> 15 3.44962
> 16 3.54049
> 17 3.40236
> 18 3.47339
> 19 3.40212
> 20 3.15997
> 21 3.32644
> 22 3.22767
>
> DMD scalar:
>
> 2 0.478998
> 3 0.772341
> 4 1.6106
> 5 1.68516
> 6 1.7083
> 7 1.70625
> 8 1.68684
> 9 1.66931
> 10 1.66125
> 11 1.63756
> 12 1.61885
> 13 1.60459
> 14 1.402
> 15 1.39665
> 16 1.37894
> 17 1.36306
> 18 1.27189
> 19 1.21033
> 20 1.25719
> 21 1.21315
> 22 1.21606
>
> SIMD gives a speedup of between 2x and 3.5x for GDC-compiled code and
> between 2.5x and 3x for DMD. Code compiled with GDC is only slightly slower
> than G++ (and only for some values of n), which is really nice.
>
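The speedup ranges quoted above can be checked against the posted tables. The sketch below copies the GFLOPS figures from the results and divides SSE by scalar per size. The helper name `speedup_range` is made up for the example. Sizes below 2^4 are skipped, since there the SSE and scalar figures are nearly identical.

```python
# GFLOPS figures copied from the results above, for log2(n) = 2 .. 22.
gdc_sse = [0.833648, 1.23383, 6.92712, 8.93348, 10.9212, 11.9306, 12.5338,
           13.4025, 13.5835, 13.6992, 13.493, 12.7082, 9.32621, 9.15256,
           9.31431, 8.38154, 8.267, 7.61852, 7.14305, 7.01786, 6.58934]
gdc_scalar = [0.808422, 1.20835, 2.66921, 2.81166, 2.99551, 3.26423, 3.61477,
              3.90741, 4.04009, 4.20405, 4.21491, 4.30896, 3.79835, 3.80497,
              3.94784, 3.98417, 3.58506, 3.33992, 3.42309, 3.21923, 3.25673]
dmd_sse = [0.497946, 0.773551, 3.79912, 3.78027, 3.85155, 4.06491, 4.30895,
           4.53038, 4.61006, 4.82098, 4.7455, 4.85332, 3.37768, 3.44962,
           3.54049, 3.40236, 3.47339, 3.40212, 3.15997, 3.32644, 3.22767]
dmd_scalar = [0.478998, 0.772341, 1.6106, 1.68516, 1.7083, 1.70625, 1.68684,
              1.66931, 1.66125, 1.63756, 1.61885, 1.60459, 1.402, 1.39665,
              1.37894, 1.36306, 1.27189, 1.21033, 1.25719, 1.21315, 1.21606]

FIRST_LOG2 = 2  # the first table row is log2(n) = 2


def speedup_range(simd, scalar, min_log2=4):
    """Return (min, max) of simd/scalar over sizes with log2(n) >= min_log2."""
    start = min_log2 - FIRST_LOG2
    ratios = [v / s for v, s in zip(simd[start:], scalar[start:])]
    return min(ratios), max(ratios)


gdc_lo, gdc_hi = speedup_range(gdc_sse, gdc_scalar)
dmd_lo, dmd_hi = speedup_range(dmd_sse, dmd_scalar)
print(f"GDC: {gdc_lo:.2f}x to {gdc_hi:.2f}x")
print(f"DMD: {dmd_lo:.2f}x to {dmd_hi:.2f}x")
```

With the posted figures this comes out at roughly 2.0x to 3.7x for GDC and 2.2x to 3.0x for DMD, close to the ranges quoted above.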