Can you paste the disassembly of the GDC code and the G++ code?<div>I imagine there's something trivial in the scheduler that GDC has missed. For example, the other day I noticed GDC was unnecessarily generating a stack frame for leaf functions, which Iain has already fixed.</div>
<div><br></div><div>I'd also be interested to try out my experimental std.simd (portable) library in the context of your FFT... might give that a shot, I think it'll work well.</div><div><br></div><br><div class="gmail_quote">
On 25 January 2012 02:04, a <span dir="ltr">&lt;<a href="mailto:a@a.com">a@a.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Since SIMD types were added to D, I've ported an FFT that I was writing in C++ to D. The code is here:<br>
<br>
<a href="https://github.com/jerro/pfft" target="_blank">https://github.com/jerro/pfft</a><br>
<br>
Because DMD currently doesn't have an intrinsic for the SHUFPS instruction, I've included a version block with some GDC-specific code (this gave me a speedup of up to 80%). I've benchmarked the scalar and SSE versions of the code compiled with both DMD and GDC, and also the C++ code using SSE. The results are below. The left column is the base-two logarithm of the array size and the right column is GFLOPS, defined as the number of floating-point operations that the most basic FFT algorithm would perform divided by the time taken (the algorithm I used performs slightly fewer operations):<br>
<br>
GFLOPS = 5 n log2(n) / (time for one FFT in nanoseconds) (I took that definition from <a href="http://www.fftw.org/speed/" target="_blank">http://www.fftw.org/speed/</a> )<br>
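<br>
As a minimal sketch of that definition (written in Python purely for illustration; the names `gflops` and `time_per_fft_ns` and their parameters are assumptions and are not part of pfft), the conversion between a measured time and the reported figure could look like this:<br>
<br>
```python
import math


def gflops(n: int, time_ns: float) -> float:
    """Nominal FFT throughput: 5 * n * log2(n) / (time in nanoseconds).

    n is the array size (not its logarithm); time_ns is the time taken
    by one transform, in nanoseconds.
    """
    if not (n >= 2 and time_ns > 0):
        raise ValueError("need n >= 2 and a positive time")
    return 5.0 * n * math.log2(n) / time_ns


def time_per_fft_ns(n: int, reported_gflops: float) -> float:
    """Invert the definition: time per transform implied by a GFLOPS figure."""
    if not (n >= 2 and reported_gflops > 0):
        raise ValueError("need n >= 2 and a positive GFLOPS value")
    return 5.0 * n * math.log2(n) / reported_gflops
```
<br>
Under this definition, a figure of 13.5835 GFLOPS in the row labelled 10 (n = 1024) corresponds to roughly 3.77 microseconds per transform.<br>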
<br>
Chart: <a href="http://cloud.github.com/downloads/jerro/pfft/image.png" target="_blank">http://cloud.github.com/downloads/jerro/pfft/image.png</a><br>
<br>
Results:<br>
<br>
GDC SSE:<br>
<br>
2 0.833648<br>
3 1.23383<br>
4 6.92712<br>
5 8.93348<br>
6 10.9212<br>
7 11.9306<br>
8 12.5338<br>
9 13.4025<br>
10 13.5835<br>
11 13.6992<br>
12 13.493<br>
13 12.7082<br>
14 9.32621<br>
15 9.15256<br>
16 9.31431<br>
17 8.38154<br>
18 8.267<br>
19 7.61852<br>
20 7.14305<br>
21 7.01786<br>
22 6.58934<br>
<br>
G++ SSE:<br>
<br>
2 1.65933<br>
3 1.96071<br>
4 7.09683<br>
5 9.66308<br>
6 11.1498<br>
7 11.9315<br>
8 12.5712<br>
9 13.4241<br>
10 13.4907<br>
11 13.6524<br>
12 13.4215<br>
13 12.6472<br>
14 9.62755<br>
15 9.24289<br>
16 9.64412<br>
17 8.88006<br>
18 8.66819<br>
19 8.28623<br>
20 7.74581<br>
21 7.6395<br>
22 7.33506<br>
<br>
GDC scalar:<br>
<br>
2 0.808422<br>
3 1.20835<br>
4 2.66921<br>
5 2.81166<br>
6 2.99551<br>
7 3.26423<br>
8 3.61477<br>
9 3.90741<br>
10 4.04009<br>
11 4.20405<br>
12 4.21491<br>
13 4.30896<br>
14 3.79835<br>
15 3.80497<br>
16 3.94784<br>
17 3.98417<br>
18 3.58506<br>
19 3.33992<br>
20 3.42309<br>
21 3.21923<br>
22 3.25673<br>
<br>
DMD SSE:<br>
<br>
2 0.497946<br>
3 0.773551<br>
4 3.79912<br>
5 3.78027<br>
6 3.85155<br>
7 4.06491<br>
8 4.30895<br>
9 4.53038<br>
10 4.61006<br>
11 4.82098<br>
12 4.7455<br>
13 4.85332<br>
14 3.37768<br>
15 3.44962<br>
16 3.54049<br>
17 3.40236<br>
18 3.47339<br>
19 3.40212<br>
20 3.15997<br>
21 3.32644<br>
22 3.22767<br>
<br>
DMD scalar:<br>
<br>
2 0.478998<br>
3 0.772341<br>
4 1.6106<br>
5 1.68516<br>
6 1.7083<br>
7 1.70625<br>
8 1.68684<br>
9 1.66931<br>
10 1.66125<br>
11 1.63756<br>
12 1.61885<br>
13 1.60459<br>
14 1.402<br>
15 1.39665<br>
16 1.37894<br>
17 1.36306<br>
18 1.27189<br>
19 1.21033<br>
20 1.25719<br>
21 1.21315<br>
22 1.21606<br>
<br>
SIMD gives a speedup of between 2x and 3.5x for GDC-compiled code and between 2.5x and 3x for DMD. Code compiled with GDC is only slightly slower than G++ (and only for some values of n), which is really nice.<br>
</blockquote></div><br>