<div dir="ltr">On 31 May 2013 11:26, finalpatch <span dir="ltr">&lt;<a href="mailto:fengli@gmail.com" target="_blank">fengli@gmail.com</a>&gt;</span> wrote:<br><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
Recently I ported a simple ray tracer I wrote in C++11 to D. Thanks to the similarity between D and C++ it was almost a line-by-line translation; in other words, very, very close. However, the D version runs much slower than the C++11 version. On Windows, with MinGW GCC and GDC, the C++ version is twice as fast as the D version. On OS X, I used Clang++ and LDC, and the C++11 version was 4x faster than the D version. Since the comparisons were between compilers that share the same codegen backends, I suppose they are relatively fair. (Flags used for GDC: -O3 -fno-bounds-check -frelease; flags used for LDC: -O3 -release.)<br>
<br>
I really like the features offered by D, but it's the raw performance that worries me. From what I've read, D should offer similar performance when doing similar things, but my own test results are not consistent with this claim. I want to know whether this slowness is inherent to the language or something I'm doing wrong (very possible, because I have only a few days of experience with D).<br>
<br>
Below are links to the D and C++ code, in case anyone is interested in having a look.<br>
<br>
<a href="https://dl.dropboxusercontent.com/u/974356/raytracer.d" target="_blank">https://dl.dropboxusercontent.com/u/974356/raytracer.d</a><br>
<a href="https://dl.dropboxusercontent.com/u/974356/raytracer.cpp" target="_blank">https://dl.dropboxusercontent.com/u/974356/raytracer.cpp</a><br>
</blockquote><div><br></div><div style>Can you paste the disassembly of the inner loop (trace()) for each compiler pair (G++/GDC and Clang++/LDC)?</div><div style><br></div><div style>That said, I can see almost innumerable red flags (on basically every line).</div>
<div style>The fact that it takes 200ms to render a frame in C++ (I would expect &lt;10ms) suggests that your approach is amazingly slow to begin with, at which point I would start looking for much higher-level problems.</div>
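<div>Frame-time figures like the 200ms above are easiest to compare when measured the same way for both builds. The following is a minimal C++ sketch, assuming a monotonic clock and averaging over several frames; <code>render_frame</code> and <code>average_frame_ms</code> are hypothetical names, and <code>render_frame</code> only fills a gradient as a stand-in for the real tracing work.</div>

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one frame of ray tracing: fills the framebuffer
// with a simple gradient so that there is some work to time.
void render_frame(std::vector<float>& framebuffer, int width, int height) {
    const float inv = 1.0f / static_cast<float>(width + height);  // computed once per frame
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            framebuffer[static_cast<std::size_t>(y) * width + x] =
                static_cast<float>(x + y) * inv;
}

// Renders `frames` frames into `framebuffer` and returns the average
// wall-clock time per frame in milliseconds, using a monotonic clock.
double average_frame_ms(std::vector<float>& framebuffer, int width, int height, int frames) {
    assert(frames > 0 && width > 0 && height > 0);
    assert(framebuffer.size() == static_cast<std::size_t>(width) * height);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < frames; ++i)
        render_frame(framebuffer, width, height);
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / frames;
}
```

<div>For scale: 200ms per frame is 1000 / 200 = 5 frames per second, whereas a budget under 10ms means more than 100 frames per second, a gap of over 20x.</div>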
<div style>Once you have an implementation that's approaching optimal, then we can start making comparisons.</div><div style><br></div><div style>Here are some thoughts at first glance:</div><div>* The fact that you use STL makes me immediately concerned. Generic code for this sort of work will never run well.</div>
<div> That said, the STL has had decades more optimisation work, so it stands to reason that the C++ compiler will be able to do more to improve the STL code.</div><div>* Your vector classes in both C++ and D are pretty nasty. Use 4D SIMD vectors.<br>
</div><div>* So many integer divisions!<br></div><div>* There are countless float &lt;-&gt; int casts.<br></div><div style>* Innumerable redundant loads/stores.</div><div style><div>* I would have raised the virtual-by-default travesty, but Andrei did it for me! ;)</div>
<div style>* intersect() should be __forceinline.</div><div style>* intersect() is full of ifs (it's hard to predict whether the optimiser can work across those branches; maybe it can...)</div><div><br></div><div style>
What's taking the most time?</div><div style>The lighting loop is so template-tastic that I can't get a feel for how fast it would be.</div><div style><br></div><div style>I believe the reason for the difference will not be so easily revealed. It's probably hidden largely in the fact that C++ has had a good decade's head start over D in optimising the STL.</div>
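<div>As a rough illustration of several of the points above (a sketch, not a patch to the linked raytracer, whose code is not reproduced here): a plain 16-byte, 4-float vector with no virtual functions, inline free-function operators, and a per-frame loop where both reciprocals are computed once so that the inner loop has no divisions and no per-pixel int to float conversions. <code>Vec4</code>, <code>dot3</code> and <code>shade_grid</code> are hypothetical names. Explicit SSE/NEON intrinsics are deliberately left out to keep it portable, so whether <code>Vec4</code> actually lands in a single SIMD register is up to the optimiser.</div>

```cpp
#include <cassert>

// Hypothetical 4-wide float vector, padded and aligned to 16 bytes so one
// instance matches a 128-bit SIMD register (SSE on x86, NEON on ARM).
// The w lane is padding. No virtual functions, so there is no vtable pointer
// and every operation below is a candidate for inlining.
struct alignas(16) Vec4 {
    float x, y, z, w;
};

static_assert(sizeof(Vec4) == 16, "Vec4 should occupy exactly 16 bytes");

inline Vec4 operator+(Vec4 a, Vec4 b) {
    return {a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w};
}

inline Vec4 operator*(Vec4 a, float s) {
    return {a.x * s, a.y * s, a.z * s, a.w * s};
}

// 3-component dot product; the padding lane is ignored.
inline float dot3(Vec4 a, Vec4 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Hypothetical per-frame loop: sums dot3(direction, light_dir) over a grid.
// Both reciprocals are computed once, so the inner loop has no divisions, and
// pixel centres are tracked in float counters (fx, fy) instead of converting
// the integer loop indices on every iteration.
float shade_grid(int width, int height, Vec4 light_dir) {
    assert(width > 0 && height > 0);
    const float inv_w = 1.0f / static_cast<float>(width);
    const float inv_h = 1.0f / static_cast<float>(height);
    float total = 0.0f;
    float fy = 0.5f;  // centre of the first row
    for (int y = 0; y < height; ++y, fy += 1.0f) {
        float fx = 0.5f;  // centre of the first column
        for (int x = 0; x < width; ++x, fx += 1.0f) {
            const Vec4 dir{fx * inv_w, fy * inv_h, 1.0f, 0.0f};
            total += dot3(dir, light_dir);
        }
    }
    return total;
}
```

<div>Note that <code>inline</code> is only a hint; MSVC's <code>__forceinline</code> or GCC/Clang's <code>__attribute__((always_inline))</code> are the stronger forms. On the D side, <code>core.simd</code> offers explicit vector types if relying on the optimiser proves insufficient.</div>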
<div style>It's also possible that the C++ compiler treats many of those STL functions as intrinsics, with built-in knowledge of what they do.</div><div style><br></div><div style>Frankly, this is a textbook example of why the STL is the spawn of Satan. For some reason, people are TAUGHT that it's reasonable to write code like this.</div>
</div></div></div></div>