Slow performance compared to C++, ideas?

Manu turkeyman at gmail.com
Thu May 30 23:34:07 PDT 2013


On 31 May 2013 11:26, finalpatch <fengli at gmail.com> wrote:

> Recently I ported a simple ray tracer I wrote in C++11 to D. Thanks to the
> similarity between D and C++ it was almost a line by line translation, in
> other words, very close. However, the D version runs much slower than
> the C++11 version. On Windows, with MinGW GCC and GDC, the C++ version is
> twice as fast as the D version. On OSX, I used Clang++ and LDC, and the
> C++11 version was 4x faster than the D version. Since the comparisons were
> between compilers that share the same codegen backends I suppose that's a
> relatively fair comparison.  (flags used for GDC: -O3 -fno-bounds-check
> -frelease,  flags used for LDC: -O3 -release)
>
> I really like the features offered by D but it's the raw performance
> that's worrying me. From what I read D should offer similar performance
> when doing similar things but my own test results are not consistent with
> this claim. I want to know whether this slowness is inherent to the
> language or it's something I was not doing right (very possible because I
> have only a few days of experience with D).
>
> Below is the link to the D and C++ code, in case anyone is interested to
> have a look.
>
> https://dl.dropboxusercontent.com/u/974356/raytracer.d
> https://dl.dropboxusercontent.com/u/974356/raytracer.cpp
>

Can you paste the disassembly of the inner loop (trace()) for each compiler
in a pair, either G++/GDC or LDC/Clang++?

That said, I can see almost innumerable red flags (on basically every line).
The fact that it takes 200ms to render a frame in C++ (I would expect
<10ms) suggests that your approach is amazingly slow to begin with, at
which point I would start looking for much higher level problems.
Once you have an implementation that's approaching optimal, then we can
start making comparisons.

Here are some thoughts at first glance:
* The fact that you use STL makes me immediately concerned. Generic code
for this sort of work will never run well.
   That said, STL has decades more time spent optimising, so it stands to
reason that the C++ compiler will be able to do more to improve the STL
code.
* Your vector classes in both C++ and D are pretty nasty. Use 4d SIMD vectors.
* So many integer divisions!
* There are countless float <-> int casts.
* Innumerable redundant loads/stores.
* I would have raised the virtual-by-default travesty, but Andrei did it
for me! ;)
* intersect() should be __forceinline.
* intersect() is full of ifs (it's hard to predict whether the optimiser can
work across those ifs; maybe it can...)
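
To make a few of the points above concrete, here is a rough C++ sketch of a 16-byte-aligned 4-float vector and a ray/sphere intersect() that is small enough to inline and has only one early-out branch. Every name in it (Vec4, Ray, Sphere, radius2, intersect) is made up for illustration and is not taken from the posted raytracer. It also assumes the ray direction is normalised and the squared radius is precomputed. Note that plain `inline` is only a hint; forcing it needs a compiler-specific attribute such as __forceinline on MSVC or __attribute__((always_inline)) on GCC/Clang.

```cpp
#include <cmath>
#include <limits>

// Hypothetical 4-float vector. The 16-byte alignment matches a 128-bit
// SIMD register, which gives the compiler a chance to vectorise.
struct alignas(16) Vec4 {
    float x, y, z, w;
};

inline Vec4 operator-(Vec4 a, Vec4 b) {
    return {a.x - b.x, a.y - b.y, a.z - b.z, a.w - b.w};
}

inline float dot(Vec4 a, Vec4 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

// Assumption: dir is normalised and both w components are 0.
struct Ray {
    Vec4 origin;
    Vec4 dir;
};

// Assumption: radius2 holds the squared radius, computed once up front.
struct Sphere {
    Vec4 centre;
    float radius2;
};

// Returns the distance to the nearest hit in front of the ray origin,
// or +infinity on a miss, so the caller can simply keep the minimum.
inline float intersect(const Ray& r, const Sphere& s) {
    const float inf = std::numeric_limits<float>::infinity();
    const Vec4 l = s.centre - r.origin;
    const float tca = dot(l, r.dir);               // projection onto the ray
    const float disc = s.radius2 - (dot(l, l) - tca * tca);
    if (disc < 0.0f) return inf;                   // ray misses the sphere
    const float thc = std::sqrt(disc);
    const float t0 = tca - thc;                    // near root
    const float t1 = tca + thc;                    // far root
    const float t = t0 > 0.0f ? t0 : t1;           // simple selects, no nested ifs
    return t > 0.0f ? t : inf;
}
```

Returning +infinity for a miss means the trace() loop can track the closest hit with a plain comparison rather than a separate hit/miss flag. Whether this actually beats the original is something only the disassembly and a profile can tell.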

What's taking the most time?
The lighting loop is so template-tastic, I can't get a feel for how fast
that loop would be.
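
If you don't have a profiler handy, a crude way to answer that question is to wrap each stage in a timer. Below is a minimal sketch using std::chrono; time_ms is a hypothetical helper name, not something from the posted code.

```cpp
#include <chrono>

// Hypothetical helper: runs fn once and returns the elapsed wall-clock
// time in milliseconds, measured with a monotonic clock.
template <typename Fn>
double time_ms(Fn&& fn) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    fn();
    const auto stop = clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Wrapping the primary-ray pass and the lighting loop in separate time_ms calls and comparing the two numbers would at least show which half dominates, though a real sampling profiler will give a much finer picture.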

I believe the reason for the difference is not going to be so easily
revealed. It's probably hidden largely in the fact that C++ has had a good
decade of optimisation spent on STL over D.
It's also possible that the C++ compiler hooks many of those STL functions
as compiler intrinsics with internalised logic.

Frankly, this is a textbook example of why STL is the spawn of satan. For
some reason people are TAUGHT that it's reasonable to write code like this.