Slow performance compared to C++, ideas?

Thu May 30 22:35:57 PDT 2013

On 05/31/2013 02:15 AM, nazriel wrote:
> On Friday, 31 May 2013 at 01:26:13 UTC, finalpatch wrote:
>> Recently I ported a simple ray tracer I wrote in C++11 to D. Thanks to the similarity between D and C++ it was almost
>> a line-by-line translation, in other words, very, very close. However, the D version runs much slower than the C++11
>> version. On Windows, with MinGW GCC and GDC, the C++ version is twice as fast as the D version. On OSX, I used Clang++
>> and LDC, and the C++11 version was 4x faster than the D version. Since the comparisons were between compilers that
>> share the same codegen backends, I suppose that's a relatively fair comparison. (Flags used for GDC: -O3
>> -fno-bounds-check -frelease; flags used for LDC: -O3 -release.)
>>
>> I really like the features offered by D, but it's the raw performance that's worrying me. From what I read, D should
>> offer similar performance when doing similar things, but my own test results are not consistent with this claim. I want
>> to know whether this slowness is inherent to the language or whether it's something I was not doing right (very
>> possible, because I have only a few days of experience with D).
>>
>> Below are the links to the D and C++ code, in case anyone is interested in having a look.
>>
>> https://dl.dropboxusercontent.com/u/974356/raytracer.d
>> https://dl.dropboxusercontent.com/u/974356/raytracer.cpp
> 
> Greetings.
> 
> After a few quick changes I managed to get these results:
> [raz@d3 tmp]$ ./a.out
> rendering time 276 ms
> [raz@d3 tmp]$ ./test
> 346 ms, 814 μs, and 5 hnsecs
> 
> 
> ./a.out is the binary compiled with: clang++ ./test.cxx -std=c++11 -lSDL -O3
> ./test is the binary compiled with: ldmd2 -O3 -release -inline -noboundscheck ./test.d (Actually I used rdmd with
> --compiler=ldmd2, but I omitted that because it made for a rather long command line :p)
> 
> 
> Here is source code with changes I applied to D-code (I hope you don't mind repasting it): http://dpaste.dzfl.pl/84bb308d
> 
> I am sure there is plenty of room for further improvement, enough to at least match the C++ performance.

You might also try changing:

            float[3] t = mixin("v[]"~op~"rhs.v[]");
            return Vec3(t[0], t[1], t[2]);

to:

            Vec3 t;
            t.v[0] = mixin("v[0] "~op~" rhs.v[0]");
            t.v[1] = mixin("v[1] "~op~" rhs.v[1]");
            t.v[2] = mixin("v[2] "~op~" rhs.v[2]");
            return t;

and so on, avoiding the float[3] temporary and the v[] array operations
(which would loop unless the compiler/optimizer unrolls them; I didn't check).

I tested this change (removing the v[] ops) in Vec3 and in
normalize(), and it made your version slightly faster
with DMD (I didn't check with ldmd2).
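
To make the suggestion concrete, here is a minimal, self-contained sketch of
what the per-component version might look like. It is not the code from the
linked raytracer.d: the struct layout (a float[3] field named v) is taken from
the snippet above, while the constructor, the template constraint and main()
are assumptions added only so the example compiles on its own.

```d
import std.stdio;

// Hypothetical stand-in for the ray tracer's Vec3; only the `v` field
// is taken from the snippet above.
struct Vec3
{
    float[3] v;

    this(float x, float y, float z)
    {
        v[0] = x;
        v[1] = y;
        v[2] = z;
    }

    // Element-wise +, -, * and / written out per component, so no
    // float[3] temporary and no v[] array operation is involved.
    Vec3 opBinary(string op)(in Vec3 rhs) const
        if (op == "+" || op == "-" || op == "*" || op == "/")
    {
        Vec3 t;
        t.v[0] = mixin("v[0] " ~ op ~ " rhs.v[0]");
        t.v[1] = mixin("v[1] " ~ op ~ " rhs.v[1]");
        t.v[2] = mixin("v[2] " ~ op ~ " rhs.v[2]");
        return t;
    }
}

void main()
{
    auto a = Vec3(1, 2, 3);
    auto b = Vec3(4, 5, 6);

    auto sum = a + b;
    auto prod = a * b;

    // These values are exactly representable as floats.
    assert(sum.v == [5.0f, 7.0f, 9.0f]);
    assert(prod.v == [4.0f, 10.0f, 18.0f]);

    writeln(sum.v, " ", prod.v);
}
```

Whether this is actually faster depends on the compiler and flags, so it is
worth timing both variants with the same build options before keeping either.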

--jm