Slow performance compared to C++, ideas?

Thu May 30 22:42:53 PDT 2013

On Friday, 31 May 2013 at 05:35:58 UTC, Juan Manuel Cabo wrote:
> On 05/31/2013 02:15 AM, nazriel wrote:
>> On Friday, 31 May 2013 at 01:26:13 UTC, finalpatch wrote:
>>> Recently I ported a simple ray tracer I wrote in C++11 to D. 
>>> Thanks to the similarity between D and C++ it was almost
>>> a line-by-line translation; in other words, very close. 
>>> However, the D version runs much slower than the C++11
>>> version. On Windows, with MinGW GCC and GDC, the C++ version 
>>> is twice as fast as the D version. On OSX, I used Clang++
>>> and LDC, and the C++11 version was 4x faster than the D version.  
>>> Since the comparisons were between compilers that share
>>> the same codegen backends I suppose that's a relatively fair 
>>> comparison.  (flags used for GDC: -O3 -fno-bounds-check
>>> -frelease,  flags used for LDC: -O3 -release)
>>>
>>> I really like the features offered by D but it's the raw 
>>> performance that's worrying me. From what I read D should
>>> offer similar performance when doing similar things but my 
>>> own test results are not consistent with this claim. I want
>>> to know whether this slowness is inherent to the language or 
>>> it's something I was not doing right (very possible
>>> because I have only a few days of experience with D).
>>>
>>> Below is the link to the D and C++ code, in case anyone is 
>>> interested in having a look.
>>>
>>> https://dl.dropboxusercontent.com/u/974356/raytracer.d
>>> https://dl.dropboxusercontent.com/u/974356/raytracer.cpp
>> 
>> Greetings.
>> 
>> After a few quick changes I managed to get these results:
>> [raz@d3 tmp]$ ./a.out
>> rendering time 276 ms
>> [raz@d3 tmp]$ ./test
>> 346 ms, 814 μs, and 5 hnsecs
>> 
>> 
>> ./a.out is the binary compiled with clang++ ./test.cxx 
>> -std=c++11 -lSDL -O3
>> ./test is the binary compiled with ldmd2 -O3 -release -inline 
>> -noboundscheck ./test.d (Actually I used rdmd with
>> --compiler=ldmd2, but I omitted it because the command line 
>> was rather long :p)
>> 
>> 
>> Here is the source code with the changes I applied to the D code 
>> (I hope you don't mind me repasting it): http://dpaste.dzfl.pl/84bb308d
>> 
>> I am sure there is plenty of room for further improvement, 
>> enough to at least match the C++ performance.
>
>
> You might also try changing:
>
>             float[3] t = mixin("v[]"~op~"rhs.v[]");
>             return Vec3(t[0], t[1], t[2]);
>
> for:
>             Vec3 t;
>             t.v[0] = mixin("v[0] "~op~" rhs.v[0]");
>             t.v[1] = mixin("v[1] "~op~" rhs.v[1]");
>             t.v[2] = mixin("v[2] "~op~" rhs.v[2]");
>             return t;
>
> and so on, avoiding the float[3] and the v[] operations (which
> would loop unless the compiler/optimizer unrolls them; I didn't
> check).
>
> I tested this change (removing v[] ops) in Vec3 and in
> normalize(), and it made your version slightly faster
> with DMD (didn't check with ldmd2).
>
> --jm

Right, I missed that. Thanks!

Now it is:

[raz@d3 tmp]$ ./a.out
rendering time 276 ms
[raz@d3 tmp]$ ./test
238 ms, 35 μs, and 7 hnsecs

So the D version is now faster than the C++ one.
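
For anyone following along, here is a minimal sketch of the change Juan suggested. The Vec3 layout (a float[3] field named v plus a three-float constructor) is only inferred from the quoted snippet, so the real raytracer.d may differ, and slowOp is a hypothetical name I made up to keep the old form around for comparison.

```d
struct Vec3
{
    float[3] v;   // assumed layout, inferred from the quoted snippet

    this(float x, float y, float z)
    {
        v[0] = x;
        v[1] = y;
        v[2] = z;
    }

    // Old style: array-wise operation into a temporary float[3],
    // followed by a second copy through the constructor.
    // (slowOp is a hypothetical name, kept only for comparison.)
    Vec3 slowOp(string op)(const Vec3 rhs) const
    {
        float[3] t = mixin("v[] " ~ op ~ " rhs.v[]");
        return Vec3(t[0], t[1], t[2]);
    }

    // Suggested style: one scalar expression per component,
    // with no temporary array and no v[] array operation.
    Vec3 opBinary(string op)(const Vec3 rhs) const
        if (op == "+" || op == "-" || op == "*" || op == "/")
    {
        Vec3 t;
        t.v[0] = mixin("v[0] " ~ op ~ " rhs.v[0]");
        t.v[1] = mixin("v[1] " ~ op ~ " rhs.v[1]");
        t.v[2] = mixin("v[2] " ~ op ~ " rhs.v[2]");
        return t;
    }
}

void main()
{
    auto a = Vec3(1, 2, 3);
    auto b = Vec3(4, 5, 6);

    // Both forms compute the same values; only the generated code differs.
    assert((a + b).v == [5.0f, 7.0f, 9.0f]);
    assert(a.slowOp!"+"(b).v == [5.0f, 7.0f, 9.0f]);
    assert((a * b).v == a.slowOp!"*"(b).v);
}
```

Build it without -release if you want the asserts in main to actually run. Whether the per-component form is faster will depend on the compiler and flags, so it is worth timing both.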