Slow performance compared to C++, ideas?
Juan Manuel Cabo
juanmanuel.cabo at gmail.com
Thu May 30 22:35:57 PDT 2013
On 05/31/2013 02:15 AM, nazriel wrote:
> On Friday, 31 May 2013 at 01:26:13 UTC, finalpatch wrote:
>> Recently I ported a simple ray tracer I wrote in C++11 to D. Thanks to the similarity between D and C++ it was almost
>> a line by line translation, in other words, very very close. However, the D version runs much slower than the C++11
>> version. On Windows, with MinGW GCC and GDC, the C++ version is twice as fast as the D version. On OSX, I used Clang++
>> and LDC, and the C++11 version was 4x faster than the D version. Since the comparisons were between compilers that share
>> the same codegen backends I suppose that's a relatively fair comparison. (flags used for GDC: -O3 -fno-bounds-check
>> -frelease, flags used for LDC: -O3 -release)
>>
>> I really like the features offered by D but it's the raw performance that's worrying me. From what I read D should
>> offer similar performance when doing similar things but my own test results are not consistent with this claim. I want
>> to know whether this slowness is inherent to the language or it's something I was not doing right (very possible
>> because I have only a few days of experience with D).
>>
>> Below is the link to the D and C++ code, in case anyone is interested to have a look.
>>
>> https://dl.dropboxusercontent.com/u/974356/raytracer.d
>> https://dl.dropboxusercontent.com/u/974356/raytracer.cpp
>
> Greetings.
>
> After a few quick changes I managed to get these results:
> [raz@d3 tmp]$ ./a.out
> rendering time 276 ms
> [raz@d3 tmp]$ ./test
> 346 ms, 814 μs, and 5 hnsecs
>
>
> ./a.out is the binary compiled with clang++ ./test.cxx -std=c++11 -lSDL -O3
> ./test is the binary compiled with ldmd2 -O3 -release -inline -noboundscheck ./test.d (Actually I used rdmd with
> --compiler=ldmd2 but I omitted it because it was a rather long command line :p)
>
>
> Here is source code with changes I applied to D-code (I hope you don't mind repasting it): http://dpaste.dzfl.pl/84bb308d
>
> I am sure there is much more room for improvement, enough to at least match the C++ performance.
You might also try changing:

    float[3] t = mixin("v[]"~op~"rhs.v[]");
    return Vec3(t[0], t[1], t[2]);

to:

    Vec3 t;
    t.v[0] = mixin("v[0] "~op~" rhs.v[0]");
    t.v[1] = mixin("v[1] "~op~" rhs.v[1]");
    t.v[2] = mixin("v[2] "~op~" rhs.v[2]");
    return t;

and so on. This avoids the temporary float[3] and the v[] array
operations, which loop over the elements unless the optimizer
unrolls them (I didn't check whether it does).

I tested this change (removing the v[] ops) in Vec3 and in
normalize(), and it made your version slightly faster
with DMD (I didn't check with ldmd2).
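
For reference, here is a minimal self-contained sketch of the element-wise variant. It is not taken from raytracer.d: the Vec3 layout (a single float[3] v member) and the three-float constructor are assumptions inferred from the snippets above, and the real struct likely has more members.

```d
import std.stdio;

// Hypothetical stand-in for the ray tracer's Vec3; only the parts
// needed to show the operator are included.
struct Vec3
{
    float[3] v;

    this(float x, float y, float z)
    {
        v[0] = x;
        v[1] = y;
        v[2] = z;
    }

    // One scalar expression per component: no temporary float[3]
    // and no v[] array operation for the compiler to loop over.
    Vec3 opBinary(string op)(in Vec3 rhs) const
    {
        Vec3 t;
        t.v[0] = mixin("v[0] " ~ op ~ " rhs.v[0]");
        t.v[1] = mixin("v[1] " ~ op ~ " rhs.v[1]");
        t.v[2] = mixin("v[2] " ~ op ~ " rhs.v[2]");
        return t;
    }
}

void main()
{
    auto a = Vec3(1, 2, 3);
    auto b = Vec3(4, 5, 6);
    writeln((a + b).v); // component-wise sum
    writeln((a * b).v); // component-wise product
}
```

Whether this actually beats the v[] version depends on the compiler and flags, so it is worth timing both.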
--jm