D 50% slower than C++. What am I doing wrong?

Sat Apr 14 12:51:08 PDT 2012

On 14/04/12 21:05, ReneSac wrote:
> Lang (Comp) | Binary size | Time (lower is better) | Flags
> C++ (g++)   | 13kb        | 2.42s (100%)           | -O3 -s
> D (DMD)     | 230kb       | 4.46s (184%)           | -O -release -inline
> D (GDC)     | 1322kb      | 3.69s (152%)           | -O3 -frelease -s

Try using extra optimizations for GDC.  Actually, GDC has a "dmd-like" 
interface, gdmd, and

    gdmd -O -release -inline

corresponds to

    gdc -O3 -fweb -frelease -finline-functions

... so there may be some optimizations you were missing.  (If you call gdmd with 
the -vdmd flag, it will tell you exactly which gdc command line it's using.)

> The only difference I could see between the C++ and D versions is that C++ has
> hints to the compiler about which functions to inline, and I couldn't find
> anything similar in D. So I manually inlined the encode and decode functions:

GDC has all the regular gcc optimization flags available IIRC.  The ones on the 
GDC man page are just the ones specific to GDC.

> Still, the D version is slower. What makes this speed difference? Is there any
> way to side-step this?

In my (brief and limited) experience, GDC-produced executables tend to have a 
noticeable but minor gap compared to equivalent g++-compiled C++ code -- nothing 
like the 152% relative run time you are seeing.

E.g. I have some simulation code which models a reputation system where users 
rate objects and are then in turn judged on the consistency of their ratings 
with the general consensus.  A simulation with 1000 users and 1000 objects takes 
~22s to run in C++, ~24s in D compiled with gdmd -O -release -inline.

Scale that up to 2000 users and 1000 objects and it's 47s (C++) vs 53s (D).
2000 users and 2000 objects gives 1min 49s (C++) and 2min 4s (D).

So, it's a gap (roughly 9-14% on these runs), but not one to write home about 
really, especially when you consider that D is safer and (I think) 
easier/faster to program in.

It's true that DMD-compiled code is much slower -- the GCC backend is much 
better at generating fast code.  If I recall right, the DMD backend's handling 
of floating-point operations is considerably less sophisticated.

> Note that this simple C++ version can be made more than 2 times faster with
> algorithmic and I/O optimizations, (ab)using templates, etc. So I'm not asking
> for generic speed optimizations, but only things that may make the D code "more
> equal" to the C++ code.

I'm sure you can make various improvements to your D code in a similar way, and 
some things get better in D when written in idiomatic "D style" rather than in 
a more C++-ish way (e.g. if you want to copy one vector to another, as happens 
in my code, write x[] = y[] instead of writing any kind of loop).
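
A minimal sketch of the two styles, for illustration only.  The function names 
copyLoop and copySlice are hypothetical and not taken from the simulation code; 
whether the slice form is actually faster will depend on the compiler and 
flags, so treat it as something to measure rather than a guarantee.

```d
import std.stdio : writeln;

// C++-ish style: copy src into dst one element at a time.
void copyLoop(double[] dst, const(double)[] src)
{
    assert(dst.length == src.length);
    foreach (i; 0 .. src.length)
        dst[i] = src[i];
}

// Idiomatic D: array (slice) copy.  The two slices must have the
// same length and must not overlap.
void copySlice(double[] dst, const(double)[] src)
{
    dst[] = src[];
}

void main()
{
    auto y = [1.0, 2.0, 3.0];
    auto a = new double[](y.length);
    auto b = new double[](y.length);

    copyLoop(a, y);
    copySlice(b, y);

    // Both approaches should leave the destination equal to the source.
    assert(a == y);
    assert(b == y);
    writeln(b);
}
```

The slice form states the intent (copy the whole array) in a single 
expression, which leaves the compiler and runtime free to do it as one block 
copy instead of a per-element loop.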

Best wishes,

     -- Joe