D 50% slower than C++. What I'm doing wrong?

q66 quaker66 at gmail.com
Sat Apr 14 12:53:05 PDT 2012


On Saturday, 14 April 2012 at 19:05:40 UTC, ReneSac wrote:
> I have this simple binary arithmetic coder in C++ by Mahoney, 
> translated to D by Maffi. I added "nothrow", "final", "pure" 
> and "GC.disable" where it was possible, but that didn't 
> make much difference. Adding "const" to Predictor.p() (as 
> in the C++ version) gave 3% higher performance. Here are the two 
> versions:
>
> http://mattmahoney.net/dc/  <-- original zip
>
> http://pastebin.com/55x9dT9C  <-- Original C++ version.
> http://pastebin.com/TYT7XdwX  <-- Modified D translation.
>
> The problem is that the D version is 50% slower:
>
> test.fpaq0 (16562521 bytes) -> test.bmp (33159254 bytes)
>
> Lang (Comp) - Binary size - Time (lower is better) - Flags
> C++  (g++)  -      13kb   -  2.42s  (100%)   -O3 -s
> D    (DMD)  -     230kb   -  4.46s  (184%)   -O -release -inline
> D    (GDC)  -    1322kb   -  3.69s  (152%)   -O3 -frelease -s
>
>
> The only difference I could see between the C++ and D versions 
> is that C++ has hints to the compiler about which functions to 
> inline, and I couldn't find anything similar in D. So I manually 
> inlined the encode and decode functions:
>
> http://pastebin.com/N4nuyVMh  - Manual inline
>
> D    (DMD)  -     228kb   -  3.70s  (153%)   -O -release -inline
> D    (GDC)  -    1318kb   -  3.50s  (144%)   -O3 -frelease -s
>
> Still, the D version is slower. What causes this speed 
> difference? Is there any way to side-step this?
>
> Note that this simple C++ version can be made more than 2 times 
> faster with algorithmic and I/O optimizations, (ab)using 
> templates, etc. So I'm not asking for generic speed 
> optimizations, but only things that may make the D code "more 
> equal" to the C++ code.

I wrote a D version (http://codepad.org/phpLP7cx) based on the C++ 
one.

The commands I used to compile:

g++46 -O3 -s fpaq0.cpp -o fpaq0cpp
dmd -O -release -inline -noboundscheck fpaq0.d

G++ 4.6, dmd 2.059.

I ran 5 tests for each:

test.fpaq0 (34603008 bytes) -> test.bmp (34610367 bytes)

The C++ average result was 9.99 seconds (varying from 9.98 to 
10.01)
The D average result was 12.00 seconds (varying from 11.98 to 
12.01)
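
For anyone who wants to repeat this kind of measurement in-process, here is a minimal C++ sketch of timing a workload several times and reporting the average and the range. It is not how the numbers above were obtained. The names RunStats, summarize and time_runs are made up for illustration, and the workload is whatever callable you pass in.

```cpp
#include <algorithm>
#include <chrono>
#include <numeric>
#include <vector>

// Mean, fastest and slowest run, all in seconds.
struct RunStats {
    double mean;
    double min;
    double max;
};

// Summarise a list of run times. Assumes `seconds` is non-empty.
RunStats summarize(const std::vector<double>& seconds) {
    double sum = std::accumulate(seconds.begin(), seconds.end(), 0.0);
    auto range = std::minmax_element(seconds.begin(), seconds.end());
    return {sum / seconds.size(), *range.first, *range.second};
}

// Run `work` `runs` times (runs >= 1) and time each call with a
// monotonic clock, so wall-clock adjustments do not skew the result.
template <typename Work>
RunStats time_runs(Work work, int runs) {
    std::vector<double> seconds;
    for (int i = 0; i < runs; ++i) {
        auto start = std::chrono::steady_clock::now();
        work();
        auto stop = std::chrono::steady_clock::now();
        seconds.push_back(std::chrono::duration<double>(stop - start).count());
    }
    return summarize(seconds);
}
```

Calling time_runs(some_callable, 5) gives the same kind of "average, varying from min to max" summary as above. Timing the whole process from the shell also includes startup and I/O, so the two approaches need not agree exactly.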

That means the D version took about 20 percent longer than the C++ 
one (equivalently, the C++ version needed 16.8 percent less time), 
a gap that would probably be closed by using gdc (which I don't 
have around currently).
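
Since a percentage difference depends on which time is used as the baseline, here is a small C++ sketch of both conventions applied to two average times. The function names are made up for illustration.

```cpp
// Extra time `slow` needs, as a percentage of the `fast` time.
double percent_longer(double fast, double slow) {
    return (slow - fast) / fast * 100.0;
}

// Time `fast` saves, as a percentage of the `slow` time.
double percent_less_time(double fast, double slow) {
    return (slow - fast) / slow * 100.0;
}
```

With the averages above, percent_longer(9.99, 12.00) is roughly 20.1 and percent_less_time(9.99, 12.00) is roughly 16.8, so the 16.8 percent figure is measured against the D time.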

