D 50% slower than C++. What am I doing wrong?

Sat Apr 14 14:03:22 PDT 2012

On Saturday, 14 April 2012 at 20:58:01 UTC, Somedude wrote:
> On 14/04/2012 21:53, q66 wrote:
>> On Saturday, 14 April 2012 at 19:05:40 UTC, ReneSac wrote:
>>> I have this simple binary arithmetic coder in C++ by Mahoney,
>>> translated to D by Maffi. I added "nothrow", "final", "pure" and
>>> "GC.disable" wherever possible, but that didn't make much
>>> difference. Adding "const" to Predictor.p() (as in the C++
>>> version) gave 3% higher performance. Here are the two versions:
>>>
>>> http://mattmahoney.net/dc/  <-- original zip
>>>
>>> http://pastebin.com/55x9dT9C  <-- Original C++ version.
>>> http://pastebin.com/TYT7XdwX  <-- Modified D translation.
>>>
>>> The problem is that the D version is 50% slower:
>>>
>>> test.fpaq0 (16562521 bytes) -> test.bmp (33159254 bytes)
>>>
>>> Lang (Comp) - Binary size - Time (lower is better) - Flags
>>> C++  (g++)  -      13kb   -  2.42s  (100%)   -O3 -s
>>> D    (DMD)  -     230kb   -  4.46s  (184%)   -O -release -inline
>>> D    (GDC)  -    1322kb   -  3.69s  (152%)   -O3 -frelease -s
>>>
>>>
>>> The only difference I could see between the C++ and D versions
>>> is that C++ has hints to the compiler about which functions to
>>> inline, and I couldn't find anything similar in D. So I manually
>>> inlined the encode and decode functions:
>>>
>>> http://pastebin.com/N4nuyVMh  - Manual inline
>>>
>>> D    (DMD)  -     228kb   -  3.70s  (153%)   -O -release -inline
>>> D    (GDC)  -    1318kb   -  3.50s  (144%)   -O3 -frelease -s
>>>
>>> Still, the D version is slower. What causes this speed
>>> difference? Is there any way to side-step it?
>>>
>>> Note that this simple C++ version can be made more than 2
>>> times faster with algorithmic and I/O optimizations, (ab)using
>>> templates, etc. So I'm not asking for generic speed
>>> optimizations, but only for things that may make the D code
>>> "more equal" to the C++ code.
>> 
>> I wrote a version, http://codepad.org/phpLP7cx, based on the C++
>> one.
>> 
>> My commands used to compile:
>> 
>> g++46 -O3 -s fpaq0.cpp -o fpaq0cpp
>> dmd -O -release -inline -noboundscheck fpaq0.d
>> 
>> G++ 4.6, dmd 2.059.
>> 
>> I did 5 tests for each:
>> 
>> test.fpaq0 (34603008 bytes) -> test.bmp (34610367 bytes)
>> 
>> The C++ average result was 9.99 seconds (varying from 9.98 to 10.01)
>> The D average result was 12.00 seconds (varying from 11.98 to 12.01)
>> 
>> That means there is a 16.8 percent difference in performance
>> (relative to the D time), which would be eliminated by using gdc
>> (which I don't have around currently).
>
> The code is nearly identical (there is a slight difference in
> update(), where he accesses the array once more than you), but the
> main difference I see is the -noboundscheck compilation option on DMD.

He also uses a class. And -noboundscheck should be automatically
implied by -release.
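
For anyone wondering what a bounds check costs per array access, here is a minimal C++ sketch. It is not taken from either pastebin. It assumes that std::vector::at() is a reasonable stand-in for an array access with bounds checking left on, and operator[] for the same access with checking turned off. The function names are made up for illustration.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Unchecked indexing: no range test on each access.
// Assumption: comparable to an array access compiled without bounds checks.
int sum_unchecked(const std::vector<int>& v) {
    int s = 0;
    for (std::size_t i = 0; i < v.size(); ++i) s += v[i];
    return s;
}

// Checked indexing: at() compares the index against size() on every access
// and throws std::out_of_range when the index is invalid.
// Assumption: comparable to an array access compiled with bounds checks.
int sum_checked(const std::vector<int>& v) {
    int s = 0;
    for (std::size_t i = 0; i < v.size(); ++i) s += v.at(i);
    return s;
}

// Hypothetical helper: reports whether a checked access at index i is rejected.
bool checked_access_throws(const std::vector<int>& v, std::size_t i) {
    try {
        (void)v.at(i);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

Both loops compute the same sum; the checked one just does an extra compare and branch per element. In a tight inner loop like the coder's update step, that is the kind of overhead worth ruling out before comparing compilers.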