D outperformed by C++, what am I doing wrong?

via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Sun Aug 13 02:08:14 PDT 2017


On Sunday, 13 August 2017 at 08:43:29 UTC, amfvcg wrote:
> On Sunday, 13 August 2017 at 08:33:53 UTC, Petar Kirov 
> [ZombineDev] wrote:
>>
>> With Daniel's latest version (
>> http://forum.dlang.org/post/mailman.5963.1502612885.31550.digitalmars-d-learn@puremagic.com )
>>
>> $ ldc2 -O3 --release sum_subranges2.d
>> $ ./sum_subranges2
>> 210 ms, 838 μs, and 8 hnsecs
>> 50000000
>
> Great!!! And that's what I was hoping for.
>
> So the conclusion is:
>
> use the latest ldc that's out there.
>
> Thank you Petar, thank you Daniel. (I cannot change the subject 
> to SOLVED, can I?)
>
> Btw. the idiomatic version of this d sample looks how I 
> imagined it should!

There's one especially interesting result:

This instantiation:

sum_subranges(std.range.iota!(int, int).iota(int, int).Result, 
uint)

of the following function:

auto sum_subranges(T)(T input, uint range)
{
     import std.range : chunks, ElementType, array;
     import std.algorithm : map, sum;
     return input.chunks(range).map!(sum);
}

gets optimized with LDC to:
   push rax
   test edi, edi
   je .LBB2_2
   mov edx, edi
   mov rax, rsi
   pop rcx
   ret
.LBB2_2:
   lea rsi, [rip + .L.str.3]
   lea rcx, [rip + .L.str]
   mov edi, 45
   mov edx, 89
   mov r8d, 6779
   call _d_assert_msg@PLT

I.e. the compiler turned an O(n) algorithm into O(1), which is quite 
neat. It is also quite surprising to me that even dmd appears to 
have managed a similar optimization:

sum_subranges(std.range.iota!(int, int).iota(int, int).Result, 
uint):
     push   rbp
     mov    rbp,rsp
     sub    rsp,0x30
     mov    DWORD PTR [rbp-0x8],edi
     mov    r9d,DWORD PTR [rbp-0x8]
     test   r9,r9
     jne    41
     mov    r8d,0x1b67
     mov    ecx,0x0
     mov    eax,0x61
     mov    rdx,rax
     mov    QWORD PTR [rbp-0x28],rdx
     mov    edx,0x0
     mov    edi,0x2d
     mov    rsi,rdx
     mov    rdx,QWORD PTR [rbp-0x28]
     call   41
41: mov    QWORD PTR [rbp-0x20],rsi
     mov    QWORD PTR [rbp-0x18],r9
     mov    rdx,QWORD PTR [rbp-0x18]
     mov    rax,QWORD PTR [rbp-0x20]
     mov    rsp,rbp
     pop    rbp
     ret

Moral of the story: templates + ranges are an awesome combination.
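
For anyone who wants to poke at this themselves, here is a minimal, self-contained sketch of the same function with a small driver. The driver, the input sizes and the expected sums are my own illustration, not part of the benchmark discussed in this thread. One thing worth keeping in mind when reading the assembly: chunks and map are both lazy in Phobos, so calling sum_subranges only builds the range object, and the additions happen when the result is iterated.

```d
import std.algorithm : equal, map, sum;
import std.range : chunks, iota;

// Same shape as the function quoted above, with `sum` imported explicitly.
auto sum_subranges(T)(T input, uint range)
{
    return input.chunks(range).map!(sum);
}

void main()
{
    // Constructing the result does not sum anything yet;
    // chunks and map only wrap the input range.
    auto sums = iota(0, 10).sum_subranges(3);

    // The per-chunk sums are computed here, while `equal` iterates:
    // [0,1,2] -> 3, [3,4,5] -> 12, [6,7,8] -> 21, [9] -> 9
    assert(sums.equal([3, 12, 21, 9]));
}
```

Passing 0 as the chunk size trips the assertion inside chunks, which would be consistent with the _d_assert_msg branch in the LDC listing, though I have not verified that this is the exact assert being referenced.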

