D outperformed by C++, what am I doing wrong?
amfvcg via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Sun Aug 13 02:15:48 PDT 2017
On Sunday, 13 August 2017 at 09:08:14 UTC, Petar Kirov
[ZombineDev] wrote:
>
> There's one especially interesting result:
>
> This instantiation:
>
> sum_subranges(std.range.iota!(int, int).iota(int, int).Result, uint)
>
> of the following function:
>
> auto sum_subranges(T)(T input, uint range)
> {
>     import std.range : chunks;
>     import std.algorithm : map, sum;
>     return input.chunks(range).map!(sum);
> }
>
> gets optimized with LDC to:
> push rax
> test edi, edi
> je .LBB2_2
> mov edx, edi
> mov rax, rsi
> pop rcx
> ret
> .LBB2_2:
> lea rsi, [rip + .L.str.3]
> lea rcx, [rip + .L.str]
> mov edi, 45
> mov edx, 89
> mov r8d, 6779
> call _d_assert_msg at PLT
>
> I.e. the compiler turned an O(n) algorithm into an O(1) one, which
> is quite neat. It is also quite surprising to me that it looks
> like even dmd managed to do a similar optimization:
>
> sum_subranges(std.range.iota!(int, int).iota(int, int).Result, uint):
> push rbp
> mov rbp,rsp
> sub rsp,0x30
> mov DWORD PTR [rbp-0x8],edi
> mov r9d,DWORD PTR [rbp-0x8]
> test r9,r9
> jne 41
> mov r8d,0x1b67
> mov ecx,0x0
> mov eax,0x61
> mov rdx,rax
> mov QWORD PTR [rbp-0x28],rdx
> mov edx,0x0
> mov edi,0x2d
> mov rsi,rdx
> mov rdx,QWORD PTR [rbp-0x28]
> call 41
> 41: mov QWORD PTR [rbp-0x20],rsi
> mov QWORD PTR [rbp-0x18],r9
> mov rdx,QWORD PTR [rbp-0x18]
> mov rax,QWORD PTR [rbp-0x20]
> mov rsp,rbp
> pop rbp
> ret
>
> Moral of the story: templates + ranges are an awesome
> combination.
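
For reference, a minimal way to exercise the quoted function. The driver, the concrete `iota` bounds, and the `equal` check are my own sketch, not part of the original benchmark:

```d
import std.algorithm : equal, map, sum;
import std.range : chunks, iota;

// Same shape as the quoted function, with the `sum` import it needs.
auto sum_subranges(T)(T input, uint range)
{
    return input.chunks(range).map!(sum);
}

void main()
{
    // 0..5 chunked into pairs: [0,1], [2,3], [4,5] -> sums 1, 5, 9.
    assert(iota(0, 6).sum_subranges(2).equal([1, 5, 9]));
}
```

Note that `sum_subranges` returns a lazy range: no summing happens until the result is iterated, which is part of why the compilers can collapse the call itself so aggressively.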
Change the parameter for this range size so that it is read from stdin,
and I assume that these optimizations will go away.
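
A sketch of that change in D. The concrete `iota` bounds and the use of `readf` are my assumptions, just to make the chunk size a run-time value:

```d
import std.algorithm : map, sum;
import std.range : chunks, iota;
import std.stdio : readf, writeln;

void main()
{
    // The chunk size is only known at run time, so the compiler
    // cannot fold the whole computation into a constant.
    uint range;
    readf(" %d", &range);
    writeln(iota(0, 1000).chunks(range).map!(sum).sum);
}
```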