Let's avoid range specializations - make ranges great again!
David Nadlinger via Digitalmars-d
digitalmars-d at puremagic.com
Sat Jun 4 12:06:51 PDT 2016
On Saturday, 4 June 2016 at 17:35:23 UTC, Seb wrote:
> ldc does the job (nearly) correctly.
Scratch the "nearly" – at least you can't infer that from the
performance results alone; the differences are well within the
run-to-run noise.
>> dmd -release -O -boundscheck=off test_looping.d &&
>> ./test_looping
You seem to be missing `-inline` here, although it doesn't seem
to influence the results (DMD 2.071.0/OS X x86_64).
> 2) Should it matter whether I use r.save or r[i..$]?
>
> Answer: it does for dmd and ldc.
>
> […]
>
> Open question: how can we make DMD/LDC smarter, so that they
> inline range primitives […]?
Is this really the conclusion to draw from this benchmark? First
of all, unless I'm mistaken, the three implementations are not
equivalent – f_for seems to ignore the first element by doing an
extra `popFront()`.
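To illustrate the kind of off-by-one meant here, a minimal sketch
follows. It is a hypothetical reconstruction, since the benchmark
source is not quoted in this message; the names `findFor` and
`findForSkipsFirst` are made up, and the real `f_for` may differ in
its details:

```d
import std.range.primitives : empty, front, popFront, save;

// Hypothetical stand-in for a correct popFront-based search:
// returns the range starting at the first match, or an empty
// range if the needle does not occur.
T[] findFor(T)(T[] r, T needle)
{
    for (; !r.empty; r.popFront())
        if (r.front == needle)
            return r.save;
    return r;
}

// Hypothetical buggy variant: one popFront() too many up front,
// so element 0 is never compared against the needle.
T[] findForSkipsFirst(T)(T[] r, T needle)
{
    if (!r.empty)
        r.popFront(); // the extra popFront()
    for (; !r.empty; r.popFront())
        if (r.front == needle)
            return r.save;
    return r;
}

unittest
{
    auto haystack = [7, 1, 2, 7, 3];
    // The correct version matches the very first element.
    assert(findFor(haystack, 7) == [7, 1, 2, 7, 3]);
    // The buggy version only finds the second occurrence.
    assert(findForSkipsFirst(haystack, 7) == [7, 3]);
}
```

Besides returning a different result, a variant like the second one
does a different amount of work per call, so timing it against the
other implementations is not a like-for-like comparison.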
Fixing that, the implementations seem to be essentially
equivalent in terms of execution time (LDC master,
i7-4980HQ @ 2.8 GHz):
```
0 17 secs, 840 ms, 770 μs, and 9 hnsecs
1 16 secs, 680 ms, 80 μs, and 6 hnsecs
2 17 secs, 635 ms, 548 μs, and 1 hnsec
```
Even though this is using a larger number of iterations for
illustrative purposes, the difference still appears to be within
the run-to-run noise.
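One crude way to check a claim like this is to time each candidate
several times and see whether the observed ranges overlap. The
following is only a sketch and not the harness used for the numbers
above; `timeSpread` and `withinNoise` are hypothetical helpers, and
it assumes a Phobos version that ships `std.datetime.stopwatch`
(newer than the DMD 2.071.0 mentioned earlier):

```d
import core.time : Duration;
import std.datetime.stopwatch : AutoStart, StopWatch;

// Runs `fun` `runs` times and returns [fastest, slowest] wall time.
Duration[2] timeSpread(alias fun)(size_t runs)
{
    assert(runs > 0, "need at least one run");
    Duration fastest = Duration.max;
    Duration slowest = Duration.zero;
    foreach (_; 0 .. runs)
    {
        auto sw = StopWatch(AutoStart.yes);
        fun();
        auto elapsed = sw.peek;
        if (elapsed < fastest) fastest = elapsed;
        if (elapsed > slowest) slowest = elapsed;
    }
    return [fastest, slowest];
}

// Heuristic only: treats two candidates as indistinguishable when
// their [fastest, slowest] intervals overlap.
bool withinNoise(Duration[2] a, Duration[2] b)
{
    return a[0] <= b[1] && b[0] <= a[1];
}
```

Usage would look like `timeSpread!(() => f_for(haystack, needle))(10)`
for each implementation, followed by `withinNoise` on each pair. An
overlap does not prove the implementations are equally fast; it only
says that this measurement cannot tell them apart.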
Comparing the x86_64 assembly produced for 1 and 2, everything is
inlined correctly in both cases. The "reversed" loop structure
does lead to a slightly higher number of instructions in the
loop, though, which might or might not be measurable:
---
__D9test_save17__T9f_foreachTAmZ9f_foreachFNaNbNiNfAmmZAm:
    xor eax, eax
    test rsi, rsi
    je LBB11_4
    mov rcx, rdx
    .align 4, 0x90
LBB11_2:
    cmp qword ptr [rcx], rdi
    je LBB11_5
    inc rax
    add rcx, 8
    cmp rsi, rax
    ja LBB11_2
LBB11_4:
    mov rax, rsi
    ret
LBB11_5:
    sub rsi, rax
    mov rax, rsi
    mov rdx, rcx
    ret

__D9test_save13__T5f_forTAmZ5f_forFNaNbNiNfAmmZAm:
    test rsi, rsi
    je LBB13_4
    mov rax, rsi
    dec rax
    mov rcx, rdx
    add rcx, 8
    .align 4, 0x90
LBB13_2:
    cmp qword ptr [rcx - 8], rdi
    je LBB13_4
    mov rdx, rcx
    mov rsi, rax
    dec rax
    add rcx, 8
    cmp rax, -1
    jne LBB13_2
LBB13_4:
    mov rax, rsi
    ret
---
— David