Let's avoid range specializations - make ranges great again!

Sat Jun 4 12:06:51 PDT 2016

On Saturday, 4 June 2016 at 17:35:23 UTC, Seb wrote:
> ldc does the job (nearly) correctly.

Scratch the "nearly" – at least you can't infer that from the 
performance results alone; the differences are well within the 
run-to-run noise.

>> dmd -release -O -boundscheck=off test_looping.d && 
>> ./test_looping

You seem to be missing `-inline` here, although it doesn't seem 
to influence the results (DMD 2.071.0/OS X x86_64).

> 2) Should it matter whether I use r.save or r[i..$]?
>
> Answer: it does for dmd and ldc.
>
> […]
>
> Open question: how can we make DMD/LDC smarter, so that they 
> inline range primitives […]?

Is this really the conclusion to draw from this benchmark? First 
of all, unless I'm mistaken, the three implementations are not 
equivalent – f_for seems to ignore the first element by doing an 
extra `popFront()`.
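
To illustrate the kind of mismatch I mean, here is a minimal sketch. It is not the code from the original benchmark: the names `findByIndex`, `findByPop` and `findSkippingFirst` are made up for this example, and I am assuming the functions are meant to return the tail of the array starting at the first match.

```d
import std.range.primitives : empty, front, popFront;

// Index-based search: returns the tail starting at the first match,
// or an empty slice if nothing matches.
T[] findByIndex(T)(T[] r, T needle)
{
    foreach (i, el; r)
        if (el == needle)
            return r[i .. $];
    return r[$ .. $];
}

// The same search written with the range primitives only.
T[] findByPop(T)(T[] r, T needle)
{
    for (; !r.empty; r.popFront())
        if (r.front == needle)
            break;
    return r;
}

// Hypothetical buggy variant: the extra popFront() drops element 0
// before the search starts, so a match at index 0 is never seen.
T[] findSkippingFirst(T)(T[] r, T needle)
{
    if (!r.empty)
        r.popFront();
    return findByPop(r, needle);
}

unittest
{
    ulong[] data = [7, 3, 7, 9];
    // Both correct variants find the match at index 0.
    assert(findByIndex(data, 7UL) == data);
    assert(findByPop(data, 7UL) == data);
    // The buggy variant only finds the second 7.
    assert(findSkippingFirst(data, 7UL) == data[2 .. $]);
    // No match yields an empty range in the correct variants.
    assert(findByIndex(data, 1UL).empty);
    assert(findByPop(data, 1UL).empty);
}
```

Compiling with `-unittest -main` runs the checks. A variant that skips an element does different work from the others, so timing it against them is not a like-for-like comparison.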

Fixing that, the implementations seem to be essentially 
equivalent in terms of execution time (LDC master, 
i7-4980HQ at 2.8GHz):
```
0 17 secs, 840 ms, 770 μs, and 9 hnsecs
1 16 secs, 680 ms, 80 μs, and 6 hnsecs
2 17 secs, 635 ms, 548 μs, and 1 hnsec
```
Even though this is using a larger number of iterations for 
illustrative purposes, the difference still seems to be within 
the run-to-run noise.
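
For what it's worth, a rough way to get a feel for the noise is to repeat each measurement several times and look at the spread between the fastest and slowest run. The helper below is only a sketch: `timeSpread` is a name I made up, and it uses `std.datetime.stopwatch`, which is not necessarily what the original benchmark used.

```d
import core.time : Duration;
import std.algorithm.comparison : max, min;
import std.datetime.stopwatch : AutoStart, StopWatch;

// Runs `fun` `runs` times and returns [fastest, slowest] wall-clock time.
// The gap between the two is a crude estimate of run-to-run noise.
Duration[2] timeSpread(void delegate() fun, size_t runs)
{
    Duration fastest = Duration.max;
    Duration slowest = Duration.zero;
    foreach (_; 0 .. runs)
    {
        auto sw = StopWatch(AutoStart.yes);
        fun();
        auto elapsed = sw.peek();
        fastest = min(fastest, elapsed);
        slowest = max(slowest, elapsed);
    }
    return [fastest, slowest];
}

unittest
{
    size_t calls;
    auto spread = timeSpread(() { ++calls; }, 5);
    assert(calls == 5);             // the workload ran once per repetition
    assert(spread[0] <= spread[1]); // fastest never exceeds slowest
}
```

If the gap between two implementations is smaller than the spread within a single implementation, the numbers alone don't support ranking them.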

Comparing the x86_64 assembly produced for 1 and 2, everything is 
inlined correctly in both cases. The "reversed" loop structure 
does lead to a slightly higher number of instructions in the 
loop, though, which might or might not be measurable:

---
__D9test_save17__T9f_foreachTAmZ9f_foreachFNaNbNiNfAmmZAm:
     xor eax, eax
     test    rsi, rsi
     je  LBB11_4
     mov rcx, rdx
     .align  4, 0x90
LBB11_2:
     cmp qword ptr [rcx], rdi
     je  LBB11_5
     inc rax
     add rcx, 8
     cmp rsi, rax
     ja  LBB11_2
LBB11_4:
     mov rax, rsi
     ret
LBB11_5:
     sub rsi, rax
     mov rax, rsi
     mov rdx, rcx
     ret

__D9test_save13__T5f_forTAmZ5f_forFNaNbNiNfAmmZAm:
     test    rsi, rsi
     je  LBB13_4
     mov rax, rsi
     dec rax
     mov rcx, rdx
     add rcx, 8
     .align  4, 0x90
LBB13_2:
     cmp qword ptr [rcx - 8], rdi
     je  LBB13_4
     mov rdx, rcx
     mov rsi, rax
     dec rax
     add rcx, 8
     cmp rax, -1
     jne LBB13_2
LBB13_4:
     mov rax, rsi
     ret
---

  — David