Let's avoid range specializations - make ranges great again!
Seb via Digitalmars-d
digitalmars-d at puremagic.com
Sat Jun 4 10:35:23 PDT 2016
In Phobos (especially in std.algorithm) a lot of specialization
between a RandomAccessRange and an InputRange are used even
though only an InputRange is required.
The reason for this is solely performance - imho we should start
to tackle this and remove such specializations. Let's create
better code!
To quote David from the recent PR
(https://github.com/dlang/phobos/pull/4265) where adding a
identity specialization achieves a 5-10x speedup for
{min,max}Element.
> I think that we have a larger issue at hand here. Ranges are of
> prime strategic importance for D as an accessible way of
> composing complex operations out of reusable primitives. A
> large part of this comes from the fact that they are supposed
> to be zero-cost abstractions – after the compiler optimiser has
> had its way with a piece of code, we expect it to be equivalent
> to a hand-written loop.
1) Should it matter whether generic range looping or
"specialized" r[i] access is used?
Answer for dmd it does (7x slower), ldc does the job (nearly)
correctly.
```
> dmd -release -O -boundscheck=off test_looping.d &&
> ./test_looping
random_access 685 ms, 351 μs, and 9 hnsecs
foreach 685 ms, 888 μs, and 4 hnsecs
foreach_ref 685 ms, 654 μs, and 4 hnsecs
for_range_api 7 secs, 211 ms, 530 μs, and 3 hnsecs
```
> ldc -release -O3 -boundscheck=off test_looping.d &&
> ./test_looping
random_access 142 ms and 88 μs
foreach 142 ms, 8 μs, and 3 hnsecs
foreach_ref 142 ms, 400 μs, and 9 hnsecs
for_range_api 143 ms, 394 μs, and 1 hnsec
```
Benchmarking code:
https://github.com/wilzbach/perf-ranges/blob/master/test_looping.d
Bug report for DMD: https://issues.dlang.org/show_bug.cgi?id=16120
(Tested with both ldc 0.17 and 1.0)
2) Should it matter whether I use r.save or r[i..$]?
Answer: it does for dmd and ldc.
```
> dmd -release -O -boundscheck=off test_save.d && ./test_save
random_access 424 ms and 72 μs
foreach_random 420 ms, 754 μs, and 5 hnsecs
for_range_api 2 secs, 562 ms, 415 μs, and 5 hnsecs
> ldc -release -O3 -boundscheck=off test_save.d && ./test_save
random_access 245 ms, 713 μs, and 4 hnsecs
foreach_random 241 ms, 533 μs, and 1 hnsec
for_range_api 349 ms, 582 μs, and 9 hnsecs
```
Benchmarking code:
https://github.com/wilzbach/perf-ranges/blob/master/test_save.d
Open question: how can we make DMD/LDC smarter, so that they
inline range primitives and the promise about the zero-cost
abstractions is full-filled?
More information about the Digitalmars-d
mailing list