stride in slices
Meta
jared771 at gmail.com
Tue Jun 5 03:13:05 UTC 2018
On Monday, 4 June 2018 at 23:08:17 UTC, Ethan wrote:
> On Monday, 4 June 2018 at 18:11:47 UTC, Steven Schveighoffer
> wrote:
>> BTW, do you have cross-module inlining on?
>
> Just to drive this point home.
>
> https://run.dlang.io/is/nrdzb0
>
> Manually implemented stride and fill with everything forced
> inline. Otherwise, the original code is unchanged.
>
> 17 ms, 891 μs, and 6 hnsecs
> 15 ms, 694 μs, and 1 hnsec
> 15 ms, 570 μs, and 9 hnsecs
>
> My new stride outperformed std.range stride, and the manual
> for-loop. And, because the third test uses the new stride, it
> also benefited. But interestingly runs ever so slightly
> faster...
Just as an aside:
...
pragma( inline ) @property length() const { return range.length / strideCount; }
pragma( inline ) @property empty() const { return currFront > range.length; }
pragma( inline ) @property ref Elem front() { return range[ currFront ]; }
pragma( inline ) void popFront() { currFront += strideCount; }
...
pragma( inline ) auto stride( Range )( Range r, int a )
...
pragma( inline ) auto fill( Range, Value )( Range r, Value v )
...
pragma(inline), without any argument, does not force inlining. It
actually does nothing; it just specifies that the
"implementation's default behaviour" should be used. You have to
annotate with pragma(inline, true) to force inlining
(https://dlang.org/spec/pragma.html#inline).
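To make the difference concrete, here is a minimal sketch of a strided view annotated with pragma(inline, true). It is not the code from the run.dlang.io link; the names Strided, strided, data, step and pos are made up for illustration, and whether a given compiler actually manages to inline each call is something I'm assuming rather than something this snippet verifies.

```d
import std.stdio : writeln;

// Hypothetical minimal strided view over a slice (illustrative names only,
// not the implementation from the linked example).
struct Strided(T)
{
    T[] data;     // underlying slice
    size_t step;  // distance between visited elements
    size_t pos;   // index of the current front element

    // pragma(inline, true) requests that these always be inlined;
    // a bare pragma(inline) would only select the compiler's default.
    pragma(inline, true) @property bool empty() const { return pos >= data.length; }
    pragma(inline, true) @property ref T front() { return data[pos]; }
    pragma(inline, true) void popFront() { pos += step; }
}

// Hypothetical helper that builds the view, also marked for forced inlining.
pragma(inline, true)
auto strided(T)(T[] data, size_t step)
{
    return Strided!T(data, step, 0);
}

void main()
{
    int[] values = [0, 1, 2, 3, 4, 5, 6];

    // Overwrite every third element through the ref-returning front.
    foreach (ref v; values.strided(3))
        v = 9;

    writeln(values); // [9, 1, 2, 9, 4, 5, 9]
}
```

Swapping pragma(inline, true) for pragma(inline) here leaves the behaviour identical; only the inlining request changes.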
When I change all the pragma(inline) to pragma(inline, true),
there is a non-trivial speedup:
14 ms, 517 μs, and 9 hnsecs
13 ms, 110 μs, and 1 hnsec
13 ms, 199 μs, and 9 hnsecs
There are further reductions using ldc-beta:
14 ms, 520 μs, and 4 hnsecs
13 ms, 87 μs, and 2 hnsecs
12 ms, 938 μs, and 8 hnsecs