stride in slices

Tue Jun 5 03:13:05 UTC 2018

On Monday, 4 June 2018 at 23:08:17 UTC, Ethan wrote:
> On Monday, 4 June 2018 at 18:11:47 UTC, Steven Schveighoffer 
> wrote:
>> BTW, do you have cross-module inlining on?
>
> Just to drive this point home.
>
> https://run.dlang.io/is/nrdzb0
>
> Manually implemented stride and fill with everything forced 
> inline. Otherwise, the original code is unchanged.
>
> 17 ms, 891 μs, and 6 hnsecs
> 15 ms, 694 μs, and 1 hnsec
> 15 ms, 570 μs, and 9 hnsecs
>
> My new stride outperformed std.range stride, and the manual 
> for-loop. And, because the third test uses the new stride, it 
> also benefited. But interestingly runs ever so slightly
> faster...

Just as an aside:

     ...
     pragma( inline ) @property length() const { return range.length / strideCount; }
     pragma( inline ) @property empty() const { return currFront > range.length; }
     pragma( inline ) @property ref Elem front() { return range[ currFront ]; }
     pragma( inline ) void popFront() { currFront += strideCount; }
     ...

     pragma( inline ) auto stride( Range )( Range r, int a )
     ...

     pragma( inline ) auto fill( Range, Value )( Range r, Value v )
     ...

pragma(inline), without any argument, does not force inlining. It 
actually does nothing; it just specifies that the 
"implementation's default behaviour" should be used. You have to 
annotate with pragma(inline, true) to force inlining 
(https://dlang.org/spec/pragma.html#inline).
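For reference, here is a minimal self-contained strided range with every member forced inline. It is only a sketch and not the code behind the run.dlang.io link. The names `Strided` and `strided` are made up so they don't clash with `std.range.stride`. It also differs from the snippet quoted above in two places. `empty` uses `>=`, because with `>` the range indexes one past the end whenever the length is a multiple of the step. `length` rounds up, because `range.length / strideCount` undercounts when it isn't.

```d
import std.range.primitives : hasLength, isRandomAccessRange;

// Hypothetical strided view over a random-access range (a sketch only).
struct Strided(Range)
if (isRandomAccessRange!Range && hasLength!Range)
{
    private Range range;
    private size_t step;      // assumed > 0; not checked here
    private size_t currFront; // index of the current front element

    // pragma(inline, true) asks for every call to be inlined;
    // a bare pragma(inline) only requests the default behaviour.
    pragma(inline, true) @property size_t length() const
    {
        // Round up so a trailing partial stride is still counted.
        return currFront >= range.length
            ? 0
            : (range.length - currFront + step - 1) / step;
    }

    pragma(inline, true) @property bool empty() const
    {
        // >= rather than >: index range.length is already out of bounds.
        return currFront >= range.length;
    }

    pragma(inline, true) @property ref auto front()
    {
        return range[currFront];
    }

    pragma(inline, true) void popFront()
    {
        currFront += step;
    }
}

// Hypothetical helper, named to avoid clashing with std.range.stride.
pragma(inline, true) auto strided(Range)(Range r, size_t step)
{
    return Strided!Range(r, step, 0);
}

void main()
{
    import std.algorithm.comparison : equal;

    int[] a = [0, 1, 2, 3, 4, 5, 6];
    assert(a.strided(3).length == 3);
    assert(a.strided(3).equal([0, 3, 6]));

    // front returns by ref, so the view can write through to `a`.
    foreach (ref x; a.strided(2))
        x = -1;
    assert(a == [-1, 1, -1, 3, -1, 5, -1]);
}
```

Whether the compiler actually honours the request still depends on the compiler and flags. The pragma only turns "maybe" into "always try".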

When I change all the pragma(inline) to pragma(inline, true), 
there is a non-trivial speedup:

14 ms, 517 μs, and 9 hnsecs
13 ms, 110 μs, and 1 hnsec
13 ms, 199 μs, and 9 hnsecs

There are further reductions using ldc-beta:

14 ms, 520 μs, and 4 hnsecs
13 ms, 87 μs, and 2 hnsecs
12 ms, 938 μs, and 8 hnsecs
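The post doesn't show how the timings were taken. The output format, though, is what printing a `core.time.Duration` gives. Below is a rough sketch of one way to collect comparable numbers with `std.datetime.stopwatch.benchmark`. The array size, repetition count, and function names are made up, and this is not the benchmark behind the link.

```d
// Assumed build commands for comparing compilers, e.g.:
//   dmd -O -release -inline bench.d
//   ldc2 -O3 -release bench.d
import std.datetime.stopwatch : benchmark;
import std.range : stride;
import std.stdio : writeln;

void main()
{
    auto data = new int[](1_000_000);

    // Write to every third element through std.range.stride.
    void viaStdStride()
    {
        foreach (ref x; data.stride(3))
            x = 42;
    }

    // The same writes as a plain for-loop.
    void viaForLoop()
    {
        for (size_t i = 0; i < data.length; i += 3)
            data[i] = 42;
    }

    // Run each function 100 times; results[i] is the total Duration
    // for the i-th function, printed in the "ms, μs, and hnsecs" style.
    auto results = benchmark!(viaStdStride, viaForLoop)(100);
    writeln("std.range.stride: ", results[0]);
    writeln("for-loop:         ", results[1]);

    // Sanity check that the loops wrote where expected.
    assert(data[0] == 42 && data[1] == 0 && data[3] == 42);
}
```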