stride in slices

Tue Jun 5 16:50:29 UTC 2018

On Tuesday, 5 June 2018 at 13:05:56 UTC, Steven Schveighoffer 
wrote:
> On 6/4/18 5:52 PM, DigitalDesigns wrote:
>> On Monday, 4 June 2018 at 17:40:57 UTC, Dennis wrote:
>>> On Monday, 4 June 2018 at 15:43:20 UTC, Steven Schveighoffer 
>>> wrote:
>>>> Note, it's not going to necessarily be as efficient, but 
>>>> it's likely to be close.
>>>>
>>>> -Steve
>>>
>>> I've compared the range versions with a for-loop. For 
>>> integers and longs or high stride amounts the time is roughly 
>>> equal, but for bytes with low stride amounts it can be up to 
>>> twice as slow.
>>> https://run.dlang.io/is/BoTflQ
>>>
>>> 50 Mb array, type = byte, stride = 3, compiler = LDC -O4 
>>> -release
>>> For-loop  18 ms
>>> Fill(0)   33 ms
>>> each!     33 ms
>>>
>>> With stride = 13:
>>> For-loop  7.3 ms
>>> Fill(0)   7.5 ms
>>> each!     7.8 ms
>> 
>> 
>> This is why I wanted to make sure! I would be using it for a 
>> stride of 2 and it seems it might have doubled the cost for no 
>> other reason than using ranges. Ranges are great but one can't 
>> reason about what is happening in them as easily as a direct 
>> loop so I wanted to be sure. Thanks for running the test!
>
> See later postings from Ethan and others. It's a matter of 
> optimization being able to see the "whole thing". This is why 
> for loops are sometimes better. It's not inherent with ranges, 
> but if you use the right optimization flags, it's done as fast 
> as if you hand-wrote it.
>
> What I've found with D (and especially LDC) is that when you 
> give the compiler everything to work with, it can do some 
> seemingly magic things.
>
> -Steve

It would be nice if this could be tested. Maybe even profiling 
in unit tests to make sure ranges stay within some margin (say 
10%) of the equivalent loop. One of the main reasons I don't use 
ranges is that I simply don't have faith they will be as fast as 
writing the loop directly. While they might offer a slightly 
easier syntax, I don't know what is going on under the hood, so I 
can't reason about it (unless I look up the source). A for loop 
is pretty much a thin wrapper over what the CPU does, so it will 
be near as fast as possible.
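
Something along these lines is what I have in mind. This is only 
a rough sketch, not anything that exists in Phobos: the names 
loopVersion, rangeVersion and rangeWithinMargin are made up for 
illustration, and the 10% margin, the array size and the run 
count are my own assumptions. The timing part is only meaningful 
in an optimized build, and a wall-clock check like this can be 
flaky on a busy machine, so only the correctness check lives in 
the unittest block.

```d
import std.algorithm.mutation : fill;
import std.datetime.stopwatch : benchmark;
import std.range : stride;
import std.stdio : writeln;

enum size_t step = 2;       // the stride I would actually be using
enum double margin = 1.10;  // assumed tolerance: range at most 10% slower

// Direct loop: zero every `step`-th element.
void loopVersion(byte[] data)
{
    for (size_t i = 0; i < data.length; i += step)
        data[i] = 0;
}

// Range version: same effect through std.range.stride and fill.
void rangeVersion(byte[] data)
{
    data.stride(step).fill(cast(byte) 0);
}

// Hypothetical helper: times both versions `runs` times and reports
// whether the range version stayed within `margin` of the loop.
bool rangeWithinMargin(size_t length, uint runs)
{
    auto a = new byte[length];
    auto b = new byte[length];

    void runLoop()  { loopVersion(a); }
    void runRange() { rangeVersion(b); }

    auto times = benchmark!(runLoop, runRange)(runs);
    immutable loopTime  = times[0].total!"hnsecs";
    immutable rangeTime = times[1].total!"hnsecs";
    return rangeTime <= loopTime * margin;
}

// Correctness only: both versions must touch exactly the same elements.
unittest
{
    auto a = new byte[9];
    a[] = 1;
    auto b = a.dup;
    loopVersion(a);
    rangeVersion(b);
    assert(a == b);
}

void main()
{
    // Array size and run count are arbitrary choices for this sketch.
    writeln(rangeWithinMargin(50_000_000, 10)
        ? "range version within margin"
        : "range version slower than margin allows");
}
```

Running the timing part as a separate, optimized program rather 
than inside the unit tests avoids failing a test run just because 
the machine was busy, at the cost of having to run it on purpose.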

I suppose in the long run ranges do have the potential to 
outperform loops, since they abstract, but there is no guarantee 
they will even come close. Having some "proof" that they are 
working well would ease my mind. As this thread shows, ranges 
have some major issues. Imagine having some code that is very 
performant on your machine but runs poorly on another machine in 
slightly different circumstances. Now, say it is the stride 
issue... One normally would not think of that being an issue, so 
one will look in other areas and could waste time. At least with 
direct loops you pretty much get what you see. It is very easy 
for ranges to be slow but more difficult for them to be fast.