Difference between input range and forward range

Thu Nov 12 01:11:20 PST 2015

On Wednesday, 11 November 2015 at 22:34:32 UTC, Jesse Phillips 
wrote:
> On Tuesday, 10 November 2015 at 16:07:01 UTC, Jonathan M Davis 
> wrote:
>> generic code, you have to consider it to be consumed, because 
>> the state of range you passed to foreach is now undefined, 
>> since what happens when copying the range is undefined. This 
>> is true even if you put a break out of the loop, because the 
>> range was copied, and you simply cannot rely on the state of 
>> the range you passed to foreach after that copy.
>
> The problem I find with generic code is when the desire is to 
> consume the data. Take this example of parsing a data stream.
>
>     auto osmData = datastream.take(size).array;
>     datastream.popFrontN(size);
>     auto header = BlobHeader(osmData);
>
> http://he-the-great.livejournal.com/49636.html
>
> How do I know if popFrontN is needed? If I was given a value 
> base range then it is. If I was given a reference range (in its 
> many forms) the extra call to popFrontN will result in an 
> unneeded data jump. I could require that a forward range is 
> passed in, then I can save() before calling .array and thus 
> always require popFrontN.
>
> The best option is probably to use the RefRange wrapper, but it 
> does create an annoying element of surprise.

Well, if we're talking forward ranges, then the only way to be 
100% consistent with this is to basically consider datastream 
unusable after the call to take, because its state is undefined. 
So, the correct way to handle this would be to do something like

auto osmData = datastream.save.take(size).array;
datastream.popFrontN(size);

Now, that's ugly, but it does ensure that the code will work 
correctly regardless of whether the range is a reference type, 
value type, or pseudo-reference type. And if you wanted to 
guaranteed reference semantics, as you say, you could use 
RefRange, though you probably do have to be careful about that.

Regardless, it highlights how save needs to be called all over 
the place if you want ranges which are reference types to work 
consistently with value types.

The bigger problem is pure input ranges. If you do

auto osmData = datastream.take(size).array;

then the state of datastream is undefined (at least in generic 
code), and you can't do _anything_ with it. If datastream is 
reference type, then using it would work just fine, since the 
first size elements would have been consumed, and we shouldn't 
have to worry about value types (because they can always be 
forward ranges - though it's technically possible for someone to 
not make them forward ranges even when they should be). However, 
we _do_ have a problem with pseudo-reference types. A pure input 
range pretty much has to be a reference type with regards to its 
elements (otherwise it could be a forward range), but stuff like 
caching front can turn it into a pseudo-reference type and 
totally break code like this. With a full-on reference type

auto osmData = datastream.take(size).array;

results in datastream being the same as it would have been had 
you called datastream.popFrontN(size) instead. But with a cached 
front, while the subsequent elements would be correct, front 
would be wrong.

So, really, you're highlighting a really nasty aspect of this 
problem. As long as we allow pseudo-reference types for pure 
input ranges (and as I understand it, existing stuff like vibe.d 
has them), I don't see how we can make this code work correctly 
and access any of the elements in the range that were after the 
elements that were accessed via take.

Actually, even with full-on reference types, we're kind of 
screwed with take and input ranges. take is lazy, so if you do

auto osmData = datastream.take(size);
datastream.popFrontN(size);

and datastream is a reference type, then take will end up 
referring to to the second n elements, not the first.

*sigh* Pure input ranges suck. It's ugly as all get out with 
forward ranges, but liberal use of save can guarantee consistent 
semantics. But with pure input ranges...

I suspect that there's a whole pile of algorithms that 
technically should never be used with pure input ranges or which 
require that you be _very_ careful with them. It would be 
ludicrous to not have take work with input ranges, but it's quite 
clear that with a pure input range, calling take _and_ accessing 
elements beyond the ones taken isn't going to work unless you 
know that the range is a reference type, and you make sure that 
you iterate through the result of take _before_ doing anything 
with the rest of the range.

I think that it's pretty clear that we need to re-examine how 
pure input ranges should work and either make a change to them or 
have some very clear guidelines on how to use them (which is not 
likely going to be easy to do correctly) and probably disallow 
them for most algorithms.

Yuck. I'm definitely going to have to stew on this one. I've 
always thought that pure input ranges were too restrictive to be 
useful in most cases (though for some stuff you're pretty much 
stuck with them without doing a lot of buffering), but they seem 
to get worse every time we examine them in depth.

- Jonathan M Davis