protocol for using InputRanges

Sun Mar 23 00:53:22 PDT 2014

On Saturday, March 22, 2014 17:50:34 Walter Bright wrote:
> It's become clear to me that we've underspecified what an InputRange is. The
> normal way to use it is:
> 
>      while (!r.empty) {
>          auto e = r.front;
>          ... do something with e ...
>          r.popFront();
>      }
> 
> no argument there. But there are two issues:
> 
> 1. If you know the range is not empty, is it allowed to call r.front without
> calling r.empty first?
> 
> If this is true, extra logic will need to be added to r.front in many cases.

You definitely don't have to call empty before calling front if you know that 
it's not empty. Both front and empty should normally be pure (or at least act 
that way) and essentially act like variables. In most cases, it works best for 
the work of the range to go in popFront. The exception is when you're dealing 
with a random-access range, since then any element could be accessed, making 
it so that you can't be doing the work in popFront. I think that we have a 
general agreement on this based on previous discussions, though it's certainly 
not unanimous.
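
As a rough sketch of what "the work goes in popFront" can look like (Evens 
and skipOdds are made-up names for illustration, not anything in Phobos), 
here is an input range where front and empty only read state, and the 
constructor and popFront do the searching:

```d
import std.range : isInputRange;

// Hypothetical range over the even numbers of an int[].
struct Evens
{
    private int[] data;

    this(int[] data)
    {
        this.data = data;
        skipOdds(); // prime the range so that front is valid right away
    }

    // empty and front just read state, so they behave like variables.
    @property bool empty() const pure nothrow
    {
        return data.length == 0;
    }

    @property int front() const pure nothrow
    {
        assert(!empty, "front called on an empty range");
        return data[0];
    }

    // All of the real work happens here.
    void popFront() pure nothrow
    {
        assert(!empty, "popFront called on an empty range");
        data = data[1 .. $];
        skipOdds();
    }

    private void skipOdds() pure nothrow
    {
        while (data.length != 0 && data[0] % 2 != 0)
            data = data[1 .. $];
    }
}

static assert(isInputRange!Evens);
```

With this layout, calling front without calling empty first is fine whenever 
the caller already knows that the range isn't empty, and calling it several 
times in a row costs nothing extra.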

> 2. Can r.front be called n times in a row? I.e. is calling front()
> destructive?
> 
> If true, this means that r.front will have to cache a copy in many cases.

If calling front were destructive, that would break a lot of code. It's 
probably true that most range-based code should avoid calling front multiple 
times, both because front may have to do more work than just return the 
value and because the result may be copied on every call. On the other hand, 
if front is auto ref, it could be more efficient to call it multiple times. 
So, it's not entirely clear-cut.
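
For what it's worth, the defensive pattern on the consumer side might look 
something like this. It's only a sketch, and largest is a hypothetical helper 
rather than a Phobos function: it reads front once per element into a local, 
and it skips the empty check for the first element because the caller has 
promised a non-empty range.

```d
import std.range : ElementType, empty, front, isInputRange, popFront;

// Hypothetical helper: returns the largest element of a non-empty range.
ElementType!R largest(R)(R r)
    if (isInputRange!R)
{
    assert(!r.empty, "largest requires a non-empty range");

    auto best = r.front; // known to be non-empty, so no loop check is needed
    r.popFront();

    while (!r.empty)
    {
        auto e = r.front; // read front once and reuse the local copy
        if (e > best)
            best = e;
        r.popFront();
    }
    return best;
}
```

Whether the local copy is cheaper than calling front again depends on the 
element type and on whether front returns by value or by ref.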

But again, front and empty should normally function as if they were variables. 
They should be property functions and should be pure (or at least act like 
they're pure). I'm sure that a _lot_ of code will break if that isn't 
followed.

There are corner cases which can get a bit mucky though - e.g.

auto a = map!(to!string)(range);

In this case, front is pure, but it returns a new value each time (albeit a 
value that's equal each time until popFront is called). And there's no real 
way to fix that if the resulting range is random access (though if it weren't, 
the work could go in popFront, which _would_ make it so that front always 
returned the same result).
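
To make that concrete, here is a small sketch which counts how often the 
mapped function runs. The function conversionsForTwoFrontCalls and the 
counter are made up for illustration; the point is just that map runs the 
function again on each call to front rather than caching the result.

```d
import std.algorithm : map;
import std.conv : to;

// Hypothetical helper: reports how many times the mapped function ran
// when front was read twice with no popFront in between.
int conversionsForTwoFrontCalls()
{
    int calls = 0;
    auto a = [10, 20, 30].map!((int x) { ++calls; return to!string(x); });

    auto first = a.front;  // runs the conversion
    auto second = a.front; // runs it again, since map does not cache

    assert(first == second); // equal values, computed separately
    return calls;
}
```

The two results compare equal, so front still acts like a variable as far as 
the value goes, but the work is done on every call.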

And there have been arguments over whether the result of front should be valid 
after popFront has been called (i.e. whether it's transient or not). A lot of 
code assumes that it will be, but we have some nasty exceptions (e.g. 
std.stdio.ByLine) - typically because front's a buffer which gets reused. 
IIRC, in those cases, Andrei favored saying that input ranges that weren't 
forward ranges could have a transient front but that forward ranges couldn't 
(which I tend to agree with, though I'd prefer that _no_ ranges have transient 
fronts, since it can really cause problems - e.g. std.array.array not 
working). I don't think that a consensus was reached on that though, since a 
few folks really liked using transient fronts with more complicated ranges.
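
Here is a self-contained sketch of the problem with transient fronts. 
ReusedBufferRange is a made-up type that only imitates the buffer reuse 
described above; it is not how ByLine is actually implemented.

```d
import std.algorithm : map;
import std.array : array;

// Hypothetical range whose front is always a slice of one reused buffer.
struct ReusedBufferRange
{
    private string[] words;
    private char[] buf; // heap buffer shared by every front
    private size_t len;

    this(string[] words)
    {
        this.words = words;
        buf = new char[](16);
        load();
    }

    @property bool empty() const { return words.length == 0; }

    // Transient: the returned slice is overwritten by the next popFront.
    @property char[] front() { return buf[0 .. len]; }

    void popFront()
    {
        words = words[1 .. $];
        load();
    }

    private void load()
    {
        if (words.length == 0)
            return;
        len = words[0].length;
        assert(len <= buf.length, "word does not fit in the buffer");
        buf[0 .. len] = words[0][];
    }
}
```

Calling array on ReusedBufferRange(["aa", "bb", "cc"]) directly yields three 
slices of the same buffer, so every element ends up reading "cc". Copying 
each front before storing it, e.g. with map!(w => w.idup) ahead of array, 
gives ["aa", "bb", "cc"] as expected.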

In general though, I think that most of us would agree that front and empty 
should be treated as properties - i.e. as if they were variables - and that 
they should try to stick to those semantics as closely as possible. 
Ranges that stray from that seriously risk not working with a lot of 
range-based code.

- Jonathan M Davis