ideas about ranges

Fri May 22 08:48:34 PDT 2009

Steven Schveighoffer wrote:
> 
> The thread discussing what to do for input ranges vs. forward ranges got 
> me thinking.
> 
> The range concept may be defined backwards in terms of which is more 
> specialized.  Consider that an input range is always usable as a 
> stream.  But a stream is not easy to use as an input range (the range 
> primitive).

The hierarchy of concepts is:

input range < forward range < bidirectional range < random-access range

Having length, having slicing, and infinity are orthogonal properties.

> Case in point, a file.  To fit into the most primitive range concept, it 
> must define 3 functions:
> 
> front()
> popFront()
> empty()
> 
> empty is easy, it's just "am I at end of file"
> 
> But front is not so easy.  In order to know what's at the front, you 
> need to read it.  And at that point you've altered the underlying file.

Not even empty is easy. If you're defining a file range that gives you 
e.g. whitespace-separated integers, you can't have empty return feof(p) 
because there might be only spaces through the end of file. So you need 
to cache one element ahead to be able to implement empty().

I've discussed this with Walter for the longest time. The correct 
primitive for an input range that needs to do no caching is:

Tuple!(T, "data", bool, "got") popNext();

When you call popNext it decides in the same place whether there is 
data, and also gives it to you. If "got" comes off as false, you know 
you're done.

There are two issues with this form. It's not easy to use (even if you 
play around with the signature by e.g. passing data or bool by 
reference) and it's not easy to inline for non-input ranges. The second 
problem could possibly be eliminated, but the first stays. It's just too 
clunky a primitive to use, you'll need temporaries and all that stuff.

> Look at the implementation of front and popFront for a possible 
> FileByChar implementation:
> 
> dchar front()
> {
>   if(!bufferValid)
>     popFront();
>   return buffer;
> }
> 
> void popFront()
> {
>    // read buffer from source, setting bufferValid if the range isn't 
> empty by default
> }
> 
> What sucks about this is, we have to introduce a buffer in the range, 
> just because we can't look at data until we've popped it.  Not only 
> that, but calling front a file before anything is read requires a check 
> to fill the buffer in case we haven't read anything yet.  This could be 
> alleviated by filling in the constructor, but it's still more complex 
> than necessary.  Consider also that the underlying stream might already 
> be buffered, so we are buffering a buffer.

Yah, I agree. I've implemented a few of those in Phobos. It would be 
good to have a more solid solution. This problem, however, also collides 
with returning an rvalue versus a ref. A container wants to return a 
ref. An input range wants to return by value. Then it's difficult to use 
a container as an input range.

> And finally, if you copy such a range, the buffer might be copied while 
> the stream itself may not.  this could result in strange garbage data.

I don't understand this. You could make sure copy does the right thing.

> But since the primitives for input range are set by the compiler (it 
> uses them to do foreach), we have to implement them to make our stream 
> ranges friendly to foreach.
> 
> Round peg, meet square hole.
> 
> But what are the true requirements for iteration using foreach?
> 
> 1. check if there's anything left
> 2. get the next element
> 
> Step 2 now is split into popFront and front.  So a foreach loop is a 
> rewritten for loop like this:
> 
> foreach(x; range)
> {
>   ...
> }
> 
> translates to:
> {
>   auto _r = range;
>   while(!_r.empty)
>   {
>     auto x = _r.front();
>     _r.popFront();
>     ...
>   }
> }
> 
> What if step 2 was one function?  Call it popNext(), and make it 
> equivalent to calling _r.front() and popFront() in one step on ranges 
> that implement that method.

This will not solve all problems. It will improve things like defining 
ranges that read one character at a time from a stream. But a function 
that does read-and-check-for-empty in one shot is the true solution.

> How does this work with foreach?
> 
> {
>   auto _r = range;
>   while(!_r.empty)
>   {
>     auto x = _r.popNext();
>     ...
>   }
> }
> 
> Basically, the same code, one less line.
> 
> Consider that any range defined today with front() and popFront() can 
> implement popNext (and popNext could be an external function if we can 
> get 3015 resolved).
> 
> So what I think we may need is a different range primitive:
> 
> An iterable range defines: (name to be decided)
> 
> bool empty()
> T popNext()
> 
> An input range is an iterable range that also defines:
> 
> T front();
> popFront();

I think you are using "iterable range" instead of "input range" and 
"input range" instead of "forward range". This is compared to STL 
terminology, which I borrowed.

> Now look at our FileByChar example as an iterable range:
> 
> T popNext()
> {
>    return source.get(); // no buffering required
> }
> 
> And it works perfectly with the new foreach requirements.
> 
> And it correctly doesn't work with algorithms that require front and 
> popFront.

That's great. Again, popNext integrated with check for empty is the best 
solution. The correct way to go is to define this:

T popNext(ref bool gotSomething);

and leave it to the discretion of the range if they prefer returning by 
reference:

ref T popNext(ref bool gotSomething);

(this might be good for many forward ranges.) Then we nicely ask Walter 
to allow inlining of such functions, and finally implement this in 
std.range:

ref ElementType!R popNext(R)(R r, ref bool gotSomething)
     if (isForwardRange!R)
{
     if (r.empty)
     {
         gotSomething = false;
         static typeof(return) dumbo;
         return dumbo;
     }
     auto result = &(r.front());
     r.popFront();
     gotSomething = true;
     return *result;
}

Admittedly this looks considerably messier than it ought to, and that's 
never a good sign. For starters, I could predict with accuracy the 
sneering remarks of certain posters who shall remain unnamed. Worse, 
they'd have a point (only) this one time :o). Messiness was one of the 
factors that made me decide to steer away from this design. A simpler 
solution is to just return by value:

ElementType!R popNext(R)(R r, ref bool gotSomething)
     if (isForwardRange!R)
{
     if (r.empty)
     {
         gotSomething = false;
         return typeof(return).init;
     }
     auto result = r.front;
     r.popFront();
     return result;
}

This looks a tad more sane. But then it copies data more than it 
strictly should, and for whom? For everything that's not strictly a file 
input, which is most things you want to iterate! Wrong default.

As of this time, I am undecided on what's the best way to go. Opinions 
are welcome.

Andrei