ideas about ranges

Fri May 22 12:36:37 PDT 2009

On Fri, 22 May 2009 11:48:34 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail at erdani.org> wrote:

> Steven Schveighoffer wrote:
>>  The thread discussing what to do for input ranges vs. forward ranges  
>> got me thinking.
>>  The range concept may be defined backwards in terms of which is more  
>> specialized.  Consider that an input range is always usable as a  
>> stream.  But a stream is not easy to use as an input range (the range  
>> primitive).
>
> The hierarchy of concepts is:
>
> input range < forward range < bidirectional range < random-access range
>
> Having length, having slicing, and infinity are orthogonal properties.

Right, a stream doesn't fit really well within any of those categories,  
because it's even more simple than an input range.  That was my point.   
You have to bolt functionality and a buffer on top of it to get it to fit  
the model.  Maybe this is inevitable, I was wondering if it would be  
easier to deal with if the range paradigm had another simpler construct  
than input range.

>> Case in point, a file.  To fit into the most primitive range concept,  
>> it must define 3 functions:
>>  front()
>> popFront()
>> empty()
>>  empty is easy, it's just "am I at end of file"
>>  But front is not so easy.  In order to know what's at the front, you  
>> need to read it.  And at that point you've altered the underlying file.
>
> Not even empty is easy. If you're defining a file range that gives you  
> e.g. whitespace-separated integers, you can't have empty return feof(p)  
> because there might be only spaces through the end of file. So you need  
> to cache one element ahead to be able to implement empty().

Even a stream that does nothing but read straight characters, but has no  
idea how long the stream is, such as stdin, cannot specify empty without  
trying to read more data.  I see your point.

>
> I've discussed this with Walter for the longest time. The correct  
> primitive for an input range that needs to do no caching is:
>
> Tuple!(T, "data", bool, "got") popNext();
>
> When you call popNext it decides in the same place whether there is  
> data, and also gives it to you. If "got" comes off as false, you know  
> you're done.

Yes, I see now that my model won't cut it.  I had a feeling that my model  
wasn't quite correct, but as you say, yours still is clunky.

Looking at other iteration models:

C#: similar to D ranges, but moveNext (a.k.a. popFront) is required to  
start the enumeration. (yuck!)
Java: This is similar to what I was proposing: only has "get next element"  
and "has more elements".

I'm not sure if the right answer has been found by any of these yet.  I  
can feel there is a solution that should be simple, and accurately reflect  
the underlying objects' API, but I have to think more about it.

>>  What sucks about this is, we have to introduce a buffer in the range,  
>> just because we can't look at data until we've popped it.  Not only  
>> that, but calling front a file before anything is read requires a check  
>> to fill the buffer in case we haven't read anything yet.  This could be  
>> alleviated by filling in the constructor, but it's still more complex  
>> than necessary.  Consider also that the underlying stream might already  
>> be buffered, so we are buffering a buffer.
>
> Yah, I agree. I've implemented a few of those in Phobos. It would be  
> good to have a more solid solution. This problem, however, also collides  
> with returning an rvalue versus a ref. A container wants to return a  
> ref. An input range wants to return by value. Then it's difficult to use  
> a container as an input range.

don't you mean the other way around?  That is, it's easy to convert a ref  
into a value, but not vice versa.

>> But since the primitives for input range are set by the compiler (it  
>> uses them to do foreach), we have to implement them to make our stream  
>> ranges friendly to foreach.
>>  Round peg, meet square hole.
>>  But what are the true requirements for iteration using foreach?
>>  1. check if there's anything left
>> 2. get the next element
>>  Step 2 now is split into popFront and front.  So a foreach loop is a  
>> rewritten for loop like this:
>>  foreach(x; range)
>> {
>>   ...
>> }
>>  translates to:
>> {
>>   auto _r = range;
>>   while(!_r.empty)
>>   {
>>     auto x = _r.front();
>>     _r.popFront();
>>     ...
>>   }
>> }
>>  What if step 2 was one function?  Call it popNext(), and make it  
>> equivalent to calling _r.front() and popFront() in one step on ranges  
>> that implement that method.
>
> This will not solve all problems. It will improve things like defining  
> ranges that read one character at a time from a stream. But a function  
> that does read-and-check-for-empty in one shot is the true solution.

Yes, agreed.  I was focusing on not having to keep around transient data  
to avoid reordering or caching issues.  But it does need to be all-in-one  
as you have shown.

>> How does this work with foreach?
>>  {
>>   auto _r = range;
>>   while(!_r.empty)
>>   {
>>     auto x = _r.popNext();
>>     ...
>>   }
>> }
>>  Basically, the same code, one less line.
>>  Consider that any range defined today with front() and popFront() can  
>> implement popNext (and popNext could be an external function if we can  
>> get 3015 resolved).
>>  So what I think we may need is a different range primitive:
>>  An iterable range defines: (name to be decided)
>>  bool empty()
>> T popNext()
>>  An input range is an iterable range that also defines:
>>  T front();
>> popFront();
>
> I think you are using "iterable range" instead of "input range" and  
> "input range" instead of "forward range". This is compared to STL  
> terminology, which I borrowed.

I don't know what the terminology should be.  I was using your  
terminology.  when I said input range, I meant input range as you defined  
it on your std.range documentation.  An input range has the following  
methods:

T front()
void popFront()
bool empty()
and by composition: T popNext() (this wasn't in the docs, but it's  
something that we're throwing around here)

An "iterable" range, or one that does not have persistent access to  
elements and must modify the range and get data at the same time, was  
supposed to be a subset of that. I was just defining it as T popNext() and  
empty(), but it looks like those have to be combined into one function as  
you say.

> (this might be good for many forward ranges.) Then we nicely ask Walter  
> to allow inlining of such functions, and finally implement this in  
> std.range:
>
> ref ElementType!R popNext(R)(R r, ref bool gotSomething)
>      if (isForwardRange!R)
> {
>      if (r.empty)
>      {
>          gotSomething = false;
>          static typeof(return) dumbo;
>          return dumbo;
>      }
>      auto result = &(r.front());
>      r.popFront();
>      gotSomething = true;
>      return *result;
> }
>
> Admittedly this looks considerably messier than it ought to, and that's  
> never a good sign. For starters, I could predict with accuracy the  
> sneering remarks of certain posters who shall remain unnamed. Worse,  
> they'd have a point (only) this one time :o). Messiness was one of the  
> factors that made me decide to steer away from this design. A simpler  
> solution is to just return by value:

Yeah, it's ugly.  Dealing with orthogonal returns is not a graceful  
proposition.

Another idea is to make the T the ref'd arg, similar to how the system  
call read() works:

bool popNext(ref T);

This has some benefits:

1) you aren't checking a temporary for emptyness, so it fits well within a  
loop construct
2) you can avoid returning a dummy element in the empty range case.
3) you avoid returning a piece of possibly large data on the stack when  
it's probably just going to get copied anyways (although the compiler can  
sometimes optimize this).

One thing this does though, is remove the possibility of returning a  
reference to an element.  Not sure if this is a bad thing, since you can  
always convert a reference to a value, and when operating in this mode,  
you are working with the lowest common denominator (as everything has to  
be able to implement this).

However, with foreach, you may want to have references sometimes.  For  
that, I'd guess that foreach might have to support 3 apis: opApply, range,  
and stream (popNext API).

I'm not convinced that range is the best fit for ref'd data anyways,  
opApply is much more adept at it.

How does implementation of popNext look?

bool popNext(R)(R r, ref ElementType!R elem)
      if (isForwardRange!R)
{
      if (r.empty)
          return false;
      elem = r.front();
      r.popFront();
      return true;
}

usage:

R r;
for(ElementType!R x; r.popNext(x);)
{
    // use x
}

vs. your idea for popNext:

R r;
bool done;
// ... geez, I don't even know an easy way to do this???  does this work?
while(ref x = r.popNext(done), done)
{
    // use x
}

Still thinking, may have more ideas...  How do we get a reference to the  
element...

-Steve