RFC on range design for D2

Tue Sep 9 05:47:14 PDT 2008

(This is an older message that somehow didn't make it to the group. 
Resending now.)

Lionello Lunesu wrote:
> 
> "Andrei Alexandrescu" <SeeWebsiteForEmail at erdani.org> wrote in message 
> news:ga46ok$2s77$1 at digitalmars.com...
>> I put together a short document for the range design. I definitely 
>> missed about a million things and have been imprecise about another 
>> million, so feedback would be highly appreciated. See:
>>
>> http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html
> 
> This is just awesome. Thank you for tackling this issue.
> 
> I agree with others that some names are not so obvious. Left/right? How 
> do Arabic speakers feel about this : ) Begin/end seems more intuitive.

I don't know of that particular cultural sensitivity. Begin and end are
bad choices because they'd confuse the heck out of STL refugees. c.left
and c.right are actually STL's c.front() and c.back() or *c.begin() and
c.end()[-1], if you allow the notational abuse. But I sure hope some
good names will come along.

> Can you explain this please. From Input range:
> 
> e=r.getNext Reads an element and moves to the next one. Returns the read 
> element, which is of type ElementType!(R). The call is defined only 
> right after r.isEmpty returned false.
> 
> That last part: The call is defined only right after r.isEmpty returned 
> false.
> 
> First of all, is* functions always sound "const" to me, but the way you 
> describe isEmpty it sounds like it actually changes something, advancing 
> a pointer or something like that. What happens if isEmpty is called 
> twice? Will it skip 1 element?

Excellent question! Gosh if I staged this it couldn't have gone any better.

Consider an input range getting ints separated by whitespace out of a
FILE* - something that we'd expect our design to allow easily and
neatly. So then how do I implement isEmpty()? Well, feof(f) is not
really informative at all. Maybe the file has five more spaces and then
it ends. So in order to TELL you that the range is not empty, I actually
have to GO all the way and actually read the integer. Then I can tell
you: yeah, there's stuff available, or no, I'm done, or even throw an
exception if some error happened.

That makes isEmpty non-const. You check for r.isEmpty, it makes sure an
int is buffered inside the range's state. You call r.isEmpty again, it
doesn't do anything because an int is already buffered. You call
r.getNext, the int gets moved off the range's state into your program,
and the internal flag is set telling the range that the buffer's empty.

You call r.getNext without having called r.isEmpty, then the range makes
a heroic effort to fetch another int. If that fails, the range throws an
exception. So in essence the behavior is that you can use isEmpty to
make sure that getNext won't blow in your face (speaking of Pulp
Fiction...) I think this all is very sensible, and even more sensible
when I'll give more detail below.

> The input range behaves like C#'s IEnumerator, but at least the C# names 
> are more obvious: while (i.MoveNext()) e = i.Current; But isEmpty is 
> common to all ranges, so I understand why it's the way it is. I just 
> hope it could stay "const", not modifying the internal state. Perhaps 
> add "next" to input ranges as well?

This is even better than the previous question. Why not this for an
input iterator?

for (; !r.isEmpty; r.next) use(r.getNext);

or even

for (; !r.isEmpty; r.next) use(r.left);

thus making input ranges quite similar to forward ranges.

In that design:

a) The constructor fetches the first int

b) isEmpty is const and just returns the "available" internal flag

c) next reads the next int off the file

First I'll eliminate the design with isEmpty/next/getNext as flawed for
a subtle reason: cost of copying. Replace mentally the int above with
something that needs dynamic allocation, such as BigInt. So the range
reads one BigInt off the file, stores it in the internal buffer, and
then the user calls:

for (; !r.isEmpty; r.next)
{
     BigInt my = r.getNext();
     ....
}

Since in one iteration there's only one BigInt, not two, I'd need to do
a destructive transfer in getNext() that "moves" the state of BigInt
from the range to my, leaving r's state empty. (This feature will be
available in D2 soon.) But then what if somebody calls r.getNext()
again? Well I don't have the data anymore, so I need to issue a next().
So I discover next was not needed in the first place.

Hope everything is clear so far. Now let's discuss the isEmpty/next/left
design.

That design is also flawed, just in a different way. The range holds the
BigInt inside, but r.left gives to the client a *reference* to it. This
is cool! There is no more extra copying and everything works smoothly.

In fact this design patents a lie. It lies about giving a reference to
an element (the same way a "real" container does) but that element will
be overwritten every time r.next is called, UNBEKNOWNST TO THE CLIENT.
So consider the algorithm:

void bump(R)(R r)
{
     for (; !r.isEmpty(); r.next) ++r.left;
}

You pass an input range to bump and it compiles and executes to
something entirely nonsensical. This is an obvious misuse, but as I'm
sure you know it gets real confusing real fast.

A possible argument is to have left return a ref const(BigInt), but then
we lose the ability to transfer its value into our state.

What we want is a design that tells the truth. And a design that tells
the truth is this:

r.isEmpty does whatever the hell it takes to make sure whether there's
data available or not. It is not const and it could throw an exception.

v = r.getNext returns BY VALUE by means of DESTRUCTIVE COPY data that
came through the wire, data that the client now owns as soon as getNext
returned. There is no extra copy, no extra allocation, and the real
thing has happened: data has been read from the outside and user code
was made the only owner of it.

Andrei