Ranges and/versus iterators

Tue Mar 23 19:51:42 PDT 2010

On 03/23/2010 09:12 PM, Steven Schveighoffer wrote:
> Andrei Alexandrescu Wrote:
>> I'd gladly reconsider E* getNext(), and I like it a lot, but that
>> doesn't accommodate ranges that want to return rvalues without
>> storing them (e.g. a range using getchar() as a back-end, and
>> generally streams that don't correspond to stuff stored in memory).
>> If it's not in memory, there's no pointer to it.
>
> Second, you *have* to read data into memory.  Even with the ranges as
> they currently are, you have to read into memory.  At least this is
> less awkward.

I agree. But it's one thing to read and pass along, and a different 
thing to read and keep in a buffer inside the range.

> Take for instance a line iterator.  You have to read enough to see
> the line terminator, but you most likely do not read *exactly* to the
> line terminator, so you just read in chunks until you get a line,
> then return the pointer to the data.  It works actually quite
> elegantly.

I disagree about the elegance part. If the range arrogates the right to 
use its own buffering, then when you decide you're done with that range 
and try to read some more from the stream, you discover data has been lost.

The Phobos file I/O functions all avoid doing any more buffering than 
the backing FILE* does. They achieve performance by locking the file 
once with flockfile/funlockfile and then using fgetc_unlocked().

This puts me in real trouble with the formatted reading functions (a la 
fscanf but generalized to all input ranges), which I'm gestating about. 
The problem with the current API is that if you call input.front(), it 
will call fgetc(). But then say I decide I'm done with the range, as is 
the case with e.g. reading an integer and stopping at the first 
non-digit. That non-digit character will be lost. So there's a need to 
say, hey, put this guy back because whoever reads after me will need to 
look at it. So I need a putBackFront() or something (which would call 
fungetc()). I wish things were simpler.

> Third, the memory could be supplied by the caller.  For instance, if
> you wrote the function like this:
>
> E* getNext(E* buf = null);
>
> Then foreach could do something like this:
>
> foreach(e; streamrange)
>
> =>
>
> E _e; while(auto e = streamrange.getNext(&_e))
>
> To avoid heap allocation.  Of course, heap allocation would be the
> default if buf is null.
>
> Tango does this sort of trick quite often, and it makes the I/O code
> extremely fast.

The problem is that that speed doesn't translate very well to in-memory 
containers. For containers it's preferable to pass null so you get a 
pointer to the actual element; for streams it's preferable to not pass 
null. So it's difficult to write code that works well for both.

> Also, another thing to think about is we can generalize the return
> type to satisfying the condition:
>
> iff range is empty then cast(bool)range.getNext == false.
>
> This means as long as your range cannot return a null element for a
> non-empty return, it is OK not to use a pointer.  For example, the
> line iterator again... it can be written like:
>
> const(char)[] getNext()
>
> because you will only ever return a null const(char)[] when there is
> no data left.

I see, but if I'm looking for ints? I'll have to return a pointer - or a 
nullable or something.

> I don't think we should give up on trying to make a stream range that
> is not awkward, I really dislike the way today's input ranges map to
> streams.

Me too. Let's keep on looking, I have the feeling something good is 
right behind the corner. But then I felt that way for a year :o).

Andrei