Ranges and/versus iterators

Fawzi Mohamed fawzi at gmx.ch
Wed Mar 24 07:00:02 PDT 2010


On 24-mar-10, at 03:51, Andrei Alexandrescu wrote:

> The Phobos file I/O functions all avoid doing any more buffering  
> than the backing FILE* does. They achieve performance by locking the  
> file once with flockfile/funlockfile and then using fgetc_unlocked().
>
> This puts me in real trouble with the formatted reading functions (a  
> la fscanf but generalized to all input ranges), which I'm gestating  
> about. The problem with the current API is that if you call  
> input.front(), it will call fgetc(). But then say I decide I'm done  
> with the range, as is the case with e.g. reading an integer and  
> stopping at the first non-digit. That non-digit character will be  
> lost. So there's a need to say, hey, put this guy back because  
> whoever reads after me will need to look at it. So I need a  
> putBackFront() or something (which would call fungetc()). I wish  
> things were simpler.

I had a pushBack (I called that unget) in http://github.com/fawzi/blip/blob/master/blip/text/TextParser.d 
  , but I recently removed that in favor of a peek function that I  
think is much more flexible.
What I did is to base most parsing on CharReaders (for example the  
char based ones from BasicIO):
{{{
/// extent of a slice of a buffer
enum SliceExtent{ Partial, Maximal, ToEnd }

/// a delegate that reads in from a character source
alias size_t delegate(char[]buf, SliceExtent slice,out bool iterate)  
CharReader;

/// a handler of CharReader, returns true if something was read
alias bool delegate(CharReader)CharReaderHandler;
}}}

a char reader reads from the given buffer buf, and can either request  
more (by returning EOF), or eat some characters out of it. If it sets  
iterate to true it wants to iterate with the eaten buffer (useful to  
for example skip undefined amount of whitespace that might overflow  
the buffer).

Once you have that you can easily create a Peeker structure that wraps  
a CharReader, and exposes a CharReaded that tries to match it, but  
always eats 0 characters, even if the match was successful.
With it you can have a peek method that returns true if the CharReader  
that you pass in matches, false if it does not match, and what you  
want if the buffer is too small to resolve the issue.

Most of these things are templates that work for any type T. Then I  
built buffered types that using a size_t delegate(T[]) give a Reader  
based interface.

All this is not based on single elements anymore, but on arrays  
(ranges? :), but I think that is what is needed for efficient i/o.
>
> On 03/23/2010 09:12 PM, Steven Schveighoffer wrote:
>> I don't think we should give up on trying to make a stream range that
>> is not awkward, I really dislike the way today's input ranges map to
>> streams.
>
> Me too. Let's keep on looking, I have the feeling something good is  
> right behind the corner. But then I felt that way for a year :o).

give a try to
	bool popFront(ref T) ( or next, or another name, or even just a  
delegate with that signature)
I was surprised how well it works, not perfect but better than the  
other alternatives I had tried.

loop on a T[] array:
	bool popFront(ref T* el);

mapped to
	opApply(int delegate(ref T x) loopBody);

loop on a source of elements T:
	bool popFront(ref T el);

mapped to
	opApply(int delegate(ref T x) loopBody);
(well the ref there does not make much sense, but that is how opApply  
works to avoid the explosion of opApply).
All it takes is a check for pointers in the templates, and dereference  
the type of opApply.

also direct loops often look reasonable thanks to with D automatically  
dereferencing with ".":
while (it.popFront(el)){
	el.doSomething;
}

yes assigning stuff directly to el, and not its components you need a  
T* iterator and you have to write
	*el=...
and
	x= *el
but that is not so terrible.

filter applied on an iterator it is just
bool popNext(ref T el){
   while (it.popNext(el)){
     if (acceptable(el)){
       return true;
     }
   }
   return false;
}

combiners of iterators are likewise quite simple to write.




More information about the Digitalmars-d mailing list