Ranges and/versus iterators

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Wed Mar 24 15:29:18 PDT 2010


On 03/24/2010 09:00 AM, Fawzi Mohamed wrote:
>
> On 24-mar-10, at 03:51, Andrei Alexandrescu wrote:
>
>> The Phobos file I/O functions all avoid doing any more buffering than
>> the backing FILE* does. They achieve performance by locking the file
>> once with flockfile/funlockfile and then using fgetc_unlocked().
>>
>> This puts me in real trouble with the formatted reading functions (a
>> la fscanf but generalized to all input ranges), which I'm gestating
>> about. The problem with the current API is that if you call
>> input.front(), it will call fgetc(). But then say I decide I'm done
>> with the range, as is the case with e.g. reading an integer and
>> stopping at the first non-digit. That non-digit character will be
>> lost. So there's a need to say, hey, put this guy back because whoever
>> reads after me will need to look at it. So I need a putBackFront() or
>> something (which would call fungetc()). I wish things were simpler.
>
> I had a pushBack (I called that unget) in
> http://github.com/fawzi/blip/blob/master/blip/text/TextParser.d , but I
> recently removed that in favor of a peek function that I think is much
> more flexible.

Thanks for sharing your design with me. Yes, peek() is more flexible 
than get/unget, but I'm under the stdio tyranny.

In fact I just realized something - I could call

setvbuf(_handle, null, _IONBF, 0)

whenever I bind a File to a FILE*. That way File can do its own 
buffering and can implement peek() etc. I wonder if we need to worry 
about sharing, because e.g. several threads would want to write to stdout.

> What I did is to base most parsing on CharReaders (for example the char
> based ones from BasicIO):
> {{{
> /// extent of a slice of a buffer
> enum SliceExtent{ Partial, Maximal, ToEnd }
>
> /// a delegate that reads in from a character source
> alias size_t delegate(char[]buf, SliceExtent slice,out bool iterate)
> CharReader;
>
> /// a handler of CharReader, returns true if something was read
> alias bool delegate(CharReader)CharReaderHandler;
> }}}
>
> a char reader reads from the given buffer buf, and can either request
> more (by returning EOF), or eat some characters out of it. If it sets
> iterate to true it wants to iterate with the eaten buffer (useful to for
> example skip undefined amount of whitespace that might overflow the
> buffer).
>
> Once you have that you can easily create a Peeker structure that wraps a
> CharReader, and exposes a CharReaded that tries to match it, but always
> eats 0 characters, even if the match was successful.
> With it you can have a peek method that returns true if the CharReader
> that you pass in matches, false if it does not match, and what you want
> if the buffer is too small to resolve the issue.
>
> Most of these things are templates that work for any type T.

Wait, if you called it CharReader, how come it works with any type T? Or 
are you referring to T as the parsed type?

> Then I
> built buffered types that using a size_t delegate(T[]) give a Reader
> based interface.
>
> All this is not based on single elements anymore, but on arrays (ranges?
> :), but I think that is what is needed for efficient i/o.

Sounds good, but I wonder why you use delegates instead of classes. Is 
that for simplicity?

I confess it's not 100% clear to me how the delegates are supposed to be 
used in concert, particularly why there's a need for both CharReader and 
CharReaderHandler.

>> On 03/23/2010 09:12 PM, Steven Schveighoffer wrote:
>>> I don't think we should give up on trying to make a stream range that
>>> is not awkward, I really dislike the way today's input ranges map to
>>> streams.
>>
>> Me too. Let's keep on looking, I have the feeling something good is
>> right behind the corner. But then I felt that way for a year :o).
>
> give a try to
> bool popFront(ref T) ( or next, or another name, or even just a delegate
> with that signature)
> I was surprised how well it works, not perfect but better than the other
> alternatives I had tried.
>
> loop on a T[] array:
> bool popFront(ref T* el);

So arrays have a different interface than streams. It looks like you 
can't write code that works uniformly for both, because for some you 
need the * and for some you don't. Did I understand that correctly?

Andrei



More information about the Digitalmars-d mailing list