Ranges and/versus iterators

Fawzi Mohamed fawzi at gmx.ch
Wed Mar 24 16:09:41 PDT 2010


On 24-mar-10, at 23:29, Andrei Alexandrescu wrote:

> On 03/24/2010 09:00 AM, Fawzi Mohamed wrote:
>>
>> On 24-mar-10, at 03:51, Andrei Alexandrescu wrote:
>>
>>> The Phobos file I/O functions all avoid doing any more buffering  
>>> than
>>> the backing FILE* does. They achieve performance by locking the file
>>> once with flockfile/funlockfile and then using fgetc_unlocked().
>>>
>>> This puts me in real trouble with the formatted reading functions (a
>>> la fscanf but generalized to all input ranges), which I'm gestating
>>> about. The problem with the current API is that if you call
>>> input.front(), it will call fgetc(). But then say I decide I'm done
>>> with the range, as is the case with e.g. reading an integer and
>>> stopping at the first non-digit. That non-digit character will be
>>> lost. So there's a need to say, hey, put this guy back because  
>>> whoever
>>> reads after me will need to look at it. So I need a putBackFront()  
>>> or
>>> something (which would call fungetc()). I wish things were simpler.
>>
>> I had a pushBack (I called that unget) in
>> http://github.com/fawzi/blip/blob/master/blip/text/TextParser.d ,  
>> but I
>> recently removed that in favor of a peek function that I think is  
>> much
>> more flexible.
>
> Thanks for sharing your design with me. Yes, peek() is more flexible  
> than get/unget, but I'm under the stdio tyranny.
>
> In fact I just realized something - I could call
>
> setvbuf(_handle, null, _IONBF, 0)
>
> whenever I bind a File to a FILE*. That way File can do its own  
> buffering and can implement peek() etc. I wonder if we need to worry  
> about sharing, because e.g. several threads would want to write to  
> stdout.

well for stdout by default I use locking to ensure writing writes  
chunks atomically.
I would say that by default streams imply sequence, so can safely be  
non threadsafe.
stdout, stderr and logging are exceptions, there at least chunks  
should be written atomically.

>> What I did is to base most parsing on CharReaders (for example the  
>> char
>> based ones from BasicIO):
>> {{{
>> /// extent of a slice of a buffer
>> enum SliceExtent{ Partial, Maximal, ToEnd }
>>
>> /// a delegate that reads in from a character source
>> alias size_t delegate(char[]buf, SliceExtent slice,out bool iterate)
>> CharReader;
>>
>> /// a handler of CharReader, returns true if something was read
>> alias bool delegate(CharReader)CharReaderHandler;
>> }}}
>>
>> a char reader reads from the given buffer buf, and can either request
>> more (by returning EOF), or eat some characters out of it. If it sets
>> iterate to true it wants to iterate with the eaten buffer (useful  
>> to for
>> example skip undefined amount of whitespace that might overflow the
>> buffer).
>>
>> Once you have that you can easily create a Peeker structure that  
>> wraps a
>> CharReader, and exposes a CharReaded that tries to match it, but  
>> always
>> eats 0 characters, even if the match was successful.
>> With it you can have a peek method that returns true if the  
>> CharReader
>> that you pass in matches, false if it does not match, and what you  
>> want
>> if the buffer is too small to resolve the issue.
>>
>> Most of these things are templates that work for any type T.
>
> Wait, if you called it CharReader, how come it works with any type  
> T? Or are you referring to T as the parsed type?

Well I presented the CharReader for simplicity, and that is indeed  
only for chars, but most things can be generalized, and indeed if you  
look at

the Reader(T) interface in http://github.com/fawzi/blip/blob/master/blip/io/BasicIO.d 
  or blip.text.TextParser or similar they are templated with a generic  
type T.
For TextParser I was thinking T=char,wchar or dchar, whereas others  
cases are even more generic.

>> Then I
>> built buffered types that using a size_t delegate(T[]) give a Reader
>> based interface.
>>
>> All this is not based on single elements anymore, but on arrays  
>> (ranges?
>> :), but I think that is what is needed for efficient i/o.
>
> Sounds good, but I wonder why you use delegates instead of classes.  
> Is that for simplicity?

there are both, and both have their place.
delegates are very simple and can be easily built on the fly, I like  
that very much, they reduce the code footprint of various things.
More complex behaviour is better captured by classes, and indeed there  
are (also in BasicIO) the following interfaces:

interface OutStreamI{
     void rawWriteStr(char[]);
     void rawWriteStr(wchar[]);
     void rawWriteStr(dchar[]);
     void rawWrite(void[]);
     CharSink charSink();
     BinSink binSink();
     void flush();
     void close();
}

/// a reader of elements of type T
interface Reader(T){
     /// read some data into the given buffer
     size_t readSome(T[]);
     /// character reader handler
     bool handleReader(size_t delegate(T[], SliceExtent slice,out bool  
iterate) r);
     /// shutdown the input source
     void shutdownInput();
}

/// one or more readers
interface MultiReader{
     enum Mode{ Binary=1, Char=2, Wchar=4, Dchar=8 }
     /// returns the modes this reader supports
     uint modes();
     /// returns the native modes of this reader (less overhead)
     uint nativeModes();
     Reader!(char) readerChar();
     Reader!(wchar) readerWchar();
     Reader!(dchar) readerDchar();
     Reader!(void) readerBin();
     void shutdownInput();
}
there are classes that can create the more full fledged objects out of  
delegates.

> I confess it's not 100% clear to me how the delegates are supposed  
> to be used in concert, particularly why there's a need for both  
> CharReader and CharReaderHandler.

mainly one needs CharReader, which is a method that reads something.

CharReaderHandler is there just for completeness, it is a delegate of  
a method that actually reads, but normally one simply uses that  
method, i.e. it uses a Reader!(T).handleReader method...

>>> On 03/23/2010 09:12 PM, Steven Schveighoffer wrote:
>>>> I don't think we should give up on trying to make a stream range  
>>>> that
>>>> is not awkward, I really dislike the way today's input ranges map  
>>>> to
>>>> streams.
>>>
>>> Me too. Let's keep on looking, I have the feeling something good is
>>> right behind the corner. But then I felt that way for a year :o).
>>
>> give a try to
>> bool popFront(ref T) ( or next, or another name, or even just a  
>> delegate
>> with that signature)
>> I was surprised how well it works, not perfect but better than the  
>> other
>> alternatives I had tried.
>>
>> loop on a T[] array:
>> bool popFront(ref T* el);
>
> So arrays have a different interface than streams. It looks like you  
> can't write code that works uniformly for both, because for some you  
> need the * and for some you don't. Did I understand that correctly?

well the foreach loop is the same, but the iteration loop is indeed  
different in the sense that one uses a pointer to an element and the  
other the element itself.
one can write code that removes the pointer that is there  
(dereferencing it, or doing and inline function with subsequent call  
which allows you to reuse the same variable name):
void myF(ref x){
  // code
}
myF(*x);

(that is a nice trick that I used several times).

But yes there *is* a difference and the difference is that with arrays  
you might modify the element, modifying the stored value, whereas with  
streams you can't.
This conceptual difference and if reflected in the interface.
One can then discuss if immutable arrays should be iterated with  
immutable pointers or with values (i.e. copying) just as streams are.


>
> Andrei




More information about the Digitalmars-d mailing list