Streaming transport interfaces: input

Fri Oct 15 09:31:23 PDT 2010

On Fri, 15 Oct 2010 00:37:55 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail at erdani.org> wrote:

> On 10/14/10 21:58 CDT, Steven Schveighoffer wrote:
>> On Thu, 14 Oct 2010 16:47:13 -0400, Steven Schveighoffer
>> <schveiguy at yahoo.com> wrote:
>>
>>> On Thu, 14 Oct 2010 14:43:56 -0400, Andrei Alexandrescu
>>
>>>> How? Denis' implementation has two copies in the mix. (I'm not
>>>> counting .dup etc.) Anyhow, let's do this - write down your
>>>> interfaces so I can comment on them. We talk "oh that's a buffering
>>>> interface" and "that requires buffering" and "that's an extra copy"
>>>> and so on, but we have few concrete contenders. I put my cards on
>>>> the table, you put yours.
>>>
>>> I'll see if I can put something together.
>>
>> Here's a rough outline:
>
> Thanks!
>
>> enum Anchor
>> {
>>     Begin,
>>     Current,
>>     End
>> }
>>
>> interface Seek
>> {
>>     ulong seek(long delta, Anchor whence);
>>     final ulong tell() { return seek(0, Anchor.Current); }
>>     bool seekable(); // define as false if seeking is not supported, true if
>>                      // it is supported (this doesn't necessarily mean a
>>                      // seek will succeed).
>> }
>
> So far so good.
>
>> interface InputTransport : Seek
>> {
>>     size_t read(ubyte[] data); // returns 0 on EOF.
>> }
>
> No way to check for end of stream except by reading some of it?

This is often the only way at the low level, and since there is no buffer  
at this point, yes, you have to read.  How would you implement EOF without  
a buffer to hold the data you read while checking whether you were  
at EOF?
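
To illustrate, here is a minimal sketch of an in-memory InputTransport.  
ArrayInputTransport is a hypothetical name, not part of the outline; the  
only signal that the data is exhausted is a read() that returns 0:

```d
import std.algorithm.comparison : min;

enum Anchor { Begin, Current, End }

interface Seek
{
    ulong seek(long delta, Anchor whence);
    final ulong tell() { return seek(0, Anchor.Current); }
    bool seekable();
}

interface InputTransport : Seek
{
    size_t read(ubyte[] data); // returns 0 on EOF
}

// Hypothetical array-backed transport, for illustration only.
class ArrayInputTransport : InputTransport
{
    private ubyte[] src;
    private size_t pos;

    this(ubyte[] src) { this.src = src; }

    size_t read(ubyte[] data)
    {
        // Copy as much as fits; 0 means nothing was left (EOF).
        immutable n = min(data.length, src.length - pos);
        data[0 .. n] = src[pos .. pos + n];
        pos += n;
        return n;
    }

    ulong seek(long delta, Anchor whence)
    {
        long base = whence == Anchor.Begin ? 0
                  : whence == Anchor.Current ? cast(long) pos
                  : cast(long) src.length;
        immutable target = base + delta;
        if (target < 0 || target > cast(long) src.length)
            throw new Exception("seek out of range");
        pos = cast(size_t) target;
        return pos;
    }

    bool seekable() { return true; }
}
```

There is no way to ask "any more?" without handing read() a destination  
for the bytes, which is exactly the point above.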

It might be feasible to ask for EOF on the buffered version, but I still  
think it's not necessary.
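
On the buffered side the check is cheap, since there is a place to keep  
the bytes read while checking.  A minimal sketch, assuming a hypothetical  
eof property on a stripped-down DBufferInputTransport (the interface is  
trimmed to read(), and ArraySource is a made-up in-memory source):

```d
import std.algorithm.comparison : min;

// Trimmed to the single method this sketch needs.
interface InputTransport
{
    size_t read(ubyte[] data); // returns 0 on EOF
}

// Hypothetical in-memory source, for demonstration only.
class ArraySource : InputTransport
{
    private ubyte[] rest;
    this(ubyte[] data) { rest = data; }
    size_t read(ubyte[] dst)
    {
        immutable n = min(dst.length, rest.length);
        dst[0 .. n] = rest[0 .. n];
        rest = rest[n .. $];
        return n;
    }
}

// Stripped-down buffered wrapper: only read() and a hypothetical eof.
class DBufferInputTransport
{
    private InputTransport src;
    private ubyte[] buf;
    private size_t lo, hi; // buf[lo .. hi] holds bytes not yet returned

    this(InputTransport src, size_t bufsize = 4096)
    {
        this.src = src;
        buf = new ubyte[bufsize];
    }

    // To answer "at EOF?" we still have to try a read, but unlike a bare
    // InputTransport we have a buffer to keep whatever that read returns.
    @property bool eof()
    {
        if (lo < hi)
            return false;
        lo = 0;
        hi = src.read(buf);
        return hi == 0;
    }

    size_t read(ubyte[] data)
    {
        if (eof)
            return 0;
        immutable n = min(data.length, hi - lo);
        data[0 .. n] = buf[lo .. lo + n];
        lo += n;
        return n;
    }
}
```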

>> // defined to implement either a D buffered object or wrap a FILE *.
>> //
>> interface BufferedInputTransport : Seek
>> {
>>     size_t read(ubyte[] data); // returns 0 on EOF.
>
> Since this method has the same sig, why doesn't BufferedInputTransport  
> inherit InputTransport?

I thought of that, but then a buffered input class could accept a buffered  
input transport interface as its low-level implementation, and you would  
end up with unnecessarily double-buffered streams.
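
The effect of keeping the interfaces unrelated can be checked at compile  
time.  In this sketch both interfaces are trimmed to read() and the actual  
buffering is omitted; the constructor taking an InputTransport is an  
assumption about how DBufferInputTransport would be built:

```d
// Both interfaces trimmed to read(); BufferedInputTransport deliberately
// does not inherit InputTransport even though the signature matches.
interface InputTransport { size_t read(ubyte[] data); }
interface BufferedInputTransport { size_t read(ubyte[] data); }

// Hypothetical buffered implementation that only accepts a raw source.
class DBufferInputTransport : BufferedInputTransport
{
    private InputTransport src;
    this(InputTransport src) { this.src = src; }
    size_t read(ubyte[] data) { return src.read(data); } // buffering omitted
}

// Wrapping a raw transport compiles...
static assert( __traits(compiles, {
    InputTransport raw;
    auto d = new DBufferInputTransport(raw);
}));
// ...but stacking a buffer on top of another buffered transport does not.
static assert(!__traits(compiles, {
    BufferedInputTransport buffered;
    auto d = new DBufferInputTransport(buffered);
}));
```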

>
>>     // read data into the buffer until the delegate returns other than ~0
>>     //
>>     // The delegate is passed the entire buffer so far, plus the offset
>>     // (start) at which the newly read data begins. It returns other than
>>     // ~0 (the number of bytes to consume) when it determines the end of
>>     // the data in question.
>>     //
>>     ubyte[] readUntil(uint delegate(ubyte[] data, uint start) process);
>
> How does the delegate say "you know what, I'm fine with the first 1000  
> bytes of the data; please take the rest of the 1048 back"? Is that the  
> result of the delegate? The process feels a bit odd.

You are reading from the internal buffer, so no copying is necessary.  All  
that happens is that the read position moves forward 1000 bytes and a slice  
of the first 1000 bytes is returned.
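
A minimal sketch of readUntil on a D-side buffer, assuming the delegate  
returns the number of bytes to consume (or ~0 for "not yet") and that  
readUntil returns whatever is left at EOF.  The interface is trimmed to  
read(), and ArraySource is a made-up in-memory source:

```d
import std.algorithm.comparison : min;

interface InputTransport
{
    size_t read(ubyte[] data); // returns 0 on EOF
}

// Hypothetical in-memory source, for demonstration only.
class ArraySource : InputTransport
{
    private ubyte[] rest;
    this(ubyte[] data) { rest = data; }
    size_t read(ubyte[] dst)
    {
        immutable n = min(dst.length, rest.length);
        dst[0 .. n] = rest[0 .. n];
        rest = rest[n .. $];
        return n;
    }
}

class DBufferInputTransport
{
    private InputTransport src;
    private ubyte[] buf;
    private size_t lo, hi; // buf[lo .. hi] holds unread bytes

    this(InputTransport src, size_t bufsize = 4096)
    {
        assert(bufsize > 0);
        this.src = src;
        buf = new ubyte[bufsize];
    }

    ubyte[] readUntil(scope uint delegate(ubyte[] data, uint start) process)
    {
        size_t start = 0; // how much of buf[lo .. hi] the delegate has seen
        for (;;)
        {
            if (hi - lo > start)
            {
                immutable n = process(buf[lo .. hi], cast(uint) start);
                if (n != ~0u)
                {
                    assert(n <= hi - lo);
                    auto result = buf[lo .. lo + n]; // a slice: no copy
                    lo += n;                         // just advance the read position
                    return result;
                }
                start = hi - lo;
            }
            // Need more data: slide unread bytes to the front, grow if full.
            foreach (i; 0 .. hi - lo)
                buf[i] = buf[lo + i]; // moving left, so a forward copy is safe
            hi -= lo;
            lo = 0;
            if (hi == buf.length)
                buf.length = buf.length * 2;
            immutable got = src.read(buf[hi .. $]);
            if (got == 0)
            {
                auto rest = buf[lo .. hi]; // EOF: hand back what is left
                lo = hi;
                return rest;
            }
            hi += got;
        }
    }
}
```

Because the result aliases the buffer, a later refill may slide new bytes  
over it, so a caller that wants to keep the data has to copy it.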

>
>>     // same as readUntil, except it appends to the given arr. Any excess
>>     // data will be pushed into the internal buffer.
>>     //
>>     size_t appendUntil(uint delegate(ubyte[] data, uint start) process,
>>                        ref ubyte[] arr);
>
> So indeed the delegate seems to return the length it wants to keep? And  
> the rest would be copied back into the stream's internal buffers? I'm  
> not sure I understand this API.

Yes, this involves copying the data you aren't interested in, but how else  
could you do it?  You can't know ahead of time "hey, this data is not going  
to satisfy the condition, so I'll preemptively read it into the buffer  
instead".
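
A minimal sketch of appendUntil under the same assumptions (hypothetical  
names, interface trimmed to read()).  Passing the delegate only the bytes  
appended by this call is also an assumption.  New data is read straight  
into the caller's array; only the tail the delegate rejects is copied back  
into the internal buffer:

```d
import std.algorithm.comparison : min;

interface InputTransport
{
    size_t read(ubyte[] data); // returns 0 on EOF
}

// Hypothetical in-memory source, for demonstration only.
class ArraySource : InputTransport
{
    private ubyte[] rest;
    this(ubyte[] data) { rest = data; }
    size_t read(ubyte[] dst)
    {
        immutable n = min(dst.length, rest.length);
        dst[0 .. n] = rest[0 .. n];
        rest = rest[n .. $];
        return n;
    }
}

class DBufferInputTransport
{
    private InputTransport src;
    private ubyte[] buf;    // pushed-back bytes live in buf[0 .. pending]
    private size_t pending;
    private size_t chunk;   // how much to request from src per read

    this(InputTransport src, size_t chunk = 4096)
    {
        assert(chunk > 0);
        this.src = src;
        this.chunk = chunk;
    }

    size_t appendUntil(scope uint delegate(ubyte[] data, uint start) process,
                       ref ubyte[] arr)
    {
        immutable base = arr.length;
        arr ~= buf[0 .. pending]; // previously buffered bytes come first
        pending = 0;
        size_t start = 0;
        for (;;)
        {
            auto fresh = arr[base .. $];
            if (fresh.length > start)
            {
                immutable n = process(fresh, cast(uint) start);
                if (n != ~0u)
                {
                    // Copy the unwanted tail back into the internal buffer.
                    auto excess = fresh[n .. $];
                    if (buf.length < excess.length)
                        buf.length = excess.length;
                    buf[0 .. excess.length] = excess[];
                    pending = excess.length;
                    arr.length = base + n;
                    return n;
                }
                start = fresh.length;
            }
            // Read directly into the caller's array, not into buf.
            immutable old = arr.length;
            arr.length = old + chunk;
            immutable got = src.read(arr[old .. $]);
            arr.length = old + got;
            if (got == 0)
                return arr.length - base; // EOF: keep whatever arrived
        }
    }

    size_t read(ubyte[] data)
    {
        if (pending == 0)
            return src.read(data);
        size_t n = min(data.length, pending);
        data[0 .. n] = buf[0 .. n];
        foreach (i; n .. pending)
            buf[i - n] = buf[i]; // shift the remainder left
        pending -= n;
        return n;
    }
}
```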

>
>>     // various buffer functions.
>>     @property size_t bufsize();
>>     @property size_t readable();
>>     // etc.
>
> Can one set bufsize?

Probably in the etc. functions ;)  I left that up in the air, because I  
haven't given full thought to a buffer implementation.

>
>> }
>>
>> The way I see it working is, there are two implementations for
>> BufferedInputTransport: FILEInputTransport and DBufferInputTransport.
>> There are numerous implementations of InputTransport (for example, a
>> network socket, a file, an inter-thread stream, or an array), each of
>> which can be passed to DBufferInputTransport, which uses its own buffer
>> implementation.
>>
>> This way, you can play nice with C's stdio when necessary (i.e. for
>> stdin/stdout/stderr) and avoid the FILE limitations and performance
>> issues otherwise.
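
A minimal sketch of the FILE side, using only core.stdc.stdio.  The name  
FILEInputTransport comes from the outline, but the interface here is  
trimmed to read() plus the Seek members (no readUntil/appendUntil), and  
reporting every FILE as seekable is an assumption.  Because all reads go  
through the FILE, data that C code has already buffered (on stdin, say)  
is consumed in order:

```d
import core.stdc.config : c_long;
import core.stdc.stdio : FILE, fread, fseek, ftell, SEEK_CUR, SEEK_END, SEEK_SET;

enum Anchor { Begin, Current, End }

interface Seek
{
    ulong seek(long delta, Anchor whence);
    final ulong tell() { return seek(0, Anchor.Current); }
    bool seekable();
}

// Trimmed: the outline also has readUntil, appendUntil and buffer queries.
interface BufferedInputTransport : Seek
{
    size_t read(ubyte[] data); // returns 0 on EOF
}

class FILEInputTransport : BufferedInputTransport
{
    private FILE* fp;
    this(FILE* fp) { this.fp = fp; }

    size_t read(ubyte[] data)
    {
        // fread returns 0 at EOF and on error (ferror tells them apart).
        return fread(data.ptr, 1, data.length, fp);
    }

    ulong seek(long delta, Anchor whence)
    {
        immutable int how = whence == Anchor.Begin ? SEEK_SET
                          : whence == Anchor.Current ? SEEK_CUR
                          : SEEK_END;
        if (fseek(fp, cast(c_long) delta, how) != 0)
            throw new Exception("fseek failed");
        return cast(ulong) ftell(fp);
    }

    // Assumption: always report seekable; a pipe or tty will fail in seek().
    bool seekable() { return true; }
}
```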
>
> I'm a bit unclear on the delegate stuff, but it's promising because it  
> could be quite flexible. But I wouldn't want to aggravate the users with  
> an API that's difficult to use. Could you please give a few examples  
> using delegates that implement common patterns - e.g. readline and  
> readDelim?

Sure, readline is probably the easiest.  Note that for this example I'll  
assume \n terminates a line; Windows line endings would be slightly more  
difficult but wouldn't make the example any clearer, so just assume they  
can be handled too:

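char[] readline(BufferedInputTransport trans, bool makeCopy=false)
{
    // Returns how many bytes to consume (up to and including the first
    // newline), or ~0 if no newline has been seen yet.
    uint checkForNL(ubyte[] data, uint start)
    {
        // Scan raw bytes rather than decoding: '\n' (0x0A) never occurs
        // inside a multi-byte UTF-8 sequence, and a character split across
        // two reads would make a dchar foreach throw.
        foreach(i, b; data[start..$])
        {
            if(b == '\n')
                return cast(uint)(start + i + 1); // consume including the newline
        }
        return ~0u;
    }

    ubyte[] result;
    if(makeCopy)
        trans.appendUntil(&checkForNL, result);
    else
        result = trans.readUntil(&checkForNL);

    auto cresult = cast(char[])result;

    // don't include any newline read (the last line may not have one)
    if(cresult.length && cresult[$-1] == '\n')
        cresult = cresult[0..$-1];
    return cresult;
}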

If makeCopy is true, the resulting data is unique and can be used  
anywhere.  Otherwise, the result is a slice of the  
BufferedInputTransport's internal buffer and shouldn't be saved, since  
the buffer may be reused.

One can easily make a range based on this as well.
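
For example, a minimal input range over lines might look like this.  
LineRange, lines and ArrayBufferedTransport are hypothetical, the interface  
is trimmed to readUntil, and the sketch assumes readUntil returns any  
unterminated remainder at EOF and an empty array after that.  Each front  
may be a slice of the transport's buffer, so keep lines with .idup:

```d
// Trimmed to the one method the range needs.
interface BufferedInputTransport
{
    ubyte[] readUntil(scope uint delegate(ubyte[] data, uint start) process);
}

// Hypothetical transport over an in-memory array: everything is already
// "buffered", so the delegate sees all remaining data at once.
class ArrayBufferedTransport : BufferedInputTransport
{
    private ubyte[] rest;
    this(ubyte[] data) { rest = data; }

    ubyte[] readUntil(scope uint delegate(ubyte[] data, uint start) process)
    {
        if (rest.length == 0)
            return null; // EOF
        immutable n = process(rest, 0);
        immutable size_t take = n == ~0u ? rest.length : n;
        auto result = rest[0 .. take];
        rest = rest[take .. $];
        return result;
    }
}

// An input range of lines, without the trailing '\n'.
struct LineRange
{
    private BufferedInputTransport trans;
    private ubyte[] raw; // current line as read, including any '\n'

    this(BufferedInputTransport trans)
    {
        this.trans = trans;
        popFront();
    }

    @property bool empty() const { return raw.length == 0; }

    @property const(char)[] front() const
    {
        auto line = cast(const(char)[]) raw;
        return line.length && line[$ - 1] == '\n' ? line[0 .. $ - 1] : line;
    }

    void popFront()
    {
        raw = trans.readUntil((ubyte[] data, uint start) {
            foreach (i; cast(size_t) start .. data.length)
                if (data[i] == '\n')
                    return cast(uint)(i + 1); // consume including the newline
            return ~0u;
        });
    }
}

LineRange lines(BufferedInputTransport trans)
{
    return LineRange(trans);
}
```

A blank line still yields a non-empty raw read ("\n"), so the range only  
ends when readUntil comes back empty.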

-Steve