streaming redux

Wed Dec 29 07:53:31 PST 2010

On Tue, 28 Dec 2010 02:02:29 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail at erdani.org> wrote:

> I've put together over the past days an embryonic streaming interface.  
> It separates transport from formatting, input from output, and buffered  
> from unbuffered operation.
>
> http://erdani.com/d/phobos/std_stream2.html
>
> There are a number of questions interspersed. It would be great to start  
> a discussion using that design as a baseline. Please voice any related  
> thoughts - thanks!

Without reading any other comments, here is my take on just the streaming  
part (not formatting).

Everything looks good except for two problems:

1. BufferedX should not inherit from UnbufferedX.  The main reason is that
both Buffered *and* Unbuffered can be desirable properties.  For
example, you may want to *require* that you have a raw stream as a  
parameter without a buffer.  The perfect example is a class which wraps an  
Unbuffered stream, and adds a buffer to it (which is what I'd expect as a  
class design).  You don't want to accept a stream that's already buffered,  
or you are double-buffering.  You can deal with this at runtime by
throwing an exception, but I think it's better for this not to compile at
all.

Now, this removes the possibility of having a function which accepts  
either an unbuffered or buffered stream.  I stipulate that this is not a  
valid requirement -- your code will work best with one of them, but not  
both.  If you really need to accept either, you can use templates, but I  
think you will find you always use one or the other even there.

2. I think it's a mistake to put a range interface directly in the  
interface.  A range can be built with the buffered stream as its core if  
need be.  I have long voiced my opinion that I/O should not implement  
ranges, and reference types should never be ranges.  For example, you are  
going to implement byLine based not on the range interface, but based on  
the other parts.  Why must byLine be an external range, but "byBuffer" is  
builtin to the stream?  In particular, I think popFront is an odd function  
for all buffered streams to have to implement.
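
To make the two points above concrete, here is a minimal sketch.  All of the names in it (UnbufferedInput, BufferedInput, Buffer, ByChunk, ArraySource) are hypothetical and are not taken from std_stream2.  It assumes the buffered interface does not derive from the unbuffered one, so a buffering wrapper cannot be handed an already buffered stream, and it builds the range as a separate struct around the buffered stream.

```d
// Hypothetical names throughout; this is not the std_stream2 API.
interface UnbufferedInput
{
    // Returns the number of bytes placed in buf, 0 at end of stream.
    size_t read(ubyte[] buf);
}

// Deliberately NOT derived from UnbufferedInput.
interface BufferedInput
{
    // Returns the next buffered chunk, empty at end of stream.
    const(ubyte)[] nextChunk();
}

// Adds a buffer to a raw stream. The constructor only accepts an
// UnbufferedInput, so double-buffering does not type-check.
class Buffer : BufferedInput
{
    private UnbufferedInput src;
    private ubyte[] storage;

    this(UnbufferedInput src, size_t capacity = 4096)
    {
        this.src = src;
        storage = new ubyte[capacity];
    }

    const(ubyte)[] nextChunk()
    {
        return storage[0 .. src.read(storage)];
    }
}

// A range built around the buffered stream, outside the stream interface.
// front is only valid until the next popFront (the buffer is reused).
struct ByChunk
{
    private BufferedInput input;
    private const(ubyte)[] chunk;

    this(BufferedInput input)
    {
        this.input = input;
        chunk = input.nextChunk(); // make front available on creation
    }

    @property bool empty() const { return chunk.length == 0; }
    @property const(ubyte)[] front() const { return chunk; }
    void popFront() { chunk = input.nextChunk(); }
}

// In-memory raw stream, only here to exercise the sketch.
class ArraySource : UnbufferedInput
{
    private const(ubyte)[] data;
    this(const(ubyte)[] data) { this.data = data; }

    size_t read(ubyte[] buf)
    {
        auto n = buf.length < data.length ? buf.length : data.length;
        buf[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

unittest
{
    auto buffered = new Buffer(new ArraySource(cast(const(ubyte)[]) "hello"), 2);

    // Wrapping an already buffered stream is rejected at compile time.
    static assert(!__traits(compiles, new Buffer(buffered)));

    size_t total;
    foreach (chunk; ByChunk(buffered))
        total += chunk.length;
    assert(total == 5);
}
```

Code that really must accept both kinds of stream can still be written as a template over the stream type.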

To voice my opinions on the questions:

-----
Question: Should we offer an open primitive at this level? If so, what  
parameter(s) should it take?

No, if you need a new stream, create a new instance.  The OS processing  
required to open a file is going to dwarf any performance degradation of  
creating a new class on the heap.
For types that may open quickly (say, an Array input stream), you can  
provide a function to re-open another array that doesn't have to go in the  
base interface.

Also note that opening a network stream requires quite different  
parameters than opening a file.  Putting it at the interface level would  
require some sort of parsed-string parameter, which puts undue  
responsibility on such a basic interface.

-----
Question: Should we offer a primitive rewind that takes the stream back to  
the beginning? That might be supported even by some streams that don't  
support general seek calls. Alternatively, some streams might support  
seek(0, SeekAnchor.start) but not other calls to seek.

Considering that seek is already callable, even if the stream doesn't  
support it (because the interface defines it), I don't think it's  
unreasonable to selectively throw exceptions if the seek isn't possible.   
In other words, I think seek(0) is acceptable as an alternative to rewind().

However, you may also implement:

final void rewind() { seek(0);}

directly in the interface if necessary.
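
As a rough sketch of that idea: D interfaces may contain final functions with a body, so rewind can be layered on seek without adding a new primitive.  The Seekable and RewindOnly names below are hypothetical, and the exception is just one way a stream might refuse a general seek.

```d
// Hypothetical interface; only seek is a primitive.
interface Seekable
{
    // Absolute seek. Implementations may throw if pos is not supported.
    void seek(ulong pos);

    // Final interface functions carry their body and cannot be overridden.
    final void rewind() { seek(0); }
}

// A stream that can go back to the start but cannot seek in general.
class RewindOnly : Seekable
{
    ulong pos = 42;

    void seek(ulong p)
    {
        if (p != 0)
            throw new Exception("only seek(0) is supported");
        pos = 0;
    }
}

unittest
{
    auto impl = new RewindOnly;
    Seekable s = impl;

    s.rewind();
    assert(impl.pos == 0);

    bool threw = false;
    try { s.seek(10); }
    catch (Exception e) { threw = true; }
    assert(threw);
}
```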

-----
Question: May we eliminate seekFromCurrent and seekFromEnd and just have  
seek with absolute positioning? I don't know of streams that allow seek  
without allowing tell. Even if some stream doesn't, it's easy to add  
support for tell in a wrapper. The marginal cost of calling tell is small  
enough compared to the cost of seek.

I don't think the cost of tell is marginal.  Support what the OS supports:
all OSes support seeking from the current position, and reducing the
number of system calls is preferable.

Also, how would you implement seekFromEnd with just tell?
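
A small sketch of the call-count argument, using a made-up FakeOsFile that merely counts calls in place of real system calls (the Anchor enum and all member names are assumptions for illustration).  With only absolute seek plus tell, a relative seek costs two calls, and seeking from the end needs the size, which tell alone does not give you.

```d
enum Anchor { start, current, end }

// Stand-in for an OS file handle; every method call counts as one "syscall".
struct FakeOsFile
{
    long size;
    long pos;
    int calls;

    long seek(long offset, Anchor anchor)
    {
        ++calls;
        final switch (anchor)
        {
            case Anchor.start:   pos = offset;        break;
            case Anchor.current: pos += offset;       break;
            case Anchor.end:     pos = size + offset; break;
        }
        return pos;
    }

    long tell()
    {
        ++calls;
        return pos;
    }
}

unittest
{
    // Native relative seek: one call.
    auto direct = FakeOsFile(100, 10);
    direct.seek(5, Anchor.current);
    assert(direct.pos == 15 && direct.calls == 1);

    // Emulated with tell + absolute seek: two calls.
    auto emulated = FakeOsFile(100, 10);
    emulated.seek(emulated.tell() + 5, Anchor.start);
    assert(emulated.pos == 15 && emulated.calls == 2);

    // Seeking from the end relies on the OS knowing the size.
    auto fromEnd = FakeOsFile(100, 10);
    fromEnd.seek(-4, Anchor.end);
    assert(fromEnd.pos == 96 && fromEnd.calls == 1);
}
```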

-----
Question: Should this throw on an unopened stream? I don't think so,  
because throwing does not offer any additional information that user code  
didn't have, and the idiom if (s.isOpen) s.close() is verbose and  
frequently encountered.

I agree, don't throw on an unopened stream.

-----
Question: Should we allow read to return an empty slice even if atEnd is  
false? If we do, we allow non-blocking streams with burst transfer.  
However, naive client code on non-blocking streams will be inefficient  
because it would essentially implement busy-waiting.

Why not return an integer so different situations could be designated?
That's how the read system call works, so you can tell when no data was
read only because the stream is non-blocking.

I realize it's sexy to return the data again so it can be used  
immediately, but in practice it's more useful to return an integer.

For example, if you want to fill a buffer, you need a loop anyways  
(there's no guarantee that the first read will fill the buffer), and at  
that point, you are just going to use the length member of the return  
value to advance your loop.

I'd say, return -1 if a non-blocking stream returns no data, 0 on EOF,  
positive on data read, and throw an exception on error.
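
Here is a minimal sketch of the fill loop under that convention.  RawInput, fill and the Scripted test stream are hypothetical names; the only thing taken from the text above is the meaning of the return value (-1 for no data yet, 0 for EOF, positive for bytes read).

```d
interface RawInput
{
    // > 0: bytes read; 0: end of stream; -1: non-blocking, no data yet.
    // Errors would be reported by throwing.
    ptrdiff_t read(ubyte[] buf);
}

// Fills buf as far as possible. Returns the number of bytes stored and sets
// wouldBlock when the loop stopped because no data was available yet.
size_t fill(RawInput input, ubyte[] buf, out bool wouldBlock)
{
    size_t filled = 0;
    while (filled < buf.length)
    {
        auto n = input.read(buf[filled .. $]);
        if (n == 0)
            break;                  // EOF
        if (n < 0)
        {
            wouldBlock = true;      // caller should wait, not spin
            break;
        }
        filled += n;                // the integer is all the loop needs
    }
    return filled;
}

// Test stream that hands out a fixed script of read results.
class Scripted : RawInput
{
    private ptrdiff_t[] script;
    this(ptrdiff_t[] script) { this.script = script; }

    ptrdiff_t read(ubyte[] buf)
    {
        if (script.length == 0)
            return 0;
        auto n = script[0];
        script = script[1 .. $];
        if (n > 0)
            buf[0 .. n] = 7;        // pretend n bytes arrived
        return n;
    }
}

unittest
{
    ptrdiff_t[] script = [3, 2, -1, 3];
    auto s = new Scripted(script);
    auto buf = new ubyte[8];
    bool wouldBlock;

    assert(fill(s, buf, wouldBlock) == 5 && wouldBlock);
    assert(fill(s, buf[5 .. $], wouldBlock) == 3 && !wouldBlock);
    assert(fill(s, buf, wouldBlock) == 0 && !wouldBlock); // EOF
}
```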

-----
Question: Should we allow an empty front on a non-empty stream? This goes  
back to handling non-blocking streams.

Well, streams shouldn't have a range interface anyways, but to answer this  
specific question, I'd say no.  front should fill the buffer if it's  
empty.  This follows the nature of all other ranges, where front is  
available on creation.

-----
Question: Should we eliminate this function? Theoretically calling  
advance(n) is equivalent with seekFromCurrent(n). However, in practice a  
file-based stream will have to implement advance even though the  
underlying file is not seekable.

I think it's good to have this function.  At first, I didn't, but now I  
realize it's good because advance(n) may be low-performance (it may use  
read to advance the stream).  If you eliminate this function, but put its  
functionality into seekFromCurrent, this makes seekFromCurrent low  
performance.

I think you should change the requirements, however, and follow the same  
return type as I specified above for read (-1 for wouldblock, 0 for EOF,  
positive for number of bytes 'advanced').  Otherwise, you have issues with  
non-blocking streams.
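
A sketch of what a read-based advance could look like with that return type.  The names (RawInput, advance, Remaining) are hypothetical, and the choice to report a partial count when the stream blocks or ends midway is my assumption rather than anything in the proposal.

```d
interface RawInput
{
    // > 0: bytes read; 0: end of stream; -1: non-blocking, no data yet.
    ptrdiff_t read(ubyte[] buf);
}

// Skips up to n bytes by reading into a scratch buffer and discarding.
// Returns the number of bytes skipped, 0 on EOF, -1 if nothing could be
// skipped because the stream would block. Note advance(input, 0) also
// returns 0.
ptrdiff_t advance(RawInput input, size_t n)
{
    ubyte[256] scratch;
    size_t skipped = 0;
    while (skipped < n)
    {
        auto left = n - skipped;
        auto want = left < scratch.length ? left : scratch.length;
        auto got = input.read(scratch[0 .. want]);
        if (got <= 0)
            return skipped > 0 ? cast(ptrdiff_t) skipped : got;
        skipped += got;
    }
    return cast(ptrdiff_t) skipped;
}

// Non-seekable test stream holding a fixed number of remaining bytes.
class Remaining : RawInput
{
    size_t left;
    this(size_t left) { this.left = left; }

    ptrdiff_t read(ubyte[] buf)
    {
        auto n = buf.length < left ? buf.length : left;
        buf[0 .. n] = 0;
        left -= n;
        return n;
    }
}

unittest
{
    auto s = new Remaining(300);
    assert(advance(s, 280) == 280); // takes two reads of the scratch buffer
    assert(s.left == 20);
    assert(advance(s, 50) == 20);   // stream ended midway
    assert(advance(s, 50) == 0);    // EOF
}
```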

====================

OK, so now that I've voiced my opinions on what's there, I'll push the
interface I specified some time ago (incidentally, I am building an I/O
library based on it).  From my current skeleton:

     /**
      * Read data until a condition is satisfied.
      *
      * Buffers data from the input stream until the delegate returns other
      * than ~0.  The delegate is passed the data read so far, and the start
      * of the data just read.  The delegate should return ~0 if the
      * condition is not satisfied, or the number of bytes that should be
      * returned otherwise.
      *
      * Any data that satisfies the condition will be considered consumed
      * from the stream.
      *
      * params: process = A delegate to determine satisfaction of a
      * condition per the terms above.
      *
      * returns: the data identified by the delegate that satisfies the
      * condition.  Note that this data may be owned by the buffer and so
      * shouldn't be written to or stored for later use without duping.
      */
     ubyte[] readUntil(uint delegate(ubyte[] data, uint start) process);

The advantage of such an interface is that it creates a very efficient way  
to specify how to buffer the data based on the data (i.e. byLine comes to  
mind).
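
To illustrate, here is a sketch of a line reader on top of readUntil.  Only the readUntil signature comes from my skeleton; FakeBufferedInput is a made-up in-memory stand-in that "reads" a few bytes at a time, and returning the leftover data at EOF is an assumption for the sake of the example.

```d
enum uint notYet = ~0u; // "condition not satisfied"

// In-memory stand-in for a buffered stream; data arrives in small pieces.
struct FakeBufferedInput
{
    ubyte[] source;       // data not yet pulled from the "transport"
    size_t pieceSize = 4; // bytes delivered per simulated read
    ubyte[] buffer;       // data buffered but not yet consumed

    ubyte[] readUntil(uint delegate(ubyte[] data, uint start) process)
    {
        uint start = 0;   // leftover data has not been seen by this delegate
        for (;;)
        {
            auto used = process(buffer, start);
            if (used == notYet && source.length == 0)
                used = cast(uint) buffer.length; // EOF: return what is left
            if (used != notYet)
            {
                auto result = buffer[0 .. used];
                buffer = buffer[used .. $];      // consumed from the stream
                return result;
            }
            // Pull the next piece and remember where the new data begins.
            start = cast(uint) buffer.length;
            auto n = source.length < pieceSize ? source.length : pieceSize;
            buffer ~= source[0 .. n];
            source = source[n .. $];
        }
    }
}

unittest
{
    // Only scans the newly read part, so no byte is examined twice.
    auto line = delegate uint(ubyte[] data, uint start)
    {
        foreach (i; start .. data.length)
            if (data[i] == '\n')
                return cast(uint) (i + 1);
        return notYet;
    };

    auto input = FakeBufferedInput(cast(ubyte[]) "one\ntwo\nend".dup);
    assert(cast(string) input.readUntil(line) == "one\n");
    assert(cast(string) input.readUntil(line) == "two\n");
    assert(cast(string) input.readUntil(line) == "end");
    assert(input.readUntil(line).length == 0);
}
```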

Here is a second function that does the same as above but appends it  
directly into a user-supplied buffer:

     size_t appendUntil(uint delegate(ubyte[] data, uint start) process,
                        ref ubyte[] arr);

-Steve