streaming redux
Steven Schveighoffer
schveiguy at yahoo.com
Wed Dec 29 07:53:31 PST 2010
On Tue, 28 Dec 2010 02:02:29 -0500, Andrei Alexandrescu
<SeeWebsiteForEmail at erdani.org> wrote:
> I've put together over the past days an embryonic streaming interface.
> It separates transport from formatting, input from output, and buffered
> from unbuffered operation.
>
> http://erdani.com/d/phobos/std_stream2.html
>
> There are a number of questions interspersed. It would be great to start
> a discussion using that design as a baseline. Please voice any related
> thoughts - thanks!
Without reading any other comments, here is my take on just the streaming
part (not formatting).
Everything looks good except for two problems:
1. BufferedX should not inherit UnbufferedX. The main reason is that
both Buffered *and* Unbuffered can be desirable properties. For
example, you may want to *require* that you have a raw stream as a
parameter without a buffer. The perfect example is a class which wraps an
Unbuffered stream and adds a buffer to it (which is what I'd expect as a
class design). You don't want to accept a stream that's already buffered,
or you are double-buffering. You can deal with this at runtime by
throwing an exception, but I think it's better if this doesn't even
compile.
Now, this removes the possibility of having a function which accepts
either an unbuffered or buffered stream. I stipulate that this is not a
valid requirement -- your code will work best with one of them, but not
both. If you really need to accept either, you can use templates, but I
think you will find you always use one or the other even there.
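To make point 1 concrete, here is a rough sketch of two unrelated
interfaces. All names (UnbufferedInput, BufferedInput, Buffer, ArraySource)
are made up for illustration and are not the names from the std_stream2
page:

```d
// Hypothetical names throughout; this is not the std_stream2 API.
interface UnbufferedInput
{
    // Reads up to buf.length bytes and returns how many were read.
    size_t read(ubyte[] buf);
}

// Deliberately NOT derived from UnbufferedInput.
interface BufferedInput
{
    // The bytes currently held in the buffer.
    const(ubyte)[] window();
    // Refills the buffer; returns false when the source is exhausted.
    bool fill();
}

// Adds a buffer to a raw stream; only a raw stream is accepted.
final class Buffer : BufferedInput
{
    private UnbufferedInput source;
    private ubyte[] store;
    private size_t used;

    this(UnbufferedInput source, size_t capacity = 4096)
    {
        this.source = source;
        store = new ubyte[capacity];
    }

    const(ubyte)[] window() { return store[0 .. used]; }

    bool fill()
    {
        used = source.read(store);
        return used > 0;
    }
}

// A raw in-memory source, used only to exercise the sketch.
final class ArraySource : UnbufferedInput
{
    private const(ubyte)[] data;
    this(const(ubyte)[] data) { this.data = data; }

    size_t read(ubyte[] buf)
    {
        immutable n = buf.length < data.length ? buf.length : data.length;
        buf[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

unittest
{
    immutable ubyte[] bytes = [1, 2, 3];
    auto buffered = new Buffer(new ArraySource(bytes), 2);
    assert(buffered.fill() && buffered.window().length == 2);
    // Wrapping an already buffered stream is rejected by the compiler.
    static assert(!__traits(compiles, new Buffer(buffered)));
}
```

Because Buffer is not an UnbufferedInput, the double-buffering mistake is
caught at compile time instead of by a runtime exception.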
2. I think it's a mistake to put a range interface directly in the
interface. A range can be built with the buffered stream as its core if
need be. I have long voiced my opinion that I/O should not implement
ranges, and reference types should never be ranges. For example, you are
going to implement byLine based not on the range interface, but based on
the other parts. Why must byLine be an external range, but "byBuffer" is
builtin to the stream? In particular, I think popFront is an odd function
for all buffered streams to have to implement.
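As an illustration of point 2, a "byBuffer" range can live outside the
stream, in the same way byLine would. This is only a sketch; BufferedInput,
ByBuffer and ChunkedArray are hypothetical names, and the stream interface
is reduced to the two calls the range needs:

```d
// Hypothetical minimal buffered-stream interface (no range primitives).
interface BufferedInput
{
    const(ubyte)[] window();   // bytes currently buffered
    bool fill();               // load the next chunk; false at end of stream
}

// External range over successive buffer contents, built on the stream.
struct ByBuffer
{
    private BufferedInput stream;
    private bool done;

    this(BufferedInput stream)
    {
        this.stream = stream;
        done = !stream.fill();  // prime so that front is valid immediately
    }

    @property bool empty() const { return done; }
    @property const(ubyte)[] front() { return stream.window(); }
    void popFront() { done = !stream.fill(); }
}

// In-memory stream that hands out fixed-size chunks, for demonstration.
final class ChunkedArray : BufferedInput
{
    private const(ubyte)[] rest, current;
    private size_t chunk;

    this(const(ubyte)[] data, size_t chunk)
    {
        rest = data;
        this.chunk = chunk;
    }

    const(ubyte)[] window() { return current; }

    bool fill()
    {
        immutable n = chunk < rest.length ? chunk : rest.length;
        current = rest[0 .. n];
        rest = rest[n .. $];
        return n > 0;
    }
}

unittest
{
    import std.range.primitives : isInputRange;
    static assert(isInputRange!ByBuffer);

    immutable ubyte[] data = [1, 2, 3, 4, 5];
    size_t chunks, total;
    foreach (buf; ByBuffer(new ChunkedArray(data, 2)))
    {
        ++chunks;
        total += buf.length;
    }
    assert(chunks == 3 && total == 5);
}
```

The stream itself never needs popFront; only code that wants range
semantics pays for them.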
To voice my opinions on the questions:
-----
Question: Should we offer an open primitive at this level? If so, what
parameter(s) should it take?
No, if you need a new stream, create a new instance. The OS processing
required to open a file is going to dwarf any performance degradation of
creating a new class on the heap.
For types that may open quickly (say, an Array input stream), you can
provide a function to re-open another array that doesn't have to go in the
base interface.
Also note that opening a network stream requires quite different
parameters than opening a file. Putting it at the interface level would
require some sort of parsed-string parameter, which puts undue
responsibility on such a basic interface.
-----
Question: Should we offer a primitive rewind that takes the stream back to
the beginning? That might be supported even by some streams that don't
support general seek calls. Alternatively, some streams might support
seek(0, SeekAnchor.start) but not other calls to seek.
Considering that seek is already callable, even if the stream doesn't
support it (because the interface defines it), I don't think it's
unreasonable to selectively throw exceptions if the seek isn't possible.
In other words, I think seek(0) is acceptable as an alternative to rewind().
However, you may also implement:
final void rewind() { seek(0); }
directly in the interface if necessary.
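A short sketch of how that could look, assuming a hypothetical Seekable
interface (the name and the RestartOnly class are mine, not from the
proposal). D lets an interface carry final functions with a body, so
implementors only supply seek:

```d
// Hypothetical seekable interface; names are illustrative.
interface Seekable
{
    // Moves to an absolute position; throws if the stream cannot do so.
    void seek(ulong position);
    ulong tell();

    // Final functions in an interface carry a body and cannot be overridden.
    final void rewind() { seek(0); }
}

// A stream that can only go back to the start.
final class RestartOnly : Seekable
{
    private ulong position = 42;

    void seek(ulong target)
    {
        if (target != 0)
            throw new Exception("only seek(0) is supported");
        position = 0;
    }

    ulong tell() { return position; }
}

unittest
{
    import std.exception : assertThrown;

    Seekable s = new RestartOnly;
    s.rewind();
    assert(s.tell() == 0);
    assertThrown(s.seek(10));
}
```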
-----
Question: May we eliminate seekFromCurrent and seekFromEnd and just have
seek with absolute positioning? I don't know of streams that allow seek
without allowing tell. Even if some stream doesn't, it's easy to add
support for tell in a wrapper. The marginal cost of calling tell is small
enough compared to the cost of seek.
I don't think the cost of tell is marginal. Support what the OS supports:
all OSes support seeking from the current position, and reducing the
number of system calls is preferable.
Also, how would you implement seekFromEnd with just tell?
-----
Question: Should this throw on an unopened stream? I don't think so,
because throwing does not offer any additional information that user code
didn't have, and the idiom if (s.isOpen) s.close() is verbose and
frequently encountered.
I agree, don't throw on an unopened stream.
-----
Question: Should we allow read to return an empty slice even if atEnd is
false? If we do, we allow non-blocking streams with burst transfer.
However, naive client code on non-blocking streams will be inefficient
because it would essentially implement busy-waiting.
Why not return an integer so different situations could be designated?
It's how the system call read works, so you can tell that no data was read
only because the stream is non-blocking.
I realize it's sexy to return the data again so it can be used
immediately, but in practice it's more useful to return an integer.
For example, if you want to fill a buffer, you need a loop anyways
(there's no guarantee that the first read will fill the buffer), and at
that point, you are just going to use the length member of the return
value to advance your loop.
I'd say, return -1 if a non-blocking stream returns no data, 0 on EOF,
positive on data read, and throw an exception on error.
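Here is a sketch of the buffer-filling loop under that convention. RawInput,
fillBuffer and Trickle are hypothetical names, and ptrdiff_t is just my
pick for a signed return type:

```d
// Hypothetical interface following the return convention proposed above.
interface RawInput
{
    // > 0: bytes read, 0: end of stream, -1: no data available right now.
    // Errors are reported by throwing.
    ptrdiff_t read(ubyte[] buf);
}

// Fills buf as far as possible; stops at EOF or when the stream would block.
// Returns the number of bytes stored.
size_t fillBuffer(RawInput input, ubyte[] buf)
{
    size_t filled;
    while (filled < buf.length)
    {
        immutable n = input.read(buf[filled .. $]);
        if (n <= 0)
            break;              // 0 = EOF, -1 = would block
        filled += n;
    }
    return filled;
}

// Delivers at most two bytes per call to mimic short reads.
final class Trickle : RawInput
{
    private const(ubyte)[] data;
    this(const(ubyte)[] data) { this.data = data; }

    ptrdiff_t read(ubyte[] buf)
    {
        size_t n = data.length < 2 ? data.length : 2;
        if (buf.length < n)
            n = buf.length;
        buf[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

unittest
{
    immutable ubyte[] src = [1, 2, 3, 4, 5];
    auto buf = new ubyte[8];
    immutable got = fillBuffer(new Trickle(src), buf);
    assert(got == 5 && buf[0 .. got] == src);
}
```

The loop only ever looks at the count, which is why returning the slice
buys little here.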
-----
Question: Should we allow an empty front on a non-empty stream? This goes
back to handling non-blocking streams.
Well, streams shouldn't have a range interface anyways, but to answer this
specific question, I'd say no. front should fill the buffer if it's
empty. This follows the nature of all other ranges, where front is
available on creation.
-----
Question: Should we eliminate this function? Theoretically calling
advance(n) is equivalent with seekFromCurrent(n). However, in practice a
file-based stream will have to implement advance even though the
underlying file is not seekable.
I think it's good to have this function. At first, I didn't, but now I
realize it's good because advance(n) may be low-performance (it may use
read to advance the stream). If you eliminate this function, but put its
functionality into seekFromCurrent, this makes seekFromCurrent low
performance.
I think you should change the requirements, however, and follow the same
return type as I specified above for read (-1 for wouldblock, 0 for EOF,
positive for number of bytes 'advanced'). Otherwise, you have issues with
non-blocking streams.
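For a non-seekable stream, advance could be sketched as read-and-discard
with the same return convention. The names here (RawInput, advance as a
free function, ArraySource) are hypothetical, and one wrinkle is left open:
advance(0) also returns 0, which looks like EOF:

```d
// Hypothetical interface using the -1 / 0 / positive convention.
interface RawInput
{
    ptrdiff_t read(ubyte[] buf);
}

// Skips up to n bytes by reading and discarding them.
// Returns bytes skipped, 0 at EOF, or -1 if nothing could be skipped
// because the stream would block.
ptrdiff_t advance(RawInput input, size_t n)
{
    ubyte[512] scratch;
    size_t skipped;
    while (skipped < n)
    {
        immutable left = n - skipped;
        immutable want = left < scratch.length ? left : scratch.length;
        immutable got = input.read(scratch[0 .. want]);
        if (got < 0)
            return skipped > 0 ? cast(ptrdiff_t) skipped : -1;
        if (got == 0)
            break;              // EOF
        skipped += got;
    }
    return cast(ptrdiff_t) skipped;
}

// In-memory source used only to exercise the sketch.
final class ArraySource : RawInput
{
    private const(ubyte)[] data;
    this(const(ubyte)[] data) { this.data = data; }

    ptrdiff_t read(ubyte[] buf)
    {
        immutable n = buf.length < data.length ? buf.length : data.length;
        buf[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

unittest
{
    immutable ubyte[] src = [10, 20, 30, 40];
    auto s = new ArraySource(src);
    assert(advance(s, 3) == 3);
    ubyte[1] one;
    assert(s.read(one[]) == 1 && one[0] == 40);
    assert(advance(s, 5) == 0);   // already at EOF
}
```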
====================
OK, now that I've voiced my opinions on what's there, I'll push the
interface I specified some time ago (incidentally, I am building an I/O
library based on it). From my current skeleton:
/**
 * Read data until a condition is satisfied.
 *
 * Buffers data from the input stream until the delegate returns other than
 * ~0. The delegate is passed the data read so far, and the start of the
 * data just read. The delegate should return ~0 if the condition is not
 * satisfied, or the number of bytes that should be returned otherwise.
 *
 * Any data that satisfies the condition will be considered consumed from
 * the stream.
 *
 * params: process = A delegate to determine satisfaction of a condition
 * per the terms above.
 *
 * returns: the data identified by the delegate that satisfies the
 * condition. Note that this data may be owned by the buffer and so
 * shouldn't be written to or stored for later use without duping.
 */
ubyte[] readUntil(uint delegate(ubyte[] data, uint start) process);
The advantage of such an interface is that it creates a very efficient way
to specify how to buffer the data based on the data (i.e. byLine comes to
mind).
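To show how a byLine-style condition plugs in, here is a toy sketch.
lineEnd is the delegate body; ChunkedStream is a hypothetical in-memory
stand-in for a buffered stream (it is not my library's implementation) that
pretends data arrives in fixed-size chunks, and it assumes chunk > 0:

```d
import std.functional : toDelegate;

// Condition for readUntil: consume up to and including the first '\n'.
// Only the bytes from start onward are scanned; earlier ones were checked.
uint lineEnd(ubyte[] data, uint start)
{
    foreach (i; start .. data.length)
        if (data[i] == '\n')
            return cast(uint)(i + 1);
    return ~0u;                 // not satisfied yet, keep buffering
}

// Toy stream: "source" plays the device, "buffer" holds unconsumed data.
final class ChunkedStream
{
    private ubyte[] source;
    private ubyte[] buffer;
    private size_t chunk;

    this(ubyte[] source, size_t chunk)
    {
        this.source = source;
        this.chunk = chunk;
    }

    ubyte[] readUntil(uint delegate(ubyte[] data, uint start) process)
    {
        uint start = 0;
        for (;;)
        {
            if (buffer.length > start)
            {
                immutable n = process(buffer, start);
                if (n != ~0u)
                    return consume(n);
                start = cast(uint) buffer.length;
            }
            if (source.length == 0)
                return consume(buffer.length);  // EOF: return the remainder
            immutable take = chunk < source.length ? chunk : source.length;
            buffer ~= source[0 .. take];
            source = source[take .. $];
        }
    }

    private ubyte[] consume(size_t n)
    {
        auto result = buffer[0 .. n];
        buffer = buffer[n .. $];
        return result;
    }
}

unittest
{
    auto s = new ChunkedStream(cast(ubyte[]) "ab\ncd\ne".dup, 4);
    auto byLine = toDelegate(&lineEnd);
    assert(cast(char[]) s.readUntil(byLine) == "ab\n");
    assert(cast(char[]) s.readUntil(byLine) == "cd\n");
    assert(cast(char[]) s.readUntil(byLine) == "e");
    assert(s.readUntil(byLine).length == 0);
}
```

The start parameter is what keeps this cheap: the delegate never rescans
bytes it has already rejected.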
Here is a second function that does the same as above but appends it
directly into a user-supplied buffer:
size_t appendUntil(uint delegate(ubyte[] data, uint start) process,
ref ubyte[] arr);
-Steve