stream interfaces - with ranges

Thu May 17 21:19:45 PDT 2012

I think the range interface is not useful for *efficient* IO. The IO
interface we need will be more *abstract* than the range primitives.

---
If you use the range interface to read bytes from a device, you will
always do blocking IO, even if the device is a socket. That is not efficient.

auto sock = new TcpSocketDevice();
if (!sock.empty) { auto e = sock.front; }
  // In the empty primitive, we *must* wait until the socket receives one
  // or more bytes or is really disconnected.
  // Otherwise, what exactly would sock.front return?
  // So using the range interface for socket reading enforces blocking
  // IO. It is *really* inefficient.
---
For the reasons mentioned above, I think IO primitives must be distinct
from range primitives.

I'm designing experimental IO primitives:
https://github.com/9rnsr/dio

I call an input stream a "source" and an output stream a "sink".
A "source" has a 'pull' primitive and a "sink" has a 'push' primitive,
and both can avoid blocking.
If you want an input range interface on top of a "source", you
should use the 'ranged' helper function in the io.core module. 'ranged'
returns a wrapper object whose front method reads bytes from the
"source" and, if not enough bytes have been read, blocks until they arrive.
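To make the source/ranged split concrete, here is a minimal D sketch. The `Source` interface, its `pull` signature (returning the number of bytes copied, 0 when nothing is available yet, -1 at end of stream), `MockSource`, and `Ranged` are all hypothetical names chosen for illustration; they are not dio's actual API. The point it shows is that only the range wrapper ever waits, while the source always returns immediately.

```d
import std.algorithm.comparison : min;
import std.array : array;

/// Hypothetical non-blocking source: pull() copies whatever is available
/// right now into buf and returns the count (0 = nothing yet, -1 = EOF).
interface Source
{
    ptrdiff_t pull(ubyte[] buf);
}

/// Hypothetical mock that mimics a socket: it delivers at most 3 bytes
/// per call and reports "nothing available yet" on every other call.
class MockSource : Source
{
    private ubyte[] data;
    private size_t calls;

    this(ubyte[] data) { this.data = data; }

    ptrdiff_t pull(ubyte[] buf)
    {
        if (data.length == 0)
            return -1;                 // end of stream
        if (++calls % 2 == 0)
            return 0;                  // no data yet, but do not block
        auto n = min(buf.length, data.length, size_t(3));
        buf[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return cast(ptrdiff_t) n;
    }
}

/// Hypothetical range wrapper in the spirit of 'ranged': the only place
/// that waits for data, because empty/front need a definite answer.
struct Ranged
{
    private Source src;
    private ubyte[] buf;    // scratch buffer passed to pull()
    private ubyte[] avail;  // bytes pulled but not yet consumed
    private bool eof;

    this(Source src, size_t size = 16)
    {
        this.src = src;
        buf = new ubyte[size];
        fill();
    }

    // Keep pulling until at least one byte or EOF. A real implementation
    // would wait on the device here instead of busy-looping.
    private void fill()
    {
        while (avail.length == 0 && !eof)
        {
            auto n = src.pull(buf);
            if (n < 0)
                eof = true;
            else
                avail = buf[0 .. cast(size_t) n];
        }
    }

    @property bool empty() { return avail.length == 0; }
    @property ubyte front() { return avail[0]; }
    void popFront() { avail = avail[1 .. $]; fill(); }
}
```

With this shape, `Ranged(new MockSource(cast(ubyte[]) "hello world".dup)).array` yields the original bytes even though the source keeps returning short reads or 0, so blocking stays out of the source primitive itself.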

In other words, ranges are not almighty. We should consider distinct
primitives for IO.

Kenji Hara

2012/5/17 Steven Schveighoffer <schveiguy at yahoo.com>:
> OK, so I had a couple of partially written replies on the 'deprecating
> std.stream etc' thread, then I had to go home.
>
> But I thought about this a lot last night, and some of the things Andrei
> and others are saying are starting to make sense (I know!).  Now I've
> scrapped those replies and am thinking about redesigning my i/o package
> (most of the code can stay intact).
>
> I'm a little undecided on some of the details, but here is what I think
> makes sense:
>
> 1. We need a buffering input stream type.  This must have additional
> methods besides the range primitives, because doing one-at-a-time byte
> reads is not going to cut it.
> 2. I realized that a buffering input stream of type T is actually an input
> range of type T[].  Observe:
>
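> struct /*or class*/ buffer(T)
> {
>     T[] storage; // fixed-size backing store
>     T[] buf;     // the valid part of storage (current chunk)
>     InputStream input;
>     ...
>     @property T[] front() { return buf; }
>     // flush existing buffer, read next; assumes read returns the count read
>     void popFront() { buf = storage[0 .. input.read(storage)]; }
>     @property bool empty() { return buf.length == 0; }
> }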
>
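A runnable D version of this buffer idea, under stated assumptions: the unbuffered input exposes `size_t read(T[] dst)` returning the number of elements read (0 at end of stream), and `ArrayInput` and `Buffer` are hypothetical names. Because its element type is `T[]`, it can be fed to `std.algorithm.copy` with an `Appender` as the target.

```d
import std.algorithm.comparison : min;
import std.algorithm.mutation : copy;
import std.array : appender;

/// Hypothetical unbuffered input over an in-memory array: read() fills as
/// much of dst as it can and returns the count, 0 meaning end of stream.
struct ArrayInput(T)
{
    const(T)[] data;

    size_t read(T[] dst)
    {
        auto n = min(dst.length, data.length);
        dst[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

/// A buffering input stream of T, exposed as an input range of T[].
struct Buffer(T, Input)
{
    private T[] storage; // fixed-size backing store, reused for every chunk
    private T[] buf;     // the valid part of storage (current chunk)
    private Input input;

    this(Input input, size_t size)
    {
        this.input = input;
        storage = new T[size];
        popFront(); // read the first chunk
    }

    @property T[] front() { return buf; }
    // Discard the current chunk and read the next one into the same storage.
    void popFront() { buf = storage[0 .. input.read(storage)]; }
    @property bool empty() { return buf.length == 0; }
}
```

Note that `front` returns a slice of the reused `storage`, so a consumer must copy a chunk before calling `popFront`; appending each chunk to an `Appender`, as `copy` does here, satisfies that.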
> This is a rough sketch with not all the details handled, but it makes a
> feasible input range that will perform quite nicely for things like
> std.algorithm.copy.  I haven't checked, but copy should be able to handle
> transferring a range of type T[] to an output range with element type T;
> if it can't, it should be made to work.  I know at least that an
> output stream with element type T supports putting T or T[].  What I think
> really makes sense is to support:
>
> buffer!ubyte b;
> OutputStream o;
>
> o.put(b); // uses range primitives to put all the data to o, one element
>           // (i.e. ubyte[]) of b at a time
>
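One way the `o.put(b)` overload could look, as a minimal D sketch: `OutputStream` here is a hypothetical in-memory stand-in, and the template overload accepts any input range whose elements convert to `const(ubyte)[]`, writing one chunk at a time.

```d
import std.range.primitives : ElementType, empty, front, isInputRange, popFront;

/// Hypothetical output stream that just collects bytes in memory.
struct OutputStream
{
    ubyte[] written;

    void put(ubyte x) { written ~= x; }                  // one element
    void put(const(ubyte)[] chunk) { written ~= chunk; } // one slice

    /// A range of chunks (e.g. a buffering input range of ubyte[]):
    /// write each chunk in turn using the slice overload above.
    void put(R)(R chunks)
        if (isInputRange!R && is(ElementType!R : const(ubyte)[]))
    {
        for (; !chunks.empty; chunks.popFront())
            put(chunks.front);
    }
}
```

The constraint keeps the overloads unambiguous: a plain `ubyte[]` fails it (its elements are `ubyte`, not slices) and goes to the slice overload, while a range of chunks is consumed chunk by chunk.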
>
> 3. An ultimate goal of the i/o streaming package should be to be able to
> do this:
>
> auto x = new XmlParser("<rootElement></rootElement>");
>
> or at least
>
> auto x = new XmlParser(buffered("<rootElement></rootElement>"));
>
> So I think arrays need to be able to be treated as buffering streams.  I
> tried really hard to think of some way to make this work with my existing
> system, but I don't think it will work without adding unnecessary baggage
> and losing interoperability with existing range functions.
>
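A minimal D sketch of treating an array as a buffering stream, assuming the buffering-range shape above (an input range of `T[]` chunks). `ArrayBuffer`, `buffered`, and `countTags` (a stand-in for a real `XmlParser`) are hypothetical: an in-memory array is simply a stream whose entire contents are already buffered, so it yields exactly one chunk without copying.

```d
import std.range.primitives : ElementType, isInputRange;

/// Hypothetical: an array viewed as a buffering input range of T[].
/// Everything is already "in the buffer", so there is a single chunk.
struct ArrayBuffer(T)
{
    private T[] buf;
    @property T[] front() { return buf; }
    void popFront() { buf = null; }
    @property bool empty() { return buf.length == 0; }
}

ArrayBuffer!T buffered(T)(T[] arr) { return ArrayBuffer!T(arr); }

/// Hypothetical stand-in for an XML parser: it only needs a range of
/// character chunks, so arrays and real streams can share one code path.
size_t countTags(R)(R chunks)
    if (isInputRange!R && is(ElementType!R : const(char)[]))
{
    size_t n;
    foreach (chunk; chunks)
        foreach (c; chunk)
            if (c == '<')
                ++n;
    return n;
}
```

Here `countTags(buffered("<rootElement></rootElement>"))` and `countTags(["<root", "Element></rootElement>"])` go through the same code, whether the text arrives as one chunk or several.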
> Where does this leave us?
>
> 1. I think we need, as Andrei says, an unbuffered streaming abstraction.
> I think I have this down pretty solidly in my current std.io.
> 2. A definition of a buffering range, in terms of what additional
> primitives the range should have.  The primitives should support buffered
> input and buffered output (these are two separate algorithms), each
> independently (possibly allowing switching between them for read/write files).
> 3. An implementation of the above definition hooked to the unbuffered
> stream abstraction, to be utilized in more specific ranges.  On its own,
> it can also be used as an input range or directly by code.
> 4. Specialization ranges for each type of input you want (e.g. byLine,
> byChunk, textStream).
> 5. An option to fully replace the File backend.  File will start out with
> C-supported calls, but any "promotion" to using a more D-like range type
> will result in switching to a D-based stream using the above mechanisms.
> Of course, all existing code that does not assume File always has a
> valid FILE * should still compile.
>
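For item 4, a minimal D sketch of how a byLine-style specialization could sit on top of any buffering range of character chunks. `LinesOf` is a hypothetical name, not Phobos's `File.byLine`; the subtleties it shows are that a line can span chunk boundaries, and that a chunk from a reused buffer must be copied before the underlying range is advanced.

```d
import std.range.primitives : ElementType, empty, front, isInputRange, popFront;
import std.string : indexOf;

/// Hypothetical byLine over an input range of char chunks. Lines may span
/// chunks; the trailing '\n' is dropped from each line.
struct LinesOf(R)
    if (isInputRange!R && is(ElementType!R : const(char)[]))
{
    private R chunks;
    private const(char)[] rest; // unconsumed part of the current chunk
    private bool haveChunk;     // rest points into chunks.front
    private char[] line;        // current line, always a fresh copy
    private bool done;

    this(R chunks)
    {
        this.chunks = chunks;
        popFront(); // load the first line
    }

    @property bool empty() { return done; }
    @property const(char)[] front() { return line; }

    void popFront()
    {
        line = null;
        while (true)
        {
            immutable i = rest.indexOf('\n');
            if (i >= 0)
            {
                line ~= rest[0 .. cast(size_t) i];
                rest = rest[cast(size_t) i + 1 .. $];
                return;
            }
            line ~= rest; // copy the partial line before the chunk can change
            rest = null;
            if (haveChunk)
                chunks.popFront(); // only now is the old chunk released
            haveChunk = false;
            if (chunks.empty)
            {
                // A final line without '\n' is still returned; otherwise stop.
                done = line.length == 0;
                return;
            }
            rest = chunks.front;
            haveChunk = true;
        }
    }
}
```

The chunk is only popped after its remaining text has been appended to `line`, which is what makes this safe over a range that reuses one buffer for every chunk.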
> What do you all think?  I'm going to work out what the definition in item 2
> should be, based on what I've written and what makes sense.
>
> Have I started to design something feasible or unworkable? :)
>
> -Steve