stream interfaces - with ranges

Fri May 18 04:05:50 PDT 2012

On 05/18/12 06:19, kenji hara wrote:
> I think range interface is not useful for *efficient* IO. The expected
> IO interface will be more *abstract* than range primitives.
> 
> ---
> If we use the range I/F to read bytes from a device, we will always do
> blocking IO - even if the device is a socket. It is not efficient.
> 
> auto sock = new TcpSocketDevice();
> if (!sock.empty) { auto e = sock.front; }
>   // In the empty primitive, we *must* wait until the socket gets one or
> more bytes or is really disconnected.

No. 'empty' has to return true only _after_ seeing EOF.

Something like 'available' can return the number of elements known
to be fetchable w/o blocking. [1]

>   // If not, what exactly returns sock.front?

EWOULDBLOCK :^)

But, yes, it needs to block, as there's no generic way to return
EAGAIN/EWOULDBLOCK. This is where the primitive returning a slice
comes in - that one /can/ return an empty slice.
So '(!r.empty && r.fronts.length==0)' is the equivalent of EAGAIN.
(and note i'm oversimplifying -- 'fronts' can return something that
/acts/ as a slice; which is in fact what i'm doing)
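
To make that concrete, here is a minimal sketch of the idea, not my
actual code. The names 'popFronts', 'ChunkedSource' and 'drain' are made
up for illustration, and the "device" is simulated by a list of canned
read results (an empty entry stands in for EAGAIN/EWOULDBLOCK). A
blocking 'front' is left out on purpose.

```d
// Sketch only: 'fronts', 'popFronts' and 'available' are assumed names,
// and the "device" is simulated by a list of pre-cooked read results.
struct ChunkedSource
{
    ubyte[][] pending; // each entry: result of one non-blocking read;
                       // an empty entry models EAGAIN/EWOULDBLOCK
    ubyte[] buf;       // received but not yet consumed
    bool eof;          // set once the simulated device is exhausted

    // One non-blocking read attempt; never waits.
    private void poll()
    {
        if (buf.length != 0 || eof) return;
        if (pending.length == 0) { eof = true; return; }
        buf = pending[0];
        pending = pending[1 .. $];
    }

    // True only after EOF has been seen; never blocks.
    @property bool empty() const { return eof && buf.length == 0; }

    // Number of elements known to be fetchable without blocking.
    @property size_t available() const { return buf.length; }

    // May return an empty slice while !empty: the EAGAIN equivalent.
    @property ubyte[] fronts() { poll(); return buf; }

    // Consume n elements previously obtained via 'fronts'.
    void popFronts(size_t n) { buf = buf[n .. $]; }
}

// Drains the source. Returns the number of bytes seen and counts how
// often the "not at EOF, but nothing fetchable" state was hit.
size_t drain(ref ChunkedSource r, ref size_t wouldBlock)
{
    size_t total;
    while (!r.empty)
    {
        auto s = r.fronts;
        if (s.length == 0)
        {
            // EAGAIN equivalent: a real caller would do other work
            // or wait for readiness here instead of spinning.
            if (!r.empty) ++wouldBlock;
            continue;
        }
        total += s.length;
        r.popFronts(s.length);
    }
    return total;
}
```

Note that 'empty' never has to wait here: it only reports an EOF that a
previous 'fronts' call has already observed.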

>   // Then using range interface for socket reading enforces blocking
> IO. It is *really* inefficient.

> I think IO primitives must be distinct from range ones for the reasons
> mentioned above...
> 
> I'm designing experimental IO primitives:
> https://github.com/9rnsr/dio
> 
> I call the input stream "source", and call output stream "sink".
> "source" has a 'pull' primitive, and sink has 'push' primitive, and
> they can avoid blocking.
> If you want to construct an input range interface from "source", you
> should use the 'ranged' helper function in the io.core module. 'ranged'
> returns a wrapper object, and in its front method, it reads bytes from
> "source", and if the bytes read are not sufficient, blocks the input.
> 
> In other words, range is not almighty. We should design distinct
> primitives for the IO.

Well, your 'pull' and 'push' are just different names for my 'fronts'
and 'puts' (modulo the data transfer interface, which can be done both
ways using a set of overloads, hence it doesn't matter).

I don't see any reason to invent yet another abstraction, when ranges
can be made to work with some improvements.
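
As a rough sketch of what "some improvements" could look like: a generic
algorithm can use the slice primitives when both sides have them and
fall back to the plain element-wise range interface otherwise. All of
the names below ('hasFronts', 'hasPuts', 'transfer', 'SliceSource',
'Collector', 'popFronts') are hypothetical, and the traits are
hard-coded for bytes to keep it short.

```d
import std.range.primitives : empty, front, popFront;

// Hypothetical traits: does the type offer the slice-based primitives?
enum hasFronts(R) = __traits(compiles,
    (R r) { auto s = r.fronts; r.popFronts(s.length); });
enum hasPuts(D) = __traits(compiles,
    (D d, const(ubyte)[] s) { d.puts(s); });

// Copies everything from src to dst; returns the number of bytes moved.
size_t transfer(Src, Dst)(ref Src src, ref Dst dst)
{
    size_t n;
    static if (hasFronts!Src && hasPuts!Dst)
    {
        // Slice path: one call moves a whole chunk.
        while (!src.empty)
        {
            auto s = src.fronts;
            if (s.length == 0) continue; // would-block; real code would wait
            dst.puts(s);
            n += s.length;
            src.popFronts(s.length);
        }
    }
    else
    {
        // Fallback: ordinary input range, one element at a time.
        for (; !src.empty; src.popFront()) { dst.put(src.front); ++n; }
    }
    return n;
}

// Toy source that hands out at most 'chunk' bytes per 'fronts' call.
struct SliceSource
{
    ubyte[] data;
    size_t chunk = 2;
    @property bool empty() const { return data.length == 0; }
    @property ubyte[] fronts()
    {
        return data[0 .. (chunk < data.length ? chunk : data.length)];
    }
    void popFronts(size_t n) { data = data[n .. $]; }
}

// Toy sink supporting both element-wise and slice-wise output.
struct Collector
{
    ubyte[] data;
    size_t elemCalls, sliceCalls;
    void put(ubyte b) { data ~= b; ++elemCalls; }
    void puts(const(ubyte)[] s) { data ~= s; ++sliceCalls; }
}
```

The same 'transfer' then works for a plain ubyte[] (element-wise) and
for a 'SliceSource' (chunk-wise), without a second abstraction.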

Ranges are just a convention; not a perfect one, but having /one/, not 
two or thirteen, is valuable. If you think ranges are flawed the
discussion should be about ripping out every trace of them from the
language and libraries and replacing them with something better. If
you think that would be bad - well, having tens of different incompatible
abstractions isn't good either. (and, yes, you can provide glue so that
they can interact, but that does not scale well)

Hmm, how are 'flush()' and 'commit()' supposed to work? Is data lost
if you omit one or both of them?

artur

[1] Reminds me:

   struct S(T) {
      shared T a;
      @property size_t available()() { return a; }
   }

The compiler infers 'available' as 'pure', which, depending on the
definition of 'shared', is wrong. ('shared' /shouldn't/ imply 'volatile',
but, as it is now, it does - so omitting a call to 'available' would
be wrong)