stream interfaces - with ranges

Fri May 18 06:51:02 PDT 2012

2012/5/18 Artur Skawina <art.08.09 at gmail.com>:
> On 05/18/12 06:19, kenji hara wrote:
>> I think range interface is not useful for *efficient* IO. The expected
>> IO interface will be more *abstract* than range primitives.
>>
>> ---
>> If we use the range I/F to read bytes from a device, we will always do
>> blocking IO - even if the device is a socket. It is not efficient.
>>
>> auto sock = new TcpSocketDevice();
>> if (!sock.empty) { auto e = sock.front; }
>>   // In the empty primitive, we *must* wait until the socket gets one or
>> more bytes or is really disconnected.
>
> No. 'empty' has to return true only _after_ seeing EOF.
>
> Something like 'available' can return the number of elements known
> to be fetchable w/o blocking. [1]
>
>>   // If not, what exactly returns sock.front?
>
> EWOULDBLOCK :^)
>
> But, yes, it needs to block, as there's no generic way to return
> EAGAIN/EWOULDBLOCK. This is where the primitive returning a slice
> comes in - that one /can/ return an empty slice.
> So '(!r.empty && r.fronts.length==0)' is the equivalent to EAGAIN.
> (and note i'm oversimplifying -- 'fronts' can return something that
> /acts/ as a slice; which is what i'm in fact doing)

OK. If reading bytes from the underlying device fails, your 'fronts' can
return an empty slice. Understood.
But it is still *not efficient*. The returned slice refers to a
buffer controlled by the underlying device. If you want to gather bytes
into one chunk, you must copy them from the returned slice into your chunk.
We should reduce memory copying as much as possible.

And the 'put' primitive in the output range concept doesn't support
non-blocking writes. 'put' must consume *all* of the given data and write
it to the underlying device, so it may block.

Therefore, the range concept as a whole doesn't cover non-blocking I/O.
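
To make the 'put' problem concrete, here is a minimal sketch. MockSink, its
fields, and the 'push' signature are hypothetical, made up for this mail;
they are not the dio API. 'push' reports how many bytes it consumed and never
waits, while 'put' has no return value, so it can only loop until everything
is written.

```d
import std.algorithm : min;

// Hypothetical in-memory "device"; all names here are illustrative only.
struct MockSink
{
    ubyte[] stored;  // bytes the device has accepted so far
    size_t room;     // bytes it can accept before a write would block
    size_t waits;    // how many times put() had to wait

    // Non-blocking write: consume what fits and report the count.
    // A return value of 0 plays the role of EAGAIN/EWOULDBLOCK.
    size_t push(const(ubyte)[] data)
    {
        immutable n = min(data.length, room);
        stored ~= data[0 .. n];
        room -= n;
        return n;
    }

    // Output-range style write: nothing to return, so it must finish the job.
    void put(const(ubyte)[] data)
    {
        while (data.length)
        {
            immutable n = push(data);
            data = data[n .. $];
            if (data.length)
            {
                ++waits;   // a real device would block here
                room = 4;  // the mock just pretends the device drained
            }
        }
    }
}

unittest
{
    MockSink sink;
    sink.room = 3;
    const(ubyte)[] msg = [1, 2, 3, 4, 5];

    auto n = sink.push(msg);   // accepts 3 bytes and returns at once
    assert(n == 3);
    sink.put(msg[n .. $]);     // has to wait once before it can finish
    assert(sink.waits == 1);
    assert(sink.stored == msg);
}
```

With 'push' the caller decides what to do with the unwritten rest; with 'put'
that decision is hidden inside the primitive.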

>>   // Then using range interface for socket reading enforces blocking
>> IO. It is *really* inefficient.
>
>> I think IO primitives must be distinct from range ones for the reasons
>> mentioned above...
>>
>> I'm designing experimental IO primitives:
>> https://github.com/9rnsr/dio
>>
>> I call the input stream "source", and call output stream "sink".
>> "source" has a 'pull' primitive, and sink has 'push' primitive, and
>> they can avoid blocking.
>> If you want to construct input range interface from "source", you
>> should use 'ranged' helper function in io.core module. 'ranged'
>> returns a wrapper object, and in its front method, it reads bytes from
>> "source", and if the bytes read are not sufficient, it blocks on input.
>>
>> In other words, range is not almighty. We should think distinct
>> primitives for the IO.
>
> Well, your 'pull' and 'push' are just different names for my 'fronts'
> and 'puts' (modulo the data transfer interface, which can be done both
> ways using a set of overloads, hence it doesn't matter).
>
> I don't see any reason to invent yet another abstraction, when ranges
> can be made to work with some improvements.

For efficiency and removing bottlenecks.
Even today, I/O is typically the slowest operation in a program.
Providing good primitives for I/O is valuable enough.

I have designed the 'pull' and 'push' primitives around two concepts:
1. Reduce memory copying as far as possible.
2. Keep buffer memory under the programmer's control, not the device's.
> Ranges are just a convention; not a perfect one, but having /one/, not
> two or thirteen, is valuable. If you think ranges are flawed the
> discussion should be about ripping out every trace of them from the
> language and libraries and replacing them with something better. If
> you think that would be bad - well, having tens of different incompatible
> abstractions isn't good either. (and, yes, you can provide glue so that
> they can interact, but that does not scale well)

The range concept is a good abstraction when the underlying container
controls ownership. But in I/O we want to *move* ownership of bytes. Ranges
are not designed to do that efficiently, IMO.

> Hmm, how are 'flush()' and 'commit()' supposed to work? Is data lost
> if you omit one or both of them?

In my io library, BufferedSink requires three primitives: flush,
commit, and writable.

> artur
>
> [1] Reminds me:
>
>   struct S(T) {
>      shared T a;
>      @property size_t available()() { return a; }
>   }
>
> The compiler infers 'available' as 'pure', which, depending on the
> definition of 'shared' is wrong. ('shared' /shouldn't/ imply 'volatile',
> but, as it is now, it does - so omitting a call to 'available' would
> be wrong)
>