stream interfaces - with ranges

Fri May 18 07:39:55 PDT 2012

2012/5/18 Steven Schveighoffer <schveiguy at yahoo.com>:
> On Fri, 18 May 2012 00:19:45 -0400, kenji hara <k.hara.pg at gmail.com> wrote:
>
>> I think range interface is not useful for *efficient* IO. The expected
>> IO interface will be more *abstract* than range primitives.
>
>
> If all you are doing is consuming data and processing it, range interface is
> efficient.  Most streaming implementations that are synchronous use:
>
> 1. read block of data from low-level source into buffer
> 2. process buffer
> 3. If still data left, go to step 1.
>
> 1 is done via popFront, 2 is done via front.
>
> 3 is somewhat available via empty, but empty kind of depends on reading
> data.  I think it can work.
>
> It's not the ideal interface for all aspects of i/o, but it does map to
> ranges, and for single purpose tasks (such as parse an XML file), it will be
> most efficient.

I mostly agree. I/O is either synchronous or asynchronous, and only a
few people would use a non-blocking interface directly.
For library implementations, though, a non-blocking interface is still
important. I think it should be designed to avoid copying as far as
possible, and achieving that with the range interface is impossible in
general.
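
To make the synchronous mapping concrete, here is a minimal sketch of
steps 1-3 as a range of ubyte[] chunks. MemoryDevice, ChunkRange and
countBytes are names made up for this mail (they are not Phobos or dio
APIs), and the in-memory device only stands in for a file or socket.
Note that device.read copies into the buffer, and that the constructor
and popFront must finish a read before empty and front mean anything;
on a real socket that is where the blocking would happen.

```d
import std.algorithm.comparison : min;

// Hypothetical in-memory "device" standing in for a file or socket.
struct MemoryDevice
{
    const(ubyte)[] data;

    // Copies up to buf.length bytes into buf; returns the count (0 at end).
    size_t read(ubyte[] buf)
    {
        immutable n = min(buf.length, data.length);
        buf[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

// Input range of chunks: popFront refills the buffer (step 1),
// front exposes it for processing (step 2), empty ends the loop (step 3).
struct ChunkRange
{
    private MemoryDevice device;
    private ubyte[] buffer;
    private size_t filled;

    this(MemoryDevice device, size_t chunkSize)
    {
        this.device = device;
        buffer = new ubyte[chunkSize];
        popFront(); // prime the first chunk; a real socket read would block here
    }

    @property bool empty() const { return filled == 0; }
    @property const(ubyte)[] front() const { return buffer[0 .. filled]; }
    void popFront() { filled = device.read(buffer); }
}

// Example consumer: sums the lengths of all chunks.
size_t countBytes(const(ubyte)[] input, size_t chunkSize)
{
    size_t total;
    foreach (chunk; ChunkRange(MemoryDevice(input), chunkSize))
        total += chunk.length;
    return total;
}
```

For pure consumption this works well, but the caller never gets a
chance to say "no data yet, do something else".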

>> ---
>> If you use range I/F to read bytes from device, we will always do
>> blocking IO - even if the device is socket. It is not efficient.
>>
>> auto sock = new TcpSocketDevice();
>> if (!sock.empty) { auto e = sock.front; }
>>  // In the empty primitive, we *must* wait until the socket gets one
>>  // or more bytes or is really disconnected.
>>  // If not, what exactly does sock.front return?
>>  // So using the range interface for socket reading enforces
>>  // blocking IO. It is *really* inefficient.
>> ---
>
>
> sockets do not have to be blocking, and I/O does not have to use the range
> portion of the interface.
>
> And efficient I/O has little to do with synchronicity and more to do with
> reading a large amount of data at a time instead of byte by byte.
>
> Using multi-threads or fibers, and using OS primitives such as select or
> poll can make I/O quite efficient and allow you to do other things while no
> I/O is happening.  These will not happen with range interface, but will be
> available through other interfaces.

I was talking about *good I/O primitives for library implementation*.
I think the range interface is one of the most useful concepts for end
users, but not a good one for people who want to implement efficient
libraries.

>> I think IO primitives must be distinct from range ones for the reasons
>> mentioned above...
>
>
> Yes, I agree.  But ranges can be *mapped* to stream primitives.

No, we cannot map the output range concept to non-blocking output. The
'put' operation always requires blocking.
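
A rough sketch of why, under made-up names: NonBlockingSink.tryWrite
is a hypothetical non-blocking primitive that reports how many bytes
the device accepted, and BlockingOutput is an output-range adapter on
top of it. Since put returns nothing, it cannot report partial
progress and has to loop until everything is written.

```d
// Hypothetical non-blocking sink; `capacity` simulates how many bytes
// the device accepts per call and is assumed to be greater than zero.
struct NonBlockingSink
{
    ubyte[] received;
    size_t capacity;

    // Accepts at most `capacity` bytes and returns the accepted count.
    // A real device could also return 0, meaning "try again later".
    size_t tryWrite(const(ubyte)[] data)
    {
        immutable n = data.length < capacity ? data.length : capacity;
        received ~= data[0 .. n];
        return n;
    }
}

// Output-range adapter: put must not return before all data is accepted.
struct BlockingOutput
{
    NonBlockingSink* sink;
    size_t calls; // number of tryWrite calls, kept only for illustration

    void put(const(ubyte)[] data)
    {
        while (data.length)
        {
            // With a real device we would have to wait or poll here
            // whenever tryWrite accepts nothing.
            data = data[sink.tryWrite(data) .. $];
            ++calls;
        }
    }
}
```

With a capacity of 3, a single put of 7 bytes needs three tryWrite
calls, and the caller is stuck inside put for all of them.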

>> I'm designing experimental IO primitives:
>> https://github.com/9rnsr/dio
>
>
> I'll take a look.

Thanks.

>>
>> In other words, range is not almighty. We should think distinct
>> primitives for the IO.
>
>
> 100% agree.  The main thing I realized that brought me to propose the
> "range-based" (if you can call it that) version is that:
>
> 1. Ranges can be readily mapped to stream primitives *if* you use the
> concept of a range of T[] vs. a range of T.  So in essence, without changing
> anything I can slap on a range interface for free.
> 2. Arrays make very efficient data sources, and are easy to create.  We need
> a way to hook stream-using code onto an array.
>
> But be clear, I am *not* going to remove the existing stream I/O primitives
> I had for buffered i/o, I'm rather *adding* range primitives as well.

My policy is very similar. But, as described above, I think ranges
cannot cover non-blocking IO.
And I think a non-blocking IO interface is important for library
implementations.

So I chose a design that provides IO-specific primitives.
Additionally, I have added primitives to control the underlying buffers
explicitly, because that is useful for some byte processing, e.g.
encoding, taking a string by slicing the buffer, and so on.
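
As an illustration of explicit buffer control, here is a small sketch.
The names BufferedSource, fetch, available, consume and readLine are
chosen for this mail and should not be read as the exact dio API, and
the "device" is again just an in-memory array. The point is that the
caller decides when to refill the buffer and how much of it to drop,
so a line can be taken as a slice of the buffer.

```d
// Hypothetical buffered source with explicit buffer control.
struct BufferedSource
{
    private const(ubyte)[] pending; // bytes the "device" has not delivered yet
    private ubyte[] buffer;         // bytes fetched but not yet consumed
    private size_t chunkSize;

    this(const(ubyte)[] data, size_t chunkSize)
    {
        assert(chunkSize > 0);
        pending = data;
        this.chunkSize = chunkSize;
    }

    // Appends the next chunk to the buffer; returns false at end of input.
    bool fetch()
    {
        if (pending.length == 0)
            return false;
        immutable n = pending.length < chunkSize ? pending.length : chunkSize;
        buffer ~= pending[0 .. n];
        pending = pending[n .. $];
        return true;
    }

    // View of the buffered bytes; nothing is copied.
    @property const(ubyte)[] available() const { return buffer; }

    // Drops the first n buffered bytes.
    void consume(size_t n) { buffer = buffer[n .. $]; }
}

// Takes one '\n'-terminated line (without the terminator) by slicing
// the buffer; at end of input it returns whatever is left.
const(char)[] readLine(ref BufferedSource src)
{
    size_t scanned; // bytes already checked for '\n'
    for (;;)
    {
        auto buf = src.available;
        foreach (i; scanned .. buf.length)
        {
            if (buf[i] == '\n')
            {
                auto line = cast(const(char)[]) buf[0 .. i];
                src.consume(i + 1);
                return line;
            }
        }
        scanned = buf.length;
        if (!src.fetch())
        {
            auto rest = cast(const(char)[]) src.available;
            src.consume(rest.length);
            return rest;
        }
    }
}
```

Because fetch is a separate step, a non-blocking variant could return
"nothing yet" there without changing available or consume.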

Kenji Hara