stream interfaces - with ranges

kenji hara k.hara.pg at gmail.com
Fri May 18 08:43:53 PDT 2012


2012/5/19 Artur Skawina <art.08.09 at gmail.com>:
> On 05/18/12 15:51, kenji hara wrote:
>> OK. If reading bytes from the underlying device fails, your 'fronts' can
>> return an empty slice. I understand.
>> But it is still *not efficient*. The returned slice refers to a
>> buffer controlled by the underlying device. If you want to gather bytes
>> into one chunk, you must copy bytes from the returned slice to your chunk.
>> We should reduce memory copying as much as possible.
>
> Depends if your input range supports zero-copy or not. IOW you avoid
> the copy iff the range can somehow write the data directly to the caller
> provided buffer. This can be true eg for file reads, where you can tell
> the read(2) syscall to write into the user buffer. But what if you need to
> buffer the stream? An intermediate buffer can become necessary anyway.
> But, as i said before, i agree that a caller-provided-buffer-interface
> is useful.
>
>   E[] fronts();
>   void fronts(ref E[]);
>
> And one can be implemented in terms of the other, ie:
>
>  E[] fronts() { E[] els; fronts(els); return els; }
>  void fronts(ref E[] e) { e[] = fronts()[]; }

The flaw in your design is that the memory used to store the read
bytes/elements is allocated by the lower layer.
E.g., if you want to construct a linked list of some elements, you
must copy the elements from the returned slice into a newly allocated
node. I think that is still inefficient.
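
To make the linked-list case concrete, here is a minimal sketch of a
caller-provided-buffer read. MemorySource, Node and readAll are
hypothetical names made up for this illustration, and the 'pull'
signature is only assumed to resemble the one discussed in this thread.
Because the source here is in-memory, the fill is itself a copy; with a
real file backend the OS could write into the node directly.

```d
// Hypothetical in-memory device, for illustration only.
struct MemorySource
{
    const(ubyte)[] data;   // bytes not yet consumed

    // Fills the front of `buf`, then shrinks `buf` to the unfilled rest.
    // Returns false once the source is exhausted.
    bool pull(ref ubyte[] buf)
    {
        if (data.length == 0)
            return false;
        immutable n = buf.length < data.length ? buf.length : data.length;
        buf[0 .. n] = data[0 .. n];   // stands in for the device write
        data = data[n .. $];
        buf = buf[n .. $];
        return true;
    }
}

// A list node that owns its own storage.
struct Node
{
    ubyte[4] payload;
    size_t used;       // number of valid bytes in payload
    Node* next;
}

// Builds a linked list by pulling straight into each node's payload,
// so no intermediate slice has to be copied into the node afterwards.
Node* readAll(ref MemorySource src)
{
    Node* head, tail;
    for (;;)
    {
        auto node = new Node;
        ubyte[] window = node.payload[];
        if (!src.pull(window))
            break;
        node.used = node.payload.length - window.length;
        if (tail is null)
            head = node;
        else
            tail.next = node;
        tail = node;
    }
    return head;
}
```

The point is that the upper layer decides where the bytes land (here,
inside the node) instead of receiving a slice of a device-owned buffer.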

> depending on which is more efficient. A range can provide
>
>  enum bool HasBuffer = 0 || 1;
>
> so that the user can pick the more suited alternative.

I think having as few primitives as possible is a better design than
adding extra/optional primitives.
How many primitives are there in your interface design?

>> And the 'put' primitive in the output range concept doesn't support non-blocking writes.
>> 'put' should consume *all* of the given data and write it to the underlying
>> device, so it would block.
>
> True, a write-as-much-as-possible-but not-more primitive is needed.
>
>   size_t puts(E[], size_t atleast=size_t.max);
>
> or something like that. (Doing it this way allows for explicit
> non-blocking 'puts', ie '(written=puts(els, 0))==0' means EAGAIN.)
>
>> Therefore, whole of range concept doesn't cover non-blocking I/O.

I can agree with the signatures, but the names 'fronts' and 'puts' are
a little too similar.
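
For reference, the non-blocking case of the quoted 'puts' signature
could look roughly like the sketch below. BoundedSink is a hypothetical
fixed-capacity sink, not taken from either design; the blocking path
(atleast > 0 with too little room) is deliberately not modelled and
just throws.

```d
// Hypothetical fixed-capacity sink, for illustration only.
struct BoundedSink
{
    ubyte[] storage;   // backing store with a fixed capacity
    size_t filled;     // number of bytes accepted so far

    // Accepts as many elements as currently fit and returns that count.
    // A return value of 0 with atleast == 0 plays the role of EAGAIN.
    size_t puts(const(ubyte)[] els, size_t atleast = size_t.max)
    {
        immutable room = storage.length - filled;
        immutable n = els.length < room ? els.length : room;
        immutable need = atleast < els.length ? atleast : els.length;
        if (n < need)
        {
            // A real sink would wait for room here; this sketch does not.
            throw new Exception("would block: blocking path not modelled");
        }
        storage[filled .. filled + n] = els[0 .. n];
        filled += n;
        return n;
    }
}
```

With a 4-byte sink, three calls to puts(els, 0) with a 3-byte 'els'
return 3, 1 and 0; the caller has to retry the unwritten remainder
itself.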


>>>> I'm designing experimental IO primitives:
>>>> https://github.com/9rnsr/dio
>>>>
>> I have designed the 'pull' and 'push' primitives with two concepts:
>> 1. Reduce copying memories as far as possible.
>> 2. Control buffer memory on the programmer side, not the device side.
>
> Do you have a contained microbenchmark? It would be easy to compare
> both approaches... If you do i'll write one using my scheme - so
> far i only did this for inter-thread communication, there's no file
> based backend.

It has a sample benchmark that compares performance with std.stdio for
line iteration.
On my PC, it is up to 2x faster.

>> In my io library, BufferedSink requires three primitives, flush,
>> commit, and writable.
>
> But what happens if neither flush nor commit is called?

If you forget to call 'commit', zero-length data will be written.
And if you forget to call 'flush', the committed data won't be written
to the actual device.
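
A minimal sketch of those three primitives and the two failure modes
follows. SketchBufferedSink is a hypothetical stand-in, not the actual
dio implementation; the signatures are assumptions, and the 'device'
array merely stands in for the real device.

```d
// Hypothetical buffered sink, for illustration only.
struct SketchBufferedSink
{
    ubyte[] buffer;      // staging buffer the caller writes into
    size_t committed;    // bytes the caller has marked as ready
    ubyte[] device;      // stands in for the actual device

    // The not-yet-committed part of the buffer; the caller fills it directly.
    ubyte[] writable()
    {
        return buffer[committed .. $];
    }

    // Marks the first n bytes of the writable area as ready to be written.
    void commit(size_t n)
    {
        assert(n <= buffer.length - committed);
        committed += n;
    }

    // Moves only the committed bytes to the device.
    void flush()
    {
        device ~= buffer[0 .. committed];
        committed = 0;
    }
}
```

If the caller fills writable() and calls flush() without commit(),
zero bytes reach 'device'; if it calls commit() without flush(), the
bytes stay in 'buffer'.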

Kenji Hara

