[RFC] I/O and Buffer Range

Dmitry Olshansky dmitry.olsh at gmail.com
Mon Jan 6 02:25:19 PST 2014


06-Jan-2014 09:41, Jason White wrote:
> On Sunday, 5 January 2014 at 13:30:59 UTC, Dmitry Olshansky wrote:
>> In my view text implies something like:
>>
>> void write(const(char)[]);
>> size_t read(char[]);
>>
>> And binary would be:
>>
>> void write(const(ubyte)[]);
>> size_t read(ubyte[]);
>>
>> Should not clash.
>
> Those would do the same thing for either text or binary data. When I say
> text writing, I guess I mean the serialization of any type to text (like
> what std.stdio.write does):
>
>      void write(T)(T value);         // Text writing
>      void write(const(ubyte)[] buf); // Binary writing
>
>      write([1, 2, 3]); // want to write "[1, 2, 3]"
>                        // but writes "\x01\x02\x03"
>
> This clashes. We need to be able to specify if we want to write/read a
> text representation or just the raw binary data. In the above case, the
> most specialized overload will be called.

Ok, now I see. In my view, though, serialization completely hides the
raw stream write.

So:
struct SomeStream{
     void write(const(ubyte)[] data...);
}

struct Serializer(Stream){
     void write(T)(T value); //calls stream.write inside of it
private:
     Stream stream;
}
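
To make the layering concrete, here is a minimal runnable sketch of the
same idea. It rests on assumptions that are not part of any proposal:
MemoryStream is a hypothetical in-memory stand-in for a real unbuffered
stream, write takes a plain slice rather than the variadic form above,
the serializer holds a pointer to the stream, and std.conv.to!string is
just one possible way to produce the text form.

```d
import std.conv : to;

// Hypothetical in-memory sink standing in for a real unbuffered stream.
struct MemoryStream
{
    ubyte[] data;

    // Raw binary write: appends the bytes exactly as given.
    void write(const(ubyte)[] bytes)
    {
        data ~= bytes;
    }
}

// Text serializer: the only layer that decides how a value becomes bytes.
struct Serializer(Stream)
{
    this(Stream* s) { stream = s; }

    void write(T)(T value)
    {
        // to!string renders [1, 2, 3] as "[1, 2, 3]".
        string text = to!string(value);
        stream.write(cast(const(ubyte)[]) text);
    }

private:
    Stream* stream; // assumption: the serializer borrows the stream
}

void main()
{
    MemoryStream raw;
    auto text = Serializer!MemoryStream(&raw);

    // Going through the serializer always yields the text form.
    text.write([1, 2, 3]);
    assert(cast(string) raw.data == "[1, 2, 3]");

    // Raw bytes are written only by talking to the stream directly.
    raw.data = null;
    ubyte[] bytes = [1, 2, 3];
    raw.write(bytes);
    assert(raw.data == bytes);
}
```

Since the two write functions live on different types, overload
resolution never has to choose between text and binary.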

>> In-memory array IMHO better not pretend to be a stream. This kind of
>> wrapping goes in the wrong direction (losing capabilities). Instead
>> wrapping a stream and/or array as a buffer range proved to me to be
>> more natural (extending capabilities).
>
> Shouldn't buffers/arrays provide a stream interface in addition to
> buffer-specific operations?

I think it may be best not to. A buffer builds on top of an unbuffered
stream. If large reads are needed, the code may as well use the naked
stream and not worry about the extra copying that happens in the buffer
layer.

I need to think about this more, seeing as lookahead + seek could be
labeled as a read even though it is not one.

> I don't see why it would conflict with a
> range interface. As I understand it, ranges act on a single element at a
> time while streams act on multiple elements at a time. For ArrayBuffer
> in datapicked, a stream-style read is just lookahead(n) and cur += n.
> What capabilities are lost?

In short: lookahead is slicing, read would be copying.
For me the prime capability of an array is slicing, which is dirt cheap
at O(1). A stream interface, on the other hand, is all about copying
bytes into a user-provided array.

In this setting, if you want to wrap an array as a stream, it must
follow the generic stream interface. That interface cannot and should
not know about slicing and the like. Once the array is wrapped in some
adapter further up the chain, it is no longer seen as an array (because
the adapter is generic too and is made for streams). This is what I
call capability loss.
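
A rough sketch of the difference, under stated assumptions: the
ArrayBuffer below is a hypothetical type written for this message, not
the one in datapicked, and its lookahead/read signatures are guesses
chosen only to show slicing next to copying.

```d
import std.algorithm : min;

// Hypothetical buffer-style view over an in-memory array.
struct ArrayBuffer
{
    const(ubyte)[] data;
    size_t cur;

    // Lookahead is a slice: O(1), no bytes are copied.
    const(ubyte)[] lookahead(size_t n)
    {
        immutable len = min(n, data.length - cur);
        return data[cur .. cur + len];
    }

    // A stream-style read has to copy into the caller's array.
    size_t read(ubyte[] dest)
    {
        auto src = lookahead(dest.length);
        dest[0 .. src.length] = src[];
        cur += src.length;
        return src.length;
    }
}

void main()
{
    ubyte[] storage = [10, 20, 30, 40];
    auto buf = ArrayBuffer(storage);

    // The lookahead window aliases the original memory.
    auto window = buf.lookahead(2);
    assert(window.ptr is storage.ptr);

    // The read result lives in separate memory supplied by the caller.
    ubyte[2] dest;
    ubyte[2] expected = [10, 20];
    assert(buf.read(dest[]) == 2);
    assert(dest == expected);
    assert(dest.ptr !is storage.ptr);

    // Only two bytes remain after the read advanced the position.
    assert(buf.lookahead(8).length == 2);
}
```

A generic adapter that only knows read(ubyte[]) can never hand out
window; it can only fill dest.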

> If buffers/arrays provide a stream interface, then they can be used by
> code that doesn't directly need the buffering capabilities but would
> still benefit from them.

See above: it would be better if the code were written for ranges, not
streams. Then, for example, slicing a buffer range built on top of an
array is just as cheap as slicing the array itself, and zero copies are
made (= performance).
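
As a small illustration, generic code constrained on range capabilities
keeps the cheap slice when it is handed an array. firstHalf is a
hypothetical helper invented for this example; hasSlicing and hasLength
are the existing Phobos traits.

```d
import std.range : hasLength, hasSlicing;

// Hypothetical generic helper: written against range capabilities,
// not against a stream interface.
auto firstHalf(R)(R r)
    if (hasSlicing!R && hasLength!R)
{
    return r[0 .. r.length / 2];
}

void main()
{
    ubyte[] storage = [1, 2, 3, 4];
    auto half = firstHalf(storage);

    // The result is a view of the same memory, so nothing was copied.
    assert(half.length == 2);
    assert(half.ptr is storage.ptr);

    // Writes through the slice are visible in the original array.
    half[0] = 99;
    assert(storage[0] == 99);
}
```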

>>> Currently, std.stdio has all three of
>>> those facets rolled into one.
>>
>> Locking though is a province of shared and may need a bit more thought.
>
> Locking of streams is something that I haven't explored too deeply yet.
> Streams that communicate with the OS certainly need locking as thread
> locality makes no difference there.

Actually these objects do just fine, since the OS does the locking (or
ensures something equivalent). If your stream is thread-local (TLS),
there is no need for extra locking at all, regardless of its kind,
because no interleaving of I/O calls is possible.

Shared instances would need locking, as two threads may request an
operation at the same time. Since the OS locks only on a per-syscall
basis, something cruel may happen in the code that deals with
buffering etc.
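
A minimal sketch of what that locking could look like, with several
assumptions: LockedWriter is a hypothetical class, its in-memory buffer
stands in for whatever user-space state the buffering layer keeps, and
the example sidesteps the shared type qualifier entirely by capturing
the object in thread delegates. It only shows where the lock has to
sit, not how a real design should spell it.

```d
import core.sync.mutex : Mutex;
import core.thread : Thread;

// Hypothetical buffered writer used from several threads.
// The buffer is user-space state that the OS knows nothing about,
// so it is our job to protect it.
final class LockedWriter
{
    private ubyte[] buffer;
    private Mutex lock;

    this() { lock = new Mutex; }

    void write(const(ubyte)[] bytes)
    {
        lock.lock();
        scope (exit) lock.unlock();
        buffer ~= bytes; // touching buffer state only under the lock
    }

    size_t buffered()
    {
        lock.lock();
        scope (exit) lock.unlock();
        return buffer.length;
    }
}

void main()
{
    auto writer = new LockedWriter;
    ubyte[] record = [1, 2, 3, 4];

    // Four threads append the same 4-byte record 1000 times each.
    Thread[] workers;
    foreach (i; 0 .. 4)
        workers ~= new Thread({
            foreach (j; 0 .. 1000)
                writer.write(record);
        });

    foreach (t; workers) t.start();
    foreach (t; workers) t.join();

    // With the lock in place no append is lost.
    assert(writer.buffered == 4 * 1000 * 4);
}
```

A thread-local writer could drop the mutex altogether, which is the
TLS case described above.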

-- 
Dmitry Olshansky

