streaming redux

Tue Dec 28 08:09:01 PST 2010

On 12/28/10 5:09 AM, Vladimir Panteleev wrote:
> On Tue, 28 Dec 2010 09:02:29 +0200, Andrei Alexandrescu
> <SeeWebsiteForEmail at erdani.org> wrote:
>
>> I've put together over the past days an embryonic streaming interface.
>> It separates transport from formatting, input from output, and
>> buffered from unbuffered operation.
>>
>> http://erdani.com/d/phobos/std_stream2.html
>>
>> There are a number of questions interspersed. It would be great to
>> start a discussion using that design as a baseline. Please voice any
>> related thoughts - thanks!
>
> Here are my humble observations:
>
> First of all: was ranges-like duck typing considered for streams? The
> language allows on-demand runtime polymorphism, and static typing allows
> compile-time detection of stream features for abstraction. Not sure how
> useful this is in practice, but it allows some optimizations (e.g. the
> code can be really fast when working with memory streams, due to
> inlining and lack of vcalls).

I think static polymorphism is great for ranges, which have fine 
granularity, but not for streams, which have coarse granularity. One 
read/write operation on a stream is likely to do enough work for the 
dynamic dispatch overhead to not matter.
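
A minimal sketch of the granularity argument, assuming a hypothetical
InputTransport interface and MemoryTransport class (neither name comes from
the std_stream2 draft). Each virtual call moves a whole chunk, so the
dispatch cost is paid once per chunk rather than once per byte:

```d
import std.algorithm : min;

// Hypothetical interface for illustration; not the std_stream2 API.
interface InputTransport
{
    // Copies up to buf.length bytes into buf; returns the number copied.
    size_t read(ubyte[] buf);
}

// Memory-backed transport, assumed here only to keep the sketch runnable.
final class MemoryTransport : InputTransport
{
    private const(ubyte)[] data;

    this(const(ubyte)[] data) { this.data = data; }

    size_t read(ubyte[] buf)
    {
        immutable n = min(buf.length, data.length);
        buf[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

// Drains the transport chunk by chunk: one dynamic dispatch per chunk.
size_t countBytes(InputTransport t, size_t chunkSize = 4096)
{
    auto buf = new ubyte[chunkSize];
    size_t total = 0;
    for (;;)
    {
        immutable n = t.read(buf);
        if (n == 0) break; // end of stream
        total += n;
    }
    return total;
}
```

With a 4096-byte chunk, draining 10,000 bytes costs three data-carrying
virtual calls plus one that reports end of stream.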

> Also, why should there be support for unopened streams? While a stream
> should be flush-able and close-able, opening and reopening streams
> should be done at a higher level IMO.

OK.

>> Question: Should we offer an open primitive at this level? If so, what
>> parameter(s) should it take?
>
> I don't see how this would be implemented at the lowest level, taking
> into consideration all the possible stream types (network connections,
> pipes, etc.)

It could take a Variant.

>> Question: Should we offer a primitive rewind that takes the stream
>> back to the beginning? That might be supported even by some streams
>> that don't support general seek calls. Alternatively, some streams
>> might support seek(0, SeekAnchor.start) but not other calls to seek.
>
> If seek support is determined at runtime by whether the call throws an
> exception or not, then I see no difference in having a rewind method or
> having non-zero seek throw.
>
>> Question: May we eliminate seekFromCurrent and seekFromEnd and just
>> have seek with absolute positioning? I don't know of streams that
>> allow seek without allowing tell. Even if some stream doesn't, it's
>> easy to add support for tell in a wrapper. The marginal cost of
>> calling tell is small enough compared to the cost of seek.
>
> Does anyone ever use seekFromEnd in practice (except the rare case of
> supporting certain file formats)? seekFromCurrent is a nice convenience,
> but every abstract method increases the burden for implementers.
>
>> Buffered*Transport
>
> I always thought that a perfect stream library would have buffering as
> an additional layer. For example: auto f = new Buffered!FileStream(...);

So Buffered would be a template? Cool idea. Let me think of it a bit more.
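
One way the template idea might look, as a rough sketch only. CountingSource
and the read(ubyte[]) signature are assumptions made for illustration and
are not part of the draft; Buffered here wraps any type that offers such a
read:

```d
import std.algorithm : min;

// Hypothetical unbuffered source that counts how often it is called.
final class CountingSource
{
    private const(ubyte)[] data;
    size_t calls; // number of read() calls received

    this(const(ubyte)[] data) { this.data = data; }

    size_t read(ubyte[] buf)
    {
        ++calls;
        immutable n = min(buf.length, data.length);
        buf[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

// Buffering as an extra layer over any T that has read(ubyte[]).
final class Buffered(T)
{
    private T source;
    private ubyte[] buffer;
    private size_t pos, end; // unread bytes are buffer[pos .. end]

    this(T source, size_t capacity = 4096)
    {
        this.source = source;
        buffer = new ubyte[capacity];
    }

    size_t read(ubyte[] buf)
    {
        if (pos == end) // buffer exhausted: refill with one large read
        {
            end = source.read(buffer);
            pos = 0;
        }
        immutable n = min(buf.length, end - pos);
        buf[0 .. n] = buffer[pos .. pos + n];
        pos += n;
        return n;
    }
}
```

Usage would then be a single line, e.g.
auto f = new Buffered!CountingSource(src); and byte-at-a-time reads on f
reach the underlying source only once per refill.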

>> abstract interface Formatter;
>
> I'm really not sure about this interface. I can see at most three
> implementations of it (native, big-endian and little-endian variants),
> everything else being too obscure to count. I think it should be
> implemented as static structs instead. Also, having an abstract method
> for each native type is quite ugly for D standards, I'm sure there's a
> better solution.

Nonono. Perhaps I chose the wrong name, but Formatter is really anything 
that takes typed data and encodes it in raw bytes suitable for 
transporting. That includes e.g. json, csv, and also a variety of binary 
formats.
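
To illustrate that reading of Formatter, here is a heavily reduced sketch.
The two-method interface, the class names, and the length-prefixed string
encoding are all assumptions for the example; the draft interface has more
methods. The same typed calls produce CSV text in one implementation and a
little-endian binary layout in the other:

```d
import std.bitmanip : nativeToLittleEndian;
import std.conv : to;

// Hypothetical, reduced interface: typed data in, raw bytes out.
interface Formatter
{
    void put(int value);
    void put(string value);
    const(ubyte)[] bytes();
}

// Text encoding: fields separated by commas.
final class CsvFormatter : Formatter
{
    private ubyte[] output;
    private bool first = true;

    private void field(string s)
    {
        if (!first) output ~= cast(ubyte) ',';
        first = false;
        output ~= cast(const(ubyte)[]) s;
    }

    void put(int value) { field(value.to!string); }
    void put(string value) { field(value); }
    const(ubyte)[] bytes() { return output; }
}

// Binary encoding: 4-byte little-endian ints, length-prefixed strings.
final class LittleEndianFormatter : Formatter
{
    private ubyte[] output;

    void put(int value)
    {
        ubyte[int.sizeof] raw = nativeToLittleEndian(value);
        output ~= raw[];
    }

    void put(string value)
    {
        put(cast(int) value.length); // length prefix (assumed layout)
        output ~= cast(const(ubyte)[]) value;
    }

    const(ubyte)[] bytes() { return output; }
}

// Client code sees only typed puts and is unaware of the encoding.
void writeRecord(Formatter f)
{
    f.put(42);
    f.put("hi");
}
```

Because the set of encodings is open-ended, an interface (rather than a
fixed set of static structs) lets client code such as writeRecord stay
unchanged when a new format is added.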

>> Question: Should all formatters require buffered transport? Otherwise
>> they might need to keep their own buffering, which ends up being less
>> efficient with buffered transports.
>
> Ideally buffering would be optional, and constructing a buffer-enabled
> stream should be so easy it'd be an easily adoptable habit (see above).
> Last time I tried to do I/O in Java (or was it C#?) I had to instantiate
> 3-4 classes before I could read from a file. D can do better.
>
>> Question: Should we also define putln that writes the string and then
>> a line terminator?
>
> But then you're mixing together text and binary streams into the same
> interface. I don't think this is a good idea.
>
>> Question: Should we define a more involved protocol?
>
> "A more involved protocol" would really be proper serialization. Calling
> toString can work as a convenience, similar to writefln's behavior.
>
>> This final function writes a customizable "header" and a customizable
>> "footer".
>
> What is the purpose of this? TypeInfo doesn't contain the field names,
> so it can't be used for protobuf-like serialization. Compile-time
> reflection would be much more useful.
>
>> Question: Should we pass the size in advance, or make the stream
>> responsible for inferring it?
>
> Code that needs to handle allocation itself can make the small effort of
> writing the lengths as well. A possible solution is to make string
> length encoding part of the interface specification, then the user can
> read the length and the contents separately themselves.
>
>> Question: How to handle associative arrays?
>
> Not a problem with static polymorphism.
>

Yah, but that precludes dynamic polymorphism...

Andrei