streaming redux

Tue Dec 28 22:32:17 PST 2010

On 12/28/10 11:39 AM, Michel Fortin wrote:
> On 2010-12-28 02:02:29 -0500, Andrei Alexandrescu
> <SeeWebsiteForEmail at erdani.org> said:
>
>> I've put together over the past days an embryonic streaming interface.
>> It separates transport from formatting, input from output, and
>> buffered from unbuffered operation.
>>
>> http://erdani.com/d/phobos/std_stream2.html
>>
>> There are a number of questions interspersed. It would be great to
>> start a discussion using that design as a baseline. Please voice any
>> related thoughts - thanks!
>
> One of my concerns is the number of virtual calls required in actual
> usage, because virtual calls prevent inlining. I know it's necessary to
> have virtual calls in the formatter to serialize objects (which requires
> double dispatch), but in your design the underlying transport layer too
> wants to be called virtually. How many virtual calls will be necessary
> to serialize an array of 10 objects, each having 10 fields? Let's see:
>
> 10 calls to Formatter.put(Object)
> + 10 calls to Object.toString(Formatter)
> + 10 objects * 10 calls per object to Formatter.put(<some field type>)
> + 10 objects * 10 calls per object to UnbufferedOutputTransport.write(in ubyte[])
>
> Total: 220 virtual calls, for 10 objects with 10 fields each. Most of
> the functions called virtually here are pretty trivial and would
> normally be inlined if the context allowed it. Assuming those fields are
> 4 byte integers and are stored as is in the stream, the result will be
> between 400 and 500 bytes long once we add the object's class name. We
> end up having almost 1 virtual call for every two bytes of emitted data;
> is this overhead really acceptable? How much inlining does it prevent?

That overhead may well be quite large.
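
As a rough check of the tally quoted above, here is a minimal D sketch that reproduces the arithmetic. The helper name virtualCalls is hypothetical and not part of the proposed interface; the payload figure assumes 4-byte integer fields and leaves out class names, as in the quoted estimate.

```d
import std.stdio;

// Hypothetical helper: adds up the four kinds of virtual calls
// listed in the quoted breakdown.
size_t virtualCalls(size_t objects, size_t fieldsPerObject)
{
    immutable putObject     = objects;                   // Formatter.put(Object)
    immutable toStringCalls = objects;                   // Object.toString(Formatter)
    immutable putField      = objects * fieldsPerObject; // Formatter.put(<some field type>)
    immutable writes        = objects * fieldsPerObject; // UnbufferedOutputTransport.write
    return putObject + toStringCalls + putField + writes;
}

void main()
{
    immutable size_t objects = 10;
    immutable size_t fields = 10;
    immutable calls = virtualCalls(objects, fields);
    // Assumption: each field is a 4-byte integer stored as is; class names excluded.
    immutable payloadBytes = objects * fields * 4;
    writefln("%s virtual calls for %s payload bytes", calls, payloadBytes);
}
```

With 10 objects of 10 fields each this gives 220 calls against 400 bytes of field data, which is where the "almost 1 virtual call for every two bytes" figure comes from.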

> My second concern is that your approach to Formatter is too rigid. For
> instance, what if an object needs to write different fields depending on
> the output format, or write them in a different order? It'll have to
> check at runtime which kind of formatter it got (through casts
> probably). Or what if I have a formatter that wants to expose an XML
> tree instead of bytes? It'll need a totally different interface that
> deals with XML elements, attributes, and character data, not bytes.

I think that's a very rare situation. When you pick a certain formatter, 
you commit to a certain representation, period. It's poor design to have 
the object object (sic) to that representation.

To some extent representation can be tweaked via format specifiers, 
which are a language spoken by both the formatter and the formatted.

> So because of all this virtual dispatch and all this rigidity, I think
> Formatter needs to be rethought a little. My preference obviously goes
> to statically-typed formatters.

It's heartwarming to see so much interest in static polymorphism. Only a 
couple of years ago I would've had trouble convincing people of that; 
now I need to preach the advantages of dynamic polymorphism.

> But what I'd like to see is something
> like this:
>
> interface Serializable(F) {
> void writeTo(F formatter);
> }

Let me make sure I understand correctly. So when I define a class I 
commit to its possible representations? That doesn't seem like good 
design to me. What if I later come up with a new Formatter? I'd need to 
change my entire class hierarchy too.

> Any object can implement a serialization for a given formatter by
> implementing the interface above parametrized with the formatter type.

If only one formatter were allowed, that would be even worse. But you 
can allow several:

class Widget : Serializable!Json, Serializable!Binary {
   ...
}

Sorry, I think this is poor design.
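
To make the coupling concrete, here is a minimal sketch of the pattern under discussion. The Json and Binary formatter classes and their field and raw methods are hypothetical stand-ins invented for illustration; they exist neither in Phobos nor in the proposed design.

```d
import std.conv : to;

// Hypothetical formatter producing a flat JSON-like field list.
final class Json
{
    string output;

    void field(string name, int value)
    {
        if (output.length) output ~= ",";
        output ~= `"` ~ name ~ `":` ~ to!string(value);
    }
}

// Hypothetical formatter producing raw bytes.
final class Binary
{
    ubyte[] output;

    void raw(int value)
    {
        // Append the four bytes of value, least significant first.
        foreach (shift; [0, 8, 16, 24])
            output ~= cast(ubyte)(value >> shift);
    }
}

// The interface from the quoted proposal.
interface Serializable(F)
{
    void writeTo(F formatter);
}

// The class names every formatter it supports in its base list
// and carries one writeTo overload per formatter.
class Widget : Serializable!Json, Serializable!Binary
{
    int width = 3;
    int height = 4;

    void writeTo(Json f)   { f.field("width", width); f.field("height", height); }
    void writeTo(Binary f) { f.raw(width); f.raw(height); }
}

void main()
{
    auto w = new Widget;
    auto j = new Json;
    auto b = new Binary;
    w.writeTo(j);
    w.writeTo(b);
    assert(j.output == `"width":3,"height":4`);
    assert(b.output.length == 8);
}
```

Supporting a third formatter in this sketch means editing Widget itself: one more entry in the base list and one more writeTo overload, repeated for every class in the hierarchy.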

> (Struct types could have a similar writeTo function too, they just don't
> need to implement an interface.) The formatter type can expose the
> interface it wants and use or not use virtual functions, it could be an
> XML writer interface (something with openElement, writeCharacterData,
> closeElement, etc), it could be a JSON interface; it could even be your
> Formatter as proposed, we just wouldn't be limited by it.
>
> So basically, I'm not proposing you dump Formatter, just that you make
> it part of a reusable pattern for
> formatting/serializing/unformatting/unserializing things using other
> things than your Formatter interface.

I may be misunderstanding, but to me it seems that this design brings 
more problems than it solves.

> As for the transport layer, I don't mind it much if it's an interface.
> Unlike Formatter, nothing prevents you from creating a 'final' class and
> using it directly when you can to avoid virtual dispatch. This doesn't
> work so well for Formatter however because it requires double dispatch
> when it encounters a class, which washes away all static information.

I agree that Transport is fine with the dynamic interface.
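
A minimal sketch of the point about 'final' classes, under stated assumptions: the interface name comes from the quoted text, but the write signature shown here is assumed, and MemoryTransport, sendDynamic and sendStatic are hypothetical names made up for the example.

```d
// Interface name taken from the quoted text; the signature is an assumption.
interface UnbufferedOutputTransport
{
    size_t write(const(ubyte)[] data);
}

// Hypothetical in-memory transport, declared final so that
// no subclass can override write.
final class MemoryTransport : UnbufferedOutputTransport
{
    ubyte[] sink;

    size_t write(const(ubyte)[] data)
    {
        sink ~= data;
        return data.length;
    }
}

// Goes through the interface: the call is dispatched dynamically.
size_t sendDynamic(UnbufferedOutputTransport t, const(ubyte)[] data)
{
    return t.write(data);
}

// Statically typed: T is known at compile time, so with a final class
// the compiler is free to call write directly and possibly inline it.
size_t sendStatic(T)(T t, const(ubyte)[] data)
{
    return t.write(data);
}

void main()
{
    auto t = new MemoryTransport;
    ubyte[] payload = [1, 2, 3, 4];
    assert(sendDynamic(t, payload) == 4);
    assert(sendStatic(t, payload) == 4);
    assert(t.sink.length == 8);
}
```

Both paths use the same object; only the static type at the call site differs, so code that knows the concrete transport can skip the virtual call while everything else keeps using the interface.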

Andrei