streaming redux

Tue Dec 28 09:39:07 PST 2010

On 2010-12-28 02:02:29 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail at erdani.org> said:

> I've put together over the past days an embryonic streaming interface. 
> It separates transport from formatting, input from output, and buffered 
> from unbuffered operation.
> 
> http://erdani.com/d/phobos/std_stream2.html
> 
> There are a number of questions interspersed. It would be great to 
> start a discussion using that design as a baseline. Please voice any 
> related thoughts - thanks!

One of my concerns is the number of virtual calls required in actual 
usage, because virtual calls prevent inlining. I know it's necessary to 
have virtual calls in the formatter to serialize objects (which 
requires double dispatch), but in your design the underlying transport 
layer too wants to be called virtually. How many virtual calls will be 
necessary to serialize an array of 10 objects, each having 10 fields? 
Let's see:

	  10 calls to Formatter.put(Object)
	+ 10 calls to Object.toString(Formatter)
	+ 10 objects * 10 calls per object to Formatter.put(<some field type>)
	+ 10 objects * 10 calls per object to UnbufferedOutputTransport.write(in ubyte[])

Total: 220 virtual calls, for 10 objects with 10 fields each. Most of 
the functions called virtually here are pretty trivial and would 
normally be inlined if the context allowed it. Assuming those fields 
are 4-byte integers stored as is in the stream, the result will be 
between 400 and 500 bytes long once we add each object's class name. 
We end up with almost one virtual call for every two bytes of emitted 
data; is this overhead really acceptable? How much inlining does it 
prevent?
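
To make the tally above easy to re-run with other sizes, here is a 
small D sketch. The function name virtualCalls is hypothetical; the 
four terms are the ones listed above, and the payload figure counts 
only the 4-byte fields, not the class names.

```d
import std.stdio : writefln;

// Hypothetical helper: reproduces the tally above for any array size.
size_t virtualCalls(size_t objects, size_t fieldsPerObject)
{
    immutable putObject    = objects;                   // Formatter.put(Object)
    immutable toStringCall = objects;                   // Object.toString(Formatter)
    immutable putField     = objects * fieldsPerObject; // Formatter.put(<field>)
    immutable transport    = objects * fieldsPerObject; // UnbufferedOutputTransport.write
    return putObject + toStringCall + putField + transport;
}

void main()
{
    immutable size_t objects = 10;
    immutable size_t fields = 10;
    immutable calls = virtualCalls(objects, fields);
    // 4-byte integers only; class names would add to this.
    immutable payloadBytes = objects * fields * 4;
    writefln("%s virtual calls for %s payload bytes", calls, payloadBytes);
}
```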

My second concern is that your approach to Formatter is too rigid. For 
instance, what if an object needs to write different fields depending 
on the output format, or write them in a different order? It'll have to 
check at runtime which kind of formatter it got (through casts 
probably). Or what if I have a formatter that wants to expose an XML 
tree instead of bytes? It'll need a totally different interface that 
deals with XML elements, attributes, and character data, not bytes.
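
A minimal sketch of the runtime check I mean, assuming a simplified 
stand-in for Formatter (the real proposed interface may differ). 
BinaryFormatter, TextFormatter and Account are hypothetical names 
invented for illustration; the formatters only record what they 
receive.

```d
import std.conv : to;
import std.stdio : writeln;

// Hypothetical, simplified stand-in for the proposed Formatter.
interface Formatter
{
    void put(int value);
    void put(string value);
}

// Two hypothetical concrete formatters that log what they are given.
class BinaryFormatter : Formatter
{
    string log;
    void put(int value)    { log ~= "i:" ~ value.to!string ~ ";"; }
    void put(string value) { log ~= "s:" ~ value ~ ";"; }
}

class TextFormatter : Formatter
{
    string log;
    void put(int value)    { log ~= value.to!string ~ " "; }
    void put(string value) { log ~= value ~ " "; }
}

class Account
{
    string owner = "alice";
    int balance = 42;

    void writeTo(Formatter f)
    {
        // The static type says nothing about the output format, so the
        // object has to discover it with a dynamic cast.
        if (cast(TextFormatter) f)
        {
            f.put(owner);
            f.put(balance);
        }
        else
        {
            // Different field order for every other formatter.
            f.put(balance);
            f.put(owner);
        }
    }
}

void main()
{
    auto account = new Account;
    auto text = new TextFormatter;
    auto binary = new BinaryFormatter;
    account.writeTo(text);
    account.writeTo(binary);
    writeln(text.log);
    writeln(binary.log);
}
```

Every new output format means another branch in every object that 
cares about the difference.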

So because of all this virtual dispatch and all this rigidity, I think 
Formatter needs to be rethought a little. My preference obviously goes 
to statically-typed formatters. But what I'd like to see is something 
like this:

	interface Serializable(F) {
		void writeTo(F formatter);
	}

Any object can implement a serialization for a given formatter by 
implementing the interface above parametrized with the formatter type. 
(Struct types could have a similar writeTo function too, they just 
don't need to implement an interface.) The formatter type can expose 
whatever interface it wants, with or without virtual functions. It 
could be an XML writer interface (something with openElement, 
writeCharacterData, closeElement, etc.), it could be a JSON interface, 
or it could even be your Formatter as proposed; we just wouldn't be 
limited by it.
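
Here is a rough sketch of how this pattern could look in use. Only 
Serializable(F) and the three XML method names come from the text 
above; XmlWriter, Point, Size and serialize are hypothetical names 
made up for illustration, and the writer does no escaping.

```d
import std.conv : to;
import std.stdio : writeln;

// The interface from above, parametrized by formatter type.
interface Serializable(F)
{
    void writeTo(F formatter);
}

// Hypothetical formatter that deals in XML elements, not bytes.
// It is a final class with no base interface, so calls into it
// need no virtual dispatch.
final class XmlWriter
{
    string output;
    void openElement(string name)        { output ~= "<" ~ name ~ ">"; }
    void writeCharacterData(string text) { output ~= text; }
    void closeElement(string name)       { output ~= "</" ~ name ~ ">"; }
}

// A class opts in to XML serialization by implementing the interface.
class Point : Serializable!XmlWriter
{
    int x, y;
    this(int x, int y) { this.x = x; this.y = y; }

    void writeTo(XmlWriter w)
    {
        w.openElement("point");
        w.openElement("x");
        w.writeCharacterData(x.to!string);
        w.closeElement("x");
        w.openElement("y");
        w.writeCharacterData(y.to!string);
        w.closeElement("y");
        w.closeElement("point");
    }
}

// A struct needs no interface, only a matching writeTo member.
struct Size
{
    int width;

    void writeTo(XmlWriter w)
    {
        w.openElement("size");
        w.writeCharacterData(width.to!string);
        w.closeElement("size");
    }
}

// Hypothetical generic entry point: accepts any value that has a
// writeTo member taking this formatter type, checked at compile time.
void serialize(F, T)(F formatter, T value)
    if (is(typeof(value.writeTo(formatter))))
{
    value.writeTo(formatter);
}

void main()
{
    auto w = new XmlWriter;
    serialize(w, new Point(1, 2));
    serialize(w, Size(3));
    writeln(w.output);
}
```

A type that wants to support a second format simply implements 
Serializable for that formatter type as well, with no casts involved.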

So basically, I'm not proposing that you dump Formatter, just that you 
make it part of a reusable pattern for 
formatting/serializing/unformatting/unserializing things with 
formatters other than your Formatter interface.

As for the transport layer, I don't mind it much if it's an interface. 
Unlike Formatter, nothing prevents you from creating a 'final' class 
and using it directly when you can to avoid virtual dispatch. This 
doesn't work so well for Formatter however because it requires double 
dispatch when it encounters a class, which washes away all static 
information.
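
To illustrate the 'final' class point, here is a minimal sketch. The 
interface name and the write(in ubyte[]) signature are taken from the 
tally earlier in this message; MemoryTransport, emitDirect and 
emitVirtual are hypothetical names for illustration only.

```d
import std.stdio : writeln;

// Transport as an interface, with the signature used in the tally.
interface UnbufferedOutputTransport
{
    void write(in ubyte[] data);
}

// Hypothetical in-memory transport. Because the class is final, no
// subclass can override write.
final class MemoryTransport : UnbufferedOutputTransport
{
    ubyte[] bytes;
    void write(in ubyte[] data) { bytes ~= data; }
}

// Static type is the final class: the compiler is free to call
// write directly and inline it.
void emitDirect(MemoryTransport t, in ubyte[] data)
{
    t.write(data);
}

// Static type is the interface: the call goes through virtual dispatch.
void emitVirtual(UnbufferedOutputTransport t, in ubyte[] data)
{
    t.write(data);
}

void main()
{
    auto t = new MemoryTransport;
    ubyte[] payload = [1, 2, 3];
    emitDirect(t, payload);
    emitVirtual(t, payload);
    writeln(t.bytes.length, " bytes written");
}
```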

-- 
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/