Range interface for std.serialization

Tue Aug 27 13:12:41 PDT 2013

27-Aug-2013 23:23, Jacob Carlborg wrote:
> On 2013-08-26 18:41, Dmitry Olshansky wrote:
>> Looking at your current code in depth... finally. I would have a problem
>> starting the productive answers to the questions.
>>
>> First things first - there should not be a key parameter aside from
>> stuff added by archiver itself for its internal needs. Nor is there a
>> simple way to locate data by key afterwards (certainly not every format
>> defines such). It would require some tagged object model and there is no
>> such _requirement_ in the serialization.
>
> I would really like the serializer not being dependent on the order of
> the fields of the types it's (de)serializing.
>
I see...
That depends on the format. For formats that have no keys or markers of
any kind, versioning might help. JSON/BSON, for instance, could handle
permutation of fields, but then it falls short of handling links, e.g.
pointers (maybe there is a trick to get it, but I can't think of any
right away).

I suspect it would be best to classify archives by capabilities:
1. Rigid (most binary formats) - in-order, depends on the order of
fields, may need to fit a schema (in this case D types implicitly define
one). Rigid archivers may also enjoy (per format, in the future) a code
generator that, given a schema, defines D types with a bit of CTFE+mixin.

2. Flexible - can survive reordering, is schema-less, data defines
structure, etc. Handles versioning more easily; XML is one example.

This also neatly answers the question about schema-based vs schema-less
serialization. Protocol buffers/Thrift may be absorbed into the Rigid
category if we can get the versioning right. Solving versioning is also
the last roadblock (after ranges) mentioned on the path to making this
an epic addition to Phobos.

+ Some kind of (compile-time) capability flag saying whether the archive
can serialize full graphs or whether the format is too limited for that.
Combined with Rigid, that would cover most ad hoc binary formats in the
wild; with Flexible it would handle some simple hierarchical formats as
well.
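
A minimal sketch of such compile-time capability flags. Every name here
(Ordering, BinaryArchive, XmlLikeArchive, canReorderFields, canSerialize)
is made up for illustration and is not part of the reviewed code; the
pointer check is a toy that only looks at direct fields.

```d
import std.meta : anySatisfy;
import std.traits : Fields, isPointer;

// Hypothetical capability descriptors, declared by each archive type.
enum Ordering { rigid, flexible }

struct BinaryArchive   // stand-in for an ad hoc binary format
{
    enum ordering = Ordering.rigid;
    enum supportsFullGraphs = false;
}

struct XmlLikeArchive  // stand-in for a keyed, schema-less format
{
    enum ordering = Ordering.flexible;
    enum supportsFullGraphs = true;
}

// True if the archive can match fields by key rather than by position.
enum canReorderFields(A) = A.ordering == Ordering.flexible;

// Toy check: a type with a direct pointer field needs full-graph support.
enum canSerialize(A, T) =
    A.supportsFullGraphs || !anySatisfy!(isPointer, Fields!T);

// The strategy is picked at compile time; no run-time branch remains.
string fieldLookup(A)()
{
    static if (canReorderFields!A)
        return "by key";
    else
        return "by position";
}

struct Point { int x, y; }
struct Node  { int value; Node* next; }

static assert( canSerialize!(BinaryArchive, Point));
static assert(!canSerialize!(BinaryArchive, Node));
static assert( canSerialize!(XmlLikeArchive, Node));
```

With flags like these a serializer could reject an unsupported
type/format combination with a static assert instead of failing at run
time.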

>> This confusion and its consequences are no doubt due to building on
>> std.xml and the interface it presents.
>
> It might be due to the XML format in general but certainly not due to
> std.xml. The XmlArchive originally used the XML package in Tango, long
> before it supported std.xml.

Was it DOM-ish too?

>> The awful duality of Serializer that results literally in:
>> if(mode == serializing) doSerializing else doDeserializing
>>
>> And the spectacular pair:
>>
>> T deserialize (T) (Data data, string key = "")
>> {
>>          mode = deserializing;
>>      ...
>> }
>>
>> Data serialize (T) (T value, string key = null)
>> {
>>          mode = serializing;
>>      ...
>> }
>
> I guess that's easier to avoid if I divide Serializer in to two separate
> parts, one for serializing and one for deserializing.
>
Right, I was shamelessly picking at this again.

>> Amount of extra code executed per bit of output is remarkably high
>
> Now I think you're exaggerating a bit.

I meant at least a check of 'mode' on each call to (de)serialize, plus
some other branch-y stuff that tests for overridden serializers etc.

It could be a relatively new idiom to follow, but there is great value
in having a lean common path: the 90% of use cases that need no extras
should take the fastest route, potentially at the _expense_ of *less
frequent cases*.
Simplified - the earlier you can elide extra work, the better performance
you get. To do that you may need to double the overhead (checks) in the
less frequent case to remove some of it from the common case.

>> a hallmark of standard library is pay as you go principle. We (as
>> collectively Phobos devs) have to set the baseline for performance, if
>> it's too low we're out of the game.
>>
>> For example - events are cute, but do we all need them? Do we always
>> want an overhead of checking that stuff per field written?
>
> Sure, there are some overhead of calling some functions but the events
> are checked for at compile time so the overhead should be minimal.

See below. I was talking specifically about calling functions only to
find out that no events are fired anyway.

>> Instead decompose these layers, make them stackable for instance:
>>
>> auto serializer = new Serializer(...);
>> auto tracingSerializer = new TracingSerializer(serializer);
>>
>> Or just make 2 kinds of serializers with static if on a single template
>> parameter bool withEvents it's trivial. Then a couple of aliases would
>> finish the job.
>
> I don't think that will be needed. I can see if I can refactor a bit to
> minimize the overhead even more.

You are probably right, as I note later on. Plus there seems to be a way
to elide the cost entirely if there are no events.
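
For reference, the withEvents variant quoted above might look roughly
like this. BasicSerializer, writeField, onField and both aliases are
hypothetical names, and the counter merely stands in for real archiving
work.

```d
// Hypothetical sketch: one template, two flavours selected by a bool.
struct BasicSerializer(bool withEvents)
{
    int fieldsWritten;

    static if (withEvents)
    {
        // Only the event-aware flavour carries this state at all.
        int eventsFired;
        void delegate(string fieldName) onField;
    }

    void writeField(T)(string name, T value)
    {
        static if (withEvents)
        {
            // This check does not exist in the plain flavour.
            if (onField !is null)
            {
                onField(name);
                ++eventsFired;
            }
        }
        ++fieldsWritten; // stand-in for the actual archiving work
    }
}

// "A couple of aliases would finish the job."
alias Serializer      = BasicSerializer!false;
alias EventSerializer = BasicSerializer!true;

static assert(!__traits(hasMember, Serializer, "eventsFired"));
static assert( __traits(hasMember, EventSerializer, "eventsFired"));

unittest
{
    string[] seen;
    EventSerializer traced;
    traced.onField = (string n) { seen ~= n; };
    traced.writeField("x", 1);
    assert(seen == ["x"]);
}
```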
>
>> With that I observe that events are attached to types/fields... hum,
>> in such a case it needs work to make them zero-cost if absent.
>
> The only cost is calling "triggerEvents" and "triggerEvent", the rest is
> performed at compile time.

Yeah, I see, but it's still a call to a delegate that's hard to inline
(well, LDC/GDC might). Would it be hard to do a compile-time check for
whether the type in question has any events at all, and only then call
triggerEvent(s)?
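
A minimal sketch of such a compile-time check. It assumes a made-up
convention where a type opts in by declaring an onSerializing() member;
the real code attaches events differently, and hasSerializingEvent and
serializeValue are hypothetical names.

```d
// Hypothetical convention: a type has events iff it declares onSerializing().
enum hasSerializingEvent(T) = __traits(hasMember, T, "onSerializing");

struct Plain
{
    int x;
}

struct Audited
{
    int x;
    int notified;
    void onSerializing() { ++notified; }
}

// Stand-in for serializing one value; returns the number of fields visited.
size_t serializeValue(T)(ref T value)
{
    static if (hasSerializingEvent!T)
        value.onSerializing(); // compiled in only for types that declare it

    return T.tupleof.length;
}

static assert(!hasSerializingEvent!Plain);
static assert( hasSerializingEvent!Audited);
```

For Plain the instantiated function contains no event call at all, so
there is nothing left for the optimizer to inline or remove.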

While we are on the subject of delegates - you absolutely should use 
'scope delegate' as most (all?) delegates are never stored anywhere but 
rather pass blocks of code to call deeper down the line.
(I guess it's somewhat Ruby-style, but it's not a problem).
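
A minimal sketch of a 'scope delegate' parameter; archiveArray and its
signature are hypothetical. Since the callee promises not to keep the
delegate, the caller's literal does not have to outlive the call.

```d
// Hypothetical helper: the callback is only invoked, never stored.
void archiveArray(const(int)[] items,
                  scope void delegate(size_t index, int item) writeItem)
{
    foreach (i, item; items)
        writeItem(i, item);
}

unittest
{
    int sum;
    // The literal captures 'sum' from the enclosing frame.
    archiveArray([1, 2, 3], (size_t i, int item) { sum += item; });
    assert(sum == 6);
}
```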

>
>> And I'm afraid it's too late or the changes are too far reaching but
>> let's try it.
>>
>> I'm especially destroyed  by (and the fact that it's a part of interface
>> to implement):
>>
>>      void archiveEnum (bool value, string baseType, string key, Id id);
>>
>>      /// Ditto
>>      void archiveEnum (bool value, string baseType, string key, Id id);
>>
[snip]

> So you want templates instead?

Aye, as any faithful Phobos dev absolutely :)
Seriously though, ATM I just _suspect_ there is no need for Archive to be
an interface. I would need to think this bit through more deeply, but a
virtual call per field alone makes me nervous here.
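
A rough sketch of the template direction, in the spirit of how ranges
are checked with isInputRange. isArchive, TextArchive and serializeFields
are hypothetical names, and the single put() primitive is an assumption
for illustration, not a proposal for the final Archive API.

```d
// Hypothetical compile-time check: anything with a suitable put() qualifies.
enum isArchive(A) =
    is(typeof((ref A a) { a.put(0); a.put(0.5); a.put("text"); }));

// Toy archive that collects a textual form of whatever it is given.
struct TextArchive
{
    string data;

    void put(T)(T value)
    {
        import std.conv : to;
        data ~= value.to!string ~ ";";
    }
}

// Walks the fields of a struct; each put() call is resolved statically,
// so there is no virtual dispatch per field.
void serializeFields(A, T)(ref A archive, const ref T value)
    if (isArchive!A && is(T == struct))
{
    foreach (field; value.tupleof)
        archive.put(field);
}

struct Point { int x; int y; }

static assert( isArchive!TextArchive);
static assert(!isArchive!int);
```

A class wrapper could still be layered on top for users who need
run-time polymorphism.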

> I have read your posts, thank you for your comments. I'm planning now to:
>
> * Split Serializer in to two parts
> * Make the parts struct
> * Possibly provide class wrappers
> * Split Archive in two parts
> * Add range interface to Serializer and Archive

Great checklist, this would help greatly. I'm glad you see the value in 
these changes.
Feel free to nag me on the NG and personally for any deficiency you come 
across on the way there ;)

-- 
Dmitry Olshansky