Range interface for std.serialization

Wed Aug 28 00:13:34 PDT 2013

On 2013-08-27 22:12, Dmitry Olshansky wrote:

> I see...
> That depends on the format, and for those that have no keys or markers of
> any kind, versioning might help here. For instance, JSON/BSON could handle
> permutation of fields, but then it falls short of handling links, e.g.
> pointers (maybe there is a trick to get it, but I can't think of any
> right away).

For pointers and reference types I'm currently serializing all fields with 
an id; then, when there's a pointer or reference, I can just do this:

<int name="foo" id="1">3</int>
<pointer name="bar">1</pointer>
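A minimal sketch of the id scheme above, assuming a toy writer that only knows ints and int pointers. IdWriter, writeInt and writePointer are hypothetical names, not Orange's API: every value gets an id, the writer remembers which id belongs to which address, and a pointer is written as that id.

```d
import std.format : format;

// Hypothetical writer: ids are handed out in order and remembered per address.
struct IdWriter
{
    size_t nextId = 1;
    size_t[const(void)*] ids; // address -> id of the value stored there
    string output;

    void writeInt(string name, ref int value)
    {
        immutable id = nextId++;
        ids[cast(const(void)*) &value] = id;
        output ~= format(`<int name="%s" id="%s">%s</int>`, name, id, value) ~ "\n";
    }

    void writePointer(string name, int* pointer)
    {
        // Assumes the pointee was written earlier; otherwise the lookup throws.
        // A full serializer would serialize the pointee on demand instead.
        output ~= format(`<pointer name="%s">%s</pointer>`, name,
            ids[cast(const(void)*) pointer]) ~ "\n";
    }
}

void main()
{
    import std.stdio : write;

    int foo = 3;
    int* bar = &foo;

    IdWriter writer;
    writer.writeInt("foo", foo);
    writer.writePointer("bar", bar);
    write(writer.output);
}
```

Running it prints the two XML lines shown above.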

> I suspect it would be best to somehow classify archives by capabilities:
> 1. Rigid (most binary) - in-order, depends on the order of fields, may
> need to fit a schema (in this case D types implicitly define one).
> Rigid archivers may also enjoy (per format in the future) a code
> generator that, given a schema, defines D types with a bit of CTFE+mixin.
>
> 2. Flexible - can survive reordering, is schema-less, the data defines
> the structure, etc. Versioning is easier to handle; XML is one example.

Yes, that's a good idea. In the binary archiver I'm working on, I'm 
cheating quite a bit and relaxing the requirements imposed by the serializer.

> This also neatly answers the question about schema vs schema-less
> serialization. Protocol Buffers/Thrift may be absorbed into the Rigid
> category if we can get the versioning right. Also, solving versioning is
> the last roadblock (after ranges) mentioned on the path to making this
> an epic addition to Phobos.

Versioning shouldn't be that hard, I think.

> + Some kind of (compile-time) capability flag indicating whether it can
> serialize full graphs or whether the format is too limited for that.
> Combined with Rigid it would cover most ad hoc binary formats in the
> wild; with Flexible it would handle some simple hierarchical formats as well.

Sounds like a good idea.
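A rough sketch of how the Rigid/Flexible split and the graph capability flag could be expressed as compile-time constants. Every name here (Layout, supportsGraphs, canSerialize, strategy) is hypothetical, and the pointer check is deliberately shallow: it only looks at a type's direct fields.

```d
import std.meta : anySatisfy;
import std.traits : Fields, isPointer;

// Hypothetical capability flags an archive type could declare.
enum Layout { rigid, flexible }

struct BinaryArchive
{
    enum layout = Layout.rigid;  // in-order; the D types act as the schema
    enum supportsGraphs = false; // no ids, so shared pointers can't be restored
}

struct XmlArchive
{
    enum layout = Layout.flexible; // keyed, survives reordering
    enum supportsGraphs = true;    // ids make full object graphs possible
}

// Shallow check: does T have a pointer among its direct fields?
enum hasPointerField(T) = anySatisfy!(isPointer, Fields!T);

// Decided entirely at compile time; a serializer could static assert on it.
enum canSerialize(Archive, T) = Archive.supportsGraphs || !hasPointerField!T;

// Picks a strategy at compile time; no runtime branch remains.
string strategy(Archive)()
{
    static if (Archive.layout == Layout.rigid)
        return "in-order, no keys";
    else
        return "keyed, order-independent";
}

struct Point { int x, y; }
struct Node { int value; Node* next; }

void main()
{
    import std.stdio : writeln;

    static assert(canSerialize!(BinaryArchive, Point));
    static assert(!canSerialize!(BinaryArchive, Node)); // Node* needs graph support
    static assert(canSerialize!(XmlArchive, Node));
    writeln(strategy!BinaryArchive(), " / ", strategy!XmlArchive());
}
```

Because the flags are enums, an unsupported combination becomes a compile error rather than a runtime failure.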

> Was it DOM-ish too?

Yes.

> I meant at least a check of 'mode' on each call to (de)serialize +
> some other branch-y stuff that tests overridden serializers etc.
>
> It could be a relatively new idiom to follow, but there is great value
> in having a lean common path, i.e. the 90% of use cases that need no extras
> should take the fastest route, potentially at the _expense_ of *less
> frequent cases*.
> Simplified - the earlier you can elide extra work, the better performance
> you get. To do that you may need to double the overhead (checks) in
> the less frequent case to remove some of it in the common case.

Yes, I understand the checking for "mode" wasn't the best approach. The 
internals are mostly coded to be straightforward and to just work.

> See below. I was talking namely about calling functions to see that no
> events are fired anyway.

I can probably add a static-if before calling the functions.

> Yeah, I see, but it's still a call to delegate that's hard to inline
> (well LDC/GDC might). Would it be hard to do a compile-time check if
> there are any events with the type in question at all and then call
> triggerEvent(s)?

No, I don't think so. I can also make triggerEvents take the 
delegate by alias parameter, if that helps. Or inline it manually.
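A sketch combining both ideas from this exchange, under the assumption that a type opts into events by declaring methods named onSerializing/onSerialized (hypothetical names; Orange's actual event mechanism may differ). The work is passed as an alias template parameter instead of a delegate, and a static if removes all hook code for types without events.

```d
import std.traits : hasMember;

// Hypothetical event hooks: a type opts in by declaring these methods.
enum hasEvents(T) = hasMember!(T, "onSerializing") || hasMember!(T, "onSerialized");

// `work` is an alias parameter rather than a delegate, so the compiler sees
// the code itself and can inline it. Types without events get no hook code.
void triggerEvents(alias work, T)(ref T value)
{
    static if (hasEvents!T)
    {
        static if (hasMember!(T, "onSerializing"))
            value.onSerializing();
        work();
        static if (hasMember!(T, "onSerialized"))
            value.onSerialized();
    }
    else
        work(); // common case: nothing is checked at run time
}

struct Plain { int x = 1; }

struct Tracked
{
    string[] log;
    void onSerializing() { log ~= "before"; }
    void onSerialized() { log ~= "after"; }
}

void main()
{
    import std.stdio : writeln;

    Plain p;
    triggerEvents!(() { writeln("plain: ", p.x); })(p);

    Tracked t;
    triggerEvents!(() { t.log ~= "body"; })(t);
    writeln(t.log); // ["before", "body", "after"]
}
```

The trade-off is one template instantiation per call site, in exchange for no indirect call on the common path.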

> While we are on the subject of delegates - you absolutely should use
> 'scope delegate' as most (all?) delegates are never stored anywhere but
> rather pass blocks of code to call deeper down the line.
> (I guess it's somewhat Ruby-style, but it's not a problem).

Good idea. The reason for the delegates is to avoid begin/end 
functions. This also forces correct use of the API. Hmm, actually 
it may not, since the Serializer is technically the user of the archiver 
API, and it already uses it correctly. The developer does need to 
implement the archiver API correctly, but there's nothing that stops 
him/her from _not_ calling the delegate. Am I overthinking this?
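A minimal sketch of the delegate style with scope, assuming a toy text archive; TextArchive, archiveStruct and archiveInt are illustrative names, not Orange's exact signatures. The struct body is passed as a scope delegate instead of separate begin/end calls; scope promises the archive won't keep the delegate, which lets the compiler avoid allocating the caller's closure on the GC heap.

```d
import std.array : Appender;
import std.conv : to;

// Hypothetical archive: the body is a `scope` delegate, not begin/end calls.
struct TextArchive
{
    Appender!string buffer;

    void archiveStruct(string type, string key, scope void delegate() dg)
    {
        buffer.put(`<struct type="` ~ type ~ `" key="` ~ key ~ `">`);
        dg(); // the serializer writes the fields here
        buffer.put("</struct>");
    }

    void archiveInt(int value, string key)
    {
        buffer.put(`<int key="` ~ key ~ `">` ~ value.to!string ~ "</int>");
    }
}

void main()
{
    import std.stdio : writeln;

    TextArchive archive;
    int x = 3;
    archive.archiveStruct("Foo", "foo", () { archive.archiveInt(x, "x"); });
    writeln(archive.buffer.data); // <struct type="Foo" key="foo"><int key="x">3</int></struct>
}
```

Nothing in this signature forces archiveStruct to actually call dg, which is exactly the gap raised above.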

> Aye, as any faithful Phobos dev absolutely :)
> Seriously though ATM I just _suspect_ there is no need for Archive to be
> an interface. I would need to think this bit through more deeply but
> virtual call per field alone makes me nervous here.

Originally it was using templates. One of my design goals back then was 
to not have to use templates. Templates force a slightly more complicated 
API on the user:

auto serializer = new Serializer!(XmlArchive);

Which is fine, but I'm not very happy about the API for custom serialization:

class Foo
{
     void toData (Archive) (Serializer!(Archive) serializer);
}

The user is either forced to use templates here as well, or:

class Foo
{
     void toData (Serializer!(XmlArchive) serializer);
}

... use a single type of archive. It's also possible to pass in anything 
as Archive. Template constraints, which didn't exist back then, make 
this a bit better now.
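A sketch of how a template constraint narrows "anything as Archive", assuming a deliberately tiny archive API (isArchive, archiveInt and data are hypothetical names chosen for the example). It also shows the cost described above: the user's toData hook has to be a template too.

```d
import std.conv : to;

// Hypothetical constraint: an archive is any type with these two members.
enum isArchive(A) = is(typeof((A a) {
    a.archiveInt(1, "key");
    string s = a.data;
}));

final class XmlArchive
{
    string data;

    void archiveInt(int value, string key)
    {
        data ~= `<int key="` ~ key ~ `">` ~ value.to!string ~ "</int>";
    }
}

// With the constraint, Serializer!int is rejected at the point of use
// instead of producing an error inside the class body.
class Serializer(Archive) if (isArchive!Archive)
{
    Archive archive;

    this(Archive archive) { this.archive = archive; }

    void serialize(int value, string key) { archive.archiveInt(value, key); }
}

class Foo
{
    int x = 3;

    // The custom hook must itself be a template to accept any archive type.
    void toData(Archive)(Serializer!Archive serializer)
    {
        serializer.serialize(x, "x");
    }
}

void main()
{
    import std.stdio : writeln;

    auto serializer = new Serializer!XmlArchive(new XmlArchive);
    (new Foo).toData(serializer);
    writeln(serializer.archive.data); // <int key="x">3</int>
}
```

One further cost of this style: template member functions in D are never virtual, so a subclass of Foo cannot override toData.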

Regarding the large API an Archive has to implement, these are the criteria I 
had when creating the API, in order of importance:

1. Should be easy for a consumer to use
2. Should be easy for an archive implementor
3. Should be easy to implement the serializer

In this case, point 1 made point 2 harder. Point 2 made me 
push as much as possible into the serializer instead of having it in the 
archiver.

In the end, it's quite easy to copy-paste the API, do some search-and-replace, 
and forward methods like these:

void archiveEnum (bool value, string baseType, string key, Id id)
void archiveEnum (char value, string baseType, string key, Id id)
void archiveEnum (int value, string baseType, string key, Id id)

... to a private template method. That's what XmlArchive does:

https://github.com/jacob-carlborg/orange/blob/master/orange/serialization/archives/XmlArchive.d#L439
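A sketch of the forwarding pattern, assuming a toy archive that only has the three enum overloads listed above. EnumArchive and archiveEnumImpl are hypothetical names, and Id is assumed to be size_t here. Interfaces in D can't declare virtual template methods, which is why the API spells out one concrete overload per type; each overload is a one-liner into a private template.

```d
import std.conv : to;

alias Id = size_t; // assumption: stand-in for the archive's id type

// Hypothetical archive showing only the enum overloads from the post.
final class EnumArchive
{
    string data;

    // The archive API requires concrete, non-template overloads...
    void archiveEnum(bool value, string baseType, string key, Id id) { archiveEnumImpl(value, baseType, key, id); }
    void archiveEnum(char value, string baseType, string key, Id id) { archiveEnumImpl(value, baseType, key, id); }
    void archiveEnum(int value, string baseType, string key, Id id) { archiveEnumImpl(value, baseType, key, id); }

    // ...which all forward to one private template holding the real logic.
    private void archiveEnumImpl(T)(T value, string baseType, string key, Id id)
    {
        data ~= `<enum type="` ~ baseType ~ `" key="` ~ key ~ `" id="` ~ id.to!string ~ `">`
            ~ value.to!string ~ "</enum>";
    }
}

void main()
{
    import std.stdio : writeln;

    auto archive = new EnumArchive;
    archive.archiveEnum(42, "int", "color", 1);
    archive.archiveEnum('c', "char", "grade", 2);
    writeln(archive.data);
}
```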

-- 
/Jacob Carlborg