Request for pre-review: std.serialization/orange

Sat Oct 1 15:52:01 PDT 2011

On Sat, 01 Oct 2011 07:18:59 -0400, Jacob Carlborg <doob at me.com> wrote:
> On 2011-10-01 06:29, Robert Jacques wrote:
>> On Thu, 29 Sep 2011 14:58:30 -0400, Jacob Carlborg <doob at me.com> wrote:

[snip]

>> (2)
>> orange.serialization.archives.XmlArchive need to be documented.
>
> I was hoping the Archive interface and the Base abstract class would be
> enough.

For the pre-review its okay. But you'll need it for the actual review.

>> (3)
>> if Archive.Array (which is poorly named btw) "is a type independent
>> representation of an array" then why does it contain an elementSize field?
>
> Suggestions for other names are welcome. Perhaps it was poorly worded,
> but what I mean is that this type can represent all array types.
>
>> (3a)
>> Also by the time archiving is called, isSliceOf should always return
>> false.
>
> Why is that?

If isSliceOf can return true, then that means that the archive is responsible for alias detection, management, etc. That means that every single archive format must implement an alias resolution algorithm. This results in a lot of copy/paste boiler plate which has to be maintained. It also makes it more difficult to get extra formats supported. Worse if someone forgets to either include this functionality or to apply a patch, silent bugs and/or slowdowns are introduced. All and archiver should be responsible for is taking some decorated data structure and converting it to XML/JSON/YAML/etc and back again. Anything more complex than that should be shared functionality located in the serializer / de-serializer objects.

>> That this function exists speaks to design problems both large
>> and small. On the small scale, isSliceOf indicates that you are testing
>> every array against every other array for slicing, which I hope is not
>> the case.
>
> I do. How would I otherwise discover if an array is a slice of another
> array or not?

Okay, first some rational. Consider:

assert(!a.isSliceOf(b));
assert(!b.isSliceOf(a));
assert( c.isSliceOf(a));
assert( c.isSliceOf(b));

and

class Foo {
	float x;
	float[3] point;
}

void main() {
	auto foo = new Foo;
	auto ptr = &foo.x;
	auto slice = point[0..2];
}

In the first case, a, b and c are all slices of a common root array, but the root array may not be serialized. In the second case, first you have a pointer to the inside of an object and second you have a slice of a static array inside an object, all three of which may be serialized together. My impression from your API (so this might not be correct) is that currently, you can't handle the above use cases. Even if you can, an O(N^2) algorithm is rather inefficient.

The solution, in my mind, is to think in terms of memory blocks/chucks. Every reference can be thought as pointing to a memory chunk defined by two pointers and a flag:

{
	void* head;			// Start of the memory chunk
	void* tail;			// End of the memory chunk
	bool  hasAliases;	// True if there are more than one reference to this chunk
}

For alias detection / resolution, you build a balanced tree of memory chunks, widening the chunk and flagging hasAliases as appropriate. Which should give you O(N log(N)) performance.

As an optimization, the user should be able to 'finalize' the serialization by pruning away all memory chunks without aliases. (i.e. a serializeAndClear method)