Request for pre-review: std.serialization/orange

Mon Oct 3 22:21:24 PDT 2011

On Mon, 03 Oct 2011 14:10:52 -0400, Jacob Carlborg <doob at me.com> wrote:
> On 2011-10-03 15:57, Robert Jacques wrote:
>> So, in essence, you are saying that by the time archiving occurs,
>> isSliceOf will always return false? Then why is it part of the public API?
>
> No, I'm not saying that. Example:
>
> struct Array
> {
> 	void* ptr;
> 	size_t length;
> 	size_t elementSize;
> 	
> 	bool isSliceOf (Array b)
> 	{
> 		return ptr >= b.ptr && ptr + length * elementSize <= b.ptr + b.length
> * b.elementSize;
> 	}
> }
>
> void main ()
> {
> 	auto a = [0, 1, 2, 3, 4];
> 	auto b = a[2 .. 4];
> 	
> 	auto aArr = Array(a.ptr, a.length, 4);
> 	auto bArr = Array(b.ptr, b.length, 4);
> 	
> 	assert(bArr.isSliceOf(aArr));
> 	assert(!aArr.isSliceOf(bArr));
> }
>
> Both the asserts in the above code passes as expected. See, no
> serialization or archiving in sight. Is there something I'm missing?

That putting isSliceOf in the public API, implies its usage by the archiver.

> Actually it does not need to be part of the public API when I think
> about it. I can move it into Serializer. Array would still need to be
> public since both Serailzer and Archive need access to it and the
> package attribute doesn't work very well.

Regarding design, I agree, although I'd go one further and define Array as a public type inside the Serializer class. However, this concept of an 'Array' is fundamentally flawed. Consider:

auto c = a[1..3];

auto cArr = Array(c.ptr,c.length,4);

assert(!cArr.isSliceOf(bArr));
assert(!bArr.isSliceOf(cArr));

// and

b ~= 5;

bArr = Array(b.ptr, b.length, 4);

assert(!bArr.isSliceOf(aArr));

In short, a serializer must be capable of handling overlapping arrays, not just strict slices. The array representation therefore needs to parameterize both the underlying array common to all slices and the actual slice that was serialized for that key.

>>>> The solution, in my mind, is to think in terms of memory blocks/chucks.
>>>> Every reference can be thought as pointing to a memory chunk defined by
>>>> two pointers and a flag:
>>>>
>>>> {
>>>> void* head; // Start of the memory chunk
>>>> void* tail; // End of the memory chunk
>>>> bool hasAliases; // True if there are more than one reference to this
>>>> chunk
>>>> }
>>>>
>>>> For alias detection / resolution, you build a balanced tree of memory
>>>> chunks, widening the chunk and flagging hasAliases as appropriate. Which
>>>> should give you O(N log(N)) performance.
>>>
>>> I'm not sure I understand. That would require that the arrays are stored
>>> in a continues block of memory? Won't "head" and "tail" always point to
>>> start of the array and the end of the array?
>>
>> Most of the time yes. But not all of the time. It would be fair to say
>> that 'head' and 'tail' would be inside the GC memory region of the array
>> and are <= or >= of the start/end of an array, respectively. More
>> importantly, both objects and pointers would resolve to memory chunks as
>> well and be included in the alias resolution algorithm.
>
> Now I think I start to understand.
>
>> I'm assuming you
>> currently separate object and array alias resolution and don't handle
>> pointers at all.
>
> Yes. Pointers are handled as well. It's handled similar to
> arrays/slices. First it always serializes what the pointer points to.
> Then in a post process step, after all serialization is done, it
> replaces all serialized pointers, that points to a value that has been
> serialized, with a reference. If a given pointer doesn't point to a
> value that has be serialized it's left as is.

Can a pointer point to the interior of an object? To an element of an array?

> I can see if I this memory chunk approach can be used instead. How will
> this be used with the balanced tree, could you give a simple example?

Well, balanced trees need a comparison function so:

struct Node {
     void* head; // Start of the memory chunk
     void* tail; // End of the memory chunk
     bool hasAliases; // True if there are more than one reference to this chunk
     //... Other meta-data, i.e. ID,

     int opCmp(const ref Node b) {
         if(  tail < b.head) return -1;
         if(b.tail <   head) return 1;
         return 0;
     }
}

On equality / assignment, one just has to combine the heads and tail with min/max, respectively, and update hasAliases, etc. The difficulty is when a new node 'bridges the gap' between two existing nodes. This has to handled explicitly as part of the tree re-balancing, but you may want to consider making the merging of nodes part of the comparison operator for simplicity/efficiency:

head = min(head,b.head);
tail = max(tail,b.tail);
hasAliases = true;

After pruning, updating meta-data, etc, the aliased memory chunk for any given pointer can be found using a separate comparison operator:

     int opCmp(const ref void* b) {
         if(tail < b   ) return -1;
         if(b    < head) return 1;
         return 0;
     }

// Which you'd probably use like:

if( auto node = arr.ptr in setOfAliases ) {
     auto offset = arr.ptr - node.head;
     //...
} else {//...

So at the pseudo-code level, it would be something like:

foreach(obj; [objects,pointers,arrays]) {
    auto node = Node(obj);
    setOfAliases.insert(node);
    // Convert obj into an intermediate form
}

setOfAliases.pruneUnaliasedNodes;
setOfAliases.postProcessMetadata; // i.e. assign unique alias ID, etc

foreach(obj; [objects,pointers,arrays]) {
     if( auto node = arr.ptr in setOfAliases ) {
         auto offset = arr.ptr - node.head;
         // Perform an aliased archival
     } else {
         // Perform an unaliased archival
     }
}

I hope that helped, though I'm not sure if I really answered your question or not.