Range interface for std.serialization

Thu Aug 22 07:16:54 PDT 2013

On Thursday, 22 August 2013 at 07:16:11 UTC, Jacob Carlborg wrote:
> On 2013-08-22 05:13, Tyler Jameson Little wrote:
>
>> I don't like this because it still caches the whole object into
>> memory. In a memory-restricted application, this is unacceptable.
>
> It needs to store all serialized reference types, otherwise it
> cannot properly serialize a complete object graph. We don't
> want duplicates. Example:
>
> The following code:
>
> auto bar = new Bar;
> bar.a = 3;
>
> auto foo = new Foo;
> foo.a = bar;
> foo.b = bar;
>
> Is serialized as:
>
> <object runtimeType="main.Foo" type="main.Foo" key="0" id="0">
>     <object runtimeType="main.Bar" type="main.Bar" key="a" id="1">
>         <int key="a" id="2">3</int>
>     </object>
>     <reference key="b">1</reference>
> </object>
>
> Here "foo.b" is serialized as just a reference, not the complete
> object, because the object has already been serialized. The
> serializer needs to keep track of that.

Right, but it only needs to remember which references it has already 
written. It doesn't need to keep the serialized data in memory.
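
To make that concrete, here is a rough sketch of what I mean. Everything in it (RefTracker, write, demo, the simplified tag layout) is made up for illustration and is not the actual std.serialization API. The only state kept is a table from object address to id; each piece of output goes to the sink as soon as it is produced:

```d
import std.array : Appender;
import std.format : formattedWrite;

class Bar { int a; }

// Hypothetical writer: it remembers object identities only and pushes
// every piece of output straight to the sink.
struct RefTracker(Sink)
{
    Sink sink;            // any output range of characters
    size_t[void*] seen;   // object address -> id assigned on first write
    size_t nextId;

    void write(string key, Bar obj)
    {
        auto addr = cast(void*) obj;
        if (auto known = addr in seen)
        {
            // Already written once: emit a reference, not a second copy.
            sink.formattedWrite(`<reference key="%s">%s</reference>`, key, *known);
            return;
        }
        immutable id = nextId++;
        seen[addr] = id;
        sink.formattedWrite(`<object type="Bar" key="%s" id="%s">`, key, id);
        sink.formattedWrite(`<int key="a">%s</int></object>`, obj.a);
    }
}

string demo()
{
    auto bar = new Bar;
    bar.a = 3;

    // Appender stands in for a socket or file; any output range would do.
    RefTracker!(Appender!string) w;
    w.write("a", bar);
    w.write("b", bar);
    return w.sink.data;
}

void main()
{
    assert(demo() ==
        `<object type="Bar" key="a" id="0"><int key="a">3</int></object>` ~
        `<reference key="b">0</reference>`);
}
```

The table grows with the number of distinct objects, not with the size of the output, which is the point for memory-restricted applications.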

>> I think one call to popFront should release part of the serialized
>> object. For example:
>>
>> struct B {
>>     int c, d;
>> }
>>
>> struct A {
>>     int a;
>>     B b;
>> }
>>
>> The JSON output of this would be:
>>
>>     {
>>         "a": 0,
>>         "b": {
>>             "c": 0,
>>             "d": 0
>>         }
>>     }
>>
>> There's no reason why the serializer can't output this in chunks:
>>
>> Chunk 1:
>>
>>     {
>>         "a": 0,
>>
>> Chunk 2:
>>
>>         "b": {
>>
>> Etc...
>
> It seems hard to keep track of nesting. I can't see how pretty 
> printing using this technique would work.

Can't you just keep a counter? When you enter anything that would 
increase the indentation level, increment the indentation level. 
When leaving, decrement. At each level, insert whitespace equal 
to indentationLevel * whitespacePerLevel. This seems pretty 
trivial, unless I'm missing something.
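
Something like the following is what I have in mind. It's only a sketch: IndentWriter, open, field, close and demo are hypothetical names, and the chunks array stands in for whatever sink the serializer writes to. The only nesting state is the counter:

```d
import std.array : replicate;
import std.stdio : writeln;

// Hypothetical chunked writer; nesting is tracked by a single counter.
struct IndentWriter
{
    enum whitespacePerLevel = 4;
    size_t indentationLevel;
    string[] chunks;   // stands in for a socket or file sink

    private void line(string text)
    {
        // Prefix each chunk with indentationLevel * whitespacePerLevel spaces.
        chunks ~= replicate(" ", indentationLevel * whitespacePerLevel) ~ text;
    }

    void open(string text)  { line(text); ++indentationLevel; }
    void field(string text) { line(text); }
    void close(string text) { --indentationLevel; line(text); }
}

string[] demo()
{
    IndentWriter w;
    w.open("{");
    w.field(`"a": 0,`);
    w.open(`"b": {`);
    w.field(`"c": 0,`);
    w.field(`"d": 0`);
    w.close("}");
    w.close("}");
    return w.chunks;
}

void main()
{
    // Each chunk could be sent as soon as it is produced.
    foreach (chunk; demo())
        writeln(chunk);
}
```

Each chunk is complete on its own, so nothing already written has to be revisited when the nesting level changes.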

Also, I didn't check, but pretty-printing is turned off by 
default, right?

>> This is just a read-only property, which arguably doesn't break
>> misconceptions. There should be no reason to assign directly 
>> to a range.
>
> How should I set the data used for deserializing?

How about passing it in with a function? Each range passed this 
way would represent a single object, so the current 
deserialize!Foo(InputRange) would work the same way it does now.

>> I agree that (de)serializing a large list of objects lazily is
>> important, but I don't think that's the natural interface for a
>> Serializer. I think that each object should be lazily serialized
>> instead to maximize throughput.
>>
>> If a Serializer is defined as only (de)serializing a single object,
>> then serializing a range of Type would be as simple as using map()
>> with a Serializer (getting a range of Serialize). If the allocs are
>> too much, then the same serializer can be used, but serialize
>> one-at-a-time.
>>
>> My main point here is that data should be written as it's being
>> serialized. In a networked application, it may take a few packets to
>> encode a larger object, so the first packets should be sent ASAP.
>>
>> As usual, feel free to destroy =D
>
> Again, how does one keep track of nesting in formats like XML, 
> JSON and YAML?

YAML will take a little extra care since whitespace is 
significant, but it should work well enough as I've described 
above.