Range interface for std.serialization

Wed Aug 21 20:13:34 PDT 2013

On Wednesday, 21 August 2013 at 20:21:49 UTC, Dicebot wrote:
> My 5 cents:
>
> On Wednesday, 21 August 2013 at 18:45:48 UTC, Jacob Carlborg 
> wrote:
>> If this alternative is chosen how should the range for the 
>> XmlArchive work like? Currently the archive returns a string, 
>> should the range just wrap the string and step through 
>> character by character? That doesn't sound very effective.
>
> It should be range of strings - one call to popFront should 
> serialize one object from input object range and provide 
> matching string buffer.

I don't like this because it still buffers the whole serialized 
object in memory. In a memory-restricted application, this is 
unacceptable.

I think one call to popFront should yield the next part of the 
serialized object. For example:

struct B {
     int c, d;
}

struct A {
     int a;
     B b;
}

The JSON output of this would be:

     {
         "a": 0,
         "b": {
             "c": 0,
             "d": 0
         }
     }

There's no reason why the serializer can't output this in chunks:

Chunk 1:

     {
         "a": 0,

Chunk 2:

         "b": {

Etc...

Most archive formats should support chunking. I realize this may 
be a rather large change to Orange, but I think it's a direction 
it should be headed.
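
A minimal sketch of what such a chunked range could look like for 
the structs above. ChunkedJson is a hypothetical name and is 
hand-written for A; it is not part of Orange or std.serialization. 
A real serializer would walk the type generically, but the shape of 
the interface (one chunk per popFront) would be the same:

```d
import std.conv : to;
import std.range.primitives : isInputRange;

struct B { int c, d; }
struct A { int a; B b; }

// Hypothetical chunked serializer for A: an input range of strings
// that exposes one piece of the JSON output per popFront call, so the
// whole serialized object never has to sit in memory at once.
struct ChunkedJson
{
    private A value;
    private size_t step;   // index of the chunk `front` refers to

    @property bool empty() const { return step >= 6; }

    @property string front() const
    {
        switch (step)
        {
            case 0: return `{"a": ` ~ value.a.to!string ~ `,`;
            case 1: return `"b": {`;
            case 2: return `"c": ` ~ value.b.c.to!string ~ `,`;
            case 3: return `"d": ` ~ value.b.d.to!string;
            case 4: return `}`;   // closes b
            case 5: return `}`;   // closes the outer object
            default: assert(0, "front called on an empty range");
        }
    }

    void popFront() { ++step; }
}

static assert(isInputRange!ChunkedJson);

void main()
{
    import std.stdio : writeln;

    // Each iteration receives one chunk; a consumer could write it to
    // a file or socket before the next chunk is even built.
    foreach (chunk; ChunkedJson(A(1, B(2, 3))))
        writeln(chunk);
}
```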

>> Alternative AO2:
>>
>> Another idea is the archive is an output range, having this 
>> interface:
>>
>> auto archive = new XmlArchive!(char);
>> archive.writeTo(outputRange);
>>
>> auto serializer = new Serializer(archive);
>> serializer.serialize(new Object);
>>
>> Use the output range when the serialization is done.
>
> I can't imagine a use case for this. Adding ranges just because 
> you can is not very good :)

I completely agree.

>> A problem with this, actually I don't know if it's considered 
>> a problem, is that the following won't be possible:
>>
>> auto archive = new XmlArchive!(InputRange);
>> archive.data = archive.data;
>
> What this snippet should do?
>
>> Which one would usually expect from an OO API. The problem 
>> here is that the archive is typed for the original input range 
>> but the returned range from "data" is of a different type.
>
> Range-based algorithms don't assign ranges. Transferring data 
> from one range to another is done via copy(sourceRange, 
> destRange) and similar tools.

This is just a read-only property, which arguably doesn't break 
expectations. There should be no reason to assign directly to a 
range.
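
For reference, a small sketch of the copy-based transfer mentioned 
in the quote, using std.algorithm.copy and std.array.appender from 
Phobos. The chunk data is made up and only stands in for whatever a 
read-only "data" property would return:

```d
import std.algorithm : copy;
import std.array : appender;

void main()
{
    // Stand-in for the range an archive might expose through a
    // read-only `data` property (made-up data for illustration).
    string[] chunks = ["<a>", "0", "</a>"];

    // Any output range can be the destination; an appender is used
    // here. copy returns the target, so the result is taken from it.
    auto sink = copy(chunks, appender!(string[])());

    assert(sink.data == ["<a>", "0", "</a>"]);
}
```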

> It looks like difficulties come from your initial assumption 
> that one call to serialize/deserialize implies one object - in 
> that model ranges hardly are useful. I don't think it is a 
> reasonable restriction. What is practically useful is 
> (de)serialization of large list of objects lazily - and this is 
> a natural job for ranges.

I agree that (de)serializing a large list of objects lazily is 
important, but I don't think that's the natural interface for a 
Serializer. I think each individual object should instead be 
serialized lazily, to maximize throughput.

If a Serializer is defined as (de)serializing only a single 
object, then serializing a range of Type would be as simple as 
using map() with a Serializer (yielding a range of Serializers). 
If the allocations are too costly, a single Serializer can be 
reused to serialize the objects one at a time.
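
A rough sketch of the map() idea, assuming a hypothetical 
per-object function serializeChunks and a made-up Point type 
(neither exists in Orange). For brevity the per-object chunks are 
returned as an array here; a real Serializer would produce them 
lazily:

```d
import std.algorithm : joiner, map;
import std.conv : to;

struct Point { int x, y; }

// Hypothetical per-object serializer: returns the chunks for one
// object. Stands in for a Serializer that handles a single value.
string[] serializeChunks(Point p)
{
    return [`{"x": ` ~ p.x.to!string ~ `,`,
            `"y": ` ~ p.y.to!string ~ `}`];
}

void main()
{
    import std.stdio : writeln;

    auto points = [Point(1, 2), Point(3, 4)];

    // map yields a lazy range with one chunk range per object;
    // joiner flattens that into a single lazy stream of chunks.
    auto chunks = points.map!serializeChunks.joiner;

    foreach (chunk; chunks)
        writeln(chunk);
}
```

Nothing is serialized until the foreach pulls a chunk, so a large 
list of objects is still processed lazily even though each 
serializer only knows about one object.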

My main point here is that data should be written as it's being 
serialized. In a networked application, it may take a few packets 
to encode a larger object, so the first packets should be sent 
ASAP.
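
The same idea in push style, as a sketch: the serializer writes 
each chunk to an output range as soon as it is produced. PacketSink 
and this serialize function are hypothetical stand-ins (for a 
socket and for the real serializer), and A and B are the structs 
from the example earlier in this post:

```d
import std.conv : to;
import std.range.primitives : isOutputRange, put;

struct B { int c, d; }
struct A { int a; B b; }

// Hypothetical stand-in for a socket: records every chunk it
// receives as one "packet".
struct PacketSink
{
    string[] packets;
    void put(const(char)[] chunk) { packets ~= chunk.idup; }
}

static assert(isOutputRange!(PacketSink, string));

// Writes each piece to the sink as soon as it is built, instead of
// assembling the whole serialized object first.
void serialize(Sink)(ref Sink sink, A value)
{
    put(sink, `{"a": ` ~ value.a.to!string ~ `,`);
    put(sink, `"b": {"c": ` ~ value.b.c.to!string ~ `,`);
    put(sink, `"d": ` ~ value.b.d.to!string ~ `}}`);
}

void main()
{
    PacketSink sink;
    serialize(sink, A(1, B(2, 3)));

    // Three chunks went out separately; the first could already be
    // on the wire before the last was built.
    assert(sink.packets.length == 3);
}
```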

As usual, feel free to destroy =D