Range interface for std.serialization

Thu Aug 22 19:06:33 PDT 2013

On Thursday, 22 August 2013 at 14:48:57 UTC, Dicebot wrote:
> On Thursday, 22 August 2013 at 03:13:46 UTC, Tyler Jameson 
> Little wrote:
>> On Wednesday, 21 August 2013 at 20:21:49 UTC, Dicebot wrote:
>>> It should be range of strings - one call to popFront should 
>>> serialize one object from input object range and provide 
>>> matching string buffer.
>>
>> I don't like this because it still caches the whole object 
>> into memory. In a memory-restricted application, this is 
>> unacceptable.
>
> Well, in memory-restricted applications having large object at 
> all is unacceptable. Rationale is that you hardly ever want 
> half-deserialized object. If environment is very restrictive, 
> smaller objects will be used anyway (list of smaller objects).

It seems you and I are trying to solve two very different 
problems. Perhaps if I explain my use-case, it'll make things 
clearer.

I have a server that deserializes data from a socket, processes 
that data, then updates internal state and sends notifications to 
clients (which involves serialization as well).

When new clients connect, they need all of this internal state, 
so the easiest way to do this is to create one large object out 
of all of the smaller objects:

     class Widget {
     }

     class InternalState {
         Widget[string] widgets;
         // ... other data here
     }

InternalState isn't very big by itself; it just has an 
associative array of Widget references plus some other rather 
small data. When serialized, however, this can get quite large. 
Since archive formats are orders of magnitude less space-efficient 
than in-memory stores, caching the archived version of the 
internal state can be prohibitively expensive.

Let's say the serialized form of the internal state is 5MB, and I 
have 128MB available, of which 50MB or so is used by the 
application. This leaves about 70MB, so if each client gets its 
own 5MB buffer, I can only support about 14 connected clients.

With a streaming serializer (per object), I'll get that 5MB per 
client down to a few hundred KB and I can support many more 
clients.
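
A rough sketch of what I mean by streaming per object. Everything 
here is an assumption for illustration: Widget's fields, the toy 
format in serializeWidget, and the sink delegate are hypothetical 
stand-ins, not a proposed std.serialization API. The point is only 
that the largest buffer alive at any time is one widget's archive, 
not the archive of the whole state.

```d
import std.format : format;

// Hypothetical stand-ins for the types in this post.
class Widget
{
    string name;
    int value;
    this(string name, int value) { this.name = name; this.value = value; }
}

class InternalState
{
    Widget[string] widgets;
}

// Hypothetical toy archive format; a real archiver would go here.
string serializeWidget(const Widget w)
{
    return format(`{"name":"%s","value":%d}`, w.name, w.value);
}

// Hands the archive to `sink` piece by piece, one widget at a time,
// so the whole serialized state is never held in memory at once.
void serializeChunked(InternalState state,
                      scope void delegate(const(char)[]) sink)
{
    sink("[");
    bool first = true;
    foreach (key, w; state.widgets)
    {
        if (!first) sink(",");
        first = false;
        sink(serializeWidget(w));
    }
    sink("]");
}

unittest
{
    auto state = new InternalState;
    state.widgets["a"] = new Widget("a", 1);
    state.widgets["b"] = new Widget("b", 2);

    size_t total, peak;
    serializeChunked(state, (const(char)[] chunk) {
        total += chunk.length;
        if (chunk.length > peak) peak = chunk.length;
    });
    // The largest single chunk is smaller than the full archive.
    assert(peak < total);
}
```

In the server, the sink would write each chunk straight to the 
client's socket instead of counting bytes.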

>> ...
>> There's no reason why the serializer can't output this in 
>> chunks
>
> Outputting on its own is not useful to discuss - in pipe model 
> output matches input. What is the point in outputting partial 
> chunks of serialized object if you still need to provide it as 
> a whole to the input?

This only makes sense if you are deserializing right after 
serializing, which is *not* a common thing to do.

Also, it's much more common to need to serialize a single object 
(as in a REST API, a 3D model parser [think COLLADA], or a config 
parser). Providing a range seems to fit only a small niche: 
people who need to dump the state of the system. With 
single-object serialization and chunked output, you can define 
your own range to get the same effect, but with the API you 
described, you can't avoid memory problems without going outside 
std.
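
To illustrate the "define your own range" part, here is a minimal 
sketch of an input range layered on top of a single-object 
serializer. SerializedRange and serializeOne are hypothetical 
names I made up for this example; serializeOne just stands in for 
whatever single-object call the library would provide.

```d
import std.conv : to;
import std.range.primitives : isInputRange;

// Hypothetical single-object serializer standing in for the real one.
string serializeOne(int value)
{
    return value.to!string;
}

// Input range that serializes one element at a time, on demand.
struct SerializedRange(alias serialize, T)
{
    private T[] items;
    private string current;
    private bool primed;

    this(T[] items) { this.items = items; }

    @property bool empty() const { return items.length == 0; }

    @property string front()
    {
        // Serialize lazily, and only once per element.
        if (!primed)
        {
            current = serialize(items[0]);
            primed = true;
        }
        return current;
    }

    void popFront()
    {
        items = items[1 .. $];
        primed = false;
        current = null; // drop the previous buffer
    }
}

static assert(isInputRange!(SerializedRange!(serializeOne, int)));

unittest
{
    auto r = SerializedRange!(serializeOne, int)([1, 2, 3]);
    string[] got;
    foreach (s; r) got ~= s;
    assert(got == ["1", "2", "3"]);
}
```

This gives the one-string-per-popFront behaviour from the range 
proposal, built on single-object serialization. Building it the 
other way around is the part that doesn't work.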