Range interface for std.serialization

Dmitry Olshansky dmitry.olsh at gmail.com
Mon Aug 26 09:41:39 PDT 2013


26-Aug-2013 18:37, Jacob Carlborg wrote:
> On 2013-08-26 15:57, Dmitry Olshansky wrote:
>
>> It's not just yet another. It isn't about a particular shade of color. I
>> can explain the ifs and whys of any design decision here if there is
>> doubt. I don't care about names, but I see the precise semantics, and
>> there is little left to define.
>
> Yes, please do.

> As I see it, there are four parts of the interface that
> need to be solved:
>
> 1. How to get data into the serializer
> 2. How to get data out of the serializer
> 3. How to get data into the archiver
> 4. How to get data out of the archiver

That's more a question of implementation, then.

The answer for both of them: the archiver wraps an output range, and the
serializer is itself an output range. As for the connection with your
current API, which hands out an array: just think std.array.Appender (and
a multitude of other ways to consume the data).
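A minimal sketch of how an array-returning API can sit on top of a range-based core, assuming a toy `<int>` tag format; `serializeTo` and `serializeToString` are hypothetical names for illustration, not part of std.serialization:

```d
import std.array : appender;
import std.format : formattedWrite;
import std.range.primitives : isOutputRange;

// Hypothetical range-based core: writes each value into any char sink
// and returns nothing.
void serializeTo(Sink)(ref Sink sink, const(int)[] values)
    if (isOutputRange!(Sink, char))
{
    foreach (v; values)
        formattedWrite(sink, "<int>%s</int>", v);
}

// Hypothetical convenience wrapper that keeps an array-returning API
// by collecting the output in a std.array.Appender.
string serializeToString(const(int)[] values)
{
    auto app = appender!string();
    serializeTo(app, values);
    return app.data;
}

void main()
{
    assert(serializeToString([1, 2]) == "<int>1</int><int>2</int>");
}
```

The array is then just one possible sink among many, rather than something every operation has to produce.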

Having finally looked at your current code in depth, I find it hard to
even begin giving productive answers to these questions.

First things first: there should be no key parameter, apart from whatever
the archiver adds itself for its internal needs. Nor is there a simple
way to locate data by key afterwards (certainly not every format defines
such a thing). That would require some kind of tagged object model, and
serialization has no such _requirement_.

Quoting from archive.d:
"""
There are a couple of limitations when implementing a new archive, this 
is due* to how the serializer and the archive interface is built. Except 
for what this interface says explicitly an archive needs to be able to 
handle the following:
Unarchive a value based on a key or id, regardless of where in the 
archive  the value is located
"""
This is impossible in a serialization setting.

Serialization is NOT about modeling the whole dataset and providing
queries into it. The model of serialization is that of Unix _tar_: just
dump arbitrary graphs of data to "TAPE" and/or restore them back. If you
can process the data on the fly, bonus points (and what I am pushing for
can).
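A minimal sketch of the "tape" model, assuming a toy format where fields are written in declaration order, each terminated by ';'. `Point`, `dump`, `restore` and `roundTrip` are hypothetical names; the point is that restoring only needs to read the tape front to back in the same order, with no keys and no random access:

```d
import std.algorithm.iteration : splitter;
import std.array : appender;
import std.conv : to;
import std.format : formattedWrite;

struct Point { int x; int y; }

// Write phase: append fields in declaration order, no keys, no index.
void dump(Sink)(ref Sink sink, Point p)
{
    foreach (field; p.tupleof)
        formattedWrite(sink, "%s;", field);
}

// Read phase: consume tokens sequentially in the same order.
Point restore(Input)(ref Input tokens)
{
    Point p;
    foreach (ref field; p.tupleof)
    {
        field = tokens.front.to!(typeof(field));
        tokens.popFront();
    }
    return p;
}

Point[] roundTrip(Point[] points)
{
    auto tape = appender!string();
    foreach (p; points)
        dump(tape, p);

    // splitter is lazy, so tokens are produced on the fly.
    auto tokens = tape.data.splitter(';');
    Point[] restored;
    foreach (_; points)
        restored ~= restore(tokens);
    return restored;
}

void main()
{
    assert(roundTrip([Point(1, 2), Point(3, 4)]) == [Point(1, 2), Point(3, 4)]);
}
```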

This confusion and its consequences are no doubt due to building on 
std.xml and the interface it presents.

Second: no, not every operation has to return the piece of Data it
produced. That would be tremendously inefficient and would require
keeping references to that memory alive (or be unsafe in addition to
slow). Instead, each operation just writes to the underlying sink.

import std.format : formattedWrite;
import std.range.primitives : isOutputRange;

class Serializer
{
	...
	//now we are output range for anything.
	//add constraint as you see fit
	void put(T)(T value)
	{
		serializeInternal(value);//calls methods of archiver
	}

	Archiver archiver;
}

class MyArchiver(Output)
	if(isOutputRange!(Output, dchar)) //or ubyte if binary
{
	...
	this(Output sink)
	{
		this.sink = sink;
	}
	
	//and a method for example
	private void archivePrimitive (T) (T value, string key, Id id)
	{
		//along the lines of this, don't take literally
		//I've no idea of the actual format for tags you use
		formattedWrite(sink, "<%s>%s</%s>", id, value, id);
	}
	...
	Output sink;
}
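A self-contained, runnable sketch of the same design, assuming a toy tag format that uses the type name as the tag. `TextArchiver`, `Serializer` and `demo` are hypothetical names; key and Id parameters are left out on purpose. The archiver wraps an output range of chars and returns nothing; the serializer is itself an output range that forwards to the archiver:

```d
import std.array : appender;
import std.format : formattedWrite;
import std.range.primitives : isOutputRange;
import std.stdio : writeln;

// Hypothetical archiver: owns a char sink and writes straight into it.
final class TextArchiver(Output) if (isOutputRange!(Output, char))
{
    Output sink;

    this(Output sink) { this.sink = sink; }

    void archivePrimitive(T)(T value)
    {
        // Toy format only; a real archiver defines its own.
        formattedWrite(sink, "<%s>%s</%s>", T.stringof, value, T.stringof);
    }
}

// Hypothetical serializer: an output range of "anything".
final class Serializer(Archiver)
{
    Archiver archiver;

    this(Archiver archiver) { this.archiver = archiver; }

    void put(T)(T value) { archiver.archivePrimitive(value); }
}

string demo()
{
    auto app = appender!string();
    auto archiver = new TextArchiver!(typeof(app))(app);
    auto serializer = new Serializer!(typeof(archiver))(archiver);
    static assert(isOutputRange!(typeof(serializer), int));

    serializer.put(42);
    serializer.put(2.5);
    return archiver.sink.data;
}

void main()
{
    writeln(demo());
}
```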


The user just writes, e.g.:

auto app = appender!(char[])();
auto archiver = new XmlArchiver!(typeof(app))(app);
auto serializer = new Serializer(archiver);

and then works with the serializer as with an output range of anything
(I showed an example before).

Then, once the data is needed, just peek at app.data and there it is.
So the in-memory case is easily covered. Other sinks bring more
benefits, e.g.:

auto sink = stdout.lockingTextWriter();

And the same code now writes directly to stdout, with no worries if
there is a lot to write.
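A minimal sketch of one archiving routine used with both sinks, assuming a toy tag format; `archiveRecord` is a hypothetical name. Because it is generic over any output range of chars, the same code fills an Appender in memory or streams through `stdout.lockingTextWriter()` without buffering the whole result:

```d
import std.array : appender;
import std.format : formattedWrite;
import std.range.primitives : isOutputRange;
import std.stdio : stdout;

// Hypothetical routine: knows nothing about where the characters go.
void archiveRecord(Sink)(ref Sink sink, string name, int value)
    if (isOutputRange!(Sink, char))
{
    formattedWrite(sink, "<%s>%s</%s>\n", name, value, name);
}

void main()
{
    // In memory: the text is available afterwards via app.data.
    auto app = appender!string();
    archiveRecord(app, "answer", 42);
    assert(app.data == "<answer>42</answer>\n");

    // Streaming: the very same routine writes straight to stdout.
    auto sink = stdout.lockingTextWriter();
    archiveRecord(sink, "answer", 42);
}
```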

....

No matter how I look at the code, it needs a lot of (re-)work.
For instance, the Archive type is obsessed with strings; I can't see a
need for that many strings attached :)
Then there is the awful duality of Serializer, which results literally in:
if(mode == serializing) doSerializing(); else doDeserializing();

And the spectacular pair:

T deserialize (T) (Data data, string key = "")
{
	mode = deserializing;
	...
}

Data serialize (T) (T value, string key = null)
{
	mode = serializing;
	...
}

The amount of extra code executed per bit of output is remarkably high,
while a hallmark of a standard library is the pay-as-you-go principle. We
(collectively, as Phobos devs) have to set the baseline for performance;
if it's too low, we're out of the game.

For example: events are cute, but does everyone need them? Do we always
want the overhead of checking for them on every field written?

Instead, decompose these layers and make them stackable, for instance:

auto serializer = new Serializer(...);
auto tracingSerializer = new TracingSerializer(serializer);
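A minimal sketch of the stacking idea, assuming a toy "value;" output format and treating tracing as recording the type name of each written value. `Serializer` and `TracingSerializer` here are hypothetical stand-ins: the plain serializer carries no event machinery at all, and only users who wrap it pay for tracing:

```d
import std.array : Appender, appender;
import std.format : formattedWrite;

// Hypothetical plain serializer: no events, no checks, just output.
class Serializer
{
    Appender!string sink;

    this() { sink = appender!string(); }

    void put(T)(T value) { formattedWrite(sink, "%s;", value); }
}

// Hypothetical layer stacked on top: adds tracing, then delegates.
class TracingSerializer
{
    Serializer inner;
    string[] log;

    this(Serializer inner) { this.inner = inner; }

    void put(T)(T value)
    {
        log ~= T.stringof;  // the "event": record what is being written
        inner.put(value);
    }
}

void main()
{
    auto serializer = new Serializer();
    auto tracingSerializer = new TracingSerializer(serializer);
    tracingSerializer.put(1);
    tracingSerializer.put(true);
    assert(serializer.sink.data == "1;true;");
    assert(tracingSerializer.log == ["int", "bool"]);
}
```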

Or just make two kinds of serializers via static if on a single template
parameter, bool withEvents; it's trivial. A couple of aliases would then
finish the job.
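A minimal sketch of the single-template variant, assuming a toy "value;" format and a hypothetical `onValue` delegate as the event hook. `SerializerImpl`, `Serializer` and `EventSerializer` are illustrative names. With `withEvents == false` the hook member and its check are not compiled in at all:

```d
import std.array : Appender, appender;
import std.format : formattedWrite;

final class SerializerImpl(bool withEvents)
{
    Appender!string sink;

    // Hypothetical hook: exists only in the event-enabled instantiation.
    static if (withEvents)
        void delegate(string typeName) onValue;

    this() { sink = appender!string(); }

    void put(T)(T value)
    {
        static if (withEvents)
        {
            if (onValue !is null)
                onValue(T.stringof);
        }
        formattedWrite(sink, "%s;", value);
    }
}

// The couple of aliases that finish the job.
alias Serializer = SerializerImpl!false;
alias EventSerializer = SerializerImpl!true;

void main()
{
    auto plain = new Serializer();
    plain.put(7);
    assert(plain.sink.data == "7;");
    static assert(!__traits(hasMember, Serializer, "onValue"));

    string[] seen;
    auto traced = new EventSerializer();
    traced.onValue = (string name) { seen ~= name; };
    traced.put(7);
    assert(seen == ["int"]);
}
```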

With that said, I observe that events are attached to types/fields...
hmm, in that case it takes some work to make them zero-cost when absent.
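One possible way to get zero cost for types without events, sketched under an assumption: here a type opts in by defining a member function with the hypothetical name `onSerializing`, which is detected at compile time; the actual event mechanism in the proposed std.serialization may well differ. Types without the hook compile to no check at all:

```d
import std.array : appender;
import std.format : formattedWrite;

int hookCalls;  // module-level (thread-local) counter for the demo

// Hypothetical: the hook is looked up at compile time, per type.
void serializeValue(Sink, T)(ref Sink sink, ref T value)
{
    static if (__traits(hasMember, T, "onSerializing"))
        value.onSerializing();
    foreach (field; value.tupleof)
        formattedWrite(sink, "%s;", field);
}

struct Plain { int a; }

struct Hooked
{
    int a;
    void onSerializing() { ++hookCalls; }
}

void main()
{
    auto app = appender!string();
    auto p = Plain(1);
    auto h = Hooked(2);
    serializeValue(app, p);  // no hook, no runtime check generated
    serializeValue(app, h);  // hook called once
    assert(app.data == "1;2;");
    assert(hookCalls == 1);
}
```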

>> Pardon me if my tone is a bit sharp. Like anyone else, I want the best
>> design we can get. Now that a great deal of work is done, it would be a
>> shame to present it in a bad package.
>
> Yes, that's why we're having this discussion.

I'm afraid it may be too late, or the changes too far-reaching, but
let's try.

I'm especially dismayed by this (and by the fact that it's part of the
interface every archiver must implement):

     void archiveEnum (bool value, string baseType, string key, Id id);

     /// Ditto
     void archiveEnum (byte value, string baseType, string key, Id id);

     /// Ditto
     void archiveEnum (char value, string baseType, string key, Id id);

     /// Ditto
     void archiveEnum (dchar value, string baseType, string key, Id id);

     /// Ditto
     void archiveEnum (int value, string baseType, string key, Id id);

     /// Ditto
     void archiveEnum (long value, string baseType, string key, Id id);

     /// Ditto
     void archiveEnum (short value, string baseType, string key, Id id);

     /// Ditto
     void archiveEnum (ubyte value, string baseType, string key, Id id);

     /// Ditto
     void archiveEnum (uint value, string baseType, string key, Id id);

     /// Ditto
     void archiveEnum (ulong value, string baseType, string key, Id id);

     /// Ditto
     void archiveEnum (ushort value, string baseType, string key, Id id);

     /// Ditto
     void archiveEnum (wchar value, string baseType, string key, Id id);
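For contrast, a minimal sketch of a single template covering every enum, assuming a toy tag format and leaving out the key and Id parameters; this `archiveEnum` is a hypothetical free function, not the existing interface method. `std.traits.OriginalType` recovers the base type at compile time, so no per-base-type overloads are needed:

```d
import std.array : appender;
import std.format : formattedWrite;
import std.traits : OriginalType;

// Hypothetical replacement for the per-base-type overloads.
void archiveEnum(Sink, E)(ref Sink sink, E value)
    if (is(E == enum))
{
    alias Base = OriginalType!E;
    formattedWrite(sink, "<enum baseType=\"%s\">%s</enum>",
                   Base.stringof, cast(Base) value);
}

enum Color : ubyte { red, green }
enum Answer : bool { no = false, yes = true }

void main()
{
    auto app = appender!string();
    archiveEnum(app, Color.green);
    archiveEnum(app, Answer.yes);
    assert(app.data == `<enum baseType="ubyte">1</enum><enum baseType="bool">true</enum>`);
}
```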


-- 
Dmitry Olshansky

