Range interface for std.serialization
Dmitry Olshansky
dmitry.olsh at gmail.com
Mon Aug 26 09:41:39 PDT 2013
26-Aug-2013 18:37, Jacob Carlborg wrote:
> On 2013-08-26 15:57, Dmitry Olshansky wrote:
>
>> It's not just yet another. It isn't about particular shade of color. I
>> can explain the ifs and whys of any design decision here if there is a
>> doubt. I don't care for names but I see the precise semantics and there
>> is little left to define.
>
> Yes, please do.
> As I see it there are four parts of the interface that
> need to be solved:
>
> 1. How to get data in to the serializer
> 2. How to get data out of the serializer
> 3. How to get data in to the archiver
> 4. How to get data out of the archiver
That is more a question of implementation, then.
The answer to both of them: the archiver wraps an output range, and the
serializer is one itself. As for the connection with your current API,
which gives away an array - just think std.array.Appender (and a
multitude of other ways to chew the data).
Having finally looked at your current code in depth, I find it hard to
start giving productive answers to these questions.
First things first - there should not be a key parameter, aside from
stuff added by the archiver itself for its internal needs. Nor is there
a simple way to locate data by key afterwards (certainly not every
format defines such a thing). It would require some tagged object model,
and there is no such _requirement_ in serialization.
Citing a line from archive.d
"""
There are a couple of limitations when implementing a new archive, this
is due* to how the serializer and the archive interface is built. Except
for what this interface says explicitly an archive needs to be able to
handle the following:
Unarchive a value based on a key or id, regardless of where in the
archive the value is located
"""
- this is impossible in the setting of serialization.
Serialization is NOT about modeling the whole dataset and providing
queries into it. The model of serialization is that of unix _tar_: just
dump any graph of data to "TAPE" and/or restore it back. If you can do
the processing on the fly - bonus points (and what I push for can).
This confusion and its consequences are no doubt due to building on
std.xml and the interface it presents.
Second - no, not every operation has to return some piece of Data that
is produced. It would be tremendously inefficient and require keeping
memory references to that alive (or be unsafe in addition to slow).
Instead it just outputs something to the underlying sink.
class Serializer
{
    ...
    //now we are output range for anything.
    //add constraint as you see fit
    void put(T)(T value)
    {
        serializeInternal(value);//calls methods of archiver
    }
    Archiver archiver;
}

class MyArchiver(Output)
    if(isOutputRange!(Output, dchar)) //or ubyte if binary
{
    ...
    this(Output sink)
    {
        this.sink = sink;
    }
    //and a method for example
    private void archivePrimitive (T) (T value, string key, Id id)
    {
        //along the lines of this, don't take literally
        //I've no idea of the actual format for tags you use
        formattedWrite(sink, "<%s>%s</%s>", id, value, id);
    }
    ...
    Output sink;
}
The user just writes e.g.
    auto app = appender!(char[])();
    auto archiver = new XmlArchiver(app);
    auto serializer = new Serializer(archiver);
and works with the serializer as with an output range of anything (I
showed the example before).
Then once the data is required just peek at app.data and there it is.
So the in-memory case is easily covered. Other sinks bring more
benefits, see e.g.:
    auto sink = stdout.lockingTextWriter();
And the same code now writes directly to stdout, with no worries if
there is a lot of stuff to write.
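To make the shape of this concrete, here is a minimal self-contained
sketch of the idea: an archiver that wraps an output range, and a
serializer that is itself an output range "of anything". The names
SketchArchiver, SketchSerializer, beginObject, endObject and Point are
made up for illustration, and the tag format is an assumption; none of
this is the actual std.serialization or Orange API.

```d
import std.array : appender;
import std.format : formattedWrite;
import std.range.primitives : isOutputRange;
import std.stdio : writeln;

// Hypothetical archiver: formats values straight into the wrapped sink.
final class SketchArchiver(Output)
    if (isOutputRange!(Output, char))
{
    private Output sink;

    this(Output sink) { this.sink = sink; }

    void beginObject(string tag) { formattedWrite(sink, "<%s>", tag); }
    void endObject(string tag)   { formattedWrite(sink, "</%s>", tag); }

    // Made-up tag format, only to have something visible in the output.
    void archivePrimitive(T)(T value, string tag)
    {
        formattedWrite(sink, "<%s>%s</%s>", tag, value, tag);
    }
}

// Hypothetical serializer: put() accepts any value and returns nothing,
// everything goes to the archiver's sink.
final class SketchSerializer(Archiver)
{
    private Archiver archiver;

    this(Archiver archiver) { this.archiver = archiver; }

    void put(T)(T value)
    {
        serializeInternal(value, T.stringof);
    }

    private void serializeInternal(T)(T value, string tag)
    {
        static if (is(T == struct))
        {
            // Walk the fields at compile time, tagging each by its name.
            archiver.beginObject(tag);
            foreach (i, field; value.tupleof)
                serializeInternal(field, __traits(identifier, T.tupleof[i]));
            archiver.endObject(tag);
        }
        else
        {
            archiver.archivePrimitive(value, tag);
        }
    }
}

struct Point { int x; int y; }

void main()
{
    auto app = appender!(char[])();
    auto archiver = new SketchArchiver!(typeof(app))(app);
    auto serializer = new SketchSerializer!(typeof(archiver))(archiver);

    serializer.put(42);
    serializer.put(Point(1, 2));

    // The in-memory case: just peek at app.data.
    writeln(app.data);
}
```

Swapping the appender for another output range of characters changes
where the bytes go without touching the serializer or the archiver.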
....
No matter how I look at the code, it needs a lot of (re-)work.
For instance, the Archive type is obsessed with strings. I can't see a
need for that many strings attached :)
Then there is the awful duality of Serializer that results literally in:
    if(mode == serializing) doSerializing else doDeserializing
And the spectacular pair:
    T deserialize (T) (Data data, string key = "")
    {
        mode = deserializing;
        ...
    }
    Data serialize (T) (T value, string key = null)
    {
        mode = serializing;
        ...
    }
The amount of extra code executed per bit of output is remarkably high,
and a hallmark of a standard library is the pay-as-you-go principle. We
(collectively, the Phobos devs) have to set the baseline for
performance; if it's too low we're out of the game.
For example - events are cute, but do we all need them? Do we always
want the overhead of checking that stuff per field written?
Instead, decompose these layers and make them stackable, for instance:
    auto serializer = new Serializer(...);
    auto tracingSerializer = new TracingSerializer(serializer);
Or just make 2 kinds of serializers with static if on a single template
parameter, bool withEvents; it's trivial. Then a couple of aliases would
finish the job.
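A rough sketch of the static if variant, under assumptions: the names
SerializerImpl, EventSerializer and onSerializing are invented here, and
the "%s;" output format is a placeholder. The point is only that the
variant without events contains neither the field nor the per-value
check.

```d
import std.array : appender;
import std.format : formattedWrite;

// One implementation, two flavours selected at compile time.
struct SerializerImpl(Sink, bool withEvents)
{
    Sink sink;

    static if (withEvents)
    {
        // Hypothetical event hook; it exists only in the withEvents flavour.
        void delegate(string typeName) onSerializing;
    }

    void put(T)(T value)
    {
        static if (withEvents)
        {
            // This check is compiled in only when events were asked for.
            if (onSerializing !is null)
                onSerializing(T.stringof);
        }
        formattedWrite(sink, "%s;", value); // placeholder format
    }
}

// A couple of aliases finish the job.
alias Serializer(Sink) = SerializerImpl!(Sink, false);
alias EventSerializer(Sink) = SerializerImpl!(Sink, true);

void main()
{
    auto app = appender!(char[])();
    alias App = typeof(app);

    auto plain = Serializer!App(app);
    plain.put(1);

    int fired;
    auto traced = EventSerializer!App(app);
    traced.onSerializing = (string name) { ++fired; };
    traced.put(2);

    assert(app.data == "1;2;");
    assert(fired == 1);

    // The plain flavour does not even have the hook as a member.
    static assert(!__traits(hasMember, Serializer!App, "onSerializing"));
}
```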
That said, I observe that events are attached to types/fields... hum,
in such a case it needs work to make them zero-cost if absent.
>> Pardon me if my tone is a bit sharp. I like any other want the best
>> design we can get. Now that the great deal of work is done it would be a
>> shame to present it in a bad package.
>
> Yes, that's why we're having this discussion.
And I'm afraid it's too late, or the changes are too far-reaching, but
let's try it.
I'm especially dismayed by the following (and by the fact that it's
part of the interface to implement):
void archiveEnum (bool value, string baseType, string key, Id id);
/// Ditto
void archiveEnum (bool value, string baseType, string key, Id id);
/// Ditto
void archiveEnum (byte value, string baseType, string key, Id id);
/// Ditto
void archiveEnum (char value, string baseType, string key, Id id);
/// Ditto
void archiveEnum (dchar value, string baseType, string key, Id id);
/// Ditto
void archiveEnum (int value, string baseType, string key, Id id);
/// Ditto
void archiveEnum (long value, string baseType, string key, Id id);
/// Ditto
void archiveEnum (short value, string baseType, string key, Id id);
/// Ditto
void archiveEnum (ubyte value, string baseType, string key, Id id);
/// Ditto
void archiveEnum (uint value, string baseType, string key, Id id);
/// Ditto
void archiveEnum (ulong value, string baseType, string key, Id id);
/// Ditto
void archiveEnum (ushort value, string baseType, string key, Id id);
/// Ditto
void archiveEnum (wchar value, string baseType, string key, Id id);
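For contrast, one possible way to avoid a per-base-type overload set is
a single templated method. This is only a sketch under assumptions: it
presumes the archiver is passed as a template parameter rather than
through a runtime interface (template methods cannot be virtual), and
the SketchArchiver name, the Color enum and the output format are made
up for illustration.

```d
import std.array : appender;
import std.format : formattedWrite;
import std.traits : OriginalType;

// Hypothetical archiver with one generic archiveEnum instead of 13 overloads.
struct SketchArchiver(Sink)
{
    Sink sink;

    void archiveEnum(E)(E value)
        if (is(E == enum))
    {
        // The base type is recovered at compile time, not passed as a string.
        alias Base = OriginalType!E;
        formattedWrite(sink, "<enum type=\"%s\" baseType=\"%s\">%s</enum>",
                       E.stringof, Base.stringof, cast(Base) value);
    }
}

enum Color : ubyte { red, green, blue }

void main()
{
    auto app = appender!(char[])();
    auto archiver = SketchArchiver!(typeof(app))(app);

    archiver.archiveEnum(Color.green);

    assert(app.data == "<enum type=\"Color\" baseType=\"ubyte\">1</enum>");
}
```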
--
Dmitry Olshansky