Range interface for std.serialization

Tue Aug 27 12:23:22 PDT 2013

On 2013-08-26 18:41, Dmitry Olshansky wrote:

> More a question of implementation then.
>
> Answer to both of them - wrapping an output range in archiver and being
> one for serializer. As for connection with your current API that gives
> away an array - just think std.array.Appender (and a multitude more ways
> to chew the data).

Ok, thank you.

> Looking at your current code in depth... finally. I would have a problem
> starting the productive answers to the questions.
>
> First things first - there should not be a key parameter aside from
> stuff added by archiver itself for its internal needs. Nor is there a
> simple way to locate data by key afterwards (certainly not every format
> defines such). It would require some tagged object model and there is no
> such _requirement_ in the serialization.

I would really like the serializer not to depend on the order of 
the fields of the types it's (de)serializing.

> Citing a line from archive.d
> """
> There are a couple of limitations when implementing a new archive, this
> is due* to how the serializer and the archive interface is built. Except
> for what this interface says explicitly an archive needs to be able to
> handle the following:
> Unarchive a value based on a key or id, regardless of where in the
> archive  the value is located
> """
> - this is impossible in the setting of serialization.
>
> Serialization is NOT about modeling the whole dataset and providing
> queries into that. A model of serialization is that of unix _tar_  just
> dump any graphs of data to "TAPE" and/or restore back. If you can do
> processing on the fly - bonus points (and what I push for can).
>
> This confusion and its consequences are no doubt due to building on
> std.xml and the interface it presents.

It might be due to the XML format in general but certainly not due to 
std.xml. The XmlArchive originally used the XML package in Tango, long 
before it supported std.xml.

> Second - no, not every operation has to return some piece of Data that
> is produced. It would be tremendously inefficient and require keeping
> memory references to that alive (or be unsafe in addition to slow).
> Instead it just outputs something to the underlying sink.
>
> class Serializer
> {
>      ...
>      //now we are output range for anything.
>      //add constraint as you see fit
>      void put(T)(T value)
>      {
>          serializeInternal(value);//calls methods of archiver
>      }
>
>      Archiver archiver;
> }
>
> class MyArchiver(Output)
>      if(isOutputRange!(Output, dchar)) //or ubyte if binary
> {
>      ...
>      this(Output sink)
>      {
>          this.sink = sink;
>      }
>
>      //and a method for example
>      private void archivePrimitive (T) (T value, string key, Id id)
>      {
>          //along the lines of this, don't take literally
>          //I've no idea of the actual format for tags you use
>          formattedWrite(sink, "<%s>%s</%s>", id, value, id);
>      }
>      ...
>      Output sink;
> }

Good, thank you.

> The user just writes e.g.
>
> auto app = appender!(char[])();
> auto archiver = new XmlArchiver(app);
> auto serializer = new Serializer(archiver);
>
> and works with the serializer as with an output range of anything (I showed
> the example before)
>
> Then once the data is required just peek at app.data and there it is.
> So in-memory case is easily covered. Other sinks bring more benefits see
> e.g.:
>
> auto sink = stdout.lockingTextWriter();
>
> And the same code now writes directly to stdout and no worries if there
> is a lot of stuff to write.

Thank you for giving some concrete ideas of the API.
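
For my own reference, a minimal self-contained sketch of that idea as I 
understand it. SketchArchiver, SketchSerializer and the tag format are 
made up for the example; this is not the current API:

```d
import std.array : Appender;
import std.format : formattedWrite;
import std.range : isOutputRange;

// Hypothetical archiver: wraps any output range of characters.
struct SketchArchiver(Output)
    if (isOutputRange!(Output, char))
{
    Output sink;

    // Writes one value straight to the sink; no Data is returned.
    void archivePrimitive(T)(T value, string tag)
    {
        formattedWrite(sink, "<%s>%s</%s>", tag, value, tag);
    }
}

// Hypothetical serializer: an output range for the values it accepts.
struct SketchSerializer(Archiver)
{
    Archiver archiver;

    void put(T)(T value)
    {
        // Using the type name as the tag is an assumption of this sketch.
        archiver.archivePrimitive(value, T.stringof);
    }
}

string demo()
{
    alias Sink = Appender!(char[]);
    SketchSerializer!(SketchArchiver!Sink) serializer;
    static assert(isOutputRange!(typeof(serializer), int));

    serializer.put(42);
    serializer.put(true);

    // Read the result back through the stored sink.
    return serializer.archiver.sink.data.idup;
}

void main()
{
    assert(demo() == "<int>42</int><bool>true</bool>");
}
```

Swapping Sink for another output range of characters should leave the 
rest unchanged, which I take to be the point.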

> ....
>
> No matter how I look at code it needs a lot of (re-)work.
> For instance Archive type is obsessed with strings. I can't see a need
> for that many strings attached :)

Yeah, I know. It's mainly because the archive doesn't use templates, 
because it needs to implement an interface.

> The awful duality of Serializer that results literally in:
> if(mode == serializing) doSerializing else doDeserializing
>
> And the spectacular pair:
>
> T deserialize (T) (Data data, string key = "")
> {
>      mode = deserializing;
>      ...
> }
>
> Data serialize (T) (T value, string key = null)
> {
>      mode = serializing;
>      ...
> }

I guess that's easier to avoid if I divide Serializer into two separate 
parts, one for serializing and one for deserializing.

> Amount of extra code executed per bit of output is remarkably high

Now I think you're exaggerating a bit.

> , and
> a hallmark of standard library is pay as you go principle. We (as
> collectively Phobos devs) have to set the baseline for performance, if
> it's too low we're out of the game.
>
> For example - events are cute, but do we all need them? Do we always
> want an overhead of checking that stuff per field written?

Sure, there is some overhead from calling some functions, but the events 
are checked for at compile time so the overhead should be minimal.

> Instead decompose these layers, make them stackable for instance:
>
> auto serializer = new Serializer(...);
> auto tracingSerializer = new TracingSerializer(serializer);
>
> Or just make 2 kinds of serializers with static if on a single template
> parameter bool withEvents it's trivial. Then a couple of aliases would
> finish the job.

I don't think that will be needed. I can see if I can refactor a bit to 
minimize the overhead even more.

> With that I'm observing that events are attached to types/fields... hum,
> in such a case it needs work to make them zero-cost if absent.

The only cost is calling "triggerEvents" and "triggerEvent", the rest is 
performed at compile time.
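
To illustrate what I mean by compile time, here is a minimal sketch. It 
is not the actual code: detecting a member called onSerializing is an 
assumption made only for this example.

```d
// Hypothetical helper: the check for an event handler is a static if,
// so types without a handler get an empty function body.
void triggerEvent(T)(ref T value)
{
    static if (__traits(hasMember, T, "onSerializing"))
        value.onSerializing();
}

// A type without any event handler.
struct Plain
{
    int x;
}

// A type that declares the (hypothetical) handler.
struct WithEvent
{
    int x;
    int calls;

    void onSerializing()
    {
        ++calls;
    }
}

int demoCalls()
{
    Plain p;
    triggerEvent(p); // no handler, so nothing is looked up at run time

    WithEvent w;
    triggerEvent(w); // handler is called exactly once
    return w.calls;
}

void main()
{
    assert(demoCalls() == 1);
}
```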

> And I'm afraid it's too late or the changes are too far reaching but
> let's try it.
>
> I'm especially destroyed  by (and the fact that it's a part of interface
> to implement):
>
>      void archiveEnum (bool value, string baseType, string key, Id id);
>
>      /// Ditto
>      void archiveEnum (bool value, string baseType, string key, Id id);
>
>      /// Ditto
>      void archiveEnum (byte value, string baseType, string key, Id id);
>
>      /// Ditto
>      void archiveEnum (char value, string baseType, string key, Id id);
>
>      /// Ditto
>      void archiveEnum (dchar value, string baseType, string key, Id id);
>
>      /// Ditto
>      void archiveEnum (int value, string baseType, string key, Id id);
>
>      /// Ditto
>      void archiveEnum (long value, string baseType, string key, Id id);
>
>      /// Ditto
>      void archiveEnum (short value, string baseType, string key, Id id);
>
>      /// Ditto
>      void archiveEnum (ubyte value, string baseType, string key, Id id);
>
>      /// Ditto
>      void archiveEnum (uint value, string baseType, string key, Id id);
>
>      /// Ditto
>      void archiveEnum (ulong value, string baseType, string key, Id id);
>
>      /// Ditto
>      void archiveEnum (ushort value, string baseType, string key, Id id);
>
>      /// Ditto
>      void archiveEnum (wchar value, string baseType, string key, Id id);

So you want templates instead?
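
If so, I guess it would look something like the following sketch, where 
a single template replaces the overloads. SketchArchive, the tag format 
and the way the base type is written out are made up for the example. 
Note that a template method can't be virtual, so this would not fit the 
current interface-based Archive as is.

```d
import std.array : Appender;
import std.format : formattedWrite;
import std.traits : OriginalType;

// Hypothetical archive, parameterized on its sink.
struct SketchArchive(Output)
{
    Output sink;

    // One template instead of one overload per base type.
    void archiveEnum(E)(E value, string key)
        if (is(E == enum))
    {
        alias Base = OriginalType!E;
        formattedWrite(sink, "<enum type=\"%s\" key=\"%s\">%s</enum>",
            Base.stringof, key, cast(Base) value);
    }
}

enum Color : ubyte
{
    red,
    green
}

string demoEnum()
{
    SketchArchive!(Appender!(char[])) archive;
    archive.archiveEnum(Color.green, "color");
    return archive.sink.data.idup;
}

void main()
{
    assert(demoEnum() == `<enum type="ubyte" key="color">1</enum>`);
}
```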

I have read your posts, thank you for your comments. I'm planning now to:

* Split Serializer into two parts
* Make the parts structs
* Possibly provide class wrappers
* Split Archive in two parts
* Add range interface to Serializer and Archive

-- 
/Jacob Carlborg