Request for review - std.serialization (orange)

Matt Soucy msoucy at csh.rit.edu
Tue Apr 2 06:38:17 PDT 2013


On 04/02/2013 03:21 AM, Jacob Carlborg wrote:
> On 2013-04-01 19:13, Jesse Phillips wrote:
>
>> Let me see if I can describe this.
>>
>> PB encodes to binary by type. However, it also has a schema in a
>> .proto file. My first concern is that this file provides the ID to use
>> for each field; while the IDs are arbitrary, the encoding must use the
>> ones specified.
>>
>> My second concern is the option to pack repeated fields. I'm not
>> sure of the specifics of this encoding, but I imagine some compression.
>>
>> This is why I think I'd have to implement my own Serializer to be able
>> to support PB. I also believe we could have a binary format based on
>> PB (it might be possible to create a schema describing Orange-generated
>> data, but it would be hard to generate data for a specific schema).
>
> As I understand it there's a "schema definition", that is, the .proto
> file. You compile this schema to produce D/C++/Java/whatever code that
> contains structs/classes with methods/fields that match this schema.
>
> If you need to change the schema, besides adding optional fields, you
> need to recompile the schema to produce new code, right?
>
> If you have a D class/struct that matches this schema (regardless of
> whether it's auto-generated from the schema or manually created) with actual
> instance variables for the fields I think it would be possible to
> (de)serialize into the binary PB format using Orange.
>
> Then there's the issue of the options supported by PB, like optional
> fields and packed repeated fields (I don't know what the latter means).
>
> It seems PB is dependent on the order of the fields so that won't be a
> problem. Just disregard the "key" that is passed to the archive and
> deserialize the next type that is expected. Maybe you could use the
> schema to do some extra validations.
>
> Although, I don't know how PB handles multiple references to the same
> value.
>
> Looking at this:
>
> https://developers.google.com/protocol-buffers/docs/overview
>
> Under "Why not just use XML?", it mentions both a text format (not to
> be confused with the schema, .proto) and a binary format, although the
> text format seems to be mostly for debugging.
>

Unfortunately, that's only partially correct. Optional isn't an "option";
it's a way of saying that a field may be specified 0 or 1 times. If the
same field ID is read twice and that field is declared optional in the
schema, the two values are merged.

Packed IS an "option", and it can only be applied to repeated fields of
primitive types. It changes serialization from:
 > return raw.map!(a=>(MsgType!BufferType | (id << 3)).toVarint() ~ a.writeProto!BufferType())().reduce!((a,b)=>a~b)();
to
 > auto msg = raw.map!(writeProto!BufferType)().reduce!((a,b)=>a~b)();
 > return (2 | (id << 3)).toVarint() ~ msg.length.toVarint() ~ msg;

(Actual snippets from my partially-complete protocol buffer library)
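To make the difference between the two encodings concrete, here is a minimal, self-contained sketch in D. It is not code from the library quoted above: `toVarint`, `writeUnpacked` and `writePacked` are hypothetical names, and the sketch only handles repeated unsigned varint fields (wire type 0), emitting each element with its own key versus one length-delimited key (wire type 2) followed by the bare elements.

```d
import std.stdio : writefln;

/// Encode an unsigned value as a base-128 varint:
/// 7 payload bits per byte, high bit set when more bytes follow.
ubyte[] toVarint(ulong value)
{
    ubyte[] result;
    do
    {
        ubyte b = cast(ubyte)(value & 0x7F);
        value >>= 7;
        if (value != 0)
            b |= 0x80; // continuation bit
        result ~= b;
    } while (value != 0);
    return result;
}

/// Unpacked: every element repeats the key (field id, wire type 0 = varint).
ubyte[] writeUnpacked(uint id, const ulong[] values)
{
    ubyte[] result;
    foreach (v; values)
        result ~= toVarint(0 | (id << 3)) ~ toVarint(v);
    return result;
}

/// Packed: a single key with wire type 2 (length-delimited),
/// then the payload length, then the elements with no per-element keys.
ubyte[] writePacked(uint id, const ulong[] values)
{
    ubyte[] payload;
    foreach (v; values)
        payload ~= toVarint(v);
    return toVarint(2 | (id << 3)) ~ toVarint(payload.length) ~ payload;
}

void main()
{
    ulong[] values = [3, 270, 86942];
    writefln("unpacked: %(%02x %)", writeUnpacked(4, values));
    writefln("packed:   %(%02x %)", writePacked(4, values));
}
```

For field 4 holding `[3, 270, 86942]`, the unpacked form spends one key byte (`20`) per element, while the packed form is `22 06 03 8e 02 9e a7 05`: one key, a length of 6, and the three varints back to back. So "packed" is not compression in the usual sense; it only drops the repeated keys.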

If you had a struct that matches that schema (PB messages have value
semantics), then yes, in theory you could serialize the struct based on
the schema, but you'd have to keep the two in sync by hand.

PB is NOT dependent on the order of the fields during serialization;
they can be sent/received in any order. You could use the schema like
you mentioned above to tie member names to IDs, though.
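Order independence falls out of the wire format: every field is prefixed by a key carrying its ID, so a decoder just dispatches on that ID. A minimal sketch in D, assuming a hypothetical two-field message (`Point`, ID 1 for `x`, ID 2 for `y`) and handling only varint fields; `readVarint` and `readPoint` are illustrative names, not part of Orange or any existing PB library.

```d
import std.exception : enforce;

/// Hypothetical message: field id 1 -> x, field id 2 -> y (both varints).
struct Point
{
    ulong x;
    ulong y;
}

/// Read one base-128 varint starting at buf[pos] and advance pos past it.
ulong readVarint(const ubyte[] buf, ref size_t pos)
{
    ulong result;
    uint shift;
    while (true)
    {
        enforce(pos < buf.length, "truncated varint");
        ubyte b = buf[pos++];
        result |= cast(ulong)(b & 0x7F) << shift;
        if ((b & 0x80) == 0)
            return result;
        shift += 7;
        enforce(shift < 64, "varint too long");
    }
}

/// Decode fields in whatever order they arrive by dispatching on the field id.
Point readPoint(const ubyte[] buf)
{
    Point p;
    size_t pos = 0;
    while (pos < buf.length)
    {
        immutable key = readVarint(buf, pos);
        immutable id = key >> 3;       // field number
        immutable wireType = key & 7;  // 0 = varint
        enforce(wireType == 0, "sketch only handles varint fields");
        immutable value = readVarint(buf, pos);
        switch (id)
        {
            case 1: p.x = value; break;
            case 2: p.y = value; break;
            default: break; // unknown varint field: already consumed, so ignore it
        }
    }
    return p;
}

void main()
{
    import std.stdio : writeln;
    // Key bytes: field 1 -> 0x08, field 2 -> 0x10 (wire type 0).
    ubyte[] inOrder  = [0x08, 0x05, 0x10, 0x07];
    ubyte[] reversed = [0x10, 0x07, 0x08, 0x05];
    assert(readPoint(inOrder) == readPoint(reversed));
    writeln(readPoint(reversed));
}
```

Both byte orders decode to the same `Point`. Because each occurrence simply overwrites the member, a scalar field that appears twice ends up with the last value read, which is the merge behaviour protobuf specifies for non-message fields; embedded message fields are instead merged field by field.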

PB uses value semantics, so multiple references to the same value
aren't really an issue it covers.

I hadn't actually noticed that TextFormat stuff before. Interesting; I
might take a look at it later when I have time.

-Matt Soucy


More information about the Digitalmars-d mailing list