"Protocol Buffers" for Tango & Phobos ??

Tue Sep 2 21:29:50 PDT 2008

Brian Price wrote:
> == Quote from Nick B (nick.barbalich at gmail.com)'s article
>> Hi
>> I came across this the other day, and since no one has mentioned it on
>> this newsgroup before, I thought I would bring the subject to the
>> community's attention, so to speak.
>> Google has very recently open-sourced "Protocol Buffers".
>> What is it, you ask?  In a couple of lines: it is a language-neutral,
>> platform-neutral, extensible way of serializing structured data for use
>> in communications protocols, data storage, and more.
>> Think XML, but smaller, faster, and simpler.
>> Why not just use XML?
>> Protocol buffers have many advantages over XML for serializing
>> structured data.  Protocol buffers:
>> * are simpler
>> * are 3 to 10 times smaller
>> * are 20 to 100 times faster
>> * are less ambiguous
>> * generate data access classes that are easier to use programmatically.
>> PB currently supports Java, Python, and C++.
>> A more detailed overview can be found here:
>> http://code.google.com/apis/protocolbuffers/docs/overview.html
>> and the FAQ here:
>> http://code.google.com/apis/protocolbuffers/docs/faq.html
>> See the answer to the question "Can I add support for a new language to
>> protocol buffers?" inside the FAQ.
>> Some tips and comments can be found here:
>> http://zunger.livejournal.com/164024.html
>> My questions:
>> Is this of interest to the D community?
>> Is this something they might use?
>> Do they see value in it being added to the respective
>> Tango or Phobos frameworks?
>> Any other comments?
>> cheers
>> Nick B.
> 
> Hate to say it but this is yet another case of reinventing the wheel.  The worst
> thing about this throwback to the early 90s is its inherent violation of DRY.
> This package intermingles and confuses three separate issues, treating them as a
> monolithic whole, namely: serialization, marshalling, and versioning.

Can you explain exactly where you see a violation of DRY? Also, can you
explain the big difference between serialization and marshalling? I find
it hard to draw any meaningful line between the two.

> Binary solutions such as this, while more efficient byte-wise, run into
> portability problems especially with floating point values.

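So, what sort of problems do you see with this specific solution? PBs
write floats and doubles as fixed-width little-endian IEEE 754 values.
To me, it looks like a text-based solution would have more problems with
floating point values, because they typically get converted to and from
decimal, which loses precision unless enough digits are written.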
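
To illustrate the floating point argument, here is a minimal Python
sketch (not PB library code). It assumes a double is shipped as 8 raw
little-endian IEEE 754 bytes, and compares that with a naive text
serializer that prints six decimal places. The "%.6f" format is only an
assumption about what a careless text format might do.

```python
import struct

# A value that has no short exact decimal form as a double.
x = 0.1 + 0.2

# Binary path: 8 raw bytes, little-endian IEEE 754, then read back.
wire = struct.pack("<d", x)
binary_back = struct.unpack("<d", wire)[0]

# Text path: a naive serializer that keeps six decimal places.
text = "%.6f" % x
text_back = float(text)

binary_exact = (binary_back == x)  # bit-for-bit round trip
text_exact = (text_back == x)      # digits were dropped on the way out

print(len(wire), binary_exact, text_exact)
```

A text format can round-trip exactly too if it writes enough digits (17
significant digits for a double), but that has to be done on purpose,
whereas the binary path gets it for free.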

> They also lose the
> human readability of the data (sans the use of special tools).  Often the use of a
> binary solution is a case of premature optimization and indicates bad design at a
> higher level.  The marshalling strategy should be 'pluggable' so that one can use
> an easier to debug, human readable, data format during development.

Well, in my experience, most data tends to live somewhere not readily
accessible and/or human readable, so you need a tool one way or the
other. Making generic tools for PBs shouldn't be too hard; as far as I
can tell they support generic message handling ('reflection'), and
there's a DebugString() method on each message, etc.
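
As a rough idea of how little a generic dump tool needs, here is a
hand-rolled Python sketch that walks raw PB bytes without knowing the
message definition. It is not the library's reflection API; the names
read_varint and dump_fields are made up for illustration, and it only
handles two wire types (0 = varint, 2 = length-delimited).

```python
def read_varint(buf, pos):
    """Read a base-128 varint starting at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):  # high bit clear: last byte of the varint
            return result, pos
        shift += 7


def dump_fields(buf):
    """Return a list of (field_number, wire_type, value) tuples."""
    fields = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:      # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:    # length-delimited (strings, bytes, nested)
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError("wire type %d not handled in this sketch" % wire_type)
        fields.append((field_number, wire_type, value))
    return fields


# Hypothetical message: field 1 = 150 (varint), field 2 = "hi" (string).
sample = bytes([0x08, 0x96, 0x01, 0x12, 0x02, 0x68, 0x69])
fields = dump_fields(sample)
print(fields)
```

Without the .proto file you only get field numbers, not names, so a
real tool would pair something like this with the message definition.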

> Versioning problems can be (and have been) addressed in a number of ways over the
> years, the simplest and imo often the best, is to sidestep the problem by
> serializing a map of the data rather than just the raw data.  That is, each value
> has associated metadata indicating its field name (a key-value mapping iow).

And how exactly does the PB encoding differ from this? Each value has a
key preceding it, which identifies the field (by number rather than by
name).
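
A minimal Python sketch of that key-value layout, as I understand the
published encoding: the key is (field_number << 3) | wire_type, written
as a varint, followed by the value. The function names are hypothetical
and only the varint wire type is shown.

```python
def encode_varint(n):
    """Encode a non-negative int as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)         # last byte, high bit clear
            return bytes(out)


WIRE_VARINT = 0  # wire type for int32/int64/bool/enum values


def encode_varint_field(field_number, value):
    """Key (field number + wire type) followed by the varint value."""
    key = (field_number << 3) | WIRE_VARINT
    return encode_varint(key) + encode_varint(value)


# Field number 1 holding the value 150.
encoded = encode_varint_field(1, 150)
print(encoded.hex())
```

So the metadata is there for every value; it is just one small integer
instead of a repeated field name.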

> If XML is too heavy a hammer, JSON (with or without embedded metadata) is a good
> alternative with quite a bit of industry support.  In fact, the use of JSON
> encoding with embedded metadata gives you a lightweight solution that works with
> nearly every language in present use.  Even better, any language with good
> reflection support (such as Java) allows an implementation that does not violate DRY.

I haven't tried it, but it looks like a generic mapping to/from JSON
could easily be done on top of PBs, for when you need it (like when
publishing a JavaScript API). But why would you want to use JSON in any
other case? Compared to PBs, it's harder and slower to parse, definitely
takes more space/bandwidth, has fewer types, and you don't get the nice
free API that protocol buffers generate for each message type.
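
For a feel of the space difference, here is a small Python comparison on
a made-up two-field record. The PB bytes are written out by hand for a
hypothetical message with id as field 1 (varint) and name as field 2
(string); this is one toy data point, not a benchmark, and it says
nothing about parsing speed.

```python
import json

record = {"id": 150, "name": "hi"}

# JSON with no optional whitespace, encoded as UTF-8.
json_bytes = json.dumps(record, separators=(",", ":")).encode("utf-8")

# Hand-written PB wire bytes for the same data (assumed field numbers):
#   0x08        key: field 1, wire type 0 (varint)
#   0x96 0x01   varint 150
#   0x12        key: field 2, wire type 2 (length-delimited)
#   0x02        length 2
#   "hi"        payload
pb_bytes = bytes([0x08, 0x96, 0x01, 0x12, 0x02]) + b"hi"

print(len(json_bytes), len(pb_bytes))
```

The gap comes mostly from JSON repeating the field names and quoting in
every record.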

> This "Protocol Buffers" solution is a dinosaur, a throwback, yet another wheel
> when we need ground effects or anti-gravity.

Well, you haven't convinced me yet :)

Mitja