[GSoC] Improved FlatBuffers and/or Protobuf Support ~ Binary Serialization

Dragos Carp dragoscarp at gmail.com
Fri Mar 29 23:19:10 UTC 2019


Hi Ahmet,

welcome to the D forum.

As the author of protobuf-d I'll try to give you some feedback on 
the points you made. I couldn't find the time to also do a 
FlatBuffers implementation, so my comments relate only to 
protobuf. If you are interested in doing the FlatBuffers work, 
I'll be more than happy to play the mentor role for you - I have 
some ideas there. But let's get to the existing, real stuff.

On Friday, 29 March 2019 at 00:18:40 UTC, Ahmet Sait wrote:
>
>   - It should be possible to parse schema and output mixable D 
> code at
>     compile time
>   const schema = `message Person
>   {
>       required string name = 1;
>       required int32 id = 2;
>   }`;
>   mixin(fromProtoSchema(schema));

I don't think that it is worth the effort.
1. A complete implementation of .proto file parsing is 
complicated 
(https://developers.google.com/protocol-buffers/docs/reference/proto3-spec).
2. In theory, protobuf definitions do not change often, and 
compile-time parsing is rather slow, so re-parsing them on every 
compilation is a drawback rather than a benefit.
3. A protoc plugin is the way Protobuf itself recommends for 
processing .proto definitions: 
https://developers.google.com/protocol-buffers/docs/proto3#generating

>
>   - There should be no need for a schema definition, a custom 
> type annotated
>     with UDAs should be enough
>   struct Person
>   {
>       @protoID(1) string name;
>       @protoID(2) int age;
>   }
>   serialize(Person("Walter", 42), stdout);

protobuf-d does that already, see the unittest for toProtobuf: 
https://github.com/dcarp/protobuf-d/blob/3f8a1a5129c98920e1652e965004ac77e9bb8ef1/src/google/protobuf/encoding.d#L193
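
To illustrate what such an annotated struct becomes on the wire, here is a minimal sketch in plain D. It is not protobuf-d's code: the `protoID` attribute is taken from the quoted proposal, and `varint` and `serialize` are hypothetical helpers written for this example. It handles only string and int fields and, unlike a real proto3 encoder, does not skip default values.

```d
import std.traits : getUDAs, hasUDA;

// Hypothetical attribute from the quoted proposal; protobuf-d's own
// attribute may be named and shaped differently.
struct protoID { uint tag; }

struct Person
{
    @protoID(1) string name;
    @protoID(2) int age;
}

// Base-128 varint: 7 payload bits per byte, least significant group
// first, high bit set on every byte except the last.
ubyte[] varint(ulong value)
{
    ubyte[] bytes;
    while (value >= 0x80)
    {
        bytes ~= cast(ubyte)((value & 0x7F) | 0x80);
        value >>= 7;
    }
    bytes ~= cast(ubyte) value;
    return bytes;
}

// Walks the annotated fields at compile time and emits, per field,
// the key (tag << 3 | wire type) followed by the payload.
ubyte[] serialize(T)(T message)
{
    ubyte[] output;
    foreach (i, field; message.tupleof)
    {
        static if (hasUDA!(T.tupleof[i], protoID))
        {
            enum tag = getUDAs!(T.tupleof[i], protoID)[0].tag;
            alias F = typeof(field);
            static if (is(F == string))
            {
                output ~= varint((tag << 3) | 2);   // wire type 2: length-delimited
                output ~= varint(field.length);
                output ~= cast(const(ubyte)[]) field;
            }
            else static if (is(F == int))
            {
                output ~= varint((tag << 3) | 0);   // wire type 0: varint
                output ~= varint(cast(ulong) cast(long) field); // sign-extend like int32
            }
            else
                static assert(false, "field type not handled in this sketch");
        }
    }
    return output;
}
```

With these definitions, `serialize(Person("Walter", 42))` yields the bytes 0A 06 57 61 6C 74 65 72 10 2A: key and length for field 1, the six characters of "Walter", then key and value for field 2.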

>
> - Simple things should be simple
> It should be dead simple to do basic stuff:
>   auto obj = deserialize!SomeType(stdin);
>   serialize(obj, stdout);

Again, protobuf-d has that: 
https://github.com/dcarp/protobuf-d/blob/3f8a1a5129c98920e1652e965004ac77e9bb8ef1/src/google/protobuf/decoding.d#L214

>
> - Complex things should be possible
> The library should be flexible and extensible without 
> modification

toProtobuf, fromProtobuf, toJSONValue, fromJSONValue methods are 
protobuf customization points in protobuf-d. For an example see 
https://github.com/dcarp/protobuf-d/blob/3f8a1a5129c98920e1652e965004ac77e9bb8ef1/src/google/protobuf/wrappers.d#L27-L54

>
> - Support for library and tool based usage
> It should be usable as a library without any additional setup 
> but also usable
> as a schema compiler.

protobuf-d is usable as a library, see 
https://github.com/huntlabs/grpc-dlang/blob/57c8fe9808f8e860c4b0668a83cdabd78b296ce5/dub.json#L9
Regarding the usage as a schema compiler, see my first comment 
above.

>
> - Support for common Phobos types
> Nullable, tuples, std.datetime, std.complex, std.bigint, 
> containers...

Protobuf is a language-agnostic serialization format. Having 
.proto definitions for common Phobos types would just shift the 
problem somewhere else (i.e. to the other programming languages).

Nevertheless, Protobuf addresses much the same problem by 
defining the "well-known" types 
(https://developers.google.com/protocol-buffers/docs/reference/google.protobuf).
protobuf-d also supports those, so that std.datetime.SysTime is 
mapped to google.protobuf.Timestamp and std.datetime.Duration to 
google.protobuf.Duration.
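
As a rough sketch of what the SysTime mapping involves: google.protobuf.Timestamp stores whole seconds since the Unix epoch plus a non-negative nanosecond part. The `Timestamp` struct and `toTimestamp` function below are hypothetical stand-ins written for this example, not the types or functions protobuf-d actually uses.

```d
import std.datetime.date : DateTime;
import std.datetime.systime : SysTime;
import std.datetime.timezone : UTC;

// Hypothetical stand-in for google.protobuf.Timestamp:
// seconds since 1970-01-01T00:00:00Z plus nanoseconds in [0, 999_999_999].
struct Timestamp
{
    long seconds;
    int nanos;
}

Timestamp toTimestamp(SysTime time)
{
    immutable epoch = SysTime(DateTime(1970, 1, 1), UTC());

    // SysTime has a resolution of 100 ns ("hnsecs").
    immutable long hnsecs = (time - epoch).total!"hnsecs";

    long secs = hnsecs / 10_000_000;
    long rest = hnsecs % 10_000_000;

    // Floor towards negative infinity so that nanos always counts
    // forward in time, also for instants before the epoch.
    if (rest < 0)
    {
        rest += 10_000_000;
        secs -= 1;
    }
    return Timestamp(secs, cast(int)(rest * 100));
}
```

The floor step matters only for instants before 1970: half a second before the epoch becomes seconds = -1, nanos = 500_000_000 rather than a negative nanos value.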

>
> I'm personally not happy with any of the existing libraries but 
> they will
> likely be a valuable resource regardless.

The existing protobuf libraries are quite mature, and improving 
them would probably be time better spent than starting again from 
scratch.

>
> Questions:
> - How much work would be ideal for GSoC? Should I be working on 
> flatbuffers
>   only or protobuf too? (Seems like flatbuffers need more love)

I'm quite satisfied with the protobuf-d implementation: it is 
small (approx. 4k LOC), clean and quite feature complete - 26 
failing conformance tests vs. 27 and 41 for the official C++ and 
Java counterparts, respectively. Of course there is still room 
for improvement, but in the case of protobuf-d not enough for a 
GSoC application.

On the other hand, FlatBuffers is a very good candidate: it has 
its own peculiarities, but is also somewhat similar to protobuf. 
This would reduce the planning risks considerably.

> - Should I tackle the std.serialization [3] idea?

I see std.serialization as a high-level API. It would probably 
live as std.experimental.serialization for a long time, since it 
will take quite a while until multiple serialization formats 
implement it. Only after that, if it ever happens, could we 
remove the "experimental" part. I don't see this as a suitable 
GSoC project.

> - Any other serialization related suggestions?
https://arrow.apache.org/


Cheers, Dragos

