[GSoC] Improved FlatBuffers and/or Protobuf Support ~ Binary Serialization

Ahmet Sait nightmarex1337 at hotmail.com
Mon Apr 1 23:42:24 UTC 2019


On Friday, 29 March 2019 at 23:19:10 UTC, Dragos Carp wrote:
> Hi Ahmet,
>
> welcome to the D forum.
>
> As the author of protobuf-d I'll try to give you some feedback 
> on the points you made. I couldn't find the time to also do the 
> flatbuffers implementation, so my comments relate just to 
> protobuf. If you are interested in doing the Flatbuffers work, 
> I'll be more than happy to play the mentor role for you - I 
> have some ideas there. But let's get to the existing, real 
> stuff.

Glad to hear, thanks!

> On Friday, 29 March 2019 at 00:18:40 UTC, Ahmet Sait wrote:
>>
>>   - It should be possible to parse schema and output mixable D 
>> code at
>>     compile time
>>   const schema = `message Person
>>   {
>>       required string name = 1;
>>       required int32 id = 2;
>>   }`;
>>   mixin(fromProtoSchema(schema));
>
> I don't think that it is worth the effort.
> 1. A complete implementation for .proto file parsing is 
> complicated 
> (https://developers.google.com/protocol-buffers/docs/reference/proto3-spec).
> 2. Theoretically, protobuf definitions do not change often, 
> and considering that compile time parsing is somewhat slow, 
> parsing them at every compilation is actually a drawback 
> rather than a benefit.
> 3. A protoc plugin is the Protobuf-recommended way of parsing 
> .proto definitions: 
> https://developers.google.com/protocol-buffers/docs/proto3#generating

It doesn't immediately strike me as complicated, and 
https://github.com/msoucy/dproto apparently has this feature, so 
I'm guessing it can be used as a reference. Compile times are of 
course not expected to be good with this approach, but it looks 
promising if Stefan's New CTFE gets completed in the future. Then 
again, you likely have more experience with this, so I should 
probably defer this until New CTFE is ready.

>>   - There should be no need for a schema definition, a custom 
>> type annotated
>>     with UDAs should be enough
>>   struct Person
>>   {
>>       @protoID(1) string name;
>>       @protoID(2) int age;
>>   }
>>   serialize(Person("Walter", 42), stdout);
>
> protobuf-d does that already, see the unittest for toProtobuf: 
> https://github.com/dcarp/protobuf-d/blob/3f8a1a5129c98920e1652e965004ac77e9bb8ef1/src/google/protobuf/encoding.d#L193
>
>>
>> - Simple things should be simple
>> It should be dead simple to do basic stuff:
>>   auto obj = deserialize!SomeType(stdin);
>>   serialize(obj, stdout);
>
> Again, protobuf-d has that: 
> https://github.com/dcarp/protobuf-d/blob/3f8a1a5129c98920e1652e965004ac77e9bb8ef1/src/google/prot

I assumed that wasn't the case since the examples folder didn't 
have such code; thanks for pointing that out.
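
To make the UDA idea above concrete, here is a minimal sketch of how 
field numbers attached as UDAs can be read back at compile time. It 
is only an illustration: `protoID` (from my example above) and the 
helper `fieldNumbers` are hypothetical names, not protobuf-d's API, 
and no actual wire encoding is done here.

```d
import std.conv : to;
import std.traits : FieldNameTuple, getUDAs, hasUDA;

// Hypothetical UDA carrying a field number (assumption, not protobuf-d's attribute).
struct protoID
{
    uint number;
}

struct Person
{
    @protoID(1) string name;
    @protoID(2) int age;
}

// Returns "fieldName=number" for every field of T annotated with @protoID,
// in declaration order. Fields without the UDA are skipped.
string[] fieldNumbers(T)()
{
    string[] result;
    static foreach (name; FieldNameTuple!T)
    {
        static if (hasUDA!(__traits(getMember, T, name), protoID))
        {
            result ~= name ~ "="
                ~ getUDAs!(__traits(getMember, T, name), protoID)[0].number.to!string;
        }
    }
    return result;
}

// The lookup is usable at compile time, so a serializer could be generated from it.
static assert(fieldNumbers!Person() == ["name=1", "age=2"]);

void main()
{
    import std.stdio : writeln;

    writeln(fieldNumbers!Person());
}
```

A real serializer would use the same introspection to pick the wire 
type and tag for each field instead of building strings.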

>> - Complex things should be possible
>> The library should be flexible and extensible without 
>> modification
>
> toProtobuf, fromProtobuf, toJSONValue, fromJSONValue methods 
> are protobuf customization points in protobuf-d. For an example 
> see 
> https://github.com/dcarp/protobuf-d/blob/3f8a1a5129c98920e1652e965004ac77e9bb8ef1/src/google/protobuf/wrappers.d#L27-L54
>
>>
>> - Support for library and tool based usage
>> It should be usable as a library without any additional setup 
>> but also usable
>> as a schema compiler.
>
> protobuf-d is usable as library, see 
> https://github.com/huntlabs/grpc-dlang/blob/57c8fe9808f8e860c4b0668a83cdabd78b296ce5/dub.json#L9
> Regarding the usage as schema compiler, review the first 
> comment.

These are basically a checklist that I want to see fulfilled, 
whether or not the features already exist. Say, if I were to 
write flatbuffers-d, I would want to implement them.

>> - Support for common Phobos types
>> Nullable, tuples, std.datetime, std.complex, std.bigint, 
>> containers...
>
> Protobuf is a language-agnostic serialization format. Having 
> .proto definitions for common Phobos types will just shift 
> the problem somewhere else (i.e. other programming languages).
>
> Nevertheless, Protobuf probably addresses the same problem by 
> defining the "well-known" types 
> (https://developers.google.com/protocol-buffers/docs/reference/google.protobuf).
> protobuf-d also supports those, so that std.datetime.SysTime is 
> mapped to google.protobuf.Timestamp and std.datetime.Duration 
> to google.protobuf.Duration

Makes sense. I'm of the opinion that the API should support 
common types if there is a direct correspondence or a 
well-established convention for the type.

>> I'm personally not happy with any of the existing libraries 
>> but they will
>> likely be a valuable resource regardless.
>
> The existing protobuf libraries are quite mature and probably 
> improving those will be time better spent than starting once 
> again from scratch.

I feel like the documentation is somewhat lacking, since none of 
the things you mentioned are obvious from looking at the repo. 
Nevertheless, I'm happy to hear that protobuf-d is mature & 
feature complete.

>> Questions:
>> - How much work would be ideal for GSoC? Should I be working 
>> on flatbuffers
>>   only or protobuf too? (Seems like flatbuffers need more love)
>
> I'm quite satisfied with the protobuf-d implementation: it is 
> small (approx. 4k LOC), clean and quite feature complete - 26 
> failing conformance tests vs. 27 and 41, respectively, for the 
> official C++ and Java counterparts. Of course there is still 
> room for improvement, but at least in the case of protobuf-d 
> not enough for a GSoC application.
>
> On the other hand, Flatbuffers is a very good candidate: it has 
> its own specialties, but is also somewhat similar to protobuf. 
> This would reduce the planning risks considerably.

Agreed, I'm going to focus on flatbuffers in my proposal then.

>> - Should I tackle the std.serialization [3] idea?
>
> I see std.serialization as a high level API. Probably this will 
> be a long term std.experimental.serialization, which will 
> require quite some time until multiple serialization formats 
> implement it. Only after that, if it ever happens, can we 
> remove the "experimental" part. I don't see this as a suitable 
> GSoC project.

I see, thanks for the feedback.

>> - Any other serialization related suggestions?
> https://arrow.apache.org/

Thanks, I'll take a look.

