std.data.json formal review
deadalnix via Digitalmars-d
digitalmars-d at puremagic.com
Tue Aug 11 15:21:13 PDT 2015
On Tuesday, 11 August 2015 at 21:06:24 UTC, Sönke Ludwig wrote:
> See
> http://s-ludwig.github.io/std_data_json/stdx/data/json/value/JSONValue.payload.html
>
> The question whether each field is "really" needed obviously
> depends on the application. However, the biggest type is BigInt
> that, from a quick look, contains a dynamic array + a bool
> field, so it's not as compact as it could be, but also not
> really large. There is also an additional Location field that
> may sometimes be important for good error messages and the like
> and sometimes may be totally unneeded.
>
Urg. Looks like BigInt should steal a bit somewhere instead of
carrying a separate bool like this. That is not really your
lib's fault, but it's quite a heavy cost.
Consider this: if the struct fits into 2 registers, it will be
passed around as such rather than in memory. That is a
significant difference, for BigInt itself and, by proxy, for the
JSON library.
Putting the BigInt issue aside, it seems like the biggest field
in there is an array of JSONValues or a string. For the string,
you can artificially shorten the length field by 3 bits to stick
a tag in it. That still allows absurdly large strings. For the
JSONValue case, the alignment of the pointer is such that you can
steal 3 bits from it. Or, as with the string, the length can be
used.
It seems very achievable to me to have the JSONValue struct fit
into 2 registers, granted the tag fits in 3 bits (8 different
types).
I can help with that if you want to.
> However, my goal when implementing this has never been to make
> the DOM representation as efficient as possible. The simple
> reason is that a DOM representation is inherently inefficient
> when compared to operating on the structure using either the
> pull parser or using a deserializer that directly converts into
> a static D type. IMO these should be advertised instead of
> trying to milk a dead cow (in terms of performance).
>
Indeed. Still, JSON nodes should be as lightweight as possible.
>> 2/ As far as I can see, the elements are discriminated using
>> typeid. An enum is preferable as the compiler would know the
>> values ahead of time and optimize based on this. It also allows
>> use of things like final switch.
>
> Using a tagged union like structure is definitely what I'd like
> to have, too. However, the main goal was to build the DOM type
> upon a generic algebraic type instead of using a home-brew
> tagged union. The reason is that it automatically makes
> different DOM types with a similar structure interoperable
> (JSON/BSON/TOML/...).
>
That is a great point that I hadn't considered. I'd go the other
way around about it: provide a compatible typeid-based struct
from the enum-tagged one for compatibility. It can even be alias
this so that the transition is transparent.
The transformation is not bijective, so it would be best to have
the most restrictive form (the enum) as the primary one and fall
back on the least restrictive one (alias this) when wanted.
> Now Phobos unfortunately only has Algebraic, which not only
> doesn't have a type enum, but is currently also really bad at
> keeping static type information when forwarding function calls
> or operators. The only options were basically to resort to
> Algebraic for now, but have something that works, or to first
> implement an alternative algebraic type and get it accepted
> into Phobos, which would delay the whole process nearly
> indefinitely.
>
That's fine; done is better than perfect. Still, API changes
tend to be problematic, so we need to nail that part at least,
and an enum with a fallback on a typeid-based solution seems
like the best option.
> Or do you perhaps mean the JSON -> deserialize -> manipulate ->
> serialize -> JSON approach? That definitely is not a "loser
> strategy"*, but yes, it is limited to applications where you
> have a partially fixed schema. However, arguably most
> applications fall into that category.
>
Yes.