std.data.json formal review

Sönke Ludwig via Digitalmars-d digitalmars-d at puremagic.com
Wed Jul 29 01:18:03 PDT 2015


Am 29.07.2015 um 00:29 schrieb Walter Bright:
> On 7/28/2015 7:07 AM, Atila Neves wrote:
>> Start of the two week process, folks.
>
> Thank you very much, Sönke, for taking this on. Thank you, Atila, for
> taking on the thankless job of being review manager.
>
> Just looking at the documentation only, some general notes:
>
> 1. Not sure that 'JSON' needs to be embedded in the public names.
> 'parseJSONStream' should just be 'parseStream', etc. Name
> disambiguation, if needed, should be ably taken care of by a number of D
> features for that purpose. Additionally, I presume that the stdx.data
> package implies a number of different formats. These formats should all
> use the same names with as similar as possible APIs - this won't work
> too well if JSON is embedded in the APIs.

This is actually one of my pet peeves. Having a *readable* API that 
tells the reader immediately what happens is IMO one of the most 
important qualities (far more important than an API that is quick to 
type). I've seen a number of D codebases whose names omit part of what 
the code actually does, and the result was that it was constantly 
necessary to scroll up to see where a particular name might come from. 
So I have a strong preference for keeping "JSON" in the names, because 
it's an integral part of the semantics.

>
> 2. JSON is a trivial format, http://json.org/. But I count 6 files and
> 30 names in the public API.

The whole thing provides a stream parser with high-level helpers to make 
it convenient to use, a DOM module, a separate lexer, and a generator 
module that operates in several different modes (maybe two additional 
modes still to come!). Every single function provides real and 
frequently useful benefits. So if anything, there are still some little 
things missing.

All in all, even if JSON is a simple format, the source code is 
already almost 5k LOC (including unit tests, of course). But apart from 
maintainability, the modules have mainly been kept separate to minimize 
the amount of code that needs to be dragged in for a particular piece of 
functionality (not only other JSON modules, but also code from different 
parts of Phobos).

>
> 3. Stepping back a bit, when I think of parsing JSON data, I think:
>
>      auto ast = inputrange.toJSON();
>
> where toJSON() accepts an input range and produces a container, the ast.
> The ast is just a JSON value. Then, I can just query the ast to see what
> kind of value it is (using overloading), and walk it as necessary.

We can drop the "Value" part of the name, of course, if we expect that 
function to be used a lot, but there is still the parseJSONStream 
function, which is arguably no less important. BTW, so far you've only 
mentioned the DOM part, but for any code where performance is a 
priority, the stream-based pull parser is basically the way to go. It 
would also be the natural entry point for any serialization library.

And my prediction is that, if we do it right, working with JSON will in 
most cases simply mean "S s = deserializeJSON(json_input);", where S is 
a D struct that gets populated with the deserialized JSON data. Where 
that doesn't fit, performance-oriented code would use the pull parser. 
So the DOM part of the system, which is the only thing the current JSON 
module has, will be left as niche functionality.
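To make the prediction concrete, here is a minimal sketch of what such a deserializeJSON could look like, built on the existing std.json DOM purely for illustration. The name deserializeJSON, its signature, and the reflection approach are assumptions of mine, not part of the proposal, and only int and string fields are handled to keep it short:

```d
import std.json : parseJSON;

/// Hypothetical sketch: populate a struct's fields from a JSON object
/// by iterating over the struct's members at compile time.
T deserializeJSON(T)(string json)
{
    auto dom = parseJSON(json);
    T result;
    foreach (i, ref field; result.tupleof)
    {
        // look up each struct field by name in the JSON object
        enum name = __traits(identifier, T.tupleof[i]);
        if (auto p = name in dom.object)
        {
            static if (is(typeof(field) == int))
                field = cast(int) p.integer;
            else static if (is(typeof(field) == string))
                field = p.str;
        }
    }
    return result;
}

unittest
{
    static struct S { int x; string name; }
    S s = deserializeJSON!S(`{"x": 42, "name": "hi"}`);
    assert(s.x == 42 && s.name == "hi");
}
```

A real serialization library would of course handle nested objects, arrays, and missing/extra fields, and would sit on top of the pull parser rather than a DOM.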

> To create output:
>
>      auto r = ast.toChars();  // r is an InputRange of characters
>      writeln(r);

Do we have an InputRange version of the various number-to-string 
conversions? It would be quite inconvenient to reinvent those (double, 
long, BigInt) in the JSON package. Of course, using to!string internally 
would be an option, but it would obviously destroy all @nogc 
opportunities and performance benefits.
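For illustration, here is roughly what a lazy, @nogc number-to-chars range could look like for unsigned integers. The name digitRange and the design are mine, not anything in Phobos or the proposal, and double and BigInt would need considerably more work:

```d
/// Illustrative sketch: a lazy input range over the decimal digits of a
/// ulong, most significant digit first, without any allocation.
struct DecimalDigits
{
    private ulong value;
    private ulong div; // power of ten selecting the current digit

    this(ulong v) @nogc nothrow @safe
    {
        value = v;
        div = 1;
        // position div at the most significant digit
        while (div <= v / 10)
            div *= 10;
    }

    @property bool empty() const @nogc nothrow @safe { return div == 0; }
    @property char front() const @nogc nothrow @safe
    {
        return cast(char)('0' + (value / div) % 10);
    }
    void popFront() @nogc nothrow @safe { div /= 10; }
}

DecimalDigits digitRange(ulong v) @nogc nothrow @safe
{
    return DecimalDigits(v);
}

unittest
{
    import std.algorithm.comparison : equal;
    assert(digitRange(12345).equal("12345"));
    assert(digitRange(0).equal("0"));
}
```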

>
> So, we'll need:
>      toJSON
>      toChars
>      JSONException
>
> The possible JSON values are:
>      string
>      number
>      object (associative arrays)
>      array
>      true
>      false
>      null
>
> Since these are D builtin types, they can actually be a simple union of
> D builtin types.

The idea is to make JSONValue a simple alias for Algebraic!(...); it's 
just that there are currently still some workarounds for DMD < 2.067.0 
on top, which means that JSONValue is, for the time being, a struct that 
inherits from Algebraic via "alias this". Those workarounds will be 
removed when the code is actually put into Phobos.

But a simple union would obviously not be enough: it still needs a type 
tag of some form, and it needs to provide a @safe interface on top. 
Algebraic is the only thing that comes close right now, but I'd really 
prefer a fully statically typed version of Algebraic that uses an enum 
as the type tag instead of working with delegates/TypeInfo.
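To make that distinction concrete, an enum-tagged union of the kind described might look roughly like this. All names and the layout are illustrative only, and a real version would cover every JSON type with a much richer @safe surface:

```d
/// Illustrative sketch: an explicit enum tag instead of Algebraic's
/// TypeInfo/delegate machinery.
struct TaggedJSONValue
{
    enum Kind { null_, boolean, number, str, array, object }

    Kind kind;

    private union
    {
        bool boolean_;
        double number_;
        string str_;
        TaggedJSONValue[] array_;
        TaggedJSONValue[string] object_;
    }

    this(double v) { kind = Kind.number; number_ = v; }
    this(string v) { kind = Kind.str; str_ = v; }

    /// The tag is checked before the union is read, which is what lets
    /// this be exposed as @trusted.
    T get(T)() @trusted
    {
        static if (is(T == double))
        {
            assert(kind == Kind.number);
            return number_;
        }
        else static if (is(T == string))
        {
            assert(kind == Kind.str);
            return str_;
        }
        else
            static assert(false, "type not covered in this sketch");
    }
}

unittest
{
    auto v = TaggedJSONValue(3.14);
    assert(v.kind == TaggedJSONValue.Kind.number);
    assert(v.get!double == 3.14);
    assert(TaggedJSONValue("hi").get!string == "hi");
}
```

Because the tag is a plain enum, switching over it is statically checkable and there is no runtime TypeInfo lookup involved.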

>
> There is a decision needed about whether toJSON() allocates data or
> returns slices into its inputrange. This can be 'static if' tested by:
> if inputrange can return immutable slices.

The test is currently "is(T == string) || is(T == immutable(ubyte)[])"; 
slicing is done in those cases, and the non-DOM parser interface is 
even @nogc as long as exceptions are disabled.
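In code, the slice-vs-copy decision looks roughly like this. takeToken is a made-up helper for illustration, not the actual lexer internals:

```d
/// The condition quoted above: inputs whose elements are immutable
/// can safely be sliced instead of copied.
enum canSliceInput(T) = is(T == string) || is(T == immutable(ubyte)[]);

/// Made-up helper showing the decision; the real lexer is more involved.
auto takeToken(T)(ref T input, size_t len)
if (is(T : const(char)[]) || is(T : const(ubyte)[]))
{
    auto tok = input[0 .. len];
    input = input[len .. $];
    static if (canSliceInput!T)
        return tok;      // zero-copy: slice into the immutable input
    else
        return tok.idup; // mutable input: copy into immutable storage
}

unittest
{
    string s = `{"a":1}`;
    assert(takeToken(s, 4) == `{"a"`);
    assert(s == `:1}`);

    char[] m = "true".dup;
    auto t = takeToken(m, 4);
    static assert(is(typeof(t) == string)); // copied, not sliced
    assert(t == "true");
}
```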

> toChars() can take a compile
> time argument to determine if it is 'pretty' or not.

As long as JSON DOM values are stored in a generic Algebraic (which is a 
huge win in terms of interoperability!), toChars won't suffice as a 
name. It would have to be toJSON(Chars) (as it basically is now). I 
gave the "pretty" version a separate name simply because it's more 
convenient to use, and pretty printing will probably be by far the most 
frequently used option when converting to a string.

