RFC: std.json sucessor

Andrei Alexandrescu via Digitalmars-d digitalmars-d at puremagic.com
Sun Oct 12 11:17:33 PDT 2014


Here's my destruction of std.data.json.

* lexer.d:

** Beautifully done. From what I understand, if the input is string or 
immutable(ubyte)[] then the strings are carved out as slices of the 
input, as opposed to newly allocated. Awesome.

** The string after lexing is correctly scanned and stored in raw format 
(escapes are not rewritten) and decoded on demand. Problem with decoding 
is that it may allocate memory, and it would be great (and not 
difficult) to make the lexer 100% lazy/non-allocating. To achieve that, 
lexer.d should define TWO "Kind"s of strings at the lexer level: regular 
string and undecoded string. The former is lexer.d's way of saying "I 
got lucky" in the sense that it didn't detect any '\\' so the raw and 
decoded strings are identical. No need for anyone to do any further 
processing in the majority of cases => win. The latter means the lexer 
lexed the string, saw at least one '\\', and leaves it to the caller to 
do the actual decoding.

** After moving the decoding business out of lexer.d, a way to take this 
further would be to qualify lexer methods as @nogc if the input is 
string/immutable(ubyte)[]. I wonder how to implement a conditional 
attribute. We'll probably need a language enhancement for that.

** The implementation uses manually-defined tagged unions for work. 
Could we use Algebraic instead - dogfooding and all that? I recall there 
was a comment in Sönke's original work that Algebraic has a specific 
issue (was it false pointers?) - so the question arises, should we fix 
Algebraic and use it thus helping other uses as well?

** I see the "boolean" kind, should we instead have the "true_" and 
"false_" kinds?

** Long story short I couldn't find any major issue with this module, 
and I looked! I do think the decoding logic should be moved outside of 
lexer.d or at least the JSONLexerRange.

* generator.d: looking good, no special comments. Like the consistent 
use of structs filled with options as template parameters.

* foundation.d:

** At four words per token, Location seems pretty bulky. How about 
reducing line and column to uint?

** Could JSONException create the message string in toString (i.e. 
when/if used) as opposed to in the constructor?

* parser.d:

** How about using .init instead of .defaults for options?

** I'm a bit surprised by JSONParserNode.Kind. E.g. the objectStart/End 
markers shouldn't appear as nodes. There should be an "object" node 
only. I guess that's needed for laziness.

** It's unclear where memory is being allocated in the parser. @nogc 
annotations wherever appropriate would be great.

* value.d:

** Looks like this is/may be the only place where memory is being 
managed, at least if the input is string/immutable(ubyte)[]. Right?

** Algebraic ftw.

============================

Overall: This is very close to everything I hoped! A bit more care to 
@nogc would be awesome, especially with the upcoming focus on memory 
management going forward.

After one more pass it would be great to move forward for review.


Andrei


More information about the Digitalmars-d mailing list