std.data.json formal review

Sat Aug 15 17:50:45 PDT 2015

On 8/15/2015 3:18 AM, Sönke Ludwig wrote:
>> I don't know what 'isStringInputRange' is. Whatever it is, it should be
>> a 'range of char'.
>
> I'll rename it to isCharInputRange. We don't have something like that in Phobos,
> right?

That's right, there isn't one. But I use:

     if (isInputRange!R && is(Unqual!(ElementEncodingType!R) == char))

I'm not a fan of adding more names for trivia; the deluge of names has its own costs.
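
To make that concrete, here is a minimal sketch of the constraint used inline on a function template, with no extra named trait. The function countQuotes is a hypothetical helper invented for illustration; it is not part of Phobos or of std.data.json.

```d
import std.range.primitives : isInputRange, ElementEncodingType;
import std.traits : Unqual;
import std.utf : byCodeUnit;

// Hypothetical helper (not in Phobos): counts '"' code units in any
// input range whose encoding type is char.
size_t countQuotes(R)(R r)
if (isInputRange!R && is(Unqual!(ElementEncodingType!R) == char))
{
    size_t n;
    foreach (c; r)
    {
        if (c == '"')
            ++n;
    }
    return n;
}

void main()
{
    // A plain string qualifies: its encoding type is immutable(char).
    assert(countQuotes(`{"a":1}`) == 2);
    // So does a code-unit view of the same string.
    assert(countQuotes(`{"a":1}`.byCodeUnit) == 2);
    // Ranges of other element types are rejected at compile time.
    static assert(!__traits(compiles, countQuotes([1, 2, 3])));
    static assert(!__traits(compiles, countQuotes("abc"w)));
}
```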

>> There is no reason to validate UTF-8 input. The only place where
>> non-ASCII code units can even legally appear is inside strings, and
>> there they can just be copied verbatim while looking for the end of the
>> string.
> The idea is to assume that any char based input is already valid UTF (as D
> defines it), while integer based input comes from an unverified source, so that
> it still has to be validated before being cast/copied into a 'string'. I think
> this is a sensible approach, both semantically and performance-wise.

The JSON parser will work fine without doing any validation at all. I've been 
implementing string handling code in Phobos with the idea of doing validation 
only if the algorithm requires it, and only for those parts that require it.

There are many validation algorithms in Phobos one can tack on. Having two 
implementations of every algorithm, one with an embedded, reinvented validation 
and one without, is too much.

The general idea with algorithms is that they do not combine things, but they 
enable composition.
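
As a rough sketch of what I mean by composition: the caller who holds unverified bytes runs an existing Phobos validator once, then hands plain char data to whatever comes next. The wrapper name asValidatedText is hypothetical and exists only for this example; std.utf.validate is the existing Phobos routine.

```d
import std.exception : assertThrown;
import std.utf : validate, UTFException;

// Hypothetical wrapper (not in Phobos): reinterprets unverified bytes as
// char data and validates them once, up front, with std.utf.validate.
const(char)[] asValidatedText(const(ubyte)[] raw)
{
    auto text = cast(const(char)[]) raw;
    validate(text); // throws UTFException on malformed UTF-8
    return text;
}

void main()
{
    // The bytes of {"a":1}
    const(ubyte)[] good = [0x7B, 0x22, 0x61, 0x22, 0x3A, 0x31, 0x7D];
    assert(asValidatedText(good) == `{"a":1}`);

    // 0xFF can never appear in well-formed UTF-8.
    const(ubyte)[] bad = [0x22, 0xFF, 0x22];
    assertThrown!UTFException(asValidatedText(bad));
}
```

The parser downstream then needs only one implementation, and callers with already-trusted input skip the validation step entirely.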

>> Why do both? Always return an input range. If the user wants a string,
>> he can pipe the input range to a string generator, such as .array
> Convenience for one.

Back to the previous point: does that mean that every algorithm in Phobos should 
have two versions, one that returns a range and the other a string? All these 
variations will result in a combinatorial explosion.

The other problem, of course, is that returning a string means the algorithm has 
to decide how to allocate that string. As much as possible, algorithms should 
not be making allocation decisions.
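
A small sketch of the alternative, assuming a hypothetical function quoted that stands in for any range-returning algorithm: it allocates nothing itself, and the caller decides whether the result goes to the GC heap or into a fixed buffer.

```d
import std.array : array;
import std.range : chain, only;
import std.utf : byCodeUnit;

// Hypothetical stand-in for a range-returning algorithm: lazily wraps a
// range of char in double quotes and performs no allocation.
auto quoted(R)(R r)
{
    return chain(only('"'), r, only('"'));
}

void main()
{
    // Caller 1 wants a GC-allocated array and says so with .array.
    auto heap = quoted("abc".byCodeUnit).array;
    assert(heap == `"abc"`);

    // Caller 2 fills a fixed-size stack buffer instead.
    char[16] buf;
    size_t n;
    foreach (c; quoted("abc".byCodeUnit))
        buf[n++] = c;
    assert(buf[0 .. n] == `"abc"`);
}
```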

> The lack of number to input range conversion functions is
> another concern. I'm not really keen to implement an input range style
> floating-point to string conversion routine just for this module.

Not sure what you mean. Phobos needs such routines anyway, and you still have to 
do something about floating point.

> Finally, I'm a little worried about performance. The output range based approach
> can keep a lot of state implicitly using the program counter register. But an
> input range would explicitly have to keep track of the current JSON element, as
> well as the current character/state within that element (and possibly one level
> deeper, for example for escape sequences). This means that it will require
> either multiple branches or indirection for each popFront().

Often this is made up for by not needing to allocate storage. Also, that state 
is in the cached "hot zone" on top of the stack, which is much faster to access 
than a cold uninitialized array.

I share your concern with performance, and I had very good results with Warp by 
keeping all the state on the stack in this manner.
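
For illustration only, here is a minimal sketch of the kind of input range under discussion. It is not the Warp code and not the std.data.json design; JsonEscaper and jsonEscaper are hypothetical names, and it escapes only '"' and '\'. The point is that the explicit state is one bool living next to the source range in a struct on the stack, and that popFront() costs one branch on it.

```d
import std.algorithm.comparison : equal;
import std.range.primitives : isInputRange, ElementType;
import std.traits : Unqual;
import std.utf : byCodeUnit;

// Hypothetical sketch: lazily escapes '"' and '\' in a range of char.
// Requires a code-unit range (for example str.byCodeUnit), not a raw string.
struct JsonEscaper(R)
if (isInputRange!R && is(Unqual!(ElementType!R) == char))
{
    private R src;
    // True once the backslash has been emitted and the escaped
    // character itself is still owed.
    private bool pending;

    this(R r) { src = r; }

    @property bool empty() { return !pending && src.empty; }

    @property char front()
    {
        if (!pending && needsEscape(src.front))
            return '\\';
        return src.front;
    }

    void popFront()
    {
        if (!pending && needsEscape(src.front))
        {
            pending = true;       // backslash consumed, character comes next
        }
        else
        {
            pending = false;
            src.popFront();
        }
    }

    private static bool needsEscape(char c) { return c == '"' || c == '\\'; }
}

auto jsonEscaper(R)(R r) { return JsonEscaper!R(r); }

void main()
{
    // Input:  a"b    Output:  a\"b
    assert(jsonEscaper(`a"b`.byCodeUnit).equal(`a\"b`.byCodeUnit));
    // Input:  \      Output:  \\
    assert(jsonEscaper(`\`.byCodeUnit).equal(`\\`.byCodeUnit));
}
```

A real serializer would need more states than this, so take it as a shape, not a benchmark.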