std.data.json formal review

Sat Aug 15 03:18:24 PDT 2015

On 14.08.2015 at 10:17, Walter Bright wrote:
> On 8/13/2015 11:52 PM, Sönke Ludwig wrote:
>> On 14.08.2015 at 02:26, Walter Bright wrote:
>>> On 8/13/2015 3:51 AM, Sönke Ludwig wrote:
>>>> These were, AFAICS, the only major open issues (a decision for an
>>>> opt() variant would be nice, but fortunately that's not a fundamental
>>>> decision in any way).
>>>
>>> 1. What about the issue of having the API be a composable range
>>> interface?
>>>
>>> http://s-ludwig.github.io/std_data_json/stdx/data/json/lexer/lexJSON.html
>>>
>>>
>>> I.e. the input range should be the FIRST argument, not the last.
>>
>> Hm, it *is* the first function argument, just the last template argument.
>
> Ok, my mistake. I didn't look at the others.
>
> I don't know what 'isStringInputRange' is. Whatever it is, it should be
> a 'range of char'.

I'll rename it to isCharInputRange. We don't have something like that in 
Phobos, right?
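
A minimal sketch of what such a trait could look like, built only from
existing Phobos primitives. The name isCharInputRange is the proposed one;
the definition is an assumption for illustration, not the module's actual
code. It uses ElementEncodingType so that a plain string counts as a range
of char code units rather than of decoded dchar:

```d
import std.range.primitives : isInputRange, ElementEncodingType;
import std.traits : isSomeChar;

// Hypothetical definition: an input range whose code units are
// char, wchar or dchar (qualifiers such as immutable are ignored).
enum isCharInputRange(R) =
    isInputRange!R && isSomeChar!(ElementEncodingType!R);

// Example constraint on a lexer-like function (name is illustrative only).
size_t countCodeUnits(R)(R input)
    if (isCharInputRange!R)
{
    size_t n;
    for (; !input.empty; input.popFront())
        ++n;
    return n;
}
```

Integer-based ranges such as ubyte[] deliberately do not satisfy this
trait, so they would need a separate constraint.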

>>> 2. Why are integers acceptable as lexer input? The spec specifies
>>> Unicode.
>> In this case, the lexer will perform on-the-fly UTF validation of the
>> input. It can do so more efficiently than first validating the input
>> using a wrapper range, because it has to check the value of most
>> incoming code units anyway.
>
> There is no reason to validate UTF-8 input. The only place where
> non-ASCII code units can even legally appear is inside strings, and
> there they can just be copied verbatim while looking for the end of the
> string.

The idea is to assume that any char-based input is already valid UTF (as
D defines it), while integer-based input comes from an unverified
source, so it still has to be validated before being cast/copied
into a 'string'. I think this is a sensible approach, both semantically
and performance-wise.
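
To illustrate the distinction, here is a rough sketch (not the lexer's
actual code; acceptInput is a made-up name) that dispatches on the element
type. It validates eagerly with std.utf.validate for brevity, whereas the
lexer would fold the check into its normal scanning loop:

```d
import std.utf : validate, UTFException;

// Hypothetical helper: turn raw input into a D string.
string acceptInput(T)(immutable(T)[] input)
{
    static if (is(T == char))
    {
        // char-based input is assumed to be valid UTF-8 already.
        return input;
    }
    else static if (is(T == ubyte))
    {
        // Integer-based input is untrusted: reinterpret the bytes,
        // then validate before handing the result out as a string.
        auto text = cast(string) input;
        validate(text); // throws UTFException on malformed UTF-8
        return text;
    }
    else
        static assert(0, "unsupported element type");
}
```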

>
>
>>> 3. Why are there 4 functions that do the same thing?
>>>
>>> http://s-ludwig.github.io/std_data_json/stdx/data/json/generator.html
>>>
>>> After all, there already is a
>>> http://s-ludwig.github.io/std_data_json/stdx/data/json/generator/GeneratorOptions.html
>>>
>> There are two classes of functions that are not covered by
>> GeneratorOptions: writing to a stream or returning a string.
>
> Why do both? Always return an input range. If the user wants a string,
> he can pipe the input range to a string generator, such as .array

Convenience, for one. The lack of number-to-input-range conversion
functions is another concern. I'm not really keen to implement an
input-range-style floating-point to string conversion routine just for
this module.
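
As a sketch of why the output range variant is cheap to provide (function
names are invented for illustration and are not the generator's API): the
core writes into any output range and can reuse std.format for the
numbers, and the string-returning variant is a thin wrapper around it.

```d
import std.array : appender;
import std.format : formattedWrite;
import std.range.primitives : put;

// Core routine: writes a JSON array of numbers to any output range.
// Number formatting is delegated to the existing formattedWrite.
void writeJsonArray(Output)(ref Output dst, const(double)[] values)
{
    put(dst, '[');
    foreach (i, v; values)
    {
        if (i > 0)
            put(dst, ',');
        formattedWrite(dst, "%g", v);
    }
    put(dst, ']');
}

// Convenience wrapper: same output, collected into a string.
string toJsonArrayString(const(double)[] values)
{
    auto app = appender!string();
    writeJsonArray(app, values);
    return app.data;
}
```

A lazy input range version would need its own incremental number
formatter, since formattedWrite only targets output ranges.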

Finally, I'm a little worried about performance. The output-range-based
approach can keep a lot of state implicitly using the program counter
register. But an input range would explicitly have to keep track of the
current JSON element, as well as the current character/state within that
element (and possibly one level deeper, for example for escape
sequences). This means that it will require either multiple branches or
indirection for each popFront().
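
A deliberately simplified sketch of the difference, restricted to escaping
a string body and handling only the quote, backslash and newline
characters (all names are hypothetical). The output range version keeps
"I am in the middle of an escape" implicitly in its control flow, while
the input range version has to store it in a field and branch on it in
every front/popFront call:

```d
import std.range.primitives : put;

bool needsEscape(char c) { return c == '"' || c == '\\' || c == '\n'; }
char escapeLetter(char c) { return c == '\n' ? 'n' : c; }

// Output range style: the position inside an escape sequence is
// implicit in which put() call is currently executing.
void writeEscaped(Output)(ref Output dst, string src)
{
    foreach (char c; src)
    {
        if (needsEscape(c))
        {
            put(dst, '\\');
            put(dst, escapeLetter(c));
        }
        else
            put(dst, c);
    }
}

// Input range style: the same information must be explicit state.
struct EscapedChars
{
    string src;          // remaining unescaped input
    bool escapePending;  // true once the backslash has been consumed

    @property bool empty() const { return src.length == 0; }

    @property char front() const
    {
        if (escapePending)
            return escapeLetter(src[0]); // second half of the escape
        return needsEscape(src[0]) ? '\\' : src[0];
    }

    void popFront()
    {
        if (!escapePending && needsEscape(src[0]))
        {
            escapePending = true; // stay on the same source character
            return;
        }
        escapePending = false;
        src = src[1 .. $];
    }
}
```

A full generator would need this kind of state for every nesting level
and token type, not just for escapes.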