std.data.json formal review

Sat Aug 22 05:21:36 PDT 2015

On 17.08.2015 at 00:03, Walter Bright wrote:
> On 8/16/2015 5:34 AM, Sönke Ludwig wrote:
>> On 16.08.2015 at 02:50, Walter Bright wrote:
>>>      if (isInputRange!R && is(Unqual!(ElementEncodingType!R) == char))
>>>
>>> I'm not a fan of more names for trivia, the deluge of names has its own
>>> costs.
>>
>> Good, I'll use `if (isInputRange!R &&
>> (isSomeChar!(ElementEncodingType!R) ||
>> isIntegral!(ElementEncodingType!R)))`. It's just used in a number of
>> places and quite a bit more verbose (twice as long), and I guess a large
>> number of algorithms in Phobos accept char ranges, so that may actually
>> warrant a name in this case.
>
> Except that there is no reason to support wchar, dchar, int, ubyte, or
> anything other than char. The idea is not to support something just
> because you can, but there should be an identifiable, real use case for
> it first. Has anyone ever seen Json data as ulongs? I haven't either.

But you have seen ubyte[] when reading data from a file or from a
network stream. Still, since Andrei now also wants to remove it, so be it.
I'll answer some of the other points anyway:

>>> The json parser will work fine without doing any validation at all. I've
>>> been implementing string handling code in Phobos with the idea of doing
>>> validation only if the algorithm requires it, and only for those parts
>>> that require it.
>>
>> Yes, and it won't do that if a char range is passed in. If the integral
>> range path gets removed, there are basically two possibilities left:
>> perform the validation up front (slower), or risk UTF exceptions in
>> unrelated parts of the code base. I don't see why we shouldn't take the
>> opportunity for a full and fast validation here. But I'll relay this to
>> Andrei; it was his idea originally.
>
> That argument could be used to justify validation in every single
> algorithm that deals with strings.

Not really for all of them, but there are indeed more where this could
apply in theory. However, JSON is frequently used in situations where
parsing speed, or performance in general, is crucial (e.g. web services),
which makes it stand out for practical reasons. Other cases, such as an
XML parser, would qualify too, but probably none of the generic string
manipulation functions would.

>>>>> Why do both? Always return an input range. If the user wants a string,
>>>>> he can pipe the input range to a string generator, such as .array
>>>> Convenience for one.
>>>
>>> Back to the previous point, that means that every algorithm in Phobos
>>> should have two versions, one that returns a range and the other a
>>> string? All these variations will result in a combinatorial explosion.
>>
>> This may be a factor of two, but not a combinatorial explosion.
>
> We're already up to validate or not, to string or not, i.e. 4 combinations.

Validation is part of the lexer and not the generator, so there is no
combinatorial relation between the two. Validation is also just a
template parameter, so it does not lead to two separate implementations
either. There is just a "static if" statement somewhere that decides
whether validate() gets called or not.
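
To illustrate the point, here is a minimal sketch of that pattern. It is not
the actual std.data.json lexer code; the ValidateInput flag and the
countObjectStarts function are made-up names for this example. Only
std.utf.validate is a real Phobos function.

```d
import std.utf : validate;

// Hypothetical flag, named for illustration only.
enum ValidateInput { no, yes }

// Toy stand-in for a lexer entry point: counts '{' code units in the input.
// Validation is selected at compile time, so there is a single
// implementation and the check costs nothing when it is disabled.
size_t countObjectStarts(ValidateInput v = ValidateInput.yes)(string input)
{
    static if (v == ValidateInput.yes)
        validate(input); // throws std.utf.UTFException on malformed UTF-8

    size_t n;
    foreach (char c; input) // iterate code units, no decoding
        if (c == '{')
            ++n;
    return n;
}
```

A caller that has already validated its input elsewhere would instantiate
countObjectStarts!(ValidateInput.no) and skip the check entirely.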

>>> The other problem, of course, is that returning a string means the
>>> algorithm has to decide how to allocate that string. As much as
>>> possible, algorithms should not be making allocation decisions.
>>
>> Granted, the fact that format() and to!() support input ranges (I didn't
>> notice that until now) makes the issue less important. But without
>> those, it would basically mean that almost all places that generate JSON
>> strings would have to import std.array and append .array. Nothing
>> particularly bad if viewed in isolation, but it makes the language appear
>> a lot less clean/more verbose if it occurs often. It's also a stumbling
>> block for language newcomers.
>
> This has been argued before, and the problem is it applies to EVERY
> algorithm in Phobos, and winds up with a doubling of the number of
> functions to deal with it. I do not view this as clean.
>
> D is going to be built around ranges as a fundamental way of coding.
> Users will need to learn something about them. Appending .array is not a
> big hill to climb.

It isn't if you get taught about it. But it surely is if you don't know
about it yet and try to get something working based only on the JSON API
(a language newcomer who wants to work with JSON). It's also still an
additional thing to remember, type and read, which makes it an additional
piece of cognitive load, even for developers who are fluent in ranges.
Accumulate enough such pieces and they add up to the point where
productivity is brought to its knees.
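
For concreteness, here is a rough sketch of the two API shapes under
discussion. The functions toJsonRange and toJsonString are hypothetical and
not part of the proposed module; they only show what the caller has to write
in each case.

```d
import std.algorithm.iteration : joiner, map;
import std.array : array;
import std.conv : to;
import std.range : chain;

// Hypothetical range-returning generator: lazily renders an int array as
// JSON text and leaves every allocation decision to the caller.
auto toJsonRange(const(int)[] values)
{
    return chain("[", values.map!(v => v.to!string).joiner(","), "]");
}

// Hypothetical convenience overload: does the .array step for the user.
// Note that .array on this range yields dchar[], so one more conversion
// is needed to end up with an actual string.
string toJsonString(const(int)[] values)
{
    return toJsonRange(values).array.to!string;
}
```

With only the first function available, user code has to know about
std.array (and here std.conv as well) before it can obtain a plain string.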

I personally already find it quite annoying to constantly have to import
std.range, std.array and std.algorithm just to use some small piece of
functionality in std.algorithm. It's also often not clear which of the
three modules/packages contains a certain function. We need to find a
better balance here if D is to keep its appeal as a language where you
stay in "the zone" (a.k.a. flow), which has always been a big thing for me.
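
A small made-up example of what I mean: evenSquares is a hypothetical
helper, but the three imports are exactly what this one-liner needs.

```d
import std.algorithm : filter, map; // filter and map live here
import std.array : array;           // array lives here
import std.range : iota;            // iota lives here

// Squares of the even numbers in [0, n), collected into an array.
int[] evenSquares(int n)
{
    return iota(n).filter!(i => i % 2 == 0).map!(i => i * i).array;
}
```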