std.data.json formal review

Dmitry Olshansky via Digitalmars-d digitalmars-d at puremagic.com
Sat Aug 15 23:52:52 PDT 2015


On 16-Aug-2015 03:50, Walter Bright wrote:
> On 8/15/2015 3:18 AM, Sönke Ludwig wrote:
>>> There is no reason to validate UTF-8 input. The only place where
>>> non-ASCII code units can even legally appear is inside strings, and
>>> there they can just be copied verbatim while looking for the end of the
>>> string.
>> The idea is to assume that any char based input is already valid UTF
>> (as D defines it), while integer based input comes from an unverified
>> source, so that it still has to be validated before being cast/copied
>> into a 'string'. I think this is a sensible approach, both
>> semantically and performance-wise.
>
> The json parser will work fine without doing any validation at all. I've
> been implementing string handling code in Phobos with the idea of doing
> validation only if the algorithm requires it, and only for those parts
> that require it.
>

Aye.
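
In code, that split could look roughly like this (a sketch of the idea
only, not the actual std.data.json API; asJsonText is a made-up name):

import std.utf : validate;

// char-based input is taken to be valid UTF-8 already (that is what
// D's string type promises); ubyte-based input comes from an
// unverified source and gets validated once before it is treated as
// text.
string asJsonText(T)(T[] input) if (is(T == char) || is(T == ubyte))
{
    static if (is(T == ubyte))
    {
        auto text = cast(const(char)[]) input;
        validate(text);        // throws UTFException on malformed UTF-8
        return text.idup;
    }
    else
        return input.idup;     // assumed valid, no check, no decode
}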

> There are many validation algorithms in Phobos one can tack on - having
> two implementations of every algorithm, one with an embedded reinvented
> validation and one without - is too much.

Actually there are next to none. `validate`, which throws on failed 
validation, is a misnomer.

> The general idea with algorithms is that they do not combine things, but
> they enable composition.
>

At a lower level, such as in tokenizers, combining a couple of simple 
steps makes sense because it makes things run faster. It usually 
eliminates the need for a temporary result that must be digestible by 
the next range.
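
For a tokenizer that might look roughly like the following (a sketch
only; TokenKind and peekToken are made-up names, not std.data.json's
API): skip the whitespace and classify the next significant character
in one pass, rather than handing a stripped temporary range from one
step to the next.

enum TokenKind { objectStart, arrayStart, stringStart, number, other, eof }

// Skip whitespace and classify the next token in a single scan over
// the code units.
TokenKind peekToken(ref const(char)[] input) pure nothrow @nogc @safe
{
    size_t i = 0;
    while (i < input.length &&
           (input[i] == ' ' || input[i] == '\t' ||
            input[i] == '\n' || input[i] == '\r'))
        ++i;
    input = input[i .. $];
    if (input.length == 0)
        return TokenKind.eof;
    switch (input[0])
    {
        case '{': return TokenKind.objectStart;
        case '[': return TokenKind.arrayStart;
        case '"': return TokenKind.stringStart;
        case '-':
        case '0': .. case '9':
            return TokenKind.number;
        default:  return TokenKind.other;
    }
}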

For instance, by "combining" decoding and character classification, 
one may side-step generating the code point value itself (because the 
decoder no longer has to produce it for the top-level algorithm).
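
Something along these lines (again only a sketch; skipStringPayload is
a made-up name): all of JSON's delimiters are ASCII, so the scanner
can classify raw code units and never needs the code point that a
multi-byte sequence encodes; non-ASCII units can only occur inside a
string's payload and are copied through verbatim.

// Length of the prefix that can be copied verbatim: stops at the
// closing quote or at an escape sequence.  Any code unit >= 0x80 is
// part of the string payload and is passed over without decoding.
size_t skipStringPayload(const(char)[] s) pure nothrow @nogc @safe
{
    foreach (i, char c; s)     // iterates code units, not code points
    {
        if (c == '"' || c == '\\')
            return i;
    }
    return s.length;
}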


-- 
Dmitry Olshansky

