RFC: std.json successor

Sönke Ludwig via Digitalmars-d digitalmars-d at puremagic.com
Mon Aug 25 14:27:43 PDT 2014


On 25.08.2014 22:51, "Ola Fosheim Grøstad" 
<ola.fosheim.grostad+dlang at gmail.com> wrote:
> On Monday, 25 August 2014 at 20:35:32 UTC, Sönke Ludwig wrote:
>> BTW, JSON is *required* to be UTF encoded anyway as per RFC-7159,
>> which is another argument for just letting the lexer assume valid UTF.
>
> The lexer cannot assume valid UTF since the client might be a rogue, but
> it can just bail out if the lookahead isn't JSON? So UTF validation is
> limited to strings.

But why should UTF validation be the job of the lexer in the first 
place? D's "string" type is defined to be UTF-8, so a lexer operating 
on it would be free to assume valid UTF-8 anyway. I agree with Walter 
that validation/conversion should instead happen in a separate proxy 
range. But if we do end up validating in the lexer, it would indeed be 
enough to validate inside strings, because the rest of the grammar 
only uses a subset of ASCII.
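
To illustrate, here is a rough sketch of what such a proxy range could 
look like. The names and details are purely illustrative (this is not 
the proposed API), and for brevity it doesn't reject overlong 
encodings or surrogate code points:

import std.range.primitives;
import std.utf : UTFException;

/// Wraps an input range of bytes and throws on malformed UTF-8,
/// so that a lexer consuming it can assume valid input.
struct UTF8Validator(R)
    if (isInputRange!R && is(ElementType!R : ubyte))
{
    private R source;
    private int pending; // continuation bytes still owed by the current sequence

    this(R input)
    {
        source = input;
        if (!source.empty) check(cast(ubyte) source.front);
    }

    @property bool empty() { return source.empty; }
    @property ubyte front() { return cast(ubyte) source.front; }

    void popFront()
    {
        source.popFront();
        if (!source.empty) check(cast(ubyte) source.front);
        else if (pending > 0)
            throw new UTFException("truncated UTF-8 sequence");
    }

    private void check(ubyte b)
    {
        if (pending > 0)
        {
            // inside a multi-byte sequence: byte must match 10xxxxxx
            if ((b & 0xC0) != 0x80)
                throw new UTFException("invalid UTF-8 continuation byte");
            --pending;
        }
        else if (b >= 0x80)
        {
            // classify the lead byte to see how many continuation bytes follow
            if ((b & 0xE0) == 0xC0)      pending = 1;
            else if ((b & 0xF0) == 0xE0) pending = 2;
            else if ((b & 0xF8) == 0xF0) pending = 3;
            else throw new UTFException("invalid UTF-8 lead byte");
        }
    }
}

auto validateUTF8(R)(R input) { return UTF8Validator!R(input); }

A lexer would then simply consume validateUTF8(bytes) and could skip 
all UTF checks itself.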

>
> You have to parse the strings because of the \uXXXX escapes of course,
> so some basic validation is unavoidable?

At least no UTF validation is needed. Since non-ASCII characters are 
always encoded as bytes >0x7F in UTF-8, the ASCII bytes of a \uXXXX 
escape can never occur in the middle of a multi-byte sequence. An 
escape can therefore be recognized wherever it appears in the string, 
and all other bytes that don't belong to an escape sequence are just 
passed through as-is.
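
For illustration, a simplified unescaping routine along those lines 
could look like this (a hypothetical helper, not the actual 
implementation; it assumes the lexer has already found the closing 
quote and doesn't combine UTF-16 surrogate pairs):

import std.array : appender;
import std.conv : to;
import std.utf : encode;

/// Decodes JSON escape sequences; everything else, including bytes
/// >0x7F forming multi-byte UTF-8 sequences, is copied through untouched.
string unescapeJSONString(const(char)[] s)
{
    auto result = appender!string();
    for (size_t i = 0; i < s.length; ++i)
    {
        if (s[i] != '\\')
        {
            result.put(s[i]); // pass-through, no UTF decoding needed
            continue;
        }
        switch (s[++i]) // assumes well-formed input (no trailing '\')
        {
            case '"':  result.put('"');  break;
            case '\\': result.put('\\'); break;
            case '/':  result.put('/');  break;
            case 'b':  result.put('\b'); break;
            case 'f':  result.put('\f'); break;
            case 'n':  result.put('\n'); break;
            case 'r':  result.put('\r'); break;
            case 't':  result.put('\t'); break;
            case 'u':
                // four hex digits; surrogate pairs not combined here
                auto code = cast(dchar) to!ushort(s[i+1 .. i+5], 16);
                i += 4;
                char[4] buf;
                result.put(buf[0 .. encode(buf, code)]);
                break;
            default:
                throw new Exception("invalid escape sequence");
        }
    }
    return result.data;
}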

> But I guess full validation of
> string content could be another useful option along with "ignore
> escapes" for the case where you want to avoid decode-encode scenarios.
> (like for a proxy, or if you store pre-escaped Unicode in a database)
