RFC: std.json successor
Sönke Ludwig via Digitalmars-d
digitalmars-d at puremagic.com
Mon Aug 25 14:27:43 PDT 2014
On 25.08.2014 22:51, "Ola Fosheim Grøstad"
<ola.fosheim.grostad+dlang at gmail.com> wrote:
> On Monday, 25 August 2014 at 20:35:32 UTC, Sönke Ludwig wrote:
>> BTW, JSON is *required* to be UTF encoded anyway as per RFC-7159,
>> which is another argument for just letting the lexer assume valid UTF.
>
> The lexer cannot assume valid UTF since the client might be a rogue, but
> it can just bail out if the lookahead isn't JSON? So UTF-validation is
> limited to strings.
But why should UTF validation be the job of the lexer in the first
place? D's "string" type is also defined to be UTF-8, so given that, it
would of course be free to assume valid UTF-8. I agree with Walter there
that validation/conversion should be added as a separate proxy range.
But if we end up going for validating in the lexer, it would indeed be
enough to validate inside strings, because the rest of the grammar
assumes a subset of ASCII.
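To illustrate that point, here is a hypothetical sketch (not the proposed std.json API): every structural token in the JSON grammar is a single ASCII byte, so the lexer can dispatch on raw `ubyte` values and only ever encounters bytes above 0x7F inside string literals.

```d
// Hypothetical sketch: all JSON structural tokens are single ASCII bytes,
// so a lexer can classify them without decoding UTF-8 at all.
enum TokenKind { objectStart, objectEnd, arrayStart, arrayEnd,
                 colon, comma, string_, number, literal }

TokenKind classify(ubyte b)
{
    switch (b)
    {
        case '{': return TokenKind.objectStart;
        case '}': return TokenKind.objectEnd;
        case '[': return TokenKind.arrayStart;
        case ']': return TokenKind.arrayEnd;
        case ':': return TokenKind.colon;
        case ',': return TokenKind.comma;
        case '"': return TokenKind.string_; // only past this point can
                                            // bytes > 0x7F legally appear
        case 't', 'f', 'n': return TokenKind.literal; // true/false/null
        default:  return TokenKind.number; // digits and '-', left to
                                           // the number sub-lexer
    }
}
```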
>
> You have to parse the strings because of the \uXXXX escapes of course,
> so some basic validation is unavoidable?
At least no UTF validation is needed. Since every non-ASCII character is
encoded entirely in bytes >0x7F, an all-ASCII \uXXXX sequence can never
start in the middle of a multi-byte UTF-8 sequence, so it can safely be
decoded wherever in the string it occurs, and all other bytes that don't
belong to an escape sequence are just passed through as-is.
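A minimal sketch of that unescaping strategy (my own illustration, not the proposed implementation; bounds checking, surrogate pairs, and the \b, \f, \r escapes are omitted for brevity):

```d
// Hypothetical sketch: unescaping a JSON string without UTF validation.
// \uXXXX escapes are pure ASCII, and any byte > 0x7F is part of a UTF-8
// multi-byte sequence that is copied through verbatim.
import std.array : appender;
import std.conv : to;
import std.utf : encode;

string unescape(const(char)[] s)
{
    auto app = appender!string();
    for (size_t i = 0; i < s.length; i++)
    {
        if (s[i] != '\\') { app.put(s[i]); continue; } // incl. bytes > 0x7F
        switch (s[++i])
        {
            case '"':  app.put('"');  break;
            case '\\': app.put('\\'); break;
            case '/':  app.put('/');  break;
            case 'n':  app.put('\n'); break;
            case 't':  app.put('\t'); break;
            case 'u': // exactly four ASCII hex digits follow
                auto cp = cast(dchar) s[i + 1 .. i + 5].to!ushort(16);
                i += 4;
                char[4] buf;
                app.put(buf[0 .. encode(buf, cp)]); // re-encode as UTF-8
                break;
            default: break; // \b, \f, \r omitted for brevity
        }
    }
    return app.data;
}
```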
> But I guess full validation of
> string content could be another useful option along with "ignore
> escapes" for the case where you want to avoid decode-encode scenarios.
> (like for a proxy, or if you store pre-escaped unicode in a database)