RFC: std.json successor
via Digitalmars-d
digitalmars-d at puremagic.com
Tue Aug 26 01:24:54 PDT 2014
On Tuesday, 26 August 2014 at 07:51:04 UTC, Sönke Ludwig wrote:
> That's true. So the ideal solution would be to *assume* UTF-8
> when the input is char based and to *validate* if the input is
> "numeric".
I think you should validate JSON strings to be UTF-8 encoded even
if you allow illegal Unicode values. Basically, ensure that every
byte above 0x7f is followed by the right number of continuation
bytes, so you don't get a byte above 0x7f as the last byte in a
string, etc.
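To make the kind of check I mean concrete, here is a minimal sketch in Python (chosen only for brevity; the function name is hypothetical and not part of any proposal). It checks structure only: each lead byte must be followed by the right number of continuation bytes. It deliberately does not reject overlong forms or surrogate values, in line with "allow illegal Unicode values".

```python
def utf8_structure_ok(data: bytes) -> bool:
    """Structural UTF-8 check: lead bytes must be followed by the
    right number of continuation bytes (10xxxxxx). Code point
    legality (overlongs, surrogates, range) is NOT checked."""
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:
            need = 0                      # plain ASCII
        elif 0xC0 <= b <= 0xDF:
            need = 1                      # 2-byte sequence
        elif 0xE0 <= b <= 0xEF:
            need = 2                      # 3-byte sequence
        elif 0xF0 <= b <= 0xF7:
            need = 3                      # 4-byte sequence
        else:
            return False                  # stray continuation or invalid lead byte
        if i + need >= n and need > 0:
            return False                  # sequence truncated at end of string
        for k in range(1, need + 1):
            if data[i + k] & 0xC0 != 0x80:
                return False              # expected a continuation byte
        i += need + 1
    return True
```

With this, a string ending in a lone 0xC3 is rejected, while an encoded surrogate such as ED A0 80 still passes, since only the byte structure is inspected.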
> Well, that's something that's definitely out of the scope of
> this proposal. Definitely an interesting direction to pursue,
> though.
Maybe the interface/code structure is or could be designed so
that the implementation could later be version()'ed to SIMD where
possible.
>> You cannot assume \u… to be valid if you convert it.
>
> I meant "X" to stand for a hex digit. The point was just that
> you don't have to worry about interacting in a bad way with UTF
> sequences when you find "\uXXXX".
When you convert "\uXXXX" to UTF-8 bytes, is it then validated as
a legal code point? I guess it is not necessary.
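As an illustration of the question, here is a rough sketch of converting a single "\uXXXX" escape to UTF-8 bytes, with the legality check as an optional step. The function name and the `validate` flag are made up for this example, and surrogate pairs (two consecutive escapes) are left out to keep it short.

```python
def escape_to_utf8(hex4: str, validate: bool = False) -> bytes:
    """Encode the 16-bit value of one \\uXXXX escape as UTF-8.
    hex4 is the four hex digits after the backslash-u.
    With validate=True, lone surrogates (D800..DFFF) are rejected;
    with validate=False they are encoded as-is (3 bytes)."""
    if len(hex4) != 4 or any(c not in "0123456789abcdefABCDEF" for c in hex4):
        raise ValueError("expected exactly four hex digits")
    cp = int(hex4, 16)
    if validate and 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogate is not a legal code point on its own")
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    return bytes([0xE0 | (cp >> 12),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])
```

Either way the output is structurally well-formed UTF-8, so skipping the check cannot produce a truncated sequence; it can only let through a value that is not a legal code point.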
Btw, I believe RapidJSON achieves high speed by converting
strings in situ: if the prefix is escape-free, it is left
untouched, and conversion in place only starts when it hits the
first escape, thus avoiding some moving.
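A rough sketch of that idea, as I understand it (this is not RapidJSON's actual code; the names are hypothetical, and only the simple one-character escapes are handled, with "\uXXXX" left out for brevity). It works because the decoded form is never longer than the escaped form, so the write position can never overtake the read position.

```python
# Simple JSON escapes: byte after the backslash -> decoded byte.
_SIMPLE = {
    ord('"'): ord('"'), ord('\\'): ord('\\'), ord('/'): ord('/'),
    ord('b'): 0x08, ord('f'): 0x0C, ord('n'): 0x0A,
    ord('r'): 0x0D, ord('t'): 0x09,
}

def unescape_in_place(buf: bytearray) -> int:
    """Decode escapes inside buf itself and return the decoded length.
    The escape-free prefix is skipped without being written at all."""
    n = len(buf)
    r = buf.find(b"\\")          # first escape, or -1 if there is none
    if r < 0:
        return n                 # nothing to convert, nothing moved
    w = r                        # write index starts at the first escape
    while r < n:
        b = buf[r]
        if b != 0x5C:            # ordinary byte: shift it down
            buf[w] = b
            r += 1
        else:
            if r + 1 >= n or buf[r + 1] not in _SIMPLE:
                raise ValueError("unsupported or truncated escape")
            buf[w] = _SIMPLE[buf[r + 1]]
            r += 2
        w += 1
    return w                     # bytes past w are stale and should be ignored
```

For example, a buffer holding `abc\ndef` (8 bytes, with a literal backslash and `n`) comes back with length 7, and the first 7 bytes are `abc`, a newline, and `def`.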