RFC: std.json successor

Tue Aug 26 01:24:54 PDT 2014

On Tuesday, 26 August 2014 at 07:51:04 UTC, Sönke Ludwig wrote:
> That's true. So the ideal solution would be to *assume* UTF-8 
> when the input is char based and to *validate* if the input is 
> "numeric".

I think you should validate that JSON strings are UTF-8 encoded, 
even if you allow illegal Unicode values. Basically, ensure that 
every lead byte above 0x7f is followed by the right number of 
continuation bytes, so you don't get a byte above 0x7f as the 
last byte in a string, etc.
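
To make the kind of check I mean concrete, here is a rough sketch in Python (chosen only for brevity; the function name is made up for illustration and is not part of any proposal). It checks only that each lead byte is followed by the right number of continuation bytes. It deliberately does not reject overlong forms or encoded surrogates, since the point is structural validity even when illegal Unicode values are allowed.

```python
def is_structurally_valid_utf8(data: bytes) -> bool:
    """Check only the byte structure of UTF-8, not the code point values.

    Hypothetical helper for illustration: overlong encodings and
    surrogates are intentionally accepted.
    """
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:
            need = 0              # ASCII, no continuation bytes
        elif 0xC0 <= b < 0xE0:
            need = 1              # 110xxxxx: one continuation byte
        elif 0xE0 <= b < 0xF0:
            need = 2              # 1110xxxx: two continuation bytes
        elif 0xF0 <= b < 0xF8:
            need = 3              # 11110xxx: three continuation bytes
        else:
            return False          # stray continuation byte or invalid lead byte
        if i + need >= n and need > 0:
            return False          # sequence is cut off at the end of the string
        for j in range(i + 1, i + 1 + need):
            if (data[j] & 0xC0) != 0x80:
                return False      # expected a 10xxxxxx continuation byte
        i += need + 1
    return True
```

For example, `is_structurally_valid_utf8(b"abc\xc3")` is false because the string ends in a lead byte, while the encoded surrogate `b"\xed\xa0\x80"` passes because its structure is fine.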

> Well, that's something that's definitely out of the scope of 
> this proposal. Definitely an interesting direction to pursue, 
> though.

Maybe the interface/code structure is or could be designed so 
that the implementation could later be version()'ed to SIMD where 
possible.

>> You cannot assume \u… to be valid if you convert it.
>
> I meant "X" to stand for a hex digit. The point was just that 
> you don't have to worry about interacting in a bad way with UTF 
> sequences when you find "\uXXXX".

When you convert "\uXXXX" to UTF-8 bytes, is it then validated as 
a legal code point? I guess it is not necessary.
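
If one did want that check, it could look roughly like the sketch below (Python for brevity; all function names are hypothetical). Four hex digits can only produce values up to 0xFFFF, so the only questionable results are the surrogates 0xD800..0xDFFF. The encoder itself works on them regardless, which is why the check can be kept optional.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"

def decode_u_escape(hex4: str) -> int:
    """Turn the XXXX part of a \\uXXXX escape into an integer."""
    if len(hex4) != 4 or any(c not in HEX_DIGITS for c in hex4):
        raise ValueError("expected exactly four hex digits")
    return int(hex4, 16)

def is_surrogate(cp: int) -> bool:
    """Surrogates are not legal code points on their own."""
    return 0xD800 <= cp <= 0xDFFF

def encode_bmp_utf8(cp: int) -> bytes:
    """Encode a value in 0..0xFFFF as UTF-8 without validating it."""
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    return bytes([0xE0 | (cp >> 12),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])
```

A parser that wants strict output would call `is_surrogate` before encoding (and pair up high/low surrogates); one that allows illegal values can skip it and still emit structurally well-formed bytes.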

Btw, I believe RapidJSON achieves high speed by converting 
strings in situ: if the prefix is escape-free, it is left as it 
is, and the conversion in place only starts when the parser hits 
the first escape, thus avoiding some moving.
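
A rough sketch of that idea, as I understand it (Python for brevity; this is not RapidJSON's actual code, and the function name is made up). It assumes a well-formed, terminated string and only handles the single-character escapes, leaving out "\uXXXX". The escape-free prefix is only scanned, and bytes are copied down only after the first escape.

```python
# Single-character JSON escapes mapped to the byte they stand for.
SIMPLE_ESCAPES = {
    ord('"'): 0x22, ord('\\'): 0x5C, ord('/'): 0x2F,
    ord('b'): 0x08, ord('f'): 0x0C, ord('n'): 0x0A,
    ord('r'): 0x0D, ord('t'): 0x09,
}

def unescape_in_situ(buf: bytearray, start: int):
    """Decode the JSON string whose first content byte is buf[start].

    Returns (decoded_length, index_after_closing_quote). The decoded
    bytes end up in buf[start:start + decoded_length].
    """
    read = start
    # Fast path: skip the escape-free prefix without writing anything.
    while buf[read] not in (0x22, 0x5C):
        read += 1
    write = read
    # Slow path: from the first escape on, copy bytes down in place.
    while buf[read] != 0x22:
        b = buf[read]
        if b == 0x5C:
            buf[write] = SIMPLE_ESCAPES[buf[read + 1]]
            read += 2
        else:
            buf[write] = b
            read += 1
        write += 1
    return write - start, read + 1
```

Since an escape never decodes to more bytes than it occupies, the write position never overtakes the read position, so no second buffer is needed.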