[Issue 5904] New: std.json parseString doesn't handle chars outside the BMP
d-bugmail at puremagic.com
Thu Apr 28 12:28:34 PDT 2011
http://d.puremagic.com/issues/show_bug.cgi?id=5904
Summary: std.json parseString doesn't handle chars outside the BMP
Product: D
Version: D2
Platform: Other
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: Phobos
AssignedTo: nobody at puremagic.com
ReportedBy: sean at invisibleduck.org
--- Comment #0 from Sean Kelly <sean at invisibleduck.org> 2011-04-28 12:24:48 PDT ---
According to RFC 4627, characters outside the Basic Multilingual Plane (i.e.
those that need more than one 16-bit code unit in UTF-16) are escaped in JSON
strings as a surrogate pair: two consecutive "\uXXXX" escapes. In effect, what
you have to do is test whether a "\uXXXX" value is >= 0xD800 and <= 0xDBFF
(a high surrogate). If so, then the next value should be another "\uXXXX"
escape representing the low surrogate. To verify this, check that the value is
>= 0xDC00 and <= 0xDFFF. If it isn't, then skip the preceding "\uXXXX" value
(the high surrogate) as invalid and decode the following "\uXXXX" value as a
standalone Unicode code point (the RFC is actually unclear on this point, but
this seems the most reasonable failure mode). Assuming that you have a valid
high and low surrogate, stick them into a wchar[2] and convert to UTF-8.
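
For illustration only, the checks described above can be sketched in Python
(not D, and not the actual std.json code). The function names are
hypothetical, and the input is assumed to be the list of 16-bit values already
parsed from consecutive "\uXXXX" escapes. Skipping an unpaired low surrogate
is an extra assumption; the report only covers the unpaired high surrogate
case.

```python
def is_high_surrogate(unit):
    return 0xD800 <= unit <= 0xDBFF


def is_low_surrogate(unit):
    return 0xDC00 <= unit <= 0xDFFF


def decode_units(units):
    # units: 16-bit integers taken from consecutive \uXXXX escapes.
    out = []
    i = 0
    while i < len(units):
        unit = units[i]
        if is_high_surrogate(unit):
            if i + 1 < len(units) and is_low_surrogate(units[i + 1]):
                # Valid pair: combine both halves into one code point.
                low = units[i + 1]
                code_point = 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00)
                out.append(chr(code_point))
                i += 2
            else:
                # Unpaired high surrogate: skip it as invalid, then let the
                # loop decode the following value on its own.
                i += 1
        elif is_low_surrogate(unit):
            # Assumption (not in the report): skip an unpaired low surrogate.
            i += 1
        else:
            # Ordinary BMP value: it is already a code point.
            out.append(chr(unit))
            i += 1
    return "".join(out)


# U+1D11E (musical G clef) is escaped in JSON as \uD834\uDD1E.
clef = decode_units([0xD834, 0xDD1E])
clef_utf8 = clef.encode("utf-8")
```

Here the arithmetic on the two halves plays the role of filling a wchar[2]
and converting it: the pair 0xD834, 0xDD1E becomes the single code point
U+1D11E, which takes four bytes in UTF-8.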