[Issue 17553] std.json invalid utf8 sequence
via Digitalmars-d-bugs
digitalmars-d-bugs at puremagic.com
Mon Jun 26 03:14:26 PDT 2017
https://issues.dlang.org/show_bug.cgi?id=17553
Vladimir Panteleev <dlang-bugzilla at thecybershadow.net> changed:
What      |Removed |Added
----------------------------------------------------------------------------
Status    |NEW     |RESOLVED
CC        |        |dlang-bugzilla at thecybershadow.net
Resolution|---     |INVALID
--- Comment #1 from Vladimir Panteleev <dlang-bugzilla at thecybershadow.net> ---
As far as Phobos (and some parts of the language itself) are concerned, D
strings are expected to be UTF-encoded, i.e. to contain a valid sequence of UTF
code units. Your program bypasses that assumption by using a cast. The normal
way to read text data into a string is the std.file.readText function, which
performs UTF validation. With readText, reading a file that does not contain
valid UTF results in an exception (a UTFException) being thrown.
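The difference between a cast and readText is whether the bytes are checked before being treated as text. The sketch below illustrates that idea in Python rather than D. It is not Phobos code, and the helper names is_valid_utf8 and read_text_checked are hypothetical. Strict UTF-8 decoding plays the role of readText's validation and rejects malformed input instead of silently accepting it.

```python
# Illustrative Python analogue (not D/Phobos code): validate bytes as UTF-8
# before treating them as text, similar in spirit to D's readText, rather than
# reinterpreting them unchecked the way a cast does.

def is_valid_utf8(data: bytes) -> bool:
    """Return True if data is a well-formed UTF-8 sequence."""
    try:
        data.decode("utf-8")  # strict error handling is Python's default
    except UnicodeDecodeError:
        return False
    return True


def read_text_checked(data: bytes) -> str:
    """Hypothetical readText-like helper: raises UnicodeDecodeError on invalid UTF-8."""
    return data.decode("utf-8")


text_bytes = "héllo".encode("utf-8")
binary_bytes = b"\xff\xfe\x00\x80"  # 0xFF can never start a UTF-8 sequence

print(is_valid_utf8(text_bytes))      # True
print(is_valid_utf8(binary_bytes))    # False
print(read_text_checked(text_bytes))  # héllo

try:
    read_text_checked(binary_bytes)
except UnicodeDecodeError as exc:
    print("rejected:", exc.reason)
```

In D, the corresponding failure surfaces as an exception from readText, at the point where the data enters the program. It does not show up later inside std.json.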
As for JSON encoding: most JSON transformations concern themselves only with
the ASCII range. However, the JSON standard forbids Unicode control characters
from appearing unescaped in a string. Such characters may appear in a valid D
string, but they must not appear verbatim in a JSON-encoded one. This includes
the high (C1) control characters, code points U+0080 to U+009F, so the encoder
must check for these code points when constructing the JSON string. They could
in theory be special-cased at the UTF-8 level, since they all encode as the
two-byte sequences C2 80 to C2 9F. Still, the most straightforward approach is
to view the input string as a range of Unicode code points (dchars), i.e. to
rely on auto-decoding, which is what the current implementation does.
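To make the code-point approach concrete, here is a minimal Python sketch of a JSON string encoder. It is an illustration, not std.json's implementation. Iterating over a Python str yields code points, which stands in for D's auto-decoding to dchar. The sketch assumes "control character" means Unicode general category Cc, which covers U+0000 to U+001F, U+007F and the C1 range U+0080 to U+009F. The function name encode_json_string is hypothetical.

```python
import unicodedata

# Short escape sequences JSON defines for a few common characters.
_SHORT_ESCAPES = {
    '"': '\\"',
    '\\': '\\\\',
    '\b': '\\b',
    '\f': '\\f',
    '\n': '\\n',
    '\r': '\\r',
    '\t': '\\t',
}


def encode_json_string(s: str) -> str:
    """Hypothetical sketch of a JSON string encoder that works on code points.

    Each character in general category Cc that has no short escape
    (including the C1 range U+0080-U+009F) is written as a \\uXXXX escape;
    all other code points are emitted unchanged.
    """
    out = ['"']
    for ch in s:  # iteration yields whole code points, like D's dchar decoding
        if ch in _SHORT_ESCAPES:
            out.append(_SHORT_ESCAPES[ch])
        elif unicodedata.category(ch) == "Cc":
            out.append("\\u%04X" % ord(ch))
        else:
            out.append(ch)
    out.append('"')
    return "".join(out)


print(encode_json_string("tab\there"))         # "tab\there"
print(encode_json_string("C1 control: \x85"))  # "C1 control: \u0085"
print(encode_json_string("é and \u20ac"))      # "é and €"
```

Working on code points keeps the check to a single category test per character. A byte-oriented encoder would instead have to recognise the C2 80 to C2 9F byte pairs itself.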
In any case, JSON strings are certainly not meant to store binary data. Even
if the example "worked" (for a certain definition of "work"), the resulting
JSON object would not be in any particular encoding. The JSON syntax is
restricted to ASCII characters, but JSON itself is not: it is Unicode-aware and
specifies how Unicode characters are to be encoded and decoded, so it cannot be
used to store arbitrary binary data.
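The "not in any particular encoding" point can be demonstrated with a small Python experiment (an illustration, not D code). The sketch assumes a hypothetical producer that smuggles bytes into a JSON string by mapping each byte to the code point of the same value (Latin-1). A consumer sees only Unicode code points, and encoding them as UTF-8 (the usual interchange encoding for JSON text) does not give back the original bytes.

```python
import json

raw = bytes([0x00, 0x7F, 0x80, 0xC3, 0xFF])  # arbitrary binary data

# Hypothetical smuggling step: treat each byte as the code point with the same value.
smuggled = raw.decode("latin-1")
document = json.dumps({"payload": smuggled})

# The consumer only sees code points; encoding them as UTF-8 yields
# different bytes from the ones that were put in.
recovered = json.loads(document)["payload"].encode("utf-8")

print(raw.hex())         # 007f80c3ff
print(recovered.hex())   # 007fc280c383c3bf
print(recovered == raw)  # False
```

Because the byte-to-character mapping is a private convention between producer and consumer, nothing in the JSON document records it. This is why such data is not in any particular encoding.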
If you have a specific use case in mind that is consistent with the JSON spec
and with how D handles Unicode and strings, please reopen; otherwise, this
issue presents no actionable defect.