[Issue 17553] std.json invalid utf8 sequence

via Digitalmars-d-bugs digitalmars-d-bugs at puremagic.com
Mon Jun 26 03:14:26 PDT 2017


https://issues.dlang.org/show_bug.cgi?id=17553

Vladimir Panteleev <dlang-bugzilla at thecybershadow.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |dlang-bugzilla at thecybershad
                   |                            |ow.net
         Resolution|---                         |INVALID

--- Comment #1 from Vladimir Panteleev <dlang-bugzilla at thecybershadow.net> ---
As far as Phobos (and some parts of the language itself) are concerned, D
strings are expected to be UTF-encoded, i.e. to contain a valid sequence of UTF
code units. Your program bypasses that assumption by using a cast. The normal
way to read text data into a string is the readText function, which performs
UTF validation: reading a file that does not contain valid UTF with readText
results in an exception being thrown.
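
To illustrate the difference, here is a minimal sketch. The file name and the
isValidUtf8 helper are made up for this example and are not taken from the
original report; read, readText and validate are the existing Phobos functions.

```d
import std.exception : assertThrown;
import std.file : read, readText, remove, write;
import std.utf : UTFException, validate;

/// Hypothetical helper: checks whether raw bytes form valid UTF-8.
bool isValidUtf8(const(ubyte)[] raw)
{
    try
    {
        validate(cast(const(char)[]) raw);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    // Hypothetical scratch file; the byte 0xFF never occurs in valid UTF-8.
    enum path = "invalid_utf8_demo.bin";
    ubyte[] bytes = [0x61, 0xFF, 0x62];
    write(path, bytes);
    scope (exit) remove(path);

    // The cast performs no validation, so the invalid data ends up in a string.
    auto unchecked = cast(string) read(path);
    assert(unchecked.length == 3);
    assert(!isValidUtf8(bytes));

    // readText validates the contents and throws instead.
    assertThrown!UTFException(readText(path));
}
```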

As for JSON encoding - although most JSON transformations concern themselves
with just the ASCII part, the JSON standard does forbid encoding Unicode
control characters, which may appear in a valid D string but must not appear in
a JSON-encoded one. This includes the high control characters (code points 0x80
to 0x9F); so, the encoding code must check for these code points when
constructing the JSON string. Although they could in theory be special cased,
the most straightforward way to do it is to look at the input string as a
range of Unicode code points (dchars), i.e. rely on auto-decoding, which is
what the current implementation does.
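
As a rough sketch of what iterating by code point looks like (this is not the
actual std.json code; hasControlChars is a hypothetical helper, and it uses
std.uni.isControl as the test for control characters):

```d
import std.uni : isControl;

/// Hypothetical helper: true if any decoded code point is a control character.
bool hasControlChars(string s)
{
    // Asking foreach for dchar makes it decode the UTF-8 code units.
    foreach (dchar c; s)
    {
        if (isControl(c))
            return true;
    }
    return false;
}

void main()
{
    assert(!hasControlChars("plain ASCII"));
    assert(hasControlChars("tab\there"));      // U+0009, a low control character
    assert(hasControlChars("next\u0085line")); // U+0085, stored as bytes C2 85
    assert(!hasControlChars("caf\u00E9"));     // U+00E9, stored as bytes C3 A9
}
```

Since U+0080 to U+009F are stored as two-byte sequences in UTF-8, a check that
looks only at individual bytes would need the special casing mentioned above.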

In any case, JSON strings are certainly not meant to store binary data - even
if the example "worked" (for a certain definition of "work"), the resulting
JSON object will not be in any particular encoding. Even though the JSON syntax
is restricted to ASCII characters, JSON itself is not - it is Unicode-aware,
and contains instructions on how to properly encode and decode Unicode
characters, so it can't be used for storing arbitrary binary data.
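
If binary data does need to travel inside a JSON document, one common approach
(not something this issue proposes, just an illustration) is to wrap it in a
text encoding such as Base64 first. A minimal sketch, with the hypothetical
helpers packBinary and unpackBinary and an arbitrary "data" key:

```d
import std.base64 : Base64;
import std.json : JSONValue, parseJSON;

/// Hypothetical helper: stores arbitrary bytes as Base64 text in a JSON object.
string packBinary(const(ubyte)[] payload)
{
    string encoded = Base64.encode(payload).idup; // ASCII only, valid UTF-8
    return JSONValue(["data": encoded]).toString();
}

/// Hypothetical helper: reverses packBinary.
ubyte[] unpackBinary(string jsonText)
{
    return Base64.decode(parseJSON(jsonText)["data"].str);
}

void main()
{
    ubyte[] payload = [0x00, 0xFF, 0x80, 0x9F]; // not valid UTF-8
    assert(unpackBinary(packBinary(payload)) == payload);
}
```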

If you have a specific use case in mind which is in line with the JSON spec and
how D deals with Unicode and strings, please reopen; otherwise, there is no
actionable defect presented in this issue.
