[Issue 18844] New: std.utf.decode skips valid character on invalid multibyte sequence
d-bugmail at puremagic.com
Wed May 9 10:53:17 UTC 2018
https://issues.dlang.org/show_bug.cgi?id=18844
Issue ID: 18844
Summary: std.utf.decode skips valid character on invalid multibyte sequence
Product: D
Version: D2
Hardware: x86_64
OS: Linux
Status: NEW
Severity: enhancement
Priority: P1
Component: phobos
Assignee: nobody at puremagic.com
Reporter: default_357-line at yahoo.de
When decoding an invalid UTF-8 string such as
cast(string) [cast(ubyte) 'ä', 't'] with Yes.useReplacementDchar, std.utf.decode
advances the index past the byte at which the multibyte sequence turned out to
be invalid, even when that byte is itself a valid start of a new sequence. Here
0xE4 is the lead byte of a three-byte sequence, but the next byte, 't' (0x74), is
not a continuation byte. decode nevertheless advances the index to 2 instead of
1, so the string decodes as "�" when it should decode as "�t".
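A minimal sketch of the behaviour the report asks for, assuming the desired fix is the Unicode Standard's recommended "substitution of maximal subparts" practice. `decodeOne` is a hypothetical helper, not part of Phobos and not how std.utf.decode is implemented. On an ill-formed sequence it returns U+FFFD and advances only past the bytes validated so far (at least one), so a byte that can start a new sequence, like 't' here, is decoded by the next call.

```d
/// Hypothetical helper, not part of Phobos: decodes one code point from the
/// UTF-8 string `s` at `index` (assumes index < s.length). On an ill-formed
/// sequence it returns U+FFFD and advances only past the bytes validated so
/// far (at least one), so a byte that can start a new sequence is decoded by
/// the next call.
dchar decodeOne(const(char)[] s, ref size_t index)
{
    enum dchar replacement = '\uFFFD';
    immutable ubyte b0 = cast(ubyte) s[index];

    // ASCII: a single byte.
    if (b0 < 0x80)
    {
        ++index;
        return cast(dchar) b0;
    }

    size_t len;
    uint cp;
    // Allowed range for the second byte; some lead bytes narrow it to
    // exclude overlong forms, surrogates and code points above U+10FFFF.
    ubyte lo = 0x80, hi = 0xBF;
    if (b0 >= 0xC2 && b0 <= 0xDF)
    {
        len = 2;
        cp = b0 & 0x1F;
    }
    else if (b0 >= 0xE0 && b0 <= 0xEF)
    {
        len = 3;
        cp = b0 & 0x0F;
        if (b0 == 0xE0) lo = 0xA0;      // overlong three-byte forms
        else if (b0 == 0xED) hi = 0x9F; // UTF-16 surrogates
    }
    else if (b0 >= 0xF0 && b0 <= 0xF4)
    {
        len = 4;
        cp = b0 & 0x07;
        if (b0 == 0xF0) lo = 0x90;      // overlong four-byte forms
        else if (b0 == 0xF4) hi = 0x8F; // code points above U+10FFFF
    }
    else
    {
        // Stray continuation byte or a byte that never starts a sequence.
        ++index;
        return replacement;
    }

    size_t i = index + 1;
    foreach (k; 1 .. len)
    {
        ubyte lower = 0x80, upper = 0xBF;
        if (k == 1)
        {
            lower = lo;
            upper = hi;
        }
        if (i >= s.length || s[i] < lower || s[i] > upper)
        {
            // Consume only the validated prefix; s[i] is left for the
            // next call, which may decode it as a character of its own.
            index = i;
            return replacement;
        }
        cp = (cp << 6) | (s[i] & 0x3F);
        ++i;
    }
    index = i;
    return cast(dchar) cp;
}

unittest
{
    // Same bytes as the report's cast(string) [cast(ubyte) 'ä', 't'].
    string s = "\xE4t";
    size_t i = 0;
    assert(decodeOne(s, i) == '\uFFFD' && i == 1);
    assert(decodeOne(s, i) == 't' && i == 2);
}
```

With the report's input, the first call returns U+FFFD and leaves the index at 1; the second call returns 't', so the string decodes as "�t" rather than "�".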
--
More information about the Digitalmars-d-bugs mailing list