[Issue 19428] New: std.string.indexOf wrong result with bad unicode
d-bugmail at puremagic.com
Fri Nov 23 22:39:35 UTC 2018
https://issues.dlang.org/show_bug.cgi?id=19428
Issue ID: 19428
Summary: std.string.indexOf wrong result with bad unicode
Product: D
Version: D2
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: P3
Component: phobos
Assignee: nobody at puremagic.com
Reporter: dlang-bugzilla at thecybershadow.net
//////////////////// test.d ///////////////////
import std.algorithm.comparison;
import std.range;
import std.string;

void main()
{
    // Expected: no match, because the needle is invalid UTF-8,
    // not three U+FFFD characters.
    assert(indexOf(
        only('\uFFFD', '\uFFFD', '\uFFFD'),
        "\x83\x84\x85",
        CaseSensitive.yes) == -1);
}
///////////////////////////////////////////////
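It looks like indexOf is replacing invalid Unicode with replacement characters
(U+FFFD) under the hood, so the invalid needle ends up matching a haystack of
literal U+FFFD characters.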
This becomes worse when something causes the same thing to happen to the
haystack, as in this unit test:
https://github.com/dlang/phobos/blob/9bfc82130c0e4af4d1dc95bb261570c6e4f6f5d8/std/string.d#L887-L903
Note that this unittest is incorrectly annotated as nothrow/@nogc. We can't use
the kind of decoding that substitutes errors with replacement characters, as
that will introduce bugs like these.
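
To illustrate the difference between the two kinds of decoding, here is a
minimal sketch using std.utf. It assumes that byDchar behaves the way the
substituting decoder inside indexOf appears to, which I have not confirmed
against the indexOf implementation itself. byDchar silently yields U+FFFD for
each invalid code unit, while validate rejects the same input by throwing:

```d
import std.algorithm.comparison : equal;
import std.exception : assertThrown;
import std.range : only;
import std.utf : byDchar, validate, UTFException;

void main()
{
    // Three stray continuation bytes: not valid UTF-8.
    string bad = "\x83\x84\x85";

    // Substituting decoding: each invalid byte becomes U+FFFD, so the
    // result cannot be told apart from three genuine U+FFFD characters.
    assert(bad.byDchar.equal(only('\uFFFD', '\uFFFD', '\uFFFD')));

    // Strict decoding: the same input is rejected with an exception.
    assertThrown!UTFException(validate(bad));
}
```

The strict variant throws and may allocate, which is presumably why it cannot
sit behind a nothrow/@nogc annotation.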