[Issue 19428] New: std.string.indexOf wrong result with bad unicode

d-bugmail at puremagic.com
Fri Nov 23 22:39:35 UTC 2018


https://issues.dlang.org/show_bug.cgi?id=19428

          Issue ID: 19428
           Summary: std.string.indexOf wrong result with bad unicode
           Product: D
           Version: D2
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: P3
         Component: phobos
          Assignee: nobody at puremagic.com
          Reporter: dlang-bugzilla at thecybershadow.net

//////////////////// test.d ///////////////////
import std.algorithm.comparison;
import std.range;
import std.string;

void main()
{
    assert(indexOf(
            only('\uFFFD', '\uFFFD', '\uFFFD'),
            "\x83\x84\x85",
            CaseSensitive.yes) == -1);
}
///////////////////////////////////////////////

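It looks like indexOf replaces invalid UTF-8 with U+FFFD replacement characters
under the hood. The three stray bytes in the needle therefore compare equal to
the three replacement characters in the haystack, and a match is reported
instead of -1.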
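To make the suspected mechanism concrete, here is a minimal D sketch. It assumes, without checking indexOf's source, that the lossy path behaves like std.utf.byDchar, which decodes with replacement by default. lossyDecode is a hypothetical helper, not part of Phobos.

```d
import std.array : array;
import std.utf : byDchar;

// Hypothetical helper: decode UTF-8 the lossy way. std.utf.byDchar uses
// replacement decoding by default, so each invalid sequence becomes U+FFFD
// instead of raising an error.
dstring lossyDecode(string s)
{
    return s.byDchar.array.idup;
}
```

Under this decoding the needle "\x83\x84\x85" (three stray continuation bytes, none of which starts a valid sequence) turns into "\uFFFD\uFFFD\uFFFD"d. That is exactly the haystack in test.d, which would explain why the search succeeds.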

This gets worse when something causes the same substitution to happen to the
haystack, as in this unit test:

https://github.com/dlang/phobos/blob/9bfc82130c0e4af4d1dc95bb261570c6e4f6f5d8/std/string.d#L887-L903

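Note that this unittest is incorrectly annotated as nothrow/@nogc: a correct
implementation has to reject invalid UTF rather than silently substitute for
it, so it may throw. We can't use the kind of decoding that replaces errors
with replacement characters, as that introduces bugs like this one.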
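The following hedged sketch shows the distinction. By default std.utf.decode throws a UTFException on invalid input, while passing Yes.useReplacementDchar makes it return U+FFFD instead. Because the strict form can throw, and UTFException is a GC-allocated class, code that decodes strictly cannot in general be nothrow or @nogc. The helper names strictDecodeRejects and replacingDecodeFront are hypothetical and exist only for illustration.

```d
import std.typecons : Yes;
import std.utf : decode, UTFException;

// Hypothetical helper: strictly decode the first code point of s.
// Returns true if std.utf.decode rejects the input with a UTFException.
bool strictDecodeRejects(string s)
{
    size_t i = 0;
    try
    {
        decode(s, i); // default No.useReplacementDchar: throws on bad UTF-8
        return false;
    }
    catch (UTFException)
    {
        return true;
    }
}

// Hypothetical helper: decode the first code point of s with replacement.
// Invalid input yields U+FFFD instead of an exception.
dchar replacingDecodeFront(string s)
{
    size_t i = 0;
    return decode!(Yes.useReplacementDchar)(s, i);
}
```

For the needle from the report, the strict variant reports an error, while the replacing variant quietly produces U+FFFD. That silent substitution is the behaviour that lets an invalid needle match a haystack of replacement characters.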

--


More information about the Digitalmars-d-bugs mailing list