[Issue 10472] lastIndexOf(string, string) does not find single character string at beginning of string
d-bugmail at puremagic.com
Tue Jun 25 05:00:59 PDT 2013
http://d.puremagic.com/issues/show_bug.cgi?id=10472
--- Comment #1 from monarchdodra at gmail.com 2013-06-25 05:00:57 PDT ---
The problem is this condition:
----
if (cast(dchar)(cast(Char)c) == c)
----
This is basically saying "if the code_point_ representation fits in a single
code_unit_, then we look at the code_units_". This is wrong: in UTF-8,
code_point_s in the 0x80 to 0xFF range will "fit" in a single code_unit_ under
this cast, but actually have a two code_unit_ representation. In particular:
ö is code point U+00F6, which fits in a single code unit, yet, when encoded
into UTF-8, takes up two: "0xC3 0xB6".
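To make the mismatch concrete, here is a small sketch. The helper name roundTrips is hypothetical (it just wraps the condition quoted above); codeLength is the one from std.utf.

```d
import std.utf : codeLength;

// Hypothetical helper: wraps the round-trip condition quoted above.
bool roundTrips(Char)(dchar c)
{
    return cast(dchar)(cast(Char) c) == c;
}

unittest
{
    immutable dchar c = '\u00F6'; // ö

    // The cast round-trips, because 0xF6 fits in 8 bits...
    assert(roundTrips!char(c));

    // ...but UTF-8 needs two code units to encode it.
    assert(codeLength!char(c) == 2);

    string s = "\u00F6";
    assert(s.length == 2);
    assert(cast(ubyte) s[0] == 0xC3 && cast(ubyte) s[1] == 0xB6);

    // For plain ASCII both views agree.
    assert(roundTrips!char('a') && codeLength!char('a') == 1);
}
```

This should build and run with something like "dmd -unittest -main".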
The correct question is:
----
if (codeLength!Char(c) == 1)
----
Or, if you want to tweak a little, since you don't need the *actual*
codeLength:
----
static if (Char.sizeof == 1)      immutable fits = c <= 0x7F;
else static if (Char.sizeof == 2) immutable fits = c <= 0xFFFF;
else                              immutable fits = true;
if (fits)
{
    ...
----
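As a sanity check, the shortcut can be compared against codeLength over every valid code point. This is only a sketch: fitsInOneUnit is a hypothetical wrapper around the static if chain above, and it assumes std.utf.isValidDchar is used to skip the surrogate range.

```d
import std.utf : codeLength, isValidDchar;

// Hypothetical wrapper around the static if chain above.
bool fitsInOneUnit(Char)(dchar c)
{
    static if (Char.sizeof == 1)      immutable fits = c <= 0x7F;
    else static if (Char.sizeof == 2) immutable fits = c <= 0xFFFF;
    else                              immutable fits = true;
    return fits;
}

unittest
{
    foreach (uint u; 0 .. 0x110000)
    {
        immutable c = cast(dchar) u;
        if (!isValidDchar(c))
            continue; // surrogates are not valid code points

        // The shortcut must agree with the real code length for each width.
        assert(fitsInOneUnit!char(c)  == (codeLength!char(c)  == 1));
        assert(fitsInOneUnit!wchar(c) == (codeLength!wchar(c) == 1));
        assert(fitsInOneUnit!dchar(c) == (codeLength!dchar(c) == 1));
    }
}
```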
------------------
BTW, implementation-wise, I do believe a simple foreach_reverse is more
efficient, because it pops *as* it decodes. The for loop needs to stride
backwards (again) after a call to "back" ("back" already strides backwards).
----
foreach_reverse (i, dchar c2; s)
{
    if (c2 == c)
        return i;
}
----
and
----
immutable c1 = std.uni.toLower(c);
foreach_reverse (i, dchar c2; s)
{
    if (std.uni.toLower(c2) == c1)
        return i;
}
----
In any case, it is much simpler.
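Put together, the two loops could look roughly like the sketch below. The name lastIndexOfSketch and its bool parameter are hypothetical, chosen so as not to clash with std.string.lastIndexOf; this is not the actual Phobos code, and it assumes a char-based string.

```d
import std.uni : toLower;

// Hypothetical stand-in for the single-character search, char strings only.
ptrdiff_t lastIndexOfSketch(const(char)[] s, dchar c, bool caseSensitive = true)
{
    if (caseSensitive)
    {
        // i is the index of the first code unit of the decoded code point.
        foreach_reverse (i, dchar c2; s)
        {
            if (c2 == c)
                return cast(ptrdiff_t) i;
        }
    }
    else
    {
        immutable c1 = toLower(c);
        foreach_reverse (i, dchar c2; s)
        {
            if (toLower(c2) == c1)
                return cast(ptrdiff_t) i;
        }
    }
    return -1; // not found
}

unittest
{
    // The case from the report: a non-ASCII character at the very beginning.
    assert(lastIndexOfSketch("\u00F6x", '\u00F6') == 0);
    // "a" is 1 code unit and each "ö" is 2, so the last "ö" starts at index 4.
    assert(lastIndexOfSketch("a\u00F6b\u00F6", '\u00F6') == 4);
    assert(lastIndexOfSketch("Hello", 'h', false) == 0);
    assert(lastIndexOfSketch("Hello", 'z') == -1);
}
```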