[Issue 12455] [uni][reg] Bad lowercase mapping for 'LATIN CAPITAL LETTER I WITH DOT ABOVE'

via Digitalmars-d-bugs digitalmars-d-bugs at puremagic.com
Sat Apr 19 06:52:32 PDT 2014


https://issues.dlang.org/show_bug.cgi?id=12455

--- Comment #2 from monarchdodra at gmail.com ---
I toyed around. The issue (apparently) is that it *can* be decomposed into:

LATIN CAPITAL LETTER I (U+0049)
COMBINING DOT ABOVE (U+0307)

As such, when converted to lower case, it becomes:

LATIN SMALL LETTER I (U+0069)
COMBINING DOT ABOVE (U+0307)

EG:
//----
import std.uni, std.stdio, std.string, std.conv;

void main()
{
    auto c = 'İ'; // '\u0130' LATIN CAPITAL LETTER I WITH DOT ABOVE
    auto s = "İ"; // "\u0130" LATIN CAPITAL LETTER I WITH DOT ABOVE
    assert(std.uni.isUpper(c)); //Passes
    auto sl = std.uni.toLower(s).to!dstring;
    assert(sl == "\u0069\u0307"); //PASSES
}
//----
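
For what it's worth, the decomposition claim can be checked separately from the case mapping. This is only a minimal sketch, assuming std.uni's normalize behaves as its documentation describes; the variable names are made up for illustration:

```d
import std.uni;

void main()
{
    // U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE, precomposed form
    dstring precomposed = "\u0130"d;

    // Canonical decomposition (NFD) should give
    // U+0049 LATIN CAPITAL LETTER I followed by U+0307 COMBINING DOT ABOVE
    dstring decomposed = normalize!NFD(precomposed);
    assert(decomposed == "\u0049\u0307"d);

    // Recomposing (NFC) should round-trip back to the single code point
    assert(normalize!NFC(decomposed) == precomposed);
}
```

If those asserts hold, the two-code-point lowercase result above is just the lowercased form of that decomposition.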

Because uni "thinks" the lowercase doesn't fit in a single dchar, it simply
does nothing, i.e. returns the character unchanged (as documented).

However, it's still wrong: the standard (from what I read) is pretty clear
that the lower case is simply 'i'.

Furthermore, "LATIN SMALL LETTER I + COMBINING DOT ABOVE" is pretty
redundant...
