[Issue 15440] std.uni outputs \u0069\u0307 as the lower case of \u0130
via Digitalmars-d-bugs
digitalmars-d-bugs at puremagic.com
Sun Dec 13 16:02:15 PST 2015
https://issues.dlang.org/show_bug.cgi?id=15440
ag0aep6g at gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ag0aep6g at gmail.com
--- Comment #1 from ag0aep6g at gmail.com ---
Here are three Unicode documents and what they say about the lowercase of
U+0130 (search for "LATIN CAPITAL LETTER I WITH DOT ABOVE"):
1) <http://www.unicode.org/charts/PDF/U0100.pdf> says: "lowercase is 0069 i".
2) <http://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt> gives U+0069
as the lowercase, too, if I read it right.
3) <http://www.unicode.org/Public/UCD/latest/ucdxml/ucd.nounihan.grouped.zip>
gives 'slc="0069" lc="0069 0307"'. I assume "slc" means "simple lowercase", and
"lc" means "lowercase".
So it seems that the "simple lowercase" is 'i', but the full (proper?) lowercase
is "\u0069\u0307".
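To double-check my reading of UnicodeData.txt, here is a minimal sketch in Python (not D; it is only an independent look at the same Unicode data). The record string is the U+0130 line as I understand it to appear in UnicodeData.txt, and I'm assuming field 13 (zero-based) is the simple lowercase mapping. Python's str.lower() applies the full mapping, as far as I know.

```python
# UnicodeData.txt record for U+0130, semicolon-separated
# (assumption: copied as it appears in the file).
record = ("0130;LATIN CAPITAL LETTER I WITH DOT ABOVE;Lu;0;L;"
          "0049 0307;;;;N;LATIN CAPITAL LETTER I DOT;;;0069;")

fields = record.split(";")

# Field 13 (zero-based) holds the simple lowercase mapping as a hex code point.
simple_lower = chr(int(fields[13], 16))

# str.lower() uses the full case mapping, which may yield several code points.
full_lower = "\u0130".lower()

print(repr(simple_lower), [hex(ord(c)) for c in full_lower])
```

The simple mapping is a single code point, while the full mapping is two, which matches the slc/lc pair in the XML file.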
That makes sense if the mapping is supposed to be reversible without assuming a
Turkish context. Uppercasing "\u0069\u0307" gives "\u0049\u0307" ('I' + combining
dot above), which is canonically equivalent to "\u0130".
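The round trip can be sketched like this, again in Python rather than D and only as an illustration of the Unicode rules, not of what std.uni does internally. It assumes NFC normalization is the right way to show that the two sequences are equivalent.

```python
import unicodedata

# Full lowercase of U+0130: 'i' followed by COMBINING DOT ABOVE.
lowered = "\u0130".lower()

# Uppercasing maps 'i' to 'I' and leaves the combining mark alone.
raised = lowered.upper()

# NFC composes 'I' + U+0307 back into the single code point U+0130.
composed = unicodedata.normalize("NFC", raised)

print([hex(ord(c)) for c in raised], hex(ord(composed)))
```

With the simple mapping the dot would be lost on the way down, and uppercasing 'i' would give plain 'I' instead.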
Seems to me that std.uni is playing by the book, and that there's a point in
what the book says. But I don't know enough about Unicode to speak with
certainty.