[Issue 15440] std.uni outputs \u0069\u0307 as the lower case of \u0130

via Digitalmars-d-bugs digitalmars-d-bugs at puremagic.com
Sun Dec 13 16:02:15 PST 2015


https://issues.dlang.org/show_bug.cgi?id=15440

ag0aep6g at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ag0aep6g at gmail.com

--- Comment #1 from ag0aep6g at gmail.com ---
Here are three Unicode documents and what they say about the lowercase of
U+0130 (search for "LATIN CAPITAL LETTER I WITH DOT ABOVE"):

1) <http://www.unicode.org/charts/PDF/U0100.pdf> says: "lowercase is 0069 i".

2) <http://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt> gives U+0069
as the lowercase, too, if I read it right.

3) <http://www.unicode.org/Public/UCD/latest/ucdxml/ucd.nounihan.grouped.zip>
gives 'slc="0069" lc="0069 0307"'. I assume "slc" means "simple lowercase", and
"lc" means "lowercase".
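
To double-check the reading of UnicodeData.txt in 2), here is a minimal Python
sketch (Python is used only as a neutral tool; this is not std.uni). It assumes
the U+0130 record looks like the line below, so verify it against the current
file. The field positions (12 = simple uppercase, 13 = simple lowercase,
14 = simple titlecase) follow the UnicodeData.txt layout described in UAX #44.
The helper name simple_lowercase is made up for illustration.

```python
# Assumed copy of the U+0130 record from UnicodeData.txt; verify against the file.
RECORD = (
    "0130;LATIN CAPITAL LETTER I WITH DOT ABOVE;Lu;0;L;0049 0307;;;;N;"
    "LATIN CAPITAL LETTER I DOT;;;0069;"
)


def simple_lowercase(record):
    """Return the simple lowercase mapping (field 13) as a string, or None."""
    fields = record.split(";")
    code = fields[13]  # empty when the character has no simple lowercase
    return chr(int(code, 16)) if code else None


# The simple mapping is a single code point, plain 'i' (U+0069).
assert simple_lowercase(RECORD) == "\u0069"
```

UnicodeData.txt only carries the one-to-one ("simple") mappings, which would
explain why it lists 0069 alone while the XML file in 3) also has the longer
"lc" value.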

So it seems that the "simple lowercase" is 'i', but the proper(?) lowercase is
"\u0069\u0307".

That makes sense if the mapping is supposed to be reversible without assuming a
Turkish context. Uppercasing "\u0069\u0307" gives "\u0049\u0307" ('I' +
combining dot above), which is canonically equivalent to "\u0130".
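
The round trip can be sketched in Python, whose str.lower() and str.upper()
also apply the full case mappings. This is only an analogy under that
assumption; it shows what the Unicode data implies, not what std.uni does
internally.

```python
import unicodedata

dotted_capital_i = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE

# Full lowercase mapping: 'i' followed by COMBINING DOT ABOVE.
lower = dotted_capital_i.lower()
assert lower == "\u0069\u0307"

# Uppercasing maps 'i' to 'I' and leaves the combining dot in place.
upper = lower.upper()
assert upper == "\u0049\u0307"

# NFC composes 'I' + combining dot above back into U+0130.
composed = unicodedata.normalize("NFC", upper)
assert composed == dotted_capital_i

# With the simple mapping the dot is lost: plain 'i' uppercases to plain 'I'.
assert "\u0069".upper() == "\u0049"
```

So the two-code-point lowercase keeps the dot, and the original character can
be recovered after normalization, whereas lowercasing to a bare 'i' cannot be
undone without knowing the text is Turkish.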

Seems to me that std.uni is playing by the book, and that there's a point in
what the book says. But I don't know enough about Unicode to speak with
certainty.
