[Issue 15440] std.uni outputs \u0069\u0307 as the lower case of \u0130
via Digitalmars-d-bugs
digitalmars-d-bugs at puremagic.com
Mon Jan 11 12:10:55 PST 2016
https://issues.dlang.org/show_bug.cgi?id=15440
Ali Cehreli <acehreli at yahoo.com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |acehreli at yahoo.com
--- Comment #3 from Ali Cehreli <acehreli at yahoo.com> ---
It looks like I am out of date on this issue: I had never heard of the 0069
0307 sequence (LATIN SMALL LETTER I followed by COMBINING DOT ABOVE) before
H. S. Teoh brought the following change to my attention:
https://github.com/D-Programming-Language/phobos/pull/3848
I've learned since then that the two-code-point sequence should be the default,
but the Turkish (tr) locale should still use just 0069. According to the
following quote, Java 7 behaves differently depending on the locale:
http://grepalex.com/2013/02/14/java-7-and-the-dotted--and-dotless-i/
<quote>
CODE   LOWER       TITLE  UPPER  LANGUAGE
0130;  0069 0307;  0130;  0130;
0130;  0069;       0130;  0130;  tr;
0130;  0069;       0130;  0130;  az;
Entries with a language take precedence over those without, so in my JVM where
the default locale is English, the first row of the mapping is used, which
lines-up with the codepoints that we saw outputted in our Java 7 example.
Therefore to make Java do the right thing here for Turkish, we need to
explicitly specify the Turkish locale (“tr” is the ISO 639 alpha-2 language
code for Turkish) to the toLowerCase method
</quote>
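To make the quoted behavior concrete, here is a minimal Java sketch of the locale-dependent lowering of U+0130. The class name DottedI and the helpers hex and lower are made up for illustration; only String.toLowerCase(Locale) and Locale.forLanguageTag are real JDK calls. It assumes Java 8 or later (for codePoints()).

```java
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical demo class; the name and helpers are illustrative only.
public class DottedI {
    // Format each code point of s as four uppercase hex digits, space separated.
    static String hex(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("%04X", cp))
                .collect(Collectors.joining(" "));
    }

    // Lower-case s using the locale for the given language tag ("en", "tr", "az").
    static String lower(String s, String languageTag) {
        return s.toLowerCase(Locale.forLanguageTag(languageTag));
    }

    public static void main(String[] args) {
        String capitalDottedI = "\u0130"; // LATIN CAPITAL LETTER I WITH DOT ABOVE
        for (String tag : new String[] {"en", "tr", "az"}) {
            System.out.println(tag + ": " + hex(lower(capitalDottedI, tag)));
        }
    }
}
```

If the JVM follows the table quoted above, the "en" line should show 0069 0307 and the "tr" and "az" lines should show just 0069.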
Should std.uni be locale-aware?