[Issue 15440] std.uni outputs \u0069\u0307 as the lower case of \u0130
    via Digitalmars-d-bugs <digitalmars-d-bugs at puremagic.com>
    Mon Jan 11 12:10:55 PST 2016

https://issues.dlang.org/show_bug.cgi?id=15440
Ali Cehreli <acehreli at yahoo.com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |acehreli at yahoo.com
--- Comment #3 from Ali Cehreli <acehreli at yahoo.com> ---
It looks like I am out of date on this issue: I had never heard of the 0069
0307 sequence before H. S. Teoh brought the following change to my attention:
  https://github.com/D-Programming-Language/phobos/pull/3848
I've learned since then that the two-code-point sequence 0069 0307 should be
the default lowercase mapping of 0130, but the TR (Turkish) locale should still
use just 0069. According to the following quote, Java 7 behaves differently
depending on the locale:
  http://grepalex.com/2013/02/14/java-7-and-the-dotted--and-dotless-i/
<quote>
CODE   LOWER       TITLE  UPPER  LANGUAGE
0130;  0069 0307;  0130;  0130;
0130;  0069;       0130;  0130;  tr;
0130;  0069;       0130;  0130;  az;
Entries with a language take precedence over those without, so in my JVM where
the default locale is English, the first row of the mapping is used, which
lines-up with the codepoints that we saw outputted in our Java 7 example.
Therefore to make Java do the right thing here for Turkish, we need to
explicitly specify the Turkish locale (“tr” is the ISO 639 alpha-2 language
code for Turkish) to the toLowerCase method
</quote>
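For reference, the behavior described in the quote can be reproduced with a
small Java sketch along these lines. The class name DottedI and the helper hex
are made up for illustration; String.toLowerCase(Locale) and
Locale.forLanguageTag are standard Java APIs. This only shows what Java does
and is not a proposal for how std.uni should look.

```java
import java.util.Locale;

public class DottedI {
    // Hypothetical helper: render each UTF-16 code unit as four hex digits,
    // separated by spaces, e.g. "0069 0307".
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String dottedCapitalI = "\u0130";

        // No language-specific entry applies, so the default mapping is used.
        System.out.println(hex(dottedCapitalI.toLowerCase(Locale.ENGLISH)));
        // prints "0069 0307"

        // The Turkish entry takes precedence over the default one.
        Locale turkish = Locale.forLanguageTag("tr");
        System.out.println(hex(dottedCapitalI.toLowerCase(turkish)));
        // prints "0069"
    }
}
```

The locale is passed explicitly in both calls, so the result does not depend on
the default locale of the JVM.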
Should std.uni be locale-aware?