toLower() and Unicode are incomplete was: Re: avoid toLower in std.algorithm.sort compare alias

Ali Çehreli acehreli at yahoo.com
Sat Apr 21 18:43:23 PDT 2012


On 04/21/2012 04:24 PM, Jay Norwood wrote:
 > While playing with sorting the unzip archive entries I tried use of the
 > last example in http://dlang.org/phobos/std_algorithm.html#sort
 >
 > std.algorithm.sort!("toLower(a.name) <
 > toLower(b.name)",std.algorithm.SwapStrategy.stable)(entries);

Stealing this thread to point out that converting a letter to upper or 
lower case cannot be done correctly without knowing the writing system 
of the text. Phobos's toLower() documentation currently promises more 
than that allows: "Returns a string which is identical to s except that 
all of its characters are lowercase (in unicode, not just ASCII)."

Unicode cannot define the conversions of at least the following letters 
without knowing the actual alphabet that the text is written in:

- Lowercase of I is ı in some alphabets[*] and i in many others.

- Uppercase of i is İ in some alphabets[*] and I in many others.
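
To make the two cases above concrete, here is a minimal sketch of what an alphabet-aware conversion could look like. toLowerTurkic and toUpperTurkic are hypothetical names of my own, not part of Phobos; they special-case only the dotted and dotless i and defer to std.uni for everything else, so this is an illustration under those assumptions rather than a complete solution.

```d
import std.array : appender;
import std.uni : toLower, toUpper;

/// Hypothetical helper (not in Phobos): lowercase using Turkic i rules.
string toLowerTurkic(string s)
{
    auto result = appender!string();
    foreach (dchar c; s)
    {
        switch (c)
        {
            case 'I':      result.put('ı'); break; // I (U+0049) -> ı (U+0131)
            case '\u0130': result.put('i'); break; // İ (U+0130) -> i (U+0069)
            default:       result.put(toLower(c)); break;
        }
    }
    return result.data;
}

/// Hypothetical helper (not in Phobos): uppercase using Turkic i rules.
string toUpperTurkic(string s)
{
    auto result = appender!string();
    foreach (dchar c; s)
    {
        switch (c)
        {
            case 'i':      result.put('\u0130'); break; // i -> İ (U+0130)
            case '\u0131': result.put('I'); break;      // ı (U+0131) -> I
            default:       result.put(toUpper(c)); break;
        }
    }
    return result.data;
}

unittest
{
    // The default mapping has no notion of the alphabet in use.
    assert(toLower("I") == "i");

    // The sketch applies the Turkic rules instead.
    assert(toLowerTurkic("ISPARTA") == "ısparta");
    assert(toUpperTurkic("izmir") == "İZMİR");
}
```

The caller has to choose between toLower() and toLowerTurkic() based on knowledge that is not in the string itself, which is exactly the information a single toLower(s) signature has no way to receive.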

Ali

[*] Turkish, Azeri, Crimean Tatar, Gagauz, Celtic, etc.



More information about the Digitalmars-d-learn mailing list