toLower() and Unicode are incomplete was: Re: avoid toLower in std.algorithm.sort compare alias

Dmitry Olshansky dmitry.olsh at gmail.com
Sun Apr 22 01:09:59 PDT 2012


On 22.04.2012 5:43, Ali Çehreli wrote:
> On 04/21/2012 04:24 PM, Jay Norwood wrote:
>  > While playing with sorting the unzip archive entries I tried use of the
>  > last example in http://dlang.org/phobos/std_algorithm.html#sort
>  >
>  > std.algorithm.sort!("toLower(a.name) <
>  > toLower(b.name)",std.algorithm.SwapStrategy.stable)(entries);
>
> Stealing this thread to point out that converting a letter to upper or
> lower case cannot be done without knowing the writing system. Phobos's
> toLower() documentation currently says: "Returns a string which is
> identical to s except that all of its characters are lowercase (in
> unicode, not just ASCII)."

Oh, come on. This function hasn't been updated in ages. I bet the wording
here has been intact since Unicode 4.0 ;)

>
> Unicode cannot define the conversions of at least the following letters
> without knowing the actual alphabet that the text is written in:
>
> - Lowercase of I is ı in some alphabets[*] and i in many others.
>
> - Uppercase of i is İ in some alphabets[*] and I in many others.
>

Fair point. The list, however, is not that long, and a system may choose to
support it or not (changing behavior based on the writing system is called
tailoring, I believe).
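
To make the idea of tailoring concrete, here is a minimal sketch in D. The
helper name toLowerTailored and its turkic flag are hypothetical and not part
of Phobos; the sketch assumes std.uni.toLower(dchar) for the default mapping
and special-cases only the two letters from the list above.

```d
import std.uni : toLower;

/// Hypothetical helper, not a Phobos function: lowercases `s` and, when
/// `turkic` is true, applies the dotted/dotless i rule used by Turkish,
/// Azeri and similar alphabets instead of the default mapping.
string toLowerTailored(string s, bool turkic = false)
{
    string result;
    foreach (dchar c; s)            // decode the UTF-8 string code point by code point
    {
        dchar lowered;
        if (turkic && c == 'I')
            lowered = '\u0131';     // dotless ı
        else if (turkic && c == '\u0130')
            lowered = 'i';          // dotted capital İ maps to plain i
        else
            lowered = toLower(c);   // default, untailored mapping
        result ~= lowered;          // appending a dchar re-encodes it as UTF-8
    }
    return result;
}

unittest
{
    // Default behavior: I lowercases to i.
    assert(toLowerTailored("DIYARBAKIR") == "diyarbakir");
    // Tailored behavior: I lowercases to dotless ı.
    assert(toLowerTailored("DIYARBAKIR", true) == "dıyarbakır");
}
```

A sort predicate could then call such a helper in place of toLower(), with
the caller deciding which tailoring applies to the text being sorted.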

> Ali
>
> [*] Turkish, Azeri, Crimean Tatar, Gagauz, Celtic, etc.
>


-- 
Dmitry Olshansky

