toLower() and Unicode are incomplete was: Re: avoid toLower in std.algorithm.sort compare alias
Dmitry Olshansky
dmitry.olsh at gmail.com
Sun Apr 22 01:09:59 PDT 2012
On 22.04.2012 5:43, Ali Çehreli wrote:
> On 04/21/2012 04:24 PM, Jay Norwood wrote:
> > While playing with sorting the unzip archive entries I tried use of the
> > last example in http://dlang.org/phobos/std_algorithm.html#sort
> >
> > std.algorithm.sort!("toLower(a.name) <
> > toLower(b.name)",std.algorithm.SwapStrategy.stable)(entries);
>
> Stealing this thread to point out that converting a letter to upper or
> lower case cannot be done without knowing the writing system. Phobos's
> toLower() documentation currently says: "Returns a string which is
> identical to s except that all of its characters are lowercase (in
> unicode, not just ASCII)."
Oh, come on. This function hasn't been updated in ages. I bet the
wording has been unchanged since Unicode 4.0 ;)
>
> Unicode cannot define the conversions of at least the following letters
> without knowing the actual alphabet that the text is written in:
>
> - Lowercase of I is ı in some alphabets[*] and i in many others.
>
> - Uppercase of i is İ in some alphabets[*] and I in many others.
>
Fair point. The list, however, is not that long, and a system may choose
whether or not to support it (changing behavior based on the writing
system is called tailoring, I believe).
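
To illustrate what tailoring could look like, here is a minimal sketch in
D. It is not what Phobos does. The names lowerCodePoint and lowerTailored
and the turkic flag are hypothetical; only std.uni.toLower and
std.array.appender are existing Phobos functions. The sketch special-cases
the two letters quoted above and defers everything else to the default
mapping.

```d
import std.array : appender;
import std.uni : toLower;

/// Hypothetical helper: lowercase a single code point.
/// When `turkic` is true, apply the dotted/dotless i rules
/// used by Turkish, Azeri and related alphabets.
dchar lowerCodePoint(dchar c, bool turkic)
{
    if (turkic)
    {
        if (c == 'I')      return '\u0131'; // I -> dotless i
        if (c == '\u0130') return 'i';      // dotted capital I -> i
    }
    return toLower(c); // default, untailored mapping
}

/// Hypothetical helper: lowercase a whole string, code point by code point.
string lowerTailored(string s, bool turkic)
{
    auto result = appender!string();
    foreach (dchar c; s)            // decodes UTF-8 into code points
        result.put(lowerCodePoint(c, turkic));
    return result.data;
}

unittest
{
    assert(lowerTailored("DIYARBAKIR", false) == "diyarbakir");
    assert(lowerTailored("DIYARBAKIR", true)  == "d\u0131yarbak\u0131r");
}
```

Assuming the file is saved as tailor.d (a made-up name), the unittest
block can be run with "dmd -main -unittest -run tailor.d". A real
implementation would probably take a locale rather than a boolean, and
would also need the reverse (uppercase) mapping.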
> Ali
>
> [*] Turkish, Azeri, Crimean Tatar, Gagauz, Celtic, etc.
>
--
Dmitry Olshansky