std.string.toUpper() for greek characters

Dmitry Olshansky dmitry.olsh at gmail.com
Wed Oct 3 11:21:39 PDT 2012


On 03-Oct-12 21:10, Ali Çehreli wrote:
> On 10/03/2012 03:56 AM, Minas wrote:
>> Currently, toUpper() (and probably toLower()) does not handle Greek
>> characters correctly. I fixed toUpper() by making another function for
>> Greek characters:
>>
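>> // called if (c >= 0x387 && c <= 0x3CE)
>> dchar toUpperGreek(dchar c)
>> {
>>     if (c >= 'α' && c <= 'ω')
>>     {
>>         // Final sigma sits inside this range, but 'ς' - 32 is not a letter.
>>         if (c == 'ς')
>>             c = 'Σ';
>>         else
>>             c -= 32;
>>         return c;
>>     }
>>
>>     // Accented letters are not at a fixed offset from their capitals.
>>     switch (c)
>>     {
>>         case 'ά': return 'Ά';
>>         case 'έ': return 'Έ';
>>         case 'ή': return 'Ή';
>>         case 'ί': return 'Ί';
>>         case 'ϊ': return 'Ϊ';
>>         case 'ΐ': return 'Ϊ';
>>         case 'ό': return 'Ό';
>>         case 'ύ': return 'Ύ';
>>         case 'ϋ': return 'Ϋ';
>>         case 'ΰ': return 'Ϋ';
>>         case 'ώ': return 'Ώ';
>>         // Capitals and non-letters in the range pass through unchanged
>>         // (indexing a map here would throw for them).
>>         default:  return c;
>>     }
>> }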
>>
>> Then, in toUpper()
>> {
>> ....
>> if (c >= 0x387 && c <= 0x3CE)
>>     c = toUpperGreek(c)...
>> ///
>> }
>>
>> Do you think it should stay like that, or should I copy-paste it into
>> the body of toUpper()?
>>
>> I'm going to fix toLower() as well and make a pull request.
>
> I don't want to detract from the usefulness of these functions but
> toupper and tolower have been two of the strangest functions in
> computing history. It is amazing that they are still accepted, because
> they are useful in very limited situations and those situations are
> becoming rarer as more and more systems support Unicode.
>
Glad you showed up!

The one case that is by far the most useful is case-insensitive matching.
That said, matching doesn't and shouldn't involve toLower/toUpper (let 
alone on the whole string) anywhere. Not only is that multi-pass instead 
of single-pass, it is also wrong, like a lot of other ASCII-minded 
carry-overs.

Other than this, and use as some intermediate sanitized form, I don't 
think it has much use.
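
To make the single-pass point concrete, here is a minimal sketch. It assumes a Phobos release that ships std.uni.icmp and std.uni.sicmp; equalsIgnoreCase is a hypothetical helper name used only for illustration, not an existing library function.

```d
import std.stdio : writeln;
import std.uni : icmp, sicmp;

/// Hypothetical helper: true when `a` and `b` match ignoring case.
bool equalsIgnoreCase(string a, string b)
{
    // icmp folds case while it walks both strings, so no upper-cased
    // or lower-cased copy of either string is built first.
    return icmp(a, b) == 0;
}

void main()
{
    // Full case folding: 'ß' matches "ss".
    writeln(equalsIgnoreCase("Rußland", "Russland"));

    // sicmp uses simple (one-to-one) folding only; that is enough here.
    writeln(sicmp("ΌΎ", "όύ") == 0);
}
```

Comparing toUpper(a) == toUpper(b) instead would allocate two new strings and then compare them in a further pass, and one-to-one case conversion cannot express a match such as ß against ss at all.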

> Two quick examples:
>
> 1) How should this string be capitalized in a scientific article?
>
>    "Anti-obesity effects of α-lipoic acid"

There are a lot of lousy conversions. The basic conversion is defined in 
the Unicode standard; try it here:
http://unicode.org/cldr/utility/transform.jsp?a=Upper&b=Anti-obesity+effects+of+%CE%B1-lipoic+acid
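
The same default conversion can be tried locally. This is only a sketch, assuming a Phobos release whose std.uni.toUpper covers non-ASCII letters (the release current at the time of this thread may not):

```d
import std.stdio : writeln;
import std.uni : toUpper;

void main()
{
    // The default, untailored conversion upper-cases every cased letter,
    // so the Greek alpha is converted along with the Latin text.
    writeln(toUpper("Anti-obesity effects of α-lipoic acid"));
}
```

Whether a capital alpha is what the caller wanted is a separate question; the default conversion knows nothing about scientific notation.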

> I don't think the α in there should be upper-cased.

Depends on why you are doing it in the first place :) Capitalizing a 
scientific article strikes me as kind of strange as well.


> 2) How should this name be capitalized in a list of names?
>
>    "Ali"
>
Again, what's the goal of capitalization here?
Simplifying matching afterwards? Then it doesn't matter, as long as 
its lousiness is acceptable (rarely so) and it stays within the system, 
i.e. doesn't leak out.

> It completely depends on the writing system of that string itself, not
> even the current locale. (There are two uppercases that I know of, which
> can be considered as correct: "ALI" and "ALİ".)
>
One word: tailoring. Basically any software made in Turkey has to do ALİ :)
Only half-joking.
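
A rough sketch of what such tailoring could look like in user code. toUpperTurkish is a hypothetical helper, not a Phobos function, and it assumes std.uni.toUpper(dchar) supplies the default mapping for everything except the one tailored letter:

```d
import std.array : appender;
import std.stdio : writeln;
import std.uni : toUpper;

/// Hypothetical helper: upper-cases `s` with the Turkish rule for 'i'.
string toUpperTurkish(string s)
{
    // U+0130, LATIN CAPITAL LETTER I WITH DOT ABOVE
    enum dchar capitalDottedI = '\u0130';

    auto result = appender!string();
    foreach (dchar c; s)
    {
        if (c == 'i')
            result.put(capitalDottedI);  // tailored: i -> İ
        else
            result.put(toUpper(c));      // default one-to-one mapping
    }
    return result.data;
}

void main()
{
    writeln(toUpper("Ali"));         // default conversion
    writeln(toUpperTurkish("Ali"));  // tailored conversion
}
```

Going character by character ignores one-to-many mappings (such as ß to SS), so this only illustrates the tailoring step; it is not a complete locale-aware conversion.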

> I agree that your toUpper() and toLower() will be useful in many
> contexts but will necessarily do the wrong thing in others.
>
> Ali


-- 
Dmitry Olshansky

