Turkish 'I's can't D either

Ali Cehreli acehreli at yahoo.com
Sat Sep 5 03:01:31 PDT 2009


Stewart Gordon Wrote:

> I is the uppercase form of ı.
> İ is the uppercase form of i.
> 
> http://www.unicode.org/Public/UNIDATA/UnicodeData.txt
> lists them as
> 0049;LATIN CAPITAL LETTER I;Lu;0;L;;;;;N;;;;0069;
> 0069;LATIN SMALL LETTER I;Ll;0;L;;;;;N;;;0049;;0049
> 0130;LATIN CAPITAL LETTER I WITH DOT ABOVE;Lu;0;L;0049 0307;;;;N;LATIN CAPITAL LETTER I DOT;;;0069;
> 0131;LATIN SMALL LETTER DOTLESS I;Ll;0;L;;;;;N;;;0049;;0049
> 
> but this is inadequate: while it tells you how to case-convert ı and İ 
> (that's what the 0049 and 0069 at the end are), you need to add a 
> locale-specific rule to all this to convert I and i in Turkish.

I think there should have been three i's, so that strings containing words from two languages, such as an imaginary company name "Ali & Jim", could be capitalized correctly. The two lowercase i's (the Turkish one and the English one) should have been separate characters so that each could be handled correctly. The problem stems from Unicode...
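
To make the "Ali & Jim" problem concrete, here is a minimal sketch. It is written in Python rather than D only so that it is easy to run, and upper_tr is a hypothetical helper made up for this illustration, not a Phobos or Python library function. It assumes precomposed characters only (no combining dot above).

```python
# Hypothetical helper: uppercase using the Turkish rule for dotted and dotless i.
_TR_UPPER = str.maketrans({"i": "İ", "ı": "I"})


def upper_tr(text: str) -> str:
    # Apply the two Turkish-specific mappings first, then the default mappings.
    return text.translate(_TR_UPPER).upper()


name = "Ali & Jim"
print(name.upper())    # default rule:  ALI & JIM
print(upper_tr(name))  # Turkish rule:  ALİ & JİM
```

Neither result is what one would want here, which would be "ALİ & JIM": since both words use the same code point for i, a single rule applied to the whole string cannot treat the Turkish word and the English word differently.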

A group of us are about to start a small project that involves thin wrappers around Phobos to favor the Turkish behavior for character and string processing. That should help with applications that are happy to use Turkish only. More complex applications could use libraries like IBM's ICU.
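
As a rough idea of what such a thin wrapper could look like, here is a sketch for the lowercase direction. It is in Python rather than D, and it does not reflect the actual project or any Phobos API; the names lower_tr and equals_ignore_case_tr are assumptions invented for this example. It handles precomposed characters only.

```python
# Hypothetical thin wrappers that favor the Turkish behavior and
# delegate everything else to the standard library.
_TR_LOWER = str.maketrans({"I": "ı", "İ": "i"})


def lower_tr(text: str) -> str:
    # Handle the two Turkish-specific letters first, then use the default rules.
    return text.translate(_TR_LOWER).lower()


def equals_ignore_case_tr(a: str, b: str) -> bool:
    # Case-insensitive comparison under the Turkish rule.
    return lower_tr(a) == lower_tr(b)


print(lower_tr("IŞIK"))                      # ışık
print(equals_ignore_case_tr("DİL", "dil"))   # True
print(equals_ignore_case_tr("DIL", "dil"))   # False
```

The wrappers deliberately apply one rule to the whole string, so they only suit applications that are Turkish only, as described above.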

Ali


