Turkish 'I's can't D either

Mon Aug 24 21:23:25 PDT 2009

You may be aware of the problems related to the consistency of the two separate letter 'I's in the Turkish alphabet (and the alphabets that are based on the Turkish alphabet).

The lowercase and uppercase versions of each letter are consistent in whether they have a dot or not (dotless ı pairs with I, and dotted i pairs with İ):

  http://en.wikipedia.org/wiki/Turkish_I

The Turkish alphabet, being so close to the Western alphabets but not close enough, ends up in a strange position. (Strangely, the same applies geographically, politically, socially, etc. as well... ;))

Computer systems *almost* work for Turkish, but not for those two letters.

I love the fact that D allows Unicode letters in the source code and that it natively supports Unicode. I cannot stress enough how important this is. That is the single biggest reason why I decided to finally write a programming tutorial. Thank you to all who proposed and implemented those features!

Back to the Turquois 'I's... What is a programmer to do when writing programs that deal with Turkish letters?

a) Accept that Phobos, too, has this age-old behavior, which is a result of premature optimization (i.e. this code in tolower: c + (cast(char)'a' - 'A'))

b) Accept that the problem is unsolvable because the letter I has two minuscules, and the letter i has two majuscules anyway, and that the intent is not always clear

c) Accept the Turkish alphabet as being pathological (merely for being in the minority!), and use a Turkish version of Phobos or some other library

d) Solve the problem with locale support

Is option d possible with today's systems? Whose responsibility is this anyway? The OS? The language? The program? Something else?

Since alphanumerical ordering is also of interest, I think this has something to do with locales.
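
To make option a concrete, here is a minimal sketch of that kind of fixed-offset lowercasing. The function name asciiToLower is hypothetical and this is not the actual Phobos source; it only reproduces the same arithmetic to show where it goes wrong for Turkish:

```d
import std.stdio;

// Hypothetical helper (not the Phobos implementation): lowercases by adding
// the fixed ASCII distance between 'a' and 'A'.
char asciiToLower(char c)
{
    if (c >= 'A' && c <= 'Z')
        return cast(char)(c + ('a' - 'A'));
    return c;
}

void main()
{
    // 'I' always maps to dotted 'i' here; a Turkish reader would expect
    // dotless i (U+0131) instead.
    writeln(asciiToLower('I'));
}
```

The offset is the same for every letter, so there is no room in this scheme for a language where 'I' and 'i' are not a pair.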

Is there a way for a program to work with Turkish letters and ensure that the following program produces the expected output of 'dotless i', 'I with dot', and 0?

import std.stdio;
import std.string;
import std.c.locale;
import std.uni;

void main()
{
    const char * result = setlocale(LC_ALL, "tr_TR.UTF-8");
    assert(result);

    writeln(toUniLower('I'));
    writeln(toUniUpper('i'));
    writeln(indexOf("I",
                    '\u0131',               // dotless i
                    (CaseSensitive).no));
}
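
For comparison, option c could be sketched by hand as below. This is only an illustration under assumptions: toLowerTr and toUpperTr are hypothetical names that exist in no library I know of, and toLower and toUpper are the std.uni names used in later Phobos releases (toUniLower and toUniUpper in the program above). The sketch special-cases the four Turkish 'I' letters and leaves every other letter to std.uni:

```d
import std.stdio;
import std.uni : toLower, toUpper;

// Hypothetical helper: Turkish-aware lowercase for a single code point.
dchar toLowerTr(dchar c)
{
    switch (c)
    {
        case 'I':      return '\u0131'; // I -> dotless i
        case '\u0130': return 'i';      // I with dot -> i
        default:       return toLower(c);
    }
}

// Hypothetical helper: Turkish-aware uppercase for a single code point.
dchar toUpperTr(dchar c)
{
    switch (c)
    {
        case 'i':      return '\u0130'; // i -> I with dot
        case '\u0131': return 'I';      // dotless i -> I
        default:       return toUpper(c);
    }
}

void main()
{
    writeln(toLowerTr('I'));
    writeln(toUpperTr('i'));
}
```

This hard-codes the Turkish rules into the program, which is exactly what I would like to avoid; it ignores the locale entirely and does nothing for case-insensitive searching or ordering.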

This is a practical question. I really want to be able to work with Turkish... :)

Thank you,
Ali