Replacing tango.text.Ascii.isearch

Ali Çehreli acehreli at yahoo.com
Wed Oct 26 06:05:14 UTC 2022


On 10/25/22 22:49, Siarhei Siamashka wrote:

 > Unicode is significantly simpler than a set of various
 > incompatible 8-bit encodings

Strongly agreed.

 > I'm surely
 > able to ignore the peculiarities of modern Turkish Unicode

The problem with Unicode is its main aim of allowing characters of 
multiple writing systems in the same text. When multiple writing systems 
are in play, conflicts and ambiguities will appear.

 > and wait for
 > the other people to come up with a solution for D language if they
 > really care.

I solved my problem by writing an Alphabet hierarchy in the past. I 
don't like that code but it still works:

 
https://bitbucket.org/acehreli/ddili/src/4c0552fe8352dfe905c9734a57d84d36ce4ed476/src/alphabet.d#lines-50

It handles capitalization, ordering, etc. I use it when preparing the 
Index section of the Turkish edition of "Programming in D":

   http://ddili.org/ders/d/ix.html
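
To give an idea of what such an alphabet needs to do, here is a minimal
sketch. It is not the code in alphabet.d; the names lowers, uppers,
indexIn, upperLetter, upperWord, rank, and turkishLess are made up for
this illustration. It assumes a fixed table of the 29 Turkish letters
and treats any other character as "outside the alphabet".

```d
import std.algorithm.comparison : cmp;
import std.algorithm.iteration : map;
import std.algorithm.sorting : sort;

// Hypothetical tables: the 29 Turkish letters in alphabetical order.
// The same position in both strings is the same letter (ı/I and i/İ).
immutable dstring lowers = "abcçdefgğhıijklmnoöprsştuüvyz"d;
immutable dstring uppers = "ABCÇDEFGĞHIİJKLMNOÖPRSŞTUÜVYZ"d;

// Position of c in letters, or -1 if it is not there.
ptrdiff_t indexIn(dstring letters, dchar c)
{
    foreach (i, letter; letters)
        if (letter == c)
            return cast(ptrdiff_t) i;
    return -1;
}

// Turkish capitalization of one letter; other characters pass through.
dchar upperLetter(dchar c)
{
    const i = indexIn(lowers, c);
    return i < 0 ? c : uppers[i];
}

// Turkish capitalization of a whole word.
string upperWord(string s)
{
    string result;
    foreach (dchar c; s)
        result ~= upperLetter(c);
    return result;
}

// Sort key of a letter; characters outside the alphabet sort last.
size_t rank(dchar c)
{
    auto i = indexIn(lowers, c);
    if (i < 0)
        i = indexIn(uppers, c);
    return i < 0 ? size_t.max : cast(size_t) i;
}

// Case-insensitive "less than" in Turkish alphabetical order.
bool turkishLess(string a, string b)
{
    return cmp(a.map!rank, b.map!rank) < 0;
}

unittest
{
    assert(upperWord("ide") == "İDE");
    assert(upperWord("ılık") == "ILIK");

    auto words = ["izmir", "ırmak", "çay", "cam", "zil", "şeker"];
    words.sort!turkishLess;
    assert(words == ["cam", "çay", "ırmak", "izmir", "şeker", "zil"]);
}
```

The table lookup is linear, which is fine for an index of a few hundred
entries. A real implementation would likely want one table per alphabet
rather than module-level globals.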

One of the ambiguities is what came up on this thread: Should a word 
that starts with I be listed under I (treating it as Turkish, where I 
is the capital of ı) or under İ (treating it as English, where I is the 
capital of i)? So far I am lucky: the only word that starts with I 
happens to be the English "IDE", so it goes under i (which appears as İ 
in the Turkish edition). That would make sense to a Turkish reader, who 
might (really?) accept it as the capital of ide.
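
The ambiguity can be made concrete with a small sketch. The Lang enum
and the indexHeading function below are hypothetical names for this
illustration only; the point is that the heading for a word starting
with I cannot be decided from the characters alone and needs the
language as extra input. Only the I family is handled here.

```d
import std.range.primitives : empty, front;

// Hypothetical: which language the word is assumed to be written in.
enum Lang { turkish, english }

// Heading letter in a Turkish index for a word starting with one of
// I, İ, ı, i. Any other first character is returned unchanged.
dchar indexHeading(string word, Lang lang)
{
    assert(!word.empty, "word must not be empty");
    const dchar first = word.front;

    // In English, I is the capital of i, and the capital of i in the
    // Turkish index is İ.
    if (first == 'I' && lang == Lang.english)
        return 'İ';
    if (first == 'i')
        return 'İ';
    // In Turkish, I is the capital of ı.
    if (first == 'ı')
        return 'I';
    return first;
}

unittest
{
    // The same three characters land under different headings.
    assert(indexHeading("IDE", Lang.english) == 'İ');
    assert(indexHeading("IDE", Lang.turkish) == 'I');
}
```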

It's confusing but it seems to work. :) It doesn't matter. Life is 
imperfect and things will somehow work in the end.

Ali



More information about the Digitalmars-d-learn mailing list