More Unicode fun

foobar foo at bar.com
Fri Jan 14 14:29:37 PST 2011


After reading most of the posts on this subject, I wanted to first thank Spir, Michel and others who brought this topic to light.
Since Andrei and others asked for more information on the subject, I would like to contribute what I know to this discussion:

1. Regarding combining marks: Hebrew (and Arabic) make extensive use of them.

Hebrew has letters only for consonants; vowels are optional combining marks. In addition, some letters take diacritics (e.g. an 's' sound vs. an 'sh' sound in Hebrew is distinguished by whether there is a diacritic dot on the top-left or top-right corner of the letter 'ש').
Furthermore, some punctuation is also written as combining marks: you add a middle dot to emphasize a consonant, whereas Western languages double the letter (as in the word 'letter').
On top of that, biblical text has an additional set of marks to represent the chanting rhythm.
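To make the point above concrete, here is a small Python sketch (Python is just an illustration language here, not anything from the thread). It assumes the standard Unicode code points for the Hebrew letter shin and its points, and shows that one visible letter can be several code points, each mark being a separate combining character:

```python
import unicodedata

SHIN = "\u05E9"       # HEBREW LETTER SHIN
SHIN_DOT = "\u05C1"   # HEBREW POINT SHIN DOT ('sh' sound)
SIN_DOT = "\u05C2"    # HEBREW POINT SIN DOT ('s' sound)
DAGESH = "\u05BC"     # HEBREW POINT DAGESH OR MAPIQ (the middle dot)
QAMATS = "\u05B8"     # HEBREW POINT QAMATS (a vowel)

# One visible letter: shin + shin dot + dagesh + qamats.
letter = SHIN + SHIN_DOT + DAGESH + QAMATS

# Each mark is its own code point with a non-zero canonical combining class.
print(len(letter))                                  # 4
print([unicodedata.combining(c) for c in letter])   # [0, 24, 21, 18]

# 'sh' and 's' differ only in which dot is attached to the same base letter.
print(SHIN + SHIN_DOT == SHIN + SIN_DOT)            # False

# The precomposed presentation form U+FB2A decomposes to shin + shin dot.
print(unicodedata.normalize("NFD", "\uFB2A") == SHIN + SHIN_DOT)  # True
```

Any code that indexes or slices such a string by code point can therefore cut a letter away from its marks.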

So it's definitely possible in Hebrew to have more than one combining mark on the same base letter. When comparing such letters, the order of the combining marks should not matter, and I believe Unicode defines a canonical order for such cases, applied during normalization.
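A minimal Python sketch of order-insensitive comparison, assuming NFD normalization is the desired notion of equality; the helper name `canonically_equal` is hypothetical. NFD sorts runs of combining marks by their canonical combining class, so the same marks typed in a different order compare equal:

```python
import unicodedata

def canonically_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after NFD normalization.

    NFD decomposes precomposed characters and reorders adjacent combining
    marks by canonical combining class, so their typed order stops mattering.
    """
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

# The same letter with its marks typed in two different orders.
a = "\u05E9\u05C1\u05BC"  # shin, shin dot (class 24), dagesh (class 21)
b = "\u05E9\u05BC\u05C1"  # shin, dagesh (class 21), shin dot (class 24)

print(a == b)                                # False: raw code-point comparison
print(canonically_equal(a, b))               # True
print(unicodedata.normalize("NFD", a) == b)  # True: dagesh (21) sorts first
```

One caveat: marks that share the same combining class are never reordered relative to each other, because swapping them could change the meaning.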

2. Case mapping depends on locale. Turkish, for instance, has two 'i' letters, one with a dot and one without. Therefore the Turkish uppercase of 'i' is a capital I with a dot ('İ'), unlike in English, where it is a plain 'I'.
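A Python sketch of the difference, assuming only the dotted/dotless i pair needs special handling (real Turkish tailoring, as in Unicode's SpecialCasing data, has more to it). Python's built-in `str.upper()` is locale-independent; `turkish_upper` and `turkish_lower` are hypothetical helpers written for illustration:

```python
def turkish_upper(s: str) -> str:
    # Hypothetical helper: map dotted 'i' to capital dotted I (U+0130) first;
    # the default upper() already maps dotless 'ı' (U+0131) to plain 'I'.
    return s.replace("i", "\u0130").upper()

def turkish_lower(s: str) -> str:
    # Hypothetical helper: capital dotted 'İ' -> 'i', plain 'I' -> dotless 'ı'.
    return s.replace("\u0130", "i").replace("I", "\u0131").lower()

print("i".upper())                  # I   (default, locale-independent mapping)
print(turkish_upper("i"))           # İ
print(turkish_upper("istanbul"))    # İSTANBUL
print(turkish_lower("DİYARBAKIR"))  # diyarbakır
```

The special cases are applied before the default mapping so that the generic `upper()`/`lower()` never sees a plain 'i' or 'I' it would map the English way.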


More information about the Digitalmars-d mailing list