Today's programming challenge - How's your Range-Fu ?

Shachar Shemesh via Digitalmars-d digitalmars-d at puremagic.com
Sun Apr 19 02:42:15 PDT 2015


On 19/04/15 10:51, Abdulhaq wrote:
> On Sunday, 19 April 2015 at 02:20:01 UTC, Shachar Shemesh wrote:
>> On 18/04/15 21:40, Walter Bright wrote:

>> Also, notice that some letters can only be achieved using multiple
>> code points. Hebrew diacritics, for example, do not, typically, have a
>> composite form. My name fully spelled (which you rarely would do),
>> שַׁחַר, cannot be represented with less than 6 code points, despite
>> having only three letters.
>>
>
> Yes Arabic is similar too
>

Actually, the Arabic presentation forms serve a slightly different 
purpose. In Hebrew, the presentation forms are mostly for Biblical 
text, where certain decorations are customary.

For Arabic, the main reason for the presentation forms is shaping. 
Almost every Arabic letter can be written in up to four different forms 
(alone, start of word, middle of word and end of word). This means that 
Arabic has 28 letters, but over 100 different shapes for those letters. 
These days, when the font can do the shaping, the 28 letters suffice. 
During the DOS days, you needed to actually store those glyphs 
somewhere, which means that you needed to allocate a number to them.
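
To make the shaping point concrete, here is a minimal Python sketch using 
only the standard unicodedata module. It takes the four legacy 
presentation forms of a single letter (BEH, picked arbitrarily for 
illustration) and folds them back to the one base letter with NFKC 
normalization. The helper name base_letter is my own, not part of any 
library.

```python
import unicodedata

# The four legacy presentation forms of the Arabic letter BEH (U+0628):
# isolated, final, initial and medial. BEH is an arbitrary example letter.
BEH_FORMS = ["\uFE8F", "\uFE90", "\uFE91", "\uFE92"]


def base_letter(ch: str) -> str:
    """Fold a presentation-form code point back to its base letter.

    Hypothetical helper: it relies on the compatibility decomposition
    that NFKC normalization applies to the presentation forms.
    """
    return unicodedata.normalize("NFKC", ch)


if __name__ == "__main__":
    for form in BEH_FORMS:
        base = base_letter(form)
        print(f"U+{ord(form):04X} {unicodedata.name(form)}"
              f" -> U+{ord(base):04X} {unicodedata.name(base)}")
```

All four shapes map to the same stored letter, which is why the 28 
letters suffice once the font does the shaping.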

In Hebrew, some letters also have a final form. The numbers are much 
smaller, however (22 letters, 5 of which have final forms), so Hebrew 
keyboards actually have all 27 letters on them. Going strictly by the 
"Unicode way", one would be expected to spell שלום with U+05DE (Mem) as 
the last letter, and let the shaping engine figure out that it should 
use the final form (or add a ZWNJ). Since all Hebrew code charts 
contained a final form Mem, however, you actually spell it with U+05DD 
at the end, and it is considered a distinct letter.
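
A short Python sketch of the Hebrew side, again with nothing but the 
standard unicodedata module. It shows that the final Mem is a code point 
of its own which normalization leaves alone, and it re-checks the 
"6 code points for three letters" count quoted above. The strings are 
written as explicit escapes, the order of the two marks on the first 
letter is my own choice, and the helper names are made up for the 
example.

```python
import unicodedata

MEM = "\u05DE"        # HEBREW LETTER MEM
FINAL_MEM = "\u05DD"  # HEBREW LETTER FINAL MEM

# The word from the text, as it is actually spelled: shin, lamed, vav, final mem.
SHALOM = "\u05E9\u05DC\u05D5\u05DD"

# The fully pointed name from the quoted text (assumed mark order):
# shin, shin dot, patah, het, patah, resh.
SHACHAR = "\u05E9\u05C1\u05B7\u05D7\u05B7\u05E8"


def describe(text: str) -> list[tuple[str, str]]:
    """List each code point of text with its Unicode character name."""
    return [(f"U+{ord(c):04X}", unicodedata.name(c)) for c in text]


def count_letters(text: str) -> int:
    """Count code points in category Lo, i.e. letters without the points."""
    return sum(1 for c in text if unicodedata.category(c) == "Lo")


if __name__ == "__main__":
    for code, name in describe(SHALOM):
        print(code, name)
    # Unlike the Arabic presentation forms, the final Mem has no
    # decomposition, so NFKC leaves it as a distinct letter.
    print(unicodedata.normalize("NFKC", FINAL_MEM) == FINAL_MEM)
    print(len(SHACHAR), "code points,", count_letters(SHACHAR), "letters")
```

Normalizing the pointed name to NFC does not shrink it either, which 
matches the claim that there is no shorter composite form.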

Shachar

