Today's programming challenge - How's your Range-Fu ?
Shachar Shemesh via Digitalmars-d
digitalmars-d at puremagic.com
Sun Apr 19 02:42:15 PDT 2015
On 19/04/15 10:51, Abdulhaq wrote:
> On Sunday, 19 April 2015 at 02:20:01 UTC, Shachar Shemesh wrote:
>> On 18/04/15 21:40, Walter Bright wrote:
>> Also, notice that some letters can only be achieved using multiple
>> code points. Hebrew diacritics, for example, do not, typically, have a
>> composite form. My name fully spelled (which you rarely would do),
>> שַׁחַר, cannot be represented with less than 6 code points, despite
>> having only three letters.
>>
>
> Yes Arabic is similar too
>
Actually, the Arabic presentation forms serve a slightly different
purpose. In Hebrew, the presentation forms are mostly for Biblical
text, where certain decorations are customary.
For Arabic, the main reason for the presentation forms is shaping.
Almost every Arabic letter can be written in up to four different forms
(alone, start of word, middle of word and end of word). This means that
Arabic has 28 letters, but over 100 different shapes for those letters.
These days, when the font can do the shaping, the 28 letters suffice.
During the DOS days, you needed to actually store those glyphs
somewhere, which means that you needed to allocate a number to them.
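
To make the shaping point concrete, here is a minimal Python sketch (an
illustration only, not something from the thread). It uses the standard
unicodedata module and picks the letter BEH as an arbitrary example. It
shows that the four legacy presentation-form code points for that letter
all carry a compatibility mapping back to the single base letter U+0628.

```python
import unicodedata

# Base letter: ARABIC LETTER BEH.
BEH = "\u0628"

# Its four shapes as separately numbered code points in the
# Arabic Presentation Forms-B block (isolated, final, initial, medial).
forms = ["\uFE8F", "\uFE90", "\uFE91", "\uFE92"]

for ch in forms:
    # decomposition() returns a tag such as <initial> plus the base code point.
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}  {unicodedata.decomposition(ch)}")

# Compatibility normalization (NFKC) folds every shape back to the base letter,
# which is all a modern font-driven shaping engine needs to be given.
folded = [unicodedata.normalize("NFKC", ch) for ch in forms]
```

After normalization only one distinct code point remains, which is the
"28 letters suffice" situation described above.
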
In Hebrew, some letters also have a final form. Since the numbers are so
much smaller (22 letters, 5 of which have final forms), however, Hebrew
keyboards actually have all 27 letters on them. Going strictly by the
"Unicode way", one would be expected to spell שלום with U+05DE as the
last letter, and let the shaping engine figure out that it should use
the final form (or add a ZWNJ). Since all Hebrew code charts contained a
final form Mem, however, you actually spell it with U+05DD at the end,
and it is considered a distinct letter.
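
A small Python sketch of the same point, again only as an illustration
using the standard unicodedata module: the final Mem has its own name
and no compatibility mapping to the regular Mem, so normalization leaves
it alone, unlike the Arabic presentation forms.

```python
import unicodedata

# The word "shalom" as it is actually typed: Shin, Lamed, Vav, Final Mem.
shalom = "\u05E9\u05DC\u05D5\u05DD"

last = shalom[-1]
last_name = unicodedata.name(last)          # name of the stored last letter
mem_name = unicodedata.name("\u05DE")       # name of the non-final Mem

# An empty decomposition string means Unicode defines no mapping from the
# final form to the regular letter; they are distinct characters.
final_mem_mapping = unicodedata.decomposition(last)

# Even compatibility normalization keeps the final Mem as typed.
normalized = unicodedata.normalize("NFKC", shalom)

print(f"U+{ord(last):04X} {last_name}; regular form is {mem_name}")
print("unchanged by NFKC:", normalized == shalom)
```
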
Shachar