Today's programming challenge - How's your Range-Fu ?
Abdulhaq via Digitalmars-d
digitalmars-d at puremagic.com
Sun Apr 19 00:51:41 PDT 2015
On Sunday, 19 April 2015 at 02:20:01 UTC, Shachar Shemesh wrote:
> On 18/04/15 21:40, Walter Bright wrote:
>>
>> I'm not arguing against the existence of the Unicode standard,
>> I'm saying I can't figure any justification for standardizing
>> different encodings of the same thing.
>>
>
> A lot of areas in Unicode are due to pre-Unicode legacy.
>
> I'm guessing here, but look at the code points: é (U+00E9,
> Latin small letter e with acute) comes from the Latin-1
> Supplement block, which was designed to mirror ISO-8859-1.
> U+0301 (combining acute accent) comes from the "Combining
> Diacritical Marks" block.
>
> The way I understand things, Unicode would really prefer to use
> U+0065 U+0301 rather than U+00E9. Because of legacy systems, and
> because they wanted the ISO-8859 code pages to map 1:1 rather
> than 1:n, they introduced code points they would really rather
> do without.
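Walter's question comes down to what the standard calls canonical equivalence. The two spellings of é are different code point sequences, and normalization is how Unicode declares them to be the same text. The following is a minimal sketch using Python's standard `unicodedata` module. Python is only for illustration, and the variable names are made up for the example.

```python
import unicodedata

precomposed = "\u00e9"        # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "\u0065\u0301"   # U+0065 'e' followed by U+0301 COMBINING ACUTE ACCENT

# A plain code point comparison treats them as different strings.
print(precomposed == decomposed)          # False
print(len(precomposed), len(decomposed))  # 1 2

# Canonical normalization makes them comparable:
# NFD decomposes, NFC recomposes.
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

In practice, code that compares user-supplied text normalizes both sides to the same form first and only then compares code points.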
>
> This also explains the "presentation forms" blocks (e.g.
> http://www.unicode.org/charts/PDF/UFB00.pdf). These were
> intended to be glyphs rather than code points. For legacy
> reasons it was not possible to simply discard them, so they
> received code points, with a warning not to use those code
> points directly.
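The ligatures in the chart linked above show how Unicode marks these legacy code points. They carry a *compatibility* decomposition, not a canonical one. The sketch below uses U+FB01 (LATIN SMALL LIGATURE FI) as an example from that block. It shows that NFC leaves the ligature alone and NFKC folds it back into plain letters.

```python
import unicodedata

lig = "\ufb01"  # U+FB01 LATIN SMALL LIGATURE FI (Alphabetic Presentation Forms block)

print(unicodedata.name(lig))           # LATIN SMALL LIGATURE FI
print(unicodedata.decomposition(lig))  # <compat> 0066 0069

# Canonical normalization keeps the ligature as-is ...
print(unicodedata.normalize("NFC", lig) == lig)  # True
# ... while compatibility normalization replaces it with plain "f" + "i".
print(unicodedata.normalize("NFKC", lig))        # fi
```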
>
> Also, notice that some letters can only be written using
> multiple code points. Hebrew diacritics, for example, typically
> have no precomposed form. My name fully spelled (which you
> would rarely do), שַׁחַר, cannot be represented with fewer
> than 6 code points, despite having only three letters.
>
> The last paragraph isn't strictly true. You can use U+FB2A +
> U+05B7 for the first letter instead of U+05E9 + U+05C1 + U+05B7.
> But then you are using a presentation form which, as pointed
> out above, is only there for legacy.
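The counts in the last two paragraphs can be checked directly. The sketch below assumes the fully pointed name is spelled shin + shin dot (U+05C1) + patah (U+05B7), het + patah, resh. The string literals are built by hand for the example. It counts base letters as code points whose canonical combining class is 0. It also shows that U+FB2A (HEBREW LETTER SHIN WITH SHIN DOT) is canonically equivalent to the spelled-out form, yet NFC never produces it because it is excluded from composition.

```python
import unicodedata

# Hand-built spelling of the name: shin + shin dot + patah, het + patah, resh.
name = "\u05e9\u05c1\u05b7\u05d7\u05b7\u05e8"

# Hebrew letters have combining class 0; the points are combining marks.
letters = [c for c in name if unicodedata.combining(c) == 0]
print(len(name), len(letters))  # 6 3

# Same text using the presentation form U+FB2A for shin + shin dot.
with_pf = "\ufb2a\u05b7\u05d7\u05b7\u05e8"
print(len(with_pf))  # 5

# Canonically equivalent: both decompose to the same NFD sequence ...
print(unicodedata.normalize("NFD", with_pf) == unicodedata.normalize("NFD", name))  # True
# ... but NFC never recomposes to U+FB2A, since it is a composition exclusion.
print("\ufb2a" in unicodedata.normalize("NFC", name))  # False
```

NFD also reorders the two marks on the first letter by combining class: patah (17) comes before shin dot (24). So the order in which the points are typed does not matter after normalization.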
>
> Shachar
> or shall I say
> שחר
Yes, Arabic is similar, too.