Today's programming challenge - How's your Range-Fu ?

via Digitalmars-d digitalmars-d at puremagic.com
Sun Apr 19 02:54:58 PDT 2015


On Sunday, 19 April 2015 at 02:20:01 UTC, Shachar Shemesh wrote:
> U+0065 U+0301 rather than U+00E9. Because of legacy systems, and 
> because they would rather have the ISO-8859 code pages be 1:1 
> mappings, rather than 1:n mappings, they introduced code points 
> they really would rather do without.

That's probably right. It is in fact a major feat to have the 
world adopt a new standard wholesale, but there are also 
difficult "semiotic" issues when you encode symbols, because 
different languages view the same symbol differently (e.g. is "ä" 
a variant of "a", or are they two distinct letters of the 
alphabet?).

Take "Å": it can represent a unit (ångström), a letter "A" with a 
ring above it, or a unique letter in the alphabet. The letter 
"æ" can be seen as a combination of "ae" or as a unique letter.
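
To make this concrete, here is a rough sketch of how Unicode's 
own normalization treats these cases. It assumes Python's 
standard unicodedata module (not anything from Phobos), and the 
helper and variable names are made up for illustration. Note that 
the ångström sign is the uppercase form:

```python
import unicodedata

def nfc(s):
    """Canonical composition (NFC)."""
    return unicodedata.normalize("NFC", s)

def nfkd(s):
    """Compatibility decomposition (NFKD), the most aggressive form."""
    return unicodedata.normalize("NFKD", s)

# "e" + COMBINING ACUTE ACCENT versus the precomposed legacy code point.
decomposed_e_acute = "\u0065\u0301"
precomposed_e_acute = "\u00e9"

# Different code point sequences, but canonically equivalent.
raw_equal = decomposed_e_acute == precomposed_e_acute
nfc_equal = nfc(decomposed_e_acute) == nfc(precomposed_e_acute)

# ANGSTROM SIGN (the unit) versus LATIN CAPITAL LETTER A WITH RING ABOVE.
angstrom_sign = "\u212b"
letter_a_ring = "\u00c5"

# NFC folds the unit sign into the letter, so the unit/letter
# distinction does not survive normalization.
angstrom_folds_to_letter = nfc(angstrom_sign) == letter_a_ring

# "æ" has no decomposition at all: even NFKD leaves it alone, so
# Unicode treats it as a letter in its own right, not as "ae".
ae_letter = "\u00e6"
ae_is_atomic = nfkd(ae_letter) == ae_letter

print(raw_equal, nfc_equal, angstrom_folds_to_letter, ae_is_atomic)
```

So the standard has already made a choice in each case: "é" and 
"Å" are treated as base letter plus mark, the unit sign is 
treated as the letter, and "æ" is treated as atomic. Whether a 
given language agrees with those choices is a separate question.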

And we can expect languages, signs and practices to evolve over 
time too. How can you normalize encodings without normalizing 
writing practice and natural language development? That would be 
beyond the mandate of a Unicode standards organization...

