The Case Against Autodecode
Patrick Schluter via Digitalmars-d
digitalmars-d at puremagic.com
Sat Jun 4 01:00:07 PDT 2016
One also has to take into consideration that Unicode is the way
it is because it was not invented in a vacuum. It had to take
the existing encodings into account and find compromises allowing
its adoption. Even if its designers had invented the perfect
encoding, NO ONE WOULD HAVE USED IT, as it would have fubared
everything that already existed. As it was designed, it allowed a
relatively smooth transition.
Here are some points that made it possible for Unicode to be
adopted at all:
- 16 bits: while that choice was a bit shortsighted, 16 bits was a
good compromise between compactness and richness (the BMP suffices
to express nearly all living languages).
- Using more or less the same arrangement of code points as in the
different codepages. This allowed legacy documents to be
transformed with simple scripts (as a matter of fact, I wrote a
script to repair misencoded Greek documents; it consisted mainly
of unich = ch > 0x80 ? ch + 0x2D0 : ch;).
- UTF-8: this was the stroke of genius, the encoding that allowed
mixing it all without requiring awful acrobatics (Joakim is
completely out to lunch on that one: shift-based encodings without
self-synchronisation are hellish; that's why the Chinese and
Japanese adopted Unicode without hesitation, as they had enough
experience with their legacy encodings).
- Allowing time for the transition.
So all the points that people here criticize were in fact the
reasons why Unicode could become the standard it is now.