The Case Against Autodecode

Patrick Schluter via Digitalmars-d digitalmars-d at puremagic.com
Sat Jun 4 01:00:07 PDT 2016


One also has to take into consideration that Unicode is the way 
it is because it was not invented in a vacuum. It had to take 
the existing encodings into account and find compromises that 
allowed its adoption. Even if its designers had invented the 
perfect encoding, NO ONE WOULD HAVE USED IT, as it would have 
fubared everything that already existed.
As it was designed, it allowed a (relatively) smooth transition. 
Here are some points that made it possible for Unicode to be 
adopted at all:
- 16 bits: while that choice was a bit shortsighted, 16 bits is 
a good compromise between compactness and richness (the BMP 
suffices to express nearly all living languages).
- Using more or less the same arrangement of codepoints as in 
the different codepages. This made it simple to convert legacy 
documents in simple scripts (as a matter of fact, I wrote a 
script to repair misencoded Greek documents, and it consisted 
mainly of: unich = ch > 0x80 ? ch + 0x2D0 : ch;).
- UTF-8: this was the stroke of genius, the encoding that 
allowed mixing it all without requiring awful acrobatics 
(Joakim is completely out to lunch on that one: shifting 
encodings without self-synchronisation are hellish, which is 
why the Chinese and Japanese adopted Unicode without 
hesitation; they had had enough experience with their legacy 
encodings).
- Allowing time for the transition.

So all the points that people here criticize were in fact the 
reasons why Unicode could become the standard it is now.
