Why UTF-8/16 character encodings?
Diggory
diggsey at googlemail.com
Sat May 25 00:48:04 PDT 2013
I think you are a little confused about what Unicode actually
is. Unicode has nothing to do with code pages, and nobody uses
code pages any more except for compatibility with legacy
applications (with good reason!).
Unicode is:
1) A standardised numbering of a large number of characters
2) A set of standardised algorithms for operating on these
characters
3) A set of standardised encodings for efficiently encoding
sequences of these characters
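
To make the three parts concrete, here is a small Python sketch (not D, and not from the original discussion). The helper name describe is made up for illustration; ord, str.encode and the unicodedata module are standard Python. It shows the numbering (code point), one of the standard algorithms (NFC normalisation), and how many bytes each encoding uses for the same character.

```python
import unicodedata

def describe(ch):
    """Return the code point, name and encoded sizes of one character.

    `describe` is a hypothetical helper for this illustration only.
    """
    return {
        "code_point": f"U+{ord(ch):04X}",               # 1) the standard number
        "name": unicodedata.name(ch),
        "utf8_bytes": len(ch.encode("utf-8")),          # 3) three encodings of
        "utf16_bytes": len(ch.encode("utf-16-le")),     #    the same code point
        "utf32_bytes": len(ch.encode("utf-32-le")),
    }

# 2) a standard algorithm: NFC normalisation composes "e" + combining
#    acute accent (U+0301) into the single code point U+00E9.
composed = unicodedata.normalize("NFC", "e\u0301")

if __name__ == "__main__":
    for ch in ("a", "\u00e9", "\u4e2d"):
        print(describe(ch))
    print(composed == "\u00e9")
```

The code point is the same whichever encoding is used; only the byte representation changes.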
You said that Phobos converts UTF-8 strings to UTF-32 before
operating on them, but that's not true. When it iterates over a
UTF-8 string it yields dchars (code points) rather than chars
(code units), decoding each one on the fly. No UTF-32 copy of the
string is ever built, so that's not in any way inefficient and I
don't really see the problem.
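
As a rough illustration of what decoding on the fly means, here is a Python sketch. It is not Phobos's actual implementation, and iter_code_points is a made-up name. It walks the UTF-8 bytes and yields one code point at a time without ever allocating a UTF-32 buffer. For brevity it assumes mostly well-formed input: it rejects bad lead bytes, bad continuation bytes and truncated sequences, but does not check for overlong forms or surrogates.

```python
def iter_code_points(data: bytes):
    """Lazily yield code points (as ints) from UTF-8 encoded bytes.

    Hypothetical helper for illustration; a simplified decoder, not a
    fully validating one.
    """
    i = 0
    n = len(data)
    while i < n:
        b0 = data[i]
        # The lead byte tells us how long the sequence is.
        if b0 < 0x80:
            cp, length = b0, 1
        elif b0 >> 5 == 0b110:
            cp, length = b0 & 0x1F, 2
        elif b0 >> 4 == 0b1110:
            cp, length = b0 & 0x0F, 3
        elif b0 >> 3 == 0b11110:
            cp, length = b0 & 0x07, 4
        else:
            raise ValueError(f"invalid lead byte at offset {i}")
        if i + length > n:
            raise ValueError(f"truncated sequence at offset {i}")
        # Each continuation byte contributes its low 6 bits.
        for b in data[i + 1:i + length]:
            if b >> 6 != 0b10:
                raise ValueError(f"invalid continuation byte near offset {i}")
            cp = (cp << 6) | (b & 0x3F)
        yield cp
        i += length

if __name__ == "__main__":
    for cp in iter_code_points("a\u00e9\u4e2d".encode("utf-8")):
        print(f"U+{cp:04X}")
```

The only state kept between steps is the current byte offset, which is why iterating this way needs no extra memory proportional to the length of the string.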
Also, your complaint that UTF-8 reserves the short encodings for
the English alphabet is not really relevant: the characters with
longer encodings tend to be rarer (such as special symbols) or
carry more information (such as Chinese characters, where the same
sentence takes only about a third as many characters).
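
A quick way to check this for any given text is to count code points and encoded bytes side by side. The Python sketch below does that; sizes is a made-up helper, and the two sample strings are just an illustrative greeting pair I picked, not a measurement, so the exact ratio will differ from text to text.

```python
def sizes(text):
    """Count code points and encoded bytes for a string.

    Hypothetical helper for illustration only.
    """
    return {
        "code_points": len(text),
        "utf8": len(text.encode("utf-8")),
        "utf16": len(text.encode("utf-16-le")),
        "utf32": len(text.encode("utf-32-le")),
    }

# Illustrative sample pair (an assumption, not data from the thread).
english = "Hello, world"
chinese = "\u4f60\u597d\uff0c\u4e16\u754c"  # a Chinese rendering of "Hello, world"

if __name__ == "__main__":
    print(sizes(english))
    print(sizes(chinese))
```

For this pair the Chinese string uses 3 bytes per character in UTF-8 but has far fewer characters, so the total sizes end up in the same ballpark.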