Selectable encodings

Anders F Björklund afb at algonet.se
Thu Apr 6 08:12:39 PDT 2006


James Dunne wrote:

> The char type is really a misnomer for dealing with UTF-8 encoded 
> strings.  It should be named closer to "code-unit for UTF-8 encoding". 

Yeah, but it does hold an *ASCII* character ?

Usually D code walks a char[] by decoding it into dchar values,
but with a "short path" for ASCII characters...
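
Roughly like this, as a minimal sketch: it assumes std.utf.decode
from Phobos, and countCodePoints is just a name made up for the
example, not an existing library function.

```d
import std.utf : decode;

// Hypothetical helper: count the Unicode characters (code points)
// stored in a UTF-8 string.
size_t countCodePoints(const(char)[] s)
{
    size_t count = 0;
    size_t i = 0;
    while (i < s.length)
    {
        if (s[i] < 0x80)
        {
            // Short path: an ASCII character is exactly one code unit.
            ++i;
        }
        else
        {
            // General path: decode the multi-byte sequence;
            // decode() advances i past the whole sequence.
            decode(s, i);
        }
        ++count;
    }
    return count;
}

unittest
{
    assert(countCodePoints("abc") == 3);
    // 'ö' takes two UTF-8 code units, so the array index and the
    // character index diverge after it.
    assert("Björklund".length == 10);
    assert(countCodePoints("Björklund") == 9);
}
```

So a char does hold an ASCII character as-is, and only the
non-ASCII ones pay for the decoding.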

> I could be wrong (and I bet I am) on the terminology used to describe
> char, but I really mean it to just store a full Unicode character
> such that strings of chars can safely assume character index == array
> index.

For the general case, UTF-32 is a pretty wasteful
Unicode encoding just to have that privilege ?

See http://www.unicode.org/faq/utf_bom.html#12
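
To put rough numbers on it, a small sketch assuming std.utf.toUTF32
from Phobos (utf32Bytes is a hypothetical helper for illustration):

```d
import std.utf : toUTF32;

// Hypothetical helper: bytes needed to store s as UTF-32 (dchar[]).
size_t utf32Bytes(const(char)[] s)
{
    return toUTF32(s).length * dchar.sizeof;
}

unittest
{
    string name = "Björklund";

    // UTF-8: 8 ASCII characters at 1 byte each, plus 2 bytes for 'ö'.
    assert(name.length == 10);

    // UTF-32: 9 characters at 4 bytes each.
    assert(utf32Bytes(name) == 36);

    // What the extra space buys: character index == array index.
    dstring wide = toUTF32(name);
    assert(wide[2] == 'ö');
}
```

For mostly-ASCII text that is close to four times the storage,
in exchange for the simple indexing.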

--anders


