De-Referencing A Pointer

xs0 xs0 at xs0.com
Tue Mar 21 15:56:18 PST 2006


James Dunne wrote:
> Correct me on this if I am wrong:
> 
> UNICODE is *not* an _encoding_ standard; it is a standard mapping of 
> character glyphs to integer values and specifies no requirements for 
> storage or encoding.

Well, since you asked - it's not glyphs, but characters :)

http://en.wikipedia.org/wiki/Glyph

> The encoding to which you (and many others) refer to by the name of 
> UNICODE is in fact UCS-2, I believe.  This is the encoding where the 
> Basic Multilingual Plane (BMP) of the Unicode table maps directly onto 
> 65536 values.

And here it's more complicated. UCS-2 is exactly what you say, but AFAIK 
D uses UTF-16 and so does Windows:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/unicode_9i79.asp

They're almost the same, except that UTF-16 uses surrogate pairs (two 
16-bit code units) to encode characters outside the BMP, while UCS-2 
can't represent those characters at all. I think UCS-2 is generally 
considered deprecated for exactly this reason.
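
To make the difference concrete, here is a rough sketch of how UTF-16 turns a code point into one or two 16-bit code units. It's written in Python only for illustration, and the function name utf16_code_units is made up for this example, not part of any library. For anything inside the BMP the result is the same single value UCS-2 would store; above U+FFFF you get a surrogate pair.

```python
def utf16_code_units(code_point: int) -> list:
    """Return the UTF-16 code units for one Unicode code point.

    Hypothetical helper for illustration only.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate values are not characters themselves")

    # BMP: one 16-bit unit, identical to what UCS-2 would store.
    if code_point < 0x10000:
        return [code_point]

    # Other planes: split the 20-bit offset into two 10-bit halves.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # low (trail) surrogate
    return [high, low]


# U+20AC (euro sign) is in the BMP: one code unit.
print([hex(u) for u in utf16_code_units(0x20AC)])
# U+1D11E (musical G clef) is outside the BMP: a surrogate pair.
print([hex(u) for u in utf16_code_units(0x1D11E)])
```

A strict UCS-2 implementation would have no way to store the second character, which is the whole difference between the two.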


xs0



More information about the Digitalmars-d-learn mailing list