To Walter, about char[] initialization by FF

Wed Aug 2 00:46:17 PDT 2006

On Wed, 02 Aug 2006 00:11:26 -0700, Walter Bright wrote:

> Derek Parnell wrote:
>> On Tue, 1 Aug 2006 19:57:08 -0700, Andrew Fedoniouk wrote:
>> 
>>> (Hope this long dialog will help all of us to better understand what UNICODE 
>>> is)
>>>
>>> "Walter Bright" <newshound at digitalmars.com> wrote in message 
>>> news:eao5st$2r1f$1 at digitaldaemon.com...
>>>> Andrew Fedoniouk wrote:
>>>>> Compiler accepts input stream as either BMP codes or full unicode set
>>>>> encoded using UTF-16.
>>>>
>>>> BMP is a subset of UTF-16.
>>> Walter with deepest respect but it is not. Two different things.
>>>
>>> UTF-16 is a variable-length encoding - byte stream.
>>> Unicode BMP is a range of numbers strictly speaking.
>> 
>> Andrew is correct. In UTF-16, characters are variable length, from 2 to 4
>> bytes long. In UTF-8, characters are from 1 to 4 bytes long (this used to
>> be up to 6 but that has changed). UCS-2 is a subset of Unicode characters
>> that are all represented by 2-byte integers. Windows NT had implemented
>> UCS-2 but not UTF-16, but Windows 2000 and above support UTF-16 now.
> 
> If UCS-2 is not a subset of UTF-16, what UCS-2 sequences are not valid 
> UTF-16?

Huh??? I said "UCS-2 is a subset of Unicode characters". Did you miss that?
UTF-16 is not a subset of Unicode, as it can be used to encode every Unicode
code point. UCS-2 is a subset, as it can *not* encode code points that are
outside of the "basic multilingual plane" (aka BMP).
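
To make the distinction concrete, here is a rough sketch in Python (not D, and
not anything from the compiler). The helper names utf16_units and fits_ucs2 are
made up for illustration. It assumes the usual surrogate-pair arithmetic: a code
point in the BMP becomes one 16-bit unit, anything above U+FFFF becomes two.

```python
def utf16_units(cp):
    """Return the list of 16-bit UTF-16 code units for one code point.

    Hypothetical helper for illustration only.
    """
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value: %#x" % cp)
    if cp <= 0xFFFF:
        # Inside the BMP: a single 16-bit unit, same value as the code point.
        return [cp]
    # Outside the BMP: split the 20-bit offset into a surrogate pair.
    offset = cp - 0x10000
    return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]


def fits_ucs2(cp):
    """True if the code point can be stored as one 2-byte UCS-2 value."""
    return 0 <= cp <= 0xFFFF and not 0xD800 <= cp <= 0xDFFF


if __name__ == "__main__":
    for cp in (0x0041, 0x20AC, 0x1D11E):
        units = utf16_units(cp)
        print("U+%04X ucs2=%s utf16=%s" % (
            cp, fits_ucs2(cp), " ".join("%04X" % u for u in units)))
```

For U+0041 and U+20AC the UTF-16 form is one unit equal to the UCS-2 value; for
U+1D11E there is no UCS-2 form at all, and UTF-16 needs two units (4 bytes).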

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Down with mediocrity!"
2/08/2006 5:43:18 PM