To Walter, about char[] initialization by FF

Wed Aug 2 01:16:04 PDT 2006

Derek Parnell wrote:
> On Wed, 02 Aug 2006 00:11:26 -0700, Walter Bright wrote:
>> Derek Parnell wrote:
>>> On Tue, 1 Aug 2006 19:57:08 -0700, Andrew Fedoniouk wrote:
>>>> "Walter Bright" <newshound at digitalmars.com> wrote in message 
>>>> news:eao5st$2r1f$1 at digitaldaemon.com...
>>>>> Andrew Fedoniouk wrote:
>>>>>> Compiler accepts input stream as either BMP codes or full unicode set
>>>>>> encoded using UTF-16.
>>>>>
>>>>> BMP is a subset of UTF-16.
>>>> Walter with deepest respect but it is not. Two different things.
>>>>
>>>> UTF-16 is a variable-length encoding - byte stream.
>>>> Unicode BMP is a range of numbers strictly speaking.
>>> Andrew is correct. In UTF-16, characters are variable length, from 2 to 4
>>> bytes long. In UTF-8, characters are from 1 to 4 bytes long (this used to
>>> be up to 6 but that has changed). UCS-2 is a subset of Unicode characters
>>> that are all represented by 2-byte integers. Windows NT had implemented
>>> UCS-2 but not UTF-16, but Windows 2000 and above support UTF-16 now.
>> If UCS-2 is not a subset of UTF-16, what UCS-2 sequences are not valid 
>> UTF-16?
> 
> Huh??? I said "UCS-2 is a subset of Unicode characters" Did you miss that?

I saw it, but that statement is not the same as "UCS-2 is a subset of 
UTF-16". The issue I was talking about is "BMP [UCS-2] is a subset of 
UTF-16", to which Andrew keeps replying "it is not". You said "Andrew is 
correct", so I inferred you were agreeing that UCS-2 is not a subset of 
UTF-16.

> UTF-16 is not a subset as it can be used to encode every Unicode code
> point. UCS-2 is a subset as it can *not* encode code points that are
> outside of the "basic multilingual plane" (aka BMP). 

I think you and I are in agreement.
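
For anyone who wants to check the subset claim by hand, here is a rough
sketch in Python. It is only an illustration: the helper ucs2_le is a
made-up name, and it assumes UCS-2 means "one little-endian 16-bit unit per
BMP code point, surrogates excluded". Under that assumption, every UCS-2
sequence is byte-for-byte the same as its UTF-16 encoding, while a code
point outside the BMP has no UCS-2 form and takes two 16-bit units in
UTF-16.

```python
def ucs2_le(text):
    """Hypothetical helper: encode text as little-endian UCS-2.

    Assumes UCS-2 is one 16-bit unit per code point, BMP only.
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            # Outside the basic multilingual plane: no UCS-2 representation.
            raise ValueError("U+%04X is outside the BMP" % cp)
        if 0xD800 <= cp <= 0xDFFF:
            # Surrogate values are reserved and are not characters.
            raise ValueError("U+%04X is a surrogate" % cp)
        out += cp.to_bytes(2, "little")
    return bytes(out)


# BMP-only text: the UCS-2 bytes and the UTF-16 bytes are identical.
bmp_text = "D \u00e9 \u4e2d"
print(ucs2_le(bmp_text) == bmp_text.encode("utf-16-le"))  # True

# U+1D11E (musical G clef) is outside the BMP: UTF-16 needs a surrogate
# pair (D834 DD1E), and UCS-2 cannot encode it at all.
clef = "\U0001D11E"
print(clef.encode("utf-16-le").hex())  # 34d81edd
```

So the answer to "what UCS-2 sequences are not valid UTF-16?" comes out as
"none" under these assumptions; the reverse does not hold, since UTF-16 can
contain surrogate pairs that UCS-2 has no way to express.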