To Walter, about char[] initialization by FF

Walter Bright newshound at digitalmars.com
Wed Aug 2 00:11:26 PDT 2006


Derek Parnell wrote:
> On Tue, 1 Aug 2006 19:57:08 -0700, Andrew Fedoniouk wrote:
> 
>> (Hope this long dialog will help all of us to better understand what UNICODE 
>> is)
>>
>> "Walter Bright" <newshound at digitalmars.com> wrote in message 
>> news:eao5st$2r1f$1 at digitaldaemon.com...
>>> Andrew Fedoniouk wrote:
>>>> Compiler accepts input stream as either BMP codes or full unicode set
>>>> encoded using UTF-16.
>>>
>>> BMP is a subset of UTF-16.
>> Walter, with deepest respect, it is not. They are two different things.
>>
>> UTF-16 is a variable-length encoding - byte stream.
>> Unicode BMP is a range of numbers strictly speaking.
> 
> Andrew is correct. In UTF-16, characters are variable length, from 2 to 4
> bytes long. In UTF-8, characters are from 1 to 4 bytes long (this used to
> be up to 6 but that has changed). UCS-2 is a subset of Unicode characters
> that are all represented by 2-byte integers. Windows NT had implemented
> UCS-2 but not UTF-16, but Windows 2000 and above support UTF-16 now.

If UCS-2 is not a subset of UTF-16, what UCS-2 sequences are not valid 
UTF-16?
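
One way to probe that question is to compare the two encodings directly. The Python sketch below is only an illustration, not anything from the D compiler: the helper names (ucs2_encode, utf16_len, utf8_len, bmp_matches_utf16, lone_surrogate_rejected) are made up for this example, and "UCS-2" is assumed here to mean one big-endian 16-bit unit per BMP code point. It checks the byte lengths quoted above and whether every non-surrogate BMP code point comes out byte-for-byte identical in both encodings. The surrogate range D800-DFFF is skipped in that comparison because those values are reserved rather than assigned to characters.

```python
import struct

def ucs2_encode(cp):
    """Encode one code point as a single big-endian 16-bit unit.
    Assumption: this is what 'UCS-2' means for the purposes of this sketch."""
    if not 0 <= cp <= 0xFFFF:
        raise ValueError("outside the BMP, not representable in UCS-2")
    return struct.pack(">H", cp)

def utf16_len(cp):
    """Number of bytes the code point occupies in UTF-16."""
    return len(chr(cp).encode("utf-16-be"))

def utf8_len(cp):
    """Number of bytes the code point occupies in UTF-8."""
    return len(chr(cp).encode("utf-8"))

def bmp_matches_utf16():
    """True if every non-surrogate BMP code point has identical
    bytes under the UCS-2 sketch above and under UTF-16."""
    for cp in range(0x10000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates are reserved, not characters
        if ucs2_encode(cp) != chr(cp).encode("utf-16-be"):
            return False
    return True

def lone_surrogate_rejected():
    """True if Python's UTF-16 encoder refuses an unpaired surrogate."""
    try:
        chr(0xD800).encode("utf-16-be")
    except UnicodeEncodeError:
        return True
    return False

if __name__ == "__main__":
    for cp in (0x41, 0x20AC, 0x1D11E):
        print(hex(cp), "UTF-8 bytes:", utf8_len(cp), "UTF-16 bytes:", utf16_len(cp))
    print("BMP (minus surrogates) identical in both:", bmp_matches_utf16())
    print("lone surrogate rejected by UTF-16 encoder:", lone_surrogate_rejected())
```

If the sketch's assumptions hold, the only 16-bit values where the two could disagree are unpaired surrogates, so the answer depends on whether those count as valid UCS-2 in the first place.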