To Walter, about char[] initialization by FF

Andrew Fedoniouk news at terrainformatica.com
Sat Jul 29 14:35:39 PDT 2006


"Carlos Santander" <csantander619 at gmail.com> wrote in message 
news:eagiip$1lad$3 at digitaldaemon.com...
> Andrew Fedoniouk wrote:
>> 2) For char[], choosing 0xFF is wrong, and even worse.
>> For example, the character with code 0xFF in the Latin-1 encoding is
>> "y with diaeresis". In many European and Far East encodings, 0xFF is
>> a valid code point.
>> For example, in the KOI-8 encoding 0xFF is an officially assigned value.
>>
>
> But D's chars are UTF-8, not Latin-1 nor any other, so I don't think this 
> applies.
>

UTF-8 is a multibyte transport encoding of the full 21-bit Unicode
code point range. Strictly speaking, a single byte in a UTF-8 sequence
cannot be called a char[acter].

char as a type name implies that a value of that type contains a complete
code point (assuming that information about the code page is stored
somewhere or is known at the point of use).

I mean that a "UTF-8 character" (if that makes any sense at all) as a type
is always char[] and not a single char.
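
To make the point concrete, here is a minimal sketch. It uses Python
rather than D, purely as an illustration: Python's bytes object stands in
for D's char[], and the variable names are my own. It shows that the
single code point U+00FF needs two UTF-8 code units, so no single byte of
the sequence is a character on its own.

```python
# Illustration only: a Python bytes object stands in for D's char[].
text = "\u00ff"  # U+00FF, LATIN SMALL LETTER Y WITH DIAERESIS

# Encode the one code point as UTF-8 code units (bytes).
utf8_units = text.encode("utf-8")

code_point = ord(text)                      # numeric value of the code point
unit_count = len(utf8_units)                # number of bytes in the sequence
unit_values = [hex(b) for b in utf8_units]  # each byte, shown in hex

print(hex(code_point), unit_count, unit_values)
```

Neither of the two bytes is 0xFF, and neither means anything by itself;
only the pair decodes back to the character.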

0xFF as the char initialization value implies that D's char is not
supposed to handle single-byte character encodings at all. Was this the
original intention?
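
As a rough sketch of the difference (again in Python, not D, and only as
an illustration), the byte 0xFF can be decoded under three encodings. The
helper decode_or_none is a hypothetical name of mine, and I am assuming
the KOI8-R variant of KOI-8 here. The byte is a valid character in the two
single-byte encodings but can never appear in well-formed UTF-8.

```python
def decode_or_none(raw: bytes, encoding: str):
    """Hypothetical helper: return the decoded text, or None if invalid."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None

byte_ff = b"\xff"

# Single-byte encodings: 0xFF is an assigned character.
as_latin1 = decode_or_none(byte_ff, "latin-1")  # y with diaeresis
as_koi8r = decode_or_none(byte_ff, "koi8_r")    # assumption: KOI8-R variant

# UTF-8: 0xFF is not a legal byte anywhere in a sequence.
as_utf8 = decode_or_none(byte_ff, "utf-8")

print(repr(as_latin1), repr(as_koi8r), repr(as_utf8))
```

So 0xFF works as an "invalid" marker only if char is taken to be a UTF-8
code unit; for a single-byte encoding it collides with a real character.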

Andrew Fedoniouk.
http://terrainformatica.com













