To Walter, about char[] initialization by FF

Andrew Fedoniouk news at terrainformatica.com
Sat Jul 29 14:56:49 PDT 2006


"Walter Bright" <newshound at digitalmars.com> wrote in message 
news:eagk1o$1mph$1 at digitaldaemon.com...
> Andrew Fedoniouk wrote:
>> The following assumption ( 
>> http://www.digitalmars.com/d/archives/digitalmars/D/3239.html):
>>
>> "codepoint U+FFFF is not a legitimate Unicode character, and,
>> furthermore, it is guaranteed by the Unicode Consortium that 0xFFFF
>> will NEVER be a legitimate Unicode character. This codepoint will
>> remain forever unassigned, precisely so that it may be used for
>> purposes such as this."
>>
>> is just wrong.
>>
>> 1) 0xFFFF is a valid Unicode character: it is one of the "Specials" from
>> the R-zone {U+FFF0..U+FFFF}, a region that is already assigned.
>
> "the value FFFF is guaranteed not to be a Unicode character at all"
> http://www.unicode.org/charts/PDF/UFFF0.pdf
>
>
>> 2) For char[], the choice of 0xFF is wrong too, and even worse.
>> For example, the character with code 0xFF in the Latin-1 encoding is
>> "y diaeresis". In many European and Far East encodings 0xFF is
>> a valid code point.
>> For example, in the KOI-8 encoding 0xFF is an officially assigned value.
>
> char[] is not Unicode, it is UTF-8. For UTF-8, 0xFF is not a valid value. 
> The Unicode code point U+00FF is not encoded in UTF-8 as FF.
>
> "The octet values C0, C1, F5 to FF never appear." 
> http://www.ietf.org/rfc/rfc3629.txt
>
>
>> What is the point of the current initialization?
>
> The point is to initialize it with an invalid value, in order to flush out 
> uninitialized data errors.
>
>> If you are already doing initialization,
>> and this initialization is part of the specification, why not use the
>> official "Nul" value in this case?
>
> Because 0 is a valid UTF-8 character.

1) What exactly does "UTF-8 character" mean?
2) In ASCII, char(0) is officially NUL. Why not initialize strings
with NUL?
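
To make question 1 concrete, here is a minimal sketch in Python (chosen
only for illustration; it is not D, and the helper name is_valid_utf8 is
made up for this example). It asks a strict UTF-8 decoder whether a lone
byte is well-formed. Under RFC 3629 the byte 00 is the complete encoding
of U+0000, while FF cannot appear anywhere in UTF-8.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` is a well-formed UTF-8 byte sequence."""
    try:
        data.decode("utf-8")  # strict decoding: raises on any ill-formed byte
        return True
    except UnicodeDecodeError:
        return False


# A single zero byte is the UTF-8 encoding of U+0000 (NUL).
nul_ok = is_valid_utf8(b"\x00")

# A single FF byte is not a valid UTF-8 sequence.
ff_ok = is_valid_utf8(b"\xff")

# U+00FF ("y diaeresis") is encoded in UTF-8 as two bytes, not as FF.
y_diaeresis = "\u00ff".encode("utf-8")

print(nul_ok, ff_ok, y_diaeresis)
```

Whether that makes FF a better default than NUL is exactly the question
under discussion here.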

>
>
>> You do the same for floats: you use NaN there
>> (the null value for floats). Why not do the same for chars?
>
> The FF initialization does correspond (as close as we can get) with NaN 
> for floats. 0 can masquerade as legitimate data, FF cannot.

I don't get it, sorry. In the KOI8-R (Russian) encoding, 0xFF is the letter 'Ъ'.
Are you saying that I cannot use char[] to represent Russian text in D?
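
For reference, a small Python sketch (again only an illustration, not D
code; all variable names are hypothetical) of how the single byte 0xFF
reads under the encodings mentioned in this thread, and what the same
letter looks like once transcoded to UTF-8. It assumes Python's standard
latin-1 and koi8_r codecs.

```python
raw = b"\xff"

# The same byte means different letters in different single-byte encodings.
as_latin1 = raw.decode("latin-1")  # U+00FF, Latin small letter y with diaeresis
as_koi8r = raw.decode("koi8_r")    # U+042A, Cyrillic capital letter hard sign

# Transcoding the KOI8-R letter to UTF-8 yields two bytes, neither of them FF.
utf8_bytes = as_koi8r.encode("utf-8")

# A short Cyrillic sample (capital and small hard sign) encoded as UTF-8.
sample = "\u042a\u044a"
contains_ff = 0xFF in sample.encode("utf-8")

print(as_latin1, as_koi8r, utf8_bytes, contains_ff)
```

In other words, the byte FF carries text only in the legacy single-byte
encodings; the same letters stored as UTF-8 use other byte values.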

Andrew Fedoniouk.
http://terrainformatica.com




