To Walter, about char[] initialization by FF

Walter Bright newshound at digitalmars.com
Sat Jul 29 14:32:30 PDT 2006


Andrew Fedoniouk wrote:
> The following assumption
> (http://www.digitalmars.com/d/archives/digitalmars/D/3239.html):
> 
> "codepoint U+FFFF is not a legitimate Unicode character, and, furthermore, 
> it is guaranteed by the Unicode Consortium that 0xFFFF will NEVER be a
> legitimate Unicode character. This codepoint will remain forever
> unassigned, precisely so that it may be used for purposes such as this."
> 
> is just wrong.
> 
> 1) 0xFFFF is a valid UNICODE character - it is one of the "Specials" from
> R-zone: {U+FFF0..U+FFFF} - region assigned already.

"the value FFFF is guaranteed not to be a Unicode character at all"
http://www.unicode.org/charts/PDF/UFFF0.pdf


> 2) For char[], the choice of 0xFF is even worse.
> For example, the character with code 0xFF in the Latin-1 encoding is
> "y diaeresis". In many European and Far East encodings, 0xFF is a 
> valid code point.
> For example, in the KOI-8 encoding 0xFF is an officially assigned value.

char[] does not hold Unicode code points; it holds UTF-8 code units. In 
UTF-8, 0xFF is not a valid value. The code point U+00FF is not encoded 
in UTF-8 as the single byte FF; it becomes the two bytes C3 BF.

"The octet values C0, C1, F5 to FF never appear." 
http://www.ietf.org/rfc/rfc3629.txt
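
The point can be checked with a short sketch. Python is used here only as a convenient way to inspect bytes; it is not part of the original discussion, and the helper names `utf8_octets` and `is_valid_utf8` are made up for illustration.

```python
# Sketch: how U+00FF is represented in UTF-8, and why a bare FF octet is invalid.

def utf8_octets(ch):
    """Return the UTF-8 encoding of a string as a list of integer octets."""
    return list(ch.encode("utf-8"))

def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# "y diaeresis" (U+00FF) takes two octets in UTF-8, neither of which is FF.
y_diaeresis = utf8_octets("\u00ff")

# A lone FF octet, as it would appear in Latin-1 text, is rejected by the decoder.
lone_ff_is_valid = is_valid_utf8(b"\xff")

print([hex(b) for b in y_diaeresis], lone_ff_is_valid)
```

So a Latin-1 or KOI-8 byte of FF is a different thing from an FF byte in a UTF-8 sequence: the former is a character in that encoding, the latter can never occur in well-formed UTF-8.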


> What is the point of the current initialization?

The point is to initialize it with an invalid value, in order to flush 
out uninitialized data errors.

> If you are doing initialization already,
> and this initialization is part of the specification, why not use the
> official "Nul" value in this case?

Because 0 is a valid UTF-8 value: it encodes the character U+0000.


> You are doing the same for floats - you are using NaNs there
> (the Null value for floats). Why not use the same for chars?

The FF initialization corresponds (as closely as we can get) to NaN 
for floats. 0 can masquerade as legitimate data; FF cannot.
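
A minimal sketch of that difference, again in Python purely for illustration (the buffers and the helper `decodes_as_utf8` are hypothetical stand-ins, not D's actual runtime behavior): a zero-filled buffer decodes without complaint, an FF-filled one is rejected, and NaN likewise fails the most basic sanity check of comparing equal to itself.

```python
import math

def decodes_as_utf8(data):
    """Return True if the byte string is well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# Stand-ins for a 4-element char buffer that was never assigned real data.
zero_filled = bytes([0x00]) * 4   # what a 0 default would leave behind
ff_filled = bytes([0xFF]) * 4     # what an FF default leaves behind

zero_passes = decodes_as_utf8(zero_filled)   # zeros look like four U+0000 characters
ff_passes = decodes_as_utf8(ff_filled)       # FF octets are flagged as invalid

# The float analogue: NaN is not equal to anything, including itself.
nan = float("nan")
nan_equals_itself = (nan == nan)

print(zero_passes, ff_passes, nan_equals_itself, math.isnan(nan))
```

The zero-filled buffer passes validation and would flow through a program unnoticed, which is exactly the masquerading described above.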


