To Walter, about char[] initialization by FF
Walter Bright
newshound at digitalmars.com
Sat Jul 29 14:32:30 PDT 2006
Andrew Fedoniouk wrote:
> Following assumption (
> http://www.digitalmars.com/d/archives/digitalmars/D/3239.html):
>
> "codepoint U+FFFF is not a legitimate Unicode character, and, furthermore,
> it is guaranteed by the Unicode Consortium that 0xFFFF will NEVER be a
> legitimate Unicode character. This codepoint will remain forever unassigned,
> precisely so that it may be used for purposes such as this."
>
> is just wrong.
>
> 1) 0xFFFF is a valid UNICODE character - it is one of the "Specials" from
> R-zone: {U+FFF0..U+FFFF} - region assigned already.
"the value FFFF is guaranteed not to be a Unicode character at all"
http://www.unicode.org/charts/PDF/UFFF0.pdf
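
The distinction here is between a code point that lies inside an allocated block and one that is an actual character. As a rough sketch in Python (not D, and not anything taken from the compiler), the Unicode noncharacters can be tested for directly: they are U+FDD0..U+FDEF plus the last two code points of every plane. The helper name `is_noncharacter` is made up for this illustration.

```python
def is_noncharacter(code_point: int) -> bool:
    """Return True if code_point is one of the Unicode noncharacters.

    Hypothetical helper for illustration: noncharacters are U+FDD0..U+FDEF
    and every code point whose low 16 bits are FFFE or FFFF.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    in_arabic_block_range = 0xFDD0 <= code_point <= 0xFDEF
    last_two_of_plane = (code_point & 0xFFFE) == 0xFFFE
    return in_arabic_block_range or last_two_of_plane


print(is_noncharacter(0xFFFF))  # True
print(is_noncharacter(0xFFFD))  # False (REPLACEMENT CHARACTER is a real character)
```

So U+FFFF sits inside the Specials block but is still not a character, which is what makes it usable as a sentinel.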
> 2) For char[] selection of 0xFF is wrong and even worse.
> For example character with code 0xFF in Latin-I encoding is
> "y diaeresis". In many European languages and Far East encodings 0xFF is a
> valid code point.
> For example in KOI-8 encoding 0xFF is officially assigned value.
char[] does not hold Unicode code points, it holds UTF-8 code units. For
UTF-8, 0xFF is not a valid value. The Unicode code point U+00FF is not
encoded into UTF-8 as the byte FF; it becomes the two bytes C3 BF.
"The octet values C0, C1, F5 to FF never appear."
http://www.ietf.org/rfc/rfc3629.txt
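
This is easy to check by hand. The following is a minimal sketch in Python (used here only because its standard library has a strict UTF-8 codec; it says nothing about how D implements this). The helper names `utf8_bytes` and `is_valid_utf8` are invented for the example.

```python
def utf8_bytes(code_point: int) -> bytes:
    """Encode a single Unicode code point as UTF-8 (illustrative helper)."""
    return chr(code_point).encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly under Python's strict UTF-8 codec."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# U+00FF ("y diaeresis") takes two bytes in UTF-8, neither of which is FF.
print(utf8_bytes(0x00FF).hex())   # c3bf

# A lone FF byte is rejected, while a zero byte is accepted as U+0000.
print(is_valid_utf8(b"\xff"))     # False
print(is_valid_utf8(b"\x00"))     # True
```

Latin-1 and KOI-8 do assign a character to the byte FF, but those are different encodings from the one char[] uses.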
> What is the point of current initialization?
The point is to initialize it with an invalid value, in order to flush
out uninitialized data errors.
> If you are doing initialization already
> and this initialization is a part of specification so why not to use
> official "Nul" values in this case?
Because 0 is a valid UTF-8 character.
> You are doing the same for floats - you are using NaNs there
> (Null value for floats). Why not to use the same for chars?
The FF initialization does correspond (as close as we can get) with NaN
for floats. 0 can masquerade as legitimate data, FF cannot.
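
To make the analogy concrete, here is a small sketch, again in Python rather than D and under the assumption that a "forgotten" buffer simply keeps its fill value. It compares a buffer filled with FF against one filled with 0, and shows the corresponding NaN behaviour for floats. `decodes_as_utf8` is a hypothetical helper.

```python
import math


def decodes_as_utf8(buf: bytes) -> bool:
    """Return True if buf is accepted by a strict UTF-8 decoder."""
    try:
        buf.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Two buffers the programmer forgot to fill in, differing only in fill value.
forgotten_ff = bytes([0xFF] * 4)   # FF fill, as described in this message
forgotten_00 = bytes(4)            # zero fill, the proposed alternative

print(decodes_as_utf8(forgotten_ff))  # False: the mistake is caught
print(decodes_as_utf8(forgotten_00))  # True: four U+0000 characters, no complaint

# Floats behave the same way: NaN poisons the result, 0.0 blends in.
print(math.isnan(1.5 + math.nan))     # True
print(1.5 + 0.0)                      # 1.5
```

The zero-filled buffer passes validation as perfectly good text, so the bug travels further before anyone notices it.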