Shouldn't bool be initialized to 0xFF ?

Wed Aug 16 16:16:19 PDT 2006

Lionello Lunesu wrote:
> I've been following those "why's char init'ed to -1?" / "why's float 
> init'ed to NaN?" thread, and I'd have to agree with Walter: a crazy 
> initialization sure makes it obvious where the problem lies.
> 
> So: why isn't "bool" initialized to 0xFF too? In dmd v0.164, bool.init 
> is 0, which is a valid value for bool. For byte/int/long I get it, since 
> there is no invalid value for byte/int/long. But for bool there is, so 
> the same reasoning as char/float applies.
> 
> We could even name 0xFF "Not A Bool" ;)
> 
> L.

I understand why this idea might seem appealing, but I am fairly certain it is 
ultimately a bad one.

The two examples you gave are different from bool.

In the case of the reals, so much care is taken to ensure NaNs propagate freely 
because (unlike with any other basic type) almost any real-valued operation can, 
even with valid arguments, produce a result that is not representable as a real 
number. Using NaN as a default initializer is merely an elegant side effect. 
NaNs require special treatment, however. In particular note:

   if( NaN <= 0 ) // always fails
   if( NaN >= 0 ) // also always fails
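
For anyone who wants to try this, here is a minimal sketch in C rather than D 
(my assumption being that both follow the same IEEE 754 comparison rules). The 
function names are made up for illustration.

```c
#include <math.h>
#include <stdbool.h>

/* Every ordered comparison involving NaN is false, so neither side of a
   "<= 0" / ">= 0" pair succeeds when x is NaN. */
bool le_zero(double x) { return x <= 0.0; }
bool ge_zero(double x) { return x >= 0.0; }

/* NaN is the only value that compares unequal to itself. */
bool differs_from_itself(double x) { return x != x; }

/* NaN propagates through arithmetic: a NaN operand gives a NaN result. */
double add_one(double x) { return x + 1.0; }
```

Calling le_zero(NAN) and ge_zero(NAN) should both give false, which is exactly 
why an uninitialized real tends to make itself noticed.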

Obviously the elegance of NaN as an initializer prompted the realization that 
0xFF is never a legal byte in a UTF-8 encoded string. Because any invalid UTF-8 
encoded string should be caught when it is used, it is reasonable to assume an 
invalid encoding would raise an Exception somewhere. There is a subtle 
distinction between NaN and 0xFF, though. 0xFF by itself does not represent an 
entire invalid sequence. A valid sequence can use up to 4 bytes to encode a 
single Unicode code point. Other values in various positions can also invalidate 
a sequence. 0xC0, 0xC1 and 0xF5 through 0xFF are all invalid in any position. In 
contrast to the semantics of NaN, in particular note:

   if(0xF5)       // always succeeds
   if(0xFF)       // always succeeds
   if(0xFF >= 0)  // always succeeds
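
A rough sketch of the same point, again in C as a stand-in for D. The helper 
names are hypothetical, and the byte ranges are my reading of RFC 3629, not 
anything taken from the D library.

```c
#include <stdbool.h>

/* Bytes that cannot appear anywhere in well-formed UTF-8: 0xC0 and 0xC1
   could only begin overlong encodings, and 0xF5..0xFF would either encode
   values past U+10FFFF or are not lead bytes at all. */
bool never_valid_in_utf8(unsigned char b)
{
    return b == 0xC0 || b == 0xC1 || b >= 0xF5;
}

/* Unlike NaN, such a value is an ordinary nonzero integer in a condition. */
bool passes_if(int v)      { return v ? true : false; }
bool passes_ge_zero(int v) { return v >= 0; }
```

So the byte is detectably wrong to a UTF-8 validator, but nothing about the 
value itself makes a plain comparison or condition fail.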

The biggest problem with your suggestion is that, under the existing semantics 
of conditions, just changing the initializer would not ordinarily reveal a 
problem when an uninitialized bool value is used:

   http://www.digitalmars.com/d/statement.html#if

   /Expression/ is evaluated and must have a type that can be converted to a
   boolean. If it's true the /ThenStatement/ is transferred to, else the
   /ElseStatement/ is transferred to.

So the 'undefined' bool value 0xFF will be converted just like a char or ubyte 
and will evaluate as true! Surely this behavior is even worse than having a 
default of false?
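
To make that concrete, here is a small C model of the proposal. It assumes a 
bool is stored in one byte; the RAW_* names and branch_taken are invented for 
this example and are not part of any compiler.

```c
/* Hypothetical: one storage byte per bool, with 0xFF standing in for the
   proposed "Not A Bool" initializer. */
enum { RAW_FALSE = 0x00, RAW_TRUE = 0x01, RAW_NOT_A_BOOL = 0xFF };

/* Mirrors the quoted rule for if: anything nonzero converts to true.
   Returns 1 when the then-branch runs, 0 when the else-branch runs. */
int branch_taken(unsigned char raw)
{
    if (raw)
        return 1;
    else
        return 0;
}
```

With RAW_NOT_A_BOOL the then-branch runs, silently, just as it would for a 
bool that really was true.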

One might imagine it should be possible to change the way if conditions work. 
But you would also have to change the equivalent for, foreach and while 
conditionals. The worst part is that, in the general case, conditional 
expressions are in fact testing relations on non-bool values -- so most of the 
time you would be checking whether to throw an Exception for bool's 'undefined' 
value in cases where that value cannot even occur.
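
Here is a sketch of what such a check might look like, in C and under the 
assumption that 0xFF is the 'undefined' pattern. Both functions are 
hypothetical; no compiler I know of inserts anything like cond_is_defined.

```c
#include <stdbool.h>

/* Hypothetical guard that would have to run before every branch: false when
   the raw value is the 0xFF sentinel, true when it is safe to branch on. */
bool cond_is_defined(unsigned char raw) { return raw != 0xFF; }

/* A relational test yields only 0 or 1, so its result can never be the
   sentinel and the guard above is wasted work on it. */
unsigned char less_than(int a, int b) { return (unsigned char)(a < b); }
```

For a condition like a < b the guard can never fire, yet it would be paid for 
on every loop iteration unless the compiler can prove it away.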

A further problem I see comes from my (possibly wrong?) belief that bit[] has 
become bool[]. If bool[32] occupies 4 bytes, as I believe, then there is no room 
for bool's 'undefined' value. So one must ask: what should bool's default be in 
this case?
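
If the packed layout I am assuming is right, it would behave something like 
this C sketch. PackedBools32, packed_get and packed_set are names I made up to 
illustrate the idea; this is not how dmd is known to implement bool[].

```c
#include <stdbool.h>
#include <stdint.h>

/* 32 flags packed one per bit. Each slot has exactly two states, so there
   is no spare bit pattern left over to mean "undefined". */
typedef struct { uint32_t bits; } PackedBools32;

/* Read flag i (0..31). */
bool packed_get(const PackedBools32 *p, unsigned i)
{
    return (p->bits >> i) & 1u;
}

/* Write flag i (0..31) without disturbing the other 31 flags. */
void packed_set(PackedBools32 *p, unsigned i, bool v)
{
    if (v)
        p->bits |= (uint32_t)1 << i;
    else
        p->bits &= ~((uint32_t)1 << i);
}
```

Whatever initial pattern is chosen for the 4 bytes, every element reads back 
as either true or false.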