Fixing C's Biggest Mistake

Tue Feb 7 10:57:42 UTC 2023

On Monday, 2 January 2023 at 22:53:30 UTC, Walter Bright wrote:
> On 12/31/2022 2:28 AM, Max Samukha wrote:
>> For types that require runtime construction, initializing to 
>> T.init does not result in a constructed object.
> The idea is to:
>
> 1. have construction that cannot fail. This helps avoid things 
> like double-fault exceptions

Here, it would help if you clarified what you mean by 
_double-fault exceptions_, because when I tried to look it up I 
was led to tennis and CPU interrupts.
I have a rough idea of what you mean and could guess, but I'd 
rather just ask.

Construction that cannot fail sounds nice, and if an aggregate 
type constructor can pull it off, it should go for it. But this 
sounds a lot like `nothrow` and not something the language should 
impose. I know people dislike complicated rules because of their 
exceptions (to rules, not `Exception`s), but a possibility could 
be that nullary `struct` constructors must be `nothrow` or be 
annotated `throw`.

> 2. have initializers that can be placed in read only memory

I’m not saying that `init` isn’t a great idea. It’s just that 
`init` shouldn’t be used explicitly, but only as “the thing a 
constructor must act on to produce a valid object”. A “naked” 
`init` may be an object that violates its invariants.
An example would be a string optimized for short values (SSO). It 
has (at least) a `pointer` to data, a fixed-size internal 
`buffer`, and a `length` with the invariant: `pointer == 
&buffer[0]` if and only if `length <= buffer.length`. Since `init` 
is a fixed bit pattern, its `pointer` cannot refer to the object's 
own `buffer`, so an SSO’s `init` cannot possibly represent the 
empty string unless we allow `pointer` to be `null` to represent 
it. This means that an SSO has two representations for the empty 
string. Or we interpret the `null` data pointer as a `null` 
string. In any case, we get something we don’t want.
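
To make the invariant problem concrete, here is a minimal sketch. The names `SmallString` and `invariantHolds` and the buffer size of 15 are made up for illustration, and the check is written as an ordinary member function rather than an `invariant` block so that it can be called on a naked `init`:

```d
// Hypothetical SSO layout, for illustration only.
struct SmallString
{
    char* pointer;      // points at `buffer` or at heap data
    char[15] buffer;    // inline storage for short strings
    size_t length;

    // The invariant from the text: the string is "short" exactly
    // when `pointer` refers to the internal buffer.
    bool invariantHolds() const
    {
        return (pointer == &buffer[0]) == (length <= buffer.length);
    }
}

void main()
{
    SmallString s;                // s is SmallString.init
    assert(s.pointer is null);    // pointers default to null
    assert(s.length == 0);        // so length <= buffer.length holds...
    assert(!s.invariantHolds());  // ...but pointer is not &buffer[0]
}
```

The only ways out are the ones listed above: weaken the invariant to accept `null`, or never let a naked `init` escape.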

> 3. have something to set a destroyed object to, in case of 
> dangling references and other bugs

If a NaN state is available, use that. (I don’t think NaN states 
are bad; I actually think that every built-in type except `bool` 
should have one: signed integer types could use `T.min` and 
unsigned ones `T.max`.) Initializing those to 0 is bad because in 
a lot of contexts, 0 is a perfectly reasonable value, whereas 
`int.min` and `size_t.max` rarely are.

> 4. present to a constructor an already initialized object. This 
> prevents the common C++ problem of adding a field and 
> forgetting to construct it in one of the overloaded 
> constructors, a problem that has plagued me with erratic 
> behavior

The problem is, C++ does not complain about you forgetting that 
field. (For other people:) In C++, if a struct field is of 
built-in type (e.g. `int`) and you forget to initialize it, it 
has an indeterminate value. Fields of class type are 
default-constructed, and compilation fails if no nullary 
constructor exists. Now, even if there is a nullary constructor, 
it might not be what you want.

Requiring initialization of every field in every constructor is 
what C++ lacks.
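
For comparison, this is roughly what point 4 looks like on the D side. `Point` and its fields are invented for the example. The constructor forgets the later-added field `z`, yet `z` still has a defined value because the constructor starts from `Point.init`:

```d
struct Point
{
    int x;
    int y;
    int z = 7;   // field added later, with an explicit default

    this(int x, int y)
    {
        // z is not mentioned here, yet it is not garbage:
        // the constructor is handed an object that already equals Point.init.
        this.x = x;
        this.y = y;
    }
}

void main()
{
    auto p = Point(1, 2);
    assert(p.x == 1 && p.y == 2);
    assert(p.z == 7);   // taken from Point.init
}
```

The compiler does not point out the forgotten field here either. The difference is only that the result is deterministic.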

> 5. provide a NaN state. I know many people don't like NaN 
> states, but if one does, the default construction is perfect 
> for implementing one.

One question is the penalty for the NaN state. Floating-point NaN 
values are supported by hardware. If we declared `int.min` and 
`size_t.max` as their respective types’ NaN, we’d probably 
specify existing behavior. The issue is, making them sticky 
incurs costs.
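
A rough sketch of where that cost comes from, assuming `int.min` were the integer NaN. `stickyAdd` is a hypothetical helper, not anything D provides, and it ignores ordinary overflow:

```d
// Hypothetical helper: treats int.min as an "integer NaN" and propagates it.
int stickyAdd(int a, int b) pure nothrow @nogc @safe
{
    // This extra comparison on every operation is the price of stickiness.
    if (a == int.min || b == int.min)
        return int.min;
    return a + b;   // ordinary wrapping addition otherwise
}

void main()
{
    assert(stickyAdd(2, 3) == 5);
    assert(stickyAdd(int.min, 3) == int.min);               // propagates
    assert(stickyAdd(stickyAdd(int.min, 1), 1) == int.min); // stays sticky
    assert(int.min + 3 != int.min);                         // plain + does not
}
```

Without hardware support, every arithmetic operation would need a check of this kind.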

Floating-point NaN serves two purposes: indicating an invalid 
result and propagating the error through the program execution. 
Integer min/max values are used for the former already. People 
probably would not like them to do the latter.

Another issue of floating-point NaN values is their weird 
comparison behavior. I understand the argument that `x == y` 
should be false if `x` and `y` happen to be `NaN`, but `if (x == 
double.nan)` being silently always false feels broken.
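
A short demonstration of that pitfall, using `isNaN` from Phobos' `std.math` as the test that does work:

```d
import std.math : isNaN;

void main()
{
    double x = double.nan;

    assert(!(x == double.nan)); // always false, even though x is NaN
    assert(x != x);             // NaN compares unequal to itself
    assert(isNaN(x));           // the reliable test
}
```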

> 6. it fits in well with (future) sumtypes, where the default 
> initializer can be the error state.

I’m curious what comes out of this.

> An alternative to factory functions is to have a constructor 
> with a dummy argument. Nothing says one has to actually use the 
> parameters to a constructor.

Or we could just allow a nullary struct constructor. It should be 
backwards compatible (at least to a large degree) if D defines 
`this() @… {}` when `this()` is not defined explicitly (as 
`@disable`d or otherwise). An explicit `this()` should be 
`nothrow` if failure is a problem. With `throw` as an attribute, 
`this() throw { … }` is a kind of: “Sorry, Walter, I know you 
wanted the best for me, but for this type, it’s too wrong to be 
right.”

That way, a declaration like `T x;` will call a constructor that 
does nothing in almost all cases and, in almost all of the 
remaining cases, does something that cannot fail.
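
For reference, a sketch of what one can write today without a nullary struct constructor, combining `@disable this();` with the dummy-argument constructor from the quoted text and a factory function. `Resource`, `handle` and `create` are made-up names:

```d
struct Resource
{
    int handle = -1;        // Resource.init is the "not constructed" state

    @disable this();        // forbid `Resource r;` without explicit construction

    // Dummy-argument constructor: the parameter is deliberately unused.
    this(int unused) nothrow
    {
        handle = 0;
    }

    // Factory function wrapping the dummy-argument constructor.
    static Resource create() nothrow
    {
        return Resource(0);
    }
}

void main()
{
    auto r = Resource.create();
    assert(r.handle == 0);
    assert(Resource.init.handle == -1); // init is still there, unconstructed
    // Resource bad;  // would not compile: default construction is disabled
}
```

It works, but `T x;` is no longer available for such a type, which is exactly what a `nothrow` nullary constructor would give back.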

Sorry for the late answer.