@safe accessing of union members

Paul Backus snarwin at gmail.com
Thu Mar 18 20:59:32 UTC 2021


On Thursday, 18 March 2021 at 18:24:21 UTC, Q. Schroll wrote:
>
> Reading DIP 1035 that you co-authored, I figured my notion of a 
> "type that has invariants" could be helpful. In an example of 
> the DIP, there's a void initialization presented as a reason 
> why a type called ShortString is not memory safe. If you look 
> at my definition of "type with invariants", ShortString would 
> be considered a type with invariants because it has private 
> variables (and has no @disable invariant).

Your definition of "type with invariants" is a bad one--not 
necessarily from a soundness perspective, but from a 
good-language-design perspective. Rather than have the compiler 
try to guess the programmer's intent based on things like the use 
of `private` or the presence of padding bytes (!), and force the 
programmer to correct the compiler when it guesses wrong (with 
`@disable invariant`), it is both much simpler and much better UX 
to let the programmer state explicitly that a particular type 
needs to have its state protected from uncontrolled mutation in 
@safe code.

> @safe should error if a clueless programmer writes and uses it 
> and accidentally introduces UB. This includes writing @trusted 
> functions that are properly written. "If ShortString.length 
> could be marked as @system, this dilemma would not exist." 
> While true, it is not obvious why a clueless programmer would 
> mark `length` @system.

There is literally nothing that the language can possibly do to 
protect a clueless programmer from writing unsound @trusted code. 
The best we can hope for is that the presence of the @trusted 
attribute serves as a big, flashing "DANGER" sign to anyone 
auditing the code.

> I missed void initialization in my post, but interestingly, 
> void initialization of a type T object is @safe if and only if 
> in the `union { T obj; ubyte[T.sizeof] bytes; }` it is valid to 
> initialize `bytes` arbitrarily and use `obj`.

A less roundabout definition: void initialization of an object of 
type T is @safe if and only if every bit-pattern that fits into 
T's in-memory representation represents a safe value [1] of type 
T.
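
To make that concrete, here is a rough sketch using two built-in types. The function names are hypothetical and exist only for illustration. As far as I know, current compilers accept the first function as @safe and would reject the second if it were marked @safe:

```d
// Hypothetical helpers, named for illustration only.

// Every bit pattern of an int is a valid int, so the compiler
// accepts void initialization of an int in @safe code.
@safe int voidInitInt()
{
    int x = void; // no initialization is performed
    x = 42;       // assign before reading, to keep the example well-defined
    return x;
}

// Not every bit pattern of a pointer is a safe value, so the same
// kind of declaration is rejected in @safe code. It compiles here
// only because the function is @system.
@system int* voidInitPointer()
{
    int* p = void; // would be an error if this function were @safe
    p = null;      // null is a safe value
    return p;
}

unittest
{
    assert(voidInitInt() == 42);
    assert(voidInitPointer() is null);
}
```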

>> Not necessarily. The statement
>>
>>     int* p = cast(int*) 0xDEADBEEF;
>>
>> ...does not have undefined behavior.
>
> The spec says you're wrong, at least for structs and classes:
> "If the invariant does not hold, then the program enters an 
> invalid state."

I am not talking about structs and classes. I am talking about a 
specific statement, whose behavior is explicitly well-defined 
according to the language spec [2].
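
A minimal sketch of the distinction, with a hypothetical function name: creating the pointer and converting it back to an integer touches no memory, so nothing undefined happens. The trouble would only start if the pointer were dereferenced. I am assuming here that the integer round-trips unchanged, which is what I would expect on common platforms.

```d
// Hypothetical helper, named for illustration only.
@system size_t makeAndInspect()
{
    // Creating the pointer is well-defined: no memory is accessed.
    // The result is simply not a safe value, so this is @system code.
    int* p = cast(int*) 0xDEADBEEF;

    // Converting it back to an integer accesses no memory either.
    // Dereferencing p (*p) is deliberately not done here.
    return cast(size_t) p;
}

unittest
{
    // Assumes the address round-trips unchanged through the casts.
    assert(makeAndInspect() == 0xDEADBEEF);
}
```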

>> Unions themselves have no invariants,
>
> They can have them. SumType is an example.

Point taken. What I meant to say was that union *types*, 
themselves, have no invariants--which is true. Specific union 
*variables* may indeed have invariants imposed upon them by the 
programmer above and beyond what their types imply (although due 
to issue 20941 [3], @trusted code that relies on such invariants 
is currently unsound).

>> even when their members do, because access to those members is 
>> forbidden in @safe code,
>
> This is wrong. Accessing union members is sometimes considered 
> @safe currently although it clearly isn't. The compiler detects 
> pointer overlappings as @system, but doesn't for any other 
> invariants types have.

I agree that this is a bug. [4]

> The case with bool is an instance of the problem.
> In my opinion, void-initializing a bool should be @system.

Again, I agree.

[1] https://dlang.org/spec/function.html#safe-values
[2] https://dlang.org/spec/type.html#pointer-conversions
[3] https://issues.dlang.org/show_bug.cgi?id=20941
[4] https://issues.dlang.org/show_bug.cgi?id=21665

