@safe accessing of union members

Wed Mar 17 16:43:25 UTC 2021

There was a discussion lately about overlapping pointers in 
unions, @safe and SumType.

The obvious result: it is @safe to overlap some types with others
and access them in @safe code, and it is un-@safe to do so with
other types.

I had an insight that I wanted to share: invariants. A type has 
invariants if there could be bit-patterns that are invalid. (If 
I'm not mistaken, it's really that simple.) In code, this would 
mean: If I had a mutable object obj of type T, and I did

     T obj = ...;
     ubyte[] obj_slice = (cast(ubyte*)cast(void*)(&obj))[0 .. T.sizeof];
     size_t i = ...;
     assert(i < obj_slice.length);
     obj_slice[i] = ...;

would it result in obj being un-@safe to use, in the sense that
using it would result in un-@safe operations?

Built-in integer and floating-point types have no invariants.
Every single one of the 2³² bit patterns an int can hold is a
valid number, and the same is true for floats (reading a NaN is
not un-@safe!).

Compare that to bool and pointers. Basically, bool is a ubyte, but
with the invariant that its value is 0 or 1. Any other bit pattern
is invalid for a bool value. At first glance, any bit pattern is
valid for a pointer, but that's not true, because the set of valid
bit patterns need not be fixed (as it is for bool). An invariant
can be: apart from null, the pointer must be valid to dereference.
The set of addresses that are valid to dereference changes at
run-time. (Even if that set could grow to cover the whole address
space, it suffices that there could be situations at run-time when
it doesn't.)
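
To make the byte-poking experiment concrete, here is a minimal
sketch for two types without invariants. The helper name fillBytes
is made up for this illustration; it sets every byte to the same
value so the result does not depend on endianness.

```d
import std.math : isNaN;

// Hypothetical helper: overwrite every byte of obj with the same value.
// @system, because it reinterprets obj's storage through a pointer cast.
void fillBytes(T)(ref T obj, ubyte value) @system
{
    ubyte[] bytes = (cast(ubyte*) &obj)[0 .. T.sizeof];
    bytes[] = value;
}

void main() @system
{
    int n = 0;
    fillBytes(n, 0xFF);
    assert(n == -1);      // all bits set is still a valid int

    float f = 0;
    fillBytes(f, 0xFF);
    assert(isNaN(f));     // all bits set is a NaN, which is fine to read

    // Doing the same to a bool or an int* would leave a value that
    // breaks the type's invariant, so it is deliberately not shown.
}
```

The point is only that no choice of byte value can make n or f
dangerous to use afterwards; for bool and pointers that guarantee
does not hold.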

Breaking invariants incurs undefined behavior, and that is
un-@safe by definition. Practically, @trusted functions cannot
work unless they can assume in general that the invariants of the
types they deal with are met.

So, what kinds of union uses are @safe?
Answer: If all members of the union have no invariants.
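
As a small illustration (the union names are invented here), the
first union below overlaps only types without invariants, the
second overlaps an integer with a pointer. The static assert
reflects my understanding of how current compilers treat reads of
overlapped pointers in @safe code; treat it as an assumption.

```d
// Both members accept every bit pattern, so @safe code may read either.
union IntOrFloat
{
    int i;
    float f;
}

// One member is a pointer: reading it after a write to n could forge
// an address, so @safe code must not be allowed to do that.
union WordOrPtr
{
    size_t n;
    int* p;
}

void main() @safe
{
    IntOrFloat a;
    a.f = 1.0f;
    assert(a.i == 0x3F80_0000); // IEEE 754 single-precision bits of 1.0

    // Reading the overlapped pointer is expected to be rejected in @safe code.
    static assert(!__traits(compiles, () @safe {
        WordOrPtr b;
        b.n = 42;
        int* q = b.p;
    }));
}
```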

There are further cases: if only the currently active union member
is ever accessed, the access is @safe. Checking this needs
control-flow analysis in general, but it can be watered down to a
check that only one union member is ever active.
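
Where the compiler cannot do that check, a library type can track
the active member itself, which is roughly what SumType does. The
following is only a sketch with made-up names (IntOrPtr, Kind,
word, pointer); it relies on the private members not being touched
by other code in the same module.

```d
struct IntOrPtr
{
    this(size_t n) @trusted { this.n = n; kind = Kind.word; }
    this(int* p) @trusted { this.p = p; kind = Kind.pointer; }

    bool holdsPointer() const @safe { return kind == Kind.pointer; }

    // Returns null unless the pointer member is the active one.
    int* pointer() @trusted { return kind == Kind.pointer ? p : null; }

    // Returns 0 unless the integer member is the active one.
    size_t word() const @trusted { return kind == Kind.word ? n : 0; }

private:
    enum Kind { word, pointer }
    Kind kind;          // records which union member was written last
    union
    {
        size_t n;
        int* p;
    }
}

void main() @safe
{
    auto a = IntOrPtr(42);
    assert(!a.holdsPointer && a.pointer is null && a.word == 42);

    auto b = IntOrPtr(new int(7));
    assert(b.holdsPointer && *b.pointer == 7);
}
```

The @trusted accessors are only correct because kind and the union
are always written together; that pairing is exactly the kind of
invariant the post is about.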

When should the language (conservatively) assume an aggregate
type (struct, class, etc.) has invariants? (Or, contrapositively,
when can the language be sure an aggregate type definitely has no
invariants?)

1. If the type is an interface or non-final class type.
2. If the type has an explicit invariant block.
3. If the type has a member variable having a type with 
invariants.
4. If the type has padding bytes between member variables.
5. If the type has non-public member variables.

Rationales:
1. Types implementing an interface or inheriting from a class 
could have invariants.
4. For optimization, the compiler should be allowed to assume 
that padding bits are always zero, unless explicitly told the 
opposite. (Cf. assuming a bool is 0 or 1 always.) This is 
debatable.
5. Even if no invariants are stated, the fact that some members 
are encapsulated in some way is a clear indication that an 
invariant likely exists. There are counter-examples, like a 
wrapper that's logging access to its only member using getter and 
setter.

Contrapositive formulation: The language can be (reasonably) 
certain that no invariants exist in an aggregate type when:
1. If the type is a class type, it is final, --- and
2. it has no invariant block, --- and
3. no member variable's type has invariants, --- and
4. the type has no padding bits, --- and
5. all member variables are public (i.e. anyone anywhere could 
write them).

If your type truly has no invariants, but fails condition 4, you 
can introduce ubyte[n] member variables that name the padding. In 
a sense, those padding arrays are implicitly private when 
compiler-generated, i.e. failing condition 5.
If your type truly has no invariants, but fails condition 5, it 
could be mitigated by allowing

     @disable invariant;

to indicate that no implicit invariant arises from private 
members.
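
For condition 4, the padding workaround can be sketched as follows.
hasPadding and fieldSizeSum are hypothetical helpers written for
this post, not library functions, and the check is naive: it just
compares the sum of the field sizes with T.sizeof, so it would
miscount structs that contain anonymous unions. The sizes in the
asserts assume uint is 4-byte aligned.

```d
import std.traits : Fields;

// Hypothetical helper: true if the fields of struct T do not account
// for all of T.sizeof, i.e. the compiler inserted padding somewhere.
template hasPadding(T)
    if (is(T == struct))
{
    enum hasPadding = fieldSizeSum!T() != T.sizeof;
}

// Sum of the sizes of all fields of T, evaluated at compile time.
private size_t fieldSizeSum(T)()
{
    size_t sum = 0;
    foreach (F; Fields!T)
        sum += F.sizeof;
    return sum;
}

// Three unnamed padding bytes sit between tag and value.
struct Padded
{
    ubyte tag;
    uint value;
}

// Same layout, but the padding is a named, public member.
struct Named
{
    ubyte tag;
    ubyte[3] pad;
    uint value;
}

static assert(Padded.sizeof == 8 && hasPadding!Padded);
static assert(Named.sizeof == 8 && !hasPadding!Named);

void main() {}
```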

The compiler could recognize certain overlaps as valid even though
the rules stated above reject them. An example would be:
overlapping T* and S* where T and S have the same size and both
have no invariants. The second condition is important; otherwise,
overlapping could be used to circumvent e.g. T's invariants using
S, which has no or different invariants.