@safe accessing of union members

Q. Schroll qs.il.paperinik at gmail.com
Thu Mar 18 18:24:21 UTC 2021


On Wednesday, 17 March 2021 at 18:01:43 UTC, Paul Backus wrote:
> On Wednesday, 17 March 2021 at 16:43:25 UTC, Q. Schroll wrote:
>> There was a discussion lately about overlapping pointers in 
>> unions, @safe and SumType.
>>
>> Obvious results: It is @safe overlapping some types with 
>> others and accessing them in @safe code and it is un-@safe to 
>> do so with other types.
>>
>> I had an insight that I wanted to share: invariants. A type 
>> has invariants if there could be bit-patterns that are invalid.
>
> Yes. This is exactly what the "Background" section of the DIP 
> 1035 [1] was trying to say.

I missed that. Using DIP 1035's terms, union members must be 
implicitly @system if they have invariants ("invariant" in the 
sense of the DIP, which could be checked using the conditions I 
stated).

Reading DIP 1035 that you co-authored, I figured my notion of a 
"type that has invariants" could be helpful. In an example of the 
DIP, there's a void initialization presented as a reason why a 
type called ShortString is not memory safe. If you look at my 
definition of "type with invariants", ShortString would be 
considered a type with invariants because it has private 
variables (and has no @disable invariant).
As the language currently correctly specifies, a pointer cannot 
be void initialized in @safe code. Why? Because a pointer has 
invariants and those could be broken using void initialization. 
ShortString also has invariants, therefore void initializing one 
cannot be deemed @safe.
The DIP unfortunately nowhere states how/where to use a @system 
variable in the introductory example code. That would be helpful.
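
To illustrate the current rule, here is a minimal sketch that asks 
the compiler whether void initialization is accepted in @safe code 
for a given type. The name `voidInitIsSafe` is made up for this 
post and is not from the DIP; the results reflect the compiler's 
pointer-based check as I understand it, not the invariant-based 
rule discussed here.

```d
// Hypothetical helper: true if `T x = void;` compiles inside @safe code.
enum bool voidInitIsSafe(T) = __traits(compiles, () @safe {
    T x = void;
});

// A plain integer has no invalid bit patterns, so this is accepted.
static assert(voidInitIsSafe!int);

// A pointer could end up holding garbage, so this is rejected.
static assert(!voidInitIsSafe!(int*));

void main() {}
```

A type like ShortString contains no pointer-typed void 
initialization problem that this check would notice, which is 
exactly the gap being discussed.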

Where the DIP and this idea diverge is fixing something vs 
introducing something (that, among other things, helps @trusted 
review). While @system variables are useful for global variables 
for sure, I think for types like the DIP's introductory examples, 
@system variables aren't really a solution to the presented 
problem. @safe should produce an error whenever a clueless 
programmer who writes and uses it would otherwise accidentally 
introduce UB. That must hold even when the @trusted functions 
involved are properly written. "If 
ShortString.length could be marked as @system, this dilemma would 
not exist." While true, it is not obvious why a clueless 
programmer would mark `length` @system. The only part of the 
program that looks fishy is the @trusted function, and that one 
cannot be changed for the better. Maintaining the invariant of an 
aggregate type necessarily includes auditing the whole module 
that has access to its private data. Annotating member variables 
@system can help with that by reducing the audit to the @trusted 
and @system functions in the module. Unless @system becomes the 
default for member variables, it cannot be relied upon for cases 
like ShortString. The DIP points that out in Example: 
User-Defined Slice: "Instead, every function that touches ptr and 
length, including the @safe constructor, must be manually 
checked."

I missed void initialization in my post, but interestingly, void 
initialization of a type T object is @safe if and only if in the 
`union { T obj; ubyte[T.sizeof] bytes; }` it is valid to 
initialize `bytes` arbitrarily and use `obj`.
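
As a rough sketch of that equivalence, the union can be written 
down and the compiler asked whether @safe code may fill `bytes` 
and then read `obj`. `Overlay` and `readableAfterByteWrite` are 
hypothetical names for illustration; the check only reflects what 
the compiler currently accepts, which as far as I know covers 
pointers but not other invariants.

```d
// Hypothetical overlay of a T with its raw bytes.
union Overlay(T)
{
    T obj;
    ubyte[T.sizeof] bytes;
}

// True if @safe code may write arbitrary bytes and then read `obj`.
enum bool readableAfterByteWrite(T) = __traits(compiles, () @safe {
    Overlay!T o;
    o.bytes[] = 0xFF; // arbitrary bit pattern
    T copy = o.obj;   // reinterpret the bytes as a T
});

// Every bit pattern is a valid int.
static assert(readableAfterByteWrite!int);

// Not every bit pattern is a valid pointer, so the compiler refuses.
static assert(!readableAfterByteWrite!(int*));

void main() {}
```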

>> Breaking invariants incurs undefined behavior
>
> Not necessarily. The statement
>
>     int* p = cast(int*) 0xDEADBEEF;
>
> ...does not have undefined behavior.

The spec says you're wrong, at least for structs and classes:
"If the invariant does not hold, then the program enters an 
invalid state."
-- https://dlang.org/spec/struct.html#StructInvariant
-- https://dlang.org/spec/class.html#invariants

> You only get undefined behavior if you actually dereference `p`.

Even if that were the case, it'd be irrelevant (it's the spec 
that decides when UB is encountered, not what one compiler 
implementation does). You cannot make dereferencing a pointer 
@system (in general), but you can make it @system to produce a 
value that is (probably) invalid to dereference. That's what D 
currently does and it is the right choice IMO. The cast in your 
code is @system, not the assignment.
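
A small sketch of where the line is drawn today, using 
`__traits(compiles)` to probe @safe code. The three enum names are 
made up for this example.

```d
// Creating a pointer from an integer is rejected in @safe code.
enum bool castIsSafe = __traits(compiles, () @safe {
    int* p = cast(int*) 0xDEADBEEF;
});

// Copying an existing pointer is accepted.
enum bool copyIsSafe = __traits(compiles, (int* q) @safe {
    int* p = q;
});

// Dereferencing is accepted as well: @safe relies on every pointer
// it can see being valid to dereference (or null).
enum bool derefIsSafe = __traits(compiles, (int* q) @safe {
    return *q;
});

static assert(!castIsSafe);
static assert(copyIsSafe);
static assert(derefIsSafe);

void main() {}
```

So the check sits at the point where an invalid value could come 
into existence, not at the point where it would be used.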

> In more general terms: undefined behavior doesn't come from the 
> values themselves, but from specific *operations* on those 
> values.

Technically yes, but that's practically irrelevant as pointed out 
earlier.

> The purpose of an invariant is to specify what conditions are 
> necessary to ensure defined behavior for a given operation.

I'd say, an invariant is a condition (mainly on an object) such 
that the specified behavior of that object cannot be guaranteed 
if that condition is false. Maybe this is mere semantics and you 
basically meant what I said. I view invariants not through 
operations but object state. Operations have pre- and 
post-conditions and an object's invariants are post-conditions 
for every operation and may be pre-conditions for any operation.
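
To make the state-based view concrete, here is a minimal sketch 
with D's contract syntax. `Counter` is a hypothetical type; the 
invariant describes the object's state, while the `in` and `out` 
contracts belong to individual operations.

```d
struct Counter
{
    private int count;

    // Object invariant: checked around public member function calls
    // in builds that have contracts enabled.
    invariant() { assert(count >= 0); }

    void increment()
    out (; count > 0) // post-condition of this operation
    {
        ++count;
    }

    void decrement()
    in (count > 0)    // pre-condition of this operation
    out (; count >= 0)
    {
        --count;
    }

    int value() const { return count; }
}

void main()
{
    Counter c;
    c.increment();
    c.decrement();
    assert(c.value == 0);
}
```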

>> So, what kinds of union uses are @safe?
>> Answer: If all members of the union have no invariants.
>
> This is overly narrow.

Never did I say it's an equivalence. There are cases like SumType 
that use a union internally and even if a member has an 
invariant, access is valid, because SumType has invariants that 
ensure that reading a member only happens after that member was 
the one assigned last.
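
A stripped-down sketch of that idea, not SumType's actual 
implementation: a hypothetical `IntOrPtr` keeps a tag next to the 
union, and its @trusted accessors only read the member that the 
tag says was written last.

```d
struct IntOrPtr
{
    private union
    {
        int num;
        int* ptr;
    }
    // Invariant by convention: true exactly when `ptr` was written last.
    private bool holdsPtr;

    this(int n) @trusted { num = n; holdsPtr = false; }
    this(int* p) @trusted { ptr = p; holdsPtr = true; }

    // Only hands out `ptr` if it is the active member.
    int* peekPtr() @trusted { return holdsPtr ? ptr : null; }

    // Falls back to 0 if the pointer is the active member.
    int peekNum() @trusted { return holdsPtr ? 0 : num; }
}

void main() @safe
{
    auto a = IntOrPtr(new int(7));
    assert(*a.peekPtr() == 7);

    auto b = IntOrPtr(42);
    assert(b.peekPtr() is null);
    assert(b.peekNum() == 42);
}
```

Note that the @trusted code is only correct as long as nothing 
else in the module flips `holdsPtr` on its own, which is the 
module-wide audit problem mentioned above.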

> Unions themselves have no invariants,

They can have them. SumType is an example. While SumType is 
(probably) implemented as a struct with a union member, it could 
be a union of structs: int × ∪Ts = ∪(int × Ts).
You can literally put invariant blocks in unions.

> even when their members do, because access to those members is 
> forbidden in @safe code,

This is wrong. Accessing union members is sometimes considered 
@safe currently although it clearly isn't. The compiler treats 
access to overlapping pointers as @system, but it does not do so 
for any other invariants that types may have. What I tried to 
convey in this post is: a pointer holding a valid-to-dereference 
value is an invariant that the @safe mechanism considers, but it 
does not consider any other form of invariant.
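
A minimal sketch of the kind of access I mean. `Digit`, 
`DigitOrByte` and `smuggle` are hypothetical names, and I am 
assuming the compiler's current behavior of only checking for 
overlapping pointers.

```d
struct Digit
{
    private ubyte value; // intended invariant: value < 10

    this(ubyte v) @safe
    {
        assert(v < 10);
        value = v;
    }

    ubyte get() const @safe { return value; }
}

union DigitOrByte
{
    Digit digit;
    ubyte raw;
}

// Entirely @safe, yet it produces a Digit whose invariant is broken.
ubyte smuggle() @safe
{
    DigitOrByte u;
    u.raw = 200;          // no pointers involved, so no complaint
    return u.digit.get(); // reads a Digit that never went through its constructor
}

void main() @safe
{
    assert(smuggle() == 200);
}
```

Any @trusted code that relies on `value < 10`, for example to 
skip a bounds check, would now be on shaky ground without any 
@system or @trusted code having done something wrong.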

> and there is no operation you can perform on a union instance 
> *as a whole* in @safe code whose behavior is potentially 
> undefined.

A union as a whole, i.e. not accessing a member, is near useless.

> Some examples of things you can do with a union that are always 
> @safe, regardless of its members:
>
>     union U { int* ptr; int num; }
>
>     // Initialization is always @safe
>     U a = { num: 123 };
>     U b = { ptr: new int };
>     // Copying is always @safe
>     U c = b;
>     // Bitwise comparison is always @safe
>     assert(c is b);
>     // Casting memory to const(ubyte) is always @safe
>     writefln("Raw bytes: %(%02X %)",
>         *cast(const(ubyte)[U.sizeof]*) &c);

The case I'm talking about is accessing union members. You're 
digressing.

I had a section in a draft of my post pointing out that reading a 
member that has been written last (needs control-flow analysis in 
general) is okay. I removed it because I deemed it obvious. 
Bit-wise reading is also obviously a non-problem.

>> When should the language (conservatively) assume an aggregate 
>> type (struct, class, etc.) has invariants?
>
> The rules in the language spec [2] are mostly correct in this 
> regard, though they leave out `bool` (and enum types, though 
> that's a more debatable issue).

Notice that "mostly correct" in a formal setting is a euphemism 
for "wrong". The case with bool is an instance of the problem.
In my opinion, void-initializing a bool should be @system. If the 
language specifies that a bool can have any bit-pattern, every 
use of a bool b (unless proven by some form of value-range 
propagation) must be checked for b > 1. This is obviously 
nonsense. In this case, we can just deprecate bool and use ubyte 
instead. At least, the language would be honest that way.

> [1] 
> https://github.com/dlang/DIPs/blob/c39f6ac62210e0604dcee99b0092c1930839f93a/DIPs/DIP1035.md#background
> [2] https://dlang.org/spec/function.html#safe-values

Sorry for the long post.

