[Issue 21565] @safe code allows modification of a scalar that overlaps with a pointer

Thu Jan 21 22:01:48 UTC 2021

https://issues.dlang.org/show_bug.cgi?id=21565

--- Comment #15 from Steven Schveighoffer <schveiguy at gmail.com> ---
(In reply to Paul Backus from comment #14)
> (In reply to Steven Schveighoffer from comment #12)
> > 
> > On the grounds that it's not desirable. It does not cause undefined
> > behavior, just useless behavior. We are better off disallowing it.
> 
> "I don't like it" is not a technical argument, and should have no place in a
> technical discussion.

That's not my argument.

> > What does this mean? All individual values are safe according to D.
> 
> If you really believe this, then you do not understand D's memory-safety
> system well enough to contribute usefully to this discussion, and I am
> wasting both my time and yours by continuing to respond.

Basic types and pointers are all accessible using @safe. I can access int and
int* perfectly fine in @safe code. It's the aliasing of the two that is a problem.

Frankly I think you are misinterpreting what I'm saying, or I am doing the same
to you. Wasting time might definitely be what you are doing.

> 
> > It's not about being @safe or not. That's why I said the rules are sound.
> > It's just that the rules leave us with the reality that using such unions
> > usable in @safe or @trusted code has no utility.
> 
> If it's "not about being @safe or not", then what on Earth *is* it about?

The whole point of @safe is to avoid code review. Otherwise it's a glorified
linter. If you have to review @safe code to make sure things outside the @safe
code are actually memory safe, then you have lost the battle.

Imagine that D does not have builtin slices (and get rid of the rules @safe
defines around them). Then you need a structure to pass slices into a @trusted
function:

struct Array(T)
{
   T* ptr;
   size_t length;
}

A @trusted function that accepts this type has 2 options:
1. It can't do ANYTHING with the data beyond the one value pointed at by ptr,
because @safe code is allowed to set length to anything it wants.
2. It can use the data beyond the first element, but then you have to review
all @safe functions that call it.

It's because the compiler disallows mutable access to length that we can
reason about what this semantically means as a parameter to a @trusted
function. Therefore, I don't have to review any array-using @safe code for
memory safety because I know that the semantic invariant is held by the
compiler.
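
To make the two options concrete, here is a minimal sketch. The function names
first and sum are made up for illustration, and it assumes ptr is either null
or points at a valid T:

```d
// Stand-in for a built-in slice, same layout as the struct above.
struct Array(T)
{
    T* ptr;
    size_t length;
}

// Option 1: only the single element behind ptr is touched, so the result
// does not depend on whatever value @safe code stored in length.
@trusted T first(T)(Array!T a)
{
    return a.ptr is null ? T.init : *a.ptr;
}

// Option 2: trusts length. This is only sound if every @safe caller has
// been reviewed to make sure length matches the allocation.
@trusted T sum(T)(Array!T a)
{
    T total = 0;
    foreach (i; 0 .. a.length)
        total += a.ptr[i];
    return total;
}

@safe void main()
{
    auto a = Array!int(new int(7), 1);
    assert(first(a) == 7);
    assert(sum(a) == 7);

    a.length = 1000;        // accepted in @safe: it is just an integer write
    assert(first(a) == 7);  // option 1 is unaffected
    // Calling sum(a) here would read far past the one allocated int.
}
```

Nothing in main needs @trusted, yet the last commented-out call is exactly the
one that would make sum unsound, which is why option 2 drags the @safe callers
into the review.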

Likewise, a union of an int and an int * semantically MUST mean int today in
safe code AND TRUSTED CODE. If you access the int * after any safe code has run
with it, it must be considered memory unsafe.

So for instance:

struct S
{
  union X {
   int x;
   int *y;
  }
  X val;

  @safe {
     void a();
     void b();
     void c();
     void d();
  }
  @trusted void e() { /* use val.y */ }
}

How do you review that the usage of val.y is safe? The answer is: you review a,
b, c, d, in addition to e. Now you are reviewing safe code to make sure it's
safe in the context of val. This is useless. val might as well be an int, or a,
b, c, d might as well be marked trusted. So the logical conclusion is, e cannot
use val.y. And if it cannot use it, then what's the point of having it?

If we know that a, b, c, and d can never set the value of val.x or val.y, then
we don't have to review them at all. Now we are only reviewing e, which is the
intent of D's safety system.

I'm not arguing that the current implementation is unsafe, just that the
current semantic guarantees make using such unions pointless in the context of
safe/trusted code. The point of a union is to use all the members in it. If
there is one member that cannot be used, then it shouldn't be part of the
union.

> Personally, I think @safe should allow all code that the compiler can prove
> is memory-safe, regardless of whether you, I, or anyone else thinks it "has
> utility" or not.

The @safe rules provide a framework for proof of memory safety where we can
avoid reviewing whole sections of code. The compiler isn't proving safety, it's
just enforcing rules. We create the rules to make sure memory safety cannot be
violated even without a careful review of certain functions AND that
@safe/@trusted code is reasonable to write with those rules.

For example, let's say we changed the @safe rules to say only arrays and
references are allowed to be dereferenced in @safe code, never pointers. Now,
safe code can read and write pointers, even write arbitrary values. That's
perfectly safe. And perfectly useless. Isn't that just asking to make trusted
code even less safe? How does one use @trusted code with a pointer when you can
never know if the safe code that passed it to you has just set arbitrary
values? What do we gain as a language by allowing setting pointers up as
garbage in @safe code?

Such a rule would mean that pointers are safe to use in @trusted functions as
long as you don't use them as pointers, only as bits. This is the same rule we are
talking about. I don't see why the rule is desirable, and I am surprised that
this is a controversial position.

--