Attribute promises vs inference rules

Wed Apr 17 17:06:29 UTC 2024

The spec is rather detailed about which operations are valid and 
invalid in functions annotated `@safe`, `@nogc`, `pure`, and/or 
`nothrow`. However, there is a difference between two kinds of 
operations: those the attribute rules reject (the compiler issues 
an error when the attribute is specified, or, equivalently, does 
not infer the attribute in a context where attributes are 
inferred) and those that actually violate the promises the 
attributes make.

Example:
```d
int x;

bool f(int* p) pure @safe
{
    return p is &x; // Error: `pure` function `f` cannot access mutable static data `x`
}
```
This is not a bug: `f` does access `x`, and `x` is indeed mutable 
data. Only by pure happenstance does `f` use nothing but the 
address of `x`, which is a constant, and never its value, which is 
mutable. If `f` stored `&x` in a local, it could write to `x` 
through it. Because `f` doesn’t do that, it is “morally” pure, but 
the attribute spec doesn’t recognize it as `pure`. Don’t get me 
wrong: the spec could be changed to allow accesses like this, but 
currently it doesn’t, which makes `f` a great example.
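A quick way to see that `f` is morally pure: move the address of `x` into a parameter, and the very same comparison is accepted as `pure`. This is a minimal sketch; `sameAddress` is a made-up name, not something from the spec or Phobos.

```d
int x; // mutable module-level (thread-local) data, as in `f`

// Same comparison as `f`, but the address of `x` is passed in, so the
// body never names the mutable global and `pure` is accepted.
bool sameAddress(const(int)* p, const(int)* target) pure nothrow @nogc @safe
{
    return p is target;
}

unittest
{
    int y;
    assert(sameAddress(&x, &x));
    assert(!sameAddress(&y, &x));
}
```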

So, what about this:
```d
int x;

bool g(int* p) pure @safe
{
    static impl(int* p) @safe { return p is &x; }
    enum pure_impl = () @trusted {
        return cast(bool function(int* p) pure @safe) &impl;
    }();
    return pure_impl(p);
}
```

A cast that adds function attributes isn’t allowed in `@safe` 
code, but that’s what `@trusted` is for. The question now is: Is 
it defined behavior to cast `&impl` to a `pure` function pointer 
with an explicit cast? I don’t know, and I also don’t know where 
to look. That second part is an issue for D.
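The cast in `g` can also be written generically. The `std.traits` documentation for `SetFunctionAttributes` shows an `assumePure` helper along these lines; the sketch below reuses it, with a lambda evaluated at run time instead of an `enum`. It only moves the cast around, so it does not answer whether calling the result is defined behavior.

```d
import std.traits : FunctionAttribute, SetFunctionAttributes, functionAttributes,
    functionLinkage, isDelegate, isFunctionPointer;

int x;

// Adds `pure` to the type of a function pointer or delegate.
// The cast itself is @system; whether calling the result is
// defined behavior is exactly the open question.
auto assumePure(T)(T t)
if (isFunctionPointer!T || isDelegate!T)
{
    enum attrs = functionAttributes!T | FunctionAttribute.pure_;
    return cast(SetFunctionAttributes!(T, functionLinkage!T, attrs)) t;
}

bool g(int* p) pure @safe
{
    // `static`, so it is a plain function that needs no context pointer.
    static bool impl(int* p) @safe { return p is &x; }
    // @trusted because the attribute-adding cast is not @safe.
    auto pureImpl = () @trusted { return assumePure(&impl); }();
    return pureImpl(p);
}

unittest
{
    int y;
    assert(g(&x));
    assert(!g(&y));
}
```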

Let’s look at each attribute individually, in what I presume is 
the order from easiest to hardest to answer.

### What is morally `@nogc`?

My sense is: a function is morally `@nogc` if it doesn’t allocate 
on the GC heap. Even if a function can allocate conditionally, if 
you can ensure it won’t, you’re good. Probably. The spec doesn’t 
say so, but anything else would be a big, big surprise.
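Here is a sketch of what “allocates conditionally, but you can ensure it won’t” could look like. `assumeNogc` is a hypothetical helper modeled on the `assumePure` example from the `std.traits` documentation, and `prefix` and `sumPrefix` are made up. `prefix` only allocates when asked for more elements than the buffer holds, and `sumPrefix` rules that case out before calling it through the cast pointer. Whether such a cast is defined behavior is the unanswered question; the code only shows its shape.

```d
import std.traits : FunctionAttribute, SetFunctionAttributes, functionAttributes,
    functionLinkage, isDelegate, isFunctionPointer;

// Hypothetical: like std.traits' `assumePure` example, but adds `@nogc`.
auto assumeNogc(T)(T t)
if (isFunctionPointer!T || isDelegate!T)
{
    enum attrs = functionAttributes!T | FunctionAttribute.nogc;
    return cast(SetFunctionAttributes!(T, functionLinkage!T, attrs)) t;
}

// Not @nogc: growing the slice may allocate on the GC heap.
int[] prefix(int[] buf, size_t n)
{
    if (n > buf.length)
        buf.length = n; // the only allocating branch
    return buf[0 .. n];
}

// Ensures n <= buf.length, so `prefix` never takes the allocating branch.
int sumPrefix(int[] buf, size_t n) @nogc
{
    if (n > buf.length)
        assert(0, "precondition violated");
    int sum = 0;
    foreach (v; assumeNogc(&prefix)(buf, n))
        sum += v;
    return sum;
}

unittest
{
    int[4] a = [1, 2, 3, 4];
    assert(sumPrefix(a[], 2) == 3);
}
```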

### What is morally `@safe`?

This attribute has the best answer because the question is 
essentially: What can be annotated `@trusted`? It has no simple 
answer, but at least there are discussions around it. Also, 
because `@trusted` exists, such questions are easy to phrase.
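For contrast, an uncontroversial instance of the `@trusted` question, sketched with a made-up `copyInts` wrapper around C’s `memmove`: the body does `@system` things, but no arguments a `@safe` caller can pass make it read or write out of bounds, which is what the spec’s notion of a safe interface asks for.

```d
import core.stdc.string : memmove;

// memmove is @system. The wrapper can plausibly be @trusted because it
// checks the lengths in every build mode, skips the empty case, and uses
// memmove (not memcpy), which tolerates overlapping slices.
void copyInts(int[] dst, const(int)[] src) @trusted
{
    if (dst.length != src.length)
        assert(0, "length mismatch"); // assert(0) is not compiled out by -release
    if (src.length == 0)
        return; // avoid handing possibly-null pointers to the C function
    memmove(dst.ptr, src.ptr, src.length * int.sizeof);
}

unittest
{
    int[3] a;
    copyInts(a[], [1, 2, 3]);
    assert(a == [1, 2, 3]);
}
```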

### What is morally `nothrow`?

What `nothrow` is about can be readily guessed. It’s not actually 
“cannot throw [anything]”, but rather “cannot throw 
`Exception`s”. Close enough. In all honesty, I don’t know what 
counts as “morally `nothrow`”, but if you asked me: “Function 
`foo` is not annotated `nothrow`, but it simply won’t throw 
exceptions; can I cast `&foo` to `nothrow`?” I’d answer: “Probably 
yes, but you’d better use 
[`assumeWontThrow`](https://dlang.org/library/std/exception/assume_wont_throw.html).”
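Using `assumeWontThrow` for that scenario might look like this; `parseKnownInt` is a made-up example that relies on its caller to pass a well-formed number. If the expression throws anyway, the documentation says an `AssertError` is thrown in its place.

```d
import std.conv : to;
import std.exception : assumeWontThrow;

// `to!int` is not nothrow: it throws ConvException on malformed input.
// The caller promises `digits` is a valid decimal integer, so the call
// is wrapped instead of casting a function pointer to `nothrow`.
int parseKnownInt(string digits) nothrow
{
    return assumeWontThrow(digits.to!int);
}

unittest
{
    assert(parseKnownInt("42") == 42);
}
```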

There could be some messy details, though. A function that isn’t 
`nothrow` can fail recoverably, so it must be called in a way that 
supports stack unwinding; a function that can’t fail recoverably 
needn’t be. It might be an issue; I don’t know.

### What is morally `pure`?

It’s not clear at all what `pure` promises exactly and what it 
doesn’t. Contrast this with `nothrow` and especially `@nogc`, 
where it might just be a single spec paragraph that’s missing. It 
may seem as easy as: It doesn’t access mutable data. Remember the 
initial example? It’s not that easy. Even if it were, the 
guarantees that follow from “it doesn’t access mutable data” are 
manifold: Unique construction (by a `pure` function that meets 
some other criteria) allows implicit conversions from mutable to 
`immutable`. Some `pure` functions may be cached without anyone 
being able to observe the difference. Some `pure` functions may 
be run in parallel without synchronization. And there is other 
fancy stuff.
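The unique-construction guarantee in action, as a minimal sketch with a made-up `countUp`: its only parameter carries no mutable indirections, so the freshly allocated result cannot alias anything the caller holds, and it converts to `immutable` without a cast.

```d
// The only parameter has no mutable indirections, so the returned
// array is unique and may implicitly convert to immutable.
int[] countUp(size_t n) pure
{
    auto a = new int[](n);
    foreach (i, ref e; a)
        e = cast(int) i;
    return a;
}

unittest
{
    immutable int[] ids = countUp(3); // mutable -> immutable, no cast
    assert(ids == [0, 1, 2]);
}
```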

Also consider GC allocation. A `pure` function is explicitly 
allowed to allocate on the GC heap (unless it’s also `@nogc`, of 
course, but that’s orthogonal). How is that possible? The GC heap 
is definitely global state!

Now, one could argue that, since there is only one GC, every 
(`pure`) function morally has a hidden parameter that provides 
access to it, and a `pure` function may access a global variable 
through a parameter. (In a sense, what `@nogc` morally does (to a 
`pure` function) is remove this hidden parameter.) If we’re 
comfortable arguing like that in the general case, the rules of 
`pure` are no longer trivial. What about custom global-state APIs 
that could be modeled similarly to the GC?
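The “global variable through a parameter” part is already legal today, as this sketch shows; `counter` and `bump` are made up. A `pure` function may mutate module-level state as long as it reaches it only through a pointer it is handed, which is the shape of the hidden-GC-parameter argument.

```d
int counter; // mutable module-level (thread-local) state

// `pure` even though it mutates `counter` when given its address:
// the access goes through a parameter, not through the global's name.
void bump(int* c) pure nothrow @nogc @safe
{
    ++*c;
}

unittest
{
    int before = counter;
    bump(&counter);
    assert(counter == before + 1);
}
```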

What conditions does a global-state API have to meet such that 
access to it is well-defined in a `pure` function? In my 
estimation, nobody knows.

### Conclusion

For two of the four attributes, `@nogc` and `nothrow`, a spec 
paragraph is warranted. For `@safe`, there is already an ongoing 
quest to bring as much UB-free code as possible into the domain of 
`@safe`. For `pure`, there’s a whole discussion pending about what 
should count as “morally pure”, i.e. which casts to `pure` are 
UB-free. This can be considered part of the `@safe` discussion.

As for positions, there’s one extreme point: _Morally `pure` is 
only what could have been annotated `pure` without change._ This 
is probably a good starting point from a theoretical standpoint, 
i.e. the spec could be explicit about it and say: “A pointer to a 
function that isn’t annotated `pure` can be cast to a function 
pointer type that’s additionally annotated `pure` if the pointee 
function could have been annotated `pure`, i.e. the programmer 
merely ‘forgot’ to annotate where it was possible.” But what 
about `f` from the initial example? It cannot be annotated 
`pure`. Do we want to exclude it? That doesn’t seem very 
practical. It would mean that `g` introduces UB and would pose 
the question: When exactly does `g` enter UB? Is the cast already 
UB, or does the ill-cast function have to be called?
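Under that extreme position, a cast like the following would be fine, because `square` (a made-up example) could have carried `pure` all along; `f`, which names the mutable global `x`, would not qualify.

```d
// Not annotated `pure`, but it could have been: the programmer merely
// "forgot". Under the extreme position, this cast would be UB-free.
int square(int a) { return a * a; }

unittest
{
    auto sq = cast(int function(int) pure) &square;
    assert(sq(3) == 9);
}
```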