Escape Analysis & Owner Escape Analysis

Wed Sep 4 03:02:10 UTC 2024

On 04/09/2024 4:37 AM, Dennis wrote:
> On Tuesday, 3 September 2024 at 03:00:20 UTC, Richard (Rikki) Andrew 
> Cattermole wrote:
>> I've done an almost complete rewrite, I expect this to be close to the 
>> final version:
> 
> The description is getting clearer every revision, props for that. But 
> it's also becoming increasingly hard for me to rhyme the proposal with 
> the complaints of DIP1000.
> 
> Most of the DIP is spent on the 'multiple outputs' problem for separate 
> compilation, inventing a meticulous function signature syntax to capture 
> all kinds of possible assignments between parameters, globals, and the 
> return value. And while this does solve the limitation that a `swap` 
> function on `scope` values being impossible with DIP1000, it doesn't 
> address other woes:
> 
> ### Simple safe D
> 
> A common sentiment was "I don't care for @nogc @safe, keep the language 
> simple by just using the GC or go @system". While it may be hard to 
> believe for some, DIP1000 [is not a breaking change in 
> theory](https://forum.dlang.org/post/gnuekdxflffjhwlnnwqr@forum.dlang.org) and leaves GC-based code alone. This proposal however breaks @safe code by design - both DIP1000-based code using `scope` pointers (because of new syntax) and 'regular' GC-based code (because added @live-like semantics).
> 
> I agree @live being opt-in per function is unsound, but forcing 
> "effectively const" semantics everywhere in a new edition is not going 
> to please people just happily using the GC.

I can only see us going in one of two directions over this:

- Add a temporally safe D attribute that goes above ``@safe``, so that 
when you need it you have it, and when you don't you can use ``@safe`` 
instead.
- Add an effects system.

I don't care which of the two directions we go in; I've made an ideas 
post about the first. However, I suspect the first is the one we as a 
community may like best, as it silos the extra protection without 
forcing effects annotations on everyone else.

Mutation has the side effect of invalidating borrows. It's the only 
effect we have, and therefore the only one in the proposal.

It would be an easy enough swap to change ``@safe`` to ``@tsafe``. But 
that isn't a decision we need to make here. We can make that prior to 
launch.

But I do want to make a point here: owner escape analysis only kicks in 
and forces effectively const on the owner if:

1. You take a pointer to stack memory
2. You receive memory that has a strong relationship (perhaps done 
explicitly for reference counting!)
3. You take a pointer to a field of a struct/class/union

The first two are already provided by DIP1000. That isn't new.
The third is new.

What matters here is that as long as you are not doing pointer 
arithmetic (like taking a pointer, or by-ref), you can use GC memory 
freely without restriction. In a way it's a hole in the design, but an 
intentional one, as it makes for a very good user experience and doesn't 
really have a lot of downsides.
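
As a minimal sketch of that point, the following uses only D that 
compiles today. The `Node`, `prepend` and `sum` names are hypothetical 
and not from the DIP. Nothing here takes an address or passes by-ref, 
so none of the three conditions above applies:

```d
@safe:

// Hypothetical singly linked list, allocated entirely on the GC heap.
struct Node {
	int value;
	Node* next;
}

// Allocates with `new`; no address-of operator and no by-ref parameter,
// so no pointer to stack memory or to a field is ever created.
Node* prepend(Node* head, int value) {
	return new Node(value, head);
}

// Walks the list by copying GC pointers around, which is unrestricted.
int sum(Node* head) {
	int total = 0;
	for (Node* n = head; n !is null; n = n.next)
		total += n.value;
	return total;
}
```

Writing `&head.value` inside `sum` would be condition 3, which is where 
the owner would become effectively const under the proposal.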

I was going to fill in that hole, but ``@system`` variables cover it 
well enough that I kinda just went meh.

> ### Attribute soup
> 
> `return` and `scope` annotations are noisy / confusing, but this 
> proposal adds more and jumbles the existing ones in a way that's not 
> necessarily easier to understand. For a simple `int* f(int* x)` 
> function, the parameter attributes change in the following way**:
> 
> | DIP1000              | Escape Analysis                 |
> |----------------------|---------------------------------|
> | `return ref scope`   | `scope @escape(return)`         |
> | `return ref`         | impossible***                   |
> | `return scope`       | `@escape(return)`               |
> | `scope`              | `@escape()` / `scope`           |
> 
> It solves the `return scope` and `scope return` problem, but might have 
> problems of its own:
> - `scope` now means two unrelated things: 'strong relationship' and 
> 'default empty escape set'

This is the same meaning it has today with DIP1000. Just reworded. By 
itself it matches the definition prior to DIP1000 too.

So this is inherently well understood.

If you have a parameter or variable that is only ``scope``, it may still 
compile with this proposal without changes. If it doesn't run afoul of 
owner escape analysis and still doesn't compile, I'd like to know!

> - `@escape` is the opposite of `@escape()`, which could be confusing

Originally I was going to make this mean 'inferred', but it's better 
if everything gets inferred by default.

It needs to mean something, so do you have an alternative?

> ** I might be wrong, but if so, that really doesn't bode well for the 
> 'communicability' aspect of the lifetime attributes, which the DIP tries 
> to address

With DIP1000, a single attribute conveys both the relationship strength 
and the escape set; with this proposal it does not.

``@escape`` tells you where it can go, ``scope`` upgrades the 
relationship to a strong one.

Giving ``scope`` a default escape set is to allow it to match existing 
understanding, which does help with communicability.
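
For reference, here is a minimal sketch of the DIP1000 side of that 
comparison, in D that compiles today. The function names `pick` and 
`peek` are hypothetical. The proposal's spellings appear only in 
comments, taken from the table quoted above, since that syntax is not 
implemented in any released compiler:

```d
@safe:

// DIP1000: `return scope` carries two facts in one spelling. The pointer
// may flow into the return value, and it may not escape anywhere else.
// Quoted table's equivalent under the proposal: `@escape(return)`.
int* pick(return scope int* x) {
	return x;
}

// DIP1000: plain `scope` means the pointer does not escape at all.
// Quoted table's equivalent under the proposal: `@escape()` / `scope`.
int peek(scope int* x) {
	return *x;
}
```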

So I do disagree with the statement that this is not aiding in 
communicability. It's a lot easier to communicate one thing per 
attribute than to communicate two things at once, with subtle 
differences between similar-looking attributes.

> *** That's what I take from "Error the parameter `ptr` cannot have an 
> escape set that includes `__unknown` and be marked as having a strong 
> relationship `scope`"

Yes you are correct.

It inherently describes that there is an owner of the pointer being 
passed in and that it needs to be protected (somehow).

If you were allowed to take a pointer to a by-ref variable and then 
store it some place, you would most likely be escaping a pointer, and 
that would not be a good thing. This should not be allowed in 
``@safe``, and if it is, that's a bug.
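
A minimal sketch of that situation, using current D rather than the 
proposal's syntax. The names `stash` and `leak` are hypothetical. The 
`static assert` reflects an assumption that current compilers reject 
the same body once it is marked ``@safe``:

```d
// Hypothetical thread-local global that outlives any caller's stack frame.
int* stash;

// Accepted in @system code: the address of a by-ref parameter is stored
// globally, and it dangles once the variable the caller passed in dies.
void leak(ref int x) @system {
	stash = &x;
}

// The same body in a @safe function literal is expected not to compile.
static assert(!__traits(compiles, (ref int x) @safe { stash = &x; }));
```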

> ### Composability with respect to structs
> 
> Explicitly unaddressed
> 
>> Elements in an array, fields in a class/struct/union are conflated 
>> with the variable that stores them in.

I'm going to need an example of what you think is not addressed here.

From my perspective the field gets conflated with its containing 
instance variable and that covers composability.

> ### Transitive scope
> 
> Not mentioned.

``scope`` is not transitive, at least not in the sense the language 
understands transitive to mean.

Taking a value out of a field of a struct establishes a weak 
relationship between the resulting variable and the containing struct 
instance variable.

```d
struct S {
	int* field;
}

void handle(int* ptr) {
	S s;
	s.field = ptr;
	// @escape(s) ptr
}
```

```d
struct Input {
	int* ptr;
	int field;
}

Input input1 = ...;
// has a strong relationship between `output1` and `input1`
int* output1 = &input1.field;

scope Input input2 = ...;
// has a strong relationship between `output2` and `input2`
int* output2 = input2.ptr;
```

This works because a weak relationship can be upgraded to a strong 
relationship based upon the argument, without the function being 
annotated as such.

As a result, cross-function guarantees are maintained, and therefore 
hold transitively.

Okay this needs elaborating.

"The attribute ``scope`` is not transitive. Instead it relies upon 
cross-function analysis to make guarantees for field access/mutation 
and function calls. If any expression causes an output to exist, this 
will inherently have a strong relationship and therefore can be typed as 
``scope``."

Uploaded.

> All in all, I feel the DIP is too focussed on addressing one issue 
> (multiple outputs) while neglecting others. The most pressing issue is 
> that many people simply don't want D to become like Rust. DIP1000 and 
> @live at least leave 'regular' GC-based D mostly alone: just don't take 
> the address of local variables in `@safe` functions and you're good. It 
> would be really good if whatever 'escape analysis' D ends up boasting 
> (if any), it would be for the benefit of specialized library types (e.g. 
> `RefCounted(T)`) without complicating common pointer/array operations in 
> `@safe` code.

I focus upon multiple outputs because flattening to a function 
signature cannot work without it. If you don't handle them, you are not 
going to model enough code, and you will be going against the 
literature on this subject, making it harder to use.

Pointer arithmetic is already disallowed in ``@safe``, and in a lot of 
ways _any_ taking of a pointer is unsafe without some form of escape 
analysis. This proposal makes both safe to do, consistently.

I don't know how we could make pointers safer without throwing owner 
escape analysis at it.