DIP1000: 'return scope' ambiguity and why you can't make opIndex work
Dennis
dkorpel at gmail.com
Fri Jun 18 15:44:02 UTC 2021
You may have seen my previous dip1000 posts:
- [dip1000 + pure is a DEADLY COMBO](https://forum.dlang.org/thread/jnkdcngzytgtobihzggj@forum.dlang.org)
- [DIP1000: The return of 'Extend Return Scope Semantics'](https://forum.dlang.org/thread/zzovywgswjmwneqwbdnm@forum.dlang.org)
Consider this part 3 in the "fixing dip1000" series, but it's
about a different bug.
### Background
dip25 and dip1000 are supposed to provide *simple* lifetime
tracking that's still good enough to be useful. In the previous
thread [Atila Neves mentioned](https://forum.dlang.org/post/azdzxsxyuipovrlmbhbb@forum.dlang.org)
that [Lifetime Annotations like in Rust](https://carols10cents.github.io/book/ch10-03-lifetime-syntax.html#lifetime-annotations-in-function-signatures)
are to be avoided. Is it simple though?
[On Wednesday, 26 May 2021 at 15:29:32 UTC, Paul Backus wrote:](https://forum.dlang.org/post/tsjffxwlwitaawiniztv@forum.dlang.org)
> Of course, D's vision here is severely hampered in practice by
> the poor quality of its documentation (raise your hand if you
> can explain what ["return ref parameter semantics with
> additional scope parameter semantics"][1] actually means). But
> that's the idea.
>
> [1]:
> https://dlang.org/spec/function.html#ref-return-scope-parameters
Working on dip1000 made me finally able to "raise my hand", so
here's how it works:
Function parameters of a type with pointers have three possible
lifetimes: infinite, scope, or return scope. You might have heard
that `scope` is "not transitive" and think that there's only one
layer to it. However, the key insight is that there are actually
*two layers* when `ref` comes into play: the parameter's
*address* also has a lifetime, in addition to its *value*.
It can be demonstrated with a linked list:
```D
@safe:
struct Node {
    int x;
    Node* next;
}

// First layer: returning the address of the node
int* get0(return ref Node node) {
    return &node.x;
}

// Second layer: returning a value of the node
int* get1(ref return scope Node node) {
    return &node.next.x;
}

// Third layer and beyond: this is where scope checking ends
int* get2(ref scope Node node) {
    return &node.next.next.x;
}
```
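Seen from the caller's side, the first two layers could look roughly like this. It is only a sketch, assuming the code is compiled with `-preview=dip1000`; `sum` is a hypothetical helper that is not part of the original example, and `get0`/`get1` are repeated so the block stands on its own.

```D
@safe:

struct Node {
    int x;
    Node* next;
}

int* get0(return ref Node node) { return &node.x; }
int* get1(ref return scope Node node) { return &node.next.x; }

// Hypothetical caller, added for illustration only.
int sum() {
    // The second node lives on the GC heap, `head` lives on the stack.
    Node head = Node(1, new Node(2, null));

    // First layer: `a` points into `head` itself,
    // so it must not outlive `head`.
    scope int* a = get0(head);

    // Second layer: `b` points into the node reached through `head.next`.
    scope int* b = get1(head);

    return *a + *b;
}
```

Both results are stored in `scope` variables here, so neither pointer is allowed to leave `sum`.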
The lifetimes are determined as follows:
| Lifetime | `ref` address | value of pointer type |
|---------------|-----------------------|-----------------------|
| infinite | never | default |
| current scope | default | with `scope` keyword |
| return scope | with `return` keyword | with `return scope` |
A few code examples:
```D
@safe:
int* v0(int* x)              {return x;} // allowed, no lifetime restrictions
int* v1(return int* x)       {return x;} // allowed, returned value is `scope`
int* v2(scope int* x)        {return x;} // not allowed, x is `scope`
int* v3(return scope int* x) {return x;} // allowed, equivalent to v1

int* r0(ref int x)        {return &x;} // not allowed, `ref` is always scope
int* r1(scope ref int x)  {return &x;} // not allowed, `scope` does nothing here
int* r2(return ref int x) {return &x;} // allowed, return applies to `ref`
```
As you can see, `scope` always applies to the pointer value and
not to the `ref`, since `ref` is inherently `scope`. No ambiguity
there. But what if we have a `ref int*`: does `return` apply to
the address of the `ref` or the `int*` value?
That's where those confusing lines from the specification come
in, which distinguishes "return ref semantics" and "return scope
semantics". It turns out there are three important factors:
whether the function's return type is `ref`, whether the
parameter is `ref`, and whether the parameter is annotated
`scope`. Here's a table:
**Does the `return` attribute apply to the parameter's `ref` or
the pointer value?**
| | `scope` | no `scope` |
|----------------------------------|-----------|------------|
| `ref` return type / `ref` param | **`ref`** | **`ref`** |
| value return type / `ref` param | **value** | **`ref`** |
| `ref` return type / value param | **value** | **value** |
| value return type / value param | **value** | **value** |
If you're still confused, I don't blame you: I'm still confusing
myself regularly when reading signatures with `return` and `ref`.
Anyway, is this difficulty problematic?
[On Wednesday, 15 May 2019 at 08:32:09 UTC, Walter Bright wrote:](https://forum.dlang.org/post/qbgf95$2071$1@digitalmars.com)
> On 5/15/2019 12:21 AM, Dukc wrote:
>> Could be worth a try even without docs, but in the long run we
>> definitely need some explaining.
>
> True, but I've tried fairly hard with the error messages.
> Please post your experiences with them.
>
> Also, there shouldn't be any caveats with using it. If it
> passes the compiler, it should be good to go. (Much like const
> and pure.)
All you need to do is see if the compiler complains, try adding
`return` and/or `scope`, and see if the errors go away. Well...
```D
@safe:
struct S {
    int x;
}

int* f(ref return scope S s) {
    return &s.x; // Error: returning `&s.x` escapes a reference to parameter `s`
                 // perhaps annotate the parameter with `return`
}
```
That's a confusing supplemental error: the parameter *is*
annotated `return`. The actual problem is that `return` applies
to the value, not the `ref` parameter, since there is no `ref`
return. Now add a pointer member to the struct:
```D
struct T {
    int x;
    int* y; // <- pointer member added
}

int* g(ref return scope T t) {
    return &t.x; // No error
}
```
And now the compiler accepts invalid code. Indeed, even the
compiler doesn't always know what the `return` storage class
actually applies to. See [bugzilla issue 21868](https://issues.dlang.org/show_bug.cgi?id=21868).
### The issue
While fixing [issue 21868](https://issues.dlang.org/show_bug.cgi?id=21868), the CI
uncovered that [dub package 'automem' relies on the current accepts-invalid behavior](https://github.com/dlang/dmd/pull/12665#issuecomment-858836483).
Here's the reduced code:
```D
struct Vector {
    float[] _elements;

    ref float opIndex(size_t i) scope return {
        return this._elements[i];
    }
}
```
With the patch I made, the error becomes:
```
source/automem/vector.d(212,25): Error: scope parameter `this` may not be returned
source/automem/vector.d(212,25):        note that `return` applies to `ref`, not the value
```
My new supplemental error message is working, yay! But how to fix
it?
One way is to pass the `Vector` by value instead of by reference,
but `opIndex` must be a member function to work as an operator
overload, and struct member functions receive `this` by reference.
Another way is to return by value instead of by reference, but that
means accessing array elements introduces a copy, and `&vector[0]`
won't work anymore.
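To make the trade-offs concrete, here is a rough sketch of both workarounds. The free function `at` and the `opIndexAssign` overload are hypothetical additions for illustration and are not taken from automem. I'd expect this to be accepted with or without `-preview=dip1000`, but treat it as a sketch rather than a verified fix.

```D
@safe:

struct Vector {
    float[] _elements;

    // Workaround 2: return by value. Nothing reachable from `this`
    // escapes, so plain `scope` is enough.
    float opIndex(size_t i) scope {
        return this._elements[i];
    }

    // Writes need their own overload now, because `v[i] = x`
    // can no longer go through a `ref` return.
    void opIndexAssign(float value, size_t i) scope {
        this._elements[i] = value;
    }
}

// Workaround 1: take the Vector by value, so `return scope` can only
// refer to the value. This is a free function, so it is not picked up
// for `v[i]` syntax and has to be called as `v.at(i)`.
ref float at(return scope Vector v, size_t i) {
    return v._elements[i];
}

unittest {
    auto v = Vector([1.0f, 2.0f, 3.0f]);

    v.at(1) = 5.0f;          // `ref` access through the free function
    assert(v[1] == 5.0f);    // by-value opIndex

    v[0] = 7.0f;             // opIndexAssign
    assert(v[0] == 7.0f);

    // auto p = &v[0];       // rejected: `v[0]` is now an rvalue
}
```

The block can be tried with `dmd -main -unittest`. Neither variant gives back the original `ref float opIndex` that callers of `vector[i]` rely on.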
dip1000 simply can't express a 'return scope' `opIndex` returning
by `ref`.
So it turns out the double duty of the `return` storage class is
neither simple, nor expressive enough. Do you have any ideas how
to move forward, and express the `Vector.opIndex` method without
making the attribute soup worse? Keep in mind that dip25 (with
`return ref`) is already in the language, but dip1000 (with
`return scope`) is still behind a preview switch.