move+forward as intrinsics, incl. revised forward semantics for perfect forwarding
kinke
noone at nowhere.com
Sun Oct 13 10:43:27 UTC 2024
IMO we need to make `core.lifetime.{move,forward}` compiler
intrinsics, to enable further optimizations that aren't possible
with a library solution.
#### Move
* semantics: move an lvalue to a new rvalue, at a new memory
  address, 'hijacking' the lvalue's resources; the lvalue is reset
  to T.init (blit, not assignment!) afterwards
* will be complete with a move ctor; the syntax needs to be
  decided, but the signature is `(ref T)` (yes, must be an
  explicit ref)
  * allows opting out of the default blit (memcpy of the struct
    payload), e.g., to fix up interior pointers
  * move ctor interop with C++ should be doable (just a matter of
    getting the extern(C++) mangling right)
  * problem: handle/avoid all compiler-implicit moves/blits
    (would have to call move ctor and dtor now; emplace FTW!)
* would be nice as intrinsic:
  * not to have to import `core.lifetime` everywhere and end up
    with complicated template bloat for a basically trivial
    operation
  * potential optimization: elide the lvalue reset to T.init and
    its destruction iff:
    * it is a local (can skip destruction)
    * and not used after the move
    * and the destruction of T.init is a noop (modulo mods to the
      struct's own payload), so its elision is not observable
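To make the move semantics above concrete, here is a minimal sketch
using today's library `core.lifetime.move`. The `Handle` struct and
the `closes` counter are made up for illustration. The destructor
is a noop for `Handle.init`, which is exactly the case where the
reset + destruction of the moved-from local could be elided.

```d
import core.lifetime : move;

int closes; // hypothetical counter: how many "real" resources were released

struct Handle {
    int fd = -1; // Handle.init owns nothing

    ~this() {
        if (fd != -1) // destroying Handle.init is a noop
            ++closes;
    }
}

void main() {
    {
        auto a = Handle(3);
        auto b = move(a);   // b hijacks a's resource

        assert(b.fd == 3);
        assert(a.fd == -1); // a has been reset to Handle.init (blit)
    }
    // Both a and b were destructed at scope exit, but only b
    // released something; a's destruction was not observable.
    assert(closes == 1);
}
```

Since `a` is a local, is not used after the move, and its
destruction as `Handle.init` does nothing, skipping both the reset
and the destructor call would not change the program's behavior.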
#### When move isn't sufficient: perfect forwarding
`forward` must become an intrinsic:
* for vars with `ref` storage class: as-is, yields the original
  lvalue
* non-ref lvalues (NEW semantics): 're-interpret as rvalue' - no
  move, and accordingly no destruction after forwarding (because
  the rvalue will already have been destructed earlier)
  * only valid for locals (incl. params); the destruction of
    other lvalues cannot be skipped
  * invalid/undefined to access the original lvalue after
    forwarding it (it has been destructed already)
  * probably only valid:
    * as function call argument expressions (glue layer needs to
      treat it like a frontend-generated temporary, passing it
      directly by ref)
    * as assignment right-hand sides, for move-assign
      (`dst = forward!src;` => `dst.opAssign(forward!src);`)
    * as return expressions, for move-constructions (but prefer
      NRVO if possible, for direct emplace)
* probably needs to keep template syntax (`forward!x`, not
  `forward(x)`) for backwards compatibility with the druntime
  template
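As a baseline for the first bullet, this small sketch checks how
the current library `forward` propagates ref-ness through a single
hop. The names `relay` and `sink` are made up for illustration; it
only demonstrates today's behavior, not the proposed intrinsic.

```d
import core.lifetime : forward;

struct S {
    int x;
    static int copies;
    this(this) { ++copies; } // postblit: counts copies
    ~this() {}               // makes S non-trivially destructible
}

// Reports whether the argument arrived by reference.
bool sink()(auto ref S s) {
    return __traits(isRef, s);
}

// Trampoline: hands its parameter on via forward.
bool relay()(auto ref S s) {
    return sink(forward!s);
}

void main() {
    auto lval = S(1);
    assert(relay(lval));   // lvalue: stays a ref all the way
    assert(!relay(S(2)));  // rvalue: arrives as a (moved) rvalue
    assert(S.copies == 0); // forwarding never copies
}
```

In the rvalue case, the current `forward` has to `move` the
parameter into a new temporary, which is the part the proposed
semantics would get rid of.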
Let's take a look at an example:
```D
import core.stdc.stdio;
import core.lifetime;

struct S {
    int x;

    this(int x) {
        this.x = x;
        printf("ctor: %p\n", &this);
    }

    this(this) {
        printf("copy: %p\n", &this);
    }

    ~this() {
        printf("dtor: %p\n", &this);
    }
}

void main() {
    {
        auto lval = S(1);
        printf("lval: %p\n", &lval);
        const r = bar1(lval);
        printf(" r: %p\n", &r);
    }
    {
        printf("\nrvalue:\n");
        const r = bar1(S(2));
        printf(" r: %p\n", &r);
    }
}

S bar1()(auto ref S s) {
    printf("bar1: %p\n", &s);
    return bar2(forward!s);
}

S bar2()(auto ref S s) {
    printf("bar2: %p\n", &s);
    return bar3(forward!s);
}

S bar3()(auto ref S s) {
    printf("bar3: %p\n", &s);
    return bar4(forward!s);
}

S bar4()(auto ref S s) {
    printf("bar4: %p, got a ref: %d\n", &s, __traits(isRef, s));
    return s; // copy parameter lvalue to return value
}
```
Output with DMD (and GDC), no backend optimizations:
```
ctor: 0x7ffebea26460
lval: 0x7ffebea26460
bar1: 0x7ffebea26460
bar2: 0x7ffebea26460
bar3: 0x7ffebea26460
bar4: 0x7ffebea26460, got a ref: 1
copy: 0x7ffebea263d0
r: 0x7ffebea26464
dtor: 0x7ffebea26464
dtor: 0x7ffebea26460
rvalue:
ctor: 0x7ffebea2647c
bar1: 0x7ffebea26488
bar2: 0x7ffebea26424
bar3: 0x7ffebea263e4
bar4: 0x7ffebea263a4, got a ref: 0
copy: 0x7ffebea26358
dtor: 0x7ffebea263a4
dtor: 0x7ffebea263e4
dtor: 0x7ffebea26424
dtor: 0x7ffebea26488
r: 0x7ffebea26478
dtor: 0x7ffebea26478
```
What we see is that current `core.lifetime.forward` propagates
the ref-ness of the parameter, but has to `core.lifetime.move` it
in the non-ref case, creating 3 explicit moves + destructions.
We also see that there are compiler-implicit moves ('optimized',
i.e., no reset+destruction of the moved-from value):
* when passing the `S(2)` rvalue to `bar1` (not sure why, seems
like a bug) - note the different addresses of `ctor` and `bar1`
* for the return values - the addresses of `copy` and `r` diverge
(constructed @ 0x7ffebea26358, destructed @ 0x7ffebea26478)
With LDC, we at least already get perfectly forwarded return
values (the addresses of `copy` and `r` are identical):
```
ctor: 0x7ffda922edbc
lval: 0x7ffda922edbc
bar1: 0x7ffda922edbc
bar2: 0x7ffda922edbc
bar3: 0x7ffda922edbc
bar4: 0x7ffda922edbc, got a ref: 1
copy: 0x7ffda922edb8
r: 0x7ffda922edb8
dtor: 0x7ffda922edb8
dtor: 0x7ffda922edbc
rvalue:
ctor: 0x7ffda922eda0
bar1: 0x7ffda922ed6c
bar2: 0x7ffda922ed1c
bar3: 0x7ffda922eccc
bar4: 0x7ffda922ecc8, got a ref: 0
copy: 0x7ffda922eda4
dtor: 0x7ffda922ecc8
dtor: 0x7ffda922eccc
dtor: 0x7ffda922ed1c
dtor: 0x7ffda922ed6c
r: 0x7ffda922eda4
dtor: 0x7ffda922eda4
```
The compiler needs to implement RVO (Return Value Optimization,
not to be confused with Named RVO, NRVO) to enable perfect
forwarding of the return values. In this example, `r` is allocated
in `main`, then its address is passed and forwarded as a hidden
pointer all the way to `bar4`, where it gets copy-constructed.
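A reduced sketch of the return-value side, with made-up `makeN`
functions and counters: the number of ctor/copy/dtor calls is the
same with or without RVO, since rvalue returns are moved implicitly
without destructing the moved-from value. What RVO buys is that the
two printed addresses match; whether they do depends on the
compiler, so the sketch prints them instead of asserting on them.

```d
import core.stdc.stdio : printf;

struct S {
    int x;
    static int ctors, copies, dtors;

    this(int x) {
        this.x = x;
        ++ctors;
        printf("ctor: %p\n", &this);
    }
    this(this) { ++copies; }
    ~this() { ++dtors; }
}

// Each function returns the rvalue produced by the next one.
S make1() { return S(42); }
S make2() { return make1(); }
S make3() { return make2(); }

void main() {
    {
        const r = make3();
        printf("   r: %p\n", &r); // same as `ctor` iff RVO kicked in
        assert(r.x == 42);
    }
    assert(S.ctors == 1);  // constructed exactly once
    assert(S.copies == 0); // never copied on the way back
    assert(S.dtors == 1);  // only `r` itself was destructed
}
```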
With the proposed `forward` semantics, we'd get perfect
forwarding of the `s` parameters too, without the 3 explicit
moves and destructions. The `S(2)` rvalue would be created in
`main`, then passed and forwarded directly by ref all the way to
`bar4`, where it would get destructed when the `s` param goes out
of scope.
#### Cherry on top: Last-use optimization from DIP 1040
This would make the compiler automatically `forward` suitable
lvalues. In the example, we wouldn't have to use a single
explicit `forward` in the `barN` trampolines, *and* the
copy-construction of the return value in the non-ref version of
`bar4` would be optimized to a move-construction
(`return forward!s`).