move+forward as intrinsics, incl. revised forward semantics for perfect forwarding

Sun Oct 13 10:43:27 UTC 2024

IMO we need to make `core.lifetime.{move,forward}` compiler 
intrinsics, to enable further optimizations that aren't possible 
with a library solution.

#### Move

* semantics: move an lvalue to a new rvalue, at a new memory 
address, 'hijacking' the lvalue resources; the lvalue is reset to 
T.init (blit, not assignment!) afterwards
* will be complete with move ctor; syntax needs to be decided, 
but signature is `(ref T)` (yes, must be an explicit ref)
   * allows opting out of the default blit (memcpy of the struct
payload), e.g., to fix up interior pointers
   * move ctor interop with C++ should be doable (just getting the 
extern(C++) mangle right)
   * problem: handle/avoid all compiler-implicit moves/blits 
(would have to call move ctor and dtor now; emplace FTW!)
* would be nice as intrinsic:
   * not to have to import `core.lifetime` everywhere and end up 
with complicated template bloat for a basically trivial operation
   * potential optimization: elide lvalue reset to T.init and its 
destruction iff:
     * it is a local (can skip destruction)
     * and not used after the move
     * and the destruction of T.init is a noop (modulo mods to the
struct's own payload), so its elision is not observable

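To make the reset-to-`T.init` step concrete, here is a minimal sketch of what today's library `core.lifetime.move` does. The `Res` struct and its `released` counter are made up for illustration; only `move` itself is the real druntime function.

```D
import core.lifetime : move;

// Hypothetical resource wrapper; handle == 0 doubles as Res.init ("empty").
struct Res {
    int handle;
    static int released; // counts destructions of non-empty values

    @disable this(this); // non-copyable, so the value can only be moved

    ~this() {
        if (handle != 0)
            ++released;  // destroying the T.init shell is a noop here
    }
}

void main() {
    {
        auto a = Res(42);
        auto b = move(a);      // b hijacks a's payload
        assert(b.handle == 42);
        assert(a.handle == 0); // a was blitted back to Res.init
    }
    // a (now an empty shell) and b were both destructed at scope exit,
    // but only b still owned something.
    assert(Res.released == 1);
}
```

Here `a` is a local, unused after the move, and destructing `Res.init` does nothing observable, so this is exactly the case where an intrinsic could skip both the reset and the shell's destruction.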
#### When move isn't sufficient: perfect forwarding

`forward` must become an intrinsic:
* for vars with `ref` storage class: as-is, yields the original 
lvalue
* non-ref lvalues (NEW semantics): 're-interpret as rvalue' - no
move, and accordingly no destruction after forwarding (because
the rvalue will already have been destructed by then)
   * only valid for locals (incl. params), the destruction of 
other lvalues cannot be skipped
   * invalid/undefined to access the original lvalue after 
forwarding it (has been destructed already)
   * probably only valid:
     * as function call argument expressions (glue layer needs to 
treat it like a frontend-generated temporary, passing it directly 
by ref)
     * as assignment right-hand-sides, for move-assign (`dst = 
forward!src;` => `dst.opAssign(forward!src);`)
     * as return expressions, for move-constructions (but prefer 
NRVO if possible, for direct emplace)
* probably needs to keep template syntax (`forward!x`, not
`forward(x)`) for backwards compatibility with the druntime template
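For the move-assign bullet, this is roughly how `dst = forward!src;` behaves with the current library `forward`. It is only a sketch: `Box`, its `copies` counter and the `store` helper are invented names. With the proposed intrinsic the observable copy counts should stay the same; the difference would be the skipped move and destruction of `src` in the rvalue case.

```D
import core.lifetime : forward;

// Hypothetical payload type that counts postblit copies.
struct Box {
    int id;
    static int copies;

    this(this) { ++copies; }

    // By-value opAssign: rvalues arrive without a copy, lvalues are copied.
    void opAssign(Box rhs) { id = rhs.id; }
}

// Hypothetical helper: forwards src into the assignment.
void store()(ref Box dst, auto ref Box src) {
    dst = forward!src; // => dst.opAssign(forward!src)
}

void main() {
    Box dst;

    auto lval = Box(1);
    store(dst, lval);      // src is ref: forwarded as lvalue, copied into rhs
    assert(dst.id == 1);
    assert(lval.id == 1);  // the original is left intact
    assert(Box.copies == 1);

    store(dst, Box(2));    // src is non-ref: forwarded as rvalue, no copy
    assert(dst.id == 2);
    assert(Box.copies == 1);
}
```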

Let's take a look at an example:
```D
import core.stdc.stdio;
import core.lifetime;

struct S {
     int x;

     this(int x) {
         this.x = x;
         printf("ctor: %p\n", &this);
     }

     this(this) {
         printf("copy: %p\n", &this);
     }

     ~this() {
         printf("dtor: %p\n", &this);
     }
}

void main() {
     {
         auto lval = S(1);
         printf("lval: %p\n", &lval);
         const r = bar1(lval);
         printf("   r: %p\n", &r);
     }

     {
         printf("\nrvalue:\n");
         const r = bar1(S(2));
         printf("   r: %p\n", &r);
     }
}

S bar1()(auto ref S s) {
     printf("bar1: %p\n", &s);
     return bar2(forward!s);
}

S bar2()(auto ref S s) {
     printf("bar2: %p\n", &s);
     return bar3(forward!s);
}

S bar3()(auto ref S s) {
     printf("bar3: %p\n", &s);
     return bar4(forward!s);
}

S bar4()(auto ref S s) {
     printf("bar4: %p, got a ref: %d\n", &s, __traits(isRef, s));
     return s; // copy parameter lvalue to return value
}
```

Output with DMD (and GDC), no backend optimizations:
```
ctor: 0x7ffebea26460
lval: 0x7ffebea26460
bar1: 0x7ffebea26460
bar2: 0x7ffebea26460
bar3: 0x7ffebea26460
bar4: 0x7ffebea26460, got a ref: 1
copy: 0x7ffebea263d0
    r: 0x7ffebea26464
dtor: 0x7ffebea26464
dtor: 0x7ffebea26460

rvalue:
ctor: 0x7ffebea2647c
bar1: 0x7ffebea26488
bar2: 0x7ffebea26424
bar3: 0x7ffebea263e4
bar4: 0x7ffebea263a4, got a ref: 0
copy: 0x7ffebea26358
dtor: 0x7ffebea263a4
dtor: 0x7ffebea263e4
dtor: 0x7ffebea26424
dtor: 0x7ffebea26488
    r: 0x7ffebea26478
dtor: 0x7ffebea26478
```

What we see is that current `core.lifetime.forward` propagates 
the ref-ness of the parameter, but has to `core.lifetime.move` it 
in the non-ref case, creating 3 explicit moves + destructions.

We also see that there are compiler-implicit moves ('optimized', 
i.e., no reset+destruction of the moved-from value):
* when passing the `S(2)` rvalue to `bar1` (not sure why, seems 
like a bug) - note the different addresses of `ctor` and `bar1`
* for the return values - the addresses of `copy` and `r` diverge 
(constructed @ 0x7ffebea26358, destructed @ 0x7ffebea26478)

With LDC, we at least already get perfectly forwarded return 
values (the addresses of `copy` and `r` are identical):
```
ctor: 0x7ffda922edbc
lval: 0x7ffda922edbc
bar1: 0x7ffda922edbc
bar2: 0x7ffda922edbc
bar3: 0x7ffda922edbc
bar4: 0x7ffda922edbc, got a ref: 1
copy: 0x7ffda922edb8
    r: 0x7ffda922edb8
dtor: 0x7ffda922edb8
dtor: 0x7ffda922edbc

rvalue:
ctor: 0x7ffda922eda0
bar1: 0x7ffda922ed6c
bar2: 0x7ffda922ed1c
bar3: 0x7ffda922eccc
bar4: 0x7ffda922ecc8, got a ref: 0
copy: 0x7ffda922eda4
dtor: 0x7ffda922ecc8
dtor: 0x7ffda922eccc
dtor: 0x7ffda922ed1c
dtor: 0x7ffda922ed6c
    r: 0x7ffda922eda4
dtor: 0x7ffda922eda4
```

The compiler needs to implement RVO (Return Value Optimization,
not to be confused with Named RVO, i.e. NRVO) to enable perfect
forwarding of the return values. In this example, `r` is allocated
in `main`, then its address is passed and forwarded as a hidden
pointer all the way to `bar4`, where it gets copy-constructed.

With the proposed `forward` semantics, we'd get perfect 
forwarding of the `s` parameters too, without the 3 explicit 
moves and destructions. The `S(2)` rvalue would be created in 
`main`, then passed and forwarded directly by ref all the way to 
`bar4`, where it would get destructed when the `s` param goes out 
of scope.
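The overhead in question can be counted in a stripped-down version of the trampolines. This is a sketch against the current library `forward`; `Tracked`, `shellDtors` and the `hopN`/`sink` functions are made-up names. My expectation (an assumption, nothing implements it yet) is that the proposed semantics would bring the shell count for the rvalue call down to 0.

```D
import core.lifetime : forward;

// Hypothetical tracked type; id == 0 is Tracked.init, i.e. a moved-from shell.
struct Tracked {
    int id;
    static int shellDtors; // destructions of moved-from shells

    ~this() {
        if (id == 0)
            ++shellDtors;
    }
}

void hop1()(auto ref Tracked t) { hop2(forward!t); }
void hop2()(auto ref Tracked t) { hop3(forward!t); }
void hop3()(auto ref Tracked t) { sink(forward!t); }
void sink()(auto ref Tracked t) { assert(t.id != 0); } // payload arrived intact

void main() {
    // rvalue: each hop moves its parameter on and later destructs the shell
    hop1(Tracked(7));
    assert(Tracked.shellDtors == 3);

    // lvalue: forwarded by ref all the way, nothing is moved or reset
    Tracked.shellDtors = 0;
    auto lval = Tracked(8);
    hop1(lval);
    assert(Tracked.shellDtors == 0);
    assert(lval.id == 8);
}
```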

#### Cherry on top: Last-use optimization from DIP 1040

This would make the compiler automatically `forward` suitable
lvalues. In the example, we wouldn't have to use a single
explicit `forward` in the `barN` trampolines, *and* the 
copy-construction of the return value in the non-ref version of 
`bar4` would be optimized to a move-construction (`return 
forward!s`).
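Until something like that exists, the `bar4` part has to be spelled out by hand. A small sketch of the difference, using the current library `forward` and invented names (`Val`, `byCopy`, `byForward`); the copy counts are what I'd expect from the compilers discussed above:

```D
import core.lifetime : forward;

// Hypothetical type that counts postblit copies.
struct Val {
    int x;
    static int copies;

    this(this) { ++copies; }
}

// What bar4 does today: copies the parameter lvalue into the return value.
Val byCopy()(auto ref Val s) { return s; }

// What last-use optimization would generate implicitly for the non-ref case.
Val byForward()(auto ref Val s) { return forward!s; }

void main() {
    auto r1 = byCopy(Val(1));
    assert(r1.x == 1);
    assert(Val.copies == 1); // one copy-construction of the return value

    auto r2 = byForward(Val(2));
    assert(r2.x == 2);
    assert(Val.copies == 1); // moved instead, no additional copy
}
```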