move+forward as intrinsics, incl. revised forward semantics for perfect forwarding
Timon Gehr
timon.gehr at gmx.ch
Sun Oct 13 21:26:24 UTC 2024
On 10/13/24 12:43, kinke wrote:
> IMO we need to make `core.lifetime.{move,forward}` compiler intrinsics,
> to enable further optimizations that aren't possible with a library
> solution.
> ...
Thanks for writing this up! I think this is a good starting point, but I
would make some small tweaks.
> #### Move
>
> * semantics: move an lvalue to a new rvalue, at a new memory address,
> 'hijacking' the lvalue resources; the lvalue is reset to T.init (blit,
> not assignment!) afterwards
Makes sense, though if the compiler can determine that something is a
last use, it can optimize out the address change.
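For reference, here is a minimal sketch of what the current library `core.lifetime.move` does today. The `Handle` type is made up for illustration. The payload is blitted into a new value, and because the type has a destructor, the source is reset to `Handle.init`:

```d
import core.lifetime : move;

// Hypothetical resource wrapper, only for illustration.
struct Handle {
    int* p;
    ~this() nothrow @nogc {} // having a destructor makes move() reset the source
}

void main() {
    int x = 42;
    auto a = Handle(&x);
    auto b = move(a);    // blit a's payload into b, then reset a to Handle.init
    assert(b.p is &x);   // b has taken over the resource
    assert(a.p is null); // a is back to Handle.init
}
```

If `a` is a local that is never used again, nothing here needs `b` to live at a different address, which is the case I would like to see optimized.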
> * will be complete with move ctor; syntax needs to be decided, but
> signature is `(ref T)` (yes, must be an explicit ref)
I can see either idea working here. What is most important is that it is in
fact treated as a constructor.
I guess the benefit of `this(S)` is uniformity with `this(ref S)`, and
the benefit of `=this(ref S)` or `opMove(ref S)` is that it is obvious
that the destructor will be called by the caller, potentially much later.
> * allows to opt out of the default blit (memcpy struct payload),
> e.g., to fix up interior pointers
> * move ctor interop with C++ should be doable (just getting the
> extern(C++) mangle right)
> * problem: handle/avoid all compiler-implicit moves/blits (would have
> to call move ctor and dtor now; emplace FTW!)
> * would be nice as intrinsic:
>   * not to have to import `core.lifetime` everywhere and end up with
>     complicated template bloat for a basically trivial operation
>   * potential optimization: elide lvalue reset to T.init and its
>     destruction iff:
>     * it is a local (can skip destruction)
>     * and not used after the move
>     * and the destruction of T.init is a noop (modulo mods to the
>       struct's own payload), so its elision not observable
> ...
Well, as I alluded to earlier, I think in such cases the object should
just keep its original address and the move constructor does not need to
be called at all. It reduces to a safe version of `__rvalue` in this case.
> #### When move isn't sufficient: perfect forwarding
>
> forward must become an intrinsic:
> * for vars with `ref` storage class: as-is, yields the original lvalue
> * non-ref lvalues (NEW semantics): 're-interpret as rvalue' - no move,
> and accordingly no destruction after forwarding (because the rvalue will
> already be destructed earlier)
>   * only valid for locals (incl. params), the destruction of other
>     lvalues cannot be skipped
>   * invalid/undefined to access the original lvalue after forwarding it
>     (has been destructed already)
I think it would be better to do a `move`, where the `move` will usually
be optimized to a safe `__rvalue` as above. I think unsafe `__rvalue`
should be possible, but not `@safe`.
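To make the starting point concrete, here is a small sketch of how the existing library `forward` behaves. The names `sink` and `relay` are made up. A `ref` parameter is passed on as the same lvalue, while a non-ref parameter is passed on via `move`, so no postblit runs in either case:

```d
import core.lifetime : forward;

struct S {
    static int copies; // counts postblit calls
    int x;
    this(this) { ++copies; }
    ~this() {}
}

// Reports whether the argument arrived by reference.
bool sink()(auto ref S s) { return __traits(isRef, s); }

// Trampoline in the style of the bar1..bar3 functions from the example.
bool relay()(auto ref S s) { return sink(forward!s); }

void main() {
    auto lval = S(1);
    assert(relay(lval));   // lvalue: ref-ness is propagated
    assert(!relay(S(2)));  // rvalue: arrives by value, via move
    assert(S.copies == 0); // no copy on either path
}
```

The non-ref path is the one I would keep as a `move` semantically, and let the compiler turn it into a safe `__rvalue` where it can.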
>   * probably only valid:
>     * as function call argument expressions (glue layer needs to treat
>       it like a frontend-generated temporary, passing it directly by ref)
>     * as assignment right-hand-sides, for move-assign
>       (`dst = forward!src;` => `dst.opAssign(forward!src);`)
>     * as return expressions, for move-constructions (but prefer NRVO if
>       possible, for direct emplace)
> * probably needs to keep template syntax (`forward!x`, not `forward(x)`)
>   for backwards compatibility with druntime template
>
> Let's take a look at an example:
> ```D
> import core.stdc.stdio;
> import core.lifetime;
>
> struct S {
>     int x;
>
>     this(int x) {
>         this.x = x;
>         printf("ctor: %p\n", &this);
>     }
>
>     this(this) {
>         printf("copy: %p\n", &this);
>     }
>
>     ~this() {
>         printf("dtor: %p\n", &this);
>     }
> }
>
> void main() {
>     {
>         auto lval = S(1);
>         printf("lval: %p\n", &lval);
>         const r = bar1(lval);
>         printf(" r: %p\n", &r);
>     }
>
>     {
>         printf("\nrvalue:\n");
>         const r = bar1(S(2));
>         printf(" r: %p\n", &r);
>     }
> }
>
> S bar1()(auto ref S s) {
>     printf("bar1: %p\n", &s);
>     return bar2(forward!s);
> }
>
> S bar2()(auto ref S s) {
>     printf("bar2: %p\n", &s);
>     return bar3(forward!s);
> }
>
> S bar3()(auto ref S s) {
>     printf("bar3: %p\n", &s);
>     return bar4(forward!s);
> }
>
> S bar4()(auto ref S s) {
>     printf("bar4: %p, got a ref: %d\n", &s, __traits(isRef, s));
>     return s; // copy parameter lvalue to return value
> }
> ```
>
> Output with DMD (and GDC), no backend optimizations:
> ```
> ctor: 0x7ffebea26460
> lval: 0x7ffebea26460
> bar1: 0x7ffebea26460
> bar2: 0x7ffebea26460
> bar3: 0x7ffebea26460
> bar4: 0x7ffebea26460, got a ref: 1
> copy: 0x7ffebea263d0
> r: 0x7ffebea26464
> dtor: 0x7ffebea26464
> dtor: 0x7ffebea26460
>
> rvalue:
> ctor: 0x7ffebea2647c
> bar1: 0x7ffebea26488
> bar2: 0x7ffebea26424
> bar3: 0x7ffebea263e4
> bar4: 0x7ffebea263a4, got a ref: 0
> copy: 0x7ffebea26358
> dtor: 0x7ffebea263a4
> dtor: 0x7ffebea263e4
> dtor: 0x7ffebea26424
> dtor: 0x7ffebea26488
> r: 0x7ffebea26478
> dtor: 0x7ffebea26478
> ```
>
> What we see is that current `core.lifetime.forward` propagates the ref-
> ness of the parameter, but has to `core.lifetime.move` it in the non-ref
> case, creating 3 explicit moves + destructions.
>
> We also see that there are compiler-implicit moves ('optimized', i.e.,
> no reset+destruction of the moved-from value):
> * when passing the `S(2)` rvalue to `bar1` (not sure why, seems like a
> bug) - note the different addresses of `ctor` and `bar1`
> * for the return values - the addresses of `copy` and `r` diverge
> (constructed @ 0x7ffebea26358, destructed @ 0x7ffebea26478)
>
> With LDC, we at least already get perfectly forwarded return values (the
> addresses of `copy` and `r` are identical):
> ```
> ctor: 0x7ffda922edbc
> lval: 0x7ffda922edbc
> bar1: 0x7ffda922edbc
> bar2: 0x7ffda922edbc
> bar3: 0x7ffda922edbc
> bar4: 0x7ffda922edbc, got a ref: 1
> copy: 0x7ffda922edb8
> r: 0x7ffda922edb8
> dtor: 0x7ffda922edb8
> dtor: 0x7ffda922edbc
>
> rvalue:
> ctor: 0x7ffda922eda0
> bar1: 0x7ffda922ed6c
> bar2: 0x7ffda922ed1c
> bar3: 0x7ffda922eccc
> bar4: 0x7ffda922ecc8, got a ref: 0
> copy: 0x7ffda922eda4
> dtor: 0x7ffda922ecc8
> dtor: 0x7ffda922eccc
> dtor: 0x7ffda922ed1c
> dtor: 0x7ffda922ed6c
> r: 0x7ffda922eda4
> dtor: 0x7ffda922eda4
> ```
>
> The compiler needs to implement RVO (Return Value Optimization,
> different to Named-RVO!) to enable perfect forwarding of the return
> values. In this example, `r` is allocated in `main`, then its address
> passed and forwarded as hidden pointer all the way to `bar4`, where it
> gets copy-constructed.
>
> With the proposed `forward` semantics, we'd get perfect forwarding of
> the `s` parameters too, without the 3 explicit moves and destructions.
> The `S(2)` rvalue would be created in `main`, then passed and forwarded
> directly by ref all the way to `bar4`, where it would get destructed
> when the `s` param goes out of scope.
>
> #### Cherry on top: Last-use optimization from DIP 1040
>
> This would make the compiler automatically `forward` suited lvalues. In
> the example, we wouldn't have to use a single explicit `forward` in the
> `barN` trampolines, *and* the copy-construction of the return value in
> the non-ref version of `bar4` would be optimized to a move-construction
> (`return forward!s`).
Sounds good, but I think simple cases like this one should be a
priority. Even if there is no data-flow analysis as advanced as the one
proposed in DIP1040, I think it is important that there is no copy in
`bar4`.
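As a rough sketch of the behavior I mean, this is what one has to write by hand today to get rid of that copy. The function `last` is a made-up stand-in for `bar4`. With `return forward!s`, the rvalue case moves the parameter into the return value, and only the by-ref case still copies, as it must:

```d
import core.lifetime : forward;

struct S {
    static int copies; // counts postblit calls
    int x;
    this(this) { ++copies; }
}

// Stand-in for bar4, with the explicit forward on the return.
S last()(auto ref S s) { return forward!s; }

void main() {
    auto r1 = last(S(2));  // rvalue: parameter is moved into the result
    assert(S.copies == 0);
    assert(r1.x == 2);

    auto lval = S(1);
    auto r2 = last(lval);  // lvalue: passed by ref, so returning it must copy
    assert(S.copies == 1);
    assert(r2.x == 1 && lval.x == 1);
}
```

Ideally the compiler would do this for a plain `return s;` without any annotation.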