What ever happened to move semantics?

Wed Feb 28 14:17:44 UTC 2024

On 2/28/24 02:06, Walter Bright wrote:
> On 2/27/2024 4:42 PM, Timon Gehr wrote:
>> FWIW I have been pushing this a couple times at the DLF meetings, but 
>> in the end somebody will have to put in the work to implement it in 
>> the compiler and I cannot spend the time required for that atm.
>>
>> The move hole is also an issue for tuple unpacking though.
> 
> Reviewing the DIP would be a big help if that can work for you.
> 
> https://github.com/dlang/DIPs/blob/master/DIPs/DIP1040.md

Sure! A lot of good stuff in there. Here's my review.

Points 1 to 15 respond to the DIP contents. The main issue I see is the 
way move construction and assignment are declared by special-casing 
existing syntax that already means something else _and changing its 
observable behavior_. To fix this, I think there should be a separate
syntax for suppressing the destructor call.
Furthermore, partial moving in general does not work as specified in the
DIP: it bypasses the destructor of the enclosing struct without the
participation of that struct.

Points 16 to 18 cover things that are missing from the DIP. The main
issue I see here is that destructuring is missing from the DIP. This is 
crucial in order to be able to transform data from one type into data 
from another type while using only moves and no copies or destruction.

1. Regarding last use:

> ```d
> S s;
> f(s); // copy
> f(s); // copy
> f(s); // move
> ```

It would be useful to show examples with dynamic control flow (edit: I 
see some examples occur later too), such as:

```d
S s;
foreach(i;0..3){
     f(s); // ?
}
```

Only the call in the final iteration is a last use of `s`. I assume the
line marked "?" will always copy? Maybe it would be better to allow
implementation-defined copy elision (also see 11.).

```d
S s;
f(s); // copy
f(s); // ?
if(uniform(0,2))
     return;
f(s); // move
```

On the path that takes the early `return`, the line marked "?" is the
last use of `s`. I assume it will still always copy? Maybe it would be
better to allow implementation-defined copy elision (also see 11.).

2. Regarding Existing State in D:

- It would make sense to elaborate on `@disable`d copy constructors. 
This is similar to not implementing the `Copy` trait in Rust. The 
resulting values can only be moved.

- In D, you can also have a `private` destructor. As far as I can tell, 
this is currently useless, but with move semantics this can be used to 
enforce explicit destruction via move, which is a nice way to design a 
library interface.
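To illustrate the first bullet, here is a minimal sketch in today's D (runnable, for example, with `dmd -unittest -main -run sketch.d`). Disabling the copy constructor produces a type whose values can only be moved, here via druntime's `core.lifetime.move`. `Unique` and `consume` are hypothetical names; the sketch shows current language behavior, not anything the DIP specifies.

```d
import core.lifetime : move;

// A move-only value: with the copy constructor disabled, copying an lvalue
// is a compile-time error, much like a Rust type that does not implement `Copy`.
struct Unique
{
    int* payload;

    this(int value) { payload = new int(value); }

    @disable this(ref Unique);
}

// Takes its argument by value, so callers must hand over an rvalue.
int consume(Unique u)
{
    return *u.payload;
}

unittest
{
    auto u = Unique(42);
    static assert(!__traits(compiles, consume(u))); // would require a copy
    assert(consume(move(u)) == 42);                 // an explicit move is fine
    assert(consume(Unique(7)) == 7);                // rvalues are moved implicitly
}
```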

3. Regarding declaration syntax of Move Constructors and Move Assignment 
Operators

I would strongly recommend using a distinct syntax for suppressing
destruction of the argument. I will argue here specifically for the case 
of Move Constructors, but Move Assignment operators have exactly the 
same issue.

> 
> A Move Constructor is a struct member constructor that moves, rather than copies, the argument corresponding to its first parameter into the object to be constructed. The argument is invalid after this move, and is not destructed.
> 
> A Move Constructor for struct S is declared as:
> 
> ```d
> this(S s) { ... }
> ```

This is a breaking language change.

Also, consider

```d
struct S{
     ...
     this(T)(T t){ ... }
     ...
}
```

This constructor will be a move constructor iff T=S. Therefore, the fact
that the destructor is not called on the argument in some cases may be
very surprising to programmers.

A similar example is this one:

```d
struct S{
     ...
     this(T...)(S s, T args){ ... }
     ...
}
```

Here, the constructor is a move constructor iff no additional `args` are 
passed.

Overall, the proposed syntax introduces a surprising special case.

Also, what is the syntax for a copy constructor?
Would it be `this(ref S s){ ... }` ?

4. Regarding `nothrow` on Move Constructors and Move Assignment Operators.

> The Move Constructor is always nothrow, even if nothrow is not explicitly specified. A Move Constructor that throws is illegal.

This special case should be motivated in the DIP. I assume the 
motivation is that because the argument is not destructed, throwing is 
particularly error-prone here.

In general, I would advise against built-in requirements on specified 
attributes unless absolutely necessary.

5. Regarding Default Move Constructor

> If a Move Constructor is not defined for a struct that has a Move Constructor in one or more of its fields, a default one is defined, and fields without a Move Constructor are moved using a bit copy.

This is missing a specification of what the default move constructor 
does. (I assume it is implemented as a move for each field, in lexical 
order, where fields without a Move Constructor are moved using a bit copy.)

6. Regarding Default Move Constructor and Default Move Assignment Operator.

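> If a Move Constructor is not defined for a struct that has a Move Assignment Operator, a default Move Constructor is defined and implemented as a move for each of its fields, in lexical order.

This generated move constructor will often do the wrong thing: if a
struct defines its own Move Assignment Operator, a plain field-wise move
is presumably not sufficient for it, and the generated Move Constructor
silently bypasses that custom logic.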

A correct way to do it would be to default-initialize a new instance and 
then call the Move Assignment Operator on it.

It is also worth considering whether, instead, a Move Constructor should
simply be required to be defined explicitly in any struct that has an
explicit Move Assignment Operator.

> If a Move Assignment Operator is not defined for a struct that has a Move Assignment Operator in one or more of its fields, a default Move Assignment Operator is defined, and fields without a Move Assignment Operator are moved using a bit copy.
> 
> If a Move Assignment Operator is not defined for a struct that has a Move Constructor, a default Move Assignment Operator is defined and implemented as a move for each of its fields, in lexical order.

This generated move assignment operator will usually do the wrong thing,
for the same reason: it bypasses whatever custom logic the explicit Move
Constructor implements.

A correct but inefficient way to do it would be to destroy the current 
object and reconstruct it using the Move Constructor.

It is also worth considering whether, instead, a Move Assignment Operator
should simply be required to be defined explicitly in any struct that
has an explicit Move Constructor.
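The destroy-and-reconstruct fallback suggested above can be sketched with existing druntime primitives, under the assumption that a bitwise `core.lifetime.moveEmplace` is an acceptable stand-in for calling the struct's Move Constructor. `moveAssign` and `Res` are hypothetical names used only for illustration; this is not DIP syntax. It can be run with, for example, `dmd -unittest -main -run sketch.d`.

```d
import core.lifetime : moveEmplace;

// Hypothetical fallback for a generated Move Assignment Operator:
// destroy the target's current value, then move-construct into it.
void moveAssign(T)(ref T target, ref T source)
{
    destroy(target);             // runs the target's destructor, resets it to T.init
    moveEmplace(source, target); // bitwise move; source is reset to T.init
}

struct Res
{
    static int destructions;
    int id;
    ~this() { if (id != 0) ++destructions; } // count only "live" values
}

unittest
{
    Res.destructions = 0;
    auto target = Res(1);
    auto source = Res(2);
    moveAssign(target, source);
    assert(target.id == 2);
    assert(source.id == 0);        // moved-from source is left as Res.init
    assert(Res.destructions == 1); // only the old target value was destroyed
}
```

The inefficiency is visible in the sketch: the target is fully torn down and rebuilt even when a custom operator could reuse its resources.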

7. Regarding EMO

> An EMO is a struct that has both a Move Constructor and a Move Assignment Operator. An EMO defaults to exhibiting move behavior when passed and returned from functions rather than the copy behavior of non-EMO objects.

This definition is not self-contained and should therefore refer to the 
discussion further below for clarification.

8. Regarding Move Ref

> A Move Ref is a parameter that is a reference to an EMO. (The ref is not used.)

For small structs, the additional indirection from the implicit
reference will introduce overhead compared to passing them by value.

9. Regarding NRVO of EMO objects

> If NRVO cannot be performed, s is copied to the return value on the caller's stack.

This is surprising to me. I would have expected `s` to be moved to the 
return value on the caller's stack instead.

10. Regarding Returning an EMO by Move Ref

This is too cute, because it changes the meaning of `return` in one 
specific special case. Consider:

```d
struct S{
     int* ptr;
     this(S s){ this.ptr=s.ptr; }
     void opAssign(S s){ this.ptr=s.ptr; }
}
S func(return S s){
     return S(s);
}
```

The `return` annotation is needed because the pointer again appears in 
the return value. Note that this is a simplified example, but we could 
think of similar ones with multiple involved pointers that need to be 
permuted (though I don't know how to implement that without 
destructuring or destruction).

11. Regarding Copy Elision

Maybe it would be better to specify explicitly that an implementation is 
allowed to optimize the pattern:

```d
auto s = t; // (copy)
... // arbitrary code not referring to `t`
destroy(t);
```

to:

```d
auto s = move(t);
```
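The two patterns above are only distinguishable when copying has observable effects. A minimal sketch in current D, using druntime's `core.lifetime.move` as a stand-in for the DIP's move and a hypothetical `Counted` type that counts copy-constructor calls (`destroy` is druntime's `object.destroy`, which also resets `t` to `Counted.init`):

```d
import core.lifetime : move;

// Counts copy-constructor calls so the difference between the patterns is visible.
struct Counted
{
    static int copies;
    int value;

    this(int v) { value = v; }
    this(ref Counted other) { value = other.value; ++copies; }
    ~this() {}
}

// The original pattern: copy, then destroy the source after its last use.
int copiesWhenCopying()
{
    Counted.copies = 0;
    auto t = Counted(1);
    auto s = t;  // copy constructor runs
    destroy(t);  // last use of t
    return Counted.copies;
}

// The rewritten pattern: a bitwise move, so the copy constructor never runs.
int copiesWhenMoving()
{
    Counted.copies = 0;
    auto t = Counted(1);
    auto s = move(t);
    return Counted.copies;
}

unittest
{
    assert(copiesWhenCopying() == 1);
    assert(copiesWhenMoving() == 0);
}
```

Because the copy constructor may have side effects, allowing this rewrite is a semantic choice that the spec would have to permit explicitly.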

12. Regarding lifetimes.

You make a point about nested functions and lambdas. However, this is 
not the only problem. Consider:

```d
struct S{
     int x;
}
int foo()@safe{
     S s;
     scope p = &s.x;
     bar(s); // last use of s, moved
     return *p; // bad memory access
}
```

13. Regarding partial move.

> Therefore, the generalized rule is that an access to an EMO field of an aggregate will be moved only if that is the last access of the containing variable.

This does not work. You cannot elide the entire destructor of `S` based 
on moving a single field of `S`.
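For comparison, this is how a field move behaves with today's library `core.lifetime.move`. The field is left in its `.init` state and the enclosing struct's destructor still runs, so the enclosing struct still releases whatever else it owns. This is a minimal sketch with hypothetical `Inner`/`Outer` types showing one sound alternative, not what the DIP specifies. It can be run with, for example, `dmd -unittest -main -run sketch.d`.

```d
import core.lifetime : move;

struct Inner
{
    int id;
    ~this() {}
}

struct Outer
{
    static int releases;
    Inner inner;
    bool ownsOther = true; // stands in for another resource Outer must release

    ~this() { if (ownsOther) ++releases; }
}

int releasesAfterFieldMove()
{
    Outer.releases = 0;
    {
        auto o = Outer(Inner(5));
        auto i = move(o.inner); // o.inner is reset to Inner.init
        assert(i.id == 5 && o.inner.id == 0);
    } // Outer's destructor still runs here
    return Outer.releases;
}

unittest
{
    assert(releasesAfterFieldMove() == 1);
}
```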

14. Regarding Destruction

This is a bit inconsistent with what was presented earlier. I agree that 
implementation-defined copy elision is probably a good idea (see 11.).

15. Regarding C++ interop.

I do not see anything obviously wrong, except that the requirement to 
opt out of rvalue references seems error-prone. I think Manu has more
expertise here.

Also, it would be good to specify `@value` as a standalone thing in the 
DIP, as it may be useful beyond C++ interop (also see point 8.).

What is missing from the DIP?

16. Missing: Redeclaration after Move

```d
S s, t;
func(s); // moved, `s` no longer accessible
S s = t; // explicit construction via redeclaration
```

A nice feature of this is that the type of a variable can be changed on 
redeclaration. Note that Rust allows this.

17. Missing: Destructuring

This is partially attempted in the DIP via partial move (which does not 
work).

However, there must be a way to implement the following:

```d
struct U(T...){
     T fields;
}

struct S(T...){
     T fields;

     @disable ~this();
     ... // need support from S to bypass destructor
}

// fields of resulting U must be moved from the fields of S
U!T fromS(T...)(S!T s){ ... }
```

18. Missing: Moving the receiver

```d
struct S{
     T foo()@rvalue{ ... }
     @disable ~this();
}

void main(){
     S s;
     auto t=s.foo(); // last use of s
}
```