What ever happened to move semantics?
Timon Gehr
timon.gehr at gmx.ch
Wed Feb 28 14:17:44 UTC 2024
On 2/28/24 02:06, Walter Bright wrote:
> On 2/27/2024 4:42 PM, Timon Gehr wrote:
>> FWIW I have been pushing this a couple times at the DLF meetings, but
>> in the end somebody will have to put in the work to implement it in
>> the compiler and I cannot spend the time required for that atm.
>>
>> The move hole is also an issue for tuple unpacking though.
>
> Reviewing the DIP would be a big help if that can work for you.
>
> https://github.com/dlang/DIPs/blob/master/DIPs/DIP1040.md
Sure! A lot of good stuff in there. Here's my review.
Points 1 to 15 respond to the DIP contents. The main issue I see is the
way move construction and assignment are declared by special-casing
existing syntax that already means something else _and changing its
observable behavior_. To fix this, I think there should be separate
syntax for suppressing the destructor call.
Furthermore, partial moving in general does not work the way it is
specified in the DIP: it bypasses the destructor of the enclosing struct
without participation of that struct.
Points 16 to 18 point out things that are missing from the DIP. The main
issue I see here is that destructuring is missing from the DIP. This is
crucial in order to be able to transform data from one type into data
from another type while using only moves and no copies or destruction.
1. Regarding last use:
> ```d
> S s;
> f(s); // copy
> f(s); // copy
> f(s); // move
> ```
It would be useful to show examples with dynamic control flow (edit: I
see some examples occur later too), such as:
```d
S s;
foreach(i;0..3){
    f(s); // ?
}
```
I assume the line marked "?" will always copy? Maybe it would be better
to allow implementation-defined copy elision (also see 11.).
```d
S s;
f(s); // copy
f(s); // ?
if(uniform(0,2))
    return;
f(s); // move
```
I assume the line marked "?" will always copy? Maybe it would be better
to allow implementation-defined copy elision (also see 11.).
2. Regarding Existing State in D:
- It would make sense to elaborate on `@disable`d copy constructors.
This is similar to not implementing the `Copy` trait in Rust. The
resulting values can only be moved.
- In D, you can also have a `private` destructor. As far as I can tell,
this is currently useless, but with move semantics this can be used to
enforce explicit destruction via move, which is a nice way to design a
library interface.
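As a minimal sketch of the move-only case in current D (without the DIP), the following uses the postblit form `@disable this(this)` and `core.lifetime.move`. The names `Handle` and `consume` are hypothetical and only serve the illustration.

```d
import core.lifetime : move;

// Hypothetical resource wrapper, for illustration only.
struct Handle {
    int fd = -1;
    @disable this(this); // copying is rejected at compile time
}

// Takes ownership of the handle by value.
int consume(Handle h) { return h.fd; }

void main() {
    auto h = Handle(3);
    // Passing the lvalue would require a copy, so it does not compile.
    static assert(!__traits(compiles, consume(h)));
    // An explicit move hands the value over without copying.
    assert(consume(move(h)) == 3);
}
```

With last-use moves as proposed in the DIP, the explicit `move` call would presumably become unnecessary when `h` is not used afterwards.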
3. Regarding declaration syntax of Move Constructors and Move Assignment
Operators
I would highly recommend using a distinct syntax for suppressing
destruction of the argument. I will argue here specifically for the case
of Move Constructors, but Move Assignment Operators have exactly the
same issue.
>
> A Move Constructor is a struct member constructor that moves, rather than copies, the argument corresponding to its first parameter into the object to be constructed. The argument is invalid after this move, and is not destructed.
>
> A Move Constructor for struct S is declared as:
>
> ```d
> this(S s) { ... }
> ```
This is a breaking language change.
Also, consider
```d
struct S{
    ...
    this(T)(T t){ ... }
    ...
}
```
This constructor will be a move constructor iff T=S. That the destructor
is not called on the argument in exactly this one instantiation may be
very surprising to programmers.
A similar example is this one
```d
struct S{
    ...
    this(T...)(S s, T args){ ... }
    ...
}
```
Here, the constructor is a move constructor iff no additional `args` are
passed.
Overall, the proposed syntax introduces a surprising special case.
Also, what is the syntax for a copy constructor?
Would it be `this(ref S s){ ... }` ?
4. Regarding `nothrow` on Move Constructors and Move Assignment Operators.
> The Move Constructor is always nothrow, even if nothrow is not explicitly specified. A Move Constructor that throws is illegal.
This special case should be motivated in the DIP. I assume the
motivation is that because the argument is not destructed, throwing is
particularly error-prone here.
In general, I would advise against built-in requirements on specified
attributes unless absolutely necessary.
5. Regarding Default Move Constructor
> If a Move Constructor is not defined for a struct that has a Move Constructor in one or more of its fields, a default one is defined, and fields without a Move Constructor are moved using a bit copy.
This is missing a specification of what the default move constructor
does. (I assume it is implemented as a move for each field, in lexical
order, where fields without a Move Constructor are moved using a bit copy.)
6. Regarding Default Move Constructor and Default Move Assignment Operator.
> If a Move Constructor is not defined for a struct that has a Move Assignment Operator, a default Move Constructor is defined and implemented as a move for each of its fields, in lexical order.
>
This generated move constructor will often do the wrong thing.
A correct way to do it would be to default-initialize a new instance and
then call the Move Assignment Operator on it.
It is also worth considering whether a Move Constructor should instead
simply be required to be defined explicitly in any struct that has an
explicit Move Assignment Operator defined.
> If a Move Assignment Operator is not defined for a struct that has a Move Assignment Operator in one or more of its fields, a default Move Assignment Operator is defined, and fields without a Move Assignment Operator are moved using a bit copy.
>
> If a Move Assignment Operator is not defined for a struct that has a Move Constructor, a default Move Assignment Operator is defined and implemented as a move for each of its fields, in lexical order.
This generated move assignment operator will usually do the wrong thing.
A correct but inefficient way to do it would be to destroy the current
object and reconstruct it using the Move Constructor.
It is also worth considering whether a Move Assignment Operator should
instead simply be required to be defined explicitly in any struct that
has an explicit Move Constructor defined.
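The "destroy, then reconstruct" fallback for move assignment described in point 6 can be sketched in current D roughly as follows. This is only an approximation: `Buf` and `moveAssign` are hypothetical names, and the bitwise `core.lifetime.moveEmplace` stands in for the Move Constructor that the DIP would provide.

```d
import core.lifetime : moveEmplace;

// Hypothetical type that tracks how many non-empty instances exist.
struct Buf {
    static int live;   // number of live, non-empty instances
    int id;            // 0 marks the empty (.init) state
    this(int id) { this.id = id; ++live; }
    ~this() { if (id != 0) --live; }
    @disable this(this);
}

// Move assignment built from destruction plus move construction.
void moveAssign(ref Buf target, ref Buf source) {
    destroy(target);              // run target's destructor, reset it to Buf.init
    moveEmplace(source, target);  // bitwise move; source is reset to Buf.init
}

void main() {
    const base = Buf.live;
    auto a = Buf(1);
    auto b = Buf(2);
    assert(Buf.live == base + 2);
    moveAssign(a, b);
    assert(a.id == 2 && b.id == 0);
    assert(Buf.live == base + 1); // the old contents of `a` were destroyed
}
```

The inefficiency is visible here: the old target is always fully destroyed, even when a hand-written Move Assignment Operator could reuse its resources.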
7. Regarding EMO
> An EMO is a struct that has both a Move Constructor and a Move Assignment Operator. An EMO defaults to exhibiting move behavior when passed and returned from functions rather than the copy behavior of non-EMO objects.
This definition is not self-contained and should therefore refer to the
discussion further below for clarification.
8. Regarding Move Ref
> A Move Ref is a parameter that is a reference to an EMO. (The ref is not used.)
For small structs, the additional indirection from the implicit
reference will introduce overhead.
9. Regarding NRVO of EMO objects
> If NRVO cannot be performed, s is copied to the return value on the caller's stack.
This is surprising to me. I would have expected `s` to be moved to the
return value on the caller's stack instead.
10. Regarding Returning an EMO by Move Ref
This is too cute, because it changes the meaning of `return` in one
specific special case. Consider:
```d
struct S{
    int* ptr;
    this(S s){ this.ptr=s.ptr; }
    void opAssign(S s){ this.ptr=s.ptr; }
}
S func(return S s){
    return S(s);
}
```
The `return` annotation is needed because the pointer again appears in
the return value. Note that this is a simplified example, but we could
think of similar ones with multiple involved pointers that need to be
permuted (though I don't know how to implement that without
destructuring or destruction).
11. Regarding Copy Elision
Maybe it would be better to specify explicitly that an implementation is
allowed to optimize the pattern:
```d
auto s = t; // (copy)
... // arbitrary code not referring to `t`
destroy(t);
```
to:
```d
auto s = move(t);
```
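To make the saving concrete, here is a small sketch in current D that counts postblit calls for the copy and for the explicit move. `Counted` is a hypothetical type, and `core.lifetime.move` is used in place of whatever the implementation would do internally.

```d
import core.lifetime : move;

// Hypothetical type that counts how often it is copied.
struct Counted {
    static int copies;
    int payload;
    this(this) { ++copies; } // postblit runs on every copy
}

void main() {
    const before = Counted.copies;

    auto t = Counted(42);
    auto s1 = t;                 // copy: the postblit runs once
    assert(Counted.copies == before + 1);

    auto s2 = move(t);           // move: no postblit
    assert(Counted.copies == before + 1);
    assert(s2.payload == 42);
}
```

The proposed rule would let an implementation turn the first form into the second whenever `t` is destroyed without being referenced in between.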
12. Regarding lifetimes.
You make a point about nested functions and lambdas. However, this is
not the only problem. Consider:
```d
struct S{
    int x;
}
int foo()@safe{
    S s;
    scope p = &s.x;
    bar(s); // last use of s, moved
    return *p; // bad memory access
}
```
13. Regarding partial move.
> Therefore, the generalized rule is that an access to an EMO field of an aggregate will be moved only if that is the last access of the containing variable.
This does not work. You cannot elide the entire destructor of `S` based
on moving a single field of `S`.
14. Regarding Destruction
This is a bit inconsistent with what was presented earlier. I agree that
implementation-defined copy elision is probably a good idea (see 11.).
15. Regarding C++ interop.
I do not see anything obviously wrong, except that the requirement to
opt out of rvalue references seems error-prone. I think Manu has more
expertise here.
Also, it would be good to specify `@value` as a standalone thing in the
DIP, as it may be useful beyond C++ interop (also see point 8.).
What is missing from the DIP?
16. Missing: Redeclaration after Move
```d
S s, t;
func(s); // moved, `s` no longer accessible
S s = t; // explicit construction via redeclaration
```
A nice feature of this is that the type of a variable can be changed on
redeclaration. Note that Rust allows this.
17. Missing: Destructuring
This is partially attempted in the DIP via partial move (which does not
work).
However, there must be a way to implement the following:
```d
struct U(T...){
    T fields;
}
struct S(T...){
    T fields;
    @disable ~this();
    ... // need support from S to bypass destructor
}
// fields of resulting U must be moved from the fields of S
U fromS(S s){ ... }
```
18. Missing: Moving the receiver
```d
struct S{
    T foo()@rvalue{ ... }
    @disable ~this();
}

void main(){
    S s;
    auto t=s.foo(); // last use of s
}
```