__rvalue and Move Semantics first draft
Quirin Schroll
qs.il.paperinik at gmail.com
Thu Jan 16 02:53:51 UTC 2025
On Saturday, 9 November 2024 at 09:33:24 UTC, Walter Bright wrote:
> https://github.com/WalterBright/documents/blob/5dbf6728d7d0ae46a411c720ec41e3603310172b/rvalue.md
From the DIP:
> An rvalue argument is considered to be owned by the function
> called. Hence, if an lvalue is matched to the rvalue argument,
> a copy is made of the lvalue to be passed to the function. The
> function will then call the destructor (if any) on the
> parameter at the conclusion of the function. An rvalue argument
> is not copied, as it is assumed to already be unique, and is
> also destroyed at the conclusion of the function. The
> destruction is automatically appended to the function body by
> the compiler.
>
> The function cannot know if its parameter originated as an
> rvalue or is a copy of an lvalue.
>
> This means that an `__rvalue(lvalue expression)` argument
> destroys the expression upon function return. Attempts to
> continue to use the lvalue expression are invalid. The compiler
> won't always be able to detect a use after being passed to the
> function, which means that the destructor for the object must
> reset the object's contents to its initial value, or at least a
> benign value.
I think that section needs revising. As I understand it, a
function binds an argument by reference or by value:
```d
void f(ref T reference); // binds by reference
void g(T value); // binds by value
```
In my mind, function parameters are essentially local variables
of the function that are assigned by the caller (by providing
arguments). If argument passing does not work exactly like
initializing (local) variables, I’d consider that a flaw of the
language.
This means:
If a parameter is bound by value, it will be destroyed as `g`
returns (whether that is done by the caller or the callee is an
implementation detail and not part of the language). Whether the
caller passes `x` or `__rvalue(x)` is completely irrelevant to
the callee. It only ever sees its parameter initialized and is
responsible for its destruction. It cannot care where it came
from.
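To illustrate with the C++ analogue (a minimal sketch, not D:
`Tracked`, `g` and `run_by_value_demo` are hypothetical names
and `std::move(x)` stands in for `__rvalue(x)`), a by-value
parameter is constructed by the caller's choice of copy or move,
and the callee destroys it the same way in both cases:

```cpp
#include <cassert>
#include <utility>

// Global counters for the hypothetical instrumented type below.
int g_copies = 0;
int g_moves = 0;
int g_destructions = 0;

struct Tracked {
    Tracked() = default;
    Tracked(const Tracked&) { ++g_copies; }
    Tracked(Tracked&&) noexcept { ++g_moves; }
    ~Tracked() { ++g_destructions; }
};

// Binds by value: `value` is a local of g and is destroyed when g is done.
// Nothing in the body can tell whether it was copied or moved into place.
void g(Tracked value) { (void)value; }

struct Counts { int copies, moves, destructions; };

// Calls g once with an lvalue and once with an rvalue, then reports counters.
Counts run_by_value_demo() {
    g_copies = g_moves = g_destructions = 0;
    {
        Tracked x;
        g(x);             // lvalue argument: the parameter is copy-constructed
        g(std::move(x));  // rvalue argument: the parameter is move-constructed
    }                     // x itself is destroyed here
    return {g_copies, g_moves, g_destructions};
}
```

Both parameters and `x` are destroyed exactly once each, so the
demo reports one copy, one move and three destructions.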
If an argument is bound by reference, passing `__rvalue(x)` is
either invalid or, if the `rvaluerefparam` preview is active,
binds a temporary initialized in the stack frame of the caller by
`__rvalue(x)`. It does not bind `x`; that would be extremely
confusing. In that case, the caller is responsible for the
destruction of the temporary. (The callee knows nothing about the
creation of the temporary.)
We could introduce a parameter storage class `__rvalue ref` that:
* Corresponds to C++ rvalue references
* Allows binding rvalues only, and for `__rvalue(x)` arguments,
no temporary is created.
That would allow a function to freely move from an argument:
```d
void tryAdd(__rvalue ref T x)
{
    if (…) this.x = __rvalue(x);
}
```
Contrary to the above, `void tryAdd(T x)` requires a move to pass
an rvalue argument and another move to assign `this.x`. However,
if moving a `T` is reasonably cheap, pass-by-value can make sense
if binding lvalue arguments should be supported.
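The difference in move counts can be sketched in C++, where
`T&&` plays the role of the proposed `__rvalue ref`. The names
`Holder`, `addByRef` and `addByValue` are made up for this
illustration:

```cpp
#include <cassert>
#include <utility>

int g_moves = 0;  // counts move constructions and move assignments of T

struct T {
    T() = default;
    T(const T&) = default;
    T& operator=(const T&) = default;
    T(T&&) noexcept { ++g_moves; }
    T& operator=(T&&) noexcept { ++g_moves; return *this; }
};

struct Holder {
    T x;
    // Rvalue reference: binds the caller's object directly, one move in total.
    void addByRef(T&& arg) { x = std::move(arg); }
    // By value: one move to build the parameter, one more to assign the member.
    void addByValue(T arg) { x = std::move(arg); }
};

int moves_by_ref() {
    Holder h;
    T t;
    g_moves = 0;
    h.addByRef(std::move(t));
    return g_moves;
}

int moves_by_value() {
    Holder h;
    T t;
    g_moves = 0;
    h.addByValue(std::move(t));
    return g_moves;
}
```

Here `moves_by_ref()` yields 1 and `moves_by_value()` yields 2,
which is the trade-off described above.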
By itself, `__rvalue(x)` should do nothing. It matters only when
an operation on it distinguishes rvalues from lvalues, which is
its use case. Such an operation ***usually*** leaves `x` in a
moved-from state, but as shown above, there’s a use case for not
moving from the variable. Thus, after `tryAdd(__rvalue(x))`, the
variable `x` contains either a valid `T` object or a moved-from
`T` object.
A moved-from `T` object need not support all operations `T`
allows, but in C++, it must allow for two operations:
- being assigned
- being destroyed
Most types can support an empty state, and moving from an object
would put it in that state.
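A minimal C++ sketch of both outcomes, using `std::unique_ptr`
because its moved-from state is guaranteed to be null. `Slot`
and `conditional_move_demo` are hypothetical names, and the
"slot already taken" condition is only an assumed stand-in for
whatever `tryAdd` would actually check:

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct Slot {
    std::unique_ptr<int> x;

    // Takes ownership only if the slot is free; otherwise the argument
    // is left untouched even though it was passed as an rvalue.
    bool tryAdd(std::unique_ptr<int>&& arg) {
        if (x) return false;
        x = std::move(arg);
        return true;
    }
};

bool conditional_move_demo() {
    Slot s;
    auto a = std::make_unique<int>(1);
    auto b = std::make_unique<int>(2);

    bool first = s.tryAdd(std::move(a));   // moved from: a is now null
    bool second = s.tryAdd(std::move(b));  // not moved from: b still owns 2

    bool ok = first && !second && a == nullptr && b != nullptr && *b == 2;

    a = std::make_unique<int>(3);          // assigning to a moved-from object
    return ok && *a == 3;
}   // a and b are destroyed normally, whichever state they were left in
```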
---
It seems your DIP Draft conflates moving and relocation (C++
lingo). A relocation is a move followed by destruction of the
source. The notion of relocation is meaningful because there are
types for which relocation is trivial but moving is not.
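Spelled out in C++, that definition looks roughly as follows.
This is a sketch only: `Widget`, `relocate` and
`relocation_demo` are hypothetical, and the helper performs the
general (non-trivial) relocation, i.e. a move construction into
raw storage followed by an explicit destructor call on the
source:

```cpp
#include <cassert>
#include <new>
#include <utility>

int g_live = 0;  // number of currently alive Widget objects

struct Widget {
    int payload;
    explicit Widget(int p) : payload(p) { ++g_live; }
    Widget(Widget&& other) noexcept : payload(other.payload) {
        other.payload = 0;  // leave the source in a benign moved-from state
        ++g_live;
    }
    ~Widget() { --g_live; }
};

// Relocates *src into the raw storage at dst: move, then destroy the source.
template <class T>
T* relocate(T* src, void* dst) {
    T* result = ::new (dst) T(std::move(*src));
    src->~T();  // the source's lifetime ends here, not at a scope exit
    return result;
}

int relocation_demo() {
    alignas(Widget) unsigned char a[sizeof(Widget)];
    alignas(Widget) unsigned char b[sizeof(Widget)];

    Widget* w = ::new (a) Widget(42);
    Widget* r = relocate(w, b);  // storage `a` no longer holds an object

    int live_after = g_live;     // only the relocated object is alive
    int value = r->payload;
    r->~Widget();
    return (live_after == 1 && g_live == 0) ? value : -1;
}
```

A trivially relocatable type could skip both the move
constructor and the destructor call and just copy the bytes.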
For example, a `std::unique_ptr` has a non-trivial move: It must
set the source `std::unique_ptr` to a null state (such that it
can be assigned again or destroyed without releasing the managed
resource, which has a new owner). A `std::unique_ptr` has a
trivial relocation, though. If we simply copy the internal
pointer and do not run the destructor on the source, the managed
resource has a new owner and we don’t waste time setting the
source to null and then checking whether the source is null (to
skip freeing a possibly managed resource).
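The move half of that can be observed directly in C++. In this
sketch, `CountingDeleter` and `frees_after_move` are
hypothetical helpers; the custom deleter only counts how often
the resource is actually released:

```cpp
#include <cassert>
#include <memory>
#include <utility>

int g_frees = 0;  // how many times the managed int was actually released

struct CountingDeleter {
    void operator()(int* p) const {
        ++g_frees;
        delete p;
    }
};

int frees_after_move() {
    g_frees = 0;
    bool source_null = false;
    {
        std::unique_ptr<int, CountingDeleter> src(new int(7));
        std::unique_ptr<int, CountingDeleter> dst(std::move(src));
        source_null = (src == nullptr);  // the move nulled the source
    }  // both destructors run; src's finds null and skips the deleter
    return source_null ? g_frees : -1;
}
```

Two destructors run but the resource is released once. The
null store and the null check on `src` are the work a trivial
relocation would avoid.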
An example of a type that is not trivially relocatable is a type
with an internal pointer (such as `std::string` usually). It has
to readjust that pointer during the relocation.
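A small C++ sketch of why a bytewise copy is not enough for
such a type. `SelfRef` is a hypothetical stand-in for a
small-buffer string, deliberately kept trivially copyable so
that the `memcpy` itself is well-defined:

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// `data` points into the object's own storage.
struct SelfRef {
    char buf[16];
    char* data;
};
static_assert(std::is_trivially_copyable<SelfRef>::value,
              "the memcpy below must be well-defined");

bool bytewise_copy_keeps_old_pointer() {
    SelfRef a{};
    std::strcpy(a.buf, "hello");
    a.data = a.buf;

    SelfRef b{};
    std::memcpy(&b, &a, sizeof a);  // "relocate" by copying bytes only

    // The copied pointer still refers to a's buffer, not to b's own.
    bool stale = (b.data == a.buf) && (b.data != b.buf);

    b.data = b.buf;  // the fix-up a correct relocation has to perform
    return stale && std::strcmp(b.data, "hello") == 0;
}
```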
Using a moved-from object is reasonable; C++ requires assignment
to be valid, and most types allow more or even all operations. D
can require a moved-from object to be fully usable.
Using a relocated-from object(!) is fundamentally invalid. It is
already destroyed (that is, conceptually destroyed; an actual
destructor need not have run). The variable may still be used to
take its address or to reuse its storage (e.g. for placement
new).
For reference, the [Circle C++ language
extension](https://github.com/seanbaxter/circle/blob/master/new-circle/README.md#relocate) implements relocation as a built-in operation.
Relocation and placement new make lifetimes non-lexical. Moving,
on the other hand, does not disturb lexical lifetime.
The last paragraph of the quote again:
> This means that an `__rvalue(lvalue expression)` argument
> destroys the expression upon function return. Attempts to
> continue to use the lvalue expression are invalid. The compiler
> won't always be able to detect a use after being passed to the
> function, which means that the destructor for the object must
> reset the object's contents to its initial value, or at least a
> benign value.
That is probably not a good idea. It would render `__rvalue` a
`@system` feature. Either the compiler can guarantee it’s safe to
use or it can’t. Reliably recognizing use after destruction is
probably impossible (definitely in `@system` code, and in purely
`@safe` code, it at least requires difficult data-flow analysis).
C++ is content to call it UB and move on. D, with its focus on
`@safe`, can’t do that (or rather shouldn’t, as it would make
`__rvalue` immediately `@system`).
My suggestion: Require all D objects to be valid after being
moved from (whatever the reason for a move was).
If you really want to explore relocation in the DIP, add
`__relocate(x)` for that:
- Requires that the result be used (assigned to something,
initializing something, or passed by value(!) as a function
argument).
- Removes the destructor call of `x` if it is a local and no
placement `new` has been used on it afterwards.
- `__relocate` could maybe be `@safe` in very constrained
circumstances: The argument must be a local and there must not
exist any references or aliases to it.