__rvalue and Move Semantics first draft

Thu Jan 16 02:53:51 UTC 2025

On Saturday, 9 November 2024 at 09:33:24 UTC, Walter Bright wrote:
> https://github.com/WalterBright/documents/blob/5dbf6728d7d0ae46a411c720ec41e3603310172b/rvalue.md

From the DIP:
> An rvalue argument is considered to be owned by the function 
> called. Hence, if an lvalue is matched to the rvalue argument, 
> a copy is made of the lvalue to be passed to the function. The 
> function will then call the destructor (if any) on the 
> parameter at the conclusion of the function. An rvalue argument 
> is not copied, as it is assumed to already be unique, and is 
> also destroyed at the conclusion of the function. The 
> destruction is automatically appended to the function body by 
> the compiler.
>
> The function cannot know if its parameter originated as an 
> rvalue or is a copy of an lvalue.
>
> This means that an `__rvalue(lvalue expression)` argument 
> destroys the expression upon function return. Attempts to 
> continue to use the lvalue expression are invalid. The compiler 
> won't always be able to detect a use after being passed to the 
> function, which means that the destructor for the object must 
> reset the object's contents to its initial value, or at least a 
> benign value.

I think that section needs revising. As I understand it, a 
function binds an argument by reference or by value:
```d
void f(ref T reference); // binds by reference
void g(T value); // binds by value
```

In my mind, function parameters are essentially local variables 
of the function that are assigned by the caller (by providing 
arguments). If argument passing does not work exactly like 
initializing (local) variables, I’d consider that a flaw of the 
language.

This means:

If a parameter is bound by value, it will be destroyed as `g` 
returns (whether that is done by the caller or the callee is an 
implementation detail and not part of the language). Whether the 
caller passes `x` or `__rvalue(x)` is completely irrelevant to 
the callee. It only ever sees its parameter initialized and is 
responsible for its destruction. It cannot know, and need not 
care, where the value came from.
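
To illustrate, here is a minimal sketch in current D, without `__rvalue`. The `Tracked` struct and the two counters are made up for this post; the only point is that `g` destroys its parameter the same way no matter how the argument was produced.

```d
int copies; // incremented by every postblit (copy)
int dtors;  // incremented by every destructor call

// Hypothetical type that records copies and destructions.
struct Tracked
{
    int id;
    this(this) { ++copies; }
    ~this() { ++dtors; }
}

// Binds by value: `value` is a local of g and is destroyed when g returns.
void g(Tracked value)
{
}

void main()
{
    {
        auto x = Tracked(1);
        g(x);          // lvalue argument: copied into the parameter
        assert(copies == 1 && dtors == 1);
        g(Tracked(2)); // rvalue argument: no copy, parameter still destroyed
        assert(copies == 1 && dtors == 2);
    }
    assert(dtors == 3); // x itself is destroyed at the end of its scope
}
```

Inside `g`, nothing distinguishes the two calls.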

If an argument is bound by reference, passing `__rvalue(x)` is 
either invalid or, if the `rvaluerefparam` preview is active, 
binds a temporary that `__rvalue(x)` initializes in the stack 
frame of the caller. It does not bind `x`; that would be extremely 
confusing. In that case, the caller is responsible for the 
destruction of the temporary. (The callee knows nothing about the 
creation of the temporary.)
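
Written out by hand, the behavior I have in mind looks roughly like the sketch below. It is my own desugaring, not what the compiler emits: `core.lifetime.move` stands in for `__rvalue(x)`, and `Item` and the `dtors` counter are hypothetical.

```d
import core.lifetime : move;

int dtors; // incremented by every destructor call

// Hypothetical type with a destructor, so that move() resets its source.
struct Item
{
    int value;
    ~this() { ++dtors; }
}

// Binds by reference and modifies whatever it was given.
void f(ref Item reference)
{
    reference.value += 100;
}

void main()
{
    auto x = Item(1);
    {
        auto tmp = move(x); // the temporary lives in the caller's frame
        f(tmp);             // f binds the temporary, not x
        assert(tmp.value == 101);
    } // the caller destroys the temporary here
    assert(dtors == 1);
    assert(x.value == 0); // x was moved from; f never saw it
}
```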

We could introduce a parameter storage class `__rvalue ref` that:
* Corresponds to C++ rvalue references
* Allows binding rvalues only, and for `__rvalue(x)` arguments, 
no temporary is created.

That would allow a function to freely move from an argument:
```d
void tryAdd(__rvalue ref T x)
{
     if (…) this.x = __rvalue(x);
}
```

In contrast to the above, `void tryAdd(T x)` requires one move to 
pass an rvalue argument and another move to assign `this.x`. 
However, if moving a `T` is reasonably cheap, pass-by-value can 
make sense if binding lvalue arguments should be supported.

By itself, `__rvalue(x)` should do nothing. It matters only when 
an operation on it distinguishes rvalues from lvalues, which is 
its use case. Such an operation ***usually*** leaves `x` in a 
moved-from state, but as shown above, there’s a use case for not 
moving from the variable. Thus, after `tryAdd(__rvalue(x))` the 
variable `x` contains either its original `T` value or a 
moved-from `T` object.
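
The two possible outcomes can be sketched in today's D. Big assumption: a plain `ref` parameter stands in for the proposed `__rvalue ref`, and `core.lifetime.move` stands in for `__rvalue`. `Resource` and `Container` are hypothetical names.

```d
import core.lifetime : move;

// Hypothetical move-only type; handle == 0 is its empty state.
struct Resource
{
    int handle;
    @disable this(this); // not copyable, only movable
    ~this() { handle = 0; }
}

struct Container
{
    Resource slot;

    // `ref` stands in for the proposed `__rvalue ref` (assumption).
    bool tryAdd(ref Resource x)
    {
        if (slot.handle != 0)
            return false; // slot is taken: x is left untouched
        slot = move(x);   // x is reset to Resource.init
        return true;
    }
}

void main()
{
    Container c;
    auto a = Resource(7);
    auto b = Resource(9);

    assert(c.tryAdd(a)); // moved from
    assert(a.handle == 0 && c.slot.handle == 7);

    assert(!c.tryAdd(b)); // not moved from
    assert(b.handle == 9);
}
```

With pass-by-value, `b` would have been consumed by the second call even though nothing was added.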

A moved-from `T` object need not support all operations `T` 
allows, but in C++, it must allow for two operations:
- being assigned
- being destroyed

Most types can support an empty state, and moving from an object 
would put it in that state.
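
As a small sketch of such an empty state in current D: `core.lifetime.move` resets the source to `.init` for structs that define a destructor, and the source can then be assigned again or simply destroyed. `Name` is a made-up type.

```d
import core.lifetime : move;

// Hypothetical type; a null `text` is its empty state.
struct Name
{
    string text;
    ~this() { text = null; }
}

void main()
{
    auto a = Name("first");
    auto b = move(a); // a is reset to Name.init, the empty state
    assert(b.text == "first");
    assert(a.text is null);

    a = Name("second"); // assigning to a moved-from object works
    assert(a.text == "second");
} // both a and b are destroyed normally here
```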

---

It seems your DIP draft conflates moving and relocation (C++ 
lingo). A relocation is a move followed by destruction of the 
source. The notion of relocation is meaningful because there are 
types for which relocation is trivial but moving is not.

For example, a `std::unique_ptr` has a non-trivial move: It must 
put the source `std::unique_ptr` into a null state (such that it 
can be assigned again or destroyed without releasing the managed 
resource, which has a new owner). A `std::unique_ptr` has a 
trivial relocation, though. If we simply copy the internal 
pointer and do not run the destructor on the source, the managed 
resource has a new owner and we don’t waste time setting the 
source to null and then checking whether the source is null (to 
skip freeing a possibly managed resource).
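
The difference can be sketched in D with a hand-rolled owner type. Everything here (`Owner`, `makeOwner`, `moveInto`, `relocateInto`, the `frees` counter) is hypothetical and only mimics the relevant part of `std::unique_ptr`. The relocation half uses `malloc`ed storage so that no destructor runs automatically on the source.

```d
import core.stdc.stdlib : free, malloc;
import core.stdc.string : memcpy;

int frees; // counts how often a managed resource was released

// Hypothetical stand-in for std::unique_ptr<int>.
struct Owner
{
    int* p;
    @disable this(this); // unique ownership: no copies

    ~this()
    {
        if (p !is null) // the check a moved-from source still pays for
        {
            free(p);
            p = null;
            ++frees;
        }
    }
}

Owner makeOwner(int value)
{
    auto p = cast(int*) malloc(int.sizeof);
    *p = value;
    return Owner(p);
}

// Move: the source stays alive, so it must be put into the null state.
// Assumes `target` is empty.
void moveInto(ref Owner target, ref Owner source)
{
    target.p = source.p;
    source.p = null;
}

// Relocation: copy the bits and treat the source as already destroyed.
void relocateInto(Owner* target, Owner* source)
{
    memcpy(target, source, Owner.sizeof);
}

void main()
{
    {
        auto a = makeOwner(1);
        Owner b;
        moveInto(b, a);
        assert(a.p is null && *b.p == 1);
    } // both destructors run; only b's releases the resource
    assert(frees == 1);

    // Raw storage: no destructor runs unless we call it ourselves.
    auto src = cast(Owner*) malloc(Owner.sizeof);
    auto dst = cast(Owner*) malloc(Owner.sizeof);
    src.p = cast(int*) malloc(int.sizeof);
    *src.p = 2;
    relocateInto(dst, src);
    free(src); // release the storage only; no destructor, no null check
    assert(*dst.p == 2);
    destroy(*dst); // the one and only destructor call
    free(dst);
    assert(frees == 2);
}
```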

An example of a type that is not trivially relocatable is a type 
with an internal pointer (such as `std::string` usually). It has 
to readjust that pointer on relocation.
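
A minimal sketch of the problem, using a made-up `SmallBuf` type whose `cursor` points into its own `storage`. The names and the `relocateFixingUp` helper are hypothetical; they only show why a plain bit copy is not enough.

```d
import core.stdc.string : memcpy;

// Hypothetical type with an internal pointer.
struct SmallBuf
{
    char[16] storage;
    char* cursor; // points into this object's own storage

    void reset() { cursor = storage.ptr; }
    bool selfConsistent() { return cursor is storage.ptr; }
}

// Non-trivial relocation: copy the bits, then readjust the pointer.
void relocateFixingUp(ref SmallBuf target, ref SmallBuf source)
{
    auto offset = source.cursor - source.storage.ptr;
    memcpy(&target, &source, SmallBuf.sizeof);
    target.cursor = target.storage.ptr + offset;
}

void main()
{
    SmallBuf a;
    a.reset();

    // A plain bit copy leaves b.cursor pointing into a's storage.
    SmallBuf b;
    memcpy(&b, &a, SmallBuf.sizeof);
    assert(!b.selfConsistent());
    assert(b.cursor is a.storage.ptr);

    // With the fix-up, the pointer refers to the new object's storage.
    SmallBuf c;
    relocateFixingUp(c, a);
    assert(c.selfConsistent());
}
```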

Using a moved-from object is reasonable; C++ requires assignment 
to be valid, and most types allow more or even all operations. 
D can require a moved-from object to be fully usable.

Using a relocated-from object(!) is fundamentally invalid. It is 
already destroyed (that is, conceptually destroyed; an actual 
destructor need not have run). The only valid uses of the 
variable are taking its address and reusing its storage (e.g. for 
placement new).

For reference, the [Circle C++ language 
extension](https://github.com/seanbaxter/circle/blob/master/new-circle/README.md#relocate) implements relocation as a built-in operation.

Relocation and placement new make lifetimes non-lexical. Moving, 
on the other hand, does not disturb lexical lifetime.

The last paragraph of the quote again:
> This means that an `__rvalue(lvalue expression)` argument 
> destroys the expression upon function return. Attempts to 
> continue to use the lvalue expression are invalid. The compiler 
> won't always be able to detect a use after being passed to the 
> function, which means that the destructor for the object must 
> reset the object's contents to its initial value, or at least a 
> benign value.

That is probably not a good idea. It would render `__rvalue` a 
`@system` feature. Either the compiler can guarantee it’s safe to 
use or it can’t. Reliably recognizing use after destruction is 
probably impossible (definitely in `@system` code, and in purely 
`@safe` code, it at least requires difficult data-flow analysis). 
In C++, one is content to call it UB and move on. D, with its 
focus on `@safe`, can’t do that (or rather shouldn’t, as it would 
make `__rvalue` immediately `@system`).

My suggestion: Require all D objects to be valid after being 
moved from (whatever the reason for a move was).

If you really want to explore relocation in the DIP, add 
`__relocate(x)` for that:
- Requires that the result be used (assigned to something, used 
to initialize something, or passed by value(!) as a function 
argument).
- Removes the destructor call of `x` if `x` is a local and no 
placement `new` has been used on it afterwards.
- `__relocate` could maybe be `@safe` in very constrained 
circumstances: The argument must be a local and there must not 
exist any references or aliases to it.