Borrowing and Ownership

Sun Oct 27 22:36:30 UTC 2019

I finally got around to writing up some thoughts on @safe borrowing and 
ownership in D. I didn't spend nearly enough time on this post, so the 
details of this proposal might not be optimal yet, and it is likely to 
miss a few details. The TLDR is that `scope` pointers and built-in 
references should behave like Rust borrowed pointers. (Except lifetimes 
will be tracked through function calls and data structures a lot less 
precisely, at least initially.) The meaning of `T*` should not change 
from what it is today.

First, note that even though there is a lot of confusion around this, 
`@safe` is currently not inherently broken. It provides memory safety 
(modulo implementation bugs in the compiler). The problem we want to 
solve is that @safe code does not support exposing direct references 
into the guts of data structures that use memory management schemes 
other than tracing GC. @trusted is currently broken, however (see 
further below in this post).

Basic assumptions:
- We want to start with simple rules that ensure memory safety of 
slightly more expressive @safe code instead of comprehensive ones that 
ensure both safety and very high expressiveness. (I have more ambitious 
ideas than what I discuss here, but I doubt those are realistic for D 
right now.)
- With DIP 1021 accepted, `scope` is headed toward meaning controlled 
lifetime without mutable aliasing. (`ref` implies `scope`.)
- Tracing GC is a successful way to write @safe programs and should 
continue to be supported as an option.

In particular, @live is a dead end, because:
- It either provides no guarantees or it breaks memory safety of @safe code.
- It wants to change the meaning of `T*` based on a function attribute.
- It breaks D programs that want to use the GC.

The next steps should instead be roughly as follows:

Clarify the meaning of `T*` in impure `@safe` code:

- A non-`scope` built-in pointer in impure `@safe` code points to a 
value with unbounded lifetime (e.g. a GC pointer or a pointer into the 
data segment) and unrestricted aliasing. The same holds true for 
non-`scope` class references. This is true today, but should be 
explicitly stated in the language specification to prevent confusion.

- In @system code, `T*` is a pointer with arbitrary lifetime, and 
@trusted code needs to ensure that @safe code cannot access a `T*` 
whose lifetime may end before the last point at which @safe code might 
access the pointer.

Improve `@trusted`:

- The problem with `@trusted` is that it has no defense against `@safe` 
code destroying its invariants or accessing raw pointers that are only 
meant to be manipulated by `@trusted` code. There should therefore be a 
way to mark data as `@trusted` (or equivalent), such that `@safe` code 
cannot access it.
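
For comparison, here is a minimal Rust sketch of the kind of protection 
this asks for. It is only an analogue: Rust hides the raw pointer behind 
module privacy, whereas the idea above is a per-field `@trusted` marker 
that would also shut out `@safe` code in the same module. The names 
`unique`, `Unique`, `new` and `get` are made up for illustration.

```rust
mod unique {
    /// Owns a heap allocation through a raw pointer.
    pub struct Unique {
        // Private: code outside this module cannot read or overwrite it.
        ptr: *mut i32,
    }

    impl Unique {
        pub fn new(value: i32) -> Self {
            Unique { ptr: Box::into_raw(Box::new(value)) }
        }

        pub fn get(&self) -> i32 {
            // Sound only because `ptr` always points to a live allocation,
            // an invariant that code outside the module cannot break.
            unsafe { *self.ptr }
        }
    }

    impl Drop for Unique {
        fn drop(&mut self) {
            // Reclaim the allocation made in `new`.
            unsafe { drop(Box::from_raw(self.ptr)) }
        }
    }
}

fn main() {
    let u = unique::Unique::new(41);
    // u.ptr = std::ptr::null_mut(); // rejected: field `ptr` is private
    println!("{}", u.get() + 1);
}
```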

Change the meaning of `scope`:

- `scope` should apply to all types of data equally, not only built-in 
pointers and references. The most obvious use case for this is @safe 
interfacing with a C library that exposes handles as structs with an 
integer field but specifies undefined behavior if those handles are 
mismanaged. Not everything that is a manually-managed reference to 
something is a built-in pointer or reference.

- Non-immutable non-scope values may not be assigned to `scope` values. 
In particular, non-`immutable` `scope` member functions cannot accept a 
non-`scope` receiver. This is necessary, because otherwise you 
immediately break the aliasing guarantee DIP 1021 aims to introduce.

- `scope` on a struct does not imply its fields are `scope`. (It is 
perfectly fine to store a GC pointer within something with a scoped 
lifetime.)

- Fields can be `scope`. `scope` fields cannot be accessed through a 
non-`scope` receiver. The lifetime of `scope` fields ends when the 
lifetime of the enclosing object ends.

- `scope` has to be a type constructor.

- A non-`scope` pointer cannot be dereferenced if that would yield a 
`scope` value. (However, such a `scope` value can be moved somewhere 
else through a non-scope pointer.)
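
As a rough illustration of the C-handle use case, here is how Rust 
attaches a controlled lifetime to a plain integer handle. This is a 
sketch of the analogous mechanism, not of the proposed D semantics; 
`Library`, `Handle`, `open` and `describe` are hypothetical names, and 
no real C library is involved.

```rust
use std::marker::PhantomData;

/// Stand-in for a C library context (hypothetical).
struct Library {
    name: &'static str,
}

/// An integer handle that is not a pointer, yet must not outlive `Library`.
struct Handle<'lib> {
    id: u32,
    // Zero-sized marker tying the handle's lifetime to the library's.
    _owner: PhantomData<&'lib Library>,
}

impl Library {
    /// The returned handle borrows `self`, although it stores only an integer.
    fn open(&self, id: u32) -> Handle<'_> {
        Handle { id, _owner: PhantomData }
    }
}

fn describe(lib: &Library, h: &Handle<'_>) -> String {
    format!("{}#{}", lib.name, h.id)
}

fn main() {
    let lib = Library { name: "demo" };
    let h = lib.open(7);
    println!("{}", describe(&lib, &h));
    // drop(lib); // rejected: `lib` is still borrowed by `h`, used below
    println!("{}", h.id);
}
```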

Add borrowing rules:

- When copying a mutable `scope` value to another mutable `scope` value, 
access to the original value has to be disabled until the copy's 
lifetime ends.

- When copying a mutable `scope` value to a `const` `scope` value, the 
original value has to become `const` until the copy's lifetime ends.

- When copying a `const` `scope` value to a `const` `scope` value, the 
original value only has to outlive the copy.

- In particular, when taking the address of a value on the stack, the 
resulting `scope`d pointer will restrict access to that variable 
according to those rules until its lifetime ends. The `return` 
annotation can be used to track such assignments through function calls.

- For stack values, data flow analysis can be used to detect values that 
can be temporarily promoted to `scope`. Overloaded functions should 
prefer the `scope` overload.
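
The first three rules correspond closely to what Rust already enforces 
for reborrows, so a small Rust sketch may help. It assumes the mapping 
mutable `scope` ~ `&mut` and `const` `scope` ~ `&`, which is an 
approximation; the commented-out lines are the ones the compiler 
rejects.

```rust
fn bump(x: &mut i32) {
    *x += 1;
}

fn demo() -> (i32, i32, i32) {
    let mut v = 0;
    let m = &mut v; // mutable borrow of `v`

    // Rule 1: mutable -> mutable. `m` is disabled while `m2` is live.
    {
        let m2 = &mut *m;
        // *m += 1; // rejected: `*m` is mutably borrowed by `m2`
        bump(m2);
    }
    *m += 1; // `m2` is gone, so `m` is usable again
    let after_rule1 = *m;

    // Rule 2: mutable -> const. `m` is read-only while `c` is live.
    let after_rule2 = {
        let c = &*m;
        // *m += 1; // rejected: cannot assign while `c` borrows `*m`
        *m + *c // reading through both is fine
    };

    // Rule 3: const -> const. The original only has to outlive the copy.
    let c1 = &v;
    let c2 = c1;
    let after_rule3 = *c1 + *c2;

    (after_rule1, after_rule2, after_rule3)
}

fn main() {
    println!("{:?}", demo());
}
```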

Example: Library implementation of Unique pointers with @safe borrowing 
(`const`/`immutable`/`class` interactions left out for simplicity):

---
struct Unique(T){
     @trusted private scope T* payload;
     @disable this(this);
     auto borrow()@trusted return{ // (`return` refers to `ref this`)
         // potentially many references to unique pointer exist,
         // need runtime check
         // here, we'll just temporarily null out the Unique reference.
         static struct Borrowed{
             @trusted private scope Unique!T* self;
             @trusted private scope T* payload;
             @disable this(this);
             ~this()@trusted{ self.payload=payload; }
             return scope(T*) borrow()@trusted scope{
                 return payload;
             }
             alias borrow this;
         }
         auto borrowed=payload;
         payload=null;
         return scope(Borrowed)(&this,borrowed);
     }
     scope(T*) borrow()@trusted scope return{
         // only one reference to unique pointer exists,
         // just return payload
         // note that while this does not actually return
         // a reference to `this`, we want the calling `@safe`
         // code to treat it as if it did, so that this can be
         // a `@trusted` function
         return payload;
     }
     ~this(){
         destroy((()@trusted=>payload)());
         ()@trusted{
             free(payload);
             payload=null;
         }(); // call the lambda, otherwise the body never runs
     }
     alias borrow this; // enable implicit borrowing
}
Unique!T makeUnique(T,A...)(A args){
     auto p=malloc(...);
     ...;
     return Unique!T(p);
}
---

---
void main(){
     auto p=makeUnique!int(3);
     ++*p; // ok, p is temporarily promoted to `scope` and `++` is
           // evaluated on a borrowed p.
     {
         scope Unique!int* q=[p].ptr;
         ++*p; // error, p is borrowed by q
     }
     ++*p; // ok, q went out of scope
     Unique!int* q=[p].ptr; // ok
     ++*p; // ok
     // however, this line used the non-scope overload of `borrow` as
     // `p` can no longer be promoted to `scope`
     auto r=q; // ok
     ++**q; // ok
     static void foo(ref int x, Unique!int* y){
        assert((*y).borrow() is null); // reference disabled temporarily
        ++x; // ok
     }
     foo((*q).borrow(),r);
     foo((*r).borrow(),q);
}
---
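
The runtime-checked `borrow` overload (the one that temporarily nulls 
out the `Unique`) can be approximated in Rust as follows. This is a 
sketch under assumptions, not a translation: `Slot`, `Borrowed`, 
`is_empty` and `get_mut` are hypothetical names, and an empty `Option` 
plays the role of the nulled-out payload.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for `Unique!int`: a slot owning one boxed value.
struct Slot {
    payload: Cell<Option<Box<i32>>>,
}

/// Guard that holds the payload while it is borrowed.
struct Borrowed<'a> {
    home: &'a Slot,
    payload: Option<Box<i32>>,
}

impl Slot {
    fn new(value: i32) -> Self {
        Slot { payload: Cell::new(Some(Box::new(value))) }
    }

    /// Runtime-checked borrow: empties the slot until the guard is dropped.
    fn borrow(&self) -> Borrowed<'_> {
        Borrowed { home: self, payload: self.payload.take() }
    }

    /// True while some guard holds the payload.
    fn is_empty(&self) -> bool {
        let p = self.payload.take();
        let empty = p.is_none();
        self.payload.set(p);
        empty
    }
}

impl Borrowed<'_> {
    /// `None` means another guard currently holds the payload.
    fn get_mut(&mut self) -> Option<&mut i32> {
        self.payload.as_deref_mut()
    }
}

impl Drop for Borrowed<'_> {
    fn drop(&mut self) {
        // Hand the payload back only if this guard actually holds it.
        if let Some(p) = self.payload.take() {
            self.home.payload.set(Some(p));
        }
    }
}

fn demo() -> (bool, i32, bool) {
    let slot = Slot::new(3);
    let (empty_during, value) = {
        let mut guard = slot.borrow();
        let empty_during = slot.is_empty(); // other references see "null"
        let x = guard.get_mut().expect("first borrow holds the payload");
        *x += 1;
        (empty_during, *x)
    };
    (empty_during, value, slot.is_empty())
}

fn main() {
    println!("{:?}", demo());
}
```

As with `foo` in the example above, any other reference to the slot 
observes it as empty for the duration of the borrow, and the payload is 
restored when the guard goes out of scope.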

Similar strategies work for manually-allocated arrays and reference 
counting.
For @safe reference counting of mutable payloads, there always needs to 
be a runtime check on borrow, similar to the first implementation of the 
`borrow` function above. This could be implemented by reserving a bit in 
the reference count for keeping track of such mutable borrows. To enable 
both const and mutable borrows, one would probably need two reference 
counts, one for normal references and one for const borrows. (Note that 
Rust uses similar runtime checks for safe reference counting.)
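
For reference, this is roughly what the Rust runtime check looks like 
with `Rc<RefCell<T>>` from the standard library. The sketch only shows 
the observable behavior (a mutable borrow excludes all others, shared 
borrows can coexist); how a D implementation would lay out its counts 
is left open.

```rust
use std::cell::RefCell;
use std::rc::Rc;

fn demo() -> (bool, bool, bool, i32) {
    // Two reference-counted handles to the same mutable payload.
    let a = Rc::new(RefCell::new(1));
    let b = Rc::clone(&a);

    // While a mutable borrow is live, every other borrow fails at runtime.
    let mut guard = a.borrow_mut();
    *guard += 1;
    let second_mut_ok = b.try_borrow_mut().is_ok();
    let shared_during_mut_ok = b.try_borrow().is_ok();
    drop(guard);

    // Shared (const) borrows may coexist with each other.
    let r1 = a.borrow();
    let two_shared_ok = b.try_borrow().is_ok();
    let value = *r1;
    drop(r1);

    (second_mut_ok, shared_during_mut_ok, two_shared_ok, value)
}

fn main() {
    println!("{:?}", demo());
}
```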

The main drawback of this proposal is that it doesn't separate control 
of lifetime from control of aliasing. Doing so would, however, require 
adding another type qualifier, and it has no precedent in Rust.