Prototype of Ownership/Borrowing System for D

Sat Nov 23 23:40:05 UTC 2019

On 21.11.19 12:29, Walter Bright wrote:
> On 11/20/2019 4:59 PM, Timon Gehr wrote:
>> On 20.11.19 23:45, Walter Bright wrote:
>>> On 11/20/2019 4:16 AM, Timon Gehr wrote:
>>>> - What do you want to achieve with borrowing/ownership in D?
>>>
>>> I want to prevent the following common issues with pointer code:
>>>
>>> 1. use after free
>>> 2. neglecting to free
>>> 3. double free
>>> ...
>>
>> GC prevents those,
> 
> That's right. The GC is memory safe.
> 
>> and those problems cannot appear in @safe code.
> 
> @safe code has to call free() some time when manually managing memory.
> ...

@safe code cannot call free, because free is not @safe. In particular, 
I'm not supposed to free a GC pointer or a pointer into the static data 
segment. @live is useless in @safe code.
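
A minimal sketch of the first point, assuming the druntime binding core.stdc.stdlib.free, which is declared @system. The function names are made up for illustration:

```d
import core.stdc.stdlib : malloc, free;

@safe void safeCaller(int* p)
{
    // free is @system, so a direct call from @safe code does not compile.
    static assert(!__traits(compiles, free(p)));
}

@system void systemCaller()
{
    auto p = cast(int*) malloc(int.sizeof);
    free(p); // fine in @system code; free(null) is also a no-op
}
```

The only way for @safe code to reach free is through a @trusted wrapper, and that wrapper has to justify that the pointer is unique and came from malloc.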

If you want @live to mean: "do these additional checks", that is fine, 
if people indeed want to write @system code with those checks without a 
guarantee that their code is safe if the checks pass.

>> @live doesn't prevent them at the interface between @live and 
>> non-@live code.
> 
> @live relies on any function it calls obeying @live conventions for its 
> interface. This allows incremental adoption of @live code.
> ...

What it allows is one of the following:

1. a split of the language in two parts that cannot interoperate safely, 
in a language that claims to support memory safety.

2. @live checks provide no guarantees because they are optional and you 
can't rely on your callers to obey your desired borrowing/ownership 
interface.

If you want incremental adoption, just add the missing language 
features, and let them interoperate with existing code. New code will be 
written to take advantage of the new features. Don't change the meaning 
of existing language features based on a function attribute.

> 
>> What about user-defined types? What about allowing internal pointers 
>> into manually-managed memory to be exposed in @safe code?
> 
> Exposing an internal pointer in @live code is considered "borrowing" 
> from the root of its container.
> ...

It makes no sense to let the caller decide. The entity exposing the 
internal pointer should say whether it is borrowed out or not. The data 
structure manages its invariants, not the caller.

> 
>>>> - How will I write a compiler-checked memory safe program that uses 
>>>> varied allocation strategies, including plain malloc,
>>>
>>> I'm not sure what clarification you want about plain malloc/free, 
>>> although there are limitations outlined in ob.md.
>>> ...
>>
>> I.e., it is not planned that we will be able to write such programs?
> 
> I believe I covered that in ob.md. What am I missing?
> ...

You have previously attacked and dismissed my _sound_ designs for 
ostensibly not being checkable by the compiler (even though they are). I 
don't understand why this is not a concern for _your_ designs, which 
freely admit to being uncheckable and unsound. You can't say "@safe 
means memory safe and this is checked by the compiler" and at the same 
time "@live @safe relies on unchecked conventions to ensure memory 
safety across the @live boundary".

> 
>> The worry is that @live _removes_ value from tracing GC. If every 
>> pointer owns its data, how do I express a pointer to GC-owned 
>> memory? Do I need to write a "smart" pointer data type that's just a 
>> shallow wrapper for a GC pointer? Also, if I do that, how do I make 
>> sure different GC-backed pointers don't lend out the same owning 
>> pointer at the same time?
> 
> @live does not distinguish a GC-allocated raw pointer from a 
> malloc-allocated raw pointer. This means you'll be able to write generic 
> @live code that can handle both equally.

I don't think this is the case. The GC-allocated raw pointer allows 
aliasing while @live does not allow aliasing. They have incompatible 
aliasing restrictions. It's like having a mutable and an immutable 
reference to the same memory location.
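
To illustrate the mismatch, here is a small sketch of what plain @safe code may do with a GC-allocated pointer today. The function name is hypothetical. As I read ob.md, the same copy inside a @live function would move ownership from p to q, and the later read through p would be rejected:

```d
// GC-backed raw pointers may alias freely in @safe code.
@safe void aliasingIsFine()
{
    int* p = new int(1); // GC-allocated
    int* q = p;          // a second pointer to the same int
    *q = 2;
    assert(*p == 2);     // p is still usable after the copy
}
```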

> Of course, if all you're using is the GC, you won't need to bother with @live at all.
> ...

I hope _nobody_ will have to bother with @live, but if they do, it 
will inevitably infect libraries and suddenly, yes, I will have to deal 
with it. Also, the GC is not _all_ I am using. I am using the GC when 
that makes sense and I am not using the GC when using the GC does not 
make sense. For example, I have code that is unsafe because it uses 
compile-time reflection to gain access to the internal array backing 
std.container.Array, because std.container.Array has no safe way to lend 
out that array to a caller, so it does not do it at all. Note that for 
this use case, it would be enough if there was some way for 
std.container.Array to state to the compiler that no invalidating 
operation may be called while the reference to the internal array is 
borrowed out. If std.container.Array can rely on all @safe code being 
checked that way, the function that borrows out the internal reference 
can be @safe. If it can't rely on that, no such @safe function can be 
written.
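
A rough sketch of the hazard, using a made-up GrowBuf type rather than the real std.container.Array internals. Everything here is @system precisely because nothing stops a caller from invoking the invalidating operation while the borrowed slice is still live:

```d
import core.stdc.stdlib : free, realloc;

struct GrowBuf
{
    private int* ptr;
    private size_t len;

    // Lends out the backing storage (hypothetical API, for illustration).
    int[] borrowAll() @system { return ptr[0 .. len]; }

    // Invalidating operation: realloc may move the storage.
    void append(int x) @system
    {
        auto p = cast(int*) realloc(ptr, (len + 1) * int.sizeof);
        assert(p !is null, "out of memory");
        ptr = p;
        ptr[len++] = x;
    }

    void release() @system { free(ptr); ptr = null; len = 0; }
}

void growBufDemo() @system
{
    GrowBuf b;
    b.append(1);
    int[] view = b.borrowAll();
    assert(view.length == 1 && view[0] == 1);
    // b.append(2); // could leave `view` dangling if realloc moves the
    //              // block; the compiler does not object today
    b.release();
}
```

If the compiler rejected the commented-out append while view is alive, borrowAll could be offered to @safe callers.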

> 
>>> There's been a lot of progress with this with the addition of DIP25, 
>>> DIP1000, and DIP1012. This further improves it by making the 
>>> protections transitive.
>> As far as I can tell, @live doesn't bring us closer to @safe RC, 
>> because it applies to built-in pointers instead of library-defined 
>> smart pointers. I think this is completely backwards. Every owning 
>> pointer also needs to know the allocation strategy. Therefore, 
>> allowing built-in pointers to own their memory is vastly less useful 
>> than allowing library-defined smart pointers to do so.
> 
> Nothing about @live stops programmers from using library-defined smart 
> pointers.

What about the fact that it is _optional_ for a /caller/ to respect the 
smart pointer's desired ownership restrictions? That's very restrictive 
for the smart pointer! It won't be able to provide @safe borrowing 
functionality.

> The smart pointer would be the owning pointer,

Why do you _need_ an unsound @live construct to let a smart pointer be 
an owning pointer?

> and if it 
> exposed an internal pointer that internal pointer would be treated as 
> "borrowing" from the owner and further access to the smart pointer would 
> be denied until the borrower's last use.
> ...

Great. I want that in @safe code _if the smart pointer requests it_. No 
@live needed.

> 
>>> and conflating different allocators, which I don't have a good idea on.
>> Do the checks for library-defined smart pointers instead of built-in 
>> pointers. Built-in pointers shouldn't care about lifetime nor allocator.
> 
> People use raw pointers, and that isn't going away (because 
> performance).

How about because the `new` operator returns pointers?

If there is a difference in performance between a T* and a `struct { T* 
payload; }` that's an issue with the backend and/or the ABI.
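
As a quick check of the layout half of that claim, the wrapper below (a hypothetical name) has the same size and alignment as the raw pointer. Whether it is also passed in registers the same way depends on the target ABI, which this does not test:

```d
struct Wrapped(T) { T* payload; }

// Same size and alignment as a bare pointer: the wrapper adds no data.
static assert(Wrapped!int.sizeof == (int*).sizeof);
static assert(Wrapped!int.alignof == (int*).alignof);
```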

> Telling people "just use smart pointers" is like telling 
> C++ people to do that.

C++ does not have @safe!

> It doesn't work reliably.
> ...

It works in @safe code because the smart pointer will be the only way 
for the @safe code to manually manage memory. E.g.:

struct MP(T){ // owning, malloc-backed pointer
     private T* payload;
     @disable this();
     @disable this(T*); // can't construct
     @disable this(this); // can't copy (move-only type; therefore track
                          // this type like you track pointers in @live now)
     pragma(inline,true){
         private @system ~this(){} // only current module can drop
                                   // values, in @system or @trusted code
         ref T borrow()return{ return *payload; }
     }
     // can borrow out internal pointer
     alias borrow this;
}

@safe MP!T malloc(); // type tracks allocator
@trusted void free(MP!T); // safe to call because pointer is known to be
                          // unique and malloc'd

In order to call the safe free function you have to pass a pointer that 
was allocated with the matching smart pointer type. Note that by "smart" 
in this case I just mean it knows about the underlying allocator and it 
prevents the pointer from being leaked. There is no runtime behavior, 
it's all in the types.

I.e., this pointer type would use language features to precisely tell 
the compiler what restrictions its users have to obey. In this case, 
they may not invent new MP's, they may not copy MP's and they have to 
explicitly dispose of the MP. This is essentially what @live now does 
for all raw pointers, but maybe some data types only need a subset of 
the restrictions. In particular, raw pointers in @safe code need none of 
those restrictions.

There are potential issues if you try to borrow from some entity that is 
potentially referenced from somewhere else, so that should be 
disallowed. To bridge the gap, you can implement runtime checks, like 
Rust's cell does: https://doc.rust-lang.org/std/cell/

This is discussed in my ownership/borrowing post. (Note that my 
ownership/borrowing post assumes that `scope` pointers and `ref` 
pointers cannot alias. Aliasing restrictions could also be moved into a 
separate attribute for backwards-compatibility and better expressiveness.)
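
A minimal sketch of such a runtime check, loosely in the spirit of Rust's RefCell. CheckedCell and its borrow method are hypothetical names, not an existing library API, and this version only tracks a single exclusive borrow:

```d
import std.exception : enforce;

struct CheckedCell(T)
{
    private T value;
    private bool lent;

    // Runs `use` with exclusive access; throws if a borrow is already active.
    void borrow(scope void delegate(ref T) use)
    {
        enforce(!lent, "value is already borrowed");
        lent = true;
        scope (exit) lent = false;
        use(value);
    }
}

void cellDemo()
{
    CheckedCell!int cell;
    cell.borrow((ref int x) { x = 42; });

    // A nested borrow of the same cell is caught at run time.
    bool rejected;
    cell.borrow((ref int x) {
        try
        {
            cell.borrow((ref int y) { y = 0; });
        }
        catch (Exception)
        {
            rejected = true;
        }
    });
    assert(rejected);

    int seen;
    cell.borrow((ref int x) { seen = x; });
    assert(seen == 42);
}
```

The flag costs a branch per borrow, which is the price of allowing borrows from data that may be reachable from elsewhere.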

> The checks on smart pointers can be done with RAII and reference 
> counting, and the dips already implemented.
> ...

Not sure what this is referring to.

> 
>> The point of adding restrictions is to gain expressiveness. It's why 
>> type systems are a good idea. In this case, the point of borrowing 
>> restrictions should be to enable @safe code to manipulate interior 
>> pointers into manually-managed data structures.
> 
> They can do that now as long as the container only exposes interior 
> pointers as 'ref'.
> 

Not really, because aliasing is not considered. @live just assumes it 
does not exist and non-@live can introduce it arbitrarily. @live is all 
checks, no derived guarantees.