Smart pointers instead of GC?

Tue Feb 4 15:03:34 PST 2014

On Tuesday, 4 February 2014 at 22:30:39 UTC, Walter Bright wrote:
> I wonder how Rust deals with this.

The only time ownership matters is if you are going to store the 
pointer. It is like the difference between a container and a 
range.

An algorithm doesn't need to know about the specifics of a 
container. Let's use average for example. We might write it in D:

int average(InputRange)(InputRange r) {
     int count = 0;
     int sum;
     while(!r.empty) {
          count++;
          sum += r.front;
          r.popFront();
     }
     return sum / count;
}

Now, this being a template, D will generate new code for a 
variety of types... but even if we replaced InputRange with a 
specific thing, let's call it int[], it is still usable by a 
variety of containers:

int average(int[] r) { /* same impl */ }

D has two containers built in that provide this range:

int[50] staticArray;
int[] dynamicArray = new int[](50);

average(staticArray[]); // works
average(dynamicArray); // works

Pointers also offer this:

int* pointer = cast(int*) malloc(50 * int.sizeof);
average(pointer[0 .. 50]);

Moreover, user-defined types can also provide this range:

struct Numbers {
     int[] opSlice() { return [1,2,3]; }
}

Numbers numbers;
average(numbers[]); // works

In theory, we could provide either an inputRangeObject or a slice 
for linked lists, lazy generators, anything. One function, any 
kind of input.

Of course, we could slice memory from any allocator. Heck, we saw 
three different allocations right here (with three different 
types! stack, gc, and malloc) all using the same function, 
without templating.

I'm sure none of this is new to you... and this is basically how 
the rust thing works too. Our usage of int[] (or the input range) 
are borrowed pointers. Algorithms are written in their terms.

The ownership type only matters when you store it. And turns out, 
this matters in D as well:

struct ManualArray(T) {
     size_t length;
     T* data;

     this(size_t len) { data = malloc(T.sizeof * len); length = 
len; }
     ~this() { free(data); }
     T[] opSlice() { return data[0 .. length]; }
     @disable this(this); // copying this is wrong, don't allow it!
}

void main() {
     auto array = ManualArray!int(50);
     average(array[]); // works, reusing our pointer
}

But, borrowed comes into play if we store it:

int[] globalArray;
void foo(int[] array) {
     globalArray = array;
}

void bar() {
     auto array = ManualArray!int(50);
     foo(array[]); // uh oh
}

void main() {
    bar();
    globalArray[0] = 10; // crash likely, memory safety violated
}

Again, I'm sure none of this is new to you, but it illustrates 
owned vs borrowed: ManualArray is owned. Storing it is safe - it 
ensures its internal pointer is valid throughout its entire life 
time.

But ManualArray.opSlice returns a borrowed reference. Great for 
algorithms or any processing that doesn't escape the reference. 
Anything that would be written in terms of an input range is 
probably correct with this.

However, we stored the borrowed reference, which is a no-no. 
array went out of scope, freeing the memory, leaving the escaped 
borrowed reference in an invalid state.

Let's say we did want to store it. There's a few options: we 
could make our own copy or store the pre-made copy.

GC!(int[]) globalArray;
void foo(GC!(int[]) array) { globalArray = array; }

That's sane, the GC owns it and we specified that so storing it 
is cool.

We could also take a RefCounted!(int[]), if that's how we wanted 
to store it.

But let's say we wanted to store it with a different method. 
There's only two sane options:

void foo(int[] array) { globalArray = array.dup; }

Take a borrowed reference and make a copy of it. The function foo 
is in charge of allocating (here, we made a GC managed copy).

OR, don't implement that and force the user to decide:

void foo(GC!(int[]) array) {...}

user:

foo(ownedArray[]); // error, cannot implicitly convert int[] to 
GC!(int[])
int[50] stackArray;
foo(stackArray[]); // error, cannot implicitly convert int[] to 
GC!int[]

Now, the user makes the decision. It is going to be stored, the 
function signature says that up front by asking for a 
non-borrowed reference. They won't get a surprise crash when the 
globalArray later accesses stack or freed data. They have to deal 
with the error. They might not call the function, or they might 
do the .dup themselves. Either way, memory safety is preserved 
and inefficiencies are visible.

So, a function that stores a reference would only ever come in 
one or two signatures, regardless of how many:

1) the exact match for the callee's allocation strategy. The 
callee, knowing what the strategy is, can also be sanely 
responsible for freeing it. (A struct dtor, for example, knows 
that its members are malloced and can thus call free)

2) A generic borrowed type, e.g. input range or slice, which it 
then makes a private copy of it internally. Since these are 
arguably hidden allocations you might not even like these. 
Calling .dup (or whatever) at the call sight keeps the 
allocations visible.

So bottom line, you don't duplicate functions for the different 
types. You borrow references for processing (analogous to 
implementing algorithms with ranges) and own references for 
storing... which you need to know about, so only one type makes 
sense.