Smart pointers instead of GC?
Adam D. Ruppe
destructionator at gmail.com
Tue Feb 4 15:03:34 PST 2014
On Tuesday, 4 February 2014 at 22:30:39 UTC, Walter Bright wrote:
> I wonder how Rust deals with this.
The only time ownership matters is if you are going to store the
pointer. It is like the difference between a container and a
range.
An algorithm doesn't need to know about the specifics of a
container. Let's use average for example. We might write it in D:
int average(InputRange)(InputRange r) {
int count = 0;
int sum;
while(!r.empty) {
count++;
sum += r.front;
r.popFront();
}
return sum / count;
}
Now, this being a template, D will generate new code for a
variety of types... but even if we replaced InputRange with a
specific thing, let's call it int[], it is still usable by a
variety of containers:
int average(int[] r) { /* same impl */ }
D has two containers built in that provide this range:
int[50] staticArray;
int[] dynamicArray = new int[](50);
average(staticArray[]); // works
average(dynamicArray); // works
Pointers also offer this:
int* pointer = cast(int*) malloc(50 * int.sizeof);
average(pointer[0 .. 50]);
Moreover, user-defined types can also provide this range:
struct Numbers {
int[] opSlice() { return [1,2,3]; }
}
Numbers numbers;
average(numbers[]); // works
In theory, we could provide either an inputRangeObject or a slice
for linked lists, lazy generators, anything. One function, any
kind of input.
Of course, we could slice memory from any allocator. Heck, we saw
three different allocations right here (with three different
types! stack, gc, and malloc) all using the same function,
without templating.
I'm sure none of this is new to you... and this is basically how
the rust thing works too. Our usage of int[] (or the input range)
are borrowed pointers. Algorithms are written in their terms.
The ownership type only matters when you store it. And turns out,
this matters in D as well:
struct ManualArray(T) {
size_t length;
T* data;
this(size_t len) { data = malloc(T.sizeof * len); length =
len; }
~this() { free(data); }
T[] opSlice() { return data[0 .. length]; }
@disable this(this); // copying this is wrong, don't allow it!
}
void main() {
auto array = ManualArray!int(50);
average(array[]); // works, reusing our pointer
}
But, borrowed comes into play if we store it:
int[] globalArray;
void foo(int[] array) {
globalArray = array;
}
void bar() {
auto array = ManualArray!int(50);
foo(array[]); // uh oh
}
void main() {
bar();
globalArray[0] = 10; // crash likely, memory safety violated
}
Again, I'm sure none of this is new to you, but it illustrates
owned vs borrowed: ManualArray is owned. Storing it is safe - it
ensures its internal pointer is valid throughout its entire life
time.
But ManualArray.opSlice returns a borrowed reference. Great for
algorithms or any processing that doesn't escape the reference.
Anything that would be written in terms of an input range is
probably correct with this.
However, we stored the borrowed reference, which is a no-no.
array went out of scope, freeing the memory, leaving the escaped
borrowed reference in an invalid state.
Let's say we did want to store it. There's a few options: we
could make our own copy or store the pre-made copy.
GC!(int[]) globalArray;
void foo(GC!(int[]) array) { globalArray = array; }
That's sane, the GC owns it and we specified that so storing it
is cool.
We could also take a RefCounted!(int[]), if that's how we wanted
to store it.
But let's say we wanted to store it with a different method.
There's only two sane options:
void foo(int[] array) { globalArray = array.dup; }
Take a borrowed reference and make a copy of it. The function foo
is in charge of allocating (here, we made a GC managed copy).
OR, don't implement that and force the user to decide:
void foo(GC!(int[]) array) {...}
user:
foo(ownedArray[]); // error, cannot implicitly convert int[] to
GC!(int[])
int[50] stackArray;
foo(stackArray[]); // error, cannot implicitly convert int[] to
GC!int[]
Now, the user makes the decision. It is going to be stored, the
function signature says that up front by asking for a
non-borrowed reference. They won't get a surprise crash when the
globalArray later accesses stack or freed data. They have to deal
with the error. They might not call the function, or they might
do the .dup themselves. Either way, memory safety is preserved
and inefficiencies are visible.
So, a function that stores a reference would only ever come in
one or two signatures, regardless of how many:
1) the exact match for the callee's allocation strategy. The
callee, knowing what the strategy is, can also be sanely
responsible for freeing it. (A struct dtor, for example, knows
that its members are malloced and can thus call free)
2) A generic borrowed type, e.g. input range or slice, which it
then makes a private copy of it internally. Since these are
arguably hidden allocations you might not even like these.
Calling .dup (or whatever) at the call sight keeps the
allocations visible.
So bottom line, you don't duplicate functions for the different
types. You borrow references for processing (analogous to
implementing algorithms with ranges) and own references for
storing... which you need to know about, so only one type makes
sense.
More information about the Digitalmars-d
mailing list