Interior pointers and fast GC
Chris Wright via Digitalmars-d
digitalmars-d at puremagic.com
Fri Jan 13 20:37:01 PST 2017
Interior pointers are a barrier to performant garbage collection.
Here, I'm brainstorming about the problem and not really coming to any
conclusions.
The world today
===============
D's garbage collector is significantly less performant than Java's, and
this is partly the result of D's features. Interior pointers are costly.
What's the deal with interior pointers?
---------------------------------------
A heap object (for this discussion) is a series of bytes on the GC heap.
An interior pointer is a pointer to some memory location within a heap
object, but not to the beginning of the heap object.
A garbage collector that never has to deal with internal pointers has
several options for storing metadata related to a heap object:
* In a segment of data immediately before the heap object
* In a hashmap
* In a binary search tree, a prefix tree, or the like
The first strategy is O(1) and has good data locality relative to the
data pointed at. Eclipse OMR assumes you use this strategy (though it
supports other methods).
The second strategy is O(1) in the expected case. By default, it has poor
data locality -- though you can improve that with careful management of
virtual memory.
The third strategy is O(log N) in the expected case and has similar data
locality to the hashtable.
Obviously, we prefer the first strategy. Unfortunately, given an interior
pointer, you can't identify the base of its heap object in constant time.
I believe D's garbage collector uses a binary search tree.
D allows unrestricted interior pointers. This makes it difficult to
experiment with other garbage collectors, except for those designed for D
or C++.
Addressof
---------
D lets you take the address of anything:
class Paddock {
uint goats, sheep;
}
auto paddock = new Paddock;
auto goatPtr = &paddock.goats;
The creation of interior pointers this way can be detected at compile
time, if that benefits us.
Arrays
------
A D array is a struct containing a pointer and a length.
D encourages array slicing:
auto str = "hello world!";
foreach (word; str.split(' ')) {
// This is an interior pointer
assert(word.ptr >= str.ptr);
assert(word.ptr < str.ptr + str.length);
}
Array slicing is splendid and worthwhile, and eliminating it would be a
huge cost.
We can at least potentially detect non-pathological cases of this.
Closures
--------
Closures can create interior pointers, depending on the compiler's
implementation:
class Paddock
{
uint goats, sheep;
void trackCattle() {
cattleEnter.connect((flock) {
if (flock.isGoats) goats += flock.size;
if (flock.isSheep) sheep += flock.size;
});
}
}
The compiler might include a copy of the `this` pointer in the closure,
or it might include direct pointers to `goats` and `sheep`. Both produce
equivalent results aside from the creation of interior pointers.
When the closure is defined in a struct method, it is impossible to
ensure that the contents of the closure do not include an interior
pointer -- the struct itself might be a member of another heap object.
What options do we have?
========================
If we want to reduce interior pointers, we have a number of options. We
would likely need several to get reasonable results.
Forbid user-generated interior pointers
---------------------------------------
We might simply forbid taking the address of a member of a heap object,
turning it into a compile-time error, or merely leaving it as undefined
behavior.
This is unlikely to be feasible. D's lifeblood is array slicing, and that
produces interior pointers left and right.
Augment arrays
--------------
We can augment arrays to eliminate interior pointers for common
operations:
struct Array(T)
{
private T* baseptr;
size_t offset;
size_t length;
T* ptr() { return baseptr + offset; }
}
It is possible to construct an array from a pointer, and any pointer
might be an interior pointer, but this strategy would at least reduce the
number of interior pointers we have floating around.
Change struct this pointer to pointer+offset
--------------------------------------------
We convert struct `this` pointers to a pointer plus an offset in order to
eliminate interior pointers from closures involving struct members. This
probably requires adding a hidden field to each struct, which breaks C
compatibility. It also requires a lot of bookkeeping, and I'm not sure
that's always possible. Uninitialized struct values break this.
Mark allocations that might contain interior pointers
-----------------------------------------------------
Since we can detect many things that might create interior pointers at
compile time, we can explicitly mark them as such. Conversely, we can
prove that many things do not contain interior pointers, and we could
mark those.
When we scan a heap object that is marked as not containing interior
pointers, we can use a fast path to locate the metadata associated with
that heap object. Otherwise, we revert to the current, slow path.
This will probably be useless unless we take steps to reduce the number
of interior pointers. By default, if an aggregate contains a pointer or
an array, it contains a possible interior pointer. If we augment arrays
to avoid interior pointers, this might be useful.
This is always an optimization because we can tell immediately (O(1)
time, reading values that are already in memory) whether to use the slow
or fast path.
Mark potential interior pointers
--------------------------------
By managing virtual memory explicitly, we can use the high order bit of a
pointer into the GC heap to indicate whether that is potentially an
interior pointer. We can potentially memory map multiple preselected
virtual address ranges to the same physical range (I'm uncertain about
this). However, on systems where that is not possible, every pointer read
requires invoking a runtime function.
This is not an option on 32-bit systems, where we have limited address
space to use. It would reduce us to ~1.5GB of GC heap (since the OS
typically reserves 0.5GB of address space). ASLR adds more work.
If implemented, this would give us an O(1) means for determining whether
to use the slow case or not, assuming we also managed to identify the
construction of all potential interior pointers.
This is also problematic with C/C++ interop and inline assembly. You can
trivially construct interior pointers in either, and they would be
unmarked. If pointer reads require invoking a runtime function, that is a
barrier with passing pointers into C/C++ code.
Hashtable with search tree fallback
-----------------------------------
We can use a hashtable of allocations first and fall back to a binary
search tree if necessary.
This has us maintaining two search data structures when we'd prefer zero.
However, it's the most easily implemented version.
The utility of this is proportional to the relative frequency of interior
pointers. If they are rare, this strategy pays off. If they are common,
this strategy adds overhead for little gain. I anticipate that this would
be slower with current DMD.
More information about the Digitalmars-d
mailing list