Interior pointers and fast GC

Fri Jan 13 20:37:01 PST 2017

Interior pointers are a barrier to performant garbage collection.

Here, I'm brainstorming about the problem and not really coming to any 
conclusions.

The world today
===============
D's garbage collector is significantly less performant than Java's, and 
this is partly the result of D's features. Interior pointers are costly.

What's the deal with interior pointers?
---------------------------------------
A heap object (for this discussion) is a series of bytes on the GC heap.

An interior pointer is a pointer to some memory location within a heap 
object, but not to the beginning of the heap object.

A garbage collector that never has to deal with internal pointers has 
several options for storing metadata related to a heap object:

 * In a segment of data immediately before the heap object
 * In a hashmap
 * In a binary search tree, a prefix tree, or the like

The first strategy is O(1) and has good data locality relative to the 
data pointed at. Eclipse OMR assumes you use this strategy (though it 
supports other methods).

The second strategy is O(1) in the expected case. By default, it has poor 
data locality -- though you can improve that with careful management of 
virtual memory.

The third strategy is O(log N) in the expected case and has similar data 
locality to the hashtable.

Obviously, we prefer the first strategy. Unfortunately, given an interior 
pointer, you can't identify the base of its heap object in constant time. 
I believe D's garbage collector uses a binary search tree.

D allows unrestricted interior pointers. This makes it difficult to 
experiment with other garbage collectors, except for those designed for D 
or C++.

Addressof
---------
D lets you take the address of anything:

  class Paddock {
      uint goats, sheep;
  }
  auto paddock = new Paddock;
  auto goatPtr = &paddock.goats;

The creation of interior pointers this way can be detected at compile 
time, if that benefits us.

Arrays
------
A D array is a struct containing a pointer and a length.

D encourages array slicing:

  auto str = "hello world!";
  foreach (word; str.split(' ')) {
      // This is an interior pointer
      assert(word.ptr >= str.ptr);
      assert(word.ptr < str.ptr + str.length);
  }

Array slicing is splendid and worthwhile, and eliminating it would be a 
huge cost.

We can at least potentially detect non-pathological cases of this.

Closures
--------
Closures can create interior pointers, depending on the compiler's 
implementation:

  class Paddock
  {
      uint goats, sheep;
      void trackCattle() {
          cattleEnter.connect((flock) {
              if (flock.isGoats) goats += flock.size;
              if (flock.isSheep) sheep += flock.size;
          });
      }
  }

The compiler might include a copy of the `this` pointer in the closure, 
or it might include direct pointers to `goats` and `sheep`. Both produce 
equivalent results aside from the creation of interior pointers.

When the closure is defined in a struct method, it is impossible to 
ensure that the contents of the closure do not include an interior 
pointer -- the struct itself might be a member of another heap object.

What options do we have?
========================
If we want to reduce interior pointers, we have a number of options. We 
would likely need several to get reasonable results.

Forbid user-generated interior pointers
---------------------------------------
We might simply forbid taking the address of a member of a heap object, 
turning it into a compile-time error, or merely leaving it as undefined 
behavior.

This is unlikely to be feasible. D's lifeblood is array slicing, and that 
produces interior pointers left and right.

Augment arrays
--------------
We can augment arrays to eliminate interior pointers for common 
operations:

  struct Array(T)
  {
      private T* baseptr;
      size_t offset;
      size_t length;
      T* ptr() { return baseptr + offset; }
  }

It is possible to construct an array from a pointer, and any pointer 
might be an interior pointer, but this strategy would at least reduce the 
number of interior pointers we have floating around.

Change struct this pointer to pointer+offset
--------------------------------------------
We convert struct `this` pointers to a pointer plus an offset in order to 
eliminate interior pointers from closures involving struct members. This 
probably requires adding a hidden field to each struct, which breaks C 
compatibility. It also requires a lot of bookkeeping, and I'm not sure 
that's always possible. Uninitialized struct values break this.

Mark allocations that might contain interior pointers
-----------------------------------------------------
Since we can detect many things that might create interior pointers at 
compile time, we can explicitly mark them as such. Conversely, we can 
prove that many things do not contain interior pointers, and we could 
mark those.

When we scan a heap object that is marked as not containing interior 
pointers, we can use a fast path to locate the metadata associated with 
that heap object. Otherwise, we revert to the current, slow path.

This will probably be useless unless we take steps to reduce the number 
of interior pointers. By default, if an aggregate contains a pointer or 
an array, it contains a possible interior pointer. If we augment arrays 
to avoid interior pointers, this might be useful.

This is always an optimization because we can tell immediately (O(1) 
time, reading values that are already in memory) whether to use the slow 
or fast path.

Mark potential interior pointers
--------------------------------
By managing virtual memory explicitly, we can use the high order bit of a 
pointer into the GC heap to indicate whether that is potentially an 
interior pointer. We can potentially memory map multiple preselected 
virtual address ranges to the same physical range (I'm uncertain about 
this). However, on systems where that is not possible, every pointer read 
requires invoking a runtime function.

This is not an option on 32-bit systems, where we have limited address 
space to use. It would reduce us to ~1.5GB of GC heap (since the OS 
typically reserves 0.5GB of address space). ASLR adds more work.

If implemented, this would give us an O(1) means for determining whether 
to use the slow case or not, assuming we also managed to identify the 
construction of all potential interior pointers.

This is also problematic with C/C++ interop and inline assembly. You can 
trivially construct interior pointers in either, and they would be 
unmarked. If pointer reads require invoking a runtime function, that is a 
barrier with passing pointers into C/C++ code.

Hashtable with search tree fallback
-----------------------------------
We can use a hashtable of allocations first and fall back to a binary 
search tree if necessary.

This has us maintaining two search data structures when we'd prefer zero. 
However, it's the most easily implemented version.

The utility of this is proportional to the relative frequency of interior 
pointers. If they are rare, this strategy pays off. If they are common, 
this strategy adds overhead for little gain. I anticipate that this would 
be slower with current DMD.