[D-runtime] Precise garbage collection

Fri Jun 21 11:36:23 PDT 2013

Hi,

I want to make precise garbage collection as presented at the D 
conference ready for inclusion into druntime. I have recently updated 
the branch at https://github.com/rainers/druntime/tree/gcx_precise to 
merge with Leandros changes to the GC module layout.

Before creating a pull request, I'd like to hear opinions on whether 
this should be included, if other choices would be better and where it 
should be improved.

Precise garbage collection must be able to work with different memory 
areas, namely the heap, global/thread data in the binary and 
stack/registers per thread. I think we do not have a feasible solution 
for the latter pair, so let's focus on the former two.

1. Heap

In a nutshell, the implementation uses the RTInfo template to generate a 
bitmap of booleans that indicate whether the corresponding field in a 
struct/class might be a pointer or not. For built-in types, this 
information is predefined in object.d and the typeinfo.ti_* modules.
When memory is allocated, the TypeInfo for the allocated type is passed 
to the garbage collector, which copies the bitmap into memory alongside 
the pages allocated by the pool (another GCBits member with one bit per 
word). During a collection, scanning the heap can then look up this 
information to detect false pointers and avoid keeping garbage alive.

There are a number of issues that should be discussed:

a. the compiler sometimes does not generate the RTInfo for a struct, but 
instead generates 0/1 into the respective m_RTInfo field, depending on 
whether this struct contains references or not. (As far as I can tell 
this happens if it is only the backend that needs the TypeInfo, e.g. 
when generating an array concatenation call.) That's why there are 
rtinfoNoPointers/rtinfoHasPointers enums also used by the TypeInfos for 
the builtin types.

I consider this behaviour a bug that should be fixed, as it also 
disallows other usages of the RTInfo.

There is also an issue with not creating RTInfo for associative array 
types, but there is an "easy" workaround.

b. there are already other application of the RTInfo template, so there 
should probably be some way to combine multiple "generators". My idea is 
to let RTInfo!T point to some immutable(RTInfoData) struct that can then 
have multiple members for different generators. It'd be nice to change 
the return type of TypeInfo.rtInfo() from void* to 
immutable(RTInfoData)* to avoid casting.

c. The GC interface has to be extended to pass type information to
the GC. I guess just passing the respective TypeInfo pointer is obvious 
and correct.

d. The alternative to using a pointer bitmap alongside the pool memory 
is to store the TypeInfo pointer instead. It's major advantage is that 
it adds minimal performance overhead to the allocation.

d1. This needs more memory for small allocations, but less for larger. 
If it is stored in the same memory as the allocation itself, it should 
be at the end to avoid alignment issues (at the beginning it always adds 
16 bytes). This would mostly reuse unused memory due to the alignment of 
allocations to a power of 2. The worst effect would be for allocations 
of just below or equal to a power of 2. We could mitigate that effect by 
allowing other sizes of allocations aswell, or by not reserving space 
for the TypeInfo pointer if the block is allocated NOSCAN.

d2. Both dynamic and associative arrays currently allocate memory chunks 
and use them in a "non-standard" way that cannot be described by a 
simple TypeInfo. For example, dynamic arrays keep track of the allocated 
size of the array by placing it at the very end of allocations < 4k, but 
at the beginning for allocations >= 4k, moving the data to an offset of 
16 in the latter case. Associative arrays combine hash-list-node, key 
and value into a single allocation, sometimes even with the value type 
unavailable.

My implementation solves these issues by "emplacing" the appropriate 
type information at the given address, assuming pointers if no type 
information is available (using the new gc_emplace function).

If only a TypeInfo pointer is available, I'm not sure how this can be 
solved without _allocating_ a new TypeInfo. My best guess is that
it could be done if a generic array scan function could be called.

d3. These leads to the idea to generate a scanning function for each 
type instead of the pointer bitmap. Depending on the sparseness of 
pointers this can be shorter or can create a lot of code bloat. As a 
compromise a generic version might use the pointer bitmap, but it can be 
overloaded to implement arrays or even unions. On the downside, it makes 
using std.emplace much more complicated if you want precise scanning.

e. Currently, there is only one TypeInfo_Class for both a class instance 
and a class reference. There are currently assumptions made which one is 
meant depending on context (if used as a "root" it is usually an 
instance, when following TypeInfo.next it is assumed a reference). This 
does no longer work reliably when combined with "emplace". I think we 
need to add TypeInfo_Reference to describe the field that is a pointer 
to a class instance. To be honest I have no idea how much code might 
break by adding this indirection when traversing type information.

f. Currently, the precise GC can be versioned in/out, but I think making 
it a runtime option is preferable. A problem here is that the user has 
no way to change the default configuration before the GC is initialized.
I could imagine always starting with the precise GC, but the user can 
opt out anytime. The GC could then release any additional memory it has 
allocated for precise scanning. If the user opts back in, the data 
structures are rebuilt assuming everything allocated so far as void[].

Another option might be to make it a link time option, but that would 
mean the "standard" gc could not be part of the runtime library to be 
exchangeable.

This text is already too long, so I postpone discussing the global/tls 
data section until later.

Summing up the questions raised:

1. Assuming RTInfo generation is fixed, should we go for adding an 
indiretion through RTInfoData to allow multiple "generators"?

2. Are you ok with extending the gc_* function with TypeInfo parameters 
where appropriate?

3. Do you think the pointer bitmap aside the pool memory and adding 
gc_emplace is ok? Or should we investigate other alternatives first?

4. Would you object adding TypeInfo_Reference?

5. How do you prefer enabling/disabling precise garbage collection? 
Versioning/linking/runtime?

Best,
Rainer