[D-runtime] Precise garbage collection

Sat Jun 22 00:11:17 PDT 2013

On 6/21/2013 11:36 AM, Rainer Schuetze wrote:
> Hi,
>
> I want to make precise garbage collection as presented at the D conference 
> ready for inclusion into druntime. I have recently updated the branch at 
> https://github.com/rainers/druntime/tree/gcx_precise to merge with Leandros 
> changes to the GC module layout.

Yes.

>
> Before creating a pull request, I'd like to hear opinions on whether this 
> should be included, if other choices would be better and where it should be 
> improved.
>
> Precise garbage collection must be able to work with different memory areas, 
> namely the heap, global/thread data in the binary and stack/registers per 
> thread. I think we do not have a feasible solution for the latter pair, so 
> let's focus on the former two.

I believe we can successfully ignore precise scanning of the stack.

>
> 1. Heap
>
> In a nutshell, the implementation uses the RTInfo template to generate a 
> bitmap of booleans that indicate whether the corresponding field in a 
> struct/class might be a pointer or not. For built-in types, this information 
> is predefined in object.d and the typeinfo.ti_* modules.
> When memory is allocated, the TypeInfo for the allocated type is passed to the 
> garbage collector, which copies the bitmap into memory alongside the pages 
> allocated by the pool (another GCBits member with one bit per word).

I assume you mean 1 bit per (void*).sizeof bytes.

> During a collection, scanning the heap can then look up this information to 
> detect false pointers and avoid keeping garbage alive.
>
> There are a number of issues that should be discussed:
>
> a. the compiler sometimes does not generate the RTInfo for a struct, but 
> instead generates 0/1 into the respective m_RTInfo field, depending on whether 
> this struct contains references or not. (As far as I can tell this happens if 
> it is only the backend that needs the TypeInfo, e.g. when generating an array 
> concatenation call.) That's why there are rtinfoNoPointers/rtinfoHasPointers 
> enums also used by the TypeInfos for the builtin types.
>
> I consider this behaviour a bug that should be fixed, as it also disallows 
> other usages of the RTInfo.

I'm pretty sure it's a bug, too.

>
> There is also an issue with not creating RTInfo for associative array types, 
> but there is an "easy" workaround.
>
> b. there are already other application of the RTInfo template, so there should 
> probably be some way to combine multiple "generators". My idea is to let 
> RTInfo!T point to some immutable(RTInfoData) struct that can then have 
> multiple members for different generators. It'd be nice to change the return 
> type of TypeInfo.rtInfo() from void* to immutable(RTInfoData)* to avoid casting.

I don't think the casting is a problem, and I like the self-documenting aspect 
of void* making clear that the compiler doesn't know or care what that data 
actually is.

>
> c. The GC interface has to be extended to pass type information to
> the GC. I guess just passing the respective TypeInfo pointer is obvious and 
> correct.

Yup.

>
> d. The alternative to using a pointer bitmap alongside the pool memory is to 
> store the TypeInfo pointer instead. It's major advantage is that it adds 
> minimal performance overhead to the allocation.

But it'll subtract performance from the scanner. Which is better can only be 
determined with testing.

>
> d1. This needs more memory for small allocations, but less for larger.

Storing the bitmap is a fixed 128/64 bytes for a page for 32/64 bit pointers. 
Notably for 64 bit pointers, it's only one pointer size. I think copying the 
bitmap is a net win, at least on size.

> If it is stored in the same memory as the allocation itself, it should be at 
> the end to avoid alignment issues (at the beginning it always adds 16 bytes). 
> This would mostly reuse unused memory due to the alignment of allocations to a 
> power of 2. The worst effect would be for allocations of just below or equal 
> to a power of 2. We could mitigate that effect by allowing other sizes of 
> allocations aswell, or by not reserving space for the TypeInfo pointer if the 
> block is allocated NOSCAN.

I think it should be stored separately. Storing it with the allocation is 
inefficient, as (for example) only 2 bits are needed for a 16 byte alloc. Also, 
storing it with the allocation precludes "extending" an allocation in place if 
the next chunk is free.

>
> d2. Both dynamic and associative arrays currently allocate memory chunks and 
> use them in a "non-standard" way that cannot be described by a simple 
> TypeInfo. For example, dynamic arrays keep track of the allocated size of the 
> array by placing it at the very end of allocations < 4k, but at the beginning 
> for allocations >= 4k, moving the data to an offset of 16 in the latter case. 
> Associative arrays combine hash-list-node, key and value into a single 
> allocation, sometimes even with the value type unavailable.
>
> My implementation solves these issues by "emplacing" the appropriate type 
> information at the given address, assuming pointers if no type information is 
> available (using the new gc_emplace function).
>
> If only a TypeInfo pointer is available, I'm not sure how this can be solved 
> without _allocating_ a new TypeInfo. My best guess is that
> it could be done if a generic array scan function could be called.
>
> d3. These leads to the idea to generate a scanning function for each type 
> instead of the pointer bitmap. Depending on the sparseness of pointers this 
> can be shorter or can create a lot of code bloat. As a compromise a generic 
> version might use the pointer bitmap, but it can be overloaded to implement 
> arrays or even unions. On the downside, it makes using std.emplace much more 
> complicated if you want precise scanning.

I spent some time thinking about this a while back. While it is a very 
attractive idea, I suspected the performance would be terrible as it would 
require two or more indirect jumps per chunk. The scanning code all needs to be 
present, in the cache, and predictable for high speed scanning.

>
> e. Currently, there is only one TypeInfo_Class for both a class instance and a 
> class reference. There are currently assumptions made which one is meant 
> depending on context (if used as a "root" it is usually an instance, when 
> following TypeInfo.next it is assumed a reference). This does no longer work 
> reliably when combined with "emplace". I think we need to add 
> TypeInfo_Reference to describe the field that is a pointer to a class 
> instance. To be honest I have no idea how much code might break by adding this 
> indirection when traversing type information.
>
> f. Currently, the precise GC can be versioned in/out, but I think making it a 
> runtime option is preferable. A problem here is that the user has no way to 
> change the default configuration before the GC is initialized.
> I could imagine always starting with the precise GC, but the user can opt out 
> anytime. The GC could then release any additional memory it has allocated for 
> precise scanning. If the user opts back in, the data structures are rebuilt 
> assuming everything allocated so far as void[].

I don't think a runtime option is practical. It would be so "expert only" that 
those experts should be able to rebuild the library as required.

>
> Another option might be to make it a link time option, but that would mean the 
> "standard" gc could not be part of the runtime library to be exchangeable.

Users really only want one gc.

>
>
> This text is already too long, so I postpone discussing the global/tls data 
> section until later.
>
> Summing up the questions raised:
>
> 1. Assuming RTInfo generation is fixed, should we go for adding an indiretion 
> through RTInfoData to allow multiple "generators"?

No. Indirection makes it slower.

>
> 2. Are you ok with extending the gc_* function with TypeInfo parameters where 
> appropriate?

Yes.

>
> 3. Do you think the pointer bitmap aside the pool memory and adding gc_emplace 
> is ok? Or should we investigate other alternatives first?
>
> 4. Would you object adding TypeInfo_Reference?

Don't have a good answer for that.

>
> 5. How do you prefer enabling/disabling precise garbage collection? 
> Versioning/linking/runtime?

We should pick precise collection and commit to it. And then add Leandro's 
concurrent collector on top!