Transitioning to a type aware Garbage Collector

Frits van Bommel fvbommel at REMwOVExCAPSs.nl
Tue Jan 23 12:33:41 PST 2007


Sean Kelly wrote:
> Paul Findlay wrote:
>> Walter Bright wrote:
>>> To improve GC performance, we need to transition to a GC that is 
>>> aware of the types of what it is allocating.
>> Does this mean the D garbage collector will be an exact (or precise) 
>> garbage collector?
> 
> If a block contains pointers then the entire block will still be scanned 
> instead of just the pointers in that block, so It will still be a 
> conservative collector but with fewer false positives.

I thought that by "a GC that is aware of the types of what it is 
allocating" Walter meant a (mostly) precise GC.

In most cases, the locations of pointers can be deduced from the type, 
as long as the compiler generates appropriate RTTI. The exceptions (that 
I'm aware of) are unions with overlapping pointers and non-pointers, and 
void[]s (deliberately).

Even types of variables in stack frames can be determined I think, as 
long as you format them in the standard way (return eip & previous ebp 
at the start). Then it's basically a linked list:
-----
struct StackFrame
{
     // local variables are *before* this data (stack grows down)
     StackFrame* ebp;    // pointer to next StackFrame
     size_t eip;         // return address
}
-----
If you do a lookup on eip you should be able to determine what variables 
(or at least their "pointerness") the stack frame contains, with some 
help from compiler-generated metadata. Unknown stack frames (and data 
between the defined ranges of two consecutive stackframes) should be 
considered ambiguous.
Again, ambiguous union members and (static) void arrays are exceptions.

An alternative to eip-based lookup would be to push an additional value 
at the beginning of each function call, but that would
(a) break with non-D functions
and
(b) cost performance outside of the GC (even if the GC is disabled)

> I think this 
> could be changed so that just the pointers were scanned, but there would 
> likely be a noticeable performance hit for doing so.

I'm not so sure about that "noticeable performance hit" you speak of. I 
think that would depend on the format of the RTTI and the relative 
amounts of pointers & non-pointers. (the latter is application-specific, 
by the way...)
To be sure about this we'd need to run some benchmarks and see how it 
fares in practice...

> If false positives 
> are still a concern (and I'm not convinced they will be), a better 
> approach may be to pursue a different GC design such as the CBGC 
> mentioned yesterday.

False positives are a concern for some people; look at the thread "The 
problem with the D GC" in digitalmars.D[1]. I'm pretty sure you've seen 
it already. For one thing, it contains about 10 of your posts ;).

Note: I haven't read that article on CBGC yet, but I plan to.


[1]: digitalmars.D:46407, 
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=46407



More information about the Digitalmars-d-announce mailing list