Type Aware GC Questions

Mon Jan 29 23:12:48 PST 2007

Sean Kelly wrote:
> Frank's suggestion seems reasonable.  There's no way to pass a 'mask' to 
> the GC for a specific block at the moment.
> 

I think some GC designs have exactly that.  I've thought about this and 
it seems like two cases should cover most of it:

1. Short mask that repeats (to cover larger blocks).
2. Number that describes how much of the class contains pointers.

Item #1 would be a mask with one bit per pointer-sized slice of memory.  
The repeat part means that (for example) an array of structs can be 
represented by one mask the size of a single struct, applied again for 
each element.

All arrays of non-pointer primitives could be represented as '0', sz=1.
Arrays of arrays could be represented as '01' or '10', sz=2, depending 
on whether the pointer comes first in each element.

Small structs (maybe under 16 words) with identical layouts could share 
masks to reduce duplication.
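
To make item #1 concrete, here is a rough sketch in C of how a scanner 
might apply a short repeating mask to a block.  The names (RepeatMask, 
scan_block, the visit callback) are made up for illustration and are 
not part of any existing GC, and one byte per mask entry is used 
instead of packed bits just to keep it readable:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor: mask[i] != 0 means word i of each repeat
 * period may hold a pointer. */
typedef struct {
    const unsigned char *mask; /* one entry per pointer-sized word */
    size_t mask_len;           /* "sz" in the text: words per repeat */
} RepeatMask;

/* Calls visit() on every word that the mask marks as a possible
 * pointer, and returns how many words were visited. */
static size_t scan_block(const uintptr_t *block, size_t n_words,
                         const RepeatMask *m,
                         void (*visit)(uintptr_t word, void *ctx),
                         void *ctx)
{
    size_t visited = 0;
    assert(m->mask_len > 0);
    for (size_t i = 0; i < n_words; i++) {
        /* The mask wraps around, so one short mask covers the block. */
        if (m->mask[i % m->mask_len]) {
            if (visit)
                visit(block[i], ctx);
            visited++;
        }
    }
    return visited;
}
```

With mask '0', sz=1 nothing is visited; with '01', sz=2 every second 
word is visited, which is the array-of-arrays case above.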

Item #2 allows allocations like this, which are common in C:

#define NUM 100

struct Foo {
     int a;
     int * b;
     char * d;
     char buf[NUM];
};

The "size" lets the 'buf' section be left out of the mask.  However, 
because there may be arrays of Foo, it would be good to combine the 
mask-size and repeat-size elements, so that a short mask can describe a 
long object and still repeat at long intervals.
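
A sketch of that combined form, using the Foo struct from above.  The 
PrefixMask type and both helper functions are hypothetical, and the 
{0,1,1} mask assumes a typical ILP32 or LP64 layout where 'a' plus any 
padding fills the first word:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM 100

struct Foo {
    int a;
    int *b;
    char *d;
    char buf[NUM];
};

/* Hypothetical descriptor: a short mask for the front of each element,
 * plus a separate repeat interval for the whole element. */
typedef struct {
    const unsigned char *mask;
    size_t mask_words;   /* words covered by the mask */
    size_t stride_words; /* words per element (repeat interval) */
} PrefixMask;

/* Assumed layout: word 0 = a (+ padding), word 1 = b, word 2 = d. */
static const unsigned char foo_mask[] = {0, 1, 1};
static const PrefixMask foo_desc = {
    foo_mask, 3, sizeof(struct Foo) / sizeof(void *)
};

/* Nonzero if the given word of an array of elements may be a pointer. */
static int may_hold_pointer(const PrefixMask *m, size_t word_index)
{
    assert(m->stride_words > 0 && m->mask_words <= m->stride_words);
    size_t w = word_index % m->stride_words;
    /* Everything past the mask (here: buf) is known pointer-free. */
    return w < m->mask_words && m->mask[w] != 0;
}

/* Counts the words a scanner would have to look at in n_words. */
static size_t count_pointer_words(const PrefixMask *m, size_t n_words)
{
    size_t n = 0;
    for (size_t i = 0; i < n_words; i++)
        n += (size_t)may_hold_pointer(m, i);
    return n;
}
```

The mask stays three entries long no matter how big NUM is; only the 
stride grows.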

A corollary of #2 is that the definitions of most classes could be 
'sorted' to move pointy stuff toward the top.  This would be 'soft', i.e. 
some things can't be rearranged (internal struct members, arrays) even in 
a class.  It would have two nice effects:

1. Sorted classes will tend to use far fewer mask combinations.
2. GC runs can find all class pointers in fewer L1/L2 cache lines, 
because the pointers are packed.
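
A small illustration of the sorting effect, again with invented names.  
The function computes how many leading words a scanner has to cover 
before the rest of the object is known to be pointer-free:

```c
#include <assert.h>
#include <stddef.h>

/* Position of the last possible-pointer word, plus one.  Words at or
 * beyond this index never need to be scanned. */
static size_t pointer_prefix_words(const unsigned char *mask, size_t len)
{
    size_t prefix = 0;
    assert(mask != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (mask[i])
            prefix = i + 1;
    }
    return prefix;
}

/* The same three pointers in a seven-word object, in declaration order
 * and after a hypothetical sort. */
static const unsigned char unsorted_mask[7] = {0, 1, 0, 0, 1, 0, 1};
static const unsigned char sorted_mask[7]   = {1, 1, 1, 0, 0, 0, 0};
```

The unsorted layout needs all seven words scanned, the sorted one only 
the first three, and every sorted class with three pointers ends up 
with the same '111' prefix.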

Both of these are strict extensions of the current 'might have' bit, 
since "might_have" is just the mask '1', sz=1, and 'might_not' is the 
mask '0', sz=1.

Automatic (and free) memory consistency checking (maybe):

Another neat idea: you can introduce a 'must-have-pointer' marker for 
the field mask, which means this field *must* be a valid pointer or null 
(in debug mode only, or maybe only if enabled by the user?).

This would provide across-the-board pointer checking (a la valgrind) for 
essentially zero cost (but only at GC time), because a conservative GC 
must already be able to test if a pointer is within the allocated 
heap space in order to do its mark/sweep, right?

Either the pointer points to a valid object, or you get an assert.  But 
this would need a special case for deleted objects, in case the pointer 
aliasing is known but harmless.
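
Roughly what I have in mind, as a sketch.  The three-valued field kind, 
HeapRange and check_block are all invented here, and the "valid" test is 
only a heap range check; a real collector would look up the actual 
object and would also need the deleted-object special case:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-word markers: the mask grows from 1 bit to 2. */
enum FieldKind { FIELD_NO_PTR = 0, FIELD_MAYBE_PTR = 1, FIELD_MUST_PTR = 2 };

/* Half-open address range [lo, hi) standing in for the GC heap. */
typedef struct {
    uintptr_t lo;
    uintptr_t hi;
} HeapRange;

static int in_heap(const HeapRange *h, uintptr_t p)
{
    return p >= h->lo && p < h->hi;
}

/* Returns the index of the first must-pointer word that is neither
 * null nor inside the heap, or n_words if the block is consistent.
 * The kinds array repeats across the block like the masks above. */
static size_t check_block(const uintptr_t *block, size_t n_words,
                          const unsigned char *kinds, size_t kinds_len,
                          const HeapRange *heap)
{
    assert(kinds_len > 0);
    for (size_t i = 0; i < n_words; i++) {
        if (kinds[i % kinds_len] != FIELD_MUST_PTR)
            continue; /* no guarantee claimed for this word */
        if (block[i] != 0 && !in_heap(heap, block[i]))
            return i; /* a debug build would assert here */
    }
    return n_words;
}
```

The range test is the one the mark phase already does, so the extra 
cost is one comparison on the field kind per word.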

Kevin