[RFC]Proposal for better garbage collection

Thu Feb 23 06:11:55 PST 2012

On Wed, 22 Feb 2012 15:32:37 -0600, H. S. Teoh <hsteoh at quickfur.ath.cx> wrote:
> On Wed, Feb 22, 2012 at 07:56:15PM +0100, Benjamin Thaut wrote:
> [...]
>> 2) Tracking references on the stack:
>>
>> The D compiler always needs to emit a full stack frame so that the
>> GC can walk up the stack at any time in the program. The stack frame
>> of every function generated by the D compiler starts which a
>> bitfield (usually the size of a machine register) where each bit
>> indicates that these bytes are a pointer / reference. The bitfield
>> needs to be large enough to cover the whole stack frame of the
>> function.
> [...]
>
> I was thinking about this a bit more, and I had an idea: why bother with
> storing bitfields on the stack? Any function's local pointer variables
> are known at compile-time. So store a function ID (probably a pointer)
> that maps to some static storage where this information is stored. Then
> we always only need 1 word of extra storage on the stack frame, and the
> GC can follow the pointer to get the info it needs. A recursively called
> function won't incur the cost of duplicated copies of bitfields, its ID
> points to same place. You can even have two different functions share
> the same ID if they have pointer variables in exactly the same places.
>
> The static storage can then be an array of relative stack offsets to the
> function's pointer variables, so the GC can easily use this info to find
> roots. No need to complicate the GC with manipulating bitfields, it's
> just an int[].
>
> If you want to get fancy, have the compiler reorder local variables so
> that pointers are clustered together in blocks, then in the static
> storage you can just encode pointer blocks by offset+length. (Although
> this may not help much with (pointer,length) pairs on the stack, like
> slices.) Or the compiler can reorder variables to maximize ID merges.
>
> The same thing can be done for scopes, since their local variables are
> also all known at compile-time.
>
>
> T

The break even point between bit-fields and pointers is 512 bytes. Although, if one is thinking about on stack storage this probably doesn't matter since for alignment purposes you'll always end up using at least 1 word if not 2. However, a lot of functions use less then 512 (or 1024) bytes of of stack space. I'd think it would be much more space efficient to have a separate bitfield for the stack. Cache efficiency should be about the same as a on stack representation, and scanning would, in theory, be quicker. IIRC, a separate bit-field was the approach used by at least one precise C GC.