D GC theory

Tue Feb 24 07:22:01 PST 2015

On Tuesday, 24 February 2015 at 08:39:02 UTC, Kagamin wrote:
> On Tuesday, 24 February 2015 at 00:30:43 UTC, ketmar wrote:
>> On Mon, 23 Feb 2015 21:11:22 +0000, Sativa wrote:
>>
>>> How hard would it be to modify D's GC to do the following two 
>>> things:
>>> 
>>> 1. Scan the heap in the BG on another thread/cpu for 
>>> compactification.
>>
>> needs read/write barriers added to generated code. a major 
>> slowdown for
>> ALL memory access.
>
> Only modifications of pointers, which introduce new cross-block 
> dependencies (so that GC knows to recheck the new dependency). 
> Other memory access goes without slowdown.

But this type of thinking is the reason why the current GC is in 
the state it is.

The compiler knows which pointers are "free" and which ones are 
"bound". Bound pointers are pointers that are not assigned freely 
by the user. e.g., a pointer to an array who's address never is 
arbitrarily set by the user is bound. The compiler knows where 
and how the pointer is assigned. Most pointers are this way.

Bound pointers are pointers the GC can easily clean up because it 
knows when and how they are used. In this way, if all pointers of 
a program were bound, the GC can work in the background and never 
pause the state to clean up. (essentially the compiler would need 
to insert special code) most pointers are bound pointers.

Free pointers are more difficult as they can, say, be randomly 
initiated and point to anywhere on the heap and have to be looked 
in a locked way. (to prevent them changing in the middle of some 
GC operation)

But if one distinguishes bound and free pointers(Easily done with 
a bit in the pointers) and has the compiler keep track of when 
free pointers are used(by having a dirty bit when they are 
written to), then one can more easily scan the heap in the 
background.

In fact, one could potentially get away from all synching issues 
by doing the following:

When ever free pointers are used a simple spin lock is used. The 
spin lock checks a flag in the free pointers table that signals 
that a pointer is being changed by the code. When this is true, 
the free pointers table is in a state of flux and can't be relied 
on. In the mean time, the GC can build up information about the 
heap for the bound pointers. It can figure out what needs to be 
changed, setup buffering(which can be done using bits in the 
pointer), etc all in the background because the bound pointers 
are "stable" and deterministically change.

When the free pointers table's dirty flag is unset it means that 
the free pointers are not changing in the program and the GC can 
lock the table using another flag. When the flag is set the spin 
lock kicks in and pauses the program while the GC is working on 
the free pointers table. (or to be more efficient, the program 
can yield to some other background task code)

By having multiple tables of free pointers one can reduce the 
overhead. The GC looks at on a piece at a time and locks on a 
fraction of the code at any point in time. The compiler can 
distribute the locks vs pages in an optimized way through 
profiling.