D GC theory
Sativa via Digitalmars-d
digitalmars-d at puremagic.com
Tue Feb 24 07:22:01 PST 2015
On Tuesday, 24 February 2015 at 08:39:02 UTC, Kagamin wrote:
> On Tuesday, 24 February 2015 at 00:30:43 UTC, ketmar wrote:
>> On Mon, 23 Feb 2015 21:11:22 +0000, Sativa wrote:
>>
>>> How hard would it be to modify D's GC to do the following two
>>> things:
>>>
>>> 1. Scan the heap in the BG on another thread/cpu for
>>> compactification.
>>
>> needs read/write barriers added to generated code. a major
>> slowdown for
>> ALL memory access.
>
> Only modifications of pointers, which introduce new cross-block
> dependencies (so that GC knows to recheck the new dependency).
> Other memory access goes without slowdown.
But this type of thinking is the reason why the current GC is in
the state it is.
The compiler knows which pointers are "free" and which ones are
"bound". Bound pointers are pointers that are not assigned freely
by the user. e.g., a pointer to an array who's address never is
arbitrarily set by the user is bound. The compiler knows where
and how the pointer is assigned. Most pointers are this way.
Bound pointers are pointers the GC can easily clean up because it
knows when and how they are used. In this way, if all pointers of
a program were bound, the GC can work in the background and never
pause the state to clean up. (essentially the compiler would need
to insert special code) most pointers are bound pointers.
Free pointers are more difficult as they can, say, be randomly
initiated and point to anywhere on the heap and have to be looked
in a locked way. (to prevent them changing in the middle of some
GC operation)
But if one distinguishes bound and free pointers(Easily done with
a bit in the pointers) and has the compiler keep track of when
free pointers are used(by having a dirty bit when they are
written to), then one can more easily scan the heap in the
background.
In fact, one could potentially get away from all synching issues
by doing the following:
When ever free pointers are used a simple spin lock is used. The
spin lock checks a flag in the free pointers table that signals
that a pointer is being changed by the code. When this is true,
the free pointers table is in a state of flux and can't be relied
on. In the mean time, the GC can build up information about the
heap for the bound pointers. It can figure out what needs to be
changed, setup buffering(which can be done using bits in the
pointer), etc all in the background because the bound pointers
are "stable" and deterministically change.
When the free pointers table's dirty flag is unset it means that
the free pointers are not changing in the program and the GC can
lock the table using another flag. When the flag is set the spin
lock kicks in and pauses the program while the GC is working on
the free pointers table. (or to be more efficient, the program
can yield to some other background task code)
By having multiple tables of free pointers one can reduce the
overhead. The GC looks at on a piece at a time and locks on a
fraction of the code at any point in time. The compiler can
distribute the locks vs pages in an optimized way through
profiling.
More information about the Digitalmars-d
mailing list