Potential GSoC project - GC improvements

Sat Mar 12 17:56:39 PST 2016

On Sat, 12 Mar 2016 13:23:35 -0800, Adam Wilson wrote:

To start off, let's talk terminology. You seem to be using nonstandard 
terminology and possibly misunderstanding standard terminology.

A GC scan is the mark phase of a mark/sweep collector (and specifically 
the part where the GC examines roots and allocated memory, if that 
matters). The GC searches for pointers to GC memory. Simple GCs only need 
to record whether there is a pointer to a given allocation, not where 
those pointers are coming from.

A conservative GC is one that will encounter some things that aren't 
pointers but must be scanned as if they were pointers. In D, for 
instance, a union between a pointer and a size_t forces collectors to be 
conservative.

A false pointer is something that a conservative GC must treat as a 
pointer and appears to point to an object allocated by that GC. For 
instance, unions in D allow us to create false pointers.

A precise GC is a GC that never mistakes a non-pointer for a pointer.

A moving GC is a GC that can relocate objects.

A generational GC is one that can perform a collection on part of the 
heap without dealing with the full heap. Those parts tend to be organized 
by how long an object has survived. This does not require a precise or 
moving GC, though it works better with a precise, moving GC.

A write barrier is a way to record which sections of memory have been 
altered since the last GC collection. This lets the GC scan only the 
changed data -- useful to any GC, especially useful to generational ones.

A "partially moving" GC does not exist, as far as I know.

> Jeremy DeHaan wrote:
>> On Saturday, 12 March 2016 at 08:50:06 UTC, Adam Wilson wrote:
>>> If I may make a suggestion. The lock free work is unlikely to require
>>> the entirety of GSoC. And the precise GC is the next most important
>>> thing on your list and will have the biggest impact on GC performance.
>>
>> Rainer has two different precise GC's in pull requests right now and
>> both are slower than the current one unless there are false pointers. 
>> I would expect anything I come up with to largely act the same. The
>> reason I keep pushing for a generational garbage collector is because I
>> think it would be where we would see the most benefit in terms of
>> general performance.
>>
>>
> I would contend that it's not quite that simple. Of course the precise
> GC is slower, it's scanning more stuff.

Precise GCs are slower because they must reference some data telling them 
which items on the stack and in allocations represent pointers and which 
do not. This data takes up cache space, and having less cache for memory 
being scanned slows things down. Correlating two pieces of data takes 
extra time, too.

The hope with a precise GC is to avoid false pointers. A false pointer to 
an object that holds references to many other objects can be costly, both 
in GC time and in excess memory usage.

In D, we have a tiny bit of type information surfaced to the GC to reduce 
the incidence of false pointers. We could additionally take advantage of 
the fact that numbers tend to be small and people tend to use 4 to 8 byte 
numbers: we could arrange our address space to avoid ranges where false 
pointers are likely. But that's probably unnecessary.

> The current conservative GC by
> definition can't figure out large chunks of memory so it won't even
> bother scanning them.

It must scan anything that might be a pointer. A precise GC reduces the 
number of things that might be pointers but aren't to zero. Therefore a 
precise GC scans less.

If a conservative GC didn't scan something that might be a pointer, it 
would sometimes free an object that still had live references to it. This 
would result in memory corruption or segmentation faults.

> What it does to those things it can't scan. And
> what it can't scan it has to pin, which means the memory can't be moved.

If a moving GC cannot precisely identify all pointers to an object, it is 
not allowed to move that object. If it can, it is allowed to move that 
object.

A conservative GC can be a moving GC. It must classify things as 
"definitely pointer", "maybe pointer", and "not pointer". It cannot move 
an object with a "maybe pointer" pointing to it, but it can move an 
object with several "definitely pointers" pointing to it.

In D, thanks to unions, we can never create a fully precise collector, 
but we can create a moving collector. Objects pointed to from a union 
can't be moved, but few objects will be pointed to from a union.

> You can't build a fully generational GC until you can move everything.

You can build a more efficient generational GC if you can move objects. 
However, it's not impossible to build a non-moving generational GC. 
Instead of having large blocks of address space for each generation, you 
would have each Pool indicate its generation. Then you have to find the 
Pool for a pointer and check that flag.

That's comparatively slow, but it'd still be faster to do a Gen 0 scan 
that way than to do a full scan with D's current collector (assuming you 
implement write barriers).

> They talk about this problem in the books. However, once you have a
> precise GC, a fully moving GC becomes possible and only then do the
> performance benefits of a generational GC become realized.

The performance benefit of a moving and generational GC over a non-moving 
generational GC is that you can have a contiguous section of address 
space for each generation, allocated densely. You get this with less 
overhead.

However, a moving GC can be a pessimization for user code. If you 
allocate several objects in sequence, with many GCs you'll get memory 
allocated for them that is contiguous or nearly contiguous. For small 
objects, this is probably going to be on the same page of memory. But a 
moving GC is free to move them to different pages. So you need to 
exercise caution.

>> I have concerns about doing any compaction
>> though, mostly because D can have both references and pointers to
>> objects, and I don't know how we would want to go about updating
>> pointers. Especially since pointers to D objects can exists in C and
>> C++
>> code.
>>
>>
> Erm, a generational GC is, in practice, a moving GC as it must move the
> blobs around from the various heaps.

The Boehm collector is generational, conservative, and not moving. You 
probably conflate "moving" and "generational" because of the JVM.

> As for the C/C++ code problem various solutions have been proposed here.
> For one, D knows the difference between the two, so pinning is possible
> there.

There's nothing in the type system that says whether you can pass a 
variable to a C/C++ function. So, no, D *doesn't* know the difference 
between them. And there's no GC.pin function, so people aren't currently 
telling your GC when something's accessible to C/C++ code.

That means a moving collector in D will produce segmentation faults and 
memory corruption in existing, currently working code.

> AFIAK each OS allows you to specify multiple heaps

You get one address space. You get to use mmap to ask for memory at a 
given address. There's no heap_t data structure to provide to malloc or 
mmap.

You might have been confused by literature referring to multiple heaps. 
That's just a single allocator deciding to divide its address space 
logically. There's no OS-level enforcement.

> A concurrent GC without precision is mostly useless

The Boehm collector is not precise. It is concurrent. Its concurrency 
reduces the amount of time a GC collection takes. This is mostly 
orthogonal to the benefits of a precise collector in reducing the amount 
of memory a process uses.

> because the GC roots
> are primarily derived from the stack which the thread was spawned and
> the stack roots are not scanned in a conservative GC.

They *are* scanned in a conservative GC. If you don't scan memory that 
might contain pointers (such as the stack), your GC will inevitably cause 
segmentation faults and memory corruption. D's GC is conservative and 
does not cause segmentation faults or memory corruption, therefore it 
must scan the stack.