GC.forget() (Was: O(N) Garbage collection?)

dsimcha dsimcha at yahoo.com
Sat Feb 19 15:17:16 PST 2011


Yeah, I rebuilt the same model in my head over the past few hours (like 
you, I had a good mental model of the GC at one point but have slowly 
forgotten it).  Unfortunately it looks like there's no easy fix.  It 
also seems that gcbits are allocated at 1 bit for every 16 bytes of heap 
space regardless of block attributes, and that a scan touches all of 
them, NO_SCAN or not.  That's another place for O(N) to creep in.  About 
the only thing that would fix these issues is a rewrite, since 
addressing them would have ripple effects throughout the GC.
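
To put a rough number on that for a ~6 GB heap like the one in the 
original post (a back-of-the-envelope sketch that takes the 
1-bit-per-16-bytes figure above at face value, and ignores that, if I 
remember right, each pool keeps several such bit arrays):

// Rough arithmetic only; the constants are illustrative.
enum heapBytes = 6UL * 1024 * 1024 * 1024;  // ~6 GB heap
enum markBits  = heapBytes / 16;            // 402_653_184 bits
enum markBytes = markBits / 8;              // ~48 MB of gcbits touched per scan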

The obvious but ugly workaround is to use the C heap for large, 
long-lived data structures, especially ones that live until the program 
terminates (because then you don't have to worry about freeing them). 
This sucks horribly, though, because it forces you to think about memory 
management in exactly the kind of way the GC is meant to automate, and 
prevents the use of standard library functions and builtin array 
appending, etc. for building said data structures.
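
For what it's worth, a minimal sketch of that workaround might look 
something like this (cHeapString is a made-up helper; nothing is ever 
freed because the data is assumed to live until the process exits):

import core.stdc.stdlib : malloc;

// Carve string storage out of the C heap so the GC never sees it.
// Nothing is freed; the data lives until the process exits.
char[] cHeapString(size_t len)
{
    auto p = cast(char*) malloc(len);
    assert(p !is null, "malloc failed");
    return p[0 .. len];
}

And of course this is exactly the manual bookkeeping the GC is supposed 
to spare you, and appending to such a slice with ~= would quietly 
reallocate the data back onto the GC heap anyway.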

Maybe what we need to remedy this is a function called GC.forget(). 
It would let you build your data structures as regular GC objects. 
Then you'd call forget() on the head pointer, and the GC would remove 
all information about the data structure, transitively if it's 
something more than a simple array.  (Alternatively, it could move the 
blocks into a completely unscanned, uncollected forgottenData 
structure, in which case a remind() call could bring them back under 
normal GC management.)  The data would then act as if it had been 
allocated on the C heap.
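
To make the shape of the idea concrete, usage might look something like 
this (entirely hypothetical: neither forget() nor remind() exists in 
core.memory, and buildHugeTable() is a stand-in for whatever builds the 
long-lived data):

// Hypothetical usage; none of these GC calls exist today.
auto table = buildHugeTable();  // built with ordinary GC allocations
GC.forget(table);               // transitively untrack it; scans skip it from now on
// ... long-running work during which collections ignore 'table' ...
GC.remind(table);               // opt back in to normal GC management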

In a way this is poor man's generational GC.  It's poor man's in the 
sense that it requires the user to manually specify which objects are 
long-lived.  On the other hand, it could be better than generational GC 
in some ways, because you wouldn't need a bunch of compiler 
infrastructure (write barriers) or pay a performance penalty on every 
pointer assignment just to keep the GC's data structures up to date.

Then again, I don't imagine GC.forget() being particularly easy to 
implement either, except in the simplest case, where a huge array takes 
up a whole pool.  You'd have to do things like split a pool in two if 
forget() were called on an object in the middle of it.

On 2/19/2011 5:34 PM, Steven Schveighoffer wrote:
> On Sat, 19 Feb 2011 00:03:27 -0500, dsimcha <dsimcha at yahoo.com> wrote:
>
>> I've been trying out D's new 64-bit compiler and a serious barrier to
>> using it effectively seems to be abysmal garbage collection
>> performance with large heaps. It seems like the time for a garbage
>> collection to run scales linearly with the size of the heap *even if
>> most of the heap is marked as NO_SCAN*. I'm running a program with a
>> heap size of ~6GB, almost all of which is strings (DNA sequences),
>> which are not scanned by the GC. It's spending most of its time in GC,
>> based on pausing it every once in a while in GDB and seeing what's at
>> the top of the stack.
>>
>> Here's a test program and the results for a few runs.
>>
>> import std.stdio, std.datetime, core.memory, std.conv;
>>
>> void main(string[] args) {
>>     if(args.length < 2) {
>>         stderr.writeln("Need size.");
>>         return;
>>     }
>>
>>     immutable mul = to!size_t(args[1]);
>>     auto ptr = GC.malloc(mul * 1_048_576, GC.BlkAttr.NO_SCAN);
>>
>>     auto sw = StopWatch(autoStart);
>>     GC.collect();
>>     immutable msec = sw.peek.msecs;
>>     writefln("Collected a %s megabyte heap in %s milliseconds.",
>>         mul, msec);
>> }
>>
>> Outputs for various sizes:
>>
>> Collected a 10 megabyte heap in 1 milliseconds.
>> Collected a 50 megabyte heap in 4 milliseconds.
>> Collected a 200 megabyte heap in 16 milliseconds.
>> Collected a 500 megabyte heap in 41 milliseconds.
>> Collected a 1000 megabyte heap in 80 milliseconds.
>> Collected a 5000 megabyte heap in 397 milliseconds.
>> Collected a 10000 megabyte heap in 801 milliseconds.
>> Collected a 30000 megabyte heap in 2454 milliseconds.
>> Collected a 50000 megabyte heap in 4096 milliseconds.
>>
>> Note that these tests were run on a server with over 100 GB of
>> physical RAM, so a shortage of physical memory isn't the problem.
>> Shouldn't GC be O(1) with respect to the size of the unscanned portion
>> of the heap?
>
> Having recently constructed the GC model in my head (and it's rapidly
> deteriorating from there, believe me), here is a stab at what I see
> could be a bottleneck.
>
> The way the GC works is you have this giant loop (in pseudocode):
>
> bool changed = true;
> while(changed)
> {
>     changed = false;
>
>     foreach(memblock in heap)
>     {
>         if(memblock.marked && memblock.containsPointers)
>             foreach(pointer in memblock)
>             {
>                 auto memblock2 = heap.findBlock(pointer);
>                 if(memblock2 && !memblock2.marked)
>                 {
>                     memblock2.mark();
>                     changed = true;
>                 }
>             }
>     }
> }
>
> So you can see two things.  First, every iteration of the outer while
> loop runs through *all* memory blocks, even ones that do not contain
> pointers.  This has a non-negligible cost.
> Second, it looks like each pass of the while loop could end up marking
> only one block, or at least a very small number, so the algorithm's
> worst case degenerates into O(n^2).  This may not be happening, but it
> made me a little uneasy.
>
> The part of my mental model that has already deteriorated is whether
> marked blocks which have already been scanned get scanned again.  I
> would guess not, but if they are, that would be a really easy thing to
> fix.
>
> Also note that the findBlock function in the pseudocode is a binary
> search, I think, so each pointer lookup is O(lg(n)), not O(1).
>
> Like I said, I may not remember exactly how it works.
>
> -steve


