O(N) Garbage collection?

Steven Schveighoffer schveiguy at yahoo.com
Sat Feb 19 14:34:38 PST 2011


On Sat, 19 Feb 2011 00:03:27 -0500, dsimcha <dsimcha at yahoo.com> wrote:

> I've been trying out D's new 64-bit compiler and a serious barrier to  
> using it effectively seems to be abysmal garbage collection performance  
> with large heaps. It seems like the time for a garbage collection to run  
> scales linearly with the size of the heap *even if most of the heap is  
> marked as NO_SCAN*.  I'm running a program with a heap size of ~6GB,  
> almost all of which is strings (DNA sequences), which are not scanned by  
> the GC.  It's spending most of its time in GC, based on pausing it every  
> once in a while in GDB and seeing what's at the top of the stack.
>
> Here's a test program and the results for a few runs.
>
> import std.stdio, std.datetime, core.memory, std.conv;
>
> void main(string[] args) {
>      if(args.length < 2) {
>          stderr.writeln("Need size.");
>          return;
>      }
>
>      immutable mul = to!size_t(args[1]);
>      auto ptr = GC.malloc(mul * 1_048_576, GC.BlkAttr.NO_SCAN);
>
>      auto sw = StopWatch(AutoStart.yes);
>      GC.collect();
>      immutable msec = sw.peek.msecs;
>      writefln("Collected a %s megabyte heap in %s milliseconds.",
>          mul, msec);
> }
>
> Outputs for various sizes:
>
> Collected a 10 megabyte heap in 1 milliseconds.
> Collected a 50 megabyte heap in 4 milliseconds.
> Collected a 200 megabyte heap in 16 milliseconds.
> Collected a 500 megabyte heap in 41 milliseconds.
> Collected a 1000 megabyte heap in 80 milliseconds.
> Collected a 5000 megabyte heap in 397 milliseconds.
> Collected a 10000 megabyte heap in 801 milliseconds.
> Collected a 30000 megabyte heap in 2454 milliseconds.
> Collected a 50000 megabyte heap in 4096 milliseconds.
>
> Note that these tests were run on a server with over 100 GB of physical  
> RAM, so a shortage of physical memory isn't the problem.  Shouldn't GC  
> be O(1) with respect to the size of the unscanned portion of the heap?

Having recently constructed the GC model in my head (and it's rapidly  
deteriorating from there, believe me), here is a stab at what could be  
the bottleneck.

The way the GC works is you have this giant loop (in pseudocode):
bool changed = true;
while(changed)
{
    changed = false;

    foreach(memblock in heap)
    {
       if(memblock.marked && memblock.containsPointers)
          foreach(pointer in memblock)
          {
             auto memblock2 = heap.findBlock(pointer);
             if(memblock2 && !memblock2.marked)
             {
                memblock2.mark();
                changed = true;
             }
          }
    }
}
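
For comparison, here is a sketch (same pseudocode style as above, not the  
actual druntime implementation -- rootBlocks/push/pop are hypothetical  
names) of how an explicit worklist would avoid rescanning: each block is  
scanned at most once, so total mark time is proportional to the  
pointer-containing portion of the heap rather than passes * whole heap:

```
// Seed the worklist with blocks directly reachable from the roots.
auto worklist = rootBlocks();          // hypothetical
foreach(b; worklist) b.mark();

while(!worklist.empty)
{
    auto memblock = worklist.pop();
    if(!memblock.containsPointers)     // NO_SCAN blocks are never examined
        continue;
    foreach(pointer; memblock)
    {
        auto memblock2 = heap.findBlock(pointer);
        if(memblock2 && !memblock2.marked)
        {
            memblock2.mark();
            worklist.push(memblock2);  // queued so it is scanned exactly once
        }
    }
}
```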

So you can see two things.  First, every iteration of the outer while loop  
walks through *all* memory blocks, even those that contain no pointers.   
That bookkeeping alone has a non-negligible cost.
Second, each pass of the while loop could conceivably mark only one block  
(or at least a very small number), so in the worst case the algorithm  
degenerates into O(n^2).  This may not be happening in practice, but it  
made me a little uneasy.

The part of my mental model that has already deteriorated is whether  
marked blocks which have already been scanned get scanned again on later  
passes.  I would guess not, but if they do, that would be a really easy  
thing to fix.

Also note that the findBlock function is a binary search, I think.  So  
you are talking O(lg(n)) per pointer lookup, not O(1).
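
In pool terms, that lookup might look something like this (again just a  
sketch, assuming pools are kept sorted by base address; the pools array  
and the baseAddr/topAddr fields are made-up names, not druntime's):

```
// Find the pool owning an (interior) pointer by binary search over
// pools sorted by base address: O(lg n) in the number of pools, and
// it runs once per pointer scanned.
Pool* findBlock(void* p)
{
    size_t lo = 0, hi = pools.length;
    while(lo < hi)
    {
        immutable mid = (lo + hi) / 2;
        if(p < pools[mid].baseAddr)
            hi = mid;
        else if(p >= pools[mid].topAddr)
            lo = mid + 1;
        else
            return &pools[mid];        // baseAddr <= p < topAddr
    }
    return null;                       // not a GC pointer
}
```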

Like I said, I may not remember exactly how it works.

-steve


More information about the Digitalmars-d mailing list