O(N) Garbage collection?

dsimcha dsimcha at yahoo.com
Fri Feb 18 21:03:27 PST 2011


I've been trying out D's new 64-bit compiler, and a serious barrier to 
using it effectively seems to be abysmal garbage collection performance 
with large heaps.  The time a garbage collection takes appears to scale 
linearly with the size of the heap *even if most of the heap is marked 
as NO_SCAN*.  I'm running a program with a heap size of ~6 GB, almost 
all of which is strings (DNA sequences) that are not scanned by the GC.  
It spends most of its time in the GC, judging by pausing it every so 
often in GDB and looking at what's at the top of the stack.
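
As a sanity check: plain char arrays, which is roughly what the DNA 
sequences are, should already get the NO_SCAN attribute automatically, 
since their element type contains no pointers.  A minimal (untested) 
sketch to confirm that, with names chosen just for illustration:

import core.memory, std.stdio;

void main() {
    // char[] elements contain no pointers, so the runtime should mark
    // the underlying GC block NO_SCAN on its own.
    auto seq = new char[](1024);
    writefln("NO_SCAN set: %s",
        (GC.getAttr(seq.ptr) & GC.BlkAttr.NO_SCAN) != 0);
}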

Here's a test program and the results for a few runs.

import std.stdio, std.datetime, core.memory, std.conv;

void main(string[] args) {
    if(args.length < 2) {
        stderr.writeln("Need size.");
        return;
    }

    // Heap size to allocate, in megabytes.
    immutable mul = to!size_t(args[1]);

    // One big allocation that the GC will never scan for pointers.
    auto ptr = GC.malloc(mul * 1_048_576, GC.BlkAttr.NO_SCAN);

    // Time a single collection over that heap.
    auto sw = StopWatch(AutoStart.yes);
    GC.collect();
    immutable msec = sw.peek.msecs;
    writefln("Collected a %s megabyte heap in %s milliseconds.",
        mul, msec);
}
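
To reproduce: compile for 64-bit and pass the heap size in megabytes on 
the command line, e.g. (the file name gctest.d is just a placeholder):

dmd -m64 gctest.d
./gctest 5000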

Outputs for various sizes:

Collected a 10 megabyte heap in 1 milliseconds.
Collected a 50 megabyte heap in 4 milliseconds.
Collected a 200 megabyte heap in 16 milliseconds.
Collected a 500 megabyte heap in 41 milliseconds.
Collected a 1000 megabyte heap in 80 milliseconds.
Collected a 5000 megabyte heap in 397 milliseconds.
Collected a 10000 megabyte heap in 801 milliseconds.
Collected a 30000 megabyte heap in 2454 milliseconds.
Collected a 50000 megabyte heap in 4096 milliseconds.
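
That works out to roughly 0.08 ms per megabyte at every size tested 
(80 ms / 1000 MB, 801 ms / 10000 MB, 4096 ms / 50000 MB), i.e. the 
collection time is cleanly linear in the heap size.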

Note that these tests were run on a server with over 100 GB of physical 
RAM, so a shortage of physical memory isn't the problem.  Shouldn't GC 
be O(1) with respect to the size of the unscanned portion of the heap?

