The problem with the D GC

Oskar Linde oskar.lindeREM at OVEgmail.com
Mon Jan 8 04:22:00 PST 2007


After fighting for a while with D programs that have runaway memory
leaks, I've unfortunately had to conclude that the D GC is not ready
for production use. The problem is what I'd call "spurious pointers":
random data (strings, numbers, image data, audio or whatever) that
appears to the GC to be full of pointers to locations all over the
memory space.

Consider this simple program. It is designed to have a memory footprint 
of about 20 MB and then process data continuously.

import std.random;

void main() {
    // The real memory use, ~20 MB
    uint[] data;
    data.length = 5_000_000;
    foreach(inout x; data)
        x = rand();
    while(1) {
        // simulate reading a few KB of data
        uint[] incoming;
        incoming.length = 1000 + rand() % 5000;
        foreach(inout x; incoming)
            x = rand();
        // do something with the data...
    }
}

The result may not be what you expect. The program will use up all
available memory (for me, crashing at about 2.7 GB of memory usage) and
at the same time run extremely slowly, because the panicked GC scans all
memory over and over.

The reason is the 20 MB of random data combined with the small 32-bit
address range of 4 GB. To understand how bad this is: 20 MB of random
data results in _each_ 4 KB memory page having, on average, about 5
random pointers into it. Those spurious pointers lay a dense minefield,
effectively disabling the GC.
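
As a rough check of that figure, here is a minimal sketch of the
arithmetic. It assumes the 5_000_000 uint values are uniformly random
over the full 32-bit range and that pages are 4096 bytes; the function
name spuriousPointersPerPage is only illustrative.

```d
import std.stdio;

// Average number of random words that happen to point into any one page,
// assuming every word is uniformly distributed over the address space.
double spuriousPointersPerPage(ulong words, ulong addressSpace, ulong pageSize) {
    ulong pages = addressSpace / pageSize;   // 1_048_576 pages for 4 GB / 4 KB
    return cast(double) words / pages;
}

void main() {
    ulong words = 5_000_000;        // uint elements in data (~20 MB)
    ulong addressSpace = 1UL << 32; // 4 GB, 32-bit address range (assumed)
    ulong pageSize = 4096;          // 4 KB pages (assumed)
    writefln("%.2f spurious pointers per page",
             spuriousPointersPerPage(words, addressSpace, pageSize));
}
```

With these assumptions the result is roughly 4.8 per page, which is
where the "about 5" above comes from.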

This means that each time you rely on the GC (array appending/resizing, 
Phobos function calls, etc.), you have a potential memory leak. (That is, 
unless all the program data is nothing but valid pointers/references or 
all non-pointer data is hidden from the GC.)

The above program is of course just a toy illustrating the phenomenon.
In a text processing program of mine, the bulk of the data is short
char[] strings. The program still has runaway memory leaks leading to an
inevitable crash. I have absolutely no idea how to handle text
processing using the D-recommended char[] and CoW idiom without getting
severe memory leaks.

The definitive solution has to be a GC that only scans memory containing 
pointers. Sean's patches to make the GC skip scanning memory known to 
contain elements smaller than sizeof(void*) will probably help 
tremendously. (I'd just have to make sure I'm not using dchar[] strings, 
float or double data, or the DMD associative array implementation.)

-- 
/Oskar


