The problem with the D GC
Oskar Linde
oskar.lindeREM at OVEgmail.com
Mon Jan 8 04:22:00 PST 2007
After having fought for a while with D programs with runaway memory leaks,
I've unfortunately had to come to the conclusion that the D GC is not
ready for production use. The problem is what I'd call "spurious
pointers". That is, random data (strings, numbers, image data, audio or
whatever) appears to the GC to be full of pointers to all over the
memory space.

Consider this simple program. It is designed to have a memory footprint
of about 20 MB and then continuously process data.

import std.random;

void main() {
    // The real memory use, ~20 MB
    uint[] data;
    data.length = 5_000_000;
    foreach(inout x; data)
        x = rand();

    while(1) {
        // simulate reading a few kB of data
        uint[] incoming;
        incoming.length = 1000 + rand() % 5000;
        foreach(inout x; incoming)
            x = rand();
        // do something with the data...
    }
}

The result may not be what you expect. The program will use up all available
memory (for me, crashing at about 2.7 GB of memory usage) and at the same
time run extremely slowly because the panicked GC scans all memory over
and over.

The reason is the 20 MB of random data and the small 32-bit memory
address range of 4 GB. To understand how bad this is: 20 MB of random
data will result in _each_ 4k memory page having, on average, 5 random
pointers to it. Those spurious pointers lay a dense minefield,
effectively disabling the GC.

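As a rough check of that figure, here is a minimal sketch of the arithmetic.
It assumes the random values are uniformly distributed over the 32-bit
range and uses a Poisson approximation for the share of pages that get hit.
It is written for a present-day D compiler, and the constant names are
made up for illustration:

```d
import std.math : exp;
import std.stdio : writefln;

void main()
{
    enum ulong addressSpace = 1UL << 32;   // 4 GB, the 32-bit address range
    enum ulong pageSize     = 4096;        // one 4k memory page
    enum ulong words        = 5_000_000;   // 20 MB of random uint values

    // Number of 4k pages in the address space.
    enum ulong pages = addressSpace / pageSize;

    // Each random word lands in a uniformly chosen page (assumption),
    // so the expected number of words "pointing" into a given page is:
    immutable double perPage = cast(double) words / pages;

    // Poisson approximation: chance that a page has at least one hit.
    immutable double hitFraction = 1.0 - exp(-perPage);

    writefln("pages in address space: %s", pages);
    writefln("expected spurious pointers per page: %.2f", perPage);
    writefln("fraction of pages with at least one: %.4f", hitFraction);
}
```

With these numbers the average comes out a little under 5 per page, so
under this model almost no page is free of spurious pointers.
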
This means that each time you rely on the GC (array appending/resizing,
Phobos function calls etc.), you have a potential memory leak. (That is,
unless all the program data is nothing but valid pointers/references or
all non-pointer data is hidden from the GC.)

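One way of hiding non-pointer data from the GC could look like the sketch
below. It assumes a present-day D runtime: the core.memory module and the
GC.BlkAttr.NO_SCAN attribute it uses are not part of the D release
discussed in this post. The buffer size is the one from the toy program
above.

```d
import core.memory : GC;
import std.random : uniform;

void main()
{
    enum size_t count = 5_000_000;   // same ~20 MB as the toy program

    // NO_SCAN tells the collector that this block holds no pointers,
    // so its contents are never mistaken for references.
    auto raw = cast(uint*) GC.malloc(count * uint.sizeof, GC.BlkAttr.NO_SCAN);
    uint[] data = raw[0 .. count];

    // Fill the block with random values, as in the toy program.
    foreach (ref x; data)
        x = uniform!uint();

    // The block is still owned by the GC and can be collected
    // once nothing references it any more.
}
```

The block stays under GC control, so nothing has to be freed by hand. The
other option would be to allocate the buffer outside the GC heap with C's
malloc, at the price of having to free it manually.
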
The above program is of course just a toy illustrating the phenomenon. In
a text processing program of mine the bulk of the data is short char[]
strings. The program still has runaway memory leaks leading to an
inevitable crash. I have absolutely no idea how to handle text
processing using the D-recommended char[] and CoW idiom without getting
severe memory leaks.

The definitive solution has to be a GC that only scans memory containing
pointers. Sean's patches to make the GC skip scanning memory known to
contain elements smaller than sizeof(void*) will probably help
tremendously. (I'd just have to make sure I'm not using dchar[] strings,
float or double data, or the DMD associative array implementation.)

--
/Oskar