The problem with the D GC
Lionello Lunesu
lio at lunesu.remove.com
Mon Jan 8 05:15:19 PST 2007
Oskar Linde wrote:
> After having fought a while with D programs with runaway memory leaks,
> I've unfortunately had to come to the conclusion that the D GC is not
> ready for production use. The problem is what I'd call "spurious
> pointers". That is, random data (strings, numbers, image data, audio or
> whatever) appears to the GC to be full of pointers to locations all over
> the memory space.
>
> Consider this simple program. It is designed to have a memory footprint
> of about 20 MB and then continuously process data.
>
> import std.random;
>
> void main() {
>     // The real memory use, ~20 MB
>     uint[] data;
>     data.length = 5_000_000;
>     foreach(inout x; data)
>         x = rand();
>     while(1) {
>         // simulate reading a few kB of data
>         uint[] incoming;
>         incoming.length = 1000 + rand() % 5000;
>         foreach(inout x; incoming)
>             x = rand();
>         // do something with the data...
>     }
> }
>
> The result may not be as expected. The program will use up all available
> memory (for me crashing at about 2.7 GB of memory usage) and at the same
> time run extremely slowly due to the panicked GC scanning all memory over
> and over.
>
> The reason is the 20 MB of random data and the small 32-bit memory
> address range of 4 GB. To understand how bad this is, 20 MB of random
> data will result in _each_ 4k memory page on average having 5 random
> pointers to it. Those spurious pointers are laying a dense mine-field
> effectively disabling the GC.
>
> This means that each time you rely on the GC (array appending/resizing,
> Phobos function calls etc), you have a potential memory leak. (That is
> unless all the program data is nothing but valid pointers/references or
> all non-pointer data is hidden from the GC.)
>
> The above program is of course just a toy illustrating the phenomenon. In
> a text processing program of mine the bulk of the data is short char[]
> strings. The program still has runaway memory leaks leading to an
> inevitable crash. I have absolutely no idea how to handle text
> processing using the D recommended char[] and CoW idiom without getting
> severe memory leaks.
>
> The definite solution has to be a GC that only scans memory containing
> pointers. Sean's patches to make the GC skip scanning memory known to
> contain elements smaller than sizeof(void*) will probably help
> tremendously. (I'd just have to make sure I'm not using dchar[] strings,
> float or double data, or the DMD associative array implementation)
I've run into similar problems back when I was messing around with the
Universal Machine for that programming contest. It would run slower and
slower. Skipping GC checks on arrays without pointers is a must, if you
ask me.
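For what it's worth, the "5 random pointers per 4k page" figure quoted
above checks out as a back-of-the-envelope estimate. The sketch below
assumes the rand() values are uniform over the full 32-bit range and
uses a Poisson approximation for the last step; N, P and lambda are
just names picked for this calculation.

```latex
\begin{align*}
N &= 5{,}000{,}000 \quad \text{(random 32-bit words in the 20 MB array)} \\
P &= \frac{2^{32}\ \text{bytes}}{2^{12}\ \text{bytes per page}} = 2^{20} = 1{,}048{,}576 \quad \text{(4k pages in the address space)} \\
\lambda &= \frac{N}{P} \approx 4.77 \quad \text{(expected spurious pointers into any given page)} \\
\Pr[\text{no spurious pointer into a page}] &= \left(1 - 2^{-20}\right)^{N} \approx e^{-\lambda} \approx 0.0085
\end{align*}
```

So under those assumptions fewer than 1 page in 100 would have nothing
pointing into it by accident, which fits the behaviour described.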
L.