The problem with the D GC

Sean Kelly sean at f4.ca
Mon Jan 8 09:04:38 PST 2007


Oskar Linde wrote:
> After having fought a while with D programs with runaway memory leaks, 
> I've unfortunately had to come to the conclusion that the D GC is not 
> ready for production use. The problem is what I'd call "spurious 
> pointers". That is, random data (strings, numbers, image data, audio or 
> whatever) appears to the GC to be full of pointers to all over the 
> memory space.
> 
> Consider this simple program. It is designed to have a memory footprint 
> of about 20 MB and then continuously process data.
> 
> import std.random;
> 
> void main() {
>         // The real memory use, ~20 MB
>         uint[] data;
>         data.length = 5_000_000;
>         foreach(inout x; data)
>                 x = rand();
>         while(1) {
>                 // simulate reading a few kB of data
>                 uint[] incoming;
>                 incoming.length = 1000 + rand() % 5000;
>                 foreach(inout x; incoming)
>                         x = rand();
>                 // do something with the data...
>         }
> }
> 
> The result may not be as expected. The program will use up all available 
> memory (for me crashing at about 2.7 GB of memory usage) and at the same 
> time run extremely slowly due to the panicked GC scanning all memory over 
> and over.
> 
> The reason is the 20 MB of random data and the small 32-bit memory 
> address range of 4 GB. To understand how bad this is, 20 MB of random 
> data will result in _each_ 4k memory page on average having 5 random 
> pointers to it. Those spurious pointers are laying a dense mine-field, 
> effectively disabling the GC.
> 
> This means that each time you rely on the GC (array appending/resizing, 
> Phobos function calls etc), you have a potential memory leak. (That is 
> unless all the program data is nothing but valid pointers/references or 
> all non-pointer data is hidden from the GC.)
> 
> The above program is of course just a toy illustrating the phenomenon. In 
> a text processing program of mine the bulk of the data is short char[] 
> strings. The program still has runaway memory leaks leading to an 
> inevitable crash. I have absolutely no idea how to handle text 
> processing using the D recommended char[] and CoW idiom without getting 
> severe memory leaks.
> 
> The definite solution has to be a GC that only scans memory containing 
> pointers. Sean's patches to make the GC skip scanning memory known to 
> contain elements smaller than sizeof(void*) will probably help 
> tremendously. (I'd just have to make sure I'm not using dchar[] strings, 
> float or double data, or the DMD associative array implementation)

Since the patch keys on element size, the above code would still leak 
horribly by default: on a 32-bit target a uint is the same size as a 
void*, not smaller, so a uint[] block is still scanned.  However, the 
user can set/clear this "no scan" flag explicitly, so if there are any 
memory blocks that are still scanned by default, you can indicate that 
the GC should not scan them.  I think between the two, we should be in 
pretty good shape.
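
As a rough sketch of what setting the flag explicitly can look like, here 
is an example written against the core.memory module of today's druntime. 
That module and its NO_SCAN attribute are an assumption on my part about 
how one would write this now; they are not necessarily the interface of 
the patch discussed here. The helper name allocNoScan is made up for the 
example.

```d
import core.memory : GC;
import std.random : uniform;

// Hypothetical helper: allocate `count` uints in a block the collector
// will not scan, so random contents are never mistaken for pointers.
uint[] allocNoScan(size_t count)
{
    auto p = cast(uint*) GC.malloc(count * uint.sizeof, GC.BlkAttr.NO_SCAN);
    return p[0 .. count];
}

void main()
{
    // The ~20 MB of random data from the quoted program.
    uint[] data = allocNoScan(5_000_000);
    foreach (ref x; data)
        x = uniform!uint();

    // A block that is scanned by default (void[] may hold pointers)
    // can have the flag set, and later cleared, after allocation.
    void[] raw = new void[](256);
    void* base = GC.addrOf(raw.ptr);   // setAttr expects the block's base
    GC.setAttr(base, GC.BlkAttr.NO_SCAN);
    GC.clrAttr(base, GC.BlkAttr.NO_SCAN);
}
```

Only mark a block this way when it really holds no references to GC 
memory; anything reachable solely through an unscanned block can be 
collected.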

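For reference, the estimate in the quoted message of about five spurious 
pointers per 4k page can be checked with a few lines. This is only the 
back-of-the-envelope average, assuming every 32-bit value is uniformly 
random and is treated as a potential pointer; the function name is made 
up for the example.

```d
import std.stdio : writefln;

// Average number of random 32-bit words that point into any one page.
double spuriousPointersPerPage(ulong words, ulong addressSpace, ulong pageSize)
{
    immutable pages = addressSpace / pageSize;   // 4 GB / 4k = 1_048_576 pages
    return cast(double) words / pages;
}

void main()
{
    // 5_000_000 uints (~20 MB), 32-bit address space, 4k pages.
    immutable perPage = spuriousPointersPerPage(5_000_000, 1UL << 32, 4096);
    writefln("%.2f spurious pointers per page on average", perPage);
}
```

That works out to a little under five per page, which matches the figure 
given above.
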

Sean


