The problem with the D GC

Lionello Lunesu lio at lunesu.remove.com
Mon Jan 8 05:15:19 PST 2007


Oskar Linde wrote:
> After having fought a while with D programs with runaway memory leaks, 
> I've unfortunately had to come to the conclusion that the D GC is not 
> ready for production use. The problem is what I'd call "spurious 
> pointers". That is, random data (strings, numbers, image data, audio or 
> whatever) that appears to the GC to be full of pointers into all parts 
> of the memory space.
> 
> Consider this simple program. It is designed to have a memory footprint 
> of about 20 MB and then continuously process data.
> 
> import std.random;
> 
> void main() {
>         // The real memory use, ~20 MB
>         uint[] data;
>         data.length = 5_000_000;
>         foreach(inout x; data)
>                 x = rand();
>         while(1) {
>                 // simulate reading a few kb of data
>                 uint[] incoming;
>                 incoming.length = 1000 + rand() % 5000;
>                 foreach(inout x; incoming)
>                         x = rand();
>                 // do something with the data...
>         }
> }
> 
> The result may not be as expected. The program will use up all available 
> memory (for me crashing at about 2.7 GB of memory usage) and at the same 
> time run extremely slowly due to the panicked GC scanning all memory over 
> and over.
> 
> The reason is the 20 MB of random data and the small 32-bit memory 
> address range of 4 GB. To understand how bad this is, 20 MB of random 
> data will result in _each_ 4k memory page on average having about 5 
> random pointers into it. Those spurious pointers lay a dense minefield, 
> effectively disabling the GC.
> 
> This means that each time you rely on the GC (array appending/resizing, 
> Phobos function calls etc), you have a potential memory leak. (That is 
> unless all the program data is nothing but valid pointers/references or 
> all non-pointer data is hidden from the GC.)
> 
> The above program is of course just a toy illustrating the phenomenon. In 
> a text processing program of mine the bulk of the data is short char[] 
> strings. The program still has runaway memory leaks leading to an 
> inevitable crash. I have absolutely no idea how to handle text 
> processing using the D recommended char[] and CoW idiom without getting 
> severe memory leaks.
> 
> The definitive solution has to be a GC that only scans memory containing 
> pointers. Sean's patches to make the GC skip scanning memory known to 
> contain elements smaller than sizeof(void*) will probably help 
> tremendously. (I'd just have to make sure I'm not using dchar[] strings, 
> float or double data, or the DMD associative array implementation)
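
For reference, the figure of about 5 spurious pointers per 4k page in the 
quoted message can be reproduced with a quick estimate. This is only a 
sketch: it assumes the random words are spread uniformly over the whole 
32-bit address space, and the function name spuriousPointersPerPage is 
made up for illustration. It is written for a present-day D compiler, not 
the 2007 one.

```d
import std.stdio : writefln;

// Expected number of random 32-bit words that point into any one page,
// assuming the words are uniformly distributed over the address space.
// (Hypothetical helper, not part of Phobos.)
double spuriousPointersPerPage(ulong randomWords, ulong addressSpaceBytes,
                               ulong pageBytes)
{
    immutable pages = addressSpaceBytes / pageBytes; // 1_048_576 pages for 4 GB / 4k
    return cast(double) randomWords / pages;
}

void main()
{
    // 5_000_000 uints (~20 MB), 4 GB address space, 4k pages
    immutable perPage = spuriousPointersPerPage(5_000_000, 1UL << 32, 4096);
    writefln("%.2f spurious pointers per 4k page", perPage); // prints 4.77
}
```

So 5,000,000 random words over 1,048,576 pages gives roughly 4.77 per 
page, which matches the "5 random pointers" quoted above.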

I ran into similar problems back when I was messing around with the 
Universal Machine for that programming contest. It would run slower and 
slower. Skipping GC scans of arrays that contain no pointers is a must, 
if you ask me.
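
To make "skipping arrays without pointers" concrete, here is a minimal 
sketch of hiding a block of raw data from the collector's scan. It 
assumes a present-day druntime, where core.memory provides GC.malloc and 
the GC.BlkAttr.NO_SCAN attribute; that is not the 2007 Phobos API this 
thread is about, and the helper name allocNoScan is made up for 
illustration.

```d
import core.memory : GC;

// Hypothetical helper: allocate n uints in a GC-managed block that the
// collector will not scan for pointers. The memory is left uninitialized.
uint[] allocNoScan(size_t n)
{
    auto p = cast(uint*) GC.malloc(n * uint.sizeof, GC.BlkAttr.NO_SCAN);
    return p[0 .. n];
}

void main()
{
    // Same ~20 MB of pointer-like noise as in the quoted program.
    auto data = allocNoScan(5_000_000);
    foreach (i, ref x; data)
        x = cast(uint) (i * 2_654_435_761u); // arbitrary non-pointer values

    // The block carries the NO_SCAN attribute, so its contents cannot be
    // mistaken for references that keep other blocks alive.
    assert((GC.getAttr(data.ptr) & GC.BlkAttr.NO_SCAN) != 0);
}
```

The block is still owned and freed by the GC; only the scan of its 
contents is skipped, so it must never hold the sole reference to other 
GC memory.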

L.


