The problem with the D GC

Johan Granberg lijat.meREM at OVE.gmail.com
Mon Jan 8 04:44:41 PST 2007


Oskar Linde wrote:

> After having fought a while with D programs with runaway memory leaks,
> I've unfortunately had to come to the conclusion that the D GC is not
> ready for production use. The problem is what I'd call "spurious
> pointers". That is random data (strings, numbers, image data, audio or
> whatever) appearing to the GC to be full of pointers to all over the
> memory space.
> 
> Consider this simple program. It is designed to have a memory footprint
> of about 20 MB and then continuously process data.
> 
> import std.random;
> 
> void main() {
>          // The real memory use, ~20 MB
>          uint[] data;
>          data.length = 5_000_000;
>          foreach(inout x; data)
>                  x = rand();
>          while(1) {
>                  // simulate reading a few KB of data
>                  uint[] incoming;
>                  incoming.length = 1000 + rand() % 5000;
>                  foreach(inout x; incoming)
>                          x = rand();
>                  // do something with the data...
>          }
> }
> 
> The result may not be as expected. The program will use up all available
> memory (for me crashing at about 2.7 GB of memory usage) and at the same
> time run extremely slowly due to the panicked GC scanning all memory over
> and over.
> 
> The reason is the 20 MB of random data and the small 32-bit memory
> address range of 4 GB. To understand how bad this is, 20 MB of random
> data (about 5 million 32-bit words spread over roughly one million 4 KB
> pages) will result in _each_ 4 KB memory page on average having 5 random
> pointers to it. Those spurious pointers lay a dense minefield,
> effectively disabling the GC.
> 
> This means that each time you rely on the GC (array appending/resizing,
> Phobos function calls, etc.), you have a potential memory leak. (That is
> unless all the program data is nothing but valid pointers/references or
> all non-pointer data is hidden from the GC.)
> 
> The above program is of course just a toy illustrating the phenomenon. In
> a text processing program of mine the bulk of the data is short char[]
> strings. The program still has runaway memory leaks leading to an
> inevitable crash. I have absolutely no idea how to handle text
> processing using the D recommended char[] and CoW idiom without getting
> severe memory leaks.
> 
> The definitive solution has to be a GC that only scans memory containing
> pointers. Sean's patches to make the GC skip scanning memory known to
> contain elements smaller than sizeof(void*) will probably help
> tremendously. (I'd just have to make sure I'm not using dchar[] strings,
> float or double data, or the DMD associative array implementation.)
> 
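The page-density estimate in the quoted message can be checked with a short calculation. It assumes the random 32-bit values are spread uniformly over the whole address space. The toy program's 5,000,000 words (about 20 MB) fall among the 2^32 / 2^12 = 2^20 pages of 4 KB:

$$
\lambda = \frac{5\,000\,000}{2^{32} / 2^{12}} = \frac{5\,000\,000}{1\,048\,576} \approx 4.77 \ \text{spurious pointers per page}
$$

$$
P(\text{a given page receives no spurious pointer}) = \left(1 - 2^{-20}\right)^{5\,000\,000} \approx e^{-\lambda} \approx 0.0085
$$

Under this uniformity assumption, fewer than 1% of pages are free of false references. A discarded multi-page buffer such as `incoming` is therefore very likely to stay pinned.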

I have observed the same behavior but did not realize why it happened
(I thought it was a GC bug on OS X or something). What helped for me was
calling fullCollect very often (40 times a second); this reduced the leak
from 1 MB a second to almost nothing.
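A minimal sketch of how the quoted toy program might avoid the problem. It targets present-day D2 with druntime's `core.memory` API, not the 2007 D1 toolchain discussed in this thread, and it is not Sean's patch.

The large pointer-free buffer is allocated with `GC.BlkAttr.NO_SCAN`, so the collector never interprets its contents as pointers. `GC.collect()`, the druntime counterpart of the `fullCollect` call mentioned above, stands in for the frequent-collection workaround.

The loop bound of 1,000 iterations and the collection interval of 100 are illustrative assumptions that replace the original infinite loop.

```d
import core.memory : GC;
import std.random : uniform;

void main()
{
    // ~20 MB of 32-bit values, as in the quoted toy program.
    enum size_t wordCount = 5_000_000;

    // Allocate the big buffer as NO_SCAN so the collector does not treat its
    // random contents as potential pointers. (Modern D already sets this
    // attribute for `new uint[n]`; it is spelled out here for illustration.)
    auto raw = cast(uint*) GC.malloc(wordCount * uint.sizeof, GC.BlkAttr.NO_SCAN);
    uint[] data = raw[0 .. wordCount];
    foreach (ref x; data)
        x = uniform!uint();

    // Confirm the attribute is actually set on the block.
    assert(GC.getAttr(raw) & GC.BlkAttr.NO_SCAN);

    // Bounded stand-in for the original `while(1)` processing loop.
    foreach (iteration; 0 .. 1_000)
    {
        // Simulate reading a few KB of incoming data.
        immutable size_t len = 1000 + uniform(0, 5000);
        auto incoming = new uint[](len);
        foreach (ref x; incoming)
            x = uniform!uint();

        // Optional: the frequent full-collection workaround described above.
        if (iteration % 100 == 0)
            GC.collect();
    }
}
```

Marking the block `NO_SCAN` addresses the root cause (random data being scanned as pointers). Frequent collections only shrink the window in which garbage can accumulate.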



More information about the Digitalmars-d mailing list