The problem with the D GC
Oskar Linde
oskar.lindeREM at OVEgmail.com
Tue Jan 9 11:09:15 PST 2007
Oskar Linde wrote:
> After having fought a while with D programs with runaway memory leaks,
> I've unfortunately had to come to the conclusion that the D GC is not
> ready for production use. The problem is what I'd call "spurious
> pointers". That is random data (strings, numbers, image data, audio or
> whatever) appearing to the GC to be full of pointers to all over the
> memory space.
I feel kind of bad for making it sound like this is a problem related
specifically to the D garbage collector. It is rather a general and
apparently well-known problem of all conservative garbage collectors.
The D garbage collector is still excellent for a large domain of problems.
Lots of people seem to be having similar problems though, so a better
understanding of the conditions under which a conservative garbage
collector will and will not work seems to be called for.
I'd guess that a lot of the problems we have with the conservative GC are
because we are at the end of the lifetime of the 32-bit address space.
As time has passed, our programs have used a larger and larger portion
of the available memory space. As one anonymous poster said, one solution
would be "increasing the amount of memory needed for pointers by the
factor two", which I interpret as moving to a 64-bit memory space.
I did some calculations to better understand what is going on. I've
probably made several errors, though.
<theoretical rambling>
The probability of a single uniformly random spurious pointer
preventing the collection of an object of size s is
p = s/B,
with B being the size of the address space (2^b for b = 32 or 64 bits).
On a 64-bit architecture, the probability is obviously drastically reduced.
If g is the amount (in bytes) of independent random data an application
needs, the risk of an object of size s not being collected is
P(g) = 1 - (1 - s/B) ^ (g/(b/8))
As one can see, the risk increases considerably as objects get bigger.
Increasing the amount of random data the application contains reduces
the size of objects the GC will be able to handle satisfactorily.
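To get a feel for the numbers, here is a small D sketch (mine, not part
of the original test program) that evaluates P(g) on a 32-bit machine;
the 20 MB of random data matches the verification run further down:

import std.math;   // pow
import std.stdio;  // writefln

// P(g) = 1 - (1 - s/B)^(g/4) on a 32-bit machine: the probability that
// at least one of the g/4 pointer-sized words of random data happens
// to point into an object of size s and thereby pins it.
real pinProbability(real s, real g)
{
    real B = 4294967296.0;        // B = 2^32 bytes of address space
    return 1.0 - pow(1.0 - s / B, g / 4);
}

void main()
{
    real g = 20.0 * 1024 * 1024;  // 20 MB of spurious-pointer candidates
    writefln("s = 2000: P = %s", pinProbability(2000, g));
    writefln("s = 3000: P = %s", pinProbability(3000, g));
    writefln("s = 4000: P = %s", pinProbability(4000, g));
}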
My example was kind of nasty in that it not only accumulated additional
random garbage in each iteration, but also caused heap fragmentation. A
kinder example would always create objects of the same size. This simpler
example would be (a rough D sketch follows the list):
* start with a static set of g bytes of random data
* in each iteration, create n objects of size s (for simplicity, let
  each object contain s bytes of random data)
* after each iteration, none of the n objects are used anymore. This is
  a good time to call std.gc.fullCollect();
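A minimal D sketch of that loop (my reconstruction, not the original
benchmark; the constants are just example values and the little LCG
merely stands in for "random data"):

import std.gc;  // fullCollect() -- the D1-era Phobos GC interface

uint seed = 123456789;

// Tiny linear congruential generator: just a cheap source of "random"
// bytes that the conservative GC may mistake for pointers.
uint nextRand()
{
    seed = seed * 1664525 + 1013904223;
    return seed;
}

void fillRandom(ubyte[] buf)
{
    for (size_t i = 0; i < buf.length; i++)
        buf[i] = cast(ubyte) nextRand();
}

void main()
{
    const uint g = 20 * 1024 * 1024;  // static random data, in bytes
    const uint n = 10;                // objects allocated per iteration
    const uint s = 2000;              // object size, in bytes

    ubyte[] statics = new ubyte[g];   // stays live for the whole run
    fillRandom(statics);

    for (int iter = 0; iter < 1000; iter++)
    {
        // Allocate n short-lived objects full of random data.
        ubyte[][] objs;
        objs.length = n;
        for (size_t j = 0; j < n; j++)
        {
            objs[j] = new ubyte[s];
            fillRandom(objs[j]);
        }

        // Drop all references and collect; anything that survives was
        // pinned by a spurious pointer somewhere in the scanned memory.
        objs = null;
        fullCollect();
    }
}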
After each such iteration, i, P_i*n_i objects will remain uncollected,
and will be added to the pool of random spurious pointers. I disregard
the small portion of objects that are uncollected because of spurious
pointers appearing in other objects left uncollected in the same
iteration. If this portion were significant, you'd really have a
problem anyway, and it would show up in later iterations.
g_{i+1} ~ g_i + P_i*n_i*s.
I generously assume that the safe memory areas of the collected objects
are reused by the allocator in the next iteration. A few of those will
become unsafe because of the spurious pointers recently added by
uncollected objects, but for the most part those memory areas are now
safe, and the objects allocated in (and collected from) them can be
disregarded. The number of objects that need to be allocated in unsafe
memory areas for the next iteration becomes:
n_{i+1} = P_i*n_i + P(P_i*n_i*s)*(n_0 - n_i)
This will of course not perfectly predict the real behavior, as it
relaxes the problem away from discrete units, but it should still model
the sample application adequately. Using this, I tried to find the
breaking point, i.e. the object size above which objects need to be
explicitly freed instead of being left to the GC.
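For reference, a rough D sketch of how the recurrence can be iterated to
look for that breaking point (my reconstruction of the model above, not
the exact script behind the tables below):

import std.math;   // pow
import std.stdio;  // writefln

// P(g) for a fixed object size s on a 32-bit machine, as defined above.
real P(real s, real g)
{
    real B = 4294967296.0;              // 2^32
    return 1.0 - pow(1.0 - s / B, g / 4);
}

void main()
{
    real s  = 2000;                     // object size in bytes
    real n0 = 10;                       // objects allocated per iteration
    real g0 = 20.0 * 1024 * 1024;       // initial random data in bytes

    real g = g0;                        // current pool of random data
    real n = n0;                        // objects sitting in unsafe memory
    for (int i = 1; i <= 1000; i++)
    {
        real leaked = P(s, g) * n;      // P_i*n_i objects stay uncollected
        g += leaked * s;                // g_{i+1} = g_i + P_i*n_i*s
        // n_{i+1} = P_i*n_i + P(P_i*n_i*s)*(n_0 - n_i)
        n = leaked + P(s, leaked * s) * (n0 - n);
        if (i % 200 == 0)
            writefln("iteration %s: overhead ~%s MB", i,
                     (g - g0) / (1024.0 * 1024.0));
    }
}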
"Verification":
Rewriting my original example to do a std.gc.fullCollect() every 10th
iteration, and using the following parameters:
g = 20 MB
n = 10
the model suggests that allocating objects of size
s = 2000 would result in almost no space overhead, stable at 20 MB,
s = 3000 would result in a slight but stable space overhead, and
s = 4000 would cause runaway memory usage.
Running the program results in:
s = 2000: memory usage stable at 20 MB
s = 3000: a small, apparently unbounded memory leak
s = 4000: an unbounded memory leak
The model appears to be at least in the correct ballpark for this sample.
Theoretical results:
The tables below show the maximum object size the GC will be able to
handle with different static amounts of "random" data. By "being able to
handle", I mean that the application doesn't use more than 2x the
required amount of memory. The exact breaking point is very unstable and
going above it rapidly results in uncontrolled memory consumption.
32-bit arch, 100 objects (random data -> max object size):
  1 MB    -> 8000 bytes
  5 MB    -> 4500 bytes
  10 MB   -> 3000 bytes
  20 MB   -> 2000 bytes
  100 MB  -> 700 bytes
  500 MB  -> 200 bytes
  1000 MB -> 100 bytes

32-bit arch, 1000 objects (random data -> max object size):
  1 MB    -> 3400 bytes
  5 MB    -> 2200 bytes
  10 MB   -> 1700 bytes
  20 MB   -> 1200 bytes
  100 MB  -> 500 bytes
  500 MB  -> 150 bytes
  1000 MB -> 100 bytes

32-bit arch, 10000 objects (random data -> max object size):
  1 MB    -> 1300 bytes
  5 MB    -> 1000 bytes
  10 MB   -> 800 bytes
  20 MB   -> 600 bytes
  100 MB  -> 300 bytes
  500 MB  -> 100 bytes
  1000 MB -> 75 bytes
These figures need to be taken with a couple of grains of salt, but they
should give an indication of the object size at which one needs to
manage object lifetimes manually.
As a comparison -- the 64-bit haven:

64-bit arch, 100 objects (random data -> max object size):
  2 GB   -> 1.5 GB
  100 GB -> 600 MB
  1 TB   -> 200 MB

64-bit arch, 1000 objects (random data -> max object size):
  2 GB   -> 350 MB
  100 GB -> 250 MB
  1 TB   -> 150 MB

64-bit arch, 10000 objects (random data -> max object size):
  2 GB   -> 100 MB
  100 GB -> 75 MB
  1 TB   -> 50 MB
</theoretical rambling>
Summary:
As far as I can see, what I have to do to avoid memory leaks with a
conservative GC is one of the following:
1. move to a 64-bit architecture
2. manually handle all objects larger than a few hundred bytes (see above)
3. hide all non-pointer data from the GC (a sketch of one way to do
   this follows the list)
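For option 3, one way (certainly not the only one) to hide bulk
non-pointer data from the collector is to put it on the C heap, which
the D GC neither scans nor manages. A rough sketch, assuming plain
malloc/free from std.c.stdlib; rawAlloc/rawFree are just illustrative
names:

import std.c.stdlib;  // malloc, free: C heap, invisible to the GC

// Allocate size bytes outside the GC heap. The collector never scans
// this memory, so random bytes stored here cannot act as spurious
// pointers, and the buffer itself cannot be kept alive by one either.
ubyte[] rawAlloc(size_t size)
{
    ubyte* p = cast(ubyte*) malloc(size);
    if (p is null)
        throw new Exception("out of memory");
    return p[0 .. size];      // a slice over non-GC memory
}

void rawFree(ubyte[] buf)
{
    free(buf.ptr);            // lifetime is now managed manually, as in C
}

void main()
{
    // e.g. image or audio data: pure bytes that never contain pointers
    ubyte[] pixels = rawAlloc(20 * 1024 * 1024);

    // ... fill and use pixels like any other ubyte[] ...

    rawFree(pixels);          // must be freed explicitly
}

The data still has to be freed by hand, but it no longer contributes
spurious pointers to every collection.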
It is a shame that #2 is such a pain and that D doesn't offer any help
such as automatic ref-counting. Not having automatic ref-counting also
prevents neat solutions such as transparent CoW and automatic handling
of scarce resources.
If I didn't have a strong belief that automatic ref-counting would be
addressed soon, I'd definitely consider going back to C++. Luckily,
before I'd give up waiting, 64-bit architectures will probably be in
the great majority. ;)
/Oskar