Changes in the D2 design to help the GC?

Wed Jul 15 14:31:19 PDT 2009

In Java the GC collects garbage very quickly, so Java programmers allocate many small objects quite often.
In functional-style languages like Scala, Clojure, F#, etc., most data is immutable, so again the GC is under a lot of pressure, allocating and freeing many small structures all the time.

D2 syntax allows both styles of programming (you can program in D almost as in Java, if you want), but if you follow either of those two styles you will see that the current D GC is much less efficient and gives lower performance than Java/F#. (Scoped classes are not enough.)

I am not a GC expert yet, but I'm certain there are ways to improve the current situation. Besides improving the GC itself, there may be ways to modify the current design of D2 a bit to make a more efficient GC possible. Do you have ideas?

Some time ago I suggested splitting D pointers into two types: the GC-managed ones, and the ones that point into the C heap, which the GC never touches. The type system can ensure they never get mixed by mistake. Now I think (just an idea) that the GC-managed pointers can be split again into two types: the ones fully managed by a moving GC (see below), and the ones managed by a conservative GC, whose memory is pinned, so the GC doesn't move it around. The type system will ensure that these three groups don't mix unless the programmer is really determined to mix them :-)
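To make the three groups a bit more concrete, here is a minimal sketch that uses plain library structs. All the names (CPtr, MovingPtr, PinnedPtr, unsafePin) are invented for illustration; nothing like them exists in D2 or Phobos, and a real design would probably need support in the language itself. It only shows that distinct types don't convert implicitly, and that crossing from one group to another takes an explicit call.

```d
import core.stdc.stdlib : malloc, free;

// Hypothetical wrapper types, one per pointer group.
struct CPtr(T)      { T* raw; }  // C heap, never touched by the GC
struct MovingPtr(T) { T* raw; }  // would be managed by a moving GC
struct PinnedPtr(T) { T* raw; }  // conservatively managed, never moved

// Hypothetical escape hatch: mixing groups needs an explicit, visible call.
PinnedPtr!T unsafePin(T)(MovingPtr!T p)
{
    return PinnedPtr!T(p.raw);
}

// Distinct struct types do not convert into each other implicitly.
static assert(!is(CPtr!int : MovingPtr!int));
static assert(!is(MovingPtr!int : PinnedPtr!int));
static assert(!is(PinnedPtr!int : CPtr!int));

void main()
{
    auto c = CPtr!int(cast(int*) malloc(int.sizeof));
    assert(c.raw !is null);
    scope(exit) free(c.raw);
    *c.raw = 1;

    auto m = MovingPtr!int(new int);
    *m.raw = 2;

    // PinnedPtr!int bad = m;        // rejected by the compiler
    PinnedPtr!int p = unsafePin(m);  // deliberate mixing
    assert(*p.raw == 2);
}
```

In this sketch the wrappers are only labels: the memory behind MovingPtr still comes from the ordinary GC, so nothing is actually moved or pinned.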

A simple idea of mine to improve the GC (without changing the D2 language yet) is to split the D GC into two parts. The first part is a moving collector that acts like a Java-style GC. It is especially useful in SafeD code and will become the one used in OOP/functional-style code, so it is probably the GC that will be used in most of the code of most D programs. The second part acts in a conservative way, like the current GC; it's safer. This second part manages "pinned" blocks of memory that can't be moved. Such memory is usually the kind managed in lower-level D modules, by user-written collections, etc. The performance of this second part will be lower (like the current GC), but most data will not be managed by it anyway.
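As a rough illustration of the "pinned" side, druntime's core.memory has a block attribute, GC.BlkAttr.NO_MOVE, that asks the collector not to relocate a block. The helper name allocPinned below is made up for this sketch. As far as I know the stock collector never moves blocks, so today the flag only records the intent; the two-part GC described above does not exist.

```d
import core.memory : GC;

// Hypothetical helper (not part of druntime): allocate n ints in a block
// flagged as non-movable. NO_SCAN is added because ints hold no pointers.
int[] allocPinned(size_t n)
{
    auto raw = cast(int*) GC.malloc(n * int.sizeof,
                                    GC.BlkAttr.NO_MOVE | GC.BlkAttr.NO_SCAN);
    return raw[0 .. n];
}

void main()
{
    // Ordinary allocation: a moving collector would be free to relocate it.
    auto movable = new int[](16);
    movable[] = 1;

    // Explicitly pinned block, as a low-level module or collection might want.
    auto pinned = allocPinned(16);
    pinned[] = 7;

    assert(movable[0] == 1 && pinned[15] == 7);
}
```

With a split GC, plain `new` would go to the moving part, and only the code that asks for pinned blocks like this would pay the cost of the conservative part.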

When you use LDC, the slow GC is one of the few parts of the D language that still have low performance (the other two are that D currently isn't able to inline closures and virtual methods). Those things too will eventually need to be addressed if D wants to become high-performance, but I can leave that topic to other posts/threads.

Bye,
bearophile