[RFC]Proposal for better garbage collection

Wed Feb 22 11:40:35 PST 2012

On Wed, Feb 22, 2012 at 07:56:15PM +0100, Benjamin Thaut wrote:
> As I'm not satisfied with the current GC D has and don't see the
> situation improving in the future without significant changes to the
> compiler I wrote the following document that points out all the
> possible issues with garbage collection I could think of and
> possible solutions for them. This document is just a draft and only
> a proposal, critics and comments are welcome.

Have you seen this?

	http://www.llucax.com.ar/proj/dgc/index.html

[...]
> 2) Tracking references on the stack:
> 
> The D compiler always needs to emit a full stack frame so that the
> GC can walk up the stack at any time in the program. The stack frame
> of every function generated by the D compiler starts which a
> bitfield (usually the size of a machine register) where each bit
> indicates that these bytes are a pointer / reference. The bitfield
> needs to be large enough to cover the whole stack frame of the
> function.

This adds a lot of overhead to the runtime stack, esp. if you have deep
recursion. It's also not necessarily faster, since the GC now has to
parse a bitfield (a variable-length encoded bitfield, no less), instead
of just scanning words directly, which can be optimized by CPU-specific
microcode depending on the target platform.

[...]
> Every scope generated by the D compiler would need additional code
> at the start and end of the scope. When the scope is entered the
> bitfield would be patched to represent the new variables inside the
> scope and when the scope is left the bitfield is patched again to
> remove the changes that were made on entering the scope.

This would introduce quite a lot of overhead per scope. It will also
lead to strange things like:

	if (x) y();	// faster
	if (x) { y(); }	// slower

which will encourage people to omit {} after if, which makes code more
fragile and hard to read.

> Every time a function gets called that did not get generated by the
> D compiler ( C / C++ etc functions) the compiler generates a call
> into the runtime and passes the current stack pointer and stack base
> pointer to it.
> 
> void _d_externalCallStart(void* stackptr, void* baseptr);
> 
> Every time such a function returns the compiler   generates a call
> into the the runtime too.
> 
> void _d_externalCallEnd(void* stackptr, void* baseptr);
> 
> Every time a functions that can get called from other languages
> (extern(C) etc) are executed the end callback is inserted at the
> start of the functions and the start callback is inserted at the end
> of the function.
> Using these callbacks the GC can mark certain parts of the stack as
> "non-D" and ignore them when scanning for bit fields and
> references/pointers. It can just skip parts of the stack that are
> "non-D" and therefore does not need a full stack frame within these
> "non-D" sections.

This may not be a bad idea, though it does introduce some overhead when
you cross language boundaries.

> All these features are required so that the GC can precisely scan
> pointers/references on the stack and change them as necessary.
> Remaining issues: The D compiler can freely move around value types
> on the stack. With such move operations it would be necessary to fix
> up all the bit fields. I needs to be investigated if this is doable.

This can only make the GC slower, especially if it needs to update
variable-length encoded bitfields. Of course, you may be able to offset
this by making it possible to do real-time GC, (reduced throughput but
less waiting time for collection cycles) but that's a very complex
problem.

> 3) Tracking references on the heap
> 
> For every class / struct a mixin template which is defined inside
> the runtime gets instantiated. This template can then use
> introspection to generate the necessary information to allow the GC
> to scan the pointers within that struct / class precisely.

So basically you're proposing a compacting precise-scanning GC instead
of the current conservative GC. There are pros and cons in either
approach; it'd be nice if you could compare them.

[...]
> 5) pointer / reference changed callback
> 
> Every time a pointer / reference is changed the D compiler emits a
> call into the runtime and passes the new value of the reference /
> pointer with it.

This introduces a LOT of overhead, especially in a language like D which
manipulates a lot of pointers quite often (esp. if you use slices a
lot).

[...]
> -If the GC interrupts a thread right before any of the above mentioned
> callbacks happen it will cause a invalid state for the GC and the GC
> might access invalid pointers. It has to be investigated if this leads
> to invalid behavior. It can be fixed by not interrupting a thread but
> pause it the next time it calls any of callbacks, or other functions
> that can be interrupted by the GC.

This adds a lot of intermittent pauses in program execution.

The link I posted at the top has a GC implementation that doesn't
introduce *any* of this overhead (the GC runs concurrently with the
program), with no pause during a collection cycle (garbage is
incrementally collected when allocating new memory).

[...]
> 6) Different interface for the GC
> 
> The current interface to the GC would have to change because the
> "this block of memory might contain a pointer" approach wouldn't
> work anymore. For example a block of memory and a delegate which
> iterates over all pointers within the memory block could be used for
> user allocated memory blocks. There should be a separate allocator
> function provided by the GC that allocates memory that does not get
> moved around so it can be used to pass it to non garbage collected
> code.

It would be nice if there was a way for GCs to be pluggable, especially
in the compiler. Currently, we can only swap GCs that implement the same
interface as the existing one, but to switch to a different GC model
like you're proposing would require a lot of compiler support.

> 7) Compiler Options
> 
> Each of the above mentioned groups of features should be exposed as
> compiler options so that you can turn them on/off depending on which
> type of GC you use. Default on/off states for these features are set
> within a config file depending on which type of GC currently ships
> per default with druntime.

This would be nice.

> 8) Conclusion
> 
> Garbage Collection brings a lot of advantages for the programmer
> using the language but is not free and shouldn't be treated as free.

I don't know, but after reading this:

	http://www.cs.purdue.edu/homes/grr/snapshot-gc.ps

I think there might be a possibility of (almost) free GC.

> Full support for garbage collection is required to build a fast and
> efficient GC. This additional support requires additional features
> within the compiler but should result in a overall better performing
> language.

I agree in principle, although for specific GC proposals, we'd need to
evaluate the pros and cons to determine what is improved and what
degrades. Unfortunately, GC is a complex problem, and different GCs work
better with different apps. I wouldn't be so quick to make claims about
the performance of GCs. It depends on what the app does.

T

-- 
EMACS = Extremely Massive And Cumbersome System