[RFC]Proposal for better garbage collection

Benjamin Thaut code at benjamin-thaut.de
Wed Feb 22 10:56:15 PST 2012

As I'm not satisfied with the current GC D has and don't see the 
situation improving in the future without significant changes to the 
compiler I wrote the following document that points out all the possible 
issues with garbage collection I could think of and possible solutions 
for them. This document is just a draft and only a proposal, critics and 
comments are welcome.

Kind Regards
Benjamin Thaut

1) Preface

All modern Languages with fast and efficient GCs have build in support 
for garbage collection within the compiler / virtual machine. Currently 
the D language does have as much GC support as C++ has but fully relies 
on the GC with a lot of language features and the standard library. As a 
result the GC is slow, non parallel and always needs to stop all threads 
before collection. This document suggests to build in better support for 
garbage collection into the language so that creating a fast and 
efficient GC becomes possible. A list of features that would be required 
is described in this document.

2) Tracking references on the stack:

The D compiler always needs to emit a full stack frame so that the GC 
can walk up the stack at any time in the program. The stack frame of 
every function generated by the D compiler starts which a bitfield 
(usually the size of a machine register) where each bit indicates that 
these bytes are a pointer / reference. The bitfield needs to be large 
enough to cover the whole stack frame of the function.

For example on x86: 11001...
1 = bytes 0 to 4 are a pointer
1 = bytes 4 to 8 are a pointer
00 = bytes 8 to 16 are not a pointer
1 = bytes 16 to 20 are a pointer

The last bit indicates whether the bitfield is continued in the 
following bytes or not. 1 means continued 0 means finished.
Every scope generated by the D compiler would need additional code at 
the start and end of the scope. When the scope is entered the bitfield 
would be patched to represent the new variables inside the scope and 
when the scope is left the bitfield is patched again to remove the 
changes that were made on entering the scope.
Every time a function gets called that did not get generated by the D 
compiler ( C / C++ etc functions) the compiler generates a call into the 
runtime and passes the current stack pointer and stack base pointer to it.

void _d_externalCallStart(void* stackptr, void* baseptr);

Every time such a function returns the compiler   generates a call into 
the the runtime too.

void _d_externalCallEnd(void* stackptr, void* baseptr);

Every time a functions that can get called from other languages 
(extern(C) etc) are executed the end callback is inserted at the start 
of the functions and the start callback is inserted at the end of the 
Using these callbacks the GC can mark certain parts of the stack as 
"non-D" and ignore them when scanning for bit fields and 
references/pointers. It can just skip parts of the stack that are 
"non-D" and therefore does not need a full stack frame within these 
"non-D" sections.
All these features are required so that the GC can precisely scan 
pointers/references on the stack and change them as necessary.
Remaining issues: The D compiler can freely move around value types on 
the stack. With such move operations it would be necessary to fix up all 
the bit fields. I needs to be investigated if this is doable.

3) Tracking references on the heap

For every class / struct a mixin template which is defined inside the 
runtime gets instantiated. This template can then use introspection to 
generate the necessary information to allow the GC to scan the pointers 
within that struct / class precisely.

4) Thread local / global memory

A callback into the runtime needs to happen in the following cases:
- a __gshared variable is assigned
- a reference / pointer is casted to immutable
- a reference / pointer is casted to shared

void _d_castToGlobalMem(void* ptr);

This can be used by the GC to keep thread local pools of memory and move 
a memory block to a global memory pool as soon as it is needed there.

5) pointer / reference changed callback

Every time a pointer / reference is changed the D compiler emits a call 
into the runtime and passes the new value of the reference / pointer 
with it.

void _d_pointerChanged(void *ptr);

This can be used when writing a generational GC to have separate pools 
for young and old generations. Every time the young generation needs to 
be collected it can be avoided to scan the old generations pool because 
it is sufficient to only check the pointers that have changed since the 
last time the young generation was collected. With the above mentioned 
callback it is easily possible to track these references.

Remaining issues:
-If the GC interrupts a thread right before any of the above mentioned 
callbacks happen it will cause a invalid state for the GC and the GC 
might access invalid pointers. It has to be investigated if this leads 
to invalid behavior. It can be fixed by not interrupting a thread but 
pause it the next time it calls any of callbacks, or other functions 
that can be interrupted by the GC. This in turn could cause a thread to 
be non pausable because it is stuck in a polling loop. The compiler 
could identify loops without any GC interruptible function and manually 
insert one.
-When pointers/references are passed inside processor registers the GC 
cannot know if these values are actually pointers/references or 
represent other values. If threads are paused at defined points in the 
code as mentioned before this issues would be fixed because the state at 
these points is known and can be handled accordingly.

6) Different interface for the GC

The current interface to the GC would have to change because the "this 
block of memory might contain a pointer" approach wouldn't work anymore. 
For example a block of memory and a delegate which iterates over all 
pointers within the memory block could be used for user allocated memory 
blocks. There should be a separate allocator function provided by the GC 
that allocates memory that does not get moved around so it can be used 
to pass it to non garbage collected code.

7) Compiler Options

Each of the above mentioned groups of features should be exposed as 
compiler options so that you can turn them on/off depending on which 
type of GC you use. Default on/off states for these features are set 
within a config file depending on which type of GC currently ships per 
default with druntime.

8) Conclusion

Garbage Collection brings a lot of advantages for the programmer using 
the language but is not free and shouldn't be treated as free. Full 
support for garbage collection is required to build a fast and efficient 
GC. This additional support requires additional features within the 
compiler but should result in a overall better performing language.

More information about the Digitalmars-d mailing list