[RFC]Proposal for better garbage collection

Dmitry Olshansky dmitry.olsh at gmail.com
Wed Feb 22 12:03:59 PST 2012

On 22.02.2012 22:56, Benjamin Thaut wrote:
> As I'm not satisfied with the current GC D has and don't see the
> situation improving in the future without significant changes to the
> compiler I wrote the following document that points out all the possible
> issues with garbage collection I could think of and possible solutions
> for them. This document is just a draft and only a proposal, critics and
> comments are welcome.
> Kind Regards
> Benjamin Thaut
> 1) Preface
> All modern Languages with fast and efficient GCs have build in support
> for garbage collection within the compiler / virtual machine. Currently
> the D language does have as much GC support as C++ has but fully relies
> on the GC with a lot of language features and the standard library. As a
> result the GC is slow, non parallel and always needs to stop all threads
> before collection. This document suggests to build in better support for
> garbage collection into the language so that creating a fast and
> efficient GC becomes possible. A list of features that would be required
> is described in this document.
> 2) Tracking references on the stack:
> The D compiler always needs to emit a full stack frame so that the GC
> can walk up the stack at any time in the program.

  I think walking up the stack to collect this info again and again (the 
stack has a lot of "heavy frames" on the bottom, right?) sounds like a 
tremendously slow way of getting necessary memory ranges. I'm no expert 

The stack frame of
> every function generated by the D compiler starts which a bitfield
> (usually the size of a machine register) where each bit indicates that
> these bytes are a pointer / reference. The bitfield needs to be large
> enough to cover the whole stack frame of the function.
> For example on x86: 11001...
> 1 = bytes 0 to 4 are a pointer
> 1 = bytes 4 to 8 are a pointer
> 00 = bytes 8 to 16 are not a pointer
> 1 = bytes 16 to 20 are a pointer
> The last bit indicates whether the bitfield is continued in the
> following bytes or not. 1 means continued 0 means finished.
> Every scope generated by the D compiler would need additional code at
> the start and end of the scope. When the scope is entered the bitfield
> would be patched to represent the new variables inside the scope and
> when the scope is left the bitfield is patched again to remove the
> changes that were made on entering the scope.

Again I'm no expert, but what happens when GC starts collecting a thread 
stack amid this patching operation?

> Every time a function gets called that did not get generated by the D
> compiler ( C / C++ etc functions) the compiler generates a call into the
> runtime and passes the current stack pointer and stack base pointer to it.
> void _d_externalCallStart(void* stackptr, void* baseptr);
> Every time such a function returns the compiler generates a call into
> the the runtime too.
> void _d_externalCallEnd(void* stackptr, void* baseptr);

Why would you need these? And if you start calling callback on every 
operation, you may just as well pass direct ranges of memory to GC 
without stack walk.
+ you can call D function from C one that in turn calls D one, think 
extern(C) and callbacks.

> Every time a functions that can get called from other languages
> (extern(C) etc) are executed the end callback is inserted at the start
> of the functions and the start callback is inserted at the end of the
> function.
> Using these callbacks the GC can mark certain parts of the stack as
> "non-D" and ignore them when scanning for bit fields and
> references/pointers. It can just skip parts of the stack that are
> "non-D" and therefore does not need a full stack frame within these
> "non-D" sections.
> All these features are required so that the GC can precisely scan
> pointers/references on the stack and change them as necessary.
> Remaining issues: The D compiler can freely move around value types on
> the stack. With such move operations it would be necessary to fix up all
> the bit fields. I needs to be investigated if this is doable.
> 3) Tracking references on the heap
> For every class / struct a mixin template which is defined inside the
> runtime gets instantiated. This template can then use introspection to
> generate the necessary information to allow the GC to scan the pointers
> within that struct / class precisely.
> 4) Thread local / global memory
> A callback into the runtime needs to happen in the following cases:
> - a __gshared variable is assigned
> - a reference / pointer is casted to immutable
> - a reference / pointer is casted to shared
> void _d_castToGlobalMem(void* ptr);
> This can be used by the GC to keep thread local pools of memory and move
> a memory block to a global memory pool as soon as it is needed there.
> 5) pointer / reference changed callback
> Every time a pointer / reference is changed the D compiler emits a call
> into the runtime and passes the new value of the reference / pointer
> with it.

Bye-bye any speed of p++ ? I mean I'm horrified, and I bet I'm not alone.

> void _d_pointerChanged(void *ptr);
> This can be used when writing a generational GC to have separate pools
> for young and old generations. Every time the young generation needs to
> be collected it can be avoided to scan the old generations pool because
> it is sufficient to only check the pointers that have changed since the
> last time the young generation was collected. With the above mentioned
> callback it is easily possible to track these references.
> Remaining issues:
> -If the GC interrupts a thread right before any of the above mentioned
> callbacks happen it will cause a invalid state for the GC and the GC
> might access invalid pointers. It has to be investigated if this leads
> to invalid behavior. It can be fixed by not interrupting a thread but
> pause it the next time it calls any of callbacks, or other functions
> that can be interrupted by the GC. This in turn could cause a thread to
> be non pausable because it is stuck in a polling loop. The compiler
> could identify loops without any GC interruptible function and manually
> insert one.
> -When pointers/references are passed inside processor registers the GC
> cannot know if these values are actually pointers/references or
> represent other values. If threads are paused at defined points in the
> code as mentioned before this issues would be fixed because the state at
> these points is known and can be handled accordingly.
> 6) Different interface for the GC
> The current interface to the GC would have to change because the "this
> block of memory might contain a pointer" approach wouldn't work anymore.
> For example a block of memory and a delegate which iterates over all
> pointers within the memory block could be used for user allocated memory
> blocks. There should be a separate allocator function provided by the GC
> that allocates memory that does not get moved around so it can be used
> to pass it to non garbage collected code.
> 7) Compiler Options
> Each of the above mentioned groups of features should be exposed as
> compiler options so that you can turn them on/off depending on which
> type of GC you use. Default on/off states for these features are set
> within a config file depending on which type of GC currently ships per
> default with druntime.

Combinatorial explosion of sets of options that doesn't necessary allow 
a particular GC?
> 8) Conclusion
> Garbage Collection brings a lot of advantages for the programmer using
> the language but is not free and shouldn't be treated as free. Full
> support for garbage collection is required to build a fast and efficient
> GC. This additional support requires additional features within the
> compiler but should result in a overall better performing language.

Well, that was critic ;)

Dmitry Olshansky

More information about the Digitalmars-d mailing list