[RFC]Proposal for better garbage collection

Benjamin Thaut code at benjamin-thaut.de
Wed Feb 22 11:53:45 PST 2012


Am 22.02.2012 20:40, schrieb H. S. Teoh:
> On Wed, Feb 22, 2012 at 07:56:15PM +0100, Benjamin Thaut wrote:
>> As I'm not satisfied with the current GC D has and don't see the
>> situation improving in the future without significant changes to the
>> compiler I wrote the following document that points out all the
>> possible issues with garbage collection I could think of and
>> possible solutions for them. This document is just a draft and only
>> a proposal, critics and comments are welcome.
>
> Have you seen this?
>
> 	http://www.llucax.com.ar/proj/dgc/index.html

Yes I know about dgc it is better but still not on par with for example 
the GC that is shipped with the .NET 4.0
All I'm saying is that without propper support from the compiler we are 
not going to get GCs as good as in other modern languages.

>
>
> [...]
>> 2) Tracking references on the stack:
>>
>> The D compiler always needs to emit a full stack frame so that the
>> GC can walk up the stack at any time in the program. The stack frame
>> of every function generated by the D compiler starts which a
>> bitfield (usually the size of a machine register) where each bit
>> indicates that these bytes are a pointer / reference. The bitfield
>> needs to be large enough to cover the whole stack frame of the
>> function.
>
> This adds a lot of overhead to the runtime stack, esp. if you have deep
> recursion. It's also not necessarily faster, since the GC now has to
> parse a bitfield (a variable-length encoded bitfield, no less), instead
> of just scanning words directly, which can be optimized by CPU-specific
> microcode depending on the target platform.
>

If you have a better idea for percise stack scanning I'm open for 
suggestions.

>
> [...]
>> Every scope generated by the D compiler would need additional code
>> at the start and end of the scope. When the scope is entered the
>> bitfield would be patched to represent the new variables inside the
>> scope and when the scope is left the bitfield is patched again to
>> remove the changes that were made on entering the scope.
>
> This would introduce quite a lot of overhead per scope. It will also
> lead to strange things like:
>
> 	if (x) y();	// faster
> 	if (x) { y(); }	// slower
>
> which will encourage people to omit {} after if, which makes code more
> fragile and hard to read.
>

Scopeds that don't have variables declared inside them don't need the 
bitfield patching. so that argument is completely pointless. Scopes that 
contain varaibles that are not pointers or refrences also don't need the 
patching.

>
>> Every time a function gets called that did not get generated by the
>> D compiler ( C / C++ etc functions) the compiler generates a call
>> into the runtime and passes the current stack pointer and stack base
>> pointer to it.
>>
>> void _d_externalCallStart(void* stackptr, void* baseptr);
>>
>> Every time such a function returns the compiler   generates a call
>> into the the runtime too.
>>
>> void _d_externalCallEnd(void* stackptr, void* baseptr);
>>
>> Every time a functions that can get called from other languages
>> (extern(C) etc) are executed the end callback is inserted at the
>> start of the functions and the start callback is inserted at the end
>> of the function.
>> Using these callbacks the GC can mark certain parts of the stack as
>> "non-D" and ignore them when scanning for bit fields and
>> references/pointers. It can just skip parts of the stack that are
>> "non-D" and therefore does not need a full stack frame within these
>> "non-D" sections.
>
> This may not be a bad idea, though it does introduce some overhead when
> you cross language boundaries.
>
>
>> All these features are required so that the GC can precisely scan
>> pointers/references on the stack and change them as necessary.
>> Remaining issues: The D compiler can freely move around value types
>> on the stack. With such move operations it would be necessary to fix
>> up all the bit fields. I needs to be investigated if this is doable.
>
> This can only make the GC slower, especially if it needs to update
> variable-length encoded bitfields. Of course, you may be able to offset
> this by making it possible to do real-time GC, (reduced throughput but
> less waiting time for collection cycles) but that's a very complex
> problem.
>
>
>> 3) Tracking references on the heap
>>
>> For every class / struct a mixin template which is defined inside
>> the runtime gets instantiated. This template can then use
>> introspection to generate the necessary information to allow the GC
>> to scan the pointers within that struct / class precisely.
>
> So basically you're proposing a compacting precise-scanning GC instead
> of the current conservative GC. There are pros and cons in either
> approach; it'd be nice if you could compare them.
>

I'm proposing a compacting percise-scanning generantional gc that has 
thread local pools and can scan these thread local pools without 
stopping the other threads. Also it will be able to collect young 
generations without the need to scan the old generations.

>
> [...]
>> 5) pointer / reference changed callback
>>
>> Every time a pointer / reference is changed the D compiler emits a
>> call into the runtime and passes the new value of the reference /
>> pointer with it.
>
> This introduces a LOT of overhead, especially in a language like D which
> manipulates a lot of pointers quite often (esp. if you use slices a
> lot).
>

I did not make this up, I know a smalltalk implementation that actually 
does this and is pretty efficient.

>
> [...]
>> -If the GC interrupts a thread right before any of the above mentioned
>> callbacks happen it will cause a invalid state for the GC and the GC
>> might access invalid pointers. It has to be investigated if this leads
>> to invalid behavior. It can be fixed by not interrupting a thread but
>> pause it the next time it calls any of callbacks, or other functions
>> that can be interrupted by the GC.
>
> This adds a lot of intermittent pauses in program execution.

Why should there be pauses, there is just a additional check in every 
callback to the gc there already is. When the gc wants to collect he 
sets the pause flag to true and waits until all required threads paused 
themselfs.

>
> The link I posted at the top has a GC implementation that doesn't
> introduce *any* of this overhead (the GC runs concurrently with the
> program), with no pause during a collection cycle (garbage is
> incrementally collected when allocating new memory).
>

Any non percise scanning algorithms will not be able to deal with memory 
fragmentation and will also have uneccessary overhead for scanning 
regions of memory that don't contain any pointers. Also they can leak 
memory because some int value has the same value as a pointer and 
therefore the gc does not free that block of memory.

>
> [...]
>> 6) Different interface for the GC
>>
>> The current interface to the GC would have to change because the
>> "this block of memory might contain a pointer" approach wouldn't
>> work anymore. For example a block of memory and a delegate which
>> iterates over all pointers within the memory block could be used for
>> user allocated memory blocks. There should be a separate allocator
>> function provided by the GC that allocates memory that does not get
>> moved around so it can be used to pass it to non garbage collected
>> code.
>
> It would be nice if there was a way for GCs to be pluggable, especially
> in the compiler. Currently, we can only swap GCs that implement the same
> interface as the existing one, but to switch to a different GC model
> like you're proposing would require a lot of compiler support.
>
>
>> 7) Compiler Options
>>
>> Each of the above mentioned groups of features should be exposed as
>> compiler options so that you can turn them on/off depending on which
>> type of GC you use. Default on/off states for these features are set
>> within a config file depending on which type of GC currently ships
>> per default with druntime.
>
> This would be nice.
>
>
>> 8) Conclusion
>>
>> Garbage Collection brings a lot of advantages for the programmer
>> using the language but is not free and shouldn't be treated as free.
>
> I don't know, but after reading this:
>
> 	http://www.cs.purdue.edu/homes/grr/snapshot-gc.ps
>
> I think there might be a possibility of (almost) free GC.
>

There is no free GC. The only question is which trade offs you want to 
make. Modern implementations like the .NET 4.0 garbage collector show 
that all the things mentioned here are possible and are faster then 
primitve implementations.

>
>> Full support for garbage collection is required to build a fast and
>> efficient GC. This additional support requires additional features
>> within the compiler but should result in a overall better performing
>> language.
>
> I agree in principle, although for specific GC proposals, we'd need to
> evaluate the pros and cons to determine what is improved and what
> degrades. Unfortunately, GC is a complex problem, and different GCs work
> better with different apps. I wouldn't be so quick to make claims about
> the performance of GCs. It depends on what the app does.
>
>
> T
>

Fully agree on this

I want to add that I did not make all this up. Most of the mentioned 
features here are actually used in a Smalltalk implementation that 
compiles Smalltalk to C for faster execution.



More information about the Digitalmars-d mailing list