[RFC]Proposal for better garbage collection

H. S. Teoh hsteoh at quickfur.ath.cx
Wed Feb 22 12:41:18 PST 2012


On Wed, Feb 22, 2012 at 08:53:45PM +0100, Benjamin Thaut wrote:
> Am 22.02.2012 20:40, schrieb H. S. Teoh:
> >On Wed, Feb 22, 2012 at 07:56:15PM +0100, Benjamin Thaut wrote:
> >>As I'm not satisfied with the current GC D has and don't see the
> >>situation improving in the future without significant changes to the
> >>compiler I wrote the following document that points out all the
> >>possible issues with garbage collection I could think of and
> >>possible solutions for them. This document is just a draft and only
> >>a proposal, critics and comments are welcome.
> >
> >Have you seen this?
> >
> >	http://www.llucax.com.ar/proj/dgc/index.html
> 
> Yes I know about dgc it is better but still not on par with for
> example the GC that is shipped with the .NET 4.0
> All I'm saying is that without propper support from the compiler we
> are not going to get GCs as good as in other modern languages.

I agree. Better compiler support would definitely be beneficial.


[...]
> >>Every scope generated by the D compiler would need additional code
> >>at the start and end of the scope. When the scope is entered the
> >>bitfield would be patched to represent the new variables inside the
> >>scope and when the scope is left the bitfield is patched again to
> >>remove the changes that were made on entering the scope.
> >
> >This would introduce quite a lot of overhead per scope. It will also
> >lead to strange things like:
> >
> >	if (x) y();	// faster
> >	if (x) { y(); }	// slower
> >
> >which will encourage people to omit {} after if, which makes code more
> >fragile and hard to read.
> >
> 
> Scopeds that don't have variables declared inside them don't need
> the bitfield patching. so that argument is completely pointless.
> Scopes that contain varaibles that are not pointers or refrences
> also don't need the patching.

That wasn't clear from your description. It makes more sense now.


[...]
> >>5) pointer / reference changed callback
> >>
> >>Every time a pointer / reference is changed the D compiler emits a
> >>call into the runtime and passes the new value of the reference /
> >>pointer with it.
> >
> >This introduces a LOT of overhead, especially in a language like D which
> >manipulates a lot of pointers quite often (esp. if you use slices a
> >lot).
> >
> 
> I did not make this up, I know a smalltalk implementation that
> actually does this and is pretty efficient.

OK.


[...]
> >>-If the GC interrupts a thread right before any of the above
> >>mentioned callbacks happen it will cause a invalid state for the GC
> >>and the GC might access invalid pointers. It has to be investigated
> >>if this leads to invalid behavior. It can be fixed by not
> >>interrupting a thread but pause it the next time it calls any of
> >>callbacks, or other functions that can be interrupted by the GC.
> >
> >This adds a lot of intermittent pauses in program execution.
> 
> Why should there be pauses, there is just a additional check in
> every callback to the gc there already is. When the gc wants to
> collect he sets the pause flag to true and waits until all required
> threads paused themselfs.

Whereas with a scheme like dgc there is no need for threads to pause at
all.


> >The link I posted at the top has a GC implementation that doesn't
> >introduce *any* of this overhead (the GC runs concurrently with the
> >program), with no pause during a collection cycle (garbage is
> >incrementally collected when allocating new memory).
> >
> 
> Any non percise scanning algorithms will not be able to deal with
> memory fragmentation

There are ways to deal with this. Though, granted, they're imperfect.


> and will also have uneccessary overhead for scanning regions of memory
> that don't contain any pointers.

True. But if the scanning is running in its own thread anyway, and no
other thread needs to wait for it, then this doesn't really matter, does
it?


> Also they can leak memory because some int value has the same value as
> a pointer and therefore the gc does not free that block of memory.

Yes, this is definitely a problem. It's not easy to fix this in a
language like D, though, without adding some overhead.


[...]
> >>8) Conclusion
> >>
> >>Garbage Collection brings a lot of advantages for the programmer
> >>using the language but is not free and shouldn't be treated as free.
> >
> >I don't know, but after reading this:
> >
> >	http://www.cs.purdue.edu/homes/grr/snapshot-gc.ps
> >
> >I think there might be a possibility of (almost) free GC.
> >
> 
> There is no free GC. The only question is which trade offs you want
> to make. Modern implementations like the .NET 4.0 garbage collector
> show that all the things mentioned here are possible and are faster
> then primitve implementations.

True. But then again, D's GC is only a simple implementation. Just
because an advanced implementation of a GC beats D's GC doesn't
necessarily mean that that particular implementation's GC model is the
best.

But I'm not trying to argue against your proposal. I'm just saying we
should evaluate different GC models to see which one works best. But for
that we need a way to easily plug in different GC models, which requires
compiler support.


[...]
> >I agree in principle, although for specific GC proposals, we'd need
> >to evaluate the pros and cons to determine what is improved and what
> >degrades. Unfortunately, GC is a complex problem, and different GCs
> >work better with different apps. I wouldn't be so quick to make
> >claims about the performance of GCs. It depends on what the app does.
> >
> >
> >T
> >
> 
> Fully agree on this
> 
> I want to add that I did not make all this up. Most of the mentioned
> features here are actually used in a Smalltalk implementation that
> compiles Smalltalk to C for faster execution.

The thing is, Smalltalk is different enough from D that it's hard to
draw conclusions about the performance of the GC when used in D just by
looking at its performance in Smalltalk. In Smalltalk, the programmer
doesn't have direct access to things like pointers and byte
representations of stuff. This allows the compiler to make optimizations
that would you couldn't safely do with a D program. It also means
programmers may implement the same algorithms differently in D than in
Smalltalk. So the memory usage patterns of a D program are quite
different from a Smalltalk program, and this will affect how a
particular GC behaves when applied to D.

Without actual testing we have no way to know for sure.

But regardless, we need better compiler support. No argument about that.
:)


T

-- 
Lottery: tax on the stupid. -- Slashdotter


More information about the Digitalmars-d mailing list