More radical ideas about gc and reference counting

Rainer Schuetze via Digitalmars-d digitalmars-d at puremagic.com
Sun May 11 11:18:38 PDT 2014



On 11.05.2014 10:22, Benjamin Thaut wrote:
> On 10.05.2014 19:54, Andrei Alexandrescu wrote:
>>
>>> The next sentence goes on to list the advantages of RC (issues we have
>>> wrestled with, like destructors), and then goes on to say the recent
>>> awesome RC is within 10% of "the fastest tracing collectors".
>>> Are you suggesting that D's GC is among 'the fastest tracing
>>> collectors'? Is such a GC possible in D?
>>
>> I believe it is.
>>
>
> While it might be possible to implement a good GC in D, it would require
> major changes in the language and its libraries. In my opinion it would
> be way more work to implement a proper GC than to implement ARC.
>
> Every state-of-the-art GC requires precise knowledge of _all_ pointers.
> And that's exactly what we currently don't have in D.

I think most garbage collectors can work with a number of false 
pointers. The referenced memory area has to be treated as pinned and 
cannot be moved. Limiting the false pointers to stack and registers 
seems like a reasonable compromise, though most of the stack could even 
be annotated. Code for this already exists in the debug info 
generation, though I suspect stack tracing could be unreliable.
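
Just to illustrate the pinning idea (a minimal sketch with made-up 
names, nothing from druntime): the imprecise roots are scanned 
conservatively and whatever they might reference gets pinned, while 
precisely known pointers stay eligible for moving.

struct Block { bool pinned; bool marked; }

void* heapBase, heapTop;                    // bounds of the GC heap (set elsewhere)
Block* findBlock(void* p) { return null; }  // stub: block containing p, or null

void scanAmbiguousRoots(void*[] words)      // words from the stack/registers
{
    foreach (w; words)
    {
        if (w >= heapBase && w < heapTop)   // might be a pointer into the heap
        {
            if (auto b = findBlock(w))
            {
                b.pinned = true;            // possible false pointer: never move this block
                b.marked = true;            // but still treat it as reachable
            }
        }
    }
}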

Here's my current stance on the GC discussions:

I agree that the current GC is pretty lame, even if it were precise. 
"Stop-the-world" with complete tracing does not work for any interactive 
application that uses more than a few hundred MB of garbage-collected 
memory (with or without soft real-time requirements). Other applications 
with larger allocation requirements are easily dominated by collection 
time. To me, proposing manual memory management instead amounts to 
admitting failure.

For a reasonable GC I currently see two possible directions:

1. Use a scheme that takes a snapshot of the heap, stack and registers 
at the moment of collection and does the actual collection in another 
thread/process while the application continues to run. This is the way 
Leandro Lucarella's concurrent GC works 
(http://dconf.org/2013/talks/lucarella.html), but it relies on "fork", 
which doesn't exist on every OS/architecture. A manual copy of the 
memory won't scale to very large heaps, though it might be compressed 
down to the possible pointers. In the worst case it will need twice as 
much memory as the current heap.

It would be very interesting to see how far we can push this model on 
the supported platforms.
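
For what it's worth, the core of direction 1 on a POSIX system is 
roughly this (just a sketch of the idea, not Leandro's implementation; 
markAll, buildFreeList and reportFreeList are hypothetical stubs):

import core.sys.posix.unistd : fork, _exit;

// Hypothetical collector internals, stubbed out for the sketch.
void markAll() {}                          // trace the snapshot from its roots
void*[] buildFreeList() { return null; }   // everything left unmarked
void reportFreeList(void*[] list) {}       // e.g. write it to a pipe for the parent

void collectConcurrently()
{
    auto child = fork();   // copy-on-write snapshot of heap, stack and registers
    if (child == 0)
    {
        // Child process: works on the frozen snapshot while the parent runs on.
        markAll();
        reportFreeList(buildFreeList());
        _exit(0);
    }
    // Parent process: keeps mutating its own (COW) pages and later applies
    // the free list the child reports back.
}

The worst-case doubling of memory mentioned above shows up here as the 
copy-on-write pages the parent dirties while the child is still scanning.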

2. Change the compiler to emit (library-defined) write barriers for 
modifications of (possible) pointers. This would allow experimenting 
with more sophisticated GC algorithms (if the write barrier is written 
in D, we might also need pointers without barriers to implement it). I 
know Walter is against this, and I am also not sure whether it adds 
acceptable overhead, but we don't have proof to the contrary either.
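
To make the second direction a bit more concrete: the compiler would 
rewrite pointer stores into a call to a library-defined barrier, 
something along these lines (names are made up; the remembered-set part 
is just a stub):

// The compiler rewrites   *slot = newValue;   for pointers inside
// GC-managed memory into a call like this.
void gc_writeBarrier(void** slot, void* newValue)
{
    gc_rememberSlot(slot);   // e.g. record the modified card/page for the collector
    *slot = newValue;        // the actual store; the barrier's own internals
                             // must use unbarriered pointers to avoid recursion
}

void gc_rememberSlot(void** slot) { /* add to a remembered set (stub) */ }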

As we all know, the usual eager reference counting with atomic 
operations is not memory-safe, so my current favorite is "concurrent 
buffered reference counting" (see chapters 18.2/18.3 of "The Garbage 
Collection Handbook" by Richard Jones et al.): reference count 
modifications are not performed directly by the write barrier; instead, 
it just logs the operation into a thread-local buffer. This buffer is 
then processed by a collector thread, which also detects cycles (only 
on candidates whose reference count was decreased during the last 
cycle). Except for very long reference chains, this scales with the 
number of executed pointer modifications rather than with the heap 
size.
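
The mutator side of that would look roughly like this (a sketch only; 
the names, the buffer threshold and the hand-off to the collector 
thread are made up):

// Buffered RC barrier: it only logs increments/decrements, it never
// touches the reference counts itself.
enum RcOp : ubyte { increment, decrement }
struct RcEntry { void* obj; RcOp op; }

RcEntry[] rcLog;   // module-level, so thread-local by default in D
void flushToCollector(RcEntry[] log) { /* hand the buffer to the collector thread (stub) */ }

void rcWriteBarrier(void** slot, void* newValue)
{
    if (newValue !is null) rcLog ~= RcEntry(newValue, RcOp.increment);
    if (*slot !is null)    rcLog ~= RcEntry(*slot, RcOp.decrement);
    *slot = newValue;              // plain store; counts are adjusted later
    if (rcLog.length >= 1024)      // arbitrary flush threshold
    {
        flushToCollector(rcLog);
        rcLog = null;
    }
}

If I read the handbook correctly, the collector thread applies the 
buffered increments before the decrements of the same epoch, and then 
runs cycle detection only on the objects whose count went down.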

