Has anyone ever tested CDGC (Concurrent D GC)?

Tue Jul 19 08:25:58 PDT 2011

On Tue, 19 Jul 2011 10:19:09 -0400, Masahiro Nakagawa <repeatedly at gmail.com> wrote:

> On Tue, 19 Jul 2011 22:57:47 +0900, Trass3r <un at known.com> wrote:
>
>>> I am interested in Concurrent GC.
>>>
>>> But, I have a question about CDGC.
>>> AFAIK, other language runtimes use thread for concurrent processing.
>>> Why use fork()? What is the merit of fork() approach?
>>
>> I don't know.
>> On Windows you can't use fork anyway and we have to figure out an
>> alternative way.
>>
>> Maybe he explains it in his thesis, but it's only available in Spanish:
>> http://www.llucax.com.ar/proj/dgc/index.html
>
> Thanks for the link!
> But, I didn't read Spanish...
>

He explains a lot of it in his blog. CDGC is an example of a snapshot GC. In short, when a collection is triggered, a snapshot of the program's entire memory state is taken. The GC can then trace through this snapshot without having to worry about the program's writes changing the object graph underneath it. When it finishes, it reports back to the main program which objects are garbage, and the snapshot is discarded. A userland implementation of this would be horribly inefficient, so CDGC uses fork() to leverage the OS's ability to do copy-on-write at the page level. I believe CDGC also uses a memory-mapped file for the GC's meta info, to avoid the overhead of passing update messages between the processes.
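To make the fork() idea concrete, here is a minimal sketch in C for a POSIX system. It is not CDGC's code: the toy heap (heap_ref), the mark bitmap (mark_bits), and the functions gc_init, mark_from, drop_link and collect_snapshot are all hypothetical names made up for illustration, and an anonymous shared mapping stands in for the memory-mapped meta info I mentioned.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define NOBJ 8

/* Toy heap (hypothetical): object i holds at most one reference,
 * heap_ref[i], or -1 for none. It lives in ordinary private memory,
 * so after fork() it is shared copy-on-write with the child. */
static int heap_ref[NOBJ];

/* Mark bits live in a shared mapping so that the child's marking
 * results are visible to the parent without any message passing. */
static unsigned char *mark_bits;

static int gc_init(void) {
    for (int i = 0; i < NOBJ; i++) heap_ref[i] = -1;
    void *p = mmap(NULL, NOBJ, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;
    mark_bits = p;
    return 0;
}

/* Follow the reference chain from root, marking each object once. */
static void mark_from(int root) {
    for (int i = root; i >= 0 && i < NOBJ && !mark_bits[i]; i = heap_ref[i])
        mark_bits[i] = 1;
}

/* Example mutator action: the program drops the link 1 -> 2. */
static void drop_link(void) { heap_ref[1] = -1; }

/* Fork a child that marks against its frozen copy of the heap while
 * the parent keeps mutating. Returns 0 on success, -1 on failure. */
static int collect_snapshot(int root, void (*mutator)(void)) {
    assert(root >= 0 && root < NOBJ);
    memset(mark_bits, 0, NOBJ);
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {          /* child: sees the heap as of fork() */
        mark_from(root);
        _exit(0);
    }
    if (mutator) mutator();  /* parent: writes here never reach the child */
    int status;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

If the heap is set up as 0 -> 1 -> 2 and collect_snapshot(0, drop_link) is called, object 2 still ends up marked even though the parent dropped the link during the collection, because the child traces the heap as it was at the fork. Objects that become garbage during a collection are simply picked up by the next one. A real collector would of course let the mutator run freely rather than through a callback, and would sweep using the mark bits afterwards.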

While Windows does support allocating memory in a COW manner, there's no way to add this dynamically to an existing page. Windows does support page-level write tracking (since 2K SP3), which is very useful for incremental GC, i.e., the GC can use the OS to trace through only the modified pages.
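For a feel of what page-level write tracking gives a GC, here is a rough sketch in C. Note that it does not use the Windows facility mentioned above; it swaps in a common POSIX stand-in, write-protecting the pages with mprotect() and recording the first write to each page in a SIGSEGV handler. I am assuming Linux-like behaviour (the fault arrives as SIGSEGV and calling mprotect() from the handler works in practice), and all names (heap_base, dirty, tracking_init, tracking_reset, heap_write) are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define NPAGES 4

static char *heap_base;                       /* tracked region (hypothetical) */
static size_t page_size;
static volatile unsigned char dirty[NPAGES];  /* 1 = page written since reset */

/* Fault handler: note which page was written, then unprotect that page
 * so the faulting store is retried and succeeds. */
static void on_write_fault(int sig, siginfo_t *info, void *ctx) {
    (void)sig; (void)ctx;
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)heap_base;
    if (addr < base || addr >= base + NPAGES * page_size)
        _exit(1);                             /* a real crash, not our heap */
    size_t idx = (addr - base) / page_size;
    dirty[idx] = 1;
    mprotect(heap_base + idx * page_size, page_size, PROT_READ | PROT_WRITE);
}

/* Map the region and install the handler. Returns 0 on success. */
static int tracking_init(void) {
    page_size = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, NPAGES * page_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;
    heap_base = p;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_write_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Start a new tracking interval: clear the dirty flags and make every
 * page read-only so that the next write to it faults exactly once. */
static int tracking_reset(void) {
    for (size_t i = 0; i < NPAGES; i++) dirty[i] = 0;
    return mprotect(heap_base, NPAGES * page_size, PROT_READ);
}

/* Store through a volatile pointer so the compiler keeps the write
 * ordered with respect to later reads of dirty[]. */
static void heap_write(size_t offset, char value) {
    ((volatile char *)heap_base)[offset] = value;
}
```

After tracking_reset(), a write such as heap_write(0, 1) sets dirty[0] and leaves the flags of untouched pages at 0, so an incremental collector only has to rescan the pages whose flag is set. An OS-provided write-watch facility gives the same information without the cost of taking a fault per page.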

There are two other modern GC designs that I know of which fit systems programming languages like D. One used a kernel patch to trap hardware writes efficiently, allowing one to bolt a traditional concurrent GC onto a systems language. That is cool, but it isn't practical until OS APIs support it out of the box. The other is thread-local GCs, which, according to Apple, have roughly equivalent performance to existing concurrent GCs. Given shared and immutable, thread-local GCs make a lot of sense for D and can be combined with other concurrent options should they be or become available.