Non-moving generational GC [was: Template Metaprogramming Made Easy (Huh?)]

Mon Sep 14 19:51:19 PDT 2009

On Mon, 14 Sep 2009 18:53:51 -0400, Fawzi Mohamed <fmohamed at mac.com> wrote:

> On 2009-09-14 17:07:00 +0200, "Robert Jacques" <sandford at jhu.edu> said:
>
>> On Mon, 14 Sep 2009 09:39:51 -0400, Leandro Lucarella <llucax at gmail.com> wrote:
>>> Jeremie Pelletier, on September 13 at 22:58 you wrote to me:
>> [snip]
>>>> I understand your points for using a separate memory manager, and
>>>> I agree with you that having fewer active allocations makes for faster
>>>> sweeps, no matter how few of them are scanned for pointers. However
>>>> I just had an idea on how to implement generational collection on
>>>> a non-moving GC which should solve your issues (and well, mine too)
>>>> with the collector not being fast enough. I need to do some hacking on
>>> I saw a paper about that. The idea was to simply have some list of
>>> objects/pages in each generation and modify those lists instead of
>>> moving objects. I can't remember the name of the paper so I can't find
>>> it now :S
>>>
>>> The problem with generational collectors (in D) is that you need
>>> read/write barriers to track inter-generational pointers (to be able to
>>> use pointers to younger generations in the older ones as roots when
>>> scanning), which can make the whole deal a little impractical for
>>> a language that doesn't want to impose a performance penalty for things
>>> you won't use (I don't see a way to instrument reads/writes to pointers
>>> to the GC only). This is why RC was always rejected as an algorithm for
>>> the GC in D, I think.
>>>
>>>> my custom GC first, but I believe it could give yet another performance
>>>> boost. I'll add my memory manager to my list of code modules to make
>>>> public :)
>>>
>> As a counter-point, Objective-C just introduced a thread-local GC.
>> According to a blog post
>> (http://www.sealiesoftware.com/blog/archive/2009/08/28/objc_explain_Thread-local_garbage_collection.html)
>> this has apparently allowed pause times similar to those of the
>> previous generational GC. (Except that the former is doing a full
>> collect, and the latter still has work to do.) On that note, it would
>> probably be a good idea if core.gc.BlkAttr supported shared and
>> immutable state flags, which could be used to support a thread-local
>> GC.
>
> 1) To allocate large objects that have a guard object, it is a good idea
> to go through the GC, because if memory is tight a GC collection is
> triggered, possibly freeing some extra memory.
> 2) Using GC malloc is not faster than malloc; especially with several
> threads, the single lock of the basic GC makes itself felt.
>
> For how I use D (not realtime), the two things I would like to see from
> a new GC are:
> 1) Multiple pools (at least one per CPU, with a thread-id hash to assign
> threads to a given pool). This avoids the need for a global GC lock in
> the GC malloc and, if possible, uses memory close to the CPU when a
> thread is pinned. The aim is not to have truly thread-local memory; if
> you really need local memory other than the stack, then maybe a separate
> process should be used. This is especially doable with 64 bits; with 32
> bits, memory usage/fragmentation could become an issue.
> 2) Multiple threads doing the collection (a main thread distributing the
> work to other threads, one per CPU, that do the mark phase using atomic
> ops).
>
> Other improvements, such as a better GC with less latency (but not at
> the cost of too much computation), would be nice to have, but are not
> a priority for my usage.
>
> Fawzi
>

For what it's worth, the whole point of thread-local GC is to do 1) and
2). For clarity, thread-local GC refers to each thread having its own GC
for non-shared objects, plus a shared GC for shared objects. Each thread's
GC may allocate and collect independently of the others (e.g. in parallel)
without locking/atomics/etc.
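
A minimal sketch of that arrangement, written in C++ because it is easy to
compile standalone; it is not D runtime code. Heap, localHeap, sharedHeap
and worker are hypothetical names, and collect() takes an explicit list of
roots instead of scanning for pointers, so it only illustrates which paths
need a lock and which do not:

```cpp
#include <algorithm>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical toy heap: tracks its blocks and frees the unreachable ones.
struct Heap {
    std::vector<void*> blocks;  // every block currently owned by this heap

    void* allocate(std::size_t size) {
        void* p = ::operator new(size);
        blocks.push_back(p);
        return p;
    }

    // Stand-in for mark/sweep: keeps blocks listed in `roots`, frees the
    // rest, and returns how many blocks were freed. No pointer scanning.
    std::size_t collect(const std::vector<void*>& roots) {
        std::size_t freed = 0;
        std::vector<void*> survivors;
        for (void* p : blocks) {
            if (std::find(roots.begin(), roots.end(), p) != roots.end()) {
                survivors.push_back(p);
            } else {
                ::operator delete(p);
                ++freed;
            }
        }
        blocks.swap(survivors);
        return freed;
    }

    ~Heap() {
        for (void* p : blocks) ::operator delete(p);
    }
};

// One heap per thread for non-shared objects: only the owning thread ever
// touches it, so allocation and collection need no lock and no atomics.
thread_local Heap localHeap;

// One heap for shared objects: any thread may use it, so it is locked.
struct SharedHeap {
    std::mutex lock;
    Heap heap;

    void* allocate(std::size_t size) {
        std::lock_guard<std::mutex> guard(lock);
        return heap.allocate(size);
    }
};
SharedHeap sharedHeap;

// Allocates `garbage` short-lived local blocks plus one live block and one
// shared block, then collects only this thread's heap.
std::size_t worker(std::size_t garbage) {
    void* keep = localHeap.allocate(64);
    for (std::size_t i = 0; i < garbage; ++i) {
        localHeap.allocate(16);
    }
    sharedHeap.allocate(32);
    return localHeap.collect({keep});
}
```

Running worker() on several std::thread instances at once lets each thread
collect its own heap while the others keep allocating; only the
sharedHeap.allocate() call contends on a lock.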