Non-moving generational GC [was: Template Metaprogramming Made Easy (Huh?)]
Robert Jacques
sandford at jhu.edu
Mon Sep 14 19:51:19 PDT 2009
On Mon, 14 Sep 2009 18:53:51 -0400, Fawzi Mohamed <fmohamed at mac.com> wrote:
> On 2009-09-14 17:07:00 +0200, "Robert Jacques" <sandford at jhu.edu> said:
>
>> On Mon, 14 Sep 2009 09:39:51 -0400, Leandro Lucarella
>> <llucax at gmail.com> wrote:
>>> Jeremie Pelletier, on September 13 at 22:58, you wrote to me:
>> [snip]
>>>> I understand your points for using a separate memory manager, and
>>>> I agree with you that having fewer active allocations makes for
>>>> faster sweeps, no matter how few of them are scanned for pointers.
>>>> However, I just had an idea on how to implement generational
>>>> collection on a non-moving GC, which should solve your issues (and,
>>>> well, mine too) with the collector not being fast enough. I need to
>>>> do some hacking on
>>> I saw a paper about that. The idea was simply to keep a list of
>>> objects/pages for each generation and modify those lists instead of
>>> moving objects. I can't remember the name of the paper, so I can't
>>> find it now :S
>>> The problem with generational collectors (in D) is that you need
>>> read/write barriers to track inter-generational pointers (so that
>>> pointers from older generations into younger ones can be used as
>>> roots when scanning), which can make the whole deal a little
>>> impractical for a language that doesn't want to impose a performance
>>> penalty for things you won't use (I don't see a way to instrument
>>> reads/writes of GC pointers only). This is why RC was always rejected
>>> as an algorithm for the GC in D, I think.
>>>
>>>> my custom GC first, but I believe it could give yet another
>>>> performance boost. I'll add my memory manager to my list of code
>>>> modules to make public :)
>>>
>> As a counterpoint, Objective-C just introduced a thread-local GC.
>> According to a blog post
>> (http://www.sealiesoftware.com/blog/archive/2009/08/28/objc_explain_Thread-local_garbage_collection.html),
>> this has apparently allowed pause times similar to those of the
>> previous generational GC. (Except that the former is doing a full
>> collect, and the latter still has work to do.) On that note, it would
>> probably be a good idea if core.gc.BlkAttr supported shared and
>> immutable state flags, which could be used to support a thread-local
>> GC.
>
> 1) To allocate large objects that have a guard object, it is a good idea
> to go through the GC, because if memory is tight a GC collection is
> triggered, thereby possibly freeing some extra memory.
> 2) Using GC malloc is not faster than malloc; especially with several
> threads, the single lock of the basic GC makes itself felt.
>
> For how I use D (not realtime), the two things I would like to see from
> a new GC are:
> 1) Multiple pools (at least one per CPU, with a thread id hash to assign
> threads to a given pool).
> This is to avoid the need for a global GC lock in the GC malloc and, if
> possible, to use memory close to the CPU when a thread is pinned, not to
> have really thread-local memory. If you really need local memory
> different from the stack, then maybe a separate process should be used.
> This is especially feasible with 64 bits; with 32 bits, memory
> usage/fragmentation could become an issue.
> 2) Multiple threads doing the collection (a main thread distributing the
> work to other threads, one per CPU, that do the mark phase using atomic
> ops).
>
> Other GC improvements, such as lower latency (but not at the cost of too
> much computation), would be nice to have, but are not a priority for my
> usage.
>
> Fawzi
>
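
To make Fawzi's first wish concrete, here is a rough sketch of "multiple
pools, with a thread id hash to assign threads to a given pool". It is a
toy model in Python, not druntime code: Pool, PooledAllocator, pool_for
and the bytearray blocks are all made-up names for illustration, and the
pool count is passed in rather than detected from the hardware.

```python
import threading


class Pool:
    """One allocation pool with its own lock (hypothetical stand-in for a GC pool)."""

    def __init__(self, index):
        self.index = index
        self.lock = threading.Lock()
        self.blocks = []  # blocks handed out by this pool

    def malloc(self, size):
        # Only this pool's lock is taken; there is no global allocator lock.
        with self.lock:
            block = bytearray(size)
            self.blocks.append(block)
            return block


class PooledAllocator:
    """Spreads threads over a fixed set of pools by hashing the thread id."""

    def __init__(self, n_pools):
        self.pools = [Pool(i) for i in range(n_pools)]

    def pool_for(self, thread_id):
        # Threads that hash to the same pool still contend with each other,
        # but only with each other.
        return self.pools[hash(thread_id) % len(self.pools)]

    def malloc(self, size):
        return self.pool_for(threading.get_ident()).malloc(size)
```

Two threads that land in different pools never touch the same lock, which
is the property asked for; whether the memory ends up close to the CPU is
not modelled here at all.
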
For what it's worth, the whole point of thread-local GC is to do 1) and
2). For the purposes of clarity, thread-local GC refers to each thread
having its own GC for non-shared objects plus a shared GC for shared
objects. Each thread's GC may allocate and collect independently of the
others (e.g. in parallel) without locking/atomics/etc.
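
As a rough illustration of that layout, here is a toy model in Python
(not D, and not how any real collector is written). Obj, Heap and
ThreadLocalGC are hypothetical names, roots are passed in by hand
instead of being found by scanning stacks, and the model assumes shared
objects never point at thread-local ones.

```python
import threading


class Obj:
    """A heap object that may reference other objects."""

    def __init__(self, name, refs=()):
        self.name = name
        self.refs = list(refs)


class Heap:
    """A tiny non-moving mark/sweep heap."""

    def __init__(self):
        self.objects = []

    def alloc(self, name, refs=()):
        obj = Obj(name, refs)
        self.objects.append(obj)
        return obj

    def collect(self, roots):
        # Mark: everything reachable from the given roots.
        marked = set()
        stack = list(roots)
        while stack:
            obj = stack.pop()
            if id(obj) in marked:
                continue
            marked.add(id(obj))
            stack.extend(obj.refs)
        # Sweep: drop unmarked objects from THIS heap only.
        before = len(self.objects)
        self.objects = [o for o in self.objects if id(o) in marked]
        return before - len(self.objects)


class ThreadLocalGC:
    """One private heap per thread plus one locked heap for shared objects."""

    def __init__(self):
        self._tls = threading.local()
        self.shared_heap = Heap()
        self.shared_lock = threading.Lock()

    def local_heap(self):
        if not hasattr(self._tls, "heap"):
            self._tls.heap = Heap()
        return self._tls.heap

    def alloc_local(self, name, refs=()):
        # No lock: only the calling thread ever sees its own heap.
        return self.local_heap().alloc(name, refs)

    def alloc_shared(self, name, refs=()):
        with self.shared_lock:
            return self.shared_heap.alloc(name, refs)

    def collect_local(self, roots):
        # Collects the calling thread's heap without stopping other threads.
        return self.local_heap().collect(roots)
```

A thread calling collect_local only ever sweeps its own heap, so other
threads keep running and the shared heap is left for a separate (locked)
collection.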