Non-moving generational GC [was: Template Metaprogramming Made Easy (Huh?)]
Jeremie Pelletier
jeremiep at gmail.com
Wed Sep 16 09:52:14 PDT 2009
Fawzi Mohamed Wrote:
> On 2009-09-15 04:51:19 +0200, "Robert Jacques" <sandford at jhu.edu> said:
>
> > On Mon, 14 Sep 2009 18:53:51 -0400, Fawzi Mohamed <fmohamed at mac.com> wrote:
> >
> >> On 2009-09-14 17:07:00 +0200, "Robert Jacques" <sandford at jhu.edu> said:
> >>
> >>> On Mon, 14 Sep 2009 09:39:51 -0400, Leandro Lucarella
> >>> <llucax at gmail.com> wrote:
> >>>> Jeremie Pelletier, el 13 de septiembre a las 22:58 me escribiste:
> >>> [snip]
> >> [1) to allocate large objects that have a guard object it is a good
> >> idea to pass through the GC because if memory is tight a gc collection
> >> is triggered thereby possibly freeing some extra memory
> >> 2) using gc malloc is not faster than malloc, especially with several
> >> threads the single lock of the basic gc makes itself felt.
> >>
> >> for how I use D (not realtime) the two things I would like to see from
> >> new gc are:
> >> 1) multiple pools (at least one per cpu, with thread id hash to assign
> >> threads to a given pool).
> >> This to avoid the need of a global gc lock in the gc malloc, and if
> >> possible use memory close to the cpu when a thread is pinned, not to
> >> have really thread local memory, if you really need local memory
> >> different from the stack then maybe a separate process should be used.
> >> This is especially doable with 64 bits; with 32 bits, memory
> >> usage/fragmentation could become an issue.
> >> 2) multiple threads doing the collection (a main thread distributing the
> >> work to other threads (one per cpu), that do the mark phase using
> >> atomic ops).
> >>
> >> other better gc, less latency (but not at the cost of too much
> >> computation), would be nice to have, but are not a priority for my
> >> usage.
> >>
> >> Fawzi
> >>
> >
> > For what it's worth, the whole point of thread-local GC is to do 1) and
> > 2). For the purposes of clarity, thread-local GC refers to each thread
> > having its own GC for non-shared objects + a shared GC for shared
> > objects. Each thread's GC may allocate and collect independently of
> > each other (e.g. in parallel) without locking/atomics/etc.
>
> Well I want at least thread local pools (or almost, one can probably
> restrict it to the number of cpus, which will give most of the
> benefit), but not an extra partition of the memory in thread local and
> shared.
> Such a partition might be easier in D2 (I think it was discussed, but
> even then I am not fully sure about the benefit), because then you have
> to somehow be able to share and maybe even unshare an object, which
> will be cumbersome. Thread local things add a level in the memory
> hierarchy that I am not fully sure is worth having, in it you should
> have almost only low level plumbing.
> If you really want that much separation for many things then maybe a
> separate process + memmap might be better.
> The fast local storage for me is the stack, and one might think about
> being more aggressive in using it, the heap is potentially shared.
> Well at least that is my feeling.
>
> Note that on 64 bit one can easily use a few bits to subdivide the
> memory in parts, making finding the pool group very quick, and this
> discussion is orthogonal to being generational or not.
>
> Fawzi
>
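To make the per-CPU pool idea quoted above concrete, here is a rough sketch of picking a pool by hashing the thread id, so that an allocation only takes that pool's lock rather than a global GC lock. It is written in C++ for illustration, not D, and `Pool`, `PoolSet`, `indexFor`, `allocate` and `allocatedIn` are all made-up names; the byte counter merely stands in for real allocation bookkeeping.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical pool: one lock per pool instead of a single global GC lock.
struct Pool {
    std::mutex lock;
    std::size_t allocated = 0;  // bytes handed out; placeholder for real bookkeeping
};

class PoolSet {
public:
    // One pool per CPU is the suggestion in the thread; fall back to 1 if count is 0
    // (std::thread::hardware_concurrency() is allowed to return 0).
    explicit PoolSet(std::size_t count) : pools_(count ? count : 1) {}

    std::size_t size() const { return pools_.size(); }

    // Map a thread to a pool by hashing its id.
    std::size_t indexFor(std::thread::id id) const {
        return std::hash<std::thread::id>{}(id) % pools_.size();
    }

    // Record an allocation in the calling thread's pool and return the pool index.
    std::size_t allocate(std::size_t bytes) {
        const std::size_t i = indexFor(std::this_thread::get_id());
        std::lock_guard<std::mutex> guard(pools_[i].lock);
        pools_[i].allocated += bytes;
        return i;
    }

    // Read back the byte count of pool i.
    std::size_t allocatedIn(std::size_t i) {
        assert(i < pools_.size());
        std::lock_guard<std::mutex> guard(pools_[i].lock);
        return pools_[i].allocated;
    }

private:
    std::vector<Pool> pools_;  // sized once, never resized (std::mutex is not movable)
};
```

Two threads can still hash to the same pool, so the per-pool lock stays; the point is only that unrelated threads rarely contend on it.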
I just posted my memory manager to pastebin:
http://pastebin.com/f7459ba9d
I gave up on the generational feature; it's indeed impossible without write barriers to keep track of pointers from old generations to newer ones. I had the whole tracing algorithm done, but without generations, a naive scan and sweep is faster because it has far fewer cache misses.
I'd like to get some feedback on it if possible.
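
For anyone wondering what the write-barrier problem looks like in practice, here is a minimal sketch of the usual approach: every pointer store goes through a function that records old objects pointing at young ones in a remembered set, which a minor collection can then treat as extra roots. It is C++ for illustration only and is not taken from the pastebin code; `Object`, `RememberedSet`, `writeField` and `countOldToYoung` are hypothetical names, and each object has a single pointer field to keep it short.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

enum class Generation { Young, Old };

// Hypothetical object layout: a generation tag and one pointer field.
struct Object {
    Generation gen = Generation::Young;
    Object* field = nullptr;
};

// Old objects that may hold a pointer into the young generation.
struct RememberedSet {
    std::unordered_set<Object*> entries;
};

// The write barrier: all pointer stores must go through here.
// Without compiler support, nothing forces plain assignments to do so,
// which is why old-to-young pointers cannot be tracked reliably.
inline void writeField(RememberedSet& rs, Object* owner, Object* value) {
    assert(owner != nullptr);
    owner->field = value;
    if (owner->gen == Generation::Old && value != nullptr &&
        value->gen == Generation::Young) {
        rs.entries.insert(owner);
    }
}

// Count remembered objects that still point at a young object,
// i.e. the extra roots a minor collection would have to scan.
inline std::size_t countOldToYoung(const RememberedSet& rs) {
    std::size_t n = 0;
    for (const Object* o : rs.entries) {
        if (o->field != nullptr && o->field->gen == Generation::Young) {
            ++n;
        }
    }
    return n;
}
```

A store done with a plain `owner->field = value` bypasses the barrier and leaves the remembered set stale, which is the failure mode described above.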