Non-moving generational GC [was: Template Metaprogramming Made Easy (Huh?)]

Wed Sep 16 09:52:14 PDT 2009

Fawzi Mohamed Wrote:

> On 2009-09-15 04:51:19 +0200, "Robert Jacques" <sandford at jhu.edu> said:
> 
> > On Mon, 14 Sep 2009 18:53:51 -0400, Fawzi Mohamed <fmohamed at mac.com> wrote:
> > 
> >> On 2009-09-14 17:07:00 +0200, "Robert Jacques" <sandford at jhu.edu> said:
> >> 
> >>> On Mon, 14 Sep 2009 09:39:51 -0400, Leandro Lucarella  
> >>> <llucax at gmail.com>  wrote:
> >>>> Jeremie Pelletier, el 13 de septiembre a las 22:58 me escribiste:
> >>> [snip]
> >> 1) to allocate large objects that have a guard object it is a good
> >> idea to pass through the GC, because if memory is tight a GC collection
> >> is triggered, thereby possibly freeing some extra memory
> >> 2) using GC malloc is not faster than malloc; especially with several
> >> threads the single lock of the basic GC makes itself felt.
> >> 
> >> for how I use D (not realtime) the two things I would like to see from
> >> a new GC are:
> >> 1) multiple pools (at least one per CPU, with a thread id hash to assign
> >> threads to a given pool).
> >> This is to avoid the need for a global GC lock in the GC malloc, and if
> >> possible to use memory close to the CPU when a thread is pinned, not to
> >> have really thread-local memory; if you really need local memory
> >> different from the stack then maybe a separate process should be used.
> >> This is especially doable with 64 bits; with 32 bits memory
> >> usage/fragmentation could become an issue.
> >> 2) multiple threads doing the collection (a main thread distributing the
> >> work to other threads (one per CPU), which do the mark phase using
> >> atomic ops).
> >> 
> >> other improvements, such as lower latency (but not at the cost of too much
> >> computation), would be nice to have, but are not a priority for my
> >> usage.
> >> 
> >> Fawzi
> >> 
> > 
> > For what it's worth, the whole point of thread-local GC is to do 1) and
> > 2). For the purposes of clarity, thread-local GC refers to each thread
> > having its own GC for non-shared objects + a shared GC for shared
> > objects. Each thread's GC may allocate and collect independently of the
> > others (e.g. in parallel) without locking/atomics/etc.
> 
> Well I want at least thread local pools (or almost, one can probably 
> restrict it to the number of cpus, which will give most of the 
> benefit), but not an extra partition of the memory in thread local and 
> shared.
> Such a partition might be easier in D2 (I think it was discussed, but 
> even then I am not fully sure about the benefit), because then you have 
> to somehow be able to share and maybe even unshare an object, which 
> will be cumbersome. Thread local things add a level in the memory 
> hierarchy that I am not fully sure is worth having, in it you should 
> have almost only low level plumbing.
> If you really want that much separation for many things then maybe a 
> separate process + memmap might be better.
> The fast local storage for me is the stack, and one might think about 
> being more aggressive in using it, the heap is potentially shared.
> Well at least that is my feeling.
> 
> Note that on 64 bit one can easily use a few bits to subdivide the 
> memory in parts, making finding the pool group very quick, and this 
> discussion is orthogonal to being generational or not.
> 
> Fawzi
> 
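
To make the pool idea quoted above concrete, here is a minimal sketch in Python. It is only a model of the bookkeeping, not real allocator code. Every name in it (Pool, PooledAllocator, POOL_SHIFT, the multiplicative hash constant) is an assumption made for illustration and is not taken from any existing GC. Each pool has its own lock, a thread is mapped to a pool by hashing its id, and the pool index sits in the high bits of the returned "address", so finding the pool of a pointer is a single shift.

```python
import threading

POOL_SHIFT = 40  # assumed: bits 40 and up of an address hold the pool index


class Pool:
    """One allocation pool with its own lock (hypothetical structure)."""

    def __init__(self, index):
        self.index = index
        self.lock = threading.Lock()  # per-pool lock, no global GC lock
        self.used = 0                 # bump-pointer offset inside the pool

    def alloc(self, size):
        # Only threads hashed to this pool ever contend on this lock.
        with self.lock:
            offset = self.used
            self.used += size
        # Model an address as (pool index in the high bits) | offset.
        return (self.index << POOL_SHIFT) | offset


class PooledAllocator:
    def __init__(self, num_pools):
        self.pools = [Pool(i) for i in range(num_pools)]

    def pool_for(self, thread_id):
        # Thread ids are often aligned addresses, so mix the bits before
        # reducing modulo the pool count (multiplicative hashing).
        mixed = (thread_id * 0x9E3779B97F4A7C15) & 0xFFFFFFFFFFFFFFFF
        return self.pools[(mixed >> 32) % len(self.pools)]

    def pool_of(self, address):
        # Finding the owning pool of a pointer is one shift.
        return self.pools[address >> POOL_SHIFT]

    def alloc(self, size):
        return self.pool_for(threading.get_ident()).alloc(size)
```

With one pool per CPU, two threads only compete for a lock when they hash to the same pool. Whether that is enough in practice depends on the workload.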

I just posted my memory manager to pastebin:
http://pastebin.com/f7459ba9d

I gave up on the generational feature; it's indeed impossible to keep track of pointers from old generations to newer ones without write barriers. I had the whole tracing algorithm done, but without generations a naive scan and sweep is faster because it has far fewer cache misses.
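
To illustrate why the write barrier matters, here is a minimal model in Python. It is not the code from the pastebin link, and Obj, write_ref, remembered and minor_collect are all hypothetical names. It assumes every pointer store can be routed through write_ref. The barrier records each old object that receives a pointer to a young one, so a minor collection can trace the young generation without scanning the old one.

```python
class Obj:
    """A heap object: a generation flag plus outgoing references."""

    def __init__(self, name, old=False):
        self.name = name
        self.old = old
        self.refs = []


# Remembered set: old objects that may point into the young generation.
remembered = set()


def write_ref(src, dst):
    """Write barrier (assumed to wrap every pointer store)."""
    src.refs.append(dst)
    if src.old and not dst.old:
        remembered.add(src)


def minor_collect(roots, young):
    """Return the surviving young objects without tracing old ones."""
    live = set()
    # Start from young roots; old roots are deliberately not traversed.
    stack = [o for o in roots if not o.old]
    # Old-to-young pointers are only known through the remembered set.
    for old_obj in remembered:
        stack.extend(r for r in old_obj.refs if not r.old)
    while stack:
        o = stack.pop()
        if o in live:
            continue
        live.add(o)
        stack.extend(r for r in o.refs if not r.old)
    return [o for o in young if o in live]


old = Obj("old", old=True)
a, b, c = Obj("a"), Obj("b"), Obj("c")
write_ref(old, a)   # old -> young pointer, recorded by the barrier
write_ref(a, b)
survivors = minor_collect([old], [a, b, c])
```

Here survivors holds a and b, while c is garbage. If stores bypass write_ref, as they do when the compiler emits no barrier, the remembered set stays empty and the same minor collection would wrongly free a and b. The only safe alternative is then to scan the old generation as well, which gives up the benefit of having generations.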

I'd like to get some feedback on it if possible.