Non-moving generational GC [was: Template Metaprogramming Made Easy (Huh?)]

Tue Sep 15 06:38:30 PDT 2009

On 2009-09-15 04:51:19 +0200, "Robert Jacques" <sandford at jhu.edu> said:

> On Mon, 14 Sep 2009 18:53:51 -0400, Fawzi Mohamed <fmohamed at mac.com> wrote:
> 
>> On 2009-09-14 17:07:00 +0200, "Robert Jacques" <sandford at jhu.edu> said:
>> 
>>> On Mon, 14 Sep 2009 09:39:51 -0400, Leandro Lucarella  
>>> <llucax at gmail.com>  wrote:
>>>> Jeremie Pelletier, on September 13 at 22:58 you wrote to me:
>>> [snip]
>> 1) to allocate large objects that have a guard object it is a good
>> idea to pass through the GC, because if memory is tight a gc collection
>> is triggered, thereby possibly freeing some extra memory
>> 2) using gc malloc is not faster than malloc; especially with several
>> threads, the single lock of the basic gc makes itself felt.
>> 
>> for how I use D (not realtime) the two things I would like to see from
>> a new gc are:
>> 1) multiple pools (at least one per cpu, with a thread id hash to assign
>> threads to a given pool).
>> This is to avoid the need for a global gc lock in the gc malloc and, if
>> possible, to use memory close to the cpu when a thread is pinned, not to
>> have really thread local memory; if you really need local memory
>> different from the stack then maybe a separate process should be used.
>> This is especially doable with 64 bits; with 32 bits memory
>> usage/fragmentation could become an issue.
>> 2) multiple threads doing the collection (a main thread distributing the
>> work to other threads (one per cpu), which do the mark phase using
>> atomic ops).
>>
>> other improvements, such as less latency (but not at the cost of too much
>> computation), would be nice to have, but are not a priority for my
>> usage.
>> 
>> Fawzi
>> 
> 
> For what it's worth, the whole point of thread-local GC is to do 1) and
> 2). For the purposes of clarity, thread-local GC refers to each thread
> having its own GC for non-shared objects + a shared GC for shared
> objects. Each thread's GC may allocate and collect independently of
> the others (e.g. in parallel) without locking/atomics/etc.

Well, I want at least thread-local pools (or almost: one can probably
restrict it to the number of CPUs, which will give most of the
benefit), but not an extra partition of the memory into thread-local and
shared.
Such a partition might be easier in D2 (I think it was discussed), but
even then I am not fully sure about the benefit, because then you have
to somehow be able to share and maybe even unshare an object, which
will be cumbersome. Thread-local things add a level to the memory
hierarchy that I am not fully sure is worth having; it should contain
almost only low-level plumbing.
If you really want that much separation for many things, then maybe a
separate process + memmap might be better.
For me the fast local storage is the stack, and one might think about
using it more aggressively; the heap is potentially shared.
Well, at least that is my feeling.
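
To make the pool idea concrete, here is a minimal sketch in C (not D, and
not the actual druntime GC) of per-CPU pools where a thread id hash picks
the pool and each pool has its own lock, so there is no global gc lock.
All names (pool_t, pool_index, pool_alloc), the pool count, the pool size
and the spin lock are assumptions made up for illustration; a real
allocator would use a proper mutex and growable pools.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_BITS  3u                  /* assumption: 2^3 = 8 pools, e.g. one per CPU */
#define NUM_POOLS  (1u << POOL_BITS)
#define POOL_BYTES ((size_t)1 << 16)   /* assumption: small fixed-size pools */

typedef struct {
    atomic_int lock;                   /* 0 = free, 1 = held; static storage starts at 0 */
    size_t used;                       /* bytes handed out so far */
    _Alignas(16) unsigned char mem[POOL_BYTES];
} pool_t;

static pool_t pools[NUM_POOLS];

/* Hash a (hypothetical) numeric thread id to a pool index.
 * Multiplicative hashing: the top POOL_BITS bits of the product. */
static unsigned pool_index(uint64_t thread_id)
{
    return (unsigned)((thread_id * UINT64_C(0x9E3779B97F4A7C15))
                      >> (64u - POOL_BITS));
}

/* Bump-allocate from the pool assigned to this thread.
 * Only that pool's lock is taken. Returns NULL if the pool is full. */
static void *pool_alloc(uint64_t thread_id, size_t size)
{
    unsigned idx = pool_index(thread_id);
    assert(idx < NUM_POOLS);
    pool_t *p = &pools[idx];
    void *result = NULL;

    if (size > POOL_BYTES)
        return NULL;
    size = (size + 15u) & ~(size_t)15u;   /* round up to a 16-byte granule */

    /* Sketch-only spin lock: busy-wait until we flip 0 -> 1. */
    while (atomic_exchange_explicit(&p->lock, 1, memory_order_acquire))
        ;
    if (size <= POOL_BYTES - p->used) {
        result = p->mem + p->used;
        p->used += size;
    }
    atomic_store_explicit(&p->lock, 0, memory_order_release);
    return result;
}
```

Two threads whose ids hash to different pools never touch the same lock;
with more threads than pools they contend only within a pool.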

Note that on 64 bit one can easily use a few bits of the address to
subdivide the memory into parts, which makes finding the pool group very
quick. This discussion is also orthogonal to being generational or not.
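
A rough sketch of the address-bits idea, again in C and under made-up
assumptions: the heap is one reserved region starting at HEAP_BASE, and
each pool group owns a fixed 1 TiB slice of it, so the group index can be
read straight out of the address with a shift and a mask. The constants
and function names are hypothetical, and addresses are plain 64-bit
integers here so that nothing has to be actually mapped.

```c
#include <assert.h>
#include <stdint.h>

#define GROUP_SHIFT 40u                     /* assumption: 2^40 bytes (1 TiB) per group */
#define GROUP_BITS  4u                      /* assumption: 2^4 = 16 pool groups */
#define HEAP_BASE   (UINT64_C(1) << 44)     /* assumption: base aligned to the whole region */
#define HEAP_SIZE   (UINT64_C(1) << (GROUP_SHIFT + GROUP_BITS))

/* First address of the slice owned by a given group. */
static uint64_t group_base(unsigned group)
{
    return HEAP_BASE + ((uint64_t)group << GROUP_SHIFT);
}

/* Is this address inside the reserved heap region at all? */
static int in_gc_heap(uint64_t addr)
{
    return addr >= HEAP_BASE && addr - HEAP_BASE < HEAP_SIZE;
}

/* Pool group of an address inside the heap: one shift and one mask.
 * Works because HEAP_BASE has zeros in the group bits. */
static unsigned group_of(uint64_t addr)
{
    return (unsigned)((addr >> GROUP_SHIFT) & ((1u << GROUP_BITS) - 1u));
}
```

With 32-bit addresses there are too few bits to spare for such fixed
slices, which is where the fragmentation concern comes from.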

Fawzi