Explicit Thread Local Heaps

Fri Nov 12 08:50:32 PST 2010

On 12-nov-10, at 16:36, dsimcha wrote:

> There was some discussion around here a while back about the possibility of
> using thread-local heaps in the standard GC.  This was rejected largely
> because of the complexity it would add when casting to shared/immutable.
>
> I'm wondering if it would be a good idea to allow memory to be explicitly
> allocated as thread-local through a separate GC.  Such a GC would be designed
> from the ground up to assume thread-local data and would never be used to
> allocate in standard Phobos or Druntime functions.  It would simply be a
> Phobos module, something like std.localgc.  The only way to use it would be to
> explicitly call something like ThreadLocal.malloc, or pass it as a parameter
> to something that needs an allocator.
>
> The collector would (unsafely) assume that you always maintain at least one
> pointer to all thread-locally allocated data on either the relevant thread's
> stack, the thread-local heap or in thread-local storage.  The global heap,
> __gshared storage and other threads' stacks would not be scanned.
>
> A major issue I see is interfacing such a GC with the regular GC such that
> pointers from the thread-local memory to shared memory are dealt with
> properly, without being excessively conservative.  The thread-local GC would
> likely use core.stdc.malloc() to allocate large blocks of memory, and would
> need a way to signal to the shared GC what blocks might contain pointers
> without synchronizing on every update.
>
> If this sounds like a good idea, maybe I'll start prototyping it.  Overall,
> the idea is that thread-local heaps are an optimization that should be done
> explicitly when/if you need it, not something that needs to be built deep into
> the language runtime.

In my code the lock taken during allocation is more of an issue than GC
scanning. Having thread-local (or, better, NUMA-node-local) allocation
pools with separate locks would solve the main bottleneck.
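
To make the separate-pools idea concrete, here is a minimal sketch in D.
It is not a GC and not a proposed API: the LocalPool type, its fixed
1 MiB capacity, the 16-byte rounding and the worker/makeWorker helpers
are all hypothetical names invented for the example. It relies only on
the fact that module-level variables in D are thread-local by default,
so each thread gets its own pool and the allocation path takes no lock.

```d
import core.stdc.stdlib : malloc, free;
import core.thread : Thread;

// Hypothetical bump-pointer pool (illustration only, nothing is collected).
struct LocalPool
{
    ubyte* base;      // start of the block obtained from C malloc
    size_t used;      // bytes handed out so far
    size_t capacity;  // size of the block

    void* allocate(size_t size)
    {
        // Round the request up to a multiple of 16 bytes.
        size = (size + 15) & ~cast(size_t) 15;
        if (base is null)
        {
            // Assumed fixed capacity; a real pool would grow or chain blocks.
            capacity = 1 << 20;
            base = cast(ubyte*) malloc(capacity);
            if (base is null)
                return null;
        }
        if (used + size > capacity)
            return null; // sketch: pool exhausted, no fallback
        void* p = base + used;
        used += size;
        return p;
    }

    void release()
    {
        free(base);
        base = null;
        used = 0;
        capacity = 0;
    }
}

// Thread-local by default in D: every thread sees its own instance,
// so allocate() never contends with other threads.
LocalPool pool;

// Shared only to report results; each thread writes its own slot.
__gshared size_t[4] usedPerThread;

void worker(size_t idx)
{
    foreach (i; 0 .. 1000)
    {
        auto p = cast(int*) pool.allocate(int.sizeof);
        assert(p !is null);
        *p = i;
    }
    usedPerThread[idx] = pool.used;
    pool.release();
}

// Helper so that each delegate captures its own copy of idx.
Thread makeWorker(size_t idx)
{
    return new Thread({ worker(idx); });
}

void main()
{
    Thread[4] threads;
    foreach (idx, ref t; threads)
    {
        t = makeWorker(idx);
        t.start();
    }
    foreach (t; threads)
        t.join();

    // Each thread made 1000 allocations of 16 bytes from its own pool.
    foreach (u; usedPerThread)
        assert(u == 1000 * 16);
}
```

A NUMA-node-local variant would presumably share one such pool among the
threads of a node and guard it with that node's own lock. The sketch
leaves that out, along with anything related to scanning or collection.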

I have always disliked extra memory hierarchies; I feel that their
benefit/complexity ratio is too small, but I might be wrong.
The problem you identified, pointers to "global" memory, is difficult
to solve in a way that really gives the local GC an advantage over a
good GC implementation that uses several pools, without burdening the
programmer.

Still, I imagine that having a localgc library implementation could be
useful to some.
I suspect that using it for general types that might allocate memory
on their own would be difficult, but as this would be used only in
special cases it probably isn't an issue.