Explicit Thread Local Heaps
Fawzi Mohamed
fawzi at gmx.ch
Fri Nov 12 08:50:32 PST 2010
On 12-nov-10, at 16:36, dsimcha wrote:
> There was some discussion around here a while back about the possibility
> of using thread-local heaps in the standard GC. This was rejected largely
> because of the complexity it would add when casting to shared/immutable.
>
> I'm wondering if it would be a good idea to allow memory to be explicitly
> allocated as thread-local through a separate GC. Such a GC would be
> designed from the ground up to assume thread-local data and would never be
> used to allocate in standard Phobos or Druntime functions. It would simply
> be a Phobos module, something like std.localgc. The only way to use it
> would be to explicitly call something like ThreadLocal.malloc, or pass it
> as a parameter to something that needs an allocator.
>
> The collector would (unsafely) assume that you always maintain at least
> one pointer to all thread-locally allocated data on either the relevant
> thread's stack, the thread-local heap or in thread-local storage. The
> global heap, __gshared storage and other threads' stacks would not be
> scanned.
>
> A major issue I see is interfacing such a GC with the regular GC such that
> pointers from the thread-local memory to shared memory are dealt with
> properly, without being excessively conservative. The thread-local GC
> would likely use core.stdc.malloc() to allocate large blocks of memory,
> and would need a way to signal to the shared GC what blocks might contain
> pointers without synchronizing on every update.
>
> If this sounds like a good idea, maybe I'll start prototyping it. Overall,
> the idea is that thread-local heaps are an optimization that should be
> done explicitly when/if you need it, not something that needs to be built
> deep into the language runtime.
In my code the lock taken during allocation is more of an issue than GC
scanning. Having thread-local (or better, NUMA-node-local) pools for
allocation, each with its own lock, would remove the main bottleneck.
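
As a rough illustration of "pools with separate locks", here is a minimal
sketch. It is written in C++ rather than D, and it is not how druntime's GC
works: Pool, run_workers and the 1 MiB capacity are hypothetical names and
values chosen for the example. Each pool is a simple bump allocator with its
own mutex, and each worker thread allocates only from its own pool, so no
thread ever waits on another thread's lock.

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical bump-pointer pool guarded by its own mutex.
class Pool {
public:
    explicit Pool(std::size_t capacity)
        : base_(static_cast<char*>(std::malloc(capacity))),
          capacity_(base_ ? capacity : 0) {}
    ~Pool() { std::free(base_); }
    Pool(const Pool&) = delete;
    Pool& operator=(const Pool&) = delete;

    // Returns nullptr when the pool cannot satisfy the request.
    void* allocate(std::size_t size) {
        constexpr std::size_t align = alignof(std::max_align_t);
        if (size > capacity_) return nullptr;
        size = (size + align - 1) / align * align;  // round up to alignment
        std::lock_guard<std::mutex> guard(lock_);   // per-pool lock only
        if (size > capacity_ - used_) return nullptr;
        void* p = base_ + used_;
        used_ += size;
        return p;
    }

    std::size_t used() const {
        std::lock_guard<std::mutex> guard(lock_);
        return used_;
    }

private:
    char* base_;
    std::size_t capacity_;
    std::size_t used_ = 0;
    mutable std::mutex lock_;
};

// One pool per worker. Worker i only ever touches pools[i], so every lock
// is uncontended even though each allocation still acquires one.
std::vector<std::size_t> run_workers(std::size_t workers,
                                     std::size_t allocs,
                                     std::size_t size) {
    std::vector<std::unique_ptr<Pool>> pools;
    for (std::size_t i = 0; i < workers; ++i)
        pools.push_back(std::make_unique<Pool>(1 << 20));  // 1 MiB, arbitrary

    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < workers; ++i)
        threads.emplace_back([&pools, i, allocs, size] {
            for (std::size_t n = 0; n < allocs; ++n)
                pools[i]->allocate(size);
        });
    for (auto& t : threads) t.join();

    // Report how many bytes each pool handed out.
    std::vector<std::size_t> used;
    for (auto& p : pools) used.push_back(p->used());
    return used;
}
```

The lock stays in place, rather than being dropped entirely, so that some
other party (a collector, say, or threads sharing a NUMA node) could still
inspect or use a pool safely. With a single global pool the same loop would
serialize all workers on one mutex.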
I have always disliked extra memory hierarchies; I feel that their
benefit/complexity ratio is too small, but I might be wrong.
The problem you identified, pointers to "global" memory, is difficult
to solve in a way that really gives the local GC an advantage over
a good GC implementation that uses several pools, without burdening
the programmer.
Still I imagine that having a localgc library implementation could be
useful to some.
I suspect that using it for general types that might allocate memory
on their own would be difficult, but as this would be used only in
special cases, it probably isn't an issue.