Poor memory allocation performance with a lot of threads on 36 core machine

Jonathan M Davis via Digitalmars-d digitalmars-d at puremagic.com
Fri Feb 19 00:29:00 PST 2016


On Thursday, 18 February 2016 at 17:27:13 UTC, Chris Wright wrote:
> On Thu, 18 Feb 2016 13:00:12 +0000, Witek wrote:
>> So, the question is, why is D / DMD allocator so slow under 
>> heavy multithreading?
>
> It's a global GC, possibly with a little per-thread pool.
>
> As part of the abortive Amber language project, I was looking 
> into ways to craft per-thread GC. You need to tell the runtime 
> whether a variable is marked as shared or __gshared and that's 
> pretty much sufficient -- you can only refer to unshared 
> variables from one thread, which means you can do a local 
> collection stopping only one thread's execution. You can have 
> one heap for each thread and one cross-thread heap.
>
> This work hasn't happened in D yet.

Unfortunately, it's very easy to cast between mutable, const, 
immutable, and shared (it's quite common to construct something 
as mutable and then cast it to immutable or shared), and it's 
also pretty easy to pass objects across threads. So an object 
allocated as thread-local can end up being referenced from other 
threads, which makes a per-thread memory pool _very_ problematic 
in D, even if it's a great idea in theory.

> I would like to look into D's GC and parallelism more. I've 
> started on mark/sweep parallelism but haven't made any 
> worthwhile progress. I'll take this as my next task. It's more 
> awkward because it requires changes to the runtime interface, 
> which means modifying both DMD and the runtime.

Sociomantic has a concurrent GC for D1, and I think that they've 
ported it to D2 (if not, I expect that they're in the process of 
doing so). It may or may not end up in druntime at some point. 
The key problem that I recall is that it relies on fork to do 
what it does (the key thing is that it needs to be able to take a 
snapshot of the memory). That works fantastically on *nix 
systems but doesn't work on Windows, and I'm not sure that anyone 
has figured out how to do the same thing on Windows yet. Other 
performance improvements have been made to the GC as well, but 
the fact that we're a systems language that ultimately allows you 
to do almost anything really limits what we can do in comparison 
to a language running in a VM.
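
To make the snapshot point concrete, here is a minimal sketch of 
what fork gives you on *nix. It is not Sociomantic's GC, and it's 
in Python rather than D only to keep it short. The function name 
snapshot_via_fork and the list standing in for a "heap" are 
made up for illustration. The child process sees the memory 
exactly as it was at the moment of the fork, while the parent 
carries on mutating its own copy:

```python
import os


def snapshot_via_fork(heap):
    """Fork, mutate `heap` in the parent, return (child_view, parent_view).

    `heap` is a hypothetical stand-in for GC-managed memory: a list of ints.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: owns a copy-on-write copy of the parent's memory as it
        # was when fork() returned. Later parent writes are invisible here.
        os.close(read_fd)
        try:
            os.write(write_fd, ",".join(str(x) for x in heap).encode())
        finally:
            os._exit(0)  # leave without running the parent's cleanup code

    # Parent: keeps running and mutating while the child inspects its copy.
    os.close(write_fd)
    heap.append(99)
    with os.fdopen(read_fd, "rb") as pipe:
        seen = pipe.read().decode()
    os.waitpid(pid, 0)
    child_view = [int(x) for x in seen.split(",")] if seen else []
    return child_view, list(heap)


if __name__ == "__main__":
    child_view, parent_view = snapshot_via_fork([1, 2, 3])
    print(child_view, parent_view)
```

A fork-based collector can scan the frozen copy in the child 
while the program keeps running in the parent. os.fork is not 
available on Windows, which is the same limitation described 
above.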

- Jonathan M Davis

