Poor memory allocation performance with a lot of threads on 36 core machine

Martin Nowak via Digitalmars-d digitalmars-d at puremagic.com
Fri Feb 19 05:06:58 PST 2016


On Thursday, 18 February 2016 at 13:00:12 UTC, Witek wrote:
> So, the question is, why is the D / DMD allocator so slow under 
> heavy multithreading? The working set is pretty small (a few 
> megabytes at most), so I do not think this is an issue with GC 
> scanning itself.  Can I plug in tcmalloc / jemalloc, to be used 
> as the underlying allocator, instead of using glibc? Or is the D 
> runtime using mmap/sbrk/etc. directly?
>
> Thanks.

As others have noted, this is due to heavy contention in the GC.
There is a pending PR 
(https://github.com/D-Programming-Language/druntime/pull/1447) to 
replace the recursive mutex with a spinlock, which should improve 
the numbers a bit but doesn't solve the problem.
Since version 2.070 we also suspend threads in parallel, which 
greatly reduces pause times with many threads 
(https://github.com/D-Programming-Language/druntime/pull/1110).

The real fix (using thread-local allocator caches) has a very 
high priority in our backlog 
(https://trello.com/c/K7HrSnwo/28-thread-cache-for-gc), but isn't 
yet fully implemented. You can see the latest state at 
https://github.com/MartinNowak/druntime/commits/gcCache. I still 
need to add a queue on each thread cache to sync metadata.
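
To illustrate the general idea of a thread-local cache (this is not the 
druntime code from the gcCache branch, just a minimal sketch), here is a 
per-thread free list in front of malloc. The names blockSize, FreeNode, 
cacheHead, cachedAlloc and cachedFree are made up for the example, and the 
fixed block size is an assumption.

```d
import core.stdc.stdlib : malloc;

enum blockSize = 64; // hypothetical fixed block size, must fit a FreeNode

struct FreeNode { FreeNode* next; }

// Module-level variables in D are thread-local by default, so every
// thread gets its own list head and can touch it without any lock.
FreeNode* cacheHead;

void* cachedAlloc() @nogc nothrow
{
    if (cacheHead !is null)
    {
        // Fast path: pop a block from this thread's private free list.
        auto n = cacheHead;
        cacheHead = n.next;
        return n;
    }
    // Slow path: fall back to the shared allocator.
    return malloc(blockSize);
}

void cachedFree(void* p) @nogc nothrow
{
    if (p is null)
        return;
    // Push the block onto this thread's free list instead of releasing it.
    auto n = cast(FreeNode*) p;
    n.next = cacheHead;
    cacheHead = n;
}
```

Only the slow path touches shared state, so threads that mostly recycle 
their own blocks do not contend with each other. The sketch never returns 
memory to malloc, and a block freed on another thread simply ends up in 
that thread's list; a real implementation has to handle both.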

So for the time being, use at least 2.070.0, and try to replace 
GC allocations with malloc.
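
As a rough sketch of what replacing a GC allocation with malloc can look 
like in a hot path (sumSquares and its workload are invented for the 
example, not taken from the original program):

```d
import core.stdc.stdlib : malloc, free;

// Hypothetical per-thread work function. @nogc makes the compiler
// reject any accidental GC allocation inside it.
double sumSquares(size_t n) @nogc nothrow
{
    if (n == 0)
        return 0;

    // Scratch space from the C heap instead of `new double[](n)`,
    // so this call never goes through the GC.
    auto p = cast(double*) malloc(n * double.sizeof);
    if (p is null)
        return double.nan; // allocation failed
    scope (exit) free(p);  // released on every exit path

    double[] buf = p[0 .. n]; // slice the raw memory for bounds checking
    foreach (i, ref x; buf)
        x = cast(double) i * i;

    double total = 0;
    foreach (x; buf)
        total += x;
    return total;
}
```

Note that the GC does not scan malloc'ed memory. If such a buffer holds 
pointers to GC-allocated objects, register it with GC.addRange from 
core.memory (and GC.removeRange before freeing it), or those objects may 
be collected.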

