Poor memory allocation performance with a lot of threads on 36 core machine
Martin Nowak via Digitalmars-d
digitalmars-d at puremagic.com
Fri Feb 19 05:06:58 PST 2016
On Thursday, 18 February 2016 at 13:00:12 UTC, Witek wrote:
> So, the question is, why is D / DMD allocator so slow under
> heavy multithreading? The working set is pretty small (few
> megabytes at most), so I do not think this is an issue with GC
> scanning itself. Can I plug-in tcmalloc / jemalloc, to be used
> as the underlying allocator, instead of using glibc? Or is D
> runtime using mmap/sbrk/etc directly?
>
> Thanks.
As others have noted, this is due to heavy contention in the GC.
There is a pending PR
(https://github.com/D-Programming-Language/druntime/pull/1447) to
replace the recursive mutex with a spinlock, which should improve
the numbers a bit but doesn't solve the problem.
Since version 2.070 we also suspend threads in parallel, which
heavily reduces the pause times with many threads
(https://github.com/D-Programming-Language/druntime/pull/1110).
The real fix (using thread local allocator caches) has a very
high priority in our backlog
(https://trello.com/c/K7HrSnwo/28-thread-cache-for-gc), but isn't
yet fully implemented. You can see the latest state at
https://github.com/MartinNowak/druntime/commits/gcCache. I still
need to add a queue on each thread cache to sync metadata.
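To illustrate the general idea of a thread-local cache, here is a
minimal sketch. It is not the code from the gcCache branch; every name
in it (blockSize, refillCount, cacheAlloc, cacheFree, poolLock) is
hypothetical, and malloc merely stands in for the shared pool. The point
is only that the common path touches a per-thread free list without any
lock, and the lock is taken once per batch.

```d
import core.stdc.stdlib : malloc;
import core.sync.mutex : Mutex;

enum blockSize = 64;    // hypothetical fixed block size
enum refillCount = 32;  // hypothetical batch size per trip to the shared pool

struct Block { Block* next; }

__gshared Mutex poolLock; // guards the shared pool (malloc stands in for it)
Block* localFree;         // module-level variables are thread-local in D

shared static this() { poolLock = new Mutex; }

// Slow path: take the lock once and fetch a whole batch of blocks.
void refill()
{
    poolLock.lock();
    scope (exit) poolLock.unlock();
    foreach (_; 0 .. refillCount)
    {
        auto b = cast(Block*) malloc(blockSize);
        if (b is null)
            break;
        b.next = localFree;
        localFree = b;
    }
}

// Fast path: pop from this thread's free list, no locking.
void* cacheAlloc()
{
    if (localFree is null)
        refill();
    auto b = localFree;
    if (b is null)
        return null;
    localFree = b.next;
    return b;
}

// Freed blocks go back to this thread's list, again without locking.
void cacheFree(void* p)
{
    auto b = cast(Block*) p;
    b.next = localFree;
    localFree = b;
}
```

A real GC cache also has to keep the collector's metadata in sync with
what each thread has handed out, which this sketch ignores entirely.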
So for the time being, use at least 2.070.0, and try to replace
GC allocations with malloc.
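As a rough sketch of what that replacement can look like (sumOfSquares
is a made-up example function, not taken from the original program), a
scratch buffer that would otherwise come from `new long[](n)` can be
taken from the C heap instead, so the function never enters the GC:

```d
import core.stdc.stdlib : malloc, free;
import core.exception : onOutOfMemoryError;

/// Sums the squares of 0 .. n using a scratch buffer from the C heap.
/// @nogc makes the compiler reject any accidental GC allocation here.
long sumOfSquares(size_t n) @nogc nothrow
{
    if (n == 0)
        return 0;

    // malloc instead of `new long[](n)`: no GC allocation, no GC lock.
    auto p = cast(long*) malloc(n * long.sizeof);
    if (p is null)
        onOutOfMemoryError();
    scope (exit) free(p); // the GC will not reclaim this memory for us

    long[] buf = p[0 .. n]; // slice the raw pointer to get a normal array view
    foreach (i, ref x; buf)
        x = cast(long) (i * i);

    long total = 0;
    foreach (x; buf)
        total += x;
    return total;
}
```

Note that if a malloc'ed block stores pointers to GC-managed objects,
it has to be registered with core.memory.GC.addRange (and removed with
GC.removeRange before freeing), otherwise the collector cannot see
those references.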