Poor memory allocation performance with a lot of threads on 36 core machine
Witek via Digitalmars-d
digitalmars-d at puremagic.com
Thu Feb 18 05:55:02 PST 2016
On Thursday, 18 February 2016 at 13:49:45 UTC, Vladimir Panteleev
wrote:
> On Thursday, 18 February 2016 at 13:00:12 UTC, Witek wrote:
> >> So, the question is, why is the D / DMD allocator so slow under
> >> heavy multithreading? The working set is pretty small (a few
> >> megabytes at most), so I do not think this is an issue with GC
> >> scanning itself. Can I plug in tcmalloc / jemalloc, to be
> >> used as the underlying allocator, instead of using glibc? Or
> >> is the D runtime using mmap/sbrk/etc. directly?
>>
>> Thanks.
>
> Currently, all memory allocations use a global GC lock[1]. As
> such, presently high-parallelism programs need to avoid
> allocating memory via the GC.
>
> You can avoid this problem by using a different allocation /
> memory management strategy. You may want to have a look at
> std.experimental.allocator.
>
> [1]:
> https://github.com/D-Programming-Language/druntime/blob/30f8c1af39eb17d8ebec1f5fd401eb5cfd6b36da/src/gc/gc.d#L348-L370
Yeah, I was just using stuff like:

    int[] newPartialSolution = partialSolution ~ row;
    int[] newAvailableSolution = availableSolution[0 .. $-1].dup;
    // Move last element if needed in newAvailableSolution.

It was pretty hard to track down, because the allocation was hidden
behind "~". Yes, -vgc helped here, but still, I was not expecting
such terrible performance.
I will try using std.experimental.allocator, but it doesn't
play well with "~": I would need to call expandArray and do the
array operations manually, which is a pain. It would be nice to
encode the allocator in the type, potentially by wrapping the
array in a custom struct/class.
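
For reference, here is roughly what the manual version might look like. This is only a sketch, assuming Mallocator as the backing allocator. The helper name appendCopy and the sample values are made up for illustration; makeArray, expandArray and dispose are the std.experimental.allocator functions.

```d
import std.experimental.allocator : makeArray, expandArray, dispose;
import std.experimental.allocator.mallocator : Mallocator;

// Hypothetical helper standing in for "partialSolution ~ row":
// copies src and appends value without touching the GC.
// The caller owns the result and must dispose() it with the same allocator.
int[] appendCopy(const(int)[] src, int value)
{
    // Allocates src.length + 1 ints through malloc.
    int[] result = Mallocator.instance.makeArray!int(src.length + 1);
    if (result is null)
        assert(0, "allocation failed");
    result[0 .. src.length] = src[];
    result[$ - 1] = value;
    return result;
}

void main()
{
    int[3] partialSolution = [1, 2, 3]; // illustrative data on the stack
    int row = 4;

    int[] newPartialSolution = appendCopy(partialSolution[], row);
    scope (exit) Mallocator.instance.dispose(newPartialSolution);

    assert(newPartialSolution.length == 4);
    assert(newPartialSolution[$ - 1] == 4);

    // Stand-in for "~=": grow by one element, then fill the new slot.
    if (Mallocator.instance.expandArray(newPartialSolution, 1))
        newPartialSolution[$ - 1] = 5;
}
```

Every allocation now needs a matching dispose, which is exactly the bookkeeping that "~" was hiding.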
"As of this time, std.experimental.allocator is not integrated
with D's built-in operators that allocate memory, such as new,
array literals, or array concatenation operators."
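
The wrapper idea could be sketched along these lines. AllocArray is a hypothetical name, not something in Phobos, and the sketch assumes the allocator type exposes a global `instance` the way Mallocator does. It only shows how "~=" might be forwarded to the allocator; a real container would need more care around growth policy and copying.

```d
import std.experimental.allocator : makeArray, expandArray, dispose;
import std.experimental.allocator.mallocator : Mallocator;

// Hypothetical wrapper that encodes the allocator in the array's type.
struct AllocArray(T, Allocator = Mallocator)
{
    private T[] data;

    // No implicit copies, so exactly one owner frees the buffer.
    @disable this(this);

    ~this()
    {
        if (data !is null)
            Allocator.instance.dispose(data);
    }

    // Supports "arr ~= value" by growing the buffer by one element.
    // Growing one element at a time is simple but not tuned for speed.
    void opOpAssign(string op)(T value) if (op == "~")
    {
        bool ok;
        if (data is null)
        {
            // The first append needs a fresh allocation.
            data = Allocator.instance.makeArray!T(1);
            ok = data !is null;
        }
        else
        {
            ok = Allocator.instance.expandArray(data, 1);
        }
        if (!ok)
            assert(0, "allocation failed");
        data[$ - 1] = value;
    }

    // "arr[]" exposes the elements as an ordinary slice.
    inout(T)[] opSlice() inout { return data; }

    size_t length() const { return data.length; }
}

void main()
{
    AllocArray!int solution;
    solution ~= 1;
    solution ~= 2;
    assert(solution.length == 2);
    assert(solution[][1] == 2);
}
```

Binary "~" could be added the same way with opBinary, at the cost of deciding who owns the newly allocated result.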