Poor memory allocation performance with a lot of threads on 36 core machine

Witek via Digitalmars-d digitalmars-d at puremagic.com
Thu Feb 18 05:55:02 PST 2016


On Thursday, 18 February 2016 at 13:49:45 UTC, Vladimir Panteleev 
wrote:
> On Thursday, 18 February 2016 at 13:00:12 UTC, Witek wrote:
>> So, the question is, why is D / DMD allocator so slow under 
>> heavy multithreading? The working set is pretty small (few 
>> megabytes at most), so I do not think this is an issue with GC 
>> scanning itself.  Can I plug-in tcmalloc / jemalloc, to be 
>> used as the underlying allocator, instead of using glibc? Or 
>> is D runtime using mmap/sbrk/etc directly?
>>
>> Thanks.
>
> Currently, all memory allocations use a global GC lock[1]. As 
> such, presently high-parallelism programs need to avoid 
> allocating memory via the GC.
>
> You can avoid this problem by using a different allocation / 
> memory management strategy. You may want to have a look at 
> std.experimental.allocator.
>
> [1]: 
> https://github.com/D-Programming-Language/druntime/blob/30f8c1af39eb17d8ebec1f5fd401eb5cfd6b36da/src/gc/gc.d#L348-L370

Yeah, I was just using stuff like:

int[] newPartialSolution = partialSolution ~ row;
int[] newAvailableSolution = availableSolution[0 .. $-1].dup;
// Move last element if needed in newAvailableSolution.

It was pretty hard to track down, because the allocation was 
hidden behind "~". Yes, -vgc helped here, but still, I was not 
expecting such terrible performance.

I will try using std.experimental.allocator, but it doesn't play 
well with "~": I would need to call expandArray and do the other 
array operations manually, which is a pain. It would be nice to 
encode the allocator in the type, potentially by wrapping the 
array in a custom struct/class.
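
A rough sketch of what such a wrapper might look like, assuming 
Mallocator as the backing allocator. MallocArray is a made-up 
name for illustration, not something that exists in Phobos, and 
growing by one element at a time is only the simplest possible 
policy:

```d
import std.experimental.allocator : makeArray, expandArray, dispose;
import std.experimental.allocator.mallocator : Mallocator;

// Hypothetical wrapper (not part of Phobos): an int array backed by
// malloc, so that `~=` does not go through the GC.
struct MallocArray
{
    private int[] data;

    // No implicit copies, so exactly one owner frees the memory.
    @disable this(this);

    ~this()
    {
        if (data !is null)
            Mallocator.instance.dispose(data);
    }

    // `arr ~= value` grows the buffer by one element.
    void opOpAssign(string op)(int value) if (op == "~")
    {
        if (data is null)
        {
            // First element: allocate a one-element array.
            data = Mallocator.instance.makeArray!int(1, value);
            assert(data !is null, "allocation failed");
        }
        else if (!Mallocator.instance.expandArray(data, 1, value))
        {
            assert(0, "allocation failed");
        }
    }

    // Read-only view; do not keep it past the next `~=`.
    const(int)[] opSlice() const
    {
        return data;
    }
}
```

With this, "MallocArray a; a ~= 1; a ~= 2;" compiles and a[] gives 
a view of the elements, but a binary "a ~ row" would still need 
its own opBinary overload.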

"As of this time, std.experimental.allocator is not integrated 
with D's built-in operators that allocate memory, such as new, 
array literals, or array concatenation operators."
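
For comparison, the two allocating lines from my snippet could be 
spelled out by hand roughly as below. This is only a sketch under 
the assumption that Mallocator is the allocator; appendCopy and 
dupWithoutLast are hypothetical helper names, and the caller has 
to release each result with dispose:

```d
import std.experimental.allocator : makeArray, dispose;
import std.experimental.allocator.mallocator : Mallocator;

// Hypothetical helper: same result as `src ~ value`, but the new
// array comes from malloc instead of the GC heap.
int[] appendCopy(const(int)[] src, int value)
{
    int[] result = Mallocator.instance.makeArray!int(src.length + 1);
    if (result is null)
        return null; // allocation failed
    result[0 .. src.length] = src[];
    result[$ - 1] = value;
    return result;
}

// Hypothetical helper: same result as `src[0 .. $-1].dup`.
// Returns null when src has a single element (nothing to copy).
int[] dupWithoutLast(const(int)[] src)
{
    assert(src.length > 0, "need at least one element");
    if (src.length == 1)
        return null;
    int[] result = Mallocator.instance.makeArray!int(src.length - 1);
    if (result !is null)
        result[] = src[0 .. $ - 1];
    return result;
}
```

Each returned slice then has to be freed explicitly, e.g. 
Mallocator.instance.dispose(newPartialSolution), which is exactly 
the manual bookkeeping that "~" and .dup hide.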


