Memory Allocation and Locking
Steven Schveighoffer
schveiguy at yahoo.com
Fri Aug 22 06:19:54 PDT 2008
"dsimcha" wrote
> I guess this really is more of a C++ question than a D question, but
> it's kind of both, so what the heck?
>
> Anyhow, I had some old C++ code lying around that I've been using, and
> I had added some quick and dirty multithreading of embarrassingly
> parallel parts of it to speed it up. Lo and behold, it actually worked.
> A while later, I decided I needed to extend it in ways that couldn't be
> done nearly as easily in C++ as in D, and it wasn't much code, so I
> just translated it to D.
>
> First iteration was slow as dirt because of resource contention for
> memory allocation. Put in a few ugly hacks to avoid all memory
> allocation in inner loops (i.e. ~=), and now the D version is faster
> than the C++ version.
>
> Anyhow, the real question is, why is it that I can get away w/ heap
> allocations in inner loops (i.e. vector.push_back()) in multithreaded
> C++ code but not in multithreaded D code? I'm assuming, since there
> didn't seem to be any resource contention in the C++ version, that
> there was no locking for the memory allocation, but all of the
> allocation was done through STL rather than explicitly, so I don't
> really know. However, the code actually worked like this, so I never
> really gave it much thought until today. Is it just a minor miracle
> that this worked at all w/o me explicitly using some kind of lock?
>
> By the way, if it's implementation defined, my C++ compiler is MinGW
> GCC 3.4.5.
I think it may be the way the GC is implemented. When it needs memory, it
first tries running a collect cycle (I'm not sure if there is some algorithm
to limit how often the GC runs) before trying to get more memory from the
OS. If you put the allocations in an inner loop, you are running collect
cycles more frequently (you should be able to see this by profiling the
code). I'm not sure of the exact heuristic for deciding when a collection
runs, but in the worst case it runs every time you allocate and there is no
free space left.
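To see why allocating in an inner loop hurts, here is a toy model of that
policy (this is not D's actual GC, just an illustration of "collect first,
grow only if the collect freed nothing"). When all allocated data stays
live, you hit the worst case: once the pool is full, every single
allocation triggers a collect cycle.

```cpp
#include <cassert>
#include <cstddef>

// Toy model of the policy described above: on allocation, if the pool
// has no free slot, run a collect cycle first, and only ask for more
// memory if the collect reclaimed nothing. Not D's real GC.
struct ToyGC {
    std::size_t capacity = 4;   // slots currently obtained "from the OS"
    std::size_t live = 0;       // slots in use
    std::size_t collects = 0;   // how many collect cycles have run
    std::size_t grows = 0;      // how many times we got more memory

    void allocate(bool somethingIsGarbage) {
        if (live == capacity) {
            ++collects;                 // collect before growing
            if (somethingIsGarbage)
                --live;                 // collect reclaimed a slot
            else {
                capacity *= 2;          // nothing freed: get more memory
                ++grows;
            }
        }
        ++live;
    }
};
```

With 100 allocations where everything stays live, the pool doubles a
handful of times and only a few collects run; with 100 allocations where
each iteration's data becomes garbage, the pool fills up and then nearly
every allocation runs a (useless-work-heavy) collect cycle, which is the
contention pattern an inner allocation loop produces.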
Most memory allocation schemes that I've seen need to lock the heap, so it
is always a point of contention. However, with the new shared/unshared
distinction, it looks like D might be advancing past that obstacle. But the
GC really needs to be smarter about how often it runs. When implementing
dcollections, I created a custom allocator that allocates pages of elements
at once instead of individual elements. That produced a huge speedup,
because the GC doesn't run nearly as frequently.
-Steve
More information about the Digitalmars-d-learn mailing list