Memory Allocation and Locking
Steven Schveighoffer
schveiguy at yahoo.com
Fri Aug 22 06:19:54 PDT 2008
"dsimcha" wrote
> I guess this really is more of a C++ question than a D question, but
> it's kind of both, so what the heck?
>
> Anyhow, I had some old C++ code lying around that I've been using, and
> I had added some quick and dirty multithreading of embarrassingly
> parallel parts of it to speed it up. Lo and behold, it actually worked.
> A while later, I decided I needed to extend it in ways that couldn't be
> done nearly as easily in C++ as in D, and it wasn't much code, so I
> just translated it to D.
>
> First iteration was slow as dirt because of resource contention for
> memory allocation. Put in a few ugly hacks to avoid all memory
> allocation in inner loops (i.e. ~=), and now the D version is faster
> than the C++ version.
>
> Anyhow, the real question is, why is it that I can get away w/ heap
> allocations in inner loops (i.e. vector.push_back()) in multithreaded
> C++ code but not in multithreaded D code? I'm assuming, since there
> didn't seem to be any resource contention in the C++ version, that
> there was no locking for the memory allocation, but all of the
> allocation was done through STL rather than explicitly, so I don't
> really know. However, the code actually worked like this, so I never
> really gave it much thought until today. Is it just a minor miracle
> that this worked at all w/o me explicitly using some kind of lock?
>
> By the way, if it's implementation defined, my C++ compiler is MinGW
> GCC 3.4.5.
I think it may be the way the GC is implemented. When it needs memory, it
first tries running a collect cycle (I'm not sure if there is some algorithm
to limit how often the GC runs) before trying to get more memory from the
OS. If you put the allocations in an inner loop, you are running collect
cycles more frequently (you should be able to see this by profiling the
code). I'm not sure of the exact heuristic for deciding when a collection
runs, but in the worst case it runs every time you allocate and there is no
free space left.
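To see why allocating in an inner loop hurts, here is a toy model of that
policy (this is not D's actual GC, just an illustration of "collect first,
grow only if the collect freed nothing"). When all allocated data stays
live, you hit the worst case: once the pool is full, every single
allocation triggers a collect cycle.

```cpp
#include <cassert>
#include <cstddef>

// Toy model of the policy described above: on allocation, if the pool
// has no free slot, run a collect cycle first, and only ask for more
// memory if the collect reclaimed nothing. Not D's real GC.
struct ToyGC {
    std::size_t capacity = 4;   // slots currently obtained "from the OS"
    std::size_t live = 0;       // slots in use
    std::size_t collects = 0;   // how many collect cycles have run
    std::size_t grows = 0;      // how many times we got more memory

    void allocate(bool somethingIsGarbage) {
        if (live == capacity) {
            ++collects;                 // collect before growing
            if (somethingIsGarbage)
                --live;                 // collect reclaimed a slot
            else {
                capacity *= 2;          // nothing freed: get more memory
                ++grows;
            }
        }
        ++live;
    }
};
```

With 100 allocations where everything stays live, the pool doubles a
handful of times and only a few collects run; with 100 allocations where
each iteration's data becomes garbage, the pool fills up and then nearly
every allocation runs a (useless-work-heavy) collect cycle, which is the
contention pattern an inner allocation loop produces.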
Most memory allocation schemes that I've seen need to lock the heap, so it
is always a point of contention. However, with the new shared/unshared
distinction, it looks like D might be advancing past that obstacle. But the
GC really needs to be smarter about how often it runs. When implementing
dcollections, I created a custom allocator that allocates pages of elements
at once instead of individual elements. That produced a huge speedup,
because the GC doesn't run nearly as frequently.
-Steve
More information about the Digitalmars-d-learn mailing list