Memory Allocation and Locking
Sean Kelly
sean at invisibleduck.org
Fri Aug 22 12:50:38 PDT 2008
dsimcha wrote:
> I guess this really is more of a C++ question than a D question, but it's kind
> of both, so what the heck?
>
> Anyhow, I had some old C++ code lying around that I've been using, and I had
> added some quick and dirty multithreading to its embarrassingly parallel parts
> to speed it up. Lo and behold, it actually worked. A while later, I
> decided I needed to extend it in ways that couldn't be done nearly as easily
> in C++ as in D, and it wasn't much code, so I just translated it to D.
>
> First iteration was slow as dirt because of resource contention for memory
> allocation. Put in a few ugly hacks to avoid all memory allocation in inner
> loops (i.e. ~= ), and now the D version is faster than the C++ version.
>
> Anyhow, the real question is, why is it that I can get away w/ heap
> allocations in inner loops (i.e. vector.push_back() ) in multithreaded C++
> code but not in multithreaded D code? I'm assuming, since there didn't seem
> to be any resource contention in the C++ version, that there was no locking
> for the memory allocation, but all of the allocation was done through STL
> rather than explicitly, so I don't really know. However, the code actually
> worked like this, so I never really gave it much thought until today. Is it
> just a minor miracle that this worked at all w/o me explicitly using some kind
> of lock?
You can get away with it in C++ because no garbage collection cycles
occur when the allocator runs low on memory. The easiest way to get an
apples-to-apples comparison would be to disable the GC at the start of
your program.
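As a rough sketch, something like the following (assuming Tango's
tango.core.Memory module; with D1 Phobos the equivalent calls live in
std.gc, and the exact names may differ):

```d
import tango.core.Memory; // Tango's GC interface (assumed available)

// Build a large array with collections suspended for the duration,
// so appends in the hot loop never trigger a collection pause.
char[][] buildHot(int n)
{
    GC.disable();              // no collection cycles from here on
    scope (exit) GC.enable();  // always re-enable, even on exception

    char[][] buf;
    for (int i = 0; i < n; ++i)
        buf ~= "data";         // appends still allocate, just never collect
    return buf;
}

void main()
{
    auto buf = buildHot(10_000);
    GC.collect();              // one collection after the hot section
}
```

Note that while the GC is disabled nothing gets reclaimed, so peak
memory use will grow for the duration of the hot section.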
Another performance issue for some programs comes from how the GC
allocator obtains memory from the OS. It does so in bite-sized blocks,
so if your app does a ton of allocation, the GC is not only collecting
periodically but also repeatedly obtaining a chunk of memory from the
OS to store the latest allocation (with some extra room for additional
blocks), then doing the same thing again, and so on. The Tango GC
provides a reserve() method to optimize for this situation by letting
the user have the GC pre-build a pool of N bytes for use by future
allocations. In my testing this increased app performance by 50% or
more for certain program designs.
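For instance (a sketch assuming Tango's GC.reserve() takes a byte
count; the 2x sizing factor here is just an illustrative guess, not a
recommendation from the Tango docs):

```d
import tango.core.Memory; // Tango's GC interface (assumed available)

// Reserve a pool up front, then run an allocation-heavy loop that
// should mostly be served from memory the GC already owns instead
// of repeated requests to the OS.
double[] sample(int n)
{
    GC.reserve(n * double.sizeof * 2); // hypothetical sizing: 2x the data

    double[] s;
    for (int i = 0; i < n; ++i)
        s ~= i * 0.5; // appends grow within the reserved pool
    return s;
}

void main()
{
    auto s = sample(1_000_000);
}
```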
Sean