Memory leak with dynamic array
Steven Schveighoffer
schveiguy at yahoo.com
Sun Apr 11 20:41:42 PDT 2010
On Sun, 11 Apr 2010 12:50:11 -0400, Joseph Wakeling
<joseph.wakeling at gmail.com> wrote:
> I was very happy to see that D _does_ have a 'reserve' function for
> arrays,
> which I had been missing compared to C++ (it's not mentioned in the array
> docs).
It's new. It probably should be mentioned in the spec, but it's a
druntime thing.
>
> Still, I don't think that pre-reserving the memory per se is the
> influencing
> factor on the differences in performance.
No, and like bearophile said, the D array is a compromise between
performance and flexibility. There are amazing ways to use D arrays that
you could never do with C++ vectors.
> Note that in C++ the memory is not preassigned either. The difference
> between the performance of these pieces of code is striking -- on my
> machine the D example takes about 70--75 seconds to run, whereas the
> C++ example speeds through it in 10s or less.
The C++ example is reallocating memory, freeing memory it is no longer
using. It also manually handles the memory management, allocating larger
and larger arrays in some algorithmically determined fashion (for example,
multiplying the length by some constant factor). This gives it an edge in
performance because it does not have to do any costly lookup to determine
if it can append in place, plus the realloc of the memory probably is
cheaper than the GC realloc of D.
If you want to compare apples to apples (well, probably more like red
apples to green apples), you need to do these things in a struct for D. I
had thought the D appender class would do the trick, but as you stated
below, it's even slower. This needs to be remedied.
> D also uses about 20% more memory than the C++ even though the C++ code
> declares a higher capacity for the vector (above 8 million) than D does
> for the array (a little over 5 million).
D does not assume you stopped caring about the memory being pointed to
when it had to realloc. Therefore, it leaves it around in case you are
still using it. You can also expect more overhead in the GC because it
tends to hang on to memory in case it wants to use it again, or because it
hasn't collected it yet. This is true of most GC-based languages.
For example, you can do this in D:
int[] arr = new int[50];
int *arrelem = &arr[5];
for(int i = 0; i < 10000; i++)
arr ~= i;
*arrelem = 12345; // valid.
You can't do the same thing with C++ vectors, when they reallocate, the
memory they used to own could be freed. This invalidates all pointers and
iterators into the vector, but the language doesn't prevent you from
having such dangling pointers. Using one of them can result in memory
corruption, one of the worst bugs. D tries to eliminate as much memory
corruption problems as possible.
>
> I don't know if it was naive to assume that D's dynamic arrays would be
> equivalent to vectors in C++. But it's clear that the D array appender
> ~= is much slower than the C++ vector push_back().
But is much safer, and supports safe slicing. Compromises.
> Using an Appender class and put() (from std.array) is even slower,
> despite the std.array docs recommending this over ~. :-(
This must be fixed, the appender should be blazingly fast at appending
(almost as fast as C++), with the drawback that the overhead is higher.
> It's disappointing because I'm loving so much of D's syntax, but I can
> write far more efficient code (with less explicit memory management)
> in C++ ...
You haven't done much with it yet. When you start discovering how much D
takes care of, you will be amazed :)
array appending isn't the fastest operation, but it is safe, and safety
can be worth infinitely more than speed sometimes.
The thing about D is it *can* be fast and unsafe, just as fast and unsafe
as C, but that's not the default.
-Steve
More information about the Digitalmars-d-learn
mailing list