Memory leak with dynamic array

Sun Apr 11 20:41:42 PDT 2010

On Sun, 11 Apr 2010 12:50:11 -0400, Joseph Wakeling  
<joseph.wakeling at gmail.com> wrote:

> I was very happy to see that D _does_ have a 'reserve' function for  
> arrays,
> which I had been missing compared to C++ (it's not mentioned in the array
> docs).

It's new.  It probably should be mentioned in the spec, but it's a  
druntime thing.

>
> Still, I don't think that pre-reserving the memory per se is the  
> influencing
> factor on the differences in performance.

No, and like bearophile said, the D array is a compromise between  
performance and flexibility.  There are amazing ways to use D arrays that  
you could never do with C++ vectors.

> Note that in C++ the memory is not preassigned either.  The difference
> between the performance of these pieces of code is striking -- on my
> machine the D example takes about 70--75 seconds to run, whereas the
> C++ example speeds through it in 10s or less.

The C++ example is reallocating memory, freeing memory it is no longer  
using.  It also manually handles the memory management, allocating larger  
and larger arrays in some algorithmically determined fashion (for example,  
multiplying the length by some constant factor).  This gives it an edge in  
performance because it does not have to do any costly lookup to determine  
if it can append in place, plus the realloc of the memory probably is  
cheaper than the GC realloc of D.

If you want to compare apples to apples (well, probably more like red  
apples to green apples), you need to do these things in a struct for D.  I  
had thought the D appender class would do the trick, but as you stated  
below, it's even slower.  This needs to be remedied.

> D also uses about 20% more memory than the C++ even though the C++ code
> declares a higher capacity for the vector (above 8 million) than D does
> for the array (a little over 5 million).

D does not assume you stopped caring about the memory being pointed to  
when it had to realloc.  Therefore, it leaves it around in case you are  
still using it.  You can also expect more overhead in the GC because it  
tends to hang on to memory in case it wants to use it again, or because it  
hasn't collected it yet.  This is true of most GC-based languages.

For example, you can do this in D:

int[] arr = new int[50];
int *arrelem = &arr[5];

for(int i = 0; i < 10000; i++)
   arr ~= i;

*arrelem = 12345; // valid.

You can't do the same thing with C++ vectors, when they reallocate, the  
memory they used to own could be freed.  This invalidates all pointers and  
iterators into the vector, but the language doesn't prevent you from  
having such dangling pointers.  Using one of them can result in memory  
corruption, one of the worst bugs.  D tries to eliminate as much memory  
corruption problems as possible.

>
> I don't know if it was naive to assume that D's dynamic arrays would be
> equivalent to vectors in C++.  But it's clear that the D array appender
> ~= is much slower than the C++ vector push_back().

But is much safer, and supports safe slicing.  Compromises.

> Using an Appender class and put() (from std.array) is even slower,
> despite the std.array docs recommending this over ~. :-(

This must be fixed, the appender should be blazingly fast at appending  
(almost as fast as C++), with the drawback that the overhead is higher.

> It's disappointing because I'm loving so much of D's syntax, but I can
> write far more efficient code (with less explicit memory management)
> in C++ ...

You haven't done much with it yet.  When you start discovering how much D  
takes care of, you will be amazed :)

array appending isn't the fastest operation, but it is safe, and safety  
can be worth infinitely more than speed sometimes.

The thing about D is it *can* be fast and unsafe, just as fast and unsafe  
as C, but that's not the default.

-Steve