Memory leak with dynamic array

Sun Apr 11 09:50:11 PDT 2010

Thanks very much for the extended and detailed explanations.  The assumeSafeAppend
function
indeed fixes the memory leakage. :-)

In reply to some comments ...

> > That is to be expected.  The append function is not the most efficient
> > thing.  It still must do a lot of work just to look up the valid length
> > and verify appending can be done, whereas the v2 "append" is a single
> > instruction!
>
> The v2 "append" seems two instructions:
> arr[arr_len] = j;
> arr_len++;
>
> And the C++ v1 too has to verify if the appending can be done, comparing
> the size with the capacity and if so assigning the item and increasing the
> size.
>
> I understand that the D code has to work more here, and I presume your code
> can't be improved. But those timings are a little sad anyway :-|
>
> > A better method is to set the length, and then write the data.
>
> What I have done in the D v2 example.

I was very happy to see that D _does_ have a 'reserve' function for arrays,
which I had been missing compared to C++ (it's not mentioned in the array
docs).

Still, I don't think that pre-reserving the memory per se is the influencing
factor on the differences in performance.  To see why, compare these two pieces
of code -- a non-leaking D code, and the equivalent in C++:

////////////// D example
/////////////////////////
import std.stdio;

void main()
{
    double[] x;

    foreach(uint i;0..100) {
        x.length = 0;

        writefln("Array capacity (1) = %u",x.capacity);
        assumeSafeAppend(x);
        writefln("Array capacity (2) = %u",x.capacity);

        foreach(uint j;0..5000000)
            x ~= j;

        writefln("At iteration %u, x has %u elements.",i,x.length);
    }
}
/////////////////////////

//////////// C++ example
/////////////////////////
#include <vector>
#include <iostream>

using namespace std;

int main()
{
    vector<double> x;

    for(unsigned int i=0;i!=100;++i) {
        x.resize(0);

        cout << "Vector capacity: " << x.capacity() << endl;

        for(unsigned int j=0;j!=5000000;++j)
            x.push_back(j);

        cout << "At iteration " << i << ", x has " << x.size() << " elements." <<
endl;
    }

    return 0;
}
/////////////////////////

Note that in C++ the memory is not preassigned either.  The difference
between the performance of these pieces of code is striking -- on my
machine the D example takes about 70--75 seconds to run, whereas the
C++ example speeds through it in 10s or less.

D also uses about 20% more memory than the C++ even though the C++ code
declares a higher capacity for the vector (above 8 million) than D does
for the array (a little over 5 million).

I don't know if it was naive to assume that D's dynamic arrays would be
equivalent to vectors in C++.  But it's clear that the D array appender
~= is much slower than the C++ vector push_back().

Using an Appender class and put() (from std.array) is even slower,
despite the std.array docs recommending this over ~. :-(

It's disappointing because I'm loving so much of D's syntax, but I can
write far more efficient code (with less explicit memory management)
in C++ ...