Memory leak with dynamic array

Sun Apr 11 17:42:18 PDT 2010

Joseph Wakeling:

> I was very happy to see that D _does_ have a 'reserve' function for arrays,
> which I had been missing compared to C++ (it's not mentioned in the array
> docs).

'reserve' is not a dynamic array method, it's a free function contained in the Phobos default module that also contains object, see the docs page:
http://www.digitalmars.com/d/2.0/phobos/object.html
It's not mentioned in the array docs because the 'reserve' and related functions are new, they come from a recent change in how dynamic arrays are implemented in D.

> Still, I don't think that pre-reserving the memory per se is the influencing
> factor on the differences in performance.

You are right. The difference in performance comes from mostly from the way D arrays are designed. I agree with you that dynamic array append in D is slow, in absolute terms. Comparing the performance of D code with C++ is a good thing to do, it gives a baseline for comparison.

D dynamic arrays are more flexible than C++ vector, they can be sliced, such slicing is O(1), and the slices are seen by the language just like other arrays. So you pay the price of some performance for such increased flexibility. The idea here is that the built-in data types must be as flexible as possible even if their performance is not so high, so they can be used for many different purposes. Then D standard library will have specialized data structures that are faster thanks to being more specialized and less flexible. In D dynamic arrays some of the performance price is also paid for the automatic memory management, for the GC that's not a precise GC (for example if your array has some empty items at the end past its true length, the GC must ignore them).

I think D dynamic arrays have a lower append performance because they don't contain the capacity in a handy place (their new design has a capacity, but it's not beside length and pointer). I don't know how the D array append works in situations of multithreading. If you don't require the slicing flexibility, you can use D language to implement a C++-style Vector in D. If you implement a struct that contains capacity, length and pointer to data, and you don't let the GC scan the memory of the data, and you add method overloading for the ~= that doubles the size of the memory with a realloc once in a while, you probably have a performance similar to the C++ one (not equal because the performance profile of the GC in allocating memory chunks is a little more complex than the C malloc). See the bottom of this post.

> D also uses about 20% more memory than the C++

I don't know why, I presume it's caused by the GC bookkeeping, it must be careful to avoid memory fragmentation. Currently the D GC is not so refined.

> Using an Appender class and put() (from std.array) is even slower,
> despite the std.array docs recommending this over ~. :-(

That was useful and designed for the precedent design of the dynamic arrays. Maybe it can be removed from Phobos now...

> It's disappointing because I'm loving so much of D's syntax, but I can
> write far more efficient code (with less explicit memory management)
> in C++ ...

As you know all data structures are the result of compromises, they are a balance of performance for their different uses, there is no perfect data structure; in practice D arrays are not designed for an efficient "en mass" appending. A D dynamic array is seen locally as a struct of 2 words. So it's cheap to copy and pass around. Why they don't contain the capacity too in such struct is a long story.

If you like to write benchmarks you will also find performance difference between for example a C++ hash_map and the built-in associative arrays of D.

Beside the design (that for example can make it hard to implement append-efficient dynamic arrays in D), D language is also rather new still, so far the focus was on its design, not in tuning the performance of its implementation. It's being several years they are performance-tuning C++ implementations :-)

-------------------

Below three implementations in D and C++. The Vector(T) is bare-bones. It missed most features and it's unsafe.

In the D2 version I don't use the GC, and the memory is deallocated when arr gets out of scope.

Timings, T=double, N=5_000_000, NLOOPS=100, seconds:
  D1:  38.20
  D2:   8.32
  C++:  6.65

You can see the performance of the append in D2 is a matter of data structure implementation, it's not a language problem :-)

With LDC (once we'll have a D2 version of it) the performance of D2 can probably be the same as the C++. DMD maybe loses a little here because it's not so good at inlining, or maybe because the C++ vector is better than this D2 code.

-------------------

// First D version
import std.stdio: printf;

void main() {
    alias double T;
    enum int N = 5_000_000;
    enum int NLOOPS = 100;
    T[] arr;

    foreach (i; 0 .. NLOOPS) {
        arr.length = 0;
        arr.assumeSafeAppend;
        printf("Array capacity = %d\n", arr.capacity);

        foreach (uint j; 0 .. N)
            arr ~= j;

        printf("At iteration %u, x has %u elements.\n", i, arr.length);
    }
}

------------------

// Second D version
import std.stdio: printf;
import std.c.stdlib: malloc, realloc, free;

struct Vector(T) {
    enum double FAST_GROW = 2;
    enum double SLOW_GROW = 1.3;
    enum int STARTING_SIZE = 4;
    enum int BIG_MEM = (1 << 27) / T.sizeof;

    T* ptr;
    int length, capacity;

    ~this() {
        free(ptr);
        ptr = null;
        length = 0;
        capacity = 0;
    }

    void opOpAssign(string Op:"~=")(T item) {
        if (length >= capacity) {
            if (capacity == 0)
                _grow_capacity(STARTING_SIZE);
            else if (capacity < BIG_MEM)
                _grow_capacity(cast(int)(capacity * FAST_GROW));
            else
                _grow_capacity(cast(int)(capacity * SLOW_GROW));
        }

        ptr[length] = item;
        length++;
    }

    void _grow_capacity(int new_capacity) {
        ptr = cast(T*)realloc(ptr, new_capacity * T.sizeof);
        assert(ptr);
        this.capacity = new_capacity;
    }
}

void main() {
    alias double T;
    enum int N = 5_000_000;
    enum int NLOOPS = 100;
    Vector!T arr;

    foreach (i; 0 .. NLOOPS) {
        arr.length = 0;
        printf("Array capacity = %d\n", arr.capacity);

        foreach (uint j; 0 .. N)
            arr ~= j;

        printf("At iteration %u, x has %u elements.\n", i, arr.length);
    }
}

------------------

// C++ version

#include "stdio.h"
#include <vector>

int main() {
    typedef double T;
    const unsigned int N = 5000000;
    const unsigned int NLOOPS = 100;
    std::vector<T> arr;

    for (unsigned int i = 0; i < NLOOPS; i++) {
        arr.resize(0);
        printf("Array capacity = %d\n", arr.capacity());

        for (unsigned int j = 0; j < N; j++)
            arr.push_back(j);

        printf("At iteration %u, x has %u elements.\n", i, arr.size());
    }
}

Bye,
bearophile