Code to show

Wed Mar 5 09:20:16 PST 2008

One of the most important topics of D I don't understand yet regards what the (Phobos) GC does when I try to allocate memory in more manual ways or in mixed situations.
In this post I show some simple code, if you have some time I'd like you to take a look at it and tell me what possibile problems/bugs (mostly related to GC/memory) it may have (it's not fully tested yet, but it's short enough).
(Some days from now I'll try to post a second piece of code (regarding a Deque implementation of mine) with a more complex GC/memory situation).

I have read the "Allocating Uninitialized Arrays On The Stack" part in the D docs:
>The uninitialized data that is on the stack will get scanned by the garbage collector looking for any references to allocated memory. Since the uninitialized data consists of old D stack frames, it is highly likely that some of that garbage will look like references into the GC heap, and the GC memory will not get freed. This problem really does happen, and can be pretty frustrating to track down.<

That talks about the stack only, but is this problem possible for Garbage-Collected memory on the heap too?

In a small bioinformatics benchmark program the time taken to allocate a 40 MB matrix (about 3200^2 of int) was too much long for my tastes, so I have created the following very experimental code:

import std.c.stdlib: cmalloc = malloc, cfree = free;
import std.gc: gcmalloc = malloc, gcrealloc = realloc, hasNoPointers;

T[] NewVoidGCArray(T, bool pointers=true)(int n) {
    auto pt = cast(T*)gcmalloc(n * T.sizeof);
    static if (!pointers)
        hasNoPointers(pt);
    return pt[0 .. n];
}

void delVoidGCArray(T)(ref T a) {
    gcrealloc(a.ptr, 0);
    a = null;
}

T[][] NewVoidGCMatrix(T, bool pointers=true)(int nr, int nc) {
    auto pt = cast(T*)gcmalloc(nr * nc * T.sizeof);
    static if (!pointers)
        hasNoPointers(pt);
    auto result = new T[][](nr);
    int pos;
    foreach(ref row; result) {
        row = pt[pos .. pos+nc];
        pos += nc;
    }
    return result;
}

void delVoidGCMatrix(T)(ref T[][] a) {
    gcrealloc(a.ptr, 0);
    a = null;
}

//-----------------------------------

T[] NewVoidCArray(T)(int n) {
    auto pt = cast(T*)cmalloc(n * T.sizeof);
    return pt[0 .. n];
}

void delVoidCArray(T)(ref T a) {
    cfree(a.ptr);
    a = null;
}

T[][] NewVoidCMatrix(T)(int nr, int nc) {
    auto pt = cast(T*)cmalloc(nr * nc * T.sizeof);
    auto result = new T[][](nr);
    int pos;
    foreach(ref row; result) {
        row = pt[pos .. pos+nc];
        pos += nc;
    }
    return result;
}

void deleteVoidCMatrix(T)(ref T[][] a) {
    cfree(a.ptr);
    a = null;
}

Notes:
- If you spot some wrong or dangerous things please tell me.
- I know most programs spend most time using data structures, and not just creating them, so in most programs this can't speed up much the code.
- Even if this code is "safe" (and I am far from sure of that), it's longer than the normal D code, so I am going to use it only in very special situations, in speed-critical situations, and probably never in long-ish programs that must be realiable enough.
- If I use such NewVoid[C|CG][Matrix|Array] I am going to fill them as soon as possible anyway, because void data is a bomb anyway, and it requires care.
- If that T type is non-reference data (like int, float, etc) then hasNoPointers() is useful. If T is an object, then the bool "pointers" must be true (elsewhere I have written a recursive template that automatically tells if a type is/contains a refence type or not, but I haven't used it here) to allow the GC manage those object referces correctly, keeping them alive if I remove their reference elsewhere, etc.
- I presume putting D objects in one of those arrays allocated from the C heap (NewVoidCArray/ NewVoidCMatrix) is a too much dangerous thing to do, I don't know if it's possible, but I think I'll always avoid to do it.
- The size of those array/matrices can be defined at run time. And their memory is contigous, this means a 3200^2 matrix requires about less than about 41 MB, instead of about 52 (because the sizes from the memory pool become rounded, etc). Being contigous maybe there is a bit more cache coherence.
- Those arrays/matrices can't change size once created, and those matrices must be rectangular, but I can often live with this. They can be used with the normal D syntax, avoiding the slow down D has when you create a struct to wrap such things, with the operator overloading to use [] etc.
- All those slices, like pt[0 .. n] don't actually copy the memory, so they are fast. The matrix uses an array of arrays, this wastes some memory, but I can live with that, and it allows me a standard syntax and to avoid structs/classes (another possible solution is to use pointers, so I can use the C syntax, but I lose the handy .length attribute, and the nice array bound cheeks when not in -release mode).
- With one of this functions, I have written short code that's as fast as C code, but half the lines of code (and it produces the correct result). Probably I'll show you the interesting benchmark I was working on.

Once I'll slowly start to understand how the GC works, I'll be able to create more complex data structures for D, like the Deque I am working on.

Bye and thank you for any help you can give,
bearophile