Associative Array implementation

dsimcha dsimcha at yahoo.com
Wed Apr 8 16:41:57 PDT 2009


It seems that Phobos's associative array implementation is horribly
inefficient in terms of storage space and the amount of false pointers it
generates for the GC.  A simple implementation that would be more space
efficient and GC-friendly is to use a pair of parallel arrays instead of an
array of pointers to structs to represent the data.  This would allow the key
block and the value block to be marked separately as having pointers or not.
In the current implementation, even if both the key and value are value types,
they are scanned by the GC because they're stored in the same struct with
pointers.  Then, we could use open addressing and probing to resolve
collisions instead of chaining.

Also, does anyone besides me have a use for an AA implementation that is
designed to be used with a second stack-based allocator (TempAlloc/SuperStack
as discussed here previously)?  I use it for calculating mutual information
and similar things where I need to build a temporary AA as part of an
algorithm, and it seems very quick in practice compared to traditional
heap-based AAs.  It works like this:

real someFunction(uint[] nums) {
    // New frame on TempAlloc.  Everything allocated after this
    // point gets freed when we leave the current scope.
    mixin(newFrame);

    // Allocate a StackHash with an array size of nums.length / 2.
    // This is the size of the array used internally, and is fixed.
    // If you add too many elements, a StackHash can't be rehashed,
    // so it gets slow.
    auto counts = StackHash!(uint, uint)(nums.length / 2);

    // Use like normal hash table.  Count occurrences of each num.
    foreach(elem; nums) {
        counts[elem]++;
    }

    // Do stuff with counts.

    return ans;
}



More information about the Digitalmars-d mailing list