What hashing algorithm is used for the D implementation of associative arrays?

safety0ff via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Thu Aug 14 15:20:19 PDT 2014


On Thursday, 14 August 2014 at 13:10:58 UTC, bearophile wrote:
>
> D AAs used to be not vulnerable to collision attacks because 
> they resolved collisions building a red-black tree for each 
> bucket. Later buckets became linked lists for speed,

Slight corrections:
It was a effectively a randomized BST, it used the hash value + 
comparison function to place the elements in the tree.
E.g. The AA's node comparison function might be:

    if (hash == other.hash)
       return value.opCmp(other.value);
    else if (hash < other.hash)
       return -1;
    return 1;

The hash function has a significant influence on how balanced the 
BST will be.
Insertion and removal order also have performance influence since 
rebalancing was only done when growing the AA.
It had no performance guarantees.

I believe it was removed to reduce memory consumption, see the 
Mar 19 2010 cluster of commits by Walter Bright to aaA.d.
Since GC rounds up allocations to powers of two for small 
objects, the additional pointer doubles the allocation size per 
node.

A template library based AA implementation should be able to 
handily outperform built-in AAs and provide guarantees.
Furthermore, improved memory management could be a significant 
win.

Fun fact:
The AA implementation within DMD still uses the "randomized" BST 
though the hash functions are very rudimentary.


More information about the Digitalmars-d-learn mailing list