Fun project - faster associative array algorithm

Marco Leise via Digitalmars-d digitalmars-d at puremagic.com
Wed Apr 8 11:42:17 PDT 2015


What would a faster algorithm be? There are some that handle
collisions differently but may be slower when the hash
distribution is good. You can improve iteration speed by storing
only an index into a dense array of values, but that costs more memory.
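The dense-array layout can be sketched like this (a C++ illustration of the idea in my own names, not actual D runtime code; fixed capacity, no growth, to keep it short):

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Buckets store only a 32-bit index into a dense entries array, so
// iterating key/value pairs walks contiguous memory while the bucket
// table stays small. Open addressing with linear probing.
struct DenseMap {
    static constexpr uint32_t EMPTY = UINT32_MAX;
    std::vector<uint32_t> buckets;                     // index or EMPTY
    std::vector<std::pair<std::string, int>> entries;  // dense storage

    explicit DenseMap(size_t nbuckets) : buckets(nbuckets, EMPTY) {}

    void insert(const std::string& k, int v) {
        size_t i = std::hash<std::string>{}(k) % buckets.size();
        while (buckets[i] != EMPTY) {                  // linear probing
            if (entries[buckets[i]].first == k) {
                entries[buckets[i]].second = v;        // overwrite
                return;
            }
            i = (i + 1) % buckets.size();
        }
        buckets[i] = static_cast<uint32_t>(entries.size());
        entries.emplace_back(k, v);
    }

    const int* find(const std::string& k) const {
        size_t i = std::hash<std::string>{}(k) % buckets.size();
        while (buckets[i] != EMPTY) {
            if (entries[buckets[i]].first == k)
                return &entries[buckets[i]].second;
            i = (i + 1) % buckets.size();
        }
        return nullptr;                                // not found
    }
};
```

Iteration over `entries` is then a plain contiguous scan; the price is the extra index array plus one indirection per lookup.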

As far as I can tell from my own experiments with HAMT
(http://infoscience.epfl.ch/record/64398/files/idealhashtrees.pdf),
the time it takes to compute hashes and do binary comparisons
for keys such as strings is so large that the actual AA algorithm
doesn't really matter. If anyone managed to make string AAs
faster by swapping out the AA implementation, I'd be surprised.

Instead I focused on using malloc over the GC and borrowed w0rp's
idea of not hashing integral data types such as int, char or
bool. They are now their own hash, which is dangerous in terms
of hash distribution, but acts as a real turbo boost over the
built-in AA (+700% faster lookups in my tests with a 100%
hit rate.)
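In C++ terms (my analogue for illustration, not the poster's D code), the identity-hash idea looks like this:

```cpp
#include <cstddef>
#include <unordered_map>

// An integral key acts as its own hash, skipping the mixing step
// entirely. This can cluster badly for adversarial key patterns, but
// removes all per-lookup hashing work.
struct IdentityHash {
    size_t operator()(int x) const noexcept {
        return static_cast<size_t>(static_cast<unsigned>(x));
    }
};

using FastIntMap = std::unordered_map<int, int, IdentityHash>;
```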

May be important:

I found that Tuple generates a hash by addition of the hashes
of its fields. This means that a coordinate tuple like
Tuple!(int, int) will collide with mirror coordinates.
E.g.: tuple(2,5) has the same hash as tuple(5,2) and so on.
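A quick sketch of why additive combining collides, next to an order-sensitive alternative (the constant is the usual boost::hash_combine-style golden-ratio mix; this is my illustration, not Phobos code):

```cpp
#include <cstddef>

// Additive field combining, as described for Tuple: order-insensitive,
// so the fields (2,5) and (5,2) produce the same result.
inline size_t addCombine(size_t a, size_t b) { return a + b; }

// Order-sensitive mixing (boost::hash_combine-style): feeding the
// fields in a different order yields a different result.
inline size_t mixCombine(size_t seed, size_t h) {
    return seed ^ (h + 0x9e3779b97f4a7c15ULL + (seed << 6) + (seed >> 2));
}
```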
Furthermore, the hash function for each field is a virtual
call through a TypeInfo object, which branches further to
determine whether the type has an overridden toHash or the
default implementation (SuperFastHash over the binary
representation) is used.
I found that:

  enum hasher = &typeid(T).getHash; // bind the hash delegate once
  hasher(&t);                       // then call it directly

already improved hashing speed for gdc a bit. Without looking
at the generated assembly, my wild guess is that
de-virtualization failed. TypeInfo objects should be known at
compile time, so getHash and other calls on them could be
inlined. Even the dynamic branch that checks for an overridden
toHash is known at compile time.
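A C++ sketch of the same caching idea (an assumed analogue; `Hasher` and `hasher` are names I made up, not druntime's API): resolve the per-type hash routine once into a plain function pointer, so each call is direct rather than a virtual dispatch plus a toHash branch.

```cpp
#include <cstddef>

using HashFn = size_t (*)(const void*);

// Per-type hash routines, selected at compile time instead of through
// a virtual TypeInfo call.
template <typename T> struct Hasher;

template <> struct Hasher<int> {
    static size_t hash(const void* p) {
        // integral keys are their own hash, as in the post
        return static_cast<size_t>(*static_cast<const unsigned*>(p));
    }
};

// Bound once at compile time: every call through hasher<T> is a direct
// call the compiler can inline; no vtable lookup, no runtime branch.
template <typename T>
constexpr HashFn hasher = &Hasher<T>::hash;
```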

-- 
Marco


