storing the hash multiplier instead of the hash value

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Mon Mar 22 10:51:37 PDT 2010


Currently, each entry in a D hashtable stores the hash of the key for 
efficiency purposes.

There is a bit of redundancy: as soon as you entered a hash bucket you 
know that the hashkey % tablesize is a specific number, call it k. But k 
is not enough information to restore the hash, so the actual hash gets 
stored for each entry.

I'm thinking of exploiting that redundancy by storing the hash 
multiplier, not the hash value. Instead of storing h in each slot, store 
m = h / tablesize. Then you can restore the actual h by writing:

restored_h = m * tablesize + k;

k is known for each given bucket (it's the index of the bucket) and m is 
what gets stored per entry.

What's the advantage of this approach? Well the advantage is that m is a 
small number. Any hash function will try to disperse the hash value as 
much as possible between the 32 available bits. But m will be a smaller 
number, and therefore will be more grouped and will engender fewer false 
pointers.

Would this help?


Andrei



More information about the Digitalmars-d mailing list