storing the hash multiplier instead of the hash value

Lionello Lunesu lio at lunesu.remove.com
Mon Mar 22 17:13:45 PDT 2010


On 23-3-2010 1:51, Andrei Alexandrescu wrote:
> Currently, each entry in a D hashtable stores the hash of the key for
> efficiency purposes.
> 
> There is a bit of redundancy: as soon as you entered a hash bucket you
> know that the hashkey % tablesize is a specific number, call it k. But k
> is not enough information to restore the hash, so the actual hash gets
> stored for each entry.
> 
> I'm thinking of exploiting that redundancy by storing the hash
> multiplier, not the hash value. Instead of storing h in each slot, store
> m = h / tablesize. Then you can restore the actual h by writing:
> 
> restored_h = m * tablesize + k;
> 
> k is known for each given bucket (it's the index of the bucket) and m is
> what gets stored per entry.
> 
> What's the advantage of this approach? Well the advantage is that m is a
> small number. Any hash function will try to disperse the hash value as
> much as possible between the 32 available bits. But m will be a smaller
> number, and therefore will be more grouped and will engender fewer false
> pointers.
> 
> Would this help?

Probably not. hash is an 'int' and this 'm' will still use up the same
space as an int, I reckon.

Even if it ends up being 16-bit and can be put in a short, alignment
will probably cause it to consume 32-bits anyway. What's more the 16-bit
access (not to mention the multiplier) will slow the AA down.

I say, leave it be.

L.




More information about the Digitalmars-d mailing list