AA rehash threshold

deadalnix via Digitalmars-d digitalmars-d at puremagic.com
Tue Nov 18 18:46:14 PST 2014


On Wednesday, 19 November 2014 at 02:16:36 UTC, Steven
Schveighoffer wrote:
> From all I remember about hash tables, you are supposed to 
> minimize the number of collisions when adding/looking up data. 
> This helps keep the O(1) lookup property intact.
>

That is the theory, but the real world is sometime different. It
depends on how you resolve collisions. If you do it via some kind
of linked list, then yes, you must remove all collisions.

But if you do it by putting the element in a slot close to its
actual position, then thing gets different. In that case, you
don't want to minimize collision, but to minimize the distance to
the object, in order for it to be in the cache line.

After all, unless your hash table is very small, the first hit is
most likely a cache miss, meaning ~300 cycles. at this point the
cache line is in L1, with a 3 cycle access time. So accessing
several element in the same cache line is often preferable.

You may want to read about robin hood hash table for a simple
example of this. These hash tables are fast even with a very high
collision rate.


More information about the Digitalmars-d mailing list