Fast hashtable

H. S. Teoh via Digitalmars-d digitalmars-d at puremagic.com
Tue Feb 28 13:00:05 PST 2017


On Tue, Feb 28, 2017 at 12:57:14PM -0500, Andrei Alexandrescu via Digitalmars-d wrote:
> This is of possible interest:
> https://probablydance.com/2017/02/26/i-wrote-the-fastest-hashtable/ --

Related to this, recently I found some interesting papers for large
external (i.e., on-disk) hash tables:

	http://www.itu.dk/people/pagh/papers/linear.pdf
	http://www.weizhewei.com/papers/pods020-wei.pdf

AIUI, blocked probing (1st paper) is similar to Robin Hood hashing, in
that inserting an entry may cause an existing entry to be moved out of
its occupied slot to a different one, but blocked probing also has
additional interesting characteristics:

- Scanning is done in blocks of size 2^i with starting slots of index
  2^i for incrementing i. This structure makes it cache-oblivious, and
  thus able to take advantage of the modern cache hierarchy without
  needing cache-specific tuning.

- Something special about the 2^i block sizes makes the math just work
  out (see 2nd paper), so that lookups have expected average 1 +
  1/2^Ω(b) I/O operations, where b is the block size (provided the load
  factor is bounded away from 1), which is a marked improvement over
  plain linear probing which has expected 1 + O(α/b) I/O operations,
  where α is the load factor.

I didn't look closely enough at the analysis to know for sure, but it
seems that since the analysis is cache-oblivious, the O(1 + 1/2^Ω(b))
I/O operations should also generalize to cache misses as well (as part
of the memory hierarchy, if you regard secondary storage as the lowest
level of the hierarchy).  So I'm expecting this might be even faster
than the Robin Hood hashing in your linked blog.


T

-- 
Sometimes the best solution to morale problems is just to fire all of the unhappy people. -- despair.com


More information about the Digitalmars-d mailing list