Fast hashtable

Cecil Ward via Digitalmars-d digitalmars-d at puremagic.com
Tue Feb 28 22:44:34 PST 2017


On Tuesday, 28 February 2017 at 21:00:05 UTC, H. S. Teoh wrote:
> On Tue, Feb 28, 2017 at 12:57:14PM -0500, Andrei Alexandrescu 
> via Digitalmars-d wrote:
>> This is of possible interest: 
>> https://probablydance.com/2017/02/26/i-wrote-the-fastest-hashtable/ --
>
> Related to this, recently I found some interesting papers for 
> large external (i.e., on-disk) hash tables:
>
> 	http://www.itu.dk/people/pagh/papers/linear.pdf
> 	http://www.weizhewei.com/papers/pods020-wei.pdf
>
> AIUI, blocked probing (1st paper) is similar to Robin Hood 
> hashing, in that inserting an entry may cause an existing entry 
> to be moved out of its occupied slot to a different one, but 
> blocked probing also has additional interesting characteristics:
>
> - Scanning is done in blocks of size 2^i with starting slots of
>   index 2^i for incrementing i. This structure makes it
>   cache-oblivious, and thus able to take advantage of the modern
>   cache hierarchy without needing cache-specific tuning.
>
> - Something special about the 2^i block sizes makes the math just
>   work out (see 2nd paper), so that lookups have expected average
>   1 + 1/2^Ω(b) I/O operations, where b is the block size (provided
>   the load factor is bounded away from 1), which is a marked
>   improvement over plain linear probing which has expected
>   1 + O(α/b) I/O operations, where α is the load factor.
>
> I didn't look closely enough at the analysis to know for sure, 
> but it seems that since the analysis is cache-oblivious, the 
> O(1 + 1/2^Ω(b)) I/O operations should generalize to cache 
> misses as well (as part of the memory hierarchy, if you regard 
> secondary storage as the lowest level of the hierarchy).  So 
> I'm expecting this might be even faster than the Robin Hood 
> hashing in your linked blog.
>
>
> T

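Regarding the blocked-probing scan described above, here is a 
rough D sketch of how I read the probe order, using nested, 
naturally aligned power-of-two blocks around the home slot. This 
is only my guess at the scheme, not checked against the papers, 
and it ignores the insertion/displacement rules entirely.

     import std.stdio : writeln;

     /* For i = 0, 1, 2, ... scan the naturally aligned block of
        2^i slots that contains the home slot h.  The blocks nest,
        so every probe range sits inside one aligned power-of-two
        region whatever the cache-line or page size happens to be,
        which is what would make the scan cache-oblivious.
        The table size is assumed to be a power of two. */
     size_t[] blockedProbeOrder(size_t h, size_t tableSize)
     {
         size_t[] order = [h]; // block of size 1: just the home slot
         for (size_t half = 1; half < tableSize; half <<= 1)
         {
             // The aligned block of size 2*half containing h is the
             // size-half block already scanned plus its sibling, so
             // only the sibling half is new.
             size_t start = (h / (2 * half)) * (2 * half);
             size_t sib = (start == (h / half) * half)
                 ? start + half : start;
             foreach (slot; sib .. sib + half)
                 order ~= slot;
         }
         return order;
     }

     void main()
     {
         // home slot 5 in a 16-slot table:
         // [5, 4, 6, 7, 0, 1, 2, 3, 8, 9, 10, 11, 12, 13, 14, 15]
         writeln(blockedProbeOrder(5, 16));
     }
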
I liked that article. I didn't really understand the point about 
the implementation of modulo by a prime, though; maybe I missed 
something. Given that our man is taking the modulus by a 'known' 
value (he had a switch statement to get to the right one), why 
not do something rather cheaper than a compiler-expanded constant 
div/mod made up of multiplies and shifts, such as:

     const uint power2 = 512; // say, some 1 << n anyway
     const uint prime = 509;  // some prime just below the power,
                              // and greater than power2 / 2

     static assert( power2 - 1 - prime < prime );

     x = x & ( power2 - 1 );
     x = ( x >= prime ) ? x - prime : x;

which is good news on my x86 with GDC -O3 (only three 
operations: and, sub, cmovx), all well provided you make sure 
that you are getting CMOVx and not branches. I could work out 
the power from the prime using CTFE, given a bit of thought. 
Maybe CTFE could even do the reverse?
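
Something along these lines ought to let CTFE do the reverse, 
digging the prime out of the power at compile time; a rough 
sketch only, not compiled, and the helper names are my own:

     // trial division is plenty for numbers this small
     bool isPrime(uint n)
     {
         if (n < 2) return false;
         for (uint d = 2; d * d <= n; ++d)
             if (n % d == 0) return false;
         return true;
     }

     // largest prime strictly below a given power of two
     uint primeBelow(uint p2)
     {
         uint p = p2 - 1;
         while (!isPrime(p)) --p;
         return p;
     }

     enum uint power2 = 512;
     enum uint prime = primeBelow(power2); // forced CTFE, gives 509

     static assert( prime > power2 / 2 ); // one subtraction suffices
     static assert( power2 - 1 - prime < prime );

Going the other way, from the prime to the power, is just 
rounding up to the next 1 << n.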

Have I finally gone mad?

