Fast hashtable
Cecil Ward via Digitalmars-d
digitalmars-d at puremagic.com
Tue Feb 28 22:44:34 PST 2017
On Tuesday, 28 February 2017 at 21:00:05 UTC, H. S. Teoh wrote:
> On Tue, Feb 28, 2017 at 12:57:14PM -0500, Andrei Alexandrescu
> via Digitalmars-d wrote:
>> This is of possible interest:
>> https://probablydance.com/2017/02/26/i-wrote-the-fastest-hashtable/ --
>
> Related to this, recently I found some interesting papers for
> large external (i.e., on-disk) hash tables:
>
> http://www.itu.dk/people/pagh/papers/linear.pdf
> http://www.weizhewei.com/papers/pods020-wei.pdf
>
> AIUI, blocked probing (1st paper) is similar to Robin Hood
> hashing, in that inserting an entry may cause an existing entry
> to be moved out of its occupied slot to a different one, but
> blocked probing also has additional interesting characteristics:
>
> - Scanning is done in blocks of size 2^i, with block i starting at
>   slot index 2^i, for increasing i. This structure makes it
>   cache-oblivious, and thus able to take advantage of the modern
>   cache hierarchy without needing cache-specific tuning.
>
> - Something special about the 2^i block sizes makes the math just
>   work out (see 2nd paper), so that lookups have an expected average
>   of 1 + 1/2^Ω(b) I/O operations, where b is the block size
>   (provided the load factor is bounded away from 1). This is a
>   marked improvement over plain linear probing, which has an
>   expected 1 + O(α/b) I/O operations, where α is the load factor.
>
> I didn't look closely enough at the analysis to know for sure,
> but since the analysis is cache-oblivious, the O(1 + 1/2^Ω(b))
> I/O operations should generalize to cache misses as well (as
> part of the memory hierarchy, if you regard secondary storage
> as the lowest level of the hierarchy). So I'm expecting this
> might be even faster than the Robin Hood hashing in your linked
> blog.
>
>
> T
I liked that article. I didn't really understand the point about the
implementation of modulo by primes, though; maybe I missed something.
Given that our man is doing modulo by a 'known' value (he had a
switch statement to select among them), why not do something rather
cheaper than a compiler-expanded constant div/mod made up of
multiplies and shifts:
const uint power2 = 512; // say, some 1 << n anyway
const uint prime = 509;  // some prime just below power2, with prime > power2 / 2
static assert( power2 - 1 - prime < prime );
x = x & ( power2 - 1 );
x = ( x >= prime ) ? x - prime : x;
which is good news on my x86 with GDC -O3 (only three instructions:
and, sub, cmovx) - all well provided you make sure that you are
getting CMOVx and not branches. I could work out the power from the
prime using CTFE, given a bit of thought. Maybe CTFE could even do
the reverse?
Have I finally gone mad?