Fun project - faster associative array algorithm

Martin Nowak via Digitalmars-d digitalmars-d at puremagic.com
Tue Apr 7 11:35:04 PDT 2015


On 04/07/2015 07:24 PM, Walter Bright wrote:
> https://github.com/D-Programming-Language/dmd/blob/master/src/root/stringtable.c
> 
> uses quadratic probing instead of linked lists, but this is not cache
> friendly, either.

Quite to the contrary, it's very cache friendly.
The triangular numbers are

1 3 6 10 15 21

With 8 byte per entry as in stringtable you need 2 collisions on the
same "bucket" to land 48 bytes away, which might still be the same
cacheline.

The problem with linear probing is primary clustering, especially when
your hash isn't too good, which can lead to long streaks of neighbouring
buckets that are filled. Any insert that hashes into such a lump
(chances are increasing the bigger it becomes) will make it grow even
further.

For higher load factor such as 0.75 this can lead to a huge variance of
the probe sequence length and extreme lengths on the upper 95/99
percentiles.

Here is the ticket for open addressing, btw.
https://issues.dlang.org/show_bug.cgi?id=14385


More information about the Digitalmars-d mailing list