AA rehash threshold

Steven Schveighoffer via Digitalmars-d digitalmars-d at puremagic.com
Tue Nov 18 18:16:36 PST 2014


 From all I remember about hash tables, you are supposed to minimize the 
number of collisions when adding/looking up data. This helps keep the 
O(1) lookup property intact.

Reading wikipedia 
(http://en.wikipedia.org/wiki/Hash_table#Dynamic_resizing):

"To keep the load factor under a certain limit, e.g. under 3/4, many 
table implementations expand the table when items are inserted. For 
example, in Java's HashMap class the default load factor threshold for 
table expansion is 0.75 and in Python's dict, table size is resized when 
load factor is greater than 2/3."

Well, I just realized that D AA's "load factor" is 4 (see below). That 
means, for a hash table with 4 buckets, you need to add 17 items to 
generate a rehash. It also means that you are guaranteed to have one 
bucket with at least 4 elements on it before a rehash. And that's on an 
evenly spread hash table. In most cases, we would see buckets with 
around 8-10 elements.

I don't think this is optimal, but I'm nervous about correcting this -- 
going from a load factor of 4 to a load factor of 0.75 would be a 
tremendous jump. I'm not certain how the interaction with the GC and the 
list of primes in the AA runtime will fare with this new load.

Any thoughts from the experts out there?

-Steve

Code that uses factor of 4: 
https://github.com/D-Programming-Language/druntime/blob/9e410b0bd91ead4439bf304c28c2fce511df436a/src/rt/aaA.d#L221


More information about the Digitalmars-d mailing list