AA rehash threshold
Steven Schveighoffer via Digitalmars-d
digitalmars-d at puremagic.com
Tue Nov 18 18:16:36 PST 2014
From all I remember about hash tables, you are supposed to minimize the
number of collisions when adding/looking up data. This helps keep the
O(1) lookup property intact.
Reading wikipedia
(http://en.wikipedia.org/wiki/Hash_table#Dynamic_resizing):
"To keep the load factor under a certain limit, e.g. under 3/4, many
table implementations expand the table when items are inserted. For
example, in Java's HashMap class the default load factor threshold for
table expansion is 0.75 and in Python's dict, table size is resized when
load factor is greater than 2/3."
Well, I just realized that D AA's "load factor" is 4 (see below). That
means, for a hash table with 4 buckets, you need to add 17 items to
generate a rehash. It also means that you are guaranteed to have one
bucket with at least 4 elements on it before a rehash. And that's on an
evenly spread hash table. In most cases, we would see buckets with
around 8-10 elements.
I don't think this is optimal, but I'm nervous about correcting this --
going from a load factor of 4 to a load factor of 0.75 would be a
tremendous jump. I'm not certain how the interaction with the GC and the
list of primes in the AA runtime will fare with this new load.
Any thoughts from the experts out there?
-Steve
Code that uses factor of 4:
https://github.com/D-Programming-Language/druntime/blob/9e410b0bd91ead4439bf304c28c2fce511df436a/src/rt/aaA.d#L221
More information about the Digitalmars-d
mailing list