Build in associated array (DIctionary) needs some enhancements
An Pham
home at home.com
Sun Mar 30 16:30:34 UTC 2025
1. No function to create with desired capability
The initial size is default to 8 and increased of power of 2.
There are two problems with this scheme. To populate to size of
few thousands, there are too many memory relocations and rebuild
the list. The other problem is wasted of unused slots; example a
list of 2_100, the list will have 4_096 of slots
2. Cascade of collision
If hash causes a collision, it will use the subsequent slot as an
overflow. However, if a next item hash points to that subsequent
slot and it is already occupied by that overflow item, hence the
cascade collision
3. Duplicate hash calculation
For a key that fits into register, size_t, there is no hash
calculation but when it is added into the list, it will apply
final MurmurHash2; for a key size bigger than size_t, it will do
hash calculation with applied final MurmurHah2 and later added to
the list, it will apply MurmurHash2 again. There are two problems
with this scheme, the first one is obvious un-needed calculation;
the other is if an integer key value fitted into list
length/capability, applied MurmurHash2 potentially causes hash
collision
Some suggestions
1. Use prime number length/capability to avoid wasted of unused
slots
2. Add a way to initialize list with length/capability
3. A way to avoid final MurmurHash2 or get rid of it completely
and add function/delegate to customized hash calculation
4. Use different scheme to handle collision such as plain array
to hold key-value pair and hash list with integer index pointed
to key-value pair. This scheme has two advantages; key-value pair
will have integer index for collision and associated list will
have order of items added when do iteration (possible use in
json...); The dis-advantage is indirect access and it will take
more work when doing deletion. However, deletion is rarely used
and not a big deal for this use case
Happy codings
More information about the Digitalmars-d
mailing list