Build in associated array (DIctionary) needs some enhancements
Steven Schveighoffer
schveiguy at gmail.com
Sun Mar 30 17:20:30 UTC 2025
On Sunday, 30 March 2025 at 16:30:34 UTC, An Pham wrote:
> 1. No function to create with desired capability
> The initial size is default to 8 and increased of power of 2.
> There are two problems with this scheme. To populate to size of
> few thousands, there are too many memory relocations and
> rebuild the list. The other problem is wasted of unused slots;
> example a list of 2_100, the list will have 4_096 of slots
The memory allocation of the bucket list is not significant,
neither is having the bucket list be larger than needed. And the
growth factor is actually 4.
For an AA, you *want* to have a load factor that is not 1, or the
hash lookup creeps towards linear complexity.
>
> 2. Cascade of collision
> If hash causes a collision, it will use the subsequent slot as
> an overflow. However, if a next item hash points to that
> subsequent slot and it is already occupied by that overflow
> item, hence the cascade collision
This is not the case. The algorithm to find an appropriate slot
is not linear in this fashion. Please try reading the code again.
> 3. Duplicate hash calculation
> For a key that fits into register, size_t, there is no hash
> calculation but when it is added into the list, it will apply
> final MurmurHash2; for a key size bigger than size_t, it will
> do hash calculation with applied final MurmurHah2 and later
> added to the list, it will apply MurmurHash2 again. There are
> two problems with this scheme, the first one is obvious
> un-needed calculation; the other is if an integer key value
> fitted into list length/capability, applied MurmurHash2
> potentially causes hash collision
The hash function is used once, and is calculated by the
TypeInfo. If you want a different hash algorithm, you can create
a type and define toHash.
> Some suggestions
Please implement your suggestions, test the performance, and get
back with numbers. It's often quite surprising what makes the
difference in performance! So suggesting changes without testing
isn't fruitful.
-Steve
More information about the Digitalmars-d
mailing list