Build in associated array (DIctionary) needs some enhancements

Sun Mar 30 17:20:30 UTC 2025

On Sunday, 30 March 2025 at 16:30:34 UTC, An Pham wrote:
> 1. No function to create with desired capability
> The initial size is default to 8 and increased of power of 2. 
> There are two problems with this scheme. To populate to size of 
> few thousands, there are too many memory relocations and 
> rebuild the list. The other problem is wasted of unused slots; 
> example a list of 2_100, the list will have 4_096 of slots

The memory allocation of the bucket list is not significant, 
neither is having the bucket list be larger than needed. And the 
growth factor is actually 4.

For an AA, you *want* to have a load factor that is not 1, or the 
hash lookup creeps towards linear complexity.

>
> 2. Cascade of collision
> If hash causes a collision, it will use the subsequent slot as 
> an overflow. However, if a next item hash points to that 
> subsequent slot and it is already occupied by that overflow 
> item, hence the cascade collision

This is not the case. The algorithm to find an appropriate slot 
is not linear in this fashion. Please try reading the code again.

> 3. Duplicate hash calculation
> For a key that fits into register, size_t, there is no hash 
> calculation but when it is added into the list, it will apply 
> final MurmurHash2; for a key size bigger than size_t, it will 
> do hash calculation with applied final MurmurHah2 and later 
> added to the list, it will apply MurmurHash2 again. There are 
> two problems with this scheme, the first one is obvious 
> un-needed calculation; the other is if an integer key value 
> fitted into list length/capability, applied MurmurHash2 
> potentially causes hash collision

The hash function is used once, and is calculated by the 
TypeInfo. If you want a different hash algorithm, you can create 
a type and define toHash.

> Some suggestions

Please implement your suggestions, test the performance, and get 
back with numbers. It's often quite surprising what makes the 
difference in performance! So suggesting changes without testing 
isn't fruitful.

-Steve