Reserving/Preallocating associative array?

Benjamin Thaut code at benjamin-thaut.de
Fri Dec 27 09:16:50 PST 2013


On 27.12.2013 17:49, Gordon wrote:
>
> Benjamin,
> Can you point me to your Hashmap implementation? I could perhaps use it
> to improve the timings even more.

https://github.com/Ingrater/druntime/blob/merge64/src/core/hashmap.d

This implementation depends on my own allocator design, but it should be 
possible to remove that dependency quite easily by replacing all 
allocations / frees with malloc/free or GC.malloc / nothing. Just make 
sure that the memory is not initialized beforehand (as new 
ubyte[sizeInBytes] would do), because that also has a considerable 
performance impact. Also, when allocating with GC.malloc you should 
specify the GC.BlkAttr.NO_SCAN flag, because you know that your data 
does not contain any pointers. That way the GC will not scan that huge 
memory block, which should speed up collections a lot.

To improve the performance of the hashmap you can try:
- specifying different hashing functions (via the hash policy). See 
https://github.com/Ingrater/thBase/blob/master/src/thBase/policies/hashing.d 
for more examples of hashing policies.
- changing the share of always-free entries in the hashmap (currently 
25%); to do so, change lines 207, 210, and 233 (not factored into a 
constant yet). Reducing the free entries might result in fewer cache 
misses but more linear searching, as this is a linear-probing hashmap. 
Increasing the free entries might reduce the amount of linear searching, 
but it increases cache misses and memory usage.
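The free-entries trade-off above can be sketched with a toy linear-probing table (Python used purely for illustration; the class name, capacity, and load factors below are made up and are not taken from the D code):

```python
import random

class LinearProbingMap:
    """Toy fixed-size open-addressing table using linear probing."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.keys = [None] * capacity

    def insert(self, key):
        # Start at the hash slot, then walk forward one slot at a time.
        i = hash(key) % self.capacity
        steps = 0
        while self.keys[i] is not None and self.keys[i] != key:
            i = (i + 1) % self.capacity  # linear probing: just add 1
            steps += 1
        self.keys[i] = key
        return steps  # probe chains grow as free slots shrink

def average_probe_length(load_factor, capacity=1024):
    """Fill a table to the given load factor and report mean probe steps."""
    random.seed(42)
    m = LinearProbingMap(capacity)
    keys = random.sample(range(10**9), int(capacity * load_factor))
    probes = [m.insert(k) for k in keys]
    return sum(probes) / len(probes)
```

Filling the table to 50% should give noticeably shorter probe chains than filling it to 95%, which is exactly the extra linear searching traded against the lower memory usage.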
- computing a series of prime numbers in which each prime is at least 
twice as big as the previous one, and using those as the possible sizes 
for the hashmap. Prime-number sizes give a better distribution of the 
items within the hashmap, therefore reducing the amount of linear 
searching necessary and thus improving the hashmap's performance.
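Such a size table can be computed once, ahead of time; a minimal sketch (Python for illustration; the starting size 11 and the upper limit are arbitrary choices, not values from the D code):

```python
def is_prime(n):
    """Trial-division primality test; fine for building a small table."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    f = 3
    while f * f <= n:
        if n % f == 0:
            return False
        f += 2
    return True

def prime_size_table(start=11, limit=1_000_000):
    """Each entry is the smallest prime at least twice the previous one."""
    sizes = []
    candidate = start
    while candidate <= limit:
        while not is_prime(candidate):
            candidate += 1
        sizes.append(candidate)
        candidate = sizes[-1] * 2  # at least double before the next prime
    return sizes
```

On growth, the hashmap would simply step to the next entry of this table instead of doubling its raw capacity.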
- changing how the next free spot is found. The current code just adds 
1 to the previously found index (line 301). It is also possible to 
rehash the previous index (just feed it into the hashing function 
again), which would give a new index more distant from the current one 
in the hashmap. This would again improve the distribution of the items 
in the hashmap and thus reduce linear search time, but it may increase 
the number of cache misses, as memory is no longer accessed linearly.
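The difference between the two probing strategies can be sketched as follows (Python for illustration; the rehash function below is a hypothetical integer mixer, not the hash policy the D code actually uses):

```python
def rehash(i):
    # Hypothetical 32-bit integer mixing function, standing in for
    # "feed the index back through the hashing function".
    i = ((i ^ (i >> 16)) * 0x45d9f3b) & 0xFFFFFFFF
    return i ^ (i >> 16)

def probe_sequence_linear(start, capacity, n):
    """Current behaviour: each step just adds 1 to the index."""
    seq, i = [], start
    for _ in range(n):
        seq.append(i % capacity)
        i += 1
    return seq

def probe_sequence_rehash(start, capacity, n):
    """Proposed alternative: each step rehashes the previous index."""
    seq, i = [], start
    for _ in range(n):
        seq.append(i % capacity)
        i = rehash(i)
    return seq
```

The linear sequence visits adjacent slots (cache-friendly, but it builds clusters), while the rehashed sequence jumps around the table (better distribution, worse locality) -- exactly the trade-off described above.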

In case you are interested in my implementation of your problem, see here:
http://dpaste.dzfl.pl/8d0d174e

You won't be able to compile the code unless you set up my custom D 
environment, which I would not recommend unless you really hate the D 
garbage collector ;-)

Kind Regards
Benjamin Thaut


More information about the Digitalmars-d mailing list