Reserving/Preallocating associative array?
Benjamin Thaut
code at benjamin-thaut.de
Fri Dec 27 09:16:50 PST 2013
On 27.12.2013 at 17:49, Gordon wrote:
>
> Benjamin,
> Can you point me to your Hashmap implementation? I could perhaps use it
> to improve the timings even more.
https://github.com/Ingrater/druntime/blob/merge64/src/core/hashmap.d
This implementation depends on my own allocator design, but it should be
possible to remove that dependency quite easily by replacing all
allocations / frees with malloc/free or GC.malloc / nothing. Just make
sure that the memory is not initialized beforehand (as new
ubyte[sizeInBytes] would do), because that also has a considerable
performance impact. Also, when allocating with GC.malloc you should
specify the GC.BlkAttr.NO_SCAN flag, because you know that your data
does not contain any pointers. That way the GC will not scan that huge
memory block, which should speed up collections a lot.
To improve the performance of the hashmap you can try:
- specifying different hashing functions (via the hash policy). See
https://github.com/Ingrater/thBase/blob/master/src/thBase/policies/hashing.d
for more examples of hashing policies.
- Change the amount of always-free entries in the hashmap (currently
25%); for that, change lines 207, 210 and 233 (not factored into a
constant yet). Reducing the free entries might result in fewer cache
misses but more linear searching, as this is a linear-probing hashmap.
Increasing the free entries might reduce the amount of linear searching
but increase cache misses and memory usage.
- Compute a series of prime numbers in which each prime is at least
twice as big as the previous one, and use those as the possible sizes
for the hashmap. Prime-number sizes give a better distribution of the
items within the hashmap, therefore reduce the amount of linear
searching necessary and thus improve the hashmap's performance.
- When searching for the next free spot in the hashmap, it currently
just adds 1 to the previous index (line 301). It is also possible to
rehash the previous index (just put it into the hashing function
again), which would give a new index more distant in the hashmap from
the current one. This would again improve the distribution of the
items in the hashmap and thus reduce linear search time, but it may
increase the amount of cache misses, as it no longer does linear memory
access.
In case you are interested in my implementation of your problem, see here:
http://dpaste.dzfl.pl/8d0d174e
You won't be able to compile the code unless you set up my custom D
environment, which I would not recommend unless you really hate the D
garbage collector ;-)
Kind Regards
Benjamin Thaut
More information about the Digitalmars-d
mailing list