Static introspection of suitable hash function depending on type of hash key
Nordlöw
per.nordlow at gmail.com
Mon Oct 2 20:55:23 UTC 2017
Does anyone have any good reads on the subject of statically
choosing suitable hash-functions depending on the type (and in
turn size) of the key?
I wonder because I'm currently experimenting with a hash set
implementation at
https://github.com/nordlow/phobos-next/blob/40e2973b74d58470a13a5a6ee0ed9c9a42c6dea1/src/hashset.d
and benchmark for different hash-functions for it at
https://github.com/nordlow/phobos-next/blob/b8942dc569921b4dadfddbdcdac3a2bb0834a9e0/src/benchmarkAppend.d
I'm measuring significant differences in speed depending on the
choice of the hash-function:
Inserted 1000000 integers in 49 ms, 65 μs, and 9 hnsecs, Checked
1000000 integers in 48 ms, 562 μs, and 2 hnsecs for
HashSet!(uint, null, typeidHashOf)
Inserted 1000000 integers in 51 ms, 897 μs, and 5 hnsecs, Checked
1000000 integers in 47 ms, 108 μs, and 9 hnsecs for
HashSet!(uint, null, hashOf)
Inserted 1000000 integers in 60 ms, 641 μs, and 5 hnsecs, Checked
1000000 integers in 70 ms, 664 μs, and 2 hnsecs for
HashSet!(uint, null, MurmurHash3!(128u, 64u))
Inserted 1000000 integers in 34 ms, 450 μs, and 5 hnsecs, Checked
1000000 integers in 27 ms, 738 μs, and 8 hnsecs for
HashSet!(uint, null, FNV!(64LU, true))
Inserted 1000000 integers in 97 ms, 400 μs, and 6 hnsecs, Checked
1000000 integers in 104 ms, 33 μs, and 1 hnsec for HashSet!(uint,
null, XXHash64)
integers in 39 ms, 304 μs, and 3 hnsecs for bool[uint]
using LDC 1.4.0.
A factor 2 for insert() and factor 4 for contains().
The reason is partly because many high-performance hashes, such
as XXHash64, have a significant overhead (tens of clock-cycles)
because of its super-scalar nature but are fast (~1 clock-cycle
per byte) for large keys.
The test is dumb for now and is only constructed to benchmark the
hash-function.
According to a comment at
https://stackoverflow.com/questions/46533112/static-introspection-of-suitable-hash-function-depending-on-type-of-hash-key
"the C++ standard library already includes hashes for many basic
types? (I ask this because if you don't have a special
distribution I would assume the standard committee has already
made good choices. My answer would be to use those.)"
Does Phobos have anything similar today or planned?
More information about the Digitalmars-d-learn
mailing list