getHash inconsistency
H. S. Teoh
hsteoh at quickfur.ath.cx
Thu Mar 22 11:18:48 PDT 2012
On Wed, Mar 21, 2012 at 04:35:11PM +1100, Daniel Murphy wrote:
> "H. S. Teoh" <hsteoh at quickfur.ath.cx> wrote in message
> news:mailman.951.1332306541.4860.digitalmars-d at puremagic.com...
> >
> > Here's the current hashing code for char[] and string:
> >
> > foreach (char c; s)
> > hash = hash * 11 + c;
> >
> > For const(char)[], it's rt.util.hash.hashOf, which is Paul Hsieh's
> > SuperFastHash algorithm with very good hash distribution properties.
> > It does seem to involve a lot more operations than the simple loop
> > above, though; so I assume the simple loop was chosen because
> > hashing strings is a very common operation and, for the most part,
> > only needs a simple hash function.
> >
> > So I'm kinda leaning towards SuperFastHash, but worried about
> > whether it will cause performance degradation, in which case we
> > should stick with the simple loop.
[...]
> Benchmark time!
Alright, so after some benchmarking, I found that the above custom hash
function works best for *short* (3 to 10 character) randomized
alphabetic strings (I assumed alphabetic to be the typical use case of
strings). It's faster than SuperFastHash, and even has better
distribution properties, probably because SuperFastHash is tuned for
binary data of arbitrary length, whereas the custom function is tuned
for short string-like data.
With longer strings, SuperFastHash beats the custom algorithm, and
distribution properties are approximately the same.
So I'm still on the fence about which algorithm is better. I can see why
the custom hash was adopted for strings, since your typical AA tends to
have short alphabetic keys, and something like SuperFastHash is probably
overkill. But for longer keys, SuperFastHash is better.
T
--
Curiosity kills the cat. Moral: don't be the cat.