Absolutely horrible default string hashing
Andrei Alexandrescu
SeeWebsiteForEmail at erdani.org
Sun May 3 11:59:10 PDT 2009
dsimcha wrote:
> == Quote from Steve Teale (steve.teale at britseyeview.com)'s article
>> Kristian Kilpi Wrote:
>>> On Sun, 03 May 2009 15:33:12 +0300, J�r�me M. Berger <jeberger at free.fr>
>>> wrote:
>>>> |
>>>> | foreach(c; str)
>>>> | {
>>>> | ret = (ret << 4) + c;
>>>> | }
>>>> |
>>>> That one is very bad because it only takes into account the last
>>>> few characters of the string (how many depends on the size of the
>>>> hash). However, several hashing algorithms use 2^n+1 as their
>>>> multiplier, which is very fast even on old/small/embedded hardware
>>>> because it can be implemented as a shift and an addition.
>>> You are of course right; thanks for the correction. Lets scrap that
>>> algorithm! :)
>> Would something like
>> foreach(i, c; str)
>> {
>> ret ^= cast(uint) c;
>> if (i & 1)
>> ret <<= 1;
>> }
>> do a better job. That could cover the first 64 chars reasonably well, and I
> suspect that could cover a lot of use cases.
>
> I guess a lot of people missed my follow-up post on this.
Yes. Such a minor oversight in a face-to-face conversation takes five
seconds to fix. In a newsgroup, it becomes a long thread :o).
> This is largely a
> typeinfo bug, not a string hashing bug. typeid(string) returns a TypeInfo
> reference to generic array TypeInfo, whereas typeid(char[]) returns a TypeInfo
> reference for which the hashing works well. What is needed is for
> typeid(typemodifier(char)[]) to return the same thing as typeid(char[]).
>
> Then again, I guess that, given how bad the generic array hashing is, it wouldn't
> hurt to improve that, too. If someone implements a decent meta-hashing function
> (i.e. take a bunch of hashes of elements of some object, like an array, a tuple,
> or a struct, and produce a single good hash out of these), there would be no
> shortage of use cases for it.
I have exchanged email with Paul Hsieh about using his hash function in
Phobos (azillionmonkeys.com) and he said his license is very permissive.
I seem to recall I have introduced his hash function, with credit, in
Phobos' old runtime but I am not sure whether Sean took that over. I
suggest we do that in druntime for string, be they mutable or not.
Andrei
More information about the Digitalmars-d
mailing list