Absolutely horrible default string hashing

"Jérôme M. Berger" jeberger at free.fr
Sun May 3 05:33:12 PDT 2009


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Kristian Kilpi wrote:
| On Sun, 03 May 2009 04:59:26 +0300, BCS <none at anon.com> wrote:
|>
|> IIRC something like this is common:
|>
|> hash_t ret = 0;
|> foreach(c;str) { ret *= SomePrime; ret += c; }
|>
|
| I think that's the basis for the best general string hashing funcs
| around. IIRC from the university, it doesn't matter, in practice,
if the
| multiplier is a prime number or not. So, the multiplication can be
| replaced with bit shifting (that's as fast as the addition operation).
| E.g.
|
| foreach(c; str)
| {
|     ret = (ret << 4) + c;
| }
|
	That one is very bad because it only takes into account the last
few characters of the string (how many depends on the size of the
hash). However, several hashing algorithms use 2^n+1 as their
multiplier, which is very fast even on old/small/embedded hardware
because it can be implemented as a shift and an addition.

| I quickly googled around and found the algorithm djb2 which uses the
| multiplier 33:
| http://www.cse.yorku.ca/~oz/hash.html
|
| djb2 is obviously a little bit slower but it *may* give slightly
better
| distribution (I'm not gonna run tests now to see if there is a real
| world difference... ;) ).

		Jerome
- --
mailto:jeberger at free.fr
http://jeberger.free.fr
Jabber: jeberger at jabber.fr
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEUEARECAAYFAkn9jwgACgkQd0kWM4JG3k/dLgCeI9UpUhxiAuw4QB0LHFCAXk00
yLgAljamIvoLLsyDhZ0OSggl9lv/M9A=
=CdQB
-----END PGP SIGNATURE-----



More information about the Digitalmars-d mailing list