Python dictionary inspired hash table implementation

Thu Jul 3 04:01:51 PDT 2008

On Wed, 02 Jul 2008 21:55:45 -0400, bearophile wrote:

[..]
> from timeit import default_timer as clock
> 
> def main():
>     strings = ["key_" + str(i) for i in xrange(1000000)]
> 
>     t = clock()
>     d = dict.fromkeys(strings, 0)
>     print round(clock() - t, 2)
>     if len(d) < 200: print d
> 
> import psyco; psyco.bind(main)
> main()

Outputs 0.68 on average.

I ported your Phobos code to Tango, these are the results:

Dict!(char[], uint)
10 x 1000000 iterations
inserts:  1292566/s (0.77s)
lookups: 3492603/s (0.29s)

HashMap!(char[], uint, ..)
10 x 1000000 iterations
inserts:  1673673/s (0.60s)
lookups: 6483940/s (0.15s)

uint[char[]]
10 x 1000000 iterations
inserts:  2088894/s (0.48s)
lookups: 3886346/s (0.26s)

There might be some bottleneck in Dict.
Array hashing for Dict only accounts for 0.03s.

btw.:
There is a bug in the array hash method.
It doesn't iterate up to the last array element.
Here is the fixed version used for the test.

//hash function
ubyte[] a = cast(ubyte[]) data;
int len = a.length;
ubyte* p = cast(ubyte *) a.ptr;
hash = *p << 7;
while (--len >= 0)
{
	hash = (1000003 * hash) ^ *p++;
}
hash ^= a.length;