Python dictionary inspired hash table implementation

Moritz Warning moritzwarning at web.de
Wed Jul 2 15:31:53 PDT 2008


On Wed, 02 Jul 2008 17:11:43 -0400, bearophile wrote:

> Moritz Warning Wrote:
>> E.g., it is twice as slow for lookups of ulong keys compared to Tango
>> HashMap (3 times slower for insertion). It's also slower for arrays.
>> But that's likely to change with some more tweaking.
> 
> I think it may need more than just tweaking: the advantage of not using
> pointers may get lost when the data is too long. So you may switch to
> using pointers when the data requires more than 4 or 8 bytes.

True, but such a change is easy to make: either modify the wrapper for the 
targeted key type or add a new wrapper.
I haven't yet tested thoroughly which approach is most efficient for each 
data type.

> 
> How much work does it need to be ported to Phobos?
I guess it's a matter of ~10 minutes for someone used to Phobos.
The only thing that would need to be changed in the implementation itself 
is the traits stuff.


> 
> This may enjoy a better name:
> template Helper(T)
> In my libs I have named it DeconstArrayType.
The name was just a placeholder introduced a few hours ago, to be used 
until a better name comes along (which it now has :P).

> 
> If you have a Python installation, such as 2.5, you may try similar code
> to compare the speeds. The simplest code pair you may try is:
I have Python 2.5.2 (for Debian).

> def main():
>     d = {}
>     for i in xrange(500000):
>         d["key_" + str(i)] = 0
> import psyco; psyco.bind(main)
> main()
> 
> 
> import std.string: format;
> void main() {
>     int[string] d;
>     for (int i = 0; i < 500_000; ++i)
>         d["key_" ~ format(i)] = 0;
> }
I have my doubts that these methods are useful for comparison, since in 
both versions the string formatting function may dominate the work done by 
the dictionary itself.
Can you provide a Python example that pre-computes an array of strings in 
a separate function?
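
Roughly, what I have in mind is something along these lines (just a 
sketch; the helper names are made up):

import time

def make_keys(n):
    # build all key strings up front so that string formatting
    # stays out of the timed section
    return ["key_" + str(i) for i in xrange(n)]

def fill(keys):
    d = {}
    for k in keys:
        d[k] = 0
    return d

keys = make_keys(500000)
start = time.time()
fill(keys)
print "insert time: %.3fs" % (time.time() - start)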

> 
> 
> def main():
>     d = dict.fromkeys(xrange(10000000), 0)
> main()

cProfile tells me that Python spends 2.497s (best of 5 runs) on this 
built-in function.
That's about 6680027 insertions per second.
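
For reference, the profiling run looks roughly like this (a sketch, not 
the exact script I ran):

import cProfile

def main():
    # a single call into the C implementation: 10 million integer keys,
    # all mapped to the value 0
    d = dict.fromkeys(xrange(10000000), 0)

# cProfile's entry for the dict.fromkeys built-in gives the time quoted above
cProfile.run('main()')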
