Associative Arrays need cleanout method or property to help garbage collection
Michael Rynn
michaelrynn at optusnet.com.au
Sun Mar 21 22:19:01 PDT 2010
On Sat, 20 Mar 2010 16:55:15 -0300, Robert Jacques wrote:
> On Wed, 17 Mar 2010 02:57:34 -0300, Michael Rynn
> <michaelrynn at optusnet.com.au> wrote:
> [snip]
>> I want to add that the problems of AA with garbage collection are not
>> particular to the implementation of built-in AA. The number of nodes
>> that builtin AA creates will increase GC scanning, but the AA slowdowns
>> also occur using the pyDict implementation with string keys.
>
> Actually, you can design the AA to 'play nice' with D's GC. See RandAA.
I have done a lot of work, comparing and testing the various AA
implementations with uint and string keys. The results are on
http://www.dsource.org/projects/aa/wiki/HashTreeBenchmark. The adaptation
sources and benchmark program are in the http://www.dsource.org/projects/aa
source tree.
I included two variants of the RandomAA in the test results, one using
the current pseudo-random sequence and the other using a slot-sequence
algorithm similar to the pydict implementation.
I called the second class the parallel-py AA (still in the RandAA file as
the default template option). The parallel-rd variant, using pseudo-random
lookup, was a clear performance loser, possibly because the keys are
already well randomized. Parallel-py outperformed the pydict on uint keys,
but lagged on string keys. So the idea of using separate arrays to hide
data from unnecessary GC scans may be a help.
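The layout trick can be sketched roughly as follows (a hypothetical
illustration, not RandAA's actual code): if the keys and values live in
separate parallel arrays, a pointer-free key array can be allocated with
GC.BlkAttr.NO_SCAN, so the collector never has to scan it at all.

```d
import core.memory : GC;

/// Hypothetical sketch of the parallel-array layout: keys and values
/// live in separate blocks, so a pointer-free key array can be
/// allocated NO_SCAN and skipped by the collector entirely.
struct ParallelTable(V)
{
    uint[] keys;   // pointer-free; the GC need not scan this block
    V[]    values; // scanned only if V itself contains pointers

    void reserve(size_t n)
    {
        auto p = cast(uint*) GC.malloc(n * uint.sizeof,
                                       GC.BlkAttr.NO_SCAN);
        keys   = p[0 .. n];
        values = new V[n];
    }
}
```

With string keys the value side still holds pointers and must be scanned,
which may be one reason the benefit shows up more with uint keys.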
The best performer all round was the Tango-based hashmap. The builtin AA
has a few issues with really big datasets, as noted before, and was not
so fast with insertions and integer-sized keys.
The big-dataset GC problem is hopefully, and probably, not an issue for
smaller datasets (< 50,000), unless the data happens to have a lot of
unlucky significant pointers, something very hard to control.
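To make "unlucky significant pointers" concrete, here is a hypothetical
illustration (the names are mine, not from any of the AA sources): under
D's conservative scanning, integer data whose bit pattern happens to
equal a heap address can pin a block that is otherwise dead.

```d
import core.memory : GC;

/// Hypothetical illustration of a "false pointer": an integer whose
/// bit pattern equals a heap address will, under conservative
/// scanning, keep the block at that address alive.
size_t[] makeFalsePointer()
{
    auto big = new ubyte[1024 * 1024];
    // Store the block's address as a plain integer in a scanned array.
    size_t[] data = [cast(size_t) big.ptr];
    big = null;   // the only typed reference is gone...
    GC.collect();
    // ...but the collector may still treat data[0] as a live pointer
    // and never reclaim the megabyte block.
    return data;
}
```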
There is now a template code version of the builtin AA class
(hash/hashtree.d) with added optimizations: node heap management (from
the Tango Container), no hash computation for integer-sized keys, and
enforceable cleanup.
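As a rough sketch of two of those optimizations (hypothetical code, not
the hashtree.d source): an integer-sized key can serve as its own hash,
and a clear() method can free the nodes eagerly instead of leaving them
for the collector to discover.

```d
import core.memory : GC;

/// Hypothetical sketch: the key is its own hash for integer-sized
/// keys, and clear() returns nodes eagerly rather than leaving them
/// for the collector.
struct IntHashTable(V)
{
    static struct Node { uint key; V value; Node* next; }
    Node*[] buckets;

    this(size_t nbuckets) { buckets = new Node*[nbuckets]; }

    // No hash function needed: an integer-sized key hashes to itself.
    size_t slot(uint key) const { return key % buckets.length; }

    void insert(uint key, V value)
    {
        auto i = slot(key);
        buckets[i] = new Node(key, value, buckets[i]);
    }

    bool lookup(uint key, out V value)
    {
        for (auto n = buckets[slot(key)]; n !is null; n = n.next)
            if (n.key == key) { value = n.value; return true; }
        return false;
    }

    // Enforceable cleanup: return every node to the GC in one pass.
    void clear()
    {
        foreach (ref head; buckets)
            while (head !is null)
            {
                auto next = head.next;
                GC.free(head);
                head = next;
            }
    }
}
```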
On these benchmarks, this optimized hashtree performed as well as the
Tango hashmap. I also added enforceable cleanup to the Tango-based
hashmap, so that GC failure issues did not affect timings on big datasets.
--- taf
Michael Rynn
More information about the Digitalmars-d mailing list