Associative Arrays need cleanout method or property to help garbage collection

Michael Rynn michaelrynn at optusnet.com.au
Sun Mar 21 22:19:01 PDT 2010


On Sat, 20 Mar 2010 16:55:15 -0300, Robert Jacques wrote:

> On Wed, 17 Mar 2010 02:57:34 -0300, Michael Rynn
> <michaelrynn at optusnet.com.au> wrote:
> [snip]
>> I want to add that the problems of AA with garbage collection are not
>> particular to the implementation of built-in AA. The number of nodes
>> that builtin AA creates will increase GC scanning, but the AA slowdowns
>> also occurs using the pyDict implementation with string keys.
> 
> Actually, you can design the AA to 'play nice' with D's GC. See RandAA.


I have done a lot of work comparing and testing the various AA 
implementations with uint and string keys. The results are at 
http://www.dsource.org/projects/aa/wiki/HashTreeBenchmark. The adaptation 
sources and benchmark program are in the 
http://www.dsource.org/projects/aa source tree.


I included two variants of the RandomAA in the test results: one using 
the current pseudo-random sequence, and the other using a slot-sequence 
algorithm similar to the pydict implementation.

I called the second class the parallel-py AA (still in the RandAA file as 
the default template option). The parallel-rd variant, which used the 
pseudo-random lookup, was a clear performance loser, possibly because the 
keys are already well randomized. Parallel-py outperformed pydict on uint 
keys, but lagged on string keys. So the idea of using separate arrays to 
hide data from unnecessary GC scans may help.
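The parallel-array idea above can be sketched roughly as follows. This is an illustrative, hypothetical fragment (not the actual RandAA source): the point is that hash codes and pointer-free keys can live in arrays allocated with GC.BlkAttr.NO_SCAN, so the conservative GC skips them entirely.

```d
// Hypothetical sketch: keys, values and hash codes held in separate
// parallel arrays. Arrays containing no pointers are allocated with
// NO_SCAN, so the GC never scans their contents for false pointers.
import core.memory : GC;

struct ParallelTable(K, V)
{
    size_t[] hashes; // plain integers: safe to exclude from GC scanning
    K[] keys;
    V[] values;

    void allocate(size_t slots)
    {
        // Tell the GC this block holds no pointers and needs no scanning.
        hashes = (cast(size_t*) GC.malloc(slots * size_t.sizeof,
                      GC.BlkAttr.NO_SCAN))[0 .. slots];
        keys = new K[slots];
        values = new V[slots];
    }
}
```

With uint keys, all three arrays could be marked NO_SCAN, which is presumably why the separation helps most there.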

The best all-round performer was the Tango-based hashmap. The builtin AA 
has a few issues with really big datasets, as noted before, and was not 
so fast with insertions and integer-sized keys.

The big-dataset GC problem is hopefully, and probably, not an issue for 
smaller datasets (< 50,000), unless the data happens to contain a lot of 
unlucky pointer-like values, something very hard to control.

There is now a templated version of the builtin AA class (hash/
hashtree.d) with added optimizations: node heap management (from the 
Tango Container module), no hashing for integer-sized keys, and 
enforceable cleanup.
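By "enforceable cleanup" I mean something along these lines, sketched hypothetically (names are illustrative; see hash/hashtree.d for the real code): a clear() method that frees node memory immediately instead of waiting for the GC to discover it.

```d
// Hypothetical sketch of enforceable cleanup for a chained hash table:
// walk each bucket, free every node explicitly, then drop the bucket
// array so the GC has nothing left to trace through this table.
import core.memory : GC;

struct HashTree(K, V)
{
    struct Node { K key; V value; Node* next; }
    Node*[] buckets;

    void clear()
    {
        foreach (ref head; buckets)
        {
            while (head !is null)
            {
                Node* next = head.next;
                GC.free(head); // reclaim the node now, not at collection
                head = next;
            }
        }
        buckets = null; // no dangling references for the GC to scan
    }
}
```

For big datasets this matters in benchmarks, because timings stop depending on whether the GC happens to find and release the old table.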

On these benchmarks this optimized hashtree performed as well as the 
Tango hashmap. I also added some enforceable cleanup to the Tango-based 
hashmap, so that GC failure issues did not affect timings on big datasets.

--- taf

Michael Rynn








More information about the Digitalmars-d mailing list