Associative Arrays need cleanout method or property to help garbage collection

Michael Rynn michaelrynn at optusnet.com.au
Tue Mar 16 20:07:57 PDT 2010


On Tue, 16 Mar 2010 16:12:25 +0000, Moritz Warning wrote:

> On Tue, 16 Mar 2010 04:41:55 +0000, Michael Rynn wrote:
> 
>> The use of built in Associative Array can have deleterious effects on
>> Garbage Collection.
>  :(
> 
> 
> Fwiw, here is the original Python Dictionary implementation:
> http://svn.python.org/projects/python/trunk/Objects/dictobject.c This is
> also a nice read:
> http://svn.python.org/projects/python/trunk/Objects/dictnotes.txt
> 
> As for difference to the D impl.:
> The decision logic for table resizing is a bit different (Python uses a
> load factor of 2/3, and table doubling for more than 50K entries, *4 otherwise).
> Python also wraps everything in Objects and attaches the hash when
> computed.
> 

I am beginning to get a headache playing around with this. There must be 
better things to do.  I have refined the test program a bit more and 
added some missing bits to aaht by calling the implementation class.

The following summary statistics were obtained.
The no-cleanup averages roughly match the middle run's value, because of 
the progressive slowdown, except for aapy.  Even so, doing a cleanup 
improved the aapy results, whose insertion times are very impressive. I 
attribute this to not needing to allocate memory for each insertion. 
Lookups for aapy are on average about 7-8% slower.

The bits I find interesting are:-

When doing cleanups, the aaht library version of the builtin AA is about 
33% faster on insertions than the builtin AA, although a few percent 
slower on lookups. Using a Tangy heap Container for node allocation 
increases this to 60% faster than the builtin, brings the lookups to 
par, and also allows a super fast cleanup.
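To show the idea (a minimal sketch with hypothetical names, not the 
actual Tangy Container code; it assumes Node declares a 'next' pointer):

struct NodeHeap(Node)
{
    Node*    freeList;  // unused nodes, threaded through their next field
    Node[][] chunks;    // every chunk allocated, kept for bulk release

    Node* allocate()
    {
        if (freeList is null)
            grow(256);             // one GC allocation per 256 nodes
        auto n = freeList;
        freeList = n.next;
        return n;
    }

    void release(Node* n)          // recycle a removed node
    {
        n.next = freeList;
        freeList = n;
    }

    void wipe()                    // the super fast cleanup
    {
        freeList = null;
        chunks = null;             // drop every chunk in one step
    }

    private void grow(size_t count)
    {
        auto block = new Node[count];   // single allocation for many nodes
        chunks ~= block;
        foreach (ref node; block)
        {
            node.next = freeList;       // push onto the free list
            freeList = &node;
        }
    }
}

Insertion takes a node off the freelist instead of allocating each time, 
which is where most of the insertion speedup comes from, and wipe() is 
why the cleanup is nearly free.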

Since the builtin AA has no cleanup method to call, I used the following 
template, which does the task inefficiently.

The aaht version with a custom clear did the task 1.5 to 2.5 times faster 
(no TangyHeap), or really super fast (TangyHeap).



/// Remove all entries, one key at a time.
/// Inefficient: aa.keys allocates an array of every key first.
void clear(K,V)(ref V[K] aa)
{
    K[] keyset = aa.keys;   // snapshot, so removal is safe while iterating
    foreach(ks; keyset)
    {
        aa.remove(ks);
    }
}

A call to aa.clear then cleans up in reasonable time.
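For example, calling the free function directly:

uint[string] counts;
counts["alpha"] = 1;
counts["beta"]  = 2;

clear(counts);               // or counts.clear(), where member syntax applies
assert(counts.length == 0);  // all entries removed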

summary: averages over 10 runs of size 500_000

no-cleanup uint[uint]
		insert  lookup
builtin(K,V)	4.395	0.0388
aaht(K,V)	4.5906	0.0426
aapy(K,V)	0.139	0.0423

cleaned-up uint[uint]
		insert  lookup cleanup
builtin(K,V)	0.4601	0.0376	  0.1252
aaht(K,V)	0.3021	0.0399	  0.081
aapy(K,V)	0.0957	0.0438	  0.0002


no-cleanup uint[string]
		insert  lookup
builtin(K,V)	3.4727	0.1596
aaht(K,V)	3.6453	0.1646
aapy(K,V)	0.9753	0.1735

cleaned-up uint[string]
		insert  lookup cleanup
builtin(K,V)	0.668	0.1658	 0.2609
aaht(K,V)	0.4396	0.1621	 0.1001
aapy(K,V)	0.2286	0.1717	 0.0002


After recompiling with aaht version=TangyHeap:
uint[uint]
		insert  lookup cleanup
aaht(K,V)	0.188	0.0387	0.0004

uint[string]
aaht(K,V)	0.3391	0.1609	0.0001



I have no empirical or logical justification for when to resize tables; 
I just copied the previous code and hope I got it right.

There must be previous research on how often a lookup or insert must 
probe another slot because of a hash collision, causing slowdown. But 
whether the load ratio is 3/5 (60%), 2/3 (66%), or 3/4 (75%) has not yet 
made an observable difference for me in pydict.
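For reference, a sketch of the kind of growth policy Moritz describes 
for Python above (hypothetical names; thresholds as in the dictnotes 
link):

bool needsResize(size_t usedEntries, size_t tableSize)
{
    // grow when the load factor exceeds 2/3, without floating point
    return 3 * usedEntries > 2 * tableSize;
}

size_t newTableSize(size_t usedEntries)
{
    // *4 below 50K entries, *2 above, rounded up to a power of two
    auto factor = (usedEntries > 50_000) ? 2 : 4;
    size_t size = 8;    // assumed minimum table size
    while (size < usedEntries * factor)
        size *= 2;
    return size;
}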

> As for your modifications to the PyDict code: Why have you changed the
> length() implementation? As far as I remember, it's wrong to just return
> "used".

Formerly, the 'used' field was the number of active nodes in the table 
itself. The relationship is seen mathematically in the size() property:
---
length = used + is_dummy + is_unused;
---

where is_dummy and is_unused are each 1 or 0.

I have rewritten this in the equivalent form:
---
used_entries = length_ - is_dummy - is_unused;
---

I have now changed 'used' to become length_, and added a comment to 
make that plain.

This length_ variable now includes the usage count of the 2 extra nodes 
for unused_hash(0) and dummy_hash(1).  The length_ count gets bumped up 
or down whenever these nodes are set or unset, as if they were normal 
nodes.

'used_entries' is only calculated just before readjusting the table 
size. Computationally this makes trivially little difference; I 
preferred it because length_ will be accessed more often than 
'used_entries', so length can be a property access with no calculation.
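In outline, the bookkeeping looks like this (a sketch with simplified 
names, not the exact pydict source):

struct Impl
{
    size_t length_;     // counts normal nodes plus the two special nodes
    bool   is_unused;   // whether the unused_hash(0) node is occupied
    bool   is_dummy;    // whether the dummy_hash(1) node is occupied

    // cheap and frequently accessed: no calculation at all
    @property size_t length() const { return length_; }

    // only needed just before a table resize
    size_t used_entries() const
    {
        return length_ - (is_dummy ? 1 : 0) - (is_unused ? 1 : 0);
    }
}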


I want to add that the problems of AAs with garbage collection are not 
particular to the implementation of the built-in AA. The number of nodes 
that the builtin AA creates will increase GC scanning, but the AA 
slowdowns also occur using the pyDict implementation with string keys.

The original PyDictD2 slowed down for both uint and string keys, unless 
I set NO_SCAN on the calloc'd table.
Then I changed the implementation to use new Entry[] instead of a calloc. 
This helped the uint[uint] test, but not the uint[string] test.
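The NO_SCAN variant amounts to something like this (a sketch; the Entry 
layout is assumed pointer-free, as in uint[uint]):

import core.memory;

struct Entry { uint key; uint value; }   // no GC references inside

Entry* allocTable(size_t n)
{
    // zeroed memory that the collector will never scan for pointers;
    // safe only because Entry contains no GC references
    return cast(Entry*) GC.calloc(Entry.sizeof * n, GC.BlkAttr.NO_SCAN);
}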

For uint[string], the pyDict still slows progressively after each run, 
though not nearly as much as for the builtin AA. The only way I can 
prevent this is to forcefully clean up the AA after each use, for 
whatever implementation.

The third command-line option of testhash.d selects the test and shows 
this:

option 5  - aapy uint[string], no cleanup -- slows down
option 15 - aapy uint[string], cleanup    -- no slowdown

Ditto for the builtin AA with options 1 and 11 (except that it always 
slows down, because no cleanup is possible), and for aaht with options 
3 and 13.

For immutable(char)[] keys and uint values, the AA memory needs to be 
scanned, because each AA node holds a reference to its key. If a key 
were not reachable through some other reference, its array would be 
deleted by the GC, and the AA key would then reference freed memory.
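A tiny example of the hazard (illustrative only):

import std.conv : to;

uint[string] aa;

void addRun(int i)
{
    string key = "run-" ~ to!string(i);  // heap-allocated string
    aa[key] = 0;
    // After this returns, the AA node holds the only reference to key.
    // If the node memory were not scanned, the GC could reclaim the
    // string, leaving the AA with a dangling key.
}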

The testhash program holds references to the original string keys in 
a dynamic array. I also tested a version that makes sure all the AA 
values are 0, to help the garbage collector, but it still slows down if 
no cleanup is requested.  Another version flag calls GC.collect before 
each run. It's as if bits in the string keys are being misinterpreted 
as pointers.

With the current D garbage collector, using any AA implementation, 
builtin or otherwise, requires the ability to enforce cleanup. I do not 
expect any harm from using enforced AA memory management.  

From my reading of the Hans Boehm garbage collector, it is possible to 
provide it hints as to what to interpret as pointers in a structure or 
array. It would be nice to be able to do this from TypeInfo in D. I do 
not know to what extent this is done in D.  I suppose there is always 
both a development and a performance penalty for this sort of thing. 
There also appears to be a penalty for not doing it.
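As a sketch of what TypeInfo-driven allocation could look like (an 
assumption on my part; as I read the druntime docs, typeid(T).flags & 1 
is set when the type may contain pointers into GC memory):

import core.memory;

void* allocFor(T)(size_t count)
{
    // scan the block only if T can actually contain pointers
    uint attr = (typeid(T).flags & 1) ? 0 : GC.BlkAttr.NO_SCAN;
    return GC.calloc(T.sizeof * count, attr);
}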

Until the GC can be fixed, replaced with something more discriminating, 
or enhanced with TypeInfo hints, for many applications D and its 
libraries will interweave manual memory management with garbage 
collection, as in this little aa project.  Applications can delete and 
clear things with some degree of safety from encapsulation and stack 
scoping, and still let the garbage collector worry about what it can 
cope with.

Mono dev is trying to wean off Hans Boehm and use something more exact 
and generational. Given the failings of garbage collection for most AA 
implementations, all current D AA implementations need a manual cleanup 
method. Another alternative is being able to specify a non-scanned 
memory implementation where the programmer deems it safe.

Having a cleanup method for the AA makes memory management more flexible 
and accessible, and when used it will nearly always improve overall 
performance.  Improve the AA or fix the GC: which is easier?

The aaht version with the heap node allocation and fast cleanup seems to 
be a better AA than the built-in.  The aapy has by far the best insertion 
results.

But do not take this very restricted and rather extreme sampling of 
possible test parameters as a reliable guide to relative performance in 
other applications and environments.
---
Michael Rynn.






