Druntime AA interface could be enhanced a bit

Michael Rynn michaelrynn at optusnet.com.au
Fri Apr 9 10:46:12 PDT 2010


In playing around trying to understand and make a better AA, I have 
updated again the dsource/projects/aa/trunk/druntime/aaA.d, this time to 
be a single linked list version.  

In the template versions of various implementations, and in Java HashMap, 
I noticed that its nice to be able to specify the initial capacity, and a 
load ratio (nodes to table length), and see how performance varies. 

It would be nice to have an _aaInit function, where these things are 
specified.  

And corresponding .init property.

And a .capacity (read and write) property.
And a .loadRatio (double) property.
How about a .statistics property?

valuetype[keytype] aa.

aa.init(capacity, loadratio).
...fill it up...
uint[]  stats = aa.statistics;

The implementation template classes now have these things. Capacity and 
load ratio are pretty standard for AA. They could be added to the current 
AA as well, which I might get around to. 

The _aaInit function should push the TypeInfo_AssociativeArray, once and 
for all, and not have different TypeInfos and valuesizes, re-pushed to 
different functions, var-arged before keys and values, as if they might 
change at any time. _aaRehash may get a different TypeInfo when called 
directly than when called from inside _aaGet, which I am sure will be 
corrected soon.

There may be a vague general principle here, of having classes that can 
be auto-initialized if they support certain properties.  The AA is really 
a class type hiding in a reference, and it would be nice to support that 
directly.

  Since the implementation is a reference type, the compiler can check 
for nullness, and call an init thunk, before calling standard AA 
functions and properties that require non-null reference.  Its either 
initialized or it isn't. 

The sizeof AA is 4, on my machine, not 8, as is supposed to be the common 
value in the D web site documentation on AA. Its the size of a pointer, 
so the site would be more accurate to just say its an opaque reference 
pointer sized type.

Again the aaA.d has been given a runtime TypeInfo initialisation tweak so 
that it can shovel integer value keys straight into the hash value of the 
nodes, and avoid TypeInfo compare and getHash.  It needs a bit more 
testing.  Its easier to work and experiment with than the more 
complicated tree version.

The current AA has nice balanced trees , but they do not make much of a 
performance difference at lower load ratios ( < 4 ), so that extra 
pointer seems to be a waste. The current AA will rehash before the load 
level becomes too sluggish.  I have uploaded some better refined tests to 
compare the various implementations at dsource/aa.

The linear congruential sequence generator variant is an example of "open 
addressing" hash table, and its performance is limited by the drawbacks 
of this model. The other implementations do better, waste less memory, 
and can handle table load ratios greater than  1.

Performance varies very much with the sort of keys used, much more so 
than the implementation variants, and table loads. Wrapping a string in 
class and caching the hash value gets much better performance than raw 
string keys. The string hash distribution was a bit better with a hash 
function using 31 as the character multiplier rather than 11.

-- May be I will get a job using Java soon..






More information about the Digitalmars-d mailing list