Druntime AA interface could be enhanced a bit
Michael Rynn
michaelrynn at optusnet.com.au
Fri Apr 9 10:46:12 PDT 2010
In playing around trying to understand and make a better AA, I have
updated again the dsource/projects/aa/trunk/druntime/aaA.d, this time to
be a single linked list version.
In the template versions of various implementations, and in Java HashMap,
I noticed that its nice to be able to specify the initial capacity, and a
load ratio (nodes to table length), and see how performance varies.
It would be nice to have an _aaInit function, where these things are
specified.
And corresponding .init property.
And a .capacity (read and write) property.
And a .loadRatio (double) property.
How about a .statistics property?
valuetype[keytype] aa.
aa.init(capacity, loadratio).
...fill it up...
uint[] stats = aa.statistics;
The implementation template classes now have these things. Capacity and
load ratio are pretty standard for AA. They could be added to the current
AA as well, which I might get around to.
The _aaInit function should push the TypeInfo_AssociativeArray, once and
for all, and not have different TypeInfos and valuesizes, re-pushed to
different functions, var-arged before keys and values, as if they might
change at any time. _aaRehash may get a different TypeInfo when called
directly than when called from inside _aaGet, which I am sure will be
corrected soon.
There may be a vague general principle here, of having classes that can
be auto-initialized if they support certain properties. The AA is really
a class type hiding in a reference, and it would be nice to support that
directly.
Since the implementation is a reference type, the compiler can check
for nullness, and call an init thunk, before calling standard AA
functions and properties that require non-null reference. Its either
initialized or it isn't.
The sizeof AA is 4, on my machine, not 8, as is supposed to be the common
value in the D web site documentation on AA. Its the size of a pointer,
so the site would be more accurate to just say its an opaque reference
pointer sized type.
Again the aaA.d has been given a runtime TypeInfo initialisation tweak so
that it can shovel integer value keys straight into the hash value of the
nodes, and avoid TypeInfo compare and getHash. It needs a bit more
testing. Its easier to work and experiment with than the more
complicated tree version.
The current AA has nice balanced trees , but they do not make much of a
performance difference at lower load ratios ( < 4 ), so that extra
pointer seems to be a waste. The current AA will rehash before the load
level becomes too sluggish. I have uploaded some better refined tests to
compare the various implementations at dsource/aa.
The linear congruential sequence generator variant is an example of "open
addressing" hash table, and its performance is limited by the drawbacks
of this model. The other implementations do better, waste less memory,
and can handle table load ratios greater than 1.
Performance varies very much with the sort of keys used, much more so
than the implementation variants, and table loads. Wrapping a string in
class and caching the hash value gets much better performance than raw
string keys. The string hash distribution was a bit better with a hash
function using 31 as the character multiplier rather than 11.
-- May be I will get a job using Java soon..
More information about the Digitalmars-d
mailing list