Associative Arrays need cleanout method or property to help

Walter Bright newshound1 at digitalmars.com
Tue Mar 23 13:41:56 PDT 2010


Michael Rynn wrote:
> On Mon, 22 Mar 2010 00:52:24 -0700, Walter Bright wrote:
> 
>> bearophile wrote:
>>> A way to know the usage patterns of D AAs is to add instrumentation in
>>> nonrelease mode, they can save few statistical data once in a while in
>>> the install directory of D, and then the user can give this little txt
>>> file to Walter to tune the language :-)
>> There's no need to tune the language. The implementation and data
>> structure of the AAs is completely opaque to the language. The
>> implementation is in aaA.d. Feel free to try different implementations
>> in it!
> 
> Well, I have.
> 
> See the code here : http://www.dsource.org/projects/aa/browser/trunk/
> druntime/aaA.d
> 
> See the benchmarks and comments here : http://www.dsource.org/projects/aa/
> wiki/DrunTimeAA.  
> 
> The result squeezes some extra performance for integer or less sized keys 
> (about 20% faster for lookups).
> 
> Insertions and cleanups are faster too.

It's a nice implementation. There are some problems with it, though:

1. Deleted entries cannot be returned to the garbage collector pool unless the 
entire hash table is removed.

2. All the keys are stored in a single array, as are all the values. This 
requires that a contiguous chunk of memory can be found for the arrays when 
rehashing to a larger hashmap. This could make it very difficult for this to 
work when hash tables grow to be of significant size relative to all available 
memory. Note that the current implementation has a bucket array that has as many 
entries as the number of elements in the hashtable, but the element size is only 
a pointer, not the arbitrary size of a key or value.

3. Rehashing (required as the hashmap grows) means that the key/value arrays 
must be reallocated and copied. That's fine for the keys, but for the values 
this is a huge problem because the user may have dangling pointers into the 
values. (This is why Andrei started the thread entitled "An important potential 
change to the language: transitory ref".) I think this is currently the killer 
problem for randAA's approach.



More information about the Digitalmars-d mailing list