Associative arrays in D and default comparators

Oskar Linde olREM at OVEnada.kth.se
Fri Sep 8 02:15:55 PDT 2006


Steve Horne wrote:

> On Thu, 07 Sep 2006 14:54:41 -0700, Sean Kelly <sean at f4.ca> wrote:
> 
>>Since the AA is backed
>>by a binary tree I would expect it determine 'equality' as equivalence
>>using opCmp.
> 
> There may be special case operations on the AA where opCmp is
> unnecessary and opEquals is a little faster.
> 
> But... I had got the impression that the built-in associative arrays
> were done with hashtables. I think the manual says something about
> foreach not enumerating the items in order.
> 
> OTOH, I didn't see any kind of 'opHash' or similar.
> 
> So - binary trees, then? Red-black, perhaps?
> 
> Or are you talking about hash collision handling?

The binary tree is only used for hash collision handling. It is not balanced
in any way.

The advantage of having binary trees at the nodes is that with a poor hash
function, the AA degrades gracefully into O(log n). This is probably good
for a general purpose AA. The fact that the trees aren't balanced would
mean that in theory, you could get a O(n) worst case lookup.

The disadvantages of binary tree nodes are:
- Less than optimal performance with a good hash function
- higher memory usage (but phobos counters this somewhat by using fewer
buckets, with up to an average of 4 collisions per bucket)
- the necessity of supplying an opCmp

You can get significantly better performance using a custom implementation
(example: http://www.digitalmars.com/d/archives/digitalmars/D/30773.html).
Unfortunately, without a reference return type, there is no way to give a
custom AA the same semantics as the built in one.

> 
>>Can someone
>>suggest an alternate default implementation for opCmp that solves these
>>problems?
> 
> Don't know if it helps, but sometimes a consistent-but-arbitrary
> ordering can give good keys. For instance, the ordering of hash values
> is meaningless, but you can still use it to divide-and-conquer your
> data.

You would still need to handle the case of identical object hashes.

/Oskar



More information about the Digitalmars-d mailing list