Associative arrays in D and default comparators

Fri Sep 8 14:44:27 PDT 2006

Walter Bright wrote:
> Sean Kelly wrote:
>> Walter Bright wrote:
>>> How do you do a hash for it if there's no constant data?
>>
>> Using address, currently :-p  But that obviously has to change.  What 
>> might make the most sense from a hash perspective would be something 
>> like this:
>>
>> class C {
>>     hash_t hash;
>>     hash_t toHash() {
>>         if (hash != hash.init)
>>             return hash;
>>         hash = cast(hash_t) this;
>>         return hash;
>>     }
>> }
> 
> That will fail if the object gets moved.

Not at all.  The hash value is stored the first time it's computed, and 
it will remain constant for the life of the object.  If the object is 
later moved and a new object created in its old position then the two 
objects would simply have the same hash value.  Collisions would still 
be resolved with opCmp or opEquals depending on AA implementation.  In 
the worst case however, I'll admit that this could result in long hash 
chains.

>> But that leaves opCmp to wrestle with.  An "object id" would be a 
>> reasonable universal solution, but I don't want to pay for the 
>> synchronized op on construction.  I suppose I'm simply having a hard 
>> time getting over the idea that an object's address is is not a unique 
>> identifier in GCed languages.
> 
> An opCmp has the same problem that toHash does; fix it for either and it 
> is fixed for both. I think you'll have to do a unique id for each in 
> order to store them in an AA, or else wait until the thread id is 
> assigned before storing it.

opCmp doesn't have quite the same problem as toHash because it's 
completely valid (if ill-advised) for toHash to always return zero while 
opCmp may only evaluate two objects as equivalent if they actually are 
equivalent.  Essentially, I see toHash as a weak identity operation, 
opEquals as a strong identity operation, and opCmp as an ordering operation.

I'll admit that the template idea is appealing, but not if it would 
require adding a ton of stuff to object.d.  I think this could probably 
be avoided by passing the comparator as a delegate parameter however, 
and would allow the compiler to switch between two AA implementations 
(one for objects with opCmp, and one for objects with only opEquals) to 
handle each class of objects in a reasonable manner.  This should also 
eliminate the need for a default opCmp and opEquals implementation:

// implicit or in object.d:

extern (C)
{
     void* aaGetByCmp(void* key, int function(void*,void*) cmp);
     void* aaGetByEquals(void* key, bool function(void*,void*) equals);
}

bool doCmp(T)(void* a, void* b)
{
     return a == b ? 0 :
            a ? (cast(T)a).opCmp(cast(T)b) :
                (cast(T)b).opCmp(cast(T)a);
}

bool doEquals(T)(void* a, void* b)
{
     return cast(T)a == cast(T)b;
}

// user code:

class C
{
     hash_t toHash();
     bool opEquals( C val );
}

C[C] myAA;
C key = new C;
myAA[key] = key;

// this call:

C val = myAA[key];

// translates to:

C val = cast(C) aaGetByEquals(key, &doEquals!(C));