Structs as Keys for AAs

Wed Aug 9 18:11:29 PDT 2017

On Wed, Aug 09, 2017 at 11:31:44PM +0000, Q. Schroll via Digitalmars-d wrote:
> In [1] it says at 5. that
> 
> > For this reason, and for legacy reasons, an associative array key is
> > not allowed to define a specialized opCmp, but omit a specialized
> > opEquals.  This restriction may be removed in future versions of D.
> 
> I'm not completely sure what that means. Does "specialized" mean
> "user-defined"? I just challenged the spec and found an error by the
> way: [2]. Apart from that, it compiles.

The main issue is that the current AA implementation uses TypeInfo's
.equal method for comparing keys. (This dates from before D has
templates, and also because at the time, the AA implementation was too
tightly bound to the compiler and couldn't be easily changed.) The way
TypeInfo works, .equal will not do the right thing if you only define
opCmp without also defining opEquals.

This is why the spec was updated to say that you must essentially always
define opEquals if you define opCmp. Otherwise, you may get strange or
wrong behaviour when using the type as an AA key.

> For 5. I used
> 
> struct Key
> {
>     int id;
>     string tag;
> 
>     int opCmp(const Key other) const
>     {
>         return this.id < other.id ? -1 : this.id == other.id ?  0 : 1;
>     }
> 
>     bool opEquals(ref const Key other) const @safe pure nothrow
>     {
>         return this.id == other.id;
>     }
> 
>     size_t toHash() const @safe pure nothrow
>     {
>         return id;
>     }
> }
> 
> as a key type. To me the part "is not allowed to define a specialized
> opCmp" is clearly wrong, either a compiler bug or an error in the
> spec.

Did you test the result at runtime with a real AA?  A telling sign would
be if you added the same Key twice, then iterate over the AA to print
out the entries.  You may get the same key more than once, which would
be indicative of this problem.

Just because the compiler accepts the code, doesn't necessarily mean
it's right. (Of course, it's arguably a bug that the compiler accepts it
in the first place. But the AA implementation in the compiler is a bit
fragile to handle at the moment, I'm not sure if it can be easily fixed
without causing other problems.  We'll have to wait for Martin's library
AA to get merged...)

> Concerning opEquals and opCmp in general: Why isn't opEquals lowered to
> opCmp returning 0 if not present?
[...]

Mainly (1) for efficiency, and also, (2) to paraphrase Andrei, to allow
partial orders that may not be linear.

(1) because some types may have expensive computations to determine
whether something is bigger or smaller, but trivial to determine
equality.

(2) because conceivably you can implement the subset relation in opCmp
with a type that represents a set, so opCmp()==0 could mean the sets are
equal, OR it could mean the sets are not subsets of each other (they are
either disjoint, or have elements not in common with each other). Then
you'd need opEquals() to tell you whether or not they are equal.  For a
less exotic example, consider a custom floating-point type where NAN <
NAN and NAN > NAN are both false, so opCmp()==0 is the only reasonable
return value, yet opEquals() also == 0 because NAN != NAN.

T

-- 
INTEL = Only half of "intelligence".