About built-in AAs

Tue Aug 16 15:09:28 PDT 2011

I have some comments, that I see as problems, about the built-in AAs.

1) I have written code similar to this, do you think it's correct?

import std.stdio;

struct F {
    uint x;

    hash_t toHash() {
        puts("*");
        return x;
    }

    const bool opEquals(ref const(F) f) {
        puts("#");
        return x == f.x;
    }
}
void main() {
    auto data = [F(1), F(2), F(3), F(4), F(5), F(6)];
    int[F] aa;
    foreach (i, f; data)
        aa[f] = i;
}

After a good amount of debugging I have added those puts, and I've seen that those toHash/opEquals are never called, it's the second time I waste time on this problem.

In my opinion this is not acceptable in a language like D. If a user wants to add the hashing protocol to a class/struct then he/she will probably add a toHash. So if a toHash() is present, the compiler has to give a compile time error if all the other parts of the hash protocol are not present or if they are not correct (this currently means a correct opEquals and opCmp). This is quite important.

See also:
http://d.puremagic.com/issues/show_bug.cgi?id=3844
http://d.puremagic.com/issues/show_bug.cgi?id=4290

-------------------

2) Originally built-in AAs used a search tree to resolve collisions, but I think since some time they use a simple linked list (this has the advantage of simplifying the code, but it has the disadvantage that now hashing is able to become O(n^2), allowing certain attacks to D code that where impossible before).

So why are D AAs requiring still an opCmp to used-defined hashable structs/classes? Why don't we remove this need (and change the D docs)?

-------------------

3) This program finally uses the user-defined hash:

import std.stdio;

struct F {
    uint x;

    hash_t toHash() {
        puts("*");
        return x;
    }

    const bool opEquals(ref const(F) f) {
        puts("#");
        return x == f.x;
    }

    const int opCmp(ref const F f) {
        puts("$");
        return x - f.x;
    }
}
void main() {
    auto data = [F(1), F(2), F(3), F(3), F(5), F(1)];
    int[F] aa;
    foreach (i, f; data)
        aa[f] = i;
    writeln(aa);
}

It outputs:

$
$
F:1 F:3 F:5 F:4

The output shows that the AA is using opCmp() instead opEquals(). If the AA is hash based and resolves collisions with an unordered linked list then why isn't it calling opEquals() instead? Generally opEquals() is faster than opCmp().

-------------------

Now and then I compare the running time of C++0x code that uses <unordered_map> with the running time of the same operations done with D AAs and I don't see good timings for D. Are D AAs true templates or do they use some runtime typeid function? If they aren't true templates isn't it better to change this?

Bye,
bearophile