Surprised by hashes of arrays of arrays

Thu Feb 27 02:11:53 PST 2014

Dear list,

I stumbled over odd behavior which took quite some time of debugging.
Sharing my results may help find a solution or just make others aware
and reduce their debugging time.

To illustrate consider the code:

auto array1 = [ [1, 2], [3, 4] ];
auto array2 = array1.dup;
array2[0] = array2[0].dup;

Does either hash(&array1) == hash(&array2) hold or hash(&array1) !=
hash(&array2) where hash is defined as
auto hash = &(typeid(array1).getHash);
?

So I may be totally off here (please tell me), but it turns out that
that both arrays have different hashes due to the way hashing is
implemented. This, e.g., implies that equal arrays may be hashed to
different values. Hence when you are using them as keys in an
associative array results may be surprising.
Note that I can make the problem more difficult to spot by using a
struct that uses pointers or an array internally which is hidden by
encapsulation.

I find the behavior non-obvious but maybe there is reason. The current
implementation considers only the direct contents of the struct's memory
(the memory starting at array.ptr to array.length * size(array[0]) in
case of dynamic arrays) and does *not* follow indirections (e.g. via
pointers).

This implies that a hash can be computed without considering all memory
occupied by a value. In my current mental model of hashing I assumed
that all bytes should be read by default.

As you may conclude from this post I find this behavior odd. I expect a
hash implementation to follow indirections by calling the hashing
functions recursively. It looks to me as a case where the default is
badly designed/implemented.

Jens