Struct hash issues with string fields

H. S. Teoh hsteoh at quickfur.ath.cx
Sun Jun 3 18:36:44 PDT 2012


On Sat, May 26, 2012 at 09:53:07PM +0200, Andrej Mitrovic wrote:
> I don't understand this:
> 
> import std.stdio;
> 
> struct Symbol { string val; }
> 
> void main()
> {
>     int[string] hash1;
>     hash1["1".idup] = 1;
>     hash1["1".idup] = 2;
>     writeln(hash1);  // writes "["1":2]"
> 
>     int[Symbol] hash2;
>     Symbol sym1 = Symbol("1".idup);
>     Symbol sym2 = Symbol("1".idup);
>     hash2[sym1] = 1;
>     hash2[sym2] = 1;
>     writeln(hash2);  // writes "[Symbol("1"):1, Symbol("1"):1]"
> }
> 
> Why are sym1 and sym2 unique keys in hash2? Because the hash
> implementation checks the array pointer instead of its contents? But
> then why does hash1 not have the same issue?

Sorry for the very late reply (I'm on vacation and haven't had time to
reply to emails), but this bug is one of the infelicities of the current
AA implementation. The problem is that strings have a custom hash
function that's distinct from the generic hashing function used for
arrays.

Furthermore, the default struct hash function hashes the binary
representation of the struct, _not_ the contents of its fields.  For a
field like string, which is a pointer/length pair, the struct hash
function only hashes the string pointer and length, not the string
contents. So two structs holding equal strings stored at different
addresses get different hashes.
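
To make the pointer-vs-contents distinction concrete, here is a minimal
sketch. The rawBytes helper is hypothetical (it is not part of druntime);
it only exposes the bytes a bitwise struct hash would be fed, and does not
call the AA's actual hash function:

```d
import std.stdio;

struct Symbol { string val; }

// Hypothetical helper: views a Symbol as the raw bytes (pointer + length)
// that a purely bitwise hash of the struct would consume.
const(ubyte)[] rawBytes(ref const Symbol s)
{
    return (cast(const(ubyte)*) &s)[0 .. Symbol.sizeof];
}

void main()
{
    auto sym1 = Symbol("1".idup);
    auto sym2 = Symbol("1".idup);

    // The string contents compare equal...
    writeln(sym1.val == sym2.val);
    // ...but each idup makes its own copy, so the pointers differ...
    writeln(sym1.val.ptr is sym2.val.ptr);
    // ...and therefore the raw struct bytes differ as well.
    writeln(rawBytes(sym1) == rawBytes(sym2));
}
```

Any hash computed from those raw bytes sees two different inputs, even
though the two strings are equal.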

So there are multiple things wrong on multiple levels here. Taken
individually, I can see why things are this way: the string hash
function uses a faster hashing algorithm that takes advantage of the
assumption that strings contain Unicode data, not generic binary data.
Structs are hashed only on their binary representation since, in
general, structs are supposed to be small value types, so it's faster to
just hash the binary representation and be done with it than to hash
member-by-member.

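However, taken as a whole, this doesn't make any sense. As soon as a
struct contains reference types like strings, its hash no longer agrees
with the hash of the fields themselves: equal strings hash equally when
used directly as AA keys, but structs wrapping those same strings do not.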


> I can't override toHash() in a struct, so what am I supposed to do in
> order to make "sym1" and "sym2" be stored into the same hash key?

You should be able to simply define toHash() in the struct and it should
work (I think?). But there may be bugs in this area as well that cause
it not to work.
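
Something along these lines is what I have in mind. This is only a
sketch: the multiply-by-31 hash is an arbitrary illustrative choice, not
the algorithm druntime uses for strings, and the exact attributes the
compiler expects on toHash/opEquals may differ between compiler versions.
Note that opEquals is defined alongside toHash so that keys with equal
hashes are also compared by contents:

```d
import std.stdio;

struct Symbol
{
    string val;

    // Hash the characters themselves rather than the pointer/length pair.
    // The multiplier 31 is an arbitrary choice for illustration.
    size_t toHash() const @safe pure nothrow
    {
        size_t h = 0;
        foreach (char c; val)
            h = h * 31 + c;
        return h;
    }

    // Must agree with toHash: Symbols that compare equal hash equally.
    bool opEquals(const Symbol other) const @safe pure nothrow
    {
        return val == other.val;
    }
}

void main()
{
    int[Symbol] hash2;
    hash2[Symbol("1".idup)] = 1;
    hash2[Symbol("1".idup)] = 2;

    // If the custom toHash/opEquals are honored, both assignments
    // land on the same key and the AA holds a single entry.
    writeln(hash2.length);
}
```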


T

-- 
Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are, by
definition, not smart enough to debug it. -- Brian W. Kernighan

