Performance of hashes and associative arrays

Joseph Rushton Wakeling joseph.wakeling at webdrake.net
Sun Nov 11 01:11:52 PST 2012


On 11/11/2012 02:35 AM, Ali Çehreli wrote:
> For classes, because they are reference types, it is the object identity that
> determines the hash value (probably the pointer of the actual object). As a
> result, even when the values of the members are the same, two objects have
> different hash values.
>
> For structs, because they are value types, it is the bytes that make up the
> object determine the hash value.

Ahh, clear.  So where classes are concerned there's a definite need to override 
opHash if you want the _values_ to be the basis of the association, rather than 
the object itself ... this would also have potential implications for memory 
blow-up where a class is used as the key type, no?

> But beware: The associative array members of structs are not hashed by values of
> their elements. (There was a long discussion recently on the the main newsgroup
> on this topic.)

I do recall that discussion, and I'll bear that in mind.

So far the only practical consideration I've had was for a Tuple!() to be the 
key, rather than an actual struct or class; as it happens I've avoided that so 
far, and I think I may continue to do so while there are still issues around 
hashing.

The more pressing reason for avoiding associative arrays as it happens was 
simply that I was getting unacceptably large memory usage as a result: I'd got 
some data stored in the form of byte[size_t] (really this should have been a set 
of size_t's, but this seemed theoretically an acceptable stop gap while there's 
no set class in Phobos), and for some reason that was generating memory 
consumption twice that of having instead a size_t[] regular array.  I guess 
perhaps because although a byte should mean just that, in practice each value in 
the byte[size_t] associative array was taking up a whole word?

It's a shame, because functionally it was useful to be able to do things of the 
form if(i in list), but memory-wise it just didn't work for the size of data I'm 
dealing with.


More information about the Digitalmars-d-learn mailing list