Performance of hashes and associative arrays

Ali Çehreli acehreli at yahoo.com
Sun Nov 11 10:40:37 PST 2012


On 11/11/2012 01:11 AM, Joseph Rushton Wakeling wrote:

 > So where classes are concerned there's a definite need to
 > override opHash if you want the _values_ to be the basis of the
 > association, rather than the object itself ... this would also have
 > potential implications for memory blow-up where a class is used as the
 > key type, no?

I am not sure that I understand.

The associative array would still contain just class variables, not 
class objects. Everytime the hash needed to be calculated, the AA would 
call toHash() on the variable, which in turn would use the values of the 
members of the object.

As another case, a single class object is owned by the runtime but many 
class variables of it can exist in multiple AAs.

 > The more pressing reason for avoiding associative arrays as it happens
 > was simply that I was getting unacceptably large memory usage as a
 > result: I'd got some data stored in the form of byte[size_t] (really
 > this should have been a set of size_t's, but this seemed theoretically
 > an acceptable stop gap while there's no set class in Phobos),

I think std.container.RedBlackTree can take that responsibility but if 
you don't need the elements to be in any particular order, then it may 
be seen as an overkill as well.

 > and for
 > some reason that was generating memory consumption twice that of having
 > instead a size_t[] regular array. I guess perhaps because although a
 > byte should mean just that, in practice each value in the byte[size_t]
 > associative array was taking up a whole word?

I don't know the actual implementation of AAs by dmd, but the hash 
buckets are probably singly-linked lists. Even if your data had no 
collisions, each byte would have to live in a singly-linked list.

 > It's a shame, because functionally it was useful to be able to do things
 > of the form if(i in list), but memory-wise it just didn't work for the
 > size of data I'm dealing with.

If all you need is 'if(i in list)', then a HashSet may help. I haven't 
used one myself but apparently there are such data structures in other 
languages like Java and C#.

Ali



More information about the Digitalmars-d-learn mailing list