Strings hash value memoized

JAnderson ask at me.com
Sat Jul 12 11:51:07 PDT 2008


bearophile wrote:
> I presume AAs are used often enough in D. If D 2.x strings are immutable, then their hash function doesn't change, so it can be useful to store the hash value inside the data structure of the string (Python works like this), by default it can be initialized to a value like -1, so it's computed only the first time it's actually needed, avoiding wasting time to compute the hash value if it's not used.
> 
> Having the hash value inside strings and other immutable data structures allows quite faster AAs (and sets too, even if they are inside an external library).
> Currently strings aren't special structures, they are just immutable dynamical arrays of chars, so the has value (32 or 64 bit according to the word size of the CPU, I presume) can't be stored in the dynamic array record, so it may be stored in the -4/-8 bytes of the data block, but this can't work for string slices. A possible solution is to create a special string type that's just like the dynamic array struct, plus the hash value, so this struct becomes bigger (12 instead of 8 bytes on 32 bit CPUs), slowing down some operations a little.
> 
> Bye,
> bearophile

Sounds like a good case for a standard lib function.  Potentially the 
hashing part could be a generic class for any type with a particular 
spealization for strings.  It could also be lazily computed (ie compute 
it the first time its needed then cache it).

Maybe something like:

alias hashed(string) hashed_string;
alias hashed(int[])  hashed_array;

I guess one downside is that compile time string hashes may have to be 
figured out at run-time with that approach.

-Joel



More information about the Digitalmars-d mailing list