Perfect hashing for string switch

bearophile bearophileHUGS at lycos.com
Fri Jan 22 13:14:59 PST 2010


This is just a possible optimization, it's not a new feature request.

A perfect hash computed at compile time can be good to implement the string switches, I have created a small benchmark:
http://codepad.org/YVFfGpsM

Timings, ldc, seconds:
  test1: 4.48 // normal switch
  test2: 2.98 // perfect hash
  test3: 2.09
  test4: 2.07
  test5: 5.44 // AA

Notice the very nice thing that I have computed the hash at compile time in test2 switch cases, you can't do that in a simple way in C :-)

You can't compute the coefficient tables of the perfect hash at compile-time in D because probably this is currently too much slow to do, unless D compilers become staged compilers and compile the code they have to run at run time too :o) (Well, an intermediate and less intrusive solution is to use LLVM to Just-In-time compile the code that LDC runs at compile time, to speed that up.)

A possible problem is that the compilation time grows if the number of strings is high.

The version based on the perfect hash (test2) becomes more efficient if the number of wrong strings gets lower.

In Tango the AA opIn_r is really slow, I think it can be a bug. I have not found a Tango dev interested in this so far.

Note that the perfect hash I have used in this little example is minimal, so there's a better way for the compiler to implement that conversion table (as I have shown in test4 that assumes a minimal perfect hash. It's about as fast as the test3 version that can tolerate a table a bit sparse), but the code I have written works equally well if the hash isn't minimal (this is positive to decrease the compilation time): in that case the table case gets a little more sparse, but the running time is the same, because it's just a little sparse, so llvm compiles it still with table (with holes).

GCC is able to compile this code better, I have asked LLVM devs to implement the same thing (this is not about perfect hashing, it can be a way for the compiler to use it better in certain situations, where you have a conversion table or string->int compile-time constant associative array):
http://llvm.org/bugs/show_bug.cgi?id=6009

Bye,
bearophile



More information about the Digitalmars-d mailing list