fun project - improving calcHash

Martin Nowak code at dawg.eu
Sun Jun 23 18:18:31 PDT 2013


On 06/23/2013 11:22 PM, Walter Bright wrote:
> Profiling shows the calcHash function is a significant contributor to
> compilation time (3.25% of total time). So making it faster is a win.
> Even making dmd 1% faster would be a nice win - all those little drops
> add up.
>

You'll find a benchmark at the end of the post.
http://forum.dlang.org/thread/op.v2yky9pdsqugbd@dawg-freebsd.lan


Obviously the switch in a for loop is a branch prediction killer.
This should work much better.

for (size_t i = 0; i < len / 4; ++i)
{
     hash *= 37;
     hash += *(const uint32_t *)str;
     str += 4;
}
switch (len & 3)
{
case 0:
     break;
case 1:
     hash *= 37;
     hash += *(const uint8_t *)str;
     break;
case 2:
     hash *= 37;
     hash += *(const uint16_t *)str;
     break;
case 3:
     hash *= 37;
     hash += (*(const uint16_t *)str << 8) +
              ((const uint8_t *)str)[2];
     break;
default:
     assert(0);
}
return hash;

Finding a faster hash algorithm isn't simple because prime 
multiplication is so simple. At least we're not diving the hash result 
by 37 any longer.
I think using the Hsieh hash is a good choice because it has quite an OK 
distribution so it leads to fewer collisions. It's also pretty fast for 
short strings and we already use it in druntime.


More information about the Digitalmars-d mailing list