fun project - improving calcHash

Juan Manuel Cabo juanmanuel.cabo at gmail.com
Sun Jun 23 18:43:41 PDT 2013


On 06/23/2013 09:20 PM, Juan Manuel Cabo wrote:
> On 06/23/2013 06:22 PM, Walter Bright wrote:
>> https://github.com/D-Programming-Language/dmd/blob/master/src/root/stringtable.c#L21
>>
>> Profiling shows the calcHash function is a significant contributor to compilation time (3.25% of total time). So making
>> it faster is a win. Even making dmd 1% faster would be a nice win - all those little drops add up.
>>
>> There are many, many string hash functions findable through google. Anyone want to spend the effort to make a faster
>> one? Remember, ya gotta prove it's faster!
>>
>> A nice timing test would be the time spent compiling Phobos. I would suggest that the 64-bit build of dmd be used
>> for timing tests.
>>
>> Also, be careful: many of those hash functions on the intarnets have a license that makes them unusable for dmd.
> 
> 
> I performed a quick test, and I don't think that the original function
> can be improved for speed (though it might be improved to produce
> fewer collisions):
> 
>     https://gist.github.com/jmcabo/5847036
> 
> I used words and lines from the complete works of Shakespeare.
> I tested separating the loop from the switch, as was suggested
> in Timon Gehr's post. It didn't improve the speed when compiled
> with "g++ -O3", though it improved it a bit without -O3.
> 
> I also tested removing the multiplication by 37. It didn't
> improve the speed. With "g++ -O3" they are all the same.
> 
> So, unless I'm measuring it wrong, the algorithm is as fast
> as can be (as fast as just adding chars).
> 
> --jm
> 
> 
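For reference, the kind of loop being benchmarked above can be sketched as follows. This is only an illustration of the calcHash-style structure under discussion (4-byte loads, a multiply by 37, and a switch over the 0-3 leftover bytes); the name `calcHashSketch` is hypothetical and this is not dmd's exact code:

```cpp
#include <cstdint>
#include <cstring>
#include <cstddef>

// Sketch of a calcHash-style string hash: consume the input 4 bytes
// at a time, mix each 32-bit word in with a multiply by 37, then fold
// the remaining 0-3 tail bytes in a switch.
static uint32_t calcHashSketch(const char *str, size_t len)
{
    uint32_t hash = 0;
    while (len >= 4)
    {
        uint32_t word;
        std::memcpy(&word, str, 4);   // unaligned-safe 32-bit load
        hash = hash * 37 + word;
        str += 4;
        len -= 4;
    }
    switch (len)                      // fold the tail bytes
    {
        case 3:
            hash = hash * 37
                 + ((uint32_t)(uint8_t)str[0]
                    | ((uint32_t)(uint8_t)str[1] << 8)
                    | ((uint32_t)(uint8_t)str[2] << 16));
            break;
        case 2:
            hash = hash * 37
                 + ((uint32_t)(uint8_t)str[0]
                    | ((uint32_t)(uint8_t)str[1] << 8));
            break;
        case 1:
            hash = hash * 37 + (uint8_t)str[0];
            break;
        case 0:
            break;
    }
    return hash;
}
```

With -O3 the compiler is free to keep `hash` in a register and unroll the word loop, which is consistent with the observation that separating the loop from the switch makes no measurable difference.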

Oh, it might be improved by loading 128 bits at a time instead of 32 bits...
but that would only benefit strings of more than 16 bytes. Google's CityHash
seems tuned for large strings.
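That wider-load idea could be sketched roughly like this. It is a hypothetical illustration only: two unaligned 64-bit loads per iteration with a byte-at-a-time tail, reusing 37 as a placeholder multiplier; `wideHashSketch` is not proposed as a drop-in replacement for calcHash:

```cpp
#include <cstdint>
#include <cstring>
#include <cstddef>

// Sketch of hashing 16 bytes per iteration via two 64-bit loads.
// Only strings longer than 16 bytes see fewer loop iterations;
// shorter strings fall straight through to the tail loop.
static uint64_t wideHashSketch(const char *str, size_t len)
{
    uint64_t hash = 0;
    while (len >= 16)
    {
        uint64_t lo, hi;
        std::memcpy(&lo, str, 8);       // unaligned-safe 64-bit loads
        std::memcpy(&hi, str + 8, 8);
        hash = (hash * 37 + lo) * 37 + hi;
        str += 16;
        len -= 16;
    }
    while (len--)                       // byte-at-a-time tail
        hash = hash * 37 + (uint8_t)*str++;
    return hash;
}
```

Since identifiers in typical source code are mostly short, the tail loop would dominate in practice, which is why this mainly helps hashes tuned for large inputs like CityHash.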

--jm
