getHash inconsistency

H. S. Teoh hsteoh at quickfur.ath.cx
Tue Mar 20 22:10:27 PDT 2012


On Wed, Mar 21, 2012 at 03:55:49PM +1100, Daniel Murphy wrote:
> "H. S. Teoh" <hsteoh at quickfur.ath.cx> wrote in message 
> news:mailman.917.1332250604.4860.digitalmars-d at puremagic.com...
[...]
> > Then the question is, what should be the fix?
> >
> > Currently, char[] and string have getHash defined in
> > rt.typeinfo.ti_Ad.TypeInfo_Aa, which appears to be a custom hashing
> > algorithm, whereas const(char)[] uses TypeInfo_Array.getHash, which
> > uses rt.util.hash.hashOf. Which one should be used?
[...]
> 
> I assume the custom hashing routine is better/faster?  If so, that
> one.  I'd assume that getHash not working the same for const strings
> is an oversight. 
[...]

Here's the current hashing code for char[] and string:

        foreach (char c; s)
            hash = hash * 11 + c;

For const(char)[], it's rt.util.hash.hashOf, which is Paul Hsieh's
SuperFastHash algorithm with very good hash distribution properties. It
does seem to involve a lot more operations that the simple loop above,
though; so I assume the above simple loop was chosen because hashing
strings are a very common operation and, for the most part, only need a
simple hash function.

So I'm kinda leaning towards SuperFastHash, but worried about whether it
will cause performance degradation, in which case we should stick with
the simple loop.


T

-- 
"Maybe" is a strange word.  When mom or dad says it it means "yes", but when my big brothers say it it means "no"! -- PJ jr.


More information about the Digitalmars-d mailing list