Article: Increasing the D Compiler Speed by Over 75%

Dmitry Olshansky dmitry.olsh at gmail.com
Tue Jul 30 11:02:14 PDT 2013


26-Jul-2013 23:17, Walter Bright пишет:
> On 7/26/2013 5:11 AM, Dmitry Olshansky wrote:
>> 26-Jul-2013 14:47, Dmitry Olshansky пишет:
>>> 26-Jul-2013 01:25, Walter Bright пишет:
>>
>>>> The slowness was in the frackin' "convert the hash to an index in the
>>>> bucket", which is a modulus operation.
>>>
>>> Then it's past due to finally stop the madness of modulo prime table and
>>> use a power of 2 size. In essence what modulo prime does is simply
>>> enhancing the quality of your hash function w.r.t. collisions (it helps
>>> to distribute values more evenly).
>>>
>>> Any of new decent hashes are good enough to work with plain slice the
>>> lower bits approach.
>>>
>>
>> To be more concrete:
>> Spooky hash
>> http://burtleburtle.net/bob/hash/spooky.html (Public domain)
>> S-box hash
>> http://home.comcast.net/~bretm/hash/10.html (Published paper)
>>
>> Or even a good ol' FNV (Public domain)
>> http://isthe.com/chongo/tech/comp/fnv/#FNV-1a
>>
>
> How about a pull request so we can try it out?

Preliminary pull is here:
https://github.com/D-Programming-Language/dmd/pull/2436

So far it looses a bit. I'm still playing with it, looking at load 
factor, distribution of sizes, profiling.

So far observations are that SpookyHash is slower then the one that was 
there thus stealing a few percents of speed. That is then hardly 
regained by a faster lookup of a slot:
almost all of "large" tables were 31 in size and you have special case 
for that anyway.

What bothers me is that while I've been hacking at this I couldn't shake 
off the feeling that AA code assumes NO FULL HASH COLLISIONS at all?

Isn't that betting on luck (and a crude hash) a little too much (esp in 
32 bit mode)?

That is e.g. in code pasted from aav.c Key is only a hash and there is 
no way whatsoever to discern a full hash collision.

Value _aaGetRvalue(AA* aa, Key key)
{
     //printf("_aaGetRvalue(key = %p)\n", key);
     if (aa)
     {
         size_t i;
         size_t len = aa->b_length;
         if (len == 4)
             i = (size_t)key & 3;
         else if (len == 31)
             i = (size_t)key % 31;
         else
             i = (size_t)key % len;
         aaA* e = aa->b[i];
         while (e)
         {
             if (key == e->key)
                 return e->value;
             e = e->next;
         }
     }
     return NULL;    // not found
}

-- 
Dmitry Olshansky


More information about the Digitalmars-d-announce mailing list