Honey, I sped up the associative array

Dave Dave_member at pathlink.com
Fri Oct 13 09:54:11 PDT 2006


Lionello Lunesu wrote:
> Dave wrote:
>> Lionello Lunesu wrote:
>>> Some claim, so numbers first:
>>>
>>> As a benchmark, I've used my implementation of the program mentioned 
>>> in the thread "Lisp vs. C++ (not off-topic)", in a critical-priority 
>>> thread. Results are averaged over 8 runs. System: AMD X2 4600+, 
>>> Windows Vista RC1 x64, 1GB.
>>>
>>> Old AA: 563ms
>>> New AA: 445ms
>>>
>>
>> There were 4 Shootout Benchmark tests using hash tables - one is still 
>> active:
>>
>> http://shootout.alioth.debian.org/gp4/benchmark.php?test=knucleotide&lang=dlang&id=2 
>>
>>
>> Old AA: 1.85
>> New AA: 1.85
>>
>> For the three older tests:
>>
>> hash Old: 2.55
>>      New: 3.02
>>
>> hash2 Old: 3.36
>>       New: 2.88
>>
>> wordfreq Old: 0.178
>>          New: 0.170
>>
>> (These were run as they are/were on the Shootout)
>>
>> So it's kind of six of one, half a dozen of the other for those tests, but
>> your new version sure seems more elegant!
> 
> I also like powers-of-2 more than primes ;)
> 
>> Based on this, and looking at the code for hash.d, where the new version
>> doesn't perform as well, it looks like the new version has more
>> collisions. If the getHash() in std/typeinfo/ti_Aa.d could be changed
>> to produce a more uniform hash without being made a lot slower, then I'd
>> think your new version would be faster in the general case (at least
>> for char[] AA keys).
> 
> How have you tested the number of collisions? It would be fairly trivial 
> to add a counter to aaA.d to keep track of them.
> 
> The size of the AA's internal array is the biggest factor, I've noticed. 
> Depending on the benchmark, the new AA might end up with either a much 
> bigger or smaller array than the old AA, and therefore far fewer or more 
> collisions. To correctly test the two implementations, we should either 
> add a collision metric (testing the hashing quality) or only compare 
> arrays of similar size (which is not possible given the way the two 
> implementations resize).
> 
> A good thing about the new AA is that its sizing can be more easily 
> controlled, so the AA could, in theory, be easily tuned for the 
> task at hand. The old AA uses that static list of primes (half of which, 
> in fact, are never used; see the thread in D.learn).
> 
> I'll try the string hash method from Java, "h=7; foreach(b;a) 
> h=(h<<5)+b-h;", but I doubt there'll be any useful result. Still, 
> one less multiplication might make it faster ;)
> 
> L.

The Java algorithm doesn't improve any of my tests and is slower still for hash.d.
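
For reference, that loop is just h = h*31 + b with a nonzero seed; spelled out in D it would look 
something like this (a sketch, names are mine, not actual library code):

    uint javaStyleHash(char[] s)
    {
        uint h = 7;                // Lionello's seed; Java's own String.hashCode starts at 0
        foreach (char b; s)
            h = (h << 5) + b - h;  // same as h = h * 31 + b, a shift instead of a multiply
        return h;
    }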

Just for kicks, I tried changing the current getHash() multiplier (using your new AA) from 11 to 13 
and got slightly better results for everything.

knuc     Old: 2.44 (*)
          11: 2.03
          13: 1.93

hash      11: 3.02
          13: 2.94

hash2     11: 2.88
          13: 2.68

wordfreq  11: 0.170
          13: 0.160

The last two are about 5% better - probably worth the minor change to getHash().
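
For anyone wanting to try it, the change is a single constant in the char[] getHash() loop, which 
boils down to something like this (paraphrased; the real std/typeinfo/ti_Aa.d has more to it):

    uint getHash(void *p)
    {
        char[] s = *cast(char[]*)p;
        uint hash = 0;
        foreach (char c; s)
            hash = hash * 11 + c;  // changing 11 to 13 gave the numbers above
        return hash;
    }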

hash2.d is lookup-intensive, and wordfreq, knuc, and your Lisp port have sparse keys, so I'd say 
combining your new AA code with a slight modification to the getHash() multiplier would be the way 
to go.

The only one that's slower (hash.d) will probably improve as the GC improves, because I think part of 
the difference is also more frequent allocations (sorry, I don't have the time right now to 
instrument this stuff).
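
If someone else wants to instrument it: hashing quality can be measured without touching aaA.d at 
all, e.g. with a standalone counter like this (a sketch, identifiers are mine) that tallies keys 
landing in an already-occupied bucket:

    uint countCollisions(char[][] keys, size_t tableSize)
    {
        bool[] used = new bool[tableSize];
        uint collisions = 0;
        foreach (char[] k; keys)
        {
            uint h = 0;
            foreach (char c; k)
                h = h * 11 + c;    // same loop as the current getHash()
            size_t i = h % tableSize;
            if (used[i])
                collisions++;      // an earlier key already took this bucket
            else
                used[i] = true;
        }
        return collisions;
    }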

Nice work! I don't really like the idea of the static prime number array sizes either.
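
One nice side effect of power-of-two sizes, by the way: picking the bucket becomes a single AND 
instead of the integer division a prime-sized table needs. Roughly (a sketch, names are mine):

    size_t indexOldAA(uint hash, size_t primeLen)
    {
        return hash % primeLen;       // prime-sized table: integer division
    }

    size_t indexNewAA(uint hash, size_t pow2Len)  // pow2Len must be a power of two
    {
        return hash & (pow2Len - 1);  // single AND, no division
    }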

* The knuc result in my first message was using a hashtable library, not the built-in AAs. This last 
comparison was with a version using the built-in AAs and is now within a tenth of a second of the 
prior version - no need for the custom hashtable!


