Reserving/Preallocating associative array?

Benjamin Thaut code at benjamin-thaut.de
Fri Dec 27 01:37:09 PST 2013


Am 25.12.2013 17:12, schrieb Gordon:
> Good question, I did test each one incrementally:
>
> 1. Switching from "split+to!size_t" to "parse" (and commenting out the
> "union") reduced running time from ~26s to ~20s (on average).
>
> 2. Switching from "byLine" to "byLineFast" (and commenting out the
> "union") reduced running time from ~20s to ~14s (on average).
>
> 3. With "parse" and "byLineFast", and with GC still on, and populating
> the "union" the program took about 2m45s .
>
> 4. Adding "GC.disable" brought it down to 25s.
>
> HTH,
>   -gordon


So I was interrested how far you can take this. The version bearophile 
posted runs in 14 seconds (48420527 ticks) on my machine.

I created a new version using my own hashmap implementation and manual 
memory management (no GC present at all).
This version runs in 12 seconds (41614375 ticks)

Preallocating all the entries for the hashmap brings quite a boost. It 
brings it down to 9 seconds (32926550 ticks)

If you replace the default hash function D uses with the MurMur hash 
function it brings it down even more to 8 seconds (29425713 ticks)

I also tried not using any hash function at all, but that made the time 
skyrocket.

I stopped at that point, because now the most time is spend looking for 
a free entry in the hashmap, which pretty much comes down to cache-misses.

At this point the profiler looked something like this:
50%  find free entry in hashmap
21%  parse uint
9%   find end of line + tab
3.5% read data from disk


The ticks in my measurements have been obtained via QueryPerformanceCounter.

Kind Regards
Benjamin Thaut


More information about the Digitalmars-d mailing list