Reserving/Preallocating associative array?
Benjamin Thaut
code at benjamin-thaut.de
Fri Dec 27 01:37:09 PST 2013
Am 25.12.2013 17:12, schrieb Gordon:
> Good question, I did test each one incrementally:
>
> 1. Switching from "split+to!size_t" to "parse" (and commenting out the
> "union") reduced running time from ~26s to ~20s (on average).
>
> 2. Switching from "byLine" to "byLineFast" (and commenting out the
> "union") reduced running time from ~20s to ~14s (on average).
>
> 3. With "parse" and "byLineFast", and with GC still on, and populating
> the "union" the program took about 2m45s .
>
> 4. Adding "GC.disable" brought it down to 25s.
>
> HTH,
> -gordon
So I was interrested how far you can take this. The version bearophile
posted runs in 14 seconds (48420527 ticks) on my machine.
I created a new version using my own hashmap implementation and manual
memory management (no GC present at all).
This version runs in 12 seconds (41614375 ticks)
Preallocating all the entries for the hashmap brings quite a boost. It
brings it down to 9 seconds (32926550 ticks)
If you replace the default hash function D uses with the MurMur hash
function it brings it down even more to 8 seconds (29425713 ticks)
I also tried not using any hash function at all, but that made the time
skyrocket.
I stopped at that point, because now the most time is spend looking for
a free entry in the hashmap, which pretty much comes down to cache-misses.
At this point the profiler looked something like this:
50% find free entry in hashmap
21% parse uint
9% find end of line + tab
3.5% read data from disk
The ticks in my measurements have been obtained via QueryPerformanceCounter.
Kind Regards
Benjamin Thaut
More information about the Digitalmars-d
mailing list