Reserving/Preallocating associative array?

Daniel Kozak kozzi11 at gmail.com
Fri Dec 27 10:25:04 PST 2013


On Tuesday, 24 December 2013 at 22:28:21 UTC, Gordon wrote:
> Hello,
>
> I want to load a large text file containing two numeric fields 
> into an associative array.
> The file looks like:
>    1   40
>    4   2
>    42  11
>    ...
>
> And has 11M lines.
>
> My code looks like this:
> ===
> void main()
> {
>         size_t[size_t] unions;
>         auto f = File("input.txt");
>         foreach ( line ; f.byLine() ) {
>                 auto fields = line.split();
>                 size_t i = to!size_t(fields[0]);
>                 size_t j = to!size_t(fields[1]);
>                 unions[i] = j; // <-- here be question
>         }
> }
> ===
>
> This is just a test code to illustrate my question (though 
> general comments are welcomed - I'm new to D).
>
> Commenting out the highlighted line (not populating the hash), 
> the program completes in 25 seconds.
> Compiling with the highlighted line, the program takes ~3.5 
> minutes.
>
> Is there a way to speed the loading? perhaps reserving memory 
> in the hash before populating it? Or another trick?
>
> Many thanks,
>  -gordon

Using OrderedAA improves the speed 3x:
https://github.com/Kozzi11/Trash/tree/master/util

import util.orderedaa; // OrderedAA, from the repository linked above

int main(string[] args)
{
     import std.stdio, std.conv, std.string, core.memory;
     import bylinefast; // separate module providing byLineFast (not in Phobos)

     GC.disable; // no GC collections while the table is being filled
     OrderedAA!(size_t, size_t, 1_000_007) unions;
     //size_t[size_t] unions;
     foreach (line; "input.txt".File.byLineFast) {
         line.munch(" \t"); // skip ws
         immutable i = line.parse!size_t;
         line.munch(" \t"); // skip ws
         immutable j = line.parse!size_t;
         unions[i] = j;
     }
     GC.enable;

     return 0;
}
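
If you would rather stay with the built-in associative array and Phobos only, the GC part of the trick above can be tried on its own. This is a minimal sketch, not measured against the 11M-line file: the function name loadPairs and the in-memory sample input are made up for illustration, and whether it helps (and by how much) is an assumption you would need to check on your data.

```d
import core.memory : GC;
import std.array : split;
import std.conv : to;
import std.string : lineSplitter;

/// Hypothetical helper: builds the table from whitespace-separated
/// "key value" lines held in a string.
size_t[size_t] loadPairs(string text)
{
    size_t[size_t] table;

    GC.disable();             // pause collections during the bulk insert
    scope (exit) GC.enable(); // re-enable even if parsing throws

    foreach (line; text.lineSplitter)
    {
        auto fields = line.split(); // splits on runs of whitespace
        if (fields.length < 2)
            continue;               // skip blank or short lines
        table[fields[0].to!size_t] = fields[1].to!size_t;
    }
    return table;
}

void main()
{
    auto table = loadPairs("1   40\n4   2\n42  11\n");
    assert(table.length == 3);
    assert(table[42] == 11);
}
```

For the real file you would feed it lines from File.byLine instead of a string. Keep in mind that memory use grows while the GC is disabled, since nothing is collected until GC.enable is called.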

