Reserving/Preallocating associative array?
John Colvin
john.loughran.colvin at gmail.com
Wed Dec 25 10:25:24 PST 2013
On Tuesday, 24 December 2013 at 23:52:49 UTC, Andrei Alexandrescu
wrote:
> On 12/24/13 2:28 PM, Gordon wrote:
>> Hello,
>>
>> I want to load a large text file containing two numeric fields
>> into an
>> associative array.
>> The file looks like:
>> 1 40
>> 4 2
>> 42 11
>> ...
>>
>> And has 11M lines.
>>
>> My code looks like this:
>> ===
>> import std.array : split;
>> import std.conv : to;
>> import std.stdio : File;
>>
>> void main()
>> {
>>     size_t[size_t] unions;
>>     auto f = File("input.txt");
>>     foreach ( line ; f.byLine() ) {
>>         auto fields = line.split();
>>         size_t i = to!size_t(fields[0]);
>>         size_t j = to!size_t(fields[1]);
>>         unions[i] = j; // <-- here be question
>>     }
>> }
>> ===
>>
>> This is just a test code to illustrate my question (though
>> general
>> comments are welcomed - I'm new to D).
>>
>> Commenting out the highlighted line (not populating the hash),
>> the
>> program completes in 25 seconds.
>> Compiling with the highlighted line, the program takes ~3.5
>> minutes.
>>
>> Is there a way to speed the loading? perhaps reserving memory
>> in the
>> hash before populating it? Or another trick?
>
> void main()
> {
> size_t[size_t] unions;
> foreach (e; "input.txt"
> .slurp!(size_t, size_t)("%s %s").sort.uniq ) {
> unions[e[0]] = e[1];
> }
> }
>
>
> Andrei
Watch out for the parentheses on sort. As bearophile frequently
points out, without parentheses you are calling the
builtin sort, not the std.algorithm one.
Gordon, you may find this has better performance if you add () to
sort.
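
For illustration, here is a minimal sketch of Andrei's snippet with the
parentheses added. The imports and the helper name loadUnions are my
own additions (assumptions, not from the posts above); the behaviour of
plain .sort depends on the compiler version, so treat this as a sketch
rather than a benchmarked fix.

```d
import std.algorithm : sort, uniq;
import std.file : slurp;

// loadUnions is a hypothetical helper name used only for this sketch.
size_t[size_t] loadUnions(string path)
{
    size_t[size_t] unions;
    // slurp reads the whole file into an array of Tuple!(size_t, size_t),
    // parsing each line with the given format string.
    auto pairs = path.slurp!(size_t, size_t)("%s %s");
    // sort() with parentheses resolves to std.algorithm.sort via UFCS,
    // not the builtin array .sort property.
    foreach (e; pairs.sort().uniq)
    {
        unions[e[0]] = e[1];
    }
    return unions;
}

void main()
{
    auto unions = loadUnions("input.txt");
}
```

Note that slurp holds the whole file in memory before the associative
array is filled, so this trades memory for simpler code.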