Speeding up text file parser (BLAST tabular format)
Edwin van Leeuwen via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Mon Sep 14 06:10:49 PDT 2015
On Monday, 14 September 2015 at 12:50:03 UTC, Fredrik Boulund
wrote:
> On Monday, 14 September 2015 at 12:44:22 UTC, Edwin van Leeuwen
> wrote:
>> Sounds like this program is actually IO bound. In that case I
>> would not really expect an improvement from using D. What is
>> the CPU usage like when you run this program?
>>
>> Also, which dmd version are you using? I think there were some
>> performance improvements for file reading in the latest
>> version (2.068).
>
> Hi Edwin, thanks for your quick reply!
>
> I'm using v2.068.1; I actually got inspired to try this out
> after skimming the changelog :).
>
> Regarding whether it is IO-bound: I expected it would be, but
> both the Python and the D versions consume 100% CPU while
> running, and just copying the file around only takes a few
> seconds (cf. 15-20 sec runtime for the two programs). There is
> bound to be some aggressive file caching going on, but I would
> expect that to normalize the runtimes downward after running
> the programs a few times, and I see nothing indicating that.
Two things that you could try:

First, hitlists.byKey can be expensive (especially if hitlists is
big), since looking the value up again inside the loop costs an
extra hash lookup per key. Instead, iterate over key/value pairs
directly:

foreach (key, value; hitlists)

Also, the filter.array.length is quite expensive, because it
allocates an intermediate array just to count the matches. You
could use count instead:

import std.algorithm : count;
value.count!(h => h.pid >= (max_pid - max_pid_diff));
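To illustrate both suggestions together, here is a minimal
self-contained sketch. The Hit struct, the sample data, and the
use of maxElement to define max_pid are assumptions for the sake
of the example, not taken from the original program:

```d
import std.algorithm : count, maxElement;
import std.stdio : writeln;

// Hypothetical hit record; the real program's type may differ.
struct Hit { double pid; }

void main()
{
    // hitlists maps a query id to the hits found for it.
    Hit[][string] hitlists = [
        "query1": [Hit(98.0), Hit(95.0), Hit(80.0)],
        "query2": [Hit(99.5), Hit(70.0)],
    ];

    enum max_pid_diff = 5.0;

    // Iterating key and value together avoids the extra hash
    // lookup that hitlists.byKey followed by hitlists[key] costs.
    foreach (key, value; hitlists)
    {
        // Assume max_pid is the best (highest) pid in this hitlist.
        immutable max_pid = value.maxElement!(h => h.pid).pid;

        // count avoids allocating an intermediate array the way
        // filter(...).array.length does.
        immutable n = value.count!(h => h.pid >= (max_pid - max_pid_diff));
        writeln(key, ": ", n, " hits within ", max_pid_diff, " of best pid");
    }
}
```

For "query1" this counts the hits with pid >= 93.0 (two of the
three), without ever materializing a filtered array.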