Speeding up text file parser (BLAST tabular format)

H. S. Teoh via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Tue Sep 15 11:10:23 PDT 2015


On Tue, Sep 15, 2015 at 08:55:43AM +0000, Fredrik Boulund via Digitalmars-d-learn wrote:
> On Monday, 14 September 2015 at 18:31:38 UTC, H. S. Teoh wrote:
> >I tried implementing a crude version of this (see code below), and
> >found that manually calling GC.collect() even as frequently as once
> >every 5000 loop iterations (for a 500,000 line test input file) still
> >gives about 15% performance improvement over completely disabling the
> >GC.  Since most of the arrays involved here are pretty small, the
> >frequency could be reduced to once every 50,000 iterations and you'd
> >pretty much get the 20% performance boost for free, and still not run
> >out of memory too quickly.
> 
> Interesting, I'll have to go through your code to understand exactly
> what's going on. I also noticed some GC-related stuff high up in my
> profiling, but had no idea what could be done about that. Appreciate
> the suggestions!

It's very simple, actually. Basically you just call GC.disable() at the
beginning of the program to disable automatic collection cycles, then at
periodic intervals you manually trigger collections by calling
GC.collect().

The way I implemented it in my test code was to use a global counter
that I decrement once every loop iteration. When the counter reaches
zero, GC.collect() is called, and then the counter is reset to its
original value.  This is encapsulated in the gcTick() function, so that
it's easy to tweak the frequency of the manual collections without
modifying several different places in the code each time.


T

-- 
BREAKFAST.COM halted...Cereal Port Not Responding. -- YHL

