Speeding up text file parser (BLAST tabular format)

Mon Sep 14 07:54:32 PDT 2015

On Monday, 14 September 2015 at 14:40:29 UTC, H. S. Teoh wrote:
> If performance is a problem, the first thing I'd recommend is 
> to use a profiler to find out where the hotspots are. (More 
> often than not, I have found that the hotspots are not where I 
> expected them to be; sometimes a 1-line change to an 
> unanticipated hotspot can result in a huge performance boost.)

I agree with you on that. I used Python's cProfile module to find 
the performance bottleneck in the Python version I posted, and 
shaved 8-10 seconds off the runtime by removing an extraneous 
str.split() I had missed.
I tried using the built-in profiler in DMD on the D program, but 
to no avail. I couldn't really make sense of the output, other 
than that there were an enormous number of calls to lots of 
functions that I couldn't find a way to remove from the code. 
Here's a paste of the trace output from the version I posted in 
the original post: 
http://dpaste.com/1AXPK9P
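
For comparison, the Python-side profiling was roughly along these 
lines. This is only a sketch: parse_line, parse_all, profile_parse 
and the SAMPLE data are made-up stand-ins, not the code from my 
original post, and the column layout assumes the default 12-column 
BLAST tabular output. Only the cProfile and pstats calls are the 
standard library's.

```python
import cProfile
import io
import pstats


def parse_line(line):
    # Hypothetical parser for one BLAST tabular line (tab-separated).
    fields = line.rstrip("\n").split("\t")
    # Assumed columns: query id, subject id, percent identity, bit score.
    return fields[0], fields[1], float(fields[2]), float(fields[11])


def parse_all(lines):
    # Parse every line; this is the function we want to profile.
    return [parse_line(line) for line in lines]


def profile_parse(lines, top=5):
    # Run parse_all under cProfile and return (result, report text).
    profiler = cProfile.Profile()
    result = profiler.runcall(parse_all, lines)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    # Sort by cumulative time so the expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()


# Made-up sample data: one 12-column record repeated 1000 times.
SAMPLE = ["q1\ts1\t98.5\t100\t1\t0\t1\t100\t1\t100\t1e-50\t180.0\n"] * 1000

if __name__ == "__main__":
    _, report = profile_parse(SAMPLE)
    print(report)
```

Sorting by cumulative time is what made the stray str.split() stand 
out for me, since its call count and total time sit right next to 
the function that calls it.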

> The next thing I'd try is to use gdc instead of dmd. ;-)  IME, 
> code produced by `gdc -O3` is at least 20-30% faster than code 
> produced by `dmd -O -inline`. Sometimes the difference can be 
> up to 40-50%, depending on the kind of code you're compiling.

Yes, it really seems that gdc or ldc is the way to go.