How to read fastly files ( I/O operation)
monarch_dodra
monarchdodra at gmail.com
Wed Feb 6 08:06:19 PST 2013
On Wednesday, 6 February 2013 at 15:40:39 UTC, bioinfornatics
wrote:
> It seems that in any case it is not easy to parse a file quickly in D
I don't think that's true. D provides the same "FILE" primitive
you'd get in C, so there is no reason for it to be slower than C.
It is the "range" approach that, as convenient as it is, is not
well adapted for certain things.
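To illustrate the point about "FILE": std.stdio.File wraps a C FILE*, and you can read it in raw blocks rather than element by element. Here is a minimal sketch, not taken from my program; the function name countNewlines, the chunk size and the scratch file name are just assumptions for the example.

```d
import std.file : remove, tempDir, write;
import std.path : buildPath;
import std.stdio : File, writeln;

/// Counts '\n' bytes by reading the file in fixed-size blocks.
size_t countNewlines(string path, size_t chunkSize = 64 * 1024)
{
    size_t count = 0;
    auto file = File(path, "rb");
    // byChunk hands out the same buffer on every iteration,
    // so there is no allocation per block.
    foreach (ubyte[] chunk; file.byChunk(chunkSize))
        foreach (b; chunk)
            if (b == '\n')
                ++count;
    return count;
}

void main()
{
    // Hypothetical scratch file, created only so the sketch runs on its own.
    auto path = buildPath(tempDir(), "fastq_chunk_sketch.txt");
    write(path, "@read1\nGATTACA\n+\n!''*(((\n");
    scope (exit) remove(path);
    writeln(countNewlines(path));
}
```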
As I said, I tried writing my own program. In it, I devised
a range that, instead of exposing the input to be parsed character
by character, parses an entire "object" (a ... "genome" ... maybe?
I called them "Q" in my program) in one go. To read the file, I
decided to use the very simple "byLine" primitive.
From there, you can query each object for its
name/sequence/quality. The irony is that by "parsing twice" (once
to do the IO read, once to do the actual processing of the text),
and even though I'm allocating each object individually,
I'm running twice as fast as my original, already improved
implementation. Not only is it faster, it is also more
convenient, since you can extract an entire Q object at once and
then operate on it as you please: separation of
algorithm and parsing.
It correctly takes into account that a sequence can span multiple
lines. It does not strip whitespace because, according to
http://maq.sourceforge.net/fastq.shtml whitespace is not a legal
character.
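For readers who just want the general shape of the idea, here is a rough sketch of a range that turns lines into whole records. It is not the code from my program: the names Q (with these three fields), QRange and byQ are assumptions for this example, and the record layout ('@' name line, sequence lines, '+' line, then as many quality characters as there are bases) is my reading of the FASTQ page linked above. It takes any range of lines, so File.byLine() should work as well as the array used here.

```d
import std.array : appender;
import std.exception : enforce;
import std.range.primitives : empty, front, isInputRange, popFront;
import std.stdio : writeln;

/// One parsed record; the field names are assumed for this sketch.
struct Q
{
    string name;
    string sequence;
    string quality;
}

/// Input range that consumes a range of lines and yields one Q per record.
struct QRange(Lines)
if (isInputRange!Lines)
{
    private Lines lines;
    private Q current;
    private bool done;

    this(Lines lines)
    {
        this.lines = lines;
        popFront(); // parse the first record up front
    }

    @property bool empty() const { return done; }
    @property Q front() { return current; }

    void popFront()
    {
        // Skip blank lines between records.
        while (!lines.empty && lines.front.length == 0)
            lines.popFront();
        if (lines.empty)
        {
            done = true;
            return;
        }

        enforce(lines.front[0] == '@', "record must start with '@'");
        // idup: byLine reuses its buffer, so copy before advancing.
        current.name = lines.front[1 .. $].idup;
        lines.popFront();

        // The sequence may span several lines, up to the '+' line.
        auto seq = appender!string();
        while (!lines.empty && (lines.front.length == 0 || lines.front[0] != '+'))
        {
            seq.put(lines.front);
            lines.popFront();
        }
        enforce(!lines.empty, "missing '+' separator line");
        lines.popFront();

        // Quality may start with '@', so stop by length, not by marker.
        auto qual = appender!string();
        while (!lines.empty && qual.data.length < seq.data.length)
        {
            qual.put(lines.front);
            lines.popFront();
        }
        enforce(qual.data.length == seq.data.length,
                "quality and sequence lengths differ");
        current.sequence = seq.data;
        current.quality = qual.data;
    }
}

/// Convenience wrapper so the line range type is inferred.
auto byQ(Lines)(Lines lines)
{
    return QRange!Lines(lines);
}

void main()
{
    auto lines = ["@read1", "GATT", "ACA", "+", "!''*", "(((",
                  "@read2", "CC", "+read2", "@@"];
    foreach (q; byQ(lines))
        writeln(q.name, " ", q.sequence, " ", q.quality);
}
```

With a real file you would pass File("reads.fastq").byLine() instead of the array (the file name is made up). The algorithm then only ever sees complete Q objects and never touches the line handling.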
Now: Keep in mind that this approach allocates three new strings
for each Q. You could *try* an approach with a pre-allocated
reusable buffer. This would mean you can only operate on one Q at
a time, but you'd probably iterate over them faster.
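I have not measured the buffer variant, but it could look roughly like this. QBuffer and readQ are hypothetical names for this sketch, and error checking is left out to keep it short. The three Appender!(char[]) members keep their memory between records, so once they have grown to the largest record there should be no further allocation. The price is that the slices you get from .data are overwritten by the next call.

```d
import std.array : Appender;
import std.range.primitives : empty, front, popFront;
import std.stdio : writeln;

/// Reusable storage for one record. The contents are only valid
/// until the next call to readQ.
struct QBuffer
{
    Appender!(char[]) name;
    Appender!(char[]) sequence;
    Appender!(char[]) quality;
}

/// Reads the next record into buf, reusing its memory.
/// Returns false once the input is exhausted.
bool readQ(Lines)(ref Lines lines, ref QBuffer buf)
{
    // clear() resets the length but keeps the allocated capacity.
    buf.name.clear();
    buf.sequence.clear();
    buf.quality.clear();

    // Skip blank lines between records.
    while (!lines.empty && lines.front.length == 0)
        lines.popFront();
    if (lines.empty)
        return false;

    buf.name.put(lines.front[1 .. $]); // drop the leading '@'
    lines.popFront();

    // Sequence lines run up to the '+' line.
    while (!lines.empty && (lines.front.length == 0 || lines.front[0] != '+'))
    {
        buf.sequence.put(lines.front);
        lines.popFront();
    }
    if (!lines.empty)
        lines.popFront(); // skip the '+' line

    // Read quality lines until there is one character per base.
    while (!lines.empty && buf.quality.data.length < buf.sequence.data.length)
    {
        buf.quality.put(lines.front);
        lines.popFront();
    }
    return true;
}

void main()
{
    auto lines = ["@read1", "GATT", "ACA", "+", "!''*", "(((",
                  "@read2", "CC", "+", "@@"];
    QBuffer buf;
    size_t bases = 0;
    while (readQ(lines, buf))
    {
        bases += buf.sequence.data.length;
        writeln(buf.name.data, ": ", buf.sequence.data.length, " bases");
    }
    writeln("total bases: ", bases);
}
```

If you need to keep a record around after the next read, you have to .dup or .idup the parts you care about, which brings the allocations back for those records only.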
In any case, you can try it out:
http://dpaste.dzfl.pl/8bdd0c84