How to read fastly files ( I/O operation)
monarch_dodra
monarchdodra at gmail.com
Fri Feb 8 06:26:42 PST 2013
On Friday, 8 February 2013 at 09:08:48 UTC, bioinfornatics wrote:
> And use size_t instead to int for getChar/getInt method as type
> returned
>
> gdmd -w -O -release monarch.d
> ~ $ time ./monarch
> /env/cns/proj/projet_AZH/A/RunsSolexa/121114_FLUOR_C16L5ACXX/AZH_AOSC_8_1_C16L5ACXX.IND1_clean.fastq
> globalStats:
> A: 1007129068. C: 1350576504. G: 1353023772. M: 0. D: 0. S:
> 0. H: 0. N: 39413. V: 0. U: 0. W: 0. R: 0. B: 0.
> Y: 0. K: 0. T: 999786820.
> time: 176585
>
> real 2m56.635s
> user 2m31.376s
> sys 0m23.077s
>
>
> this program is little less fast than f's program
I've re-run both my program and FG's on an HDD-based machine,
compiled with dmd -O -release, with and without -inline.
I also wrote a new parser, which does as FG suggested and parses
the raw input directly (byLine is indeed more expensive). This one
handles whitespace and line breaks correctly. It also accepts lines
of any size (the internal buffer grows automatically).
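To illustrate what "parsing straight up" from raw reads can look like, here is a minimal sketch. It is not my actual parser: the function name countSequenceLetters is made up for this example, it assumes strict 4-line FASTQ records (header, sequence, '+', quality), and it uses std.stdio's File.byChunk for the raw reads instead of byLine.

```d
import std.stdio;

// Hypothetical helper (not part of the parser discussed here).
// Counts the letters found on sequence lines, assuming every record
// is exactly 4 lines: header, sequence, '+', quality.
ulong[256] countSequenceLetters(string path, size_t chunkSize = 64 * 1024)
{
    ulong[256] counts;          // zero-initialized
    size_t lineInRecord = 0;    // 0: header, 1: sequence, 2: '+', 3: quality
    auto file = File(path, "rb");

    // byChunk hands back fixed-size raw blocks and reuses its buffer,
    // so no per-line allocation or line splitting takes place.
    foreach (ubyte[] chunk; file.byChunk(chunkSize))
    {
        foreach (b; chunk)
        {
            if (b == '\n')
                lineInRecord = (lineInRecord + 1) % 4;
            else if (lineInRecord == 1 && b != '\r')
                ++counts[b];
        }
    }
    return counts;
}

void main(string[] args)
{
    if (args.length < 2)
    {
        stderr.writeln("usage: ", args[0], " <file.fastq>");
        return;
    }
    auto counts = countSequenceLetters(args[1]);
    foreach (c; "ACGTN")
        writefln("%s: %s", c, counts[c]);
}
```

Because the state (lineInRecord) lives outside the chunk loop, a line that straddles two chunks is handled without any special casing.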
My results are different from yours though:

         w/o inline   w/ inline
FG       105s         77s
MD        72s         64s
newMD     61s         59s
I have no idea why you guys are getting better results with FG's
program while I'm getting better results with mine. Is this a
Windows/Linux or dmd/gdc issue? My new parser is based on raw
reads, so it should be much faster on your machines.
> about parser I would like create a set a biology parser and put
> into a lib with a set of common compute as letter counter.
> By example you could run a letter counter compute throw a fata
> or fastq file.
> rename identifier thwow a fata or fastq file.
I don't really understand what all that means.
In any case, I've been able to implement some cool features so
far. My parser is a "true" range you can pass around, and you
won't have any problems with it.
It returns "shallow" objects that reference a mutable string,
but the user can call "dup" or "idup" to get an independent copy.
Said objects can be printed directly, so there is no need for a
specialized "writer". As a matter of fact, this little program
will allow you to "clean" a file (strip spaces), and potentially,
line-wrap at 80 chars:
//----
import std.stdio;
import fastq.parser;
import fastq.q;

void main(string[] args)
{
    // args[1]: input FASTQ file, args[2]: cleaned output file
    Parser parser = new Parser(args[1]);
    File output = File(args[2], "wb");
    foreach(entry; parser)
        output.writefln("%80s", entry); // write to the output file, not stdout
}
//----
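For anyone unsure what "shallow" means here, the sketch below shows the idea with a stand-in type. Entry and its single field are hypothetical and only mimic the behaviour described above; the real parser's types may look different.

```d
import std.stdio;

// Hypothetical stand-in for a parser entry: a view into a buffer
// that the parser owns and reuses.
struct Entry
{
    char[] sequence;

    // Return an independent copy that survives later buffer reuse.
    Entry dup() const
    {
        return Entry(sequence.dup);
    }
}

void main()
{
    char[] buffer = "ACGT".dup;     // the parser's internal, mutable buffer
    auto shallow = Entry(buffer);   // references the buffer, no copy made
    auto deep = shallow.dup;        // owns its own copy of the data

    buffer[] = "TTTT";              // the parser advances and overwrites the buffer

    writeln(shallow.sequence);      // sees the overwritten data
    writeln(deep.sequence);         // still holds the original data
}
```

This is the same trade-off File.byLine makes: iteration stays cheap because nothing is allocated per element, and the caller only pays for a copy when an entry has to outlive the current iteration.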
I'll submit it for your review, once it is perfectly implemented.