How to read fastly files ( I/O operation)

monarch_dodra monarchdodra at gmail.com
Fri Feb 8 06:26:42 PST 2013


On Friday, 8 February 2013 at 09:08:48 UTC, bioinfornatics wrote:
> And use size_t instead to int for getChar/getInt method as type 
> returned
>
> gdmd -w -O -release monarch.d
> ~ $ time ./monarch 
> /env/cns/proj/projet_AZH/A/RunsSolexa/121114_FLUOR_C16L5ACXX/AZH_AOSC_8_1_C16L5ACXX.IND1_clean.fastq
> globalStats:
> A: 1007129068. C: 1350576504. G: 1353023772. M:   0. D:   0. S:
>   0. H:   0. N: 39413. V:   0. U:   0. W:   0. R:   0. B:   0. 
> Y:   0. K:   0. T: 999786820.
> time: 176585
>
> real	2m56.635s
> user	2m31.376s
> sys	0m23.077s
>
>
> this program is little less fast than f's program

I re-ran both my parser and FG's on an HDD-based machine, compiled 
with dmd -O -release, both with and without -inline.

I also wrote a new parser, which does as FG suggested and parses 
the raw input directly (byLine is indeed more expensive). This one 
handles whitespace and line breaks correctly. It also accepts lines 
of any size (the internal buffer grows automatically).
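
To illustrate what parsing the raw input directly can look like, 
here is a minimal sketch of a letter counter that walks raw chunks 
instead of going through byLine. It is not my actual parser: the 
LetterCounter struct and its feed method are hypothetical names, 
and the sketch assumes strict four-line FASTQ records (header, 
sequence, '+', quality) with no wrapped sequence lines.

```d
import std.stdio;

/// Hypothetical helper: tallies sequence letters from raw FASTQ chunks.
/// Assumes each record is exactly four lines (no wrapped sequences).
struct LetterCounter
{
    ulong[256] counts;   // one slot per possible byte value
    size_t lineInRecord; // 0 = header, 1 = sequence, 2 = '+', 3 = quality

    /// Consumes one chunk; state carries over, so a chunk may end mid-line.
    void feed(const(ubyte)[] chunk)
    {
        foreach (b; chunk)
        {
            if (b == '\n')
                lineInRecord = (lineInRecord + 1) % 4;
            else if (lineInRecord == 1 && b != '\r' && b != ' ' && b != '\t')
                ++counts[b]; // count only non-blank bytes on sequence lines
        }
    }
}

void main(string[] args)
{
    if (args.length < 2)
    {
        stderr.writeln("usage: ", args[0], " file.fastq");
        return;
    }

    LetterCounter counter;
    // byChunk reuses a single buffer, so nothing is allocated per line.
    foreach (ubyte[] chunk; File(args[1], "rb").byChunk(64 * 1024))
        counter.feed(chunk);

    foreach (c; "ACGTN")
        writefln("%s: %s", c, counter.counts[c]);
}
```

Because the state lives in the struct, chunk boundaries can fall 
anywhere, including in the middle of a line.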

My results are different from yours though:

         w/o -inline  with -inline
FG      105s         77s
MD       72s         64s
newMD    61s         59s

I have no idea why you are getting better results with FG's while 
I'm getting better results with mine. Is this a Windows/Linux or 
dmd/gdc issue? My new parser is based on raw reads, so it 
should be much faster on your machines.

> about parser I would like create a set a biology parser and put 
> into a lib with a set of common compute as letter counter.
> By example you could run a letter counter compute throw a fata 
> or fastq file.
> rename identifier thwow a fata or fastq file.

I don't really understand what all that means.

In any case, I've been able to implement some cool features so 
far. My parser is a "true" range you can pass around, and you 
won't have any problems with it.

It returns "shallow" objects that reference a mutable string, 
however, the user can call "dup" or "idup" to have a new object.
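
As a rough sketch of what "shallow" means here (std.stdio's byLine 
behaves the same way with its reused line buffer): the Entry struct 
below is a hypothetical stand-in, not the real type from my parser. 
It holds a slice into a buffer it does not own, and its dup method 
copies the data so the result survives the next read.

```d
/// Hypothetical stand-in for a parsed entry; not the real parser type.
struct Entry
{
    const(char)[] sequence; // slice into a buffer owned by the parser

    /// Returns an entry that owns its own copy of the data.
    Entry dup() const
    {
        return Entry(sequence.dup);
    }
}

void main()
{
    char[] buffer = "ACGT".dup;   // stands in for the parser's internal buffer
    auto shallow = Entry(buffer); // references the buffer
    auto deep = shallow.dup;      // independent copy

    buffer[] = "TTTT";            // the parser moves on and overwrites the buffer

    assert(shallow.sequence == "TTTT"); // the shallow entry sees the new data
    assert(deep.sequence == "ACGT");    // the copy is unaffected
}
```

So an entry is only safe to keep past the current iteration if it 
has been duplicated first.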

Said objects can be printed directly, so there is no need for a 
specialized "writer". As a matter of fact, this little program 
will allow you to "clean" a file (strip spaces), and potentially, 
line-wrap at 80 chars:

//----
import std.stdio;

import fastq.parser;
import fastq.q;

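void main(string[] args)
{
     // args[1]: input FASTQ file, args[2]: cleaned output file
     Parser parser = new Parser(args[1]);
     File   output = File(args[2], "wb");
     foreach(entry; parser)
         output.writefln("%80s", entry); // write to the output file, not stdout
}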
//----

I'll submit it for your review, once it is perfectly implemented.

