How to read fastly files ( I/O operation)

monarch_dodra monarchdodra at gmail.com
Wed Feb 6 23:26:17 PST 2013


On Wednesday, 6 February 2013 at 22:55:14 UTC, FG wrote:
> On 2013-02-06 21:43, monarch_dodra wrote:
>> On Wednesday, 6 February 2013 at 19:19:52 UTC, FG wrote:
>>> I have processed the file SRR077487_1.filt.fastq from
>>> ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data/HG00096/sequence_read/
>>> and expect this syntax (no multiline sequences or whitespace).
>>> The file takes up almost 6 GB; processing took 1m45s, twice as fast
>>> as the fastest D solution so far.
>>
>> Do you mean my solution above? I tried your solution with dmd, with
>> -release -O -inline, and both gave about the same result (69s yours,
>> 67s mine).
>
> Yes. Maybe CPU is the bottleneck on my end.
> With DMD32 2.060 on win7-64 compiled with same flags I got:
> MD: 4m30 / FG: 1m55s - both using 100% of one core.
> Quite similar results with GDC64.
>
> You have timed the same file SRR077487_1.filt.fastq at 67s?

Yes, that file exactly. That said, I'm working on an SSD, so 
maybe I'm less I/O-bound than you are?

My attempt was mostly to see how fast we could go while using 
only high-level stuff (e.g., no fSomething calls).

Going lower level and parsing the text manually, waiting for 
magic characters, could probably yield better results (like what 
you did).
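
To make "parsing manually, waiting for magic characters" concrete, here is a minimal sketch of that idea. It is not FG's code and not my parser: the FastqStats name and its fields are made up for illustration, and it assumes the strict layout FG described (exactly four lines per record, no multiline sequences, final line terminated by a newline). It only counts records and bases, so treat it as a starting point rather than a benchmark candidate.

```d
import std.stdio;

/// Hypothetical helper: running totals for a strict
/// four-lines-per-record FASTQ stream.
struct FastqStats
{
    ulong records;   // completed records seen so far
    ulong bases;     // characters seen on sequence lines
    uint lineInRec;  // 0 = header, 1 = sequence, 2 = '+', 3 = quality

    /// Feed one raw chunk. State carries over between calls, so a
    /// record may be split across chunk boundaries.
    void feed(const(ubyte)[] chunk)
    {
        foreach (b; chunk)
        {
            if (b == '\n')              // the "magic character"
            {
                if (++lineInRec == 4)
                {
                    lineInRec = 0;
                    ++records;
                }
            }
            else if (lineInRec == 1 && b != '\r')
                ++bases;                // ignore '\r' from CRLF files
        }
    }
}

void main(string[] args)
{
    if (args.length < 2)
    {
        writeln("usage: fastqstats <file.fastq>");
        return;
    }
    FastqStats stats;
    // Read in 1 MiB blocks; byChunk reuses its buffer between iterations.
    foreach (ubyte[] chunk; File(args[1], "rb").byChunk(1 << 20))
        stats.feed(chunk);
    writefln("%s records, %s bases", stats.records, stats.bases);
}
```

Keeping the state in a struct rather than in local variables is what lets the scan resume in the middle of a record when the next chunk arrives.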

I'm going to also try playing around with threads: Just last week 
I wrote a program that did exactly this (asynchronous file reads).
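
For reference, the general shape of an asynchronous read with std.concurrency could look like the sketch below. This is not the program I mentioned, just an illustration: the reader and countNewlinesAsync names, the 1 MiB chunk size and the mailbox limit of 8 are all arbitrary assumptions, and the "parsing" is reduced to counting newlines. Error handling is left out, so a failed read simply ends the count early.

```d
import std.concurrency;
import std.stdio;

/// Reader thread (hypothetical name): sends an immutable copy of
/// every chunk to the owner, then a bool to signal the end.
void reader(Tid owner, string path)
{
    // Always tell the owner we are finished, even if the read throws.
    scope (exit) owner.send(true);
    foreach (ubyte[] chunk; File(path, "rb").byChunk(1 << 20))
        owner.send(chunk.idup); // byChunk reuses its buffer, so copy
}

/// Counts '\n' bytes in the file while another thread does the reading.
ulong countNewlinesAsync(string path)
{
    // Cap the queue so a fast disk cannot fill memory:
    // the reader blocks while 8 messages are pending.
    setMaxMailboxSize(thisTid, 8, OnCrowding.block);
    spawn(&reader, thisTid, path);

    ulong newlines;
    bool done;
    while (!done)
    {
        receive(
            (immutable(ubyte)[] chunk)
            {
                foreach (b; chunk)
                    if (b == '\n')
                        ++newlines;
            },
            (bool finished) { done = true; }
        );
    }
    return newlines;
}

void main(string[] args)
{
    if (args.length < 2)
    {
        writeln("usage: asynccount <file>");
        return;
    }
    writeln(countNewlinesAsync(args[1]), " lines");
}
```

The copy made by idup costs something, but it is what allows the chunk to cross threads safely; whether the overlap of reading and parsing pays for it depends on how I/O-bound the machine is.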

That said, I'll be making this priority n°2. I'd like to make the 
parser work perfectly first, and in a way that is easily 
upgradable/usable. Mr. bio made it perfectly clear that he 
needed support for whitespace and line feeds ;)


More information about the Digitalmars-d-learn mailing list