Using iopipe to stream a gzipped file
Andrew
aabrown24 at hotmail.com
Wed Jan 3 17:03:03 UTC 2018
On Wednesday, 3 January 2018 at 16:09:19 UTC, Steven
Schveighoffer wrote:
> On 1/3/18 9:45 AM, Andrew wrote:
>> Hi,
>>
>> I have a very large gzipped text file (all ASCII characters and
>> ~500GB) that I want to stream and process line-by-line, and I
>> thought the iopipe library would be perfect for this, but I
>> can't seem to get it to work. So far, this is the closest I
>> have come to getting it to work:
>>
>> import iopipe.textpipe;
>> import iopipe.zip;
>> import iopipe.bufpipe;
>> import iopipe.stream;
>>
>> void main()
>> {
>>     auto fileToRead =
>>         openDev("file.gz").bufd.unzip(CompressionFormat.gzip);
>>
>>     foreach (line; fileToRead.assumeText.byLineRange!false)
>>     {
>>         // do stuff
>>     }
>> }
>>
>> but this only processes the first 200 or so lines (I guess the
>> initial read into the buffer). Can anyone help me out?
>
> Do you have a sample file I can play with? Your iopipe chain
> looks correct, so I'm not sure why it wouldn't work.
>
> -Steve

A sample file (about 250MB) can be found here:

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz

It should have 1,103,800 lines, but the following code only
reports 256:

import iopipe.textpipe;
import iopipe.zip;
import iopipe.bufpipe;
import iopipe.stream;
import std.stdio;

void main()
{
    // Open the file, buffer it, and decompress it as gzip.
    auto fileToRead =
        openDev("ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz")
        .bufd.unzip(CompressionFormat.gzip);

    // Count the lines in the decompressed stream.
    auto counter = 0;
    foreach (line; fileToRead.assumeText.byLineRange!false)
    {
        counter++;
    }
    writeln(counter);
}
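
One way to cross-check the expected line count independently of iopipe is to let an external decompressor do the work and count the lines it emits. The following is only a minimal sketch under stated assumptions, not part of the original report: it assumes a `gzip` executable is available on the PATH and that the sample file sits in the current directory.

```d
import std.process : pipeProcess, Redirect, wait;
import std.stdio : writeln;

void main()
{
    // Assumption: a `gzip` executable is on the PATH; `-dc` decompresses to stdout.
    auto pipes = pipeProcess(
        ["gzip", "-dc",
         "ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz"],
        Redirect.stdout);

    // Count the lines of the decompressed output as they stream in.
    size_t counter = 0;
    foreach (line; pipes.stdout.byLine)
        counter++;

    // Reap the child process before reporting the result.
    wait(pipes.pid);
    writeln(counter);
}
```

If this count and the iopipe count disagree, the difference points at the decompression step rather than at the line splitting.
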
Thanks for looking into this.
Andrew