Using iopipe to stream a gzipped file

Andrew aabrown24 at hotmail.com
Wed Jan 3 17:03:03 UTC 2018


On Wednesday, 3 January 2018 at 16:09:19 UTC, Steven 
Schveighoffer wrote:
> On 1/3/18 9:45 AM, Andrew wrote:
>> Hi,
>> 
>> I have a very large gzipped text file (all ASCII characters, 
>> ~500GB) that I want to stream and process line by line. I 
>> thought the iopipe library would be perfect for this, but I 
>> can't get it to work. This is the closest I have come so 
>> far:
>> 
>> import iopipe.textpipe;
>> import iopipe.zip;
>> import iopipe.bufpipe;
>> import iopipe.stream;
>> 
>> void main()
>> {
>> 
>>    auto fileToRead = openDev("file.gz").bufd.unzip(CompressionFormat.gzip);
>> 
>>    foreach (line; fileToRead.assumeText.byLineRange!false)
>>    {
>>       // do stuff
>>    }
>> }
>> 
>> but this only processes roughly the first 200 lines (I guess 
>> just the initial read into the buffer). Can anyone help me out?
>
> Do you have a sample file I can play with? Your iopipe chain 
> looks correct, so I'm not sure why it wouldn't work.
>
> -Steve

A sample file (about 250MB) can be found here:

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz

It should have 1,103,800 lines, but the following code only 
reports 256:

import iopipe.textpipe;
import iopipe.zip;
import iopipe.bufpipe;
import iopipe.stream;
import std.stdio;

void main()
{

    auto fileToRead = openDev("ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz").bufd.unzip(CompressionFormat.gzip);

    auto counter = 0;
    foreach (line; fileToRead.assumeText.byLineRange!false)
    {
       counter++;
    }
    writeln(counter);
}
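
For comparison, the expected line count can be checked independently of 
iopipe. The following is only a minimal sketch, assuming the external 
gzip program is installed and on the PATH; it uses nothing but Phobos 
(std.process and std.stdio), and the command-line handling is purely 
illustrative:

```d
import std.process : pipeProcess, Redirect, wait;
import std.stdio : stderr, writeln;

int main(string[] args)
{
    if (args.length != 2)
    {
        stderr.writeln("usage: ", args[0], " <file.gz>");
        return 1;
    }

    // Assumption: an external `gzip` executable is available on PATH.
    // `gzip -dc` decompresses the file to stdout, which we capture.
    auto pipes = pipeProcess(["gzip", "-dc", args[1]], Redirect.stdout);

    // Count the lines of the decompressed stream.
    size_t counter = 0;
    foreach (line; pipes.stdout.byLine)
        counter++;

    // Wait for gzip to finish and pass its exit status through.
    immutable status = wait(pipes.pid);
    writeln(counter);
    return status;
}
```

If this count differs from what the iopipe version reports on the same 
file, the discrepancy would seem to lie in the decompression step 
rather than in the line splitting.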

Thanks for looking into this.

Andrew


More information about the Digitalmars-d-learn mailing list