Processing a gzipped CSV file line by line

Steven Schveighoffer via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Fri May 12 06:17:25 PDT 2017


On 5/11/17 8:18 PM, H. S. Teoh via Digitalmars-d-learn wrote:
> On Wed, May 10, 2017 at 11:40:08PM +0000, Jesse Phillips via Digitalmars-d-learn wrote:

>> If you can get the zip to decompress into a range of dchar then
>> std.csv will work with it. It is far from the fastest; much of the
>> speed is lost because it supports input ranges and doesn't specialize
>> on any other range type.
>
> I actually spent some time today to look into whether fastcsv can
> possibly be made to work with general input ranges as long as they
> support slicing... and immediately ran into the infamous autodecoding
> issue: strings are not random-access ranges because of autodecoding, so
> it would require either extensive code surgery to make it work, or ugly
> hacks to bypass autodecoding.  I'm quite tempted to attempt the latter,
> in fact, but not now since it's getting busier at work and I don't have
> that much free time to spend on a major refactoring of fastcsv.

Yeah, iopipe treats char[] as a random-access sliceable range :)
Autodecoding gets annoying if you want to do anything fancy (like
chain(somestr, someotherstr)).
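
A minimal sketch of the autodecoding point above, using only Phobos (it does not use iopipe or fastcsv). The variable names are made up for illustration, and the static asserts assume a Phobos in which autodecoding is still the default for narrow strings.

```d
import std.range : chain;
import std.range.primitives : ElementType, hasSlicing, isRandomAccessRange;
import std.utf : byCodeUnit;

void main()
{
    string header = "id,name\n"; // 8 code units
    string row = "1,Ann\n";

    // With autodecoding, a string is a range of dchar and is not
    // random-access, even though the underlying array is.
    static assert(is(ElementType!string == dchar));
    static assert(!isRandomAccessRange!string);

    // chain() of two strings inherits the same limitation.
    auto decoded = chain(header, row);
    static assert(is(ElementType!(typeof(decoded)) == dchar));
    static assert(!isRandomAccessRange!(typeof(decoded)));

    // byCodeUnit bypasses autodecoding: the wrapper is random-access
    // and sliceable, and so is a chain built from such wrappers.
    auto units = header.byCodeUnit;
    static assert(isRandomAccessRange!(typeof(units)));
    static assert(hasSlicing!(typeof(units)));

    auto raw = chain(header.byCodeUnit, row.byCodeUnit);
    static assert(isRandomAccessRange!(typeof(raw)));
    assert(raw[8] == '1'); // first code unit of `row`
}
```

Working on code units is only safe when the parser looks for ASCII delimiters such as ',' and '\n', which is the case for CSV structure.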

> Alternatively, I could possibly hack together a version of fastcsv that
> took a range of const(char)[] as input (rather than a single string), so
> that, in theory, it could handle arbitrarily large input files as long
> as the caller can provide a range of data blocks, e.g., File.byChunk, or
> in this particular case, a range of decompressed data blocks from
> whatever decompressor is used to extract the data.  As long as you
> consume the individual rows without storing references to them
> indefinitely (don't try to make an array of the entire dataset),
> fastcsv's optimizations should still work, since unreferenced blocks
> will eventually get cleaned up by the GC when memory runs low.

I'm interested in getting a fast CSV parser built on top of iopipe. I
may fork your code and see if I can get it to work. Since your code
already works on arrays, it should be quite simple: arrays are also
iopipes by default.
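
For reference, the "range of decompressed data blocks" idea quoted above can be sketched with Phobos alone. This is not fastcsv or iopipe code: `eachLine` is a hypothetical helper, the 16-byte block size is arbitrary, and the gzip stream is built in memory only so that the sketch runs without an input file.

```d
import std.algorithm.searching : countUntil;
import std.range : chunks;
import std.stdio : writeln;
import std.zlib : Compress, HeaderFormat, UnCompress;

/// Hypothetical helper: inflates a range of gzip-compressed blocks and
/// calls `onLine` once per line (without the trailing '\n').
void eachLine(R)(R compressedBlocks,
                 scope void delegate(const(char)[]) onLine)
{
    auto inflater = new UnCompress(HeaderFormat.gzip);
    ubyte[] carry; // partial line left over from the previous block

    void consume(const(ubyte)[] data)
    {
        while (data.length)
        {
            // Search raw bytes, so no autodecoding is involved.
            auto nl = data.countUntil(cast(ubyte) '\n');
            if (nl < 0)
            {
                carry ~= data; // line continues in the next block
                return;
            }
            if (carry.length)
            {
                carry ~= data[0 .. nl];
                onLine(cast(const(char)[]) carry);
                carry = null; // fresh buffer for the next partial line
            }
            else
                onLine(cast(const(char)[]) data[0 .. nl]);
            data = data[nl + 1 .. $];
        }
    }

    foreach (block; compressedBlocks)
        consume(cast(const(ubyte)[]) inflater.uncompress(block));
    consume(cast(const(ubyte)[]) inflater.flush());
    if (carry.length)
        onLine(cast(const(char)[]) carry); // last line had no '\n'
}

void main()
{
    // Build a tiny gzip stream in memory (stand-in for a real file).
    auto deflater = new Compress(HeaderFormat.gzip);
    const(ubyte)[] gz = cast(const(ubyte)[]) deflater.compress("id,name\n1,Ann\n2,Bob\n");
    gz ~= cast(const(ubyte)[]) deflater.flush();

    string[] lines;
    eachLine(gz.chunks(16), (const(char)[] line) { lines ~= line.idup; });
    assert(lines == ["id,name", "1,Ann", "2,Bob"]);
    writeln(lines);
}
```

For a file on disk, the in-memory `gz.chunks(16)` could be swapped for something like `File(path).byChunk(64 * 1024)`. Lines are handed to the callback as slices, so a caller that wants to keep one beyond the call should copy it, as the `idup` here does. The splitting is line-level only; quoted CSV fields containing newlines are not handled.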

-Steve
