Processing a gzipped csv-file line-by-line
Steven Schveighoffer via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Fri May 12 06:17:25 PDT 2017
On 5/11/17 8:18 PM, H. S. Teoh via Digitalmars-d-learn wrote:
> On Wed, May 10, 2017 at 11:40:08PM +0000, Jesse Phillips via Digitalmars-d-learn wrote:
>> If you can get the zip to decompress into a range of dchar, then
>> std.csv will work with it. It is far from the fastest, and much of
>> its speed is lost because it supports input ranges and doesn't
>> specialize on any other range type.
>
> I actually spent some time today to look into whether fastcsv can
> possibly be made to work with general input ranges as long as they
> support slicing... and immediately ran into the infamous autodecoding
> issue: strings are not random-access ranges because of autodecoding, so
> it would require either extensive code surgery to make it work, or ugly
> hacks to bypass autodecoding. I'm quite tempted to attempt the latter,
> in fact, but not now since it's getting busier at work and I don't have
> that much free time to spend on a major refactoring of fastcsv.
Yeah, iopipe treats char[] as a random-access sliceable range :)
Autodecoding gets annoying if you want to do anything fancy (like
chain(somestr, someotherstr)).
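To make the autodecoding point concrete, here is a small sketch in D, assuming a reasonably recent Phobos. As ranges, narrow strings yield `dchar` and are not random-access, `chain` over two strings inherits that, and `std.utf.byCodeUnit` is one common way to get index-based access to the underlying `char`s back. The `firstComma` helper is hypothetical, for illustration only.

```d
import std.range : chain, ElementType, isRandomAccessRange;
import std.utf : byCodeUnit;

// Narrow strings are autodecoded: as ranges they yield dchar and are
// not random-access, even though the underlying array is.
static assert(is(ElementType!string == dchar));
static assert(!isRandomAccessRange!string);

// byCodeUnit exposes the same memory as a random-access range of char.
static assert(isRandomAccessRange!(typeof("".byCodeUnit)));

// chain() over two strings decodes too, so its elements are dchar.
static assert(is(ElementType!(typeof(chain("ab", "cd"))) == dchar));

/// Hypothetical helper: index of the first comma in code units, or -1.
/// Indexing works because byCodeUnit bypasses autodecoding.
ptrdiff_t firstComma(string line)
{
    auto units = line.byCodeUnit;
    foreach (i; 0 .. units.length)
        if (units[i] == ',')
            return cast(ptrdiff_t) i;
    return -1;
}

void main()
{
    import std.stdio : writeln;
    // Prints 6: 'ï' occupies two UTF-8 code units.
    writeln(firstComma("naïve,42"));
}
```

The index is in code units rather than characters, which is exactly what a CSV scanner looking for ASCII delimiters wants.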
> Alternatively, I could possibly hack together a version of fastcsv that
> took a range of const(char)[] as input (rather than a single string), so
> that, in theory, it could handle arbitrarily large input files as long
> as the caller can provide a range of data blocks, e.g., File.byChunk, or
> in this particular case, a range of decompressed data blocks from
> whatever decompressor is used to extract the data. As long as you
> consume the individual rows without storing references to them
> indefinitely (don't try to make an array of the entire dataset),
> fastcsv's optimizations should still work, since unreferenced blocks
> will eventually get cleaned up by the GC when memory runs low.
I'm interested in getting a fast CSV parser built on top of iopipe. I
may fork your code and see if I can get it to work. You already work
on arrays, and arrays are iopipes by default, so it should be quite
simple.
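For the original question, here is a minimal sketch of Jesse's suggestion. It assumes Phobos's `std.zlib` and a single-member gzip stream. It decompresses a range of compressed blocks into a range of text blocks (the kind of block range H. S. Teoh describes), flattens that with `joiner` into a range of `dchar`, and hands the result to `std.csv.csvReader`. It uses neither fastcsv nor iopipe, and it assumes a two-column (name, count) layout with no header row. `Inflated`, `inflated`, and `gzipCsv` are hypothetical names. With a real file, `File(path).byChunk(64 * 1024)` should be able to stand in for the in-memory `chunks` used here.

```d
import std.algorithm.iteration : joiner;
import std.csv : csvReader;
import std.range : chunks, isInputRange;
import std.typecons : Tuple;
import std.zlib : Compress, HeaderFormat, UnCompress;

/// Hypothetical adapter: turns a range of gzip-compressed blocks
/// into a lazy input range of decompressed text blocks.
struct Inflated(R) if (isInputRange!R)
{
    private R blocks;
    private UnCompress inflater;
    private const(char)[] current;
    private bool flushed;

    this(R blocks)
    {
        this.blocks = blocks;
        inflater = new UnCompress(HeaderFormat.gzip);
        advance();
    }

    @property bool empty() { return flushed && current.length == 0; }
    // front has no side effects, so callers may invoke it repeatedly.
    @property const(char)[] front() { return current; }
    void popFront() { advance(); }

    private void advance()
    {
        current = null;
        // Feed compressed blocks until some text comes out or input runs out.
        while (current.length == 0 && !blocks.empty)
        {
            current = cast(const(char)[]) inflater.uncompress(blocks.front);
            blocks.popFront();
        }
        // Once the input is exhausted, collect anything the inflater still holds.
        if (current.length == 0 && !flushed)
        {
            current = cast(const(char)[]) inflater.flush();
            flushed = true;
        }
    }
}

auto inflated(R)(R blocks) { return Inflated!R(blocks); }

/// Hypothetical helper: parses gzip-compressed CSV of (name, count) records.
auto gzipCsv(R)(R compressedBlocks)
{
    // joiner flattens the char blocks; autodecoding makes the joined
    // range a range of dchar, which is what csvReader accepts.
    return compressedBlocks.inflated.joiner.csvReader!(Tuple!(string, int));
}

void main()
{
    import std.stdio : writeln;

    // Build a small gzip stream in memory instead of reading a file.
    auto gz = new Compress(HeaderFormat.gzip);
    const(void)[] packed = gz.compress("apples,3\npears,5");
    packed ~= gz.flush();

    // Feed it in 7-byte pieces, much as File.byChunk(7) would.
    foreach (rec; gzipCsv((cast(const(ubyte)[]) packed).chunks(7)))
        writeln(rec[0], ": ", rec[1]);
}
```

Every stage is a lazy input range, so the whole decompressed file is never held in memory at once. std.csv still pays the per-dchar cost Jesse mentions, which is where a slicing parser like fastcsv would win.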
-Steve