GZip File Reading

Lars T. Kyllingstad public at kyllingen.NOSPAMnet
Thu Mar 10 06:45:32 PST 2011


On Thu, 10 Mar 2011 09:20:34 -0500, dsimcha wrote:

> On 3/10/2011 5:57 AM, Lars T. Kyllingstad wrote:
>> Nope, a gzip or bzip2 file only contains a single file.  To zip several
>> files, you first make a tar archive, and then you run gzip or bzip2 on
>> it.  That's why most compressed archives targeted at the Linux platform
>> have extensions like .tar.gz, .tar.bz2, and so on.
>>
>> -Lars
> 
> This is **exactly** my point.  These single-file gzip and bzip2 files
> should be usable with exactly the same API as uncompressed file I/O.  My
> personal use case for this is files that contain large amounts of DNA
> sequence.  This compresses very well, since besides a little meta-info
> it's just a bunch of A's, C's, G's and T's.  I want to be able to read
> in these huge files and decompress them transparently on the fly.
> 
> Another example (and the one that brought the subject of these
> non-tarred gzips to my attention) is the svgz format.  This is an image
> format, and is literally just a gzipped SVG.  Uncompressed SVG is a
> ridiculously bloated format but compresses very well, so the SVG
> standard requires that gzipped SVG files "just work" transparently with
> any SVG-compliant program.  I recently added svgz support to plot2kill,
> and it was somewhat of a PITA because I had to find the C API buried in
> etc.c.zlib and then I got stuck using it instead of a nice D API.
>
> The bigger point, though, is that use cases for non-tarred single-file
> gzips do exist and they should be handled transparently via an interface
> identical to normal file I/O.
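Here is one way to get "the same API as uncompressed file I/O" for a single-file gzip. This is a minimal sketch in Python, not D, and it is not any existing Phobos API. It checks for the two gzip magic bytes `0x1f 0x8b` and returns an ordinary text-file object either way, decompressing lazily as lines are read. `open_text` is a hypothetical helper name. Detecting gzip by content rather than by file extension is an assumption, chosen because an .svgz file is just a gzipped SVG that is expected to "just work".

```python
import gzip

# Every gzip member starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def open_text(path, encoding="utf-8"):
    """Hypothetical helper: open `path` for text reading, decompressing
    transparently if the content is gzip, so callers see one interface."""
    with open(path, "rb") as probe:
        is_gzip = probe.read(2) == GZIP_MAGIC
    if is_gzip:
        # gzip.open in text mode inflates incrementally as lines are read,
        # so a huge file is never decompressed into memory all at once.
        return gzip.open(path, "rt", encoding=encoding)
    # Not gzip: hand back a plain text file with the same interface.
    return open(path, "r", encoding=encoding)
```

A caller that iterates `for line in open_text(path)` over a DNA sequence file does not need to know whether the file on disk was compressed.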

Although I agree this would be nice, I don't think std.stdio.File is the 
right place to put it.  I think a general streaming framework should be 
in place first, and File be made to work with it.  Then, working with a 
gzipped/bzipped file should be as simple as wrapping the raw File stream 
in a compression/decompression stream.
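The suggestion above amounts to layering streams. A raw byte stream comes from the file, an optional decompression stream wraps it, and text and line handling sit on top. The sketch below illustrates that layering in Python rather than D, because Python's standard library (`gzip.GzipFile`, `bz2.BZ2File`, `io.TextIOWrapper`) happens to be structured this way. `CODECS` and `open_stream` are hypothetical names used for illustration. They are not a proposal for the D API.

```python
import bz2
import gzip
import io
from contextlib import contextmanager

# Hypothetical codec table: each entry wraps a raw binary stream in a
# stream that decompresses lazily as it is read.
CODECS = {
    "none": lambda raw: raw,
    "gzip": lambda raw: gzip.GzipFile(fileobj=raw, mode="rb"),
    "bzip2": lambda raw: bz2.BZ2File(raw, mode="rb"),
}


@contextmanager
def open_stream(path, codec="none", encoding="utf-8"):
    """Hypothetical helper: compose raw file -> decompressor -> text."""
    with open(path, "rb") as raw:          # layer 1: raw bytes from disk
        inflated = CODECS[codec](raw)      # layer 2: optional decompression
        # Layer 3: decoding and line splitting, identical for every codec.
        text = io.TextIOWrapper(inflated, encoding=encoding)
        try:
            yield text
        finally:
            # Closes `inflated` too; GzipFile and BZ2File leave `raw` open
            # when given a file object, so the enclosing `with` closes it.
            text.close()
```

Under this design, supporting another compressed format means adding one codec entry. The text-reading layer that callers use stays the same.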

-Lars


More information about the Digitalmars-d mailing list