GZip File Reading
dsimcha
dsimcha at yahoo.com
Thu Mar 10 06:20:34 PST 2011
On 3/10/2011 5:57 AM, Lars T. Kyllingstad wrote:
> Nope, a gzip or bzip2 file only contains a single file. To zip several
> files, you first make a tar archive, and then you run gzip or bzip2 on
> it. That's why most compressed archives targeted at the Linux platform
> have extensions like .tar.gz, .tar.bz2, and so on.
>
> -Lars
This is **exactly** my point. These single-file gzip and bzip2 files
should be usable with exactly the same API as uncompressed file I/O. My
personal use case is files containing large amounts of DNA sequence.
These compress very well, since apart from a little metadata they're
just long runs of A's, C's, G's and T's. I want to be able to read
these huge files and decompress them transparently on the fly.
Another example (and the one that brought the subject of these
non-tarred gzips to my attention) is the svgz format. This is an image
format, and is literally just a gzipped SVG. Uncompressed SVG is a
ridiculously bloated format but compresses very well, so the SVG
standard requires that gzipped SVG files "just work" transparently with
any SVG-compliant program. I recently added svgz support to plot2kill,
and it was something of a PITA: I had to dig the C API out of
etc.c.zlib, and then I was stuck using it instead of a nice D API.
The bigger point, though, is that use cases for non-tarred single-file
gzips do exist and they should be handled transparently via an interface
identical to normal file I/O.
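To make "identical to normal file I/O" concrete, here is a minimal sketch in Python rather than D, so it is not a proposal for any Phobos API. It assumes that gzip input can be recognized by its two magic bytes 0x1f 0x8b (RFC 1952) rather than by the file extension. The helpers `open_text` and `count_bases` are hypothetical illustrations. Callers get back an ordinary text stream whether or not the file is compressed, and bzip2 is not handled here.

```python
import gzip
from typing import IO, Dict

# First two bytes of every gzip member (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def open_text(path: str, encoding: str = "utf-8") -> IO[str]:
    """Hypothetical helper: open `path` for text reading, decompressing
    on the fly if the content is gzip, regardless of the file name."""
    with open(path, "rb") as probe:
        magic = probe.read(2)
    if magic == GZIP_MAGIC:
        # gzip.open in text mode returns a normal text stream that
        # decompresses lazily as it is read.
        return gzip.open(path, "rt", encoding=encoding)
    return open(path, "r", encoding=encoding)


def count_bases(path: str) -> Dict[str, int]:
    """Hypothetical example consumer: count A/C/G/T in a FASTA-style file.
    It never needs to know whether the file was compressed."""
    counts = {base: 0 for base in "ACGT"}
    with open_text(path) as f:
        for line in f:
            if line.startswith(">"):  # header line carrying metadata
                continue
            for ch in line.strip().upper():
                if ch in counts:
                    counts[ch] += 1
    return counts
```

Sniffing the content instead of trusting the extension is what lets an .svgz, a .fa.gz, or a renamed file "just work" through the same call.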
More information about the Digitalmars-d mailing list