GZip File Reading

dsimcha dsimcha at yahoo.com
Thu Mar 10 06:20:34 PST 2011


On 3/10/2011 5:57 AM, Lars T. Kyllingstad wrote:
> Nope, a gzip or bzip2 file only contains a single file.  To zip several
> files, you first make a tar archive, and then you run gzip or bzip2 on
> it.  That's why most compressed archives targeted at the Linux platform
> have extensions like .tar.gz, .tar.bz2, and so on.
>
> -Lars

This is **exactly** my point.  These single-file gzip and bzip2 files 
should be usable with exactly the same API as uncompressed file I/O.  My 
personal use case is files containing large amounts of DNA sequence.  
Such data compresses very well, since apart from a little meta-info 
it's just a bunch of A's, C's, G's and T's.  I want to be able to read 
these huge files and decompress them transparently on the fly.
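
To make "the same API" concrete, here is a rough sketch in Python rather 
than D, only because Python's standard gzip.open already hands back an 
ordinary file-like object.  The count_bases helper, the file names and 
the FASTA-style ">" header line are all made up for illustration; this 
is the shape of interface I have in mind, not a proposed Phobos design.

```python
import gzip
import os
import tempfile
from collections import Counter

def count_bases(lines):
    """Count A/C/G/T over an iterable of text lines (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        if line.startswith(">"):   # assumed FASTA-style meta-info line
            continue
        counts.update(ch for ch in line.strip() if ch in "ACGT")
    return counts

# Write the same made-up sequence data once plain and once gzipped.
data = ">seq1 example\nACGTACGT\nGGCC\n"
tmpdir = tempfile.mkdtemp()
plain_path = os.path.join(tmpdir, "reads.fa")
gz_path = os.path.join(tmpdir, "reads.fa.gz")
with open(plain_path, "w") as f:
    f.write(data)
with gzip.open(gz_path, "wt") as f:
    f.write(data)

# The consuming code is identical; only the call that opens the file differs.
with open(plain_path, "r") as f:
    plain_counts = count_bases(f)
with gzip.open(gz_path, "rt") as f:   # decompressed incrementally while iterating
    gz_counts = count_bases(f)
```

The consumer never needs to know whether the bytes on disk were 
compressed, which is the whole point.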

Another example (and the one that brought the subject of these 
non-tarred gzips to my attention) is the svgz format.  This is an image 
format, and is literally just a gzipped SVG.  Uncompressed SVG is a 
ridiculously bloated format but compresses very well, so the SVG 
standard requires that gzipped SVG files "just work" transparently with 
any SVG-compliant program.  I recently added svgz support to plot2kill, 
and it was somewhat of a PITA because I had to find the C API buried in 
etc.c.zlib and then I got stuck using it instead of a nice D API.
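
For the svg/svgz case, "transparent" could be as little as checking the 
two gzip magic bytes (0x1f 0x8b) before choosing how to open the file.  
Again a sketch in Python, not D; open_text and the file names are 
hypothetical, and this is not how plot2kill actually does it.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"   # first two bytes of every gzip stream (RFC 1952)

def open_text(path, encoding="utf-8"):
    """Hypothetical helper: open path for text reading, gunzipping if needed."""
    with open(path, "rb") as probe:
        is_gzip = probe.read(2) == GZIP_MAGIC
    if is_gzip:
        return gzip.open(path, "rt", encoding=encoding)
    return open(path, "r", encoding=encoding)

# Write the same tiny SVG document as .svg and as .svgz.
svg = '<svg xmlns="http://www.w3.org/2000/svg"/>\n'
tmpdir = tempfile.mkdtemp()
svg_path = os.path.join(tmpdir, "plot.svg")
svgz_path = os.path.join(tmpdir, "plot.svgz")
with open(svg_path, "w", encoding="utf-8") as f:
    f.write(svg)
with gzip.open(svgz_path, "wt", encoding="utf-8") as f:
    f.write(svg)

# The caller uses one function for both and gets the same text back.
with open_text(svg_path) as f:
    from_svg = f.read()
with open_text(svgz_path) as f:
    from_svgz = f.read()
```

Sniffing the content rather than trusting the extension means a 
misnamed file still opens correctly.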

The bigger point, though, is that use cases for non-tarred single-file 
gzips do exist and they should be handled transparently via an interface 
identical to normal file I/O.

