Help optimizing UnCompress for gzipped files

Steven Schveighoffer schveiguy at yahoo.com
Tue Jan 2 20:13:00 UTC 2018


On 1/2/18 1:01 PM, Christian Köstlin wrote:
> On 02.01.18 15:09, Steven Schveighoffer wrote:
>> On 1/2/18 8:57 AM, Adam D. Ruppe wrote:
>>> On Tuesday, 2 January 2018 at 11:22:06 UTC, Stefan Koch wrote:
>>>> You can make it much faster by using a sliced static array as buffer.
>>>
>>> Only if you want data corruption! It keeps a copy of your pointer
>>> internally: https://github.com/dlang/phobos/blob/master/std/zlib.d#L605
>>>
>>> It also will always overallocate new buffers on each call
>>> <https://github.com/dlang/phobos/blob/master/std/zlib.d#L602>
>>>
>>> There is no efficient way to use it. The implementation is substandard
>>> because the API limits the design.
>>
>> iopipe handles this quite well. And deals with the buffers properly
>> (yes, it is very tricky. You have to ref-count the zstream structure,
>> because it keeps internal pointers to *itself* as well!). And no, iopipe
>> doesn't use std.zlib, I use the etc.zlib functions (but I poached some
>> ideas from std.zlib when writing it).
>>
>> https://github.com/schveiguy/iopipe/blob/master/source/iopipe/zip.d
>>
>> I even wrote a json parser for iopipe. But it's far from complete. And
>> probably needs updating since I changed some of the iopipe API.
>>
>> https://github.com/schveiguy/jsoniopipe
>>
>> Depending on the use case, it might be enough, and should be very fast.
>>
> Thanks Steve for this proposal (actually I already had an iopipe version
> on my harddisk that I applied to this problem) Its more or less your
> unzip example + putting the data to an appender (I hope this is how it
> should be done, to get the data to RAM).

Well, you don't need an appender for that (using one copies a lot of the
data an extra time). All you need to do is extend the pipe until there is
no more new data, and it will all be in the buffer.

// almost the same line from your current version
auto mypipe = openDev("../out/nist/2011.json.gz")
                   .bufd.unzip(CompressionFormat.gzip);

// This line here will work with the current release (0.0.2):
while(mypipe.extend(0) != 0) {}

// But I have a fix for a bug that hasn't been released yet; this would
// work if you use iopipe-master:
mypipe.ensureElems();

// getting the data is as simple as looking at the buffer.
auto data = mypipe.window; // ubyte[] of the data
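
For comparison, here is a minimal sketch of the same "keep growing one buffer until the stream ends" idea without iopipe, written directly against the etc.c.zlib bindings. It is not how iopipe is implemented. The helper name `gunzipAll` is hypothetical, and the sketch assumes the whole compressed input is already in memory, that input and output each stay under 4 GiB (zlib's counters are `uint`), and that the file holds a single gzip member.

```d
import etc.c.zlib;
import std.exception : enforce;

/// Hypothetical helper (not part of iopipe or Phobos): inflate a gzip
/// stream into a single buffer that grows as needed.
ubyte[] gunzipAll(const(ubyte)[] input)
{
    z_stream zs; // lives on the stack and is never moved or copied

    // 15 = largest window size; adding 16 tells zlib to expect gzip framing.
    enforce(inflateInit2(&zs, 15 + 16) == Z_OK, "inflateInit2 failed");
    scope(exit) inflateEnd(&zs);

    zs.next_in = cast(ubyte*) input.ptr;
    zs.avail_in = cast(uint) input.length; // assumes input < 4 GiB

    auto buf = new ubyte[](64 * 1024);
    size_t used = 0;
    for (;;)
    {
        // Grow the one buffer when it is full. The output pointer is
        // re-derived below, so a reallocation here is harmless.
        if (used == buf.length)
            buf.length *= 2;

        zs.next_out = buf.ptr + used;
        zs.avail_out = cast(uint) (buf.length - used); // assumes < 4 GiB

        immutable rc = inflate(&zs, Z_NO_FLUSH);
        used = buf.length - zs.avail_out;

        if (rc == Z_STREAM_END)
            break; // end of the gzip member: everything is in buf
        enforce(rc == Z_OK, "inflate failed (corrupt or truncated input?)");
    }
    return buf[0 .. used];
}
```

It could be called as `auto data = gunzipAll(cast(const(ubyte)[]) std.file.read("../out/nist/2011.json.gz"));`. The decompressed bytes are written straight into their final buffer, so the only copying left is whatever the runtime does when the buffer has to grow.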

> iopipe is already better than the normal dlang version, almost like
> java, but still far from the solution. I updated
> https://github.com/gizmomogwai/benchmarks/tree/master/gunzip
> 
> I will give the direct gunzip calls a try ...
> 
> In terms of json parsing, I had really nice results with the fast.json
> pull parser, but its comparing a little bit apples with oranges, because
> I did not pull out all the data there.

Yeah, with jsoniopipe being very raw, I wouldn't be sure it is usable
in your case. The end goal is to have something fast, but very easy to
construct. I wasn't planning to focus on speed (yet), as other
libraries do, but on ease of writing code that uses it.

-Steve

