Help optimizing UnCompress for gzipped files

Steven Schveighoffer schveiguy at yahoo.com
Sun Jan 7 13:44:07 UTC 2018


On 1/6/18 11:14 AM, Christian Köstlin wrote:
> On 05.01.18 23:04, Steven Schveighoffer wrote:
>> One thing to try: preallocate the ENTIRE buffer. This only works if
>> you know how many bytes it will decompress to (not always possible), but
>> it will take the allocator out of the equation completely. And it's
>> probably going to be the most efficient method (you aren't leaving
>> behind smaller unused blocks when you realloc). If for some reason we
>> can't beat/tie the C version doing that, then something else is going on.
> Yes ... this is something I forgot to try out ... will do now :)
> Mhh ... interesting numbers: C is even faster, and my D low-level
> solution is also a little bit faster, but still much slower than the
> no-copy version (funnily, "no copy" is the wrong name; it just
> overwrites all the data in a small buffer).

Not from what I'm reading: the C solution is about the same (257 vs. 
261). I'm not sure whether you averaged those numbers over multiple 
runs, especially on a real computer that might be doing other things; 
a single measurement can be noisy.

Note: I would expect it to be a tiny bit faster, but not monumentally 
so. From my testing, the reallocating version only has to reallocate a 
large quantity of data once.
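
For reference, here is a minimal sketch of the growth strategy a 
reallocating version typically uses (GrowBuf/append are hypothetical 
names, not your actual code). Doubling means a large block has to move 
only O(log n) times, usually just once near the final size:

struct GrowBuf
{
    ubyte[] buffer;
    size_t used;

    // append a decompressed piece, growing geometrically as needed
    void append(const(ubyte)[] piece)
    {
        if (buffer.length == 0)
            buffer = new ubyte[](64 * 1024);
        while (used + piece.length > buffer.length)
            buffer.length = buffer.length * 2; // may move the whole block
        buffer[used .. used + piece.length] = piece[];
        used += piece.length;
    }
}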

However, the D solution should be much faster. Part of the issue is that 
you still aren't low-level enough :)

Instead of allocating the ubyte array with this line:

ubyte[] buffer = new ubyte[200*1024*1024];

Try this instead:

// from std.array
auto buffer = uninitializedArray!(ubyte[])(200*1024*1024);

The difference is that the first one has the runtime 0-initialize all 
200 MB before you ever touch it; uninitializedArray skips that step, 
which matters when the decompressor is about to overwrite the whole 
buffer anyway.
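
Putting it together, here is a minimal sketch of the preallocation 
approach using etc.c.zlib directly, so inflate writes straight into 
the buffer with no intermediate copies (the file name and the 200 MB 
size are placeholders, and it assumes the output actually fits):

import etc.c.zlib;
import std.array : uninitializedArray;
import std.stdio : File;

void main()
{
    // assumption: 200 MB is at least the decompressed size
    auto buffer = uninitializedArray!(ubyte[])(200*1024*1024);

    z_stream zs;
    // 15 + 16 tells zlib to expect a gzip wrapper
    auto rc = inflateInit2(&zs, 15 + 16);
    assert(rc == Z_OK);
    scope (exit) inflateEnd(&zs);

    zs.next_out = buffer.ptr;
    zs.avail_out = cast(uint) buffer.length;

    foreach (ubyte[] chunk; File("data.json.gz").byChunk(64 * 1024))
    {
        zs.next_in = chunk.ptr;
        zs.avail_in = cast(uint) chunk.length;
        rc = inflate(&zs, Z_NO_FLUSH);
        if (rc == Z_STREAM_END)
            break;
        assert(rc == Z_OK);
    }

    auto data = buffer[0 .. zs.total_out]; // decompressed, no zeroing
}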

> One question about iopipe: is it possible to transform the elements
> in the pipe as well ... e.g. away from a buffer of bytes to JSON
> objects?

Yes! I am working on doing just that, but haven't had a chance to update 
the toy project I wrote: https://github.com/schveiguy/jsoniopipe

I was actually planning on having an iopipe of JsonItem, which would 
work just like a normal buffer, but reference the ubyte buffer 
underneath.

Eventually, the final product should have a range of JsonValue, which 
you would recurse into in order to parse its children. All of it will be 
lazy, and stream-based, so you don't have to load the whole file if it's 
huge.

Note, you can't have an iopipe of JsonValue, because JSON is a 
recursive format. JsonItems are just individual well-defined tokens, 
so they can form a linear stream.
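
To illustrate why the token level can stay linear (these names are 
hypothetical, not the actual jsoniopipe types): each token just tags a 
kind plus a window into the underlying buffer, so a flat stream of 
them can describe arbitrarily nested JSON without copying anything.

// Hypothetical illustration, not the real jsoniopipe API.
enum JsonTokenKind
{
    objectStart, objectEnd, arrayStart, arrayEnd,
    key, str, number, boolean, nil
}

struct JsonItem
{
    JsonTokenKind kind;
    const(ubyte)[] window; // slice of the underlying buffer, zero-copy
}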

-Steve

