Help optimizing UnCompress for gzipped files

Fri Jan 5 14:39:19 UTC 2018

On 1/5/18 1:01 AM, Christian Köstlin wrote:
> On 04.01.18 20:46, Steven Schveighoffer wrote:
>> On 1/4/18 1:57 PM, Christian Köstlin wrote:
>>> Thanks Steve,
>>> this runs now faster, I will update the table.
>>
>> Still a bit irked that I can't match the C speed :/
>>
>> But, I can't get your C speed to duplicate on my mac even with gcc, so
>> I'm not sure where to start. I find it interesting that you are not
>> using any optimization flags for gcc.
> I guess, the code in my program is small enough that the optimize flags
> do not matter... most of the stuff is pulled from libz? Which is
> dynamically linked against /usr/lib/libz.1.dylib.

Yeah, I guess most of the bottlenecks are inside libz, or the memory 
allocator. There isn't much optimization to be done in the main program 
itself.

> I also cannot understand what I should do more (will try realloc with
> Mallocator) for the dlang-low-level variant to get to the c speed.

D compiles to native code just as C does. So theoretically you should be 
able to get the same performance with a straight port of your C code. 
It's worth a shot.

> rust is doing quite well there

I'll say a few words of caution here:

1. Almost all of these tests use the same C library to unzip. So it's 
really not a test of decompression performance, but of memory 
management performance. And it appears that any test using 
malloc/realloc is in a different tier, presumably because of the lack 
of copies (as discussed earlier).
2. Your Rust test (I think, I'm not sure) is testing 2 things in the 
same run, which could potentially have dramatic consequences for the 
second test. For instance, the process could already have all the 
required memory blocks ready, so the allocation strategy suddenly gets 
better. Or maybe there is some kind of caching of the input being done. 
I think you would have a fairer test of the second option by running it 
in a separate program. I've never used Rust, so I don't know exactly 
what your code is doing.
3. It's hard to make a decision based on such microbenchmarks as to 
which solution is "better" in an actual real-world program, especially 
when the state/usage of the memory allocator plays a huge role in this.
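
To make point 1 a bit more concrete, here is a rough sketch in C of the 
kind of realloc-based output buffer I have in mind. This is not the code 
from any of the benchmarked programs: OutBuf, outbuf_reserve, 
outbuf_append and outbuf_free are names made up for illustration, and 
the 4096-byte starting size and doubling policy are assumptions. The 
idea is only that realloc may be able to extend the block in place, 
whereas an allocate-new-and-copy scheme always pays for the copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable output buffer (illustrative names only). */
typedef struct {
    unsigned char *data; /* start of the block, NULL when empty */
    size_t len;          /* bytes currently in use */
    size_t cap;          /* bytes allocated */
} OutBuf;

/* Make room for `extra` more bytes. Returns 0 on success, -1 on failure.
 * realloc is free to grow the block in place, so existing data is only
 * copied when the allocator has to move the block. */
static int outbuf_reserve(OutBuf *b, size_t extra)
{
    assert(b != NULL);
    if (extra > (size_t)-1 - b->len)
        return -1;                       /* len + extra would overflow */
    size_t need = b->len + extra;
    if (need <= b->cap)
        return 0;                        /* already enough room */

    size_t cap = b->cap ? b->cap : 4096; /* assumed starting size */
    while (cap < need) {
        if (cap > (size_t)-1 / 2) {      /* doubling would overflow */
            cap = need;
            break;
        }
        cap *= 2;                        /* assumed growth policy */
    }

    unsigned char *p = (unsigned char *)realloc(b->data, cap);
    if (p == NULL)
        return -1;                       /* old block is still valid */
    b->data = p;
    b->cap = cap;
    return 0;
}

/* Append n bytes from src to the end of the buffer. */
static int outbuf_append(OutBuf *b, const void *src, size_t n)
{
    if (n == 0)
        return 0;
    if (outbuf_reserve(b, n) != 0)
        return -1;
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}

/* Release the block and reset the buffer to the empty state. */
static void outbuf_free(OutBuf *b)
{
    free(b->data);
    b->data = NULL;
    b->len = 0;
    b->cap = 0;
}
```

In a real decompression loop you would probably skip outbuf_append 
entirely: call outbuf_reserve, point zlib's next_out at data + len with 
avail_out set to the spare capacity, and bump len by however much 
inflate wrote. That way the decompressed bytes land directly in their 
final place.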

-Steve