Just a heads-up on LZ4. I have spent roughly 3 hours optimizing my decompressor. While I had stunning success, a speed-up of about 400%, I am still about 600x slower than the C variant. It is still a mystery to me why that is :) since the generated code is both smaller and works almost without spills.