std.compress

Marco Leise Marco.Leise at gmx.de
Wed Jun 5 19:56:32 PDT 2013


Am Wed, 05 Jun 2013 13:36:06 -0400
schrieb Andrei Alexandrescu <SeeWebsiteForEmail at erdani.org>:

> I think it's worth discussing the interface much more than the approach 
> to modularization.
> 
> Walter's algo traffics in InputRange!ubyte and offers an 
> InputRange!ubyte. That makes sense in some situations, but often 
> trafficking one ubyte at a time may be not only inefficient but also the 
> wrong granularity. Consider:
> 
> auto data = File("input.txt").byChunk().compress();

I realized that yesterday. Now that my circular buffer
implementation appears to be thread-safe enough *cough* to run
some tests, I can see that producing 64 KiB input blocks and
then marking them as read byte-by-byte causes massive overhead.

> That won't work because byChunk deals in ubyte[], not ubyte. How do we 
> fix this while keeping everybody efficient?

My first attempt will keep the byte-wise interface but work on
larger chunks inside the circular buffer, so that some of the
checks (most notably: after the consumer marks a byte of the
buffer as read, can the producer continue?) can be delayed
until e.g. 4 KiB have been processed.
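A minimal sketch of that idea (all names hypothetical, not my actual implementation): the consumer advances a cheap thread-local counter and only publishes it to the shared position the producer polls once a whole block has been consumed.

```d
import core.atomic;

// Hypothetical sketch: the consumer tracks its read position locally
// and only publishes it to the shared counter every PUBLISH_STEP
// bytes, so the producer's "can I continue?" check is answered per
// block instead of per byte.
struct CircularReader
{
    enum PUBLISH_STEP = 4096;

    shared(size_t)* sharedReadPos; // the producer polls this
    size_t localReadPos;           // thread-local, no synchronization

    void markRead(size_t n)
    {
        localReadPos += n;
        if (localReadPos - atomicLoad(*sharedReadPos) >= PUBLISH_STEP)
            atomicStore(*sharedReadPos, localReadPos);
    }

    void flush() // publish whatever is still unpublished
    {
        atomicStore(*sharedReadPos, localReadPos);
    }
}
```

The point is only that the atomic store happens once per 4 KiB rather than once per byte; a real buffer would of course also handle wrap-around.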

> I talked to Walter and during this work he figured a lot of things about 
> how ranges work and how they generate code. Turns out that the range 
> equivalent of a tight loop is slightly less efficient with dmd because a 
> range must keep its state together, which is harder to enregister than a 
> bunch of automatic variables.

Then again, for performance-critical applications dmd has long
lost touch with GCC and LLVM, so while this is of great interest
to its author and occasionally in "D is slow" threads, there are
good alternatives in the GCC- and LLVM-based compilers. (I think
LLVM is a real enabler in the current explosion of programming
languages.)
 
> Right now we have joiner(), which given several ranges of T, offers a 
> range of T. Developing along that idea, we should have two opposite 
> functions: itemize() and collect().
> 
> itemize() takes a range of ranges of T and offers a range of T. For 
> example, given a range of T[], offers a range of T.
> 
> collect() takes a range of T and offers a range of T[]. The number of 
> items in each chunk can be a parameter.
> 
> With that in tow, we can set things up such that compress() and expand() 
> traffic in ranges of ranges of ubyte (or simply ranges of ubyte[]), 
> which ensures work at maximum speed. Then the matter of adapting to and 
> fro ranges of ubyte is a simple matter of chaining a call to itemize() 
> or collect().
> 
> Destroy.

That's neat from a usability point of view. Does this mean
collect() will always return a range of built-in arrays,
since it uses arrays as the internal buffer?
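For what it's worth, a naive collect() along the proposed lines could look like this (a sketch of my reading of the proposal, not an actual implementation; std.range.chunks already does something similar for sliceable ranges). Whether it hands out a fresh array per chunk or recycles one internal buffer is exactly the question above:

```d
import std.range;

// Naive sketch of collect(): wraps a range of T as a range of T[]
// with at most chunkSize elements per chunk. This version allocates
// a fresh array per chunk, so callers may safely keep old chunks.
auto collect(R)(R r, size_t chunkSize) if (isInputRange!R)
{
    alias T = ElementType!R;
    static struct Collector
    {
        R src;
        size_t n;
        T[] buf;

        this(R src, size_t n)
        {
            this.src = src;
            this.n = n;
            fill();
        }

        void fill()
        {
            buf = null; // new allocation, don't overwrite old chunks
            while (buf.length < n && !src.empty)
            {
                buf ~= src.front;
                src.popFront();
            }
        }

        @property bool empty() const { return buf.length == 0; }
        @property T[] front() { return buf; }
        void popFront() { fill(); }
    }
    return Collector(r, chunkSize);
}
```

E.g. `iota(10).collect(4)` yields the chunks `[0,1,2,3]`, `[4,5,6,7]`, `[8,9]`.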

> I salute this code. It is concise, well engineered, well written, just 
> as general as it needs, uses the right features in the right places, and 
> does real work. A perfect example to follow. The only thing I'd add is 
> this convenience function:
> 
> void putBits(bool[] bits...) {
>      foreach (b; bits) putBit(b);
> }
> 
> That reduces a bunch of code, and leaves room for future optimizations 
> inside putBits.
> 
> 
> Andrei

Putting single bits into the buffer is important
functionality, but I'm not convinced by this solution. Real
code has the bits in integer variables already and could use:

buffer.putBits!3(0b111);
 - or -
buffer.putBits(3, 0b111);

Either form is more efficient than a variadic list of function
parameters or a bool[]. In many cases the bits can be ORed onto
the current write position directly.
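A sketch of what such an interface could look like (the writer type and its fields are hypothetical; the per-bit loop is for clarity only, a real implementation would OR whole words at once):

```d
// Hypothetical bit writer illustrating the proposed interface:
// the n low-order bits of `value` are written LSB-first, as in
// DEFLATE's bit packing.
struct BitWriter
{
    ubyte[] buf;
    size_t bitPos; // absolute bit position in buf

    // runtime bit count
    void putBits(uint n, uint value)
    {
        foreach (i; 0 .. n)
        {
            if (bitPos / 8 >= buf.length) buf ~= 0;
            if ((value >> i) & 1)
                buf[bitPos / 8] |= cast(ubyte)(1 << (bitPos % 8));
            ++bitPos;
        }
    }

    // compile-time bit count, free to specialize or unroll
    void putBits(uint n)(uint value) { putBits(n, value); }
}
```

Usage: `w.putBits!3(0b111);` is equivalent to `w.putBits(3, 0b111);`, but the compile-time count gives the optimizer a constant to work with.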

One other feature worth adding might be the option to reverse
the bit order before putting bits into, or after pulling them
out of, the buffer. That is needed at least once in gzip code
I've seen.
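That reversal is a small helper; DEFLATE, for instance, packs Huffman codes into the stream starting from the code's most significant bit, while everything else is LSB-first:

```d
// Reverse the low n bits of `code`, e.g. for DEFLATE Huffman
// codes, which are packed into the stream MSB-first while the
// rest of the bit stream is LSB-first.
uint reverseBits(uint code, uint n)
{
    uint result = 0;
    foreach (i; 0 .. n)
    {
        result = (result << 1) | (code & 1);
        code >>= 1;
    }
    return result;
}
```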

-- 
Marco


