std.compress
Diggory
diggsey at googlemail.com
Wed Jun 5 10:56:03 PDT 2013
On Wednesday, 5 June 2013 at 17:36:05 UTC, Andrei Alexandrescu wrote:
> On 6/5/13 1:10 PM, Jonathan M Davis wrote:
>> On Wednesday, June 05, 2013 07:49:22 H. S. Teoh wrote:
>>> (Actually, std.algorithm currently sits at 11636 lines. I call
>>> BS on whoever claims to be able to "skim over" std.algorithm
>>> and "get a feel for how it works". Chances are your finger will
>>> get so tired of hitting PgDn about 2000 lines into the file
>>> that you won't even look at the rest. And most of the code is
>>> only superficially related -- about 20 functions into the file
>>> you'd have lost track of all sense of how things fit together
>>> -- 'cos they *don't* really fit together! It's the epitome of
>>> why we *should* move to smaller modules, rather than the
>>> current giant monolithic ones.)
>>
>> There's a significant difference between a 10,000+ line file
>> and a file that's less than 1000 lines. Walter's proposed
>> std.compress is less than 1000 lines long, and yet we're
>> talking about splitting it up. That's going to lead to
>> ridiculously small modules IMHO. There's a balance to be had
>> here that's somewhere in between the two extremes.
>
> Walter's algo is among the simplest around. I expect any of the
> industry-standard algos to be significantly larger, most likely
> each worth a module.
>
> I think it's worth discussing the interface much more than the
> approach to modularization.
>
> Walter's algo traffics in InputRange!ubyte and offers an
> InputRange!ubyte. That makes sense in some situations, but
> often trafficking one ubyte at a time may be not only
> inefficient but also the wrong granularity. Consider:
>
> auto data = File("input.txt").byChunk().compress();
>
> That won't work because byChunk deals in ubyte[], not ubyte.
> How do we fix this while keeping everybody efficient?
>
> I talked to Walter and during this work he figured a lot of
> things about how ranges work and how they generate code. Turns
> out that the range equivalent of a tight loop is slightly less
> efficient with dmd because a range must keep its state
> together, which is harder to enregister than a bunch of
> automatic variables.
>
> Right now we have joiner(), which given several ranges of T,
> offers a range of T. Developing along that idea, we should have
> two opposite functions: itemize() and collect().
>
> itemize() takes a range of ranges of T and offers a range of T.
> For example, given a range of T[], offers a range of T.
>
> collect() takes a range of T and offers a range of T[]. The
> number of items in each chunk can be a parameter.
>
> With that in tow, we can set things up such that compress() and
> expand() traffic in ranges of ranges of ubyte (or simply ranges
> of ubyte[]), which ensures work at maximum speed. Then the
> matter of adapting to and fro ranges of ubyte is a simple
> matter of chaining a call to itemize() or collect().
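To make the proposed adapters concrete, here is a hypothetical sketch of a collect() that works over any input range; itemize() going the other way is essentially what std.algorithm.joiner already provides for ranges of ranges. The name, signature, and chunking policy are illustrative, not actual Phobos or std.compress API.

```d
import std.range;

// Hypothetical sketch: collect() turns an input range of T into an
// input range of T[] chunks of at most `chunkSize` elements.
// Illustrative only -- not the actual proposed implementation.
auto collect(R)(R r, size_t chunkSize)
    if (isInputRange!R)
{
    alias T = ElementType!R;
    static struct Collect
    {
        R src;
        size_t n;
        T[] buf;
        @property bool empty() { return buf.length == 0; }
        @property T[] front() { return buf; }
        void popFront()
        {
            buf = null;
            foreach (i; 0 .. n)
            {
                if (src.empty) break;
                buf ~= src.front;
                src.popFront();
            }
        }
    }
    auto c = Collect(r, chunkSize);
    c.popFront();   // prime the first chunk
    return c;
}
```

With adapters like these, a caller holding a plain range of ubyte could write `bytes.collect(4096).compress()` against a ubyte[]-granular compress, and `compressed.joiner` (i.e. itemize) would recover a byte-at-a-time view where one is needed.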
>
> Destroy.
>
> I salute this code. It is concise, well engineered, well
> written, just as general as it needs, uses the right features
> in the right places, and does real work. A perfect example to
> follow. The only thing I'd add is this convenience function:
>
> void putBits(bool[] bits...) {
>     foreach (b; bits) putBit(b);
> }
>
> That reduces a bunch of code, and leaves room for future
> optimizations inside putBits.
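Since the excerpt does not show the surrounding bit-writer, here is a hypothetical minimal context in which such a putBits could live; putBit, the buffer layout, and the MSB-first packing are all assumptions for illustration, not std.compress's actual code.

```d
// Hypothetical minimal bit-writer; putBit and the MSB-first packing
// are assumptions for illustration only.
struct BitWriter
{
    ubyte[] data;   // completed bytes
    ubyte   cur;    // partially filled byte
    int     nbits;  // bits currently held in `cur`

    void putBit(bool b)
    {
        cur = cast(ubyte)((cur << 1) | (b ? 1 : 0));
        if (++nbits == 8) { data ~= cur; cur = 0; nbits = 0; }
    }

    // Andrei's proposed convenience wrapper:
    void putBits(bool[] bits...)
    {
        foreach (b; bits) putBit(b);
    }
}
```

A call such as `w.putBits(true, false, true)` then replaces three separate putBit calls, and any future buffering trick can live inside putBits.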
>
>
> Andrei
That makes a lot of sense about itemize/collect. Regarding the
efficiency of ranges: assuming the optimiser can inline the range
code, would it not turn into exactly the same code as a loop using
iterators? The length is needed in both cases, so I would have
thought a good optimiser would produce equivalent code for both.
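The two code shapes being compared can be illustrated like this (an assumed example, not code from std.compress): the range-based traversal keeps its state bundled in one struct, while the index loop uses independent locals that are easier to enregister.

```d
import std.range;   // gives slices empty/front/popFront

// Range style: `r`'s pointer and length travel together as one
// struct value that the backend must keep coherent.
int sumRange(int[] a)
{
    int s = 0;
    for (auto r = a; !r.empty; r.popFront())
        s += r.front;
    return s;
}

// Loop style: index and length are plain locals the backend can
// hold in registers independently.
int sumLoop(int[] a)
{
    int s = 0;
    foreach (i; 0 .. a.length)
        s += a[i];
    return s;
}
```

With full inlining the two should indeed collapse to the same code; the observation in the thread is that dmd at the time did not always manage that for the range form.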
Also, about "putBits": the problem I see with that is that DMD
currently seems to fail at optimising variadics. Even with inlining
and optimisation turned on, the generated code is very poor:
http://forum.dlang.org/thread/bug-10085-3@http.d.puremagic.com%2Fissues%2F
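One hypothetical way around the runtime-variadic codegen problem is a compile-time (template) variadic, which forwards each argument directly instead of materialising a bool[] at the call site. This is a sketch; it assumes some putBit(bool) exists in scope.

```d
// Hypothetical alternative to the bool[] runtime variadic: a
// template variadic. The foreach over `bits` is unrolled at compile
// time, so no bool[] is built at the call site and every putBit
// call is a direct call the inliner can see. `putBit` is assumed.
void putBits(Bits...)(Bits bits)
{
    foreach (b; bits)
        putBit(b);
}
```

The trade-off is one template instantiation per distinct combination of argument count and types.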
More information about the Digitalmars-d mailing list