std.compress
Diggory
diggsey at googlemail.com
Wed Jun 5 10:56:03 PDT 2013
On Wednesday, 5 June 2013 at 17:36:05 UTC, Andrei Alexandrescu wrote:
> On 6/5/13 1:10 PM, Jonathan M Davis wrote:
>> On Wednesday, June 05, 2013 07:49:22 H. S. Teoh wrote:
>>> (Actually, std.algorithm currently sits at 11636 lines. I call
>>> BS on whoever claims to be able to "skim over" std.algorithm
>>> and "get a feel for how it works". Chances are your finger will
>>> get so tired of hitting PgDn about 2000 lines into the file
>>> that you won't even look at the rest. And most of the code is
>>> only superficially related -- about 20 functions into the file
>>> you'd have lost track of all sense of how things fit together
>>> -- 'cos they *don't* really fit together! It's the epitome of
>>> why we *should* move to smaller modules, rather than the
>>> current giant monolithic ones.)
>>
>> There's a significant difference between a 10,000+ line file
>> and a file that's less than 1000 lines. Walter's proposed
>> std.compress is less than 1000 lines long, and yet we're
>> talking about splitting it up. That's going to lead to
>> ridiculously small modules IMHO. There's a balance to be had
>> here that's somewhere in between the two extremes.
>
> Walter's algo is among the simplest around. I expect any of the
> industry-standard algos to be significantly larger, most likely
> each worth a module.
>
> I think it's worth discussing the interface much more than the
> approach to modularization.
>
> Walter's algo traffics in InputRange!ubyte and offers an
> InputRange!ubyte. That makes sense in some situations, but
> often trafficking one ubyte at a time may be not only
> inefficient but also the wrong granularity. Consider:
>
> auto data = File("input.txt").byChunk().compress();
>
> That won't work because byChunk deals in ubyte[], not ubyte.
> How do we fix this while keeping everybody efficient?
>
> I talked to Walter and during this work he figured a lot of
> things about how ranges work and how they generate code. Turns
> out that the range equivalent of a tight loop is slightly less
> efficient with dmd because a range must keep its state
> together, which is harder to enregister than a bunch of
> automatic variables.
>
> Right now we have joiner(), which given several ranges of T,
> offers a range of T. Developing along that idea, we should have
> two opposite functions: itemize() and collect().
>
> itemize() takes a range of ranges of T and offers a range of T.
> For example, given a range of T[], offers a range of T.
>
> collect() takes a range of T and offers a range of T[]. The
> number of items in each chunk can be a parameter.
>
> With that in tow, we can set things up such that compress() and
> expand() traffic in ranges of ranges of ubyte (or simply ranges
> of ubyte[]), which ensures work at maximum speed. Then the
> matter of adapting to and fro ranges of ubyte is a simple
> matter of chaining a call to itemize() or collect().
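To make the proposed adapters concrete, here is a hypothetical sketch of a collect() that works over any input range; itemize() going the other way is essentially what std.algorithm.joiner already provides for ranges of ranges. The name, signature, and chunking policy are illustrative, not actual Phobos or std.compress API.

```d
import std.range;

// Hypothetical sketch: collect() turns an input range of T into an
// input range of T[] chunks of at most `chunkSize` elements.
// Illustrative only -- not the actual proposed implementation.
auto collect(R)(R r, size_t chunkSize)
    if (isInputRange!R)
{
    alias T = ElementType!R;
    static struct Collect
    {
        R src;
        size_t n;
        T[] buf;
        @property bool empty() { return buf.length == 0; }
        @property T[] front() { return buf; }
        void popFront()
        {
            buf = null;
            foreach (i; 0 .. n)
            {
                if (src.empty) break;
                buf ~= src.front;
                src.popFront();
            }
        }
    }
    auto c = Collect(r, chunkSize);
    c.popFront();   // prime the first chunk
    return c;
}
```

With adapters like these, a caller holding a plain range of ubyte could write `bytes.collect(4096).compress()` against a ubyte[]-granular compress, and `compressed.joiner` (i.e. itemize) would recover a byte-at-a-time view where one is needed.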
>
> Destroy.
>
> I salute this code. It is concise, well engineered, well
> written, just as general as it needs, uses the right features
> in the right places, and does real work. A perfect example to
> follow. The only thing I'd add is this convenience function:
>
> void putBits(bool[] bits...) {
>     foreach (b; bits) putBit(b);
> }
>
> That reduces a bunch of code, and leaves room for future
> optimizations inside putBits.
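Since the excerpt does not show the surrounding bit-writer, here is a hypothetical minimal context in which such a putBits could live; putBit, the buffer layout, and the MSB-first packing are all assumptions for illustration, not std.compress's actual code.

```d
// Hypothetical minimal bit-writer; putBit and the MSB-first packing
// are assumptions for illustration only.
struct BitWriter
{
    ubyte[] data;   // completed bytes
    ubyte   cur;    // partially filled byte
    int     nbits;  // bits currently held in `cur`

    void putBit(bool b)
    {
        cur = cast(ubyte)((cur << 1) | (b ? 1 : 0));
        if (++nbits == 8) { data ~= cur; cur = 0; nbits = 0; }
    }

    // Andrei's proposed convenience wrapper:
    void putBits(bool[] bits...)
    {
        foreach (b; bits) putBit(b);
    }
}
```

A call such as `w.putBits(true, false, true)` then replaces three separate putBit calls, and any future buffering trick can live inside putBits.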
>
>
> Andrei
That makes a lot of sense about itemize/collect. Regarding the
efficiency of ranges: assuming the optimiser can inline the range
code, would it not turn into exactly the same code as a loop using
iterators? The length is needed in both cases, so I would have
thought a good optimiser would produce equivalent code for both.
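The two code shapes being compared can be illustrated like this (an assumed example, not code from std.compress): the range-based traversal keeps its state bundled in one struct, while the index loop uses independent locals that are easier to enregister.

```d
import std.range;   // gives slices empty/front/popFront

// Range style: `r`'s pointer and length travel together as one
// struct value that the backend must keep coherent.
int sumRange(int[] a)
{
    int s = 0;
    for (auto r = a; !r.empty; r.popFront())
        s += r.front;
    return s;
}

// Loop style: index and length are plain locals the backend can
// hold in registers independently.
int sumLoop(int[] a)
{
    int s = 0;
    foreach (i; 0 .. a.length)
        s += a[i];
    return s;
}
```

With full inlining the two should indeed collapse to the same code; the observation in the thread is that dmd at the time did not always manage that for the range form.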
Also, about "putBits": the problem I see with that is that DMD
currently seems to fail at optimising variadics. Even with inlining
and optimisation turned on, the generated code is very poor:
http://forum.dlang.org/thread/bug-10085-3@http.d.puremagic.com%2Fissues%2F
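One hypothetical way around the runtime-variadic codegen problem is a compile-time (template) variadic, which forwards each argument directly instead of materialising a bool[] at the call site. This is a sketch; it assumes some putBit(bool) exists in scope.

```d
// Hypothetical alternative to the bool[] runtime variadic: a
// template variadic. The foreach over `bits` is unrolled at compile
// time, so no bool[] is built at the call site and every putBit
// call is a direct call the inliner can see. `putBit` is assumed.
void putBits(Bits...)(Bits bits)
{
    foreach (b; bits)
        putBit(b);
}
```

The trade-off is one template instantiation per distinct combination of argument count and types.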
More information about the Digitalmars-d mailing list