std.compress

Brad Anderson eco at gnuk.net
Wed Jun 5 11:11:17 PDT 2013


On Wednesday, 5 June 2013 at 17:36:05 UTC, Andrei Alexandrescu 
wrote:
> Walter's algo traffics in InputRange!ubyte and offers an 
> InputRange!ubyte. That makes sense in some situations, but 
> often trafficking one ubyte at a time may be not only 
> inefficient but also the wrong granularity. Consider:
>
> auto data = File("input.txt").byChunk(4096).compress();
>
> That won't work because byChunk deals in ubyte[], not ubyte. 
> How do we fix this while keeping everybody efficient?
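
For reference, the element-type mismatch is easy to check (byChunk 
requires a chunk size, 4096 here is arbitrary, and the file is 
assumed to exist):

import std.range : ElementType;
import std.stdio : File;

void main()
{
    auto chunks = File("input.txt").byChunk(4096);
    // byChunk hands out buffers, not bytes: its element type is
    // ubyte[], so an API built around InputRange!ubyte can't
    // consume it directly.
    static assert(is(ElementType!(typeof(chunks)) == ubyte[]));
}
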
>
> I talked to Walter, and during this work he figured out a lot 
> of things about how ranges work and how they generate code. Turns 
> out that the range equivalent of a tight loop is slightly less 
> efficient with dmd because a range must keep its state 
> together, which is harder to enregister than a bunch of 
> automatic variables.
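
To make the enregistering point concrete, this is roughly the 
contrast as I understand it (my sketch, not Walter's code): the 
tight loop keeps its cursor in plain locals, while the range 
version carries the same state bundled inside the range value.

import std.range : empty, front, popFront;

// Tight loop: the cursor and limit are plain automatic variables,
// easy for the backend to keep in registers.
uint sumLoop(const(ubyte)[] data)
{
    uint s = 0;
    foreach (b; data)
        s += b;
    return s;
}

// Same work through the range primitives: the iteration state
// lives inside the range itself, which is what the paragraph
// above says dmd has a harder time enregistering.
uint sumRange(R)(R r)
{
    uint s = 0;
    for (; !r.empty; r.popFront())
        s += r.front;
    return s;
}

On a const(ubyte)[] both do the same work; the difference is only 
in how the iteration state is represented.
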
>
> Right now we have joiner(), which given several ranges of T, 
> offers a range of T. Developing along that idea, we should have 
> two opposite functions: itemize() and collect().
>
> itemize() takes a range of ranges of T and offers a range of T. 
> For example, given a range of T[], offers a range of T.
>
> collect() takes a range of T and offers a range of T[]. The 
> number of items in each chunk can be a parameter.
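
Neither adapter exists yet, so just to make sure I'm reading the 
element types right, existing pieces of Phobos come close: joiner 
for the itemize direction, and std.range.chunks as a slice-based 
approximation of collect.

import std.algorithm : joiner;
import std.range : chunks, ElementType;

void main()
{
    // itemize direction: a range of ubyte[] flattened to a range
    // of ubyte -- for non-string element types this is what
    // joiner already does.
    ubyte[] a = [1, 2], b = [3, 4, 5];
    auto flat = [a, b].joiner;
    static assert(is(ElementType!(typeof(flat)) == ubyte));

    // collect direction: a range of ubyte regrouped into ubyte[]
    // chunks of a chosen size. chunks() is the closest existing
    // thing, though it yields views over the source rather than
    // freshly built arrays.
    ubyte[] raw = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
    auto grouped = raw.chunks(4);
    static assert(is(ElementType!(typeof(grouped)) == ubyte[]));
}
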
>
> With those in tow, we can set things up such that compress() 
> and expand() traffic in ranges of ranges of ubyte (or simply 
> ranges of ubyte[]), which ensures work at maximum speed. 
> Adapting to and from ranges of ubyte then becomes a simple 
> matter of chaining a call to itemize() or collect().

I like them. How would itemize() differ from joiner(), though 
(apart from hopefully not decoding narrow strings to dchar the 
way joiner currently seems to)?
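
Concretely, the behaviour I mean -- a quick check that joiner over 
ranges of char data hands back dchar elements:

import std.algorithm : joiner;
import std.range : ElementType;

void main()
{
    // Narrow strings get auto-decoded: join char data and the
    // resulting range yields dchar, not char.
    string[] parts = ["abc", "def"];
    static assert(is(ElementType!(typeof(parts.joiner)) == dchar));
}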

