Ranges usages
Artur Skawina
art.08.09 at gmail.com
Thu Nov 22 06:02:34 PST 2012
On 11/22/12 02:14, bearophile wrote:
> In the following program there are two versions of the compress function, that implement the same bare-bones version of the LZW (http://en.wikipedia.org/wiki/Lempel-Ziv-Welch ) compression algorithm.
>
> compress1 is the simpler version, while compress2 avoids some heap allocations using a Slice that is able to grow (in some of my benchmarks compress2 is about twice faster than compress1).
> import std.stdio, std.array;
>
> alias T = ubyte;
> alias Ta = immutable(T)[];
>
> int[] compress1(immutable T[] original) {
> int[Ta] dictionary;
> foreach (i; 0 .. 256)
> dictionary[cast(Ta)[i]] = i;
>
> Ta w;
> int[] result;
> foreach (ch; original) {
> auto wc = w ~ ch;
> if (wc in dictionary) {
> w = wc;
> } else {
> result ~= dictionary[w];
> dictionary[wc] = dictionary.length - 1;
> w = [ch];
> }
> }
>
> return w.empty ? result : (result ~ dictionary[w]);
> }
>
> int[] compress2(immutable T[] original) {
> int[Ta] dictionary;
> foreach (i; 0 .. 256)
> dictionary[cast(Ta)[i]] = i;
>
> struct Slice {
> size_t start, end;
> @property opSlice() {
> return original[start .. end];
> }
> alias this = opSlice;
> }
>
> Slice w;
> int[] result;
> foreach (i; 0 .. original.length) {
> auto wc = Slice(w.start, w.end + 1);
> if (wc in dictionary) {
> w = wc;
> } else {
> result ~= dictionary[w];
> dictionary[wc] = dictionary.length - 1;
> w = Slice(i, i + 1);
> }
> }
>
> return w.empty ? result : (result ~ dictionary[w]);
> }
>
> void main() {
> auto txt = cast(Ta)"TOBEORNOTTOBEORTOBEORNOT";
> writeln(compress1(txt));
> writeln(compress2(txt));
> }
> Is it possible to write similar code nicely & efficiently with ranges?
First, the code above is practically a textbook example on how /not/ to
write readable and efficient code. So lets do a saner implementation:
import std.stdio, std.array, std.range;
OUT[] compress(OUT=int, R)(const R original) {
alias typeof(original[0..1]) Elems;
alias typeof(cast()Elems[0]) Elem;
IDAA!(OUT, const Elems) dictionary;
OUT[] result;
size_t start, end = 1;
while (end<=original.length) {
if (original[start..end] in dictionary)
{ ++end; continue; }
result ~= dictionary[original[start..end-1]];
dictionary[original[start..end]] = cast(OUT)dictionary.length;
start = end-1;
}
return original[start..end-1].empty ? result : (result ~ dictionary[original[start..end-1]]);
}
struct IDAA(VAL, IDX) {
VAL[IDX] map;
enum M = IDX[0].max;
static immutable VAL[M+1] idmap = {
VAL[M+1] a;
foreach (VAL c; 0..M)
a[c] = c;
return a;
}();
auto opIndex(const IDX idx) inout {
return idx.length==1 ? idmap[idx[0]] : map[idx];
}
auto opIndexAssign(const VAL val, const IDX idx) {
return idx.length==1 ? idx[0] : (map[idx] = val);
}
const opBinaryRight(string op:"in")(const IDX idx) /*inout*/ {
return idx.length==1 ? &idmap[idx[0]] : idx in map;
}
@property length() /*inout*/ { return map.length + idmap.length; }
}
This code is about an order of magnitude faster than both of your versions
(which performed ~ equally here w/ gdc), as the initialization, allocations,
copies and AA ops were completely dominating everything else.
Now, when we have something to compare against, rewriting it to use ranges
is trivial:
OUT[] compress(OUT=int, R)(R original) {
IDAA!(OUT, const typeof(cast()original.front)[]) dictionary;
OUT[] result;
size_t l = 1;
while (original.length>=l) {
if (original.takeExactly(l) in dictionary)
{ ++l; continue; }
result ~= dictionary[original.takeExactly(l-1)];
dictionary[original.takeExactly(l)] = cast(OUT)dictionary.length;
original.popFrontN(l-1);
l = 1;
}
return original.empty ? result : (result ~ dictionary[original]);
}
And the (surprising, for me) result is that this code performs just as well
as the "saner" version above.
So i guess the answer is that it is possible.
For some definition of "nicely", as I don't like takeExactly and popFrontN. ;)
artur
More information about the Digitalmars-d
mailing list