Reducing the cost of autodecoding

Wed Oct 12 06:53:03 PDT 2016

So we've had a good run with making popFront smaller. In ASCII
microbenchmarks with ldc, its speed is indistinguishable from
s = s[1 .. $]. A smaller function also keeps the impact on the
instruction cache low in larger applications.
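
For readers less familiar with what popFront has to do on a narrow
string, here is a small sketch of the difference from plain slicing. The
helper name unitsPopped is made up for illustration; only front and
popFront come from Phobos (std.range.primitives).

```d
import std.range.primitives : popFront;

// Hypothetical helper (not part of Phobos): reports how many UTF-8 code
// units (bytes) a single popFront call removes from the start of s.
size_t unitsPopped(string s)
{
    immutable before = s.length;
    s.popFront();              // autodecoding-aware: skips one code point
    return before - s.length;
}
```

For ASCII input such as "abc" this returns 1, the same as s = s[1 .. $].
For input starting with a multi-byte character such as "\u00E9bc" it
returns 2, which is the extra work that slicing never has to consider.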

Now it's time to look at the end-to-end cost of autodecoding. I wrote 
this simple microbenchmark:

=====
import std.range;

alias myPopFront = std.range.popFront;
alias myFront = std.range.front;

void main(string[] args) {
     import std.algorithm, std.array, std.stdio;
     // 10 ASCII digits repeated 50 million times: 500 MB of input.
     char[] line = "0123456789".dup.repeat(50_000_000).join;
     ulong checksum;
     if (args.length == 1)
     {
         while (line.length) {
             version(autodecode)
             {
                 checksum += line.myFront;
                 line.myPopFront;
             }
             else
             {
                 checksum += line[0];
                 line = line[1 .. $];
             }
         }
         version(autodecode)
             writeln("autodecode ", checksum);
         else
             writeln("bytes ", checksum);
     }
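     else
         // Any extra command-line argument skips the loop, so the run
         // measures only startup and the cost of building the input.
         writeln("overhead");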
}
=====

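On my machine, with "ldc2 -release -O3 -enable-inlining" (adding
-d-version=autodecode for the autodecoding build) I get something like
0.54s overhead, 0.81s with no autodecoding, and 1.12s with autodecoding.
Net of overhead, that is about 0.27s for the byte loop versus 0.58s for
the autodecoding loop, roughly a 2x gap.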

Your mission, should you choose to accept it, is to define a combination 
front/popFront that reduces the gap.
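
To make the shape of such a combined primitive concrete, here is a
minimal, unmeasured sketch. The name frontAndPop is hypothetical and not
part of Phobos, and the ASCII fast path is just one possible design; the
only library call used is std.utf.decode.

```d
import std.utf : decode;

// Hypothetical combined front/popFront (not part of Phobos): returns the
// first code point of s and advances s past it in a single call.
dchar frontAndPop(ref char[] s)
{
    assert(s.length, "frontAndPop called on an empty array");
    immutable char c = s[0];
    if (c < 0x80)
    {
        // ASCII fast path: a single code unit, no decoding needed.
        s = s[1 .. $];
        return c;
    }
    // Multi-byte sequence: std.utf.decode validates it and advances i
    // by the number of code units consumed (throws on invalid UTF-8).
    size_t i = 0;
    immutable dchar result = decode(s, i);
    s = s[i .. $];
    return result;
}
```

In the benchmark above, the body of the autodecode branch would then
become checksum += line.frontAndPop; whether this actually narrows the
gap is something to measure, not assume.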

Andrei