Strange counter-performance in an alternative `decimalLength9` function

Thu Feb 27 21:46:08 UTC 2020

On Thursday, 27 February 2020 at 19:46:23 UTC, Basile B. wrote:
> Yes please, post the benchmark method. You see the benchmarks I 
> run with your version are always slowest. I'm aware that rndGen 
> (and generaly any uniform rnd func) is subject to a bias but I 
> dont thing this bias maters much in the case we talk about.

The code below is the test jig that I'm using currently.  It is 
adopted from yours but has added the -d=distribution command line 
option.

I have omitted the function bodies for the entries in the funcs[] 
table.  You can cut and paste there at will but I wanted to spare 
any other readers the bulk.

The timings I get from the below are entirely consistent with my 
earlier reporting and, I believe, with yours.

I urge you to investigate the difference in timings between the 
original, and currently defaulted, distribution and the -d=edig 
distribution.  If you don't understand what's going on there, or 
if you still don't believe input distributions matter, please get 
in touch.  After all, it's possible that I'm the one with the 
misunderstanding.

Have fun!

<code>
const funcs = [
     &decimalLength9_0, &decimalLength9_1, &decimalLength9_2,
     &decimalLength9_3, &decimalLength9_4, &decimalLength9_5
];

ulong runOriginalFuncCount(ubyte funcIdx, ulong count)
{
     GC.disable;

     uint sum;
     foreach (const ulong i; 0 .. count)
     {
         sum += funcs[funcIdx](rndGen.front());
         rndGen.popFront();
     }
     return sum;
}

ulong runFuncCountDist(ubyte funcIdx, ulong count, string dist)
{
     enum perDigit = 1000;
     // NOTE: this is a gross distortion given uint.max but it is 
"the spec"
     enum maxDigits = 9;
     uint[perDigit * maxDigits] samples = void;
     const chunks = count / samples.length;
     const leftOvers = count % samples.length;

     if (dist == "prng")
     {
         foreach (ref val; samples[])
         {
             val = rndGen.front();
             rndGen.popFront();
         }
     }
     else if (dist == "edig" || dist == "edigsorted")
     {
         // for reference, this is the "6 LOC IIRC"
         uint value = 1;
         foreach (dig; 0 .. maxDigits)
         {
             samples[dig * perDigit .. (dig + 1) * perDigit] = 
value;
             value *= 10;
         }

         if (auto shuffle = dist != "edigsorted")
             randomShuffle(samples[]);
     }
     else
     {
         assert(0, "dist option should be in {oprng, prng, edig, 
edigsorted}");
     }
     const func = funcs[funcIdx];

     if (count < 1) // late check gives us a handle on setup time
         return 0;
     // OK, enough with the setup, time to roll
     ulong sum = 0;
     foreach (chunk; 0 .. chunks)
     {
         foreach (sample; samples[])
             sum += func(sample);
     }
     foreach (sample; samples[$ - leftOvers .. $])
         sum += func(sample);
     return sum;
}

// This is a timing test jig for functions that accept a uint
// and return the number of decimal digits needed to represent
// that uint (except that 10 digit values are lumped in with
// 9 digit values currently).
//
// The command line options here are:
//   -c=count (the number of values to present to the function 
selected)
//   -s=seed (the value supplied to rndGen.seed)
//   -f=funcIndex (identifies the function under test, see source 
funcs[])
//   -d=distribution (one of: oprng, prng, edig, edigsorted)
//
// Distributions explained:
//   oprng: original prng distribution, a rndGen.popFront per 
function call
//   prng:  prng distribution, also rndGen but coming from a 
large, pre-filled, array
//   edig:  same number of 1-digit, 2-digit, ... numbers, shuffled
//   edigsorted: edig values but sorted to highlight the branch 
predictor impact
//
// This test jig could evolve in at least six ways:
//  1) the full digit range could (arguably should) be accounted 
for
//  2) additional distributions, like "favor small numbers", 
could be added
//  3) additional implementation ideas can be explored
//  4) the input range could be made variable
//  5) the number base could be templatized
//  6) inlined performance could be tested by aliasing Func 
(runtime switch to template)
//
//  Final note: the main benefit of working on this problem is 
better understanding, not
//  shaving nanoseconds on this particular function.

void main(string[] args)
{

     ulong count;
     uint seed;
     ubyte func;
     // NOTE: oprng writeln output will usually not equal that of 
-d=prng
     enum defaultDistribution = "oprng";
     string dist = defaultDistribution;

     GC.disable;
     getopt(args, config.passThrough, "c|count", &count, "f|func", 
&func,
             "s|seed", &seed, "d|distribution", &dist);
     rndGen.seed(seed);
     const original = dist == "oprng";
     const sum = original ? runOriginalFuncCount(func, count) : 
runFuncCountDist(func, count, dist);
     writeln(sum);
}

<code>