Strange counter-performance in an alternative `decimalLength9` function

Wed Feb 26 01:10:07 UTC 2020

On Wed, Feb 26, 2020 at 12:50:35AM +0000, Basile B. via Digitalmars-d-learn wrote:
[...]
> #!dmd -boundscheck=off -O -release -inline
[...]

TBH, I'm skeptical of any performance results using dmd.  I wouldn't pay
attention to performance numbers obtained this way, and rather look at
the ldmd/ldc2 numbers.

[...]
> Then bad surprise. Even with ldmd (so ldc2 basically) fed with the
> args from the script line. Maybe the fdecimalLength9 version is
> slightly faster.  Only *slightly*. Good news, I've lost my time. So I
> try an alternative version that uses a table of delegates instead of a
> switch (ffdecimalLength9) and surprise, "tada", it is like **100x**
> slower than the two others.
> 
> How is that possible ?

Did you check the assembly output to see what the difference is?

Delegates involve a function call, which involves function call
overhead.  Worse yet, it's an indirect call: the target is only known at
runtime, which is harder on the CPU's branch predictor, stalls the
pipeline on every misprediction, and can cost instruction cache misses
when the targets are scattered. And on top of that, a delegate that
captures local variables needs a heap-allocated context, and you
*really* do not want allocations inside an inner loop.

And of course, normal function calls are easier for compilers to inline,
because the destination is fixed. Indirect calls involving delegates are
hard to predict, and the optimizer is more liable to just give up.
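
To make the contrast concrete, here is a minimal sketch of the two
shapes being compared. The original code is not quoted above, so this is
not the poster's implementation: `decimalLength9ViaTable`, `makeEntry`,
`buildTable` and the `Fn` alias are hypothetical names, and indexing the
table by the highest set bit is just one plausible way such a table
could be laid out. It assumes inputs below 1_000_000_000.

```d
import core.bitop : bsr;
import std.stdio : writeln;

// Direct version: a fixed chain of comparisons. The call target is known
// at compile time, so the optimizer is free to inline it.
uint decimalLength9(const uint v)
{
    if (v >= 100_000_000) return 9;
    if (v >= 10_000_000) return 8;
    if (v >= 1_000_000) return 7;
    if (v >= 100_000) return 6;
    if (v >= 10_000) return 5;
    if (v >= 1_000) return 4;
    if (v >= 100) return 3;
    if (v >= 10) return 2;
    return 1;
}

// Hypothetical delegate type for the table-driven variant.
alias Fn = uint delegate(uint);

// The lambda captures both parameters, so each call allocates a closure
// context on the GC heap (marking this function @nogc would be rejected).
Fn makeEntry(uint digits, uint nextPow10)
{
    return (uint v) => v >= nextPow10 ? digits + 1 : digits;
}

// One delegate per possible highest-set-bit position (0 .. 29).
Fn[30] buildTable()
{
    Fn[30] table;
    uint digits = 1;
    uint nextPow10 = 10;
    foreach (bit; 0 .. 30)
    {
        // Smallest value whose highest set bit is `bit`.
        const lowest = 1u << bit;
        while (lowest >= nextPow10)
        {
            digits++;
            nextPow10 *= 10;
        }
        table[bit] = makeEntry(digits, nextPow10);
    }
    return table;
}

// Indirect version: every call goes through a pointer loaded from the
// table, so the target is only known at runtime.
uint decimalLength9ViaTable(Fn[] table, const uint v)
{
    return table[bsr(v | 1)](v); // `| 1` keeps bsr defined for v == 0
}

void main()
{
    auto table = buildTable();
    foreach (v; [0u, 9u, 10u, 99_999u, 100_000u, 999_999_999u])
        writeln(v, ": ", decimalLength9(v), " ",
                decimalLength9ViaTable(table[], v));
}
```

Both functions should agree on every input; the interesting part is how
differently they compile. In this sketch the closures are allocated once,
when the table is built. If a table like this were rebuilt inside the
benchmark loop, the allocations would land on the hot path as well.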

These are just my educated guesses, of course.  For the real answer,
look at the assembly output. :-D

T

-- 
What are you when you run out of Monet? Baroque.