Strange counter-performance in an alternative `decimalLength9` function

Fri Feb 28 10:11:23 UTC 2020

On Friday, 28 February 2020 at 06:50:55 UTC, 9il wrote:
> On Wednesday, 26 February 2020 at 00:50:35 UTC, Basile B. wrote:
>> So after reading the translation of RYU I was interested to 
>> see if the decimalLength() function can be written to be 
>> faster, as it cascades up to 8 CMP.
>>
>> [...]
>
> bsr can be done in one or two CPU operations, which is quite 
> quick. But core.bitop.bsr wouldn't be inlined. Instead, mir-core 
> (mir.bitop: ctlz) or the LDC intrinsic llvm_ctlz can be used 
> to get inlined code.

That's surprising.  I just got ldc to inline core.bitop.bsr on 
run.dlang.io using ldc -O3 -mcpu=native. (not sure what the 
target CPU is)
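
For anyone who wants to reproduce this, here is a minimal sketch of a bsr-based digit count whose generated assembly can be inspected for a leftover call to bsr. The function name `decimalLength9Bsr` and the `pow10` table are assumptions made for illustration; this is not the code from the original post.

```d
import core.bitop : bsr;

// Hypothetical name; a sketch, not the function from the original post.
uint decimalLength9Bsr(uint v) pure nothrow @nogc @safe
{
    // pow10[i] == 10^i
    static immutable uint[10] pow10 = [
        1, 10, 100, 1_000, 10_000, 100_000,
        1_000_000, 10_000_000, 100_000_000, 1_000_000_000,
    ];
    // bsr is undefined for 0, so force the low bit on;
    // 0 is then treated as having one digit.
    const uint bits = bsr(v | 1) + 1;
    // 1233 / 4096 approximates log10(2), so t is either the
    // digit count or one less than it.
    const uint t = (bits * 1233) >> 12;
    // One comparison against the table corrects the estimate.
    return t + (v >= pow10[t]);
}
```

If bsr is inlined, the assembly for this function should contain a bsr or lzcnt instruction and no call.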

Under what conditions should I be guarding against an inlining 
failure?