Strange counter-performance in an alternative `decimalLength9` function

9il ilyayaroshenko at gmail.com
Fri Feb 28 16:03:01 UTC 2020


On Friday, 28 February 2020 at 10:11:23 UTC, Bruce Carneal wrote:
> On Friday, 28 February 2020 at 06:50:55 UTC, 9il wrote:
>> On Wednesday, 26 February 2020 at 00:50:35 UTC, Basile B. 
>> wrote:
>>> So after reading the translation of RYU I was interested to 
>>> see if the decimalLength() function can be written to be 
>>> faster, as it cascades up to 8 CMP.
>>>
>>> [...]
>>
>> bsr can be done in one or two CPU operations, which is quite 
>> quick. But core.bitop.bsr wouldn't be inlined. Instead, 
>> mir-core (mir.bitop: ctlz) or the LDC intrinsic llvm_ctlz can 
>> be used to get inlined code.
>
> That's surprising.  I just got ldc to inline core.bitop.bsr on 
> run.dlang.io using ldc -O3 -mcpu=native. (not sure what the 
> target CPU is)

Ah, my bad. It fails to inline with LDC <= 1.14
https://d.godbolt.org/z/iz9p-6

> Under what conditions should I be guarding against an inlining 
> failure?

Mark it with `pragma(inline, true)`. LDC also has cross-module 
inlining for non-templated functions.
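
To illustrate, here is a minimal sketch of a bsr-based digit count marked 
with `pragma(inline, true)`. It is not the code from the original post: 
the table name `pow10` and the multiply-by-1233 log10 estimate are my own 
assumptions (a common bit trick), and the function assumes its input is 
below 10^9.

```d
import core.bitop : bsr;

// Hypothetical sketch, not the function from the thread.
// Counts decimal digits of v, assuming v < 1_000_000_000.
pragma(inline, true)
uint decimalLength9(const uint v) @safe pure nothrow @nogc
{
    // pow10[t] is 10^t, except pow10[0] is 0 so that v == 0 counts as one digit.
    static immutable uint[10] pow10 = [
        0, 10, 100, 1_000, 10_000, 100_000,
        1_000_000, 10_000_000, 100_000_000, 1_000_000_000,
    ];
    // bsr is undefined for 0; OR-ing in the low bit leaves the index
    // of the highest set bit unchanged for every v >= 1.
    const uint log2 = bsr(v | 1);
    // 1233 / 4096 approximates log10(2), so t is within one of the digit count.
    const uint t = ((log2 + 1) * 1233) >> 12;
    // Correct the estimate with a single comparison.
    return t + (v >= pow10[t]);
}
```

With `pragma(inline, true)` the compiler is asked to always inline the 
function and should report an error if it cannot, so an inlining failure 
does not go unnoticed.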

