inline asm in inlined function / ECX clobbered / stack frame / naked
kinke
noone at nowhere.com
Mon May 6 19:08:47 UTC 2019
On Monday, 6 May 2019 at 03:09:38 UTC, James Blachly wrote:
> I know about core.bitop.bsr and std.math.nextPow2 which uses
> it. My asm code block is 2.5x faster than codegen for (2 <<
> bsr(x)) which surprises me...
Sorry, but I'll just focus on that, and not on the asm questions.
The reason is simple: I discourage anyone from going down to asm
level if it can be avoided.
So, I have:
pragma(inline, true)
uint roundup32(uint x)
{
    import core.bitop;
    //if (x <= 2) return x;
    return 2u << bsr(x-1);
}
`ldc2 -mtriple=x86_64-linux-gnu -O -output-s foo.d` (AT&T
syntax...):
_D3foo9roundup32FkZk:
    addl    $-1, %edi
    bsrl    %edi, %ecx
    xorl    $31, %ecx
    xorb    $31, %cl
    movl    $2, %eax
    shll    %cl, %eax
    retq
I can't believe that's 2.5x slower than your almost identical asm
block. And that code is portable: not just independent of OS and
ABI, but of the architecture too. 1000x better than inline asm IMO.