call @PLT Performance

Wed Jan 16 14:19:27 UTC 2019

On Wednesday, 16 January 2019 at 13:03:59 UTC, SrMordred wrote:
> Compiler noob here:
>
> auto a = popcnt(bitset);
> auto b = bsf(bitset);
>
> generates this:
>
> call    pure nothrow @nogc @safe int core.bitop.popcnt(uint)@PLT
> call    pure nothrow @nogc @safe int core.bitop.bsf(uint)@PLT
>
> Why not generate the bsf/popcnt instruction?
>
> Aren't these calls slower?

Yeah, this is a known issue: LDC does not inline across modules by 
default, so popcnt and bsf (which live in core.bitop, a different 
module from yours) stay as real calls. You can enable it by passing 
the "-enable-cross-module-inlining" compile flag.
It's a long-standing issue, but it has become a little less urgent 
because of LTO (`-flto=...`).
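
A minimal sketch for trying this out, assuming a file named app.d 
(the file name and the test value are made up for illustration). The 
ldc2 invocations in the trailing comment are one possible way to 
compare the generated assembly with and without the flag. Whether a 
hardware popcnt instruction actually shows up likely also depends on 
the target CPU settings, so treat the result as something to check 
rather than a guarantee.

```d
// app.d: hypothetical file name, used only for this sketch.
import core.bitop : bsf, popcnt;
import std.stdio : writeln;

void main()
{
    uint bitset = 0b1011_0000; // arbitrary example value

    auto a = popcnt(bitset); // number of set bits
    auto b = bsf(bitset);    // index of the lowest set bit

    writeln(a, " ", b); // prints "3 4"
}

// Possible builds to compare (-output-s writes the assembly to app.s):
//   ldc2 -O3 -output-s app.d
//   ldc2 -O3 -enable-cross-module-inlining -output-s app.d
// In the first listing, look for the call ...@PLT lines from the
// question; in the second, check whether they have been inlined.
```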

-Johan