LLVM asm with constraints, and 2 operands
kinke
noone at nowhere.com
Mon Jul 19 17:20:21 UTC 2021
On Monday, 19 July 2021 at 16:44:35 UTC, Guillaume Piolat wrote:
> On Monday, 19 July 2021 at 10:49:56 UTC, kinke wrote:
>> This workaround is actually missing the clobber constraint for
>> `%2`, which might be problematic after inlining.
>>
>
> An unrelated other issue with asm/__asm is that it doesn't
> follow consistent VEX encoding compared to normal compiler
> output.
>
> sometimes you might want: paddq x, y
> at other times: vpaddq x, y, z
>
> but rarely both in the same program.
> So this can easily nullify any gain obtained with VEX
> transition costs (if they are still a thing).
You know that asm is to be avoided whenever possible, but
unfortunately, AFAIK intel-intrinsics doesn't fit the usual
'don't worry, simply compile all your code with an appropriate
-mattr/-mcpu option' recommendation, as it employs runtime
detection of available CPU instructions.
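For illustration only, runtime detection of this kind can be sketched with druntime's `core.cpuid`. This is not taken from intel-intrinsics; `bestPath` is a hypothetical helper, and the choice of AVX and SSE2 as the probed features is an assumption for the example.

```d
import core.cpuid : avx, sse2; // druntime: CPU features queried at run time

/// Hypothetical helper (not part of any library): names the widest code
/// path the executing CPU could run, checked from most to least capable.
string bestPath()
{
    if (avx)
        return "avx";    // VEX-encoded instructions are available
    if (sse2)
        return "sse2";   // legacy SSE2 encoding only
    return "scalar";     // no SIMD assumed
}
```

A caller would typically evaluate `bestPath()` once at startup and branch on the result. Because the decision is made at run time, a single global `-mattr`/`-mcpu` setting cannot cover every path.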
I've just tried another option, a per-function `@target` attribute,
but that doesn't play nicely with inlining:
```
import core.simd;
import ldc.attributes;

@target("sse2") // use SSE2 for this function
int4 _mm_add_int4(int4 a, int4 b)
{
    return a + b; // perfect: paddd %xmm1, %xmm0
}

int4 wrapper(int4 a, int4 b)
{
    return _mm_add_int4(a, b);
}
```
Compiling with `-O -mtriple=i686-linux-gnu -mcpu=i686` (=> no
SSE2 by default) shows that the inlined version inside
`wrapper()` is the mega slow one, so unfortunately the extra
target features from `@target` aren't applied transitively to
the caller.