LDC inline assembly expressions

Sun Dec 23 15:01:49 UTC 2018

On Sunday, 23 December 2018 at 13:33:51 UTC, kinke wrote:
> On Sunday, 23 December 2018 at 13:00:54 UTC, NaN wrote:
>> Is there any difference between using this vs the other method 
>> of doing intrinsics?
>
> Assuming there's really no LLVM intrinsic for your desired 
> instruction, the manual variant is what it is, a regular 
> function with an inline asm expression. I guess the LLVM 
> backends lower calls to these instruction-intrinsics directly 
> to inline asm expressions in the caller. With inlining, it 
> might result in equivalent final asm.
>
> My version above with the memory indirection isn't nice, this 
> is better:
>
> extern(C) int4 _mm_cmpgt_epi32(int4 a, int4 b) {
>   return __asm!int4("pcmpgtd $2,$1", "={xmm0},{xmm0},{xmm1}", a, b);
> }
>
> and is going to be inlined with `-O`.

That's pretty much what I've got. I've been using Compiler 
Explorer so I can see what actually gets generated. It's been quite 
an eye-opener how good the LLVM optimizer is, tbh.
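
For anyone who wants to paste this into Compiler Explorer, here is a minimal self-contained sketch of the `__asm` variant quoted above. It assumes LDC targeting x86-64 with SSE2. The imports, the `main` function and the test values are illustrative additions and not part of the quoted post.

```d
import core.simd;    // provides the int4 vector type
import ldc.llvmasm;  // LDC-specific: provides the __asm template

// Same function as in the quoted post. The asm string is AT&T syntax,
// so "pcmpgtd $2,$1" compares $1 (a, in xmm0) against $2 (b, in xmm1)
// and leaves the mask in xmm0, which is also the output register.
extern(C) int4 _mm_cmpgt_epi32(int4 a, int4 b) {
    return __asm!int4("pcmpgtd $2,$1", "={xmm0},{xmm0},{xmm1}", a, b);
}

void main() {
    // Illustrative values only.
    int4 a = [1, 5, 3, 0];
    int4 b = [0, 5, 4, -1];
    int4 r = _mm_cmpgt_epi32(a, b);
    // Each lane is all ones (-1) where a > b, and 0 otherwise.
    assert(r.array == [-1, 0, 0, -1]);
}
```

Compiling with `-O` and checking the output for a caller of `_mm_cmpgt_epi32` should show whether the call was inlined down to a single `pcmpgtd`.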

> Note that if you used equivalent naked DMD-style inline asm 
> instead, e.g.,
>
> extern(C) int4 _mm_cmpgt_epi32(int4 a, int4 b) {
>   asm {
>     naked;
>     pcmpgtd XMM0, XMM1;
>     ret;
>   }
> }
>
> that is lowered to *module*-level inline asm and the function 
> is NOT inline-able.

I'm ignoring DMD since it kills performance by about 60% anyway.