LDC inline assembly expressions
NaN
divide at by.zero
Sun Dec 23 15:01:49 UTC 2018
On Sunday, 23 December 2018 at 13:33:51 UTC, kinke wrote:
> On Sunday, 23 December 2018 at 13:00:54 UTC, NaN wrote:
>> Is there any difference between using this vs the other method
>> of doing intrinsics?
>
> Assuming there's really no LLVM intrinsic for your desired
> instruction, the manual variant is what it is, a regular
> function with an inline asm expression. I guess the LLVM
> backends lower calls to these instruction-intrinsics directly
> to inline asm expressions in the caller. With inlining, it
> might result in equivalent final asm.
>
> My version above with the memory indirection isn't nice, this
> is better:
>
> import core.simd : int4;
> import ldc.llvmasm : __asm;
>
> extern(C) int4 _mm_cmpgt_epi32(int4 a, int4 b) {
>     return __asm!int4("pcmpgtd $2,$1", "={xmm0},{xmm0},{xmm1}", a, b);
> }
>
> and is going to be inlined with `-O`.
That's pretty much what I've got. I've been using Compiler
Explorer so I can see what actually gets generated. It's been quite
an eye-opener how good the LLVM optimizer is, tbh.
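When checking the generated code, it can help to have a portable reference for what `pcmpgtd` should return. This is a minimal sketch, assuming only the documented instruction semantics: each 32-bit lane gets a signed "greater than" compare, with all bits set (-1) where `a > b` and 0 otherwise. The name `cmpgtEpi32Reference` is hypothetical and not part of LDC or druntime; it works on plain `int[4]` so it needs no SIMD support at all.

```d
// Hypothetical helper, not part of LDC or druntime: a scalar reference for
// what pcmpgtd computes, i.e. a signed 32-bit "greater than" in each lane.
int[4] cmpgtEpi32Reference(const int[4] a, const int[4] b)
{
    int[4] r; // int elements default-initialize to 0 (false lanes)
    foreach (i; 0 .. 4)
        r[i] = a[i] > b[i] ? -1 : 0; // -1 has all 32 bits set (true lanes)
    return r;
}
```

With `core.simd`, the inline-asm version can then be compared lane by lane, e.g. `_mm_cmpgt_epi32(a, b).array == cmpgtEpi32Reference(a.array, b.array)`.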
> Note that if you used equivalent naked DMD-style inline asm
> instead, e.g.,
>
> extern(C) int4 _mm_cmpgt_epi32(int4 a, int4 b) {
>     asm {
>         naked;
>         pcmpgtd XMM0, XMM1;
>         ret;
>     }
> }
>
> that is lowered to *module*-level inline asm and the function
> is NOT inline-able.
I'm ignoring DMD anyway, since it kills performance by about 60%.