[Issue 19443] core.simd generates incorrect code for MOVHLPS

Sun Mar 21 06:44:15 UTC 2021

https://issues.dlang.org/show_bug.cgi?id=19443

--- Comment #5 from Walter Bright <bugzilla at digitalmars.com> ---
The MOVHLPS instruction is encoded:

NP 0F 12 /r MOVHLPS xmm1, xmm2

"Moves two packed single-precision floating-point values from the high quadword
of the second XMM argument (second operand) to the low quadword of the first
XMM register (first argument). The quadword at bits 127:64 of the destination
operand is left unchanged. Bits (MAXVL-1:128) of the corresponding destination
register remain unchanged."

The MOVLPS instruction is encoded:

NP 0F 12 /r MOVLPS xmm1, m64

"Moves two packed single-precision floating-point values from the source 64-bit
memory operand and stores them in the low 64-bits of the destination XMM
register. The upper 64bits of the XMM register are preserved. Bits
(MAXVL-1:128) of the corresponding destination register are preserved."

https://www.felixcloutier.com/x86/movlps
https://www.felixcloutier.com/x86/movhlps
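
Both listings above show the same opcode bytes, 0F 12. The two forms are told apart by the mod field (top two bits) of the ModRM byte: 11 selects a register operand (MOVHLPS), and any other value selects a memory operand (MOVLPS). A minimal sketch of that decoding rule follows; decode0F12 is a hypothetical helper written for illustration, not a function from DMD or druntime.

```d
// Hypothetical helper (not part of DMD): report which instruction the CPU
// decodes from the bytes 0F 12 followed by the given ModRM byte.
string decode0F12(ubyte modrm)
{
    // ModRM.mod is the top two bits of the byte.
    immutable uint modField = modrm >> 6;
    // mod == 11b means the r/m field names a register, not memory.
    return modField == 0b11 ? "MOVHLPS xmm1, xmm2" : "MOVLPS xmm1, m64";
}

void main()
{
    // 0xC1 = 11 000 001b: register form (xmm0, xmm1).
    assert(decode0F12(0xC1) == "MOVHLPS xmm1, xmm2");
    // 0x01 = 00 000 001b: memory form (xmm0, [rcx] or [ecx]).
    assert(decode0F12(0x01) == "MOVLPS xmm1, m64");
}
```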

Looking at the code:

    import core.simd;
    void main()
    {
        float4 a = [1.0f, 2.0f, 3.0f, 4.0f];
        float4 b = [5.0f, 6.0f, 7.0f, 8.0f];
        float4 r = cast(float4) __simd(XMM.MOVHLPS, a, b);
        float[4] correct = [7.0f, 8.0f, 3.0f, 4.0f];
        assert(r.array == correct); // FAIL, produces [5, 6, 3, 4] instead
    }

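The problem appears to be that the second operand needs to be forced into an
XMM register rather than remaining in memory. The two instructions share the
opcode 0F 12 and differ only in whether the ModRM byte names a register or a
memory operand, so with b left in memory the bytes decode as MOVLPS, which
loads the low quadword of b (5, 6). That matches the incorrect result.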
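
To make the difference concrete, here is a scalar model of the two instructions written from the quoted descriptions. It is only a sketch of the semantics, not how DMD generates code; movhlpsModel and movlpsModel are hypothetical names, and plain float arrays stand in for XMM registers and memory.

```d
// Scalar model of MOVHLPS: high quadword of src -> low quadword of dst.
// Static arrays are passed by value, so dst is a copy of the caller's data.
float[4] movhlpsModel(float[4] dst, float[4] src)
{
    dst[0] = src[2];
    dst[1] = src[3];
    return dst; // elements 2 and 3 of dst are unchanged
}

// Scalar model of MOVLPS: 64-bit memory operand -> low quadword of dst.
float[4] movlpsModel(float[4] dst, float[2] mem)
{
    dst[0] = mem[0];
    dst[1] = mem[1];
    return dst; // elements 2 and 3 of dst are unchanged
}

void main()
{
    float[4] a = [1.0f, 2.0f, 3.0f, 4.0f];
    float[4] b = [5.0f, 6.0f, 7.0f, 8.0f];

    // What MOVHLPS should produce.
    float[4] expected = [7.0f, 8.0f, 3.0f, 4.0f];
    assert(movhlpsModel(a, b) == expected);

    // What results if b is addressed as a 64-bit memory operand instead.
    float[2] lowHalfOfB = b[0 .. 2];
    float[4] observed = [5.0f, 6.0f, 3.0f, 4.0f];
    assert(movlpsModel(a, lowHalfOfB) == observed);
}
```

The second assertion reproduces the [5, 6, 3, 4] value reported by the failing test, which is consistent with the memory form being emitted.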
