[Issue 17484] New: high penalty for vbroadcastsd with -mcpu=avx
via Digitalmars-d-bugs
digitalmars-d-bugs at puremagic.com
Thu Jun 8 20:58:02 PDT 2017
https://issues.dlang.org/show_bug.cgi?id=17484
Issue ID: 17484
Summary: high penalty for vbroadcastsd with -mcpu=avx
Product: D
Version: D2
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: P3
Component: dmd
Assignee: nobody at puremagic.com
Reporter: code at dawg.eu
With -mcpu=avx, the compiler emits

    vbroadcastsd ymm2, qword ptr [rsp]

even when it is only initializing a 128-bit wide double2 variable.
This causes a large penalty (50-80 cycles) when a legacy SSE instruction later
operates on that register value (or a value derived from it). The CPU does not
know that the upper 128 bits are zero, and apparently preserves them in an
internal buffer.
https://software.intel.com/en-us/articles/intel-avx-state-transitions-migrating-sse-code-to-avx
We should (A) not write to 256-bit wide YMM registers when only 128-bit wide XMM
registers are used, and (B) avoid mixing legacy-encoded SSE instructions (e.g.
movsd) with VEX-encoded AVX-128 instructions, i.e. use vmovsd instead of movsd.
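For illustration outside of dmd, here is a minimal C sketch of what (A) asks for: a double2 splat that only ever needs a 128-bit XMM value. This is an assumption-laden example, not dmd code, and the helper names splat_double2 and store_double2 are hypothetical. When built with gcc or clang and -mavx, such code is typically lowered to a 128-bit VEX-encoded instruction (for example vmovddup xmm) rather than vbroadcastsd ymm, and those compilers use VEX encodings for SSE intrinsics under -mavx, which also matches (B). Inspect the generated assembly to confirm this for a given compiler and flag set.

```c
#include <emmintrin.h> /* SSE2 intrinsics: __m128d, _mm_set1_pd, _mm_storeu_pd */

/* Hypothetical helper: broadcast one double into both lanes of a
 * 128-bit vector (the equivalent of a D double2). Only a 128-bit value
 * is requested, so the compiler has no reason to touch a YMM register. */
static __m128d splat_double2(double x)
{
    return _mm_set1_pd(x);
}

/* Hypothetical helper: copy both lanes into a plain array so the
 * result can be inspected without further intrinsics. */
static void store_double2(__m128d v, double out[2])
{
    _mm_storeu_pd(out, v);
}
```

Usage: `double out[2]; store_double2(splat_double2(1.5), out);` leaves 1.5 in both `out[0]` and `out[1]`.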