[Issue 17484] New: high penalty for vbroadcastsd with -mcpu=avx
    via Digitalmars-d-bugs 
    digitalmars-d-bugs at puremagic.com
       
    Thu Jun  8 20:58:02 PDT 2017
    
    
  
https://issues.dlang.org/show_bug.cgi?id=17484
          Issue ID: 17484
           Summary: high penalty for vbroadcastsd with -mcpu=avx
           Product: D
           Version: D2
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: P3
         Component: dmd
          Assignee: nobody at puremagic.com
          Reporter: code at dawg.eu
With -mcpu=avx, the compiler emits
  vbroadcastsd ymm2, qword ptr [rsp]
even when initializing only 128-bit wide double2 variables.
This incurs a penalty of roughly 50-80 cycles when a legacy-encoded SSE
instruction later uses that register (or a value derived from it), because the
CPU can no longer assume that the upper 128 bits are zero and apparently has to
preserve them in an internal register buffer.
https://software.intel.com/en-us/articles/intel-avx-state-transitions-migrating-sse-code-to-avx
We should (A) not write to 256-bit wide YMM registers when only 128-bit wide
XMM registers are used, and (B) avoid mixing legacy-encoded SSE instructions
(movsd) with VEX-encoded AVX-128 instructions, i.e. use vmovsd instead of movsd.
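
The report does not include a test case, so the following is only a
hypothetical sketch of code that might exercise this path. It assumes that
initializing a double2 from a runtime scalar is what makes the compiler emit
the broadcast; the function name splat and the file name repro.d are made up
for illustration. A 128-bit form such as vmovddup xmm, m64 would be one way to
satisfy (A), but that choice is an assumption and not taken from the report.

```d
import core.simd;

// Hypothetical reproduction (not from the original report).
// Build and inspect, for example:
//   dmd -mcpu=avx -c repro.d
//   objdump -d repro.o
// and look for instructions that write ymm registers.

// Broadcast a scalar into both lanes of a 128-bit vector.
double2 splat(double x)
{
    double2 v = x; // only 128 bits are needed here
    return v;
}

void main()
{
    double2 a = splat(1.5);
    double2 b = splat(2.0);
    double2 c = a + b; // element-wise add on 128-bit vectors

    // Both lanes hold 1.5 + 2.0.
    assert(c.array[0] == 3.5);
    assert(c.array[1] == 3.5);
}
```

If the disassembly shows a ymm destination for the initialization in splat,
followed by legacy-encoded instructions such as movsd on the same register,
that would match the pattern described above.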