__restrict, architecture intrinsics vs asm, consoles, and other

a a at a.com
Thu Sep 22 05:11:58 PDT 2011


> which compiles to a single shufps instruction.

Doesn't it often require additional, needless movaps instructions?
For example, the following: 

  asm
  {
    movaps XMM0, a;
    movaps XMM1, b;
    addps  XMM0, XMM1;
    movaps a, XMM0;
  }
  asm
  {
    movaps XMM0, a;
    movaps XMM1, b;
    addps  XMM0, XMM1;
    movaps a, XMM0;
  }

compiles to

movaps -0x48(%rsp),%xmm0
movaps -0x38(%rsp),%xmm1
addps    %xmm1,%xmm0
movaps %xmm0,-0x48(%rsp)
movaps -0x48(%rsp),%xmm0
movaps -0x38(%rsp),%xmm1
addps    %xmm1,%xmm0
movaps %xmm0,-0x48(%rsp)

Is it possible to avoid needless loading and storing of values when calling multiple functions that use asm blocks? It also seems that the compiler doesn't inline functions containing asm.
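
For comparison, here is a minimal sketch of the same two additions written as ordinary vector expressions instead of asm blocks. It is in C rather than D and assumes the GCC/Clang vector_size extension; the names float4, add4 and add_twice are hypothetical and only for illustration. Since the addition is visible to the optimizer rather than hidden inside an opaque asm block, the compiler is at least allowed to inline add4 and keep a in a register between the two calls. Whether it actually does depends on the compiler and the optimization flags.

```c
/* Assumption: GCC/Clang vector extension. Four packed floats in 16 bytes. */
typedef float float4 __attribute__((vector_size(16)));

/* Hypothetical helper standing in for one of the asm-block functions.
   The + operator on vector types adds element by element. */
static inline float4 add4(float4 a, float4 b)
{
    return a + b;
}

/* Two back-to-back calls, mirroring the two asm blocks above.
   Nothing here forces a to be stored to memory between the calls. */
float4 add_twice(float4 a, float4 b)
{
    a = add4(a, b);
    a = add4(a, b);
    return a;
}
```

Calling add_twice with a = {1, 2, 3, 4} and b = {10, 20, 30, 40} adds b to a twice, element by element. Inspecting the disassembly of an optimized build is the way to check whether the intermediate store and reload are gone.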


