By ref and by pointer kills performance.

claptrap clap at
Tue Feb 13 02:11:45 UTC 2024

I was refactoring some code, changed a parameter from by value 
to by pointer, and saw performance drop by 50%. This is a highly 
reduced example of what I found, but basically passing something 
into a function by reference or pointer seems to make the 
compilers (it affects both DMD and LDC) treat it as if it's 
volatile and must be reloaded from memory on every use. This 
also inhibits auto-vectorization of the code by LDC.

void fillBP(uint* value, uint* dest)
{
     dest[0] = *value;
     dest[1] = *value;
     dest[2] = *value;
     dest[3] = *value;
}
codegen DMD -->

                 push    RBP
                 mov     RBP,RSP
                 mov     ECX,[RSI]
                 mov     [RDI],ECX
                 mov     EDX,[RSI]
                 mov     4[RDI],EDX
                 mov     R8D,[RSI]
                 mov     8[RDI],R8D
                 mov     R9D,[RSI]
                 mov     0Ch[RDI],R9D
                 pop     RBP

codegen LDC -->

         mov     eax, dword ptr [rdi]
         mov     dword ptr [rsi], eax
         mov     eax, dword ptr [rdi]
         mov     dword ptr [rsi + 4], eax
         mov     eax, dword ptr [rdi]
         mov     dword ptr [rsi + 8], eax
         mov     eax, dword ptr [rdi]
         mov     dword ptr [rsi + 12], eax

void fillBV(uint value, uint* dest)
{
     dest[0] = value;
     dest[1] = value;
     dest[2] = value;
     dest[3] = value;
}
codegen DMD -->

                 push    RBP
                 mov     RBP,RSP
                 mov     [RDI],ESI
                 mov     4[RDI],ESI
                 mov     8[RDI],ESI
                 mov     0Ch[RDI],ESI
                 pop     RBP

codegen LDC -->

         movd    xmm0, edi
         pshufd  xmm0, xmm0, 0
         movdqu  xmmword ptr [rsi], xmm0

Interestingly, if you do this...

void fillBP(uint* value, uint* dest)
{
     uint tmp = *value;
     dest[0] = tmp;
     dest[1] = tmp;
     dest[2] = tmp;
     dest[3] = tmp;
}
You get code identical to the by-value version (except for the 
load from memory).
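
The same local-copy trick can presumably be applied to a "ref" 
parameter as well. The sketch below is only an illustration: the 
name fillBR is hypothetical, and its codegen has not been checked 
here.

```d
// Hypothetical by-ref counterpart of fillBP (the name is illustrative).
void fillBR(ref uint value, uint* dest)
{
    uint tmp = value;   // read the referenced value once into a local
    dest[0] = tmp;
    dest[1] = tmp;
    dest[2] = tmp;
    dest[3] = tmp;
}

void main()
{
    uint v = 7;
    uint[4] buf;
    fillBR(v, buf.ptr);

    uint[4] expected = [7, 7, 7, 7];
    assert(buf == expected);
}
```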

I'm not a compiler guy, so maybe there's some rationale for this 
that I don't know, but it seems like the compiler should be able 
to read "*value" once and cache it.
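
One plausible rationale, offered as an assumption rather than 
something confirmed in this post, is pointer aliasing: two uint* 
parameters may point into the same memory, so a store through 
"dest" could change what "value" points at. The sketch below uses 
two hypothetical functions (incFillBP and incFillCached, not from 
the original code) to show a case where caching the load changes 
the result when the pointers overlap.

```d
// Hypothetical variant of fillBP: every store depends on the current *value.
void incFillBP(uint* value, uint* dest)
{
    dest[0] = *value + 1;
    dest[1] = *value + 1;
    dest[2] = *value + 1;
    dest[3] = *value + 1;
}

// Same function with the load hoisted into a local.
void incFillCached(uint* value, uint* dest)
{
    uint tmp = *value;
    dest[0] = tmp + 1;
    dest[1] = tmp + 1;
    dest[2] = tmp + 1;
    dest[3] = tmp + 1;
}

void main()
{
    uint[4] a = [10, 0, 0, 0];
    uint[4] b = [10, 0, 0, 0];

    // In both calls, value points at dest[0], so the first store
    // overwrites the memory that value refers to.
    incFillBP(&a[0], a.ptr);
    incFillCached(&b[0], b.ptr);

    uint[4] reloaded = [11, 12, 12, 12]; // later loads see the updated a[0]
    uint[4] cached   = [11, 11, 11, 11]; // the local copy never changes
    assert(a == reloaded);
    assert(b == cached);
}
```

Because the two versions can disagree when the pointers overlap, 
a compiler cannot in general hoist the load unless it can prove 
there is no overlap. Copying into a local makes that intent 
explicit.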
