Inlining Ref Functions

Fri May 15 19:00:29 PDT 2009

== Quote from Bill Baxter (wbaxter at gmail.com)'s article
> Well it was shown before in a demo ray-tracer that the inability to
> inline funcs with refs caused a significant speed hit under DMD.  And
> now we're seeing it's causing a significant speed hit for sorting
> because of swap routines.
> There may be some thorny issues regarding inlining with refs in the
> general case but with code like this:
>     float lengthSqr(const ref vec3 v) { return v.x*v.x + v.y*v.y + v.z*v.z; }
> it really should be trivial for the compiler to figure out that
> direct substitution is possible in a simple case like:
>    vec3 w;
>    ...
>    auto len2 = lengthSqr(w);
> Maybe I'm missing something, but that looks pretty darn straightforward.
> --bb

On second thought, maybe this should be a high-priority issue, for two reasons.

1.  Some uber-hardcore performance freaks will not even consider D if it has the
slightest bit of performance overhead compared to C++.

2.  DMD apparently already inlines certain functions that take pointer arguments,
so why not references?  Aren't references just syntactic sugar for pointers?  If
DMD's inliner can already handle pointers, I would think (though I could be wrong)
that it could handle references trivially.

Here's some test code:

import std.stdio;

// Shouldn't this generate *exactly* the same code as ptrSwap?
void swap(T)(ref T a, ref T b) {
    T temp = a;
    a = b;
    b = temp;
}

void ptrSwap(T)(T* a, T* b) {
    T temp = *a;
    *a = *b;
    *b = temp;
}

void main() {
    uint a, b;
    swap(a, b);
    ptrSwap(&a, &b);
    writeln(a); // Keep DMD from optimizing out ptrSwap entirely.
}

Here's the disassembly of the relevant portion:

  COMDEF __Dmain
        push    eax                                     ; 0000 _ 50
        push    eax                                     ; 0001 _ 50
        xor     eax, eax                                ; 0002 _ 31. C0
        push    ebx                                     ; 0004 _ 53
        lea     ecx, [esp+4H]                           ; 0005 _ 8D. 4C 24, 04
        mov     dword ptr [esp+4H], eax                 ; 0009 _ 89. 44 24, 04
        mov     dword ptr [esp+8H], eax                 ; 000D _ 89. 44 24, 08
        push    ecx                                     ; 0011 _ 51
        lea     eax, [esp+0CH]                          ; 0012 _ 8D. 44 24, 0C
        call    _D5test711__T4swapTkZ4swapFKkKkZv       ; 0016 _ E8, 00000000(rel)
        mov     edx, dword ptr [esp+4H]                 ; 001B _ 8B. 54 24, 04
        mov     ebx, dword ptr [esp+8H]                 ; 001F _ 8B. 5C 24, 08
        mov     dword ptr [esp+4H], ebx                 ; 0023 _ 89. 5C 24, 04
        mov     eax, offset FLAT:_main                  ; 0027 _ B8, 00000000(segrel)
        mov     dword ptr [esp+8H], edx                 ; 002C _ 89. 54 24, 08
        push    ebx                                     ; 0030 _ 53
        push    10                                      ; 0031 _ 6A, 0A
        call    _D3std5stdio4File14__T5writeTkTaZ5writeMFkaZv ; 0033 _ E8, 00000000(rel)
        xor     eax, eax                                ; 0038 _ 31. C0
        pop     ebx                                     ; 003A _ 5B
        add     esp, 8                                  ; 003B _ 83. C4, 08
        ret                                             ; 003E _ C3
__Dmain ENDP

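This confirms that DMD inlines ptrSwap but not swap: the call at offset 0016 goes
to the ref-based swap, while the loads and stores at offsets 001B through 002C are
the inlined body of ptrSwap.  I also ran some benchmarks: ptrSwap is as fast as
manually inlined code, but swap is slower by a factor of 2.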
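The original benchmark code isn't shown, so here is a minimal sketch of one way to set up a similar comparison in present-day D. It assumes a modern Phobos: std.datetime.stopwatch and its benchmark template postdate this post. The wrapper names viaRef, viaPtr and manual and the iteration count are hypothetical choices, not taken from the original measurements. Build with optimization and inlining enabled (e.g. dmd -O -inline -release), since without -inline DMD inlines neither version.

```d
import std.datetime.stopwatch : benchmark;
import std.stdio : writefln;

// Same two swap variants as in the test code above.
void swap(T)(ref T a, ref T b) {
    T temp = a;
    a = b;
    b = temp;
}

void ptrSwap(T)(T* a, T* b) {
    T temp = *a;
    *a = *b;
    *b = temp;
}

// Module-level (thread-local) variables, so every swap writes to memory
// that is read again at the end of main.
uint a = 1, b = 2;

// Hypothetical wrappers: one per variant being timed.
void viaRef() { swap(a, b); }
void viaPtr() { ptrSwap(&a, &b); }
void manual() { uint t = a; a = b; b = t; }  // hand-inlined swap

void main() {
    enum uint n = 100_000_000;  // arbitrary iteration count
    // benchmark calls each function n times and returns one Duration per function.
    auto times = benchmark!(viaRef, viaPtr, manual)(n);
    writefln("ref swap: %s", times[0]);
    writefln("ptr swap: %s", times[1]);
    writefln("manual:   %s", times[2]);
    writefln("a = %s, b = %s", a, b);  // keep the results observable
}
```

If ref parameters are inlined, the first line should come out close to the other two; comparing the timings from builds with and without -inline isolates the cost of the out-of-line call.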