Inlining Ref Functions
dsimcha
dsimcha at yahoo.com
Fri May 15 19:00:29 PDT 2009
== Quote from Bill Baxter (wbaxter at gmail.com)'s article
> Well it was shown before in a demo ray-tracer that the inability to
> inline funcs with refs caused a significant speed hit under DMD. And
> now we're seeing it's causing a significant speed hit for sorting
> because of swap routines.
> There may be some thorny issues regarding inlining with refs in the
> general case but with code like this:
> float lengthSqr(const ref vec3 v) { return v.x*v.x + v.y*v.y + v.z*v.z; }
> it really should be trivial for the compiler to figure out that
> direct substitution is possible in a simple case like:
> vec3 w;
> ...
> auto len2 = lengthSqr(w);
> Maybe I'm missing something, but that looks pretty darn straightforward.
> --bb
On second thought, maybe this should be a high priority issue, for two reasons.
1. Some uber-hardcore performance freaks will not even consider D if it has the
slightest bit of performance overhead compared to C++.
2. DMD apparently already inlines certain functions with pointers, so why not
references? Aren't references just syntactic sugar for pointers? If DMD's
inliner can already handle pointers, I would think (I could be wrong) that it
would be able to trivially handle references.
Here's some test code:
import std.stdio;

// Shouldn't this generate *exactly* the same code as ptrSwap?
void swap(T)(ref T a, ref T b) {
    T temp = a;
    a = b;
    b = temp;
}

void ptrSwap(T)(T* a, T* b) {
    T temp = *a;
    *a = *b;
    *b = temp;
}

void main() {
    uint a, b;
    swap(a, b);
    ptrSwap(&a, &b);
    writeln(a);  // Keep DMD from optimizing out ptrSwap entirely.
}
Here's the disassembly of the relevant portion:
COMDEF __Dmain
push eax ; 0000 _ 50
push eax ; 0001 _ 50
xor eax, eax ; 0002 _ 31. C0
push ebx ; 0004 _ 53
lea ecx, [esp+4H] ; 0005 _ 8D. 4C 24, 04
mov dword ptr [esp+4H], eax ; 0009 _ 89. 44 24, 04
mov dword ptr [esp+8H], eax ; 000D _ 89. 44 24, 08
push ecx ; 0011 _ 51
lea eax, [esp+0CH] ; 0012 _ 8D. 44 24, 0C
call _D5test711__T4swapTkZ4swapFKkKkZv ; 0016 _ E8, 00000000(rel)
mov edx, dword ptr [esp+4H] ; 001B _ 8B. 54 24, 04
mov ebx, dword ptr [esp+8H] ; 001F _ 8B. 5C 24, 08
mov dword ptr [esp+4H], ebx ; 0023 _ 89. 5C 24, 04
mov eax, offset FLAT:_main ; 0027 _ B8, 00000000(segrel)
mov dword ptr [esp+8H], edx ; 002C _ 89. 54 24, 08
push ebx ; 0030 _ 53
push 10 ; 0031 _ 6A, 0A
call _D3std5stdio4File14__T5writeTkTaZ5writeMFkaZv; 0033 _ E8, 00000000(rel)
xor eax, eax ; 0038 _ 31. C0
pop ebx ; 003A _ 5B
add esp, 8 ; 003B _ 83. C4, 08
ret ; 003E _ C3
__Dmain ENDP
This confirms that DMD inlines ptrSwap, but not swap. I also did some benchmarks,
and ptrSwap is as fast as manual inlining, but swap is slower by a factor of 2.
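For anyone who wants to try this themselves, here is a rough sketch of how such a comparison could be set up. It is not the benchmark code behind the numbers above; the array length, the repetition count, the reverse-the-array workload and the name refSwap (used instead of swap only to avoid a clash with the library function) are all assumptions made for illustration. It uses benchmark from std.datetime.stopwatch, which is available in current Phobos releases, and would presumably be built with something like dmd -O -inline -release.

```d
import std.datetime.stopwatch : benchmark;
import std.stdio : writefln;

// Same bodies as the test code in this post; only the ref version is renamed.
void refSwap(T)(ref T a, ref T b) {
    T temp = a;
    a = b;
    b = temp;
}

void ptrSwap(T)(T* a, T* b) {
    T temp = *a;
    *a = *b;
    *b = temp;
}

enum size_t N = 1_000;   // hypothetical array length
enum uint reps = 10_000; // hypothetical number of timed repetitions

uint[] data;             // filled in main() before any timing starts

// Reverse the array in place using the ref-based swap.
void reverseWithRef() {
    for (size_t i = 0, j = data.length - 1; i < j; ++i, --j)
        refSwap(data[i], data[j]);
}

// Reverse the array in place using the pointer-based swap.
void reverseWithPtr() {
    for (size_t i = 0, j = data.length - 1; i < j; ++i, --j)
        ptrSwap(&data[i], &data[j]);
}

// Reverse the array in place with the swap written out by hand.
void reverseManual() {
    for (size_t i = 0, j = data.length - 1; i < j; ++i, --j) {
        uint temp = data[i];
        data[i] = data[j];
        data[j] = temp;
    }
}

void main() {
    data = new uint[N];
    foreach (i, ref x; data)
        x = cast(uint) i;

    // benchmark runs each function reps times and returns one Duration per function.
    auto results = benchmark!(reverseWithRef, reverseWithPtr, reverseManual)(reps);
    writefln("ref swap:     %s usecs", results[0].total!"usecs");
    writefln("pointer swap: %s usecs", results[1].total!"usecs");
    writefln("manual swap:  %s usecs", results[2].total!"usecs");
}
```

The timings will depend heavily on compiler version and flags, so treat any ratio this prints as specific to your own setup.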