Hello D programmers.

I'm implementing a simulation in D that will likely make sustained and intense use of method calls. Naturally, I wanted to determine the overhead of various types of call invocation. Throwing all advice against "premature optimization" aside, I wrote a short program to test this.

One of the primary questions I wanted answered was "Would marking methods with the 'final' attribute help the compiler optimize those calls into something very close to a simple function call?"

Here are the class definitions to test the idea:

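// ***********************
// j: counter bumped by every method (its declaration was not shown; assumed module-level)
static int j;

static void fnmeth() { j++; }

class A
{
	void Ameth() { j++; }
}

class B : A
{
	final void Bmeth() { j++; }
}

final class C : A
{
	void Cmeth() { j++; }
}
// *****************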

The first definition is a simple static function. I'll use this as a baseline.
The second one is a simple virtual method.
The third is a method explicitly marked as 'final'.
The fourth is a class marked as 'final', which implies that all of its methods are final.

The j++ bump simply gives each method a body and may keep it from being optimized out of existence early on.
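To make the difference between these variants concrete, here is a small sketch of why a call through a base-class reference normally has to go through the vtable while a final method does not. It targets a current D2 compiler rather than the gdc 0.25 / D1 setup used here. The `Sub` class, the `callThroughBase`/`callFinal` helpers and the integer return values are hypothetical, added only so the dispatch target can be observed.

```d
// Sketch: virtual vs. final dispatch. Names A/B/C and Ameth/Bmeth/Cmeth mirror
// the post; Sub, callThroughBase, callFinal and the return values are hypothetical.
class A
{
    int Ameth() { return 1; }            // virtual: a subclass may override it
}

class B : A
{
    final int Bmeth() { return 2; }      // final: no subclass may override it
}

final class C : A
{
    int Cmeth() { return 3; }            // implicitly final: C cannot be subclassed
}

class Sub : A
{
    override int Ameth() { return 10; }  // overrides the virtual method
}

// Static type is A, so the target of Ameth() is only known at run time.
int callThroughBase(A a) { return a.Ameth(); }

// Bmeth is final, so the target is known at compile time.
int callFinal(B b) { return b.Bmeth(); }

void main()
{
    import std.stdio : writeln;
    writeln(callThroughBase(new A));     // 1
    writeln(callThroughBase(new Sub));   // 10: dispatched to the override
    writeln(callFinal(new B));           // 2
    writeln((new C).Cmeth());            // 3
}
```

Because `Sub` can override `Ameth`, `callThroughBase` must look up its target at run time. Nothing can override `Bmeth` or `Cmeth`, so in principle those calls can be bound statically; whether gdc actually does so is what the disassembly results check.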

I ran each of these calls 4 million times to compare their performance. My program uses the Tango StopWatch object for the timing. Unfortunately, I'm running under VMware, and as I found out later, the timers are unreliable when running inside a VM.
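A minimal sketch of such a timing loop follows. It assumes a current D2 compiler and Phobos, and it swaps Tango's StopWatch for Phobos's `std.datetime.stopwatch.StopWatch`. The `timeCalls` helper and the module-level counter `j` are hypothetical, since the original harness wasn't posted. Wall-clock numbers taken inside a VM may still be untrustworthy, as noted above.

```d
// Sketch of the timing harness (hypothetical; uses Phobos instead of Tango).
import core.time : Duration;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

enum N = 4_000_000;   // calls per variant, as in the post

int j;                // counter bumped by every method (assumed)

void fnmeth() { j++; }

class A { void Ameth() { j++; } }
class B : A { final void Bmeth() { j++; } }
final class C : A { void Cmeth() { j++; } }

// Runs `call` N times and returns the elapsed wall-clock time.
// `call` is an alias template parameter, so it is bound at compile time
// (a direct call an optimizer can inline) rather than an indirect delegate call.
Duration timeCalls(alias call)()
{
    auto sw = StopWatch(AutoStart.yes);
    foreach (_; 0 .. N)
        call();
    return sw.peek;
}

void main()
{
    A a = new A;
    B b = new B;
    C c = new C;

    writefln("static function: %s", timeCalls!fnmeth());
    writefln("virtual method:  %s", timeCalls!(() => a.Ameth())());
    writefln("final method:    %s", timeCalls!(() => b.Bmeth())());
    writefln("final class:     %s", timeCalls!(() => c.Cmeth())());
}
```

Because `a`, `b` and `c` are created right next to the calls, an optimizer may be able to see their dynamic types and devirtualize even the `Ameth` call, so inspecting the disassembly remains the more reliable check.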

The upside of all this is that I was forced to disassemble the executable to determine what sort of optimizations gdc performed. This is actually a good thing, since it tells you more about the code generator than you would otherwise be able to see.

Here are the results:

draco% gdc --version 
gdc (GCC) 4.2.3 20080225 (prerelease gdc 0.25 20071215, using dmd 1.022) (Ubuntu 0.25-4.2.3-2ubuntu2)

(no optimization)

Call Type              Call Overhead (x86 instructions)   Internal Method Overhead (x86 instructions)
--------------------   --------------------------------   -------------------------------------------
static function call   1 (direct call)                    4 (push, mov, pop, ret)
virtual method call    7 (indirect call)                  8 (push, mov, sub, mov, mov, call, leave, ret)
final method call      3 (setup & direct call)            8 (same as above)
final class            7 (indirect call)                  8 (same as above)

(optimization level 1)

Call Type              Call Overhead (x86 instructions)   Internal Method Overhead (x86 instructions)
--------------------   --------------------------------   -------------------------------------------
static function call   1 (direct call)                    4 (push, mov, pop, ret)
virtual method call    3 (indirect call)                  8 (push, mov, sub, mov, mov, call, leave, ret)
final method call      3 (setup & direct call)            8 (same as above)
final class            3 (indirect call)                  8 (same as above)

(optimization level 2 yielded no improvements in call setup or overhead, but inlined the function call)

(optimization level 3 yielded no significant improvements over that)

(Different instructions of course take different numbers of cycles and have different dispatch costs, so the instruction counts are only a rough guide.)

Interesting conclusions:

(1) At -O1, final and fully virtual methods have the same call overhead (3 instructions) and the same internal method overhead (8 instructions). The only difference is that the final method is called directly.

(2) With no optimization, the final method call setup is 3 instructions while the final class call setup is 7 instructions. At -O1 both drop to 3 instructions, although the final class call remains indirect.

(3) Internal method overhead for any class method invocation is roughly 2.7x the call setup overhead at -O1 (8 instructions vs. 3). This is something I hadn't considered.

Lesson learned:

If you're looking for efficiency in frequently invoked methods, consider other ways of getting to the data. In other words, avoid overusing get()/set()-style calls, and don't force everything into extreme levels of encapsulation just to be "politically correct". You will pay the price.
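A minimal sketch of one way to act on this without abandoning encapsulation entirely, using a hypothetical `Particle` class: expose hot data as a plain field, or at least mark accessors `final` so each call is direct and an optimizer may inline it. A virtual getter stays an indirect call unless the compiler can prove the dynamic type. This illustrates the trade-off and is not a measured result.

```d
// Hypothetical example: field access vs. virtual and final accessors.
class Particle
{
    double x;                             // public field: read with a plain load

    private double _y;
    double y() { return _y; }             // virtual accessor: indirect call
    final double yFinal() { return _y; }  // final accessor: direct, inlinable call

    this(double x, double y) { this.x = x; _y = y; }
}

// Sums the public field; no call per element.
double sumX(Particle[] ps)
{
    double total = 0;
    foreach (p; ps)
        total += p.x;
    return total;
}

// Sums via the final accessor; one direct call per element unless inlined.
double sumY(Particle[] ps)
{
    double total = 0;
    foreach (p; ps)
        total += p.yFinal();
    return total;
}

void main()
{
    import std.stdio : writeln;
    auto ps = [new Particle(1, 2), new Particle(3, 4)];
    writeln(sumX(ps));
    writeln(sumY(ps));
}
```

Marking the accessor `final` keeps the get()/set() interface while removing the vtable lookup. A public field goes further and removes the call altogether.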

BTW: Does anyone know why gdc emits "mov    %esi,(%esp)"? It appears two instructions before a direct call. Why would you want to store the contents of the source index register (ESI) at the top of the stack? I'm not doing any string or byte operations.



