Performance of method calls

Wed Nov 29 23:35:58 PST 2006

Bill Baxter wrote:
> Daniel Keep wrote:
> 
>> Hi.
>>
>> I'm currently working on a research project as part of my Honours, and 
>> one of the requirements is speed--the code I'm writing has to be very 
>> efficient.
>>
>> Before I started, my supervisor warned me about object-oriented 
>> programming and that it seems to be much slower than just flat 
>> function calls.
>>
>> Anyway, I was wondering what the difference between three kinds of 
>> function calls would be:
>>
>> 1. Foo x; x.call(); // Where Foo is a struct
>> 2. Foo_call(x); // C-style API
>> 3. auto y = new FooClass; y.call(); // Where call is final
>>
>> I hooked up a test app which used a loop with 100,000 iterations for 
>> each call type, and ran that program 100 times, and averaged the outputs.
>>
>> #1 was 2.84 times slower than #2, and #3 was 3.15 times slower than 
>> #2.  Are those numbers right??  Is it really that much slower?  I 
>> would have thought that they should have been about the same since 
>> each one needs to pass only one thing: a pointer.  I've attached the 
>> test programs I used; if anyone can offer any insight or even 
>> corrections, I'd be very grateful.
>>
>> Incidentally, any other insights regarding performance differences 
>> between OO-style and flat procedural-style would be very welcome.
>>
>>     -- Daniel
>>
>>
>> ------------------------------------------------------------------------
>>
>> #!/bin/bash
>>
>> for I in {1..100}; do
>>     ./struct_calls
>> done | awk '
>>     BEGIN {sum1 = 0; sum2 = 0; count = 0;}
>>     {sum1 += $5; sum2 += $11; count += 1;
>>      print $1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11;}
>>     END {print "Averages: " sum1/count ", " sum2/count}
>> ' | tee struct_calls.log
>>
>>
>>
>> ------------------------------------------------------------------------
>>
>> module struct_calls;
>>
>> import std.perf;
>> import std.stdio;
>>
>> const COUNT = 100_000u;
>>
>> struct Foo
>> {
>>     uint dummy;
>>
>>     void call()
>>     {
>>         //this.dummy = arg;
>>     }
>> }
>>
>> class FooClass
>> {
>>     uint dummy;
>>
>>     final void call()
>>     {
>>         //this.dummy = arg;
>>     }
>> }
>>
>> void Foo_call(Foo* _this)
>> {
>>     //_this.dummy = arg;
>> }
>>
>> void main()
>> {
>>     Foo x;
>>
>>     scope perf = new HighPerformanceCounter();
>>
>>     perf.start();
>>     for( uint i=0; i<COUNT; i++ )
>>         x.call();
>>     perf.stop();
>>
>>     // count1: x.call()
>>     auto count1 = perf.periodCount();
>>
>>     perf.start();
>>     for( uint i=0; i<COUNT; i++ )
>>         Foo_call(&x);
>>     perf.stop();
>>
>>     // count2: Foo_call(&x)
>>     auto count2 = perf.periodCount();
>>
>>     scope y = new FooClass();
>>
>>     perf.start();
>>     for( uint i=0; i<COUNT; i++ )
>>         y.call();
>>     perf.stop();
>>
>>     // count3: y.call()
>>     auto count3 = perf.periodCount();
>>
>>     writefln("%d / %d = %f -- %d / %d = %f",
>>             count1, count2, cast(real)count1 / cast(real)count2,
>>             count3, count2, cast(real)count3 / cast(real)count2);
>> }
>>
> 
> 
> A call to a final method shouldn't be any slower than straight funciton 
> call.  But I have heard grumblings that final in D doesn't actually work 
> as spec'ed.
> 
> But anyway the main point of this reply is that even if method call is 
> 3x slower than function call, it really means nothing if function call 
> overhead is 0.01% of your run time to begin with.  There's a popular 
> thing to teach in computer architecture classes called "Amdahl's Law", 
> and basically to paraphrase it says don't fixate on optimizing things 
> that represent trivial fractions of the overall runtime to begin with. 
> For example if method calls represents 0.03% of your overall runtime the 
> BEST speedup you could achieve by eliminating all calls entirely would 
> be 0.03%.  Pretty trivial.  It's like trying to figure out how to 
> improve driving times from home to work, and focusing on increasing your 
> speed in your driveway.  Even if you achieve a 100x speedup in how long 
> you spend driving on your driveway, you've still only sped up the hour 
> long drive by a second or two.  Not worth the effort.
> 
> In other words, for all but the most rare of cases, the benifits reaped 
> in terms of code reability, maintainablity, and speed of development 
> when using virtual functions vastly outweighs the tiny cost.
> 
> What you should do is put something representative of the work you 
> actually plan to do inside those functions in your test program and see 
> if the 3x difference in call speed actually makes any significant 
> difference.
> 
> --bb

I'm well aware that the time it takes to call into the function and pass 
back out may be a very small amount of the overall time, but when 
there's a 2-3x difference between supposedly identical code, that puts 
up red warning flags in my head.  If this is so much slower, what other 
things are slower?

Also, normally I would agree that the benefits of "elegant" code 
outweigh the performance drop.  But not in this case.  I'm not sure if I 
can go into specifics, but the system will potentially be simulating 
millions of individual agents across a networked cluster, with the 
eventual aim to become faster than realtime.

As for putting code representative of what it'll be doing... I can't at 
this stage since I'm still building the utility libraries.  That 
representative code doesn't exist yet.

I don't want to get bogged down in premature optimisation, but I also 
want to eliminate any potential bottlenecks early if it isn't too much 
effort.

In any case, I just want to understand where the difference is coming 
from.  If I understand the difference, I can hopefully make better 
decisions.  My supervisor originally wanted the code written in C, and I 
want to prove that D is more than up to the task.

	-- Daniel

P.S.  To give you an idea of how badly I want to use D: I ported CWEB 
over to support D :)