Yet another strike against the current AA implementation

Sun Apr 26 22:56:28 PDT 2009

dsimcha wrote:
> == Quote from Christopher Wright (dhasenan at gmail.com)'s article
>> Andrei Alexandrescu wrote:
>>> 2. I haven't measured, but the cost of the indirect call is large enough
>>> to make me suspect that opApply is not as efficient as it's cracked to
>>> be, even when compared with an iterator.
>> When you know the type beforehand or can use templates, that is, rather
>> than wrapping your range struct in a wrapper class. If you can't use a
>> template for whatever reason, ranges are going to suck -- three virtual
>> calls rather than one.
>> I don't usually care sufficiently about performance to worry about
>> whether a call is virtual or not, but you brought that issue up before.
>> And I imagine that, most of the time, you will know the range type in
>> advance.
> 
> Rule number 1 of all performance discussions:  Always measure if possible.
> 
> import std.stdio, std.perf;
> 
> struct Direct {
>     uint n;
> 
>     uint front() {
>         return n;
>     }
> 
>     void popFront() {
>         n++;
>     }
> 
>     bool empty() {
>         return n >= 1_000_000_000;
>     }
> }
> 
> struct Apply {
> 
>     int opApply(int delegate(ref uint) dg) {
>         int res = 0;
>         foreach(i; 0U..1_000_000_000) {
>             res = dg(i);
>             if(res) break;
>         }
>         return res;
>     }
> }
> 
> class Virtual {
>     uint n;
> 
>     uint front() {
>         return n;
>     }
> 
>     void popFront() {
>         n++;
>     }
> 
>     bool empty() {
>         return n >= 1_000_000_000;
>     }
> }
> 
> void main() {
>     scope pc = new PerformanceCounter;
>     pc.start;
>     foreach(elem; Direct.init) {}
>     pc.stop;
>     writeln("Direct:  ", pc.milliseconds);
>     pc.start;
>     auto v = new Virtual;
>     foreach(elem; v) {}
>     pc.stop;
>     writeln("Virtual:  ", pc.milliseconds);
>     pc.start;
>     foreach(elem; Apply.init) {}
>     pc.stop;
>     writeln("opApply:  ", pc.milliseconds);
> }
> 
> Output:
> Direct:  2343
> Virtual:  5695
> opApply:  3014
> 
> Bottom line is that these pretty much come in the order you would expect them to,
> but there are no particularly drastic differences between any of them.  To put
> these timings in perspective, 5700 ms for 1 billion iterations is roughly (on a
> 2.7 GHz machine) 15 clock cycles per iteration.  How often does anyone really have
> code that is performance critical *and* where the contents of the loop don't take
> long enough to dwarf the 15 clock cycles per iteration loop overhead *and* you
> need the iteration to be polymorphic?

Did you compile that with optimization/inlining? If not, it's not really 
a fair test... and if so why isn't the compiler getting rid of those 
pointless loops?