Yet another strike against the current AA implementation
Robert Fraser
fraserofthenight at gmail.com
Sun Apr 26 22:56:28 PDT 2009
dsimcha wrote:
> == Quote from Christopher Wright (dhasenan at gmail.com)'s article
>> Andrei Alexandrescu wrote:
>>> 2. I haven't measured, but the cost of the indirect call is large enough
>>> to make me suspect that opApply is not as efficient as it's cracked to
>>> be, even when compared with an iterator.
>> When you know the type beforehand or can use templates, that is, rather
>> than wrapping your range struct in a wrapper class. If you can't use a
>> template for whatever reason, ranges are going to suck -- three virtual
>> calls rather than one.
>> I don't usually care sufficiently about performance to worry about
>> whether a call is virtual or not, but you brought that issue up before.
>> And I imagine that, most of the time, you will know the range type in
>> advance.
>
> Rule number 1 of all performance discussions: Always measure if possible.
>
> import std.stdio, std.perf;
>
> struct Direct {
> uint n;
>
> uint front() {
> return n;
> }
>
> void popFront() {
> n++;
> }
>
> bool empty() {
> return n >= 1_000_000_000;
> }
> }
>
> struct Apply {
>
> int opApply(int delegate(ref uint) dg) {
> int res = 0;
> foreach(i; 0U..1_000_000_000) {
> res = dg(i);
> if(res) break;
> }
> return res;
> }
> }
>
> class Virtual {
> uint n;
>
> uint front() {
> return n;
> }
>
> void popFront() {
> n++;
> }
>
> bool empty() {
> return n >= 1_000_000_000;
> }
> }
>
> void main() {
> scope pc = new PerformanceCounter;
> pc.start;
> foreach(elem; Direct.init) {}
> pc.stop;
> writeln("Direct: ", pc.milliseconds);
> pc.start;
> auto v = new Virtual;
> foreach(elem; v) {}
> pc.stop;
> writeln("Virtual: ", pc.milliseconds);
> pc.start;
> foreach(elem; Apply.init) {}
> pc.stop;
> writeln("opApply: ", pc.milliseconds);
> }
>
> Output:
> Direct: 2343
> Virtual: 5695
> opApply: 3014
>
> Bottom line is that these pretty much come in the order you would expect them to,
> but there are no particularly drastic differences between any of them. To put
> these timings in perspective, 5700 ms for 1 billion iterations is roughly (on a
> 2.7 GHz machine) 15 clock cycles per iteration. How often does anyone really have
> code that is performance critical *and* where the contents of the loop don't take
> long enough to dwarf the 15 clock cycles per iteration loop overhead *and* you
> need the iteration to be polymorphic?
Did you compile that with optimization/inlining? If not, it's not really
a fair test... and if so why isn't the compiler getting rid of those
pointless loops?
More information about the Digitalmars-d
mailing list