Combine Coroutines and Input Ranges for Dead-Simple D Iteration

Wed May 2 03:30:15 PDT 2012

> It may be slow relative to incrementing an integer:

opApply isn't just incrementing an integer - it calls the
loop body through a pointer (a delegate) on every
iteration. A loop that only increments an integer is an
order of magnitude faster. This code (I used assembly
because a compiler would optimize away such a simple
loop) runs in 0.27s on my machine:

     auto sum = 0;
     auto n = 1000_000_000;

     asm
     {
         mov EAX, n;
         mov EBX, sum;
loop:
         dec EAX;
         inc EBX;
         test EAX, EAX;
         jne loop;
         mov sum, EBX;
     }
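
To make the "function through a pointer" point concrete,
here is a minimal sketch of an opApply-based loop. The
names Counter and sumWithOpApply are made up for this
illustration and are not from the code being discussed.
The body of the foreach is handed to opApply as a
delegate, so each iteration is an indirect call unless
the compiler manages to inline it:

```d
// Hypothetical type for illustration only.
struct Counter
{
    int limit;

    // foreach passes the loop body in as the delegate dg.
    int opApply(scope int delegate(int) dg)
    {
        foreach (i; 0 .. limit)
        {
            // Indirect call into the loop body on every iteration.
            if (auto result = dg(i))
                return result; // non-zero: the body used break or return
        }
        return 0;
    }
}

// Hypothetical helper: sums 0 .. limit through opApply.
long sumWithOpApply(int limit)
{
    long sum = 0;
    foreach (i; Counter(limit))
        sum += i;
    return sum;
}
```

Calling sumWithOpApply(1000) performs 1000 delegate calls
to add up the numbers 0 through 999.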

Ranges like iota are often as fast as using a for loop.
For example this code:

     import std.conv : to;
     import std.range : iota;

     auto sum = 0;
     foreach (i; iota(to!int(args[1])))
         sum += i;

runs in 0.52 seconds when compiled with gdc using the
flags -O2 -finline-functions -frelease. When compiled
with -O3, the GCC backend vectorizes the loop using the
paddd instruction and it runs in 0.1s.
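
As a rough sketch of why a range can match a for loop:
an input range is just a struct with empty, front and
popFront, all of which are small enough to inline.
CountUp and sumRange below are hypothetical stand-ins
for illustration, not how iota is actually implemented:

```d
// Hypothetical stand-in for iota, for illustration only.
struct CountUp
{
    int current;
    int limit;

    @property bool empty() const { return current >= limit; }
    @property int front() const { return current; }
    void popFront() { ++current; }
}

// Hypothetical helper: sums 0 .. limit through the range.
long sumRange(int limit)
{
    long sum = 0;
    // foreach over a range calls empty, front and popFront
    // directly; no delegate or context switch is involved.
    foreach (i; CountUp(0, limit))
        sum += i;
    return sum;
}
```

Once the three members are inlined, what remains is
essentially a plain counting loop.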

> And that is for 1000_000_000 Fiber context switches.

I'm not saying that D fibers are slow - fiber context
switches are way faster than thread context switches.
When using them for IO, such as in vibe.d, the overhead
of fibers is negligible. But when used for iteration,
they are way slower than the alternatives, because in
that case there shouldn't be any context switches at all.
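
For comparison, here is a minimal sketch of iterating
through a fiber with core.thread.Fiber. The names
FiberSumResult and sumViaFiber are hypothetical, and the
code only counts switches; it is not a benchmark. Every
element costs one call() into the fiber and one yield()
back out:

```d
import core.thread : Fiber;

// Hypothetical result type for illustration only.
struct FiberSumResult
{
    long sum;
    size_t switchesIn; // number of times control entered the fiber
}

// Hypothetical helper: sums 0 .. limit, producing each value in a fiber.
FiberSumResult sumViaFiber(int limit)
{
    int current;
    auto producer = new Fiber({
        foreach (i; 0 .. limit)
        {
            current = i;
            Fiber.yield(); // switch back to the consumer
        }
    });

    FiberSumResult r;
    while (true)
    {
        producer.call(); // switch into the producer
        ++r.switchesIn;
        if (producer.state == Fiber.State.TERM)
            break; // producer finished, no new value
        r.sum += current;
    }
    return r;
}
```

For a limit of 1000 this enters the fiber 1001 times
(once per element plus a final call that lets it
terminate), whereas the range and for-loop versions do
the same work with no switches at all.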