Internal and external iteration, fibers

Mon Jan 21 14:54:13 PST 2013

On Mon, 21 Jan 2013 21:15:48 +0100
"Rob T" <alanb at ucora.com> wrote:

> On Monday, 21 January 2013 at 08:44:10 UTC, Nick Sabalausky wrote:
> >> I ask, because if except for the overhead, fibers are a good 
> >> general solution, then it makes sense to determine if anything 
> >> can be done to lessen the overhead before trying to implement 
> >> yet another solution.
> >> 
> >
> > Yea, they *would* be a good general solution, and I'm sure the 
> > overhead
> > can be decreased, but AIUI I don't think it's even theoretically
> > possible for them to optimized enough to compete with any of 
> > the other
> > approaches as a general-purpose thing.
> >
> > The other stuff can be optimized down as far as being little 
> > more than
> > an increment per iteration. But AIUI, using a real fiber would 
> > require
> > two context-switches per iteration, so I'm not sure that can 
> > ever
> > compete.
> 
> I have not yet had time to actually use fibers for anything real, 
> but I did play around with them and the usage seemed to be 
> exactly what we're looking for to implement co-routines.

Their usage, yes, and that does make it a very enticing prospect. And
they are indeed fantastic for many things. But as soon as you start
using them to implement something like:

coroutine int myCoroutine()
{
    yield 10;
    yield 20;

    if(blah)
        yield 30;

    foreach(i; arr)
        yield i;

    //etc
}

In other words, exactly the sorts of use-cases that coroutines are
perfectly suited for. Then you have to consider that your coroutine
may get used for anything from:

foreach(i; myCoroutine)
{
    string result = curl("http://example.com/"~to!string(i));
    writeFile(to!string(i)~".txt", result);
}

to things like:

int sum = myCoroutine.reduce!"a+b"();

The first case, with the I/O, is perfectly fine even if the coroutine
operates using fibers. But the second example, as sleek and deceptively
appealing as it looks, would have very poor performance because it's
doing a ton of context switching even though all it *really* needs to
do is grab a bunch of ints and add them.

OTOH, *both* examples would be perfectly fine if the coroutine were
trivially rewritten behind-the-scenes into a stackless event-loop that
*didn't* use fibers:

struct myCoroutine_localVars
{
    int state=0; // Simulate the instruction pointer
    int i; // The foreach loop variable
}
enum IsDone {Yes, No}
IsDone myCoroutine(ref myCoroutine_localVars context, ref int
yieldValue)
{
    switch(context.state)
    {
    case 0:
        context.state = 1;
        yieldValue = 10;
        return IsDone.No; // yield 10

    case 1:
        context.state = 2;
        yieldValue = 20;
        return IsDone.No; // yield 20

    case 2:
        if(blah)
        {
            context.state = 3;
            yieldValue = 30;
            return IsDone.No; // yield 30;
        }
        context.state = 3;
        goto case 3:

    case 3:
        context.i = 0; // Start for loop
        context.state = 4;
        goto case 4:

    case 4:
        if(context.i < arr.length)
        {
            context.i++;
            yieldValue = arr[context.i - 1];
            return IsDone.No; // yield i
        }
        context.state = 5;
        goto case 5:

    case 5:
        return IsDone.Yes;
    }
}

// foreach(val; myCoroutine) {...stuff...}
//   becomes -->
myCoroutine_localVars context;
int yieldValue;
while(myCoroutine(context, yieldValue) == IsDone.No)
{
    int val = yieldValue;
    ...stuff...
}

Note that looks similar to an equivalent hand-written input
range. So you'd have the performance of an input range in *all*
situations, while still having the convenience of coroutine syntax. (My
understanding is that this is how C#'s coroutines work. And I *know*
that's how C/C++'s protothreads work.) Note that transformation can
be easily automated since C/C++'s protothreads does exactly that
using nothing more than some very simple preprocessor macros.

Real fibers are enticing as an easy way to implement coroutines, but
once implemented, the only thing fibers would really accomplish here is
to slow things down for certain use-cases.

Well, also, real fibers would allow yielding from a function called
by myCoroutine. But even without fibers there are ways to get around
that limitation (as demonstrated by C/C++'s protothreads library).

> Since 
> you have experience using them for real, do you think they are 
> useful in general assuming that the performance can be made par 
> with the stackless select switch method?
> 

If they could, then yes, but I don't believe that's a realistic
possibility. (Others can correct me if I'm wrong, but I'm fairly
sure.)

> By chance, do you know if the current implementation is a 
> built-in language feature or if it's implemented entirely as a 
> library?
> 

In short, I don't know. Others could answer better.

In long:

I *think* they're implemented in druntime, or at least their core
functionality anyway. I don't know whether or not the druntime
implementation relies on any special compiler support. Others can
probably answer that.

I do know that D's fibers don't use the OS-provided fibers (at
least on Windows) since, at least on Windows, the OS fibers are
generally considered to not be very fast or good.