[OT] Stackless fibers/coroutines

Sun Sep 25 11:16:03 PDT 2016

On 09/25/2016 06:39 AM, Jacob Carlborg wrote:
>
> I find it quite difficult to find correct definitions with many of these
> terms like: coroutine, fiber, stackless thread, resumeable function.
> Because many languages use the same name of different things and they
> use different names for the same things.
>

FWIW, and this is at least the way I see it:
(BTW, any domain experts please chime in if I'm wrong. I'm thinking it 
may be worth reposting this as a blog entry, as long as I'm not too far 
off-base.)

Fiber:
------
Like a thread, but lighter-weight. Multiple fibers can run within one 
thread.

Like a thread, a fiber has it's own stack, and upon suspend/resume 
various state is saved/restored (I assume that "state" means "CPU 
registers").

Unlike a thread, a fiber's multitasking is sequential and cooperative, 
NOT simultaneous and preemptive: Ie, One fiber must willfully relinquish 
control (by calling a "yield" function) before another fiber in the same 
thread can resume. I'm assuming this sequential/cooperative what allows 
it to be lighter-weight than a true thread, but I'm not 100% certain.

Coroutine:
----------
A function that can "yield" (ie, it can choose to temporarily exit so 
that it can be resumed from the same place later on). Often a return 
value is emitted upon yielding, so these are usually used as generators, 
because they're a convenient way to generate a series of values in an 
easy procedural style.

Coroutines are commonly implemented as fibers, so it is often said that 
fibers and coroutines are the same thing. That's partially accurate, 
except for two things:

1. "Coroutine" refers to a specific function, whereas "fiber" refers to 
the overall sequence of execution.

2. This is probably a debatable matter of semantics, but coroutines 
don't have to be implemented using fibers. For example, opApply is 
essentially a coroutine (it yields by calling the delegate that was 
passed in).

Stackless Thread/Fiber:
-----------------------
Not a true thread or fiber. It's called a "stackless thread" or 
"stackless fiber" simply because it's *like* a thread or fiber (closer 
to a fiber, really), but without a separate stack for each, 
umm..."fiber". Protothreads is an example of a stackless thread/fiber.

A stackless thread/fiber is much more lightweight than even a fiber. But 
it comes with a limitation:

With a true fiber, any function in the callstack can yield. When it 
yields, it suspends the fiber's entire callstack. So inside a fiber, 
your coroutine can call a function and have *that* function yield your 
fiber. That feature requires a separate callstack for each fiber, so you 
cannot do that with the stackless versions.

Instead, in a stackless thread/fiber, yielding will ONLY suspend the 
current function, not a full callstack (since it doesnt have a callstack 
of its own to suspend).

In other words, yielding a stackless thread/fiber works much like a 
normal "return" statement:

-------------------
void foo() {
     bar();
}
void bar() {
     return; // ONLY return from bar

     return from BOTH bar AND foo; // IMPOSSIBLE!!!
}
-------------------

A stackless thread/fiber works the same (psuedocode):

-------------------
void coroutineFoo() {
     coroutineBar();
}
void coroutineBar() {

     // Stackless fiber: ONLY suspends coroutineBar.
     // Regular fiber: Suspends ENTIRE callstack.
     yield;

     yield from BOTH bar AND foo;// IMPOSSIBLE with stackless

     // More code here....
}
-------------------

Luckily, you can work around it, just like you can with normal returning 
(psuedocode):

-------------------
void coroutineFoo() {
     bool shouldYield = coroutineBar();
     if(shouldYield)
         yield;

     // More code here....
}
// Yielded value: Should the caller yield?
bool coroutineBar() {
     yield true; // Yield both bar and foo
     // More code here....
}
-------------------

I'd argue opApply's approach technically qualifies as a stackless 
thread/fiber, since it yields (by calling the delegate it received) and 
does not involve any extra callstack. Plus, like any other stackless 
thread/fiber, it cannot suspend any further up the callstack than just 
itself.

However, from what I've seen, a stackless thread/fiber is typically 
implemented the way Protothreads works:

When the "stackless coroutine" function is called, the first thing done 
inside the function is jump to wherever the function last left off.

Then, every "yield" is constructed as an *actual* "return" followed by a 
jump label. When a function yields, it returns three things:

1. An optional value being yielded (like a normal return value).

2. Which jump label to jump to next time the function is resumed.

3. Any local variables that need to be preserved upon resume (or just 
all local variables).

With a really nice stackless thread/fiber system, like C#'s, this is all 
handled behind-the-scenes.

Stackless Coroutine:
--------------------
A coroutine that uses a stackless thread/fiber instead of a normal fiber.

Resumable Function:
-------------------
This sounds like a vague general term to me. I would think it casually 
refers to any function that can be suspended/resumed: A Fiber-based 
coroutine, a stackless coroutine, or whatever else may or may not exist.

> Ah, right. Previously when I looked at this I only saw examples which
> stored the jump label. That might be possible to automatically rewrite
> in a AST macro.
>
> Would it be possible to store the state in a TLS variable inside the
> function to avoid having to pass in the state?
>

I don't see why not, but there would be one notable drawback: You could 
only have one instance existing at a time (per thread).

Ex: If you pass the state through params and return values instead of a 
static function variable, you can do this:

----------------------
// A manual stackless coroutine in D
// Would be much simpler with built-in support

struct YieldInfo {
     State state;
     int yieldedValue;
}

struct State {
     int jumpLabel = 1;
     // Any additional locals here
}

// Generate three multiples of 'multiplier', repeating forever
YieldInfo generateMultiples(
     State state, int multiplier
) {
     switch(state.jumpLabel) {
     case 1:
         // Yield first multiple
         return YieldInfo(State(2), 1 * multiplier);

     case 2:
         // Yield second multiple
         return YieldInfo(State(3), 2 * multiplier);

     case 3:
         // Yield third multiple
         return YieldInfo(State(1), 3 * multiplier);
     }
}

void main()
{
     // One instance of generateMultiples
     YieldInfo a;

     a = generateMultiples(a, 2); assert(a.yieldedValue == 2);
     a = generateMultiples(a, 2); assert(a.yieldedValue == 4);
     a = generateMultiples(a, 2); assert(a.yieldedValue == 6);
     a = generateMultiples(a, 2); assert(a.yieldedValue == 2);

     // Do another instance at the same time:
     YieldInfo b;

     b = generateMultiples(b, 5); assert(b.yieldedValue == 5);
     a = generateMultiples(a, 2); assert(a.yieldedValue == 4);
     b = generateMultiples(b, 5); assert(b.yieldedValue == 10);
     a = generateMultiples(a, 2); assert(a.yieldedValue == 6);
     b = generateMultiples(b, 5); assert(b.yieldedValue == 15);
}
----------------------

Actual stackless coroutine support would make that much simpler of course:

----------------------
CoroutineInputRange!int generateMultiples(int multiple) {
     while(true) {
         foreach(i; 1...4)
             yield i * multiple;

     }
}
void main()
{
     auto a = generateMultiples(2);

     assert(a.front == 2);
     a.popfront(); assert(a.front == 4);
     a.popfront(); assert(a.front == 6);
     a.popfront(); assert(a.front == 2);

     auto b = generateMultiples(5);

     assert(b.front == 5);
     a.popfront(); assert(a.front == 4);
     b.popfront(); assert(b.front == 10);
     a.popfront(); assert(a.front == 6);
     b.popfront(); assert(b.front == 15);
}
----------------------

Naturally, if the state was stored as the function's static locals, you 
woudn't be able to do that. Instance "b" would clobber instance "a".

>> 2. Compared to C, D has much stricter rules about where a switch's
>> "case" labels can do. Ie, they can't really cross scope boundaries. So
>> *all* the coroutine's "yield" statements would have to be within the
>> same scope *and* at the same nesting-level (or at the very least,
>> there'd be some big annoying restrictions). Otherwise, when it gets
>> converted to a "case:", the resulting code would be invalid.
>
> Do you have any examples?
>

I'd have to look it up and refresh my memory. I just remember D has more 
restrictions regarding jumping to labels than C has. Something like:

------------------
State myCoroutine(State s) {
     yield "hello";

     if(foo) {
         yield " world";
     }

     yield "!";
}
------------------

A Protothreads-like preprocessing approach that used switch/case would 
turn that into:

------------------
State myCoroutine(State s) {
switch(s.jumpLabel){
     case 0:
     // Start converted code: //////////////

     //yield "hello"; -> return+case:
     return State(1, "hello");
     case 1:

     if(int foo = getFoo()) {
         //yield " world"; ->
         return State(2, " world"); return+case:
         case 2:
     }

     //yield "!"; -> return+case:
     return State(3, "!");
     case 3:

     // End converted code //////////////
     break;
}
}
------------------

That "if" really messed things up. AIUI, I don't think D will allow 
things like that.

C doesn't have so much of a problem with any of this, it famously just 
permits pretty much anything, which is why Protothreads works there.