Purely stack-based generators

Thu Mar 18 04:34:42 PDT 2010

I really like that simplified, "yield"-based syntax. However, I believe 
that it conflicts fundamentally with D's confusing concept of generators.

Most other languages that offer generators (e.g. Python) view the 
generator as an object that spits out values one-by-one. This is very 
similar to a co-routine.

Purely for performance reasons, D approaches the issue in a backward 
way, passing the loop body as a delegate. Indeed, this may be faster 
than full-fledged co-routines, but it is rather difficult to map a 
simplified coroutine-like syntax based on 'yield' statements to this 
very different concept by mere "syntactic sugar". Furthermore, the D 
concept scales awkwardly to combining multiple generators in one loop or 
using nested generators.

In fact, there is a simple but powerful way to design purely stack-based 
generators, sticking to the straightforward concept of co-routines 
without the overhead of full co-routines and without the need for 
delegates (inspired by the Sather language):

Make a generator a co-routine but restrict its lifetime to the inside of 
a loop. When the loop starts, it calls all the contained generators to 
initialize their local data on the regular stack. When the generators 
yield, the stack-pointer is *not* reset (as it would be at a return) but 
kept at its value. The generator routines can therefore be called again 
within the same loop and find their local data still on the stack. When 
the loop finishes, the stack pointer is reset to its initial value and 
all the local data of all the contained generators is immediately released.

This design does not interfere with the regular use of the stack for 
regular function calls within the loop. Nested loops are possible, as 
are nested generators or loops that combine values from multiple 
generators. All with a straightforward syntax and all based on pure 
stack operations that are no more costly than regular function calls and 
can be inlined just as easily.

This concept even allows to place calls to generators at arbitrary 
positions inside a loop, rather than at the beginning only. The regular 
D loop:
     foreach(auto x;mygenerator) {
         do_something(x);
     }
could be written as something like
     loop {
         do_something(#mygenerator);
     }
where I have blindly introduced a "#" operator to signify a generator 
call that may either yield a value or break the loop. (I don't care 
about the exact syntax, but it should stick out somehow.

This can be generalised as
     loop {
         do_something();
         do_something_else(#mygenerator);
         yet_something_else(#anothergenerator);
     };
where each expression containing a # generator call may either yield a 
value or break the loop.

The clear restriction of this is that "native" generators of this design 
are not stand-alone objects, cannot survive outside a loop and work only 
within a single thread. More general generators, however, can easily be 
implemented as objects that offer a "native" generator as interface.

The only difficulty of this approach is that it has to be done at the 
assembler level because a unusual calling convention is needed. Typical 
high-level languages do not allow an emulation of this mechanism. I do 
not know about byte-code representations.

This whole mechanism is inspired by the Sather languages that first 
would have allowed it (even though that compiler never actually 
implemented it this way). To my knowledge, all other languages that 
offer generators did break this concept by allowing generators to exist 
beyond the lifetime of a loop and therefore depend on heap-allocated data.

I don't expect that D would ever adopt such a breaking change, even 
though it would be the most efficient implementation of generators while 
retaining the conceptually simple definition of co-routines. Still, I 
find the concept just so beautiful that I have to talk about it once in 
a while...

Greetings,
Norbert

bearophile wrote:
> D currently has two ways to define a generator, the first is the older one with opApply (And opApplyReverse):
> int opApply(int delegate(ref Type [, ...]) dg);
> 
> 
> The syntax of opApply is terrible: hard to remember, bug-prone, noisy, intrusive. And this syntax doesn't even gain performance, because dmd is not very good at inlining here.
> Several times in the past I have said that a very good syntax for this is the Python one:
> 
> def foo():
>   yield 5
> 
> It's really easy to use and remember, not bug prone. This possible syntax is about 75 times better than the current opApply:
> 
> yield(int) foo() {
>   yield 5;
> }
> 
> 
> That's sugar for something like:
> 
> struct foo {
>     int opApply(int delegate(ref int) dg) {
>         int result;
>         int r = 5;
>         result = dg(r);
>         if (result)
>             goto END;
>       END:
>         return result;
>     }
> }
> 
> 
> Python syntax was good enough that C# has copied it.
> 
> 
> The other way to create a struct/class generator in D2 is with the Range protocol, with the methods that ask for the next item, etc. I have seen that such generators sometimes are harder to write, because you have to manage the state yourself, but they can be more efficient and they are more powerful.
> 
> Both ways are useful, because they are useful for different situations. They are one the inverted-out version of the other.
> 
> 
> The implementations of generators in C# and Python (and D, but not Lua) suffer of a problem, that increases the computational complexity of the code when there is a nested generator.
> 
> And both Python and C# have found similar solutions that improve both the syntax and the performance (both languages have not yet implemented it).
> 
> Python version:
> http://python.org/dev/peps/pep-0380/
> 
> With that this code:
> 
> def foo():
>   for x in some_iterable:
>     yield x
> 
> 
> Becomes:
> 
> def foo():
>   yield from some_iterable
> 
> 
> In C# it can be called "yield foreach":
> 
> http://kirillosenkov.blogspot.com/2007/10/yield-foreach.html
> 
> http://connect.microsoft.com/VisualStudio/feedback/details/256934/yield-return-to-also-yield-collections
> 
> http://blogs.msdn.com/wesdyer/archive/2007/03/23/all-about-iterators.aspx
> 
> 
> As D2 will start coming out of its embryo state, it too will probably have to fece this problem with iterators. In the meantime I hope the opApply syntax will be replaced by something better (the yield I've shown).
> 
> Bye,
> bearophile