Purely stack-based generators
Norbert Nemec
Norbert at Nemec-online.de
Thu Mar 18 04:34:42 PDT 2010
I really like that simplified, "yield"-based syntax. However, I believe
that it conflicts fundamentally with D's confusing concept of generators.
Most other languages that offer generators (e.g. Python) view the
generator as an object that spits out values one-by-one. This is very
similar to a co-routine.
Purely for performance reasons, D approaches the issue in a backward
way, passing the loop body as a delegate. Indeed, this may be faster
than full-fledged co-routines, but it is rather difficult to map a
simplified coroutine-like syntax based on 'yield' statements to this
very different concept by mere "syntactic sugar". Furthermore, the D
concept scales awkwardly to combining multiple generators in one loop or
using nested generators.
In fact, there is a simple but powerful way to design purely stack-based
generators, sticking to the straightforward concept of co-routines
without the overhead of full co-routines and without the need for
delegates (inspired by the Sather language):
Make a generator a co-routine but restrict its lifetime to the inside of
a loop. When the loop starts, it calls all the contained generators to
initialize their local data on the regular stack. When the generators
yield, the stack-pointer is *not* reset (as it would be at a return) but
kept at its value. The generator routines can therefore be called again
within the same loop and find their local data still on the stack. When
the loop finishes, the stack pointer is reset to its initial value and
all the local data of all the contained generators is immediately released.
This design does not interfere with the regular use of the stack for
regular function calls within the loop. Nested loops are possible, as
are nested generators or loops that combine values from multiple
generators. All with a straightforward syntax and all based on pure
stack operations that are no more costly than regular function calls and
can be inlined just as easily.
This concept even allows to place calls to generators at arbitrary
positions inside a loop, rather than at the beginning only. The regular
D loop:
foreach(auto x;mygenerator) {
do_something(x);
}
could be written as something like
loop {
do_something(#mygenerator);
}
where I have blindly introduced a "#" operator to signify a generator
call that may either yield a value or break the loop. (I don't care
about the exact syntax, but it should stick out somehow.
This can be generalised as
loop {
do_something();
do_something_else(#mygenerator);
yet_something_else(#anothergenerator);
};
where each expression containing a # generator call may either yield a
value or break the loop.
The clear restriction of this is that "native" generators of this design
are not stand-alone objects, cannot survive outside a loop and work only
within a single thread. More general generators, however, can easily be
implemented as objects that offer a "native" generator as interface.
The only difficulty of this approach is that it has to be done at the
assembler level because a unusual calling convention is needed. Typical
high-level languages do not allow an emulation of this mechanism. I do
not know about byte-code representations.
This whole mechanism is inspired by the Sather languages that first
would have allowed it (even though that compiler never actually
implemented it this way). To my knowledge, all other languages that
offer generators did break this concept by allowing generators to exist
beyond the lifetime of a loop and therefore depend on heap-allocated data.
I don't expect that D would ever adopt such a breaking change, even
though it would be the most efficient implementation of generators while
retaining the conceptually simple definition of co-routines. Still, I
find the concept just so beautiful that I have to talk about it once in
a while...
Greetings,
Norbert
bearophile wrote:
> D currently has two ways to define a generator, the first is the older one with opApply (And opApplyReverse):
> int opApply(int delegate(ref Type [, ...]) dg);
>
>
> The syntax of opApply is terrible: hard to remember, bug-prone, noisy, intrusive. And this syntax doesn't even gain performance, because dmd is not very good at inlining here.
> Several times in the past I have said that a very good syntax for this is the Python one:
>
> def foo():
> yield 5
>
> It's really easy to use and remember, not bug prone. This possible syntax is about 75 times better than the current opApply:
>
> yield(int) foo() {
> yield 5;
> }
>
>
> That's sugar for something like:
>
> struct foo {
> int opApply(int delegate(ref int) dg) {
> int result;
> int r = 5;
> result = dg(r);
> if (result)
> goto END;
> END:
> return result;
> }
> }
>
>
> Python syntax was good enough that C# has copied it.
>
>
> The other way to create a struct/class generator in D2 is with the Range protocol, with the methods that ask for the next item, etc. I have seen that such generators sometimes are harder to write, because you have to manage the state yourself, but they can be more efficient and they are more powerful.
>
> Both ways are useful, because they are useful for different situations. They are one the inverted-out version of the other.
>
>
> The implementations of generators in C# and Python (and D, but not Lua) suffer of a problem, that increases the computational complexity of the code when there is a nested generator.
>
> And both Python and C# have found similar solutions that improve both the syntax and the performance (both languages have not yet implemented it).
>
> Python version:
> http://python.org/dev/peps/pep-0380/
>
> With that this code:
>
> def foo():
> for x in some_iterable:
> yield x
>
>
> Becomes:
>
> def foo():
> yield from some_iterable
>
>
> In C# it can be called "yield foreach":
>
> http://kirillosenkov.blogspot.com/2007/10/yield-foreach.html
>
> http://connect.microsoft.com/VisualStudio/feedback/details/256934/yield-return-to-also-yield-collections
>
> http://blogs.msdn.com/wesdyer/archive/2007/03/23/all-about-iterators.aspx
>
>
> As D2 will start coming out of its embryo state, it too will probably have to fece this problem with iterators. In the meantime I hope the opApply syntax will be replaced by something better (the yield I've shown).
>
> Bye,
> bearophile
More information about the Digitalmars-d
mailing list