pure generators

Wed May 12 16:46:30 PDT 2010

This is a post of unfocused musings, so you can ignore this post if you have better things to do.

'generators' are something like structs that contain an opApply or the methods of the Range protocol:

import std.stdio: write;

struct XPrimes {
    int stop;

    int opApply(int delegate(ref int) dg) {
        int result;
        foreach (x; 2 .. stop) {
            bool all = true;
            foreach (i; 2 .. x)
                if (!(x % i)) {
                    all = false;
                    break;
                }
            if (all) {
                result = dg(x);
                if (result)
                    break;
            }
        }
        return result;
    }
}

void main() {
    foreach (p; XPrimes(30))
        write(p, " ");
}

This is how you can do the same thing in Python, the code is a bit shorter:

>>> p = (x for x in xrange(2,30) if all(x % i for i in xrange(2,x)))
>>> list(p)
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
>>> list(p)
[]

That code also shows that Python generators can be used only once. So using Range speech, they are all "Input Ranges", I presume to avoid indemonstrable assumptions about their nature that can lead to bugs.

In D the situation is different, this prints the primes two times:

import std.stdio: write, writeln;

class XPrimes {
    int stop;
    this(int s) { stop = s; }

    int opApply(int delegate(ref int) dg) {
        int result;
        foreach (x; 2 .. stop) {
            bool all = true;
            foreach (i; 2 .. x)
                if (!(x % i)) {
                    all = false;
                    break;
                }
            if (all) {
                result = dg(x);
                if (result)
                    break;
            }
        }
        return result;
    }
}

void main() {
    auto primes = new XPrimes(30);
    foreach (p; primes)
        write(p, " ");
    writeln();
    foreach (p; primes)
        write(p, " ");
}

In few more years of D learning I hope to understand why D iteration is designed that way, and if there are problems in it.

In practice a generator like:

(x for x in xrange(2,N) if all(x % i for i in xrange(2,x)))

is kind of pure, because despite being lazy, the output it generates is only influences by an input parameter N. So it's a Forward Range in D speech.

In theory the D Ranges can grow have a standard constant that the programmer can add to tell if it's an Input Range or Forward Range. I don't know if this is a good idea, the compiler can't verify that this constant value is right.

So in practice the idea of pure generators can be seen as a way (that the compiler is able to verify) that a generator is not just an Input Range, but a Forward Range. It can be possible to test it using similar ways used to test that a function marked with pure is actually pure.

What can pure generators be useful for in D?

In Haskell many "generators" are pure, this allows to do things like:

fibonacci = 1 : 1 : zipWith (+) fibonacci (tail fibonacci)
fibs = take 20 fibonacci
main = print fibs

That prints:
[1,1,2,3,5,8,13,21,34,55,89,144,233,377,610,987,1597,2584,4181,6765]

The meaning of that little Haskell program is simple:
fibonacci is a symbol, it can be seen as a lazy generator that yields a list of numbers. The first and second are 1 (the colon stands for list item concatenation), to it concatenates the zipping between the result of the generator and another "copy" of the same generator minus the first item.

You can write the same algorithm in Python using lazy iterators:

from itertools import izip, islice

def fibonacci():
    yield 1
    yield 1
    for nm1, n in izip(fibonacci(), islice(fibonacci(), 1, None)):
        yield nm1 + n

fibs = islice(fibonacci(), 0, 20)
print list(fibs)

(I have tried to write this same program in D1 but it seems rather impossible. I think it's possible in D2 but its debugging gets too much complex for my tastes. In D I'd like a nicer yield as in Python and Chapel).

Even if doesn't show, the Haskell compiler is able to optimize quite well that little program, it can generate many Fibonacci numbers in a short time, while that Python code is really slow, uses tons of RAM, and it soon blows up the stack. (Lazyness in Python is easy to use, but it was bolted on Python at the last minute and it's not always good enough).

A pure generator means that it always yields the same sequence of outputs given the same input. This allows for optimizations done by Haskell compiler in that example. In theory a D compiler can do the same optimizations.

pure yield(int) primes(int stop) {
    foreach (x; 2 .. stop) {
        // an all() HOF can help here
        bool all = true;
        foreach (i; 2 .. x)
            if (!(x % i)) {
                all = false;
                break;
            }
        if (all)
            yield x;
    }
}

I am quite ignorant still about such topics of lazy pure functional programming.

Bye,
bearophile