Parallel prefix algos & D

Mon Jan 30 19:48:33 PST 2012

This post is about many things at the same time.

This is a short slides pack about Parallel Prefix Algorithms (but it's not very readable, it's too much succinct):
http://ocw.mit.edu/courses/mathematics/18-337j-applied-parallel-computing-sma-5505-spring-2005/lecture-notes/lec4.pdf

Not too much time ago Martin Odersky has given a lecture about quite related matters.

literateprograms.org is a site where people show algorithms implemented in various languages, and explain them more or less according to the ideas of Knuth's Literate Programming. The site already allows pages about D language, but I think so far no one has added D implementations (while the site rosettacode has hundreds of D2 implementations). You are invited to add D stuff there.

This little task page of literateprograms.org uses Python as implementation language:
http://en.literateprograms.org/Parallel_prefix_%28Python%29

The basic Python2 code (with small changes) is short and simple, but it's not parallel (I have omitted the code relative to the second part of the page):

from itertools import imap # lazy map (parallel too)

def scan(op, a):
    print "in:", a
    if len(a) > 1:
        a[1::2] = imap(op, a[0::2], a[1::2])
        a[1::2] = scan(op, a[1::2])
        a[2::2] = imap(op, a[1::2], a[2::2])
    print "out:", a
    return a

if __name__ == "__main__":
    from operator import add, mul
    scan(add, range(16))
    print
    scan(mul, range(1, 17))
    print
    scan(add, list("hi world"))

A D2 translation of this Python code is interesting for two reasons:
1) The D code is supposed to use parallelism for real.
2) It shows some consequences of the design decisions of D and Phobos.

Regarding the point 2, std.algorithm.map doesn't work with two or more iterables in parallel as Python (I have not appreciated this design decision, since the beginning: from my experience creating two outputs from a single map scan is quite less commonly needed than having a mapping function that used two or more arguments). And built-in D slices don't have an optional stride, so the map() has to work with a lambda that unpacks the tuples produced by something like:

zip(stride(a,2), stride(a[1..$],2))

and then to perform the strided (striped) assign like "a[1::2] = " you need a copy. D looks quite worse than the very clear original python version (but Python is harder to parallelize).

Regarding the point 1 I think Phobos already has enough to parallelize the code. You are invited to write the code, if you have some free time, and if you want even to see if the parallelism actually reduces the run time here.

Bye,
bearophile