Array comprehensions and the like

bearophile bearophileHUGS at lycos.com
Fri Jul 18 18:21:39 PDT 2008


I have already discussed about this topic a couple of times in the past, but this time I have more things to talk about :-)

I think Python-like array comprehensions and the like are quite useful, their light syntax replaces the need for map, filter, xfilter, xmap, some usages of flatten, xflatten, and some more things, while being quite readable still and short.

Recently a DMDScript too has gained array comprehensions:
http://developer.mozilla.org/en/docs/New_in_JavaScript_1.7#Array_comprehensions


Few examples in Python 3:

# eager list (array) comp, 1D:
a = [x * x for x in lazyit if x > 5]

# eager list comp, 2D:
parts = [[part.strip() for part in line.split()] for line in file("txt")]

# set comprehension
squares = {x*x for x in xrange(100)}

# dict comprehension
multi = {c:c*i for i,c in enumerate("abcd")}

# lazy, 1D
>>> mat = [[0, 3, 0, 3], [9, 8, 1, 4], [6, 7, 6, 6]]
>>> flat = (abs(el) for row in mat for el in row if el % 2)
>>> flat
<generator object at 0x00A8E1E8>
>>> list(flat)
[3, 3, 9, 1, 7]

-----------------

This is how you can express them with normal D v.1 syntax:

int[] a;
foreach (x; lazyit)
    if (x > 5)
        a ~= x * x;


string[][] parts;
foreach (line; xfile("txt")) {
    auto split_line = line.split();
    auto line_parts = new string[split_line.length];
    foreach (i, part; split_line)
        line_parts[i] = part;
    parts ~= line_parts;
}


uint[int] squares;
for (int i; i < 100; i++)
    squares[i * i] = 0;


string[char] multi;
foreach (i, c; "abcdef") {
    string c_i = new char[i];
    c_i[] = c;
    multi[c] = c_i;
}


struct Process(TyItem) {
    TyItem[][] mat;
    int opApply(int delegate(ref TyItem) dg) {
        int result;
        foreach (row; this.mat)
            foreach (el; row)
                if (el % 2) {
                    TyItem aux = abs(el);
                    result = dg(aux);
                    if (result) break;
                }
        return result;
    }
}
...
auto flat = Process!(typeof(mat[0][0]))(mat);

-----------------

This is how they can be expressed in D with my functional libs, note that x-something are lazy (if he wants, downs may show how to write them with his tools):

auto a = filter((int x){return x > 5;}, xmap((int x){return x * x;}, lazyit));

auto parts = map((string l){return map(strip, l);}, xfile("txt"));

Or shorter (and sometimes more readable):

int x;
auto parts = select(x * x, lazyit, x > 5);

string line;
auto parts = select(map(strip, l), line, xfile("txt"));

auto squares = set( xmap((int x){return x*x;}, xrange(100)) );

auto multi = dict(map2((int i, char c){return toStruct(c, mulItem(c, i));}, "abcdef"));

Or:
string s = "abcdef";
auto multi = dict(s, map2((int i, char c){return mulItem(c, i);}, s));

I've tried this too, to swap the arguments of mulItem(), but std.bind is too much fragile for most purposes of mine (or I am not able to use it properly):
auto multi = dict(map2(&bind(&mulItem!(char), _1, _0), "abcdef"));

auto flat = xflatten(xmap((int[] r){return xmapFilter(&abs, r, (int x){return x%2;});}, mat));

The xflatten() is a quite complex lazy functional, it flattens iterables of any kind with any depth, but the tree of iterables must be complete. With static typing I think you need variants to allow a not complete tree.

With a splitting it becomes a bit more readable, but it's hairy still:
auto aux = (int[] row){return xmapFilter(&abs, row, (int el){return el % 2;});};
auto flat2 = xflatten(xmap(aux, mat));

As you can see the D versions are syntax-heavy (especially the last ones), quite less easy to read, you need good memory or a good IDE to remember the names of the various higher order functions, there are various alternative ways to write them, and often you leave an extra bracket or you add one too much, etc. So much, that in real code I'm not going to use some of those hairy lines (while the original python lines can be acceptable still).
This is a good example that shows that if you want to use a functional-style of coding, the syntax of your language must often be short and good.

-----------------

Possible D 2.x syntax (first try at a possible syntax, they may be improved):

auto a = [x * x foreach(x; lazyit) if (x > 5)];
auto parts = [[part.strip() foreach(part; line.split())] foreach(line; xfile("txt"))];

With this syntax you can't use auto:
void[int] squares = [x*x foreach(x; 0..100)];

Alternative that allows the use of auto too, the void denotes a set:
auto squares = void[x*x foreach(x; 0..100)];

That is related to the syntax for built-in sets I have suggested recently:
void[int] s; // set s
s ~= 56;
s & s2 // intersection, etc.

Here auto is enough because the comma denotes an AA:
auto multi = [c:mulIter(c, i) foreach(i,c; "abcdef")];

Lazy:
auto flat = (abs(el) foreach(row; mat) foreach(el; row) if (el % 2));

-----------------

Partially related for Walter: recently you have asked in this newsgroup for articles regarding a functional style of D. In the last days I have posted one such blog post, that's quite long and contains timings, explanations, links, etc. It compares various implementations of a small but real program in Python, Python-looking D, normal D, C-style D, C, ML, and more.

Bye,
bearophile



More information about the Digitalmars-d mailing list