How many HOFs in Phobos?

bearophile bearophileHUGS at lycos.com
Mon Jan 31 13:37:14 PST 2011


This is a high-level implementation of Levenshtein distance in Haskell:


levenshtein s1 s2 = last $ foldl transform [0 .. length s1] s2
  where transform ns@(n:ns') c = scanl calc (n+1) $ zip3 s1 ns ns'
          where calc z (c', x, y) = minimum [y+1, z+1, x + fromEnum (c' /= c)]

main = print (levenshtein "kitten" "sitting")


(I don't explain that Haskell code because below I give a kind of D translation that is probably enough to understand.)
It prints 3, the code comes from the rosettacode.org site:
http://rosettacode.org/wiki/Levenshtein_distance#Haskell

Currently in std.algorithm there are higher order functions (HOFs) like "map", "filter", "reduce" etc, reduce is the right fold.

That Haskell code uses a foldl (fold left), and scanl (like foldl, but keeps and returns all intermediate values too). This is a brutal translation to D2:


import std.stdio, std.range, std.array, std.algorithm, std.typecons;

/// foldl f z [x1, x2, ..., xn] == (...((z `f` x1) `f` x2) `f`...) `f` xn
T1 foldl(F, T1, Range)(F f, T1 z, Range xs) {
    foreach (x; xs)
        z = f(z, x);
    return z;
}

/// scanl f z [x1, x2, ...] == [z, z `f` x1, (z `f` x1) `f` x2, ...]
T1[] scanl(F, T1, Range)(F f, T1 z, Range xs) {
    auto result = [z];
    foreach (x; xs) {
        z = f(z, x);
        result ~= z;
    }
    return result;
}

auto levenshtein(string s1, string s2) {
    int[] transform(int[] ns, char c) {
        auto n = ns[0];
        auto ns2 = ns[1..$];
        int compute(int z, Tuple!(dchar, int, int) t) {
            return min(t[2]+1, z+1, t[1] + (t[0] != c));
        }
        return scanl(&compute, n+1, zip(s1, ns, ns2));
    }
    return foldl(&transform, array(iota(cast(int)s1.length+1)), s2)[$-1];
}

void main() {
    string add(string x, string y) { return x ~ y; }
    writeln(foldl(&add, "", ["ab", "cd", "ef"])); // "abcdef"
    writeln(scanl(&add, "", ["ab", "cd", "ef"])); // ["", "ab", "abcd", "abcdef"]

    writeln(levenshtein("kitten", "sitting")); // 3
}


Haskell has tons of built-in HOFs (monadic ones too, like foldM), but D is less functional than Haskell, and reading highly functional-style code is not easy for many programmers (and D syntax limits make such code noisy). What's the right number of HOFs to put in std.algorithm+std.range? Are (better implementations of) scanl and foldl fit for Phobos? (I am not suggesting to rename "reduce" to "foldr". You may add an "alias reduce foldr;" if you want, but that's all).

Hours ago I have added this suggestion to bugzilla:
http://d.puremagic.com/issues/show_bug.cgi?id=5510

Bye,
bearophile


More information about the Digitalmars-d mailing list