Ranges and Algorithms -- Templates, Delegates, or Ranges?

Tue Feb 22 02:29:41 PST 2011

Having learned functional programming in Scheme a couple months ago, I tried my hand at using map(), reduce(), and filter() in D:

    int addend = 5;
    map(delegate int(int x) { return x + addend; }, iota(1, 5));

but it didn't work. It turned out that map() actually took the mapper as its _template_ argument, not as a function argument. Not too much of
a problem, it probably seemed to be...
except that it's a critical problem. It makes map(), reduce(), filter(), etc. to become 10x less useful, because there's no way to change
their behavior at runtime, depending on program's state.

This was disappointing, but it wasn't like I couldn't write a better version of it, so I did:

static T2[] map(T2, T1)(scope T2 delegate(T1) mapper, T1[] arr)
{
	auto result = new T2[arr.length];
	foreach (i, item; arr) { result[i] = mapper(item); }
	return result;
}

except then I quickly realized that this was array-based, and useless for ranges. So I wrote my own MapRange struct (which I didn't paste
here) and map() function:

static auto map(TFn, T...)(TFn map, T args)
    if (isCallable!(TFn) && T.length > 0
        && allSatisfy!(isInputRange, T))
    { return MapRange!(TFn, T)(map, args); }

Everything seemed solved until today, when I realized something critical: the mapper is no longer a 'scope' variable, and will require a heap
allocation each time a lambda with closure is created.

Of course, lambdas with closures are critical in functional programming, and one of the reasons I moved to D was precisely features like the
'scope' keyword, that allowed you to avoid heap allocations. But I couldn't make this a scope variable, because the delegate reference is
obviously escaped from the function.

So my question is, what's the solution to these needs? Do we really need three versions of every function like map()? After all, we should be
able to perform **any** of the following, depending on the situation:

1. Call map() with the function as a template argument, the way it currently works in std.algorithm. This is necessary to allow the compiler
to infer the types; otherwise, if we passed the function addresses, we'd have to instantiate the template for the mapper manually, like:
   map(&mapper!(int[typeof(SomeExpression)]), myRange);

2. Call map() with a *scoped* lambda that has a closure. This is critical to functional programming, but at the same time, we need to somehow
prevent a reference to the mapper from escaping; otherwise, it will trigger a heap allocation, which can be costly in loops. Of course, this
means that the mapping would have to occur entirely _inside_ the map() function (because the delegate will no longer be accessible), and so
it means we would either need to pass it a sink() delegate, or we would need to aggregate the results into an array. Neither is pretty, but
this feature is needed, so I don't know what a good solution is.

3. Call map() with a *range* and some kind of lambda, and have it return another range that maps the original. This prevents the two issues
in #2 (regarding passing a sink or aggregating the results), but it would cause a heap allocation.

Ultimately, since there's no universal solution and because all three of these situations are useful in many cases, it seems to me that we
really need *three* versions of every higher-order function.

What do you think? Is there a better solution?

Thanks!