groupBy predicates: unary, binary, or both?

Sun Dec 29 01:08:38 PST 2013

I'm working on 
https://github.com/D-Programming-Language/phobos/pull/1186, which is 
somewhat important; "group by" is a powerful operation. Along the way, I 
stumbled upon an interesting issue in which I wanted to consult the 
community.

Usually groupBy is used to find runs of equivalent elements in a range. 
For example:

[1, 1, 1, 2, 2, 3, 3, 3, 4, 5, 5, 6].groupBy()

yields the range [[1, 1, 1], [2, 2], [3, 3, 3], [4], [5, 5], [6]].

As is usual with Phobos algorithms, groupBy accepts a predicate. The 
default (as illustrated above) is "a = b", i.e. all elements in a group 
are equal to one another.

Equality is transitive and commutative. But there are useful cases in 
which predicates are not commutative. Consider we want to find strictly 
monotonic subranges in the range. We'd write:

auto r = [1, 3, 2, 4, 5, 1].groupBy!"a < b";

That should produce [[1, 3], [2, 4, 5], [1]]. For non-strict monotonic 
runs, the predicate would be "a <= b" etc. All that is pretty awesome.

However, that makes life a bit tougher for the algorithm - it must only 
compare adjacent elements only. In the case of "a = b", it suffices to 
save the first element in a group and compare it against every other 
element in the group.

Meanwhile, a very similar pull request 
(https://github.com/D-Programming-Language/phobos/pull/1453) uses unary 
predicates, i.e. an optional transformation function that is then used 
in conjunction with "==" to decide which elements belong in the same group.

Unary predicates make life simpler for the algorithm (save the transform 
of the first element, then compare it against the transform of the next 
etc) and are often easier to write by the end user, too (e.g. just write 
"a.length" instead of "a.length == b.length" to group by length).

So I was thinking to allow both cases, with the understanding that 
grouping by unary predicates uses "==" for comparison whereas grouping 
by binary predicates looks at adjacent elements to figure out group 
membership. That approach would, however, preclude the use of string 
lambdas (because deducing arity for string lambdas is possible, but 
quite unwieldy).

What do you think?

Andrei