groupBy predicates: unary, binary, or both?
Andrei Alexandrescu
SeeWebsiteForEmail at erdani.org
Sun Dec 29 01:08:38 PST 2013
I'm working on
https://github.com/D-Programming-Language/phobos/pull/1186, which is
somewhat important; "group by" is a powerful operation. Along the way, I
stumbled upon an interesting issue in which I wanted to consult the
community.
Usually groupBy is used to find runs of equivalent elements in a range.
For example:
[1, 1, 1, 2, 2, 3, 3, 3, 4, 5, 5, 6].groupBy()
yields the range [[1, 1, 1], [2, 2], [3, 3, 3], [4], [5, 5], [6]].
As is usual with Phobos algorithms, groupBy accepts a predicate. The
default (as illustrated above) is "a = b", i.e. all elements in a group
are equal to one another.
Equality is transitive and commutative. But there are useful cases in
which predicates are not commutative. Consider we want to find strictly
monotonic subranges in the range. We'd write:
auto r = [1, 3, 2, 4, 5, 1].groupBy!"a < b";
That should produce [[1, 3], [2, 4, 5], [1]]. For non-strict monotonic
runs, the predicate would be "a <= b" etc. All that is pretty awesome.
However, that makes life a bit tougher for the algorithm - it must only
compare adjacent elements only. In the case of "a = b", it suffices to
save the first element in a group and compare it against every other
element in the group.
Meanwhile, a very similar pull request
(https://github.com/D-Programming-Language/phobos/pull/1453) uses unary
predicates, i.e. an optional transformation function that is then used
in conjunction with "==" to decide which elements belong in the same group.
Unary predicates make life simpler for the algorithm (save the transform
of the first element, then compare it against the transform of the next
etc) and are often easier to write by the end user, too (e.g. just write
"a.length" instead of "a.length == b.length" to group by length).
So I was thinking to allow both cases, with the understanding that
grouping by unary predicates uses "==" for comparison whereas grouping
by binary predicates looks at adjacent elements to figure out group
membership. That approach would, however, preclude the use of string
lambdas (because deducing arity for string lambdas is possible, but
quite unwieldy).
What do you think?
Andrei
More information about the Digitalmars-d
mailing list