[WORK] groupBy is in! Next: aggregate
Andrei Alexandrescu via Digitalmars-d
digitalmars-d at puremagic.com
Mon Jan 26 10:31:04 PST 2015
On 1/26/15 10:11 AM, H. S. Teoh via Digitalmars-d wrote:
> On Mon, Jan 26, 2015 at 02:50:16PM -0300, Ary Borenszweig via Digitalmars-d wrote:
>> On 1/26/15 2:34 PM, Andrei Alexandrescu wrote:
>>>> On Mon, Jan 26, 2015 at 11:26:04AM +0000, bearophile via Digitalmars-d
>>>> wrote:
>>>>> Russel Winder:
>>>>>
>>>>>> but is it's name "group by" as understood by the rest of the world?
>>>>>
>>>>> Nope...
> [...]
>>> My suggestion was to keep the name but change the code of your
>>> groupBy implementation to return tuple(key, lazyValues) instead of
>>> just lazyValues. That needs to happen only for binary predicates;
>>> unary predicates will all have alternating true/false keys.
>
> Huh, what? I think there's some misunderstanding here. The unary version
> of the current groupBy translates to a binary predicate:
>
> groupBy!(a => a.field)
>
> is equivalent to:
>
> groupBy!((a, b) => a.field == b.field)
>
> I don't see how this has anything to do with alternating keys.
Here's how. Basically the binary-predicate version has only Boolean keys
that may be false or true. They will alternate because it's the change
that triggers creation of a new group. In this example:
[293, 453, 600, 929, 339, 812, 222, 680, 529, 768]
.groupBy!((a, b) => (a & 3) == (b & 3))
the groupBy function has no information about the result of a & 3. All
it "sees" is the result of the predicate: true, false, true, false...
HOWEVER, if you write it like this:
[293, 453, 600, 929, 339, 812, 222, 680, 529, 768]
.groupBy!(a => (a & 3))
then groupBy sees the actual value of the function and can emit the
proper key.
So the key (ahem) here is to make groupBy with unary predicate different
from groupBy with binary predicate. The former returns the tuple, the
latter is unchanged. Makes sense?
> [...]
>> That's much more harder to implement than what it does right now. I
>> don't know how you'll manage to do the lazyValues thing: you'd need to
>> make multiple passes in the range.
>>
>> Again, other languages return an associative array in this case.
>
> I think we're talking past each other here. What groupBy currently does
> is to group elements by evaluating the predicate on *consecutive runs*
> of elements. What some people seem to demand is a function that groups
> elements by *global evaluation* of the predicate over all elements.
> These two are similar but divergent functions, and conflating them is
> not helping this discussion in any way.
Agreed.
> If "group by" in other languages refers to the latter function, then
> that means "groupBy" is poorly-named and we need to come up with a
> better name for it. Changing it to return tuples and what-not seems to
> be beating around the bush to me.
I like our notion of groupBy the same way I like the notion that
something must be a random-access range in order to be sorted. (Other
languages give the illusion they sort streams by internally converting
them to arrays.) D offers better control, better flexibility, and richer
semantics.
Andrei
More information about the Digitalmars-d
mailing list