[WORK] groupBy is in! Next: aggregate

Mon Jan 26 10:31:04 PST 2015

On 1/26/15 10:11 AM, H. S. Teoh via Digitalmars-d wrote:
> On Mon, Jan 26, 2015 at 02:50:16PM -0300, Ary Borenszweig via Digitalmars-d wrote:
>> On 1/26/15 2:34 PM, Andrei Alexandrescu wrote:
>>>> On Mon, Jan 26, 2015 at 11:26:04AM +0000, bearophile via Digitalmars-d
>>>> wrote:
>>>>> Russel Winder:
>>>>>
>>>>>> but is it's name "group by" as understood by the rest of the world?
>>>>>
>>>>> Nope...
> [...]
>>> My suggestion was to keep the name but change the code of your
>>> groupBy implementation to return tuple(key, lazyValues) instead of
>>> just lazyValues. That needs to happen only for binary predicates;
>>> unary predicates will all have alternating true/false keys.
>
> Huh, what? I think there's some misunderstanding here. The unary version
> of the current groupBy translates to a binary predicate:
>
> 	groupBy!(a => a.field)
>
> is equivalent to:
>
> 	groupBy!((a, b) => a.field == b.field)
>
> I don't see how this has anything to do with alternating keys.

Here's how. Basically the binary-predicate version has only Boolean keys 
that may be false or true. They will alternate because it's the change 
that triggers creation of a new group. In this example:

[293, 453, 600, 929, 339, 812, 222, 680, 529, 768]
     .groupBy!((a, b) => (a & 3) == (b & 3))

the groupBy function has no information about the result of a & 3. All 
it "sees" is the result of the predicate: true, false, true, false...

HOWEVER, if you write it like this:

[293, 453, 600, 929, 339, 812, 222, 680, 529, 768]
     .groupBy!(a => (a & 3))

then groupBy sees the actual value of the function and can emit the 
proper key.

So the key (ahem) here is to make groupBy with unary predicate different 
from groupBy with binary predicate. The former returns the tuple, the 
latter is unchanged. Makes sense?

> [...]
>> That's much more harder to implement than what it does right now. I
>> don't know how you'll manage to do the lazyValues thing: you'd need to
>> make multiple passes in the range.
>>
>> Again, other languages return an associative array in this case.
>
> I think we're talking past each other here. What groupBy currently does
> is to group elements by evaluating the predicate on *consecutive runs*
> of elements. What some people seem to demand is a function that groups
> elements by *global evaluation* of the predicate over all elements.
> These two are similar but divergent functions, and conflating them is
> not helping this discussion in any way.

Agreed.

> If "group by" in other languages refers to the latter function, then
> that means "groupBy" is poorly-named and we need to come up with a
> better name for it. Changing it to return tuples and what-not seems to
> be beating around the bush to me.

I like our notion of groupBy the same way I like the notion that 
something must be a random-access range in order to be sorted. (Other 
languages give the illusion they sort streams by internally converting 
them to arrays.) D offers better control, better flexibility, and richer 
semantics.

Andrei