groupBy/chunkBy redux

Sat Feb 14 11:39:43 PST 2015

On 2/13/15 3:45 PM, Peter Alexander wrote:
> On Friday, 13 February 2015 at 18:32:35 UTC, Andrei Alexandrescu wrote:
>> * Perhaps rename groupBy to chunkBy. People coming from SQL and other
>> languages might expect groupBy to do hash-based grouping.
>
> Agreed.
>
>
>> * The unary function implementation must return for each group a tuple
>> consisting of the key and the lazy range of values. The binary
>> function implementation should continue to only return the lazy range
>> of values.
>
> Is the purpose of this just to avoid the user potentially needing to
> evaluate the key function twice?

Yah. Also in many cases of grouping you need the key anyway.
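
For illustration, here is a rough sketch of how the two forms would differ, assuming the unary form returns tuple(key, subrange) as proposed above and the binary form returns only the subrange. The helpers runLengths and runSizes are hypothetical names made up for this example. Note that the chunks are consecutive runs, not hash-based groups, so the key 0 shows up twice:

```d
import std.algorithm.iteration : chunkBy, map;
import std.array : array;
import std.range : walkLength;
import std.typecons : tuple, Tuple;

// Hypothetical helper: (key, run length) for each consecutive run of equal keys.
Tuple!(int, size_t)[] runLengths(int[] xs)
{
    return xs
        .chunkBy!(x => x / 10)                    // unary: tuple(key, subrange)
        .map!(g => tuple(g[0], g[1].walkLength))  // key is reused, not recomputed
        .array;
}

// Hypothetical helper: the binary form yields only the subranges, no key.
size_t[] runSizes(int[] xs)
{
    return xs
        .chunkBy!((a, b) => a / 10 == b / 10)
        .map!(g => g.walkLength)
        .array;
}

void main()
{
    auto xs = [1, 2, 11, 12, 13, 5];
    assert(runLengths(xs) == [tuple(0, size_t(2)), tuple(1, size_t(3)),
                              tuple(0, size_t(1))]);
    assert(runSizes(xs) == [2, 3, 1]);
}
```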

>> * SortedRange should add a method called group(). Invoked with no
>> predicate, group() should do what chunkBy does, using the sorting
>> predicate.
>
> Will need to be called something else since there may be existing code
> trying to call std.algorithm.group using UFCS. This would change its
> behaviour.

Oops, I thought that was groups. I guess we could call it groupBy as well,
even though it takes no predicate, so the "by" does not take part in a sentence.

>> * aggregate() should detect the two kinds of results per group (well,
>> chunk) and process them accordingly: for unary-predicate chunks, pass
>> the key through and only process the lazy range. Meaning:
>>
>> auto data = [
>>   tuple("John", 100),
>>   tuple("John", 35),
>>   tuple("Jane", 200),
>>   tuple("Jane", 87),
>> ];
>> auto r = data.chunkBy!(x => x[0]).aggregate!sum;
>>
>> yields a range of tuples: tuple("John", 135), tuple("Jane", 287).
>
> Not sure I understand how this is meant to work.
>
> With your second bullet implemented, data.chunkBy!(x => x[0]) will return:
>
> tuple("John", [tuple("John", 100), tuple("John", 35)]),
> tuple("Jane", [tuple("Jane", 200), tuple("Jane", 87)])

Correct.

> (here [...] denotes the sub-range, not an array).
>
> So aggregate will ignore the key part, but how does it know to ignore
> the name in sub-ranges?

Oops, I was wrong here. Let's think about aggregate() integration 
post-2.067 and remove it for now.
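
In the meantime the per-group sum can be spelled out by hand. A minimal sketch, again assuming the unary chunkBy returns tuple(key, subrange); totalsByName is a hypothetical helper, and the explicit projection x => x[1] is what tells the code to skip the name inside each sub-range:

```d
import std.algorithm.iteration : chunkBy, map, sum;
import std.array : array;
import std.typecons : tuple, Tuple;

// Hypothetical helper: total amount per consecutive run of equal names.
Tuple!(string, int)[] totalsByName(Tuple!(string, int)[] data)
{
    return data
        .chunkBy!(x => x[0])                               // tuple(name, subrange)
        .map!(g => tuple(g[0], g[1].map!(x => x[1]).sum))  // keep key, sum amounts
        .array;
}

void main()
{
    auto data = [
        tuple("John", 100),
        tuple("John", 35),
        tuple("Jane", 200),
        tuple("Jane", 87),
    ];
    assert(totalsByName(data) == [tuple("John", 135), tuple("Jane", 287)]);
}
```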

Peter, could you please take this?

Andrei