groupBy/chunkBy redux
Andrei Alexandrescu via Digitalmars-d
digitalmars-d at puremagic.com
Sat Feb 14 11:39:43 PST 2015
On 2/13/15 3:45 PM, Peter Alexander wrote:
> On Friday, 13 February 2015 at 18:32:35 UTC, Andrei Alexandrescu wrote:
>> * Perhaps rename groupBy to chunkBy. People coming from SQL and other
>> languages might expect groupBy to do hash-based grouping.
>
> Agreed.
>
>
>> * The unary function implementation must return for each group a tuple
>> consisting of the key and the lazy range of values. The binary
>> function implementation should continue to only return the lazy range
>> of values.
>
> Is the purpose of this just to avoid the user potentially needing to
> evaluate the key function twice?
Yah. Also in many cases of grouping you need the key anyway.
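
To make the two result shapes concrete, here is a minimal sketch. It assumes the behaviour proposed in the bullet above and that chunkBy is importable from std.algorithm.iteration. Row, chunkKeys and chunkLengths are hypothetical names used only for illustration.

```d
import std.algorithm.iteration : chunkBy, map;
import std.array : array;
import std.typecons : Tuple, tuple;

alias Row = Tuple!(string, int);

// Unary key function: each chunk arrives as a (key, lazy sub-range) tuple,
// so the caller gets the key without evaluating the key function again.
string[] chunkKeys(Row[] data)
{
    return data
        .chunkBy!(row => row[0])
        .map!(chunk => chunk[0])   // chunk[0] is the key, chunk[1] the sub-range
        .array;
}

// Binary predicate: each chunk is only the lazy sub-range, with no key attached.
size_t[] chunkLengths(Row[] data)
{
    return data
        .chunkBy!((a, b) => a[0] == b[0])
        .map!(chunk => chunk.array.length)
        .array;
}
```

Both forms only merge adjacent elements; nothing is hashed or sorted, which is the reason for preferring the name chunkBy over groupBy.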
>> * SortedRange should add a method called group(). Invoked with no
>> predicate, group() should do what chunkBy does, using the sorting
>> predicate.
>
> Will need to be called something else since there may be existing code
> trying to call std.algorithm.group using UFCS. This would change its
> behaviour.
Oops, I thought the existing function was called groups. I guess we could
call it groupBy as well, even though with no predicate the "by" has nothing
to attach to.
>> * aggregate() should detect the two kinds of results per group (well,
>> chunk) and process them accordingly: for unary-predicate chunks, pass
>> the key through and only process the lazy range. Meaning:
>>
>> auto data = [
>> tuple("John", 100),
>> tuple("John", 35),
>> tuple("Jane", 200),
>> tuple("Jane", 87),
>> ];
>> auto r = data.chunkBy!(x => x[0]).aggregate!sum;
>>
>> yields a range of tuples: tuple("John", 135), tuple("Jane", 287).
>
> Not sure I understand how this is meant to work.
>
> With your second bullet implemented, data.chunkBy!(x => x[0]) will return:
>
> tuple("John", [tuple("John", 100), tuple("John", 35)]),
> tuple("Jane", [tuple("Jane", 200), tuple("Jane", 87)])
Correct.
> (here [...] denotes the sub-range, not an array).
>
> So aggregate will ignore the key part, but how does it know to ignore
> the name in sub-ranges?
Oops, I was wrong here. Let's think about aggregate() integration
post-2.067 and remove it for now.
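
Without aggregate(), the per-key sum can still be written by hand by projecting the value out of each sub-range before summing. This is a rough sketch rather than a proposed API: it assumes the unary chunkBy yields (key, sub-range) tuples as described above, and Row and sumPerKey are hypothetical names.

```d
import std.algorithm.iteration : chunkBy, map, sum;
import std.array : array;
import std.typecons : Tuple, tuple;

alias Row = Tuple!(string, int);

// For each run of adjacent rows sharing a name, pair the name with the
// sum of the numeric field. The inner map drops the name from the
// sub-range elements, which is the step aggregate!sum could not infer.
Row[] sumPerKey(Row[] data)
{
    return data
        .chunkBy!(row => row[0])
        .map!(chunk => tuple(chunk[0], chunk[1].map!(row => row[1]).sum))
        .array;
}
```

For the sample data in this thread, that gives ("John", 135) and ("Jane", 287).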
Peter, could you please take this?
Andrei