Pandas example of groupby

Laeeth Isharc via Digitalmars-d digitalmars-d at puremagic.com
Mon Jan 26 11:59:10 PST 2015


>
> If "group by" in other languages refers to the latter function,
> then that means "groupBy" is poorly-named and we need to come up
> with a better name for it. Changing it to return tuples and
> what-not seems to be beating around the bush to me.
>
>
> T

T: you are good with algorithms.  In many applications you have a 
bunch of results and want to summarise them.  This is often what 
the corporate manager is doing with Excel pivot tables, and it is 
what the groupby function is used for in pandas.  See here for a 
simple tutorial.

http://wesmckinney.com/blog/?p=125

And here for a summary of what pandas can do with data:
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.median.html

Is there any reason why we shouldn't add to Phobos: median, 
ranking, stddev, variance, correlation, covariance, skew, 
kurtosis, quantile, moving average, exponential moving average, 
rolling window (see pandas)?
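
To give a feel for how small some of these are, here is a rough 
sketch in Python using only the standard library.  The function 
names and the sample data are my own, made up for illustration, 
and the exponential moving average is the plain recursive form, 
which is not necessarily what pandas computes by default.

```python
from collections import deque
from statistics import median, variance


def moving_average(values, window):
    """Yield the mean of each full trailing window of `window` items."""
    buf = deque(maxlen=window)
    total = 0.0
    for v in values:
        if len(buf) == window:
            total -= buf[0]  # oldest value is about to be evicted
        buf.append(v)
        total += v
        if len(buf) == window:
            yield total / window


def exp_moving_average(values, alpha):
    """Yield the recursive EMA: ema = alpha * v + (1 - alpha) * ema."""
    ema = None
    for v in values:
        ema = v if ema is None else alpha * v + (1 - alpha) * ema
        yield ema


# Hypothetical sample data
data = [2.0, 4.0, 4.0, 6.0, 9.0]

data_median = median(data)
data_variance = variance(data)  # sample variance (n - 1 denominator)
rolling = list(moving_average(data, 3))
ema = list(exp_moving_average(data, 0.5))
```

Both moving averages are written as generators so that they work 
lazily over a stream, which is roughly how I would expect a 
range-based version to behave.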

I personally am fine with the implementation we have (although, 
as Ray Dalio would say, I haven't yet earned the right that you 
should care what I think).  All that it means is that you need to 
sort your results on multiple keys before passing them to 
groupby.
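
As a sketch of the sort-then-group step, here is the same idea in 
Python, whose itertools.groupby also groups only runs of 
consecutive equal keys (which, as I understand it, is how our 
groupBy behaves).  The rows and names are made up for 
illustration.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows: (region, product, sales)
rows = [
    ("east", "widget", 10),
    ("west", "widget", 7),
    ("east", "gadget", 3),
    ("east", "widget", 5),
    ("west", "widget", 1),
]

# Multi-key grouping key: (region, product)
key = itemgetter(0, 1)


def summarise(records):
    """Sum sales for each run of consecutive records sharing a key."""
    return [(k, sum(r[2] for r in grp))
            for k, grp in groupby(records, key=key)]


# Without sorting, equal keys that are not adjacent land in separate groups.
unsorted_summary = summarise(rows)

# Sorting on the same multi-part key first gives one group per key.
sorted_summary = summarise(sorted(rows, key=key))
```

On the unsorted rows this produces five groups, one per row, 
because no two neighbouring rows share a key; after sorting it 
produces the three groups a pandas user would expect.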

My question is how much is lost by doing it in two steps (sort, 
groupby) rather than one.  I don't think all that much, but it is 
not my field.  I am also not that bothered, because this comes at 
the end of processing, not within the inner loop, so for me I 
don't think it makes a difference for now.  If data sets reach 
Common Crawl-type sizes then it might be different, although I 
will take D over Java any day, warts and all.
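
For comparison, a one-step version would typically accumulate 
into a hash table keyed on the group, which avoids the O(n log n) 
sort at the cost of holding one accumulator per distinct key in 
memory.  A rough Python sketch, again with made-up rows and 
names; I am not claiming this is how pandas does it internally.

```python
from collections import defaultdict

# Hypothetical sample rows: (region, product, sales)
rows = [
    ("east", "widget", 10),
    ("west", "widget", 7),
    ("east", "gadget", 3),
    ("east", "widget", 5),
    ("west", "widget", 1),
]


def group_sum_one_pass(records):
    """Sum sales per (region, product) in a single pass, no sort needed."""
    totals = defaultdict(int)
    for region, product, sales in records:
        totals[(region, product)] += sales
    return dict(totals)


one_pass = group_sum_one_pass(rows)
```

The trade-off is that the input order of the rows no longer 
matters, but the output is no longer sorted by key either, so if 
the summary is wanted in key order a (much smaller) sort of the 
groups is still needed at the end.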

In any case, the documentation should be very clear on what 
groupby does, and how the user can do what he might be expecting 
to achieve, coming from a different framework.

It would be interesting to benchmark D against pandas (which is 
implemented in Cython for the key bits) and see how we look.

