
Equivalent Of R Function 'ave' In Python Pandas

I have a dataframe in R. Example: d1 <- structure(list(A = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L), B = 1:9), .Names = c('A', 'B'), class = 'data.frame', row.names = c(NA, -9L))

Solution 1:

The R ave function (https://stat.ethz.ch/R-manual/R-devel/library/stats/html/ave.html) applies a function (the mean by default) to each group of observations that share the same combination of factor levels, and returns a result of the same length as the input.

In pandas, there is no such function out of the box, but you can do this with a groupby operation.

Starting from your dataframe:

In [86]: df = pd.DataFrame({'A': [1, 1, 1, 2, 2, 2, 2, 3, 3], 'B':range(1,10)})

In [87]: df
Out[87]: 
   A  B
0  1  1
1  1  2
2  1  3
3  2  4
4  2  5
5  2  6
6  2  7
7  3  8
8  3  9

You can add a column C as the result of a grouping by A and calculating the max of B for each group:

In [88]: df['C'] = df.groupby('A')['B'].transform('max')

In [89]: df
Out[89]: 
   A  B  C
0  1  1  3
1  1  2  3
2  1  3  3
3  2  4  7
4  2  5  7
5  2  6  7
6  2  7  7
7  3  8  9
8  3  9  9

Note: I use the transform method here because I want the result to have the same index as the original dataframe, rather than one row per group.
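To make the difference concrete, here is a minimal sketch that contrasts a plain aggregation with transform on the same data. The names per_group, per_row and the column B_mean are only illustrative and are not part of the original answer; the 'mean' variant is shown on the assumption that you want the analogue of ave's default.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 2, 2, 2, 2, 3, 3], 'B': range(1, 10)})

# A plain aggregation collapses to one row per group, indexed by A.
per_group = df.groupby('A')['B'].max()

# transform broadcasts each group's result back onto every original row,
# so the result shares df's index and can be assigned as a new column.
per_row = df.groupby('A')['B'].transform('max')

# R's ave defaults to the mean; 'mean' is the analogous choice here.
# (B_mean is an illustrative column name.)
df['B_mean'] = df.groupby('A')['B'].transform('mean')

print(per_group)
print(per_row)
print(df)
```

The aggregated result has three rows (one per value of A), while the transformed result has nine, matching the original rows.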

For more information on the groupby functionalities in pandas, see http://pandas.pydata.org/pandas-docs/stable/groupby.html
