How To Apply Different Functions To A Groupby Object?

I have a dataframe like this: import pandas as pd df = pd.DataFrame({'id': [1, 2, 1, 1, 2, 1, 2, 2], 'min_max': ['max_val', 'max_val', 'min_val', 'min_val', 'max_va

Solution 1:

Here's a slightly tongue-in-cheek solution:

>>> df.groupby(['id', 'min_max'])['value'].apply(lambda g: getattr(g, g.name[1][:3])()).unstack()
min_max  max_val  min_val
id                       
1              3       10
2             20      -10

This applies a function that grabs the name of the real function to apply from the group key.
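To make the mechanism concrete, here is a minimal sketch that spells out the same lookup with an explicit loop. The sample frame is an assumption: the question's data is cut off above, so these values are hypothetical ones chosen to reproduce the output shown. Iterating over the grouped column yields each group key as an `(id, min_max)` tuple, which is the same value the one-liner reads from `g.name`.

```python
import pandas as pd

# Assumed sample data (the original frame is truncated in the question)
df = pd.DataFrame({
    'id': [1, 2, 1, 1, 2, 1, 2, 2],
    'min_max': ['max_val', 'max_val', 'min_val', 'min_val',
                'max_val', 'max_val', 'min_val', 'min_val'],
    'value': [1, 20, 20, 10, 12, 3, -10, -5],
})

results = {}
for key, group in df.groupby(['id', 'min_max'])['value']:
    # key is a tuple such as (1, 'max_val'); its second element names the function
    method_name = key[1][:3]                        # 'max_val' -> 'max', 'min_val' -> 'min'
    results[key] = getattr(group, method_name)()    # e.g. group.max()

for key, val in results.items():
    print(key, val)
```

The one-liner does the same thing, except that `apply` collects the results into a Series indexed by the group keys, which `unstack()` then pivots into columns.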

Obviously this wouldn't work so simply if there weren't such a simple relationship between the string "max_val" and the function name "max". It could be generalized by having a dict mapping column values to functions to apply, something like this:

func_map = {'min_val': min, 'max_val': max}
df.groupby(['id', 'min_max'])['value'].apply(lambda g: func_map[g.name[1]](g)).unstack()

Note that this is slightly less efficient than the version above, since it calls the plain Python max/min rather than the optimized pandas versions. But if you want a more generalizable solution, that's what you have to do, because there aren't optimized pandas versions of everything. (This is also more or less why there's no built-in way to do this: for most data, you can't assume a priori that your values can be mapped to meaningful functions, so it doesn't make sense to try to determine the function to apply based on the values themselves.)
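One possible middle ground, sketched below, is to let the dict hold either a pandas method name (as a string) or a plain callable, so the optimized pandas version is used wherever one exists. The helper `apply_mapped` and the sample data are hypothetical and not part of the original answer; the data is assumed because the question's frame is truncated.

```python
import pandas as pd

# Assumed sample data (the original frame is truncated in the question)
df = pd.DataFrame({
    'id': [1, 2, 1, 1, 2, 1, 2, 2],
    'min_max': ['max_val', 'max_val', 'min_val', 'min_val',
                'max_val', 'max_val', 'min_val', 'min_val'],
    'value': [1, 20, 20, 10, 12, 3, -10, -5],
})

# Strings name pandas Series methods; a callable could be used where none exists
func_map = {'min_val': 'min', 'max_val': 'max'}

def apply_mapped(group):
    """Hypothetical helper: pick the function from the second part of the group key."""
    func = func_map[group.name[1]]
    if isinstance(func, str):
        return getattr(group, func)()   # optimized pandas method, e.g. group.max()
    return func(group)                  # fallback for arbitrary callables

result = df.groupby(['id', 'min_max'])['value'].apply(apply_mapped).unstack()
print(result)
```

Like the lambda versions above, this relies on `group.name` holding the group key inside `apply`.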


Solution 2:

One option is to do the custom aggregation with groupby.apply, since this task doesn't fit the built-in aggregation functions well:

(df.groupby('id')
 .apply(lambda g: pd.Series({'max': g.value[g.min_max == "max_val"].max(), 
                             'min': g.value[g.min_max == "min_val"].min()})))

#    max    min
#id     
# 1    3     10
# 2   20    -10
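A self-contained variant of the same idea, as a sketch: it selects the two columns the lambda actually needs before calling `apply`. As far as I know, pandas 2.2 started warning when `DataFrameGroupBy.apply` operates on the grouping columns, and selecting the columns explicitly sidesteps that. The sample data is an assumption, since the question's frame is truncated.

```python
import pandas as pd

# Assumed sample data (the original frame is truncated in the question)
df = pd.DataFrame({
    'id': [1, 2, 1, 1, 2, 1, 2, 2],
    'min_max': ['max_val', 'max_val', 'min_val', 'min_val',
                'max_val', 'max_val', 'min_val', 'min_val'],
    'value': [1, 20, 20, 10, 12, 3, -10, -5],
})

def max_and_min(g):
    # Take the max over rows labelled 'max_val' and the min over rows labelled 'min_val'
    return pd.Series({
        'max': g['value'][g['min_max'] == 'max_val'].max(),
        'min': g['value'][g['min_max'] == 'min_val'].min(),
    })

# Only the non-grouping columns are passed to the function
result = df.groupby('id')[['min_max', 'value']].apply(max_and_min)
print(result)
```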

Solution 3:

Solution with pivot_table:

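# String names are used for aggfunc because the 'amin'/'amax' labels
# produced by np.min/np.max depend on the NumPy and pandas versions.
df1 = df.pivot_table(index='id', columns='min_max', values='value', aggfunc=['min', 'max'])
df1 = df1.loc[:, [('min', 'min_val'), ('max', 'max_val')]]
df1.columns = df1.columns.droplevel(1)
print(df1)
    min  max
id          
1    10    3
2   -10   20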
