Pandas Add New "rank" Columns For Every Column

November 10, 2022

I have a df like so (actual df has 4.5 mil rows, 23 cols):

group feature  col1  col2  col3
g1    f1          1    10   100
g1    f1         11     9  1000
g1    f2          0     8

Solution 1:

I would use groupby on ['group', 'feature'] to build an intermediary dataframe holding the aggregated values (the sum of col1, the mean of col2 and the max of col3, not the ranks), and then group that dataframe on 'group' alone to compute the ranks.

Intermediary dataframe:

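import pandas as pd

# Aggregate per (group, feature): col1 -> sum, col2 -> mean, col3 -> max.
# The positional slices assume the column order group, feature, col1, col2, col3.
df2 = pd.concat([
    df.iloc[:, [0, 1, 2]].groupby(['group', 'feature']).sum(),
    df.iloc[:, [0, 1, 3]].groupby(['group', 'feature']).mean(),
    df.iloc[:, [0, 1, 4]].groupby(['group', 'feature']).max()
    ], axis=1)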

The intermediary dataframe is:

               col1      col2  col3
group feature                      
g1    f1         12  9.500000  1000
      f2          0  8.000000   200
g2    f1          2  7.000000   330
      f2          3  7.000000   331
      f3          1  7.000000   100
g3    f1          7  7.666667   101
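
As a possible alternative to the three positional slices, the same kind of intermediary frame can be sketched with a single agg call that maps each column to its aggregation. The four sample rows below are an assumption for illustration (the question's full data is not reproduced here), and the name agg is hypothetical:

```python
import pandas as pd

# Toy stand-in for the question's dataframe (assumed rows, not the real data).
df = pd.DataFrame({
    "group":   ["g1", "g1", "g1", "g2"],
    "feature": ["f1", "f1", "f2", "f1"],
    "col1":    [1, 11, 0, 2],
    "col2":    [10, 9, 8, 7],
    "col3":    [100, 1000, 200, 330],
})

# One pass over the groups: each column gets its own aggregation function.
agg = df.groupby(["group", "feature"]).agg(
    {"col1": "sum", "col2": "mean", "col3": "max"}
)
print(agg)
```

Selecting columns by name rather than by position should also be less fragile on a 23-column frame, where positional indices are easy to get wrong.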

Now for the final dataframe:

df3 = df2.groupby('group').rank(method='min', ascending=False).reset_index()

which finally gives:

  group feature  col1  col2  col3
0    g1      f1   1.0   1.0   1.0
1    g1      f2   2.0   2.0   2.0
2    g2      f1   2.0   1.0   2.0
3    g2      f2   1.0   1.0   1.0
4    g2      f3   3.0   1.0   3.0
5    g3      f1   1.0   1.0   1.0
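
If the ranks should sit next to the aggregated values as new columns instead of replacing them, one option is to suffix the rank columns and concatenate. This is only a sketch: df2 is rebuilt here from the intermediary table printed above so the snippet runs on its own, and the "_rank" suffix and the names ranks and with_ranks are assumptions, not part of the original answer:

```python
import pandas as pd

# Rebuild the intermediary dataframe from the table printed above.
index = pd.MultiIndex.from_tuples(
    [("g1", "f1"), ("g1", "f2"), ("g2", "f1"),
     ("g2", "f2"), ("g2", "f3"), ("g3", "f1")],
    names=["group", "feature"],
)
df2 = pd.DataFrame(
    {
        "col1": [12, 0, 2, 3, 1, 7],
        "col2": [9.5, 8.0, 7.0, 7.0, 7.0, 7.666667],
        "col3": [1000, 200, 330, 331, 100, 101],
    },
    index=index,
)

# Rank every column within each group; the largest value gets rank 1.
ranks = df2.groupby(level="group").rank(method="min", ascending=False)

# Keep the values and append the ranks as col1_rank, col2_rank, col3_rank.
with_ranks = pd.concat([df2, ranks.add_suffix("_rank")], axis=1).reset_index()
print(with_ranks)
```

Because rank returns a frame with the same index as its input, the two frames line up row by row without a merge.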

For the second part of the question, I would just change the indexing of the intermediary dataframe, and compute ranks after grouping on 'feature':

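# Start from the intermediary dataframe df2, re-index by (feature, group),
# then rank within each feature.
df4 = df2.reset_index().set_index(['feature', 'group']
                                  ).sort_index().groupby('feature').rank(
                                  method='min', ascending=False
                                  ).reset_index()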

which gives:

  feature group  col1  col2  col3
0      f1    g1   1.0   1.0   1.0
1      f1    g2   3.0   3.0   2.0
2      f1    g3   2.0   2.0   3.0
3      f2    g1   2.0   1.0   2.0
4      f2    g2   1.0   2.0   1.0
5      f3    g2   1.0   1.0   1.0
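
Both rankings follow the same pattern, so they could be wrapped in one small helper. The function rank_within below is hypothetical (it does not appear in the original answer), and df2 is again rebuilt from the intermediary table shown earlier so that the sketch is self-contained:

```python
import pandas as pd

# Rebuild the intermediary dataframe from the table printed earlier.
index = pd.MultiIndex.from_tuples(
    [("g1", "f1"), ("g1", "f2"), ("g2", "f1"),
     ("g2", "f2"), ("g2", "f3"), ("g3", "f1")],
    names=["group", "feature"],
)
df2 = pd.DataFrame(
    {
        "col1": [12, 0, 2, 3, 1, 7],
        "col2": [9.5, 8.0, 7.0, 7.0, 7.0, 7.666667],
        "col3": [1000, 200, 330, 331, 100, 101],
    },
    index=index,
)


def rank_within(frame, level):
    """Rank every column of `frame` inside each value of index level `level`."""
    ranks = frame.groupby(level=level).rank(method="min", ascending=False)
    # Put the grouping level first and sort, so rows are laid out per group.
    other = [name for name in frame.index.names if name != level]
    return ranks.reorder_levels([level] + other).sort_index().reset_index()


by_group = rank_within(df2, "group")      # ranks of features inside each group
by_feature = rank_within(df2, "feature")  # ranks of groups inside each feature
print(by_feature)
```

Grouping on an index level avoids the reset_index / set_index round trip; only the level order is changed before sorting.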
