Pandas Add New "rank" Columns For Every Column
I have a df like so (the actual df has 4.5 million rows and 23 columns):
group feature col1 col2 col3
g1 f1 1 10 100
g1 f1 11 9 1000
g1 f2 0 8
Solution 1:
I would use groupby on ['group', 'feature'] to produce an intermediary dataframe containing the sum, mean and max columns (not the ranks), and then groupby again on group only to produce the ranks.
Intermediary dataframe:
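import pandas as pd

# Aggregate each value column on its own, then align the results
# side by side on the shared (group, feature) index.
df2 = pd.concat([
    df.iloc[:, [0, 1, 2]].groupby(['group', 'feature']).sum(),   # col1: sum
    df.iloc[:, [0, 1, 3]].groupby(['group', 'feature']).mean(),  # col2: mean
    df.iloc[:, [0, 1, 4]].groupby(['group', 'feature']).max()    # col3: max
], axis=1)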
The intermediary dataframe is:
               col1      col2  col3
group feature
g1    f1         12  9.500000  1000
      f2          0  8.000000   200
g2    f1          2  7.000000   330
      f2          3  7.000000   331
      f3          1  7.000000   100
g3    f1          7  7.666667   101
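The same intermediary frame can probably be built in a single groupby pass by handing agg a per-column mapping, which may be worth trying at 4.5 million rows. A minimal sketch, assuming the column names from the question; the sample rows below are made up for illustration and are not the asker's data:

```python
import pandas as pd

# Made-up sample rows for illustration only.
df = pd.DataFrame({
    "group":   ["g1", "g1", "g1", "g2", "g2"],
    "feature": ["f1", "f1", "f2", "f1", "f2"],
    "col1":    [1, 11, 0, 2, 3],
    "col2":    [10, 9, 8, 7, 7],
    "col3":    [100, 1000, 200, 330, 331],
})

# One groupby pass; each column gets its own aggregation function.
agg = df.groupby(["group", "feature"]).agg(
    {"col1": "sum", "col2": "mean", "col3": "max"}
)

print(agg)
```

Selecting columns by name rather than by iloc position also keeps the code valid if the column order changes.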
Now for the final dataframe:
df3 = df2.groupby('group').rank(method='min', ascending=False).reset_index()
which finally gives:
group feature col1 col2 col3
0 g1 f1 1.0 1.0 1.0
1 g1 f2 2.0 2.0 2.0
2 g2 f1 2.0 1.0 2.0
3 g2 f2 1.0 1.0 1.0
4 g2 f3 3.0 1.0 3.0
5 g3 f1 1.0 1.0 1.0
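Since the title asks for new rank columns, one option is to keep the aggregated values and attach the ranks next to them under suffixed names. A sketch under assumptions: df2 is rebuilt here from the values printed above so the snippet runs on its own, and the "_rank" suffix is an illustrative choice, not something the question specifies:

```python
import pandas as pd

# Rebuild the intermediary frame from the values printed above.
idx = pd.MultiIndex.from_tuples(
    [("g1", "f1"), ("g1", "f2"), ("g2", "f1"),
     ("g2", "f2"), ("g2", "f3"), ("g3", "f1")],
    names=["group", "feature"],
)
df2 = pd.DataFrame(
    {
        "col1": [12, 0, 2, 3, 1, 7],
        "col2": [9.5, 8.0, 7.0, 7.0, 7.0, 7.666667],
        "col3": [1000, 200, 330, 331, 100, 101],
    },
    index=idx,
)

# Rank within each group; ties share the lowest rank (method="min").
ranks = df2.groupby(level="group").rank(method="min", ascending=False)

# Attach the ranks as extra columns; "_rank" is a hypothetical naming choice.
with_ranks = df2.join(ranks.add_suffix("_rank")).reset_index()

print(with_ranks)
```

Because rank returns a frame with the same index as its input, the join lines up row for row without any extra keys.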
For the second part of the question, I would just change the indexing of the intermediary dataframe and compute the ranks after grouping on 'feature':
# Swap the index order so rows sort by feature first, then rank within each feature.
df4 = (
    df2.reset_index()
       .set_index(['feature', 'group'])
       .sort_index()
       .groupby('feature')
       .rank(method='min', ascending=False)
       .reset_index()
)
which gives:
feature group col1 col2 col3
0 f1 g1 1.0 1.0 1.0
1 f1 g2 3.0 3.0 2.0
2 f1 g3 2.0 2.0 3.0
3 f2 g1 2.0 1.0 2.0
4 f2 g2 1.0 2.0 1.0
5 f3 g2 1.0 1.0 1.0
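As far as I can tell, the re-indexing only affects the row order of the output; grouping on the index level directly should give the same ranks. A sketch, assuming the intermediary frame has the (group, feature) MultiIndex shown earlier; it is rebuilt here from the printed values so the snippet is self-contained:

```python
import pandas as pd

# Rebuild the intermediary frame from the values printed earlier.
idx = pd.MultiIndex.from_tuples(
    [("g1", "f1"), ("g1", "f2"), ("g2", "f1"),
     ("g2", "f2"), ("g2", "f3"), ("g3", "f1")],
    names=["group", "feature"],
)
df2 = pd.DataFrame(
    {
        "col1": [12, 0, 2, 3, 1, 7],
        "col2": [9.5, 8.0, 7.0, 7.0, 7.0, 7.666667],
        "col3": [1000, 200, 330, 331, 100, 101],
    },
    index=idx,
)

# Group on the "feature" index level without re-indexing first,
# then sort afterwards only to get a feature-major row order.
by_feature = (
    df2.groupby(level="feature")
       .rank(method="min", ascending=False)
       .reset_index()
       .sort_values(["feature", "group"])
       .reset_index(drop=True)
)

print(by_feature)
```

The column order here is group, feature rather than feature, group, but the rank values match the table above.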