Updating Column In A Dataframe Based On Multiple Columns
I have a column named 'age' with a few NaN values. My crude logic for deriving the missing ages is to take the mean of age within 2 key categorical variables: job and gender. df = pd.DataFra
Solution 1:
Use Series.fillna with GroupBy.transform. Because the sample data contain no age values for the combination c, M, the mean for that group is itself NaN, so that row stays NaN:
df['age'] = df['age'].fillna(df.groupby(['job','gender'])['age'].transform('mean'))
print(df)
col1 age job gender
0 1 19.0 a M
1 2 23.0 b F
2 1 NaN c M
3 2 29.0 d F
4 3 70.0 e M
5 4 32.0 a F
6 11 27.0 b M
7 12 48.0 c F
8 13 39.0 d M
9 12 70.0 e M
10 11 29.0 a F
11 1 51.0 b F
12 10 48.0 c F
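To see why one row can remain NaN, here is a minimal self-contained sketch. The small frame below is hypothetical sample data (not the asker's original df, which is cut off in the question); the names group_mean and filled are only for illustration. It shows that transform('mean') returns one value per row, and that a group whose ages are all missing gets a NaN mean, so fillna has nothing to fill with.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, assumed for illustration only.
df = pd.DataFrame({
    "age": [19.0, np.nan, 23.0, 31.0, np.nan, 48.0],
    "job": ["a", "a", "b", "b", "c", "c"],
    "gender": ["M", "M", "F", "F", "M", "F"],
})

# transform('mean') returns a Series aligned to df.index; NaN values are
# skipped when each group's mean is computed.
group_mean = df.groupby(["job", "gender"])["age"].transform("mean")

# Row 1 (a, M) is filled with 19.0; row 4 (c, M) has no known age in its
# group, so its group mean is NaN and the value stays missing.
filled = df["age"].fillna(group_mean)
print(filled)
```

Any row still missing after this step belongs to a job/gender combination with no known age at all, which is what a coarser fallback grouping is meant to handle.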
If you also need to replace the remaining NaN values by grouping only by job, chain another fillna:
avg1 = df.groupby(['job','gender'])['age'].transform('mean')
avg2 = df.groupby('job')['age'].transform('mean')
df['age'] = df['age'].fillna(avg1).fillna(avg2)
print(df)
col1 age job gender
0 1 19.0 a M
1 2 23.0 b F
2 1 48.0 c M
3 2 29.0 d F
4 3 70.0 e M
5 4 32.0 a F
6 11 27.0 b M
7 12 48.0 c F
8 13 39.0 d M
9 12 70.0 e M
10 11 29.0 a F
11 1 51.0 b F
12 10 48.0 c F
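The chained fillna pattern above can be generalized to any number of fallback groupings. The sketch below is one possible way to do it, not part of the original answer: the helper name fill_with_group_means and the sample frame are hypothetical. As in the answer, every mean is computed from the original column, so values filled at a finer level do not affect the coarser averages.

```python
import numpy as np
import pandas as pd

def fill_with_group_means(df, target, key_sets):
    """Fill NaN in df[target] with group means, trying each key set in order.

    Hypothetical helper: key_sets is a list of grouping-column lists,
    ordered from most specific to least specific.
    """
    result = df[target].copy()
    for keys in key_sets:
        # df[target] is never modified inside the loop, so each mean is
        # based on the original (unfilled) values.
        result = result.fillna(df.groupby(keys)[target].transform("mean"))
    return result

# Hypothetical sample data, assumed for illustration only.
df = pd.DataFrame({
    "age": [19.0, np.nan, 23.0, 31.0, np.nan, 48.0],
    "job": ["a", "a", "b", "b", "c", "c"],
    "gender": ["M", "M", "F", "F", "M", "F"],
})

# First try job + gender, then fall back to job alone.
df["age"] = fill_with_group_means(df, "age", [["job", "gender"], ["job"]])
print(df)
```

With this data the (c, M) row has no job/gender mean to draw on, so it picks up the mean for job c instead. If a job has no known ages at all, the value would still be NaN and a further fallback would be needed.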