Reformat Pandas Dataframe
I have a pandas.DataFrame with the following data:
country branch Name salary mobile no emailid
x a aa 250000 Null
Solution 1:
You can replace Null with NaN, then groupby with agg, and finally reset_index:
print(data_df)
  country branch Name  salary   mobile no    emailid position
0       x      a   aa  250000        Null       Null  unknown
1       x      b   bb  350000  8976646410  xx@xx.com  unknown
2       y      c   cc  450000  8777945411  yy@yy.com  unknown
3       y      d   dd  589630        Null       Null  unknown
import numpy as np

data_df = data_df.replace('Null', np.nan)
print(data_df)
  country branch Name  salary   mobile no    emailid position
0       x      a   aa  250000         NaN        NaN  unknown
1       x      b   bb  350000  8976646410  xx@xx.com  unknown
2       y      c   cc  450000  8777945411  yy@yy.com  unknown
3       y      d   dd  589630         NaN        NaN  unknown
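The replace step matters because count() only skips missing values (NaN/None); a literal 'Null' string is ordinary data and is counted as present. A minimal Python 3 sketch, using a hypothetical Series of mobile numbers stored as strings:

```python
import numpy as np
import pandas as pd

# Two cells hold the literal string 'Null', which pandas treats as ordinary data.
s = pd.Series(['Null', '8976646410', '8777945411', 'Null'])

# count() only skips missing values, so the 'Null' strings are counted here.
before = s.count()

# After replacing the placeholder string with NaN, count() skips those cells.
after = s.replace('Null', np.nan).count()

print(before, after)  # 4 2
```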
df = data_df.groupby(['country', 'branch']).agg({'Name': 'count',
                                                  'mobile no': 'count',
                                                  'emailid': 'count',
                                                  'position': 'count'})
print(df.reset_index())
  country branch  emailid  position  Name  mobile no
0       x      a        0         1     1          0
1       x      b        1         1     1          1
2       y      c        1         1     1          1
3       y      d        0         1     1          0
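For reference, here is a self-contained Python 3 sketch of the same replace / groupby / agg / reset_index steps. The sample frame is reconstructed from the printout above, and storing the mobile numbers as strings is an assumption. With Python 3.7+ and recent pandas the aggregated columns typically follow the dict's insertion order, so Name comes first rather than in the order shown in the older printout.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the printed frame; mobile numbers kept as strings (assumption).
data_df = pd.DataFrame({
    'country': ['x', 'x', 'y', 'y'],
    'branch': ['a', 'b', 'c', 'd'],
    'Name': ['aa', 'bb', 'cc', 'dd'],
    'salary': [250000, 350000, 450000, 589630],
    'mobile no': ['Null', '8976646410', '8777945411', 'Null'],
    'emailid': ['Null', 'xx@xx.com', 'yy@yy.com', 'Null'],
    'position': ['unknown'] * 4,
})

# Turn the placeholder string into a real missing value so count() skips it.
data_df = data_df.replace('Null', np.nan)

# Count non-missing values per column within each (country, branch) group.
df = data_df.groupby(['country', 'branch']).agg({
    'Name': 'count',
    'mobile no': 'count',
    'emailid': 'count',
    'position': 'count',
})

# Move the group keys back from the index into ordinary columns.
df = df.reset_index()
print(df)
```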
EDIT:
If you need to count positions by category, create a column for each category, then groupby with count, drop the salary column, and finally reset_index:
print(data_df)
  country branch Name  salary   mobile no    emailid
0       x      a   aa  250000        Null       Null
1       x      a   aa   20000        Null       Null
2       x      b   bb  350000  8976646410  xx@xx.com
3       y      c   cc   45000  8777945411  yy@yy.com
4       y      d   dd  589630        Null       Null
normal = data_df['salary'] <= 20000
experienced = (data_df['salary'] > 20000) & (data_df['salary'] <= 50000)
unknown = data_df['salary'] > 50000
data_df.loc[normal, 'position_normal'] = 'normal employee'
data_df.loc[experienced, 'position_experienced'] = 'experienced employee'
data_df.loc[unknown, 'position_unknown'] = 'unknown employee'
print(data_df)
  country branch Name  salary   mobile no    emailid  position_normal  \
0       x      a   aa  250000        Null       Null              NaN
1       x      a   aa   20000        Null       Null  normal employee
2       x      b   bb  350000  8976646410  xx@xx.com              NaN
3       y      c   cc   45000  8777945411  yy@yy.com              NaN
4       y      d   dd  589630        Null       Null              NaN

   position_experienced  position_unknown
0                   NaN  unknown employee
1                   NaN               NaN
2                   NaN  unknown employee
3  experienced employee               NaN
4                   NaN  unknown employee
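The three masks are meant to put every salary into exactly one category: 20000 itself counts as normal, and 50000 itself still counts as experienced. A minimal sketch to check that, using a hypothetical helper categorize() and adding hypothetical boundary salaries (20001, 50000, 50001) to the example values. Note that a NaN salary would match none of the masks, because comparisons with NaN are False.

```python
import pandas as pd


def categorize(salary: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: return the answer's three masks as boolean columns."""
    return pd.DataFrame({
        'normal': salary <= 20000,
        'experienced': (salary > 20000) & (salary <= 50000),
        'unknown': salary > 50000,
    })


# Salaries from the example frame plus hypothetical boundary values.
salary = pd.Series([250000, 20000, 350000, 45000, 589630, 20001, 50000, 50001])
masks = categorize(salary)

# Each row should be True in exactly one mask (no gaps, no overlaps).
print(masks.sum(axis=1).tolist())  # [1, 1, 1, 1, 1, 1, 1, 1]
```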
# replace 'Null' strings with NaN
data_df = data_df.replace('Null', np.nan)
df = data_df.groupby(['country', 'branch']).count()
# remove the salary column
df = df.drop('salary', axis=1)
df = df.reset_index()
print(df)
country branch Name mobile no emailid position_normal \
0 x a 2 0 0 1
1 x b 1 1 1 0
2 y c 1 1 1 0
3 y d 1 0 0 0
position_experienced position_unknown
0 0 1
1 0 1
2 1 0
3 0 1
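Putting the EDIT steps together, here is a self-contained Python 3 sketch. The sample frame is reconstructed from the printout above, and storing the mobile numbers as strings is an assumption; the expected counts match the table just shown.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the printed frame; mobile numbers kept as strings (assumption).
data_df = pd.DataFrame({
    'country': ['x', 'x', 'x', 'y', 'y'],
    'branch': ['a', 'a', 'b', 'c', 'd'],
    'Name': ['aa', 'aa', 'bb', 'cc', 'dd'],
    'salary': [250000, 20000, 350000, 45000, 589630],
    'mobile no': ['Null', 'Null', '8976646410', '8777945411', 'Null'],
    'emailid': ['Null', 'Null', 'xx@xx.com', 'yy@yy.com', 'Null'],
})

# One boolean mask per salary band, as in the answer.
normal = data_df['salary'] <= 20000
experienced = (data_df['salary'] > 20000) & (data_df['salary'] <= 50000)
unknown = data_df['salary'] > 50000

# Label only the matching rows; the remaining cells of each new column stay NaN.
data_df.loc[normal, 'position_normal'] = 'normal employee'
data_df.loc[experienced, 'position_experienced'] = 'experienced employee'
data_df.loc[unknown, 'position_unknown'] = 'unknown employee'

# 'Null' strings would otherwise be counted as present values.
data_df = data_df.replace('Null', np.nan)

# count() tallies non-missing cells per column within each group.
df = data_df.groupby(['country', 'branch']).count()
df = df.drop('salary', axis=1)
df = df.reset_index()
print(df)
```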