
Reformat Pandas Dataframe

I have a pandas.DataFrame with the following data:

country branch Name salary mobile no emailid
x       a      aa   250000 Null      Null

Solution 1:

You can replace Null with NaN, then use groupby with agg, and finally reset_index:

print(data_df)
  country branch Name  salary   mobile no    emailid position
0       x      a   aa  250000        Null       Null  unknown
1       x      b   bb  350000  8976646410  xx@xx.com  unknown
2       y      c   cc  450000  8777945411  yy@yy.com  unknown
3       y      d   dd  589630        Null       Null  unknown

import numpy as np

data_df = data_df.replace('Null', np.nan)
print(data_df)
  country branch Name  salary   mobile no    emailid position
0       x      a   aa  250000         NaN        NaN  unknown
1       x      b   bb  350000  8976646410  xx@xx.com  unknown
2       y      c   cc  450000  8777945411  yy@yy.com  unknown
3       y      d   dd  589630         NaN        NaN  unknown

df = data_df.groupby(['country', 'branch']).agg({'Name': 'count',
                                                 'mobile no':'count', 
                                                 'emailid': 'count',
                                                 'position': 'count'})

print(df.reset_index())
  country branch  emailid  position  Name  mobile no
0       x      a        0         1     1          0
1       x      b        1         1     1          1
2       y      c        1         1     1          1
3       y      d        0         1     1          0
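
For readers on Python 3 and a current pandas, here is a minimal self-contained sketch of the same steps. It assumes the four sample rows shown in the printed frame above, with mobile no stored as strings and missing values arriving as the literal string 'Null'; the DataFrame constructor is only there to make the example runnable. The column order of the printed result may differ from the output above, since newer pandas follows the order of the agg dictionary.

```python
import numpy as np
import pandas as pd

# Sample rows reconstructed from the printed frame above (assumed input);
# 'Null' is a placeholder string, not a real missing value yet
data_df = pd.DataFrame({
    'country': ['x', 'x', 'y', 'y'],
    'branch': ['a', 'b', 'c', 'd'],
    'Name': ['aa', 'bb', 'cc', 'dd'],
    'salary': [250000, 350000, 450000, 589630],
    'mobile no': ['Null', '8976646410', '8777945411', 'Null'],
    'emailid': ['Null', 'xx@xx.com', 'yy@yy.com', 'Null'],
    'position': ['unknown'] * 4,
})

# Turn the placeholder into NaN so that 'count' skips those cells
data_df = data_df.replace('Null', np.nan)

# 'count' counts only the non-missing cells of each column per group
df = data_df.groupby(['country', 'branch']).agg({'Name': 'count',
                                                 'mobile no': 'count',
                                                 'emailid': 'count',
                                                 'position': 'count'})

# Move the group keys back into ordinary columns
df = df.reset_index()
print(df)
```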

EDIT:

If you need to count positions by category, create a column for each category, then groupby with count, drop the salary column, and finally reset_index:

print(data_df)
  country branch Name  salary   mobile no    emailid
0       x      a   aa  250000        Null       Null
1       x      a   aa   20000        Null       Null
2       x      b   bb  350000  8976646410  xx@xx.com
3       y      c   cc   45000  8777945411  yy@yy.com
4       y      d   dd  589630        Null       Null

normal = data_df['salary'] <= 20000
experienced = (data_df['salary'] > 20000) & (data_df['salary'] <= 50000)
unknown = data_df['salary'] > 50000

data_df.loc[normal, 'position_normal'] = 'normal employee'
data_df.loc[experienced, 'position_experienced'] = 'experienced employee'
data_df.loc[unknown, 'position_unknown'] = 'unknown employee'
print(data_df)
  country branch Name  salary   mobile no    emailid  position_normal  \
0       x      a   aa  250000        Null       Null              NaN   
1       x      a   aa   20000        Null       Null  normal employee   
2       x      b   bb  350000  8976646410  xx@xx.com              NaN   
3       y      c   cc   45000  8777945411  yy@yy.com              NaN   
4       y      d   dd  589630        Null       Null              NaN   

   position_experienced  position_unknown  
0                   NaN  unknown employee  
1                   NaN               NaN  
2                   NaN  unknown employee  
3  experienced employee               NaN  
4                   NaN  unknown employee 
# replace Null with NaN
data_df = data_df.replace('Null', np.nan)
df = data_df.groupby(['country', 'branch']).count()
# remove column salary
df = df.drop('salary', axis=1)

df = df.reset_index()
print(df)
  country branch  Name  mobile no  emailid  position_normal  \
0       x      a     2          0        0                1   
1       x      b     1          1        1                0   
2       y      c     1          1        1                0   
3       y      d     1          0        0                0   

   position_experienced  position_unknown  
0                     0                 1  
1                     0                 1  
2                     1                 0  
3                     0                 1  
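
A different technique, not the one used in this answer: pd.cut can assign every row to a single salary band using the same thresholds (right-closed bins, so 20000 counts as normal and 50000 as experienced), and pd.crosstab then counts the bands per country and branch. This is a minimal sketch assuming the five sample rows printed above; it produces only the position counts, so the Name, mobile no and emailid counts would still come from the groupby count.

```python
import numpy as np
import pandas as pd

# Sample rows reconstructed from the printed frame above (assumed input)
data_df = pd.DataFrame({
    'country': ['x', 'x', 'x', 'y', 'y'],
    'branch': ['a', 'a', 'b', 'c', 'd'],
    'Name': ['aa', 'aa', 'bb', 'cc', 'dd'],
    'salary': [250000, 20000, 350000, 45000, 589630],
})

# Same bands as the boolean masks: <=20000, (20000, 50000], >50000.
# pd.cut uses right-closed intervals by default (right=True).
data_df['position'] = pd.cut(
    data_df['salary'],
    bins=[-np.inf, 20000, 50000, np.inf],
    labels=['normal employee', 'experienced employee', 'unknown employee'],
).astype(str)  # plain strings instead of a categorical column

# One row per (country, branch), one column per band, cells are row counts
counts = pd.crosstab([data_df['country'], data_df['branch']], data_df['position'])
counts = counts.reset_index()
print(counts)
```

Converting the bands to plain strings keeps crosstab and reset_index working with an ordinary column index rather than a categorical one.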
