Row To Columns While Keeping Part Of Dataframe, Display On Same Row

I am trying to turn some of my rows into columns while keeping a large portion of the dataframe the same. Resulting Dataframe: ID Thing Level1 Level2 Time OAttribut

Solution 1:

One way would be to create an intermediate dataframe and then use outer merge.

In [102]: df
Out[102]: 
   ID    Thing Level1 Level2  Time OAttribute IsTrue  Score  Value
0   1  bicycle  value  value  9:30   whatever    yes    1.0  type1
1   1  bicycle  value  value  9:30   whatever    yes    2.0  type2
2   2  bicycle  value  value  2:30   whatever     no    NaN    NaN
3   4  non-bic  value  value  3:30   whatever     no    4.0  type3
4   1  bicycle  value  value  9:30   whatever    yes    3.0  type3

In [103]: dg = pd.DataFrame(columns=list(df['Value'].dropna().unique()) + ['ID'])

In [104]: for i in range(len(df)):
     ...:     key = df.loc[i, 'Value']
     ...:     value = df.loc[i, 'Score']
     ...:     ID = df.loc[i, 'ID']
     ...:     if pd.notna(key):  # skip rows that have no Value
     ...:         dg.loc[i, key] = value
     ...:         dg.loc[i, 'ID'] = ID
     ...:                 

In [105]: dg
Out[105]: 
  type1 type2 type3 ID
0     1   NaN   NaN  1
1   NaN     2   NaN  1
3   NaN   NaN     4  4
4   NaN   NaN     3  1

In [106]: dg = dg.groupby('ID').max().reset_index()

In [107]: dg
Out[107]: 
   ID  type1  type2  type3
0   1      1      2      3
1   4    NaN    NaN      4

In [108]: df[df.columns.difference(['Score', 'Value'])].drop_duplicates().merge(dg, how='outer').fillna('')
Out[108]: 
   ID IsTrue Level1 Level2 OAttribute    Thing  Time type1 type2 type3
0   1    yes  value  value   whatever  bicycle  9:30     1     2     3
1   2     no  value  value   whatever  bicycle  2:30                  
2   4     no  value  value   whatever  non-bic  3:30                 4

Another way to calculate the intermediate data frame would be by avoiding the for loop and using unstack():

In [150]: df
Out[150]: 
   ID    Thing Level1 Level2  Time OAttribute IsTrue  Score  Value
0   1  bicycle  value  value  9:30   whatever    yes    1.0  type1
1   1  bicycle  value  value  9:30   whatever    yes    2.0  type2
2   2  bicycle  value  value  2:30   whatever     no    NaN    NaN
3   4  non-bic  value  value  3:30   whatever     no    4.0  type3
4   1  bicycle  value  value  9:30   whatever    yes    3.0  type3

In [151]: dg = df[['Score', 'Value']].dropna().set_index('Value', append=True).Score.unstack().join(df['ID']).groupby('ID').max().reset_index()

In [152]: df[df.columns.difference(['Score', 'Value'])].drop_duplicates().merge(dg, how='outer').fillna('')
Out[152]: 
   ID IsTrue Level1 Level2 OAttribute    Thing  Time type1 type2 type3
0   1    yes  value  value   whatever  bicycle  9:30     1     2     3
1   2     no  value  value   whatever  bicycle  2:30                  
2   4     no  value  value   whatever  non-bic  3:30                 4
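
For anyone who wants to run the idea end to end, here is a minimal self-contained sketch. It assumes the sample frame can be retyped from the printed df in this answer, and it swaps in pivot_table for the unstack chain (a different call, used here only because it reads a little shorter). The names wide, fixed and result are made up for this example, and a left merge on ID is used in place of the outer merge since every ID already appears in the de-duplicated frame.

```python
import pandas as pd

# Sample frame retyped from the printed df above (placeholder values as shown there).
df = pd.DataFrame({
    "ID": [1, 1, 2, 4, 1],
    "Thing": ["bicycle", "bicycle", "bicycle", "non-bic", "bicycle"],
    "Level1": ["value"] * 5,
    "Level2": ["value"] * 5,
    "Time": ["9:30", "9:30", "2:30", "3:30", "9:30"],
    "OAttribute": ["whatever"] * 5,
    "IsTrue": ["yes", "yes", "no", "no", "yes"],
    "Score": [1.0, 2.0, None, 4.0, 3.0],
    "Value": ["type1", "type2", None, "type3", "type3"],
})

# Intermediate frame: one row per ID, one column per distinct Value,
# holding the (max) Score for that ID/Value pair.
wide = (
    df[["ID", "Score", "Value"]]
    .dropna(subset=["Value"])
    .pivot_table(index="ID", columns="Value", values="Score", aggfunc="max")
    .reset_index()
)
wide.columns.name = None  # drop the leftover "Value" label on the columns

# The part of the frame that stays the same, de-duplicated to one row per ID.
fixed = df.drop(columns=["Score", "Value"]).drop_duplicates()

# Attach the new type columns; IDs without any Value keep NaN there.
result = fixed.merge(wide, on="ID", how="left")
print(result)
```

Unlike the session above, this keeps the original column order and leaves missing scores as NaN; call .fillna('') on result if blank cells are preferred.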

Solution 2:

Can't really tell what you're trying to do with both of your Score and Value columns at the same time.

But if you're looking to transform your "Value" column, you're looking for something like one-hot encoding of your "Value" column and pandas has a very convenient function for it. All you have to do is:

pd.get_dummies(df['Value'])

That will give you a new data frame with 3 new columns, namely [type1, type2, type3], filled with 1s and 0s (True/False in recent pandas versions).

After that, all you have to do is use the .join method to join it back to your original df. You can then proceed to delete the columns that you don't need.
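
A rough sketch of those two steps, assuming a cut-down version of the sample frame with only the ID, Score and Value columns (dummies and encoded are names made up for this example). Note that this only flags which type each row has; it does not move the Score values into the new columns.

```python
import pandas as pd

# Cut-down sample frame, retyped from the question's data.
df = pd.DataFrame({
    "ID": [1, 1, 2, 4, 1],
    "Score": [1.0, 2.0, None, 4.0, 3.0],
    "Value": ["type1", "type2", None, "type3", "type3"],
})

# One indicator column per distinct Value; dtype=int forces 1/0 output.
# A row with a missing Value gets 0 in every indicator column.
dummies = pd.get_dummies(df["Value"], dtype=int)

# Join the indicators back on the index, then drop the column they replace.
encoded = df.join(dummies).drop(columns=["Value"])
print(encoded)
```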
