Row To Columns While Keeping Part Of Dataframe, Display On Same Row
Solution 1:
One way would be to create an intermediate dataframe and then use outer merge.
In [102]: df
Out[102]:
   ID    Thing Level1 Level2  Time OAttribute IsTrue  Score  Value
0   1  bicycle  value  value  9:30   whatever    yes    1.0  type1
1   1  bicycle  value  value  9:30   whatever    yes    2.0  type2
2   2  bicycle  value  value  2:30   whatever     no    NaN    NaN
3   4  non-bic  value  value  3:30   whatever     no    4.0  type3
4   1  bicycle  value  value  9:30   whatever    yes    3.0  type3
In [103]: dg = pd.DataFrame(columns=list(df['Value'].dropna().unique()) + ['ID'])
In [104]: for i in range(len(df)):
     ...:     key = df.loc[i, 'Value']
     ...:     value = df.loc[i, 'Score']
     ...:     ID = df.loc[i, 'ID']
     ...:     if pd.notna(key):  # skip rows that have no Value
     ...:         dg.loc[i, key] = value
     ...:         dg.loc[i, 'ID'] = ID
     ...:
In [105]: dg
Out[105]:
  type1 type2 type3 ID
0     1   NaN   NaN  1
1   NaN     2   NaN  1
3   NaN   NaN     4  4
4   NaN   NaN     3  1
In [106]: dg = dg.groupby('ID').max().reset_index()
In [107]: dg
Out[107]:
   ID type1 type2 type3
0   1     1     2     3
1   4   NaN   NaN     4
In [108]: df[df.columns.difference(['Score', 'Value'])].drop_duplicates().merge(dg, how='outer').fillna('')
Out[108]:
   ID IsTrue Level1 Level2 OAttribute    Thing  Time type1 type2 type3
0   1    yes  value  value   whatever  bicycle  9:30     1     2     3
1   2     no  value  value   whatever  bicycle  2:30
2   4     no  value  value   whatever  non-bic  3:30                 4
Another way to build the intermediate dataframe is to skip the for loop and use unstack():
In [150]: df
Out[150]:
   ID    Thing Level1 Level2  Time OAttribute IsTrue  Score  Value
0   1  bicycle  value  value  9:30   whatever    yes    1.0  type1
1   1  bicycle  value  value  9:30   whatever    yes    2.0  type2
2   2  bicycle  value  value  2:30   whatever     no    NaN    NaN
3   4  non-bic  value  value  3:30   whatever     no    4.0  type3
4   1  bicycle  value  value  9:30   whatever    yes    3.0  type3
In [151]: dg = df[['Score', 'Value']].dropna().set_index('Value', append=True).Score.unstack().join(df['ID']).groupby('ID').max().reset_index()
In [152]: df[df.columns.difference(['Score', 'Value'])].drop_duplicates().merge(dg, how='outer').fillna('')
Out[152]:
   ID IsTrue Level1 Level2 OAttribute    Thing  Time type1 type2 type3
0   1    yes  value  value   whatever  bicycle  9:30     1     2     3
1   2     no  value  value   whatever  bicycle  2:30
2   4     no  value  value   whatever  non-bic  3:30                 4
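The one-liner in In [151] packs several steps together, so here is a rough sketch that spells them out as a standalone script. The sample frame is rebuilt by hand from the df printed above, and the intermediate names (scores, per_row, base, wide) are only illustrative, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the df shown in the answer.
df = pd.DataFrame({
    "ID": [1, 1, 2, 4, 1],
    "Thing": ["bicycle", "bicycle", "bicycle", "non-bic", "bicycle"],
    "Level1": ["value"] * 5,
    "Level2": ["value"] * 5,
    "Time": ["9:30", "9:30", "2:30", "3:30", "9:30"],
    "OAttribute": ["whatever"] * 5,
    "IsTrue": ["yes", "yes", "no", "no", "yes"],
    "Score": [1.0, 2.0, np.nan, 4.0, 3.0],
    "Value": ["type1", "type2", np.nan, "type3", "type3"],
})

# 1. Keep only rows that actually have a Score/Value pair.
scores = df[["Score", "Value"]].dropna()

# 2. Move Value into the index and unstack it, so each type becomes a column.
per_row = scores.set_index("Value", append=True)["Score"].unstack()

# 3. Bring ID back (joined on the row index) and collapse to one row per ID.
dg = per_row.join(df["ID"]).groupby("ID").max().reset_index()

# 4. De-duplicate the remaining columns and attach the wide scores.
base = df[df.columns.difference(["Score", "Value"])].drop_duplicates()
wide = base.merge(dg, how="outer")

print(wide)
```

IDs without any Value (ID 2 here) keep NaN in the type columns; appending .fillna('') as in the answer blanks them out for display, at the cost of turning those columns into object dtype.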
Solution 2:
Can't really tell what you're trying to do with both of your Score and Value columns at the same time.
But if you only want to transform your "Value" column, what you're after is one-hot encoding, and pandas has a very convenient function for it. All you have to do is:
pd.get_dummies(df['Value'])
That will give you a new data frame with 3 new columns, namely [type1, type2, type3], filled with 1s and 0s (True/False in pandas 2.0 and later unless you pass dtype=int).
After that, all you have to do is use the .join method to join it back to your original df. You can then proceed to delete the columns that you don't need.
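As a minimal sketch of those two steps, assuming a trimmed-down version of the question's frame (only ID, Score and Value are kept here for brevity) and illustrative variable names:

```python
import numpy as np
import pandas as pd

# Trimmed-down sample: just the columns relevant to the encoding.
df = pd.DataFrame({
    "ID": [1, 1, 2, 4, 1],
    "Score": [1.0, 2.0, np.nan, 4.0, 3.0],
    "Value": ["type1", "type2", np.nan, "type3", "type3"],
})

# One indicator column per distinct Value; dtype=int forces 0/1 output.
dummies = pd.get_dummies(df["Value"], dtype=int)

# Join the indicators back on the row index, then drop what is no longer needed.
encoded = df.join(dummies).drop(columns=["Value"])

print(encoded)
```

Note that this only marks which type each row has; the row with a missing Value gets 0 in all three columns, and the Score values are not moved into the new columns.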