Generate New Columns As A Combination Of Other Columns
I have a DataFrame that has several components of an identifier in the columns and a value associated with the identifier in another column. I want to be able to create n columns s
Solution 1:
Starting with your example data
In [3]: df
Out[3]:
       foo  bar  Type  ID  Index     Value
25090    x    9     A   0      0  23272000
25090    x    5     A   0      0  23272000
25091    x    3     A   1      0  22896000
25092    x    3     B   0      1  20048000
25093    y    6     A   0      0  19760000
25092    y    4     B   0      1  20823342
Build each row's identifier by joining its Type, ID and Index values with underscores, applying the join row-wise.
In [4]: identifier = df[['Type', 'ID', 'Index']].apply(
   ...:     lambda x: '_'.join(map(str, x)), axis=1)
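If the row-wise apply is slow on a large frame, one option is to cast each component to a string and concatenate the columns directly. This is a sketch that swaps in plain Series string concatenation for the row-wise join; the frame is re-created by hand from the example data above (assuming bar, ID, Index and Value are integers) so the snippet runs on its own.

```python
import pandas as pd

# Example data re-created from the question (index labels are not unique)
df = pd.DataFrame(
    {
        "foo": ["x", "x", "x", "x", "y", "y"],
        "bar": [9, 5, 3, 3, 6, 4],
        "Type": ["A", "A", "A", "B", "A", "B"],
        "ID": [0, 0, 1, 0, 0, 0],
        "Index": [0, 0, 0, 1, 0, 1],
        "Value": [23272000, 23272000, 22896000, 20048000, 19760000, 20823342],
    },
    index=[25090, 25090, 25091, 25092, 25093, 25092],
)

# Cast each component to str and concatenate column-wise, without a per-row apply
identifier = (
    df["Type"].astype(str)
    + "_" + df["ID"].astype(str)
    + "_" + df["Index"].astype(str)
)
print(identifier.tolist())
```

The result holds the same strings as the apply version, so it can be dropped into the next step unchanged.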
Make a Series from your Value column and index it by foo and the identifier.
In [5]: v = df['Value']
In [6]: v.index = pd.MultiIndex.from_arrays([df['foo'], identifier])
In [7]: v
Out[7]:
foo
x    A_0_0    23272000
     A_0_0    23272000
     A_1_0    22896000
     B_0_1    20048000
y    A_0_0    19760000
     B_0_1    20823342
Name: Value, dtype: int64
Unstack it so that each identifier becomes a column, and join the result to the original DataFrame on 'foo'.
In [8]: df[['foo', 'bar']].join(v.drop_duplicates().unstack(), on='foo')
Out[8]:
       foo  bar     A_0_0     A_1_0     B_0_1
25090    x    9  23272000  22896000  20048000
25090    x    5  23272000  22896000  20048000
25091    x    3  23272000  22896000  20048000
25092    x    3  23272000  22896000  20048000
25093    y    6  19760000       NaN  20823342
25092    y    4  19760000       NaN  20823342
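For reference, here is the same sequence as one self-contained script. It rebuilds the example data by hand (assuming bar, ID, Index and Value are integers and foo and Type are strings) and copies the Value column before replacing its index, so the original df keeps its own index.

```python
import pandas as pd

# Example data re-created from the question
df = pd.DataFrame(
    {
        "foo": ["x", "x", "x", "x", "y", "y"],
        "bar": [9, 5, 3, 3, 6, 4],
        "Type": ["A", "A", "A", "B", "A", "B"],
        "ID": [0, 0, 1, 0, 0, 0],
        "Index": [0, 0, 0, 1, 0, 1],
        "Value": [23272000, 23272000, 22896000, 20048000, 19760000, 20823342],
    },
    index=[25090, 25090, 25091, 25092, 25093, 25092],
)

# Step 1: join Type, ID and Index row-wise into one identifier string
identifier = df[["Type", "ID", "Index"]].apply(
    lambda x: "_".join(map(str, x)), axis=1
)

# Step 2: copy Value (so df is untouched) and index it by (foo, identifier)
v = df["Value"].copy()
v.index = pd.MultiIndex.from_arrays([df["foo"], identifier])

# Step 3: drop the repeated entry, turn identifiers into columns, join back on foo
wide = v.drop_duplicates().unstack()
result = df[["foo", "bar"]].join(wide, on="foo")
print(result)
```

Because the missing (y, A_1_0) entry is filled with NaN, the unstacked columns should come back as floats rather than integers.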
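Notice that I dropped the duplicates in v before unstacking it. This is essential: unstack needs each (foo, identifier) pair to appear only once and raises a ValueError ("Index contains duplicate entries, cannot reshape") otherwise. If the same identifier has different values within the same foo anywhere in your dataset, drop_duplicates will keep both entries and the reshape will fail.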
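One caveat: Series.drop_duplicates compares values, not index labels. It works on the example because the only repeated value also belongs to a repeated (foo, identifier) pair, but it can discard a legitimate entry when two different identifiers happen to share a value. The sketch below uses made-up numbers to show the difference and deduplicates on the index instead with Index.duplicated; the level name identifier is a hypothetical choice.

```python
import pandas as pd

# Hypothetical data: (x, A_0_0) appears twice, and (x, A_1_0) happens to
# share its value with (x, A_0_0)
idx = pd.MultiIndex.from_tuples(
    [("x", "A_0_0"), ("x", "A_0_0"), ("x", "A_1_0"), ("y", "A_0_0")],
    names=["foo", "identifier"],
)
v = pd.Series([100, 100, 100, 200], index=idx, name="Value")

# Value-based: keeps only the first 100, so (x, A_1_0) is lost as well
by_value = v.drop_duplicates()

# Index-based: keeps the first row for each (foo, identifier) pair
by_index = v[~v.index.duplicated(keep="first")]

wide = by_index.unstack()
print(wide)
```

Keeping the first occurrence is itself an assumption; if a pair can carry genuinely different values you still have to decide which one wins. v.groupby(level=[0, 1]).first() is another way to get one row per pair.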
Minor points: your example output has a row (25094) that is missing from your example input. Also, the NaNs in my output make sense: no value is specified for A_1_0 when foo='y'.
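A different technique that avoids the manual deduplication step is DataFrame.pivot_table with aggfunc="first": it collapses repeated (foo, identifier) pairs by keeping the first value and leaves absent pairs such as (y, A_1_0) as NaN. This is a swapped-in alternative, not the approach above; the helper column name identifier is hypothetical, and the data is re-created by hand from the example.

```python
import pandas as pd

# Example data re-created from the question
df = pd.DataFrame(
    {
        "foo": ["x", "x", "x", "x", "y", "y"],
        "bar": [9, 5, 3, 3, 6, 4],
        "Type": ["A", "A", "A", "B", "A", "B"],
        "ID": [0, 0, 1, 0, 0, 0],
        "Index": [0, 0, 0, 1, 0, 1],
        "Value": [23272000, 23272000, 22896000, 20048000, 19760000, 20823342],
    },
    index=[25090, 25090, 25091, 25092, 25093, 25092],
)

# Hypothetical helper column holding "Type_ID_Index"
keyed = df.assign(
    identifier=df["Type"].astype(str)
    + "_" + df["ID"].astype(str)
    + "_" + df["Index"].astype(str)
)

# One row per foo and one column per identifier; aggfunc="first" keeps the
# first Value of any repeated pair, and pairs that never occur become NaN
wide = keyed.pivot_table(
    index="foo", columns="identifier", values="Value", aggfunc="first"
)

result = df[["foo", "bar"]].join(wide, on="foo")
print(result)
```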