Pandas: Turn Multiple Variables Into A Single Set Of Dummy Variables
I have a column with categories (A, B, C, D) I want to turn into dummy variables. Problem is, this column can contain multiple categories per row, like this: DF = pd.DataFrame({'Co
Solution 1:
Simplest way is
DF.Col.str.get_dummies(', ')
A B C D
0 1 0 0 0
1 1 1 0 0
2 1 0 1 0
3 0 1 1 1
4 0 0 0 1
Slightly more complicated
from sklearn.preprocessing import MultiLabelBinarizer
from numpy.core.defchararray import split
mlb = MultiLabelBinarizer()
s = DF.Col.values.astype(str)
d = mlb.fit_transform(split(s, ', '))
pd.DataFrame(d, columns=mlb.classes_)
A B C D
0 1 0 0 0
1 1 1 0 0
2 1 0 1 0
3 0 1 1 1
4 0 0 0 1
Solution 2:
By using pd.crosstab
import pandas as pd
df = pd.DataFrame({'Col':['A', 'A,B', 'A,C', 'B,C,D', 'D']})
df.Col=df.Col.str.split(',')
df1=df.Col.apply(pd.Series).stack()
pd.crosstab(df1.index.get_level_values(0),df1)
Out[893]:
col_0 A B C D
row_0
0 1 0 0 0
1 1 1 0 0
2 1 0 1 0
3 0 1 1 1
4 0 0 0 1
Post a Comment for "Pandas: Turn Multiple Variables Into A Single Set Of Dummy Variables"