
Pandas: Turn Multiple Variables Into A Single Set Of Dummy Variables

I have a column with categories (A, B, C, D) that I want to turn into dummy variables. The problem is that this column can contain multiple categories per row, like this: DF = pd.DataFrame({'Co

Solution 1:

The simplest way is to call str.get_dummies with the separator used in the column:

DF.Col.str.get_dummies(', ')

   A  B  C  D
0  1  0  0  0
1  1  1  0  0
2  1  0  1  0
3  0  1  1  1
4  0  0  0  1
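
As a self-contained sketch of the call above: the question's DataFrame definition is cut off, so the sample rows here are an assumption (the values used in Solution 2, joined with ', '), and the name sample is hypothetical. The second part is an addition rather than part of the original answer. It shows one possible way to normalise inconsistent spacing first, since get_dummies splits on the separator exactly as given.

```python
import pandas as pd

# Assumed sample data (hypothetical name `sample`); the question's own
# definition is truncated, so these rows are only an illustration.
sample = pd.DataFrame({'Col': ['A', 'A, B', 'A, C', 'B, C, D', 'D']})

# Split each cell on ', ' and build one 0/1 indicator column per category.
dummies = sample['Col'].str.get_dummies(sep=', ')
print(dummies)

# If spacing around the commas is inconsistent, collapse it before splitting,
# otherwise ' B' and 'B' would end up as two different columns.
messy = pd.Series(['A', 'A,B', 'A , C'])
clean = messy.str.replace(r'\s*,\s*', ',', regex=True).str.get_dummies(sep=',')
print(clean)
```

The result can be attached to the original frame with sample.join(dummies) if the indicator columns are needed alongside the source column.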

A slightly more involved alternative uses scikit-learn's MultiLabelBinarizer:

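import numpy as np
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()
# Split each string on ', ' to get one list of labels per row
# (np.char.split is the public path to numpy.core.defchararray.split)
s = DF.Col.values.astype(str)
d = mlb.fit_transform(np.char.split(s, ', '))

# Wrap the 0/1 matrix in a DataFrame with one column per class
pd.DataFrame(d, columns=mlb.classes_)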

   A  B  C  D
0  1  0  0  0
1  1  1  0  0
2  1  0  1  0
3  0  1  1  1
4  0  0  0  1

Solution 2:

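Another option is pd.crosstab, after splitting each string and stacking the pieces into one long Series (note that this sample uses ',' without a space as the separator):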

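import pandas as pd

df = pd.DataFrame({'Col': ['A', 'A,B', 'A,C', 'B,C,D', 'D']})
# Split each string into a list of categories
df.Col = df.Col.str.split(',')
# Expand the lists into columns, then stack into one long Series
df1 = df.Col.apply(pd.Series).stack()
# Cross-tabulate the original row number against the category
pd.crosstab(df1.index.get_level_values(0), df1)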

col_0  A  B  C  D
row_0            
0      1  0  0  0
1      1  1  0  0
2      1  0  1  0
3      0  1  1  1
4      0  0  0  1
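
A variant of the same idea, offered as a sketch rather than as the answerer's method: on pandas 0.25 or later, Series.explode can replace the apply(pd.Series).stack() step. The names exploded and dummies are hypothetical, and the sample rows are the ones from this solution.

```python
import pandas as pd

df = pd.DataFrame({'Col': ['A', 'A,B', 'A,C', 'B,C,D', 'D']})

# One row per (original row, category) pair; the original index label is
# repeated for rows that held several categories.
exploded = df['Col'].str.split(',').explode()

# Count how often each category occurs for each original row label.
dummies = pd.crosstab(exploded.index, exploded)
print(dummies)
```

Because crosstab counts occurrences, a category repeated within one row would show up as 2 rather than 1; clipping with dummies.clip(upper=1) is one way to force strict 0/1 indicators.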
