
Reshaping And Encoding Multi-column Categorical Variables To One Hot Encoding

I have some data which looks as follows:

Owner  Label1  Label2  Label3
Bob    Dog     N/A     N/A
John   Cat     Mouse   N/A
Lee    Dog     Cat     N/A
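To try the answers below, the frame can be rebuilt roughly as follows. This is only a minimal sketch: the rows for Bob, John and Lee come from the question, while Jane's row is an assumption inferred from the answer outputs (the order of her labels is a guess), and it assumes N/A stands for a missing value rather than literal text.

```python
import numpy as np
import pandas as pd

# Bob, John and Lee are taken from the question.
# Jane is an ASSUMPTION inferred from the answer outputs; her label order is a guess.
df = pd.DataFrame({
    "Owner":  ["Bob", "John", "Lee", "Jane"],
    "Label1": ["Dog", "Cat", "Dog", "Ferret"],
    "Label2": [np.nan, "Mouse", "Cat", "Hamster"],
    "Label3": [np.nan, np.nan, np.nan, "Rat"],
})

# If the source file stores the literal text "N/A", turn it into real
# missing values so it does not end up as its own category.
df = df.mask(df == "N/A")

print(df)
```

If the data comes from a CSV file, pd.read_csv already treats "N/A" as missing by default, so the mask step would be a no-op there.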

Solution 1:

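Using stack and str.get_dummies, then summing per owner (sum(level=0) was removed in pandas 2.0, so group by the index level instead):

df.set_index('Owner').stack().str.get_dummies().groupby(level=0, sort=False).sum()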
Out[535]: 
       Cat  Dog  Ferret  Hamster  Mouse  Rat
Owner                                       
Bob      0    1       0        0      0    0
John     1    0       0        0      1    0
Lee      1    1       0        0      0    0
Jane     0    0       1        1      0    1

Or, with melt and crosstab:

s = df.melt('Owner')
pd.crosstab(s.Owner, s.value)
Out[540]: 
value  Cat  Dog  Ferret  Hamster  Mouse  Rat
Owner                                       
Bob      0    1       0        0      0    0
Jane     0    0       1        1      0    1
John     1    0       0        0      1    0
Lee      1    1       0        0      0    0
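The first one-liner above can be unpacked into separate steps, which may make it easier to see what each call contributes. This is a sketch under assumptions: the sample frame (including Jane's row) is reconstructed from the outputs, and the intermediate names stacked, dummies and encoded are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample frame; Jane's row is an assumption inferred from the outputs above.
df = pd.DataFrame({
    "Owner":  ["Bob", "John", "Lee", "Jane"],
    "Label1": ["Dog", "Cat", "Dog", "Ferret"],
    "Label2": [np.nan, "Mouse", "Cat", "Hamster"],
    "Label3": [np.nan, np.nan, np.nan, "Rat"],
})

# 1. Move the label columns into one long Series indexed by (Owner, column).
#    dropna() makes sure missing labels are gone regardless of pandas version.
stacked = df.set_index("Owner").stack().dropna()

# 2. One 0/1 column per distinct label, one row per (Owner, column) pair.
dummies = stacked.str.get_dummies()

# 3. Collapse back to one row per owner; sort=False keeps the original order.
encoded = dummies.groupby(level=0, sort=False).sum()

print(encoded)
```

Both the sum here and pd.crosstab count occurrences, so if an owner could list the same label twice, calling .clip(upper=1) on the result would keep it a strict 0/1 indicator.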

Solution 2:

You could use get_dummies on the stacked dataset, then groupby and sum:

pd.get_dummies(df.set_index('Owner').stack()).groupby('Owner', sort=False).sum()

       Cat  Dog  Ferret  Hamster  Mouse  Rat
Owner                                       
Bob      0    1       0        0      0    0
John     1    0       0        0      1    0
Lee      1    1       0        0      0    0
Jane     0    0       1        1      0    1

Solution 3:

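Using sklearn.preprocessing.MultiLabelBinarizer:

import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Walk the rows: o is the owner, l is that row's labels with missing values dropped
o, l = zip(*[[o, [*filter(pd.notna, l)]] for o, *l in zip(*map(df.get, df))])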

mlb = MultiLabelBinarizer()

d = mlb.fit_transform(l)
pd.DataFrame(d, o, mlb.classes_)

      Cat  Dog  Ferret  Hamster  Mouse  Rat
Bob     0    1       0        0      0    0
John    1    0       0        0      1    0
Lee     1    1       0        0      0    0
Jane    0    0       1        1      0    1

Same-ish answer

o = df.Owner
l = [[x for x in l if pd.notna(x)] for l in df.filter(like='Label').values]

mlb = MultiLabelBinarizer()

d = mlb.fit_transform(l)
pd.DataFrame(d, o, mlb.classes_)

       Cat  Dog  Ferret  Hamster  Mouse  Rat
Owner                                       
Bob      0    1       0        0      0    0
John     1    0       0        0      1    0
Lee      1    1       0        0      0    0
Jane     0    0       1        1      0    1

Solution 4:

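The pandas.get_dummies function converts a categorical variable into dummy/indicator variables in a single step. Applied directly to the wide frame, it encodes each Label column separately, so the per-column indicators still have to be combined per label.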
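One possible way to apply get_dummies to the wide frame directly is sketched below. It is not taken from the original answer: it assumes the sample frame reconstructed from the outputs above (Jane's row is a guess), drops the column prefix so that the same label from different columns gets the same name, and then merges the duplicate columns.

```python
import numpy as np
import pandas as pd

# Sample frame; Jane's row is an assumption inferred from the outputs above.
df = pd.DataFrame({
    "Owner":  ["Bob", "John", "Lee", "Jane"],
    "Label1": ["Dog", "Cat", "Dog", "Ferret"],
    "Label2": [np.nan, "Mouse", "Cat", "Hamster"],
    "Label3": [np.nan, np.nan, np.nan, "Rat"],
})

# Encode every label column at once. An empty prefix and separator mean that
# "Cat" from Label1 and "Cat" from Label2 both become a column named "Cat".
wide = pd.get_dummies(df.set_index("Owner"), prefix="", prefix_sep="", dtype=int)

# Merge the duplicate column names: transpose, group by label, take the max
# (so the result stays 0/1 even if a label repeats), then transpose back.
encoded = wide.T.groupby(level=0).max().T

print(encoded)
```

Taking the max rather than the sum is a design choice: it yields an indicator, whereas a sum would yield a count if an owner listed the same label in two columns.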
