Reshaping And Encoding Multi-column Categorical Variables To One Hot Encoding
I have some data which looks as follows:

Owner  Label1  Label2  Label3
Bob    Dog     N/A     N/A
John   Cat     Mouse   N/A
Lee    Dog     Cat     N/A
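To try the answers on this page, the frame can be rebuilt roughly as follows. This is only a sketch: the rows for Bob, John and Lee come from the question, while Jane's row is an assumption inferred from the outputs shown in the answers (the order of her labels across the Label columns is a guess). It also assumes the N/A cells are meant as missing values rather than the literal string.

```python
import numpy as np
import pandas as pd

# Bob, John and Lee are taken from the question.
# Jane's row is an ASSUMPTION inferred from the answers' output;
# which label sits in which column is a guess.
df = pd.DataFrame({
    "Owner":  ["Bob", "John", "Lee", "Jane"],
    "Label1": ["Dog", "Cat", "Dog", "Ferret"],
    "Label2": [np.nan, "Mouse", "Cat", "Hamster"],
    "Label3": [np.nan, np.nan, np.nan, "Rat"],
})

# If the data was loaded with the literal string "N/A", turn it into NaN
# first; otherwise "N/A" would be encoded as a category of its own.
df = df.replace("N/A", np.nan)

print(df)
```

With real missing values in place, stack(), melt() plus crosstab(), and get_dummies() all skip the empty cells, so no "N/A" column appears in the encoded result.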
Solution 1:
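Using stack and str.get_dummies (sum(level=0) was removed in pandas 2.0, so group on the index level instead):
df.set_index('Owner').stack().str.get_dummies().groupby(level=0, sort=False).sum()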
Out[535]:
       Cat  Dog  Ferret  Hamster  Mouse  Rat
Owner
Bob      0    1       0        0      0    0
John     1    0       0        0      1    0
Lee      1    1       0        0      0    0
Jane     0    0       1        1      0    1
Or
s=df.melt('Owner')
pd.crosstab(s.Owner,s.value)
Out[540]:
value Cat Dog Ferret Hamster Mouse Rat
Owner
Bob 0 1 0 0 0 0
Jane 0 0 1 1 0 1
John 1 0 0 0 1 0
Lee 1 1 0 0 0 0
Solution 2:
You could use get_dummies on the stacked dataset, then groupby and sum:
pd.get_dummies(df.set_index('Owner').stack()).groupby('Owner').sum()
       Cat  Dog  Ferret  Hamster  Mouse  Rat
Owner
Bob      0    1       0        0      0    0
John     1    0       0        0      1    0
Lee      1    1       0        0      0    0
Jane     0    0       1        1      0    1
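The stack, get_dummies, groupby chain can be easier to follow when split into its intermediate steps. The sketch below assumes the sample frame from the question, with Jane's row inferred from the output above (an assumption, not part of the original data). The variable names stacked, dummies and encoded are illustrative. The explicit dropna() and dtype=int are additions meant to keep the behaviour the same across pandas versions.

```python
import numpy as np
import pandas as pd

# Sample frame; Jane's row is an ASSUMPTION inferred from the shown output.
df = pd.DataFrame({
    "Owner":  ["Bob", "John", "Lee", "Jane"],
    "Label1": ["Dog", "Cat", "Dog", "Ferret"],
    "Label2": [np.nan, "Mouse", "Cat", "Hamster"],
    "Label3": [np.nan, np.nan, np.nan, "Rat"],
})

# 1. Long format: one entry per (Owner, LabelN) pair.
#    dropna() makes the removal of empty cells explicit.
stacked = df.set_index("Owner").stack().dropna()

# 2. One indicator column per distinct animal, still one row per pair.
dummies = pd.get_dummies(stacked, dtype=int)

# 3. Collapse back to one row per owner by summing the indicators.
#    groupby sorts the owners alphabetically by default.
encoded = dummies.groupby(level="Owner").sum()

print(encoded)
```

Because the last step sums, an owner who lists the same animal in two label columns would get a 2 for it. Replacing sum() with max() keeps the result strictly 0/1.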
Solution 3:
Using sklearn.preprocessing.MultiLabelBinarizer:
from sklearn.preprocessing import MultiLabelBinarizer
o, l = zip(*[[o, [*filter(pd.notna, l)]] for o, *l in zip(*map(df.get, df))])
mlb = MultiLabelBinarizer()
d = mlb.fit_transform(l)
pd.DataFrame(d, o, mlb.classes_)
      Cat  Dog  Ferret  Hamster  Mouse  Rat
Bob     0    1       0        0      0    0
John    1    0       0        0      1    0
Lee     1    1       0        0      0    0
Jane    0    0       1        1      0    1
Same-ish answer, selecting the label columns with filter(like='Label'):
o = df.Owner
l = [[x for x in l if pd.notna(x)] for l in df.filter(like='Label').values]
mlb = MultiLabelBinarizer()
d = mlb.fit_transform(l)
pd.DataFrame(d, o, mlb.classes_)
Cat Dog Ferret Hamster Mouse Rat
Owner
Bob 0 1 0 0 0 0
John 1 0 0 0 1 0
Lee 1 1 0 0 0 0
Jane 0 0 1 1 0 1
Solution 4:
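The pandas.get_dummies function converts categorical variables into dummy/indicator variables in a single step. Applied directly to this frame, it encodes each Label column separately, so columns for the same value coming from different Label columns still have to be combined.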
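One way this could look on the wide frame is sketched below. It assumes the sample data from the question, with Jane's row inferred from the other answers' output (an assumption). The names wide and encoded are illustrative. Passing prefix='' and prefix_sep='' drops the Label1_ style prefixes, which leaves duplicate column names that are then merged by grouping on the column label.

```python
import numpy as np
import pandas as pd

# Sample frame; Jane's row is an ASSUMPTION inferred from the other answers.
df = pd.DataFrame({
    "Owner":  ["Bob", "John", "Lee", "Jane"],
    "Label1": ["Dog", "Cat", "Dog", "Ferret"],
    "Label2": [np.nan, "Mouse", "Cat", "Hamster"],
    "Label3": [np.nan, np.nan, np.nan, "Rat"],
})

# Encode every Label column at once. With an empty prefix and separator,
# "Cat" from Label1 and "Cat" from Label2 become two columns both named "Cat".
wide = pd.get_dummies(df.set_index("Owner"), prefix="", prefix_sep="", dtype=int)

# Merge the duplicate columns: transpose, group by column label, sum,
# and transpose back to one row per owner.
encoded = wide.T.groupby(level=0).sum().T

print(encoded)
```

Here the owners keep their original row order, while the animal columns come out sorted alphabetically. As with the stacked variants, sum() counts repeated labels, so max() is the safer choice if strict 0/1 indicators are required.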