Transforming A Column Into Multiple Columns According To Their Values
In Python, is there a way to transform a one-column DataFrame so that each distinct value in the column becomes its own indicator column?
Solution 1:
Source DF:
In [204]: df
Out[204]:
     Country
0      Italy
1  Indonesia
2     Canada
3      Italy
We can use pd.get_dummies():
In [205]: pd.get_dummies(df.Country)
Out[205]:
   Canada  Indonesia  Italy
0       0          0      1
1       0          1      0
2       1          0      0
3       0          0      1
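For anyone who wants to reproduce this outside an interactive session, here is a minimal self-contained sketch. It assumes the source frame has a single Country column as shown above. The dtype=int argument is an addition that is not in the original answer; it is there because recent pandas versions return boolean indicator columns by default.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({"Country": ["Italy", "Indonesia", "Canada", "Italy"]})

# One indicator column per distinct value, in sorted order.
# dtype=int (an assumption for this sketch) keeps 0/1 output on
# pandas versions whose default indicator dtype is bool.
dummies = pd.get_dummies(df["Country"], dtype=int)
print(dummies)
```

To keep the result next to other columns of a wider frame, df.join(dummies) should work, since the index is preserved.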
Or use sklearn.feature_extraction.text.CountVectorizer (note that it lowercases the values, so the column names come out in lowercase):
In [211]: from sklearn.feature_extraction.text import CountVectorizer
In [212]: cv = CountVectorizer()
In [213]: r = pd.DataFrame.sparse.from_spmatrix(cv.fit_transform(df.Country),
                                                columns=cv.get_feature_names_out(),
                                                index=df.index)
In [214]: r
Out[214]:
   canada  indonesia  italy
0       0          0      1
1       0          1      0
2       1          0      0
3       0          0      1
Solution 2:
A couple of additional options:
pd.Series.str.get_dummies
df.Country.str.get_dummies()
   Canada  Indonesia  Italy
0       0          0      1
1       0          1      0
2       1          0      0
3       0          0      1
pd.DataFrame.groupby with value_counts
df.groupby(level=0).Country.value_counts().unstack(fill_value=0)
Country  Canada  Indonesia  Italy
0             0          0      1
1             0          1      0
2             1          0      0
3             0          0      1
pd.factorize + np.bincount
f, u = pd.factorize(df.Country.values)
pd.DataFrame(
    np.bincount(
        f + np.arange(f.size) * u.size, minlength=u.size * f.size
    ).reshape(f.size, u.size),
    df.index, u
)
   Italy  Indonesia  Canada
0      1          0       0
1      0          1       0
2      0          0       1
3      1          0       0
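The bincount one-liner is dense, so here is the same idea unrolled step by step as a sketch. The intermediate names (n_rows, n_cols, flat, counts, onehot) are introduced here for illustration and are not part of the original answer. The idea is that each row's code is shifted by row_number * number_of_columns, which gives the position of that row's 1 in a flattened array.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Country": ["Italy", "Indonesia", "Canada", "Italy"]})

# f: one integer code per row; u: unique values in order of first appearance.
f, u = pd.factorize(df["Country"].values)

n_rows, n_cols = f.size, u.size

# Flat position of each row's "1" in a row-major (n_rows x n_cols) array:
# row i with code c lands at i * n_cols + c.
flat = f + np.arange(n_rows) * n_cols

# Count hits per flat position; minlength pads trailing zeros so the
# result always has n_rows * n_cols entries and can be reshaped.
counts = np.bincount(flat, minlength=n_rows * n_cols)

onehot = pd.DataFrame(counts.reshape(n_rows, n_cols), index=df.index, columns=u)
print(onehot)
```

Because every flat position is hit at most once, the counts are all 0 or 1, which is why a counting function can stand in for one-hot encoding here.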
pd.factorize + np.eye
f, u = pd.factorize(df.Country.values)
pd.DataFrame(np.eye(u.size, dtype=int)[f], df.index, u)
   Italy  Indonesia  Canada
0      1          0       0
1      0          1       0
2      0          0       1
3      1          0       0
pd.factorize + array slice assignment
f, u = pd.factorize(df.Country.values)
a = np.zeros((f.size, u.size), dtype=int)
a[np.arange(f.size), f] = 1
pd.DataFrame(a, df.index, u)
   Italy  Indonesia  Canada
0      1          0       0
1      0          1       0
2      0          0       1
3      1          0       0
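The pd.factorize variants order the columns by first appearance (Italy, Indonesia, Canada), while the get_dummies variants sort them. If the sorted order is wanted, one possible sketch is to pass sort=True to pd.factorize. The comparison against pd.get_dummies with dtype=int is an addition for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Country": ["Italy", "Indonesia", "Canada", "Italy"]})

# sort=True returns the unique values in sorted order and
# assigns the integer codes to match that order.
f, u = pd.factorize(df["Country"].values, sort=True)

# Row f[i] of the identity matrix is the one-hot vector for row i.
onehot = pd.DataFrame(np.eye(u.size, dtype=int)[f], index=df.index, columns=u)

# Compare values and column labels with the get_dummies result.
expected = pd.get_dummies(df["Country"], dtype=int)
same_values = (onehot.to_numpy() == expected.to_numpy()).all()
same_columns = list(onehot.columns) == list(expected.columns)
print(same_values, same_columns)
```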