Transforming A Column Into Multiple Columns According To Their Values

In Python, I am wondering if there is a way to transform a one-column dataframe so that each distinct value in the column becomes its own 0/1 indicator column.

Solution 1:

Source DF:

In [204]: df
Out[204]:
     Country
0      Italy
1  Indonesia
2     Canada
3      Italy

We can use pd.get_dummies():

In [205]: pd.get_dummies(df.Country)
Out[205]:
   Canada  Indonesia  Italy
0       0          0      1
1       0          1      0
2       1          0      0
3       0          0      1
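For reference, here is a self-contained sketch of the same call, with the sample frame rebuilt from the four rows of the source DF. Recent pandas versions return boolean indicator columns by default, so `dtype=int` is passed to reproduce the 0/1 output; that argument is an addition here, not part of the original answer.

```python
import pandas as pd

# Sample data reconstructed from the source frame in the answer.
df = pd.DataFrame({"Country": ["Italy", "Indonesia", "Canada", "Italy"]})

# One indicator column per distinct value; columns come out sorted.
# dtype=int forces 0/1 integers (an assumption about the desired output;
# newer pandas would otherwise give True/False).
dummies = pd.get_dummies(df["Country"], dtype=int)
print(dummies)
```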

Or sklearn.feature_extraction.text.CountVectorizer:

In [211]: from sklearn.feature_extraction.text import CountVectorizer

In [212]: cv = CountVectorizer()

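In [213]: # current pandas/scikit-learn; older releases used pd.SparseDataFrame and cv.get_feature_names()
     ...: r = pd.DataFrame.sparse.from_spmatrix(cv.fit_transform(df.Country),
     ...:                                       columns=cv.get_feature_names_out(),
     ...:                                       index=df.index)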

In [214]: r
Out[214]:
   canada  indonesia  italy
0       0          0      1
1       0          1      0
2       1          0      0
3       0          0      1

Solution 2:

A couple of additional options:

pd.Series.str.get_dummies

df.Country.str.get_dummies()

   Canada  Indonesia  Italy
0       0          0      1
1       0          1      0
2       1          0      0
3       0          0      1

pd.DataFrame.groupby with value_counts

df.groupby(level=0).Country.value_counts().unstack(fill_value=0)

Country  Canada  Indonesia  Italy
0             0          0      1
1             0          1      0
2             1          0      0
3             0          0      1

pd.factorize + np.bincount

f, u = pd.factorize(df.Country.values)

pd.DataFrame(
    np.bincount(
        f + np.arange(f.size) * u.size, minlength=u.size * f.size
    ).reshape(f.size, u.size),
    df.index, u
)

   Italy  Indonesia  Canada
0      1          0       0
1      0          1       0
2      0          0       1
3      1          0       0
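The bincount approach works by giving every (row, category) cell a unique flat index, `row * n_categories + code`, and then counting how often each flat index occurs. A sketch on the same sample data that spells out the intermediate steps; the variable names are chosen here for illustration and are not from the original answer.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the source frame in the answer.
df = pd.DataFrame({"Country": ["Italy", "Indonesia", "Canada", "Italy"]})

# codes: integer label per row; uniques: labels in order of first appearance.
codes, uniques = pd.factorize(df["Country"].values)
n_rows, n_cats = codes.size, uniques.size

# Flat position of the single "1" in each row of an (n_rows x n_cats) grid.
flat = codes + np.arange(n_rows) * n_cats

# Count hits per flat position, padding to the full grid size, then reshape.
counts = np.bincount(flat, minlength=n_rows * n_cats)
onehot = pd.DataFrame(counts.reshape(n_rows, n_cats), index=df.index, columns=uniques)
print(onehot)
```

The `minlength` argument matters: without it, bincount would stop at the largest flat index seen, and the reshape could fail when the last row does not use the last category.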

pd.factorize + np.eye

f, u = pd.factorize(df.Country.values)
pd.DataFrame(np.eye(u.size, dtype=int)[f], df.index, u)

   Italy  Indonesia  Canada
0      1          0       0
1      0          1       0
2      0          0       1
3      1          0       0

pd.factorize + array slice assignment

f, u = pd.factorize(df.Country.values)
a = np.zeros((f.size, u.size), dtype=int)
a[np.arange(f.size), f] = 1
pd.DataFrame(a, df.index, u)

   Italy  Indonesia  Canada
0      1          0       0
1      0          1       0
2      0          0       1
3      1          0       0
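The factorize-based variants list columns in order of first appearance (Italy, Indonesia, Canada), whereas get_dummies sorts them. If a consistent column order matters, one possible way to align the two is to sort the columns afterwards. This is a sketch on the sample data, not part of the original answers.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the source frame in the answer.
df = pd.DataFrame({"Country": ["Italy", "Indonesia", "Canada", "Italy"]})

# One-hot via factorize + identity-matrix row lookup (first-appearance order).
codes, uniques = pd.factorize(df["Country"].values)
onehot = pd.DataFrame(np.eye(uniques.size, dtype=int)[codes], df.index, uniques)

# Sort columns alphabetically so the layout matches pd.get_dummies.
aligned = onehot.sort_index(axis=1)

# Cross-check against get_dummies (dtype=int to compare 0/1 integers).
expected = pd.get_dummies(df["Country"], dtype=int)
same = bool((aligned.values == expected.values).all())
print(aligned)
print(same)
```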
