
Generate New Columns As A Combination Of Other Columns

I have a DataFrame that has several components of an identifier in the columns and a value associated with the identifier in another column. I want to be able to create n columns s

Solution 1:

Starting with your example data:

In [3]: df
Out[3]: 
      foo  bar Type  ID  Index     Value
25090   x    9    A   0      0  23272000
25090   x    5    A   0      0  23272000
25091   x    3    A   1      0  22896000
25092   x    3    B   0      1  20048000
25093   y    6    A   0      0  19760000
25092   y    4    B   0      1  20823342

Build each row's identifier by joining its Type, ID, and Index values with underscores, applied row-wise.

In [4]: identifier = df[['Type', 'ID', 'Index']].apply(
             lambda x: '_'.join(map(str, x)), axis=1)
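
For anyone who wants to run this step outside the session, here is a minimal sketch. The DataFrame is rebuilt by hand from the printed example above, so treat the constructor call as an assumption about how the data was created. It also shows a vectorized concatenation as a possible alternative to the row-wise apply; the name identifier_vec is only illustrative.

```python
import pandas as pd

# Sample data re-typed from the printed frame above (an assumption about
# how the original frame was built, including the repeated index labels).
df = pd.DataFrame(
    {
        "foo": ["x", "x", "x", "x", "y", "y"],
        "bar": [9, 5, 3, 3, 6, 4],
        "Type": ["A", "A", "A", "B", "A", "B"],
        "ID": [0, 0, 1, 0, 0, 0],
        "Index": [0, 0, 0, 1, 0, 1],
        "Value": [23272000, 23272000, 22896000, 20048000, 19760000, 20823342],
    },
    index=[25090, 25090, 25091, 25092, 25093, 25092],
)

# Row-wise join, as in the answer: every cell is converted to str first.
identifier = df[["Type", "ID", "Index"]].apply(
    lambda row: "_".join(map(str, row)), axis=1
)

# Vectorized alternative: cast the numeric parts to str and concatenate.
identifier_vec = (
    df["Type"] + "_" + df["ID"].astype(str) + "_" + df["Index"].astype(str)
)

print(identifier.tolist())
```

Both approaches give the same strings here; the vectorized form avoids calling a Python function once per row, which may matter on large frames.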

Make a Series from your Value column, and index it by foo and the identifier.

In [5]: v = df['Value'].copy()

In [6]: v.index = pd.MultiIndex.from_arrays([df['foo'], identifier])

In [7]: v
Out[7]: 
foo       
x    A_0_0    23272000
     A_0_0    23272000
     A_1_0    22896000
     B_0_1    20048000
y    A_0_0    19760000
     B_0_1    20823342
Name: Value, dtype: int64

Unstack it, and join it to the original DataFrame on 'foo'.

In [8]: df[['foo', 'bar']].join(v.drop_duplicates().unstack(), on='foo')
Out[8]: 
      foo  bar     A_0_0     A_1_0     B_0_1
25090   x    9  23272000  22896000  20048000
25090   x    5  23272000  22896000  20048000
25091   x    3  23272000  22896000  20048000
25092   x    3  23272000  22896000  20048000
25093   y    6  19760000       NaN  20823342
25092   y    4  19760000       NaN  20823342
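
The whole sequence can be sketched as one script. Again, the DataFrame is re-typed from the printed example (an assumption), and the names wide and result are just illustrative. The Series is copied before its index is replaced so that the original frame is left alone.

```python
import pandas as pd

# Sample data re-typed from the printed frame above (an assumption).
df = pd.DataFrame(
    {
        "foo": ["x", "x", "x", "x", "y", "y"],
        "bar": [9, 5, 3, 3, 6, 4],
        "Type": ["A", "A", "A", "B", "A", "B"],
        "ID": [0, 0, 1, 0, 0, 0],
        "Index": [0, 0, 0, 1, 0, 1],
        "Value": [23272000, 23272000, 22896000, 20048000, 19760000, 20823342],
    },
    index=[25090, 25090, 25091, 25092, 25093, 25092],
)

# Step 1: one identifier string per row.
identifier = df[["Type", "ID", "Index"]].apply(
    lambda row: "_".join(map(str, row)), axis=1
)

# Step 2: index the values by (foo, identifier); copy so df is not affected.
v = df["Value"].copy()
v.index = pd.MultiIndex.from_arrays([df["foo"], identifier])

# Step 3: drop repeated values, pivot the identifier level into columns,
# then attach the wide table to each original row by its foo value.
wide = v.drop_duplicates().unstack()
result = df[["foo", "bar"]].join(wide, on="foo")

print(result)
```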

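Notice that I dropped the duplicates in v before unstacking it. This is essential, because unstack raises a ValueError when the index contains duplicate entries. If you have different values for the same identifier (within the same foo) anywhere in your dataset, drop_duplicates will not remove them and you will run into trouble.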
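
To make that failure concrete, here is a small sketch with made-up data (the values 1, 2, 3 and the level name "identifier" are hypothetical) in which one (foo, identifier) pair carries two different values. It then shows one possible way out, keeping the first value per pair; which value to keep is a policy choice that depends on your data, not something the approach above prescribes.

```python
import pandas as pd

# Hypothetical data: ("x", "A_0_0") appears twice with different values.
idx = pd.MultiIndex.from_tuples(
    [("x", "A_0_0"), ("x", "A_0_0"), ("y", "A_0_0")],
    names=["foo", "identifier"],
)
v = pd.Series([1, 2, 3], index=idx, name="Value")

try:
    # drop_duplicates removes nothing here because all values differ,
    # so the duplicated index entry survives and unstack cannot reshape.
    v.drop_duplicates().unstack()
    conflict_detected = False
except ValueError:
    conflict_detected = True

# One possible policy (an assumption): keep the first value for each pair.
first_per_pair = v.groupby(level=["foo", "identifier"]).first().unstack()

print(conflict_detected)
print(first_per_pair)
```

Note also that Series.drop_duplicates compares values only, not index labels, so two different identifiers that happen to share a value would lose one of them. Deduplicating on the index levels, as the groupby does here, avoids that.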

Minor points: Your example output has a row (25094) that is missing from your example input. Also, the NaNs in my output make sense: no value is specified for A_1_0 when foo='y'.

