Creating A Column Based On The Presence Of Part Of A String In Multiple Other Columns

I have a dataframe called df that looks similar to this (except the number of 'mat_deliv' columns goes up to mat_deliv_8 and there are several hundred clients - I have simplified i

Solution 1:

You need to check each column individually. You can do this via apply, testing whether each string contains the target text with str.contains. Then call any on the result with axis=1, which gives True for every row where at least one column matched. Convert the boolean result to an integer via .astype(int), and then use assign to add it as a new column to the dataframe.

I used loc[:, target_cols] to specify my search range as all rows in the dataframe and all of the chosen target_cols.

target_cols = ['mat_deliv_1', 'mat_deliv_2', 'mat_deliv_3', 'mat_deliv_4']
df = (df
      .assign(xxx_deliv=df.loc[:, target_cols]
              # na=False counts missing values (None/NaN) as "no match"
              .apply(lambda col: col.str.contains('xxx', na=False))
              .any(axis=1)
              .astype(int)))
>>> df
  Client_ID  mat_deliv_1  mat_deliv_2 mat_deliv_3 mat_deliv_4  xxx_deliv
0  C1019876  xxx,yyy,zzz  aaa,bbb,xxx         ccc         ddd          1
1  C1018765      yyy,zzz          xxx         bbb        None          1
2  C1017654      yyy,xxx      aaa,bbb         ccc         ddd          1
3  C1016543      aaa,bbb          ccc        None        None          0
4  C1019876          yyy         None        None        None          0
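
Since the real dataframe has columns up to mat_deliv_8, listing them by hand gets tedious. The following is a minimal, self-contained sketch of the same idea that selects the columns by prefix instead. The sample data is rebuilt from the output shown for this solution, and the prefix-based selection, the has_xxx name, and the regex=False argument are my additions rather than part of the original answer.

```python
import pandas as pd

# Sample data rebuilt from the output table; None marks a missing delivery.
df = pd.DataFrame({
    'Client_ID': ['C1019876', 'C1018765', 'C1017654', 'C1016543', 'C1019876'],
    'mat_deliv_1': ['xxx,yyy,zzz', 'yyy,zzz', 'yyy,xxx', 'aaa,bbb', 'yyy'],
    'mat_deliv_2': ['aaa,bbb,xxx', 'xxx', 'aaa,bbb', 'ccc', None],
    'mat_deliv_3': ['ccc', 'bbb', 'ccc', None, None],
    'mat_deliv_4': ['ddd', None, 'ddd', None, None],
})

# Assumption: every column to search starts with 'mat_deliv_',
# so mat_deliv_5 ... mat_deliv_8 would be picked up automatically.
target_cols = [c for c in df.columns if c.startswith('mat_deliv_')]

# One boolean column per searched column: True where the cell contains 'xxx'.
# regex=False does a plain substring match; na=False maps missing cells to False.
has_xxx = df[target_cols].apply(
    lambda col: col.str.contains('xxx', regex=False, na=False)
)

# A row gets 1 if any of its searched cells matched, otherwise 0.
df['xxx_deliv'] = has_xxx.any(axis=1).astype(int)

print(df['xxx_deliv'].tolist())
```

Note that this is a substring test, so a value such as 'xxxy' would also count as a match.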

Solution 2:

You could use apply:

def contains(xs, pat='xxx'):
    # Only test real strings; missing values (None/NaN) count as "no match"
    return int(any(isinstance(x, str) and pat in x for x in xs.values))


df['xxx_deliv'] = df[['mat_deliv_1', 'mat_deliv_2', 'mat_deliv_3', 'mat_deliv_4']].apply(contains, axis=1)
print(df)

Output

  Client_ID  mat_deliv_1    ...    mat_deliv_4 xxx_deliv
0  C1019876  xxx,yyy,zzz    ...            ddd         1
1  C1018765      yyy,zzz    ...           None         1
2  C1017654      yyy,xxx    ...            ddd         1
3  C1016543      aaa,bbb    ...           None         0
4  C1019876          yyy    ...           None         0

[5 rows x 6 columns]
