Creating A Column Based On The Presence Of Part Of A String In Multiple Other Columns
I have a dataframe called df that looks similar to this (except the number of 'mat_deliv' columns goes up to mat_deliv_8 and there are several hundred clients - I have simplified i
Solution 1:
Check each target column individually. Use apply to test whether the strings in each column contain the target text, then call any with axis=1 to collapse the result to a single boolean per row. Convert that boolean to an integer with .astype(int), and use assign to add it to the dataframe as a new column.
I used loc[:, target_cols] to restrict the search to all rows of the chosen target_cols.
target_cols = ['mat_deliv_1', 'mat_deliv_2', 'mat_deliv_3', 'mat_deliv_4']
# na=False makes missing cells (None/NaN) count as "no match"
df = (df
      .assign(xxx_deliv=df.loc[:, target_cols]
              .apply(lambda col: col.str.contains('xxx', na=False))
              .any(axis=1)
              .astype(int)))
>>> df
   Client_ID  mat_deliv_1  mat_deliv_2  mat_deliv_3  mat_deliv_4  xxx_deliv
0   C1019876  xxx,yyy,zzz  aaa,bbb,xxx          ccc          ddd          1
1   C1018765      yyy,zzz          xxx          bbb         None          1
2   C1017654      yyy,xxx      aaa,bbb          ccc          ddd          1
3   C1016543      aaa,bbb          ccc         None         None          0
4   C1019876          yyy         None         None         None          0
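For a version that runs end to end, the sketch below rebuilds the five sample rows shown above and applies the same column-wise check. The DataFrame construction is an assumption based on that printed output (the real data reportedly has more columns and several hundred clients), and regex=False and na=False are additions not present in the original answer: the first treats 'xxx' as a literal substring, the second counts missing cells as non-matches.

```python
import pandas as pd

# Assumed sample data, reconstructed from the printed output above.
df = pd.DataFrame({
    "Client_ID": ["C1019876", "C1018765", "C1017654", "C1016543", "C1019876"],
    "mat_deliv_1": ["xxx,yyy,zzz", "yyy,zzz", "yyy,xxx", "aaa,bbb", "yyy"],
    "mat_deliv_2": ["aaa,bbb,xxx", "xxx", "aaa,bbb", "ccc", None],
    "mat_deliv_3": ["ccc", "bbb", "ccc", None, None],
    "mat_deliv_4": ["ddd", None, "ddd", None, None],
})

target_cols = ["mat_deliv_1", "mat_deliv_2", "mat_deliv_3", "mat_deliv_4"]

# One boolean per cell: does the cell's string contain the literal text 'xxx'?
mask = df[target_cols].apply(
    lambda col: col.str.contains("xxx", regex=False, na=False)
)

# Collapse to one boolean per row, cast to 0/1, and attach as a new column.
result = df.assign(xxx_deliv=mask.any(axis=1).astype(int))
print(result)
```

Keeping the boolean mask in its own variable makes it easy to inspect which cells matched before the row-wise any collapses them.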
Solution 2:
You could use apply:
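def contains(xs, pat='xxx'):
    # Skip non-string cells (e.g. None) so the `in` test cannot raise a TypeError
    return int(any(pat in x for x in xs.values if isinstance(x, str)))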
df['xxx_deliv'] = df[['mat_deliv_1', 'mat_deliv_2', 'mat_deliv_3', 'mat_deliv_4']].apply(contains, axis=1)
print(df)
Output
   Client_ID  mat_deliv_1  ...  mat_deliv_4  xxx_deliv
0   C1019876  xxx,yyy,zzz  ...          ddd          1
1   C1018765      yyy,zzz  ...         None          1
2   C1017654      yyy,xxx  ...          ddd          1
3   C1016543      aaa,bbb  ...         None          0
4   C1019876          yyy  ...         None          0

[5 rows x 6 columns]
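Since the question mentions columns running up to mat_deliv_8, listing them by hand gets tedious. One possible sketch selects them by name pattern with DataFrame.filter and wraps the check in a reusable function. The helper name add_presence_flag, its parameters, and the small sample frame (including the client IDs and the column named other) are all hypothetical and only serve as illustration.

```python
import re
import pandas as pd

def add_presence_flag(df, pat, new_col, prefix="mat_deliv_"):
    """Return a copy of df where new_col is 1 if any prefixed column contains pat.

    Hypothetical helper; not part of pandas or of the answers above.
    """
    # Pick every column named <prefix><digits>, e.g. mat_deliv_1 ... mat_deliv_8.
    cols = df.filter(regex=rf"^{re.escape(prefix)}\d+$").columns
    # Literal substring test per cell; missing values count as "no match".
    mask = df[cols].apply(
        lambda col: col.str.contains(pat, regex=False, na=False)
    )
    return df.assign(**{new_col: mask.any(axis=1).astype(int)})

# Made-up sample data for illustration only.
df = pd.DataFrame({
    "Client_ID": ["C1", "C2", "C3"],
    "mat_deliv_1": ["xxx,yyy", "aaa", "bbb"],
    "mat_deliv_2": ["ccc", None, "yyy,zzz"],
    "mat_deliv_3": [None, "ddd,xxx", None],
    "other": ["xxx", "xxx", "xxx"],  # ignored: the name does not match the prefix
})

out = add_presence_flag(df, "xxx", "xxx_deliv")
out = add_presence_flag(out, "yyy", "yyy_deliv")
print(out)
```

Calling the helper once per target string yields one flag column each. Note that the .str accessor only works on string-like columns, so a delivery column that pandas has stored with a numeric dtype (for example, one that is entirely NaN) may need casting to strings first.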