How To Flag Last Duplicate Element In A Pandas Dataframe

February 25, 2024

As you know, the .duplicated method finds duplicates in a column, but what I need is to flag the last duplicated element, given that my data is ordered by Date. Here is the expec

Solution 1:

Use Series.duplicated, or DataFrame.duplicated with the column given in subset, together with the parameter keep='last'. This marks every occurrence except the last one as True, so invert the mask and convert it to integer to map True/False to 1/0, or use numpy.where:

df['Last_dup1'] = (~df['Policy_id'].duplicated(keep='last')).astype(int)
df['Last_dup1'] = np.where(df['Policy_id'].duplicated(keep='last'), 0, 1)

Or:

df['Last_dup1'] = (~df.duplicated(subset=['Policy_id'], keep='last')).astype(int)
df['Last_dup1'] = np.where(df.duplicated(subset=['Policy_id'], keep='last'), 0, 1)

print(df)
   Id Policy_id  Start_Date  Last_dup  Last_dup1
0   0      b123  2019/02/24         0          0
1   1      b123  2019/03/24         0          0
2   2      b123  2019/04/24         1          1
3   3      c123  2018/09/01         0          0
4   4      c123  2018/10/01         1          1
5   5      d123  2017/02/24         0          0
6   6      d123  2017/03/24         1          1
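For anyone who wants to try this end to end, here is a minimal, self-contained sketch. The sample frame is rebuilt from the printed output above, and the intermediate name is_earlier is only illustrative, not part of the original answer. It assumes the rows are already in date order within each Policy_id, as the question states.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the printed output (assumed to match the asker's frame)
df = pd.DataFrame({
    "Id": [0, 1, 2, 3, 4, 5, 6],
    "Policy_id": ["b123", "b123", "b123", "c123", "c123", "d123", "d123"],
    "Start_Date": ["2019/02/24", "2019/03/24", "2019/04/24",
                   "2018/09/01", "2018/10/01", "2017/02/24", "2017/03/24"],
})

# keep='last' leaves the last occurrence unmarked (False) and marks
# every earlier occurrence of the same Policy_id as True
is_earlier = df["Policy_id"].duplicated(keep="last")

# Invert the mask so the last occurrence becomes 1 and the others 0
df["Last_dup1"] = (~is_earlier).astype(int)

# Equivalent result with numpy.where
via_where = np.where(is_earlier, 0, 1)

print(df)
```

Note that a Policy_id appearing only once is never marked by duplicated, so it would also get 1 here. If that is not wanted, an extra check on group size would be needed.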

Solution 2:

This can also be done without Series.duplicated, as follows:


dictionary = df[['Id','Policy_id']].set_index('Policy_id').to_dict()['Id']
# Here the dictionary values contain the most recent Id for each Policy_id
df['Last_dup'] = df.Id.apply(lambda x: 1 if x in list(dictionary.values()) else 0)
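The dictionary approach works because a later row with the same key overwrites the earlier one, so each Policy_id ends up mapped to its last Id. Below is a rough sketch of the same idea using a plain dict and Series.isin instead of apply. The sample data and the name last_id_by_policy are assumptions for illustration, and the approach relies on Id being unique across rows.

```python
import pandas as pd

# Sample data (assumed; mirrors the frame used in Solution 1)
df = pd.DataFrame({
    "Id": [0, 1, 2, 3, 4, 5, 6],
    "Policy_id": ["b123", "b123", "b123", "c123", "c123", "d123", "d123"],
})

# Building a dict row by row: later rows overwrite earlier ones,
# so each Policy_id maps to the Id of its last row
last_id_by_policy = dict(zip(df["Policy_id"], df["Id"]))

# Flag rows whose Id is one of those "last" Ids (requires unique Id values)
df["Last_dup"] = df["Id"].isin(list(last_id_by_policy.values())).astype(int)

print(df)
```

Computing the list of last Ids once and passing it to isin avoids rebuilding the list for every row, which the apply version does.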
