
How To Select First And Last Rows Of Each Unique Records In Pandas

How do I select the first and last rows of each unique record? I tried the code below, but I know it's not correct. First, it takes only one column, and the others are dropped. fo

Solution 1:

UPDATE: After understanding OP's question better, I think I've come up with a proper solution.

The initial table

+----------------+
|x         |y |z |
+----------------+
|111000004 |1 |1 |
|111000014 |5 |1 |
|111000014 |5 |2 |
|111001605 |2 |1 |
|111001605 |2 |2 |
|111003425 |1 |1 |
|111003425 |1 |2 |
|111003425 |1 |3 |
|111003748 |4 |1 |
|111003748 |4 |2 |
|111003748 |3 |4 |
|111003748 |2 |3 |
|111003748 |1 |1 |
+----------------+

OP mentioned that the data is a time series, so I grouped it by the time column ("x") and took the first and last row of each group. I then appended the two tables, sorted them by the index ("x"), and removed duplicates to clean up the output.

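import pandas as pd
g = df.groupby('x')
# DataFrame.append was removed in pandas 2.0, so use pd.concat; mergesort is stable, so each first row stays ahead of its last row
d = pd.concat([g.first(), g.last()]).sort_index(kind='mergesort').reset_index().drop_duplicates()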

The final result is in d as follows.

+----------------+
|x         |y |z |
+----------------+
|111000004 |1 |1 |
|111000014 |5 |1 |
|111000014 |5 |2 |
|111001605 |2 |1 |
|111001605 |2 |2 |
|111003425 |1 |1 |
|111003425 |1 |3 |
|111003748 |4 |1 |
|111003748 |1 |1 |
+----------------+
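
For anyone who wants to reproduce the tables end to end, here is a minimal, self-contained sketch. It rebuilds the initial table by hand and swaps in a slightly different technique from the answer's code: groupby head(1) and tail(1) instead of first() and last(). The reason for the swap is that first() and last() skip missing values column by column by default, so on data with NaN they may not return an actual row; head(1) and tail(1) return the original rows with their original index labels. The variable names both and result are made up for this illustration.

```python
import pandas as pd

# The initial table from the answer, typed in by hand
df = pd.DataFrame({
    "x": [111000004, 111000014, 111000014, 111001605, 111001605,
          111003425, 111003425, 111003425,
          111003748, 111003748, 111003748, 111003748, 111003748],
    "y": [1, 5, 5, 2, 2, 1, 1, 1, 4, 4, 3, 2, 1],
    "z": [1, 1, 2, 1, 2, 1, 2, 3, 1, 2, 4, 3, 1],
})

g = df.groupby("x")

# head(1) and tail(1) keep the original rows and their index labels
both = pd.concat([g.head(1), g.tail(1)])

# A group with a single row shows up twice under the same label;
# keep one copy, then restore the original row order
result = both[~both.index.duplicated()].sort_index()

print(result)
```

Because the deduplication here works on index labels rather than on row values, a group whose first and last rows happen to hold identical values would still keep both rows. That differs from the drop_duplicates() call in the answer, which compares values.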

To get all unique rows in a DataFrame, call drop_duplicates():

unique_df = df.drop_duplicates()

Then, to get the first and last rows of the whole deduplicated DataFrame, call head(1) and tail(1) on unique_df:

first = unique_df.head(1)
last = unique_df.tail(1)
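
Note that head(1) and tail(1) above return one row each for the entire frame, not one per value of "x". If the goal is the first and last row per "x", one possible variation is to pass subset and keep to drop_duplicates(). The sketch below uses a small made-up frame (not the data from the question), and the names first_per_x and last_per_x are hypothetical.

```python
import pandas as pd

# Small made-up frame; only the column names follow the question
df = pd.DataFrame({
    "x": [1, 1, 1, 2, 3, 3],
    "y": [5, 5, 4, 2, 1, 1],
    "z": [1, 2, 3, 1, 1, 2],
})

# One row per value of x: the first occurrence, then the last occurrence
first_per_x = df.drop_duplicates(subset="x", keep="first")
last_per_x = df.drop_duplicates(subset="x", keep="last")

# Whole-frame first and last rows, as in the snippet above
unique_df = df.drop_duplicates()
first = unique_df.head(1)
last = unique_df.tail(1)

print(first_per_x)
print(last_per_x)
```

Both results rely on the existing row order, so the frame should already be sorted the way "first" and "last" are meant to be read.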
