How To Select First And Last Rows Of Each Unique Record In Pandas
How do I select the first and last rows of all the unique records? I tried the code below, but I know it's not correct. First, it takes only one column, and the others are missed. fo
Solution 1:
UPDATE: After understanding the OP's question better, I think I've come up with the proper solution.
The initial table
+----------------+
|x |y |z |
+----------------+
|111000004 |1 |1 |
|111000014 |5 |1 |
|111000014 |5 |2 |
|111001605 |2 |1 |
|111001605 |2 |2 |
|111003425 |1 |1 |
|111003425 |1 |2 |
|111003425 |1 |3 |
|111003748 |4 |1 |
|111003748 |4 |2 |
|111003748 |3 |4 |
|111003748 |2 |3 |
|111003748 |1 |1 |
+----------------+
The OP mentioned that this is time-series data, so I grouped the data by the time column ("x") and took the first and last row of each group. I then combined the two tables, sorted them by the index ("x"), and removed duplicates to clean up the output.
import pandas as pd
g = df.groupby('x')
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
d = pd.concat([g.first(), g.last()]).sort_index().reset_index().drop_duplicates()
The final result is in d, as follows.
+----------------+
|x |y |z |
+----------------+
|111000004 |1 |1 |
|111000014 |5 |1 |
|111000014 |5 |2 |
|111001605 |2 |1 |
|111001605 |2 |2 |
|111003425 |1 |1 |
|111003425 |1 |3 |
|111003748 |4 |1 |
|111003748 |1 |1 |
+----------------+
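For anyone who wants to reproduce the result, here is a minimal self-contained sketch built from the sample table above. It uses pd.concat because DataFrame.append was removed in pandas 2.0. The kind="stable" argument and the positional variant (d_pos, using head(1) and tail(1)) are my own additions rather than part of the original answer. One thing to be aware of: first() and last() return the first and last non-null value per column, so on data with missing values the positional variant may be the safer choice.

```python
import pandas as pd

# Sample data copied from the initial table above.
df = pd.DataFrame({
    "x": [111000004, 111000014, 111000014, 111001605, 111001605,
          111003425, 111003425, 111003425,
          111003748, 111003748, 111003748, 111003748, 111003748],
    "y": [1, 5, 5, 2, 2, 1, 1, 1, 4, 4, 3, 2, 1],
    "z": [1, 1, 2, 1, 2, 1, 2, 3, 1, 2, 4, 3, 1],
})

g = df.groupby("x")

# Stack the per-group first and last rows, then order them by "x".
# kind="stable" keeps each group's first row ahead of its last row.
d = (
    pd.concat([g.first(), g.last()])
    .sort_index(kind="stable")
    .reset_index()
    .drop_duplicates()
    .reset_index(drop=True)
)

# Positional variant (an addition, not from the original answer):
# head(1)/tail(1) keep the original row labels, so a one-row group shows up
# twice under the same label and can be dropped by checking the index.
ends = pd.concat([g.head(1), g.tail(1)])
d_pos = ends[~ends.index.duplicated()].sort_index().reset_index(drop=True)

print(d)
```

On this sample both d and d_pos contain the same nine rows as the result table above.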
Solution 2:
To get all unique rows in a DataFrame, you can call drop_duplicates():
unique_df = df.drop_duplicates()
Then, to get the first and last row, you can call head() and tail() on unique_df:
first = unique_df.head(1)
last = unique_df.tail(1)
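As a quick check of what this returns, here is a small runnable sketch. The four-row frame is hypothetical data made up for illustration. Note that head(1) and tail(1) act on the whole de-duplicated frame, so this gives one first row and one last row overall, not one pair per value of "x".

```python
import pandas as pd

# Hypothetical data for illustration: the second row repeats the first.
df = pd.DataFrame({
    "x": [111000014, 111000014, 111000014, 111001605],
    "y": [5, 5, 5, 2],
    "z": [1, 1, 2, 1],
})

unique_df = df.drop_duplicates()  # drops the repeated (111000014, 5, 1) row
first = unique_df.head(1)         # first row of the whole frame
last = unique_df.tail(1)          # last row of the whole frame

print(first)
print(last)
```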