Pandas Groupby And Then Select One Row
I have a pandas DataFrame that I need to group by some columns. Most groups contain only one row, but a few contain more than one. For each of these, I only want to ke
Solution 1:
Sort by date and then just grab the first row.
df.sort_values('date').groupby(['id', 'period', 'type']).first()
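A minimal runnable sketch of this approach, assuming a small made-up frame with the same column names (the data values and the names keys, earliest_flat and whole_rows are illustrative only). One thing to be aware of: first() returns the first non-null value of each column, so if other columns contain NaN the result can mix values from different rows. head(1) is shown as an alternative that keeps whole rows.

```python
import pandas as pd

# Illustrative data only; column names follow the thread.
df = pd.DataFrame(
    [
        ["a", "q", "y", "2011-03-31"],
        ["a", "q", "y", "2011-05-31"],
        ["b", "q", "x", "2011-12-31"],
        ["b", "q", "x", "2011-01-31"],
    ],
    columns=["id", "period", "type", "date"],
)
df["date"] = pd.to_datetime(df["date"])

keys = ["id", "period", "type"]

# Sort so the earliest date comes first, then take the first entry per group.
# The group keys end up in the index; reset_index() turns them back into columns.
earliest_flat = df.sort_values("date").groupby(keys).first().reset_index()

# Alternative: head(1) keeps the original rows (and their index labels) intact.
whole_rows = df.sort_values("date").groupby(keys).head(1)

print(earliest_flat)
print(whole_rows)
```

If the frame has only the grouping columns plus date, both variants pick the same rows; they differ only in index layout and row order.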
Solution 2:
You could also use nsmallest():
df.groupby(['id', 'period', 'type']).apply(lambda g: g.nsmallest(1, "date"))
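Here is a hedged sketch of the same idea on made-up data. It selects the date column explicitly before calling apply, which is my own variation rather than part of the answer: it keeps the grouping columns out of the lambda, so the result looks the same across pandas versions. The extra index level that apply adds (the original row label) is dropped afterwards.

```python
import pandas as pd

# Illustrative data only; column names follow the thread.
df = pd.DataFrame(
    [
        ["a", "q", "y", "2011-03-31"],
        ["a", "q", "y", "2011-05-31"],
        ["b", "q", "x", "2011-12-31"],
        ["b", "q", "x", "2011-01-31"],
    ],
    columns=["id", "period", "type", "date"],
)
df["date"] = pd.to_datetime(df["date"])

keys = ["id", "period", "type"]

# For each group, keep the single row with the smallest date.
per_group = df.groupby(keys)[["date"]].apply(lambda g: g.nsmallest(1, "date"))

# per_group is indexed by (id, period, type, original row label).
# Drop the row label, then move the keys back into columns.
earliest = per_group.reset_index(level=-1, drop=True).reset_index()

print(earliest)
```

Because apply calls a Python function once per group, this is likely to be slower than the sort-based approach on frames with many groups.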
Solution 3:
Filter df with the index of the minimum date. idxmin gets you that index; then pass it to loc.
df.loc[df.groupby(['id', 'period', 'type']).date.idxmin()]
Consider df:
df = pd.DataFrame([
['a', 'q', 'y', '2011-03-31'],
['a', 'q', 'y', '2011-05-31'],
['a', 'q', 'y', '2011-07-31'],
['b', 'q', 'x', '2011-12-31'],
['b', 'q', 'x', '2011-01-31'],
['b', 'q', 'x', '2011-08-31'],
], columns=['id', 'period', 'type', 'date'])
df.date = pd.to_datetime(df.date)
df
  id period type       date
0  a      q    y 2011-03-31
1  a      q    y 2011-05-31
2  a      q    y 2011-07-31
3  b      q    x 2011-12-31
4  b      q    x 2011-01-31
5  b      q    x 2011-08-31
Then
df.loc[df.groupby(['id', 'period', 'type']).date.idxmin()]
  id period type       date
0  a      q    y 2011-03-31
4  b      q    x 2011-01-31
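As a sanity check, the idxmin result can be compared with the sort-based approach on the same frame. This is only a sketch: the names keys, by_idxmin and by_sort are mine, and it assumes the frame's index labels are unique, since loc would return every row sharing a duplicated label.

```python
import pandas as pd

df = pd.DataFrame([
    ['a', 'q', 'y', '2011-03-31'],
    ['a', 'q', 'y', '2011-05-31'],
    ['a', 'q', 'y', '2011-07-31'],
    ['b', 'q', 'x', '2011-12-31'],
    ['b', 'q', 'x', '2011-01-31'],
    ['b', 'q', 'x', '2011-08-31'],
], columns=['id', 'period', 'type', 'date'])
df['date'] = pd.to_datetime(df['date'])

keys = ['id', 'period', 'type']

# Row label of the minimum date in each group, then select those rows.
by_idxmin = df.loc[df.groupby(keys)['date'].idxmin()]

# Same rows via sorting; sort_index() restores the original row order.
by_sort = df.sort_values('date').groupby(keys).head(1).sort_index()

print(by_idxmin.equals(by_sort))
```

If the index might contain duplicates, calling df.reset_index(drop=True) first is one way to make the loc lookup safe.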