
Pandas Groupby And Then Select One Row

I have a pandas DataFrame that I need to group by some columns. Most groups contain only one row, but a few contain more than one. For each of these, I only want to ke

Solution 1:

Sort by date and then just grab the first row.

df.sort_values('date').groupby(['id', 'period', 'type']).first()
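As a minimal sketch of this approach, assuming a small made-up frame with the same column names as the question: note that first() returns the first non-null value of each column within a group, so when some columns contain missing values the result can combine values from different rows. If whole rows must be kept intact, head(1) is one alternative. The variable names keys, earliest and first_rows are only illustrative.

```python
import pandas as pd

# Illustrative data (an assumption, not the asker's real frame).
df = pd.DataFrame(
    [
        ["a", "q", "y", "2011-03-31"],
        ["a", "q", "y", "2011-05-31"],
        ["b", "q", "x", "2011-12-31"],
        ["b", "q", "x", "2011-01-31"],
    ],
    columns=["id", "period", "type", "date"],
)
df["date"] = pd.to_datetime(df["date"])

keys = ["id", "period", "type"]

# Sort so the earliest date comes first, then take the first value per group.
# as_index=False keeps the grouping keys as ordinary columns.
earliest = df.sort_values("date").groupby(keys, as_index=False).first()

# head(1) keeps whole original rows (and their original index labels).
first_rows = df.sort_values("date").groupby(keys).head(1).sort_index()

print(earliest)
print(first_rows)
```

With no missing values, as here, both variants select the same dates; they differ only in the index they return.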

Solution 2:

You could also use nsmallest():

df.groupby(['id', 'period', 'type']).apply(lambda g: g.nsmallest(1, "date"))
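The shape of what apply returns here (extra index levels, and whether the grouping columns are kept) can vary between pandas versions. The sketch below is a variant that avoids apply: it calls nsmallest on the grouped date column and then looks the rows up by their original labels. It assumes the date column has already been converted with pd.to_datetime, since nsmallest does not accept plain string columns, and the sample data and the names keys, smallest and rows are illustrative only.

```python
import pandas as pd

# Illustrative data (an assumption, not the asker's real frame).
df = pd.DataFrame(
    [
        ["a", "q", "y", "2011-03-31"],
        ["a", "q", "y", "2011-05-31"],
        ["b", "q", "x", "2011-12-31"],
        ["b", "q", "x", "2011-01-31"],
    ],
    columns=["id", "period", "type", "date"],
)
df["date"] = pd.to_datetime(df["date"])

keys = ["id", "period", "type"]

# Smallest date per group; the result is indexed by the group keys
# plus the original row label as the last index level.
smallest = df.groupby(keys)["date"].nsmallest(1)

# Recover the full rows through the original row labels.
rows = df.loc[smallest.index.get_level_values(-1)]

print(rows)
```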

Solution 3:

Filter df by the index label of the minimum date in each group. idxmin returns those labels, which you then pass to loc.

df.loc[df.groupby(['id', 'period', 'type']).date.idxmin()]

Consider df:

import pandas as pd

df = pd.DataFrame([
    ['a', 'q', 'y', '2011-03-31'],
    ['a', 'q', 'y', '2011-05-31'],
    ['a', 'q', 'y', '2011-07-31'],
    ['b', 'q', 'x', '2011-12-31'],
    ['b', 'q', 'x', '2011-01-31'],
    ['b', 'q', 'x', '2011-08-31'],
], columns=['id', 'period', 'type', 'date'])
# Convert the date strings to datetime64 so they compare chronologically
df['date'] = pd.to_datetime(df['date'])

df

  id period type       date
0  a      q    y 2011-03-31
1  a      q    y 2011-05-31
2  a      q    y 2011-07-31
3  b      q    x 2011-12-31
4  b      q    x 2011-01-31
5  b      q    x 2011-08-31

Then

df.loc[df.groupby(['id', 'period', 'type']).date.idxmin()]

  id period type       date
0  a      q    y 2011-03-31
4  b      q    x 2011-01-31
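To make the two steps easier to follow, here is a sketch that splits the one-liner and prints the intermediate result of idxmin. It rebuilds the same sample frame so that it runs on its own; the names idx and result are illustrative. One assumption worth stating: loc looks rows up by label, so this works as intended only when the frame's index is unique (calling reset_index(drop=True) first is one way to ensure that).

```python
import pandas as pd

df = pd.DataFrame([
    ['a', 'q', 'y', '2011-03-31'],
    ['a', 'q', 'y', '2011-05-31'],
    ['a', 'q', 'y', '2011-07-31'],
    ['b', 'q', 'x', '2011-12-31'],
    ['b', 'q', 'x', '2011-01-31'],
    ['b', 'q', 'x', '2011-08-31'],
], columns=['id', 'period', 'type', 'date'])
df['date'] = pd.to_datetime(df['date'])

# Step 1: for each group, the index label of the row with the earliest date.
idx = df.groupby(['id', 'period', 'type'])['date'].idxmin()
print(idx)

# Step 2: select those rows by label.
result = df.loc[idx]
print(result)
```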
