
Pandas DataFrame: Aggregate Values Within Blocks Of Repeating IDs

Given a DataFrame with an ID column and a corresponding values column, how can I aggregate (say, sum) the values within each block of consecutive repeating IDs? Example DF: import numpy as np i

Solution 1:

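You need a helper Series that gives every block of consecutive identical IDs its own number. Compare the id column with its shifted values using ne (not equal), which is True wherever a new block starts, and take the cumulative sum of that mask. Pass this Series to groupby together with the id column in a list, remove the first (block number) level of the resulting MultiIndex with reset_index(level=0, drop=True), and then convert the remaining index back into an id column with a second reset_index():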

print(df['id'].ne(df['id'].shift()).cumsum())
0     1
1     1
2     1
3     1
4     1
5     2
6     2
7     2
8     3
9     3
10    4
11    5
12    6
13    6
14    6
Name: id, dtype: int32
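To see why the cumulative sum labels the blocks, the sketch below prints the intermediate steps side by side. The DataFrame is a hypothetical reconstruction, since the question's example is cut off: the id sequence is read off the block numbers printed above, and v is assumed to be 1.0 in every row, which is consistent with the sums shown in the output. The column names prev, new_block and block are illustrative only.

```python
import pandas as pd

# Hypothetical data: id runs inferred from the printed block numbers,
# v assumed to be 1.0 everywhere (the original example is truncated)
df = pd.DataFrame({'id': list('aaaaabbbaababbb'), 'v': 1.0})

steps = pd.DataFrame({
    'id': df['id'],
    'prev': df['id'].shift(),                    # value from the row above (NaN for row 0)
    'new_block': df['id'].ne(df['id'].shift()),  # True where a new run of IDs starts
})
# Each True adds 1 to the running total, so every run gets its own integer label
steps['block'] = steps['new_block'].cumsum()
print(steps)
```

Row 0 is always True because comparing 'a' with NaN via ne returns True, so the first block is labelled 1.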

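# group by (block number, id) and sum v within each block
df1 = (df.groupby([df['id'].ne(df['id'].shift()).cumsum(), 'id'])['v'].sum()
          .reset_index(level=0, drop=True)   # drop the block-number level of the MultiIndex
          .reset_index())                    # turn the remaining id index level into a column
print(df1)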
  id    v
0  a  5.0
1  b  3.0
2  a  2.0
3  b  1.0
4  a  1.0
5  b  3.0
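A self-contained version of the first approach, useful for checking the result against the table above. It uses a hypothetical reconstruction of the input (id runs inferred from the outputs in this answer, v assumed to be 1.0 in every row); the variable name blocks is illustrative.

```python
import pandas as pd

# Hypothetical input reconstructed from the outputs in this answer
df = pd.DataFrame({'id': list('aaaaabbbaababbb'), 'v': 1.0})

# One integer label per run of consecutive equal IDs
# (this Series is also named 'id', because it is derived from df['id'])
blocks = df['id'].ne(df['id'].shift()).cumsum()

df1 = (df.groupby([blocks, 'id'])['v'].sum()   # MultiIndex: (block label, id)
         .reset_index(level=0, drop=True)       # drop the block label by position
         .reset_index())                        # move id back into a column
print(df1)
```

Because both index levels end up named id, dropping the helper level by position (level=0) rather than by name avoids any ambiguity.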

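Another option is GroupBy.agg with a dictionary: group by the helper Series alone, aggregate the id column with GroupBy.first (every row in a block has the same id) and the v column with sum: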

df1 = (df.groupby(df['id'].ne(df['id'].shift()).cumsum(), as_index=False)
         .agg({'id':'first', 'v':'sum'}))
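A runnable check of the agg variant, again on a hypothetical input reconstructed from the outputs in this answer (v assumed to be 1.0 in every row). Since the grouper is a computed Series rather than a column of df, as_index=False should return a default integer index with just the aggregated id and v columns, so no reset_index calls are needed.

```python
import pandas as pd

# Hypothetical input reconstructed from the outputs in this answer
df = pd.DataFrame({'id': list('aaaaabbbaababbb'), 'v': 1.0})

blocks = df['id'].ne(df['id'].shift()).cumsum()   # one integer label per run

df1 = (df.groupby(blocks, as_index=False)
         .agg({'id': 'first',   # all rows in a run share the same id
               'v': 'sum'}))    # total of v within the run
print(df1)
```

This gives the same table as the first approach while grouping on a single key, which keeps the code a little shorter.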
