How To Group Data By Ranges?

I have the following pandas dataframe (it is just a small extract): GROUP AVG_PERCENT_EVAL_1 AVG_PERCENT_NEGATIVE AVG_TOTAL_WAIT_TIME AVG_TOTAL_SERVICE_TIME AAAAA 19

Solution 1:

Steps:

1) Bin AVG_PERCENT_EVAL_1 into appropriate labels using pd.cut() by specifying a bin sequence.

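Specifying right=False makes every bin closed on the left "[" and open on the right ")", so a value equal to a bin edge falls into the bin that starts at that edge. include_lowest=True guarantees that the lowest edge itself is included; with right=False that is already the case, so the flag mainly matters when right=True.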

2) Re-label the returned categories in the desired format (here "18-20", "21-23", and so on).

3) Perform a groupby with GROUP and the newly computed bin ranges as the grouping keys, then aggregate the mean of every remaining column after dropping AVG_PERCENT_EVAL_1.
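To make the endpoint behaviour from step 1 concrete, here is a minimal sketch. The values and edges are made up for illustration and are not taken from the question's data.

```python
import pandas as pd

# Hypothetical values chosen to sit on and near the bin edges.
values = pd.Series([18, 20, 21, 23.9, 24])
edges = [18, 21, 24, 27]

# Default (right=True): bins are (18, 21], (21, 24], (24, 27].
# The lowest edge, 18, is excluded unless include_lowest=True is passed.
default_bins = pd.cut(values, bins=edges)

# right=False: bins are [18, 21), [21, 24), [24, 27).
left_closed = pd.cut(values, bins=edges, right=False)

print(pd.DataFrame({
    "value": values,
    "right=True": default_bins,
    "right=False": left_closed,
}))
```

With the default settings the value 18 ends up as NaN, while with right=False it lands in the first bin and 21 moves up into the second one.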


binning portion:

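import numpy as np
import pandas as pd

step = 3
kwargs = dict(include_lowest=True, right=False)
edges = np.arange(18, 40 + step, step)
# left-closed, right-open bins: [18, 21), [21, 24), ...
bins = pd.cut(df.AVG_PERCENT_EVAL_1, bins=edges, **kwargs)
# build labels such as "18-20" from the edges; the categories are Interval objects in current pandas, so string slicing no longer works
labels = [f"{lo}-{hi - 1}" for lo, hi in zip(edges[:-1], edges[1:])]
bins = bins.cat.rename_categories(labels)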

assign and groupby.agg():

df = df.assign(AVG_PERCENT_RANGE=bins).drop("AVG_PERCENT_EVAL_1", axis=1)
df.groupby(['GROUP', 'AVG_PERCENT_RANGE'], as_index=False).agg('mean')
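
Since only a fragment of the original frame is shown, here is a self-contained sketch of the whole approach on a small made-up frame. The rows, the second group name, and the single remaining column are assumptions for illustration. It passes the labels straight to pd.cut instead of renaming afterwards, and uses observed=True so that only bin/group combinations that actually occur are returned.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real frame has more columns and rows.
df = pd.DataFrame({
    "GROUP": ["AAAAA", "AAAAA", "AAAAA", "BBBBB", "BBBBB"],
    "AVG_PERCENT_EVAL_1": [19, 20, 22, 25, 26],
    "AVG_TOTAL_WAIT_TIME": [10.0, 20.0, 30.0, 40.0, 60.0],
})

step = 3
edges = np.arange(18, 40 + step, step)  # 18, 21, ..., 42
# One label per bin, using inclusive integer bounds such as "18-20".
labels = [f"{lo}-{hi - 1}" for lo, hi in zip(edges[:-1], edges[1:])]

# right=False gives left-closed bins [18, 21), [21, 24), ...
binned = pd.cut(df["AVG_PERCENT_EVAL_1"], bins=edges, right=False, labels=labels)

result = (
    df.assign(AVG_PERCENT_RANGE=binned)
      .drop(columns="AVG_PERCENT_EVAL_1")
      .groupby(["GROUP", "AVG_PERCENT_RANGE"], as_index=False, observed=True)
      .mean()
)
print(result)
```

Without observed=True, a categorical grouping key can also produce rows for bins that contain no data, depending on the pandas version.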