How To Group Data By Ranges?
I have the following pandas dataframe (it is just a small extract): GROUP AVG_PERCENT_EVAL_1 AVG_PERCENT_NEGATIVE AVG_TOTAL_WAIT_TIME AVG_TOTAL_SERVICE_TIME AAAAA 19
Solution 1:
Steps:
1) Bin AVG_PERCENT_EVAL_1 into appropriate labels using pd.cut() by specifying a bin sequence. Specifying right=False makes every bin closed on the left "[" and open on the right ")", so a value of 21 falls in [21, 24) rather than [18, 21). include_lowest=True asks for the lowest edge to be included; with right=False the left endpoint is already closed, so it is harmless here.
2) Using the returned categories, re-label them as per desired requirements.
3) Perform groupby, making GROUP and the newly computed binned ranges the grouping keys, and aggregate the means of the remaining columns after dropping AVG_PERCENT_EVAL_1 from them.
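The endpoint behaviour described in step 1 can be checked with a short sketch. The values and the shortened edge sequence below are hypothetical, chosen only so that some values sit exactly on a bin edge; they are not taken from the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical values placed on and around the bin edges.
values = pd.Series([18, 20, 21, 23.9, 24])
edges = np.arange(18, 27 + 1, 3)  # edges 18, 21, 24, 27

# right=False: bins are [18, 21), [21, 24), [24, 27)
left_closed = pd.cut(values, bins=edges, right=False)

# right=True (the default): bins are (18, 21], (21, 24], (24, 27]
right_closed = pd.cut(values, bins=edges, right=True)

# Code -1 marks a value that fell outside every bin.
print(left_closed.cat.codes.tolist())
print(right_closed.cat.codes.tolist())
```

With right=False an edge value such as 21 or 24 opens a new bin, which is what labels like "21-23" require. With the default right=True the same value closes the previous bin, and the lowest edge (18) is left unbinned unless include_lowest=True is also passed.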
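Binning portion:
import numpy as np
import pandas as pd

step = 3
kwargs = dict(include_lowest=True, right=False)
edges = np.arange(18, 40 + step, step)  # 18, 21, ..., 42
bins = pd.cut(df.AVG_PERCENT_EVAL_1, bins=edges, **kwargs)
# Categories are Interval objects in current pandas, so build labels such as "18-20" from the edges instead of slicing strings.
labels = [f"{lo}-{hi - 1}" for lo, hi in zip(edges[:-1], edges[1:])]
bins = bins.cat.rename_categories(labels)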
Assign and groupby.agg():
df = df.assign(AVG_PERCENT_RANGE=bins).drop("AVG_PERCENT_EVAL_1", axis=1)
df.groupby(['GROUP', 'AVG_PERCENT_RANGE'], as_index=False).agg('mean')
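To try the whole recipe end to end, here is a minimal, self-contained sketch. The sample rows are invented (the extract in the question is truncated), and two details are variations rather than part of the answer: the labels are passed straight to pd.cut() through labels=, and observed=True is an assumption about the desired output, so that only the GROUP/range combinations actually present in the data are returned.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the question's real extract is truncated.
df = pd.DataFrame({
    "GROUP": ["AAAAA", "AAAAA", "AAAAA", "BBBBB"],
    "AVG_PERCENT_EVAL_1": [19, 20, 22, 25],
    "AVG_PERCENT_NEGATIVE": [10.0, 20.0, 30.0, 40.0],
    "AVG_TOTAL_WAIT_TIME": [100.0, 200.0, 300.0, 400.0],
    "AVG_TOTAL_SERVICE_TIME": [50.0, 70.0, 90.0, 110.0],
})

step = 3
edges = np.arange(18, 40 + step, step)  # 18, 21, ..., 42
# One label per bin, e.g. "18-20" for the half-open interval [18, 21).
labels = [f"{lo}-{hi - 1}" for lo, hi in zip(edges[:-1], edges[1:])]

# labels= names the bins directly, so no renaming step is needed.
ranges = pd.cut(df["AVG_PERCENT_EVAL_1"], bins=edges, right=False, labels=labels)

result = (
    df.assign(AVG_PERCENT_RANGE=ranges)
      .drop(columns="AVG_PERCENT_EVAL_1")
      # observed=True keeps only range categories that occur in each group.
      .groupby(["GROUP", "AVG_PERCENT_RANGE"], as_index=False, observed=True)
      .mean()
)
print(result)
```

Dropping observed=True (or setting it to False) would instead emit a row for every GROUP and range combination, with NaN means for the empty ones.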