Pandas: Multiple Rolling Periods
Solution 1:
I would suggest creating a DataFrame with a MultiIndex as its columns. There's no way around using a loop here to iterate over your windows. The resulting form will be something that's easy to index and easy to read with pd.read_csv. Initialize an empty DataFrame with np.empty of the appropriate shape and use .loc to assign its values.
import numpy as np
import pandas as pd
np.random.seed(123)
df = pd.DataFrame(np.random.randn(100,3)).add_prefix('col')
windows = [5, 15, 30, 45]
stats = ['mean', 'std']
cols = pd.MultiIndex.from_product([windows, df.columns, stats],
                                  names=['window', 'feature', 'metric'])
df2 = pd.DataFrame(np.empty((df.shape[0], len(cols))), columns=cols,
                   index=df.index)
# Fill one window's block of columns at a time
for window in windows:
    df2.loc[:, window] = df.rolling(window=window).agg(stats).values
Now you have a result df2 that has the same index as your original object. It has 3 column levels: the first is the window, the second is the columns from your original frame, and the third is the statistic.
print(df2.shape)
(100, 24)
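As an aside, the same three-level layout can also be sketched with pd.concat and a dict keyed by window size, instead of pre-allocating with np.empty. This is a different technique from the .loc assignment above, not a restatement of it. The name `wide` is hypothetical, and the input frame is rebuilt here only so the block runs on its own.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame(np.random.randn(100, 3)).add_prefix('col')
windows = [5, 15, 30, 45]
stats = ['mean', 'std']

# One aggregated frame per window; the dict keys become the outermost column level.
# `wide` is a hypothetical name used only in this sketch.
wide = pd.concat({w: df.rolling(window=w).agg(stats) for w in windows}, axis=1)
wide.columns = wide.columns.set_names(['window', 'feature', 'metric'])

print(wide.shape)
print(wide[5].tail())
```

The comprehension is still a loop over the windows; it just lets pandas assemble the column MultiIndex rather than building it up front.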
This makes it easy to check values for a specific rolling window:
print(df2[5]) # Rolling window = 5
feature      col0                col1                col2
metric       mean       std      mean       std      mean       std
0             NaN       NaN       NaN       NaN       NaN       NaN
1             NaN       NaN       NaN       NaN       NaN       NaN
2             NaN       NaN       NaN       NaN       NaN       NaN
3             NaN       NaN       NaN       NaN       NaN       NaN
4        -0.87879   1.45348  -0.26559   0.71236   0.53233   0.89430
..            ...       ...       ...       ...       ...       ...
95       -0.44231   1.02552  -1.22138   0.45140  -0.36440   0.95324
96       -0.58638   1.10246  -0.90165   0.79723  -0.44543   1.00166
97       -0.70564   0.85711  -0.42644   1.07174  -0.44766   1.00284
98       -0.95702   1.01302  -0.03705   1.05066   0.16437   1.32341
99       -0.57026   1.10978   0.08730   1.02438   0.39930   1.31240
print(df2[5]['col0']) # Rolling window = 5, stats of col0 only
metric       mean       std
0             NaN       NaN
1             NaN       NaN
2             NaN       NaN
3             NaN       NaN
4        -0.87879   1.45348
..            ...       ...
95       -0.44231   1.02552
96       -0.58638   1.10246
97       -0.70564   0.85711
98       -0.95702   1.01302
99       -0.57026   1.10978
print(df2.loc[:, (5, slice(None), 'mean')]) # Rolling window = 5, means of each column
window          5
feature      col0      col1      col2
metric       mean      mean      mean
0             NaN       NaN       NaN
1             NaN       NaN       NaN
2             NaN       NaN       NaN
3             NaN       NaN       NaN
4        -0.87879  -0.26559   0.53233
..            ...       ...       ...
95       -0.44231  -1.22138  -0.36440
96       -0.58638  -0.90165  -0.44543
97       -0.70564  -0.42644  -0.44766
98       -0.95702  -0.03705   0.16437
99       -0.57026   0.08730   0.39930
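Selecting by level name rather than by tuple position can read a little more clearly. The following is a minimal sketch using DataFrame.xs, assuming a frame with the same three named column levels. Here it is rebuilt with pd.concat under the hypothetical name `wide` so the block is self-contained, which is not how the answer constructs df2.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame(np.random.randn(100, 3)).add_prefix('col')
windows = [5, 15, 30, 45]
stats = ['mean', 'std']

# Rebuild a three-level frame (hypothetical name `wide`) for this sketch.
wide = pd.concat({w: df.rolling(window=w).agg(stats) for w in windows}, axis=1)
wide.columns = wide.columns.set_names(['window', 'feature', 'metric'])

# Every rolling mean, for all windows and features; the 'metric' level is dropped.
all_means = wide.xs('mean', axis=1, level='metric')

# Only the 30-period standard deviations; columns reduce to the 'feature' level.
std_30 = wide.xs((30, 'std'), axis=1, level=['window', 'metric'])

print(all_means.columns.names)
print(std_30.columns.tolist())
```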
And lastly, to make a single-indexed DataFrame, here's some kludgy use of itertools.
df = pd.DataFrame(np.random.randn(100, 3)).add_prefix('col')
import itertools
means = [col + '_mean' for col in df.columns]
stds = [col + '_std' for col in df.columns]
# Interleave the names (col0_mean, col0_std, col1_mean, ...) to match agg's column order
names = list(itertools.chain.from_iterable(zip(means, stds)))
# Windows vary slowest, matching the order of the concatenated blocks below
labels = ['_'.join((name, str(win)))
          for win, name in itertools.product(windows, names)]
df2 = [df.rolling(window=window).agg(stats).values for window in windows]
df2 = pd.DataFrame(np.concatenate(df2, axis=1), columns=labels,
                   index=df.index)
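Another way to arrive at flat names, sketched here as an alternative rather than as the answer's method, is to build the MultiIndex frame first and then join each column tuple into a single string. The names `wide` and `flat` and the `feature_metric_window` label pattern are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame(np.random.randn(100, 3)).add_prefix('col')
windows = [5, 15, 30, 45]
stats = ['mean', 'std']

# Three-level columns: (window, feature, metric). `wide` is a hypothetical name.
wide = pd.concat({w: df.rolling(window=w).agg(stats) for w in windows}, axis=1)

# Iterating over a MultiIndex yields one tuple per column; join it into one label.
flat = wide.copy()
flat.columns = ['{}_{}_{}'.format(feature, metric, window)
                for window, feature, metric in wide.columns]

print(flat.columns[:4].tolist())
```

Because the labels are derived from the columns themselves, they cannot drift out of step with the order of the data.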
Solution 2:
You can concatenate output of multiple rolling aggregations:
windows = (5, 15, 30, 45)
rolling_dfs = (df.rolling(i)                                    # 1. Create window
                 .agg(['mean', 'std'])                          # 1. Aggregate
                 .rename(columns={col: '{0}_{1:d}'.format(col, i)
                                  for col in df.columns})       # 2. Rename columns
               for i in windows)                                # For each window
pd.concat((df, *rolling_dfs), axis=1)                           # 3. Concatenate dataframes
This is not pretty but should do what you're looking for from what I understand.
What it does:
- creates a generator rolling_dfs with the aggregated dataframes for each rolling window size.
- renames all columns so you can know which rolling window size it refers to.
- concatenates the original df with the rolling windows.
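The three steps above can be sketched with fully flat column labels, which avoids mixing the original single-level columns with two-level aggregate columns in the concatenated result. The helper `rolling_stats` and the `col_window_stat` naming pattern are hypothetical and not part of the answer.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame(np.random.randn(100, 3)).add_prefix('col')
windows = (5, 15, 30, 45)

def rolling_stats(frame, window, stats=('mean', 'std')):
    """Hypothetical helper: rolling aggregates with flat 'col_window_stat' labels."""
    out = frame.rolling(window).agg(list(stats))
    # agg with a list of functions returns (column, stat) tuples as column labels.
    out.columns = ['{0}_{1:d}_{2}'.format(col, window, stat)
                   for col, stat in out.columns]
    return out

# Original columns first, then one block of aggregates per window.
result = pd.concat([df] + [rolling_stats(df, w) for w in windows], axis=1)
print(result.columns[:5].tolist())
```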