Pandas - Insert Rows Where Data Is Missing
I have a dataset, here is an example: df = DataFrame({'Seconds_left':[5,10,15,25,30,35,5,10,15,30], 'Team':['ATL','ATL','ATL','ATL','ATL','ATL','SAS','SAS','SAS','SAS'], 'Fouls': [
Solution 1:
Build a MultiIndex containing every Team and Seconds_left combination, then reindex and reset_index:
import numpy as np
import pandas as pd
idx = pd.MultiIndex.from_product([df['Team'].unique(),
                                  np.arange(5, df['Seconds_left'].max()+1, 5)],
                                 names=['Team', 'Seconds_left'])
df.set_index(['Team', 'Seconds_left']).reindex(idx).reset_index()
Out:
Team Seconds_left Fouls
0 ATL 5 1.0
1 ATL 10 2.0
2 ATL 15 3.0
3 ATL 20 NaN
4 ATL 25 3.0
5 ATL 30 4.0
6 ATL 35 5.0
7 SAS 5 5.0
8 SAS 10 4.0
9 SAS 15 1.0
10 SAS 20 NaN
11 SAS 25 NaN
12 SAS 30 1.0
13 SAS 35 NaN
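For reference, here is a self-contained sketch of the same reindex approach. The Fouls values are an assumption read off the output table above, since the original example is cut off, and the helper name fill_missing_seconds and its step parameter are hypothetical.

```python
import numpy as np
import pandas as pd

# Sample data; the Fouls values are taken from the output table (assumption).
df = pd.DataFrame({
    'Seconds_left': [5, 10, 15, 25, 30, 35, 5, 10, 15, 30],
    'Team': ['ATL'] * 6 + ['SAS'] * 4,
    'Fouls': [1, 2, 3, 3, 4, 5, 5, 4, 1, 1],
})


def fill_missing_seconds(frame, step=5):
    """Return frame with one row per (Team, Seconds_left) grid point."""
    # Grid runs from `step` up to the largest Seconds_left seen in the data
    grid = np.arange(step, frame['Seconds_left'].max() + 1, step)
    idx = pd.MultiIndex.from_product(
        [frame['Team'].unique(), grid], names=['Team', 'Seconds_left']
    )
    # reindex adds a NaN row for every combination absent from the data
    return frame.set_index(['Team', 'Seconds_left']).reindex(idx).reset_index()


full = fill_missing_seconds(df)
print(full)
```

Note that reindex needs the (Team, Seconds_left) pairs to be unique; if the data can contain duplicates, they would have to be aggregated or dropped first.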
Solution 2:
An approach using groupby and merge:
df_left = pd.DataFrame({'Seconds_left':[5,10,15,20,25,30,35]})
df_out = df.groupby('Team', as_index=False).apply(lambda x: x.merge(df_left, how='right', on='Seconds_left'))
# fillna(method='ffill') is deprecated since pandas 2.1; ffill() is the equivalent call
df_out['Team'] = df_out['Team'].ffill()
df_out = df_out.reset_index(drop=True).sort_values(by=['Team','Seconds_left'])
print(df_out)
Output:
Fouls Seconds_left Team
0 1.0 5 ATL
1 2.0 10 ATL
2 3.0 15 ATL
6 NaN 20 ATL
3 3.0 25 ATL
4 4.0 30 ATL
5 5.0 35 ATL
7 5.0 5 SAS
8 4.0 10 SAS
9 1.0 15 SAS
11 NaN 20 SAS
12 NaN 25 SAS
10 1.0 30 SAS
13 NaN 35 SAS
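The groupby and merge approach above forward-fills Team, which relies on the row before each gap carrying the right team name. A possible variant that avoids the fill step is to build the full Team and Seconds_left grid with a cross join and then left-merge the data onto it. This is only a sketch: it assumes pandas 1.2 or later (for how='cross') and uses Fouls values read off the output table.

```python
import pandas as pd

# Sample data; the Fouls values are taken from the output table (assumption).
df = pd.DataFrame({
    'Seconds_left': [5, 10, 15, 25, 30, 35, 5, 10, 15, 30],
    'Team': ['ATL'] * 6 + ['SAS'] * 4,
    'Fouls': [1, 2, 3, 3, 4, 5, 5, 4, 1, 1],
})

teams = pd.DataFrame({'Team': df['Team'].unique()})
seconds = pd.DataFrame({'Seconds_left': list(range(5, 40, 5))})

# Every (Team, Seconds_left) pair; requires pandas >= 1.2
grid = teams.merge(seconds, how='cross')

# Attach the known Fouls values; pairs missing from df get NaN
df_out = grid.merge(df, how='left', on=['Team', 'Seconds_left'])
print(df_out)
```

Because Team comes from the grid rather than from the data, it is never missing, and the rows already come out ordered by Team and Seconds_left.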
Solution 3:
import pandas as pd
import numpy as np
df = pd.DataFrame(columns = ['a', 'b'])
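# Append a row at the next integer position; np.nan marks the missing value
# (the np.NaN alias was removed in NumPy 2.0)
df.loc[len(df)] = [1, np.nan]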
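Applied to the question's data, the same loc pattern could be used to add a single missing row by hand. The sketch below is an assumption about how this solution is meant to be used: it takes a reduced ATL-only frame, appends the missing 20-second row, and sorts so the new row lands in place.

```python
import numpy as np
import pandas as pd

# Reduced sample with the 20-second row missing (values are illustrative)
df = pd.DataFrame({
    'Seconds_left': [5, 10, 15, 25],
    'Team': ['ATL'] * 4,
    'Fouls': [1, 2, 3, 3],
})

# Append the missing row; values follow the column order, Fouls is unknown
df.loc[len(df)] = [20, 'ATL', np.nan]

# Move the appended row into position and rebuild a clean 0..n-1 index
df = df.sort_values('Seconds_left').reset_index(drop=True)
print(df)
```

This is practical for one or two rows; when many rows are missing, the reindex approach in Solution 1 does the same thing in one step.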