Fill Missing Values In Selected Columns With Filtered Values In Other Column
I have a weird column named null in a dataframe that contains some missing values from other columns. One column is lat-lon coordinates named location, the other is an integer repr
Solution 1:
The easiest approach is to fill all the missing values in df.location and df.level with the values in df.null, then use regex-based boolean filters to set the inappropriate (mis-assigned) values in df.location and df.level back to np.nan.
pd.fillna()
import numpy as np
import pandas as pd
from numpy import nan
df = pd.DataFrame(
    {'null': {0: '43.70477575,-72.28844073', 1: '2', 2: '43.70637091,-72.28704334', 3: '4', 4: '3'},
     'location': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan},
     'level': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan}
    }
)
# Fill every missing value in both columns from the `null` column
for col in ['location', 'level']:
    df[col] = df[col].fillna(df['null'])
Now we'll use string expressions to correct the mis-assigned values.
str.contains()
# Convert columns to type str so the string methods work
df = df.astype(str)
# Use regex to set values that don't belong in a column to NaN
regex = '[,]'
df.loc[df.level.str.contains(regex), 'level'] = np.nan
regex = r'^\d\.?0?$'
df.loc[df.location.str.contains(regex), 'location'] = np.nan
# Return `df.level` to float datatype (str is the correct
# datatype for `df.location`); astype returns a new Series, so assign it back
df['level'] = df['level'].astype(float)
Here's the output:
pd.DataFrame(
{'null': {0:'43.70477575,-72.28844073', 1:'2', 2:'43.70637091,-72.28704334', 3:'4', 4:'3'},
'location': {0:'43.70477575,-72.28844073', 1:nan, 2:'43.70637091,-72.28704334', 3:nan, 4:nan},
'level': {0:nan, 1:2.0, 2:nan, 3:4.0, 4:3.0}
}
)
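A variation on the same fill-then-filter idea, as a minimal sketch: classify each value of df.null once with str.fullmatch and clear whichever column it does not belong in. The name level_pattern and the pattern itself (an integer, optionally written like 2.0) are assumptions about what a level looks like, not something stated in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "null": ["43.70477575,-72.28844073", "2", "43.70637091,-72.28704334", "4", "3"],
    "location": [np.nan] * 5,
    "level": [np.nan] * 5,
})

# Fill both target columns from `null` (plain assignment, no inplace=True).
for col in ["location", "level"]:
    df[col] = df[col].fillna(df["null"])

# Assumed shape of a level: digits, optionally followed by ".0".
level_pattern = r"\d+(\.0+)?"
is_level = df["null"].str.fullmatch(level_pattern)

# Clear the values that landed in the wrong column.
df.loc[is_level, "location"] = np.nan
df.loc[~is_level, "level"] = np.nan

# Levels are numeric; locations stay as strings.
df["level"] = df["level"].astype(float)
print(df)
```

Because fullmatch has to match the whole string, a multi-digit level such as 12 is also recognised, which the single-digit pattern above would not do reliably.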
Solution 2:
Let us try to_numeric
checker = pd.to_numeric(df.null, errors='coerce')
checker
Out[171]:
0    NaN
1    2.0
2    NaN
3    4.0
4    3.0
Name: null, dtype: float64
Then apply isnull: where to_numeric returned NaN, the value is a string (a location) rather than a number.
isstring = checker.isnull()
isstring
Out[172]:
0     True
1    False
2     True
3    False
4    False
Name: null, dtype: bool
isnumber = checker.notnull()
Fill the values (strings go to location, numbers go to level)
df.loc[isstring, 'location'] = df['null']
df.loc[isnumber, 'level'] = df['null']
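The to_numeric steps can be put together as a self-contained sketch. The helper name split_null_column and its source parameter are hypothetical, and the sketch assumes that any non-missing value which fails numeric conversion is a location. It uses fillna rather than .loc assignment, so cells that already hold a value are left alone and level comes out as float.

```python
import numpy as np
import pandas as pd


def split_null_column(df, source="null"):
    """Return a copy of df with location/level filled from the source column."""
    out = df.copy()
    # Strings that are not numbers become NaN.
    numeric = pd.to_numeric(out[source], errors="coerce")
    # Non-numeric but non-missing entries are treated as locations.
    is_text = numeric.isna() & out[source].notna()
    # fillna only touches cells that are still missing.
    out["level"] = out["level"].fillna(numeric)
    out["location"] = out["location"].fillna(out[source].where(is_text))
    return out


df = pd.DataFrame({
    "null": ["43.70477575,-72.28844073", "2", "43.70637091,-72.28704334", "4", "3"],
    "location": [np.nan] * 5,
    "level": [np.nan] * 5,
})
result = split_null_column(df)
print(result)
```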
Solution 3:
Another approach uses the methods pandas.Series.mask and pandas.Series.where:
>>> df
                       null  location  level
0  43.70477575,-72.28844073       NaN    NaN
1                         2       NaN    NaN
2  43.70637091,-72.28704334       NaN    NaN
3                         4       NaN    NaN
4                         3       NaN    NaN
>>> df['level'] = df['level'].mask(df['null'].str.isnumeric(), other=df['null'])
>>> df['location'] = df['location'].where(df['null'].str.isnumeric(), other=df['null'])
>>>
>>> df
                       null                  location  level
0  43.70477575,-72.28844073  43.70477575,-72.28844073    NaN
1                         2                       NaN      2
2  43.70637091,-72.28704334  43.70637091,-72.28704334    NaN
3                         4                       NaN      4
4                         3                       NaN      3
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.mask.html
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.where.html
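mask and where are mirror images: mask replaces values where the condition is True, while where keeps them where it is True and replaces the rest. A small sketch of that relationship, using illustrative Series named source and target (these names are not from the question):

```python
import numpy as np
import pandas as pd

source = pd.Series(["43.70477575,-72.28844073", "2", "4"])
target = pd.Series([np.nan, np.nan, np.nan], dtype=object)

# True for strings made up only of numeric characters, e.g. "2".
is_num = source.str.isnumeric()

# mask: replace where the condition is True.
masked = target.mask(is_num, other=source)
# where: keep where the condition is True, so negate it to get the same result.
whered = target.where(~is_num, other=source)

print(masked.equals(whered))
```

Note that str.isnumeric is False for strings such as "2.0" or "-3", so this test only suits levels stored as plain non-negative integers.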