How Can I Create An Artificial Key Column For Merging Two Datasets Using Difflib When The Column Of Interest Has Missing Cells?
Goal: if the name in row i of df2 is a substring or an exact match of a name in some row N of df1, and the state and district columns of row N in df1 match the respective columns of row i in df2, …
Solution 1:
You are getting a `list` object back from `difflib.get_close_matches`. When no close match is found, that list is empty, so it has no index 0. That's why you get this error. Second, we need to convert these lists to strings to be able to do the merge, as follows:
Note: you don't need `df1['CandidateName'] = df1['CandidateName'].replace('', 'EMPTY')`; empty names simply get no match.
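```python
import difflib

# get_close_matches returns a list (empty when no name clears its 0.6 cutoff); n=1 keeps only
# the best match, ''.join turns the list into a string ('' if none), fillna('') guards against NaN cells.
df1['Name'] = df1['CandidateName'].fillna('').apply(lambda x: ''.join(difflib.get_close_matches(x, df2['Name'], n=1)))
# how='left' keeps every row of df1; df2's Party is dropped so it does not clash with df1's Party.
df_merge = df1.merge(df2.drop('Party', axis=1), on=['Name', 'State', 'District'], how='left')
print(df_merge)
```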
```
   CandidateName             State  District  Party     Name
0  Theodorick A. Bland       VA     9                   Theodorick Bland
1  Aedanus Rutherford Burke  SC     2                   Aedanus Burke
2  Jason Lewis               MN     2                   Jason Lewis
3  Barbara Comstock          VA     10        Democrat  Barbara Comstock
4  Theodorick Bland          VA     9                   Theodorick Bland
5  Aedanus Burke             SC     2                   Aedanus Burke
6  Jason Initial Lewis       MN     2         Democrat  Jason Lewis
7                            NH     1         Whig
8                            NH     1         Whig
```
Note that I added the `how='left'` argument to the merge, since you want to keep the shape of your original DataFrame: every row of df1 is kept, including rows that found no match.
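To try the approach end to end, here is a minimal, self-contained sketch with small hypothetical `df1`/`df2` frames loosely modelled on the printed output. The question's real data isn't shown, so all values (including the placeholder parties in df2) are assumptions. It passes `n=1` so at most one match is kept per name, and `indicator=True` is not part of the answer; it is added only to show which rows found a partner in df2.

```python
import difflib

import pandas as pd

# Hypothetical toy data; the question's real df1/df2 are not shown, so every value is assumed.
df1 = pd.DataFrame({
    'CandidateName': ['Theodorick A. Bland', 'Aedanus Rutherford Burke', 'Jason Initial Lewis', ''],
    'State': ['VA', 'SC', 'MN', 'NH'],
    'District': [9, 2, 2, 1],
    'Party': ['', '', 'Democrat', 'Whig'],
})
df2 = pd.DataFrame({
    'Name': ['Theodorick Bland', 'Aedanus Burke', 'Jason Lewis'],
    'State': ['VA', 'SC', 'MN'],
    'District': [9, 2, 2],
    'Party': ['PartyA', 'PartyB', 'PartyC'],  # placeholder values, dropped before the merge
})

# Artificial key: the best fuzzy match from df2['Name'], or '' when nothing clears the cutoff.
df1['Name'] = df1['CandidateName'].fillna('').apply(
    lambda x: ''.join(difflib.get_close_matches(x, df2['Name'], n=1))
)

# indicator=True adds a _merge column ('both' or 'left_only') showing which rows matched.
df_merge = df1.merge(df2.drop('Party', axis=1), on=['Name', 'State', 'District'],
                     how='left', indicator=True)
print(df_merge)
```

The empty `CandidateName` ends up with an empty key and is reported as `left_only`, while the three fuzzy matches line up with df2 on name, state and district.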
Explanation of ''.join()
We do this to convert the list into a single string. For example, with a space as the separator:
```python
lst = ['hello', 'world']
print(' '.join(lst))
# hello world
```
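The same behaviour can be checked with `difflib` alone. The sketch below uses a made-up list of two near-identical spellings to show that `get_close_matches` returns an empty list when nothing clears its default 0.6 cutoff (which is why `[0]` fails), and why passing `n=1` is worth considering before `''.join`: with the default `n=3`, two close spellings get concatenated into a key that matches nothing.

```python
import difflib

# Hypothetical candidate list with two near-identical spellings.
names = ['Jason Lewis', 'Jason Lewes']

no_match = difflib.get_close_matches('', names)                # nothing reaches the 0.6 cutoff
all_matches = difflib.get_close_matches('Jason Lewis', names)  # default n=3, best score first
best_only = difflib.get_close_matches('Jason Lewis', names, n=1)

# Indexing an empty list is what fails with "list index out of range".
try:
    no_match[0]
except IndexError as exc:
    print('IndexError:', exc)

print(repr(''.join(no_match)))     # ''  (usable as a "no match" key)
print(repr(''.join(all_matches)))  # 'Jason LewisJason Lewes'  (not a real name)
print(repr(''.join(best_only)))    # 'Jason Lewis'
```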