Compare Df1 Column 1 To All Columns In Df2 Returning The Index Of Df2
I'm new to pandas so likely overlooking something but I've been searching and haven't found anything helpful yet. What I'm trying to do is this. I have 2 dataframes. df1 has only
Solution 1:
I think you can use isin to test whether the values of a Series created from df2 by stack appear in a Series created from the single column of df1 by squeeze. Finally, reshape back with unstack:
df3 = df2.stack().isin(df1.squeeze()).unstack()
print(df3)
                1      2      3      4      5      6      7
8302813476  False  False  False  False  False  False  False
8302813477  False  False  False  False  False  False  False
8302813478  False  False   True  False  False  False  False
Then find all rows with at least one True by using any:
a = df3.any(axis=1)
print(a)
8302813476    False
8302813477    False
8302813478     True
dtype: bool
Finally, use boolean indexing to get the matching index values:
print (a[a].index)
Int64Index([8302813478], dtype='int64')
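The three steps above can be put together in one runnable sketch. The contents of df1 and df2 are not shown in full in the question, so the values below are made up for illustration; only the column name 'col', the index labels and the column labels 1 to 7 are taken from the output above.

```python
import pandas as pd

# Hypothetical sample data: the cell values are invented for illustration.
df1 = pd.DataFrame({"col": [10, 20, 30]})
df2 = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6, 7],
     [11, 12, 13, 14, 15, 16, 17],
     [21, 22, 30, 24, 25, 26, 27]],
    index=[8302813476, 8302813477, 8302813478],
    columns=[1, 2, 3, 4, 5, 6, 7],
)

# Flatten df2 to one long Series, test membership, then restore the 2-D shape.
df3 = df2.stack().isin(df1.squeeze()).unstack()

# Keep the index labels of the rows that contain at least one match.
a = df3.any(axis=1)
matching_index = a[a].index
print(matching_index.tolist())
```

Recent pandas versions no longer have Int64Index, so the printed index may appear as a plain Index with dtype int64 instead.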
Another solution is to use df1['col'].unique() instead of squeeze (thank you, Ted Petrou):
df3 = df2.stack().isin(df1['col'].unique()).unstack()
print(df3)
                1      2      3      4      5      6      7
8302813476  False  False  False  False  False  False  False
8302813477  False  False  False  False  False  False  False
8302813478  False  False   True  False  False  False  False
---
I like squeeze more, but simply selecting the column of df1 gives the same output:
df3 = df2.stack().isin(df1['col']).unstack()
print(df3)
                1      2      3      4      5      6      7
8302813476  False  False  False  False  False  False  False
8302813477  False  False  False  False  False  False  False
8302813478  False  False   True  False  False  False  False
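As a side note that is not part of the original answer, DataFrame.isin can probably replace the stack/unstack round trip, as long as the df1 values are passed as a plain list. A minimal sketch, with small made-up frames standing in for df1 and df2:

```python
import pandas as pd

# Hypothetical sample data, invented for illustration.
df1 = pd.DataFrame({"col": [10, 20, 30]})
df2 = pd.DataFrame({"a": [1, 30], "b": [10, 4]}, index=[100, 200])

# A plain list makes DataFrame.isin test every cell against the values.
# Passing df1["col"] itself (a Series) would instead match on index labels.
mask = df2.isin(df1["col"].tolist())

# The stack/isin/unstack approach from the answer, for comparison.
stacked = df2.stack().isin(df1["col"]).unstack()

print(mask)
print(stacked)
```

Both frames should hold the same boolean values here; the list conversion is the part that matters, because DataFrame.isin treats a Series argument differently from Series.isin.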
Solution 2:
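As an interesting numpy alternative:
import numpy as np
import pandas as pd

# Flatten both frames to 1-D arrays.
l1 = df1.values.ravel()
l2 = df2.values.ravel()
# Compare every df1 value with every df2 value, then collapse over df1.
pd.DataFrame(
    np.equal.outer(l1, l2).any(0).reshape(df2.values.shape),
    df2.index, df2.columns
)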
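The outer comparison builds an intermediate boolean array of shape (len(l1), len(l2)), which can get large. A different technique, np.isin, is swapped in below as a possible lighter alternative; it is not what the answer uses, and the sample frames are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented for illustration.
df1 = pd.DataFrame({"col": [10, 20, 30]})
df2 = pd.DataFrame({"a": [1, 30], "b": [10, 4]}, index=[100, 200])

# np.isin keeps the shape of its first argument, so no reshape is needed.
mask = pd.DataFrame(
    np.isin(df2.to_numpy(), df1.to_numpy().ravel()),
    index=df2.index,
    columns=df2.columns,
)
print(mask)
```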
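or using a set, a list and a comprehension:
# A set gives fast membership tests for the df1 values.
l1 = set(df1.values.ravel().tolist())
l2 = df2.values.ravel().tolist()
# "d in l1" is equivalent to bool(l1.intersection([d])) and reads more directly.
pd.DataFrame(
    np.array([d in l1 for d in l2]).reshape(df2.values.shape),
    df2.index, df2.columns
)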