Compare Df1 Column 1 To All Columns In Df2 Returning The Index Of Df2

I'm new to pandas so likely overlooking something but I've been searching and haven't found anything helpful yet. What I'm trying to do is this. I have 2 dataframes. df1 has only

Solution 1:

You can use isin to test for matches: stack reshapes df2 into a single Series, isin checks each of its values against the Series obtained from the one column of df1 with squeeze, and unstack restores the original shape of df2:

df3 = df2.stack().isin(df1.squeeze()).unstack()
print (df3)
                1      2      3      4      5      6      7
8302813476  False  False  False  False  False  False  False
8302813477  False  False  False  False  False  False  False
8302813478  False  False   True  False  False  False  False

Then use any to flag the rows that contain at least one True:

a = df3.any(axis=1)
print (a)
8302813476    False
8302813477    False
8302813478     True
dtype: bool

Finally, use boolean indexing to get the matching index values:

print (a[a].index)
Int64Index([8302813478], dtype='int64')
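
The three steps above can be put together in one runnable sketch. The frames here are made up for illustration (the column name 'col', the values, and a three-column df2 are assumptions, not the asker's data):

```python
import pandas as pd

# Hypothetical data: df1 has a single column, df2 is indexed by ID-like integers.
df1 = pd.DataFrame({"col": [101, 555, 999]})
df2 = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 555, 9]],
    index=[8302813476, 8302813477, 8302813478],
    columns=[1, 2, 3],
)

# Step 1: flatten df2, test membership against df1's column, restore the shape.
df3 = df2.stack().isin(df1.squeeze()).unstack()

# Step 2: flag rows of df2 with at least one match.
a = df3.any(axis=1)

# Step 3: keep only the index labels of the flagged rows.
matching_index = a[a].index.tolist()
print(matching_index)
```

With this sample data only the last row of df2 contains a value from df1 (555), so only its index label is kept.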

Alternatively, replace squeeze with df1['col'].unique() (thanks to Ted Petrou):

df3 = df2.stack().isin(df1['col'].unique()).unstack()
print (df3)
                1      2      3      4      5      6      7
8302813476  False  False  False  False  False  False  False
8302813477  False  False  False  False  False  False  False
8302813478  False  False   True  False  False  False  False

---

I prefer squeeze, but simply selecting the column of df1 gives the same output:

df3 = df2.stack().isin(df1['col']).unstack()
print (df3)
                1      2      3      4      5      6      7
8302813476  False  False  False  False  False  False  False
8302813477  False  False  False  False  False  False  False
8302813478  False  False   True  False  False  False  False
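
One practical difference between the two spellings, shown as a small sketch with a made-up one-row frame: a bare squeeze() on a 1x1 DataFrame collapses both axes and returns a scalar, which isin cannot take as its list of values, while selecting the column always gives a Series.

```python
import pandas as pd

# Hypothetical df1 with a single row and a single column.
one_row = pd.DataFrame({"col": [555]})

squeezed = one_row.squeeze()               # both axes collapse: a scalar
by_column = one_row["col"]                 # always a Series
squeezed_cols = one_row.squeeze("columns")  # only the column axis collapses: a Series

print(type(squeezed).__name__)
print(type(by_column).__name__)
print(type(squeezed_cols).__name__)
```

If df1 can ever have just one row, df1['col'] or df1.squeeze("columns") is the safer input for isin.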

Solution 2:

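As an interesting numpy alternative:

import numpy as np
import pandas as pd

# flatten both frames to 1-D arrays
l1 = df1.values.ravel()
l2 = df2.values.ravel()

# compare every df1 value with every df2 value, then reshape back to df2's layout
pd.DataFrame(
    np.equal.outer(l1, l2).any(0).reshape(df2.values.shape),
    df2.index, df2.columns
)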

Or, using a set and a list comprehension:

l1 = set(df1.values.ravel().tolist())
l2 = df2.values.ravel().tolist()

# set membership test for each flattened df2 value
pd.DataFrame(
    np.array([d in l1 for d in l2]).reshape(df2.values.shape),
    df2.index, df2.columns
)
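
As a quick sanity check, the sketch below runs the numpy outer comparison and the stack/isin/unstack approach from Solution 1 on the same made-up frames (the data and the column name 'col' are assumptions) and confirms that they produce the same boolean mask. Note that np.equal.outer builds a len(l1) x len(l2) intermediate array, so it can use a lot of memory on large inputs.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frames, for illustration only.
df1 = pd.DataFrame({"col": [101, 555, 999]})
df2 = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 555, 9]],
    index=[8302813476, 8302813477, 8302813478],
    columns=[1, 2, 3],
)

l1 = df1.values.ravel()
l2 = df2.values.ravel()

# Outer comparison: entry (i, j) is True when l1[i] == l2[j].
# any(axis=0) collapses over the df1 values, leaving one flag per df2 cell.
outer_mask = pd.DataFrame(
    np.equal.outer(l1, l2).any(axis=0).reshape(df2.shape),
    index=df2.index,
    columns=df2.columns,
)

# The pandas approach from Solution 1, for comparison.
isin_mask = df2.stack().isin(df1["col"]).unstack()

same = np.array_equal(outer_mask.to_numpy(), isin_mask.to_numpy())
print(same)
print(outer_mask[outer_mask.any(axis=1)].index.tolist())
```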
