Pandas: Calculate Overlapping Words Between Rows Only If Values In Another Column Match
I have a dataframe that looks like the following, but with many rows:
import pandas as pd
data = {'intent': ['order_food', 'order_food','order_taxi','order_call','order_call','or
Solution 1:
IIUC, you just need to iterate over the unique values in the intent column and then use loc to grab only the rows that correspond to each value. If an intent has more than two rows, you will still need combinations to get the unique pairs of rows within that intent.
from itertools import combinations

for intent in df.intent.unique():
    # select the Sent column for this intent as a plain list
    rows = df.loc[df.intent == intent, "Sent"].to_list()
    combos = combinations(rows, 2)
    for combo in combos:
        # unpack the current pair, not the full list of rows
        x, y = combo
        overlap = lexical_overlap(x, y)
        print(f"Overlap for ({x}) and ({y}) is {overlap}")

# Overlap for (i need hamburger) and (she wants sushi) is 46.666666666666664
# Overlap for (i need a cab) and (i would like a new taxi) is 40.0
# Overlap for (call me at 6) and (she called me) is 54.54545454545454
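For anyone who wants to run the loop end to end, here is a minimal self-contained sketch. The question's lexical_overlap function is not shown in this excerpt, so word_overlap below is a hypothetical stand-in (Jaccard similarity of the two word sets, as a percentage) and will not reproduce the numbers printed above. The sample frame and the pairwise_overlaps helper are also made up for illustration.

```python
from itertools import combinations

import pandas as pd


def word_overlap(a: str, b: str) -> float:
    """Hypothetical stand-in for lexical_overlap: Jaccard similarity in percent."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    if not union:
        return 0.0
    return 100.0 * len(words_a & words_b) / len(union)


def pairwise_overlaps(df, group_col="intent", text_col="Sent"):
    """Score every pair of rows that share the same value in group_col."""
    results = []
    for intent in df[group_col].unique():
        rows = df.loc[df[group_col] == intent, text_col].to_list()
        # combinations yields each unordered pair once; a group with a
        # single row yields nothing
        for x, y in combinations(rows, 2):
            results.append((intent, x, y, word_overlap(x, y)))
    return results


# Invented sample data: three rows for one intent, one row for another
sample = pd.DataFrame({
    "intent": ["order_food", "order_food", "order_food", "order_taxi"],
    "Sent": ["i need a burger", "i need sushi", "she wants sushi", "i need a cab"],
})

for intent, x, y, overlap in pairwise_overlaps(sample):
    print(f"Overlap for ({x}) and ({y}) is {overlap}")
```

With three order_food rows you get three pairs, and the lone order_taxi row produces no output, which is the behaviour the combinations call is there for.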
Solution 2:
OK, so I figured out how to get the desired output I mentioned in the comments, based on @gold_cy's answer:
from itertools import combinations

for intent in df.intent.unique():
    # each row becomes a list: [intent, key_words, Sent]
    rows = df.loc[df.intent == intent, ['intent', 'key_words', 'Sent']].values.tolist()
    combos = combinations(rows, 2)
    for combo in combos:
        # unpack the current pair, not the full list of rows
        x, y = combo
        # compare the key_words, but report the original sentences
        overlap = lexical_overlap(x[1], y[1])
        print(f"Overlap of intent ({x[0]}) for ({x[2]}) and ({y[2]}) is {overlap}")
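If you would rather collect the scores than print them, one option is to group by intent and build a result frame. This is a sketch of an alternative, not what either answer above does. word_overlap is again a hypothetical stand-in for the question's lexical_overlap (Jaccard similarity in percent), and overlap_table, the output column names, and the sample data are all invented for the example.

```python
from itertools import combinations

import pandas as pd


def word_overlap(a: str, b: str) -> float:
    """Hypothetical stand-in for lexical_overlap: Jaccard similarity in percent."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    if not union:
        return 0.0
    return 100.0 * len(words_a & words_b) / len(union)


def overlap_table(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per pair of sentences that share an intent."""
    records = []
    # sort=False keeps the intents in their order of first appearance
    for intent, group in df.groupby("intent", sort=False):
        # each row becomes a dict keyed by column name
        for x, y in combinations(group.to_dict("records"), 2):
            records.append({
                "intent": intent,
                "sent_a": x["Sent"],
                "sent_b": y["Sent"],
                "overlap": word_overlap(x["key_words"], y["key_words"]),
            })
    return pd.DataFrame(records, columns=["intent", "sent_a", "sent_b", "overlap"])


# Invented sample data with the three columns used in this answer
sample = pd.DataFrame({
    "intent": ["order_food", "order_food", "order_taxi", "order_taxi", "order_taxi"],
    "key_words": ["need hamburger", "wants sushi", "need cab", "like new taxi", "need taxi"],
    "Sent": ["i need hamburger", "she wants sushi", "i need a cab",
             "i would like a new taxi", "i need a taxi"],
})

table = overlap_table(sample)
print(table)
```

Accessing the fields by column name instead of by position (x[1], x[2]) makes it harder to mix up key_words and Sent if the column order ever changes.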