
Finding Semantic Similarity Between 2 Statements

I am currently working on a small application in Python that has search functionality (currently implemented with difflib), but I want to build a semantic search that can return the closest matching statements for a given user query.

Solution 1:

I think what you are describing is not a gensim embedding but a word2vec embedding. Either way, the approach below should work.

You will need tensorflow_hub.

The Universal Sentence Encoder encodes text into high-dimensional vectors that can be used for text classification, semantic similarity, clustering, and other natural language tasks.

I believe what you need here is semantic similarity, because you want to find the nearest top 5 or 10 statements for a statement given by the user.

It is easy to use, but the model is ≈ 1 GB in size. It works with words, sentences, phrases, or short paragraphs. The input is variable-length English text and the output is a 512-dimensional vector. You can find more information about it here.

Code

import tensorflow_hub as hub
import numpy as np

# Load the model. It will be downloaded on first use.
module_url = "https://tfhub.dev/google/universal-sentence-encoder-large/5"
model = hub.load(module_url)

# data[0] is the query; the rest are candidate statements.
data = ["display classes", "show", "showed", "displayed class", "show types"]

# Compute the high-dimensional vectors.
vecs = model(data)

# Score each statement against the query using the inner product.
# The vectors are approximately unit-length, so this is close to cosine similarity.
sims = np.inner(vecs[0], vecs)

print(sims)

Output

array([0.9999999 , 0.5633253 , 0.46475542, 0.85303843, 0.61701006],dtype=float32)

Conclusion

The first value, 0.9999999, is the similarity between display classes and itself; the second, 0.5633253, is the similarity between display classes and show; and the last, 0.61701006, is the similarity between display classes and show types. Higher values mean more similar.

Using this, you can compute the similarity between the given input and each statement in your db, then rank the statements by score.
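The ranking step can be sketched with plain NumPy, assuming the statement vectors have already been produced by the encoder. The toy 2-D vectors below are hypothetical stand-ins for real 512-dimensional encoder output:

```python
import numpy as np

def rank_statements(query_vec, statement_vecs, statements, top_k=3):
    """Rank statements by inner-product similarity to the query vector."""
    sims = np.inner(query_vec, statement_vecs)  # one score per statement
    order = np.argsort(sims)[::-1][:top_k]      # indices of the top_k highest scores
    return [(statements[i], float(sims[i])) for i in order]

# Hypothetical unit vectors standing in for encoder output.
statements = ["display classes", "show", "displayed class"]
vecs = np.array([
    [1.0, 0.0],    # "display classes"
    [0.6, 0.8],    # "show"
    [0.9, 0.436],  # "displayed class" (approximately unit length)
])
query = np.array([1.0, 0.0])  # pretend this encodes the user's input

print(rank_statements(query, vecs, statements, top_k=2))
# → [('display classes', 1.0), ('displayed class', 0.9)]
```

With real encoder output you would replace the hard-coded arrays with `model(statements)` and `model([user_input])[0]`.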

Solution 2:

You can use WordNet to find synonyms and then use those synonyms to find similar statements.

import nltk
from nltk.corpus import wordnet as wn

nltk.download('wordnet')

def get_syn_list(gword):
    """Collect the first lemma name of every synset of gword."""
    syn_list = []
    try:
        syn_list.extend(wn.synsets(gword, pos=wn.NOUN))
        syn_list.extend(wn.synsets(gword, pos=wn.VERB))
        syn_list.extend(wn.synsets(gword, pos=wn.ADJ))
        syn_list.extend(wn.synsets(gword, pos=wn.ADV))
    except Exception:
        print("Something went wrong")
    syn_words = []
    for syn in syn_list:
        syn_words.append(syn.lemmas()[0].name())
    return syn_words

Now split each statement in your db into words and collect the synonyms for each word, like this:

stat = ["display classes"]

syn_dict = {}
for i in stat:
    tmp = []
    for x in i.split(" "):
        tmp.extend(get_syn_list(x))
    syn_dict[i] = set(tmp)

Now that you have the synonyms, compare them with the input text. Run a lemmatizer over the words before comparing, so that "displayed" becomes "display".
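The comparison step can be sketched as a simple overlap score between the (lemmatized) query words and each statement's synonym set. The synonym sets below are hard-coded stand-ins for the WordNet output built above:

```python
def overlap_score(query_words, syn_set):
    """Fraction of query words found in the statement's synonym set."""
    hits = sum(1 for w in query_words if w in syn_set)
    return hits / len(query_words) if query_words else 0.0

# Hard-coded stand-ins for the synonym sets built via get_syn_list above.
syn_dict = {
    "display classes": {"display", "show", "exhibit", "class", "category", "type"},
    "delete record":   {"delete", "remove", "erase", "record", "entry"},
}

query = ["show", "type"]  # lemmatized user input, e.g. from "showed types"
best = max(syn_dict, key=lambda s: overlap_score(query, syn_dict[s]))
print(best)  # → display classes
```

In the real application `syn_dict` would come from the loop above and `query` from lemmatizing the user's input.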

Solution 3:

You can use spaCy.

This answer is based on https://medium.com/better-programming/the-beginners-guide-to-similarity-matching-using-spacy-782fc2922f7c

import spacy

nlp = spacy.load("en_core_web_lg")

doc1 = nlp("display classes")
doc2 = nlp("show types")
print(doc1.similarity(doc2))

Output

0.6277548513279427

Edit

Run the following command to download the model:

!python -m spacy download en_core_web_lg
