How To Perform Efficient Queries With Gensim Doc2vec?

August 27, 2023

I’m working on a sentence similarity algorithm with the following use case: given a new sentence, I want to retrieve its n most similar sentences from a given set. I am using Gen

Solution 1:

Creating your own subset of vectors, as a KeyedVectors instance, isn't quite as easy as it could or should be.

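But you should be able to use a WordEmbeddingsKeyedVectors (even though you're working with doc-vectors) that you load with just the vectors of interest. I haven't tested this, but assuming d2v_model is your Doc2Vec model and list_of_tags are the tags you want in your subset, try something like the following. (These names are from gensim 3.x. In gensim 4.x the class is simply KeyedVectors, add() became add_vectors(), and docvecs was renamed dv.)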

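from gensim.models.keyedvectors import WordEmbeddingsKeyedVectors  # gensim 3.x class name
subset_vectors = WordEmbeddingsKeyedVectors(d2v_model.vector_size)
# copy only the doc-vectors of interest into the new, smaller index
subset_vectors.add(list_of_tags, d2v_model.docvecs[list_of_tags])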

Then you can perform the usual operations, like most_similar() on subset_vectors.
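To make the idea concrete without depending on a particular gensim version, here is a minimal pure-Python sketch of what a similarity query restricted to a subset amounts to: score the query vector by cosine similarity against only the chosen tags and keep the top n. The helper names (cosine, most_similar_in_subset) and the toy vectors are hypothetical, not part of gensim, and the sketch assumes no vector is all zeros. In practice the query vector would probably come from something like d2v_model.infer_vector(tokens).

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar_in_subset(query_vec, vectors_by_tag, subset_tags, topn=3):
    """Return the topn (tag, similarity) pairs, scoring only tags in subset_tags."""
    scored = ((tag, cosine(query_vec, vectors_by_tag[tag])) for tag in subset_tags)
    return heapq.nlargest(topn, scored, key=lambda pair: pair[1])


# Toy stand-ins for trained doc-vectors (hypothetical data).
doc_vectors = {
    "doc_a": [1.0, 0.0],
    "doc_b": [0.0, 1.0],
    "doc_c": [1.0, 1.0],
    "doc_d": [-1.0, 0.0],
}

# Hypothetical query vector; doc_a is deliberately left out of the subset.
query = [1.0, 0.1]
result = most_similar_in_subset(query, doc_vectors, ["doc_b", "doc_c", "doc_d"], topn=2)
print(result)
```

Because doc_a is not in the subset, it never appears in the results even though it is the closest vector overall. A KeyedVectors subset gives the same restriction, but with vectorized math that should scale much better than this loop.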
