How To Perform Efficient Queries With Gensim Doc2vec?
I’m working on a sentence similarity algorithm with the following use case: given a new sentence, I want to retrieve its n most similar sentences from a given set. I am using Gen
Solution 1:
Creating your own subset of vectors, as a KeyedVectors instance, isn't quite as easy as it could or should be.
But you should be able to use a WordEmbeddingsKeyedVectors (even though you're working with doc-vectors) that you load with just the vectors of interest. I haven't tested this, but assuming d2v_model is your Doc2Vec model, and list_of_tags holds the tags you want in your subset, try something like:
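from gensim.models.keyedvectors import WordEmbeddingsKeyedVectors
# gensim 3.x names; in gensim 4.x use KeyedVectors, add_vectors(), and d2v_model.dv instead
subset_vectors = WordEmbeddingsKeyedVectors(d2v_model.vector_size)
subset_vectors.add(list_of_tags, d2v_model.docvecs[list_of_tags])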
Then you can perform the usual operations, like most_similar(), on subset_vectors.
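For intuition about what querying the subset buys you, here is a minimal pure-Python sketch of ranking only a chosen set of tags by cosine similarity to a query vector. It is not gensim code: the function names (cosine, most_similar_in_subset) and the toy vectors are made up for illustration, and a real setup would use the doc-vectors from your trained model instead.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar_in_subset(query, vectors_by_tag, subset_tags, topn=10):
    """Rank only the tags in subset_tags by cosine similarity to query."""
    scored = [(tag, cosine(query, vectors_by_tag[tag])) for tag in subset_tags]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]

# Toy stand-ins for doc-vectors (hypothetical data, not from a trained model).
toy_vectors = {
    "sent_0": [1.0, 0.0],
    "sent_1": [0.8, 0.6],
    "sent_2": [0.0, 1.0],
    "sent_3": [-1.0, 0.0],
}
subset = ["sent_1", "sent_2", "sent_3"]  # "sent_0" is deliberately excluded

results = most_similar_in_subset([1.0, 0.0], toy_vectors, subset, topn=2)
for tag, score in results:
    print(tag, round(score, 3))
```

Because only the tags in the subset are scored, the cost of each query grows with the size of the subset rather than with the full set of doc-vectors, which is the point of building a smaller KeyedVectors instance.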