
How To Perform Efficient Queries With Gensim Doc2vec?

I’m working on a sentence similarity algorithm with the following use case: given a new sentence, I want to retrieve its n most similar sentences from a given set. I am using Gen

Solution 1:

Creating your own subset of vectors, as a KeyedVectors instance, isn't quite as easy as it could or should be.

However, you should be able to use a WordEmbeddingsKeyedVectors instance (even though you're working with doc-vectors), loaded with just the vectors of interest. I haven't tested this, but assuming d2v_model is your Doc2Vec model and list_of_tags are the tags you want in your subset, try something like the following (these are the Gensim 3.x names; in Gensim 4.x the corresponding names are KeyedVectors, add_vectors() and d2v_model.dv):

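from gensim.models.keyedvectors import WordEmbeddingsKeyedVectors

# Create an empty vector set with the model's dimensionality, then add only the chosen doc-vectors
subset_vectors = WordEmbeddingsKeyedVectors(d2v_model.vector_size)
subset_vectors.add(list_of_tags, d2v_model.docvecs[list_of_tags])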

Then you can perform the usual operations, like most_similar() on subset_vectors.
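To make it concrete what a most_similar() query restricted to a subset computes, here is a minimal pure-Python sketch: rank only the chosen vectors by cosine similarity to a query vector and keep the top n. This is not Gensim's implementation; the function names, the toy tags and the 2-dimensional vectors are all made up for illustration, and it assumes no vector is all zeros.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar_in_subset(query, vectors_by_tag, topn=10):
    """Return up to topn (tag, similarity) pairs, best match first.

    vectors_by_tag holds only the subset of doc-vectors to search,
    so vectors outside the subset are never scored.
    """
    scored = [(tag, cosine(query, vec)) for tag, vec in vectors_by_tag.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Hypothetical subset: three tags with toy 2-d vectors.
toy_vectors = {
    "doc_a": [1.0, 0.0],
    "doc_b": [0.0, 1.0],
    "doc_c": [1.0, 1.0],
}

# The query points almost along doc_a's direction.
results = most_similar_in_subset([1.0, 0.1], toy_vectors, topn=2)
print(results)
```

With the Doc2Vec model, the query vector would typically come from d2v_model.infer_vector() on the tokenized new sentence, and the subset would be the doc-vectors for list_of_tags.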
