Passing Term-document Matrix To Gensim Lda Model
Solution 1:
To treat a 2D numpy (or even scipy.sparse.csc) array as a gensim corpus, use the built-in matutils.Scipy2Corpus function.
Solution 2:
I believe Gensim uses much the same structure to represent a bag-of-words corpus, but I don't think a default dictionary or numpy array would be directly compatible. Gensim's API lists a few corpus readers that can accommodate various formats, but those seem to be built for importing data from other toolkits. So in your case the easiest solution may be to reconstruct the documents from your matrix and dictionary as lists of separate word strings, convert those lists to Gensim's bag-of-words corpus, and finally pass that to LDA as shown in the tutorials.
This approach has the added benefit that you can apply Gensim's preprocessing functions and filter words with low/high frequencies.
Solution 3:
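Given a numpy array with one document vector per row, use the following (documents_columns=False is needed because Dense2Corpus treats columns as documents by default):
corpus = gensim.matutils.Dense2Corpus(array, documents_columns=False)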