
Saving A Feature Vector For New Data In Scikit-learn

To create a machine learning algorithm I made a list of dictionaries and used scikit's DictVectorizer to make a feature vector for each item. I then created an SVM model from a dat

Solution 1:

How do I get new data to conform to the dimensions of the training vectors?

By using the transform method instead of fit_transform. The latter learns a new vocabulary from the data set you feed it, so its output columns would no longer line up with the ones the SVM was trained on.
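A minimal sketch of the difference, assuming made-up feature dictionaries (the names `color`, `size`, and `shape` are illustrative and not from the original question). The vectorizer is fitted once on the training data and then only `transform` is called on new data:

```python
from sklearn.feature_extraction import DictVectorizer

# Hypothetical training and new samples, for illustration only.
train = [{"color": "red", "size": 1.0}, {"color": "blue", "size": 2.0}]
new = [{"color": "red", "size": 3.0, "shape": "round"}]

vec = DictVectorizer()

# fit_transform learns the feature-to-column mapping from the training data.
X_train = vec.fit_transform(train)

# transform reuses that mapping; features not seen during fitting
# (here "shape") are silently ignored.
X_new = vec.transform(new)

print(X_train.shape, X_new.shape)
```

Both matrices end up with the same number of columns, so the new rows can be passed to a model fitted on `X_train`.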

Is there a way to use pickle to save the feature vector?

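Pickle the trained vectorizer. Even better, make a Pipeline of the vectorizer and the SVM and pickle that. You can use joblib.dump for efficient pickling (older scikit-learn releases exposed it as sklearn.externals.joblib.dump, which has since been removed; install and import joblib directly instead).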
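One way this could look, as a sketch under assumptions: the data, labels, and the linear kernel are invented for illustration, and the standard-library `pickle` module is used in memory so the example needs no files. With joblib installed, `joblib.dump(model, path)` and `joblib.load(path)` would play the same role on disk.

```python
import pickle

from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical training data and labels, for illustration only.
train = [
    {"color": "red", "size": 1.0},
    {"color": "red", "size": 1.5},
    {"color": "blue", "size": 4.0},
    {"color": "blue", "size": 5.0},
]
labels = ["small", "small", "large", "large"]

# The pipeline keeps the fitted vectorizer and the SVM together,
# so the feature mapping travels with the model.
model = make_pipeline(DictVectorizer(), SVC(kernel="linear"))
model.fit(train, labels)

# Serialize and restore the whole pipeline.
blob = pickle.dumps(model)
restored = pickle.loads(blob)

# New data goes in as raw dictionaries; the restored pipeline
# applies transform (not fit_transform) before predicting.
new = [{"color": "blue", "size": 4.5}]
print(restored.predict(new))
```

Pickling the pipeline rather than the two objects separately avoids the risk of loading a model alongside a vectorizer that was fitted on different data.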

(Aside: the vectorizer is faster if you pass it the boolean True rather than the string "True".)
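The aside can be illustrated by looking at how the two kinds of value are encoded. This sketch uses a made-up feature name, `flag`, and only shows the encoding difference; it does not measure speed:

```python
from sklearn.feature_extraction import DictVectorizer

# Boolean values are treated as numbers: one column named after the feature.
vec_bool = DictVectorizer().fit([{"flag": True}, {"flag": False}])

# String values are one-hot encoded: one column per distinct "name=value" pair.
vec_str = DictVectorizer().fit([{"flag": "True"}, {"flag": "False"}])

print(vec_bool.feature_names_)
print(vec_str.feature_names_)
```

With strings, the vectorizer has to build a combined feature name for every value and ends up with more columns, which is presumably where the extra cost comes from.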
