
spaCy Lemmatizer Issue/Consistency

I'm currently using spaCy for NLP purposes (mainly lemmatization and tokenization). The model is en-core-web-sm (2.1.0). I run the following code to retrieve a list of words:

Solution 1:

The issue was analysed by the spaCy team, and they came up with a solution. Here's the fix: https://github.com/explosion/spaCy/pull/3646

Basically, when the lemmatization rules were applied, a set was used to return the lemma. A set has no defined iteration order, and in Python 3 string hashes are randomized per process by default, so the returned lemma could change between Python sessions.
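To see why a set makes the result session-dependent, the sketch below launches fresh Python interpreters with different `PYTHONHASHSEED` values and asks each one for the first element of the set `{"leave", "leaf"}`. This is a standalone illustration of Python's set behaviour, not spaCy's lemmatizer code; the helper name `first_element` is hypothetical.

```python
import os
import subprocess
import sys

# Child program: build the same two-element set and print whichever element
# iteration yields first, i.e. "pick one lemma out of a set".
CHILD = 'print(next(iter({"leave", "leaf"})))'


def first_element(seed: int) -> str:
    """Run a fresh interpreter with a fixed hash seed and return its output."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Same code, same input, different hash seeds (i.e. different "sessions").
    picks = {seed: first_element(seed) for seed in range(10)}
    print(picks)
```

Within one seed the answer is stable, which is why the problem only shows up across sessions; across seeds, both "leave" and "leaf" typically appear.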


For example, in my case the noun "leaves" had two potential lemmas: "leave" and "leaf". Without a defined ordering, the result was effectively random: it could be either "leave" or "leaf".
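A minimal sketch of the general remedy is to collect candidates in an ordered container, so the same input always yields the same lemma. The suffix rules, the lemma index, and the function names below are hypothetical and heavily simplified; this is not the code from the pull request, and it makes no claim about which lemma spaCy itself returns for "leaves".

```python
# Hypothetical, simplified noun suffix rules: (suffix, replacement), in priority order.
NOUN_RULES = [("s", ""), ("ves", "f"), ("ies", "y")]
# Hypothetical index of known noun lemmas.
NOUN_INDEX = {"leave", "leaf", "city"}


def candidate_lemmas(word, rules, index):
    """Return valid lemma candidates in rule order, without duplicates."""
    candidates = []  # a list keeps insertion order, unlike a set
    for old, new in rules:
        if word.endswith(old):
            form = word[: len(word) - len(old)] + new
            if form in index and form not in candidates:
                candidates.append(form)
    return candidates


def lemmatize(word, rules=NOUN_RULES, index=NOUN_INDEX):
    """Pick the first candidate by rule order; fall back to the word itself."""
    candidates = candidate_lemmas(word, rules, index)
    return candidates[0] if candidates else word


print(candidate_lemmas("leaves", NOUN_RULES, NOUN_INDEX))  # ['leave', 'leaf']
print(lemmatize("leaves"))  # leave
```

Which candidate should win is a separate policy decision (here, simply rule order); the point is that the choice no longer depends on `PYTHONHASHSEED`.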
