spaCy Lemmatizer Issue/Consistency

May 11, 2024

I'm currently using spaCy for NLP tasks (mainly lemmatization and tokenization). The model used is en_core_web_sm (2.1.0). The following code is run to retrieve a list of words

Solution 1:

The spaCy team analysed the issue and came up with a solution. Here's the fix: https://github.com/explosion/spaCy/pull/3646

Basically, when the lemmatization rules were applied, the candidate lemmas were collected in a set and one of them was returned. Since a set has no guaranteed ordering, the returned lemma could change between Python sessions.

For example, in my case, the potential lemmas for the noun "leaves" were "leave" and "leaf". Without ordering, the result was effectively random: it could be "leave" in one session and "leaf" in the next.
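
To illustrate the behaviour described above, here is a minimal sketch in plain Python. It is not spaCy's actual lemmatizer code: the `RULES` list and both helper functions are hypothetical, and the sketch skips the vocabulary lookup a real lemmatizer would do. It only shows why collecting candidates in a set makes the "first" result unstable (string hashing in Python is randomized per process unless `PYTHONHASHSEED` is fixed), and how keeping candidates in an order-preserving structure avoids that.

```python
# Hypothetical suffix rules (old_suffix, new_suffix), for illustration only.
RULES = [("ves", "f"), ("s", "")]


def lemmas_unordered(word, rules):
    """Collect candidate lemmas in a set; iteration order is not guaranteed."""
    forms = set()
    for old, new in rules:
        if word.endswith(old):
            forms.add(word[: len(word) - len(old)] + new)
    return forms


def lemmas_ordered(word, rules):
    """Collect candidate lemmas in rule order, dropping duplicates."""
    forms = []
    for old, new in rules:
        if word.endswith(old):
            forms.append(word[: len(word) - len(old)] + new)
    # dict.fromkeys keeps first-seen order (guaranteed since Python 3.7).
    return list(dict.fromkeys(forms))


if __name__ == "__main__":
    # The first item taken from the set may differ between Python sessions.
    print(next(iter(lemmas_unordered("leaves", RULES))))
    # The first item of the ordered list is the same in every session.
    print(lemmas_ordered("leaves", RULES)[0])
```

With the ordered variant, the first candidate depends only on the order of the rules, so the same input yields the same lemma in every session. Whether that lemma is the linguistically right one for a given sentence is a separate question.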
