
spaCy Custom Sentence Splitting

I am using spaCy for custom sentence splitting and I need to parametrize the custom_delimeter/word used for sentence splitting, but I didn't find how to pass it as an argument. Here is the functi

Solution 1:

You could turn your component into a class that is initialized with a list of delimiters. For example:

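class MyCustomBoundary:
    def __init__(self, delimiters):
        # Token texts that should end a sentence, e.g. ['...', '---']
        self.delimiters = delimiters

    def __call__(self, doc):  # this is applied when you call it on a Doc
        # Skip the last token: there is no following token to mark
        for token in doc[:-1]:
            if token.text in self.delimiters:
                # The token right after a delimiter starts a new sentence
                doc[token.i + 1].is_sent_start = True
        return doc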
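
To check what the component does without loading a model, here is a minimal sketch that runs it on stand-in tokens. The fake_doc helper and the SimpleNamespace tokens are hypothetical substitutes for a real spaCy Doc, used only so the logic can be tried with the standard library; they mimic just the text, i and is_sent_start attributes that the component touches.

```python
from types import SimpleNamespace


class MyCustomBoundary:
    def __init__(self, delimiters):
        self.delimiters = delimiters

    def __call__(self, doc):
        for token in doc[:-1]:
            if token.text in self.delimiters:
                doc[token.i + 1].is_sent_start = True
        return doc


def fake_doc(words):
    # Hypothetical stand-in for a spaCy Doc: a plain list of token-like objects.
    return [SimpleNamespace(text=w, i=i, is_sent_start=None)
            for i, w in enumerate(words)]


component = MyCustomBoundary(delimiters=["...", "---"])
doc = component(fake_doc(["Wait", "...", "what", "---", "really", "?"]))

# Collect the tokens that were marked as sentence starts.
starts = [token.text for token in doc if token.is_sent_start]
print(starts)
```

Only the tokens that directly follow a delimiter get marked; everything else is left untouched, so the rest of the pipeline can still decide those boundaries.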

You can then add it to your pipeline like this:

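# Assumes `nlp` is an already loaded pipeline that includes a parser.
# spaCy v2 API: add_pipe() takes the component object; v3 expects a registered name.
mycustom_boundary = MyCustomBoundary(delimiters=['...', '---'])
nlp.add_pipe(mycustom_boundary, before='parser')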
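
Passing the component object to add_pipe is the spaCy v2 style. On spaCy v3 the component has to be registered first and is then added by its string name, with the delimiters passed through config. A minimal sketch, assuming spaCy v3 is installed; the factory name "my_custom_boundary" and the blank English pipeline are my own choices for illustration, not something from the question:

```python
from typing import List

import spacy
from spacy.language import Language


class MyCustomBoundary:
    def __init__(self, delimiters):
        self.delimiters = set(delimiters)

    def __call__(self, doc):
        for token in doc[:-1]:
            if token.text in self.delimiters:
                doc[token.i + 1].is_sent_start = True
        return doc


# Register a factory so the delimiters can be supplied via `config`.
# "my_custom_boundary" is an illustrative name (assumption).
@Language.factory("my_custom_boundary", default_config={"delimiters": ["..."]})
def create_my_custom_boundary(nlp: Language, name: str, delimiters: List[str]):
    return MyCustomBoundary(delimiters)


# A blank pipeline has no parser, so no `before=` argument is used here.
nlp = spacy.blank("en")
nlp.add_pipe("my_custom_boundary", config={"delimiters": ["...", "---"]})

doc = nlp("First part ... second part --- third part")
sentences = [sent.text for sent in doc.sents]
print(sentences)
```

If your pipeline does have a parser, you can still place the component ahead of it by also passing before='parser' to add_pipe.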
