Split Text Into Paragraphs Nltk - Usage Of Nltk.tokenize.texttiling?
I was looking at methods to split documents into paragraphs and I came across texttiling as one possible way to do this. Here is my attempt to use it. However, I don't understand h
Solution 1:
I'm experimenting with this myself right now, for the same reason and with the same question you had, so don't be too upset if this is wrong. I figured it was best to pass on what little I know. :)
I'm not sure yet, but I found an example of using the TextTilingTokenizer in this bug report:
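import nltk
# Needs the NLTK data first: nltk.download('gutenberg'); nltk.download('stopwords')
alice = nltk.corpus.gutenberg.raw('carroll-alice.txt')
ttt = nltk.tokenize.TextTilingTokenizer()
# Tokenize only the tail of the book, starting at character offset 140309
tiles = ttt.tokenize(alice[140309:])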
It appears that you want to feed your text to the tokenize method of the TextTilingTokenizer.
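If you don't want to download the Gutenberg corpus just to try this, here is a minimal sketch on a made-up document. The toy paragraphs, the variable names and the short stopword list are all my own assumptions for the demo; only TextTilingTokenizer, its stopwords argument and tokenize come from NLTK. As far as I can tell, the tokenizer wants blank lines between paragraphs in the input and raises a ValueError if it finds none, which is why the paragraphs are joined with "\n\n".

```python
from nltk.tokenize import TextTilingTokenizer

# Hypothetical toy paragraphs on two unrelated topics (not from the question).
space = (
    "The telescope tracked the comet as it crossed the night sky. "
    "Astronomers measured the orbit of the comet and compared it with older star charts. "
    "The observatory recorded the planet, the moon and the distant galaxy every night."
)
cooking = (
    "The baker kneaded the dough and left the bread to rise near the oven. "
    "Flour, butter and sugar went into the bowl for the cake and the pastry. "
    "The kitchen smelled of fresh bread while the soup simmered on the stove."
)

# Blank lines mark the candidate paragraph breaks that TextTiling snaps to.
text = "\n\n".join([space] * 4 + [cooking] * 4)

# Passing a stopword list avoids needing the NLTK 'stopwords' corpus.
# This short list is an assumption for the demo, not NLTK's default list.
toy_stopwords = ["the", "a", "and", "of", "to", "it", "as", "on", "for", "with"]

ttt = TextTilingTokenizer(stopwords=toy_stopwords)
tiles = ttt.tokenize(text)  # list of strings, one per detected segment

# Show where each segment starts so the split points can be inspected.
for i, tile in enumerate(tiles):
    print(i, len(tile), repr(tile.strip()[:40]))
```

How many tiles you get back depends on the text and on the tokenizer's settings, so treat this only as a way to see the shape of the result: a list of strings that are slices of the original text.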