
Split Text Into Paragraphs Nltk - Usage Of Nltk.tokenize.texttiling?

I was looking at methods to split documents into paragraphs and I came across texttiling as one possible way to do this. Here is my attempt to use it. However, I don't understand h

Solution 1:

I'm experimenting with this one myself right now, for the same reason you are, and I had the same question you did, so don't be too upset if this is wrong. I figured it was best to pass on what little I know... :)

I'm not sure yet, but I found an example of using the TextTilingTokenizer in this bug report:

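import nltk  # needs the 'gutenberg' and 'stopwords' corpora (see nltk.download)

alice = nltk.corpus.gutenberg.raw('carroll-alice.txt')
ttt = nltk.tokenize.TextTilingTokenizer()
tiles = ttt.tokenize(alice[140309:])  # returns a list of text segments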

It appears that you want to feed your text to the tokenize method of the TextTilingTokenizer, which returns the text split into a list of segments.
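For what it's worth, here is a minimal sketch of how that might look on your own text instead of the Gutenberg corpus. It assumes nltk and numpy are installed. The helper name split_into_tiles, the tiny stopword list, and the sample text are all made up for illustration. As far as I can tell, the tokenizer needs blank-line paragraph breaks in the input (it raises a ValueError if it finds none) and enough text to fill several windows of w words. Passing stopwords explicitly only avoids needing the NLTK stopwords corpus; leave it out to use the default English list.

```python
from nltk.tokenize import TextTilingTokenizer

# Hypothetical, deliberately tiny stopword list so the sketch runs
# without downloading the NLTK 'stopwords' corpus.
TOY_STOPWORDS = ["the", "and", "to", "a", "of", "in", "before", "past", "along"]


def make_sample_text():
    """Build an illustrative text: two topics, paragraphs separated by blank lines."""
    cooking = (
        "The cook chopped onions and garlic, heated the pan, stirred the sauce, "
        "and tasted the soup before adding salt and pepper to the simmering pot."
    )
    sailing = (
        "The crew raised the sail, checked the compass, steered the boat past "
        "the harbor, and watched the tide and the wind push the hull along the coast."
    )
    return "\n\n".join([cooking] * 6 + [sailing] * 6)


def split_into_tiles(text, stopwords=None):
    """Run TextTiling on `text` and return the list of segments (strings)."""
    # w (pseudosentence size) and k (block size) are left at their defaults.
    tokenizer = TextTilingTokenizer(stopwords=stopwords)
    return tokenizer.tokenize(text)


if __name__ == "__main__":
    sample = make_sample_text()
    for i, tile in enumerate(split_into_tiles(sample, TOY_STOPWORDS)):
        print(f"--- tile {i} ({len(tile)} chars) ---")
        print(tile.strip()[:80])
```

From my reading of the source, each tile is a slice of the original string, cut at a paragraph break, so joining the tiles gives back the input unchanged. How many tiles you get depends on the text and on w and k, so I would not rely on a particular count.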
