How To Parse Custom Tags Using nltk.RegexpParser
My question is similar to this unanswered question: Using custom POS tags for NLTK chunking?, but the error I am getting is different. I am trying to parse a sentence to which I ha
Solution 1:
nltk.RegexpParser can process custom tags.
Here is how you can modify your code to work:
# Import the RegexpParser.
from nltk.chunk import RegexpParser
# Define your custom tagged data.
tags = [(u'greatest', 'P'), (u'internet', 'NN'), (u'ever', 'A'),
(u',', ','), (u'and', 'CC'), (u'its', 'PRP$'), (u'being', 'VBG'),
(u'slow', 'N'), (u'as', 'IN'), (u'hell', 'NN')]
# Define your custom grammar (modified to be a valid regex).
grammar = """ CHUNK: {<A>*<P>+} """
# Create an instance of your custom parser.
custom_tag_parser = RegexpParser(grammar)
# Parse!
custom_tag_parser.parse(tags)
This is the result you would get for your test data:
Tree('S', [Tree('CHUNK', [(u'greatest', 'P')]), (u'internet', 'NN'), (u'ever', 'A'), (u',', ','), (u'and', 'CC'), (u'its', 'PRP$'), (u'being', 'VBG'), (u'slow', 'N'), (u'as', 'IN'), (u'hell', 'NN')])
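If you only need the matched chunks rather than the whole tree, one possible approach is to filter the subtrees by label. This is a minimal sketch, assuming the same tagged data and grammar as above; the variable names `tree` and `chunks` are only illustrative.

```python
from nltk.chunk import RegexpParser

# Same tagged tokens as above; 'P', 'A', and 'N' are the custom tags.
tags = [('greatest', 'P'), ('internet', 'NN'), ('ever', 'A'),
        (',', ','), ('and', 'CC'), ('its', 'PRP$'), ('being', 'VBG'),
        ('slow', 'N'), ('as', 'IN'), ('hell', 'NN')]

# Parse the tagged tokens with the corrected grammar.
tree = RegexpParser("CHUNK: {<A>*<P>+}").parse(tags)

# Collect the leaves of every subtree labelled CHUNK.
chunks = [subtree.leaves()
          for subtree in tree.subtrees(filter=lambda t: t.label() == 'CHUNK')]
print(chunks)
```

For the test data this should leave a single chunk containing ('greatest', 'P'), matching the tree shown above.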
Solution 2:
I'm not familiar with NLTK, but in Python regular expressions ?* is a syntax error. Perhaps you meant *?, which is a lazy quantifier.
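The difference can be checked with Python's own re module. This is a small sketch of plain regular expression behaviour only; it does not go through NLTK's tag-pattern syntax, and the sample strings are made up for illustration.

```python
import re

# '?*' stacks two quantifiers, which the re module rejects at compile time.
try:
    re.compile(r'a?*')
    stacked_ok = True
except re.error:
    stacked_ok = False

# '*?' is the lazy form of '*': it matches as few characters as possible.
greedy = re.match(r'<.*>', '<A><P>').group(0)   # runs to the last '>'
lazy = re.match(r'<.*?>', '<A><P>').group(0)    # stops at the first '>'

print(stacked_ok, greedy, lazy)
```

Here the greedy pattern consumes both tags, while the lazy one stops after the first.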