class documentation
Greedy Averaged Perceptron tagger, as implemented by Matthew Honnibal. See more implementation details here:
https://explosion.ai/blog/part-of-speech-pos-tagger-in-python
>>> from nltk.tag.perceptron import PerceptronTagger
Train the model
>>> tagger = PerceptronTagger(load=False)
>>> tagger.train([[('today','NN'),('is','VBZ'),('good','JJ'),('day','NN')],
...               [('yes','NNS'),('it','PRP'),('beautiful','JJ')]])
>>> tagger.tag(['today','is','a','beautiful','day'])
[('today', 'NN'), ('is', 'PRP'), ('a', 'PRP'), ('beautiful', 'JJ'), ('day', 'NN')]
Use the pretrained model (the default constructor)
>>> pretrain = PerceptronTagger()
>>> pretrain.tag('The quick brown fox jumps over the lazy dog'.split())
[('The', 'DT'), ('quick', 'JJ'), ('brown', 'NN'), ('fox', 'NN'), ('jumps', 'VBZ'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'), ('dog', 'NN')]
>>> pretrain.tag("The red cat".split())
[('The', 'DT'), ('red', 'JJ'), ('cat', 'NN')]
Class Method | decode | Undocumented
Method | __init__ | No summary
Method | encode | Undocumented
Method | load | No summary
Method | normalize | Normalization used in pre-processing. - All words are lower-cased - Groups of digits of length 4 are represented as !YEAR - Other digits are represented as !DIGITS
Method | tag | Tag tokenized sentences. :param tokens: list of words :type tokens: list(str)
Method | train | Train a model from sentences, and save it at save_loc. nr_iter controls the number of Perceptron training iterations.
Constant | END | Undocumented
Constant | START | Undocumented
Class Variable | json | Undocumented
Instance Variable | classes | Undocumented
Instance Variable | model | Undocumented
Instance Variable | tagdict | Undocumented
Method | _get | Map tokens into a feature representation, implemented as a {hashable: int} dict. If the features change, a new model must be trained.
Method | _make | Make a tag dictionary for single-tag words. :param sentences: A list of list of (word, tag) tuples.
Instance Variable | _sentences | Undocumented
Inherited from TaggerI:
Method | evaluate | Score the accuracy of the tagger against the gold standard. Strip the tags from the gold standard text, retag it using the tagger, then compute the accuracy score.
Method | tag | Apply self.tag() to each element of sentences. I.e.:
Method | _check | Undocumented
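The evaluation loop described for `evaluate` (strip the gold tags, retag, score token-level accuracy) can be sketched without NLTK. This is a simplified, hypothetical stand-in, not TaggerI's actual code; the dummy tagger and helper names are illustrative only:

```python
def accuracy(tagger, gold):
    # gold: list of tagged sentences, [[(word, tag), ...], ...]
    # Strip the tags, retag with the tagger, and score token-level accuracy.
    correct = total = 0
    for sent in gold:
        words = [w for w, _ in sent]
        predicted = tagger(words)  # stand-in for tagger.tag(words)
        for (_, gold_tag), (_, pred_tag) in zip(sent, predicted):
            correct += gold_tag == pred_tag
            total += 1
    return correct / total

# Dummy tagger that tags every token 'NN':
noun_tagger = lambda words: [(w, "NN") for w in words]
gold = [[("today", "NN"), ("is", "VBZ"), ("good", "JJ")]]
print(accuracy(noun_tagger, gold))  # 1 of 3 correct -> 0.333...
```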
Normalization used in pre-processing:
- All words are lower-cased
- Groups of digits of length 4 are represented as !YEAR
- Other digits are represented as !DIGITS
Returns | str | Undocumented
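The three normalization rules above can be sketched as a small helper. This is an illustrative re-statement of the documented rules, not the tagger's actual `normalize` method, which may handle additional cases:

```python
def normalize(word):
    # Sketch of the documented pre-processing rules:
    # groups of 4 digits become !YEAR, other all-digit tokens
    # become !DIGITS, everything else is lower-cased.
    if word.isdigit() and len(word) == 4:
        return "!YEAR"
    if word.isdigit():
        return "!DIGITS"
    return word.lower()

print(normalize("2023"))  # !YEAR
print(normalize("7"))     # !DIGITS
print(normalize("NLTK"))  # nltk
```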
overrides
nltk.tag.api.TaggerI.tag
Tag tokenized sentences. :param tokens: list of words :type tokens: list(str)
Train a model from sentences, and save it at save_loc. nr_iter controls the number of Perceptron training iterations.
Parameters | |
sentences | A list or iterator of sentences, where each sentence is a list of (word, tag) tuples. |
save_loc | If not None, saves a pickled model in this location. |
nr_iter | Number of training iterations. |
Map tokens into a feature representation, implemented as a {hashable: int} dict. If the features change, a new model must be trained.
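The `{hashable: int}` shape described above can be illustrated with a toy feature extractor. The feature names and the exact feature set here are hypothetical, chosen only to show the representation; the tagger's real `_get` method uses its own features:

```python
from collections import defaultdict

def get_features(i, word, context, prev_tag):
    # Hypothetical feature extractor producing the {hashable: int}
    # dict described above: each key is a string feature name and
    # each value is its count for this token position.
    features = defaultdict(int)

    def add(name, *args):
        features[" ".join((name,) + args)] += 1

    add("bias")                 # intercept-style feature, always on
    add("i word", context[i])   # the current word itself
    add("i suffix", word[-3:])  # last three characters
    add("i-1 tag", prev_tag)    # previously predicted tag
    if i > 0:
        add("i-1 word", context[i - 1])
    return dict(features)

feats = get_features(1, "is", ["today", "is", "good"], "NN")
```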