class documentation
Greedy Averaged Perceptron tagger, as implemented by Matthew Honnibal. See more implementation details here:
https://explosion.ai/blog/part-of-speech-pos-tagger-in-python
>>> from nltk.tag.perceptron import PerceptronTagger
Train the model
>>> tagger = PerceptronTagger(load=False)
>>> tagger.train([[('today','NN'),('is','VBZ'),('good','JJ'),('day','NN')],
...               [('yes','NNS'),('it','PRP'),('beautiful','JJ')]])
>>> tagger.tag(['today','is','a','beautiful','day'])
[('today', 'NN'), ('is', 'PRP'), ('a', 'PRP'), ('beautiful', 'JJ'), ('day', 'NN')]
Use the pretrained model (the default constructor)
>>> pretrain = PerceptronTagger()
>>> pretrain.tag('The quick brown fox jumps over the lazy dog'.split())
[('The', 'DT'), ('quick', 'JJ'), ('brown', 'NN'), ('fox', 'NN'), ('jumps', 'VBZ'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'), ('dog', 'NN')]
>>> pretrain.tag("The red cat".split())
[('The', 'DT'), ('red', 'JJ'), ('cat', 'NN')]
Class Method | decode | Undocumented
Method | __init__ | No summary
Method | encode | Undocumented
Method | load | No summary
Method | normalize | Normalization used in pre-processing. - All words are lower-cased - Groups of digits of length 4 are represented as !YEAR - Other digits are represented as !DIGITS
Method | tag | Tag tokenized sentences. :param tokens: list of words :type tokens: list(str)
Method | train | Train a model from sentences, and save it at save_loc. nr_iter controls the number of Perceptron training iterations.
Constant | END | Undocumented
Constant | START | Undocumented
Class Variable | json | Undocumented
Instance Variable | classes | Undocumented
Instance Variable | model | Undocumented
Instance Variable | tagdict | Undocumented
Method | _get | Map tokens into a feature representation, implemented as a {hashable: int} dict. If the features change, a new model must be trained.
Method | _make | Make a tag dictionary for single-tag words. :param sentences: A list of list of (word, tag) tuples.
Instance Variable | _sentences | Undocumented
Inherited from TaggerI:
Method | evaluate | Score the accuracy of the tagger against the gold standard. Strip the tags from the gold standard text, retag it using the tagger, then compute the accuracy score.
Method | tag | Apply self.tag() to each element of sentences. I.e.:
Method | _check | Undocumented
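The evaluation loop described for `evaluate` (strip the gold tags, retag, score token-level accuracy) can be sketched without NLTK. This is a simplified, hypothetical stand-in, not TaggerI's actual code; the dummy tagger and helper names are illustrative only:

```python
def accuracy(tagger, gold):
    # gold: list of tagged sentences, [[(word, tag), ...], ...]
    # Strip the tags, retag with the tagger, and score token-level accuracy.
    correct = total = 0
    for sent in gold:
        words = [w for w, _ in sent]
        predicted = tagger(words)  # stand-in for tagger.tag(words)
        for (_, gold_tag), (_, pred_tag) in zip(sent, predicted):
            correct += gold_tag == pred_tag
            total += 1
    return correct / total

# Dummy tagger that tags every token 'NN':
noun_tagger = lambda words: [(w, "NN") for w in words]
gold = [[("today", "NN"), ("is", "VBZ"), ("good", "JJ")]]
print(accuracy(noun_tagger, gold))  # 1 of 3 correct -> 0.333...
```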
Normalization used in pre-processing:
- All words are lower-cased
- Groups of digits of length 4 are represented as !YEAR
- Other digits are represented as !DIGITS
Returns | str | Undocumented
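The three normalization rules above can be sketched as a small helper. This is an illustrative re-statement of the documented rules, not the tagger's actual `normalize` method, which may handle additional cases:

```python
def normalize(word):
    # Sketch of the documented pre-processing rules:
    # groups of 4 digits become !YEAR, other all-digit tokens
    # become !DIGITS, everything else is lower-cased.
    if word.isdigit() and len(word) == 4:
        return "!YEAR"
    if word.isdigit():
        return "!DIGITS"
    return word.lower()

print(normalize("2023"))  # !YEAR
print(normalize("7"))     # !DIGITS
print(normalize("NLTK"))  # nltk
```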
overrides
nltk.tag.api.TaggerI.tag
Tag tokenized sentences. :param tokens: list of words :type tokens: list(str)
Train a model from sentences, and save it at save_loc. nr_iter controls the number of Perceptron training iterations.
Parameters | |
sentences | A list or iterator of sentences, where each sentence is a list of (word, tag) tuples. |
save_loc | If not None, saves a pickled model in this location. |
nr_iter | Number of training iterations. |
Map tokens into a feature representation, implemented as a {hashable: int} dict. If the features change, a new model must be trained.
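The `{hashable: int}` shape described above can be illustrated with a toy feature extractor. The feature names and the exact feature set here are hypothetical, chosen only to show the representation; the tagger's real `_get` method uses its own features:

```python
from collections import defaultdict

def get_features(i, word, context, prev_tag):
    # Hypothetical feature extractor producing the {hashable: int}
    # dict described above: each key is a string feature name and
    # each value is its count for this token position.
    features = defaultdict(int)

    def add(name, *args):
        features[" ".join((name,) + args)] += 1

    add("bias")                 # intercept-style feature, always on
    add("i word", context[i])   # the current word itself
    add("i suffix", word[-3:])  # last three characters
    add("i-1 tag", prev_tag)    # previously predicted tag
    if i > 0:
        add("i-1 word", context[i - 1])
    return dict(features)

feats = get_features(1, "is", ["today", "is", "good"], "NN")
```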