class documentation

class PerceptronTagger(TaggerI): (source)

Constructor: PerceptronTagger(load)


Greedy Averaged Perceptron tagger, as implemented by Matthew Honnibal. See more implementation details here:

https://explosion.ai/blog/part-of-speech-pos-tagger-in-python
>>> from nltk.tag.perceptron import PerceptronTagger

Train the model

>>> tagger = PerceptronTagger(load=False)
>>> tagger.train([[('today','NN'),('is','VBZ'),('good','JJ'),('day','NN')],
... [('yes','NNS'),('it','PRP'),('beautiful','JJ')]])
>>> tagger.tag(['today','is','a','beautiful','day'])
[('today', 'NN'), ('is', 'PRP'), ('a', 'PRP'), ('beautiful', 'JJ'), ('day', 'NN')]

Use the pretrained model (the default constructor)

>>> pretrain = PerceptronTagger()
>>> pretrain.tag('The quick brown fox jumps over the lazy dog'.split())
[('The', 'DT'), ('quick', 'JJ'), ('brown', 'NN'), ('fox', 'NN'), ('jumps', 'VBZ'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'), ('dog', 'NN')]
>>> pretrain.tag("The red cat".split())
[('The', 'DT'), ('red', 'JJ'), ('cat', 'NN')]
Class Method decode_json_obj Undocumented
Method __init__ No summary
Method encode_json_obj Undocumented
Method load No summary
Method normalize Normalization used in pre-processing: all words are lower-cased; groups of digits of length 4 are represented as !YEAR; other digits are represented as !DIGITS.
Method tag Tag a tokenized sentence. :param tokens: list of words :type tokens: list(str)
Method train Train a model from sentences, and save it at save_loc. nr_iter controls the number of Perceptron training iterations.
Constant END Undocumented
Constant START Undocumented
Class Variable json_tag Undocumented
Instance Variable classes Undocumented
Instance Variable model Undocumented
Instance Variable tagdict Undocumented
Method _get_features Map tokens into a feature representation, implemented as a {hashable: int} dict. If the features change, a new model must be trained.
Method _make_tagdict Make a tag dictionary for single-tag words. :param sentences: A list of list of (word, tag) tuples.
Instance Variable _sentences Undocumented

Inherited from TaggerI:

Method evaluate Score the accuracy of the tagger against the gold standard. Strip the tags from the gold standard text, retag it using the tagger, then compute the accuracy score.
Method tag_sents Apply self.tag() to each element of sentences.
Method _check_params Undocumented
@classmethod
def decode_json_obj(cls, obj): (source)

Undocumented

def __init__(self, load=True): (source)
Parameters
load: Load the pickled model upon instantiation.
def encode_json_obj(self): (source)

Undocumented

def load(self, loc): (source)
Parameters
loc (str): Load a pickled model from the given location.
def normalize(self, word): (source)

Normalization used in pre-processing:
- All words are lower-cased
- Groups of digits of length 4 are represented as !YEAR
- Other digits are represented as !DIGITS

Returns
str (undocumented)
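The documented rules can be sketched as a small standalone function. This is only an illustration of the rules listed above, not NLTK's actual implementation, which may apply additional cases:

```python
def normalize_sketch(word):
    """Sketch of the documented normalization rules; the real
    implementation may handle further cases (e.g. hyphenated words)."""
    if word.isdigit() and len(word) == 4:
        return "!YEAR"          # 4-digit numbers are treated as years
    if word and word[0].isdigit():
        return "!DIGITS"        # other digit-initial tokens
    return word.lower()         # everything else is lower-cased

normalize_sketch("1984")  # -> '!YEAR'
normalize_sketch("42")    # -> '!DIGITS'
normalize_sketch("The")   # -> 'the'
```

Normalizing rare token shapes like years and numbers shrinks the feature space, so the tagger generalizes to numbers it never saw in training.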
def tag(self, tokens, return_conf=False, use_tagdict=True): (source)

Tag a tokenized sentence. :param tokens: list of words :type tokens: list(str)

def train(self, sentences, save_loc=None, nr_iter=5): (source)

Train a model from sentences, and save it at save_loc. nr_iter controls the number of Perceptron training iterations.

Parameters
sentences: A list or iterator of sentences, where each sentence is a list of (word, tag) tuples.
save_loc: If not None, saves a pickled model in this location.
nr_iter: Number of training iterations.
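The training loop can be illustrated with a deliberately simplified, self-contained sketch. The feature template, the tie-breaking rule, and the omission of weight averaging are all simplifications here; NLTK's actual trainer averages weights over updates and uses a richer feature set:

```python
from collections import defaultdict

def train_sketch(sentences, nr_iter=5):
    """Minimal greedy-perceptron training loop (illustrative only;
    real training also averages the weights over all updates)."""
    weights = defaultdict(lambda: defaultdict(float))  # feature -> tag -> weight
    classes = {tag for sent in sentences for _, tag in sent}
    for _ in range(nr_iter):
        for sent in sentences:
            prev = "-START-"
            for word, gold in sent:
                # Toy feature set: bias, current word, previous tag.
                feats = {"bias": 1, "word=" + word.lower(): 1, "prev=" + prev: 1}
                scores = {t: sum(weights[f][t] * v for f, v in feats.items())
                          for t in classes}
                # Greedy prediction under the current weights.
                guess = max(classes, key=lambda t: (scores[t], t))
                if guess != gold:
                    # Perceptron update: reward the gold tag, penalize the guess.
                    for f, v in feats.items():
                        weights[f][gold] += v
                        weights[f][guess] -= v
                prev = gold  # condition on the gold tag history during training
    return weights
```

After a few iterations the weights separate the tags seen in training, e.g. the `word=today` feature ends up scoring NN above the competing tags.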
END: list[str] = (source)

Undocumented

Value
['-END-', '-END2-']
START: list[str] = (source)

Undocumented

Value
['-START-', '-START2-']
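These constants pad the sentence context so the first and last real tokens still have two neighbours on each side when features are extracted. A sketch of how such padding is typically assembled (the exact call site inside the tagger is an assumption here):

```python
START = ["-START-", "-START2-"]
END = ["-END-", "-END2-"]

def padded_context(tokens, normalize=str.lower):
    # Pad the normalized sentence with sentinel tokens so features
    # like "previous word" and "next word" are always defined.
    return START + [normalize(t) for t in tokens] + END

padded_context(["The", "red", "cat"])
# -> ['-START-', '-START2-', 'the', 'red', 'cat', '-END-', '-END2-']
```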
json_tag: str = (source)

Undocumented

classes = (source)

Undocumented

model = (source)

Undocumented

tagdict: dict = (source)

Undocumented

def _get_features(self, i, word, context, prev, prev2): (source)

Map tokens into a feature representation, implemented as a {hashable: int} dict. If the features change, a new model must be trained.
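A hypothetical sketch of a feature extractor in the same {hashable: int} style; the feature names below are illustrative, not NLTK's actual templates (which also include affix and shape features):

```python
from collections import defaultdict

def features_sketch(i, word, context, prev, prev2):
    """Illustrative {hashable: int} feature map; the real
    _get_features uses its own, larger template set."""
    feats = defaultdict(int)

    def add(name, *args):
        # Each template becomes one string key; counts are the values.
        feats[" ".join((name,) + args)] += 1

    add("bias")                      # intercept term
    add("i word", word)              # current (normalized) word
    add("i-1 tag", prev)             # previous predicted tag
    add("i-2 tag", prev2)            # tag two positions back
    add("i-1 word", context[i - 1])  # neighbouring words from the
    add("i+1 word", context[i + 1])  # padded context
    return dict(feats)
```

Because the model's weights are keyed by these strings, changing any template invalidates saved weights, which is why a new model must be trained whenever the features change.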

def _make_tagdict(self, sentences): (source)

Make a tag dictionary for single-tag words. :param sentences: A list of lists of (word, tag) tuples.
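The tag dictionary lets the tagger skip the perceptron entirely for words that are effectively unambiguous. A sketch of the idea, where the frequency and ambiguity thresholds are assumptions chosen for illustration:

```python
from collections import defaultdict

def make_tagdict_sketch(sentences, freq_thresh=20, ambiguity_thresh=0.97):
    """Sketch: map a word to a tag only when the word is frequent and
    almost always carries that tag (threshold values are assumptions)."""
    counts = defaultdict(lambda: defaultdict(int))
    for sent in sentences:
        for word, tag in sent:
            counts[word][tag] += 1
    tagdict = {}
    for word, tag_freqs in counts.items():
        tag, mode = max(tag_freqs.items(), key=lambda item: item[1])
        n = sum(tag_freqs.values())
        # Keep only frequent, near-unambiguous words.
        if n >= freq_thresh and mode / n >= ambiguity_thresh:
            tagdict[word] = tag
    return tagdict
```

Rare or ambiguous words are left out on purpose: a dictionary entry bypasses the model, so a wrong entry would be an unrecoverable error at tagging time.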

_sentences = (source)

Undocumented