nltk.tag.sequential.NgramTagger

class documentation

class NgramTagger(ContextTagger):

Known subclasses: nltk.tag.sequential.BigramTagger, nltk.tag.sequential.TrigramTagger, nltk.tag.sequential.UnigramTagger

Constructor: NgramTagger(n, train, model, backoff, ...)


A tagger that chooses a token's tag based on its word string and on the tags of the preceding n-1 words. In particular, the tuple (tags[i-n+1:i], words[i]) is looked up in a table, and the corresponding tag is returned. N-gram taggers are typically trained on a tagged corpus.

Train a new NgramTagger using the given training data or the supplied model. In particular, construct a new tagger whose table maps each context (tags[i-n+1:i], words[i]) to the most frequent tag for that context, excluding any contexts that are already tagged perfectly by the backoff tagger.

Parameters
train	A tagged corpus consisting of a list of tagged sentences, where each sentence is a list of (word, tag) tuples.
backoff	A backoff tagger, to be used by the new tagger if it encounters an unknown context.
cutoff	If the most likely tag for a context occurs fewer than cutoff times, then exclude it from the context-to-tag table for the new tagger.
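
The training behaviour described above can be sketched in plain Python. This is a simplified illustration, not NLTK's source: `make_context` and `train_table` are hypothetical helpers, the backoff is modelled as a plain callable rather than a tagger object, and the context is assumed to be the tags of up to n-1 preceding tokens plus the current word. The cutoff comparison follows the wording of the parameter description; the library's exact boundary condition may differ.

```python
from collections import Counter, defaultdict


def make_context(tokens, index, history, n):
    """Context = (tags of up to n-1 preceding tokens, current word)."""
    return tuple(history[max(0, index - n + 1):index]), tokens[index]


def train_table(n, tagged_sents, backoff=None, cutoff=0):
    """Build a context -> most-frequent-tag table (simplified sketch)."""
    counts = defaultdict(Counter)  # context -> counts of gold tags seen with it
    useful = set()                 # contexts the backoff gets wrong at least once
    for sent in tagged_sents:
        tokens = [word for word, _ in sent]
        tags = [tag for _, tag in sent]
        for i, gold in enumerate(tags):
            ctx = make_context(tokens, i, tags[:i], n)
            counts[ctx][gold] += 1
            guess = backoff(tokens, i, tags[:i]) if backoff else None
            if guess != gold:
                useful.add(ctx)
    table = {}
    for ctx in useful:
        best_tag, hits = counts[ctx].most_common(1)[0]
        if hits >= cutoff:  # assumption: drop the context when hits < cutoff
            table[ctx] = best_tag
    return table


train = [
    [("the", "DT"), ("can", "NN"), ("rusts", "VBZ")],
    [("we", "PRP"), ("can", "MD"), ("go", "VB")],
    [("they", "PRP"), ("can", "MD"), ("run", "VB")],
]

bigram_table = train_table(2, train)

# A toy backoff that always answers "NN"; contexts it already gets right
# every time are left out of the new table.
always_nn = lambda tokens, index, history: "NN"
with_backoff = train_table(2, train, backoff=always_nn)

frequent_only = train_table(2, train, cutoff=2)
```

In this toy data the context `(("DT",), "can")` is always tagged `NN`, so the table trained with the `always_nn` backoff omits it and leaves that case to the backoff. With NLTK itself, the constructor signature on this page suggests the comparable call is `NgramTagger(2, train=train, backoff=some_tagger)`, where `some_tagger` is a tagger object rather than a callable.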

Class Method	`decode_json_obj`	Undocumented
Method	`__init__`	No summary
Method	`context`	No summary
Method	`encode_json_obj`	Undocumented
Class Variable	`json_tag`	Undocumented
Instance Variable	`_n`	Undocumented

Inherited from ContextTagger:

Method	`__repr__`	Undocumented
Method	`choose_tag`	Decide which tag should be used for the specified token, and return that tag. If this tagger is unable to determine a tag for the specified token, return None -- do not consult the backoff tagger. This method should be overridden by subclasses of SequentialBackoffTagger.
Method	`size`	No summary
Method	`_train`	Initialize this ContextTagger's `_context_to_tag` table based on the given training data. In particular, for each context `c` in the training data, set `_context_to_tag[c]` to the most frequent tag for that context...
Instance Variable	`_context_to_tag`	Dictionary mapping contexts to tags.

Inherited from SequentialBackoffTagger (via ContextTagger):

Method	`tag`	Determine the most appropriate tag sequence for the given token sequence, and return a corresponding list of tagged tokens. A tagged token is encoded as a tuple `(token, tag)`.
Method	`tag_one`	Determine an appropriate tag for the specified token, and return that tag. If this tagger is unable to determine a tag for the specified token, then its backoff tagger is consulted.
Property	`backoff`	The backoff tagger for this tagger.
Instance Variable	`_taggers`	A list of all the taggers that should be tried to tag a token (i.e., self and its backoff taggers).
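
The interplay of `choose_tag`, `tag_one`, `tag` and `_taggers` described in the tables above can be sketched as follows. `TableTagger` is a hypothetical stand-in written for this illustration, not an NLTK class; it assumes a hand-built context table and mirrors only the documented behaviour (try self first, then each backoff tagger, and stop at the first non-None tag).

```python
class TableTagger:
    """Hypothetical stand-in for a sequential backoff tagger."""

    def __init__(self, n, table, backoff=None):
        self._n = n
        self._table = table
        # Self first, then every tagger in the backoff chain.
        self._taggers = [self] + (backoff._taggers if backoff else [])

    def choose_tag(self, tokens, index, history):
        # Look up this tagger's own table only; never consult the backoff here.
        prev_tags = tuple(history[max(0, index - self._n + 1):index])
        return self._table.get((prev_tags, tokens[index]))

    def tag_one(self, tokens, index, history):
        # Try each tagger in order and return the first tag that is not None.
        for tagger in self._taggers:
            tag = tagger.choose_tag(tokens, index, history)
            if tag is not None:
                return tag
        return None

    def tag(self, tokens):
        history = []  # tags assigned so far, used as context for later tokens
        for i in range(len(tokens)):
            history.append(self.tag_one(tokens, i, history))
        return list(zip(tokens, history))


unigram = TableTagger(1, {((), "the"): "DT", ((), "can"): "MD", ((), "rusts"): "VBZ"})
bigram = TableTagger(2, {(("DT",), "can"): "NN"}, backoff=unigram)

tagged = bigram.tag(["the", "can", "rusts"])
unknown = bigram.tag(["zebra"])
```

Here the bigram table only knows one context, so "the" and "rusts" fall through to the unigram table, while "can" after a `DT` tag is resolved by the bigram table before the backoff is ever asked. A token that no tagger in the chain recognises ends up tagged `None`.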

Inherited from TaggerI (via ContextTagger, SequentialBackoffTagger):

Method	`evaluate`	Score the accuracy of the tagger against the gold standard. Strip the tags from the gold standard text, retag it using the tagger, then compute the accuracy score.
Method	`tag_sents`	Apply `self.tag()` to each element of sentences.
Method	`_check_params`	Undocumented
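
The descriptions of `evaluate` and `tag_sents` above amount to a few lines of logic. The sketch below is a minimal stand-alone version, assuming a tagger is any callable that maps a token list to a list of `(token, tag)` tuples; `accuracy`, `lookup_tagger` and `lexicon` are hypothetical names for this illustration and are not part of NLTK.

```python
def tag_sents(tag, sentences):
    """Apply the tagging function to each sentence in turn."""
    return [tag(sent) for sent in sentences]


def accuracy(tag, gold):
    """Strip the gold tags, retag the text, and score the matches."""
    untagged = [[word for word, _ in sent] for sent in gold]
    predicted = tag_sents(tag, untagged)
    pairs = [
        (gold_tok, pred_tok)
        for gold_sent, pred_sent in zip(gold, predicted)
        for gold_tok, pred_tok in zip(gold_sent, pred_sent)
    ]
    correct = sum(1 for gold_tok, pred_tok in pairs if gold_tok == pred_tok)
    return correct / len(pairs)


# Toy tagger: a fixed word -> tag lexicon, None for unknown words.
lexicon = {"the": "DT", "dog": "NN", "barks": "VBZ"}


def lookup_tagger(tokens):
    return [(token, lexicon.get(token)) for token in tokens]


gold = [
    [("the", "DT"), ("dog", "NN")],
    [("the", "DT"), ("dog", "VB"), ("barks", "VBZ")],
]

score = accuracy(lookup_tagger, gold)
```

The toy tagger gets four of the five gold tokens right, so `score` comes out at 0.8; the one miss is "dog" tagged `VB` in the gold data.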

@classmethod
def decode_json_obj(cls, obj):

Undocumented

def __init__(self, n, train=None, model=None, backoff=None, cutoff=0, verbose=False):

overrides nltk.tag.sequential.ContextTagger.__init__

overridden in nltk.tag.sequential.BigramTagger, nltk.tag.sequential.TrigramTagger, nltk.tag.sequential.UnigramTagger

Parameters
n	Undocumented
train	Undocumented
model	Undocumented
backoff	The backoff tagger that should be used for this tagger.
cutoff	Undocumented
verbose	Undocumented
context_to_tag	A dictionary mapping contexts to tags.

def context(self, tokens, index, history):

overrides nltk.tag.sequential.ContextTagger.context

overridden in nltk.tag.sequential.UnigramTagger

Returns
(hashable)	the context that should be used to look up the tag for the specified token; or None if the specified token should not be handled by this tagger.
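
As an illustration of what such a context might look like for an n-gram tagger, the sketch below builds the tuple from the tags already assigned (`history`) and the current token. `ngram_context` is a hypothetical free function written for this example, with `n` passed explicitly instead of read from the instance; it assumes the context is the tags of up to n-1 preceding tokens followed by the current word.

```python
def ngram_context(tokens, index, history, n):
    """Return (tags of up to n-1 preceding tokens, current word)."""
    # Near the start of a sentence fewer than n-1 tags exist, so the
    # tag tuple is simply shorter there.
    tag_context = tuple(history[max(0, index - n + 1):index])
    return tag_context, tokens[index]


tokens = ["the", "old", "man", "sleeps"]
history = ["DT", "JJ", "NN"]  # tags already assigned to the first three tokens

trigram_ctx = ngram_context(tokens, 3, history, n=3)
bigram_ctx = ngram_context(tokens, 3, history, n=2)
unigram_ctx = ngram_context(tokens, 3, history, n=1)
sentence_start = ngram_context(tokens, 0, [], n=3)
```

Because the result is a tuple of a tuple and a string, it is hashable and can serve directly as a dictionary key in a context-to-tag table.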

def encode_json_obj(self):

Undocumented

json_tag: str

overridden in nltk.tag.sequential.BigramTagger, nltk.tag.sequential.TrigramTagger, nltk.tag.sequential.UnigramTagger

Undocumented

_n

Undocumented