class documentation

A tagger that chooses a token's tag based on its word string and on the preceding n words' tags. In particular, a tuple (tags[i-n:i-1], words[i]) is looked up in a table, and the corresponding tag is returned. N-gram taggers are typically trained on a tagged corpus.

Train a new NgramTagger using the given training data or the supplied model. In particular, construct a new tagger whose table maps from each context (tag[i-n:i-1], word[i]) to the most frequent tag for that context. Contexts that are already tagged perfectly by the backoff tagger are excluded from the table.

Parameters
train: A tagged corpus consisting of a list of tagged sentences, where each sentence is a list of (word, tag) tuples.
backoff: A backoff tagger, to be used by the new tagger if it encounters an unknown context.
cutoff: If the most likely tag for a context occurs fewer than cutoff times, then exclude it from the context-to-tag table for the new tagger.
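
The training rule described above can be sketched in plain Python. This is an illustrative sketch, not the library's code: `train_table` and the `backoff_tag` callable are hypothetical names, the context is assumed to hold up to n-1 preceding gold tags plus the current word, and the cutoff boundary follows the wording above (a context is dropped when its best tag occurs fewer than `cutoff` times).

```python
from collections import Counter, defaultdict


def train_table(tagged_sents, n, backoff_tag=None, cutoff=0):
    """Build a context-to-tag table from a list of tagged sentences.

    backoff_tag is a hypothetical callable (words, index, history) -> tag
    standing in for a backoff tagger.
    """
    counts = defaultdict(Counter)
    useful = set()
    for sent in tagged_sents:
        words = [word for word, _ in sent]
        gold = [tag for _, tag in sent]
        for i, tag in enumerate(gold):
            history = gold[:i]
            # Assumed context shape: (preceding tags, current word).
            context = (tuple(history[max(0, i - n + 1):i]), words[i])
            counts[context][tag] += 1
            # Keep a context only if the backoff gets it wrong at least once.
            if backoff_tag is None or backoff_tag(words, i, history) != tag:
                useful.add(context)
    table = {}
    for context in useful:
        best_tag, hits = counts[context].most_common(1)[0]
        if hits >= cutoff:
            table[context] = best_tag
    return table


train = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("dog", "NN"), ("runs", "VBZ")],
]
table = train_table(train, n=2)
frequent_only = train_table(train, n=2, cutoff=2)
```

With `cutoff=2`, only the two contexts seen twice in this toy corpus remain in `frequent_only`; the contexts for "barks" and "runs" are dropped.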
Class Method decode_json_obj Undocumented
Method __init__ No summary
Method context No summary
Method encode_json_obj Undocumented
Class Variable json_tag Undocumented
Instance Variable _n Undocumented

Inherited from ContextTagger:

Method __repr__ Undocumented
Method choose_tag Decide which tag should be used for the specified token, and return that tag. If this tagger is unable to determine a tag for the specified token, return None -- do not consult the backoff tagger. This method should be overridden by subclasses of SequentialBackoffTagger.
Method size No summary
Method _train Initialize this ContextTagger's _context_to_tag table based on the given training data. In particular, for each context c in the training data, set _context_to_tag[c] to the most frequent tag for that context...
Instance Variable _context_to_tag Dictionary mapping contexts to tags.

Inherited from SequentialBackoffTagger (via ContextTagger):

Method tag Determine the most appropriate tag sequence for the given token sequence, and return a corresponding list of tagged tokens. A tagged token is encoded as a tuple (token, tag).
Method tag_one Determine an appropriate tag for the specified token, and return that tag. If this tagger is unable to determine a tag for the specified token, then its backoff tagger is consulted.
Property backoff The backoff tagger for this tagger.
Instance Variable _taggers A list of all the taggers that should be tried to tag a token (i.e., self and its backoff taggers).
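
The backoff behaviour described for tag and tag_one can be illustrated with a small standalone sketch. It assumes each tagger is represented by a plain function that returns a tag or None; the function names and the toy bigram table are hypothetical and are not part of the class.

```python
def tag_one(taggers, tokens, index, history):
    """Try each tagger in order and return the first tag that is not None."""
    for choose_tag in taggers:
        tag = choose_tag(tokens, index, history)
        if tag is not None:
            return tag
    return None


def tag(taggers, tokens):
    """Tag tokens left to right, feeding earlier tags back in as history."""
    history = []
    for index in range(len(tokens)):
        history.append(tag_one(taggers, tokens, index, history))
    return list(zip(tokens, history))


# Hypothetical bigram table: (preceding tags, word) -> tag.
bigram_table = {((), "the"): "DT", (("DT",), "dog"): "NN"}


def bigram_choose(tokens, index, history):
    context = (tuple(history[max(0, index - 1):index]), tokens[index])
    return bigram_table.get(context)  # None for an unknown context


def default_choose(tokens, index, history):
    return "UNK"  # last-resort tagger that always answers


tagged = tag([bigram_choose, default_choose], ["the", "dog", "barks"])
```

Here the bigram function handles "the" and "dog", and the default function is consulted only for "barks", whose context is not in the table.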

Inherited from TaggerI (via ContextTagger, SequentialBackoffTagger):

Method evaluate Score the accuracy of the tagger against the gold standard. Strip the tags from the gold standard text, retag it using the tagger, then compute the accuracy score.
Method tag_sents Apply self.tag() to each element of sentences, i.e. return [self.tag(sent) for sent in sentences].
Method _check_params Undocumented
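
The evaluate and tag_sents descriptions above amount to the following procedure, shown as a minimal sketch. It assumes a tagger can be modelled as a function from a token list to a list of (token, tag) tuples; `noun_tagger` and the tiny gold standard are made up for illustration.

```python
def tag_sents(tag, sentences):
    """Apply the tagging function to each sentence."""
    return [tag(sent) for sent in sentences]


def evaluate(tag, gold):
    """Strip the gold tags, retag the text, and return token accuracy."""
    untagged = [[word for word, _ in sent] for sent in gold]
    predicted = tag_sents(tag, untagged)
    gold_tokens = [pair for sent in gold for pair in sent]
    test_tokens = [pair for sent in predicted for pair in sent]
    if not gold_tokens:
        return 0.0
    correct = sum(g == t for g, t in zip(gold_tokens, test_tokens))
    return correct / len(gold_tokens)


def noun_tagger(tokens):
    """Hypothetical baseline that tags every token as a noun."""
    return [(token, "NN") for token in tokens]


gold = [
    [("the", "DT"), ("dog", "NN")],
    [("cats", "NN"), ("sleep", "VBP")],
]
score = evaluate(noun_tagger, gold)
```

The baseline gets two of the four tokens right, so `score` is 0.5.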
@classmethod
def decode_json_obj(cls, obj):

Undocumented

def __init__(self, n, train=None, model=None, backoff=None, cutoff=0, verbose=False):
Parameters
n: Undocumented
train: Undocumented
model: Undocumented
backoff: The backoff tagger that should be used for this tagger.
cutoff: Undocumented
verbose: Undocumented
context_to_tag: A dictionary mapping contexts to tags.
def context(self, tokens, index, history):
Returns
(hashable) the context that should be used to look up the tag for the specified token; or None if the specified token should not be handled by this tagger.
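
As a rough illustration of what such a context can look like for an n-gram tagger, the sketch below builds a hashable tuple from the preceding tags and the current word. The helper name `ngram_context` is hypothetical, and the choice of up to n-1 preceding tags is an assumption made for this example.

```python
def ngram_context(tokens, index, history, n):
    """Return (preceding tags, current word) as a hashable lookup key."""
    # Assumed window: up to n-1 tags already assigned to earlier tokens.
    tag_context = tuple(history[max(0, index - n + 1):index])
    return tag_context, tokens[index]


tokens = ["the", "dog", "barks"]
history = ["DT", "NN"]  # tags already assigned to tokens[0:2]

ctx = ngram_context(tokens, 2, history, n=3)
ctx_start = ngram_context(tokens, 0, [], n=3)

# Because the context is a tuple of tuples and strings, it can be a dict key.
lookup = {ctx: "VBZ"}
```

At the start of a sentence the history is empty, so the tag part of the context is an empty tuple, as in `ctx_start`.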
def encode_json_obj(self):

Undocumented