class documentation

An abstract base class for sequential backoff taggers that choose a tag for a token based on the value of its "context". Different subclasses are used to define different contexts.

A ContextTagger chooses the tag for a token by calculating the token's context and looking up the corresponding tag in a table. This table can be constructed manually, or it can be constructed automatically from a training corpus using the _train() factory method.
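The lookup described above has two steps: compute a hashable context for the token, then look that context up in a table. The sketch below shows this with plain functions rather than NLTK's classes. `word_context` and `choose_tag` here are hypothetical stand-ins, and using the lowercased word as the context is an assumption made for illustration.

```python
from typing import Dict, Hashable, List, Optional


def word_context(tokens: List[str], index: int, history: List[str]) -> Hashable:
    # Hypothetical context: the lowercased word itself (unigram-style).
    return tokens[index].lower()


def choose_tag(table: Dict[Hashable, str], tokens: List[str], index: int,
               history: List[str]) -> Optional[str]:
    # Look the context up; an unknown context yields None (token left untagged).
    return table.get(word_context(tokens, index, history))


# A manually constructed context-to-tag table.
table = {"the": "DT", "dog": "NN", "barks": "VBZ"}
tokens = ["The", "dog", "barks", "loudly"]
tags = [choose_tag(table, tokens, i, []) for i in range(len(tokens))]
print(tags)  # ['DT', 'NN', 'VBZ', None]
```

The `None` for "loudly" is the case a backoff tagger would normally pick up.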

Method __init__ Construct a tagger from a context-to-tag table and an optional backoff tagger.
Method __repr__ Undocumented
Method choose_tag Decide which tag should be used for the specified token, and return that tag. If this tagger is unable to determine a tag for the specified token, return None -- do not consult the backoff tagger. This method should be overridden by subclasses of SequentialBackoffTagger.
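Method context Return the context that should be used to look up the tag for the specified token, or None if this tagger should not handle the token.
Method size Return the number of entries in the table that maps contexts to tags.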
Method _train Initialize this ContextTagger's _context_to_tag table based on the given training data. In particular, for each context c in the training data, set _context_to_tag[c] to the most frequent tag for that context...
Instance Variable _context_to_tag Dictionary mapping contexts to tags.

Inherited from SequentialBackoffTagger:

Method tag Determine the most appropriate tag sequence for the given token sequence, and return a corresponding list of tagged tokens. A tagged token is encoded as a tuple (token, tag).
Method tag_one Determine an appropriate tag for the specified token, and return that tag. If this tagger is unable to determine a tag for the specified token, then its backoff tagger is consulted.
Property backoff The backoff tagger for this tagger.
Instance Variable _taggers A list of all the taggers that should be tried to tag a token (i.e., self and its backoff taggers).
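The interplay of tag, tag_one and _taggers can be sketched as follows. This is a hedged illustration, not NLTK's source: `ToyBackoffTagger`, `LookupTagger` and `ConstantTagger` are hypothetical classes. The chain is tried in order, and the first tagger whose choose_tag returns something other than None decides the tag.

```python
from typing import Dict, List, Optional, Tuple


class ToyBackoffTagger:
    """Hypothetical stand-in for SequentialBackoffTagger."""

    def __init__(self, backoff: Optional["ToyBackoffTagger"] = None):
        # self first, then every tagger in the backoff chain, in order.
        self._taggers = [self] + (backoff._taggers if backoff is not None else [])

    def choose_tag(self, tokens: List[str], index: int,
                   history: List[Optional[str]]) -> Optional[str]:
        raise NotImplementedError

    def tag_one(self, tokens, index, history) -> Optional[str]:
        # Ask each tagger in turn; the first non-None answer wins.
        for tagger in self._taggers:
            tag = tagger.choose_tag(tokens, index, history)
            if tag is not None:
                return tag
        return None

    def tag(self, tokens: List[str]) -> List[Tuple[str, Optional[str]]]:
        tags: List[Optional[str]] = []
        for index in range(len(tokens)):
            # history = the tags already chosen for earlier tokens
            tags.append(self.tag_one(tokens, index, tags))
        return list(zip(tokens, tags))


class LookupTagger(ToyBackoffTagger):
    # Hypothetical: tags only words found in a fixed table.
    def __init__(self, table: Dict[str, str], backoff: Optional[ToyBackoffTagger] = None):
        super().__init__(backoff)
        self.table = table

    def choose_tag(self, tokens, index, history):
        return self.table.get(tokens[index])


class ConstantTagger(ToyBackoffTagger):
    # Hypothetical: always returns the same tag, so it never declines a token.
    def __init__(self, tag: str, backoff: Optional[ToyBackoffTagger] = None):
        super().__init__(backoff)
        self.constant = tag

    def choose_tag(self, tokens, index, history):
        return self.constant


tagger = LookupTagger({"the": "DT", "dog": "NN"}, backoff=ConstantTagger("UNK"))
print(tagger.tag(["the", "cat"]))  # [('the', 'DT'), ('cat', 'UNK')]
```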

Inherited from TaggerI (via SequentialBackoffTagger):

Method evaluate Score the accuracy of the tagger against the gold standard. Strip the tags from the gold standard text, retag it using the tagger, then compute the accuracy score.
Method tag_sents Apply self.tag() to each element of sentences, i.e. return [self.tag(sent) for sent in sentences].
Method _check_params Undocumented
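The evaluation procedure summarized above (strip the gold tags, retag, score) can be sketched with plain functions. This assumes accuracy means the fraction of tokens whose (word, tag) pair matches the gold pair. `evaluate`, `tag_sents` and `always_nn` here are hypothetical stand-ins that take a tagging function instead of a tagger object.

```python
from typing import Callable, List, Tuple

TaggedSent = List[Tuple[str, str]]
TagFn = Callable[[List[str]], TaggedSent]


def tag_sents(tag: TagFn, sentences: List[List[str]]) -> List[TaggedSent]:
    # Apply the tagger to each sentence.
    return [tag(sent) for sent in sentences]


def evaluate(tag: TagFn, gold: List[TaggedSent]) -> float:
    # 1. Strip the tags from the gold standard.
    untagged = [[word for word, _ in sent] for sent in gold]
    # 2. Retag the words with the tagger.
    predicted = tag_sents(tag, untagged)
    # 3. Token-level accuracy: fraction of (word, tag) pairs matching the gold pairs.
    pairs = [(p, g) for p_sent, g_sent in zip(predicted, gold)
             for p, g in zip(p_sent, g_sent)]
    return sum(p == g for p, g in pairs) / len(pairs)


def always_nn(tokens: List[str]) -> TaggedSent:
    # Hypothetical baseline tagger that labels every token "NN".
    return [(word, "NN") for word in tokens]


gold = [[("the", "DT"), ("dog", "NN")], [("a", "DT"), ("cat", "NN")]]
print(evaluate(always_nn, gold))  # 0.5
```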
def __init__(self, context_to_tag, backoff=None):
Parameters
context_to_tag: A dictionary mapping contexts to tags.
backoff: The backoff tagger that should be used for this tagger.
def __repr__(self):

Undocumented

def choose_tag(self, tokens, index, history):

Decide which tag should be used for the specified token, and return that tag. If this tagger is unable to determine a tag for the specified token, return None -- do not consult the backoff tagger. This method should be overridden by subclasses of SequentialBackoffTagger.

Parameters
tokens (list): The list of words that are being tagged.
index (int): The index of the word whose tag should be returned.
history (list(str)): A list of the tags for all words before index.
Returns
str: The chosen tag, or None if this tagger is unable to determine a tag for the token.
@abstractmethod
def context(self, tokens, index, history):
Returns
(hashable): The context that should be used to look up the tag for the specified token, or None if the specified token should not be handled by this tagger.
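Different subclasses differ only in how they compute the context. The sketch below is hypothetical, not NLTK's code: `ToyContextTagger`, `PrevTagWordTagger` and `SuffixTagger` are illustrative names, and the two contexts shown, (previous tag, word) and a three-letter suffix, are assumptions. It shows how choose_tag can be written once in terms of the abstract context method, with None meaning "this tagger declines the token".

```python
from abc import ABC, abstractmethod
from typing import Dict, Hashable, List, Optional


class ToyContextTagger(ABC):
    """Hypothetical stand-in for ContextTagger, without backoff handling."""

    def __init__(self, context_to_tag: Dict[Hashable, str]):
        self._context_to_tag = context_to_tag

    @abstractmethod
    def context(self, tokens: List[str], index: int,
                history: List[str]) -> Optional[Hashable]:
        """Return a hashable context, or None to decline the token."""

    def choose_tag(self, tokens: List[str], index: int,
                   history: List[str]) -> Optional[str]:
        context = self.context(tokens, index, history)
        if context is None:
            return None  # this tagger does not handle the token
        return self._context_to_tag.get(context)


class PrevTagWordTagger(ToyContextTagger):
    # Hypothetical context: (previous tag, current word).
    def context(self, tokens, index, history):
        prev_tag = history[index - 1] if index > 0 else None
        return (prev_tag, tokens[index])


class SuffixTagger(ToyContextTagger):
    # Hypothetical context: the last three letters; words of 3 letters or fewer are declined.
    def context(self, tokens, index, history):
        word = tokens[index]
        return word[-3:] if len(word) > 3 else None


prev_word = PrevTagWordTagger({("DT", "run"): "NN", (None, "run"): "VB"})
print(prev_word.choose_tag(["the", "run"], 1, ["DT"]))  # NN
print(prev_word.choose_tag(["run"], 0, []))             # VB
suffix = SuffixTagger({"ing": "VBG"})
print(suffix.choose_tag(["running"], 0, []))            # VBG
print(suffix.choose_tag(["is"], 0, []))                 # None
```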
def size(self):
Returns
The number of entries in the table used by this tagger to map from contexts to tags.
def _train(self, tagged_corpus, cutoff=0, verbose=False):

Initialize this ContextTagger's _context_to_tag table based on the given training data. In particular, for each context c in the training data, set _context_to_tag[c] to the most frequent tag for that context. However, exclude any contexts that are already tagged perfectly by the backoff tagger(s).

The old value of self._context_to_tag (if any) is discarded.

Parameters
tagged_corpus: A tagged corpus. Each item should be a list of (word, tag) tuples.
cutoff: If the most likely tag for a context occurs fewer than cutoff times, then exclude it from the context-to-tag table for the new tagger.
verbose: Undocumented
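The training procedure described above can be sketched as a standalone function. This is a hedged illustration, not NLTK's implementation: `train_table`, `word` and `always_nn` are hypothetical, the context and backoff are passed in as plain functions, and the cutoff test follows the wording above (keep a context if its best tag occurs at least cutoff times). NLTK's exact threshold comparison may differ, so check its source before relying on boundary behavior.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Hashable, List, Optional, Tuple

TaggedSent = List[Tuple[str, str]]
ContextFn = Callable[[List[str], int, List[str]], Optional[Hashable]]
BackoffFn = Callable[[List[str], int, List[str]], Optional[str]]


def train_table(tagged_corpus: List[TaggedSent], context: ContextFn,
                backoff: Optional[BackoffFn] = None, cutoff: int = 0) -> Dict[Hashable, str]:
    counts: Dict[Hashable, Counter] = defaultdict(Counter)
    useful = set()  # contexts the backoff does not already tag perfectly
    for sent in tagged_corpus:
        tokens = [w for w, _ in sent]
        tags = [t for _, t in sent]
        for index, tag in enumerate(tags):
            history = tags[:index]  # gold tags of the earlier tokens
            ctx = context(tokens, index, history)
            if ctx is None:
                continue  # this context type does not cover the token
            counts[ctx][tag] += 1
            # Useful if there is no backoff, or the backoff gets this token wrong.
            if backoff is None or backoff(tokens, index, history) != tag:
                useful.add(ctx)
    table = {}
    for ctx in useful:
        best_tag, hits = counts[ctx].most_common(1)[0]
        if hits >= cutoff:  # drop contexts whose best tag is too rare
            table[ctx] = best_tag
    return table


def word(tokens, index, history):
    return tokens[index]  # hypothetical unigram-style context


def always_nn(tokens, index, history):
    return "NN"  # hypothetical backoff that always answers "NN"


corpus = [
    [("the", "DT"), ("dog", "NN"), ("runs", "VBZ")],
    [("the", "DT"), ("run", "NN")],
    [("they", "PRP"), ("run", "VBP")],
    [("we", "PRP"), ("run", "VBP")],
]
print(train_table(corpus, word, backoff=always_nn))
```

Here "dog" is left out of the table because the backoff already tags it correctly every time, and "run" maps to its most frequent tag, VBP.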
_context_to_tag

Dictionary mapping contexts to tags.