class documentation

A sequential tagger that uses a classifier to choose the tag for each token in a sentence. The featureset input for the classifier is generated by a feature detector function:

feature_detector(tokens, index, history) -> featureset

Here tokens is the list of unlabeled tokens in the sentence, index is the index of the token for which feature detection should be performed, and history is a list of the tags for all tokens before index.
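As an illustration, a feature detector could look like the sketch below. The function name and the feature names are hypothetical. In principle any function with this signature that returns a dict of features will do. Because history only covers the tokens before index, the detector can look at earlier tags but never at the tag of the current or a later token.

```python
def simple_feature_detector(tokens, index, history):
    """Hypothetical feature detector: returns a featureset (dict) for tokens[index]."""
    word = tokens[index]
    # history holds the tags for tokens[:index], so history[index - 1] is the previous tag.
    prev_word = tokens[index - 1].lower() if index > 0 else "<START>"
    prev_tag = history[index - 1] if index > 0 else "<START>"
    return {
        "word": word.lower(),
        "suffix3": word[-3:].lower(),
        "is_capitalized": word[:1].isupper(),
        "prev_word": prev_word,
        "prev_tag": prev_tag,
    }


tokens = ["The", "dog", "barks"]
features = simple_feature_detector(tokens, 2, ["DT", "NN"])
print(features)
# {'word': 'barks', 'suffix3': 'rks', 'is_capitalized': False, 'prev_word': 'dog', 'prev_tag': 'NN'}
```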

Construct a new classifier-based sequential tagger.

Parameters
feature_detector: A function used to generate the featureset input for the classifier: feature_detector(tokens, index, history) -> featureset
train: A tagged corpus consisting of a list of tagged sentences, where each sentence is a list of (word, tag) tuples.
backoff: A backoff tagger, used by the new tagger if it encounters an unknown context, i.e. if it is unable to determine a tag for a given token.
classifier_builder: A function used to train a new classifier based on the data in train. It should take one argument, a list of labeled featuresets (i.e., (featureset, label) tuples). Defaults to NaiveBayesClassifier.train.
classifier: The classifier that should be used by the tagger. This is only useful if you want to construct the classifier manually; normally, you would use train instead.
cutoff_prob: If specified, this tagger will fall back on its backoff tagger if the probability of the most likely tag is less than cutoff_prob.
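The contract for classifier_builder can be illustrated with a toy builder. Everything in this sketch is hypothetical: ToyProbDist, ToyWordClassifier and toy_classifier_builder are not NLTK names. The builder takes the list of (featureset, label) tuples and returns an object with classify() and prob_classify(). These are the method names used by NLTK's classifier interface. The object returned by prob_classify() offers max() and prob(). The assumption, based on NLTK's implementation, is that these are the only methods the tagger relies on (prob_classify() only when cutoff_prob is set). In practice the default, NaiveBayesClassifier.train, fills this role.

```python
from collections import Counter, defaultdict


class ToyProbDist:
    """Hypothetical stand-in for a probability distribution with max() and prob()."""

    def __init__(self, counts):
        self._counts = counts
        self._total = sum(counts.values())

    def max(self):
        # Most frequent label.
        return self._counts.most_common(1)[0][0]

    def prob(self, label):
        # Relative frequency; Counter returns 0 for unseen labels.
        return self._counts[label] / self._total


class ToyWordClassifier:
    """Hypothetical classifier: predicts the label most often seen with the word."""

    def __init__(self, by_word, overall):
        self._by_word = by_word
        self._overall = overall

    def prob_classify(self, featureset):
        # Unknown words fall back to the overall label counts.
        counts = self._by_word.get(featureset.get("word"), self._overall)
        return ToyProbDist(counts)

    def classify(self, featureset):
        return self.prob_classify(featureset).max()


def toy_classifier_builder(labeled_featuresets):
    """Hypothetical classifier_builder: takes a list of (featureset, label) tuples."""
    by_word = defaultdict(Counter)
    overall = Counter()
    for featureset, label in labeled_featuresets:
        by_word[featureset["word"]][label] += 1
        overall[label] += 1
    return ToyWordClassifier(dict(by_word), overall)


training = [({"word": "the"}, "DT"), ({"word": "run"}, "VB"),
            ({"word": "run"}, "NN"), ({"word": "run"}, "VB")]
clf = toy_classifier_builder(training)
print(clf.classify({"word": "run"}))                   # VB
print(clf.prob_classify({"word": "run"}).prob("VB"))   # 0.6666666666666666
```

A real builder would generalize from the whole featureset rather than memorizing one feature. The toy version only shows the shape of the input and output.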
Method __init__ Undocumented
Method __repr__ Undocumented
Method choose_tag Decide which tag should be used for the specified token, and return that tag. If this tagger is unable to determine a tag for the specified token, return None -- do not consult the backoff tagger. This method should be overridden by subclasses of SequentialBackoffTagger.
Method classifier Return the classifier that this tagger uses to choose a tag for each word in a sentence. The input for this classifier is generated using this tagger's feature detector. See feature_detector().
Method feature_detector Apply the feature detector that this tagger uses to generate featuresets for its classifier, and return the resulting featureset. The feature detector is a function with the signature: feature_detector(tokens, index, history) -> featureset
Method _train Build a new classifier from the training data in tagged_corpus, using classifier_builder.
Instance Variable _classifier The classifier used to choose a tag for each token.
Instance Variable _cutoff_prob Cutoff probability for tagging -- if the probability of the most likely tag is less than this, then use backoff.
Instance Variable _feature_detector Undocumented

Inherited from SequentialBackoffTagger:

Method tag Determine the most appropriate tag sequence for the given token sequence, and return a corresponding list of tagged tokens. A tagged token is encoded as a tuple (token, tag).
Method tag_one Determine an appropriate tag for the specified token, and return that tag. If this tagger is unable to determine a tag for the specified token, then its backoff tagger is consulted.
Property backoff The backoff tagger for this tagger.
Instance Variable _taggers A list of all the taggers that should be tried to tag a token (i.e., self and its backoff taggers).
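The tag()/tag_one() behavior described above can be sketched in plain Python. This is a simplified model, not NLTK's code: taggers are plain functions with the choose_tag signature instead of tagger objects, and lexicon_tagger and default_tagger are hypothetical. The sketch shows two points. Tagging runs left to right, with the tags chosen so far passed back in as history. For each token, the first tagger in the chain that returns something other than None wins.

```python
def tag_one(taggers, tokens, index, history):
    """Try each tagger in order (self first, then backoffs); first non-None tag wins."""
    for tagger in taggers:
        tag = tagger(tokens, index, history)
        if tag is not None:
            return tag
    return None


def tag(taggers, tokens):
    """Tag left to right, feeding the tags chosen so far back in as history."""
    tags = []
    for index in range(len(tokens)):
        tags.append(tag_one(taggers, tokens, index, tags))
    return list(zip(tokens, tags))


# Hypothetical choose_tag-style functions standing in for real taggers.
def lexicon_tagger(tokens, index, history):
    return {"the": "DT", "dog": "NN"}.get(tokens[index].lower())


def default_tagger(tokens, index, history):
    return "NN"  # always answers, so it ends the backoff chain


print(tag([lexicon_tagger, default_tagger], ["The", "dog", "barks"]))
# [('The', 'DT'), ('dog', 'NN'), ('barks', 'NN')]
```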

Inherited from TaggerI (via SequentialBackoffTagger, FeaturesetTaggerI):

Method evaluate Score the accuracy of the tagger against the gold standard. Strip the tags from the gold standard text, retag it using the tagger, then compute the accuracy score.
Method tag_sents Apply self.tag() to each element of sentences, i.e. [self.tag(sent) for sent in sentences].
Method _check_params Undocumented
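The evaluation procedure (strip the gold tags, retag, compare) can be sketched as follows. This is a simplified model that assumes a tagger is any callable from a word list to (word, tag) pairs. always_nn is a hypothetical baseline, not an NLTK tagger. Accuracy here is the fraction of tokens whose predicted tag equals the gold tag.

```python
from itertools import chain


def tag_sents(tag_fn, sentences):
    # Apply the tagger's tag() to every sentence.
    return [tag_fn(sentence) for sentence in sentences]


def evaluate(tag_fn, gold):
    # Strip the gold tags, retag the bare words, then compare tag by tag.
    untagged = [[word for word, _ in sentence] for sentence in gold]
    predicted = tag_sents(tag_fn, untagged)
    gold_tags = [t for _, t in chain.from_iterable(gold)]
    pred_tags = [t for _, t in chain.from_iterable(predicted)]
    matches = sum(g == p for g, p in zip(gold_tags, pred_tags))
    return matches / len(gold_tags)


def always_nn(words):
    # Hypothetical baseline tagger that labels every word NN.
    return [(word, "NN") for word in words]


gold = [[("The", "DT"), ("dog", "NN")], [("cats", "NNS"), ("sleep", "VBP")]]
print(evaluate(always_nn, gold))  # 0.25
```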
def __init__(self, feature_detector=None, train=None, classifier_builder=NaiveBayesClassifier.train, classifier=None, backoff=None, cutoff_prob=None, verbose=False):
def __repr__(self):

Undocumented

def choose_tag(self, tokens, index, history):

Decide which tag should be used for the specified token, and return that tag. If this tagger is unable to determine a tag for the specified token, return None -- do not consult the backoff tagger. This method should be overridden by subclasses of SequentialBackoffTagger.

Parameters
tokens (list): The list of words that are being tagged.
index (int): The index of the word whose tag should be returned.
history (list(str)): A list of the tags for all words before index.
Returns
str: The chosen tag, or None if this tagger is unable to determine a tag for the specified token.
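How cutoff_prob interacts with choose_tag can be sketched as below. This mirrors the documented behavior under the assumption that the classifier exposes classify() and prob_classify() as NLTK classifiers do. FixedDist, FixedClassifier and word_only are hypothetical stand-ins, and choose_tag takes the classifier, detector and cutoff as explicit arguments instead of reading them from self. When the best tag's probability is below the cutoff, the function returns None. It does not consult the backoff itself; tag_one() then moves on to the next tagger in the chain.

```python
class FixedDist:
    """Hypothetical probability distribution over tags (max()/prob() interface)."""

    def __init__(self, probs):
        self._probs = probs

    def max(self):
        return max(self._probs, key=self._probs.get)

    def prob(self, tag):
        return self._probs.get(tag, 0.0)


class FixedClassifier:
    """Hypothetical classifier that returns the same distribution for every input."""

    def __init__(self, probs):
        self._probs = probs

    def prob_classify(self, featureset):
        return FixedDist(self._probs)

    def classify(self, featureset):
        return self.prob_classify(featureset).max()


def choose_tag(classifier, feature_detector, cutoff_prob, tokens, index, history):
    featureset = feature_detector(tokens, index, history)
    if cutoff_prob is None:
        # No cutoff: always trust the classifier's best guess.
        return classifier.classify(featureset)
    pdist = classifier.prob_classify(featureset)
    tag = pdist.max()
    # Below the cutoff, return None so the caller falls back on the backoff tagger.
    return tag if pdist.prob(tag) >= cutoff_prob else None


def word_only(tokens, index, history):
    return {"word": tokens[index]}


clf = FixedClassifier({"NN": 0.6, "VB": 0.4})
print(choose_tag(clf, word_only, None, ["run"], 0, []))  # NN
print(choose_tag(clf, word_only, 0.5, ["run"], 0, []))   # NN
print(choose_tag(clf, word_only, 0.7, ["run"], 0, []))   # None
```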
def classifier(self):

Return the classifier that this tagger uses to choose a tag for each word in a sentence. The input for this classifier is generated using this tagger's feature detector. See feature_detector().

def feature_detector(self, tokens, index, history):

Apply the feature detector that this tagger uses to generate featuresets for its classifier, and return the resulting featureset. The feature detector is a function with the signature:

feature_detector(tokens, index, history) -> featureset

See classifier().

def _train(self, tagged_corpus, classifier_builder, verbose):

Build a new classifier from the training data in tagged_corpus, using classifier_builder.
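The training step can be sketched as turning each tagged sentence into (featureset, label) pairs, which are then handed to classifier_builder. build_classifier_corpus and prev_tag_detector are hypothetical names for illustration. The sketch follows the behavior described for this class, with one design point worth noting. During training, history is filled with the gold tags. At tagging time, tag() fills it with the tagger's own earlier predictions.

```python
def build_classifier_corpus(tagged_corpus, feature_detector):
    """Turn tagged sentences into (featureset, label) pairs for a classifier_builder."""
    classifier_corpus = []
    for sentence in tagged_corpus:
        words = [word for word, _ in sentence]
        history = []
        for index, (_, tag) in enumerate(sentence):
            featureset = feature_detector(words, index, history)
            classifier_corpus.append((featureset, tag))
            # During training, history holds the gold tags, not predicted ones.
            history.append(tag)
    return classifier_corpus


def prev_tag_detector(tokens, index, history):
    # Hypothetical detector: current word plus the previous (gold) tag.
    return {"word": tokens[index], "prev_tag": history[-1] if history else "<START>"}


corpus = [[("The", "DT"), ("dog", "NN"), ("barks", "VBZ")]]
for pair in build_classifier_corpus(corpus, prev_tag_detector):
    print(pair)
# ({'word': 'The', 'prev_tag': '<START>'}, 'DT')
# ({'word': 'dog', 'prev_tag': 'DT'}, 'NN')
# ({'word': 'barks', 'prev_tag': 'NN'}, 'VBZ')
```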

_classifier

The classifier used to choose a tag for each token.

_cutoff_prob

Cutoff probability for tagging -- if the probability of the most likely tag is less than this, then use backoff.

_feature_detector

Undocumented