class ClassifierBasedTagger(SequentialBackoffTagger, FeaturesetTaggerI):
Known subclasses: nltk.tag.sequential.ClassifierBasedPOSTagger
Constructor: ClassifierBasedTagger(feature_detector, train, classifier_builder, classifier, ...)
A sequential tagger that uses a classifier to choose the tag for each token in a sentence. The featureset input for the classifier is generated by a feature detector function:
feature_detector(tokens, index, history) -> featureset
Here, tokens is the list of unlabeled tokens in the sentence; index is the index of the token for which feature detection should be performed; and history is a list of the tags for all tokens before index.
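As an illustration of the signature above, here is a minimal sketch of a feature detector. The function name and the feature names (`suffix3`, `prev_tag`, and so on) are hypothetical choices made for this example; they are not part of NLTK.

```python
def example_feature_detector(tokens, index, history):
    """Hypothetical feature detector; the feature names are illustrative only."""
    word = tokens[index]
    return {
        "word": word.lower(),
        "suffix3": word[-3:].lower(),
        "is_capitalized": word[:1].isupper(),
        # history holds the tags already chosen for tokens[0:index]
        "prev_tag": history[-1] if history else "<START>",
        "prev_word": tokens[index - 1].lower() if index > 0 else "<START>",
    }


# Features for "dog" in "The dog barks", given that "The" was tagged DT.
features = example_feature_detector(["The", "dog", "barks"], 1, ["DT"])
print(features)
```

The returned featureset is a plain dict, so any function with this three-argument shape can serve as a feature detector.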
Construct a new classifier-based sequential tagger.
Parameters | |
feature_detector | A function used to generate the featureset input for the classifier: feature_detector(tokens, index, history) -> featureset |
train | A tagged corpus consisting of a list of tagged sentences, where each sentence is a list of (word, tag) tuples. |
classifier_builder | A function used to train a new classifier based on the data in train. It should take one argument, a list of labeled featuresets (i.e., (featureset, label) tuples). |
classifier | The classifier that should be used by the tagger. This is only useful if you want to manually construct the classifier; normally, you would use train instead. |
backoff | A backoff tagger, used if this tagger encounters an unknown context or is otherwise unable to determine a tag for a given token. |
cutoff_prob | If specified, then this tagger will fall back on its backoff tagger if the probability of the most likely tag is less than cutoff_prob. |
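The parameters above might be combined as in the following sketch. It assumes NLTK is installed; the three-sentence training corpus and the `simple_features` detector are made up for the example and are far too small to produce a useful tagger.

```python
from nltk.tag.sequential import ClassifierBasedTagger


def simple_features(tokens, index, history):
    # Hypothetical detector: current word plus the previously assigned tag.
    return {
        "word": tokens[index].lower(),
        "prev_tag": history[-1] if history else "<START>",
    }


# Toy training data: a list of sentences, each a list of (word, tag) tuples.
train_sents = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("a", "DT"), ("bird", "NN"), ("sings", "VBZ")],
]

# classifier_builder is left at its default; pass classifier=... instead of
# train=... only when supplying a classifier that was built by hand.
tagger = ClassifierBasedTagger(feature_detector=simple_features, train=train_sents)

tagged = tagger.tag(["the", "dog", "sleeps"])
print(tagged)
```

Supplying both train and classifier, or neither, is expected to be rejected, since the tagger needs exactly one source for its classifier.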
Method | __init__ |
Undocumented |
Method | __repr__ |
Undocumented |
Method | choose_tag |
Decide which tag should be used for the specified token, and return that tag. If this tagger is unable to determine a tag for the specified token, return None -- do not consult the backoff tagger. This method should be overridden by subclasses of SequentialBackoffTagger. |
Method | classifier |
Return the classifier that this tagger uses to choose a tag for each word in a sentence. The input for this classifier is generated using this tagger's feature detector. See feature_detector() |
Method | feature_detector |
Return the feature detector that this tagger uses to generate featuresets for its classifier. The feature detector is a function with the signature: feature_detector(tokens, index, history) -> featureset |
Method | _train |
Build a new classifier, based on the given training data tagged_corpus. |
Instance Variable | _classifier |
The classifier used to choose a tag for each token. |
Instance Variable | _cutoff_prob |
Cutoff probability for tagging -- if the probability of the most likely tag is less than this, then use backoff. |
Instance Variable | _feature_detector |
Undocumented |
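To make the _train step listed above more concrete, the following is a rough sketch of how a tagged corpus could be converted into the (featureset, label) pairs that a classifier builder takes as input. It is an illustration written for this page, not NLTK's actual code; `build_labeled_featuresets` and `toy_detector` are hypothetical names.

```python
def build_labeled_featuresets(feature_detector, tagged_corpus):
    """Turn tagged sentences into (featureset, label) pairs for a classifier builder."""
    labeled = []
    for sentence in tagged_corpus:
        words = [word for word, _ in sentence]
        history = []  # gold tags of the tokens already visited in this sentence
        for index, (_, tag) in enumerate(sentence):
            labeled.append((feature_detector(words, index, history), tag))
            history.append(tag)
    return labeled


def toy_detector(tokens, index, history):
    # Hypothetical detector used only to show the data flow.
    return {"word": tokens[index], "prev_tag": history[-1] if history else None}


pairs = build_labeled_featuresets(toy_detector, [[("the", "DT"), ("dog", "NN")]])
print(pairs)
```

During training the history comes from the gold tags, whereas at tagging time it consists of the tags the tagger has already predicted.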
Inherited from SequentialBackoffTagger:
Method | tag |
Determine the most appropriate tag sequence for the given token sequence, and return a corresponding list of tagged tokens. A tagged token is encoded as a tuple (token, tag). |
Method | tag_one |
Determine an appropriate tag for the specified token, and return that tag. If this tagger is unable to determine a tag for the specified token, then its backoff tagger is consulted. |
Property | backoff |
The backoff tagger for this tagger. |
Instance Variable | _taggers |
A list of all the taggers that should be tried to tag a token (i.e., self and its backoff taggers). |
Inherited from TaggerI (via SequentialBackoffTagger, FeaturesetTaggerI):
Method | evaluate |
Score the accuracy of the tagger against the gold standard. Strip the tags from the gold standard text, retag it using the tagger, then compute the accuracy score. |
Method | tag_sents |
Apply self.tag() to each element of sentences, i.e., return [self.tag(sent) for sent in sentences]. |
Method | _check_params |
Undocumented |
Decide which tag should be used for the specified token, and return that tag. If this tagger is unable to determine a tag for the specified token, return None -- do not consult the backoff tagger. This method should be overridden by subclasses of SequentialBackoffTagger.
Parameters | |
tokens:list | The list of words that are being tagged. |
index:int | The index of the word whose tag should be returned. |
history:list(str) | A list of the tags for all words before index. |
Returns | |
str | Undocumented |
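The interaction between the cutoff probability and the "return None" contract described above can be sketched as follows. This is a simplified, assumed model of the decision rule, using a plain dict in place of a classifier's probability distribution; `choose_with_cutoff` is a hypothetical helper, not an NLTK function.

```python
def choose_with_cutoff(tag_probs, cutoff_prob=None):
    """Return the most likely tag, or None if its probability is below cutoff_prob."""
    best_tag = max(tag_probs, key=tag_probs.get)
    if cutoff_prob is not None and tag_probs[best_tag] < cutoff_prob:
        return None  # the caller would then consult the backoff tagger
    return best_tag


probs = {"NN": 0.55, "VB": 0.30, "JJ": 0.15}

confident = choose_with_cutoff(probs, cutoff_prob=0.5)  # 0.55 >= 0.5
unsure = choose_with_cutoff(probs, cutoff_prob=0.8)     # 0.55 < 0.8
no_cutoff = choose_with_cutoff(probs)                   # no threshold applied
print(confident, unsure, no_cutoff)
```

Returning None rather than calling the backoff tagger directly keeps the fallback logic in one place, in the inherited tagging loop.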
Return the classifier that this tagger uses to choose a tag for each word in a sentence. The input for this classifier is generated using this tagger's feature detector. See feature_detector()
Overridden in nltk.tag.sequential.ClassifierBasedPOSTagger.
Return the feature detector that this tagger uses to generate featuresets for its classifier. The feature detector is a function with the signature:
feature_detector(tokens, index, history) -> featureset
See classifier()