nltk.tag.api.TaggerI

class documentation

class TaggerI:

Known subclasses: nltk.classify.senna.Senna, nltk.parse.corenlp.GenericCoreNLPParser, nltk.tag.api.FeaturesetTaggerI, nltk.tag.crf.CRFTagger, nltk.tag.hmm.HiddenMarkovModelTagger, nltk.tag.hunpos.HunposTagger, nltk.tag.perceptron.PerceptronTagger, nltk.tag.sequential.SequentialBackoffTagger, nltk.tag.stanford.StanfordTagger, nltk.tag.tnt.TnT

A processing interface for assigning a tag to each token in a list. Tags are case-sensitive strings that identify some property of each token, such as its part of speech or its sense.

Some taggers require specific types for their tokens. This is generally indicated by the use of a sub-interface of TaggerI. For example, featureset taggers, which are subclassed from FeaturesetTaggerI, require that each token be a featureset.

Subclasses must define:

either tag() or tag_sents() (or both)
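
The contract above can be illustrated with a minimal sketch. To keep it runnable without NLTK installed, it uses `SketchTaggerI`, a hypothetical stand-in that only mirrors the two documented methods; it is not the NLTK class, and `ConstantTagger` is likewise an invented example. With NLTK available, the subclass would derive from `nltk.tag.api.TaggerI` instead.

```python
from abc import ABC, abstractmethod


class SketchTaggerI(ABC):
    """Hypothetical stand-in for nltk.tag.api.TaggerI (assumption, not NLTK code)."""

    @abstractmethod
    def tag(self, tokens):
        """Return a list of (token, tag) tuples for one token sequence."""

    def tag_sents(self, sentences):
        # Default behaviour as documented: apply tag() to every sentence.
        return [self.tag(sent) for sent in sentences]


class ConstantTagger(SketchTaggerI):
    """Illustrative subclass that assigns the same tag to every token."""

    def __init__(self, tag):
        self._tag = tag

    def tag(self, tokens):
        return [(token, self._tag) for token in tokens]


tagger = ConstantTagger("NN")
tagged = tagger.tag(["the", "cat"])
tagged_sents = tagger.tag_sents([["a"], ["b", "c"]])
```

Because the subclass defines `tag()`, the inherited `tag_sents()` works without further code.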

Method	`evaluate`	Score the accuracy of the tagger against the gold standard. Strip the tags from the gold standard text, retag it using the tagger, then compute the accuracy score.
Method	`tag`	Determine the most appropriate tag sequence for the given token sequence, and return a corresponding list of tagged tokens. A tagged token is encoded as a tuple `(token, tag)`.
Method	`tag_sents`	Apply `self.tag()` to each element of `sentences` and return the list of results.
Method	`_check_params`	Undocumented

def evaluate(self, gold):

Score the accuracy of the tagger against the gold standard. Strip the tags from the gold standard text, retag it using the tagger, then compute the accuracy score.

Parameters
gold:list(list(tuple(str, str)))	The list of tagged sentences to score the tagger on.
Returns
float	The accuracy score: the fraction of tokens whose predicted tag matches the gold-standard tag.
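
The three steps described for `evaluate` (strip the tags, retag, score) can be sketched in plain Python. This is an approximation under stated assumptions rather than NLTK's implementation: `accuracy_against_gold` and `tag_everything_nn` are hypothetical names, and returning 0.0 for an empty gold standard is a choice made for this sketch only.

```python
def accuracy_against_gold(tag_sentence, gold):
    """Return the fraction of gold (token, tag) pairs that tag_sentence reproduces.

    tag_sentence: callable taking a list of tokens and returning (token, tag) tuples.
    gold: list of sentences, each a list of (token, tag) tuples.
    """
    correct = 0
    total = 0
    for gold_sent in gold:
        # Step 1: strip the tags from the gold-standard sentence.
        tokens = [token for token, _tag in gold_sent]
        # Step 2: retag the bare tokens.
        predicted = tag_sentence(tokens)
        # Step 3: count the pairs that agree with the gold standard.
        for gold_pair, predicted_pair in zip(gold_sent, predicted):
            total += 1
            if gold_pair == predicted_pair:
                correct += 1
    # Assumption of this sketch: an empty gold standard scores 0.0.
    return correct / total if total else 0.0


def tag_everything_nn(tokens):
    """Hypothetical baseline that tags every token as a noun."""
    return [(token, "NN") for token in tokens]


gold = [
    [("the", "DT"), ("cat", "NN")],
    [("dogs", "NNS"), ("bark", "VBP")],
]
score = accuracy_against_gold(tag_everything_nn, gold)
print(score)  # 0.25: only ("cat", "NN") matches, out of four tokens
```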

@abstractmethod
def tag(self, tokens):

overridden in nltk.classify.senna.Senna, nltk.parse.corenlp.GenericCoreNLPParser, nltk.tag.crf.CRFTagger, nltk.tag.hmm.HiddenMarkovModelTagger, nltk.tag.hunpos.HunposTagger, nltk.tag.perceptron.PerceptronTagger, nltk.tag.sequential.SequentialBackoffTagger, nltk.tag.stanford.StanfordTagger, nltk.tag.tnt.TnT

Determine the most appropriate tag sequence for the given token sequence, and return a corresponding list of tagged tokens. A tagged token is encoded as a tuple (token, tag).

Returns
list(tuple(str, str))	The tagged tokens, one (token, tag) tuple per input token.

def tag_sents(self, sentences):

overridden in nltk.classify.senna.Senna, nltk.parse.corenlp.GenericCoreNLPParser, nltk.tag.crf.CRFTagger, nltk.tag.stanford.StanfordTagger

Apply self.tag() to each element of sentences. I.e.:

return [self.tag(sent) for sent in sentences]
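
A subclass may also go the other way: implement `tag_sents()` as the primary method and derive `tag()` from it, which can suit taggers that process a whole batch in one call. The sketch below shows that arrangement under stated assumptions: `BatchLengthTagger` and its length-based tagging rule are hypothetical, and the class is written standalone so it runs without NLTK (in practice it would subclass `TaggerI`).

```python
class BatchLengthTagger:
    """Hypothetical batch-first tagger (assumption, not part of NLTK)."""

    def __init__(self):
        # Counts batch calls, to show that tag() goes through tag_sents().
        self.batch_calls = 0

    def tag_sents(self, sentences):
        # Primary method: handle all sentences in a single pass.
        self.batch_calls += 1
        return [
            [(token, "LONG" if len(token) > 3 else "SHORT") for token in sent]
            for sent in sentences
        ]

    def tag(self, tokens):
        # Derived method: wrap one sentence in a batch of size one.
        return self.tag_sents([tokens])[0]


tagger = BatchLengthTagger()
batch = tagger.tag_sents([["a", "tagger"], ["works"]])
single = tagger.tag(["tiny", "cat"])
```

Either direction satisfies the requirement that a subclass define `tag()`, `tag_sents()`, or both.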

def _check_params(self, train, model):

Undocumented