nltk.tag.sequential.RegexpTagger

class documentation

class RegexpTagger(SequentialBackoffTagger):

Constructor: RegexpTagger(regexps, backoff)

Regular Expression Tagger

The RegexpTagger assigns tags to tokens by comparing their word strings against a series of regular expressions. The following tagger uses word suffixes to guess the correct Brown Corpus part-of-speech tag:

>>> from nltk.corpus import brown
>>> from nltk.tag import RegexpTagger
>>> test_sent = brown.sents(categories='news')[0]
>>> regexp_tagger = RegexpTagger(
...     [(r'^-?[0-9]+(\.[0-9]+)?$', 'CD'),  # cardinal numbers
...      (r'(The|the|A|a|An|an)$', 'AT'),   # articles
...      (r'.*able$', 'JJ'),                # adjectives
...      (r'.*ness$', 'NN'),                # nouns formed from adjectives
...      (r'.*ly$', 'RB'),                  # adverbs
...      (r'.*s$', 'NNS'),                  # plural nouns
...      (r'.*ing$', 'VBG'),                # gerunds
...      (r'.*ed$', 'VBD'),                 # past tense verbs
...      (r'.*', 'NN')                      # nouns (default)
... ])
>>> regexp_tagger
<Regexp Tagger: size=9>
>>> regexp_tagger.tag(test_sent)
[('The', 'AT'), ('Fulton', 'NN'), ('County', 'NN'), ('Grand', 'NN'), ('Jury', 'NN'),
('said', 'NN'), ('Friday', 'NN'), ('an', 'AT'), ('investigation', 'NN'), ('of', 'NN'),
("Atlanta's", 'NNS'), ('recent', 'NN'), ('primary', 'NN'), ('election', 'NN'),
('produced', 'VBD'), ('``', 'NN'), ('no', 'NN'), ('evidence', 'NN'), ("''", 'NN'),
('that', 'NN'), ('any', 'NN'), ('irregularities', 'NNS'), ('took', 'NN'),
('place', 'NN'), ('.', 'NN')]

Parameters
regexps	A list of `(regexp, tag)` pairs, each of which indicates that a word matching `regexp` should be tagged with `tag`. The pairs are evaluated in order, and the first matching pair determines the tag. If none of the regexps matches a word, the optional backoff tagger is invoked; if no backoff tagger was given, the word is assigned the tag None.
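
The ordering rule described above can be illustrated with a minimal standalone sketch. This is not NLTK's implementation: `first_match_tag` and its `fallback` argument are hypothetical names introduced here, and the sketch assumes each pattern is tested against the start of the word with `re.match`.

```python
import re

def first_match_tag(word, rules, fallback=None):
    """Return the tag of the first rule whose pattern matches `word`.

    `rules` is a list of (regexp, tag) pairs tried in order.
    `fallback` is a hypothetical stand-in for a backoff tagger: a
    callable taking the word and returning a tag.
    """
    for pattern, tag in rules:
        if re.match(pattern, word):
            return tag
    # No rule matched: defer to the fallback if there is one, else None.
    return fallback(word) if fallback is not None else None

noun_first = [(r'.*ness$', 'NN'), (r'.*s$', 'NNS')]
plural_first = [(r'.*s$', 'NNS'), (r'.*ness$', 'NN')]

# The same word gets a different tag depending on rule order.
tag_a = first_match_tag('kindness', noun_first)
tag_b = first_match_tag('kindness', plural_first)

# A word that matches no rule gets None unless a fallback is supplied.
tag_c = first_match_tag('cat', noun_first)
tag_d = first_match_tag('cat', noun_first, fallback=lambda w: 'NN')

print(tag_a, tag_b, tag_c, tag_d)
```

Because the first match wins, more specific patterns generally belong before more general ones, and a catch-all such as `r'.*'` belongs last.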

Class Method	`decode_json_obj`	Undocumented
Method	`__init__`	No summary
Method	`__repr__`	Undocumented
Method	`choose_tag`	Decide which tag should be used for the specified token, and return that tag. If this tagger is unable to determine a tag for the specified token, return None -- do not consult the backoff tagger. This method should be overridden by subclasses of SequentialBackoffTagger.
Method	`encode_json_obj`	Undocumented
Class Variable	`json_tag`	Undocumented
Instance Variable	`_regexps`	Undocumented

Inherited from SequentialBackoffTagger:

Method	`tag`	Determine the most appropriate tag sequence for the given token sequence, and return a corresponding list of tagged tokens. A tagged token is encoded as a tuple `(token, tag)`.
Method	`tag_one`	Determine an appropriate tag for the specified token, and return that tag. If this tagger is unable to determine a tag for the specified token, then its backoff tagger is consulted.
Property	`backoff`	The backoff tagger for this tagger.
Instance Variable	`_taggers`	A list of all the taggers that should be tried to tag a token (i.e., self and its backoff taggers).

Inherited from TaggerI (via SequentialBackoffTagger):

Method	`evaluate`	Score the accuracy of the tagger against the gold standard. Strip the tags from the gold standard text, retag it using the tagger, then compute the accuracy score.
Method	`tag_sents`	Apply `self.tag()` to each element of sentences, i.e. return `[self.tag(sent) for sent in sentences]`.
Method	`_check_params`	Undocumented
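
As a short illustration of `tag_sents`, the sketch below tags two already-tokenized sentences in one call and compares the result with calling `tag` on each sentence. The rule list and the sample sentences are made up for this example, and it assumes NLTK is installed (no corpus download is needed).

```python
from nltk.tag import RegexpTagger

# Illustrative two-rule tagger: numbers become CD, everything else NN.
tagger = RegexpTagger(
    [(r'^-?[0-9]+(\.[0-9]+)?$', 'CD'),
     (r'.*', 'NN')]
)

# Each sentence is a list of tokens.
sentences = [['3', 'cats'], ['no', 'numbers']]

batch = tagger.tag_sents(sentences)
one_by_one = [tagger.tag(sent) for sent in sentences]

print(batch)
```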

@classmethod
def decode_json_obj(cls, obj):

Undocumented

def __init__(self, regexps, backoff=None):

overrides nltk.tag.sequential.SequentialBackoffTagger.__init__

def __repr__(self):

Undocumented

def choose_tag(self, tokens, index, history):

overrides nltk.tag.sequential.SequentialBackoffTagger.choose_tag

Decide which tag should be used for the specified token, and return that tag. If this tagger is unable to determine a tag for the specified token, return None -- do not consult the backoff tagger. This method should be overridden by subclasses of SequentialBackoffTagger.

Parameters
tokens:list	The list of words that are being tagged.
index:int	The index of the word whose tag should be returned.
history:list(str)	A list of the tags for all words before index.
Returns
str	Undocumented
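
The difference between `choose_tag`, which never consults the backoff tagger, and `tag_one`, which does, can be seen in a small sketch. The suffix rules and tokens are illustrative, `DefaultTagger('NN')` is used here only as a simple backoff, and NLTK is assumed to be installed.

```python
from nltk.tag import DefaultTagger, RegexpTagger

# Two suffix rules, with a default tagger as the backoff.
suffix_tagger = RegexpTagger(
    [(r'.*ing$', 'VBG'),
     (r'.*ed$', 'VBD')],
    backoff=DefaultTagger('NN'),
)

tokens = ['walked', 'dog']

# 'dog' matches neither rule, so this tagger's own choice is None.
own_choice = suffix_tagger.choose_tag(tokens, 1, [])

# tag_one falls through to the backoff tagger when choose_tag gives None.
with_backoff = suffix_tagger.tag_one(tokens, 1, [])

# tag() applies tag_one to every token in turn.
tagged = suffix_tagger.tag(tokens)

print(own_choice, with_backoff, tagged)
```

The `history` argument is passed as an empty list here because the regular expression rules look only at the current word.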

def encode_json_obj(self):

Undocumented

json_tag: str

Undocumented

_regexps

Undocumented