nltk.tag.brill.BrillTagger

class documentation

class BrillTagger(TaggerI):

Constructor: BrillTagger(initial_tagger, rules, training_stats)

Brill's transformational rule-based tagger. A Brill tagger uses an initial tagger (such as tag.DefaultTagger) to assign an initial tag sequence to a text, then applies an ordered list of transformational rules to correct the tags of individual tokens. These transformation rules are specified by the TagRule interface.

Brill taggers can be created directly from an initial tagger and a list of transformational rules. More often, they are created by learning rules from a training corpus, using one of the TaggerTrainers available.
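The two-stage process described above can be illustrated without NLTK. The following is a minimal sketch, not the library's implementation: `initial_tag`, `nn_to_vb_after_to`, `transformational_tag`, and the `RuleFn` type are hypothetical names introduced only for this example.

```python
from typing import Callable, List, Optional, Tuple

TaggedToken = Tuple[str, str]
# Hypothetical rule type: given a tagged sentence and an index, return the
# replacement tag for that position, or None if the rule does not apply.
RuleFn = Callable[[List[TaggedToken], int], Optional[str]]


def initial_tag(tokens: List[str], default: str = "NN") -> List[TaggedToken]:
    """Stand-in initial tagger: give every token the same default tag."""
    return [(token, default) for token in tokens]


def nn_to_vb_after_to(tagged: List[TaggedToken], i: int) -> Optional[str]:
    """Example rule: retag NN as VB when the preceding word is 'to'."""
    if i > 0 and tagged[i][1] == "NN" and tagged[i - 1][0] == "to":
        return "VB"
    return None


def transformational_tag(tokens: List[str], rules: List[RuleFn]) -> List[TaggedToken]:
    """Tag with the initial tagger, then apply each rule in order."""
    tagged = initial_tag(tokens)
    for rule in rules:
        # Collect every position where the rule fires before changing anything,
        # so a rule never reacts to its own corrections within one pass.
        changes = []
        for i in range(len(tagged)):
            new_tag = rule(tagged, i)
            if new_tag is not None:
                changes.append((i, new_tag))
        for i, new_tag in changes:
            tagged[i] = (tagged[i][0], new_tag)
    return tagged


result = transformational_tag(["I", "want", "to", "run"], [nn_to_vb_after_to])
print(result)
```

Because rules are applied in order, a later rule sees the corrections made by earlier ones, which is why the rule list is ordered.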

Class Method	`decode_json_obj`	Undocumented
Method	`__init__`	No summary
Method	`batch_tag_incremental`	Tags by applying each rule to the entire corpus (rather than all rules to a single sequence). The point is to collect statistics on the test set for individual rules.
Method	`encode_json_obj`	Undocumented
Method	`print_template_statistics`	Print a list of all templates, ranked according to efficiency.
Method	`rules`	Return the ordered list of transformation rules that this tagger has learnt
Method	`tag`	Undocumented
Method	`train_stats`	Return a named statistic collected during training, or a dictionary of all available statistics if no name is given
Class Variable	`json_tag`	Undocumented
Instance Variable	`_initial_tagger`	Undocumented
Instance Variable	`_rules`	Undocumented
Instance Variable	`_training_stats`	Undocumented

@classmethod
def decode_json_obj(cls, obj):

Undocumented

def __init__(self, initial_tagger, rules, training_stats=None):

Parameters
initial_tagger:TaggerI	The initial tagger
rules:list(TagRule)	An ordered list of transformation rules that should be used to correct the initial tagging.
training_stats:dict	A dictionary of statistics collected during training, for possible later use
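As an illustration of direct construction, the sketch below builds a tagger from a DefaultTagger and a single hand-written rule. It assumes NLTK is installed and that `Rule` (from `nltk.tbl.rule`) and the `Word` feature (from `nltk.tag.brill`) are available at those import paths. The template id "demo" and the contents of the `training_stats` dictionary are made up for the example.

```python
from nltk.tag import DefaultTagger
from nltk.tag.brill import BrillTagger, Word
from nltk.tbl.rule import Rule

# Initial tagger: tags every token as NN.
initial = DefaultTagger("NN")

# Hand-written rule: change NN to VB if the preceding word is "to".
# "demo" is an arbitrary template id chosen for this example.
to_verb = Rule("demo", "NN", "VB", [(Word([-1]), "to")])

# training_stats is optional; this dictionary is invented for illustration.
tagger = BrillTagger(initial, [to_verb], training_stats={"rulescores": [1]})

tagged = tagger.tag(["They", "plan", "to", "leave"])
print(tagged)

# The rules and any stored statistics can be read back from the tagger.
print(tagger.rules())
print(tagger.train_stats("rulescores"))
print(tagger.train_stats())
```

Hand-written rules are mostly useful for experiments and tests; rules learned from a corpus are normally produced by a trainer.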

def batch_tag_incremental(self, sequences, gold):

Tags by applying each rule to the entire corpus (rather than all rules to a single sequence). The point is to collect statistics on the test set for individual rules.

NOTE: This is inefficient: it does not build any index, so it traverses the entire corpus N times for N rules. Usually you do not need statistics for individual rules and should use tag_sents() (named batch_tag() in older NLTK releases) instead.

Parameters
sequences:list of list of strings	lists of token sequences (sentences, in some applications) to be tagged
gold:list of list of (token, tag) tuples	the gold standard: the correctly tagged versions of the same sequences
Returns
tuple of (tagged_sequences, ordered list of rule scores (one for each rule))
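The per-rule statistics described for this method can be sketched as follows: apply one rule to the whole corpus, count the remaining errors against the gold standard, and score the rule by how many errors it removed. This is an illustrative sketch rather than NLTK's code. The helpers `count_errors`, `incremental_rule_scores`, `retag_word`, and `retag_nn_after_to` are hypothetical, and rules are modeled as plain functions that modify a sentence in place.

```python
def count_errors(tagged_corpus, gold_corpus):
    """Count positions whose tag differs from the gold-standard tag."""
    return sum(
        t[1] != g[1]
        for tagged, gold in zip(tagged_corpus, gold_corpus)
        for t, g in zip(tagged, gold)
    )


def incremental_rule_scores(tagged_corpus, gold_corpus, rules):
    """Apply each rule to the entire corpus in turn and score it by the
    number of errors it removes (negative if it introduces errors)."""
    errors = [count_errors(tagged_corpus, gold_corpus)]
    for rule in rules:
        for sentence in tagged_corpus:  # one full corpus pass per rule
            rule(sentence)
        errors.append(count_errors(tagged_corpus, gold_corpus))
    return [before - after for before, after in zip(errors, errors[1:])]


def retag_word(word, new_tag):
    """Build a rule that retags every occurrence of `word` as `new_tag`."""
    def rule(sentence):
        for i, (token, _) in enumerate(sentence):
            if token == word:
                sentence[i] = (token, new_tag)
    return rule


def retag_nn_after_to(sentence):
    """Rule: retag NN as VB when the preceding word is 'to'."""
    for i in range(1, len(sentence)):
        token, tag = sentence[i]
        if tag == "NN" and sentence[i - 1][0] == "to":
            sentence[i] = (token, "VB")


# Output of a default NN tagger, and the corresponding gold standard.
tagged = [[("to", "NN"), ("run", "NN")], [("the", "NN"), ("dog", "NN")]]
gold = [[("to", "TO"), ("run", "VB")], [("the", "DT"), ("dog", "NN")]]

rules = [retag_word("the", "DT"), retag_nn_after_to, retag_word("dog", "VB")]
scores = incremental_rule_scores(tagged, gold, rules)
print(scores)
```

The third rule in the example makes the tagging worse, so its score is negative. Scores of this kind are what make it possible to judge individual rules on test data.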

def encode_json_obj(self):

Undocumented

def print_template_statistics(self, test_stats=None, printunused=True):

Print a list of all templates, ranked according to efficiency.

If test_stats is given, the templates are ranked according to their relative contribution (summed over all rules created from a given template, weighted by score) to the performance on the test set. If test_stats is not given, statistics collected during training are used instead. There is also an unweighted measure (just counting the rules), but it is less informative, because many low-score rules appear towards the end of training.

Parameters
test_stats:dict of str -> any (but usually numbers)	dictionary of statistics collected during testing
printunused:bool	if True, print a list of all unused templates
Returns
None	None
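The difference between the weighted and unweighted measures described for print_template_statistics can be seen in a small sketch. It assumes each rule carries a template id and a score. The function `rank_templates` and the sample ids and scores are invented for illustration and are not part of NLTK.

```python
from collections import Counter


def rank_templates(template_ids, rule_scores):
    """Return (templates ranked by weighted score, weighted totals, rule counts).

    template_ids[i] is the template that produced rule i;
    rule_scores[i] is that rule's score.
    """
    unweighted = Counter(template_ids)  # number of rules per template
    weighted = Counter()
    for template_id, score in zip(template_ids, rule_scores):
        weighted[template_id] += score  # sum of rule scores per template
    ranked = sorted(weighted, key=lambda tid: weighted[tid], reverse=True)
    return ranked, weighted, unweighted


# Rules in learning order: high-score rules first, low-score rules later.
template_ids = ["000", "001", "000", "002", "001", "001"]
rule_scores = [50, 8, 20, 3, 2, 1]

ranked, weighted, unweighted = rank_templates(template_ids, rule_scores)
print(ranked)
print(dict(weighted))
print(dict(unweighted))
```

In this made-up data, template "001" produced the most rules but template "000" accounts for most of the score, which is the situation in which the unweighted count is misleading.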

def rules(self):

Return the ordered list of transformation rules that this tagger has learnt

Returns
list of Rules	the ordered list of transformation rules that correct the initial tagging

def tag(self, tokens):

Undocumented

def train_stats(self, statistic=None):

Return a named statistic collected during training, or a dictionary of all available statistics if no name is given

Parameters
statistic:str	name of statistic
Returns
any (but usually a number)	some statistic collected during training of this tagger

json_tag: str

Undocumented

_initial_tagger

Undocumented

_rules

Undocumented

_training_stats

Undocumented