class documentation

A regular expression based chunk parser. RegexpChunkParser uses a sequence of "rules" to find chunks of a single type within a text. The chunking of the text is encoded using a ChunkString, and each rule acts by modifying the chunking in the ChunkString. The rules are all implemented using regular expression matching and substitution.

The RegexpChunkRule class and its subclasses (ChunkRule, StripRule, UnChunkRule, MergeRule, and SplitRule) define the rules that are used by RegexpChunkParser. Each rule defines an apply() method, which modifies the chunking encoded by a given ChunkString.
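The idea of encoding a chunking as a string and editing it with regular expression substitution can be sketched with the standard library alone. This is a toy illustration, not NLTK's implementation: the `<TAG>` and brace encoding, the `np_rule` pattern, and the `apply_rule` helper are all assumptions made for the example, and the sketch does not guard against matching inside an existing chunk.

```python
import re

# Toy encoding (an assumption for this sketch): one <TAG> per token,
# with braces marking chunk boundaries.
tags = ["DT", "JJ", "NN", "VBD", "DT", "NN"]
chunk_string = "".join(f"<{t}>" for t in tags)

# Hypothetical chunk rule: an optional determiner, any number of
# adjectives, then a noun.
np_rule = re.compile(r"(?:<DT>)?(?:<JJ>)*<NN>")


def apply_rule(pattern, text):
    """Wrap every match of `pattern` in braces to mark it as a chunk."""
    return pattern.sub(lambda m: "{" + m.group(0) + "}", text)


chunked = apply_rule(np_rule, chunk_string)
print(chunked)  # {<DT><JJ><NN>}<VBD>{<DT><NN>}
```

Each rule is a single substitution over the encoded string, which is why rules can be composed simply by applying them one after another.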

Method __init__ Construct a new RegexpChunkParser.
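Method __repr__ Return a concise string representation of this RegexpChunkParser.
Method __str__ Return a verbose string representation of this RegexpChunkParser.
Method parse Return a chunk structure that encodes the chunks in a given tagged sentence.
Method rules Return the sequence of rules used by this RegexpChunkParser.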
Method _notrace_apply Apply each rule of this RegexpChunkParser to chunkstr, in turn.
Method _trace_apply Apply each rule of this RegexpChunkParser to chunkstr, in turn. Generate trace output between each rule. If verbose is true, then generate verbose output.
Instance Variable _chunk_label Undocumented
Instance Variable _root_label Undocumented
Instance Variable _rules The list of rules that should be applied to a text.
Instance Variable _trace The default level of tracing.

Inherited from ChunkParserI:

Method evaluate Score the accuracy of the chunker against the gold standard. Remove the chunking from the gold standard text, rechunk it using the chunker, and return a ChunkScore object reflecting the performance of this chunk parser.
def __init__(self, rules, chunk_label='NP', root_label='S', trace=0):

Construct a new RegexpChunkParser.

Parameters
rules: list(RegexpChunkRule) - The sequence of rules that should be used to generate the chunking for a tagged text.
chunk_label: str - The node value that should be used for chunk subtrees. This is typically a short string describing the type of information contained by the chunk, such as "NP" for base noun phrases.
root_label: str - The node value that should be used for the top node of the chunk structure.
trace: int - The level of tracing that should be used when parsing a text. 0 will generate no tracing output; 1 will generate normal tracing output; and 2 or higher will generate verbose tracing output.
def __repr__(self):
Returns
str - A concise string representation of this RegexpChunkParser.
def __str__(self):
Returns
str - A verbose string representation of this RegexpChunkParser.
def parse(self, chunk_struct, trace=None):
Parameters
chunk_struct: Tree - The chunk structure to be (further) chunked.
trace: int - The level of tracing that should be used when parsing a text. 0 will generate no tracing output; 1 will generate normal tracing output; and 2 or higher will generate verbose tracing output. This value overrides the trace level that was given to the constructor.
Returns
Tree - A chunk structure that encodes the chunks in a given tagged sentence. A chunk is a non-overlapping linguistic group, such as a noun phrase. The set of chunks identified in the chunk structure depends on the rules used to define this RegexpChunkParser.
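A minimal usage sketch, assuming NLTK is installed and that ChunkRule and RegexpChunkParser are importable from nltk.chunk.regexp. The tag pattern, the rule description, and the sample sentence are illustrative choices, not part of this reference.

```python
from nltk.tree import Tree
from nltk.chunk.regexp import ChunkRule, RegexpChunkParser

# One rule: chunk an optional determiner, any adjectives, and a noun.
# The pattern and description are illustrative assumptions.
rule = ChunkRule("<DT>?<JJ>*<NN>", "Chunk determiner, adjectives, and noun")

# chunk_label names the chunk subtrees; root_label names the top node.
parser = RegexpChunkParser([rule], chunk_label="NP", root_label="S", trace=0)

# A tagged sentence wrapped in a flat Tree, as parse() expects.
sentence = Tree("S", [("the", "DT"), ("little", "JJ"), ("cat", "NN"), ("sat", "VBD")])

# trace=0 here overrides whatever level was given to the constructor.
tree = parser.parse(sentence, trace=0)
print(tree)
```

With this rule, the first three tokens should end up under an NP subtree while the verb stays directly under the root. Passing trace=1 or higher to parse() prints the intermediate chunk string after each rule, which can help when debugging rule order.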
def rules(self):
Returns
list(RegexpChunkRule) - The sequence of rules used by this RegexpChunkParser.
def _notrace_apply(self, chunkstr):

Apply each rule of this RegexpChunkParser to chunkstr, in turn.

Parameters
chunkstr: ChunkString - The chunk string to which each rule should be applied.
Returns
None - Undocumented
def _trace_apply(self, chunkstr, verbose):

Apply each rule of this RegexpChunkParser to chunkstr, in turn. Generate trace output between each rule. If verbose is true, then generate verbose output.

Parameters
chunkstr: ChunkString - The chunk string to which each rule should be applied.
verbose: bool - Whether output should be verbose.
Returns
None - Undocumented
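The "apply each rule in turn" behaviour of the two helpers above can be sketched as a simple loop. This is a standard-library toy, not NLTK's code: the `apply_rules` function, the `(description, pattern, replacement)` rule tuples, and the brace encoding are hypothetical, and the sketch returns a new string where the real methods modify the ChunkString in place and return None.

```python
import re


def apply_rules(rules, chunk_string, trace=0):
    """Apply (description, pattern, replacement) rules in order.

    trace=0 prints nothing, trace=1 prints the string after each rule,
    and trace>=2 also prints the pattern that was applied.
    """
    for descr, pattern, repl in rules:
        chunk_string = re.sub(pattern, repl, chunk_string)
        if trace >= 2:
            print(f"# pattern: {pattern}")
        if trace >= 1:
            print(f"# {descr}: {chunk_string}")
    return chunk_string


# Hypothetical rules over a toy encoding where braces mark chunks.
rules = [
    ("chunk every tag", r"<[^>]+>", r"{\g<0>}"),
    ("merge determiner with noun", r"(<DT>)\}\{(<NN>)", r"\1\2"),
    ("unchunk verbs", r"\{(<VB[^>]*>)\}", r"\1"),
]

result = apply_rules(rules, "<DT><NN><VBD>", trace=1)
print(result)  # {<DT><NN>}<VBD>
```

Because each rule sees the output of the previous one, the order of the rule list matters: the merge and unchunk rules here only have something to act on after the first rule has created chunks.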
_chunk_label

Undocumented

_root_label

Undocumented

_rules: list(RegexpChunkRule)

The list of rules that should be applied to a text.

_trace: int

The default level of tracing.