class documentation

A grammar-based chunk parser. chunk.RegexpParser uses a set of regular expression patterns to specify the behavior of the parser. The chunking of the text is encoded using a ChunkString, and each rule acts by modifying the chunking in the ChunkString. The rules are all implemented using regular expression matching and substitution.
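To make the encoding concrete, here is a minimal pure-Python sketch that does not use NLTK. It assumes each tag is written as `<TAG>` and each chunk is delimited by braces; `to_chunk_string` and `from_chunk_string` are hypothetical helpers, and the chunk rule is a hand-written regular expression rather than one compiled from a tag pattern.

```python
import re

def to_chunk_string(tagged):
    """Encode (word, tag) pairs as an unchunked string, e.g. '<DT><NN><VBD>'."""
    return "".join("<%s>" % tag for _, tag in tagged)

def from_chunk_string(tagged, chunk_str):
    """Decode braces back into structure: each {...} becomes a list of
    tokens, and tokens outside any chunk are kept as they are."""
    out, group, i = [], None, 0
    for piece in re.findall(r"\{|\}|<[^<>]*>", chunk_str):
        if piece == "{":
            group = []
        elif piece == "}":
            out.append(group)
            group = None
        else:
            # Each <TAG> corresponds to the next input token, in order.
            (out if group is None else group).append(tagged[i])
            i += 1
    return out

tagged = [("the", "DT"), ("cat", "NN"), ("sat", "VBD"),
          ("on", "IN"), ("the", "DT"), ("mat", "NN")]

chunk_str = to_chunk_string(tagged)
# A chunk rule is one substitution: wrap every optional-DT-then-NN match in braces.
chunk_str = re.sub(r"(?:<DT>)?<NN>", lambda m: "{" + m.group(0) + "}", chunk_str)
print(chunk_str)  # {<DT><NN>}<VBD><IN>{<DT><NN>}
print(from_chunk_string(tagged, chunk_str))
```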

A grammar contains one or more clauses in the following form:

NP:
  {<DT|JJ>}          # chunk determiners and adjectives
  }<[\.VI].*>+{      # strip any tag beginning with V, I, or .
  <.*>}{<DT>         # split a chunk at a determiner
  <DT|JJ>{}<NN.*>    # merge chunk ending with det/adj
                     # with one starting with a noun
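Each of the four rule forms above can be read as one regular-expression substitution over such a bracketed tag string. The following is a simplified pure-Python sketch, not NLTK's implementation: `tag_re` and `apply_rule` are hypothetical names, only these four forms are recognized, and the placement checks (chunk only outside existing chunks, strip and split only inside them) are assumptions this sketch uses to keep the braces balanced.

```python
import re

def tag_re(tag_pattern):
    """Turn a tag pattern such as '<DT|JJ>' or '<NN.*>+' into a regex.
    Each <...> becomes one group, and an unescaped '.' inside it is
    narrowed so that it can never run across a tag or chunk boundary."""
    def one_tag(m):
        inner = re.sub(r"(?<!\\)\.", "[^{}<>]", m.group(1))
        return "(?:<(?:" + inner + ")>)"
    return re.sub(r"<([^<>]+)>", one_tag, tag_pattern)

OUTSIDE = r"(?=[^}]*(?:\{|$))"  # the next brace, if any, opens a chunk
INSIDE = r"(?=[^{]*\})"         # the next brace closes a chunk

def apply_rule(chunk_str, rule):
    """Apply one rule, written in the grammar notation, to a chunk string."""
    if rule.startswith("{") and rule.endswith("}"):      # chunk:  {P}
        pat = tag_re(rule[1:-1]) + OUTSIDE
        out = re.sub(pat, lambda m: "{" + m.group(0) + "}", chunk_str)
    elif rule.startswith("}") and rule.endswith("{"):    # strip:  }P{
        pat = tag_re(rule[1:-1]) + INSIDE
        out = re.sub(pat, lambda m: "}" + m.group(0) + "{", chunk_str)
    elif "}{" in rule:                                   # split:  L}{R
        left, right = rule.split("}{", 1)
        pat = "(" + tag_re(left) + ")(?=" + tag_re(right) + ")" + INSIDE
        out = re.sub(pat, lambda m: m.group(1) + "}{", chunk_str)
    elif "{}" in rule:                                   # merge:  L{}R
        left, right = rule.split("{}", 1)
        pat = "(" + tag_re(left) + r")\}\{(?=" + tag_re(right) + ")"
        out = re.sub(pat, lambda m: m.group(1), chunk_str)
    else:
        raise ValueError("unsupported rule: %r" % rule)
    return out.replace("{}", "")  # drop chunks left empty by stripping

s = apply_rule("<DT><JJ><NN><VBD><DT><NN>", "{<.*>+}")  # chunk everything
s = apply_rule(s, r"}<[\.VI].*>+{")                    # then strip the verb
print(s)  # {<DT><JJ><NN>}<VBD>{<DT><NN>}
print(apply_rule("{<DT><NN><DT><NN>}", "<.*>}{<DT>"))     # {<DT><NN>}{<DT><NN>}
print(apply_rule("{<DT><JJ>}{<NN>}", "<DT|JJ>{}<NN.*>"))  # {<DT><JJ><NN>}
```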

The patterns of a clause are executed in order. An earlier pattern may introduce a chunk boundary that prevents a later pattern from executing. An individual pattern may also match several overlapping extents of the input. As with regular expression substitution in general, the chunker takes the leftmost match, then continues searching for matches after the end of that match.
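Both behaviors can be reproduced with Python's `re.sub` on the same brace encoding. This minimal sketch uses a hypothetical `chunk` helper with hand-written regexes, and it omits any check that a match lies outside existing chunks, which these inputs do not need.

```python
import re

def chunk(chunk_str, regex):
    """Wrap each non-overlapping match of `regex` in braces, scanning left to right."""
    return re.sub(regex, lambda m: "{" + m.group(0) + "}", chunk_str)

# Overlapping extents: <NN><NN> could match at the first or the second NN,
# but after the leftmost match ends only one <NN> is left, so it stays unchunked.
print(chunk("<NN><NN><NN>", r"<NN><NN>"))  # {<NN><NN>}<NN>

# Order matters: a boundary added by an earlier pattern blocks a later one.
s = "<DT><NN><NN>"
print(chunk(chunk(s, r"<DT><NN>"), r"<NN><NN>"))  # {<DT><NN>}<NN>
print(chunk(chunk(s, r"<NN><NN>"), r"<DT><NN>"))  # <DT>{<NN><NN>}
```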

The clauses of a grammar are also executed in order. A cascaded chunk parser is one having more than one clause. The maximum depth of a parse tree created by this chunk parser is the same as the number of clauses in the grammar.
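A cascade can be sketched in pure Python by treating each clause as one stage that groups the current sequence of items, where a subtree takes part under its label just as a token does under its tag. The stage patterns, the `run_stage` and `nesting` helpers, and the tuple-based tree representation are all assumptions made for this illustration. Because a stage only wraps items that already exist, each stage adds at most one level of nesting.

```python
import re

# Assumed representation: a leaf is (word, tag); a chunk is (label, [children]).
def tag_of(item):
    return item[0] if isinstance(item[1], list) else item[1]

def run_stage(items, label, regex):
    """One clause: group each matching run of items under a new `label` node."""
    s = "".join("<%s>" % tag_of(it) for it in items)
    s = re.sub(regex, lambda m: "{" + m.group(0) + "}", s)
    out, group, i = [], None, 0
    for piece in re.findall(r"\{|\}|<[^<>]*>", s):
        if piece == "{":
            group = []
        elif piece == "}":
            out.append((label, group))
            group = None
        else:
            (out if group is None else group).append(items[i])
            i += 1
    return out

def nesting(item):
    """Number of chunk levels above the leaves of `item`."""
    if isinstance(item[1], list):
        return 1 + max((nesting(child) for child in item[1]), default=0)
    return 0

stages = [("NP", r"(?:<DT>)?<NN>"),  # stage 1 groups tokens
          ("PP", r"<IN><NP>"),       # stage 2 can group NP chunks
          ("VP", r"<VBD><PP>")]      # stage 3 can group PP chunks

items = [("the", "DT"), ("cat", "NN"), ("sat", "VBD"),
         ("on", "IN"), ("the", "DT"), ("mat", "NN")]
for label, regex in stages:
    items = run_stage(items, label, regex)

print(items)
print(max(nesting(it) for it in items))  # 3, one level per stage
```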

When tracing is turned on, the comment portion of a line is displayed each time the corresponding pattern is applied.

Method __init__ Create a new chunk parser from the given start state and set of chunk patterns.
Method __repr__ No summary
Method __str__ No summary
Method parse Apply the chunk parser to this input.
Method _add_stage Helper function for __init__: add a new stage to the parser.
Method _read_grammar Helper function for __init__: read the grammar if it is a string.
Instance Variable _grammar Undocumented
Instance Variable _loop Undocumented
Instance Variable _stages The list of parsing stages corresponding to the grammar
Instance Variable _start The start symbol of the grammar (the root node of resulting trees)
Instance Variable _trace Undocumented

Inherited from ChunkParserI:

Method evaluate Score the accuracy of the chunker against the gold standard. Remove the chunking from the gold standard text, rechunk it using the chunker, and return a ChunkScore object reflecting the performance of this chunk parser.
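The evaluation procedure can be sketched in pure Python. This is a simplified illustration, not ChunkScore itself: it assumes one-level chunk structures, compares chunks as (label, start, end) token spans, and uses hypothetical names (`flatten`, `chunk_spans`, `score`, and a toy `nn_chunker`). Precision is the fraction of guessed chunks found in the gold standard; recall is the fraction of gold chunks that were guessed.

```python
def flatten(items):
    """Remove the chunking: splice every chunk's tokens back into the sequence."""
    out = []
    for item in items:
        if isinstance(item[1], list):
            out.extend(item[1])
        else:
            out.append(item)
    return out

def chunk_spans(items):
    """Collect (label, start, end) token spans for the chunks in `items`."""
    spans, i = set(), 0
    for item in items:
        if isinstance(item[1], list):
            spans.add((item[0], i, i + len(item[1])))
            i += len(item[1])
        else:
            i += 1
    return spans

def score(gold_items, chunker):
    """Unchunk the gold data, rechunk it, and compare chunk spans."""
    guess = chunk_spans(chunker(flatten(gold_items)))
    gold = chunk_spans(gold_items)
    correct = len(guess & gold)
    precision = correct / len(guess) if guess else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure

def nn_chunker(tokens):
    """Toy chunker: every NN token becomes its own one-word NP chunk."""
    return [("NP", [tok]) if tok[1] == "NN" else tok for tok in tokens]

gold = [("NP", [("the", "DT"), ("cat", "NN")]),
        ("ate", "VBD"),
        ("NP", [("fish", "NN")])]
print(score(gold, nn_chunker))  # (0.5, 0.5, 0.5)
```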
def __init__(self, grammar, root_label='S', loop=1, trace=0):

Create a new chunk parser from the given start state and set of chunk patterns.

Parameters
grammar (str or list(RegexpChunkParser)): The grammar, or a list of RegexpChunkParser objects.
root_label (str or Nonterminal): The top node of the tree being created.
loop (int): The number of times to run through the patterns.
trace (int): The level of tracing that should be used when parsing a text. 0 will generate no tracing output; 1 will generate normal tracing output; and 2 or higher will generate verbose tracing output.
def __repr__(self):
Returns
str: A concise string representation of this chunk.RegexpParser.
def __str__(self):
Returns
str: A verbose string representation of this RegexpParser.
def parse(self, chunk_struct, trace=None):

Apply the chunk parser to this input.

Parameters
chunk_struct (Tree): The chunk structure to be (further) chunked. This tree is modified, and is also returned.
trace (int): The level of tracing that should be used when parsing a text. 0 will generate no tracing output; 1 will generate normal tracing output; and 2 or higher will generate verbose tracing output. This value overrides the trace level that was given to the constructor.
Returns
Tree: The chunked output.
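Taken together with the constructor's loop parameter, the documented behavior of parse amounts to: run every stage in grammar order, repeat that whole pass loop times, and use the per-call trace level when one is given. The sketch below illustrates only that control flow, with plain functions standing in for the parsing stages; `CascadeSketch` and the toy stages are hypothetical, and this is not NLTK's source.

```python
from typing import Callable, List, Optional

# Assumed stage shape: takes the current structure and a trace level, returns a new structure.
Stage = Callable[[list, int], list]

class CascadeSketch:
    """Hypothetical stand-in that mimics only the control flow described for parse()."""

    def __init__(self, stages: List[Stage], loop: int = 1, trace: int = 0):
        self._stages = stages
        self._loop = loop
        self._trace = trace

    def parse(self, chunk_struct: list, trace: Optional[int] = None) -> list:
        # A trace level passed to parse() overrides the constructor's value.
        if trace is None:
            trace = self._trace
        # Run every stage in grammar order; repeat the whole pass `loop` times.
        for _ in range(self._loop):
            for stage in self._stages:
                chunk_struct = stage(chunk_struct, trace)
        return chunk_struct

calls = []

def make_stage(name: str) -> Stage:
    def stage(struct: list, trace: int) -> list:
        calls.append((name, trace))  # record which stage ran, at which trace level
        return struct + [name]
    return stage

parser = CascadeSketch([make_stage("NP"), make_stage("PP")], loop=2, trace=0)
print(parser.parse([]))           # ['NP', 'PP', 'NP', 'PP']
print(parser.parse([], trace=1))  # same result; the stages now see trace=1
print(calls[-1])                  # ('PP', 1)
```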
def _add_stage(self, rules, lhs, root_label, trace):

Helper function for __init__: add a new stage to the parser.

def _read_grammar(self, grammar, root_label, trace):

Helper function for __init__: read the grammar if it is a string.
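For a string grammar, the reading step amounts to splitting the text into clauses and rules. The sketch below is a simplified, hypothetical reader, not NLTK's _read_grammar: it assumes one rule per line, treats a line with an unescaped `:` as the start of a new clause (`LABEL:`, optionally followed by a first rule), skips blank and comment-only lines, keeps each rule's trailing `#` comment (the text displayed when tracing), and, as its own assumption, records a clause as a stage only if it has at least one rule.

```python
import re

# Hypothetical patterns: a clause header is text up to the first unescaped ':',
# and a rule is text up to the first unescaped '#', which starts its comment.
CLAUSE = re.compile(r"(?P<label>(?:\\.|[^:])*):(?P<rest>.*)")
RULE = re.compile(r"(?P<rule>(?:\\.|[^#])*)(?P<comment>#.*)?")

def read_grammar(grammar):
    """Split a grammar string into [(label, [(rule, comment), ...]), ...]."""
    stages, label, rules = [], None, []

    def close_stage():
        # Record the clause just finished, if it produced any rules.
        if rules:
            if label is None:
                raise ValueError("rule found before any 'LABEL:' line")
            stages.append((label, rules))

    for line in grammar.split("\n"):
        line = line.strip()
        m = CLAUSE.match(line)
        if m:
            close_stage()
            label, rules = m.group("label").strip(), []
            line = m.group("rest").strip()
        if not line or line.startswith("#"):
            continue  # blank or comment-only line
        r = RULE.match(line)
        comment = (r.group("comment") or "#")[1:].strip()
        rules.append((r.group("rule").strip(), comment))
    close_stage()
    return stages

grammar = r"""
NP:
  {<DT|JJ>}          # chunk determiners and adjectives
  }<[\.VI].*>+{      # strip any tag beginning with V, I, or .
  <.*>}{<DT>         # split a chunk at a determiner
  <DT|JJ>{}<NN.*>    # merge chunk ending with det/adj
                     # with one starting with a noun
PP: {<IN><NP>}
"""
for label, rules in read_grammar(grammar):
    print(label, [rule for rule, _ in rules])
```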

_grammar

Undocumented

_loop

Undocumented

_stages: list(RegexpChunkParser)

The list of parsing stages corresponding to the grammar

_start: str

The start symbol of the grammar (the root node of resulting trees)

_trace

Undocumented