module documentation

Undocumented

Class ChunkRule A rule specifying how to add chunks to a ChunkString, using a matching tag pattern. When applied to a ChunkString, it will find any substring that matches this tag pattern and that is not already part of a chunk, and create a new chunk containing that substring.
Class ChunkRuleWithContext A rule specifying how to add chunks to a ChunkString, using three matching tag patterns: one for the left context, one for the chunk, and one for the right context. When applied to a ChunkString, it will find any substring that matches the chunk tag pattern, is surrounded by substrings that match the two context patterns, and is not already part of a chunk; and create a new chunk containing the substring that matched the chunk tag pattern.
Class ChunkString A string-based encoding of a particular chunking of a text. Internally, the ChunkString class uses a single string to encode the chunking of the input text. This string contains a sequence of angle-bracket delimited tags, with chunking indicated by braces...
Class ExpandLeftRule A rule specifying how to expand chunks in a ChunkString to the left, using two matching tag patterns: a left pattern and a right pattern. When applied to a ChunkString, it will find any chunk whose beginning matches the right pattern and that is immediately preceded by a strip whose end matches the left pattern...
Class ExpandRightRule A rule specifying how to expand chunks in a ChunkString to the right, using two matching tag patterns: a left pattern and a right pattern. When applied to a ChunkString, it will find any chunk whose end matches the left pattern and that is immediately followed by a strip whose beginning matches the right pattern...
Class MergeRule A rule specifying how to merge chunks in a ChunkString, using two matching tag patterns: a left pattern and a right pattern. When applied to a ChunkString, it will find any chunk whose end matches the left pattern and that is immediately followed by a chunk whose beginning matches the right pattern...
Class RegexpChunkParser A regular expression based chunk parser. RegexpChunkParser uses a sequence of "rules" to find chunks of a single type within a text. The chunking of the text is encoded using a ChunkString, and each rule acts by modifying the chunking in the ...
Class RegexpChunkRule A rule specifying how to modify the chunking in a ChunkString, using a transformational regular expression. The RegexpChunkRule class itself can be used to implement any transformational rule based on regular expressions...
Class RegexpParser A grammar based chunk parser. chunk.RegexpParser uses a set of regular expression patterns to specify the behavior of the parser. The chunking of the text is encoded using a ChunkString, and each rule acts by modifying the chunking in the ...
Class SplitRule A rule specifying how to split chunks in a ChunkString, using two matching tag patterns: a left pattern, and a right pattern. When applied to a ChunkString, it will find any chunk that matches the left pattern followed by the right pattern...
Class StripRule A rule specifying how to remove substrings from the chunks in a ChunkString, using a matching tag pattern. When applied to a ChunkString, it will find any substring that matches this tag pattern and that is contained in a chunk, and remove it from that chunk, thus creating two new chunks.
Class UnChunkRule A rule specifying how to remove chunks from a ChunkString, using a matching tag pattern. When applied to a ChunkString, it will find any complete chunk that matches this tag pattern, and un-chunk it.
Function demo A demonstration for the RegexpChunkParser class. A single text is parsed with four different chunk parsers, using a variety of rules and strategies.
Function demo_eval Demonstration code for evaluating a chunk parser, using a ChunkScore. This function assumes that text contains one sentence per line, and that each sentence has the form expected by tree.chunk. It runs the given chunk parser on each sentence in the text, and scores the result...
Function tag_pattern2re_pattern Convert a tag pattern to a regular expression pattern. A "tag pattern" is a modified version of a regular expression, designed for matching sequences of tags. The differences between regular expression patterns and tag patterns are:...
Constant CHUNK_TAG_PATTERN Undocumented
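
The brace encoding described for ChunkString, and the way ChunkRule only chunks tags that are not already part of a chunk, can be illustrated with a small standard-library sketch. It does not use this module: the names `apply_chunk_rule` and `NOT_IN_CHUNK` are hypothetical, and the tag pattern is written directly as a regular expression rather than converted from a tag pattern.

```python
import re

# Hypothetical lookahead: succeeds only when the next brace (if any) is an
# opening one, i.e. the match does not end inside an existing chunk.
NOT_IN_CHUNK = r"(?=[^}]*(?:\{|$))"


def apply_chunk_rule(encoded, tag_regexp):
    """Wrap every match of tag_regexp that is not already chunked in braces."""
    pattern = "(?P<chunk>%s)%s" % (tag_regexp, NOT_IN_CHUNK)
    return re.sub(pattern, r"{\g<chunk>}", encoded)


# One angle-bracketed tag per token; no chunks yet.
encoded = "<DT><JJ><NN><VBD><DT><NN>"

# Optional determiner, any adjectives, one or more nouns.
noun_phrase = r"(?:<DT>)?(?:<JJ>)*(?:<NN>)+"

chunked = apply_chunk_rule(encoded, noun_phrase)
print(chunked)

# Tags that are already inside braces are left alone on a second pass.
print(apply_chunk_rule(chunked, noun_phrase) == chunked)
```

The lookahead is what keeps the rule from nesting chunks: a candidate match that is followed by a closing brace before any opening brace lies inside an existing chunk and is skipped.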
def demo():

A demonstration for the RegexpChunkParser class. A single text is parsed with four different chunk parsers, using a variety of rules and strategies.

def demo_eval(chunkparser, text):

Demonstration code for evaluating a chunk parser, using a ChunkScore. This function assumes that text contains one sentence per line, and that each sentence has the form expected by tree.chunk. It runs the given chunk parser on each sentence in the text and scores the result. It prints the final score (precision, recall, and f-measure) and reports the set of chunks that were missed and the set of chunks that were incorrect. At most 10 missing chunks and 10 incorrect chunks are reported.

Parameters
chunkparser: ChunkParserI - The chunk parser to be tested.
text: str - The chunked tagged text that should be used for evaluation.
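
The scoring that demo_eval reports can be sketched without this module. The following is only an illustration of precision, recall, and a balanced f-measure over sets of chunks; it is not ChunkScore. The function name `score_chunks`, the `(sentence, start, end)` tuples used to identify chunks, and the choice of a balanced f-measure are all assumptions.

```python
def score_chunks(gold, guessed, max_report=10):
    """Compare two sets of chunks and summarise the differences.

    gold and guessed are sets of hashable, sortable chunk identifiers
    (here: hypothetical (sentence, start, end) tuples).
    """
    correct = gold & guessed
    precision = len(correct) / len(guessed) if guessed else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        # Report at most max_report chunks of each kind.
        "missed": sorted(gold - guessed)[:max_report],
        "incorrect": sorted(guessed - gold)[:max_report],
    }


gold = {(0, 0, 2), (0, 3, 5), (1, 0, 1)}
guessed = {(0, 0, 2), (1, 0, 1), (1, 2, 4)}
report = score_chunks(gold, guessed)
print(report["missed"], report["incorrect"])
```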
def tag_pattern2re_pattern(tag_pattern):

Convert a tag pattern to a regular expression pattern. A "tag pattern" is a modified version of a regular expression, designed for matching sequences of tags. The differences between regular expression patterns and tag patterns are:

  • In tag patterns, '<' and '>' act as parentheses; so '<NN>+' matches one or more repetitions of '<NN>', not '<NN' followed by one or more repetitions of '>'.
  • Whitespace in tag patterns is ignored. So '<DT> | <NN>' is equivalent to '<DT>|<NN>'.
  • In tag patterns, '.' is equivalent to '[^{}<>]'; so '<NN.*>' matches any single tag starting with 'NN'.

In particular, tag_pattern2re_pattern performs the following transformations on the given pattern:

  • Replace '.' with '[^<>{}]'
  • Remove any whitespace
  • Add extra parens around '<' and '>', to make '<' and '>' act like parentheses. E.g., so that in '<NN>+', the '+' has scope over the entire '<NN>'; and so that in '<NN|IN>', the '|' has scope over 'NN' and 'IN', but not '<' or '>'.
  • Check to make sure the resulting pattern is valid.
Parameters
tag_pattern: str - The tag pattern to convert to a regular expression pattern.
Returns
str - A regular expression pattern corresponding to tag_pattern.
Raises
ValueError - If tag_pattern is not a valid tag pattern. In particular, tag_pattern should not include braces; and it should not contain nested or mismatched angle-brackets.
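
The transformations listed above can be sketched with the standard library alone. This is a simplified illustration, not the implementation of tag_pattern2re_pattern: the name `sketch_tag_pattern2re` and the `_VALID` check are hypothetical, the steps are ordered for simplicity, and only a '.' that is not directly preceded by a backslash is replaced.

```python
import re

# Simplified validity check (an assumption, modelled on the documented rules):
# plain characters, {m,n}-style repeats, or non-nested <...> tags.
_VALID = re.compile(r"^(?:[^{}<>]|\{\d+,?\}|\{\d*,\d+\}|<[^{}<>]+>)*$")


def sketch_tag_pattern2re(tag_pattern):
    """Convert a tag pattern to a regular expression pattern (sketch)."""
    # Whitespace in tag patterns is ignored.
    pattern = re.sub(r"\s", "", tag_pattern)
    # Reject stray braces and nested or mismatched angle brackets.
    if not _VALID.match(pattern):
        raise ValueError("Bad tag pattern: %r" % tag_pattern)
    # Make '<' and '>' act like parentheses.
    pattern = pattern.replace("<", "(<(").replace(">", ")>)")
    # '.' matches any character that cannot end or delimit a tag.
    pattern = re.sub(r"(?<!\\)\.", "[^{}<>]", pattern)
    return pattern


print(sketch_tag_pattern2re("<NN>+"))
print(sketch_tag_pattern2re("<NN.*>"))
print(sketch_tag_pattern2re("<DT> | <NN>") == sketch_tag_pattern2re("<DT>|<NN>"))
```

With the extra parentheses in place, the '+' in '<NN>+' applies to the whole tag, so the converted pattern matches '<NN><NN>' when used with re.fullmatch.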
CHUNK_TAG_PATTERN =

Undocumented

Value
re.compile(('^((%s|<%s>)*)$' % ('([^\\{\\}<>]|\\{\\d+,?\\}|\\{\\d*,\\d+\\})+',
                               '[^\\{\\}<>]+')))
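
Since the constant is undocumented, a short experiment may help show what the listed value accepts. The sketch below rebuilds the same expression with the standard library and tries it on a few strings; the helper name `looks_valid` and the example strings are hypothetical, and how the module itself uses the constant is not shown here.

```python
import re

# Same construction as the value listed above.
CHUNK_TAG_PATTERN = re.compile(
    "^((%s|<%s>)*)$"
    % ("([^\\{\\}<>]|\\{\\d+,?\\}|\\{\\d*,\\d+\\})+", "[^\\{\\}<>]+")
)


def looks_valid(pattern):
    """Return True if the whole string is accepted by CHUNK_TAG_PATTERN."""
    return CHUNK_TAG_PATTERN.match(pattern) is not None


examples = ["<DT>?<JJ>*<NN>", "<NN>{2,3}", "{<NN>}", "<NN", "<<NN>>"]
for example in examples:
    print(example, looks_valid(example))
```

Braces are accepted only as part of a numeric repetition such as '{2,3}', and angle brackets only as a matched, non-nested pair around a non-empty tag body.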