Undocumented
Class |
|
A rule specifying how to add chunks to a ChunkString, using a matching tag pattern. When applied to a ChunkString, it will find any substring that matches this tag pattern and that is not already part of a chunk, and create a new chunk containing that substring. |
Class |
|
A rule specifying how to add chunks to a ChunkString, using three matching tag patterns: one for the left context, one for the chunk, and one for the right context. When applied to a ChunkString, it will find any substring that matches the chunk tag pattern, is surrounded by substrings that match the two context patterns, and is not already part of a chunk. It then creates a new chunk containing the substring that matched the chunk tag pattern. |
Class |
|
A string-based encoding of a particular chunking of a text. Internally, the ChunkString class uses a single string to encode the chunking of the input text. This string contains a sequence of angle-bracket delimited tags, with chunking indicated by braces... |
Class |
|
A rule specifying how to expand chunks in a ChunkString to the left, using two matching tag patterns: a left pattern and a right pattern. When applied to a ChunkString, it will find any chunk whose beginning matches the right pattern and that is immediately preceded by a strip whose end matches the left pattern... |
Class |
|
A rule specifying how to expand chunks in a ChunkString to the right, using two matching tag patterns: a left pattern and a right pattern. When applied to a ChunkString, it will find any chunk whose end matches the left pattern and that is immediately followed by a strip whose beginning matches the right pattern... |
Class |
|
A rule specifying how to merge chunks in a ChunkString, using two matching tag patterns: a left pattern and a right pattern. When applied to a ChunkString, it will find any chunk whose end matches the left pattern and that is immediately followed by a chunk whose beginning matches the right pattern... |
Class |
|
A regular expression based chunk parser. RegexpChunkParser uses a sequence of "rules" to find chunks of a single type within a text. The chunking of the text is encoded using a ChunkString, and each rule acts by modifying the chunking in the ... |
Class |
|
A rule specifying how to modify the chunking in a ChunkString, using a transformational regular expression. The RegexpChunkRule class itself can be used to implement any transformational rule based on regular expressions... |
Class |
|
A grammar based chunk parser. chunk.RegexpParser uses a set of regular expression patterns to specify the behavior of the parser. The chunking of the text is encoded using a ChunkString, and each rule acts by modifying the chunking in the ... |
Class |
|
A rule specifying how to split chunks in a ChunkString, using two matching tag patterns: a left pattern, and a right pattern. When applied to a ChunkString, it will find any chunk that matches the left pattern followed by the right pattern... |
Class |
|
A rule specifying how to remove strips from chunks in a ChunkString, using a matching tag pattern. When applied to a ChunkString, it will find any substring that matches this tag pattern and that is contained in a chunk, and remove it from that chunk, thus creating two new chunks. |
Class |
|
A rule specifying how to remove chunks from a ChunkString, using a matching tag pattern. When applied to a ChunkString, it will find any complete chunk that matches this tag pattern, and un-chunk it. |
Function | demo |
A demonstration for the RegexpChunkParser class. A single text is parsed with four different chunk parsers, using a variety of rules and strategies. |
Function | demo |
Demonstration code for evaluating a chunk parser, using a ChunkScore. This function assumes that text contains one sentence per line, and that each sentence has the form expected by tree.chunk. It runs the given chunk parser on each sentence in the text, and scores the result... |
Function | tag_pattern2re_pattern |
Convert a tag pattern to a regular expression pattern. A "tag pattern" is a modified version of a regular expression, designed for matching sequences of tags. The differences between regular expression patterns and tag patterns are:... |
Constant | CHUNK |
Undocumented |
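The string encoding and the rule classes summarized in the table above can be illustrated with plain regular expressions. The following is a minimal sketch, not the module's implementation: the helper names (encode, apply_chunk, apply_strip, apply_merge) and the two lookahead patterns are assumptions made for illustration, and the tag patterns are passed in already written as ordinary regular expressions.

```python
import re

# Assumed lookahead: succeeds only outside a chunk
# (the next brace, if there is one, opens a chunk).
OUTSIDE_CHUNK = r"(?=[^}]*(?:\{|$))"
# Assumed lookahead: succeeds only inside a chunk
# (the next brace closes a chunk).
INSIDE_CHUNK = r"(?=[^{]*\})"


def encode(tags):
    """Encode a tag sequence as angle-bracket delimited tags, with no chunks."""
    return "".join("<" + tag + ">" for tag in tags)


def apply_chunk(encoded, tag_regex):
    """Wrap every match that is not already inside a chunk in braces."""
    pattern = "(?P<chunk>" + tag_regex + ")" + OUTSIDE_CHUNK
    return re.sub(pattern, r"{\g<chunk>}", encoded)


def apply_strip(encoded, tag_regex):
    """Remove every match from its enclosing chunk, splitting the chunk."""
    pattern = "(?P<strip>" + tag_regex + ")" + INSIDE_CHUNK
    result = re.sub(pattern, r"}\g<strip>{", encoded)
    # Drop empty chunks left behind when the match sat at a chunk boundary.
    return result.replace("{}", "")


def apply_merge(encoded, left_regex, right_regex):
    """Join two adjacent chunks when the boundary matches left then right."""
    pattern = "(?P<left>" + left_regex + r")\}\{(?=" + right_regex + ")"
    return re.sub(pattern, r"\g<left>", encoded)


encoded = encode(["DT", "JJ", "NN", "VBD", "DT", "NN"])
noun_phrases = apply_chunk(encoded, r"(?:<DT>)?(?:<JJ>)*(?:<NN>)+")
everything = apply_chunk(encoded, r"(?:<[^<>{}]+>)+")
without_verb = apply_strip(everything, r"<VBD>")
merged = apply_merge("{<DT><JJ>}{<NN>}", r"<JJ>", r"<NN>")

print(noun_phrases)   # {<DT><JJ><NN>}<VBD>{<DT><NN>}
print(without_verb)   # {<DT><JJ><NN>}<VBD>{<DT><NN>}
print(merged)         # {<DT><JJ><NN>}
```

In this sketch, chunking noun phrases directly and chunking everything before stripping the verb reach the same encoding, which shows how a sequence of rules can arrive at one chunking by different routes.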
A demonstration for the RegexpChunkParser class. A single text is parsed with four different chunk parsers, using a variety of rules and strategies.
Demonstration code for evaluating a chunk parser, using a ChunkScore. This function assumes that text contains one sentence per line, and that each sentence has the form expected by tree.chunk. It runs the given chunk parser on each sentence in the text, and scores the result. It prints the final score (precision, recall, and f-measure) and reports the set of chunks that were missed and the set of chunks that were incorrect. (At most 10 missing chunks and 10 incorrect chunks are reported.)
Parameters | |
chunkparser:ChunkParserI | The chunkparser to be tested |
text:str | The chunked tagged text that should be used for evaluation. |
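The scoring described above can be sketched with set arithmetic. This is an illustration under stated assumptions rather than the ChunkScore implementation: the function name score_chunks, the representation of a chunk as a (sentence index, start, end) tuple, and the use of the balanced f-measure (harmonic mean of precision and recall) are all hypothetical choices.

```python
def score_chunks(gold, guessed, max_reported=10):
    """Compare two sets of chunks and summarize the result.

    gold and guessed are sets of hashable chunk identifiers; here each
    chunk is assumed to be a (sentence_index, start, end) tuple.
    """
    correct = gold & guessed
    precision = len(correct) / len(guessed) if guessed else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    if precision + recall > 0:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        # Chunks in the gold standard that the parser did not find.
        "missed": sorted(gold - guessed)[:max_reported],
        # Chunks the parser proposed that are not in the gold standard.
        "incorrect": sorted(guessed - gold)[:max_reported],
    }


gold = {(0, 0, 3), (0, 4, 6), (1, 0, 2)}
guessed = {(0, 0, 3), (0, 4, 5), (1, 0, 2), (1, 3, 4)}
report = score_chunks(gold, guessed)
print(report["missed"])     # [(0, 4, 6)]
print(report["incorrect"])  # [(0, 4, 5), (1, 3, 4)]
```

Two of the four guessed chunks are correct, so precision is 0.5, recall is 2/3, and the balanced f-measure is 4/7.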
Convert a tag pattern to a regular expression pattern. A "tag pattern" is a modified version of a regular expression, designed for matching sequences of tags. The differences between regular expression patterns and tag patterns are:
- In tag patterns, '<' and '>' act as parentheses; so '<NN>+' matches one or more repetitions of '<NN>', not '<NN' followed by one or more repetitions of '>'.
- Whitespace in tag patterns is ignored. So '<DT> | <NN>' is equivalent to '<DT>|<NN>'.
- In tag patterns, '.' is equivalent to '[^{}<>]'; so '<NN.*>' matches any single tag starting with 'NN'.
In particular, tag_pattern2re_pattern performs the following transformations on the given pattern:
- Replace '.' with '[^<>{}]'
- Remove any whitespace
- Add extra parens around '<' and '>', to make '<' and '>' act like parentheses. E.g., so that in '<NN>+', the '+' has scope over the entire '<NN>'; and so that in '<NN|IN>', the '|' has scope over 'NN' and 'IN', but not '<' or '>'.
- Check to make sure the resulting pattern is valid.
Parameters | |
tag_pattern | The tag pattern to convert to a regular expression pattern. |
Returns | |
str | A regular expression pattern corresponding to tag_pattern. |
Raises | |
ValueError | If tag_pattern is not a valid tag pattern. In particular, tag_pattern should not include braces; and it should not contain nested or mismatched angle-brackets. |
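The transformations and checks listed above can be sketched in a few lines. This is a simplified illustration, not the function's actual source: the name sketch_tag_pattern_to_regex is hypothetical, escaped dots are handled only in the simple case of a single preceding backslash, and braces are rejected outright, following the Raises description.

```python
import re

# What '.' stands for inside a tag pattern, as described above.
TAG_CHAR = "[^<>{}]"


def sketch_tag_pattern_to_regex(tag_pattern):
    """Convert a tag pattern to a regular expression pattern (simplified)."""
    # Remove any whitespace.
    pattern = re.sub(r"\s+", "", tag_pattern)

    # Braces are reserved for marking chunks, so reject them.
    if "{" in pattern or "}" in pattern:
        raise ValueError("Bad tag pattern (contains braces): %r" % tag_pattern)

    # Angle brackets must be balanced and must not nest.
    depth = 0
    for char in pattern:
        if char == "<":
            depth += 1
        elif char == ">":
            depth -= 1
        if depth not in (0, 1):
            raise ValueError("Bad tag pattern (angle brackets): %r" % tag_pattern)
    if depth != 0:
        raise ValueError("Bad tag pattern (angle brackets): %r" % tag_pattern)

    # Add extra parens so that '<' and '>' act like parentheses.
    pattern = pattern.replace("<", "(<(").replace(">", ")>)")

    # Replace each unescaped '.' with the tag-character class. This runs
    # after the bracket step so the class's own '<' and '>' stay untouched.
    pattern = re.sub(r"(?<!\\)\.", lambda match: TAG_CHAR, pattern)

    # Check that the resulting pattern is a valid regular expression.
    try:
        re.compile(pattern)
    except re.error as exc:
        raise ValueError("Bad tag pattern: %r" % tag_pattern) from exc
    return pattern


print(sketch_tag_pattern_to_regex("<NN>+"))        # (<(NN)>)+
print(sketch_tag_pattern_to_regex("<DT> | <NN>"))  # (<(DT)>)|(<(NN)>)
print(sketch_tag_pattern_to_regex("<NN.*>"))       # (<(NN[^<>{}]*)>)
```

Because '.' becomes a class that excludes angle brackets, the converted '<NN.*>' matches '<NNS>' as a whole tag but cannot run across a tag boundary to match '<NN><NN>'.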