class RegexpChunkRule(object):
Known subclasses: nltk.chunk.regexp.ChunkRule, nltk.chunk.regexp.ChunkRuleWithContext, nltk.chunk.regexp.ExpandLeftRule, nltk.chunk.regexp.ExpandRightRule, nltk.chunk.regexp.MergeRule, nltk.chunk.regexp.SplitRule, nltk.chunk.regexp.StripRule, nltk.chunk.regexp.UnChunkRule
Constructor: RegexpChunkRule(regexp, repl, descr)
A rule specifying how to modify the chunking in a ChunkString, using a transformational regular expression. The RegexpChunkRule class itself can be used to implement any transformational rule based on regular expressions. There are also a number of subclasses, which can be used to implement simpler types of rules, based on matching regular expressions.
Each RegexpChunkRule has a regular expression and a replacement expression. When a RegexpChunkRule is "applied" to a ChunkString, it searches the ChunkString for any substring that matches the regular expression, and replaces it using the replacement expression. This search/replace operation has the same semantics as re.sub.
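Because applying a rule has the same semantics as re.sub, its effect can be sketched with the standard library alone. The example below assumes the chunk string is encoded as a flat run of angle-bracketed tags; the sample string and the pattern are illustrative and not taken from NLTK. Each tag is wrapped in a non-capturing group because, in a normal regular expression, `<DT>?` would make only the closing `>` optional.

```python
import re

# Assumed flat encoding of a tagged sentence: one <TAG> per token, no chunks yet.
encoded = "<DT><JJ><NN><VBD><DT><NN>"

# A transformational rule is a (regexp, repl) pair.
# This one wraps "optional determiner, any adjectives, one noun" in braces.
regexp = re.compile(r"((?:<DT>)?(?:<JJ>)*<NN>)")
repl = r"{\1}"

# Applying the rule is a plain search-and-replace over the encoded string.
chunked = regexp.sub(repl, encoded)
print(chunked)  # {<DT><JJ><NN>}<VBD>{<DT><NN>}
```

The replacement only inserts braces around the matched text, so the sequence of tags is left unchanged.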
Each RegexpChunkRule also has a description string, which gives a short (typically less than 75 characters) description of the purpose of the rule.
The transformation defined by this RegexpChunkRule should only add and remove braces; it should not modify the sequence of angle-bracket delimited tags. Furthermore, the transformation must not result in nested or mismatched bracketing.
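The two constraints above can be checked mechanically. The following is a minimal sketch using only the standard library; the helper names (`tags_only`, `braces_well_formed`, `is_valid_transformation`) are hypothetical and are not part of NLTK, and the sketch assumes that tags themselves never contain brace characters.

```python
import re

def tags_only(encoded):
    """Return the encoded string with all chunk braces removed."""
    return re.sub(r"[{}]", "", encoded)

def braces_well_formed(encoded):
    """Return True if braces are balanced and never nested."""
    depth = 0
    for ch in encoded:
        if ch == "{":
            depth += 1
            if depth > 1:   # a chunk opened inside another chunk
                return False
        elif ch == "}":
            depth -= 1
            if depth < 0:   # a close brace with no matching open brace
                return False
    return depth == 0       # every open brace must have been closed

def is_valid_transformation(before, after):
    """Check that only braces changed and the bracketing is legal."""
    return tags_only(before) == tags_only(after) and braces_well_formed(after)

print(is_valid_transformation("<DT><NN><VBD>", "{<DT><NN>}<VBD>"))  # True
print(is_valid_transformation("<DT><NN>", "{{<DT>}<NN>}"))          # False
print(is_valid_transformation("<DT><NN>", "{<NN>}"))                # False
```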
Static Method | fromstring | Create a RegexpChunkRule from a string description. |
Method | __init__ | Construct a new RegexpChunkRule. |
Method | __repr__ | Return a string representation of this rule. |
Method | apply | Apply this rule to the given ChunkString. See the class reference documentation for a description of what it means to apply a rule. |
Method | descr | Return a short description of the purpose and/or effect of this rule. |
Instance Variable | _descr | Undocumented |
Instance Variable | _regexp | Undocumented |
Instance Variable | _repl | Undocumented |
Create a RegexpChunkRule from a string description. Currently, the following formats are supported:
    {regexp}         # chunk rule
    }regexp{         # strip rule
    regexp}{regexp   # split rule
    regexp{}regexp   # merge rule
Where regexp is a regular expression for the rule. Any text following the comment marker (#) will be used as the rule's description:
>>> from nltk.chunk.regexp import RegexpChunkRule
>>> RegexpChunkRule.fromstring('{<DT>?<NN.*>+}')
<ChunkRule: '<DT>?<NN.*>+'>
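As a further sketch, the four supported formats can be parsed in a loop to see which rule class each one produces and how the trailing comment becomes the description. The rule strings below are illustrative examples, and the snippet assumes an NLTK installation in which `fromstring` and `descr` behave as documented here.

```python
from nltk.chunk.regexp import RegexpChunkRule

# One illustrative rule string per supported format; the text after '#'
# is expected to become the rule's description.
rule_strings = [
    "{<DT>?<NN.*>+}    # chunk determiner and nouns",
    "}<VB.*>{          # strip verbs from chunks",
    "<NN>}{<DT>        # split between noun and determiner",
    "<NN>{}<NN>        # merge adjacent noun chunks",
]

rules = [RegexpChunkRule.fromstring(s) for s in rule_strings]

# Show the concrete subclass chosen for each format and its description.
for rule in rules:
    print(type(rule).__name__, "|", rule.descr())
```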
Overridden in: nltk.chunk.regexp.ChunkRule, nltk.chunk.regexp.ChunkRuleWithContext, nltk.chunk.regexp.ExpandLeftRule, nltk.chunk.regexp.ExpandRightRule, nltk.chunk.regexp.MergeRule, nltk.chunk.regexp.SplitRule, nltk.chunk.regexp.StripRule, nltk.chunk.regexp.UnChunkRule
Construct a new RegexpChunkRule.
Parameters | |
regexp:regexp or str | The regular expression for this RegexpChunkRule. When this rule is applied to a ChunkString, any substring that matches regexp will be replaced using the replacement string repl. Note that this must be a normal regular expression, not a tag pattern. |
repl:str | The replacement expression for this RegexpChunkRule. When this rule is applied to a ChunkString, any substring that matches regexp will be replaced using repl. |
descr:str | A short description of the purpose and/or effect of this rule. |
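A minimal sketch of constructing a rule directly from the three parameters above. The pattern, replacement, and description are illustrative; since regexp must be a normal regular expression rather than a tag pattern, each tag is grouped explicitly.

```python
from nltk.chunk.regexp import RegexpChunkRule

# regexp: a normal regular expression over the angle-bracketed tag string.
# repl:   a re.sub-style replacement; here it wraps the match in braces.
# descr:  a short human-readable description of the rule.
rule = RegexpChunkRule(
    r"((?:<DT>)?<NN>)",
    r"{\1}",
    "Chunk an optional determiner followed by a noun",
)

print(rule.descr())  # the description string passed to the constructor
print(repr(rule))    # the representation omits the description
```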
Overridden in: nltk.chunk.regexp.ChunkRule, nltk.chunk.regexp.ChunkRuleWithContext, nltk.chunk.regexp.ExpandLeftRule, nltk.chunk.regexp.ExpandRightRule, nltk.chunk.regexp.MergeRule, nltk.chunk.regexp.SplitRule, nltk.chunk.regexp.StripRule, nltk.chunk.regexp.UnChunkRule
Return a string representation of this rule. It has the form:
<RegexpChunkRule: '{<IN|VB.*>}'->'<IN>'>
Note that this representation does not include the description string; that string can be accessed separately with the descr() method.
Returns | |
str | Undocumented |
Apply this rule to the given ChunkString. See the class reference documentation for a description of what it means to apply a rule.
Parameters | |
chunkstr:ChunkString | The chunkstring to which this rule is applied. |
Returns | |
None | Undocumented |
Raises | |
ValueError | If this transformation generated an invalid chunkstring. |
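The following sketch applies a rule to a ChunkString built from a small tagged sentence and then converts the result back to a tree. The sentence, the rule, and the "NP" chunk label are illustrative; the snippet assumes `ChunkString` accepts a flat `nltk.tree.Tree` of (word, tag) pairs and provides `to_chunkstruct`, as in current NLTK releases.

```python
from nltk.tree import Tree
from nltk.chunk.regexp import ChunkString, RegexpChunkRule

# A flat, unchunked sentence: each leaf is a (word, tag) pair.
sentence = Tree("S", [("the", "DT"), ("little", "JJ"), ("cat", "NN"), ("sat", "VBD")])
chunkstr = ChunkString(sentence)

rule = RegexpChunkRule(
    r"((?:<DT>)?(?:<JJ>)*<NN>)",
    r"{\1}",
    "Chunk an optional determiner, adjectives, and a noun",
)

# apply() modifies the ChunkString in place and returns None.
result = rule.apply(chunkstr)

# Convert the modified chunk string back into a tree with NP chunks.
chunked = chunkstr.to_chunkstruct("NP")
print(chunked)
```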