«
class documentation

A simple top-down CFG parser that parses texts by recursively expanding the fringe of a Tree, and matching it against a text.

RecursiveDescentParser uses a list of tree locations called a "frontier" to remember which subtrees have not yet been expanded and which leaves have not yet been matched against the text. Each tree location consists of a list of child indices specifying the path from the root of the tree to a subtree or a leaf; see the reference documentation for Tree for more information about tree locations.

When the parser begins parsing a text, it constructs a tree containing only the start symbol, and a frontier containing the location of the tree's root node. It then extends the tree to cover the text, using the following recursive procedure:

  • If the frontier is empty, and the text is covered by the tree, then return the tree as a possible parse.
  • If the frontier is empty, and the text is not covered by the tree, then return no parses.
  • If the first element of the frontier is a subtree, then use CFG productions to "expand" it. For each applicable production, add the expanded subtree's children to the frontier, and recursively find all parses that can be generated by the new tree and frontier.
  • If the first element of the frontier is a token, then "match" it against the next token from the text. Remove the token from the frontier, and recursively find all parses that can be generated by the new tree and frontier.
See Also
nltk.grammar
Method __init__ Create a new RecursiveDescentParser, that uses grammar to parse texts.
Method grammar No summary
Method parse When possible this list is sorted from most likely to least likely.
Method trace Set the level of tracing output that should be generated when parsing a text.
Method _expand No summary
Method _match No summary
Method _parse Recursively expand and match each elements of tree specified by frontier, to cover remaining_text. Return a list of all parses found.
Method _production_to_tree No summary
Method _trace_backtrack Undocumented
Method _trace_expand Undocumented
Method _trace_fringe Print trace output displaying the fringe of tree. The fringe of tree consists of all of its leaves and all of its childless subtrees.
Method _trace_match Undocumented
Method _trace_start Undocumented
Method _trace_succeed Undocumented
Method _trace_tree Print trace output displaying the parser's current state.
Instance Variable _grammar Undocumented
Instance Variable _trace Undocumented

Inherited from ParserI:

Method parse_all No summary
Method parse_one No summary
Method parse_sents Apply self.parse() to each element of sents. :rtype: iter(iter(Tree))
def __init__(self, grammar, trace=0): (source)

Create a new RecursiveDescentParser, that uses grammar to parse texts.

Parameters
grammar:CFGThe grammar used to parse texts.
trace:intThe level of tracing that should be used when parsing a text. 0 will generate no tracing output; and higher numbers will produce more verbose tracing output.
def grammar(self): (source)
Returns
The grammar used by this parser.
def parse(self, tokens): (source)

When possible this list is sorted from most likely to least likely.

Parameters
tokensUndocumented
sent:list(str)The sentence to be parsed
Returns
iter(Tree)An iterator that generates parse trees for the sentence.
def trace(self, trace=2): (source)

Set the level of tracing output that should be generated when parsing a text.

Parameters
trace:intThe trace level. A trace level of 0 will generate no tracing output; and higher trace levels will produce more verbose tracing output.
Returns
NoneUndocumented
def _expand(self, remaining_text, tree, frontier, production=None): (source)
Parameters
remaining_text:list(str)The portion of the text that is not yet covered by tree.
tree:TreeA partial structure for the text that is currently being parsed. The elements of tree that are specified by frontier have not yet been expanded or matched.
frontier:list(tuple(int))A list of the locations within tree of all subtrees that have not yet been expanded, and all leaves that have not yet been matched.
productionUndocumented
Returns
iter(Tree)An iterator of all parses that can be generated by expanding the first element of frontier with production. In particular, if the first element of frontier is a subtree whose node type is equal to production's left hand side, then add a child to that subtree for each element of production's right hand side; and return all parses that can be generated by matching and expanding the remaining elements of frontier. If the first element of frontier is not a subtree whose node type is equal to production's left hand side, then return an empty list. If production is not specified, then return a list of all parses that can be generated by expanding the first element of frontier with any CFG production.
def _match(self, rtext, tree, frontier): (source)
Parameters
rtext:list(str)The portion of the text that is not yet covered by tree.
tree:TreeA partial structure for the text that is currently being parsed. The elements of tree that are specified by frontier have not yet been expanded or matched.
frontier:list of tuple of intA list of the locations within tree of all subtrees that have not yet been expanded, and all leaves that have not yet been matched.
Returns
iter(Tree)an iterator of all parses that can be generated by matching the first element of frontier against the first token in rtext. In particular, if the first element of frontier has the same type as the first token in rtext, then substitute the token into tree; and return all parses that can be generated by matching and expanding the remaining elements of frontier. If the first element of frontier does not have the same type as the first token in rtext, then return empty list.
def _parse(self, remaining_text, tree, frontier): (source)

Recursively expand and match each elements of tree specified by frontier, to cover remaining_text. Return a list of all parses found.

Parameters
remaining_text:list(str)The portion of the text that is not yet covered by tree.
tree:TreeA partial structure for the text that is currently being parsed. The elements of tree that are specified by frontier have not yet been expanded or matched.
frontier:list(tuple(int))A list of the locations within tree of all subtrees that have not yet been expanded, and all leaves that have not yet been matched. This list sorted in left-to-right order of location within the tree.
Returns
iter(Tree)An iterator of all parses that can be generated by matching and expanding the elements of tree specified by frontier.
def _production_to_tree(self, production): (source)
Parameters
production:ProductionThe CFG production that licenses the tree token that should be returned.
Returns
TreeThe Tree that is licensed by production. In particular, given the production [lhs -> elt[1] ... elt[n]] return a tree that has a node lhs.symbol, and n children. For each nonterminal element elt[i] in the production, the tree token has a childless subtree with node value elt[i].symbol; and for each terminal element elt[j], the tree token has a leaf token with type elt[j].
def _trace_backtrack(self, tree, frontier, toks=None): (source)

Undocumented

def _trace_expand(self, tree, frontier, production): (source)

Undocumented

def _trace_fringe(self, tree, treeloc=None): (source)

Print trace output displaying the fringe of tree. The fringe of tree consists of all of its leaves and all of its childless subtrees.

Returns
NoneUndocumented
def _trace_match(self, tree, frontier, tok): (source)

Undocumented

def _trace_start(self, tree, frontier, text): (source)

Undocumented

def _trace_succeed(self, tree, frontier): (source)

Undocumented

def _trace_tree(self, tree, frontier, operation): (source)

Print trace output displaying the parser's current state.

Parameters
treeUndocumented
frontierUndocumented
operationA character identifying the operation that generated the current state.
Returns
NoneUndocumented

Undocumented