nltk.parse.nonprojectivedependencyparser.ProbabilisticNonprojectiveParser

class documentation

class ProbabilisticNonprojectiveParser(object): (source)

Constructor: ProbabilisticNonprojectiveParser()

A probabilistic non-projective dependency parser.

Nonprojective dependencies allows for "crossing branches" in the parse tree which is necessary for representing particular linguistic phenomena, or even typical parses in some languages. This parser follows the MST parsing algorithm, outlined in McDonald(2005), which likens the search for the best non-projective parse to finding the maximum spanning tree in a weighted directed graph.

>>> class Scorer(DependencyScorerI):
...     def train(self, graphs):
...         pass
...
...     def score(self, graph):
...         return [
...             [[], [5],  [1],  [1]],
...             [[], [],   [11], [4]],
...             [[], [10], [],   [5]],
...             [[], [8],  [8],  []],
...         ]

>>> npp = ProbabilisticNonprojectiveParser()
>>> npp.train([], Scorer())

>>> parses = npp.parse(['v1', 'v2', 'v3'], [None, None, None])
>>> len(list(parses))
1

Rule based example

>>> from nltk.grammar import DependencyGrammar

>>> grammar = DependencyGrammar.fromstring('''
... 'taught' -> 'play' | 'man'
... 'man' -> 'the' | 'in'
... 'in' -> 'corner'
... 'corner' -> 'the'
... 'play' -> 'golf' | 'dachshund' | 'to'
... 'dachshund' -> 'his'
... ''')

>>> ndp = NonprojectiveDependencyParser(grammar)
>>> parses = ndp.parse(['the', 'man', 'in', 'the', 'corner', 'taught', 'his', 'dachshund', 'to', 'play', 'golf'])
>>> len(list(parses))
4

Method	`__init__`	Creates a new non-projective parser.
Method	`best_incoming_arc`	Returns the source of the best incoming arc to the node with address: node_index
Method	`collapse_nodes`	Takes a list of nodes that have been identified to belong to a cycle, and collapses them into on larger node. The arcs of all nodes in the graph must be updated to account for this.
Method	`compute_max_subtract_score`	When updating scores the score of the highest-weighted incoming arc is subtracted upon collapse. This returns the correct amount to subtract from that edge.
Method	`compute_original_indexes`	As nodes are collapsed into others, they are replaced by the new node in the graph, but it's still necessary to keep track of what these original nodes were. This takes a list of node addresses and replaces any collapsed node addresses with their original addresses.
Method	`initialize_edge_scores`	Assigns a score to every edge in the `DependencyGraph` graph. These scores are generated via the parser's scorer which was assigned during the training process.
Method	`original_best_arc`	Undocumented
Method	`parse`	Parses a list of tokens in accordance to the MST parsing algorithm for non-projective dependency parses. Assumes that the tokens to be parsed have already been tagged and those tags are provided. Various scoring methods can be used by implementing the ...
Method	`train`	Trains a `DependencyScorerI` from a set of `DependencyGraph` objects, and establishes this as the parser's scorer. This is used to initialize the scores on a `DependencyGraph` during the parsing procedure.
Method	`update_edge_scores`	Updates the edge scores to reflect a collapse operation into new_node.
Instance Variable	`inner_nodes`	Undocumented
Instance Variable	`scores`	Undocumented
Instance Variable	`_scorer`	Undocumented

def __init__(self): (source) ¶

Creates a new non-projective parser.

def best_incoming_arc(self, node_index): (source) ¶

Returns the source of the best incoming arc to the node with address: node_index

the node that is arced to.

Parameters
node_index:integer.	The address of the 'destination' node,

def collapse_nodes(self, new_node, cycle_path, g_graph, b_graph, c_graph): (source) ¶

Takes a list of nodes that have been identified to belong to a cycle, and collapses them into on larger node. The arcs of all nodes in the graph must be updated to account for this.

Parameters
new_node:Node.	A Node (Dictionary) to collapse the cycle nodes into.
cycle_path:A list of integers.	A list of node addresses, each of which is in the cycle.
g_graph	Undocumented
b_graph	Undocumented
c_graph	Undocumented
g_graph, b_graph, c_graph:DependencyGraph	Graphs which need to be updated.

def compute_max_subtract_score(self, column_index, cycle_indexes): (source) ¶

When updating scores the score of the highest-weighted incoming arc is subtracted upon collapse. This returns the correct amount to subtract from that edge.

to a particular node being updated :type cycle_indexes: A list of integers. :param cycle_indexes: Only arcs from cycle nodes are considered. This is a list of such nodes addresses.

Parameters
column_index:integer.	A index representing the column of incoming arcs
cycle_indexes	Undocumented

def compute_original_indexes(self, new_indexes): (source) ¶

As nodes are collapsed into others, they are replaced by the new node in the graph, but it's still necessary to keep track of what these original nodes were. This takes a list of node addresses and replaces any collapsed node addresses with their original addresses.

subsumed nodes.

Parameters
new_indexes:A list of integers.	A list of node addresses to check for

def initialize_edge_scores(self, graph): (source) ¶

Assigns a score to every edge in the DependencyGraph graph. These scores are generated via the parser's scorer which was assigned during the training process.

Parameters
graph:DependencyGraph	A dependency graph to assign scores to.

def original_best_arc(self, node_index): (source) ¶

Undocumented

def parse(self, tokens, tags): (source) ¶

Parses a list of tokens in accordance to the MST parsing algorithm for non-projective dependency parses. Assumes that the tokens to be parsed have already been tagged and those tags are provided. Various scoring methods can be used by implementing the DependencyScorerI interface and passing it to the training algorithm.

Parameters
tokens:list(str)	A list of words or punctuation to be parsed.
tags:list(str)	A list of tags corresponding by index to the words in the tokens list.
Returns
iter(DependencyGraph)	An iterator of non-projective parses.

def train(self, graphs, dependency_scorer): (source) ¶

Trains a DependencyScorerI from a set of DependencyGraph objects, and establishes this as the parser's scorer. This is used to initialize the scores on a DependencyGraph during the parsing procedure.

Parameters
graphs:list(DependencyGraph)	A list of dependency graphs to train the scorer.
dependency_scorer:DependencyScorerI	A scorer which implements the `DependencyScorerI` interface.

def update_edge_scores(self, new_node, cycle_path): (source) ¶

Updates the edge scores to reflect a collapse operation into new_node.

Parameters
new_node:A Node.	The node which cycle nodes are collapsed into.
cycle_path:A list of integers.	A list of node addresses that belong to the cycle.

inner_nodes: dict = (source) ¶

Undocumented

scores = (source) ¶

Undocumented

_scorer = (source) ¶

Undocumented