class documentation

class ProbabilisticNonprojectiveParser(object): (source)

Constructor: ProbabilisticNonprojectiveParser()

View In Hierarchy

A probabilistic non-projective dependency parser.

Nonprojective dependencies allows for "crossing branches" in the parse tree which is necessary for representing particular linguistic phenomena, or even typical parses in some languages. This parser follows the MST parsing algorithm, outlined in McDonald(2005), which likens the search for the best non-projective parse to finding the maximum spanning tree in a weighted directed graph.

>>> class Scorer(DependencyScorerI):
...     def train(self, graphs):
...         pass
...
...     def score(self, graph):
...         return [
...             [[], [5],  [1],  [1]],
...             [[], [],   [11], [4]],
...             [[], [10], [],   [5]],
...             [[], [8],  [8],  []],
...         ]
>>> npp = ProbabilisticNonprojectiveParser()
>>> npp.train([], Scorer())
>>> parses = npp.parse(['v1', 'v2', 'v3'], [None, None, None])
>>> len(list(parses))
1

Rule based example

>>> from nltk.grammar import DependencyGrammar
>>> grammar = DependencyGrammar.fromstring('''
... 'taught' -> 'play' | 'man'
... 'man' -> 'the' | 'in'
... 'in' -> 'corner'
... 'corner' -> 'the'
... 'play' -> 'golf' | 'dachshund' | 'to'
... 'dachshund' -> 'his'
... ''')
>>> ndp = NonprojectiveDependencyParser(grammar)
>>> parses = ndp.parse(['the', 'man', 'in', 'the', 'corner', 'taught', 'his', 'dachshund', 'to', 'play', 'golf'])
>>> len(list(parses))
4
Method __init__ Creates a new non-projective parser.
Method best_incoming_arc Returns the source of the best incoming arc to the node with address: node_index
Method collapse_nodes Takes a list of nodes that have been identified to belong to a cycle, and collapses them into on larger node. The arcs of all nodes in the graph must be updated to account for this.
Method compute_max_subtract_score When updating scores the score of the highest-weighted incoming arc is subtracted upon collapse. This returns the correct amount to subtract from that edge.
Method compute_original_indexes As nodes are collapsed into others, they are replaced by the new node in the graph, but it's still necessary to keep track of what these original nodes were. This takes a list of node addresses and replaces any collapsed node addresses with their original addresses.
Method initialize_edge_scores Assigns a score to every edge in the DependencyGraph graph. These scores are generated via the parser's scorer which was assigned during the training process.
Method original_best_arc Undocumented
Method parse Parses a list of tokens in accordance to the MST parsing algorithm for non-projective dependency parses. Assumes that the tokens to be parsed have already been tagged and those tags are provided. Various scoring methods can be used by implementing the ...
Method train Trains a DependencyScorerI from a set of DependencyGraph objects, and establishes this as the parser's scorer. This is used to initialize the scores on a DependencyGraph during the parsing procedure.
Method update_edge_scores Updates the edge scores to reflect a collapse operation into new_node.
Instance Variable inner_nodes Undocumented
Instance Variable scores Undocumented
Instance Variable _scorer Undocumented
def __init__(self): (source)

Creates a new non-projective parser.

def best_incoming_arc(self, node_index): (source)

Returns the source of the best incoming arc to the node with address: node_index

the node that is arced to.

Parameters
node_index:integer.The address of the 'destination' node,
def collapse_nodes(self, new_node, cycle_path, g_graph, b_graph, c_graph): (source)

Takes a list of nodes that have been identified to belong to a cycle, and collapses them into on larger node. The arcs of all nodes in the graph must be updated to account for this.

Parameters
new_node:Node.A Node (Dictionary) to collapse the cycle nodes into.
cycle_path:A list of integers.A list of node addresses, each of which is in the cycle.
g_graphUndocumented
b_graphUndocumented
c_graphUndocumented
g_graph, b_graph, c_graph:DependencyGraphGraphs which need to be updated.
def compute_max_subtract_score(self, column_index, cycle_indexes): (source)

When updating scores the score of the highest-weighted incoming arc is subtracted upon collapse. This returns the correct amount to subtract from that edge.

to a particular node being updated :type cycle_indexes: A list of integers. :param cycle_indexes: Only arcs from cycle nodes are considered. This is a list of such nodes addresses.

Parameters
column_index:integer.A index representing the column of incoming arcs
cycle_indexesUndocumented
def compute_original_indexes(self, new_indexes): (source)

As nodes are collapsed into others, they are replaced by the new node in the graph, but it's still necessary to keep track of what these original nodes were. This takes a list of node addresses and replaces any collapsed node addresses with their original addresses.

subsumed nodes.

Parameters
new_indexes:A list of integers.A list of node addresses to check for
def initialize_edge_scores(self, graph): (source)

Assigns a score to every edge in the DependencyGraph graph. These scores are generated via the parser's scorer which was assigned during the training process.

Parameters
graph:DependencyGraphA dependency graph to assign scores to.
def original_best_arc(self, node_index): (source)

Undocumented

def parse(self, tokens, tags): (source)

Parses a list of tokens in accordance to the MST parsing algorithm for non-projective dependency parses. Assumes that the tokens to be parsed have already been tagged and those tags are provided. Various scoring methods can be used by implementing the DependencyScorerI interface and passing it to the training algorithm.

Parameters
tokens:list(str)A list of words or punctuation to be parsed.
tags:list(str)A list of tags corresponding by index to the words in the tokens list.
Returns
iter(DependencyGraph)An iterator of non-projective parses.
def train(self, graphs, dependency_scorer): (source)

Trains a DependencyScorerI from a set of DependencyGraph objects, and establishes this as the parser's scorer. This is used to initialize the scores on a DependencyGraph during the parsing procedure.

Parameters
graphs:list(DependencyGraph)A list of dependency graphs to train the scorer.
dependency_scorer:DependencyScorerIA scorer which implements the DependencyScorerI interface.
def update_edge_scores(self, new_node, cycle_path): (source)

Updates the edge scores to reflect a collapse operation into new_node.

Parameters
new_node:A Node.The node which cycle nodes are collapsed into.
cycle_path:A list of integers.A list of node addresses that belong to the cycle.
inner_nodes: dict = (source)

Undocumented

Undocumented

Undocumented