class documentation

A Tree represents a hierarchical grouping of leaves and subtrees. For example, each constituent in a syntax tree is represented by a single Tree.

A tree's children are encoded as a list of leaves and subtrees, where a leaf is a basic (non-tree) value; and a subtree is a nested Tree.

>>> from nltk.tree import Tree
>>> print(Tree(1, [2, Tree(3, [4]), 5]))
(1 2 (3 4) 5)
>>> vp = Tree('VP', [Tree('V', ['saw']),
...                  Tree('NP', ['him'])])
>>> s = Tree('S', [Tree('NP', ['I']), vp])
>>> print(s)
(S (NP I) (VP (V saw) (NP him)))
>>> print(s[1])
(VP (V saw) (NP him))
>>> print(s[1,1])
(NP him)
>>> t = Tree.fromstring("(S (NP I) (VP (V saw) (NP him)))")
>>> s == t
True
>>> t[1][1].set_label('X')
>>> t[1][1].label()
'X'
>>> print(t)
(S (NP I) (VP (V saw) (X him)))
>>> t[0], t[1,1] = t[1,1], t[0]
>>> print(t)
(S (X him) (VP (V saw) (NP I)))

The length of a tree is the number of children it has.

>>> len(t)
2

The set_label() and label() methods allow individual constituents to be labeled. For example, syntax trees use this label to specify phrase tags, such as "NP" and "VP".

Several Tree methods use "tree positions" to specify children or descendants of a tree. Tree positions are defined as follows:

  • The tree position i specifies a Tree's ith child.
  • The tree position () specifies the Tree itself.
  • If p is the tree position of descendant d, then p+i specifies the ith child of d.

I.e., every tree position is either a single index i, specifying tree[i]; or a sequence i1, i2, ..., iN, specifying tree[i1][i2]...[iN].
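
The rules above can be illustrated with a short sketch. It reuses the sentence tree from the earlier doctest; the variable names are only illustrative.

```python
from nltk.tree import Tree

s = Tree.fromstring("(S (NP I) (VP (V saw) (NP him)))")

# A single index selects a child of the tree.
vp = s[1]

# A tuple position walks down one index at a time: s[1, 1] is s[1][1].
np_him = s[1, 1]
same_np = s[1][1]

# The empty tuple is the position of the tree itself.
root = s[()]

# Positions can address leaves as well as subtrees.
leaf = s[1, 1, 0]

print(vp)      # (VP (V saw) (NP him))
print(np_him)  # (NP him)
print(leaf)    # him
```

Because a tuple position is resolved one index at a time, `s[1, 1]` and `s[1][1]` return the same object rather than equal copies.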

Construct a new tree. This constructor can be called in one of two ways:

  • Tree(label, children) constructs a new tree with the
    specified label and list of children.
  • Tree.fromstring(s) constructs a new tree by parsing the string s.
Class Method convert Convert a tree between different subtypes of Tree. cls determines which class will be used to encode the new tree.
Class Method fromlist Convert nested lists to an NLTK Tree
Class Method fromstring Read a bracketed tree string and return the resulting tree. Trees are represented as nested bracketings, such as:
Method __add__ Undocumented
Method __copy__ Undocumented
Method __deepcopy__ Undocumented
Method __delitem__ Undocumented
Method __eq__ Undocumented
Method __getitem__ Undocumented
Method __init__ Undocumented
Method __lt__ Undocumented
Method __mul__ Undocumented
Method __radd__ Undocumented
Method __repr__ Undocumented
Method __rmul__ Undocumented
Method __setitem__ Undocumented
Method __str__ Undocumented
Method chomsky_normal_form This method can modify a tree in three ways:
Method collapse_unary Collapse subtrees with a single child (i.e., unary productions) into a new non-terminal (Tree node) joined by 'joinChar'. This is useful when working with algorithms that do not allow unary productions, and completely removing the unary productions would require loss of useful information...
Method copy Undocumented
Method draw Open a new window containing a graphical diagram of this tree.
Method flatten Return a flat version of the tree, with all non-root non-terminals removed.
Method freeze Undocumented
Method height Return the height of the tree.
Method label Return the node label of the tree.
Method leaf_treeposition No summary
Method leaves Return the leaves of the tree.
Method pformat No summary
Method pformat_latex_qtree Returns a representation of the tree compatible with the LaTeX qtree package. This consists of the string \Tree followed by the tree represented in bracketed notation.
Method pos Return a sequence of pos-tagged words extracted from the tree.
Method pprint Print a string representation of this Tree to 'stream'
Method pretty_print Pretty-print this tree as ASCII or Unicode art. For explanation of the arguments, see the documentation for nltk.treeprettyprinter.TreePrettyPrinter.
Method productions Generate the productions that correspond to the non-terminal nodes of the tree. For each subtree of the form (P: C1 C2 ... Cn) this produces a production of the form P -> C1 C2 ... Cn.
Method set_label Set the node label of the tree.
Method subtrees Generate all the subtrees of this tree, optionally restricted to trees matching the filter function.
Method treeposition_spanning_leaves No summary
Method treepositions No summary
Method un_chomsky_normal_form This method modifies the tree in three ways:
Class Variable __ge__ Undocumented
Class Variable __gt__ Undocumented
Class Variable __le__ Undocumented
Class Variable __ne__ Undocumented
Class Variable node Undocumented
Class Method _parse_error Display a friendly error message when parsing a tree string fails.
Method _frozen_class Undocumented
Method _get_node Outdated method to access the node value; use the label() method instead.
Method _pformat_flat Undocumented
Method _repr_png_ Draws and outputs in PNG for IPython. PNG is used instead of PDF, since it can be displayed in the qt console and has wider browser support.
Method _set_node Outdated method to set the node value; use the set_label() method instead.
Instance Variable _label Undocumented
@classmethod
def convert(cls, tree):

Convert a tree between different subtypes of Tree. cls determines which class will be used to encode the new tree.

Parameters
tree (Tree): The tree that should be converted.
Returns
The new Tree.
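
As a hedged illustration of how cls selects the target class, the sketch below converts a plain Tree into a ParentedTree (a Tree subclass from nltk.tree that tracks each node's parent). The choice of ParentedTree is only an example; any Tree subclass could be used the same way.

```python
from nltk.tree import Tree, ParentedTree

t = Tree.fromstring("(S (NP I) (VP (V saw) (NP him)))")

# Calling convert on a subclass re-encodes the whole tree with that subclass.
pt = ParentedTree.convert(t)

print(type(pt).__name__)     # ParentedTree
print(pt[1].parent() is pt)  # True
```

The original tree `t` is left as a plain Tree; `convert` builds a new tree rather than changing the class of the existing one.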
@classmethod
def fromlist(cls, l):

Convert nested lists to an NLTK Tree.

Parameters
l (list): A tree represented as nested lists.
Returns
Tree: A tree corresponding to the list representation l.
@classmethod
def fromstring(cls, s, brackets='()', read_node=None, read_leaf=None, node_pattern=None, leaf_pattern=None, remove_empty_top_bracketing=False):

Read a bracketed tree string and return the resulting tree. Trees are represented as nested bracketings, such as:

(S (NP (NNP John)) (VP (V runs)))

Parameters
s (str): The string to read.
brackets (str, length 2): The bracket characters used to mark the beginning and end of trees and subtrees.
read_node, read_leaf (function): If specified, these functions are applied to the substrings of s corresponding to nodes and leaves (respectively) to obtain the values for those nodes and leaves. They should have the following signature:

    read_node(str) -> value

  For example, these functions could be used to process nodes and leaves whose values should be some type other than string (such as FeatStruct). Note that by default, node strings and leaf strings are delimited by whitespace and brackets; to override this default, use the node_pattern and leaf_pattern arguments.
node_pattern, leaf_pattern (str): Regular expression patterns used to find node and leaf substrings in s. By default, both patterns are defined to match any sequence of non-whitespace non-bracket characters.
remove_empty_top_bracketing (bool): If the resulting tree has an empty node label, and is length one, then return its single child instead. This is useful for treebank trees, which sometimes contain an extra level of bracketing.
Returns
Tree: A tree corresponding to the string representation s. If this class method is called using a subclass of Tree, then it will return a tree of that type.
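
A minimal sketch of three of the options described above: alternative brackets, a read_leaf converter, and remove_empty_top_bracketing. The input strings and variable names are made up for illustration.

```python
from nltk.tree import Tree

# Alternative bracket characters.
square = Tree.fromstring("[S [NP I] [VP [V saw] [NP him]]]", brackets="[]")

# read_leaf converts each leaf substring into a value; here, an int.
numbers = Tree.fromstring("(SUM (A 1) (B 2))", read_leaf=int)

# Treebank-style input with an extra, unlabeled outer bracket.
wrapped = "((S (NP I) (VP (V saw) (NP him))))"
unwrapped = Tree.fromstring(wrapped, remove_empty_top_bracketing=True)

print(square)             # (S (NP I) (VP (V saw) (NP him)))
print(numbers.leaves())   # [1, 2]
print(unwrapped.label())  # S
```

Note that the brackets argument only affects parsing; printing the resulting tree still uses round brackets unless pformat is asked otherwise.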
def __add__(self, v):

Undocumented

def __copy__(self):

Undocumented

def __deepcopy__(self, memo):

Undocumented

def __delitem__(self, index):

def __eq__(self, other):

Undocumented

def __getitem__(self, index):

Undocumented

def __init__(self, node, children=None):

def __lt__(self, other):

Undocumented

def __mul__(self, v):

Undocumented

def __radd__(self, v):

Undocumented

def __rmul__(self, v):

Undocumented

def __setitem__(self, index, value):

def chomsky_normal_form(self, factor='right', horzMarkov=None, vertMarkov=0, childChar='|', parentChar='^'):

This method can modify a tree in three ways:

  1. Convert a tree into its Chomsky Normal Form (CNF) equivalent: every subtree has either two non-terminals or one terminal as its children. This process requires the creation of more "artificial" non-terminal nodes.
  2. Markov (horizontal) smoothing of children in new artificial nodes
  3. Vertical (parent) annotation of nodes
Parameters
factor (str, 'left' or 'right'): Right or left factoring method (default = "right")
horzMarkov (int | None): Markov order for sibling smoothing in artificial nodes (None (default) = include all siblings)
vertMarkov (int | None): Markov order for parent smoothing (0 (default) = no vertical annotation)
childChar (str): A string used in construction of the artificial nodes, separating the head of the original subtree from the child nodes that have yet to be expanded (default = "|")
parentChar (str): A string used to separate the node representation from its vertical annotation (default = "^")
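
As a rough sketch of the first effect listed above (binarization), the example below applies the method to a made-up tree with four children under the root and checks the branching factor afterwards. The labels of the artificial nodes are not shown because they depend on the childChar and Markov settings.

```python
from nltk.tree import Tree

t = Tree.fromstring("(S (A a) (B b) (C c) (D d))")

# The transformation happens in place; the method returns None.
t.chomsky_normal_form(factor="right")

# After binarization, no node has more than two children.
widest = max(len(sub) for sub in t.subtrees())
print(widest)  # 2

# The original words are untouched; only artificial nodes were added.
print(t.leaves())  # ['a', 'b', 'c', 'd']
```

Because the tree is modified directly, take a copy first (for example with `t.copy(deep=True)`) if the original structure is still needed.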
def collapse_unary(self, collapsePOS=False, collapseRoot=False, joinChar='+'):

Collapse subtrees with a single child (i.e., unary productions) into a new non-terminal (Tree node) joined by 'joinChar'. This is useful when working with algorithms that do not allow unary productions, and completely removing the unary productions would require loss of useful information. The Tree is modified in place and no value is returned.

Parameters
collapsePOS (bool): 'False' (default) will not collapse the parent of leaf nodes (i.e., part-of-speech tags) since they are always unary productions
collapseRoot (bool): 'False' (default) will not modify the root production if it is unary. For the Penn WSJ treebank corpus, this corresponds to the TOP -> productions.
joinChar (str): A string used to connect collapsed node values (default = "+")
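
A small sketch of the behavior described above, using an invented tree in which the node X has NP as its only child. It contrasts the default call with collapsePOS=True.

```python
from nltk.tree import Tree

text = "(S (X (NP (D the) (N dog))) (VP (V barks)))"

# Default: X and its only child NP are merged into "X+NP". The unary
# (VP (V barks)) is kept because V is the parent of a leaf.
t = Tree.fromstring(text)
t.collapse_unary()
print(t)  # (S (X+NP (D the) (N dog)) (VP (V barks)))

# With collapsePOS=True, the part-of-speech level is collapsed as well.
t_pos = Tree.fromstring(text)
t_pos.collapse_unary(collapsePOS=True)
print(t_pos)  # (S (X+NP (D the) (N dog)) (VP+V barks))
```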
def copy(self, deep=False):

def draw(self):

Open a new window containing a graphical diagram of this tree.

def flatten(self):

Return a flat version of the tree, with all non-root non-terminals removed.

>>> t = Tree.fromstring("(S (NP (D the) (N dog)) (VP (V chased) (NP (D the) (N cat))))")
>>> print(t.flatten())
(S the dog chased the cat)

Returns
Tree: A tree consisting of this tree's root connected directly to its leaves, omitting all intervening non-terminal nodes.
def freeze(self, leaf_freezer=None):

Undocumented

def height(self):

Return the height of the tree.

>>> t = Tree.fromstring("(S (NP (D the) (N dog)) (VP (V chased) (NP (D the) (N cat))))")
>>> t.height()
5
>>> print(t[0,0])
(D the)
>>> t[0,0].height()
2

Returns
int: The height of this tree. The height of a tree containing no children is 1; the height of a tree containing only leaves is 2; and the height of any other tree is one plus the maximum of its children's heights.
def label(self):

Return the node label of the tree.

>>> t = Tree.fromstring('(S (NP (D the) (N dog)) (VP (V chased) (NP (D the) (N cat))))')
>>> t.label()
'S'

Returns
any: The node label (typically a string).
def leaf_treeposition(self, index):
Returns
The tree position of the index-th leaf in this tree. I.e., if tp=self.leaf_treeposition(i), then self[tp]==self.leaves()[i].
Raises
IndexError: If this tree contains fewer than index+1 leaves, or if index<0.
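
The relationship between leaf indices and tree positions can be sketched as follows, using the same example sentence as the other doctests on this page.

```python
from nltk.tree import Tree

t = Tree.fromstring(
    "(S (NP (D the) (N dog)) (VP (V chased) (NP (D the) (N cat))))"
)

# Position of the third leaf (index 2), "chased".
tp = t.leaf_treeposition(2)
print(tp)     # (1, 0, 0)
print(t[tp])  # chased

# Dropping the last index gives the position of the leaf's parent.
print(t[tp[:-1]])  # (V chased)
```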
def leaves(self):

Return the leaves of the tree.

>>> t = Tree.fromstring("(S (NP (D the) (N dog)) (VP (V chased) (NP (D the) (N cat))))")
>>> t.leaves()
['the', 'dog', 'chased', 'the', 'cat']

Returns
list: A list containing this tree's leaves. The order reflects the order of the leaves in the tree's hierarchical structure.
def pformat(self, margin=70, indent=0, nodesep='', parens='()', quotes=False):
Parameters
margin (int): The right margin at which to do line-wrapping.
indent (int): The indentation level at which printing begins. This number is used to decide how far to indent subsequent lines.
nodesep: A string that is used to separate the node from the children. E.g., the value ':' gives trees like (S: (NP: I) (VP: (V: saw) (NP: it))). The default is the empty string.
parens: Undocumented
quotes: Undocumented
Returns
str: A pretty-printed string representation of this tree.
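
A brief sketch of how the parameters above affect the output. The parens argument is undocumented here, so its use below (a two-character string giving the open and close brackets) is based on the default value '()' and should be treated as an assumption.

```python
from nltk.tree import Tree

t = Tree.fromstring("(S (NP I) (VP (V saw) (NP it)))")

# Default settings: no separator after the label, round brackets.
flat = t.pformat()

# A ':' separator after each node label.
with_sep = t.pformat(nodesep=":")

# Square brackets instead of round ones (assumed usage of parens).
square = t.pformat(parens="[]")

# A narrow margin forces the tree onto several indented lines.
wrapped = t.pformat(margin=10)

print(flat)      # (S (NP I) (VP (V saw) (NP it)))
print(with_sep)  # (S: (NP: I) (VP: (V: saw) (NP: it)))
print(square)    # [S [NP I] [VP [V saw] [NP it]]]
print(wrapped)
```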
def pformat_latex_qtree(self):

Returns a representation of the tree compatible with the LaTeX qtree package. This consists of the string \Tree followed by the tree represented in bracketed notation.

For example, the following result was generated from a parse tree of the sentence The announcement astounded us:

\Tree [.I'' [.N'' [.D The ] [.N' [.N announcement ] ] ]
    [.I' [.V'' [.V' [.V astounded ] [.N'' [.N' [.N us ] ] ] ] ] ] ]

See http://www.ling.upenn.edu/advice/latex.html for the LaTeX style file for the qtree package.

Returns
str: A LaTeX qtree representation of this tree.
def pos(self):

Return a sequence of pos-tagged words extracted from the tree.

>>> t = Tree.fromstring("(S (NP (D the) (N dog)) (VP (V chased) (NP (D the) (N cat))))")
>>> t.pos()
[('the', 'D'), ('dog', 'N'), ('chased', 'V'), ('the', 'D'), ('cat', 'N')]

Returns
list(tuple): A list of tuples containing leaves and pre-terminals (part-of-speech tags). The order reflects the order of the leaves in the tree's hierarchical structure.
def pprint(self, **kwargs):

Print a string representation of this Tree to 'stream'.

def pretty_print(self, sentence=None, highlight=(), stream=None, **kwargs):

Pretty-print this tree as ASCII or Unicode art. For explanation of the arguments, see the documentation for nltk.treeprettyprinter.TreePrettyPrinter.
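
As a minimal sketch, the stream argument can be pointed at an in-memory buffer to capture the diagram as a string instead of printing it. The exact layout of the drawing is not reproduced here, since it depends on the TreePrettyPrinter settings.

```python
import io

from nltk.tree import Tree

t = Tree.fromstring("(S (NP I) (VP (V saw) (NP him)))")

# Write the diagram to an in-memory buffer instead of standard output.
buffer = io.StringIO()
t.pretty_print(stream=buffer)
art = buffer.getvalue()

# The captured text is a multi-line drawing containing labels and leaves.
print(art)
```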

def productions(self):

Generate the productions that correspond to the non-terminal nodes of the tree. For each subtree of the form (P: C1 C2 ... Cn) this produces a production of the form P -> C1 C2 ... Cn.

>>> t = Tree.fromstring("(S (NP (D the) (N dog)) (VP (V chased) (NP (D the) (N cat))))")
>>> t.productions()
[S -> NP VP, NP -> D N, D -> 'the', N -> 'dog', VP -> V NP, V -> 'chased',
NP -> D N, D -> 'the', N -> 'cat']

Returns
list(Production): The productions that correspond to the non-terminal nodes of this tree.
def set_label(self, label):

Set the node label of the tree.

>>> t = Tree.fromstring("(S (NP (D the) (N dog)) (VP (V chased) (NP (D the) (N cat))))")
>>> t.set_label("T")
>>> print(t)
(T (NP (D the) (N dog)) (VP (V chased) (NP (D the) (N cat))))

Parameters
label (any): The node label (typically a string).
def subtrees(self, filter=None):

Generate all the subtrees of this tree, optionally restricted to trees matching the filter function.

>>> t = Tree.fromstring("(S (NP (D the) (N dog)) (VP (V chased) (NP (D the) (N cat))))")
>>> for s in t.subtrees(lambda t: t.height() == 2):
...     print(s)
(D the)
(N dog)
(V chased)
(D the)
(N cat)

Parameters
filter (function): The function to filter all local trees.
def treeposition_spanning_leaves(self, start, end):
Returns
The tree position of the lowest descendant of this tree that dominates self.leaves()[start:end].
Raises
ValueError: If end <= start.
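
A short sketch of the "lowest dominating descendant" idea, again using the example sentence from the other doctests. The spans chosen are arbitrary.

```python
from nltk.tree import Tree

t = Tree.fromstring(
    "(S (NP (D the) (N dog)) (VP (V chased) (NP (D the) (N cat))))"
)

# Leaves 3 and 4 ("the", "cat") are both dominated by the object NP.
tp = t.treeposition_spanning_leaves(3, 5)
print(tp)     # (1, 1)
print(t[tp])  # (NP (D the) (N cat))

# A span crossing the subject and the verb is only dominated by the root.
root_tp = t.treeposition_spanning_leaves(1, 3)
print(root_tp)  # ()
```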
def treepositions(self, order='preorder'):

>>> t = Tree.fromstring("(S (NP (D the) (N dog)) (VP (V chased) (NP (D the) (N cat))))")
>>> t.treepositions() # doctest: +ELLIPSIS
[(), (0,), (0, 0), (0, 0, 0), (0, 1), (0, 1, 0), (1,), (1, 0), (1, 0, 0), ...]
>>> for pos in t.treepositions('leaves'):
...     t[pos] = t[pos][::-1].upper()
>>> print(t)
(S (NP (D EHT) (N GOD)) (VP (V DESAHC) (NP (D EHT) (N TAC))))

Parameters
order: One of: preorder, postorder, bothorder, leaves.
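
To make the four order values concrete, the sketch below lists the positions of a deliberately tiny, made-up tree under each setting.

```python
from nltk.tree import Tree

t = Tree.fromstring("(S (A a) (B b))")

# Each node is listed before its children.
pre = t.treepositions("preorder")
# Each node is listed after its children.
post = t.treepositions("postorder")
# Each subtree is listed both before and after its children.
both = t.treepositions("bothorder")
# Only the positions of leaves are listed.
leaves = t.treepositions("leaves")

print(pre)     # [(), (0,), (0, 0), (1,), (1, 0)]
print(post)    # [(0, 0), (0,), (1, 0), (1,), ()]
print(leaves)  # [(0, 0), (1, 0)]
```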
def un_chomsky_normal_form(self, expandUnary=True, childChar='|', parentChar='^', unaryChar='+'):

This method modifies the tree in three ways:

  1. Transforms a tree in Chomsky Normal Form back to its original structure (branching greater than two)
  2. Removes any parent annotation (if it exists)
  3. (optional) expands unary subtrees (if previously collapsed with collapse_unary(...))
Parameters
expandUnary (bool): Flag to expand unary or not (default = True)
childChar (str): A string separating the head node from its children in an artificial node (default = "|")
parentChar (str): A string separating the node label from its parent annotation (default = "^")
unaryChar (str): A string joining two non-terminals in a unary production (default = "+")
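
A hedged round-trip sketch: a made-up tree is binarized with chomsky_normal_form using default arguments and then restored with un_chomsky_normal_form. It only demonstrates the first effect listed above; parent annotation and unary expansion are not exercised.

```python
from nltk.tree import Tree

t = Tree.fromstring("(S (NP (D the) (A big) (N dog)) (VP (V barks)))")
original = t.copy(deep=True)

# Binarize in place: the three-child NP gains an artificial node.
t.chomsky_normal_form()
changed = t != original

# Undo the binarization in place: the artificial node is removed again.
t.un_chomsky_normal_form()
restored = t == original

print(changed)   # True
print(restored)  # True
```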


@classmethod
def _parse_error(cls, s, match, expecting):

Display a friendly error message when parsing a tree string fails.

Parameters
s: The string we're parsing.
match: Regexp match of the problem token.
expecting: What we expected to see instead.

def _get_node(self):

Outdated method to access the node value; use the label() method instead.

def _pformat_flat(self, nodesep, parens, quotes):

Undocumented

def _repr_png_(self):

Draws and outputs in PNG for IPython. PNG is used instead of PDF, since it can be displayed in the qt console and has wider browser support.

def _set_node(self, value):

Outdated method to set the node value; use the set_label() method instead.
