nltk.corpus.reader.wordnet.Synset

class documentation

class Synset(_WordNetObject): (source)

Constructor: Synset(wordnet_corpus_reader)

Create a Synset from a "<lemma>.<pos>.<number>" string where:

<lemma>: the word's morphological stem.
<pos>: one of the module attributes ADJ, ADJ_SAT, ADV, NOUN or VERB.
<number>: the sense number, counting from 1 (for example, 'dog.n.01' is the first noun sense of 'dog').
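As a rough illustration of the name format, the sketch below splits a synset name into its three parts. `parse_synset_name` is a hypothetical helper written for this example and is not part of NLTK; it assumes the lemma may itself contain dots, so it splits from the right.

```python
def parse_synset_name(name):
    """Split '<lemma>.<pos>.<number>' into (lemma, pos, sense_number).

    Hypothetical helper for illustration only; not an NLTK function.
    """
    # Split from the right so that any dots inside the lemma are preserved.
    lemma, pos, number = name.rsplit(".", 2)
    return lemma, pos, int(number)


parsed = parse_synset_name("dog.n.01")
```

Here `parsed` holds the lemma, the part-of-speech tag and the sense number as an integer.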

Synset attributes, accessible via methods with the same name:

name: The canonical name of this synset, formed using the first lemma of this synset. Note that this may be different from the name passed to the constructor if that string used a different lemma to identify the synset.
pos: The synset's part of speech, matching one of the module level attributes ADJ, ADJ_SAT, ADV, NOUN or VERB.
lemmas: A list of the Lemma objects for this synset.
definition: The definition for this synset.
examples: A list of example strings for this synset.
offset: The offset in the WordNet dict file of this synset.
lexname: The name of the lexicographer file containing this synset.

Synset methods:

Synsets have the following methods for retrieving related Synsets. They correspond to the names for the pointer symbols defined here: https://wordnet.princeton.edu/documentation/wninput5wn These methods all return lists of Synsets.

hypernyms, instance_hypernyms
hyponyms, instance_hyponyms
member_holonyms, substance_holonyms, part_holonyms
member_meronyms, substance_meronyms, part_meronyms
attributes
entailments
causes
also_sees
verb_groups
similar_tos

Additionally, Synsets support the following methods specific to the hypernym relation:

root_hypernyms
common_hypernyms
lowest_common_hypernyms

Note that Synsets do not support the following relations because these are defined by WordNet as lexical relations:

antonyms
derivationally_related_forms
pertainyms

Method	`__init__`	Undocumented
Method	`__repr__`	Undocumented
Method	`closure`	Return the transitive closure of source under the rel relationship, breadth-first, discarding cycles:
Method	`common_hypernyms`	Find all synsets that are hypernyms of this synset and the other synset.
Method	`definition`	Undocumented
Method	`examples`	Undocumented
Method	`frame_ids`	Undocumented
Method	`hypernym_distances`	Get the path(s) from this synset to the root, counting the distance of each node from the initial node on the way. A set of (synset, distance) tuples is returned.
Method	`hypernym_paths`	Get the path(s) from this synset to the root, where each path is a list of the synset nodes traversed on the way to the root.
Method	`jcn_similarity`	Jiang-Conrath Similarity: Return a score denoting how similar two word senses are, based on the Information Content (IC) of the Least Common Subsumer (most specific ancestor node) and that of the two input Synsets...
Method	`lch_similarity`	Leacock Chodorow Similarity: Return a score denoting how similar two word senses are, based on the shortest path that connects the senses (as in path_similarity) and the maximum depth of the taxonomy in which the senses occur...
Method	`lemma_names`	Return all the lemma names associated with the synset.
Method	`lemmas`	Return all the lemma objects associated with the synset.
Method	`lexname`	Undocumented
Method	`lin_similarity`	Lin Similarity: Return a score denoting how similar two word senses are, based on the Information Content (IC) of the Least Common Subsumer (most specific ancestor node) and that of the two input Synsets...
Method	`lowest_common_hypernyms`	Get a list of the lowest synset(s) that both synsets have as a hypernym. When `use_min_depth == False`, this means the synset that appears as a hypernym of both `self` and `other` with the lowest maximum depth is returned; if there are multiple such synsets at the same depth, they are all returned...
Method	`max_depth`	Return the length of the longest hypernym path from this synset to the root.
Method	`min_depth`	Return the length of the shortest hypernym path from this synset to the root.
Method	`name`	Undocumented
Method	`offset`	Undocumented
Method	`path_similarity`	Path Distance Similarity: Return a score denoting how similar two word senses are, based on the shortest path that connects the senses in the is-a (hypernym/hyponym) taxonomy. The score is in the range 0 to 1, except in those cases where a path cannot be found (will only be true for verbs as there are many distinct verb taxonomies), in which case None is returned...
Method	`pos`	Undocumented
Method	`res_similarity`	Resnik Similarity: Return a score denoting how similar two word senses are, based on the Information Content (IC) of the Least Common Subsumer (most specific ancestor node).
Method	`root_hypernyms`	Get the topmost hypernyms of this synset in WordNet.
Method	`shortest_path_distance`	Returns the distance of the shortest path linking the two synsets (if one exists). For each synset, all the ancestor nodes and their distances are recorded and compared. The ancestor node common to both synsets that can be reached with the minimum number of traversals is used...
Method	`tree`	Return the full relation tree, including self, discarding cycles:
Method	`wup_similarity`	Wu-Palmer Similarity: Return a score denoting how similar two word senses are, based on the depth of the two senses in the taxonomy and that of their Least Common Subsumer (most specific ancestor node)...
Class Variable	`__slots__`	Undocumented
Method	`_iter_hypernym_lists`	Iterate over the synsets that are either proper hypernyms or instance hypernyms of the synset.
Method	`_needs_root`	Undocumented
Method	`_related`	Undocumented
Method	`_shortest_hypernym_paths`	Undocumented
Instance Variable	`_all_hypernyms`	Undocumented
Instance Variable	`_definition`	Undocumented
Instance Variable	`_examples`	Undocumented
Instance Variable	`_frame_ids`	Undocumented
Instance Variable	`_lemma_names`	Undocumented
Instance Variable	`_lemma_pointers`	Undocumented
Instance Variable	`_lemmas`	Undocumented
Instance Variable	`_lexname`	Undocumented
Instance Variable	`_max_depth`	Undocumented
Instance Variable	`_min_depth`	Undocumented
Instance Variable	`_name`	Undocumented
Instance Variable	`_offset`	Undocumented
Instance Variable	`_pointers`	Undocumented
Instance Variable	`_pos`	Undocumented
Instance Variable	`_wordnet_corpus_reader`	Undocumented

Inherited from _WordNetObject:

Method	`__eq__`	Undocumented
Method	`__hash__`	Undocumented
Method	`__lt__`	Undocumented
Method	`__ne__`	Undocumented
Method	`also_sees`	Undocumented
Method	`attributes`	Undocumented
Method	`causes`	Undocumented
Method	`entailments`	Undocumented
Method	`hypernyms`	Undocumented
Method	`hyponyms`	Undocumented
Method	`in_region_domains`	Undocumented
Method	`in_topic_domains`	Undocumented
Method	`in_usage_domains`	Undocumented
Method	`instance_hypernyms`	Undocumented
Method	`instance_hyponyms`	Undocumented
Method	`member_holonyms`	Undocumented
Method	`member_meronyms`	Undocumented
Method	`part_holonyms`	Undocumented
Method	`part_meronyms`	Undocumented
Method	`region_domains`	Undocumented
Method	`similar_tos`	Undocumented
Method	`substance_holonyms`	Undocumented
Method	`substance_meronyms`	Undocumented
Method	`topic_domains`	Undocumented
Method	`usage_domains`	Undocumented
Method	`verb_groups`	Undocumented
Method	`_hypernyms`	Undocumented
Method	`_instance_hypernyms`	Undocumented

def __init__(self, wordnet_corpus_reader): (source) ¶

Undocumented

def __repr__(self): (source) ¶

Undocumented

def closure(self, rel, depth=-1): (source) ¶

Return the transitive closure of source under the rel relationship, breadth-first, discarding cycles:

>>> from nltk.corpus import wordnet as wn
>>> computer = wn.synset('computer.n.01')
>>> topic = lambda s:s.topic_domains()
>>> print(list(computer.closure(topic)))
[Synset('computer_science.n.01')]

UserWarning: Discarded redundant search for Synset('computer.n.01') at depth 2

Include redundant paths (but only once), avoiding duplicate searches (from 'animal.n.01' to 'entity.n.01'):

>>> dog = wn.synset('dog.n.01')
>>> hyp = lambda s:s.hypernyms()
>>> print(list(dog.closure(hyp)))
[Synset('canine.n.02'), Synset('domestic_animal.n.01'), Synset('carnivore.n.01'),
Synset('animal.n.01'), Synset('placental.n.01'), Synset('organism.n.01'),
Synset('mammal.n.01'), Synset('living_thing.n.01'), Synset('vertebrate.n.01'),
Synset('whole.n.02'), Synset('chordate.n.01'), Synset('object.n.01'),
Synset('physical_entity.n.01'), Synset('entity.n.01')]

UserWarning: Discarded redundant search for Synset('animal.n.01') at depth 7
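The breadth-first traversal described above can be sketched on a toy graph without WordNet data. This is a minimal illustration, not NLTK's implementation: the graph, the `toy_closure` name, and the treatment of `depth` (a negative value means no limit) are assumptions made for the example.

```python
from collections import deque

# Toy hypernym graph (node -> direct hypernyms); names are illustrative only.
HYPERNYMS = {
    "dog": ["canine", "domestic_animal"],
    "canine": ["carnivore"],
    "domestic_animal": ["animal"],
    "carnivore": ["animal"],
    "animal": ["entity"],
    "entity": [],
}


def toy_closure(start, rel, depth=-1):
    """Yield every node reachable from start via rel, breadth-first, once each."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if 0 <= depth <= dist:
            continue  # depth limit reached; do not expand this node
        for nxt in rel(node):
            if nxt not in seen:  # skip nodes already reached via another path
                seen.add(nxt)
                yield nxt
                queue.append((nxt, dist + 1))


reachable = list(toy_closure("dog", lambda n: HYPERNYMS[n]))
```

Because "animal" is reachable through two branches, the `seen` set is what keeps it (and everything above it) from being reported twice.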

def common_hypernyms(self, other): (source) ¶

Find all synsets that are hypernyms of this synset and the other synset.

Parameters
other:Synset	other input synset.
Returns
The synsets that are hypernyms of both synsets.

def definition(self): (source) ¶

Undocumented

def examples(self): (source) ¶

Undocumented

def frame_ids(self): (source) ¶

Undocumented

def hypernym_distances(self, distance=0, simulate_root=False): (source) ¶

Get the path(s) from this synset to the root, counting the distance of each node from the initial node on the way. A set of (synset, distance) tuples is returned.

Parameters
distance:int	the distance (number of edges) from this hypernym to the original hypernym `Synset` on which this method was called.
simulate_root	Undocumented
Returns
A set of `(Synset, int)` tuples where each `Synset` is a hypernym of the first `Synset`.
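A minimal sketch of the idea on a toy graph, assuming a plain dictionary in place of WordNet. The graph and the `toy_hypernym_distances` name are hypothetical; note that a node reachable by paths of different lengths appears once per distinct distance.

```python
# Toy hypernym graph (node -> direct hypernyms); names are illustrative only.
HYPERNYMS = {
    "dog": ["canine", "domestic_animal"],
    "canine": ["carnivore"],
    "domestic_animal": ["animal"],
    "carnivore": ["animal"],
    "animal": ["entity"],
    "entity": [],
}


def toy_hypernym_distances(node, distance=0):
    """Return a set of (ancestor, distance) pairs, including (node, distance)."""
    distances = {(node, distance)}
    for parent in HYPERNYMS[node]:
        # Each step up the hierarchy adds one edge to the distance.
        distances |= toy_hypernym_distances(parent, distance + 1)
    return distances


dog_distances = toy_hypernym_distances("dog")
```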

def hypernym_paths(self): (source) ¶

Get the path(s) from this synset to the root, where each path is a list of the synset nodes traversed on the way to the root.

Returns
A list of lists, where each list gives the node sequence connecting the initial `Synset` node and a root node.
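The path enumeration can be sketched recursively on a toy graph. The graph and the `toy_hypernym_paths` name are assumptions for this example, and the choice to list each path root first is a convention of the sketch.

```python
# Toy hypernym graph (node -> direct hypernyms); names are illustrative only.
HYPERNYMS = {
    "dog": ["canine", "domestic_animal"],
    "canine": ["carnivore"],
    "domestic_animal": ["animal"],
    "carnivore": ["animal"],
    "animal": ["entity"],
    "entity": [],
}


def toy_hypernym_paths(node):
    """Return every path between a root and node, each listed root first."""
    parents = HYPERNYMS[node]
    if not parents:
        return [[node]]  # a root is its own one-node path
    paths = []
    for parent in parents:
        for path in toy_hypernym_paths(parent):
            paths.append(path + [node])
    return paths


dog_paths = toy_hypernym_paths("dog")
```

A node with two hypernyms yields at least two paths, which is why the result is a list of lists rather than a single list.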

def jcn_similarity(self, other, ic, verbose=False): (source) ¶

Jiang-Conrath Similarity: Return a score denoting how similar two word senses are, based on the Information Content (IC) of the Least Common Subsumer (most specific ancestor node) and that of the two input Synsets. The relationship is given by the equation 1 / (IC(s1) + IC(s2) - 2 * IC(lcs)).

Parameters
other:Synset	The `Synset` that this `Synset` is being compared to.
ic:dict	an information content object (as returned by `nltk.corpus.wordnet_ic.ic()`).
verbose	Undocumented
Returns
A float score denoting the similarity of the two `Synset` objects.
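The equation above can be checked by hand with a few numbers. The sketch below takes the three IC values directly instead of looking them up; `jcn_from_ic` is a hypothetical helper, and returning infinity when the denominator is zero is an assumption of this sketch rather than documented behavior.

```python
def jcn_from_ic(ic1, ic2, ic_lcs):
    """Evaluate 1 / (IC(s1) + IC(s2) - 2 * IC(lcs)) for given IC values."""
    denominator = ic1 + ic2 - 2 * ic_lcs
    if denominator == 0:
        # Assumption of this sketch: treat a zero denominator as infinite similarity.
        return float("inf")
    return 1 / denominator


# Illustrative IC values, not taken from any corpus.
jcn_score = jcn_from_ic(8.0, 7.0, 5.0)
```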

def lch_similarity(self, other, verbose=False, simulate_root=True): (source) ¶

Leacock Chodorow Similarity: Return a score denoting how similar two word senses are, based on the shortest path that connects the senses (as in path_similarity) and the maximum depth of the taxonomy in which the senses occur. The relationship is given as -log(p / (2 * d)), where p is the shortest path length and d is the taxonomy depth.

Parameters
other:Synset	The `Synset` that this `Synset` is being compared to.
verbose	Undocumented
simulate_root:bool	The various verb taxonomies do not share a single root, which prevents this metric from working for synsets that are not connected. This flag (True by default) creates a fake root that connects all the taxonomies. Set it to False to disable this behavior. For the noun taxonomy, there is usually a default root except for WordNet version 1.6. If you are using WordNet 1.6, a fake root will be added for nouns as well.
Returns
A score denoting the similarity of the two `Synset` objects, normally greater than 0. None is returned if no connecting path could be found. If a `Synset` is compared with itself, the maximum score is returned, which varies depending on the taxonomy depth.
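A small numeric sketch of the formula, taking the path length p and taxonomy depth d as plain numbers. `lch_from_path` is a hypothetical helper; it assumes both values are positive so that the logarithm is defined, and it does not reproduce how NLTK derives p and d from the taxonomy.

```python
import math


def lch_from_path(p, d):
    """Evaluate -log(p / (2 * d)) for a path length p and taxonomy depth d."""
    if p <= 0 or d <= 0:
        raise ValueError("p and d must be positive")
    return -math.log(p / (2 * d))


# Illustrative values: a shorter path gives a higher score at the same depth.
close_score = lch_from_path(1, 20)
far_score = lch_from_path(4, 20)
```

Since the score depends on d, values from taxonomies of different depths are not directly comparable.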

def lemma_names(self, lang='eng'): (source) ¶

Return all the lemma names associated with the synset.

def lemmas(self, lang='eng'): (source) ¶

Return all the lemma objects associated with the synset.

def lexname(self): (source) ¶

Undocumented

def lin_similarity(self, other, ic, verbose=False): (source) ¶

Lin Similarity: Return a score denoting how similar two word senses are, based on the Information Content (IC) of the Least Common Subsumer (most specific ancestor node) and that of the two input Synsets. The relationship is given by the equation 2 * IC(lcs) / (IC(s1) + IC(s2)).

Parameters
other:Synset	The `Synset` that this `Synset` is being compared to.
ic:dict	an information content object (as returned by `nltk.corpus.wordnet_ic.ic()`).
verbose	Undocumented
Returns
A float score denoting the similarity of the two `Synset` objects, in the range 0 to 1.
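The ratio above can be evaluated directly from three IC values. `lin_from_ic` is a hypothetical helper for illustration, and returning 0.0 when both input ICs are zero is an assumption of this sketch.

```python
def lin_from_ic(ic1, ic2, ic_lcs):
    """Evaluate 2 * IC(lcs) / (IC(s1) + IC(s2)) for given IC values."""
    total = ic1 + ic2
    if total == 0:
        # Assumption of this sketch: avoid division by zero by returning 0.0.
        return 0.0
    return 2 * ic_lcs / total


# Illustrative IC values, not taken from any corpus.
lin_score = lin_from_ic(8.0, 7.0, 5.0)
identical_score = lin_from_ic(6.0, 6.0, 6.0)
```

When a sense is compared with itself, all three IC values coincide and the ratio is 1.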

def lowest_common_hypernyms(self, other, simulate_root=False, use_min_depth=False): (source) ¶

Get a list of the lowest synset(s) that both synsets have as a hypernym. When use_min_depth == False, this means the synset that appears as a hypernym of both self and other with the lowest maximum depth is returned; if there are multiple such synsets at the same depth, they are all returned.

However, if use_min_depth == True then the synset(s) which has/have the lowest minimum depth and appear(s) in both paths is/are returned.

By setting the use_min_depth flag to True, the behavior of NLTK2 can be preserved. This was changed in NLTK3 to give more accurate results in a small set of cases, generally with synsets concerning people (e.g. 'chef.n.01', 'fireman.n.01').

This method is an implementation of Ted Pedersen's "Lowest Common Subsumer" method from the Perl Wordnet module. It can return either "self" or "other" if they are a hypernym of the other.

Parameters
other:Synset	other input synset
simulate_root:bool	The various verb taxonomies do not share a single root, which prevents this metric from working for synsets that are not connected. This flag (False by default) creates a fake root that connects all the taxonomies. Set it to True to enable this behavior. For the noun taxonomy, there is usually a default root except for WordNet version 1.6. If you are using WordNet 1.6, a fake root will need to be added for nouns as well.
use_min_depth:bool	This setting mimics the older (v2) behavior of NLTK's WordNet interface. If True, the min_depth function is used to calculate the lowest common hypernyms. This is known to give strange results for some synset pairs (e.g. 'chef.n.01', 'fireman.n.01') but is retained for backwards compatibility.
Returns
The synsets that are the lowest common hypernyms of both synsets.
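The default (maximum-depth) selection can be sketched on a toy graph. Everything here is hypothetical: the graph, the helper names, and the sorted output order are choices made for the example, not NLTK behavior.

```python
# Toy hypernym graph (node -> direct hypernyms); names are illustrative only.
HYPERNYMS = {
    "dog": ["canine", "domestic_animal"],
    "cat": ["feline", "domestic_animal"],
    "canine": ["carnivore"],
    "feline": ["carnivore"],
    "domestic_animal": ["animal"],
    "carnivore": ["animal"],
    "animal": ["entity"],
    "entity": [],
}


def ancestors(node):
    """Return node together with all of its direct and indirect hypernyms."""
    found = {node}
    for parent in HYPERNYMS[node]:
        found |= ancestors(parent)
    return found


def max_depth(node):
    """Length in edges of the longest path from node up to a root."""
    parents = HYPERNYMS[node]
    return 0 if not parents else 1 + max(max_depth(p) for p in parents)


def lowest_common(a, b):
    """Return the shared ancestors with the greatest maximum depth, sorted."""
    common = ancestors(a) & ancestors(b)
    if not common:
        return []
    deepest = max(max_depth(n) for n in common)
    return sorted(n for n in common if max_depth(n) == deepest)


tied = lowest_common("dog", "cat")
contained = lowest_common("dog", "canine")
```

The first call shows a tie at the same depth, where both candidates are returned; the second shows that one of the inputs can itself be the result when it is a hypernym of the other.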

def max_depth(self): (source) ¶

Returns
The length of the longest hypernym path from this synset to the root.

def min_depth(self): (source) ¶

Returns
The length of the shortest hypernym path from this synset to the root.
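Both depths can be computed with the same recursion, differing only in whether the longest or the shortest branch is followed. The sketch below uses a toy graph; the graph and the `toy_max_depth` and `toy_min_depth` names are assumptions for illustration.

```python
# Toy hypernym graph (node -> direct hypernyms); names are illustrative only.
HYPERNYMS = {
    "dog": ["canine", "domestic_animal"],
    "canine": ["carnivore"],
    "domestic_animal": ["animal"],
    "carnivore": ["animal"],
    "animal": ["entity"],
    "entity": [],
}


def toy_max_depth(node):
    """Number of edges on the longest hypernym path from node to a root."""
    parents = HYPERNYMS[node]
    return 0 if not parents else 1 + max(toy_max_depth(p) for p in parents)


def toy_min_depth(node):
    """Number of edges on the shortest hypernym path from node to a root."""
    parents = HYPERNYMS[node]
    return 0 if not parents else 1 + min(toy_min_depth(p) for p in parents)


longest = toy_max_depth("dog")
shortest = toy_min_depth("dog")
```

The two values differ whenever a node has hypernym branches of unequal length, as "dog" does in this toy graph.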

def name(self): (source) ¶

Undocumented

def offset(self): (source) ¶

Undocumented

def path_similarity(self, other, verbose=False, simulate_root=True): (source) ¶

Path Distance Similarity: Return a score denoting how similar two word senses are, based on the shortest path that connects the senses in the is-a (hypernym/hyponym) taxonomy. The score is in the range 0 to 1, except in those cases where a path cannot be found (will only be true for verbs as there are many distinct verb taxonomies), in which case None is returned. A score of 1 represents identity, i.e. comparing a sense with itself will return 1.

Parameters
other:Synset	The `Synset` that this `Synset` is being compared to.
verbose	Undocumented
simulate_root:bool	The various verb taxonomies do not share a single root, which prevents this metric from working for synsets that are not connected. This flag (True by default) creates a fake root that connects all the taxonomies. Set it to False to disable this behavior. For the noun taxonomy, there is usually a default root except for WordNet version 1.6. If you are using WordNet 1.6, a fake root will be added for nouns as well.
Returns
A score denoting the similarity of the two `Synset` objects, normally between 0 and 1. None is returned if no connecting path could be found. 1 is returned if a `Synset` is compared with itself.
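The text above does not state the formula. The sketch below assumes the common definition 1 / (distance + 1), where distance is the number of edges on the shortest connecting path; `path_score` is a hypothetical helper that takes that distance directly.

```python
def path_score(distance):
    """Map a shortest-path distance (in edges) to a score in (0, 1].

    Assumes the definition 1 / (distance + 1); None means no path was found.
    """
    if distance is None:
        return None
    return 1 / (distance + 1)


same_sense = path_score(0)
three_edges = path_score(3)
no_path = path_score(None)
```

Under this assumption a distance of 0 gives exactly 1, which matches the identity case described above.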

def pos(self): (source) ¶

Undocumented

def res_similarity(self, other, ic, verbose=False): (source) ¶

Resnik Similarity: Return a score denoting how similar two word senses are, based on the Information Content (IC) of the Least Common Subsumer (most specific ancestor node).

Parameters
other:Synset	The `Synset` that this `Synset` is being compared to.
ic:dict	an information content object (as returned by `nltk.corpus.wordnet_ic.ic()`).
verbose	Undocumented
Returns
A float score denoting the similarity of the two `Synset` objects. Synsets whose LCS is the root node of the taxonomy will have a score of 0 (e.g. N['dog'][0] and N['table'][0]).
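To see why a root LCS gives a score of 0, the sketch below assumes the usual definition IC(c) = -log(p(c)), with p(c) estimated from cumulative corpus counts. The counts, `information_content` and `resnik_from_lcs` are all hypothetical and exist only for this example.

```python
import math

# Hypothetical cumulative counts: each concept's count includes everything
# below it, so the root's count equals the corpus total.
COUNTS = {"entity": 1000, "animal": 200, "carnivore": 50}
TOTAL = COUNTS["entity"]


def information_content(concept):
    """IC(c) = -log(p(c)), with p(c) estimated as count(c) / total."""
    return -math.log(COUNTS[concept] / TOTAL)


def resnik_from_lcs(lcs):
    """Resnik similarity is the information content of the LCS."""
    return information_content(lcs)


root_score = resnik_from_lcs("entity")
specific_score = resnik_from_lcs("carnivore")
general_score = resnik_from_lcs("animal")
```

The root has probability 1 and therefore carries no information, while more specific concepts score higher.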

def root_hypernyms(self): (source) ¶

Get the topmost hypernyms of this synset in WordNet.

def shortest_path_distance(self, other, simulate_root=False): (source) ¶

Returns the distance of the shortest path linking the two synsets (if one exists). For each synset, all the ancestor nodes and their distances are recorded and compared. The ancestor node common to both synsets that can be reached with the minimum number of traversals is used. If no ancestor nodes are common, None is returned. If a node is compared with itself, 0 is returned.

Parameters
other:Synset	The Synset to which the shortest path will be found.
simulate_root	Undocumented
Returns
The number of edges in the shortest path connecting the two nodes, or None if no path exists.
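The procedure described above (record each synset's ancestors with their distances, then pick the common ancestor that minimizes the combined distance) can be sketched on a toy graph. The graph and helper names are hypothetical, and "run" is included only as a node with no ancestors in common with the others.

```python
# Toy hypernym graph (node -> direct hypernyms); names are illustrative only.
HYPERNYMS = {
    "dog": ["canine", "domestic_animal"],
    "cat": ["feline", "domestic_animal"],
    "canine": ["carnivore"],
    "feline": ["carnivore"],
    "domestic_animal": ["animal"],
    "carnivore": ["animal"],
    "animal": ["entity"],
    "entity": [],
    "run": [],  # separate taxonomy with no link to the nodes above
}


def ancestor_distances(node):
    """Map node and each of its ancestors to the minimum edge distance from node."""
    best = {node: 0}
    frontier = [node]
    while frontier:  # level-by-level search, so first visit is the shortest
        next_frontier = []
        for current in frontier:
            for parent in HYPERNYMS[current]:
                if parent not in best:
                    best[parent] = best[current] + 1
                    next_frontier.append(parent)
        frontier = next_frontier
    return best


def toy_shortest_path_distance(a, b):
    """Return the fewest edges linking a and b via a common ancestor, or None."""
    dist_a, dist_b = ancestor_distances(a), ancestor_distances(b)
    common = dist_a.keys() & dist_b.keys()
    if not common:
        return None
    return min(dist_a[n] + dist_b[n] for n in common)


dog_cat = toy_shortest_path_distance("dog", "cat")
```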

def tree(self, rel, depth=-1, cut_mark=None): (source) ¶

Return the full relation tree, including self, discarding cycles:

>>> from nltk.corpus import wordnet as wn
>>> from pprint import pprint
>>> computer = wn.synset('computer.n.01')
>>> topic = lambda s:s.topic_domains()
>>> pprint(computer.tree(topic))
[Synset('computer.n.01'), [Synset('computer_science.n.01')]]

UserWarning: Discarded redundant search for Synset('computer.n.01') at depth -3

But keep duplicate branches (from 'animal.n.01' to 'entity.n.01'):

>>> dog = wn.synset('dog.n.01')
>>> hyp = lambda s:s.hypernyms()
>>> pprint(dog.tree(hyp))
[Synset('dog.n.01'),
 [Synset('canine.n.02'),
  [Synset('carnivore.n.01'),
   [Synset('placental.n.01'),
    [Synset('mammal.n.01'),
     [Synset('vertebrate.n.01'),
      [Synset('chordate.n.01'),
       [Synset('animal.n.01'),
        [Synset('organism.n.01'),
         [Synset('living_thing.n.01'),
          [Synset('whole.n.02'),
           [Synset('object.n.01'),
            [Synset('physical_entity.n.01'),
             [Synset('entity.n.01')]]]]]]]]]]]]],
 [Synset('domestic_animal.n.01'),
  [Synset('animal.n.01'),
   [Synset('organism.n.01'),
    [Synset('living_thing.n.01'),
     [Synset('whole.n.02'),
      [Synset('object.n.01'),
       [Synset('physical_entity.n.01'), [Synset('entity.n.01')]]]]]]]]]
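The nested-list shape of the result can be reproduced on a toy graph. This sketch is simpler than the method documented above: it does not detect cycles, so it assumes an acyclic relation, and the graph, the `toy_tree` name and its `depth` handling (negative means no limit) are assumptions for the example.

```python
# Toy hypernym graph (node -> direct hypernyms); names are illustrative only.
HYPERNYMS = {
    "dog": ["canine", "domestic_animal"],
    "canine": ["carnivore"],
    "domestic_animal": ["animal"],
    "carnivore": ["animal"],
    "animal": ["entity"],
    "entity": [],
}


def toy_tree(node, rel, depth=-1):
    """Return [node, subtree, subtree, ...] with one subtree per related node."""
    if depth == 0:
        return [node]  # depth limit reached; do not expand further
    return [node] + [toy_tree(child, rel, depth - 1) for child in rel(node)]


full = toy_tree("dog", lambda n: HYPERNYMS[n])
shallow = toy_tree("dog", lambda n: HYPERNYMS[n], depth=1)
```

Unlike a closure, the tree keeps the shared "animal" branch under both parents, because each subtree is expanded independently.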

def wup_similarity(self, other, verbose=False, simulate_root=True): (source) ¶

Wu-Palmer Similarity: Return a score denoting how similar two word senses are, based on the depth of the two senses in the taxonomy and that of their Least Common Subsumer (most specific ancestor node). Previously, the scores computed by this implementation did _not_ always agree with those given by Pedersen's Perl implementation of WordNet Similarity. However, with the addition of the simulate_root flag (see below), the scores for verbs now almost always agree, although those for nouns do not always.

The LCS does not necessarily feature in the shortest path connecting the two senses, as it is by definition the common ancestor deepest in the taxonomy, not closest to the two senses. Typically, however, it will so feature. Where multiple candidates for the LCS exist, that whose shortest path to the root node is the longest will be selected. Where the LCS has multiple paths to the root, the longer path is used for the purposes of the calculation.

Parameters
other:Synset	The `Synset` that this `Synset` is being compared to.
verbose	Undocumented
simulate_root:bool	The various verb taxonomies do not share a single root, which prevents this metric from working for synsets that are not connected. This flag (True by default) creates a fake root that connects all the taxonomies. Set it to False to disable this behavior. For the noun taxonomy, there is usually a default root except for WordNet version 1.6. If you are using WordNet 1.6, a fake root will be added for nouns as well.
Returns
A float score denoting the similarity of the two `Synset` objects, normally greater than zero. If no connecting path between the two senses can be found, None is returned.
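The description above does not spell out the formula. The sketch below assumes the commonly cited Wu-Palmer form 2 * depth(lcs) / (depth(s1) + depth(s2)), with depths counted in nodes so that the root has depth 1. `wup_from_depths` is a hypothetical helper that takes the LCS depth and each sense's edge distance to the LCS as plain numbers.

```python
def wup_from_depths(lcs_depth, dist1, dist2):
    """Evaluate 2 * depth(lcs) / (depth(s1) + depth(s2)).

    lcs_depth is the depth of the LCS counted in nodes (root = 1);
    dist1 and dist2 are the edge distances from each sense to the LCS.
    """
    depth1 = lcs_depth + dist1  # depth of the first sense measured through the LCS
    depth2 = lcs_depth + dist2  # depth of the second sense measured through the LCS
    return 2 * lcs_depth / (depth1 + depth2)


# Illustrative values: both senses sit one edge below an LCS at depth 3.
siblings = wup_from_depths(3, 1, 1)
identical = wup_from_depths(3, 0, 0)
```

Under this assumption a deeper LCS pushes the score towards 1, while senses far below a shallow LCS score close to 0.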