class WordNetCorpusReader(CorpusReader): (source)
Constructor: WordNetCorpusReader(root, omw_reader)
A corpus reader used to access wordnet or its variants.
Method | __init__ |
Construct a new wordnet corpus reader, with the given root directory. |
Method | all_lemma_names |
Return all lemma names for all synsets for the given part of speech tag and language or languages. If pos is not specified, all synsets for all parts of speech will be used. |
Method | all_synsets |
Iterate over all synsets with a given part of speech tag. If no pos is specified, all synsets for all parts of speech will be loaded. |
Method | citation |
Return the contents of the citation.bib file (for omw); use lang=lang to get the citation for an individual language. |
Method | custom_lemmas |
Reads a custom tab file containing mappings of lemmas in the given language to Princeton WordNet 3.0 synset offsets, allowing NLTK's WordNet functions to then be used with that language. |
Method | get_version |
Undocumented |
Method | ic |
Creates an information content lookup dictionary from a corpus. |
Method | jcn_similarity |
Undocumented |
Method | langs |
Return a list of languages supported by the Multilingual Wordnet. |
Method | lch_similarity |
Undocumented |
Method | lemma |
Return the Lemma object that matches the name. |
Method | lemma_count |
Return the frequency count for this Lemma |
Method | lemma_from_key |
Undocumented |
Method | lemmas |
Return all Lemma objects with a name matching the specified lemma name and part of speech tag. Matches any part of speech tag if none is specified. |
Method | license |
Return the contents of LICENSE (for omw); use lang=lang to get the license for an individual language. |
Method | lin_similarity |
Undocumented |
Method | morphy |
Find a possible base form for the given form, with the given part of speech, by checking WordNet's list of exceptional forms, and by recursively stripping affixes for this part of speech until a form in WordNet is found. |
Method | of2ss |
Take an id of the form '<offset>-<pos>' and return the corresponding synset. |
Method | path_similarity |
Undocumented |
Method | readme |
Return the contents of README (for omw); use lang=lang to get the readme for an individual language. |
Method | res_similarity |
Undocumented |
Method | ss2of |
Return the ID of the synset. |
Method | synset |
Undocumented |
Method | synset_from_pos_and_offset |
pos: The synset's part of speech, matching one of the module level attributes ADJ, ADJ_SAT, ADV, NOUN or VERB ('a', 's', 'r', 'n', or 'v'). |
Method | synset_from_sense_key |
Retrieves synset based on a given sense_key. Sense keys can be obtained from lemma.key() |
Method | synsets |
Load all synsets with a given lemma and part of speech tag. If no pos is specified, all synsets for all parts of speech will be loaded. If lang is specified, all the synsets associated with the lemma name of that language will be returned. |
Method | words |
Return lemmas of the given language as a list of words. |
Method | wup_similarity |
Undocumented |
Constant | MORPHOLOGICAL_SUBSTITUTIONS |
Undocumented |
Class Variable | ADJ |
Undocumented |
Class Variable | ADJ_SAT |
Undocumented |
Class Variable | ADV |
Undocumented |
Class Variable | NOUN |
Undocumented |
Class Variable | VERB |
Undocumented |
Method | _compute_max_depth |
Compute the max depth for the given part of speech. This is used by the lch similarity metric. |
Method | _data_file |
Return an open file pointer for the data file for the given part of speech. |
Method | _load_exception_map |
Undocumented |
Method | _load_lang_data |
Load the wordnet data of the requested language from file into the cache, _lang_data. |
Method | _load_lemma_pos_offset_map |
Undocumented |
Method | _morphy |
Undocumented |
Method | _synset_from_pos_and_line |
Undocumented |
Method | _synset_from_pos_and_offset |
Hack to help people like the readers of http://stackoverflow.com/a/27145655/1709587 who were using this function before it was officially a public method |
Constant | _ENCODING |
Undocumented |
Constant | _FILEMAP |
Undocumented |
Constant | _FILES |
Undocumented |
Class Variable | _pos_names |
Undocumented |
Class Variable | _pos_numbers |
Undocumented |
Instance Variable | _data_file_map |
Undocumented |
Instance Variable | _exception_map |
Undocumented |
Instance Variable | _key_count_file |
Undocumented |
Instance Variable | _key_synset_file |
Undocumented |
Instance Variable | _lang_data |
Undocumented |
Instance Variable | _lemma_pos_offset_map |
Undocumented |
Instance Variable | _lexnames |
Undocumented |
Instance Variable | _max_depth |
Undocumented |
Instance Variable | _omw_reader |
Undocumented |
Instance Variable | _synset_offset_cache |
Undocumented |
Inherited from CorpusReader:
Method | __repr__ |
Undocumented |
Method | abspath |
Return the absolute path for the given file. |
Method | abspaths |
Return a list of the absolute paths for all fileids in this corpus; or for the given list of fileids, if specified. |
Method | encoding |
Return the unicode encoding for the given corpus file, if known. If the encoding is unknown, or if the given file should be processed using byte strings (str), then return None. |
Method | ensure_loaded |
Load this corpus (if it has not already been loaded). This is used by LazyCorpusLoader as a simple method that can be used to make sure a corpus is loaded -- e.g., in case a user wants to do help(some_corpus). |
Method | fileids |
Return a list of file identifiers for the fileids that make up this corpus. |
Method | open |
Return an open stream that can be used to read the given file. If the file's encoding is not None, then the stream will automatically decode the file's contents into unicode. |
Class Variable | root |
Undocumented |
Method | _get_root |
Undocumented |
Instance Variable | _encoding |
The default unicode encoding for the fileids that make up this corpus. If encoding is None, then the file contents are processed using byte strings. |
Instance Variable | _fileids |
A list of the relative paths for the fileids that make up this corpus. |
Instance Variable | _root |
The root directory for this corpus. |
Instance Variable | _tagset |
Undocumented |
nltk.corpus.reader.CorpusReader.__init__
Construct a new wordnet corpus reader, with the given root directory.
Return all lemma names for all synsets for the given part of speech tag and language or languages. If pos is not specified, all synsets for all parts of speech will be used.
Iterate over all synsets with a given part of speech tag. If no pos is specified, all synsets for all parts of speech will be loaded.
nltk.corpus.reader.CorpusReader.citation
Return the contents of the citation.bib file (for omw); use lang=lang to get the citation for an individual language.
Reads a custom tab file containing mappings of lemmas in the given language to Princeton WordNet 3.0 synset offsets, allowing NLTK's WordNet functions to then be used with that language.
See the "Tab files" section at http://compling.hss.ntu.edu.sg/omw/ for documentation on the Multilingual WordNet tab file format.
:type lang: str
:param lang: ISO 639-3 code of the language of the tab file
Parameters | |
tab | Tab file as a file or file-like object |
lang | ISO 639-3 code of the language of the tab file |
Creates an information content lookup dictionary from a corpus.

:type corpus: CorpusReader
:param corpus: The corpus from which we create an information content dictionary.
:type weight_senses_equally: bool
:param weight_senses_equally: If this is True, gives all possible senses equal weight rather than dividing by the number of possible senses. (If a word has 3 senses, each sense gets 0.3333 per appearance when this is False, 1.0 when it is True.)
:type smoothing: float
:param smoothing: How much do we smooth synset counts (default is 1.0).
:return: An information content dictionary
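The resulting dictionary maps synsets to information-content scores, conventionally IC(s) = -log p(s). A pure-Python illustration of that formula with toy counts (hypothetical numbers, not NLTK's actual IC data structure):

```python
import math

# Toy frequency counts for a few noun synsets (hypothetical numbers).
counts = {"entity.n.01": 100.0, "dog.n.01": 12.0, "cat.n.01": 8.0}
smoothing = 1.0  # ic() smooths counts by adding a value like this

total = sum(counts.values()) + smoothing * len(counts)

def information_content(name):
    # IC = -log p(synset); rarer synsets carry more information.
    return -math.log((counts[name] + smoothing) / total)

for name, count in counts.items():
    print(f"{name} (count {count}): IC {information_content(name):.3f}")
```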
Return all Lemma objects with a name matching the specified lemma name and part of speech tag. Matches any part of speech tag if none is specified.
nltk.corpus.reader.CorpusReader.license
Return the contents of LICENSE (for omw); use lang=lang to get the license for an individual language.
Find a possible base form for the given form, with the given part of speech, by checking WordNet's list of exceptional forms, and by recursively stripping affixes for this part of speech until a form in WordNet is found.
>>> from nltk.corpus import wordnet as wn
>>> print(wn.morphy('dogs'))
dog
>>> print(wn.morphy('churches'))
church
>>> print(wn.morphy('aardwolves'))
aardwolf
>>> print(wn.morphy('abaci'))
abacus
>>> wn.morphy('hardrock', wn.ADV)
>>> print(wn.morphy('book', wn.NOUN))
book
>>> wn.morphy('book', wn.ADJ)
nltk.corpus.reader.CorpusReader.readme
Return the contents of README (for omw); use lang=lang to get the readme for an individual language.
- pos: The synset's part of speech, matching one of the module level attributes ADJ, ADJ_SAT, ADV, NOUN or VERB ('a', 's', 'r', 'n', or 'v').
- offset: The byte offset of this synset in the WordNet dict file for this pos.
>>> from nltk.corpus import wordnet as wn
>>> print(wn.synset_from_pos_and_offset('n', 1740))
Synset('entity.n.01')
Retrieves synset based on a given sense_key. Sense keys can be obtained from lemma.key()
From https://wordnet.princeton.edu/documentation/senseidx5wn: A sense_key is represented as:
lemma % lex_sense (e.g. 'dog%1:18:01::')
- where lex_sense is encoded as:
- ss_type:lex_filenum:lex_id:head_word:head_id
lemma: ASCII text of word/collocation, in lower case
ss_type: synset type for the sense (1 digit int)
    The synset type is encoded as follows:
    1 NOUN
    2 VERB
    3 ADJECTIVE
    4 ADVERB
    5 ADJECTIVE SATELLITE
lex_filenum: name of lexicographer file containing the synset for the sense (2 digit int)
lex_id: when paired with lemma, uniquely identifies a sense in the lexicographer file (2 digit int)
head_word: lemma of the first word in satellite's head synset
Only used if sense is in an adjective satellite synset
- head_id: uniquely identifies sense in a lexicographer file when paired with head_word
- Only used if head_word is present (2 digit int)
>>> import nltk
>>> from nltk.corpus import wordnet as wn
>>> print(wn.synset_from_sense_key("drive%1:04:03::"))
Synset('drive.n.06')
>>> print(wn.synset_from_sense_key("driving%1:04:03::"))
Synset('drive.n.06')
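The field layout above can be pulled apart with plain string operations; a sketch using a hypothetical helper (parse_sense_key is not part of NLTK):

```python
# Hypothetical helper: split a WordNet sense key into its documented fields.
SS_TYPES = {1: "NOUN", 2: "VERB", 3: "ADJECTIVE", 4: "ADVERB", 5: "ADJECTIVE SATELLITE"}

def parse_sense_key(key):
    lemma, lex_sense = key.split("%")
    ss_type, lex_filenum, lex_id, head_word, head_id = lex_sense.split(":")
    return {
        "lemma": lemma,
        "ss_type": SS_TYPES[int(ss_type)],
        "lex_filenum": int(lex_filenum),
        "lex_id": int(lex_id),
        "head_word": head_word or None,      # only set for adjective satellites
        "head_id": int(head_id) if head_id else None,
    }

print(parse_sense_key("dog%1:18:01::"))
```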
Load all synsets with a given lemma and part of speech tag. If no pos is specified, all synsets for all parts of speech will be loaded. If lang is specified, all the synsets associated with the lemma name of that language will be returned.
def _synset_from_pos_and_offset(self, *args, **kwargs): (source)
Hack to help people like the readers of http://stackoverflow.com/a/27145655/1709587 who were using this function before it was officially a public method