class documentation
A collection of texts, which can be loaded with a list of texts or with a corpus consisting of one or more texts, and which supports counting, concordancing, collocation discovery, etc. Initialize a TextCollection as follows:
>>> import nltk.corpus
>>> from nltk.text import TextCollection
>>> print('hack'); from nltk.book import text1, text2, text3
hack...
>>> gutenberg = TextCollection(nltk.corpus.gutenberg)
>>> mytexts = TextCollection([text1, text2, text3])
Iterating over a TextCollection produces all the tokens of all the texts in order.
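As a rough illustration of the iteration behaviour described above, the sketch below chains several token lists into one stream using only the standard library. It does not use NLTK; `texts` and `iter_tokens` are hypothetical names chosen for this example.

```python
from itertools import chain


def iter_tokens(texts):
    """Yield every token of every text, keeping text order and token order."""
    return chain.from_iterable(texts)


# Hypothetical sample data: each text is a list of string tokens.
texts = [["Call", "me", "Ishmael", "."], ["The", "family", "of", "Dashwood"]]
all_tokens = list(iter_tokens(texts))
```

Chaining lazily means the individual texts are never copied into one large list until the caller asks for it.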
Method | __init__ |
Create a Text object. |
Method | idf |
The logarithm of the number of texts in the corpus divided by the number of texts that the term appears in. If a term does not appear in the corpus, 0.0 is returned. |
Method | tf |
The frequency of the term in text. |
Method | tf_idf |
Undocumented |
Instance Variable | _idf |
Undocumented |
Instance Variable | _texts |
Undocumented |
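The term-frequency and inverse-document-frequency summaries above can be made concrete with a small standard-library sketch. It is not the library's code: the function names and sample data are hypothetical, and it assumes idf is the natural logarithm of the ratio of texts to matching texts, which is the usual convention and should be checked against the library source.

```python
import math


def tf(term, text):
    """Relative frequency of term in one text (a non-empty list of tokens)."""
    return text.count(term) / len(text)


def idf(term, texts):
    """Log of (number of texts / number of texts containing term).

    Returns 0.0 when no text contains the term.
    """
    matches = sum(1 for text in texts if term in text)
    return math.log(len(texts) / matches) if matches else 0.0


def tf_idf(term, text, texts):
    """Product of the two measures above."""
    return tf(term, text) * idf(term, texts)


# Hypothetical sample data: three tiny tokenized texts.
texts = [["a", "b", "a"], ["b", "c"], ["c", "d", "d", "d"]]
score = tf_idf("a", texts[0], texts)
```

A term that occurs in every text gets an idf of 0.0 under this definition, so its tf-idf score is 0.0 no matter how often it occurs.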
Inherited from Text:
Method | __getitem__ |
Undocumented |
Method | __len__ |
Undocumented |
Method | __repr__ |
Undocumented |
Method | __str__ |
Undocumented |
Method | collocation_list |
Return collocations derived from the text, ignoring stopwords. |
Method | collocations |
Print collocations derived from the text, ignoring stopwords. |
Method | common_contexts |
Find contexts where the specified words appear; list most frequent common contexts first. |
Method | concordance |
Prints a concordance for word with the specified context window. Word matching is not case-sensitive. |
Method | concordance_list |
Generate a concordance for word with the specified context window. Word matching is not case-sensitive. |
Method | count |
Count the number of times this word appears in the text. |
Method | dispersion_plot |
Produce a plot showing the distribution of the words through the text. Requires pylab to be installed. |
Method | findall |
Find instances of the regular expression in the text. The text is a list of tokens, and a regexp pattern to match a single token must be surrounded by angle brackets. |
Method | generate |
Print random text, generated using a trigram language model. See also help(nltk.lm). |
Method | index |
Find the index of the first occurrence of the word in the text. |
Method | plot |
See the documentation for FreqDist.plot(). See also: nltk.probability.FreqDist.plot() |
Method | readability |
Undocumented |
Method | similar |
Distributional similarity: find other words which appear in the same contexts as the specified word; list most similar words first. |
Method | vocab |
No summary |
Instance Variable | name |
Undocumented |
Instance Variable | tokens |
Undocumented |
Method | _context |
One left & one right token, both case-normalized. Skip over non-sentence-final punctuation. Used by the ContextIndex that is created for similar() and common_contexts(). |
Method | _train |
Undocumented |
Constant | _CONTEXT |
Undocumented |
Constant | _COPY |
Undocumented |
Instance Variable | _collocations |
Undocumented |
Instance Variable | _concordance |
Undocumented |
Instance Variable | _num |
Undocumented |
Instance Variable | _token |
Undocumented |
Instance Variable | _tokenized |
Undocumented |
Instance Variable | _trigram |
Undocumented |
Instance Variable | _vocab |
Undocumented |
Instance Variable | _window |
Undocumented |
Instance Variable | _word |
Undocumented |
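The angle-bracket convention described for findall can be illustrated with a short sketch. This is one possible way to match regular expressions over a token list using only the `re` module, not necessarily how the library implements it; `find_token_sequences` and the sample tokens are hypothetical.

```python
import re


def find_token_sequences(tokens, pattern):
    """Match a token-level pattern such as "<a> <.*> <man>" against tokens.

    Each <...> group in the pattern is meant to match exactly one token.
    """
    # Encode the token list as one string, with every token in angle brackets.
    raw = "".join(f"<{tok}>" for tok in tokens)
    # Whitespace in the pattern is only for readability, so drop it.
    pattern = re.sub(r"\s", "", pattern)
    # Wrap each <...> in non-capturing groups so quantifiers apply per token.
    pattern = pattern.replace("<", "(?:<(?:").replace(">", ")>)")
    # Stop an unescaped "." from running across a token boundary.
    pattern = re.sub(r"(?<!\\)\.", "[^>]", pattern)
    hits = re.findall(pattern, raw)
    # Strip the outer brackets and split each hit back into tokens.
    return [hit[1:-1].split("><") for hit in hits]


tokens = ["a", "monied", "man", "and", "a", "nervous", "man"]
hits = find_token_sequences(tokens, "<a> <.*> <man>")
```

Because every token is delimited in the encoded string, a pattern like `<a>` cannot accidentally match the start of a longer token such as `and`.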
overrides
nltk.text.Text.__init__
Create a Text object.
Parameters | |
source | Undocumented |
tokens:sequence of str | The source text. |