class documentation

class TextCollection(Text): (source)

Constructor: TextCollection(source)

A collection of texts, which can be loaded with a list of texts or with a corpus consisting of one or more texts, and which supports counting, concordancing, collocation discovery, etc. Initialize a TextCollection as follows:

>>> import nltk.corpus
>>> from nltk.text import TextCollection
>>> print('hack'); from nltk.book import text1, text2, text3
hack...
>>> gutenberg = TextCollection(nltk.corpus.gutenberg)
>>> mytexts = TextCollection([text1, text2, text3])

Iterating over a TextCollection produces all the tokens of all the texts in order.
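
The iteration behaviour described above can be pictured with a plain-Python stand-in. The sketch below does not use NLTK: `iter_tokens` is a hypothetical helper and the token lists are made-up data, so it only illustrates the idea of walking through every text in order, not how the class is actually implemented.

```python
from itertools import chain


def iter_tokens(texts):
    """Yield every token of every text, keeping text order and token order."""
    return chain.from_iterable(texts)


# Three tiny tokenized "texts" (illustrative data only).
texts = [
    ["call", "me", "ishmael"],
    ["the", "family", "of", "dashwood"],
    ["in", "the", "beginning"],
]

tokens = list(iter_tokens(texts))
```

Here `tokens` starts with the tokens of the first text, followed by those of the second and third, with nothing reordered or dropped.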

Method __init__ Create a Text object.
Method idf The number of texts in the corpus divided by the number of texts that the term appears in. If a term does not appear in the corpus, 0.0 is returned.
Method tf The frequency of the term in text.
Method tf_idf Undocumented
Instance Variable _idf_cache Undocumented
Instance Variable _texts Undocumented

Inherited from Text:

Method __getitem__ Undocumented
Method __len__ Undocumented
Method __repr__ Undocumented
Method __str__ Undocumented
Method collocation_list Return collocations derived from the text, ignoring stopwords.
Method collocations Print collocations derived from the text, ignoring stopwords.
Method common_contexts Find contexts where the specified words appear; list most frequent common contexts first.
Method concordance Prints a concordance for word with the specified context window. Word matching is not case-sensitive.
Method concordance_list Generate a concordance for word with the specified context window. Word matching is not case-sensitive.
Method count Count the number of times this word appears in the text.
Method dispersion_plot Produce a plot showing the distribution of the words through the text. Requires pylab to be installed.
Method findall Find instances of the regular expression in the text. The text is a list of tokens, and a regexp pattern to match a single token must be surrounded by angle brackets.
Method generate Print random text, generated using a trigram language model. See also help(nltk.lm).
Method index Find the index of the first occurrence of the word in the text.
Method plot See documentation for FreqDist.plot(). See also: nltk.prob.FreqDist.plot()
Method readability Undocumented
Method similar Distributional similarity: find other words which appear in the same contexts as the specified word; list most similar words first.
Method vocab No summary
Instance Variable name Undocumented
Instance Variable tokens Undocumented
Method _context One left & one right token, both case-normalized. Skip over non-sentence-final punctuation. Used by the ContextIndex that is created for similar() and common_contexts().
Method _train_default_ngram_lm Undocumented
Constant _CONTEXT_RE Undocumented
Constant _COPY_TOKENS Undocumented
Instance Variable _collocations Undocumented
Instance Variable _concordance_index Undocumented
Instance Variable _num Undocumented
Instance Variable _token_searcher Undocumented
Instance Variable _tokenized_sents Undocumented
Instance Variable _trigram_model Undocumented
Instance Variable _vocab Undocumented
Instance Variable _window_size Undocumented
Instance Variable _word_context_index Undocumented
def __init__(self, source): (source)

Create a Text object.

Parameters
source: Undocumented
tokens (sequence of str): The source text.
def idf(self, term): (source)

The number of texts in the corpus divided by the number of texts that the term appears in. If a term does not appear in the corpus, 0.0 is returned.
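
A minimal sketch of this calculation in plain Python, without NLTK. The free function `idf` and the sample data are hypothetical. One assumption goes beyond the description above: the ratio is passed through a natural logarithm, as is common in tf-idf formulations, whereas the summary only mentions the ratio itself. Drop the `math.log` call if the bare ratio is wanted.

```python
import math


def idf(term, texts):
    """Inverse document frequency of `term` over a list of tokenized texts.

    Assumption: the ratio (number of texts / texts containing the term)
    is passed through a natural log; the summary only describes the ratio.
    """
    if not texts:
        raise ValueError("IDF is undefined for an empty collection")
    matches = sum(1 for text in texts if term in text)
    # A term that appears in no text gets 0.0, as documented.
    return math.log(len(texts) / matches) if matches else 0.0


texts = [["a", "b", "a"], ["b", "c"], ["c", "d"]]
idf_a = idf("a", texts)  # "a" appears in 1 of 3 texts
idf_b = idf("b", texts)  # "b" appears in 2 of 3 texts
idf_z = idf("z", texts)  # "z" appears in no text
```

Rarer terms receive larger values, and a term found in every text would score `log(1) = 0` under this assumption.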

def tf(self, term, text): (source)

The frequency of the term in text.
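
As a rough illustration, "frequency" can be read as relative frequency: occurrences of the term divided by the number of tokens in the text. That reading is an assumption (a raw count is another possible interpretation), and the function below is a hypothetical stand-alone helper, not the method itself.

```python
def tf(term, text):
    """Relative frequency of `term` in a tokenized text (count / length).

    Assumption: frequency is normalized by text length rather than
    reported as a raw count.
    """
    if not text:
        raise ValueError("term frequency is undefined for an empty text")
    return text.count(term) / len(text)


text = ["a", "b", "a", "c"]
tf_a = tf("a", text)  # 2 occurrences out of 4 tokens
tf_z = tf("z", text)  # term absent from the text
```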

def tf_idf(self, term, text): (source)

Undocumented
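
Since this method has no description, the sketch below shows one plausible reading: the tf-idf score as the product of the two quantities above. `TinyCollection` is a hypothetical stand-in written in plain Python, not the NLTK class. The natural log inside `idf`, the length-normalized `tf`, and the lazily filled `_idf_cache` dictionary are all assumptions made for illustration.

```python
import math


class TinyCollection:
    """Hypothetical stand-in for a text collection; not the NLTK class."""

    def __init__(self, texts):
        self._texts = list(texts)  # tokenized texts
        self._idf_cache = {}       # term -> idf value, filled lazily

    def tf(self, term, text):
        # Assumption: relative frequency (count / length).
        return text.count(term) / len(text)

    def idf(self, term):
        if term not in self._idf_cache:
            matches = sum(1 for t in self._texts if term in t)
            # Assumption: natural log of the ratio; 0.0 for unseen terms.
            self._idf_cache[term] = (
                math.log(len(self._texts) / matches) if matches else 0.0
            )
        return self._idf_cache[term]

    def tf_idf(self, term, text):
        # Assumption: tf-idf is the plain product of the two scores.
        return self.tf(term, text) * self.idf(term)


docs = [["a", "b", "a", "c"], ["b", "c"], ["c", "d"]]
coll = TinyCollection(docs)
score_a = coll.tf_idf("a", docs[0])  # frequent here, rare elsewhere
score_c = coll.tf_idf("c", docs[0])  # present in every text
```

Under these assumptions a term that is frequent in one text but rare across the collection scores highest, while a term present in every text scores zero. Caching the idf value avoids rescanning all texts each time the same term is scored.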

_idf_cache: dict (source)

Undocumented

_texts

Undocumented