class AbstractCollocationFinder(object):
Known subclasses: nltk.collocations.BigramCollocationFinder, nltk.collocations.QuadgramCollocationFinder, nltk.collocations.TrigramCollocationFinder
Constructor: AbstractCollocationFinder(word_fd, ngram_fd)
An abstract base class for collocation finders whose purpose is to collect collocation candidate frequencies, filter and rank them.
At a minimum, collocation finders require the frequencies of each word in a corpus and the joint frequency of word tuples. This data should be provided through nltk.probability.FreqDist objects or objects with an equivalent interface.
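A minimal sketch of the two inputs, using `collections.Counter` as a stand-in for `FreqDist` (NLTK implements `FreqDist` as a `Counter` subclass). The token list is made-up toy data, bigrams are used as the "word tuples", and treating `N` as the total token count is an assumption of this sketch; it only illustrates what the data holds, not how any subclass builds it.

```python
from collections import Counter

# Hypothetical toy corpus; any list of tokens works.
tokens = ["new", "york", "is", "in", "new", "york", "state"]

# Frequency of each word (the role played by word_fd).
word_fd = Counter(tokens)

# Joint frequency of adjacent word pairs (the role played by ngram_fd for bigrams).
ngram_fd = Counter(zip(tokens, tokens[1:]))

# Total number of word tokens, assumed here to correspond to the N attribute.
N = sum(word_fd.values())

print(word_fd["new"])             # 2
print(ngram_fd[("new", "york")])  # 2
print(N)                          # 7
```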
Kind | Name | Description
Class Method | from_documents | Constructs a collocation finder given a collection of documents, each of which is a list (or iterable) of tokens.
Method | __init__ | Undocumented
Method | above_score | Returns a sequence of ngrams, ordered by decreasing score, whose scores each exceed the given minimum score.
Method | apply_freq_filter | Removes candidate ngrams that have a frequency less than min_freq.
Method | apply_ngram_filter | Removes candidate ngrams (w1, w2, ...) where fn(w1, w2, ...) evaluates to True.
Method | apply_word_filter | Removes candidate ngrams (w1, w2, ...) where any of (fn(w1), fn(w2), ...) evaluates to True.
Method | nbest | Returns the top n ngrams when scored by the given function.
Method | score_ngrams | Returns a sequence of (ngram, score) pairs ordered from highest to lowest score, as determined by the scoring function provided.
Instance Variable | N | Undocumented
Instance Variable | ngram_fd | Undocumented
Instance Variable | word_fd | Undocumented
Class Method | _build_new_documents | Pads each document with a placeholder symbol according to window_size.
Static Method | _ngram_freqdist | Undocumented
Method | _apply_filter | Generic filter: removes ngrams from the frequency distribution if the function returns True when passed an ngram tuple.
Method | _score_ngrams | Generates (ngram, score) pairs as determined by the scoring function provided.
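The three public filters all remove candidates, and the generic private filter is described as the shared mechanism. A minimal sketch of that pattern over a `Counter`; the function names are hypothetical (not NLTK functions), each helper returns a new `Counter` instead of modifying a finder, and passing both the ngram and its frequency to the predicate is an assumption that lets the frequency filter reuse the generic one.

```python
from collections import Counter

# Hypothetical helpers mirroring the filtering behavior described in the table;
# they are not NLTK functions.

def apply_filter(ngram_fd, fn):
    # Keep only the ngrams for which fn(ngram, freq) is False.
    return Counter({ng: f for ng, f in ngram_fd.items() if not fn(ng, f)})

def freq_filter(ngram_fd, min_freq):
    # Remove ngrams seen fewer than min_freq times.
    return apply_filter(ngram_fd, lambda ng, f: f < min_freq)

def ngram_filter(ngram_fd, fn):
    # fn receives the words as separate arguments: fn(w1, w2, ...).
    return apply_filter(ngram_fd, lambda ng, f: fn(*ng))

def word_filter(ngram_fd, fn):
    # Remove an ngram if fn is True for any one of its words.
    return apply_filter(ngram_fd, lambda ng, f: any(fn(w) for w in ng))

ngram_fd = Counter({("new", "york"): 3, ("of", "the"): 5,
                    ("red", "wine"): 1, ("very", "very"): 2})
stopwords = {"of", "the"}

kept = freq_filter(ngram_fd, 2)                     # drops ("red", "wine")
kept = word_filter(kept, lambda w: w in stopwords)  # drops ("of", "the")
kept = ngram_filter(kept, lambda w1, w2: w1 == w2)  # drops ("very", "very")
print(sorted(kept))  # [('new', 'york')]
```

Routing every filter through one generic function keeps the "remove if predicate is True" rule in a single place; the public variants only differ in how they adapt their argument into that predicate.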
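The ranking methods all build on one ordering: score every candidate, sort from highest to lowest, then either take the first n or stop at the first score that no longer exceeds a threshold. A minimal sketch with hypothetical helper names; the `(ngram, freq)` signature of `score_fn`, the relative-frequency scorer, the value of `N`, and tie-breaking on the ngram itself are all assumptions of this sketch, not the signature NLTK's association measures use.

```python
from collections import Counter

def score_pairs(ngram_fd, score_fn):
    # Generator: yield (ngram, score), skipping ngrams the scorer rejects with None.
    for ngram, freq in ngram_fd.items():
        score = score_fn(ngram, freq)
        if score is not None:
            yield ngram, score

def ranked(ngram_fd, score_fn):
    # Highest score first; ties broken by the ngram itself for a stable order.
    return sorted(score_pairs(ngram_fd, score_fn), key=lambda p: (-p[1], p[0]))

def top_n(ngram_fd, score_fn, n):
    # The n best-scoring ngrams, without their scores.
    return [ngram for ngram, _ in ranked(ngram_fd, score_fn)[:n]]

def above(ngram_fd, score_fn, min_score):
    # The list is sorted, so stop at the first score that does not exceed min_score.
    for ngram, score in ranked(ngram_fd, score_fn):
        if score <= min_score:
            break
        yield ngram

ngram_fd = Counter({("new", "york"): 4, ("ice", "cream"): 2,
                    ("red", "wine"): 2, ("of", "the"): 1})
N = 20  # hypothetical total number of bigram tokens

def relative_freq(ngram, freq):
    return freq / N

print(top_n(ngram_fd, relative_freq, 2))           # [('new', 'york'), ('ice', 'cream')]
print(list(above(ngram_fd, relative_freq, 0.05)))  # [('new', 'york'), ('ice', 'cream'), ('red', 'wine')]
```

Because the threshold check is strict (`score <= min_score` stops the loop), an ngram whose score equals the minimum is excluded, matching "whose scores each exceed the given minimum score".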
def _build_new_documents(cls, documents, window_size, pad_left=False, pad_right=False, pad_symbol=None):
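Pads each document with window_size - 1 copies of the placeholder pad_symbol, on the side selected by pad_left or pad_right, and chains the padded documents into a single token sequence.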
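The point of the padding is that any window crossing a document boundary contains at least one placeholder, which a counter can then skip. A minimal sketch of that idea; `pad_documents` and `count_bigrams` are hypothetical helpers (not NLTK functions), and this version only pads on the right.

```python
from collections import Counter
from itertools import chain

def pad_documents(documents, window_size, pad_symbol=None):
    # Hypothetical helper: append window_size - 1 placeholders to the right of
    # each document, then flatten all documents into one token list.
    padding = (pad_symbol,) * (window_size - 1)
    return list(chain.from_iterable(chain(doc, padding) for doc in documents))

def count_bigrams(stream, pad_symbol=None):
    # Count adjacent pairs, skipping any pair that contains a placeholder,
    # so no counted bigram straddles two documents.
    return Counter(p for p in zip(stream, stream[1:]) if pad_symbol not in p)

docs = [["new", "york"], ["york", "city"]]
stream = pad_documents(docs, window_size=2)
print(stream)                         # ['new', 'york', None, 'york', 'city', None]
print(sorted(count_bigrams(stream)))  # [('new', 'york'), ('york', 'city')]
```

Without the placeholder, the flattened stream would be `['new', 'york', 'york', 'city']` and would yield the spurious cross-document bigram `('york', 'york')`.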