nltk.collocations.AbstractCollocationFinder


class AbstractCollocationFinder(object):

Known subclasses: nltk.collocations.BigramCollocationFinder, nltk.collocations.QuadgramCollocationFinder, nltk.collocations.TrigramCollocationFinder

Constructor: AbstractCollocationFinder(word_fd, ngram_fd)


An abstract base class for collocation finders, whose purpose is to collect collocation candidate frequencies and then filter and rank the candidates.

At a minimum, collocation finders require the frequency of each word in a corpus and the joint frequency of word tuples. This data should be provided through nltk.probability.FreqDist objects or through objects with an identical interface.
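The two inputs described above can be sketched with the standard library. This is only an illustration of the data a finder expects: collections.Counter stands in for FreqDist, and the token list is made up for the example.

```python
from collections import Counter

# Hypothetical toy corpus, already tokenized.
tokens = "the quick brown fox jumps over the lazy dog the quick cat".split()

# Per-word frequencies: the role played by word_fd.
word_fd = Counter(tokens)

# Joint frequencies of adjacent word pairs: the role played by ngram_fd
# when the ngrams are bigrams.
ngram_fd = Counter(zip(tokens, tokens[1:]))

print(word_fd["the"])              # 3
print(ngram_fd[("the", "quick")])  # 2
```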

Class Method	`from_documents`	Constructs a collocation finder given a collection of documents, each of which is a list (or iterable) of tokens.
Method	`__init__`	Undocumented
Method	`above_score`	Returns a sequence of ngrams, ordered by decreasing score, whose scores each exceed the given minimum score.
Method	`apply_freq_filter`	Removes candidate ngrams which have frequency less than min_freq.
Method	`apply_ngram_filter`	Removes candidate ngrams (w1, w2, ...) where fn(w1, w2, ...) evaluates to True.
Method	`apply_word_filter`	Removes candidate ngrams (w1, w2, ...) where any of (fn(w1), fn(w2), ...) evaluates to True.
Method	`nbest`	Returns the top n ngrams when scored by the given function.
Method	`score_ngrams`	Returns a sequence of (ngram, score) pairs ordered from highest to lowest score, as determined by the scoring function provided.
Instance Variable	`N`	Undocumented
Instance Variable	`ngram_fd`	Undocumented
Instance Variable	`word_fd`	Undocumented
Class Method	`_build_new_documents`	Pads each document with the placeholder symbol (pad_symbol) according to window_size.
Static Method	`_ngram_freqdist`	Undocumented
Method	`_apply_filter`	Generic filter that removes ngrams from the frequency distribution if the function returns True when passed an ngram tuple and its frequency.
Method	`_score_ngrams`	Generates (ngram, score) pairs as determined by the scoring function provided.

@classmethod
def from_documents(cls, documents):

Constructs a collocation finder given a collection of documents, each of which is a list (or iterable) of tokens.

def __init__(self, word_fd, ngram_fd):

overridden in nltk.collocations.BigramCollocationFinder, nltk.collocations.QuadgramCollocationFinder, nltk.collocations.TrigramCollocationFinder

Undocumented

def above_score(self, score_fn, min_score):

Returns a sequence of ngrams, ordered by decreasing score, whose scores each exceed the given minimum score.

def apply_freq_filter(self, min_freq):

Removes candidate ngrams which have frequency less than min_freq.

def apply_ngram_filter(self, fn):

Removes candidate ngrams (w1, w2, ...) where fn(w1, w2, ...) evaluates to True.

def apply_word_filter(self, fn):

Removes candidate ngrams (w1, w2, ...) where any of (fn(w1), fn(w2), ...) evaluates to True.
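The difference between the ngram filter and the word filter is easiest to see side by side. The following is a minimal sketch of the two rules as stated above, not the library's code: remove_ngrams_if and remove_words_if are hypothetical helper names, Counter stands in for the ngram frequency distribution, and the counts are invented.

```python
from collections import Counter

def remove_ngrams_if(ngram_fd, fn):
    """Drop ngrams (w1, w2, ...) for which fn(w1, w2, ...) is true."""
    return Counter({ng: f for ng, f in ngram_fd.items() if not fn(*ng)})

def remove_words_if(ngram_fd, fn):
    """Drop ngrams in which any single word satisfies fn."""
    return Counter(
        {ng: f for ng, f in ngram_fd.items() if not any(fn(w) for w in ng)}
    )

ngram_fd = Counter({
    ("new", "york"): 3,
    ("of", "the"): 5,
    ("the", "city"): 2,
    ("york", "city"): 1,
})
stopwords = {"of", "the"}

# Word-level rule: one stopword anywhere is enough to drop the bigram.
by_word = remove_words_if(ngram_fd, lambda w: w in stopwords)
print(sorted(by_word))   # [('new', 'york'), ('york', 'city')]

# Ngram-level rule: the predicate sees all words at once, so it can
# require that both positions are stopwords.
by_ngram = remove_ngrams_if(
    ngram_fd, lambda w1, w2: w1 in stopwords and w2 in stopwords
)
print(sorted(by_ngram))  # [('new', 'york'), ('the', 'city'), ('york', 'city')]
```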

def nbest(self, score_fn, n):

Returns the top n ngrams when scored by the given function.
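Both nbest and above_score can be read as two ways of cutting a ranked list: one by position, the other by a strict score threshold. The sketch below works on a hand-written list of (ngram, score) pairs; the free functions and the scores are illustrative assumptions, not the class's implementation.

```python
def rank(scored):
    """Sort (ngram, score) pairs by decreasing score, ties broken by ngram."""
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))

def nbest(scored, n):
    """Return the n highest-scoring ngrams."""
    return [ngram for ngram, _ in rank(scored)[:n]]

def above_score(scored, min_score):
    """Return the ngrams whose score strictly exceeds min_score."""
    return [ngram for ngram, score in rank(scored) if score > min_score]

# Hypothetical scores for four bigrams.
scored = [
    (("big", "and"), 0.5),
    (("san", "francisco"), 1.0),
    (("it", "is"), 0.5),
    (("is", "big"), 0.8),
]

print(nbest(scored, 2))          # [('san', 'francisco'), ('is', 'big')]
print(above_score(scored, 0.5))  # [('san', 'francisco'), ('is', 'big')]
```

A score equal to min_score is excluded, which is why the two bigrams scored 0.5 do not appear in the second result.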

def score_ngrams(self, score_fn):

Returns a sequence of (ngram, score) pairs ordered from highest to lowest score, as determined by the scoring function provided.
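As a rough illustration of scoring and ranking, the sketch below scores bigrams with a Dice coefficient and sorts the result. The helper score_bigrams, the argument layout of the scoring function (joint count, pair of marginal counts, total word count) and the toy corpus are all assumptions made for this example. In the real class, how counts are passed to score_fn is decided by each subclass.

```python
from collections import Counter

def dice(n_ii, n_ix_xi, n_xx):
    """Dice coefficient from a joint count and the two word counts.

    n_xx (total word count) is accepted for a uniform signature but unused.
    """
    n_ix, n_xi = n_ix_xi
    return 2 * n_ii / (n_ix + n_xi)

def score_bigrams(word_fd, bigram_fd, score_fn):
    """Return (bigram, score) pairs sorted from highest to lowest score."""
    total = sum(word_fd.values())
    scored = [
        ((w1, w2), score_fn(n_ii, (word_fd[w1], word_fd[w2]), total))
        for (w1, w2), n_ii in bigram_fd.items()
    ]
    # Break ties on the bigram itself so the ordering is deterministic.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))

tokens = "san francisco is big and san francisco is far and it is big".split()
word_fd = Counter(tokens)
bigram_fd = Counter(zip(tokens, tokens[1:]))

ranked = score_bigrams(word_fd, bigram_fd, dice)
print(ranked[0])  # (('san', 'francisco'), 1.0)
```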

N

Undocumented

ngram_fd

Undocumented

word_fd

Undocumented

@classmethod
def _build_new_documents(cls, documents, window_size, pad_left=False, pad_right=False, pad_symbol=None):

Pads each document with the placeholder symbol (pad_symbol) according to window_size.
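One way to picture the padding: each document is extended with placeholder symbols so that, once the documents are joined into a single token stream, no ngram window spans two documents. The sketch below assumes window_size - 1 placeholders per requested side; the function name pad_documents and that padding length are assumptions of this example rather than documented behavior.

```python
def pad_documents(documents, window_size, pad_left=False,
                  pad_right=False, pad_symbol=None):
    """Yield all tokens in order, padding each document as requested."""
    padding = (pad_symbol,) * (window_size - 1)
    for doc in documents:
        if pad_left:
            yield from padding
        yield from doc
        if pad_right:
            yield from padding

docs = [["a", "b"], ["c", "d"]]
stream = list(pad_documents(docs, window_size=2, pad_right=True))
print(stream)   # ['a', 'b', None, 'c', 'd', None]

# Bigrams that contain the placeholder are discarded, so ('b', 'c'),
# which would cross the document boundary, never becomes a candidate.
bigrams = [bg for bg in zip(stream, stream[1:]) if None not in bg]
print(bigrams)  # [('a', 'b'), ('c', 'd')]
```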

@staticmethod
def _ngram_freqdist(words, n):

Undocumented

def _apply_filter(self, fn=(lambda ngram, freq: False)):

Generic filter that removes ngrams from the frequency distribution if the function returns True when passed an ngram tuple and its frequency.
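A minimal sketch of such a generic filter, written as a free function over a Counter rather than as a method, is shown below. The predicate receives the ngram and its frequency, matching the default argument in the signature above. The example data and the frequency threshold are made up.

```python
from collections import Counter

def apply_filter(ngram_fd, fn=lambda ngram, freq: False):
    """Return a new Counter holding the ngrams for which fn is false."""
    kept = Counter()
    for ngram, freq in ngram_fd.items():
        if not fn(ngram, freq):
            kept[ngram] = freq
    return kept

ngram_fd = Counter({("new", "york"): 3, ("york", "city"): 1, ("of", "the"): 5})

# The default predicate rejects nothing, so the distribution is unchanged.
unchanged = apply_filter(ngram_fd)

# A minimum-frequency rule expressed as a predicate on freq alone.
frequent = apply_filter(ngram_fd, lambda ngram, freq: freq < 2)
print(sorted(frequent.items()))
# [(('new', 'york'), 3), (('of', 'the'), 5)]
```

Because the predicate sees both the tuple and its count, frequency-based and word-based removal rules can share this one loop.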

def _score_ngrams(self, score_fn):

Generates (ngram, score) pairs as determined by the scoring function provided.