class documentation

An abstract base class for collocation finders. A collocation finder collects the frequencies of collocation candidates, filters them, and ranks them.

At a minimum, collocation finders require the frequency of each word in a corpus and the joint frequency of word tuples. This data should be provided through nltk.probability.FreqDist objects or objects with an identical interface.
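To make the two required inputs concrete, the sketch below counts words and contiguous bigrams in a toy corpus, then combines both counts into pointwise mutual information (PMI), a typical association measure. It uses collections.Counter as a stand-in for nltk.probability.FreqDist (which subclasses Counter). The helper names and the corpus are hypothetical, and this is not NLTK's own code.

```python
import math
from collections import Counter

def build_freq_dists(words, n=2):
    """Hypothetical helper: count single words and contiguous n-grams."""
    words = list(words)
    word_fd = Counter(words)
    # zip(words, words[1:], ...) yields each contiguous run of n tokens.
    ngram_fd = Counter(zip(*(words[i:] for i in range(n))))
    return word_fd, ngram_fd

def bigram_pmi(w1, w2, word_fd, ngram_fd):
    """Pointwise mutual information: log2(n_ii * N / (n_ix * n_xi))."""
    n_ii = ngram_fd[(w1, w2)]              # joint frequency of the bigram
    n_ix, n_xi = word_fd[w1], word_fd[w2]  # frequencies of each word
    total = sum(word_fd.values())          # N, the number of word tokens
    return math.log2(n_ii * total / (n_ix * n_xi))

words = "the cat sat on the mat the cat ran".split()
word_fd, ngram_fd = build_freq_dists(words)
print(word_fd["the"], ngram_fd[("the", "cat")])               # 3 2
print(round(bigram_pmi("the", "cat", word_fd, ngram_fd), 3))  # 1.585
```

The score needs both the word counts (n_ix, n_xi, N) and the joint count (n_ii), which is why a finder cannot work from either distribution alone.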

Class Method from_documents: Constructs a collocation finder given a collection of documents, each of which is a list (or iterable) of tokens.
Method __init__: Undocumented
Method above_score: Returns a sequence of ngrams, ordered by decreasing score, whose scores each exceed the given minimum score.
Method apply_freq_filter: Removes candidate ngrams which have frequency less than min_freq.
Method apply_ngram_filter: Removes candidate ngrams (w1, w2, ...) where fn(w1, w2, ...) evaluates to True.
Method apply_word_filter: Removes candidate ngrams (w1, w2, ...) where any of (fn(w1), fn(w2), ...) evaluates to True.
Method nbest: Returns the top n ngrams when scored by the given function.
Method score_ngrams: Returns a sequence of (ngram, score) pairs ordered from highest to lowest score, as determined by the scoring function provided.
Instance Variable N: Undocumented
Instance Variable ngram_fd: Undocumented
Instance Variable word_fd: Undocumented
Class Method _build_new_documents: Pads each document with the placeholder symbol according to window_size.
Static Method _ngram_freqdist: Undocumented
Method _apply_filter: Generic filter that removes ngrams from the frequency distribution if the function returns True when passed an ngram tuple and its frequency.
Method _score_ngrams: Generates (ngram, score) pairs as determined by the scoring function provided.

@classmethod
def from_documents(cls, documents):

Constructs a collocation finder given a collection of documents, each of which is a list (or iterable) of tokens.
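Accepting a collection of documents rather than one flat token list means n-grams need not be counted across document boundaries (NLTK's concrete finders appear to handle this by padding each document; see _build_new_documents). The stand-alone sketch below gets the same effect more directly by counting within each document. The name counts_from_documents is hypothetical, and Counter stands in for FreqDist.

```python
from collections import Counter

def counts_from_documents(documents, n=2):
    """Hypothetical sketch: word and n-gram counts over several tokenized
    documents, never forming an n-gram across a document boundary."""
    word_fd, ngram_fd = Counter(), Counter()
    for doc in documents:
        tokens = list(doc)  # each document may be any iterable of tokens
        word_fd.update(tokens)
        ngram_fd.update(zip(*(tokens[i:] for i in range(n))))
    return word_fd, ngram_fd

docs = [["new", "york"], ["york", "city"]]
word_fd, ngram_fd = counts_from_documents(docs)
print(sorted(ngram_fd))  # [('new', 'york'), ('york', 'city')]
```

Concatenating the two documents first would instead have produced a spurious ('york', 'york') bigram.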

def above_score(self, score_fn, min_score):

Returns a sequence of ngrams, ordered by decreasing score, whose scores each exceed the given minimum score.
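Because candidates are visited in decreasing score order, iteration can stop at the first score that does not exceed the minimum. A minimal sketch of that behavior, assuming the input is an already sorted list of (ngram, score) pairs (the kind score_ngrams returns) rather than a finder plus a score function:

```python
def above_score(scored_ngrams, min_score):
    """Sketch: yield ngrams whose score is strictly greater than min_score.

    scored_ngrams must be ordered from highest to lowest score.
    """
    for ngram, score in scored_ngrams:
        if score > min_score:
            yield ngram
        else:
            break  # sorted input: no later ngram can score higher

scored = [(("new", "york"), 4.2), (("red", "wine"), 3.0), (("of", "the"), 0.5)]
print(list(above_score(scored, 1.0)))  # [('new', 'york'), ('red', 'wine')]
```

Written as a generator, the sketch produces results lazily; wrap it in list() when a concrete sequence is needed.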

def apply_freq_filter(self, min_freq):

Removes candidate ngrams which have frequency less than min_freq.
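The comparison is strict, so an ngram seen exactly min_freq times survives. A small sketch over a Counter standing in for ngram_fd; unlike a finder method, this hypothetical function returns a new Counter instead of modifying any state in place.

```python
from collections import Counter

def apply_freq_filter(ngram_fd, min_freq):
    """Sketch: keep only ngrams whose frequency is at least min_freq."""
    return Counter({ng: f for ng, f in ngram_fd.items() if not f < min_freq})

ngram_fd = Counter({("of", "the"): 5, ("new", "york"): 3, ("red", "wine"): 1})
kept = apply_freq_filter(ngram_fd, 3)
print(sorted(kept))  # [('new', 'york'), ('of', 'the')]
```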

def apply_ngram_filter(self, fn):

Removes candidate ngrams (w1, w2, ...) where fn(w1, w2, ...) evaluates to True.
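The predicate receives the words of each candidate as separate positional arguments, so a predicate for bigrams takes two parameters. A stand-alone sketch over a Counter, with a hypothetical predicate that drops bigrams whose two words are identical:

```python
from collections import Counter

def apply_ngram_filter(ngram_fd, fn):
    """Sketch: drop each ngram (w1, w2, ...) for which fn(w1, w2, ...) is True."""
    return Counter({ng: f for ng, f in ngram_fd.items() if not fn(*ng)})

ngram_fd = Counter({("new", "york"): 3, ("very", "very"): 2, ("red", "wine"): 1})
kept = apply_ngram_filter(ngram_fd, lambda w1, w2: w1 == w2)
print(sorted(kept))  # [('new', 'york'), ('red', 'wine')]
```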

def apply_word_filter(self, fn):

Removes candidate ngrams (w1, w2, ...) where any of (fn(w1), fn(w2), ...) evaluates to True.
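Here the predicate is applied to one word at a time, and a single matching word is enough to discard the whole candidate, which makes this filter a natural place for stopword removal. A sketch over a Counter with an illustrative stopword set (not a list shipped with NLTK):

```python
from collections import Counter

STOPWORDS = {"a", "of", "the"}  # illustrative only

def apply_word_filter(ngram_fd, fn):
    """Sketch: drop each ngram in which fn(word) is True for any word."""
    return Counter(
        {ng: f for ng, f in ngram_fd.items() if not any(fn(w) for w in ng)}
    )

ngram_fd = Counter({("of", "the"): 5, ("new", "york"): 3, ("the", "city"): 2})
kept = apply_word_filter(ngram_fd, lambda w: w.lower() in STOPWORDS)
print(list(kept))  # [('new', 'york')]
```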

def nbest(self, score_fn, n):

Returns the top n ngrams when scored by the given function.
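Taking the top n amounts to ranking all candidates and keeping the first n. The sketch below assumes a simplified score function that receives an ngram and its count (in NLTK, score_fn is instead an association measure, such as those in BigramAssocMeasures). It uses heapq.nsmallest with a (-score, ngram) key, which the Python documentation describes as equivalent to sorting and slicing, but which performs better when n is small relative to the number of candidates.

```python
import heapq
from collections import Counter

def nbest(ngram_fd, score_fn, n):
    """Sketch: the n highest-scoring ngrams, ties broken by the ngram itself."""
    scored = ((ng, score_fn(ng, f)) for ng, f in ngram_fd.items())
    top = heapq.nsmallest(n, scored, key=lambda pair: (-pair[1], pair[0]))
    return [ng for ng, _ in top]

def raw_count(ngram, freq):
    """Hypothetical score function: the ngram's own frequency."""
    return freq

ngram_fd = Counter(
    {("of", "the"): 5, ("red", "wine"): 3, ("new", "york"): 3, ("a", "cat"): 1}
)
print(nbest(ngram_fd, raw_count, 2))  # [('of', 'the'), ('new', 'york')]
```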

def score_ngrams(self, score_fn):

Returns a sequence of (ngram, score) pairs ordered from highest to lowest score, as determined by the scoring function provided.
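A minimal sketch of the ordering described here, again assuming a simplified score function of the form score_fn(ngram, freq) rather than NLTK's association-measure interface. Sorting on the key (-score, ngram) gives highest-first order and breaks ties by the ngram tuple, so the output is deterministic.

```python
from collections import Counter

def score_ngrams(ngram_fd, score_fn):
    """Sketch: (ngram, score) pairs from highest to lowest score."""
    scored = [(ng, score_fn(ng, f)) for ng, f in ngram_fd.items()]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))

ngram_fd = Counter({("red", "wine"): 1, ("of", "the"): 2, ("new", "york"): 1})
total = sum(ngram_fd.values())

def relative_freq(ngram, freq):
    """Hypothetical score function: share of all counted ngram tokens."""
    return freq / total

for ngram, score in score_ngrams(ngram_fd, relative_freq):
    print(ngram, score)
# ('of', 'the') 0.5
# ('new', 'york') 0.25
# ('red', 'wine') 0.25
```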

N =

Undocumented

ngram_fd =

Undocumented

word_fd =

Undocumented

@classmethod
def _build_new_documents(cls, documents, window_size, pad_left=False, pad_right=False, pad_symbol=None):

Pads each document with the placeholder symbol (pad_symbol) according to window_size.
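Padding with window_size - 1 placeholders between documents ensures that no window of window_size consecutive tokens contains words from two different documents. The sketch below illustrates that idea; it is not NLTK's implementation, and the way it combines the two flags (padding both sides when both are set, neither when neither is) is an assumption of this sketch.

```python
from itertools import chain

def build_new_documents(documents, window_size, pad_left=False,
                        pad_right=False, pad_symbol=None):
    """Sketch: pad each document with window_size - 1 copies of pad_symbol
    and chain the padded documents into a single token list."""
    padding = (pad_symbol,) * (window_size - 1)
    left = padding if pad_left else ()
    right = padding if pad_right else ()
    return list(chain.from_iterable(chain(left, doc, right) for doc in documents))

docs = [["a", "b"], ["c"]]
print(build_new_documents(docs, window_size=3, pad_right=True))
# ['a', 'b', None, None, 'c', None, None]
```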

@staticmethod
def _ngram_freqdist(words, n):

Undocumented

def _apply_filter(self, fn=(lambda ngram, freq: False)):

Generic filter: removes ngrams from the frequency distribution if the function returns True when passed an ngram tuple and its frequency.
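The default predicate, lambda ngram, freq: False, removes nothing. Because the predicate sees both the ngram and its frequency, frequency-based and word-based filters can each be expressed through it. The sketch below shows that relationship with a stand-alone function over a Counter; the specific predicates are illustrative assumptions.

```python
from collections import Counter

def apply_filter(ngram_fd, fn=lambda ngram, freq: False):
    """Sketch: keep only the ngrams for which fn(ngram, freq) is False."""
    return Counter({ng: f for ng, f in ngram_fd.items() if not fn(ng, f)})

ngram_fd = Counter({("of", "the"): 5, ("new", "york"): 3, ("2", "cats"): 1})

# Default predicate: nothing is removed.
unchanged = apply_filter(ngram_fd)
# Frequency-style filter expressed through the generic one.
frequent = apply_filter(ngram_fd, lambda ng, f: f < 3)
# Word-style filter: drop ngrams containing an all-digit token.
no_digits = apply_filter(ngram_fd, lambda ng, f: any(w.isdigit() for w in ng))
```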

def _score_ngrams(self, score_fn):

Generates (ngram, score) pairs as determined by the scoring function provided.
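As described, this helper generates pairs without the highest-to-lowest ordering that score_ngrams promises, leaving sorting to the caller. A sketch written as a generator, under two assumptions not taken from the source: the score function has the simplified form score_fn(ngram, freq), and an ngram whose score comes back as None is skipped rather than yielded. The names score_pairs and score_if_frequent are hypothetical.

```python
from collections import Counter

def score_pairs(ngram_fd, score_fn):
    """Sketch: lazily yield (ngram, score) pairs, skipping unscorable ngrams."""
    for ngram, freq in ngram_fd.items():
        score = score_fn(ngram, freq)
        if score is not None:  # assumption: None means "cannot be scored"
            yield ngram, score

def score_if_frequent(ngram, freq):
    """Hypothetical score function that declines to score singletons."""
    return freq * 1.0 if freq > 1 else None

ngram_fd = Counter({("of", "the"): 5, ("red", "wine"): 1, ("new", "york"): 3})
pairs = score_pairs(ngram_fd, score_if_frequent)
ranked = sorted(pairs, key=lambda pair: (-pair[1], pair[0]))
print(ranked)  # [(('of', 'the'), 5.0), (('new', 'york'), 3.0)]
```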