class documentation

class BigramAssocMeasures(NgramAssocMeasures):

A collection of bigram association measures. Each association measure is provided as a function with three arguments:

bigram_score_fn(n_ii, (n_ix, n_xi), n_xx)

The arguments constitute the marginals of a contingency table, counting the occurrences of particular events in a corpus. The letter i in the suffix refers to the appearance of the word in question, while x indicates the appearance of any word. Thus, for example:

n_ii counts (w1, w2), i.e. the bigram being scored
n_ix counts (w1, *)
n_xi counts (*, w2)
n_xx counts (*, *), i.e. any bigram

This may be shown with respect to a contingency table:

        w1    ~w1
     ------ ------
 w2 | n_ii | n_oi | = n_xi
     ------ ------
~w2 | n_io | n_oo |
     ------ ------
     = n_ix        TOTAL = n_xx
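
As a rough illustration of where these counts come from, the sketch below tallies them from a toy token list using only the standard library. The function name bigram_marginals and the sample sentence are hypothetical and not part of this class, and n_xx is assumed here to be the number of adjacent token pairs, following the definitions above.

```python
from collections import Counter


def bigram_marginals(tokens, w1, w2):
    """Return (n_ii, (n_ix, n_xi), n_xx) for the bigram (w1, w2).

    Hypothetical helper for illustration; not part of BigramAssocMeasures.
    """
    bigrams = list(zip(tokens, tokens[1:]))         # adjacent token pairs
    pair_counts = Counter(bigrams)                  # counts of (a, b)
    first_counts = Counter(a for a, _ in bigrams)   # counts of (a, *)
    second_counts = Counter(b for _, b in bigrams)  # counts of (*, b)

    n_ii = pair_counts[(w1, w2)]
    n_ix = first_counts[w1]
    n_xi = second_counts[w2]
    n_xx = len(bigrams)                             # assumption: any bigram
    return n_ii, (n_ix, n_xi), n_xx


tokens = "the cat sat on the mat and the cat slept".split()
print(bigram_marginals(tokens, "the", "cat"))  # (2, (3, 2), 9)
```

The returned tuple has the same shape as the three arguments described above, so it could be unpacked directly into a scoring function.
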
Class Method    chi_sq            Scores bigrams using chi-square, i.e. phi-sq multiplied by the number of bigrams, as in Manning and Schutze 5.3.3.
Class Method    fisher            Scores bigrams using Fisher's Exact Test (Pedersen 1996). Less sensitive to small counts than PMI or Chi Sq, but also more expensive to compute. Requires scipy.
Class Method    phi_sq            Scores bigrams using phi-square, the square of the Pearson correlation coefficient.
Static Method   dice              Scores bigrams using Dice's coefficient.
Static Method   _contingency      Calculates values of a bigram contingency table from marginal values.
Static Method   _expected_values  Calculates expected values for a contingency table.
Static Method   _marginals        Calculates values of contingency table marginals from its values.
Class Variable  _n                Undocumented

Inherited from NgramAssocMeasures:

Class Method    jaccard           Scores ngrams using the Jaccard index.
Class Method    likelihood_ratio  Scores ngrams using likelihood ratios as in Manning and Schutze 5.3.4.
Class Method    pmi               Scores ngrams by pointwise mutual information, as in Manning and Schutze 5.4.
Class Method    poisson_stirling  Scores ngrams using the Poisson-Stirling measure.
Class Method    student_t         Scores ngrams using Student's t test with independence hypothesis for unigrams, as in Manning and Schutze 5.3.1.
Static Method   mi_like           Scores ngrams using a variant of mutual information. The keyword argument power sets an exponent (default 3) for the numerator. No logarithm of the result is calculated.
Static Method   raw_freq          Scores ngrams by their frequency.
@classmethod
def chi_sq(cls, n_ii, n_ix_xi_tuple, n_xx):

Scores bigrams using chi-square, i.e. phi-sq multiplied by the number of bigrams, as in Manning and Schutze 5.3.3.
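
The relationship "chi-square equals phi-square times the number of bigrams" can be sketched in plain Python as follows. This is a minimal standalone version written from the standard 2x2 formulas, not a copy of the library source; the free functions phi_sq and chi_sq below only mirror the method names, and the counts in the second call are illustrative.

```python
def phi_sq(n_ii, n_ix_xi_tuple, n_xx):
    """Phi-square for a 2x2 table given as bigram marginals (sketch)."""
    n_ix, n_xi = n_ix_xi_tuple
    n_io = n_ix - n_ii                 # w1 followed by something other than w2
    n_oi = n_xi - n_ii                 # w2 preceded by something other than w1
    n_oo = n_xx - n_ii - n_io - n_oi   # neither w1 first nor w2 second
    numerator = (n_ii * n_oo - n_io * n_oi) ** 2
    # Product of the two row totals and the two column totals.
    denominator = n_ix * n_xi * (n_xx - n_ix) * (n_xx - n_xi)
    return numerator / denominator


def chi_sq(n_ii, n_ix_xi_tuple, n_xx):
    """Chi-square as phi-square multiplied by the number of bigrams."""
    return n_xx * phi_sq(n_ii, n_ix_xi_tuple, n_xx)


# Perfect association: w1 is always followed by w2 and vice versa.
print(chi_sq(5, (5, 5), 10))  # 10.0
# Illustrative counts for a rare bigram in a large corpus.
print(round(chi_sq(8, (15828, 4675), 14307668), 2))
```

Note that the denominator is zero when either word accounts for none or all of the bigrams, so a caller would need to guard against that case.
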

@classmethod
def fisher(cls, *marginals):

Scores bigrams using Fisher's Exact Test (Pedersen 1996). Less sensitive to small counts than PMI or Chi Sq, but also more expensive to compute. Requires scipy.
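
To give a feel for what an exact test computes, and why it is costlier than a closed-form score, the dependency-free sketch below sums hypergeometric probabilities with math.comb. The function name fisher_tails is hypothetical, and returning both one-sided tails is an assumption made for illustration; it does not state which tail the class method reports.

```python
from math import comb


def fisher_tails(n_ii, n_ix_xi_tuple, n_xx):
    """Return (P(X <= n_ii), P(X >= n_ii)) under the hypergeometric null.

    Hypothetical helper; X is the joint count with all marginals held fixed.
    """
    n_ix, n_xi = n_ix_xi_tuple
    total = comb(n_xx, n_xi)   # ways to pick which bigrams end in w2

    def prob(k):
        # k of the n_ix bigrams starting with w1 also end in w2.
        return comb(n_ix, k) * comb(n_xx - n_ix, n_xi - k) / total

    lo = max(0, n_ix + n_xi - n_xx)   # smallest feasible joint count
    hi = min(n_ix, n_xi)              # largest feasible joint count
    left = sum(prob(k) for k in range(lo, n_ii + 1))
    right = sum(prob(k) for k in range(n_ii, hi + 1))
    return left, right


left, right = fisher_tails(2, (3, 2), 9)
print(left, right)
```

Each call loops over every feasible joint count and evaluates large binomial coefficients, which is where the extra expense relative to PMI or chi-square comes from.
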

@classmethod
def phi_sq(cls, *marginals):

Scores bigrams using phi-square, the square of the Pearson correlation coefficient.

@staticmethod
def dice(n_ii, n_ix_xi_tuple, n_xx):

Scores bigrams using Dice's coefficient.
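
Dice's coefficient is twice the joint count divided by the sum of the two word counts. A minimal standalone sketch, assuming the same argument layout as the other measures (the free function below only mirrors the method's signature and is not the library source):

```python
def dice(n_ii, n_ix_xi_tuple, n_xx):
    """Dice's coefficient: 2 * n_ii / (n_ix + n_xi).

    n_xx is accepted only to keep the common three-argument signature;
    it does not enter the formula.
    """
    n_ix, n_xi = n_ix_xi_tuple
    return 2 * n_ii / (n_ix + n_xi)


print(dice(2, (3, 2), 9))  # 0.8
```

The score ranges from 0 (the words never co-occur) to 1 (each word occurs only in this bigram).
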

@staticmethod
def _contingency(n_ii, n_ix_xi_tuple, n_xx):

Calculates values of a bigram contingency table from marginal values.
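
Following the contingency table in the class description, the four cells can be recovered from the marginals by subtraction. The sketch below is a standalone illustration; the function name contingency and the cell order (n_ii, n_oi, n_io, n_oo) are assumptions chosen to match the argument order of _marginals.

```python
def contingency(n_ii, n_ix_xi_tuple, n_xx):
    """Return the cells (n_ii, n_oi, n_io, n_oo) of the 2x2 table (sketch)."""
    n_ix, n_xi = n_ix_xi_tuple
    n_oi = n_xi - n_ii                 # (~w1, w2): w2 second, w1 not first
    n_io = n_ix - n_ii                 # (w1, ~w2): w1 first, w2 not second
    n_oo = n_xx - n_ii - n_oi - n_io   # (~w1, ~w2): everything else
    return n_ii, n_oi, n_io, n_oo


print(contingency(2, (3, 2), 9))  # (2, 0, 1, 6)
```
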

@staticmethod
def _expected_values(cont):

Calculates expected values for a contingency table.
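
Under independence, the expected count of each cell is its row total times its column total, divided by the grand total. A minimal sketch of that calculation, assuming cont is given in the order (n_ii, n_oi, n_io, n_oo); the function name expected_values and that ordering are assumptions for illustration:

```python
def expected_values(cont):
    """Expected cell counts under independence for a 2x2 table (sketch)."""
    n_ii, n_oi, n_io, n_oo = cont
    n_xx = sum(cont)       # grand total
    n_ix = n_ii + n_io     # column total for w1
    n_xi = n_ii + n_oi     # row total for w2
    n_ox = n_xx - n_ix     # column total for ~w1
    n_xo = n_xx - n_xi     # row total for ~w2
    return (
        n_ix * n_xi / n_xx,   # expected n_ii
        n_ox * n_xi / n_xx,   # expected n_oi
        n_ix * n_xo / n_xx,   # expected n_io
        n_ox * n_xo / n_xx,   # expected n_oo
    )


print(expected_values((2, 0, 1, 6)))
```

The expected counts always sum to the same total as the observed cells, which makes a convenient sanity check.
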

@staticmethod
def _marginals(n_ii, n_oi, n_io, n_oo):

Calculates values of contingency table marginals from its values.
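
Going the other way, the marginals are sums over the rows and columns of the contingency table shown in the class description. The sketch below follows that table; the function name marginals and the (n_ix, n_xi) order of the returned pair are assumptions chosen to match the scoring-function signature, not a description of the library source.

```python
def marginals(n_ii, n_oi, n_io, n_oo):
    """Return (n_ii, (n_ix, n_xi), n_xx) from the four table cells (sketch)."""
    n_ix = n_ii + n_io                  # column w1: every bigram starting with w1
    n_xi = n_ii + n_oi                  # row w2: every bigram ending in w2
    n_xx = n_ii + n_oi + n_io + n_oo    # all bigrams
    return n_ii, (n_ix, n_xi), n_xx


print(marginals(2, 0, 1, 6))  # (2, (3, 2), 9)
```
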