class BigramAssocMeasures(NgramAssocMeasures):
A collection of bigram association measures. Each association measure is provided as a function with three arguments:
bigram_score_fn(n_ii, (n_ix, n_xi), n_xx)
The arguments constitute the marginals of a contingency table, counting the occurrences of particular events in a corpus. The letter i in the suffix refers to the appearance of the word in question, while x indicates the appearance of any word. Thus, for example:
n_ii counts (w1, w2), i.e. the bigram being scored
n_ix counts (w1, *)
n_xi counts (*, w2)
n_xx counts (*, *), i.e. any bigram
This may be shown with respect to a contingency table:
        w1    ~w1
     ------ ------
 w2 | n_ii | n_oi | = n_xi
     ------ ------
~w2 | n_io | n_oo |
     ------ ------
     = n_ix        TOTAL = n_xx
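The relationship between the marginals and the four cells of the table can be sketched in plain Python. This is a minimal illustration rather than the library's own code: the function name `contingency_from_marginals` and the counts used are hypothetical, and only the cell names come from the table above.

```python
def contingency_from_marginals(n_ii, n_ix_xi, n_xx):
    """Return the cells (n_ii, n_oi, n_io, n_oo) of the 2x2 table.

    Hypothetical helper: it takes the same three arguments as a
    bigram score function, i.e. n_ii, (n_ix, n_xi), n_xx.
    """
    n_ix, n_xi = n_ix_xi
    n_oi = n_xi - n_ii                # (~w1, w2): w2 preceded by another word
    n_io = n_ix - n_ii                # (w1, ~w2): w1 followed by another word
    n_oo = n_xx - n_ii - n_oi - n_io  # (~w1, ~w2): neither word in its slot
    return n_ii, n_oi, n_io, n_oo


# Made-up counts: the bigram occurs 20 times, w1 starts 50 bigrams,
# w2 ends 40 bigrams, and the corpus holds 1000 bigrams in total.
cells = contingency_from_marginals(20, (50, 40), 1000)
print(cells)
```

The four cells always sum to n_xx, which is a quick sanity check on any set of marginals.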
Class Method | chi_sq |
Scores bigrams using chi-square, i.e. phi-sq multiplied by the number of bigrams, as in Manning and Schutze 5.3.3. |
Class Method | fisher |
Scores bigrams using Fisher's Exact Test (Pedersen 1996). Less sensitive to small counts than PMI or Chi Sq, but also more expensive to compute. Requires scipy. |
Class Method | phi_sq |
Scores bigrams using phi-square, the square of the Pearson correlation coefficient. |
Static Method | dice |
Scores bigrams using Dice's coefficient. |
Static Method | _contingency |
Calculates values of a bigram contingency table from marginal values. |
Static Method | _expected |
Calculates expected values for a contingency table. |
Static Method | _marginals |
Calculates values of contingency table marginals from its values. |
Class Variable | _n |
Undocumented |
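As a rough illustration of the bigram-specific measures listed above, the following sketch computes phi-square, chi-square, and Dice's coefficient from the marginals using their textbook definitions. The function names (`phi_square`, `chi_square`, `dice_coefficient`) are hypothetical, the counts are made up, and zero denominators are not handled; the library's own implementation may differ in detail.

```python
def phi_square(n_ii, n_ix_xi, n_xx):
    """Square of the Pearson correlation coefficient for a 2x2 table."""
    n_ix, n_xi = n_ix_xi
    # Recover the remaining cells of the contingency table.
    n_oi = n_xi - n_ii
    n_io = n_ix - n_ii
    n_oo = n_xx - n_ii - n_oi - n_io
    numerator = (n_ii * n_oo - n_io * n_oi) ** 2
    # Product of the four marginal totals (two rows, two columns).
    denominator = (n_ii + n_io) * (n_ii + n_oi) * (n_io + n_oo) * (n_oi + n_oo)
    return numerator / denominator


def chi_square(n_ii, n_ix_xi, n_xx):
    """Chi-square as phi-square multiplied by the number of bigrams."""
    return n_xx * phi_square(n_ii, n_ix_xi, n_xx)


def dice_coefficient(n_ii, n_ix_xi, n_xx):
    """Dice's coefficient: twice the joint count over the summed marginals."""
    n_ix, n_xi = n_ix_xi
    return 2 * n_ii / (n_ix + n_xi)


# Made-up counts, passed in the (n_ii, (n_ix, n_xi), n_xx) convention.
args = (20, (50, 40), 1000)
print(phi_square(*args), chi_square(*args), dice_coefficient(*args))
```

Keeping the same argument convention for every measure means any of these functions can be swapped for another when ranking candidate bigrams.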
Inherited from NgramAssocMeasures:
Class Method | jaccard |
Scores ngrams using the Jaccard index. |
Class Method | likelihood_ratio |
Scores ngrams using likelihood ratios as in Manning and Schutze 5.3.4. |
Class Method | pmi |
Scores ngrams by pointwise mutual information, as in Manning and Schutze 5.4. |
Class Method | poisson_stirling |
Scores ngrams using the Poisson-Stirling measure. |
Class Method | student_t |
Scores ngrams using Student's t test with independence hypothesis for unigrams, as in Manning and Schutze 5.3.1. |
Static Method | mi_like |
Scores ngrams using a variant of mutual information. The keyword argument power sets an exponent (default 3) for the numerator. No logarithm of the result is calculated. |
Static Method | raw_freq |
Scores ngrams by their frequency.
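Several of the inherited measures reduce to short formulas when specialised to bigrams. The sketch below assumes the standard definitions (relative frequency, base-2 pointwise mutual information, the mutual-information variant with a `power` exponent and no logarithm, and the Jaccard index over the contingency cells). The function names are hypothetical and the counts are made up, so treat it as an illustration rather than the library's code.

```python
import math


def bigram_raw_freq(n_ii, n_ix_xi, n_xx):
    """Relative frequency of the bigram among all bigrams."""
    return n_ii / n_xx


def bigram_pmi(n_ii, n_ix_xi, n_xx):
    """Pointwise mutual information in bits (assumes non-zero counts)."""
    n_ix, n_xi = n_ix_xi
    return math.log2(n_ii * n_xx) - math.log2(n_ix * n_xi)


def bigram_mi_variant(n_ii, n_ix_xi, n_xx, power=3):
    """Mutual-information variant: n_ii ** power over the unigram product."""
    n_ix, n_xi = n_ix_xi
    return n_ii ** power / (n_ix * n_xi)


def bigram_jaccard(n_ii, n_ix_xi, n_xx):
    """Jaccard index: joint count over bigrams containing w1 or w2 in place."""
    n_ix, n_xi = n_ix_xi
    n_oi = n_xi - n_ii
    n_io = n_ix - n_ii
    return n_ii / (n_ii + n_oi + n_io)


# Made-up counts in the (n_ii, (n_ix, n_xi), n_xx) convention.
args = (20, (50, 40), 1000)
for fn in (bigram_raw_freq, bigram_pmi, bigram_mi_variant, bigram_jaccard):
    print(fn.__name__, fn(*args))
```

Because the variant with `power` skips the logarithm, its scores are not on the same scale as PMI and are only meaningful for ranking bigrams against each other.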