class KneserNeyProbDist(ProbDistI):
Constructor: KneserNeyProbDist(freqdist, bins, discount)
Kneser-Ney estimate of a probability distribution. This is a version of back-off that estimates how likely an n-gram is, given that its (n-1)-gram was seen in training. It extends the ProbDistI interface and requires a trigram FreqDist instance to train on. Optionally, a discount value other than the default can be specified. The default discount is 0.75.
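A minimal usage sketch, assuming NLTK is installed. The token list is invented for illustration, and `nltk.util.trigrams` is used here only as one convenient way to build the trigram FreqDist the constructor expects.

```python
from nltk.probability import FreqDist, KneserNeyProbDist
from nltk.util import trigrams

# Toy training data (hypothetical); real use would draw on a corpus.
tokens = "the cat sat on the mat and the cat ate the fish".split()

# The constructor expects a FreqDist whose samples are trigrams (3-tuples).
freqdist = FreqDist(trigrams(tokens))
kn = KneserNeyProbDist(freqdist)

seen = kn.prob(("the", "cat", "sat"))     # trigram observed in training
unseen = kn.prob(("fish", "ate", "cat"))  # no part of it was observed
print(kn.discount(), seen, unseen)
```

The second query has neither a seen trigram nor a seen leading bigram, so the estimate has no context to back off to.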
Method | __init__ |
No summary |
Method | __repr__ |
Return a string representation of this ProbDist |
Method | discount |
Return the value by which counts are discounted. By default set to 0.75. |
Method | max |
Return the sample with the greatest probability. If two or more samples have the same probability, return one of them; which sample is returned is undefined. |
Method | prob |
Return the probability for a given sample. Probabilities are always real numbers in the range [0, 1]. |
Method | samples |
Return a list of all samples that have nonzero probabilities. Use prob to find the probability of each sample. |
Method | set_discount |
Set the value by which counts are discounted to the value of discount. |
Instance Variable | _bigrams |
Undocumented |
Instance Variable | _bins |
Undocumented |
Instance Variable | _cache |
Undocumented |
Instance Variable | _D |
Undocumented |
Instance Variable | _trigrams |
Undocumented |
Instance Variable | _trigrams_contain |
Undocumented |
Instance Variable | _wordtypes_after |
Undocumented |
Instance Variable | _wordtypes_before |
Undocumented |
Inherited from ProbDistI:
Method | generate |
Return a randomly selected sample from this probability distribution. The probability of returning each sample samp is equal to self.prob(samp). |
Method | logprob |
Return the base 2 logarithm of the probability for a given sample. |
Constant | SUM_TO_ONE |
True if the probabilities of the samples in this probability distribution will always sum to one. |
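The two inherited helpers can be illustrated with a small stand-alone sketch. This is not NLTK's implementation: `toy_probs`, `logprob`, and `generate` below are hypothetical names operating on a plain dict, assuming probabilities that sum to one.

```python
import math
import random

# Hypothetical distribution standing in for a ProbDistI instance.
toy_probs = {"a": 0.5, "b": 0.25, "c": 0.25}

def logprob(sample):
    """Base-2 logarithm of the sample's probability (-inf if it is zero)."""
    p = toy_probs.get(sample, 0.0)
    return math.log2(p) if p > 0 else float("-inf")

def generate(rng):
    """Draw a sample so that each one is returned with probability toy_probs[sample]."""
    threshold = rng.random()
    cumulative = 0.0
    for sample, p in toy_probs.items():
        cumulative += p
        if threshold < cumulative:
            return sample
    return sample  # guard against floating-point rounding at the upper edge

rng = random.Random(0)
print(logprob("a"), logprob("b"), generate(rng))
```

Walking the cumulative sum until it passes a uniform random threshold is one simple way to make the chance of returning each sample equal to its probability.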
nltk.probability.ProbDistI.__init__
Parameters | |
freqdist:FreqDist | The trigram frequency distribution upon which to base the estimation |
bins:int or float | Included for compatibility with nltk.tag.hmm |
discount:float (preferred, but can be set to int) | The discount applied when retrieving counts of trigrams |
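As a sketch of how these parameters might be passed, assuming NLTK is installed: the trigram counts below are entered by hand purely for illustration, and `bins` is left at its default.

```python
from nltk.probability import FreqDist, KneserNeyProbDist

# Hypothetical trigram counts, entered by hand for illustration.
counts = FreqDist({
    ("a", "b", "c"): 3,
    ("a", "b", "d"): 1,
})

default_kn = KneserNeyProbDist(counts)               # discount defaults to 0.75
custom_kn = KneserNeyProbDist(counts, discount=0.5)  # explicit discount

print(default_kn.discount(), custom_kn.discount())
```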
nltk.probability.ProbDistI.discount
Return the value by which counts are discounted. By default set to 0.75.
Returns | |
float | Undocumented |
nltk.probability.ProbDistI.max
Return the sample with the greatest probability. If two or more samples have the same probability, return one of them; which sample is returned is undefined.
Returns | |
any | Undocumented |
nltk.probability.ProbDistI.prob
Return the probability for a given sample. Probabilities are always real numbers in the range [0, 1].
Parameters | |
trigram | Undocumented |
sample:any | The sample whose probability should be returned. |
Returns | |
float | Undocumented |
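For a trigram that was seen in training, discounting of this kind is commonly computed by subtracting the discount from the trigram count and dividing by the count of its leading bigram. The sketch below illustrates that step only, on invented counts. `discounted_prob` is a hypothetical helper, and the library's actual computation (including the back-off case for unseen trigrams) may differ.

```python
from collections import Counter

# Hypothetical trigram counts.
trigram_counts = Counter({
    ("a", "b", "c"): 3,
    ("a", "b", "d"): 1,
})

# Count of each leading bigram (w0, w1), summed over all continuations.
bigram_counts = Counter()
for (w0, w1, _w2), n in trigram_counts.items():
    bigram_counts[(w0, w1)] += n

def discounted_prob(trigram, discount=0.75):
    """Absolute-discounted estimate for a seen trigram; 0.0 otherwise."""
    if trigram not in trigram_counts:
        return 0.0  # a full Kneser-Ney model would back off here
    w0, w1, _w2 = trigram
    return (trigram_counts[trigram] - discount) / bigram_counts[(w0, w1)]

print(discounted_prob(("a", "b", "c")))  # (3 - 0.75) / 4
print(discounted_prob(("a", "b", "d")))  # (1 - 0.75) / 4
```

The mass removed from the seen trigrams (here 2 × 0.75 out of 4 counts) is what a back-off scheme redistributes to unseen continuations.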
nltk.probability.ProbDistI.samples
Return a list of all samples that have nonzero probabilities. Use prob to find the probability of each sample.
Returns | |
list | Undocumented |
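A short sketch of combining samples with prob, assuming NLTK is installed. The counts are invented, and the result of samples() is simply iterated, so the sketch does not depend on its exact container type.

```python
from nltk.probability import FreqDist, KneserNeyProbDist

# Hypothetical trigram counts.
counts = FreqDist({
    ("a", "b", "c"): 3,
    ("a", "b", "d"): 1,
    ("b", "c", "e"): 2,
})
kn = KneserNeyProbDist(counts)

# Pair every sample with its probability, most probable first.
ranked = sorted(((kn.prob(s), s) for s in kn.samples()), reverse=True)
for p, sample in ranked:
    print(sample, round(p, 4))
```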