nltk.probability.KneserNeyProbDist

class documentation

class KneserNeyProbDist(ProbDistI):

Constructor: KneserNeyProbDist(freqdist, bins, discount)

Kneser-Ney estimate of a probability distribution. This is a version of back-off that estimates how likely an n-gram is, given that its (n-1)-gram context was seen in training. The class extends the ProbDistI interface and requires a trigram FreqDist instance to train on. A discount other than the default of 0.75 can optionally be specified.
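
As a usage illustration, the following minimal sketch trains the estimator on a toy token sequence and queries one seen and one unseen trigram. The token sequence and variable names are made up for the example; it assumes `nltk` is installed and uses `nltk.util.trigrams` to build the trigram `FreqDist`.

```python
from nltk.probability import FreqDist, KneserNeyProbDist
from nltk.util import trigrams

# Toy corpus (illustrative only).
tokens = "the cat sat on the mat and the cat sat on the rug".split()

# The estimator trains on a FreqDist whose samples are trigram tuples.
trigram_counts = FreqDist(trigrams(tokens))

kn = KneserNeyProbDist(trigram_counts)                     # default discount
kn_half = KneserNeyProbDist(trigram_counts, discount=0.5)  # custom discount

seen = ("on", "the", "mat")
unseen = ("the", "dog", "sat")  # the context ("the", "dog") never occurs

print(kn.discount())
print(kn.prob(seen), kn_half.prob(seen))
print(kn.prob(unseen))
```

A smaller discount leaves more probability mass on trigrams that were seen in training, so `kn_half.prob(seen)` should come out larger than `kn.prob(seen)`.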

Method	`__init__`	No summary
Method	`__repr__`	Return a string representation of this ProbDist
Method	`discount`	Return the value by which counts are discounted (0.75 by default).
Method	`max`	Return the sample with the greatest probability. If two or more samples have the same probability, return one of them; which sample is returned is undefined.
Method	`prob`	Return the probability for a given sample. Probabilities are always real numbers in the range [0, 1].
Method	`samples`	Return a list of all samples that have nonzero probabilities. Use `prob` to find the probability of each sample.
Method	`set_discount`	Set the value by which counts are discounted to the value of discount.
Instance Variable	`_bigrams`	Undocumented
Instance Variable	`_bins`	Undocumented
Instance Variable	`_cache`	Undocumented
Instance Variable	`_D`	Undocumented
Instance Variable	`_trigrams`	Undocumented
Instance Variable	`_trigrams_contain`	Undocumented
Instance Variable	`_wordtypes_after`	Undocumented
Instance Variable	`_wordtypes_before`	Undocumented

Inherited from ProbDistI:

Method	`generate`	Return a randomly selected sample from this probability distribution. The probability of returning each sample `samp` is equal to `self.prob(samp)`.
Method	`logprob`	Return the base 2 logarithm of the probability for a given sample.
Constant	`SUM_TO_ONE`	True if the probabilities of the samples in this probability distribution will always sum to one.
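
To illustrate the relationship between `prob` and `logprob`, here is a minimal sketch using only the standard library. The helper `log2_prob` and the probability table are hypothetical stand-ins, not NLTK code; in particular, returning negative infinity for a zero probability is an assumption of this sketch.

```python
import math

def log2_prob(p):
    """Base-2 log of a probability; -inf for p == 0 (sketch assumption)."""
    return math.log2(p) if p > 0 else float("-inf")

# Hypothetical trigram probabilities, e.g. as returned by prob().
probs = {("a", "b", "c"): 0.5, ("b", "c", "d"): 0.25}

# Summing log probabilities avoids underflow from multiplying many small values.
sequence_logprob = sum(log2_prob(p) for p in probs.values())
print(sequence_logprob)  # -3.0
```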

def __init__(self, freqdist, bins=None, discount=0.75):

overrides nltk.probability.ProbDistI.__init__

Parameters
freqdist:FreqDist	The trigram frequency distribution upon which to base the estimation
bins:int or float	Included for compatibility with nltk.tag.hmm
discount:float (preferred, but can be set to int)	The discount applied when retrieving counts of trigrams
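
The `freqdist` argument maps trigram tuples to their counts. The sketch below shows that shape with `collections.Counter` standing in for `FreqDist` (which is itself a `Counter` subclass); `count_trigrams` is a hypothetical helper, not part of NLTK.

```python
from collections import Counter

def count_trigrams(tokens):
    """Map each run of three consecutive tokens to its occurrence count."""
    return Counter(zip(tokens, tokens[1:], tokens[2:]))

counts = count_trigrams("a b c a b d".split())

# Every key is a (w0, w1, w2) tuple; every value is a count.
for trigram, count in counts.items():
    print(trigram, count)
```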

def __repr__(self):

Return a string representation of this ProbDist

Returns
str	Undocumented

def discount(self):

overrides nltk.probability.ProbDistI.discount

Return the value by which counts are discounted (0.75 by default).

Returns
float	Undocumented

def max(self):

overrides nltk.probability.ProbDistI.max

Return the sample with the greatest probability. If two or more samples have the same probability, return one of them; which sample is returned is undefined.

Returns
any	Undocumented
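
Because the tie-breaking behaviour is undefined, code that needs every top-ranked sample should not rely on `max` alone. One possible approach is sketched below with a plain dictionary of hypothetical probabilities; the helper `all_maxima` is not part of NLTK.

```python
def all_maxima(prob_by_sample):
    """Return the set of samples that share the greatest probability."""
    best = max(prob_by_sample.values())
    return {s for s, p in prob_by_sample.items() if p == best}

# Hypothetical probabilities with a two-way tie at the top.
probs = {("a", "b", "c"): 0.5, ("a", "b", "d"): 0.5, ("b", "c", "a"): 0.25}
top = all_maxima(probs)
print(top)
```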

def prob(self, trigram):

overrides nltk.probability.ProbDistI.prob

Return the probability for a given sample. Probabilities are always real numbers in the range [0, 1].

Parameters
trigram:any	The trigram sample whose probability should be returned.
Returns
float	Undocumented

def samples(self):

overrides nltk.probability.ProbDistI.samples

Return a list of all samples that have nonzero probabilities. Use `prob` to find the probability of each sample.

Returns
list	Undocumented

def set_discount(self, discount):

Set the value by which counts are discounted to the value of discount.

Parameters
discount:float (preferred, but int possible)	The new value by which to discount counts.
Returns
None	Undocumented
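
To see what the discount does, consider the absolute-discounting form commonly used for seen n-grams, in which a fixed amount D is subtracted from each count before normalising by the context count. The sketch below assumes that form and uses made-up counts; `discounted_estimate` is a hypothetical helper, not a method of this class.

```python
def discounted_estimate(trigram_count, context_count, discount=0.75):
    """(count - D) / context count, floored at zero (assumed form)."""
    return max(trigram_count - discount, 0.0) / context_count

# Made-up counts: the trigram occurred 3 times, its bigram context 4 times.
for d in (0.75, 0.5, 0.25):
    print(d, discounted_estimate(3, 4, discount=d))
```

The mass removed from seen trigrams is what a back-off scheme can redistribute to trigrams that were not seen in training.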

_bigrams

Undocumented

_bins

Undocumented

_cache: dict

Undocumented

_D

Undocumented

_trigrams

Undocumented

_trigrams_contain

Undocumented

_wordtypes_after

Undocumented

_wordtypes_before

Undocumented