class documentation

The Witten-Bell estimate of a probability distribution. This distribution allocates uniform probability mass to as-yet-unseen events by using the number of distinct event types that have been observed. The probability mass reserved for unseen events is equal to T / (N + T), where T is the number of observed event types and N is the total number of observed events. This equates to the maximum likelihood estimate of a new event type occurring. The remaining probability mass is discounted such that all probability estimates sum to one, yielding:

  • p = T / (Z (N + T)), if count = 0
  • p = c / (N + T), otherwise, where c is the count of the sample

Method __init__ Creates a distribution of Witten-Bell probability estimates. This distribution allocates uniform probability mass to as-yet-unseen events by using the number of distinct event types that have been observed. The probability mass reserved for unseen events is equal to ...
Method __repr__ Return a string representation of this ProbDist.
Method discount Return the ratio by which counts are discounted on average: c*/c
Method freqdist Undocumented
Method max Return the sample with the greatest probability. If two or more samples have the same probability, return one of them; which sample is returned is undefined.
Method prob Return the probability for a given sample. Probabilities are always real numbers in the range [0, 1].
Method samples Return a list of all samples that have nonzero probabilities. Use prob to find the probability of each sample.
Instance Variable _freqdist Undocumented
Instance Variable _N Undocumented
Instance Variable _P0 Undocumented
Instance Variable _T Undocumented
Instance Variable _Z Undocumented

Inherited from ProbDistI:

Method generate Return a randomly selected sample from this probability distribution. The probability of returning each sample samp is equal to self.prob(samp).
Method logprob Return the base 2 logarithm of the probability for a given sample.
Constant SUM_TO_ONE True if the probabilities of the samples in this probability distribution will always sum to one.
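
To illustrate what generate describes, here is a minimal standalone sketch that draws a sample with probability equal to its Witten-Bell estimate. It does not use this class: the vocabulary, the counts, and the prob helper are hypothetical, and Z is assumed to be the number of unseen event types.

```python
import random
from collections import Counter

# Hypothetical toy data: a closed vocabulary of four event types, two unseen.
vocabulary = ["a", "b", "c", "d"]
counts = Counter({"a": 3, "b": 1})

N = sum(counts.values())   # total number of observed events (4)
T = len(counts)            # number of observed event types (2)
Z = len(vocabulary) - T    # assumed: number of unseen event types (2)


def prob(sample):
    """Witten-Bell estimate for one sample, following the two cases above."""
    c = counts[sample]
    return c / (N + T) if c > 0 else T / (Z * (N + T))


# One weight per vocabulary item; together they cover the whole distribution.
weights = [prob(s) for s in vocabulary]

# Seeded generator so the draw is reproducible.
rng = random.Random(0)
drawn = rng.choices(vocabulary, weights=weights, k=1)[0]
print(drawn)
```

Sampling over the full vocabulary, rather than only the observed samples, keeps the weights summing to one, so each item is drawn with exactly its estimated probability.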
def __init__(self, freqdist, bins=None): (source)

Creates a distribution of Witten-Bell probability estimates. This distribution allocates uniform probability mass to as-yet-unseen events by using the number of distinct event types that have been observed. The probability mass reserved for unseen events is equal to T / (N + T), where T is the number of observed event types and N is the total number of observed events. This equates to the maximum likelihood estimate of a new event type occurring. The remaining probability mass is discounted such that all probability estimates sum to one, yielding:

  • p = T / (Z (N + T)), if count = 0
  • p = c / (N + T), otherwise, where c is the count of the sample

The values T and N are taken from the freqdist parameter (its B() and N() values, respectively). The normalizing factor Z is calculated using these values along with the bins parameter.
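
As a check on the two cases above, the probabilities can be summed over all event types. This sketch assumes that Z is the number of event types with a zero count (bins minus T) and that Z > 0; neither is stated explicitly here.

```latex
\begin{align*}
\sum_{\text{seen}} \frac{c}{N + T} + \sum_{\text{unseen}} \frac{T}{Z\,(N + T)}
  &= \frac{N}{N + T} + Z \cdot \frac{T}{Z\,(N + T)} \\
  &= \frac{N}{N + T} + \frac{T}{N + T} = 1
\end{align*}
```

The first term uses the fact that the counts of all seen events add up to N.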

Parameters
freqdist: FreqDist - The frequency counts upon which to base the estimation.
bins: int - The number of possible event types. This must be at least as large as the number of bins in the freqdist. If None, then it's assumed to be equal to that of the freqdist.
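
The constructor's arithmetic can be sketched in plain Python. This is a standalone illustration, not this class's implementation: witten_bell is a hypothetical helper, a Counter stands in for the FreqDist, Z is assumed to be bins minus T, and rejecting empty input is a choice of the sketch.

```python
from collections import Counter


def witten_bell(counts, bins=None):
    """Return a prob(sample) function built from a Counter of observations.

    Hypothetical helper for illustration only.
    """
    T = sum(1 for c in counts.values() if c > 0)  # observed event types
    N = sum(counts.values())                      # total observed events
    if N == 0:
        raise ValueError("at least one observation is required")
    if bins is None:
        bins = T                                  # default: no unseen types
    if bins < T:
        raise ValueError("bins must be at least the number of observed types")
    Z = bins - T                                  # assumed: unseen event types
    # Probability of each individual unseen event; zero if nothing is unseen.
    p0 = T / (Z * (N + T)) if Z > 0 else 0.0

    def prob(sample):
        c = counts.get(sample, 0)
        return c / (N + T) if c > 0 else p0

    return prob


counts = Counter("abracadabra")      # a:5, b:2, r:2, c:1, d:1 -> N=11, T=5
prob = witten_bell(counts, bins=26)  # 26 possible letters -> Z=21
print(prob("a"))                     # 0.3125
print(prob("z"))                     # an unseen letter: 5 / (21 * 16)
```

With bins left as None the sketch sets Z to 0, so no mass reaches unseen events and the seen probabilities add up to N / (N + T) rather than one.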
def __repr__(self): (source)

Return a string representation of this ProbDist.

Returns
str - Undocumented
def discount(self): (source)

Return the ratio by which counts are discounted on average: c*/c

Returns
float - Undocumented
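
One way to read the ratio c*/c for this estimator is sketched below. It assumes the adjusted count c* is the estimated probability scaled back to N observations, so c* = N c / (N + T); this is an interpretation, not necessarily what the method returns. The data and the adjusted_count helper are hypothetical.

```python
from collections import Counter

counts = Counter("abracadabra")   # hypothetical observations
N = sum(counts.values())          # total observed events (11)
T = len(counts)                   # observed event types (5)


def adjusted_count(c):
    """Assumed definition: c* = N * p, with p = c / (N + T)."""
    return N * c / (N + T)


# Under this assumption the ratio is the same for every seen sample.
ratios = {sample: adjusted_count(c) / c for sample, c in counts.items()}
print(ratios)
print(N / (N + T))
```

Under this reading every seen count is scaled by the same factor N / (N + T), which is the share of probability mass left for seen events.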
def freqdist(self): (source)

Undocumented

def max(self): (source)

Return the sample with the greatest probability. If two or more samples have the same probability, return one of them; which sample is returned is undefined.

Returns
any - Undocumented
def prob(self, sample): (source)

Return the probability for a given sample. Probabilities are always real numbers in the range [0, 1].

Parameters
sample: any - The sample whose probability should be returned.
Returns
float - Undocumented
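
A small worked example of the probability lookup, together with the base 2 logarithm that the inherited logprob describes. This is a standalone sketch with hypothetical names (wb_prob, wb_logprob, BINS) and toy counts; Z is assumed to be BINS minus T.

```python
import math
from collections import Counter

counts = Counter({"the": 3, "cat": 1})  # hypothetical observations
BINS = 4                                # assumed number of possible event types

N = sum(counts.values())  # 4
T = len(counts)           # 2
Z = BINS - T              # 2 unseen event types (assumption)


def wb_prob(sample):
    """Witten-Bell estimate for one sample."""
    c = counts[sample]
    if c > 0:
        return c / (N + T)
    return T / (Z * (N + T)) if Z > 0 else 0.0


def wb_logprob(sample):
    """Base 2 logarithm of the estimate; -inf for zero probability."""
    p = wb_prob(sample)
    return math.log2(p) if p > 0 else float("-inf")


print(wb_prob("the"))     # 0.5
print(wb_logprob("the"))  # -1.0
print(wb_prob("dog"))     # unseen: 2 / (2 * 6)
```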
def samples(self): (source)

Return a list of all samples that have nonzero probabilities. Use prob to find the probability of each sample.

Returns
list - Undocumented
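
The sketch below shows how samples and max could be read off raw counts. It lists only observed samples, on the assumption that unseen events cannot be enumerated from the counts alone; whether the methods here behave the same way is not stated. The Counter data is hypothetical.

```python
from collections import Counter

counts = Counter("abracadabra")  # hypothetical observations

# Observed samples, sorted only to make the output stable.
observed = sorted(s for s, c in counts.items() if c > 0)

# Among seen samples the estimate c / (N + T) grows with c,
# so the most frequent sample is also the most probable one.
most_likely = max(counts, key=counts.get)

print(observed)     # ['a', 'b', 'c', 'd', 'r']
print(most_likely)  # a
```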
_freqdist = (source)

Undocumented

_N = (source)

Undocumented

_P0 = (source)

Undocumented

_T = (source)

Undocumented

_Z = (source)

Undocumented