module documentation

Classes for representing and processing probabilistic information.

The FreqDist class is used to encode "frequency distributions", which count the number of times that each outcome of an experiment occurs.

The ProbDistI class defines a standard interface for "probability distributions", which encode the probability of each outcome for an experiment. There are two types of probability distribution:

  • "derived probability distributions" are created from frequency distributions. They attempt to model the probability distribution that generated the frequency distribution.
  • "analytic probability distributions" are created directly from parameters (such as variance).
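As a minimal sketch of the distinction between the two types, the snippet below uses only the Python standard library rather than this module's classes. It derives a maximum likelihood distribution from observed counts, then builds a uniform distribution directly from a fixed sample set. The helper names mle_probs and uniform_probs are hypothetical and exist only for illustration.

```python
from collections import Counter


def mle_probs(counts):
    """Derived distribution: relative frequency of each observed sample."""
    total = sum(counts.values())
    return {sample: c / total for sample, c in counts.items()}


def uniform_probs(samples):
    """Analytic distribution: equal probability for each sample in a fixed set."""
    samples = set(samples)
    return {sample: 1 / len(samples) for sample in samples}


# A frequency distribution over the letters of a word.
counts = Counter("abracadabra")

# Derived: estimated from the counts above.
derived = mle_probs(counts)

# Analytic: specified directly, with no counts involved.
analytic = uniform_probs("abcdr")
```

The derived distribution changes whenever the counts change, while the analytic one depends only on the parameters it was given.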

The ConditionalFreqDist class and ConditionalProbDistI interface are used to encode conditional distributions. Conditional probability distributions can be derived or analytic; but currently the only implementation of the ConditionalProbDistI interface is ConditionalProbDist, a derived distribution.
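The idea of a conditional distribution can be sketched as one frequency distribution per condition. The snippet below is an illustration in plain Python, not the implementation of ConditionalFreqDist or ConditionalProbDist; the names conditional_counts and conditional_mle are hypothetical, and the maximum likelihood estimate is just one possible choice of estimator.

```python
from collections import Counter, defaultdict


def conditional_counts(pairs):
    """Group (condition, sample) pairs into one Counter per condition."""
    cfd = defaultdict(Counter)
    for condition, sample in pairs:
        cfd[condition][sample] += 1
    return cfd


def conditional_mle(cfd):
    """Derive one relative-frequency distribution per condition."""
    cpd = {}
    for condition, counts in cfd.items():
        total = sum(counts.values())
        cpd[condition] = {s: c / total for s, c in counts.items()}
    return cpd


# Each pair is (previous word, current word): the previous word is the condition.
pairs = [("the", "cat"), ("the", "dog"), ("the", "cat"), ("a", "dog")]
cfd = conditional_counts(pairs)
cpd = conditional_mle(cfd)
```

Here cfd["the"]["cat"] holds the number of times "cat" occurred under the condition "the", and cpd["the"]["cat"] holds the corresponding probability estimate.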

Class ConditionalFreqDist A collection of frequency distributions for a single experiment run under different conditions. Conditional frequency distributions are used to record the number of times each sample occurred, given the condition under which the experiment was run...
Class ConditionalProbDist A conditional probability distribution modeling the experiments that were used to generate a conditional frequency distribution. A ConditionalProbDist is constructed from a ConditionalFreqDist and a ProbDist...
Class ConditionalProbDistI A collection of probability distributions for a single experiment run under different conditions. Conditional probability distributions are used to estimate the likelihood of each sample, given the condition under which the experiment was run...
Class CrossValidationProbDist The cross-validation estimate for the probability distribution of the experiment used to generate a set of frequency distributions. The "cross-validation estimate" for the probability of a sample is found by averaging the held-out estimates for the sample in each pair of frequency distributions.
Class DictionaryConditionalProbDist An alternative ConditionalProbDist that simply wraps a dictionary of ProbDists rather than creating these from FreqDists.
Class DictionaryProbDist A probability distribution whose probabilities are directly specified by a given dictionary. The given dictionary maps samples to probabilities.
Class ELEProbDist The expected likelihood estimate for the probability distribution of the experiment used to generate a frequency distribution. The "expected likelihood estimate" approximates the probability of a sample with count ...
Class FreqDist A frequency distribution for the outcomes of an experiment. A frequency distribution records the number of times each outcome of an experiment has occurred. For example, a frequency distribution could be used to record the frequency of each word type in a document...
Class HeldoutProbDist The heldout estimate for the probability distribution of the experiment used to generate two frequency distributions. These two frequency distributions are called the "heldout frequency distribution" and the "base frequency distribution...
Class ImmutableProbabilisticMixIn Undocumented
Class KneserNeyProbDist Kneser-Ney estimate of a probability distribution. This is a version of back-off that counts how likely an n-gram is, provided that the (n-1)-gram has been seen in training. Extends the ProbDistI interface, requires a trigram FreqDist instance to train on...
Class LaplaceProbDist The Laplace estimate for the probability distribution of the experiment used to generate a frequency distribution. The "Laplace estimate" approximates the probability of a sample with count c from an experiment with ...
Class LidstoneProbDist The Lidstone estimate for the probability distribution of the experiment used to generate a frequency distribution. The "Lidstone estimate" is parameterized by a real number gamma, which typically ranges from 0 to 1...
Class MLEProbDist The maximum likelihood estimate for the probability distribution of the experiment used to generate a frequency distribution. The "maximum likelihood estimate" approximates the probability of each sample as the frequency of that sample in the frequency distribution.
Class MutableProbDist A mutable probdist where the probabilities may be easily modified. This simply copies an existing probdist, storing the probability values in a mutable dictionary and providing an update method.
Class ProbabilisticMixIn A mix-in class to associate probabilities with other classes (trees, rules, etc.). To use the ProbabilisticMixIn class, define a new class that derives from an existing class and from ProbabilisticMixIn...
Class ProbDistI A probability distribution for the outcomes of an experiment. A probability distribution specifies how likely it is that an experiment will have any given outcome. For example, a probability distribution could be used to predict the probability that a token in a document will have a given type...
Class RandomProbDist Generates a random probability distribution whereby each sample will be between 0 and 1 with equal probability (a uniform random distribution, also called a continuous uniform distribution).
Class SimpleGoodTuringProbDist SimpleGoodTuringProbDist fits a straight line, by linear regression in log space, to the relationship between frequency and frequency of frequency. Details of the Simple Good-Turing algorithm can be found in:
Class UniformProbDist A probability distribution that assigns equal probability to each sample in a given set; and a zero probability to all other samples.
Class WittenBellProbDist The Witten-Bell estimate of a probability distribution. This distribution allocates uniform probability mass to as yet unseen events by using the number of events that have only been seen once. The probability mass reserved for unseen events is equal to ...
Function add_logs Given two numbers logx = log(x) and logy = log(y), return log(x+y). Conceptually, this is the same as returning log(2**(logx)+2**(logy)), but the actual implementation avoids overflow errors that could result from direct computation.
Function demo A demonstration of frequency distributions and probability distributions. This demonstration creates three frequency distributions by sampling a random process with numsamples samples...
Function entropy Undocumented
Function gt_demo Undocumented
Function log_likelihood Undocumented
Function sum_logs Undocumented
Function _create_rand_fdist Create a new frequency distribution, with random samples. The samples are numbers from 1 to numsamples, and are generated by summing two numbers, each of which has a uniform distribution.
Function _create_sum_pdist Return the true probability distribution for the experiment _create_rand_fdist(numsamples, x).
Function _get_kwarg Undocumented
Constant _ADD_LOGS_MAX_DIFF Undocumented
Constant _NINF Undocumented
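Several of the estimators listed above differ only in how much they smooth the raw counts. In its standard form, the Lidstone estimate for a sample with count c, from an experiment with N outcomes and B bins, is (c + gamma) / (N + B * gamma). Setting gamma to 0 gives the maximum likelihood estimate, 0.5 gives the expected likelihood estimate, and 1 gives the Laplace estimate. The sketch below shows that formula in plain Python. It is not the code of these classes: lidstone_prob is a hypothetical helper, and the number of bins is passed explicitly as an assumption of the sketch.

```python
from collections import Counter


def lidstone_prob(counts, sample, gamma, bins):
    """Additive-smoothing estimate (c + gamma) / (N + B * gamma)."""
    n = sum(counts.values())            # N: total number of outcomes
    c = counts.get(sample, 0)           # c: count of this sample (0 if unseen)
    return (c + gamma) / (n + bins * gamma)


counts = Counter({"a": 3, "b": 1})
bins = 4  # assumed number of possible samples, including two never observed

mle_seen = lidstone_prob(counts, "a", 0, bins)        # gamma = 0: maximum likelihood
laplace_seen = lidstone_prob(counts, "a", 1, bins)    # gamma = 1: Laplace
laplace_unseen = lidstone_prob(counts, "c", 1, bins)  # unseen sample gets nonzero mass
ele_unseen = lidstone_prob(counts, "c", 0.5, bins)    # gamma = 0.5: expected likelihood
```

With gamma greater than 0, unseen samples receive some probability mass, and the mass assigned to seen samples shrinks accordingly.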
def add_logs(logx, logy): (source)

Given two numbers logx = log(x) and logy = log(y), return log(x+y). Conceptually, this is the same as returning log(2**(logx)+2**(logy)), but the actual implementation avoids overflow errors that could result from direct computation.
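One common way to avoid the overflow is to factor out the larger of the two logarithms before exponentiating. The sketch below illustrates that technique for base-2 logarithms; it is not necessarily how add_logs is implemented, and the name add_logs_sketch is hypothetical.

```python
import math


def add_logs_sketch(logx, logy):
    """Return log2(x + y) given logx = log2(x) and logy = log2(y)."""
    hi, lo = max(logx, logy), min(logx, logy)
    if hi == float("-inf"):
        # Both inputs represent zero, so the sum is zero as well.
        return hi
    # log2(x + y) = hi + log2(1 + 2**(lo - hi)); the exponent is never
    # positive, so the power is at most 1 and cannot overflow.
    return hi + math.log2(1.0 + 2.0 ** (lo - hi))


# 3 + 5 = 8, and log2(8) = 3.
small = add_logs_sketch(math.log2(3), math.log2(5))

# Direct evaluation of 2.0 ** 2000.0 would overflow a float; this does not.
large = add_logs_sketch(2000.0, 2000.0)
```

Because only the difference between the two logarithms is exponentiated, the result stays finite even when x and y themselves are too large to represent as floats.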

def demo(numsamples=6, numoutcomes=500): (source)

A demonstration of frequency distributions and probability distributions. This demonstration creates three frequency distributions by sampling a random process with numsamples samples. Each frequency distribution is built from numoutcomes outcomes. These three frequency distributions are then used to build six probability distributions. Finally, the probability estimates of these distributions are compared to the actual probability of each sample.

Parameters
numsamples: int - The number of samples to use in each demo frequency distribution.
numoutcomes: int - The total number of outcomes for each demo frequency distribution. These outcomes are divided into numsamples bins.
Returns
None - Undocumented
def entropy(pdist): (source)

Undocumented

def gt_demo(): (source)

Undocumented

def log_likelihood(test_pdist, actual_pdist): (source)

Undocumented

def sum_logs(logs): (source)

Undocumented

def _create_rand_fdist(numsamples, numoutcomes): (source)

Create a new frequency distribution, with random samples. The samples are numbers from 1 to numsamples, and are generated by summing two numbers, each of which has a uniform distribution.

def _create_sum_pdist(numsamples): (source)

Return the true probability distribution for the experiment _create_rand_fdist(numsamples, x).
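The relationship between these two helpers can be sketched as follows: one function samples the sum of two uniform draws, and the other enumerates every pair of draws to obtain the exact distribution of that sum. This is an illustration only. The function names are hypothetical, and the ranges of the two draws are an assumption, chosen so that the sum always falls between 1 and numsamples.

```python
import random
from collections import Counter
from fractions import Fraction


def draw_ranges(numsamples):
    """Ranges of the two uniform draws (assumed; sums cover 1..numsamples)."""
    first = range(1, (1 + numsamples) // 2 + 1)
    second = range(0, numsamples // 2 + 1)
    return first, second


def rand_counts(numsamples, numoutcomes, rng=random):
    """Count numoutcomes random outcomes, each the sum of two uniform draws."""
    first, second = draw_ranges(numsamples)
    counts = Counter()
    for _ in range(numoutcomes):
        counts[rng.choice(first) + rng.choice(second)] += 1
    return counts


def true_probs(numsamples):
    """Exact distribution of the sum, found by enumerating all pairs of draws."""
    first, second = draw_ranges(numsamples)
    weight = Fraction(1, len(first) * len(second))  # each pair is equally likely
    probs = Counter()
    for a in first:
        for b in second:
            probs[a + b] += weight
    return dict(probs)


counts = rand_counts(6, 500, random.Random(0))
probs = true_probs(6)
```

Comparing counts[s] / 500 with probs[s] for each sample s shows how closely the observed frequencies track the true distribution, which is the kind of comparison the demo function reports.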

def _get_kwarg(kwargs, key, default): (source)

Undocumented

_ADD_LOGS_MAX_DIFF = (source)

Undocumented

Value
math.log(1e-30, 2)

_NINF = (source)

Undocumented

Value
float('-1e300')