nltk.probability

module documentation


Classes for representing and processing probabilistic information.

The FreqDist class is used to encode "frequency distributions", which count the number of times that each outcome of an experiment occurs.
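As a rough illustration of what a frequency distribution records, the sketch below counts outcomes with the standard library's `collections.Counter` as a stand-in for `FreqDist`. The token list and variable names are made up for the example.

```python
from collections import Counter

# Hypothetical experiment: observe the word type of each token in a tiny "document".
tokens = ["the", "cat", "sat", "on", "the", "mat"]

# A frequency distribution maps each outcome to the number of times it occurred.
counts = Counter(tokens)

total = sum(counts.values())       # total number of recorded outcomes
the_count = counts["the"]          # how often the outcome "the" occurred
the_freq = the_count / total       # relative frequency of "the"
unseen = counts["dog"]             # outcomes never observed count as 0

print(the_count, total, round(the_freq, 3), unseen)
```

With NLTK installed, `FreqDist(tokens)` should play the same role as `counts` here.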

The ProbDistI class defines a standard interface for "probability distributions", which encode the probability of each outcome for an experiment. There are two types of probability distribution:

"derived probability distributions" are created from frequency distributions. They attempt to model the probability distribution that generated the frequency distribution.

"analytic probability distributions" are created directly from parameters (such as variance).

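The difference between the two types can be sketched in plain Python. The helpers `mle_from_counts` and `uniform_over` are hypothetical names used only for this illustration: the first derives probabilities from observed counts, the second builds a distribution from a parameter (the sample set) without any observations.

```python
from collections import Counter

def mle_from_counts(counts):
    """Derived distribution: estimate each probability as count / total."""
    total = sum(counts.values())
    return {sample: c / total for sample, c in counts.items()}

def uniform_over(samples):
    """Analytic distribution: built from a parameter (the sample set) alone."""
    samples = list(samples)
    return {sample: 1 / len(samples) for sample in samples}

observed = Counter("aababc")           # counts: a=3, b=2, c=1
derived = mle_from_counts(observed)    # depends on what was observed
analytic = uniform_over("abc")         # depends only on the parameter

print(derived)
print(analytic)
```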
The ConditionalFreqDist class and ConditionalProbDistI interface are used to encode conditional distributions. Conditional probability distributions can be derived or analytic, but currently the only implementation of the ConditionalProbDistI interface is ConditionalProbDist, a derived distribution.
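A minimal sketch of the conditional case, assuming a made-up list of (condition, sample) pairs and plain dictionaries in place of the library classes: one counter is kept per condition, and a derived conditional probability distribution is obtained by normalizing each counter separately.

```python
from collections import Counter, defaultdict

# Hypothetical (condition, sample) pairs, e.g. (previous word, next word).
pairs = [("the", "cat"), ("the", "dog"), ("the", "cat"), ("a", "dog")]

# Conditional frequency distribution: one counter per condition.
cond_counts = defaultdict(Counter)
for condition, sample in pairs:
    cond_counts[condition][sample] += 1

# Derived conditional probability distribution: normalize each condition's counts.
cond_probs = {
    condition: {s: c / sum(counts.values()) for s, c in counts.items()}
    for condition, counts in cond_counts.items()
}

print(cond_counts["the"]["cat"], cond_probs["the"]["cat"])
```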

Class	`ConditionalFreqDist`	A collection of frequency distributions for a single experiment run under different conditions. Conditional frequency distributions are used to record the number of times each sample occurred, given the condition under which the experiment was run...
Class	`ConditionalProbDist`	A conditional probability distribution modeling the experiments that were used to generate a conditional frequency distribution. A ConditionalProbDist is constructed from a `ConditionalFreqDist` and a `ProbDist`...
Class	`ConditionalProbDistI`	A collection of probability distributions for a single experiment run under different conditions. Conditional probability distributions are used to estimate the likelihood of each sample, given the condition under which the experiment was run...
Class	`CrossValidationProbDist`	The cross-validation estimate for the probability distribution of the experiment used to generate a set of frequency distributions. The "cross-validation estimate" for the probability of a sample is found by averaging the held-out estimates for the sample in each pair of frequency distributions.
Class	`DictionaryConditionalProbDist`	An alternative ConditionalProbDist that simply wraps a dictionary of ProbDists rather than creating these from FreqDists.
Class	`DictionaryProbDist`	A probability distribution whose probabilities are directly specified by a given dictionary. The given dictionary maps samples to probabilities.
Class	`ELEProbDist`	The expected likelihood estimate for the probability distribution of the experiment used to generate a frequency distribution. The "expected likelihood estimate" approximates the probability of a sample with count ...
Class	`FreqDist`	A frequency distribution for the outcomes of an experiment. A frequency distribution records the number of times each outcome of an experiment has occurred. For example, a frequency distribution could be used to record the frequency of each word type in a document...
Class	`HeldoutProbDist`	The heldout estimate for the probability distribution of the experiment used to generate two frequency distributions. These two frequency distributions are called the "heldout frequency distribution" and the "base frequency distribution...
Class	`ImmutableProbabilisticMixIn`	Undocumented
Class	`KneserNeyProbDist`	Kneser-Ney estimate of a probability distribution. This is a version of back-off that counts how likely an n-gram is, provided the (n-1)-gram has been seen in training. Extends the ProbDistI interface and requires a trigram FreqDist instance to train on...
Class	`LaplaceProbDist`	The Laplace estimate for the probability distribution of the experiment used to generate a frequency distribution. The "Laplace estimate" approximates the probability of a sample with count c from an experiment with ...
Class	`LidstoneProbDist`	The Lidstone estimate for the probability distribution of the experiment used to generate a frequency distribution. The "Lidstone estimate" is parameterized by a real number gamma, which typically ranges from 0 to 1...
Class	`MLEProbDist`	The maximum likelihood estimate for the probability distribution of the experiment used to generate a frequency distribution. The "maximum likelihood estimate" approximates the probability of each sample as the frequency of that sample in the frequency distribution.
Class	`MutableProbDist`	A mutable probdist where the probabilities may be easily modified. This simply copies an existing probdist, storing the probability values in a mutable dictionary and providing an update method.
Class	`ProbabilisticMixIn`	A mix-in class to associate probabilities with other classes (trees, rules, etc.). To use the `ProbabilisticMixIn` class, define a new class that derives from an existing class and from ProbabilisticMixIn...
Class	`ProbDistI`	A probability distribution for the outcomes of an experiment. A probability distribution specifies how likely it is that an experiment will have any given outcome. For example, a probability distribution could be used to predict the probability that a token in a document will have a given type...
Class	`RandomProbDist`	Generates a random probability distribution whereby each sample will be between 0 and 1 with equal probability (a uniform random distribution, also called a continuous uniform distribution).
Class	`SimpleGoodTuringProbDist`	SimpleGoodTuring ProbDist approximates the relationship between frequency and frequency of frequency as a straight line in log space, fitted by linear regression. Details of the Simple Good-Turing algorithm can be found in:
Class	`UniformProbDist`	A probability distribution that assigns equal probability to each sample in a given set; and a zero probability to all other samples.
Class	`WittenBellProbDist`	The Witten-Bell estimate of a probability distribution. This distribution allocates uniform probability mass to as yet unseen events by using the number of events that have only been seen once. The probability mass reserved for unseen events is equal to ...
Function	`add_logs`	Given two numbers `logx` = log(x) and `logy` = log(y), return log(x+y). Conceptually, this is the same as returning `log(2**(logx)+2**(logy))`, but the actual implementation avoids overflow errors that could result from direct computation.
Function	`demo`	A demonstration of frequency distributions and probability distributions. This demonstration creates three frequency distributions and uses them to sample a random process with `numsamples` samples...
Function	`entropy`	Undocumented
Function	`gt_demo`	Undocumented
Function	`log_likelihood`	Undocumented
Function	`sum_logs`	Undocumented
Function	`_create_rand_fdist`	Create a new frequency distribution, with random samples. The samples are numbers from 1 to `numsamples`, and are generated by summing two numbers, each of which has a uniform distribution.
Function	`_create_sum_pdist`	Return the true probability distribution for the experiment `_create_rand_fdist(numsamples, x)`.
Function	`_get_kwarg`	Undocumented
Constant	`_ADD_LOGS_MAX_DIFF`	Undocumented
Constant	`_NINF`	Undocumented
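
Several of the estimators listed above (MLE, ELE, Laplace, Lidstone) belong to the additive-smoothing family. The sketch below assumes the standard formula (c + gamma) / (N + B * gamma), where c is a sample's count, N the total number of outcomes, and B the number of bins; the truncated summaries above do not spell the formula out, and `lidstone_prob` and the counts are hypothetical. Under that assumption, gamma = 0 gives the maximum likelihood estimate, gamma = 0.5 the expected likelihood estimate, and gamma = 1 the Laplace estimate.

```python
def lidstone_prob(count, total, bins, gamma):
    """Additive smoothing: (c + gamma) / (N + B * gamma)."""
    return (count + gamma) / (total + bins * gamma)

# Hypothetical counts: N = 10 outcomes spread over B = 4 bins, one bin never seen.
counts = {"a": 6, "b": 3, "c": 1, "d": 0}
N, B = 10, 4

mle = {s: lidstone_prob(c, N, B, 0.0) for s, c in counts.items()}
ele = {s: lidstone_prob(c, N, B, 0.5) for s, c in counts.items()}
laplace = {s: lidstone_prob(c, N, B, 1.0) for s, c in counts.items()}

# The unseen sample "d" gets zero mass under MLE but some mass once gamma > 0.
print(mle["d"], ele["d"], laplace["d"])
```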

def add_logs(logx, logy):

Given two numbers logx = log(x) and logy = log(y), return log(x+y). Conceptually, this is the same as returning log(2**(logx)+2**(logy)), but the actual implementation avoids overflow errors that could result from direct computation.
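
One common way to avoid the overflow is to factor out the larger term before exponentiating. The sketch below shows that idea for base-2 logs; `add_logs_sketch` is a hypothetical helper written for this illustration, and the library's actual implementation may differ in its details.

```python
import math

def add_logs_sketch(logx, logy):
    """Return log2(2**logx + 2**logy) without exponentiating the larger term."""
    hi, lo = max(logx, logy), min(logx, logy)
    # 2**hi + 2**lo = 2**hi * (1 + 2**(lo - hi)), and lo - hi <= 0,
    # so the only power computed lies in (0, 1] and cannot overflow.
    return hi + math.log2(1 + 2 ** (lo - hi))

small = add_logs_sketch(3.0, 3.0)         # log2(8 + 8)
large = add_logs_sketch(5000.0, 5000.0)   # 2**5000.0 itself would overflow a float
print(small, large)
```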

def demo(numsamples=6, numoutcomes=500):

A demonstration of frequency distributions and probability distributions. This demonstration creates three frequency distributions and uses them to sample a random process with numsamples samples. Each frequency distribution is sampled numoutcomes times. These three frequency distributions are then used to build six probability distributions. Finally, the probability estimates of these distributions are compared to the actual probability of each sample.

Parameters
numsamples:int	The number of samples to use in each demo frequency distribution.
numoutcomes:int	The total number of outcomes for each demo frequency distribution. These outcomes are divided into `numsamples` bins.
Returns
None	Undocumented

def entropy(pdist):

Undocumented
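
Going only by the function's name and its `pdist` argument, one plausible reading is the Shannon entropy of a probability distribution. The sketch below shows that calculation in bits over a plain dictionary; both the reading and the `entropy_sketch` helper are assumptions, not the library's documented behavior.

```python
import math

def entropy_sketch(probs):
    """Shannon entropy in bits of a {sample: probability} mapping."""
    # Zero-probability samples contribute nothing and are skipped to avoid log(0).
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

fair_coin = {"H": 0.5, "T": 0.5}   # maximally uncertain two-outcome experiment
certain = {"H": 1.0, "T": 0.0}     # outcome known in advance

print(entropy_sketch(fair_coin), entropy_sketch(certain))
```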

def gt_demo():

Undocumented

def log_likelihood(test_pdist, actual_pdist):

Undocumented

def sum_logs(logs):

Undocumented
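
By analogy with `add_logs`, a natural guess is that this function combines a whole sequence of log values into the log of their sum. The sketch below assumes that reading and base-2 logs; `add_logs_sketch` and `sum_logs_sketch` are hypothetical helpers, not the library's code.

```python
import math
from functools import reduce

def add_logs_sketch(logx, logy):
    """Return log2(2**logx + 2**logy), factoring out the larger term."""
    hi, lo = max(logx, logy), min(logx, logy)
    return hi + math.log2(1 + 2 ** (lo - hi))

def sum_logs_sketch(logs):
    """Return log2 of the sum of 2**v over all v in logs (logs must be non-empty)."""
    return reduce(add_logs_sketch, logs)

# log2(2 + 2 + 4) = log2(8)
total_log = sum_logs_sketch([1.0, 1.0, 2.0])
print(total_log)
```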

def _create_rand_fdist(numsamples, numoutcomes):

Create a new frequency distribution, with random samples. The samples are numbers from 1 to numsamples, and are generated by summing two numbers, each of which has a uniform distribution.

def _create_sum_pdist(numsamples):

Return the true probability distribution for the experiment _create_rand_fdist(numsamples, x).
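
The relationship between the two helpers can be sketched as follows: one draws random sums of two uniform integers, and the other enumerates every pair to get the exact distribution of that sum. The particular ranges in `sum_ranges` are an assumption chosen so the sum lands in 1 to `numsamples`, and all function names here are hypothetical.

```python
import random
from collections import Counter

def sum_ranges(numsamples):
    # Assumed split: first term in 1..ceil(n/2), second term in 0..floor(n/2),
    # so their sum always falls in 1..numsamples.
    return range(1, (numsamples + 1) // 2 + 1), range(0, numsamples // 2 + 1)

def rand_counts(numsamples, numoutcomes, rng):
    """Count numoutcomes random sums of two uniformly drawn integers."""
    first, second = sum_ranges(numsamples)
    return Counter(rng.choice(first) + rng.choice(second) for _ in range(numoutcomes))

def true_probs(numsamples):
    """Exact distribution of the sum, found by enumerating every pair."""
    first, second = sum_ranges(numsamples)
    combos = Counter(x + y for x in first for y in second)
    total = sum(combos.values())
    return {s: c / total for s, c in combos.items()}

rng = random.Random(0)           # seeded so the run is repeatable
counts = rand_counts(6, 500, rng)
probs = true_probs(6)
print(sorted(probs.items()))
```

Comparing `counts[s] / 500` with `probs[s]` for each sample gives the kind of estimate-versus-truth comparison that `demo` describes.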

def _get_kwarg(kwargs, key, default):

Undocumented

_ADD_LOGS_MAX_DIFF =

Undocumented

Value

math.log(1e-30, 2)

_NINF =

Undocumented

Value

float('-1e300')