class documentation

class ConditionalFreqDist(defaultdict):

Constructor: ConditionalFreqDist(cond_samples)

A collection of frequency distributions for a single experiment run under different conditions. Conditional frequency distributions are used to record the number of times each sample occurred, given the condition under which the experiment was run. For example, a conditional frequency distribution could be used to record the frequency of each word (type) in a document, given its length. Formally, a conditional frequency distribution can be defined as a function that maps from each condition to the FreqDist for the experiment under that condition.

Conditional frequency distributions are typically constructed by repeatedly running an experiment under a variety of conditions, and incrementing the sample outcome counts for the appropriate conditions. For example, the following code will produce a conditional frequency distribution that encodes how often each word type occurs, given the length of that word type:

>>> from nltk.probability import ConditionalFreqDist
>>> from nltk.tokenize import word_tokenize
>>> sent = "the the the dog dog some other words that we do not care about"
>>> cfdist = ConditionalFreqDist()
>>> for word in word_tokenize(sent):
...     condition = len(word)
...     cfdist[condition][word] += 1

An equivalent way to do this is with the initializer:

>>> cfdist = ConditionalFreqDist((len(word), word) for word in word_tokenize(sent))

The frequency distribution for each condition is accessed using the indexing operator:

>>> cfdist[3]
FreqDist({'the': 3, 'dog': 2, 'not': 1})
>>> cfdist[3].freq('the')
0.5
>>> cfdist[3]['dog']
2

When the indexing operator is used to access the frequency distribution for a condition that has not been accessed before, ConditionalFreqDist creates a new empty FreqDist for that condition.
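
The lazy creation of conditions can be illustrated without NLTK. The following is a minimal sketch that assumes a plain defaultdict of Counter objects as a stand-in for ConditionalFreqDist (which the class signature above shows is itself a defaultdict subclass); it is not the library's implementation. In this stand-in, a membership test leaves the set of conditions unchanged, while indexing adds a new, empty entry:

```python
from collections import Counter, defaultdict

# Stand-in for ConditionalFreqDist: a defaultdict whose values are Counters.
cfd = defaultdict(Counter)
for word in "the the the dog dog not".split():
    cfd[len(word)][word] += 1

before = sorted(cfd)   # conditions recorded so far
has_seven = 7 in cfd   # a membership test does not create a condition
empty = cfd[7]         # indexing an unseen condition creates an empty Counter
after = sorted(cfd)    # the new condition is now listed

print(before, has_seven, dict(empty), after)
```

A practical consequence is that checking for a condition with the in operator avoids adding empty entries that would later show up among the conditions.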

Method __add__ Add counts from two ConditionalFreqDists.
Method __and__ Intersection is the minimum of corresponding counts.
Method __ge__ Undocumented
Method __gt__ Undocumented
Method __init__ Construct a new conditional frequency distribution. If cond_samples is omitted, the distribution is empty: the count for every sample, under every condition, is zero.
Method __le__ Undocumented
Method __lt__ Undocumented
Method __or__ Union is the maximum of the corresponding counts in either of the input distributions.
Method __reduce__ Undocumented
Method __repr__ Return a string representation of this ConditionalFreqDist.
Method __sub__ Subtract counts, but keep only results with positive counts.
Method conditions Return a list of the conditions that have been accessed for this ConditionalFreqDist. Use the indexing operator to access the frequency distribution for a given condition. Note that the frequency distributions for some conditions may contain zero sample outcomes.
Method N Return the total number of sample outcomes that have been recorded by this ConditionalFreqDist.
Method plot Plot the given samples from the conditional frequency distribution. For a cumulative plot, specify cumulative=True. (Requires Matplotlib to be installed.)
Method tabulate Tabulate the given samples from the conditional frequency distribution.

def __add__(self, other):

Add counts from two ConditionalFreqDists.
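
As a rough illustration of what adding counts means here, the sketch below merges two mappings from condition to Counter, summing counts condition by condition. The helper add_cfds and the use of plain Counter objects in place of FreqDist are assumptions made for this example, not NLTK code.

```python
from collections import Counter, defaultdict

def add_cfds(left, right):
    """Hypothetical helper: add counts condition by condition."""
    result = defaultdict(Counter)
    for cfd in (left, right):
        for condition, counts in cfd.items():
            # Counter.update adds the incoming counts to the existing ones.
            result[condition].update(counts)
    return result

a = {3: Counter(the=3, dog=2)}
b = {3: Counter(dog=1, cat=4), 5: Counter(words=1)}
total = add_cfds(a, b)

print(dict(total))
```

Conditions that appear in only one operand (here, 5) are carried over unchanged.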

def __and__(self, other):

Intersection is the minimum of corresponding counts.

def __ge__(self, other):

Undocumented

def __gt__(self, other):

Undocumented

def __init__(self, cond_samples=None):

Construct a new conditional frequency distribution. If cond_samples is omitted, the distribution is empty: the count for every sample, under every condition, is zero.

Parameters
cond_samples: Sequence of (condition, sample) tuples
    The samples to initialize the conditional frequency distribution with
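
A minimal sketch of how a sequence of (condition, sample) tuples can be folded into per-condition counts. The helper build_cfd is hypothetical and uses collections.Counter as a stand-in for FreqDist; str.split replaces word_tokenize so that the example has no external dependencies.

```python
from collections import Counter, defaultdict

def build_cfd(cond_samples=None):
    """Hypothetical helper: group (condition, sample) pairs into Counters."""
    cfd = defaultdict(Counter)
    if cond_samples:
        for condition, sample in cond_samples:
            cfd[condition][sample] += 1
    return cfd

sent = "the the the dog dog some other words that we do not care about"
pairs = [(len(word), word) for word in sent.split()]

cfd = build_cfd(pairs)   # populated from the pairs
empty = build_cfd()      # no pairs given: nothing has been counted

print(cfd[3])
```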

def __le__(self, other):

Undocumented

def __lt__(self, other):

Undocumented

def __or__(self, other):

Union is the maximum of the corresponding counts in either of the input distributions.
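
To make the difference between union and the intersection described under __and__ concrete, here is a small sketch for the counts under a single condition. It assumes plain collections.Counter objects as a stand-in for FreqDist, so it illustrates the minimum/maximum semantics rather than the NLTK method itself.

```python
from collections import Counter

# Counts for one condition in each of two hypothetical distributions.
a = Counter(the=3, dog=2)
b = Counter(the=1, dog=5, cat=4)

intersection = a & b   # minimum of corresponding counts
union = a | b          # maximum of corresponding counts

print(intersection)
print(union)
```

A sample missing from one operand counts as zero there, so it is dropped from the intersection but kept in the union.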

def __reduce__(self):

Undocumented

def __repr__(self):

Return a string representation of this ConditionalFreqDist.

Returns
str
    A string representation of this ConditionalFreqDist

def __sub__(self, other):

Subtract counts, but keep only results with positive counts.
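
The effect of keeping only positive counts can be sketched with plain dictionaries of Counter objects, used here as an assumed stand-in for ConditionalFreqDist. Dropping a condition once it has no samples left is a choice made in this sketch, not something stated in the documentation above.

```python
from collections import Counter

a = {3: Counter(the=3, dog=2), 5: Counter(words=1)}
b = {3: Counter(the=1, dog=5), 5: Counter(words=1)}

difference = {}
for condition, counts in a.items():
    # Counter subtraction discards zero and negative results.
    remaining = counts - b.get(condition, Counter())
    if remaining:  # sketch-only choice: skip conditions left empty
        difference[condition] = remaining

print(difference)
```

Here 'dog' disappears because 2 - 5 is negative, and condition 5 disappears because its only count drops to zero.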

def conditions(self):

Return a list of the conditions that have been accessed for this ConditionalFreqDist. Use the indexing operator to access the frequency distribution for a given condition. Note that the frequency distributions for some conditions may contain zero sample outcomes.

Returns
list
    The conditions that have been accessed

def N(self):

Return the total number of sample outcomes that have been recorded by this ConditionalFreqDist.

Returns
int
    The total number of sample outcomes recorded
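
One way to picture this total is as the sum of the sample counts over every condition. The sketch below assumes a plain dict of Counter objects as a stand-in for a ConditionalFreqDist; it is an illustration of the arithmetic, not the library's code.

```python
from collections import Counter

# Hypothetical data: word counts grouped by word length.
cfd = {
    2: Counter({"we": 1, "do": 1}),
    3: Counter({"the": 3, "dog": 2, "not": 1}),
}

# Add up every count under every condition.
total = sum(sum(counts.values()) for counts in cfd.values())

print(total)
```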

def plot(self, *args, **kwargs):

Plot the given samples from the conditional frequency distribution. For a cumulative plot, specify cumulative=True. (Requires Matplotlib to be installed.)

Parameters
*args
    Undocumented
samples: list
    The samples to plot
title: str
    The title for the graph
conditions: list
    The conditions to plot (default is all)
**kwargs
    Undocumented

def tabulate(self, *args, **kwargs):

Tabulate the given samples from the conditional frequency distribution.

Parameters
*args
    Undocumented
samples: list
    The samples to tabulate
conditions: list
    The conditions to tabulate (default is all)
cumulative: bool
    A flag to specify whether the frequencies are cumulative (default = False)
**kwargs
    Undocumented
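
The general shape of such a table can be sketched as follows, with one row per condition and one column per sample. tabulate_counts is a hypothetical helper written for this example over a plain dict of Counter objects; the column widths and spacing of NLTK's own output may differ.

```python
from collections import Counter

def tabulate_counts(cfd, samples, conditions=None):
    """Hypothetical helper: return a plain-text table of counts."""
    if conditions is None:
        conditions = sorted(cfd)
    # Column width: the longest sample name plus one space of padding.
    width = max((len(str(s)) for s in samples), default=0) + 1
    lines = [" " * 4 + "".join(f"{s:>{width}}" for s in samples)]
    for condition in conditions:
        # A Counter returns 0 for samples it has never seen.
        row = "".join(f"{cfd[condition][s]:>{width}}" for s in samples)
        lines.append(f"{condition!s:>4}" + row)
    return "\n".join(lines)

cfd = {
    2: Counter({"we": 1, "do": 1}),
    3: Counter({"the": 3, "dog": 2, "not": 1}),
}
table = tabulate_counts(cfd, samples=["the", "dog", "we"])

print(table)
```

Restricting the samples and conditions keeps the table readable when a distribution has many of either.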