class FreqDist(Counter):

Constructor: FreqDist(samples)

A frequency distribution for the outcomes of an experiment. A frequency distribution records the number of times each outcome of an experiment has occurred. For example, a frequency distribution could be used to record the frequency of each word type in a document. Formally, a frequency distribution can be defined as a function mapping from each sample to the number of times that sample occurred as an outcome.

Frequency distributions are generally constructed by running a number of experiments, and incrementing the count for a sample every time it is an outcome of an experiment. For example, the following code will produce a frequency distribution that encodes how often each word occurs in a text:

>>> from nltk.tokenize import word_tokenize
>>> from nltk.probability import FreqDist
>>> sent = 'This is an example sentence'
>>> fdist = FreqDist()
>>> for word in word_tokenize(sent):
...    fdist[word.lower()] += 1

An equivalent way to do this is with the initializer:

>>> fdist = FreqDist(word.lower() for word in word_tokenize(sent))
Method __add__ Add counts from two counters.
Method __and__ Intersection is the minimum of corresponding counts.
Method __delitem__ Override Counter.__delitem__() to invalidate the cached N
Method __ge__ Undocumented
Method __init__ Construct a new frequency distribution. If samples is given, then the frequency distribution will be initialized with the count of each object in samples; otherwise, it will be initialized to be empty.
Method __iter__ Return an iterator which yields tokens ordered by frequency.
Method __le__ Returns True if this frequency distribution is a subset of the other and for no key the value exceeds the value of the same key from the other frequency distribution.
Method __or__ Union is the maximum of value in either of the input counters.
Method __repr__ Return a string representation of this FreqDist.
Method __setitem__ Override Counter.__setitem__() to invalidate the cached N
Method __str__ Return a string representation of this FreqDist.
Method __sub__ Subtract count, but keep only results with positive counts.
Method B Return the total number of sample values (or "bins") that have counts greater than zero. For the total number of sample outcomes recorded, use FreqDist.N(). (FreqDist.B() is the same as len(FreqDist).)...
Method copy Create a copy of this frequency distribution.
Method freq Return the frequency of a given sample. The frequency of a sample is defined as the count of that sample divided by the total number of sample outcomes that have been recorded by this FreqDist. The count of a sample is defined as the number of times that sample outcome was recorded by this FreqDist...
Method hapaxes Return a list of all samples that occur once (hapax legomena)
Method max Return the sample with the greatest number of outcomes in this frequency distribution. If two or more samples have the same number of outcomes, return one of them; which sample is returned is undefined...
Method N Return the total number of sample outcomes that have been recorded by this FreqDist. For the number of unique sample values (or bins) with counts greater than zero, use FreqDist.B().
Method Nr Undocumented
Method pformat Return a string representation of this FreqDist.
Method plot Plot samples from the frequency distribution displaying the most frequent sample first. If an integer parameter is supplied, stop after this many samples have been plotted. For a cumulative plot, specify cumulative=True...
Method pprint Print a string representation of this FreqDist to 'stream'
Method r_Nr Return the dictionary mapping r to Nr, the number of samples with frequency r, where Nr > 0.
Method setdefault Override Counter.setdefault() to invalidate the cached N
Method tabulate Tabulate the given samples from the frequency distribution (cumulative), displaying the most frequent sample first. If an integer parameter is supplied, stop after this many samples have been tabulated.
Method update Override Counter.update() to invalidate the cached N
Class Variable __gt__ Undocumented
Class Variable __lt__ Undocumented
Method _cumulative_frequencies Return the cumulative frequencies of the specified samples. If no samples are specified, all counts are returned, starting with the largest.
Instance Variable _N Undocumented
def __add__(self, other):

Add counts from two counters.

>>> FreqDist('abbb') + FreqDist('bcc')
FreqDist({'b': 4, 'c': 2, 'a': 1})
def __and__(self, other):

Intersection is the minimum of corresponding counts.

>>> FreqDist('abbb') & FreqDist('bcc')
FreqDist({'b': 1})
def __delitem__(self, key):

Override Counter.__delitem__() to invalidate the cached N
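
The cache invalidation mentioned here (and in __setitem__, setdefault and update) can be illustrated with a small Counter subclass. This is only a sketch under assumptions: the class name CachedTotalCounter, the attribute _total and the method total_count are hypothetical stand-ins, not NLTK's code.

```python
from collections import Counter


class CachedTotalCounter(Counter):
    """Hypothetical stand-in (not NLTK's class): a Counter that caches its total."""

    def __init__(self, samples=None):
        self._total = None  # the cache starts out empty
        super().__init__(samples)

    def total_count(self):
        # Recompute the sum only when the cache has been invalidated.
        if self._total is None:
            self._total = sum(self.values())
        return self._total

    def __setitem__(self, key, val):
        self._total = None  # any write makes the cached total stale
        super().__setitem__(key, val)

    def __delitem__(self, key):
        self._total = None  # removing a key also changes the total
        super().__delitem__(key)

    def update(self, *args, **kwargs):
        self._total = None  # bulk updates invalidate the cache as well
        super().update(*args, **kwargs)


c = CachedTotalCounter("abbb")
print(c.total_count())  # 4
c["a"] += 5
print(c.total_count())  # 9
del c["b"]
print(c.total_count())  # 6
```

Without the overrides, the second and third calls would keep returning the stale value 4.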

def __ge__(self, other):

Undocumented

def __init__(self, samples=None):

Construct a new frequency distribution. If samples is given, then the frequency distribution will be initialized with the count of each object in samples; otherwise, it will be initialized to be empty.

In particular, FreqDist() returns an empty frequency distribution; and FreqDist(samples) first creates an empty frequency distribution, and then calls update with the list samples.

Parameters
samples (Sequence): The samples to initialize the frequency distribution with.
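
The equivalence between passing samples to the constructor and calling update on an empty distribution can be sketched with collections.Counter, the base class named in the class signature. Counter is used here only so the snippet runs without NLTK; it is assumed that FreqDist counts samples the same way.

```python
from collections import Counter

samples = ["the", "cat", "saw", "the", "dog"]

# Route 1: pass the samples directly to the constructor.
direct = Counter(samples)

# Route 2: start empty, then feed the samples through update().
incremental = Counter()
incremental.update(samples)

print(direct == incremental)  # True
print(direct["the"])          # 2
```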
def __iter__(self):

Return an iterator which yields tokens ordered by frequency.

Returns
iterator: Undocumented
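
A rough picture of frequency-ordered iteration, using collections.Counter as a stand-in. The ordering is assumed to be most frequent first, which matches how plot and tabulate are described; how ties are ordered is not stated in this page.

```python
from collections import Counter

counts = Counter("abbbcc")

# most_common() sorts by descending count; keep only the samples.
ordered = [sample for sample, _ in counts.most_common()]
print(ordered)  # ['b', 'c', 'a']
```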
def __le__(self, other):

Returns True if this frequency distribution is a subset of the other and for no key the value exceeds the value of the same key from the other frequency distribution.

The <= operator forms a partial order: it satisfies the axioms of reflexivity, antisymmetry, and transitivity.

>>> FreqDist('a') <= FreqDist('a')
True
>>> a = FreqDist('abc')
>>> b = FreqDist('aabc')
>>> (a <= b, b <= a)
(True, False)
>>> FreqDist('a') <= FreqDist('abcd')
True
>>> FreqDist('abc') <= FreqDist('xyz')
False
>>> FreqDist('xyz') <= FreqDist('abc')
False
>>> c = FreqDist('a')
>>> d = FreqDist('aa')
>>> e = FreqDist('aaa')
>>> c <= d and d <= e and c <= e
True
def __or__(self, other):

Union is the maximum of value in either of the input counters.

>>> FreqDist('abbb') | FreqDist('bcc')
FreqDist({'b': 3, 'c': 2, 'a': 1})
def __repr__(self):

Return a string representation of this FreqDist.

Returns
string: Undocumented
def __setitem__(self, key, val):

Override Counter.__setitem__() to invalidate the cached N

def __str__(self):

Return a string representation of this FreqDist.

Returns
string: Undocumented
def __sub__(self, other):

Subtract count, but keep only results with positive counts.

>>> FreqDist('abbbc') - FreqDist('bccd')
FreqDist({'b': 2, 'a': 1})
def B(self):

Return the total number of sample values (or "bins") that have counts greater than zero. For the total number of sample outcomes recorded, use FreqDist.N(). (FreqDist.B() is the same as len(FreqDist).)

Returns
int: Undocumented
def copy(self):

Create a copy of this frequency distribution.

Returns
FreqDist: Undocumented
def freq(self, sample):

Return the frequency of a given sample. The frequency of a sample is defined as the count of that sample divided by the total number of sample outcomes that have been recorded by this FreqDist. The count of a sample is defined as the number of times that sample outcome was recorded by this FreqDist. Frequencies are always real numbers in the range [0, 1].

Parameters
sample (any): The sample whose frequency should be returned.
Returns
float: Undocumented
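
The definition above (count of the sample divided by the total number of recorded outcomes) can be worked through with a plain Counter. The helper relative_frequency is hypothetical, and returning 0.0 for an empty distribution is an assumption made to avoid dividing by zero.

```python
from collections import Counter


def relative_frequency(counts, sample):
    """Hypothetical helper: count of `sample` divided by the total count."""
    total = sum(counts.values())
    if total == 0:
        # Assumption: an empty distribution reports 0.0 instead of dividing by zero.
        return 0.0
    return counts[sample] / total


counts = Counter("abbb")
print(relative_frequency(counts, "b"))  # 0.75
print(relative_frequency(counts, "z"))  # 0.0
```

A sample that was never recorded has a count of zero, so its frequency is 0.0, the lower end of the [0, 1] range.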
def hapaxes(self):

Return a list of all samples that occur once (hapax legomena)

Returns
list: Undocumented
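
As an illustration of what counts as a hapax, the following sketch filters a plain Counter for samples with a count of exactly one. It is a stand-in for the idea, not NLTK's implementation, and the result is sorted only to make the printed order predictable.

```python
from collections import Counter

counts = Counter("abbbcd")

# Keep the samples that were recorded exactly once.
hapaxes = [sample for sample, count in counts.items() if count == 1]
print(sorted(hapaxes))  # ['a', 'c', 'd']
```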
def max(self):

Return the sample with the greatest number of outcomes in this frequency distribution. If two or more samples have the same number of outcomes, return one of them; which sample is returned is undefined. If no outcomes have occurred in this frequency distribution, return None.

Returns
any or None: The sample with the maximum number of outcomes in this frequency distribution.
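
The behavior described above, including the None result for an empty distribution, might look like this when written against a plain Counter. The helper most_frequent is hypothetical and follows the description on this page; the installed NLTK version may handle the empty case differently.

```python
from collections import Counter


def most_frequent(counts):
    """Hypothetical helper: the sample with the highest count, or None if empty."""
    if not counts:
        return None
    # most_common(1) returns a one-element list of (sample, count) pairs.
    return counts.most_common(1)[0][0]


print(most_frequent(Counter("abbb")))  # b
print(most_frequent(Counter()))        # None
```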
def N(self):

Return the total number of sample outcomes that have been recorded by this FreqDist. For the number of unique sample values (or bins) with counts greater than zero, use FreqDist.B().

Returns
int: Undocumented
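
The difference between N() and B() is easy to mix up, so here is a small sketch with a plain Counter standing in for FreqDist. The variable names are illustrative, and none of the counts in this example is zero.

```python
from collections import Counter

counts = Counter("abbbcc")

total_outcomes = sum(counts.values())  # what N() is described as returning
distinct_samples = len(counts)         # what B() is described as returning

print(total_outcomes)    # 6
print(distinct_samples)  # 3
```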
def Nr(self, r, bins=None):

Undocumented

def pformat(self, maxlen=10):

Return a string representation of this FreqDist.

Parameters
maxlen (int): The maximum number of items to display
Returns
string: Undocumented
def plot(self, *args, **kwargs):

Plot samples from the frequency distribution displaying the most frequent sample first. If an integer parameter is supplied, stop after this many samples have been plotted. For a cumulative plot, specify cumulative=True. (Requires Matplotlib to be installed.)

Parameters
*args: Undocumented
title (bool): The title for the graph
cumulative: A flag to specify whether the plot is cumulative (default = False)
**kwargs: Undocumented
def pprint(self, maxlen=10, stream=None):

Print a string representation of this FreqDist to 'stream'

Parameters
maxlen (int): The maximum number of items to print
stream: The stream to print to. stdout by default
def r_Nr(self, bins=None):

Return the dictionary mapping r to Nr, the number of samples with frequency r, where Nr > 0.

Parameters
bins (int): The number of possible sample outcomes. bins is used to calculate Nr(0). In particular, Nr(0) is bins-self.B(). If bins is not specified, it defaults to self.B() (so Nr(0) will be 0).
Returns
dict: Undocumented
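
The mapping from r to Nr, and the role of bins in the r = 0 entry, can be sketched with a plain Counter. The helper counts_of_counts is hypothetical and simply follows the parameter description above; it is not NLTK's code.

```python
from collections import Counter


def counts_of_counts(counts, bins=None):
    """Hypothetical helper: map each count r to the number of samples seen r times."""
    observed = len(counts)
    if bins is None:
        bins = observed  # default described above, so the r = 0 entry is 0
    result = Counter(counts.values())  # tally how often each count value occurs
    result[0] = bins - observed        # samples that could occur but never did
    return dict(result)


counts = Counter("abbbcd")
print(counts_of_counts(counts))           # {1: 3, 3: 1, 0: 0}
print(counts_of_counts(counts, bins=10))  # {1: 3, 3: 1, 0: 6}
```

Three samples occur once and one sample occurs three times; with bins=10, six of the ten possible outcomes were never observed.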
def setdefault(self, key, val):

Override Counter.setdefault() to invalidate the cached N

def tabulate(self, *args, **kwargs):

Tabulate the given samples from the frequency distribution (cumulative), displaying the most frequent sample first. If an integer parameter is supplied, stop after this many samples have been tabulated.

Parameters
*args: Undocumented
samples (list): The samples to tabulate (default is all samples)
title (bool): Undocumented
cumulative: A flag to specify whether the freqs are cumulative (default = False)
**kwargs: Undocumented
def update(self, *args, **kwargs):

Override Counter.update() to invalidate the cached N

def _cumulative_frequencies(self, samples):

Return the cumulative frequencies of the specified samples. If no samples are specified, all counts are returned, starting with the largest.

Parameters
samples (any): The samples whose frequencies should be returned.
Returns
list(float): Undocumented
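
One way to picture cumulative frequencies is as a running total of counts over the samples in the order given. The sketch below uses a plain Counter and itertools.accumulate as stand-ins; it produces integers, whereas the return type listed above is float.

```python
from collections import Counter
from itertools import accumulate

counts = Counter("abbbcc")
samples = ["b", "c", "a"]  # most frequent first

# Running total of the counts, in the order the samples are listed.
running_totals = list(accumulate(counts[s] for s in samples))
print(running_totals)  # [3, 5, 6]
```

The last value equals the total number of recorded outcomes when every sample is included.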