A frequency distribution for the outcomes of an experiment. A frequency distribution records the number of times each outcome of an experiment has occurred. For example, a frequency distribution could be used to record the frequency of each word type in a document. Formally, a frequency distribution can be defined as a function mapping from each sample to the number of times that sample occurred as an outcome.
Frequency distributions are generally constructed by running a number of experiments, and incrementing the count for a sample every time it is an outcome of an experiment. For example, the following code will produce a frequency distribution that encodes how often each word occurs in a text:
>>> from nltk.tokenize import word_tokenize
>>> from nltk.probability import FreqDist
>>> sent = 'This is an example sentence'
>>> fdist = FreqDist()
>>> for word in word_tokenize(sent):
...     fdist[word.lower()] += 1
An equivalent way to do this is with the initializer:
>>> fdist = FreqDist(word.lower() for word in word_tokenize(sent))
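Since the method summaries on this page describe FreqDist as overriding Counter methods, the two construction styles can be approximated with the standard library alone. The following is a minimal sketch that assumes `collections.Counter` as a stand-in for FreqDist and `str.split` as a stand-in for `word_tokenize`; it is not NLTK's implementation.

```python
from collections import Counter

sent = "This is an example sentence and this is another"

# Incremental construction: bump the count once per outcome.
counts = Counter()
for word in sent.split():  # str.split stands in for word_tokenize (assumption)
    counts[word.lower()] += 1

# Equivalent construction from an iterable of samples.
counts_from_init = Counter(word.lower() for word in sent.split())

print(counts == counts_from_init)
```

Both paths should yield the same mapping, because each one records exactly one increment per token.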
Method | __add__ |
Add counts from two counters. |
Method | __and__ |
Intersection is the minimum of corresponding counts. |
Method | __delitem__ |
Override Counter.__delitem__() to invalidate the cached N |
Method | __ge__ |
Undocumented |
Method | __init__ |
Construct a new frequency distribution. If samples is given, then the frequency distribution will be initialized with the count of each object in samples; otherwise, it will be initialized to be empty. |
Method | __iter__ |
Return an iterator which yields tokens ordered by frequency. |
Method | __le__ |
Return True if every key in this frequency distribution also appears in the other, and no key's count exceeds the count of the same key in the other frequency distribution. |
Method | __or__ |
Union is the maximum of the corresponding counts in either of the input counters. |
Method | __repr__ |
Return a string representation of this FreqDist. |
Method | __setitem__ |
Override Counter.__setitem__() to invalidate the cached N |
Method | __str__ |
Return a string representation of this FreqDist. |
Method | __sub__ |
Subtract count, but keep only results with positive counts. |
Method | B |
Return the total number of sample values (or "bins") that have counts greater than zero. For the total number of sample outcomes recorded, use FreqDist.N(). (FreqDist.B() is the same as len(FreqDist).)... |
Method | copy |
Create a copy of this frequency distribution. |
Method | freq |
Return the frequency of a given sample. The frequency of a sample is defined as the count of that sample divided by the total number of sample outcomes that have been recorded by this FreqDist. The count of a sample is defined as the number of times that sample outcome was recorded by this FreqDist... |
Method | hapaxes |
Return a list of all samples that occur once (hapax legomena) |
Method | max |
Return the sample with the greatest number of outcomes in this frequency distribution. If two or more samples have the same number of outcomes, return one of them; which sample is returned is undefined... |
Method | N |
Return the total number of sample outcomes that have been recorded by this FreqDist. For the number of unique sample values (or bins) with counts greater than zero, use FreqDist.B(). |
Method |
|
Undocumented |
Method | pformat |
Return a string representation of this FreqDist. |
Method | plot |
Plot samples from the frequency distribution displaying the most frequent sample first. If an integer parameter is supplied, stop after this many samples have been plotted. For a cumulative plot, specify cumulative=True... |
Method | pprint |
Print a string representation of this FreqDist to 'stream' |
Method | r_ |
Return the dictionary mapping r to Nr, the number of samples with frequency r, where Nr > 0. |
Method | setdefault |
Override Counter.setdefault() to invalidate the cached N |
Method | tabulate |
Tabulate the given samples from the frequency distribution (cumulative), displaying the most frequent sample first. If an integer parameter is supplied, stop after this many samples have been tabulated. |
Method | update |
Override Counter.update() to invalidate the cached N |
Class Variable | __gt__ |
Undocumented |
Class Variable | __lt__ |
Undocumented |
Method | _cumulative |
Return the cumulative frequencies of the specified samples. If no samples are specified, all counts are returned, starting with the largest. |
Instance Variable | _N |
Undocumented |
Add counts from two counters.
>>> FreqDist('abbb') + FreqDist('bcc')
FreqDist({'b': 4, 'c': 2, 'a': 1})
Intersection is the minimum of corresponding counts.
>>> FreqDist('abbb') & FreqDist('bcc')
FreqDist({'b': 1})
Construct a new frequency distribution. If samples is given, then the frequency distribution will be initialized with the count of each object in samples; otherwise, it will be initialized to be empty.
In particular, FreqDist() returns an empty frequency distribution; and FreqDist(samples) first creates an empty frequency distribution, and then calls update with the list samples.
Parameters | |
samples:Sequence | The samples to initialize the frequency distribution with. |
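The "create empty, then call update" description can be illustrated as follows. This is a sketch under the assumption that `collections.Counter` behaves like FreqDist for construction and `update`; the variable names are illustrative only.

```python
from collections import Counter

samples = ["a", "b", "b", "c", "b"]

# Path 1: pass the samples straight to the constructor.
direct = Counter(samples)

# Path 2: start from an empty distribution, then call update.
stepwise = Counter()
stepwise.update(samples)

# update adds to existing counts instead of replacing them.
stepwise_twice = Counter(samples)
stepwise_twice.update(samples)

print(direct == stepwise, stepwise_twice["b"])
```

Note that a second `update` call accumulates on top of the existing counts, which is why a cached total would need to be invalidated after it.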
Return True if every key in this frequency distribution also appears in the other, and no key's count exceeds the count of the same key in the other frequency distribution.
The <= operator forms a partial order, satisfying the axioms of reflexivity, antisymmetry, and transitivity.
>>> FreqDist('a') <= FreqDist('a')
True
>>> a = FreqDist('abc')
>>> b = FreqDist('aabc')
>>> (a <= b, b <= a)
(True, False)
>>> FreqDist('a') <= FreqDist('abcd')
True
>>> FreqDist('abc') <= FreqDist('xyz')
False
>>> FreqDist('xyz') <= FreqDist('abc')
False
>>> c = FreqDist('a')
>>> d = FreqDist('aa')
>>> e = FreqDist('aaa')
>>> c <= d and d <= e and c <= e
True
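The comparison rule can be written out explicitly. The sketch below assumes plain `collections.Counter` objects with positive counts, and `is_subdistribution` is a hypothetical helper name, not part of NLTK.

```python
from collections import Counter

def is_subdistribution(a: Counter, b: Counter) -> bool:
    """Hypothetical helper: True if every count in a is <= the matching count in b."""
    # b[key] is 0 for a key missing from b, so a key found only in a fails the test.
    return all(count <= b[key] for key, count in a.items())

a = Counter("abc")
b = Counter("aabc")

print(is_subdistribution(a, b), is_subdistribution(b, a))
```

Two distributions with disjoint keys are incomparable under this rule, which is what makes the order partial rather than total.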
Union is the maximum of the corresponding counts in either of the input counters.
>>> FreqDist('abbb') | FreqDist('bcc')
FreqDist({'b': 3, 'c': 2, 'a': 1})
Subtract count, but keep only results with positive counts.
>>> FreqDist('abbbc') - FreqDist('bccd')
FreqDist({'b': 2, 'a': 1})
Return the total number of sample values (or "bins") that have counts greater than zero. For the total number of sample outcomes recorded, use FreqDist.N(). (FreqDist.B() is the same as len(FreqDist).)
Returns | |
int | Undocumented |
Return the frequency of a given sample. The frequency of a sample is defined as the count of that sample divided by the total number of sample outcomes that have been recorded by this FreqDist. The count of a sample is defined as the number of times that sample outcome was recorded by this FreqDist. Frequencies are always real numbers in the range [0, 1].
Parameters | |
sample:any | the sample whose frequency should be returned. |
Returns | |
float | Undocumented |
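The definition of frequency as count divided by total outcomes can be sketched directly. This assumes a `collections.Counter` as the underlying store; `relative_frequency` is a hypothetical helper, and returning 0.0 for an empty distribution is an assumption of this sketch.

```python
from collections import Counter

def relative_frequency(counts: Counter, sample) -> float:
    """Hypothetical helper: count of sample divided by the total number of outcomes."""
    total = sum(counts.values())
    if total == 0:
        return 0.0  # assumption: an empty distribution yields 0 rather than raising
    return counts[sample] / total

counts = Counter("abbb")
print(relative_frequency(counts, "b"))
```

A sample that was never recorded has a count of zero, so its frequency is 0.0, and the frequencies of all recorded samples sum to 1.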
Return the sample with the greatest number of outcomes in this frequency distribution. If two or more samples have the same number of outcomes, return one of them; which sample is returned is undefined. If no outcomes have occurred in this frequency distribution, return None.
Returns | |
any or None | The sample with the maximum number of outcomes in this frequency distribution. |
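A minimal sketch of this behavior, assuming a `collections.Counter` as the store; `most_frequent` is a hypothetical helper name, and which sample wins a tie depends on the Counter's ordering.

```python
from collections import Counter

def most_frequent(counts: Counter):
    """Hypothetical helper: the sample with the highest count, or None if empty."""
    if not counts:
        return None
    # most_common(1) returns a one-element list of (sample, count) pairs.
    return counts.most_common(1)[0][0]

print(most_frequent(Counter("abbb")))
```

Callers that need a deterministic result for ties should apply their own tie-breaking rule rather than rely on the returned sample.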
Return the total number of sample outcomes that have been recorded by this FreqDist. For the number of unique sample values (or bins) with counts greater than zero, use FreqDist.B().
Returns | |
int | Undocumented |
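The difference between the total number of outcomes and the number of bins is easy to confuse, so here is a small sketch. It assumes a `collections.Counter` as a stand-in, and the variable names are illustrative.

```python
from collections import Counter

counts = Counter("abbbcc")

# Total outcomes recorded: plays the role of N().
total_outcomes = sum(counts.values())

# Distinct samples with a count above zero: plays the role of B().
distinct_bins = sum(1 for c in counts.values() if c > 0)

print(total_outcomes, distinct_bins)
```

Here six outcomes fall into three bins, so the two values differ whenever any sample repeats.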
Return a string representation of this FreqDist.
Parameters | |
maxlen:int | The maximum number of items to display |
Returns | |
string | Undocumented |
Plot samples from the frequency distribution displaying the most frequent sample first. If an integer parameter is supplied, stop after this many samples have been plotted. For a cumulative plot, specify cumulative=True. (Requires Matplotlib to be installed.)
Parameters | |
*args | Undocumented |
title:str | The title for the graph |
cumulative | A flag to specify whether the plot is cumulative (default = False) |
**kwargs | Undocumented |
Print a string representation of this FreqDist to 'stream'
Parameters | |
maxlen:int | The maximum number of items to print |
stream | The stream to print to. stdout by default |
Return the dictionary mapping r to Nr, the number of samples with frequency r, where Nr > 0.
Parameters | |
bins:int | The number of possible sample outcomes. bins is used to calculate Nr(0). In particular, Nr(0) is bins-self.B(). If bins is not specified, it defaults to self.B() (so Nr(0) will be 0). |
Returns | |
dict | Undocumented |
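The mapping from r to Nr is a "count of counts". The sketch below assumes a `collections.Counter` as the store; `counts_of_counts` is a hypothetical helper, and including the key 0 only when Nr(0) is positive is an assumption made to match the "where Nr > 0" wording.

```python
from collections import Counter

def counts_of_counts(counts: Counter, bins=None) -> dict:
    """Hypothetical helper: map each frequency r to the number of samples seen r times."""
    observed = [c for c in counts.values() if c > 0]
    nr = dict(Counter(observed))
    # Nr(0) = bins - B; with bins unspecified it defaults to B, so Nr(0) is 0.
    b = len(observed)
    unseen = (bins if bins is not None else b) - b
    if unseen > 0:
        nr[0] = unseen  # assumption: key 0 appears only when Nr(0) > 0
    return nr

print(counts_of_counts(Counter("abbbcc"), bins=10))
```

With three observed bins and `bins=10`, seven possible outcomes were never seen, so Nr(0) is 7.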
Tabulate the given samples from the frequency distribution (cumulative), displaying the most frequent sample first. If an integer parameter is supplied, stop after this many samples have been tabulated.
Parameters | |
*args | Undocumented |
samples:list | The samples to tabulate (default is all samples) |
title:bool | Undocumented |
cumulative | A flag to specify whether the freqs are cumulative (default = False) |
**kwargs | Undocumented |
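Cumulative frequencies, as used by the cumulative flag, are running totals over samples ordered from most to least frequent. The following sketch assumes a `collections.Counter` as a stand-in and uses `itertools.accumulate`; it does not reproduce NLTK's table formatting.

```python
from collections import Counter
from itertools import accumulate

counts = Counter("abbbcc")

# Order samples from most to least frequent, then take running totals.
ordered = counts.most_common()
samples = [sample for sample, _ in ordered]
cumulative = list(accumulate(count for _, count in ordered))

for sample, running_total in zip(samples, cumulative):
    print(sample, running_total)
```

The last running total equals the total number of recorded outcomes.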