Module documentation

A classifier model based on the maximum entropy modeling framework. This
framework considers all of the probability distributions that are
empirically consistent with the training data, and chooses the
distribution with the highest entropy. A probability distribution is
"empirically consistent" with a set of training data if the estimated
frequency with which a class and a feature vector value co-occur is
equal to the actual frequency in the data.

Terminology: 'feature'
======================
The term *feature* is usually used to refer to some property of an
unlabeled token. For example, when performing word sense disambiguation,
we might define a ``'prevword'`` feature whose value is the word
preceding the target word. However, in the context of maxent modeling,
the term *feature* is typically used to refer to a property of a
"labeled" token. In order to prevent confusion, we will introduce two
distinct terms to disambiguate these two different concepts:

- An "input-feature" is a property of an unlabeled token.
- A "joint-feature" is a property of a labeled token.

In the rest of the ``nltk.classify`` module, the term "features" is
used to refer to what we will call "input-features" in this module.

In literature that describes and discusses maximum entropy models,
input-features are typically called "contexts", and joint-features are
simply referred to as "features".

Converting Input-Features to Joint-Features
-------------------------------------------
In maximum entropy models, joint-features are required to have numeric
values. Typically, each input-feature ``input_feat`` is mapped to a set
of joint-features of the form:

| joint_feat(token, label) = { 1 if input_feat(token) == feat_val
|                            {      and label == some_label
|                            {
|                            { 0 otherwise

for all values of ``feat_val`` and ``some_label``. This mapping is
performed by classes that implement the ``MaxentFeatureEncodingI``
interface.
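As an illustrative sketch (a hypothetical helper, not part of this
module's API), a binary joint-feature derived from a ``'prevword'``
input-feature can be written directly in Python::

    def make_joint_feature(feat_name, feat_val, some_label):
        """Build one binary joint-feature for a (feature value, label) pair."""

        def joint_feat(token, label):
            # token is a featureset: a dict mapping input-feature names
            # to their values for one unlabeled token.
            return 1 if token.get(feat_name) == feat_val and label == some_label else 0

        return joint_feat

    # Fires only for this exact feature-value/label combination.
    jf = make_joint_feature("prevword", "the", "NOUN")
    print(jf({"prevword": "the"}, "NOUN"))  # 1
    print(jf({"prevword": "the"}, "VERB"))  # 0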

Class BinaryMaxentFeatureEncoding A feature encoding that generates vectors containing binary joint-features of the form:
Class FunctionBackedMaxentFeatureEncoding A feature encoding that calls a user-supplied function to map a given featureset/label pair to a sparse joint-feature vector.
Class GISEncoding A binary feature encoding which adds one new joint-feature to the joint-features defined by ``BinaryMaxentFeatureEncoding``: a correction feature, whose value is chosen to ensure that the sparse vector always sums to a constant non-negative number...
Class MaxentClassifier A maximum entropy classifier (also known as a "conditional exponential classifier"). This classifier is parameterized by a set of "weights", which are used to combine the joint-features that are generated from a featureset by an "encoding"...
Class MaxentFeatureEncodingI A mapping that converts a set of input-feature values to a vector of joint-feature values, given a label. This conversion is necessary to translate featuresets into a format that can be used by maximum entropy models.
Class TadmEventMaxentFeatureEncoding Undocumented
Class TadmMaxentClassifier Undocumented
Class TypedMaxentFeatureEncoding A feature encoding that generates vectors containing integer, float and binary joint-features of the form:
Function calculate_deltas Calculate the update values for the classifier weights for this iteration of IIS. These update weights are the value of ``delta`` that solves the equation given below.
Function calculate_empirical_fcount Calculate the empirical count of each joint-feature over the given training tokens.
Function calculate_estimated_fcount Calculate the estimated count of each joint-feature under the current classifier's label distribution.
Function calculate_nfmap Construct a map that can be used to compress ``nf`` (which is typically sparse).
Function demo Undocumented
Function train_maxent_classifier_with_gis Train a new ``ConditionalExponentialClassifier`` from the given training samples, using the Generalized Iterative Scaling algorithm. This ``ConditionalExponentialClassifier`` will encode the model that maximizes entropy among all models that are empirically consistent with ``train_toks``.
Function train_maxent_classifier_with_iis Train a new ``ConditionalExponentialClassifier`` from the given training samples, using the Improved Iterative Scaling algorithm. This ``ConditionalExponentialClassifier`` will encode the model that maximizes entropy among all models that are empirically consistent with ``train_toks``.
Function train_maxent_classifier_with_megam Train a new ``ConditionalExponentialClassifier`` from the given training samples, using the external ``megam`` library. This ``ConditionalExponentialClassifier`` will encode the model that maximizes entropy among all models that are empirically consistent with ``train_toks``.
def calculate_deltas(train_toks, classifier, unattested, ffreq_empirical, nfmap, nfarray, nftranspose, encoding): (source)

Calculate the update values for the classifier weights for this
iteration of IIS. These update weights are the value of ``delta``
that solves the equation::

    ffreq_empirical[i]
        = SUM[fs,l] (classifier.prob_classify(fs).prob(l) *
                     feature_vector(fs,l)[i] *
                     exp(delta[i] * nf(feature_vector(fs,l))))

Where:

- *(fs,l)* is a (featureset, label) tuple from ``train_toks``
- *feature_vector(fs,l)* = ``encoding.encode(fs,l)``
- *nf(vector)* = ``sum([val for (id,val) in vector])``

This method uses Newton's method to solve this equation for
*delta[i]*. In particular, it starts with a guess of ``delta[i] = 1``
and iteratively updates ``delta`` with::

    delta[i] -= (ffreq_empirical[i] - sum1[i]) / (-sum2[i])

until convergence, where *sum1* and *sum2* are defined as::

    sum1[i](delta) = SUM[fs,l] f[i](fs,l,delta)
    sum2[i](delta) = SUM[fs,l] (f[i](fs,l,delta) *
                                nf(feature_vector(fs,l)))
    f[i](fs,l,delta) = (classifier.prob_classify(fs).prob(l) *
                        feature_vector(fs,l)[i] *
                        exp(delta[i] * nf(feature_vector(fs,l))))

Note that *sum1* and *sum2* depend on ``delta``, so they need to be
re-computed each iteration.

The variables ``nfmap``, ``nfarray``, and ``nftranspose`` are used to
generate a dense encoding for *nf(feature_vector)*. This allows
``calculate_deltas`` to calculate *sum1* and *sum2* using matrices,
which yields a significant performance improvement.

:param train_toks: The set of training tokens.
:type train_toks: list(tuple(dict, str))
:param classifier: The current classifier.
:type classifier: ClassifierI
:param ffreq_empirical: An array containing the empirical frequency
    for each feature. The *i*\ th element of this array is the
    empirical frequency for feature *i*.
:type ffreq_empirical: sequence of float
:param unattested: An array that is 1 for features that are not
    attested in the training data, and 0 for features that are
    attested. In other words, ``unattested[i]==1`` iff
    ``ffreq_empirical[i]==0``.
:type unattested: sequence of int
:param nfmap: A map that can be used to compress ``nf`` to a dense
    vector.
:type nfmap: dict(int -> int)
:param nfarray: An array that can be used to uncompress ``nf`` from a
    dense vector.
:type nfarray: array(float)
:param nftranspose: The transpose of ``nfarray``.
:type nftranspose: array(float)
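For intuition, here is a minimal numpy sketch of this Newton loop. It
is not NLTK's actual implementation, and it makes several labeled
assumptions: ``A`` is a hypothetical precomputed matrix in which
``A[x, i]`` sums ``classifier.prob_classify(fs).prob(l) *
feature_vector(fs,l)[i]`` over all (fs,l) pairs whose *nf* value is
``nfarray[x]``; ``nftranspose`` is ``nfarray`` reshaped as a column
vector; and ``unattested`` is a 0/1 numpy mask::

    import numpy

    def newton_deltas_sketch(ffreq_empirical, unattested, A, nfarray,
                             nftranspose, max_iters=15, tol=1e-10):
        ffreq = numpy.asarray(ffreq_empirical, dtype=float)
        deltas = numpy.ones(len(ffreq))  # initial guess: delta[i] = 1
        for _ in range(max_iters):
            # exp(delta[i] * nf): one row per attested nf value,
            # one column per joint-feature.
            exp_nf_delta = numpy.exp(numpy.outer(nfarray, deltas))
            nf_exp_nf_delta = nftranspose * exp_nf_delta
            sum1 = numpy.sum(exp_nf_delta * A, axis=0)     # sum1[i](delta)
            sum2 = numpy.sum(nf_exp_nf_delta * A, axis=0)  # sum2[i](delta)
            sum2 = sum2 + unattested  # avoid division by zero for unseen features
            deltas -= (ffreq - sum1) / -sum2               # Newton update
            if numpy.sum(numpy.abs(ffreq - sum1)) < tol:
                break  # converged
        return deltas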

def calculate_empirical_fcount(train_toks, encoding): (source)

Calculate the empirical frequency (count) of each joint-feature over the given training tokens: the *i*\ th element of the returned array is the sum of feature *i*'s values over all ``(featureset, label)`` pairs in ``train_toks``.
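A hedged sketch of what such a helper computes (the function name is
illustrative; ``encoding.length()`` and ``encoding.encode()`` belong to
the ``MaxentFeatureEncodingI`` interface)::

    import numpy

    def empirical_fcount_sketch(train_toks, encoding):
        # One accumulator slot per joint-feature id.
        fcount = numpy.zeros(encoding.length())
        for featureset, label in train_toks:
            # encode() returns a sparse list of (feature_id, value) pairs.
            for fid, val in encoding.encode(featureset, label):
                fcount[fid] += val
        return fcount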

def calculate_estimated_fcount(classifier, train_toks, encoding): (source)

Calculate the estimated frequency of each joint-feature under the current classifier's distribution: for each ``featureset`` in ``train_toks``, each label's feature values are weighted by ``classifier.prob_classify(featureset).prob(label)`` and summed.
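Likewise, a sketch of the estimated (model-expected) counts, weighting
each label's joint-features by the classifier's conditional
probability (illustrative name; assumes ``prob_classify`` returns a
probability distribution with ``samples()`` and ``prob()`` methods)::

    import numpy

    def estimated_fcount_sketch(classifier, train_toks, encoding):
        fcount = numpy.zeros(encoding.length())
        for featureset, _gold_label in train_toks:
            pdist = classifier.prob_classify(featureset)
            for label in pdist.samples():
                weight = pdist.prob(label)
                for fid, val in encoding.encode(featureset, label):
                    fcount[fid] += weight * val
        return fcount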

def calculate_nfmap(train_toks, encoding): (source)

Construct a map that can be used to compress ``nf`` (which is
typically sparse).

*nf(feature_vector)* is the sum of the feature values for
*feature_vector*. This represents the number of features that are
active for a given labeled text.

This method finds all values of *nf(t)* that are attested for at
least one token in the given list of training tokens, and constructs
a dictionary mapping these attested values to a continuous range
*0...N*. For example, if the only values of *nf()* that were attested
were 3, 5, and 7, then ``calculate_nfmap`` might return the
dictionary ``{3:0, 5:1, 7:2}``.

:return: A map that can be used to compress ``nf`` to a dense vector.
:rtype: dict(int -> int)
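A compact sketch of this construction (illustrative name;
``encoding.labels()`` is part of the encoding interface)::

    def nfmap_sketch(train_toks, encoding):
        # Collect every attested value of nf = sum of active feature values.
        nfset = set()
        for tok, _ in train_toks:
            for label in encoding.labels():
                nfset.add(sum(val for (_fid, val) in encoding.encode(tok, label)))
        # Map the attested nf values onto the dense index range 0..N-1.
        return {nf: i for i, nf in enumerate(nfset)}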

def demo(): (source)

Undocumented

def train_maxent_classifier_with_gis(train_toks, trace=3, encoding=None, labels=None, **cutoffs): (source)

Train a new ``ConditionalExponentialClassifier`` from the given training samples, using the Generalized Iterative Scaling algorithm. This ``ConditionalExponentialClassifier`` will encode the model that maximizes entropy among all models that are empirically consistent with ``train_toks``.

:see: ``train_maxent_classifier()`` for parameter descriptions.
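A toy usage sketch (the featuresets and labels are made up;
``max_iter`` is one of the standard training cutoffs)::

    train_toks = [
        ({"prevword": "the", "suffix": "s"}, "NOUN"),
        ({"prevword": "a", "suffix": "s"}, "NOUN"),
        ({"prevword": "to", "suffix": "ed"}, "VERB"),
    ]
    classifier = train_maxent_classifier_with_gis(train_toks, trace=0,
                                                  max_iter=10)
    print(classifier.classify({"prevword": "to", "suffix": "ed"}))  # expected: 'VERB'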

def train_maxent_classifier_with_iis(train_toks, trace=3, encoding=None, labels=None, **cutoffs): (source)

Train a new ``ConditionalExponentialClassifier`` from the given training samples, using the Improved Iterative Scaling algorithm. This ``ConditionalExponentialClassifier`` will encode the model that maximizes entropy among all models that are empirically consistent with ``train_toks``.

:see: ``train_maxent_classifier()`` for parameter descriptions.
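The call shape is identical to the GIS trainer; only the optimization
algorithm differs. Reusing ``train_toks`` from the sketch above::

    classifier = train_maxent_classifier_with_iis(train_toks, trace=0,
                                                  max_iter=10)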

def train_maxent_classifier_with_megam(train_toks, trace=3, encoding=None, labels=None, gaussian_prior_sigma=0, **kwargs): (source)

Train a new ``ConditionalExponentialClassifier`` from the given training samples, using the external ``megam`` library. This ``ConditionalExponentialClassifier`` will encode the model that maximizes entropy among all models that are empirically consistent with ``train_toks``.

:see: ``train_maxent_classifier()`` for parameter descriptions.
:see: ``nltk.classify.megam``
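Since ``megam`` is an external binary, it must be installed and
located before training; a sketch assuming it is on the search path,
again reusing ``train_toks`` from the GIS sketch above::

    from nltk.classify import megam

    megam.config_megam()  # tell NLTK where the external megam binary lives
    classifier = train_maxent_classifier_with_megam(train_toks, trace=0)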