class documentation

A maximum entropy classifier (also known as a "conditional exponential classifier"). This classifier is parameterized by a set of "weights", which are used to combine the joint-features that are generated from a featureset by an "encoding". In particular, the encoding maps each ``(featureset, label)`` pair to a vector. The probability of each label is then computed using the following equation:

                            dotprod(weights, encode(fs,label))
  prob(label|fs) = ---------------------------------------------------
                   sum(dotprod(weights, encode(fs,l)) for l in labels)

Where ``dotprod`` is the dot product:

  dotprod(a,b) = sum(x*y for (x,y) in zip(a,b))
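
The equation above can be sketched in plain Python. This is a minimal illustration that follows the formula as written; the ``toy_encode`` function, its joint-feature layout, and the weight values are assumptions made for the example, and the library's own computation may differ in detail.

```python
def dotprod(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for (x, y) in zip(a, b))


def label_probs(weights, encode, fs, labels):
    """Normalize the per-label scores so that they sum to 1."""
    scores = {label: dotprod(weights, encode(fs, label)) for label in labels}
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


def toy_encode(fs, label):
    """Hypothetical encoding: one joint-feature per (feature name, label) pair."""
    joint_features = [("a", "x"), ("b", "x"), ("a", "y"), ("b", "y")]
    return [1 if fs.get(name) and label == lab else 0
            for (name, lab) in joint_features]


weights = [2.0, 1.0, 1.0, 4.0]  # made-up values, one per joint-feature
probs = label_probs(weights, toy_encode, {"a": True, "b": True}, ["x", "y"])
```

With these toy values, label ``x`` scores 3 and label ``y`` scores 5, so the resulting probabilities are 0.375 and 0.625.
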

Class Method train Train a new maxent classifier based on the given corpus of training samples. This classifier will have its weights chosen to maximize entropy while remaining empirically consistent with the training corpus.
Method __init__ Construct a new maxent classifier model. Typically, new classifier models are created using the ``train()`` method.
Method __repr__ Undocumented
Method classify Return the most appropriate label for the given featureset.
Method explain Print a table showing the effect of each of the features in the given feature set, and how they combine to determine the probabilities of each label for that featureset.
Method labels Return the list of category labels used by this classifier.
Method most_informative_features Generates the ranked list of informative features from most to least.
Method prob_classify Return a probability distribution over labels for the given featureset.
Method set_weights Set the feature weight vector for this classifier.
Method show_most_informative_features Parameters: ``show`` (all, neg, or pos, for negative-only or positive-only) and ``n`` (the number of top features).
Method weights Return the feature weight vector for this classifier.
Constant ALGORITHMS Undocumented
Instance Variable _encoding Undocumented
Instance Variable _logarithmic Undocumented
Instance Variable _most_informative_features Undocumented
Instance Variable _weights Undocumented

Inherited from ClassifierI:

Method classify_many Apply self.classify() to each element of featuresets.
Method prob_classify_many Apply self.prob_classify() to each element of featuresets.

@classmethod
def train(cls, train_toks, algorithm=None, trace=3, encoding=None, labels=None, gaussian_prior_sigma=0, **cutoffs):

Train a new maxent classifier based on the given corpus of training samples. This classifier will have its weights chosen to maximize entropy while remaining empirically consistent with the training corpus.

:rtype: MaxentClassifier
:return: The new maxent classifier
:type train_toks: list
:param train_toks: Training data, represented as a list of pairs, the first member of which is a featureset, and the second of which is a classification label.
:type algorithm: str
:param algorithm: A case-insensitive string, specifying which algorithm should be used to train the classifier. The following algorithms are currently available.

    - Iterative Scaling Methods: Generalized Iterative Scaling (``'GIS'``), Improved Iterative Scaling (``'IIS'``)
    - External Libraries (requiring megam): LM-BFGS algorithm, with training performed by Megam (``'megam'``)

    The default algorithm is ``'IIS'``.
:type trace: int
:param trace: The level of diagnostic tracing output to produce. Higher values produce more verbose output.
:type encoding: MaxentFeatureEncodingI
:param encoding: A feature encoding, used to convert featuresets into feature vectors. If none is specified, then a ``BinaryMaxentFeatureEncoding`` will be built based on the features that are attested in the training corpus.
:type labels: list(str)
:param labels: The set of possible labels. If none is given, then the set of all labels attested in the training data will be used instead.
:param gaussian_prior_sigma: The sigma value for a gaussian prior on model weights. Currently, this is supported by ``megam``. For other algorithms, its value is ignored.
:param cutoffs: Arguments specifying various conditions under which the training should be halted. (Some of the cutoff conditions are not supported by some algorithms.)

    - ``max_iter=v``: Terminate after ``v`` iterations.
    - ``min_ll=v``: Terminate after the negative average log-likelihood drops under ``v``.
    - ``min_lldelta=v``: Terminate if a single iteration improves log likelihood by less than ``v``.
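
The cutoff conditions can be read as a stopping test evaluated after each training iteration. The following sketch is only one interpretation of the descriptions above, not the library's code; ``should_stop`` and its ``ll_history`` argument are hypothetical names introduced for illustration.

```python
def should_stop(iteration, ll_history, max_iter=None, min_ll=None, min_lldelta=None):
    """Decide whether training should halt (hypothetical helper).

    iteration  -- number of iterations completed so far
    ll_history -- average log-likelihood after each completed iteration
    """
    # max_iter: stop once the iteration budget is used up.
    if max_iter is not None and iteration >= max_iter:
        return True
    if not ll_history:
        return False
    # min_ll: stop once the negative average log-likelihood drops under the bound.
    if min_ll is not None and -ll_history[-1] < min_ll:
        return True
    # min_lldelta: stop when the latest iteration improved too little.
    if min_lldelta is not None and len(ll_history) >= 2:
        if ll_history[-1] - ll_history[-2] < min_lldelta:
            return True
    return False


stop_on_budget = should_stop(5, [], max_iter=5)
stop_on_small_gain = should_stop(2, [-0.9, -0.85], min_lldelta=0.1)
keep_going = should_stop(2, [-0.9, -0.5], min_lldelta=0.1)
```

Any cutoff left as ``None`` is simply skipped, which mirrors the note that not every algorithm supports every condition.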

def __init__(self, encoding, weights, logarithmic=True):

Construct a new maxent classifier model. Typically, new classifier models are created using the ``train()`` method.

:type encoding: MaxentFeatureEncodingI
:param encoding: An encoding that is used to convert the featuresets that are given to the ``classify`` method into joint-feature vectors, which are used by the maxent classifier model.
:type weights: list of float
:param weights: The feature weight vector for this classifier.
:type logarithmic: bool
:param logarithmic: If false, then use non-logarithmic weights.
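
The ``logarithmic`` flag suggests that weights can be stored either as logarithms or as plain multiplicative factors. The sketch below assumes base-2 logarithms and uses a hypothetical ``score`` helper; neither is stated above, so treat it as an illustration of why the two representations can describe the same model rather than as the library's code.

```python
import math


def score(weights, vector, logarithmic=True):
    """Unnormalized score of one joint-feature vector (hypothetical helper)."""
    if logarithmic:
        # Log weights add up; exponentiate once at the end (base 2 assumed).
        return 2.0 ** sum(w * v for (w, v) in zip(weights, vector))
    # Plain weights multiply, each raised to its feature value.
    return math.prod(w ** v for (w, v) in zip(weights, vector))


plain_weights = [4.0, 0.5, 8.0]                      # made-up values
log_weights = [math.log2(w) for w in plain_weights]  # [2.0, -1.0, 3.0]
vector = [1, 1, 0]

log_score = score(log_weights, vector, logarithmic=True)
plain_score = score(plain_weights, vector, logarithmic=False)
```

Both calls give the same score here, since adding logarithms is equivalent to multiplying the underlying factors.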

def __repr__(self):

Undocumented

def classify(self, featureset):

:return: the most appropriate label for the given featureset.
:rtype: label

def explain(self, featureset, columns=4):

Print a table showing the effect of each of the features in the given feature set, and how they combine to determine the probabilities of each label for that featureset.

def labels(self):

:return: the list of category labels used by this classifier.
:rtype: list of (immutable)

def most_informative_features(self, n=10):

Generates the ranked list of informative features from most to least.
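
One plausible reading of "most informative" is that features are ranked by the absolute value of their weights. The sketch below makes that assumption explicit; ``rank_features`` is a hypothetical helper, and the ranking criterion is not stated in the description above.

```python
def rank_features(weights, n=10):
    """Return the ids of the n features with the largest absolute weight."""
    ranked = sorted(range(len(weights)),
                    key=lambda fid: abs(weights[fid]),
                    reverse=True)
    return ranked[:n]


weights = [0.2, -1.5, 0.0, 0.9]  # made-up weight vector
top = rank_features(weights, n=3)
```

Under this assumption a strongly negative weight ranks above a weakly positive one, because only the magnitude matters.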

def prob_classify(self, featureset):

:return: a probability distribution over labels for the given featureset.
:rtype: ProbDistI
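
A natural way to relate ``prob_classify`` to ``classify`` is to take the label with the highest probability as the most appropriate one. The sketch below uses a plain dictionary as a stand-in for the returned distribution; ``most_probable_label`` is a hypothetical helper and the probabilities are made up.

```python
def most_probable_label(label_probs):
    """Return the label with the highest probability."""
    return max(label_probs, key=label_probs.get)


label_probs = {"pos": 0.7, "neg": 0.2, "neutral": 0.1}  # stand-in distribution
best = most_probable_label(label_probs)
```
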

def set_weights(self, new_weights):

Set the feature weight vector for this classifier.

:param new_weights: The new feature weight vector.
:type new_weights: list of float

def show_most_informative_features(self, n=10, show='all'):

:param show: all, neg, or pos (for negative-only or positive-only)
:type show: str
:param n: The number of top features
:type n: int
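
The ``show`` parameter can be pictured as a sign filter applied to an already ranked list of feature ids. This sketch assumes that ``pos`` and ``neg`` refer to the sign of each feature's weight; ``filter_by_sign`` is a hypothetical helper, not part of the class.

```python
def filter_by_sign(ranked_ids, weights, show="all", n=10):
    """Keep positive-only or negative-only features, then take the top n."""
    if show == "pos":
        ranked_ids = [fid for fid in ranked_ids if weights[fid] > 0]
    elif show == "neg":
        ranked_ids = [fid for fid in ranked_ids if weights[fid] < 0]
    return ranked_ids[:n]


weights = [0.2, -1.5, 0.0, 0.9]  # made-up weight vector
ranked_ids = [1, 3, 0, 2]        # ids ordered from most to least informative
positives = filter_by_sign(ranked_ids, weights, show="pos", n=2)
negatives = filter_by_sign(ranked_ids, weights, show="neg", n=2)
```

Filtering before truncating means that ``n`` counts only the features that survive the sign filter.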

def weights(self):

:return: The feature weight vector for this classifier.
:rtype: list of float

ALGORITHMS: list[str] = ['GIS', 'IIS', 'MEGAM', 'TADM']

Undocumented

_encoding

Undocumented

_logarithmic

Undocumented

_most_informative_features

Undocumented

_weights

Undocumented