nltk.classify.maxent.MaxentFeatureEncodingI

class documentation

class MaxentFeatureEncodingI(object):

Known subclasses: nltk.classify.maxent.BinaryMaxentFeatureEncoding, nltk.classify.maxent.FunctionBackedMaxentFeatureEncoding, nltk.classify.maxent.TypedMaxentFeatureEncoding

A mapping that converts a set of input-feature values to a vector of joint-feature values, given a label. This conversion is necessary to translate featuresets into a format that can be used by maximum entropy models.

The set of joint-features used by a given encoding is fixed, and each index in the generated joint-feature vectors corresponds to a single joint-feature. The length of the generated joint-feature vectors is therefore constant (for a given encoding).

Because the joint-feature vectors generated by ``MaxentFeatureEncodingI`` are typically very sparse, they are represented as a list of ``(index, value)`` tuples, specifying the value of each non-zero joint-feature.
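To make the sparse representation concrete, the following minimal sketch expands a list of ``(index, value)`` tuples into a dense vector of the encoding's fixed length. The helper name ``to_dense`` and the sample values are hypothetical and are not part of NLTK; the ``length`` argument stands in for the value an encoding's ``length()`` method would return.

```python
def to_dense(sparse_vector, length):
    """Expand a sparse list of (index, value) tuples into a dense list.

    ``length`` plays the role of the value returned by ``encoding.length()``.
    Every index that is not listed in ``sparse_vector`` is treated as zero.
    """
    dense = [0] * length
    for index, value in sparse_vector:
        dense[index] = value
    return dense


# Hypothetical sparse vector: only joint-features 1 and 4 are non-zero.
sparse = [(1, 1), (4, 1)]
print(to_dense(sparse, 6))  # prints [0, 1, 0, 0, 1, 0]
```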

Feature encodings are generally created using the ``train()`` method, which generates an appropriate encoding based on the input-feature values and labels that are present in a given corpus.

Method	`describe`	Return a string describing the value of the joint-feature whose index in the generated feature vectors is ``fid``.
Method	`encode`	Given a (featureset, label) pair, return the corresponding vector of joint-feature values. This vector is represented as a list of ``(index, value)`` tuples, specifying the value of each non-zero joint-feature.
Method	`labels`	Return a list of the "known labels" -- i.e., all labels ``l`` such that ``self.encode(fs,l)`` can be a nonzero joint-feature vector for some value of ``fs``.
Method	`length`	Return the size of the fixed-length joint-feature vectors that are generated by this encoding.
Method	`train`	Construct and return a new feature encoding, based on a given training corpus ``train_toks``.

def describe(self, fid):

overridden in nltk.classify.maxent.BinaryMaxentFeatureEncoding, nltk.classify.maxent.FunctionBackedMaxentFeatureEncoding, nltk.classify.maxent.TypedMaxentFeatureEncoding

:return: A string describing the value of the joint-feature whose index in the generated feature vectors is ``fid``.
:rtype: str

def encode(self, featureset, label):

overridden in nltk.classify.maxent.BinaryMaxentFeatureEncoding, nltk.classify.maxent.FunctionBackedMaxentFeatureEncoding, nltk.classify.maxent.TypedMaxentFeatureEncoding

Given a (featureset, label) pair, return the corresponding vector of joint-feature values. This vector is represented as a list of ``(index, value)`` tuples, specifying the value of each non-zero joint-feature.

:type featureset: dict
:rtype: list(tuple(int, int))
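As an illustration of the contract described above, here is a minimal sketch of an ``encode`` function that looks up binary-valued joint-features in a fixed table. The ``MAPPING`` table, the feature name ``last_letter``, and the labels are hypothetical examples, and the choice of binary values is an assumption; this is not NLTK's implementation.

```python
# Hypothetical mapping from (feature name, feature value, label) to a
# joint-feature index. A real encoding would normally build this in train().
MAPPING = {
    ("last_letter", "a", "female"): 0,
    ("last_letter", "a", "male"): 1,
    ("last_letter", "k", "male"): 2,
}


def encode(featureset, label, mapping=MAPPING):
    """Return the non-zero joint-features as a list of (index, value) tuples."""
    encoding = []
    for fname, fval in featureset.items():
        index = mapping.get((fname, fval, label))
        if index is not None:
            # Assumption: each known joint-feature fires with the value 1.
            encoding.append((index, 1))
    return encoding


print(encode({"last_letter": "a"}, "female"))  # prints [(0, 1)]
print(encode({"last_letter": "z"}, "female"))  # prints []
```

Input-feature values that the mapping has never seen with the given label contribute nothing, so the result can be an empty list.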

def labels(self):

overridden in nltk.classify.maxent.BinaryMaxentFeatureEncoding, nltk.classify.maxent.FunctionBackedMaxentFeatureEncoding, nltk.classify.maxent.TypedMaxentFeatureEncoding

:return: A list of the "known labels" -- i.e., all labels ``l`` such that ``self.encode(fs,l)`` can be a nonzero joint-feature vector for some value of ``fs``.
:rtype: list

def length(self):

overridden in nltk.classify.maxent.BinaryMaxentFeatureEncoding, nltk.classify.maxent.FunctionBackedMaxentFeatureEncoding, nltk.classify.maxent.TypedMaxentFeatureEncoding

:return: The size of the fixed-length joint-feature vectors that are generated by this encoding.
:rtype: int

def train(cls, train_toks):

overridden in nltk.classify.maxent.BinaryMaxentFeatureEncoding, nltk.classify.maxent.TypedMaxentFeatureEncoding

Construct and return a new feature encoding, based on a given training corpus ``train_toks``.

:type train_toks: list(tuple(dict, str))
:param train_toks: Training data, represented as a list of pairs, the first member of which is a feature dictionary, and the second of which is a classification label.
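The following is a minimal sketch of how a class might satisfy this interface, assuming one binary joint-feature per distinct (feature name, feature value, label) triple seen in the training corpus. The class name ``ToyEncoding``, its internals, and the sample data are hypothetical; NLTK's concrete subclasses may differ in their options and behavior.

```python
class ToyEncoding:
    """Hypothetical, simplified encoding; not NLTK's implementation."""

    def __init__(self, labels, mapping):
        self._labels = list(labels)
        self._mapping = mapping  # (fname, fval, label) -> joint-feature index

    @classmethod
    def train(cls, train_toks):
        """Build an encoding from a list of (featureset, label) pairs."""
        mapping = {}
        labels = []
        for featureset, label in train_toks:
            if label not in labels:
                labels.append(label)
            for fname, fval in featureset.items():
                key = (fname, fval, label)
                if key not in mapping:
                    # Assign the next free index to each new joint-feature.
                    mapping[key] = len(mapping)
        return cls(labels, mapping)

    def labels(self):
        return self._labels

    def length(self):
        return len(self._mapping)

    def encode(self, featureset, label):
        encoding = []
        for fname, fval in featureset.items():
            index = self._mapping.get((fname, fval, label))
            if index is not None:
                encoding.append((index, 1))
        return encoding

    def describe(self, fid):
        for (fname, fval, label), index in self._mapping.items():
            if index == fid:
                return f"{fname}=={fval!r} and label is {label!r}"
        raise KeyError(fid)


# Hypothetical training corpus of (feature dictionary, label) pairs.
train_toks = [
    ({"last_letter": "a"}, "female"),
    ({"last_letter": "k"}, "male"),
    ({"last_letter": "a"}, "male"),
]
encoding = ToyEncoding.train(train_toks)
print(encoding.length())                              # prints 3
print(encoding.labels())                              # prints ['female', 'male']
print(encoding.encode({"last_letter": "a"}, "male"))  # prints [(2, 1)]
print(encoding.describe(0))
```

Because the mapping is fixed once ``train`` returns, ``length()`` stays constant for the lifetime of the encoding, which matches the fixed-length property described for the interface.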