class MaxentFeatureEncodingI(object):
Known subclasses: nltk.classify.maxent.BinaryMaxentFeatureEncoding, nltk.classify.maxent.FunctionBackedMaxentFeatureEncoding, nltk.classify.maxent.TypedMaxentFeatureEncoding
A mapping that converts a set of input-feature values to a vector of joint-feature values, given a label. This conversion is necessary to translate featuresets into a format that can be used by maximum entropy models.
The set of joint-features used by a given encoding is fixed, and each index in the generated joint-feature vectors corresponds to a single joint-feature. The length of the generated joint-feature vectors is therefore constant (for a given encoding).
Because the joint-feature vectors generated by ``MaxentFeatureEncodingI`` are typically very sparse, they are represented as a list of ``(index, value)`` tuples, specifying the value of each non-zero joint-feature.
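To make the sparse representation concrete, the sketch below expands a list of ``(index, value)`` tuples into a dense vector. The helper ``to_dense`` is a hypothetical name used only for illustration and is not part of NLTK; the vector length of 6 is an arbitrary assumption.

```python
def to_dense(sparse_vector, length):
    """Expand a list of (index, value) tuples into a dense list.

    Hypothetical helper for illustration only; not part of NLTK.
    """
    dense = [0] * length
    for index, value in sparse_vector:
        dense[index] = value
    return dense


# Only joint-features 0 and 4 are non-zero; every other index is implied to be 0.
sparse = [(0, 1), (4, 1)]
dense = to_dense(sparse, 6)
```

Because the vector length is constant for a given encoding, the dense form can always be recovered from the sparse one together with that length.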
Feature encodings are generally created using the ``train()`` method, which generates an appropriate encoding based on the input-feature values and labels that are present in a given corpus.
Method | describe | Return a string describing the value of the joint-feature whose index in the generated feature vectors is ``fid``. |
Method | encode | Given a (featureset, label) pair, return the corresponding vector of joint-feature values. This vector is represented as a list of ``(index, value)`` tuples, specifying the value of each non-zero joint-feature. |
Method | labels | Return a list of the "known labels" -- i.e., all labels ``l`` such that ``self.encode(fs,l)`` can be a nonzero joint-feature vector for some value of ``fs``. |
Method | length | Return the size of the fixed-length joint-feature vectors that are generated by this encoding. |
Method | train | Construct and return a new feature encoding, based on a given training corpus ``train_toks``. |
def describe(self, fid):
overridden in nltk.classify.maxent.BinaryMaxentFeatureEncoding, nltk.classify.maxent.FunctionBackedMaxentFeatureEncoding, nltk.classify.maxent.TypedMaxentFeatureEncoding
:return: A string describing the value of the joint-feature whose index in the generated feature vectors is ``fid``. :rtype: str
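One way to picture ``describe`` is as a reverse lookup from a vector index back to the joint-feature it stands for. The sketch below assumes a hypothetical ``mapping`` dictionary from ``(feature name, feature value, label)`` triples to indices; both the dictionary layout and the wording of the returned string are illustrative assumptions, not a guaranteed format of any NLTK subclass.

```python
def describe(fid, mapping):
    """Return a human-readable description of joint-feature `fid`.

    `mapping` is a hypothetical dict of (fname, fval, label) -> index,
    assumed here for illustration only.
    """
    for (fname, fval, label), index in mapping.items():
        if index == fid:
            return "%s==%r and label is %r" % (fname, fval, label)
    raise ValueError("Bad feature id: %r" % fid)


# Hypothetical mapping with three joint-features.
mapping = {
    ("outlook", "sunny", "play"): 0,
    ("windy", False, "play"): 1,
    ("outlook", "rainy", "stay"): 2,
}
description = describe(1, mapping)
```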
def encode(self, featureset, label):
overridden in nltk.classify.maxent.BinaryMaxentFeatureEncoding, nltk.classify.maxent.FunctionBackedMaxentFeatureEncoding, nltk.classify.maxent.TypedMaxentFeatureEncoding
Given a (featureset, label) pair, return the corresponding vector of joint-feature values. This vector is represented as a list of ``(index, value)`` tuples, specifying the value of each non-zero joint-feature.
:type featureset: dict :rtype: list(tuple(int, int))
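As a rough illustration of what ``encode`` produces, the sketch below pairs each input-feature value with the given label and looks the result up in a fixed table. The ``mapping`` dictionary and the free function ``encode`` are hypothetical stand-ins for the state and method of a concrete encoding; a binary scheme (every matching joint-feature gets the value 1) is assumed.

```python
def encode(featureset, label, mapping):
    """Return the sparse joint-feature vector for (featureset, label).

    `mapping` is a hypothetical dict of (fname, fval, label) -> index.
    Joint-features that are absent from the mapping contribute nothing.
    """
    encoding = []
    for fname, fval in featureset.items():
        index = mapping.get((fname, fval, label))
        if index is not None:
            encoding.append((index, 1))
    return encoding


# Hypothetical mapping with three joint-features.
mapping = {
    ("outlook", "sunny", "play"): 0,
    ("windy", False, "play"): 1,
    ("outlook", "rainy", "stay"): 2,
}
featureset = {"outlook": "sunny", "windy": False}
vec_play = encode(featureset, "play", mapping)
vec_stay = encode(featureset, "stay", mapping)
```

The same featureset yields a different vector for each label, which is why the label is part of the input: here ``vec_play`` has two non-zero entries, while ``vec_stay`` is empty because none of its joint-features appear in the mapping.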
def labels(self):
overridden in nltk.classify.maxent.BinaryMaxentFeatureEncoding, nltk.classify.maxent.FunctionBackedMaxentFeatureEncoding, nltk.classify.maxent.TypedMaxentFeatureEncoding
:return: A list of the "known labels" -- i.e., all labels ``l`` such that ``self.encode(fs,l)`` can be a nonzero joint-feature vector for some value of ``fs``. :rtype: list
def length(self):
overridden in nltk.classify.maxent.BinaryMaxentFeatureEncoding, nltk.classify.maxent.FunctionBackedMaxentFeatureEncoding, nltk.classify.maxent.TypedMaxentFeatureEncoding
:return: The size of the fixed-length joint-feature vectors that are generated by this encoding. :rtype: int
def train(cls, train_toks):
overridden in nltk.classify.maxent.BinaryMaxentFeatureEncoding, nltk.classify.maxent.TypedMaxentFeatureEncoding
Construct and return a new feature encoding, based on a given training corpus ``train_toks``. :type train_toks: list(tuple(dict, str)) :param train_toks: Training data, represented as a list of pairs, the first member of which is a feature dictionary, and the second of which is a classification label.
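The sketch below shows one plausible way a ``train()`` implementation could derive a fixed set of joint-features from ``train_toks``: it assigns a fresh index to every distinct (feature name, feature value, label) combination seen in the corpus. The function ``build_mapping`` and the toy corpus are hypothetical and only illustrate the idea; concrete NLTK subclasses may build their encodings differently.

```python
def build_mapping(train_toks):
    """Assign an index to each (fname, fval, label) triple in the corpus.

    Hypothetical helper for illustration only; not part of NLTK.
    """
    mapping = {}
    for featureset, label in train_toks:
        for fname, fval in featureset.items():
            key = (fname, fval, label)
            if key not in mapping:
                # The next free index equals the number of triples seen so far.
                mapping[key] = len(mapping)
    return mapping


# Toy corpus in the documented shape: a list of (feature dict, label) pairs.
train_toks = [
    ({"outlook": "sunny", "windy": False}, "play"),
    ({"outlook": "rainy", "windy": True}, "stay"),
    ({"outlook": "sunny", "windy": True}, "play"),
]
mapping = build_mapping(train_toks)
vector_length = len(mapping)
known_labels = sorted({label for (_, _, label) in mapping})
```

In this sketch the number of entries in the mapping plays the role of ``length()``, and the labels that occur in its keys play the role of ``labels()``.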