class documentation

A mapping that converts a set of input-feature values to a vector of joint-feature values, given a label. This conversion is necessary to translate featuresets into a format that can be used by maximum entropy models.

The set of joint-features used by a given encoding is fixed, and each index in the generated joint-feature vectors corresponds to a single joint-feature. The length of the generated joint-feature vectors is therefore constant (for a given encoding).

Because the joint-feature vectors generated by ``MaxentFeatureEncodingI`` are typically very sparse, they are represented as a list of ``(index, value)`` tuples, specifying the value of each non-zero joint-feature.
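To illustrate the sparse representation, the following minimal sketch expands a list of ``(index, value)`` tuples into the equivalent dense vector. The helper name ``to_dense`` and the sample values are hypothetical and are not part of this interface; the vector length would normally come from ``length()``.

```python
def to_dense(sparse_vector, length):
    """Expand a list of (index, value) tuples into a dense list.

    Hypothetical helper for illustration only.
    """
    dense = [0] * length
    for index, value in sparse_vector:
        dense[index] = value
    return dense


# Two non-zero joint-features in a vector of (assumed) length 6.
sparse = [(1, 1), (4, 1)]
dense = to_dense(sparse, 6)
print(dense)
```

Only the non-zero entries are stored, so the sparse form stays small even when the encoding defines many joint-features.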

Feature encodings are generally created using the ``train()`` method, which generates an appropriate encoding based on the input-feature values and labels that are present in a given corpus.

Method describe: Returns a string describing the value of the joint-feature whose index in the generated feature vectors is ``fid``.
Method encode: Given a (featureset, label) pair, returns the corresponding vector of joint-feature values, represented as a list of ``(index, value)`` tuples specifying the value of each non-zero joint-feature.
Method labels: Returns a list of the "known labels" -- i.e., all labels ``l`` such that ``self.encode(fs,l)`` can be a nonzero joint-feature vector for some value of ``fs``.
Method length: Returns the size of the fixed-length joint-feature vectors that are generated by this encoding.
Method train: Constructs and returns a new feature encoding, based on a given training corpus ``train_toks``.

def describe(self, fid):

:return: A string describing the value of the joint-feature whose index in the generated feature vectors is ``fid``.
:rtype: str

def encode(self, featureset, label):

Given a (featureset, label) pair, return the corresponding vector of joint-feature values. This vector is represented as a list of ``(index, value)`` tuples, specifying the value of each non-zero joint-feature.

:type featureset: dict
:rtype: list(tuple(int, int))
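As a rough sketch of how such sparse vectors might be consumed, the example below scores each label by summing weight times value over its non-zero joint-features and then normalizes the exponentiated scores, which is the usual log-linear (maximum entropy) form. The function name ``label_probabilities``, the weights, and the per-label vectors are all made up for illustration; in practice the vectors would come from ``encode(featureset, label)`` for each known label.

```python
import math


def label_probabilities(vectors_by_label, weights):
    """Turn sparse joint-feature vectors into label probabilities.

    vectors_by_label maps each label to a list of (index, value) tuples.
    Hypothetical helper for illustration only.
    """
    scores = {
        label: sum(weights[index] * value for index, value in vector)
        for label, vector in vectors_by_label.items()
    }
    # Subtract the largest score before exponentiating for numerical stability.
    top = max(scores.values())
    exps = {label: math.exp(score - top) for label, score in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Assumed weights, one per joint-feature index.
weights = [0.5, -1.0, 2.0, 0.0]
# Assumed sparse vectors, as encode() might return them for two labels.
vectors_by_label = {
    "x": [(0, 1), (2, 1)],  # score 0.5 + 2.0 = 2.5
    "y": [(1, 1)],          # score -1.0
}
probs = label_probabilities(vectors_by_label, weights)
```

Because only non-zero joint-features appear in each vector, the cost of scoring a label is proportional to the number of active features rather than to the full vector length.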

def labels(self):

:return: A list of the "known labels" -- i.e., all labels ``l`` such that ``self.encode(fs,l)`` can be a nonzero joint-feature vector for some value of ``fs``.
:rtype: list

def length(self):

:return: The size of the fixed-length joint-feature vectors that are generated by this encoding.
:rtype: int

def train(cls, train_toks):

Construct and return a new feature encoding, based on a given training corpus ``train_toks``.

:type train_toks: list(tuple(dict, str))
:param train_toks: Training data, represented as a list of pairs, the first member of which is a feature dictionary, and the second of which is a classification label.
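The sketch below shows one way the five methods could fit together. It is a toy class written for this example, not the library's implementation: the name ``ToyBinaryEncoding`` and the choice of one binary joint-feature per (feature name, feature value, label) triple seen in training are assumptions.

```python
class ToyBinaryEncoding:
    """Hypothetical encoding: one binary joint-feature per
    (feature name, feature value, label) triple seen in training."""

    def __init__(self, labels, mapping):
        self._labels = list(labels)
        self._mapping = mapping  # (fname, fval, label) -> index

    @classmethod
    def train(cls, train_toks):
        labels = []
        mapping = {}
        for featureset, label in train_toks:
            if label not in labels:
                labels.append(label)
            for fname, fval in featureset.items():
                key = (fname, fval, label)
                if key not in mapping:
                    # Assign the next free index to each new joint-feature.
                    mapping[key] = len(mapping)
        return cls(labels, mapping)

    def encode(self, featureset, label):
        # Return only the non-zero joint-features as (index, value) tuples.
        encoding = []
        for fname, fval in featureset.items():
            index = self._mapping.get((fname, fval, label))
            if index is not None:
                encoding.append((index, 1))
        return encoding

    def labels(self):
        return self._labels

    def length(self):
        return len(self._mapping)

    def describe(self, fid):
        for (fname, fval, label), index in self._mapping.items():
            if index == fid:
                return f"{fname}=={fval!r} and label is {label!r}"
        raise ValueError("unknown joint-feature index")


# Made-up training corpus: a list of (feature dictionary, label) pairs.
train_toks = [
    ({"last_letter": "a"}, "female"),
    ({"last_letter": "k"}, "male"),
    ({"last_letter": "a"}, "male"),
]
encoding = ToyBinaryEncoding.train(train_toks)
print(encoding.length(), encoding.labels())
print(encoding.encode({"last_letter": "a"}, "male"))
print(encoding.describe(0))
```

In this toy version, a featureset whose values never occurred with the given label during training encodes to an empty list, which corresponds to an all-zero joint-feature vector.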