nltk.classify.maxent.BinaryMaxentFeatureEncoding

class documentation

class BinaryMaxentFeatureEncoding(MaxentFeatureEncodingI):

Known subclasses: nltk.classify.maxent.GISEncoding, nltk.classify.maxent.TadmEventMaxentFeatureEncoding

Constructor: BinaryMaxentFeatureEncoding(labels, mapping, unseen_features, alwayson_features)

A feature encoding that generates vectors containing binary joint-features of the form:

$$
\text{joint\_feat}(fs, l) =
\begin{cases}
1 & \text{if } fs[\text{fname}] = \text{fval} \text{ and } l = \text{label} \\
0 & \text{otherwise}
\end{cases}
$$

where ``fname`` is the name of an input-feature, ``fval`` is a value for that input-feature, and ``label`` is a label.

Typically, these features are constructed from a training corpus using the ``train()`` method. This method creates one feature for each combination of ``fname``, ``fval``, and ``label`` that occurs at least once in the training corpus.

The ``unseen_features`` parameter can be used to add "unseen-value features", which are used whenever an input feature has a value that was not encountered in the training corpus. These features have the form:

$$
\text{joint\_feat}(fs, l) =
\begin{cases}
1 & \text{if } \text{is\_unseen}(\text{fname}, fs[\text{fname}]) \text{ and } l = \text{label} \\
0 & \text{otherwise}
\end{cases}
$$

where ``is_unseen(fname, fval)`` is true if the encoding does not contain any joint-features that are true when ``fs[fname] == fval``.

The ``alwayson_features`` parameter can be used to add "always-on features", which have the form:

$$
\text{joint\_feat}(fs, l) =
\begin{cases}
1 & \text{if } l = \text{label} \\
0 & \text{otherwise}
\end{cases}
$$

These always-on features allow the maxent model to directly model the prior probability of each label.
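The three feature templates above translate directly into indicator functions. The sketch below is a minimal illustration, not NLTK's implementation: the function names `value_feature`, `unseen_feature`, `alwayson_feature` and the helper `is_unseen` (which here takes the encoding's `(fname, fval, label) -> fid` mapping as an explicit argument) are hypothetical.

```python
def value_feature(fs, l, fname, fval, label):
    # 1 iff input-feature `fname` has value `fval` and the candidate label is `label`.
    return 1 if fname in fs and fs[fname] == fval and l == label else 0


def is_unseen(mapping, fname, fval):
    # True iff no joint-feature in `mapping` fires for fs[fname] == fval,
    # whatever label it is paired with.
    return all((name, val) != (fname, fval) for name, val, _ in mapping)


def unseen_feature(fs, l, fname, label, mapping):
    # 1 iff fs[fname] never occurred with any label in training and l == label.
    if fname in fs and is_unseen(mapping, fname, fs[fname]) and l == label:
        return 1
    return 0


def alwayson_feature(fs, l, label):
    # Ignores the featureset entirely; it only lets the model learn a label prior.
    return 1 if l == label else 0


# Hypothetical encoding: (fname, fval, label) -> fid
mapping = {("outlook", "sunny", "no"): 0, ("outlook", "rain", "yes"): 1}
fs = {"outlook": "overcast"}
print(value_feature(fs, "yes", "outlook", "rain", "yes"))    # 0: value differs
print(unseen_feature(fs, "yes", "outlook", "yes", mapping))  # 1: "overcast" never seen
print(alwayson_feature(fs, "yes", "yes"))                     # 1
```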

Class Method	`train`	Construct and return a new feature encoding, based on a given training corpus ``train_toks``. See the class description ``BinaryMaxentFeatureEncoding`` for a description of the joint-features that will be included in this encoding.
Method	`__init__`	Construct a feature encoding from a list of "known labels" and a mapping from ``(fname,fval,label)`` tuples to joint-feature indexes.
Method	`describe`	Return a string describing the joint-feature whose index in the generated feature vectors is ``f_id``.
Method	`encode`	Given a (featureset, label) pair, return the corresponding vector of joint-feature values. This vector is represented as a list of ``(index, value)`` tuples, specifying the value of each non-zero joint-feature.
Method	`labels`	Return a list of the "known labels" -- i.e., all labels ``l`` such that ``self.encode(fs,l)`` can be a nonzero joint-feature vector for some value of ``fs``.
Method	`length`	Return the size of the fixed-length joint-feature vectors that are generated by this encoding.
Instance Variable	`_alwayson`	dict mapping from label -> fid
Instance Variable	`_inv_mapping`	Undocumented
Instance Variable	`_labels`	A list of attested labels.
Instance Variable	`_length`	The length of generated joint feature vectors.
Instance Variable	`_mapping`	dict mapping from (fname,fval,label) -> fid
Instance Variable	`_unseen`	dict mapping from fname -> fid

@classmethod
def train(cls, train_toks, count_cutoff=0, labels=None, **options):

overrides nltk.classify.maxent.MaxentFeatureEncodingI.train

overridden in nltk.classify.maxent.TadmEventMaxentFeatureEncoding

Construct and return a new feature encoding, based on a given training corpus ``train_toks``. See the class description ``BinaryMaxentFeatureEncoding`` for a description of the joint-features that will be included in this encoding.

:type train_toks: list(tuple(dict, str))
:param train_toks: Training data, represented as a list of pairs, the first member of which is a feature dictionary, and the second of which is a classification label.
:type count_cutoff: int
:param count_cutoff: A cutoff value that is used to discard rare joint-features. If a joint-feature takes the value 1 fewer than ``count_cutoff`` times in the training corpus, then that joint-feature is not included in the generated encoding.
:type labels: list
:param labels: A list of labels that should be used by the classifier. If not specified, then the set of labels attested in ``train_toks`` will be used.
:param options: Extra keyword arguments for the constructor, such as ``unseen_features`` and ``alwayson_features``.
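The count-cutoff rule can be sketched as follows. This is a minimal illustration that follows the docstring's description (counting how often each `(fname, fval, label)` triple fires); it is not NLTK's source, whose counting may differ in detail. The function `build_mapping` is hypothetical; it returns a `(labels, mapping)` pair of the kind the constructor accepts.

```python
from collections import Counter


def build_mapping(train_toks, count_cutoff=0, labels=None):
    """Return (labels, mapping) where mapping is (fname, fval, label) -> fid."""
    counts = Counter()
    first_seen = []   # triples in order of first occurrence, for stable fids
    seen_labels = []  # attested labels, in order of first occurrence
    for featureset, label in train_toks:
        if label not in seen_labels:
            seen_labels.append(label)
        for fname, fval in featureset.items():
            triple = (fname, fval, label)
            if triple not in counts:
                first_seen.append(triple)
            counts[triple] += 1
    # Keep only joint-features that fire at least `count_cutoff` times;
    # fids are assigned consecutively, so they cover 0 .. len(mapping) - 1.
    mapping = {}
    for triple in first_seen:
        if counts[triple] >= count_cutoff:
            mapping[triple] = len(mapping)
    return (list(labels) if labels is not None else seen_labels), mapping


train_toks = [
    ({"outlook": "sunny", "windy": True}, "no"),
    ({"outlook": "sunny", "windy": False}, "no"),
    ({"outlook": "rain", "windy": True}, "yes"),
]
print(build_mapping(train_toks, count_cutoff=2))
# (['no', 'yes'], {('outlook', 'sunny', 'no'): 0})
```

With the default `count_cutoff=0`, every attested triple gets an index, matching the "at least once" rule in the class description.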

def __init__(self, labels, mapping, unseen_features=False, alwayson_features=False):

overridden in nltk.classify.maxent.GISEncoding, nltk.classify.maxent.TadmEventMaxentFeatureEncoding

:param labels: A list of the "known labels" for this encoding.
:param mapping: A dictionary mapping from ``(fname,fval,label)`` tuples to corresponding joint-feature indexes. These indexes must be exactly the set of integers from 0 to ``len(mapping) - 1``. If ``mapping[fname,fval,label] = id``, then the joint-feature with index ``id`` has the value 1 in ``self.encode({..., fname: fval, ...}, label)``; otherwise, it is 0.
:param unseen_features: If true, then include unseen-value features in the generated joint-feature vectors.
:param alwayson_features: If true, then include always-on features in the generated joint-feature vectors.
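One way to picture how the constructor's arguments fix the layout of the joint-feature vectors is the sketch below. It checks the 0 to `len(mapping) - 1` requirement on `mapping`, then appends one index per label for always-on features and one index per input-feature name for unseen-value features. The function name `assign_extra_fids` and the ordering of the extra indexes (always-on first, then unseen-value) are assumptions of this sketch; the returned `length` plays the role of the value reported by `length()`.

```python
def assign_extra_fids(labels, mapping, unseen_features=False, alwayson_features=False):
    """Return (alwayson, unseen, length) for a hypothetical encoding layout."""
    # The mapping's ids must be exactly 0 .. len(mapping) - 1.
    if set(mapping.values()) != set(range(len(mapping))):
        raise ValueError("mapping values must be exactly 0 .. len(mapping) - 1")
    length = len(mapping)
    alwayson = unseen = None
    if alwayson_features:
        # One always-on feature per label: label -> fid.
        alwayson = {label: length + i for i, label in enumerate(labels)}
        length += len(alwayson)
    if unseen_features:
        # One unseen-value feature per input-feature name: fname -> fid.
        fnames = dict.fromkeys(fname for fname, _, _ in mapping)
        unseen = {fname: length + i for i, fname in enumerate(fnames)}
        length += len(unseen)
    return alwayson, unseen, length


mapping = {
    ("outlook", "sunny", "no"): 0,
    ("outlook", "rain", "yes"): 1,
    ("windy", True, "no"): 2,
}
print(assign_extra_fids(["no", "yes"], mapping, unseen_features=True, alwayson_features=True))
# ({'no': 3, 'yes': 4}, {'outlook': 5, 'windy': 6}, 7)
```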

def describe(self, f_id):

overrides nltk.classify.maxent.MaxentFeatureEncodingI.describe

overridden in nltk.classify.maxent.GISEncoding, nltk.classify.maxent.TadmEventMaxentFeatureEncoding

:return: A string describing the value of the joint-feature whose index in the generated feature vectors is ``f_id``.
:rtype: str
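A description can be recovered from a feature index by inverting the `(fname, fval, label) -> fid` mapping and then checking the always-on and unseen-value index tables. The sketch below assumes those tables are passed in explicitly; the function signature and the exact wording of the returned strings are hypothetical and are not guaranteed to match NLTK's output.

```python
def describe(f_id, mapping, alwayson=None, unseen=None):
    if not isinstance(f_id, int):
        raise TypeError("describe() expects an int feature id")
    # Invert (fname, fval, label) -> fid into a list indexed by fid.
    inv = [None] * len(mapping)
    for info, i in mapping.items():
        inv[i] = info
    if 0 <= f_id < len(mapping):
        fname, fval, label = inv[f_id]
        return f"{fname}=={fval!r} and label is {label!r}"
    # Ids past the mapping belong to always-on or unseen-value features.
    for label, fid in (alwayson or {}).items():
        if fid == f_id:
            return f"label is {label!r}"
    for fname, fid in (unseen or {}).items():
        if fid == f_id:
            return f"{fname} is unseen"
    raise ValueError(f"bad feature id: {f_id}")


mapping = {("outlook", "sunny", "no"): 0, ("windy", True, "no"): 1}
print(describe(0, mapping))                          # outlook=='sunny' and label is 'no'
print(describe(2, mapping, alwayson={"no": 2}))      # label is 'no'
print(describe(3, mapping, {"no": 2}, {"outlook": 3}))  # outlook is unseen
```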

def encode(self, featureset, label):

overrides nltk.classify.maxent.MaxentFeatureEncodingI.encode

overridden in nltk.classify.maxent.GISEncoding, nltk.classify.maxent.TadmEventMaxentFeatureEncoding

Given a (featureset, label) pair, return the corresponding vector of joint-feature values. This vector is represented as a list of ``(index, value)`` tuples, specifying the value of each non-zero joint-feature.

:type featureset: dict
:rtype: list(tuple(int, int))
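The sparse ``(index, value)`` representation can be sketched as below, applying the three feature templates from the class description. This is a minimal illustration, not NLTK's code: the function `encode` is hypothetical and takes the label list and the index tables (standing in for `_labels`, `_mapping`, `_alwayson`, and `_unseen`) as explicit arguments.

```python
def encode(featureset, label, labels, mapping, alwayson=None, unseen=None):
    """Return the nonzero joint-features for (featureset, label) as (fid, 1) pairs."""
    encoding = []
    for fname, fval in featureset.items():
        if (fname, fval, label) in mapping:
            # A joint-feature attested in training fires.
            encoding.append((mapping[fname, fval, label], 1))
        elif unseen and fname in unseen and not any(
            (fname, fval, other) in mapping for other in labels
        ):
            # fval was never paired with any label for fname: fire its unseen-value feature.
            encoding.append((unseen[fname], 1))
    if alwayson and label in alwayson:
        # The always-on feature for this label fires regardless of the featureset.
        encoding.append((alwayson[label], 1))
    return encoding


labels = ["no", "yes"]
mapping = {("outlook", "sunny", "no"): 0, ("outlook", "rain", "yes"): 1, ("windy", True, "no"): 2}
alwayson = {"no": 3, "yes": 4}
unseen = {"outlook": 5, "windy": 6}
print(encode({"outlook": "sunny", "windy": True}, "no", labels, mapping, alwayson, unseen))
# [(0, 1), (2, 1), (3, 1)]
print(encode({"outlook": "overcast"}, "yes", labels, mapping, alwayson, unseen))
# [(5, 1), (4, 1)]
```

Because the vector is sparse, joint-features whose value is 0 are simply absent from the returned list.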

def labels(self):

overrides nltk.classify.maxent.MaxentFeatureEncodingI.labels

overridden in nltk.classify.maxent.TadmEventMaxentFeatureEncoding

:return: A list of the "known labels" -- i.e., all labels ``l`` such that ``self.encode(fs,l)`` can be a nonzero joint-feature vector for some value of ``fs``.
:rtype: list

def length(self):

overrides nltk.classify.maxent.MaxentFeatureEncodingI.length

overridden in nltk.classify.maxent.GISEncoding, nltk.classify.maxent.TadmEventMaxentFeatureEncoding

:return: The size of the fixed-length joint-feature vectors that are generated by this encoding.
:rtype: int

_alwayson

dict mapping from label -> fid

_inv_mapping

Undocumented

_labels

A list of attested labels.

_length

The length of generated joint feature vectors.

_mapping

overridden in nltk.classify.maxent.TadmEventMaxentFeatureEncoding

dict mapping from (fname,fval,label) -> fid

_unseen

dict mapping from fname -> fid