class documentation

A binary feature encoding which adds one new joint-feature to the joint-features defined by ``BinaryMaxentFeatureEncoding``: a correction feature, whose value is chosen to ensure that the sparse vector always sums to a constant non-negative number. This new feature is used to ensure two preconditions for the GIS training algorithm:

  • At least one entry of the feature vector must be nonzero for every token.
  • The feature vector must sum to a constant non-negative number for every token.
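
The two preconditions above can be checked mechanically. The following is a minimal sketch and is not part of NLTK: ``satisfies_gis_preconditions`` is a hypothetical helper, and the sample vectors are made up. It assumes only the ``(index, value)`` sparse-vector form described on this page.

```python
def satisfies_gis_preconditions(vectors):
    """Check the two GIS preconditions on a list of sparse vectors.

    Each vector is a list of (index, value) tuples. This is a hypothetical
    helper for illustration, not an NLTK function.
    """
    totals = set()
    for vector in vectors:
        # Precondition 1: at least one entry must be nonzero.
        if not any(value != 0 for _, value in vector):
            return False
        totals.add(sum(value for _, value in vector))
    # Precondition 2: every vector sums to the same non-negative constant.
    return len(totals) <= 1 and all(total >= 0 for total in totals)


# Without a correction feature the sums differ (2 vs. 1).
raw = [[(0, 1), (3, 1)], [(1, 1)]]
# With a correction feature at index 4 and C = 3, both vectors sum to 3.
corrected = [[(0, 1), (3, 1), (4, 1)], [(1, 1), (4, 2)]]

print(satisfies_gis_preconditions(raw))        # False
print(satisfies_gis_preconditions(corrected))  # True
```

The second call succeeds only because the extra entry at index 4 absorbs the difference between each vector's sum and the constant.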
Method __init__ :param C: The correction constant. The value of the correction feature is based on this value. In particular, its value is ``C - sum([v for (f,v) in encoding])``. :seealso: ``BinaryMaxentFeatureEncoding...
Method describe :return: A string describing the value of the joint-feature whose index in the generated feature vectors is ``f_id``. :rtype: str
Method encode Given a (featureset, label) pair, return the corresponding vector of joint-feature values. This vector is represented as a list of ``(index, value)`` tuples, specifying the value of each non-zero joint-feature.
Method length :return: The size of the fixed-length joint-feature vectors that are generated by this encoding. :rtype: int
Property C The non-negative constant that all encoded feature vectors will sum to.
Instance Variable _C Undocumented

Inherited from BinaryMaxentFeatureEncoding:

Class Method train Construct and return a new feature encoding, based on a given training corpus ``train_toks``. See the class description of ``BinaryMaxentFeatureEncoding`` for the joint-features that will be included in this encoding.
Method labels :return: A list of the "known labels" -- i.e., all labels ``l`` such that ``self.encode(fs,l)`` can be a nonzero joint-feature vector for some value of ``fs``. :rtype: list
Instance Variable _alwayson dict mapping from label -> fid
Instance Variable _inv_mapping Undocumented
Instance Variable _labels A list of attested labels.
Instance Variable _length The length of generated joint-feature vectors.
Instance Variable _mapping dict mapping from (fname,fval,label) -> fid
Instance Variable _unseen dict mapping from fname -> fid
def __init__(self, labels, mapping, unseen_features=False, alwayson_features=False, C=None):

:param C: The correction constant. The correction feature takes the value ``C - sum([v for (f,v) in encoding])``, so that every encoded vector sums to ``C``.
:seealso: ``BinaryMaxentFeatureEncoding.__init__``
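
As a worked instance of the formula ``C - sum([v for (f,v) in encoding])``, here is a small sketch. ``add_correction_feature`` and its ``correction_index`` argument are hypothetical names used for illustration, not the library's API, and the rejection of totals that reach ``C`` is an assumption of this sketch.

```python
def add_correction_feature(encoding, C, correction_index):
    """Append a correction feature so that the values sum to C.

    Hypothetical helper for illustration; `correction_index` is the index
    reserved for the correction feature.
    """
    total = sum(v for (f, v) in encoding)
    # Assumption of this sketch: reject totals that would leave the
    # correction feature at zero or below.
    if total >= C:
        raise ValueError("C is too small for this encoding")
    return encoding + [(correction_index, C - total)]


base = [(0, 1), (3, 1)]             # two active binary joint-features
result = add_correction_feature(base, C=5, correction_index=4)
print(result)                        # [(0, 1), (3, 1), (4, 3)]
print(sum(v for (f, v) in result))   # 5
```

With two active binary features and ``C=5``, the correction feature carries the remaining 3, so the vector sums to 5.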

def describe(self, f_id):

:return: A string describing the value of the joint-feature whose index in the generated feature vectors is ``f_id``.
:rtype: str

def encode(self, featureset, label):

Given a (featureset, label) pair, return the corresponding vector of joint-feature values. This vector is represented as a list of ``(index, value)`` tuples, specifying the value of each non-zero joint-feature.

:type featureset: dict
:rtype: list(tuple(int, int))
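
To make the sparse representation concrete, the sketch below expands a list of ``(index, value)`` tuples into a fixed-length dense vector. ``to_dense`` is a hypothetical helper, not an NLTK function, and the sample data is made up; the ``length`` argument stands in for the value that ``length()`` would return.

```python
def to_dense(sparse, length):
    """Expand a sparse (index, value) list into a dense list of `length` values.

    Hypothetical helper for illustration only.
    """
    dense = [0] * length          # indices not listed are implicitly zero
    for index, value in sparse:
        dense[index] = value
    return dense


sparse = [(0, 1), (3, 1), (4, 3)]
print(to_dense(sparse, 5))  # [1, 0, 0, 1, 3]
```

Only non-zero joint-features appear in the sparse form, which keeps vectors short when most features are inactive for a given label.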

def length(self):

:return: The size of the fixed-length joint-feature vectors that are generated by this encoding.
:rtype: int

@property
def C(self):

The non-negative constant that all encoded feature vectors will sum to.

_C

Undocumented