class GISEncoding(BinaryMaxentFeatureEncoding):
Constructor: GISEncoding(labels, mapping, unseen_features, alwayson_features, C)
A binary feature encoding that adds one new joint-feature to the joint-features defined by ``BinaryMaxentFeatureEncoding``: a correction feature, whose value is chosen so that the sparse vector always sums to a constant non-negative number. This new feature ensures two preconditions for the Generalized Iterative Scaling (GIS) training algorithm:
- At least one feature vector index must be nonzero for every token.
- The feature vector must sum to a constant non-negative number for every token.
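Both preconditions can be checked mechanically on a batch of encoded vectors. The sketch below is a hypothetical helper (``satisfies_gis_preconditions`` is not an NLTK function); it assumes each vector is the sparse list of ``(index, value)`` pairs that ``encode`` returns.

```python
from typing import List, Tuple

# Sparse joint-feature vector: (index, value) pairs for the non-zero entries.
SparseVector = List[Tuple[int, int]]


def satisfies_gis_preconditions(vectors: List[SparseVector], C: int) -> bool:
    """Hypothetical check of the two GIS preconditions (not part of NLTK)."""
    if C < 0:
        # The common sum must be a non-negative constant.
        return False
    for vec in vectors:
        # Precondition 1: at least one index carries a non-zero value.
        if not any(value != 0 for _index, value in vec):
            return False
        # Precondition 2: every vector sums to the same constant C.
        if sum(value for _index, value in vec) != C:
            return False
    return True


# Both vectors sum to 3 and contain a non-zero entry, so this returns True.
ok = satisfies_gis_preconditions([[(0, 1), (3, 1), (7, 1)], [(2, 2), (7, 1)]], 3)
```

An empty vector fails the first check even when ``C`` is 0, which is exactly the case the correction feature is meant to rule out.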
Method | __init__ | :param C: The correction constant. The value of the correction feature is based on this value. In particular, its value is ``C - sum([v for (f,v) in encoding])``. :seealso: ``BinaryMaxentFeatureEncoding...
Method | describe | :return: A string describing the value of the joint-feature whose index in the generated feature vectors is ``fid``. :rtype: str
Method | encode | Given a (featureset, label) pair, return the corresponding vector of joint-feature values. This vector is represented as a list of ``(index, value)`` tuples, specifying the value of each non-zero joint-feature.
Method | length | :return: The size of the fixed-length joint-feature vectors that are generated by this encoding. :rtype: int
Property | C | The non-negative constant that all encoded feature vectors will sum to.
Instance Variable | _C | Undocumented
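``length`` fixes the size of every joint-feature vector, while ``encode`` lists only the non-zero entries. The hypothetical helper ``to_dense`` (not part of NLTK) shows how the two representations relate; in the example, index 4 stands in for the correction feature, on the assumption that it occupies the last slot of the vector.

```python
from typing import List, Tuple


def to_dense(sparse: List[Tuple[int, int]], length: int) -> List[int]:
    """Expand a sparse (index, value) encoding into a fixed-length list.

    ``length`` plays the role of the encoding's length(); this helper is
    hypothetical and only illustrates the representation.
    """
    dense = [0] * length
    for index, value in sparse:
        if not 0 <= index < length:
            raise IndexError(f"feature index {index} outside 0..{length - 1}")
        dense[index] = value
    return dense


# Two base features at 0 and 4; the dense form sums to C = 3.
dense = to_dense([(0, 1), (4, 2)], 5)  # [1, 0, 0, 0, 2]
```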
Inherited from BinaryMaxentFeatureEncoding:
Class Method | train | Construct and return a new feature encoding, based on a given training corpus ``train_toks``. See the class description ``BinaryMaxentFeatureEncoding`` for a description of the joint-features that will be included in this encoding.
Method | labels | :return: A list of the "known labels" -- i.e., all labels ``l`` such that ``self.encode(fs,l)`` can be a nonzero joint-feature vector for some value of ``fs``. :rtype: list
Instance Variable | _alwayson | dict mapping from label -> fid
Instance Variable | _inv | Undocumented
Instance Variable | _labels | A list of attested labels.
Instance Variable | _length | The length of generated joint feature vectors.
Instance Variable | _mapping | dict mapping from (fname,fval,label) -> fid
Instance Variable | _unseen | dict mapping from fname -> fid
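The inherited ``_mapping`` and ``_alwayson`` dictionaries are enough to picture how the base binary encoding builds its sparse vector. The sketch below is a simplified stand-in, not ``BinaryMaxentFeatureEncoding.encode`` itself: it omits the unseen-value features held in ``_unseen``, and the function name and example feature names are hypothetical.

```python
from typing import Dict, Hashable, List, Tuple

Feature = Tuple[str, Hashable, Hashable]  # (fname, fval, label)


def binary_encode(
    featureset: Dict[str, Hashable],
    label: Hashable,
    mapping: Dict[Feature, int],
    alwayson: Dict[Hashable, int],
) -> List[Tuple[int, int]]:
    """Simplified binary joint-feature encoding (unseen-value features omitted)."""
    encoding: List[Tuple[int, int]] = []
    for fname, fval in featureset.items():
        fid = mapping.get((fname, fval, label))
        if fid is not None:
            # Binary feature: it either fires with value 1 or is absent.
            encoding.append((fid, 1))
    if label in alwayson:
        # The label's always-on feature fires for every token with that label.
        encoding.append((alwayson[label], 1))
    return encoding


# Hypothetical example data in the shape of _mapping and _alwayson.
MAPPING = {("suffix", "ing", "VERB"): 0, ("suffix", "ing", "NOUN"): 1, ("cap", True, "NOUN"): 2}
ALWAYSON = {"VERB": 3, "NOUN": 4}

verb_vec = binary_encode({"suffix": "ing", "cap": False}, "VERB", MAPPING, ALWAYSON)
# [(0, 1), (3, 1)]
```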
__init__
:param C: The correction constant. The value of the correction feature is based on this value. In particular, its value is ``C - sum([v for (f,v) in encoding])``.
:seealso: ``BinaryMaxentFeatureEncoding.__init__``
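For every correction value ``C - sum([v for (f,v) in encoding])`` to be non-negative, ``C`` must be at least the largest base-feature total of any token. Choosing it one larger keeps the correction strictly positive, which also satisfies the at-least-one-nonzero precondition for a token with no active base features. The helpers below are hypothetical and only illustrate this arithmetic; they do not reproduce how NLTK picks a default ``C``.

```python
from typing import List, Sequence, Tuple

SparseVector = List[Tuple[int, int]]


def smallest_safe_C(base_encodings: Sequence[SparseVector]) -> int:
    """Hypothetical helper: smallest C that leaves every correction value >= 1."""
    largest_total = max((sum(v for (_f, v) in enc) for enc in base_encodings), default=0)
    return largest_total + 1


def correction_values(base_encodings: Sequence[SparseVector], C: int) -> List[int]:
    """Apply C - sum([v for (f, v) in encoding]) to each base encoding."""
    return [C - sum(v for (_f, v) in enc) for enc in base_encodings]


encodings = [[(0, 1), (3, 1)], [(1, 1)], []]
C = smallest_safe_C(encodings)                 # 3
corrections = correction_values(encodings, C)  # [1, 2, 3]
```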
describe
:return: A string describing the value of the joint-feature whose index in the generated feature vectors is ``fid``.
:rtype: str
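Because the correction feature is the only joint-feature this class adds, ``describe`` needs special handling only for its index and can defer to the base encoding for everything else. A minimal sketch, assuming the correction feature sits at index ``base_length`` (the length of the base encoding); the function name and the description string are hypothetical.

```python
from typing import Callable


def describe_with_correction(
    fid: int, base_length: int, C: int, base_describe: Callable[[int], str]
) -> str:
    """Describe joint-feature ``fid``, treating index ``base_length`` as the correction feature."""
    if fid == base_length:
        # Hypothetical wording; the real description format may differ.
        return f"Correction feature (C={C})"
    # Every other index belongs to the base binary encoding.
    return base_describe(fid)


label = describe_with_correction(5, 5, 3, lambda i: f"feature {i}")
# 'Correction feature (C=3)'
```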
encode
Given a (featureset, label) pair, return the corresponding vector of joint-feature values. This vector is represented as a list of ``(index, value)`` tuples, specifying the value of each non-zero joint-feature.
:type featureset: dict
:rtype: list(tuple(int, int))
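Encoding a token amounts to taking the base binary encoding and appending one ``(index, value)`` pair for the correction feature. The sketch below assumes the correction feature's index equals the base encoding's length, and it rejects any ``C`` that would leave the correction at zero or below; ``gis_encode`` and ``toy_base_encode`` are hypothetical names, not NLTK APIs.

```python
from typing import Callable, Dict, Hashable, List, Tuple

SparseVector = List[Tuple[int, int]]
BaseEncoder = Callable[[Dict[str, Hashable], Hashable], SparseVector]


def gis_encode(
    featureset: Dict[str, Hashable],
    label: Hashable,
    base_encode: BaseEncoder,
    base_length: int,
    C: int,
) -> SparseVector:
    """Base encoding plus a correction feature at index ``base_length``."""
    encoding = list(base_encode(featureset, label))
    total = sum(value for _fid, value in encoding)
    if total >= C:
        # This sketch requires a strictly positive correction, so the appended
        # entry is always non-zero and the vector can never be all zeros.
        raise ValueError(f"C={C} is too small: base features already sum to {total}")
    # The correction value C - total makes every vector sum to exactly C.
    encoding.append((base_length, C - total))
    return encoding


# Hypothetical toy base encoder with two binary joint-features (indices 0 and 1).
TOY_MAPPING = {("suffix", "ing", "VERB"): 0, ("cap", True, "NOUN"): 1}


def toy_base_encode(featureset: Dict[str, Hashable], label: Hashable) -> SparseVector:
    return [
        (TOY_MAPPING[(fname, fval, label)], 1)
        for fname, fval in featureset.items()
        if (fname, fval, label) in TOY_MAPPING
    ]


vec = gis_encode({"suffix": "ing", "cap": True}, "VERB", toy_base_encode, base_length=2, C=3)
# [(0, 1), (2, 2)]
```

A token with no active base features still gets a single non-zero entry, ``(2, 3)``, so both GIS preconditions hold.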