class documentation

An abstract base class for Features. A Feature is a combination of a specific property-computing method and a list of relative positions to apply that method to.

The property-computing method, extract_property(tokens, index), must be implemented by every subclass. It extracts or computes a specific property for the token at the current index. Typical extract_property() methods return features such as the token text or tag, but more involved methods may consider the entire sequence tokens and, for instance, compute the length of the sentence the token belongs to.

In addition, a subclass may define PROPERTY_NAME, the name under which it is printed (in Rules, Templates, etc.). If not given, it defaults to the class name.
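To make the pairing of a property-computing method with a list of relative positions concrete, here is a minimal standalone sketch. It does not use NLTK. The names word_of and property_values are hypothetical, tokens are assumed to be (word, tag) pairs, and skipping positions that fall outside the sequence is an assumption about how a consumer of a Feature might behave.

```python
def word_of(tokens, index):
    """Hypothetical property extractor: the text of a (word, tag) token."""
    return tokens[index][0]


def property_values(extract, positions, tokens, index):
    """Apply `extract` at each relative position around `index`.

    Positions that fall outside the token sequence are skipped
    (an assumption of this sketch).
    """
    values = []
    for offset in positions:
        target = index + offset
        if 0 <= target < len(tokens):
            values.append(extract(tokens, target))
    return values


tokens = [("The", "DT"), ("cat", "NN"), ("sat", "VBD")]
print(property_values(word_of, [-2, -1], tokens, 2))  # ['The', 'cat']
print(property_values(word_of, [-2, -1], tokens, 1))  # ['The']
```

The same positions list yields fewer values near the start of the sequence, because position -2 relative to index 1 lies before the first token.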

Class Method       decode_json_obj   Undocumented
Class Method       expand            Return a list of features, one for each start point in starts and for each window length in winlens. If excludezero is True, no Features containing 0 in their positions will be generated (many tbl trainers have a special representation for the target feature at [0])...
Static Method      extract_property  Any subclass of Feature must define static method extract_property(tokens, index)
Method             __eq__            Undocumented
Method             __ge__            Undocumented
Method             __gt__            Undocumented
Method             __init__          Construct a Feature which may apply at positions.
Method             __le__            Undocumented
Method             __lt__            Undocumented
Method             __ne__            Undocumented
Method             __repr__          Undocumented
Method             encode_json_obj   Undocumented
Method             intersects        Return True if the positions of this Feature intersect with those of other
Method             issuperset        Return True if this Feature always returns True when other does
Class Variable     json_tag          Undocumented
Instance Variable  positions         Undocumented
Instance Variable  PROPERTY_NAME     Undocumented
@classmethod
def decode_json_obj(cls, obj):

Undocumented

@classmethod
def expand(cls, starts, winlens, excludezero=False):

Return a list of features, one for each start point in starts and for each window length in winlens. If excludezero is True, no Features containing 0 in their positions will be generated (many tbl trainers have a special representation for the target feature at [0]).

For instance, importing a concrete subclass (Feature is abstract):

>>> from nltk.tag.brill import Word

The first argument gives the possible start positions, the second the possible window lengths:

>>> Word.expand([-3,-2,-1], [1])
[Word([-3]), Word([-2]), Word([-1])]

>>> Word.expand([-2,-1], [1])
[Word([-2]), Word([-1])]
>>> Word.expand([-3,-2,-1], [1,2])
[Word([-3]), Word([-2]), Word([-1]), Word([-3, -2]), Word([-2, -1])]

A third, optional argument excludes all Features whose positions contain zero:

>>> Word.expand([-2,-1,0], [1,2], excludezero=False)
[Word([-2]), Word([-1]), Word([0]), Word([-2, -1]), Word([-1, 0])]

>>> Word.expand([-2,-1,0], [1,2], excludezero=True)
[Word([-2]), Word([-1]), Word([-2, -1])]

All window lengths must be positive:

>>> Word.expand([-2,-1], [0])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "nltk/tag/tbl/template.py", line 371, in expand
ValueError: non-positive window length in [0]

Parameters
starts (list of ints): where to start looking for Feature
winlens: window lengths where to look for Feature
excludezero (bool): do not output any Feature with 0 in any of its positions
Returns
list of Features
Raises
ValueError: for non-positive window lengths
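The windowing behaviour documented for expand can be sketched without NLTK as a function over plain lists. This is an illustration consistent with the documented examples, not the library's source; expand_positions is a hypothetical name, and it returns position lists rather than Feature objects.

```python
def expand_positions(starts, winlens, excludezero=False):
    """Return every contiguous window over `starts` for each length in `winlens`."""
    if not all(w > 0 for w in winlens):
        raise ValueError(f"non-positive window length in {winlens}")
    # Slide a window of each requested length across the start positions.
    windows = [
        starts[i:i + w]
        for w in winlens
        for i in range(len(starts) - w + 1)
    ]
    if excludezero:
        # Drop any window that looks at the target position 0.
        windows = [w for w in windows if 0 not in w]
    return windows


print(expand_positions([-3, -2, -1], [1, 2]))
# [[-3], [-2], [-1], [-3, -2], [-2, -1]]
print(expand_positions([-2, -1, 0], [1, 2], excludezero=True))
# [[-2], [-1], [-2, -1]]
```

Windows are grouped by length, in the order the lengths are given, which matches the ordering in the documented output.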
@staticmethod
@abstractmethod
def extract_property(tokens, index):

Any subclass of Feature must define static method extract_property(tokens, index)

Parameters
tokens (list of tokens): the sequence of tokens
index (int): the current index
Returns
any (but usually scalar): feature value
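The contract of an abstract static extract_property can be illustrated with a small stand-in hierarchy. PropertyExtractor, WordText and SequenceLength are hypothetical names for this sketch and are not NLTK classes; tokens are assumed to be (word, tag) pairs.

```python
from abc import ABC, abstractmethod


class PropertyExtractor(ABC):
    """Stand-in for an abstract base; not NLTK's Feature class."""

    @staticmethod
    @abstractmethod
    def extract_property(tokens, index):
        """Return the property of the token at `index`."""


class WordText(PropertyExtractor):
    @staticmethod
    def extract_property(tokens, index):
        # Simple case: a property of the single token at `index`.
        return tokens[index][0]


class SequenceLength(PropertyExtractor):
    @staticmethod
    def extract_property(tokens, index):
        # More involved case: looks at the whole sequence, ignoring `index`.
        return len(tokens)


tokens = [("The", "DT"), ("cat", "NN"), ("sat", "VBD")]
print(WordText.extract_property(tokens, 1))        # cat
print(SequenceLength.extract_property(tokens, 1))  # 3
```

Because the method is static, it can be called on the class itself. Instantiating the abstract stand-in directly raises TypeError.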
def __eq__(self, other):

Undocumented

def __ge__(self, other):

Undocumented

def __gt__(self, other):

Undocumented

def __init__(self, positions, end=None):

Construct a Feature which may apply at positions.

For instance, importing some concrete subclasses (Feature is abstract):

>>> from nltk.tag.brill import Word, Pos

Feature Word, applying at one of [-2, -1]:

>>> Word([-2,-1])
Word([-2, -1])

Positions need not be contiguous:

>>> Word([-2,-1, 1])
Word([-2, -1, 1])

Contiguous ranges can alternatively be specified by giving the two endpoints (inclusive):

>>> Pos(-3, -1)
Pos([-3, -2, -1])

In the two-argument form, start <= end is enforced:

>>> Pos(2, 1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "nltk/tbl/template.py", line 306, in __init__
    raise TypeError
ValueError: illegal interval specification: (start=2, end=1)

An alternative calling convention, for contiguous positions only, is Feature(start, end):

Parameters
positions (list of int): the positions at which this feature should apply
end (int): end of range (NOTE: inclusive!) where this feature should apply
start (int): start of range where this feature should apply
Raises
ValueError: illegal position specifications
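The two calling conventions can be sketched as a single normalising function. normalize_positions is a hypothetical helper, not part of NLTK. Sorting and de-duplicating the list form is an assumption of this sketch; the inclusive end and the ValueError for start > end follow the documentation above.

```python
def normalize_positions(positions, end=None):
    """Return a tuple of positions from either calling convention."""
    if end is None:
        # List form: positions is an iterable of ints.
        # Sorting and de-duplicating is an assumption of this sketch.
        return tuple(sorted({int(p) for p in positions}))
    # Two-argument form: positions is really the start of an inclusive range.
    start = positions
    if start > end:
        raise ValueError(
            f"illegal interval specification: (start={start}, end={end})"
        )
    return tuple(range(start, end + 1))


print(normalize_positions([-2, -1, 1]))  # (-2, -1, 1)
print(normalize_positions(-3, -1))       # (-3, -2, -1)
```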
def __le__(self, other):

Undocumented

def __lt__(self, other):

Undocumented

def __ne__(self, other):

Undocumented

def __repr__(self):

Undocumented

def encode_json_obj(self):

Undocumented

def intersects(self, other):

Return True if the positions of this Feature intersect with those of other.

More precisely, return True if this feature refers to the same property as other and there is some overlap in the positions they look at.

For instance, importing concrete subclasses (Feature is abstract):

>>> from nltk.tag.brill import Word, Pos

>>> Word([-3,-2,-1]).intersects(Word([-3,-2]))
True
>>> Word([-3,-2,-1]).intersects(Word([-3,-2, 0]))
True
>>> Word([-3,-2,-1]).intersects(Word([0]))
False

Feature subclasses must agree:

>>> Word([-3,-2,-1]).intersects(Pos([-3,-2]))
False

Parameters
other (Feature or subclass): feature with which to compare
Returns
bool: True if feature classes agree and there is some overlap in the positions they look at
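The documented rule (same property, overlapping positions) can be sketched with plain sets. This standalone function is a hypothetical illustration and not the method's source; each feature is represented here as a kind name plus a list of positions.

```python
def intersects(kind_a, positions_a, kind_b, positions_b):
    """True if both features are of the same kind and share a position."""
    return kind_a == kind_b and bool(set(positions_a) & set(positions_b))


print(intersects("Word", [-3, -2, -1], "Word", [-3, -2, 0]))  # True
print(intersects("Word", [-3, -2, -1], "Word", [0]))          # False
print(intersects("Word", [-3, -2, -1], "Pos", [-3, -2]))      # False
```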
def issuperset(self, other):

Return True if this Feature always returns True when other does.

More precisely, return True if this feature refers to the same property as other and this Feature looks at all positions that other does (and possibly other positions in addition).

For instance, importing concrete subclasses (Feature is abstract):

>>> from nltk.tag.brill import Word, Pos

>>> Word([-3,-2,-1]).issuperset(Word([-3,-2]))
True
>>> Word([-3,-2,-1]).issuperset(Word([-3,-2, 0]))
False

Feature subclasses must agree:

>>> Word([-3,-2,-1]).issuperset(Pos([-3,-2]))
False

Parameters
other (Feature or subclass): feature with which to compare
Returns
bool: True if this feature is superset, otherwise False
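The superset rule can be sketched the same way, again as a hypothetical standalone function over a kind name and a list of positions rather than the method's actual source.

```python
def issuperset(kind_a, positions_a, kind_b, positions_b):
    """True if both features are of the same kind and a covers all of b's positions."""
    return kind_a == kind_b and set(positions_a) >= set(positions_b)


print(issuperset("Word", [-3, -2, -1], "Word", [-3, -2]))     # True
print(issuperset("Word", [-3, -2, -1], "Word", [-3, -2, 0]))  # False
print(issuperset("Word", [-3, -2, -1], "Pos", [-3, -2]))      # False
```

A feature is a superset of itself under this definition, since the comparison is not strict.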
json_tag: str

Undocumented

positions

Undocumented

PROPERTY_NAME

Undocumented