class documentation

class RTEFeatureExtractor(object):

Constructor: RTEFeatureExtractor(rtepair, stop, use_lemmatize)

This builds a bag of words for both the text and the hypothesis after throwing away some stopwords, then calculates overlap and difference.
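
The idea can be sketched with plain Python sets. This is a minimal illustration, not the class's actual implementation: the whitespace tokenization, the small `STOPWORDS` set, and the `bag_of_words` helper are all assumptions made for the example.

```python
# Hypothetical stopword list, for illustration only.
STOPWORDS = {"a", "the", "of", "in", "to", "is", "and"}


def bag_of_words(sentence, stop=True):
    """Return the set of word types in `sentence`, optionally minus stopwords."""
    words = set(sentence.split())  # naive whitespace tokenization (assumption)
    return words - STOPWORDS if stop else words


text_words = bag_of_words("John bought a car in Paris")
hyp_words = bag_of_words("John owns a car")

overlap = hyp_words & text_words    # words shared by text and hypothesis
hyp_extra = hyp_words - text_words  # words found only in the hypothesis
txt_extra = text_words - hyp_words  # words found only in the text

print(sorted(overlap))    # ['John', 'car']
print(sorted(hyp_extra))  # ['owns']
print(sorted(txt_extra))  # ['Paris', 'bought']
```

The three sets correspond in spirit to the `_overlap`, `_hyp_extra`, and `_txt_extra` instance variables listed below.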

Method __init__ No summary
Method hyp_extra Compute the extraneous material in the hypothesis.
Method overlap Compute the overlap between text and hypothesis.
Instance Variable hyp_tokens Undocumented
Instance Variable hyp_words Undocumented
Instance Variable negwords Undocumented
Instance Variable stop Undocumented
Instance Variable stopwords Undocumented
Instance Variable text_tokens Undocumented
Instance Variable text_words Undocumented
Static Method _lemmatize Use morphy from WordNet to find the base form of verbs.
Static Method _ne This just assumes that words in all caps or titles are named entities.
Instance Variable _hyp_extra Undocumented
Instance Variable _overlap Undocumented
Instance Variable _txt_extra Undocumented
def __init__(self, rtepair, stop=True, use_lemmatize=False):
Parameters
rtepair: an RTEPair from which features should be extracted
stop (bool): if True, stopwords are thrown away.
use_lemmatize: Undocumented
def hyp_extra(self, toktype, debug=True):

Compute the extraneous material in the hypothesis, i.e. the hypothesis words that do not appear in the text.

Parameters
toktype ('ne' or 'word'): distinguish Named Entities from ordinary words
debug: Undocumented
def overlap(self, toktype, debug=False):

Compute the overlap between text and hypothesis.

Parameters
toktype ('ne' or 'word'): distinguish Named Entities from ordinary words
debug: Undocumented
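
As a rough sketch of how `toktype` could partition a set of words, the example below splits a set into named entities and ordinary words. The `split_by_toktype` helper and the `looks_like_ne` predicate are hypothetical names introduced here; the predicate follows the all-caps-or-title-case heuristic described for `_ne`, and the error handling is an assumption.

```python
def looks_like_ne(token):
    """Heuristic: title-case or all-caps tokens count as named entities."""
    return token.istitle() or token.isupper()


def split_by_toktype(words, toktype):
    """Return the named entities or the ordinary words from `words`."""
    named = {w for w in words if looks_like_ne(w)}
    if toktype == "ne":
        return named
    if toktype == "word":
        return words - named
    # Assumed behavior for unknown token types in this sketch.
    raise ValueError(f"Type not recognized: {toktype!r}")


shared = {"John", "NATO", "car", "bought"}
print(sorted(split_by_toktype(shared, "ne")))    # ['John', 'NATO']
print(sorted(split_by_toktype(shared, "word")))  # ['bought', 'car']
```

The same split applies equally to the overlap set and to the hypothesis-only set used by `hyp_extra`.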
hyp_tokens

Undocumented

hyp_words

Undocumented

negwords

Undocumented

stop

Undocumented

stopwords

Undocumented

text_tokens

Undocumented

text_words

Undocumented


@staticmethod
def _lemmatize(word):

Use morphy from WordNet to find the base form of verbs.

@staticmethod
def _ne(token):

This just assumes that words in all caps or in title case are named entities.

Parameters
token (str): Undocumented
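
The heuristic maps naturally onto Python's built-in `str.istitle()` and `str.isupper()`. The sketch below is an assumption about how such a check could look, not a copy of the method; `is_named_entity` is a hypothetical name. It also shows the heuristic's main weakness: capitalized sentence-initial words are treated as named entities.

```python
def is_named_entity(token):
    """Hypothetical stand-in for the all-caps-or-title-case check."""
    return token.istitle() or token.isupper()


for token in ["Paris", "UNESCO", "car", "The", "iPhone"]:
    print(token, is_named_entity(token))
# Paris True
# UNESCO True
# car False
# The True      (false positive: an ordinary capitalized word)
# iPhone False  (false negative: mixed-case name)
```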
_hyp_extra

Undocumented

_overlap

Undocumented

_txt_extra

Undocumented