nltk.classify.rte_classify.RTEFeatureExtractor

class documentation

class RTEFeatureExtractor(object):

Constructor: RTEFeatureExtractor(rtepair, stop, use_lemmatize)

This builds a bag of words for both the text and the hypothesis, optionally discarding stopwords, then calculates the overlap and difference between the two bags.
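The bag-of-words comparison can be sketched with plain Python sets. This is a minimal illustration, not NLTK's implementation: the `\w+` tokenizer, the `STOPWORDS` set, and the `bag_of_words`/`compare` helpers are all hypothetical. Tokens are kept case-sensitive so that capitalization, which the named-entity heuristic relies on, survives.

```python
import re

# Hypothetical stopword list for illustration; NLTK's own list may differ.
STOPWORDS = {"a", "the", "of", "in", "to", "is", "and"}


def bag_of_words(sentence, stop=True):
    """Return the set of word types in `sentence`, optionally minus stopwords."""
    tokens = re.findall(r"\w+", sentence)  # case is preserved on purpose
    words = set(tokens)
    if stop:
        words -= STOPWORDS
    return words


def compare(text, hyp, stop=True):
    """Return (overlap, hyp_extra, txt_extra) for a text/hypothesis pair."""
    text_words = bag_of_words(text, stop)
    hyp_words = bag_of_words(hyp, stop)
    overlap = hyp_words & text_words    # words shared by both
    hyp_extra = hyp_words - text_words  # words only in the hypothesis
    txt_extra = text_words - hyp_words  # words only in the text
    return overlap, hyp_extra, txt_extra


overlap, hyp_extra, txt_extra = compare(
    "Parviz Davudi was representing Iran at the meeting",
    "Davudi attended the meeting in Iran",
)
print(sorted(overlap))    # ['Davudi', 'Iran', 'meeting']
print(sorted(hyp_extra))  # ['attended']
```

Words that appear only in the hypothesis (here `attended`) are the material the text does not support, which is why the difference is computed in both directions.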

Method	`__init__`	No summary
Method	`hyp_extra`	Compute the extraneous material in the hypothesis.
Method	`overlap`	Compute the overlap between text and hypothesis.
Instance Variable	`hyp_tokens`	Undocumented
Instance Variable	`hyp_words`	Undocumented
Instance Variable	`negwords`	Undocumented
Instance Variable	`stop`	Undocumented
Instance Variable	`stopwords`	Undocumented
Instance Variable	`text_tokens`	Undocumented
Instance Variable	`text_words`	Undocumented
Static Method	`_lemmatize`	Use morphy from WordNet to find the base form of verbs.
Static Method	`_ne`	Assume that tokens in all caps or in title case are named entities.
Instance Variable	`_hyp_extra`	Undocumented
Instance Variable	`_overlap`	Undocumented
Instance Variable	`_txt_extra`	Undocumented

def __init__(self, rtepair, stop=True, use_lemmatize=False):

Parameters
rtepair	an `RTEPair` from which features should be extracted
stop:bool	if `True`, stopwords are thrown away.
use_lemmatize	Undocumented

def hyp_extra(self, toktype, debug=True):

Compute the extraneous material in the hypothesis.

Parameters
toktype:'ne' or 'word'	select named entities (`'ne'`) or ordinary words (`'word'`)
debug	Undocumented
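The `toktype` argument splits the hypothesis-only material into named entities and ordinary words. The sketch below assumes that split uses the capitalization heuristic described for `_ne`; the `is_ne` and `hyp_extra` functions here operate on plain sets and are hypothetical stand-ins, and raising `ValueError` for any other `toktype` is this sketch's choice.

```python
def is_ne(token):
    # Heuristic: all-caps or title-case tokens count as named entities.
    return token.isupper() or token.istitle()


def hyp_extra(hyp_words, text_words, toktype):
    """Words in the hypothesis but not in the text, filtered by `toktype`."""
    extra = hyp_words - text_words
    ne_extra = {t for t in extra if is_ne(t)}
    if toktype == "ne":
        return ne_extra
    if toktype == "word":
        return extra - ne_extra
    raise ValueError(f"Type not recognized: {toktype!r}")


hyp = {"Davudi", "attended", "meeting", "Tehran"}
text = {"Davudi", "meeting", "Iran"}
print(sorted(hyp_extra(hyp, text, "ne")))    # ['Tehran']
print(sorted(hyp_extra(hyp, text, "word")))  # ['attended']
```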

def overlap(self, toktype, debug=False):

Compute the overlap between text and hypothesis.

Parameters
toktype:'ne' or 'word'	select named entities (`'ne'`) or ordinary words (`'word'`)
debug	Undocumented
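The overlap can be partitioned the same way: the shared words are split by the named-entity heuristic, and `toktype` picks one side. This is a minimal sketch over plain sets, assuming the `_ne`-style capitalization test; `is_ne` and `overlap` are hypothetical names, and the `debug` flag is omitted.

```python
def is_ne(token):
    # Heuristic: all-caps or title-case tokens count as named entities.
    return token.isupper() or token.istitle()


def overlap(hyp_words, text_words, toktype):
    """Words shared by text and hypothesis, filtered by `toktype`."""
    shared = hyp_words & text_words
    ne_shared = {t for t in shared if is_ne(t)}
    if toktype == "ne":
        return ne_shared
    if toktype == "word":
        return shared - ne_shared
    raise ValueError(f"Type not recognized: {toktype!r}")


text = {"Parviz", "Davudi", "representing", "Iran", "meeting"}
hyp = {"Davudi", "attended", "meeting", "Iran"}
print(sorted(overlap(hyp, text, "ne")))    # ['Davudi', 'Iran']
print(sorted(overlap(hyp, text, "word")))  # ['meeting']
```

Because the `'ne'` and `'word'` results partition the shared set, their union always equals the full overlap.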

hyp_tokens

Undocumented

hyp_words

Undocumented

negwords

Undocumented

stop

Undocumented

stopwords

Undocumented

text_tokens

Undocumented

text_words

Undocumented

@staticmethod
def _lemmatize(word):

Use morphy from WordNet to find the base form of verbs.

@staticmethod
def _ne(token):

Assume that tokens in all caps or in title case are named entities.

Parameters
token:str	Undocumented
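The heuristic fits in one expression. The sketch below assumes it is equivalent to Python's `str.isupper()` or `str.istitle()`; the name `is_named_entity` is hypothetical. The examples show where a pure capitalization test succeeds and where it misfires.

```python
def is_named_entity(token: str) -> bool:
    """Treat all-caps or title-case tokens as named entities."""
    return token.isupper() or token.istitle()


print(is_named_entity("NASA"))    # True  (all caps)
print(is_named_entity("Paris"))   # True  (title case)
print(is_named_entity("The"))     # True  (false positive: sentence-initial word)
print(is_named_entity("iPhone"))  # False (mixed case is missed)
print(is_named_entity("42"))      # False (no cased characters)
```

Any capitalized sentence-initial word is flagged, so the heuristic trades precision for simplicity.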

_hyp_extra

Undocumented

_overlap

Undocumented

_txt_extra

Undocumented