module documentation

Simple classifier for RTE corpus.

It calculates the overlap in words and named entities between the text and the hypothesis. It also checks whether the hypothesis contains words or named entities that do not occur in the text, since this indicates that the hypothesis is more informative than (i.e., not entailed by) the text.
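
The overlap and "hypothesis extra" measures described above can be sketched with plain Python sets. This is a minimal illustration rather than the module's implementation: the whitespace tokenization, the tiny `STOPWORDS` set, and the helper names `bag_of_words`, `overlap`, and `hyp_extra` are all assumptions made for the example.

```python
# Illustrative sketch only: the tokenization, stopword list, and helper
# names are assumptions, not the module's actual implementation.
STOPWORDS = {"a", "an", "the", "of", "in", "is", "was", "to"}


def bag_of_words(sentence):
    """Lowercase, split on whitespace, strip punctuation, drop stopwords."""
    tokens = (tok.strip(".,;:!?\"'").lower() for tok in sentence.split())
    return {tok for tok in tokens if tok and tok not in STOPWORDS}


def overlap(text, hypothesis):
    """Words that occur in both the text and the hypothesis."""
    return bag_of_words(text) & bag_of_words(hypothesis)


def hyp_extra(text, hypothesis):
    """Words in the hypothesis that never occur in the text."""
    return bag_of_words(hypothesis) - bag_of_words(text)


text = "The cat sat on the mat in Paris."
hypothesis = "A cat sat in London."

shared = overlap(text, hypothesis)
extra = hyp_extra(text, hypothesis)
```

In this toy pair, `london` appears only in the hypothesis, which is the kind of signal the module treats as evidence against entailment.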

TO DO: better Named Entity classification
TO DO: add lemmatization

Class     RTEFeatureExtractor  This builds a bag of words for both the text and the hypothesis after throwing away some stopwords, then calculates overlap and difference.
Function  rte_classifier       Undocumented
Function  rte_features         Undocumented
Function  rte_featurize        Undocumented
def rte_classifier(algorithm, sample_N=None):

Undocumented

def rte_features(rtepair):

Undocumented

def rte_featurize(rte_pairs):

Undocumented
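
Because these functions are undocumented, the following is only a guess at the general shape suggested by their names and the module description: one function maps a text/hypothesis pair to a dictionary of counts, and another maps a list of pairs to `(features, label)` tuples, which is the labeled-featureset format NLTK classifiers train on. The `Pair` type, its field names, the feature names, and both `sketch_*` helpers are hypothetical and are not taken from the module.

```python
from collections import namedtuple

# Hypothetical stand-in for an RTE pair; the field names are assumptions.
Pair = namedtuple("Pair", ["text", "hyp", "value"])


def sketch_features(pair):
    """Map one pair to a dict of overlap counts (hypothetical feature names)."""
    text_words = set(pair.text.lower().split())
    hyp_words = set(pair.hyp.lower().split())
    return {
        "word_overlap": len(text_words & hyp_words),
        "word_hyp_extra": len(hyp_words - text_words),
    }


def sketch_featurize(pairs):
    """Turn a list of pairs into (feature dict, label) tuples."""
    return [(sketch_features(pair), pair.value) for pair in pairs]


pairs = [
    Pair("john bought a car", "john bought a car", 1),
    Pair("john bought a car", "mary sold a boat", 0),
]
labeled = sketch_featurize(pairs)
```

A list shaped like `labeled` could then be passed to a classifier's training routine; which algorithms `rte_classifier` accepts is not stated on this page.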