module documentation

Undocumented

Function allign_words: Aligns/matches words in the hypothesis to the reference by sequentially applying exact match, stemmed match, and WordNet-based synonym match. If there are multiple matches, the match with the fewest crossings is chosen.
Function exact_match: Matches exact words in the hypothesis and reference and returns a word mapping between them based on the enumerated word id.
Function meteor_score: Calculates the METEOR score for a hypothesis with multiple references as described in "Meteor: An Automatic Metric for MT Evaluation with High Levels of Correlation with Human Judgments" by Alon Lavie and Abhaya Agarwal, in Proceedings of ACL...
Function single_meteor_score: Calculates the METEOR score for a single hypothesis and reference as per "Meteor: An Automatic Metric for MT Evaluation with High Levels of Correlation with Human Judgments" by Alon Lavie and Abhaya Agarwal, in Proceedings of ACL...
Function stem_match: Stems each word in the hypothesis and reference, matches the stems, and returns a word mapping between hypothesis and reference.
Function wordnetsyn_match: Matches each word in the reference to a word in the hypothesis if any synonym of the hypothesis word is an exact match for the reference word.
Function _count_chunks: Counts the fewest possible number of chunks such that the matched unigrams of each chunk are adjacent to each other. This is used to calculate the fragmentation part of the metric.
Function _enum_allign_words: Aligns/matches words in the hypothesis to the reference by sequentially applying exact match, stemmed match, and WordNet-based synonym match. If there are multiple matches, the match with the fewest crossings is chosen...
Function _enum_stem_match: Stems each word in the hypothesis and reference, matches the stems, and returns a word mapping between enum_hypothesis_list and enum_reference_list based on the enumerated word id. The function also returns enumerated lists of the unmatched words for hypothesis and reference.
Function _enum_wordnetsyn_match: Matches each word in the reference to a word in the hypothesis if any synonym of the hypothesis word is an exact match for the reference word.
Function _generate_enums: Takes string inputs for hypothesis and reference and returns an enumerated word list for each of them.
Function _match_enums: Matches exact words in the hypothesis and reference and returns a word mapping between enum_hypothesis_list and enum_reference_list based on the enumerated word id.
def allign_words(hypothesis, reference, stemmer=PorterStemmer(), wordnet=wordnet):

Aligns/matches words in the hypothesis to the reference by sequentially applying exact match, stemmed match, and WordNet-based synonym match. If there are multiple matches, the match with the fewest crossings is chosen.

Parameters
hypothesis: hypothesis string
reference: reference string
stemmer (nltk.stem.api.StemmerI or any class that implements a stem method): nltk.stem.api.StemmerI object (default PorterStemmer())
wordnet (WordNetCorpusReader): a wordnet corpus reader object (default nltk.corpus.wordnet)
Returns
list of tuples, list of tuples, list of tuples: sorted list of matched tuples, unmatched hypothesis list, unmatched reference list
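The staged alignment described above can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: staged_align, toy_stem, and TOY_SYNONYMS are hypothetical names, the stemmer and synonym table are toy stand-ins for PorterStemmer and WordNet, and the sketch simply takes the first available match instead of minimizing crossings.

```python
def staged_align(hypothesis, reference, stem, synonyms):
    """Align words in three passes: exact, stemmed, then synonym match.

    `stem` and `synonyms` are hypothetical stand-ins for a stemmer's
    stem method and a WordNet lookup.
    """
    hyp = list(enumerate(hypothesis.lower().split()))
    ref = list(enumerate(reference.lower().split()))

    stages = [
        lambda h, r: h == r,                    # pass 1: exact match
        lambda h, r: stem(h) == stem(r),        # pass 2: stemmed match
        lambda h, r: r in synonyms.get(h, ()),  # pass 3: synonym match
    ]

    matches = []
    for same in stages:
        # Iterate over a copy because matched words are removed from hyp.
        for h_idx, h_word in list(hyp):
            for r_idx, r_word in ref:
                if same(h_word, r_word):
                    matches.append((h_idx, r_idx))
                    hyp.remove((h_idx, h_word))
                    ref.remove((r_idx, r_word))
                    break  # each word is matched at most once
    return sorted(matches), hyp, ref


def toy_stem(word):
    # Toy stemmer (assumption): strips a single trailing "s".
    return word[:-1] if word.endswith("s") else word


TOY_SYNONYMS = {"big": {"large"}}  # toy synonym table (assumption)

matched, unmatched_hyp, unmatched_ref = staged_align(
    "the cats chased big mice", "the cat chased large mice",
    toy_stem, TOY_SYNONYMS,
)
print(matched)        # [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
print(unmatched_hyp)  # []
print(unmatched_ref)  # []
```

Later passes only see the words left unmatched by earlier passes, which is why a word matched exactly is never reconsidered as a stem or synonym match.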
def exact_match(hypothesis, reference):

Matches exact words in the hypothesis and reference and returns a word mapping between them based on the enumerated word id.

Parameters
hypothesis (str): hypothesis string
reference (str): reference string
Returns
list of 2D tuples, list of 2D tuples, list of 2D tuples: enumerated matched tuples, enumerated unmatched hypothesis tuples, enumerated unmatched reference tuples
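As a rough illustration of what a mapping "based on the enumerated word id" can look like, here is a minimal sketch. The name exact_match_sketch is hypothetical, and the assumption that each match is a (hypothesis index, reference index) pair is made for this example only.

```python
def exact_match_sketch(hypothesis, reference):
    """Pair up identical words by position index (illustrative only)."""
    hyp = list(enumerate(hypothesis.split()))
    unmatched_ref = list(enumerate(reference.split()))
    matches = []
    unmatched_hyp = []
    for h_idx, h_word in hyp:
        for pos, (r_idx, r_word) in enumerate(unmatched_ref):
            if h_word == r_word:
                matches.append((h_idx, r_idx))
                del unmatched_ref[pos]  # a reference word is used only once
                break
        else:
            # No reference word matched this hypothesis word.
            unmatched_hyp.append((h_idx, h_word))
    return matches, unmatched_hyp, unmatched_ref


print(exact_match_sketch("the cat sat", "the dog sat"))
# ([(0, 0), (2, 2)], [(1, 'cat')], [(1, 'dog')])
```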
def meteor_score(references, hypothesis, preprocess=str.lower, stemmer=PorterStemmer(), wordnet=wordnet, alpha=0.9, beta=3, gamma=0.5):

Calculates the METEOR score for a hypothesis with multiple references as described in "Meteor: An Automatic Metric for MT Evaluation with High Levels of Correlation with Human Judgments" by Alon Lavie and Abhaya Agarwal, in Proceedings of ACL. http://www.cs.cmu.edu/~alavie/METEOR/pdf/Lavie-Agarwal-2007-METEOR.pdf

In case of multiple references, the best score is chosen. This method iterates over single_meteor_score and picks the best pair among all the references for a given hypothesis.

>>> hypothesis1 = 'It is a guide to action which ensures that the military always obeys the commands of the party'
>>> hypothesis2 = 'It is to insure the troops forever hearing the activity guidebook that party direct'
>>> reference1 = 'It is a guide to action that ensures that the military will forever heed Party commands'
>>> reference2 = 'It is the guiding principle which guarantees the military forces always being under the command of the Party'
>>> reference3 = 'It is the practical guide for the army always to heed the directions of the party'
>>> round(meteor_score([reference1, reference2, reference3], hypothesis1),4)
0.7398
If no words match during alignment, the method returns a score of 0. Returning zero instead of raising a division-by-zero error is safe, because no match usually implies a bad translation.
>>> round(meteor_score(['this is a cat'], 'non matching hypothesis'),4)
0.0
Parameters
references (list(str)): reference sentences
hypothesis (str): a hypothesis sentence
preprocess (method): preprocessing function (default str.lower)
stemmer (nltk.stem.api.StemmerI or any class that implements a stem method): nltk.stem.api.StemmerI object (default PorterStemmer())
wordnet (WordNetCorpusReader): a wordnet corpus reader object (default nltk.corpus.wordnet)
alpha (float): parameter for controlling the relative weights of precision and recall.
beta (float): parameter for controlling the shape of the penalty as a function of fragmentation.
gamma (float): relative weight assigned to the fragmentation penalty.
Returns
float: The sentence-level METEOR score.
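The "best score among all references" behaviour can be sketched as a maximum over per-reference scores. This is an illustrative sketch only: best_reference_score and toy_overlap are hypothetical names, and toy_overlap is a deliberately simple stand-in for single_meteor_score, not the METEOR metric.

```python
def best_reference_score(references, hypothesis, score_pair):
    """Score the hypothesis against every reference and keep the best.

    `score_pair(reference, hypothesis)` is a hypothetical stand-in for a
    single-reference scorer such as single_meteor_score.
    """
    return max(score_pair(reference, hypothesis) for reference in references)


def toy_overlap(reference, hypothesis):
    # Toy scorer (assumption): fraction of distinct hypothesis words
    # that also occur in the reference.
    hyp_words = set(hypothesis.split())
    return len(hyp_words & set(reference.split())) / len(hyp_words)


print(best_reference_score(["a b c", "a b d"], "a b d", toy_overlap))  # 1.0
```

In this sketch an empty reference list makes max() raise a ValueError, so callers would need to supply at least one reference.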
def single_meteor_score(reference, hypothesis, preprocess=str.lower, stemmer=PorterStemmer(), wordnet=wordnet, alpha=0.9, beta=3, gamma=0.5):

Calculates the METEOR score for a single hypothesis and reference as per "Meteor: An Automatic Metric for MT Evaluation with High Levels of Correlation with Human Judgments" by Alon Lavie and Abhaya Agarwal, in Proceedings of ACL. http://www.cs.cmu.edu/~alavie/METEOR/pdf/Lavie-Agarwal-2007-METEOR.pdf

>>> hypothesis1 = 'It is a guide to action which ensures that the military always obeys the commands of the party'
>>> reference1 = 'It is a guide to action that ensures that the military will forever heed Party commands'
>>> round(single_meteor_score(reference1, hypothesis1),4)
0.7398
If no words match during alignment, the method returns a score of 0. Returning zero instead of raising a division-by-zero error is safe, because no match usually implies a bad translation.
>>> round(single_meteor_score('this is a cat', 'non matching hypothesis'),4)
0.0
Parameters
reference (str): a reference sentence
hypothesis (str): a hypothesis sentence
preprocess (method): preprocessing function (default str.lower)
stemmer (nltk.stem.api.StemmerI or any class that implements a stem method): nltk.stem.api.StemmerI object (default PorterStemmer())
wordnet (WordNetCorpusReader): a wordnet corpus reader object (default nltk.corpus.wordnet)
alpha (float): parameter for controlling the relative weights of precision and recall.
beta (float): parameter for controlling the shape of the penalty as a function of fragmentation.
gamma (float): relative weight assigned to the fragmentation penalty.
Returns
float: The sentence-level METEOR score.
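To show how alpha, beta, and gamma interact, the sketch below computes a score from raw counts, following the parametrized formula in the cited paper: a weighted harmonic mean of precision and recall, discounted by a fragmentation penalty. The function name meteor_from_counts and its count-based arguments are assumptions made for illustration; the library functions work on sentences, not counts.

```python
def meteor_from_counts(matches, hyp_len, ref_len, chunks,
                       alpha=0.9, beta=3, gamma=0.5):
    """Combine match statistics into a METEOR-style score (sketch).

    matches: number of aligned unigrams
    hyp_len, ref_len: word counts of hypothesis and reference
    chunks: number of contiguous chunks in the alignment
    """
    if matches == 0:
        return 0.0  # no alignment: score 0 instead of dividing by zero
    precision = matches / hyp_len
    recall = matches / ref_len
    # alpha shifts the weight between precision and recall.
    fmean = precision * recall / (alpha * precision + (1 - alpha) * recall)
    # Fragmentation is the ratio of chunks to matches; beta shapes the
    # penalty curve and gamma caps its weight.
    penalty = gamma * (chunks / matches) ** beta
    return (1 - penalty) * fmean


# Four words, all matched in one contiguous chunk.
print(meteor_from_counts(matches=4, hyp_len=4, ref_len=4, chunks=1))  # 0.9921875
```

Even a perfect match scores slightly below 1 here, because a single chunk still yields a small non-zero fragmentation term.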
def stem_match(hypothesis, reference, stemmer=PorterStemmer()):

Stems each word in the hypothesis and reference, matches the stems, and returns a word mapping between hypothesis and reference.

Parameters
hypothesis: hypothesis string
reference: reference string
stemmer (nltk.stem.api.StemmerI or any class that implements a stem method): nltk.stem.api.StemmerI object (default PorterStemmer())
Returns
list of 2D tuples, list of 2D tuples, list of 2D tuples: enumerated matched tuples, enumerated unmatched hypothesis tuples, enumerated unmatched reference tuples
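Because the stemmer only needs to implement a stem method, the matching step can be sketched with any such object. The example below is a hedged illustration: SuffixStemmer is a hypothetical toy class (not PorterStemmer), stem_match_sketch is a hypothetical name, and keeping the original words in the unmatched lists is a choice made for this sketch.

```python
class SuffixStemmer:
    """Hypothetical toy stemmer; any object with a stem method would do."""

    def stem(self, word):
        # Strip the first matching suffix, keeping at least three letters.
        for suffix in ("ing", "ed", "s"):
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                return word[: -len(suffix)]
        return word


def stem_match_sketch(hypothesis, reference, stemmer):
    """Match words whose stems are equal (illustrative only)."""
    hyp = list(enumerate(hypothesis.split()))
    ref = list(enumerate(reference.split()))
    matches = []
    # Iterate over a copy because matched words are removed from hyp.
    for h_idx, h_word in list(hyp):
        for r_idx, r_word in ref:
            if stemmer.stem(h_word) == stemmer.stem(r_word):
                matches.append((h_idx, r_idx))
                hyp.remove((h_idx, h_word))
                ref.remove((r_idx, r_word))
                break
    return matches, hyp, ref


print(stem_match_sketch("dogs barked loudly", "dog barking loud", SuffixStemmer()))
# ([(0, 0), (1, 1)], [(2, 'loudly')], [(2, 'loud')])
```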
def wordnetsyn_match(hypothesis, reference, wordnet=wordnet):

Matches each word in the reference to a word in the hypothesis if any synonym of the hypothesis word is an exact match for the reference word.

Parameters
hypothesis: hypothesis string
reference: reference string
wordnet (WordNetCorpusReader): a wordnet corpus reader object (default nltk.corpus.wordnet)
Returns
list of tuples: list of mapped tuples
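The synonym rule can be illustrated without WordNet by substituting a small dictionary. Everything in this sketch is an assumption made for illustration: TOY_SYNONYMS replaces the WordNet lookup, synonym_match_sketch is a hypothetical name, and treating a word as its own synonym is a choice made here.

```python
# Toy synonym table (assumption); a real lookup would query WordNet.
TOY_SYNONYMS = {"big": {"large", "huge"}, "quick": {"fast", "rapid"}}


def synonym_match_sketch(hypothesis, reference, synonyms):
    """Match a hypothesis word to a reference word found among its synonyms."""
    hyp = list(enumerate(hypothesis.split()))
    ref = list(enumerate(reference.split()))
    matches = []
    for h_idx, h_word in hyp:
        # In this sketch a word also counts as a synonym of itself.
        candidates = synonyms.get(h_word, set()) | {h_word}
        for r_idx, r_word in ref:
            if r_word in candidates:
                matches.append((h_idx, r_idx))
                ref.remove((r_idx, r_word))  # a reference word is used once
                break
    return matches


print(synonym_match_sketch("a big quick fox", "a large fast fox", TOY_SYNONYMS))
# [(0, 0), (1, 1), (2, 2), (3, 3)]
```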
def _count_chunks(matches):

Counts the fewest possible number of chunks such that the matched unigrams of each chunk are adjacent to each other. This is used to calculate the fragmentation part of the metric.

Parameters
matches: list containing a mapping of matched words (output of allign_words)
Returns
int: Number of chunks a sentence is divided into after alignment
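One way to count chunks is to walk the sorted matches and start a new chunk whenever either index stops advancing by exactly one. The sketch below assumes matches are (hypothesis index, reference index) pairs; count_chunks_sketch is a hypothetical name, and returning 0 for an empty list is a choice made for this example.

```python
def count_chunks_sketch(matches):
    """Count runs of matches that are adjacent in both sentences."""
    if not matches:
        return 0  # choice made for this sketch
    matches = sorted(matches)
    chunks = 1
    for (prev_h, prev_r), (cur_h, cur_r) in zip(matches, matches[1:]):
        # A chunk continues only if both indices advance by exactly one.
        if cur_h != prev_h + 1 or cur_r != prev_r + 1:
            chunks += 1
    return chunks


# Words 0-1 align contiguously, then the reference skips a word,
# so words 2-3 form a second chunk.
print(count_chunks_sketch([(0, 0), (1, 1), (2, 3), (3, 4)]))  # 2
```

Fewer chunks for the same number of matches means the matched words appear in longer contiguous runs, which lowers the fragmentation penalty.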
def _enum_allign_words(enum_hypothesis_list, enum_reference_list, stemmer=PorterStemmer(), wordnet=wordnet):

Aligns/matches words in the hypothesis to the reference by sequentially applying exact match, stemmed match, and WordNet-based synonym match. If there are multiple matches, the match with the fewest crossings is chosen. Takes enumerated lists as input instead of strings.

Parameters
enum_hypothesis_list: enumerated hypothesis list
enum_reference_list: enumerated reference list
stemmer (nltk.stem.api.StemmerI or any class that implements a stem method): nltk.stem.api.StemmerI object (default PorterStemmer())
wordnet (WordNetCorpusReader): a wordnet corpus reader object (default nltk.corpus.wordnet)
Returns
list of tuples, list of tuples, list of tuples: sorted list of matched tuples, unmatched hypothesis list, unmatched reference list
def _enum_stem_match(enum_hypothesis_list, enum_reference_list, stemmer=PorterStemmer()):

Stems each word in the hypothesis and reference, matches the stems, and returns a word mapping between enum_hypothesis_list and enum_reference_list based on the enumerated word id. The function also returns enumerated lists of the unmatched words for hypothesis and reference.

Parameters
enum_hypothesis_list: enumerated hypothesis list
enum_reference_list: enumerated reference list
stemmer (nltk.stem.api.StemmerI or any class that implements a stem method): nltk.stem.api.StemmerI object (default PorterStemmer())
Returns
list of 2D tuples, list of 2D tuples, list of 2D tuples: enumerated matched tuples, enumerated unmatched hypothesis tuples, enumerated unmatched reference tuples
def _enum_wordnetsyn_match(enum_hypothesis_list, enum_reference_list, wordnet=wordnet):

Matches each word in the reference to a word in the hypothesis if any synonym of the hypothesis word is an exact match for the reference word.

Parameters
enum_hypothesis_list: enumerated hypothesis list
enum_reference_list: enumerated reference list
wordnet (WordNetCorpusReader): a wordnet corpus reader object (default nltk.corpus.wordnet)
Returns
list of tuples, list of tuples, list of tuples: list of matched tuples, unmatched hypothesis list, unmatched reference list
def _generate_enums(hypothesis, reference, preprocess=str.lower):

Takes string inputs for hypothesis and reference and returns an enumerated word list for each of them.

Parameters
hypothesis (str): hypothesis string
reference (str): reference string
preprocess (method): preprocessing method (default str.lower)
Returns
list of 2D tuples, list of 2D tuples: enumerated words list
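An "enumerated word list" pairs each word with its position in the sentence. A minimal sketch of that step, assuming whitespace tokenization and using the hypothetical name generate_enums_sketch, might look like this:

```python
def generate_enums_sketch(hypothesis, reference, preprocess=str.lower):
    """Turn two sentences into lists of (position, word) tuples."""
    # Preprocess first, then split on whitespace (assumption) and number
    # the words so that matches can later refer to word positions.
    hyp = list(enumerate(preprocess(hypothesis).split()))
    ref = list(enumerate(preprocess(reference).split()))
    return hyp, ref


hyp, ref = generate_enums_sketch("The Cat", "the cat sat")
print(hyp)  # [(0, 'the'), (1, 'cat')]
print(ref)  # [(0, 'the'), (1, 'cat'), (2, 'sat')]
```

Passing a different preprocess callable, for example str.upper, changes only the normalization applied before the words are numbered.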
def _match_enums(enum_hypothesis_list, enum_reference_list):

Matches exact words in the hypothesis and reference and returns a word mapping between enum_hypothesis_list and enum_reference_list based on the enumerated word id.

Parameters
enum_hypothesis_list (list of tuples): enumerated hypothesis list
enum_reference_list (list of 2D tuples): enumerated reference list
Returns
list of 2D tuples, list of 2D tuples, list of 2D tuples: enumerated matched tuples, enumerated unmatched hypothesis tuples, enumerated unmatched reference tuples