nltk.translate.meteor_score module documentation

Undocumented

Function	`allign_words`	Aligns/matches words in the hypothesis to the reference by sequentially applying exact match, stemmed match and WordNet-based synonym match. If there are multiple matches, the match with the fewest crossings is chosen.
Function	`exact_match`	Matches exact words in hypothesis and reference and returns a word mapping between hypothesis and reference based on the enumerated word ids.
Function	`meteor_score`	Calculates the METEOR score for a hypothesis with multiple references, as described in "Meteor: An Automatic Metric for MT Evaluation with High Levels of Correlation with Human Judgments" by Alon Lavie and Abhaya Agarwal, in Proceedings of ACL...
Function	`single_meteor_score`	Calculates the METEOR score for a single hypothesis and reference, as per "Meteor: An Automatic Metric for MT Evaluation with High Levels of Correlation with Human Judgments" by Alon Lavie and Abhaya Agarwal, in Proceedings of ACL...
Function	`stem_match`	Stems each word in hypothesis and reference, matches the stems, and returns a word mapping between hypothesis and reference.
Function	`wordnetsyn_match`	Matches each word in reference to a word in hypothesis if any synonym of a hypothesis word exactly matches the reference word.
Function	`_count_chunks`	Counts the fewest possible number of chunks such that the matched unigrams of each chunk are adjacent to each other. This is used to calculate the fragmentation part of the metric.
Function	`_enum_allign_words`	Aligns/matches words in the hypothesis to the reference by sequentially applying exact match, stemmed match and WordNet-based synonym match. If there are multiple matches, the match with the fewest crossings is chosen...
Function	`_enum_stem_match`	Stems each word in hypothesis and reference, matches the stems, and returns a word mapping between enum_hypothesis_list and enum_reference_list based on the enumerated word ids. The function also returns enumerated lists of unmatched words for hypothesis and reference.
Function	`_enum_wordnetsyn_match`	Matches each word in reference to a word in hypothesis if any synonym of a hypothesis word exactly matches the reference word.
Function	`_generate_enums`	Takes string inputs for hypothesis and reference and returns an enumerated word list for each of them.
Function	`_match_enums`	Matches exact words in hypothesis and reference and returns a word mapping between enum_hypothesis_list and enum_reference_list based on the enumerated word ids.

def allign_words(hypothesis, reference, stemmer=PorterStemmer(), wordnet=wordnet):

Aligns/matches words in the hypothesis to the reference by sequentially applying exact match, stemmed match and WordNet-based synonym match. If there are multiple matches, the match with the fewest crossings is chosen.

Parameters
hypothesis	hypothesis string
reference	reference string
stemmer:nltk.stem.api.StemmerI or any class that implements a stem method	nltk.stem.api.StemmerI object (default PorterStemmer())
wordnet:WordNetCorpusReader	a wordnet corpus reader object (default nltk.corpus.wordnet)
Returns
list of tuples, list of tuples, list of tuples	sorted list of matched tuples, unmatched hypothesis list, unmatched reference list
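The staged matching above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not NLTK's implementation: sentences are lowercased and split on whitespace, each stage pairs words greedily from left to right, `toy_stem` is a hypothetical stand-in for `PorterStemmer().stem`, the WordNet synonym stage is omitted, and the fewest-crossings tie-breaking is not implemented.

```python
def _match_stage(hyp, ref, key):
    """Greedily pair each hypothesis word with the first unmatched reference
    word that has the same key; return (matches, hyp_left, ref_left)."""
    matches, hyp_left, ref_left = [], [], list(ref)
    for h_idx, h_word in hyp:
        for pos, (r_idx, r_word) in enumerate(ref_left):
            if key(h_word) == key(r_word):
                matches.append((h_idx, r_idx))
                del ref_left[pos]  # each reference word is used at most once
                break
        else:  # no reference word matched this hypothesis word
            hyp_left.append((h_idx, h_word))
    return matches, hyp_left, ref_left


def toy_stem(word):
    # Hypothetical stand-in for a real stemmer such as PorterStemmer().stem.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def align(hypothesis, reference, stem=toy_stem):
    hyp = list(enumerate(hypothesis.lower().split()))
    ref = list(enumerate(reference.lower().split()))
    # Stage 1: exact surface match.
    exact, hyp, ref = _match_stage(hyp, ref, key=lambda w: w)
    # Stage 2: stem match, only on words left unmatched by stage 1.
    stemmed, hyp, ref = _match_stage(hyp, ref, key=stem)
    # Matches sorted by hypothesis position, then the leftover words.
    return sorted(exact + stemmed), hyp, ref
```

For example, `align("The military obeys", "the military obeyed")` aligns `the` and `military` exactly and `obeys`/`obeyed` through their shared stem. Because the stem stage only sees words left over from the exact stage, no word is aligned twice.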

def exact_match(hypothesis, reference):

Matches exact words in hypothesis and reference and returns a word mapping between hypothesis and reference based on the enumerated word ids.

Parameters
hypothesis:str	hypothesis string
reference:str	reference string
Returns
list of 2D tuples, list of 2D tuples, list of 2D tuples	enumerated matched tuples, enumerated unmatched hypothesis tuples, enumerated unmatched reference tuples
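A minimal sketch of exact matching on enumerated `(index, word)` tuples, assuming each hypothesis word claims the earliest unused identical reference word. `exact_match_enums` is a hypothetical name; NLTK's own matcher may scan in a different order and therefore pair repeated words differently.

```python
from collections import defaultdict, deque


def exact_match_enums(enum_hyp, enum_ref):
    """Pair identical words one-to-one by enumerated id.

    enum_hyp and enum_ref are lists of (index, word) tuples.
    Returns (matches, unmatched_hyp, unmatched_ref).
    """
    # For each reference word, queue its positions in sentence order.
    positions = defaultdict(deque)
    for r_idx, r_word in enum_ref:
        positions[r_word].append(r_idx)

    matches, unmatched_hyp = [], []
    for h_idx, h_word in enum_hyp:
        if positions[h_word]:
            # Claim the earliest reference occurrence not used yet.
            matches.append((h_idx, positions[h_word].popleft()))
        else:
            unmatched_hyp.append((h_idx, h_word))

    used = {r_idx for _, r_idx in matches}
    unmatched_ref = [(i, w) for i, w in enum_ref if i not in used]
    return matches, unmatched_hyp, unmatched_ref
```

With hypothesis `the cat saw the dog` and reference `the dog saw a cat`, the second `the` stays unmatched because the reference contains only one `the`.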

def meteor_score(references, hypothesis, preprocess=str.lower, stemmer=PorterStemmer(), wordnet=wordnet, alpha=0.9, beta=3, gamma=0.5):

Calculates the METEOR score for a hypothesis with multiple references, as described in "Meteor: An Automatic Metric for MT Evaluation with High Levels of Correlation with Human Judgments" by Alon Lavie and Abhaya Agarwal, in Proceedings of ACL. http://www.cs.cmu.edu/~alavie/METEOR/pdf/Lavie-Agarwal-2007-METEOR.pdf

With multiple references, the best score is chosen: this method calls single_meteor_score for each reference and returns the highest score obtained for the given hypothesis.

>>> hypothesis1 = 'It is a guide to action which ensures that the military always obeys the commands of the party'
>>> hypothesis2 = 'It is to insure the troops forever hearing the activity guidebook that party direct'

>>> reference1 = 'It is a guide to action that ensures that the military will forever heed Party commands'
>>> reference2 = 'It is the guiding principle which guarantees the military forces always being under the command of the Party'
>>> reference3 = 'It is the practical guide for the army always to heed the directions of the party'

>>> round(meteor_score([reference1, reference2, reference3], hypothesis1),4)
0.7398

If no words match during alignment, the method returns a score of 0. Returning zero instead of raising a division-by-zero error is safe, since no match usually implies a bad translation.

>>> round(meteor_score(['this is a cat'], 'non matching hypothesis'),4)
0.0

Parameters
references:list(str)	reference sentences
hypothesis:str	a hypothesis sentence
preprocess:method	preprocessing function (default str.lower)
stemmer:nltk.stem.api.StemmerI or any class that implements a stem method	nltk.stem.api.StemmerI object (default PorterStemmer())
wordnet:WordNetCorpusReader	a wordnet corpus reader object (default nltk.corpus.wordnet)
alpha:float	parameter for controlling the relative weights of precision and recall.
beta:float	parameter for controlling the shape of the penalty as a function of fragmentation.
gamma:float	relative weight assigned to the fragmentation penalty.
Returns
float	The sentence-level METEOR score.
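Choosing the best reference reduces to taking a maximum over per-reference scores. In this sketch, `best_score` and `unigram_f1` are hypothetical helpers; `unigram_f1` is a deliberately simple stand-in for `single_meteor_score` (F1 over clipped unigram counts, with no stemming, synonyms or fragmentation penalty), used only so the example runs without NLTK data.

```python
from collections import Counter


def unigram_f1(reference, hypothesis):
    # Hypothetical stand-in scorer: F1 over clipped unigram counts.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    overlap = sum((Counter(ref) & Counter(hyp)).values())
    if overlap == 0:
        return 0.0  # no shared words: score 0 rather than divide by zero
    precision, recall = overlap / len(hyp), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def best_score(references, hypothesis, score_fn):
    """Score the hypothesis against every reference and keep the highest."""
    return max(score_fn(ref, hypothesis) for ref in references)
```

Passing the scorer as `score_fn` keeps the "best of several references" rule separate from how a single pair is scored.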

def single_meteor_score(reference, hypothesis, preprocess=str.lower, stemmer=PorterStemmer(), wordnet=wordnet, alpha=0.9, beta=3, gamma=0.5):

Calculates the METEOR score for a single hypothesis and reference, as per "Meteor: An Automatic Metric for MT Evaluation with High Levels of Correlation with Human Judgments" by Alon Lavie and Abhaya Agarwal, in Proceedings of ACL. http://www.cs.cmu.edu/~alavie/METEOR/pdf/Lavie-Agarwal-2007-METEOR.pdf

>>> hypothesis1 = 'It is a guide to action which ensures that the military always obeys the commands of the party'

>>> reference1 = 'It is a guide to action that ensures that the military will forever heed Party commands'

>>> round(single_meteor_score(reference1, hypothesis1),4)
0.7398

If no words match during alignment, the method returns a score of 0. Returning zero instead of raising a division-by-zero error is safe, since no match usually implies a bad translation.

>>> round(single_meteor_score('this is a cat', 'non matching hypothesis'),4)
0.0

Parameters
reference:str	a reference sentence
hypothesis:str	a hypothesis sentence
preprocess:method	preprocessing function (default str.lower)
stemmer:nltk.stem.api.StemmerI or any class that implements a stem method	nltk.stem.api.StemmerI object (default PorterStemmer())
wordnet:WordNetCorpusReader	a wordnet corpus reader object (default nltk.corpus.wordnet)
alpha:float	parameter for controlling the relative weights of precision and recall.
beta:float	parameter for controlling the shape of the penalty as a function of fragmentation.
gamma:float	relative weight assigned to the fragmentation penalty.
Returns
float	The sentence-level METEOR score.
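The final score combines precision, recall and fragmentation through the `alpha`, `beta` and `gamma` parameters above. The sketch below follows the formulation in the Lavie and Agarwal paper, Fmean = P*R / (alpha*P + (1 - alpha)*R) and penalty = gamma * (chunks / matches)^beta, assuming the alignment has already produced the match and chunk counts; `meteor_from_counts` is a hypothetical helper, not part of NLTK.

```python
def meteor_from_counts(matches, hyp_len, ref_len, chunks,
                       alpha=0.9, beta=3.0, gamma=0.5):
    """METEOR score from alignment statistics.

    matches: number of aligned unigram pairs
    hyp_len / ref_len: token counts of hypothesis and reference
    chunks: number of contiguous matched chunks
    """
    if matches == 0:
        # No alignment: return 0 instead of dividing by zero.
        return 0.0
    precision = matches / hyp_len
    recall = matches / ref_len
    # Weighted harmonic mean; with alpha=0.9, 1/recall gets weight 0.9
    # and 1/precision gets weight 0.1.
    fmean = precision * recall / (alpha * precision + (1 - alpha) * recall)
    # Fragmentation penalty grows with the number of chunks per match.
    penalty = gamma * (chunks / matches) ** beta
    return (1 - penalty) * fmean
```

With the default alpha=0.9, recall carries nine times the weight of precision, and the penalty shrinks as the matched words fall into fewer, longer chunks.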

def stem_match(hypothesis, reference, stemmer=PorterStemmer()):

Stems each word in hypothesis and reference, matches the stems, and returns a word mapping between hypothesis and reference.

Parameters
hypothesis:str	hypothesis string
reference:str	reference string
stemmer:nltk.stem.api.StemmerI or any class that implements a stem method	nltk.stem.api.StemmerI object (default PorterStemmer())
Returns
list of 2D tuples, list of 2D tuples, list of 2D tuples	enumerated matched tuples, enumerated unmatched hypothesis tuples, enumerated unmatched reference tuples

def wordnetsyn_match(hypothesis, reference, wordnet=wordnet):

Matches each word in reference to a word in hypothesis if any synonym of a hypothesis word exactly matches the reference word.

Parameters
hypothesis	hypothesis string
reference	reference string
wordnet:WordNetCorpusReader	a wordnet corpus reader object (default nltk.corpus.wordnet)
Returns
list of tuples	list of mapped tuples
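Synonym matching differs from the exact and stem stages in that a pair matches when the reference word belongs to the hypothesis word's synonym set. A minimal sketch, assuming a plain dictionary `synonyms` as a hypothetical stand-in for WordNet lookups and greedy left-to-right pairing; `synonym_match` is a hypothetical name.

```python
def synonym_match(enum_hyp, enum_ref, synonyms):
    """Pair a hypothesis word with the first unmatched reference word that
    appears in its synonym set.

    enum_hyp / enum_ref: lists of (index, word) tuples.
    synonyms: dict mapping a word to a set of synonyms (hypothetical
    stand-in for WordNet lookups).
    Returns (matches, unmatched_hyp, unmatched_ref).
    """
    matches, unmatched_hyp, ref_left = [], [], list(enum_ref)
    for h_idx, h_word in enum_hyp:
        # The hypothesis word itself is also accepted as a candidate.
        candidates = synonyms.get(h_word, set()) | {h_word}
        for pos, (r_idx, r_word) in enumerate(ref_left):
            if r_word in candidates:
                matches.append((h_idx, r_idx))
                del ref_left[pos]
                break
        else:  # no synonym found among the remaining reference words
            unmatched_hyp.append((h_idx, h_word))
    return matches, unmatched_hyp, ref_left
```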

def _count_chunks(matches):

Counts the fewest possible number of chunks such that the matched unigrams of each chunk are adjacent to each other. This is used to calculate the fragmentation part of the metric.

Parameters
matches	list containing a mapping of matched words (output of allign_words)
Returns
int	Number of chunks a sentence is divided into after alignment
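The chunk count can be sketched as a single pass over the alignment, assuming `matches` holds `(hypothesis_index, reference_index)` pairs sorted by hypothesis index: a new chunk starts wherever either index fails to advance by exactly one. `count_chunks` is a hypothetical name, and returning 0 for an empty alignment is a choice of this sketch.

```python
def count_chunks(matches):
    """Count runs of alignment pairs that are adjacent in both sentences.

    matches: list of (hyp_index, ref_index) pairs sorted by hyp_index.
    """
    if not matches:
        return 0
    chunks = 1
    for (h_prev, r_prev), (h_cur, r_cur) in zip(matches, matches[1:]):
        # A new chunk starts unless both indices advance by exactly one.
        if not (h_cur == h_prev + 1 and r_cur == r_prev + 1):
            chunks += 1
    return chunks
```

For `[(0, 0), (1, 1), (2, 5), (3, 6), (5, 2)]` this gives 3 chunks: `(0, 0)-(1, 1)`, `(2, 5)-(3, 6)` and `(5, 2)`.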

def _enum_allign_words(enum_hypothesis_list, enum_reference_list, stemmer=PorterStemmer(), wordnet=wordnet):

Aligns/matches words in the hypothesis to the reference by sequentially applying exact match, stemmed match and WordNet-based synonym match. If there are multiple matches, the match with the fewest crossings is chosen. Takes enumerated lists as input instead of strings.

Parameters
enum_hypothesis_list	enumerated hypothesis list
enum_reference_list	enumerated reference list
stemmer:nltk.stem.api.StemmerI or any class that implements a stem method	nltk.stem.api.StemmerI object (default PorterStemmer())
wordnet:WordNetCorpusReader	a wordnet corpus reader object (default nltk.corpus.wordnet)
Returns
list of tuples, list of tuples, list of tuples	sorted list of matched tuples, unmatched hypothesis list, unmatched reference list

def _enum_stem_match(enum_hypothesis_list, enum_reference_list, stemmer=PorterStemmer()):

Stems each word in hypothesis and reference, matches the stems, and returns a word mapping between enum_hypothesis_list and enum_reference_list based on the enumerated word ids. The function also returns enumerated lists of unmatched words for hypothesis and reference.

Parameters
enum_hypothesis_list	enumerated hypothesis list
enum_reference_list	enumerated reference list
stemmer:nltk.stem.api.StemmerI or any class that implements a stem method	nltk.stem.api.StemmerI object (default PorterStemmer())
Returns
list of 2D tuples, list of 2D tuples, list of 2D tuples	enumerated matched tuples, enumerated unmatched hypothesis tuples, enumerated unmatched reference tuples

def _enum_wordnetsyn_match(enum_hypothesis_list, enum_reference_list, wordnet=wordnet):

Matches each word in reference to a word in hypothesis if any synonym of a hypothesis word exactly matches the reference word.

Parameters
enum_hypothesis_list	enumerated hypothesis list
enum_reference_list	enumerated reference list
wordnet:WordNetCorpusReader	a wordnet corpus reader object (default nltk.corpus.wordnet)
Returns
list of tuples, list of tuples, list of tuples	list of matched tuples, unmatched hypothesis list, unmatched reference list

def _generate_enums(hypothesis, reference, preprocess=str.lower):

Takes string inputs for hypothesis and reference and returns an enumerated word list for each of them.

Parameters
hypothesis:str	hypothesis string
reference:str	reference string
preprocess:method	preprocessing method (default str.lower)
Returns
list of 2D tuples, list of 2D tuples	enumerated word lists for hypothesis and reference
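The enumerated lists used throughout this module pair each token with its position. A sketch, assuming tokens are produced by whitespace splitting after `preprocess`; `generate_enums` mirrors `_generate_enums` only by assumption, and `lower_strip_punct` is a hypothetical alternative `preprocess` showing why attached punctuation matters: with the default `str.lower`, `party.` and `party` are different tokens and do not match exactly.

```python
import string


def generate_enums(hypothesis, reference, preprocess=str.lower):
    """Apply preprocess, split on whitespace, and pair each token with its index."""
    hyp = list(enumerate(preprocess(hypothesis).split()))
    ref = list(enumerate(preprocess(reference).split()))
    return hyp, ref


def lower_strip_punct(text):
    # Hypothetical alternative preprocess: lowercase and drop ASCII punctuation.
    return text.lower().translate(str.maketrans("", "", string.punctuation))
```

Calling `generate_enums("Obey the Party.", "obey the party")` leaves `(2, 'party.')` in the hypothesis list, while passing `preprocess=lower_strip_punct` yields `(2, 'party')`.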

def _match_enums(enum_hypothesis_list, enum_reference_list):

Matches exact words in hypothesis and reference and returns a word mapping between enum_hypothesis_list and enum_reference_list based on the enumerated word ids.

Parameters
enum_hypothesis_list:list of tuples	enumerated hypothesis list
enum_reference_list:list of 2D tuples	enumerated reference list
Returns
list of 2D tuples, list of 2D tuples, list of 2D tuples	enumerated matched tuples, enumerated unmatched hypothesis tuples, enumerated unmatched reference tuples
