nltk.translate.ribes

module documentation

RIBES score implementation

Function	`corpus_ribes`	This function "calculates RIBES for a system output (hypothesis) with multiple references, and returns "best" score among multi-references and individual scores. The scores are corpus-wise, i.e., averaged by the number of sentences...
Function	`find_increasing_sequences`	Given the worder list, this function groups monotonic +1 sequences.
Function	`kendall_tau`	Calculates the Kendall's Tau correlation coefficient given the worder list of word alignments from word_rank_alignment(), using the formula:
Function	`position_of_ngram`	This function returns the position of the first instance of the ngram appearing in a sentence.
Function	`sentence_ribes`	The RIBES (Rank-based Intuitive Bilingual Evaluation Score) from Hideki Isozaki, Tsutomu Hirao, Kevin Duh, Katsuhito Sudoh and Hajime Tsukada. 2010. "Automatic Evaluation of Translation Quality for Distant Language Pairs"...
Function	`spearman_rho`	Calculates the Spearman's Rho correlation coefficient given the worder list of word alignment from word_rank_alignment(), using the formula:
Function	`word_rank_alignment`	This is the word rank alignment algorithm described in the paper to produce the worder list, i.e. a list of word indices of the hypothesis word orders w.r.t. the list of reference words.

def corpus_ribes(list_of_references, hypotheses, alpha=0.25, beta=0.1):

This function "calculates RIBES for a system output (hypothesis) with multiple references, and returns 'best' score among multi-references and individual scores. The scores are corpus-wise, i.e., averaged by the number of sentences." (cf. RIBES version 1.03.1 code).

Unlike BLEU's micro-average precision, RIBES is macro-averaged: the best RIBES score is taken for each hypothesis against its corresponding references, and these best scores are then averaged over all hypotheses.

>>> hyp1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'which',
...         'ensures', 'that', 'the', 'military', 'always',
...         'obeys', 'the', 'commands', 'of', 'the', 'party']
>>> ref1a = ['It', 'is', 'a', 'guide', 'to', 'action', 'that',
...          'ensures', 'that', 'the', 'military', 'will', 'forever',
...          'heed', 'Party', 'commands']
>>> ref1b = ['It', 'is', 'the', 'guiding', 'principle', 'which',
...          'guarantees', 'the', 'military', 'forces', 'always',
...          'being', 'under', 'the', 'command', 'of', 'the', 'Party']
>>> ref1c = ['It', 'is', 'the', 'practical', 'guide', 'for', 'the',
...          'army', 'always', 'to', 'heed', 'the', 'directions',
...          'of', 'the', 'party']

>>> hyp2 = ['he', 'read', 'the', 'book', 'because', 'he', 'was',
...         'interested', 'in', 'world', 'history']
>>> ref2a = ['he', 'was', 'interested', 'in', 'world', 'history',
...          'because', 'he', 'read', 'the', 'book']

>>> list_of_references = [[ref1a, ref1b, ref1c], [ref2a]]
>>> hypotheses = [hyp1, hyp2]
>>> round(corpus_ribes(list_of_references, hypotheses),4)
0.3597

Parameters
list_of_references:list(list(list(str)))	a corpus of lists of reference sentences, w.r.t. hypotheses
hypotheses:list(list(str))	a list of hypothesis sentences
alpha:float	hyperparameter used as a prior for the unigram precision.
beta:float	hyperparameter used as a prior for the brevity penalty.
Returns
float	The corpus-level RIBES score, i.e. the best sentence-level score for each hypothesis, averaged over all hypotheses.
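
The macro-averaging described above can be sketched in plain Python. This is a minimal illustration rather than NLTK's implementation: `macro_average_best` is a hypothetical helper, and the per-reference scores are made-up numbers standing in for sentence-level RIBES scores.

```python
def macro_average_best(scores_per_hypothesis):
    """Average the best per-reference score of each hypothesis.

    scores_per_hypothesis[i] holds one score per reference of hypothesis i.
    """
    best_scores = [max(scores) for scores in scores_per_hypothesis]
    return sum(best_scores) / len(best_scores)


# Made-up scores: three references for the first hypothesis, one for the second.
example_scores = [[0.2, 0.5, 0.3], [0.4]]
corpus_score = macro_average_best(example_scores)
print(round(corpus_score, 4))  # 0.45
```

Each hypothesis contributes equally to the result regardless of its length, which is what distinguishes this macro-average from a micro-average pooled over all tokens.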

def find_increasing_sequences(worder):

Given the worder list, this function groups monotonic +1 sequences.

>>> worder = [7, 8, 9, 10, 6, 0, 1, 2, 3, 4, 5]
>>> list(find_increasing_sequences(worder))
[(7, 8, 9, 10), (0, 1, 2, 3, 4, 5)]

Parameters
worder:list(int)	The worder list output from word_rank_alignment
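
One way to picture the grouping is the sketch below. It is an assumption-laden illustration, not the library's code: `increasing_runs` is a hypothetical name, it returns a list rather than a generator, and it drops runs of length one, which is inferred from the example output above (the lone `6` does not appear).

```python
def increasing_runs(worder):
    """Group consecutive indices that each increase by exactly 1."""
    runs = []
    current = []
    for index in worder:
        if current and index == current[-1] + 1:
            # The index continues the current +1 run.
            current.append(index)
        else:
            # The run is broken; keep it only if it has at least two items.
            if len(current) > 1:
                runs.append(tuple(current))
            current = [index]
    if len(current) > 1:
        runs.append(tuple(current))
    return runs


worder = [7, 8, 9, 10, 6, 0, 1, 2, 3, 4, 5]
print(increasing_runs(worder))  # [(7, 8, 9, 10), (0, 1, 2, 3, 4, 5)]
```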

def kendall_tau(worder, normalize=True):

Calculates the Kendall's Tau correlation coefficient given the worder list of word alignments from word_rank_alignment(), using the formula:

tau = 2 * num_increasing_pairs / num_possible_pairs - 1

Note that the increasing pairs can be discontinuous in the worder list, and each increasing sequence contributes choose(len(seq), 2) increasing pairs, e.g.

>>> worder = [7, 8, 9, 10, 6, 0, 1, 2, 3, 4, 5]
>>> number_possible_pairs = choose(len(worder), 2)
>>> round(kendall_tau(worder, normalize=False),3)
-0.236
>>> round(kendall_tau(worder),3)
0.382

Parameters
worder:list(int)	The worder list output from word_rank_alignment
normalize:boolean	Flag to indicate normalization
Returns
float	The Kendall's Tau correlation coefficient.
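
The arithmetic behind the two values in the example can be checked by hand. The sketch below assumes the increasing sequences are the ones shown for `find_increasing_sequences`, and that normalization maps tau from [-1, 1] onto [0, 1] via `(tau + 1) / 2`; that mapping is an assumption, though it reproduces the outputs above.

```python
from math import comb

worder = [7, 8, 9, 10, 6, 0, 1, 2, 3, 4, 5]
# Monotonic +1 runs of worder, as shown for find_increasing_sequences.
runs = [(7, 8, 9, 10), (0, 1, 2, 3, 4, 5)]

num_increasing_pairs = sum(comb(len(run), 2) for run in runs)  # 6 + 15 = 21
num_possible_pairs = comb(len(worder), 2)  # 55

tau = 2 * num_increasing_pairs / num_possible_pairs - 1
normalized_tau = (tau + 1) / 2  # assumed normalization onto [0, 1]

print(round(tau, 3), round(normalized_tau, 3))  # -0.236 0.382
```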

def position_of_ngram(ngram, sentence):

This function returns the position of the first instance of the ngram appearing in a sentence.

Note that one could also use strings as follows, but the code is a little convoluted with type casting back and forth:

char_pos = ' '.join(sentence)[:' '.join(sentence).index(' '.join(ngram))]
word_pos = char_pos.count(' ')

Another way to conceive this is:

return next(i for i, ng in enumerate(ngrams(sentence, len(ngram))) if ng == ngram)

Parameters
ngram:tuple	The ngram that needs to be searched
sentence:list(str)	The list of tokens to search from.
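
A dependency-free sketch of the same search is given below. `first_ngram_position` is a hypothetical name, and returning `None` when the ngram is absent is an assumption made for this illustration.

```python
def first_ngram_position(ngram, sentence):
    """Return the token index where ngram first occurs in sentence."""
    n = len(ngram)
    target = tuple(ngram)
    for start in range(len(sentence) - n + 1):
        # Compare the window of n tokens starting at `start` with the ngram.
        if tuple(sentence[start:start + n]) == target:
            return start
    return None  # assumption: signal "not found" with None


sentence = 'he read the book because he was interested'.split()
print(first_ngram_position(('the', 'book'), sentence))  # 2
print(first_ngram_position(('he',), sentence))          # 0
```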

def sentence_ribes(references, hypothesis, alpha=0.25, beta=0.1):

The RIBES (Rank-based Intuitive Bilingual Evaluation Score) from Hideki Isozaki, Tsutomu Hirao, Kevin Duh, Katsuhito Sudoh and Hajime Tsukada. 2010. "Automatic Evaluation of Translation Quality for Distant Language Pairs". In Proceedings of EMNLP. http://www.aclweb.org/anthology/D/D10/D10-1092.pdf

The generic RIBES score used in shared tasks, e.g. the Workshop on Asian Translation (WAT), is calculated as follows:

RIBES = kendall_tau * (p1**alpha) * (bp**beta)

Please note that this re-implementation differs from the official RIBES implementation. Although it emulates the results described in the original paper, the official RIBES script implements further optimizations.

Users are encouraged to use the official RIBES script instead of this implementation when evaluating their machine translation systems. Refer to http://www.kecl.ntt.co.jp/icl/lirg/ribes/ for the official script.

Parameters
references:list(list(str))	a list of reference sentences
hypothesis:list(str)	a hypothesis sentence
alpha:float	hyperparameter used as a prior for the unigram precision.
beta:float	hyperparameter used as a prior for the brevity penalty.
Returns
float	The best ribes score from one of the references.
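
To show how the three factors combine for a single reference, here is a hedged sketch. It assumes the form used in the paper, with the normalized Kendall's tau multiplied by the unigram precision raised to alpha and the brevity penalty raised to beta. `ribes_from_components` is a hypothetical helper, and the definitions of `p1` (aligned words over hypothesis length) and `bp` (a BLEU-style brevity penalty) are assumptions, not taken from this page.

```python
import math


def ribes_from_components(nkt, num_aligned, hyp_len, ref_len,
                          alpha=0.25, beta=0.1):
    """Combine normalized Kendall's tau, unigram precision and brevity penalty."""
    # Assumed unigram precision: aligned hypothesis words / hypothesis length.
    p1 = num_aligned / hyp_len
    # Assumed brevity penalty: penalizes hypotheses shorter than the reference.
    bp = min(1.0, math.exp(1.0 - ref_len / hyp_len))
    return nkt * (p1 ** alpha) * (bp ** beta)


# (H0, R0) example: all 11 hypothesis words align and both sentences have
# 11 tokens, so p1 = bp = 1 and the score equals the normalized tau, 21/55.
score = ribes_from_components(21 / 55, num_aligned=11, hyp_len=11, ref_len=11)
print(round(score, 3))  # 0.382
```

With several references, the score would be computed per reference and the best one kept, in line with the return value documented above.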

def spearman_rho(worder, normalize=True):

Calculates the Spearman's Rho correlation coefficient given the worder list of word alignment from word_rank_alignment(), using the formula:

rho = 1 - sum(d**2) / choose(len(worder)+1, 3)

Here d is, at each position, the difference between the worder index and the original word index from the reference sentence.

Using the (H0, R0) and (H5, R5) example from the paper:

>>> worder =  [7, 8, 9, 10, 6, 0, 1, 2, 3, 4, 5]
>>> round(spearman_rho(worder, normalize=False), 3)
-0.591
>>> round(spearman_rho(worder), 3)
0.205

Parameters
worder:list(int)	The worder list output from word_rank_alignment
normalize:boolean	Flag to indicate normalization
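
The two values in the example can be reproduced directly from the formula. The sketch below takes d at each position as the worder value minus the position, and assumes that normalization maps rho from [-1, 1] onto [0, 1] via `(rho + 1) / 2`; both readings are assumptions, though they match the outputs above.

```python
from math import comb

worder = [7, 8, 9, 10, 6, 0, 1, 2, 3, 4, 5]

# Squared difference between each worder value and its position: 350 in total.
sum_d_squared = sum((w - i) ** 2 for i, w in enumerate(worder))

rho = 1 - sum_d_squared / comb(len(worder) + 1, 3)  # 1 - 350 / 220
normalized_rho = (rho + 1) / 2  # assumed normalization onto [0, 1]

print(round(rho, 3), round(normalized_rho, 3))  # -0.591 0.205
```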

def word_rank_alignment(reference, hypothesis, character_based=False):

This is the word rank alignment algorithm described in the paper to produce the worder list, i.e. a list of word indices of the hypothesis word orders w.r.t. the list of reference words.

Below is the (H0, R0) example from the Isozaki et al. 2010 paper. Note that the examples in the paper are indexed from 1, but the results here are indexed from 0:

>>> ref = str('he was interested in world history because he '
... 'read the book').split()
>>> hyp = str('he read the book because he was interested in world '
... 'history').split()
>>> word_rank_alignment(ref, hyp)
[7, 8, 9, 10, 6, 0, 1, 2, 3, 4, 5]

The (H1, R1) example from the paper, note the 0th index:

>>> ref = 'John hit Bob yesterday'.split()
>>> hyp = 'Bob hit John yesterday'.split()
>>> word_rank_alignment(ref, hyp)
[2, 1, 0, 3]

Here is the (H2, R2) example from the paper, note the 0th index here too:

>>> ref = 'the boy read the book'.split()
>>> hyp = 'the book was read by the boy'.split()
>>> word_rank_alignment(ref, hyp)
[3, 4, 2, 0, 1]

Parameters
reference:list(str)	a reference sentence
hypothesis:list(str)	a hypothesis sentence
character_based	Undocumented
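
For intuition only, the simplest case of the alignment can be sketched as follows. This is a deliberate simplification and not the algorithm from the paper: the hypothetical `unique_word_alignment` aligns only words that occur exactly once in both sentences and ignores repeated words, so it would not reproduce the (H0, R0) or (H2, R2) outputs above, which rely on resolving the repeated words "he" and "the".

```python
from collections import Counter


def unique_word_alignment(reference, hypothesis):
    """Map each unambiguous hypothesis word to its index in the reference."""
    ref_counts = Counter(reference)
    hyp_counts = Counter(hypothesis)
    worder = []
    for word in hypothesis:
        # Only words appearing exactly once on both sides are unambiguous.
        if ref_counts[word] == 1 and hyp_counts[word] == 1:
            worder.append(reference.index(word))
    return worder


ref = 'John hit Bob yesterday'.split()
hyp = 'Bob hit John yesterday'.split()
print(unique_word_alignment(ref, hyp))  # [2, 1, 0, 3]
```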
