NIST score implementation.
Function | corpus_nist | Calculate a single corpus-level NIST score (aka. system-level NIST) for all the hypotheses and their respective references.
Function | nist_length_penalty | Calculates the NIST length penalty, from Eq. 3 in Doddington (2002).
Function | sentence_nist | Calculate NIST score from George Doddington. 2002. "Automatic evaluation of machine translation quality using n-gram co-occurrence statistics." Proceedings of HLT. Morgan Kaufmann Publishers Inc. http://dl.acm.org/citation.cfm?id=1289189.1289273...
Calculate a single corpus-level NIST score (aka. system-level NIST) for all the hypotheses and their respective references.
Parameters | |
hypotheses:list(list(str)) | a list of hypothesis sentences |
n:int | highest n-gram order |
references:list(list(list(str))) | a corpus of lists of reference sentences, w.r.t. hypotheses |
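Assuming the functions are importable from nltk.translate.nist_score (as in recent NLTK releases), a minimal corpus-level call could look like the sketch below; it reuses tokenised sentences from the doctest further down, with one list of references per hypothesis.

    from nltk.translate.nist_score import corpus_nist

    # Tokenised sentences; each hypothesis is paired with a list of its references.
    hyp1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'which', 'ensures', 'that',
            'the', 'military', 'always', 'obeys', 'the', 'commands', 'of', 'the', 'party']
    ref1a = ['It', 'is', 'a', 'guide', 'to', 'action', 'that', 'ensures', 'that',
             'the', 'military', 'will', 'forever', 'heed', 'Party', 'commands']
    ref1b = ['It', 'is', 'the', 'practical', 'guide', 'for', 'the', 'army', 'always',
             'to', 'heed', 'the', 'directions', 'of', 'the', 'party']

    # references: one list of reference sentences per hypothesis, in the same order.
    score = corpus_nist([[ref1a, ref1b]], [hyp1], n=5)
    print(score)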
Calculates the NIST length penalty, from Eq. 3 in Doddington (2002)
    penalty = exp( beta * log( min( len(hyp)/len(ref), 1.0 )) ** 2 )

where beta is chosen to make the brevity penalty factor = 0.5 when the
number of words in the system output (hyp) is 2/3 of the average number
of words in the reference translation (ref).
The NIST penalty differs from BLEU's in that it minimizes the impact on the score of small variations in the length of a translation. See Fig. 4 in Doddington (2002).
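The following is a minimal, self-contained sketch of the penalty formula above; the helper name and its (ref_len, hyp_len) arguments are illustrative and not necessarily the module's actual signature.

    import math

    def length_penalty_sketch(ref_len, hyp_len):
        """Illustrative version of the Eq. 3 brevity penalty."""
        ratio = hyp_len / ref_len
        if 0 < ratio < 1:
            # Choose beta so that the penalty equals 0.5 when hyp_len/ref_len = 2/3.
            beta = math.log(0.5) / math.log(2 / 3) ** 2
            return math.exp(beta * math.log(ratio) ** 2)
        # Hypotheses at least as long as the reference are not penalised;
        # degenerate ratios are clamped into [0, 1].
        return max(min(ratio, 1.0), 0.0)

    print(length_penalty_sketch(15, 10))  # ratio = 2/3, so the penalty is 0.5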
Calculate NIST score from George Doddington. 2002. "Automatic evaluation of machine translation quality using n-gram co-occurrence statistics." Proceedings of HLT. Morgan Kaufmann Publishers Inc. http://dl.acm.org/citation.cfm?id=1289189.1289273
DARPA commissioned NIST to develop an MT evaluation facility based on the BLEU score. The official script used by NIST to compute the BLEU and NIST scores is mteval-14.pl. The main differences are:
- BLEU uses geometric mean of the ngram overlaps, NIST uses arithmetic mean.
- NIST has a different brevity penalty
- NIST score from mteval-14.pl has a self-contained tokenizer
Note: mteval-14.pl includes a smoothing function for the BLEU score that is NOT used in the NIST score computation.
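To make the first difference concrete, the toy sketch below contrasts the two combination strategies on invented per-order precisions; it omits the information weighting that the real NIST score applies to each n-gram.

    import math

    # Invented per-order n-gram precisions, for illustration only.
    precisions = [0.75, 0.5, 0.2, 0.1]

    # BLEU-style: geometric mean, so one near-zero order drags the whole score toward zero.
    bleu_style = math.exp(sum(math.log(p) for p in precisions) / len(precisions))

    # NIST-style: arithmetic mean, so one poor order only lowers the score proportionally.
    nist_style = sum(precisions) / len(precisions)

    print(bleu_style, nist_style)  # roughly 0.29 vs 0.39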
>>> hypothesis1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'which',
...     'ensures', 'that', 'the', 'military', 'always',
...     'obeys', 'the', 'commands', 'of', 'the', 'party']
>>> hypothesis2 = ['It', 'is', 'to', 'insure', 'the', 'troops',
...     'forever', 'hearing', 'the', 'activity', 'guidebook',
...     'that', 'party', 'direct']
>>> reference1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'that',
...     'ensures', 'that', 'the', 'military', 'will', 'forever',
...     'heed', 'Party', 'commands']
>>> reference2 = ['It', 'is', 'the', 'guiding', 'principle', 'which',
...     'guarantees', 'the', 'military', 'forces', 'always',
...     'being', 'under', 'the', 'command', 'of', 'the',
...     'Party']
>>> reference3 = ['It', 'is', 'the', 'practical', 'guide', 'for', 'the',
...     'army', 'always', 'to', 'heed', 'the', 'directions',
...     'of', 'the', 'party']
>>> sentence_nist([reference1, reference2, reference3], hypothesis1) # doctest: +ELLIPSIS
3.3709...
>>> sentence_nist([reference1, reference2, reference3], hypothesis2) # doctest: +ELLIPSIS
1.4619...
Parameters | |
references:list(list(str)) | reference sentences |
hypothesis:list(str) | a hypothesis sentence |
n:int | highest n-gram order |
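As a small illustration of the n parameter, the sketch below caps the highest n-gram order at 3; the token lists are made up and the import path is assumed as above.

    from nltk.translate.nist_score import sentence_nist

    hypothesis = ['the', 'cat', 'sat', 'on', 'the', 'mat']
    reference = ['the', 'cat', 'is', 'sitting', 'on', 'the', 'mat']

    # Score using unigrams up to trigrams only, which can be useful for short inputs.
    score = sentence_nist([reference], hypothesis, n=3)
    print(score)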