nltk.tokenize.nist

This is an NLTK port of the tokenizer used in the NIST BLEU evaluation script (https://github.com/moses-smt/mosesdecoder/blob/master/scripts/generic/mteval-v14.pl#L926), which has also been ported to Python in nmtpy (https://github.com/lium-lst/nmtpy/blob/master/nmtpy/metrics/mtevalbleu.py#L162).

Class NISTTokenizer

This NIST tokenizer is sentence-based, unlike the original paragraph-based tokenization in mteval-v14.pl. Sentence-based tokenization is consistent with the other tokenizers available in NLTK.
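To illustrate what the tokenizer does to a single sentence, here is a minimal standalone sketch. It does not use NLTK's `NISTTokenizer` API. It reapplies four punctuation regexes assumed to match the Western-language branch of the tokenization routine in mteval-v14.pl, one sentence at a time. The names `nist_tokenize` and `tokenize_sentences` are hypothetical. Lowercasing, SGML-entity unescaping and the other preprocessing steps of the Perl script are deliberately omitted.

```python
import re
from typing import List

# Assumed regex steps from the Western-language branch of mteval-v14.pl's
# tokenization routine, as (pattern, replacement) pairs, applied in order.
_STEPS = [
    # Surround most ASCII punctuation and symbols with spaces
    # (apostrophe, hyphen, period, comma and digits are NOT in this class).
    (re.compile(r"([\{-\~\[-\` -\&\(-\+\:-\@\/])"), r" \1 "),
    # Split a period or comma from a preceding non-digit.
    (re.compile(r"([^0-9])([\.,])"), r"\1 \2 "),
    # Split a period or comma from a following non-digit.
    (re.compile(r"([\.,])([^0-9])"), r" \1 \2"),
    # Split a dash that follows a digit.
    (re.compile(r"([0-9])(-)"), r"\1 \2 "),
]


def nist_tokenize(sentence: str) -> List[str]:
    """Hypothetical sketch: tokenize ONE sentence with the assumed regexes."""
    # Pad with spaces so the patterns can also match at the string edges.
    text = f" {sentence} "
    for pattern, replacement in _STEPS:
        text = pattern.sub(replacement, text)
    # str.split() collapses runs of whitespace and drops leading/trailing space.
    return text.split()


def tokenize_sentences(sentences: List[str]) -> List[List[str]]:
    """Hypothetical helper: each sentence is tokenized independently."""
    return [nist_tokenize(s) for s in sentences]
```

For example, `nist_tokenize("It costs $3.50 for 1-2 items.")` keeps `3.50` intact because the period sits between digits, but splits `1-2` because the dash follows a digit. Words such as `don't` or `well-known` stay whole, since the apostrophe and the hyphen are outside the punctuation class.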