nltk.translate.stack_decoder.StackDecoder


class StackDecoder(object):

Constructor: StackDecoder(phrase_table, language_model)

Phrase-based stack decoder for machine translation

>>> from math import log
>>> from nltk.translate import PhraseTable
>>> from nltk.translate.stack_decoder import StackDecoder
>>> phrase_table = PhraseTable()
>>> phrase_table.add(('niemand',), ('nobody',), log(0.8))
>>> phrase_table.add(('niemand',), ('no', 'one'), log(0.2))
>>> phrase_table.add(('erwartet',), ('expects',), log(0.8))
>>> phrase_table.add(('erwartet',), ('expecting',), log(0.2))
>>> phrase_table.add(('niemand', 'erwartet'), ('one', 'does', 'not', 'expect'), log(0.1))
>>> phrase_table.add(('die', 'spanische', 'inquisition'), ('the', 'spanish', 'inquisition'), log(0.8))
>>> phrase_table.add(('!',), ('!',), log(0.8))

>>> #  nltk.model should be used here once it is implemented
>>> from collections import defaultdict
>>> language_prob = defaultdict(lambda: -999.0)
>>> language_prob[('nobody',)] = log(0.5)
>>> language_prob[('expects',)] = log(0.4)
>>> language_prob[('the', 'spanish', 'inquisition')] = log(0.2)
>>> language_prob[('!',)] = log(0.1)
>>> class ToyLanguageModel(object):
...     def probability_change(self, context, phrase):
...         return language_prob[phrase]
...     def probability(self, phrase):
...         return language_prob[phrase]
...
>>> language_model = ToyLanguageModel()

>>> stack_decoder = StackDecoder(phrase_table, language_model)

>>> stack_decoder.translate(['niemand', 'erwartet', 'die', 'spanische', 'inquisition', '!'])
['nobody', 'expects', 'the', 'spanish', 'inquisition', '!']

Static Method	`valid_phrases`	Extract phrases from `all_phrases_from` that contain words that have not been translated by `hypothesis`
Method	`__init__`	No summary
Method	`compute_future_scores`	Determines the approximate scores for translating every subsequence in `src_sentence`
Method	`distortion_factor.setter`	Undocumented
Method	`distortion_score`	Undocumented
Method	`expansion_score`	Calculate the score of expanding `hypothesis` with `translation_option`
Method	`find_all_src_phrases`	Finds all subsequences in src_sentence that have a phrase translation in the translation table
Method	`future_score`	Determines the approximate score for translating the untranslated words in `hypothesis`
Method	`translate`	No summary
Instance Variable	`beam_threshold`	Hypotheses that score below this factor of the best hypothesis in a stack are dropped from consideration. Value between 0.0 and 1.0.
Instance Variable	`language_model`	Undocumented
Instance Variable	`phrase_table`	Undocumented
Instance Variable	`stack_size`	Maximum number of hypotheses to consider in a stack. Higher values increase the likelihood of a good translation, but also increase processing time.
Instance Variable	`word_penalty`	Influences the translation length exponentially. If positive, shorter translations are preferred. If negative, longer translations are preferred. If zero, no penalty is applied.
Property	`distortion_factor`	Amount of reordering of source phrases. Lower values favour monotone translation, suitable when word order is similar for both source and target languages. Value between 0.0 and 1.0. Default 0.5.
Method	`__compute_log_distortion`	Undocumented
Instance Variable	`__distortion_factor`	Undocumented
Instance Variable	`__log_distortion_factor`	Undocumented

@staticmethod
def valid_phrases(all_phrases_from, hypothesis):

Extract phrases from all_phrases_from that contain words that have not been translated by hypothesis

Parameters
all_phrases_from:list(list(int))	Phrases represented by their spans, in the same format as the return value of `find_all_src_phrases`
hypothesis:_Hypothesis	Undocumented
Returns
list(tuple(int, int))	A list of phrases, represented by their spans, that cover untranslated positions.
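
The filtering that `valid_phrases` describes can be sketched without NLTK's internal `_Hypothesis` class. This minimal sketch assumes the hypothesis has already been reduced to a list of untranslated `(start, end)` spans; the `untranslated_spans` argument is a hypothetical stand-in, not part of the documented signature. A phrase is kept only if it fits entirely inside one untranslated span, and the ascending order of end positions lets the inner loop stop early.

```python
def valid_phrases(all_phrases_from, untranslated_spans):
    """Return (start, end) spans of phrases lying wholly in untranslated spans.

    all_phrases_from: list where all_phrases_from[i] holds the ascending,
        exclusive end positions of phrases starting at position i.
    untranslated_spans: hypothetical stand-in for the hypothesis; a list of
        (start, end) spans (end exclusive) that are not yet translated.
    """
    result = []
    for span_start, span_end in untranslated_spans:
        for start in range(span_start, span_end):
            for end in all_phrases_from[start]:
                if end > span_end:
                    # End positions are ascending, so later ones overflow too.
                    break
                result.append((start, end))
    return result


# Phrases for a 4-word sentence (end exclusive): 0-1, 0-2, 1-2, 2-3, 2-4, 3-4.
all_phrases_from = [[1, 2], [2], [3, 4], [4]]
# Suppose position 1 is already translated: the untranslated spans are 0-1 and 2-4.
print(valid_phrases(all_phrases_from, [(0, 1), (2, 4)]))
# [(0, 1), (2, 3), (2, 4), (3, 4)]
```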

def __init__(self, phrase_table, language_model):

Parameters
phrase_table:PhraseTable	Table of translations for source language phrases and the log probabilities for those translations.
language_model:object	Target language model. Must define a `probability_change` method that calculates the change in log probability of a sentence, if a given string is appended to it. This interface is experimental and will likely be replaced with nltk.model once it is implemented.
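
Any object with the right methods can serve as the language model. The toy class below is a hypothetical illustration of that interface, not an NLTK API: it is a bigram model backed by a plain dictionary, and it also defines `probability`, as the class example above does. What the decoder actually passes as `context` is not documented here, so treating it as the list of target words emitted so far is an assumption; an adapter for the real decoder may need to extract those words from whatever object it receives.

```python
import math


class BigramLanguageModel:
    """Hypothetical toy model exposing probability_change and probability.

    bigram_logprob maps (previous_word, word) to a log probability; unseen
    bigrams receive a large negative score, mirroring the -999.0 default
    used in the class example.
    """

    def __init__(self, bigram_logprob, unseen=-999.0):
        self.bigram_logprob = bigram_logprob
        self.unseen = unseen

    def _score(self, previous, phrase):
        total = 0.0
        for word in phrase:
            total += self.bigram_logprob.get((previous, word), self.unseen)
            previous = word
        return total

    def probability_change(self, context, phrase):
        # Assumption: context is the sequence of target words emitted so far.
        previous = context[-1] if context else "<s>"
        return self._score(previous, phrase)

    def probability(self, phrase):
        # Score a phrase in isolation, starting from a sentence-start marker.
        return self._score("<s>", phrase)


lm = BigramLanguageModel({("<s>", "nobody"): math.log(0.5),
                          ("nobody", "expects"): math.log(0.4)})
print(lm.probability_change(["nobody"], ("expects",)))  # equals math.log(0.4)
```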

def compute_future_scores(self, src_sentence):

Determines the approximate scores for translating every subsequence in src_sentence

Future scores can be used as a look-ahead to determine the difficulty of translating the remaining parts of src_sentence.

Parameters
src_sentence:tuple(str)	Undocumented
Returns
dict(int: (dict(int): float))	Scores of subsequences referenced by their start and end positions. For example, result[2][5] is the score of the subsequence covering positions 2, 3, and 4.
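
A table like this is naturally filled by dynamic programming over span length: a span's score is either the best direct translation of the whole span or the best sum of two adjacent, already-scored sub-spans. The sketch below assumes a hypothetical `best_phrase_score` dict mapping a source phrase (a tuple of words) to its best combined translation and language model log score; it illustrates the idea and is not NLTK's code.

```python
from collections import defaultdict


def compute_future_scores(src_sentence, best_phrase_score):
    """Return scores[start][end] (end exclusive); -inf means untranslatable."""
    scores = defaultdict(lambda: defaultdict(lambda: float("-inf")))
    n = len(src_sentence)
    for length in range(1, n + 1):
        for start in range(0, n - length + 1):
            end = start + length
            phrase = tuple(src_sentence[start:end])
            if phrase in best_phrase_score:
                scores[start][end] = best_phrase_score[phrase]
            # Splitting into two shorter spans, scored in earlier passes, may do better.
            for mid in range(start + 1, end):
                combined = scores[start][mid] + scores[mid][end]
                if combined > scores[start][end]:
                    scores[start][end] = combined
    return scores


best = {("niemand",): -1.0, ("erwartet",): -2.0, ("niemand", "erwartet"): -4.0}
table = compute_future_scores(("niemand", "erwartet"), best)
print(table[0][2])  # -3.0: the split (-1.0 + -2.0) beats the direct phrase (-4.0)
```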

@distortion_factor.setter
def distortion_factor(self, d):

Undocumented

def distortion_score(self, hypothesis, next_src_phrase_span):

Undocumented

def expansion_score(self, hypothesis, translation_option, src_phrase_span):

Calculate the score of expanding hypothesis with translation_option

Parameters
hypothesis:_Hypothesis	Hypothesis being expanded
translation_option:PhraseTableEntry	Information about the proposed expansion
src_phrase_span:tuple(int, int)	Word position span of the source phrase
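
Because every component is a log probability, expanding a hypothesis amounts to adding terms. The sketch below shows one plausible combination, assuming the usual phrase-based model: the score so far, plus the phrase's translation log probability, plus the language model change, plus a distortion term, minus `word_penalty` times the number of target words. All arguments are plain numbers or sequences standing in for NLTK's internal objects; the parameter names are hypothetical, and the exact combination NLTK uses should be checked against its source.

```python
import math


def expansion_score(raw_score, translation_log_prob, lm_change,
                    distortion, trg_phrase, word_penalty=0.0):
    """Hypothetical stand-alone version of the expansion score.

    All inputs are log-domain numbers except trg_phrase, whose length
    drives the word penalty.
    """
    score = raw_score
    score += translation_log_prob  # phrase table log probability
    score += lm_change             # language model log-probability change
    score += distortion            # reordering penalty (zero or negative)
    score -= word_penalty * len(trg_phrase)
    return score


# Extending "nobody" with "expects": probabilities multiply, so logs add.
s = expansion_score(raw_score=math.log(0.8) + math.log(0.5),
                    translation_log_prob=math.log(0.8),
                    lm_change=math.log(0.4),
                    distortion=0.0,
                    trg_phrase=("expects",))
print(math.isclose(s, math.log(0.8 * 0.5 * 0.8 * 0.4)))  # True
```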

def find_all_src_phrases(self, src_sentence):

Finds all subsequences in src_sentence that have a phrase translation in the translation table

Parameters
src_sentence:tuple(str)	Undocumented
Returns
list(list(int))	Subsequences that have a phrase translation, represented as a table of lists of end positions. For example, if result[2] is [5, 6, 9], then there are three phrases starting from position 2 in `src_sentence`, ending at positions 5, 6, and 9 (exclusive). Each list of end positions is in ascending order.
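
The return format can be produced by a straightforward scan over every span. In this sketch, a hypothetical set `known_phrases` of source tuples stands in for the phrase table. Because the inner loop visits end positions in increasing order, each list comes out ascending without an explicit sort. This brute-force version is quadratic in sentence length; a real implementation might stop earlier, for instance by bounding phrase length.

```python
def find_all_src_phrases(src_sentence, known_phrases):
    """Return a list whose i-th entry holds the exclusive end positions of
    every phrase in known_phrases that starts at position i.

    known_phrases: hypothetical set of source phrases (tuples of words)
    standing in for the phrase table.
    """
    n = len(src_sentence)
    result = [[] for _ in range(n)]
    for start in range(n):
        for end in range(start + 1, n + 1):
            if tuple(src_sentence[start:end]) in known_phrases:
                result[start].append(end)  # ends are visited in ascending order
    return result


phrases = {("niemand",), ("erwartet",), ("niemand", "erwartet")}
print(find_all_src_phrases(["niemand", "erwartet", "!"], phrases))
# [[1, 2], [2], []]
```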

def future_score(self, hypothesis, future_score_table, sentence_length):

Determines the approximate score for translating the untranslated words in hypothesis
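
Given a table in the format returned by `compute_future_scores`, the future score of a hypothesis plausibly reduces to adding up the scores of its untranslated spans, since those spans are translated independently of each other. This sketch takes the untranslated spans directly as a hypothetical `untranslated_spans` argument; the documented method presumably derives them from `hypothesis` and `sentence_length`.

```python
def future_score(untranslated_spans, future_score_table):
    """Sum the precomputed future scores of every untranslated (start, end) span."""
    return sum(future_score_table[start][end] for start, end in untranslated_spans)


# Hypothetical table of scores[start][end] for a 3-word sentence.
table = {0: {1: -1.0}, 1: {2: -2.0, 3: -2.5}, 2: {3: -0.5}}
print(future_score([(1, 3)], table))  # -2.5
```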

def translate(self, src_sentence):

Parameters
src_sentence:list(str)	Sentence to be translated
Returns
list(str)	Translated sentence
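
The decoding loop itself is not documented here. The following self-contained toy decoder is written in the same spirit for illustration, not copied from NLTK: hypotheses are grouped into stacks by how many source words they cover, each stack is cut to `stack_size` before expansion, and each expansion adds the translation log probability, a language model term, a distance-based distortion penalty and the word penalty. The `Hyp` structure and `stack_translate` function are hypothetical, the language model is reduced to a callable that scores a target phrase in isolation, and future-cost estimation, `beam_threshold` pruning and hypothesis recombination are left out, so its search behaviour will differ from `StackDecoder.translate`.

```python
import math
from typing import NamedTuple


class Hyp(NamedTuple):
    """One partial translation (hypothetical, not NLTK's _Hypothesis)."""
    score: float        # accumulated log-domain score
    covered: frozenset  # source positions already translated
    last_end: int       # exclusive end of the last translated source phrase
    output: tuple       # target words produced so far


def stack_translate(src, phrase_table, lm, stack_size=100,
                    distortion_factor=0.5, word_penalty=0.0):
    """phrase_table maps a source tuple to [(target tuple, log_prob), ...]."""
    n = len(src)
    log_d = math.log(distortion_factor) if distortion_factor > 0 else math.log(1e-9)
    stacks = [[] for _ in range(n + 1)]  # stacks[k]: hypotheses covering k words
    stacks[0].append(Hyp(0.0, frozenset(), 0, ()))
    for k in range(n):
        # Keep only the best stack_size hypotheses before expanding them.
        stacks[k] = sorted(stacks[k], key=lambda h: h.score, reverse=True)[:stack_size]
        for hyp in stacks[k]:
            for start in range(n):
                for end in range(start + 1, n + 1):
                    if end - 1 in hyp.covered:
                        break  # this span and every longer one overlap
                    for trg, log_prob in phrase_table.get(tuple(src[start:end]), []):
                        score = (hyp.score + log_prob + lm(trg)
                                 + abs(start - hyp.last_end) * log_d
                                 - word_penalty * len(trg))
                        new = Hyp(score, hyp.covered | set(range(start, end)),
                                  end, hyp.output + trg)
                        stacks[k + end - start].append(new)
    if not stacks[n]:
        return None  # no complete translation was found
    return list(max(stacks[n], key=lambda h: h.score).output)


log = math.log
phrase_table = {
    ("niemand",): [(("nobody",), log(0.8)), (("no", "one"), log(0.2))],
    ("erwartet",): [(("expects",), log(0.8)), (("expecting",), log(0.2))],
    ("niemand", "erwartet"): [(("one", "does", "not", "expect"), log(0.1))],
    ("die", "spanische", "inquisition"): [(("the", "spanish", "inquisition"), log(0.8))],
    ("!",): [(("!",), log(0.8))],
}
lm_scores = {("nobody",): log(0.5), ("expects",): log(0.4),
             ("the", "spanish", "inquisition"): log(0.2), ("!",): log(0.1)}
result = stack_translate(["niemand", "erwartet", "die", "spanische", "inquisition", "!"],
                         phrase_table, lambda trg: lm_scores.get(trg, -999.0))
print(result)  # ['nobody', 'expects', 'the', 'spanish', 'inquisition', '!']
```

With this phrase table, any reordering keeps the same phrase scores but adds a negative distortion term, so the monotone path wins, matching the class example's output.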

beam_threshold: float

float: Hypotheses that score below this factor of the best hypothesis in a stack are dropped from consideration. Value between 0.0 and 1.0.
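
Since hypothesis scores are log probabilities, "below this factor of the best" turns into an additive cutoff: best * threshold in probability space is log(best) + log(threshold) in log space. The sketch below applies that cutoff to a plain list of scores; `threshold_prune` is a hypothetical helper, and treating 0.0 as "keep everything" (which sidesteps log(0)) is an assumption about the boundary case.

```python
import math


def threshold_prune(scores, beam_threshold):
    """Keep log-domain scores within a beam_threshold factor of the best one."""
    if not scores:
        return []
    if beam_threshold <= 0.0:
        return list(scores)  # assumption: a zero threshold disables pruning
    cutoff = max(scores) + math.log(beam_threshold)
    return [s for s in scores if s >= cutoff]


scores = [math.log(0.5), math.log(0.2), math.log(0.01)]
print(len(threshold_prune(scores, 0.1)))  # 2: only 0.01 is below 0.5 * 0.1
```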

language_model

Undocumented

phrase_table

Undocumented

stack_size: int

int: Maximum number of hypotheses to consider in a stack. Higher values increase the likelihood of a good translation, but also increase processing time.
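
Capping a stack at a fixed number of entries is often called histogram pruning: only the top-scoring hypotheses survive, regardless of how close the others are. A minimal sketch using `heapq.nlargest`, with hypotheses reduced to hypothetical `(score, label)` pairs:

```python
import heapq


def histogram_prune(hypotheses, stack_size):
    """Keep the stack_size highest-scoring (score, label) pairs, best first."""
    return heapq.nlargest(stack_size, hypotheses, key=lambda h: h[0])


stack = [(-1.2, "nobody"), (-5.0, "no one"), (-2.3, "one does not expect")]
print(histogram_prune(stack, 2))
# [(-1.2, 'nobody'), (-2.3, 'one does not expect')]
```

A larger `stack_size` keeps more alternatives alive for later expansion, which is where both the better translations and the extra processing time come from.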

word_penalty: float

float: Influences the translation length exponentially. If positive, shorter translations are preferred. If negative, longer translations are preferred. If zero, no penalty is applied.
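
The "exponentially" follows if, as assumed here, the penalty is subtracted once per target word in log space: that multiplies the translation's probability by exp(-word_penalty) for every word. The hypothetical helper below compares a short and a long candidate with the same base score, so only the penalty decides which one ranks higher.

```python
import math


def apply_word_penalty(log_score, trg_words, word_penalty):
    """Subtract word_penalty once per target word from a log-domain score."""
    return log_score - word_penalty * len(trg_words)


short = ("nobody", "expects")
long_ = ("one", "does", "not", "expect")
for wp in (0.5, -0.5):
    s_short = apply_word_penalty(-3.0, short, wp)
    s_long = apply_word_penalty(-3.0, long_, wp)
    winner = "short" if s_short > s_long else "long"
    # In probability space the penalty is a factor of exp(-wp) per word.
    ratio_ok = math.isclose(math.exp(s_short - s_long),
                            math.exp(-wp) ** (len(short) - len(long_)))
    print(wp, winner, ratio_ok)
# 0.5 short True
# -0.5 long True
```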

@property
distortion_factor

float: Amount of reordering of source phrases. Lower values favour monotone translation, suitable when word order is similar for both source and target languages. Value between 0.0 and 1.0. Default 0.5.
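
One common way to realise a distortion factor d in [0, 1] is a distance-based penalty of d^|jump| in probability space, that is |jump| * log(d) in log space, where the jump is the gap between the end of the previous source phrase and the start of the next. That matches the description above (lower d penalises reordering more heavily) and would account for caching a log of the factor in the private `__log_distortion_factor` attribute. The sketch assumes this model and clamps d = 0.0 to a tiny positive value so that `math.log` does not fail; both are assumptions rather than documented behaviour.

```python
import math


def log_distortion(distortion_factor):
    """Log of the factor; a zero factor is clamped (assumption) to avoid log(0)."""
    return math.log(distortion_factor) if distortion_factor > 0 else math.log(1e-9)


def distortion_score(prev_src_end, next_src_start, distortion_factor):
    """Log-domain penalty proportional to how far the decoder jumps in the source."""
    return abs(next_src_start - prev_src_end) * log_distortion(distortion_factor)


# A monotone step (no jump) costs nothing; a 3-word jump costs 3 * log(d),
# which is a larger penalty for d = 0.5 than for d = 0.9.
print(distortion_score(2, 5, 0.5) < distortion_score(2, 5, 0.9))  # True
print(distortion_score(2, 2, 0.5) == 0.0)                         # True
```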

def __compute_log_distortion(self):

Undocumented

__distortion_factor

Undocumented

__log_distortion_factor

Undocumented