class documentation
class StackDecoder(object):
Constructor: StackDecoder(phrase_table, language_model)
Phrase-based stack decoder for machine translation
>>> from math import log
>>> from nltk.translate import PhraseTable, StackDecoder
>>> phrase_table = PhraseTable()
>>> phrase_table.add(('niemand',), ('nobody',), log(0.8))
>>> phrase_table.add(('niemand',), ('no', 'one'), log(0.2))
>>> phrase_table.add(('erwartet',), ('expects',), log(0.8))
>>> phrase_table.add(('erwartet',), ('expecting',), log(0.2))
>>> phrase_table.add(('niemand', 'erwartet'), ('one', 'does', 'not', 'expect'), log(0.1))
>>> phrase_table.add(('die', 'spanische', 'inquisition'), ('the', 'spanish', 'inquisition'), log(0.8))
>>> phrase_table.add(('!',), ('!',), log(0.8))

>>> # nltk.model should be used here once it is implemented
>>> from collections import defaultdict
>>> language_prob = defaultdict(lambda: -999.0)
>>> language_prob[('nobody',)] = log(0.5)
>>> language_prob[('expects',)] = log(0.4)
>>> language_prob[('the', 'spanish', 'inquisition')] = log(0.2)
>>> language_prob[('!',)] = log(0.1)
>>> language_model = type('',(object,),{'probability_change': lambda self, context, phrase: language_prob[phrase], 'probability': lambda self, phrase: language_prob[phrase]})()

>>> stack_decoder = StackDecoder(phrase_table, language_model)

>>> stack_decoder.translate(['niemand', 'erwartet', 'die', 'spanische', 'inquisition', '!'])
['nobody', 'expects', 'the', 'spanish', 'inquisition', '!']
Static Method | valid | Extract phrases from all_phrases_from that contain words that have not been translated by hypothesis |
Method | __init__ | No summary |
Method | compute | Determines the approximate scores for translating every subsequence in src_sentence |
Method | distortion | Undocumented |
Method | distortion | Undocumented |
Method | expansion | Calculate the score of expanding hypothesis with translation_option |
Method | find | Finds all subsequences in src_sentence that have a phrase translation in the translation table |
Method | future | Determines the approximate score for translating the untranslated words in hypothesis |
Method | translate | No summary |
Instance Variable | beam | Hypotheses that score below this factor of the best hypothesis in a stack are dropped from consideration. Value between 0.0 and 1.0. |
Instance Variable | language | Undocumented |
Instance Variable | phrase | Undocumented |
Instance Variable | stack | Maximum number of hypotheses to consider in a stack. Higher values increase the likelihood of a good translation, but increase processing time. |
Instance Variable | word | Influences the translation length exponentially. If positive, shorter translations are preferred. If negative, longer translations are preferred. If zero, no penalty is applied. |
Property | distortion | Lower values favour monotone translation, suitable when word order is similar for both source and target languages. Value between 0.0 and 1.0. Default 0.5. |
Method | __compute | Undocumented |
Instance Variable | __distortion | Undocumented |
Instance Variable | __log | Undocumented |
Extract phrases from all_phrases_from that contain words that have not been translated by hypothesis
Parameters | |
all | Phrases represented by their spans, in the same format as the return value of find_all_src_phrases |
hypothesis:_Hypothesis | Undocumented |
Returns | |
list(tuple(int, int)) | A list of phrases, represented by their spans, that cover untranslated positions. |
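The filtering described above can be illustrated with a minimal sketch. It is not the library's implementation: the function name valid_phrase_spans is hypothetical, and instead of a hypothesis object it takes an assumed list of untranslated (start, end) gaps directly. The input table uses the format returned by find_all_src_phrases, where each start position maps to an ascending list of exclusive end positions.

```python
def valid_phrase_spans(all_phrases_from, untranslated_spans):
    """Return (start, end) spans of phrases lying inside untranslated gaps.

    all_phrases_from[start] lists, in ascending order, the exclusive end
    positions of phrases that begin at `start`.
    untranslated_spans is a hypothetical list of (start, end) gaps that the
    hypothesis has not covered yet.
    """
    valid = []
    for gap_start, gap_end in untranslated_spans:
        for start in range(gap_start, gap_end):
            for phrase_end in all_phrases_from[start]:
                if phrase_end > gap_end:
                    break  # later ends are larger still (ascending order)
                valid.append((start, phrase_end))
    return valid


all_phrases_from = [[1, 2], [2], [5], [], [], [6]]
spans = valid_phrase_spans(all_phrases_from, [(1, 5)])
```

Here positions 1 to 4 are still untranslated, so only the phrases spanning (1, 2) and (2, 5) qualify.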
Parameters | |
phrase | Table of translations for source language phrases and the log probabilities for those translations. |
language | Target language model. Must define a probability_change method that calculates the change in log probability of a sentence, if a given string is appended to it. This interface is experimental and will likely be replaced with nltk.model once it is implemented. |
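Because the language model interface is experimental, any object exposing the required methods can be passed in. The following is a minimal sketch of such an object, assuming (as in the class example) that log probabilities are looked up per phrase tuple and that unseen phrases receive a large negative value. The class name TableLanguageModel and the unseen default are hypothetical.

```python
from math import log


class TableLanguageModel:
    """Hypothetical stand-in language model backed by a lookup table."""

    def __init__(self, log_probs, unseen=-999.0):
        self.log_probs = dict(log_probs)
        self.unseen = unseen  # log probability assigned to unknown phrases

    def probability(self, phrase):
        # Log probability of a phrase given as a tuple of words.
        return self.log_probs.get(tuple(phrase), self.unseen)

    def probability_change(self, context, phrase):
        # Change in log probability when `phrase` is appended to `context`.
        # This toy model ignores the context entirely.
        return self.probability(phrase)


language_model = TableLanguageModel({
    ('nobody',): log(0.5),
    ('expects',): log(0.4),
})
```

A named class like this is easier to read and extend than an anonymous type, at the cost of a few more lines.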
Determines the approximate scores for translating every subsequence in src_sentence
Future scores can be used as a look-ahead to determine the difficulty of translating the remaining parts of a src_sentence.
Parameters | |
src | Undocumented |
Returns | |
dict(int: (dict(int): float)) | Scores of subsequences referenced by their start and end positions. For example, result[2][5] is the score of the subsequence covering positions 2, 3, and 4. |
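One way to build such a table is dynamic programming over span lengths: a span's score is the better of translating it as a single phrase or combining the scores of two adjacent sub-spans. The sketch below follows that idea under stated assumptions. The function name future_score_table and the phrase_scores input (a dict from spans to the best single-phrase log score) are hypothetical, and the actual method may compute its scores differently.

```python
from collections import defaultdict


def future_score_table(sentence_length, phrase_scores):
    """Best approximate score for every span [start, end) of a sentence.

    phrase_scores is a hypothetical dict mapping (start, end) spans to the
    best log score of translating that span as a single phrase.
    """
    scores = defaultdict(lambda: defaultdict(lambda: float('-inf')))
    for length in range(1, sentence_length + 1):
        for start in range(sentence_length - length + 1):
            end = start + length
            if (start, end) in phrase_scores:
                scores[start][end] = phrase_scores[(start, end)]
            # Try every split point and keep the better combination.
            for mid in range(start + 1, end):
                combined = scores[start][mid] + scores[mid][end]
                if combined > scores[start][end]:
                    scores[start][end] = combined
    return scores


table = future_score_table(
    3, {(0, 1): -1.0, (1, 2): -2.0, (0, 2): -2.5, (2, 3): -0.5}
)
```

The result is indexed as table[start][end], matching the result[2][5] convention in the return description.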
Calculate the score of expanding hypothesis with translation_option
Parameters | |
hypothesis:_Hypothesis | Hypothesis being expanded |
translation | Information about the proposed expansion |
src | Word position span of the source phrase |
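As a rough illustration of how such a score might be assembled, the sketch below sums log-domain components: the previous hypothesis score, the translation log probability, the language model change, a distortion term, and a word penalty. This combination is an assumption made for illustration, not a statement of the method's actual formula. The function name and every parameter are hypothetical, and distortion_factor must be greater than zero here.

```python
from math import log


def expansion_score_sketch(previous_score, translation_log_prob,
                           language_model_change, previous_src_end,
                           next_src_start, target_length,
                           distortion_factor=0.5, word_penalty=0.0):
    """Combine log-domain components into one score (illustrative only)."""
    # Jumping far from the previous source phrase is penalised; a
    # monotone continuation (distance 0) costs nothing.
    distance = abs(next_src_start - previous_src_end)
    distortion = distance * log(distortion_factor)
    # A positive word penalty lowers the score of longer outputs.
    length_penalty = word_penalty * target_length
    return (previous_score + translation_log_prob + language_model_change
            + distortion - length_penalty)


monotone = expansion_score_sketch(0.0, log(0.8), log(0.5), 0, 0, 1)
jump = expansion_score_sketch(0.0, log(0.8), log(0.5), 0, 3, 1)
```

Under these assumptions a lower distortion factor makes each reordering step cost more, which is consistent with lower values favouring monotone translation.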
Finds all subsequences in src_sentence that have a phrase translation in the translation table
Parameters | |
src | Undocumented |
Returns | |
list(list(int)) | Subsequences that have a phrase translation, represented as a table of lists of end positions. For example, if result[2] is [5, 6, 9], then there are three phrases starting from position 2 in src_sentence, ending at positions 5, 6, and 9 exclusive. The ending positions are listed in ascending order. |
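The return format can be made concrete with a small sketch. It assumes a plain set of source phrase tuples as a stand-in for a real phrase table lookup; the function name find_phrase_ends and the known_source_phrases parameter are hypothetical.

```python
def find_phrase_ends(src_sentence, known_source_phrases):
    """For each start position, list exclusive end positions of known phrases.

    known_source_phrases is a hypothetical set of word tuples standing in
    for a real phrase table lookup.
    """
    sentence_length = len(src_sentence)
    phrase_ends = [[] for _ in range(sentence_length)]
    for start in range(sentence_length):
        for end in range(start + 1, sentence_length + 1):
            if tuple(src_sentence[start:end]) in known_source_phrases:
                phrase_ends[start].append(end)  # ends stay in ascending order
    return phrase_ends


sentence = ['niemand', 'erwartet', 'die', 'spanische', 'inquisition', '!']
known = {
    ('niemand',),
    ('erwartet',),
    ('niemand', 'erwartet'),
    ('die', 'spanische', 'inquisition'),
    ('!',),
}
phrase_ends = find_phrase_ends(sentence, known)
```

For this sentence, phrase_ends[0] is [1, 2] because both ('niemand',) and ('niemand', 'erwartet') start at position 0.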
- float: Hypotheses that score below this factor of the best hypothesis in a stack are dropped from consideration. Value between 0.0 and 1.0.
- int: Maximum number of hypotheses to consider in a stack. Higher values increase the likelihood of a good translation, but increase processing time.
- float: Influences the translation length exponentially. If positive, shorter translations are preferred. If negative, longer translations are preferred. If zero, no penalty is applied.
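The two pruning settings can be sketched as follows, assuming hypothesis scores are log probabilities, so that "a factor of the best" becomes an additive margin of log(beam_threshold). The function name prune_stack and its plain list-of-floats input are hypothetical; the decoder's own stack handling may differ.

```python
from math import log


def prune_stack(scores, beam_threshold, stack_size):
    """Return the log scores that survive threshold and size pruning."""
    if not scores:
        return []
    ranked = sorted(scores, reverse=True)  # best (highest) score first
    if beam_threshold > 0.0:
        # "factor of the best" becomes an additive margin in log space.
        cutoff = ranked[0] + log(beam_threshold)
        ranked = [score for score in ranked if score >= cutoff]
    return ranked[:stack_size]


survivors = prune_stack([-1.0, -1.5, -5.0, -2.0],
                        beam_threshold=0.1, stack_size=2)
```

With a threshold of 0.1 the score -5.0 falls outside the beam, and the stack size of 2 then keeps only the two best remaining scores.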