class documentation
class IBMModel4(IBMModel):
Constructor: IBMModel4(sentence_aligned_corpus, iterations, source_word_classes, target_word_classes, probability_tables)
Translation model that reorders output words based on their word class and their distance from other related words in the output sentence.
>>> from nltk.translate import AlignedSent, IBMModel4
>>> bitext = []
>>> bitext.append(AlignedSent(['klein', 'ist', 'das', 'haus'], ['the', 'house', 'is', 'small']))
>>> bitext.append(AlignedSent(['das', 'haus', 'war', 'ja', 'groß'], ['the', 'house', 'was', 'big']))
>>> bitext.append(AlignedSent(['das', 'buch', 'ist', 'ja', 'klein'], ['the', 'book', 'is', 'small']))
>>> bitext.append(AlignedSent(['ein', 'haus', 'ist', 'klein'], ['a', 'house', 'is', 'small']))
>>> bitext.append(AlignedSent(['das', 'haus'], ['the', 'house']))
>>> bitext.append(AlignedSent(['das', 'buch'], ['the', 'book']))
>>> bitext.append(AlignedSent(['ein', 'buch'], ['a', 'book']))
>>> bitext.append(AlignedSent(['ich', 'fasse', 'das', 'buch', 'zusammen'], ['i', 'summarize', 'the', 'book']))
>>> bitext.append(AlignedSent(['fasse', 'zusammen'], ['summarize']))
>>> src_classes = {'the': 0, 'a': 0, 'small': 1, 'big': 1, 'house': 2, 'book': 2, 'is': 3, 'was': 3, 'i': 4, 'summarize': 5 }
>>> trg_classes = {'das': 0, 'ein': 0, 'haus': 1, 'buch': 1, 'klein': 2, 'groß': 2, 'ist': 3, 'war': 3, 'ja': 4, 'ich': 5, 'fasse': 6, 'zusammen': 6 }

>>> ibm4 = IBMModel4(bitext, 5, src_classes, trg_classes)

>>> print(round(ibm4.translation_table['buch']['book'], 3))
1.0
>>> print(round(ibm4.translation_table['das']['book'], 3))
0.0
>>> print(round(ibm4.translation_table['ja'][None], 3))
1.0

>>> print(round(ibm4.head_distortion_table[1][0][1], 3))
1.0
>>> print(round(ibm4.head_distortion_table[2][0][1], 3))
0.0
>>> print(round(ibm4.non_head_distortion_table[3][6], 3))
0.5

>>> print(round(ibm4.fertility_table[2]['summarize'], 3))
1.0
>>> print(round(ibm4.fertility_table[1]['book'], 3))
1.0

>>> print(ibm4.p1)
0.033...

>>> test_sentence = bitext[2]
>>> test_sentence.words
['das', 'buch', 'ist', 'ja', 'klein']
>>> test_sentence.mots
['the', 'book', 'is', 'small']
>>> test_sentence.alignment
Alignment([(0, 0), (1, 1), (2, 2), (3, None), (4, 3)])
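The alignment printed at the end of the example can be read as pairs of (index into words, index into mots), with None marking a word aligned to NULL. The following is a minimal sketch in plain Python with no NLTK dependency; pair_words is a hypothetical helper written for illustration, not part of NLTK, and the data is copied from the doctest output.

```python
def pair_words(words, mots, alignment):
    """Return (word, mot) pairs; "NULL" stands in where the mots index is None."""
    pairs = []
    for i, j in alignment:
        mot = "NULL" if j is None else mots[j]
        pairs.append((words[i], mot))
    return pairs


# Data copied from the doctest output for bitext[2].
words = ['das', 'buch', 'ist', 'ja', 'klein']
mots = ['the', 'book', 'is', 'small']
alignment = [(0, 0), (1, 1), (2, 2), (3, None), (4, 3)]

aligned_pairs = pair_words(words, mots, alignment)
for word, mot in aligned_pairs:
    print(f"{word} -> {mot}")
```

The pair (3, None) lines up with translation_table['ja'][None] being 1.0 in the example: 'ja' has no English counterpart in this corpus.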
Static Method | model4_prob_t_a_given_s | Undocumented |
Method | __init__ | Train on sentence_aligned_corpus and create a lexical translation model, distortion models, a fertility model, and a model for generating NULL-aligned words. |
Method | maximize_distortion_probabilities | Undocumented |
Method | prob_t_a_given_s | Probability of target sentence and an alignment given the source sentence |
Method | reset_probabilities | Undocumented |
Method | set_uniform_probabilities | Set distortion probabilities uniformly to 1 / cardinality of displacement values |
Method | train | Undocumented |
Instance Variable | alignment_table | Undocumented |
Instance Variable | fertility_table | Undocumented |
Instance Variable | head_distortion_table | dict[int][int][int]: float. Probability(displacement of head word | word class of previous cept, target word class). Values accessed as head_distortion_table[dj][src_class][trg_class]. |
Instance Variable | non_head_distortion_table | dict[int][int]: float. Probability(displacement of non-head word | target word class). Values accessed as non_head_distortion_table[dj][trg_class]. |
Instance Variable | p1 | Undocumented |
Instance Variable | src_classes | Undocumented |
Instance Variable | translation_table | Undocumented |
Instance Variable | trg_classes | Undocumented |
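The nested layout of the two distortion tables, and the uniform value of 1 / cardinality of displacement values, can be sketched as follows. This is an illustration under stated assumptions rather than the library's implementation: it assumes displacements are the nonzero integers between -(m - 1) and m - 1 for a longest target sentence of length m, and uniform_distortion_tables is a hypothetical name.

```python
from collections import defaultdict


def uniform_distortion_tables(max_target_length, src_class_ids, trg_class_ids):
    """Build uniformly initialized head and non-head distortion tables.

    Assumption (not taken from the source): displacements dj are the
    nonzero integers in [-(max_target_length - 1), max_target_length - 1].
    """
    displacements = [
        dj
        for dj in range(-(max_target_length - 1), max_target_length)
        if dj != 0
    ]
    if not displacements:
        raise ValueError("need a target length of at least 2")
    prob = 1.0 / len(displacements)

    # head[dj][src_class][trg_class] and non_head[dj][trg_class]
    head = defaultdict(lambda: defaultdict(dict))
    non_head = defaultdict(dict)
    for dj in displacements:
        for trg_class in trg_class_ids:
            non_head[dj][trg_class] = prob
            for src_class in src_class_ids:
                head[dj][src_class][trg_class] = prob
    return head, non_head


head, non_head = uniform_distortion_tables(5, {0, 1}, {0, 1, 2})
print(head[1][0][1])   # head word: displacement 1, source class 0, target class 1
print(non_head[3][2])  # non-head word: displacement 3, target class 2
```

Indexing by displacement first mirrors the access pattern in the descriptions above: the outer key is dj, followed by the word classes the probability is conditioned on.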
def __init__(self, sentence_aligned_corpus, iterations, source_word_classes, target_word_classes, probability_tables=None):
Train on sentence_aligned_corpus and create a lexical translation model, distortion models, a fertility model, and a model for generating NULL-aligned words.
Translation direction is from AlignedSent.mots to AlignedSent.words.
Parameters | |
sentence_aligned_corpus | Sentence-aligned parallel corpus |
iterations:int | Number of iterations to run training algorithm |
source_word_classes | Lookup table that maps a source word to its word class, the latter represented by an integer id |
target_word_classes | Lookup table that maps a target word to its word class, the latter represented by an integer id |
probability_tables | Optional. Use this to pass in custom probability values. If not specified, probabilities will be set to a uniform distribution, or some other sensible value. If specified, all the following entries must be present: translation_table, alignment_table, fertility_table, p1, head_distortion_table, non_head_distortion_table. See IBMModel and IBMModel4 for the type and purpose of these tables. |
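Because every listed entry must be present when probability_tables is supplied, it can help to check a dictionary before passing it in. The snippet below is a minimal sketch of such a pre-check; missing_tables and REQUIRED_TABLES are hypothetical names, and whether the constructor performs its own validation is not stated here.

```python
# Entry names copied from the probability_tables parameter description.
REQUIRED_TABLES = (
    "translation_table",
    "alignment_table",
    "fertility_table",
    "p1",
    "head_distortion_table",
    "non_head_distortion_table",
)


def missing_tables(probability_tables):
    """Return the required entry names absent from probability_tables."""
    return [name for name in REQUIRED_TABLES if name not in probability_tables]


# A deliberately incomplete dictionary, for illustration only.
partial = {"translation_table": {}, "p1": 0.5}
print(missing_tables(partial))
```

An empty list means all six entries are present; the check says nothing about whether the tables themselves have the expected nested structure.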
dict[int][int][int]: float. Probability(displacement of head word | word class of previous cept, target word class). Values accessed as head_distortion_table[dj][src_class][trg_class].