class documentation
class IBMModel3(IBMModel):
Constructor: IBMModel3(sentence_aligned_corpus, iterations, probability_tables)
Translation model that considers how a word can be aligned to multiple words in another language.
>>> bitext = []
>>> bitext.append(AlignedSent(['klein', 'ist', 'das', 'haus'], ['the', 'house', 'is', 'small']))
>>> bitext.append(AlignedSent(['das', 'haus', 'war', 'ja', 'groß'], ['the', 'house', 'was', 'big']))
>>> bitext.append(AlignedSent(['das', 'buch', 'ist', 'ja', 'klein'], ['the', 'book', 'is', 'small']))
>>> bitext.append(AlignedSent(['ein', 'haus', 'ist', 'klein'], ['a', 'house', 'is', 'small']))
>>> bitext.append(AlignedSent(['das', 'haus'], ['the', 'house']))
>>> bitext.append(AlignedSent(['das', 'buch'], ['the', 'book']))
>>> bitext.append(AlignedSent(['ein', 'buch'], ['a', 'book']))
>>> bitext.append(AlignedSent(['ich', 'fasse', 'das', 'buch', 'zusammen'], ['i', 'summarize', 'the', 'book']))
>>> bitext.append(AlignedSent(['fasse', 'zusammen'], ['summarize']))
>>> ibm3 = IBMModel3(bitext, 5)
>>> print(round(ibm3.translation_table['buch']['book'], 3))
1.0
>>> print(round(ibm3.translation_table['das']['book'], 3))
0.0
>>> print(round(ibm3.translation_table['ja'][None], 3))
1.0
>>> print(round(ibm3.distortion_table[1][1][2][2], 3))
1.0
>>> print(round(ibm3.distortion_table[1][2][2][2], 3))
0.0
>>> print(round(ibm3.distortion_table[2][2][4][5], 3))
0.75
>>> print(round(ibm3.fertility_table[2]['summarize'], 3))
1.0
>>> print(round(ibm3.fertility_table[1]['book'], 3))
1.0
>>> print(ibm3.p1)
0.054...
>>> test_sentence = bitext[2]
>>> test_sentence.words
['das', 'buch', 'ist', 'ja', 'klein']
>>> test_sentence.mots
['the', 'book', 'is', 'small']
>>> test_sentence.alignment
Alignment([(0, 0), (1, 1), (2, 2), (3, None), (4, 3)])
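The alignment printed above can be read pair by pair: the first index points into words, the second into mots, and None marks a word aligned to NULL (here 'ja'). The following is a minimal sketch in plain Python, with the lists copied from the doctest rather than taken from a trained model. The names generated_by and fertility are illustrative only and are not part of the class.

```python
from collections import Counter

# Copied from the doctest above; no NLTK objects are needed for this sketch.
words = ['das', 'buch', 'ist', 'ja', 'klein']
mots = ['the', 'book', 'is', 'small']
alignment = [(0, 0), (1, 1), (2, 2), (3, None), (4, 3)]

# Map each word in `words` to the word in `mots` it is aligned to (None = NULL).
generated_by = {
    words[j]: (mots[i] if i is not None else None)
    for j, i in alignment
}

# Count how many words each `mots` word (or NULL) is aligned to in this sentence.
fertility = Counter(generated_by.values())

for word, source in generated_by.items():
    print(f"{word!r} <- {source!r}")
```

In this sentence every word in mots is aligned to exactly one word in words, and NULL accounts for the single remaining word.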
Method | __init__ | Train on sentence_aligned_corpus and create a lexical translation model, a distortion model, a fertility model, and a model for generating NULL-aligned words. |
Method | maximize | Undocumented |
Method | prob | Probability of target sentence and an alignment given the source sentence |
Method | reset | Undocumented |
Method | set | Undocumented |
Method | train | Undocumented |
Instance Variable | alignment_table | Undocumented |
Instance Variable | distortion_table | dict[int][int][int][int]: float. Probability(j | i,l,m). Values accessed as distortion_table[j][i][l][m]. |
Instance Variable | fertility_table | Undocumented |
Instance Variable | p1 | Undocumented |
Instance Variable | translation_table | Undocumented |
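The four-level indexing of distortion_table described above can be pictured as nested dictionaries. The sketch below uses only the standard library and is not the class's own code. It assumes that l and m are the lengths of the mots and words sentences, which is consistent with the distortion_table[2][2][4][5] lookup in the doctest. The MIN_PROB floor and the make_distortion_table helper are hypothetical names chosen for illustration.

```python
from collections import defaultdict

MIN_PROB = 1.0e-12  # assumed floor for entries never set (illustrative value)


def make_distortion_table():
    """Build a table indexed as table[j][i][l][m] -> float (hypothetical helper)."""
    return defaultdict(
        lambda: defaultdict(
            lambda: defaultdict(
                lambda: defaultdict(lambda: MIN_PROB)
            )
        )
    )


distortion_table = make_distortion_table()

# Store Probability(j | i, l, m) under the key order [j][i][l][m].
distortion_table[1][1][2][2] = 1.0
distortion_table[2][2][4][5] = 0.75

seen = distortion_table[2][2][4][5]    # value stored above
unseen = distortion_table[3][1][4][5]  # never set, falls back to MIN_PROB
print(seen, unseen)
```

A nested defaultdict keeps lookups for unseen index combinations from raising KeyError, at the cost of creating entries on read.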
Train on sentence_aligned_corpus and create a lexical translation model, a distortion model, a fertility model, and a model for generating NULL-aligned words.
Translation direction is from AlignedSent.mots to AlignedSent.words.
Parameters | |
sentence_aligned_corpus | Sentence-aligned parallel corpus |
iterations:int | Number of iterations to run the training algorithm |
probability_tables | Optional. Use this to pass in custom probability values. If not specified, probabilities are initialized to a uniform distribution or another sensible value. If specified, all of the following entries must be present: translation_table, alignment_table, fertility_table, p1, distortion_table. See IBMModel for the type and purpose of these tables. |
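Because probability_tables must contain all five entries when it is supplied, it can be useful to check a dictionary before passing it to the constructor. The sketch below is one way to do that. The helper missing_probability_tables and the placeholder values are hypothetical; only the five key names come from the description above.

```python
# Key names taken from the parameter description above.
REQUIRED_TABLES = (
    'translation_table',
    'alignment_table',
    'fertility_table',
    'p1',
    'distortion_table',
)


def missing_probability_tables(probability_tables):
    """Return the required entries absent from `probability_tables`, in order."""
    return [name for name in REQUIRED_TABLES if name not in probability_tables]


# Placeholder values only; real tables hold nested mappings of probabilities.
incomplete = {
    'translation_table': {},
    'alignment_table': {},
    'fertility_table': {},
    'p1': 0.5,
}
complete = dict(incomplete, distortion_table={})

print(missing_probability_tables(incomplete))
print(missing_probability_tables(complete))
```

An empty result means the dictionary has every required key; it says nothing about whether the values themselves are well formed.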