class IBMModel2(IBMModel):
Constructor: IBMModel2(sentence_aligned_corpus, iterations, probability_tables)
Lexical translation model that considers word order
>>> from nltk.translate import AlignedSent, IBMModel2

>>> bitext = []
>>> bitext.append(AlignedSent(['klein', 'ist', 'das', 'haus'], ['the', 'house', 'is', 'small']))
>>> bitext.append(AlignedSent(['das', 'haus', 'ist', 'ja', 'groß'], ['the', 'house', 'is', 'big']))
>>> bitext.append(AlignedSent(['das', 'buch', 'ist', 'ja', 'klein'], ['the', 'book', 'is', 'small']))
>>> bitext.append(AlignedSent(['das', 'haus'], ['the', 'house']))
>>> bitext.append(AlignedSent(['das', 'buch'], ['the', 'book']))
>>> bitext.append(AlignedSent(['ein', 'buch'], ['a', 'book']))

>>> ibm2 = IBMModel2(bitext, 5)

>>> print(round(ibm2.translation_table['buch']['book'], 3))
1.0
>>> print(round(ibm2.translation_table['das']['book'], 3))
0.0
>>> print(round(ibm2.translation_table['buch'][None], 3))
0.0
>>> print(round(ibm2.translation_table['ja'][None], 3))
0.0

>>> print(ibm2.alignment_table[1][1][2][2])
0.938...
>>> print(round(ibm2.alignment_table[1][2][2][2], 3))
0.0
>>> print(round(ibm2.alignment_table[2][2][4][5], 3))
1.0

>>> test_sentence = bitext[2]
>>> test_sentence.words
['das', 'buch', 'ist', 'ja', 'klein']
>>> test_sentence.mots
['the', 'book', 'is', 'small']
>>> test_sentence.alignment
Alignment([(0, 0), (1, 1), (2, 2), (3, 2), (4, 3)])
Method | __init__ | Train on sentence_aligned_corpus and create a lexical translation model and an alignment model. |
Method | align | Determines the best word alignment for one sentence pair from the corpus that the model was trained on. |
Method | align_all | Undocumented |
Method | maximize_alignment_probabilities | Undocumented |
Method | prob_alignment_point | Probability that position j in trg_sentence is aligned to position i in the src_sentence |
Method | prob_all_alignments | Computes the probability of all possible word alignments, expressed as a marginal distribution over target words t |
Method | prob_t_a_given_s | Probability of target sentence and an alignment given the source sentence |
Method | set_uniform_probabilities | Undocumented |
Method | train | Undocumented |
Instance Variable | alignment_table | Undocumented |
Instance Variable | translation_table | Undocumented |
Train on sentence_aligned_corpus and create a lexical translation model and an alignment model.
Translation direction is from AlignedSent.mots to AlignedSent.words.
Parameters | |
sentence_aligned_corpus | Sentence-aligned parallel corpus |
iterations:int | Number of iterations to run training algorithm |
probability_tables | Optional. Use this to pass in custom probability values. If not specified, probabilities will be set to a uniform distribution, or some other sensible value. If specified, all the following entries must be present: translation_table, alignment_table. See IBMModel for the type and purpose of these tables. |
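The probability_tables hook can be used to seed a model with externally prepared values. The following is a minimal sketch, not part of the original documentation: it assumes the bitext from the example above and reuses the tables of a model trained in a first pass (the names first_pass, seed_tables, and ibm2_seeded are illustrative only).

>>> first_pass = IBMModel2(bitext, 5)
>>> # Sketch: reuse the learned tables as custom starting probabilities
>>> # for a second model. Both entries must be present.
>>> seed_tables = {
...     'translation_table': first_pass.translation_table,
...     'alignment_table': first_pass.alignment_table,
... }
>>> ibm2_seeded = IBMModel2(bitext, 2, seed_tables)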
Determines the best word alignment for one sentence pair from the corpus that the model was trained on.
The best alignment will be set in sentence_pair when the method returns. In contrast with the internal implementation of IBM models, the word indices in the Alignment are zero-indexed, not one-indexed.
Parameters | |
sentence_pair | A sentence in the source language and its counterpart sentence in the target language |
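Continuing the doctest above, a rough sketch of calling align directly on one of the training pairs: the method writes its result into sentence_pair.alignment as a side effect and returns nothing, and the expected output simply repeats what the class example already shows for this pair.

>>> pair = bitext[2]
>>> ibm2.align(pair)  # writes the best alignment into pair.alignment
>>> pair.alignment
Alignment([(0, 0), (1, 1), (2, 2), (3, 2), (4, 3)])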
Computes the probability of all possible word alignments, expressed as a marginal distribution over target words t
Each entry in the return value represents the contribution to the total alignment probability by the target word t.
To obtain probability(alignment | src_sentence, trg_sentence), simply sum the entries in the return value.
Returns | |
dict(str): float | Probability of t for all s in src_sentence |
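A rough sketch of how the return value can be used, continuing the doctest above. The padding below is an assumption about the model's internal one-indexed representation (source position 0 is the NULL word; target position 0 is a placeholder that is never read), not part of the original documentation. Summing the entries then gives probability(alignment | src_sentence, trg_sentence), as stated above.

>>> # Assumed padding for the internal one-indexed representation.
>>> src_sentence = [None] + bitext[2].mots    # position 0: NULL word
>>> trg_sentence = [None] + bitext[2].words   # position 0: unused placeholder
>>> marginals = ibm2.prob_all_alignments(src_sentence, trg_sentence)
>>> total_prob = sum(marginals.values())      # probability(alignment | src, trg)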