nltk.translate.ibm3.IBMModel3

class documentation

class IBMModel3(IBMModel):

Constructor: IBMModel3(sentence_aligned_corpus, iterations, probability_tables)

Translation model that considers how a single word can be aligned to multiple words in the other language.

>>> from nltk.translate import AlignedSent, IBMModel3
>>> bitext = []
>>> bitext.append(AlignedSent(['klein', 'ist', 'das', 'haus'], ['the', 'house', 'is', 'small']))
>>> bitext.append(AlignedSent(['das', 'haus', 'war', 'ja', 'groß'], ['the', 'house', 'was', 'big']))
>>> bitext.append(AlignedSent(['das', 'buch', 'ist', 'ja', 'klein'], ['the', 'book', 'is', 'small']))
>>> bitext.append(AlignedSent(['ein', 'haus', 'ist', 'klein'], ['a', 'house', 'is', 'small']))
>>> bitext.append(AlignedSent(['das', 'haus'], ['the', 'house']))
>>> bitext.append(AlignedSent(['das', 'buch'], ['the', 'book']))
>>> bitext.append(AlignedSent(['ein', 'buch'], ['a', 'book']))
>>> bitext.append(AlignedSent(['ich', 'fasse', 'das', 'buch', 'zusammen'], ['i', 'summarize', 'the', 'book']))
>>> bitext.append(AlignedSent(['fasse', 'zusammen'], ['summarize']))

>>> ibm3 = IBMModel3(bitext, 5)

>>> print(round(ibm3.translation_table['buch']['book'], 3))
1.0
>>> print(round(ibm3.translation_table['das']['book'], 3))
0.0
>>> print(round(ibm3.translation_table['ja'][None], 3))
1.0

>>> print(round(ibm3.distortion_table[1][1][2][2], 3))
1.0
>>> print(round(ibm3.distortion_table[1][2][2][2], 3))
0.0
>>> print(round(ibm3.distortion_table[2][2][4][5], 3))
0.75

>>> print(round(ibm3.fertility_table[2]['summarize'], 3))
1.0
>>> print(round(ibm3.fertility_table[1]['book'], 3))
1.0

>>> print(ibm3.p1)
0.054...

>>> test_sentence = bitext[2]
>>> test_sentence.words
['das', 'buch', 'ist', 'ja', 'klein']
>>> test_sentence.mots
['the', 'book', 'is', 'small']
>>> test_sentence.alignment
Alignment([(0, 0), (1, 1), (2, 2), (3, None), (4, 3)])
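
The alignment printed above pairs each index into `words` with an index into `mots`, and `None` marks a word aligned to NULL. The following is a minimal sketch in plain Python of how those pairs can be read back as word pairs. It copies the values from the example instead of calling NLTK, and the name `aligned_pairs` is only illustrative.

```python
# Values copied from the doctest above; no NLTK call is made here.
words = ['das', 'buch', 'ist', 'ja', 'klein']
mots = ['the', 'book', 'is', 'small']
alignment = [(0, 0), (1, 1), (2, 2), (3, None), (4, 3)]

# Each pair is (index into words, index into mots, or None for NULL).
aligned_pairs = [
    (words[j], mots[i] if i is not None else None)
    for j, i in alignment
]

for word, mot in aligned_pairs:
    print(word, '->', mot)
```

Read this way, 'ja' has no counterpart in `mots`, which matches the `translation_table['ja'][None]` lookup earlier in the example.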

Method	`__init__`	Train on `sentence_aligned_corpus` and create a lexical translation model, a distortion model, a fertility model, and a model for generating NULL-aligned words.
Method	`maximize_distortion_probabilities`	Undocumented
Method	`prob_t_a_given_s`	Probability of target sentence and an alignment given the source sentence
Method	`reset_probabilities`	Undocumented
Method	`set_uniform_probabilities`	Undocumented
Method	`train`	Undocumented
Instance Variable	`alignment_table`	Undocumented
Instance Variable	`distortion_table`	dict[int][int][int][int]: float. Probability(j | i,l,m). Values accessed as `distortion_table[j][i][l][m]`.
Instance Variable	`fertility_table`	Undocumented
Instance Variable	`p1`	Undocumented
Instance Variable	`translation_table`	Undocumented
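
To make the `distortion_table[j][i][l][m]` indexing concrete, here is a minimal sketch in plain Python. It is not NLTK's implementation. It assumes j is a 1-based target position, i is a source position with 0 reserved for NULL, and l and m are the source and target sentence lengths. The helper names `nested_table` and `uniform_distortion`, and the uniform 1/m starting value, are illustrative assumptions.

```python
from collections import defaultdict


def nested_table():
    """Four-level dict so table[j][i][l][m] can be assigned without setup."""
    return defaultdict(
        lambda: defaultdict(lambda: defaultdict(lambda: defaultdict(float)))
    )


def uniform_distortion(l, m):
    """Hypothetical helper: fill P(j | i, l, m) with 1/m for one (l, m) pair."""
    table = nested_table()
    for j in range(1, m + 1):        # target positions, 1-based
        for i in range(0, l + 1):    # source positions, 0 stands for NULL
            table[j][i][l][m] = 1 / m
    return table


table = uniform_distortion(l=2, m=2)
print(table[1][1][2][2])

# For fixed i, l and m, the probabilities over all j sum to 1.
total = sum(table[j][1][2][2] for j in range(1, 3))
print(total)
```

The outermost key is the position being predicted, so reading `distortion_table[j][i][l][m]` mirrors the conditional Probability(j | i,l,m) from left to right.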

def __init__(self, sentence_aligned_corpus, iterations, probability_tables=None):

Train on sentence_aligned_corpus and create a lexical translation model, a distortion model, a fertility model, and a model for generating NULL-aligned words.

Translation direction is from AlignedSent.mots to AlignedSent.words.

Parameters
sentence_aligned_corpus:list(AlignedSent)	Sentence-aligned parallel corpus
iterations:int	Number of iterations to run training algorithm
probability_tables:dict[str]: object	Optional. Use this to pass in custom probability values. If not specified, probabilities will be set to a uniform distribution, or some other sensible value. If specified, all the following entries must be present: `translation_table`, `alignment_table`, `fertility_table`, `p1`, `distortion_table`. See `IBMModel` for the type and purpose of these tables.
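
Because all five entries must be present when `probability_tables` is supplied, it can help to check the dict before constructing the model. The following is a minimal sketch in plain Python. `REQUIRED_TABLES` and `missing_tables` are hypothetical names, not part of NLTK, and the placeholder values only stand in for real tables.

```python
# Entry names taken from the parameter description above.
REQUIRED_TABLES = (
    'translation_table',
    'alignment_table',
    'fertility_table',
    'p1',
    'distortion_table',
)


def missing_tables(probability_tables):
    """Hypothetical helper: list required entries absent from the dict."""
    return [name for name in REQUIRED_TABLES if name not in probability_tables]


# Placeholder values only; real tables have the types described in IBMModel.
candidate = {
    'translation_table': {},
    'alignment_table': {},
    'fertility_table': {},
    'p1': 0.5,
}

print(missing_tables(candidate))
```

Once the returned list is empty, the dict can be passed as the third constructor argument, as in `IBMModel3(sentence_aligned_corpus, iterations, probability_tables)`.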