nltk.translate.ibm2.IBMModel2

class documentation

class IBMModel2(IBMModel):

Constructor: IBMModel2(sentence_aligned_corpus, iterations, probability_tables)

Lexical translation model that considers word order

>>> from nltk.translate import AlignedSent, IBMModel2
>>> bitext = []
>>> bitext.append(AlignedSent(['klein', 'ist', 'das', 'haus'], ['the', 'house', 'is', 'small']))
>>> bitext.append(AlignedSent(['das', 'haus', 'ist', 'ja', 'groß'], ['the', 'house', 'is', 'big']))
>>> bitext.append(AlignedSent(['das', 'buch', 'ist', 'ja', 'klein'], ['the', 'book', 'is', 'small']))
>>> bitext.append(AlignedSent(['das', 'haus'], ['the', 'house']))
>>> bitext.append(AlignedSent(['das', 'buch'], ['the', 'book']))
>>> bitext.append(AlignedSent(['ein', 'buch'], ['a', 'book']))

>>> ibm2 = IBMModel2(bitext, 5)

>>> print(round(ibm2.translation_table['buch']['book'], 3))
1.0
>>> print(round(ibm2.translation_table['das']['book'], 3))
0.0
>>> print(round(ibm2.translation_table['buch'][None], 3))
0.0
>>> print(round(ibm2.translation_table['ja'][None], 3))
0.0

>>> print(ibm2.alignment_table[1][1][2][2])
0.938...
>>> print(round(ibm2.alignment_table[1][2][2][2], 3))
0.0
>>> print(round(ibm2.alignment_table[2][2][4][5], 3))
1.0

>>> test_sentence = bitext[2]
>>> test_sentence.words
['das', 'buch', 'ist', 'ja', 'klein']
>>> test_sentence.mots
['the', 'book', 'is', 'small']
>>> test_sentence.alignment
Alignment([(0, 0), (1, 1), (2, 2), (3, 2), (4, 3)])

Method	`__init__`	Train on `sentence_aligned_corpus` and create a lexical translation model and an alignment model.
Method	`align`	Determines the best word alignment for one sentence pair from the corpus that the model was trained on.
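Method	`align_all`	Runs `align` on every sentence pair in `parallel_corpus`.
Method	`maximize_alignment_probabilities`	M step of training: re-estimates `alignment_table` from the collected counts.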
Method	`prob_alignment_point`	Probability that position j in `trg_sentence` is aligned to position i in the `src_sentence`
Method	`prob_all_alignments`	Computes the probability of all possible word alignments, expressed as a marginal distribution over target words t
Method	`prob_t_a_given_s`	Probability of target sentence and an alignment given the source sentence
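Method	`set_uniform_probabilities`	Initializes every alignment probability to 1 / (l + 1), where l is the source sentence length.
Method	`train`	Runs one iteration of expectation-maximization over `parallel_corpus`.
Instance Variable	`alignment_table`	Alignment probabilities, accessed as `alignment_table[i][j][l][m]`.
Instance Variable	`translation_table`	Lexical translation probabilities, accessed as `translation_table[target_word][source_word]`.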

def __init__(self, sentence_aligned_corpus, iterations, probability_tables=None):

Train on sentence_aligned_corpus and create a lexical translation model and an alignment model.

Translation direction is from AlignedSent.mots to AlignedSent.words.

Parameters
sentence_aligned_corpus:list(AlignedSent)	Sentence-aligned parallel corpus
iterations:int	Number of iterations to run training algorithm
probability_tables:dict[str]: object	Optional. Use this to pass in custom probability values. If not specified, probabilities will be set to a uniform distribution, or some other sensible value. If specified, all the following entries must be present: `translation_table`, `alignment_table`. See `IBMModel` for the type and purpose of these tables.
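
As a rough illustration of the `probability_tables` argument, the sketch below builds a dictionary with the two required entries from plain nested `defaultdict`s. The nesting order (`translation_table[target_word][source_word]` and `alignment_table[i][j][l][m]`) follows the lookups in the class example; the helper name `make_uniform_tables`, the `MIN_PROB` value, the `(target_words, source_words)` tuple input, and the uniform starting values are all assumptions made for illustration, not the library's own initialization.

```python
from collections import defaultdict

MIN_PROB = 1.0e-12  # assumed floor for unseen entries (illustrative value)


def make_uniform_tables(corpus):
    """Hypothetical helper: build uniform starting tables.

    `corpus` is a list of (target_words, source_words) pairs, mirroring
    (AlignedSent.words, AlignedSent.mots).
    """
    target_vocab = {w for trg, _ in corpus for w in trg}
    translation_table = defaultdict(lambda: defaultdict(lambda: MIN_PROB))
    alignment_table = defaultdict(
        lambda: defaultdict(lambda: defaultdict(lambda: defaultdict(lambda: MIN_PROB)))
    )
    for trg, src in corpus:
        l, m = len(src), len(trg)
        for t in trg:
            for s in [None] + list(src):  # None stands for the NULL source word
                translation_table[t][s] = 1.0 / len(target_vocab)
        for i in range(0, l + 1):  # i = 0 is the NULL position
            for j in range(1, m + 1):  # target positions are one-indexed
                alignment_table[i][j][l][m] = 1.0 / (l + 1)
    return {
        "translation_table": translation_table,
        "alignment_table": alignment_table,
    }


corpus = [(["das", "haus"], ["the", "house"]), (["das", "buch"], ["the", "book"])]
tables = make_uniform_tables(corpus)
print(sorted(tables))  # ['alignment_table', 'translation_table']
```

A dictionary shaped like this is what the `probability_tables` parameter expects; whether uniform values are a good starting point for a given corpus is a separate question.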

def align(self, sentence_pair):

Determines the best word alignment for one sentence pair from the corpus that the model was trained on.

The best alignment will be set in sentence_pair when the method returns. In contrast with the internal implementation of IBM models, the word indices in the Alignment are zero-indexed, not one-indexed.

Parameters
sentence_pair:AlignedSent	A sentence in the source language and its counterpart sentence in the target language
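
The index convention described above can be sketched in standalone Python. This is not the library's code: the function name `best_alignment`, the plain-dict tables, the toy probabilities, and the `MIN_PROB` fallback are assumptions for illustration. For each target word, the sketch scores every source position (plus the NULL word at internal position 0) as translation probability times alignment probability, keeps the best one, and reports zero-indexed pairs.

```python
MIN_PROB = 1.0e-12  # assumed floor probability (illustrative value)


def best_alignment(trg_words, src_words, t_table, a_table):
    """Return [(j, i), ...] with zero-based indices; i is None for NULL."""
    l, m = len(src_words), len(trg_words)

    def score(t, s, i, j):
        # Missing entries fall back to MIN_PROB.
        t_prob = t_table.get(t, {}).get(s, MIN_PROB)
        a_prob = a_table.get(i, {}).get(j, {}).get(l, {}).get(m, MIN_PROB)
        return t_prob * a_prob

    alignment = []
    for j, t in enumerate(trg_words):
        # Start from the NULL source word (internal position 0).
        best_i, best_prob = None, score(t, None, 0, j + 1)
        for i, s in enumerate(src_words):
            prob = score(t, s, i + 1, j + 1)  # tables use one-based positions
            if prob >= best_prob:
                best_i, best_prob = i, prob
        alignment.append((j, best_i))  # reported indices are zero-based
    return alignment


# Toy tables: t_table[target][source], a_table[i][j][l][m]
t_table = {
    "das": {"the": 0.9, "book": 0.1},
    "buch": {"the": 0.1, "book": 0.9},
}
a_table = {i: {j: {2: {2: 1.0 / 3}} for j in (1, 2)} for i in (0, 1, 2)}

print(best_alignment(["das", "buch"], ["the", "book"], t_table, a_table))
# [(0, 0), (1, 1)]
```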

def align_all(self, parallel_corpus):

Runs align on every sentence pair in parallel_corpus, setting the best alignment on each pair in place.

def maximize_alignment_probabilities(self, counts):

M step of training: re-estimates alignment_table from the expected counts collected during the E step.

def prob_alignment_point(self, i, j, src_sentence, trg_sentence):

Probability that position j in trg_sentence is aligned to position i in the src_sentence
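
A minimal sketch of this quantity, assuming the usual IBM Model 2 factorization (lexical translation probability times alignment probability). The standalone function `alignment_point_prob`, the padded sentence lists (index 0 holds a placeholder so that positions are one-based), and the toy table values are illustrative assumptions.

```python
def alignment_point_prob(i, j, src_sentence, trg_sentence, t_table, a_table):
    """t(t_j | s_i) * a(i | j, l, m) for padded, one-based sentences."""
    l = len(src_sentence) - 1  # source length, excluding the NULL slot
    m = len(trg_sentence) - 1  # target length, excluding the unused slot
    s = src_sentence[i]
    t = trg_sentence[j]
    return t_table[t][s] * a_table[i][j][l][m]


src_sentence = [None, "the", "book"]      # index 0 is the NULL word
trg_sentence = ["UNUSED", "das", "buch"]  # index 0 is never read
t_table = {"buch": {"book": 0.8}}         # t_table[target][source]
a_table = {2: {2: {2: {2: 0.5}}}}         # a_table[i][j][l][m]

p = alignment_point_prob(2, 2, src_sentence, trg_sentence, t_table, a_table)
print(round(p, 3))  # 0.4
```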

def prob_all_alignments(self, src_sentence, trg_sentence):

Computes the probability of all possible word alignments, expressed as a marginal distribution over target words t

Each entry in the return value represents the contribution to the total alignment probability by the target word t.

To obtain probability(alignment | src_sentence, trg_sentence), simply sum the entries in the return value.

Returns
dict(str): float	Probability of t for all s in `src_sentence`
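
The marginal described above can be sketched as follows. The function name `all_alignments_prob`, the padded one-based sentence lists, and the toy probabilities are assumptions for illustration; the sketch sums, for each target word, the alignment-point score over every source position, including NULL at position 0.

```python
from collections import defaultdict


def all_alignments_prob(src_sentence, trg_sentence, t_table, a_table):
    """Map each target word t to the sum over i of t(t | s_i) * a(i | j, l, m)."""
    l = len(src_sentence) - 1
    m = len(trg_sentence) - 1
    totals = defaultdict(float)
    for j in range(1, m + 1):
        t = trg_sentence[j]
        for i in range(0, l + 1):  # i = 0 is the NULL source word
            s = src_sentence[i]
            totals[t] += t_table[t][s] * a_table[i][j][l][m]
    return totals


src_sentence = [None, "the", "book"]
trg_sentence = ["UNUSED", "das", "buch"]
t_table = {  # t_table[target][source]
    "das": {None: 0.1, "the": 0.8, "book": 0.1},
    "buch": {None: 0.1, "the": 0.1, "book": 0.8},
}
a_table = {  # a_table[i][j][l][m], here with l = m = 2
    0: {1: {2: {2: 0.2}}, 2: {2: {2: 0.2}}},
    1: {1: {2: {2: 0.6}}, 2: {2: {2: 0.2}}},
    2: {1: {2: {2: 0.2}}, 2: {2: {2: 0.6}}},
}

totals = all_alignments_prob(src_sentence, trg_sentence, t_table, a_table)
print({t: round(p, 2) for t, p in totals.items()})  # {'das': 0.52, 'buch': 0.52}
```

In an EM-style E step, each per-word total would serve as the normalizer for the alignment-point scores of that word; for example, the link between "das" and "the" contributes 0.48 / 0.52 of the count for "das" in this toy setup.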

def prob_t_a_given_s(self, alignment_info):

Probability of target sentence and an alignment given the source sentence
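
Under the standard IBM Model 2 formulation, this probability is a product over target positions of the translation probability and the alignment probability for the aligned source position. The sketch below illustrates that product with a simplified, hypothetical signature: it takes the alignment as a plain tuple where `alignment[j]` is the source position for target position j (index 0 unused), rather than the `alignment_info` object the method receives. Table values are toy data.

```python
def t_a_given_s_prob(alignment, src_sentence, trg_sentence, t_table, a_table):
    """Product over j of t(t_j | s_a(j)) * a(a(j) | j, l, m)."""
    l = len(src_sentence) - 1
    m = len(trg_sentence) - 1
    prob = 1.0
    for j in range(1, m + 1):  # skip the unused element at index 0
        i = alignment[j]
        t = trg_sentence[j]
        s = src_sentence[i]
        prob *= t_table[t][s] * a_table[i][j][l][m]
    return prob


src_sentence = [None, "the", "book"]
trg_sentence = ["UNUSED", "das", "buch"]
t_table = {"das": {"the": 0.8}, "buch": {"book": 0.8}}  # t_table[target][source]
a_table = {  # a_table[i][j][l][m]
    1: {1: {2: {2: 0.6}}},
    2: {2: {2: {2: 0.6}}},
}

# "das" aligned to "the" (i = 1) and "buch" aligned to "book" (i = 2).
prob = t_a_given_s_prob((0, 1, 2), src_sentence, trg_sentence, t_table, a_table)
print(round(prob, 4))  # 0.2304
```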

def set_uniform_probabilities(self, sentence_aligned_corpus):

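Initializes the alignment probabilities uniformly: for every pair of sentence lengths (l, m) found in sentence_aligned_corpus, each alignment_table[i][j][l][m] is set to 1 / (l + 1).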

def train(self, parallel_corpus):

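Runs one iteration of expectation-maximization over parallel_corpus: the E step collects expected counts for every alignment point, and the M step re-estimates translation_table and alignment_table from those counts.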

alignment_table

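Alignment probabilities, accessed as alignment_table[i][j][l][m]: the probability that target position j is aligned to source position i, given a source sentence of length l and a target sentence of length m. See IBMModel for the type of this table.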

translation_table

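Lexical translation probabilities, accessed as translation_table[target_word][source_word], where the source word None denotes the NULL token. See IBMModel for the type of this table.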