nltk.translate.ibm_model.IBMModel

class documentation

class IBMModel(object):

Constructor: IBMModel(sentence_aligned_corpus)

Abstract base class for all IBM models

Method	`__init__`	Undocumented
Method	`best_model2_alignment`	Finds the best alignment according to IBM Model 2
Method	`hillclimb`	Starting from the alignment in `alignment_info`, look at neighboring alignments iteratively for the best one
Method	`init_vocab`	Undocumented
Method	`maximize_fertility_probabilities`	Undocumented
Method	`maximize_lexical_translation_probabilities`	Undocumented
Method	`maximize_null_generation_probabilities`	Undocumented
Method	`neighboring`	Determine the neighbors of `alignment_info`, obtained by moving or swapping one alignment point
Method	`prob_of_alignments`	Undocumented
Method	`prob_t_a_given_s`	Probability of target sentence and an alignment given the source sentence
Method	`reset_probabilities`	Undocumented
Method	`sample`	Sample the most probable alignments from the entire alignment space
Method	`set_uniform_probabilities`	Initialize probability tables to a uniform distribution
Constant	`MIN_PROB`	Undocumented
Instance Variable	`alignment_table`	dict[int][int][int][int]: float. Probability(i | j,l,m). Values accessed as `alignment_table[i][j][l][m]`. Used in model 2 and hill climbing in models 3 and above
Instance Variable	`fertility_table`	dict[int][str]: float. Probability(fertility | source word). Values accessed as `fertility_table[fertility][source_word]`. Used in model 3 and higher.
Instance Variable	`p1`	Probability that a generated word requires another target word that is aligned to NULL. Used in model 3 and higher.
Instance Variable	`src_vocab`	set(str): All source language words used in training
Instance Variable	`translation_table`	dict[str][str]: float. Probability(target word | source word). Values accessed as `translation_table[target_word][source_word]`.
Instance Variable	`trg_vocab`	set(str): All target language words used in training

def __init__(self, sentence_aligned_corpus):

Undocumented

def best_model2_alignment(self, sentence_pair, j_pegged=None, i_pegged=0):

Finds the best alignment according to IBM Model 2.

Used as a starting point for hill climbing in Models 3 and above, because it is easier to compute than the best alignment in the higher models.

Parameters
sentence_pair:AlignedSent	Source and target language sentence pair to be word-aligned
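j_pegged:int	If specified, target position `j_pegged` is fixed (pegged) to source position `i_pegged` instead of being searched for
i_pegged:int	Source position that `j_pegged` is aligned to; only used when `j_pegged` is given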
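As a rough illustration of the Model 2 criterion, the following standalone sketch uses nested dictionaries in place of an `IBMModel` instance. For each target position j it picks the source position i (0 being NULL) that maximizes t(target_j | source_i) · a(i | j, l, m), and it honours the pegging parameters. The names `nested_table` and `best_model2_alignment_sketch` are hypothetical, and unlike the method, the sketch returns a bare alignment tuple rather than an `AlignmentInfo`.

```python
from collections import defaultdict

MIN_PROB = 1e-12  # default for unseen entries (same value as IBMModel.MIN_PROB)


def nested_table(depth):
    """Nested defaultdicts `depth` levels deep whose missing leaves default to MIN_PROB."""
    if depth == 1:
        return defaultdict(lambda: MIN_PROB)
    return defaultdict(lambda: nested_table(depth - 1))


def best_model2_alignment_sketch(src, trg, t_table, a_table, j_pegged=None, i_pegged=0):
    """Return a 1-indexed alignment tuple; entry j is the source position of target word j.

    Source position 0 is the NULL word; entry 0 of the result is an unused placeholder.
    """
    src = [None] + list(src)       # prepend NULL at position 0
    trg = ["UNUSED"] + list(trg)   # dummy entry so target words are 1-indexed
    l, m = len(src) - 1, len(trg) - 1
    alignment = [0] * (m + 1)
    for j in range(1, m + 1):
        if j == j_pegged:
            alignment[j] = i_pegged  # pegged link: no search for this position
            continue
        # Model 2 choice: argmax over i of t(trg_j | src_i) * a(i | j, l, m)
        alignment[j] = max(
            range(l + 1),
            key=lambda i: t_table[trg[j]][src[i]] * a_table[i][j][l][m],
        )
    return tuple(alignment)


t_table = nested_table(2)  # t_table[target][source]
a_table = nested_table(4)  # a_table[i][j][l][m]
t_table["the"]["das"] = 0.8
t_table["house"]["Haus"] = 0.9
for i in range(3):          # uniform alignment probabilities for l = 2, m = 2
    for j in range(1, 3):
        a_table[i][j][2][2] = 1 / 3

print(best_model2_alignment_sketch(["das", "Haus"], ["the", "house"], t_table, a_table))
# (0, 1, 2)
```

Passing `j_pegged=1, i_pegged=2` instead forces the first target word onto "Haus", giving `(0, 2, 2)`.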

def hillclimb(self, alignment_info, j_pegged=None):

Starting from the alignment in `alignment_info`, look at neighboring alignments iteratively for the best one.

There is no guarantee that the best alignment in the alignment space will be found, because the algorithm might be stuck in a local maximum.

Parameters
alignment_info:AlignmentInfo	Alignment from which the search starts
j_pegged:int	If specified, the search will be constrained to alignments where `j_pegged` remains unchanged
Returns
AlignmentInfo	The best alignment found from hill climbing
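The search loop itself does not depend on the IBM model in use. Below is a generic greedy hill climber, sketched under the assumption that the neighbor and scoring functions are passed in (standing in for `neighboring` and `prob_t_a_given_s`; pegging would be handled inside the neighbor function). The toy run shows the local-maximum caveat above: starting from 0, it stops at the smaller peak.

```python
def hillclimb_sketch(start, neighbors, score):
    """Greedy hill climbing over alignments.

    `neighbors(a)` returns candidate alignments near `a` and `score(a)` returns
    their probability; both are caller-supplied stand-ins for `neighboring`
    and `prob_t_a_given_s`. Returns (best_alignment, best_score).
    """
    best, best_score = start, score(start)
    while True:
        improved = False
        for candidate in neighbors(best):  # neighbors of the alignment at the start of this pass
            candidate_score = score(candidate)
            if candidate_score > best_score:
                best, best_score, improved = candidate, candidate_score, True
        if not improved:  # no neighbor is better: a local (not necessarily global) maximum
            return best, best_score


# Toy search space: integers, with neighbors one step left or right.
def step(x):
    return [x - 1, x + 1]


peaks = {0: 1, 1: 2, 2: 1, 3: 5}  # local peak at 1, global peak at 3
print(hillclimb_sketch(0, step, lambda x: peaks.get(x, 0)))  # (1, 2): stuck at the local peak
print(hillclimb_sketch(2, step, lambda x: peaks.get(x, 0)))  # (3, 5)
```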

def init_vocab(self, sentence_aligned_corpus):

Undocumented

def maximize_fertility_probabilities(self, counts):

Undocumented

def maximize_lexical_translation_probabilities(self, counts):

Undocumented

def maximize_null_generation_probabilities(self, counts):

Undocumented

def neighboring(self, alignment_info, j_pegged=None):

Determine the neighbors of `alignment_info`, obtained by moving or swapping one alignment point.

Parameters
alignment_info:AlignmentInfo	Alignment whose neighbors are generated
j_pegged:int	If specified, only neighbors in which target position `j_pegged` keeps its current alignment point are considered
Returns
set(AlignmentInfo)	A set of neighboring alignments, each represented by its `AlignmentInfo`
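A minimal sketch of the move-and-swap neighborhood, assuming alignments are plain 1-indexed tuples rather than `AlignmentInfo` objects; the function name is hypothetical. A move realigns one target position to any source position, and a swap exchanges the source positions of two target positions; the pegged position is left out of both.

```python
def neighboring_sketch(alignment, l, j_pegged=None):
    """All alignments reachable from `alignment` by one move or one swap.

    `alignment` is a 1-indexed tuple (entry 0 unused) mapping each target
    position j to a source position in 0..l, where 0 is NULL.
    """
    m = len(alignment) - 1
    free = [j for j in range(1, m + 1) if j != j_pegged]  # the pegged position never changes
    neighbors = set()
    # Moves: realign one target position to any source position
    # (moving j to its current position yields the original alignment itself).
    for j in free:
        for i in range(l + 1):
            moved = list(alignment)
            moved[j] = i
            neighbors.add(tuple(moved))
    # Swaps: exchange the source positions of two free target positions.
    for j in free:
        for other_j in free:
            if other_j != j:
                swapped = list(alignment)
                swapped[j], swapped[other_j] = alignment[other_j], alignment[j]
                neighbors.add(tuple(swapped))
    return neighbors


print(sorted(neighboring_sketch((0, 1, 2), l=2)))
# [(0, 0, 2), (0, 1, 0), (0, 1, 1), (0, 1, 2), (0, 2, 1), (0, 2, 2)]
```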

def prob_of_alignments(self, alignments):

Undocumented

def prob_t_a_given_s(self, alignment_info):

Probability of the target sentence and an alignment, given the source sentence.

All required information is assumed to be in `alignment_info` and `self`.

Derived classes should override this method.
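Because derived classes are expected to override this method, the probability it returns depends on the model. As a rough illustration of what an override might compute, the sketch below scores an alignment with a Model 2-style product of translation and alignment probabilities, leaving out Model 2's sentence-length term. It uses plain nested dictionaries and hypothetical function names; it is not any particular subclass's implementation.

```python
import math
from collections import defaultdict

MIN_PROB = 1e-12  # default for unseen entries (same value as IBMModel.MIN_PROB)


def nested_table(depth):
    """Nested defaultdicts `depth` levels deep whose missing leaves default to MIN_PROB."""
    if depth == 1:
        return defaultdict(lambda: MIN_PROB)
    return defaultdict(lambda: nested_table(depth - 1))


def prob_t_a_given_s_model2_sketch(alignment, src, trg, t_table, a_table):
    """Model 2-style score: product over j of t(trg_j | src_{a_j}) * a(a_j | j, l, m).

    The sentence-length term of Model 2 is omitted. `alignment` is a
    1-indexed tuple (entry 0 unused); source position 0 is NULL.
    """
    src = [None] + list(src)
    trg = ["UNUSED"] + list(trg)
    l, m = len(src) - 1, len(trg) - 1
    prob = 1.0
    for j in range(1, m + 1):
        i = alignment[j]
        prob *= t_table[trg[j]][src[i]] * a_table[i][j][l][m]
    return prob


t_table = nested_table(2)
a_table = nested_table(4)
t_table["the"]["das"] = 0.8
t_table["house"]["Haus"] = 0.9
for i in range(3):
    for j in range(1, 3):
        a_table[i][j][2][2] = 1 / 3

p = prob_t_a_given_s_model2_sketch((0, 1, 2), ["das", "Haus"], ["the", "house"], t_table, a_table)
print(math.isclose(p, 0.8 / 3 * 0.9 / 3))  # True
```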

def reset_probabilities(self):

Undocumented

def sample(self, sentence_pair):

Sample the most probable alignments from the entire alignment space.

First, determine the best alignment according to IBM Model 2. With this initial alignment, use hill climbing to determine the best alignment according to a higher IBM Model. Add this alignment and its neighbors to the sample set. Repeat this process with other initial alignments obtained by pegging an alignment point.

Hill climbing may get stuck in a local maximum, which is why alignment points are pegged and several different starting alignments are tried.

Parameters
sentence_pair:AlignedSent	Source and target language sentence pair to generate a sample of alignments from
Returns
set(AlignmentInfo), AlignmentInfo	A set of high-probability alignments, each represented by its `AlignmentInfo`, and, for convenience, the best alignment in that set
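The sampling procedure above can be sketched as a small driver over four caller-supplied functions standing in for `best_model2_alignment`, `hillclimb`, `neighboring`, and `prob_t_a_given_s`. This is an illustrative sketch with a hypothetical name and calling convention, not the method's code.

```python
def sample_sketch(m, l, best_start, climb, neighbors, score):
    """Collect high-probability alignments by hill climbing from pegged starts.

    Caller-supplied stand-ins for the IBMModel methods:
      best_start(j_pegged, i_pegged) -> starting alignment      (best_model2_alignment)
      climb(alignment, j_pegged)     -> local optimum            (hillclimb)
      neighbors(alignment, j_pegged) -> iterable of alignments   (neighboring)
      score(alignment)               -> probability              (prob_t_a_given_s)
    m and l are the target and source sentence lengths.
    Returns (set of sampled alignments, best alignment found).
    """
    # Unpegged start: best Model 2 alignment, improved by hill climbing.
    best = climb(best_start(None, 0), None)
    sampled = set(neighbors(best, None))
    # Pegged starts: force target position j onto source position i (0 = NULL).
    for j in range(1, m + 1):
        for i in range(l + 1):
            local = climb(best_start(j, i), j)
            sampled.update(neighbors(local, j))
            if score(local) > score(best):
                best = local
    return sampled, best


# Toy setup with one target word and one source word (l = m = 1): alignments are
# (0, i) for i in {0, 1}, and hill climbing is a no-op.
toy_scores = {(0, 0): 0.2, (0, 1): 0.7}
samples, best = sample_sketch(
    m=1,
    l=1,
    best_start=lambda j, i: (0, i) if j == 1 else (0, 0),
    climb=lambda a, j: a,
    neighbors=lambda a, j: {(0, 0), (0, 1)} if j is None else {a},
    score=toy_scores.__getitem__,
)
print(best)  # (0, 1)
```

In the toy run the unpegged start lands on the weaker alignment, and only the pegged restart with j = 1, i = 1 finds the better one, which is the situation pegging is meant to cover.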

def set_uniform_probabilities(self, sentence_aligned_corpus):

Initialize probability tables to a uniform distribution

Derived classes should implement this accordingly.
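For a model whose only parameters are lexical translation probabilities (Model 1 is assumed here for illustration), uniform initialization could look like the sketch below; models with alignment or fertility tables would initialize those as well. The function name is hypothetical, and plain dictionaries stand in for the instance's `translation_table`.

```python
from collections import defaultdict


def uniform_translation_table(trg_vocab):
    """Translation table with t(target | source) = 1 / |trg_vocab| everywhere.

    Keyed as table[target_word][source_word]. Every source word, including
    ones never added explicitly, gets the same value, so for any source word
    the probabilities over the target vocabulary sum to 1.
    """
    initial_prob = 1 / len(trg_vocab)
    return {t: defaultdict(lambda: initial_prob) for t in trg_vocab}


table = uniform_translation_table({"the", "house", "book", "a"})
print(table["house"]["Haus"])  # 0.25
```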

MIN_PROB: float = 1e-12

Undocumented
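This page does not say how `MIN_PROB` is used. A common role for a constant like this, assumed here, is a floor on probability estimates: in models that multiply probabilities together, a single zero entry would make every alignment that uses it impossible. A hypothetical helper:

```python
MIN_PROB = 1e-12  # same value as IBMModel.MIN_PROB


def floored_estimate(count, total):
    """Relative-frequency estimate count / total, never smaller than MIN_PROB."""
    return max(count / total, MIN_PROB)


print(floored_estimate(3, 4))  # 0.75
print(floored_estimate(0, 4))  # 1e-12
```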

alignment_table

dict[int][int][int][int]: float. Probability(i | j,l,m). Values accessed as alignment_table[i][j][l][m]. Used in model 2 and hill climbing in models 3 and above
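The four-level indexing is easiest to see on a small example. The sketch below uses nested defaultdicts and, following the usual IBM-model convention (an assumption here, since this page does not define the symbols), reads i as the source position with 0 for NULL, j as the target position, and l and m as the source and target sentence lengths. Missing entries default to 0.0 in this sketch.

```python
import math
from collections import defaultdict

# Four nested levels indexed [i][j][l][m]: i = source position (0 = NULL),
# j = target position, l = source length, m = target length.
alignment_table = defaultdict(
    lambda: defaultdict(lambda: defaultdict(lambda: defaultdict(float)))
)

# Uniform example for l = 2, m = 3: each target position j has l + 1 = 3 choices of i.
l, m = 2, 3
for j in range(1, m + 1):
    for i in range(l + 1):
        alignment_table[i][j][l][m] = 1 / (l + 1)

# For fixed (j, l, m), the probabilities over i form a distribution.
print(math.isclose(sum(alignment_table[i][2][l][m] for i in range(l + 1)), 1.0))  # True
```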

fertility_table

dict[int][str]: float. Probability(fertility | source word). Values accessed as fertility_table[fertility][source_word]. Used in model 3 and higher.
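The fertility of a source word is the number of target words aligned to it. The sketch below, with hypothetical names and made-up values, computes fertilities from a 1-indexed alignment tuple (position 0 being NULL, an assumption carried over from the usual IBM-model convention) and then looks one up in a table laid out as documented, fertility first.

```python
from collections import Counter


def fertilities(alignment, num_src_words):
    """Fertility of each source position: how many target words align to it.

    `alignment` is a 1-indexed tuple (entry 0 unused); source position 0 is NULL,
    so positions run from 0 to num_src_words.
    """
    counts = Counter(alignment[1:])
    return {i: counts[i] for i in range(num_src_words + 1)}


# Toy table in the documented layout: fertility_table[fertility][source_word]
fertility_table = {0: {"Haus": 0.1}, 1: {"Haus": 0.7}, 2: {"Haus": 0.2}}
src = ["das", "Haus"]
alignment = (0, 1, 2, 2)  # target words 2 and 3 both align to "Haus" (position 2)
phi = fertilities(alignment, len(src))
print(phi)                              # {0: 0, 1: 1, 2: 2}
print(fertility_table[phi[2]]["Haus"])  # 0.2
```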

p1

Probability that a generated word requires another target word that is aligned to NULL. Used in model 3 and higher.
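In the standard formulation of IBM Model 3, `p1` enters through a NULL-insertion term: after each of the m − φ0 target words produced by real source words, an extra NULL-aligned word is inserted with probability p1, so the probability that φ0 of the m target words are NULL-aligned is C(m − φ0, φ0) · p1^φ0 · p0^(m − 2φ0), with p0 = 1 − p1. This sketch computes that term only, under a hypothetical name; how a derived model combines it with its other factors is not shown.

```python
import math


def null_insertion_prob(phi0, m, p1):
    """Probability that phi0 of the m target words are aligned to NULL.

    Model 3 NULL-insertion term: C(m - phi0, phi0) * p1**phi0 * p0**(m - 2*phi0),
    where p0 = 1 - p1. Returns 0.0 when phi0 exceeds m - phi0, since at most
    one NULL word is inserted per word generated by a real source word.
    """
    if 2 * phi0 > m:
        return 0.0
    p0 = 1 - p1
    return math.comb(m - phi0, phi0) * p1 ** phi0 * p0 ** (m - 2 * phi0)


for phi0 in range(3):
    print(phi0, null_insertion_prob(phi0, m=3, p1=0.1))
```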

src_vocab

set(str): All source language words used in training

translation_table

dict[str][str]: float. Probability(target word | source word). Values accessed as translation_table[target_word][source_word].
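Because the outer key is the target word, the distribution P(· | s) for one source word s is spread across the outer dictionary rather than stored in a single row. A small sketch with made-up values and a hypothetical helper:

```python
from collections import defaultdict

# Outer key: target word; inner key: source word. translation_table[target][source]
translation_table = defaultdict(lambda: defaultdict(float))
translation_table["house"]["Haus"] = 0.8
translation_table["home"]["Haus"] = 0.2
translation_table["the"]["das"] = 1.0


def distribution_for_source(table, source_word):
    """Gather P(target | source_word) across the outer (target-word) keys."""
    return {t: row[source_word] for t, row in table.items() if source_word in row}


print(distribution_for_source(translation_table, "Haus"))  # {'house': 0.8, 'home': 0.2}
```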

trg_vocab

set(str): All target language words used in training
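Both vocabularies are plain sets of word types. A sketch of collecting them, assuming the corpus is given as (source_tokens, target_tokens) pairs rather than NLTK `AlignedSent` objects; the helper name is hypothetical.

```python
def build_vocab(sentence_pairs):
    """Return (src_vocab, trg_vocab) from (source_tokens, target_tokens) pairs."""
    src_vocab, trg_vocab = set(), set()
    for src_tokens, trg_tokens in sentence_pairs:
        src_vocab.update(src_tokens)
        trg_vocab.update(trg_tokens)
    return src_vocab, trg_vocab


pairs = [(["das", "Haus"], ["the", "house"]), (["das", "Buch"], ["the", "book"])]
src_vocab, trg_vocab = build_vocab(pairs)
print(sorted(src_vocab), sorted(trg_vocab))
```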