nltk.tag.hmm.HiddenMarkovModelTrainer

class documentation

class HiddenMarkovModelTrainer(object):

Constructor: HiddenMarkovModelTrainer(states, symbols)

Algorithms for learning HMM parameters from training data. These include both supervised learning (MLE) and unsupervised learning (Baum-Welch).

Creates an HMM trainer to induce an HMM with the given states and output symbol alphabet. Either supervised or unsupervised training may be used. If the states or the symbols are not given, they may be derived from the supervised training data.

Parameters
states	the set of state labels
symbols	the set of observation symbols

Method	`__init__`	Undocumented
Method	`train`	Trains the HMM using supervised techniques, unsupervised techniques, or both.
Method	`train_supervised`	Supervised training maximising the joint probability of the symbol and state sequences. This is done by collecting frequencies of transitions between states, of symbol observations within each state, and of which states start a sentence...
Method	`train_unsupervised`	Trains the HMM using the Baum-Welch algorithm to maximise the probability of the data sequence. This is a variant of the EM algorithm, and is unsupervised in that it doesn't need the state sequences for the symbols...
Method	`_baum_welch_step`	Undocumented
Instance Variable	`_states`	Undocumented
Instance Variable	`_symbols`	Undocumented

def __init__(self, states=None, symbols=None):

Undocumented

def train(self, labeled_sequences=None, unlabeled_sequences=None, **kwargs):

Trains the HMM using supervised techniques, unsupervised techniques, or both.

Parameters
labeled_sequences:list	the supervised training data, a set of labelled sequences of observations, e.g. [ (word_1, tag_1), ..., (word_n, tag_n) ]
unlabeled_sequences:list	the unsupervised training data, a set of sequences of observations, e.g. [ word_1, ..., word_n ]
**kwargs	additional arguments to pass to the training methods
Returns
HiddenMarkovModelTagger	the trained model
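
One plausible reading of "supervised techniques, unsupervised techniques, or both" is sketched below: supervised training runs first when labelled data is present, and its result is handed to the unsupervised stage as the `model` keyword argument. The function `train_combined` and the two stub trainers are hypothetical stand-ins written for this illustration, not NLTK code, and the ordering of the two stages is an assumption.

```python
def train_combined(labeled_sequences=None, unlabeled_sequences=None, *,
                   supervised, unsupervised, **kwargs):
    """Hypothetical sketch: chain a supervised and an unsupervised trainer."""
    if not labeled_sequences and not unlabeled_sequences:
        raise ValueError("at least one kind of training data is required")

    model = None
    if labeled_sequences:
        # Stage 1: estimate a model from the labelled data.
        model = supervised(labeled_sequences)
    if unlabeled_sequences:
        # Stage 2: refine on unlabelled data, seeded with the stage-1 model
        # when one exists (assumed behaviour, mirroring the `model` kwarg).
        if model is not None:
            kwargs["model"] = model
        model = unsupervised(unlabeled_sequences, **kwargs)
    return model


# Stub trainers so the control flow can be run without any library.
def fake_supervised(labeled):
    return {"stage": "supervised", "seed": None}


def fake_unsupervised(unlabeled, model=None, **kwargs):
    return {"stage": "unsupervised", "seed": model, "options": kwargs}


result = train_combined(
    [[("the", "DT"), ("dog", "NN")]],
    [["the", "cat"]],
    supervised=fake_supervised,
    unsupervised=fake_unsupervised,
    max_iterations=5,
)
only_supervised = train_combined(
    [[("the", "DT"), ("dog", "NN")]],
    supervised=fake_supervised,
    unsupervised=fake_unsupervised,
)
```

With both kinds of data, `result` comes from the unsupervised stub and carries the supervised model as its seed; with labelled data only, the supervised model is returned unchanged.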

def train_supervised(self, labelled_sequences, estimator=None):

Supervised training maximising the joint probability of the symbol and state sequences. This is done by collecting frequencies of transitions between states, of symbol observations within each state, and of which states start a sentence. These frequency distributions are then normalised into probability estimates, which can be smoothed if desired.

Parameters
labelled_sequences:list	the training data, a set of labelled sequences of observations
estimator	a function taking a FreqDist and a number of bins and returning a CProbDistI; if omitted, an MLE estimate is used
Returns
HiddenMarkovModelTagger	the trained model
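
The counting-and-normalising procedure described above can be sketched in plain Python. This is an illustration of unsmoothed relative-frequency (MLE) estimation, not NLTK's implementation; the function name `estimate_hmm_mle`, the dictionary-based return value, and the toy data are assumptions made for the example.

```python
from collections import Counter, defaultdict


def estimate_hmm_mle(labelled_sequences):
    """Hypothetical sketch: relative-frequency estimates for an HMM.

    Each sequence is a list of (symbol, state) pairs.
    """
    starts = Counter()                  # which states start a sequence
    transitions = defaultdict(Counter)  # state -> following state
    emissions = defaultdict(Counter)    # state -> observed symbol

    for sequence in labelled_sequences:
        previous = None
        for symbol, state in sequence:
            if previous is None:
                starts[state] += 1
            else:
                transitions[previous][state] += 1
            emissions[state][symbol] += 1
            previous = state

    def normalise(counter):
        # Turn raw counts into probabilities that sum to one.
        total = sum(counter.values())
        return {key: count / total for key, count in counter.items()}

    return (
        normalise(starts),
        {state: normalise(c) for state, c in transitions.items()},
        {state: normalise(c) for state, c in emissions.items()},
    )


toy_data = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VB")],
    [("the", "DT"), ("cat", "NN")],
    [("dogs", "NN"), ("bark", "VB")],
]
priors, trans, emit = estimate_hmm_mle(toy_data)
```

Unsmoothed estimates give zero probability to any transition or symbol absent from the training data, which is the situation a smoothing estimator is meant to address.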

def train_unsupervised(self, unlabeled_sequences, update_outputs=True, **kwargs):

Trains the HMM using the Baum-Welch algorithm to maximise the probability of the data sequence. This is a variant of the EM algorithm, and is unsupervised in that it doesn't need the state sequences for the symbols. The code is based on 'A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition', Lawrence Rabiner, IEEE, 1989.

kwargs may include the parameters model, max_iterations, and convergence_logprob listed below.

Parameters
unlabeled_sequences:list	the training data, a set of sequences of observations
update_outputs	Undocumented
model	a HiddenMarkovModelTagger instance used to begin the Baum-Welch algorithm
max_iterations	the maximum number of EM iterations
convergence_logprob	the maximum change in log probability between iterations at which training is considered converged
**kwargs	Undocumented
Returns
HiddenMarkovModelTagger	the trained model
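
The Baum-Welch procedure described above can be sketched in plain Python for a small discrete HMM. This is not NLTK's implementation: the function names (`forward`, `backward`, `normalise`, `baum_welch`), the integer-encoded observations, the random initialisation, and the lack of log-space scaling (adequate only for short toy sequences) are assumptions for illustration. The arguments `max_iterations` and `convergence_logprob` mirror the keyword arguments documented above; how the library applies them internally is assumed here.

```python
import math
import random


def forward(seq, pi, A, B):
    """alpha[t][i] = P(o_1..o_t, state_t = i)."""
    n = len(pi)
    alpha = [[pi[i] * B[i][seq[0]] for i in range(n)]]
    for t in range(1, len(seq)):
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][seq[t]]
                      for j in range(n)])
    return alpha


def backward(seq, A, B):
    """beta[t][i] = P(o_{t+1}..o_T | state_t = i)."""
    n = len(A)
    beta = [[1.0] * n for _ in seq]
    for t in range(len(seq) - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][seq[t + 1]] * beta[t + 1][j]
                             for j in range(n))
    return beta


def normalise(weights):
    total = sum(weights)
    return [w / total for w in weights]


def baum_welch(sequences, n_states, n_symbols,
               max_iterations=100, convergence_logprob=1e-6, seed=0):
    rng = random.Random(seed)

    def random_dist(size):
        return normalise([rng.random() + 0.1 for _ in range(size)])

    pi = random_dist(n_states)
    A = [random_dist(n_states) for _ in range(n_states)]
    B = [random_dist(n_symbols) for _ in range(n_states)]
    history = []  # log probability of the data under each successive model

    for _ in range(max_iterations):
        pi_num = [0.0] * n_states
        A_num = [[0.0] * n_states for _ in range(n_states)]
        B_num = [[0.0] * n_symbols for _ in range(n_states)]
        logprob = 0.0

        # E-step: accumulate expected counts under the current model.
        for seq in sequences:
            alpha = forward(seq, pi, A, B)
            beta = backward(seq, A, B)
            p_seq = sum(alpha[-1])
            logprob += math.log(p_seq)
            for t, symbol in enumerate(seq):
                for i in range(n_states):
                    gamma = alpha[t][i] * beta[t][i] / p_seq
                    if t == 0:
                        pi_num[i] += gamma
                    B_num[i][symbol] += gamma
                    if t + 1 < len(seq):
                        for j in range(n_states):
                            A_num[i][j] += (alpha[t][i] * A[i][j]
                                            * B[j][seq[t + 1]]
                                            * beta[t + 1][j] / p_seq)

        # Stop once the log probability changes by less than the threshold.
        converged = (bool(history)
                     and abs(logprob - history[-1]) < convergence_logprob)
        history.append(logprob)
        if converged:
            break

        # M-step: re-estimate parameters from the expected counts.
        pi = normalise(pi_num)
        A = [normalise(row) for row in A_num]
        B = [normalise(row) for row in B_num]

    return pi, A, B, history


# Toy observations over a two-symbol alphabet, encoded as 0 and 1.
toy_sequences = [[0, 1, 0, 1, 0, 1], [0, 1, 0, 1], [1, 0, 1, 0, 1]]
pi, A, B, history = baum_welch(toy_sequences, n_states=2, n_symbols=2,
                               max_iterations=50)
```

Because each EM iteration cannot decrease the likelihood of the data, the values in `history` are non-decreasing up to floating-point error, which makes a threshold on their change a natural stopping rule.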

def _baum_welch_step(self, sequence, model, symbol_to_number):

Undocumented

_states

Undocumented

_symbols

Undocumented