module documentation

Hidden Markov Models (HMMs) are largely used to assign the correct label sequence to sequential data, or to assess the probability of a given label and data sequence. These models are finite state machines characterised by a number of states, transitions between these states, and output symbols emitted while in each state. The HMM is an extension of the Markov chain, where each state corresponds deterministically to a given event. In the HMM the observation is a probabilistic function of the state. HMMs share the Markov chain's assumption that the probability of transition from one state to another depends only on the current state - i.e. the series of states that led to the current state is not used. They are also time invariant.

The HMM is a directed graph with probability-weighted edges (representing the probability of a transition between the source and sink states), where each vertex emits an output symbol when entered. The symbol (or observation) is generated non-deterministically. For this reason, knowing that a sequence of output observations was generated by a given HMM does not mean that the corresponding sequence of states (and what the current state is) is known. This is the 'hidden' in the hidden Markov model.

Formally, an HMM can be characterised by:

  • the output observation alphabet. This is the set of symbols which may be observed as output of the system.
  • the set of states.
  • the transition probabilities a_{ij} = P(s_t = j | s_{t-1} = i). These represent the probability of transition to each state from a given state.
  • the output probability matrix b_i(k) = P(X_t = o_k | s_t = i). These represent the probability of observing each symbol in a given state.
  • the initial state distribution. This gives the probability of starting in each state.
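As a concrete illustration, the sketch below spells these five components out as NumPy arrays for a toy model and samples a (state, observation) sequence from it. The state names, symbols and probability values are invented for illustration and are not part of this module.

    import numpy as np

    rng = np.random.default_rng(0)

    # The five components of the formal characterisation (toy values).
    states = ["Rainy", "Sunny"]             # the set of states
    symbols = ["walk", "shop", "clean"]     # the output observation alphabet
    A = np.array([[0.7, 0.3],               # a_{ij} = P(s_t = j | s_{t-1} = i)
                  [0.4, 0.6]])
    B = np.array([[0.1, 0.4, 0.5],          # b_i(k) = P(X_t = o_k | s_t = i)
                  [0.6, 0.3, 0.1]])
    pi = np.array([0.6, 0.4])               # the initial state distribution

    def sample(length):
        """Generate a (state, observation) sequence from the toy model."""
        seq = []
        state = rng.choice(len(states), p=pi)
        for _ in range(length):
            obs = rng.choice(len(symbols), p=B[state])
            seq.append((states[state], symbols[obs]))
            state = rng.choice(len(states), p=A[state])
        return seq

    print(sample(5))

Each row of A and B sums to one, as does pi; the observation is drawn from the current state's row of B before the next state is drawn from its row of A.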

To ground this discussion, take a common NLP application, part-of-speech (POS) tagging. An HMM is desirable for this task as the highest probability tag sequence can be calculated for a given sequence of word forms. This differs from other tagging techniques which often tag each word individually, seeking to optimise each individual tagging greedily without regard to the optimal combination of tags for a larger unit, such as a sentence. The HMM does this with the Viterbi algorithm, which efficiently computes the optimal path through the graph given the sequence of word forms.
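The sketch below shows how the Viterbi recursion can be written in log space. It is an illustrative implementation of the algorithm, not the code this module uses; it expects arrays in the style of the toy A, B and pi above, with observations given as indices into the symbol alphabet.

    import numpy as np

    def viterbi(obs, A, B, pi):
        """Return the most probable state index path for a sequence of observation indices."""
        T, N = len(obs), len(pi)
        logA, logB, logpi = np.log(A), np.log(B), np.log(pi)
        delta = np.empty((T, N))            # best log probability of a path ending in each state
        back = np.zeros((T, N), dtype=int)  # backpointer to the previous state on that path
        delta[0] = logpi + logB[:, obs[0]]
        for t in range(1, T):
            scores = delta[t - 1][:, None] + logA   # scores[i, j]: best path into i, then i -> j
            back[t] = scores.argmax(axis=0)
            delta[t] = scores.max(axis=0) + logB[:, obs[t]]
        path = [int(delta[-1].argmax())]            # best final state
        for t in range(T - 1, 0, -1):               # follow the backpointers
            path.append(int(back[t][path[-1]]))
        return path[::-1]

Calling viterbi([0, 2, 1], A, B, pi) with the toy parameters above returns a list of state indices, which maps directly onto a tag sequence in the POS setting.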

In POS tagging the states usually have a 1:1 correspondence with the tag alphabet - i.e. each state represents a single tag. The output observation alphabet is the set of word forms (the lexicon), and the remaining three parameters are derived by a training regime. With this information the probability of a given sentence can be easily derived, by simply summing the probability of each distinct path through the model. Similarly, the highest probability tagging sequence can be derived with the Viterbi algorithm, yielding a state sequence which can be mapped into a tag sequence.
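The sum over all distinct paths does not have to be computed by enumeration: the forward algorithm accumulates it in time linear in the sentence length. A minimal sketch, again assuming toy-style A, B and pi arrays rather than anything defined in this module:

    import numpy as np

    def sequence_probability(obs, A, B, pi):
        """P(observation sequence), summed over every possible state path (forward algorithm)."""
        alpha = pi * B[:, obs[0]]           # probability of each state having emitted the first symbol
        for o in obs[1:]:
            alpha = (alpha @ A) * B[:, o]   # propagate through the transitions, then emit
        return alpha.sum()

In practice the same recursion is usually carried out in log space to avoid underflow on long sentences.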

This discussion assumes that the HMM has been trained. This is probably the most difficult task with the model, and requires either supervised training - maximum likelihood estimation (MLE) of the parameters from labelled data - or unsupervised learning using the Baum-Welch algorithm, a variant of expectation maximisation (EM).
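As a rough sketch of how supervised training might be driven through the HiddenMarkovModelTrainer class documented below - the corpus choice, the slice size, the Lidstone smoothing value and the method name train_supervised are assumptions made for this example, not guarantees from this page:

    from nltk.corpus import treebank
    from nltk.probability import LidstoneProbDist
    from nltk.tag.hmm import HiddenMarkovModelTrainer

    # Requires the corpus data, e.g. nltk.download("treebank") on first use.
    train_sents = treebank.tagged_sents()[:3000]    # lists of (word, tag) pairs

    trainer = HiddenMarkovModelTrainer()
    tagger = trainer.train_supervised(
        train_sents,
        estimator=lambda fd, bins: LidstoneProbDist(fd, 0.1, bins),  # smoothed MLE
    )
    print(tagger.tag("Today the stock market rose .".split()))

Unsupervised (Baum-Welch) training over unlabelled sequences is handled by the same trainer class; see its documentation below.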

For more information, please consult the source code for this module, which includes extensive demonstration code.

Class HiddenMarkovModelTagger Hidden Markov model class, a generative model for labelling sequence data. These models define the joint probability of a sequence of symbols and their labels (state transitions) as the product of the starting state probability, the probability of each state transition, and the probability of each observation being generated from each state... (This joint probability is written out in full after this listing.)
Class HiddenMarkovModelTrainer Algorithms for learning HMM parameters from training data. These include both supervised learning (MLE) and unsupervised learning (Baum-Welch).
Function demo Undocumented
Function demo_bw Undocumented
Function demo_pos Undocumented
Function demo_pos_bw Undocumented
Function load_pos Undocumented
Function logsumexp2 Undocumented
Function _create_hmm_tagger Undocumented
Function _identity Undocumented
Function _log_add Adds the logged values, returning the logarithm of the addition.
Function _market_hmm_example Return an example HMM (described on page 381 of Huang et al.)
Function _ninf_array Undocumented
Function _untag Undocumented
Constant _TAG Undocumented
Constant _TEXT Undocumented
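Written out in the notation of the module docstring (a_{ij} for transitions, b_i(k) for outputs), the joint probability described in the HiddenMarkovModelTagger summary above is:

    P(o_1 ... o_T, s_1 ... s_T) = P(s_1) b_{s_1}(o_1) \prod_{t=2}^{T} a_{s_{t-1} s_t} b_{s_t}(o_t)

where P(s_1) is the initial state probability, T is the sequence length, o_t is the observation at time t and s_t is the state at time t.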
def demo(): (source)

Undocumented

def demo_bw(): (source)

Undocumented

def demo_pos(): (source)

Undocumented

def demo_pos_bw(test=10, supervised=20, unsupervised=10, verbose=True, max_iterations=5): (source)

Undocumented

def load_pos(num_sents): (source)

Undocumented

def logsumexp2(arr): (source)

Undocumented

def _create_hmm_tagger(states, symbols, A, B, pi): (source)

Undocumented

def _identity(labeled_symbols): (source)

Undocumented

def _log_add(*values): (source)

Adds the logged values, returning the logarithm of the addition.
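This helper performs addition of log-valued probabilities; the standard numerically stable way to do this is the log-sum-exp trick. A minimal sketch of such a helper, as an illustration of the technique rather than this module's exact implementation:

    import math

    def log_add(*values):
        """Return log(sum(exp(v) for v in values)) without overflow or underflow."""
        m = max(values)
        if m == float("-inf"):      # every input is log(0)
            return m
        return m + math.log(sum(math.exp(v - m) for v in values))

Subtracting the maximum before exponentiating keeps every exponent non-positive, so the intermediate sums stay representable.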

def _market_hmm_example(): (source)

Return an example HMM (described on page 381 of Huang et al.)

def _ninf_array(shape): (source)

Undocumented

def _untag(sentences): (source)

Undocumented

_TAG: int = (source)

Undocumented

Value
1

_TEXT: int = (source)

Undocumented

Value
0