class documentation

The Gaussian EM clusterer models the vectors as being produced by a mixture of k Gaussian sources. The parameters of these sources (prior probability, mean and covariance matrix) are then found to maximise the likelihood of the given data. This is done with the expectation maximisation algorithm. It starts with k arbitrarily chosen means, priors and covariance matrices. It then calculates the membership probabilities for each vector in each of the clusters; this is the 'E' step. The cluster parameters are then updated in the 'M' step using the maximum likelihood estimate from the cluster membership probabilities. This process continues until the likelihood of the data does not significantly increase.
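The E and M steps described above can be sketched in a few lines. This is a minimal illustration for one-dimensional data using only the standard library, not the class's actual implementation: the function names (`gaussian_pdf`, `log_likelihood`, `em_1d`), the uniform initial priors, the unit initial variances, the `max_iter` cap and the way `bias` is added to each variance are all assumptions made for the example.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def log_likelihood(xs, priors, means, variances):
    """Log-likelihood of the data under the mixture."""
    return sum(
        math.log(sum(p * gaussian_pdf(x, m, v)
                     for p, m, v in zip(priors, means, variances)))
        for x in xs
    )

def em_1d(xs, means, conv_threshold=1e-6, bias=0.1, max_iter=200):
    """Hypothetical 1-D EM loop: returns (priors, means, variances)."""
    k = len(means)
    means = list(means)
    priors = [1.0 / k] * k      # assumed starting priors
    variances = [1.0] * k       # assumed starting variances
    last = log_likelihood(xs, priors, means, variances)
    for _ in range(max_iter):
        # E step: membership probability of each point in each cluster
        resp = []
        for x in xs:
            w = [p * gaussian_pdf(x, m, v)
                 for p, m, v in zip(priors, means, variances)]
            total = sum(w)
            resp.append([wi / total for wi in w])
        # M step: maximum likelihood estimates weighted by membership
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = sum(r[j] * (x - means[j]) ** 2
                               for r, x in zip(resp, xs)) / nj + bias
            priors[j] = nj / len(xs)
        # Stop once the likelihood no longer changes significantly
        current = log_likelihood(xs, priors, means, variances)
        if abs(current - last) < conv_threshold:
            break
        last = current
    return priors, means, variances

data = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
priors, means, variances = em_1d(data, [0.0, 6.0])
print(priors, means, variances)
```

With two well-separated groups such as the sample data, the two estimated means should settle close to the centre of each group.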

Method __init__ Creates an EM clusterer with the given starting parameters, convergence threshold and vector preprocessing parameters (normalisation and SVD dimensionality reduction).
Method __repr__ Undocumented
Method classify_vectorspace Returns the index of the appropriate cluster for the vector.
Method cluster_vectorspace Finds the clusters using the given set of vectors.
Method likelihood_vectorspace Returns the likelihood of the vector belonging to the cluster.
Method num_clusters Returns the number of clusters.
Method _gaussian Undocumented
Method _loglikelihood Undocumented
Instance Variable _bias Undocumented
Instance Variable _conv_threshold Undocumented
Instance Variable _covariance_matrices Undocumented
Instance Variable _means Undocumented
Instance Variable _num_clusters Undocumented
Instance Variable _priors Undocumented

Inherited from VectorSpaceClusterer:

Method classify Classifies the token into a cluster, setting the token's CLUSTER parameter to that cluster identifier.
Method cluster Assigns the vectors to clusters, learning the clustering parameters from the data. Returns a cluster identifier for each vector.
Method likelihood Returns the likelihood (a float) of the token having the corresponding cluster.
Method vector Returns the vector after normalisation and dimensionality reduction.
Method _normalise Normalises the vector to unit length.
Instance Variable _should_normalise Undocumented
Instance Variable _svd_dimensions Undocumented
Instance Variable _Tt Undocumented

Inherited from ClusterI (via VectorSpaceClusterer):

Method classification_probdist Classifies the token into a cluster, returning a probability distribution over the cluster identifiers.
Method cluster_name Returns the name of the cluster at the given index.
Method cluster_names Returns the names of the clusters as a list.
def __init__(self, initial_means, priors=None, covariance_matrices=None, conv_threshold=1e-06, bias=0.1, normalise=False, svd_dimensions=None): (source)

Creates an EM clusterer with the given starting parameters, convergence threshold and vector preprocessing parameters (normalisation and SVD dimensionality reduction).

Parameters
initial_means ([seq of] numpy array or seq of SparseArray): the means of the Gaussian cluster centres
priors (numpy array or seq of float): the prior probability for each cluster
covariance_matrices ([seq of] numpy array): the covariance matrix for each cluster
conv_threshold (int or float): the maximum change in likelihood between iterations at which the clustering is deemed to have converged
bias (float): variance bias used to ensure non-singular covariance matrices
normalise (boolean): whether vectors should be normalised to length 1
svd_dimensions (int): the number of dimensions to keep when reducing vector dimensionality with SVD
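To illustrate why a variance bias keeps covariance matrices non-singular, the sketch below adds a small constant to the diagonal of a degenerate 2x2 matrix. The documentation does not say exactly how `bias` is applied, so adding it to the diagonal is an assumption, and the helpers `add_variance_bias` and `det2` are hypothetical names used only for this example.

```python
def add_variance_bias(cov, bias=0.1):
    """Return a copy of a square matrix with `bias` added to its diagonal."""
    return [
        [value + (bias if i == j else 0.0) for j, value in enumerate(row)]
        for i, row in enumerate(cov)
    ]

def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

# A covariance matrix from perfectly correlated data cannot be inverted.
singular = [[1.0, 1.0], [1.0, 1.0]]
regularised = add_variance_bias(singular, bias=0.1)

print(det2(singular))     # zero determinant: singular
print(det2(regularised))  # strictly positive after adding the bias
```

A larger bias makes the matrices better conditioned but also smooths the estimated clusters, so the value trades numerical stability against fidelity to the data.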
def __repr__(self): (source)

Undocumented

def classify_vectorspace(self, vector): (source)

Returns the index of the appropriate cluster for the vector.

def cluster_vectorspace(self, vectors, trace=False): (source)

Finds the clusters using the given set of vectors.

def likelihood_vectorspace(self, vector, cluster): (source)

Returns the likelihood of the vector belonging to the cluster.
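One plausible reading of this likelihood, and of how it relates to classification, is the cluster prior multiplied by the Gaussian density of the vector, with the classifier picking the cluster that scores highest. The sketch below assumes this reading and, for brevity, diagonal covariance matrices; the function names and signatures are hypothetical and are not the class's methods.

```python
import math

def diag_gaussian(mean, variances, x):
    """Density of a Gaussian with a diagonal covariance matrix."""
    density = 1.0
    for m, v, xi in zip(mean, variances, x):
        density *= math.exp(-0.5 * (xi - m) ** 2 / v) / math.sqrt(2 * math.pi * v)
    return density

def likelihood(vector, cluster, priors, means, variances):
    """Assumed form: prior of the cluster times the density of the vector."""
    return priors[cluster] * diag_gaussian(means[cluster], variances[cluster], vector)

def classify(vector, priors, means, variances):
    """Index of the cluster with the highest likelihood for the vector."""
    scores = [likelihood(vector, c, priors, means, variances)
              for c in range(len(means))]
    return max(range(len(scores)), key=scores.__getitem__)

# Two illustrative clusters with equal priors and unit variances.
priors = [0.5, 0.5]
means = [[0.0, 0.0], [5.0, 5.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]

print(classify([0.2, -0.1], priors, means, variances))
print(classify([4.8, 5.3], priors, means, variances))
```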

def num_clusters(self): (source)

Returns the number of clusters.

def _gaussian(self, mean, cvm, x): (source)

Undocumented

def _loglikelihood(self, vectors, priors, means, covariances): (source)

Undocumented

_bias = (source)

Undocumented

_conv_threshold = (source)

Undocumented

_covariance_matrices = (source)

Undocumented

_means = (source)

Undocumented

_num_clusters = (source)

Undocumented

_priors = (source)

Undocumented