class documentation

The Group Average Agglomerative Clusterer (GAAC) starts with each of the N vectors as a singleton cluster. It then iteratively merges the pair of clusters whose centroids are closest. This continues until only one cluster remains. The order of merges gives rise to a dendrogram: a tree in which earlier merges sit lower than later merges. The membership of a given number of clusters c, 1 <= c <= N, can be found by cutting the dendrogram at depth c.
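The merge loop described above can be sketched in plain Python. This is a minimal illustration that follows the wording of this page (repeatedly merge the two clusters whose centroids are most similar under cosine similarity); it is not the library's implementation, and the names `cosine`, `centroid` and `agglomerate` are hypothetical helpers introduced only for this example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def agglomerate(vectors):
    """Merge clusters until one remains and return the merges in order.

    Each merge is a pair of tuples holding the member indices of the two
    clusters that were joined. (Hypothetical helper, illustration only.)
    """
    clusters = [(i,) for i in range(len(vectors))]  # N singleton clusters
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the most similar centroids.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca = centroid([vectors[i] for i in clusters[a]])
                cb = centroid([vectors[i] for i in clusters[b]])
                sim = cosine(ca, cb)
                if best is None or sim > best[0]:
                    best = (sim, a, b)
        _, a, b = best
        merges.append((clusters[a], clusters[b]))
        merged = clusters[a] + clusters[b]
        # Delete b first (b > a) so that index a stays valid.
        del clusters[b]
        del clusters[a]
        clusters.append(merged)
    return merges

vectors = [[3, 3], [1, 2], [4, 2], [4, 0], [2, 3], [3, 1]]
merges = agglomerate(vectors)
```

With N input vectors the loop always records N - 1 merges; read from first to last, they are the dendrogram from its lowest level to its root. The sketch recomputes every centroid on each pass for clarity, which a practical implementation would avoid.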

This clusterer uses the cosine similarity metric only, which allows the clustering process to be sped up.
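One property of cosine similarity that can make it cheap to compute is that, for vectors already normalised to unit length, it reduces to a plain dot product. The sketch below only demonstrates that identity; whether the clusterer exploits it in exactly this way is an assumption, and the helper names are hypothetical.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def normalise(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def cosine_similarity(u, v):
    """Cosine similarity computed from unnormalised vectors."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

u, v = [3.0, 4.0], [4.0, 3.0]
full = cosine_similarity(u, v)           # divides by both norms
fast = dot(normalise(u), normalise(v))   # norms already folded in
```

Both `full` and `fast` evaluate to 0.96 (up to floating-point rounding) for this pair.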

Method __init__ No summary
Method __repr__ Undocumented
Method classify_vectorspace Returns the index of the appropriate cluster for the vector.
Method cluster Assigns the vectors to clusters, learning the clustering parameters from the data. Returns a cluster identifier for each vector.
Method cluster_vectorspace Finds the clusters using the given set of vectors.
Method dendrogram No summary
Method num_clusters Returns the number of clusters.
Method update_clusters Undocumented
Method _merge_similarities Undocumented
Instance Variable _centroids Undocumented
Instance Variable _dendrogram Undocumented
Instance Variable _groups_values Undocumented
Instance Variable _num_clusters Undocumented

Inherited from VectorSpaceClusterer:

Method classify Classifies the token into a cluster, setting the token's CLUSTER parameter to that cluster identifier.
Method likelihood Returns the likelihood (a float) of the token having the corresponding cluster.
Method likelihood_vectorspace Returns the likelihood of the vector belonging to the cluster.
Method vector Returns the vector after normalisation and dimensionality reduction
Method _normalise Normalises the vector to unit length.
Instance Variable _should_normalise Undocumented
Instance Variable _svd_dimensions Undocumented
Instance Variable _Tt Undocumented

Inherited from ClusterI (via VectorSpaceClusterer):

Method classification_probdist Classifies the token into a cluster, returning a probability distribution over the cluster identifiers.
Method cluster_name Returns the name of the cluster at index.
Method cluster_names Returns the names of the clusters as a list.
def __init__(self, num_clusters=1, normalise=True, svd_dimensions=None):
Parameters
num_clusters: Undocumented
normalise: boolean, whether vectors should be normalised to length 1
svd_dimensions: int, number of dimensions to use in reducing vector dimensionality with SVD
def __repr__(self):

Undocumented

def classify_vectorspace(self, vector):

Returns the index of the appropriate cluster for the vector.
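A plausible reading of "appropriate cluster" is the cluster whose centroid is nearest to the vector under cosine distance. The sketch below illustrates that rule under this assumption; `cosine_distance` and `classify_by_centroid` are hypothetical stand-ins written for this example, not the method's actual body.

```python
import math

def cosine_distance(u, v):
    """1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def classify_by_centroid(vector, centroids):
    """Return the index of the centroid closest to `vector`.

    Assumption: "appropriate" means smallest cosine distance.
    """
    best_index, best_dist = None, None
    for index, c in enumerate(centroids):
        dist = cosine_distance(vector, c)
        if best_dist is None or dist < best_dist:
            best_index, best_dist = index, dist
    return best_index

centroids = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
label_a = classify_by_centroid([0.9, 0.2], centroids)
label_b = classify_by_centroid([0.5, 0.6], centroids)
```

Here `[0.9, 0.2]` lies closest in angle to the first centroid (index 0), while `[0.5, 0.6]` lies closest to the diagonal centroid (index 2).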

def cluster(self, vectors, assign_clusters=False, trace=False):

Assigns the vectors to clusters, learning the clustering parameters from the data. Returns a cluster identifier for each vector.

def cluster_vectorspace(self, vectors, trace=False):

Finds the clusters using the given set of vectors.

def dendrogram(self):
Returns
Dendrogram: The dendrogram representing the current clustering
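To make the idea of cutting a dendrogram into c clusters concrete, the sketch below replays a recorded merge order until only the requested number of groups remains. It does not use the `Dendrogram` class; the function `groups_after_merges` and the merge list (pairs of item indices, one from each merged cluster) are hypothetical and chosen only for illustration.

```python
def groups_after_merges(n_items, merges, num_clusters):
    """Replay merges, earliest first, until `num_clusters` groups remain.

    `merges` is a list of (left, right) item indices; each pair joins the
    group containing `left` with the group containing `right`.
    """
    if not 1 <= num_clusters <= n_items:
        raise ValueError("num_clusters must satisfy 1 <= c <= N")
    groups = [[i] for i in range(n_items)]  # start from singletons
    for left, right in merges:
        if len(groups) <= num_clusters:
            break  # the cut has been reached
        group_left = next(g for g in groups if left in g)
        group_right = next(g for g in groups if right in g)
        if group_left is group_right:
            continue  # already merged; nothing to do
        groups.remove(group_right)
        group_left.extend(group_right)
    return [sorted(g) for g in groups]

# Hypothetical merge order for six items (N - 1 = 5 merges).
merges = [(1, 4), (2, 5), (0, 1), (2, 3), (0, 2)]
three = groups_after_merges(6, merges, 3)
one = groups_after_merges(6, merges, 1)
six = groups_after_merges(6, merges, 6)
```

Cutting at c = 3 stops after the third merge and yields `[[0, 1, 4], [2, 5], [3]]`; c = 1 replays every merge, and c = N returns the original singletons.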
def num_clusters(self):

Returns the number of clusters.

def update_clusters(self, num_clusters):

Undocumented

def _merge_similarities(self, dist, cluster_len, i, j):

Undocumented

_centroids: list

Undocumented

_dendrogram

Undocumented

_groups_values

Undocumented

_num_clusters

Undocumented