class GAAClusterer(VectorSpaceClusterer):
Constructor: GAAClusterer(num_clusters, normalise, svd_dimensions)
The Group Average Agglomerative (GAA) clusterer starts with each of the N vectors as a singleton cluster. It then iteratively merges the pair of clusters which have the closest centroids. This continues until there is only one cluster. The order of merges gives rise to a dendrogram: a tree with the earlier merges lower than later merges. The membership of a given number of clusters c, 1 <= c <= N, can be found by cutting the dendrogram at depth c.
This clusterer uses the cosine similarity metric only, which allows the clustering process to be sped up.
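The merge procedure described above can be sketched in plain Python. This is only an illustration of the description, not the library's implementation: the helper names `cosine_similarity`, `centroid`, `agglomerate`, and `cut` are hypothetical, and the brute-force pair search ignores the speed-up mentioned above.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def agglomerate(vectors):
    """Merge clusters pairwise until one remains.

    Returns the merge history: history[k] is the partition (lists of
    vector indices) after k merges, so it holds N - k clusters.
    """
    clusters = [[i] for i in range(len(vectors))]
    history = [[list(c) for c in clusters]]
    while len(clusters) > 1:
        best = None  # (similarity, index a, index b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca = centroid([vectors[i] for i in clusters[a]])
                cb = centroid([vectors[i] for i in clusters[b]])
                sim = cosine_similarity(ca, cb)
                if best is None or sim > best[0]:
                    best = (sim, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
        history.append([list(c) for c in clusters])
    return history


def cut(history, c):
    """Membership for c clusters, 1 <= c <= N."""
    return history[len(history) - c]


# Hypothetical sample data: two vectors near the x-axis, two near the y-axis.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.2, 0.8]]
history = agglomerate(vectors)
two_clusters = cut(history, 2)
```

Recording every intermediate partition plays the role of the dendrogram here: asking for c clusters is just a lookup into the history rather than a new clustering run.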
Method | __init__ |
No summary |
Method | __repr__ |
Undocumented |
Method | classify |
Returns the index of the appropriate cluster for the vector. |
Method | cluster |
Assigns the vectors to clusters, learning the clustering parameters from the data. Returns a cluster identifier for each vector. |
Method | cluster |
Finds the clusters using the given set of vectors. |
Method | dendrogram |
No summary |
Method | num |
Returns the number of clusters. |
Method | update |
Undocumented |
Method | _merge |
Undocumented |
Instance Variable | _centroids |
Undocumented |
Instance Variable | _dendrogram |
Undocumented |
Instance Variable | _groups |
Undocumented |
Instance Variable | _num |
Undocumented |
Inherited from VectorSpaceClusterer:
Method | classify |
Classifies the token into a cluster, setting the token's CLUSTER parameter to that cluster identifier. |
Method | likelihood |
Returns the likelihood (a float) of the token having the corresponding cluster. |
Method | likelihood |
Returns the likelihood of the vector belonging to the cluster. |
Method | vector |
Returns the vector after normalisation and dimensionality reduction |
Method | _normalise |
Normalises the vector to unit length. |
Instance Variable | _should |
Undocumented |
Instance Variable | _svd |
Undocumented |
Instance Variable | _ |
Undocumented |
Inherited from ClusterI (via VectorSpaceClusterer):
Method | classification |
Classifies the token into a cluster, returning a probability distribution over the cluster identifiers. |
Method | cluster |
Returns the names of the cluster at index. |
Method | cluster |
Returns the names of the clusters. :rtype: list |
Parameters | |
num_clusters | Undocumented |
normalise:boolean | should vectors be normalised to length 1 |
svd_dimensions | number of dimensions to use in reducing vector dimensionality with SVD |
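A short usage sketch may help tie the constructor parameters to the `cluster` method. It assumes `nltk` and `numpy` are installed and that the class is importable from `nltk.cluster`; the sample vectors and the choice of four clusters are arbitrary illustration values, and `assign_clusters=True` is the flag assumed to make `cluster` return one identifier per vector.

```python
import numpy
from nltk.cluster import GAAClusterer

# Six arbitrary 2-D sample vectors (illustration only).
vectors = [
    numpy.array(point, dtype=float)
    for point in [[3, 3], [1, 2], [4, 2], [4, 0], [2, 3], [3, 1]]
]

# Ask for four clusters; vectors are normalised to unit length first.
clusterer = GAAClusterer(num_clusters=4, normalise=True)

# With assign_clusters=True, one cluster identifier is returned per vector.
assignments = clusterer.cluster(vectors, assign_clusters=True)

for vector, cluster_id in zip(vectors, assignments):
    print(vector, "->", cluster_id)
```

The exact identifiers depend on the merge order, so the sketch prints them rather than stating expected values.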