What is K Means Clustering?
The k-means clustering algorithm was developed by J. MacQueen (1967) and later refined by J. A. Hartigan and M. A. Wong around 1975. Simply speaking, k-means clustering is an algorithm to classify or group your objects, based on their attributes/features, into K groups, where K is a positive integer. The grouping is done by minimizing the sum of squared distances between the data points and the corresponding cluster centroid. Thus, the purpose of k-means clustering is to classify the data.
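To make this procedure concrete, below is a minimal sketch of the standard k-means iteration in Python with NumPy. It is not the VB code offered later in this tutorial; the function name kmeans and its parameters are purely illustrative, and for simplicity the sketch assumes no cluster ever becomes empty.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal k-means sketch: alternate the assignment and update steps."""
    rng = np.random.default_rng(seed)
    # Start with k distinct data points as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: each object joins the cluster of its nearest centroid
        # (Euclidean distance), which is what minimizing the sum of squared
        # distances requires when the centroids are held fixed.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its assigned objects.
        # (For simplicity, this sketch assumes no cluster becomes empty.)
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving, so the algorithm has converged
        centroids = new_centroids
    return labels, centroids
```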
Example: Suppose we have 4 objects as your training data points, and each object has 2 attributes. Each attribute represents a coordinate of the object.
Object     | Attribute 1 (X): weight index | Attribute 2 (Y): pH
Medicine A | 1                             | 1
Medicine B | 2                             | 1
Medicine C | 4                             | 3
Medicine D | 5                             | 4
We also know beforehand that these objects belong to two groups of medicine (cluster 1 and cluster 2). The problem now is to determine which medicines belong to cluster 1 and which belong to cluster 2.
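As a quick check, the sketch above can be run on these four points. This is only illustrative: depending on which two points are chosen as the initial centroids, the numeric labels may come out as 0/1 or 1/0, but A and B end up in one cluster and C and D in the other, consistent with the numerical example linked below.

```python
# Weight index (X) and pH (Y) for medicines A, B, C, D.
X = np.array([[1, 1],   # Medicine A
              [2, 1],   # Medicine B
              [4, 3],   # Medicine C
              [5, 4]])  # Medicine D

labels, centroids = kmeans(X, k=2)
print(labels)     # e.g. [0 0 1 1]: A and B in one cluster, C and D in the other
print(centroids)  # e.g. [[1.5 1. ] [4.5 3.5]]
```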
Click here for a numerical example (manual calculation) of the k-means clustering.
See how the k-means algorithm works (download code in VB).
For the distinction between supervised learning and unsupervised learning, click here.
Note: The k-means algorithm is one of the simplest partitional clustering methods. More advanced algorithms related to k-means are the Expectation Maximization (EM) algorithm (especially with Gaussian mixtures), the Self-Organizing Map (SOM) from Kohonen, and Learning Vector Quantization (LVQ). To overcome weaknesses of k-means, several algorithms have been proposed, such as k-medoids, fuzzy c-means, and k-modes. Check the resources on k-means for further study.
Do you have a question regarding this k-means tutorial? Ask your question here.