What is K Means Clustering?
The k-means clustering algorithm was developed by J. MacQueen (1967) and later refined by J. A. Hartigan (1975) and by J. A. Hartigan and M. A. Wong (1979). Simply speaking, k-means clustering is an algorithm to classify, or group, your objects into K groups based on their attributes/features, where K is a positive integer. The grouping is done by minimizing the sum of squared distances between each data point and the centroid of the cluster it is assigned to. Thus the purpose of k-means clustering is to classify the data.
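The quantity being minimized can be written compactly. The notation below is an assumption introduced here for illustration, not taken from the tutorial: x_i is a data point, C_k is the set of points assigned to cluster k, and mu_k is the centroid (mean) of that cluster.

```latex
J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{\lvert C_k \rvert} \sum_{x_i \in C_k} x_i
```

A smaller J means the points lie closer to their own centroids. The algorithm lowers J by alternately reassigning points to the nearest centroid and recomputing each centroid as the mean of its points.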
Example: Suppose we have 4 objects as our training data points, and each object has 2 attributes. Each attribute represents one coordinate of the object.
Object        Attribute 1 (X): weight index    Attribute 2 (Y): pH
Medicine A    1                                1
Medicine B    2                                1
Medicine C    4                                3
Medicine D    5                                4
We also know beforehand that these objects belong to two groups of medicine (cluster 1 and cluster 2). The problem now is to determine which medicines belong to cluster 1 and which belong to cluster 2.
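As a rough illustration of how this problem could be solved, here is a minimal Python sketch of the standard assign-and-update k-means iteration applied to the four medicines. The function name `kmeans`, the use of Euclidean distance, and the choice of Medicine A and Medicine B as the initial centroids are all assumptions made for this sketch. A different starting choice may need a different number of iterations or may label the clusters in a different order.

```python
from math import dist  # Euclidean distance, available in Python 3.8+

def kmeans(points, initial_centroids, max_iter=100):
    """Sketch of k-means: assign points to the nearest centroid, then update centroids."""
    centroids = list(initial_centroids)  # copy so the caller's list is not modified
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        new_labels = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # no point changed cluster, so the grouping is stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

# (weight index, pH) for Medicines A, B, C, D from the table above.
medicines = [(1, 1), (2, 1), (4, 3), (5, 4)]

# Assumed starting centroids: Medicine A and Medicine B.
labels, centroids = kmeans(medicines, [(1, 1), (2, 1)])

for name, label in zip("ABCD", labels):
    print(f"Medicine {name} -> cluster {label + 1}")
print("Centroids:", centroids)
```

With these assumed starting centroids, the sketch places Medicines A and B in cluster 1 and Medicines C and D in cluster 2, with final centroids (1.5, 1.0) and (4.5, 3.5).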
Click here for a numerical example (manual calculation) of k-means clustering.
See how the k-means algorithm works (download code in VB).
For the distinction between supervised learning and unsupervised learning, click here.
Note: The k-means algorithm is one of the simplest partitioning clustering methods. More advanced algorithms related to k-means include the Expectation Maximization (EM) algorithm, especially for Gaussian mixtures, the Self-Organizing Map (SOM) from Kohonen, and Learning Vector Quantization (LVQ). To overcome the weaknesses of k-means, several algorithms have been proposed, such as k-medoids, fuzzy c-means and k-modes. Check the resources on k-means for further study.
Do you have a question regarding this k-means tutorial? Ask your question here.