


What Is K-Means Clustering?

The k-means clustering algorithm was developed by J. MacQueen (1967) and later by J. A. Hartigan and M. A. Wong around 1975. Simply put, k-means clustering is an algorithm that classifies, or groups, objects into K groups based on their attributes (features), where K is a positive integer. The grouping is done by minimizing the sum of squared distances between each data point and the centroid of the cluster it is assigned to. The purpose of k-means clustering is therefore to classify the data into groups.

Example: Suppose we have four objects as training data points, and each object has two attributes. Each attribute represents one coordinate of the object.

Object        Attribute 1 (X): weight index    Attribute 2 (Y): pH
Medicine A    1                                1
Medicine B    2                                1
Medicine C    4                                3
Medicine D    5                                4
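The first step of k-means assigns each object to its nearest centroid. The Python sketch below encodes the medicine table as (weight index, pH) vectors and computes each object's Euclidean distance to two starting centroids. Using the coordinates of Medicine A and Medicine B as those starting centroids is an assumption made purely for illustration; any other initial choice is possible.

```python
import math

# The medicine table as (weight index, pH) feature vectors.
medicines = {
    "A": (1.0, 1.0),
    "B": (2.0, 1.0),
    "C": (4.0, 3.0),
    "D": (5.0, 4.0),
}


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Hypothetical initial centroids: the coordinates of Medicine A and Medicine B.
initial_centroids = [medicines["A"], medicines["B"]]

# Distance from every medicine to each centroid, and the index of the nearest one.
distances = {
    name: [euclidean(p, c) for c in initial_centroids]
    for name, p in medicines.items()
}
nearest = {name: d.index(min(d)) for name, d in distances.items()}

for name in medicines:
    print(name, [round(x, 2) for x in distances[name]], "-> centroid", nearest[name])
```

With these starting centroids, Medicine A is nearest to centroid 0 while B, C and D are all nearest to centroid 1. This is only the first assignment; k-means then recomputes the centroids and repeats the assignment.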

We also know beforehand that these objects belong to two groups of medicine (cluster 1 and cluster 2). The problem is to determine which medicines belong to cluster 1 and which belong to cluster 2.
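One common way to solve this problem is the iterative scheme often called Lloyd's algorithm: assign each object to its nearest centroid, move each centroid to the mean of its assigned objects, and repeat until the assignments stop changing. The Python sketch below applies that scheme to the four medicines with K = 2. The function name `kmeans` and the choice of Medicine A and Medicine B as initial centroids are assumptions for illustration; k-means results can depend on how the centroids are initialized.

```python
import math


def kmeans(points, initial_centroids, max_iter=100):
    """Plain iterative k-means. Returns (labels, centroids)."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[k] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return labels, centroids


names = ["Medicine A", "Medicine B", "Medicine C", "Medicine D"]
data = [(1, 1), (2, 1), (4, 3), (5, 4)]
labels, centroids = kmeans(data, initial_centroids=[data[0], data[1]])
for name, lab in zip(names, labels):
    print(name, "-> cluster", lab + 1)
print(centroids)  # [(1.5, 1.0), (4.5, 3.5)]
```

Starting from A and B, the assignments settle after the third pass: Medicines A and B form cluster 1 (centroid (1.5, 1.0)) and Medicines C and D form cluster 2 (centroid (4.5, 3.5)). `math.dist` requires Python 3.8 or later.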


Note: The k-means algorithm is one of the simplest partitional clustering methods. More advanced algorithms related to k-means include the Expectation-Maximization (EM) algorithm, especially for Gaussian mixtures, Kohonen's Self-Organizing Map (SOM), and Learning Vector Quantization (LVQ). To overcome the weaknesses of k-means, several algorithms have been proposed, such as k-medoids, fuzzy c-means, and k-modes.
