Kardi Teknomo

What is K-Mean Clustering?

 


The k-means clustering algorithm was developed by J. MacQueen (1967) and then by J. A. Hartigan and M. A. Wong around 1975. Simply put, k-means clustering is an algorithm that groups objects, based on their attributes (features), into K groups, where K is a positive integer. The grouping is done by minimizing the sum of squared distances between each data point and the centroid of the cluster it is assigned to. Thus the purpose of k-means clustering is to classify the data.
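To make the quantity being minimized concrete, here is a minimal Python sketch of the sum of squared distances for a given grouping. The function name `within_cluster_sum_of_squares` and the toy points, labels and centroids are illustrative assumptions, not part of the original tutorial, and Euclidean distance is assumed.

```python
def within_cluster_sum_of_squares(points, labels, centroids):
    """Sum of squared Euclidean distances from each point to its assigned centroid."""
    total = 0.0
    for point, label in zip(points, labels):
        centroid = centroids[label]
        total += sum((p - c) ** 2 for p, c in zip(point, centroid))
    return total


# Hypothetical toy data: two tight pairs of points, far apart from each other.
toy_points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
toy_labels = [0, 0, 1, 1]                  # cluster index of each point
toy_centroids = [(0.0, 1.0), (10.0, 1.0)]  # mean of each pair

print(within_cluster_sum_of_squares(toy_points, toy_labels, toy_centroids))  # 4.0
```

A grouping that mixes the two pairs would give a much larger value, which is why k-means prefers the grouping shown here.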

Example: Suppose we have 4 objects as training data points, and each object has 2 attributes. Each attribute represents one coordinate of the object.

Object        Attribute 1 (X): weight index    Attribute 2 (Y): pH
Medicine A    1                                1
Medicine B    2                                1
Medicine C    4                                3
Medicine D    5                                4

We also know beforehand that these objects belong to two groups of medicine (cluster 1 and cluster 2). The problem now is to determine which medicines belong to cluster 1 and which belong to cluster 2.
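The four medicines can be grouped with a short Python sketch of the standard k-means iteration: assign each object to its nearest centroid, recompute each centroid as the mean of its members, and repeat until the assignment stops changing. This is a minimal illustration, not the author's implementation. The helper names `squared_distance` and `k_means`, and the choice of Medicine A and Medicine B as the initial centroids, are assumptions made for this example. Cluster indices 0 and 1 stand for cluster 1 and cluster 2.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, initial_centroids, max_iterations=100):
    """Alternate assignment and centroid update until the labels stop changing."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # no object changed group, so the result is stable
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# (weight index, pH) for Medicines A, B, C, D, taken from the table above.
medicines = [(1, 1), (2, 1), (4, 3), (5, 4)]

# Assumption: start with Medicine A and Medicine B as the two initial centroids.
labels, centroids = k_means(medicines, initial_centroids=[(1, 1), (2, 1)])

print(labels)     # [0, 0, 1, 1]
print(centroids)  # [(1.5, 1.0), (4.5, 3.5)]
```

With this starting point, Medicines A and B end up in one cluster and Medicines C and D in the other. A different choice of initial centroids can lead k-means to a different result in general, so the starting point is part of the assumptions here.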

Click here for numerical example (manual calculation) of the k-mean clustering.

See how the k-mean algorithm works (download free code in VB)

For distinction between supervised learning and unsupervised learning, click here.

Note: The k-means algorithm is one of the simplest partition clustering methods. More advanced algorithms related to k-means include the Expectation Maximization (EM) algorithm (especially for Gaussian mixtures), Kohonen's Self-Organizing Map (SOM), and Learning Vector Quantization (LVQ). To overcome the weaknesses of k-means, several algorithms have been proposed, such as k-medoids, fuzzy c-means and k-modes. Check the resources of k-means for further study.

© 2006 Kardi Teknomo. All Rights Reserved.