Read this tutorial off-line on any device. Click here to purchase the complete E-book of this tutorial

Weaknesses of the K-Means Algorithm

Like other algorithms, K-means clustering has several weaknesses:

  • When the data set is small, the initial grouping strongly determines the final clusters.
  • The number of clusters, K, must be chosen beforehand.
  • We never know the true clusters. When the data set is small, the same data entered in a different order may produce different clusters.
  • The result is sensitive to the initial condition. Different initial conditions may produce different clusters, and the algorithm may become trapped in a local optimum.
  • We never know which attribute contributes more to the grouping process, since each attribute is assumed to have the same weight.
  • The arithmetic mean is not robust to outliers. Data very far from the centroid may pull the centroid away from its true position.
  • The resulting clusters tend to be circular in shape because the grouping is based on distance.
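
The sensitivity to the initial condition and to input order can be seen on a tiny example. The following is a minimal sketch, not a reference implementation: the function names `kmeans` and `sse`, the four sample points, and the choice of initial centroids are all assumptions made for illustration. It runs the usual assign-then-update loop from two different starting points and compares the sum of squared errors (SSE).

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, init_centroids, max_iter=100):
    """Plain K-means loop (illustrative sketch) starting from given centroids."""
    centroids = [tuple(float(v) for v in c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:  # no point changed group: converged
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


def sse(points, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))


# Hypothetical data: four corners of a wide rectangle, K = 2.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Start A: one initial centroid on each side of the rectangle.
labels_a, centroids_a = kmeans(points, [(0, 0), (4, 0)])
sse_a = sse(points, labels_a, centroids_a)

# Start B: the first two points in input order as initial centroids.
labels_b, centroids_b = kmeans(points, [(0, 0), (0, 1)])
sse_b = sse(points, labels_b, centroids_b)

print(labels_a, sse_a)
print(labels_b, sse_b)
```

Under these assumptions, start A groups the points into left and right pairs, while start B converges to bottom and top pairs with a much larger SSE, which is a local optimum. A common mitigation is to run the algorithm from several starting points and keep the result with the lowest SSE.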

One way to mitigate these weaknesses is to use K-means clustering only when plenty of data is available. To reduce the effect of outliers, we can use the median instead of the mean.
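
To see why the median helps, compare the two kinds of centroid for a single cluster that contains one outlier. This is a small sketch with made-up points. Taking the median separately in each coordinate is an assumption about how the median would be applied (this variant is commonly called K-medians).

```python
from statistics import mean, median

# Hypothetical cluster: four nearby points and one far outlier.
cluster = [(1, 1), (2, 2), (3, 3), (2, 1), (50, 60)]

# Split the points into one sequence per coordinate.
xs, ys = zip(*cluster)

# Mean-based centroid, as in standard K-means.
mean_centroid = (mean(xs), mean(ys))

# Median-based centroid, computed coordinate by coordinate.
median_centroid = (median(xs), median(ys))

print("mean centroid:  ", mean_centroid)
print("median centroid:", median_centroid)
```

The mean centroid is dragged toward the outlier, far from the four nearby points, while the median centroid stays among them.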

Some people claim that K-means clustering can only be used with quantitative data. This is not true! See how you can use multivariate data up to n dimensions (even mixed data types) here. The key to using other types of dissimilarity lies in the distance matrix.
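
As a rough illustration of a distance matrix for mixed data, the sketch below combines a range-normalized absolute difference for numeric attributes with a simple match/mismatch score for categorical ones, and averages them. This particular dissimilarity, the function name `mixed_dissimilarity_matrix`, and the sample records are assumptions chosen for illustration, not necessarily the measure used in this tutorial.

```python
def mixed_dissimilarity_matrix(records, numeric_cols):
    """Pairwise dissimilarity for records with numeric and categorical attributes.

    numeric_cols: set of column indices treated as numeric; all other
    columns are treated as categorical (0 if equal, 1 if different).
    """
    n_attrs = len(records[0])
    # Range of each numeric column, used to scale differences into [0, 1].
    ranges = {}
    for j in numeric_cols:
        column = [r[j] for r in records]
        ranges[j] = (max(column) - min(column)) or 1.0  # avoid division by zero

    n = len(records)
    matrix = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            total = 0.0
            for j in range(n_attrs):
                if j in numeric_cols:
                    total += abs(records[a][j] - records[b][j]) / ranges[j]
                else:
                    total += 0.0 if records[a][j] == records[b][j] else 1.0
            # Average over attributes; every attribute has the same weight.
            matrix[a][b] = matrix[b][a] = total / n_attrs
    return matrix


# Hypothetical records: (age, favourite colour).
records = [(25, "red"), (35, "red"), (45, "blue")]
D = mixed_dissimilarity_matrix(records, numeric_cols={0})

for row in D:
    print(row)
```

Once such a matrix exists, the grouping step only needs to look up dissimilarities between objects, so the attributes themselves no longer have to be quantitative.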
