Weaknesses of the K-Means Algorithm
Like other algorithms, K-means clustering has several weaknesses:
- When the data set is small, the initial grouping largely determines the final clusters.
- The number of clusters, K, must be chosen beforehand.
- We never know the true clusters: with a small data set, the same data entered in a different order may produce different clusters.
- The algorithm is sensitive to initial conditions. Different initial conditions may produce different clusters, and the algorithm may become trapped in a local optimum.
- We never know which attribute contributes more to the grouping process, because every attribute is assumed to have the same weight.
- The arithmetic mean is not robust to outliers. Data points very far from a centroid may pull that centroid away from its true position.
- The resulting clusters tend to be circular in shape, because the grouping is based on distance.
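The sensitivity to initial conditions listed above can be seen on a very small example. The sketch below is a plain K-means iteration written for illustration only; the function name `kmeans`, the four sample points, and the two sets of starting centroids are assumptions chosen so that the effect is easy to follow by hand.

```python
def kmeans(points, centroids, max_iter=100):
    """Basic K-means iteration. Returns (labels, centroids, sse)."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for each point
        # (squared Euclidean distance; ties go to the lower index).
        labels = [
            min(range(len(centroids)),
                key=lambda j: sum((p - c) ** 2 for p, c in zip(pt, centroids[j])))
            for pt in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(len(centroids)):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # empty cluster: keep old centroid
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    # Within-cluster sum of squared distances for the final grouping.
    sse = sum(
        sum((p - c) ** 2 for p, c in zip(pt, centroids[lab]))
        for pt, lab in zip(points, labels)
    )
    return labels, centroids, sse


# Four corners of a wide, flat rectangle (hypothetical data).
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Start A: one seed on each short side.
labels_a, centroids_a, sse_a = kmeans(points, [(0, 0), (4, 0)])

# Start B: both seeds on the same short side.
labels_b, centroids_b, sse_b = kmeans(points, [(0, 0), (0, 1)])

print(labels_a, sse_a)
print(labels_b, sse_b)
```

With start A the points are split into a left pair and a right pair, and the sum of squared distances is 1.0. With start B the algorithm settles on a top pair and a bottom pair with a sum of 16.0 and stops there, since no single reassignment improves it. That is the local optimum mentioned in the list.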
One way to mitigate these weaknesses is to use K-means clustering only when plenty of data is available. To reduce the effect of outliers, we can use the median instead of the mean.
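A small numeric sketch shows why the median helps with outliers. The five sample points are made up for illustration, and the coordinate-wise median used here is only one possible reading of "median instead of mean" (it is the centroid used by the K-medians variant).

```python
from statistics import mean, median

# One cluster of four nearby points plus a single far outlier (hypothetical data).
cluster = [(1, 1), (2, 1), (1, 2), (2, 2), (50, 50)]

xs, ys = zip(*cluster)

# Centroid as used by K-means: arithmetic mean of each coordinate.
mean_centroid = (mean(xs), mean(ys))

# Alternative centroid: median of each coordinate.
median_centroid = (median(xs), median(ys))

print("mean centroid:  ", mean_centroid)
print("median centroid:", median_centroid)
```

The mean centroid is dragged to about (11.2, 11.2), far from the four points that form the real group, while the median centroid stays at (2, 2).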
Some people claim that K-means clustering cannot be used for data other than quantitative data. This is not true!
K-means can be applied to multivariate data of up to n dimensions, even with mixed data types. The key to using other types of dissimilarity is the distance matrix.
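As a rough sketch of what such a distance matrix might look like for mixed data, the example below combines one numeric and one categorical attribute. The records, the field names `age` and `color`, and the averaging rule (a Gower-style measure) are assumptions made for illustration; they are not taken from the tutorial.

```python
# Hypothetical records with one quantitative and one categorical attribute.
records = [
    {"age": 20, "color": "red"},
    {"age": 30, "color": "red"},
    {"age": 40, "color": "blue"},
]

# Range of the numeric attribute, used to scale its differences into [0, 1].
ages = [r["age"] for r in records]
age_range = max(ages) - min(ages)


def mixed_dissimilarity(a, b):
    """Average of a scaled numeric difference and a 0/1 category mismatch."""
    numeric = abs(a["age"] - b["age"]) / age_range
    categorical = 0.0 if a["color"] == b["color"] else 1.0
    return (numeric + categorical) / 2


# Full pairwise dissimilarity matrix (symmetric, zero diagonal).
matrix = [[mixed_dissimilarity(a, b) for b in records] for a in records]

for row in matrix:
    print(row)
```

Once every pair of objects has a dissimilarity value, the grouping step only needs to compare those values. Note that averaging categorical values to form a centroid is not meaningful, so in practice a variant that picks an existing object as the cluster centre (a medoid) is one way to use such a matrix directly.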