Weaknesses of the K-Means Algorithm
Like other algorithms, K-means clustering has several weaknesses:
- When there are not many data points, the initial grouping strongly determines the resulting clusters.
- The number of clusters, K, must be determined beforehand.
- We never know the true clusters: with the same data, a different input order may produce different clusters when the data set is small.
- It is sensitive to the initial condition. Different initial conditions may produce different clustering results, and the algorithm may become trapped in a local optimum.
- We never know which attribute contributes more to the grouping process, since we assume that each attribute has the same weight.
- The arithmetic mean is not robust to outliers. Data very far from a centroid may pull the centroid away from its true position.
- The resulting clusters are circular in shape because the method is based on distance.
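The sensitivity to the initial condition can be illustrated with a small sketch. The code below is a minimal plain-Python version of the standard K-means loop (assign each point to its nearest centroid, then recompute each centroid as the mean), not the tutorial's own implementation. The function names `dist2` and `kmeans`, the four sample points, and the two sets of starting centroids are all assumptions chosen for illustration.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, max_iter=100):
    """Run K-means from the given starting centroids.

    Returns (labels, centroids, sse), where sse is the sum of squared
    distances from each point to its assigned centroid.
    """
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: dist2(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    sse = sum(dist2(p, centroids[label]) for p, label in zip(points, labels))
    return labels, centroids, sse


# Four points at the corners of a wide rectangle (hypothetical data).
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

# Start A: one centroid near each short side of the rectangle.
labels_a, _, sse_a = kmeans(points, [(0.0, 0.5), (4.0, 0.5)])
# Start B: both centroids on the vertical midline.
labels_b, _, sse_b = kmeans(points, [(2.0, 0.0), (2.0, 1.0)])

print(labels_a, sse_a)
print(labels_b, sse_b)
```

Both runs converge, but to different groupings: start A separates the left pair from the right pair, while start B pairs the points by row and stays there with a much larger sum of squared distances. That second result is a local optimum of the kind described in the list.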
One way to overcome these weaknesses is to use K-means clustering only when many data are available. To overcome the outlier problem, we can use the median instead of the mean.
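To see why the median helps with outliers, the following sketch compares the two choices of cluster center on one small cluster. It uses Python's standard `statistics` module; the helper name `center` and the sample points are hypothetical, and taking the median separately on each coordinate is an assumption about how the median would be applied to multi-dimensional data.

```python
from statistics import mean, median


def center(points, reducer):
    """Cluster center computed coordinate by coordinate with `reducer`."""
    return tuple(reducer(axis) for axis in zip(*points))


# Four points close together plus one far-away outlier (hypothetical data).
cluster = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0), (50.0, 50.0)]

mean_center = center(cluster, mean)      # pulled toward the outlier
median_center = center(cluster, median)  # stays with the bulk of the points

print(mean_center)
print(median_center)
```

The mean-based center lands far from all four nearby points, while the median-based center stays among them.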
Some people have pointed out that K-means clustering cannot be used for types of data other than quantitative data. This is not true!
See how you can use multivariate data of up to n dimensions (even mixed data types) here. The key to using other types of dissimilarity lies in the distance matrix.
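As a rough illustration of how a distance matrix can accommodate mixed data, the sketch below builds one for records that have a numeric and a categorical attribute. The dissimilarity used here is a simple Gower-style average, which is an assumption chosen for the example and not necessarily the measure the tutorial has in mind. The name `mixed_dissimilarity`, the sample records, and the `ranges` list are all hypothetical.

```python
def mixed_dissimilarity(a, b, ranges):
    """Average per-attribute dissimilarity between two records.

    ranges[i] is the value range of numeric attribute i, or None when
    attribute i is categorical.
    """
    parts = []
    for x, y, r in zip(a, b, ranges):
        if r is None:
            # Categorical attribute: 0 if the values match, 1 otherwise.
            parts.append(0.0 if x == y else 1.0)
        else:
            # Numeric attribute: absolute difference scaled by the range.
            parts.append(abs(x - y) / r if r else 0.0)
    return sum(parts) / len(parts)


# Each record: (numeric attribute, categorical attribute). Hypothetical data.
records = [(1.0, "red"), (3.0, "red"), (5.0, "blue")]
ranges = [4.0, None]  # numeric range is 5.0 - 1.0; second attribute is categorical

# Symmetric distance matrix with zeros on the diagonal.
matrix = [[mixed_dissimilarity(a, b, ranges) for b in records] for a in records]

for row in matrix:
    print(row)
```

Once every pair of records has a dissimilarity value, the grouping step only needs the matrix, not the raw attribute types.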