By Kardi Teknomo, PhD.




Numerical Example of Hierarchical Clustering

Minimum distance clustering is also called single-linkage hierarchical clustering or nearest-neighbor clustering. The distance between two clusters is defined as the minimum distance between an object of one cluster and an object of the other, as shown below.

    d(C1, C2) = min d(i, j), taken over all objects i in cluster C1 and j in cluster C2

For example, suppose we are given an input distance matrix of size 6 by 6. This distance matrix was calculated from the object features as explained in the previous section.

           A      B      C      D      E      F
    A    0.00   0.71   5.66   3.61   4.24   3.20
    B    0.71   0.00   4.95   2.92   3.54   2.50
    C    5.66   4.95   0.00   2.24   1.41   2.50
    D    3.61   2.92   2.24   0.00   1.00   0.50
    E    4.24   3.54   1.41   1.00   0.00   1.12
    F    3.20   2.50   2.50   0.50   1.12   0.00

We have 6 objects, and we put each object into its own cluster (analogous to putting each ball into its own basket). Instead of calling them objects, we now call them clusters. Thus, in the beginning we have 6 clusters. Our goal is to group these 6 clusters such that, at the end of the iterations, only a single cluster remains, consisting of all six original objects.

In each step of the iteration, we find the closest pair of clusters. In this case, the closest pair is cluster D and cluster F, with the shortest distance of 0.50. Thus, we group clusters D and F into cluster (D, F). Then we update the distance matrix (see the distance matrix below). Distances between ungrouped clusters do not change from the original distance matrix. The problem now is how to calculate the distance between the newly grouped cluster (D, F) and the other clusters.

            (D,F)    A      B      C      E
    (D,F)    0.00    ?      ?      ?      ?
    A         ?     0.00   0.71   5.66   4.24
    B         ?     0.71   0.00   4.95   3.54
    C         ?     5.66   4.95   0.00   1.41
    E         ?     4.24   3.54   1.41   0.00

That is exactly where the linkage rule comes into effect. Using single linkage, we take the minimum distance between the original objects of the two clusters.
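As a minimal sketch (my addition, not part of the original tutorial), the single-linkage rule can be written as a short Python function, where dist is assumed to be any function returning the distance between two objects:

    # Single-linkage (minimum) distance between two clusters: the smallest
    # pairwise distance between a member of one and a member of the other.
    def single_linkage(cluster1, cluster2, dist):
        return min(dist(i, j) for i in cluster1 for j in cluster2)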

Using the input distance matrix, the distance between cluster (D, F) and cluster A is computed as

    d((D,F), A) = min(d(D,A), d(F,A)) = min(3.61, 3.20) = 3.20

The distance between cluster (D, F) and cluster B is

    d((D,F), B) = min(d(D,B), d(F,B)) = min(2.92, 2.50) = 2.50

Similarly, the distance between cluster (D, F) and cluster C is

    d((D,F), C) = min(d(D,C), d(F,C)) = min(2.24, 2.50) = 2.24

Finally, the distance between cluster E and cluster (D, F) is calculated as

    d(E, (D,F)) = min(d(E,D), d(E,F)) = min(1.00, 1.12) = 1.00

Then, the updated distance matrix becomes

            (D,F)    A      B      C      E
    (D,F)    0.00   3.20   2.50   2.24   1.00
    A        3.20   0.00   0.71   5.66   4.24
    B        2.50   0.71   0.00   4.95   3.54
    C        2.24   5.66   4.95   0.00   1.41
    E        1.00   4.24   3.54   1.41   0.00
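To show how such an update can be produced mechanically, here is a Python sketch of one agglomeration step (my own illustration; merge_closest is a hypothetical helper, and the matrix values are assumed from the tables above). It finds the closest pair of clusters and takes element-wise minima of the two merged rows, which is exactly the single-linkage update:

    import numpy as np

    def merge_closest(D, labels):
        """One single-linkage agglomeration step on distance matrix D."""
        n = D.shape[0]
        # Ignore the diagonal: we want the closest pair of distinct clusters.
        masked = D + np.where(np.eye(n, dtype=bool), np.inf, 0.0)
        i, j = np.unravel_index(np.argmin(masked), masked.shape)
        d_merge = D[i, j]
        # Single linkage: distances from the merged cluster to each other
        # cluster are the element-wise minima of the two merged rows.
        merged_row = np.minimum(D[i], D[j])
        keep = [k for k in range(n) if k not in (i, j)]
        m = len(keep) + 1
        new_D = np.zeros((m, m))
        new_D[0, 1:] = new_D[1:, 0] = merged_row[keep]
        new_D[1:, 1:] = D[np.ix_(keep, keep)]
        new_labels = [f"({labels[i]},{labels[j]})"] + [labels[k] for k in keep]
        return new_D, new_labels, d_merge

    # Reproduce the first merge (matrix values assumed from the input matrix).
    D = np.array([
        [0.00, 0.71, 5.66, 3.61, 4.24, 3.20],
        [0.71, 0.00, 4.95, 2.92, 3.54, 2.50],
        [5.66, 4.95, 0.00, 2.24, 1.41, 2.50],
        [3.61, 2.92, 2.24, 0.00, 1.00, 0.50],
        [4.24, 3.54, 1.41, 1.00, 0.00, 1.12],
        [3.20, 2.50, 2.50, 0.50, 1.12, 0.00],
    ])
    labels = ["A", "B", "C", "D", "E", "F"]
    D, labels, d = merge_closest(D, labels)
    print(labels)   # ['(D,F)', 'A', 'B', 'C', 'E'], merged at distance 0.50
    print(D)        # matches the updated 5-by-5 matrix above

Calling merge_closest repeatedly until a single cluster remains reproduces every step worked out below.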

Looking at the lower triangle of the updated distance matrix, we find that the closest distance is now 0.71, between cluster A and cluster B. Thus, we group cluster A and cluster B into a single cluster named (A, B).

Now we update the distance matrix. Aside from the first row and first column, which belong to the new cluster (A, B), all the other elements of the new distance matrix are unchanged.

            (A,B)  (D,F)    C      E
    (A,B)    0.00    ?      ?      ?
    (D,F)     ?     0.00   2.24   1.00
    C         ?     2.24   0.00   1.41
    E         ?     1.00   1.41   0.00

Using the input distance matrix (size 6 by 6), the distance between cluster C and cluster (D, F) is computed as

    d(C, (D,F)) = min(d(C,D), d(C,F)) = min(2.24, 2.50) = 2.24

The distance between cluster (D, F) and cluster (A, B) is the minimum distance over all pairs of objects from the two clusters:

    d((D,F), (A,B)) = min(d(D,A), d(D,B), d(F,A), d(F,B)) = min(3.61, 2.92, 3.20, 2.50) = 2.50
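In code this is just a minimum over the four cross-cluster pairs; a tiny sketch (my addition, with the pairwise distances assumed from the input matrix):

    # Single-linkage distance between (D,F) and (A,B): minimum over all
    # cross-cluster pairs (values assumed from the 6-by-6 input matrix).
    pairs = {("D", "A"): 3.61, ("D", "B"): 2.92, ("F", "A"): 3.20, ("F", "B"): 2.50}
    print(min(pairs.values()))   # 2.5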

Similarly, the distance between cluster E and cluster (A, B) is

    d(E, (A,B)) = min(d(E,A), d(E,B)) = min(4.24, 3.54) = 3.54

Then the updated distance matrix is

            (A,B)  (D,F)    C      E
    (A,B)    0.00   2.50   4.95   3.54
    (D,F)    2.50   0.00   2.24   1.00
    C        4.95   2.24   0.00   1.41
    E        3.54   1.00   1.41   0.00

Observing the lower triangle of the updated distance matrix, we can see that the closest distance is between cluster E and cluster (D, F), at distance 1.00. Thus, we group them together into cluster ((D, F), E).

The updated distance matrix is given below.

                ((D,F),E)  (A,B)    C
    ((D,F),E)     0.00       ?      ?
    (A,B)          ?        0.00   4.95
    C              ?        4.95   0.00

The distance between cluster ((D, F), E) and cluster (A, B) is calculated as

    d(((D,F),E), (A,B)) = min(d(D,A), d(D,B), d(F,A), d(F,B), d(E,A), d(E,B))
                        = min(3.61, 2.92, 3.20, 2.50, 4.24, 3.54) = 2.50

The distance between cluster ((D, F), E) and cluster C yields the minimum distance of 1.41, computed as

    d(((D,F),E), C) = min(d(D,C), d(F,C), d(E,C)) = min(2.24, 2.50, 1.41) = 1.41

After that, we merge cluster ((D, F), E) and cluster C into a new cluster named (((D, F), E), C).

The updated distance matrix is shown below.

                    (((D,F),E),C)  (A,B)
    (((D,F),E),C)       0.00       2.50
    (A,B)               2.50       0.00

The minimum distance of 2.50 is the result of the following computation:

    d((((D,F),E),C), (A,B)) = min(d(D,A), d(D,B), d(F,A), d(F,B), d(E,A), d(E,B), d(C,A), d(C,B))
                            = min(3.61, 2.92, 3.20, 2.50, 4.24, 3.54, 5.66, 4.95)
                            = 2.50

Now, if we merge the remaining two clusters, we get a single cluster that contains all 6 original objects, and the computation is finished. We summarize the results of the computation as follows (a short verification sketch in code follows the list):

  1. In the beginning we have 6 clusters: A, B, C, D, E and F
  2. We merge cluster D and F into cluster (D, F) at distance 0.50
  3. We merge cluster A and cluster B into (A, B) at distance 0.71
  4. We merge cluster E and (D, F) into ((D, F), E) at distance 1.00
  5. We merge cluster ((D, F), E) and C into (((D, F), E), C) at distance 1.41
  6. We merge cluster (((D, F), E), C) and (A, B) into ((((D, F), E), C), (A, B)) at distance 2.50
  7. The last cluster contains all the objects, which concludes the computation
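As a sanity check, this merge sequence can be reproduced with SciPy (a sketch I am adding here, not part of the original tutorial; the 6-by-6 matrix values are assumed from the input distance matrix shown earlier):

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import dendrogram, linkage
    from scipy.spatial.distance import squareform

    # Input distance matrix for objects A..F (values assumed from the
    # table at the start of this section).
    D = np.array([
        [0.00, 0.71, 5.66, 3.61, 4.24, 3.20],
        [0.71, 0.00, 4.95, 2.92, 3.54, 2.50],
        [5.66, 4.95, 0.00, 2.24, 1.41, 2.50],
        [3.61, 2.92, 2.24, 0.00, 1.00, 0.50],
        [4.24, 3.54, 1.41, 1.00, 0.00, 1.12],
        [3.20, 2.50, 2.50, 0.50, 1.12, 0.00],
    ])

    # linkage() expects the condensed (vector) form of the distance matrix.
    Z = linkage(squareform(D), method="single")

    # Each row of Z is (cluster_i, cluster_j, merge_distance, new_size);
    # the merge distances should read 0.50, 0.71, 1.00, 1.41, 2.50
    # (up to rounding).
    print(Z)

    # Draw the corresponding dendrogram.
    dendrogram(Z, labels=["A", "B", "C", "D", "E", "F"])
    plt.ylabel("merge distance")
    plt.show()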

Using this information, we can now draw the dendrogram of the final result. The dendrogram is drawn based on the distances at which the clusters were merged above.

[Figure: dendrogram of the single-linkage clustering, with merges at heights 0.50, 0.71, 1.00, 1.41 and 2.50]

The hierarchy is given as ((((D, F), E), C), (A, B)). We can also plot the clustering hierarchy in XY space.

[Figure: the six objects plotted in XY space, with nested boundaries indicating the cluster hierarchy]





This tutorial is copyrighted.

The preferred reference for this tutorial is:

Teknomo, Kardi. (2009) Hierarchical Clustering Tutorial. http://people.revoledu.com/kardi/tutorial/clustering/