The number of features is limited to two, but you can extend it to any number of
features yourself. The main code is shown below; you can also see a screenshot
of the program. How it works is explained on the previous page.
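The two-feature limit comes from the data layout used in the code (rows 1 and 2 of Data hold X and Y). As a hedged sketch of that extension, assume rows 1 to m hold m features instead. The Euclidean distance between data point j and centroid k then generalizes as follows. The symbols m, x_{f,j} and c_{f,k} are notation introduced here for illustration; they are not variables in the program.

$$
d(\mathbf{x}_j, \mathbf{c}_k) = \sqrt{\sum_{f=1}^{m} \left(x_{f,j} - c_{f,k}\right)^2}
$$

With m = 2, x_{1,j} = X and x_{2,j} = Y, this reduces to the ordinary two-feature Euclidean distance between a point (X, Y) and a centroid.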
When the user clicks the picture box to input a new data point (X, Y),
the program groups the data into clusters by minimizing the sum of
squared Euclidean distances between each data point and the centroid of its cluster.
Each dot represents an object, and its coordinates (X, Y) represent
two attributes of that object. The color and label number of a dot
indicate its cluster. You can add more data points to see how the clusters change.
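Written out, the criterion described above is the within-cluster sum of squares. In notation introduced here (not taken from the program), let K = numCluster, let C_k be the set of points in cluster k, and let (c_{X,k}, c_{Y,k}) be its centroid:

$$
J = \sum_{k=1}^{K} \sum_{(X,Y) \in C_k} \left[ (X - c_{X,k})^2 + (Y - c_{Y,k})^2 \right]
$$

A point is assigned to the nearest centroid, and each centroid is the mean of its cluster's points:

$$
k^{*} = \arg\min_{k} \left[ (X - c_{X,k})^2 + (Y - c_{Y,k})^2 \right], \qquad
c_{X,k} = \frac{1}{|C_k|} \sum_{(X,Y) \in C_k} X, \qquad
c_{Y,k} = \frac{1}{|C_k|} \sum_{(X,Y) \in C_k} Y
$$

Taking the square root does not change which centroid is nearest, and for a fixed assignment the mean is the point that minimizes each cluster's sum of squares. Alternating the two steps therefore never increases J, and because there are only finitely many ways to partition the data, the assignments eventually stop changing.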
Sub kMeanCluster(Data() As Variant, numCluster As Integer)
' Main routine: clusters the data into k (= numCluster) clusters
' Input:
' + Data matrix (0 To 2, 1 To TotalData);
'   row 0 = cluster, row 1 = X, row 2 = Y; one data point per column
' + numCluster: number of clusters the user wants the data grouped into
' + private variables: Centroid, TotalData
' Output:
' o) updated centroids
' o) cluster number assigned to each data point (= row 0 of Data)
Dim i As Integer
Dim j As Integer
Dim X As Single
Dim Y As Single
Dim min As Single
Dim cluster As Integer
Dim d As Single
Dim sumXY()
Dim isStillMoving As Boolean
isStillMoving = True
If totalData <= numCluster Then
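' Only the newest data point is handled here, because the program is designed to be interactive:
' each of the first numCluster points becomes the centroid of its own cluster
Data(0, totalData) = totalData ' cluster number = totalData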
Centroid(1, totalData) = Data(1, totalData) ' X
Centroid(2, totalData) = Data(2, totalData) ' Y
Else
' Find the nearest centroid and assign the new data point to it
min = 10 ^ 10 'big number
X = Data(1, totalData)
Y = Data(2, totalData)
For i = 1 To numCluster
d = dist(X, Y, Centroid(1, i), Centroid(2, i))
If d < min Then
min = d
cluster = i
End If
Next i
Data(0, totalData) = cluster
Do While isStillMoving
' this loop is guaranteed to converge