By Kardi Teknomo, PhD.
K Means Code
As an example, I have written a program in Visual Basic. You may download the complete updated code plus this tutorial here.
The number of features is limited to two, but you may extend it to any number of features yourself. The main code is shown below. You may also see the screenshot of the program. How it works is explained on the previous page.
When the user clicks the picture box to input a new data point (X, Y), the program groups the data into clusters by minimizing the sum of squares of the Euclidean distances between each data point and its corresponding cluster centroid. Each dot represents an object, and its coordinates (X, Y) represent two attributes of that object. The color of each dot and its label number indicate the cluster. You may add more data to see how the clusters change.
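The main routine below calls a helper named dist whose body is not shown on this page. A minimal sketch of such a helper, assuming it returns the ordinary Euclidean distance between two points in the plane (the parameter names and the use of ByVal are assumptions, not the original code):

```vb
' Hypothetical helper: Euclidean distance between (X1, Y1) and (X2, Y2).
' ByVal is used so that Variant array elements can be passed without a type mismatch.
Function dist(ByVal X1 As Single, ByVal Y1 As Single, _
              ByVal X2 As Single, ByVal Y2 As Single) As Single
    dist = Sqr((X1 - X2) ^ 2 + (Y1 - Y2) ^ 2)
End Function
```

Because only the ordering of distances matters when picking the nearest centroid, the square root could be dropped without changing the assignments; it is kept here so the function returns a true distance.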
Sub kMeanCluster(Data() As Variant, numCluster As Integer)
    ' Main routine to cluster data into k clusters
    ' Input:
    '   + Data matrix (0 To 2, 1 To totalData);
    '     row 0 = cluster, 1 = X, 2 = Y; one data point per column
    '   + numCluster: number of clusters the user wants
    '   + private variables: Centroid, totalData
    ' Output:
    '   o) updated centroids
    '   o) cluster number assigned to each data point (= row 0 of Data)
    Dim i As Integer
    Dim j As Integer
    Dim X As Single
    Dim Y As Single
    Dim min As Single
    Dim cluster As Integer
    Dim d As Single
    Dim sumXY()
    Dim isStillMoving As Boolean
    isStillMoving = True
    If totalData <= numCluster Then
        ' Only the last data point is placed here because the program is designed to be interactive
        Data(0, totalData) = totalData ' cluster No = total data
        Centroid(1, totalData) = Data(1, totalData) ' X
        Centroid(2, totalData) = Data(2, totalData) ' Y
    Else
        ' Calculate the minimum distance to assign the new data point
        min = 10 ^ 10 ' big number
        X = Data(1, totalData)
        Y = Data(2, totalData)
        For i = 1 To numCluster
            d = dist(X, Y, Centroid(1, i), Centroid(2, i))
            If d < min Then
                min = d
                cluster = i
            End If
        Next i
        Data(0, totalData) = cluster
        Do While isStillMoving
            ' This loop will surely converge
            ' Calculate new centroids
            ' 1 = X, 2 = Y, 3 = count of data points
            ReDim sumXY(1 To 3, 1 To numCluster)
            For i = 1 To totalData
                sumXY(1, Data(0, i)) = Data(1, i) + sumXY(1, Data(0, i))
                sumXY(2, Data(0, i)) = Data(2, i) + sumXY(2, Data(0, i))
                sumXY(3, Data(0, i)) = 1 + sumXY(3, Data(0, i))
            Next i
            For i = 1 To numCluster
                ' Skip empty clusters to avoid division by zero;
                ' such a centroid keeps its previous position
                If sumXY(3, i) > 0 Then
                    Centroid(1, i) = sumXY(1, i) / sumXY(3, i)
                    Centroid(2, i) = sumXY(2, i) / sumXY(3, i)
                End If
            Next i
            ' Assign all data to the new centroids
            isStillMoving = False
            For i = 1 To totalData
                min = 10 ^ 10 ' big number
                X = Data(1, i)
                Y = Data(2, i)
                For j = 1 To numCluster
                    d = dist(X, Y, Centroid(1, j), Centroid(2, j))
                    If d < min Then
                        min = d
                        cluster = j
                    End If
                Next j
                If Data(0, i) <> cluster Then
                    Data(0, i) = cluster
                    isStillMoving = True
                End If
            Next i
        Loop
    End If
End Sub
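The quantity the routine tries to minimize, the sum of squared Euclidean distances between each data point and the centroid of its cluster, can be computed directly to watch how it changes as points are added. The function below is only a sketch and is not part of the original program: the name sumSquaredDistance is hypothetical, and it assumes Data and Centroid use the same layout as in kMeanCluster (row 0 of Data = cluster number, rows 1 and 2 = X and Y). Centroid and totalData are passed as parameters here, rather than read from the private variables, so that the function stands on its own.

```vb
' Hypothetical helper: sum of squared distances from every point to its own centroid.
' Centroid is declared As Variant so that an array of any element type can be passed.
Function sumSquaredDistance(Data() As Variant, Centroid As Variant, _
                            ByVal totalData As Integer) As Double
    Dim i As Integer
    Dim c As Integer
    Dim dx As Double
    Dim dy As Double
    Dim total As Double
    For i = 1 To totalData
        c = Data(0, i)                      ' cluster number assigned to point i
        dx = Data(1, i) - Centroid(1, c)    ' X offset from the centroid
        dy = Data(2, i) - Centroid(2, c)    ' Y offset from the centroid
        total = total + dx * dx + dy * dy
    Next i
    sumSquaredDistance = total
End Function
```

Calling it after each call to kMeanCluster, for example with Debug.Print sumSquaredDistance(Data, Centroid, totalData), gives one number per click that can be compared across different values of numCluster.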