By Kardi Teknomo, PhD.

K Means Code

As an example, I have written a k-means program in Visual Basic. You may download the complete, updated code together with this tutorial here.

The number of features is limited to two, but you may extend the code to any number of features yourself. The main code is shown below. You may also see the screenshot of the program. How it works is explained on the previous page.

When the user clicks the picture box to input a new data point (X, Y), the program groups the data into clusters by minimizing the sum of squared Euclidean distances between each data point and its cluster centroid. Each dot represents an object, and its coordinates (X, Y) represent two attributes of that object. The color of the dot and its label number indicate the cluster. Try adding more data points to see how the clusters change.
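The quantity being minimized can be written out explicitly. The notation is introduced here for illustration only and does not appear in the program: K is the number of clusters, S_k is the set of data points currently assigned to cluster k, and (x̄_k, ȳ_k) is the centroid of cluster k, that is, the mean of the points in S_k.

```latex
J = \sum_{k=1}^{K} \sum_{(x_i, y_i) \in S_k}
    \left[ (x_i - \bar{x}_k)^2 + (y_i - \bar{y}_k)^2 \right],
\qquad
\bar{x}_k = \frac{1}{|S_k|} \sum_{(x_i, y_i) \in S_k} x_i,
\quad
\bar{y}_k = \frac{1}{|S_k|} \sum_{(x_i, y_i) \in S_k} y_i
```

Each pass of the program alternates two steps: recomputing the centroids and reassigning every point to its nearest centroid. Neither step can increase J.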

Sub kMeanCluster(Data() As Variant, numCluster As Integer)
    ' Main routine to cluster data into k clusters
    ' Input:
    '   + Data matrix (0 To 2, 1 To totalData);
    '     row 0 = cluster, 1 = X, 2 = Y; data in columns
    '   + numCluster: number of clusters the user wants
    '   + private variables: Centroid, totalData
    ' Output:
    '   o) updated centroids
    '   o) cluster number assigned to each data point (= row 0 of Data)

    Dim i As Integer
    Dim j As Integer
    Dim X As Single
    Dim Y As Single
    Dim min As Single
    Dim cluster As Integer
    Dim d As Single
    Dim sumXY()
    Dim isStillMoving As Boolean

    isStillMoving = True
    If totalData <= numCluster Then
        ' Only the last data point is handled here because the program is
        ' designed to be interactive: each early point starts its own cluster
        Data(0, totalData) = totalData ' cluster number = total number of data
        Centroid(1, totalData) = Data(1, totalData) ' X
        Centroid(2, totalData) = Data(2, totalData) ' Y
    Else
        ' Calculate the minimum distance to assign the new data point
        min = 10 ^ 10 ' big number
        X = Data(1, totalData)
        Y = Data(2, totalData)
        For i = 1 To numCluster
            d = dist(X, Y, Centroid(1, i), Centroid(2, i))
            If d < min Then
                min = d
                cluster = i
            End If
        Next i
        Data(0, totalData) = cluster

        Do While isStillMoving
            ' This loop will surely converge:
            ' it stops once no data point changes cluster

            ' Calculate new centroids
            ' Rows of sumXY: 1 = sum of X, 2 = sum of Y, 3 = number of data points
            ReDim sumXY(1 To 3, 1 To numCluster)
            For i = 1 To totalData
                sumXY(1, Data(0, i)) = Data(1, i) + sumXY(1, Data(0, i))
                sumXY(2, Data(0, i)) = Data(2, i) + sumXY(2, Data(0, i))
                sumXY(3, Data(0, i)) = 1 + sumXY(3, Data(0, i))
            Next i
            For i = 1 To numCluster
                ' Keep the old centroid of an empty cluster (avoids division by zero)
                If sumXY(3, i) > 0 Then
                    Centroid(1, i) = sumXY(1, i) / sumXY(3, i)
                    Centroid(2, i) = sumXY(2, i) / sumXY(3, i)
                End If
            Next i

            ' Assign all data to the new centroids
            isStillMoving = False
            For i = 1 To totalData
                min = 10 ^ 10 ' big number
                X = Data(1, i)
                Y = Data(2, i)
                For j = 1 To numCluster
                    d = dist(X, Y, Centroid(1, j), Centroid(2, j))
                    If d < min Then
                        min = d
                        cluster = j
                    End If
                Next j
                If Data(0, i) <> cluster Then
                    Data(0, i) = cluster
                    isStillMoving = True
                End If
            Next i
        Loop
    End If
End Sub
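The listing calls a helper named dist that is not shown on this page. The following is only a minimal sketch of what such a helper might look like, assuming it returns the plain Euclidean distance between two points. The parameter names and the ByVal passing are my assumptions, not the downloadable code.

```vb
' Hypothetical helper: the original dist routine is not shown in this tutorial.
' Returns the Euclidean distance between the points (X1, Y1) and (X2, Y2).
' Arguments are passed ByVal so that array elements of any numeric type
' (for example Centroid(1, i)) can be supplied directly.
Function dist(ByVal X1 As Single, ByVal Y1 As Single, _
              ByVal X2 As Single, ByVal Y2 As Single) As Single
    dist = Sqr((X1 - X2) ^ 2 + (Y1 - Y2) ^ 2)
End Function
```

For example, dist(0, 0, 3, 4) evaluates to 5. Because the square root preserves order, the centroid with the smallest distance is also the one with the smallest squared distance, so assigning each point this way is consistent with minimizing the sum of squared distances.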