K-Means Clustering Code Example

By Kardi Teknomo, PhD.


As an example, I have written a program in Visual Basic. You may download the complete updated code together with this tutorial here; the old code is available at the mirror.

The number of features is limited to two, but you may extend the code to any number of features yourself. The main code is shown below. You may also see a screenshot of the program. How it works is explained on the previous page.

When the user clicks the picture box to input a new data point (X, Y), the program groups the data into clusters by minimizing the sum of squared Euclidean distances between each data point and its corresponding cluster centroid. Each dot represents an object, and its coordinates (X, Y) represent two attributes of the object. The color of the dot and its label number indicate the cluster. You may add more data to see how the clusters change.
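Written out as a formula, the quantity being minimized can be sketched as follows. The symbols are introduced here for illustration and do not appear in the code: k is the number of clusters (numCluster), C_j is the set of objects currently assigned to cluster j, (x_i, y_i) are the coordinates of object i, and (c_{x,j}, c_{y,j}) is the centroid of cluster j.

```latex
J = \sum_{j=1}^{k} \sum_{i \in C_j}
    \left[ (x_i - c_{x,j})^2 + (y_i - c_{y,j})^2 \right]
```

Each pass of the loop in the code first recomputes every centroid as the mean of its members and then reassigns every object to its nearest centroid; neither step can increase J.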

Sub kMeanCluster(Data() As Variant, numCluster As Integer)
    ' Main routine: cluster the data into k (= numCluster) clusters
    ' Input:
    '   + Data matrix (0 To 2, 1 To TotalData);
    '     row 0 = cluster, 1 = X, 2 = Y; data in columns
    '   + numCluster: number of clusters the user wants the data grouped into
    '   + private variables: Centroid, TotalData
    ' Output:
    '   o) updated centroids
    '   o) cluster number assigned to the data (= row 0 of Data)

    Dim i As Integer
    Dim j As Integer
    Dim X As Single
    Dim Y As Single
    Dim min As Single
    Dim cluster As Integer
    Dim d As Single
    Dim sumXY()
    Dim isStillMoving As Boolean

    isStillMoving = True
    If totalData <= numCluster Then
        ' Only the last data point is handled here because the program is
        ' designed to be interactive: each of the first numCluster points
        ' becomes the centroid of its own cluster
        Data(0, totalData) = totalData ' cluster No = total data
        Centroid(1, totalData) = Data(1, totalData) ' X
        Centroid(2, totalData) = Data(2, totalData) ' Y
    Else
        ' Find the nearest centroid to assign the new data point
        min = 10 ^ 10 ' big number
        X = Data(1, totalData)
        Y = Data(2, totalData)
        For i = 1 To numCluster
            d = dist(X, Y, Centroid(1, i), Centroid(2, i))
            If d < min Then
                min = d
                cluster = i
            End If
        Next i
        Data(0, totalData) = cluster

        Do While isStillMoving
            ' This loop is guaranteed to converge

            ' Calculate new centroids
            ' 1 = X, 2 = Y, 3 = count of data points
            ReDim sumXY(1 To 3, 1 To numCluster)
            For i = 1 To totalData
                sumXY(1, Data(0, i)) = Data(1, i) + sumXY(1, Data(0, i))
                sumXY(2, Data(0, i)) = Data(2, i) + sumXY(2, Data(0, i))
                sumXY(3, Data(0, i)) = 1 + sumXY(3, Data(0, i))
            Next i

            For i = 1 To numCluster
                ' Keep the old centroid if a cluster is empty,
                ' to avoid a division by zero
                If sumXY(3, i) > 0 Then
                    Centroid(1, i) = sumXY(1, i) / sumXY(3, i)
                    Centroid(2, i) = sumXY(2, i) / sumXY(3, i)
                End If
            Next i

            ' Assign all data to the new centroids
            isStillMoving = False
            For i = 1 To totalData
                min = 10 ^ 10 ' big number
                X = Data(1, i)
                Y = Data(2, i)
                For j = 1 To numCluster
                    d = dist(X, Y, Centroid(1, j), Centroid(2, j))
                    If d < min Then
                        min = d
                        cluster = j
                    End If
                Next j

                ' Any change of membership means another pass is needed
                If Data(0, i) <> cluster Then
                    Data(0, i) = cluster
                    isStillMoving = True
                End If
            Next i
        Loop
    End If
End Sub
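The listing calls a dist function and relies on the private variables Centroid and TotalData, none of which are shown on this page. The following is a minimal sketch of what that supporting code might look like, not the code from the download. It assumes that dist is the plain Euclidean distance, that everything lives in one form module, and that the picture box is named Picture1; the constant NUM_CLUSTER, the Form_Load body and the click handler are hypothetical.

```vb
' Sketch only: apart from the names dist, Data, Centroid, totalData and
' kMeanCluster, everything here is assumed for illustration.

Private Data() As Variant        ' (0 To 2, 1 To totalData)
Private Centroid() As Variant    ' (1 To 2, 1 To NUM_CLUSTER)
Private totalData As Integer
Private Const NUM_CLUSTER As Integer = 3   ' hypothetical number of clusters

' Euclidean distance between (X1, Y1) and (X2, Y2).
' ByVal lets Variant array elements such as Centroid(1, i) be passed in.
Function dist(ByVal X1 As Single, ByVal Y1 As Single, _
              ByVal X2 As Single, ByVal Y2 As Single) As Single
    dist = Sqr((X1 - X2) ^ 2 + (Y1 - Y2) ^ 2)
End Function

Private Sub Form_Load()
    ' Room for one centroid per cluster
    ReDim Centroid(1 To 2, 1 To NUM_CLUSTER)
End Sub

' Hypothetical click handler for a PictureBox named Picture1
Private Sub Picture1_MouseDown(Button As Integer, Shift As Integer, X As Single, Y As Single)
    totalData = totalData + 1
    ' ReDim Preserve can only resize the last dimension,
    ' which fits the one-column-per-point layout of Data
    ReDim Preserve Data(0 To 2, 1 To totalData)
    Data(1, totalData) = X
    Data(2, totalData) = Y
    ' Cluster after every click; row 0 of Data then holds the cluster numbers
    kMeanCluster Data, NUM_CLUSTER
End Sub
```

With this sketch, the first NUM_CLUSTER clicks each start their own cluster, and every later click triggers a full reassignment pass. Drawing the dots and labels is left out.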

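The three matrix variables used in the code are laid out as follows:

Data (0 To 2, 1 To TotalData): row 0 = cluster number, row 1 = X, row 2 = Y; one column per data point
Centroid (1 To 2, 1 To numCluster): row 1 = X, row 2 = Y; one column per cluster
sumXY (1 To 3, 1 To numCluster): row 1 = sum of X, row 2 = sum of Y, row 3 = number of data points; one column per cluster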

To learn about other types of distance, click here.



 

 
 
© 2006 Kardi Teknomo. All Rights Reserved.