K-Means Clustering Code in Matlab

By Kardi Teknomo, PhD.



Purchase the latest e-book with complete code of this k means clustering tutorial here

For those of you who like to use Matlab, the Matlab Statistics Toolbox contains a function named kmeans. If you do not have the toolbox, you may use my generic code below. The latest code of kMeanCluster and distMatrix can be downloaded here. The updated code can handle N dimensions. Alternatively, you may use the old code below (limited to two dimensions only). For more information about what k-means clustering is, how the algorithm works, a numerical example of this code, applications to machine learning, and other resources on k-means clustering, you may visit the Contents of this tutorial.
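If you do have the toolbox, a minimal call looks like the sketch below. The data is simply the small example used later in this tutorial, and the variable names m, k and idx are only illustrative:

m = [1 1; 2 1; 4 3; 5 4];    % 4 objects, 2 attributes each
k = 2;                       % number of groups
idx = kmeans(m, k);          % toolbox function: idx(i) is the group of object i
result = [m, idx]            % data plus group column, same layout as the generic code below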

function y=kMeansCluster(m,k,isRand)
%%%%%%%%%%%%%%%%
%
% kMeansCluster - Simple k means clustering algorithm
% Author: Kardi Teknomo, Ph.D.
%
% Purpose: classify the objects in data matrix based on the attributes
% Criteria: minimize Euclidean distance between centroids and object points
% For more explanation of the algorithm, see http://people.revoledu.com/kardi/tutorial/kMean/index.html
% Output: matrix data plus an additional column representing the group of each object
%
% Example: m = [ 1 1; 2 1; 4 3; 5 4] or in a nice form
%          m = [ 1 1;
%                2 1;
%                4 3;
%                5 4]
%          k = 2
%          kMeansCluster(m,k) produces m = [ 1 1 1;
%                                            2 1 1;
%                                            4 3 2;
%                                            5 4 2]
% Input:
% m      - required, matrix data: objects in rows and attributes in columns
% k      - optional, number of groups (default = 1)
% isRand - optional, use random initialization when isRand=1;
%          otherwise (default) the first k data points are used as initial centroids
%
% Local Variables
% f      - row numbers of data that belong to group i
% c      - centroid coordinates, size (1:k, 1:maxCol)
% g      - current iteration group matrix, size (1:maxRow)
% i      - scalar iterator
% maxCol - scalar, number of columns in the data matrix m = number of attributes
% maxRow - scalar, number of rows in the data matrix m = number of objects
% temp   - previous iteration group matrix, size (1:maxRow)
% z      - minimum value (not needed)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
if nargin<3, isRand=0; end
if nargin<2, k=1; end

[maxRow, maxCol] = size(m);
if maxRow<=k,
    y = [m, (1:maxRow)'];          % fewer objects than groups: each object is its own group
else

    % initial value of centroid
    if isRand,
        p = randperm(size(m,1));   % random initialization
        for i=1:k
            c(i,:) = m(p(i),:);
        end
    else
        for i=1:k
            c(i,:) = m(i,:);       % sequential initialization
        end
    end

    temp = zeros(maxRow,1);        % initialize as zero vector

    while 1,
        d = DistMatrix(m,c);       % calculate objects-centroid distances
        [z,g] = min(d,[],2);       % find group matrix g
        if g==temp,
            break;                 % stop the iteration when no object changes group
        else
            temp = g;              % copy group matrix to temporary variable
        end
        for i=1:k
            f = find(g==i);
            if f                   % only compute centroid if group i is not empty
                c(i,:) = mean(m(f,:),1);
            end
        end
    end

    y = [m,g];

end
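As a quick sanity check, the short sketch below (variable names are only illustrative) runs kMeansCluster on the example data from the header comment and plots the two groups:

m = [1 1; 2 1; 4 3; 5 4];
y = kMeansCluster(m, 2);                         % sequential initialization
% y = kMeansCluster(m, 2, 1);                    % or random initialization
scatter(y(:,1), y(:,2), 50, y(:,3), 'filled');   % color each object by its group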

The Matlab function kMeansCluster above calls the function DistMatrix, shown in the code below. The code below works only for two dimensions. If you want to use it for multi-dimensional Euclidean distance, you may purchase the tutorial and the code here. Learn about other types of distance here.

     

function d=DistMatrix(A,B)
%%%%%%%%%%%%%%%%%%%%%%%%%
% DISTMATRIX returns the distance matrix between points A=[x1 y1] and B=[x2 y2]
% Author: Kardi Teknomo, Ph.D.
% see http://people.revoledu.com/kardi/
%
% The numbers of points in A and B are not necessarily the same.
% It can be used for distance-in-a-slice (Spacing) or distance-between-slices (Headway).
%
% A and B must contain two columns:
% the first column is the X coordinates,
% the second column is the Y coordinates.
% The distance matrix contains the distances between points in A as rows
% and points in B as columns.
% example: Spacing = dist(A,A)
%          Headway = dist(A,B), with hA ~= hB or hA = hB
%          A=[1 2; 3 4; 5 6]; B=[4 5; 6 2; 1 5; 5 8]
%          dist(A,B) = [ 4.24 5.00 3.00 7.21;
%                        1.41 3.61 2.24 4.47;
%                        1.41 4.12 4.12 2.00 ]
%%%%%%%%%%%%%%%%%%%%%%%%%%%
[hA,wA]=size(A);
[hB,wB]=size(B);
if hA==1 && hB==1
    d=sqrt(dot((A-B),(A-B)));
else
    C=[ones(1,hB);zeros(1,hB)];   % selector matrices that replicate the X and Y
    D=flipud(C);                  % coordinates of A across hB columns
    E=[ones(1,hA);zeros(1,hA)];   % and the X and Y coordinates of B
    F=flipud(E);                  % across hA columns
    G=A*C;                        % X coordinates of A, repeated
    H=A*D;                        % Y coordinates of A, repeated
    I=B*E;                        % X coordinates of B, repeated
    J=B*F;                        % Y coordinates of B, repeated
    d=sqrt((G-I').^2+(H-J').^2);  % pairwise Euclidean distances
end
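To check the function against the example in its own header comment, you may run something like the following (output rounded to two decimals):

A = [1 2; 3 4; 5 6];
B = [4 5; 6 2; 1 5; 5 8];
d = DistMatrix(A, B)
% expected (rounded): [4.24 5.00 3.00 7.21; 1.41 3.61 2.24 4.47; 1.41 4.12 4.12 2.00]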

Purchase the latest e-book with complete code of this k means clustering tutorial here
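If you only need a plain N-dimensional Euclidean distance matrix, one possible generic sketch is the simple loop version below. It is written for clarity rather than speed, and it is only an illustration, not the updated code sold with the e-book:

function d=DistMatrixND(A,B)
% DistMatrixND - Euclidean distance matrix for points with any number of attributes
% A is hA-by-n, B is hB-by-n; d(i,j) is the distance from A(i,:) to B(j,:)
% (generic illustration only, not the author's updated DistMatrix)
[hA, n]=size(A);
hB=size(B,1);
d=zeros(hA,hB);
for i=1:hA
    for j=1:hB
        d(i,j)=sqrt(sum((A(i,:)-B(j,:)).^2));
    end
end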

For a more interactive example, you may use the K means program that I made using VB

Do you have a question regarding this k means tutorial? Ask your question here


 

 
 