
K-Mean Clustering Code in Matlab

By Kardi Teknomo, PhD.


Purchase the latest e-book with complete code of this k means clustering tutorial here

For those who like to use Matlab, the Matlab Statistical Toolbox contains a function named kmeans. If you do not have the Statistical Toolbox, you may use my generic code below. The latest code of kMeanCluster and distMatrix can be downloaded here. The updated code extends to N dimensions. Alternatively, you may use the old code below (limited to two dimensions). For more information about what k means clustering is, how the algorithm works, a numerical example of this code, applications to machine learning, and other resources on k means clustering, you may visit the Contents of this tutorial.
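For readers who do have the toolbox, a minimal sketch of calling the built-in kmeans on the same four-point example used later in this page might look like the following. The variable names idx, C and y are illustrative choices, not part of the tutorial's code, and because kmeans picks its starting centroids randomly by default, the group numbers it returns may be swapped from one run to the next.

```matlab
% Sketch only: requires the Statistics Toolbox (in Octave, the statistics
% package must be loaded first). Variable names are illustrative.
m = [1 1; 2 1; 4 3; 5 4];   % objects in rows, attributes in columns
k = 2;                      % number of groups

[idx, C] = kmeans(m, k);    % idx: group index per object, C: centroids

y = [m, idx];               % same layout as the output of kMeansCluster:
                            % the data plus one extra column of group labels
disp(y);
disp(C);
```

The label values themselves carry no meaning; only which objects share a label matters.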

```
function y=kMeansCluster(m,k,isRand)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% kMeansCluster - Simple k means clustering algorithm
% Author: Kardi Teknomo, Ph.D.
%
% Purpose: classify the objects in data matrix based on the attributes
% Criteria: minimize Euclidean distance between centroids and object points
% For more explanation of the algorithm, see http://people.revoledu.com/kardi/tutorial/kMean/index.html
% Output: matrix data plus an additional column that represents the group of each object
%
% Example: m = [ 1 1; 2 1; 4 3; 5 4]  or in a nice form
%          m = [ 1 1;
%                2 1;
%                4 3;
%                5 4]
%          k = 2
% kMeansCluster(m,k) produces m = [ 1 1 1;
%                                   2 1 1;
%                                   4 3 2;
%                                   5 4 2]
% Input:
%   m      - required, matrix data: objects in rows and attributes in columns
%   k      - optional, number of groups (default = 1)
%   isRand - optional, isRand=1 for random initialization; isRand=0 (default)
%            assigns the first k data as initial centroids
%
% Local Variables
%   f      - row number of data that belong to group i
%   c      - centroid coordinate size (1:k, 1:maxCol)
%   g      - current iteration group matrix size (1:maxRow)
%   i      - scalar iterator
%   maxCol - scalar number of columns in the data matrix m = number of attributes
%   maxRow - scalar number of rows in the data matrix m = number of objects
%   temp   - previous iteration group matrix size (1:maxRow)
%   z      - minimum value (not needed)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

if nargin<3,        isRand=0;   end
if nargin<2,        k=1;        end

[maxRow, maxCol]=size(m);
if maxRow<=k,
    y=[m, (1:maxRow)'];     % no more objects than groups: one group per object
else
    % initial value of centroid
    c=zeros(k,maxCol);
    if isRand,
        p = randperm(size(m,1));      % random initialization
        for i=1:k
            c(i,:)=m(p(i),:);
        end
    else
        for i=1:k
            c(i,:)=m(i,:);        % sequential initialization
        end
    end

    temp=zeros(maxRow,1);   % initialize as zero vector

    while 1,
        d=DistMatrix(m,c);  % calculate objects-centroid distances
        [z,g]=min(d,[],2);  % find group matrix g
        if isequal(g,temp),
            break;          % stop the iteration
        else
            temp=g;         % copy group matrix to temporary variable
        end
        for i=1:k
            f=find(g==i);
            if ~isempty(f)  % only compute centroid if f is not empty
                c(i,:)=mean(m(f,:),1);
            end
        end
    end

    y=[m,g];
end
```
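To see how the loop in kMeansCluster converges, the following is a minimal self-contained trace on the example data from the header comment, using sequential initialization. It is a sketch for illustration, not a replacement for the function: the names gNew, members, diffs and iter are assumptions introduced here, and the distances are computed inline rather than through DistMatrix.

```matlab
% Trace of the k means iterations on the example data (illustrative sketch).
m = [1 1; 2 1; 4 3; 5 4];   % four objects, two attributes
k = 2;
c = m(1:k, :);              % sequential initialization: first k rows
g = zeros(size(m, 1), 1);   % previous assignment (no group yet)
iter = 0;

while true
    % distance from every object (rows) to every centroid (columns)
    d = zeros(size(m, 1), k);
    for j = 1:k
        diffs = m - repmat(c(j, :), size(m, 1), 1);
        d(:, j) = sqrt(sum(diffs.^2, 2));
    end

    [~, gNew] = min(d, [], 2);      % nearest centroid for each object
    if isequal(gNew, g)
        break;                      % assignments stopped changing
    end
    g = gNew;
    iter = iter + 1;

    % move each centroid to the mean of its current members
    for j = 1:k
        members = (g == j);
        if any(members)
            c(j, :) = mean(m(members, :), 1);
        end
    end
    fprintf('iteration %d: g = [%s]\n', iter, num2str(g'));
end
```

On this data the assignment changes twice, from [1 2 2 2] after the first pass to [1 1 2 2] after the second, and the centroids settle at (1.5, 1) and (4.5, 3.5), which matches the grouping shown in the header comment.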

The Matlab function kMeansCluster above calls the function DistMatrix, shown in the code below. This version of DistMatrix works only for two dimensions. If you want to use it for multi-dimensional Euclidean distance, you may purchase the tutorial and the code here. Learn about other types of distance here.


function d=DistMatrix(A,B)
%%%%%%%%%%%%%%%%%%%%%%%%%
% DISTMATRIX returns the distance matrix between points in A=[x1 y1] and B=[x2 y2]
% Author: Kardi Teknomo, Ph.D.
% see http://people.revoledu.com/kardi/
%
% The number of points in A and B need not be the same.
% It can be used for distance-in-a-slice (Spacing) or distance-between-slice (Headway).
%
% A and B must contain two columns:
% the first column is the X coordinates,
% the second column is the Y coordinates.
% The distance matrix holds the distances between points in A as rows
% and points in B as columns.
% example: Spacing = DistMatrix(A,A)
% Headway = DistMatrix(A,B), with hA ~= hB or hA=hB
% A=[1 2; 3 4; 5 6]; B=[4 5; 6 2; 1 5; 5 8]
% DistMatrix(A,B)= [ 4.24 5.00 3.00 7.21;
%                    1.41 3.61 2.24 4.47;
%                    1.41 4.12 4.12 2.00 ]
%%%%%%%%%%%%%%%%%%%%%%%%%%%
[hA,wA]=size(A);
[hB,wB]=size(B);
if hA==1 && hB==1
    % single point against single point
    d=sqrt(dot((A-B),(A-B)));
else
    % C and D pick out the X and Y columns of A, replicated hB times;
    % E and F do the same for B, replicated hA times
    C=[ones(1,hB);zeros(1,hB)];
    D=flipud(C);
    E=[ones(1,hA);zeros(1,hA)];
    F=flipud(E);
    G=A*C;      % X of A, size hA x hB
    H=A*D;      % Y of A, size hA x hB
    I=B*E;      % X of B, size hB x hA
    J=B*F;      % Y of B, size hB x hA
    d=sqrt((G-I').^2+(H-J').^2);
end
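Since the function above is limited to two columns, readers may wonder what a distance matrix for any number of attributes could look like. The following is one possible sketch under stated assumptions: it is not the author's updated N-dimensional code, and the name distND is hypothetical. It reshapes the two point sets so that bsxfun can subtract every point in B from every point in A along a third dimension, then sums the squared differences over that dimension.

```matlab
% Sketch of an N-dimensional Euclidean distance matrix (hypothetical name
% distND; an illustration, not the tutorial's updated code).
% A is hA x n, B is hB x n; the result is hA x hB.
distND = @(A, B) sqrt(sum(bsxfun(@minus, ...
                     permute(A, [1 3 2]), ...   % hA x 1  x n
                     permute(B, [3 1 2])) ...   % 1  x hB x n
                     .^2, 3));                  % sum over the n attributes

% Two-dimensional check against the example in the DistMatrix comments
A = [1 2; 3 4; 5 6];
B = [4 5; 6 2; 1 5; 5 8];
d2 = distND(A, B);
disp(d2);

% Three-dimensional example
P = [0 0 0; 1 2 2];
Q = [0 0 0; 3 6 6];
d3 = distND(P, Q);
disp(d3);       % [0 9; 3 6]
```

For the two-dimensional input the result agrees with the table in the DistMatrix header, for example 5.00 between A's first point and B's second point. The intermediate array has hA x hB x n elements, so for very large data sets a loop over the attributes may be the more memory-friendly choice.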
