By Kardi Teknomo, PhD.

## K Means Algorithm in Matlab

If you use Matlab, the Matlab Statistics Toolbox contains a function named kmeans. If you do not have the Statistics Toolbox, you may use my generic code below. The latest code of kMeanCluster and distMatrix can be downloaded here. The updated code works in N dimensions. Alternatively, you may use the old code below (limited to two dimensions). For more information about what k means clustering is, how the algorithm works, a numerical example of this code, applications to machine learning, and other resources on k means clustering, you may visit the Content of this tutorial.
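For readers who do have the toolbox, a call to the built-in kmeans might look like the sketch below. The four-object data set and the variable names idx and C are chosen here for illustration only. The group numbers returned by kmeans are arbitrary, so only the grouping itself is meaningful, not which group is called 1 or 2.

```matlab
% Requires the Statistics Toolbox (kmeans is not part of base Matlab).
m = [1 1; 2 1; 4 3; 5 4];      % objects in rows, attributes in columns
k = 2;                         % number of groups

[idx, C] = kmeans(m, k);       % idx: group label per object, C: k centroids

y = [m, idx];                  % data with the group label as an extra column
disp(y)
```

Appending idx as an extra column gives the same output layout as the generic kMeansCluster code, which makes the two easy to compare.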

```
function y=kMeansCluster(m,k,isRand)
%%%%%%%%%%%%%%%%
% kMeansCluster - Simple k means clustering algorithm
% Author: Kardi Teknomo, Ph.D.
%
% Purpose: classify the objects in data matrix based on the attributes
% Criteria: minimize Euclidean distance between centroids and object points
% For more explanation of the algorithm, see http://people.revoledu.com/kardi/tutorial/kMean/index.html
% Output: matrix data plus an additional column that represents the group of each object
%
% Example: m = [ 1 1; 2 1; 4 3; 5 4]  or in a nice form
%          m = [ 1 1;
%                2 1;
%                4 3;
%                5 4]
%          k = 2
% kMeansCluster(m,k) produces y = [ 1 1 1;
%                                   2 1 1;
%                                   4 3 2;
%                                   5 4 2]
% Input:
%   m      - required, matrix data: objects in rows and attributes in columns
%   k      - optional, number of groups (default = 1)
%   isRand - optional, isRand=1 for random initialization; with isRand=0 (default)
%            the first k data are assigned as initial centroids
%
% Local Variables
%   f      - row number of data that belong to group i
%   c      - centroid coordinate size (1:k, 1:maxCol)
%   g      - current iteration group matrix size (1:maxRow)
%   i      - scalar iterator
%   maxCol - scalar number of columns in the data matrix m = number of attributes
%   maxRow - scalar number of rows in the data matrix m = number of objects
%   temp   - previous iteration group matrix size (1:maxRow)
%   z      - minimum value (not needed)
%%%%%%%%%%%%%%%%

if nargin<3, isRand=0; end
if nargin<2, k=1; end

[maxRow, maxCol]=size(m);
if maxRow<=k
    % no more objects than groups: every object is its own group
    y=[m, (1:maxRow)'];
else
    % initial value of centroid
    if isRand
        p=randperm(maxRow);         % random initialization
        c=m(p(1:k),:);
    else
        c=m(1:k,:);                 % sequential initialization
    end

    temp=zeros(maxRow,1);           % initialize as zero vector

    while true
        d=DistMatrix(m,c);          % calculate objects-centroid distances
        [z,g]=min(d,[],2);          % find group matrix g
        if isequal(g,temp)
            break;                  % stop the iteration
        else
            temp=g;                 % copy group matrix to temporary variable
        end
        for i=1:k
            f=find(g==i);
            if ~isempty(f)          % only compute centroid if f is not empty
                c(i,:)=mean(m(f,:),1);
            end
        end
    end

    y=[m,g];
end
```
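To see what the loop does on the four-object example from the header comment, the script below traces it by hand. It is a sketch that uses only base Matlab functions, so it runs without DistMatrix; it compares squared distances, which select the same nearest centroid as the distances themselves. The counter iter and the variable delta are additions for this trace and do not appear in kMeansCluster.

```matlab
% Step-by-step trace of the k means loop on the four-object example.
m = [1 1; 2 1; 4 3; 5 4];          % objects in rows, attributes in columns
k = 2;
n = size(m, 1);
c = m(1:k, :);                     % sequential initialization: first k objects
temp = zeros(n, 1);                % group labels of the previous iteration
iter = 0;                          % number of label updates (trace only)

while true
    % squared Euclidean distance from every object to every centroid
    d = zeros(n, k);
    for i = 1:k
        delta = m - repmat(c(i, :), n, 1);
        d(:, i) = sum(delta.^2, 2);
    end
    [~, g] = min(d, [], 2);        % nearest centroid for each object
    if isequal(g, temp)
        break;                     % labels unchanged: converged
    end
    temp = g;
    iter = iter + 1;
    for i = 1:k
        f = find(g == i);
        if ~isempty(f)
            c(i, :) = mean(m(f, :), 1);   % move centroid to the group mean
        end
    end
    fprintf('iteration %d: g = [%s]\n', iter, num2str(g'));
end
```

With the first two objects as initial centroids, the labels change from 1, 2, 2, 2 in the first pass to 1, 1, 2, 2 in the second, and the third pass leaves them unchanged, so the loop stops. The final centroids are (1.5, 1) and (4.5, 3.5), which agrees with the grouping shown in the header comment.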

The Matlab function kMeansCluster above calls the function DistMatrix, shown in the code below. The code below works only for two dimensions. If you want to use it for multi-dimensional Euclidean distance, you may purchase the tutorial and the code here. Learn about other types of distance here.

```
function d=DistMatrix(A,B)
%%%%%%%%%%%%%%%%%%%%%%%%%
% DISTMATRIX returns the distance matrix between points A=[x1 y1] and B=[x2 y2]
% Author: Kardi Teknomo, Ph.D.
% see http://people.revoledu.com/kardi/
%
% The number of points in A and B is not necessarily the same.
% It can be used for distance-in-a-slice (Spacing) or distance-between-slice (Headway).
%
% A and B must contain two columns:
% first column is the X coordinates
% second column is the Y coordinates
% The distance matrix holds the distances between points in A as rows
% and points in B as columns.
% example: Spacing = DistMatrix(A,A)
%          Headway = DistMatrix(A,B), with hA ~= hB or hA=hB
%          A=[1 2; 3 4; 5 6]; B=[4 5; 6 2; 1 5; 5 8]
%          DistMatrix(A,B)= [ 4.24 5.00 3.00 7.21;
%                             1.41 3.61 2.24 4.47;
%                             1.41 4.12 4.12 2.00 ]
%%%%%%%%%%%%%%%%%%%%%%%%%%%
hA=size(A,1);                       % number of points in A
hB=size(B,1);                       % number of points in B
if hA==1 && hB==1
    d=sqrt(dot((A-B),(A-B)));       % single pair of points
else
    C=[ones(1,hB);zeros(1,hB)];     % selects the X column
    D=flipud(C);                    % selects the Y column
    E=[ones(1,hA);zeros(1,hA)];
    F=flipud(E);
    G=A*C;                          % X of A repeated across hB columns
    H=A*D;                          % Y of A repeated across hB columns
    I=B*E;                          % X of B repeated across hA columns
    J=B*F;                          % Y of B repeated across hA columns
    d=sqrt((G-I').^2+(H-J').^2);
end
```
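The two-column restriction comes from the selector matrices C to F, which pick out exactly one X and one Y column. As a rough sketch of how the same kind of distance matrix could be computed for any number of columns, the script below uses the identity |a-b|^2 = |a|^2 + |b|^2 - 2a.b. It is an illustration only, not the downloadable N-dimensional code; the name distND is hypothetical.

```matlab
% Hypothetical helper (not part of the original code): Euclidean distance
% matrix between the rows of A (hA x n) and the rows of B (hB x n).
% max(..., 0) guards against tiny negative values caused by round-off.
distND = @(A, B) sqrt(max(bsxfun(@plus, sum(A.^2, 2), sum(B.^2, 2)') ...
                          - 2 * (A * B'), 0));

% Same example as in the DistMatrix header comment
A = [1 2; 3 4; 5 6];
B = [4 5; 6 2; 1 5; 5 8];
d2 = distND(A, B);                 % 3 x 4 matrix: rows follow A, columns follow B

% Three-dimensional points work the same way
P = [0 0 0; 1 2 2];
Q = [1 2 2; 3 4 6];
d3 = distND(P, Q);                 % 2 x 2 matrix
```

For the two-column example, d2 reproduces the matrix listed in the DistMatrix header comment (4.24, 5.00, 3.00, 7.21 in the first row). Expanding the square keeps the computation to one matrix product, at the cost of some round-off when two points nearly coincide.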