|
By Kardi Teknomo, PhD.
<Contents | Previous | Next>
For you who like to use Matlab, Matlab Statistical Toolbox contain a function name kmeans. If you do not have the statistical toolbox, you may use my generic code below. The kMeanCluster and distMatrix can be downloaded as text files. Alternatively, you may simply copy and paste the code below. For more information about what is k means clustering, how the algorithm works, and numerical example of this code, or application to machine learning and other resources in k means clustering, your may visit the Content of this tutorial.
function y=kMeansCluster(m,k,isRand) %%%%%%%%%%%%%%%% % % kMeansCluster - Simple k means clustering algorithm % Author: Kardi Teknomo, Ph.D. % % Purpose: classify the objects in data matrix based on the attributes % Criteria: minimize Euclidean distance between centroids and object points % For more explanation of the algorithm, see http://people.revoledu.com/kardi/tutorial/kMean/index.html
% Output: matrix data plus an additional column represent the group of each object % % Example: m = [ 1 1; 2 1; 4 3; 5 4] or in a nice form % m = [ 1 1; % 2 1; % 4 3; % 5 4] % k = 2 % kMeansCluster(m,k) produces m = [ 1 1 1; % 2 1 1; % 4 3 2; % 5 4 2] % Input: % m - required, matrix data: objects in rows and attributes in columns % k - optional, number of groups (default = 1) % isRand - optional, if using random initialization isRand=1, otherwise input any number (default) % it will assign the first k data as initial centroids % % Local Variables % f - row number of data that belong to group i % c - centroid coordinate size (1:k, 1:maxCol) % g - current iteration group matrix size (1:maxRow) % i - scalar iterator % maxCol - scalar number of rows in the data matrix m = number of attributes % maxRow - scalar number of columns in the data matrix m = number of objects % temp - previous iteration group matrix size (1:maxRow) % z - minimum value (not needed) %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
if nargin<3, isRand=0; end if nargin<2, k=1; end [maxRow, maxCol]=size(m) if maxRow<=k, y=[m, 1:maxRow] else % initial value of centroid if isRand, p = randperm(size(m,1)); % random initialization for i=1:k c(i,:)=m(p(i),:) end else for i=1:k c(i,:)=m(i,:) % sequential initialization end end temp=zeros(maxRow,1); % initialize as zero vector while 1, d=DistMatrix(m,c); % calculate objcets-centroid distances [z,g]=min(d,[],2); % find group matrix g if g==temp, break; % stop the iteration else temp=g; % copy group matrix to temporary variable end for i=1:k f=find(g==i); if f % only compute centroid if f is not empty c(i,:)=mean(m(find(g==i),:),1); end end end y=[m,g]; end
The Matlab function kMeansCluster above call function DistMatrix as shown in the code below. It works for multi-dimensional Euclidean distance. Learn about other type of distance here.
function d=DistMatrix(A,B)
%%%%%%%%%%%%%%%%%%%%%%%%%
% DISTMATRIX return distance matrix between points in A=[x1 y1 ... w1] and in B=[x2 y2 ... w2]
% Copyright (c) 2005 by Kardi Teknomo, http://people.revoledu.com/kardi/
%
% Numbers of rows (represent points) in A and B are not necessarily the same.
% It can be use for distance-in-a-slice (Spacing) or distance-between-slice (Headway),
%
% A and B must contain the same number of columns (represent variables of n dimensions),
% first column is the X coordinates, second column is the Y coordinates, and so on.
% The distance matrix is distance between points in A as rows
% and points in B as columns.
% example: Spacing= dist(A,A)
% Headway = dist(A,B), with hA ~= hB or hA=hB
% A=[1 2 3; 4 5 6; 2 4 6; 1 2 3]; B=[4 5 1; 6 2 0] % dist(A,B)= [ 4.69 5.83; % 5.00 7.00; % 5.48 7.48; % 4.69 5.83] % % dist(B,A)= [ 4.69 5.00 5.48 4.69; % 5.83 7.00 7.48 5.83] %%%%%%%%%%%%%%%%%%%%%%%%%%%
[hA,wA]=size(A); [hB,wB]=size(B); if wA ~= wB, error(' second dimension of A and B must be the same'); end for k=1:wA C{k}= repmat(A(:,k),1,hB); D{k}= repmat(B(:,k),1,hA); end S=zeros(hA,hB); for k=1:wA S=S+(C{k}-D{k}').^2; end d=sqrt(S);
For more interactive example, you may use the K means program that I make using VB
<Contents | Previous
| Next>
|