Hamming Distance for Binary Variables
Finite binary 0 and 1 sequence is sometimes called a word in coding theory. If two words have the same length, we can count the number of digits in positions where they have different digit. The length of different digits is called Hamming distance. If = number of variables with value 1 for the th objects and 0 for the th object and = number of variables with value 0 for the th objects and 1 for the th object, we have
Below is an interactive program to compute hamming distance. Try to experiment with your own input values.
The coordinate of Apple is (1,1,1,1) and coordinate of Banana is (0,1,0,0). Because each object is represented by 4 variables, we say that these objects has 4 dimensions.
Since there are 3 digits different ( and ) , we say that the hamming distance is 3.
(1, 1, 1, 0, 0) and
(0, 1, 0, 0, 0) has Hamming distance of 2 because the first and third digit are different.
(1, 0, 0, 1, 1, 0) and
(0, 1, 0, 1, 0, 1) has Hamming distance of 4 because the first, second, fifth and sith digits are different)
(0, 1, 0) and (0, 1, 0) has zero hamming distance because no digit different between the two words.
Hamming distance is somewhat related to Simple matching distance. Hamming distance divided by the length of word (= total number of variables) will give simple matching distance.
In principle, Hamming distance measure the number of disagreement between two vectors. Hamming distance can also be used for ordinal variables to measure disorder of disorder-vector from a pattern-vector.
The algorithm to compute Hamming distance for ordinal variable involve operation “Putting back” unmatched digit into the correct place, after remove them in the first step:
Hamming distance is the count of operation “Putting back”. This algorithm also works for binary variable. Of course, the Hamming distance is equal to the number of unmatched digit.
Suppose we have two judges (A and B) who give rank of importance over 6 products. The ranking vector is given as follow A=[1, 2, 3, 4, 5, 6] and B = [2, 5, 3, 1, 4, 6]. We want to measure Hamming distance between A and B
We set rank vector A as pattern-vector and vector B as disorder-vector. Our goal is to count the number of steps of operation “Putting back” of digits to make disorder-vector into pattern-vector. Diagram below show the steps. Since we count four number of interchange, thus the Hamming distance between A and B is 4.
Preferable reference for this tutorial is
Teknomo, Kardi. Similarity Measurement. http:\\people.revoledu.com\kardi\ tutorial\Similarity\
© 2006 Kardi Teknomo. All Rights Reserved.
Designed by CNV Media