Similarity

Distance for Binary Variables



We often encounter variables that take only binary values, such as Yes and No, Agree and Disagree, True and False, Success and Failure, 0 and 1, Absent and Present, or Positive and Negative. Such binary variables have only two possible values, which can be represented as positive and negative. The similarity or dissimilarity (distance) of two objects represented by binary variables can be measured in terms of the number of occurrences (frequency) of positive and negative values in each object.

For example:

| Feature of Fruit | Sphere shape | Sweet | Sour | Crunchy |
|---|---|---|---|---|
| Object $i$ = Apple | Yes | Yes | Yes | Yes |
| Object $j$ = Banana | No | Yes | No | No |

The coordinates of Apple are (1,1,1,1) and the coordinates of Banana are (0,1,0,0). Because each object is represented by 4 variables, we say that these objects have 4 dimensions.

Let

$p$ = number of variables that are positive for both objects

$q$ = number of variables that are positive for the $i$-th object and negative for the $j$-th object

$r$ = number of variables that are negative for the $i$-th object and positive for the $j$-th object

$s$ = number of variables that are negative for both objects

$t = p + q + r + s$ = total number of variables

| | Object $j$: Yes | Object $j$: No |
|---|---|---|
| Object $i$: Yes | $p$ | $q$ |
| Object $i$: No | $r$ | $s$ |

For our example above, Apple and Banana have $p = 1$ (positive for both), $q = 3$ (positive for Apple only), $r = 0$ (positive for Banana only), and $s = 0$ (negative for both). Thus, the total number of variables is $t = p + q + r + s = 4$.
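The counting step can be sketched in a few lines of Python. This is only an illustration, not part of the original tutorial: the function name `binary_counts` is hypothetical, and it assumes each object is encoded as a sequence of 0s and 1s (1 for Yes, 0 for No) of equal length.

```python
def binary_counts(x, y):
    """Count the four match/mismatch cases for two 0/1 sequences.

    Returns (p, q, r, s, t), where
      p = positive in both, q = positive in x only,
      r = positive in y only, s = negative in both, t = total.
    """
    if len(x) != len(y):
        raise ValueError("both objects must have the same number of variables")
    p = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    q = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    r = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    s = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    t = p + q + r + s
    return p, q, r, s, t


# Fruit example: (Sphere shape, Sweet, Sour, Crunchy)
apple = (1, 1, 1, 1)
banana = (0, 1, 0, 0)
print(binary_counts(apple, banana))  # (1, 3, 0, 0, 4)
```

Only the Sweet feature is positive for both fruits, and the other three features are positive for Apple alone, which is why the first two counts are 1 and 3.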

The most commonly used binary dissimilarity (distance) measures are

  1. Simple matching distance
  2. Jaccard's distance
  3. Hamming distance
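As a rough illustration, the three distances can be computed for the Apple and Banana example as follows. The formulas used here are the commonly cited definitions and are an assumption on our part, since they are not spelled out in this section: Hamming distance is the number of mismatching variables, simple matching distance is the number of mismatches divided by the total number of variables, and Jaccard's distance is the number of mismatches divided by the number of variables that are positive in at least one object. The function name `binary_distances` is hypothetical.

```python
def binary_distances(x, y):
    """Return Hamming, simple matching, and Jaccard distances for 0/1 sequences."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("objects must be non-empty and of equal length")
    both_positive = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    mismatches = sum(1 for a, b in zip(x, y) if a != b)
    total = len(x)
    any_positive = both_positive + mismatches  # positive in at least one object

    hamming = mismatches
    simple_matching = mismatches / total
    # Assumption: when neither object has any positive value,
    # the Jaccard distance is taken to be 0 by convention.
    jaccard = mismatches / any_positive if any_positive > 0 else 0.0
    return {"hamming": hamming,
            "simple_matching": simple_matching,
            "jaccard": jaccard}


apple = (1, 1, 1, 1)
banana = (0, 1, 0, 0)
print(binary_distances(apple, banana))
# {'hamming': 3, 'simple_matching': 0.75, 'jaccard': 0.75}
```

The simple matching and Jaccard distances coincide here only because no feature is negative for both fruits. Jaccard's distance ignores joint absences, so the two measures differ as soon as such features exist.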

B.S. Everitt (1978) listed 10 other similarity measures that have been proposed for presence-absence data:

  1. Simple matching coefficient: $(p + s)/t$
  2. [formula missing]
  3. Jaccard's coefficient: $p/(p + q + r)$
  4. [formula missing]
  5. [formula missing]
  6. [formula missing]
  7. [formula missing]
  8. [formula missing]
  9. [formula missing]
  10. [formula missing]
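The two named coefficients in this list can be evaluated directly from the four counts. The sketch below assumes their usual definitions: the simple matching coefficient is the share of variables on which the two objects agree, and Jaccard's coefficient is the share of jointly positive variables among those positive in at least one object. The function names and the handling of the empty case are illustrative choices, not taken from the tutorial.

```python
def simple_matching_coefficient(p, q, r, s):
    """Fraction of variables on which both objects agree: (p + s) / t."""
    t = p + q + r + s
    if t == 0:
        raise ValueError("at least one variable is required")
    return (p + s) / t


def jaccard_coefficient(p, q, r):
    """Joint positives over variables positive in at least one object."""
    denom = p + q + r
    # Assumption: two objects with no positive values at all are
    # treated as identical, so the coefficient is set to 1.
    return p / denom if denom > 0 else 1.0


# Counts for Apple vs. Banana: 1 joint positive, 3 positive for Apple only
p, q, r, s = 1, 3, 0, 0
print(simple_matching_coefficient(p, q, r, s))  # 0.25
print(jaccard_coefficient(p, q, r))             # 0.25
```

Under these definitions both values are similarities in the range 0 to 1, so a low value such as 0.25 indicates that Apple and Banana share few features.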

This tutorial is copyrighted.

The preferred reference for this tutorial is

Teknomo, Kardi (2015) Similarity Measurement. http://people.revoledu.com/kardi/tutorial/Similarity/