Similarity Measurement

In this tutorial, you will learn the basics needed to extend your analysis to multivariate data (variables on different measurement scales, such as nominal, ordinal, and quantitative) and to go beyond two-dimensional data up to N dimensions. A comprehensive example is given in the last part of this tutorial. An MS Excel companion file for this tutorial is also available for download.

Knowledge of similarity and dissimilarity is necessary in data mining, pattern recognition, machine intelligence, artificial intelligence, and multi-agent systems. Its application is not limited to computer science, however: the natural and social sciences, engineering, and statistics also apply it. Tools such as K-means clustering, discriminant analysis, K-nearest neighbors, decision trees, and hierarchical clustering rely heavily on the distance matrix explained in this tutorial.

What is similarity?

What is distance?
What is the relationship between similarity and dissimilarity?

Why do we need to measure similarity? (Applications)
How do we measure similarity or dissimilarity?
How do we compute dissimilarity or similarity for binary variables?

Simple Matching Coefficient
Jaccard's Coefficient
Hamming Distance

How do we compute dissimilarity or similarity for nominal / categorical variables?

Assign each category value to a binary dummy variable
Assign each category value to several binary dummy variables

How do we compute dissimilarity or similarity for ordinal variables?

Normalized Rank Transformation
Spearman Distance
Footrule Distance
Kendall Distance
Cayley Distance
Hamming Distance for Ordinal Variable
Ulam Distance

How do we compute dissimilarity or similarity for text and string variables?
How do we compute dissimilarity or similarity for quantitative variables?

Euclidean Distance
City Block (Manhattan) Distance
Chebyshev Distance
Minkowski Distance
Canberra Distance
Bray Curtis (Sorensen) Distance
Angular Separation
Correlation Coefficient

How do we compute dissimilarity between two groups (Mahalanobis distance)?
How do we normalize the similarity or dissimilarity?
How do we aggregate variables of mixed types?

Comprehensive example: Distance matrix of Multivariate data

Resources

This tutorial is copyrighted.

The preferred reference for this tutorial is:

Teknomo, Kardi (2015) Similarity Measurement. http://people.revoledu.com/kardi/tutorial/Similarity/