Similarity Measurement
In this tutorial, you will learn the basics of measuring similarity for multivariate data, that is, data that mix different measurement scales (nominal, ordinal, and quantitative), and how to go beyond two-dimensional data up to N dimensions. A comprehensive example is given in the last part of this tutorial. An MS Excel companion file for this tutorial is also available for download.
Knowledge of similarity and dissimilarity is necessary in data mining, pattern recognition, machine intelligence, artificial intelligence, and multi-agent systems. Its application is not limited to computer science, however: other fields of natural and social science, as well as engineering and statistics, have applied these measures too. Tools such as K-means clustering, discriminant analysis, K-nearest neighbors, decision trees, and hierarchical clustering rely heavily on the distance matrix explained in this tutorial.
What is similarity?
What is distance?
Why do we need to measure similarity? (Applications)
What is the relationship between similarity and dissimilarity?
How do we measure similarity or dissimilarity?
How do we compute dissimilarity or similarity for binary variables?
    Simple Matching Coefficient
    Jaccard's Coefficient
    Hamming Distance
How do we compute dissimilarity or similarity for nominal / categorical variables?
    Assign each value of category as a binary dummy variable
    Assign each value of category into several binary dummy variables
How do we compute dissimilarity or similarity for ordinal variables?
    Normalized Rank Transformation
    Spearman Distance
    Footrule Distance
    Kendall Distance
    Cayley Distance
    Hamming Distance for Ordinal Variable
    Ulam Distance
How do we compute dissimilarity or similarity for text and string variables?
How do we compute dissimilarity or similarity for quantitative variables?
    Euclidean Distance
    City block (Manhattan) distance
    Chebyshev Distance
    Minkowski Distance
    Canberra distance
    Bray Curtis (Sorensen) distance
    Angular separation
    Correlation coefficient
How do we compute dissimilarity between two groups (Mahalanobis distance)?
How do we normalize the similarity or dissimilarity?
How do we aggregate mixed type of variables?
Comprehensive example: Distance matrix of Multivariate data
Resources
The preferred reference for this tutorial is:
Teknomo, Kardi (2015) Similarity Measurement. http://people.revoledu.com/kardi/tutorial/Similarity/