In this simple tutorial, you will learn the basics needed to extend your data to multivariate data (mixing measurement scales such as nominal, ordinal, and quantitative) and to go beyond two-dimensional data, scaling up to N dimensions. A comprehensive example is given in the last part of this tutorial. You may also download the MS Excel companion file of this tutorial here.
This knowledge of similarity and dissimilarity is necessary for data mining, pattern recognition, machine intelligence, artificial intelligence, and multi-agent systems. However, its application is not limited to computer science. Other fields of natural and social science, as well as engineering and statistics, have also applied this kind of simple knowledge. Tools such as K-means clustering, discriminant analysis, K-nearest neighbors, decision trees, and hierarchical clustering rely heavily on the distance matrix explained in this tutorial.
Why do we need to measure similarity? (Applications)
How do we measure similarity or dissimilarity?
How do we compute dissimilarity or similarity for binary variables?
How do we compute dissimilarity or similarity for nominal / categorical variables?
  Assign each value of category as a binary dummy variable
  Assign each value of category into several binary dummy variables
How do we compute dissimilarity or similarity for ordinal variables?
  Normalized Rank Transformation
  Hamming Distance for Ordinal Variable
How do we compute dissimilarity or similarity for text and string variables?
How do we compute dissimilarity or similarity for quantitative variables?
  Euclidean Distance
  City block (Manhattan) distance
  Bray Curtis (Sorensen) distance
How do we compute dissimilarity between two groups (Mahalanobis distance)?
How do we normalize the similarity or dissimilarity?
How do we aggregate mixed type of variables?
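As a rough preview of the quantitative distances named in the outline above, the sketch below computes Euclidean, city block, and Bray Curtis distances and assembles them into a distance matrix. The function names and the three sample points are hypothetical, chosen only for illustration. The Bray Curtis form used here (sum of absolute differences divided by the sum of all coordinates) is the commonly cited definition and is assumed, not confirmed, to match the one used later in this tutorial.

```python
import math

def euclidean(a, b):
    # Square root of the sum of squared coordinate differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def city_block(a, b):
    # Sum of absolute coordinate differences (Manhattan distance).
    return sum(abs(x - y) for x, y in zip(a, b))

def bray_curtis(a, b):
    # Assumed form: sum |x - y| / sum (x + y).
    # Only meaningful here for non-negative data with a non-zero total.
    return city_block(a, b) / sum(x + y for x, y in zip(a, b))

def distance_matrix(points, dist):
    # Pairwise distances between all objects; entry [i][j] is dist(points[i], points[j]).
    return [[dist(p, q) for q in points] for p in points]

# Hypothetical sample objects, each described by three quantitative variables.
points = [(0, 3, 4), (4, 0, 4), (3, 3, 0)]

for name, dist in [("Euclidean", euclidean),
                   ("City block", city_block),
                   ("Bray Curtis", bray_curtis)]:
    print(name)
    for row in distance_matrix(points, dist):
        print("  ", [round(d, 3) for d in row])
```

Each matrix is symmetric with zeros on the diagonal, which is the structure that clustering and nearest-neighbor tools typically consume.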
The preferred reference for this tutorial is:
Teknomo, Kardi (2015) Similarity Measurement. http://people.revoledu.com/kardi/tutorial/Similarity/