Similarity


Aggregate Multivariate Distance

In practice, data rarely come on a single type of measurement scale. Most real measurements (especially in behavioral surveys) mix nominal, ordinal, and quantitative scales. How do we handle this situation?

  1. Use only normalized distance or similarity (with values in [0, 1]) for all variables.
  2. Determine the weight $w_k$ of each feature variable (usually between 0 and 1).
  3. The general aggregated similarity and dissimilarity indices are then simple weighted averages of the per-feature distance matrices.
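Step 1 can be sketched in Python for one pair of mixed-type records. This is a minimal illustration, not the tutorial's own code: the function names, the two survey respondents, and the particular normalizations (simple mismatch for nominal data, rank difference over the number of levels minus one for ordinal data, absolute difference over the range for quantitative data) are all assumptions chosen so that every per-feature distance falls in [0, 1].

```python
def nominal_distance(a, b):
    """Simple mismatch: 0 if the categories agree, 1 otherwise."""
    return 0.0 if a == b else 1.0


def ordinal_distance(rank_a, rank_b, n_levels):
    """Rank difference divided by the largest possible rank difference."""
    return abs(rank_a - rank_b) / (n_levels - 1)


def quantitative_distance(x, y, lo, hi):
    """Absolute difference divided by the range [lo, hi] of the variable."""
    return abs(x - y) / (hi - lo)


# Two hypothetical respondents: (gender, satisfaction rank 1..5, age in years).
person_i = ("F", 4, 30)
person_j = ("M", 2, 45)

# One normalized distance per feature, each in [0, 1].
per_feature = [
    nominal_distance(person_i[0], person_j[0]),
    ordinal_distance(person_i[1], person_j[1], n_levels=5),
    quantitative_distance(person_i[2], person_j[2], lo=20, hi=70),
]
print(per_feature)
```

Because every entry is already on the same [0, 1] scale, the features can be combined by a weighted average without one scale dominating the others.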

$$s_{ij} = \frac{\sum_k w_k s_{ijk}}{\sum_k w_k} \quad \text{and} \quad d_{ij} = \frac{\sum_k w_k d_{ijk}}{\sum_k w_k}$$

Index $k$ runs over the feature variables. $s_{ijk}$ and $d_{ijk}$ are the similarity and dissimilarity between objects $i$ and $j$ for feature $k$.
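The weighted average over features can be sketched on whole matrices as follows. The function name `aggregate`, the two small distance matrices, and the weights are hypothetical and only serve to illustrate the idea. The last line additionally assumes that each per-feature similarity is one minus the corresponding normalized distance, in which case the aggregated similarity is one minus the aggregated distance.

```python
def aggregate(matrices, weights=None):
    """Weighted average of per-feature matrices: sum_k(w_k * m_k) / sum_k(w_k)."""
    if weights is None:
        weights = [1.0] * len(matrices)  # equal weights when nothing else is known
    if len(weights) != len(matrices):
        raise ValueError("need exactly one weight per feature matrix")
    total = sum(weights)
    n = len(matrices[0])
    return [
        [sum(w * m[i][j] for w, m in zip(weights, matrices)) / total
         for j in range(n)]
        for i in range(n)
    ]


# Hypothetical normalized distance matrices for three objects and two features.
d_feature_1 = [[0.0, 1.0, 0.0],
               [1.0, 0.0, 1.0],
               [0.0, 1.0, 0.0]]
d_feature_2 = [[0.0, 0.5, 0.25],
               [0.5, 0.0, 0.75],
               [0.25, 0.75, 0.0]]

# The second feature counts three times as much as the first.
d = aggregate([d_feature_1, d_feature_2], weights=[1.0, 3.0])

# Assumes s_ijk = 1 - d_ijk for every feature.
s = [[1.0 - value for value in row] for row in d]
```

Dividing by the sum of the weights keeps the aggregated values in [0, 1] whenever the per-feature values are in [0, 1].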

The weights may be chosen arbitrarily, from the units, or from the data (calibration). For example, if one variable is in ton/cubic meter and another is in kg/cubic meter, a weight of 1/1000 on the kg/cubic meter variable puts both on equal units. Equal weights (all $w_k = 1$) may be used as the default if no other information is given.
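As a short worked step, write $w_k$ for the weight of feature $k$ and $d_{ijk}$ for the distance between objects $i$ and $j$ on feature $k$, and suppose there are $n$ features (the symbol $n$ is introduced here only for illustration). With the default equal weights, the weighted average reduces to the ordinary mean of the per-feature distances:

$$d_{ij} = \frac{\sum_{k=1}^{n} 1 \cdot d_{ijk}}{\sum_{k=1}^{n} 1} = \frac{1}{n} \sum_{k=1}^{n} d_{ijk}$$

Since the weights appear in both numerator and denominator, multiplying all of them by the same positive constant leaves the result unchanged; only their relative sizes matter.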

See the comprehensive example on how to aggregate different data types here.



This tutorial is copyrighted.

The preferred reference for this tutorial is

Teknomo, Kardi (2015) Similarity Measurement. http://people.revoledu.com/kardi/tutorial/Similarity/