Normalization

In this section of the Similarity tutorial, you will learn how to put a distance or similarity, used as a performance index, into the range from 0 to 1, or [0, 1] for short. The process of transforming an index from its original value into the range [0, 1] is called normalization. I will also briefly discuss statistical normalization in this section.

Suppose the dissimilarity index is in the range of [

There are several ways to normalize an index. In principle, to map a sequence of numbers into the range [0, 1], we need to make them non-negative and divide by something that is at least as large as the numerator. Using this principle, we can use any inequality to normalize the index. The following are simple transformations that can be used for a wide range of applications. Please pay attention to the condition attached to each transformation. See also the sections on Statistical Normalization and Normalizing Negative Data below.

Normalization Methods

1. One way to normalize an index is to use this function
The value of
It gives
Setting higher value of For example,
2. If we know the maximum and minimum value of our index, then transformation
It will transform the index into the range [0, 1]. If
The graph of
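As a minimal sketch of item 2, assuming the transformation is the usual linear rescaling (value minus minimum, divided by maximum minus minimum), since the formula itself is not shown here. The function name `min_max_normalize` and the sample distances are hypothetical.

```python
def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1.

    Assumption: the minimum and maximum are taken from the data itself;
    if they are known in advance, pass them in instead.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # Every value is identical, so the denominator would be zero.
        raise ValueError("all values are equal, so the range is zero")
    return [(v - lo) / (hi - lo) for v in values]


distances = [2.0, 5.0, 11.0]          # illustrative data only
normalized = min_max_normalize(distances)
print(normalized)
```

The guard for a zero range is a design choice of this sketch; another reasonable choice would be to return all zeros.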
3. Suppose we know that the value of our index is always zero or positive, but we do not know its maximum value. Suppose the number of indices is fixed to be
The normalized value of (6) is smaller than (5) because
4. If our index can take negative values, we can normalize each index by taking its absolute value or its squared value relative to the total:
or
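The two variants in item 4 can be sketched as follows, assuming "the total" means the sum of the absolute values in the first case and the sum of the squares in the second. The function names and the sample data are hypothetical.

```python
def abs_share(values):
    """Each value's absolute value divided by the sum of all absolute values."""
    total = sum(abs(v) for v in values)
    if total == 0:
        raise ValueError("all values are zero")
    return [abs(v) / total for v in values]


def squared_share(values):
    """Each value's square divided by the sum of all squares."""
    total = sum(v * v for v in values)
    if total == 0:
        raise ValueError("all values are zero")
    return [v * v / total for v in values]


data = [-1, 3, 4]                      # illustrative data only
print(abs_share(data))                 # shares of |-1|, |3|, |4| in the total 8
print(squared_share(data))             # shares of 1, 9, 16 in the total 26
```

Both variants discard the sign, so -1 and 1 receive the same normalized value. The squared version gives more weight to the large entries.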
5. Bray Curtis Normalization. If we have a pair of indices that are always zero or positive and cannot both be zero at the same time, we can normalize them using the absolute difference divided by the sum.
Removing the absolute sign will give range of
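A minimal sketch of the Bray Curtis normalization in item 5, following the verbal description (absolute difference divided by the sum). The function names are hypothetical. For non-negative inputs the signed variant cannot be smaller than -1 or larger than 1, because the difference can never exceed the sum in magnitude.

```python
def bray_curtis_pair(a, b):
    """Absolute difference of a pair divided by its sum, a value in [0, 1]."""
    if a < 0 or b < 0:
        raise ValueError("both values must be zero or positive")
    if a + b == 0:
        raise ValueError("the two values cannot both be zero")
    return abs(a - b) / (a + b)


def signed_bray_curtis_pair(a, b):
    """Same ratio without the absolute sign; the sign shows which value is larger."""
    if a < 0 or b < 0:
        raise ValueError("both values must be zero or positive")
    if a + b == 0:
        raise ValueError("the two values cannot both be zero")
    return (a - b) / (a + b)


print(bray_curtis_pair(3, 1))          # 2 / 4
print(signed_bray_curtis_pair(1, 3))   # -2 / 4
```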
6. To normalize an ordinal value of a comparison index, perform the following steps:
Example
See also: Distance for ordinal variables
7. We know from mathematics that, for any positive values, the arithmetic mean is always larger than or equal to the geometric mean. We can use this knowledge to normalize our index. Provided that
For example
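One way to read item 7 is to divide the geometric mean by the arithmetic mean, which the inequality guarantees is at most 1. The sketch below assumes that reading, since the formula itself is not shown here. The function name and the sample values are hypothetical.

```python
import math


def gm_over_am(values):
    """Geometric mean divided by arithmetic mean, for strictly positive values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be non-empty and strictly positive")
    n = len(values)
    arithmetic = sum(values) / n
    # Geometric mean via logarithms, which avoids overflow in a long product.
    geometric = math.exp(sum(math.log(v) for v in values) / n)
    # Clamp to 1.0 to absorb floating-point rounding when all values are equal.
    return min(1.0, geometric / arithmetic)


print(gm_over_am([4.0, 9.0]))          # geometric mean 6, arithmetic mean 6.5
print(gm_over_am([5.0, 5.0, 5.0]))     # equal values give the maximum
```

The ratio reaches 1 only when all values are equal, and moves toward 0 as the values become more unequal.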
8. Another inequality from mathematics says that the absolute value of the arithmetic mean is smaller than or equal to the quadratic mean. We can use this knowledge to normalize our index for any real value of
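A possible reading of item 8, sketched under the assumption that the normalized index is the absolute arithmetic mean divided by the quadratic mean (the square root of the mean of the squares). The function name and the sample values are hypothetical.

```python
import math


def abs_am_over_qm(values):
    """Absolute arithmetic mean divided by quadratic mean, for any real values."""
    if not values:
        raise ValueError("values must be non-empty")
    n = len(values)
    quadratic = math.sqrt(sum(v * v for v in values) / n)
    if quadratic == 0:
        raise ValueError("all values are zero")
    # Clamp to 1.0 to absorb floating-point rounding when all values are equal.
    return min(1.0, abs(sum(values) / n) / quadratic)


print(abs_am_over_qm([-1.0, 3.0, 4.0]))  # mean 2, quadratic mean sqrt(26/3)
print(abs_am_over_qm([1.0, -1.0]))       # the mean is zero, so the ratio is 0
```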
For example

Normalizing negative data

All of the normalizations above work well if your data are positive or zero. What if your data contain some negative numbers? For example, suppose you have the data -1, 3 and 4. The sum is 6. If you normalize by the sum, you get -1/6, 1/2 and 2/3. The three values still add up to one, but now a negative number (-1/6) is part of your index.

How can we solve this problem? The solution is simple: shift your data by adding the absolute value of the most negative number (the minimum of your data) to every number, so that the most negative one becomes zero and all the other numbers become positive. Then you can normalize your data as usual with any of the procedures above. For example, your data are -1, 3 and 4. The most negative number is -1, so you add 1 to every number to get 0, 4 and 5, and then normalize by the sum to get 0, 4/9 and 5/9.
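The shift-then-normalize procedure for the data -1, 3 and 4 can be sketched as follows. The function name `shift_then_share` is hypothetical, and the use of exact fractions with integer input is an assumption made so that the result can be compared directly with 0, 4/9 and 5/9.

```python
from fractions import Fraction


def shift_then_share(values):
    """Shift integer data so the minimum becomes zero, then divide by the sum."""
    smallest = min(values)
    # Only shift when there is at least one negative number.
    shift = -smallest if smallest < 0 else 0
    shifted = [v + shift for v in values]
    total = sum(shifted)
    if total == 0:
        raise ValueError("all shifted values are zero")
    return [Fraction(s, total) for s in shifted]


shares = shift_then_share([-1, 3, 4])
print([str(s) for s in shares])
```

Note that the shift changes the relative proportions of the data: 3 and 4 become 4 and 5, so the normalized values describe position above the minimum rather than the original magnitudes.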
Statistical Normalization

Finally, I would like to add a note about another type of normalization, which is called statistical normalization. The purpose of statistical normalization is to convert data drawn from any Normal distribution into a Normal distribution with mean zero and variance one. The formula of statistical normalization is

Z = (X - u) / s

You take your data as a vector X, subtract the mean of the data, u, and divide the difference by the standard deviation, s. The result is another vector Z that has a Normal distribution with zero mean and unit variance (also called the standard Normal distribution, N(0,1)). However, the range of the standard Normal distribution is not [0, 1]. Its range is about -3 to +3 (strictly, negative infinity to positive infinity, but the interval from -3 to +3 already captures about 99.7% of your data).
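A minimal sketch of Z = (X - u) / s using only the Python standard library. The function name `z_scores` and the sample data are hypothetical, and the choice of the population standard deviation (`statistics.pstdev`) rather than the sample standard deviation (`statistics.stdev`) is an assumption, since the text does not say which one is meant.

```python
import statistics


def z_scores(values):
    """Subtract the mean from each value and divide by the standard deviation."""
    mean = statistics.fmean(values)
    # Assumption: population standard deviation; use statistics.stdev
    # instead if the data are a sample from a larger population.
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("standard deviation is zero")
    return [(v - mean) / sd for v in values]


data = [2, 4, 4, 4, 5, 5, 7, 9]        # illustrative data: mean 5, deviation 2
print(z_scores(data))
```

The result has zero mean and unit variance, but its values are not confined to [0, 1], and it is Normally distributed only if the original data were.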
The preferred reference for this tutorial is Teknomo, Kardi. Similarity Measurement. http:\\people.revoledu.com\kardi\ tutorial\Similarity\
© 2006 Kardi Teknomo. All Rights Reserved. Designed by CNV Media