Correction of Measurement based on Statistics

Suppose we have the following problem. We have a series of measurements that produced a mean and a variance. Then we realize that one of the data values was measured incorrectly, so we remove the incorrect value and replace it with a new measurement. Now we would like to know the corrected mean and the corrected variance.

The usual way is, of course, to sum all the measurement data and compute the statistics with the traditional mean and variance formulas. If you have only a few data, summing all the measurements and squaring each deviation from the mean is a simple task. However, suppose you have millions or billions of data, and you computed the mean and variance efficiently with the recursive average and recursive variance, without storing the measurement data. Recomputing everything with the traditional formulas just to correct one measurement out of billions would be very inefficient, and impossible if the data were never stored. Is there a more efficient way to correct a single measurement without recomputing the sum of all measurements and the sum of squared deviations from the mean?

The formulas below answer this question. They compute the corrected mean and variance efficiently from the old statistics and the two values involved in the correction.

$$\mu_{1} =\mu + \frac{x_{+}-x_{-}}{n}$$
$$\sigma _{1}^{2}=\sigma^{2}+\frac{x_{+}^{2}-x_{-}^{2}}{n}-\frac{x_{+}-x_{-}}{n}(2\mu+\frac{x_{+}-x_{-}}{n})$$

In the formulas above, $$\mu$$ and $$\sigma^{2}$$ denote the mean and the variance before the correction of measurement, while $$\mu_{1}$$ and $$\sigma_{1}^{2}$$ denote the mean and the variance after the correction. The variance here is the population variance, which divides by $$n$$. Let $$n$$ be the number of measurement data used to compute the mean and variance. When we correct the measurement data, we take out a single value $$x_{-}$$ and put another single value $$x_{+}$$ in its place. Thus, a correction of measurement does not change the number of data. It remains $$n$$.
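For readers who want to see where the variance formula comes from, here is a short derivation sketch. It assumes the variance is the population variance (dividing by $$n$$), which is consistent with the numbers in the example on this page. The shorthand $$S$$ (sum of squares) and $$d$$ (the change in the corrected value) are introduced here only for illustration.

$$\sigma^{2}=\frac{S}{n}-\mu^{2}, \quad S=\sum_{i=1}^{n}x_{i}^{2}$$
$$S_{1}=S+x_{+}^{2}-x_{-}^{2}, \quad \mu_{1}=\mu+\frac{d}{n}, \quad d=x_{+}-x_{-}$$
$$\sigma_{1}^{2}=\frac{S_{1}}{n}-\mu_{1}^{2}=\frac{S}{n}+\frac{x_{+}^{2}-x_{-}^{2}}{n}-\left(\mu+\frac{d}{n}\right)^{2}$$
$$=\sigma^{2}+\frac{x_{+}^{2}-x_{-}^{2}}{n}-\frac{d}{n}\left(2\mu+\frac{d}{n}\right)$$

The last step uses $$\frac{S}{n}=\sigma^{2}+\mu^{2}$$, so the $$\mu^{2}$$ terms cancel when the square is expanded.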

Let us take a simple example.
Using the measurement data 4, 6, 12, 9, we have obtained $$\mu=7\frac{3}{4}$$ and $$\sigma^{2}=9\frac{3}{16}$$. Now suppose you realize that the second measurement was incorrect and you want to replace it with the new value 7. Thus, we have $$x_{-}=6$$ and $$x_{+}=7$$ with the number of data $$n=4$$.
Without considering all the measurement data, we can compute the corrected mean as
$$\mu_{1} =\mu + \frac{x_{+}-x_{-}}{n} = 7\frac{3}{4}+\frac{7-6}{4}=8$$
We can also compute the corrected variance without recomputing over the whole measurement data. Note that $$2\mu=15\frac{1}{2}$$ uses the mean before the correction.
$$\sigma _{1}^{2}=\sigma^{2}+\frac{x_{+}^{2}-x_{-}^{2}}{n}-\frac{x_{+}-x_{-}}{n}(2\mu+\frac{x_{+}-x_{-}}{n})$$
$$=9\frac{3}{16}+\frac{49-36}{4}-\frac{7-6}{4}(15\frac{1}{2}+\frac{7-6}{4})= 8\frac{1}{2}$$
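The worked example can also be checked in code. The following is a minimal Python sketch, not the program from the original page. The function name `correct_mean_variance` and its parameter names are chosen here for illustration. It assumes the population variance (dividing by $$n$$) and uses exact fractions so that the result can be compared against a full recomputation with the standard `statistics` module.

```python
from fractions import Fraction
from statistics import mean, pvariance


def correct_mean_variance(mu, var, n, x_old, x_new):
    """Return (mean, variance) after replacing x_old by x_new among n data.

    mu and var are the mean and population variance before the correction.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    shift = (x_new - x_old) / n              # change in the mean
    new_mu = mu + shift
    new_var = var + (x_new**2 - x_old**2) / n - shift * (2 * mu + shift)
    return new_mu, new_var


# Data from the example, stored as exact fractions.
data = [Fraction(4), Fraction(6), Fraction(12), Fraction(9)]
mu, var = mean(data), pvariance(data)        # 31/4 and 147/16

# Correct the second measurement from 6 to 7 using only mu, var and n.
new_mu, new_var = correct_mean_variance(mu, var, len(data), data[1], Fraction(7))

# Brute-force check: recompute the statistics from the corrected data.
corrected = [Fraction(4), Fraction(7), Fraction(12), Fraction(9)]
assert (new_mu, new_var) == (mean(corrected), pvariance(corrected))
print(new_mu, new_var)                       # 8 17/2
```

The update touches only five numbers, so its cost does not depend on how many measurements were taken, whereas the brute-force check has to visit every value.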

The interactive program on this page lets you reproduce the simple example above, and you can also try it with your own data. Type your previous average, previous variance and the number of data. Then enter the incorrect measurement and the correct value that replaces it (only one value, though not necessarily the last measurement). When you click the "Get corrected mean and the corrected variance" button, the program returns the corrected mean and the corrected variance.

Use the program with caution. If the number of data is less than or equal to 2, the result may be incorrect. Similarly, if the wrong value does not actually exist in the sequence of measurements, the formula will still give an answer, although that answer is incorrect.
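The second caution can be illustrated with a small Python sketch. The helper `correct_mean_variance` is a hypothetical name used here for illustration and assumes the population variance. Because the formula never sees the raw data, it cannot tell whether the value being removed was really part of the series. Sometimes the mistake shows up as a negative variance, but often the output looks perfectly plausible.

```python
def correct_mean_variance(mu, var, n, x_old, x_new):
    """Apply the correction formulas (population variance assumed)."""
    shift = (x_new - x_old) / n
    new_mu = mu + shift
    new_var = var + (x_new**2 - x_old**2) / n - shift * (2 * mu + shift)
    return new_mu, new_var


# Data 4, 6, 12, 9 has mean 7.75 and variance 9.1875.
# Removing 5, which is NOT in the data, still yields plausible numbers.
wrong_mu, wrong_var = correct_mean_variance(7.75, 9.1875, 4, 5, 7)
print(wrong_mu, wrong_var)    # 8.25 7.1875, not the true 8.0 and 8.5

# Data 5, 5, 5, 5 has mean 5 and variance 0.
# Removing 100, which is not in the data, gives an impossible variance.
bad_mu, bad_var = correct_mean_variance(5.0, 0.0, 4, 100, 5)
if bad_var < 0:
    print("negative variance: the removed value cannot be in the data")
```

A negative result is only a symptom. A non-negative result, as in the first case, does not prove that the removed value was really in the series.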

The preferred reference for this tutorial is

Teknomo, Kardi. (2006) Recursive Average and Variance.
http://people.revoledu.com/kardi/tutorial/RecursiveStatistic/index.html