How can we be sure that the best-fit model is a straight line? The purpose of modeling is to find the model that best represents your data. Suppose you have a regression formula $y = a + bx$. The first step is a visual check: plot the data, with the independent variable on the x-axis and the dependent variable on the y-axis. This plot gives you an idea of what type of model may fit your data best. Modeling is partly an art, in that we need to 'guess' what the best model is. If the plot shows that the data are not linear, try another type of model or another combination of variables. Do not force a linear model onto non-linear data!

Several indices can be used to examine the goodness of fit of the model. These indices must be used with care and with an understanding of what they mean. The most common indices are R-squared, adjusted R-squared, the standard error, the F statistic and the t statistic.
To claim that your model fits the data, you need to show that all of these indices meet their criteria. Below is a brief discussion of the indices together with their criteria.

One index of goodness of fit is R-squared, or the coefficient of determination. It is the proportion of the variation in the data that is explained by the best-fit line. It depends on the ratio of the sum of squared errors of the regression model (SSE) to the sum of squared differences around the mean (SST, the total sum of squares):

$$R^2 = 1 - \frac{SSE}{SST}$$
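As a rough illustration of the R-squared calculation, the sketch below fits a least-squares line to a small made-up data set and computes one minus the ratio of SSE to SST. The data values and the helper names `fit_line` and `r_squared` are assumptions introduced for this example only.

```python
# Hypothetical data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def fit_line(x, y):
    """Return least-squares intercept a and slope b for y = a + b*x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b = sxy / sxx
    a = y_mean - b * x_mean
    return a, b


def r_squared(x, y):
    """R-squared = 1 - SSE / SST for the fitted line."""
    a, b = fit_line(x, y)
    y_mean = sum(y) / len(y)
    # SSE: squared differences between the data and the fitted line.
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    # SST: squared differences between the data and their mean.
    sst = sum((yi - y_mean) ** 2 for yi in y)
    return 1 - sse / sst


print(round(r_squared(x, y), 4))  # 0.6
```

For this data set the fitted line is y = 2.2 + 0.6x, with SSE = 2.4 and SST = 6, so the line explains 60% of the variation.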
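However, SST and SSE are sums of squares, not measures of variance. To obtain a proportion of variances, we divide each sum of squares by its degrees of freedom. The result is the adjusted R-squared,

$$R^2_{adj} = 1 - \frac{MSE}{MST}$$

where the mean square error is $MSE = SSE / (n - 2)$ and the mean square total is $MST = SST / (n - 1)$, for $n$ data points and the two fitted parameters of the line.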
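The averaging step can be sketched as follows, assuming a straight-line model with two fitted parameters, so that the error sum of squares is divided by n - 2 and the total sum of squares by n - 1. The function name `adjusted_r_squared`, its `n_params` argument and the input values are hypothetical.

```python
def adjusted_r_squared(sse, sst, n, n_params=2):
    """Adjusted R-squared = 1 - MSE / MST.

    n_params is the number of fitted parameters; 2 (intercept and slope)
    is an assumption matching a straight-line model.
    """
    mse = sse / (n - n_params)  # mean square error
    mst = sst / (n - 1)         # mean square total
    return 1 - mse / mst


# Illustrative values: SSE = 2.4, SST = 6.0 from n = 5 points.
print(round(adjusted_r_squared(2.4, 6.0, 5), 4))  # 0.4667
```

With these numbers the plain R-squared would be 0.6, so the adjustment lowers the value noticeably when there are only a few data points.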
The standard error is another index that is often used for the goodness of fit of the model. It is the square root of the mean square error:

$$SE = \sqrt{MSE}$$
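A minimal sketch of the standard error, taken here as the square root of the mean square error with n - 2 degrees of freedom (an assumption appropriate for a straight-line model). The name `standard_error` and the input values are illustrative.

```python
import math


def standard_error(sse, n, n_params=2):
    """Standard error of the regression: sqrt(SSE / (n - n_params))."""
    return math.sqrt(sse / (n - n_params))


# Illustrative values: SSE = 2.4 from n = 5 points.
print(round(standard_error(2.4, 5), 4))  # 0.8944
```

The result is in the same units as the dependent variable, which makes it easier to judge against the scale of the data than a sum of squares.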
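Another index for the goodness of fit of the model is the F statistic,

$$F = \frac{MSR}{MSE}$$

where the mean square regression is the regression sum of squares divided by its single degree of freedom, $MSR = (SST - SSE) / 1$.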
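The F statistic is often presented in an ANOVA (analysis of variance) table:

Source      | Degrees of freedom | Sum of squares  | Mean square         | F
Regression  | 1                  | SSR = SST - SSE | MSR = SSR / 1       | MSR / MSE
Error       | n - 2              | SSE             | MSE = SSE / (n - 2) |
Total       | n - 1              | SST             |                     |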
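The ANOVA quantities for a straight-line fit can be assembled as in the sketch below. It assumes one regression degree of freedom and n - 2 error degrees of freedom; the function name `anova_table`, the dictionary layout and the input values are hypothetical.

```python
def anova_table(sse, sst, n):
    """Return ANOVA quantities for a straight-line regression."""
    ssr = sst - sse              # regression sum of squares
    df_reg, df_err = 1, n - 2    # degrees of freedom (assumed: one predictor)
    msr = ssr / df_reg           # mean square regression
    mse = sse / df_err           # mean square error
    return {
        "regression": {"df": df_reg, "ss": ssr, "ms": msr, "f": msr / mse},
        "error": {"df": df_err, "ss": sse, "ms": mse},
        "total": {"df": n - 1, "ss": sst},
    }


# Illustrative values: SSE = 2.4, SST = 6.0 from n = 5 points.
table = anova_table(2.4, 6.0, 5)
print(round(table["regression"]["f"], 4))  # 4.5
```

The degrees of freedom and the sums of squares of the regression and error rows each add up to the total row, which is a quick check on the arithmetic.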
If R-squared approaches one, the standard error approaches zero and the F statistic goes to infinity. The F statistic is compared with the critical value of the F distribution with $(1, n-2)$ degrees of freedom. You may allow a small probability of error for your model. This probability is called the significance level, denoted by $\alpha$.

While the other four indices above represent the overall fit of the model, the t statistic describes the fit of an individual model parameter. If the absolute value of the t statistic of a parameter is less than the critical value of the t distribution with $n-2$ degrees of freedom at significance level $\alpha$, that parameter is not statistically significant.
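As an illustration of the t statistic for a single parameter, the sketch below computes it for the slope of a straight-line fit, using the standard result that the standard error of the slope is the square root of MSE divided by the sum of squared deviations of x. The data and the function name `slope_t_statistic` are assumptions made for this example.

```python
import math

# Hypothetical data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def slope_t_statistic(x, y):
    """t statistic of the slope b in y = a + b*x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_mean - b * x_mean
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)              # n - 2 degrees of freedom
    se_b = math.sqrt(mse / sxx)      # standard error of the slope
    return b / se_b


print(round(slope_t_statistic(x, y), 4))  # 2.1213
```

For 3 degrees of freedom and a two-sided significance level of 0.05, printed t tables give a critical value of about 3.182, so the slope in this small example would not be judged significant. For a straight-line model the square of this t statistic equals the F statistic of the regression.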
The next sections show how to obtain the best-fit line with the linear regression formula, by hand calculation or with a spreadsheet. If you would rather apply the formula without working through the computation, see how to do it with just a few clicks and a little typing in Microsoft Excel.
The preferred reference for this tutorial is Teknomo, Kardi. Regression Model using Microsoft Excel. http:\\people.revoledu.com\kardi\ tutorial\Regression\
© 2006 Kardi Teknomo. All Rights Reserved. Designed by CNV Media