By Lyra Filiola

Bootstrap Sampling Tutorial


Bootstrap Statistics for Generalized Linear Regression

In this section we perform the Bootstrap for the general linear regression model (also called the multiple linear regression model). You can include as many independent variables as you want. Note that the independent variables must be supplied as a matrix whose first column is filled with 1s (the intercept column).

For example:

 > x1<-1:10
 > x2<-11:20
 > x3<-21:30
 > x<-matrix(1,10)
 > x<-cbind(x,x1,x2,x3)
 > x
          x1 x2 x3
  [1,]  1  1 11 21
  [2,]  1  2 12 22
  [3,]  1  3 13 23
  [4,]  1  4 14 24
  [5,]  1  5 15 25
  [6,]  1  6 16 26
  [7,]  1  7 17 27
  [8,]  1  8 18 28
  [9,]  1  9 19 29
 [10,]  1 10 20 30
 >

In this section we discuss the script I wrote to perform the Bootstrap on the general linear regression model. As in the previous section, the script is presented part by part, with an explanation of each part following it. Afterwards, you can see an example of how to apply this script in R. The script itself can be downloaded here.

The Bootstrap script for the general linear regression case is as follows:

   I  multiple.reg.boot<-function(x,y,p,b,k,alpha)
      {
  II  n<-length(y)
      int1<-t(x)
      int2<-int1%*%x
      int3<-solve(int2)
      int4<-int1%*%y
      koef.beta<-int3%*%int4
      index.beta<-matrix(0:(p-1))
      out<-cbind(index.beta,koef.beta)
      cat("Least Square Estimator","\n")
      print(out)
      cat("\n")
 III  cat("Standard Error for Beta","\n")
      int5<-t(y)
      int6<-t(koef.beta)
      int7<-int5%*%y
      int8<-int6%*%int1
      int9<-int8%*%y
      sse<-int7-int9
      mse<-sse/(n-p)
      msedata<-mse[1]
      var.all<-msedata*int3
      var.koef.beta<-diag(var.all)
      se.koef.beta<-sqrt(var.koef.beta)
      print(se.koef.beta)
      cat("\n")
  IV  cat(((1-(alpha))*100),"% Confidence Interval for Beta","\n")
      low<-koef.beta-qt((1-(alpha/2)),(n-p))*se.koef.beta
      up<-koef.beta+qt((1-(alpha/2)),(n-p))*se.koef.beta
      cat("Lower Bound","\n")
      print(low)
      cat("\n")
      cat("Upper Bound","\n")
      print(up)
      cat("\n")
   V  matcoef<-matrix(0:(p-1))
      beta.boot<-NULL
      for(i in 1:b)
      {
  VI  v<-sample(1:n,k,replace=TRUE)
      ystarpre<-y[v]
      xstarpre<-x[v,1:p]
      ystar<-data.matrix(ystarpre)
      xstar<-data.matrix(xstarpre)
      inta<-t(xstar)
      intb<-inta%*%xstar
      detb<-det(intb)
 VII  if(detb==0)
      {
      repeat
      {
      v<-sample(1:n,k,replace=TRUE)
      ystarpre<-y[v]
      xstarpre<-x[v,1:p]
      ystar<-data.matrix(ystarpre)
      xstar<-data.matrix(xstarpre)
      inta<-t(xstar)
      intb<-inta%*%xstar
      detb<-det(intb)
      if(detb!=0) break
      }
      }
VIII  intc<-solve(intb)
      intd<-inta%*%ystar
      koef.beta.boot<-intc%*%intd
      beta.boot<-koef.beta.boot
      matcoef<-cbind(matcoef,beta.boot)
      }
  IX  matcoef1<-matcoef[,2:(b+1)]
      mean.beta.boot<-apply(matcoef1,1,mean)
      bias.beta.boot<-mean.beta.boot-koef.beta
      var.beta.boot<-apply(matcoef1,1,var)
      se.beta.boot<-sqrt(var.beta.boot)
      loboundpre<-matrix(0:(p-1))
      upboundpre<-matrix(0:(p-1))
      for (t in 1:p)
      {
      matquant<-matcoef1[t,]
      loboundquant<-quantile(matquant,(alpha/2))
      upboundquant<-quantile(matquant,(1-(alpha/2)))
      loboundpre<-rbind(loboundpre,loboundquant)
      upboundpre<-rbind(upboundpre,upboundquant)
      }
      lobound0<-loboundpre[(p+1):(2*p),]
      upbound0<-upboundpre[(p+1):(2*p),]
      lobound<-matrix(0:(p-1))
      lobound<-cbind(lobound,lobound0)
      upbound<-matrix(0:(p-1))
      upbound<-cbind(upbound,upbound0)
   X  cat("Bootstrap Correlation Model","\n")
      cat("bias beta","\n")
      print(bias.beta.boot)
      cat("\n")
      cat("standard error beta","\n")
      print(se.beta.boot)
      cat("\n")
      cat((1-alpha)*100,"% Confidence Interval for beta","\n")
      cat("lower bound =","\n")
      print(lobound)
      cat("upper bound =","\n")
      print(upbound)
      cat("\n")
      }

Next, I will explain the script above part by part.

Part I
This function is named multiple.reg.boot and its inputs are as follows:
x : matrix of independent variables (including the column of 1s)
y : vector of the dependent variable
p : number of columns of x (the number of regression coefficients)
b : number of Bootstrap replications
k : size of each Bootstrap sample
alpha : significance level

Part II
This part calculates the regression coefficients using the least squares method.
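As a quick check of this part, the normal-equation formula beta = (X'X)^(-1) X'y can be verified against R's built-in lm(). This is only a sketch with made-up data, not part of the tutorial script:

```r
# Least squares via the normal equations, checked against lm().
x1 <- 1:10
x2 <- c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3)       # made up, not collinear with x1
X  <- cbind(1, x1, x2)                       # design matrix, intercept column first
set.seed(42)
y  <- 2 + 0.5*x1 - 0.3*x2 + rnorm(10, sd = 0.1)
beta.hat <- solve(t(X) %*% X) %*% (t(X) %*% y)
fit <- lm(y ~ x1 + x2)
max(abs(beta.hat - coef(fit)))               # differs only by floating point error
```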

Part III
This part calculates the standard errors of the regression coefficients.
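The computation in this part (residual mean square times the diagonal of (X'X)^(-1)) can likewise be checked against summary(lm(...)). A sketch with made-up data:

```r
# Residual mean square and coefficient standard errors, as in part III.
x1 <- 1:10
x2 <- c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3)        # made-up predictors
X  <- cbind(1, x1, x2)
n  <- 10; p <- 3
set.seed(1)
y  <- 2 + 0.5*x1 - 0.3*x2 + rnorm(n)
beta <- solve(t(X) %*% X, t(X) %*% y)
sse  <- sum((y - X %*% beta)^2)              # residual sum of squares
mse  <- sse / (n - p)                        # residual mean square
se   <- sqrt(diag(mse * solve(t(X) %*% X)))
# Should match lm()'s reported standard errors:
all.equal(unname(se),
          unname(summary(lm(y ~ x1 + x2))$coefficients[, "Std. Error"]))
```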

Part IV
This part calculates the classical t-based confidence interval for the regression coefficients.
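The interval used in this part is estimate ± t(1-alpha/2, n-p) × standard error. As a sketch, wrapped in a small helper (the name conf.int is hypothetical, not part of the script):

```r
# t-based (1 - alpha) confidence interval for one coefficient.
conf.int <- function(est, se, n, p, alpha) {
  tcrit <- qt(1 - alpha/2, df = n - p)       # critical value with n - p df
  c(lower = est - tcrit * se, upper = est + tcrit * se)
}
# e.g. for an estimate 0.496 with standard error 0.00605, n = 15, p = 3:
conf.int(0.496, 0.00605, n = 15, p = 3, alpha = 0.1)
```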

Part V
This part is the beginning of the Bootstrapping. First, we initialize matcoef, the matrix that will collect the Bootstrap coefficient estimates, and beta.boot, the Bootstrap estimator of the regression coefficients.

Part VI
This part draws a Bootstrap sample: k observations are sampled with replacement from the original data.

Part VII
This part handles the case where the determinant of X*'X* is zero, so that its inverse is not defined. The sampling from part VI is repeated until a Bootstrap sample is found for which the determinant of X*'X* is nonzero.
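This "resample until invertible" guard can be expressed more compactly as a helper that returns the chosen row indices. A sketch only; the name draw.nonsingular is hypothetical:

```r
# Resample indices until the resampled X*'X* is invertible.
draw.nonsingular <- function(x, n, k, p) {
  repeat {
    v <- sample(1:n, k, replace = TRUE)      # Bootstrap indices
    xstar <- data.matrix(x[v, 1:p])
    if (det(t(xstar) %*% xstar) != 0) return(v)   # invertible: keep this sample
  }
}
x1 <- 1:10
x2 <- c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3)        # made-up full-rank predictors
X  <- cbind(1, x1, x2)
v  <- draw.nonsingular(X, n = 10, k = 10, p = 3)
length(v)                                    # k indices, possibly with repeats
```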

Part VIII
This part finishes one Bootstrap replication: the Bootstrap coefficient estimates are computed and appended as a new column of the matrix matcoef.

Part IX
Recall that in part V matcoef was initialized as a single-column matrix, to which the Bootstrap coefficient estimates were then appended column by column. For the subsequent computation we do not need that first column, so matcoef1 is defined as matcoef without it. The bias, standard error, and percentile confidence interval are computed here.
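The percentile interval built with the loop in this part can also be obtained with a single apply() over the rows of the coefficient matrix. A sketch on a made-up p × b matrix of Bootstrap coefficients (one row per coefficient, one column per replication):

```r
# Percentile Bootstrap interval via apply(), row by row.
set.seed(7)
p <- 3; b <- 500; alpha <- 0.1
# rnorm() recycles the mean vector, so rows have means 2, 0.5, -0.3
boot.mat <- matrix(rnorm(p * b, mean = c(2, 0.5, -0.3), sd = 0.05), nrow = p)
# Result is a 2 x p matrix: first row = lower bounds, second row = upper bounds
apply(boot.mat, 1, quantile, probs = c(alpha/2, 1 - alpha/2))
```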

Part X
This part prints the output of the Bootstrapping.

Example:
To see how the script works, we use the "Zarthan Company" data set taken from Neter (1990).

The data set is divided into independent variables and a dependent variable. For this case, "Target Population" and "Per Capita Discretionary Income" are the independent variables, and "Sales" is the dependent variable.

 > y
  [1] 162 120 223 131  67 169  81 192 116  55 252 232 144 103 212
 > x
          x1   x2
  [1,] 1 274 2450
  [2,] 1 180 3254
  [3,] 1 375 3802
  [4,] 1 205 2838
  [5,] 1  86 2347
  [6,] 1 265 3782
  [7,] 1  98 3008
  [8,] 1 330 2450
  [9,] 1 195 2137
 [10,] 1  53 2560
 [11,] 1 430 4020
 [12,] 1 372 4427
 [13,] 1 236 2660
 [14,] 1 157 2088
 [15,] 1 370 2605
 >

Suppose you have saved the script in the directory "D:/MyDirectory"; you can then load the script into R with the following command:


> source("D:/MyDirectory/multipleregboot.R")


Alternatively, you can use the menu File > Source R code... and point to the script file.

To apply the Bootstrap general linear regression model to the data set above, you type

> multiple.reg.boot(x,y,3,500,15,0.1)

The command above generates 500 Bootstrap samples, each containing 15 observations drawn from the original data set, whose independent variables are stored in the 3-column matrix x and whose dependent variable is y, using a 10% significance level. The result of this computation is as follows:

 Least Square Estimator 
       [,1]       [,2]
        0   3.45261279
  x1    1   0.49600498
  x2    2   0.00919908

 Standard Error for Beta 
                         x1           x2 
 2.4306504935 0.0060544412 0.0009681139 

 90 % Confidence Interval for Beta 
 Lower Bound 
             [,1]
     -0.879505337
  x1  0.485214221
  x2  0.007473624

 Upper Bound 
            [,1]
      7.78473092
  x1  0.50679573
  x2  0.01092454

 Bootstrap Correlation Model 
 bias beta 
               [,1]
     -8.032102e-02
  x1 -3.723742e-04
  x2  1.598148e-05

 standard error beta 
                       x1          x2 
 3.026618419 0.005882961 0.001164328 

 90 % Confidence Interval for beta 
 lower bound = 
                    lobound0
 loboundquant 0  -1.78008148
 loboundquant 1   0.48541920
 loboundquant 2   0.00719702

 upper bound = 
                   upbound0
 upboundquant 0  8.76600905
 upboundquant 1  0.50505110
 upboundquant 2  0.01119679

The output above shows that the Bootstrap also does a good job of estimating the properties of the general linear regression estimators: the results from the Bootstrap method do not differ much from those of the classical method. The table below shows a comparison between the Bootstrap method and the classical least squares method.


REFERENCES
Dalgaard, P. (2002). Introductory Statistics with R. New York: Springer.

Efron, B. and Tibshirani, R. J. (1993). An Introduction to the Bootstrap. London: Chapman & Hall.

Greene, W. H. (2000). Econometric Analysis. New York: Macmillan Publishing Company.

Neter, J., Wasserman, W., and Kutner, M. H. (1990). Applied Linear Statistical Models: Regression, Analysis of Variance, and Experimental Designs, Third Edition. Boston: Irwin.

R Development Core Team (2006). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.


This tutorial is copyrighted.

The preferred reference for this tutorial is:

Filiola, L. (2006). Bootstrap Computation using R, http://people.revoledu.com/kardi/tutorial/Bootstrap/Lyra/index.html