By Lyra Filiola
   
 
Bootstrap Statistics for Generalized Linear Regression
In this section we will performs Bootstrap for general linear regression model (or so called Multiple Linear regression model). You can input as many independent variables as you want. Note that the independent variable should have been a matrix with the first column filled by "1".
  For example:
  
  
   
  
 
 > x1<-1:10
 > x2<-11:20
 > x3<-21:30
 > x<-matrix(1,10)
 > x<-cbind(x,x1,x2,x3)
 > x
   x1 x2 x3
   [1,] 1 1 11 21
   [2,] 1 2 12 22
   [3,] 1 3 13 23
   [4,] 1 4 14 24
   [5,] 1 5 15 25
   [6,] 1 6 16 26
   [7,] 1 7 17 27
   [8,] 1 8 18 28
   [9,] 1 9 19 29
   [10,] 1 10 20 30
 >
 In this section we will discuss about the script I made to perform Boostrap on general linear regression model. Similar to previous section, the script is separated part by part and the explanation of each part of the script will follow after that. Later, you can see some example application on how to use this script using R. The script itself can be downloaded here.
The Bootstrap's script for general linear regression case is as follows:
I multiple.reg.boot<-function(x,y,p,b,k,alpha) { II n<-length(y) int1<-t(x) int2<-int1%*%x int3<-solve(int2) int4<-int1%*%y koef.beta<-int3%*%int4 index.beta<-matrix(0:(p-1)) out<-cbind(index.beta,koef.beta) cat("Least Square Estimator","\n") print(out)cat("\n") III cat("Standard Error for Beta","\n") int5<-t(y) int6<-t(koef.beta) int7<-int5%*%y int8<-int6%*%int1 int9<-int8%*%y sse<-int7-int9 mse<-sse/(n-p) msedata<-mse[1] var.all<-msedata*int3 var.koef.beta<-diag(var.all) se.koef.beta<-sqrt(var.koef.beta) print(se.koef.beta) cat("\n")IV cat(((1-(alpha))*100),"% Confidence Interval for Beta","\n") low<-koef.beta-qt((1-(alpha/2)),(n-p))*se.koef.beta up<-koef.beta+qt((1-(alpha/2)),(n-p))*se.koef.beta cat("Lower Bound","\n") print(low) cat("\n") cat("Upper Bound","\n") print(up) cat("\n")V matcoef<-matrix(0:(p-1)) beta.boot=NULL for(i in 1:b) { VI v<-sample(1:n,k,replace=TRUE) ystarpre<-y[v] xstarpre<-x[v,1:p] ystar<-data.matrix(ystarpre) xstar<-data.matrix(xstarpre) inta<-t(xstar) intb<-inta%*%xstar detb<-det(intb)VII if(detb==0) { repeat { v<-sample(1:n,k,replace=TRUE) ystarpre<-y[v] xstarpre<-x[v,1:p] ystar<-data.matrix(ystarpre) xstar<-data.matrix(xstarpre) inta<-t(xstar) intb<-inta%*%xstar detb<-det(intb) if(detb!=0) next } }VIII intc<-solve(intb) intd<-inta%*%ystar koef.beta.boot<-intc%*%intd { beta.boot=koef.beta.boot } matcoef<-cbind(matcoef,beta.boot) }IX matcoef1<-matcoef[,2:(b+1)] mean.beta.boot<-apply(matcoef1,1,mean) bias.beta.boot<-mean.beta.boot-koef.beta var.beta.boot<-apply(matcoef1,1,var) se.beta.boot<-sqrt(var.beta.boot)loboundpre<-matrix(0:(p-1)) upboundpre<-matrix(0:(p-1)) for (t in 1:p) { matquant<-matcoef1[t,] loboundquant<-quantile(matquant,(alpha/2)) upboundquant<-quantile(matquant,(1-(alpha/2)))loboundpre<-rbind(loboundpre,loboundquant) upboundpre<-rbind(upboundpre,upboundquant) }lobound0<-loboundpre[(p+1):(2*p),] upbound0<-upboundpre[(p+1):(2*p),]lobound<-matrix(0:(p-1)) lobound<-cbind(lobound,lobound0)upbound<-matrix(0:(p-1)) upbound<-cbind(upbound,upbound0)X cat("Bootstrap Correlation Model","\n") cat("bias beta","\n") print(bias.beta.boot) cat("\n") cat("standard error beta","\n") print(se.beta.boot) cat("\n") cat((1-alpha)*100,"% Confidence Interval for beta","\n") cat("lower bound =","\n") print(lobound) cat("upper bound =","\n") print(upbound) cat("\n")}
Next I will explain the script above part by part
  
   Part I
  
  
   This program named "
   
    
     multiple.reg.boot
    
   
   " and the inputs are as follows:
   
    x : matrix of independent variable
    
     y : matrix of dependent variable
     
      p : number column of x
      
       b : number of Bootstrap's replications
       
        k : length of Bootstrap sample
        
         alpha : significance level
 
  
   Part II
  
  
   This part is calculating regression's estimator using least square method.
  
 
  
   Part III
  
  
   This part calculates the standard error for estimator of regression.
  
 
  
   Part IV
  
  
   This part calculates the confidence interval for estimator of regression.
  
 
  
   Part V
  
  
   This part is the beginning of Bootstrapping. First, we define matcoef and Bootstrap estimator for regression, beta.boot. Those statistics will be bootstrapped.
  
 
  
   Part VI
  
  
   This part obtains Bootstrap sample.
  
 
  
   Part VII
  
  
   This part anticipates if the determinant of is equal to 0, so that the inverse is not defined. In this part, syntax at part VI is repeated until it finds such Bootstrap sample that the determinant of is not equal to 0.
  
 
  
   Part VIII
  
  
   This part is the finishing of Bootstrapping data. The result is stored at a matrix named matcoef.
  
 
  
   Part IX
  
  
   Remember in part V, matcoef defined as certain matrix and then it is added by Bootstrap estimator of regression. For the next computation, we don't need the first column in matcoef. So, matcoef1 is defined as matcoef without the first column. The bias, standard error and confidence interval is computed here.
  
 
  
   Part X
  
  
   This part sets the output of bootstrapping data.
  
 
  
   Example:
  
  
   To see how the script works, we use the data set about "Zarthan Company" taken from Neter (1990).
  
 
Dataset is divided into independent variable and dependent variable. For this case, "Target Population" and "Per Capita Discretionary Income" are the independent variable, and "Sales" is the dependent variable.
> y [1] 162 120 223 131 67 169 81 192 116 55 252 232 144 103 212 > x x1 x2 [1,] 1 274 2450 [2,] 1 180 3254 [3,] 1 375 3802 [4,] 1 205 2838 [5,] 1 86 2347 [6,] 1 265 3782 [7,] 1 98 3008 [8,] 1 330 2450 [9,] 1 195 2137 [10,] 1 53 2560 [11,] 1 430 4020 [12,] 1 372 4427 [13,] 1 236 2660 [14,] 1 157 2088 [15,] 1 370 2605 >
Suppose you have save the script in directory name " D:/MyDirectory ", then you can load this script into R with the following command
  
   
    > source("D:/MyDirectory/multipleregboot.R")
   
   
   
  
 
Alternatively you can use menu File > Source R code... and pointing to the script file.
To use the data set above for Boostrap general linear regression model, you type
> multiple.reg.boot(x,y,3,500,15,0.1)
The command above will generate 15 Bootstrap samples, each sample has 500 replication from the orginal data set that has independent variable name x and dependent variable name y with 10% significant level. The independent variables contain 3 columns. The result of this computation is as follows:
Least Square Estimator 
   [,1] [,2]
   0 3.45261279
   x1 1 0.49600498
   x2 2 0.00919908
         
 Standard Error for Beta x1 x2 2.4306504935 0.0060544412 0.0009681139
90 % Confidence Interval for Beta 
           Lower Bound 
           
   [,1]
   -0.879505337
   x1 0.485214221
   x2 0.007473624
 Upper Bound 
             
   [,1]
   7.78473092
   x1 0.50679573
   x2 0.01092454
 Bootstrap Correlation Model 
           bias beta 
           
   [,1]
   -8.032102e-02
   x1 -3.723742e-04
   x2 1.598148e-05
 standard error beta x1 x2 3.026618419 0.005882961 0.001164328
90 % Confidence Interval for beta 
           lower bound = 
   lobound0
   loboundquant 0 -1.78008148
   loboundquant 1 0.48541920
   loboundquant 2 0.00719702
 upper bound = upbound0 upboundquant 0 8.76600905 upboundquant 1 0.50505110 upboundquant 2 0.01119679
The output above shows that Bootstrap also did a good job in estimating the property of general linear regression's estimators. The result from Bootstrap method is not differ very much from the classical method. Table below show some comparison between Bootstrap method and Classical Least square method.
   
 
  
   REFERENCES
  
  
  
  
   Dalgaard, P. (2002). Introductory Statistics with R. New York: Springer.
  
 
Efron, B. and Tibishirani, R. J. (1993). An Introduction to the Bootstrap. London�: Chapman & Hall.
Greene, W. H. (2000). Econometric Analysis. New York: Macmillan Publishing Company.
Neter, J., and friends. (1990). Applied Linear Statistical Models: Regression, Analysis of Variance, and Experimental Designs, Third Edition. Boston: Irwin.
  R Development Core Team (2006). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.
  
  
 
This tutorial is copyrighted .
Preferable reference for this tutorial is
Filiola, L., (2006) Bootstrap Computation using R, http://people.revoledu.com/kardi/tutorial/Bootstrap/Lyra/index.html
