
## Bootstrap Statistics for Generalized Linear Regression

In this section we will perform the Bootstrap for the general linear regression model (also called the multiple linear regression model). You can input as many independent variables as you want. Note that the independent variable must be a matrix whose first column is filled with "1" (the intercept column).

For example:

```
> x1 <- 1:10
> x2 <- 11:20
> x3 <- 21:30
> x <- matrix(1, 10)
> x <- cbind(x, x1, x2, x3)
> x
         x1 x2 x3
 [1,] 1  1 11 21
 [2,] 1  2 12 22
 [3,] 1  3 13 23
 [4,] 1  4 14 24
 [5,] 1  5 15 25
 [6,] 1  6 16 26
 [7,] 1  7 17 27
 [8,] 1  8 18 28
 [9,] 1  9 19 29
[10,] 1 10 20 30
>
```

In this section we discuss the script I made to perform the Bootstrap on the general linear regression model. As in the previous section, the script is presented part by part, and an explanation of each part follows it. Later, you will see an example of how to use this script in R. The script itself can be downloaded here.

The Bootstrap script for the general linear regression case is as follows:

```
## Part I: function name and inputs
multiple.reg.boot<-function(x,y,p,b,k,alpha)
{
    ## Part II: least square estimator of beta
    n<-length(y)
    int1<-t(x)
    int2<-int1%*%x
    int3<-solve(int2)
    int4<-int1%*%y
    koef.beta<-int3%*%int4
    index.beta<-matrix(0:(p-1))
    out<-cbind(index.beta,koef.beta)
    cat("Least Square Estimator","\n")
    print(out)
    cat("\n")

    ## Part III: standard error of the estimators
    cat("Standard Error for Beta","\n")
    int5<-t(y)
    int6<-t(koef.beta)
    int7<-int5%*%y
    int8<-int6%*%int1
    int9<-int8%*%y
    sse<-int7-int9
    mse<-sse/(n-p)
    msedata<-as.numeric(mse)   # drop the 1x1 matrix so the product below conforms
    var.all<-msedata*int3
    var.koef.beta<-diag(var.all)
    se.koef.beta<-sqrt(var.koef.beta)
    print(se.koef.beta)
    cat("\n")

    ## Part IV: classical t confidence interval
    cat(((1-alpha)*100),"% Confidence Interval for Beta","\n")
    low<-koef.beta-qt((1-(alpha/2)),(n-p))*se.koef.beta
    up<-koef.beta+qt((1-(alpha/2)),(n-p))*se.koef.beta
    cat("Lower Bound","\n")
    print(low)
    cat("\n")
    cat("Upper Bound","\n")
    print(up)
    cat("\n")

    ## Part V: start of the Bootstrap
    matcoef<-matrix(0:(p-1))
    beta.boot<-NULL
    for(i in 1:b)
    {
        ## Part VI: draw a Bootstrap sample
        v<-sample(1:n,k,replace=TRUE)
        ystarpre<-y[v]
        xstarpre<-x[v,1:p]
        ystar<-data.matrix(ystarpre)
        xstar<-data.matrix(xstarpre)
        inta<-t(xstar)
        intb<-inta%*%xstar
        detb<-det(intb)

        ## Part VII: resample until t(xstar)%*%xstar is invertible
        if(detb==0)
        {
            repeat
            {
                v<-sample(1:n,k,replace=TRUE)
                ystarpre<-y[v]
                xstarpre<-x[v,1:p]
                ystar<-data.matrix(ystarpre)
                xstar<-data.matrix(xstarpre)
                inta<-t(xstar)
                intb<-inta%*%xstar
                detb<-det(intb)
                if(detb!=0) break   # `break` (not `next`) exits the repeat loop
            }
        }

        ## Part VIII: Bootstrap estimator, stored column by column in matcoef
        intc<-solve(intb)
        intd<-inta%*%ystar
        koef.beta.boot<-intc%*%intd
        beta.boot<-koef.beta.boot
        matcoef<-cbind(matcoef,beta.boot)
    }

    ## Part IX: bias, standard error and percentile confidence interval
    matcoef1<-matcoef[,2:(b+1)]
    mean.beta.boot<-apply(matcoef1,1,mean)
    bias.beta.boot<-mean.beta.boot-koef.beta
    var.beta.boot<-apply(matcoef1,1,var)
    se.beta.boot<-sqrt(var.beta.boot)
    loboundpre<-matrix(0:(p-1))
    upboundpre<-matrix(0:(p-1))
    for(t in 1:p)
    {
        matquant<-matcoef1[t,]
        loboundquant<-quantile(matquant,(alpha/2))
        upboundquant<-quantile(matquant,(1-(alpha/2)))
        loboundpre<-rbind(loboundpre,loboundquant)
        upboundpre<-rbind(upboundpre,upboundquant)
    }
    lobound0<-loboundpre[(p+1):(2*p),]
    upbound0<-upboundpre[(p+1):(2*p),]
    lobound<-cbind(matrix(0:(p-1)),lobound0)
    upbound<-cbind(matrix(0:(p-1)),upbound0)

    ## Part X: print the Bootstrap output
    cat("Bootstrap Correlation Model","\n")
    cat("bias beta","\n")
    print(bias.beta.boot)
    cat("\n")
    cat("standard error beta","\n")
    print(se.beta.boot)
    cat("\n")
    cat((1-alpha)*100,"% Confidence Interval for beta","\n")
    cat("lower bound =","\n")
    print(lobound)
    cat("upper bound =","\n")
    print(upbound)
    cat("\n")
}
```

Next I will explain the script above part by part.

Part I
This function is named "multiple.reg.boot" and its inputs are as follows:
x : matrix of independent variables
y : vector of the dependent variable
p : number of columns of x
b : number of Bootstrap replications
k : length of each Bootstrap sample
alpha : significance level

Part II
This part calculates the regression coefficient estimators using the least squares method.
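In matrix form, Part II computes beta-hat = (X'X)^(-1) X'y. A minimal sketch (with simulated data, not the Zarthan example) checks this against R's built-in lm():

```r
## Sketch of Part II: the least-squares estimator beta = (X'X)^(-1) X'y.
## The data here are simulated for illustration only.
set.seed(1)
x1 <- 1:10
x  <- cbind(1, x1)                       # design matrix, first column of 1s
y  <- 2 + 3 * x1 + rnorm(10, sd = 0.1)   # simulated response
koef.beta <- solve(t(x) %*% x) %*% (t(x) %*% y)
## lm() fitted without its own intercept (x already has the "1" column)
## returns the same coefficients:
print(cbind(koef.beta, coef(lm(y ~ x - 1))))
```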

Part III
This part calculates the standard errors of the regression estimators.
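Part III uses SE(beta_j) = sqrt( MSE * [(X'X)^(-1)]_jj ) with MSE = SSE/(n - p). A small sketch on simulated data, compared against summary(lm()):

```r
## Sketch of Part III: SE(beta_j) = sqrt( MSE * [(X'X)^(-1)]_jj ),
## where MSE = SSE / (n - p). Simulated data for illustration only.
set.seed(2)
n <- 10; p <- 2
x1 <- 1:n
x  <- cbind(1, x1)
y  <- 2 + 3 * x1 + rnorm(n)
koef.beta <- solve(t(x) %*% x) %*% (t(x) %*% y)
sse <- as.numeric(t(y) %*% y - t(koef.beta) %*% t(x) %*% y)  # y'y - b'X'y
mse <- sse / (n - p)
se.koef.beta <- sqrt(mse * diag(solve(t(x) %*% x)))
## summary(lm()) reports the same standard errors:
print(rbind(se.koef.beta,
            summary(lm(y ~ x - 1))$coefficients[, "Std. Error"]))
```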

Part IV
This part calculates the confidence intervals for the regression estimators.
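The interval in Part IV is the classical t interval, beta_j ± t(1 - alpha/2, n - p) · SE(beta_j). A sketch on simulated data, checked against confint():

```r
## Sketch of Part IV: the classical t confidence interval
## beta_j +/- t(1 - alpha/2, n - p) * SE(beta_j). Simulated data only.
set.seed(3)
n <- 10; p <- 2; alpha <- 0.1
x1 <- 1:n
x  <- cbind(1, x1)
y  <- 2 + 3 * x1 + rnorm(n)
fit  <- lm(y ~ x - 1)
beta <- coef(fit)
se   <- summary(fit)$coefficients[, "Std. Error"]
low  <- beta - qt(1 - alpha / 2, n - p) * se
up   <- beta + qt(1 - alpha / 2, n - p) * se
## confint() computes the same 90% interval:
print(cbind(low, up, confint(fit, level = 1 - alpha)))
```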

Part V
This part is the beginning of the Bootstrapping. First, we define matcoef and the Bootstrap estimator of the regression coefficients, beta.boot. These statistics will be bootstrapped.

Part VI
This part draws a Bootstrap sample.
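The key step is resampling row indices with replacement, so the same (x, y) pair can appear more than once in a Bootstrap sample. A minimal sketch, with illustrative data:

```r
## Sketch of Part VI: drawing one bootstrap sample of size k
## by resampling row indices with replacement. Illustrative data only.
set.seed(4)
n <- 10; k <- 10; p <- 2
x <- cbind(1, 1:n)
y <- rnorm(n)
v <- sample(1:n, k, replace = TRUE)   # indices, possibly repeated
ystar <- data.matrix(y[v])            # resampled response
xstar <- data.matrix(x[v, 1:p])       # matching resampled design rows
print(v)
```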

Part VII
This part handles the case where the determinant of t(xstar) %*% xstar (the cross-product matrix of the bootstrapped design matrix) is equal to 0, so that its inverse is not defined. In that case, the sampling in Part VI is repeated until a Bootstrap sample is found whose cross-product matrix has a nonzero determinant.
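In isolation, the resample-until-invertible loop looks like the sketch below. Note that in R, `break` exits a repeat loop while `next` restarts it, so the exit condition must use `break`:

```r
## Sketch of Part VII: keep resampling until t(xstar) %*% xstar is
## invertible (nonzero determinant). The exit uses `break`; `next`
## would restart the loop and never leave it. Illustrative data only.
set.seed(5)
n <- 10; k <- 5; p <- 2
x <- cbind(1, 1:n)
repeat
{
    v <- sample(1:n, k, replace = TRUE)
    xstar <- data.matrix(x[v, 1:p])
    detb <- det(t(xstar) %*% xstar)
    if (detb != 0) break   # a usable, invertible bootstrap sample
}
print(detb)
```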

Part VIII
This part finishes the Bootstrapping of the data. The results are stored in a matrix named matcoef.

Part IX
Recall that in Part V matcoef was defined as a one-column matrix, and the Bootstrap regression estimates were then appended to it column by column. For the next computation we do not need that first column, so matcoef1 is defined as matcoef without the first column. The bias, standard error, and confidence interval are computed here.
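The summaries in Part IX are all row-wise operations on the p × b matrix of Bootstrap coefficients: means and variances via apply(), and a percentile interval from the alpha/2 and 1 - alpha/2 quantiles of each row. A sketch on a simulated coefficient matrix:

```r
## Sketch of Part IX: bias, standard error, and percentile interval
## from a p x b matrix of bootstrap coefficients (one column per
## replication). The matrix here is simulated for illustration only.
set.seed(6)
p <- 3; b <- 500; alpha <- 0.1
matcoef1 <- matrix(rnorm(p * b, mean = c(3.45, 0.496, 0.009)), nrow = p)
mean.beta.boot <- apply(matcoef1, 1, mean)        # row-wise means
se.beta.boot   <- sqrt(apply(matcoef1, 1, var))   # row-wise SDs
## percentile interval: alpha/2 and 1 - alpha/2 quantiles per row
lobound <- apply(matcoef1, 1, quantile, probs = alpha / 2)
upbound <- apply(matcoef1, 1, quantile, probs = 1 - alpha / 2)
print(cbind(lobound, upbound))
```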

Part X
This part prints the output of the Bootstrapping.

Example:
To see how the script works, we use the data set about the "Zarthan Company" taken from Neter (1990).

The data set is divided into independent and dependent variables. In this case, "Target Population" and "Per Capita Discretionary Income" are the independent variables, and "Sales" is the dependent variable.

```
> y
 [1] 162 120 223 131  67 169  81 192 116  55 252 232 144 103 212
> x
         x1   x2
 [1,] 1 274 2450
 [2,] 1 180 3254
 [3,] 1 375 3802
 [4,] 1 205 2838
 [5,] 1  86 2347
 [6,] 1 265 3782
 [7,] 1  98 3008
 [8,] 1 330 2450
 [9,] 1 195 2137
[10,] 1  53 2560
[11,] 1 430 4020
[12,] 1 372 4427
[13,] 1 236 2660
[14,] 1 157 2088
[15,] 1 370 2605
>
```
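If you want to reproduce the example, the listing above can be entered directly in R as follows (the values are transcribed from the printout; the variable names x1 and x2 match the column labels):

```r
## The Zarthan Company data from Neter (1990), as listed above.
y  <- c(162, 120, 223, 131, 67, 169, 81, 192, 116, 55,
        252, 232, 144, 103, 212)                          # Sales
x1 <- c(274, 180, 375, 205, 86, 265, 98, 330, 195, 53,
        430, 372, 236, 157, 370)                          # Target Population
x2 <- c(2450, 3254, 3802, 2838, 2347, 3782, 3008, 2450,
        2137, 2560, 4020, 4427, 2660, 2088, 2605)         # Per Capita Income
x  <- cbind(matrix(1, 15), x1, x2)   # first column of 1s for the intercept
```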

Suppose you have saved the script in a directory named "D:/MyDirectory"; then you can load this script into R with the following command

> source("D:/MyDirectory/multipleregboot.R")

Alternatively, you can use the menu File > Source R code... and point to the script file.

To apply the Bootstrap general linear regression model to the data set above, you type

> multiple.reg.boot(x,y,3,500,15,0.1)

The command above generates 500 Bootstrap replications, each a sample of size 15 drawn from the original data set with independent variables x and dependent variable y, at a 10% significance level. The matrix of independent variables contains 3 columns. The result of this computation is as follows:

```
Least Square Estimator
   [,1]       [,2]
      0 3.45261279
x1    1 0.49600498
x2    2 0.00919908

Standard Error for Beta
                       x1           x2
2.4306504935 0.0060544412 0.0009681139

90 % Confidence Interval for Beta
Lower Bound
           [,1]
   -0.879505337
x1  0.485214221
x2  0.007473624

Upper Bound
         [,1]
   7.78473092
x1 0.50679573
x2 0.01092454

Bootstrap Correlation Model
bias beta
            [,1]
   -8.032102e-02
x1 -3.723742e-04
x2  1.598148e-05

standard error beta
                      x1          x2
3.026618419 0.005882961 0.001164328

90 % Confidence Interval for beta
lower bound =
                  lobound0
loboundquant 0 -1.78008148
loboundquant 1  0.48541920
loboundquant 2  0.00719702

upper bound =
                 upbound0
upboundquant 0 8.76600905
upboundquant 1 0.50505110
upboundquant 2 0.01119679
```

The output above shows that the Bootstrap also does a good job of estimating the properties of the general linear regression estimators: the results from the Bootstrap method do not differ very much from those of the classical method. The table below shows a comparison between the Bootstrap method and the classical least squares method.

REFERENCES
Dalgaard, P. (2002). Introductory Statistics with R. New York: Springer.