FSRM 16:958:587 Advanced Simulation Methods for Finance
(Lecture 4) · 2016-02-17
Min-ge Xie
Department of Statistics & Biostatistics, Rutgers University
Bootstrap – a Simulation & Resampling Method
General statements (an oversimplified introduction; some key words)
The bootstrap method in statistics is a resampling (simulation) approach for making statistical inference about unknown parameters of the underlying population from which the observed sample data are generated.
Bootstrap samples are simulated "phantom" samples based on the observed sample data; bootstrap distributions are derived from the bootstrap samples, and they can be used to make statistical inference.
Although it is a simulation method, it is a little different from the simulation techniques that we have learned before.
Motivation/background
The primary task of a statistician is to summarize a sample-based study and generalize the findings to the parent (underlying) population in a scientific manner.
The summary (often a sample statistic such as the mean, median, or correlation) will fluctuate from sample to sample.
We would like to know the magnitude of these fluctuations to get an overall picture. This fluctuation can often be described by a probability distribution called the sampling distribution.
Motivation/background (continued). Suppose we make few assumptions about (do not know much about) the underlying population:
Ideally, if we could repeatedly draw samples from the target distribution, we would have many copies of the sample statistic, one from each sample ⇒ these many copies of the sample statistic would give us a good idea of the fluctuation and of the sampling distribution.
But in reality we have only one set (copy) of observed data (sample).
The idea behind the bootstrap is to use the observed sample data as a "surrogate population" for the purpose of approximating the sampling distribution of a statistic.
Specifically,
– We resample with replacement from the sample data at hand and create a large number of "phantom samples", known as bootstrap samples.
– These bootstrap samples can be used to quantify the fluctuation (i.e., to "make inference") of a "population parameter" of the "surrogate population".
– Under some conditions, the "phantom" inference is the same as (or can help derive) the real inference that we are looking for.
This leads us to the "bootstrap method"!
Read the companion note (an introductory review article) by Singh and Xie (2010) in the International Encyclopedia of Education
(http://stat.rutgers.edu/~mxie/RCPapers/bootstrap.pdf) or (http://www.sciencedirect.com/science/referenceworks/9780080448947).
Setting/Setup: We have a sample data set $x_1, \dots, x_n \overset{\text{i.i.d.}}{\sim} F(x)$. Also, let $\theta$ be a population characteristic of the distribution $F$, and let $\hat\theta$ be an estimator of $\theta$, where $\hat\theta = \hat\theta(x_1, \dots, x_n)$ is a function of the sample $\mathbf{x} = (x_1, \dots, x_n)$.
For example, $\theta$ is the mean of the distribution $F$ (the population mean) and $\hat\theta = \bar{X}$.
Goal: We need to make inference about $\theta$. Besides the point estimator $\hat\theta$, we would like to know the sampling distribution of $\hat\theta = \hat\theta(x_1, \dots, x_n)$; in particular, we want to find a confidence interval for $\theta$, etc.
Bootstrap (generate) a set of new ("phantom") data
From the observed sample data set $\{x_1, x_2, \dots, x_n\}$, resample with replacement to get a new data set of size $n$:
– Randomly pick a data point from $\{x_1, x_2, \dots, x_n\}$ (each point with probability $1/n$) and set it to be $x_1^*$; repeat exactly the same random pick $n - 1$ more times to get $x_2^*, x_3^*, \dots, x_n^*$.
This new set of data $\{x_1^*, \dots, x_n^*\}$ is called a bootstrap sample.
To make statistical inference, we repeat this bootstrap sampling process a large number of times (say $N$) to get $N$ bootstrap samples.
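The resampling step can be sketched in R, the language used for the lecture's own code. This is a minimal sketch: `draw_bootstrap_sample` is a hypothetical helper name, and the five data values are simply the first five BOD measurements from the body-fat example later in the lecture, used for illustration.

```r
# Hypothetical helper: draws one bootstrap sample {x1*, ..., xn*} from x.
# Each of the n positions is filled independently; every observation
# has probability 1/n of being picked at each position.
draw_bootstrap_sample <- function(x) {
  # index-based sampling avoids sample()'s special case for a length-1 numeric x
  x[sample.int(length(x), size = length(x), replace = TRUE)]
}

set.seed(1)                        # fix the seed so the draw is reproducible
x <- c(2.5, 4.0, 4.1, 6.2, 7.1)    # illustrative data (first five BOD values)
x_star <- draw_bootstrap_sample(x)
print(x_star)                      # same size as x; repeated values are allowed
```

Calling the helper $N$ times (with a statistic applied to each draw) gives the $N$ bootstrap samples described above.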
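A bootstrap sampling algorithm to get a CI for $\theta$:
1. Generate $N$ bootstrap data sets, each of size $n$, and compute the corresponding bootstrap estimator $\hat\theta^*$:
$$
\begin{aligned}
\text{1st:}&\quad \{x_1^*, \dots, x_n^*\}_{[1]} \sim \{x_1, \dots, x_n\}, &\quad \hat\theta_1^* &= \hat\theta\left(\{x_1^*, \dots, x_n^*\}_{[1]}\right)\\
\text{2nd:}&\quad \{x_1^*, \dots, x_n^*\}_{[2]} \sim \{x_1, \dots, x_n\}, &\quad \hat\theta_2^* &= \hat\theta\left(\{x_1^*, \dots, x_n^*\}_{[2]}\right)\\
&\quad \vdots\\
N\text{th:}&\quad \{x_1^*, \dots, x_n^*\}_{[N]} \sim \{x_1, \dots, x_n\}, &\quad \hat\theta_N^* &= \hat\theta\left(\{x_1^*, \dots, x_n^*\}_{[N]}\right)
\end{aligned}
$$
2. Sort $\{\hat\theta_1^*, \hat\theta_2^*, \dots, \hat\theta_N^*\}$ from the smallest to the largest. Now we have $\hat\theta_{(1)}^* \le \hat\theta_{(2)}^* \le \cdots \le \hat\theta_{(N)}^*$; a histogram can be constructed.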
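The two steps of the algorithm can be sketched as a generic R function. This is a plain-loop sketch, not the `boot` package: `boot_replicates` is a hypothetical name, `stat` stands for any function computing $\hat\theta$ from a sample (the sample mean here, matching the example $\hat\theta = \bar X$), and the exponential data are purely illustrative.

```r
# Hypothetical helper: returns N bootstrap replicates theta*_1, ..., theta*_N
# of stat(x), each computed on a resample of size n drawn with replacement.
boot_replicates <- function(x, stat, N = 1000) {
  n <- length(x)
  vapply(seq_len(N), function(k) {
    idx <- sample.int(n, size = n, replace = TRUE)  # k-th bootstrap sample
    stat(x[idx])                                    # theta*_k
  }, numeric(1))
}

set.seed(2016)
x <- rexp(30)                                   # illustrative data only
theta_star <- boot_replicates(x, mean, N = 1000)
theta_sorted <- sort(theta_star)                # theta*_(1) <= ... <= theta*_(N)
# hist(theta_star)  # would draw the histogram of the replicates
```

Any statistic returning a single number (median, standard deviation, a correlation computed on resampled rows) can be passed as `stat`.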
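Claim (symmetric case): A $(1-\alpha)100\%$ confidence interval for the parameter $\theta$ is simply
$$\left[\hat\theta_{(L)}^*,\ \hat\theta_{(U)}^*\right], \quad \text{where } L = \frac{\alpha N}{2} \text{ and } U = \left(1 - \frac{\alpha}{2}\right)N, \text{ for } 0 < \alpha < 1.$$
For example, a 95% C.I. for the parameter $\theta$ is $\left[\hat\theta_{(25)}^*,\ \hat\theta_{(975)}^*\right]$ if $N = 1000$ (note: $L = 1000 \times 0.025 = 25$, $U = 1000 \times 0.975 = 975$). An equivalent way to write the above confidence interval is simply
$$\left[\hat\theta_{\alpha/2}^*,\ \hat\theta_{1-\alpha/2}^*\right],$$
where $\hat\theta_p^*$ denotes the $p$-th quantile of the bootstrap estimates.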
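A minimal R sketch of the symmetric-case (percentile) interval, assuming the replicates are already available. `percentile_ci` is a hypothetical helper; `round()` turns $\alpha N/2$ and $(1-\alpha/2)N$ into order-statistic indices (25 and 975 when $N = 1000$ and $\alpha = 0.05$), and the normal data are illustrative only.

```r
# Hypothetical helper: symmetric-case (percentile) bootstrap CI
# [theta*_(L), theta*_(U)] with L = alpha*N/2 and U = (1 - alpha/2)*N.
percentile_ci <- function(theta_star, alpha = 0.05) {
  N <- length(theta_star)
  s <- sort(theta_star)
  L <- max(1, round(alpha * N / 2))   # 25 when N = 1000, alpha = 0.05
  U <- round((1 - alpha / 2) * N)     # 975 when N = 1000, alpha = 0.05
  c(lower = s[L], upper = s[U])
}

set.seed(42)
x <- rnorm(40, mean = 5)                         # illustrative data only
theta_star <- replicate(1000, mean(sample(x, replace = TRUE)))
ci <- percentile_ci(theta_star, alpha = 0.05)
print(ci)
```

`quantile(theta_star, c(0.025, 0.975))` gives nearly the same endpoints; it interpolates between order statistics instead of picking one.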
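Claim (asymmetric case): A $(1-\alpha)100\%$ confidence interval for the parameter $\theta$ is
$$\left[2\hat\theta - \hat\theta_{(U)}^*,\ 2\hat\theta - \hat\theta_{(L)}^*\right], \quad \text{where } L = \frac{\alpha N}{2} \text{ and } U = \left(1 - \frac{\alpha}{2}\right)N, \text{ for } 0 < \alpha < 1.$$
For example, a 95% C.I. for the parameter $\theta$ is just $\left[2\hat\theta - \hat\theta_{(975)}^*,\ 2\hat\theta - \hat\theta_{(25)}^*\right]$ if $N = 1000$.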
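The asymmetric-case interval needs $\hat\theta$ from the original data as well as the sorted replicates. A minimal R sketch with a hypothetical helper `basic_ci`, using illustrative skewed data and the sample median as $\hat\theta$:

```r
# Hypothetical helper: asymmetric-case bootstrap CI
# [2*theta_hat - theta*_(U), 2*theta_hat - theta*_(L)].
basic_ci <- function(theta_hat, theta_star, alpha = 0.05) {
  N <- length(theta_star)
  s <- sort(theta_star)
  L <- max(1, round(alpha * N / 2))
  U <- round((1 - alpha / 2) * N)
  c(lower = 2 * theta_hat - s[U], upper = 2 * theta_hat - s[L])
}

set.seed(7)
x <- rexp(25)                                    # skewed illustrative data
theta_hat <- median(x)
theta_star <- replicate(1000, median(sample(x, replace = TRUE)))
ci_basic <- basic_ci(theta_hat, theta_star)
print(ci_basic)
```

In the `boot` package this construction appears to correspond to the `"basic"` type of `boot.ci()`; the helper above is only a hand-rolled illustration.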
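Example (cf. Example 2 of the companion note by Singh and Xie (2010)):
Data: Two types of measurements to assess body fat in $n = 20$ collegiate football players

| Player | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BOD | 2.5 | 4.0 | 4.1 | 6.2 | 7.1 | 7.0 | 8.3 | 9.2 | 9.3 | 12.0 | 12.2 | 12.6 | 14.2 | 14.4 | 15.1 | 15.2 | 16.3 | 17.1 | 17.9 | 17.9 |
| HW | 8.0 | 6.2 | 9.2 | 6.4 | 8.6 | 12.2 | 7.2 | 12.0 | 14.9 | 12.1 | 15.3 | 14.8 | 14.3 | 16.3 | 17.9 | 19.5 | 17.5 | 14.3 | 18.3 | 16.2 |

– BOD is BOD POD, a whole-body air-displacement plethysmograph.
– HW refers to hydrostatic weighing.
Question: Study the correlation between the BOD and HW measurements (find a confidence interval for it).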
```r
## R program: bootstrap algorithm for the correlation parameter
## Example 2 of Singh and Xie (2010)
# Data
BOD = c(2.5, 4.0, 4.1, 6.2, 7.1, 7.0, 8.3, 9.2, 9.3, 12.0,
        12.2, 12.6, 14.2, 14.4, 15.1, 15.2, 16.3, 17.1, 17.9, 17.9)
HW = c(8.0, 6.2, 9.2, 6.4, 8.6, 12.2, 7.2, 12.0, 14.9, 12.1,
       15.3, 14.8, 14.3, 16.3, 17.9, 19.5, 17.5, 14.3, 18.3, 16.2)
data.ex2 = cbind(BOD, HW)
## Boxplots of the data and the scatter plot
par(mfrow = c(1, 2))
boxplot(data.ex2); plot(BOD, HW)
```
[Figure: left, side-by-side boxplots of BOD and HW; right, scatter plot of HW (vertical axis) against BOD (horizontal axis).]
A generic bootstrap algorithm for this example
Step 1: At each iteration $k = 1, 2, \dots, N$ ($N = 1000$), generate a bootstrap data set of $n = 20$ pairs by the following procedure:
1. For $i = 1, \dots, 20$, randomly sample a pair $(x_i^*, y_i^*)$ from the 20 observed data pairs $\{(2.5, 8.0), (4.0, 6.2), \dots, (17.9, 16.2)\}$ (sampling with replacement). These 20 new pairs form a bootstrap sample $(\mathbf{x}^*, \mathbf{y}^*) = \{(x_1^*, y_1^*), (x_2^*, y_2^*), \dots, (x_{20}^*, y_{20}^*)\}$.
2. Compute the bootstrap sample correlation coefficient $\hat\rho^* = \operatorname{corr}(\mathbf{x}^*, \mathbf{y}^*)$.
Step 2: Produce a histogram of the $N = 1000$ values of $\hat\rho^*$ and also sort them. The histogram (shown below) suggests that the bootstrap distribution is skewed, so the 95% confidence interval for $\rho$ is $\left[2\hat\rho - \hat\rho_{(975)}^*,\ 2\hat\rho - \hat\rho_{(25)}^*\right]$, where $\hat\rho$ is the sample correlation of the original data.
```r
## R program: bootstrap algorithm for the correlation parameter
## Example 2 of Singh and Xie (2010)
# Bootstrapping and calculation of the bootstrap correlation coefficients
corr.b = matrix(0, 1000)   # 1000 x 1 matrix to hold the replicates
for (i in 1:1000) {
  # generate a new bootstrap sample by resampling rows (pairs) with replacement
  indx = sample(1:nrow(data.ex2), replace = TRUE)
  data.bt = data.ex2[indx, ]
  # calculate the correlation coefficient
  corr.b[i] = cor(data.bt[, 1], data.bt[, 2])
}
```
```r
## Estimate of corr using the original data
cor(data.ex2[, 1], data.ex2[, 2]); cor(BOD, HW)
## [1] 0.8678753
## [1] 0.8678753
# Histogram and bootstrap 95% CI
hist(corr.b); summary(corr.b)
##  Min.   :0.6495
##  1st Qu.:0.8434
##  Median :0.8736
##  Mean   :0.8667
##  3rd Qu.:0.8966
##  Max.   :0.9584
corr.b.srt = sort(corr.b)
CI.95 = c(2 * cor(BOD, HW) - corr.b.srt[975],
          2 * cor(BOD, HW) - corr.b.srt[25]); CI.95
## [1] 0.7998790 0.9692848
```
[Figure: Histogram of corr.b; horizontal axis corr.b (about 0.6 to 1.0), vertical axis Frequency.]
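Bootstrap Central Limit Theory (Singh, 1981):
Theorem: Under some mild conditions, when $n$ is large ($n \to \infty$), we have
$$\left(\hat\theta^* - \hat\theta\right) \,\big|\, \hat\theta \;\sim\; \left(\hat\theta - \theta_0\right) \,\big|\, \theta_0, \qquad (1)$$
where $\theta_0$ denotes the true value of $\theta$; that is, the distribution of $\hat\theta^* - \hat\theta$ given the data approximates the sampling distribution of $\hat\theta - \theta_0$.
Proof: Omitted.
(Notation: the distribution in (1) has cumulative distribution function $G(\cdot)$.)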
Based on the Bootstrap Central Limit Theory, we can show that the two claims above (the symmetric case and the asymmetric case) are justified.
Proof of the claims: Case (i) The distribution (1) is symmetric.
We define the cumulative distribution function of the bootstrap estimator given the sample data:
$$B_n(t) = P\left(\hat\theta^* \le t \,\middle|\, \hat\theta\right).$$
($B_n(t)$ is also known as the bootstrap distribution.)
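In practice $B_n$ is not available in closed form; it is approximated by the empirical CDF of the $N$ replicates, and its inverse $B_n^{-1}(\alpha) = \hat\theta_\alpha^*$ by a sample quantile. A minimal R sketch with illustrative gamma data; the names `B_n` and `B_n_inv` are hypothetical.

```r
set.seed(3)
x <- rgamma(50, shape = 2)                        # illustrative data only
theta_star <- replicate(2000, mean(sample(x, replace = TRUE)))

B_n <- ecdf(theta_star)                           # Monte Carlo estimate of B_n(t)
B_n_inv <- function(alpha) {                      # estimate of B_n^{-1}(alpha)
  # type = 1 is the inverse of the empirical CDF (no interpolation)
  unname(quantile(theta_star, probs = alpha, type = 1))
}

q <- B_n_inv(0.9)     # theta*_{0.9}
print(B_n(q))         # 0.9, unless there are ties among the replicates
```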
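We have the following statements:
– $B_n(t)$ is monotonically increasing in $t$ (since it is a cumulative distribution function).
– When $t = \hat\theta_\alpha^*$, $B_n\left(\hat\theta_\alpha^*\right) = P\left(\hat\theta^* \le \hat\theta_\alpha^* \,\middle|\, \hat\theta\right) = \alpha$. So we know that $\hat\theta_\alpha^* = B_n^{-1}(\alpha)$.
– When $t = \theta_0$, the true parameter value, we have ($\overset{d}{=}$ denotes equality in distribution)
$$
\begin{aligned}
B_n(\theta_0) &= P\left(\hat\theta^* \le \theta_0 \,\middle|\, \hat\theta\right) = P\left(\hat\theta^* - \hat\theta \le \theta_0 - \hat\theta \,\middle|\, \hat\theta\right)\\
&= G\left(\theta_0 - \hat\theta\right) \quad \text{(by $G$'s definition)}\\
&\overset{d}{=} G\left(\hat\theta - \theta_0\right) \quad \text{(by symmetry, $\theta_0 - \hat\theta$ has the same distribution as $\hat\theta - \theta_0$)}\\
&\sim U(0, 1) \quad \text{(by the theorem, we also have $\hat\theta - \theta_0 \sim G$).}
\end{aligned}
$$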
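So for any $\alpha$, $0 < \alpha < 1$, with $U$ denoting a $U(0, 1)$ random variable,
$$P\left\{\theta_0 \le \hat\theta_\alpha^*\right\} = P\left\{\theta_0 \le B_n^{-1}(\alpha)\right\} = P\left\{B_n(\theta_0) \le \alpha\right\} = P(U \le \alpha) = \alpha.$$
Thus $\left[\hat\theta_{2.5\%}^*,\ \hat\theta_{97.5\%}^*\right]$ is a 95% confidence interval for $\theta$ (with 95% confidence of covering the true $\theta_0$), because
$$P\left(\hat\theta_{2.5\%}^* \le \theta_0 \le \hat\theta_{97.5\%}^*\right) = P\left(\theta_0 \le \hat\theta_{97.5\%}^*\right) - P\left(\theta_0 \le \hat\theta_{2.5\%}^*\right) = 97.5\% - 2.5\% = 95\%.$$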
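Case (ii) The distribution (1) is not symmetric. We define
$$C_n(t) = P\left(2\hat\theta - \hat\theta^* \le t \,\middle|\, \hat\theta\right).$$
We have the following statements:
– $C_n(t)$ is monotonically increasing in $t$.
– When $t = 2\hat\theta - \hat\theta_\alpha^*$, $C_n\left(2\hat\theta - \hat\theta_\alpha^*\right) = P\left(2\hat\theta - \hat\theta^* \le 2\hat\theta - \hat\theta_\alpha^* \,\middle|\, \hat\theta\right) = P\left(\hat\theta^* \ge \hat\theta_\alpha^* \,\middle|\, \hat\theta\right) = 1 - \alpha$. So we know that $2\hat\theta - \hat\theta_\alpha^* = C_n^{-1}(1 - \alpha)$.
– When $t = \theta_0$, the true parameter value, we have
$$
\begin{aligned}
C_n(\theta_0) &= P\left(2\hat\theta - \hat\theta^* \le \theta_0 \,\middle|\, \hat\theta\right) = P\left(\hat\theta^* - \hat\theta \ge \hat\theta - \theta_0 \,\middle|\, \hat\theta\right)\\
&= 1 - G\left(\hat\theta - \theta_0\right) \quad \text{(by $G$'s definition)}\\
&\sim U(0, 1) \quad \text{(by the theorem, we also have $\hat\theta - \theta_0 \sim G$).}
\end{aligned}
$$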
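So for any $\alpha$, $0 < \alpha < 1$,
$$P\left\{\theta_0 \le 2\hat\theta - \hat\theta_\alpha^*\right\} = P\left\{\theta_0 \le C_n^{-1}(1 - \alpha)\right\} = P\left\{C_n(\theta_0) \le 1 - \alpha\right\} = P(U \le 1 - \alpha) = 1 - \alpha.$$
Thus $\left[2\hat\theta - \hat\theta_{97.5\%}^*,\ 2\hat\theta - \hat\theta_{2.5\%}^*\right]$ is a 95% confidence interval for $\theta$ (with 95% confidence of covering the true $\theta_0$), because
$$P\left(2\hat\theta - \hat\theta_{97.5\%}^* \le \theta_0 \le 2\hat\theta - \hat\theta_{2.5\%}^*\right) = P\left(\theta_0 \le 2\hat\theta - \hat\theta_{2.5\%}^*\right) - P\left(\theta_0 \le 2\hat\theta - \hat\theta_{97.5\%}^*\right) = 97.5\% - 2.5\% = 95\%.$$
□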
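The coverage claims can be checked empirically with a small Monte Carlo experiment: draw many data sets from a known $F$, build both intervals from each, and count how often they cover $\theta_0$. This is a minimal sketch, assuming exponential data with $\theta_0 = 1$ (the population mean) and $\hat\theta = \bar X$. The sizes `n`, `N`, and `R_sims` are illustrative choices kept small so it runs quickly, so the estimated coverage is only rough, and because the results are asymptotic the coverage for skewed data and small $n$ may fall somewhat below 95%.

```r
set.seed(587)
theta0 <- 1            # true mean of Exp(rate = 1)
n <- 30                # sample size of each simulated data set
N <- 1000              # bootstrap replicates per data set
R_sims <- 200          # number of simulated data sets
alpha <- 0.05
L <- round(alpha * N / 2)          # 25
U <- round((1 - alpha / 2) * N)    # 975

cover_pct <- cover_basic <- logical(R_sims)
for (r in seq_len(R_sims)) {
  x <- rexp(n, rate = 1)
  theta_hat <- mean(x)
  s <- sort(replicate(N, mean(sample(x, replace = TRUE))))
  # symmetric-case interval [theta*_(L), theta*_(U)]
  cover_pct[r] <- s[L] <= theta0 && theta0 <= s[U]
  # asymmetric-case interval [2 theta_hat - theta*_(U), 2 theta_hat - theta*_(L)]
  cover_basic[r] <- (2 * theta_hat - s[U]) <= theta0 && theta0 <= (2 * theta_hat - s[L])
}
coverage <- c(percentile = mean(cover_pct), basic = mean(cover_basic))
print(coverage)        # both in the vicinity of 0.95
```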
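Other primary applications (besides CIs) of the bootstrap sampling method
– Approximating the standard error of a sample estimate: use
$$se_B = \left[\frac{1}{N}\sum_{i=1}^{N}\left(\hat\theta_i^* - \hat\theta\right)^2\right]^{1/2}$$
to estimate the standard error $se(\hat\theta)$.
– Bias correction by bootstrap: often $\mathrm{Bias}(\hat\theta) = E(\hat\theta) - \theta_0$ is of order $O(1/n)$. This bias can be estimated by
$$\mathrm{Bias}_B = \frac{1}{N}\sum_{i=1}^{N}\hat\theta_i^* - \hat\theta.$$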
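Both formulas are direct averages over the replicates. A minimal R sketch, assuming illustrative exponential data and the sample standard deviation as $\hat\theta$; `theta_bc` (the estimate minus its estimated bias) is one natural bias-corrected estimate, shown here for illustration. Note that $se_B$ as written centres at $\hat\theta$ and divides by $N$, so it differs slightly from `sd(theta_star)`, which centres at the mean of the replicates and divides by $N - 1$.

```r
set.seed(11)
x <- rexp(40)                                  # illustrative data only
theta_hat <- sd(x)                             # any statistic works; here the sample SD
theta_star <- replicate(2000, sd(sample(x, replace = TRUE)))

se_B   <- sqrt(mean((theta_star - theta_hat)^2))   # se_B as defined above
bias_B <- mean(theta_star) - theta_hat             # Bias_B as defined above
theta_bc <- theta_hat - bias_B                     # bias-corrected estimate
print(c(se_B = se_B, bias_B = bias_B, theta_bc = theta_bc))
```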
Let’s bootstrap Bill Gates! ... Happy "bootstrappers"
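Application – Bootstrap method in regression models
Linear regression model
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, 2, \dots, n,$$
where $\varepsilon_i \sim (0, \sigma^2)$, i.e., the errors have mean $0$ and variance $\sigma^2$.
Least squares (LS) estimator
$$\hat\beta_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}.$$
We want to make inference on $\beta_1$.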
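The closed-form LS slope can be checked against R's `lm()`. A minimal sketch: `ls_slope` is a hypothetical helper, and the simulated data assume $\beta_0 = 1$, $\beta_1 = 0.5$ purely for illustration.

```r
# Hypothetical helper: LS slope from the closed-form formula above.
ls_slope <- function(x, y) {
  sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
}

set.seed(5)
x <- runif(25, 0, 10)                          # illustrative design points
y <- 1 + 0.5 * x + rnorm(25, sd = 1)           # assumed beta0 = 1, beta1 = 0.5
b1_formula <- ls_slope(x, y)
b1_lm <- unname(coef(lm(y ~ x))[2])
print(c(formula = b1_formula, lm = b1_lm))     # the two agree up to rounding
```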
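If $\varepsilon_i \sim N(0, \sigma^2)$, then
$$\hat\beta_1 \sim N\left(\beta_1,\ \sigma^2\left\{\sum_{i=1}^{n}(x_i - \bar{x})^2\right\}^{-1}\right), \qquad (n - 2)s^2 \sim \sigma^2\chi^2_{n-2},$$
where $s^2$ is the residual mean square.
⇒ We can use the conventional $t = \hat\beta_1 / \widehat{se}(\hat\beta_1)$ test, with $\widehat{se}(\hat\beta_1) = s\left\{\sum_{i=1}^{n}(x_i - \bar{x})^2\right\}^{-1/2}$ (or the $z$ test when $n$ is large), to make inference on $\beta_1$.
Alternatively, we can use the bootstrap approach to make inference for $\beta_1$ (we only need to assume $\varepsilon_i \sim (0, \sigma^2)$).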
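For comparison with the bootstrap, the normal-theory quantities can be computed by hand and checked against `summary(lm(...))`. A minimal sketch with illustrative simulated data (the true coefficients are assumptions):

```r
set.seed(5)
x <- runif(25, 0, 10)
y <- 1 + 0.5 * x + rnorm(25, sd = 1)            # illustrative data (assumed model)
fit <- lm(y ~ x)
n <- length(y)
s <- sqrt(sum(residuals(fit)^2) / (n - 2))      # s^2 = SSE / (n - 2)
se_b1 <- s / sqrt(sum((x - mean(x))^2))         # estimated se of beta1_hat
t_b1 <- unname(coef(fit)[2]) / se_b1            # t statistic for H0: beta1 = 0
p_b1 <- 2 * pt(-abs(t_b1), df = n - 2)          # two-sided p-value
print(c(se = se_b1, t = t_b1, p = p_b1))
```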
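Method 1: Resample data pairs
Resample from the pairs $(x_i, y_i)$, preserving the pairs:
$$
\begin{array}{ccc}
(x_1, y_1) & & (x_1^*, y_1^*)\\
(x_2, y_2) & & (x_2^*, y_2^*)\\
\vdots & \overset{\text{bootstrap}}{\Longrightarrow} & \vdots\\
(x_n, y_n) & & (x_n^*, y_n^*)\\
\Downarrow & & \Downarrow\\
\hat\beta_1 & & \hat\beta_1^*
\end{array}
$$
Repeat this $N$ times to get $N$ copies of $\hat\beta_1^*$.
Based on these $N$ copies of $\hat\beta_1^*$, we can make inference about $\beta_1$ (compute confidence intervals, carry out tests, etc.).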
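A minimal R sketch of the pairs bootstrap for a simple regression with a numeric regressor, using the closed-form LS slope. The simulated data, including the error variance that grows with `x`, are illustrative assumptions; one commonly cited advantage of resampling pairs is that it does not rely on the errors being identically distributed.

```r
set.seed(8)
x <- runif(30, 0, 10)
y <- 2 + 0.3 * x + rnorm(30, sd = 1 + 0.2 * x)   # illustrative heteroscedastic data
ls_slope <- function(x, y) sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

N <- 2000
b1_star <- replicate(N, {
  idx <- sample.int(length(x), replace = TRUE)   # resample (x_i, y_i) pairs together
  ls_slope(x[idx], y[idx])
})
ci_pairs <- quantile(b1_star, c(0.025, 0.975))   # percentile 95% CI for beta1
print(ci_pairs)
```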
Method 2: Resample residuals
– Based on the sample data $\{(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\}$, we can obtain the LS estimates $\hat\beta_0$ and $\hat\beta_1$. Also, compute the residuals $\{e_1, \dots, e_n\}$, where $e_i = y_i - \hat\beta_0 - \hat\beta_1 x_i$.
– Resample from the residual set $\{e_1, \dots, e_n\}$ to obtain bootstrap residuals $\{e_1^*, \dots, e_n^*\}$.
– Define $y_i^* = \hat\beta_0 + \hat\beta_1 x_i + e_i^*$, for $i = 1, \dots, n$, so that we have a bootstrap data set
$$\left\{(x_1, y_1^*), (x_2, y_2^*), \dots, (x_n, y_n^*)\right\}.$$
Based on this bootstrap data set, we can get a bootstrap estimate $\hat\beta_1^*$.
Method 2: Resample residuals (continued)
– Repeat the resampling and estimation steps above $N$ times to get $N$ copies of $\hat\beta_1^*$.
– Based on these $N$ copies of $\hat\beta_1^*$, we can make inference about $\beta_1$ (compute confidence intervals, carry out tests, etc.).
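A minimal R sketch of the residual bootstrap for a simple regression with a numeric regressor; the simulated data are illustrative assumptions. Unlike Method 1, the design points `x` stay fixed and only the residuals are resampled.

```r
set.seed(9)
x <- runif(30, 0, 10)
y <- 2 + 0.3 * x + rnorm(30)                     # illustrative data only
fit <- lm(y ~ x)
b0 <- unname(coef(fit)[1]); b1 <- unname(coef(fit)[2])
e <- unname(residuals(fit))                      # e_i = y_i - b0 - b1 * x_i

N <- 2000
b1_star <- replicate(N, {
  e_star <- sample(e, replace = TRUE)            # resample residuals only
  y_star <- b0 + b1 * x + e_star                 # x stays fixed
  unname(coef(lm(y_star ~ x))[2])
})
ci_resid <- quantile(b1_star, c(0.025, 0.975))   # percentile 95% CI for beta1
print(ci_resid)
```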
```r
### Example: Bootstrap method for regression
## Annette Dobson (1990) "An Introduction to
## Generalized Linear Models".
## Page 9: Plant Weight Data.
ctl <- c(4.17, 5.58, 5.18, 6.11, 4.50, 4.61, 5.17, 4.53, 5.33, 5.14)
trt <- c(4.81, 4.17, 4.41, 3.59, 5.87, 3.83, 6.03, 4.89, 4.32, 4.69)
group <- gl(2, 10, 20, labels = c("Ctl", "Trt"))
weight <- c(ctl, trt)
mydata <- data.frame(weight, group)
## Linear regression:
lm.D9 <- lm(weight ~ group, data = mydata)
```
```r
summary(lm.D9)   ## Parameter estimates
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   5.0320     0.2202  22.850 9.55e-15 ***
## groupTrt     -0.3710     0.3114  -1.191    0.249
## 95% Confidence Intervals:
confint(lm.D9, "groupTrt")
##              2.5 %    97.5 %
## groupTrt -1.0253 0.2833003
```
```r
## Bootstrap methods:
## Function for method 1 (resample data pairs):
boot.meth1 <- function(data = mydata, indices) {
  data <- data[indices, ]                        # select obs. in bootstrap sample
  mod <- lm(formula = weight ~ group, data = data)
  coefficients(mod)                              # return coefficient vector
}
## Function for method 2 (resample residuals):
boot.meth2 <- function(data = mydata, indices, fit = lm.D9) {
  weight.boot <- fitted(fit) + residuals(fit)[indices]   # y* = fitted + resampled residuals
  data.star <- data; data.star[, 1] <- weight.boot
  mod <- lm(weight ~ group, data = data.star)
  coefficients(mod)                              # return coefficient vector
}
```
```r
#### Use my own code to run the bootstrap
## Bootstrap sample size 5000
b1.vec = b2.vec = rep(0, 5000)
for (ii in 1:5000) {
  b.indx = sample(1:nrow(mydata), replace = TRUE)
  b1.vec[ii] = boot.meth1(mydata, b.indx)["groupTrt"]
  b2.vec[ii] = boot.meth2(mydata, b.indx, lm.D9)["groupTrt"]
}
## Confidence intervals
b1.vec = sort(b1.vec); b2.vec = sort(b2.vec)
c(low = b1.vec[125], up = b1.vec[4875])
##        low         up
## -0.9650505  0.2376923
c(low = b2.vec[125], up = b2.vec[4875])
##     low      up
## -0.9632  0.1942
## Histograms
par(mfrow = c(1, 2)); hist(b1.vec); hist(b2.vec)
```
[Figure: Histogram of b1.vec (left) and Histogram of b2.vec (right); horizontal axes show the bootstrap estimates of the groupTrt coefficient, vertical axes Frequency.]
```r
## Run the bootstrap through R's "boot" function:
library(boot)
out.boot.meth1 <- boot(mydata, boot.meth1, 5000)
out.boot.meth1
## Bootstrap Statistics :
##     original         bias    std. error
## t1*    5.032 -0.002156102   0.1769029
## t2*   -0.371  0.005725787   0.3045753
boot.ci(out.boot.meth1, index = 2, type = c("norm", "perc"))
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 5000 bootstrap replicates
## Intervals :
## Level      Normal             Percentile
## 95%   (-0.9737,  0.2202 )   (-0.9492,  0.2200 )
## Calculations and Intervals on Original Scale
```
```r
out.boot.meth2 <- boot(mydata, boot.meth2, 5000)
out.boot.meth2
## Bootstrap Statistics :
##     original        bias    std. error
## t1*    5.032  0.00198706    0.2065991
## t2*   -0.371 -0.00548282    0.2957972
boot.ci(out.boot.meth2, index = 2, type = c("norm", "perc"))
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 5000 bootstrap replicates
## Intervals :
## Level      Normal             Percentile
## 95%   (-0.9453,  0.2142 )   (-0.9500,  0.2023 )
## Calculations and Intervals on Original Scale
```
```r
plot(out.boot.meth1, index = 2)
```
[Figure: method 1; left, Histogram of t (bootstrap replicates t* of the groupTrt coefficient, Density scale); right, normal Q-Q plot of t* against Quantiles of Standard Normal.]
```r
plot(out.boot.meth2, index = 2)
```
[Figure: method 2; left, Histogram of t (bootstrap replicates t* of the groupTrt coefficient, Density scale); right, normal Q-Q plot of t* against Quantiles of Standard Normal.]
Further remarks on bootstrap estimation
We introduced the bootstrap approach and illustrated it with some basic and regression examples. The methodology is very broad and can be used in many applications.
– Because it is a simulation-based method, one may not get exactly the same numerical answer when rerunning the same code. (A common practical solution: fix the random seed at the beginning.)
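A minimal R sketch of the seed-fixing remedy: the same seed reproduces the bootstrap replicates exactly, while a different seed gives different ones. `run_boot` is a hypothetical helper, and the data are the first ten BOD values from the earlier example.

```r
x <- c(2.5, 4.0, 4.1, 6.2, 7.1, 7.0, 8.3, 9.2, 9.3, 12.0)   # first ten BOD values

run_boot <- function(seed) {
  set.seed(seed)                                  # fix the seed before resampling
  replicate(1000, mean(sample(x, replace = TRUE)))
}

a <- run_boot(2016)
b <- run_boot(2016)    # same seed: identical replicates
d <- run_boot(2017)    # different seed: different replicates
print(identical(a, b))
```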
Further remarks on bootstrap estimation (continued)
Most bootstrap methods were developed for independent observations. When they are used to study correlated or dependent data, the key is to preserve the correlation/dependence.
– For example, in our examples on correlation coefficients and regression, we tried to preserve the correlation/dependence.
– For dependent samples (for example, time series models, Brownian motion, or other stochastic processes), a useful scheme is the moving-block bootstrap. [Self-study material: (http://www2.econ.iastate.edu/classes/econ674/bunzel/documents/DepBootstrap.pdf)]
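A minimal sketch of the moving-block bootstrap idea in R, assuming a user-chosen block length `l`. Overlapping blocks of `l` consecutive observations are drawn with replacement and concatenated, so the dependence within each block is preserved. The helper name `mbb_sample`, the block length, and the AR(1) example series are illustrative assumptions, not taken from the self-study notes.

```r
# Hypothetical helper: one moving-block bootstrap series of length n.
mbb_sample <- function(x, l) {
  n <- length(x)
  n_blocks <- ceiling(n / l)
  starts <- sample.int(n - l + 1, n_blocks, replace = TRUE)  # random block start points
  idx <- as.vector(outer(0:(l - 1), starts, "+"))           # l consecutive indices per block
  x[idx[seq_len(n)]]                                          # concatenate, trim to length n
}

set.seed(4)
x <- as.numeric(arima.sim(model = list(ar = 0.6), n = 200))   # dependent series
theta_star <- replicate(1000, mean(mbb_sample(x, l = 10)))
print(sd(theta_star))   # block-bootstrap standard error of the sample mean
```

Resampling single observations instead would break the serial dependence and, for positively autocorrelated series like this one, tend to understate the standard error of the mean.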
Good night!