Bootstrap Methods (An Introduction) · 2020-10-26 · Transcript
-
Bootstrap Methods (An Introduction)
N. Torelli, L. Egidi, G. Di Credico
Fall 2020
University of Trieste
-
Resampling methods
The nonparametric bootstrap
The parametric bootstrap
Bootstrap-based confidence intervals
-
Resampling methods
-
The idea of resampling methods
Resampling methods are computer-intensive methods that employ simulation to draw inferential conclusions from the available data.
In some sense, they replace mathematical formulas with computer simulation, though proving their validity requires quite sophisticated mathematics.
There are several such methods, but by far the most important are bootstrap methods. They are relatively modern, but their initial development predates the modern computer age!
(Note: this lecture follows in particular the CASI book)
-
The jackknife: introduction
The jackknife is, so to speak, the ancestor of the bootstrap. Its main usage is to obtain a nonparametric estimate of the standard error of an estimate, providing a simpler alternative to the delta method for complex functions of model parameters.
Let us consider a random sample y1, . . . , yn, with Yi ∼ F, for some distribution F.
We are interested in a real-valued statistic (parameter estimate) ψ̂ = s(y), where s(·) is a given function of the n observations.
-
The jackknife: details
Let y(i) be the sample without the i-th observation yi:
y(i) = (y1, y2, . . . , yi−1, yi+1, . . . , yn),
so that ψ̂(i) = s(y(i)) is the corresponding statistic of interest.
The jackknife estimate of standard error for ψ̂ is
ŜEjack = [ ((n − 1)/n) · Σ_{i=1}^n ( ψ̂(i) − ψ̂(·) )² ]^{1/2},  with  ψ̂(·) = (1/n) Σ_{i=1}^n ψ̂(i) .

Note: the formula works also when each observation yi is multidimensional (e.g. for regression models).
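The formula translates directly into code. A minimal sketch in Python (the function name `jackknife_se` is illustrative; any statistic can be plugged in as `stat`):

```python
import numpy as np

def jackknife_se(y, stat):
    """Jackknife standard error of psi_hat = stat(y), following the formula above."""
    y = np.asarray(y)
    n = len(y)
    # leave-one-out estimates psi_hat_(i) = stat(y_(i))
    loo = np.array([stat(np.delete(y, i, axis=0)) for i in range(n)])
    psi_dot = loo.mean()  # psi_hat_(.)
    return np.sqrt((n - 1) / n * np.sum((loo - psi_dot) ** 2))
```

Passing `axis=0` to `np.delete` means the same function also works when each yi is a row vector, matching the note above.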
-
The jackknife: comments
1. When ψ̂ = ȳ, with some algebra we obtain that ŜEjack = s/√n, the usual estimated standard error of the mean. (This is the reason to introduce the factor (n − 1)/n in the definition.)
2. The jackknife standard error is upwardly biased as an estimate of the true standard error. The bias vanishes as n grows.
3. The important property of the procedure is that the definition can be applied to any statistic of interest, even very complex ones.
We only need an algorithm to compute s(y): computer power replaces the theoretical Taylor series calculations of the delta method.
-
An example
Let us consider the standard error of the correlation coefficient for a random sample of bivariate normal data (x1, y1)ᵀ, . . . , (xn, yn)ᵀ:

ψ̂ = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / [ √(Σᵢ (xᵢ − x̄)²) √(Σᵢ (yᵢ − ȳ)²) ]

This is an estimate of the corresponding parameter cov(Xi, Yi)/(σx σy), and we can compute its standard error via the multidimensional version of the delta method.
The related formula is not exactly friendly (from the CASI book, page 157)
-
R lab: the jackknife at work I
As a first example, let us consider the case of the sample mean.
library(DAAG); y
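The R code above was truncated in transcription (text after each assignment arrow was lost). A sketch of the same check in Python, with made-up data, confirming that for the sample mean the jackknife reproduces s/√n:

```python
import numpy as np

y = np.array([33.8, 41.2, 28.7, 55.1, 39.4, 47.0, 36.3, 50.8])  # made-up sample
n = len(y)

# leave-one-out means and the jackknife standard error
loo = np.array([np.delete(y, i).mean() for i in range(n)])
se_jack = np.sqrt((n - 1) / n * np.sum((loo - loo.mean()) ** 2))

se_classic = y.std(ddof=1) / np.sqrt(n)  # s / sqrt(n)
print(se_jack, se_classic)  # identical, as stated in comment 1 above
```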
-
R lab: the jackknife at work II
The second example concerns the correlation coefficient, and we use the same data set as the CASI book, the kidneydata dataset; here Tot is a composite measure of overall kidney function.
load("kidneydata.RData")
with(as.data.frame(kidneydata), plot(Tot ~ age))
[Scatterplot of Tot versus age for the kidney data; age runs roughly from 20 to 90, Tot roughly from −6 to 4.]
-
R lab: the jackknife at work II
The standard error based on the delta method is stored in the variable SE_delta, and it is taken from the CASI book. We note that SE_jack is slightly larger.
SE_delta
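The lab code here was also cut off in transcription. A Python sketch of the jackknife standard error of the correlation coefficient (simulated data stand in for the kidney dataset):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 50
age = rng.uniform(20, 90, size=n)                # stand-in for the age variable
tot = -0.05 * age + rng.normal(scale=2, size=n)  # stand-in for Tot

def corr(x, y):
    return np.corrcoef(x, y)[0, 1]

psi_hat = corr(age, tot)
# leave-one-out correlations, then the jackknife formula
loo = np.array([corr(np.delete(age, i), np.delete(tot, i)) for i in range(n)])
se_jack = np.sqrt((n - 1) / n * np.sum((loo - loo.mean()) ** 2))
print(round(se_jack, 4))
```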
-
The nonparametric bootstrap
-
Introduction
As reported in the CASI book, the jackknife lies between classical methodology and a full-throated use of electronic computation, whereas the bootstrap is an undisputed computer-intensive statistical method.
Another important difference is that the bootstrap has a rather wide scope of application, whereas the jackknife is mainly used for standard errors.
-
A legendary beginning
The two methods are indeed related, as testified by the paper that introduced the bootstrap.
Figure 1: The legendary 1979 paper
-
The bootstrap idea
The bootstrap idea is very simple, and to illustrate it we start from the same problem introduced for the jackknife, namely the estimation of the standard error of ψ̂ = s(y).
The standard error requires the computation of var(ψ̂), something computable by drawing a large number of independent random samples from the true model F.
This is impossible, since F is unknown, so the bootstrap uses an estimate F̂ in place of F, and then proceeds with the simulation.
In particular, when F̂ is the empirical distribution function (we met it in the very first class), a single simulated sample is obtained by random selection with replacement from the observed sample.
-
An example (from Boos and Stefanski, 2010, Significance)
Figure 2: n = 25 adult male yearly incomes in a fictitious county
The data were actually generated from a known distribution, namely

Yi = 30 exp(Zi) , Zi ∼ N(0, 1) , i = 1, . . . , 25,

so that in this case we know the true distribution of the data (the population).
-
Example: two bootstrap samples
The nonparametric bootstrap treats the data of the previous table as the population and draws samples of size n = 25 (with replacement) from it.
y
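Drawing a bootstrap sample is just sampling n values with replacement from the observed data. A Python sketch, using incomes simulated from the model above:

```python
import numpy as np

rng = np.random.default_rng(7)
incomes = 30 * np.exp(rng.normal(size=25))  # simulate the n = 25 incomes

# a single bootstrap sample: 25 draws with replacement from the data
boot = rng.choice(incomes, size=incomes.size, replace=True)
print(np.sort(boot))  # typically some incomes repeat, others are absent
```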
-
The bootstrap at work
The bootstrap samples can be used to obtain an estimate of the standard error: denoting by ψ̂∗b, b = 1, . . . , B, the statistic of interest for each bootstrap sample, we get

ŜEboot = [ (1/(B − 1)) · Σ_{b=1}^B ( ψ̂∗b − ψ̂∗· )² ]^{1/2},  with  ψ̂∗· = (1/B) Σ_{b=1}^B ψ̂∗b .

We can surely go beyond the computation of standard errors, since the set of bootstrap estimates ψ̂∗1, . . . , ψ̂∗B can be used to approximate the distribution of ψ̂.
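A direct implementation of ŜEboot; a Python sketch with illustrative names:

```python
import numpy as np

def bootstrap_se(y, stat, B=2000, seed=0):
    """Bootstrap standard error: resample with replacement, recompute stat B times."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    psi_star = np.array([stat(rng.choice(y, size=y.size, replace=True))
                         for _ in range(B)])
    # ddof=1 gives the 1/(B - 1) divisor in the formula above
    return psi_star.std(ddof=1), psi_star
```

The returned `psi_star` vector is the set ψ̂∗1, . . . , ψ̂∗B, which can also approximate the whole distribution of ψ̂.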
-
R lab: bootstrap distribution of ψ̂ = Ȳ
B
-
R lab: comparison with the true distribution
B
-
Back to the standard error computation
Provided B is large enough, the bootstrap-based standard error is unbiased, thus outperforming the jackknife.
n
-
More on the bootstrap idea
The bootstrap idea can be appreciated by noticing the parallel between the statistical model for the sample data

F  --(i.i.d.)-->  y  --(s(·))-->  ψ̂

and the bootstrap mechanism

F̂  --(i.i.d.)-->  y∗  --(s(·))-->  ψ̂∗ .

The link between the two representations is given by the fact that F̂ approaches the true F when n → ∞, which is the key fact.
-
Comments on nonparametric bootstrap
1. It is completely automatic! The underlying math is not simple, but it has been rigorously carried out.
2. It is a large-sample method, since its accuracy increases with n.
3. It can be extended to any statistic of interest, not just estimated standard errors.
4. It can be extended to more complex settings, including some models with dependent data.
5. It also has some limitations, such as not being appropriate for sample extremes, like the minimum or maximum of the observed data. (For these latter problems there are specific adjustments, but they are not simple.)
-
The parametric bootstrap
-
Parametric bootstrap
Going back to the bootstrap mechanism, there is actually no need for F̂ to be the nonparametric estimate of F (the empirical cdf).
An alternative is to assume a parametric statistical model fθ(y) for the data, and simulate the bootstrap samples from fθ̂, where as usual θ̂ is a point estimate of θ.
The mechanism becomes

fθ̂  -->  y∗  --(s(·))-->  ψ̂∗

Extensions to (very) complex models become easier, but a realistic model is required.
-
R lab: parametric bootstrap for ψ̂ = Ȳ
n
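The lab code was lost in transcription; a Python sketch of the parametric bootstrap for ψ̂ = Ȳ under a fitted normal model (data made up):

```python
import numpy as np

rng = np.random.default_rng(11)
y = rng.normal(loc=10.0, scale=3.0, size=30)  # made-up observed data

# fit the parametric model: here theta_hat = (mu_hat, sigma_hat)
mu_hat, sigma_hat = y.mean(), y.std(ddof=1)

B = 2000
# simulate each bootstrap sample from the fitted model, not from the data
psi_star = np.array([rng.normal(mu_hat, sigma_hat, size=y.size).mean()
                     for _ in range(B)])
se_param = psi_star.std(ddof=1)
print(round(se_param, 3))  # close to sigma_hat / sqrt(n)
```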
-
Application to hypothesis testing
The parametric bootstrap can be employed to obtain p-values by simulation, also in those cases where the model has some parameters that must be estimated under H0.
For example, we can obtain a fairly good approximation to the exact p-value for the classic one-sample t-test.
We only need to keep in mind that the bootstrap samples must be generated from the model estimated under H0. This means that for testing

H0 : µ = µ0   vs.   H1 : µ ≠ µ0

for the usual i.i.d. normal model, we need to generate data with µ = µ0 and σ² = σ̂0² = Σᵢ (yᵢ − µ0)²/n.
-
R lab: t-test by parametric bootstrap
library(DAAG); y
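The code here was also truncated; a Python sketch of the scheme just described, with made-up data and µ0 (note that σ̂0 is estimated under H0):

```python
import numpy as np

def t_stat(y, mu0):
    return (y.mean() - mu0) / (y.std(ddof=1) / np.sqrt(y.size))

rng = np.random.default_rng(3)
y = rng.normal(loc=0.4, scale=1.0, size=20)  # made-up data
mu0 = 0.0
t_obs = t_stat(y, mu0)

# variance estimated under H0: sigma0^2 = sum_i (y_i - mu0)^2 / n
sigma0 = np.sqrt(np.mean((y - mu0) ** 2))

B = 4000
t_star = np.array([t_stat(rng.normal(mu0, sigma0, size=y.size), mu0)
                   for _ in range(B)])
p_value = np.mean(np.abs(t_star) >= abs(t_obs))  # two-sided bootstrap p-value
print(round(p_value, 3))
```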
-
Bootstrap-based confidenceintervals
-
The bootstrap automation of confidence intervals
We mentioned the standard approximate Wald-type 95% confidence interval for a parameter of interest ψ, in a model with parameter θ:

ψ̂ ± 1.96 · SE(ψ̂)

This is widely used, but it has two shortcomings:
1. It requires the estimated standard error SE(ψ̂), which may be hard to compute.
2. It is symmetric around the point estimate, and sometimes this leads to inaccuracy, since the finite-sample distribution of ψ̂ (and hence of the related pivot) is often asymmetric.
The first point is solved by ŜEboot, but the bootstrap provides further, more satisfactory solutions for confidence intervals.
-
Bootstrap-based confidence intervals
There are several available methods, and an extensive literature. Here we focus on the main ones (the approach is inspired by the MASS book), which are
1. The percentile method.
2. The basic method.
3. The studentized method.
These three methods work for both nonparametric and parametric bootstrap.
Further methods exist, such as the BCa method, but they are used less often in practice.
-
Running example for confidence intervals
We use the student score dataset of the CASI book as a running example. It concerns the scores of 22 students on 5 tests:
score
-
R lab: bootstrap (nonparametric) standard error for the student score data
psi_fun
-
R lab: bootstrap distribution
hist.scott(s_vect, main = "")
abline(v = psi_obs, col = 2)
[Histogram of the bootstrap distribution s_vect on the density scale, x-axis roughly 0.4 to 0.9, with a vertical line at psi_obs.]
-
The percentile method
It simply uses the quantiles of the bootstrap distribution ψ̂∗1, . . . , ψ̂∗B .
In the example, we get
perc_ci
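The `perc_ci` code above was truncated; a Python sketch of the percentile interval (data and B are illustrative):

```python
import numpy as np

rng = np.random.default_rng(21)
y = rng.exponential(scale=2.0, size=40)  # made-up, skewed data
B = 4000
psi_star = np.array([rng.choice(y, size=y.size, replace=True).mean()
                     for _ in range(B)])
# 95% percentile interval: the 2.5% and 97.5% quantiles of the bootstrap estimates
perc_ci = np.quantile(psi_star, [0.025, 0.975])
print(perc_ci)
```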
-
The basic method
The basic intervals are based on the idea that the distribution of ψ̂∗ − ψ̂ mimics that of ψ̂ − ψ. If this is the case, we would get

0.95 = Pr(L ≤ ψ̂ − ψ ≤ U) ≈ Pr(L ≤ ψ̂∗ − ψ̂ ≤ U)

Using the first probability, we obtain that a confidence interval for ψ is (ψ̂ − U, ψ̂ − L); then we use the second probability to see that ψ̂ + L and ψ̂ + U are estimated by the 2.5% and 97.5% bootstrap quantiles, respectively (here denoted by q∗0.025 and q∗0.975).
Putting the two things together, we get the basic bootstrap confidence interval

(ψ̂ − U, ψ̂ − L) = (2ψ̂ − q∗0.975, 2ψ̂ − q∗0.025)
-
R lab: basic confidence interval
basic_ci
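A Python sketch of the basic interval, applying the 2ψ̂ − q∗ formula to the bootstrap quantiles (illustrative data):

```python
import numpy as np

rng = np.random.default_rng(22)
y = rng.exponential(scale=2.0, size=40)  # made-up, skewed data
psi_hat = y.mean()
B = 4000
psi_star = np.array([rng.choice(y, size=y.size, replace=True).mean()
                     for _ in range(B)])
q_lo, q_hi = np.quantile(psi_star, [0.025, 0.975])
# basic interval: (2*psi_hat - q_0.975, 2*psi_hat - q_0.025)
basic_ci = (2 * psi_hat - q_hi, 2 * psi_hat - q_lo)
print(basic_ci)
```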
-
The studentized method
The last method is perhaps the most reliable of them all, but it requires a standard error estimate SE(ψ̂∗) from each bootstrap sample.
Denoting by z∗0.025 and z∗0.975 the bootstrap quantiles of z∗1, . . . , z∗B, where z∗b = (ψ̂∗b − ψ̂)/SE(ψ̂∗b), the studentized bootstrap confidence interval is given by

(ψ̂ − SE(ψ̂) z∗0.975, ψ̂ − SE(ψ̂) z∗0.025)

This is perhaps too challenging for the running example, since explicit estimates of SE(ψ̂∗) would be very hard. We could employ the jackknife within each bootstrap sample, or a double bootstrap scheme, though . . .
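For a statistic whose standard error is explicit, like the sample mean (SE = s/√n), the studentized interval is easy to sketch in Python:

```python
import numpy as np

rng = np.random.default_rng(23)
y = rng.exponential(scale=2.0, size=40)  # made-up data
n = y.size
psi_hat = y.mean()
se_hat = y.std(ddof=1) / np.sqrt(n)

B = 4000
z_star = np.empty(B)
for b in range(B):
    yb = rng.choice(y, size=n, replace=True)
    se_b = yb.std(ddof=1) / np.sqrt(n)   # explicit SE within each bootstrap sample
    z_star[b] = (yb.mean() - psi_hat) / se_b
z_lo, z_hi = np.quantile(z_star, [0.025, 0.975])
stud_ci = (psi_hat - se_hat * z_hi, psi_hat - se_hat * z_lo)
print(stud_ci)
```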