
  • Bootstrap Methods (An introduction)

    N. Torelli, L. Egidi, G. Di Credico, Fall 2020

    University of Trieste

  • Resampling methods

    The nonparametric bootstrap

    The parametric bootstrap

    Bootstrap-based confidence intervals


  • Resampling methods

  • The idea of resampling methods

    Resampling methods are computer-intensive methods that employ simulation to draw inferential conclusions from the available data.

    In some sense, they replace mathematical formulas with computer simulation, though proving their validity requires quite sophisticated mathematics.

    There are several such methods, but by far the most important are bootstrap methods. They are relatively modern, but their initial development predates the modern computer age!

    (Note: this lecture follows in particular the CASI book)

  • The jackknife: introduction

    The jackknife is, so to speak, the ancestor of the bootstrap. Its main usage is to obtain a nonparametric estimate of the standard error of an estimate, providing a simpler alternative to the delta method for complex functions of model parameters.

    Let us consider a random sample y1, . . . , yn, with Yi ∼ F, for some distribution F.

    We are interested in a real-valued statistic (parameter estimate) ψ̂ = s(y), where s(·) is a given function of the n observations.

  • The jackknife: details

    Let y(i) be the sample without the i-th observation yi

    y(i) = (y1, y2, . . . , yi−1, yi+1, . . . , yn)

    so that ψ̂(i) = s(y(i)) is the corresponding statistic of interest.

    The jackknife estimate of standard error for ψ̂ is

    ŜEjack = [ ((n − 1)/n) ∑_{i=1}^{n} ( ψ̂(i) − ψ̂(·) )² ]^{1/2} ,   with   ψ̂(·) = (1/n) ∑_{i=1}^{n} ψ̂(i) .

    Note: the formula also works when each observation yi is multidimensional (e.g. for regression models).

  • The jackknife: comments

    1. When ψ̂ = ȳ, some algebra shows that ŜEjack = s/√n, the usual estimated standard error of the mean. (This is the reason for introducing the factor (n − 1)/n in the definition.)

    2. The jackknife standard error is upwardly biased as an estimate of the true standard error. The bias disappears with larger n.

    3. The important property of the procedure is that the definition can be applied to any statistic of interest, even very complex ones.

    We only need an algorithm to compute s(y): computer power replaces the theoretical Taylor series calculations of the delta method.

  • An example

    Let us consider the standard error of the correlation coefficient for a random sample of bivariate normal data (x1, y1)⊤, . . . , (xn, yn)⊤:

    ψ̂ = ∑_i (xi − x̄)(yi − ȳ) / [ ( ∑_i (xi − x̄)² )^{1/2} ( ∑_i (yi − ȳ)² )^{1/2} ]

    This is an estimate of the corresponding parameter cov(Xi, Yi)/(σx σy), and we can compute its standard error by the multidimensional version of the delta method.

    The resulting formula is not exactly friendly (from the CASI book, page 157).

  • R lab: the jackknife at work I

    As a first example, let us consider the case of the sample mean.

    library(DAAG); y
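    A minimal sketch of this computation, with a simulated data vector y standing in for the original dataset (the leave-one-out means are computed explicitly and plugged into the jackknife formula):

    set.seed(1)
    y <- rexp(50, rate = 0.1)                          # illustrative data only
    n <- length(y)
    psi_jack <- sapply(1:n, function(i) mean(y[-i]))   # leave-one-out estimates
    SE_jack <- sqrt((n - 1) / n * sum((psi_jack - mean(psi_jack))^2))
    SE_jack
    sd(y) / sqrt(n)                                    # matches the usual formula s / sqrt(n)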

  • R lab: the jackknife at work II

    The second example concerns the correlation coefficient, and we use the same data set as the CASI book, the kidneydata dataset; here Tot is a composite measure of overall kidney function.

    load("kidneydata.RData")
    with(as.data.frame(kidneydata), plot(Tot ~ age))

    [Scatter plot of Tot versus age for the kidney data]

  • R lab: the jackknife at work II

    The standard error based on the delta method is stored in the variable SE_delta, and it is taken from the CASI book. We note that SE_jack is slightly larger.

    SE_delta
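    A sketch of how SE_jack could be computed here, assuming kidneydata has the columns age and Tot shown in the plot:

    kidney <- as.data.frame(kidneydata)
    n <- nrow(kidney)
    psi_jack <- sapply(1:n, function(i) cor(kidney$age[-i], kidney$Tot[-i]))  # leave-one-out correlations
    SE_jack <- sqrt((n - 1) / n * sum((psi_jack - mean(psi_jack))^2))
    SE_jack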

  • The nonparametric bootstrap

  • Introduction

    As reported in the CASI book, the jackknife lies between classical methodology and a full-throated use of electronic computation, whereas the bootstrap is an undisputed computer-intensive statistical method.

    Another important difference is that the bootstrap has a rather wide scope of application, whereas the jackknife is mainly used for standard errors.

  • A legendary beginning

    The two methods are indeed related, as testified by the paper that introduced the bootstrap.

    Figure 1: The legendary 1979 paper

  • The bootstrap idea

    The bootstrap idea is very simple, and to illustrate it we start from the same problem introduced for the jackknife, namely the estimation of the standard error of ψ̂ = s(y).

    The standard error requires the computation of var(ψ̂), something that could be computed by drawing a large number of independent random samples from the true model F.

    This is impossible, since F is unknown, so the bootstrap uses an estimate F̂ in place of F, and then proceeds with the simulation.

    In particular, when F̂ is the empirical distribution function (we met it in the very first class), a single simulated sample is obtained by random selection with replacement from the observed sample.

  • An example (from Boos and Stefanski, 2010, Significance)

    Figure 2: n = 25 adult male yearly incomes in a fictitious county

    The data were actually generated from a known distribution, namely

    Yi ∼ 30 exp(Zi) ,   Zi ∼ N(0, 1) ,   i = 1, . . . , 25

    so that in this case we know the true distribution of the data (the population).

  • Example: two bootstrap samples

    The nonparametric bootstrap treats the data of the previous table as the population and draws samples of size n = 25 (with replacement) from it.

    y
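    A minimal sketch of how such samples could be drawn in R, with the incomes regenerated from the model stated above (the original table is not reproduced here):

    set.seed(123)
    y <- 30 * exp(rnorm(25))               # stand-in for the n = 25 observed incomes
    y_star1 <- sample(y, replace = TRUE)   # first bootstrap sample
    y_star2 <- sample(y, replace = TRUE)   # second bootstrap sample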

  • The bootstrap at work

    The bootstrap samples can be used to obtain an estimate of the standard error: denoting by ψ̂∗b, b = 1, . . . , B, the statistic of interest for each bootstrap sample, we get

    ŜEboot = [ (1/(B − 1)) ∑_{b=1}^{B} ( ψ̂∗b − ψ̂∗· )² ]^{1/2} ,   with   ψ̂∗· = (1/B) ∑_{b=1}^{B} ψ̂∗b .

    We can surely go beyond the computation of standard errors, since the set of bootstrap estimates ψ̂∗1, . . . , ψ̂∗B can be used to approximate the distribution of ψ̂.

  • R lab: bootstrap distribution of ψ̂ = Ȳ

    B
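    A minimal sketch, assuming y is the income vector from the previous example:

    B <- 2000
    psi_star <- replicate(B, mean(sample(y, replace = TRUE)))   # bootstrap means
    SE_boot <- sd(psi_star)
    hist(psi_star, breaks = 30, main = "", xlab = "bootstrap means")
    abline(v = mean(y), col = 2)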

  • R lab: comparison with the true distribution

    B
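    Since here the true model is known, the true sampling distribution of the mean can be approximated by simulating directly from it; a sketch, reusing psi_star from above:

    true_means <- replicate(2000, mean(30 * exp(rnorm(25))))   # samples from the true model
    plot(density(psi_star), main = "", xlab = "mean")          # bootstrap distribution
    lines(density(true_means), lty = 2)                        # true sampling distribution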

  • Back to the standard error computation

    Provided B is large enough, the bootstrap-based standard error is unbiased, thus outperforming the jackknife.

    n

  • More on the bootstrap idea

    The bootstrap idea can be appreciated by noticing the parallel between the statistical model for the sample data

    F  --i.i.d.-->  y  --s(·)-->  ψ̂

    and the bootstrap mechanism

    F̂  --i.i.d.-->  y∗  --s(·)-->  ψ̂∗ .

    The link between the two representations is given by the fact that F̂ approaches the true F when n → ∞, which is the key fact.

  • Comments on nonparametric bootstrap

    1. It is completely automatic! The underlying math is not simple, but it has been rigorously carried out.

    2. It is a large-sample method, since its accuracy increases with n.

    3. It can be extended to any statistic of interest, not just estimated standard errors.

    4. It can be extended to more complex settings, including some models with dependent data.

    5. It also has some limitations, such as not being appropriate for sample extremes, like the minimum or maximum value of the observed data. (For these problems there are some specific adjustments, but they are not simple.)

  • The parametric bootstrap

  • Parametric bootstrap

    Going back to the bootstrap mechanism, there is actually no need for F̂ to be the nonparametric estimate of F (the empirical cdf).

    An alternative is to assume a parametric statistical model fθ(y) for the data, and to simulate the bootstrap samples from fθ̂, where as usual θ̂ is a point estimate of θ.

    The mechanism becomes

    fθ̂  -->  y∗  --s(·)-->  ψ̂∗

    Extensions to (very) complex models become easier, but a realistic model is required.

  • R lab: parametric bootstrap for ψ̂ = Ȳ

    n
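    A sketch of the parametric bootstrap for the mean, assuming (purely for illustration) that a log-normal model is fitted to the income vector y from the earlier example:

    mu_hat <- mean(log(y)); sigma_hat <- sd(log(y))   # point estimates of the log-normal parameters
    B <- 2000
    psi_star_par <- replicate(B, mean(rlnorm(length(y), meanlog = mu_hat, sdlog = sigma_hat)))
    SE_par <- sd(psi_star_par)                        # parametric bootstrap standard error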

  • Application to hypothesis testing

    The parametric bootstrap can be employed for obtaining p-values by simulation, also for those cases where the model has some parameters that have to be estimated under H0.

    For example, we can obtain a fairly good approximation to the exact p-value for the classic one-sample t-test.

    We only need to keep in mind that the bootstrap samples must be generated from the model estimated under H0. This means that for testing

    H0 : µ = µ0   against   H1 : µ ≠ µ0

    for the usual i.i.d. normal model, we need to generate data with µ = µ0 and σ² = σ̂²0 = ∑_i (yi − µ0)²/n.

  • R lab: t-test by parametric bootstrap

    library(DAAG); y
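    A sketch of the procedure, with a simulated data vector y standing in for the original dataset and a hypothetical null value mu0:

    set.seed(2)
    y <- rnorm(30, mean = 0.4); n <- length(y); mu0 <- 0   # illustrative data and null value
    t_obs <- (mean(y) - mu0) / (sd(y) / sqrt(n))            # observed t statistic
    sigma0_hat <- sqrt(sum((y - mu0)^2) / n)                # variance estimated under H0
    B <- 5000
    t_star <- replicate(B, {
      y_star <- rnorm(n, mean = mu0, sd = sigma0_hat)       # simulate under H0
      (mean(y_star) - mu0) / (sd(y_star) / sqrt(n))
    })
    p_boot <- mean(abs(t_star) >= abs(t_obs))               # two-sided bootstrap p-value
    p_boot
    t.test(y, mu = mu0)$p.value                             # exact p-value for comparison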

  • Bootstrap-based confidence intervals

  • The bootstrap automation of confidence intervals

    We mentioned the standard approximate Wald-type 95% confidence interval for a parameter of interest ψ, in a model with parameter θ:

    ψ̂ ± 1.96 SE(ψ̂)

    This is widely used, but it has two shortcomings:

    1. It requires the estimated standard error SE(ψ̂), which may be hard to compute.

    2. It is symmetric around the point estimate, and sometimes this leads to inaccuracy, since the finite-sample distribution of ψ̂ (and hence of the related pivot) is often asymmetric.

    The first point is solved by ŜEboot, but the bootstrap provides some further, more satisfactory solutions for confidence intervals.

  • Bootstrap-based confidence intervals

    There are several available methods, and an extensive literature. Here we focus on the main ones (the approach is inspired by the MASS book), which are

    1. The percentile method.
    2. The basic method.
    3. The studentized method.

    These three methods work both for nonparametric and parametric bootstrap.

    Further methods exist, such as the BCa method, but they are used less often in practice.

  • Running example for confidence intervals

    We use the student score dataset of the CASI book as a running example. It concerns the scores of 22 students on 5 tests:

    score

  • R lab: bootstrap (nonparametric) standard error for the student score data

    psi_fun
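    A sketch of this computation; psi_fun is assumed here, as in the CASI book's treatment of these data, to be the eigenratio (largest eigenvalue of the sample covariance matrix over its trace), and score is assumed to be the 22 × 5 data matrix:

    psi_fun <- function(data) {
      ev <- eigen(cov(data))$values
      ev[1] / sum(ev)                              # proportion of variance on the first eigenvector
    }
    psi_obs <- psi_fun(score)
    B <- 2000
    s_vect <- replicate(B, {
      idx <- sample(nrow(score), replace = TRUE)   # resample rows with replacement
      psi_fun(score[idx, ])
    })
    SE_boot <- sd(s_vect)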

  • R lab: bootstrap distribution

    hist.scott(s_vect, main = "")
    abline(v = psi_obs, col = 2)

    [Histogram (density scale) of the bootstrap estimates s_vect, with the observed value marked in red]

  • The percentile method

    It simply uses the quantiles of the bootstrap distribution ψ̂∗1, . . . , ψ̂∗B: for a 95% interval, the endpoints are the 2.5% and 97.5% bootstrap quantiles.

    In the example, we get

    perc_ci
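    A minimal sketch, reusing the bootstrap estimates s_vect from above:

    perc_ci <- quantile(s_vect, c(0.025, 0.975))
    perc_ci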

  • The basic method

    The basic intervals are based on the idea that the distribution of ψ̂∗ − ψ̂ mimics that of ψ̂ − ψ. If this is the case, we would get

    0.95 = Pr(L ≤ ψ̂ − ψ ≤ U) ≈ Pr(L ≤ ψ̂∗ − ψ̂ ≤ U)

    Using the first probability we obtain that a confidence interval for ψ is (ψ̂ − U, ψ̂ − L), and then we use the second probability to obtain that L + ψ̂ and U + ψ̂ are estimated by the 2.5% and 97.5% bootstrap quantiles, respectively (here denoted by q∗0.025 and q∗0.975).

    Putting the two things together we get the basic bootstrap confidence interval

    (ψ̂ − U, ψ̂ − L) = (2 ψ̂ − q∗0.975, 2 ψ̂ − q∗0.025)

  • R lab: basic confidence interval

    basic_ci
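    A minimal sketch, again reusing psi_obs and s_vect from above:

    basic_ci <- 2 * psi_obs - quantile(s_vect, c(0.975, 0.025))   # (2*psi_hat - q_0.975, 2*psi_hat - q_0.025)
    basic_ci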

  • The studentized method

    The last method is perhaps the most reliable of all, but it requires a standard error estimate SE(ψ̂∗) from each bootstrap sample.

    Denoting by z∗0.025 and z∗0.975 the bootstrap quantiles of z∗1, . . . , z∗B, where z∗b = (ψ̂∗b − ψ̂)/SE(ψ̂∗b), the studentized bootstrap confidence interval is given by

    (ψ̂ − SE(ψ̂) z∗0.975, ψ̂ − SE(ψ̂) z∗0.025)

    This is perhaps too challenging for the running example, since explicit estimates of SE(ψ̂∗) would be very hard to obtain. We could employ the jackknife within each bootstrap sample, or a double bootstrap scheme, though . . .
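    A sketch of one possible implementation for the running example, using a jackknife estimate of SE(ψ̂∗b) inside each bootstrap sample (psi_fun, psi_obs and score are as above, and SE(ψ̂) is taken to be the outer bootstrap standard error); this nested scheme is computationally heavier:

    jack_se <- function(data, stat) {
      n <- nrow(data)
      vals <- sapply(1:n, function(i) stat(data[-i, ]))
      sqrt((n - 1) / n * sum((vals - mean(vals))^2))
    }
    B <- 500                                   # kept small: each replicate also runs a jackknife
    boot <- replicate(B, {
      idx <- sample(nrow(score), replace = TRUE)
      d_star <- score[idx, ]
      psi_star <- psi_fun(d_star)
      c(psi_star, (psi_star - psi_obs) / jack_se(d_star, psi_fun))   # estimate and its z value
    })
    SE_hat <- sd(boot[1, ])                    # outer bootstrap standard error
    z_q <- quantile(boot[2, ], c(0.975, 0.025))
    stud_ci <- psi_obs - SE_hat * z_q
    stud_ci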

    Resampling methods
    The nonparametric bootstrap
    The parametric bootstrap
    Bootstrap-based confidence intervals