Introduction to Bootstrap Methods Miguel Sarzosa Department of Economics University of Maryland Econ626: Empirical Microeconomics, 2012


Page 1: Introduction to Bootstrap Methods - University Of Marylandeconweb.umd.edu/~sarzosa/teach/3/Disc3_Bootstrap.pdf · Why is Bootstrap Important? Only in specific instances we are able

Introduction to Bootstrap Methods

Miguel Sarzosa

Department of Economics

University of Maryland

Econ626: Empirical Microeconomics, 2012


1 Recreating the Universe

2 Bootstrap Estimates

3 The Jackknife

4 Applications

5 Now we go to Stata!


Why is Bootstrap Important?

Only in specific instances are we able to infer population parameters from a data set.

- Data sets are often samples of the population

Most of the estimators we use for inference rely on asymptotic results (CLT), a setting that no finite data set can reproduce unless we use an approximation such as the bootstrap.

Bootstrap is used for two things:

1. Computation of SE and CI (practical)
2. Asymptotic behavior of estimators (theory)


How does it work?

Bootstrap views the sample you have in your data set as the population of interest.

We obtain estimates of characteristics of the distribution of a given estimator $\hat{\mu}$ by drawing $B$ sub-samples of size $N$ with replacement.

Then we get $B$ estimates of $\mu$ (i.e., $\hat{\mu}_b$), from which we can obtain moments like the mean and the variance:

$$
\hat{\mu} \sim \left( B^{-1}\sum_{b=1}^{B}\hat{\mu}_b ,\; (B-1)^{-1}\sum_{b=1}^{B}\left(\hat{\mu}_b - B^{-1}\sum_{b=1}^{B}\hat{\mu}_b\right)^{2} \right)
$$


Algorithm

1. Draw a subsample of size N, with replacement, from the original sample
2. Calculate the desired statistic on that subsample
3. Repeat steps 1 and 2 B times, where B is a large number
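The three steps can be written out by hand in Stata. The following is a minimal sketch, assuming the auto data set that ships with Stata and the sample mean of mpg as the statistic; the variable name mean_b and the choice of B = 400 are illustrative and do not come from the slides.

```stata
* Manual bootstrap of the mean of mpg (illustrative sketch; run as a do-file)
sysuse auto, clear
set seed 10101

tempname handle
tempfile results
postfile `handle' mean_b using `results'

forvalues b = 1/400 {
    preserve
    * Step 1: draw _N observations with replacement
    bsample
    * Step 2: compute the statistic on the subsample and store it
    quietly summarize mpg
    post `handle' (r(mean))
    restore
}
* Step 3 is the loop itself: 400 repetitions
postclose `handle'

* Mean and standard deviation of the 400 bootstrap estimates
use `results', clear
summarize mean_b
```

The standard deviation reported by the final summarize uses the B - 1 denominator, so it can be read as the bootstrap standard error of the mean.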


How Many Repetitions?

In most cases, the more the better. However, bootstrapping can be computationally intensive. Andrews and Buchinsky (2000) came up with the rule that you should take $B = 384\omega$ repetitions, where $\omega$ varies with the statistic of interest:

- SE: $\omega = (2 + \gamma_4)/4$, where $\gamma_4$ depends on the excess kurtosis ⇒ fatter tails mean higher B
  - For no excess kurtosis, B = 192
  - For $\gamma_4 = 8$, B = 960
- Two-sided CI: depends on the critical value
  - For $\alpha = 0.05$, B = 348
  - For $\alpha = 0.01$, B = 685
- One-sided CI: depends on the critical value
  - For $\alpha = 0.05$, B = 634
  - For $\alpha = 0.01$, B = 989
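As a quick check of the SE rule, the two values quoted for B follow directly from $B = 384\omega$ with $\omega = (2+\gamma_4)/4$:

$$
\gamma_4 = 0:\quad \omega = \frac{2+0}{4} = 0.5, \qquad B = 384 \times 0.5 = 192
$$

$$
\gamma_4 = 8:\quad \omega = \frac{2+8}{4} = 2.5, \qquad B = 384 \times 2.5 = 960
$$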


SE Estimation

One of the main uses of bootstrap is to calculate the correct SE. SE estimation through bootstrap is very useful, for example, in 2-step estimations (e.g., IV) whose SE are difficult to compute. The variance of an estimate $\hat{\theta}$ calculated using bootstrap is given by

$$
S^{2}_{\hat{\theta}} = (B-1)^{-1}\sum_{b=1}^{B}\left(\hat{\theta}^{*}_{b} - \bar{\theta}\right)^{2}
\quad\text{where}\quad
\bar{\theta} = B^{-1}\sum_{b=1}^{B}\hat{\theta}^{*}_{b}
\qquad (1)
$$

$S_{\hat{\theta}}$ is consistent, so it can be used to construct CIs and to test hypotheses.
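A minimal Stata sketch of bootstrap SEs for a two-step (IV) estimator. It assumes the auto data set that ships with Stata; treating mpg as endogenous and displacement as its instrument is purely mechanical and has no economic justification.

```stata
sysuse auto, clear

* Conventional (asymptotic) 2SLS standard errors, for comparison
ivregress 2sls price weight (mpg = displacement)

* Same estimator, with SEs computed from 400 bootstrap replications
ivregress 2sls price weight (mpg = displacement), vce(bootstrap, reps(400) seed(10101))
```

The point estimates are the same in both commands; only the standard errors, and the CIs and tests built from them, change.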


Bias Estimation

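From (1) it is easy to see that the bias of the estimate $\hat{\theta}$ is given by

$$
\text{Bias}_{\hat{\theta}} = \left(\bar{\theta} - \hat{\theta}\right)
$$

where $\bar{\theta}$ is the mean of the bootstrap replicates. This allows us to correct for the bias in an estimate. The bias-corrected estimate is given by

$$
\hat{\theta}_{corr} = \hat{\theta} - \text{Bias}_{\hat{\theta}} = \hat{\theta} - \left(\bar{\theta} - \hat{\theta}\right) = 2\hat{\theta} - \bar{\theta}
$$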
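A minimal Stata sketch of the bias correction, assuming the auto data set and an illustrative regression of price on mpg and weight. It relies on bootstrap storing the estimated bias in e(bias), as listed in the stored results of the command's documentation; the matrix names b, bias and b_corr are illustrative.

```stata
sysuse auto, clear
bootstrap _b, reps(400) seed(10101): regress price mpg weight

* Observed coefficients, estimated bias and bootstrap SEs in one table
estat bootstrap, all

* Bias-corrected estimate: observed estimate minus estimated bias
matrix b      = e(b)
matrix bias   = e(bias)
matrix b_corr = b - bias
matrix list b_corr
```

Because the estimated bias is the mean of the replicates minus the observed estimate, b_corr equals twice the observed estimate minus the mean of the replicates.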


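The Jackknife

Uses N subsamples of size N - 1 (drops one observation at a time). Let $\hat{\theta}_{-i}$ be the estimate computed without observation $i$. Then the jackknife estimate is given by

$$
\bar{\theta} = N^{-1}\sum_{i=1}^{N}\hat{\theta}_{-i}
$$

Then the bias is

$$
\text{Bias}_{\hat{\theta}} = (N-1)\left(\bar{\theta} - \hat{\theta}\right)
$$

and the corrected estimate is

$$
\hat{\theta}_{corr} = \hat{\theta} - \text{Bias}_{\hat{\theta}} = N\hat{\theta} - (N-1)\bar{\theta}
$$

The jackknife SE is given by

$$
S_{\hat{\theta}} = \left[\frac{N-1}{N}\sum_{i=1}^{N}\left(\hat{\theta}_{-i} - \bar{\theta}\right)^{2}\right]^{1/2}
$$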
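In Stata the leave-one-out loop does not have to be coded by hand. A minimal sketch, assuming the auto data set and an illustrative regression of price on mpg and weight:

```stata
sysuse auto, clear

* Re-estimate the regression N times, dropping one observation each time
jackknife _b: regress price mpg weight

* Shortcut for commands that accept the vce() option
regress price mpg weight, vce(jackknife)
```

Both commands report the original coefficient estimates with jackknife standard errors.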


Heteroskedastic Errors

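In the presence of heteroskedasticity, we use Huber-Eicker-White (HEW) robust SE, but they perform very poorly in small samples. Bootstrap can be a better choice.

- Paired Bootstrap: Obtain samples of $(y_i, x_i)$ and estimate (i.e., regress) on each of them. Assuming that each draw is i.i.d., we are able to do valid inference because we are still allowing $Var[u_i \mid x_i]$ to vary with $x_i$. In Stata: vce(bootstrap)

- Wild Bootstrap: Obtain samples of $(y_i^{*}, x_i)$ where

$$
y_i^{*} = x_i\hat{\beta} + u_i^{*}
$$

and

$$
u_i^{*} =
\begin{cases}
\dfrac{1-\sqrt{5}}{2}\,\hat{u}_i & \text{with probability } \dfrac{1+\sqrt{5}}{2\sqrt{5}} \\[2ex]
\left(1-\dfrac{1-\sqrt{5}}{2}\right)\hat{u}_i & \text{with probability } 1-\dfrac{1+\sqrt{5}}{2\sqrt{5}}
\end{cases}
$$

Here $\hat{\beta}$ and $\hat{u}_i$ are the coefficient estimate and the residuals from the regression on the original sample.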
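Why these two-point weights work can be checked directly. Write the wild bootstrap error as the original residual times a random weight $v_i$, with $a = (1-\sqrt{5})/2$ and $p = (1+\sqrt{5})/(2\sqrt{5})$, so that $v_i = a$ with probability $p$ and $v_i = 1-a$ with probability $1-p$ (the symbols $v_i$, $a$ and $p$ are introduced here for illustration only). The weight has mean zero and unit variance:

$$
E[v_i] = a\,p + (1-a)(1-p) = \frac{(1-\sqrt{5})(1+\sqrt{5}) + (1+\sqrt{5})(\sqrt{5}-1)}{4\sqrt{5}} = \frac{-4 + 4}{4\sqrt{5}} = 0
$$

$$
E[v_i^{2}] = a^{2}p + (1-a)^{2}(1-p) = \frac{(3-\sqrt{5})(1+\sqrt{5}) + (3+\sqrt{5})(\sqrt{5}-1)}{4\sqrt{5}} = \frac{(2\sqrt{5}-2) + (2\sqrt{5}+2)}{4\sqrt{5}} = 1
$$

Conditional on the data, each resampled error therefore has mean zero and a second moment equal to the square of its own residual, which is how the wild bootstrap preserves the heteroskedasticity pattern across observations.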


Panel Data and Clustered Data

Note that in the Paired Bootstrap we assumed the $(y_i, x_i)$ draws were i.i.d. When we are not able to claim that, because the observations are not independently distributed (e.g., panel or clustered data), we use the panel bootstrap.

Suppose a panel has two dimensions, i and t. In the panel bootstrap, we resample over i and not over t. That is, when we bootstrap we choose the i's that will appear in the subsample and obtain all the t observations of the i's chosen.

A key assumption is that the data are independent over i.

The same procedure is followed when the data are clustered. We resample over the clusters, and then get all observations belonging to each sampled cluster. We need a huge number of clusters.

- In Stata: vce(bootstrap, cluster(varlist))
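A minimal Stata sketch of the cluster (panel) bootstrap. It assumes the nlswork panel from the Stata documentation, which webuse downloads and so needs an internet connection; the regression is illustrative, and reps(50) is kept small only so that it runs quickly.

```stata
* Panel of individuals (idcode) observed over several years
webuse nlswork, clear

* Resample whole individuals, keeping all of their yearly observations
regress ln_wage tenure age, vce(bootstrap, cluster(idcode) reps(50) seed(10101))
```

Each replication draws individuals with replacement and carries along every observation of the individuals drawn, so the dependence within an individual is left intact.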


Implementation Issues

- Remember that the assumption of independence across observations, or across clusters of observations, is crucial
- In some cases, the second moment might not exist, even asymptotically, so the bootstrap results will be misleading
- The basic bootstrap assumes the estimator is smooth, $\sqrt{N}$-consistent and asymptotically normal


Implementation in Stata

Stata users can perform bootstrapped estimations in two ways:

1. Some commands incorporate the bootstrap as an option, by typing vce(bootstrap) or vce(bootstrap, cluster(varlist))
2. Using the bootstrap command, Stata users can also incorporate bootstrap estimations in a large number of commands, including:
   - Those without the vce(bootstrap) option
   - Non-estimation commands (e.g., summarize)
   - User-written commands


The bootstrap Command

The syntax of bootstrap is different from the main style of Stata commands. You first specify the estimate to be bootstrapped, then the bootstrap options, and then the command to be bootstrapped.

bootstrap exp_list [, options eform_option ] : command

where

- exp_list specifies the estimates that will be bootstrapped (e.g., _b, _b[x1] or _se)
- among the most important options we have
  - reps(#): number of bootstrap replications
  - seed(#): random-number seed, for reproducible results
  - cluster(varlist): variables identifying the resampling clusters
  - strata(varlist): variables identifying the strata
  - size(#): size of each bootstrap sample
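A minimal sketch of the syntax applied to a non-estimation command, assuming the auto data set that ships with Stata; the names mean and sd given to the bootstrapped expressions are illustrative.

```stata
sysuse auto, clear

* Bootstrap the mean and SD returned by summarize
bootstrap mean=r(mean) sd=r(sd), reps(400) seed(10101): summarize mpg

* Same statistics, resampling separately within each stratum of foreign
bootstrap mean=r(mean) sd=r(sd), reps(400) seed(10101) strata(foreign): summarize mpg

* Percentile confidence intervals for the most recent bootstrap results
estat bootstrap, percentile
```

Here exp_list names two results that summarize leaves in r(), which is why the prefix also works with commands that do not estimate a model.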


Example

bootstrap _b, reps(100) seed(10101) cluster(clusvar): proIVpro if desocupa==0 & year==2000


Now we go to Stata!