nguyen ngoc anh nguyen ha trang applied econometrics instrumental variable approach depocen

33
Nguyen Ngoc Anh Nguyen Ha Trang Applied Econometrics Instrumental Variable Approach DEPOCEN

Upload: madlyn-owen

Post on 14-Jan-2016

218 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Nguyen Ngoc Anh Nguyen Ha Trang Applied Econometrics Instrumental Variable Approach DEPOCEN

Nguyen Ngoc Anh

Nguyen Ha Trang

Applied EconometricsInstrumental Variable Approach

DEPOCEN

Page 2: Nguyen Ngoc Anh Nguyen Ha Trang Applied Econometrics Instrumental Variable Approach DEPOCEN

Topics That Will Be Covered in this Workshop

Why use IV?– Discussion of endogeneity bias– Statistical motivation for IV

What is an IV?– Identification issues– Statistical properties of IV estimators

How is an IV model estimated?– Software and data examples– Diagnostics: IV relevance, IV exogeneity, Hausman

Page 3: Nguyen Ngoc Anh Nguyen Ha Trang Applied Econometrics Instrumental Variable Approach DEPOCEN

Review of the Linear Model (in metrix algebra) Population model: Y = α + βX + ε

– Assume that the true slope is positive, so β > 0

Sample model: Y = a + bX + e– Least squares (LS) estimator of β:

bLS = (X′X)–1X′Y = Cov(X,Y) / Var(X)

Under what conditions can we speak of bLS as a causal estimate of the effect of X on Y?

Page 4: Nguyen Ngoc Anh Nguyen Ha Trang Applied Econometrics Instrumental Variable Approach DEPOCEN

Review of the Linear Model

Key assumption of the linear model:– E(|x) = E( ) = 0 Cov(x, ) = E(x ) = 0 – E(X′e) = Cov(X,e) = E(e | X) = 0– Exogeneity assumption = X is uncorrelated with the

unobserved determinants of Y

Important statistical property of the LS estimator under exogeneity:

E(bLS) = β + Cov(X,e) / Var(X)

plim(bLS) = β + Cov(X,e) / Var(X)

Second terms 0, so bLS unbiased and consistent

21

ˆxx

yxx

i

ii

11

211

ˆ ,

ˆ

ESo

xx

xx

i

ii

Page 5: Nguyen Ngoc Anh Nguyen Ha Trang Applied Econometrics Instrumental Variable Approach DEPOCEN

Review of the Linear Model

When you regress Y on X, Y = β0 + β1X + ε and the OLS estimate of β1 can be described as

But since X and ε are correlated, bOLS does not estimate β1 but some other quantity that depends on the correlation of X and ε

Cov ,Cov ,

Cov , Cov ,

Cov , Cov , Cov ,

Cov , Cov ,

OLSX XY X

bX X X X

X X ε X ε X

X X X X

0 1

11

β β

ββ

ε

Page 6: Nguyen Ngoc Anh Nguyen Ha Trang Applied Econometrics Instrumental Variable Approach DEPOCEN

Endogeneity and the Evaluation Problem

When is the exogeneity assumption violated?– Measurement error → Attenuation bias– Instantaneous causation → Simultaneity bias– Omitted variables → Selection bias

Selection bias is the problem in observational research that undermines causal inference– Measurement error and instantaneous causation

can be posed as problems of omitted variables

Potential outcome approach!!!!

Page 7: Nguyen Ngoc Anh Nguyen Ha Trang Applied Econometrics Instrumental Variable Approach DEPOCEN

When Is the Exogeneity Assumption Violated?

Omitted variable (W) that is correlated with both X and Y– Classic problem of omitted variables bias

Coefficient on X will absorb the indirect path through W, whose sign depends on Cov(X,W) and Cov(W,Y)

X Y

W

Things more complicated in applied settings because there are bound to be many W’s, not to mention that the “smearing” problem applies in this context also

Page 8: Nguyen Ngoc Anh Nguyen Ha Trang Applied Econometrics Instrumental Variable Approach DEPOCEN

Example #1: Police Hiring

Measurement error– Mobilization of sworn officers (M.E. in X) as well

as differential victim reporting or crime recording (M.E. in Y) may be correlated with police size

Instantaneous causation– More police might be hired during a crime wave

Omitted variables– Large departments may differ in fundamental

ways difficult to measure (e.g., urban, heterogeneous)

Page 9: Nguyen Ngoc Anh Nguyen Ha Trang Applied Econometrics Instrumental Variable Approach DEPOCEN

Example #2: Delinquent Peers

Measurement error– Highly delinquent youth probably overestimate the

delinquency of their peers (M.E. in X), and likely underestimate their own delinquency (M.E. in Y)

Instantaneous causation– If there is influence/imitation, then it is bidirectional

Omitted variables– High-risk youth probably select themselves into

delinquent peer groups (“birds of a feather”)

Page 10: Nguyen Ngoc Anh Nguyen Ha Trang Applied Econometrics Instrumental Variable Approach DEPOCEN

Regression EstimationIgnoring Omitted Variables

Suppose we estimate treatment effect model:Y = α + βX + ε

– Let’s assume without loss of generality that X is a binary “treatment” (= 1 if treated; = 0 if untreated)

Least squares estimator:bLS = Cov(X,Y) / Var(X) = E(Y | X = 1) – E(Y | X = 0)

– Simply the difference in means between “treated” units (X = 1) and “untreated” units (X = 0)

Page 11: Nguyen Ngoc Anh Nguyen Ha Trang Applied Econometrics Instrumental Variable Approach DEPOCEN

Estimating Treatment EffectsConsider treatment assignment (dummy variable) X and outcome

Y

Regress Y on X

Yi = β0 + β1Xi + εi

The estimate of β1 is just the difference between the mean Y for X = 1 (the treatment group) and the mean Y for X = 0 (the control group)

Thus the OLS estimate is

= β1 +

Y β β ε

Y β ε

1 0 1 1

0 0 0

1 0Y Y 1 0

Page 12: Nguyen Ngoc Anh Nguyen Ha Trang Applied Econometrics Instrumental Variable Approach DEPOCEN

Estimating Treatment Effects(With Random Assignment)

If the treatment is randomly assigned, then X is uncorrelated with ε (X is exogenous)

If X is uncorrelated with ε if and only if

But if , then the mean difference is

= β1 + = β1

This implies that standard methods (OLS) give an unbiased estimate of β1, which is the average treatment effect

That is, the treatment-control mean difference is an unbiased estimate of β1,

1 0

1 0

1 0Y Y 1 0

Page 13: Nguyen Ngoc Anh Nguyen Ha Trang Applied Econometrics Instrumental Variable Approach DEPOCEN

What goes wrong without randomization?

If we do not have randomization, there is no guarantee that X is uncorrelated with ε (X may be endogenous)

Thus the OLS estimate is still

= β1 +

If X is correlated with ε, then

Hence does not estimate β1, but some other quantity that depends on the correlation of X and ε

If X is correlated with ε, then standard methods give a biased estimate of β1

1 0Y Y 1 0

1 0Y Y

1 0

Page 14: Nguyen Ngoc Anh Nguyen Ha Trang Applied Econometrics Instrumental Variable Approach DEPOCEN

Omitted Variables in applied research

What variables of interest to us are surely endogenous?– Micro = Employment, education, marriage, military

service, fertility, conviction, family structure,....– Macro = Poverty, unemployment rate, collective

efficacy, immigrant concentration,....

Basically, EVERYTHING!– (I’m sorry ....... But it suck)

Page 15: Nguyen Ngoc Anh Nguyen Ha Trang Applied Econometrics Instrumental Variable Approach DEPOCEN

Potential outcome framework

Page 16: Nguyen Ngoc Anh Nguyen Ha Trang Applied Econometrics Instrumental Variable Approach DEPOCEN

Traditional Strategies to Deal with Omitted Variables

Randomization (physical control) Covariate adjustment (statistical control)

– Control for potential W’s in a regression model– But...we have no idea how many W’s there are, so

model misspecification is still a real problem here

Page 17: Nguyen Ngoc Anh Nguyen Ha Trang Applied Econometrics Instrumental Variable Approach DEPOCEN

Quasi-Experimental Strategies to Deal with Omitted Variables

Difference in differences (fixed-effects model)– Requires panel data

Propensity score matching– Requires a lot of measured background variables

Similar to covariate adjustment, but only the treated and untreated cases which are “on support” are utilized

Instrumental variables estimation– Requires an exclusion restriction

Page 18: Nguyen Ngoc Anh Nguyen Ha Trang Applied Econometrics Instrumental Variable Approach DEPOCEN

Instrumental Variables Estimation Is a Viable Approach

An “instrumental variable” for X is one solution to the problem of omitted variables bias

Requirements for Z to be a valid instrument for X– Relevant = Correlated with X– Exogenous = Not correlated

with Y but through its correlation with X

Z

X Y

W

e

Page 19: Nguyen Ngoc Anh Nguyen Ha Trang Applied Econometrics Instrumental Variable Approach DEPOCEN

Important Point about Instrumental Variables Models I often hear...“A good instrument should not be

correlated with the dependent variable”– WRONG!!!

Z has to be correlated with Y, otherwise it is useless as an instrument– It can only be correlated with Y through X– (trong X có 2 phần, 1 phần dính với e một phần với

Y, muốn tận dụng phần dính với Y)

A good instrument must not be correlated with the unobserved determinants of Y

Page 20: Nguyen Ngoc Anh Nguyen Ha Trang Applied Econometrics Instrumental Variable Approach DEPOCEN

Important Point about Instrumental Variables Models

Not all of the available variation in X is used– Only that portion of X which is “explained” by

Z is used to explain Y

X Y

Z X = Endogenous variable

Y = Response variable

Z = Instrumental variable

Page 21: Nguyen Ngoc Anh Nguyen Ha Trang Applied Econometrics Instrumental Variable Approach DEPOCEN

Important Point about Instrumental Variables Models

X Y

Z

Realistic scenario: Very little of X is explained by Z, or what is explained does not overlap much with Y

X YZ

Best-case scenario: A lot of X is explained by Z, and most of the overlap between X and Y is accounted for

Page 22: Nguyen Ngoc Anh Nguyen Ha Trang Applied Econometrics Instrumental Variable Approach DEPOCEN

Important Point about Instrumental Variables Models

The IV estimator is BIASED– In other words, E(bIV) ≠ β (finite-sample bias)

– The appeal of IV derives from its consistency “Consistency” is a way of saying that E(b) → β as N → ∞ So…IV studies often have very large samples

– But with endogeneity, E(bLS) ≠ β and plim(bLS) ≠ β anyway

Asymptotic behavior of IVplim(bIV) = β + Cov(Z,e) / Cov(Z,X)

– If Z is truly exogenous, then Cov(Z,e) = 0

Page 23: Nguyen Ngoc Anh Nguyen Ha Trang Applied Econometrics Instrumental Variable Approach DEPOCEN

Instrumental Variables Terminology

Three different models to be familiar with– First stage: X = α0 + α1Z + ω

– Structural model: Y = β0 + β1X + ε

– Reduced form: Y = δ0 + δ1Z + ξ

Page 24: Nguyen Ngoc Anh Nguyen Ha Trang Applied Econometrics Instrumental Variable Approach DEPOCEN

More on the Method of Two-Stage Least Squares (2SLS)

Step 1: X = a0 + a1Z1 + a2Z2 + + akZk + u – Obtain fitted values (X̃) from the first-stage model

Step 2: Y = b0 + b1X̃ + e – Substitute the fitted X̃ in place of the original X– Note: If done manually in two stages, the standard

errors are based on the wrong residual e = Y – b0 – b1X̃ when it should be e = Y – b0 – b1X

Best to just let the software do it for you

Page 25: Nguyen Ngoc Anh Nguyen Ha Trang Applied Econometrics Instrumental Variable Approach DEPOCEN

Some examples

Page 26: Nguyen Ngoc Anh Nguyen Ha Trang Applied Econometrics Instrumental Variable Approach DEPOCEN

Some examples

Page 27: Nguyen Ngoc Anh Nguyen Ha Trang Applied Econometrics Instrumental Variable Approach DEPOCEN

Including Control Variables in an IV/2SLS Model

Control variables (W’s) should be entered into the model at both stages– First stage: X = a0 + a1Z + a2W + u

– Second stage: Y = b0 + b1X̃ + b2W + e

Control variables are considered “instruments,” they are just not “excluded instruments”– They serve as their own instrument

Page 28: Nguyen Ngoc Anh Nguyen Ha Trang Applied Econometrics Instrumental Variable Approach DEPOCEN

Functional Form Considerations with IV/2SLS

Binary endogenous regressor (X)– Consistency of second-stage estimates do not

hinge on getting first-stage functional form correct

Binary response variable (Y)– IV probit (or logit) is feasible but is technically

unnecessary

In both cases, linear model is tractable, easily interpreted, and consistent– Although variance adjustment is well advised

Page 29: Nguyen Ngoc Anh Nguyen Ha Trang Applied Econometrics Instrumental Variable Approach DEPOCEN

Technical Conditions Required for Model Identification

Order condition = At least the same # of IV’s as endogenous X’s– Just-identified model: # IV’s = # X’s– Overidentified model: # IV’s > # X’s

Rank condition = At least one IV must be significant in the first-stage model– Number of linearly independent columns in a matrix

E(X | Z,W) cannot be perfectly correlated with E(X | W)

Page 30: Nguyen Ngoc Anh Nguyen Ha Trang Applied Econometrics Instrumental Variable Approach DEPOCEN

Instrumental Variables and Randomized Experiments

Imperfect compliance in randomized trials– Some individuals assigned to treatment group will

not receive Tx, and some assigned to control group will receive Tx

Assignment error; subject refusal; investigator discretion

– Some individuals who receive Tx will not change their behavior, and some who do not receive Tx will change their behavior

A problem in randomized job training studies and other social experiments (e.g., housing vouchers)

Page 31: Nguyen Ngoc Anh Nguyen Ha Trang Applied Econometrics Instrumental Variable Approach DEPOCEN

Durbin-Wu-Hausman (DWH) Test

Balances the consistency of IV against the efficiency of LS– H0: IV and LS both consistent, but LS is efficient

– H1: Only IV is consistent

DWH test for a single endogenous regressor:DWH = (bIV – bLS) / √(s2

bIV – s2

bLS) ~ N(0,1)

– If |DWH| > 1.96, then X is endogenous and IV is the preferred estimator despite its inefficiency

Page 32: Nguyen Ngoc Anh Nguyen Ha Trang Applied Econometrics Instrumental Variable Approach DEPOCEN

Durbin-Wu-Hausman (DWH) Test

A roughly equivalent procedure for DWH:1. Estimate the first-stage model

2. Include the first-stage residual in the structural model along with the endogenous X

3. Test for significance of the coefficient on residual

Note: Coefficient on endogenous X in this model is bIV (standard error is smaller, though)– First-stage residual is a “generated regressor”

Page 33: Nguyen Ngoc Anh Nguyen Ha Trang Applied Econometrics Instrumental Variable Approach DEPOCEN

Software Considerations

Basic model specification in Stataivreg y (x = z) w [weight = wtvar], options

y = dependent variable

x = endogenous variable

z = instrumental variable

w = control variable(s)