
Instrumental Variables Estimation (with Examples from Criminology)

Robert Apel, Ph.D.

School of Criminal Justice

University at Albany

Center for Social and Demographic Analysis

University at Albany

May 5 & 7, 2009

Vital Statistics

Ph.D., Criminology and Criminal Justice, 2004
– University of Maryland
– Coursework in Department of Economics

Dissertation used instrumental variables

– State child labor laws as instrumental variables for the causal effect of youth employment on antisocial behavior

Topics That Will Be Covered in this Workshop

Why use IV?
– Discussion of endogeneity bias
– Statistical motivation for IV

What is an IV?
– Identification issues
– Statistical properties of IV estimators

How is an IV model estimated?
– Software and data examples
– Diagnostics: IV relevance, IV exogeneity, Hausman

Review of the Linear Model

Population model: Y = α + βX + ε
– Assume that the true slope is positive, so β > 0

Sample model: Y = a + bX + e
– Least squares (LS) estimator of β:

bLS = (X′X)⁻¹X′Y = Cov(X,Y) / Var(X)

Under what conditions can we speak of bLS as a causal estimate of the effect of X on Y?

Review of the Linear Model

Key assumption of the linear model: E(X′e) = Cov(X,e) = E(e | X) = 0

– Exogeneity assumption = X is uncorrelated with the unobserved determinants of Y

Important statistical property of the LS estimator:

E(bLS) = β + Cov(X,e) / Var(X)

plim(bLS) = β + Cov(X,e) / Var(X)

Under exogeneity the second terms are 0, so bLS is unbiased and consistent
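The bias formula can be made concrete with a minimal Python simulation that uses only the standard library. It assumes a hypothetical data-generating process with β = 0.5, in which the "endogenous" X shares a component (loading 0.3) with the error e. Sample covariances are linear, so the LS slope equals β plus the sample analogue of Cov(X,e)/Var(X) exactly. With an exogenous X, that second term is close to zero.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance with an n - 1 denominator."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

def ls_slope(x, y):
    """Bivariate least squares slope: Cov(X, Y) / Var(X)."""
    return cov(x, y) / cov(x, x)

rng = random.Random(1)
N = 100_000
BETA = 0.5   # hypothetical true slope
RHO = 0.3    # hypothetical loading of X on the error term

e = [rng.gauss(0, 1) for _ in range(N)]
x_exog = [rng.gauss(0, 1) for _ in range(N)]          # Cov(X, e) = 0
x_endog = [rng.gauss(0, 1) + RHO * ei for ei in e]     # Cov(X, e) = RHO

y_exog = [1.0 + BETA * xi + ei for xi, ei in zip(x_exog, e)]
y_endog = [1.0 + BETA * xi + ei for xi, ei in zip(x_endog, e)]

b_exog = ls_slope(x_exog, y_exog)
b_endog = ls_slope(x_endog, y_endog)
bias_term = cov(x_endog, e) / cov(x_endog, x_endog)   # Cov(X, e) / Var(X)

print(f"exogenous X:  b_LS = {b_exog:.3f}")
print(f"endogenous X: b_LS = {b_endog:.3f} = {BETA} + {bias_term:.3f}")
```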

Endogeneity and the Evaluation Problem

When is the exogeneity assumption violated?
– Measurement error → Attenuation bias
– Instantaneous causation → Simultaneity bias
– Omitted variables → Selection bias

Selection bias is the problem in observational research that undermines causal inference
– Measurement error and instantaneous causation can be posed as problems of omitted variables

When Is the Exogeneity Assumption Violated?

(1) Measurement error in X (u) that is correlated with M.E. in Y (v) or with the model error (e)
– Classical M.E. leads to attenuation, 0 < E(bLS) < β, but non-random M.E. (or correlation between M.E. and X, Y, v, and/or e) introduces unknown biases

And, if there are multiple X’s, bias contaminates the whole model, not just the coefficient on the X measured with error (a.k.a. “smearing”)

[Path diagram: X → Y, with measurement error u in X, measurement error v in Y, and model error e]
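A quick simulation illustrates the classical case. This sketch assumes Var(X*) = 1, a measurement error u with variance 1 that is independent of everything else, and β = 2. Under these assumptions, the LS slope on the noisy X should shrink toward β × Var(X*) / [Var(X*) + Var(u)] = 1. This is the standard attenuation result for classical measurement error in X.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance with an n - 1 denominator."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

def ls_slope(x, y):
    """Bivariate least squares slope: Cov(X, Y) / Var(X)."""
    return cov(x, y) / cov(x, x)

rng = random.Random(2)
N = 100_000
BETA = 2.0   # hypothetical true slope
SD_U = 1.0   # hypothetical SD of the classical measurement error u

x_true = [rng.gauss(0, 1) for _ in range(N)]          # true X*, variance 1
y = [BETA * xt + rng.gauss(0, 1) for xt in x_true]
x_obs = [xt + rng.gauss(0, SD_U) for xt in x_true]    # observed X = X* + u

b_true = ls_slope(x_true, y)
b_obs = ls_slope(x_obs, y)
reliability = 1.0 / (1.0 + SD_U ** 2)                 # Var(X*) / [Var(X*) + Var(u)]

print(f"b_LS using X*:      {b_true:.3f}")
print(f"b_LS using noisy X: {b_obs:.3f}")
print(f"beta x reliability: {BETA * reliability:.3f}")
```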


(2) Instantaneous causation of Y on X
– Direction of the bias depends on the sign of the feedback effect, Y → X
If positive, E(bLS) > β, so LS overestimates the true effect
If negative, E(bLS) < β, so LS underestimates the true effect and in severe cases can even flip the sign, so that E(bLS) < 0 even though β > 0

[Path diagram: X ⇄ Y]

This non-recursivity is what complicates estimating the relationship between price and quantity in economics
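The direction of simultaneity bias can be checked by simulating a hypothetical two-equation system, Y = βX + ε and X = γY + η, with β = 0.5 and independent standard normal ε and η. Solving the system gives the reduced form for X used below, where γ is the assumed feedback effect Y → X. A positive γ pushes bLS above β, and a strong enough negative γ (here γ = –1) flips its sign.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance with an n - 1 denominator."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

def ls_slope(x, y):
    """Bivariate least squares slope: Cov(X, Y) / Var(X)."""
    return cov(x, y) / cov(x, x)

def simulate(gamma, beta=0.5, n=50_000, seed=3):
    """LS slope of Y on X when Y = beta*X + eps and X = gamma*Y + eta.

    gamma is a hypothetical feedback effect Y -> X (requires gamma * beta != 1).
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        eps, eta = rng.gauss(0, 1), rng.gauss(0, 1)
        x = (gamma * eps + eta) / (1 - gamma * beta)   # reduced form for X
        xs.append(x)
        ys.append(beta * x + eps)                      # structural equation for Y
    return ls_slope(xs, ys)

for g in (0.0, 0.4, -0.4, -1.0):
    print(f"feedback gamma = {g:+.1f}: b_LS = {simulate(g):.3f} (true beta = 0.5)")
```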


(3) Omitted variable (W) that is correlated with both X and Y
– Classic problem of omitted variables bias

Coefficient on X will absorb the indirect path through W, whose sign depends on Cov(X,W) and Cov(W,Y)

[Path diagram: X → Y, with W affecting both X and Y]

Things are more complicated in applied settings because there are bound to be many W’s, not to mention that the “smearing” problem applies in this context as well

Example #1: Police Hiring

Measurement error
– Mobilization of sworn officers (M.E. in X) as well as differential victim reporting or crime recording (M.E. in Y) may be correlated with police size

Instantaneous causation
– More police might be hired during a crime wave

Omitted variables
– Large departments may differ in fundamental ways that are difficult to measure (e.g., urban, heterogeneous)

Example #2: Sanction Perceptions

Measurement error
– Measures of perceived sanction risk are probably “noisy” (M.E. in X), resulting in attenuation at best

Instantaneous causation
– Perceptions are sensitive to the success/failure of criminal behavior, so feedback is negative

Omitted variables
– Perceived risk is probably correlated with unobserved determinants of crime (e.g., intelligence)

Example #3: Delinquent Peers

Measurement error
– Highly delinquent youth probably overestimate the delinquency of their peers (M.E. in X), and likely underestimate their own delinquency (M.E. in Y)

Instantaneous causation
– If there is influence/imitation, then it is bidirectional

Omitted variables
– High-risk youth probably select themselves into delinquent peer groups (“birds of a feather”)

Regression Estimation Ignoring Omitted Variables

Suppose we estimate the treatment effect model: Y = α + βX + ε
– Let’s assume without loss of generality that X is a binary “treatment” (= 1 if treated; = 0 if untreated)

Least squares estimator: bLS = Cov(X,Y) / Var(X) = E(Y | X = 1) – E(Y | X = 0)
– Simply the difference in means between “treated” units (X = 1) and “untreated” units (X = 0)


But suppose the population treatment effect model is instead:

Y = α + βX + (δW + ω)
– Now the residual conveys information about W

Consider a plausible example
– Y = crime, X = marriage, W = “marriageability”

“Marriageability” can be broadly construed to encompass earnings potential, desire for children, willingness to compromise, faithfulness, verbal communication skills,...

– Including “signals” that individuals emit about these qualities


What does LS estimate when W is omitted?

bLS = β + δ × [C(X,W) / V(X)]

= β + δ × [E(W | X = 1) – E(W | X = 0)]

Marriage effect on crime will be overestimated (in absolute value)
– IMPORTANT: Even if β = 0, bLS < 0

where:
– β = true impact of marriage on crime (–)
– δ = impact of marriageability on crime (–)
– E(W | X = 1) – E(W | X = 0) = difference in marriageability between married and unmarried (+)


So...

bLS = β + δ × [E(W | X = 1) – E(W | X = 0)]

Estimate of β is unbiased if and only if

1. Marriageability is uncorrelated with crime

δ = 0

or...

2. Marriageability is “balanced” (i.e., equivalent) between married and unmarried subjects

E(W | X = 1) = E(W | X = 0)
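The omitted-variables formula can be checked with a simulated version of the marriage example. Everything here is hypothetical. Marriageability W is standard normal, and people with W > 0 marry with probability 0.7 (others with probability 0.3). The effect of marriageability on crime is δ = –1, and the true marriage effect β is set to 0. The sketch also confirms that with a binary X, Cov(X,Y)/Var(X) is exactly the difference in means.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance with an n - 1 denominator."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

def group_mean(v, g, value):
    """Mean of v among units whose treatment indicator g equals value."""
    return mean([vi for vi, gi in zip(v, g) if gi == value])

rng = random.Random(4)
N = 100_000
BETA = 0.0     # hypothetical: marriage has no true effect on crime
DELTA = -1.0   # hypothetical: marriageability lowers crime

w = [rng.gauss(0, 1) for _ in range(N)]   # marriageability (omitted from the model)
# Hypothetical selection rule: more marriageable people marry more often
x = [1 if rng.random() < (0.7 if wi > 0 else 0.3) else 0 for wi in w]
y = [BETA * xi + DELTA * wi + rng.gauss(0, 1) for xi, wi in zip(x, w)]

b_ls = cov(x, y) / cov(x, x)                            # Cov(X,Y) / Var(X)
diff_means = group_mean(y, x, 1) - group_mean(y, x, 0)  # E(Y|X=1) - E(Y|X=0)
w_gap = group_mean(w, x, 1) - group_mean(w, x, 0)       # E(W|X=1) - E(W|X=0)
predicted = BETA + DELTA * w_gap

print(f"b_LS = {b_ls:.3f}, difference in means = {diff_means:.3f}")
print(f"beta + delta x (W gap) = {predicted:.3f}")
```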

Omitted Variables in Criminological Research

What variables of interest to criminologists are surely endogenous?
– Micro = Employment, education, marriage, military service, fertility, conviction, family structure, ...
– Macro = Poverty, unemployment rate, collective efficacy, immigrant concentration, ...

Basically, EVERYTHING!
– (I’m sorry to be the one to break it to you)

Traditional Strategies to Deal with Omitted Variables

Randomization (physical control)
– Achieves balance (in expectation) on any and all potential W’s
– Control variables are technically unnecessary

Covariate adjustment (statistical control)
– Control for potential W’s in a regression model
– But...we have no idea how many W’s there are, so model misspecification is still a real problem here

Quasi-Experimental Strategies to Deal with Omitted Variables

Difference in differences (fixed-effects model)
– Requires panel data

Propensity score matching
– Requires a lot of measured background variables
Similar to covariate adjustment, but only the treated and untreated cases which are “on support” are utilized

Instrumental variables estimation
– Requires an exclusion restriction

Instrumental Variables Estimation Is a Viable Approach

An “instrumental variable” for X is one solution to the problem of omitted variables bias

Requirements for Z to be a valid instrument for X
– Relevant = Correlated with X
– Exogenous = Not correlated with Y except through its correlation with X

[Path diagram: Z → X → Y, with the omitted variable W and the error e]

Important Point about Instrumental Variables Models

I often hear...“A good instrument should not be correlated with the dependent variable”
– WRONG!!!

Z has to be correlated with Y, otherwise it is useless as an instrument
– It can only be correlated with Y through X

A good instrument must not be correlated with the unobserved determinants of Y


Not all of the available variation in X is used
– Only that portion of X which is “explained” by Z is used to explain Y

[Venn diagram: overlapping circles for X, Y, and Z]

X = Endogenous variable

Y = Response variable

Z = Instrumental variable


[Venn diagram: X, Y, and Z, realistic scenario]

Realistic scenario: Very little of X is explained by Z, or what is explained does not overlap much with Y

[Venn diagram: X, Y, and Z, best-case scenario]

Best-case scenario: A lot of X is explained by Z, and most of the overlap between X and Y is accounted for


The IV estimator is BIASED
– In other words, E(bIV) ≠ β (finite-sample bias)
– The appeal of IV derives from its consistency
“Consistency” is a way of saying that plim(b) = β, i.e., b converges in probability to β as N → ∞
So…IV studies often have very large samples
– But with endogeneity, E(bLS) ≠ β and plim(bLS) ≠ β anyway

Asymptotic behavior of IV: plim(bIV) = β + Cov(Z,e) / Cov(Z,X)
– If Z is truly exogenous, then Cov(Z,e) = 0
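Here is a simulation sketch of the consistency argument. It assumes a hypothetical process in which Z shifts X (coefficient 0.5), an unobserved W raises both X and Y, and β = 1. In a large sample, the LS slope settles near 1 + 1/2.25 ≈ 1.44, while the IV ratio Cov(Z,Y)/Cov(Z,X) settles near β. A single large sample illustrates consistency only. It says nothing about finite-sample bias.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance with an n - 1 denominator."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

rng = random.Random(6)
N = 100_000
BETA = 1.0   # hypothetical true effect of X on Y

z = [rng.gauss(0, 1) for _ in range(N)]   # instrument: affects Y only through X
w = [rng.gauss(0, 1) for _ in range(N)]   # unobserved confounder, left in the error
x = [0.5 * zi + wi + rng.gauss(0, 1) for zi, wi in zip(z, w)]
y = [BETA * xi + wi + rng.gauss(0, 1) for xi, wi in zip(x, w)]

b_ls = cov(x, y) / cov(x, x)   # plim = 1 + Cov(X, W) / Var(X) = 1 + 1 / 2.25
b_iv = cov(z, y) / cov(z, x)   # plim = 1 + Cov(Z, e) / Cov(Z, X) = 1

print(f"b_LS = {b_ls:.3f}, b_IV = {b_iv:.3f}, true beta = {BETA}")
```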

Instrumental Variables Terminology

Three different models to be familiar with
– First stage: X = α0 + α1Z + ω
– Structural model: Y = β0 + β1X + ε
– Reduced form: Y = δ0 + δ1Z + ξ

An interesting equality: δ1 = α1 × β1

so…

β1 = δ1 / α1

[Path diagram: Z →(α1) X →(β1) Y, with errors ω and ε; reduced form Z →(δ1) Y, with error ξ]
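The equality β1 = δ1 / α1 can be verified numerically. The simulation below uses a hypothetical process with α1 = 0.5, β1 = 1, and an unobserved confounder W. Dividing the reduced-form slope by the first-stage slope reproduces the IV ratio Cov(Z,Y)/Cov(Z,X) up to rounding, because the Var(Z) terms cancel.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance with an n - 1 denominator."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

rng = random.Random(7)
N = 50_000
z = [rng.gauss(0, 1) for _ in range(N)]
w = [rng.gauss(0, 1) for _ in range(N)]                        # unobserved confounder
x = [0.5 * zi + wi + rng.gauss(0, 1) for zi, wi in zip(z, w)]  # hypothetical alpha1 = 0.5
y = [1.0 * xi + wi + rng.gauss(0, 1) for xi, wi in zip(x, w)]  # hypothetical beta1 = 1.0

alpha1 = cov(z, x) / cov(z, z)   # first-stage slope
delta1 = cov(z, y) / cov(z, z)   # reduced-form slope
beta1 = delta1 / alpha1          # structural slope recovered as delta1 / alpha1
b_iv = cov(z, y) / cov(z, x)     # IV ratio for comparison

print(f"alpha1 = {alpha1:.3f}, delta1 = {delta1:.3f}")
print(f"delta1 / alpha1 = {beta1:.4f}, b_IV = {b_iv:.4f}")
```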

Different Types of Instrumental Variables Estimators

Wald estimator for binary instrument:
bWald = [E(Y | Z = 1) – E(Y | Z = 0)] / [E(X | Z = 1) – E(X | Z = 0)]
– Difference in response ÷ Difference in treatment

Instrumental variables (IV) estimator:
bIV = (Z′X)⁻¹Z′Y = Cov(Z,Y) / Cov(Z,X)
– Shows that bIV can be recovered from two samples, one containing (Z, Y) and the other (Z, X)

Two-stage least squares (2SLS) estimator:
b2SLS = (X̃′X̃)⁻¹X̃′Y = Cov(X̃,Y) / Var(X̃)
– X̃ represents the “fitted” value from the first-stage model

Different Types of Instrumental Variables Estimators

Single binary instrument and no control variables...

bWald = bIV = b2SLS

Single instrument (binary or continuous) with or without control variables...

bIV = b2SLS

Multiple instruments (binary or continuous) with or without control variables...

b2SLS
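For a single binary instrument without controls, the three estimators coincide. Below is a minimal check under a hypothetical process: Z is randomized with probability 0.5, an unobserved confounder W is present, and the true effect is 1.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance with an n - 1 denominator."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

def group_mean(v, g, value):
    """Mean of v among units whose instrument g equals value."""
    return mean([vi for vi, gi in zip(v, g) if gi == value])

rng = random.Random(8)
N = 100_000
z = [1 if rng.random() < 0.5 else 0 for _ in range(N)]   # randomized binary instrument
w = [rng.gauss(0, 1) for _ in range(N)]                  # unobserved confounder
x = [0.5 * zi + wi + rng.gauss(0, 1) for zi, wi in zip(z, w)]
y = [1.0 * xi + wi + rng.gauss(0, 1) for xi, wi in zip(x, w)]

# Wald: difference in response / difference in treatment
b_wald = ((group_mean(y, z, 1) - group_mean(y, z, 0))
          / (group_mean(x, z, 1) - group_mean(x, z, 0)))
# IV: Cov(Z, Y) / Cov(Z, X)
b_iv = cov(z, y) / cov(z, x)
# 2SLS: regress Y on the first-stage fitted values
a1 = cov(z, x) / cov(z, z)
a0 = mean(x) - a1 * mean(z)
x_fit = [a0 + a1 * zi for zi in z]
b_2sls = cov(x_fit, y) / cov(x_fit, x_fit)

print(f"Wald = {b_wald:.4f}, IV = {b_iv:.4f}, 2SLS = {b_2sls:.4f}")
```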

More on the Method of Two-Stage Least Squares (2SLS)

Step 1: X = a0 + a1Z1 + a2Z2 + … + akZk + u
– Obtain fitted values (X̃) from the first-stage model

Step 2: Y = b0 + b1X̃ + e
– Substitute the fitted X̃ in place of the original X
– Note: If done manually in two stages, the standard errors are based on the wrong residual, e = Y – b0 – b1X̃, when they should be based on e = Y – b0 – b1X

Best to just let the software do it for you
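The warning about manual two-stage estimation can be illustrated with a simulation sketch. The hypothetical process has Z shifting X by 0.5, an unobserved W affecting both X and Y, and β = 1. Both standard errors below use the conventional homoskedastic formula s² / Σ(X̃ – mean)². They differ only in which residual enters s². The residual built from X̃ keeps the first-stage error, so in this setup it overstates the standard error by roughly √3.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance with an n - 1 denominator."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

rng = random.Random(9)
N = 50_000
z = [rng.gauss(0, 1) for _ in range(N)]
w = [rng.gauss(0, 1) for _ in range(N)]   # unobserved confounder
x = [0.5 * zi + wi + rng.gauss(0, 1) for zi, wi in zip(z, w)]
y = [1.0 * xi + wi + rng.gauss(0, 1) for xi, wi in zip(x, w)]

# Step 1: first-stage fitted values
a1 = cov(z, x) / cov(z, z)
a0 = mean(x) - a1 * mean(z)
x_fit = [a0 + a1 * zi for zi in z]

# Step 2: regress Y on the fitted values
b1 = cov(x_fit, y) / cov(x_fit, x_fit)
b0 = mean(y) - b1 * mean(x_fit)
sst_fit = cov(x_fit, x_fit) * (N - 1)

def se_b1(resid):
    """Conventional standard error of b1 for a given residual series."""
    s2 = sum(r * r for r in resid) / (N - 2)
    return (s2 / sst_fit) ** 0.5

se_wrong = se_b1([yi - b0 - b1 * xf for yi, xf in zip(y, x_fit)])  # e = Y - b0 - b1*X-tilde
se_right = se_b1([yi - b0 - b1 * xi for yi, xi in zip(y, x)])      # e = Y - b0 - b1*X

print(f"b1 = {b1:.3f}; s.e. (wrong residual) = {se_wrong:.4f}, s.e. (correct) = {se_right:.4f}")
```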

Including Control Variables in an IV/2SLS Model

Control variables (W’s) should be entered into the model at both stages
– First stage: X = a0 + a1Z + a2W + u
– Second stage: Y = b0 + b1X̃ + b2W + e

Control variables are considered “instruments”; they are just not “excluded instruments”
– They serve as their own instrument
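Here is a sketch of 2SLS with a control variable, using only the standard library (a small Gaussian-elimination solver stands in for a matrix library). It assumes a hypothetical process in which the observed control W is correlated with the instrument Z and also affects Y directly, so Z is only a valid instrument conditional on W. Entering W in both stages, where it serves as its own instrument, recovers the assumed coefficients of 1.0 on X and 0.7 on W. Leaving W out entirely biases the IV ratio upward, to about 1.28 under these assumptions.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance with an n - 1 denominator."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        sol[r] = (m[r][n] - sum(m[r][c] * sol[c] for c in range(r + 1, n))) / m[r][r]
    return sol

def ols(cols, y):
    """OLS coefficients of y on a list of regressor columns (include ones for the intercept)."""
    k = len(cols)
    xtx = [[sum(p * q for p, q in zip(cols[i], cols[j])) for j in range(k)] for i in range(k)]
    xty = [sum(p * q for p, q in zip(cols[i], y)) for i in range(k)]
    return solve(xtx, xty)

rng = random.Random(5)
N = 50_000
ones = [1.0] * N
w = [rng.gauss(0, 1) for _ in range(N)]                    # observed control
z = [0.5 * wi + rng.gauss(0, 1) for wi in w]               # instrument, correlated with W
u = [rng.gauss(0, 1) for _ in range(N)]                    # unobserved confounder
x = [0.8 * zi + 0.5 * wi + ui + rng.gauss(0, 1) for zi, wi, ui in zip(z, w, u)]
y = [1.0 * xi + 0.7 * wi + ui + rng.gauss(0, 1) for xi, wi, ui in zip(x, w, u)]

# First stage: X on Z AND W
a0, a1, a2 = ols([ones, z, w], x)
x_hat = [a0 + a1 * zi + a2 * wi for zi, wi in zip(z, w)]
# Second stage: Y on X-hat AND W
b0, b_x, b_w = ols([ones, x_hat, w], y)

b_naive = cov(z, y) / cov(z, x)   # IV ignoring W altogether

print(f"2SLS with W in both stages: b_X = {b_x:.3f}, b_W = {b_w:.3f}")
print(f"IV ignoring W:              b_X = {b_naive:.3f}")
```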

Functional Form Considerations with IV/2SLS

Binary endogenous regressor (X)
– Consistency of second-stage estimates does not hinge on getting the first-stage functional form correct

Binary response variable (Y)
– IV probit (or logit) is feasible but is technically unnecessary

In both cases, the linear model is tractable, easily interpreted, and consistent
– Although variance adjustment is well advised

Functional Form Considerations with IV/2SLS

Quadratic second stage with a continuous endogenous regressor
– Entering first-stage fitted values and their square into the second-stage model leads to inconsistency
The square of a linear projection is not equivalent to a linear projection on a quadratic
– Squares and cross-products of IV’s should be treated as additional instruments
Kelejian (1971)
– Linear and squared X’s are treated as two different endogenous regressors
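The inconsistency from squaring first-stage fitted values can be shown with a deliberately contrived simulation. It assumes a hypothetical first stage, X = Z + |Z|·v + U, whose error variance grows with |Z|. As a result, E(X² | Z) = 2Z² + 1 differs from (E(X | Z))² = Z² by an amount that depends on Z. The assumed structural model is Y = X + 0.5X² + U + e, with U unobserved. Squaring X̃ pushes the quadratic coefficient toward 1.0. Treating X and X² as two endogenous regressors, with Z and Z² as instruments, recovers 0.5.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        sol[r] = (m[r][n] - sum(m[r][c] * sol[c] for c in range(r + 1, n))) / m[r][r]
    return sol

def ols(cols, y):
    """OLS coefficients of y on a list of regressor columns (include ones for the intercept)."""
    k = len(cols)
    xtx = [[sum(p * q for p, q in zip(cols[i], cols[j])) for j in range(k)] for i in range(k)]
    xty = [sum(p * q for p, q in zip(cols[i], y)) for i in range(k)]
    return solve(xtx, xty)

rng = random.Random(13)
N = 50_000
ones = [1.0] * N
z = [rng.gauss(0, 1) for _ in range(N)]
u = [rng.gauss(0, 1) for _ in range(N)]   # unobserved confounder
# Hypothetical first stage whose error variance grows with |Z|
x = [zi + abs(zi) * rng.gauss(0, 1) + ui for zi, ui in zip(z, u)]
x_sq = [xi * xi for xi in x]
y = [xi + 0.5 * xs + ui + rng.gauss(0, 1) for xi, xs, ui in zip(x, x_sq, u)]
z_sq = [zi * zi for zi in z]

# "Forbidden" regression: square the linear first-stage fitted values
c0, c1 = ols([ones, z], x)
x_fit = [c0 + c1 * zi for zi in z]
_, _, quad_forbidden = ols([ones, x_fit, [v * v for v in x_fit]], y)

# X and X^2 as two endogenous regressors, instrumented by Z and Z^2
instruments = [ones, z, z_sq]
p = ols(instruments, x)
q = ols(instruments, x_sq)
x_hat = [p[0] + p[1] * zi + p[2] * zs for zi, zs in zip(z, z_sq)]
x_sq_hat = [q[0] + q[1] * zi + q[2] * zs for zi, zs in zip(z, z_sq)]
_, lin_2sls, quad_2sls = ols([ones, x_hat, x_sq_hat], y)

print(f"quadratic coefficient, squared fitted values: {quad_forbidden:.3f}")
print(f"quadratic coefficient, X and X^2 instrumented: {quad_2sls:.3f} (assumed 0.5)")
```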

Technical Conditions Required for Model Identification

Order condition = At least as many IV’s as endogenous X’s
– Just-identified model: # IV’s = # X’s
– Overidentified model: # IV’s > # X’s

Rank condition = At least one IV must be significant in the first-stage model
– Number of linearly independent columns in a matrix
E(X | Z,W) cannot be perfectly correlated with E(X | W)
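A common way to check the relevance side of the rank condition is the first-stage F statistic on the excluded instrument(s). The sketch below uses hypothetical data with N = 1,000 and a first-stage coefficient of 0.5. It computes F for one relevant instrument and for one that is unrelated to X. With a single instrument, F is the squared t statistic. Comparing F with a threshold of about 10 is a widely used rule of thumb, not something these slides prescribe.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance with an n - 1 denominator."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

def first_stage_f(z, x):
    """F statistic for one excluded instrument in X = a0 + a1*Z + u (equals t squared)."""
    n = len(z)
    a1 = cov(z, x) / cov(z, z)
    a0 = mean(x) - a1 * mean(z)
    s2 = sum((xi - a0 - a1 * zi) ** 2 for zi, xi in zip(z, x)) / (n - 2)
    se_a1 = (s2 / (cov(z, z) * (n - 1))) ** 0.5
    return (a1 / se_a1) ** 2

rng = random.Random(10)
N = 1_000
w = [rng.gauss(0, 1) for _ in range(N)]       # unobserved confounder
z_rel = [rng.gauss(0, 1) for _ in range(N)]   # relevant instrument
z_irr = [rng.gauss(0, 1) for _ in range(N)]   # irrelevant: plays no role in X
x = [0.5 * zr + wi + rng.gauss(0, 1) for zr, wi in zip(z_rel, w)]

f_rel = first_stage_f(z_rel, x)
f_irr = first_stage_f(z_irr, x)
print(f"first-stage F, relevant Z:   {f_rel:.1f}")
print(f"first-stage F, irrelevant Z: {f_irr:.1f}")
```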

Statistical Inference with IV

Variance estimation

σ²βLS = σ²ε / SSTX

σ²βIV = σ²ε / (SSTX × R²X,Z)

where…

ε = Y – β0 – β1X

NOTICE: Because R²X,Z < 1, sbIV > sbLS
– IV standard errors tend to be large, especially when R²X,Z is very small, which can lead to Type II errors
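The variance formulas can be checked numerically. The sketch assumes a hypothetical process in which X is actually exogenous, so LS and IV estimate the same β, and Z explains about 20% of the variance of X. The ratio of the two standard errors should then be close to 1 / √R²X,Z ≈ 2.24.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance with an n - 1 denominator."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

rng = random.Random(11)
N = 50_000
BETA = 1.0   # hypothetical true slope

z = [rng.gauss(0, 1) for _ in range(N)]
x = [0.5 * zi + rng.gauss(0, 1) for zi in z]   # exogenous X; R^2(X,Z) = 0.25 / 1.25 = 0.2
y = [BETA * xi + rng.gauss(0, 1) for xi in x]

sst_x = cov(x, x) * (N - 1)                       # total sum of squares of X
r2_xz = cov(x, z) ** 2 / (cov(x, x) * cov(z, z))  # R^2 from regressing X on Z

b_ls = cov(x, y) / cov(x, x)
b_iv = cov(z, y) / cov(z, x)

def sigma2(b):
    """Residual variance from eps = Y - b0 - b*X (uses the actual X)."""
    b0 = mean(y) - b * mean(x)
    return sum((yi - b0 - b * xi) ** 2 for xi, yi in zip(x, y)) / (N - 2)

se_ls = (sigma2(b_ls) / sst_x) ** 0.5
se_iv = (sigma2(b_iv) / (sst_x * r2_xz)) ** 0.5

print(f"R^2(X,Z) = {r2_xz:.3f}; s.e. LS = {se_ls:.4f}, s.e. IV = {se_iv:.4f}")
print(f"ratio = {se_iv / se_ls:.3f} vs 1/sqrt(R^2) = {r2_xz ** -0.5:.3f}")
```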

Instrumental Variables and Randomized Experiments

Imperfect compliance in randomized trials
– Some individuals assigned to the treatment group will not receive Tx, and some assigned to the control group will receive Tx
Assignment error; subject refusal; investigator discretion

– Some individuals who receive Tx will not change their behavior, and some who do not receive Tx will change their behavior

A problem in randomized job training studies and other social experiments (e.g., housing vouchers)

Instrumental Variables and Randomized Experiments

Two different measures of treatment (X)
– Treatment assigned = Exogenous
Intention-to-treat (ITT) analysis
– Reduced-form model: Y = δ0 + δ1Z + ξ
Often leads to underestimation of treatment effect
– Treatment delivered = Endogenous
Individuals who do not comply probably differ in ways that can undermine the study
Self-selection bias and inconsistency

Angrist (2006), J.E.C.

Minneapolis D.V. experiment
– Sherman and Berk (1984)
Cases of male-on-female misdemeanor assault in two high-density precincts, in which both parties were present at the scene
– Random assignment of arrest-mediation-separation
– But...treatment assigned was not treatment delivered
Fidelity vis-à-vis arrest, but many subjects (~25%) assigned to mediation/separation were arrested

– “Upgrading” was more likely when suspect was rude, suspect assaulted officer, weapons were involved, victim persistently demanded arrest, and incident violated restraining order


[Path diagram: Treatment Assigned (Arrest) → (+) Treatment Delivered (Arrest) → (–) Recidivism, with Violence Proneness raising (+) both Treatment Delivered and Recidivism]


Estimates of effect of arrest (vs. mediate or separate) on D.V. recidivism (Tables 2, 3)
– OLS: b = –.070 (s.e. = .038)
– ITT: b = –.108 (s.e. = .041)
– 2SLS: b = –.140 (s.e. = .053)

Deterrent effect of arrest is twice as large in 2SLS as in OLS
– In this context, the 2SLS estimate is known as a “local average treatment effect” (I’ll come back to this)

Sexton and Hebel (1984), J.A.M.A.

Maternal smoking and birth weight
– Sexton and Hebel (1984)

Sample of pregnant women who were confirmed smokers, recruited from prenatal care registrants

– At least 10 cigarettes per day and not past 18th week

– Random assignment of staff assistance in a smoking cessation program

Personal visits; telephone and mail contacts

– But...some smokers in treatment group did not quit and some smokers in control group did quit


[Path diagram: Smoking Intervention → (–) Smoking Frequency → (–) Birth Weight, with Smoking Propensity and Difficult Pregnancy as confounders]


(1) First-stage model

Mean cigarettes smoked:

Treatment = 6.4

Control = 12.8

First-stage effect: bFS = –6.4

(2) Reduced-form model

Mean birth weight:

Treatment = 3,278g

Control = 3,186g

Reduced-form effect: bRF = 92

(3) Structural model

Effect of smoking frequency on mean birth weight:

bIV = 92 / –6.4 = –14.4g

Each cigarette reduces birth weight by 14.4 grams


As an interesting aside, it’s also possible to estimate the effect of continuing smoking (vs. quitting) from the data
– First stage: bFS = –0.23 (57% vs. 80% smokers)
– Reduced form: bRF = 92g
– Structural: bIV = 92 / –0.23 = –400g

Women who were still smoking in the 8th month of pregnancy bore children who were 400 grams lighter, on average
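The Wald arithmetic in this example is simple enough to script. The sketch uses only the group means reported above. It reads 57% as the treatment group's smoking rate and 80% as the control group's, which is what the negative first-stage sign implies.

```python
# Group means reported for the Sexton and Hebel (1984) example
cigs_treat, cigs_ctrl = 6.4, 12.8   # mean cigarettes smoked per day
bw_treat, bw_ctrl = 3278, 3186      # mean birth weight (grams)

b_fs = cigs_treat - cigs_ctrl       # first stage: effect of the intervention on smoking
b_rf = bw_treat - bw_ctrl           # reduced form: effect of the intervention on birth weight
b_iv = b_rf / b_fs                  # Wald/IV estimate: grams per cigarette

# Aside: share still smoking (assumed treatment vs. control)
share_treat, share_ctrl = 0.57, 0.80
b_fs_any = share_treat - share_ctrl
b_iv_any = b_rf / b_fs_any          # grams: continuing vs. quitting

print(f"per cigarette: {b_iv:.1f} g")
print(f"continuing to smoke: {b_iv_any:.0f} g")
```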

Permutt and Hebel (1989), Biometrics

Estimates of the effect of smoking frequency (in 8th month) on birth weight
– OLS: b = 2g (s.e. not reported)
– 2SLS: b = –14g (s.e. = 7g)

Here as well, 2SLS yields the “local average treatment effect” of smoking on birth weight

Instrumental Variables and Local Average Treatment Effects

Definition of a L.A.T.E.
– The average treatment effect for individuals “who can be induced to change [treatment] status by a change in the instrument”

Imbens and Angrist (1994, p. 470)

– The average causal effect of X on Y for “compliers,” as opposed to “always takers” or “never takers”

Not a particularly well-defined (sub)population

L.A.T.E. is instrument-dependent, in contrast to the population A.T.E.
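A simulation can show why the Wald/2SLS estimate is a L.A.T.E. rather than a population A.T.E. Everything below is hypothetical. The population is 60% compliers, 20% always takers, and 20% never takers. Each type has its own baseline outcome and its own treatment effect (–2, –8, and +1, respectively). The instrument only moves compliers, so the Wald ratio recovers the compliers' effect of about –2, while the population A.T.E. is about –2.6.

```python
import random

# Hypothetical type-specific treatment effects and baseline outcomes
EFFECT = {"complier": -2.0, "always": -8.0, "never": 1.0}
BASE = {"complier": 2.0, "always": 3.0, "never": 1.0}

def draw_type(u):
    """Hypothetical shares: 60% compliers, 20% always takers, 20% never takers."""
    if u < 0.6:
        return "complier"
    return "always" if u < 0.8 else "never"

def group_mean(v, g, value):
    """Mean of v among units whose assignment g equals value."""
    sel = [vi for vi, gi in zip(v, g) if gi == value]
    return sum(sel) / len(sel)

rng = random.Random(12)
N = 100_000
zs, xs, ys, effects = [], [], [], []
for _ in range(N):
    kind = draw_type(rng.random())
    z = 1 if rng.random() < 0.5 else 0                              # randomized assignment
    x = 1 if kind == "always" or (kind == "complier" and z == 1) else 0   # treatment received
    y = BASE[kind] + EFFECT[kind] * x + rng.gauss(0, 1)
    zs.append(z)
    xs.append(x)
    ys.append(y)
    effects.append(EFFECT[kind])

wald = ((group_mean(ys, zs, 1) - group_mean(ys, zs, 0))
        / (group_mean(xs, zs, 1) - group_mean(xs, zs, 0)))
ate = sum(effects) / N   # population average treatment effect

print(f"Wald (L.A.T.E.) = {wald:.3f}, population A.T.E. = {ate:.3f}")
```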

L.A.T.E. in the Previous Two Examples

In the D.V. study...
– For men who were arrested as per the experimental protocol, arrest resulted in a mean 14-percentage-point decline in the probability of recidivism compared to non-arrest interventions

In the maternal smoking study...
– For women who reduced their smoking frequency because they were assigned to the intervention, each one-cigarette reduction resulted in a 14-gram increase in birth weight (from mean 11 cigarettes)
