Post on 26-Dec-2015

Dylan Small

Department of Statistics,

Wharton School, University of Pennsylvania

Joint work with: Paul Rosenbaum, Mike Baiocchi, Marshall Joffe, Tom Ten Have

Strategies for Using Partially Valid Instrumental Variables

Overview

• Example of the instrumental variables (IV) method: effect of World War II military service on future earnings.
• Sensitivity to unobserved biases for the IV method.
• Strength of IVs and sensitivity to unobserved biases: how do small studies with strong IVs compare to large studies with weak IVs?
• Extended instrumental variables methods when the exclusion restriction for the IV is invalid.

WWII Veteran Status and Earnings

• Does military service raise or lower earnings?
• Angrist and Krueger (1994) studied this in the context of WWII military service and 1980 earnings (using the 5% public use sample of the US Census).
• Lower earnings? Military service in WWII interrupts education or career.
• Higher earnings? The labor market might favor veterans; the GI Bill increases education.

WWII Vets (76% of men) earned on average $4,500 more in 1980 than Non-Vets.

This is association, not causation: WWII Vets might not be comparable to Non-Vets in terms of health, criminal behavior…

We created matched triples: men matched on quarter of birth, race, age, education up to 8 years and location of birth.

This figure provides reason to doubt that military service increases earnings by $4,500.

From birth year 1924 to 1926, the proportion of veterans stayed about constant and earnings stayed about the same. From birth year 1926 to 1928, the proportion of veterans decreased by 50% but earnings increased, suggesting that military service decreases earnings.

Unmeasured Confounding

[Diagram: nodes Veteran Status, Earnings, and Unobserved Variables.]

Graph is conditional on measured confounders (race, education up to 8 years, location of birth).

Instrumental Variables Strategy

Extract variation in W from Z that is free of unobserved confounders and use this variation to estimate the causal effect of W on Y.

Key IV Assumptions: (1) Z independent of unobserved variables; (2) Z does not have direct effect on outcome.

[Diagram: Z = IV (Year of Birth), W = Treatment (Veteran Status), Y = Outcome (Earnings), and Unobserved Variables; two paths are marked with an X.]

Graph is conditional on measured confounders (race, education up to 8 years, location of birth).

Prototype IV Design: Matched Pair Encouragement Design

Consider a matched pair design in which there are I matched pairs and, in each pair i, one unit j is encouraged to receive treatment (\(Z_{ij} = 1\)) and the other unit j' is not encouraged to receive treatment (\(Z_{ij'} = 0\)).

Rubin Causal Model: Each subject ij has two potential outcomes:

\(r_{Tij}\) = outcome if encouraged

\(r_{Cij}\) = outcome if not encouraged

and two potential treatments received:

\(w_{Tij}\) = dose of treatment received if encouraged

\(w_{Cij}\) = dose of treatment received if not encouraged

Randomization Inference

A simple model says that the effect of encouragement on the outcome is proportional to its effect on the treatment received:

\[ r_{Tij} - r_{Cij} = \beta \, (w_{Tij} - w_{Cij}) \qquad (1) \]

In the WWII study, \(\beta\) = causal effect of military service. Let \(R_{ij}\) = observed outcome, \(W_{ij}\) = observed treatment received.

Under model (1), \(r_{Cij} = R_{ij} - \beta W_{ij}\).

In this context, the encouragement variable Z is said to be a valid instrumental variable (IV) if Z is effectively randomly assigned:

\[ P(Z_{i1} = 1, Z_{i2} = 0) = \frac{1}{2}, \qquad P(Z_{i1} = 0, Z_{i2} = 1) = \frac{1}{2}. \]

If Z is a valid IV, we can test \(H_0: \beta = \beta_0\) by testing whether \(R_{ij} - \beta_0 W_{ij}\) (\(= r_{Cij}\) if \(\beta = \beta_0\)) is independent of \(Z_{ij}\), e.g., by a Wilcoxon signed rank test.

95% CI for effect of military service: (-$1,445, -$500)
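The test of a hypothesized effect can be sketched in a few lines of Python (standard library only). This is a minimal illustration, not the authors' implementation: the function names, the toy data, and the normal approximation to the signed rank null distribution are all assumptions made here. For each pair, the adjusted response R - beta0 * W of the unencouraged unit is subtracted from that of the encouraged unit, and the signed rank statistic is computed on those differences.

```python
import math


def signed_rank_statistic(diffs):
    """Return (T, ranks): T is the sum of the ranks of |d| over pairs with d > 0.

    Zero differences are dropped and tied |d| values share their average rank.
    """
    nonzero = [d for d in diffs if d != 0]
    order = sorted(range(len(nonzero)), key=lambda k: abs(nonzero[k]))
    ranks = [0.0] * len(nonzero)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and abs(nonzero[order[end + 1]]) == abs(nonzero[order[start]])):
            end += 1
        average_rank = (start + end) / 2 + 1  # ranks are 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = average_rank
        start = end + 1
    t = sum(r for r, d in zip(ranks, nonzero) if d > 0)
    return t, ranks


def pvalue_for_beta0(pairs, beta0):
    """Two-sided p-value for H0: beta = beta0 (normal approximation).

    pairs: list of ((R_enc, W_enc), (R_not, W_not)), one entry per matched
    pair, with the encouraged unit listed first.
    """
    diffs = [(r_e - beta0 * w_e) - (r_n - beta0 * w_n)
             for (r_e, w_e), (r_n, w_n) in pairs]
    t, ranks = signed_rank_statistic(diffs)
    # Under random assignment each rank is added with probability 1/2.
    mean = sum(ranks) / 2
    var = sum(r * r for r in ranks) / 4
    z = (t - mean) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical toy data: full compliance and a true effect of 50.
noise = [k * s for k in range(1, 11) for s in (1, -1)]
toy_pairs = [((50 + e, 1), (0, 0)) for e in noise]
print(pvalue_for_beta0(toy_pairs, 50))  # adjusted differences are symmetric about 0
print(pvalue_for_beta0(toy_pairs, 0))   # every adjusted difference is positive
```

A confidence interval such as the one above can then be obtained by collecting the values of beta0 that the test does not reject.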

Relationship to Angrist, Imbens and Rubin Setup

Angrist, Imbens and Rubin (1996) define an IV as valid if it

1. is effectively randomly assigned (ignorable): \((r_{Tij}, r_{Cij})\) is independent of \(Z_{ij}\) given \(X_{ij}\);

2. has no direct effect (exclusion restriction).

The model

\[ r_{Tij} - r_{Cij} = \beta \, (w_{Tij} - w_{Cij}) \]

assumes the exclusion restriction: encouragement has no direct effect.

Side note: Angrist, Imbens and Rubin also consider the situation of heterogeneous treatment effects. They show that under an additional assumption (monotonicity), a valid IV identifies the average treatment effect for the subjects who would receive treatment if and only if encouraged to do so (the compliers).

IV Applications in Health Research

Outcome (Y) | Treatment (W) | IV (Z) | Reference
Birth weight | Maternal smoking | State cigarette taxes | Evans and Ringel (1999)
Birth weight | Maternal smoking | Random assignment of free smoker’s counseling | Permutt and Hebel (1989)
Mortality | Premature baby delivered at high level NICU vs. local hospital | Mother’s differential distance between high level NICU and local hospital | Baiocchi, Small, Lorch and Rosenbaum (2010)
Gastrointestinal complications | Non-steroidal anti-inflammatory drug (NSAID) vs. non-NSAID drug | Physician’s last prescription type | Brookhart et al. (2006)
Mortality | Breast cancer surgery treatment vs. non-surgery treatment | Proportion receiving surgery in health referral region | Brooks et al. (2003)
Coronary heart disease | HDL cholesterol | Polymorphisms that affect HDL cholesterol | Voight et al. (2012)

Sensitivity Analysis

The IV method assumes that the IV (encouragement) is effectively randomly assigned:

\[ P(Z_{i1} = 1, Z_{i2} = 0) = \frac{1}{2}, \qquad P(Z_{i1} = 0, Z_{i2} = 1) = \frac{1}{2}. \]

There is often concern about whether this is true. In the WWII study, there are gradual long-term trends in apprenticeship, education, employment and nutrition that might bias comparisons of workers born two years apart.

A sensitivity analysis asks how departures from random assignment of the IV of various magnitudes might alter a study’s conclusion.

Model for Sensitivity Analysis

For subject ij, let \(\pi_{ij}\) denote the probability that ij is encouraged,

\[ \pi_{ij} = P(Z_{ij} = 1). \]

Suppose that two subjects ij and ik may differ in their odds of being encouraged by at most a factor of \(\Gamma \ge 1\) because they differ in terms of an unobserved covariate, \(u_{ij} \ne u_{ik}\):

\[ \frac{1}{\Gamma} \le \frac{\pi_{ij}(1 - \pi_{ik})}{\pi_{ik}(1 - \pi_{ij})} \le \Gamma \quad \text{for all } i, j, k. \]

If \(\Gamma = 1\), the IV is randomly assigned. If \(\Gamma > 1\), then the distribution of treatment assignments is unknown, but the magnitude of the departure from random assignment is controlled by \(\Gamma\).

Carrying out Sensitivity Analysis

Let \(\pi = (\pi_{11}, \pi_{12}, \ldots, \pi_{I1}, \pi_{I2})\) denote the probabilities of being encouraged for each subject.

For each fixed value of \(\pi\), we can test \(H_0: \beta = \beta_0\) using permutation inference.

For a given value of \(\Gamma\), we compute the minimum and maximum p-values for testing \(H_0: \beta = \beta_0\) over all \(\pi\) that satisfy

\[ \frac{1}{\Gamma} \le \frac{\pi_{ij}(1 - \pi_{ik})}{\pi_{ik}(1 - \pi_{ij})} \le \Gamma \quad \text{for all } i, j, k. \]

Rosenbaum (Observational Studies, 2002) provides a simple method to compute these minimum and maximum p-values.

95% CI for effect of military service when \(\Gamma = 1\): (-$1,445, -$500)

95% CI for effect of military service when \(\Gamma = 1.2\): (-$3,745, $1,735)
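As a rough sketch of how such bounds can be computed for the Wilcoxon signed rank statistic: under the odds-ratio bound, the chance that a given unit in a pair is the encouraged one lies between 1/(1+Γ) and Γ/(1+Γ), and plugging these two extremes into a normal approximation gives large-sample bounds on the one-sided p-value. The function names below are hypothetical, the pairs are assumed untied and nonzero, and the code is an illustration rather than the exact procedure used for the intervals above.

```python
import math


def upper_tail(z):
    """P(N(0, 1) >= z)."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def signed_rank_pvalue_bounds(t, n, gamma):
    """Large-sample (min, max) one-sided p-values for signed rank statistic t.

    t: sum of the ranks of the positive adjusted pair differences;
    n: number of pairs (assumed untied and nonzero);
    gamma: sensitivity parameter, at least 1.
    """
    if gamma < 1:
        raise ValueError("gamma must be at least 1")
    rank_sum = n * (n + 1) / 2
    rank_sq_sum = n * (n + 1) * (2 * n + 1) / 6
    pvalues = []
    # Extreme within-pair probabilities allowed by the bound on the odds ratio.
    for p in (1 / (1 + gamma), gamma / (1 + gamma)):
        mean = p * rank_sum
        var = p * (1 - p) * rank_sq_sum
        pvalues.append(upper_tail((t - mean) / math.sqrt(var)))
    return min(pvalues), max(pvalues)


# Hypothetical example: 20 pairs, all adjusted differences positive.
for gamma in (1.0, 1.2, 2.0):
    print(gamma, signed_rank_pvalue_bounds(t=210, n=20, gamma=gamma))
```

At Γ = 1 the two bounds coincide with the usual randomization p-value; as Γ grows, the interval between them widens.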

Sensitivity Analysis for WWII Study

Upper Bound on One-Sided Significance Level for 1926 vs. 1928 IV

\(\Gamma\) | \(H_0: \beta = 0\) | \(H_0: \beta = 1{,}000\) | \(H_0: \beta = 4{,}500\) | \(H_0: \beta = 10{,}000\)
1 | 0.001 | 0.001 | 0.001 | 0.001
1.2 | 1.000 | 0.860 | 0.001 | 0.001
1.5 | 1.000 | 1.000 | 0.027 | 0.001
1.6 | 1.000 | 1.000 | 0.904 | 0.001
2.2 | 1.000 | 1.000 | 1.000 | 0.016
2.3 | 1.000 | 1.000 | 1.000 | 0.476

Strength of IV

• An IV is strong if encouragement has a strong effect on treatment received; an IV is weak if encouragement has only a weak effect on treatment received.
• Effects of weak IVs:
  1. Increased variance
  2. Increased sensitivity to bias

Study | Strong IV | Weak IV
World War II Study | 1926 vs. 1928 | 1924 vs. 1926
Maternal Smoking Study | Random assignment of free counseling | State cigarette taxes

Effect of Weak IVs I: Increased Variance

If Z is a weak IV, then the variance of the IV estimate will be higher because less variation in W can be extracted from Z.

[Diagram: Z|X, W|X, Y, and Unobserved Variables; two paths are marked with an X.]

95% CI for effect of military service using 1926 vs. 1928 IV: (-$1,445, -$500).

95% CI for effect of military service using 1924 vs. 1926 IV: (-$10,130, $10,750)
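The variance inflation can be illustrated with a small simulation. The sketch below is hypothetical throughout: the data-generating process (the unencouraged unit never takes treatment, the encouraged unit takes it with a given probability, standard normal baseline outcomes) and the simple ratio-of-differences estimator are assumptions chosen for illustration, not the matched design or rank-based inference used in the WWII study.

```python
import random
import statistics


def wald_estimate(n_pairs, compliance, beta, rng):
    """Ratio-of-differences IV estimate from one simulated matched-pair study.

    Assumed data-generating process (illustrative only): the unencouraged
    unit never takes treatment, the encouraged unit takes it with probability
    `compliance`, and outcomes are r_C + beta * W with r_C ~ N(0, 1).
    """
    sum_outcome_diff = 0.0
    sum_treatment_diff = 0.0
    for _ in range(n_pairs):
        w_enc = 1 if rng.random() < compliance else 0
        r_enc = rng.gauss(0, 1) + beta * w_enc
        r_not = rng.gauss(0, 1)  # W = 0 for the unencouraged unit
        sum_outcome_diff += r_enc - r_not
        sum_treatment_diff += w_enc
    if sum_treatment_diff == 0:
        return float("nan")  # encouragement changed no one's treatment
    return sum_outcome_diff / sum_treatment_diff


def spread_of_estimates(compliance, n_studies=100, n_pairs=1000, beta=1.0, seed=0):
    """Standard deviation of the IV estimate across simulated studies."""
    rng = random.Random(seed)
    estimates = [wald_estimate(n_pairs, compliance, beta, rng)
                 for _ in range(n_studies)]
    estimates = [e for e in estimates if e == e]  # drop NaN draws, if any
    return statistics.stdev(estimates)


print("strong IV (50% compliers):", spread_of_estimates(0.5))
print("weak IV   (5% compliers): ", spread_of_estimates(0.05))
```

With the same number of pairs, the weak instrument yields a much more variable estimate because the denominator (the effect of encouragement on treatment) is close to zero.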

Effect of Weak IVs II: Increased Sensitivity to Bias

Power of a Sensitivity Analysis (Rosenbaum, 2004)

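Suppose Z were in fact a valid IV, so that

\[ P(Z_{i1} = 1, Z_{i2} = 0) = P(Z_{i1} = 0, Z_{i2} = 1) = \frac{1}{2}, \]

but we didn’t know this and wanted to allow for some sensitivity to bias measured by \(\Gamma\). Suppose also that \(\beta - \beta_0\) was large, so that \(H_0: \beta = \beta_0\) was substantially in error.

We would like to be able to reject \(H_0: \beta = \beta_0\) for all \(\pi\) that satisfy

\[ \frac{1}{\Gamma} \le \frac{\pi_{ij}(1 - \pi_{ik})}{\pi_{ik}(1 - \pi_{ij})} \le \Gamma \quad \text{for all } i, j, k. \qquad (1) \]

Power of a sensitivity analysis: the probability that we will reject \(H_0: \beta = \beta_0\) for all \(\pi\) that satisfy (1), assuming that Z is a valid IV and a given value of \(\beta - \beta_0\).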

Model for Power Analysis

Let \(r_{Ci1} - r_{Ci2} \sim N(0, \sigma^2)\).

Subjects have random compliance patterns, with zero probability of being a defier and equal probabilities of being a never-taker or an always-taker.

Effect size is \((\beta - \beta_0)/\sigma\).

Strength of instrument is \(P(W_{ij} = 1 \mid Z_{ij} = 1) - P(W_{ij} = 1 \mid Z_{ij} = 0)\) (the probability of being a complier).

Effect size: \((\beta - \beta_0)/\sigma = 1\)

Strength of IV (proportion of compliers) | \(\Gamma\) | I = 100 | I = 1000 | I = 10,000 | I = 100,000 | \(\lim_{I \to \infty}\)
1 | 1 | 1.00 | 1.00 | 1.00 | 1.00 | 1
0.5 | 1 | 0.99 | 1.00 | 1.00 | 1.00 | 1
0.1 | 1 | 0.12 | 0.73 | 1.00 | 1.00 | 1
1 | 1.2 | 1.00 | 1.00 | 1.00 | 1.00 | 1
0.5 | 1.2 | 0.92 | 1.00 | 1.00 | 1.00 | 1
0.1 | 1.2 | 0.03 | 0.03 | 0.04 | 0.10 | 1
1 | 2 | 1.00 | 1.00 | 1.00 | 1.00 | 1
0.5 | 2 | 0.18 | 0.97 | 1.00 | 1.00 | 1
0.1 | 2 | 0.00 | 0.00 | 0.00 | 0.00 | 0

When the IV is valid (\(\Gamma = 1\)), the power is of course greater for stronger IVs, but there is good power for all cases with a sample size of 10,000 pairs. Valid but weak IVs eventually get it right. But when \(\Gamma > 1\), the power can tend to 1 or 0 depending on the strength of the IV. Weak IVs are quite sensitive to small biases.

Effect size: \((\beta - \beta_0)/\sigma = 0.5\)

Strength of IV (proportion of compliers) | \(\Gamma\) | I = 100 | I = 1000 | I = 10,000 | I = 100,000 | \(\lim_{I \to \infty}\)
1 | 1 | 1.00 | 1.00 | 1.00 | 1.00 | 1
0.5 | 1 | 0.64 | 1.00 | 1.00 | 1.00 | 1
0.1 | 1 | 0.07 | 0.32 | 1.00 | 1.00 | 1
1 | 1.2 | 0.98 | 1.00 | 1.00 | 1.00 | 1
0.5 | 1.2 | 0.32 | 1.00 | 1.00 | 1.00 | 1
0.1 | 1.2 | 0.01 | 0.00 | 0.00 | 0.00 | 1
1 | 2 | 0.38 | 1.00 | 1.00 | 1.00 | 1
0.5 | 2 | 0.01 | 0.00 | 0.00 | 0.00 | 0
0.1 | 2 | 0.00 | 0.00 | 0.00 | 0.00 | 0

For strong IVs, the sensitivity to unobserved biases is meaningfully affected by the effect size (e.g., for \(\Gamma = 2\), \(I = 1000\), proportion of compliers = 0.5, power is 0.97 when \((\beta - \beta_0)/\sigma = 1\) but 0.00 when \((\beta - \beta_0)/\sigma = 0.5\)).

But for weak IVs, there is barely any difference between \((\beta - \beta_0)/\sigma = 1\) and \((\beta - \beta_0)/\sigma = 0.5\).

Practical Consequences

1. Weak IVs that might have small bias are dangerous to use. Weak IVs are sensitive to quite small biases (\(\Gamma > 1\) yet close to 1), even when the effect size \((\beta - \beta_0)/\sigma\) is quite large. Unless one is confident that a weak IV is perfectly valid (\(\Gamma = 1\)), its extreme sensitivity to small biases is likely to limit its usefulness to the study of enormous effects, \((\beta - \beta_0)/\sigma \gg 1\).

2. Strong IVs that might be moderately biased are useful. A strong IV may provide useful information even if moderate biases are plausible.

3. Strength of IV is important in choosing a study design. Consider two studies, a small study with a strong IV and a large study with a weak IV, which would have the same power if both IVs are unbiased. When there is concern that the IVs might be biased, the small study with a strong IV has considerable advantages.

Practical Consequences Continued

4. Strategies for increasing effect size are more useful for strong IVs. For strong IVs, the sensitivity to unobserved biases is meaningfully affected by the effect size \((\beta - \beta_0)/\sigma\), whereas for weak IVs the effect size makes little difference. Sensitivity to unobserved biases can sometimes be reduced by increasing the effect size, say by reducing the unexplained heterogeneity of subjects (Rosenbaum, 2005). For instance, Ashenfelter and Rouse (1998) studied the effects of additional education on earnings using identical twins, and Kim (2007) studied the earnings of veteran siblings to estimate the effect of being drafted. Strategies of this sort may be helpful with strong IVs but are largely ineffective with weak IVs.

Extended IV Methods for Addressing Violation of Exclusion Restriction

• Angrist, Imbens and Rubin (1996): two key conditions for a valid IV are:
  – IV effectively randomly assigned conditional on measured covariates X
  – No direct effect on Y (exclusion restriction)
• We consider situations in which the random assignment is plausible but the exclusion restriction is not.


Vascular access in hemodialysis

• Hemodialysis
  – One of the main treatment options in end-stage renal disease (ESRD)
  – Requires access to the vascular system
• Three main types
  – Catheter
  – Synthetic material
  – Native arteriovenous fistula (AVF)

Vascular access (cont’d)

• Type of VA (A) partially determines dose of dialysis (DD; S)
  – Native AVF allows larger doses than catheter
  – S may affect outcomes (e.g., mortality)
• VA may have effects on outcome (Y) not mediated by dose (e.g., infection)
• Incomplete directed acyclic graph (DAG) of key variables

[Diagram: incomplete DAG with nodes A, S, Y.]

Estimand of interest

• To gauge the impact of type of VA, we are interested in the overall effect
  – Involves both
    • Direct effect (A->Y)
    • Indirect effect (A->S->Y)
• Formulate in terms of potential outcomes:
  – singly indexed: \(Y_1 - Y_0\)
  – doubly indexed: \(Y_{1S_1} - Y_{0S_0}\), where \(Y_a = Y_{aS_a}\)

[Diagram: DAG with nodes A, S, Y.]

direct effect: \(Y_{1S_0} - Y_{0S_0}\)

indirect effect: \(Y_{1S_1} - Y_{1S_0}\)

overall effect: \(Y_{1S_1} - Y_{0S_0}\)
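A short worked identity may help connect the three contrasts. Writing \(Y_{as}\) for the potential outcome under vascular access a and dose s, and \(S_a\) for the dose that would be received under access a (notation assumed here to match the doubly indexed outcomes on this slide), adding and subtracting \(Y_{1S_0}\) shows that the overall effect is the sum of the direct and indirect effects:

```latex
\begin{align*}
\underbrace{Y_{1S_1} - Y_{0S_0}}_{\text{overall}}
  &= \underbrace{\left(Y_{1S_0} - Y_{0S_0}\right)}_{\text{direct}}
   + \underbrace{\left(Y_{1S_1} - Y_{1S_0}\right)}_{\text{indirect}} .
\end{align*}
```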

Confounding by indication

• AVFs are given preferentially to healthier subjects
• Results in confounding by indication
  – Often difficult to control using standard methods based on ignorable treatment assignment
  – For a variety of treatments of dialysis patients, standard approaches based on ignorability lead to implausible results
• Choice of dose of dialysis (S) is also nonignorable

Instrumental variables

• Alternative approach for estimation
• Need to find an instrumental variable (R)
  – Associated with treatment of interest (A)
  – Independent of unmeasured confounders, i.e., shares no unmeasured common cause with outcome Y
  – Has no direct effect on outcome (exclusion restriction)
• Practice at which dialysis is provided is a reasonable candidate
  – Used for various analyses in the Dialysis Outcomes and Practice Patterns Study (DOPPS)
    • Large, international study with hundreds of practices
• Will assume that practice (R) shares no unmeasured common causes with S or Y.

Revise DAG

• Need to elaborate DAG
• Include
  – Instrument/center (R)
  – Measured (X) and unmeasured (U) common causes of variables of interest
• Is R a valid instrument for the overall effect of A on Y?

[Diagram: DAG with nodes A, S, Y, R, X, U.]

Graphical criteria for instrument

• Remove effect of treatment of interest
• Check whether R is independent of/d-separated from Y
• Directed path R->S->Y
• Criterion not satisfied
• R is not a valid instrument for the overall effect of A
• In the Angrist, Imbens & Rubin framework, the problem is that R has a direct effect on Y through S and hence violates the exclusion restriction.

[Diagram: DAG with nodes A, S, Y, R, X, U.]

Second Example: Return to Schooling

• Y = Earnings, A = Years of Education
• Unmeasured confounders: Ability, Motivation.
• Card (1993) proposes as an IV R = distance a person grew up from the nearest four-year college.
• Problem:
  – R also affects whether the person lives in an SMSA as an adult (S), conditional on A and measured confounders X (whether lived in an SMSA growing up, region where grew up and family background variables).
  – There is a wage premium to living in an SMSA as an adult.

Return to Schooling DAG

• R (living near a college growing up) is not a valid instrument for the overall effect of A (years of schooling) on Y (earnings) because it has a direct effect on Y through S (lives in an SMSA as an adult).

[Diagram: DAG with nodes A, S, Y, R, X, U.]

Estimation

• For estimating overall effects of A in these two problems, can’t use
  – Standard methods based on ignorability
  – Standard instrumental variables methods
• Idea: Look for interactions between R and X that can serve as instruments.

Extended Instruments

• Look for a component of X that interacts with R to affect A but not Y directly.
• Card proposes family income as a component of X that
  – Interacts with R to affect A: college proximity is a factor that lowers the costs of higher education; consequently, it has a bigger effect on a poorer family
  – Does not directly affect S or Y: the direct earnings effect of living near a college and the direct effect on living in an SMSA do not vary by family background.

[Diagram: DAG with nodes A, S, Y, R, X, U and the interaction R*X.]

Two-step approach

• Estimate joint effect of A, S on Y
• Estimate effect of A on S
• Combine to obtain overall effect
• In systems of linear models, the overall effect is the sum of
  – Direct effect of A: ψA
  – Indirect effect of A: ψSΦA

[Diagram: path diagram with nodes A, S, Y.]

Two-step approach (1st step)

• Yas potential outcome
• Model for joint effect:
  – Yas = Y00 + aψA + sψS
  – Rank-preserving/deterministic formulation
• Model for observables
  – E* = Best Linear Predictor
  – E*(Y|X,R,X*R) = E*(YAS|X,R,X*R) = E*(Y00|X,R,X*R) + E*(A|X,R,X*R)ψA + E*(S|X,R,X*R)ψS
  – Identifiability requires that E*(Y00|X,R,X*R), E*(A|X,R,X*R) and E*(S|X,R,X*R) are not collinear.
• One way: Assume E*(Y00|X,R,X*R) depends only on X. Then we need one component of X that interacts with R to affect A.
• Another way: Assume E*(Y00|X,R,X*R) depends on X and R but not X*R. Then we need at least two components of X that interact with R to affect A and S.
  – Estimation by two stage least squares. Regress A and S on X, R and X*R to obtain Â and Ŝ. Regress Y on Â, Ŝ, X, R.

Two-step approach (2nd step)

• Under assumptions
  – Effect of A on S is confounded
  – R is not an instrument for the effect of A on S
• Consider alternative
  – Linear model for joint effect of R, A
  – Sra = S00 + rΦR + aΦA
• Model for observables
  – E*(S|X,R,X*R) = E*(S00|X,R,X*R) + RΦR + E*(A|X,R,X*R)ΦA
• Can estimate by 2SLS under the assumption that E*(S00|X,R,X*R) does not depend on X*R (uncheckable) and that X*R affects A.
• Regress A on X, R, X*R to obtain Â. Regress S on Â, X, R.

[Diagram: DAG with nodes A, S, Y, R, X, U and the interaction R*X.]
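The two steps can be sketched in plain Python on made-up data. Everything here is an illustrative assumption rather than the analysis behind the slides: the helper names, the toy data-generating model, and the choice of the simpler identifying assumption that E*(Y00|X,R,X*R) depends on X only, so that R is left out of the outcome regression in step 1. The unmeasured confounder U is balanced within every (X, R) cell, so in this toy example the estimates recover the assumed coefficients up to rounding.

```python
def ols(rows, y):
    """Least-squares coefficients for y ~ rows, solved via the normal equations."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for col in range(k):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, k), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(k):
            if i != col:
                f = m[i][col] / m[col][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]


def fitted(rows, coef):
    return [sum(c * v for c, v in zip(coef, r)) for r in rows]


def two_step_overall_effect(X, R, A, S, Y):
    """Two-step extended IV estimate psi_A + psi_S * phi_A using X*R.

    Illustrative variant: assumes E*(Y00 | X, R, X*R) depends on X only.
    """
    first_stage = [[1, x, r, x * r] for x, r in zip(X, R)]
    a_hat = fitted(first_stage, ols(first_stage, A))
    s_hat = fitted(first_stage, ols(first_stage, S))
    # Step 1: joint effect of A and S on Y.
    _, _, psi_a, psi_s = ols([[1, x, a, s] for x, a, s in zip(X, a_hat, s_hat)], Y)
    # Step 2: effect of A on S, controlling for X and R.
    _, _, _, phi_a = ols([[1, x, r, a] for x, r, a in zip(X, R, a_hat)], S)
    return psi_a + psi_s * phi_a, (psi_a, psi_s, phi_a)


# Hypothetical toy data: U is an unmeasured confounder, balanced within every
# (X, R) cell so that it is exactly orthogonal to the instruments.
cells = [(x, r, u) for x in (0, 1, 2) for r in (0, 1) for u in (1, -1)]
X = [c[0] for c in cells]
R = [c[1] for c in cells]
A = [1 + x + r + 2 * x * r + u for x, r, u in cells]
S = [0.5 + x + 0.5 * r + 0.3 * a + 0.5 * u for (x, r, u), a in zip(cells, A)]
Y = [2 + x + 0.1 * a + 0.2 * s + 3 * u for (x, r, u), a, s in zip(cells, A, S)]

overall, parts = two_step_overall_effect(X, R, A, S, Y)
print(round(overall, 6))  # assumed true overall effect: 0.1 + 0.2 * 0.3
```

In the same toy data, an ordinary regression of Y on A, X and R is badly biased because U drives both A and Y.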

Results for Card’s Data

Method | Estimate of Overall Effect of A | SE
Path Analysis (OLS) | 0.0762 | 0.0004
Two Step Extended IV | 0.1503 | 0.0462

Y = log earnings
A = years of schooling
S = lives in SMSA as an adult
R = lived near 4 year college growing up
X = experience, experience squared, black indicator, indicator for living in SMSA growing up, indicators for region growing up, mother and father’s education

Summary

• The IV method can be a powerful strategy for observational studies when there are confounders that are hard to measure and there is a “random” encouragement to receive treatment.
• When encouragement is not actually random, it is important to do a sensitivity analysis.
• Strong IVs are much less sensitive to bias.
• When the exclusion restriction might be violated, we developed extended IV methods that use X*R as IVs.

Papers

• Small, D.S. and Rosenbaum, P.R. (2008), “War and Wages: The Strength of Instrumental Variables and Their Sensitivity to Unobserved Biases,” Journal of the American Statistical Association, 103, 924-933.
• Joffe, M.M., Small, D.S., Brunelli, S., Ten Have, T.R., and Feldman, H.I. (2008), “Extended Instrumental Variables Estimation for Overall Effects,” International Journal of Biostatistics, 4.
• Baiocchi, M., Small, D.S., Lorch, S.A. and Rosenbaum, P.R. (2010), “Building a Stronger Instrument in an Observational Study of Perinatal Care for Premature Infants,” Journal of the American Statistical Association, 105, 1285-1296.
• e-mail: dsmall@wharton.upenn.edu

Alternative estimands

• Assumed that we are interested in the overall effect
  – Vascular Access (VA) inevitably affects Dose of Dialysis (DD)
    • Type of VA limits possible dose
• However, it may be possible to alter DD
• Interested in
  – Effect of DD
  – Effect of VA if it affects DD in a different fashion from under current practice

Alternative estimands (cont’d)

• Show altered effect, new intervention on DAG
• Formulate in terms of potential outcomes
• Contrast for different levels of treatment

[Diagram: DAG with nodes A, S, Y, R, X, U.]

\(S_{g,a}\): target level of S under treatment a, plan g

\(E(Y_{a S_{g,a}})\): expected level of Y under treatment a, plan g

Alternative estimands (cont’d)

• Defining intervention on S
  – Individualize target levels of S
    • e.g., base on maximum tolerated DD
    • Insufficient information in established databases (e.g., DOPPS)
  – Set target level of S based on A, covariates X
    • Currently little information to set target levels
    • Available covariate information may be insufficient to determine whether a particular DD is feasible for an individual

Alternative estimands (cont’d)

• Defining intervention on S
  – Speculate about feasible interventions on S at the aggregate level
    • Consider effects of A on S under those interventions; i.e., propose a value for ΦA*
    • Compute overall effect from component effects: ψA + ψSΦA*
    • Perform sensitivity analysis for values of ΦA*

One-step approach

• Estimator of effect of A on S does not require either standard ignorability or IV
• Can we do the same for the overall effect of A on Y?
• Remove S from graph, redraw diagram
• Graph is identical to the original graph with Y removed
• Use the same methods of estimation as for the effect of A on S

[Diagrams: DAG with nodes A, Y, R, X, U and R*X; DAG with nodes A, S, R, X, U and R*X.]
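A minimal sketch of the one-step idea, again on hypothetical toy data with helper names invented here: S is dropped, A is regressed on X, R and X*R, and Y is then regressed on the fitted Â together with X and R, so that only X*R serves as the excluded instrument. The toy model below assumes a direct effect of 0.1, an effect of S on Y of 0.2 and an effect of A on S of 0.3, so the overall effect it builds in is 0.1 + 0.2 × 0.3.

```python
def ols(rows, y):
    """Least-squares coefficients for y ~ rows, solved via the normal equations."""
    k = len(rows[0])
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for col in range(k):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, k), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(k):
            if i != col:
                f = m[i][col] / m[col][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]


def one_step_overall_effect(X, R, A, Y):
    """2SLS of Y on A, controlling for X and R, with X*R as excluded instrument."""
    first_stage = [[1, x, r, x * r] for x, r in zip(X, R)]
    coef = ols(first_stage, A)
    a_hat = [sum(c * v for c, v in zip(coef, row)) for row in first_stage]
    return ols([[1, x, r, a] for x, r, a in zip(X, R, a_hat)], Y)[3]


# Hypothetical toy data: U is an unmeasured confounder balanced within each
# (X, R) cell; S lies on the pathway from A to Y but is not used in estimation.
cells = [(x, r, u) for x in (0, 1, 2) for r in (0, 1) for u in (1, -1)]
X = [c[0] for c in cells]
R = [c[1] for c in cells]
A = [1 + x + r + 2 * x * r + u for x, r, u in cells]
S = [0.5 + x + 0.5 * r + 0.3 * a + 0.5 * u for (x, r, u), a in zip(cells, A)]
Y = [2 + x + 0.1 * a + 0.2 * s + 3 * u for (x, r, u), a, s in zip(cells, A, S)]

one_step = one_step_overall_effect(X, R, A, Y)
print(round(one_step, 6))
```

Because the one-step regression never models S explicitly, it targets the overall effect directly rather than assembling it from component effects.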

Results for Card’s Data

Method | Estimate of Overall Effect of A | SE
Path Analysis (OLS) | 0.0762 | 0.0004
Two Step Extended IV | 0.1503 | 0.0462
One Step Extended IV | 0.1500 | 0.0462

Y = log earnings
A = years of schooling
S = lives in SMSA as an adult
R = lived near 4 year college growing up
X = experience, experience squared, black indicator, indicator for living in SMSA growing up, indicators for region growing up, mother and father’s education
