what's new in causal inference: from propensity scores and mediation to external validity

WHAT'S NEW IN CAUSAL INFERENCE:

From Propensity Scores And Mediation To External Validity

Judea PearlUniversity of California

Los Angeles(www.cs.ucla.edu/~judea/)

1. Unified conceptualization of counterfactuals,

structural-equations, and graphs

2. Propensity scores demystified

3. Direct and indirect effects (Mediation)

4. External validity mathematized

OUTLINE

TRADITIONAL STATISTICALINFERENCE PARADIGM

Inference

Q(P)(Aspects of P)

PJoint

Distribution

e.g.,Infer whether customers who bought product Awould also buy product B.Q = P(B | A)

Inference

Q(M)(Aspects of M)

Data Generating

M – Invariant strategy (mechanism, recipe, law, protocol) by which Nature assigns values to variables in the analysis.

JointDistribution

THE STRUCTURAL MODELPARADIGM

“Think Nature, not experiment!”•

INPUT OUTPUT

FAMILIAR CAUSAL MODELORACLE FOR MANIPILATION

STRUCTURALCAUSAL MODELS

Definition: A structural causal model is a 4-tupleV,U, F, P(u), where• V = {V1,...,Vn} are endogeneas variables• U = {U1,...,Um} are background variables• F = {f1,..., fn} are functions determining V,

vi = fi(v, u)• P(u) is a distribution over UP(u) and F induce a distribution P(v) over observable variables

Yuxy e.g.,

CAUSAL MODELS AND COUNTERFACTUALS

Definition: The sentence: “Y would be y (in situation u), had X been x,”

denoted Yx(u) = y, means:The solution for Y in a mutilated model Mx, (i.e., the equations

for X replaced by X = x) with input U=u, is equal to y.

)()( uYuY xMx

The Fundamental Equation of Counterfactuals:

READING COUNTERFACTUALSFROM SEM

Data shows: = 0.7, = 0.5, = 0.4A student named Joe, measured X=0.5, Z=1, Y=1.9Q1: What would Joe’s score be had he doubled his study time?

X Y7.0

5.0 4.03

TimeStudy

Treatment

9.1)(2 uYz

Q1: What would Joe’s score be had he doubled his study time?Answer: Joe’s score would be 1.9Or,In counterfactual notaion:

5.00.1

1 5.04.0

READING COUNTERFACTUALS

X Y7.0

5.0 4.03

5.0X 5.1

0 25.1

5.0X Y 25.2

4.075.03

1 5.04.0

Q2: What would Joe’s score be, had the treatment been 0 and had he studied at whatever level he would have studied had the treatment been 1?

READING COUNTERFACTUALS

X Y7.0

5.0 4.03

5.0X 5.1

25.1Z4.0

1 25.2

In particular:

),|(),|'(

)()()|(

')(':'

yxuPyxyYP

uPyYPyP

yuxYux

CAUSAL MODELS AND COUNTERFACTUALS

Definition: The sentence: “Y would be y (in situation u), had X been x,”

denoted Yx(u) = y, means:

The solution for Y in a mutilated model Mx, (i.e., the equations for X replaced by X = x) with input U=u, is equal to y.

)(),()(,)(:

uPzZyYPzuZyuYu

Joint probabilities of counterfactuals:

Define:

Assume:

Identify:

Estimate:

THE FIVE NECESSARY STEPSOF CAUSAL ANALYSIS

Express the target quantity Q as a function Q(M) that can be computed from any model M.

Formulate causal assumptions A using some formal language.

Determine if Q is identifiable given A.

Estimate Q if it is identifiable; approximate it, if it is not.

Test the testable implications of A (if any).

))(|())(|( 01 xdoYExdoYE ATEe.g.,

CAUSAL MODEL

A - CAUSAL ASSUMPTIONS

Q Queries of interest

Q(P) - Identified estimands

Data (D)

Q - Estimates of Q(P)

Causal inference

T(MA) - Testable implications

Statistical inference

Goodness of fit

Model testingProvisional claims

),|( ADQQ )(Tg

A* - Logicalimplications of A

CAUSAL MODEL

THE LOGIC OF CAUSAL ANALYSIS

IDENTIFICATION IN SCM

Find the effect of X on Y, P(y|do(x)), given the

causal assumptions shown in G, where Z1,..., Zk are auxiliary variables.

Can P(y|do(x)) be estimated if only a subset, Z, can be measured?

ELIMINATING CONFOUNDING BIASTHE BACK-DOOR CRITERION

P(y | do(x)) is estimable if there is a set Z ofvariables such that Z d-separates X from Y in Gx.

• Moreover,

(“adjusting” for Z) Ignorability

z z zxP

zyxPzPzxyPxdoyP

)|(),,(

)(),|())(|(

Front Door

EFFECT OF WARM-UP ON INJURY (After Shrier & Platt, 2008)

No, no!

Watch out!

Warm-up Exercises (X) Injury (Y)

PROPENSITY SCORE ESTIMATOR(Rosenbaum & Rubin, 1983)

),,,,|1(),,,,( 5432154321 zzzzzXPzzzzzL

Adjustment for L replaces Adjustment for Z

lLPxlLyPzPxzyP )(),|()(),|(Theorem:

P(y | do(x)) = ?

WHAT PROPENSITY SCORE (PS)PRACTITIONERS NEED TO KNOW

1. The asymptotic bias of PS is EQUAL to that of ordinary adjustment (for same Z).

2. Including an additional covariate in the analysis CAN SPOIL the bias-reduction potential of others.

)|1()( zZXPzL

lPxlyPzPxzyP )(),|()(),|(

3. In particular, instrumental variables tend to amplify bias.4. Choosing sufficient set for PS, requires knowledge of the

model.

ZX YX Y X Y

SURPRISING RESULT:Instrumental variables are Bias-Amplifiers in linear models (Bhattarcharya & Vogt 2007; Wooldridge 2009)

“Naive” bias

Adjusted bias

11))(|(),|( cc

cccxdoYE

2102100 ))(|()|( ccccccxdoYEx

(Unobserved)

INTUTION:When Z is allowed to vary, it absorbs (or explains) some of the changes in X.

When Z is fixed the burden falls on U alone, and transmitted to Y (resulting in a higher bias)

WHAT’S BETWEEN AN INSTRUMENT AND A CONFOUNDER?Should we adjust for Z?

ANSWER:

CONCLUSION:

Yes, if

No, otherwise

Adjusting for a parent of Y is safer than a parent of X

CONCLUSIONS

• The prevailing practice of adjusting for all covariates, especially those that are good predictors of X (the “treatment assignment,” Rubin, 2009) is totally misguided.

• The “outcome mechanism” is as important, and much safer

• As X-rays are to the surgeon, graphs are for causation

REGRESSION VS. STRUCTURAL EQUATIONS(THE CONFUSION OF THE CENTURY)

Regression (claimless, nonfalsifiable): Y = ax + Y

Structural (empirical, falsifiable): Y = bx + uY

Claim: (regardless of distributions): E(Y | do(x)) = E(Y | do(x), do(z)) = bx

Q. When is b estimable by regression methods?A. Graphical criteria available

The mothers of all questions:Q. When would b equal a?A. When all back-door paths are blocked, (uY X)

TWO PARADIGMS FOR CAUSAL INFERENCE

Observed: P(X, Y, Z,...)Conclusions needed: P(Yx=y), P(Xy=x | Z=z)...

How do we connect observables, X,Y,Z,…to counterfactuals Yx, Xz, Zy,… ?

N-R modelCounterfactuals areprimitives, new variables

Super-distribution

Structural modelCounterfactuals are derived quantities

Subscripts modify the model and distribution )()( yYPyYP xMx ,...,,,

,...),,...,,(*

zxZYZYX

constrain

“SUPER” DISTRIBUTIONIN N-R MODEL

Xy=0 0

inconsistency: x = 0 Yx=0 = Y Y = xY1 + (1-x) Y0

xyxzyx

ZYZYZYXP

...),...,...,,...,,(*

:Defines

ARE THE TWO PARADIGMS EQUIVALENT?

• Yes (Galles and Pearl, 1998; Halpern 1998)

• In the N-R paradigm, Yx is defined by consistency:

• In SCM, consistency is a theorem.

• Moreover, a theorem in one approach is a theorem in the other.

• Difference: Clarity of assumptions and their implications

01 )1( YxxYY

AXIOMS OF STRUCTURAL COUNTERFACTUALS

1. Definiteness

2. Uniqueness

3. Effectiveness

4. Composition (generalized consistency)

5. Reversibility

xuXtsXx y )( ..

')')((&))(( xxxuXxuX yy

xuX xw )(

)()()( uYuYxuX wwxw

yuYwuWyuY xxyxw )())((&))((

Yx(u)=y: Y would be y, had X been x (in state U = u)

(Galles, Pearl, Halpern, 1998):

FORMULATING ASSUMPTIONSTHREE LANGUAGES

},{),()(

),()()()(

XYZuYuY

uXuXuXuX

2. Counterfactuals:

1. English: Smoking (X), Cancer (Y), Tar (Z), Genotypes (U)

3. Structural:

1. Expressing scientific knowledge2. Recognizing the testable implications of

one's assumptions 3. Locating instrumental variables in a system

of equations4. Deciding if two models are equivalent or

nested5. Deciding if two counterfactuals are

independent given another6. Algebraic derivations of identifiable

estimands

COMPARISON BETWEEN THEN-R AND SCM LANGUAGES

GRAPHICAL – COUNTERFACTUALS SYMBIOSIS

Every causal graph expresses counterfactuals assumptions, e.g., X Y Z

consistent, and readable from the graph.

• Express assumption in graphs• Derive estimands by graphical or algebraic

methods

)()(, uYuY xzx 1. Missing arrows Y Z

2. Missing arcs Y Z yx ZY

EFFECT DECOMPOSITION(direct vs. indirect effects)

1. Why decompose effects?

2. What is the definition of direct and indirect effects?

3. What are the policy implications of direct and indirect effects?

4. When can direct and indirect effect be estimated consistently from experimental and nonexperimental data?

WHY DECOMPOSE EFFECTS?

1. To understand how Nature works

2. To comply with legal requirements

3. To predict the effects of new type of interventions:

Signal routing, rather than variable fixing

LEGAL IMPLICATIONSOF DIRECT EFFECT

What is the direct effect of X on Y ?

(averaged over z)

))(),(())(),( 01 zdoxdoYEzdoxdoYECDE ||(

(Qualifications)

(Hiring)

(Gender)

Can data prove an employer guilty of hiring discrimination?

Adjust for Z? No! No!

FISHER’S GRAVE MISTAKE(after Rubin, 2005)

Compare treated and untreated lots of same density

No! No!

))(),(())(),( 01 zdoxdoYEzdoxdoYE ||(

(Plant density)

(Yield)

(Soil treatment)

What is the direct effect of treatment on yield?

(Latent factor)

Proposed solution (?): “Principal strata”

z = f (x, u)y = g (x, z, u)

NATURAL INTERPRETATION OFAVERAGE DIRECT EFFECTS

Natural Direct Effect of X on Y:The expected change in Y, when we change X from x0 to x1 and, for each u, we keep Z constant at whatever value it attained before the change.

In linear models, DE = Natural Direct Effect

][001 xZx YYE

);,( 10 YxxDE

Robins and Greenland (1992) – “Pure”

)( 01 xx

DEFINITION AND IDENTIFICATION OF NESTED COUNTERFACTUALS

Consider the quantity

Given M, P(u), Q is well defined

Given u, Zx*(u) is the solution for Z in Mx*, call it z

is the solution for Y in Mxz

Can Q be estimated from data?

Experimental: nest-free expressionNonexperimental: subscript-free expression

)]([ )(*uYEQ uxZxu

entalnonexperim

alexperiment

)()(*uY uxZx

z = f (x, u)y = g (x, z, u)

DEFINITION OFINDIRECT EFFECTS

Indirect Effect of X on Y:The expected change in Y when we keep X constant, say at x0, and let Z change to whatever value it would have attained had X changed to x1.

In linear models, IE = TE - DE

][010 xZx YYE

);,( 10 YxxIE

No Controlled Indirect Effect

POLICY IMPLICATIONS OF INDIRECT EFFECTS

GENDER QUALIFICATION

HIRING

What is the indirect effect of X on Y?

The effect of Gender on Hiring if sex discriminationis eliminated.

IGNORE

Deactivating a link – a new type of intervention

1. The natural direct and indirect effects are identifiable in Markovian models (no confounding),

2. And are given by:

3. Applicable to linear and non-linear models, continuous and discrete variables, regardless of distributional form.

MEDIATION FORMULAS

))](|())(|())[,(|(

)).(|())],(|()),(|([

revIEDETE

xdozPxdozPzxdoYEIE

xdozPzxdoYEzxdoYEDE

IEDETE WHY

IEDETE

In linear systems

xzzmxy

mIEDETE

mmmTELinear + interaction

MEDIATION FORMULASIN UNCONFOUNDED MODELS

)|()|(

])|()|()[,|([

)|()],|(),|([

xYExYETE

xzPxzPzxYEIE

xzPzxYEzxYEDE

mediation to owed responses of Fraction

mediationby explained responses of Fraction

)(revIEDETE

DETEmmIE

mediation

disablingby prevented Effect

alone mediationby sustained Effect

IEDETE WHY

IErevIE )(

Disabling mediation

Disabling direct path

TE - DE

In linear systems

Is NOT equal to:

MEDIATION FORMULAFOR BINARY VARIABLES

)()1)((

000101

0011100010gghhIE

hgghggDE

XX ZZ YY EE((Y|x,zY|x,z))==gxz EE((Z|xZ|x))==hhxx

nn11 00 00 00

nn22 00 00 11

nn33 00 11 00

nn44 00 11 11

nn55 11 00 00

nn66 11 00 11

nn77 11 11 00

nn88 11 11 11

43 hnnnn

87 hnnnn

RAMIFICATION OF THEMEDIATION FORMULA

• DE should be averaged over mediator levels,

IE should NOT be averaged over exposure levels.

• TE-DE need not equal IETE-DE = proportion for whom mediation is necessary

IE = proportion for whom mediation is sufficient

• TE-DE informs interventions on indirect pathways

IE informs intervention on direct pathways.

TRANSPORTABILITY -- WHEN CANWE EXTRPOLATE EXPERIMENTAL FINDINGS TO

DIFFERENT POPULATIONS?

Experimental study in LAMeasured:

Problem: We find

(LA population is younger)

What can we say about Intuition:

)),(|(),,(

zxdoyPzyxP

))(|(* xdoyP

Observational study in NYCMeasured: ),,(* zyxP

)(*)( zPzP

zPzxdoyPxdoyP )(*)),(|())(|(*

Z = age

TRANSPORT FORMULAS DEPEND ON THE STORY

(a)X Y

a) Z represents age

b) Z represents language skill

c) Z represents a bio-marker

???))(|(* xdoyP

X Y(c)

TRANSPORTABILITY(Pearl and Bareinboim, 2010)

Definition 1 (Transportability)Given two populations, denoted and *, characterized by models M = <F,V,U> and M* = <F,V,U+S>, respectively, a causal relation R is said to be transportable from to * if

1. R() is estimable from the set I of interventional studies on , and

2. R(*) is identified from I, P*, G, and G + S.

S = external factors responsible for M M*

TRANSPORT FORMULAS DEPEND ON THE STORY

a) Z represents age

b) Z represents language skill

c) Z represents a bio-marker

))(|(* xdoyP

X Y(c)

))(|( xdoyP

xzPzxdoyP )|(*)),(|())(|(* xdoyP

WHICH MODEL LICENSES THE TRANSPORT OF THE CAUSAL EFFECT

X Y(f)

X Y(c)

W X Y(e)

(c)X YZ

X Y(c)

W X YZ

W X Y(f

X Y(d)

W X YZ

DETERMINE IF THE CAUSAL EFFECT IS TRANSPORTABLE

The transport formula

What measurements need to be taken in the study and in the target population?

tPtxdowPwzPzxdoyP

)(*)),(|()|(*)),(|(

))(|(*

I TOLD YOU CAUSALITY IS SIMPLE

• Formal basis for causal and counterfactual

inference (complete)

• Unification of the graphical, potential-outcome

and structural equation approaches

• Friendly and formal solutions to

century-old problems and confusions.

• No other method can do better (theorem)

CONCLUSIONS

Thank you for agreeing with everything I said.

what's new in causal inference: from propensity scores and mediation to external validity

Documents

a nationwide causal mediation analysis of survival...

implementation and reporting of causal mediation analysis...

chapter 8 causal mediation analysis using...

causal mediation analysis. - department of data analysis ·...

propensity score matching for causal inference: ...

understanding causal pathways within health systems policy...

what's new in causal inference: from propensity scores and...

mediation: multiple variables david a. kenny. 2 mediation...

process evaluation and causal mediation analysis using...

propensity scores and causal inference learning...

a general approach to causal mediation...

causal mediation analysis short course...causal mediation...

mediation: r package for causal mediation...

mediation effect of locus of control on the causal ... ·...

semiparametric estimation for causal mediation analysis

mediation: r package for causal mediation analysis2....

mediation: r package for causal mediation analysis...

propensity score weighting for causal inference with...

paper sas1991:2018 causal mediation analysis …...paper...

applying propensity score and mediation analyses to ... ·...