Machine Learning for Healthcare
HST.956, 6.S897
Lecture 15: Causal Inference Part 2
David Sontag
Acknowledgement: adapted from slides by Uri Shalit (Technion)


Post on 18-Sep-2020



Reminder: Potential Outcomes

• Each unit (individual) $x_i$ has two potential outcomes:
  – $Y_0(x_i)$ is the potential outcome had the unit not been treated: "control outcome"
  – $Y_1(x_i)$ is the potential outcome had the unit been treated: "treated outcome"

• Conditional average treatment effect for unit $i$: $CATE(x_i) = \mathbb{E}_{Y_1 \sim p(Y_1 \mid x_i)}[Y_1 \mid x_i] - \mathbb{E}_{Y_0 \sim p(Y_0 \mid x_i)}[Y_0 \mid x_i]$

• Average Treatment Effect: $ATE = \mathbb{E}_{x \sim p(x)}[CATE(x)]$


Two common approaches for counterfactual inference

• Covariate adjustment
• Propensity scores


Covariate adjustment (reminder)

Explicitly model the relationship between treatment, confounders, and outcome:

[Figure: covariates (features) $x_1, x_2, \dots$ and the treatment $T$ are inputs to a regression model $f(x, T)$ that predicts the outcome $y$.]


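Covariate adjustment (reminder)

• Under ignorability, $CATE(x) = \mathbb{E}[Y_1 \mid T = 1, x] - \mathbb{E}[Y_0 \mid T = 0, x]$, and $ATE = \mathbb{E}_{x \sim p(x)}[CATE(x)]$

• Fit a model $f(x, t) \approx \mathbb{E}[Y_t \mid T = t, x]$, then: $\widehat{CATE}(x_i) = f(x_i, 1) - f(x_i, 0)$.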
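A minimal sketch of covariate adjustment, assuming a single outcome model fit by ordinary least squares on the hypothetical features $[1, x, t, x \cdot t]$ (the interaction lets the estimated effect vary with $x$). The data-generating process, the coefficient values, and the helper names `features`, `fit_outcome_model`, and `cate_hat` are illustrative assumptions; any regression method could stand in for the least-squares fit.

```python
import numpy as np

def features(x, t):
    # Hypothetical design matrix for f(x, t): intercept, x, t and the interaction x * t
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    return np.column_stack([np.ones_like(x), x, t, x * t])

def fit_outcome_model(x, t, y):
    # Least-squares fit of f(x, t) ≈ E[Y_t | T = t, x] on the observed (factual) data
    coef, *_ = np.linalg.lstsq(features(x, t), y, rcond=None)
    return lambda x_new, t_new: features(x_new, t_new) @ coef

def cate_hat(f, x):
    # CATE_hat(x_i) = f(x_i, 1) - f(x_i, 0)
    x = np.asarray(x, dtype=float)
    return f(x, np.ones_like(x)) - f(x, np.zeros_like(x))

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
t = rng.binomial(1, 1 / (1 + np.exp(-x)))                       # treatment depends on x (confounding)
y = 2 * x + (1 + 0.5 * x) * t + rng.normal(scale=0.1, size=n)   # true CATE(x) = 1 + 0.5 x

f = fit_outcome_model(x, t, y)
print(cate_hat(f, [0.0, 2.0]))   # approximately [1.0, 2.0]
print(cate_hat(f, x).mean())     # ATE estimate: average of CATE_hat over the sample
```

Because the model is fit only on factual outcomes, the counterfactual predictions $f(x_i, 1-t_i)$ are trustworthy only to the extent that the model is correctly specified, which the next slides examine.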


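Covariate adjustment with linear models

• Assume that: $Y_t(x) = \beta x + \gamma \cdot t + \epsilon_t$, with $\mathbb{E}[\epsilon_t] = 0$

[Figure: causal diagram relating age ($x$), medication ($t$), and blood pressure ($Y_t$).]

• Then: $CATE(x) := \mathbb{E}[Y_1(x) - Y_0(x)] = \mathbb{E}[(\beta x + \gamma + \epsilon_1) - (\beta x + \epsilon_0)] = \gamma$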

• $ATE := \mathbb{E}_{p(x)}[CATE(x)] = \gamma$

• For causal inference, we need to estimate $\gamma$ well, not $Y_t(x)$: identification, not prediction

• Major difference between ML and statistics
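A small simulation illustrates why the target is $\gamma$ rather than a good fit to $Y_t(x)$. It assumes hypothetical values $\beta = 0.8$, $\gamma = -5$ and a hypothetical treatment rule in which older patients are more likely to receive medication. The naive difference in mean blood pressure between treated and untreated patients is confounded by age, while the least-squares coefficient on $t$ in a linear regression of blood pressure on age and $t$ recovers $\gamma$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
beta, gamma = 0.8, -5.0                      # hypothetical effects of age and of medication
age = rng.uniform(20, 80, size=n)
t = rng.binomial(1, (age - 20) / 60)         # older patients are more likely to be medicated
bp = beta * age + gamma * t + rng.normal(scale=2.0, size=n)   # Y_t(x) = beta x + gamma t + eps

naive = bp[t == 1].mean() - bp[t == 0].mean()                 # confounded by age
X = np.column_stack([np.ones(n), age, t])                     # intercept, age, treatment
coef, *_ = np.linalg.lstsq(X, bp, rcond=None)
gamma_hat = coef[2]                                           # coefficient on t
print(f"naive difference: {naive:.2f}, gamma_hat: {gamma_hat:.2f}")
```

In this setup the naive difference even has the wrong sign, because treated patients are on average about 20 years older.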


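What happens if the true model is not linear?

• True data generating process, $x \in \mathbb{R}$: $Y_t(x) = \beta x + \gamma \cdot t + \delta \cdot x^2$, so $ATE = \mathbb{E}[Y_1 - Y_0] = \gamma$

• Hypothesized model: $\hat{Y}_t(x) = \hat{\beta} x + \hat{\gamma} \cdot t$

• Fitting the hypothesized model by least squares gives, in the population limit:
$\hat{\gamma} = \gamma + \delta \, \frac{\mathbb{E}[xt]\,\mathbb{E}[x^3] - \mathbb{E}[x^2]\,\mathbb{E}[x^2 t]}{\mathbb{E}[xt]^2 - \mathbb{E}[x^2]\,\mathbb{E}[t^2]}$

Depending on $\delta$, $\hat{\gamma}$ can be made arbitrarily large or small!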
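The bias can be checked numerically. This sketch assumes $x \sim \mathcal{N}(0, 1)$, a hypothetical logistic treatment rule, and illustrative values of $\beta, \gamma, \delta$. It fits the misspecified model (no intercept, as in the hypothesized model) by least squares and compares $\hat{\gamma}$ with the closed-form value implied by the normal equations, evaluated at sample moments. With no noise the two agree up to floating-point error, and both are far from the true $\gamma$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
beta, gamma, delta = 1.0, 2.0, 3.0                               # hypothetical true parameters
x = rng.normal(size=n)
t = rng.binomial(1, 1 / (1 + np.exp(-2 * x))).astype(float)      # treatment depends on x
y = beta * x + gamma * t + delta * x**2                          # true, non-linear outcome model

# Hypothesized model y ≈ beta_hat * x + gamma_hat * t (no intercept), fit by least squares
coef, *_ = np.linalg.lstsq(np.column_stack([x, t]), y, rcond=None)
gamma_hat = coef[1]

# Closed-form gamma_hat from the normal equations, using sample moments
E = np.mean
bias = delta * (E(x * t) * E(x**3) - E(x**2) * E(x**2 * t)) / (E(x * t)**2 - E(x**2) * E(t**2))
print(gamma_hat, gamma + bias)                                   # agree up to floating-point error
```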


Covariate adjustment with non-linear models

• Random forests and Bayesian trees: Hill (2011), Athey & Imbens (2015), Wager & Athey (2015)

• Gaussian processes: Hoyer et al. (2009), Zigler et al. (2012)

• Neural networks: Beck et al. (2000), Johansson et al. (2016), Shalit et al. (2016), Lopez-Paz et al. (2016)


Example: Gaussian processes

[Figure: two panels plotting the outcome $y$ (roughly 80 to 120) against $x$ (10 to 60) for treated and control units, with fitted curves $Y_1(x)$ and $Y_0(x)$. Left, "GP-Independent": separate treated and control models. Right, "GP-Grouped": joint treated and control model.]

Figures: Vincent Dorie & Jennifer Hill
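A minimal numerical sketch of the "separate treated and control models" variant: one GP regression per arm, with a squared-exponential kernel whose hyperparameters are fixed by hand (in practice they would be fit, for example by maximizing the marginal likelihood). The response curve, effect size, and function names are hypothetical; the sketch does not reproduce the figure or implement the joint (grouped) model.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=5.0, variance=100.0):
    # Squared-exponential kernel for 1-D inputs (hyperparameters fixed by hand, not learned)
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior_mean(x_train, y_train, x_test, noise_sd=1.0):
    # GP regression posterior mean on mean-centred outcomes:
    # m(x*) = ybar + K(x*, X) [K(X, X) + noise_sd^2 I]^{-1} (y - ybar)
    y_bar = y_train.mean()
    K = rbf_kernel(x_train, x_train) + noise_sd**2 * np.eye(len(x_train))
    alpha = np.linalg.solve(K, y_train - y_bar)
    return y_bar + rbf_kernel(x_test, x_train) @ alpha

rng = np.random.default_rng(10)
n = 300
x = rng.uniform(10, 60, size=n)
t = rng.binomial(1, np.clip((x - 10) / 50, 0.2, 0.8))   # treatment more likely at larger x
y0 = 80 + 20 * np.sin(x / 10)                            # hypothetical control response curve
y = y0 + 10 * t + rng.normal(size=n)                     # true CATE(x) = 10 everywhere

# Separate GP for each treatment arm, evaluated on a common grid
grid = np.linspace(15, 55, 9)
y1_hat = gp_posterior_mean(x[t == 1], y[t == 1], grid)
y0_hat = gp_posterior_mean(x[t == 0], y[t == 0], grid)
print(np.round(y1_hat - y0_hat, 1))                      # estimated CATE on the grid
```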


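Example: Neural networks

Shalit, Johansson, Sontag. Estimating Individual Treatment Effect: Generalization Bounds and Algorithms. ICML, 2017

[Figure: covariates pass through neural network layers to a shared representation $\Phi$; separate output layers give the predicted potential outcomes, and the intervention determines which prediction enters the learning objective together with the observed outcome.]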
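The architecture can be sketched as a forward pass. This sketch assumes one shared ReLU layer for $\Phi$, two linear output heads, randomly initialized (untrained) weights, and a squared-error loss on the factual outcome; the sizes and parameter names are hypothetical. The objective in Shalit et al. (2017) additionally penalizes the distance between the treated and control representation distributions, which this sketch omits.

```python
import numpy as np

rng = np.random.default_rng(3)
d, k = 5, 8   # hypothetical: number of covariates and representation width

# Hypothetical randomly initialized parameters (a real model would learn these by gradient descent)
params = {
    "W_phi": rng.normal(scale=0.3, size=(d, k)),  # shared representation layer
    "w1": rng.normal(scale=0.3, size=k),          # treated-outcome head
    "w0": rng.normal(scale=0.3, size=k),          # control-outcome head
}

def representation(params, x):
    # Phi(x): one shared ReLU layer
    return np.maximum(x @ params["W_phi"], 0.0)

def potential_outcomes(params, x):
    # Both predicted potential outcomes are computed from the same shared representation
    phi = representation(params, x)
    return phi @ params["w1"], phi @ params["w0"]

def factual_loss(params, x, t, y):
    # Only the head matching the observed intervention t is compared with the observed outcome y
    y1_hat, y0_hat = potential_outcomes(params, x)
    y_hat = np.where(t == 1, y1_hat, y0_hat)
    return np.mean((y_hat - y) ** 2)

x = rng.normal(size=(4, d))
t = np.array([1, 0, 1, 0])
y = rng.normal(size=4)
y1_hat, y0_hat = potential_outcomes(params, x)
print("CATE estimates:", y1_hat - y0_hat)
print("factual loss:", factual_loss(params, x, t, y))
```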


Matching

• Find each unit's long-lost counterfactual identical twin, check up on his outcome


[Figure: "Obama, had he gone to law school" alongside "Obama, had he gone to business school".]


• Used for estimating both ATE and CATE


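Match to nearest neighbor from opposite group

[Figure: treated and control units plotted by age and Charlson comorbidity index; each unit is matched to its nearest neighbor from the opposite group.]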


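1-NN Matching

• Let $d(\cdot, \cdot)$ be a metric between $x$'s
• For each $i$, define $j(i) = \operatorname{argmin}_{j \text{ s.t. } t_j \neq t_i} d(x_j, x_i)$
  $j(i)$ is the nearest counterfactual neighbor of $i$
• $t_i = 1$, unit $i$ is treated: $\widehat{CATE}(x_i) = y_i - y_{j(i)}$
• $t_i = 0$, unit $i$ is control: $\widehat{CATE}(x_i) = y_{j(i)} - y_i$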

• $\widehat{CATE}(x_i) = (2t_i - 1)(y_i - y_{j(i)})$

• $\widehat{ATE} = \frac{1}{n} \sum_{i=1}^{n} \widehat{CATE}(x_i)$
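The 1-NN matching estimator translates directly into code. This sketch assumes Euclidean distance as the metric $d$ and breaks ties by taking the first nearest neighbor; `one_nn_matching` is a hypothetical helper name, and the four-unit dataset in the usage lines is made up for illustration.

```python
import numpy as np

def one_nn_matching(X, t, y):
    """1-NN matching estimates of CATE(x_i) and ATE, using Euclidean distance as d(., .)."""
    X = np.asarray(X, dtype=float)
    t = np.asarray(t)
    y = np.asarray(y, dtype=float)
    cate = np.empty(len(y))
    for i in range(len(y)):
        opposite = np.flatnonzero(t != t[i])                 # candidates j with t_j != t_i
        dists = np.linalg.norm(X[opposite] - X[i], axis=1)   # d(x_j, x_i)
        j = opposite[np.argmin(dists)]                       # j(i): nearest counterfactual neighbor
        cate[i] = (2 * t[i] - 1) * (y[i] - y[j])             # sign flips for control units
    return cate, cate.mean()

X = np.array([[0.0], [1.0], [10.0], [11.0]])
t = np.array([1, 0, 1, 0])
y = np.array([5.0, 2.0, 20.0, 14.0])
cate, ate = one_nn_matching(X, t, y)
print(cate, ate)   # [3. 3. 6. 6.] 4.5
```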


Matching

• Interpretable, especially in the small-sample regime
• Nonparametric
• Heavily reliant on the underlying metric
• Could be misled by features which don't affect the outcome
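A toy example of the last two points, with hypothetical data: one treated unit and two controls, where column 0 is age (assumed to affect the outcome) and column 1 is a feature on a much larger scale that is assumed not to affect the outcome. Adding the irrelevant feature to the Euclidean metric changes which control is chosen as the match.

```python
import numpy as np

def nearest_opposite(X, t, i):
    # Index of the nearest unit (Euclidean distance) with the opposite treatment
    opposite = np.flatnonzero(t != t[i])
    return opposite[np.argmin(np.linalg.norm(X[opposite] - X[i], axis=1))]

# Hypothetical data: column 0 = age, column 1 = an irrelevant, large-scale feature
X = np.array([[50.0, 900.0],
              [51.0, 100.0],
              [80.0, 880.0]])
t = np.array([1, 0, 0])

print(nearest_opposite(X[:, :1], t, 0))  # age only -> 1 (the 51-year-old)
print(nearest_opposite(X, t, 0))         # with the irrelevant feature -> 2 (the 80-year-old)
```

Rescaling or selecting features before matching changes the metric, which is why the choice of metric matters so much.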


Covariate adjustment and matching

• Matching is equivalent to covariate adjustment with two 1-nearest-neighbor regressors: $\hat{Y}_1(x) = y_{NN_1(x)}$, $\hat{Y}_0(x) = y_{NN_0(x)}$, where $y_{NN_t(x)}$ is the outcome of the nearest neighbor of $x$ among units with treatment assignment $t = 0, 1$

• 1-NN matching is in general inconsistent, though only with small bias (Imbens 2004)
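The equivalence can be checked numerically. This sketch uses hypothetical simulated data and a Euclidean metric: it fits one 1-NN regressor on the treated units and another on the control units, takes the difference of their predictions, and compares the result with the direct matching estimate. Each unit is its own nearest neighbor within its own arm, so the two estimates coincide.

```python
import numpy as np

def nn_regressor(X_train, y_train):
    # 1-nearest-neighbor regressor: predict the outcome of the closest training point
    def predict(X_new):
        d = np.linalg.norm(X_new[:, None, :] - X_train[None, :, :], axis=2)
        return y_train[np.argmin(d, axis=1)]
    return predict

rng = np.random.default_rng(4)
X = rng.normal(size=(30, 2))
t = rng.binomial(1, 0.5, size=30)
y = X[:, 0] + 2 * t + rng.normal(scale=0.1, size=30)

Y1_hat = nn_regressor(X[t == 1], y[t == 1])   # fit on treated units only
Y0_hat = nn_regressor(X[t == 0], y[t == 0])   # fit on control units only
cate_adjust = Y1_hat(X) - Y0_hat(X)           # covariate-adjustment estimate

# Direct 1-NN matching: compare each unit with its nearest neighbor in the opposite group
cate_match = np.empty(30)
for i in range(30):
    opp = np.flatnonzero(t != t[i])
    j = opp[np.argmin(np.linalg.norm(X[opp] - X[i], axis=1))]
    cate_match[i] = (2 * t[i] - 1) * (y[i] - y[j])

print(np.allclose(cate_adjust, cate_match))   # True
```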


Two common approaches for counterfactual inference

• Covariate adjustment
• Propensity scores


Propensity scores

• Tool for estimating ATE
• Basic idea: turn an observational study into a pseudo-randomized trial by re-weighting samples, similar to importance sampling


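Inverse propensity score re-weighting

[Figure: treated and control units plotted by $x_1$ = age and $x_2$ = Charlson comorbidity index; the two groups have different covariate distributions, $p(x \mid t = 0) \neq p(x \mid t = 1)$.]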

[Figure: after re-weighting, $p(x \mid t = 0) \cdot w_0(x) \approx p(x \mid t = 1) \cdot w_1(x)$, i.e. the re-weighted control and re-weighted treated distributions approximately match.]


Propensity score

• Propensity score: $p(T = 1 \mid x)$, estimated using machine learning tools
• Samples re-weighted by the inverse propensity score of the treatment they received
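A minimal sketch of the re-weighting idea, under stated assumptions: a single covariate (age, drawn uniformly), a hypothetical logistic treatment rule, and a logistic-regression propensity model fit with Newton's method standing in for "machine learning tools". Before weighting, treated patients are markedly older than controls; after weighting each sample by the inverse propensity of the treatment it received, both groups' weighted mean age is close to the overall mean.

```python
import numpy as np

def fit_logistic(X, t, n_iter=25):
    # Propensity model p(T=1|x) = sigmoid(w_0 + x.w), fit by Newton's method (unregularized)
    A = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-A @ w))
        grad = A.T @ (t - p)                           # gradient of the log-likelihood
        hess = (A * (p * (1 - p))[:, None]).T @ A      # negative Hessian of the log-likelihood
        w += np.linalg.solve(hess, grad)
    return lambda X_new: 1 / (1 + np.exp(-np.column_stack([np.ones(len(X_new)), X_new]) @ w))

rng = np.random.default_rng(5)
n = 20_000
age = rng.uniform(20, 80, size=n)
x = ((age - 50) / 10)[:, None]                        # standardized covariate, shape (n, 1)
t = rng.binomial(1, 1 / (1 + np.exp(-x[:, 0])))        # older patients treated more often

e = fit_logistic(x, t)(x)                             # estimated propensity p_hat(T=1|x)
w = np.where(t == 1, 1 / e, 1 / (1 - e))              # inverse propensity of the treatment received

unweighted = (age[t == 1].mean(), age[t == 0].mean())
reweighted = (np.average(age[t == 1], weights=w[t == 1]),
              np.average(age[t == 0], weights=w[t == 0]))
print(unweighted, reweighted)
```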


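Propensity scores – algorithm: Inverse probability of treatment weighted estimator

How to calculate ATE with propensity score for sample $(x_1, t_1, y_1), \dots, (x_n, t_n, y_n)$:

1. Use any ML method to estimate $\hat{p}(T = t \mid x)$
2. $\widehat{ATE} = \frac{1}{n} \sum_{i \text{ s.t. } t_i = 1} \frac{y_i}{\hat{p}(t_i = 1 \mid x_i)} - \frac{1}{n} \sum_{i \text{ s.t. } t_i = 0} \frac{y_i}{\hat{p}(t_i = 0 \mid x_i)}$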
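Step 2 as code, in a minimal sketch. To keep it short, the propensity score is taken as known (a hypothetical logistic function of a single covariate) rather than estimated in step 1; in practice `e` would come from a fitted model. The data-generating process is illustrative, with a true ATE of 2.

```python
import numpy as np

def ipw_ate(y, t, e):
    """IPW estimate: (1/n) sum_{t_i=1} y_i/e_i - (1/n) sum_{t_i=0} y_i/(1-e_i), with e_i = p_hat(T=1|x_i)."""
    y, t, e = (np.asarray(a, dtype=float) for a in (y, t, e))
    n = len(y)
    return np.sum(t * y / e) / n - np.sum((1 - t) * y / (1 - e)) / n

rng = np.random.default_rng(6)
n = 100_000
x = rng.normal(size=n)
e = 1 / (1 + np.exp(-x))                 # propensity p(T=1|x); assumed known here instead of estimated
t = rng.binomial(1, e)
y = 3 * x + 2 * t + rng.normal(size=n)   # true ATE = 2

naive = y[t == 1].mean() - y[t == 0].mean()
print(f"naive: {naive:.2f}, IPW: {ipw_ate(y, t, e):.2f}")
```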


Propensity scores – algorithm: Inverse probability of treatment weighted estimator

How to calculate ATE with propensity score for sample $(x_1, t_1, y_1), \dots, (x_n, t_n, y_n)$:

1. Randomized trial: $p(T = t \mid x) = 0.5$
2. $\widehat{ATE} = \frac{1}{n} \sum_{i \text{ s.t. } t_i = 1} \frac{y_i}{0.5} - \frac{1}{n} \sum_{i \text{ s.t. } t_i = 0} \frac{y_i}{0.5} = \frac{2}{n} \sum_{i \text{ s.t. } t_i = 1} y_i - \frac{2}{n} \sum_{i \text{ s.t. } t_i = 0} y_i$

Each sum is over $\approx n/2$ terms, so this is approximately the difference between the mean outcomes of the treated and control groups.
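A quick check of the last step, assuming a completely randomized trial with exactly $n/2$ treated units and hypothetical outcomes: the IPW estimate with $p = 0.5$ then equals the difference in group means exactly. Under coin-flip assignment the group sizes are only approximately $n/2$, and the two agree only approximately.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1000
t = rng.permutation(np.repeat([0, 1], n // 2))   # completely randomized: exactly n/2 treated
y = 1.5 * t + rng.normal(size=n)

ipw = np.sum(y[t == 1] / 0.5) / n - np.sum(y[t == 0] / 0.5) / n
diff_in_means = y[t == 1].mean() - y[t == 0].mean()
print(np.isclose(ipw, diff_in_means))   # True
```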


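Propensity scores - derivation

• Recall average treatment effect:
$\mathbb{E}_{x \sim p(x)}\big[\, \mathbb{E}[Y_1 \mid x, T = 1] - \mathbb{E}[Y_0 \mid x, T = 0] \,\big]$

• We only have samples for:
$\mathbb{E}_{x \sim p(x \mid T = 1)}\big[\, \mathbb{E}[Y_1 \mid x, T = 1] \,\big]$ and $\mathbb{E}_{x \sim p(x \mid T = 0)}\big[\, \mathbb{E}[Y_0 \mid x, T = 0] \,\big]$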


• We need to turn $p(x \mid T = 1)$ into $p(x)$. By Bayes' rule,
$\frac{p(x \mid T = 1) \cdot p(T = 1)}{p(T = 1 \mid x)} = p(x)$, where the denominator $p(T = 1 \mid x)$ is the propensity score.

• Similarly, we need to turn $p(x \mid T = 0)$ into $p(x)$:
$\frac{p(x \mid T = 0) \cdot p(T = 0)}{p(T = 0 \mid x)} = p(x)$, where $p(T = 0 \mid x) = 1 - p(T = 1 \mid x)$ is again given by the propensity score.


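• We want: $\mathbb{E}_{x \sim p(x)}[Y_1(x)]$

• We know that: $\frac{p(x \mid T = 1) \cdot p(T = 1)}{p(T = 1 \mid x)} = p(x)$

• Thus: $\mathbb{E}_{x \sim p(x \mid T = 1)}\left[ \frac{p(T = 1)}{p(T = 1 \mid x)} Y_1(x) \right] = \mathbb{E}_{x \sim p(x)}[Y_1(x)]$

• We can approximate this empirically (under ignorability, the observed $y_i$ of treated units stand in for $Y_1(x_i)$), estimating $p(T = 1)$ by $n_1 / n$, where $n_1$ is the number of treated units:
$\frac{1}{n_1} \sum_{i \text{ s.t. } t_i = 1} \frac{n_1 / n}{\hat{p}(t_i = 1 \mid x_i)} y_i = \frac{1}{n} \sum_{i \text{ s.t. } t_i = 1} \frac{y_i}{\hat{p}(t_i = 1 \mid x_i)}$

(similarly for $t_i = 0$)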
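A Monte Carlo check of this identity, assuming a known logistic propensity and a hypothetical $Y_1(x) = 5 + 2x$ with $x \sim \mathcal{N}(0, 1)$, so that $\mathbb{E}_{x \sim p(x)}[Y_1(x)] = 5$. The plain average over treated units estimates $\mathbb{E}_{x \sim p(x \mid T=1)}[Y_1(x)]$ instead and is biased upward; the re-weighted average is close to 5, and the two empirical forms in the last bullet give the same number.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 200_000
x = rng.normal(size=n)
e = 1 / (1 + np.exp(-x))        # p(T=1|x), assumed known for this check
t = rng.binomial(1, e)
y1 = 5 + 2 * x                  # hypothetical Y_1(x); E_{p(x)}[Y_1(x)] = 5 since E[x] = 0

treated = t == 1
n1 = treated.sum()
naive = y1[treated].mean()      # estimates E_{p(x|T=1)}[Y_1(x)], not E_{p(x)}[Y_1(x)]
reweighted = np.sum((n1 / n) / e[treated] * y1[treated]) / n1   # (1/n1) sum (n1/n)/p(t=1|x_i) y_i
print(naive, reweighted, np.sum(y1[treated] / e[treated]) / n)  # last two are equal
```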


Problems with IPW

• Need to estimate the propensity score (a problem in all propensity score methods)

• If there's not much overlap, propensity scores become non-informative and easily mis-calibrated

• Weighting by the inverse can create large variance and large errors for small propensity scores
  – Exacerbated when there are more than two treatments
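A rough simulation of the variance problem, under assumed conditions: one Gaussian covariate, a known logistic propensity whose slope `strength` controls overlap, and a true ATE of 1. It reports the spread of the IPW estimate across repeated simulated datasets; with poor overlap some propensity scores are tiny, the corresponding weights explode, and the spread grows. The exact numbers depend on these hypothetical settings.

```python
import numpy as np

def ipw_ate(y, t, e):
    # (1/n) sum_{t_i=1} y_i / e_i  -  (1/n) sum_{t_i=0} y_i / (1 - e_i)
    return np.mean(t * y / e) - np.mean((1 - t) * y / (1 - e))

def ipw_spread(strength, n=1000, reps=300, seed=0):
    # Standard deviation of IPW estimates across simulated datasets; a larger `strength`
    # pushes propensity scores toward 0 and 1, i.e. less overlap between treated and control
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(reps):
        x = rng.normal(size=n)
        e = 1 / (1 + np.exp(-strength * x))   # true propensity, assumed known
        t = rng.binomial(1, e)
        y = x + 1.0 * t + rng.normal(size=n)  # true ATE = 1
        estimates.append(ipw_ate(y, t, e))
    return np.std(estimates)

print("good overlap:", ipw_spread(0.5))
print("poor overlap:", ipw_spread(4.0))
```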


Many more ideas and methods

• Natural experiments & regression discontinuity

• Instrumental variables


Many more ideas and methods – Natural experiments

• Does stress during pregnancy affect later child development?

• Confounding: genetic factors, the mother's personality, economic factors…

• Natural experiment: the Cuban missile crisis of October 1962. Many people were afraid a nuclear war was about to break out.

• Compare children who were in utero during the crisis with children from immediately before and after


Many more ideas and methods – Instrumental variables

• Informally: a variable which affects treatment assignment but does not affect the outcome except through the treatment

• Example: are private schools better than public schools?

• Confounding: different student population, different teacher population

• Can't force people to attend a particular school

• Can randomly give out vouchers to some children, giving them an opportunity to attend private schools

• The voucher assignment is the instrumental variable
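To illustrate how such an instrument can be used, the sketch below applies the Wald estimator, one standard instrumental-variable estimator for a binary instrument (not derived in these slides): the effect of the voucher on the outcome divided by its effect on private-school attendance. Everything here is simulated under assumptions: an unobserved confounder `u`, a randomly assigned voucher `z`, a hypothetical logistic enrollment rule, and a constant true effect of 2. The naive comparison is biased by `u`, while the Wald estimate is close to 2.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 200_000
u = rng.normal(size=n)                        # unobserved confounder (e.g. family background)
z = rng.binomial(1, 0.5, size=n)              # randomly assigned voucher: the instrument
p_private = 1 / (1 + np.exp(-(-1.0 + 2.0 * z + 1.5 * u)))
t = rng.binomial(1, p_private)                # attends private school
y = 2.0 * t + 3.0 * u + rng.normal(size=n)    # true effect of private school = 2

naive = y[t == 1].mean() - y[t == 0].mean()   # confounded by u
wald = (y[z == 1].mean() - y[z == 0].mean()) / (t[z == 1].mean() - t[z == 0].mean())
print(f"naive: {naive:.2f}, Wald IV estimate: {wald:.2f}")
```

The voucher shifts attendance but, by construction, is independent of `u` and affects the outcome only through attendance, which is exactly what makes the ratio recover the effect here.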


Summary

• Two approaches to use machine learning for causal inference:
  1. Predict the outcome given features and treatment, then use the resulting model to impute counterfactuals (covariate adjustment)
  2. Predict the treatment using features (propensity score), then use it to reweight outcomes or stratify the data

• Causal graphs are important for thinking through whether the problem is set up appropriately and whether the assumptions hold