a simple unified approach for estimating natural direct and

28
UNIVERSITY OF COPENHAGEN DEPARTMENT OF BIOSTATISTICS Faculty of Health Sciences A simple unified approach for estimating natural direct and indirect effects Ph.d. course in adv. stat. analysis of epi. data - 2011 Theis Lange Department of Biostatistics December 1, 2011 Slide 1/23

Upload: others

Post on 15-May-2022

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A simple unified approach for estimating natural direct and

U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F B I O S T A T I S T I C S

Faculty of Health Sciences

A simple unified approach for estimatingnatural direct and indirect effects

Ph.d. course in adv. stat. analysis of epi. data - 2011

Theis LangeDepartment of Biostatistics

December 1, 2011Slide 1/23

Page 2: A simple unified approach for estimating natural direct and

U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F B I O S T A T I S T I C S

Outline

1 Why mediation

2 Current theory

3 Current estimation in practice

4 Our proposal base on marginal structural models

5 Implementation in R

(Joint work with Stijn Vansteelandt and Maarten Bekaert, GhentUniversity)

Theis Lange — A simple unified approach for estimating natural direct and indirect effects — December 1, 2011Slide 2/23

Page 3: A simple unified approach for estimating natural direct and

U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F B I O S T A T I S T I C S

Why mediation

A Y

M

C

Typical scientific questions are:

• Which variables mediates the effect of social depravation?

• How much of the effect of job type on long term sickness ismediated through physical work environment?

• What would the effect of blinding the sex of job applicantsbe?

Theis Lange — A simple unified approach for estimating natural direct and indirect effects — December 1, 2011Slide 3/23

Page 4: A simple unified approach for estimating natural direct and

U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F B I O S T A T I S T I C S

Current theoretical understanding

• Builds on the counterfactual framework, see e.g.Pearl (09).

• Mediation is an active field within the causal inferenceliterature.

• Still unresolved issues regarding identification and optimalinference.

MORE ON THAT IN A FEW SLIDES

However, using this technical machinery is still a daunting task.

Theis Lange — A simple unified approach for estimating natural direct and indirect effects — December 1, 2011Slide 4/23

Page 5: A simple unified approach for estimating natural direct and

U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F B I O S T A T I S T I C S

Current theoretical understanding

• Builds on the counterfactual framework, see e.g.Pearl (09).

• Mediation is an active field within the causal inferenceliterature.

• Still unresolved issues regarding identification and optimalinference.

MORE ON THAT IN A FEW SLIDES

However, using this technical machinery is still a daunting task.

Theis Lange — A simple unified approach for estimating natural direct and indirect effects — December 1, 2011Slide 4/23

Page 6: A simple unified approach for estimating natural direct and

U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F B I O S T A T I S T I C S

Current mediation techniques in practiceSome of the challenges when using current techniques forassessing mediation are:

• There is no unified approach for assessing mediation;e.g. going from a continuous mediator to a categorical cancompletely change which techniques to employ.

• Techniques cannot be employed in standard softwarewithout doing advanced programming.

• Including interactions is tricky.

• Computation of confidence intervals are quitecumbersome.

In this paper we address many of these issues, but first youmust endure a few technical slides.

Theis Lange — A simple unified approach for estimating natural direct and indirect effects — December 1, 2011Slide 5/23

Page 7: A simple unified approach for estimating natural direct and

U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F B I O S T A T I S T I C S

Current mediation techniques in practiceSome of the challenges when using current techniques forassessing mediation are:

• There is no unified approach for assessing mediation;e.g. going from a continuous mediator to a categorical cancompletely change which techniques to employ.

• Techniques cannot be employed in standard softwarewithout doing advanced programming.

• Including interactions is tricky.

• Computation of confidence intervals are quitecumbersome.

In this paper we address many of these issues, but first youmust endure a few technical slides.Theis Lange — A simple unified approach for estimating natural direct and indirect effects — December 1, 2011Slide 5/23

Page 8: A simple unified approach for estimating natural direct and

U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F B I O S T A T I S T I C S

Formal setupMeasured variables:

• Y outcome.• M the mediator.

• A exposure.• C baseline covariates.

Counterfactual variables:• Ya,m the outcome we would, possibly contrary to fact, have

observed had exposure A been set to the value a and themediator M set to m.

• Ma the value of the mediator if, possibly contrary to fact,exposure A was set to a.

Nested counterfactuals:• Ya∗,Ma , the outcome that would have been observed if A

were set to a∗ and M to the value it would have taken if Awere set to a.

Theis Lange — A simple unified approach for estimating natural direct and indirect effects — December 1, 2011Slide 6/23

Page 9: A simple unified approach for estimating natural direct and

U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F B I O S T A T I S T I C S

Controlled vs. Natural Effects

In the counterfactual framework comparing of effects is doneusing difference contrasts. However, the mediator can betreated in (at least) two different ways:

• Controlled Effects: The effect of a change in theexposure from a∗ to a is measured with the mediator fixedat a specific value, say m. That is compare Yam and Ya∗m.

• Natural Effects: The effect of a change in the exposurefrom x∗ to x is measured with the mediator kept at thevalue it would have had (for that individual) if the exposurewas still x∗. That is compare Ya∗,Ma and Ya,Ma .

The rest of the talk will treat natural effects, see also the recentwork by Pearl (10).

Theis Lange — A simple unified approach for estimating natural direct and indirect effects — December 1, 2011Slide 7/23

Page 10: A simple unified approach for estimating natural direct and

U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F B I O S T A T I S T I C S

Natural direct and indirect effects

• Natural direct effects (DE) is the effect of changing theexposure relative to the direct pathway, but keepingexposure constant relative to the indirect pathway throughthe mediator; i.e:

Ya,Ma vs. Ya∗,Ma

• Natural indirect effects (IE) is the effect of changing theexposure relative to the indirect pathway, but keepingexposure constant relative to the direct pathway; i.e:

Ya∗,Ma vs. Ya∗,Ma∗

• Total effects (TE) denotes the effect of simply changingthe exposure; that is comparing Ya,Ma with Ya∗,Ma∗ .

Theis Lange — A simple unified approach for estimating natural direct and indirect effects — December 1, 2011Slide 8/23

Page 11: A simple unified approach for estimating natural direct and

U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F B I O S T A T I S T I C S

Marginal Structural Models - definition• Marginal structural models (MSM)1, are models for the

marginal distribution of a counterfactual outcome and arepopular for non-nested counterfactuals such as Ya.

• For instance, the total causal effect of the exposure A onthe outcome Y can be modeled in terms of a MSM:

E [Ya] = b0 + b1a,

where b1 then captures the average causal effect of A.

• In contrast, MSMs for nested counterfactuals such asYa,Ma∗ have received little attention.

• MSM models are estimated by using inverse probabilityweighting; in effect creating a pseudo-population wheretreatment was in fact randomized.

1see Robins, Hernán, and Brumback (2010)Theis Lange — A simple unified approach for estimating natural direct and indirect effects — December 1, 2011Slide 9/23

Page 12: A simple unified approach for estimating natural direct and

U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F B I O S T A T I S T I C S

Marginal Structural Models - definition• Marginal structural models (MSM)1, are models for the

marginal distribution of a counterfactual outcome and arepopular for non-nested counterfactuals such as Ya.

• For instance, the total causal effect of the exposure A onthe outcome Y can be modeled in terms of a MSM:

E [Ya] = b0 + b1a,

where b1 then captures the average causal effect of A.

• In contrast, MSMs for nested counterfactuals such asYa,Ma∗ have received little attention.

• MSM models are estimated by using inverse probabilityweighting; in effect creating a pseudo-population wheretreatment was in fact randomized.

1see Robins, Hernán, and Brumback (2010)Theis Lange — A simple unified approach for estimating natural direct and indirect effects — December 1, 2011Slide 9/23

Page 13: A simple unified approach for estimating natural direct and

U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F B I O S T A T I S T I C S

MSMs - as used in our technique• The rest of the talk will be based on the following

generalized linear MSM:

g(E [Ya,Ma∗ ]) = c0 + c1a + c2a∗ + c3a · a∗ (1)

g is a link function specifying the requested model for theoutcome (e.g. logistic model) and c3 is an interaction term.

• When c3 = 0 and g is the logit link, then

exp[c1(a− a∗)] =odds[Ya,Ma∗ = 1]odds[Ya∗,Ma∗ = 1]

captures the natural direct effect odds ratio.

• c3 6= 0 indicates that the magnitude of the direct effect maydepend on the natural level of the mediator, and may thusbe the result of an exposure-mediator interaction.

Theis Lange — A simple unified approach for estimating natural direct and indirect effects — December 1, 2011Slide 10/23

Page 14: A simple unified approach for estimating natural direct and

U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F B I O S T A T I S T I C S

MSMs - as used in our technique• The rest of the talk will be based on the following

generalized linear MSM:

g(E [Ya,Ma∗ ]) = c0 + c1a + c2a∗ + c3a · a∗ (1)

g is a link function specifying the requested model for theoutcome (e.g. logistic model) and c3 is an interaction term.

• When c3 = 0 and g is the logit link, then

exp[c1(a− a∗)] =odds[Ya,Ma∗ = 1]odds[Ya∗,Ma∗ = 1]

captures the natural direct effect odds ratio.

• c3 6= 0 indicates that the magnitude of the direct effect maydepend on the natural level of the mediator, and may thusbe the result of an exposure-mediator interaction.

Theis Lange — A simple unified approach for estimating natural direct and indirect effects — December 1, 2011Slide 10/23

Page 15: A simple unified approach for estimating natural direct and

U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F B I O S T A T I S T I C S

MSMs - as used in our technique II

• The class of generalized linear MSMs encompass a widerange of models, but Cox and additive hazard models arenot included in the class.

• These models assume that the hazard of thecounterfactual survival time Ya,Ma∗ can be expressed as

λ0(t) exp(c0 + c1a + c2a∗ + c3a · a∗) (2)γ0(t) + c1a + c2a∗ + c3a · a∗ (3)

where λ0(t) and γ0(t) are unspecified baseline hazards.

Theis Lange — A simple unified approach for estimating natural direct and indirect effects — December 1, 2011Slide 11/23

Page 16: A simple unified approach for estimating natural direct and

U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F B I O S T A T I S T I C S

Our proposed approachUnder standard assumptions of no-unmeasured confoundersand no exposure dependent confounding the MSMs in (1)-(3)can be estimated by the following approach:

1 Estimate a suitable model for the exposure conditional onconfounders using the original data set.

2 Estimate a suitable model for the mediator conditional onexposure and baseline variables using the original data set.

3 Construct a new data set by repeating each observationsin the original data set twice and including an additionalvariable A∗, which is equal to the original exposure for thefirst replication and equal to the opposite of the actualexposure for the second replication.

See the paper for details regarding categorical orcontinuous exposures.

Theis Lange — A simple unified approach for estimating natural direct and indirect effects — December 1, 2011Slide 12/23

Page 17: A simple unified approach for estimating natural direct and

U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F B I O S T A T I S T I C S

Our proposed approach (cont.)Under standard assumptions of no-unmeasured confoundersand no exposure dependent confounding the MSMs in (1)-(3)can be estimated by the following approach:

4 Compute weights given by

Wi =1

P(A = Ai |C = Ci )

P(M = Mi |A = A∗i ,C = Ci )

P(M = Mi |A = Ai ,C = Ci ).

through applying the fitted models from steps 1 and 2 tothe new data set. In most software packages this can bedone using predict-functionality.

5 Fit a suitable MSM model to the outcome including only Aand A∗ (and perhaps their interaction) as covariates andweighted by the weights from the previous step.

6 Conservative confidence intervals can be obtained usingrobust standard errors.

Theis Lange — A simple unified approach for estimating natural direct and indirect effects — December 1, 2011Slide 13/23

Page 18: A simple unified approach for estimating natural direct and

U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F B I O S T A T I S T I C S

Pros and cons of our approach

Strengths:• A unified approach for estimating natural direct and indirect

effects, which is applicable in most setups.

• Interactions can easily be included.

• Confidence intervals are readily available.

Drawbacks:• Procedures based on weighting can be unstable when

either the mediator or the exposure are continuous.

Theis Lange — A simple unified approach for estimating natural direct and indirect effects — December 1, 2011Slide 14/23

Page 19: A simple unified approach for estimating natural direct and

U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F B I O S T A T I S T I C S

Pros and cons of our approach

Strengths:• A unified approach for estimating natural direct and indirect

effects, which is applicable in most setups.

• Interactions can easily be included.

• Confidence intervals are readily available.

Drawbacks:• Procedures based on weighting can be unstable when

either the mediator or the exposure are continuous.

Theis Lange — A simple unified approach for estimating natural direct and indirect effects — December 1, 2011Slide 14/23

Page 20: A simple unified approach for estimating natural direct and

U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F B I O S T A T I S T I C S

Implementation in R: An example

• We reexamine the relation between socioeconomic status(SES), work environment, and long term sickness absenceanalyzed in Lange and Hansen (2011).

• The data set contains the following variables:SESbinary Socioeconomic status in two levelslogphys Logarithm of the physical work environment

score (between 1 and 100) at baselineage Age at baselinecoha Binary indicator for cohabitationchild Number of childrenevent Indicator for long term sickness absence.

Question: How much of the causal effect of SES on long termsickness is mediated through physical work environment?

Theis Lange — A simple unified approach for estimating natural direct and indirect effects — December 1, 2011Slide 15/23

Page 21: A simple unified approach for estimating natural direct and

U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F B I O S T A T I S T I C S

Implementation in R: Step 1

• Load the data set into the variable myData.

• Fit a simple linear regression to the mediator (logphys)conditioning on SESbinary and baseline confounders.

• As we later want to change the values of SES when doingpredictions we use a copy of the SES variable in the modelfit. We also save the residual variance as it will be neededwhen computing weights.

myData$SESbinaryTemp <- myData$SESbinary

fitM <- lm(logphys ~ age + coha + factor(child) +SESbinaryTemp, data=myData)

fitMvariance <- var(residuals(fitM))

Theis Lange — A simple unified approach for estimating natural direct and indirect effects — December 1, 2011Slide 16/23

Page 22: A simple unified approach for estimating natural direct and

U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F B I O S T A T I S T I C S

Implementation in R: Step 2

• Construct new variable SESbinaryStar corresponding tothe value of the exposure relative to the indirect path.

• Next construct a new data set by replicating the originaltwo times such that SESbinaryStar takes the twodifferent possible values through the replications.

myData1 <- myDatamyData2 <- myData

myData1$SESbinaryStar <- myData$SESbinarymyData2$SESbinaryStar <- 1-myData$SESbinary

newMyData <- rbind(myData1, myData2)

Theis Lange — A simple unified approach for estimating natural direct and indirect effects — December 1, 2011Slide 17/23

Page 23: A simple unified approach for estimating natural direct and

U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F B I O S T A T I S T I C S

Implementation in R: Step 3• Weights are computed from the model for the mediator

(called fitM from Step 2) first using SEPbinary and thenSEPbinaryStar as explanatory variables

newMyData$SESbinaryTemp <- newMyData$SESbinary

tempDir <- dnorm(newMyData$logphys, mean=predict(fitM,type = "response", newdata=newMyData),sd=sqrt(fitMvariance) )

newMyData$SESbinaryTemp <- newMyData$SESbinaryStar

tempIndir <- dnorm(newMyData$logphys, mean=predict(fitM,type = "response", newdata=newMyData),sd=sqrt(fitMvariance) )

newMyData$weightM <- tempIndir/tempDir

Theis Lange — A simple unified approach for estimating natural direct and indirect effects — December 1, 2011Slide 18/23

Page 24: A simple unified approach for estimating natural direct and

U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F B I O S T A T I S T I C S

Implementation in R: Step 4

• As the outcome (event) is binary fit a logistic MSM modelfor direct and indirect effects controlling for age and familystatus.

fitSESbinary <- glm(event ~ age + coha + factor(child) +SESbinary + SESbinaryStar,data=newMyData, family="binomial",weights=weightM)

• To assess estimation uncertainty robust standard errorsmust be computed:

library(sandwich)sqrt(diag(sandwich(fitSESbinary)))

Theis Lange — A simple unified approach for estimating natural direct and indirect effects — December 1, 2011Slide 19/23

Page 25: A simple unified approach for estimating natural direct and

U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F B I O S T A T I S T I C S

Implementation in R: The result

summary(fitSESbinary) gives the following output:

Coefficients:Estimate Std. Error z value Pr(>|z|)

(Intercept) -3.41819 0.22278 -15.343 < 2e-16 ***AGEN05 0.01994 0.00414 4.815 1.47e-06 ***COHA052Nej 0.04646 0.11702 0.397 0.69137factor(child054num)2 -0.07352 0.11554 -0.636 0.52460factor(child054num)3 -0.14542 0.11789 -1.234 0.21736factor(child054num)4 -0.66842 0.20778 -3.217 0.00130 **SESbinary 0.23053 0.08554 2.695 0.00704 **SESbinaryStar 0.33991 0.08548 3.976 7.00e-05 ***

Thus the natural direct odds ratio of changing SES from high tolow is exp(0.2305) = 1.26 and the natural indirect odds ratio isexp(0.3399) = 1.40 within stratums of baseline confounders.

Theis Lange — A simple unified approach for estimating natural direct and indirect effects — December 1, 2011Slide 20/23

Page 26: A simple unified approach for estimating natural direct and

U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F B I O S T A T I S T I C S

More on implementation

• The paper also contains an example with a categoricalmediator and a survival outcome.

• Both examples are given for both R and SAS.

• SPSS code is available from the author, but SPSS doesnot support robust standard errors in all models. It istherefore not recommended to use SPSS.

Theis Lange — A simple unified approach for estimating natural direct and indirect effects — December 1, 2011Slide 21/23

Page 27: A simple unified approach for estimating natural direct and

U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F B I O S T A T I S T I C S

Conclusion

• We have proposed a unified approach for estimatingnatural direct and indirect effects, which is applicable inmost setups.

• The approach can incorporate interactions in a straightforward way.

• Confidence intervals are readily available.

• We have provided implementation examples in both R andSAS.

Theis Lange — A simple unified approach for estimating natural direct and indirect effects — December 1, 2011Slide 22/23

Page 28: A simple unified approach for estimating natural direct and

U N I V E R S I T Y O F C O P E N H A G E N D E P A R T M E N T O F B I O S T A T I S T I C S

References I

T. Lange and J.V. HansenDirect and Indirect Effects in a Survival ContextEpidemilogy 22:575-581, 2011.

J. PearlDirect and Indirect EffectsProceedings of ASA Joint Statistical Meetings, 2001.

J. PearlThe Mediation Formula: A guide to the assessment ofcausal pathways in non-linear modelsUSLA Tech. Report: R-363, 2010.

J.M. Robins, M. Hernán and B. BrumbackMarginal Structural Models and Causal Inference inEpidemiologyEpidemilogy 11:550-560, 2000.

Theis Lange — A simple unified approach for estimating natural direct and indirect effects — December 1, 2011Slide 23/23