
Causal effects in mediation analysis with limited-dependent variables

By: Marten Schultzberg

Department of Statistics

Uppsala University

Supervisor: Fan Yang-Wallentin

2016

Contents

1 Introduction

1.1 Mediation analysis in general

1.2 Direct and indirect effects

1.3 Counterfactual-based causal effects in mediation analysis

1.4 Limited-dependent variable

1.5 Research questions

2 Methodology

2.1 Mediation analysis

2.1.1 The simple mediation model and its motivation

2.1.2 Adding relations to the simple mediation model

2.1.3 Estimation

2.2 Two-group regression

2.3 Limited-dependent variable analysis

2.3.1 The Two-part model

2.4 Mediation analysis with limited-dependent variable(s)

2.5 The counterfactual framework

2.5.1 Effect notation and calculations for mediation

2.5.2 Assumptions for causal effect of mediation models

3 Mediation, two-part M

3.1 Model formulation

3.2 Estimation

3.3 Derivation of effects

3.3.1 Conditional expected value of Y

3.3.2 Causal effects

4 Mediation, two-part M and two-part Y

4.1 Model formulation

4.2 Estimation

4.3 Derivation of causal effects

4.3.1 Conditional expected values

4.3.2 Causal effects

5 Monte Carlo simulations

5.1 Synthetic models and true data generating processes

5.1.1 Weak and Moderately strong effects model

5.1.2 Censoring

5.1.3 Model 1 - Two-part M

5.1.4 Model 1 - Two-part M, Weak

5.1.5 Model 1 - Two-part M, Moderately strong

5.1.6 Model 2 - Two-part M, Two-part Y

5.1.7 Model 2 - Two-part M, Two-part Y, Weak

5.1.8 Model 2 - Two-part M, Two-part Y, Moderately strong

5.2 Estimation

5.3 Results

5.3.1 Outcome variables

5.3.2 Two-part M

5.3.3 Two-part M, Two-part Y

5.4 Sensitivity analysis

6 Discussion

7 Conclusion

A Appendix - Derivation - Two-part M

B Appendix - Derivation - Two-part M, two-part Y

C Appendix - Mplus syntax

Abstract

Mediation is used to separate direct and indirect effects of an exposure variable on an

outcome variable. In this thesis, a mediation model is extended to account for a censored

mediator and outcome variable. The two-part framework is used to account for the

censoring. The counterfactual-based causal effects of this model are derived. A Monte

Carlo study is performed to evaluate the behaviour of the causal effects accounting for

censoring, together with a comparison with methods for estimating the causal effects

without accounting for censoring. The results of the Monte Carlo study show that the

effects accounting for censoring have substantially smaller bias when censoring is present.

The proposed effects also seem to have a low cost with unbiased estimates for sample

sizes as small as 100 for the two-part mediator model. In the case of limited mediator

and outcome, sample sizes larger than 300 are required for reliable improvements. A small

sensitivity analysis stresses the need for further development of the two-part models.

Keywords: counterfactuals, two-part model, potential outcome


1 Introduction

The introduction gives a brief overview of, and motivation for, the study, followed

by the research questions.

1.1 Mediation analysis in general

Mediation analysis is used to quantify the effects that an exposure variable has on an

outcome variable, mediated by some intermediate variable. For example, a gene that

causes cancer may also cause increased cigarette usage, which in turn causes cancer. The

effect of the gene on cancer is mediated by cigarette usage. The intermediate variable

(cigarette usage) is often called the mediator variable, or the mediator. The hypothesised

relationship in a simple mediation model is that an exposure variable (X) causes some

change in a mediator variable (M), which in turn causes a change in an outcome variable

(Y) (Hayes, 2013). Mediation analysis has become widely used in the social sciences and

biomedical studies, especially since the influential paper by Baron and Kenny (1986). The

promise of opening up the “black box”, answering questions such as “Through

what mechanism does X affect Y?” or “How does a change in X affect Y?”, probably

explains its widespread use. For a thorough overview of traditional mediation

analysis see Hayes (2013). In recent years the causal claims of these models and their

limitations have been investigated in detail. The potential outcome and counterfactual

frameworks have been developed and applied, contributing to general definitions of causal

effects and to inference in mediation analysis. The causal mediation literature has also

focused on acknowledging and assessing the strong assumptions on which the causal

interpretations of these effects rely (Imai et al., 2010; Pearl, 2001; Robins and Greenland,

1992; VanderWeele, 2015).

1.2 Direct and indirect effects

Separating direct and indirect effects in mediation is essentially a tool to

make complex relations comprehensible. If a variable X affects both M and Y, and

M also affects Y, then how should the effect of X on Y be separated from the effect of X on Y

through M? The corresponding question in the cancer example would be “How is

the direct effect of the gene on cancer separated from the effect of the gene through

cigarette usage on cancer?” As will be demonstrated, several research questions can be

answered once the set of causal effects is defined. The traditional way of calculating the

indirect effect is called the product method and is credited to Baron and Kenny (1986);

the method is often even referred to as the Baron and Kenny method. Baron and

Kenny (1986) has been one of the most influential papers in the mediation field, making

the product method commonly applied. The product method is adequate for linear

mediation models with continuous mediators and outcomes. In some research areas this is

the most commonly applied mediation model (Rucker et al., 2011). However, as Robins

and Greenland (1992) and Pearl (2001) pointed out, the product method is unable to

account for non-continuous mediators and outcomes, as well as mediation models with

moderation and other non-linear functional forms. General effect definitions, building on

the potential outcome framework (Rubin, 1974), were suggested by Robins and Greenland

(1992) and Pearl (2001).

1.3 Counterfactual-based causal effects in mediation analysis

The causal effects based on counterfactuals offer a general definition of causal effects. The

definition does not assume any functional form or model and can be applied to a wide

range of mediation models of varying complexity (Pearl, 2001). More recently, causal

effects in many special cases of mediation models have been derived from these definitions

(Muthen et al., 2016; Vanderweele, 2012; VanderWeele and Vansteelandt, 2010; Wang

and Albert, 2012).

1.4 Limited-dependent variable

In many research situations limited-dependent variables are encountered. Figure 1 shows

an example of a sample from a limited variable, censored from below. This is a characteristic

histogram for a censored variable, with many observations at one point and no observations

below that point, as if the range of the observations was limited by something. The

importance of accounting for censoring has been pointed out by e.g. Tobin (1958), Cragg

(1971), Jones (1989) and Brown et al. (2005). If limited-dependent variables are not

handled, model estimates will be biased. Thus, to estimate effects without bias in a

mediation model where some dependent variable is limited, special methods are required.

Limited-dependent outcome variables in mediation were recently handled in Muthen et al.

(2016). However, the mediator in a mediation analysis is also a dependent variable, namely

in the regression on the exposure.

Figure 1: A sample of 1000 observations from a limited normal variable with mean and variance equal to 1, censored at the point 0.22.

If the bias found in regression analysis with limited-dependent variables transfers to

mediation analysis, this might have severe consequences for the effect estimates and

conclusions drawn from mediation analysis. It is therefore of interest to investigate the impact

of accounting for limited-dependent variables as mediators and outcome variables.

1.5 Research questions

The aim of this study is to answer the following questions:

1. (a) How can the mediation model be formulated to account for a limited mediator

and/or outcome variable?

(b) What are the additional assumption(s) for the two-part mediation models

compared to the simple mediation model?

2. How are the counterfactual-based causal effects for the two-part mediation models

derived?

3. Does acknowledging and accounting for limited mediators and/or outcome variable

improve the accuracy of the causal effect estimates?

4. What are the sampling behaviours of the causal effects for the two-part mediation

models?


The remaining parts of the thesis will have the following structure. Section 2 will give

a detailed overview of the methods and motivations for the formulation of the limited-

dependent variable models. Section 3 contains the model formulation and causal effects

derivations for the two-part M model. Section 4 contains the model formulation and

causal effects derivations for the two-part M, two-part Y model. In Section 5 Monte

Carlo simulations are performed to evaluate the small sample properties for these models.

Sections 6 and 7 contain the discussion and conclusions of the study.

2 Methodology

In this section all the parts necessary to construct the two-part mediation models are

introduced and motivated in detail.

2.1 Mediation analysis

In this section the development and properties of mediation analysis are presented. The

simple mediation model is presented and extended to become more suitable for this study.

2.1.1 The simple mediation model and its motivation

The simple mediation model is illustrated in Figure 2. The exposure X affects the outcome

Y, both directly and indirectly through the mediator M. Rather than focusing on the

size of the total effect of an exposure on an outcome, mediation analysis directs special

attention to the “How” part. That is, how, or by what means, does the exposure affect

the outcome? Through what intermediate steps does the exposure affect the outcome?

An easy way to motivate the need for answers to this kind of question is through the

perspective of policy makers. In many situations an exposure cannot be regulated by

policies, however some mediators might. Drawing on the example from VanderWeele

(2015), originally analysed in Vanderweele (2012), the risk of lung cancer is investigated.

A genetic variant of a chromosome (X) is believed to affect the risk of lung cancer (Y).

Moreover, evidence has shown that this genetic variant affects smoking behaviour, making

carriers of the genetic variant smoke more. It is known that smoking cigarettes increases

the risk of lung cancer. It is possible that the genetic variant of the chromosome is

causing cancer only through its effect on cigarette usage. In that case, the policy makers


can try to reduce the cigarette usage by laws and taxes in order to decrease the number

of lung cancer patients. It is also possible that the indirect effect of the gene through

cigarette usage on the risk of lung cancer is small, and the gene directly causes cancer. In

the latter scenario, it might be difficult for the policy makers to take effective actions to

decrease the number of patients diagnosed with lung cancer. This over-simplified example

illustrates the importance of understanding the role which mechanisms themselves play

in effective policy making. If one can quantify and compare the importance of single

mediators, resources can be directed more effectively. This way of approaching questions

transfers to a wide range of situations in various research fields.

Figure 2: The simple mediation model. X is the exposure variable, M the mediator and Y the outcome.

In Equation 1 the model formulation of the simple mediation model is displayed.

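$$
\begin{aligned}
M_i &= \gamma_0 + \gamma_1 X_i + \varepsilon_{Mi} \\
Y_i &= \beta_0 + \beta_1 M_i + \beta_2 X_i + \varepsilon_{Yi}
\end{aligned}
\tag{1}
$$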

where $X_i$ is the exposure variable for individual i, and $M_i$ and $Y_i$ are the mediator and

outcome of individual i, both assumed continuous. $\varepsilon_{Mi}$ and $\varepsilon_{Yi}$ are the error terms,

most commonly assumed to be iid normally distributed with mean zero and uncorrelated with

X and M. The model consists of two linear models: one where the mediator

is modelled by the exposure, and one where the outcome is modelled by the mediator

and the exposure.
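As an illustration of Equation 1, the following minimal sketch simulates data from the simple mediation model and recovers the coefficients with two separate least-squares fits. The parameter values, sample size, seed and helper names (`simulate_simple_mediation`, `ols`) are assumptions chosen for illustration and are not taken from this study. The product of the estimated γ1 and β1 is the traditional product-method indirect effect discussed in Section 1.2.

```python
import numpy as np

def simulate_simple_mediation(n, gamma0, gamma1, beta0, beta1, beta2, seed=0):
    """Draw (X, M, Y) from the simple mediation model in Equation 1."""
    rng = np.random.default_rng(seed)
    x = rng.binomial(1, 0.5, size=n).astype(float)           # dichotomous exposure (assumption)
    m = gamma0 + gamma1 * x + rng.normal(size=n)              # mediator equation
    y = beta0 + beta1 * m + beta2 * x + rng.normal(size=n)    # outcome equation
    return x, m, y

def ols(columns, response):
    """Least-squares coefficients, intercept first."""
    design = np.column_stack([np.ones(len(response))] + list(columns))
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef

# Hypothetical parameter values, for illustration only.
x, m, y = simulate_simple_mediation(20000, 0.5, 0.8, 1.0, 0.6, 0.3)
g0_hat, g1_hat = ols([x], m)             # regression of M on X
b0_hat, b1_hat, b2_hat = ols([m, x], y)  # regression of Y on M and X

indirect = g1_hat * b1_hat   # product-method indirect effect
direct = b2_hat              # direct effect of X on Y
print(f"indirect={indirect:.3f} direct={direct:.3f} total={indirect + direct:.3f}")
```

In this linear setting the sum of the direct and indirect estimates coincides with the slope from regressing Y on X alone, which is why the product method is adequate for continuous M and Y without interaction.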

2.1.2 Adding relations to the simple mediation model

In most situations the simple mediation model is too parsimonious to capture relevant

mechanisms. For example, the assumption of no unmeasured confounders between M

and Y (see Section 2.5.2 for details) cannot be guaranteed to hold, but by adding

relevant covariates the violation might be substantially reduced. Moreover, interaction

between M and X is common; VanderWeele (2015) even suggests that it might generally

be better to keep interaction terms in the analysis even when non-significant interaction

estimates are found. In Figure 3, a path diagram for a mediation model with a covariate

affecting M and Y is displayed. Additionally, interaction between M and X is visualized

by a path from X to the path between M and Y. In social sciences interaction is more

often referred to as moderation. The interaction term poses no problems in estimation;

however, the traditional product-method-based direct and indirect effects are no longer

applicable (Pearl, 2001). The model including the interaction and covariate can be written

as in Equation 2.

Figure 3: Mediation model with covariate and interaction between M and X. C is the covariate, X the exposure variable, M the mediator and Y the outcome.

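$$
\begin{aligned}
M_i &= \gamma_0 + \gamma_1 X_i + \gamma_2 C_i + \varepsilon_{Mi} \\
Y_i &= \beta_0 + \beta_1 M_i + \beta_2 X_i + \beta_3 M_i X_i + \beta_4 C_i + \varepsilon_{Yi}
\end{aligned}
\tag{2}
$$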

The model formulation in Equation 2 is similar to that of the simple mediation model in

Equation 1, with the covariate C and the interaction term MX added. $X_i$ and $C_i$ are the

exposure and covariate for individual i. $M_i$ and $Y_i$ are the mediator and outcome

for individual i. Again, the error terms ε are usually assumed to be iid normally distributed

with mean zero, uncorrelated with X, C and M.


2.1.3 Estimation

The simple mediation model (Equation 1) and the extended mediation model (Equation

2) are estimated with Maximum Likelihood (ML) estimation or ordinary least squares

(OLS). If the error terms are independent and normally distributed, OLS estimation

of the two regression models one by one gives the same coefficient estimates as ML estimation

of the whole system simultaneously. The log-likelihood function of these models is given in

Equation 3. The right-hand expression in Equation 3 implies that if the two terms share no

parameters, as is typically the case, the terms can be maximized separately. The log-likelihood

can be expressed as

$$
\log L = \sum_{i=1}^{n} \log[y_i, m_i \mid x_i, c_i] = \sum_{i=1}^{n} \log[y_i \mid m_i, x_i, c_i] + \sum_{i=1}^{n} \log[m_i \mid x_i, c_i] \tag{3}
$$

, where log[...] is the log of the conditional density function.
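To make the separability explicit, the two conditional log densities in Equation 3 can be written out for the extended model in Equation 2 under the assumption of independent normal errors. The variance symbols $\sigma_M^2$ and $\sigma_Y^2$ are notation introduced here for illustration only.

```latex
\begin{align*}
\log[m_i \mid x_i, c_i] &= -\tfrac{1}{2}\log(2\pi\sigma_M^2)
  - \frac{(m_i - \gamma_0 - \gamma_1 x_i - \gamma_2 c_i)^2}{2\sigma_M^2} \\
\log[y_i \mid m_i, x_i, c_i] &= -\tfrac{1}{2}\log(2\pi\sigma_Y^2)
  - \frac{(y_i - \beta_0 - \beta_1 m_i - \beta_2 x_i - \beta_3 m_i x_i - \beta_4 c_i)^2}{2\sigma_Y^2}
\end{align*}
```

The first expression involves only the γ parameters and $\sigma_M^2$, the second only the β parameters and $\sigma_Y^2$. Maximizing each sum over i with respect to its coefficients therefore amounts to minimizing the corresponding sum of squared residuals, which is the OLS criterion.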

2.2 Two-group regression

Two-group regression is a special case of multi-group regression, fitting two different regressions

to two subgroups within a sample. This can be compared to a single model with

a dummy variable estimating the mean differences between two subgroups in a sample.

The main difference is that two-group regression allows for different covariates in the two

regressions. For the variables that are common for the two regressions the coefficients

can be constrained to be the same, or to have different values, between the groups. The

mean difference between the subgroups that a dummy coefficient would estimate in a

single-regression model is also estimated in the two-group setting, by the intercept difference.

If the two models include the same covariates and all coefficients are constrained to be

equal between the models, the intercept difference will be exactly the same as a dummy

variable coefficient in a single model. If some different covariates are included and/or

common covariates are not constrained the intercept difference will not be the same as

the dummy coefficient. Additionally the two-group regression makes it possible to use

different transformations of the same variable, between the groups.

The technical difference between two separate regressions and a two-group regression is

that common parameters constrained to be equal in the two regressions can be estimated

using all available information from both subsets of the sample. However, if no parameter

is set to be common, two separate regressions and the two-group regression will give

exactly the same estimates. Hence, two-group regression is motivated when two

subgroups of a sample are believed to have substantially different relationships between

the covariates and the outcome for some covariates but equal for others. For example, in the case

of the limited mediator M (see details in Section 3), it might be believed that the

relation between the exposure and the outcome is similar for the M=0 and the M>0 groups.

However, within the M>0 group M varies and might be related to Y, which is not possible

in the M=0 group where M is fixed. Two-group regression makes it possible to fit one linear regression

of Y on C and X for the M=0 part, and another linear regression for the M>0 part where

the logarithm of M is added to the independent variables C and X. The slope

of Y on X can be constrained to be the same in both regressions, allowing

the estimate of this parameter to be based on the full data set.
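The constrained two-group regression described above can be sketched with a single stacked least-squares fit. The sketch below is an illustration under stated assumptions: the data are simulated with hypothetical parameter values, the slopes on X and C are constrained to be equal across the groups, and fitting by least squares on the stacked data additionally assumes a common residual variance in both groups, which a full two-group ML estimation would not need to impose.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
x = rng.binomial(1, 0.5, size=n).astype(float)   # exposure
c = rng.normal(size=n)                            # covariate
positive = rng.random(n) < 0.6                    # True for the M>0 group
m = np.where(positive, np.exp(rng.normal(size=n)), 0.0)   # limited mediator

# log(M) only enters the regression for the M>0 group.
log_m = np.zeros(n)
log_m[positive] = np.log(m[positive])

# Hypothetical outcome model: group-specific intercepts, common slopes on X and C.
y = (1.0 + 0.5 * positive + 0.4 * x + 0.2 * c + 0.7 * log_m
     + rng.normal(size=n))

# Stacked design: one intercept per group, shared X and C columns,
# and a log(M) column that is zero in the M=0 group.
design = np.column_stack([~positive, positive, x, c, log_m]).astype(float)
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

names = ["intercept (M=0)", "intercept (M>0)", "slope X", "slope C", "slope log M"]
for name, value in zip(names, coef):
    print(f"{name:16s} {value: .3f}")
```

Because the X and C columns are shared, their slopes are estimated from all observations, while the intercept difference between the two groups plays the role that a dummy coefficient would play in a single regression.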

The reasons mentioned above indicate that a two-group regression of the outcome in

the two-part mediator models gives a flexible model. The possibility to constrain slopes of

common variables is preserved, still allowing for different covariates in the regressions. If

all the slopes are constrained to be equal, the regression collapses back into a single linear

regression. Similarly, if it is chosen not to constrain any parameter it will simply be two

separate regressions. Two-group regression of Y will make possible general applications

of the derived causal effects.

2.3 Limited-dependent variable analysis

Limited-dependent variables go by many names depending on the context,

e.g. two-part, hurdle, corner solution outcome or censored variable. A limited variable

is a variable that for some reason is censored from above and/or below, having a point

mass at the limit (truncated variables are beyond the scope of this study); see

Figure 1. Sometimes such variables are said to suffer from ceiling or

floor effects. This is probably because in histograms of such variables it

looks like the observations “hit” the ceiling and/or floor, with many observations at one

value and no values above/below it. To describe the principle of how to handle

limited-dependent variables it is useful to consider only one type of censoring, even though all

results can be used for censoring both from above and from below. For the current study only

censoring from below at zero will be considered to simplify examples and derivations,

without loss of generality. There are different methods of handling limited-dependent

variables. Most methods have in common that the variable is split into two parts; one

binary part handling the large number of zeros and one continuous part for the non-zero

part of the variable. Usually this is modelled by one binary regression and one standard

linear regression, using the same covariates in both regressions.

One of the first ways to handle limited-dependent variables was proposed in Tobin

(1958). His method, today known as the Tobit model, is widely applied. One limitation of

the Tobit model is that it only allows equal signs for the corresponding parameters in the

two regressions (Wooldridge, 2002). If the binary part has a substantially different

data-generating process than the positive part, it is in some cases also reasonable that the effects

of certain independent variables have different signs on the two parts of the dependent

variable. Cragg (1971) suggested two extensions that address this limitation of the Tobit

model: the truncated normal hurdle and the log-normal hurdle. In these models the

regressions for the binary and the continuous part of the limited-dependent variable are

estimated independently. Thus, the coefficients of the independent variables are allowed

to have different signs and sizes on the two different parts of the dependent variable.

Throughout this study these models will be referred to as two-part models. For a

thorough overview and comparison between different ways of handling limited-dependent

variables and how they differ from sample selection problems, see Wooldridge (2002) and

Greene (2012).

2.3.1 The Two-part model

Two-part modelling splits the limited-dependent variable into two parts: one binary

zero/non-zero part and one positive part. The intuition is that first a mechanism decides

whether the variable will take a positive value or not, and if it does, a second

mechanism decides what positive value it will take. For example, “Will an individual

smoke or not?” and, if yes, “How much will the individual smoke?”. Zero in this setting is

viewed as a category and not as a continuous numeric value. That is, a person who smokes

zero cigarettes a day is simply a non-smoker. The zero indicates that the person belongs

to the group of non-smokers, rather than an amount. It might seem an unimportant

distinction; however, understanding why it is not is crucial for the motivation of the

two-part model. The two-part model is based on the idea that there might be a more

substantial difference between a non-smoker and a smoker than between a “one cigarette

a day” smoker and a “two cigarettes a day” smoker. Even though the difference in the number

of cigarettes smoked between the “zero cigarettes a day” smoker and the “one cigarette a

day” smoker is the same as that between the “one cigarette a day” smoker and the “two

cigarettes a day” smoker, the two who actually smoke might have more characteristics in

common. It is likely that different mechanisms explain whether you choose to smoke and

how much you choose to smoke. This reasoning implies that there are situations where

the dependent variable has a point mass at zero but two-part analysis is not suitable. If

the group in the point mass is not viewed as a group of observations with substantially

different characteristics from the other observations, then the variable is not suitable for

two-part analysis.

In practice the zero/non-zero part will be estimated with a binary regression and

the continuous part with standard linear regression. The probit and logit models are

natural candidates for the binary part. Given the small difference in estimation results

between the two (Gill, 2000), probit is chosen to make the derivations in Appendix A

and B simpler. Even though, in theory, the two-part model collapses back to the standard

regression for small amounts of censoring, there has to be a certain amount of censoring

for the estimation procedure to work well. The binary regression will behave badly if too

small a share of the observations belongs to one group. Hence, the estimated coefficients,

and therefore the causal effects, from a two-part model will never coincide exactly with the

classical estimates. The probit estimation will break down more severely the closer the

share of censored observations gets to zero. This estimation limitation is discussed in detail in Section 5.1.2.

For the continuous part of the two-part variable a distributional assumption has to be

made. This is crucial for the derivations of the effects, in which the density function of the

continuous part of the two-part variable plays an important role. The

most common assumption is that the continuous part of the two-part dependent variable

is normally or lognormally distributed. The experience of the author is that this is often a

somewhat strong assumption not likely to be fulfilled. The sensitivity to this assumption

has, to the best knowledge of the author, not been investigated in detail. Sensitivity

is discussed further in Section 5.4.
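The two-part idea can be sketched on simulated data as follows. Everything in the sketch is an assumption made for illustration: the parameter names (`kappa0`, `kappa1`, `gamma0`, `gamma1`, `sigma`) and their values, the dichotomous exposure, and the lognormal positive part. With a dichotomous exposure the probit model is saturated, so its ML estimates can be obtained from the group proportions through the inverse normal distribution function, which avoids the need for a numerical optimizer here. Under the lognormal assumption the expected value combines both parts as P(M > 0 | x) multiplied by exp(γ0 + γ1 x + σ²/2).

```python
import numpy as np
from statistics import NormalDist

std_normal = NormalDist()
rng = np.random.default_rng(2)
n = 40000
x = rng.binomial(1, 0.5, size=n)

# Hypothetical true parameters (illustration only).
kappa0, kappa1 = -0.3, 0.8              # probit part: P(M > 0 | x)
gamma0, gamma1, sigma = 0.2, 0.5, 0.7   # lognormal part: log M | M > 0

is_positive = kappa0 + kappa1 * x + rng.normal(size=n) > 0
m = np.where(is_positive,
             np.exp(gamma0 + gamma1 * x + sigma * rng.normal(size=n)),
             0.0)

def fit_two_part(x, m):
    """Estimate the binary and the continuous part separately (dichotomous x)."""
    # Binary part: saturated probit, obtained from the share of positives per group.
    probits = [std_normal.inv_cdf(float(np.mean(m[x == g] > 0))) for g in (0, 1)]
    # Continuous part: linear regression of log M on x among the positives.
    pos = m > 0
    design = np.column_stack([np.ones(pos.sum()), x[pos]])
    coef, *_ = np.linalg.lstsq(design, np.log(m[pos]), rcond=None)
    resid = np.log(m[pos]) - design @ coef
    return {
        "kappa0": probits[0],
        "kappa1": probits[1] - probits[0],
        "gamma0": float(coef[0]),
        "gamma1": float(coef[1]),
        "sigma": float(np.sqrt(resid @ resid / (pos.sum() - 2))),
    }

def expected_m(est, x_value):
    """E[M | x] = P(M > 0 | x) * E[M | M > 0, x] under the lognormal assumption."""
    p_pos = std_normal.cdf(est["kappa0"] + est["kappa1"] * x_value)
    mean_pos = np.exp(est["gamma0"] + est["gamma1"] * x_value + est["sigma"] ** 2 / 2)
    return p_pos * mean_pos

est = fit_two_part(x, m)
for g in (0, 1):
    print(g, round(float(expected_m(est, g)), 3), round(float(m[x == g].mean()), 3))
```

The printed pairs compare the model-implied expected value with the raw group mean of M. Note that the two parts are allowed coefficients of different size and sign, in contrast to the Tobit model.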


2.4 Mediation analysis with limited-dependent variable(s)

Mediation analysis is special regarding independent/dependent relations. As can be

seen in Figure 2, even the simple mediation model implies two dependent variables: M

is dependent on X, and Y is dependent on X and M. This means that important

considerations normally investigated for the dependent variable should be investigated

for (at least) two variables in a mediation setting. The focus of this study is to establish

the importance of accounting for limited-dependent variables in mediation analysis. In

order to cover all cases for limited-dependent variables in a simple mediation setting,

three cases need to be considered. In Case 1 the outcome Y is limited, in Case 2 the

mediator M is limited and in Case 3 both Y and M are limited. Case 1 is the most

obvious since the outcome Y is what would be viewed as the (only) dependent variable

in most regression settings. Many ways to handle Case 1 in regression analysis have been

suggested in the literature (Cragg, 1971; Duan et al., 1983). Limited-dependent variables

in mediation analysis were recently handled in Muthen et al. (2016), where causal effects

for the two-part approach for mediation with limited outcome are derived. A related

approach is given in Wang and Albert (2012), where causal effects in mediation with

limited counts are handled. The second and third cases have, to the best knowledge of the

author, not been investigated. They are covered in detail in Sections

3 and 4. First the counterfactual framework, used to define the causal effects of these

models, is presented.

2.5 The counterfactual framework

In order to understand the counterfactual framework it is helpful to use a simple example,

where the exposure variable is a dichotomous treatment variable. An individual can

be given the treatment, or not given the treatment, at the arbitrary time point t. The

outcome, say health on a continuous scale, is measured after the exposure at time point

t+1. The desired effect to measure is the difference between the individual's health at time

point t+1 if given the treatment, and the individual's health at time point t+1 if not

given the treatment. This is of course impossible to retrieve since one person can at

time t only receive one treatment, and can thus at time t+1 only have received either the

treatment or not. This is the effect of interest since it is the true treatment effect,

i.e. true in the sense that the healing effect of time would not distort the measure of the

treatment effect. Unfortunately, this cannot be resolved by giving both treatments one after

the other to one individual: there may be carry-over effects, and the time points would no longer

be the same. Rubin (1974) started with a similar setup and suggested that since

only one outcome can be observed for each individual, the unobserved outcome could be

called the potential outcome. That is, the health that an individual would potentially

have had at time t+1 if given the other treatment. Rubin suggested that focus should be

shifted from the individual effect, which is logically impossible to retrieve, to

effects at the group level. The expected value for an individual conditioned on being given

treatment or not could then be calculated. The difference between the expected value of

health given treatment and the expected value of health not given treatment can then be

used as an estimate of the desired treatment effect. This was soon adopted and refined

by a large number of researchers in different fields (Imbens and Angrist, 1994; Pearl,

1995, 2001; Robins and Greenland, 1992; Spirtes et al., 1993); see Wooldridge (2002) and

VanderWeele (2015) for recent overviews. Attempting to generalize the causal effect definitions

in the mediation field, Robins and Greenland (1992) and Pearl (2001) suggested

counterfactual-based effect definitions as a complement to the traditionally used product

method (Baron and Kenny, 1986).
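The shift from individual to group-level effects can be illustrated with a small simulation. In simulated data both potential outcomes can be generated for every individual, which is never possible with real data. The sketch below assumes randomized treatment assignment, so that the two groups are comparable; all numbers and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50000

# Both potential outcomes exist in a simulation, unlike in real data.
health_untreated = rng.normal(loc=50.0, scale=10.0, size=n)   # outcome at t+1 without treatment
individual_effect = rng.normal(loc=5.0, scale=2.0, size=n)    # varies between individuals
health_treated = health_untreated + individual_effect         # outcome at t+1 with treatment

# Randomized assignment (assumption): only one potential outcome is observed per individual.
treated = rng.random(n) < 0.5
observed = np.where(treated, health_treated, health_untreated)

true_average_effect = np.mean(health_treated - health_untreated)
estimated_effect = observed[treated].mean() - observed[~treated].mean()
print(f"true average effect: {true_average_effect:.2f}")
print(f"difference in observed group means: {estimated_effect:.2f}")
```

No individual effect is recovered, since each individual contributes only one observed outcome, but the difference in group means approximates the average of the individual effects.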

2.5.1 Effect notation and calculations for mediation

To present the effect definitions from Robins and Greenland (1992) and Pearl (2001), some

convenient notation is first defined. Consider the mediation model in Figure 2, and for

simplicity let X be dichotomous. Let $Y_0$ be the outcome of an individual who was exposed

to X=0 and $Y_1$ the outcome of an individual who was exposed to X=1. Additionally let $Y_{1m}$ be the outcome

of an individual who was exposed to X=1 and where M was set to the value m. Now let

M(0) be M conditioned on X=0. This means that $Y_{1M(0)}$, or $Y_{1,0}$ for short, is the outcome of

an individual with X=1, however with M set to whatever it would have been conditioned

on X=0. This can be generalized to non-dichotomous X, where the last expression would

be $Y_{x_1 M(x_0)}$ for arbitrarily chosen points $x_1$ and $x_0$. The Controlled Direct Effect (CDE),

for when X changes from $x_0$ to $x_1$, is defined as $\mathrm{CDE}(m) = Y_{x_1 m} - Y_{x_0 m}$. However, since

two different values can never be observed for one individual, the average effect is

considered for all effects presented below. The average CDE is defined as

$$\mathrm{CDE}(m) = E[Y_{x_1 m} - Y_{x_0 m}] \tag{4}$$

where $Y_{x_1 m}$ is Y conditioned on X=$x_1$ and M=m, and $Y_{x_0 m}$ is Y conditioned on X=$x_0$ and

M=m. The CDE can be interpreted as the effect X has on Y when it changes from $x_0$ to

$x_1$ while M is fixed at m. Returning to the lung cancer example in Section 2.1.1 from

VanderWeele (2015), $\mathrm{CDE}(10) = E[Y_{1,10} - Y_{0,10}]$ corresponds to the effect of “moving”

from not having to having the genetic variant of the chromosome (gene), if smoking 10

cigarettes per day. Even though this kind of effect is often interesting, the fixed m=10

does not correspond to a natural situation. A more natural situation is obtained

if, instead of fixing M, it is allowed to take the value it would have taken conditioned on the $x_0$

considered. The Pure Natural Direct Effect (PNDE) is defined as

$$PNDE = E[Y_{x_1 M(x_0)} - Y_{x_0 M(x_0)}] \tag{5}$$

and can be interpreted as the effect X has on Y when X changes from x0 to x1 while M
takes the value it would take on average for X = x0. Applying this to the lung cancer
example gives the effect of "moving" from not having to having the genetic variant,
given smoking the number of cigarettes the average individual does in the absence of the
gene. This effect is natural in the sense that M takes a value it would naturally take on
average for one of the values of X considered. The corresponding Total Natural Indirect
Effect (TNIE) is defined as

$$TNIE = E[Y_{x_1 M(x_1)} - Y_{x_1 M(x_0)}] \tag{6}$$

and can be interpreted as the effect X has through M on Y when X changes from x0 to
x1. In the lung cancer example this corresponds to the effect that "moving" from not
having to having the gene has on the risk of lung cancer only by affecting the number
of cigarettes smoked per day. In addition to the PNDE and the TNIE there are also
the Total Natural Direct Effect (TNDE) and the Pure Natural Indirect Effect (PNIE). The
difference lies in the value of X on which Y or M is conditioned.

$$TNDE = E[Y_{x_1 M(x_1)} - Y_{x_0 M(x_1)}] \tag{7}$$

$$PNIE = E[Y_{x_0 M(x_1)} - Y_{x_0 M(x_0)}] \tag{8}$$

In linear mediation models there is no difference between the Pure Natural and the Total
Natural effects; for non-linear models, however, the difference can be substantial. The
counterfactual-based causal effects of course also cover the usual treatment effect, that
is, the difference in the outcome if given the treatment or not, or in our continuous
exposure case, the difference in the outcome if exposed to x0 or x1. This is the total
effect of the exposure on the outcome, the sum of all indirect and direct effects of X on
Y. The Total Effect (TE) is defined as

$$TE = E[Y_{x_1 M(x_1)} - Y_{x_0 M(x_0)}] \tag{9}$$

One important aspect of these counterfactual-based effects is that they always
fulfil the relation TE = TNIE + PNDE. This property follows directly from the definitions,
since the term $E[Y_{x_1 M(x_0)}]$ cancels in the sum, and it is an important reason why
these effects do not rely on any specific functional form.
The product method effects only fulfil this relation for linear models.
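The definitions in Equations 5 to 9 can be sketched in a few lines of Python. This is only an illustration: the function `natural_effects` and the toy expectation `toy_expected_y` (a model with an exposure-mediator interaction) are hypothetical and not part of the thesis. The sketch assumes that a function returning $E[Y_{x M(x_m)}]$ is available.

```python
from typing import Callable, Dict


def natural_effects(expected_y: Callable[[float, float], float],
                    x0: float, x1: float) -> Dict[str, float]:
    """Natural effects from expected_y(x, x_m) = E[Y_{x, M(x_m)}]."""
    e11 = expected_y(x1, x1)  # exposure x1, mediator as under x1
    e10 = expected_y(x1, x0)  # exposure x1, mediator as under x0
    e01 = expected_y(x0, x1)  # exposure x0, mediator as under x1
    e00 = expected_y(x0, x0)  # exposure x0, mediator as under x0
    return {
        "PNDE": e10 - e00,  # Equation 5
        "TNIE": e11 - e10,  # Equation 6
        "TNDE": e11 - e01,  # Equation 7
        "PNIE": e01 - e00,  # Equation 8
        "TE": e11 - e00,    # Equation 9
    }


def toy_expected_y(x: float, x_m: float) -> float:
    """Hypothetical model: Y depends on X, on M, and on their interaction."""
    mediator_mean = 1.0 + 0.5 * x_m  # assumed E[M(x_m)]
    return 2.0 + 0.3 * x + (0.4 + 0.2 * x) * mediator_mean


effects = natural_effects(toy_expected_y, x0=0.0, x1=1.0)
for name, value in effects.items():
    print(f"{name}: {value:.3f}")
```

In this toy example the interaction makes the pure and total versions differ (PNDE is 0.5 while TNDE is 0.6), yet TE = TNIE + PNDE = PNIE + TNDE holds by construction.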

2.5.2 Assumptions for causal effect of mediation models

The review of the assumptions is based on that of VanderWeele (2015), which offers a

thorough overview. There are four assumptions for establishing causal interpretations of

all the effects.

• Assumption 1 - No unmeasured confounding of the exposure-outcome relationship

• Assumption 2 - No unmeasured confounding of the mediator-outcome relationship

• Assumption 3 - No unmeasured confounding of the exposure-mediator relationship

• Assumption 4 - No mediator-outcome confounder that is dependent on the exposure

The first two assumptions imply that the covariates included in the model have to be
sufficient to control for confounding of the relations between the exposure and the outcome,
and between the mediator and the outcome. Assumption 1 can be fulfilled by randomized
assignment of the exposure; however, this is not always the case for Assumption 2.
Assumptions 1 and 2 are necessary and sufficient for controlled causal effects. Assumptions
1 and 2 are also necessary for the natural effects, but two additional assumptions
are needed to ensure the causal interpretation of the natural effects. The third assumption
means that the variables influencing the level of both the exposure and the mediator
must be controlled for. The fourth and final assumption is often viewed as a strong assumption

since it means that all confounders of the mediator and the outcome must be indepen-

dent of the exposure. It is important to recognize that randomization does not make

all assumptions in mediation fulfilled. This was emphasised by Judd and Kenny (1981)

and James and Brett (1984), but not in Baron and Kenny (1986), thus being a notion

less widely spread. This also implies that data collection and caution about controlling

for confounders is particularly important for causal mediation models to be reliable. If

all four assumptions are fulfilled, the effects defined above are said to have causal in-

terpretations. However, the causality relies on some additional implicit assumptions of

temporal ordering. The temporal ordering may be implied by the word "causal", but it is
worth pointing out since mediation analysis is often performed on cross-sectional data. Even
though causal interpretations can in some cases be made with cross-sectional data, they
rely heavily on assumptions. With that said, Hayes (2013, p. 89) makes a statement
regarding assumptions, interpretations and analysis in general, and even though it does not
cover counterfactuals, it is still worth quoting: "Sometimes theory and solid arguments
is the only foundation upon which a causal claim can be built given limitations of our data.
But I see no problem conducting the kind of analysis I describe in the following chapters
even when causal claims rest on shaky grounds. It is our brains that interpret the place
and meaning on the mathematical procedures used, not the procedures themselves." The

importance of assumptions is, according to Hayes, not necessarily to always fulfil them

but to understand and acknowledge the limitations they impose on the interpretations.

Assumptions 1-4 are not testable, and an analyst can of course never know whether all
relevant confounding relations are captured. In order to make these effects useful without
too much doubt, sensitivity analysis is suggested by many (Imai et al., 2010; Pearl, 2001;
VanderWeele, 2015). The idea is to explicitly display "how much" the result of a mediation
analysis relies on the assumptions. This is assessed, e.g., by showing how large the effect of
an unmeasured confounder on the exposure and the outcome must be to fully explain
the effect of the exposure on the outcome. This can be done by trying different effect
sizes of the unmeasured confounder on the exposure and the mediator. Although these

values are arbitrary in some sense, they give a good indication of how strong the estimated

effects are relative to the assumption. If it only takes a weak confounder to account for the

indirect effects then the assertion of the assumptions is crucial for reliable interpretations.

On the other hand, if it takes a huge, non-plausible, effect of the confounder on the

exposure and the mediator to account for the indirect effect then the interpretations

may be more reliable. Sensitivity analysis procedures of this kind are available for all
four assumptions (VanderWeele, 2015). In his book from 2015, VanderWeele strongly

promotes that sensitivity analysis should always be presented together with mediation

analysis and causal effect interpretations. It would arguably create a good standard for

reporting mediation results as well as making mediation analysis less prone to accusations

of relying on unreasonable assumptions. For a recent intuitive, less technical, review of

mediation with causal effects and assumptions see also Keele (2015).

3 Mediation, two-part M

3.1 Model formulation

If the mediator M is limited, the regression of M on the exposure X is affected. If this
were a single regression outside the mediation model, all the limitations discussed
in Section 2.3 would apply. The question is whether the gains from accounting for censoring
in regression transfer to the case of a mediation model with a limited mediator.
In Figure 4 a mediation model similar to that in Figure 3 is expanded to a two-part M
model to account for a limited mediator. The mediator M is separated into a binary
zero/non-zero part measured with a dummy (M*) and a non-zero continuous part (M).
That is, if M* = 1 the observation is not censored and has a continuous value;
if M* = 0 the observation is censored and has no continuous value. The two-part model
of M on X and C is constructed from one probit model of M*, modelling the probability
of being at the censoring point zero or not (recall that a floor effect at zero is assumed
without loss of generality). M* is assumed to be generated from a dichotomized normal
distribution. Additionally, one linear model accounts for the continuous part of the variable.
Both the binary and the continuous variable for M are then brought into the linear
regression model of Y. Moreover, a two-group regression model of Y is used, with one
regression for the censored group and one for the uncensored group. In the group where M=0, M is not
a covariate since M is fixed. Since distributional assumptions have to be made (usually
normality) for the continuous part of M, the logarithmic transformation is often suggested to
better meet this assumption. A two-part variable often has a long right tail and
resembles a normal curve more closely after taking logarithms, i.e. the continuous
part of M is assumed to be lognormally distributed. The lognormal case is
the focus of this study, mainly to make comparisons with earlier two-part
models easier. However, the derivations of the causal effects also apply to normally
distributed M (see details in Appendix A). The implied model formulation is displayed
in Equation 10.

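$$
\begin{aligned}
Y_i \mid M_i > 0 &= \beta_0^{(1)} + \beta_1 \log(M_i) + \beta_2^{(1)} X_i + \beta_3 \log(M_i) X_i + \beta_4^{(1)} C_i + \varepsilon_{y_i} && \text{(10a)} \\
Y_i \mid M_i = 0 &= \beta_0^{(2)} + \beta_2^{(2)} X_i + \beta_4^{(2)} C_i + \varepsilon_{y_i} && \text{(10b)} \\
\log(M_i \mid M_i > 0) &= \gamma_0 + \gamma_1 X_i + \gamma_2 C_i + \varepsilon_{m_i} && \text{(10c)} \\
\operatorname{probit}(\Pr(M_i > 0)) &= \kappa_0 + \kappa_1 X_i + \kappa_2 C_i && \text{(10d)}
\end{aligned}
$$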

where by assumption $\varepsilon_{y_i} \sim N(0, \sigma_y^2)$ and $\varepsilon_{m_i} \sim N(0, \sigma_m^2)$. Equations 10a and 10b are
the two-group regression of Y, which is motivated in detail in Section
2.2. One benefit of using the two-group model is that it allows the M > 0 and M = 0 groups to have
different intercepts and different slopes for X and C. The interaction between X and the
positive part of M is quantified by $\beta_3$. The interaction between X and the zero/non-zero part
of the mediator, M*, is somewhat less obvious. If $\beta_2^{(1)}$ and $\beta_2^{(2)}$ are allowed to differ
in the two-group regression, their difference is a measure of the interaction effect of the
binary M*. In the effect derivations $\beta_2^{(1)}$ and $\beta_2^{(2)}$ are left unconstrained until the final
simplifications, so that the full interaction model with interactions for M and M* can be
obtained. However, the focus of the final effect calculations is the case where X is only allowed to
interact with the effect of M on Y through $\beta_3$, thus $\beta_2^{(1)}$ is set equal to $\beta_2^{(2)}$. The difference between $\beta_0^{(1)}$ and $\beta_0^{(2)}$
captures the mean difference in Y between the two parts of M. This is discussed in detail in
Section 2.2. Equations 10c and 10d correspond to the two-part regression of M on X and

C. Both the exposure and the covariate are allowed to have different effects on M and

M*. The probit model will create a non-linear mediation model. The functional form

is of importance since the main objective in mediation analysis is often to estimate the

direct and indirect effects of the model. As was shown by Robins and Greenland (1992)

and Pearl (2001) the classical product method (Baron and Kenny, 1986) for calculating

effects from the mediation analysis does not apply for non-linear models. Instead the

counterfactual framework will be used to correctly define these effects.

Figure 4 shows the path diagram implied by Equation 10. The exposure variable X

affects the mediator, where the mediator is two-part and therefore divided into the binary

zero/non-zero part (M*) and the non-zero continuous part (M). X is allowed to moderate

the effect of M on Y. The covariate C affects M, M* and Y.

Figure 4: Path diagram for a mediation model with a two-part mediator and two-group outcome, with interaction between the exposure and the mediator. M* is the observed binary variable coding for an observation being censored or not censored. (Diagram nodes: X, C, the two-part mediator M* and M, the two-group outcome Y|M>0 and Y|M=0, and the residuals εM and εY.)

3.2 Estimation

Maximum likelihood (ML) estimation will be used for all models. The two-part mediator
model above has a more complicated likelihood function since the mediator M is a
combination of a binary variable and a continuous variable.

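For a sample i = 1, ..., N, ordered so that the first n observations have M_i > 0,

$$
L(M_i \mid x_i, c_i) = \prod_{i=1}^{n} \Pr(M_i > 0 \mid x_i, c_i)\, f(M_i \mid M_i > 0, x_i, c_i) \times \prod_{i=n+1}^{N} \big(1 - \Pr(M_i > 0 \mid x_i, c_i)\big) \tag{11a}
$$

$$
\begin{aligned}
L(Y_i, M_i \mid x_i, c_i) = {} & \prod_{i=1}^{n} \Pr(M_i > 0 \mid x_i, c_i)\, f(M_i \mid M_i > 0, x_i, c_i)\, f(Y_i \mid M_i > 0, x_i, c_i) \\
& \times \prod_{i=n+1}^{N} \big(1 - \Pr(M_i > 0 \mid x_i, c_i)\big)\, f(Y_i \mid M_i = 0, x_i, c_i)
\end{aligned} \tag{11b}
$$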

In Duan et al. (1983) the likelihood of a two-part dependent variable is shown to be Equation
11a, which implies the full likelihood in Equation 11b. The expressions f(Mi|...) and
f(Yi|...) are the conditional densities of M and Y. Note that in the two-group modelling
of Y the conditional density of Y is not restricted to be the same in the first and the
second product in Equation 11b.
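As a rough illustration of Equation 11a, the log-likelihood of the two-part mediator can be written with the Python standard library alone. This is a sketch under the lognormal assumption of Equation 10c and the probit model of Equation 10d, not the Mplus implementation used in the thesis; the function name `two_part_loglik` and its argument layout are assumptions made for this example.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the probit part


def two_part_loglik(m, x, c, gamma, kappa, sigma_m):
    """Log of Equation 11a for a probit zero part and a lognormal positive part.

    m, x, c  : sequences of mediator, exposure and covariate values
    gamma    : (gamma0, gamma1, gamma2) of the linear model for log(M) given M > 0
    kappa    : (kappa0, kappa1, kappa2) of the probit model for Pr(M > 0)
    sigma_m  : residual standard deviation of log(M) given M > 0 (must be > 0)
    """
    g0, g1, g2 = gamma
    k0, k1, k2 = kappa
    total = 0.0
    for mi, xi, ci in zip(m, x, c):
        prob_positive = STD_NORMAL.cdf(k0 + k1 * xi + k2 * ci)
        if mi > 0:
            mean_log_m = g0 + g1 * xi + g2 * ci
            # Lognormal density of M: normal density of log(M) divided by M.
            log_density = (math.log(NormalDist(mean_log_m, sigma_m).pdf(math.log(mi)))
                           - math.log(mi))
            total += math.log(prob_positive) + log_density
        else:
            total += math.log(1.0 - prob_positive)
    return total


# Tiny made-up data set: two censored and two positive observations.
example = two_part_loglik(m=[0.0, 0.0, 2.5, 7.0], x=[4.0, 5.0, 5.0, 6.0],
                          c=[0.0, 0.0, 0.0, 0.0],
                          gamma=(0.5, 0.2, 0.0), kappa=(-2.0, 0.4, 0.0),
                          sigma_m=1.0)
print(example)
```

The binary and the continuous parts enter the sum as separate terms, which is why the two parts of a two-part model can be estimated separately when they share no parameters.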

3.3 Derivation of effects

In this study the effects of X and C on Y are assumed equal for both groups of M. Thus
$\beta_2^{(1)} = \beta_2^{(2)}$ and $\beta_4^{(1)} = \beta_4^{(2)}$, indicated by a dropped superscript. Results for forming the effects
without these restrictions can be found in Appendix A.

3.3.1 Conditional expected value of Y

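One of the conditional expectations used to define the causal effects is shown in Equation
12. For a detailed explanation see Section 2.5.

$$
\begin{aligned}
E[Y(x_1, \log(M(x_0)))] &= \beta_0^{(2)} + (\beta_0^{(1)} - \beta_0^{(2)})\, \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \\
&\quad + (\beta_2 x_1 + \beta_4 c)\big(1 - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) + \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)\big) \\
&\quad + \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)\, (\beta_1 + \beta_3 x_1)\, (\gamma_0 + \gamma_1 x_0 + \gamma_2 c) \\
&= \beta_0^{(2)} + (\beta_0^{(1)} - \beta_0^{(2)})\, \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) + (\beta_2 x_1 + \beta_4 c) \\
&\quad + \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)\, (\beta_1 + \beta_3 x_1)\, (\gamma_0 + \gamma_1 x_0 + \gamma_2 c)
\end{aligned} \tag{12}
$$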

3.3.2 Causal effects

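The complete derivation is displayed in Appendix A. The simplified effects are given in Equations 13-17.

The Total Natural Indirect Effect

$$
\begin{aligned}
TNIE &= E[Y(x_1, \log(M(x_1))) \mid C = c] - E[Y(x_1, \log(M(x_0))) \mid C = c] \\
&= (\beta_0^{(1)} - \beta_0^{(2)})\big(\Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c) - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)\big) \\
&\quad + (\beta_1 + \beta_3 x_1)\big(\Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c)(\gamma_0 + \gamma_1 x_1 + \gamma_2 c) - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)(\gamma_0 + \gamma_1 x_0 + \gamma_2 c)\big)
\end{aligned} \tag{13}
$$

The Pure Natural Direct Effect

$$
\begin{aligned}
PNDE &= E[Y(x_1, \log(M(x_0))) \mid C = c] - E[Y(x_0, \log(M(x_0))) \mid C = c] \\
&= \beta_2 (x_1 - x_0) + \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)\, (\gamma_0 + \gamma_1 x_0 + \gamma_2 c)\, \beta_3 (x_1 - x_0)
\end{aligned} \tag{14}
$$

The Pure Natural Indirect Effect

$$
\begin{aligned}
PNIE &= E[Y(x_0, \log(M(x_1))) \mid C = c] - E[Y(x_0, \log(M(x_0))) \mid C = c] \\
&= (\beta_0^{(1)} - \beta_0^{(2)})\big(\Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c) - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)\big) \\
&\quad + (\beta_1 + \beta_3 x_0)\big(\Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c)(\gamma_0 + \gamma_1 x_1 + \gamma_2 c) - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)(\gamma_0 + \gamma_1 x_0 + \gamma_2 c)\big)
\end{aligned} \tag{15}
$$

The Total Natural Direct Effect

$$
\begin{aligned}
TNDE &= E[Y(x_1, \log(M(x_1))) \mid C = c] - E[Y(x_0, \log(M(x_1))) \mid C = c] \\
&= \beta_2 (x_1 - x_0) + \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c)\, (\gamma_0 + \gamma_1 x_1 + \gamma_2 c)\, \beta_3 (x_1 - x_0)
\end{aligned} \tag{16}
$$

The Total Effect

$$
\begin{aligned}
TE &= E[Y(x_1, \log(M(x_1))) \mid C = c] - E[Y(x_0, \log(M(x_0))) \mid C = c] \\
&= (\beta_0^{(1)} - \beta_0^{(2)})\big(\Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c) - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)\big) + \beta_2 (x_1 - x_0) \\
&\quad + \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c)\, (\beta_1 + \beta_3 x_1)\, (\gamma_0 + \gamma_1 x_1 + \gamma_2 c) \\
&\quad - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)\, (\beta_1 + \beta_3 x_0)\, (\gamma_0 + \gamma_1 x_0 + \gamma_2 c)
\end{aligned} \tag{17}
$$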
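The effects for the two-part M model can be checked numerically by coding the conditional expectation of Equation 12 once and taking differences. The sketch below does this with the Python standard library. The dictionary keys, the helper names and all parameter values are hypothetical and chosen only for illustration; they are not the values used in the simulation study.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def expected_y(x_y, x_m, c, p):
    """E[Y(x_y, log M(x_m)) | C = c], the simplified form of Equation 12."""
    prob_pos = PHI(p["k0"] + p["k1"] * x_m + p["k2"] * c)   # Pr(M > 0)
    mean_log_m = p["g0"] + p["g1"] * x_m + p["g2"] * c      # E[log M | M > 0]
    return (p["b0_zero"]
            + (p["b0_pos"] - p["b0_zero"]) * prob_pos
            + p["b2"] * x_y + p["b4"] * c
            + prob_pos * (p["b1"] + p["b3"] * x_y) * mean_log_m)


def effects(x0, x1, c, p):
    """All five effects as differences of conditional expectations."""
    e = lambda a, b: expected_y(a, b, c, p)
    return {
        "TNIE": e(x1, x1) - e(x1, x0),
        "PNDE": e(x1, x0) - e(x0, x0),
        "PNIE": e(x0, x1) - e(x0, x0),
        "TNDE": e(x1, x1) - e(x0, x1),
        "TE": e(x1, x1) - e(x0, x0),
    }


def pnde_closed_form(x0, x1, c, p):
    """Equation 14, written out directly."""
    prob_pos0 = PHI(p["k0"] + p["k1"] * x0 + p["k2"] * c)
    mean_log_m0 = p["g0"] + p["g1"] * x0 + p["g2"] * c
    return p["b2"] * (x1 - x0) + prob_pos0 * mean_log_m0 * p["b3"] * (x1 - x0)


# Hypothetical parameter values (b0_pos is the intercept for M > 0, b0_zero for M = 0).
params = {"b0_pos": 2.5, "b0_zero": 2.0, "b1": 0.25, "b2": 0.25, "b3": 0.05, "b4": 0.0,
          "g0": 4.0, "g1": 0.5, "g2": 0.0, "k0": -1.3, "k1": 0.4, "k2": 0.0}

result = effects(x0=4.0, x1=5.0, c=0.0, p=params)
print(result)
print(pnde_closed_form(4.0, 5.0, 0.0, params))
```

Computing every effect from the single function `expected_y` keeps the five expressions consistent with each other, and the closed form of Equation 14 serves as a check on the algebra.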

4 Mediation, two-part M and two-part Y

In this section the mediation model where both the mediator and the outcome are limited

is considered.

4.1 Model formulation

If the mediator M and the outcome Y are limited, both the regression of M on X and the

regression of Y on M and X are affected. Again the two-part model will be used to account

for the censoring in both M and Y. The two-group regression setup for Y implies that two

more regressions will be added due to the combination with the two-part model. Both

the continuous part of M and the continuous part of Y rely on distributional assumptions.

In this study both dependent variables are assumed to follow the lognormal distribution.

The model formulation of a two-part M, two-part Y mediation model is displayed in

Equation 18.

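$$
\begin{aligned}
\log(Y_i \mid M_i > 0) &= \beta_0^{(1)} + \beta_1 \log(M_i) + \beta_2^{(1)} X_i + \beta_3 \log(M_i) X_i + \beta_4^{(1)} C_i + \varepsilon_{y_i} && \text{(18a)} \\
\operatorname{probit}(\Pr(Y_i > 0 \mid M_i > 0)) &= \theta_0^{(1)} + \theta_1 \log(M_i) + \theta_2^{(1)} X_i + \theta_3 \log(M_i) X_i + \theta_4^{(1)} C_i && \text{(18b)} \\
\log(Y_i \mid M_i = 0) &= \beta_0^{(2)} + \beta_2^{(2)} X_i + \beta_4^{(2)} C_i + \varepsilon_{y_i} && \text{(18c)} \\
\operatorname{probit}(\Pr(Y_i > 0 \mid M_i = 0)) &= \theta_0^{(2)} + \theta_2^{(2)} X_i + \theta_4^{(2)} C_i && \text{(18d)} \\
\log(M_i \mid M_i > 0) &= \gamma_0 + \gamma_1 X_i + \gamma_2 C_i + \varepsilon_{m_i} && \text{(18e)} \\
\operatorname{probit}(\Pr(M_i > 0)) &= \kappa_0 + \kappa_1 X_i + \kappa_2 C_i && \text{(18f)}
\end{aligned}
$$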

where by assumption $\varepsilon_{y_i} \sim N(0, \sigma_y^2)$ and $\varepsilon_{m_i} \sim N(0, \sigma_m^2)$. This is a non-linear model
and the counterfactual framework will be used to define the effects. The path diagram of
Equation 18 is shown in Figure 5. The mediator M and the outcome Y are both separated
into one zero/non-zero part (M* and Y*) and one corresponding non-zero continuous
part (M and Y). If M*=1 the observation is not censored and has a
corresponding continuous value M. If M*=0 the observation is censored and has no
continuous value. The same goes for Y* and Y. M* and Y* are assumed to be generated
from dichotomized normal distributions. The exposure X affects M, M*, Y and Y*. M
affects only the two-part Y belonging to the group M>0. Moreover, X is allowed to

moderate the effect of M on Y. Additionally, a covariate measured at the same time point

as X is allowed to affect both parts of the mediator and the outcome. The full model

implied by Figure 5 is kept throughout the derivations, however in the last step some

parameters will be restricted to limit the scope of the Monte Carlo simulation study in

Section 5. Expressions for calculating unrestricted effects are available in Appendix B.

This model allows the zero/non-zero part of the mediator, M*, to moderate
the effect of X on Y, since the slopes in the two-group analysis of Y are not restricted.
That is, the difference between the slopes of Y on X in the two groups is a measure

of the interaction effect. The detailed motivation of this model formulation is discussed

in Section 2. As in the two-part M model in Equation 10, the zero/non-zero parts are

modelled with probit regression, and all the linear parts are modelled with standard linear

regressions.

4.2 Estimation

The likelihood function in Equation 11 is extended in Equation 19 to account for two-part
modelling of both M and Y. The four combinations of zero/non-zero M and Y are each
represented by one product in Equation 19. The expressions f(Mi|...) and f(Yi|...)
are the conditional densities of M and Y.

Figure 5: Path diagram for a mediation model with a two-part mediator M and two-part outcome Y, combined with a two-group model of Y. M* and Y* are the observed binary variables coding for an observation being censored or not censored. (Diagram nodes: X, C, the two-part mediator M* and M, the two-part outcomes Y*|M>0, Y|M>0 and Y*|M=0, Y|M=0 forming the two-group model, and the residuals εM and εY.)

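Let the random sample of size n be partitioned into the four groups g1 (M > 0, Y > 0), g2 (M = 0, Y > 0), g3 (M > 0, Y = 0) and g4 (M = 0, Y = 0), such that n = ng1 + ng2 + ng3 + ng4 and i = 1, ..., n.

$$
\begin{aligned}
L(Y_i, M_i \mid x_i, c_i) = {} & \prod_{i \in g_1} \Pr(M_i > 0 \mid x_i, c_i)\, \Pr(Y_i > 0 \mid M_i > 0, x_i, c_i)\, f(M_i \mid M_i > 0, x_i, c_i)\, f(Y_i \mid Y_i > 0, M_i > 0, x_i, c_i) \\
& \times \prod_{i \in g_2} \big(1 - \Pr(M_i > 0 \mid x_i, c_i)\big)\, \Pr(Y_i > 0 \mid M_i = 0, x_i, c_i)\, f(Y_i \mid Y_i > 0, M_i = 0, x_i, c_i) \\
& \times \prod_{i \in g_3} \Pr(M_i > 0 \mid x_i, c_i)\, \big(1 - \Pr(Y_i > 0 \mid M_i > 0, x_i, c_i)\big)\, f(M_i \mid M_i > 0, x_i, c_i) \\
& \times \prod_{i \in g_4} \big(1 - \Pr(M_i > 0 \mid x_i, c_i)\big)\, \big(1 - \Pr(Y_i > 0 \mid M_i = 0, x_i, c_i)\big)
\end{aligned} \tag{19}
$$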

4.3 Derivation of causal effects

In this study the effects of X and C on Y are assumed equal for both groups of M. Thus
$\beta_2^{(1)} = \beta_2^{(2)}$, $\beta_4^{(1)} = \beta_4^{(2)}$, $\theta_2^{(1)} = \theta_2^{(2)}$ and $\theta_4^{(1)} = \theta_4^{(2)}$, indicated by a dropped superscript.
Results for forming the unrestricted effects can be found in the detailed derivation in Appendix
B.

4.3.1 Conditional expected values

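One of the conditional expectations used to define the causal effects is shown in Equation
20. For a detailed explanation of these conditional expected values see Section 2.5.

$$
\begin{aligned}
E[Y(x_0, \log(M(x_0)))] = {} & \phi \exp(\beta_2 x_0 + \beta_4 c) \Bigg[ \exp\big(\beta_0^{(2)}\big)\, \Phi\big(\theta_0^{(2)} + \theta_2 x_0 + \theta_4 c\big) \big(1 - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)\big) \\
& + \exp\bigg(\beta_0^{(1)} + \frac{\mu^2}{2\sigma_M^2}\big(b^2 - 1\big)\bigg)\, \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \\
& \times \Phi\Bigg(\frac{\theta_0^{(1)} + \theta_2 x_0 + \theta_4 c + (\theta_1 + \theta_3 x_0)\, b \mu}{\sqrt{(\theta_1 + \theta_3 x_0)^2 \sigma_M^2 + 1}}\Bigg) \Bigg]
\end{aligned} \tag{20}
$$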

4.3.2 Causal effects

The complete derivation is displayed in Appendix B. The simplified effects given the restrictions mentioned above are given in Equations 21-25. Note that b and µ in Equation 20 are substituted (see details in Appendix B). Some further simplifications are possible, however without gaining simplicity and with loss of intuition.

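The Total Natural Indirect Effect

$$
\begin{aligned}
TNIE = {} & E[Y(x_1, \log(M(x_1))) \mid C = c] - E[Y(x_1, \log(M(x_0))) \mid C = c] \\
= {} & \phi \exp(\beta_2 x_1 + \beta_4 c) \Bigg[ \exp\big(\beta_0^{(2)}\big)\, \Phi\big(\theta_0^{(2)} + \theta_2 x_1 + \theta_4 c\big) \big(1 - \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c)\big) \\
& + \exp\Bigg(\beta_0^{(1)} + \frac{(\gamma_0 + \gamma_1 x_1 + \gamma_2 c)^2}{2\sigma_M^2} \Bigg(\bigg(1 + \frac{(\beta_1 + \beta_3 x_1)\sigma_M^2}{\gamma_0 + \gamma_1 x_1 + \gamma_2 c}\bigg)^2 - 1\Bigg)\Bigg)\, \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c) \\
& \times \Phi\Bigg(\frac{\theta_0^{(1)} + \theta_2 x_1 + \theta_4 c + (\theta_1 + \theta_3 x_1)\Big(1 + \frac{(\beta_1 + \beta_3 x_1)\sigma_M^2}{\gamma_0 + \gamma_1 x_1 + \gamma_2 c}\Big)(\gamma_0 + \gamma_1 x_1 + \gamma_2 c)}{\sqrt{(\theta_1 + \theta_3 x_1)^2 \sigma_M^2 + 1}}\Bigg) \Bigg] \\
& - \phi \exp(\beta_2 x_1 + \beta_4 c) \Bigg[ \exp\big(\beta_0^{(2)}\big)\, \Phi\big(\theta_0^{(2)} + \theta_2 x_1 + \theta_4 c\big) \big(1 - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)\big) \\
& + \exp\Bigg(\beta_0^{(1)} + \frac{(\gamma_0 + \gamma_1 x_0 + \gamma_2 c)^2}{2\sigma_M^2} \Bigg(\bigg(1 + \frac{(\beta_1 + \beta_3 x_1)\sigma_M^2}{\gamma_0 + \gamma_1 x_0 + \gamma_2 c}\bigg)^2 - 1\Bigg)\Bigg)\, \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \\
& \times \Phi\Bigg(\frac{\theta_0^{(1)} + \theta_2 x_1 + \theta_4 c + (\theta_1 + \theta_3 x_1)\Big(1 + \frac{(\beta_1 + \beta_3 x_1)\sigma_M^2}{\gamma_0 + \gamma_1 x_0 + \gamma_2 c}\Big)(\gamma_0 + \gamma_1 x_0 + \gamma_2 c)}{\sqrt{(\theta_1 + \theta_3 x_1)^2 \sigma_M^2 + 1}}\Bigg) \Bigg]
\end{aligned} \tag{21}
$$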



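The Pure Natural Direct Effect

$$
\begin{aligned}
PNDE = {} & E[Y(x_1, \log(M(x_0))) \mid C = c] - E[Y(x_0, \log(M(x_0))) \mid C = c] \\
= {} & \phi \exp(\beta_2 x_1 + \beta_4 c) \Bigg[ \exp\big(\beta_0^{(2)}\big)\, \Phi\big(\theta_0^{(2)} + \theta_2 x_1 + \theta_4 c\big) \big(1 - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)\big) \\
& + \exp\Bigg(\beta_0^{(1)} + \frac{(\gamma_0 + \gamma_1 x_0 + \gamma_2 c)^2}{2\sigma_M^2} \Bigg(\bigg(1 + \frac{(\beta_1 + \beta_3 x_1)\sigma_M^2}{\gamma_0 + \gamma_1 x_0 + \gamma_2 c}\bigg)^2 - 1\Bigg)\Bigg)\, \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \\
& \times \Phi\Bigg(\frac{\theta_0^{(1)} + \theta_2 x_1 + \theta_4 c + (\theta_1 + \theta_3 x_1)\Big(1 + \frac{(\beta_1 + \beta_3 x_1)\sigma_M^2}{\gamma_0 + \gamma_1 x_0 + \gamma_2 c}\Big)(\gamma_0 + \gamma_1 x_0 + \gamma_2 c)}{\sqrt{(\theta_1 + \theta_3 x_1)^2 \sigma_M^2 + 1}}\Bigg) \Bigg] \\
& - \phi \exp(\beta_2 x_0 + \beta_4 c) \Bigg[ \exp\big(\beta_0^{(2)}\big)\, \Phi\big(\theta_0^{(2)} + \theta_2 x_0 + \theta_4 c\big) \big(1 - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)\big) \\
& + \exp\Bigg(\beta_0^{(1)} + \frac{(\gamma_0 + \gamma_1 x_0 + \gamma_2 c)^2}{2\sigma_M^2} \Bigg(\bigg(1 + \frac{(\beta_1 + \beta_3 x_0)\sigma_M^2}{\gamma_0 + \gamma_1 x_0 + \gamma_2 c}\bigg)^2 - 1\Bigg)\Bigg)\, \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \\
& \times \Phi\Bigg(\frac{\theta_0^{(1)} + \theta_2 x_0 + \theta_4 c + (\theta_1 + \theta_3 x_0)\Big(1 + \frac{(\beta_1 + \beta_3 x_0)\sigma_M^2}{\gamma_0 + \gamma_1 x_0 + \gamma_2 c}\Big)(\gamma_0 + \gamma_1 x_0 + \gamma_2 c)}{\sqrt{(\theta_1 + \theta_3 x_0)^2 \sigma_M^2 + 1}}\Bigg) \Bigg]
\end{aligned} \tag{22}
$$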




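The Pure Natural Indirect Effect

$$
\begin{aligned}
PNIE = {} & E[Y(x_0, \log(M(x_1))) \mid C = c] - E[Y(x_0, \log(M(x_0))) \mid C = c] \\
= {} & \phi \exp(\beta_2 x_0 + \beta_4 c) \Bigg[ \exp\big(\beta_0^{(2)}\big)\, \Phi\big(\theta_0^{(2)} + \theta_2 x_0 + \theta_4 c\big) \big(1 - \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c)\big) \\
& + \exp\Bigg(\beta_0^{(1)} + \frac{(\gamma_0 + \gamma_1 x_1 + \gamma_2 c)^2}{2\sigma_M^2} \Bigg(\bigg(1 + \frac{(\beta_1 + \beta_3 x_0)\sigma_M^2}{\gamma_0 + \gamma_1 x_1 + \gamma_2 c}\bigg)^2 - 1\Bigg)\Bigg)\, \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c) \\
& \times \Phi\Bigg(\frac{\theta_0^{(1)} + \theta_2 x_0 + \theta_4 c + (\theta_1 + \theta_3 x_0)\Big(1 + \frac{(\beta_1 + \beta_3 x_0)\sigma_M^2}{\gamma_0 + \gamma_1 x_1 + \gamma_2 c}\Big)(\gamma_0 + \gamma_1 x_1 + \gamma_2 c)}{\sqrt{(\theta_1 + \theta_3 x_0)^2 \sigma_M^2 + 1}}\Bigg) \Bigg] \\
& - \phi \exp(\beta_2 x_0 + \beta_4 c) \Bigg[ \exp\big(\beta_0^{(2)}\big)\, \Phi\big(\theta_0^{(2)} + \theta_2 x_0 + \theta_4 c\big) \big(1 - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)\big) \\
& + \exp\Bigg(\beta_0^{(1)} + \frac{(\gamma_0 + \gamma_1 x_0 + \gamma_2 c)^2}{2\sigma_M^2} \Bigg(\bigg(1 + \frac{(\beta_1 + \beta_3 x_0)\sigma_M^2}{\gamma_0 + \gamma_1 x_0 + \gamma_2 c}\bigg)^2 - 1\Bigg)\Bigg)\, \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \\
& \times \Phi\Bigg(\frac{\theta_0^{(1)} + \theta_2 x_0 + \theta_4 c + (\theta_1 + \theta_3 x_0)\Big(1 + \frac{(\beta_1 + \beta_3 x_0)\sigma_M^2}{\gamma_0 + \gamma_1 x_0 + \gamma_2 c}\Big)(\gamma_0 + \gamma_1 x_0 + \gamma_2 c)}{\sqrt{(\theta_1 + \theta_3 x_0)^2 \sigma_M^2 + 1}}\Bigg) \Bigg]
\end{aligned} \tag{23}
$$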




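The Total Natural Direct Effect

$$
\begin{aligned}
TNDE = {} & E[Y(x_1, \log(M(x_1))) \mid C = c] - E[Y(x_0, \log(M(x_1))) \mid C = c] \\
= {} & \phi \exp(\beta_2 x_1 + \beta_4 c) \Bigg[ \exp\big(\beta_0^{(2)}\big)\, \Phi\big(\theta_0^{(2)} + \theta_2 x_1 + \theta_4 c\big) \big(1 - \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c)\big) \\
& + \exp\Bigg(\beta_0^{(1)} + \frac{(\gamma_0 + \gamma_1 x_1 + \gamma_2 c)^2}{2\sigma_M^2} \Bigg(\bigg(1 + \frac{(\beta_1 + \beta_3 x_1)\sigma_M^2}{\gamma_0 + \gamma_1 x_1 + \gamma_2 c}\bigg)^2 - 1\Bigg)\Bigg)\, \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c) \\
& \times \Phi\Bigg(\frac{\theta_0^{(1)} + \theta_2 x_1 + \theta_4 c + (\theta_1 + \theta_3 x_1)\Big(1 + \frac{(\beta_1 + \beta_3 x_1)\sigma_M^2}{\gamma_0 + \gamma_1 x_1 + \gamma_2 c}\Big)(\gamma_0 + \gamma_1 x_1 + \gamma_2 c)}{\sqrt{(\theta_1 + \theta_3 x_1)^2 \sigma_M^2 + 1}}\Bigg) \Bigg] \\
& - \phi \exp(\beta_2 x_0 + \beta_4 c) \Bigg[ \exp\big(\beta_0^{(2)}\big)\, \Phi\big(\theta_0^{(2)} + \theta_2 x_0 + \theta_4 c\big) \big(1 - \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c)\big) \\
& + \exp\Bigg(\beta_0^{(1)} + \frac{(\gamma_0 + \gamma_1 x_1 + \gamma_2 c)^2}{2\sigma_M^2} \Bigg(\bigg(1 + \frac{(\beta_1 + \beta_3 x_0)\sigma_M^2}{\gamma_0 + \gamma_1 x_1 + \gamma_2 c}\bigg)^2 - 1\Bigg)\Bigg)\, \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c) \\
& \times \Phi\Bigg(\frac{\theta_0^{(1)} + \theta_2 x_0 + \theta_4 c + (\theta_1 + \theta_3 x_0)\Big(1 + \frac{(\beta_1 + \beta_3 x_0)\sigma_M^2}{\gamma_0 + \gamma_1 x_1 + \gamma_2 c}\Big)(\gamma_0 + \gamma_1 x_1 + \gamma_2 c)}{\sqrt{(\theta_1 + \theta_3 x_0)^2 \sigma_M^2 + 1}}\Bigg) \Bigg]
\end{aligned} \tag{24}
$$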

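The Total Effect

$$
\begin{aligned}
TE = {} & E[Y(x_1, \log(M(x_1))) \mid C = c] - E[Y(x_0, \log(M(x_0))) \mid C = c] \\
= {} & \phi \exp(\beta_2 x_1 + \beta_4 c) \Bigg[ \exp\big(\beta_0^{(2)}\big)\, \Phi\big(\theta_0^{(2)} + \theta_2 x_1 + \theta_4 c\big) \big(1 - \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c)\big) \\
& + \exp\Bigg(\beta_0^{(1)} + \frac{(\gamma_0 + \gamma_1 x_1 + \gamma_2 c)^2}{2\sigma_M^2} \Bigg(\bigg(1 + \frac{(\beta_1 + \beta_3 x_1)\sigma_M^2}{\gamma_0 + \gamma_1 x_1 + \gamma_2 c}\bigg)^2 - 1\Bigg)\Bigg)\, \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c) \\
& \times \Phi\Bigg(\frac{\theta_0^{(1)} + \theta_2 x_1 + \theta_4 c + (\theta_1 + \theta_3 x_1)\Big(1 + \frac{(\beta_1 + \beta_3 x_1)\sigma_M^2}{\gamma_0 + \gamma_1 x_1 + \gamma_2 c}\Big)(\gamma_0 + \gamma_1 x_1 + \gamma_2 c)}{\sqrt{(\theta_1 + \theta_3 x_1)^2 \sigma_M^2 + 1}}\Bigg) \Bigg] \\
& - \phi \exp(\beta_2 x_0 + \beta_4 c) \Bigg[ \exp\big(\beta_0^{(2)}\big)\, \Phi\big(\theta_0^{(2)} + \theta_2 x_0 + \theta_4 c\big) \big(1 - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)\big) \\
& + \exp\Bigg(\beta_0^{(1)} + \frac{(\gamma_0 + \gamma_1 x_0 + \gamma_2 c)^2}{2\sigma_M^2} \Bigg(\bigg(1 + \frac{(\beta_1 + \beta_3 x_0)\sigma_M^2}{\gamma_0 + \gamma_1 x_0 + \gamma_2 c}\bigg)^2 - 1\Bigg)\Bigg)\, \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \\
& \times \Phi\Bigg(\frac{\theta_0^{(1)} + \theta_2 x_0 + \theta_4 c + (\theta_1 + \theta_3 x_0)\Big(1 + \frac{(\beta_1 + \beta_3 x_0)\sigma_M^2}{\gamma_0 + \gamma_1 x_0 + \gamma_2 c}\Big)(\gamma_0 + \gamma_1 x_0 + \gamma_2 c)}{\sqrt{(\theta_1 + \theta_3 x_0)^2 \sigma_M^2 + 1}}\Bigg) \Bigg]
\end{aligned} \tag{25}
$$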
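Since Equations 21-25 are all differences of the same kind of conditional expectation, one way to reduce the risk of transcription errors is to code the expectation once, in the form of Equation 20 with b and µ written out, and to form the effects as differences. The Python sketch below follows that idea. It treats φ as a given constant (its definition is in Appendix B), and the dictionary keys and all parameter values are hypothetical.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def expected_y(x_y, x_m, c, p):
    """E[Y(x_y, log M(x_m)) | C = c] in the form of Equation 20.

    x_y is the exposure level entering the outcome equations and x_m the level
    determining the mediator. The mean mu below must be non-zero.
    """
    mu = p["g0"] + p["g1"] * x_m + p["g2"] * c        # mean of log M given M > 0
    s2 = p["sigma_m"] ** 2                            # variance of log M given M > 0
    slope = p["b1"] + p["b3"] * x_y                   # coefficient of log M in log Y
    t_slope = p["t1"] + p["t3"] * x_y                 # coefficient of log M in probit of Y*
    prob_m_pos = PHI(p["k0"] + p["k1"] * x_m + p["k2"] * c)
    b = 1.0 + slope * s2 / mu
    zero_part = (math.exp(p["b0_zero"])
                 * PHI(p["t0_zero"] + p["t2"] * x_y + p["t4"] * c)
                 * (1.0 - prob_m_pos))
    pos_part = (math.exp(p["b0_pos"] + mu ** 2 / (2.0 * s2) * (b ** 2 - 1.0))
                * prob_m_pos
                * PHI((p["t0_pos"] + p["t2"] * x_y + p["t4"] * c + t_slope * b * mu)
                      / math.sqrt(t_slope ** 2 * s2 + 1.0)))
    return p["phi"] * math.exp(p["b2"] * x_y + p["b4"] * c) * (zero_part + pos_part)


def effects(x0, x1, c, p):
    """Equations 21-25 as differences of conditional expectations."""
    e = lambda a, b: expected_y(a, b, c, p)
    return {
        "TNIE": e(x1, x1) - e(x1, x0),
        "PNDE": e(x1, x0) - e(x0, x0),
        "PNIE": e(x0, x1) - e(x0, x0),
        "TNDE": e(x1, x1) - e(x0, x1),
        "TE": e(x1, x1) - e(x0, x0),
    }


# Hypothetical parameter values, for illustration only.
params = {"phi": 1.6, "sigma_m": 1.0,
          "b0_pos": 2.5, "b0_zero": 2.0, "b1": 0.25, "b2": 0.25, "b3": 0.02, "b4": 0.0,
          "t0_pos": -3.0, "t0_zero": -2.0, "t1": 0.3, "t2": 0.3, "t3": 0.01, "t4": 0.0,
          "g0": 4.0, "g1": 0.5, "g2": 0.0, "k0": -1.3, "k1": 0.4, "k2": 0.0}

result = effects(x0=4.0, x1=5.0, c=0.0, p=params)
print(result)
```

The exponent term relies on the identity µ²(b² − 1)/(2σ²) = aµ + a²σ²/2 with a = β1 + β3x, which is the usual lognormal moment; this is how b and µ are substituted in the sketch.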

5 Monte Carlo simulations

In this section a Monte Carlo study is performed to investigate the sample performance

of the two-part causal effect estimates. The study contains different sample sizes, and

percentage of limited observations. All simulation results are from simulations with 1000

iterations. The sample sizes covered are 100, 150, 200, 250, 300, 400, 500, 1000. The

simulations are performed on two synthetic models created from criteria on the effect
sizes, R-square and log-odds of the models.

5.1 Synthetic models and true data generating processes

The models are completely synthetic and are arbitrarily chosen only to have certain R-

square and log-odds and standardized effect sizes. All models have some constrained

parameters, e.g. the interaction is set to zero. These constraints are done only to limit

the scope of the Monte Carlo simulation. The effects for the full models can be extracted

from the derivations in Appendix A and B.

5.1.1 Weak and Moderately strong effects model

To make the simulation study resemble empirical situations the characteristics of the

models used as data generating processes are discussed and specified in detail below.

There are several important aspects to consider; the size of the (true) effects, the amount

of explained variance, the mean difference (in Y) between the zero/non-zero groups of

the two-part variables and the amount of censoring. Censoring amounts are covered in

Section 5.1.2. The effect size needs to be considered in order to establish for how strong effects
the difference between two-part effects and classical effects is relevant. To formalize the
effect sizes, the framework suggested by Cohen (1992) is used. A weak effect is around
0.3 standard deviations (sd) change in Y for a one sd increase in X. A moderately
strong effect is around 0.5 sd change in Y for a one sd increase in X. Note that these
effect size definitions are used to create the models with 25% censoring. When the censoring is changed
to 10% and 50% respectively, the effect sizes change since the effects are functions of the
censoring amount. To achieve the same effect size for different censoring amounts several
parameters would have to be changed. However, if several parameters are altered, the comparison
between the censoring amounts is no longer isolated. To avoid confusion
only the censoring amount within each effect size model is varied. Table 1 shows the true
effect sizes within the weak and moderately strong effects models for the different
censoring amounts. As can be seen, the weak model's effect size is close to the target for all
censoring amounts, whereas the moderately strong effects model gets a substantial
increase in the true effect size of the TNIE (0.73 instead of 0.5) for 50% censoring.

Table 1: True TNIE effect sizes for different censoring amounts.

Effect   Model         Censoring amount   Size (sd of Y)
TNIE     Weak          10%                0.23
TNIE     Weak          25%                0.29
TNIE     Weak          50%                0.33
TNIE     Mod. Strong   10%                0.33
TNIE     Mod. Strong   25%                0.51
TNIE     Mod. Strong   50%                0.73

The amount of explained variance, formalized by R2, is not covered in Cohen (1992).

The weak models are set to have R2 between 30 and 50% and the moderately strong

models between 40-65%. The size of the difference between the zero/non-zero groups is

formalized by log-odds, i.e the log-odds from the logit model of the binary variable(s)

regressed on the covariates. Note that the logit model is not used in estimation, but only

as a tool to evaluate and quantify the difference between the two groups. The weak

models have log-odds for all independent variables around 2. The independent variables

of the moderately strong models have log-odds around 4. The values of R2 and
log-odds are chosen to mimic empirical data sets as closely as possible. Since the effect
size, R2 and log-odds are related, there is some variation in R2 between the models.

5.1.2 Censoring

Three levels of censoring are studied; 10%, 25% and 50%. Two examples of generated

random samples of 10 000 observations of log(M) are illustrated in Figure 6. As can be

seen, the positive part of log(M) is normally distributed in both cases without censoring

to any practical degree, which is desired to fulfil the distributional assumption. It is

worth pointing out that in most empirical examples the point mass and the continuous

part are not separated as clearly as in this histogram. However, to evaluate the behaviour

of the two-part effects under fulfilled distributional assumptions this is necessary. This

strong distributional assumption will probably make the difference between the classical
estimates and the two-part estimates larger than in most empirical situations. The sensitivity
to the crucial distributional assumptions is discussed further in Section 5.4. The
only difference between the two histograms in Figure 6 is the amount of censoring, 10%
and 25%. In the model where both M and Y are generated as two-part, samples from
log(Y) would look similar but with different mean and scale. Note that theoretically the
two-part effects reduce to the classical effects if the amount of censoring gets close
enough to zero. However, in practice the probit estimation breaks down if there are few
observations in one group of the dependent variable. The number of observations in each
group depends on the sample size. Censoring of less than 10% would mean fewer than
ten observations on average in the censored group with n=100. Ten observations in each
category is suggested as a useful approximate "smallest number rule" for binary regression,
discussed in Agresti (2007). However, when n=100 is used in the weak model setting
of this study, the estimate of the slope of X in the probit regression of M* becomes
extremely biased. Table 2 displays the slope estimates for the weak model's probit regression of
M*. As can be seen, the extreme bias for n=100 is reduced substantially when
the sample size is increased to n=150. Keeping the 10% censoring as one of the cases
gives a broad spectrum of situations, from the lower limit of what the ML estimation
procedure of the probit regression can handle to large-sample situations with many
observations in each group.

Figure 6: Histograms of 10 000 observations of log(M), generated with (a) 10% and (b) 25% of the observations censored at zero.
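The "smallest number rule" can be made concrete with a short calculation. With a fixed censoring probability the number of censored observations in a sample is binomial, so the chance of ending up with fewer than ten censored observations follows directly. The Python sketch below is illustrative only; the function name `prob_fewer_than` is hypothetical, and treating the censoring probability as constant across observations is a simplifying assumption.

```python
from math import comb


def prob_fewer_than(k: int, n: int, p: float) -> float:
    """P(count < k) when count ~ Binomial(n, p)."""
    return sum(comb(n, j) * p ** j * (1.0 - p) ** (n - j) for j in range(k))


# Chance of fewer than ten censored observations at 10% censoring.
for n in (100, 150):
    print(n, round(prob_fewer_than(10, n, 0.10), 3))
```

At n=100 a large share of the simulated samples falls below ten censored observations, which is in line with the unstable probit slope estimates reported for that sample size.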

Table 2: Slope estimates from probit regressions of M* on X for different sample sizes and amounts of censoring. The weak model is used in these Monte Carlo simulations.

M* ON   n     % of obs. in group Z=0   Population value   Average estimate   Empirical s.d.
X       100   10                       0.400              6.4461             190.0507
X       150   10                       0.400              0.4186             0.1753

5.1.3 Model 1 - Two-part M

The restrictions of parameters in our study (see Section 4.3) makes it possible to use one

model of Y as opposed to the two-group model in the derivations. A dummy variable is

used to estimate the intercept shift. No interaction effects are included in the simulations

to limit the scope. To obtain a normally distributed Mi|M > 0, an auxiliary variable Z∗1
is created as a function of X. The allocation to the M=0 group is based on a threshold τ
on Z∗1. That is, for all observations where Z∗1 < τ, Mi|M > 0 is set to missing and
the dummy variable M* is set to zero. If Z∗1 ≥ τ, M* is set to one and Mi|M∗i=1 is
set to M. In practice the dummy M* is created in the DEFINE command in Mplus, see
Section C for details. The probability of being in the M*=1 group given a value of X
can be obtained from the estimates of a probit regression. For simplicity, no additional covariate is
brought into the synthetic models. Note that the missing values of M and Y are
missing at random (MAR) by construction.

5.1.4 Model 1 - Two-part M, Weak

X ∼ N(5, 1)

Yi = 2 + 0.5M∗ + 0.25 log(M) + 0.25X + ηyi

Mi = exp(4 + 0.5X + ηmi)

Z∗1i = 0.4Xi + ηz∗i

(26)

where ηyi ∼ N(0, 1), ηmi ∼ N(0, 1) and ηz∗i ∼ N(0, 1). The coefficient of M* is the difference between the intercepts in the M*=1 and M*=0 groups; thus $\beta_0^{(2)}$ is identified, since $\beta_{0\mathrm{diff}} = \beta_0^{(2)} - \beta_0^{(1)}$ implies $\beta_0^{(2)} = \beta_{0\mathrm{diff}} + \beta_0^{(1)}$. Z∗1i is the continuous normal variable that is dichotomized into M* according to the censoring amount.
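The data generating process of Equation 26 can be sketched in Python as follows. This is a hedged illustration rather than the Mplus code used in the study. Two points are assumptions: the threshold τ is taken as the population quantile of Z∗1, and the log(M) term is assumed to contribute nothing to Y when M* = 0. The function name and the dictionary keys are hypothetical.

```python
import math
import random
from statistics import NormalDist


def simulate_weak_two_part_m(n, censoring, seed=0):
    """Draw n observations from the weak two-part M model (Equation 26)."""
    rng = random.Random(seed)
    # Assumption: tau is the population quantile of Z1* ~ N(0.4 * 5, 0.4^2 + 1).
    tau = NormalDist(mu=0.4 * 5, sigma=math.sqrt(0.4**2 + 1)).inv_cdf(censoring)
    data = []
    for _ in range(n):
        x = rng.gauss(5, 1)
        z1 = 0.4 * x + rng.gauss(0, 1)
        m_star = 1 if z1 >= tau else 0
        if m_star == 1:
            log_m = 4 + 0.5 * x + rng.gauss(0, 1)
            m = math.exp(log_m)
        else:
            # The continuous part is set to missing. Assumption: the log(M)
            # term then drops out of the equation for Y.
            log_m, m = 0.0, None
        y = 2 + 0.5 * m_star + 0.25 * log_m + 0.25 * x + rng.gauss(0, 1)
        data.append({"x": x, "m_star": m_star, "m": m, "y": y})
    return data


sample = simulate_weak_two_part_m(n=1000, censoring=0.25, seed=1)
share_zero = sum(1 for row in sample if row["m_star"] == 0) / len(sample)
print(f"share of observations with M* = 0: {share_zero:.3f}")
```

With a fixed population threshold the realized share of censored observations varies from sample to sample around the nominal censoring amount.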


5.1.5 Model 1 - Two-part M, Moderately strong

X ∼ N(5, 1)

Yi = 2 + 0.5M∗ + 0.3 log(M) + 0.3X + ηyi

Mi = exp(4 + 0.7X + ηmi)

Z∗1i = 0.8Xi + ηz∗i

(27)

Again, ηyi ∼ N(0, 1), ηmi ∼ N(0, 1) and ηz∗i ∼ N(0, 1). The coefficient of M* is the difference between the intercepts in the M*=1 and M*=0 groups; thus $\beta_0^{(2)}$ is identified, just as for the weak effects two-part M model. Z∗1i is the continuous normal variable that is dichotomized into M∗ according to the censoring amount.

5.1.6 Model 2 - Two-part M, Two-part Y

Based on the reasoning for the two-part M model (Section 5.1.3), the restrictions allow us to generate data from a one-group model of Y. The intercept shift is captured by the dummy variable M∗. Since Y is also two-part, an additional normal variable Z∗2 is generated to be dichotomized into Y* according to the censoring amount. More details about how the variables are generated in Mplus can be found in Appendix C.

5.1.7 Model 2 - Two-part M, Two-part Y, Weak

X ∼ N(5, 1)

Yi = exp(2 + 0.55M∗ + 0.25 log(M) + 0.25X + ηyi)

Z∗2i = 0.3M∗ + 0.35 log(M) + 0.32X + ηz∗2i

Mi = exp(4 + 0.5X + ηmi)

Z∗1i = 0.4Xi + ηz∗i

(28)

5.1.8 Model 2 - Two-part M, Two-part Y, Moderately strong

X ∼ N(5, 1)

Yi = exp(2 + 0.55M∗ + 0.3 log(M) + 0.2X + ηyi)

Z∗2i = 0.28M∗ + 0.84 log(M) + 0.57X + ηz∗2i

Mi = exp(4 + 0.7X + ηmi)

Z∗1i = 0.8Xi + ηz∗i

(29)


5.2 Estimation

The simulations and estimations of the models considered in this Monte Carlo study are performed with ML estimation in Mplus version 7.4 (Muthen and Muthen). Note that there are observations missing at random (MAR), since the Monte Carlo models are run with one model for Y with a dummy for group membership. The continuous part of log(M) is only observed when M∗ = 1. The missingness does not affect the estimation, except that numerical procedures that can handle it have to be chosen in Mplus. For details about the simulation study setup in Mplus, see Appendix C.

5.3 Results

To limit the scope of this study, the Monte Carlo results of the two-part mediator will be given special attention. The two-part mediator effect estimates will be compared to the corresponding estimates obtained without accounting for the censoring, referred to as the classical estimates. The classical estimates coincide with the product method estimates in the two-part M setting, since the restricted models considered in this simulation study are linear if the two-part structure of M is disregarded. The Monte Carlo results of the two-part M, two-part Y estimates will be reported in reduced form, showing only the small sample behaviour without comparison with the classical effect estimates.

5.3.1 Outcome variables

The definitions of the outcome variables presented in the Monte Carlo results (Section 5.3.2) are shown in Equation 30. The bias is negative if the average estimate is below the true effect. The ratio of the empirical standard deviation to the average standard error (SE) estimate is larger than 1 if the average SE is smaller than the empirical standard deviation.

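\begin{align}
\text{Bias} &= \text{Average estimate} - \text{True effect} \tag{30a} \\
\text{Relative bias (\%)} &= 100 \times \frac{\text{Average estimate} - \text{True effect}}{\text{True effect}} \tag{30b} \\
\text{95\% CI coverage} &= \frac{\text{Number of 95\% CIs covering true effect}}{k} \tag{30c} \\
\text{Percentage significant coefficients} &= \frac{\text{Number of 95\% CIs not covering zero}}{k} \tag{30d} \\
\text{Relative standard error} &= \frac{\text{Empirical standard deviation}}{\text{Average standard error estimate}} \tag{30e}
\end{align}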

where k is the number of iterations. A bias larger than 10% of the true effect is usually regarded as substantial. The percentage of significant effect estimates is related to the power of the estimator. However, power interpretations are only meaningful if the bias is small enough. If the estimator is close to unbiased, a significance rate above 0.8 is often regarded as strong power.
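The outcome variables of Equation 30 can be computed from the k replications as in the following Python sketch. It assumes symmetric Wald-type intervals of the form estimate ± 1.96 × SE, which may differ from the intervals reported by Mplus. The function name, the dictionary keys and the example numbers are hypothetical.

```python
from statistics import mean, stdev


def monte_carlo_summary(estimates, std_errors, true_effect, z=1.96):
    """Summarize k replications using the outcome variables of Equation 30."""
    k = len(estimates)
    avg = mean(estimates)
    bias = avg - true_effect
    # Assumption: symmetric Wald-type 95% intervals, estimate +/- z * SE.
    intervals = [(est - z * se, est + z * se)
                 for est, se in zip(estimates, std_errors)]
    covered = sum(1 for lo, hi in intervals if lo <= true_effect <= hi)
    significant = sum(1 for lo, hi in intervals if lo > 0 or hi < 0)
    return {
        "bias": bias,                                  # (30a)
        "relative_bias_pct": 100 * bias / true_effect,  # (30b)
        "ci_coverage": covered / k,                    # (30c)
        "share_significant": significant / k,          # (30d)
        "relative_se": stdev(estimates) / mean(std_errors),  # (30e)
    }


# Made-up replications, for illustration only.
summary = monte_carlo_summary(
    estimates=[0.15, 0.18, 0.21, 0.16, 0.19],
    std_errors=[0.05, 0.06, 0.05, 0.06, 0.05],
    true_effect=0.172,
)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```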

5.3.2 Two-part M

Table 3 to Table 8 show the detailed results from the Monte Carlo simulation for the

weak effects model. The outcome variables given in the tables are defined in Equation

30.

Figure 7 shows the biases of the average effect estimates for the classical and the two-part M models. Both the indirect effect (TNIE) bias and the direct effect (PNDE) bias for the weak effects model are displayed. The two-part model's average estimates of TNIE and PNDE are close to unbiased for all sample sizes, and the bias converges to zero with increasing sample size. The classical estimates consistently underestimate the TNIE and overestimate the PNDE for all sample sizes, with larger bias for larger amounts of censoring. The overestimation of the PNDE is smaller than the underestimation of the TNIE for all three censoring amounts. The pattern is similar for the moderately strong effects model, however with correspondingly larger biases. Note that the variance of Y is not 1. The bias of the classical TNIE estimates, in terms of standard deviations (sd) of Y, ranges from around -0.1 for 10% censoring to around -0.17 for 50% censoring in the weak effects model. In the moderately strong effects model the corresponding standardized biases range from around -0.14 for 10% censoring to around -0.55 for 50% censoring. In percentage terms the bias of the classical TNIE estimates ranges from -40% to -59% for the weak effects model and from -41% to -74% for the moderately strong effects model. These large biases for all classical estimates are likely due to the distributional assumptions discussed in Section 5.1.2, which are investigated further in Section 5.4. In short, the clear separation of the point mass and the continuous part of the two-part variable(s) makes the classical approach especially biased.

Figure 8 gives special attention to the bias of the two-part estimates of TNIE. The bias, as shown above, is small for all censoring amounts and converges to zero as the sample size increases. There is a drastic change in the pattern of the bias between 10% and 25% censoring on the one hand and 50% censoring on the other. The pattern of the bias, first decreasing with increased censoring amount and then increasing, is explained by the lower graphs of Figure 8. The bias of TNIE seems to be close to a linear function of the censoring amount, with zero bias at around 35% censoring. This unexpected pattern is due to the behaviour displayed in the probit slope bias graph in Figure 8. Since the sample size is small (n=100), the probit regression performs poorly due to few observations in one of the groups. When the sample size of the smaller group increases, the bias becomes smaller. However, when the zero-group becomes too large (large censoring amount), the linear regression has a smaller sample to fit and becomes less accurate. Optimal estimation behaviour for small sample sizes therefore requires a censoring amount large enough to give a good probit regression fit, but small enough to give a good linear regression fit. In other words, the censoring amount for small sample sizes is a trade-off between a good probit fit and a good linear regression fit. The censoring amount becomes less important when the sample size increases, since both regressions get a good fit even for small censoring amounts.
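The trade-off can be made concrete by looking at how many observations each regression has to work with. The Python sketch below assumes that each observation falls in the M* = 0 group independently with probability equal to the censoring amount, so that the group size is binomial. The cut-off of five observations is a hypothetical choice for illustration, not a rule from the study.

```python
from math import comb


def expected_group_sizes(n, censoring):
    """Expected number of observations in the M* = 0 and M* = 1 groups."""
    n_zero = n * censoring
    return n_zero, n - n_zero


def prob_fewer_than(n, censoring, min_count):
    """P(number of M* = 0 observations < min_count) under Binomial(n, censoring)."""
    return sum(
        comb(n, j) * censoring**j * (1 - censoring) ** (n - j)
        for j in range(min_count)
    )


for censoring in (0.10, 0.25, 0.50):
    n_zero, n_pos = expected_group_sizes(100, censoring)
    # Hypothetical cut-off: fewer than 5 observations in the zero-group.
    p_small = prob_fewer_than(100, censoring, min_count=5)
    print(f"{censoring:.0%} censoring: E[n0] = {n_zero:.0f}, "
          f"E[n1] = {n_pos:.0f}, P(n0 < 5) = {p_small:.4f}")
```

The zero-group feeds the probit regression, while the M* = 1 group is what remains for the linear regression of the continuous part.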

Figure 7: Bias of the TNIE and PNDE estimates for the weak effects model, with censoring amounts 10%, 25% and 50%.

Figure 9 shows the 95% confidence interval coverage. The percentage of confidence intervals that cover the true effect is stable around 95% for the two-part estimates and decreases with sample size for the classical estimates. The coverage for the classical estimates is in line with the bias plot in Figure 7. The classical model's point estimates are equally biased for all sample sizes, while the standard error, and therefore the width of the confidence interval, decreases with increasing sample size. The coverage rate for the confidence intervals of the classical estimates of PNDE goes to zero more slowly than for the TNIE, in line with the smaller bias of PNDE.
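Why a constant bias drives the coverage towards zero can be seen from a simple normal approximation. The Python sketch below assumes that the estimator is normally distributed around the true effect plus a fixed bias and that the interval is estimate ± 1.96 × SE; the bias and the scale constant are hypothetical numbers, not results from the simulation.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cdf


def wald_coverage(bias, se, z=1.96):
    """Coverage of estimate +/- z * se when estimate ~ N(true + bias, se^2)."""
    shift = bias / se
    return PHI(z - shift) - PHI(-z - shift)


bias = -0.07     # hypothetical constant bias
sd_unit = 0.35   # hypothetical scale, so that se = sd_unit / sqrt(n)
for n in (100, 250, 500, 1000):
    se = sd_unit / sqrt(n)
    print(f"n = {n}: coverage = {wald_coverage(bias, se):.3f}")
```

With zero bias the function returns the nominal coverage for any SE, whereas with a fixed non-zero bias the ratio of bias to SE grows with √n and the coverage falls.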

Figure 8: Bias in percentage of true effect for two-part estimates of TNIE for censored amounts of 10%, 25% and 50%. Bias of TNIE for sample size 100 as a function of censoring amount. Bias of the slope of the probit regression as a function of censoring amount.

Figure 10 shows the ratio between the empirical standard deviation and the average standard errors. There seems to be a small underestimation of the standard error for the PNDE, for both the classical and the two-part estimates. The SE of the two-part TNIE estimates is overestimated for small censoring amounts, whereas the SE of the classical estimates is underestimated. For larger censoring amounts the ratio is close to one for both the classical and the two-part estimates. Overall the average SE seems to estimate the sample variation well.

Figure 9: Percentage of 95% confidence intervals that covered the true effects value for TNIE and PNDE, with censoring amounts 10%, 25% and 50%.

Figure 11 shows the rate of classical and two-part TNIE estimates significantly different from zero. The classical estimates have a higher or equally high significance rate compared to the two-part estimates for all settings included in this study. That might seem contradictory, since the TNIE is shown to be underestimated in Figure 7. However, Figure 12 shows that the average standard error estimates of the two-part estimates are about 2.5 times as large as those of the classical estimates, which is in line with the empirical counterpart displayed in the same plot. The results from the Monte Carlo simulations of the moderately strong effects model are very similar to those of the weak effects model for all outcome variables, however with correspondingly larger differences between the performance of the classical and the two-part estimates. Because of the similar behaviour, the results of the moderately strong effects model are not displayed.
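The apparent contradiction can be checked with a rough normal approximation of the significance rate. The Python sketch below takes one population TNIE value (0.226) and the corresponding average SE at n = 100 (0.086) from the result tables, and the SE ratio of about 2.5 from the text. The classical bias of -50% is a hypothetical value chosen within the reported range, and the function name is hypothetical as well.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cdf


def wald_rejection_rate(mean_estimate, se, z=1.96):
    """P(|estimate / se| > z) when estimate ~ N(mean_estimate, se^2)."""
    ratio = mean_estimate / se
    return PHI(ratio - z) + PHI(-ratio - z)


true_effect = 0.226                 # population TNIE from the result tables
se_two_part = 0.086                 # average two-part SE at n = 100
mean_classical = 0.5 * true_effect  # hypothetical: classical bias of -50%
se_classical = se_two_part / 2.5    # SE ratio of about 2.5 from the text

print(f"two-part:  {wald_rejection_rate(true_effect, se_two_part):.3f}")
print(f"classical: {wald_rejection_rate(mean_classical, se_classical):.3f}")
```

Under these assumptions the classical estimate is halved but its SE shrinks by more than half, so its ratio of estimate to SE, and with it the significance rate, ends up higher.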


Figure 10: The empirical standard deviation divided by the average SE estimate for TNIE and PNDE, with censoring amounts 10%, 25% and 50%.

Table 3: Monte Carlo results for two-part estimated TNIE with 10% censoring.

      Population   Average           Empir. st.  Average    95% CI  % sig.  Empir. sd/
   n      effect  estimate     Bias   deviation       SE  coverage  coeff.  average SE
 100       0.172     0.165   -0.007       0.055    0.056     0.934   0.916       0.977
 150       0.172     0.168   -0.004       0.045    0.046     0.935   0.995       0.974
 200       0.172     0.169   -0.003       0.039    0.040     0.944   1.000       0.980
 250       0.172     0.171   -0.001       0.035    0.036     0.946   1.000       0.997
 300       0.172     0.171   -0.001       0.033    0.032     0.929   1.000       1.025
 400       0.172     0.171   -0.001       0.028    0.028     0.944   1.000       0.982
 500       0.172     0.171   -0.001       0.025    0.025     0.948   1.000       1.004
1000       0.172     0.171   -0.001       0.018    0.018     0.936   1.000       1.023


Figure 11: Percent of estimates significantly different from 0 with α = 0.05, for the TNIE estimate of the weak effects model.

Figure 12: Ratios of empirical standard deviation and average SE for two-part estimates and classical estimates. The results come from the weak effects model with 25% censoring.


      Population   Average           Empir. st.  Average    95% CI  % sig.  Empir. sd/
   n      effect  estimate     Bias   deviation       SE  coverage  coeff.  average SE
 100       0.226     0.222   -0.004       0.085    0.086     0.938   0.809       0.987
 150       0.226     0.223   -0.003       0.069    0.070     0.933   0.955       0.993
 200       0.226     0.224   -0.002       0.060    0.060     0.936   0.991       1.002
 250       0.226     0.226   -0.000       0.053    0.054     0.947   0.997       0.987
 300       0.226     0.226   -0.000       0.049    0.049     0.942   1.000       1.008
 400       0.226     0.224   -0.002       0.042    0.042     0.945   1.000       0.995
 500       0.226     0.224   -0.002       0.038    0.038     0.948   1.000       1.011
1000       0.226     0.225   -0.001       0.027    0.026     0.947   1.000       1.011



      Population   Average           Empir. st.  Average    95% CI  % sig.  Empir. sd/
   n      effect  estimate     Bias   deviation       SE  coverage  coeff.  average SE
 100       0.257     0.262    0.005       0.128    0.127     0.935   0.556       1.009
 150       0.257     0.260    0.003       0.099    0.101     0.949   0.803       0.977
 200       0.257     0.262    0.005       0.086    0.087     0.952   0.918       0.990
 250       0.257     0.260    0.003       0.076    0.077     0.948   0.971       0.982
 300       0.257     0.260    0.002       0.071    0.070     0.944   0.991       1.011
 400       0.257     0.257   -0.000       0.059    0.060     0.960   0.998       0.988
 500       0.257     0.257   -0.000       0.054    0.054     0.948   1.000       1.011
1000       0.257     0.255   -0.002       0.037    0.038     0.958   1.000       0.976

Table 6: Monte Carlo results for two-part estimated PNDE with 10% censoring.

      Population   Average           Empir. st.  Average    95% CI  % sig.  Empir. sd/
   n      effect  estimate     Bias   deviation       SE  coverage  coeff.  average SE
 100       0.250     0.251    0.001       0.066    0.063     0.950   0.967       1.043
 150       0.250     0.251    0.001       0.053    0.052     0.948   0.992       1.021
 200       0.250     0.251    0.001       0.047    0.045     0.945   0.999       1.040
 250       0.250     0.251    0.001       0.041    0.040     0.944   1.000       1.027
 300       0.250     0.250    0.000       0.038    0.037     0.932   1.000       1.041
 400       0.250     0.250    0.000       0.032    0.032     0.947   1.000       1.016
 500       0.250     0.250    0.000       0.028    0.028     0.945   1.000       1.004
1000       0.250     0.250    0.000       0.021    0.020     0.946   1.000       1.030



      Population   Average           Empir. st.  Average    95% CI  % sig.  Empir. sd/
   n      effect  estimate     Bias   deviation       SE  coverage  coeff.  average SE
 100       0.250     0.251    0.001       0.070    0.066     0.942   0.957       1.053
 150       0.250     0.251    0.001       0.056    0.054     0.941   0.995       1.031
 200       0.250     0.251    0.001       0.049    0.047     0.936   0.998       1.045
 250       0.250     0.250    0.000       0.044    0.042     0.942   1.000       1.048
 300       0.250     0.250   -0.000       0.040    0.038     0.935   1.000       1.050
 400       0.250     0.250   -0.000       0.034    0.033     0.950   1.000       1.012
 500       0.250     0.250   -0.000       0.030    0.030     0.948   1.000       1.017
1000       0.250     0.250    0.000       0.021    0.021     0.945   1.000       1.014




      Population   Average           Empir. st.  Average    95% CI  % sig.  Empir. sd/
   n      effect  estimate     Bias   deviation       SE  coverage  coeff.  average SE
 100       0.250     0.249   -0.001       0.075    0.073     0.944   0.913       1.034
 150       0.250     0.249   -0.001       0.060    0.059     0.943   0.978       1.007
 200       0.250     0.249   -0.001       0.052    0.051     0.947   0.994       1.021
 250       0.250     0.249   -0.000       0.047    0.046     0.954   0.998       1.022
 300       0.250     0.249   -0.001       0.043    0.042     0.943   1.000       1.029
 400       0.250     0.250   -0.000       0.036    0.036     0.947   1.000       1.006
 500       0.250     0.250    0.000       0.033    0.032     0.946   1.000       1.031
1000       0.250     0.250    0.000       0.023    0.023     0.939   1.000       1.022


5.3.3 Two-part M, Two-part Y

Table 9 to Table 14 show the detailed Monte Carlo results of the two-part M, two-part

Y estimated TNIE and PNDE for different sample sizes and censoring amounts. Note

that no classical effects are presented. The focus is on the small sample behaviour of

the two-part M, two-part Y estimated causal effects. Additionally, the reader may notice

that the numerical size of the effects is substantially larger than those of the two-part M

results. This is because the outcome variable Y for these models is assumed lognormal,

as compared to normal in the two-part M case. The standard deviation of Y is around

240 for the two-part M, two-part Y models.

Figure 13 shows the relative bias. For small sample sizes the bias is large, especially for the TNIE. For 30% censoring and sample size 100 the bias is almost 20% of the true effect. There is a steep decrease in the bias for all three censoring amounts between sample sizes 100 and 250. There is an unexpected slight bump at sample size 300; increasing the number of iterations (4000 instead of 1000) does not change this pattern. For sample sizes larger than or equal to 500 the bias is less than 5% for all censoring amounts. Note that the bias is positive, implying an overestimation of the effects.

Figure 13: Bias of the TNIE and the PNDE estimates for the weak effects model for censoring amounts 10%, 25% and 50%. The y-axis is standard deviations of Y.

Figure 14 shows the percentage of 95% confidence intervals that covered the true effect value. The coverage rate for TNIE is similar for all three censoring amounts, increasing from around 0.8 for sample size 100 to around 0.9 for sample size 1000. The PNDE follows a similar pattern, with around 0.85 for sample size 100 and reaching the desired coverage rate of 0.95 for sample size 1000. The low coverage of TNIE implies that the SE estimates, on which the confidence intervals are based, are underestimated, which is in line with Figure 15.

Figure 14: Percentage of 95% confidence intervals that covered the true effects value for TNIE and PNDE, with censoring amounts 10%, 25% and 50%.

Figure 15 shows that for small sample sizes the average SE estimates underestimate the variation of the estimator. The average SE estimates seem to be stable for sample sizes larger than or equal to 400. The three censoring amounts do not have the same clear ordering as in the bias of the two-part M model in Figure 13.

Figure 15: The empirical standard deviation divided by the average SE estimate for TNIE and PNDE, with censoring amounts 10%, 25% and 50%.

Figure 16 shows the percentage of significant effect estimates (α = 0.05). For TNIE there is a steep increase in the rate of significant effects between sample sizes 100 and 300. The significance rate of the 10% and 25% censoring estimates differs substantially from that of the 50% censoring estimates. The estimates for the 10% and 25% censored data sets are all significant for sample size 500 and above. The estimates for the 50% censored data sets have a similar significance rate for small samples but increase more slowly with sample size. The PNDE estimates show similar patterns, however with an even steeper increase. The PNDE estimates for the 10% and 25% data sets are all significant for sample sizes 400 and larger. The PNDE estimates for 50% censored data sets are significant from sample size 500. The estimates on 50% censored data have a stronger separation from 10% and 25% for PNDE than for TNIE.

Figure 16: Percent of estimates significantly different from 0 with α = 0.05, for the TNIE estimate of the weak effects model.

Table 9: Monte Carlo results for two-part, two-part estimated TNIE with 10% censoring.

      Population   Average           Empir. st.  Average    95% CI  % sig.  Empir. sd/
   n      effect  estimate     Bias   deviation       SE  coverage  coeff.  average SE
 100      36.204    43.358    7.154      56.346   43.581     0.784   0.018       1.293
 150      36.204    39.380    3.176      31.164   29.505     0.815   0.082       1.056
 200      36.204    38.471    2.267      27.061   24.339     0.845   0.241       1.112
 250      36.204    38.031    1.827      23.062   21.110     0.856   0.485       1.092
 300      36.204    37.952    1.748      20.206   19.031     0.871   0.721       1.062
 400      36.204    37.484    1.280      16.691   16.007     0.885   0.966       1.043
 500      36.204    36.910    0.706      14.503   13.924     0.897   0.999       1.042
1000      36.204    36.520    0.316       9.482    9.560     0.920   1.000       0.992

      Population   Average           Empir. st.  Average    95% CI  % sig.  Empir. sd/
   n      effect  estimate     Bias   deviation       SE  coverage  coeff.  average SE
 100      43.750    50.597    6.847      57.245   50.698     0.798   0.066       1.129
 150      43.750    47.728    3.978      40.852   35.679     0.832   0.191       1.145
 200      43.750    46.050    2.300      30.455   28.474     0.860   0.367       1.070
 250      43.750    45.378    1.628      24.426   24.372     0.870   0.551       1.002
 300      43.750    45.573    1.823      23.192   22.127     0.882   0.754       1.048
 400      43.750    45.006    1.256      18.842   18.548     0.886   0.954       1.016
 500      43.750    44.208    0.458      16.016   16.011     0.882   0.999       1.000
1000      43.750    43.735   -0.015      10.693   10.937     0.913   1.000       0.978

      Population   Average           Empir. st.  Average    95% CI  % sig.  Empir. sd/
   n      effect  estimate     Bias   deviation       SE  coverage  coeff.  average SE
 100      38.247    48.946   10.699      87.850   66.308     0.806   0.056       1.325
 150      38.247    44.714    6.467      49.950   41.745     0.819   0.147       1.197
 200      38.247    42.159    3.912      37.742   31.445     0.828   0.267       1.200
 250      38.247    40.431    2.184      26.503   25.242     0.855   0.399       1.050
 300      38.247    40.640    2.393      25.524   22.921     0.850   0.552       1.114
 400      38.247    40.022    1.775      19.564   18.838     0.874   0.785       1.039
 500      38.247    39.321    1.074      16.741   16.167     0.878   0.929       1.036
1000      38.247    38.481    0.234      10.984   10.774     0.909   1.000       1.019

In summary, the Monte Carlo results were the following. The two-part M causal effect estimates are close to unbiased for small sample sizes, whereas the classical estimates consistently underestimate the indirect effect and overestimate the direct effect for all sample sizes. The two-part M, two-part Y causal effect estimates are severely biased for small sample sizes but become close to unbiased for sample sizes above 300.


Table 12: Monte Carlo results for two-part, two-part estimated PNDE with 10% censoring.

      Population   Average           Empir. st.  Average    95% CI  % sig.  Empir. sd/
   n      effect  estimate     Bias   deviation       SE  coverage  coeff.  average SE
 100      46.296    51.230    4.934      41.351   36.387     0.862   0.118       1.136
 150      46.296    49.039    2.743      28.378   26.763     0.884   0.468       1.060
 200      46.296    47.966    1.670      23.242   22.283     0.896   0.781       1.043
 250      46.296    47.678    1.382      20.285   19.543     0.895   0.937       1.038
 300      46.296    47.948    1.652      18.919   17.814     0.906   0.988       1.062
 400      46.296    47.569    1.273      15.284   15.150     0.921   1.000       1.009
 500      46.296    46.904    0.608      13.517   13.287     0.922   1.000       1.017
1000      46.296    46.693    0.397       9.122    9.234     0.940   1.000       0.988



      Population   Average           Empir. st.  Average    95% CI  % sig.  Empir. sd/
   n      effect  estimate     Bias   deviation       SE  coverage  coeff.  average SE
 100      40.406    45.535    5.129      35.354   33.680     0.870   0.128       1.050
 150      40.406    43.949    3.543      27.824   25.283     0.889   0.383       1.101
 200      40.406    42.691    2.285      21.066   20.794     0.909   0.702       1.013
 250      40.406    42.201    1.795      18.187   18.074     0.919   0.880       1.006
 300      40.406    42.222    1.816      16.915   16.486     0.914   0.963       1.026
 400      40.406    41.948    1.542      13.473   14.046     0.932   1.000       0.959
 500      40.406    41.077    0.671      11.956   12.230     0.931   1.000       0.978
1000      40.406    40.657    0.251       8.067    8.457     0.953   1.000       0.954



      Population   Average           Empir. st.  Average    95% CI  % sig.  Empir. sd/
   n      effect  estimate     Bias   deviation       SE  coverage  coeff.  average SE
 100      23.485    30.842    7.357      35.596   32.339     0.866   0.021       1.101
 150      23.485    28.044    4.559      23.509   21.683     0.892   0.070       1.084
 200      23.485    26.609    3.124      19.123   17.006     0.893   0.215       1.124
 250      23.485    25.730    2.245      14.470   14.079     0.906   0.408       1.028
 300      23.485    25.486    2.001      13.081   12.712     0.905   0.643       1.029
 400      23.485    25.211    1.726      10.857   10.668     0.927   0.896       1.018
 500      23.485    24.537    1.052       9.684    9.205     0.909   0.985       1.052
1000      23.485    23.991    0.506       6.372    6.249     0.940   1.000       1.020


5.4 Sensitivity analysis

To gain some insight into the sensitivity to the distributional assumptions made for the continuous part of the two-part variables, a small Monte Carlo simulation is conducted. The design is simple: the intercept of M in the two-part M setting (Model 1) is gradually decreased. Since substantial parts of the normal distribution are censored, the dichotomization of Z∗1 is adjusted to obtain the same censoring amount. Figure 17 shows random samples from the different M-variables that are fitted to the two-part models. Note that the continuous part of M becomes truncated normal when the point mass starts to connect with the tail of the normal density; thus the true effects are unknown. Since the distribution to the right of zero is a truncated normal distribution, it is not possible to calculate the true bias without deriving the causal effects for the two-part model under a truncated normal distributional assumption for the continuous part of the variable. As a guiding measure, the two-part estimate under the same censoring amount, with no truncation of the continuous part, is used as the "true" effect.

Figure 18 shows the bias, presented as a percentage of the true effect. The bias of the two-part estimates in this graph is informative with regard to the robustness of the two-part estimator. The only reason that the true effects differ from the estimated effects is that the distributional assumption is violated gradually from right to left. Each point in Figure 18 corresponds to a sample of M, sampled from a population as shown in Figure 17. When the intercept of M is small enough for the point mass to connect with the density (in theory it always does, but a practical view is taken here), the difference of the two-part estimates becomes larger. The two-part estimator does not seem robust: as soon as the normal distribution is markedly truncated, the estimates differ substantially. In practice, large deviations from normal or lognormal continuous parts of two-part variables are not uncommon. The interpretation of the classical results in Figure 18 is somewhat problematic when the true value is unknown. When the normal distribution assumption for the continuous part of the two-part variables is gradually violated, the difference between the unviolated two-part estimates and the classical estimates changes direction. The difference between the classical and the unviolated two-part estimates is larger than that of the violated two-part estimates for all intercepts lower than 0.5. It is, however, difficult to interpret this in an informative way. These results should rather be viewed as an indication that the robustness of the two-part estimator can be questioned. It is also a clear indication of the need for two-part models to be implemented with continuous distributions other than the normal.

Figure 17: Random samples of 10000 observations from M generated with different intercepts. The censored amount is adjusted to 25% for all intercepts.


Figure 18: Bias of the TNIE for different intercepts of M. The sample size is 200, the censoring amount 25%. *Note that the true values are unknown and the bias is rather the difference from the two-part effect with unviolated distributional assumptions.

6 Discussion

As expected, there are gains to be made from accounting for censoring when the censoring occurs in the mediator variable. For the two-part M estimation, the classical approach consistently underestimated the TNIE in the Monte Carlo results, with larger underestimation for larger censoring amounts. Given that the purpose of most mediation analyses is to establish and measure indirect effects compared to the direct effect, the direction of the bias makes the problem particularly severe. The bias is substantial even for small effects and censoring amounts, indicating the importance of accounting for censoring also in situations where the censoring is moderate.

It is clear that even though the TNIE is underestimated by the classical estimates, the TNIE is significant for smaller sample sizes. Since the estimation of the classical effects is more parsimonious than that of the two-part effects, the average SE estimates are a lot smaller. The cost of complexity is confirmed by the two-part M, two-part Y estimates. For example, their bias is substantial for the small sample sizes, whereas the bias of the two-part M estimates is a lot smaller and not substantial for any sample size considered. The 95% coverage is lower, with a slower increase, than for two-part M. Thus, not surprisingly, larger sample sizes are needed to benefit from the two-part M, two-part Y estimates. The high significance rate of the classical estimates should also be contrasted with the rate of confidence intervals covering the true effect. When the latter is also considered, the rate of confidence intervals including the true effect and not zero is drastically decreased for the classical effects. This pattern is explained by the severe bias of the classical estimates.

It might seem that even though the two-part approach gives a less underestimated TNIE, the classical estimates are better able to establish significant effects for small sample sizes. The two-part estimation is more complex, involving the non-linear probit model, and the SE estimates are considerably higher. However, the biases of the classical estimates are so substantial that the classical estimates should not be trusted at all under moderate or heavier censoring. The behaviour of the classical approach under censoring with zero indirect effect is not investigated, but it seems reasonable to expect problems and inaccurate inference. Further analysis of the bias when there is censoring but no indirect effect is needed to draw conclusions about the behaviour of the effects under zero indirect effects.

The coverage rate of confidence intervals for the classical estimates of PNDE decreases more slowly than for TNIE. This is reasonable, since in these runs only the mediator is two-part and therefore the largest distortion in estimates could be expected for the indirect effect. The two-part M, two-part Y Monte Carlo results also confirm this, with smaller biases, higher coverage rates and higher significance rates for PNDE than for TNIE in all settings.

The large biases of the classical estimates of the causal effects in the Monte Carlo simulations are expected. The fact that the point mass of the censored dependent variable is so clearly separated from the continuous part makes the classical approach ill-suited for estimation of the model and the effects. The Monte Carlo setup might be criticized as being in favour of the two-part estimates from the beginning. Such criticism is legitimate; however, the distributional assumptions made for the censored variables in this study are among the most common ones made in practice. There are therefore at least two important questions to answer. The first, which this study set out to answer, is: Under a correctly assumed distribution of the continuous part of the censored variable, how does the two-part approach perform? The second question, only touched upon in this study, is directed to the two-part framework in general rather than mediation analysis: How sensitive is the two-part approach to misspecification of the distributional assumption(s)? If the censored variable does not look like a spike beside a proper normal distribution but rather like a spike and a truncated normal, the continuous part is never exactly normal or log-normal. The results of this study, showing gains from two-part estimated effects, indicate that it is important to establish the answer to the second question. The small sensitivity analysis indicates that misspecification of the distributional assumption influences the estimation substantially. In order to be able to claim the benefits of the two-part causal effects, the sensitivity to the assumption must be investigated in detail, and distributions other than the normal and lognormal might also need to be considered to better fit the data in practice.

7 Conclusion

In this study the counterfactual framework has been used to derive the causal effects of

a flexible mediation model accounting for censored mediator and outcome. Even though

the assumptions are strict, the simplicity of the definition allows us to define the effects

even when the functional forms become somewhat complex. Building on the limited-

dependent variable approach suggested by Cragg (1971), the two-part model is used to

account for censoring in form of floor effects at zero. The first part of this study motivated

and explained the model formulation, for which the causal effects were derived in detail.

A Monte Carlo simulation was performed to investigate the small sample properties of

these effects.

Referring to the research questions regarding model formulation and assumptions, the two-part framework, together with two-group regression, was used to account for censoring. This gives the additional assumption(s) about the distribution of the continuous part of the two-part variable. The two-part framework also relies on the implicit assumption of a data generating process where the point mass differs from the continuous part with respect to other variables in the model.

Detailed derivations of the causal effects of the two-part mediator, and the two-part

mediator, two-part outcome variable are presented in Appendix A and B.

The research questions concerning the accuracy and small sample behaviour of the derived causal effects were answered with Monte Carlo simulations. The simulations showed that there are large improvements to be made by correctly accounting for limited mediator and outcome. The simulations also showed that the flexibility of these models is not too costly in terms of estimation performance and power. The two-part M model performed well from sample sizes as small as 100. For the two-part M, two-part Y model, samples larger than 300 are suggested for well-behaved estimates. It is, however, also discussed and pointed out throughout the study that the results of the simulation study are obtained under perfect distributional assumptions. The small Monte Carlo study of the sensitivity to the normal assumption of the two-part M model indicates the need for other distributions. A natural step would be to implement the causal effects under a truncated normal assumption for the continuous part of the two-part variables. The two-part effects with a truncated normal distribution would also serve as a tool for evaluating the robustness of the normal assumption accurately. It is clear that the two-part model can give much improvement in causal effect estimation in limited-dependent variable mediation analysis. However, the two-part model itself may still benefit from some improvements.

Acknowledgements

The author would like to thank Shaobo Jin for his patient mathematical advice and suggestions, and Tihomir Asparouhov for his feedback on derivations.


References

Agresti, A. (2007). An Introduction to Categorical Data Analysis. Wiley Series in Prob-

ability and Statistics. Wiley.

Aitchison, J. and Brown, J. A. C. (1966). The Lognormal Distribution: With Special Ref-

erence to Its Uses in Economics. Cambridge. University. Dept. of Applied Economics.

Monographs, 5. University Press.

Baron, R. M. and Kenny, D. (1986). The Moderator-Mediator Variable Distinction in Social Psychological Research: Conceptual, Strategic, and Statistical Considerations. Journal of Personality and Social Psychology, 51(6):1173–1182.

Brown, E., Catalano, R., Fleming, C., Haggerty, K., and Abbott, R. (2005). Adolescent

substance use outcomes in the Raising Healthy Children Project: A two-part latent

growth curve analysis. Journal of Consulting and Clinical Psychology, 73:699.

Cohen, J. (1992). A Power Primer. Psychological Bulletin, 112(July):155–9.

Cragg, J. (1971). Some Statistical Models for Limited Dependent Variables with Application to the Demand for Durable Goods. Econometrica, 39(5):829–844.

Duan, N., Manning, W. G., Morris, C. N., and Newhouse, J. P. (1983). A Comparison of

Alternative Models for the Demand for Medical Care. Journal of Business & Economic

Statistics, 1(2):115–126.

Gill, J. (2000). Generalized Linear Models: A Unified Approach. Quantitative Applications in the Social Sciences. SAGE Publications.

Greene, W. H. (2012). Econometric Analysis. The Pearson series in economics. Pearson.

Hayes, A. F. (2013). Introduction to Mediation, Moderation, and Conditional Process

Analysis: A Regression-Based Approach. Methodology in the Social Sciences Series.

Guilford Press.

Imai, K., Keele, L., and Tingley, D. (2010). A general approach to causal mediation

analysis. Psychological Methods, 15(4):309–334.


Imbens, G. W. and Angrist, J. D. (1994). Identification and Estimation of Local Average Treatment Effects. Econometrica, 62(2):467–475.

James, L. R. and Brett, J. M. (1984). Mediators, moderators, and tests for mediation.

Journal of Applied Psychology, 69(2):307–321.

Jones, A. M. (1989). A double-hurdle model of cigarette consumption. 4(August 1988):23–

39.

Judd, M. C. and Kenny, D. (1981). Estimating Mediation in Treatment Evaluations.

Evaluation Review, 5(5):602–619.

Keele, L. (2015). Causal Mediation Analysis: Warning! Assumptions Ahead. American

Journal of Evaluation, pages 1–14.

Muthen, B. (1979). Probit Model With Latent Variables. 74(368):807–811.

Muthen, B., Muthen, L., and Asparouhov, T. (2016). Regression and Mediation Analysis

Using Mplus. Muthen & Muthen, Los Angeles.

Muthen, L. and Muthen, B. Mplus User’s Guide. Seventh Edition. Muthen & Muthen,

Los Angeles.

Pearl, J. (1995). Causal diagrams for empirical research. Biometrika, 82(4):669–688.

Pearl, J. (2001). Direct versus Total Effects. Proceedings of the Seventeenth Conference

on Uncertainty in Artificial Intelligence, (1992):411–420.

Robins, J. M. and Greenland, S. (1992). Identifiability and exchangeability for direct and

indirect effects. Epidemiology, pages 143–155.

Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5):688–701.

Rucker, D. D., Preacher, K. J., Tormala, Z. L., and Petty, R. E. (2011). Mediation

Analysis in Social Psychology: Current Practices and New Recommendations. Social

and Personality Psychology Compass, 5(6):359–371.


Spirtes, P., Glymour, C., and Scheines, R. (1993). Causation, Prediction, and Search.

Technometrics, 45(3):272–273.

Tobin, J. (1958). Estimation of Relationships for Limited Dependent Variables. Econometrica, 26(1):24–36.

VanderWeele, T. (2015). Explanation in Causal Inference: Methods for Mediation and

Interaction. Oxford University Press, Incorporated.

Vanderweele, T. J. (2012). Mediation analysis with multiple versions of the mediator.

Epidemiology (Cambridge, Mass.), 23(3):454–63.

VanderWeele, T. J. and Vansteelandt, S. (2010). Odds Ratios for Mediation Analysis for

a Dichotomous Outcome. American Journal of Epidemiology, 172(12):1339–1348.

Wang, W. and Albert, J. M. (2012). Estimation of mediation effects for zero-inflated

regression models. Statistics in Medicine, 31(26):3118–3132.

Wooldridge, J. M. (2002). Econometric Analysis of Cross Section and Panel Data. MIT Press.


A Appendix - Derivation - Two-part M

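Models

$$
\begin{aligned}
Y_i &= \beta_0^{(1)} + \beta_1 \log(M_i) + \beta_2^{(1)} X_i + \beta_3 \log(M_i) X_i + \beta_4^{(1)} C_i + \varepsilon_{y_i}, && M > 0 \\
Y_i &= \beta_0^{(2)} + \beta_2^{(2)} X_i + \beta_4^{(2)} C_i + \varepsilon_{y_i}, && M = 0 \\
\log(M_i \mid M > 0) &= \gamma_0 + \gamma_1 X_i + \gamma_2 C_i + \varepsilon_{m_i} \\
\operatorname{probit}(\Pr(M_i > 0)) &= \kappa_0 + \kappa_1 X_i + \kappa_2 C_i
\end{aligned}
\tag{31}
$$

where by assumption $\varepsilon_{y_i} \sim N(0, \sigma_y^2)$ and $\varepsilon_{m_i} \sim N(0, \sigma_m^2)$. This model is a combination of a two-part model for M and a two-group model for Y.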

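The expected values needed to define the causal effects are of the kind in Equation 32. Since the expression is a sum, the terms can be calculated separately.

$$
\begin{aligned}
E[Y(x_1, \log(M(x_0)))] ={}& E[Y \mid X = x_1, M = 0, C = c] \times P(M = 0 \mid X = x_0, C = c) \\
&+ \int_{-\infty}^{\infty} E[Y \mid X = x_1, M > 0, C = c] \times P(M > 0 \mid X = x_0, C = c) \\
&\qquad \times f(\log(M) \mid M > 0, X = x_0, C = c) \, \partial \log(M)
\end{aligned}
\tag{32}
$$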

First term

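The first term involves no integration, which gives it the simple form

$$
E[Y \mid X = x_1, M = 0, C = c] \times P(M = 0 \mid X = x_0, C = c) = (\beta_0^{(2)} + \beta_2^{(2)} x_1 + \beta_4^{(2)} c) \times \big(1 - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)\big)
\tag{33}
$$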


Second term

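In the second term the randomness of M must be accounted for, since the conditioning is not on a fixed value of M.

$$
\begin{aligned}
&\int_{-\infty}^{\infty} E[Y \mid X = x_1, M > 0, C = c] \times P(M > 0 \mid X = x_0, C = c) \times \underbrace{f(\log(M) \mid M > 0, X = x_0, C = c)}_{\text{Normal density by assumption}} \, \partial \log(M) \\
&= \int_{-\infty}^{\infty} \big(\beta_0^{(1)} + \beta_1 \log(M) + \beta_2^{(1)} x_1 + \beta_3 \log(M) x_1 + \beta_4^{(1)} c\big) \times \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \times f(\log(M); \gamma_0 + \gamma_1 x_0 + \gamma_2 c, \sigma_m^2) \, \partial \log(M) \\
&= \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \times \underbrace{\int_{-\infty}^{\infty} \big(\beta_0^{(1)} + \beta_2^{(1)} x_1 + \beta_4^{(1)} c + \log(M)(\beta_1 + \beta_3 x_1)\big) \times f(\log(M); \gamma_0 + \gamma_1 x_0 + \gamma_2 c, \sigma_m^2) \, \partial \log(M)}_{\text{Part 1}}
\end{aligned}
\tag{34}
$$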

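Looking only at the integral, using the sum rule of integration, and for simplicity letting $\mu_m = \gamma_0 + \gamma_1 x_0 + \gamma_2 c$:

$$
\begin{aligned}
\text{Part 1} &= \int_{-\infty}^{\infty} (\beta_0^{(1)} + \beta_2^{(1)} x_1 + \beta_4^{(1)} c) \times f(\log(M); \mu_m, \sigma_m^2) \, \partial \log(M) + \int_{-\infty}^{\infty} \log(M)(\beta_1 + \beta_3 x_1) \times f(\log(M); \mu_m, \sigma_m^2) \, \partial \log(M) \\
&= (\beta_0^{(1)} + \beta_2^{(1)} x_1 + \beta_4^{(1)} c) \times \underbrace{\int_{-\infty}^{\infty} f(\log(M); \mu_m, \sigma_m^2) \, \partial \log(M)}_{=1} + (\beta_1 + \beta_3 x_1) \times \underbrace{\int_{-\infty}^{\infty} \log(M) \times f(\log(M); \mu_m, \sigma_m^2) \, \partial \log(M)}_{=E[\log(M)]=\mu_m} \\
&= (\beta_0^{(1)} + \beta_2^{(1)} x_1 + \beta_4^{(1)} c) + (\beta_1 + \beta_3 x_1)\mu_m
\end{aligned}
\tag{35}
$$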

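Combining this with the parts outside the integral gives

$$
\begin{aligned}
&\int_{-\infty}^{\infty} E[Y \mid X = x_1, M > 0, C = c] \times P(M > 0 \mid X = x_0, C = c) \times f(\log(M) \mid M > 0, X = x_0, C = c) \, \partial \log(M) \\
&= \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \times \big((\beta_0^{(1)} + \beta_2^{(1)} x_1 + \beta_4^{(1)} c) + (\beta_1 + \beta_3 x_1)\mu_m\big)
\end{aligned}
\tag{36}
$$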


Full expression

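Adding the first and the second term, the full expression is given by

$$
\begin{aligned}
E[Y(x_1, \log(M(x_0)))] ={}& (\beta_0^{(2)} + \beta_2^{(2)} x_1 + \beta_4^{(2)} c) \times \big(1 - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)\big) \\
&+ \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \times \big((\beta_0^{(1)} + \beta_2^{(1)} x_1 + \beta_4^{(1)} c) + (\beta_1 + \beta_3 x_1)(\gamma_0 + \gamma_1 x_0 + \gamma_2 c)\big) \\
={}& (\beta_0^{(2)} + \beta_2^{(2)} x_1 + \beta_4^{(2)} c) - (\beta_0^{(2)} + \beta_2^{(2)} x_1 + \beta_4^{(2)} c) \times \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \\
&+ \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \times (\beta_0^{(1)} + \beta_2^{(1)} x_1 + \beta_4^{(1)} c) \\
&+ \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \times (\beta_1 + \beta_3 x_1) \times (\gamma_0 + \gamma_1 x_0 + \gamma_2 c)
\end{aligned}
\tag{37}
$$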

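At this point $\beta_2^{(1)} = \beta_2^{(2)}$ and $\beta_4^{(1)} = \beta_4^{(2)}$ are assumed, indicated by the dropped superscripts below. If this is not desired, the above expression can be used to define the effects.

$$
\begin{aligned}
E[Y(x_1, \log(M(x_0)))] ={}& \beta_0^{(2)} + (\beta_0^{(1)} - \beta_0^{(2)}) \times \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \\
&+ (\beta_2 x_1 + \beta_4 c) \times \big(1 - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) + \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)\big) \\
&+ \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \times (\beta_1 + \beta_3 x_1) \times (\gamma_0 + \gamma_1 x_0 + \gamma_2 c) \\
={}& \beta_0^{(2)} + (\beta_0^{(1)} - \beta_0^{(2)}) \times \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) + (\beta_2 x_1 + \beta_4 c) \\
&+ \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \times (\beta_1 + \beta_3 x_1) \times (\gamma_0 + \gamma_1 x_0 + \gamma_2 c)
\end{aligned}
\tag{38}
$$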

From the expression it can be seen that the parts that depend on M are multiplied by the probability of M being larger than 0, Φ(κ0 + κ1x0 + κ2c). This also means that if Pr(M > 0) is close to one, the effects are approximately the usual effects without a two-part M.
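As a short worked illustration of this limiting case (a sketch, not part of the original derivation), setting $\Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) = 1$ in Equation 38 makes the two intercept terms collapse to $\beta_0^{(1)}$:

$$
E[Y(x_1, \log(M(x_0)))] = \beta_0^{(1)} + \beta_2 x_1 + \beta_4 c + (\beta_1 + \beta_3 x_1)(\gamma_0 + \gamma_1 x_0 + \gamma_2 c)
$$

This is the expected potential outcome of a linear mediation model with a fully observed continuous mediator $\log(M)$ and an exposure-mediator interaction, so the two-part effects coincide with the usual ones in that limit.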


Effects

For convenience, some compact notation is used for defining the counterfactual-based effects. E[Y(x1, log(M(x0)))] is to be understood as the expected value of Y given that Y is conditioned on x1 and log(M) is conditioned on x0.



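$$
\begin{aligned}
\mathrm{TNIE} ={}& E[Y(x_1, \log(M(x_1)))] - E[Y(x_1, \log(M(x_0)))] \\
={}& \beta_0^{(2)} + (\beta_0^{(1)} - \beta_0^{(2)}) \times \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c) + (\beta_2 x_1 + \beta_4 c) + \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c) \times (\beta_1 + \beta_3 x_1) \times (\gamma_0 + \gamma_1 x_1 + \gamma_2 c) \\
&- \beta_0^{(2)} - (\beta_0^{(1)} - \beta_0^{(2)}) \times \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) - (\beta_2 x_1 + \beta_4 c) - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \times (\beta_1 + \beta_3 x_1) \times (\gamma_0 + \gamma_1 x_0 + \gamma_2 c) \\
={}& (\beta_0^{(1)} - \beta_0^{(2)}) \big(\Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c) - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)\big) \\
&+ (\beta_1 + \beta_3 x_1) \big(\Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c) \times (\gamma_0 + \gamma_1 x_1 + \gamma_2 c) - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \times (\gamma_0 + \gamma_1 x_0 + \gamma_2 c)\big)
\end{aligned}
\tag{39}
$$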



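$$
\begin{aligned}
\mathrm{PNDE} ={}& E[Y(x_1, \log(M(x_0)))] - E[Y(x_0, \log(M(x_0)))] \\
={}& \beta_0^{(2)} + (\beta_0^{(1)} - \beta_0^{(2)}) \times \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) + (\beta_2 x_1 + \beta_4 c) + \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \times (\beta_1 + \beta_3 x_1) \times (\gamma_0 + \gamma_1 x_0 + \gamma_2 c) \\
&- \beta_0^{(2)} - (\beta_0^{(1)} - \beta_0^{(2)}) \times \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) - (\beta_2 x_0 + \beta_4 c) - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \times (\beta_1 + \beta_3 x_0) \times (\gamma_0 + \gamma_1 x_0 + \gamma_2 c) \\
={}& \beta_2 \times (x_1 - x_0) + \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \times (\gamma_0 + \gamma_1 x_0 + \gamma_2 c) \times \beta_3 \times (x_1 - x_0)
\end{aligned}
\tag{40}
$$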




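$$
\begin{aligned}
\mathrm{PNIE} ={}& E[Y(x_0, \log(M(x_1)))] - E[Y(x_0, \log(M(x_0)))] \\
={}& \beta_0^{(2)} + (\beta_0^{(1)} - \beta_0^{(2)}) \times \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c) + (\beta_2 x_0 + \beta_4 c) + \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c) \times (\beta_1 + \beta_3 x_0) \times (\gamma_0 + \gamma_1 x_1 + \gamma_2 c) \\
&- \beta_0^{(2)} - (\beta_0^{(1)} - \beta_0^{(2)}) \times \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) - (\beta_2 x_0 + \beta_4 c) - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \times (\beta_1 + \beta_3 x_0) \times (\gamma_0 + \gamma_1 x_0 + \gamma_2 c) \\
={}& (\beta_0^{(1)} - \beta_0^{(2)}) \times \big(\Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c) - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)\big) \\
&+ (\beta_1 + \beta_3 x_0) \times \big(\Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c) \times (\gamma_0 + \gamma_1 x_1 + \gamma_2 c) - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \times (\gamma_0 + \gamma_1 x_0 + \gamma_2 c)\big)
\end{aligned}
\tag{41}
$$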



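$$
\begin{aligned}
\mathrm{TNDE} ={}& E[Y(x_1, \log(M(x_1)))] - E[Y(x_0, \log(M(x_1)))] \\
={}& \beta_0^{(2)} + (\beta_0^{(1)} - \beta_0^{(2)}) \times \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c) + (\beta_2 x_1 + \beta_4 c) + \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c) \times (\beta_1 + \beta_3 x_1) \times (\gamma_0 + \gamma_1 x_1 + \gamma_2 c) \\
&- \beta_0^{(2)} - (\beta_0^{(1)} - \beta_0^{(2)}) \times \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c) - (\beta_2 x_0 + \beta_4 c) - \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c) \times (\beta_1 + \beta_3 x_0) \times (\gamma_0 + \gamma_1 x_1 + \gamma_2 c) \\
={}& \beta_2 \times (x_1 - x_0) + \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c) \times (\gamma_0 + \gamma_1 x_1 + \gamma_2 c) \times \beta_3 \times (x_1 - x_0)
\end{aligned}
\tag{42}
$$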


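The Total effect

$$
\begin{aligned}
\mathrm{TE} ={}& E[Y(x_1, \log(M(x_1)))] - E[Y(x_0, \log(M(x_0)))] \\
={}& \beta_0^{(2)} + (\beta_0^{(1)} - \beta_0^{(2)}) \times \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c) + (\beta_2 x_1 + \beta_4 c) + \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c) \times (\beta_1 + \beta_3 x_1) \times (\gamma_0 + \gamma_1 x_1 + \gamma_2 c) \\
&- \Big[\beta_0^{(2)} + (\beta_0^{(1)} - \beta_0^{(2)}) \times \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) + (\beta_2 x_0 + \beta_4 c) + \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \times (\beta_1 + \beta_3 x_0) \times (\gamma_0 + \gamma_1 x_0 + \gamma_2 c)\Big] \\
={}& (\beta_0^{(1)} - \beta_0^{(2)}) \times \big(\Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c) - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)\big) + \beta_2 \times (x_1 - x_0) \\
&+ \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c) \times (\beta_1 + \beta_3 x_1) \times (\gamma_0 + \gamma_1 x_1 + \gamma_2 c) - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \times (\beta_1 + \beta_3 x_0) \times (\gamma_0 + \gamma_1 x_0 + \gamma_2 c)
\end{aligned}
\tag{43}
$$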

for which it holds that

$$
\mathrm{TE} = \mathrm{TNIE} + \mathrm{PNDE}
\tag{44}
$$
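The two-part M effects of this appendix can be checked numerically. The following is a minimal Python sketch of Equation 38 and the effect definitions, not the code used in the thesis. The function and dictionary names are hypothetical; the parameter values follow the labels in the two-part M Mplus syntax of Appendix C (bet01 = 2 and bet02 = 2.5 are taken as β0(1) and β0(2), and κ0 is written as an intercept, i.e. the negative of the Mplus threshold), with no covariate.

```python
from math import erf, sqrt


def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def expected_y(x_y, x_m, p, c=0.0):
    """E[Y(x_y, log(M(x_m)))] from Equation 38 (common slopes beta2, beta4)."""
    prob_pos = Phi(p["kappa0"] + p["kappa1"] * x_m + p["kappa2"] * c)
    mu_m = p["gamma0"] + p["gamma1"] * x_m + p["gamma2"] * c
    return (p["beta0_2"]
            + (p["beta0_1"] - p["beta0_2"]) * prob_pos
            + p["beta2"] * x_y + p["beta4"] * c
            + prob_pos * (p["beta1"] + p["beta3"] * x_y) * mu_m)


def effects(x0, x1, p, c=0.0):
    """Counterfactual effects built from the four expected values."""
    e11 = expected_y(x1, x1, p, c)
    e10 = expected_y(x1, x0, p, c)
    e01 = expected_y(x0, x1, p, c)
    e00 = expected_y(x0, x0, p, c)
    return {"TNIE": e11 - e10, "PNDE": e10 - e00,
            "PNIE": e01 - e00, "TNDE": e11 - e01, "TE": e11 - e00}


# Assumed values, read off the Appendix C syntax (hypothetical mapping of labels).
params = dict(beta0_1=2.0, beta0_2=2.5, beta1=0.25, beta2=0.25, beta3=0.0,
              beta4=0.0, gamma0=4.0, gamma1=0.5, gamma2=0.0,
              kappa0=-1.273552, kappa1=0.4, kappa2=0.0)
result = effects(5.0, 6.0, params)
```

Under these assumptions the decomposition TE = TNIE + PNDE of Equation 44 holds by construction, and the TNIE, PNDE and TE values should agree closely with the starting values listed in the MODEL CONSTRAINT block of the two-part M syntax.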

B Appendix - Derivation - Two-part M, two-part Y

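Models

$$
\begin{aligned}
\log(Y_i \mid Y_i > 0) &= \beta_0^{(1)} + \beta_1 \log(M_i) + \beta_2^{(1)} X_i + \beta_3 \log(M_i) X_i + \beta_4^{(1)} C_i + \varepsilon_{y_i}, && M > 0 \\
\operatorname{probit}(\Pr(Y_i > 0)) &= \theta_0^{(1)} + \theta_1 \log(M_i) + \theta_2^{(1)} X_i + \theta_3 \log(M_i) X_i + \theta_4^{(1)} C_i, && M > 0 \\
\log(Y_i \mid Y_i > 0) &= \beta_0^{(2)} + \beta_2^{(2)} X_i + \beta_4^{(2)} C_i + \varepsilon_{y_i}, && M = 0 \\
\operatorname{probit}(\Pr(Y_i > 0)) &= \theta_0^{(2)} + \theta_2^{(2)} X_i + \theta_4^{(2)} C_i, && M = 0 \\
\log(M_i \mid M_i > 0) &= \gamma_0 + \gamma_1 X_i + \gamma_2 C_i + \varepsilon_{m_i} \\
\operatorname{probit}(\Pr(M_i > 0)) &= \kappa_0 + \kappa_1 X_i + \kappa_2 C_i
\end{aligned}
\tag{45}
$$

where by assumption $\varepsilon_{y_i} \sim N(0, \sigma_y^2)$ and $\varepsilon_{m_i} \sim N(0, \sigma_m^2)$. This is a combination of a two-part model for M and a two-part model for Y, together with a two-group model for Y.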

The expected values needed to define the causal effects are of the kind displayed in Equation 46. The terms where Y = 0 are not brought into the expression since they are zero regardless of the corresponding probability. Since the expected value is a sum, the terms can be calculated separately.

$$
\begin{aligned}
E[Y(x_1, \log(M(x_0)))] ={}& P(M = 0 \mid X = x_0, C = c) \times P(Y > 0 \mid X = x_1, M = 0, C = c) \times E[Y \mid Y > 0, X = x_1, M = 0, C = c] \\
&+ \int_{-\infty}^{\infty} P(M > 0 \mid X = x_0, C = c) \times P(Y > 0 \mid X = x_1, M = m, C = c) \\
&\qquad \times f(\log(M) \mid M > 0, X = x_0, C = c) \times E[Y \mid Y > 0, X = x_1, M > 0, C = c] \, \partial \log(M)
\end{aligned}
\tag{46}
$$

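First term

$$
\begin{aligned}
I &= E[Y \mid Y > 0, X = x_1, M = 0, C = c] \times P(M = 0 \mid X = x_0, C = c) \times P(Y > 0 \mid X = x_1, M = 0, C = c) \\
&= \varphi \times \exp\big(\beta_0^{(2)} + \beta_2^{(2)} x_1 + \beta_4^{(2)} c\big) \times \Phi\big(\theta_0^{(2)} + \theta_2^{(2)} x_1 + \theta_4^{(2)} c\big) \times \big(1 - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)\big)
\end{aligned}
\tag{47}
$$

where $\varphi = \exp\big(\sigma_{\varepsilon_y}^2 / 2\big)$ from the definition of the expected value of a lognormal distribution (Aitchison and Brown, 1966).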


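Second term

$$
\begin{aligned}
II ={}& \int_{-\infty}^{\infty} E[Y \mid Y > 0, X = x_1, M = m, C = c] \times P(Y > 0 \mid X = x_1, M = m, C = c) \\
&\qquad \times P(M > 0 \mid X = x_0, C = c) \times \underbrace{f(\log(M) \mid M > 0, X = x_0, C = c)}_{\text{Normal density by assumption}} \, \partial \log(M) \\
={}& \int_{-\infty}^{\infty} \varphi \times \exp\big(\beta_0^{(1)} + \beta_1 \log(m) + \beta_2^{(1)} x_1 + \beta_3 \log(m) x_1 + \beta_4^{(1)} c\big) \\
&\qquad \times \Phi\big(\theta_0^{(1)} + \theta_1 \log(m) + \theta_2^{(1)} x_1 + \theta_3 \log(m) x_1 + \theta_4^{(1)} c\big) \times \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \\
&\qquad \times f\big(\log(M); \underbrace{\gamma_0 + \gamma_1 x_0 + \gamma_2 c}_{=\mu_M}, \sigma_M^2\big) \, \partial \log(M) \\
={}& \varphi \times \exp\big(\beta_0^{(1)} + \beta_2^{(1)} x_1 + \beta_4^{(1)} c\big) \times \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \\
&\times \int_{-\infty}^{\infty} \underbrace{\exp\big(\beta_1 \log(m) + \beta_3 \log(m) x_1\big) \times f(\log(M); \mu_M, \sigma_M^2)}_{\text{Part 1}} \times \Phi\big(\theta_0^{(1)} + \theta_1 \log(m) + \theta_2^{(1)} x_1 + \theta_3 \log(m) x_1 + \theta_4^{(1)} c\big) \, \partial \log(M)
\end{aligned}
\tag{48}
$$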


Again $\varphi = \exp\big(\sigma_{\varepsilon_y}^2 / 2\big)$ (Aitchison and Brown, 1966). Note that $\mu_M$ and $\sigma_M^2$ are short for $\mu_{\log(M)}$ and $\sigma_{\log(M)}^2$ respectively. A closer look at Part 1, obtained by expanding the density function of log(M), gives

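$$
\begin{aligned}
\text{Part 1} &= \exp\big(\log(m)(\beta_1 + \beta_3 x_1)\big) \times (2\pi\sigma_M^2)^{-\frac{1}{2}} \exp\Big(-\frac{1}{2\sigma_M^2}(\log(m) - \mu_M)^2\Big) \\
&= (2\pi\sigma_M^2)^{-\frac{1}{2}} \times \exp\Big(-\frac{1}{2\sigma_M^2}(\log(m) - \mu_M)^2 + \log(m)(\beta_1 + \beta_3 x_1)\Big) \\
&= (2\pi\sigma_M^2)^{-\frac{1}{2}} \times \exp\Big(-\frac{\log(m)^2}{2\sigma_M^2} + \frac{2\log(m)\mu_M}{2\sigma_M^2} - \frac{\mu_M^2}{2\sigma_M^2} + \log(m)(\beta_1 + \beta_3 x_1)\frac{2\sigma_M^2\mu_M}{2\sigma_M^2\mu_M}\Big) \\
&= (2\pi\sigma_M^2)^{-\frac{1}{2}} \times \exp\Big(-\frac{\log(m)^2}{2\sigma_M^2} + \frac{2\log(m)\mu_M}{2\sigma_M^2}\Big(1 + \frac{(\beta_1 + \beta_3 x_1)\sigma_M^2}{\mu_M}\Big) - \frac{\mu_M^2}{2\sigma_M^2}\Big) \\
&= (2\pi\sigma_M^2)^{-\frac{1}{2}} \times \exp\Big(-\frac{\log(m)^2}{2\sigma_M^2} + \frac{2\log(m)\mu_M}{2\sigma_M^2}\Big(1 + \frac{(\beta_1 + \beta_3 x_1)\sigma_M^2}{\mu_M}\Big) - \frac{\mu_M^2}{2\sigma_M^2} \\
&\qquad\qquad - \frac{\mu_M^2}{2\sigma_M^2}\Big(1 + \frac{(\beta_1 + \beta_3 x_1)\sigma_M^2}{\mu_M}\Big)^2 + \frac{\mu_M^2}{2\sigma_M^2}\Big(1 + \frac{(\beta_1 + \beta_3 x_1)\sigma_M^2}{\mu_M}\Big)^2\Big)
\end{aligned}
\tag{49}
$$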

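Set $b = \Big(1 + \dfrac{(\beta_1 + \beta_3 x_1)\sigma_M^2}{\mu_M}\Big)$; this gives

$$
\text{Part 1} = \exp\Big(\frac{\mu_M^2}{2\sigma_M^2}(b^2 - 1)\Big) \times \underbrace{(2\pi\sigma_M^2)^{-\frac{1}{2}} \times \exp\Big(-\frac{1}{2\sigma_M^2}(\log(m) - b\mu_M)^2\Big)}_{\text{Normal density}}
\tag{50}
$$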
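The completion of the square in Equations 49 and 50 can be sanity-checked numerically. The sketch below is illustrative only: the helper names and the parameter values are assumptions, and a plain trapezoid rule stands in for exact integration. Integrating Part 1 over log(M) should return the constant factor exp(μ²(b² − 1)/(2σ²)), because the remaining factor is a normal density that integrates to one.

```python
from math import exp, pi, sqrt


def normal_pdf(u, mean, var):
    """Density of N(mean, var) evaluated at u."""
    return exp(-(u - mean) ** 2 / (2.0 * var)) / sqrt(2.0 * pi * var)


def trapezoid(func, lower, upper, steps=20000):
    """Composite trapezoid rule on [lower, upper]."""
    h = (upper - lower) / steps
    total = 0.5 * (func(lower) + func(upper))
    for k in range(1, steps):
        total += func(lower + k * h)
    return total * h


# Hypothetical illustrative values (mu_m must be non-zero because b divides by it).
mu_m, var_m = 6.5, 1.0
slope = 0.25                      # plays the role of beta1 + beta3 * x1
b = 1.0 + slope * var_m / mu_m    # b as defined before Equation 50

# Left-hand side: integral of Part 1 over u = log(M).
lhs = trapezoid(lambda u: exp(slope * u) * normal_pdf(u, mu_m, var_m),
                mu_m - 12.0, mu_m + 12.0)
# Right-hand side: the constant pulled out in Equation 50.
rhs = exp(mu_m ** 2 / (2.0 * var_m) * (b ** 2 - 1.0))
```

The two numbers agree up to the integration error, which is the familiar statement that the constant equals the moment generating function of a normal variable evaluated at the slope.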

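Part 1 is inserted in II

$$
\begin{aligned}
II ={}& \varphi \times \exp\big(\beta_0^{(1)} + \beta_2^{(1)} x_1 + \beta_4^{(1)} c\big) \times \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \times \exp\Big(\frac{\mu_M^2}{2\sigma_M^2}(b^2 - 1)\Big) \\
&\times \underbrace{\int_{-\infty}^{\infty} \Phi\big(\theta_0^{(1)} + \theta_1 \log(m) + \theta_2^{(1)} x_1 + \theta_3 \log(m) x_1 + \theta_4^{(1)} c\big) \times f(\log(M); b\mu_M, \sigma_M^2) \, \partial \log(M)}_{\text{Part 2}}
\end{aligned}
\tag{51}
$$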


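Part 2 is expanded

$$
\begin{aligned}
\text{Part 2} &= \int_{-\infty}^{\infty} \Phi\big(\theta_0^{(1)} + \theta_1 \log(m) + \theta_2^{(1)} x_1 + \theta_3 \log(m) x_1 + \theta_4^{(1)} c\big) \times f(\log(M); b\mu_M, \sigma_M^2) \, \partial \log(M) \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\theta_0^{(1)} + \theta_1 \log(m) + \theta_2^{(1)} x_1 + \theta_3 \log(m) x_1 + \theta_4^{(1)} c} f(Z; 0, 1) \, \partial Z \times f(\log(M); b\mu_M, \sigma_M^2) \, \partial \log(M) \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\theta_0^{(1)} + \theta_2^{(1)} x_1 + \theta_4^{(1)} c + \log(m)(\theta_1 + \theta_3 x_1)} f(Z; 0, 1) \, \partial Z \times f(\log(M); b\mu_M, \sigma_M^2) \, \partial \log(M)
\end{aligned}
\tag{52}
$$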

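By integral transformations

$$
\text{Part 2} = \int_{-\infty}^{\infty} \int_{-\infty}^{\theta_0^{(1)} + \theta_2^{(1)} x_1 + \theta_4^{(1)} c} f\big(Z \mid \log(M); -\log(m)(\theta_1 + \theta_3 x_1), 1\big) \, \partial Z \times f\big(\log(M); b\mu_M, \sigma_M^2\big) \, \partial \log(M)
\tag{53}
$$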

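Since the support of the integral is no longer a function of M or Z, the order of the integrals may be changed

$$
\text{Part 2} = \int_{-\infty}^{\theta_0^{(1)} + \theta_2^{(1)} x_1 + \theta_4^{(1)} c} \int_{-\infty}^{\infty} f\big(Z \mid \log(M); -\log(m)(\theta_1 + \theta_3 x_1), 1\big) \times f\big(\log(M); b\mu_M, \sigma_M^2\big) \, \partial \log(M) \, \partial Z
\tag{54}
$$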

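By the appendix in Muthen (1979) the inner integral can be simplified further

$$
\text{Part 2} = \int_{-\infty}^{\theta_0^{(1)} + \theta_2^{(1)} x_1 + \theta_4^{(1)} c} f\Big(Z; \underbrace{-(\theta_1 + \theta_3 x_1) b\mu_M}_{\text{Mean}}, \underbrace{(\theta_1 + \theta_3 x_1)^2 \sigma_M^2 + 1}_{\text{Variance}}\Big) \, \partial Z
\tag{55}
$$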

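Again, integral transformations give

$$
\text{Part 2} = \int_{-\infty}^{\frac{\theta_0^{(1)} + \theta_2^{(1)} x_1 + \theta_4^{(1)} c + (\theta_1 + \theta_3 x_1) b\mu_M}{\sqrt{(\theta_1 + \theta_3 x_1)^2 \sigma_M^2 + 1}}} f(Z; 0, 1) \, \partial Z = \Phi\Bigg(\frac{\theta_0^{(1)} + \theta_2^{(1)} x_1 + \theta_4^{(1)} c + (\theta_1 + \theta_3 x_1) b\mu_M}{\sqrt{(\theta_1 + \theta_3 x_1)^2 \sigma_M^2 + 1}}\Bigg)
\tag{56}
$$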
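The closed form in Equation 56 is an instance of the identity E[Φ(α + δU)] = Φ((α + δm)/√(1 + δ²s²)) for U ∼ N(m, s²). The following sketch checks it numerically; the helper names and the numbers are hypothetical and chosen only for illustration, with α standing for θ0(1) + θ2(1)x1 + θ4(1)c, δ for θ1 + θ3x1, and m for bμM.

```python
from math import erf, exp, pi, sqrt


def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def normal_pdf(u, mean, var):
    """Density of N(mean, var) evaluated at u."""
    return exp(-(u - mean) ** 2 / (2.0 * var)) / sqrt(2.0 * pi * var)


def trapezoid(func, lower, upper, steps=20000):
    """Composite trapezoid rule on [lower, upper]."""
    h = (upper - lower) / steps
    total = 0.5 * (func(lower) + func(upper))
    for k in range(1, steps):
        total += func(lower + k * h)
    return total * h


# Hypothetical illustrative values.
alpha = -1.4      # theta0 + theta2 * x1 + theta4 * c
delta = 0.35      # theta1 + theta3 * x1
mean_u = 6.75     # b * mu_M
var_u = 1.0       # sigma_M squared

# Part 2 as the integral in Equation 52 ...
part2_integral = trapezoid(
    lambda u: Phi(alpha + delta * u) * normal_pdf(u, mean_u, var_u),
    mean_u - 12.0 * sqrt(var_u), mean_u + 12.0 * sqrt(var_u))
# ... and as the closed form in Equation 56.
part2_closed = Phi((alpha + delta * mean_u) / sqrt(delta ** 2 * var_u + 1.0))
```

The design choice here is to integrate over log(M) directly rather than to replicate the change of integration order, so the check is independent of the steps in Equations 53 to 55.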


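Part 2 is inserted into II

$$
\begin{aligned}
II ={}& \varphi \times \exp\big(\beta_0^{(1)} + \beta_2^{(1)} x_1 + \beta_4^{(1)} c\big) \times \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \times \exp\Big(\frac{\mu_M^2}{2\sigma_M^2}(b^2 - 1)\Big) \\
&\times \Phi\Bigg(\frac{\theta_0^{(1)} + \theta_2^{(1)} x_1 + \theta_4^{(1)} c + (\theta_1 + \theta_3 x_1) b\mu_M}{\sqrt{(\theta_1 + \theta_3 x_1)^2 \sigma_M^2 + 1}}\Bigg) \\
={}& \varphi \times \exp\Big(\beta_0^{(1)} + \beta_2^{(1)} x_1 + \beta_4^{(1)} c + \frac{\mu_M^2}{2\sigma_M^2}(b^2 - 1)\Big) \times \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \\
&\times \Phi\Bigg(\frac{\theta_0^{(1)} + \theta_2^{(1)} x_1 + \theta_4^{(1)} c + (\theta_1 + \theta_3 x_1) b\mu_M}{\sqrt{(\theta_1 + \theta_3 x_1)^2 \sigma_M^2 + 1}}\Bigg)
\end{aligned}
\tag{57}
$$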

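Adding I and II together yields

$$
\begin{aligned}
E[Y(x_1, \log(M(x_0)))] ={}& \varphi \times \exp\big(\beta_0^{(2)} + \beta_2^{(2)} x_1 + \beta_4^{(2)} c\big) \times \Phi\big(\theta_0^{(2)} + \theta_2^{(2)} x_1 + \theta_4^{(2)} c\big) \times \big(1 - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)\big) \\
&+ \varphi \times \exp\Big(\beta_0^{(1)} + \beta_2^{(1)} x_1 + \beta_4^{(1)} c + \frac{\mu_M^2}{2\sigma_M^2}(b^2 - 1)\Big) \times \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \\
&\quad \times \Phi\Bigg(\frac{\theta_0^{(1)} + \theta_2^{(1)} x_1 + \theta_4^{(1)} c + (\theta_1 + \theta_3 x_1) b\mu_M}{\sqrt{(\theta_1 + \theta_3 x_1)^2 \sigma_M^2 + 1}}\Bigg)
\end{aligned}
\tag{58}
$$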


Common parameters

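Set $\beta_2^{(1)} = \beta_2^{(2)}$ and $\beta_4^{(1)} = \beta_4^{(2)}$. That is, the slopes of Y on X and on the covariate C are assumed to be the same for both groups of M. If that is not desired, the expression above can be used to calculate the effects. This restriction gives the simplified expression

$$
\begin{aligned}
E[Y(x_1, \log(M(x_0)))] ={}& \varphi \times \exp(\beta_2 x_1 + \beta_4 c) \times \Bigg( \exp\big(\beta_0^{(2)}\big) \, \Phi\big(\theta_0^{(2)} + \theta_2^{(2)} x_1 + \theta_4^{(2)} c\big) \times \big(1 - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)\big) \\
&+ \exp\Big(\beta_0^{(1)} + \frac{\mu_M^2}{2\sigma_M^2}(b^2 - 1)\Big) \times \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \times \Phi\Bigg(\frac{\theta_0^{(1)} + \theta_2^{(1)} x_1 + \theta_4^{(1)} c + (\theta_1 + \theta_3 x_1) b\mu_M}{\sqrt{(\theta_1 + \theta_3 x_1)^2 \sigma_M^2 + 1}}\Bigg) \Bigg)
\end{aligned}
\tag{59}
$$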

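Similar restrictions are made in this derivation to the coefficients of the probit model, that is, $\theta_2^{(1)} = \theta_2^{(2)}$ and $\theta_4^{(1)} = \theta_4^{(2)}$. Again, if these restrictions are not desired, the above expression can be used to calculate the effects. These restrictions give

$$
\begin{aligned}
E[Y(x_1, \log(M(x_0)))] ={}& \varphi \times \exp(\beta_2 x_1 + \beta_4 c) \times \Bigg( \exp\big(\beta_0^{(2)}\big) \, \Phi\big(\theta_0^{(2)} + \theta_2 x_1 + \theta_4 c\big) \times \big(1 - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)\big) \\
&+ \exp\Big(\beta_0^{(1)} + \frac{\mu_M^2}{2\sigma_M^2}(b^2 - 1)\Big) \times \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \times \Phi\Bigg(\frac{\theta_0^{(1)} + \theta_2 x_1 + \theta_4 c + (\theta_1 + \theta_3 x_1) b\mu_M}{\sqrt{(\theta_1 + \theta_3 x_1)^2 \sigma_M^2 + 1}}\Bigg) \Bigg)
\end{aligned}
\tag{60}
$$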

Although not as easy to see as in the two-part M derivation, this again reduces to the usual counterfactual-based effects when Pr(M > 0) and Pr(Y > 0) get close to 1.
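The limiting behaviour can be illustrated with a small numerical sketch of Equation 60. This is not the thesis code: the function name, the dictionary keys and all parameter values are hypothetical, and the variances of the error terms are passed as `var_m` and `var_y`. When the probit intercepts are made very large, both probabilities equal one in floating point and the expression should collapse to the mean of a lognormal outcome with a fully observed mediator.

```python
from math import erf, exp, sqrt


def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def expected_y(x_y, x_m, p, c=0.0):
    """E[Y(x_y, log(M(x_m)))] following Equation 60 (mu_m must be non-zero)."""
    mu_m = p["gamma0"] + p["gamma1"] * x_m + p["gamma2"] * c
    slope_b = p["beta1"] + p["beta3"] * x_y
    slope_t = p["theta1"] + p["theta3"] * x_y
    b = 1.0 + slope_b * p["var_m"] / mu_m
    prob_m_pos = Phi(p["kappa0"] + p["kappa1"] * x_m + p["kappa2"] * c)
    zero_part = (exp(p["beta0_2"])
                 * Phi(p["theta0_2"] + p["theta2"] * x_y + p["theta4"] * c)
                 * (1.0 - prob_m_pos))
    pos_part = (exp(p["beta0_1"] + mu_m ** 2 / (2.0 * p["var_m"]) * (b ** 2 - 1.0))
                * prob_m_pos
                * Phi((p["theta0_1"] + p["theta2"] * x_y + p["theta4"] * c
                       + slope_t * b * mu_m)
                      / sqrt(slope_t ** 2 * p["var_m"] + 1.0)))
    lognormal_factor = exp(p["var_y"] / 2.0)  # the constant phi in the text
    return lognormal_factor * exp(p["beta2"] * x_y + p["beta4"] * c) * (zero_part + pos_part)


# Hypothetical parameter values for illustration only.
p = dict(beta0_1=2.0, beta0_2=1.5, beta1=0.25, beta2=0.25, beta3=0.05, beta4=0.0,
         theta0_1=-3.0, theta0_2=-3.3, theta1=0.35, theta2=0.32, theta3=0.0,
         theta4=0.0, gamma0=4.0, gamma1=0.5, gamma2=0.0,
         kappa0=-1.27, kappa1=0.4, kappa2=0.0, var_m=1.0, var_y=0.3)
x0, x1 = 5.0, 6.0
e11, e10 = expected_y(x1, x1, p), expected_y(x1, x0, p)
e01, e00 = expected_y(x0, x1, p), expected_y(x0, x0, p)
tnie, pnde, pnie, tnde, te = e11 - e10, e10 - e00, e01 - e00, e11 - e01, e11 - e00

# Limit: Pr(M > 0) and Pr(Y > 0) equal to one.
p_limit = dict(p, kappa0=40.0, theta0_1=40.0, theta0_2=40.0)
slope = p["beta1"] + p["beta3"] * x1
mu0 = p["gamma0"] + p["gamma1"] * x0
usual = exp(p["var_y"] / 2.0 + p["beta0_1"] + p["beta2"] * x1
            + slope * mu0 + slope ** 2 * p["var_m"] / 2.0)
limit_value = expected_y(x1, x0, p_limit)
```

Here `usual` is the mean of exp(β0(1) + β2x1 + (β1 + β3x1)log(M) + εy) with log(M) normal, which is the expression one would use without any two-part structure.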


The conditional expectations

There are four conditional expected values needed to define all causal effects. The one derived above was chosen because Y and M are conditioned on different x-values, which makes the generalization easier at the next step.


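$$
\begin{aligned}
E[Y(x_0, \log(M(x_0)))] ={}& \varphi \times \exp(\beta_2 x_0 + \beta_4 c) \times \Bigg( \exp\big(\beta_0^{(2)}\big) \times \Phi\big(\theta_0^{(2)} + \theta_2 x_0 + \theta_4 c\big) \times \big(1 - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)\big) \\
&+ \exp\Big(\beta_0^{(1)} + \frac{\mu_M^2}{2\sigma_M^2}(b^2 - 1)\Big) \times \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \times \Phi\Bigg(\frac{\theta_0^{(1)} + \theta_2 x_0 + \theta_4 c + (\theta_1 + \theta_3 x_0) b\mu_M}{\sqrt{(\theta_1 + \theta_3 x_0)^2 \sigma_M^2 + 1}}\Bigg) \Bigg)
\end{aligned}
\tag{61}
$$

$$
\begin{aligned}
E[Y(x_1, \log(M(x_1)))] ={}& \varphi \times \exp(\beta_2 x_1 + \beta_4 c) \times \Bigg( \exp\big(\beta_0^{(2)}\big) \times \Phi\big(\theta_0^{(2)} + \theta_2 x_1 + \theta_4 c\big) \times \big(1 - \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c)\big) \\
&+ \exp\Big(\beta_0^{(1)} + \frac{\mu_M^2}{2\sigma_M^2}(b^2 - 1)\Big) \times \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c) \times \Phi\Bigg(\frac{\theta_0^{(1)} + \theta_2 x_1 + \theta_4 c + (\theta_1 + \theta_3 x_1) b\mu_M}{\sqrt{(\theta_1 + \theta_3 x_1)^2 \sigma_M^2 + 1}}\Bigg) \Bigg)
\end{aligned}
\tag{62}
$$

$$
\begin{aligned}
E[Y(x_0, \log(M(x_1)))] ={}& \varphi \times \exp(\beta_2 x_0 + \beta_4 c) \times \Bigg( \exp\big(\beta_0^{(2)}\big) \times \Phi\big(\theta_0^{(2)} + \theta_2 x_0 + \theta_4 c\big) \times \big(1 - \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c)\big) \\
&+ \exp\Big(\beta_0^{(1)} + \frac{\mu_M^2}{2\sigma_M^2}(b^2 - 1)\Big) \times \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c) \times \Phi\Bigg(\frac{\theta_0^{(1)} + \theta_2 x_0 + \theta_4 c + (\theta_1 + \theta_3 x_0) b\mu_M}{\sqrt{(\theta_1 + \theta_3 x_0)^2 \sigma_M^2 + 1}}\Bigg) \Bigg)
\end{aligned}
\tag{63}
$$

$$
\begin{aligned}
E[Y(x_1, \log(M(x_0)))] ={}& \varphi \times \exp(\beta_2 x_1 + \beta_4 c) \times \Bigg( \exp\big(\beta_0^{(2)}\big) \times \Phi\big(\theta_0^{(2)} + \theta_2 x_1 + \theta_4 c\big) \times \big(1 - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)\big) \\
&+ \exp\Big(\beta_0^{(1)} + \frac{\mu_M^2}{2\sigma_M^2}(b^2 - 1)\Big) \times \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \times \Phi\Bigg(\frac{\theta_0^{(1)} + \theta_2 x_1 + \theta_4 c + (\theta_1 + \theta_3 x_1) b\mu_M}{\sqrt{(\theta_1 + \theta_3 x_1)^2 \sigma_M^2 + 1}}\Bigg) \Bigg)
\end{aligned}
\tag{64}
$$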


Causal effects

For convenience, some notation is set up for defining the counterfactual-based effects. E[Y(x1, log(M(x0)))] is to be understood as the expected value of Y given that Y is conditioned on x1 and log(M) is conditioned on x0.

At the end of each expression, b and µ are substituted with their full expressions. Note that both are functions of X and thus differ from case to case.

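$$
\begin{aligned}
(b \mid x_1, M(x_0)) &= 1 + \frac{(\beta_1 + \beta_3 x_1)\sigma_M^2}{\gamma_0 + \gamma_1 x_0 + \gamma_2 c} \\
(b \mid x_1, M(x_1)) &= 1 + \frac{(\beta_1 + \beta_3 x_1)\sigma_M^2}{\gamma_0 + \gamma_1 x_1 + \gamma_2 c} \\
(b \mid x_0, M(x_0)) &= 1 + \frac{(\beta_1 + \beta_3 x_0)\sigma_M^2}{\gamma_0 + \gamma_1 x_0 + \gamma_2 c} \\
(b \mid x_0, M(x_1)) &= 1 + \frac{(\beta_1 + \beta_3 x_0)\sigma_M^2}{\gamma_0 + \gamma_1 x_1 + \gamma_2 c}
\end{aligned}
\tag{65}
$$

$$
\begin{aligned}
(\mu_M \mid M(x_1)) &= \gamma_0 + \gamma_1 x_1 + \gamma_2 c \\
(\mu_M \mid M(x_0)) &= \gamma_0 + \gamma_1 x_0 + \gamma_2 c
\end{aligned}
\tag{66}
$$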





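$$
\begin{aligned}
\mathrm{TNIE} ={}& E[Y(x_1, \log(M(x_1))) \mid C = c] - E[Y(x_1, \log(M(x_0))) \mid C = c] \\
={}& \varphi \exp(\beta_2 x_1 + \beta_4 c) \Bigg( \exp\big(\beta_0^{(2)}\big) \Phi\big(\theta_0^{(2)} + \theta_2 x_1 + \theta_4 c\big) \big(1 - \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c)\big) \\
&\quad + \exp\Big(\beta_0^{(1)} + \frac{\mu_M^2}{2\sigma_M^2}(b^2 - 1)\Big) \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c) \, \Phi\Bigg(\frac{\theta_0^{(1)} + \theta_2 x_1 + \theta_4 c + (\theta_1 + \theta_3 x_1) b\mu_M}{\sqrt{(\theta_1 + \theta_3 x_1)^2 \sigma_M^2 + 1}}\Bigg) \Bigg) \\
&- \varphi \exp(\beta_2 x_1 + \beta_4 c) \Bigg( \exp\big(\beta_0^{(2)}\big) \Phi\big(\theta_0^{(2)} + \theta_2 x_1 + \theta_4 c\big) \big(1 - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)\big) \\
&\quad + \exp\Big(\beta_0^{(1)} + \frac{\mu_M^2}{2\sigma_M^2}(b^2 - 1)\Big) \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \, \Phi\Bigg(\frac{\theta_0^{(1)} + \theta_2 x_1 + \theta_4 c + (\theta_1 + \theta_3 x_1) b\mu_M}{\sqrt{(\theta_1 + \theta_3 x_1)^2 \sigma_M^2 + 1}}\Bigg) \Bigg) \\
={}& \varphi \exp(\beta_2 x_1 + \beta_4 c) \Bigg( \exp\big(\beta_0^{(2)}\big) \Phi\big(\theta_0^{(2)} + \theta_2 x_1 + \theta_4 c\big) \big(1 - \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c)\big) \\
&\quad + \exp\Bigg(\beta_0^{(1)} + \frac{(\gamma_0 + \gamma_1 x_1 + \gamma_2 c)^2}{2\sigma_M^2}\Bigg(\Big(1 + \frac{(\beta_1 + \beta_3 x_1)\sigma_M^2}{\gamma_0 + \gamma_1 x_1 + \gamma_2 c}\Big)^2 - 1\Bigg)\Bigg) \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c) \\
&\quad \times \Phi\Bigg(\frac{\theta_0^{(1)} + \theta_2 x_1 + \theta_4 c + (\theta_1 + \theta_3 x_1)\Big(1 + \frac{(\beta_1 + \beta_3 x_1)\sigma_M^2}{\gamma_0 + \gamma_1 x_1 + \gamma_2 c}\Big)(\gamma_0 + \gamma_1 x_1 + \gamma_2 c)}{\sqrt{(\theta_1 + \theta_3 x_1)^2 \sigma_M^2 + 1}}\Bigg) \Bigg) \\
&- \varphi \exp(\beta_2 x_1 + \beta_4 c) \Bigg( \exp\big(\beta_0^{(2)}\big) \Phi\big(\theta_0^{(2)} + \theta_2 x_1 + \theta_4 c\big) \big(1 - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)\big) \\
&\quad + \exp\Bigg(\beta_0^{(1)} + \frac{(\gamma_0 + \gamma_1 x_0 + \gamma_2 c)^2}{2\sigma_M^2}\Bigg(\Big(1 + \frac{(\beta_1 + \beta_3 x_1)\sigma_M^2}{\gamma_0 + \gamma_1 x_0 + \gamma_2 c}\Big)^2 - 1\Bigg)\Bigg) \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \\
&\quad \times \Phi\Bigg(\frac{\theta_0^{(1)} + \theta_2 x_1 + \theta_4 c + (\theta_1 + \theta_3 x_1)\Big(1 + \frac{(\beta_1 + \beta_3 x_1)\sigma_M^2}{\gamma_0 + \gamma_1 x_0 + \gamma_2 c}\Big)(\gamma_0 + \gamma_1 x_0 + \gamma_2 c)}{\sqrt{(\theta_1 + \theta_3 x_1)^2 \sigma_M^2 + 1}}\Bigg) \Bigg)
\end{aligned}
\tag{67}
$$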





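$$
\begin{aligned}
\mathrm{PNDE} ={}& E[Y(x_1, \log(M(x_0))) \mid C = c] - E[Y(x_0, \log(M(x_0))) \mid C = c] \\
={}& \varphi \exp(\beta_2 x_1 + \beta_4 c) \Bigg( \exp\big(\beta_0^{(2)}\big) \Phi\big(\theta_0^{(2)} + \theta_2 x_1 + \theta_4 c\big) \big(1 - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)\big) \\
&\quad + \exp\Big(\beta_0^{(1)} + \frac{\mu_M^2}{2\sigma_M^2}(b^2 - 1)\Big) \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \, \Phi\Bigg(\frac{\theta_0^{(1)} + \theta_2 x_1 + \theta_4 c + (\theta_1 + \theta_3 x_1) b\mu_M}{\sqrt{(\theta_1 + \theta_3 x_1)^2 \sigma_M^2 + 1}}\Bigg) \Bigg) \\
&- \varphi \exp(\beta_2 x_0 + \beta_4 c) \Bigg( \exp\big(\beta_0^{(2)}\big) \Phi\big(\theta_0^{(2)} + \theta_2 x_0 + \theta_4 c\big) \big(1 - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)\big) \\
&\quad + \exp\Big(\beta_0^{(1)} + \frac{\mu_M^2}{2\sigma_M^2}(b^2 - 1)\Big) \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \, \Phi\Bigg(\frac{\theta_0^{(1)} + \theta_2 x_0 + \theta_4 c + (\theta_1 + \theta_3 x_0) b\mu_M}{\sqrt{(\theta_1 + \theta_3 x_0)^2 \sigma_M^2 + 1}}\Bigg) \Bigg) \\
={}& \varphi \exp(\beta_2 x_1 + \beta_4 c) \Bigg( \exp\big(\beta_0^{(2)}\big) \Phi\big(\theta_0^{(2)} + \theta_2 x_1 + \theta_4 c\big) \big(1 - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)\big) \\
&\quad + \exp\Bigg(\beta_0^{(1)} + \frac{(\gamma_0 + \gamma_1 x_0 + \gamma_2 c)^2}{2\sigma_M^2}\Bigg(\Big(1 + \frac{(\beta_1 + \beta_3 x_1)\sigma_M^2}{\gamma_0 + \gamma_1 x_0 + \gamma_2 c}\Big)^2 - 1\Bigg)\Bigg) \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \\
&\quad \times \Phi\Bigg(\frac{\theta_0^{(1)} + \theta_2 x_1 + \theta_4 c + (\theta_1 + \theta_3 x_1)\Big(1 + \frac{(\beta_1 + \beta_3 x_1)\sigma_M^2}{\gamma_0 + \gamma_1 x_0 + \gamma_2 c}\Big)(\gamma_0 + \gamma_1 x_0 + \gamma_2 c)}{\sqrt{(\theta_1 + \theta_3 x_1)^2 \sigma_M^2 + 1}}\Bigg) \Bigg) \\
&- \varphi \exp(\beta_2 x_0 + \beta_4 c) \Bigg( \exp\big(\beta_0^{(2)}\big) \Phi\big(\theta_0^{(2)} + \theta_2 x_0 + \theta_4 c\big) \big(1 - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)\big) \\
&\quad + \exp\Bigg(\beta_0^{(1)} + \frac{(\gamma_0 + \gamma_1 x_0 + \gamma_2 c)^2}{2\sigma_M^2}\Bigg(\Big(1 + \frac{(\beta_1 + \beta_3 x_0)\sigma_M^2}{\gamma_0 + \gamma_1 x_0 + \gamma_2 c}\Big)^2 - 1\Bigg)\Bigg) \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \\
&\quad \times \Phi\Bigg(\frac{\theta_0^{(1)} + \theta_2 x_0 + \theta_4 c + (\theta_1 + \theta_3 x_0)\Big(1 + \frac{(\beta_1 + \beta_3 x_0)\sigma_M^2}{\gamma_0 + \gamma_1 x_0 + \gamma_2 c}\Big)(\gamma_0 + \gamma_1 x_0 + \gamma_2 c)}{\sqrt{(\theta_1 + \theta_3 x_0)^2 \sigma_M^2 + 1}}\Bigg) \Bigg)
\end{aligned}
\tag{68}
$$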





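$$
\begin{aligned}
\mathrm{PNIE} ={}& E[Y(x_0, \log(M(x_1))) \mid C = c] - E[Y(x_0, \log(M(x_0))) \mid C = c] \\
={}& \varphi \exp(\beta_2 x_0 + \beta_4 c) \Bigg( \exp\big(\beta_0^{(2)}\big) \Phi\big(\theta_0^{(2)} + \theta_2 x_0 + \theta_4 c\big) \big(1 - \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c)\big) \\
&\quad + \exp\Big(\beta_0^{(1)} + \frac{\mu_M^2}{2\sigma_M^2}(b^2 - 1)\Big) \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c) \, \Phi\Bigg(\frac{\theta_0^{(1)} + \theta_2 x_0 + \theta_4 c + (\theta_1 + \theta_3 x_0) b\mu_M}{\sqrt{(\theta_1 + \theta_3 x_0)^2 \sigma_M^2 + 1}}\Bigg) \Bigg) \\
&- \varphi \exp(\beta_2 x_0 + \beta_4 c) \Bigg( \exp\big(\beta_0^{(2)}\big) \Phi\big(\theta_0^{(2)} + \theta_2 x_0 + \theta_4 c\big) \big(1 - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)\big) \\
&\quad + \exp\Big(\beta_0^{(1)} + \frac{\mu_M^2}{2\sigma_M^2}(b^2 - 1)\Big) \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \, \Phi\Bigg(\frac{\theta_0^{(1)} + \theta_2 x_0 + \theta_4 c + (\theta_1 + \theta_3 x_0) b\mu_M}{\sqrt{(\theta_1 + \theta_3 x_0)^2 \sigma_M^2 + 1}}\Bigg) \Bigg) \\
={}& \varphi \exp(\beta_2 x_0 + \beta_4 c) \Bigg( \exp\big(\beta_0^{(2)}\big) \Phi\big(\theta_0^{(2)} + \theta_2 x_0 + \theta_4 c\big) \big(1 - \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c)\big) \\
&\quad + \exp\Bigg(\beta_0^{(1)} + \frac{(\gamma_0 + \gamma_1 x_1 + \gamma_2 c)^2}{2\sigma_M^2}\Bigg(\Big(1 + \frac{(\beta_1 + \beta_3 x_0)\sigma_M^2}{\gamma_0 + \gamma_1 x_1 + \gamma_2 c}\Big)^2 - 1\Bigg)\Bigg) \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c) \\
&\quad \times \Phi\Bigg(\frac{\theta_0^{(1)} + \theta_2 x_0 + \theta_4 c + (\theta_1 + \theta_3 x_0)\Big(1 + \frac{(\beta_1 + \beta_3 x_0)\sigma_M^2}{\gamma_0 + \gamma_1 x_1 + \gamma_2 c}\Big)(\gamma_0 + \gamma_1 x_1 + \gamma_2 c)}{\sqrt{(\theta_1 + \theta_3 x_0)^2 \sigma_M^2 + 1}}\Bigg) \Bigg) \\
&- \varphi \exp(\beta_2 x_0 + \beta_4 c) \Bigg( \exp\big(\beta_0^{(2)}\big) \Phi\big(\theta_0^{(2)} + \theta_2 x_0 + \theta_4 c\big) \big(1 - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)\big) \\
&\quad + \exp\Bigg(\beta_0^{(1)} + \frac{(\gamma_0 + \gamma_1 x_0 + \gamma_2 c)^2}{2\sigma_M^2}\Bigg(\Big(1 + \frac{(\beta_1 + \beta_3 x_0)\sigma_M^2}{\gamma_0 + \gamma_1 x_0 + \gamma_2 c}\Big)^2 - 1\Bigg)\Bigg) \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \\
&\quad \times \Phi\Bigg(\frac{\theta_0^{(1)} + \theta_2 x_0 + \theta_4 c + (\theta_1 + \theta_3 x_0)\Big(1 + \frac{(\beta_1 + \beta_3 x_0)\sigma_M^2}{\gamma_0 + \gamma_1 x_0 + \gamma_2 c}\Big)(\gamma_0 + \gamma_1 x_0 + \gamma_2 c)}{\sqrt{(\theta_1 + \theta_3 x_0)^2 \sigma_M^2 + 1}}\Bigg) \Bigg)
\end{aligned}
\tag{69}
$$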





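$$
\begin{aligned}
\mathrm{TNDE} ={}& E[Y(x_1, \log(M(x_1))) \mid C = c] - E[Y(x_0, \log(M(x_1))) \mid C = c] \\
={}& \varphi \exp(\beta_2 x_1 + \beta_4 c) \Bigg( \exp\big(\beta_0^{(2)}\big) \Phi\big(\theta_0^{(2)} + \theta_2 x_1 + \theta_4 c\big) \big(1 - \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c)\big) \\
&\quad + \exp\Big(\beta_0^{(1)} + \frac{\mu_M^2}{2\sigma_M^2}(b^2 - 1)\Big) \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c) \, \Phi\Bigg(\frac{\theta_0^{(1)} + \theta_2 x_1 + \theta_4 c + (\theta_1 + \theta_3 x_1) b\mu_M}{\sqrt{(\theta_1 + \theta_3 x_1)^2 \sigma_M^2 + 1}}\Bigg) \Bigg) \\
&- \varphi \exp(\beta_2 x_0 + \beta_4 c) \Bigg( \exp\big(\beta_0^{(2)}\big) \Phi\big(\theta_0^{(2)} + \theta_2 x_0 + \theta_4 c\big) \big(1 - \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c)\big) \\
&\quad + \exp\Big(\beta_0^{(1)} + \frac{\mu_M^2}{2\sigma_M^2}(b^2 - 1)\Big) \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c) \, \Phi\Bigg(\frac{\theta_0^{(1)} + \theta_2 x_0 + \theta_4 c + (\theta_1 + \theta_3 x_0) b\mu_M}{\sqrt{(\theta_1 + \theta_3 x_0)^2 \sigma_M^2 + 1}}\Bigg) \Bigg) \\
={}& \varphi \exp(\beta_2 x_1 + \beta_4 c) \Bigg( \exp\big(\beta_0^{(2)}\big) \Phi\big(\theta_0^{(2)} + \theta_2 x_1 + \theta_4 c\big) \big(1 - \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c)\big) \\
&\quad + \exp\Bigg(\beta_0^{(1)} + \frac{(\gamma_0 + \gamma_1 x_1 + \gamma_2 c)^2}{2\sigma_M^2}\Bigg(\Big(1 + \frac{(\beta_1 + \beta_3 x_1)\sigma_M^2}{\gamma_0 + \gamma_1 x_1 + \gamma_2 c}\Big)^2 - 1\Bigg)\Bigg) \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c) \\
&\quad \times \Phi\Bigg(\frac{\theta_0^{(1)} + \theta_2 x_1 + \theta_4 c + (\theta_1 + \theta_3 x_1)\Big(1 + \frac{(\beta_1 + \beta_3 x_1)\sigma_M^2}{\gamma_0 + \gamma_1 x_1 + \gamma_2 c}\Big)(\gamma_0 + \gamma_1 x_1 + \gamma_2 c)}{\sqrt{(\theta_1 + \theta_3 x_1)^2 \sigma_M^2 + 1}}\Bigg) \Bigg) \\
&- \varphi \exp(\beta_2 x_0 + \beta_4 c) \Bigg( \exp\big(\beta_0^{(2)}\big) \Phi\big(\theta_0^{(2)} + \theta_2 x_0 + \theta_4 c\big) \big(1 - \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c)\big) \\
&\quad + \exp\Bigg(\beta_0^{(1)} + \frac{(\gamma_0 + \gamma_1 x_1 + \gamma_2 c)^2}{2\sigma_M^2}\Bigg(\Big(1 + \frac{(\beta_1 + \beta_3 x_0)\sigma_M^2}{\gamma_0 + \gamma_1 x_1 + \gamma_2 c}\Big)^2 - 1\Bigg)\Bigg) \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c) \\
&\quad \times \Phi\Bigg(\frac{\theta_0^{(1)} + \theta_2 x_0 + \theta_4 c + (\theta_1 + \theta_3 x_0)\Big(1 + \frac{(\beta_1 + \beta_3 x_0)\sigma_M^2}{\gamma_0 + \gamma_1 x_1 + \gamma_2 c}\Big)(\gamma_0 + \gamma_1 x_1 + \gamma_2 c)}{\sqrt{(\theta_1 + \theta_3 x_0)^2 \sigma_M^2 + 1}}\Bigg) \Bigg)
\end{aligned}
\tag{70}
$$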


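The Total effect

$$
\begin{aligned}
\mathrm{TE} ={}& E[Y(x_1, \log(M(x_1))) \mid C = c] - E[Y(x_0, \log(M(x_0))) \mid C = c] \\
={}& \varphi \exp(\beta_2 x_1 + \beta_4 c) \Bigg( \exp\big(\beta_0^{(2)}\big) \Phi\big(\theta_0^{(2)} + \theta_2 x_1 + \theta_4 c\big) \big(1 - \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c)\big) \\
&\quad + \exp\Big(\beta_0^{(1)} + \frac{\mu_M^2}{2\sigma_M^2}(b^2 - 1)\Big) \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c) \, \Phi\Bigg(\frac{\theta_0^{(1)} + \theta_2 x_1 + \theta_4 c + (\theta_1 + \theta_3 x_1) b\mu_M}{\sqrt{(\theta_1 + \theta_3 x_1)^2 \sigma_M^2 + 1}}\Bigg) \Bigg) \\
&- \varphi \exp(\beta_2 x_0 + \beta_4 c) \Bigg( \exp\big(\beta_0^{(2)}\big) \Phi\big(\theta_0^{(2)} + \theta_2 x_0 + \theta_4 c\big) \big(1 - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)\big) \\
&\quad + \exp\Big(\beta_0^{(1)} + \frac{\mu_M^2}{2\sigma_M^2}(b^2 - 1)\Big) \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \, \Phi\Bigg(\frac{\theta_0^{(1)} + \theta_2 x_0 + \theta_4 c + (\theta_1 + \theta_3 x_0) b\mu_M}{\sqrt{(\theta_1 + \theta_3 x_0)^2 \sigma_M^2 + 1}}\Bigg) \Bigg) \\
={}& \varphi \exp(\beta_2 x_1 + \beta_4 c) \Bigg( \exp\big(\beta_0^{(2)}\big) \Phi\big(\theta_0^{(2)} + \theta_2 x_1 + \theta_4 c\big) \big(1 - \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c)\big) \\
&\quad + \exp\Bigg(\beta_0^{(1)} + \frac{(\gamma_0 + \gamma_1 x_1 + \gamma_2 c)^2}{2\sigma_M^2}\Bigg(\Big(1 + \frac{(\beta_1 + \beta_3 x_1)\sigma_M^2}{\gamma_0 + \gamma_1 x_1 + \gamma_2 c}\Big)^2 - 1\Bigg)\Bigg) \Phi(\kappa_0 + \kappa_1 x_1 + \kappa_2 c) \\
&\quad \times \Phi\Bigg(\frac{\theta_0^{(1)} + \theta_2 x_1 + \theta_4 c + (\theta_1 + \theta_3 x_1)\Big(1 + \frac{(\beta_1 + \beta_3 x_1)\sigma_M^2}{\gamma_0 + \gamma_1 x_1 + \gamma_2 c}\Big)(\gamma_0 + \gamma_1 x_1 + \gamma_2 c)}{\sqrt{(\theta_1 + \theta_3 x_1)^2 \sigma_M^2 + 1}}\Bigg) \Bigg) \\
&- \varphi \exp(\beta_2 x_0 + \beta_4 c) \Bigg( \exp\big(\beta_0^{(2)}\big) \Phi\big(\theta_0^{(2)} + \theta_2 x_0 + \theta_4 c\big) \big(1 - \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c)\big) \\
&\quad + \exp\Bigg(\beta_0^{(1)} + \frac{(\gamma_0 + \gamma_1 x_0 + \gamma_2 c)^2}{2\sigma_M^2}\Bigg(\Big(1 + \frac{(\beta_1 + \beta_3 x_0)\sigma_M^2}{\gamma_0 + \gamma_1 x_0 + \gamma_2 c}\Big)^2 - 1\Bigg)\Bigg) \Phi(\kappa_0 + \kappa_1 x_0 + \kappa_2 c) \\
&\quad \times \Phi\Bigg(\frac{\theta_0^{(1)} + \theta_2 x_0 + \theta_4 c + (\theta_1 + \theta_3 x_0)\Big(1 + \frac{(\beta_1 + \beta_3 x_0)\sigma_M^2}{\gamma_0 + \gamma_1 x_0 + \gamma_2 c}\Big)(\gamma_0 + \gamma_1 x_0 + \gamma_2 c)}{\sqrt{(\theta_1 + \theta_3 x_0)^2 \sigma_M^2 + 1}}\Bigg) \Bigg)
\end{aligned}
\tag{71}
$$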


C Appendix - Mplus syntax

In the subsections below some examples of input syntax for Mplus are displayed. The examples are arbitrarily chosen from the 192 runs that were performed for this study. The code for the runs not displayed here had the same structure, only with different parameter values for the true models.

Internal Monte Carlo simulation Mplus syntax

Input syntax for the internal Monte Carlo simulation of the weak effects model. All generated variables are independent normal variables.

    MONTECARLO:
        NAMES ARE X Mres1 Zres Yres;
        NREPS = 1000;
        NOBS = 100;
        REPSAVE = ALL;
        SAVE = twopartm.100*.dat;
        SEED = 11;

    MODEL POPULATION:
        [X@5]; X@1;
        [Mres1@0]; Mres1@1;
        [Zres@0]; Zres@1;
        [Yres@0]; Yres@0.3;

    MODEL:
        [X@5]; X@1;
        [Mres1@0]; Mres1@1;
        [Zres@0]; Zres@1;
        [Yres@0]; Yres@0.3;

External Monte Carlo simulation, two-part M two-part Y, syntax

Input syntax for the external Monte Carlo simulation of the weak effects model. The true model is two-part M, two-part Y. The effects are estimated with a two-part M, two-part Y model. The censoring amount is 25%.

    DATA:
        FILE = twopartMandY.100list.dat;
        TYPE = MONTECARLO;

    VARIABLE:
        NAMES = X Mres Zres1 Zres2 Yres;
        USEVAR = X Zm Zy logYpos logMpos;
        CATEGORICAL = Zm Zy;

    DEFINE:
        M = exp(4 + 0.5*X + Mres);
        Zstar1 = 0.4*X + Zres1;
        IF (Zstar1 <= 1.273552) THEN Zm_true = 0;
        IF (Zstar1 > 1.273552) THEN Zm_true = 1;
        logM = log(M);
        Zstar2 = 0.3*Zm_true + 0.35*logM + 0.32*X + Zres2;
        Y = exp(2 + 0.55*Zm_true + 0.25*logM + 0.25*X + Yres);
        logY = log(Y);
        IF (Zstar2 <= 3.313404) THEN logY = 0;
        IF (Zstar1 <= 1.273552) THEN logM = 0;
        IF (logM <= 0) THEN Zm = 0;
        IF (logM > 0) THEN Zm = 1;
        IF (logM > 0) THEN logMpos = logM;
        IF (logY <= 0) THEN Zy = 0;
        IF (logY > 0) THEN Zy = 1;
        IF (logY > 0) THEN logYpos = logY;

    ANALYSIS:
        ESTIMATOR = ML;
        LINK = PROBIT;
        INTEGRATION = MONTECARLO;
        PROCESSORS = 2;

    MODEL:
        logMpos ON X*0.5 (gamma1);
        [logMpos*4] (gamma0);
        logMpos (sigM);

        Zm ON X*0.4 (kappa1);
        [Zm$1*1.273552] (kappa0);

        logYpos ON Zm*0.55 (b0diff)
                   logMpos*0.25 (bet1)
                   X*0.25 (bet2);
        [logYpos*2] (bet01);
        logYpos*1 (sigY);

        Zy ON Zm*0.3 (thet0diff)
              logMpos*0.35 (thet1)
              X*0.32 (thet2);
        [Zy$1*3.313404] (thet01);

    MODEL CONSTRAINT:
        NEW(bet02 bet3 thet02 thet3 b11 b00 b01 b10 mu0 mu1 x0 x1
            e00 e11 e01 e10 fi bet30 bet31 Pk0 Pk1 sq0 sq1 Pt020 Pt021
            TNIE*43.75021 PNDE*40.40601 PNIE*33.08004 TNID*51.07618
            TE*84.15622 CONTROL*84.15622);
        x1 = 6;
        x0 = 5;
        bet3 = 0;
        thet3 = 0;
        bet02 = b0diff + bet01;
        thet02 = thet0diff + (-thet01);
        fi = EXP(sigY/2);
        mu0 = (gamma0 + gamma1*x0);
        mu1 = (gamma0 + gamma1*x1);
        Pk0 = PHI(-kappa0 + kappa1*x0);
        Pk1 = PHI(-kappa0 + kappa1*x1);
        Pt020 = PHI(thet02 + thet2*x0);
        Pt021 = PHI(thet02 + thet2*x1);
        sq0 = SQRT((thet1 + thet3*x0)^2*sigM + 1);
        sq1 = SQRT((thet1 + thet3*x1)^2*sigM + 1);
        bet30 = (bet1 + bet3*x0);
        bet31 = (bet1 + bet3*x1);
        b00 = (1 + bet30*sigM/mu0);
        b11 = (1 + bet31*sigM/mu1);
        b10 = (1 + bet31*sigM/mu0);
        b01 = (1 + bet30*sigM/mu1);

        e00 = fi*EXP(bet2*x0)*(
              EXP(bet02)*Pt020*(1-Pk0)
              + EXP(bet01 + (mu0^2/(2*sigM))*(b00^2-1))*Pk0
              *PHI((-thet01 + thet2*x0 + (thet1+thet3*x0)*b00*mu0)/sq0));

        e11 = fi*EXP(bet2*x1)*(
              EXP(bet02)*Pt021*(1-Pk1)
              + EXP(bet01 + (mu1^2/(2*sigM))*(b11^2-1))*Pk1
              *PHI((-thet01 + thet2*x1 + (thet1+thet3*x1)*b11*mu1)/sq1));

        e10 = fi*EXP(bet2*x1)*(
              EXP(bet02)*Pt021*(1-Pk0)
              + EXP(bet01 + (mu0^2/(2*sigM))*(b10^2-1))*Pk0
              *PHI((-thet01 + thet2*x1 + (thet1+thet3*x1)*b10*mu0)/sq1));

        e01 = fi*EXP(bet2*x0)*(
              EXP(bet02)*Pt020*(1-Pk1)
              + EXP(bet01 + (mu1^2/(2*sigM))*(b01^2-1))*Pk1
              *PHI((-thet01 + thet2*x0 + (thet1+thet3*x0)*b01*mu1)/sq0));

        TNIE = e11 - e10;
        PNDE = e10 - e00;
        PNIE = e01 - e00;
        TNID = e11 - e01;
        TE = e11 - e00;
        CONTROL = TNIE + PNDE;

External Monte Carlo simulation, classical, syntax

Input syntax for the external Monte Carlo simulation of the weak effects model. The true model is two-part M. The effects are estimated with a classical model, without accounting for the censored M. The censoring amount is 25%.

    DATA:
        FILE = twopartm.100list.dat;
        TYPE = MONTECARLO;

    VARIABLE:
        NAMES = X Mres1 Zres Yres;
        USEVAR = X Y M;

    DEFINE:
        M = 4 + 0.5*X + Mres1;
        Zstar = 0.4*X + Zres;
        IF (Zstar <= 1.273552) THEN Ztrue = 0;
        IF (Zstar > 1.273552) THEN Ztrue = 1;
        Y = 2 + 0.5*Ztrue + 0.25*M + 0.25*X + Yres;
        IF (Zstar <= 1.273552) THEN M = 0;

    ANALYSIS:
        ESTIMATOR = ML;

    MODEL:
        M ON X*0.5 (gamma1);
        [M*4] (gamma0);
        M*1;

        Y ON M*0.25 (bet1)
             X*0.25 (bet2);
        [Y*2] (bet01);
        Y*1;

    MODEL CONSTRAINT:
        NEW(x1 x0 e00 e01 e10 e11 bet02 bet3
            TNIE*0.2255199 PNDE*0.25 PNIE*0.2255199 TNID*0.25
            TE*0.4755199 CONTROL*0.4755199);
        bet02 = 0;
        x1 = 6;
        x0 = 5;
        bet3 = 0;
        e00 = bet01 + bet1*(gamma0 + gamma1*x0) + bet2*x0;
        e11 = bet01 + bet1*(gamma0 + gamma1*x1) + bet2*x1;
        e01 = bet01 + bet1*(gamma0 + gamma1*x1) + bet2*x0;
        e10 = bet01 + bet1*(gamma0 + gamma1*x0) + bet2*x1;
        TNIE = e11 - e10;
        PNDE = e10 - e00;
        PNIE = e01 - e00;
        TNID = e11 - e01;
        TE = e11 - e00;
        CONTROL = TNIE + PNDE;

External Monte Carlo simulation, two-part M, syntax

Input syntax for the Monte Carlo simulation of the weak effects model. The true model is two-part M. The effects are estimated with a two-part M model. The censoring amount is 25%.

    DATA:
        FILE = twopartm.100list.dat;
        TYPE = MONTECARLO;

    VARIABLE:
        NAMES = X Mres1 Zres Yres;
        USEVAR = X Y Z Mpos;
        CATEGORICAL = Z;

    DEFINE:
        M = 4 + 0.5*X + Mres1;
        Zstar = 0.4*X + Zres;
        IF (Zstar <= 1.273552) THEN Ztrue = 0;
        IF (Zstar > 1.273552) THEN Ztrue = 1;
        Y = 2 + 0.5*Ztrue + 0.25*M + 0.25*X + Yres;
        IF (Zstar <= 1.273552) THEN M = 0;
        IF (M <= 0) THEN Z = 0;
        IF (M > 0) THEN Z = 1;
        IF (M > 0) THEN Mpos = M;

    ANALYSIS:
        ESTIMATOR = ML;
        LINK = PROBIT;

    MODEL:
        Mpos ON X*0.5 (gamma1);
        [Mpos*4] (gamma0);
        Mpos*1;

        Z ON X*0.4 (kappa1);
        [Z$1*1.273552] (kappa0);

        Y ON Z*0.5 (b0diff)
             Mpos*0.25 (bet1)
             X*0.25 (bet2);
        [Y*2] (bet01);
        Y*0.3;
        Mpos;

    MODEL CONSTRAINT:
        NEW(x1 x0 e00 e01 e10 e11 bet02 Pk0 Pk1 bet3
            TNIE*0.2255199 PNDE*0.25 PNIE*0.2255199 TNID*0.25
            TE*0.4755199 CONTROL*0.4755199);
        bet02 = b0diff + bet01;
        x1 = 6;
        x0 = 5;
        Pk0 = PHI(-kappa0 + kappa1*x0);
        Pk1 = PHI(-kappa0 + kappa1*x1);
        bet3 = 0;

        e00 = bet02 + (bet01-bet02)*Pk0 + bet2*x0
              + Pk0*(bet1+bet3*x0)*(gamma0+gamma1*x0);
        e11 = bet02 + (bet01-bet02)*Pk1 + bet2*x1
              + Pk1*(bet1+bet3*x1)*(gamma0+gamma1*x1);
        e01 = bet02 + (bet01-bet02)*Pk1 + bet2*x0
              + Pk1*(bet1+bet3*x0)*(gamma0+gamma1*x1);
        e10 = bet02 + (bet01-bet02)*Pk0 + bet2*x1
              + Pk0*(bet1+bet3*x1)*(gamma0+gamma1*x0);

        TNIE = e11 - e10;
        PNDE = e10 - e00;
        PNIE = e01 - e00;
        TNID = e11 - e01;
        TE = e11 - e00;
        CONTROL = TNIE + PNDE;
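For readers without access to Mplus, the data-generating steps of the two-part M DEFINE block can be mirrored in a few lines of Python. This is only a sketch under stated assumptions: the function name and the returned dictionary keys are hypothetical, the population values (X with mean 5 and variance 1, residual variances 1, 1 and 0.3) are read off the internal Monte Carlo syntax, and Python's `random` module will not reproduce the Mplus random number stream.

```python
import random
from math import sqrt


def simulate_two_part_m(n, seed=11):
    """Generate n observations following the two-part M DEFINE statements."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(5.0, 1.0)
        m_res = rng.gauss(0.0, 1.0)
        z_res = rng.gauss(0.0, 1.0)
        y_res = rng.gauss(0.0, sqrt(0.3))   # Yres has variance 0.3
        m = 4.0 + 0.5 * x + m_res
        z_star = 0.4 * x + z_res
        z_true = 1 if z_star > 1.273552 else 0
        # As in the DEFINE block, Y is computed before M is censored.
        y = 2.0 + 0.5 * z_true + 0.25 * m + 0.25 * x + y_res
        if z_star <= 1.273552:
            m = 0.0
        z = 1 if m > 0 else 0
        m_pos = m if m > 0 else None        # missing when the mediator is zero
        rows.append({"X": x, "Y": y, "Z": z, "Mpos": m_pos})
    return rows


data = simulate_two_part_m(20000)
share_zero = sum(1 for row in data if row["Z"] == 0) / len(data)
```

With the threshold 1.273552 the share of zero mediators should be close to the 25% censoring amount stated for these runs, since the latent variable 0.4X + Zres has mean 2 and variance 1.16.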
