Reading Notes of Causal Inference - USTC



Reading Notes of Causal Inference

Chen Xu

February 24, 2018

Contents

Part I: Causal inference without models

1 Selection Bias
1.1 The structure of selection bias
1.2 Example of selection bias
1.3 Selection bias and confounding
1.4 Selection bias and identifiability of causal effects
1.5 How to adjust for selection bias
1.6 Selection without bias

2 Measurement Bias
2.1 Measurement error
2.2 The structure of measurement error
2.3 Mismeasured confounders
2.4 Adherence to treatment in randomized experiments
2.5 The intention-to-treat effect and the per-protocol effect

3 Random Variability
3.1 Identification versus estimation
3.2 Estimation of causal inference
3.3 The myth of the super-population
3.4 The conditionality "principle"
3.5 The curse of dimensionality

Part II: Causal inference with models

4 Why Model?
4.1 Data cannot speak for themselves
4.2 Parametric estimators
4.3 Nonparametric estimators
4.4 Smoothing
4.5 The bias-variance trade-off

5 IP weighting and marginal structural models
5.1 The causal question
5.2 Estimating IP weights via modeling
5.3 Stabilized IP weighting
5.4 Marginal structural models
5.5 Effect modification and marginal structural models
5.6 Censoring and missing data

6 Standardization and the parametric g-formula
6.1 Standardization as an alternative to IP weighting
6.2 Estimating the mean outcome via modeling
6.3 Standardizing the mean outcome to the confounder distribution
6.4 IP weighting or standardization
6.5 How seriously do we take our estimates?

7 G-estimation of structural nested models
7.1 The causal question revisited
7.2 Exchangeability revisited
7.3 Structural nested mean model
7.4 Rank preservation
7.5 G-estimation
7.6 Structural nested models with two or more parameters

8 Outcome regression and propensity scores
8.1 Outcome regression
8.2 Propensity scores
8.3 Propensity stratification and standardization
8.4 Propensity matching
8.5 Propensity models, structural models, predictive models

9 Instrumental variable estimation
9.1 The three instrumental conditions
9.2 The usual IV estimand
9.3 A fourth identifying condition: homogeneity
9.4 An alternative fourth condition: monotonicity
9.5 The three instrumental conditions revisited
9.6 Instrumental variable estimation versus other methods

10 Causal survival analysis
10.1 Hazards and risks
10.2 From hazards to risks

Part III: Causal inference from complex longitudinal data

A Fine Points
A.1 8.1 Selection bias in case-control studies
A.2 8.2 The strength and direction of selection bias
A.3 9.1 The strength and direction of measurement bias
A.4 9.2 Pseudo-intention-to-treat analysis
A.5 9.3 Effectiveness versus efficacy
A.6 10.1 Honest confidence interval
A.7 10.2 Quantitative bias analysis
A.8 11.1 Model dimensionality and the relation between frequentist and Bayesian intervals
A.9 12.1 Setting a bad example
A.10 12.2 Checking positivity
A.11 13.1 Structural positivity
A.12 14.2 Sensitivity analysis for unmeasured confounding
A.13 15.1 Nuisance parameter
A.14 15.2 Effect modification and the propensity score
A.15 16.1 Candidate instruments in observational studies
A.16 17.1 Competing events
A.17 17.2 Models for survival analysis

B Technical Points
B.1 8.1 The built-in selection bias of hazard ratio
B.2 8.2 Multiplicative survival model
B.3 9.1 Independence and nondifferentiality
B.4 9.2 The exclusion restriction
B.5 10.1 Bias and consistency in statistical inference
B.6 10.2 A formal statement of the conditionality principle
B.7 10.3 Comparison between adjusted and unadjusted estimators
B.8 11.1 A taxonomy of commonly used models
B.9 12.1 Horvitz-Thompson estimators
B.10 12.2 More on stabilized weights
B.11 13.1 Bootstrapping
B.12 13.2 Doubly robust methods
B.13 14.1 Relation between marginal structural models and structural nested models
B.14 14.1 Multiplicative structural nested mean models
B.15 14.2 G-estimation of structural nested mean models
B.16 15.1 Balancing scores and prognostic scores
B.17 16.1 The instrumental conditions, formally
B.18 16.2 Bounds: Partial identification of causal effects
B.19 16.3 Additive structural mean models and IV estimation
B.20 16.4 Multiplicative structural mean models and IV estimation
B.21 16.5 More general structural mean models
B.22 16.6 Monotonicity and the effect in the compliers

C New words

D Questions

E Possible Print errors


Part I

Causal inference without models

1 Selection Bias

1.1 The structure of selection bias

1.2 Example of selection bias

Selection bias arises from conditioning on common effects.

• retrospective studies: data on treatment A are collected after the outcome Y occurs.

• prospective studies: data on treatment A are collected before the outcome Y occurs.

Confounding only occurs in observational studies.

Selection bias may occur both in observational studies and in randomized experiments; in randomized experiments it arises when selection occurs after the randomization.

1.3 Selection bias and confounding

Two reasons why the treated and untreated are not exchangeable.

1) the presence of common causes of treatment and outcome.

2) conditioning on common effects of treatment and outcome.

On Page 100, the first paragraph gives an example in which a structural classification of bias has no consequences for the analysis of a study. However, there are advantages to adopting a structural approach to the classification of sources of non-exchangeability:

• First: guiding the choice of analytical methods to reduce or avoid the bias.


• Second: helping study design.

• Third: selection bias resulting from conditioning on pre-treatment variables could explain why certain variables behave as "confounders" in some studies but not others.

• Fourth: decreasing the occurrence of misunderstanding.

For selection on pre-treatment factors, the choice of terminology usually has no practical consequences. However, disregard for the causal structure when there is selection on post-treatment factors may lead to apparent paradoxes like the so-called Simpson's paradox.

1.4 Selection bias and identifiability of causal effects

There might be some other factors that affect the identifiability of causal effects, such as loss to follow-up (censoring). Thus, we need to adjust for confounding for the effect of treatment C.

1.5 How to adjust for selection bias

• When both confounding and selection bias exist, the product weight WAWC can be used to adjust simultaneously for both biases under the assumptions described in Chapter 12 and Part III.

The association measure in the pseudo-population equals the effect measure in the original population if the following three identifiability conditions are met.

• 1) exchangeability: the probability of selection is calculated conditional on treatment A and on all additional factors that independently predict both selection and the outcome. That is, the variables in A and L are sufficient to block all backdoor paths between C and Y.

• 2) positivity: all conditional probabilities of being uncensored given the variables in L must be greater than zero.

• 3) consistency: well-defined interventions. Censoring may be relatively well defined when it is the result of loss to follow-up or nonresponse, but not when it is the result of competing events.
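A minimal Python sketch of IP weighting for censoring, on simulated data (the numbers and the censoring mechanism are my own illustration, not from the text): each uncensored subject is weighted by 1/Pr[C = 0|A, L], so that the uncensored stand in for the censored.

```python
import random

random.seed(0)

# Simulated cohort: true effect of A on Y is 1.0; censoring depends on
# both A and L, so the uncensored are a biased sample of the population.
rows = []
for _ in range(20000):
    L = random.random() < 0.5
    A = random.random() < 0.5
    Y = 1.0 * A + 2.0 * L + random.gauss(0, 1)
    p_uncens = (0.3 if L else 0.8) if A else 0.9   # Pr[C=0 | A, L], known here
    C = random.random() > p_uncens                  # C=1 means censored
    rows.append((L, A, Y, C, p_uncens))

def mean_diff(weighted):
    num = {False: 0.0, True: 0.0}
    den = {False: 0.0, True: 0.0}
    for L, A, Y, C, p in rows:
        if C:
            continue                      # censored subjects drop out...
        w = 1.0 / p if weighted else 1.0  # ...but WC = 1/Pr[C=0|A,L] lets the
        num[A] += w * Y                   # uncensored stand in for them
        den[A] += w
    return num[True] / den[True] - num[False] / den[False]

naive = mean_diff(weighted=False)   # biased by selection on C
ipw = mean_diff(weighted=True)      # recovers roughly the true effect 1.0
print(round(naive, 2), round(ipw, 2))
```

The naive contrast among the uncensored is biased away from 1.0, while the weighted contrast recovers it, exactly the pseudo-population logic described above.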


A competing event is an event that prevents the outcome of interest from happening, such as death.

IP weighting is not the only method to adjust for selection bias: stratification also works for the structure in Figure 8.3.

Meanwhile, there are some situations in which stratification cannot be used to validly compute the causal effect of treatment, even if the three conditions of exchangeability, positivity, and consistency hold, such as Figures 8.4-8.6. We will discuss other situations with a similar structure in Part III when estimating direct effects and the effect of time-varying treatments.

1.6 Selection without bias

A special situation under which A and E remain conditionally independent within the other stratum (A and E are treatments; Y is a collider).

It is theoretically possible that selection on a common effect does not result in selec-tion bias when the analysis is restricted to a single level of the common cause.

• Figure 8.13: A and E are independent.

• Figures 8.14-8.15: A and E are dependent.

2 Measurement Bias

Measurement bias can happen both in randomized experiments and in observational studies.

2.1 Measurement error

UA: the measurement error for A, which was omitted in Chapters 7 and 8.

In the presence of measurement bias, the identifiability conditions of exchangeability, positivity, and consistency are insufficient to compute the causal effect of treatment A on outcome Y.


2.2 The structure of measurement error

Two properties: independence and nondifferentiality

Note: A represents the treatment, Y represents the outcome.

• If all paths between UA and UY are blocked, they are independent; otherwise they are dependent.

• If the error for the treatment UA is independent of the true value of the outcome Y, the measurement error for the treatment is nondifferential with respect to the outcome; analogously, if the error for the outcome UY is independent of the true value of the treatment A, the measurement error for the outcome is nondifferential with respect to the treatment.

recall bias: recall may be affected by the outcome.

reverse causation bias: obtained from the recall-bias structure by replacing the accurate values of treatment and outcome with the measured values.

The particular structure of the measurement error determines the methods that can be used to correct it. In general, methods for measurement error correction rely on a combination of modeling assumptions and validation samples. The best way to fight bias due to mismeasurement is to improve the measurement procedures.

2.3 Mismeasured confounders

The particular choice of terminology (unmeasured confounding versus bias due to mismeasurement of the confounders) is irrelevant for practical purposes.

2.4 Adherence to treatment in randomized experiments

Noncompliance and adherence

A double-blind placebo-controlled randomized experiment is sometimes unfeasible and unrealistic.

2.5 The intention-to-treat effect and the per-protocol effect

Per-protocol effect: the causal effect of treatment if all individuals had adhered to their assigned treatment as indicated in the protocol of the randomized experiment.


Intention-to-treat effect (ITT effect): the causal effect of randomized assignment Z.

In general, the per-protocol effect cannot be validly estimated via a naive "per-protocol" analysis.

In general, the intention-to-treat effect can be validly estimated via a naive "intention-to-treat" analysis.

The ITT effect is usually the primary analysis because the ITT effect can be interpreted as a lower bound for the per-protocol effect.

Three Problems

• First, the answer assumes monotonicity of effects. If monotonicity does not hold, the ITT effect would be anti-conservative.

• Second, the conservativeness of the ITT effect makes it a dangerous effect measure when the goal is evaluating a treatment's safety.

• Third, the magnitude of the ITT effect depends on the magnitude of adherence, which may differ across settings.

Computing the per-protocol effect requires adjustment for confounding under the assumption of exchangeability conditional on the measured covariates, or via instrumental variable estimation (a particular case of g-estimation, see Chapter 16) under alternative assumptions.

In summary, in the analysis of randomized experiments there is a trade-off between bias due to potential unmeasured confounding (when choosing the per-protocol effect) and misclassification bias (when choosing the ITT effect).

3 Random Variability

Systematic bias and random variability

3.1 Identification versus estimation

Identification problems: those in which we can assume the size of the study population is effectively infinite.


An estimator is a rule that takes the data from any sample from the super-population and produces a numerical value for the estimand.

Calibrated: the estimand is contained in the interval in 95% of the random samples.

Conservative: the estimand is contained in more than 95% of samples; anti-conservative: contained in less than 95%.

We will say that a confidence interval is valid if it is either calibrated or conservative.
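Calibration can be checked by simulation: draw repeated samples and count how often the interval covers the estimand. A hypothetical sketch using the large-sample Wald interval for a proportion (the numbers are arbitrary, not from the text):

```python
import math
import random

random.seed(1)

# "Calibrated" means the estimand falls inside the 95% interval in
# ~95% of random samples drawn from the same super-population.
p_true, n, reps = 0.3, 500, 2000
covered = 0
for _ in range(reps):
    x = sum(random.random() < p_true for _ in range(n))
    p_hat = x / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)          # large-sample Wald SE
    lo, hi = p_hat - 1.96 * se, p_hat + 1.96 * se
    covered += (lo <= p_true <= hi)

coverage = covered / reps
print(coverage)
```

For this n the Wald interval is roughly calibrated; for small n or extreme p it is known to be anti-conservative, which is why the small-sample/large-sample distinction below matters.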

Confidence interval versus Credible interval

A small-sample valid (conservative or calibrated) confidence interval is one that is valid at all sample sizes for which it is defined. Sometimes called an exact confidence interval.

A large-sample valid confidence interval is one that is valid only in large samples.

3.2 Estimation of causal inference

Two possible targets

• First, randomization-based inference, in which investigators may be agnostic about the existence of a super-population and restrict their inference to the sample that was actually randomized.

• Second, investigators may still be interested in making inferences about the super-population from which the study sample was randomly drawn. That is, randomization followed by sampling is equivalent to sampling followed by randomization.

In many cases we are not interested in the first target.

3.3 The myth of the super-population

Two scenarios under which p̂ has a binomial sampling distribution.

• Scenario 1: The study population is sampled at random from an essentially infinite super-population.


• Scenario 2: The study population is not sampled from any super-population. Rather,

(1) each subject i among the 13 treated subjects has an individual nondeterministic (stochastic) counterfactual probability p_i^{a=1};

(2) the observed outcome Y_i = Y_i^{a=1} for subject i occurs with probability p_i^{a=1};

(3) p_i^{a=1} takes the same value, say p, for each of the 13 treated subjects.

Scenario 2 is untenable; investigators must be implicitly assuming Scenario 1.

3.4 The conditionality ”principle”

The results of Table 10.1 and Table 10.2 are disturbing to the investigators.

One investigator says that they should not retract the article. He thinks it is likely that imbalances in other unmeasured factors U cancelled out the effect of the chance imbalance in L, so that the unadjusted estimator is still closer to the true value in the super-population.

Another investigator argues that we should adjust for L because the strong association between L and A introduces confounding in our effect estimate. Within levels of L, we have mini randomized trials, and the confidence intervals around the corresponding point estimates will reflect the uncertainty due to the possible U–A associations conditional on L.

As a consequence, the adjusted estimate of the treatment effect is unbiased but the unadjusted estimate is greatly biased when averaged over these runs. Unconditionally, over all the runs of the experiment, both the unadjusted and adjusted estimates are unbiased, but the variance of the adjusted estimate is smaller than that of the unadjusted estimate. That is, the adjusted estimator is both conditionally unbiased and unconditionally more efficient. Hence the second investigator is correct.
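The conclusion that the adjusted estimator is unconditionally more efficient can be illustrated by simulation. In this sketch (my construction, not the book's example), A is randomized and L is a strong prognostic factor; over repeated runs both estimators are unbiased, but the L-stratified estimator has the smaller variance:

```python
import random
import statistics

random.seed(2)

# One randomized experiment: A is randomized, L is a strong prognostic
# factor (not a confounder). True effect of A on Y is 1.0.
def one_run(n=400):
    data = []
    for _ in range(n):
        L = random.random() < 0.5
        A = random.random() < 0.5
        Y = (1.0 if A else 0.0) + (2.0 if L else 0.0) + random.gauss(0, 1)
        data.append((L, A, Y))
    def diff(sub):
        t = [y for _, a, y in sub if a]
        c = [y for _, a, y in sub if not a]
        return statistics.mean(t) - statistics.mean(c)
    unadj = diff(data)                                        # ignores L
    strata = [[r for r in data if r[0] == l] for l in (False, True)]
    adj = sum(len(s) * diff(s) for s in strata) / len(data)   # stratifies on L
    return unadj, adj

runs = [one_run() for _ in range(800)]
mean_unadj = statistics.mean([u for u, _ in runs])
mean_adj = statistics.mean([a for _, a in runs])
var_unadj = statistics.variance([u for u, _ in runs])
var_adj = statistics.variance([a for _, a in runs])
print(var_adj < var_unadj)   # True: both unbiased, adjusted is more efficient
```

Both averages sit near the true effect of 1.0; the variance comparison is the "unconditionally more efficient" claim in miniature.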

The idea that one should condition on the observed L–A association is an example of what is referred to in the statistical literature as the conditionality principle, which states that inference on a parameter should be performed conditional on all ancillary statistics.

From now on, we will say that an estimator is unbiased if and only if it can center a valid Wald interval conditional on all ancillary statistics.


When the number of measured variables is large, however, following the conditionality principle is no longer a wise strategy.

3.5 The curse of dimensionality

High dimensionality

Finding a solution to the curse of dimensionality is not straightforward. One approach is to reduce the dimensionality of the data by excluding some variables from the analysis. Many procedures to eliminate variables from the analysis are ad hoc.

The statistical theory to provide correct (honest) confidence intervals for high-dimensional data is still under development.


Options for conditioning on potential confounders

• 1. Restriction – restrict the population to a single stratum of the potential confounder.

• 2. Matching – enforce by design similar levels of the confounder between exposed and unexposed, or between diseased and nondiseased.

• 3. Adjustment – commonly used in regression modeling to statistically "hold constant" the level of the confounder while looking at another association.

• 4. Weighting – use weighting schemes such as standardization or inverse probability weighting.


Part II

Causal inference with models

4 Why Model?

4.1 Data cannot speak for themselves

4.2 Parametric estimators

4.3 Nonparametric estimators

"Models" which do not impose restrictions are saturated models, which have the same number of unknowns on both sides of the equal sign.

When a model has only a few parameters but is used to estimate many population quantities, we say that the model is parsimonious.

4.4 Smoothing

The fewer the parameters in the model, the smoother the prediction (response) surface will be.

4.5 The bias-variance trade-off

Although less smooth models may yield a less biased estimate, they also result in a larger variance.

The bias-variance trade-off is at the heart of all data analysis.
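A toy illustration of the trade-off (the setup is mine, not the book's): estimate E[Y|X near 0.9] when the true curve is E[Y|X] = X², using a local average over a wide bin (smoother, fewer effective parameters) versus a narrow bin:

```python
import random
import statistics

random.seed(3)

# True curve: E[Y|X] = X**2 on [0, 1]. Estimate E[Y | X near 0.9] by a
# local average. A wide bin is "smoother" (low variance, more bias);
# a narrow bin is less biased but noisier.
def local_mean(width, n=200):
    sample = [(x, x * x + random.gauss(0, 0.3))
              for x in (random.random() for _ in range(n))]
    ys = [y for x, y in sample if abs(x - 0.9) <= width / 2]
    return statistics.mean(ys)

truth = 0.9 ** 2
results = {}
for width in (0.8, 0.1):
    ests = [local_mean(width) for _ in range(500)]
    results[width] = (statistics.mean(ests) - truth,   # bias
                      statistics.variance(ests))       # variance
print(results)
```

The wide bin has low variance but a large (negative) bias at x = 0.9; the narrow bin is nearly unbiased but has the larger variance.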


5 IP weighting and marginal structural models

Models allow us to tackle high-dimensional problems with many covariates and nondichotomous treatments.

5.1 The causal question

5.2 Estimating IP weights via modeling

The weighted least squares estimates θ̂0 and θ̂1 of θ0 and θ1, with weights W, are the minimizers of

Σi Wi[Yi − (θ0 + θ1Ai)]². (1)

The estimate Ê[Y |A = a] = θ̂0 + θ̂1a is equal to

(Σi YiWi) / (Σi Wi),

where the sums are over all subjects with A = a.

To obtain a 95% confidence interval around the point estimate θ̂1 = 3.4, we need a method that takes IP weighting into account.

• 1. Using statistical theory to derive the corresponding variance estimator, which is not always available in standard statistical software.

• 2. Approximating the variance by nonparametric bootstrapping. This approach requires appropriate computing resources, or lots of patience, for large databases.

• 3. Using the robust variance estimator that is a standard option in most statistical software packages.
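Option 2 can be sketched in a few lines: resample subjects with replacement and recompute the weighted estimate in each replicate. The data and weights below are synthetic stand-ins, not the book's example:

```python
import random
import statistics

random.seed(4)

# Synthetic subjects: (A, Y, W). True mean difference is 1.0; W plays the
# role of an IP weight already estimated for each subject.
data = []
for _ in range(300):
    A = random.random() < 0.5
    Y = (1.0 if A else 0.0) + random.gauss(0, 1)
    W = random.uniform(0.5, 2.0)
    data.append((A, Y, W))

def weighted_diff(rows):
    def wmean(flag):
        num = sum(w * y for a, y, w in rows if a == flag)
        den = sum(w for a, y, w in rows if a == flag)
        return num / den
    return wmean(True) - wmean(False)

point = weighted_diff(data)
# Bootstrap: resample n subjects with replacement, re-estimate each time.
boots = [weighted_diff([random.choice(data) for _ in data]) for _ in range(500)]
se = statistics.stdev(boots)                     # bootstrap standard error
ci = (point - 1.96 * se, point + 1.96 * se)      # Wald-type 95% CI
print(round(point, 2), round(se, 2))
```

In practice the weight-estimation step would be repeated inside each bootstrap replicate as well, so that the interval also reflects uncertainty in the estimated weights.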

5.3 Stabilized IP weighting

The IP weights WA = 1/f(A|L) are referred to as nonstabilized weights.

The IP weights SWA = f(A)/f(A|L) are referred to as stabilized weights.

Stabilized weights typically result in narrower 95% confidence intervals than nonstabilized weights. However, the statistical superiority of the stabilized weights can only occur when the (IP weighted) model is not saturated.
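The contrast between the two kinds of weights can be seen numerically. In this hypothetical binary example, the nonstabilized weights W = 1/f(A|L) average 2 (the number of treatment levels), while the stabilized weights SW = f(A)/f(A|L) average 1 and are less variable:

```python
import random
import statistics

random.seed(5)

# Binary L and A, with f(A=1|L) depending on L.
pA_given_L = {False: 0.2, True: 0.7}
subjects = []
for _ in range(50000):
    L = random.random() < 0.5
    A = random.random() < pA_given_L[L]
    subjects.append((L, A))

pA1 = sum(a for _, a in subjects) / len(subjects)   # empirical marginal f(A=1)

def f_cond(L, A):
    return pA_given_L[L] if A else 1 - pA_given_L[L]

def f_marg(A):
    return pA1 if A else 1 - pA1

W = [1 / f_cond(L, A) for L, A in subjects]           # nonstabilized 1/f(A|L)
SW = [f_marg(A) / f_cond(L, A) for L, A in subjects]  # stabilized f(A)/f(A|L)

mean_W, mean_SW = statistics.mean(W), statistics.mean(SW)
var_W, var_SW = statistics.pvariance(W), statistics.pvariance(SW)
print(round(mean_W, 2), round(mean_SW, 2))   # ~2.0 and ~1.0
```

The smaller spread of SW is what translates into the narrower confidence intervals mentioned above (when the weighted model is not saturated).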


5.4 Marginal structural models

Linear model for the average outcome under treatment level a.

E[Y a] = β0 + β1a. (2)

The outcome variable of this model is counterfactual, and hence generally unobserved. Therefore the model cannot be fit to the data of any real-world study. When the structural mean model does not include any covariates, we refer to it as an unconditional or marginal structural mean model.

5.5 Effect modification and marginal structural models

Marginal structural models do not include covariates when the target parameter is the average causal effect in the population. However, one may include covariates in a marginal structural model to assess effect modification.

For example, we add the covariate V to our marginal structural mean model:

E[Y a|V ] = β0 + β1a + β2V a + β3V. (3)

The parameter β3 does not generally have a causal interpretation as the effect of V. Remember that we are assuming exchangeability, positivity, and consistency for treatment A, not for sex V.

We refer to a marginal structural model that conditions on all variables L needed for exchangeability as a faux marginal structural model.

Effect modification and confounding are two logically distinct concepts.

Methods for confounder adjustment (IP weighting) are distinct from methods for detection of effect modification (adding treatment-covariate product terms to a marginal structural model).

If we were interested in the interaction between two treatments A and B, we would include parameters for both A and B in the marginal structural model, and would estimate IP weights with the joint probability of both treatments in the denominator. We would assume exchangeability, positivity, and consistency for A and B.


5.6 Censoring and missing data

The causal effect can be estimated by using IP weights WA,C = WA × WC under exchangeability for the joint treatment (A,C) conditional on L, that is, Y a=1,c=0 ⫫ (A,C) | L. If some of the variables in L are affected by treatment A, this conditional independence will not generally hold.

Alternatively, one can use stabilized IP weights SWA,C = SWA × SWC. The censoring weights SWC = Pr[C = 0|A]/Pr[C = 0|L,A] create a pseudo-population of the same size as the original study population after censoring, and in which there are no arrows from L into C. The stabilized weights do not eliminate censoring in the pseudo-population; they make censoring occur at random with respect to the measured covariates L. Therefore, under the assumption of conditional exchangeability of censored and uncensored individuals given L (and A), the proportion of censored individuals in the pseudo-population is identical to that in the study population. That is, there is selection but no selection bias.

The IP weights for censoring and treatment are WA,C = 1/f(A,C = 0|L), where the joint density of A and C is factored as f(A,C = 0|L) = f(A|L) × Pr[C = 0|L,A].

Some variables in L may have zero coefficients in the model for f(A|L) but not in the model for Pr[C = 0|L,A], or vice versa. Nonetheless, in large samples, it is always more efficient to keep in both models all variables L that independently predict the outcome.

6 Standardization and the parametric g-formula

6.1 Standardization as an alternative to IP weighting

As in the previous chapter, we will assume that the components of L required to adjust for C are unaffected by A. Otherwise, we would need to use the more general approach described in Part III.

Under exchangeability and positivity conditional on the variables in L, the standardized mean outcome in the uncensored treated is a consistent estimator of the mean outcome if everyone had been treated and had remained uncensored, E[Y a=1,c=0].


Analogously, the standardized mean outcome in the uncensored untreated is a consistent estimator of the mean outcome if everyone had been untreated and had remained uncensored, E[Y a=0,c=0].

The conditional mean from the stratum with the greatest number of individuals has the greatest weight in the computation of the standardized mean. The standardized mean in the uncensored untreated is computed analogously, except that the A = 1 in the conditioning event is replaced by A = 0.

The standardized mean in the uncensored who received treatment level a is

Σl E[Y |A = a, C = 0, L = l] × Pr[L = l]. (4)

When some of the variables in L are continuous, one needs to replace Pr[L = l] by the probability density function (PDF) fL(l), and the above sum becomes an integral.
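Formula (4) can be computed directly from data. A toy sketch with my own simulated numbers (no censoring, so the C = 0 condition is omitted), in which L confounds the A-Y association and the true effect is 1:

```python
import random

random.seed(6)

# L affects both A and Y (confounding); the true effect of A on Y is 1.0.
rows = []
for _ in range(30000):
    L = random.random() < 0.5
    A = random.random() < (0.8 if L else 0.2)
    Y = 1.0 * A + 3.0 * L + random.gauss(0, 1)
    rows.append((L, A, Y))

def e_y(l, a):   # E[Y | A=a, L=l], estimated nonparametrically
    ys = [y for L, A, y in rows if L == l and A == a]
    return sum(ys) / len(ys)

def std_mean(a):  # sum over l of E[Y | A=a, L=l] * Pr[L=l]
    return sum(e_y(l, a) * sum(L == l for L, _, _ in rows) / len(rows)
               for l in (False, True))

effect_std = std_mean(True) - std_mean(False)
crude = (sum(y for _, A, y in rows if A) / sum(A for _, A, _ in rows)
         - sum(y for _, A, y in rows if not A) / sum(not A for _, A, _ in rows))
print(round(crude, 2), round(effect_std, 2))  # crude is confounded; standardized ~1
```

The crude contrast mixes the effect of A with the effect of L; standardizing the stratum-specific means to the marginal distribution of L removes the confounding.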

6.2 Estimating themean outcome viamodeling

In general, the standardized mean of Y is written as

∫ E[Y |A = a, C = 0, L = l] dFL(l). (5)

• We fit a linear regression model for the mean weight gain with treatment A and all 9 confounders in L included as covariates.

• Standardizing these means to the distribution of the confounder L for all values l.

Our model restricts the possible values of E[Y |A = a, C = 0, L = l] such that the conditional relation between the continuous covariates and the mean outcome can be represented by a parabolic curve.

Our model imposes the restriction that each covariate's contribution to the mean is independent of that of the other covariates, except that the contribution of smoking cessation A varies linearly with the intensity of prior smoking.


6.3 Standardizing the mean outcome to the confounder distribution

Estimating the standardized means Σl E[Y |A = a, L = l] × Pr[L = l] in the treated (A = 1) and in the untreated (A = 0).

• expansion of dataset,

• outcome modeling,

• prediction,

• standardization by averaging.
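The four steps above can be sketched end to end. This is a minimal illustration on my own simulated data with a hand-rolled OLS fit, not the book's analysis:

```python
import math
import random

random.seed(7)

# Simulated confounded data: L -> A and L -> Y; true effect of A is 1.0.
rows = []
for _ in range(5000):
    L = random.gauss(0, 1)
    A = 1.0 if random.random() < 1 / (1 + math.exp(-L)) else 0.0
    Y = 1.0 * A + 2.0 * L + random.gauss(0, 1)
    rows.append((A, L, Y))

# Step 1 (outcome modeling): OLS fit of E[Y|A, L] = b0 + b1*A + b2*L,
# via Gauss-Jordan elimination on the 3x3 normal equations.
def ols(data):
    X = [[1.0, a, l] for a, l, _ in data]
    y = [v for _, _, v in data]
    M = [[sum(r[i] * r[j] for r in X) for j in range(3)]
         + [sum(r[i] * v for r, v in zip(X, y))] for i in range(3)]
    for i in range(3):
        M[i] = [v / M[i][i] for v in M[i]]
        for k in range(3):
            if k != i:
                M[k] = [vk - M[k][i] * vi for vk, vi in zip(M[k], M[i])]
    return [M[i][3] for i in range(3)]

b0, b1, b2 = ols(rows)

# Steps 2-4 (expansion, prediction, standardization): set A=a for every
# subject, predict the outcome, and average the predictions.
def standardized(a):
    return sum(b0 + b1 * a + b2 * l for _, l, _ in rows) / len(rows)

effect = standardized(1.0) - standardized(0.0)
print(round(effect, 2))   # close to the true effect of 1.0
```

With a linear model and no product terms the standardized contrast equals b1 exactly; the expansion-and-average machinery pays off once the outcome model contains interactions or nonlinear terms.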

6.4 IP weighting or standardization

Fitting a model for the treatment A: IP weighting.

Fitting a model for the outcome Y: standardization.

In neither case do we fit a model for the confounders L: we did not need the distribution of the confounders to obtain the IP weighted estimate, and we used only the empirical distribution of L to compute the standardized estimate.

Computing the standardized mean outcome with parametrically estimated conditional means is a particular case of the parametric g-formula.

G-computation algorithm formula: the generalization of standardization to time-varying treatments and confounders.

6.5 How seriously do we take our estimates?

The validity of our estimates for the target population requires many conditions.

• First, the identifiability conditions of exchangeability, positivity, and well-defined interventions need to hold for the observational study to resemble a randomized experiment.

• Second, all variables used in the analysis need to be correctly measured.

• Third, all models used in the analysis need to be correctly specified.

The validity of our causal inferences requires the following conditions.


• exchangeability

• positivity

• consistency

• no measurement error

• no model misspecification

7 G-estimation of structural nested models

IP weighting, the g-formula, and g-estimation are often collectively referred to as g-methods because they are designed for application to generalized treatment contrasts involving treatments that vary over time.

Describing g-estimation is facilitated by the specification of a structural model, even if the model is saturated. Models whose parameters are estimated via g-estimation are known as structural nested models. The three g-methods are based on different modeling assumptions.

7.1 The causal question revisited

Investigators could partition the study population into mutually exclusive subsets or non-overlapping strata, each of them defined by a particular combination of values l of the variables in L, and then estimate the average causal effect in each of the strata.

Adding all variables L, together with product terms between each component of L and treatment A, to the marginal structural model. Then the stabilized weights SW^A(L) equal 1 and no IP weighting is necessary, because the (unweighted) outcome regression model, if correctly specified, fully adjusts for all confounding by L.

7.2 Exchangeability revisited

Pr[A = 1 | Y^{a=0}, L] = Pr[A = 1 | L] (6)


which is an equivalent definition of conditional exchangeability for a binary treatment A.

Specifically, consider the parametric logistic model for the probability of treatment

logit Pr[A = 1 | Y^{a=0}, L] = α0 + α1 Y^{a=0} + α2 L (7)

where α2 is a vector of parameters, one for each component of L. If L has p components L1, . . . , Lp then α2 L = ∑_{j=1}^{p} α2j Lj.

Half of g-estimation

The expected value of the estimate of α1 is zero because Y^{a=0} does not predict A conditional on L.

7.3 Structural nested mean model

Under exchangeability, the structural model can be written as

E[Y^a − Y^{a=0} | A = a, L] = β1 a + β2 a L (8)

which is referred to as a structural nested mean model.

Compared with the models of the former chapters, structural nested models are semiparametric because they are agnostic about both the intercept and the main effect of L; that is, there is no parameter β0 and no term β3 L. Structural nested models make fewer assumptions and can be more robust to model misspecification than the parametric g-formula.

IP weighting and standardization can be used to adjust for these two biases. G-estimation, on the other hand, can only be used to adjust for confounding, not selection bias.

7.4 Rank preservation

We could rank everyone according to Y^{a=1} and also according to Y^{a=0}. We would then have two lists of individuals ordered from larger to smaller values of the corresponding counterfactual outcome. If both lists are in identical order, we say that there is rank preservation.


When the effect of treatment A on the outcome Y is exactly the same, on the additive scale, for all individuals in the study population, we say that additive rank preservation holds. Conditional additive rank preservation holds if the effect of the treatment A on the outcome Y is exactly the same for all individuals with the same values of L.

Example

Y_i^a − Y_i^{a=0} = ϕ1 a + ϕ2 a L_i (9)

where ϕ1 + ϕ2 l is the constant causal effect for all individuals with the covariate values L = l.

The g-estimation procedure is actually the same for rank-preserving and non-rank-preserving models.

A rank-preserving structural model is a structural mean model: the mean of the individual shifts from Y^{a=0} to Y^{a=1} is equal to each of the individual shifts within levels of L.

7.5 G-estimation

Suppose the goal is estimating the parameters of the structural nested mean model E[Y^a − Y^{a=0} | A = a, L] = β1 a.

Assume that the additive rank-preserving model Y_i^a − Y_i^{a=0} = ϕ1 a is correctly specified for all individuals i. Then the individual causal effect ϕ1 is equal to the average causal effect β1 in which we are interested. We write the rank-preserving model as Y^a − Y^{a=0} = ϕ1 a, so that

Y^{a=0} = Y^a − ϕ1 a (10)

• Linking the model to the observed data.

Y^{a=0} = Y − ϕ1 A (11)

• If the model were correct and we knew the value of ϕ1, then we could calculate the counterfactual outcome under no treatment Y^{a=0} for each individual in the study population. But we don't know ϕ1.


• Estimating the value of ϕ1: test candidate values one by one until finding the value under which Y^{a=0} does not predict A given L.

Important: G-estimation does not test whether conditional exchangeability holds; it assumes that conditional exchangeability holds.

We calculated the P-value from a Wald test. Any other valid test may be used.

Regardless of whether conditional additive rank preservation holds, the validity of the g-estimation algorithm does not actually require that H(β1) = Y^{a=0} for all subjects; it only requires that H(β1) and Y^{a=0} have the same conditional mean given L.
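The "testing values by trying" procedure can be sketched as a grid search (simulated data; all names and parameter values are hypothetical). As a simplification, instead of refitting a logistic model at each candidate ϕ, the sketch evaluates the equivalent estimating equation that H(ϕ) = Y − ϕA should not predict A within levels of L:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
L = rng.binomial(1, 0.5, n)
A = rng.binomial(1, 0.3 + 0.4 * L)      # treatment depends on L only (exchangeability given L)
Y0 = rng.normal(1.0 + 1.5 * L)          # counterfactual outcome under no treatment
phi_true = 2.0
Y = Y0 + phi_true * A                   # rank-preserving model: Y^a = Y^{a=0} + phi * a

pA = np.array([A[L == l].mean() for l in (0, 1)])[L]   # nonparametric E[A|L]

def score(phi):
    # H(phi) = Y - phi*A should not predict A given L:
    # the estimating equation sum_i H(phi)_i * (A_i - E[A|L_i]) should be zero
    H = Y - phi * A
    return np.sum(H * (A - pA))

# grid search ("testing candidate values"), keeping the phi whose score is closest to zero
grid = np.linspace(0, 4, 401)
phi_hat = grid[np.argmin(np.abs([score(p) for p in grid]))]
```

The selected phi_hat recovers the true effect up to grid resolution and sampling noise; a real analysis would replace the nonparametric E[A|L] with a fitted propensity model and obtain a confidence interval from the test statistic.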

7.6 Structural nested models with two or more parameters

Marginal structural models that do not condition on V estimate the average causal effect in the population, whereas those that condition on V estimate the average causal effect within levels of V. Structural nested models estimate, by definition, the average causal effect within levels of L. Omitting product terms in structural nested models when there is effect modification will generally lead to bias due to model misspecification.

E[Y^a − Y^{a=0} | A = a, L] = β1 a + β2 a V (12)

Y_i^a − Y_i^{a=0} = ϕ1 a + ϕ2 a V (13)

For example, we could fit the logistic model

logit Pr[A = 1 | H(ϕ†), L] = α0 + α1 H(ϕ†) + α2 H(ϕ†) V + α3 L (14)

In fact, structural nested models of any type have rarely been used, partly because of the lack of user-friendly software and partly because the extension of these models to survival analysis requires some additional considerations.

• Structural nested models preserve the null too. In contrast, although the g-formula preserves the null for time-fixed treatments, it loses this property in the time-varying setting.

• Structural nested models with multiple parameters may not be necessary for time-fixed treatments A, but this is no longer true for time-varying treatments.

8 Outcome regression and propensity scores

Outcome regression and propensity scores are not designed to handle the complexities associated with causal inference for time-varying treatments.


In Part III we will again discuss IP weighting, the g-formula, and g-estimation but willsay less about conventional outcome regression and propensity score methods.

8.1 Outcome regression

If we were willing to specify the L−Y association within levels of A, we would consider the structural model

E[Y^{a,c=0} | L] = β0 + β1 a + β2 a L + β3 L (15)

The conditional mean outcomes E[Y^{a,c=0} | L = l] are equal to the corresponding mean outcomes in the uncensored treated, E[Y | A = 1, C = 0, L = l], under exchangeability, positivity, and well-defined interventions.

Therefore the parameters of the above structural model can be estimated via ordinary least squares by fitting the outcome regression model

E[Y | A, C = 0, L] = α0 + α1 A + α2 A L + α3 L (16)
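A minimal sketch of fitting a model of the form (16) by ordinary least squares (simulated, uncensored data; the variable names and coefficient values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000
L = rng.normal(size=n)
A = rng.binomial(1, 1 / (1 + np.exp(-L)))                       # treatment depends on L
Y = 1.0 + 2.0 * A + 0.5 * A * L + 1.5 * L + rng.normal(size=n)  # outcome model is correct

# design matrix for E[Y | A, L] = a0 + a1*A + a2*A*L + a3*L
X = np.column_stack([np.ones(n), A, A * L, L])
alpha, *_ = np.linalg.lstsq(X, Y, rcond=None)

# under exchangeability given L, alpha1 + alpha2 * l is the causal effect within level L = l
effect_at_mean_L = alpha[1] + alpha[2] * L.mean()
```

Because the outcome model is correctly specified here, the fitted coefficients recover the structural parameters; with a misspecified functional form they would not.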

8.2 Propensity scores

In the superpopulation, the propensity score balances the covariates between the treated and untreated. However, in the study population, due to sampling variability, the propensity score only approximately "balances" the covariates L.

Key result:

• Exchangeability of the treated and the untreated within levels of the covariates L implies exchangeability within levels of the propensity score p(L). That is, conditional exchangeability Y^a ⊥ A | L implies Y^a ⊥ A | p(L).

• Positivity within levels of the propensity score p(L), which means that no individual has a propensity score equal to either 1 or 0, holds if and only if positivity within levels of the covariates L holds.

8.3 Propensity stratification and standardization

Creating strata that contain individuals with similar, but not identical, values of p(L). The deciles of the estimated p(L) are a popular choice.


When our parametric assumptions for E[Y | A, C = 0, p(L)] are correct, plus exchangeability and positivity hold, the model estimates the average causal effects within all levels s of the propensity score: E[Y^{a=1,c=0} | p(L) = s] − E[Y^{a=0,c=0} | p(L) = s].

If we were interested in the average causal effect in the entire study population E[Y^{a=1,c=0}] − E[Y^{a=0,c=0}], we would standardize the conditional means E[Y | A, C = 0, p(L)] by using the distribution of the propensity score. The procedure is the same one described in Chapter 13 for continuous variables, except that we replace the variables L by the estimated p(L).
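Propensity stratification plus standardization over the strata can be sketched as follows (simulated data; the logistic fit uses a hand-rolled Newton-Raphson so the example stays self-contained, and all names and values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000
L = rng.normal(size=n)
A = rng.binomial(1, 1 / (1 + np.exp(-(0.2 + 0.8 * L))))   # treatment depends on L
Y = rng.normal(2.0 * A + 1.0 * L)                         # homogeneous effect of 2.0

# estimate the propensity score p(L) with a logistic model, fit by Newton-Raphson
X = np.column_stack([np.ones(n), L])
beta = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ beta))
    beta += np.linalg.solve((X * (p * (1 - p))[:, None]).T @ X, X.T @ (A - p))
ps = 1 / (1 + np.exp(-X @ beta))

# stratify on deciles of the estimated p(L), then standardize over the strata
deciles = np.quantile(ps, np.linspace(0, 1, 11))
strata = np.clip(np.digitize(ps, deciles[1:-1]), 0, 9)
effect = sum(
    (Y[(strata == s) & (A == 1)].mean() - Y[(strata == s) & (A == 0)].mean())
    * np.mean(strata == s)
    for s in range(10)
)
```

The standardized estimate is close to the true effect; the small residual bias comes from confounding within deciles, which shrinks as the strata are made finer.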

8.4 Propensity matching

Attempting to form a matched population in which the treated and the untreated are exchangeable because they have the same distribution of p(L).

A common approach is to match treated individuals with a value s of the estimated p(L) with untreated individuals who have a value s ± 0.05, or some other small difference.
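A sketch of nearest-neighbor matching with a 0.05 caliper (simulated data; for simplicity the true p(L) is assumed known and matching is done with replacement, both of which are simplifying assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10_000
L = rng.normal(size=n)
ps = 1 / (1 + np.exp(-L))            # propensity score, assumed known here
A = rng.binomial(1, ps)
Y = rng.normal(2.0 * A + 1.0 * L)    # homogeneous effect of 2.0

# for each treated individual, find the untreated individual with the closest p(L)
untreated = np.flatnonzero(A == 0)
order = np.argsort(ps[untreated])
u_ps = ps[untreated][order]          # sorted untreated propensity scores

diffs = []
for i in np.flatnonzero(A == 1):
    j = np.searchsorted(u_ps, ps[i])                 # insertion point among candidates
    j = min(max(j, 1), len(u_ps) - 1)
    k = j if abs(u_ps[j] - ps[i]) < abs(u_ps[j - 1] - ps[i]) else j - 1
    if abs(u_ps[k] - ps[i]) <= 0.05:                 # caliper: discard poor matches
        diffs.append(Y[i] - Y[untreated[order[k]]])

matched_effect = float(np.mean(diffs))
```

Because the matched untreated have nearly the same p(L) (and hence the same L distribution) as the treated, the mean matched difference approximates the effect in the treated.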

8.5 Propensity models, structural models, predictive models

Two different types of models for causal inference:

• Propensity models are models for the probability of treatment A given the variables L used to try to achieve conditional exchangeability. We have used propensity models for matching, stratification, IP weighting, and g-estimation.

The parameters of propensity models are nuisance parameters without a causal interpretation, because a variable L and treatment A may be associated for many reasons, not only because the variable L causes A.

• Structural models describe the relation between the treatment A and some component of the distribution of the counterfactual outcome Y^a, either marginally or within levels of the variables L. For a continuous treatment, a structural model is often referred to as a dose-response model. The parameters for treatment in structural models are not nuisance parameters: they have a direct causal interpretation as outcome differences under different treatment values a.

Propensity models do not need to predict treatment very well. They just need to include the variables L that guarantee exchangeability.


All causal inference methods based on models (propensity models and structural models) require no misspecification of the functional form of the covariates. To reduce the probability of model misspecification, we use flexible specifications.

9 Instrumental variable estimation

The causal inference methods described so far in this book rely on a key untestable assumption: all variables needed to adjust for confounding and selection bias have been identified and correctly measured.

9.1 The three instrumental conditions

Informally:

• (1) Z is associated with A.
• (2) Z does not affect Y except through its potential effect on A.
• (3) Z and Y do not share causes.

9.2 The usual IV estimand

We will focus on dichotomous instruments. The average causal effect of treatment on the additive scale, E[Y^{a=1}] − E[Y^{a=0}], is identified and equals

(E[Y | Z = 1] − E[Y | Z = 0]) / (E[A | Z = 1] − E[A | Z = 0]) (17)

which is the usual IV estimand for a dichotomous instrument.

For a continuous instrument Z, the usual IV estimand is Cov(Y, Z)/Cov(A, Z).

• The numerator of the IV estimand, the average causal effect of Z on Y, is the intention-to-treat effect.

• The denominator, the average causal effect of Z on A, is a measure of compliance with the assigned treatment.

Two-stage-least-squares estimator


• First, fit the first-stage treatment model E[A | Z] = α0 + α1 Z, and generate the predicted values Ê[A | Z] for each subject.

• Second, fit the second-stage outcome model E[Y | Z] = β0 + β1 Ê[A | Z].

The parameter estimate β1 will always be numerically equivalent to the standard IV estimate.
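This numerical equivalence can be checked directly (simulated data with an unmeasured confounder U and a randomized instrument Z; the effect is homogeneous by construction, so the IV estimand equals the average causal effect of 2.0):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000
U = rng.normal(size=n)                                        # unmeasured confounder
Z = rng.binomial(1, 0.5, n)                                   # randomized instrument
A = rng.binomial(1, np.clip(0.2 + 0.4 * Z + 0.1 * U, 0, 1))   # Z encourages treatment
Y = 2.0 * A + U + rng.normal(size=n)                          # homogeneous effect of 2.0

# usual IV (Wald) estimand for a dichotomous instrument
wald = (Y[Z == 1].mean() - Y[Z == 0].mean()) / (A[Z == 1].mean() - A[Z == 0].mean())

# two-stage least squares: regress A on Z, then Y on the fitted values of A
X1 = np.column_stack([np.ones(n), Z])
a_hat = X1 @ np.linalg.lstsq(X1, A, rcond=None)[0]            # first stage: E[A | Z]
X2 = np.column_stack([np.ones(n), a_hat])
beta = np.linalg.lstsq(X2, Y, rcond=None)[0]                  # second stage
```

Note that a naive regression of Y on A would be biased by U, while the ratio estimator and the 2SLS slope agree to machine precision and recover the true effect.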

A commonly used rule of thumb is to declare an instrument as weak if the F-statistic from the first-stage model is less than 10.

The trade-offs involved in the choice between two-stage-least-squares linear models and structural mean models are similar to those involved in the choice between outcome regression and structural nested mean models for non-IV estimation.

9.3 A fourth identifying condition: homogeneity

Effect homogeneity

• Requiring the effect of treatment A on outcome Y to be constant across individuals.

• Requiring the equality of the average causal effect within levels of Z in both the treated and the untreated, i.e., E[Y^{a=1} − Y^{a=0} | Z = 1, A = a] = E[Y^{a=1} − Y^{a=0} | Z = 0, A = a] for a = 0, 1.

Two approaches that bypass the homogeneity conditions:

• Introduction of baseline covariates in the models for IV estimation.
• Monotonicity, discussed in the next section.

9.4 An alternative fourth condition: monotonicity

Four disjoint subpopulations, often referred to as compliance types or principal strata:

• Always-takers: A^{z=1} = 1 and A^{z=0} = 1.
• Never-takers: A^{z=1} = 0 and A^{z=0} = 0.
• Compliers or cooperative: A^{z=1} = 1 and A^{z=0} = 0.
• Defiers or contrarians: A^{z=1} = 0 and A^{z=0} = 1.


When no defiers exist, we say there is monotonicity. Monotonicity holds when A^{z=1} ≥ A^{z=0}.

Under monotonicity, the usual IV estimand equals the average causal effect of treatment in the compliers, that is

E[Y^{a=1} − Y^{a=0} | A^{z=1} = 1, A^{z=0} = 0]. (18)

Under monotonicity, the usual IV estimand is the effect in the compliers.

Sketch of the proof: the equality between the usual IV estimand and the effect in the compliers.

The effect of assignment Z on Y, the numerator of the IV estimand, is a weighted average of the effect of Z in each of the four principal strata. However, the effect of Z on Y is exactly zero in always-takers and never-takers, because the effect of Z is entirely mediated through A and Z does not change the value of A in those strata; and no defiers exist under monotonicity. Therefore the numerator of the IV estimand equals the effect of Z on Y in the compliers times the proportion of compliers, and that proportion is precisely the denominator of the usual IV estimand.

In observational studies, the usual IV estimand can also be used to estimate the effect in the compliers in the absence of defiers.

Some criticism:

• The relevance of the effect in the compliers is questionable.
• Monotonicity is not always a reasonable assumption in observational studies.

The situation is even more complicated for the proxy instruments Z represented by Figures 16.2 and 16.3. The interpretation of the IV estimand as the effect in the compliers is questionable when the proposed dichotomous instrument is not causal, even if monotonicity held for the continuous causal instrument U_Z.

• The partitioning of the population into four subpopulations or principal strata may not be justifiable. In many realistic settings, the subpopulation of compliers is an ill-defined subset of the population.

Definition of monotonicity for a continuous causal instrument U_Z: A^{u_z} is a non-decreasing function of u_z on the support of U_Z.


9.5 The three instrumental conditions revisited

Discussion of the three instrumental conditions:

• Condition (i), a Z − A association, is empirically verifiable.

When the Z − A association is weak, the instrument is said to be weak. Three serious problems arise when the proposed instrument is weak.

Note: Instruments are guaranteed to be weak in the presence of strong confounding, because a strong A − U association leaves little residual variance for a strong A − U_Z, or A − Z, association.

– First, weak instruments yield effect estimates with wide 95% confidence intervals.

– Second, weak instruments amplify bias due to violations of conditions (ii) and (iii).

– Third, even in large samples, weak instruments introduce bias in the standard IV estimator and result in underestimation of its variance.

This problem is an example of finite sample bias.

That is, the effect estimate is in the wrong place and the width of the confidence interval around it is too narrow.

• Condition (ii), the absence of a direct effect of the instrument on the outcome, cannot be verified from the data.

Condition (ii) may be violated when a continuous or multivalued treatment A isreplaced in the analysis by a coarser version A∗.

In practice, many treatments are replaced by coarser versions for simplicity of interpretation. Coarsening of treatment is problematic for IV estimation, but not necessarily for the methods discussed in previous chapters.

• Condition (iii), no confounding for the effect of the instrument on the outcome, is also unverifiable.

Rather than making the unverifiable assumption that there is absolutely no confounding for the effect of Z on Y, we might feel more comfortable making the unverifiable assumption that there is no unmeasured confounding for the effect of Z on Y within levels of the measured pre-instrument covariates V.


Another strategy is to check for balanced distributions of the measured confounders across levels of the proposed instrument Z.

A violation of condition (iii) may occur even in the absence of confounding for the effect of Z on Y. Such exchangeability may be violated because of either confounding or selection bias.

9.6 Instrumental variable estimation versus other methods

Different aspects:

• First, IV estimation requires modeling assumptions even if infinite data were available in order to identify the average causal effect in the population. IV estimation cannot be nonparametric; models are required for identification.

Note: IV estimation is not the only method that requires modeling for identification of causal effects. Other econometric approaches like regression discontinuity analysis do too.

• Second, relatively minor violations of conditions (i)-(iv) for IV estimation may result in large biases of unpredictable or counterintuitive direction.

• Third, the ideal setting for the application of standard IV estimation is more restrictive than that for other methods.

Causal inference relies on transparency of assumptions and on triangulation of results from methods that depend on different sets of assumptions.

10 Causal survival analysis

10.1 Hazards and risks

An individual who does not develop the event of interest before the administrative end of follow-up has her survival time administratively censored.

But administrative censoring is not the only type of censoring that may occur in survival analysis.

Note: In a study with staggered entry, individuals may have different administrative censoring times, even when the administrative end of follow-up date is common to all.


Two common measures to accommodate administrative censoring:

• Survival probability: the probability of surviving through time k.

The survival probability Pr[T > k] is the proportion of individuals who survived through time k.

Alternatively, we can define the risk, or cumulative incidence, at k as one minus the survival: 1 − Pr[T > k] = Pr[T ≤ k].

In survival analyses, a natural approach to quantify the treatment effect is to contrast the survival (or risk) under each treatment level at some or all times t. Alternatively, we could contrast the risks, or cumulative incidences, rather than the survivals.

• Hazard

At any time k, we can also calculate the proportion of individuals who develop the event among those who had not developed it before k. This is the hazard Pr[T = k | T > k − 1].

The risk and the hazard are different measures. The denominator of the risk (the number of individuals at baseline) is constant across time k, and its numerator (all events between baseline and k) is cumulative. That is, the risk will stay flat or increase as k increases.

On the other hand, the denominator of the hazard (the number of individuals alive at k) varies over time, and its numerator includes only recent events (those during interval k). That is, the hazard may increase or decrease over time.

A frequent approach to quantify the treatment effect in survival analyses is to estimate the ratio of the hazards in the treated and the untreated, known as the hazard ratio.

Two problems:

– First, hazard ratios vary over time.

However, many published survival analyses report a single hazard ratio, which is usually the consequence of fitting a Cox proportional hazards model that assumes a constant hazard ratio by ignoring interactions with time. The reported hazard ratio is a weighted average of the time-specific hazard ratios, which makes it hard to interpret.


– Second, even if we presented the time-specific hazard ratios, their causal interpretation is not straightforward.

Note: Other effect measures that can be derived from survival curves are years of life lost and the restricted mean survival time.

10.2 From hazards to risks

Two main ways to arrange the analytic dataset:

• First, each row of the database corresponds to one person.
• Second, each row of the database corresponds to a person-time.

Time-varying indicator of event Dk.

Survival at k: Pr[Dk = 0] = Pr[T > k].

Risk at k: Pr[Dk = 1] = Pr[T ≤ k].

Hazard at k: Pr[Dk = 1|Dk−1 = 0].

For k = 1 the hazard is equal to the risk because everybody is, by definition, alive atk = 0.

The survival probability at k is the product of the conditional probabilities of having survived each interval between 0 and k.

Pr[D_k = 0] = ∏_{m=1}^{k} Pr[D_m = 0 | D_{m−1} = 0] (19)

The survival at k equals the product of one minus the hazard at all previous times.

The hazard at k, Pr[D_k = 1 | D_{k−1} = 0], can be estimated nonparametrically by dividing the number of cases during interval k by the number of individuals alive at the end of interval k − 1.

If we substitute this estimate into the above formula, the resulting nonparametric estimate of the survival Pr[D_k = 0] at k is referred to as the Kaplan-Meier estimator.
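A sketch of the estimator in the simple case without censoring, where the product of one minus the estimated hazards reproduces the empirical survival exactly (simulated discrete failure times with a constant true hazard of 0.2):

```python
import numpy as np

rng = np.random.default_rng(6)
n, K = 10_000, 10
T = rng.geometric(0.2, n)     # discrete failure times 1, 2, 3, ... with constant hazard 0.2

# hazard at k: cases during interval k / individuals alive at the end of interval k-1
hazard = np.array([(T == k).sum() / (T >= k).sum() for k in range(1, K + 1)])

# Kaplan-Meier: survival at k as the product of one minus the hazard at all previous times
km_survival = np.cumprod(1 - hazard)

# without censoring this telescopes to the plain empirical survival Pr[T > k]
empirical = np.array([(T > k).mean() for k in range(1, K + 1)])
```

With censoring, the empirical proportion is no longer available, but the product-of-hazards construction still applies because each hazard term uses only the individuals still at risk.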

Typically the number of cases during each interval is low (or even zero) and thus these nonparametric hazard estimates will be very unstable. Even so, the Kaplan-Meier estimator remains an excellent estimator of the survival curve, provided the total number of failures over the follow-up period is reasonably large. In contrast, if our interest is in estimation of the hazard at a particular k, smoothing via a parametric model may be required.

Fitting a logistic regression model for Pr[D_{k+1} = 1 | D_k = 0] to parametrically estimate the hazards.

Note: Functions other than the logit can also be used to model dichotomous outcomes and therefore to estimate hazards.

logit Pr[D_{k+1} = 1 | D_k = 0, A] = θ_{0,k} + θ1 A + θ2 A × k + θ3 A × k² (20)

θ_{0,k} = θ0 + θ4 k + θ5 k² (21)

We then compute estimates of the survival Pr[D_{k+1} = 0 | A = a] by multiplying the estimates of Pr[D_{k+1} = 0 | D_k = 0, A = a] provided by the logistic model, separately for the treated and the untreated.
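The pooled logistic approach can be sketched on a person-time dataset (simulated data; hazards are constant within each arm by construction, so a simplified hazard model with just an intercept and A suffices and the time terms are omitted):

```python
import numpy as np

rng = np.random.default_rng(7)
n, K = 5_000, 12
A = rng.binomial(1, 0.5, n)
haz_true = np.where(A == 1, 0.05, 0.10)        # constant discrete-time hazards per arm
T = rng.geometric(haz_true)                    # failure times 1, 2, 3, ...

# expand to person-time: one row per person per interval at risk, up to K
rows = []
for i in range(n):
    for k in range(1, min(T[i], K) + 1):
        rows.append((A[i], 1 if T[i] == k else 0))   # (treatment, event indicator D_k)
data = np.array(rows, dtype=float)

# pooled logistic model logit Pr[D_k=1 | D_{k-1}=0, A] = theta0 + theta1*A, via Newton-Raphson
X = np.column_stack([np.ones(len(data)), data[:, 0]])
D = data[:, 1]
theta = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ theta))
    theta += np.linalg.solve((X * (p * (1 - p))[:, None]).T @ X, X.T @ (D - p))

# survival curves: multiply one minus the model-based hazard over time, per arm
surv = {a: np.cumprod([1 - 1 / (1 + np.exp(-(theta[0] + theta[1] * a)))] * K)
        for a in (0, 1)}
```

In a real analysis the person-time rows would also carry the functions of k from the model above, so the fitted hazards, and therefore the survival curves, could vary over time.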


Part III

Causal inference from complex longitudinal data


Other sections

A Fine Points

A.1 8.1 Selection bias in case-control studies

Selection bias in case-control studies and incidence-prevalence bias can be representedby Figure 8.1.

A.2 8.2 The strength and direction of selection bias

Direction:

• Negatively associated: having either A = 1 or E = 1 is sufficient and necessary to cause death (an "or" mechanism), but neither A nor E causes death in the absence of U.

• Positively associated: having both A = 1 and E = 1 is sufficient and necessary to cause death (an "and" mechanism), and neither A nor E causes death in the absence of U.

Magnitude:

A large selection bias requires strong associations between the collider and both treatment and outcome (collider-stratification bias).

A.3 9.1 The strength and direction of measurement bias

Note: A notable exception is the setting in which A and Y are unassociated and the measurement error is independent and nondifferential.

• Direction: either further from or closer to the null than the A − Y association.
• Magnitude: measurement bias generally increases with the strength of the arrows from U_A to A* and from U_Y to Y*.


A.4 9.2 Pseudo-intention-to-treat analysis

The ITT effect can only be computed in the absence of loss to follow-up or other forms of censoring. In the presence of loss to follow-up or other forms of censoring, the analysis of randomized experiments requires appropriate adjustment for selection bias in order to compute the ITT effect. Thus, we compute a pseudo-ITT effect.

A.5 9.3 Effectiveness versus efficacy

Some authors refer to the per-protocol effect as the treatment's "efficacy", and to the ITT effect as the treatment's "effectiveness".

• Effectiveness: Effect of assigning treatment Z in a setting in which the interventions under study will not be optimally implemented, typically because a fraction of study subjects will not comply.

• Efficacy: Average causal effect of treatment A in an ideal randomized experi-ment.

Problems with the view that "effectiveness" is the truly interesting effect measure:

• First, the actual adherence in real life may be different.
• Second, in real life, both doctors and patients are aware of the received treatment.

• Third, individual patients will be more interested in the per-protocol effect (the "efficacy" of treatment) than in the ITT effect.

A.6 10.1 Honest confidence intervals

We say a large-sample valid 95% confidence interval is uniform or honest if there exists a sample size n at which the interval is guaranteed to cover the true parameter value at least 95% of the time, whatever the value of the true parameter. By definition, any small-sample valid confidence interval is uniform or honest for all n for which the interval is defined.


A.7 10.2 Quantitative bias analysis

Most discussions revolve around informal judgements about the potential direction and magnitude of the systematic bias. Some authors argue that quantitative methods need to be used to produce intervals around the effect estimate that integrate random and systematic sources of uncertainty. These methods are referred to as quantitative bias analysis.

A.8 11.1 Model dimensionality and the relation between frequentist and Bayesian intervals

Confidence intervals used in frequentist statistical inference.

Credible intervals used in Bayesian statistical inference.

A Bayesian 95% credible interval means that, given the observed data, "there is a 95% probability that the estimand is in the interval". In Bayesian inference, probability is defined as degree-of-belief, a concept very different from probability as frequency.

For low-dimensional parametric models with large sample sizes, 95% Bayesian credible intervals are also 95% frequentist confidence intervals, but in high-dimensional or nonparametric models, a Bayesian 95% credible interval may not be a 95% confidence interval, as it may trap the estimand much less than 95% of the time.

A.9 12.1 Setting a bad example

We selected individuals into our study conditional on an event that occurred after the start of the treatment. If treatment affects the probability of selection into the study, we might have selection bias as described in Chapter 8.

In this fine point, the missing data concern the treatment itself.

A.10 12.2 Checking positivity

Two possible ways in which positivity can be violated:


• Structural violations: The structure of the problem guarantees that the probability of treatment conditional on being off work is exactly 0 (a structural zero). We'll always find zero cells when conditioning on that confounder.

• Random violations: Our sample is finite so, if we stratify on several confounders, we will start finding zero cells at some places even if the probability of treatment is not really zero in the target population.

In the presence of structural violations, causal inferences cannot be made about the entire population using IP weighting or standardization. The inference needs to be restricted to strata in which structural positivity holds.

In the presence of random violations, we use our parametric model to estimate the probability of treatment in the strata with random zeroes using data from individuals in other strata. In other words, we use parametric models to smooth over the zeroes.

A.11 13.1 Structural positivity

Positivity is also necessary for standardization because, when Pr[A = a | L = l] = 0 and Pr[L = l] ≠ 0, the conditional mean outcome E[Y | A = a, L = l] is undefined.

The practical impact of deviations from positivity may vary greatly between IP weighted and standardized estimates that rely on parametric models. When using standardization, one can ignore the lack of positivity if one is willing to rely on parametric extrapolation. That is, one can fit a model for E[Y | A, L] that will smooth over the strata with structural zeroes. This smoothing will introduce bias into the estimation, and therefore the nominal 95% confidence intervals around the estimates will cover the true effect less than 95% of the time.

In general, in the presence of violations or near-violations of positivity, the standard error of the treatment effect will be smaller for standardization than for IP weighting.

A.12 14.2 Sensitivity analysis for unmeasured confounding

G-estimation relies on the fact that α1 = 0 if conditional exchangeability given L holds. Now consider a setting in which conditional exchangeability does not hold.


The value of α1 quantifies the magnitude of nonexchangeability.

Robins, Rotnitzky, and Scharfstein (1999) provide technical details on sensitivity anal-ysis for unmeasured confounding using g-estimation.

A.13 15.1 Nuisance parameters

E[Y^{a,c=0} | L] = β0 + β1 a + β2 a L + β3 L (22)

β1 and β2 are the causal parameters.

β0 and β3 are the nuisance parameters.

Deciding what method to use boils down to deciding which nuisance parameters (those in the outcome model or in the treatment model) we believe can be more accurately estimated.

A.14 15.2 Effect modification and the propensity score

Why matched estimates may differ from the overall effect estimate:

• Effect modification.
• Violations of positivity in the non-matched population.
• An unmeasured confounder that is more/less prevalent in the matched population, etc.

Effect modification might be explained by differences in residual confounding across propensity strata.

A.15 16.1 Candidate instruments in observational studies

Three commonly used categories of candidate instruments are:

• Genetic factors: The proposed instrument is a genetic variant Z that is associated with treatment A and that, supposedly, is only related to the outcome Y through A.


• Preference: The proposed instrument Z is a measure of the physician's preference for one treatment over the other. The idea is that a physician's preference influences the prescribed treatment A without having a direct effect on the outcome Y.

• Access: The proposed instrument Z is a measure of access to the treatment. The idea is that access impacts the use of treatment A but does not directly affect the outcome Y.

A.16 17.1 Competing events

Consider five strategies to handle truncation by death:

See page 72 for specific details.

None of these strategies solves the problem of truncation by death satisfactorily. Truncation by competing events raises logical questions about the meaning of the causal estimand that cannot be bypassed by statistical techniques.

A.17 17.2 Models for survival analysis

• Nonparametric approaches to survival analysis, like constructing Kaplan-Meier curves, make no assumptions about the distribution of the unobserved failure times due to administrative censoring.

• Parametric models for survival analysis assume a particular statistical distribution (e.g., exponential, Weibull) for the failure times or hazards.

• Other models for survival analysis, like the Cox proportional hazards model and the accelerated failure time (AFT) model, do not assume a particular distribution for the failure times or hazards. In particular, these models are agnostic about the shape of the hazard when all covariates in the model have value zero (often referred to as the baseline hazard). These models, however, impose a priori restrictions on the relation between the baseline hazard and the hazard under other combinations of covariate values. As a result, these methods are referred to as semiparametric methods.


B Technical Points

B.1 8.1 The built-in selection bias of hazard ratios

I do not understand this point yet.

B.2 8.2Multiplicative survival model

Definition 1: When the conditional probability of survival Pr[Y = 0|E = e, A = a] given A and E is equal to a product g(e)h(a) of functions of e and a, we say that a multiplicative survival model holds:

Pr[Y = 0|E = e, A = a] = g(e)h(a). (23)

Definition 2: The survival ratio Pr[Y = 0|E = e, A = a]/Pr[Y = 0|E = e, A = 0] does not depend on e and is equal to h(a).

The data follow a multiplicative survival model when there is no interaction between A and E on the multiplicative scale, as depicted in Figure 8.13.

If Pr[Y = 0|E = e, A = a] = g(e)h(a), then Pr[Y = 1|E = e, A = a] = 1 − g(e)h(a) does not follow a multiplicative mortality model. Hence, when A and E are conditionally independent given Y = 0, they will be conditionally dependent given Y = 1.

B.3 9.1 Independence and nondifferentiality

Let f(·) denote a probability density function (PDF).

• Independent: UA and UY's joint PDF equals the product of their marginal PDFs.

• Nondifferential: UA's PDF is independent of the outcome Y, and UY's PDF is independent of the treatment A.

B.4 9.2 The exclusion restriction

We say that the exclusion restriction holds when Y^{z=0,a} = Y^{z=1,a} for all subjects, all values a, and the value of A observed for each subject. That is, there is no direct arrow from Z to Y.

B.5 10.1 Bias and consistency in statistical inference

Systematic bias precludes both consistency and exact unbiasedness of an estimator. That is, in reality, our actual interval will generally be anti-conservative.

Most researchers will declare an estimator unbiased only if it can center a valid Wald confidence interval. As argued by Robins, this definition of bias is essentially equivalent to the definition of uniform asymptotic unbiasedness, because in general only uniformly asymptotically unbiased estimators can center a valid Wald interval. All inconsistent estimators (such as those resulting from unknown systematic bias), and some consistent estimators, are biased under this definition, which is the one we use in the main text.

B.6 10.2 A formal statement of the conditionality principle

The likelihood for the observed data has three factors:

• the density of Y given A and L,
• the density of A given L,
• the marginal density of L.

The conditionality principle states that one should always perform inference on the parameter of interest conditional on any ancillary statistics.

B.7 10.3 Comparison between adjusted and unadjusted estimators

The MLE is only guaranteed to be more efficient than the marginal estimator when the ratio of the number of subjects to the number of parameters is large.

Note that the marginal estimator uses prior information not used by the conditional estimator.


B.8 11.1 A taxonomy of commonly used models

Generalized Linear Models have three components:

• a linear functional form ∑_{i=0}^p θ_i X_i,
• a link function g{·} such that g{E[Y|X]} = ∑_{i=0}^p θ_i X_i,
• a distribution for Y conditional on X.

Without the distribution for Y conditional on X, we refer to the model as a conditional mean model.

• Conditional mean models for outcomes with strictly positive values often use the log link function to ensure that all predicted values are greater than zero:

log{E[Y|X]} = ∑_{i=0}^p θ_i X_i. (24)

• Conditional mean models for dichotomous outcomes often use a logit link:

log{ E[Y|X] / (1 − E[Y|X]) } = ∑_{i=0}^p θ_i X_i. (25)

• We can estimate θ by maximum likelihood under a normal model for the identity link, a Poisson model for the log link, or a logistic regression model for the logit link.

Generalized estimating equation models, often used to deal with repeated measures, are a further example of a conditional mean model.
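To make the link functions concrete, here is a small sketch of my own (not from the book) of the three conditional mean forms using plain NumPy; the coefficient vector `theta` and the design matrix `X` (with an intercept column) are made-up illustrations:

```python
import numpy as np

def linear_predictor(theta, X):
    """Compute sum_{i=0}^p theta_i * X_i for each row of X (X includes the intercept column)."""
    return X @ theta

def mean_identity(theta, X):
    # identity link: E[Y|X] equals the linear predictor itself
    return linear_predictor(theta, X)

def mean_log(theta, X):
    # log link: E[Y|X] = exp(linear predictor), so every prediction is > 0
    return np.exp(linear_predictor(theta, X))

def mean_logit(theta, X):
    # logit link: E[Y|X] = expit(linear predictor), so every prediction is in (0, 1)
    lp = linear_predictor(theta, X)
    return 1.0 / (1.0 + np.exp(-lp))

# hypothetical coefficients and two covariate patterns
theta = np.array([0.5, -1.0])
X = np.array([[1.0, 0.0],
              [1.0, 1.0]])
```

Whatever the link, the fitted predictions stay in the range that makes sense for the outcome, which is exactly why the log and logit links are chosen for positive and dichotomous outcomes.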

A kernel regression model does not impose a specific functional form on E[Y|X] but rather estimates E[Y|X = x] for any x by

∑_{i=1}^n ω_h(x − X_i) Y_i / ∑_{i=1}^n ω_h(x − X_i),

where ω_h(z) is a positive function, known as a kernel function, that attains its maximum value at z = 0 and decreases to 0 as |z| gets large at a rate that depends on the parameter h subscripting ω.

Generalized additive models replace the linear combination ∑_{i=0}^p θ_i X_i of a conditional mean model by a sum of smooth functions ∑_{i=0}^p f_i(X_i). The model can be estimated using a backfitting algorithm with f_i(·) estimated at iteration k by kernel regression.

In the text we discuss smoothing with parametric models, which specify an a priori functional form for E[Y|X = x]. In estimating E[Y|X = x], we may borrow information from values of X that are far from x. In contrast, kernel regression models do not specify an a priori functional form; they only borrow information from values of X near to x when estimating E[Y|X = x]. A kernel regression model is an example of a "non-parametric" regression model. This use of the term "non-parametric" differs from our previous usage. Our non-parametric estimator of E[Y|X = x] only used those subjects for whom X equalled x exactly; no information was borrowed even from close neighbors. Here "nonparametric" estimators of E[Y|X = x] use subjects with values of X near to x. How near is controlled by a smoothing parameter referred to as the bandwidth h. Our nonparametric estimators correspond to taking h = 0.
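The role of the bandwidth can be illustrated with a tiny Nadaraya-Watson estimator (my own sketch; the Gaussian kernel and the toy data are arbitrary choices): a very small h reproduces the h = 0 nonparametric estimator at observed values of X, while a very large h collapses toward the overall mean of Y.

```python
import numpy as np

def gaussian_kernel(z, h):
    """One common choice of omega_h: peaks at z = 0 and decays as |z| grows; h is the bandwidth."""
    return np.exp(-0.5 * (z / h) ** 2)

def kernel_regression(x, X, Y, h):
    """Nadaraya-Watson estimate of E[Y | X = x]: a weighted average of the Y_i,
    with weights omega_h(x - X_i)."""
    w = gaussian_kernel(x - X, h)
    return np.sum(w * Y) / np.sum(w)

# toy data, Y = X^2
X = np.array([0.0, 1.0, 2.0, 3.0])
Y = np.array([0.0, 1.0, 4.0, 9.0])
```

With `h = 0.01`, `kernel_regression(1.0, X, Y, 0.01)` effectively uses only the subject with X = 1, mimicking h = 0; with a huge bandwidth it approaches `np.mean(Y)`.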

B.9 12.1 Horvitz-Thompson estimators

The original Horvitz-Thompson (1952) estimator is

Ê[I(A = a)Y / f(A|L)], (26)

and the modified Horvitz-Thompson estimator is the ratio

Ê[I(A = a)Y / f(A|L)] / Ê[I(A = a) / f(A|L)], (27)

where Ê denotes the sample average.
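A small numerical sketch of my own of the two estimators, where `pA` plays the role of f(A|L), the probability of the treatment actually received given L:

```python
import numpy as np

def ht_original(Y, A, pA, a=1):
    """Original Horvitz-Thompson estimator: the sample mean of I(A=a) * Y / f(A|L)."""
    ind = (A == a).astype(float)
    return np.mean(ind * Y / pA)

def ht_modified(Y, A, pA, a=1):
    """Modified Horvitz-Thompson estimator: a ratio of IP weighted sums. The weights
    need not average to exactly 1 in a finite sample, so this version is often more stable."""
    ind = (A == a).astype(float)
    return np.sum(ind * Y / pA) / np.sum(ind / pA)

# toy data: pA[i] holds f(A_i | L_i) for the treatment subject i actually received
A = np.array([1, 1, 0, 0])
Y = np.array([3.0, 1.0, 2.0, 0.0])
pA = np.array([0.5, 0.25, 0.5, 0.75])
```

Here `ht_original(Y, A, pA, a=1)` gives (3/0.5 + 1/0.25)/4 = 2.5, while `ht_modified` divides the same weighted sum by the realized sum of weights, 10/6 ≈ 1.67; the two agree only when the weights happen to average to 1.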


B.10 12.2 More on stabilized weights

The IP weighted mean with weights g[A]/f[A|L] is equal to the counterfactual mean E[Y^a]:

E[I(A = a) / f[A|L]] = 1, (28)

E[I(A = a)Y / f[A|L]] = E[Y^a], (29)

E[I(A = a)Y / f[A|L]] / E[I(A = a) / f[A|L]] = E[Y^a], (30)

E[(I(A = a)Y / f[A|L]) g(A)] / E[(I(A = a) / f[A|L]) g(A)] = E[Y^a], (31)

E[(I(A = a)Y / f[A|L]) g(A)] = E[Y^a] g(a), (32)

E[(I(A = a) / f[A|L]) g(A)] = g(a). (33)

B.11 13.1 Bootstrapping

The bootstrap is an alternative method for estimating standard errors and computing 95% confidence intervals.

The bootstrap is a general method for large samples. We could also have used it to compute a 95% confidence interval for the IP weighted estimates from marginal structural models in the previous chapter.
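A minimal nonparametric bootstrap for a percentile confidence interval might look like this (my sketch; `estimator` can be any statistic, here the sample mean):

```python
import numpy as np

def bootstrap_ci(data, estimator, n_boot=2000, alpha=0.05, seed=0):
    """Nonparametric bootstrap: resample the data with replacement, re-compute
    the estimator on each resample, and take percentiles of the bootstrap
    distribution as the confidence limits."""
    rng = np.random.default_rng(seed)
    n = len(data)
    boots = np.array([estimator(data[rng.integers(0, n, n)]) for _ in range(n_boot)])
    return np.quantile(boots, [alpha / 2, 1 - alpha / 2])

data = np.arange(100, dtype=float)  # toy sample with mean 49.5
lo, hi = bootstrap_ci(data, np.mean)
```

The interval (lo, hi) brackets the sample mean 49.5; for an IP weighted estimate one would pass the whole weighted-estimation procedure as `estimator` and resample subjects, not residuals.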

B.12 13.2 Doubly robust methods

A doubly robust method is one that requires a correct model for either the treatment A or the outcome Y, but not necessarily both.

An example:

• estimating IP weights,
• fitting a model for E[Y|A = a, C = 0, L = l, D],
• using the predicted values from the model to obtain the standardized mean outcome under A = 1 and A = 0.

Under exchangeability and positivity given L, this estimator consistently estimates the average causal effect if either the model for the treatment or the model for the outcome is correct.
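The book's construction plugs the IP weights into the outcome model; a closely related doubly robust estimator, the augmented IP weighted (AIPW) mean, can be sketched as follows (my own illustration, with `pA1` and `mu1` standing for fitted propensity scores and outcome predictions):

```python
import numpy as np

def aipw_mean(Y, A, pA1, mu1):
    """Augmented IP weighted (AIPW) estimate of E[Y^{a=1}]: the outcome-model
    prediction mu1 = E-hat[Y | A=1, L] is corrected by an IP weighted residual
    term. The estimate is consistent if either pA1 (the treatment model) or
    mu1 (the outcome model) is correctly specified."""
    ind = (A == 1).astype(float)
    return np.mean(mu1 + ind / pA1 * (Y - mu1))

# toy inputs: pA1 are hypothetical fitted Pr[A=1|L], mu1 hypothetical fitted E[Y|A=1,L]
A = np.array([1, 1, 0])
Y = np.array([2.0, 4.0, 0.0])
pA1 = np.array([0.5, 0.5, 0.5])
mu1 = np.array([2.0, 4.0, 3.0])
```

When the outcome model fits the treated exactly, as in this toy example, the residual correction vanishes and the estimate is just the standardized mean of the `mu1` predictions, (2 + 4 + 3)/3 = 3.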

B.13 14.1 Relation between marginal structural models and structural nested models

A marginal structural mean model for the average outcome under treatment level a within levels of the binary covariate V, a component of L:

E[Y^a|V] = β_0 + β_1 a + β_2 aV + β_3 V. (34)

Rewriting the model as E[Y^a|V] = E[Y^{a=0}|V] + β_1 a + β_2 aV, we obtain

E[Y^a − Y^{a=0}|V] = β_1 a + β_2 aV, (35)

which is referred to as a semiparametric marginal structural mean model.

In settings without time-varying treatments, structural nested models are identical to semiparametric marginal structural mean models that leave the mean counterfactual outcomes under no treatment unspecified.

B.14 14.1 Multiplicative structural nested mean models

An example:

log( E[Y^a|A = a, L] / E[Y^{a=0}|A = a, L] ) = β_1 a + β_2 aL, (36)

which can be fit by g-estimation with H(ϕ†) defined to be Y exp[−ϕ†_1 a − ϕ†_2 aL]. The above multiplicative model can also be used for binary (0, 1) outcome variables as long as the probability of Y = 1 is small in all strata of L. Otherwise, the model might predict probabilities greater than 1.

If the probability is not small, one can consider a structural nested logistic model for a dichotomous outcome Y such as

logit Pr[Y^a = 1|A = a, L] − logit Pr[Y^{a=0} = 1|A = a, L] = β_1 a + β_2 aL. (37)


Unfortunately, the structural nested logistic model does not generalize easily to time-varying treatments, and its parameters cannot be estimated using the g-estimation algorithm described in the text.

B.15 14.2 G-estimation of structural nested mean models

Our estimate of β_1 is the value of ϕ† that minimizes the association between H(ϕ†) and A. When we base our g-estimate on the score test, this procedure is equivalent to finding the parameter value ϕ† that solves the estimating equation

∑_{i=1}^N I[C_i = 0] W_i^C H_i(ϕ†) (A_i − E[A|L_i]) = 0. (38)

Using the fact that H_i(ϕ†) = Y_i − ϕ† A_i, we obtain

ϕ̂_1 = ∑_{i=1}^N I[C_i = 0] W_i^C Y_i (A_i − E[A|L_i]) / ∑_{i=1}^N I[C_i = 0] W_i^C A_i (A_i − E[A|L_i]). (39)

The choice of the function affects the statistical efficiency of the estimator, but not its consistency.

Can we further increase efficiency by replacing H_i(ϕ†) with a nonlinear function?

• No, for a non-rank-preserving model.
• Yes, for a rank-preserving model.

Replacing H(ϕ†) by H(ϕ†) − E[H(ϕ†)|L] in the estimating equation, we obtain a consistent estimator of ϕ if either (1) the model for E[H(ϕ†)|L] or (2) both the models for E[A|L] and Pr[C = 1|A, L] are correct, without knowing which of (1) or (2) is correct. We refer to such an estimator as being doubly robust.
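The closed form of equation (39) can be coded directly; this sketch of mine takes the fitted values E[A|L], the censoring weights W^C, and the censoring indicators as given inputs:

```python
import numpy as np

def g_estimate(Y, A, EA_L, W=None, uncensored=None):
    """Closed-form g-estimate for the one-parameter model H(phi) = Y - phi*A:
    a ratio of (censoring-weighted) cross-products of Y, resp. A, with the
    treatment residual A - E[A|L] (equation (39))."""
    Y, A = np.asarray(Y, float), np.asarray(A, float)
    W = np.ones_like(Y) if W is None else np.asarray(W, float)
    unc = np.ones_like(Y) if uncensored is None else np.asarray(uncensored, float)
    resid = A - EA_L
    return np.sum(unc * W * Y * resid) / np.sum(unc * W * A * resid)

# toy data constructed so that Y = Y0 + 2*A with A unassociated with Y0 in-sample
A = np.array([0, 1, 0, 1])
Y = np.array([1.0, 3.0, 3.0, 5.0])
phi_hat = g_estimate(Y, A, EA_L=0.5)
```

Because the residual A − E[A|L] is exactly uncorrelated with Y0 in this toy sample, the estimator recovers the true effect, phi_hat = 2.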

B.16 15.1 Balancing scores and prognostic scores

A balancing score b(L) is any function of the covariates L such that A ⊥ L|b(L).

Exchangeability and positivity based on the variable L imply exchangeability and positivity based on a balancing score b(L).


B.17 16.1 The instrumental conditions, formally

Instrumental conditions:

• Relevance condition: a non-null association between Z and A.
• Exclusion restriction: no direct effect of Z on Y. Both versions of the exclusion restriction are trivially true for proxy instruments.
• Marginal exchangeability: Y^{a,z} ⨿ Z for all a, z. Both versions of marginal exchangeability are expected to hold in randomized experiments in which the instrument Z is the randomized assignment.

B.18 16.2 Bounds: Partial identification of causal effects

The width of the bounds will vary depending on the chosen values.

The bounds for Pr[Y^{a=1} = 1] − Pr[Y^{a=0} = 1] can be further narrowed when there exists a variable Z that meets instrumental condition (2) at the population level and marginal exchangeability (3).

Unfortunately, all these partial identification methods are often relatively uninformative because the bounds are wide.

There is a way to decrease the width of the bounds: making parametric assumptions about the form of the effect of A on Y. Under the sufficiently strong assumptions described in Section 16.2, the upper and lower bounds converge to a single number and the average causal effect is point identified.

B.19 16.3 Additive structural mean models and IV estimation

Additive mean model for a dichotomous treatment A and an instrument Z:

E[Y^{a=1} − Y^{a=0}|A = 1, Z] = β_0 + β_1 Z, (40)

E[Y − Y^{a=0}|A, Z] = A(β_0 + β_1 Z). (41)

β_0 is the average causal effect of treatment among the treated subjects with Z = 0.


β_0 + β_1 is the average causal effect of treatment among the treated subjects with Z = 1.

If we assume a priori that there is no additive effect modification by Z, then β_1 = 0 and β_0 is exactly the usual IV estimand.

Proof: By exchangeability,

E[Y^{a=0}|Z = 1] = E[Y^{a=0}|Z = 0], (42)

that is, substituting Y^{a=0} = Y − A(β_0 + β_1 Z),

E[Y − A(β_0 + β_1)|Z = 1] = E[Y − Aβ_0|Z = 0]. (43)

Since β_1 = 0, solving for β_0 gives

β_0 = (E[Y|Z = 1] − E[Y|Z = 0]) / (E[A|Z = 1] − E[A|Z = 0]). (44)

An instrument alone is insufficient to identify the average causal effect.

Under the additional assumption β_1 = 0, β_0 = E[Y^{a=1} − Y^{a=0}|A = 1, Z = z] = E[Y^{a=1} − Y^{a=0}|A = 1] for any z is the average causal effect of treatment in the treated, but not generally the average causal effect in the study population, E[Y^{a=1}] − E[Y^{a=0}]. In order to conclude that β_0 = E[Y^{a=1}] − E[Y^{a=0}], and thus that E[Y^{a=1}] − E[Y^{a=0}] is the usual IV estimand, we must assume that the effects of treatment in the treated and in the untreated are identical, which is an additional untestable assumption.
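Under these assumptions the usual IV (Wald) estimand of equation (44) is just a ratio of two mean differences; a sketch with made-up data:

```python
import numpy as np

def wald_iv(Y, A, Z):
    """Usual IV (Wald) estimand: the Z-Y mean difference divided by the
    Z-A mean difference (equation (44))."""
    Y, A, Z = (np.asarray(v, float) for v in (Y, A, Z))
    num = Y[Z == 1].mean() - Y[Z == 0].mean()
    den = A[Z == 1].mean() - A[Z == 0].mean()
    return num / den

# made-up data: E[A|Z=1] = 0.75, E[A|Z=0] = 0.25, E[Y|Z=1] - E[Y|Z=0] = 1
Z = np.array([1, 1, 1, 1, 0, 0, 0, 0])
A = np.array([1, 1, 1, 0, 1, 0, 0, 0])
Y = np.array([3.0, 3.0, 3.0, 3.0, 2.0, 2.0, 2.0, 2.0])
```

Here the outcome difference of 1 is scaled by the treatment difference of 0.5, giving an IV estimate of 2; the weaker the instrument (the smaller the denominator), the more the estimate amplifies any small numerator bias.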

B.20 16.4 Multiplicative structural mean models and IV estimation

Multiplicative structural mean model for a dichotomous treatment A and an instrument Z:

E[Y^{a=1}|A = 1, Z] / E[Y^{a=0}|A = 1, Z] = exp(β_0 + β_1 Z), (45)

E[Y|A, Z] = E[Y^{a=0}|A, Z] exp[A(β_0 + β_1 Z)]. (46)

exp(β_0) is the causal risk ratio in the treated subjects with Z = 0, and exp(β_0 + β_1) is the causal risk ratio in the treated subjects with Z = 1.


If we assume a priori that β_1 = 0, then the average causal effect on the multiplicative (risk ratio) scale is E[Y^{a=1}]/E[Y^{a=0}] = exp(β_0), and the average causal effect on the additive (risk difference) scale is

E[Y^{a=1}] − E[Y^{a=0}] = E[Y|A = 0](1 − E[A])[exp(β_0) − 1] + E[Y|A = 1]E[A][1 − exp(−β_0)]. (47)

If we assume a multiplicative structural mean model with no multiplicative effect modification by Z in the treated and in the untreated, then the average causal effect E[Y^{a=1}] − E[Y^{a=0}] remains identified, but we can no longer also assume no additive effect modification by Z.

Unfortunately, it is not possible to determine which, if either, assumption is true even if we had an infinite sample size.

B.21 16.5 More general structural mean models

Additive structural mean model:

E[Y − Y^{a=0}|Z, A, V] = γ(Z, A, V, ψ⋆). (48)

The parameters of this model can be identified via g-estimation under the conditional counterfactual mean independence assumption

E[Y^{a=0}|Z = 1, V] = E[Y^{a=0}|Z = 0, V]. (49)

Multiplicative structural mean model:

E[Y|Z, A, V] = E[Y^{a=0}|Z, A, V] exp[γ(Z, A, V, ψ⋆)]. (50)

The parameters of this model can be identified via g-estimation under analogous conditions.

G-estimation of nested additive and multiplicative structural mean models can extend IV methods for time-fixed treatments and confounders to settings with time-varying treatments and confounders.

B.22 16.6 Monotonicity and the effect in the compliers

The usual IV estimand equals the average causal effect in the compliers, E[Y^{a=1} − Y^{a=0}|A^{z=1} − A^{z=0} = 1], under monotonicity, that is, when no defiers exist.


The proof:

E[Y^{z=1} − Y^{z=0}] = E[Y^{z=1} − Y^{z=0}|A^{z=1} = 1, A^{z=0} = 1] Pr[A^{z=1} = 1, A^{z=0} = 1] (51)
+ E[Y^{z=1} − Y^{z=0}|A^{z=1} = 0, A^{z=0} = 0] Pr[A^{z=1} = 0, A^{z=0} = 0] (52)
+ E[Y^{z=1} − Y^{z=0}|A^{z=1} = 1, A^{z=0} = 0] Pr[A^{z=1} = 1, A^{z=0} = 0] (53)
+ E[Y^{z=1} − Y^{z=0}|A^{z=1} = 0, A^{z=0} = 1] Pr[A^{z=1} = 0, A^{z=0} = 1]. (54)

The intention-to-treat effect is zero in both the always-takers and the never-takers, and assuming that no defiers exist, the above simplifies to:

E[Y^{z=1} − Y^{z=0}] = E[Y^{z=1} − Y^{z=0}|A^{z=1} = 1, A^{z=0} = 0] Pr[A^{z=1} = 1, A^{z=0} = 0]. (55)

In the compliers,

E[Y^{z=1} − Y^{z=0}|A^{z=1} = 1, A^{z=0} = 0] = E[Y^{a=1} − Y^{a=0}|A^{z=1} = 1, A^{z=0} = 0]. (56)

The effect in the compliers is therefore

E[Y^{a=1} − Y^{a=0}|A^{z=1} = 1, A^{z=0} = 0] = E[Y^{z=1} − Y^{z=0}] / Pr[A^{z=1} = 1, A^{z=0} = 0], (57)

which is the usual IV estimand if we assume that Z is randomly assigned:

E[Y^{z=1} − Y^{z=0}] = E[Y|Z = 1] − E[Y|Z = 0], (58)
Pr[A^{z=1} − A^{z=0} = 1] = Pr[A = 1|Z = 1] − Pr[A = 1|Z = 0]. (59)

By consistency,

Pr[A^{z=0} = 1] = Pr[A = 1|Z = 0], (60)
Pr[A^{z=1} = 0] = Pr[A = 0|Z = 1]. (61)

Under monotonicity:

Pr[A^{z=1} − A^{z=0} = 1] = 1 − Pr[A = 1|Z = 0] − Pr[A = 0|Z = 1] (63)
= 1 − Pr[A = 1|Z = 0] − (1 − Pr[A = 1|Z = 1]) (64)
= Pr[A = 1|Z = 1] − Pr[A = 1|Z = 0]. (65)
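A quick numerical check of my own, with hypothetical proportions, that the two expressions for Pr[complier] under monotonicity agree, and how the intention-to-treat effect scales up to the effect in the compliers:

```python
# hypothetical observed proportions
p_a1_z1 = 0.8   # Pr[A = 1 | Z = 1]
p_a1_z0 = 0.3   # Pr[A = 1 | Z = 0]

# under monotonicity: Pr[always-taker] = Pr[A=1|Z=0], Pr[never-taker] = Pr[A=0|Z=1]
pr_complier_direct = 1 - p_a1_z0 - (1 - p_a1_z1)   # equations (63)-(64)
pr_complier_diff = p_a1_z1 - p_a1_z0               # equation (65)

# a hypothetical intention-to-treat effect of 0.1 scales up to the effect in compliers
itt = 0.1
cace = itt / pr_complier_diff
```

With half the population compliers, the intention-to-treat effect of 0.1 corresponds to an effect of 0.2 in the compliers; the smaller the complier fraction, the larger the scale-up.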

This proof only considers the setting depicted in Figure 16.1, in which the instrument Z is causal.

When, as depicted in Figures 16.2 and 16.3, data on a surrogate instrument Z, but not on the causal instrument Uz, are available, the average causal effect in the compliers (defined according to Uz) is also identified by the usual IV estimator.

The proof depends critically on two assumptions:


• Z is independent of A and Y given the causal instrument Uz.
• Uz is binary.

However, this independence assumption often has little substantive plausibility unless Uz is continuous.

A corollary: The interpretation of the IV estimand as the effect in the compliers is questionable in many applications of IV methods to observational data, in which Z is at best a surrogate for Uz.

C New words

postmenopausal / estrogen / coronary / hip / moot / nomenclature / haplotype / frailty / wasabi / atherosclerosis / pragmatic / Alzheimer / tumor / lipoprotein / cholesterol / statin / haphazard / dementia / hepatitis / inert / prognosis / ibuprofen / calibrated / agonistic / malfeasance / ad hoc / parsimonious / homoscedasticity / saturated / enamored / taxonomy / ubiquitous

D Questions

• The first book
• Page 99: the illustrations of Figures 8.3 to 8.6.
• Page 101: What is Simpson's paradox?
• Page 100: What is meant by the built-in selection bias of the hazard ratio?
• Page 105: Why can the definition of causal effects not ignore censoring?
• Page 105: When shall we use IP weighting and when shall we use stratification?
• Page 115: What do those "In general, …" statements mean?
• Page 123: What is the definition of uniform asymptotic unbiasedness?
• Page 127: Why is the second investigator correct, and why, when the number of measured variables is larger, is following the conditionality principle no longer a wise strategy?
• Page 128: Technical Point 10.2.


• The second book
• Page 8: Why may the high-dimensional model trap the estimand much less than 95% of the time?

Bayesian inference requires the specification of a prior distribution for all unknown parameters. In low-dimensional parametric models the information in the data swamps that contained in reasonable priors. As a result, inference is insensitive to the particular prior distribution selected. However, in high-dimensional models this is no longer the case.

• Page 10: Technical Point 11.1 needs to be read carefully.
• Page 14: What does "conservative" mean in statistics?
• Page 38: We calculated the P-value from a Wald test. Any other valid test may be used.
• Page 41: Technical Point 14.2, doubly robust, closed form.
• Page 45: How to fit an outcome regression?
• Page 57: F-statistic.
• Page 71: Cox proportional hazards model? Assuming a constant hazard ratio by ignoring interactions with time.
• Page 73: Kaplan-Meier estimator.

E Possible print errors

• The first book
• Page 19: I felt that 40% and 60% need to change places.
• Page 88: Fine Point 7.1, last sentence, may miss a "t" when spelling "convenional".
• Page 91: Fine Point 7.2, the eighth line, may miss a blank space between "that" and "conditional".
• Page 115: Repeats content from Page 116.
• Page 118: Fine Point 9.3, the fifth line, may miss a "t" when spelling "no".
• The second book


• Page 32: It is not a big deal. It would be better to add a "," between ". . . " and "Lp".
• Page 58: Technical Point 16.3, the eighth line from the bottom, there is a redundant "need".
• Page 58: Technical Point 16.3, the seventh line, there is a redundant "a".
• Page 60: Technical Point 16.5, the fifth line, there is a redundant "is".
• Page 68: Technical Point 16.5, the equation for the compliers line should be "+E[Y^{z=1} − Y^{z=0}|A^{z=1} = 1, A^{z=0} = 0] Pr[A^{z=1} = 1, A^{z=0} = 0]".
• Page 71: It would be better to replace "person 2" by "person two". (I just feel it better to keep unified writing.)
• Page 71: The last sentence may miss an "r" when spelling "though".
