EPSE 581C: Causal Inference for Applied Researchers

Ed Kroc

University of British Columbia

[email protected]

May 22, 2019

Ed Kroc (UBC) Causal Inference May 22, 2019 1 / 48


Last time

Model misspecification and (some of) its effects


Today

More model misspecification and (some of) its effects

Consistency and unbiasedness of estimators


Regression Discontinuity (RD) design

Suppose our data look like this:


Regression discontinuity design

Estimation:

It would be unreasonable to assume equal slopes on both sides of the threshold. Thus, we may propose the model:

$$Y = \beta_0 + \beta_T T + \beta_X X + \beta_{TX}\, T \cdot X + \delta.$$

Under this specification, our estimate of the ACE is:

$$\widehat{\mathrm{ACE}}(X = 2) = \hat{E}(Y(1) \mid X = 2) - \hat{E}(Y(0) \mid X = 2) = \hat\beta_T + 2\hat\beta_{TX}$$

But what if we misspecified the model by assuming equal slopes on both sides of the threshold?

This would produce a case of model misspecification.

Model misspecification

Broadly construed, there are three main types of model misspecification:

(1) Misspecification of the random error structure.

Heteroskedasticity of errors

Autocorrelation of errors (response)

(2) Misspecification of the “link” function.

Severe lack of normality of errors

(3) Misspecification of the covariate structure.

Misspecified functional form for covariates

Omitted covariates

All three of these issues are common to all forms of regression analysis (including factor analysis, SEMs, mixed effects modelling, etc.)

In practice, (3) can be very difficult to detect and to properly correct for. Unfortunately, (3) is also the most important case.

Regression assumptions: descriptive/predictive vs. causal

The best way to check for violations of any of the regression assumptions is by examining residual plots (or standardized residual plots for GLMs).

One should always plot residuals vs. fitted values, and residuals vs. each predictor (even this is not sufficient to detect violations).

If all assumptions are satisfied, then all residual plots should look something like a random blob:

If errors autocorrelate, residual vs. fitted plot may look like:


If errors have unequal variances, then the residual vs. fitted plot may look like:

If the functional form of the predictors is misspecified, then the residuals vs. fitted plot may look like:

Model misspecification: misspecified covariate function

(3) Misspecification of the covariate structure.

Must respecify the functional form of the model.

Usually diagnosable by looking at residual plots and/or examining the raw data, but this is rarely trivial.

Moreover, a better functional form may be too complicated to reasonably estimate given the amount of data we have.

Taught to err on the side of simplicity in explanatory/predictive inference,

. . . but for causal inference, this issue cannot be downplayed.

Model misspecification: misspecified covariate function

Suppose our data look like this:


Model misspecification: misspecified covariate function

Estimation:

It would be unreasonable to assume equal slopes on both sides of the threshold. Thus, we may propose the model:

$$Y = \beta_0 + \beta_T T + \beta_X X + \beta_{TX}\, T \cdot X + \delta.$$

Under this specification, our estimate of the ACE is:

$$\widehat{\mathrm{ACE}}(X = 2) = \hat{E}(Y(1) \mid X = 2) - \hat{E}(Y(0) \mid X = 2) = \hat\beta_T + 2\hat\beta_{TX}$$

But what if we misspecified the model by assuming equal slopes on both sides of the threshold?

This would produce a case of model misspecification.

Model misspecification: misspecified covariate function

Suppose we propose the misspecified model for our example data in the previous diagram:

$$Y = \beta_0 + \beta_T T + \beta_X X + \delta'.$$

Under this specification, our estimate of the ACE is:

$$\widehat{\mathrm{ACE}}_{\mathrm{wrong}}(X = 2) = \hat{E}(Y(1) \mid X = 2) - \hat{E}(Y(0) \mid X = 2) = \hat\beta'_T$$

However, we know that the more appropriate estimate from the properly specified model is

$$\widehat{\mathrm{ACE}}_{\mathrm{right}}(X = 2) = \hat\beta_T + 2\hat\beta_{TX}.$$

Model misspecification: misspecified covariate function

Misspecified regression model in orange:


Model misspecification: misspecified covariate function

Thus, the estimate from our misspecified model is off by:

$$\widehat{\mathrm{ACE}}_{\mathrm{right}}(X = 2) - \widehat{\mathrm{ACE}}_{\mathrm{wrong}}(X = 2) = \hat\beta_T + 2\hat\beta_{TX} - \hat\beta'_T$$

It is very important to notice that

$$\hat\beta_T \neq \hat\beta'_T$$

This is because our estimates depend on the model specification.

Model misspecification: misspecified covariate function

Recall: we proposed the misspecified model for our example data:

$$Y = \beta_0 + \beta_T T + \beta_X X + \delta'.$$

In actuality, the true model is:

$$Y = \beta_0 + \beta_T T + \beta_X X + \beta_{TX}\, T \cdot X + \delta.$$

So the error in the misspecified model, $\delta'$, does not satisfy the necessary assumptions of the regression framework. In particular:

$$\delta' = \beta_{TX}\, T \cdot X + \delta,$$

so $\delta'$ is confounded with $T$ and $X$; i.e. $\delta'$ is not independent of $T$ or $X$.

Model misspecification: misspecified covariate function

Under our model misspecification, we know that a term is missing from our model; i.e. the interaction of $T$ and $X$ is absorbed into the error term:

$$\delta' = \beta_{TX}\, T \cdot X + \delta,$$

Thus, $\mathrm{Cov}(\delta', T) \neq 0$, and so

$$\beta_T = \frac{\mathrm{Cov}(Y, T) - \mathrm{Cov}(\delta', T)}{\mathrm{Var}(T)}$$

However, our standard regression estimators assume that there are no violations of assumptions; thus, our actual estimate is:

$$\hat\beta'_T = \frac{\widehat{\mathrm{Cov}}(Y, T)}{\widehat{\mathrm{Var}}(T)} = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(t_i - \bar{t})}{\sum_{i=1}^{n} (t_i - \bar{t})^2}$$

Model misspecification: misspecified covariate function

We know that the actual population parameter we are interested in is $\beta_T$ from the correctly specified model:

$$Y = \beta_0 + \beta_T T + \beta_X X + \beta_{TX}\, T \cdot X + \delta$$

Doing the same covariance algebra as before, and noting that all regression assumptions are (mostly) satisfied since the model is properly specified, we find

$$\beta_T = \frac{\mathrm{Cov}(Y, T) - \beta_{TX}\,\mathrm{Cov}(TX, T)}{\mathrm{Var}(T)} = \frac{\mathrm{Cov}(Y, T)}{\mathrm{Var}(T)} - \beta_{TX}\, E(X).$$

But using the misspecified model, we do not estimate this! Instead, we estimate only the first term:

$$\beta'_T = \frac{\mathrm{Cov}(Y, T)}{\mathrm{Var}(T)}$$
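The covariance algebra above can be checked numerically. The following is a minimal sketch, not the lecture's own code: it assumes a binary treatment T assigned independently of X, X uniform on [0, 1] (so E(X) = 0.5), and made-up coefficient values. Under those assumptions the naive ratio Cov(Y, T) / Var(T) should settle near beta_T + beta_TX * E(X) rather than beta_T, no matter how large the sample.

```python
import random

def cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

def simulate(n, beta_0=1.0, beta_T=0.2, beta_X=0.5, beta_TX=1.0, seed=0):
    """Draw (T, X, Y) from the interaction model; all values are hypothetical."""
    rng = random.Random(seed)
    T = [rng.randint(0, 1) for _ in range(n)]       # assumed independent of X
    X = [rng.uniform(0.0, 1.0) for _ in range(n)]   # so E(X) = 0.5
    Y = [beta_0 + beta_T * t + beta_X * x + beta_TX * t * x + rng.gauss(0.0, 0.1)
         for t, x in zip(T, X)]
    return T, X, Y

def naive_slope(n):
    """Cov(Y, T) / Var(T): what the model without the interaction targets."""
    T, X, Y = simulate(n)
    return cov(Y, T) / cov(T, T)

small = naive_slope(200)
large = naive_slope(20000)
# True beta_T is 0.2; the naive target is 0.2 + 1.0 * 0.5 = 0.7.
print(round(small, 3), round(large, 3))
```

Increasing n tightens the estimate around 0.7, not around the true value 0.2, which is the point made later about sample size.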

Model misspecification: Ex. 1

Misspecified model on LEFT; properly specified model on RIGHT:

[Figure: two scatterplots of y (0.0 to 2.0) against x (0.0 to 1.0), one panel per model.]

Model misspecification: Ex. 1

Misspecified model on LEFT; properly specified model on RIGHT:

[Figure: residual plots. Left panel: residuals(mod.w) against fitted(mod.w). Right panel: residuals(mod.r) against fitted(mod.r).]

Clear evidence of model misspecification in residuals vs. fitted plot!

Model misspecification: Ex. 1

Misspecified model on LEFT; properly specified model on RIGHT:

[Figure: two scatterplots of y (0.0 to 2.0) against x (0.0 to 1.0), one panel per model.]

$$\widehat{\mathrm{ACE}}_{\mathrm{wrong}}(X = 0.5) = \hat\beta'_T = 0.515$$

$$\widehat{\mathrm{ACE}}_{\mathrm{right}}(X = 0.5) = \hat\beta_T + 0.5\,\hat\beta_{TX} = 0.008 + 0.5 \times 0.999 = 0.508$$

Not too bad. . ., but what if the misspecification was worse?
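To make the comparison of the two ACE estimates concrete, here is a small sketch that fits both models by ordinary least squares on simulated data. It is illustrative only: the data-generating values (beta_0 = 0, beta_T = 0, beta_X = 1, beta_TX = 1), the randomized binary T, and the helper names `solve`, `ols`, `ace_wrong`, and `ace_right` are assumptions made for this example, not the data or code behind the slides.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(XtX, Xty)

rng = random.Random(1)
n = 5000
T = [rng.randint(0, 1) for _ in range(n)]
X = [rng.uniform(0.0, 1.0) for _ in range(n)]
# Hypothetical truth: Y = X + T*X + noise, so the true ACE at x0 is x0.
Y = [x + t * x + rng.gauss(0.0, 0.05) for t, x in zip(T, X)]

b_wrong = ols([[1.0, t, x] for t, x in zip(T, X)], Y)          # equal slopes
b_right = ols([[1.0, t, x, t * x] for t, x in zip(T, X)], Y)   # with interaction

def ace_wrong(x0):
    return b_wrong[1]                    # does not depend on x0

def ace_right(x0):
    return b_right[1] + x0 * b_right[3]

for x0 in (0.5, 0.9):
    print(x0, round(ace_wrong(x0), 3), round(ace_right(x0), 3))
```

In this particular setup the two estimates nearly coincide at x0 = 0.5 (the mean of X) and diverge away from it, which is one way a misspecified fit can look "not too bad" at a single evaluation point.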

Model misspecification: Ex. 2

Misspecified model on LEFT; properly specified model on RIGHT:

[Figure: two scatterplots of y (-2.5 to 0.0) against x (0.0 to 1.0), one panel per model.]

Model misspecification: Ex. 2

Misspecified model on LEFT; properly specified model on RIGHT:

[Figure: residual plots. Left panel: residuals(mod.w2) against fitted(mod.w2). Right panel: residuals(mod.r) against fitted(mod.r).]

Clear evidence of model misspecification in residuals vs. fitted plot!

Model misspecification: Ex. 2

Misspecified model on LEFT; properly specified model on RIGHT:

[Figure: two scatterplots of y (-2.5 to 0.0) against x (0.0 to 1.0), one panel per model.]

$$\widehat{\mathrm{ACE}}_{\mathrm{wrong}}(X = 0.5) = \hat\beta'_T = -0.152$$

$$\widehat{\mathrm{ACE}}_{\mathrm{right}}(X = 0.5) = \hat\beta_T + 0.5\,\hat\beta_{TX} + (0.5)^2\,\hat\beta_{TX^2} = -0.527$$

The misspecified model's ACE estimate is roughly 3.5 times too small in magnitude.

Model misspecification: ignore fit statistics

Notice: fit statistics are useless here.

That is, misspecified models can still “fit” the data very well.

Good enough for explanatory modelling.

Not good enough for causal modelling!

Ignore all fit statistics when performing causal modelling, including:

Goodness-of-fit F-tests

$R^2$ statistics

Information criterion statistics (AIC, BIC, DIC, etc.)

Model misspecification: ignore statistical significance

Notice: statistical significance of model coefficient estimates is irrelevant here.

Recall numerical Ex. 1:

All estimates significant for misspecified model

In properly specified model, intercept ($\hat\beta_0$) and marginal treatment ($\hat\beta_T$) estimates not statistically significant.

Recall numerical Ex. 2:

All estimates significant for misspecified model

In properly specified model, intercept ($\hat\beta_0$) and marginal first-order treatment ($\hat\beta_T$) estimates not statistically significant.

Model misspecification: bigger sample size will never fix the problem

It is common “wisdom” that the more data you have, the better you will be able to quantify your effects of interest.

This is true for explanatory/descriptive and predictive modelling, but false for causal modelling.

Consistency and unbiasedness of estimators

There are two extremely important and desirable properties we usually like our estimators to have:

Consistency

Unbiasedness

Other properties are also often desirable (e.g. asymptotic normality), but consistency and unbiasedness are by far the most important.

Unbiasedness of estimators

Generally, an estimator $\hat\theta$ for some population parameter $\theta$ of a random variable of interest $X$ is called unbiased if:

$$E(\hat\theta) = \theta$$

In words, an estimator is unbiased for its estimand (what it is trying to estimate) if, on average, the estimator equals the estimand.

Example: In a random sample, the sample mean, $\hat\theta = \frac{1}{n}\sum_{i=1}^{n} X_i$, is an unbiased estimator of the population mean, $\theta = E(X)$:

$$E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n}\sum_{i=1}^{n} E(X) = \frac{n\,E(X)}{n} = E(X) \quad \checkmark$$

Consistency of estimators

Generally, an estimator $\hat\theta$ is called consistent if, as the sample size increases without bound, the sample value of $\hat\theta$ approaches a single number, $a$:

$$\text{for all } \varepsilon > 0, \quad \lim_{n \to \infty} \Pr\left(|\hat\theta - a| > \varepsilon \mid S_n\right) = 0,$$

where $S_n$ denotes a random sample of size $n$. Note that this definition does not require $a$ to equal the estimand; many textbooks reserve “consistent” for the case $a = \theta$.

If an estimator is both unbiased and consistent, then not only does its average value equal the true estimand of interest, but as we increase the sample size, the estimator becomes more and more precise about this true value.

That is, such an estimator is both accurate and precise as sample size increases.

Consistency and unbiasedness of estimators

It is entirely possible that an estimator is consistent but biased; e.g. the unadjusted sample variance:

$$\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

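It is also entirely possible that an estimator is unbiased but inconsistent; e.g. using the average of the sample min and max to estimate the population mean of a symmetric population:

$$\frac{\max\{x_i : 1 \le i \le n\} + \min\{x_i : 1 \le i \le n\}}{2}$$

(Strictly, for a normal population the spread of this estimator does shrink, but only at rate $1/\sqrt{\log n}$, so slowly that it behaves as inconsistent at any practical sample size.)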

Estimators can also be neither unbiased nor consistent. Very bad!

Also, some estimators are asymptotically unbiased and consistent.

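As a worked check of the consistent-but-biased example above, assume the observations are i.i.d. with mean $\mu$ and finite variance $\sigma^2$ (an assumption not stated on the slide). Using the identity $\frac{1}{n}\sum (X_i - \bar{X})^2 = \frac{1}{n}\sum (X_i - \mu)^2 - (\bar{X} - \mu)^2$, the expectation of the unadjusted sample variance is:

```latex
\begin{align*}
E\left[\frac{1}{n}\sum_{i=1}^{n}(X_i-\bar{X})^2\right]
  &= E\left[\frac{1}{n}\sum_{i=1}^{n}(X_i-\mu)^2\right] - E\left[(\bar{X}-\mu)^2\right] \\
  &= \sigma^2 - \frac{\sigma^2}{n}
   = \frac{n-1}{n}\,\sigma^2 .
\end{align*}
```

The bias is $-\sigma^2/n$: nonzero for every finite $n$, but vanishing as $n \to \infty$, which is why this estimator is biased yet asymptotically unbiased.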

Consistency and unbiasedness of estimators: Example

The sample mean is an unbiased and consistent estimator of the population mean (for population random variables with finite mean):

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

The average of the sample extremes is an unbiased but inconsistent estimator of the population mean:

$$\mathrm{Avg}(\min, \max) := \frac{\max\{x_i : 1 \le i \le n\} + \min\{x_i : 1 \le i \le n\}}{2}$$

Consistency and unbiasedness of estimators: Example

Example: Suppose we have 30 observations from a normally distributed population, $X \sim N(3.3, 1)$.

These observations generate the two sample statistics:

$$\bar{X} = 3.39, \qquad \mathrm{Avg}(\min, \max) = 3.28$$

Both seem pretty good. This is no accident either.

Consistency and unbiasedness of estimators: Example

[Figure: two histograms, Frequency on the vertical axis. Left: "Sampling Distribution of the Sample Mean", horizontal axis "Sample Mean" (2.8 to 4.0). Right: "Sampling Distribution of the Sample Average Spread", horizontal axis "Sample Average Spread" (2.0 to 4.5).]

Simulated 1000 draws of 30 observations from $X$ to create these (estimated) sampling distributions.

Both estimators unbiased, but average of extremes is not very precise. . .

Consistency and unbiasedness of estimators: Example

[Figure: two histograms, Frequency on the vertical axis. Left: "Histogram of avg" (avg from 3.0 to 3.6). Right: "Histogram of rng" (rng from 2.0 to 4.5).]

Increased sample size: simulated 1000 draws of 100 observations from $X$ to create these new (estimated) sampling distributions.

Notice: sample mean gets more precise with larger sample size, but sample average of extremes does not.

Consistency and unbiasedness of estimators: Example

[Figure: two histograms, Frequency on the vertical axis. Left: "Histogram of avg" (avg from 3.20 to 3.40). Right: "Histogram of rng" (rng from 2.5 to 4.0).]

Increased sample size: simulated 1000 draws of 1000 observations from $X$ to create these new (estimated) sampling distributions.

Notice: sample mean gets more precise with larger sample size, but sample average of extremes does not.

Consistency and unbiasedness of estimators: Example

[Figure: two histograms, Frequency on the vertical axis. Left: "Histogram of avg" (avg from 3.26 to 3.34). Right: "Histogram of rng" (rng from 2.0 to 4.5).]

Increased sample size: simulated 1000 draws of 10,000 observations from $X$ to create these new (estimated) sampling distributions.

Notice: sample mean gets more precise with larger sample size, but sample average of extremes does not.

Observe consistency of sample mean, inconsistency of sample average of extremes.
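The simulation described above can be approximated with a short script. This is a sketch under stated assumptions rather than the code behind the slides: it uses the N(3.3, 1) population from the example, but only 400 replications and two sample sizes (30 and 1000) to keep the run time short, and the helper names `sampling_sd` and `midrange` are made up here. It reports the standard deviation of each estimator across replications.

```python
import random
import statistics

def sampling_sd(estimator, n, reps=400, seed=0):
    """Standard deviation of an estimator over `reps` simulated samples of size n."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        sample = [rng.gauss(3.3, 1.0) for _ in range(n)]
        values.append(estimator(sample))
    return statistics.stdev(values)

def midrange(sample):
    """Average of the sample extremes, Avg(min, max)."""
    return (max(sample) + min(sample)) / 2

# sd[n] = (spread of the sample mean, spread of the average of extremes)
sd = {n: (sampling_sd(statistics.fmean, n), sampling_sd(midrange, n))
      for n in (30, 1000)}

for n, (sd_mean, sd_mid) in sd.items():
    print(n, round(sd_mean, 3), round(sd_mid, 3))
```

The spread of the sample mean should fall roughly like 1/sqrt(n), while the spread of the average of extremes should change comparatively little over the same range of n.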

Consistency and unbiasedness of estimators

When all the usual regression assumptions hold, the standard estimators for the model coefficients (e.g. maximum likelihood or ordinary least squares estimators) are consistent and unbiased for the true population values of those parameters.

However, when the regression model is misspecified, the estimators are still consistent (they converge to a single value, though not to the true parameter), but they are no longer unbiased. Moreover, they are not even asymptotically unbiased.

White (1982), Econometrica: MLEs of regression coefficients will approach the values that minimize the Kullback-Leibler divergence between the specified model and the true model.

Consistency and unbiasedness of estimators

UPSHOT: if the functional form of your model is misspecified, and/or if you are missing important covariates, it doesn’t matter how much data you have: your estimates will always be wrong, even if they are very precise.

This is a HUGE problem for causal inference.

It is common “wisdom” that the more data you have, the better you will be able to quantify your effects of interest; this is false when performing model-based causal inference.

Model misspecification: omitted variables

So far, we have only focused on model misspecification where the functional form of the covariates is misspecified, but our models always contained all explanatory variables.

In practical non-experimental research, we will always be missing some confounders; we can’t measure everything, or even know everything we should always be measuring!

Detecting important omitted variables can be very difficult.

Residual plots still the way to go, but they will not always suggest omitted variable bias.

Hence, why the exchangeability of treatment is so important in an RD-design: treatment is “as good as” randomly assigned near the threshold; thus, biasing effects of omitted variables should be negligible (near the threshold).

A return to controlled experiments

Why don’t we hear about these issues (omitted variables, model misspecification) in the context of controlled experiments?

ANSWER: usually, well-controlled experiments bypass these issues by design.

Example: Does an increase in NO2 in native SE BC soil cause Arabidopsis lyrata leaves to grow larger?

3 × 5 factorial design on 90 seeds:

3 levels of NO2: control (native soil), 1.5 times average NO2 concentration, 2 times average NO2 concentration.

5 time points (after sprouting), no repeated measures: 5 days, 10 days, 15 days, 20 days, 25 days.

Outcome measure: length of eighth leaves.

A return to controlled experiments

Here, we could propose a full 2-way ANOVA model:

$$\mathrm{Len} = \mu + \tau_{\mathrm{NO2}} + \tau_{\mathrm{age}} + \tau_{\mathrm{NO2} \times \mathrm{age}} + \varepsilon,$$

where $\tau_X$ denotes the average treatment effect of $X$, $\mu$ denotes the grand average length of eighth leaves (over all nitrogen levels and time points), and $\varepsilon$ denotes random error.

Experiment is controlled to fix the values of possible confounders: e.g. humidity, light, water, O2 levels, etc.

Levels of explanatory factors are also fixed; NO2 and age are continuous variables, but experimental control fixes the possible values these variables can assume to finite sets.

However, suppose there was some unknown confounder $V$ that we didn’t account for: e.g. maybe 10 of the 90 seeds are less viable than the others.

But here, randomization of seeds to experimental treatments (NO2 × age) will likely remove the effect of this confounder:

$$\Pr(\text{seed } i \in \mathrm{NO2} \times \mathrm{age} \mid V) = \Pr(\text{seed } i \in \mathrm{NO2} \times \mathrm{age}).$$

Therefore,

$$\Pr(\mathrm{Len} \mid \mathrm{NO2}, \mathrm{age}, V) = \Pr(\mathrm{Len} \mid \mathrm{NO2}, \mathrm{age})$$

What about misspecifying the functional form of the model?

Not an issue in ANOVA of controlled, randomized experiments.

Notice: ANOVA model does not have to posit an explicit functional form between response and covariates because all covariates are categorized into finitely many, controlled factor levels.

Suppose Len is (positive, concave down) quadratically related to age. Then average treatment effects $\tau_{\mathrm{age}}$ will increase quadratically over the 5 fixed ages since we estimate the average effect for each fixed age.
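The point about ANOVA not needing a functional form can be illustrated with a toy simulation. Everything here is hypothetical: the concave-down growth curve in `true_len`, the noise level, and the simplification to a single NO2 level with 18 plants at each of the 5 fixed ages (90 plants in total). The sketch compares per-age cell means with a straight-line fit in age.

```python
import random
import statistics

rng = random.Random(2)
ages = [5, 10, 15, 20, 25]

def true_len(age):
    # Hypothetical concave-down quadratic growth curve (not from the source).
    return 0.4 * age - 0.008 * age ** 2

# 18 plants per fixed age; a single NO2 level is assumed for simplicity.
data = [(a, true_len(a) + rng.gauss(0.0, 0.2)) for a in ages for _ in range(18)]

# ANOVA-style estimate: one mean per fixed age level, no functional form assumed.
cell_mean = {a: statistics.fmean([y for x, y in data if x == a]) for a in ages}

# Straight-line fit in age, as a linear regression in age would impose.
xs = [x for x, _ in data]
ys = [y for _, y in data]
xbar, ybar = statistics.fmean(xs), statistics.fmean(ys)
slope = sum((x - xbar) * (y - ybar) for x, y in data) / sum((x - xbar) ** 2 for x in xs)
intercept = ybar - slope * xbar

# Columns: age, true mean length, cell mean, straight-line prediction.
for a in ages:
    print(a, round(true_len(a), 2), round(cell_mean[a], 2), round(intercept + slope * a, 2))
```

The cell means track the curved truth at each fixed age, whereas the straight line is systematically off (too low in the middle ages, too high at the ends) because it imposes a form the data do not follow.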

A return to controlled experiments

Contrast with observational protocol: if we cannot control the age of the plants, then we are forced to quantify the average effect of a function of age on response, e.g.

$$\mathrm{Len} = \beta_0 + \tau_{\mathrm{NO2}} + \beta_{\mathrm{age}} \cdot \mathrm{age} + \beta_{\mathrm{NO2} \times \mathrm{age}} \cdot \tau_{\mathrm{NO2}} \cdot \mathrm{age} + \varepsilon$$

Such a regression model assumes a linear relationship between age and response.

But since we have no control over age, we are forced to model all ages simultaneously; this is much harder to do than to simply calculate the average effect of age on response for a finite, fixed number of age categories.

A return to controlled experiments

A natural idea may be to simply ad hoc categorize age; i.e. we observe 90 plants in the wild with arbitrary ages, but then categorize age after the fact into 3 categories: 0–9 days, 10–19 days, 20–29 days.

But this only fixes the problem if sample units are exchangeable (over nitrogen treatment and all possible confounders) within each ad hoc age category.

However, is nitrogen level fixed in the wild? Probably not!

And older plants may be exposed to more light and water (otherwise the plants would die before reaching 10 days of age).

Therefore, in order to ensure exchangeability of sample units over treatments, we now have to account for these omitted variables, as well as the functional relationships between them, and between nitrogen. . . So we are back to our model misspecification issues.

Next time

The Neyman-Rubin causal model

Propensity scores
