publ0055:introductiontoquantitativemethods lecture2 ... · insured health age female non_white...

101
PUBL0055: Introduction to Quantitative Methods Lecture 2: Causality Jack Blumenau and Benjamin Lauderdale 1 / 49

Upload: others

Post on 01-Aug-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

PUBL0055: Introduction to Quantitative Methods

Lecture 2: Causality

Jack Blumenau and Benjamin Lauderdale

1 / 49

Page 2: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Motivation

Does health insurance improve health outcomes?A critical issue in public policy is whether and how governments shouldprovide health care to their citizens. But is there a causal effect ofhealth insurance on actual levels of health? We will evaluate thisquestion, and use it as an example to illustrate strengths andweaknesses of experimental and observational research.

• Y (Dependent variable): Health

• What is the self-assessed health of an individual?

• X (Independent variable, or ”treatment”): Health insurance

• Does the individual have health insurance, or not?

2 / 49

Page 3: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Lecture Outline

Causality and counterfactuals

Difference in means and confounding

Randomized experiments and causality

Observational studies and causality

Conclusions

3 / 49

Page 4: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Causality and counterfactuals

Page 5: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Why causality?

• Cause-and-effect relationships are at the heart of some of the mostinteresting social science theories

• We can often express important research topics as a simplequestion: Does X cause Y?

• Does increasing the minimum wage decrease employment?• Does economic development lead to democracy?• Does the race of a politician affect their chances of being elected?

• Establishing causal effects using quantitative data is difficult!

• Today we focus on defining what causal effects are, and how wemight think about measuring them using quantitative data

4 / 49

Page 6: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Counterfactuals

Whenever a person makes a causal statement, they are contrasting whatthey observe (something factual) with what they believe they wouldobserve if a key condition was different (something counterfactual).

• “Austerity caused Brexit” (Fetzer, 2019)

• Observation: Austerity policy in the UK and a vote to leave the EU• Belief: Without austerity, smaller Leave vote

• “Oil wealth inhibits democracy” (Ross, 2013)

• Observation: Oil-rich countries tend to be less democratic• Belief: If countries had less oil they would be more democratic

Problem: We can’t observe counterfactual outcomes! Causal inferencerequires estimating counterfactuals for comparison to realised outcomes.

5 / 49

Page 7: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Potential outcomes and causal effects

We can define the causal effect of an independent variable𝑋 on anoutcome variable 𝑌 by considering the potential outcomes of 𝑌Potential outcomesThe potential outcomes of 𝑌 are the values of 𝑌 that would be realisedfor different values of𝑋. e.g.

• 𝑌𝑖(1) = the value that 𝑌𝑖 would have been if𝑋𝑖 was equal to 1

• 𝑌𝑖(0) = the value that 𝑌𝑖 would have been if𝑋𝑖 was equal to 0

Treatment effectFor any given individual, if we could observe both potential outcomes,the treatment effect of X on Y for that individual can be calculated as

𝑌𝑖(1) − 𝑌𝑖(0)

6 / 49

Page 8: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Potential outcomes and causal effects (example)

What are the potential outcomes in our health insurance example?

• 𝑌𝑖(1) = Health of individual 𝑖 if they have health insurance• 𝑌𝑖(0) = Health of individual 𝑖 if they do not have health insurance

What are the treatment effects?

• If 𝑌𝑖(1) > 𝑌𝑖(0) then insurance improves health• If 𝑌𝑖(1) < 𝑌𝑖(0) then insurance worsens health• If 𝑌𝑖(1) = 𝑌𝑖(0) then insurance has no effect on health

7 / 49

Page 9: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Potential outcomes and causal effects (example)

• 𝑋𝑖 = 1 if the individual is insured, and𝑋𝑖 = 0 if uninsured• 𝑌𝑖(1) is the health of the individual if they were insured• 𝑌𝑖(0) is the health of the individual if they were uninsured• The treatment effect for an individual is 𝑌𝑖(1) − 𝑌𝑖(0)

Individual 𝑋𝑖 𝑌𝑖(1) 𝑌𝑖(0) Treatment effect

1 1

5 3 2

2 1

5 4 1

3 0

3 3 0

4 0

4 3 1

Average treatment effect (ATE)= 2+1+0+14 = 4

4 = 1

8 / 49

Page 10: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Potential outcomes and causal effects (example)

• 𝑋𝑖 = 1 if the individual is insured, and𝑋𝑖 = 0 if uninsured• 𝑌𝑖(1) is the health of the individual if they were insured• 𝑌𝑖(0) is the health of the individual if they were uninsured• The treatment effect for an individual is 𝑌𝑖(1) − 𝑌𝑖(0)

Individual 𝑋𝑖 𝑌𝑖(1) 𝑌𝑖(0) Treatment effect

1 1 5

3 2

2 1

5 4 1

3 0

3 3 0

4 0

4 3 1

Average treatment effect (ATE)= 2+1+0+14 = 4

4 = 1

8 / 49

Page 11: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Potential outcomes and causal effects (example)

• 𝑋𝑖 = 1 if the individual is insured, and𝑋𝑖 = 0 if uninsured• 𝑌𝑖(1) is the health of the individual if they were insured• 𝑌𝑖(0) is the health of the individual if they were uninsured• The treatment effect for an individual is 𝑌𝑖(1) − 𝑌𝑖(0)

Individual 𝑋𝑖 𝑌𝑖(1) 𝑌𝑖(0) Treatment effect

1 1 5 3

2

2 1

5 4 1

3 0

3 3 0

4 0

4 3 1

Average treatment effect (ATE)= 2+1+0+14 = 4

4 = 1

8 / 49

Page 12: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Potential outcomes and causal effects (example)

• 𝑋𝑖 = 1 if the individual is insured, and𝑋𝑖 = 0 if uninsured• 𝑌𝑖(1) is the health of the individual if they were insured• 𝑌𝑖(0) is the health of the individual if they were uninsured• The treatment effect for an individual is 𝑌𝑖(1) − 𝑌𝑖(0)

Individual 𝑋𝑖 𝑌𝑖(1) 𝑌𝑖(0) Treatment effect

1 1 5 3 22 1

5 4 1

3 0

3 3 0

4 0

4 3 1

Average treatment effect (ATE)= 2+1+0+14 = 4

4 = 1

8 / 49

Page 13: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Potential outcomes and causal effects (example)

• 𝑋𝑖 = 1 if the individual is insured, and𝑋𝑖 = 0 if uninsured• 𝑌𝑖(1) is the health of the individual if they were insured• 𝑌𝑖(0) is the health of the individual if they were uninsured• The treatment effect for an individual is 𝑌𝑖(1) − 𝑌𝑖(0)

Individual 𝑋𝑖 𝑌𝑖(1) 𝑌𝑖(0) Treatment effect

1 1 5 3 22 1 5

4 1

3 0

3 3 0

4 0

4 3 1

Average treatment effect (ATE)= 2+1+0+14 = 4

4 = 1

8 / 49

Page 14: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Potential outcomes and causal effects (example)

• 𝑋𝑖 = 1 if the individual is insured, and𝑋𝑖 = 0 if uninsured• 𝑌𝑖(1) is the health of the individual if they were insured• 𝑌𝑖(0) is the health of the individual if they were uninsured• The treatment effect for an individual is 𝑌𝑖(1) − 𝑌𝑖(0)

Individual 𝑋𝑖 𝑌𝑖(1) 𝑌𝑖(0) Treatment effect

1 1 5 3 22 1 5 4

1

3 0

3 3 0

4 0

4 3 1

Average treatment effect (ATE)= 2+1+0+14 = 4

4 = 1

8 / 49

Page 15: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Potential outcomes and causal effects (example)

• 𝑋𝑖 = 1 if the individual is insured, and𝑋𝑖 = 0 if uninsured• 𝑌𝑖(1) is the health of the individual if they were insured• 𝑌𝑖(0) is the health of the individual if they were uninsured• The treatment effect for an individual is 𝑌𝑖(1) − 𝑌𝑖(0)

Individual 𝑋𝑖 𝑌𝑖(1) 𝑌𝑖(0) Treatment effect

1 1 5 3 22 1 5 4 13 0

3 3 0

4 0

4 3 1

Average treatment effect (ATE)= 2+1+0+14 = 4

4 = 1

8 / 49

Page 16: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Potential outcomes and causal effects (example)

• 𝑋𝑖 = 1 if the individual is insured, and𝑋𝑖 = 0 if uninsured• 𝑌𝑖(1) is the health of the individual if they were insured• 𝑌𝑖(0) is the health of the individual if they were uninsured• The treatment effect for an individual is 𝑌𝑖(1) − 𝑌𝑖(0)

Individual 𝑋𝑖 𝑌𝑖(1) 𝑌𝑖(0) Treatment effect

1 1 5 3 22 1 5 4 13 0 3

3 0

4 0

4 3 1

Average treatment effect (ATE)= 2+1+0+14 = 4

4 = 1

8 / 49

Page 17: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Potential outcomes and causal effects (example)

• 𝑋𝑖 = 1 if the individual is insured, and𝑋𝑖 = 0 if uninsured• 𝑌𝑖(1) is the health of the individual if they were insured• 𝑌𝑖(0) is the health of the individual if they were uninsured• The treatment effect for an individual is 𝑌𝑖(1) − 𝑌𝑖(0)

Individual 𝑋𝑖 𝑌𝑖(1) 𝑌𝑖(0) Treatment effect

1 1 5 3 22 1 5 4 13 0 3 3

0

4 0

4 3 1

Average treatment effect (ATE)= 2+1+0+14 = 4

4 = 1

8 / 49

Page 18: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Potential outcomes and causal effects (example)

• 𝑋𝑖 = 1 if the individual is insured, and𝑋𝑖 = 0 if uninsured• 𝑌𝑖(1) is the health of the individual if they were insured• 𝑌𝑖(0) is the health of the individual if they were uninsured• The treatment effect for an individual is 𝑌𝑖(1) − 𝑌𝑖(0)

Individual 𝑋𝑖 𝑌𝑖(1) 𝑌𝑖(0) Treatment effect

1 1 5 3 22 1 5 4 13 0 3 3 04 0

4 3 1

Average treatment effect (ATE)= 2+1+0+14 = 4

4 = 1

8 / 49

Page 19: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Potential outcomes and causal effects (example)

• 𝑋𝑖 = 1 if the individual is insured, and𝑋𝑖 = 0 if uninsured• 𝑌𝑖(1) is the health of the individual if they were insured• 𝑌𝑖(0) is the health of the individual if they were uninsured• The treatment effect for an individual is 𝑌𝑖(1) − 𝑌𝑖(0)

Individual 𝑋𝑖 𝑌𝑖(1) 𝑌𝑖(0) Treatment effect

1 1 5 3 22 1 5 4 13 0 3 3 04 0 4

3 1

Average treatment effect (ATE)= 2+1+0+14 = 4

4 = 1

8 / 49

Page 20: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Potential outcomes and causal effects (example)

• 𝑋𝑖 = 1 if the individual is insured, and𝑋𝑖 = 0 if uninsured• 𝑌𝑖(1) is the health of the individual if they were insured• 𝑌𝑖(0) is the health of the individual if they were uninsured• The treatment effect for an individual is 𝑌𝑖(1) − 𝑌𝑖(0)

Individual 𝑋𝑖 𝑌𝑖(1) 𝑌𝑖(0) Treatment effect

1 1 5 3 22 1 5 4 13 0 3 3 04 0 4 3

1

Average treatment effect (ATE)= 2+1+0+14 = 4

4 = 1

8 / 49

Page 21: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Potential outcomes and causal effects (example)

• 𝑋𝑖 = 1 if the individual is insured, and𝑋𝑖 = 0 if uninsured• 𝑌𝑖(1) is the health of the individual if they were insured• 𝑌𝑖(0) is the health of the individual if they were uninsured• The treatment effect for an individual is 𝑌𝑖(1) − 𝑌𝑖(0)

Individual 𝑋𝑖 𝑌𝑖(1) 𝑌𝑖(0) Treatment effect

1 1 5 3 22 1 5 4 13 0 3 3 04 0 4 3 1

Average treatment effect (ATE)= 2+1+0+14 = 4

4 = 1

8 / 49

Page 22: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Potential outcomes and causal effects (example)

• 𝑋𝑖 = 1 if the individual is insured, and𝑋𝑖 = 0 if uninsured• 𝑌𝑖(1) is the health of the individual if they were insured• 𝑌𝑖(0) is the health of the individual if they were uninsured• The treatment effect for an individual is 𝑌𝑖(1) − 𝑌𝑖(0)

Individual 𝑋𝑖 𝑌𝑖(1) 𝑌𝑖(0) Treatment effect

1 1 5 3 22 1 5 4 13 0 3 3 04 0 4 3 1

Average treatment effect (ATE)= 2+1+0+14 = 4

4 = 1

8 / 49

Page 23: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

The Fundamental Problem of Causal Inference

But we cannot observe both potential outcomes for any individual!

Individual 𝑋𝑖 𝑌𝑖(1) 𝑌𝑖(0) Causal effect

1 1

5 ? ?

2 1

5 ? ?

3 0

? 3 ?

4 0

? 3 ?

Average treatment effect (ATE)= ?+?+?+?4 = ?

4 = ?

9 / 49

Page 24: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

The Fundamental Problem of Causal Inference

But we cannot observe both potential outcomes for any individual!

Individual 𝑋𝑖 𝑌𝑖(1) 𝑌𝑖(0) Causal effect

1 1 5

? ?

2 1

5 ? ?

3 0

? 3 ?

4 0

? 3 ?

Average treatment effect (ATE)= ?+?+?+?4 = ?

4 = ?

9 / 49

Page 25: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

The Fundamental Problem of Causal Inference

But we cannot observe both potential outcomes for any individual!

Individual 𝑋𝑖 𝑌𝑖(1) 𝑌𝑖(0) Causal effect

1 1 5 ?

?

2 1

5 ? ?

3 0

? 3 ?

4 0

? 3 ?

Average treatment effect (ATE)= ?+?+?+?4 = ?

4 = ?

9 / 49

Page 26: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

The Fundamental Problem of Causal Inference

But we cannot observe both potential outcomes for any individual!

Individual 𝑋𝑖 𝑌𝑖(1) 𝑌𝑖(0) Causal effect

1 1 5 ? ?2 1

5 ? ?

3 0

? 3 ?

4 0

? 3 ?

Average treatment effect (ATE)= ?+?+?+?4 = ?

4 = ?

9 / 49

Page 27: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

The Fundamental Problem of Causal Inference

But we cannot observe both potential outcomes for any individual!

Individual 𝑋𝑖 𝑌𝑖(1) 𝑌𝑖(0) Causal effect

1 1 5 ? ?2 1 5

? ?

3 0

? 3 ?

4 0

? 3 ?

Average treatment effect (ATE)= ?+?+?+?4 = ?

4 = ?

9 / 49

Page 28: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

The Fundamental Problem of Causal Inference

But we cannot observe both potential outcomes for any individual!

Individual 𝑋𝑖 𝑌𝑖(1) 𝑌𝑖(0) Causal effect

1 1 5 ? ?2 1 5 ?

?

3 0

? 3 ?

4 0

? 3 ?

Average treatment effect (ATE)= ?+?+?+?4 = ?

4 = ?

9 / 49

Page 29: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

The Fundamental Problem of Causal Inference

But we cannot observe both potential outcomes for any individual!

Individual 𝑋𝑖 𝑌𝑖(1) 𝑌𝑖(0) Causal effect

1 1 5 ? ?2 1 5 ? ?3 0

? 3 ?

4 0

? 3 ?

Average treatment effect (ATE)= ?+?+?+?4 = ?

4 = ?

9 / 49

Page 30: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

The Fundamental Problem of Causal Inference

But we cannot observe both potential outcomes for any individual!

Individual 𝑋𝑖 𝑌𝑖(1) 𝑌𝑖(0) Causal effect

1 1 5 ? ?2 1 5 ? ?3 0 ?

3 ?

4 0

? 3 ?

Average treatment effect (ATE)= ?+?+?+?4 = ?

4 = ?

9 / 49

Page 31: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

The Fundamental Problem of Causal Inference

But we cannot observe both potential outcomes for any individual!

Individual 𝑋𝑖 𝑌𝑖(1) 𝑌𝑖(0) Causal effect

1 1 5 ? ?2 1 5 ? ?3 0 ? 3

?

4 0

? 3 ?

Average treatment effect (ATE)= ?+?+?+?4 = ?

4 = ?

9 / 49

Page 32: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

The Fundamental Problem of Causal Inference

But we cannot observe both potential outcomes for any individual!

Individual 𝑋𝑖 𝑌𝑖(1) 𝑌𝑖(0) Causal effect

1 1 5 ? ?2 1 5 ? ?3 0 ? 3 ?4 0

? 3 ?

Average treatment effect (ATE)= ?+?+?+?4 = ?

4 = ?

9 / 49

Page 33: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

The Fundamental Problem of Causal Inference

But we cannot observe both potential outcomes for any individual!

Individual 𝑋𝑖 𝑌𝑖(1) 𝑌𝑖(0) Causal effect

1 1 5 ? ?2 1 5 ? ?3 0 ? 3 ?4 0 ?

3 ?

Average treatment effect (ATE)= ?+?+?+?4 = ?

4 = ?

9 / 49

Page 34: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

The Fundamental Problem of Causal Inference

But we cannot observe both potential outcomes for any individual!

Individual 𝑋𝑖 𝑌𝑖(1) 𝑌𝑖(0) Causal effect

1 1 5 ? ?2 1 5 ? ?3 0 ? 3 ?4 0 ? 3

?

Average treatment effect (ATE)= ?+?+?+?4 = ?

4 = ?

9 / 49

Page 35: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

The Fundamental Problem of Causal Inference

But we cannot observe both potential outcomes for any individual!

Individual 𝑋𝑖 𝑌𝑖(1) 𝑌𝑖(0) Causal effect

1 1 5 ? ?2 1 5 ? ?3 0 ? 3 ?4 0 ? 3 ?

Average treatment effect (ATE)= ?+?+?+?4 = ?

4 = ?

9 / 49

Page 36: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

The Fundamental Problem of Causal Inference

But we cannot observe both potential outcomes for any individual!

Individual 𝑋𝑖 𝑌𝑖(1) 𝑌𝑖(0) Causal effect

1 1 5 ? ?2 1 5 ? ?3 0 ? 3 ?4 0 ? 3 ?

Average treatment effect (ATE)= ?+?+?+?4 = ?

4 = ?

9 / 49

Page 37: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

The Fundamental Problem of Causal Inference

The Fundamental Problem of Causal InfereceWe only ever observe one potential outcome for a given individual, andour observed outcome depends on the status of our explanatoryvariable. We can never directly observe individual causal effects.

10 / 49

Page 38: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

11 / 49

Page 39: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Difference in means and confounding

Page 40: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Quantity of Interest

We want to estimate the average treatment effect (ATE):

1𝑁

𝑁∑𝑖=1

{𝑌𝑖(1) − 𝑌𝑖(0)}

But we can’t observe 𝑌𝑖(1) and 𝑌𝑖(0) for any given unit!One alternative is to use the difference-in-means to estimate the ATE:

𝑌𝑋=1 − 𝑌𝑋=0

where 𝑌𝑋=1 and 𝑌𝑋=0 are the average observed outcomes for treatedand control units, respectively.

Question: Is the difference-in-means equal to the ATE?

12 / 49

Page 41: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Using the difference-in-means to estimate the ATE

Question: Is the difference-in-means likely to equal the ATE?

Individual 𝑋𝑖 𝑌𝑖(1) 𝑌𝑖(0) Treatment effect

1 1 5 3 22 1 5 4 13 0 3 3 04 0 4 3 1

Average treatment effect (ATE)= 2+1+0+14 = 4

4 = 1Difference-in-means= 5+5

2 − 3+32 = 5 − 3 = 2

No! The difference-in-means is larger than the ATE. Why?

13 / 49

Page 42: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Using the difference-in-means to estimate the ATE

Question: Is the difference-in-means likely to equal the ATE?

Individual 𝑋𝑖 𝑌𝑖(1) 𝑌𝑖(0) Treatment effect

1 1 5 3 22 1 5 4 13 0 3 3 04 0 4 3 1

Average treatment effect (ATE)= 2+1+0+14 = 4

4 = 1

Difference-in-means= 5+52 − 3+3

2 = 5 − 3 = 2No! The difference-in-means is larger than the ATE. Why?

13 / 49

Page 43: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Using the difference-in-means to estimate the ATE

Question: Is the difference-in-means likely to equal the ATE?

Individual 𝑋𝑖 𝑌𝑖(1) 𝑌𝑖(0) Treatment effect

1 1 5 3 22 1 5 4 13 0 3 3 04 0 4 3 1

Average treatment effect (ATE)= 2+1+0+14 = 4

4 = 1Difference-in-means= 5+5

2 − 3+32 = 5 − 3 = 2

No! The difference-in-means is larger than the ATE. Why?

13 / 49

Page 44: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Using the difference-in-means to estimate the ATE

Question: Is the difference-in-means likely to equal the ATE?

Individual 𝑋𝑖 𝑌𝑖(1) 𝑌𝑖(0) Treatment effect

1 1 5 3 22 1 5 4 13 0 3 3 04 0 4 3 1

Average treatment effect (ATE)= 2+1+0+14 = 4

4 = 1Difference-in-means= 5+5

2 − 3+32 = 5 − 3 = 2

No! The difference-in-means is larger than the ATE.

Why?

13 / 49

Page 45: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Using the difference-in-means to estimate the ATE

Question: Is the difference-in-means likely to equal the ATE?

Individual 𝑋𝑖 𝑌𝑖(1) 𝑌𝑖(0) Treatment effect

1 1 5 3 22 1 5 4 13 0 3 3 04 0 4 3 1

Average treatment effect (ATE)= 2+1+0+14 = 4

4 = 1Difference-in-means= 5+5

2 − 3+32 = 5 − 3 = 2

No! The difference-in-means is larger than the ATE. Why?

13 / 49

Page 46: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Confounding

Confounding differences between treatment and control groups meanthat the difference-in-means can give a biased estimate of the ATE.

1. Causal effect of job training on employment

• Finding: People with training are more likely to be employed• Confounding: People with training may be more motivated• Direction of bias: Positive, we overstate the effectiveness of training

2. Causal effect of aspirin on headache symptoms

• Finding: People who take aspirin report worse headache symptoms• Confounding: People who took aspirin had headaches to begin with!• Direction of bias: Negative, we understate the effectiveness of aspirin

→ confounding bias can be positive or negative!

Question: Is confounding likely in the health care example?

14 / 49

Page 47: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Confounding bias example

Does health insurance improve health outcomes?The National Health Interview Survey (NHIS) is an annual survey of theUS population that asks questions about health and health insurance.We are interested in two questions in particular:

• Y (Dependent variable): health

• “Would you say your health in general is excelent (5), very good (4),good (3), fair (2), or poor (1)?”

• X (Independent variable): insured

• “Do you have health insurance?” TRUE = Insured, FALSE = Notinsured

We will also use information from some of the other questions on thesurvey (gender, income, race, etc).

15 / 49

Page 48: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Confounding bias example

Here are the first six rows of the NHIS data:

Table 4: NHIS data

id insured health age female years_educ non_white income

1 FALSE 4 29 TRUE 14 FALSE 19282.932 FALSE 4 35 FALSE 11 FALSE 19282.933 TRUE 3 32 FALSE 12 FALSE 167844.534 TRUE 3 34 TRUE 16 FALSE 167844.535 TRUE 4 45 FALSE 12 FALSE 85985.786 TRUE 4 44 TRUE 12 FALSE 85985.78

16 / 49

Page 49: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Difference-in-means

To calculate the difference in means between the two groups we will use:

• the mean() function• the subsetting ([]) operator

## Mean health level for insured individualsmean(nhis$health[nhis$insured == TRUE])

## [1] 3.943847

## Mean health level for non-insured individualsmean(nhis$health[nhis$insured == FALSE])

## [1] 3.617647

Insured individuals report health levels that are approximately 10%better than non-insured individuals.

17 / 49

Page 50: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Difference-in-means

To calculate the difference in means between the two groups we will use:

• the mean() function• the subsetting ([]) operator

## Mean health level for insured individualsmean(nhis$health[nhis$insured == TRUE])

## [1] 3.943847

## Mean health level for non-insured individualsmean(nhis$health[nhis$insured == FALSE])

## [1] 3.617647

Insured individuals report health levels that are approximately 10%better than non-insured individuals.

17 / 49

Page 51: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Difference-in-means

To calculate the difference in means between the two groups we will use:

• the mean() function• the subsetting ([]) operator

## Mean health level for insured individualsmean(nhis$health[nhis$insured == TRUE])

## [1] 3.943847

## Mean health level for non-insured individualsmean(nhis$health[nhis$insured == FALSE])

## [1] 3.617647

Insured individuals report health levels that are approximately 10%better than non-insured individuals.

17 / 49

Page 52: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Difference-in-means

Questions:

1. Can we interpret the difference in means here as causal?2. How might we assess the possibility of confounding bias here?

18 / 49

Page 53: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Checking “balance”

One implication of confounding: treatment and control units will bedifferent with respect to characteristics other than the treatment.

We can check this by evaluating the degree of balance for treated andcontrol units on different pre-treatment variables.

1. Calculate ��𝑇=1 (average value of pre-treatment variable for treated)2. Calculate ��𝑇=0 (average value of pre-treatment variable for control)3. Large differences between ��𝑇=1 and ��𝑇=0 suggest confounding

For example, for age:mean(nhis$age[nhis$insured == TRUE]) -

mean(nhis$age[nhis$insured == FALSE])

## [1] 2.318222

→ The insured are, on average, 2.3 years older than the uninsured

19 / 49

Page 54: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Checking “balance”

One implication of confounding: treatment and control units will bedifferent with respect to characteristics other than the treatment.

We can check this by evaluating the degree of balance for treated andcontrol units on different pre-treatment variables.

1. Calculate ��𝑇=1 (average value of pre-treatment variable for treated)2. Calculate ��𝑇=0 (average value of pre-treatment variable for control)3. Large differences between ��𝑇=1 and ��𝑇=0 suggest confounding

For example, for age:mean(nhis$age[nhis$insured == TRUE]) -

mean(nhis$age[nhis$insured == FALSE])

## [1] 2.318222

→ The insured are, on average, 2.3 years older than the uninsured

19 / 49

Page 55: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Checking “balance”

insured health age female non_white years_educ income

Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6Insured 4.0 43.3 49.2 15.3 14.3 101315.4Difference 0.3 2.3 2.8 -2.4 2.7 58422.9

Insured individuals are• less likely to be non-white (15.3%) than the non-insured (17.8%)• more likely to be female (49.2%) than the non-insured (46.4%)• older (43) than non-insured (41)• more educated (14.3 years) than the non-insured (12 years)• significantly richer ($101315) than the non-insured ($42893)

Implications:1. Race, gender, age, education and income are all imbalanced withrespect to insurance status.

2. Confounding is very likely a problem in this data.20 / 49

Page 56: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

21 / 49

Page 57: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Randomized experiments and causality

Page 58: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Observational studies and randomized experiments

Randomized experiments• Units are randomly assigned todifferent X values

• Researchers directly intervenein the world they study

• Gold standard for causalinference

Observational studies• Units are assigned to X values“by nature”

• Researchers observe, but donot intervene in, the world

• Very commonly used in socialscience research

22 / 49

Page 59: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Randomization and selection bias

We saw previously that, in general, the difference-in-means will not be anunbiased estimate of the ATE because of confounding.

Question: How can we avoid this problem?

Answer: Randomize!

Key intuitionRandomization of treatment doesn’t eliminate differences betweenindividuals, but ensures that the mix of individuals being compared isthe same on average.

23 / 49

Page 60: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Randomization and selection bias

We saw previously that, in general, the difference-in-means will not be anunbiased estimate of the ATE because of confounding.

Question: How can we avoid this problem? Answer: Randomize!

Key intuitionRandomization of treatment doesn’t eliminate differences betweenindividuals, but ensures that the mix of individuals being compared isthe same on average.

23 / 49

Page 61: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Randomization and selection bias

We saw previously that, in general, the difference-in-means will not be anunbiased estimate of the ATE because of confounding.

Question: How can we avoid this problem? Answer: Randomize!

Key intuitionRandomization of treatment doesn’t eliminate differences betweenindividuals, but ensures that the mix of individuals being compared isthe same on average.

23 / 49

Page 62: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Randomization and selection bias

When the treatment,𝑋, is randomly assigned to units…

• … treated and untreated groups will be similar, on average, in termsof all characteristics (both observed and unobserved)

• … the only systematic difference between the two will be the receiptof treatment

• … the average outcome for controls will be similar to what wouldhave happened for the treatment group if they had not been treated

→ we can use the difference in means to estimate the ATE!

24 / 49

Page 63: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Randomization and selection bias (visualization)

0 5 10 15 20

010

2030

40

Potential Outcome (Control)

Cou

nt

0 5 10 15 20

05

1015

2025

3035

All units

Potential Outcome (Treatment)

Cou

nt

4.0 4.5 5.0 5.5 6.0 6.5

050

100

150

Difference in means

Cou

nt

True ATE

25 / 49

Page 64: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Randomization and selection bias (visualization)

0 5 10 15 20

010

2030

40

Potential Outcome (Control)

Cou

nt

0 5 10 15 20

05

1015

2025

3035

1 sample

Potential Outcome (Treatment)

Cou

nt

4.0 4.5 5.0 5.5 6.0 6.5

050

100

150

Difference in means

Cou

nt

True ATE

25 / 49

Page 65: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Randomization and selection bias (visualization)

0 5 10 15 20

010

2030

40

Potential Outcome (Control)

Cou

nt

0 5 10 15 20

05

1015

2025

3035

2 samples

Potential Outcome (Treatment)

Cou

nt

4.0 4.5 5.0 5.5 6.0 6.5

050

100

150

Difference in means

Cou

nt

True ATE

25 / 49

Page 66: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Randomization and selection bias (visualization)

0 5 10 15 20

010

2030

40

Potential Outcome (Control)

Cou

nt

0 5 10 15 20

05

1015

2025

3035

5 samples

Potential Outcome (Treatment)

Cou

nt

4.0 4.5 5.0 5.5 6.0 6.5

050

100

150

Difference in means

Cou

nt

True ATE

25 / 49

Page 67: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Randomization and selection bias (visualization)

0 5 10 15 20

010

2030

40

Potential Outcome (Control)

Cou

nt

0 5 10 15 20

05

1015

2025

3035

100 samples

Potential Outcome (Treatment)

Cou

nt

4.0 4.5 5.0 5.5 6.0 6.5

050

100

150

Difference in means

Cou

nt

True ATE

25 / 49

Page 68: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Randomization and selection bias (visualization)

0 5 10 15 20

010

2030

40

Potential Outcome (Control)

Cou

nt

0 5 10 15 20

05

1015

2025

3035

1000 samples

Potential Outcome (Treatment)

Cou

nt

4.0 4.5 5.0 5.5 6.0 6.5

050

100

150

Difference in means

Cou

nt

True ATE

25 / 49

Page 69: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Randomization and selection bias (visualization)

0 5 10 15 20

010

2030

40

Potential Outcome (Control)

Cou

nt

0 5 10 15 20

05

1015

2025

3035

6000 samples

Potential Outcome (Treatment)

Cou

nt

4.0 4.5 5.0 5.5 6.0 6.5

050

100

150

Difference in means

Cou

nt

True ATE

25 / 49

Page 70: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Randomization and selection bias

• Randomization of the treatment makes the difference in groupmeans an unbiased estimator of the true ATE

• On average, you can expect a randomized experiment to get theright answer

• This does not guarantee that the answer you get from any particularrandomization will be exactly correct!

26 / 49

Page 71: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Example – Experimental data

Does health insurance improve health outcomes?The RAND Health Insurance Experiment (RAND) was an experimentconducted between 1974 and 1982 in the US. In this experiment,researchers randomly allocated individuals to receive health insurance.

• Y (Dependent variable): health

• “Would you say your health in general is excelent (5), very good (4),good (3), fair (2), or poor (1)?”

• X (Independent variable): insured

• Was the participant randomly allocated to receive health insurance?TRUE = Insured, FALSE = Not insured

We will again use information from some of the other questions on thesurvey (gender, income, race, etc).

27 / 49

Page 72: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Experimental data

Here are the first six rows of the RAND data:

Table 6: RAND data

id insured health age female years_educ non_white income

1 FALSE 4 42 FALSE 12 TRUE 67486.4842 FALSE 4 43 TRUE 12 TRUE 67486.4843 TRUE 2 38 TRUE 12 TRUE 27608.1074 TRUE 3 24 FALSE 12 TRUE 4322.2035 TRUE 4 60 TRUE 9 TRUE 24540.5416 TRUE 4 59 FALSE 4 TRUE 24540.541

28 / 49

Page 73: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Difference-in-means - Experimental data

Let’s calculate the difference in means again using the experimental data:

## Mean health level for insured individualsmean(rand$health[rand$insured == TRUE])

## [1] 3.405373

## Mean health level for non-insured individualsmean(rand$health[rand$insured == FALSE])

## [1] 3.423304

The difference between insured and non-insured individuals almostcompletely disappears in the experimental data!

29 / 49

Page 74: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Results - Experimental data

insured health age female non_white years_educ income

Uninsured 3.4 33.1 56.0 16.9 12.1 32597.2Insured 3.4 33.2 53.4 17.1 12.0 31220.9Difference 0.0 0.1 -2.6 0.2 -0.1 -1376.3

Insured and non-insured individuals• are equally likely to be non-white• are equally likely to be female• have similar ages• have similar years of education• have similar income levels.• and also report very similar health outcomes!

Implications:1. There is less evidence of imbalance in the experimental data2. The experimental effect of insurance is much smaller

30 / 49

Page 75: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Observational and Experimental Results Compared

Health

Age

Female

Non−White

Education

Income

−60 −40 −20 0% difference between non−insured and insured

ExperimentalObservational

31 / 49

Page 76: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Why not always use experiments?

If randomized experiments are the “gold standard” for causal inference,why not always use them?

1. Practical concerns

• It is often infeasible to randomly vary the “treatment” you care about• e.g. Could you randomize the type of electoral system in a country?

2. Ethical concerns

• Experiments involving real people can be ethically dubious• e.g. Manipulating the type of messages people see on Facebook

3. Cost concerns

• Experiments can be expensive in both money and time• Particularly relevant to planning dissertation projects

32 / 49

Page 77: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

33 / 49

Page 78: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Observational studies and causality

Page 79: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Observational research designs

Researchers using observational data cannot rely on randomization tosolve the selection bias problem so they must employ alternative designs:

1. Cross-sectional designs

• Compare outcomes for treated and control units, possibly adjustingfor pre-treatment characteristics

2. Before and after designs

• Compare outcomes before and after treatment for treated unitsusing over-time data

3. Difference-in-differences designs

• Compare outcomes before and after treatment for treated units, andcompare to the same change over time for control units

34 / 49

Page 80: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Cross-sectional designs

• Data: One observation per unit (i.e. individuals, countries, firms, etc),many units

• Approach: Compare average outcomes between treatment andcontrol units, often “controlling” for potential confounders

• Assumption: No unmeasured confounding between treated andcontrol units

35 / 49

Page 81: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Statistical control (example)

Our observational difference-in-means estimate is relatively large:

mean(nhis$health[nhis$insured == T]) -mean(nhis$health[nhis$insured == F])

## [1] 0.3262003

But insurance status is imbalanced with respect to income:

mean(nhis$income[nhis$insured == T]) -mean(nhis$income[nhis$insured == F])

## [1] 58422.87

How can we control for income?

36 / 49

Page 82: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Subclassification (example)

Subclassification: calculate differences between treatment and controlwithin levels of a confounding variable.

Imagine that we have just 3 levels of income (low, mid and high).

Caluclate the average health of insured and uninsured for each level:

insured_low_mean <- mean(nhis$health[nhis$insured == T &nhis$income_cat == "Low"])

uninsured_low_mean <- mean(nhis$health[nhis$insured == F &nhis$income_cat == "Low"])

insured_low_mean - uninsured_low_mean

## [1] 0.1010293

37 / 49

Page 83: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Subclassification

Subclassification: calculate differences between treatment and controlwithin levels of a confounding variable.

Imagine that we have just 3 levels of income (low, mid and high).

Caluclate the average health of insured and uninsured for each level:

insured_mid_mean <- mean(nhis$health[nhis$insured == T &nhis$income_cat == "Mid"])

uninsured_mid_mean <- mean(nhis$health[nhis$insured == F &nhis$income_cat == "Mid"])

insured_mid_mean - uninsured_mid_mean

## [1] 0.04954519

37 / 49

Page 84: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Subclassification

Subclassification: calculate differences between treatment and controlwithin levels of a confounding variable.

Imagine that we have just 3 levels of income (low, mid and high).

Caluclate the average health of insured and uninsured for each level:

insured_high_mean <- mean(nhis$health[nhis$insured == T &nhis$income_cat == "High"])

uninsured_high_mean <- mean(nhis$health[nhis$insured == F &nhis$income_cat == "High"])

insured_high_mean - uninsured_high_mean

## [1] 0.1542666

37 / 49

Page 85: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Subclassification

insured_low_mean - uninsured_low_mean

## [1] 0.1010293

insured_mid_mean - uninsured_mid_mean

## [1] 0.04954519

insured_high_mean - uninsured_high_mean

## [1] 0.1542666

Consequence: Once we control for income, the effects of insurance onhealth are much smaller than in the naive difference-in-meanscomparison.

38 / 49

Page 86: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Cross-sectional designs summary

1. Cross-sectional data (one observation per unit) is very common inthe social sciences

2. Using statistical “control” strategies is a common approach forattempting to overcome confounding bias

3. Strong assumptions required: we must control for all potentialconfounders to be confident in the causal claims we are making

4. We will revisit these designs in weeks 4, 5 and 6, particularly in thecontext of regression models

39 / 49

Page 87: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Before and after designs

• Data: Many observations of treated units over time

• Approach: Difference in average outcomes before and aftertreatment

• Assumption: No time varying confounding/selection bias

• Example: A health care reform that provides all citizens withcoverage at a given point in time.

40 / 49

Page 88: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Before and after designs

0.0

0.5

1.0

1.5

2.0

2.5

Ave

rage

Hea

lth

Pre−treatment Reform Post−treatment

41 / 49

Page 89: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Before and after designs

0.0

0.5

1.0

1.5

2.0

2.5

Ave

rage

Hea

lth

Pre−treatment Reform Post−treatment

Treatment effect?

41 / 49

Page 90: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Before and after designs

Advantages:

• Holds unit-specific features “constant” by comparing outcomeswithin units over time

• Only requires data on treated units

Disadvantages:

• Requires assuming that nothing else changed for treated units

• E.g. Maybe health outcomes are just getting better over time?

• Requires data from multiple time points

42 / 49

Page 91: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Difference-in-differences designs

• Data: Many observations of many units, some of which are treatedat some point in time

• Approach: Difference in average outcomes before and aftertreatment for treated group, compared to same difference forcontrol group

• Assumption: Without treatment, treatment group outcomes wouldhave mirrored those in control group (“parallel trends”)

• Example: A health care reform that provides citizens of some stateswith health care coverage at a given point in time. Healthcare ofcitizens in non-treated states is not affected.

43 / 49

Page 92: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Difference-in-differences designs

0.0

0.5

1.0

1.5

2.0

2.5

Ave

rage

Hea

lth

Pre−treatment Reform Post−treatment

Treated states

44 / 49

Page 93: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Difference-in-differences designs

0.0

0.5

1.0

1.5

2.0

2.5

Ave

rage

Hea

lth

Pre−treatment Reform Post−treatment

Treated states

Control states

44 / 49

Page 94: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Difference-in-differences designs

0.0

0.5

1.0

1.5

2.0

2.5

Ave

rage

Hea

lth

Pre−treatment Reform Post−treatment

Treated states

Control states

Counterfactual

44 / 49

Page 95: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Difference-in-differences designs

0.0

0.5

1.0

1.5

2.0

2.5

Ave

rage

Hea

lth

Pre−treatment Reform Post−treatment

Treatment effect

44 / 49

Page 96: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Difference in differences designs

Advantages:

• Holds unit-specific features “constant” by comparing outcomeswithin units over time

• Removes confounding by shocks that are common to all units(treatment and control)

• Does not assume that outcomes remain constant over time

Disadvantages:

• Requires assuming that nothing else changed only for treated units

• E.g. Did other policies change in treatment states at the same time?

• Requires data from multiple time points and multiple types of units

45 / 49

Page 97: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Observational designs and assumptions

Each of these approaches to using observational data requireassumptions.

• Cross-sectional: No confounding

• Treatment and control groups should be similar in observed andunobserved characteristics

• Before/after: No time-varying confounding

• Without treatment, treatment units would not have changedbetween pre- and post-treatment periods

• Difference-in-differences: Parallel trends

• Without treatment, treatment units would have followed a similartrajectory as control units

46 / 49

Page 98: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

External and internal validity

Internal validity: Can we interpret estimates in a study as causal?

• Randomized experiments have high internal validity→ treatmentand control groups are similar on average

• Observational studies have lower internal validity→ pre-treatmentvariables may differ between treatment and control

External validity: How generalizable are the conclusions of a study?

• Randomized experiments tend to have lower external validity→difficult to conduct on representative samples

• Observational studies tend to have higher external validity→easier to use representative samples or the population itself

Picking a research design requires trading off between these goals.

47 / 49

Page 99: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Conclusions

Page 100: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

What have we covered?

• Causality can be understood as a process of counterfactualreasoning

• We can understand causal effects as being the differences inpotential outcomes for units with different treatment statuses

• Randomized experiments are the “gold standard” for causalinference

• Observational approaches can also be used for making causalclaims, but they require strong assumptions

• Causal inference is hard. But at least now you know that it is hard.

48 / 49

Page 101: PUBL0055:IntroductiontoQuantitativeMethods Lecture2 ... · insured health age female non_white years_educ income Uninsured 3.6 41.0 46.4 17.8 11.6 42892.6 Insured 4.0 43.3 49.2 15.3

Seminar

In seminars this week, you will learn about …

1. … working directories

2. … loading data into R

3. … calculating causal effects

4. … thinking critically about causality in empirical research

49 / 49