Chapter 10 Sample Selection Bias


Post on 20-Oct-2021


Page 1: Chapter 10 Sample Selection Bias

Chapter 10: Sample Selection Bias

Page 2: Chapter 10 Sample Selection Bias

Learning Objectives

• Articulate in words the implications of a nonrepresentative sample

• Explain using mathematics the implications of a nonrepresentative sample

• Apply a model of sample selection to correct sample selection bias

• Describe the experiment you would like to run if you could

Page 3: Chapter 10 Sample Selection Bias

Back to CA Elementary Schools

[Figure: API Score and Free Lunch Eligibility. y-axis: API Score (500 to 1000); x-axis: FLE (%) (0 to 100). Population: y = -1.80x + 925.3]

Page 4: Chapter 10 Sample Selection Bias

Selecting Observations with Positive Errors

[Figure: A Non-Representative Sample of API Score and Free Lunch Eligibility. y-axis: API Score (500 to 1000); x-axis: FLE (%) (0 to 100). Population: y = -1.80x + 925.3. Sample: y = -1.83x + 950.4]

Page 5: Chapter 10 Sample Selection Bias

Selecting Observations from Schools with Few College Graduates

[Figure: API Score and Free Lunch Eligibility (FLE). y-axis: API Score (500 to 1000); x-axis: FLE (%) (0 to 100). Population: y = -1.80x + 925.3. Selected schools: y = -0.6937x + 815.77]

Page 6: Chapter 10 Sample Selection Bias

Selecting Observations Based on API

[Figure 10.3. Subpopulation with API>900. y-axis: API Score (500 to 1000); x-axis: FLE (%) (0 to 100). Population: y = -1.80x + 925.3. High API Subpopulation: y = -0.52x + 940.9]

Page 7: Chapter 10 Sample Selection Bias

Selecting Observations Based on FLE

[Figure 10.5. Subpopulation with FLE<50%. y-axis: API Score (500 to 1000); x-axis: FLE (%) (0 to 100). Population: y = -1.80x + 925.3. Low FLE Subpopulation: y = -1.81x + 925.7]

Page 8: Chapter 10 Sample Selection Bias

Examples of Non-Representative Sample Selection

• Y = wages, X = education

– What if you sample only women? … only the employed?

• Y = GDP growth, X = interest rates

– What if you use data from the 1970s to predict 2017?

• Y = body mass index, X = food stamps

– What if you use data only from those who got food stamps?

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, $\operatorname{cov}[X_i, \varepsilon_i] = 0$

Answer: Depends on $\beta_1$ in the population you sampled.

Page 9: Chapter 10 Sample Selection Bias

Sample Selection Bias

$API_i = \beta_0 + \beta_1 FLE_i + \varepsilon_i$, $\operatorname{cov}[FLE_i, \varepsilon_i] = 0$

• Non-representative sampling causes bias if $\operatorname{cov}[FLE_i, \varepsilon_i] \neq 0$ in the population you are sampling from

Page 10: Chapter 10 Sample Selection Bias

Two Populations

• Population model of interest:

$API_i = \beta_0 + \beta_1 FLE_i + \varepsilon_i$, $\operatorname{cov}[FLE_i, \varepsilon_i] = 0$

• If you sample from districts with low parental education levels, your population model is

$API_i = \beta_0^{low} + \beta_1^{low} FLE_i + \varepsilon_i^{low}$, $\operatorname{cov}[FLE_i, \varepsilon_i^{low}] = 0$

• Is $\operatorname{cov}[FLE_i, \varepsilon_i] = 0$ within the low population?

Page 11: Chapter 10 Sample Selection Bias

Sample Selection Bias

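• Put the models together:

$API_i = \beta_0 + \beta_1 FLE_i + \varepsilon_i$

$API_i = \beta_0^{low} + \beta_1^{low} FLE_i + \varepsilon_i^{low}$

$\beta_0 + \beta_1 FLE_i + \varepsilon_i = \beta_0^{low} + \beta_1^{low} FLE_i + \varepsilon_i^{low}$

• Or

$\varepsilon_i = (\beta_0^{low} - \beta_0) + (\beta_1^{low} - \beta_1) FLE_i + \varepsilon_i^{low}$

• If $\beta_1 \neq \beta_1^{low}$, then $\operatorname{cov}[FLE_i, \varepsilon_i] \neq 0$ for the low population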
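The last step can be spelled out in one line. This is a sketch that assumes the error of the low-population model is uncorrelated with FLE within that subpopulation (as stated on the Two Populations slide); the subscript "low" on cov and var is notation introduced here to mean moments taken within the low population.

```latex
\operatorname{cov}_{low}[FLE_i, \varepsilon_i]
  = \operatorname{cov}_{low}\!\left[FLE_i,\ (\beta_0^{low} - \beta_0) + (\beta_1^{low} - \beta_1) FLE_i + \varepsilon_i^{low}\right]
  = (\beta_1^{low} - \beta_1)\,\operatorname{var}_{low}[FLE_i]
```

The constant term drops out of the covariance and the last term is zero by assumption, so the covariance is nonzero exactly when the two slopes differ and FLE varies in the low population. Dividing by the variance gives the size of the slope bias: least squares on the low sample recovers $\beta_1 + (\beta_1^{low} - \beta_1) = \beta_1^{low}$ rather than $\beta_1$.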

Page 12: Chapter 10 Sample Selection Bias

Sample Selection Bias

• Selecting your sample in a way that makes the error correlated with the explanatory variable will cause bias in your slope parameter.

$API_i = \beta_0 + \beta_1 FLE_i + \varepsilon_i$

$API_i = \beta_0^{low} + \beta_1^{low} FLE_i + \varepsilon_i^{low}$

Page 13: Chapter 10 Sample Selection Bias

Sample Selection Bias

• Selecting your sample in a way that makes the error correlated with the explanatory variable will cause bias in your slope parameter.

• Return to figures from earlier

– Selecting only if $\varepsilon_i > 0$ => no bias

– Selecting only if low parental education => bias

– Selecting only if API large => bias

– Selecting only if FLE low => no bias

$API_i = \beta_0 + \beta_1 FLE_i + \varepsilon_i$
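The four selection rules on this slide can be tried out on simulated data. The sketch below is not the California schools data: the population line borrows the slides' fitted coefficients, but the error standard deviation, the uniform FLE distribution, the `college` variable (a made-up stand-in for parental education), and the cutoff of 25 are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Synthetic population (assumed values, loosely matching the slides' fitted line).
fle = rng.uniform(0, 100, n)   # free lunch eligibility, percent
eps = rng.normal(0, 40, n)     # error, independent of FLE in the population
api = 925.3 - 1.80 * fle + eps

# Hypothetical share of college-graduate parents: falls with FLE and rises with
# the API error, so selecting on it is selecting on a variable left out of the model.
college = 60 - 0.5 * fle + 0.5 * eps + rng.normal(0, 5, n)


def fit(mask):
    """OLS slope and intercept of API on FLE within the selected subsample."""
    slope, intercept = np.polyfit(fle[mask], api[mask], 1)
    return slope, intercept


rules = {
    "full population": np.ones(n, dtype=bool),
    "positive errors (eps > 0)": eps > 0,
    "few college graduates (college < 25)": college < 25,
    "high API (API > 900)": api > 900,
    "low FLE (FLE < 50)": fle < 50,
}

results = {name: fit(mask) for name, mask in rules.items()}
for name, (slope, intercept) in results.items():
    print(f"{name:40s} slope = {slope:6.2f}   intercept = {intercept:7.1f}")
```

In this setup the slope should stay close to -1.80 when selecting on positive errors (only the intercept moves up) or on low FLE, and should flatten when selecting on the college variable or on high API, which is the pattern the bullets describe.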

Page 14: Chapter 10 Sample Selection Bias

Solutions to Sample Selection Bias

• Get better data

– e.g., run your own experiment

• Make some extra assumptions and do a Heckman correction

Page 15: Chapter 10 Sample Selection Bias

What experiment would you run?

• Goal: Estimate whether offering free lunch to low-income families improves test scores (and how much it improves them)

• ?

Page 16: Chapter 10 Sample Selection Bias

Example of Correction for Sample Selection

• Mexican migrants to the US send more than $25b per year back to family and friends in Mexico

• What determines migrant remittances?

• Education, family size, gender, work experience?

Page 17: Chapter 10 Sample Selection Bias

A Model of Remittances

$R_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \varepsilon_i$

• Potential selection problem:

– we only observe remittances from those who migrated

– may not be representative of potential future migrants

• What is our population of interest?

– Mexicans already in the US? No problem

– Potential migrants? Selection problem

Page 18: Chapter 10 Sample Selection Bias

A Model of Remittances

• We observe

$R_i = \begin{cases} \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \varepsilon_i & \text{if } I_i = 1 \\ 0 & \text{if } I_i = 0 \end{cases}$

• We don’t see the remittances from non-migrants

• But we can model the probability of migration if we have data on non-migrants

$I_i = \alpha_0 + \alpha_1 Z_{1i} + \alpha_2 Z_{2i} + \alpha_3 Z_{3i} + u_i$

Page 19: Chapter 10 Sample Selection Bias

Heckman Solution

• Population model

$R_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \varepsilon_i$

• Estimate this model

$R_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 \hat{I}_i + \varepsilon_i$

• The selection model controls for the probability that the person is in the migrant sample
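The two steps on this slide can be sketched on synthetic data. Everything below is an assumption made for illustration: the variable names, the data-generating values, and the use of a linear probability model for the first step. It follows the slide's version, which adds the predicted probability $\hat{I}_i$ as a regressor. The textbook Heckman two-step instead fits a probit in the first step and adds the inverse Mills ratio, which this sketch does not do.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# Synthetic households (all values are made up for illustration).
hist = rng.binomial(1, 0.3, n)   # migrant from the household in the past
male = rng.binomial(1, 0.5, n)
educ = rng.integers(0, 17, n)    # years of education, 0 to 16

# Correlated errors: whatever pushes someone to migrate also shifts remittances.
u = rng.normal(0, 1, n)
eps = 800 * u + rng.normal(0, 500, n)

migrate = (-0.8 + 0.6 * hist + 0.5 * male + u) > 0
# Remittances are observed only for migrants; everyone else is recorded as 0.
remit = np.where(migrate, 500 + 900 * male + 40 * educ + eps, 0.0)

# Step 1: model migration using everyone (linear probability model, an assumption).
ones = np.ones(n)
Z = np.column_stack([ones, hist, male, educ])
alpha, *_ = np.linalg.lstsq(Z, migrate.astype(float), rcond=None)
p_hat = Z @ alpha                # predicted probability of migrating

# Step 2: remittance regression on migrants only, without and with p_hat.
m = migrate
X_a = np.column_stack([ones[m], male[m], educ[m]])
X_b = np.column_stack([X_a, p_hat[m]])
beta_a, *_ = np.linalg.lstsq(X_a, remit[m], rcond=None)
beta_b, *_ = np.linalg.lstsq(X_b, remit[m], rcond=None)

print("Model A (const, male, educ):       ", np.round(beta_a, 1))
print("Model B (const, male, educ, p_hat):", np.round(beta_b, 1))
```

The first step needs at least one variable (here `hist`) that affects migration but is left out of the remittance equation; otherwise `p_hat` would be an exact linear combination of the other regressors in Model B. The sketch reports coefficients only and does not compute standard errors.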

Page 20: Chapter 10 Sample Selection Bias

Model of Migration

Variable                                  Estimated Coefficient   t-Statistic
Migrant from the household 5 years ago    0.135                   26.96
Number of family members                  0.000                   0.35
Gender (1 = male, 0 = female)             0.118                   17.55
Years of education                        0.000                   0.04
Potential work experience                 0.001                   1.42
Potential work experience-squared         0.000                   –4.03

Page 21: Chapter 10 Sample Selection Bias

Modeling Remittances

                                     Model A: Without Selection    Model B: With Selection
Variable                             Coefficient    t-Statistic    Coefficient    t-Statistic
# children in household              175.88         1.37           115.75         0.89
Gender (= 1 if male)                 967.62         2.32           1,416.67       3.17
Years of education                   39.87          0.57           34.58          0.5
Potential work experience            48.17          0.79           69.53          1.13
Potential work experience-squared    –0.76          –0.64          –1.27          –1.06
Constant                             –2,408.17      –0.5           –2,336.85      –0.49
Predicted probability                NA                            –3,516.74      –2.77

Page 22: Chapter 10 Sample Selection Bias

Remittances

• People with migrants in their family are more likely to migrate. Males are also more likely to migrate.

• In the migrant sample (Model A), males remit 968 more than females on average

– Likely migrants (e.g., those with family history of migration) remit less

– Randomly selected males would not have this family history and would remit more (1417 in Model B)

• Education and work experience don’t affect remittances significantly

Page 23: Chapter 10 Sample Selection Bias

What We Learned

• Nonrepresentative samples cause estimates of population coefficients to be biased if you sample from a subpopulation that has nonzero correlation between the X variable and errors.

– Examples include sampling on outcomes of the Y variable and (sometimes) selection on X variables not in the model.

• If you can build a model of selection into the sample, you may be able to correct for selection bias.

• Experiments can solve the sample selection problem in theory. Imagining the perfect experiment can help you design a defensible econometric modeling strategy.