
Economics 2810a Lawrence Katz Handout for Lecture #7 9/22/14

PROGRAM EVALUATION: METHODS AND APPLICATIONS

Outline

1. The Program Evaluation Problem
--Basic Set-Up, Selection Bias, and Different Treatment Effects: ITT vs. TOT
--Randomized Social Experiments and Eligibility Randomization
--Example: Moving to Opportunity – Kling, Liebman, and Katz (2007 EMA), KKL (2001 QJE)
--Instrumental Variables and Local Average Treatment Effects
--Natural or Quasi Experiments
--Differences-in-Differences
2. Estimating the Labor Market Returns to Training Programs
--Ashenfelter-Card (1985): Traditional Nonexperimental Methods -- CETA
--LaLonde (1986): Experimental vs. Nonexperimental Methods -- NSW
--Dehejia-Wahba (1999): Propensity Score Methods -- NSW
3. Regression Discontinuity Methods – Imbens and Lemieux (2008)
--Ludwig and Miller (2007): Estimating the Long-Run Impacts of Head Start
4. Estimating the Effects of Class Size on Student Achievement
--Krueger (1999): Experimental – Tennessee STAR
--Angrist and Lavy (1999): Regression Discontinuity in Cross-Section Setting
--Hoxby (2000): Regression Discontinuity in Panel Setting
5. Estimating Teacher Impacts on Student Achievement – Kane and Staiger (2008)
6. Estimating the Effects of Attending a Catholic High School
--Altonji et al. (2005): Sorting on Unobservables "Similar" to Sorting on Observables

I. The Program Evaluation Problem: Basic Set-Up, Selection Bias, and Treatment Effects

An individual can be in either a treated state "1" or an untreated state "0":

Y0i = outcome for i without the treatment or program
Y1i = outcome for i with the treatment or program
di = 1 if i receives the treatment and 0 if not

The actual observed outcome for i is given by:

Yi = diY1i + (1-di)Y0i

The causal effect (gain or loss) from treatment for an individual i is given by

αi = Y1i - Y0i

Average Treatment Effect (ATE) = expected gain to a randomly selected person from the entire population:

ATE = E[Y1i - Y0i]

Mean Effect of Treatment on the Treated (TOT) or Selected Average Treatment Effect (SATE):

TOT = E[Y1i - Y0i | di=1] = E[αi | di=1]

Selection bias problem in estimating TOT: The standard approach (ignoring covariates) is to compare the mean post-program outcome (earnings) of the treatment group (trainees) and the comparison group (non-trainees):


E[Yi|di=1] - E[Yi|di=0]
= E[Y1i|di=1] - E[Y0i|di=1] + {E[Y0i|di=1] - E[Y0i|di=0]}
= E[αi|di=1] + {E[Y0i|di=1] - E[Y0i|di=0]}

The first term is the parameter of interest (the TOT), but the term in brackets is the selection bias term. In the case of random assignment of treatment it is zero (up to sampling error when population moments are replaced by sample moments). If assignment is nonrandom, then omitted variables that affect both Y0i and selection into the program will generate selection bias. Selection bias arises when the non-participants differ from the participants in the non-participant state.

Linear Model with Constant Treatment Effect (simplifying assumption):

Y0i = Xiβ + ui

where Xi are observed covariates and ui are unobserved outcome determinants, with E[ui|Xi] = 0 by construction. The actual outcome for i (Yi) is then given by:

Yi = Y0i + diα (constant treatment effect assumption)

The goal is to estimate α. Selection bias is present (ignoring observed covariates for the moment) if E[Y0i | di=1] ≠ E[Y0i | di=0], or E[ui | di] ≠ 0. Selection bias is present with observed covariates if E[ui | di,Xi] ≠ 0. If ui is correlated with selection into the program even after conditioning on observed covariates, then selection bias will bias estimates of program effects.
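The decomposition above can be illustrated with a minimal simulation (not part of the handout; all parameter values are illustrative): individuals with low untreated outcomes select into the program, so the naive treatment-comparison difference equals the TOT plus a negative selection-bias term.

```python
import numpy as np

# Illustrative simulation of the selection-bias decomposition:
# naive comparison = TOT + {E[Y0|d=1] - E[Y0|d=0]}.
rng = np.random.default_rng(0)
n = 200_000

y0 = rng.normal(10.0, 2.0, n)                 # untreated outcome
alpha = 1.0                                   # constant treatment effect
y1 = y0 + alpha                               # treated outcome
# Low-Y0 individuals are more likely to enter the program (nonrandom selection).
d = (y0 + rng.normal(0.0, 2.0, n) < 10.0).astype(int)
y = d * y1 + (1 - d) * y0                     # observed outcome

naive = y[d == 1].mean() - y[d == 0].mean()
tot = (y1 - y0)[d == 1].mean()
bias = y0[d == 1].mean() - y0[d == 0].mean()
print(naive, tot, bias, tot + bias)           # naive estimate equals TOT + bias
```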


More General Model with Covariates, with outcomes a function of observables (X) and unobservables (u1, u0):

Y1i = g1(Xi) + u1i
Y0i = g0(Xi) + u0i, where E(u1i) = E(u0i) = 0

TOT = E[Y1i - Y0i | Xi, di=1] = E(αi | Xi, di=1) = g1(Xi) - g0(Xi) + E[u1i - u0i | Xi, di=1]

The TOT combines both the structure (the g0 and g1 functions) and the means of the error terms for the treated in this more general set-up. Experimental (random assignment) approaches allow the identification of the TOT but, without further assumptions, do not necessarily allow one to identify the underlying structural parameters g1 and g0.

Approaches to Estimating the Average Treatment Effect on the Treated (TOT):

1. Randomized Social Experiment: Random assignment of treatment among applicants to programs (those that would have participated). A randomized social experiment generates an experimental control group consisting of those persons who would have participated but were randomly denied access to the program or treatment -- this requires no randomization bias, i.e., that randomization does not change the pool of applicants or behavior per se. The control group provides an estimate of E[Y0|d=1]. Compare sample means of the treatment and control groups in the experiment.

Under ideal conditions, social experiments recover F(Y0 | d=1,X) from the distribution of outcomes of the control group and F(Y1 | d=1,X) from the outcomes of the treatment group, if randomization is administered at the application stage, there is no attrition, and there is no randomization bias. The experiment solves the missing-data problem by providing an estimate of E[Y0|d=1,X] from the sample mean for the control group, to combine with E[Y1|d=1,X] from the mean of the treatment group (which is also available in observational studies). Thus the TOT can be estimated, but one can't recover the overall distribution of gains (treatment effects) F(α|d=1,X) without stronger additional assumptions.

How does randomization identify the TOT? Drop i subscripts for ease of presentation. Let

d* = 1 if the person applies to the program (would participate unless randomized out)
R = 1 if randomized in, R = 0 if randomized out

Assumption of no randomization bias: Let Y1* and Y0* be the outcomes observed under a regime of randomization. Absence of randomization bias for the mean gain in the program implies:

E[Y1 | d=1,X] = E[Y1* | d=1,X]
E[Y0 | d=1,X] = E[Y0* | d=1,X]

Randomization operates conditional on d*=1, which is appropriate since we are trying to get the mean treatment effect for those who would participate in the program.

E[Y | d*=1, R=1, X] = E[Y1 | d=1,X] = g1(X) + E[u1 | d=1,X]
E[Y | d*=1, R=0, X] = E[Y0 | d=1,X] = g0(X) + E[u0 | d=1,X]

Thus the difference of the means of the treatment group and control group yields the TOT:

E[Y | d*=1, R=1, X] - E[Y | d*=1, R=0, X] = E[Y1 - Y0 | d=1,X] = E[α | d=1,X]
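Under randomization at the application stage, estimating the TOT therefore reduces to a treatment-control difference in sample means. A minimal sketch (hypothetical variable names; y is the outcome and r the randomization indicator among applicants):

```python
import numpy as np

def experimental_tot(y, r):
    """Treatment-control difference in mean outcomes among applicants (d* = 1),
    with a conventional standard error for the difference in means."""
    y = np.asarray(y, dtype=float)
    r = np.asarray(r, dtype=int)
    y_t, y_c = y[r == 1], y[r == 0]
    diff = y_t.mean() - y_c.mean()
    se = np.sqrt(y_t.var(ddof=1) / y_t.size + y_c.var(ddof=1) / y_c.size)
    return diff, se
```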


2. Eligibility Randomization: Randomization of eligibility to a program is sometimes a less disruptive approach to implementing a social experiment. Under this approach, eligibility is randomly assigned (say across hospitals or training centers), but then individuals and program operators can freely choose to participate. Eligibility randomization allows one to directly estimate the mean effect of eligibility for the program on the population included in the experiment: the effect of eligibility for the program on outcomes is known as the Intent-to-Treat (ITT) effect.

Consider a population of persons normally eligible for a program. Let e=1 if the person is kept eligible after randomization and e=0 if the person loses eligibility. Let d* equal "willingness to participate": d*=1 if the person will participate when eligible. Assume eligibility e is randomly assigned. Ignore other covariates. Actual participation is d = ed*: a person participates only if eligible and willing to participate.

Intent-to-Treat Effect (ITT) = E[Y | e=1] - E[Y | e=0] = the difference in mean outcomes for eligibles and ineligibles when eligibility is randomly assigned.

Randomization of eligibility directly allows the estimation of the ITT. But can one estimate the TOT from an eligibility randomization experiment? Yes, but one needs additional assumptions. The TOT can be estimated from an eligibility randomization experiment under the assumptions that (1) treatment group (eligibility) assignment is truly random; (2) the effect of treatment group assignment on outcomes operates only through participating in the program (e.g., using a housing voucher or getting training), with no direct effect of eligibility per se; and (3) control group members (the ineligibles) cannot participate in the program. Under these assumptions, the difference in average outcomes of eligibles and ineligibles divided by the fraction of eligibles who participate provides an unbiased estimate of the TOT:

TOT = E[Y1 - Y0 | d=1] = (E[Y|e=1] - E[Y|e=0])/P(d=1|e=1) = ITT/P(d=1|e=1)

where P( ) is the probability function, so that P(d=1 | e=1) is the program participation rate.

Proof:
(*) E[Y|e=1] = E[Y1|d=1,e=1]P(d=1|e=1) + E[Y0|d=0,e=1]P(d=0|e=1)
            = E[Y1|d*=1,e=1]P(d*=1|e=1) + E[Y0|d*=0,e=1]P(d*=0|e=1)
(**) E[Y|e=0] = E[Y0|d*=1,e=0]P(d*=1|e=0) + E[Y0|d*=0,e=0]P(d*=0|e=0)

But P(d=1|e=1) = P(d*=1|e=1) = P(d*=1|e=0) and P(d=0|e=1) = P(d*=0|e=1) = P(d*=0|e=0); E[Y1|d=1] = E[Y1|d=1,e=1] from eligibility randomization; and E[Y0|d=0,e=1] = E[Y0|d*=0,e=1] = E[Y0|d*=0,e=0] since there are no direct effects of eligibility. So the result follows from subtracting (**) from (*) and then substituting.
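A minimal sketch of the ITT and the implied TOT from an eligibility randomization experiment, under assumptions (1)-(3) above (the function name and the toy data-generating process are illustrative, not from the handout):

```python
import numpy as np

def itt_and_tot(y, e, d):
    """ITT = difference in mean outcomes by randomized eligibility e;
    TOT = ITT divided by the participation rate among eligibles, P(d=1|e=1)."""
    y, e, d = (np.asarray(a) for a in (y, e, d))
    itt = y[e == 1].mean() - y[e == 0].mean()
    return itt, itt / d[e == 1].mean()

# Toy example: only "willing" individuals (d* = 1) take up the program when eligible.
rng = np.random.default_rng(1)
n = 100_000
e = rng.binomial(1, 0.5, n)                   # randomized eligibility
d_star = rng.binomial(1, 0.6, n)              # willingness to participate
d = e * d_star                                # actual participation d = e*d*
y = 5.0 + 2.0 * d + rng.normal(0.0, 1.0, n)   # true TOT = 2.0
print(itt_and_tot(y, e, d))                   # ITT roughly 1.2, TOT roughly 2.0
```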


Instrumental Variable interpretation of the eligibility randomization experiment: Let Z be an instrument that affects participation d. Z is a legitimate instrument if it is unrelated to potential outcomes, so that Z only affects Y by affecting d:

E[Y0|Z=z] = E[Y0] and E[Y1|Z=z] = E[Y1]

and participation d is a non-trivial function of Z: E[d|Z=z] is a non-trivial function of z. If there exists a set of values z0 of Z that occurs with positive probability and under which Pr[di=1 | Zi ∈ z0] = 0, then one can estimate the TOT by defining e=1 for Z not in z0 and e=0 for Z in z0.

What can one estimate with a legitimate instrument that affects the probability of participation and can be excluded from the outcomes equation? The answer is a Local Average Treatment Effect (LATE): the average treatment effect on those who can be induced to change their behavior by a change in the instrument. Zi is a random variable where P(w) = E[di | Zi = w] is a non-trivial function of w. Let treatment for i depend on the value of the instrument Z: di = di(Zi).

LATE = αz,w = E[Y1i - Y0i | di(z) ≠ di(w)]

= the expected treatment effect for individuals who change treatment status as the instrument changes value from w to z.

The LATE can be identified if Z is a legitimate instrument (can be excluded from the Y equations) and the monotonicity condition holds: di(z) ≥ di(w) for all i, or di(z) ≤ di(w) for all i. Thus we are assuming that there are no "defiers" in the terminology of Angrist-Imbens-Rubin (1996). Assume di(z) ≥ di(w). Then:

(***) E[Yi|Zi=z] - E[Yi|Zi=w] = (P(z) - P(w)) * E[Y1i - Y0i | di(z) - di(w) = 1]

The LATE is consistently estimated by the ratio of the difference in sample mean outcomes for those with instrument values z and w to the difference in the fractions who are treated.

Proof: The monotonicity condition allows the expected value of Y given the instrument to be decomposed across three groups: never-takers (share 1-P(z)), compliers (share P(z)-P(w)), and always-takers (share P(w)). The LATE is the average treatment effect for the compliers (those who change treatment status with different values of the instrument).

E[Yi|Zi=z] = {P(z)-P(w)}*E[Y1i | di(z)-di(w)=1] + P(w)*E[Y1i | di(z)=di(w)=1] + {1-P(z)}*E[Y0i | di(z)=di(w)=0]
E[Yi|Zi=w] = {P(z)-P(w)}*E[Y0i | di(z)-di(w)=1] + P(w)*E[Y1i | di(z)=di(w)=1] + {1-P(z)}*E[Y0i | di(z)=di(w)=0]

Subtracting the second equation from the first yields equation (***).

Case of a binary instrument Z:

LATE = E[Y1i - Y0i | di(1) - di(0) = 1] = (E[Yi|Zi=1] - E[Yi|Zi=0])/(P(1) - P(0))

One can't estimate the LATE if there exists a fourth category of "defiers" (those with di(z)=0 but di(w)=1), which arises with a failure of the monotonicity assumption -- see Angrist, Imbens, and Rubin (JASA, 1996).

Let Yi = Yi(Zi, di). Exclusion restriction for the instrument Z: Yi(1, d) = Yi(0, d) for d = 0, 1. The instrument Z only affects Y through d.

Causal Effects of Z on Y for Population Units Classified by di(0) and di(1)

                         di(0) = 0                                di(0) = 1
di(1) = 0   Yi(1,0) - Yi(0,0) = 0                    Yi(1,0) - Yi(0,1) = -(Yi(1) - Yi(0))
            Never-Taker                              Defier
di(1) = 1   Yi(1,1) - Yi(0,0) = Yi(1) - Yi(0)        Yi(1,1) - Yi(0,1) = 0
            Complier                                 Always-Taker
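A small simulation (not from the handout; group shares and effect sizes are illustrative) of the binary-instrument case: with never-takers, compliers, and always-takers but no defiers, the Wald ratio recovers the average effect for compliers, i.e., the LATE.

```python
import numpy as np

# Three latent groups, no defiers; the Wald ratio targets the complier effect.
rng = np.random.default_rng(2)
n = 300_000

group = rng.choice(["never", "complier", "always"], size=n, p=[0.3, 0.4, 0.3])
z = rng.binomial(1, 0.5, n)                      # binary instrument
d0 = (group == "always").astype(int)             # d_i(0)
d1 = (group != "never").astype(int)              # d_i(1)
d = np.where(z == 1, d1, d0)                     # observed treatment

y0 = rng.normal(0.0, 1.0, n)
effect = np.where(group == "complier", 2.0,      # complier effect = 2.0
                  np.where(group == "always", 0.5, 1.0))
y = y0 + effect * d                              # observed outcome

wald = (y[z == 1].mean() - y[z == 0].mean()) / (d[z == 1].mean() - d[z == 0].mean())
print(wald)                                      # approximately 2.0 = LATE (complier effect)
print(d[z == 1].mean() - d[z == 0].mean())       # approximately 0.4 = complier share
```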


MTO Example (Eligibility Randomization Social Experiment):

C = 1 = complier (d*=1); C = 0 = non-complier (d*=0)

ITT = E[Y | Z=1] - E[Y | Z=0]

TOT = E[Y | C=1, Z=1] - E[Y | C=1, Z=0] = ITT/P[d=1 | Z=1] = ITT/P[C=1]

The TOT is the estimated difference in outcomes between those who actually use the program (the MTO voucher) and those in the Control group who would have used the program voucher had it been offered to them. To assess the magnitude of the TOT effect in relative as well as absolute terms, it is useful to have a benchmark level of the outcome in the absence of treatment for comparison. Using the mean outcome for treated compliers and the TOT difference, one can impute the Control Complier Mean outcome (CCM):

CCM = E[Y|C=1,Z=0] = E[Y|C=1,Z=1] - {E[Y|C=1,Z=1] - E[Y|C=1,Z=0]} = E[Y|C=1,Z=1] - TOT

Although E[Y|C=1,Z=0] is not directly observable, E[Y|C=1,Z=1] and the TOT can be estimated.
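A minimal sketch of these MTO-style calculations (hypothetical variable names: y the outcome, z the randomized treatment-group indicator, d an indicator for actually using the voucher; it assumes one-sided non-compliance as in the text, so controls cannot use the program voucher):

```python
import numpy as np

def itt_tot_ccm(y, z, d):
    """ITT, TOT = ITT / P(C=1), and the imputed Control Complier Mean
    CCM = E[Y | C=1, Z=1] - TOT."""
    y, z, d = (np.asarray(a) for a in (y, z, d))
    itt = y[z == 1].mean() - y[z == 0].mean()
    p_comply = d[z == 1].mean()                            # P(C=1) = P(d=1 | Z=1)
    tot = itt / p_comply
    treated_complier_mean = y[(z == 1) & (d == 1)].mean()  # E[Y | C=1, Z=1]
    return itt, tot, treated_complier_mean - tot           # last entry is the CCM
```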


II. Other Issues in Interpreting Program Evaluation Results:

1. Displacement Effects
2. Spillover Effects
3. Marginal Effects

III. Alternative Nonexperimental Estimators: Linear Models with Access to Pre- and Post-Program Information for a Training Program -- Latent Index Function Approach

In the absence of training, earnings are given by:

Yit = Xitβ + uit

We shall assume a training effect that is invariant across individuals but not across time, so αit = αt. Observed earnings for i at t can be written as:

(1) Yit = Xitβ + diαt + uit

We assume E[uit|Xit] = 0 for all i and t. Nonrandom assignment means selection bias can arise because of dependence between di and uit:

(2) E[uit|di,Xit] ≠ 0.

The decision-making rule for program assignment can be described in terms of a latent index function INi that depends on both observed (Zi) and unobserved (vi) covariates:

(3) INi = Ziγ + vi, where di = 1 iff INi > 0 and di = 0 otherwise.

Alternative nonexperimental estimators try to undo the dependence between uit and di by making alternative assumptions about the forms of equations (1), (2), and (3). Dependence between uit and di can arise for two reasons: (1) dependence between Zi and uit (selection on observables); and (2) dependence between vi and uit (selection on unobservables). Dependence on observables is easily solved by controlling for those observables; selection on unobservables is a more difficult problem.

B. Selection on observables (Zi): E(uit|di,Xi) ≠ 0 and E(uit|di,Xi,Zi) ≠ 0, but E(uit|di,Xi,Zi) = E(uit|Xi,Zi). In this case, controlling for the observed selection variables solves the selection bias problem. The only issue is getting the right functional form for the control function E(uit|Xi,Zi), which is then inserted into equation (1) and estimated by regression methods.

1. Propensity score and blocking approach -- Dehejia and Wahba (1999)
2. Exact match comparison approach if one can discretize the observables -- Card and Sullivan (1988)

C. Selection on unobservables (vi): E(uit|di,Xi) ≠ 0 and E(uit|di,Xi,Zi) ≠ E(uit|Xi,Zi). In this case, one needs assumptions about the distributions of vi, uit, and Zi to get an estimate of αt using Control Function Estimators (Heckits, controlling for the propensity score), or one needs an instrument (a variable in Z not included in X).


The availability of longitudinal data (pre- and post-program) allows one to try alternatives such as (1) fixed effects estimators assuming selection is based on the permanent earnings component; (2) the random-growth estimator, which allows individual-specific trends to affect selection; or (3) selection on transitory shocks -- see Ashenfelter and Card (1985) and Heckman and Hotz (1989). Benefits of pre-program data on the comparison group: Heckman et al. (1998, EMA).

"Natural or Quasi Experiments" (see Angrist-Krueger 1999; Meyer 1995): Natural experiments result when exogenous variation in the independent variables of interest is created by (1) sharp exogenous shocks to markets (baby boom, Black Death, Mariel boatlift); (2) institutional quirks (e.g., draft lotteries; Maimonides' rule for maximum class size in Israel); or (3) exogenous policy changes that affect some groups but not others (e.g., changes in the maximum UI benefit that leave the replacement rate unchanged for workers not at the maximum, in one state but not another).

Basic approach: A comparison of changes for treatment and comparison groups (differences-in-differences) or a further difference relative to placebo treatment and comparison groups (differences-in-differences-in-differences). All this can be done in a simple components-of-variance scheme (time effects, location effects, treatment group effects, placebo group effects, interaction terms) or by using an IV (instrumental variables) strategy in which one instruments for the treatment dummy variable with the natural experiment indicator variables.

IV estimates can be interpreted as natural experiments: legitimate instruments generate a natural experiment that assigns treatment in a manner independent of unobserved covariates:
--Vietnam draft lottery and the effects of military service on earnings (Angrist, 1990)
--Date of birth, compulsory schooling laws, and the returns to education (Angrist and Krueger, 1991)
--Mariel boatlift (Card, 1990) and the impact of mass immigration on local labor markets
--Prison overcrowding legislation to estimate the impacts of incarceration on crime (Levitt, 1996)

Diffs-in-Diffs and DDD examples: Mariel boatlift (high- and low-skill workers and low-skill immigration). Treatment city: Miami; placebo city: Atlanta (p). Experimentals: low-education workers. Controls: high-education workers.

                Before   After
Experimentals     E        E'
Controls          C        C'

Diff-in-diff = (E' - E) - (C' - C)
DDD = [(E' - E) - (C' - C)] - [(E'p - Ep) - (C'p - Cp)]

Regression approach to DDD with covariates (X), where S = skill group, A = after, and M = Miami:

Yit = Xitβ + Sitγ1 + Mitγ2 + Atγ3 + SitMitγ4 + SitAtγ5 + AtMitγ6 + SitAtMitγ7 + εit
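A sketch of this DDD regression on simulated micro data (all parameter values and the data-generating process are illustrative): the coefficient on the triple interaction S·A·M is the DDD estimate. The plain OLS standard errors ignore the grouped-error and serial-correlation issues noted just below.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 40_000
S = rng.binomial(1, 0.5, n)      # low-skill group indicator
A = rng.binomial(1, 0.5, n)      # post-period indicator
M = rng.binomial(1, 0.5, n)      # treated city (Miami) vs. placebo city
x = rng.normal(0.0, 1.0, n)      # covariate

true_ddd = -0.5
y = (1.0 * x - 0.8 * S + 0.3 * M + 0.2 * A + 0.1 * S * A + 0.05 * A * M
     + true_ddd * S * A * M + rng.normal(0.0, 1.0, n))

# Regressors: X, S, M, A, and the two-way and three-way interactions.
X = sm.add_constant(np.column_stack([x, S, M, A, S * M, S * A, A * M, S * A * M]))
fit = sm.OLS(y, X).fit()
print(fit.params[-1])            # coefficient on S*A*M: approximately -0.5
```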


Some Inference issues in DD:
(1) Grouped errors if using micro data (Moulton, Journal of Econometrics, 1986)
(2) Serial correlation (Bertrand, Duflo, and Mullainathan, QJE 2004)
(3) Are the identification assumptions plausible? Pre-existing trends?

How does one interpret natural experiments? They provide estimates of Local Average Treatment Effects: the impact on those whose treatment status is affected by the natural experiment (marginal impacts).

Key Issues with Instrumental Variables Models:
(1) Bad instruments / instrument legitimacy
(2) Weak instruments -- Bound, Jaeger, and Baker (JASA 1995)

IV. Estimating the Labor Market Impacts of Training Programs

Ashenfelter and Card (1985 ReStat): They compare the earnings histories of 1976 adult male enrollees in Comprehensive Employment and Training Act (CETA) training programs to the earnings histories of non-experimental comparison groups drawn from the March 1976 CPS (likely to be almost all non-enrollees) matched to Social Security earnings data. They choose CPS respondents who participated in the labor force in March 1976 and who have the same age distribution as the CETA trainees. Their goal is to estimate the impact of CETA training participation on earnings through regression adjustments that control for other differences between the trainees and the comparison group.

Table 1: The trainees are clearly not a random sample of adult males with earnings in 1976 -- trainees are less educated and have lower earnings than the comparison group, unlike the balance one would see between treatment and control groups in a randomized experiment. Table 1 also provides evidence of "Ashenfelter's dip": trainees experience an earnings decline in 1975, the year before they enter the program.

(1) Simple Differences-in-Differences Estimates: Suppose earnings yit for individual i in year t are given by

yit = θi + dt + Ditα + εit

where θi is a permanent component (individual fixed effect), dt is an economy-wide component (time fixed effect), Dit is a dummy variable for participation in training in period τ that takes on a value of 1 for trainees after τ and 0 otherwise, and εit is a serially uncorrelated error term (the transitory component of earnings). Assume selection into CETA training in τ is governed by the permanent earnings component θi:

Dit = 1 for t > τ iff θi < ȳ

where ȳ is a constant based on potential trainees' discount rates and tastes for training. In this case, a simple differences-in-differences estimate comparing the change in earnings for trainees between some pre-training period (τ-j) and the post-training period (τ+1) to the change in earnings over the same period for the comparison group provides an unbiased estimate of the training effect:

E[yiτ+1 - yiτ-j | Diτ+1=1] = (dτ+1 - dτ-j) + α
E[yiτ+1 - yiτ-j | Comparison Group] = (dτ+1 - dτ-j) + πα

where π is the fraction of the comparison group that participates in CETA training (contamination effect). If π is trivially small (approximately 0), then one gets an unbiased estimate of α through the differences-in-differences estimator:

E[yiτ+1 - yiτ-j | Diτ+1=1] - E[yiτ+1 - yiτ-j | Comparison Group] = α
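A minimal sketch of this differences-in-differences estimator (the data layout is hypothetical: an earnings matrix with one column per calendar year and a trainee indicator). Computing it for several base years is the specification check discussed next.

```python
import numpy as np

def did_training_effect(earnings, trainee, years, post_year, base_year):
    """Simple D-in-D estimate of the training effect: the change in mean
    earnings for trainees between base_year and post_year minus the same
    change for the comparison group.

    earnings : (n_people, n_years) array of annual earnings
    trainee  : (n_people,) 0/1 indicator for CETA trainees
    years    : list of the calendar years labeling the columns of earnings
    """
    earnings = np.asarray(earnings, dtype=float)
    trainee = np.asarray(trainee).astype(bool)
    change = earnings[:, years.index(post_year)] - earnings[:, years.index(base_year)]
    return change[trainee].mean() - change[~trainee].mean()

# Specification check (see below): the estimate should not depend on the base year, e.g.
# [did_training_effect(E, D, years, 1978, b) for b in (1972, 1973, 1974, 1975)]
```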


If multiple years of data are available, then there are multiple differences-in-differences estimates, which should all be equal up to sampling error. A test of the specification is thus a test of the equality of the alternative D-D estimates.

Table 2: The choice of the initial (base) year greatly affects the A&C estimates. Using 1975 as the base year makes the program look like it has a large positive effect (from Ashenfelter's dip and mean reversion); earlier base years yield negative estimates (reflecting the fact that trainees have flatter age-earnings profiles than the comparison group). Furthermore, Ashenfelter's dip strongly indicates that transitory earnings are likely to play a key role in training program entry, not just the permanent (average) earnings component. Shocks to earnings also appear to be serially correlated. Thus, one needs a more sophisticated model of earnings dynamics and program selection.

(2) Components of Variance Estimates: Assume that selection is based on an individual-specific fixed effect and an individual-year-specific disturbance term that displays first-order autoregressive serial correlation:

(i) yit = θi + dt + Ditα + εit, where εit = ρεit-1 + eit

and vi is a random variable independent of the earnings components. Training occurs iff:

yiτ-k + vi < ȳ

That is, training occurs iff

(ii) zi ≡ (θi - θ̄) + εiτ-k + vi < ȳ - θ̄ - dτ-k ≡ z̄, where θ̄ is the mean of θi.

Use the method of moments: Predict the means and covariances of the earnings of the comparison group and the trainees using (i) and (ii). Estimate the means and covariances using the comparison group and the trainee group (the sample moments). Match the estimated sample moments to the predicted moments to get parameter estimates. Use the parameter estimates to predict trainee earnings had they not received training. The difference between the predicted trainee earnings without training and the actual earnings of the trainees is the estimate of the effect of training, α.

For the comparison group, the means and covariances are the unconditional means and covariances from (i):

E(yit) = θ̄ + dt
cov(yit, yis) = σθ² + ρ^|t-s| σε²
var(yit) = σθ² + σε²

For the trainees, if we assume that θi, εit, and vi are jointly normally distributed, then the conditional mean is:

E[yit | zi < z̄] = E(yit) + Ditα + [cov(yit, zi)/var(zi)] E[zi | zi < z̄]
             = E(yit) + Ditα - (σθ² + ρ^|t-(τ-k)| σε²) λ*

where λ* = -E[zi | zi < z̄]/var(zi) > 0.


The mean of trainee earnings differs from the mean of comparison-group earnings by the training effect plus the sum of two components (a permanent component and a geometrically declining transitory component centered symmetrically around the selection period τ-k), each proportional to λ*. The model imposes the restriction that in the pre- and post-training periods the earnings of the trainees and comparisons diverge in a systematic pattern that depends on only one free parameter, λ*. The restrictions of the model are rejected: they fail to capture a systematically weaker trend in trainees' earnings than in comparison-group earnings. A&C therefore supplement the model with individual-specific earnings growth-rate trends (gi):

yit = θi + dt + git + Ditα + εit

This model does better, but much instability remains in the estimates.

LaLonde (1986 AER): A classic study in which the estimates of the impact of a training program from a randomized social experiment, the National Supported Work (NSW) demonstration, are used as a benchmark (the "true" estimates) against which to compare alternative non-experimental estimates using alternative comparison groups and econometric specifications.

Experimental Estimates: the difference in mean earnings of the NSW treatment group and the NSW controls (applicants randomized out of access to the program).

Non-experimental estimates: LaLonde throws away the experimental controls and uses comparison groups with longitudinal earnings histories from the PSID and from matched CPS-SSA earnings samples. He estimates differences-in-differences models, more detailed regression models, and Heckit (control function) models.

Key Insights: Experimental treatments and controls look identical (balanced) up to sampling error. None of the standard non-experimental approaches provides reliable estimates; different estimators that pass standard specification tests give widely varying estimates.

Dehejia and Wahba (1999 JASA): This paper re-examines the use of non-experimental estimators to estimate the treatment impact on earnings in the NSW demonstration using propensity score methods.

Propensity Score Method: A semi-parametric generalization of the Heckman selection correction model. Its advantages are a more general first-stage equation and a better diagnostic for assessing the comparability of the treatment and comparison groups (how balanced are the covariates of treatment and comparison group members with similar propensity scores?). It is an approach to doing "selection on observables." Thus, the propensity score method is most useful when the econometrician observes all of the variables used in selection but does not know the exact form of the "rule" that leads to selection into treatment.

Rosenbaum and Rubin (1983): If treatment and potential outcomes are independent conditional on the observed covariates X, then they are independent conditional on the conditional probability of receiving treatment given the covariates.

Let Yi = diYi1 + (1-di)Yi0, where di = 1 if treated, Yi1 is the outcome for i with treatment, and Yi0 is the outcome for i without treatment.


Selection on observables Xi: {Yi1, Yi0} ⊥ di | Xi, or E[Yij | Xi, di=1] = E[Yij | Xi, di=0] = E[Yij | Xi] for j = 0, 1.

Conditional on the observables there is no systematic pre-treatment difference between the groups assigned to treatment and control. This allows one to estimate the TOT by:

TOT = E{E(Yi|Xi, di=1) - E(Yi|Xi, di=0) | di=1},

where the outer expectation is over the distribution of Xi|di=1, the distribution of pre-treatment variables in the treated population.

The exact matching estimator approach is to match treatment and control observations by X, get the difference in mean outcomes (the treatment effect at X=x) at each value of X, and then get the TOT by averaging these estimated treatment effects over the distribution of X among the treated. When X is high dimensional and takes on many values, this exact matching approach may be impractical. This is where the propensity score theorem is helpful:

p(Xi) = Pr(di=1 | Xi) = E(di | Xi) = probability of i being assigned to treatment = propensity score

Propensity score theorem: {Yi1, Yi0} ⊥ di | Xi implies {Yi1, Yi0} ⊥ di | p(Xi)

Thus, in this case of selection on observables, adjusting for the propensity score removes the biases associated with differences in covariates. Why is it sufficient to condition just on the propensity score? The reason is that under the Rosenbaum-Rubin assumptions for selection on observables (covariates X), the covariates are independent of assignment to treatment conditional on the propensity score. In other words, the distribution of covariates should be the same across treatment and comparison groups for observations with the same propensity score. This implication provides a diagnostic for whether the propensity score approach is appropriate: one can group observations into strata based on the estimated propensity score and check whether the covariates are balanced across the treatment and comparison groups in each stratum.

Implementing the Propensity Score Approach (see the sketch after this list):
(1) Start with a parsimonious logit or probit selection equation for treatment and estimate the propensity to select into the treatment group.
(2) Sort the data by the estimated propensity score from lowest to highest.
(3) Divide the observations into blocks (strata) of equal propensity score range (0-0.1, 0.1-0.2, 0.2-0.3, etc.).
(4) Do t-tests for the difference in means for all covariates across treatment and control observations in each block.
(5a) If all covariates are balanced (no significant differences in means), stop. Use the estimated propensity scores.
(5b) If a particular block has one or more unbalanced covariates (but there is balance elsewhere), divide the block into finer blocks and re-evaluate.
(5c) If there are still problems with unbalanced covariates, modify the initial logit or probit equation to add higher-order terms in the problem covariates and/or further interactions. Re-evaluate.
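A sketch of steps (1)-(4), assuming a logit first stage and equal-width score blocks (the function and variable names are illustrative; statsmodels and scipy are used for the logit fit and the balance t-tests):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

def propensity_blocks_balance(X, d, edges=np.linspace(0.0, 1.0, 11)):
    """Estimate a logit propensity score, cut the sample into blocks of equal
    score range, and t-test each covariate's treatment/comparison difference
    in means within each block (steps (1)-(4) above).

    X : (n, k) covariate matrix; d : (n,) 0/1 treatment indicator.
    Returns the estimated scores and a list of (block, covariate, p-value)."""
    X = np.asarray(X, dtype=float)
    d = np.asarray(d, dtype=int)
    Xc = sm.add_constant(X)
    pscore = sm.Logit(d, Xc).fit(disp=0).predict(Xc)     # step (1)
    block = np.digitize(pscore, edges[1:-1])             # steps (2)-(3)
    balance = []
    for b in np.unique(block):                           # step (4)
        in_b = block == b
        for j in range(X.shape[1]):
            x_t, x_c = X[in_b & (d == 1), j], X[in_b & (d == 0), j]
            if x_t.size > 1 and x_c.size > 1:
                balance.append((int(b), j, stats.ttest_ind(x_t, x_c).pvalue))
    return pscore, balance   # small p-values flag blocks/covariates to refine, as in (5b)-(5c)
```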


There are a number of different semi-parametric ways to use the propensity score to estimate the TOT once you have achieved "balance" in the covariates. The multiplicity of methods arises because the true functional form of the second stage equation is unknown:

(1) Control Function: Use the first stage equation to form the Heckman selection correction term and add it to the second stage regression.
(2) Stratify (see the sketch below): Divide the data into blocks based on the propensity score. Run the second stage equation within each block (this might just be the mean difference in outcomes for treatment and comparison observations in the block). Calculate the weighted mean of the within-block estimates to get the TOT (weighting by the number of treatment observations in each block).
(3) Match: Match each treatment observation with a comparison observation based on similar propensity scores (find the closest match). Treat the data like panel data (like twins data) and run within-match (match fixed effects) models of the treatment effect.
(4) Weight: Reweight observations using the propensity score (e.g., for the TOT, weighting comparison observations by p(X)/(1-p(X))) and estimate the second stage equation (Hirano, Imbens, and Ridder 2003).

Dehejia and Wahba (1999) illustrate these methods and show that once one achieves "balance" of the covariates within blocks, these propensity score methods come quite close to the experimental estimates for the NSW demonstration, in contrast to the unreliability of the traditional econometric non-experimental estimators examined by LaLonde (1986).
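A minimal sketch of the stratification estimator in (2), assuming a propensity score that already balances the covariates within blocks (e.g., one produced by the blocking procedure sketched above); names are illustrative:

```python
import numpy as np

def stratified_tot(y, d, pscore, edges=np.linspace(0.0, 1.0, 11)):
    """Average the within-block treatment-comparison differences in mean
    outcomes, weighting each block by its number of treated observations."""
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=int)
    block = np.digitize(np.asarray(pscore, dtype=float), edges[1:-1])
    effects, weights = [], []
    for b in np.unique(block):
        in_b = block == b
        n_treated = int(d[in_b].sum())
        if n_treated > 0 and (1 - d[in_b]).sum() > 0:   # need both groups in the block
            effects.append(y[in_b & (d == 1)].mean() - y[in_b & (d == 0)].mean())
            weights.append(n_treated)
    return float(np.average(effects, weights=weights))
```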
