
Next: An Improved Method for Identifying Impacts in Regression Discontinuity Designs

Mark C. Long University of Washington

Box 353055 Seattle, WA 98195-3055

[email protected] (corresponding author)

Jordan Rooklyn

University of Washington [email protected]

Working Paper, August 31, 2016

Abstract: This paper develops and advocates for a data-driven algorithm that simultaneously selects the polynomial specification and bandwidth combination that minimizes the predicted mean squared error at the threshold of a discontinuity. It achieves this selection by evaluating the combinations of specification and bandwidth that perform best in estimating the next point in the observed sequence on each side of the discontinuity. We illustrate this method by applying it to data with a simulated treatment effect to show its efficacy for regression discontinuity designs and re-examine the results of papers in the literature.

Keywords: Program evaluation; regression discontinuity

Acknowledgments: Brian Dillon, Dan Goldhaber, Jon Smith, Jake Vigdor, Ted Westling, and University of Washington seminar audience members provided helpful feedback on the paper. This research was partially supported by a grant from the U.S. Department of Education’s Institute of Education Sciences (R305A140380).


1. Introduction and Literature Review

Regression discontinuity (RD) designs have become a very popular method for

identifying the local average treatment effect of a program. In many policy contexts, estimating

treatment effects via social experiments is not feasible due to either cost or ethical

considerations. Furthermore, in many contexts, allocating a treatment on the basis of some score

(often a score that illustrates the individual’s worthiness of receiving the treatment) seems

natural. RD holds the promise of having some of the advantages of random treatment allocation

(assuming that being just above or just below the threshold score for receiving the treatment is

effectively random) without the adverse complications of full-blown randomized experiments.

However, RD designs present a challenge for researchers: how to identify the predicted value of

the outcome (Y) as the score (X) approaches the threshold (T) from both the left and right hand

side of that threshold.

A number of guides to standard practice have been written during the past ten years; the

highly cited guide by Lee and Lemieux (2010) provides the following guidance:1

“When the analyst chooses a parametric functional form (say, a low-order

polynomial) that is incorrect, the resulting estimator will, in general, be biased.

When the analyst uses a nonparametric procedure such as local linear

regression—essentially running a regression using only data points ‘close’ to the

cutoff—there will also be bias….Our main suggestion in estimation is to not rely

on one particular method or specification” (p. 284)

To illustrate this point, Lee and Lemieux reanalyze the data from Lee (2008) who evaluated the

impact of party incumbency on the probability that the incumbent party will retain the district’s

seat in the next election for the U.S. House of Representatives. In this analysis, X is defined as

the Democratic vote share in year t minus the vote share of the “Democrats’ strongest opponent

(virtually always a Republican)” (Lee, 2008, p. 686). Lee and Lemieux estimate the treatment

effect by using polynomials ranging from order zero (i.e., the average of prior values) up to a 6th

1 For other discussions of standard methods, see Imbens and Lemieux (2008), DiNardo and Lee (2010), Jacob et al. (2012), and Van Der Klaauw (2008).


order polynomial with the same order polynomial estimated for both sides of the discontinuity

and with bandwidths ranging from 1% to 100% (i.e., using all of the data). For each bandwidth,

they identify the “optimal order of the polynomial” by selecting the one with the lowest value of

the Akaike information criterion (AIC) value. And, they identify an optimal bandwidth “by

choosing the value of h that minimizes the mean square of the difference between the predicted

and actual value of Y” (p. 321). As shown in Table 2 of their paper, using the optimal

bandwidth, which is roughly 5%, and the optimal order of the polynomial for this bandwidth

(quadratic), the estimated effect of incumbency on the Democratic party’s vote share in year t+1

is 0.100 (s.e. = 0.029).

While this model selection procedure has the nice feature of selecting the specification

and bandwidth “optimally”, it has two limitations: (1) it suggests that a particular order of the

polynomial and bandwidth be used on both sides of the discontinuity, and (2) the AIC evaluates

the fit of the polynomial at all values of X, and doesn’t attempt to evaluate the fit of the

polynomial as X approaches the threshold, which is the more appropriate criterion for RD treatment effect estimation.

Gelman and Imbens (2014) argue against using high order polynomial regressions to

estimate treatment effects in an RD context and instead “recommend that researchers … control

for local linear or quadratic polynomials or other smooth functions” (p. 2). We focus here on

their second critique:

“Results based on high order polynomial regressions are sensitive to the order of

the polynomial. Moreover, we do not have good methods for choosing that order

in a way that is optimal for the objective of a good estimator for the causal effect

of interest. Often researchers choose the order by optimizing some global

goodness of fit measure, but that is not closely related to the research objective of

causal inference” (p. 2).

The goal of our paper is to provide an optimal method for choosing the polynomial order (as well

as the bandwidth) that Gelman and Imbens (2014) note is currently lacking in the literature.

Gelman and Zelizer (2015) illustrate the challenges that could come from using a higher-

order polynomial by critiquing a prominent paper by Chen, Ebenstein, Greenstone, and Li

(2013), described in greater detail below, which examines the effect of an air pollution

policy on life expectancy. Gelman and Zelizer note:


“[Chen et al.’s] cubic adjustment gave an estimated effect of 5.5 years with

standard error 2.4. A linear adjustment gave an estimate of 1.6 years with standard

error 1.7. The large, statistically significant estimated treatment effect at the

discontinuity depends on the functional form employed. …the headline claim, and

its statistical significance, is highly dependent on a model choice that may have a

data-analytic purpose, but which has no particular scientific basis” (pp.3-4).

Gelman and Zelizer conclude that:

“…we are not recommending global linear adjustments as an alternative. In some

settings a linear relationship can make sense …. What we are warning against is

the appealing but misguided view that users can correct for arbitrary dependence

on the forcing variable by simply including several polynomial terms in a

regression” (p. 6).

In the case study in Section 3.3 of this paper, we re-examine the Chen et al. results using our

method. We show that Gelman and Zelizer’s concerns are well founded; our method shows that

the estimated effect of pollution on life expectancy is much smaller.

In addition to finding the most appropriate form for the specification, researchers also

face the challenge of deciding whether to estimate the selected specification over the whole

range of X (that is, a “global” estimate of $Y = f_0(X)$ and $Y = f_1(X)$, where $f_0(.)$ and $f_1(.)$ reflect the

function on the left and right sides of the threshold) or to estimate the selected specification over

a narrower range of X near T, a “local” approach.

Imbens and Kalyanaraman (2012) argue for using a local approach and develop a

technique for finding the optimal bandwidth. The Imbens and Kalyanaraman bandwidth

selection method is devised for the estimation of separate local linear regressions on each side of

the threshold. They note that “ad hoc approaches for bandwidth choice, such as standard plug-in

and cross-validation methods … are typically based on objective functions which take into

account the performance of the estimator of the regression function over the entire support and

do not yield optimal bandwidths for the problem at hand” (p. 934). Their method, in contrast,

finds the bandwidth that minimizes mean squared error at the threshold. Imbens and

Kalyanaraman caution that their method, which we henceforth label IK, “gives a convenient

starting point and benchmark for doing a sensitivity analyses regarding bandwidth choice” (p.

940) and thus they remind the user to examine the results using other bandwidths.


While the IK method greatly helps researchers by providing a data-generated method for

choosing the optimal bandwidth, it does so by assuming that the researcher is using a local linear

regression on both sides of the threshold. This can introduce substantial bias if (1) a linear regression is the incorrect functional form and (2) the treatment changes the relationship between $Y$ and $X$. Our method, thus, simultaneously selects the optimal polynomial order and the

optimal bandwidth for each side of the discontinuity. We achieve this result by evaluating the

performance of various combinations of order and bandwidth with performance measured as

mean squared error in predicting the observed values of Y as X approaches the threshold (from

either side); estimating the mean squared error at the threshold as a weighted average of prior

mean squared errors with greater weight on mean squared errors close to the threshold; and

identifying the specification/bandwidth combination that has the lowest predicted mean squared

error at the threshold.

We show that our method does modestly better than the IK method when applied to real data

with a simulated treatment effect. We then apply our method to data from two prominent papers

(Lee (2008) and Chen et al. (2013)) and we document the extent to which our method produces

different results.

2. Method

The goal of RD studies is to estimate the local average treatment effect, defined as the expected change in the outcome for those whose score is at the threshold: $\tau = E[Y_i(1) - Y_i(0) \mid X_i = T]$, where $Y_i(0)$ is the value of $Y_i$ if observation $i$ is untreated and $Y_i(1)$ is the value of $Y_i$ if the treatment is received.

Assume that treatment occurs when $X_i \geq T$.2 Assume that there is a smooth and continuous relationship between $Y$ and $X$ in the range $T - \Delta_0 \leq X < T$ and that this relationship can be expressed as $E[Y \mid X] = f_0(X)$.

                                                            2 Note that our method is designed for “sharp” regression discontinuities, where treatment is received by all those who are on one side of a threshold and not received by anyone on the other side of the threshold. In “fuzzy” contexts, where there is a discontinuity in the probability of receiving treatment at the threshold, one can obtain estimates of the local effect of the treatment on the treated by computing the ratio of the discontinuity in the outcome at the threshold and the discontinuity in the probability of receiving treatment at the threshold. When applied in the context of fuzzy RDs, our method will identify the intent-to-treat estimate for those at the threshold, but will not yield an estimate of the local average treatment on the treated effect.


Likewise, assume that there is a smooth and continuous relationship between $Y$ and $X$ in the range $T \leq X \leq T + \Delta_1$ and that this relationship can be expressed as $E[Y \mid X] = f_1(X)$. Assuming that the only discontinuity in the relationship between $Y$ and $X$ at $T$ is due to the impact of the treatment, the estimand, $\tau$, is defined as the difference of the two estimated functions evaluated at the threshold: $\tau = f_1(T) - f_0(T)$.

Define the mean squared prediction error at the threshold ($MSPE_T$) as follows: $MSPE_T = E[(\hat{\tau} - \tau)^2]$. Our goal is to select the bandwidths ($\Delta_0$ and $\Delta_1$) and the orders of the polynomials ($p_0$ and $p_1$) for estimating $f_0(.)$ and $f_1(.)$ such that $MSPE_T$ is minimized3:

(1) $\operatorname{argmin}_{\Delta_0, p_0, \Delta_1, p_1} E[(\hat{\tau} - \tau)^2]$

$= \operatorname{argmin}_{\Delta_0, p_0, \Delta_1, p_1} E\big[\big((\hat{f}_1(T) - \hat{f}_0(T)) - (f_1(T) - f_0(T))\big)^2\big]$

$= \operatorname{argmin}_{\Delta_0, p_0, \Delta_1, p_1} E\big[\big((\hat{f}_1(T) - f_1(T)) - (\hat{f}_0(T) - f_0(T))\big)^2\big]$

$= \operatorname{argmin}_{\Delta_0, p_0, \Delta_1, p_1} \big\{ E[(\hat{f}_1(T) - f_1(T))^2] + E[(\hat{f}_0(T) - f_0(T))^2] - 2\,E[(\hat{f}_1(T) - f_1(T))(\hat{f}_0(T) - f_0(T))] \big\}$

To this point, the minimization problem is unconstrained and standard. Imbens and Kalyanaraman (2012) add the following constraints to this problem: $\Delta_0 = \Delta_1 = \Delta$ and $p_0 = p_1 = 1$. That is, they assume linear relationships between $Y$ and $X$ in the ranges $T - \Delta \leq X < T$ and $T \leq X \leq T + \Delta$, with the treatment effect, $\tau$, being identified as the jump between those two linear functions at $T$.

3 Note that choosing a higher bandwidth allows more data to be used in estimating $f(.)$, which reduces the variance of the estimated parameters. But a larger bandwidth increases the chance that $f(.)$ is not constant and smooth within the range in which it is estimated. A higher polynomial order can improve the fit of the function $f(.)$ to the observed distribution of $X$ and $Y$, and thus lowers the bias. But a higher polynomial order leads to increased variance of the prediction, particularly in the tails of the distribution (e.g., at $X = T$). By minimizing $MSPE_T$ through the choice of these parameters, we balance our desires for low bias and low variance.


We take a different approach, which involves a different set of simplifying assumptions.

First, unlike IK, our approach allows the treatment to more flexibly change the functional relationship between $Y$ and $X$, as we do not assume linear functions on either side of the discontinuity. Our method has $f_0(.)$ estimated solely on data where $T - \Delta_0 \leq X < T$, and $f_1(.)$ estimated solely on data where $T \leq X \leq T + \Delta_1$. This approach is akin to the common practice in RD studies of estimating one regression which fully interacts the polynomial terms with an indicator for being above the threshold.4

Second, we simplify the minimization problem considerably by dropping the last term (i.e., $-2\,E[(\hat{f}_1(T) - f_1(T))(\hat{f}_0(T) - f_0(T))]$). Here is our justification for doing so. Suppose that for a given choice of $\Delta_0$ and $p_0$, the prediction error on the left side of the threshold is positive (i.e., $\hat{f}_0(T) - f_0(T) > 0$). One could attempt to select $\Delta_1$ and $p_1$ such that the prediction error on the right side of the threshold is also positive (i.e., $\hat{f}_1(T) - f_1(T) > 0$) and equal to the bias on the left so as to cancel it. In fact, one could carry this further and select $\Delta_1$ and $p_1$ such that the error on the right side of the threshold is as positive as possible, thus making the last term as negative as possible (a point Imbens and Kalyanaraman note as well). However, doing so comes at the penalty of increasing the square of the prediction error on the right side (i.e., $E[(\hat{f}_1(T) - f_1(T))^2]$) and thereby results in a higher $MSPE_T$. Thus, there is little to be gained by selecting $\Delta_1$ and $p_1$ on the basis of the last term in Equation 1. If we can ignore this term, we substantially simplify the task by breaking it into two separate problems:

(2) $(\hat{\Delta}_0, \hat{p}_0) = \operatorname{argmin}_{\Delta_0, p_0} E[(\hat{f}_0(T) - f_0(T))^2]$ and $(\hat{\Delta}_1, \hat{p}_1) = \operatorname{argmin}_{\Delta_1, p_1} E[(\hat{f}_1(T) - f_1(T))^2]$.

The advantage of our approach is that we can directly evaluate how different choices of $\Delta_0$ and $p_0$ perform in predicting observed outcomes before one reaches the threshold, and pick values of $\Delta_0$ and $p_0$ that have demonstrated strong performance in terms of their mean squared prediction errors for observed values.

4 Note that in such models $f_0(.)$ and $f_1(.)$ are in effect estimated solely based on data from their respective sides of the threshold, as the same coefficients could be obtained by separate polynomial regressions on each side of the threshold. Put differently, no information from the right-hand side is being used to estimate the coefficients on the left-hand side and vice-versa.


Our key insight is that by focusing on data from one side of the threshold only, we can use that observed data to calculate a series of $MSPE$s and then predict $MSPE_T^0$ (and $MSPE_T^1$) as weighted averages of observed $MSPE$s (and confidence intervals around the weighted averages of observed $MSPE$s). We recognize, however, that if the treatment does not affect the functional relationship between $Y$ and $X$ (e.g., $f_0(.) = f_1(.)$), then our method would be inefficient (but unbiased), as one would gain power to estimate the common slope parameters of $f_0(.)$ and $f_1(.)$ by using data on both sides of the threshold.

Index the observed distinct values of $X < T$ as $x_1$ to $x_J$. Define $MSPE_j^{n,p}$ as equal to $(\hat{f}^{n,p}(x_j) - y_j)^2$, where $\hat{f}^{n,p}$ is a polynomial of order $p$ that is estimated over the interval from $x_{j-n}$ to $x_{j-1}$ using the observed distributions of $X$ and $Y$ in this interval, $n$ reflects the number of prior observations that are used to estimate the polynomial, and $y_j$ is the observed value of $Y$ when $X = x_j$. Note that this formula uses an adaptive bandwidth that is a function of $n$ (i.e., $\Delta_j = x_j - x_{j-n}$) to accommodate areas where the data are thin.

Suppose that we estimated $MSPE_T$ as a straight average of these calculated values of $MSPE_j^{n,p}$ (i.e., $\widehat{MSPE}_T^{n,p} = \frac{1}{m} \sum_j MSPE_j^{n,p}$, where $m$ is the number of values of $j$ for which $MSPE_j^{n,p}$ can be computed), and then selected the parameters $n$ and $p$ that minimized this straight average. One disadvantage of doing so would be that it would ignore variance across the values of $MSPE_j^{n,p}$ and would not consider the number of observations of $MSPE_j^{n,p}$ used to compute this average.5 Less confidence should be placed in estimates of $MSPE_T$ that rely on fewer or more variable observations of $MSPE_j^{n,p}$. Thus, rather than select the parameters $n$ and $p$ that minimize the average, we select the parameters $n$ and $p$ that minimize the upper bound of an 80% confidence interval around the average (i.e., such that there is only a 10% chance that the true, unknown, mean value of the broader distribution from which our observations are drawn is greater than this upper bound).6

A second disadvantage of a straight average is that it places equal weight on the calculated values of $MSPE_j^{n,p}$ regardless of how far $x_j$ is from the threshold. So as to place more weight on the calculated values of $MSPE_j^{n,p}$ for which $x_j$ is close to $T$, we estimate $MSPE_T$ as a weighted average of the calculated values of $MSPE_j^{n,p}$:

(3) $\widehat{MSPE}_T^{n,p} = \sum_j w_j \cdot MSPE_j^{n,p}$,

5 The number of observations of $MSPE_j^{n,p}$ declines by 1 for each unit increase in either $n$ or $p$.
6 As with all confidence intervals, the choice of 80% is arbitrary. Different values can be set by the user of our Stata program for executing this method (Long and Rooklyn, 2016).


where $w_j$ is a kernel function (defined below). We then find the parameters that solve $\operatorname{argmin}_{n_0, p_0} \{\widehat{MSPE}_T^{n,p} + z \cdot \hat{\sigma}/\sqrt{m}\}$, where $\hat{\sigma}/\sqrt{m}$ is the estimated standard error of $\widehat{MSPE}_T^{n,p}$ and $z$ is the critical value implied by the chosen confidence level (1.282 for our 80% interval). To find these parameters, we compute $\widehat{MSPE}_T^{n,p}$ for all combinations of $n$ and $p$ subject to the following constraints: $n$ and $p$ are integers; $n \in \{\max(n_{\min}, p+1), \ldots, \min(n_{\max}, J-1)\}$, where $n_{\min}$ and $n_{\max}$ are the minimum and maximum number of prior observations the researcher is willing to allow to be used in computing $MSPE_j^{n,p}$; and $p \in \{p_{\min}, \ldots, p_{\max}\}$, where $p_{\min}$ and $p_{\max}$ are the minimum and maximum polynomial orders the researcher is willing to consider, $p_{\min} \geq 0$, and when $p = 0$, $\hat{f}^{n,0}(x_j)$ is defined as the average of the $n$ prior values of $Y$. We select the combination of $n$ and $p$ (among those that are considered) that minimizes this upper bound.7

In our empirical investigations below, we use an exponential kernel, defined as follows:

(4) $w_j = \dfrac{b^{(j-1)/(J-1)}}{\sum_k b^{(k-1)/(J-1)}}$,

where $b$ is the base weight and the sum in the denominator runs over the values of $k$ for which $MSPE_k^{n,p}$ is computed. We alternately explore base weights equal to 1, $10^3$, $10^6$, and $10^{10}$. When $b = 1$, $w_j$ is the uniform kernel, which gives uniform weight to each value of $MSPE_j^{n,p}$ when estimating $MSPE_T$. When $b = 10^3$ ($10^6$) [$10^{10}$], while all $MSPE$s get some positive weight, 50% (75%) [90%] of the weight is placed on the last 10% of $MSPE$s that are closest to the threshold. That is, a higher value of $b$ gives more emphasis to $MSPE$s closer to the threshold than further away.
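To check the weight shares just described, the following short numerical sketch (in Python; ours, not part of the authors' Stata implementation) computes the share of weight falling on the last 10% of a long series of MSPEs under each base weight:

```python
# Numerical check of the exponential kernel's weight shares (illustrative).
import numpy as np

J = 1000                                       # length of the MSPE series
for base in (1.0, 1e3, 1e6, 1e10):
    w = base ** (np.arange(J) / (J - 1))       # Equation (4), unnormalized
    w /= w.sum()
    print(base, round(w[-J // 10:].sum(), 3))  # weight on the last 10% of MSPEs
# prints roughly 0.10, 0.50, 0.75, and 0.90, matching the shares in the text
```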

We repeat this process to estimate the parameters that solve $\operatorname{argmin}_{n_1, p_1} \{\widehat{MSPE}_T^{n,p} + z \cdot \hat{\sigma}/\sqrt{m}\}$, with the only difference being that we index the observed distinct values of $X \geq T$ from $J$ down to 1, so that the analysis moves from the extreme right in towards $T$.

7 Note that if using a linear specification with only the last two data points (or any polynomial of order $p$ using the last $p+1$ data points), there will be no variance in the estimate of Y at the threshold. If this occurs for both sides of the discontinuity, there would be no variance to the estimate of the jump at the threshold. Such a lack of variance of the difference at the discontinuity would disallow hypothesis testing (or would lead one to conclude that there is an infinite t-statistic). As this result is unsatisfactory in most contexts, the reader may want to disallow such specification/bandwidth combinations.
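To make the search concrete, here is a minimal Python sketch of the procedure described above. It is our illustration, not the authors' Stata program (Long and Rooklyn, 2016): the function names are ours, numpy's polynomial fitting stands in for the regressions, and the weighted standard error uses one common convention since the paper does not spell out its exact formula.

```python
# Illustrative sketch of the specification/bandwidth search. x and y hold
# one side's data, ordered so that the last element is closest to the threshold.
import numpy as np

def mspe_series(x, y, n, p):
    """Squared error of predicting each y[j] from a degree-p polynomial
    fit to the n prior observations; returns positions and errors."""
    js, errs = [], []
    for j in range(max(n, p + 1), len(x)):
        coef = np.polyfit(x[j - n:j], y[j - n:j], p)   # p = 0 fits a mean
        js.append(j)
        errs.append((np.polyval(coef, x[j]) - y[j]) ** 2)
    return np.array(js), np.array(errs)

def kernel_weights(js, J, base):
    """Exponential kernel of Equation (4); js are 0-based positions among
    the J distinct values of X on this side of the threshold."""
    w = base ** (js / (J - 1))
    return w / w.sum()

def select_spec(x, y, n_range, p_range, base=1e3, z=1.282, min_mspes=2):
    """Pick (n, p) minimizing the upper bound of an 80% confidence interval
    (z = 1.282) around the weighted-average MSPE."""
    best, best_ub = None, np.inf
    for p in p_range:
        for n in n_range:
            if n < p + 1:                      # need p + 1 points for order p
                continue
            js, e = mspe_series(x, y, n, p)
            if len(e) < min_mspes:
                continue
            w = kernel_weights(js, len(x), base)
            avg = float(np.sum(w * e))
            se = float(np.sqrt(np.sum(w**2 * (e - avg) ** 2)))  # assumed convention
            if avg + z * se < best_ub:
                best, best_ub = (n, p), avg + z * se
    return best, best_ub
```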


To illustrate our method, suppose we had a series of six data points with (X, Y)

coordinates (1,12), (2,15), (3,16), (4,13), (5,10), and (6,7), and we would like to use this

information to estimate the next value of Y (when X=7). These six points are shown in Panel A

of Figure 1. Our task is to find the specification that generally performs well in predicting the

next value of Y, and more specifically, as discussed above, has a low $MSPE$ for X = 7.

The argument for imposing a limited bandwidth, and not using all of the data points to

predict the next value of Y, is a presumption that there has been a change in the underlying

relationship between Y and X; for example a discrete jump in the value of Y (perhaps unrelated to

X), or a change in the function defining the relationship of Y=f(X). If such a change occurred,

then limiting the bandwidth would (ideally) constrain the analysis to the range in which f(X) is

steady. In the example discussed above, there does appear to be a change in f(X) as the function

appears to become linear after X=3. Of course, this apparent change could be a mirage and the

underlying relationship could in fact be quadratic with no change. If there is no change in the

relationship between Y and X, then one would generally want to use all available data points to

best estimate f(X).

Our method for adjudicating between these specifications and bandwidth choices is to compare all possibilities based on $\widehat{MSPE}_T$ (and the upper bound of its confidence interval). Panels B through F of Figure 1 show the performance of possible candidate estimators. The corresponding Table 1 illustrates our method, where Panel A gives the predicted values based on polynomial orders in the range $p \in \{0, \ldots, 2\}$, and Panel B gives the calculations of each $MSPE_j^{n,p}$ for the feasible combinations of $n$ and $p$. Note that since the last four observations happen to

be on a line (i.e., (3,16), (4,13), (5,10), and (6,7)), the linear specification using two prior data

points has no error in predicting the values of Y when X equals 5 or 6, and the same is true for

either the linear or quadratic specifications using three prior values for predicting the value of Y

when X equals 6.

[Insert Figure 1]

[Insert Table 1]

Panel C of Table 1 shows the weighted averages using various kernels. A linear

specification using two prior data points has the lowest weighted average MSPE using all four


base weights, as is indicated by the bolded numbers.8 This result is not surprising given the

perfect linear relation of Y and X for the last four data points. As one can see, as the base weight

increases, the weighted average approaches the value of the last $MSPE$ in the series.

There is clearly a trade-off involved here. With greater weight placed on the last $MSPE$s in the series, one gets less bias in the estimate of $MSPE_T$ at the threshold, as less weight is placed on $MSPE$s far away from the threshold. However, relying solely on the last $MSPE$ (i.e., $MSPE_J$) could invite error – a particular specification might “accidentally” produce a near perfect prediction for the last value of Y before the threshold and thus have a lower $MSPE_J$, but incorrectly predict the unknown value of Y at the threshold.

Panel D of Table 1 presents the upper bound of the 80% confidence interval around $\widehat{MSPE}_T$. Note that the linear specification using two prior data points has the lowest upper bound for three of the four base weights (with the exception being the uniform weight). Since high base weights produce wider confidence intervals (as they increase the sample standard deviation of the weighted average $MSPE$), using this upper bound of the confidence interval helps avoid “unhappy accidents” that could occur when using only $MSPE_J$. When we apply our method to simulated data, we find that the performance is relatively insensitive to the base weight, although we favor $b = 10^3$ given its strong performance documented below.
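As a check, applying the sketch from above to the six example points reproduces the Table 1 calculations (up to rounding; our confidence interval convention may differ slightly from the authors'):

```python
x = np.array([1., 2., 3., 4., 5., 6.])
y = np.array([12., 15., 16., 13., 10., 7.])

js, e = mspe_series(x, y, 2, 1)           # linear, two prior data points
print(e)                                  # [4., 16., 0., 0.], as in Panel B
w = kernel_weights(js, len(x), 1e3)
print(round(float(np.sum(w * e)), 1))     # 0.8, the bolded Panel C entry

best, ub = select_spec(x, y, range(1, 6), range(0, 3))
coef = np.polyfit(x[-best[0]:], y[-best[0]:], best[1])
print(best, np.polyval(coef, 7.0))        # (2, 1); predicted Y at X = 7 is 4.0
```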

Our Stata program for executing this method (Long and Rooklyn, 2016) allows the user to (a) select the minimum number of $MSPE$s that must be included in the analysis (≥2), excluding from consideration combinations of bandwidth and polynomial order that result in few observations of $MSPE_j^{n,p}$, and thus to avoid “unhappy accidents”; (b) select the minimum and maximum order of the polynomial that the user is willing to consider; (c) select the minimum number of observations the researcher is willing to allow to be used to estimate the next observation; and (d) select the desired confidence interval for $\widehat{MSPE}_T$. For the rest of the paper (excluding Section 3.3), we set the minimum number of MSPEs to five, the minimum and maximum polynomial orders to zero and five, the minimum number of observations to five, and the confidence interval to 80%.
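In terms of the sketch in Section 2, these options correspond roughly to the following settings (illustrative parameter names, not the Stata program's option names; x and y here would hold one side's full data):

```python
best, ub = select_spec(
    x, y,
    n_range=range(5, len(x)),  # (c) minimum of five prior observations
    p_range=range(0, 6),       # (b) polynomial orders zero through five
    base=1e3,                  # our favored base weight
    z=1.282,                   # (d) 80% confidence interval
    min_mspes=5,               # (a) at least five MSPEs per combination
)
```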

8 If there are ties for the lowest upper bound of $\widehat{MSPE}_T$, which did not occur in Table 1, we select the specification with the lowest-order polynomial (and ties for a given specification are adjudicated by selecting the smaller bandwidth). We make these choices given the preference in the literature for narrower bandwidths and lower-order polynomials.


In the next section, we illustrate the method by applying it to simulated data and use the

method to re-evaluate examples from the existing literature.

3. Case Studies That Illustrate the Method

3.1 Case Study 1: Method Applied to Jacob et al. (2012) with a Simulated Treatment Effect

Jacob, Zhu, Somers, and Bloom (2012) provide a primer on how to use RD methods.

They illustrate contemporary methods using a real data set with a simulated treatment effect,

described as follows:

“The simulated data set is constructed using actual student test scores on a

seventh-grade math assessment. From the full data set, we selected two waves of

student test scores and used those two test scores as the basis for the simulated

data set. One test score (the pretest) was used as the rating variable and the other

(the posttest) was used as the outcome. … We picked the median of the pretest (=

215) as the cut-point (so that we would have a balanced ratio between the

treatment and control units) and added a treatment effect of 10 scale score points

to the posttest score of everyone whose pretest score fell below the median.” (pp.

7-8).

We utilize these data provided by Jacob et al. to illustrate the efficacy of our method. Since the test scores are given in integers, and since the number of students located at each value of the pretest score differs, we add a frequency weight to the regressions in constructing our predicted values, and the weight for computing the weighted average MSPE becomes $w_j \cdot N_j$ (renormalized to sum to one), where $N_j$ is the number of observations that have that value of X.
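A minimal adjustment to the kernel sketch implements this weighting (assuming counts holds the number of observations $N_j$ at each distinct value of X; illustrative):

```python
def kernel_weights_freq(js, J, base, counts):
    """Exponential kernel scaled by the frequency N_j at each distinct
    value of X, then renormalized to sum to one."""
    w = base ** (js / (J - 1)) * counts[js]
    return w / w.sum()
```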

In the first panel of Table 2, we estimate the simulated treatment effect (which should be

-10 by construction) with the threshold at 215. Our method selects a linear specification using

23 data points for the left hand side and a quadratic specification with 33 data points for the right

hand side (these selections are not sensitive to the base weight). Compared to the IK method,

which selects a bandwidth of 6.3 for both sides, our method selected a much larger bandwidth.9

9 To estimate these IK bandwidths and resulting treatment effect estimates, we use the “rd” command for Stata that was developed by Nichols (2011) and use local linear regressions (using triangular (“edge”) kernel weights) within the selected bandwidth. We also find nearly identical results using the “rdob” program for Stata written by Fuji, Imbens, and Kalyanaraman (2009).


Our method outperforms IK with a slightly better estimate of the treatment effect (-9.36 versus -

10.68) and smaller standard errors (0.73 versus 1.27). The much smaller standard error provides

our method more power than IK to correctly identify smaller treatment effects.

[Insert Table 2]

The second and third panels of Table 2 reset the threshold for the simulated effect to 205

and 225, which are respectively at the 19th and 77th percentiles of the distribution of X. With the

threshold at 205, our model produces estimates of the simulated treatment effect in the range of -9.96 to -10.09 with base weights of 1 to $10^6$, and -8.60 with a base weight of $10^{10}$. Regardless of

the base weight, our method selects a quadratic specification using the first 47 observations on

the right side of the discontinuity. In contrast, the IK method uses a bandwidth of only 7.3 on

both sides of the discontinuity and yields an inferior estimate of the treatment effect (-8.25) with a higher standard error.

Our method and the IK method produce comparable estimates of the treatment effect

when the threshold is set at 225 (-11.67 to -11.78 for our method versus -11.74 for IK), yet our

method again has smaller standard errors due to more precision in the estimates of the regression

line. Figure 2 illustrates our preferred specifications and bandwidths for these three thresholds using $10^3$ as the base weight.

[Insert Figure 2]

The next analysis, which is shown in Table 3, evaluates how our method performs when

there is a zero simulated treatment effect. We restore the Jacob et al. data to have no effect and

then estimate placebo treatment effects with the threshold set at 200, 205, …, 230. We are

testing whether our method generates false positives: apparent evidence of a treatment effect

when there is no treatment. Our model yields estimated treatment effects that are generally small

and lie in the range of -1.67 to 0.64. The bad news is that 2 of the 7 estimates are significant at

the 10% level (1 at the 5% level). Thus, a researcher who uses our method would be more likely

to incorrectly claim a small estimated treatment effect to be significant. The IK method does

better at not finding significant placebo effects in the Jacob et al. (2012) data (none of the IK

estimates are significant). However, the IK estimates have a broader range of -2.27 to 1.75.

Thus the researcher using the IK method would be more inclined to incorrectly conclude that the

                                                            


treatment had a sizable effect even when the policy had no effect. The mean absolute error for

this set of estimates is 0.76 using our method versus 0.97 using the IK method. The only reason

that our method is more likely to incorrectly find significant effects is our lower standard errors,

which lie in the range of 0.68 to 1.10 versus the IK standard errors, which lie in the range of 1.22

to 1.89. Thus, we conclude that our higher rate of incorrectly finding significant effects is not a

bug but a feature. The researcher who uses our method and finds an insignificant effect can

argue that it’s a “well estimated zero”, while that advantage is less likely to be present using IK.

[Insert Table 3]

To further investigate the efficacy of our method and to compare it to IK’s method, we

augment the Jacob et al. (2012) data by altering the outcome as follows:

$posttest^{aug}_i = posttest_i + 5 + (pretest_i - 200) - 0.1\,(pretest_i - 200)^2 + 0.0015\,(pretest_i - 200)^3$.

This cubic augmentation increases posttest by up to a local maximum of 7.7 points at pretest = 206, then declines to a local minimum of -19.1 at pretest = 239, and then curves upward again. We then estimate simulated treatment effects of 10 points for those below various thresholds, alternatively set at 200, 205, …, 230. This simulated treatment effect, added to an underlying cubic relation between pretest and $posttest^{aug}$, should be harder to identify using the IK method as it relies on an assumption of local linear relations. We furthermore evaluate our method relative to IK where the augmentation of posttest only occurs on the left or right side of the threshold. Note that since a treatment could have heterogeneous

effects, and thus larger or smaller effects away from the threshold, it is possible for the treatment

to not only have a level effect at the threshold, but also alter the relationship between the

outcome (Y) and the score (X).10 Our method should have a better ability to handle such cases,

and to thus derive a better estimate of the local effect at the threshold.
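Under our reading of the augmentation formula above (reconstructed from the stated local maximum and minimum), the alteration can be sketched as:

```python
import numpy as np

def augment(posttest, pretest):
    """Cubic augmentation of the outcome: a local maximum of about +7.7
    near pretest = 206 and a local minimum of about -19.1 near pretest = 239."""
    u = np.asarray(pretest) - 200.0
    return np.asarray(posttest) + 5 + u - 0.1 * u**2 + 0.0015 * u**3
```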

The results are shown in Table 4 and the corresponding graphical representations are

shown in Figure 4. In Panel A of Table 4, we show the results with the cubic augmentation

applied to both sides of the threshold. Across the seven estimations, our method produces an

average absolute error of 0.94, which is a 7% improvement on the absolute error found using the

IK method, where the average absolute error was 1.00. In Panels B and C of Table 4, we show

the results with the cubic augmentation applied to the left and right sides of the threshold,

                                                            10 When we add the augmentation to the left hand side only, we level-shift the right hand side up or down so that there is a simulated effect of -10 points at the threshold, and vice-versa.


respectively. Our method is particularly advantageous when the augmentation is applied to the

right side – for these estimations, our method produces an average absolute error that is 30%

lower than the average absolute error using the IK method. As shown in Figure 4, the principal

advantage of our method is the adaptability of the bandwidth and curvature given the available

evidence on each side of the threshold.

[Insert Table 4]

[Insert Figure 4]

Having now (hopefully) established the utility of our method, in the next two sections we

apply the method to two prominent papers in the RD literature.

3.2 Case Study 2: Method Applied to Data from Lee (2008)

Our second case study applies our method to re-estimate findings in Lee (2008) discussed

in Section 1. First, we re-examine the result shown in Lee’s Figure 2a. Y is an indicator variable

that equals 1 if the Democratic Party won the election in that district in year t+1. The key

identifying assumption is that there is a modest random component to the final vote share (e.g.,

rain on Election Day) that cannot be fully controlled by the candidates and that, effectively,

"whether the Democrats win in a closely contested election is...determined as if by a flip of a

coin" (p. 684). Lee’s data comes from U.S. Congressional election returns from 1946 to 1998

(see Lee (2008) for full description of the data).11

The Lee data present a practical challenge for our method. It contains 4,341 and 5,332

distinct values of X on the left and right sides of the discontinuity. Using every possible number

of prior values of X to predict Y at all distinct values of X, while possible, requires substantial

computer processing time. To reduce our processing time, we compute the average value of X

and Y within 200 bins on each side of the discontinuity, with each bin having a width of 0.5%

(since X ranges from -100% to +100% with the discontinuity at 0%). Binning the data as such

has the disadvantage of throwing out some information (i.e., the upwards or downwards sloping

relationship between X and Y within the bin); yet, for most practical applications this information

loss is minor if the bins are kept narrow.

                                                            11 We obtained these data on January 2, 2015 from http://economics.mit.edu/faculty/angrist/data1/mhe/lee.
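A minimal version of this binning step (sketched in Python with pandas purely for illustration; the column names are our assumptions):

```python
import numpy as np
import pandas as pd

def bin_side(df, lo, hi, width=0.5):
    """Average x and y within fixed-width bins; empty bins drop out.
    The bin count n can serve as a frequency weight downstream."""
    edges = np.arange(lo, hi + width, width)
    groups = df.groupby(pd.cut(df["x"], edges, right=False), observed=True)
    out = groups[["x", "y"]].mean()
    out["n"] = groups.size()
    return out.reset_index(drop=True).dropna()

# e.g., left = bin_side(lee[lee["x"] < 0], -100.0, 0.0)
#       right = bin_side(lee[lee["x"] >= 0], 0.0, 100.0)
```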


To estimate the treatment effect, Lee applies “a logit with a 4th order polynomial in the

margin of victory, separately, for the winners and the losers” (Lee, 2001, p. 14) using all of the

data on both sides of the discontinuity. Given that our binning results in fractional values that lie

in the interval from 0% to 100%, we use a generalized linear model using a logit link function as

recommended by Papke and Wooldridge (1996) for modeling proportions.12
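For reference, a fractional-logit fit of this kind can be sketched with statsmodels (our illustration, not the authors' code; share, design, and pop are assumed names for the binned win share, the polynomial design matrix, and the bin populations):

```python
import statsmodels.api as sm

def fractional_logit(share, design, pop):
    """GLM with a binomial family (logit is its default link) fit to
    proportions, in the spirit of Papke and Wooldridge (1996)."""
    model = sm.GLM(share, sm.add_constant(design),
                   family=sm.families.Binomial(),
                   freq_weights=pop)
    return model.fit()
```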

We find that a specification that is linear and uses less than half of the data points is best

for X'β for both the left and the right sides (64 and 28 values on the left and right respectively,

with the corresponding bandwidth range for the assignment variable being -32.0% to 13.0%).13

We estimate that the Democratic Party has a 15.3% chance of winning the next election if they

were barely below 50% on the prior election, and a 57.7% chance of winning the next election if

they are just to the right of the discontinuity. Figure 5 shows the estimated curves. Our estimate

of the treatment effect (i.e., barely winning the prior election) is 42.3% (s.e. = 3.5%), which is smaller

than Lee’s estimate, which is found in Lee (2001): 45.0% (s.e. = 3.1%).

[Insert Figure 5]

Next, we re-examine the result shown in Lee’s Figure 4a, where Y is now defined as the

Democratic Party’s vote share in year t+1. Lee (2008) used a 4th order polynomial in X for each

side of the discontinuity and concluded that the impact of incumbency on vote share was 0.077

(s.e. = 0.011). That is, being the incumbent raised the expected vote share in the next election by

7.7 percentage points. Applying our method (as shown in Figure 6), we find that the best

specification/bandwidth choice uses a quadratic specification based on the last 171 observations

on the left hand side and a 5th order polynomial based on the 188 observations to the right of the

discontinuity (with the corresponding bandwidth range for the assignment variable being -94.8%

to 93.7%). Our estimated treatment effect is smaller than Lee’s and has a smaller standard error:

0.057 (s.e. = 0.003).

[Insert Figure 6]

Lee’s (2008) study was also reexamined by Lee and Lemieux (2010) and Imbens and

Kalyanaraman (2012). We noted in Section 1 that according to Lee and Lemieux’s analysis, the

12 See also Baum (2008).
13 After binning the data, we end up with 145 distinct values of X on the left side as some bins have no data (i.e., no elections in which the Democratic vote share in year t minus the strongest opponent’s share fell in that range).


optimal bandwidth/specification resulted in a larger estimate of the effect of incumbency (0.100)

and a larger standard error (0.029). Scanning across their Table 2, the smallest estimated effect

that they found was 0.048. Thus, our estimate is not outside of the range of their estimates.

Nonetheless, our estimate is smaller than what would be selected using Lee and Lemieux’s two-

step method for selecting the optimal bandwidth and then optimal specification for that

bandwidth. Imbens and Kalyanaraman found that the optimal bandwidth for a linear

specification on both sides was 0.29 and using this bandwidth/specification produced an estimate

of the treatment effect of 0.080 (s.e. = 0.008). Again, their preferred estimate is somewhat larger

than the estimate found using our method and has a higher standard error.14

3.3 Case Study 3: Method Applied to Data from Chen, Ebenstein, Greenstone, & Li (2013)

Our final case study is a replication of a prominent paper by Chen et al. (2013) that

alarmingly concludes that “an arbitrary Chinese policy that greatly increases total suspended

particulates (TSPs) air pollution is causing the 500 million residents of Northern China to lose

more than 2.5 billion life years of life expectancy” (p. 12936). This policy established free coal

to aid winter heating of homes north of the Huai River and Qinling Mountain range. Chen et al.

used the distance from this boundary as the assignment variable with the treatment discontinuity

being the border itself.

As shown in the first column of our Figure 7 (which reprints their Figures 2 and 3), Chen

et al. estimate that being north of the boundary significantly raises TSP by 248 points and

significantly lowers life expectancy by 5.04 years. These estimates are also shown in Panel A of

Table 5.

[Insert Figure 7]

[Insert Table 5]

We have attempted to replicate these results. Unfortunately, the primary data are

proprietary and not easy to obtain; permission for their use can only be granted by the Chinese

14 Note, however, that when we apply the Stata programs written by Fuji, Imbens, and Kalyanaraman (2009) and Nichols (2011) that produce treatment estimates using the Imbens and Kalyanaraman (2012) method, we find the optimal bandwidth for a linear specification on both sides was 0.11, and using this bandwidth/specification produced an estimate of the treatment effect of 0.059 (s.e. = 0.002), which is quite similar to our estimates.


Center for Disease Control.15 Rather than use the underlying primary data, we are treating the

data shown in their Figures 2 and 3 as if it were the actual data. To do so, we have manually

measured the X and Y coordinates of each data point in these figures as well as the diameter of

each circle (where the circle’s area is proportional to the population of localities represented in

the bin).16 The middle column of Figure 7 and Panel B of Table 5 present our replication

applying their specification (a global cubic polynomial in latitude with a treatment jump at the

discontinuity) to these data. We obtain similar results, although the magnitudes are smaller and

less significant; our replication of their specification produces estimates that being north of the

boundary raises TSP by 178 points (p-value 0.069) and insignificantly lowers life expectancy by

3.94 years (p-value 0.389). Comparing the first and second columns of Figure 7, note that the

shapes of the estimated polynomial specifications are generally similar with the modest

discrepancies showing that there is a bit of information lost by binning the data.

In Panel C of Table 5, we apply our method to estimate these treatment effects.17 We

find significant effects on TSP, with TSP rising significantly by 146 points (using IK’s method, TSP

is found to rise significantly by 197 points). Thus, Chen et al.’s conclusion that TSP rises

significantly appears to be reasonable and robust to alternative specifications.

However, as shown in the second column in Panel D of Table 5, the estimated treatment

impact on Life Expectancy is much smaller; we estimate that being north of the boundary

significantly lowers life expectancy by 0.40 years, which is roughly one-tenth the effect size we

estimated using their global cubic polynomial specification. The fragility of these results should

not be surprising given a visual inspection of the scatterplot, which does not reveal a clear

pattern to the naked eye. In fact, for the right hand side of the threshold for Life Expectancy, we

find that a simple averaging of the 8 data points closest to the threshold gives the best

prediction at the threshold.

We agree with Gelman and Zelizer’s (2015) critique that the result

15 Personal communication with Michael Greenstone, March 16, 2015.
16 We have taken two separate measurements for each figure and use the average of these two measurements for the X and Y coordinates and the median of our four measurements of the diameter of each circle.
17 Given that there are a small number of observations of $X$ and $Y$ on each side of the discontinuity, we placed no constraint on the minimum number of observations or the minimum number of MSPEs that are required to be included. We considered polynomials of order 0 to 5.


“indicates to us that neither the linear nor the cubic nor any other polynomial

model is appropriate here. Instead, there are other variables not included in the

model which distinguish the circles in the graph” (p.4).

4. Conclusion

While regression discontinuity design has a more than 50-year history of use for estimating treatment impacts (going back to Thistlethwaite and Campbell (1960)), the appropriate method for selecting

the specification and bandwidth to implement the estimation has yet to be settled. This paper’s

contribution is the provision of a method for optimally and simultaneously selecting a bandwidth

and polynomial order for both sides of a discontinuity. We identify the combination that

minimizes the estimated mean squared prediction error at the threshold of a discontinuity. Our

paper builds on Imbens and Kalyanaraman (2012), but is different from their approach which

solves for the optimal bandwidth assuming that a linear specification will be used on both sides

of the discontinuity. Our insight is that one can use the information on each side of the

discontinuity to see what bandwidth/polynomial-order combinations do well in predicting the

next data point as one moves closer and closer to the discontinuity. We apply our method to reexamine several notable papers in the literature. While some of these papers’ results are shown

to be robust, others are shown to be more fragile, suggesting the importance of using optimal

methods for specification and bandwidth selection.


References

Baum, C.F. (2008). Modeling proportions. Stata Journal 8: 299–303.

Chen, Y., Ebenstein, A., Greenstone, M., and Li, H. (2013). Evidence on the impact of sustained exposure to air pollution on life expectancy from China’s Huai River policy. Proceedings of the National Academy of Sciences 110, 12936–12941.

DiNardo, J., and Lee, D. (2010). Program evaluation and research designs. In Ashenfelter and Card (eds.), Handbook of Labor Economics, Vol. 4.

Fuji, D., Imbens, G., and Kalyanaraman, K. (2009). Notes for Matlab and Stata regression discontinuity software. https://www.researchgate.net/publication/228912658_Notes_for_Matlab_and_Stata_regression_discontinuity_software. Software downloaded on July 2, 2015 from http://faculty-gsb.stanford.edu/imbens/RegressionDiscontinuity.html.

Gelman, A., and Imbens, G. (2014). Why high-order polynomials should not be used in regression discontinuity designs. National Bureau of Economic Research, Working Paper 20405, http://www.nber.org/papers/w20405.

Gelman, A., and Zelizer, A. (2015). Evidence on the deleterious impact of sustained use of polynomial regression on causal inference. Research & Politics, 2(1), 1–7.

Imbens, G., and Kalyanaraman, K. (2012). Optimal bandwidth choice for the regression discontinuity estimator. Review of Economic Studies, 79, 933–959.

Imbens, G., and Lemieux, T. (2008). Regression discontinuity designs: A guide to practice. Journal of Econometrics 142, 615–635.

Jacob, R., Zhu, P., Somers, M., and Bloom, H. (2012). A practical guide to regression discontinuity. MDRC. Accessed via http://www.mdrc.org/sites/default/files/regression_discontinuity_full.pdf.

Lee, D.S. (2001). The electoral advantage to incumbency and voters’ valuation of politicians’ experience: A regression discontinuity analysis of elections to the U.S. House. National Bureau of Economic Research, Working Paper 8441.

Lee, D.S. (2008). Randomized experiments from non-random selection in U.S. House elections. Journal of Econometrics, 142, 675–697.

Lee, D.S., and Lemieux, T. (2010). Regression discontinuity designs in economics. Journal of Economic Literature 48, 281–355.

Long, M.C., and Rooklyn, J. (2016). Next: A Stata program for regression discontinuity. University of Washington.

Nichols, A. (2011). rd 2.0: Revised Stata module for regression discontinuity estimation. http://ideas.repec.org/c/boc/bocode/s456888.html.

Papke, L.E., and Wooldridge, J. (1996). Econometric methods for fractional response variables with an application to 401(k) plan participation rates. Journal of Applied Econometrics 11: 619–632.

Thistlethwaite, D., and Campbell, D. (1960). Regression-discontinuity analysis: An alternative to the ex post facto experiment. Journal of Educational Psychology 51(6): 309–317.

Van Der Klaauw, W. (2008). Regression-discontinuity analysis: A survey of recent developments in economics. Labour 22, 219–245.


Figure 1: Predicting the next value after six observed data points

Panel A: Data available to predict next Y
Panel B: Predicting Y given X ≥ 2 using prior value of X
Panel C: Predicting Y given X ≥ 3 using prior two values of X
Panel D: Predicting Y given X ≥ 4 using prior three values of X
Panel E: Predicting Y given X ≥ 5 using prior four values of X
Panel F: Predicting Y given X = 6 using prior five values of X

[Figure panels (scatterplots of Y against X) are not recoverable from the text extraction.]


Table 1: Computing Mean Squared Prediction Error (MSPE) and Selecting the Optimal Specification and Bandwidth

Polynomial order (p):               0      0      0      0      0      1      1      1      1      2      2      2
Number of prior data points (n):    1      2      3      4      5      2      3      4      5      3      4      5

Panel A: Prediction of Y
 X   Y
 1  12
 2  15     12.0
 3  16     15.0   13.5                               18.0
 4  13     16.0   15.5   14.3                        17.0   18.3                 15.0
 5  10     13.0   14.5   14.7   14.0                 10.0   12.7   15.0           6.0    7.5
 6   7     10.0   11.5   13.0   13.5   13.2           7.0    7.0    9.0   11.4    7.0    4.0    3.4

Panel B: Error Squared
 X
 2          9.0
 3          1.0    6.3                                4.0
 4          9.0    6.3    1.8                        16.0   28.4                  4.0
 5          9.0   20.3   21.8   16.0                  0.0    7.1   25.0          16.0    6.3
 6          9.0   20.3   36.0   42.3   38.4           0.0    0.0    4.0   19.4    0.0    9.0   13.0

Panel C: Predicted Value of MSPE given X = Threshold (i.e., Weighted Average of MSPEs)
 Base Wgt. = 1 (Uniform)    7.4   13.3   19.9   29.1   38.4    5.0   11.9   14.5   19.4    6.7    7.6   13.0
 Base Wgt. = 10^3           8.9   19.4   31.6   37.0   38.4    0.8    2.7    8.2   19.4    3.2    8.4   13.0
 Base Wgt. = 10^6         8.998   20.2   35.0   40.7   38.4    0.1    0.5    5.2   19.4    1.0    8.8   13.0
 Base Wgt. = 10^10      8.99999  20.25   35.9   42.0   38.4  0.002    0.1    4.2   19.4    0.2   8.97   13.0

Panel D: Upper Bound of 80% Confidence Interval Around Predicted Value of MSPE given X = Threshold
 Base Wgt. = 1 (Uniform)    9.9   19.9   38.6   69.5   n/a    11.2   28.0   46.8   n/a    15.7   11.9   n/a
 Base Wgt. = 10^3          13.2   29.7   57.1   84.1   n/a    11.2   26.2   47.6   n/a    17.5   13.9   n/a
 Base Wgt. = 10^6          14.1   32.6   65.5   94.5   n/a    12.1   27.6   49.6   n/a    16.5   14.8   n/a
 Base Wgt. = 10^10         14.4   33.4   68.0   98.6   n/a    12.4   27.9   49.8   n/a    15.9   15.0   n/a

Note: Columns with only one computable MSPE (n = 5) have no confidence interval.


Table 2: Estimating a Simulated Treatment Effect of -10 with Jacob et al. (2012) Data

Simulated Treatment Effect = -10

                                       Threshold = 215                    Threshold = 205                    Threshold = 225
Base Weight                      1     1,000    10^6    10^10       1     1,000    10^6    10^10       1     1,000    10^6    10^10

Left Side of Threshold
 Optimal Specification        Linear  Linear  Linear  Linear    Linear  Linear  Linear  Linear     Cubic   Cubic   Cubic   Cubic
 Optimal # Prior Observations   23      23      23      23        19      19      17       6         44      44      44      33
 Total # Prior Observations     42      42      42      42        32      32      32      32         52      52      52      52

Right Side of Threshold
 Optimal Specification        Quadratic Quadratic Quadratic Quadratic  Quadratic Quadratic Quadratic Quadratic  Linear Linear Linear Linear
 Optimal # Prior Observations   33      33      33      33        47      47      47      47         20      20      20      20
 Total # Prior Observations     42      42      42      42        52      52      52      52         32      32      32      32

Our Estimate of Treatment Effect
 Estimate                     -9.36   -9.36   -9.36   -9.36     -9.96   -9.96  -10.09   -8.60     -11.67  -11.67  -11.67  -11.78
 s.e. (Estimate)             (0.73)  (0.73)  (0.73)  (0.73)    (0.93)  (0.93)  (0.95)  (1.38)     (0.98)  (0.98)  (0.98)  (1.07)

Using Imbens and Kalyanaraman's (2012) Optimal Bandwidth for Linear Specification
 Bandwidth                              6.3                               7.3                               7.3
 Estimate                             -10.68                             -8.25                            -11.74
 s.e. (Estimate)                      (1.27)                             (1.50)                           (1.44)
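The selection step summarized in Table 2 can be sketched in code. The Python/NumPy sketch below is a schematic under our own assumptions, not the paper's exact implementation: the function names are ours, the search grid and refitting convention are assumptions, and details such as binning the running variable, inference on the estimate, and the confidence-interval variant shown in Panel D of Table 1 are omitted.

    import numpy as np

    def predicted_mspe(X, Y, p, b, base=1e3):
        # X must be sorted so that X[-1] is the point closest to the threshold.
        errs, xs = [], []
        for i in range(b, len(X)):                 # each point with b predecessors
            fit = np.polyfit(X[i - b:i], Y[i - b:i], deg=p)
            errs.append((Y[i] - np.polyval(fit, X[i])) ** 2)
            xs.append(X[i])
        # Same assumed geometric weighting as in the Table 1 sketch above.
        w = base ** ((np.asarray(xs) - X[0]) / (X[-1] - X[0]))
        return np.average(np.asarray(errs), weights=w)

    def select_and_extrapolate(X, Y, cutoff, max_order=3, base=1e3):
        # Search all (order, bandwidth) pairs, keep the one with the lowest
        # predicted MSPE, refit on the chosen window, extrapolate to the cutoff.
        pairs = [(p, b) for p in range(max_order + 1)
                 for b in range(p + 1, len(X))]
        p, b = min(pairs, key=lambda pb: predicted_mspe(X, Y, *pb, base=base))
        fit = np.polyfit(X[-b:], Y[-b:], deg=p)
        return np.polyval(fit, cutoff), (p, b)

    # Usage: run once per side of the threshold; for data to the right of the
    # cutoff, negate X and the cutoff first so the sequence again runs toward
    # the threshold. The treatment-effect estimate is then the gap
    #   y_right_at_cutoff - y_left_at_cutoff.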


Figure 2: Selection of Specification and Bandwidth Using Data from Jacob et al. (2012) With Simulated Treatment Effect of -10 at Various Thresholds

Simulated Threshold = 205: Estimated Treatment Effect = -9.39 (s.e. = 0.24)
Simulated Threshold = 215: Estimated Treatment Effect = -9.96 (s.e. = 0.18)
Simulated Threshold = 225: Estimated Treatment Effect = -11.67 (s.e. = 0.14)

[Three panels, one per simulated threshold: outcome (vertical axis, 180 to 280) plotted against the running variable (horizontal axis, 160 to 260).]


Table 3: Estimating a Placebo Treatment Effect with Jacob et al. (2012) Data

Simulated Treatment Effect = 0

Threshold                          200     205     210     215     220     225     230

Left Side of Threshold
 Optimal Specification           Linear  Linear  Linear  Linear  Linear  Cubic   Quadratic
 Optimal # Prior Observations      19      19      32      23      40      44      39
 Total # Prior Observations        27      32      37      42      47      52      57

Right Side of Threshold
 Optimal Specification           Quadratic Quadratic Linear Quadratic Quadratic Linear Linear
 Optimal # Prior Observations      47      47      29      33      24      20      20
 Total # Prior Observations        57      52      47      42      37      32      27

Our Estimate of Treatment Effect
 Estimate                        -0.90    0.04   -1.39    0.64    0.50   -1.67   -0.20
 s.e. (Estimate)                (1.10)  (0.93)  (0.68)  (0.73)  (0.77)  (0.98)  (1.03)
 Significance                                     **                      *

Using Imbens and Kalyanaraman's (2012) Optimal Bandwidth for Linear Specification
 Bandwidth                         6.7     7.3     6.3     6.3     7.3     7.3     8.1
 Estimate                          0.01    1.75   -2.27   -0.68    0.22   -1.74    0.08
 s.e. (Estimate)                 (1.89)  (1.50)  (1.50)  (1.27)  (1.22)  (1.44)  (1.58)
 Significance                    (none significant)

Absolute Error                                                                             Average
 Long & Rooklyn                    0.90    0.04    1.39    0.64    0.50    1.67    0.20     0.76
 Imbens & Kalyanaraman             0.01    1.75    2.27    0.68    0.22    1.74    0.08     0.97
 Better Performance:               IK      LR      LR      LR      IK      LR      IK       LR

Note: ***, **, and * denote two-tailed significance at the 1%, 5%, and 10% levels, respectively.
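The bottom rows of Table 3 follow from simple arithmetic on the estimates above; a minimal Python check (the true placebo effect is zero):

    import numpy as np

    true_effect = 0.0
    lr = np.array([-0.90, 0.04, -1.39, 0.64, 0.50, -1.67, -0.20])  # Long & Rooklyn
    ik = np.array([ 0.01, 1.75, -2.27, -0.68, 0.22, -1.74,  0.08])  # Imbens & Kalyanaraman

    abs_lr = np.abs(lr - true_effect)            # 0.90 0.04 1.39 ...; mean = 0.76
    abs_ik = np.abs(ik - true_effect)            # 0.01 1.75 2.27 ...; mean = 0.97
    print(abs_lr.mean().round(2), abs_ik.mean().round(2))
    print(np.where(abs_lr < abs_ik, "LR", "IK"))  # IK LR LR LR IK LR IK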


Table 4: Estimating a Simulated Treatment Effect with Jacob et al. (2012) Data, Augmented with Cubic Function of Pretest (X) Added to Posttest (Y)

Simulated Treatment Effect = -10

Threshold                          200     205     210     215     220     225     230

Panel A: Augmentation Applied to Both Sides of Threshold
Our Estimate of Treatment Effect
 Estimate                       -10.02   -8.14  -11.63   -9.35  -10.26  -11.27  -10.86
 s.e. (Estimate)                (1.59)  (1.23)  (1.40)  (1.04)  (0.91)  (1.11)  (1.50)
Using Imbens and Kalyanaraman's (2012) Optimal Bandwidth for Linear Specification
 Estimate                       -10.23   -8.64  -12.58  -10.89   -9.95  -11.78   -9.88
 s.e. (Estimate)                (1.94)  (1.55)  (1.72)  (1.24)  (1.21)  (1.41)  (1.55)
Absolute Error                                                                             Average
 Long & Rooklyn                   0.02    1.86    1.63    0.65    0.26    1.27    0.86      0.94
 Imbens & Kalyanaraman            0.23    1.36    2.58    0.89    0.05    1.78    0.12      1.00
 Better Performance:              LR      IK      LR      LR      IK      LR      IK        LR

Panel B: Augmentation Applied to Left Side of Threshold
Our Estimate of Treatment Effect
 Estimate                       -10.27   -8.09  -11.32   -9.38  -10.13  -11.67  -10.82
 s.e. (Estimate)                (1.51)  (1.16)  (1.44)  (0.99)  (1.05)  (0.98)  (1.22)
Using Imbens and Kalyanaraman's (2012) Optimal Bandwidth for Linear Specification
 Estimate                       -10.10   -8.74  -12.72  -10.94   -9.95  -11.73   -9.69
 s.e. (Estimate)                (2.06)  (1.62)  (1.42)  (1.22)  (1.21)  (1.43)  (1.60)
Absolute Error                                                                             Average
 Long & Rooklyn                   0.27    1.91    1.32    0.62    0.13    1.67    0.82      0.96
 Imbens & Kalyanaraman            0.10    1.26    2.72    0.94    0.05    1.73    0.31      1.02
 Better Performance:              IK      IK      LR      LR      IK      LR      IK        LR

Panel C: Augmentation Applied to Right Side of Threshold
Our Estimate of Treatment Effect
 Estimate                       -10.65  -10.01  -11.70  -9.338   -9.63  -11.27  -10.25
 s.e. (Estimate)                (1.21)  (1.01)  (0.59)  (0.80)  (0.53)  (1.11)  (1.35)
Using Imbens and Kalyanaraman's (2012) Optimal Bandwidth for Linear Specification
 Estimate                       -10.02   -7.97  -12.16 -10.664   -9.78  -11.79  -10.11
 s.e. (Estimate)                (1.75)  (1.46)  (1.57)  (1.30)  (1.21)  (1.41)  (1.55)
Absolute Error                                                                             Average
 Long & Rooklyn                   0.65    0.01    1.70   0.662    0.37    1.27    0.25      0.70
 Imbens & Kalyanaraman            0.02    2.03    2.16   0.664    0.22    1.79    0.11      1.00
 Better Performance:              IK      LR      LR      LR      IK      LR      IK        LR


Figure 4: Selection of Specification and Bandwidth Using Data from Jacob et al. (2012) Augmented with Cubic Function of Pretest (X) Added to Posttest (Y) and With Simulated Treatment Effect of -10 at Various Thresholds

Columns: Panel A: Augmentation Applied to Both Sides of Threshold; Panel B: Augmentation Applied to Left Side of Threshold; Panel C: Augmentation Applied to Right Side of Threshold. Rows: Thresholds = 200, 205, 210, 215.

[Twelve panels: augmented outcome plotted against the running variable (horizontal axis, 160 to 260).]

(Figure 4 Continued on Next Page)


Figure 4 Continued

Columns: Panel A: Augmentation Applied to Both Sides of Threshold; Panel B: Augmentation Applied to Left Side of Threshold; Panel C: Augmentation Applied to Right Side of Threshold. Rows: Thresholds = 220, 225, 230.

[Nine panels: augmented outcome plotted against the running variable (horizontal axis, 160 to 260).]


Figure 5: Replication of Lee’s (2008) Figure 2a Using Our Specification/Bandwidth Selection Method

Figure 6: Replication of Lee’s (2008) Figure 4a Using Our Specification/Bandwidth Selection Method

[Figures 5 and 6 each plot Vote Share, Election t+1 (vertical axis, 0 to 1) against Democratic Vote Share Margin of Victory, Election t (horizontal axis, -1 to 1).]


Figure 7: Replication of Chen et al.’s (2013) Figures 2 & 3

Figures 2 and 3 reprinted from Chen et al. (2013)

Replication using their specification applied to data inferred from figures

Replication using our method applied to data inferred from figures

[Panels plot TSP (vertical axis, 0 to 800) and Life Expectancy in Years (vertical axis, 65 to 95) against Degrees North of the Huai River Boundary (horizontal axis, -20 to 20).]


Table 5: Estimating Treatment Effects using Data Inferred from Chen et al. (2013)