new estimation of limited dependent variable models with dummy...

27
Editor’s Note. This article is the JBES 1999 Invited Lecture at the Joint Statistical Meetings, August 1999, in Baltimore, Maryland. Estimation of Limited Dependent Variable Models With Dummy Endogenous Regressors: Simple Strategies for Empirical Practice Joshua D. Angrist Department of Economics, Massachusetts Institute of Technology, Cambridge, MA 02142-1347, and National Bureau of Economic Research, Cambridge, MA ([email protected]) Applied economists have long struggled with the question of how to accommodate binary endogenous regressors in models with binary and nonnegative outcomes. I argue here that much of the dif culty with limited dependent variables comes from a focus on structural parameters, such as index coef cients, instead of causal effects. Once the object of estimation is taken to be the causal effect of treatment, several simple strategies are available. These include conventional two-stage least squares, multiplicative models for conditional means, linear approximation of nonlinear causal models, models for distribution effects, and quantile regression with an endogenous binary regressor. The estimation strategies discussed in the article are illustrated by using multiple births to estimate the effect of childbearing on employment status and hours of work. KEY WORDS: Instrumental variables; Labor supply; Sample selection; Semiparametric methods; Tobit. Econometric models with dummy endogenous regressors capture the causal relationship between a binary regressor and an outcome variable. A canonical example is the evaluation of training programs, in which the binary regressor is an indica- tor for those who were trained and outcomes are earnings and employment status. Other examples include treatment effects in epidemiology and health economics, effects of union sta- tus, and the effects of teen childbearing on schooling or labor- market outcomes. All of these problems have a treatment- control avor. The notion that treatment status is “endoge- nous” re ects the fact that simple comparisons of treated and untreated individuals are unlikely to have a causal interpreta- tion. Instead, the dummy endogenous-variablemodel is meant to allow for the possibility of joint determination of outcomes and treatment status or omitted variables related to both treat- ment status and outcomes. The principal challenge facing empirical researchers con- ducting studies of this type is identi cation. Successful identi cation in this context usually means nding an instru- mental variable (IV) that affects outcomes solely through its impact on the binary regressor of interest. For better or worse, however, the formal discipline of econometrics is not much concerned with the “ nding instruments problem”; this is a job left to the imagination of empirical researchers. This divi- sion of responsibility reminds me a little of Steve Martin’s old joke about “how to make a million dollars and never pay taxes.” First, Martin blandly suggests, “get a million dollars.” In the same spirit, once you have solved the dif cult prob- lem of nding an instrument, then the tasks of estimation and inference—typically using two-stage least squares (2SLS)— look relatively straightforward. But perhaps there is reason to worry about estimation and inference in this context after all. Even with a plausible instru- ment, the dummy endogenous-variables model still seems to raise some special econometric problems. For one thing, the endogenous regressor is binary, so perhaps a nonlinear rst stage is in order. Second, and more importantly, in many cases the outcome of interest is also binary. Examples include employment status in the evaluation of training programs and survival status in health research. In other cases, the depen- dent variable has limited support, most often being nonnega- tive with a mass point at 0. Examples of this sort of outcome include earnings, hours worked, and expenditure on health care. The analysis of such limited dependent variables (LDV’s) seems to call for nonlinear models like probit and tobit. This generates few stumbling blocks when the regressors are exoge- nous, but, with endogenous regressors, LDV models appear to present special challenges. This article argues that dif culties with endogenous vari- ables in nonlinear LDV models are usually more apparent than real. For binary endogenous regressors, at least, the techni- cal challenges posed by LDV models come primarily from what I see as a counterproductive focus on structural param- eters such as latent index coef cients or censored regression coef cients, instead of directly interpretable causal effects. In my view, the problem of causal inference with LDV’s is not © 2001 American Statistical Association Journal of Business & Economic Statistics January 2001, Vol. 19, No. 1 2

Upload: others

Post on 11-Oct-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: New Estimation of Limited Dependent Variable Models With Dummy …piketty.pse.ens.fr/…ecoineg/articl/Angrist2001b.pdf · 2011. 12. 5. · sequences of changes in contraceptive technology

Editorrsquos Note This article is the JBES 1999 Invited Lecture at the Joint Statistical Meetings August1999 in Baltimore Maryland

Estimation of Limited Dependent VariableModels With Dummy EndogenousRegressors Simple Strategiesfor Empirical Practice

Joshua D AngristDepartment of Economics Massachusetts Institute of Technology Cambridge MA 02142-1347and National Bureau of Economic Research Cambridge MA (angristmitedu)

Applied economists have long struggled with the question of how to accommodate binary endogenousregressors in models with binary and nonnegative outcomes I argue here that much of the dif culty withlimited dependent variables comes from a focus on structural parameters such as index coef cientsinstead of causal effects Once the object of estimation is taken to be the causal effect of treatmentseveral simple strategies are available These include conventional two-stage least squares multiplicativemodels for conditional means linear approximation of nonlinear causal models models for distributioneffects and quantile regression with an endogenous binary regressor The estimation strategies discussedin the article are illustrated by using multiple births to estimate the effect of childbearing on employmentstatus and hours of work

KEY WORDS Instrumental variables Labor supply Sample selection Semiparametric methodsTobit

Econometric models with dummy endogenous regressorscapture the causal relationship between a binary regressor andan outcome variable A canonical example is the evaluation oftraining programs in which the binary regressor is an indica-tor for those who were trained and outcomes are earnings andemployment status Other examples include treatment effectsin epidemiology and health economics effects of union sta-tus and the effects of teen childbearing on schooling or labor-market outcomes All of these problems have a treatment-control avor The notion that treatment status is ldquoendoge-nousrdquo re ects the fact that simple comparisons of treated anduntreated individuals are unlikely to have a causal interpreta-tion Instead the dummy endogenous-variable model is meantto allow for the possibility of joint determination of outcomesand treatment status or omitted variables related to both treat-ment status and outcomes

The principal challenge facing empirical researchers con-ducting studies of this type is identi cation Successfulidenti cation in this context usually means nding an instru-mental variable ( IV) that affects outcomes solely through itsimpact on the binary regressor of interest For better or worsehowever the formal discipline of econometrics is not muchconcerned with the ldquo nding instruments problemrdquo this is ajob left to the imagination of empirical researchers This divi-sion of responsibility reminds me a little of Steve Martinrsquosold joke about ldquohow to make a million dollars and never paytaxesrdquo First Martin blandly suggests ldquoget a million dollarsrdquoIn the same spirit once you have solved the dif cult prob-lem of nding an instrument then the tasks of estimation andinferencemdashtypically using two-stage least squares (2SLS)mdashlook relatively straightforward

But perhaps there is reason to worry about estimation andinference in this context after all Even with a plausible instru-ment the dummy endogenous-variables model still seems toraise some special econometric problems For one thing theendogenous regressor is binary so perhaps a nonlinear rststage is in order Second and more importantly in manycases the outcome of interest is also binary Examples includeemployment status in the evaluation of training programs andsurvival status in health research In other cases the depen-dent variable has limited support most often being nonnega-tive with a mass point at 0 Examples of this sort of outcomeinclude earnings hours worked and expenditure on healthcare The analysis of such limited dependent variables (LDVrsquos)seems to call for nonlinear models like probit and tobit Thisgenerates few stumbling blocks when the regressors are exoge-nous but with endogenous regressors LDV models appear topresent special challenges

This article argues that dif culties with endogenous vari-ables in nonlinear LDV models are usually more apparent thanreal For binary endogenous regressors at least the techni-cal challenges posed by LDV models come primarily fromwhat I see as a counterproductive focus on structural param-eters such as latent index coef cients or censored regressioncoef cients instead of directly interpretable causal effects Inmy view the problem of causal inference with LDVrsquos is not

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

2

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 3

fundamentally different from causal inference with continuousoutcomes

Section 1 begins by discussing identi cation strategies inLDV models with dummy endogenous regressors I show thatthe auxiliary assumptions associated with structural model-ing are largely unnecessary for causal inference There isone important exception to this claim however and that iswhen attention focuses on conditional-on-positive effects asin sample-selection models For example labor economistssometimes study the effect of an endogenous treatment onhours worked for those who work Identi cation of sucheffects turns heavily on an underlying structural frameworkOn the other hand the motivation for estimating this sort ofeffect is often unclear (to me) Moreover claims for identi -cation in this context typically strike me as overly ambitiousbecause even an ideal randomized experiment fails to identifyconditional-on-positive causal effects

Setting aside the conceptual problems inherent inconditional-on-positive effects a focus on causal effectsinstead of say censored regression parameters or latent indexcoef cients has a major practical payoff First and most basicis the observation that if there are no covariates or the covari-ates are sparse and discrete linear models and associated esti-mation techniques like 2SLS are no less appropriate for LDVrsquosthan for other kinds of dependent variables This is becauseconditional expectation functions with discrete covariates canbe parameterized as linear using a saturated model regardlessof the support of the dependent variable Of course relation-ships involving continuous covariates or a less-than-saturatedparameterization for discrete covariates are usually nonlin-ear (even if the outcome variable has continuous support)In such cases however it still makes sense to ask whethernonlinear modeling strategies change inferences about causaleffects

If nonlinearity does seem important it can be incorpo-rated into models for conditional means using two new semi-parametric estimators The rst due to Mullahy (1997) isbased on a multiplicative model that can be estimated usinga simple nonlinear IV estimator The second developed byAbadie (1999) allows exible nonlinear approximation ofthe causal response function of interest In addition to newstrategies for estimating effects on means I also discussestimates of the effect of treatment on distribution ordi-nates and quantiles using an approach developed by AbadieAngrist and Imbens (1998) This provides an alternative tothe estimation of conditional-on-positive effects Two advan-tages of these new approaches are their computational sim-plicity and weak identi cation requirements relative to othersemiparametric approaches Another advantage is the fact thatthey estimate causal effects directly and are not tied to alatent-indexcensored-regression framework The new estima-tors are illustrated by estimating the effect of childbearing onwomenrsquos employment status and hours of work using multi-ple births as an instrument This ldquotwins instrumentrdquo was usedto estimate the labor-supply consequences of childbearing byRosenzweig and Wolpin (1980) Bronars and Grogger (1994)Gangadharan and Rosenbloom (1996) and Angrist and Evans(1998)

1 CAUSAL EFFECTS AND STRUCTURALPARAMETERS

11 The Effects of Interest

The relationship between fertility and labor supply is oflongstanding interest in labor economics and demography Fora recent discussion and references to the literature see Angristand Evans (1998) which is the basis of the empirical work inSection 4 The AngristndashEvans application is concerned withthe effect of going from a family size of two children to morethan two children Let Di be an indicator for women with morethan two children in a sample of women with at least twochildren The reasons for focusing on the transition from twoto more than two are both practical and substantive First onthe practical side there are plausible instruments available forthis fertility increment Second recent reductions in martialfertility have been concentrated in the 2ndash3 child range

What is the object of interest in an application like thisSometimes the purpose of research is merely descriptive inwhich case we might simply compare the outcomes of womenwho have Di

D 1 with those of women who have DiD 0 For

this descriptive agenda no special issues are raised by thefact that the dependent variable is limited beyond the obvi-ous consideration that if Yi is binary then one need only lookat means In contrast if Yi is a variable like earnings witha skewed distribution the mean may not capture everythingabout labor-supply behavior that is of interest In fact a com-plete description would probably look at the entire distributionof outcomes or at least at selected quantiles

A major problem with descriptive analyses is that they mayhave little predictive value Part of the motivation for studyinglabor supply and fertility is interest in how changes in gov-ernment policy and the environment affect childbearing andlabor supply For example we might be interested in the con-sequences of changes in contraceptive technology or costs amotivation for studying the twins experiment mentioned byRosenzweig and Wolpin (1980 p 347) Similarly one of thequestions addressed in the labor-supply literature is to whatextent declines in fertility have been a causal factor increas-ing female employment rates or reducing poverty In contrastwith descriptive analyses causal relationships answer counter-factual questions and are therefore more likely to be of valuefor predicting the effects of changing policies or changing cir-cumstances or understanding the past (eg see Manski 1996)

Causal relationships can be described most simply usingexplicit notation for counterfactuals or potential outcomesThis approach to causal inference goes back at least toR A Fisher but the modern version is usually attributed toRubin (1974 1977) Let Y1i denote the labor-market behaviorof mother i if she has a third child and let Y0i denote labor-market behavior otherwise for the same mother The averageeffect of childbearing on mothers who have a third child is

E6Y1imdashDi

D 17ƒ E6Y0imdashDi

D 17 D E6Y1iƒ Y0i

mdashDiD 170 (1)

Note that the rst term on the left side is observed but thesecond term is an unobserved counterfactual average that weassume is meaningful

The right side of (1) is often called the effect of treatment onthe treated and is widely discussed in the evaluation literature

4 Journal of Business amp Economic Statistics January 2001

(eg Rubin 1977 Heckman and Robb 1985 Angrist 1998)In the context of social-program evaluation the effect of treat-ment on the treated tells us whether the program was bene cialfor participants This is not the only average effect of interestwe might also care about the unconditional average effect orthe effect in some subpopulation de ned by covariates (ieE6Y1i

ƒY0imdashX7 for covariates X) Ultimately of course we are

also likely to want to extrapolate from the experiences of thetreated to as-yet-untreated groups Such extrapolation makeslittle sense however unless average causal effects in existingpopulations can be reliably assessed

Simple comparisons of outcomes by Di generally fail toidentify causal effects Rather a comparison between treatedand untreated individuals equals the effect of treatment on thetreated plus a bias term

E6YimdashDi

D 17ƒ E6YimdashDi

D 17

D E6Y1imdashDi

D 17 ƒ E6Y0imdashDi

D 07

D E6Y1iƒ Y0i

mdashDiD 17

C 8E6Y0imdashDi

D 17ƒ E6Y0imdashDi

D 0790 (2)

The bias term disappears in the childbearing example if child-bearing is determined in a manner independent of a womanrsquospotential labor-market behavior if she does not have childrenIn that case 8E6Y0i

mdashDiD 07 ƒ E6Y0i

mdashDiD 179 D 0 and simple

comparisons identify the effect of treatment on the treatedBut this independence assumption seems unrealistic becausechildbearing decisions are made in light of information aboutearnings potential and career plans

12 Structural Models

What connects the causal parameters discussed in the pre-vious section with the parameters in structural economet-ric models Suppose that instead of potential outcomes webegin with a labor-supply model for hours worked alongthe lines of many second-generation labor-supply studies (seeKillingsworth 1983 for a survey) In this setting childbearingis determined by comparing the utility of having a child andnot having a child We can model this process as

DiD 14X 0

iƒ gt Daggeri51 (3a)

where Xi is a K 1 vector of observed characteristics thatdetermine utility and Daggeri is an unobserved variable re ecting aperson-speci c utility contrast

In a simple static model labor supply is given by the com-bination of the participation decision and hours determinationfor workers Workers chose their latent hours 4yi5 by equat-ing offered wages wi with the marginal rate of substitutionof goods for leisure mi4yi5 Participation is determined bythe relationship between wi and the marginal rate of substitu-tion at zero hours mi405 Since offered wages are unobservedfor nonworkers and reservation wages are never observed wedecompose these variables into a linear function of observablecharacteristics and regression error terms (denoted vwi1 vmi) asin the article by Heckman (1974) and many others

wiD X 0

ibdquowC wDi

C vwi (3b)

and

mi4yi5 D X 0ibdquom

C ndashyiC mDi

C vmi0 (3c)

Equating (3b) and (3c) and relabeling parameters and the errorterm we can solve for observed hours

YiD X 0

ibdquo C DiC ˜i if wi gt mi405

YiD 0 otherwise0

Equivalently

YiD 14X 0

ibdquo C Di gt ƒ˜i54X0ibdquo C Di

C ˜i50 (4)

Childbearing is said to be endogenous if the unobserved errordetermining Di depends on the unobserved error in the partic-ipation and hours equations

Since the structural equations tell us what a woman woulddo under alternative values of Di they describe the same sortof potential outcomes referred to in the previous model Theexplicit link is

YiD Y0i41ƒ Di5 C Y1iDi1 (5)

where

Y0iD 14X 0

ibdquo gt ƒ˜i54X0ibdquo C ˜i5 (6a)

and

Y1iD 14X 0

ibdquo C gt ƒ˜i54X0ibdquo C C ˜i50 (6b)

Once the structural parameters are known we can use theserelationships to write down expressions for causal effects Forexample the effect of treatment on the treated is

E6Y1iƒ Y0i

mdashDiD 17

D E 14X 0ibdquo C gt ƒ˜i54X

0ibdquo C C ˜i5

ƒ 14X 0i bdquo gt ƒ˜i54X

0ibdquo C ˜i5mdashX 0

iƒ gt Daggeri90 (7)

Note however that knowledge of the parameters on the rightside of (7) is still not enough to evaluate this expression Thefollowing lemma outlines the identi cation possibilities in thiscontext

Lemma Assume that the covariates 4Xi5 are independentof continuously distributed latent errors 4Daggeri1 ˜i5 Then

1 If Daggeri is not independent of ˜i and the probability oftreatment is always nonzero the effect of treatment on thetreated is not identi ed without further assumptions (Heckman1990)

2 If Daggeri is independent of ˜i the effect of treatment on thetreated is identi ed (exogenous treatment)

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 5

3 Suppose there is a covariate denoted Zi with coef cientƒ1 in (3a) which is excluded from (3b) and (3c) Without lossof generality assume ƒ1 gt 0 Then the local average treatmenteffect (LATE) given by E6Y1i

ƒ Y0imdashX 0

iƒ C ƒ1 gt Daggeri gt X 0iƒ7 is

identi ed ( Imbens and Angrist 1994)4 Suppose that LATE is identi ed as in 3 and that P6Di

D1mdashZi

D 07 D 0 Then LATE equals the effect of treatment on thetreated E6Y1i

ƒY0imdashDi

D 17 Similarly if P6DiD 1mdashZi

D 17 D 1LATE equals E6Y1i

ƒ Y0imdashDi

D 07 (Angrist and Imbens 1991)

This set of results can easily be summarized using non-technical language First without additional assumptions theeffect of treatment on the treated is not identi ed in latent-index models Second the three positive identi cation resultsin the lemma for exogeneous treatment the LATE resultand the specialization of LATE to effects on the treated ornontreated require no information about the structural modelother than the distribution of D and Z On the other hand theLATE result for endogenous treatments in part 3 does not gen-erally refer to the effect of treatment on the treated so herethe role played by the structural model in identifying causaleffects merits further discussion

The treatment effect captures the effect of treatment on thetreated for those whose treatment status is changed by theinstrument Zi The data are informative about the effect oftreatment on these people because the instrument changes theirbehavior Thus an exclusion restriction is enough to identifycausal effects for a group directly affected by the ldquoexperi-mentrdquo at hand (Angrist Imbens and Rubin 1996 called thesepeople compliers) In some cases this is the set of all treatedindividuals while in other cases this is only a subset In anycase however this result provides a foundation for crediblecausal inference because the assumptions needed for this nar-row ldquoidenti cation-in-principlerdquo can be separated from mod-eling assumptions required for smoothing and extrapolation toother groups of interest The quality of this extrapolation is ofcourse an open question and undoubtedly differs across appli-cations In my experience however often estimates of LATEdiffer little from estimates based on the stronger assumptionsinvoked to identify effects on the entire treated population (Ofcourse I have to admit that when simple and sophisticatedestimation strategies do differ I invariably prefer the simpleFor an illustration of why this is see the example that fol-lows)

Although extrapolation is an important part of what empir-ical researchers do parameters like LATE and the effectof treatment on the treated provide a minimum-controversyjumping-off point for inference and prediction Structuralparameters are sometimes more closely linked to economictheory than average causal effects But the ultimate goal oftheory-motivated structural estimation seems to differ littlefrom the causal agenda For example Keane and Wolpin(1997 p 111) used structural models to ldquoforecast the behaviorof agents given any change in the state of the world that canbe characterized as a change in their constraintsrdquo A prereq-uisite for this is credible assessment of causal effects of pastchanges Structural parameters that are not linked to causaleffects are not useful for this basic purpose [Mullahy (1998)made a similar point] The rest of my discussion is therefore

limited to models in which the effects of interest are de neddirectly in terms of potential outcomes

2 CAUSAL EFFECTS ON LDVrsquos

21 Average Effects in Experimental Data

Does the fact that a dependent variable is binary or non-negative have any implications for empirical causal analysisA useful starting point for this discussion is the analysis ofrandomized experiments since some of the issues raised bythe presence of LDVrsquos have nothing to do with endogeneitySuppose that Di was randomly assigned or at least assignedby some mechanism that ensures independence between Di

and Y0i In this case a simple difference in means betweenthose with Di

D 1 and with DiD 0 identi es the effect of treat-

ment on the treated

E6YimdashDi

D 17 ƒ E6YimdashDi

D 07

D E6Y1imdashDi

D 17ƒ E6Y0imdashDi

D 07

D E6Y1imdashDi

D 17ƒ E6Y0imdashDi

D 17

4by independence of Y0i and Di5

D E6Y1iƒ Y0i

mdashDiD 17

4by linearity of conditional means50 (8)

If Di is also independent of Y1i as would be likely in an exper-iment then E6Y1i

ƒ Y0imdashDi

D 17 D E6Y1iƒ Y0i7 the uncondi-

tional average treatment effect (Usually this ldquounconditionalrdquoaverage still refers to a subpopulation eligible to participate inthe experiment)

Equation (8) shows that the estimation of causal effectsin experiments presents no special challenges whether Yi isbinary nonnegative or continuously distributed If Yi is binarythen the difference in means on the left side of (8) cor-responds to a difference in probabilities while if Yi has amass point at 0 the difference in means is the difference inE6Yi

mdashYi gt 01Di7P6Yi gt 0mdashDi7 But these facts have no bearingon the causal interpretation of estimates or in the absence offurther assumptions or restrictions the choice of estimators

22 Conditional-on-Positive Effects

In many studies with nonnegative dependent variablesresearchers are interested in effects in a subset of the popula-tion with positive outcomes Interest in conditional-on-positiveeffects is sometimes motivated by the following decompo-sition of differences in means discussed by McDonald andMof t (1980)

E6YimdashDi

D 17ƒ E6YimdashDi

D 07

D 8P6Yi gt 0mdashDiD 17ƒ P6Y gt 0mdashDi

D 079

E6YimdashYi gt 01Di

D 17

C 8E6YimdashYi gt 01Di

D 17ƒ E6YimdashYi gt 01 Di

D 079

P6Y gt 0mdashDiD 070 (9)

6 Journal of Business amp Economic Statistics January 2001

This decomposition describes how much of the overalltreatment-control difference is due to participation effects (iethe impact on 16Yi gt 07) and how much is due to an increasein intensity for those with Yi gt 0 For a recent example ofthis distinction see Evans Farrelly and Montgomery (1999)who analyzed the impact of workplace smoking restrictionson smoking participation and intensity

In an experimental setting the interpretation of the rst partof (9) as giving the causal effect of treatment on participationis straightforward Does the conditional-on-positive differencein the second part also have a straightforward interpretationThe large literature constrasting two-part and sample-selectionmodels for LDVrsquos suggests not (See for example DuanManning Morris and Newhouse 1984 Hay and Olsen 1984Hay Leu and Roher 1987 Leung and Yu 1996 Maddala1985 Manning Duan and Rogers 1987 Mullahy 1998)

To analyze the conditional-on-positive comparison furtherit is useful to write the mean difference by treatment status asfollows (still assuming Y0i and Di are independent)

E6YimdashYi gt 01Di

D 17ƒ E6YimdashYi gt 01 Di

D 07

D E6Y1imdashY1i gt 01Di

D 17ƒ E6Y0imdashY0i gt 01Di

D 07

D E6Y1imdashY1i gt 01Di

D 17ƒ E6Y0imdashY0i gt 01Di

D 17

D E6Y1iƒ Y0i

mdashY1i gt 01DiD 17 (10a)

C 8E6Y0imdashY1i gt 01Di

D 17ƒ E6Y0i gt 01DiD 1790

(10b)

On one hand (10a) suggests that the conditional contrastestimates a potentially interesting effect since this clearlyamounts to a statement about the impact of treatment on thedistribution of potential outcomes (in fact this is somethinglike a comparison of hazard rates) On the other hand from(10b) it is clear that a conditional-on-working comparisondoes not tell us how much of the overall treatment effect isdue to an increase in work among treated workers The prob-lem is that the conditional contrast involves different groups ofpeoplemdashthose with Y1i gt 0 and those with Y0i gt 0 Supposefor example that the treatment effect is a positive constantsay Y1i

D Y0iC Since the second term in (10b) must then be

negative the observed difference E6YimdashYi gt 07ƒ E6Yi

mdashYi gt 07is clearly less than the causal effect on treated workers whichis in the constant-effects model This is the selectivity-biasproblem rst noted by Gronau (1974) and Heckman (1974)

In principle tobit and sample-selection models can be usedto eliminate selectivity bias in conditional-on-positive com-parisons These models depict Yi as the censored observationof an underlying continuously distributed latent variable Sup-pose for example that

YiD 16Yi gt 07Y uuml

i 1 where

Y uumli

D Y uuml01

C 4Y uuml1i

ƒ Y uuml0i5Di

D Y uuml0i

C Di0 (11)

Recent studies with this type of censoring in a female labor-supply model are those of Blundell and Smith (1989) and Lee(1995) both of which include endogeneous regressors Notethat in this context the constant-effects causal model is appliedto the latent variable not the observed outcome

Under a variety of distributional assumptions [eg normal-ity as in Heckman (1974) or weaker assumptions like sym-metry as in Powell (1986a)] the parameter is identi edBut what is the interpretation of in a causal model Onepossible answer is that is the causal effect of Di on Y uuml

i 1

though Y uumli is not observed so this is not usually of intrinsic

interest However a direct calculation using (11) shows that

is also a conditional-on-positive causal effect (for details seethe appendix) In particular

D E6Y1iƒ Y0i

mdashY1i gt 07 if lt 0 (12a)

and

D E6Y1iƒ Y0i

mdashY1i gt 07 if gt 00 (12b)

Thus the censored-regression model does succeed in sepa-rating causal effects from selection effects in conditional-on-positive comparisons [We do not know a priori whether isthe effect in (12a) or (12b) but it seems reasonable to use thesign of the estimated to decide]

Although (11) provides an elegant resolution of theselection-bias dilemma in practice I nd the use of censored-regression models to accomplish this unattractive One prob-lem is conceptual Although a censored-regression modelseems natural for arti cially censored data (eg topcodedvariables in the Current Population Survey) the notion of alatent labor-supply equation that can take on negative values isless clear cut The censoring in this case comes about becausesome people choose to work zero hours and not because ofmeasurement problems [Maddala (1985) made a similar com-ment regarding Tobinrsquos original application] Here an under-lying structural model seems essential to the interpretationof empirical results For example in the labor-supply modelfrom Section 1 the censored latent variable is the differencebetween unobserved offered wages and marginal rates of sub-stitution But even assuming the effect of regressors on thisdifference is of interest the latent index coef cients alone haveno predictive value for observable quantities

Second even if we adopt a theoretical framework thatmakes the latent structure meaningful identi cation of acensored-regression model requires assumption beyond thoseneeded for indenti cation of unconditional causal effects ofthe type described by (8) Semiparametric estimators that donot rely on distributional assumptions fail here because theregressor is discrete and there are no exclusion restrictions onthe selection equation (see Chamberlain 1986) Moreover inaddition to requiring distributional assumptions identi cationof in (12) turns heavily on the additive constant-effectsmodel in (11) This is because E6Y1i

ƒY0imdashY1i gt 07 involves the

joint distribution of Y1 and Y0 The constant-effects assump-tion nails this joint distribution down but the data containinformation on marginal distributions only (which is why evenrandomized trials fail to answer the causal question that moti-vates samples-selection models)

These concerns echoed by many applied researchers stim-ulated investigation of alternative strategies for the analysisof nonnegative outcomes (eg see Mof trsquos 1999 survey)

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 7

The two-part model (2PM) introduced by Cragg (1971) andwidely used in health economics seems to provide a lessdemanding framework for the analysis of LDVrsquos than sample-selection models The two parts of the 2PM are P6Yi gt 0mdashDi7

and E6YimdashYi gt 01 Di7 Researchers using this model simply

pick a functional form for each part For example probit or alinear probability model might be used for the rst part anda linear or log-linear model might be used for the secondpart Log-linearity for the second part may be desirable sincethis imposed nonnegativity of tted values (eg see Mullahy1998)

One attraction of the 2PM is that it ts a nonlinear func-tional form to the conditional expectation function (CEF) forLDVrsquos even if both parts are linear On the other hand tobit-type sample-selection models t a nonlinear CEF as well pro-vided there are covariates other than Di For example the CEFimplied by (11) with latent-index Y uuml

0iD X

0

i Œ C Di C ˜i and aNormal homoscedastic error is

E6YimdashXi1Di7 D ecirc64X

0

iΠC Di5=lsquo76X0

i ΠC Di7

C lsquo64X0

i ΠC Di5=lsquo71 (13)

where 4iexcl5 and ecirc4iexcl5 are the standard Normal density anddistribution functions and lsquo is the standard deviation of ˜i

(eg see McDonald and Mof t 1980) The derivative of thisCEF with respect to Di is ecirc64X

0

iΠC Di5=lsquo7 The nonlinearity of (13) notwithstanding at rst blush the

2PM seems to provide a more exible nonlinear speci cationthan tobit or other sample-selection models The latter imposesrestrictions tied to the latent-index structure while the twoparts of the 2PM can be speci ed in whatever form seemsconvenient and ts the data well (a point made by Lin andSchmidt 1984) A signal feature of the 2PM however andthe main point of contrast with sample-selection models isthat the 2PM does not attempt to solve the sample-selectionproblem in (10b) Thus the second part of the 2PM does nothave a clear-cut causal interpretation even if Di is randomlyassigned Similarly and of particular relevance here is thepoint that instrumental variables that are valid for estimatingthe effect of Di on Yi are not valid for estimating the effect ofDi on Yi conditional on Yi gt 0 The 2PM therefore seems illsuited for causal inference

23 Effects on Distributions

Conditional-on-positive effects and sample-selection modelsare sometimes motivated by interest in the consequence of Di

beyond the impact on average outcomes [eg see EichnerMcClellan and Wisersquos (1997) analysis of insurance effects onhealth expenditure] Are there schemes for estimating effectson distributions that are less demanding than sample-selectionmodels Once the basic problem of identifying causal effectsis resolved the impact of Di on the distribution of outcomesis identi ed and can be easily estimated

To see this for the experimental (exogenous Di) case notethat given the assumed independence of Di of Y0i the follow-

ing relationship holds for any point c in the support of Yi

E614Yi micro c5mdashDiD 17 ƒ E614Yi micro c5mdashDi

D 07

D P6Y1i micro cmdashDiD 17 ƒ P6Y0i micro cmdashDi

D 170

In fact the entire marginal distributions of Y1i and Y0i are iden-ti ed for those with Di

D 1 So it is easy to check whether Di

has an impact on the probability that YiD 0 as in the rst part

of the 2PM or whether there is a change in the distribution ofoutcomes at any positive value This information is enough tomake social-welfare comparisons as long as the comparisonsof interest involve marginal distributions only

24 Covariates and Nonlinearity

The conditional expectation of Yi given Di is inherentlylinear as are other conditional relationships involving Di

alone Suppose however that identi cation is based on aldquoselection-on-observablesrdquo assumption instead of presumedrandom assignment This means that causal inference is basedon the presumption that Y0i

q DimdashXi1 and causal effects must

be estimated after conditioning on Xi For example the effectof treatment on the treated can be expressed as

E6Y1iƒ Y0i

mdashDiD 17

D E8E6Y1imdashXi1 Di

D 17 ƒ E6Y0imdashXi1Di

D 17mdashDiD 19

DZ

8E6Y1imdashXi1Di

D 17ƒ E6Y0imdashXi1Di

D 079

P4XiD xmdashD D 15dx1 (14)

where P4X D xmdashD D 15 is a distribution of density Estima-tion using the sample analog of (14) is straightforward if Xi

has discrete support with many observations per cell (eg seeAngrist 1998) Otherwise some sort of smoothing (model-ing) is required to estimate the CEFrsquos E6Y1i

mdashXi1 DiD 17 and

E6Y0imdashXi1 Di

D 07Regression provides a exible and computationally attrac-

tive smoothing device A conceptual justi cation for regres-sion smoothing is that population regression coef cientsprovide the best (minimum mean squared error) linear approx-imation to E6Yi

mdashXi1Di7 (eg see Goldberger 1991) Thisldquoapproximation propertyrdquo holds regardless of the distributionof Yi

Separate regressions can be used to approximateE6Y1i

mdashXi1DiD 17 and E6Y0i

mdashXi1DiD 17 though this leaves

the problem of estimating P4XiD xmdashD D 15 to compute the

average difference in CEFrsquos using (14) On the other handa simple additive modelmdashsay E6Yi

mdashXi1 Di7 D X0

i sbquorC r Dimdash

sometimes works well in the sense that r mdashthe ldquoregressionestimandrdquomdashis close to average effects derived from modelsthat allow for nonlinearity and interactions between Di andXi With discrete covariates and a saturated model for Xi theadditive model can be thought of as implicitly producing aweighted average of covariate-speci c contrasts Although theregression weighting scheme differs from that in (14) in prac-tice treatment-effect heterogeneity may be limited enough thatalternative weighting schemes have little impact on the over-all estimate (see Angrist and Krueger 1999 for more on thispoint)

8 Journal of Business amp Economic Statistics January 2001

3 ENDOGENOUS REGRESSORSTRADITIONAL SOLUTIONS

LDV models with endogenous regressors were rst esti-mated using distributional assumptions and maximum likeli-hood (ML) An early and in uential work in this mold isthat of Heckman (1978) This approach is not wedded toML Heckman (1978) Amemiya (1978 1979) Newey (19861987) and Blundell and Smith (1989) discussed two-stepprocedures minimum-distance estimators generalized leastsquares (GLS) estimators and other variations on the tradi-tional ML framework Semiparametric estimators based onweaker distributional assumptions were discussed by amongothers Newey (1985) and Lee (1996)

The basic idea behind these strategies can be described asfollows Take the censored-regression model from (11) andadd a latent rst stage with instrumental variables Zi So thecomplete model is

YiD 16Yi gt 07Y uuml

i

Y uumli

D Y uuml0i

C 4Y1iƒ Y uuml

0i5DiD Y uuml

0iC Di D Œ C Di C ˜i

DiD 16ƒ0 C ƒ1Zi gt Daggeri70 (15)

The principal identifying assumption here is

4Y0i1 Daggeri5 q Zi0 (16)

Parametric schemes use two-step estimators or ML to esti-mate Semiparametric estimators typically work by substi-tuting an estimated conditional expectation sup1E6Di

mdashZi7 for Di

and then using a nonparametric or semiparametric procedureto estimate the pseueo reduced-form [eg Manskirsquos (1975)maximum score estimator for binary outcomes]

The problems I see with this approach are the same as listedfor censored regression models with exogenous regressorsFirst latent index coef cients are not causal effects If the out-come is binary semiparametric methods estimate scaled indexcoef cients and not average causal effects Similarly censoredregression parameters alone are not enough to determine thecausal effect of Di on the observed Yi [ I should note that thiscriticism does not apply to parametric estimators where dis-tributional assumptions can be used to recover causal effectsor to a recently developed semiparametric method by Blundelland Powell (1999) for continuous endogenous variables] Sec-ond this approach turns heavily on the latent-index setup Wecan add to these points the fact that even weak distributionalassumptions like conditional symmetry fail for the reduced-form error term 4Di

ƒ sup1E6DimdashZi75 ƒ ˜i since Di is binary (a

point made by Lee 1996)A nal observation is that given assumption (16) this

whole setup is unnecessary for causal inference The effectof treatment on the observed Yi is identi ed for those womenwhose childbearing behavior is affected by the instrument Thetwins instrument for example identi es the effect of Di onmothers who would not have had a third child without a mul-tiple second birth (see result 4 in the earlier lemma) It maybe of interest to extrapolate from this grouprsquos experiences tothose of other women but the extrapolation problem is distinctfrom the problem of identifying the causal effect of childbear-ing in the ldquotwins experimentrdquo

4 NEW ECONOMETRIC METHODS

Conditional moments and other probability statementsinvolving Di alone are necessarily linear but causal relation-ships involving covariates are likely to be nonlinear unless thecovariates are discrete and the model is saturated LDV mod-els like probit and tobit are often used because of a concernthat unless the model is saturated LDVrsquos lead to nonlinearCEFrsquos The 2PM is sometimes also motivated this way (egsee Duan et al 1984)

The headaches induced by nonlinearity notwithstandingthere are simple schemes for estimating causal effects in LDVmodels with endogenous regressors and covariates In thissection I discuss three strategies for estimating effects onmeans and two for estimating effects on distributions All butthe rst are based on new models and methods None are tiedto an underlying structural model

The simplest option for estimating effects on means isundoubtedly to ldquopuntrdquo by using a linear constant-effectsmodel to describe the relationship of interest

E6Y0imdashXi7 D X 0

isbquo (17a)

and

Y1iD Y0i

C 0 (17b)

The assumptions lead a linear causal model

YiD X 0

isbquo C DiC ˜i1 (18)

easily estimated by 2SLSAlthough the constant-effects assumption is clearly unreal-

istic in practice more general estimation strategies often leadto similar average effects A second issue that arises in thissetting is that because Di is binary a nonlinear rst-stage suchas probit or logit may seem appropriate for 2SLS estimationof (18) But the resulting second-stage estimates are inconsis-tent unless the model for the rst-stage CEF is actually cor-rect On the other hand conventional 2SLS estimates usinga linear probability model are consistent whether or not the rst-stage CEF is linear So it is generally safer to use a linear rst-stage Alternatively consistent estimates can be obtainedby using a linear or nonlinear estimate of E6Di

mdashXi1 Zi7 asan instrument (This is the same as the plug-in- tted-valuesmethod when the rst-stage is linear) See Kelejian (1971) orHeckman (1978 pp 946ndash947) for a discussion of this pointand additional references ( It is also worth noting that a probit rst-stage cannot even be estimated for the twins instrumentbecause P6Di

D 1mdashXi1 ZiD 17 D 1 for twins)

41 IV for an Exponential Conditional Mean

A linear model like (18) is obviously unrealistic for binaryoutcomes and fails to incorporate natural restrictions on theCEF for nonnegative LDVrsquos This motivated Mullahy (1997)to estimate causal effects on nonnegative LDVrsquos using a multi-plicative model similar to that used by Wooldridge (1999) forpanel data The Mullahy (1997) model can be written in mynotation as follows Let Xi be a vector of observed covariates

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 9

as before and let mdashi be an unobserved covariate correlatedwith Di and Y0i The fact that this covariate is unobserved isthe reason we need to instrument

Let Zi be a candidate instrument Conditional on observedand unobserved covariates both treatment status and theinstrument are assumed to be independent of potential out-comes

Y0iq 4Di1Zi5 mdash Xi1mdashi (19a)

Moreover conditional on observed covariates the candidateinstrument is independent of the unobserved covariate

Ziq mdashi

mdash Xi1 (19b)

though mdashi and Di are not conditionally independent The CEFfor Y0i is constrained to be nonnegative using an exponentialmodel and (19a)

E6Y0imdashDi1 Zi1 Xi1mdashi7 D exp4X 0

isbquo C mdashi5

D mdashuumli exp4X 0

isbquo51 (19c)

where we also assume that the unobservable covariate hasbeen de ned so that E6mdash uuml

imdashXi7 D 1 (this is a normalization

because we can de ne mdash D ƒ16ldquo ƒ ln4E6eldquo mdashX757 where ldquo isunrestricted)

Finally the conditional-on-X-and-mdash average treatmenteffect is assumed to be proportionally constant again using anexponential model that ensures nonnegative tted values

E6Y1imdashDi1Zi1Xi1mdashi7 D e E6Y0i

mdashDi1Zi1Xi1mdashi7

D e E6Y0imdashXi1 mdashi70 (19d)

Combining (19c) and (19d) we can write YiD exp4X 0

isbquo CDi

C mdashi5 C ˜i where E6˜imdashDi1 Zi1 Xi1 mdashi7 sup2 0 These

assumptions imply

E8exp4ƒX 0isbquo ƒ Di5Yi

ƒ 1mdashXi1Zi9 D 01 (20)

so (20) can be used for estimation provided Zi has an impacton Di The proportional average treatment effect in this modelis e ƒ 1 or approximately for small values of

Estimation based on (20) guarantees nonnegative ttedvalues without dropping zeros as a traditional log-linearregression model would The price for this is a constant-proportional-effects setup and the need for nonlinear estima-tion It is interesting to note however that with a binaryinstrument and no covariates (20) generates a simple closed-form solution for In the appendix I show that with-out covariates the proportional treatment effect in Mullahyrsquosmodel can be written

e ƒ1D E6YimdashZi

D17ƒE6YimdashZi

D07

ƒ8E641ƒDi5YimdashZi

D17ƒE641ƒDi5YimdashZi

D0790

(21)

In light of this simpli cation it seems worth asking if theright side of (21) has an interpretation that is not tied tothe constant-proportional-effects model To develop this inter-pretation let D0i and D1i denote potential treatment assign-ments indexed against the binary instrument For example an

assignment mechanism such as (15) determines D0i as fol-lows D0i

D 16ƒ0 gt Daggeri7 and D1iD 16ƒ0 C ƒ1 gt Daggeri7 Using this

notation Imbens and Angrist (1994) showed that

E6YimdashZi

D 17 ƒ E6YimdashZi

D 07 D E6Y1iƒ Y0i

mdashD1i gt D0i7

8E6DimdashZi

D 17ƒ E6DimdashZi

D 0790 (22)

The term E6Y1iƒ Y0i

mdashD1i gt D0i7 is the LATE parameter men-tioned in Lemma 1

The same argument used to establish (22) can also be usedto show a similar result for the average of Y0i [ie insteadof the average of Y1i

ƒ Y0i see Abadie (2000a) for details] Inparticular

E641 ƒ Di5YimdashZi

D 17 ƒ E641 ƒ Di5YimdashZi

D 07

D ƒE6Y0imdashD1i gt D0i7

8E6DimdashZi

D 17 ƒ E6DimdashZi

D 0790 (23)

Substituting (22) and (23) for the numerator and denominatorin (21) we have

e ƒ 1 D E6Y1iƒ Y0i

mdashD1i gt D0i7=E6Y0imdashD1i gt D0i70 (24)

Thus Mullahyrsquos procedure estimates a proportional LATEparameter in models with no covariates The resulting esti-mates therefore have a causal interpretation under muchweaker assumptions than (19a)ndash(19d) Moreover the exponen-tial model used for covariates in (19c) seems natural for non-negative dependent variables and has a semiparametric avorsimilar to proportional hazard models for duration data

42 Approximating Causal Models

Suppose that the additive constant-effects assump-tions (17a) and (17b) do not hold but we estimate (18) by2SLS anyway It seems reasonable to imagine that the result-ing 2SLS estimates can be interpreted as providing some sortof ldquobest linear approximationrdquo to an underlying nonlinearcausal relationship just as regression provides the best linearpredictor (BLP) for any CEF Perhaps surprisingly however2SLS does not provide this sort of linear approximation ingeneral On the other hand in a recent article Abadie (2000b)introduced a Causal-IV estimator that does have this property

Causal-IV is based on the assumptions used by Imbens andAngrist (1994) to estimate average treatment effects Underthese assumptions it can be shown that treatment is indepen-dent of potential outcomes conditional on being in the groupwhose treatment status is affected by the instrument (ie thosewith D1i gt D0i the group of ldquocompliersrdquo mentioned earlier)This independence can be expressed as

Y0i1 Y1iq Di

mdashXi1D1i gt D0i0 (25)

A consequence of (25) is that for compliers comparisons bytreatment status have a causal interpretation

E6YimdashXi1Di

D 11D1i gt D0i7 ƒ E6YimdashXi1Di

D 01 D1i gt D0i7

D E6Y1iƒ Y0i

mdashXi1D1i gt D0i70

10 Journal of Business amp Economic Statistics January 2001

For this reason Abadie (2000b) called E6YimdashXi1 Di1D1i gt D0i7

the Complier Causal Response Function (CCRF)Now consider choosing parameters b and a to minimize

E64E6YimdashXi1Di1 D1i gt D0i7ƒX 0

i b ƒ aDi52mdashD1i gt D07 or equiv-

alently E64YiƒX 0

ib ƒ aDi52mdashD1i gt D07 This choice of b and a

provides the minimum mean squared error (MMSE) approx-imation to the CCRF Since the set of compliers is not iden-ti ed this minimization problem is not feasible as writtenHowever it can be shown that

E6Ši4E6YimdashXi1 Di1D1i gt D0i7 ƒ X 0

ib ƒ aDi527=P6D1i gt D0i7

D E64E6YimdashXi1Di1 D1i gt D0i7ƒ X 0

ib ƒ aDi52mdashD1i gt D071

where ŠiD 1 ƒ Di41 ƒ Zi5=41 ƒ E6Zi

mdashXi75 ƒ 41 ƒ Di5 šZi=E6Zi

mdashXi7 Since Ši can be estimated the MMSE linearapproximation to the CCRF can also be estimated The resultis a weighted least squares estimation problem with weightsgiven by the estimated Ši

It is also worth noting that although the preceding discus-sion focuses on linear approximation of the CCRF any func-tion can be used for the approximation For binary outcomesfor example we might use ecirc6X 0

ib C aDi7 and choose param-eters to minimize E6Ši4Yi

ƒ ecirc6X 0ib ƒ aDi75

27 Similarly fornonnegative outcomes it seems sensible to use an exponentialmodel exp6X 0

ib C aDi7 and choose parameters to minimizeE6Ši4Yi

ƒexp6Xi0b ƒaDi75

27 Abadiersquos framework allows ex-ible approximation of the CCRF using any functional formthe researcher nds appealing and convenient The resultingestimates have a robust causal interpretation regardless of theshape of the actual CEF for potential outcomes

43 Distribution and Quantile Treatment Effects

If Yi has a mass point at 0 the conditional mean pro-vides an incomplete picture of the causal impact of Di on Yi We might like to know for example how much of the aver-age effect is due to changes in participation and how muchinvolves changes elsewhere in the distribution This sometimesmotivates separate analyses of participation and conditional-on-positive effects In Section 23 I argued that questionsregarding the effect of treatment on the distribution of out-comes are better addressed by comparing distributions Simplycomparing distributions is ne for the analysis of experimentaldata but what if covariates are involved As with the analy-sis of mean outcomes the simplest strategy is 2SLS in thiscase using linear probability models for distribution ordinates16Yi micro c7 D Xi

0sbquocC cDi

C ˜ci Of course the linear model isnot literally correct for conditional distribution except in spe-cial cases (eg a saturated regression parameterization)

Here too the Abadie (2000b) weighting scheme can be usedto generate estimates that provide an MMSE error approxima-tion to the underlying distribution function (see Imbens andRubin 1997 for a related approach to this problem) The esti-mator in this case chooses bc and ac to minimize the sampleanalog of the following population minimand

E6Ši416Yi micro c7 ƒ X 0ibc

ƒ acDi5270 (26)

The resulting estimates provide the BLP for P6Yi microcmdashXi1Di1D1i gt D0i7 The latter quantity has a causal interpre-tation because

P6Yi micro cmdashXi1DiD 11D1i gt D0i7

ƒ P6Yi micro cmdashXi1 DiD 01 D1i gt D0i7

D P6Y1i micro cmdashXi1D1i gt D0i7

ƒ P6Y0i micro cmdashXi1D1i gt D0i70

Since the outcome is binary nonlinear models such asprobit or logit might also be used to approximate P6Yi microcmdashXi1Di1D1i gt D0i7 Likewise it is equally straightforward touse Abadiersquos weighting scheme to approximate the probabilitythat the outcome falls into an interval instead of the cumula-tive distribution function

An alternative to estimation based on (26) postulates a linearmodel for quantiles instead of distribution ordinates Conven-tional quantile regression (QR) models for exogenous regres-sors begin with a linear speci cation Qˆ6Yi

mdashXi1Di7 D X 0iŒˆ0

CŒˆ1Di The parameters 4Œˆ01Œˆ15 can be shown to minimizeE6ˆ4Yi

ƒ X 0im0 ƒ m1Di057 where ˆ4ui5 D ˆuC

iC 41ƒˆ5uƒ

i iscalled the ldquocheck functionrdquo (see Koenker and Basset 1978)This minimization is computationally straightforward since itcan be written as a linear programming problem

The analysis of quantiles has two advantages Firstquantiles like the median quartiles and deciles providebenchmarks that can be used to summarize and compareconditional distributions for different outcomes In contrastthe choice of c for the analysis of distribution ordinatesis application-specic Second since nonnegative LDVrsquos areoften (virtually) continuously distributed away from any masspoints linear models are likely to be more accurate for condi-tional quantiles than for conditional probabilities [For quan-tiles close to the censoring point Powellrsquos (1986b) censoredQR model may be more appropriate]

Abadie Angrist and Imbens (1998) developed a QRestimator for models with a binary endogeneous regres-sor Their quantile treatment effects (QTE) procedure beginswith a linear model for conditional quantiles for compliersQˆ6Yi

mdashXi1Di1 D1i gt D0i7 D X 0isbquoˆ

CˆDi The coef cient ˆ hasa causal interpretation because Di is independent of potentialoutcomes conditional on Xi and the event D1i gt D0i There-fore ˆ

D Qˆ6Y1imdashXi1 D1i gt D0i7 ƒ Qˆ6Y0i

mdashXi1 D1i gt D0i7 Inother words ˆ is the difference in ˆ quantiles for compliers

QTE parameters are estimated by minimizing a sampleanalog of the following weighted check-function minimandE6Šiˆ4Yi

ƒ X 0ib ƒ aDi57 As with the Causal-IV estimators

weighting by Ši transforms the conventional QR minimandinto a problem for compliers only For computational rea-sons however it is useful to rewrite this as E6 QŠiˆ4Yi

ƒX 0ib ƒ

aDi057 where QŠiD E6Ši

mdashXi1 Di1 Yi7 It is possible to show thatE6Ši

mdashXi1Di1 Yi7 D P6D1i gt D0imdashXi1Di1 Yi7 gt 0 This modi ed

estimation problem has a linear programming representationsimilar to conventional quantile regression since the weightsare positive Thus QTE estimates can be computed usingexisting QR software though this approach requires rst-step

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 11

estimation of QŠi In the following example I use the fact that

QŠiD E6Ši

mdashXi1 Di1 Yi7 D 1 ƒ Di41ƒ E6ZimdashYi1Di1Xi75

41ƒ E6ZimdashXi75

ƒ 41 ƒ Di5E6ZimdashYi Di1Xi7

E6ZimdashXi7

(27)

and estimate E6ZimdashYi1Di1Xi7 and E6Zi

mdashXi7 with a probit rststep Since QŠi is theoretically supposed to be positive negativeestimates of QŠi are set to 0

5 APPLICATION LABORndashSUPPLYCONSEQUENCES OF A THIRD CHILD

The estimation uses a sample of roughly 250000 marriedwomen aged 21ndash35 with at least two children drawn from the1980 Census 5 le About 53 of the women in this sampleworked in 1979 Overall (ie including zeros) women in thesample worked about 17 hours per week This can be seen inthe rst column of Table 1 which reports descriptive statis-tics and repeats some of the OLS and 2SLS estimates fromAngrist and Evans (1998) Roughly 38 of women in thissample had a third child an event indicated by the variableMorekids The OLS estimates in column (2) show that womenwith Morekids D 1 were about 17 percentage points less likelyto have worked in 1979 and worked about 6 hours fewer perweek than women with Morekids D 0 The covariates in thisregression are age age at rst birth a dummy for male rst-born a dummy for male secondborn and Black Hispanic andother race indicators

Table 1 also reports estimates of average effects computedusing nonlinear models still treating Morekids as exogenousThese average effects are approximations to effects of treat-ment on the treated evaluated using derivatives to simplifycomputations A detailed description of the average-effectscalculations appears in the appendix Probit estimates of theaverage impact on employment shown in column (3) arealmost identical to the OLS estimates Similarly the tobit esti-mate of the average effect of Di on hours worked shown incolumn (4) is ƒ6001 remarkably close to the OLS estimateof ƒ6002 (Note however that the tobit coefcient is ƒ1107)Column (5) of Table 1 reports estimated average effects froma two-part model in which both parts of the model are linear

Table 1 Descriptive Statistics and Baseline Results

Morekids exogenous Morekids endogenous

Nonlinear models Reduced forms2SLS

Dependent Mean OLS Probit Tobit 2PM (Morekids) (Depvar) effectvariable (1) (2) (3) (4) (5) (6) (7) (8)

Employment 0528 ƒ0167 ƒ0166 mdash mdash 0627 ƒ0055 ƒ0088(0499) (0002) (0002) (0003) (0011) (0017)

Hours worked 1607 ƒ6002 mdash ƒ6001 ƒ5097 0627 ƒ2023 ƒ3055(1803) (0074) (0073) (0073) (0003) (0371) (0592)

NOTE The sample includes 254654 observations and is the same as that of Angrist and Evans (1998) The instrument is an indicator for multiple births The mean of the endogenousregressor is 381 The probability of a multiple birth is 008 The model includes as covariates age age at rst birth boy rst boy second and race indicators Standard deviations are shownin parentheses in column 1 Standard errors are shown in parentheses in other columns

The 2PM estimate is virtually identical to the tobit and OLSestimates

Roughly 810 of 1 of women in the extract had a twinsecond birth (multiple births are identi ed in the 1980 Cen-sus using age and quarter of birth) Reduced-form estimates ofthe effect of a twin birth are reported in columns (6) and (7)The reduced forms show that women who had a multiple birthwere 63 percentage points more likely to have had a third childthan women who had a singleton second birth Mothers oftwins were also 55 percentage points less likely to be work-ing (standard error D 001) and worked 22 fewer hours perweek (standard error D 037) The 2SLS estimates derived fromthese reduced forms reported in column 8 show an impactof about ƒ009 (standard error D 002) on employment rates andƒ306 (standard error D 06) on weekly hours These estimatesare just over half as large as the OLS estimates suggestingthat the latter exaggerate the causal effects of childbearingOf course the twins instrument is not perfect and the 2SLSestimates may also be biased For example twinning probabil-ities are slightly higher for certain demographic groups ButAngrist and Evans (1998) found that 2SLS estimates usingtwins instruments are largely insensitive to the inclusion ofcontrols for mothersrsquo personal characteristics

Two variations on linear models generate estimates iden-tical or almost identical to the conventional 2SLS estimatesThis can be seen in columns (2) and (3) of Table 2 whichreport 2SLS estimates of a 2PM for hours worked and Abadie(2000b) Causal-IV estimates of linear models for employmentand hours The Causal-IV estimates use probit to estimateE[Z|X] and plug this into the formula for Ši The second stepin Causal-IV estimation is a weighted least squares problemwith some negative weights Since use of negative weights isnonstandard for statistical packages (eg Stata does not cur-rently allow this) I used a MATLAB program available fromAlberto Abadie to compute the estimates The 2PM estimatesin Table 3 were constructed from 2SLS estimates of a linearprobability model for participation and 2SLS estimates of alinear model for hours worked conditional on working In prin-ciple the 2PM estimates do not have a causal interpretationbecause the instruments are not valid conditional on workingIn practice however 2SLS estimates of the 2PM differ littlefrom conventional 2SLS estimates

The estimates of nonlinear=nonstructural models for meaneffects are mostly similar to each other and to conventional

12 Journal of Business amp Economic Statistics January 2001

Table 2 Impact on Mean Outcomes (Morekids endogenous)

Linear models Nonlinear models Structural models

Causal-IV Causal-IV Causal-IV Bivar Endog Mills 2SLSDependent 2SLS 2PM linear Mullahy probit expon probit tobit ratio benchmarkvariable (1) (2) (3) (4) (5) (6) (7) (8) (9) (10)

A With covariates

Employment ƒ0088 mdash ƒ0089 mdash ƒ0088 mdash ƒ0124 mdash mdash ƒ0089(0017) (0017) (0016) (0016) (0017)

Hours worked ƒ3055 ƒ3054 ƒ3055 ƒ3082 mdash ƒ3021 mdash ƒ3081 ƒ4051 ƒ3060(0592) (0598) (0592) (0598) (0694) (0580) (0549) (0599)

B No covariates

Employment ƒ0084 mdash ƒ0084 mdash ƒ0084 mdash ƒ0086 mdash mdash ƒ0084(0017) (0017) (0017) (0017) (0018)

Hours worked ƒ3047 ƒ3037 ƒ3047 ƒ3010 mdash ƒ3012 mdash ƒ3035 ƒ3048 ƒ3052(0617) (0614) (0617) (0561) (0616) (0642) (0641) (0624)

NOTE Sample and covariates are the same as in Table 1 Results for nonlinear models are derivative-based approximations to effect on the treated Causal-IV estimates are based on aprocedure discussed by Abadie (2000b) Standard errors are shown in parentheses

2SLS estimates Results from nonlinear models are againreported as marginal effects that approximate average effectson the treated For example a causal probit model for employ-ment status generates an average effect of ƒ0088 identical (upto the reported accuracy) to the 2SLS estimate Causal-IV esti-mation of an exponential model for hours worked the resultof a procedure that minimizes E6 QŠi4Yi

ƒ exp6X0

i b ƒ aDi7527

generates an estimate of ƒ3021 This too differs little from theconventional 2SLS estimate of ƒ3055 Similarly the Mullahyestimate of ƒ3082 in column (4) is less than 8 larger thanconventional 2SLS in absolute value It is noteworthy how-ever that the Mullahy model generates results that changemarkedly (falling by about 20) when the covariates aredropped Since the covariates are not highly correlated withthe twins instrument this lack of robustness to the inclusionof covariates seems undesirable On the other hand with-

Table 3 Impact on the Distribution of Hours Worked

Distribution treatment effects

Exogenous Morekids Endogenous Morekids Quantile treatment effects

OLS Ordered Causal CausalRange (LPM) Probit probit 2SLS LPM probit Quantile QR QTE(mean) (1) (2) (3) (4) (5) (6) (value) (7) (8)

0 0167 0166 0147 0088 0089 0088 5 ƒ8092 ƒ5024(472) (0002) (0002) (0002) (0017) (0017) (0016) (8) (0186) (0686)1ndash10 0001 0001 0002 ƒ0001 ƒ0001 ƒ0002 6 ƒ1207 ƒ7098(046) (0001) (0001) (000003) (0007) (0007) (0006) (20) (0172) (0860)11ndash20 ƒ0015 ƒ0014 ƒ0011 0002 0002 0002 7 ƒ9054 ƒ6019(093) (0001) (0001) (00001) (0010) (0010) (0010) (35) (0184) (1007)21ndash30 ƒ0024 ƒ0022 ƒ0014 ƒ0004 ƒ0005 ƒ0006 75 ƒ6045 ƒ3060(075) (0001) (0001) (00002) (0009) (0009) (0010) (40) (0156) (1017)31ndash40 ƒ0119 ƒ0110 ƒ0097 ƒ0072 ƒ0072 ƒ0071 8 ƒ1000 0000(277) (0002) (0002) (0001) (0014) (0014) (0016) (40) (0286) (1009)41+ ƒ0009 ƒ0008 ƒ0023 ƒ0009 ƒ0009 ƒ0006 9 mdash mdash

(027) (0001) (0001) (00003) (0005) (0005) (0007) (40)

NOTE The table reports probability-model and (QTE) estimates of the impact of childbearing on the distribution of hours worked The sample and covariates are the same as in Panel A ofTable 2 Causal-IV estimates are based on a procedure discussed by Abadie (1999) Quantile treatments are based on a procedure discussed by Abadie et al (1998) Standard errors areshown in parentheses The standard errors in columns (7) and (8) are bootstrapped

out covariates the Mullahy estimates are close to those fromCausal-IV with an exponential model

The bivariate probit estimate of the effect of childbear-ing on employment status reported in column 7 is ƒ012roughly a third larger in absolute value than the conventional2SLS estimate Interestingly in another application Abadie(2000b) also found that bivariate probit estimates are largerthan Causal-IV The gap between bivariate probit and Causal-IV estimates of effects on labor supply appears to be a conse-quence of the probit model for exogenous covariates Withoutcovariates bivariate probit generates estimates that are veryclose to the results from the other estimators It should benoted however that bivariate probit is not really appropriatefor twins instruments because the probability Morekids D 1 isequal to 1 for twins The probit ML estimator does not existin this case Therefore to compute all of the estimates using a

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 13

probit rst-stage (bivariate probit endogenous tobit and Millsratio) I randomly recoded 1 of Di observations to 0 Inprinciple measurement error in a binary endogenous regressorbiases 2SLS estimates (see Kane Rouse and Staiger 1999)In this case however column (10) shows that 2SLS estimateswith the randomly recoded data differ little from 2SLS esti-mates using the original data

Column (8) of Table 2 reports estimates of a structuraltobit model with endogeneous regressors These estimateswere computed using a two-step estimator that approximatesthe MLE again with recoded data The two-step procedureadds a Mills-ratio type endogeneity correction to a censoredregression and then applies tobit to the censored regres-sion model with the correction term [The correction term islsquo˜6Di4ƒ i=ecirci5C 41ƒDi5 i=41ƒecirci57 where is the corre-lation between the latent error determining treatment assign-ment and the outcome residual lsquo˜ is the standard deviation ofthe outcome residual and i and ecirci are Normal density anddistribution functions evaluated at the probit rst-stage ttedvalues see Heckman and Robb (1985)]

As with the bivariate probit estimate the endogenous tobitestimate is somewhat larger in magnitude than conventional2SLS This may be because of the Mills-ratio procedure forcontrolling for the endogeneity of Di more than the tobitcorrection for nonnegative outcomes To see this note thatthe Mills-ratio estimate ignoring censoring reported in col-umn (9) is considerably larger than the corresponding conven-tional 2SLS estimates Interestingly both the probit and tobitstructural estimators generate results that are more sensitiveto the inclusion of covariates than any of the other estimatorsexcept Mullahyrsquos In fact Panel B of the table shows thatwithout covariates all estimation techniques give very simi-lar results This is not surprising since without covariates theparametric assumptions used in these models are weaker

The last set of results shows that childbearing is associ-ated with marked changes in the distribution of hours workedbeyond the changes in participation already seen This isapparent in the rst columns of Table 3 which report thedistribution of hours worked by interval along with linearprobability estimates of the relationship between childbearingand the probability of falling into each interval (these mod-els include the same covariates used for Table 2) The largestentry in column (1) is for the probability of working zerohours There is also a large negative effect on the probabilityof working 31ndash40 hours per week which shows that womenwho have a third child are much less likely to work full-time Once again probit average effects reported in column2 are almost indistinguishable from the corresponding OLSestimates Estimates from an ordered probit model reportedin column (3) differ from OLS somewhat more but still gen-erate a very similar pattern

Like the 2SLS estimates for average outcomes 2SLS esti-mates of linear probability models for the probability of fallinginto each interval show that models that treat childbearing asexogenous exaggerate the negative impact on labor supply2SLS estimates for the probability of working zero hours areidentical (by construction) to those for employment in Table 1The 2SLS estimates of the impact of childbearing on fulltimework are also considerably less than the corresponding OLSestimates

Nonstructural Causal-IV models treating Morekids asendogenous generate estimates very close to 2SLS estimatesfor effects on the probabilities of hours falling into each inter-val Columns (5) and (6) show that the results are also remark-ably insensitive to whether a linear or probit model is usedto approximate the distribution function The estimates againindicate that childbearing changes the distribution of hours byraising the probability of nonparticipation and by reducing theprobability of fulltime work

Finally the QTE estimator provides useful summary statis-tics for causal effects of childbearing on changes in the distri-bution of hours worked These estimates were computed as thesolution to a weighted quantile regression problem using (27)to construct weights and the reported standard errors are froma bootstrap Quantile regression estimates treating childbear-ing as exogenous show an estimated 9-hour decline in medianhours worked but the QTE estimator suggests that the causaleffect of childbearing on median hours worked is only about ve hours Estimates at higher quantiles are similarly reducedwhen childbearing is treated as endogenous

6 SUMMARY AND CONCLUSIONS

Structural parameters may be of theoretical interest but mustultimately be converted into causal effects if they are to beof use for policy evaluation or determining whether a trendassociation is causal The problem of estimating causal effectsfor LDVrsquos does not differ fundamentally from the analogousproblem for continuously distributed outcomes The key dif-ferences seem to me to be the increased likelihood of inter-est in distributional outcomes and the inherent nonlinearity ofCEFrsquos for LDVrsquos in models with covariates Without covari-ates conventional 2SLS estimates capture both distributionaleffects and effects on means Simple IV strategies devel-oped by Mullahy (1997) and Abadie (2000b) can be used toestimate average effects in nonlinear models with covariateswhile IV strategies for probability models and quantile regres-sion can be used to estimate effects on distributions

These approaches are illustrated here using twin birthsto estimate the labor-supply consequences of childbearingAlternative nonstructural approaches to the estimation ofcausal effects using twins instruments generate similar aver-age effects whether or not the model is nonlinear Structuralestimates tend to be somewhat larger than nonstructural esti-mates when exogenous covariates are included even thoughthe covariates are not strongly related to the twins instru-ment Since the structural models impose additional distribu-tional and functional form assumptions I see no reason toprefer them

Finally the various IV estimates of the effect of childbear-ing on the distribution of hours worked show that the impactof childbearing is characterized by substantially increased non-participation and by an almost equally large shift away fromfulltime work Estimates that treat childbearing as exogenousexaggerate the causal effect of childbearing on changes in dis-tribution as well as on average hours worked These ndingsappear in results from both probability models and quantilemodels with endogenous regressors Both types of modelsprovide an interesting look at the impact of childbearing on

14 Journal of Business amp Economic Statistics January 2001

the distribution of hours worked without the conceptual prob-lems inherent in conditional-on-positive comparisons

ACKNOWLEDGMENTS

I thank the editor and the session discussants for theircomments Special thanks go to Melissa Schettini for out-standing research assistance and Alberto Abadie for exten-sive comments and shared computer programs Thanks also goto Daron Acemoglu Bill Evans Bill Greene Guido ImbensJohn Mullahy Steve Pischke and seminar participants at theUniversity of Texas for helpful discussions and comments onan earlier draft Data and computer programs used here areavailable on request

APPENDIX

A1 Derivation of Equation (12)

Drop the i subscripts Note that

Y0 D 14Y uuml0 gt 05Y uuml

0

Y1D 14Y uuml

0C gt 054Y uuml

0C 5 D 14Y uuml

1 gt 05Y uuml1 0

Since D is independent of Y0 and Y1D 14Y uuml

0C gt 054Y uuml

0C

51D is independent of Y1 Conditional effects on the treatedare therefore the same as conditional effects without condi-tioning on treatment status

E6Y1ƒ Y0

mdashY1 gt 01 D D 17 D E6Y1ƒ Y0

mdashY1 gt 07

E6Y1ƒ Y0

mdashY0 gt 01 D D 17 D E6Y1ƒ Y0

mdashY0 gt 070

The averages on the right side are the causal effects of interestThe conditional expectations on the right side are evaluated asfollows

E6Y1mdashY1 gt 07 D E6Y uuml

1mdashY uuml

1 gt 07 D E6Y uuml0mdashY uuml

1 gt 07C 1 (A1)

E6Y0mdashY1 gt 07 D E6Y uuml0 14Y uuml

0 gt 05mdashY uuml1 gt 07

D E6Y uuml0

mdashY uuml1 gt 01 Y uuml

0 gt 07P4Y uuml0 gt 0mdashY uuml

1 gt 050

If 4 gt 05 2 Y uuml0 gt 0 ) Y uuml

1 gt 01 this is E6Y uuml0

mdashY uuml0 gt 07

P4Y uuml0 gt 0mdashY uuml

1 gt 050 If 4 lt 05 2 Y uuml1 gt 0 ) Y uuml

0 gt 01 so this is

E6Y uuml0

mdashY uuml1 gt 070 (A2)

So lt 0 ) E6Y1 ƒ Y0mdashY1 gt 07 D Similarly

E6Y1mdashY0 gt 07 D E614Y uuml

0C gt 054Y uuml

0C 5mdashY uuml

0 gt 07

D E6Y uuml0

mdashY uuml1 gt 01 Y uuml

0 gt 07P4Y uuml1 gt 0mdashY uuml

0 gt 05

C P Y uuml1 gt 0mdashY uuml

0 gt 0 0

If 4 gt 051 Y uuml0 gt 0 ) Y uuml

1 gt 01 so

P4Y uuml1 gt 0mdashY uuml

0 gt 05 D 1 (A3)

and E6Y uuml0

mdashY uuml1 gt 01 Y uuml

0 gt 07 D E6Y uuml0mdashY uuml

0 gt 070 Finally

E6Y0mdashY0 gt 07 D E6Y uuml0mdashY uuml

0 gt 07 (A4)

so gt 0 ) E6Y1ƒ Y0

mdashY0 gt 07 D

A2 Derivation of Equation (21)

Drop the i subscripts Note that eƒD D 641 ƒ D5 C Deƒ7Let sbquo uuml D eƒsbquo and uuml D eƒ Z is binary and there are nocovariates so we can now write (20) as

sbquo uuml E641ƒ D5Y mdashZ D 17C sbquo uuml uuml E6DY mdashZ D 17 ƒ 1 D 0 (A5)

and

sbquo uuml E641ƒ D5Y mdashZ D 07C sbquo uuml uuml E6DY mdashZ D 07 ƒ 1 D 00 (A6)

Divided (A5) by (A6) to get rid of sbquo uuml then solve for uuml Subtract 1 and rearrange to get Equation (21)

A3 Average Treatment Effects and Standard Errors forNonlinear Models

Average treatment effects were calculated with the aid ofderivative approximations so that all reported effects have theform ldquocoef cient times scaling factorrdquo

Probit Note that ecirc6X 0i sbquo C 7 ƒ ecirc6X 0

isbquo7 6X 0isbquo C Di7 š

1 so the average effect on the treated can be approximatedas 841=N15

Pi Di6X 0

isbquo C Di79 š 1 where N1 D Pi Di This

turns out to be accurate to three decimal places for the esti-mates in Table 1 Standard errors were calculated treating thescaling factor as nonrandom This follows the convention forreporting marginal effects in programs like Stata in practiceany correction for estimation of the scaling factor is likely tobe minor A similar approach was used for ordered probit

Tobit Tobit average treatment effects were approximatedusing a derivative formula that can be found for exam-ple in the work of Greene (1999) E6Yi

mdashXi1DiD 17 ƒ

E6YimdashXi1 Di

D 07 iexclE6YimdashXi1Di7=iexclD D ecirc6X 0

isbquo C Di7 š 0

Average effects on the treated can therefore be approximatedusing 841=N15

Pi Diecirc6X 0

isbquo C Di79 š 0 Standard errors wereagain calculated treating the scaling factor as nonrandom

2PM Let the part-1 coef cient be 1 and the part-2 coef -cient be 2 Since the model multiplies parts and both parts arelinear the average effect is approximated using derivatives as1E6Yi

mdashDiD 11 Yi gt 07 C 2P6Di

D 1mdashY gt 070 Standard errorswere calculated treating the scaling factors E6Y mdashDi

D 11 Yi gt 07

and P6DiD 1mdashY gt 07 as nonrandom and using the fact that

the estimates of 1 and 2 are uncorrelatedMullahy Note that E6Yi

mdashXi1Di1mdash uumli 7 D mdash uuml

i exp6X 0i sbquoC Di7

where this CEF has a causal interpretation Again usingderivatives we have mdashuuml

i exp6X 0isbquo C 7 ƒ mdashuuml

i exp6X 0isbquo7

mdash uumli exp6X 0

i sbquo C Di7 š The model is such that E6mdash uumlimdashXi7

equals 1 but E6mdash uumlimdashXi1Di7 is unrestricted I ignore this prob-

lem and approximate the average effect on the treated as841=N15

Pi Di exp6X 0

i sbquo C Di79 š Standard errors were cal-culated treating the scaling factor as nonrandom

Bivariate Probit This is the same as probit using param-eters from the latent index equation for outcomes

Endogenous Tobit This is the same as tobit but usingcoef cients and predicted probability positive from the modelwith the compound Mills-ratio term included

Mills Ratio Standard errors were calculated treating thecompound Mills-ratio term as known

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 15

Causal-IV (nonlinear) Average effects were calculated asdescribed previously for the probit and Mullahy (exponential)functional form The rst-stage estimates of E6Zi

mdashXi7 neededto construct Ši were estimated using probit Standard errorsfor were calculated using the same bootstrap proceduredescribed for QTE Abadie (2000b) also gave analytic formu-las that take account of the rst-step estimation of E6Zi

mdashXi7An assumption implicit in this scheme for reporting Causal-IV results is that it makes sense to convert conditional-on-Xeffects for compliers into overall average effects on the treated

A4 Computation of Quantile Treatment Effectsand Standard Errors

QTErsquos were computed by plugging rst-step estimatesof QŠi

D E6ŠimdashXi1Di1 Yi7 into a weighted quantile regression

calculation performed by Stata Nonnegative estimates ofE6Ši

mdashXi1 Di1 Yi7 were constructed by separately estimatingE6Zi

mdashXi7 and E6ZimdashYi1Di1Xi7 using probit and then trimming

In principle standard errors should take account of this rst-step estimation An additional complication is that the analyticstandard errors for QTE involve a conditional error density Inthis case I sidestepped messy analytic calculations by usinga bootstrap procedure that repeats both the rst-stage estima-tion of QŠi and the second-step estimation of the parameters ofinterest in 100 replicate samples of 2500 observations eachThe 100 replicate samples were sampled without replacementusing the Stata command bsample The reported standarderrors were calculated as 4N =N uuml 51=2bseN where bseN is thestandard deviation of the 100 replicate estimates N D 21500and N uuml is the full sample size

[Received October 1999 Revised July 2000]

REFERENCES

Abadie A (2000a) ldquoBootstrap Tests for the Effect of a Treatment on the Dis-tributions of an Outcome Variablerdquo Technical Working Paper 261 NationalBureau of Economic Research Cambridge MA

mdashmdashmdash (2000b) ldquoSemiparametric Estimation of Instrumental Variable Mod-els for Causal Effectsrdquo Technical Working Paper 260 National Bureau ofEconomic Research Cambridge MA

Abadie A Angrist J and Imbens G (1998) ldquo Instrumental VariablesEstimation of Quantile Treatment Effectsrdquo Working Paper 229 NationalBureau of Economic Research Cambridge MA

Amemiya T (1978) ldquoThe Estimation of a Simultaneous Equation General-ized Probit Modelrdquo Econometrica 46 1193ndash1205

mdashmdashmdash (1979) ldquoThe Estimation of a Simultaneous Equation Tobit ModelrdquoInternational Economic Review 20 169ndash181

Angrist J (1998) ldquoEstimating the Labor Market Impact of Voluntary MilitaryService Using Social Security Data on Military Applicationsrdquo Economet-rica 66 249ndash288

Angrist J and Evans W (1998) ldquoChildren and Their Parentsrdquo Labor SupplyEvidence From Exogenous Variation in Family Sizerdquo American EconomicReview 88 450ndash477

Angrist J and Imbens G (1991) ldquoSources of Identifying Information inEvaluation Modelsrdquo Technical Working Paper 117 National Bureau ofEconomic Research Cambridge MA

Angrist J Imbens G and Rubin B (1996) ldquo Identi cation of CausalEffects Using Instrumental Variablesrdquo Journal of the American StatisticalAssociation 91 444ndash455

Angrist J and Krueger A (1999) ldquoEmpirical Strategies in Labor Eco-nomicsrdquo in The Handbook of Labor Economics Volume IIIA edsO Ashenfelter and D Card Amsterdam North-Holland pp 1277ndash1366

Blundell R W and Smith R J (1989) ldquoEstimation in a Class of Simulta-neous Equation Limited Dependent Variable Modelsrdquo Review of EconomicStudies 56 37ndash58

Blundell R W and Powell J L (1999) ldquoEndogeneity in Single Index Mod-elsrdquo mimeo University College London Dept of Economics

Bronars S and Grogger J (1994) ldquoThe Economic Consequences of UnwedMotherhood Using Twin Births as a Natural Experimentrdquo American Eco-nomic Review 84 1141ndash1156

Chamberlain G (1986) ldquoAsymptotic Ef ciency in Semi-Parametric ModelsWith Censoringrdquo Journal of Econometrics 32 189ndash218

Cragg J G (1971) ldquoSome Statistical Models for Limited Dependent Vari-ables With Application to the Demand for Durable Goodsrdquo Econometrica39 829ndash844

Duan N Manning W G Jr Morris C N and Newhouse J P(1984) ldquoChoosing Between the Sample-Selection Model and the Multi-partModelrdquo Journal of Business amp Economic Statistics 2 283ndash289

Eichner M J McClellan M B and Wise D A (1997) ldquoHealth Expendi-ture Persistence and Feasibility of Medical Savings Accountsrdquo in Tax Pol-icy and the Economy 11 ed J Poterba Cambridge MA National Bureauof Economic Research pp 91ndash128

Evans W Farrelly M C and Montgomery E (1999) ldquoDo Work-place Smoking Bans Reduce Smokingrdquo American Economic Review 89728ndash747

Gangadharan J and Rosenbloom J L (1996) ldquoThe Effects of Child-bearingon Married Womenrsquos Labor Supply and Earnings Using Twin Births as aNatural Experimentrdquo Working Paper 5647 National Bureau of EconomicResearch Cambridge MA

Goldberger A S (1991) A Course in Econometrics Cambridge MAHarvard University Press

Greene W (1999) ldquoMarginal Effects in the Censored Regression ModelrdquoEconomics Letters 64 43ndash49

Gronau R (1974) ldquoWage ComparisonsmdashA Selectivity Biasrdquo Journal ofPolitical Economy 82 1119ndash1143

Hay J W Leu R and Rohrer P (1987) ldquoOrdinary Least Squares andSample-Selection Models of Health-care Demandrdquo Journal of Business ampEconomic Statistics 5 499ndash506

Hay J W and Olsen R J (1984) ldquoLet Them Eat Cake A Note on Com-paring Alternative Models of the Demand for Medical Carerdquo Journal ofBusiness amp Economic Statistics 2 279ndash282

Heckman J J (1974) ldquoShadow Prices Market Wages and Labor SupplyrdquoEconometrica 42 679ndash694

mdashmdashmdash (1978) ldquoDummy Endogenous Variables in a Simultaneous EquationsSystemrdquo Econometrica 46 931ndash960

mdashmdashmdash (1990) ldquoVarieties of Selection Biasrdquo American Economic Review 80313ndash318

Heckman J J and Robb R Jr (1985) ldquoAlternative Methods for Evaluatingthe Impact of Interventionsrdquo in Longitudinal Analysis of Labor MarketData (Econometric Society Monographs Series No 10) eds J Heckmanand B Singer Cambridge UK Cambridge University Press pp 156ndash246

Imbens G and Angrist J D (1994) ldquo Identi cation and Estimation of LocalAverage Treatment Effectsrdquo Econometrica 62 467ndash475

Imbens G and Rubin D B (1997) ldquoEstimating Outcome Distributions forCompliers in Instrumental Variables Modelsrdquo Review of Economic Studies64 555ndash574

Kane T J Rouse C E and Staiger D (1999) ldquoEstimating Returnsto Schooling When Schooling Is Misreportedrdquo Working Paper W7235National Bureau of Economic Research Cambridge MA

Kelejian H H (1971) ldquoTwo-Stage Least Squares and Econometric SystemsLinear in Parameters but Nonlinear in the Endogenous Variablesrdquo Journalof the American Statistical Association 66 373ndash374

Koenker R and Bassett G (1978) ldquoRegression Quantilesrdquo Econometrica46 33ndash50

Killingsworth M (1983) Labor Supply Cambridge UK Cambridge Uni-versity Press

Keane M P and Wolpin K (1997) ldquo Introduction to the JBES Special Issueon Structural Estimation in Applied Microeconomicsrdquo Journal of Businessamp Economic Statistics 15 111ndash114

Lee M J (1995) ldquoSemiparametric Estimation of Simultaneous EquationsWith Limited Dependent Variables A Case Study of Female Labor Sup-plyrdquo Journal of Applied Econometrics 10 187ndash200

mdashmdashmdash (1996) ldquoNonparametric Two-Stage Estimation of Simultaneous Equa-tions With Limited Endogenous Regressorsrdquo Econometric Theory 12305ndash330

Leung S F and Yu S (1996) ldquoOn the Choice Between Sample Selectionand Two-Part Modelsrdquo Journal of Econometrics 72 197ndash229

Lin T-F and Schmidt P (1984) ldquoA Test of the Tobit Speci cation Againstan Alternative Suggested by Craggrdquo Review of Economics and Statistics66 174ndash177

Maddala G S (1985) ldquoA Survey of the Literature on Selectivity Bias AsIt Pertains to Health Care Marketsrdquo Advances in Health Economics andHealth Service Research 6 3ndash18

16 Journal of Business amp Economic Statistics January 2001

REFERENCES

Manning W G Duan N and Rogers W H (1987) ldquoMonte Carlo Evidenceon the Choice Between Sample Selection and Two-part Modelsrdquo Journalof Econometrics 35 59ndash82

Manski C F (1975) ldquoMaximum Score Estimation of the Stochastic UtilityModelrdquo Journal of Econometrics 3 205ndash228

mdashmdashmdash (1996) ldquoLearning About Treatment Effects From Experiments WithRandom Assignment of Treatmentsrdquo Journal of Human Resources 31709ndash733

McDonald J F and Mof tt R A (1980) ldquoThe Uses of Tobit AnalysisrdquoThe Review of Economics and Statistics 62 318ndash321

Mof t R A (1999) ldquoNew Developments in Econometric Methods forLabor Market Analysisrdquo in The Handbook of Labor Economics Vol-ume IIIA eds O Ashenfelter and D Card Amsterdam North-Hollandpp 1367ndash1398

Mullahy J (1997) ldquo Instrumental-Variable Estimation of Count Data ModelsApplications to Models of Cigarette Smoking Behaviourrdquo Review of Eco-nomics and Statistics 11 586ndash593

mdashmdashmdash (1998) ldquoMuch Ado About Two Reconsidering Retransformation andthe Two-Part Model in Health Econometricsrdquo Journal of Health Eco-nomics 17 247ndash281

Newey W K (1985) ldquoSemiparametric Estimation of Limited DependentVariable Models With Endogenous Explanatory Variablesrdquo Annales deLrsquo Insee 5960 219ndash237

mdashmdashmdash (1986) ldquoLinear Instrumental Variable Estimation of Limited Depen-dent Variable Models With Endogenous Explanatory Variablesrdquo Journal ofEconometrics 32 127ndash141

mdashmdashmdash (1987) ldquoEf cient Estimation of Limited Dependent Variable ModelsWith Endogenous Explanatory Variablesrdquo Journal of Econometrics 36231ndash250

Powell J L (1986a) ldquoSymmetrically Trimmed Least Squares Estimation forTobit Modelsrdquo Econometrica 54 1435ndash1460

mdashmdashmdash (1986b) ldquoCensored Regression Quantilesrdquo Journal of Econometrics32 143ndash155

Rosenzweig M R and Wolpin K (1980) ldquoLife-Cycle Labor Supply andFertility Causal Inferences From Household Modelsrdquo Journal of PoliticalEconomy 88 328ndash348

Rubin D B (1974) ldquoEstimating Causal Effects of Treatments in Random-ized and Non-randomized Studiesrdquo Journal of Educational Psychology 66688ndash701

mdashmdashmdash (1977) ldquoAssignment to a Treatment Group on the Basis of a Covari-aterdquo Journal of Educational Statistics 2 1ndash26

Wooldridge J (1999) ldquoDistribution-Free Estimation of Some Nonlinear PanelData Modelsrdquo Journal of Econometrics 90 77ndash97

Comment Binary Regressors in NonlinearPanel-Data Models With Fixed EffectsJinyong Hahn

Department of Economics Brown University Providence RI 02912 ( jinyong hahnbrownedu)

Angrist notes in his abstract that ldquomuch of the dif cultywith limited dependent variables comes from a focus on struc-tural parameters such as index coef cients instead of causaleffects Once the object of estimation is taken to be the causaleffect of treatment several simple strategies are availablerdquo(p 2) I examine the consequence of such a perspective forinference in nonlinear panel-data models with xed effectsI argue that (a) the ldquodif cultyrdquo indeed disappears sometimesand (b) structure of treatment assignment plays a crucial rolefor his strategy to be successful

It is instructive to begin with a dif cult nonlinear panel-datamodel with xed effects Consider a panel probit model with xed effects

Pr6yitD 1mdashci1 xi11 xi27 D ecirc4ci

C ˆ xit51

i D 11 1 n3 t D 1120 (1)

It is assumed that yi1 and yi2 are independent of each othergiven 4ci1 xi11 xi25 Here ci denotes the unobserved xedeffects and xit denotes a binary treatment variable We willassume that 4xi11 xi25 D 40115 for all i The index coef cientof interest is ˆ Therefore traditional econometric analysiswould focus on whether ˆ is identi ed and or

pn-consistently

estimableEstimation of the index coef cient ˆ is dif cult for a num-

ber of reasons

1 So far no consistent estimator for ˆ has been developedfor probit with a nonparametric speci cation of the conditionaldistribution of ci given 4xi11 xi25

2 It is not even clear whether ˆ is identi ed or notManskirsquos (1987) identi cation result requires in nite supportfor xit which cannot be satis ed due to the binary nature ofxit

3 It is not clear whether the semiparametric informationbound for ˆ is positive or not It is quite possible that theinformation is actually 0 For example Chamberlain (1992)showed that the information for ˆ is 0 for models with timedummies

4 Chamberlainrsquos (1984) conditional maximum likelihoodestimator is applicable only to logit models

Now to assess whether changing the object of estimationsimpli es statistical analysis consider the average treatmenteffects which in this particular case can be easily shown tobe equal to

sbquo D E6yi2ƒ yi170 (2)

It is not dif cult to see that a simple estimator Osbquo D1=n

PniD14yi2

ƒ yi15 isp

n-consistent As Angrist argues afocus on average causal effects dramatically reduces the dif -culty of estimation The secret is that 4xi11 xi25 D 40115 for alli which effectively ensures that ci is independent of 4xi11 xi25Dif culty of the index estimation listed previously is because

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

Imbens Comment 17

the independence between 4xi11 xi25 and ci cannot be exploitedwithin the xed-effects framework Unfortunately the addi-tional information that 4xi11 xi25 and ci are independent of eachother does not reduce dif culty in estimating the index ˆ It isnot yet clear whether ˆ can be

pn-consistently estimated even

with the random-effects assumption when the random effectsare nonparametrically speci ed On the other hand estimatingsbquo becomes easier as a result of the constancy of the general-ized propensity score Pr 6xi11 xi2mdashci7 Presence of unobservedci in the model was rendered irrelevant due to the constancyof the generalized propensity score [See Imbens (1999) for adiscussion on the generalized propensity score]

It is interesting to note that in the panel probit model (1)estimation of sbquo is not necessarily simple unless the indexstructure is discarded altogether It is useful to note that thenew target parameter sbquo could be estimated consistently usingindex structure if consistent estimators of ˆ and not are givenHere not denotes the distribution of ci We may alternativelywrite (2) as

Z4ecirc4c C ˆ5 ƒ ecirc4c55 dnot4c51 (3)

which can in principle be estimated by using consistent esti-mators of ˆ and not Estimation of sbquo using the alternativecharacterization (3) requires consistent estimation of an addi-tional parameter not a parameter that was not given too muchattention in the past The problem is that not many consis-tent estimators of not are available It is not yet clear whether

the model satis es the primitive conditions for consistency ofthe nonparametric maximum likelihood estimator (NPMLE)as discussed by Heckman and Singer (1984) The dif culty inestimating the target parameter using the expression (3) whichis based on the index structure is in sharp contrast to the easeof the estimation strategy using the expression (2) for whichthe index structure is irrelevant

The preceding discussion suggests that the success ofAngristrsquos perspective critically hinges on the structure of treat-ment assignment and careful reexpression of the new targetparameter If the joint distribution of ci and 4xi11 xi25 is com-pletely unknown it is clear that changing the target parameterdoes not ease the dif culty of estimation Angristrsquos perspectivetherefore requires substantial effort in modeling such jointdistribution Whether such a modeling effort will be successfulin dealing with nonlinear panel problems remains to be seen

ADDITIONAL REFERENCES

Chamberlain G (1984) ldquoPanel Datardquo in Handbook of Econometricseds Z Griliches and M D Intriligator Amsterdam North-Hollandpp 1247ndash1318

mdashmdashmdash (1992) ldquoBinary Response Models for Panel Data Identi cation andInformationrdquo unpublished manuscript

Heckman J and Singer B (1984) ldquoA Method for Minimizing the Impactof Distributional Assumptions in Econometric Models for Duration DatardquoEconometrica 52 271ndash320

Imbens G (1999) ldquoThe Role of Propensity Score in Estimating Dose-Response Functionsrdquo Technical Working Paper 237 National Bureau ofEconomic Research Cambridge MA

Manski C (1987) ldquoSemiparametric Analysis of Random Effects Linear Mod-els from Binary Panel Datardquo Econometrica 55 357ndash362

CommentGuido W Imbens

Department of Economics University of California at Los Angeles Los Angeles CA 90095(imbenseconuclaedu)

It is a pleasure to comment on this article by JoshuaAngrist whose applications of instrumental-variables methods(Angrist 1989 Angrist and Krueger 1991) have been a sourceof inspiration for my own work in this area As with Angristrsquosprevious work on instrumental variables the current articleraises some controversial issues and makes a number of impor-tant points Here I offer some comments on three of themFirst I shall discuss the issues raised in Section 1 ldquoCausalEffects and Structural Parametersrdquo concerning the goals ofstatistical inference Angrist argues that many questions ofinterest are most easily formulated in terms of comparisonsbetween realized and potential outcomes the latter de nedas outcomes that would have been observed under alternativestates of nature I shall explore some of the implications of thisview for empirical practice and econometric theory SecondI shall offer some remarks on the role of economic theory inspeci cation and identi cation of econometric models againreinforcing Angristrsquos point regarding the importance of for-mulating the key assumptions in terms of potential outcomesThird I shall discuss some of the issues related to the limited

dependent nature of outcome variables for empirical practicein particular in the presence of covariates Partly motivatedby the widespread perception of fundamental dif culties inapplying instrumental-variables methods to data with limiteddependent outcome variables Angrist argues that standard lin-ear model techniques are generally applicable I agree withAngristrsquos position that most of these perceived problems areexaggerated but suggest that principled inference should nev-ertheless take account of the limited dependent nature of theoutcome variables and use nonlinear models

1 CAUSAL ESTIMANDS

In his textbook discussion of the difference betweenstructural and reduced-form estimates Goldberger (1997)wrote following Marshak (1953) that the ultimate goal of

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

18 Journal of Business amp Economic Statistics January 2001

econometrics is to provide predictions More speci cally inmy view the goal is to provide predictions of policy interven-tions Using both economic theory and data economists wishto inform policy discussions by providing predictions of statesof the world under different policy choices Based on com-parisons of such predictions policy makers can then chooseamong the different policies using some social welfare mea-sure as objective function (eg Heckman and Smith 1997)Angrist argues that such questions are most easily formulatedin terms of potential outcomes Here I want to elaborate onthat view

Consider as an example the problem faced by a policymaker contemplating a new tax in a market To evaluatethis policy the policy maker wishes to take into account theeffect of the tax on the quantity traded Economic theory sug-gests that this effect depends on the slope of the supply anddemand functions The rst step is therefore the estimation ofthese slopes and in the remainder of this discussion I shallfocus on this component of the policy-evaluation problem Inprinciple the policy maker may be interested in the entire dis-tribution of the quantity traded under various taxes Let usassume however that for purposes of evaluation of the poli-cies it is suf cient to know the average effect of the policyon the quantity traded If there are only two values for thepolicymdashfor example no tax or a taxmdashthe difference betweenthese two averages is the key quantity of interest FollowingRubin (1974) I will refer to this as the estimand

Note that the choice of estimand is distinct from the sta-tistical question of the speci cation of the model Often thestatistical model is speci ed in such a way that a single param-eter corresponds to the estimand For example in a structuralinterpretation of the linear regression model the coef cientscorrespond to the effect of changing the covariates by a singleunit Such one-to-one correspondence however is the excep-tion rather than the rule Wooldridge (1992) made this point inthe context of BoxndashCox regression models Such models areoften used when a linear representation for E6Y mdashX7 is inappro-priate The BoxndashCox regression model generalizes this linearform to E6Y 4lsaquo5mdashX7 D X 0sbquo where

Y 4lsaquo5 D(

4Y lsaquo ƒ 15=lsaquo lsaquo 6D 01

ln Y lsaquo D 00

Although consistent estimators for sbquo exist under these assump-tions Wooldridge stressed that because (a) the interpretationof sbquo changes with the value of lsaquo and (b) knowledge of sbquo andlsaquo is not suf cient for recovering E6Y mdashX7 there is no reasonfor economists to be interested in estimates of sbquo under theseassumptions In other words sbquo cannot be the sole focus ofthe researcher because the question it answers changes withthe value of nuisance parameters Wooldridge then suggestedan alternative speci cation that always allows the researcherto recover the conditional expectation E6Y mdashX7

In empirical work this distinction between the estimandand the parameters of the statistical model is consistent withthe now common practice of reporting estimates of averagederivatives in binary response models rather than reportingestimates of the logit or probit coef cients Unlike a linear

regression model there is no direct link from one of the coef- cients in the logit or probit model to average causal effectsand thus there is no intrinsic interest in such coef cients

This view is at odds however with a large part of the semi-parametric literature An exception is the work by Stoker (egStoker 1986) who focused on estimation of index coef cientsin settings where these are proportional to average derivativesand thus directly linked to changes in predictions Considerfor example the work on semiparametric estimation of binaryresponse models In this literature such models are esti-mated without making logistic or probit assumptions insteadonly making conditional mean or median assumptions in alatent index interpretation (eg Manski 1985) This literaturehowever has begged the question of why economists shouldbe interested in the coef cient estimates in these models inthe absence of a direct link between these coef cients andthe choice probabilities or their derivatives Similarly someof the models with xed effects in panel data with limiteddependent variables have focused on estimation of parametersthat in themselves do not allow for estimation of conditionalexpectations or their derivatives and thus do not allow for esti-mation of causal effects See Arellano and Honoreacute (in press)for a survey of many of these methods

2 IDENTIFICATION

After deciding on the estimand the next step is to makesubstantive assumptions on the process that generated the dataThis is where economic as opposed to statistical theory playsa key role Theoretical considerations may suggest that certainvariables have no direct causal effect on others because theydo not enter into agentsrsquo utility function nor do they affectthe constraints these agents face For example in some mar-kets it may be reasonable to postulate the existence of demandand supply function and assume that their intersection deter-mines observed prices and quantities In that case it may beargued that certain variablesmdashfor example weather conditionsin agricultural marketsmdashaffect supply at xed prices but notdemand because weather conditions do not affect utility ofthe buyers nor do they constrain their choices given pricesSimilarly theoretical considerations may suggest which vari-ables determine agentsrsquo fertility choices and which variablesare excluded from such choices as in the structural modelsdescribed in Section 12 of Angrist

For the purpose of considering such exclusion restrictionsas well as other assumptions it is important to formulate themin a way that economic theory can be brought to bear on themThis makes the formulation in terms of counterfactuals orpotential outcomes that Angrist advocates particularly appro-priate The potential outcomes describe outcomes in differ-ent environments and as such are the primitives of economicanalyses as well as choices under different sets of constraintswhich are the result of agents solving constrained optimizationproblems Since economic theory studies such optimizationproblems it is therefore well equipped to assess assumptionsformulated directly in terms of these potential outcomes Anexample of the formulation of the critical assumptions in termsof such potential outcomes is Angrist Imbens and Rubin(1996 AIR from here on) In contrast latent index modelsalthough under some conditions mathematically equivalent to

Imbens Comment 19

the potential outcome framework (eg Vytlacil 1999) formu-late the critical assumptions in terms of associations betweenobserved variables and unobserved residuals which appearsmore dif cult to contemplate [see Imbens (1997) for a dis-cussion of the confusion such formulations have caused in thestatistics literature]

It is rare that economic theory is speci c enough to deter-mine the exact value of the estimand More typical is thatthe theory is consistent with a range of values for the esti-mand Observations on agentsrsquo choices and outcomes may behelpful in narrowing down this range The econometricianrsquostask is to link the data to the estimand Typically a num-ber of additional assumptions are made at this stage Almostalways it is assumed that there is only limited dependenceor no dependence at all between choices made by differentagents and identi cation focuses on the link between the jointdistribution of the observables estimable in large samplesand the estimand Two possibilities arise at this stage Some-times the estimand can be expressed as a functional of thejoint distribution of the observables in which case the esti-mand is identi ed A leading example is where the estimandis the average treatment effect and theory suggests that assign-ment to treatment is random or at least random conditionalon a set of observed covariates (unconfounded assignmentselection on observables) Alternatively the assumptions sug-gested by economic theory do not allow for the direct linkbetween the distribution of observables and the estimand Inthat case the researcher faces some choices One option advo-cated in a series of papers by Manski (see for a general dis-cussion Manski 1995) is to estimate the range of values ofthe estimand consistent with the data given the substantiveassumptions Another option followed in the current article byAngrist is the local average-treatment-effect approach devel-oped by Imbens and Angrist (1994) to consider what aspectsof the estimand are identi ed given data and assumptions Ininstrumental-variables settings the population average treat-ment effect is often not identi ed but the average effect for aspeci c subpopulation may be In that case one may chooseto estimate the average treatment effect for this subpopula-tion and leave the extrapolation to the principal estimand tothe researcher possibly aided by theoretical considerations AsHeckman wrote ldquo It is a great virtue of the LATE parameterthat it makes the investigator stick to the data at hand and sep-arate out the aspects of an estimation that require out of sam-ple extrapolation or theorizing from aspects of an estimationthat are based on observable datardquo (Heckman 1999 p 832)

Let us consider the case studied by Angrist with its focuson the effect of having more than two children on labor sup-ply Angrist argues that the second birth being a multiple birth(eg twins) is a valid instrument for this effect In terms ofthe AIR formulation this requires a multiple birth to be asgood as randomly assigned and the absence of a systematicdirect effect on labor supply other than through its effect onthe number of children Such assumptions may be contro-versial For example fertility treatments may lead to a sys-tematic association between multiple births and choices madeby couples violating the rst assumption Even if we acceptthese assumptions however they only imply that the aver-age causal effect of more kids on labor supply is identi ed

for women who had a third child solely because their sec-ond birth was a multiple birth (compliers in the AIR termi-nology) In my view it is unlikely that this is the populationof primary interest Nevertheless it is the only subpopulationthe data are informative about in the sense of point identi ca-tion under the substantive assumptions and it would appear tooffer some guidance regarding the population average causaleffect to policy makers similar to the way in the medical worldresults from clinical trials in homogenous subpopulations areregarded as useful because they are viewed as indicative ofpopulation average causal effects

3 LIMITED DEPENDENT VARIABLES

Typically economic theory offers some guidance concern-ing the determinants of certain outcomes without specifyingthe exact form or strength of their relationship In that casestatistical modeling is required to complete the speci cationConsider the example Angrist studies with binary outcomebinary endogenous regressor a binary instrument and covari-ates Angrist suggests as one possible approach estimating theaverage treatment effect through a linear probability modelwith instrumenting for an endogenous regressor The bene tsof the linear probability approach stemming from the linear-ity and robustness against misspeci cation of the rst stageappear to me largely illusory At this point the statistical mod-eling is only intended to provide exible approximations to theunderlying conditional distributions This is a fundamentallydifferent role from that played by the substantive assumptionsthat are essential for identi cation Appeals to consistencyunder speci c parameterizations therefore appear irrelevantmdashin a larger sample one may well wish to use a more exiblespeci cation because less smoothing is required In additionto nding the alleged bene ts of the linear probability modelunpersuasive I nd its disadvantages troubling Within smallsubpopulations characterized by extreme values of the covari-ates the smoothing implicit in linear probability models islikely to lead to unattractive predictions compared to pre-dictions based on nonlinear models that respect the limited-dependent-variable nature of the outcomes

An alternative approach is followed in the study of the effectof u shots on hospitalization rates using randomized incen-tives for vaccination by Hirano Imbens Rubin and Zhou(2000 HIRZ from here on) Given their assumptions exten-sions of those made by AIR to the case with exogenous covari-ates there are three subpopulationsmdashcompliers (units whochange treatment status in response to a change in the valueof the instrument) always-takers (who always take the treat-ment irrespective of the value of the instrument) and never-takers (who never take the treatment irrespective of the valueof the instrument) H IRZ modeled the conditional distributionof these three ldquotypesrdquo conditional on covariates as a trinomialdistribution

Pr4TypeiD cmdashXi

D x5 D exp4x0ndashc5

1C exp4x0ndashc5C exp4x0ndasha51

Pr4TypeiD amdashXi

D x5 D exp4x0ndasha5

1C exp4x0ndashc5 C exp4x0ndasha51

20 Journal of Business amp Economic Statistics January 2001

and

Pr4TypeiD nmdashXi

D x5

D 1 ƒ Pr4TypeiD cmdashXi

D x5 ƒ Pr4TypeiD amdashXi

D x50

Now compare this setup to the selection models Angristdescribes in Section 3 In the selection models the equationdescribing the endogenous regressor is Di

D 18ƒ0C ƒ1Zi

Cƒ 0

2Xi gt Daggeri9 Suppose that the instrument is binary and that ƒ1 ispositive Then the two models are very similar with units withƒ0 Cƒ1 Cƒ 0

2Xi gt Daggeri in the selection model classi ed as always-takers in the potential outcome framework (because irrespec-tive of the value of the instrument Di

D 1 for such units) unitswith ƒ0

C ƒ 02Xi lt Daggeri classi ed as never-takers (because irre-

spective of the value of the instrument DiD 0 for such units)

and the units with ƒ0C ƒ 0

2Xi lt Daggeri lt ƒ0C ƒ1

C ƒ 02Xi classi ed

as compliersOne advantage of the trinomial model is that it easily gen-

eralizes to provide an arbitrarily good t to any conditionaltrinomial distribution by including higher-order terms andinteractions in the covariates If there are no substantive rea-sons to impose additional restrictions one should not imposethem implicitly in the speci cation of the statistical modelIn particular in the selection model it is not suf cient to addhigher-order terms to the covariate vector to provide an arbi-trarily good t to the trinomial distribution Such an approxi-mation would have to involve heteroscedasticity and other dis-tributional extensions that are not straightforward to implementin the selection model

Conditional on the individualrsquos type H IRZ speci ed theoutcome distributions given covariates as logistic regressionmodels Again the aim is to provide a exible approxima-tion to the conditional distribution in a manner that does notimpose any implicit restrictions Given that for a binomialdistribution the logistic regression model can be thought ofas providing a linear approximation to the log odds ratiothis choice is again an appealing one An alternative is theprobit model which also provides a good approximation Less

attractive here is the linear probability model since it requiresinequality restrictions on the parameters if the implicit esti-mates of the probabilities are to be bounded between 0 and 1

In cases with other limited dependent variables alternativenonlinear models may be appropriate For example if the out-comes are durations subject to censoring models speci ed interms of hazard functions (eg Lancaster 1979) may be con-venient for dealing with such data

ADDITIONAL REFERENCES

Angrist J (1989) ldquoLifetime Earnings and the Vietnam Era Draft LotteryEvidence From Social Security Administrative Recordsrdquo American Eco-nomic Review 80 313ndash335

Angrist J and Krueger A (1991) ldquoDoes Compulsory School AttendanceAffect Schooling and Earningsrdquo Quarterly Journal of Ecomonics 106979ndash1014

Arellano M and Honoreacute B (in press) ldquoPanel Datardquo in Handbook of Econo-metrics eds J Heckman and E Leamer Amsterdam Elsevier North-Holland

Goldberger A (1991) A Course in Econometrics Cambridge MACambridge University Press

Heckman J (1999) ldquo Instrumental Variables Response to Angrist andImbensrdquo Journal of Human Resources 34 828ndash837

Heckman J and Smith J (1997) ldquoEvaluating the Welfare Staterdquo in Econo-metrics and Economics in the 20th Century The Ragnar Frisch Centenaryed S Strom New York Cambridge University Press pp 214ndash318

Hirano K Imbens G Rubin D and Zhou A (2000) ldquoEstimating theEffect of Flu Shots in a Randomized Encouragement Designrdquo Biostatistics1 69ndash88

Imbens G (1997) Book Review of The Foundations of Econometric Analy-sis by D Hendry and M Morgan Journal of Applied Econometrtics 1291ndash94

Lancaster T (1979) ldquoEconometric Methods for the Analysis of DurationDatardquo Econometrica

Manski C (1985) ldquoSemiparametric Analysis of Discrete Responserdquo Journalof Econometrics 27 313ndash333

mdashmdashmdash (1995) Identi cation Problems in the Social Sciences CambridgeMA Harvard University Press

Marshak J (1953) ldquoEconomic Measurements for Policy and Predictionrdquoin Studies in Econometric Method eds W Hood and T KoopmansNew York Wiley pp 1ndash26

Stoker T (1986) ldquoConsistent Estimation of Scaled Coef cientsrdquo Economet-rica 54 1461ndash1481

Vytlacil E (1999) ldquo Independence Monotonicity and Latent Index ModelsAn Equivalency Resultrdquo unpublished manuscript University of ChicagoDept of Economics

Wooldridge J (1992) ldquoSome Alternatives to the BoxndashCox RegressionModelrdquo International Economic Review 33 935ndash955

CommentRobert A Mof tt

Department of Economics Johns Hopkins University Baltimore MD 21218 and National Bureau of EconomicResearch Cambridge MA (mofttjhuedu)

The Problem Although the article by Angrist rangesacross a number of issues much of the discussion and thearticle title suggests that the problem of concern is that instru-mental variables ( IV) cannot be used in one of three commonmodels Let the rst model be y D Csbquod CxbdquoChellip where y isan absolutely continuous variable but d is binary and wherex is independent of hellip but d is not Then sbquo can be consistentlyestimated with IV (Heckman and Robb 1985) Let the secondmodel be y uuml D C sbquod uuml C xbdquo C hellip where y uuml and d uuml are contin-

uous and where y D 14y uuml gt 05 and d D d uuml are the observedvariables The parameters of this model can likewise be esti-mated by IV with some auxiliary assumptions (Newey 1986see Blundell and Smith 1993 for a review of alternative meth-ods) But let the third model be y uuml D C sbquod C xbdquo C hellip where

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

Moftt Comment 21

again y D 14y uuml gt 05 but now d is binary The parameters ofthis model cannot be estimated by standard IV Many of theexamples in the Angrist article involve the censored regres-sion model rather than the binary choice model but this andthe other main points in the article apply to both

The Angrist Solution The Angrist solution to the problemposed by the third of these models where IV is not applica-ble is to declare sbquo an uninteresting parameter and not worthyof estimation [or in his words to ldquopuntrdquo (p 8)] This is remi-niscent of the solution to the Vietnam War suggested by some1960s commentators The United States should have simplydeclared victory quickly withdrawn and hoped that no onenoticed that they had in fact lost

If sbquo is uninteresting what is interesting according toAngrist His answer is that interest should center on leastsquares approximations with y as the dependent variable andd and x as regressors If endogeneity of d is a problem linearIV should be applied with linear IV given a LATE interpre-tation He provides an empirical illustration

I will comment on three questions raised by the Angristarticle (1) Is sbquo an interesting parameter (2) Is the Linear IVestimator using observed y and d which Imbens and Angrist(1994) relabeled the LATE an interesting quantity (3) Is theillustrative application given in the Angrist article interesting

Is sbquo an Interesting Parameter As Angrist notes manyof the issues raised by his solution have nothing to do withthe endogeneity of d Whether sbquo is an interesting parame-ter is one of these at least in part Given Angristrsquos prefer-ence for linear projections his position necessarily implies thatall nonlinear models of which the latent index model (L IM)is just one example are uninteresting (Angrist is not com-pletely consistent in his position because he does propose anonlinear model with an exponential conditional mean func-tion in one section of his article that model is subject to thesame objections that he subjects the LIM tomdasharbitrary non-linearities etc) Thus his position boils down to a preferencein the binary dependent-variable case for the linear proba-bility model (LPM) over the LIM and other nonlinear mod-els This issue has been discussed for many years even in asimultaneous-equations context [eg Heckman and MaCurdy(1985) which Angrist does not reference]

The dif culty with Angristrsquos polar position on this issuemdashthat the LPM is the only model of interestmdashis that it is funda-mentally untenable The LPM is attractive because it is easyto interpret providing parameter estimates that do not requiretransformation to learn the effects of a regressor on the meanof the dependent variable It has a role to play in empiricalwork in summarizing the data as regards the conditional meanfunction and for initial explorations of the data and virtu-ally all practitioners use it for this purpose But beyond thisit has no defense Either it is equivalent to the latent-variablemodel if the model is saturated or if the model is not satu-rated imposes functional form restrictions that may not holdand may t the data worse than the latent-variable model

Thus if the model is y D 14 C sbquod C hellip gt 05 where d is asingle dummy variable regressor independent of hellip the probit(say) estimates of and sbquo map one-to-one onto the leastsquares intercept and coef cient on d in the LPM and hencethere is no gain to estimating the model either way because the

estimators are equivalent The same point extends to the caseof a multinomial d If the model is y D 14 C sbquod C xbdquo C hellip gt

05 where x and bdquo are vectors of exogenous covariates andcoef cients respectively the LPM estimate of the effect ofd on E4y5 conditional on x may be demonstrably worsethan the probit estimator of the same effect even for theobject of estimating E4ymdashd1 x5 at points 8d1 x9 observed inthe data (Angrist falls back on the weak principle that leastsquares estimates provide best linear approximations but theissue is that the true model might be nonlinear) Nothingin Angristrsquos article provides evidence that the probitor anyother latent index formulation provides an inferior estimate of6E4ymdashd11 x5 ƒ E4ymdashd01 x57 compared to the LPM for the twoestimators smooth the joint response function with respect tod and x in different ways

Although the latent-variable model has no necessary claimfor superiority in the class of all nonlinear models its pop-ularity rests on its ability to generate a wide variety of non-linearities with a relatively parsimonious speci cation arisingfrom the convolution of the latent index with the cdf of hellipExpansion of the LPM to incorporate equivalent nonlinearitiesis cumbersome and inef cient

The possible inferiority of the LPM in capturing nonlinear-ities in nonsaturated models is also important for interpolationand extrapolation to points not in the observed data Gettingthe nonlinearities right in the observed data is important ininterpolation and extrapolation if the true model is nonlinearas the binary choice model necessarily is (since y is boundedby 0 and 1) Although Angrist at one point does mentionpredictionmdashstating that the linearized models he proposes area good ldquojumping-off point for any prediction exerciserdquo a state-ment without supportmdashhe is for the most part not interestedin prediction so much as summarizing the observed data Thisis ne but unfortunately prediction outside the observed his-torical data is the pervasive concern of the policy makers whoare the ultimate consumers of applied economic research forthey must make predictions of new policies in new environ-ments on a daily basis as part of their jobs The methods pro-posed in Angristrsquos article are therefore not very useful for theformulation of new policy

It should be noted that some make the argument that theoryand economic models must play a role in guiding the formu-lation of empirical relationships used for prediction outsidethe range of the observed data That position is not taken herebecause it is not necessary to establish the potential superior-ity of the L IM over the LPM in nonsaturated models and forprediction Occamrsquos razor makes it unnecessary

It should also be noted that these issues have nothing todo with causal effects because d is assumed exogenous Thusthe opposition that Angrist poses between L IMrsquos and ldquocausaleffectsrdquo models is a false one at least in general If a ldquocausaleffectrdquo is de ned as the true effect of a variable d on onlythe rst moment of a variable y (a rather restricted de nitionof causal effect) that causal effect can be derived from thelatent-variable model just as well The issue is instead merelynonlinearity of the function E4ymdashd1 x5

Angrist also states that the L IM requires ldquoconstant coef- cientrdquo and ldquodistributional assumptionsrdquo for identi cation

22 Journal of Business amp Economic Statistics January 2001

unlike the LPM Angrist does not make suf cient quali ca-tions to these assertions and they are indeed false for theL IM A random coef cient sbquo in the LIM is permissible with-out distributional assumption ( Ichimura and Thompson 1998)and the coef cients in the standard constant-coef cient L IMare identi ed semiparametrically (ie under unknown distri-bution of hellip) under the restriction that d and hellip are independentan assumption generally made for causal interpretation of theLPM model as well (Manski 1988 Horowitz 1993 Powell1994) In addition as already noted in a fully saturated modelthe LPM and the L IM are equivalent in any case

Is the LATE an Interesting Statistic The Angrist assertionthat LATE is the only interesting statistic if d is endogenousor perhaps the only statistic worth bothering about because itis the only one identi ed by the data has the same partialvalidity as his preference for the LPM over the LIM IV isone of the most popular techniques in applied econometricsand has a natural intuition indeed one that does not requirethe LATE interpretation per se It is one of the most usefultools in the applied economistrsquos kit The simple IV estimatordiscussed by Angrist makes minimal assumptions and givesminimal information back to the analyst as a result But tosay that it produces the only statistic of interest does not havedefense both because it is equivalent to an LIM that is fullysaturated and may t the data worse than an LIM if not satu-rated and because it implies that there is no value to makingadditional assumptions to obtain additional information

The limiting nature of the LATE statistic is again in itsinattention to nonlinearities interpolation and extrapolationThat nonlinearities can be important is as true in this caseas it was in the exogenous case just discussed when thereare additional x covariates and when the model is not satu-rated As for interpolation and extrapolation the LATE statis-tic 6 Ny4z D z15ƒ Ny4z D z057=6 Nd4z D z15ƒ Nd4z D z057 denotes theeffect on a change in z from value z0 to z1 on E4y5 scaledby the change in the E4d5 It does not have implications forthe effect on E4y5 of any other change in z or for a changein any other policy variable Thus it is not particularly usefulfor policy changes other than a change of z0 to z1 in the sameenvironment (ie conditioned on the same x) This stands incontrast to an L IM for E4dmdashx1 z5mdashor any parametric modelfor d for that mattermdashthat allows for the change from z0 toz1 to inform policy makers and others of the likely changesof other values of z and of other variables x In fact theLATE statistic which is not a parameter in the usual senseof the word can always be expressed as a nonlinear combi-nation of parameters of a latent-variable model but not viceversa hence the latter model is more general than the former[See Heckman and Vytlacil (1999) for a discussion of how toexpress the LATE in terms of an L IM see also Angrist andImbens (1999) and Heckman (1997 1999) for a discussion ofsome of these issues]

In addition Angristrsquos focus on the advantages of the LATEfor model-free estimation of the effect of a change in z fromz0 to z1 on y reveals a philosophical weakness in his posi-tion This is because the effect of a change in z from z0 toz1 on y can be estimated from the reduced form the struc-tural form is not needed It is not explained why the statistic

6E4ymdashz D z15 ƒ E4ymdashz D z057 is not the only quantity of inter-est The answer that most economists would give is to say thatstructural coef cients are of interest because they can be usedto extrapolate the effects of changes in d on y to other changesin z than from z0 to z1 in the same environment indeedthis is the basic rationale for structural estimation given inthe famous essay by Marschak [1953 see also Christ (1994)and Heckman (2000) for discussions Marschakrsquos argumentwas more subtle than this arguing that restrictions on struc-tural parameters are needed for out-of-sample prediction butthis translates into restrictions on the reduced form as well]However by ruling out the possibility of learning about anyof the effects of changes in z or d other than those inducedby a change in z from z0 to z1 Angrist removes the need todo structural estimation in the rst place Angrist implicitlyassumes that a structural coef cient of interest exists on thevariable d in the y equation and that that coef cient has mean-ing independent of a particular change in z but the rest of hisdiscussion contravenes that interest

A reduced-form research program which is where theAngrist position leads is of considerable value There is noth-ing wrong with a research program to collect a large body ofinformation on the effects of a wide variety of changes in dif-ferent policy variables 4z5 from particular values (z0) to otherparticular values (z1) each taking place in particular environ-ments (x) But if the research program stops there very littleuseful has been learned other than a collection of facts aboutthe effects of particular policies in particular environments

Is the Application in the Angrist Article Interesting TheAngrist article uses a recently popular exclusion restriction ornatural experiment to identify the effect of fertility on laborsupply and to illustrate his preferred methodsmdashnamely theuse of twins The use of twins as an exclusion restrictionstands in contrast to variations in a government policy or lawwhich are often used as exclusion restrictions both in recentwork on natural experiments as well as a much older litera-ture that uses cross-sectional and overtime variation in state-or country-speci c taxes and transfer rules to identify modelparameters However although it can be of interest to estimatereduced-form policy effects 6E4ymdashz D z15 ƒ E4ymdashz D z057 asjust discussed it is not so obvious that there is any interest inestimating the effects of having twins on the expected valueof y Creating twins is not a variable directly subject to policymanipulation

The dif culty in the use of twins arises for the same reasonalready discussedmdashnamely that the preference for model-freeestimation of policy effects leads to a lack of interest in thefunction E4dmdashz1 x5 and hence to a lack of interest in what canbe learned from a study of twins about the effects of someother more relevant policy variable that might be manipulatedA model for the E4dmdashz1 x5 is needed to make that connectionand hence to make a study of twins of any interest yet that isintentionally eschewed in the approach proposed by AngristOnce again this leads to an uninteresting and quite limitingset of exercises

Conclusions The set of methods laid out in the Angristarticle at least if a more relevant instrument were used yieldsa set of model-free statistics that are suitable for exploratorywork on a research question prior to the (possibly nonlinear)

Mullahy Comment 23

structural estimation and prediction that should be the ultimateobject of applied economics research

ACKNOWLEDGMENTS

This comment is a revision of remarks delivered at theJoint Statistical Meetings Baltimore August 1999 I thankCarl Christ James Heckman Joel Horowitz Michael KeaneThomas Mroz Geert Ridder and Edward Vyltacil for com-ments with the usual disclaimer that the views expressed hereshould not be taken as representing those of any of theseindividuals

ADDITIONAL REFERENCES

Angrist J and Imbens G (1999) Comment on ldquo Instrumental VariablesA Study of Implicit Behavioral Assumptions Used in Making Pro-gram Evaluationsrdquo by J J Heckman Journal of Human Resources 34283ndash827

Blundell R and Smith R (1993) ldquoSimultaneous Microeconometric ModelsWith Censored or Qualitative Dependent Variablesrdquo in Handbook of Statis-tics (Vol 1) eds G S Maddala C R Rao and H D Vinod AmsterdamNorth-Holland pp 117ndash141

Christ C (1994) ldquoThe Cowles Commissionrsquos Contributions to Econometricsat Chicago 1939ndash1955rdquo Journal of Economic Literature 32 30ndash59

Heckman J (1997) ldquoInstrumental Variables A Study of Implicit BehavioralAssumptions Used in Making Program Evaluationsrdquo Journal of HumanResources 32 441ndash462

mdashmdashmdash (1999) ldquo Instrumental Variables A Reply to Angrist and ImbensrdquoJournal of Human Resources 34 828ndash837

mdashmdashmdash (2000) ldquoCausal Parameters and Policy Analysis in Economics ATwentieth Century Retrospectiverdquo Quarterly Journal of Economics 11545ndash97

Heckman J and MaCurdy T (1985) ldquoA Simultaneous Equations LinearProbability Modelrdquo International Economic Review 18 21ndash37

Heckman J and Vytlacil E (1999) ldquoLocal Instrumental Variables andLatent Variable Models for Identifying and Bounding Treatment EffectsrdquoProceedings of the National Academy of Sciences 96 4730ndash4734

Horowitz J (1993) ldquoSemiparametric and Nonparametric Estimation of Quan-tal Response Modelsrdquo in Handbook of Statistics (Vol 11) eds G SMaddala C R Rao and H D Vinod Amsterdam North-Holland pp45ndash72

Ichimura H and Thompson T (1998) ldquoMaximum Likelihood Estimation ofa Binary Choice Model With Random Coef cients of Unknown Distribu-tionrdquo Journal of Econometrics 86 269ndash295

Manski C (1988) ldquoIdenti cation of Binary Choice Modelsrdquo Journal of theAmerican Statistical Association 83 729ndash738

Marschak J (1953) ldquoEconomic Measurements for Policy and Predictionrdquoin Studies in Econometric Method eds W Hood and T KoopmansNew York Wiley pp 1ndash26

Powell J (1994) ldquoEstimation of Semiparametric Modelsrdquo in Handbook ofEconometrics (Vol 4) eds R Engle and D McFadden Amsterdam North-Holland pp 2443ndash2521

CommentJohn Mullahy

Departments of Preventive Medicine and Economics University of Wisconsin Madison WI 53705 andNational Bureau of Economic Research Cambridge MA (jmullahyfacstaffwiscedu)

Articles bearing titles containing phrases like ldquoSimpleStrategies for Empirical Practicerdquo are often wolves masquerad-ing as sheep the methodologies they devise being far fromldquosimplerdquo and far removed from what most humble practition-ers perceive in the realm of ldquoempirical practicerdquo Not so hereJoshua Angrist has written a tight comprehensive article thatis stimulating and important yet also eminently useful

As typi es much of Angristrsquos work the main concern in thisarticle is on how to elicit interesting characterizations of causaleffects from microdata with the particular twist here being afocus on outcomes measured as ldquolimited dependent variablesrdquo(LDVrsquos) The main take-away message I glean from this arti-cle is that applied analysts working on causal effect or struc-tural analyses in LDV contextsmdashtraditionally vexing contextsinsofar as consistent estimation and inference are concernedmdashhave considerable grounds for optimism Angrist lays out andinterprets systematically a set of issues and methods that pro-vide practitioners with a variety of implementable strategiesthat might be brought to bear on such empirical problemsA corollary take-away message is that in some respects ldquothisstuff is not really as hard as wersquove tended to make itrdquo withAngrist demonstrating for instance the potential merits ofsimple linear instrumental variable ( IV) methods for estimat-ing causal effects in a variety of LDV contexts In no eventcan applied analysts escape the requirement of nding theo-retically sound instruments but Angrist exposits compellingly

a broad description of how such instruments might be used toelicit interesting causal inferences in LDV contexts

I have no real quibbles with any of the substance ofAngristrsquos arguments Rather I will devote my commentarymainly to amplifying and expanding several of the themes hedevelops throughout the article

1 FOCUS ON CAUSAL OR PARTIAL EFFECTSVERSUS FOCUS ON CONDITIONAL

EXPECTATIONS FUNCTIONS

It seems fair to suggest that much of applied microecono-metrics is concerned primarily with understanding the signsand magnitudes of quantities like iexclE6Y mdashX1 D7=iexcl4X1 D5 oriexclE6Y mdashX7=iexclX0 Yet much of the actual dirty work in under-taking causal analysis in LDV contexts seems to result fromdecisions to undertake analyses in settings where E6Y mdashX1D7

or E6Y mdashX7 are restricted to be positive without a priori restric-tions on parameter values [thus the tradition of using tobit-class conditional expectations functions (CEFrsquos) two-stepselection models exponential CEFrsquos etc]

As practitioners however we should pause to assesswhether speci cations akin to E6Y mdashX1D7 D exp4Xsbquo C D5 gt

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

24 Journal of Business amp Economic Statistics January 2001

0 are ultimately buying anything important with respect tothe rst-order questions being explored It is only com-mon sense that analysts should spend relatively more energyworking on understanding quantities of primary interest(eg iexclE6Y mdashX1D7=iexcl4X1 D5) and worrying relatively lessabout the formulation and implications of assumptions (egE6Y mdashX1 D7 gt 0) that may have little or no ultimate bearingon the particular questions being addressed This is espe-cially so when the latter effort (restricting via functional formchoice E6Y mdashX1 D7 gt 0) will tend to complicate rather thanfacilitate the former effort (understanding the partial or causaleffects) This is not to suggest that quantities like E6Y mdashX7 andE6Y mdashX1 D7 may not themselves be interesting to estimate (egmodeling conditional mean health care expenditures for use inforecasting future expenditure levels) but whether E6Y mdashX1D7

is parametrically and parsimoniously better approximated byexp4XsbquoC D5 or by XsbquoC D or by some other g4Xsbquo CD5

or h4X1D1 ƒ5 ismdasheven in LDV settingsmdashultimately an empir-ical matter to be assessed by goodness of t (eg via condi-tional moment tests)

I note this important angle somewhat sheepishly sincein both articles of mine that Angrist cites (Mullahy 19971998) there is considerable emphasis on the use of exponen-tial CEF speci cations as enforcers of positive conditionalmean ldquorequirementsrdquo In fact the main motivation for theearlier article was to nd a general approach to obtainingconsistent estimators of structural parameters in parametricsettings less distribution-bound than tobit-class models whenthe key requirement or side constraint is E6Y mdashX1 D7 gt 0 withthe resulting strategy a more or less brute-force method foraccomplishing this in a nonlinear IV setting Angristrsquos arti-cle takes this idea quite a bit further and demonstrates how(and with the addition of a further normalization) the struc-tural parameter estimation approach I discussed can be devel-oped into a model to analyze causal effects when the latter arecharacterized as ldquoproportional treatment effectsrdquo

Whether a proportional as opposed to an additive treatmenteffect is interesting (it may or may not be) remains to be seenin any particular application Wooldridgersquos recent work onaverage causal effects (ACErsquos Wooldridge 1999) would seempertinent in contexts in which the role of the X covariates ismore ldquointrusiverdquo than in the simple additivelinear treatment-effect setup As a general matter the extent to which infer-ences about causal effectsmdashhowever formulatedmdashhinge onthe inclusion of particular covariates as conditioning regres-sors is an interesting and potentially important angle thatAngristrsquos article begins to explore and is one that likely tomerit additional research For the particular issues at handthe very nature of proportional treatment effects brings intofocus explicitly the role of the X rsquos so how their correlationwith the treatment indicators 4D5 in uences inferences aboutthe causal impacts of the latter would seem to be a rst-orderconsideration

2 TWO-PART MODELS

I found Angristrsquos analysis of the interpretation of causaleffects in two-part models to be extremely illuminatingIn a reduced-form setting I argued (Mullahy 1998) that amid

all the concerns about zeros robustness outliers transforma-tion retransformation and such that have typically attendedmodeling efforts involving two-part models it would be sur-prising in a well-structured empirical investigation if therewere not stated or lurking concerns about partial effectsiexclE6Y mdashX7=iexclX andor CEFrsquos E6Y mdashX7 themselves It may be insome cases that such concerns are not articulated and it mayeven be in less well-structured problems that they are not obvi-ous to analysts themselves Yet for instance unless robustlyestimated sbquorsquos from a log-linear part-2 speci cation of a two-part model could inform a rst-order question about a par-tial effect or a CEF of what practical use were they likelyto be My rather modest and rather obvious argument wasjust that the concerns about the iexclE6Y mdashX7=iexclX or the E6Y mdashX7mdashif they are the reason the analysis is being conducted in the rst placemdashshould enjoy rst-order prominence in the estima-tion exercise and that concerns about zeros outliers and suchshould be relegated in some sense to second-order status Insome cases two-part models will serve nicely to address such rst-order concernsmdashat least in reduced-form settingsmdashbut inothers they will not

Angrist appears to have some sympathy with such argu-ments but more importantly he provides a valuable serviceto users of two-part models by unearthing one major limi-tation in the analysis of causal effects The problem is notwith part 1 (ie the logit probit or linear probability com-ponent) because part 1 of the two-part model falls within themain lines of Angristrsquos LDV analysis Rather the complicationarises with part 2 of the two-part model in which the additionalldquoY gt 0rdquo conditioning arises In the standard (reduced-form)two-part model quantities like E6Y mdashX1Y gt 07 are prominentand identi ed readily But in the counterfactual settings inwhich causal effects are manifested wherein the realized Y

arises from self-selection into or out of treatment such selec-tion effects introduce an ambiguity [Eq (12)] and therebyconfound the ability to glean causal effects from part-2 esti-mates (unless as Angrist notes censored regression methodsare used but Angrist also offers some compelling argumentsagainst blind reliance on censored regression approaches) Assuch if estimation of and inference about causal effects arethe primary analytical concerns in data settings with Y para 0and Pr4Y D 05 gt 01 analysts may be well advised to avoidtwo-part modeling strategies and pursue some of the moredirect linear and nonlinear estimation approaches discussedby Angrist in which one-part estimation approachesmdashzerosand allmdashyield estimates of causal effects that will be directlyinterpretable

3 BEYOND CEFrsquoS

One noteworthy feature of the methods Angrist discussesmdashwith the main results attributed to Abadie (1999)mdashare theirapplicability to estimation of causal effects for characteristicsof the conditional distribution rdquo4Y mdashX1D5 beyond just CEFrsquosEstimation of causal or treatment effects for conditional quan-tiles (QTErsquos) and conditional distribution function ordinates(ldquodistilesrdquo) Pr4Y lt cmdashX1D5 is demonstrated to t properlywithin the weighting strategies advanced by Abadie Since

Todd Comment 25

quantile and distile analyses may be more relevant in address-ing particular analytical questions or policy issues than CEFrsquosthe ability to extend the mechanisms of causal-effect esti-mation strategies in such directions should be welcomed byapplied researchers

I might suggest that Angristrsquos claim about an advantage ofquantile causal analysis over distile causal analysis being thatthe latter relies on application-specic values of ordinates (c)is perhaps oversold or underdeveloped In many interestingand policy-relevant applications focus on the application-speci c ordinates is fundamental What factors cause employ-ers to offer more than one health insurance plan to employeesWhat factors cause consumers of alcoholic beverages to drinkmore than two drinks per day What factors cause elderlyindividuals to utilize more than one prescription drug prod-uct What factors cause second earners to work more than17 hours per week Each of these concerns an importantreal-world causal question for which the application-specicordinatemdashwhether for institutional legal physiological orother reasonsmdashis a fundamental ldquopivot pointrdquo for the analysisAnalysis of such distile relationshipsmdashwhether in causal or inreduced-form settingsmdashis a potentially powerful method forinforming decision makers and it seems to me that its mer-its relative to quantile analyses would have to be judged on acase-by-case basis

4 CONCLUSIONS

Practitioners of applied microeconometrics will nd thatthe time invested in a careful reading of Angristrsquos article hassizable returns not merely because of the analytical and con-ceptual insights it offers but ultimately also because it sug-gests a variety of important practical innovations that appliedresearchers can exploit empirically to more clearly understandthe nature of causal relationships in their data Applied microresearchers in areas like health labor development public nance and such work commonly with data wherein (a) out-come measures are limitedmdashoften most importantly becausethey are nonnegative with mass points at 0mdashand (b) the rolesof conditioning covariates (X rsquos) are most conveniently summa-rized via regression methods Angristrsquos article provides appliedresearchers so endowed with a set of powerful tools for elic-iting causal inferences from such data Of course the punchline is familiar and inevitable All bets are off without theo-retically solid instruments But with such instruments in handthe strategies exposited by Angrist offer analysts a rich set ofperspectives on estimation

ADDITIONAL REFERENCE

Wooldridge J M (1999) ldquoEstimating Average Partial Effects Under Con-ditional Moment Independence Assumptionsrdquo mimeo Michigan StateUniversity Dept of Economics

CommentPetra Todd

Department of Economics University of Pennsylvania Philadelphia PA 19104 (petraathonasasupennedu)

This article considers ways of estimating the effect ofbinary endogenous regressors in models with limited depen-dent variables It questions the usefulness of conventionalestimation strategies aimed at recovering structural modelparameters and advocates the use of simple instrumental vari-ables ( IV) estimators as an alternative on the grounds thatthese estimators invoke weaker assumptions and often suf ceto answer questions of interest in empirical studies In theauthorrsquos view the main challenge facing empirical researchersis the problem of identi cation of ldquotreatment effectsrdquo throughIV Here I expand the discussion by considering two ques-tions (1) What are the limitations of the causal model asa paradigm for policy analysis (2) When simple estimatorsare suitable for answering a question of interest what are thetrade-offs that need to be considered in using them

1 WHEN IS THE ldquoCAUSAL MODELldquoA USEFULPARADIGM FOR POLICY ANALYSIS

The article gives the impression that most interesting ques-tions in economics can be answered within the context ofthe ldquocausal modelrdquo [variously attributed to Neyman (1923)Fisher (1935) Cox (1958) Roy (1951) and Rubin (1978)]The causal model is a very general framework that assumesthat there are potential outcome states associated with having

received some treatment and having received no treatment (orpossibly a different treatment) For each individual only oneof the potential outcome states is observed which leads to amissing-data problem in attempting to draw inferences aboutaspects of the treatment effect distribution

The language of potential outcomes is very general soalmost any economic problem could be formulated in theseterms However this does not mean that estimators proposedin the literature for the causal model are useful for all or evenmost economic problems A major limitation of the modelis that it assumes that the state ldquowith treatmentrdquo has beenobserved for at least a subset of people In medical trials orbiological experiments this assumption is probably reason-able but in economics we often are interested in evaluatingeffects of treatments that have not yet been implemented Forexample we might be interested in predicting the effect ofraising the age receiving Social Security bene ts to a new ageor the effect of introducing new term limits on welfare par-ticipation An implicit assumption needed to apply any of theidenti cation results described in this article is that both Y1

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

26 Journal of Business amp Economic Statistics January 2001

and Y0 are observed which would not be satis ed in manycases Thus the causal model and the estimators developedfor it are inadequate when it comes to any economic questioninvolving a state of the world that has not been observed

Despite these limitations treatment effect estimates aresometimes used to predict consequences of new treatments orof existing treatments for new populations When the policychange being considered is very close to changes observedin the data and is to be applied to similar individuals thenextrapolation may be reasonable But if the policy change is(a) of a magnitude outside the range of experience (b) alonga new dimension or (c) intended for a new population then itis an open question as to how reliable the estimates would beTreatment impact estimates are not intended to capture any-thing invariant to changing conditions The assumptions thatwould be required to justify extrapolations or generalizationsare often quite strong To some extent it is an illusion thatthese estimators allow for general kinds of inference underweak assumptions because the assumptions required to justifytheir application for particular purposes are often not madeexplicit

Angrist argues that the identi cation results for the localaverage treatment effect (LATE) estimator provide a ldquofounda-tion for credible causal inferencerdquo (p 5) and a ldquominimum con-troversy jumping-off point for any prediction exerciserdquo (p 5)But as discussed previously the causal model maintains sucha high level of generality that it offers little guidance for manykinds of problems It is important to recognize when and whennot the LATE estimator is likely to be useful This articleclaims that ldquoin practice estimates of LATE differ little fromestimates based on the stronger assumptions invoked to iden-tify effects on the entire treated populationrdquo (p 5) But suchan extrapolation is not justi ed under the theory and is surelyunlikely to hold in all circumstances As shown in a num-ber of works by various authors LATE provides the averageeffect of treatment for the group of ldquocompliersrdquo who are peo-ple induced by the instrument to receive treatment This groupdoes not except under very special circumstances correspondto the group of treated persons and may in fact representonly a small fraction of persons receiving treatment So notonly does LATE not suf ce for answering problems of the sortdescribed by (a) (b) and (c) it also does not generally pro-vide the average effect of treatment on the treatedmdashthe mostcommon parameter of interest in analyzing social programsAnother not-so-attractive feature of the estimator is that thepopulation of ldquocompliersrdquo is usually not identi able so wecannot say exactly whose treatment impact is being estimatedApplying the LATE parameter estimate to the full populationof treated persons would generally require ruling out certaintypes of heterogeneity in individual responses to treatmentas discussed by Heckman (1997) The explicit assumptionsthat could justify this type of extrapolation were shown byHeckman and Vytlacil (2000 in press)

2 ESTIMATORS SHOULD BE TAILOREDTO THEIR APPLICATION

A general theme throughout the article is that wheneverpossible researchers should restrain from imposing paramet-ric restrictions in estimation because the structure they impose

may be wrong resulting in biased estimates This view isexpressed for example in the discussion of the two-partmodel The article advocates the use of a two-part approachinstead of a conventional selection-model approach maintain-ing that the two-part approach is less likely to lead to biasedparameter estimates because it does not impose cross-equationrestrictions as a conventional tobit estimator would

How do researchers decide whether or not to impose restric-tions across estimating equations that are structurally linkedThe most agnostic approach would be to estimate both equa-tions completely nonparametrically but this approach is rarelypractical Nonparametric estimators are consistent under themost general conditions but in conventional size samples thestandard errors would be so large that the estimates would forpractical purposes be useless Researchers impose structurebecause they are willing to restrict the class of models underwhich their estimators are consistent in exchange for greaterprecision These ef ciency considerations would also apply inthe deciding whether or not to impose cross-equation restric-tions in estimating simultaneous-equations models If we areright about the structure there is an ef ciency gain fromimposing the restrictions If we are wrong imposing them maylead to bias When there are overidentifying restrictions it isat least possible to develop tests of the model speci cation thatcan serve as a guide in choosing a structural model that is sup-ported by the data This article presents a dichotomy betweenthe two-part model and a fully parametric traditional selec-tion speci cation but there are many other modeling choicesranging from fully nonparametric to semiparametric to fullyparametric In modest size samples ef ciency is an impor-tant consideration and a parametric model may be the mostappropriate one For very large samples fewer parametricapproaches are feasible Both the probability-of-working andthe hours-worked equations could for example be estimatedby a semiparametric method such as semiparametric leastsquares ( Ichimura 1993) Which estimation method is mostappropriate ultimately depends on how much data is avail-able and how many parameters are to be estimated It doesnot make sense to criticize the use of structured approaches infavor of more exible modeling approaches without regard tothe context in which they are being applied

3 ON THE GAP BETWEEN THEORY ANDEMPIRICAL PRACTICE

Finally another recurring emphasis in the article is on thevalue of shortcuts in empirical practice In the authorrsquos viewthere is a gap between the econometric theory and what isfeasible in practice and shortcuts and approximations help tobridge this gap To this end the article extols for examplethe virtues of ordinary least squares (OLS) as a way of approx-imating any unknown conditional expectation function with-out much concern as to whether a best linear approximation isa good approximation If the model has discrete regressors andis fully saturated it is well known that OLS is equivalent to anonparametric estimator But in other cases a linear approxi-mation could be highly biased and the bias will not go awayas the sample size gets large There are a variety of alterna-tive asymptotically unbiased estimators Along similar lines

Angrist Reply 27

the article suggests that in implementing a ldquoselection-on-observablesrdquo estimator it may be all right simply to ignore theweighting scheme Such claims seem to me fairly outrageousbecause failure to take into account weighting could lead tovery different estimates and would not provide a consistentestimator of the parameter of interest Heckman Ichimura andTodd (1998) discussed alternative estimators that are not dif -cult to implement and take into account the weighting scheme

Finally the appendix presents a variety of shortcuts forobtaining estimates and standard errorsmdashsome of which leadto inconsistent estimates When shortcuts lead to very crudeapproximations or inconsistent estimates I question their use-fulness I do think it unreasonable to expect that estimatesbased on simple IV estimators be accompanied by correctstandard errors that correctly take into account the in uence ofestimated parameters of estimated regressors I see little valuein reporting incorrect standard-errors estimates on the groundsthat it saves time in computation especially when there areseveral works long available in the literature that develop sim-ple ways of solving these kinds of problems (eg see Newey1984) Although there is surely a gap between the estimatorsdeveloped in the theoretic econometric literature and the esti-mators used in the empirical literature the gap is not as greatas this article supposes

ADDITIONAL REFERENCES

Cox D R (1958) The Planning of Experiments New York WileyFisher R A (1935) Design of Experiments London Oliver and BoydHeckman J J (1997) ldquo Instrumental Variables Evidence From Evaluating a

Job Training Programmerdquo Journal of Human Resources 32 441ndash462Heckman J Ichimura H and Todd P (1997) ldquoMatching as an Econometric

Evaluation Estimator Theory and Evidence on its Performance Appliedto the JTPA Program Part I Theory and Methodsrdquo Review of EconomicStudies 64 605ndash654

mdashmdashmdash (1998) ldquoMatching as an Econometric Evaluationrdquo Review of Eco-nomic Studies 65 261ndash294

Heckman J and Vytlacil E (2000) ldquoCausal Parameters Structural Equa-tions Treatment Effects and Randomized Evaluations of Social Programsrdquounpublished manuscript University of Chicago Dept of Economics

mdashmdashmdash (in press) ldquoLocal Instrumental Variablesrdquo in Nonlinear StatisticalInference Essays in Honor of Takeshi Amemiya eds C Hsiao KMorimune and J Powell Cambridge UK Cambridge University Press

Ichimura H (1993) ldquoSemiparametric Least Squares (SLS) and Weighted SLSEstimation of Single Index Modelsrdquo Journal of Econometrics 58 71ndash120

Neyman J (1923) ldquoStatistical Problems in Agricultural Experimentsrdquo Sup-plement to the Journal of the Royal Statistical Society 2 107ndash180

Newey W (1984) ldquoA Method of Moments Interpretation of Sequential Esti-matorsrdquo Economic Letters 14 201ndash206

Roy A (1951) ldquoSome Thoughts on the Distribution of Earningsrdquo OxfordEconomic Papers 3 135ndash146

Rubin D (1978) ldquoBaysian Inference for Causal Effects The Role of Ran-domizationrdquo The Annals of Statistics 6 34ndash58

ReplyJoshua D Angrist

I thank the discussants and the session organizer outgoingJBES editor Jeff Wooldridge who will be missed by associateeditors and JBES authors alike The comments in this sessionoffer a variety of viewpoints some sympathetic others morecritical I am pleased to be part of this stimulating and con-structive exchange and look forward to more interaction ofthis type in the future

Hahn My article starts by suggesting that a focus ontreatment effects simpli es the problem of causal inferencein limited dependent variable (LDV) models with endoge-nous regressors Endogeneityrsquos half-sister is the assumptionof unobserved individual effects since the omitted-variablesmotivation for xed effects in panel data is similar Hahnasks whether a shift in focus from index parameters to aver-age causal effects might also simplify the notoriously dif -cult problem of working with nonlinear models for binarypanel data (eg as in Card and Sullivan 1988) Hahn showsthat here too the causal-effects framework pays off demon-strating that average causal effects are easily estimated in a xed-effects probit model with a binary regressor even thoughthe probit coef cient is not A second noteworthy featureof Hahnrsquos comment is his observation that identi cation inthis framework turns on careful modeling of the relationshipbetween the assignment variable and the xed effects Tradi-tional econometric models are primarily concerned with thestochastic process generating outcomes A shift of attentiontoward modeling the assignment mechanism is consistent with

the notion that observational studies should try to mimic thesort of controlled comparisons generated by experiments

Imbens Imbens begins by endorsing the view that the sub-stantive goals of empirical research are better served by thepotential outcomescausal framework than by structural mod-eling Going one step further Imbens notes that the problemswith ldquomodel- rst econometricsrdquo are not limited to LDVrsquos Inthe same spirit he sensibly wonders (as did Wooldridge 1992)why so much attention has been devoted to estimating theBoxndashCox transformation model which does not actually iden-tify E6Y mdashX7 without a distributional assumption Imbens alsonicely articulates the rationale for looking at the twins exam-ple and notes that this is similar to the rationale for extrap-olating from clinical trials in narrow subpopulations And infact I believe the ldquotwins experimentrdquo is likely to have pre-dictive value in a variety of situations though any extrapo-lation will probably be improved by control for demographiccharacteristics

In spite of (or perhaps because of) our years of discussionand collaboration Imbens and I still do not agree on certainthings He feels strongly that inference with discrete outcomesshould use models and procedures that respect inherent non-

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

28 Journal of Business amp Economic Statistics January 2001

linearities The HIRZ procedure discussed in his comment isan intriguing way to do this and my article discusses othersI remain unconvinced however that nonlinearity is of majorsubstantive importance in empirical work with binary regres-sors The proof of the pudding I suppose would be to showthat H IRZ or some of the methods I discuss in the article leadto better predictions or decisions than the linear workhorse

Mof tt Mof t provides a valuable counterpointHe beginsby arguing that latent-index models have more predictivepower than reduced-form causal models Such claims seemhard to establish a priori though I do not doubt that in somecases this may be true But empirical strategies in which thebulk of the research effort goes into estimating index coef -cients may fail to answer the most basic question in empiricalwork What happened here Causal models and local aver-age treatment effect (LATE) answer this question while indexcoef cients alone are not suf cient for causal inference Iagree however that structural modeling can be useful forgoing beyond basic questions and I have used index and otherstructural models for this purpose Mof t is right howeverin suggesting that I nd the model-free analysis he refers toas ldquoexploratoryrdquo more interesting than structural extrapolationThis seems to me to be where the potential for real discoverylies

Mof tt asks tough but fair questions about the twins appli-cation He is correct to point out that twinning is not a manip-ulable policy instrument That is one reason I do not limit theanalysis to reduced-form effects of multiple births Yet to mymind the best evidence we have on the controversial questionof the consequences of out-of-wedlock childbearing comesfrom the twins experiment (Bronars and Grogger 1994) Andthe economic consequences of childbearing are of interest inmany other contexts As noted in the introduction to my arti-cle the twins experiment can help determine whether fertilityreductions are responsible for trend increases in female laborsupply Beyond this sort of question the twins results seemuseful for assessing the economic consequences of increasedabortion access in the United States the dissemination of con-traceptive implants in developing countries (eg see KaneFarr and Janowitz 1990) and various programs to encour-age or require implant use by welfare mothers (Haveman andWolfe 1993)

Finally it is fair to question the external validity of twinsestimates as Mof t does toward the end of his commentAngrist and Evans (1998) showed that the labor-supply con-sequences of twinning can be reconciled with estimates ofthe labor-supply consequences of childbearing resulting froma very different natural experiment induced by preferences formixed-sex sibling pairs This suggests that results from bothexperiments may be quite robust and may even be ldquostructuralrdquoin the sense that they tell us something general about the eco-nomic response to childbearing

Mullahy Mullahy brings a welcome health economistrsquosperspective to the discussion Health economists have longgrappled with the issues discussed in my article and asI hope I made clear I drew ideas and inspiration fromMullahyrsquos work on models with nonnegative dependent vari-ables Mullahyrsquos comment further highlights exactly the sort

of utilitarian trade-off that underlies the choice of empiricalstrategy On one hand we would like our models to be con-sistent with any functional form restrictions known to be trueOn the other attempts at this may complicate estimation andinference for little payoff Worst of all the technical complex-ity of nonlinear models seems to cause authors to ldquowax econo-metricrdquo an effort that may come at the expense of attention tosubstantive issues of importance (an example here is the longseries of papers discussing exactly how data from the RANDhealth-insurance experiment should be analyzed) Mullahyrsquosheretical discussions of ldquoone-part modelsrdquo here and in earlierwork represent a refreshing effort to keep the empirical trainon track Mullahyrsquos point that application-specic distributionordinates can provide more useful summary information thanquantiles is also clearly worth thinking about

Todd Todd provides a useful and clear overview of thecausal framework which she suggests is better suited formedical questions than economic questions The thrust of herargument is that there is too much heterogeneity in economicinterventions and outcomes for the causal approach to bevery useful for economists But my perception is that medicalresearch is even more explicitly concerned with heterogene-ity than economics This is evidenced by the many quali ca-tions regarding patient characteristics conditions and inter-actions in medical treatment protocols A difference betweenthe prevailing medical research paradigm and the traditionaleconometric perspective however is whether the question ofheterogeneous response is to be addressed by better theoryor more evidence Of course theory helps us decide whereto look for heterogeneity but as Rosenbaum (1999 p 301)put it ldquo If a treatment has substantially different effects in dif-ferent types of people then we need to discover this (ital-ics mine) Rosenbaum also noted that ldquorandomized clinicaltrials routinely look for heterogeneous treatment effects andthey do this without representative samples from nationalpopulationsrdquo

On the technical side Todd suggests I advocated use ofthe two-part model because it exibly approximates nonlinearfunctional forms for nonnegative outcomes Although I notedthe exibility the upshot of my discussion was that the two-part model is poorly suited to causal inference because of thedif culty of interpreting part 2 Finally Todd points out thefailure to allow for sampling variance in the scaling factorsused to generate effects for nonlinear models This is indeed ahard-to-defend shortcut though I shall take a stab at defenseby noting that randomness of the scaling factor which typi-cally estimates a sample average is likely to be a minor sourceof sampling uncertainty

ADDITIONAL REFERENCES

Card D E and Sullivan D (1988) ldquoEffects of Subsidized Training onMovements In and Out of Employmentrdquo Econometrica 56 497ndash530

Haveman R and Wolfe B (1993) ldquoChildrenrsquos Prospects and ChildrenrsquosPolicyrdquo Journal of Economic Perspectives 7 153ndash174

Kane T Farr G and Janowitz B (1990) ldquo Initial Acceptability of Con-traceptive Implants in Four Developing Countriesrdquo International FamilyPlanning Perspectives 16 49ndash54

Rosenbaum P R (1999) ldquoChoice as an Alternative to Control in Observa-tional Studiesrdquo Statistical Science 14 259ndash304

Wooldridge J (1992) ldquoSome Alternatives to the BoxndashCox RegressionModelrdquo International Economic Review 33 935ndash955

Page 2: New Estimation of Limited Dependent Variable Models With Dummy …piketty.pse.ens.fr/…ecoineg/articl/Angrist2001b.pdf · 2011. 12. 5. · sequences of changes in contraceptive technology

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 3

fundamentally different from causal inference with continuousoutcomes

Section 1 begins by discussing identi cation strategies inLDV models with dummy endogenous regressors I show thatthe auxiliary assumptions associated with structural model-ing are largely unnecessary for causal inference There isone important exception to this claim however and that iswhen attention focuses on conditional-on-positive effects asin sample-selection models For example labor economistssometimes study the effect of an endogenous treatment onhours worked for those who work Identi cation of sucheffects turns heavily on an underlying structural frameworkOn the other hand the motivation for estimating this sort ofeffect is often unclear (to me) Moreover claims for identi -cation in this context typically strike me as overly ambitiousbecause even an ideal randomized experiment fails to identifyconditional-on-positive causal effects

Setting aside the conceptual problems inherent inconditional-on-positive effects a focus on causal effectsinstead of say censored regression parameters or latent indexcoef cients has a major practical payoff First and most basicis the observation that if there are no covariates or the covari-ates are sparse and discrete linear models and associated esti-mation techniques like 2SLS are no less appropriate for LDVrsquosthan for other kinds of dependent variables This is becauseconditional expectation functions with discrete covariates canbe parameterized as linear using a saturated model regardlessof the support of the dependent variable Of course relation-ships involving continuous covariates or a less-than-saturatedparameterization for discrete covariates are usually nonlin-ear (even if the outcome variable has continuous support)In such cases however it still makes sense to ask whethernonlinear modeling strategies change inferences about causaleffects

If nonlinearity does seem important it can be incorpo-rated into models for conditional means using two new semi-parametric estimators The rst due to Mullahy (1997) isbased on a multiplicative model that can be estimated usinga simple nonlinear IV estimator The second developed byAbadie (1999) allows exible nonlinear approximation ofthe causal response function of interest In addition to newstrategies for estimating effects on means I also discussestimates of the effect of treatment on distribution ordi-nates and quantiles using an approach developed by AbadieAngrist and Imbens (1998) This provides an alternative tothe estimation of conditional-on-positive effects Two advan-tages of these new approaches are their computational sim-plicity and weak identi cation requirements relative to othersemiparametric approaches Another advantage is the fact thatthey estimate causal effects directly and are not tied to alatent-indexcensored-regression framework The new estima-tors are illustrated by estimating the effect of childbearing onwomenrsquos employment status and hours of work using multi-ple births as an instrument This ldquotwins instrumentrdquo was usedto estimate the labor-supply consequences of childbearing byRosenzweig and Wolpin (1980) Bronars and Grogger (1994)Gangadharan and Rosenbloom (1996) and Angrist and Evans(1998)

1 CAUSAL EFFECTS AND STRUCTURALPARAMETERS

11 The Effects of Interest

The relationship between fertility and labor supply is oflongstanding interest in labor economics and demography Fora recent discussion and references to the literature see Angristand Evans (1998) which is the basis of the empirical work inSection 4 The AngristndashEvans application is concerned withthe effect of going from a family size of two children to morethan two children Let Di be an indicator for women with morethan two children in a sample of women with at least twochildren The reasons for focusing on the transition from twoto more than two are both practical and substantive First onthe practical side there are plausible instruments available forthis fertility increment Second recent reductions in martialfertility have been concentrated in the 2ndash3 child range

What is the object of interest in an application like thisSometimes the purpose of research is merely descriptive inwhich case we might simply compare the outcomes of womenwho have Di

D 1 with those of women who have DiD 0 For

this descriptive agenda no special issues are raised by thefact that the dependent variable is limited beyond the obvi-ous consideration that if Yi is binary then one need only lookat means In contrast if Yi is a variable like earnings witha skewed distribution the mean may not capture everythingabout labor-supply behavior that is of interest In fact a com-plete description would probably look at the entire distributionof outcomes or at least at selected quantiles

A major problem with descriptive analyses is that they mayhave little predictive value Part of the motivation for studyinglabor supply and fertility is interest in how changes in gov-ernment policy and the environment affect childbearing andlabor supply For example we might be interested in the con-sequences of changes in contraceptive technology or costs amotivation for studying the twins experiment mentioned byRosenzweig and Wolpin (1980 p 347) Similarly one of thequestions addressed in the labor-supply literature is to whatextent declines in fertility have been a causal factor increas-ing female employment rates or reducing poverty In contrastwith descriptive analyses causal relationships answer counter-factual questions and are therefore more likely to be of valuefor predicting the effects of changing policies or changing cir-cumstances or understanding the past (eg see Manski 1996)

Causal relationships can be described most simply usingexplicit notation for counterfactuals or potential outcomesThis approach to causal inference goes back at least toR A Fisher but the modern version is usually attributed toRubin (1974 1977) Let Y1i denote the labor-market behaviorof mother i if she has a third child and let Y0i denote labor-market behavior otherwise for the same mother The averageeffect of childbearing on mothers who have a third child is

E6Y1imdashDi

D 17ƒ E6Y0imdashDi

D 17 D E6Y1iƒ Y0i

mdashDiD 170 (1)

Note that the rst term on the left side is observed but thesecond term is an unobserved counterfactual average that weassume is meaningful

The right side of (1) is often called the effect of treatment onthe treated and is widely discussed in the evaluation literature

4 Journal of Business amp Economic Statistics January 2001

(eg Rubin 1977 Heckman and Robb 1985 Angrist 1998)In the context of social-program evaluation the effect of treat-ment on the treated tells us whether the program was bene cialfor participants This is not the only average effect of interestwe might also care about the unconditional average effect orthe effect in some subpopulation de ned by covariates (ieE6Y1i

ƒY0imdashX7 for covariates X) Ultimately of course we are

also likely to want to extrapolate from the experiences of thetreated to as-yet-untreated groups Such extrapolation makeslittle sense however unless average causal effects in existingpopulations can be reliably assessed

Simple comparisons of outcomes by Di generally fail toidentify causal effects Rather a comparison between treatedand untreated individuals equals the effect of treatment on thetreated plus a bias term

E6YimdashDi

D 17ƒ E6YimdashDi

D 17

D E6Y1imdashDi

D 17 ƒ E6Y0imdashDi

D 07

D E6Y1iƒ Y0i

mdashDiD 17

C 8E6Y0imdashDi

D 17ƒ E6Y0imdashDi

D 0790 (2)

The bias term disappears in the childbearing example if child-bearing is determined in a manner independent of a womanrsquospotential labor-market behavior if she does not have childrenIn that case 8E6Y0i

mdashDiD 07 ƒ E6Y0i

mdashDiD 179 D 0 and simple

comparisons identify the effect of treatment on the treatedBut this independence assumption seems unrealistic becausechildbearing decisions are made in light of information aboutearnings potential and career plans

12 Structural Models

What connects the causal parameters discussed in the pre-vious section with the parameters in structural economet-ric models Suppose that instead of potential outcomes webegin with a labor-supply model for hours worked alongthe lines of many second-generation labor-supply studies (seeKillingsworth 1983 for a survey) In this setting childbearingis determined by comparing the utility of having a child andnot having a child We can model this process as

DiD 14X 0

iƒ gt Daggeri51 (3a)

where Xi is a K 1 vector of observed characteristics thatdetermine utility and Daggeri is an unobserved variable re ecting aperson-speci c utility contrast

In a simple static model labor supply is given by the com-bination of the participation decision and hours determinationfor workers Workers chose their latent hours 4yi5 by equat-ing offered wages wi with the marginal rate of substitutionof goods for leisure mi4yi5 Participation is determined bythe relationship between wi and the marginal rate of substitu-tion at zero hours mi405 Since offered wages are unobservedfor nonworkers and reservation wages are never observed wedecompose these variables into a linear function of observablecharacteristics and regression error terms (denoted vwi1 vmi) asin the article by Heckman (1974) and many others

wiD X 0

ibdquowC wDi

C vwi (3b)

and

mi4yi5 D X 0ibdquom

C ndashyiC mDi

C vmi0 (3c)

Equating (3b) and (3c) and relabeling parameters and the errorterm we can solve for observed hours

YiD X 0

ibdquo C DiC ˜i if wi gt mi405

YiD 0 otherwise0

Equivalently

YiD 14X 0

ibdquo C Di gt ƒ˜i54X0ibdquo C Di

C ˜i50 (4)

Childbearing is said to be endogenous if the unobserved errordetermining Di depends on the unobserved error in the partic-ipation and hours equations

Since the structural equations tell us what a woman woulddo under alternative values of Di they describe the same sortof potential outcomes referred to in the previous model Theexplicit link is

YiD Y0i41ƒ Di5 C Y1iDi1 (5)

where

Y0iD 14X 0

ibdquo gt ƒ˜i54X0ibdquo C ˜i5 (6a)

and

Y1iD 14X 0

ibdquo C gt ƒ˜i54X0ibdquo C C ˜i50 (6b)

Once the structural parameters are known we can use theserelationships to write down expressions for causal effects Forexample the effect of treatment on the treated is

E6Y1iƒ Y0i

mdashDiD 17

D E 14X 0ibdquo C gt ƒ˜i54X

0ibdquo C C ˜i5

ƒ 14X 0i bdquo gt ƒ˜i54X

0ibdquo C ˜i5mdashX 0

iƒ gt Daggeri90 (7)

Note however that knowledge of the parameters on the rightside of (7) is still not enough to evaluate this expression Thefollowing lemma outlines the identi cation possibilities in thiscontext

Lemma Assume that the covariates 4Xi5 are independentof continuously distributed latent errors 4Daggeri1 ˜i5 Then

1 If Daggeri is not independent of ˜i and the probability oftreatment is always nonzero the effect of treatment on thetreated is not identi ed without further assumptions (Heckman1990)

2 If Daggeri is independent of ˜i the effect of treatment on thetreated is identi ed (exogenous treatment)

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 5

3 Suppose there is a covariate denoted Zi with coef cientƒ1 in (3a) which is excluded from (3b) and (3c) Without lossof generality assume ƒ1 gt 0 Then the local average treatmenteffect (LATE) given by E6Y1i

ƒ Y0imdashX 0

iƒ C ƒ1 gt Daggeri gt X 0iƒ7 is

identi ed ( Imbens and Angrist 1994)4 Suppose that LATE is identi ed as in 3 and that P6Di

D1mdashZi

D 07 D 0 Then LATE equals the effect of treatment on thetreated E6Y1i

ƒY0imdashDi

D 17 Similarly if P6DiD 1mdashZi

D 17 D 1LATE equals E6Y1i

ƒ Y0imdashDi

D 07 (Angrist and Imbens 1991)

This set of results can easily be summarized using non-technical language First without additional assumptions theeffect of treatment on the treated is not identi ed in latent-index models Second the three positive identi cation resultsin the lemma for exogeneous treatment the LATE resultand the specialization of LATE to effects on the treated ornontreated require no information about the structural modelother than the distribution of D and Z On the other hand theLATE result for endogenous treatments in part 3 does not gen-erally refer to the effect of treatment on the treated so herethe role played by the structural model in identifying causaleffects merits further discussion

The treatment effect captures the effect of treatment on thetreated for those whose treatment status is changed by theinstrument Zi The data are informative about the effect oftreatment on these people because the instrument changes theirbehavior Thus an exclusion restriction is enough to identifycausal effects for a group directly affected by the ldquoexperi-mentrdquo at hand (Angrist Imbens and Rubin 1996 called thesepeople compliers) In some cases this is the set of all treatedindividuals while in other cases this is only a subset In anycase however this result provides a foundation for crediblecausal inference because the assumptions needed for this nar-row ldquoidenti cation-in-principlerdquo can be separated from mod-eling assumptions required for smoothing and extrapolation toother groups of interest The quality of this extrapolation is ofcourse an open question and undoubtedly differs across appli-cations In my experience however often estimates of LATEdiffer little from estimates based on the stronger assumptionsinvoked to identify effects on the entire treated population (Ofcourse I have to admit that when simple and sophisticatedestimation strategies do differ I invariably prefer the simpleFor an illustration of why this is see the example that fol-lows)

Although extrapolation is an important part of what empir-ical researchers do parameters like LATE and the effectof treatment on the treated provide a minimum-controversyjumping-off point for inference and prediction Structuralparameters are sometimes more closely linked to economictheory than average causal effects But the ultimate goal oftheory-motivated structural estimation seems to differ littlefrom the causal agenda For example Keane and Wolpin(1997 p 111) used structural models to ldquoforecast the behaviorof agents given any change in the state of the world that canbe characterized as a change in their constraintsrdquo A prereq-uisite for this is credible assessment of causal effects of pastchanges Structural parameters that are not linked to causaleffects are not useful for this basic purpose [Mullahy (1998)made a similar point] The rest of my discussion is therefore

limited to models in which the effects of interest are de neddirectly in terms of potential outcomes

2 CAUSAL EFFECTS ON LDVrsquos

21 Average Effects in Experimental Data

Does the fact that a dependent variable is binary or non-negative have any implications for empirical causal analysisA useful starting point for this discussion is the analysis ofrandomized experiments since some of the issues raised bythe presence of LDVrsquos have nothing to do with endogeneitySuppose that Di was randomly assigned or at least assignedby some mechanism that ensures independence between Di

and Y0i In this case a simple difference in means betweenthose with Di

D 1 and with DiD 0 identi es the effect of treat-

ment on the treated

E6YimdashDi

D 17 ƒ E6YimdashDi

D 07

D E6Y1imdashDi

D 17ƒ E6Y0imdashDi

D 07

D E6Y1imdashDi

D 17ƒ E6Y0imdashDi

D 17

4by independence of Y0i and Di5

D E6Y1iƒ Y0i

mdashDiD 17

4by linearity of conditional means50 (8)

If Di is also independent of Y1i as would be likely in an exper-iment then E6Y1i

ƒ Y0imdashDi

D 17 D E6Y1iƒ Y0i7 the uncondi-

tional average treatment effect (Usually this ldquounconditionalrdquoaverage still refers to a subpopulation eligible to participate inthe experiment)

Equation (8) shows that the estimation of causal effectsin experiments presents no special challenges whether Yi isbinary nonnegative or continuously distributed If Yi is binarythen the difference in means on the left side of (8) cor-responds to a difference in probabilities while if Yi has amass point at 0 the difference in means is the difference inE6Yi

mdashYi gt 01Di7P6Yi gt 0mdashDi7 But these facts have no bearingon the causal interpretation of estimates or in the absence offurther assumptions or restrictions the choice of estimators

22 Conditional-on-Positive Effects

In many studies with nonnegative dependent variablesresearchers are interested in effects in a subset of the popula-tion with positive outcomes Interest in conditional-on-positiveeffects is sometimes motivated by the following decompo-sition of differences in means discussed by McDonald andMof t (1980)

E6YimdashDi

D 17ƒ E6YimdashDi

D 07

D 8P6Yi gt 0mdashDiD 17ƒ P6Y gt 0mdashDi

D 079

E6YimdashYi gt 01Di

D 17

C 8E6YimdashYi gt 01Di

D 17ƒ E6YimdashYi gt 01 Di

D 079

P6Y gt 0mdashDiD 070 (9)

6 Journal of Business amp Economic Statistics January 2001

This decomposition describes how much of the overalltreatment-control difference is due to participation effects (iethe impact on 16Yi gt 07) and how much is due to an increasein intensity for those with Yi gt 0 For a recent example ofthis distinction see Evans Farrelly and Montgomery (1999)who analyzed the impact of workplace smoking restrictionson smoking participation and intensity

In an experimental setting the interpretation of the rst partof (9) as giving the causal effect of treatment on participationis straightforward Does the conditional-on-positive differencein the second part also have a straightforward interpretationThe large literature constrasting two-part and sample-selectionmodels for LDVrsquos suggests not (See for example DuanManning Morris and Newhouse 1984 Hay and Olsen 1984Hay Leu and Roher 1987 Leung and Yu 1996 Maddala1985 Manning Duan and Rogers 1987 Mullahy 1998)

To analyze the conditional-on-positive comparison furtherit is useful to write the mean difference by treatment status asfollows (still assuming Y0i and Di are independent)

E6YimdashYi gt 01Di

D 17ƒ E6YimdashYi gt 01 Di

D 07

D E6Y1imdashY1i gt 01Di

D 17ƒ E6Y0imdashY0i gt 01Di

D 07

D E6Y1imdashY1i gt 01Di

D 17ƒ E6Y0imdashY0i gt 01Di

D 17

D E6Y1iƒ Y0i

mdashY1i gt 01DiD 17 (10a)

C 8E6Y0imdashY1i gt 01Di

D 17ƒ E6Y0i gt 01DiD 1790

(10b)

On one hand (10a) suggests that the conditional contrastestimates a potentially interesting effect since this clearlyamounts to a statement about the impact of treatment on thedistribution of potential outcomes (in fact this is somethinglike a comparison of hazard rates) On the other hand from(10b) it is clear that a conditional-on-working comparisondoes not tell us how much of the overall treatment effect isdue to an increase in work among treated workers The prob-lem is that the conditional contrast involves different groups ofpeoplemdashthose with Y1i gt 0 and those with Y0i gt 0 Supposefor example that the treatment effect is a positive constantsay Y1i

D Y0iC Since the second term in (10b) must then be

negative the observed difference E6YimdashYi gt 07ƒ E6Yi

mdashYi gt 07is clearly less than the causal effect on treated workers whichis in the constant-effects model This is the selectivity-biasproblem rst noted by Gronau (1974) and Heckman (1974)

In principle tobit and sample-selection models can be usedto eliminate selectivity bias in conditional-on-positive com-parisons These models depict Yi as the censored observationof an underlying continuously distributed latent variable Sup-pose for example that

YiD 16Yi gt 07Y uuml

i 1 where

Y uumli

D Y uuml01

C 4Y uuml1i

ƒ Y uuml0i5Di

D Y uuml0i

C Di0 (11)

Recent studies with this type of censoring in a female labor-supply model are those of Blundell and Smith (1989) and Lee(1995) both of which include endogeneous regressors Notethat in this context the constant-effects causal model is appliedto the latent variable not the observed outcome

Under a variety of distributional assumptions [eg normal-ity as in Heckman (1974) or weaker assumptions like sym-metry as in Powell (1986a)] the parameter is identi edBut what is the interpretation of in a causal model Onepossible answer is that is the causal effect of Di on Y uuml

i 1

though Y uumli is not observed so this is not usually of intrinsic

interest However a direct calculation using (11) shows that

is also a conditional-on-positive causal effect (for details seethe appendix) In particular

D E6Y1iƒ Y0i

mdashY1i gt 07 if lt 0 (12a)

and

D E6Y1iƒ Y0i

mdashY1i gt 07 if gt 00 (12b)

Thus the censored-regression model does succeed in sepa-rating causal effects from selection effects in conditional-on-positive comparisons [We do not know a priori whether isthe effect in (12a) or (12b) but it seems reasonable to use thesign of the estimated to decide]

Although (11) provides an elegant resolution of theselection-bias dilemma in practice I nd the use of censored-regression models to accomplish this unattractive One prob-lem is conceptual Although a censored-regression modelseems natural for arti cially censored data (eg topcodedvariables in the Current Population Survey) the notion of alatent labor-supply equation that can take on negative values isless clear cut The censoring in this case comes about becausesome people choose to work zero hours and not because ofmeasurement problems [Maddala (1985) made a similar com-ment regarding Tobinrsquos original application] Here an under-lying structural model seems essential to the interpretationof empirical results For example in the labor-supply modelfrom Section 1 the censored latent variable is the differencebetween unobserved offered wages and marginal rates of sub-stitution But even assuming the effect of regressors on thisdifference is of interest the latent index coef cients alone haveno predictive value for observable quantities

Second even if we adopt a theoretical framework thatmakes the latent structure meaningful identi cation of acensored-regression model requires assumption beyond thoseneeded for indenti cation of unconditional causal effects ofthe type described by (8) Semiparametric estimators that donot rely on distributional assumptions fail here because theregressor is discrete and there are no exclusion restrictions onthe selection equation (see Chamberlain 1986) Moreover inaddition to requiring distributional assumptions identi cationof in (12) turns heavily on the additive constant-effectsmodel in (11) This is because E6Y1i

ƒY0imdashY1i gt 07 involves the

joint distribution of Y1 and Y0 The constant-effects assump-tion nails this joint distribution down but the data containinformation on marginal distributions only (which is why evenrandomized trials fail to answer the causal question that moti-vates samples-selection models)

These concerns echoed by many applied researchers stim-ulated investigation of alternative strategies for the analysisof nonnegative outcomes (eg see Mof trsquos 1999 survey)

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 7

The two-part model (2PM) introduced by Cragg (1971) andwidely used in health economics seems to provide a lessdemanding framework for the analysis of LDVrsquos than sample-selection models The two parts of the 2PM are P6Yi gt 0mdashDi7

and E6YimdashYi gt 01 Di7 Researchers using this model simply

pick a functional form for each part For example probit or alinear probability model might be used for the rst part anda linear or log-linear model might be used for the secondpart Log-linearity for the second part may be desirable sincethis imposed nonnegativity of tted values (eg see Mullahy1998)

One attraction of the 2PM is that it ts a nonlinear func-tional form to the conditional expectation function (CEF) forLDVrsquos even if both parts are linear On the other hand tobit-type sample-selection models t a nonlinear CEF as well pro-vided there are covariates other than Di For example the CEFimplied by (11) with latent-index Y uuml

0iD X

0

i Œ C Di C ˜i and aNormal homoscedastic error is

E6YimdashXi1Di7 D ecirc64X

0

iΠC Di5=lsquo76X0

i ΠC Di7

C lsquo64X0

i ΠC Di5=lsquo71 (13)

where 4iexcl5 and ecirc4iexcl5 are the standard Normal density anddistribution functions and lsquo is the standard deviation of ˜i

(eg see McDonald and Mof t 1980) The derivative of thisCEF with respect to Di is ecirc64X

0

iΠC Di5=lsquo7 The nonlinearity of (13) notwithstanding at rst blush the

2PM seems to provide a more exible nonlinear speci cationthan tobit or other sample-selection models The latter imposesrestrictions tied to the latent-index structure while the twoparts of the 2PM can be speci ed in whatever form seemsconvenient and ts the data well (a point made by Lin andSchmidt 1984) A signal feature of the 2PM however andthe main point of contrast with sample-selection models isthat the 2PM does not attempt to solve the sample-selectionproblem in (10b) Thus the second part of the 2PM does nothave a clear-cut causal interpretation even if Di is randomlyassigned Similarly and of particular relevance here is thepoint that instrumental variables that are valid for estimatingthe effect of Di on Yi are not valid for estimating the effect ofDi on Yi conditional on Yi gt 0 The 2PM therefore seems illsuited for causal inference

23 Effects on Distributions

Conditional-on-positive effects and sample-selection modelsare sometimes motivated by interest in the consequence of Di

beyond the impact on average outcomes [eg see EichnerMcClellan and Wisersquos (1997) analysis of insurance effects onhealth expenditure] Are there schemes for estimating effectson distributions that are less demanding than sample-selectionmodels Once the basic problem of identifying causal effectsis resolved the impact of Di on the distribution of outcomesis identi ed and can be easily estimated

To see this for the experimental (exogenous Di) case notethat given the assumed independence of Di of Y0i the follow-

ing relationship holds for any point c in the support of Yi

E614Yi micro c5mdashDiD 17 ƒ E614Yi micro c5mdashDi

D 07

D P6Y1i micro cmdashDiD 17 ƒ P6Y0i micro cmdashDi

D 170

In fact the entire marginal distributions of Y1i and Y0i are iden-ti ed for those with Di

D 1 So it is easy to check whether Di

has an impact on the probability that YiD 0 as in the rst part

of the 2PM or whether there is a change in the distribution ofoutcomes at any positive value This information is enough tomake social-welfare comparisons as long as the comparisonsof interest involve marginal distributions only

24 Covariates and Nonlinearity

The conditional expectation of Yi given Di is inherentlylinear as are other conditional relationships involving Di

alone Suppose however that identi cation is based on aldquoselection-on-observablesrdquo assumption instead of presumedrandom assignment This means that causal inference is basedon the presumption that Y0i

q DimdashXi1 and causal effects must

be estimated after conditioning on Xi For example the effectof treatment on the treated can be expressed as

E6Y1iƒ Y0i

mdashDiD 17

D E8E6Y1imdashXi1 Di

D 17 ƒ E6Y0imdashXi1Di

D 17mdashDiD 19

DZ

8E6Y1imdashXi1Di

D 17ƒ E6Y0imdashXi1Di

D 079

P4XiD xmdashD D 15dx1 (14)

where P4X D xmdashD D 15 is a distribution of density Estima-tion using the sample analog of (14) is straightforward if Xi

has discrete support with many observations per cell (eg seeAngrist 1998) Otherwise some sort of smoothing (model-ing) is required to estimate the CEFrsquos E6Y1i

mdashXi1 DiD 17 and

E6Y0imdashXi1 Di

D 07Regression provides a exible and computationally attrac-

tive smoothing device A conceptual justi cation for regres-sion smoothing is that population regression coef cientsprovide the best (minimum mean squared error) linear approx-imation to E6Yi

mdashXi1Di7 (eg see Goldberger 1991) Thisldquoapproximation propertyrdquo holds regardless of the distributionof Yi

Separate regressions can be used to approximateE6Y1i

mdashXi1DiD 17 and E6Y0i

mdashXi1DiD 17 though this leaves

the problem of estimating P4XiD xmdashD D 15 to compute the

average difference in CEFrsquos using (14) On the other handa simple additive modelmdashsay E6Yi

mdashXi1 Di7 D X0

i sbquorC r Dimdash

sometimes works well in the sense that r mdashthe ldquoregressionestimandrdquomdashis close to average effects derived from modelsthat allow for nonlinearity and interactions between Di andXi With discrete covariates and a saturated model for Xi theadditive model can be thought of as implicitly producing aweighted average of covariate-speci c contrasts Although theregression weighting scheme differs from that in (14) in prac-tice treatment-effect heterogeneity may be limited enough thatalternative weighting schemes have little impact on the over-all estimate (see Angrist and Krueger 1999 for more on thispoint)

8 Journal of Business amp Economic Statistics January 2001

3 ENDOGENOUS REGRESSORSTRADITIONAL SOLUTIONS

LDV models with endogenous regressors were rst esti-mated using distributional assumptions and maximum likeli-hood (ML) An early and in uential work in this mold isthat of Heckman (1978) This approach is not wedded toML Heckman (1978) Amemiya (1978 1979) Newey (19861987) and Blundell and Smith (1989) discussed two-stepprocedures minimum-distance estimators generalized leastsquares (GLS) estimators and other variations on the tradi-tional ML framework Semiparametric estimators based onweaker distributional assumptions were discussed by amongothers Newey (1985) and Lee (1996)

The basic idea behind these strategies can be described asfollows Take the censored-regression model from (11) andadd a latent rst stage with instrumental variables Zi So thecomplete model is

YiD 16Yi gt 07Y uuml

i

Y uumli

D Y uuml0i

C 4Y1iƒ Y uuml

0i5DiD Y uuml

0iC Di D Œ C Di C ˜i

DiD 16ƒ0 C ƒ1Zi gt Daggeri70 (15)

The principal identifying assumption here is

4Y0i1 Daggeri5 q Zi0 (16)

Parametric schemes use two-step estimators or ML to esti-mate Semiparametric estimators typically work by substi-tuting an estimated conditional expectation sup1E6Di

mdashZi7 for Di

and then using a nonparametric or semiparametric procedureto estimate the pseueo reduced-form [eg Manskirsquos (1975)maximum score estimator for binary outcomes]

The problems I see with this approach are the same as listedfor censored regression models with exogenous regressorsFirst latent index coef cients are not causal effects If the out-come is binary semiparametric methods estimate scaled indexcoef cients and not average causal effects Similarly censoredregression parameters alone are not enough to determine thecausal effect of Di on the observed Yi [ I should note that thiscriticism does not apply to parametric estimators where dis-tributional assumptions can be used to recover causal effectsor to a recently developed semiparametric method by Blundelland Powell (1999) for continuous endogenous variables] Sec-ond this approach turns heavily on the latent-index setup Wecan add to these points the fact that even weak distributionalassumptions like conditional symmetry fail for the reduced-form error term 4Di

ƒ sup1E6DimdashZi75 ƒ ˜i since Di is binary (a

point made by Lee 1996)A nal observation is that given assumption (16) this

whole setup is unnecessary for causal inference The effectof treatment on the observed Yi is identi ed for those womenwhose childbearing behavior is affected by the instrument Thetwins instrument for example identi es the effect of Di onmothers who would not have had a third child without a mul-tiple second birth (see result 4 in the earlier lemma) It maybe of interest to extrapolate from this grouprsquos experiences tothose of other women but the extrapolation problem is distinctfrom the problem of identifying the causal effect of childbear-ing in the ldquotwins experimentrdquo

4 NEW ECONOMETRIC METHODS

Conditional moments and other probability statementsinvolving Di alone are necessarily linear but causal relation-ships involving covariates are likely to be nonlinear unless thecovariates are discrete and the model is saturated LDV mod-els like probit and tobit are often used because of a concernthat unless the model is saturated LDVrsquos lead to nonlinearCEFrsquos The 2PM is sometimes also motivated this way (egsee Duan et al 1984)

The headaches induced by nonlinearity notwithstandingthere are simple schemes for estimating causal effects in LDVmodels with endogenous regressors and covariates In thissection I discuss three strategies for estimating effects onmeans and two for estimating effects on distributions All butthe rst are based on new models and methods None are tiedto an underlying structural model

The simplest option for estimating effects on means isundoubtedly to ldquopuntrdquo by using a linear constant-effectsmodel to describe the relationship of interest

E6Y0imdashXi7 D X 0

isbquo (17a)

and

Y1iD Y0i

C 0 (17b)

The assumptions lead a linear causal model

YiD X 0

isbquo C DiC ˜i1 (18)

easily estimated by 2SLSAlthough the constant-effects assumption is clearly unreal-

istic in practice more general estimation strategies often leadto similar average effects A second issue that arises in thissetting is that because Di is binary a nonlinear rst-stage suchas probit or logit may seem appropriate for 2SLS estimationof (18) But the resulting second-stage estimates are inconsis-tent unless the model for the rst-stage CEF is actually cor-rect On the other hand conventional 2SLS estimates usinga linear probability model are consistent whether or not the rst-stage CEF is linear So it is generally safer to use a linear rst-stage Alternatively consistent estimates can be obtainedby using a linear or nonlinear estimate of E6Di

mdashXi1 Zi7 asan instrument (This is the same as the plug-in- tted-valuesmethod when the rst-stage is linear) See Kelejian (1971) orHeckman (1978 pp 946ndash947) for a discussion of this pointand additional references ( It is also worth noting that a probit rst-stage cannot even be estimated for the twins instrumentbecause P6Di

D 1mdashXi1 ZiD 17 D 1 for twins)

41 IV for an Exponential Conditional Mean

A linear model like (18) is obviously unrealistic for binaryoutcomes and fails to incorporate natural restrictions on theCEF for nonnegative LDVrsquos This motivated Mullahy (1997)to estimate causal effects on nonnegative LDVrsquos using a multi-plicative model similar to that used by Wooldridge (1999) forpanel data The Mullahy (1997) model can be written in mynotation as follows Let Xi be a vector of observed covariates

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 9

as before and let mdashi be an unobserved covariate correlatedwith Di and Y0i The fact that this covariate is unobserved isthe reason we need to instrument

Let Zi be a candidate instrument Conditional on observedand unobserved covariates both treatment status and theinstrument are assumed to be independent of potential out-comes

Y0iq 4Di1Zi5 mdash Xi1mdashi (19a)

Moreover conditional on observed covariates the candidateinstrument is independent of the unobserved covariate

Ziq mdashi

mdash Xi1 (19b)

though mdashi and Di are not conditionally independent The CEFfor Y0i is constrained to be nonnegative using an exponentialmodel and (19a)

E6Y0imdashDi1 Zi1 Xi1mdashi7 D exp4X 0

isbquo C mdashi5

D mdashuumli exp4X 0

isbquo51 (19c)

where we also assume that the unobservable covariate hasbeen de ned so that E6mdash uuml

imdashXi7 D 1 (this is a normalization

because we can de ne mdash D ƒ16ldquo ƒ ln4E6eldquo mdashX757 where ldquo isunrestricted)

Finally the conditional-on-X-and-mdash average treatmenteffect is assumed to be proportionally constant again using anexponential model that ensures nonnegative tted values

E6Y1imdashDi1Zi1Xi1mdashi7 D e E6Y0i

mdashDi1Zi1Xi1mdashi7

D e E6Y0imdashXi1 mdashi70 (19d)

Combining (19c) and (19d) we can write YiD exp4X 0

isbquo CDi

C mdashi5 C ˜i where E6˜imdashDi1 Zi1 Xi1 mdashi7 sup2 0 These

assumptions imply

E8exp4ƒX 0isbquo ƒ Di5Yi

ƒ 1mdashXi1Zi9 D 01 (20)

so (20) can be used for estimation provided Zi has an impacton Di The proportional average treatment effect in this modelis e ƒ 1 or approximately for small values of

Estimation based on (20) guarantees nonnegative ttedvalues without dropping zeros as a traditional log-linearregression model would The price for this is a constant-proportional-effects setup and the need for nonlinear estima-tion It is interesting to note however that with a binaryinstrument and no covariates (20) generates a simple closed-form solution for In the appendix I show that with-out covariates the proportional treatment effect in Mullahyrsquosmodel can be written

e ƒ1D E6YimdashZi

D17ƒE6YimdashZi

D07

ƒ8E641ƒDi5YimdashZi

D17ƒE641ƒDi5YimdashZi

D0790

(21)

In light of this simpli cation it seems worth asking if theright side of (21) has an interpretation that is not tied tothe constant-proportional-effects model To develop this inter-pretation let D0i and D1i denote potential treatment assign-ments indexed against the binary instrument For example an

assignment mechanism such as (15) determines D0i as fol-lows D0i

D 16ƒ0 gt Daggeri7 and D1iD 16ƒ0 C ƒ1 gt Daggeri7 Using this

notation Imbens and Angrist (1994) showed that

E6YimdashZi

D 17 ƒ E6YimdashZi

D 07 D E6Y1iƒ Y0i

mdashD1i gt D0i7

8E6DimdashZi

D 17ƒ E6DimdashZi

D 0790 (22)

The term E6Y1iƒ Y0i

mdashD1i gt D0i7 is the LATE parameter men-tioned in Lemma 1

The same argument used to establish (22) can also be usedto show a similar result for the average of Y0i [ie insteadof the average of Y1i

ƒ Y0i see Abadie (2000a) for details] Inparticular

E641 ƒ Di5YimdashZi

D 17 ƒ E641 ƒ Di5YimdashZi

D 07

D ƒE6Y0imdashD1i gt D0i7

8E6DimdashZi

D 17 ƒ E6DimdashZi

D 0790 (23)

Substituting (22) and (23) for the numerator and denominatorin (21) we have

e ƒ 1 D E6Y1iƒ Y0i

mdashD1i gt D0i7=E6Y0imdashD1i gt D0i70 (24)

Thus Mullahyrsquos procedure estimates a proportional LATEparameter in models with no covariates The resulting esti-mates therefore have a causal interpretation under muchweaker assumptions than (19a)ndash(19d) Moreover the exponen-tial model used for covariates in (19c) seems natural for non-negative dependent variables and has a semiparametric avorsimilar to proportional hazard models for duration data

42 Approximating Causal Models

Suppose that the additive constant-effects assump-tions (17a) and (17b) do not hold but we estimate (18) by2SLS anyway It seems reasonable to imagine that the result-ing 2SLS estimates can be interpreted as providing some sortof ldquobest linear approximationrdquo to an underlying nonlinearcausal relationship just as regression provides the best linearpredictor (BLP) for any CEF Perhaps surprisingly however2SLS does not provide this sort of linear approximation ingeneral On the other hand in a recent article Abadie (2000b)introduced a Causal-IV estimator that does have this property

Causal-IV is based on the assumptions used by Imbens andAngrist (1994) to estimate average treatment effects Underthese assumptions it can be shown that treatment is indepen-dent of potential outcomes conditional on being in the groupwhose treatment status is affected by the instrument (ie thosewith D1i gt D0i the group of ldquocompliersrdquo mentioned earlier)This independence can be expressed as

Y0i1 Y1iq Di

mdashXi1D1i gt D0i0 (25)

A consequence of (25) is that for compliers comparisons bytreatment status have a causal interpretation

E6YimdashXi1Di

D 11D1i gt D0i7 ƒ E6YimdashXi1Di

D 01 D1i gt D0i7

D E6Y1iƒ Y0i

mdashXi1D1i gt D0i70

10 Journal of Business amp Economic Statistics January 2001

For this reason Abadie (2000b) called E6YimdashXi1 Di1D1i gt D0i7

the Complier Causal Response Function (CCRF)Now consider choosing parameters b and a to minimize

E64E6YimdashXi1Di1 D1i gt D0i7ƒX 0

i b ƒ aDi52mdashD1i gt D07 or equiv-

alently E64YiƒX 0

ib ƒ aDi52mdashD1i gt D07 This choice of b and a

provides the minimum mean squared error (MMSE) approx-imation to the CCRF Since the set of compliers is not iden-ti ed this minimization problem is not feasible as writtenHowever it can be shown that

E6Ši4E6YimdashXi1 Di1D1i gt D0i7 ƒ X 0

ib ƒ aDi527=P6D1i gt D0i7

D E64E6YimdashXi1Di1 D1i gt D0i7ƒ X 0

ib ƒ aDi52mdashD1i gt D071

where ŠiD 1 ƒ Di41 ƒ Zi5=41 ƒ E6Zi

mdashXi75 ƒ 41 ƒ Di5 šZi=E6Zi

mdashXi7 Since Ši can be estimated the MMSE linearapproximation to the CCRF can also be estimated The resultis a weighted least squares estimation problem with weightsgiven by the estimated Ši

It is also worth noting that although the preceding discus-sion focuses on linear approximation of the CCRF any func-tion can be used for the approximation For binary outcomesfor example we might use ecirc6X 0

ib C aDi7 and choose param-eters to minimize E6Ši4Yi

ƒ ecirc6X 0ib ƒ aDi75

27 Similarly fornonnegative outcomes it seems sensible to use an exponentialmodel exp6X 0

ib C aDi7 and choose parameters to minimizeE6Ši4Yi

ƒexp6Xi0b ƒaDi75

27 Abadiersquos framework allows ex-ible approximation of the CCRF using any functional formthe researcher nds appealing and convenient The resultingestimates have a robust causal interpretation regardless of theshape of the actual CEF for potential outcomes

43 Distribution and Quantile Treatment Effects

If Yi has a mass point at 0 the conditional mean pro-vides an incomplete picture of the causal impact of Di on Yi We might like to know for example how much of the aver-age effect is due to changes in participation and how muchinvolves changes elsewhere in the distribution This sometimesmotivates separate analyses of participation and conditional-on-positive effects In Section 23 I argued that questionsregarding the effect of treatment on the distribution of out-comes are better addressed by comparing distributions Simplycomparing distributions is ne for the analysis of experimentaldata but what if covariates are involved As with the analy-sis of mean outcomes the simplest strategy is 2SLS in thiscase using linear probability models for distribution ordinates16Yi micro c7 D Xi

0sbquocC cDi

C ˜ci Of course the linear model isnot literally correct for conditional distribution except in spe-cial cases (eg a saturated regression parameterization)

Here too the Abadie (2000b) weighting scheme can be usedto generate estimates that provide an MMSE error approxima-tion to the underlying distribution function (see Imbens andRubin 1997 for a related approach to this problem) The esti-mator in this case chooses bc and ac to minimize the sampleanalog of the following population minimand

E6Ši416Yi micro c7 ƒ X 0ibc

ƒ acDi5270 (26)

The resulting estimates provide the BLP for P6Yi microcmdashXi1Di1D1i gt D0i7 The latter quantity has a causal interpre-tation because

P6Yi micro cmdashXi1DiD 11D1i gt D0i7

ƒ P6Yi micro cmdashXi1 DiD 01 D1i gt D0i7

D P6Y1i micro cmdashXi1D1i gt D0i7

ƒ P6Y0i micro cmdashXi1D1i gt D0i70

Since the outcome is binary nonlinear models such asprobit or logit might also be used to approximate P6Yi microcmdashXi1Di1D1i gt D0i7 Likewise it is equally straightforward touse Abadiersquos weighting scheme to approximate the probabilitythat the outcome falls into an interval instead of the cumula-tive distribution function

An alternative to estimation based on (26) postulates a linearmodel for quantiles instead of distribution ordinates Conven-tional quantile regression (QR) models for exogenous regres-sors begin with a linear speci cation Qˆ6Yi

mdashXi1Di7 D X 0iŒˆ0

CŒˆ1Di The parameters 4Œˆ01Œˆ15 can be shown to minimizeE6ˆ4Yi

ƒ X 0im0 ƒ m1Di057 where ˆ4ui5 D ˆuC

iC 41ƒˆ5uƒ

i iscalled the ldquocheck functionrdquo (see Koenker and Basset 1978)This minimization is computationally straightforward since itcan be written as a linear programming problem

The analysis of quantiles has two advantages Firstquantiles like the median quartiles and deciles providebenchmarks that can be used to summarize and compareconditional distributions for different outcomes In contrastthe choice of c for the analysis of distribution ordinatesis application-specic Second since nonnegative LDVrsquos areoften (virtually) continuously distributed away from any masspoints linear models are likely to be more accurate for condi-tional quantiles than for conditional probabilities [For quan-tiles close to the censoring point Powellrsquos (1986b) censoredQR model may be more appropriate]

Abadie Angrist and Imbens (1998) developed a QRestimator for models with a binary endogeneous regres-sor Their quantile treatment effects (QTE) procedure beginswith a linear model for conditional quantiles for compliersQˆ6Yi

mdashXi1Di1 D1i gt D0i7 D X 0isbquoˆ

CˆDi The coef cient ˆ hasa causal interpretation because Di is independent of potentialoutcomes conditional on Xi and the event D1i gt D0i There-fore ˆ

D Qˆ6Y1imdashXi1 D1i gt D0i7 ƒ Qˆ6Y0i

mdashXi1 D1i gt D0i7 Inother words ˆ is the difference in ˆ quantiles for compliers

QTE parameters are estimated by minimizing a sampleanalog of the following weighted check-function minimandE6Šiˆ4Yi

ƒ X 0ib ƒ aDi57 As with the Causal-IV estimators

weighting by Ši transforms the conventional QR minimandinto a problem for compliers only For computational rea-sons however it is useful to rewrite this as E6 QŠiˆ4Yi

ƒX 0ib ƒ

aDi057 where QŠiD E6Ši

mdashXi1 Di1 Yi7 It is possible to show thatE6Ši

mdashXi1Di1 Yi7 D P6D1i gt D0imdashXi1Di1 Yi7 gt 0 This modi ed

estimation problem has a linear programming representationsimilar to conventional quantile regression since the weightsare positive Thus QTE estimates can be computed usingexisting QR software though this approach requires rst-step

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 11

estimation of QŠi In the following example I use the fact that

QŠiD E6Ši

mdashXi1 Di1 Yi7 D 1 ƒ Di41ƒ E6ZimdashYi1Di1Xi75

41ƒ E6ZimdashXi75

ƒ 41 ƒ Di5E6ZimdashYi Di1Xi7

E6ZimdashXi7

(27)

and estimate E6ZimdashYi1Di1Xi7 and E6Zi

mdashXi7 with a probit rststep Since QŠi is theoretically supposed to be positive negativeestimates of QŠi are set to 0

5 APPLICATION LABORndashSUPPLYCONSEQUENCES OF A THIRD CHILD

The estimation uses a sample of roughly 250000 marriedwomen aged 21ndash35 with at least two children drawn from the1980 Census 5 le About 53 of the women in this sampleworked in 1979 Overall (ie including zeros) women in thesample worked about 17 hours per week This can be seen inthe rst column of Table 1 which reports descriptive statis-tics and repeats some of the OLS and 2SLS estimates fromAngrist and Evans (1998) Roughly 38 of women in thissample had a third child an event indicated by the variableMorekids The OLS estimates in column (2) show that womenwith Morekids D 1 were about 17 percentage points less likelyto have worked in 1979 and worked about 6 hours fewer perweek than women with Morekids D 0 The covariates in thisregression are age age at rst birth a dummy for male rst-born a dummy for male secondborn and Black Hispanic andother race indicators

Table 1 also reports estimates of average effects computedusing nonlinear models still treating Morekids as exogenousThese average effects are approximations to effects of treat-ment on the treated evaluated using derivatives to simplifycomputations A detailed description of the average-effectscalculations appears in the appendix Probit estimates of theaverage impact on employment shown in column (3) arealmost identical to the OLS estimates Similarly the tobit esti-mate of the average effect of Di on hours worked shown incolumn (4) is ƒ6001 remarkably close to the OLS estimateof ƒ6002 (Note however that the tobit coefcient is ƒ1107)Column (5) of Table 1 reports estimated average effects froma two-part model in which both parts of the model are linear

Table 1 Descriptive Statistics and Baseline Results

Morekids exogenous Morekids endogenous

Nonlinear models Reduced forms2SLS

Dependent Mean OLS Probit Tobit 2PM (Morekids) (Depvar) effectvariable (1) (2) (3) (4) (5) (6) (7) (8)

Employment 0528 ƒ0167 ƒ0166 mdash mdash 0627 ƒ0055 ƒ0088(0499) (0002) (0002) (0003) (0011) (0017)

Hours worked 1607 ƒ6002 mdash ƒ6001 ƒ5097 0627 ƒ2023 ƒ3055(1803) (0074) (0073) (0073) (0003) (0371) (0592)

NOTE The sample includes 254654 observations and is the same as that of Angrist and Evans (1998) The instrument is an indicator for multiple births The mean of the endogenousregressor is 381 The probability of a multiple birth is 008 The model includes as covariates age age at rst birth boy rst boy second and race indicators Standard deviations are shownin parentheses in column 1 Standard errors are shown in parentheses in other columns

The 2PM estimate is virtually identical to the tobit and OLSestimates

Roughly 810 of 1 of women in the extract had a twinsecond birth (multiple births are identi ed in the 1980 Cen-sus using age and quarter of birth) Reduced-form estimates ofthe effect of a twin birth are reported in columns (6) and (7)The reduced forms show that women who had a multiple birthwere 63 percentage points more likely to have had a third childthan women who had a singleton second birth Mothers oftwins were also 55 percentage points less likely to be work-ing (standard error D 001) and worked 22 fewer hours perweek (standard error D 037) The 2SLS estimates derived fromthese reduced forms reported in column 8 show an impactof about ƒ009 (standard error D 002) on employment rates andƒ306 (standard error D 06) on weekly hours These estimatesare just over half as large as the OLS estimates suggestingthat the latter exaggerate the causal effects of childbearingOf course the twins instrument is not perfect and the 2SLSestimates may also be biased For example twinning probabil-ities are slightly higher for certain demographic groups ButAngrist and Evans (1998) found that 2SLS estimates usingtwins instruments are largely insensitive to the inclusion ofcontrols for mothersrsquo personal characteristics

Two variations on linear models generate estimates iden-tical or almost identical to the conventional 2SLS estimatesThis can be seen in columns (2) and (3) of Table 2 whichreport 2SLS estimates of a 2PM for hours worked and Abadie(2000b) Causal-IV estimates of linear models for employmentand hours The Causal-IV estimates use probit to estimateE[Z|X] and plug this into the formula for Ši The second stepin Causal-IV estimation is a weighted least squares problemwith some negative weights Since use of negative weights isnonstandard for statistical packages (eg Stata does not cur-rently allow this) I used a MATLAB program available fromAlberto Abadie to compute the estimates The 2PM estimatesin Table 3 were constructed from 2SLS estimates of a linearprobability model for participation and 2SLS estimates of alinear model for hours worked conditional on working In prin-ciple the 2PM estimates do not have a causal interpretationbecause the instruments are not valid conditional on workingIn practice however 2SLS estimates of the 2PM differ littlefrom conventional 2SLS estimates

The estimates of nonlinear=nonstructural models for meaneffects are mostly similar to each other and to conventional

12 Journal of Business amp Economic Statistics January 2001

Table 2 Impact on Mean Outcomes (Morekids endogenous)

Linear models Nonlinear models Structural models

Causal-IV Causal-IV Causal-IV Bivar Endog Mills 2SLSDependent 2SLS 2PM linear Mullahy probit expon probit tobit ratio benchmarkvariable (1) (2) (3) (4) (5) (6) (7) (8) (9) (10)

A With covariates

Employment ƒ0088 mdash ƒ0089 mdash ƒ0088 mdash ƒ0124 mdash mdash ƒ0089(0017) (0017) (0016) (0016) (0017)

Hours worked ƒ3055 ƒ3054 ƒ3055 ƒ3082 mdash ƒ3021 mdash ƒ3081 ƒ4051 ƒ3060(0592) (0598) (0592) (0598) (0694) (0580) (0549) (0599)

B No covariates

Employment ƒ0084 mdash ƒ0084 mdash ƒ0084 mdash ƒ0086 mdash mdash ƒ0084(0017) (0017) (0017) (0017) (0018)

Hours worked ƒ3047 ƒ3037 ƒ3047 ƒ3010 mdash ƒ3012 mdash ƒ3035 ƒ3048 ƒ3052(0617) (0614) (0617) (0561) (0616) (0642) (0641) (0624)

NOTE Sample and covariates are the same as in Table 1 Results for nonlinear models are derivative-based approximations to effect on the treated Causal-IV estimates are based on aprocedure discussed by Abadie (2000b) Standard errors are shown in parentheses

2SLS estimates Results from nonlinear models are againreported as marginal effects that approximate average effectson the treated For example a causal probit model for employ-ment status generates an average effect of ƒ0088 identical (upto the reported accuracy) to the 2SLS estimate Causal-IV esti-mation of an exponential model for hours worked the resultof a procedure that minimizes E6 QŠi4Yi

ƒ exp6X0

i b ƒ aDi7527

generates an estimate of ƒ3021 This too differs little from theconventional 2SLS estimate of ƒ3055 Similarly the Mullahyestimate of ƒ3082 in column (4) is less than 8 larger thanconventional 2SLS in absolute value It is noteworthy how-ever that the Mullahy model generates results that changemarkedly (falling by about 20) when the covariates aredropped Since the covariates are not highly correlated withthe twins instrument this lack of robustness to the inclusionof covariates seems undesirable On the other hand with-

Table 3 Impact on the Distribution of Hours Worked

Distribution treatment effects

Exogenous Morekids Endogenous Morekids Quantile treatment effects

OLS Ordered Causal CausalRange (LPM) Probit probit 2SLS LPM probit Quantile QR QTE(mean) (1) (2) (3) (4) (5) (6) (value) (7) (8)

0 0167 0166 0147 0088 0089 0088 5 ƒ8092 ƒ5024(472) (0002) (0002) (0002) (0017) (0017) (0016) (8) (0186) (0686)1ndash10 0001 0001 0002 ƒ0001 ƒ0001 ƒ0002 6 ƒ1207 ƒ7098(046) (0001) (0001) (000003) (0007) (0007) (0006) (20) (0172) (0860)11ndash20 ƒ0015 ƒ0014 ƒ0011 0002 0002 0002 7 ƒ9054 ƒ6019(093) (0001) (0001) (00001) (0010) (0010) (0010) (35) (0184) (1007)21ndash30 ƒ0024 ƒ0022 ƒ0014 ƒ0004 ƒ0005 ƒ0006 75 ƒ6045 ƒ3060(075) (0001) (0001) (00002) (0009) (0009) (0010) (40) (0156) (1017)31ndash40 ƒ0119 ƒ0110 ƒ0097 ƒ0072 ƒ0072 ƒ0071 8 ƒ1000 0000(277) (0002) (0002) (0001) (0014) (0014) (0016) (40) (0286) (1009)41+ ƒ0009 ƒ0008 ƒ0023 ƒ0009 ƒ0009 ƒ0006 9 mdash mdash

(027) (0001) (0001) (00003) (0005) (0005) (0007) (40)

NOTE The table reports probability-model and (QTE) estimates of the impact of childbearing on the distribution of hours worked The sample and covariates are the same as in Panel A ofTable 2 Causal-IV estimates are based on a procedure discussed by Abadie (1999) Quantile treatments are based on a procedure discussed by Abadie et al (1998) Standard errors areshown in parentheses The standard errors in columns (7) and (8) are bootstrapped

out covariates the Mullahy estimates are close to those fromCausal-IV with an exponential model

The bivariate probit estimate of the effect of childbear-ing on employment status reported in column 7 is ƒ012roughly a third larger in absolute value than the conventional2SLS estimate Interestingly in another application Abadie(2000b) also found that bivariate probit estimates are largerthan Causal-IV The gap between bivariate probit and Causal-IV estimates of effects on labor supply appears to be a conse-quence of the probit model for exogenous covariates Withoutcovariates bivariate probit generates estimates that are veryclose to the results from the other estimators It should benoted however that bivariate probit is not really appropriatefor twins instruments because the probability Morekids D 1 isequal to 1 for twins The probit ML estimator does not existin this case Therefore to compute all of the estimates using a

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 13

probit rst-stage (bivariate probit endogenous tobit and Millsratio) I randomly recoded 1 of Di observations to 0 Inprinciple measurement error in a binary endogenous regressorbiases 2SLS estimates (see Kane Rouse and Staiger 1999)In this case however column (10) shows that 2SLS estimateswith the randomly recoded data differ little from 2SLS esti-mates using the original data

Column (8) of Table 2 reports estimates of a structuraltobit model with endogeneous regressors These estimateswere computed using a two-step estimator that approximatesthe MLE again with recoded data The two-step procedureadds a Mills-ratio type endogeneity correction to a censoredregression and then applies tobit to the censored regres-sion model with the correction term [The correction term islsquo˜6Di4ƒ i=ecirci5C 41ƒDi5 i=41ƒecirci57 where is the corre-lation between the latent error determining treatment assign-ment and the outcome residual lsquo˜ is the standard deviation ofthe outcome residual and i and ecirci are Normal density anddistribution functions evaluated at the probit rst-stage ttedvalues see Heckman and Robb (1985)]

As with the bivariate probit estimate the endogenous tobitestimate is somewhat larger in magnitude than conventional2SLS This may be because of the Mills-ratio procedure forcontrolling for the endogeneity of Di more than the tobitcorrection for nonnegative outcomes To see this note thatthe Mills-ratio estimate ignoring censoring reported in col-umn (9) is considerably larger than the corresponding conven-tional 2SLS estimates Interestingly both the probit and tobitstructural estimators generate results that are more sensitiveto the inclusion of covariates than any of the other estimatorsexcept Mullahyrsquos In fact Panel B of the table shows thatwithout covariates all estimation techniques give very simi-lar results This is not surprising since without covariates theparametric assumptions used in these models are weaker

The last set of results shows that childbearing is associ-ated with marked changes in the distribution of hours workedbeyond the changes in participation already seen This isapparent in the rst columns of Table 3 which report thedistribution of hours worked by interval along with linearprobability estimates of the relationship between childbearingand the probability of falling into each interval (these mod-els include the same covariates used for Table 2) The largestentry in column (1) is for the probability of working zerohours There is also a large negative effect on the probabilityof working 31ndash40 hours per week which shows that womenwho have a third child are much less likely to work full-time Once again probit average effects reported in column2 are almost indistinguishable from the corresponding OLSestimates Estimates from an ordered probit model reportedin column (3) differ from OLS somewhat more but still gen-erate a very similar pattern

Like the 2SLS estimates for average outcomes 2SLS esti-mates of linear probability models for the probability of fallinginto each interval show that models that treat childbearing asexogenous exaggerate the negative impact on labor supply2SLS estimates for the probability of working zero hours areidentical (by construction) to those for employment in Table 1The 2SLS estimates of the impact of childbearing on fulltimework are also considerably less than the corresponding OLSestimates

Nonstructural Causal-IV models treating Morekids asendogenous generate estimates very close to 2SLS estimatesfor effects on the probabilities of hours falling into each inter-val Columns (5) and (6) show that the results are also remark-ably insensitive to whether a linear or probit model is usedto approximate the distribution function The estimates againindicate that childbearing changes the distribution of hours byraising the probability of nonparticipation and by reducing theprobability of fulltime work

Finally the QTE estimator provides useful summary statis-tics for causal effects of childbearing on changes in the distri-bution of hours worked These estimates were computed as thesolution to a weighted quantile regression problem using (27)to construct weights and the reported standard errors are froma bootstrap Quantile regression estimates treating childbear-ing as exogenous show an estimated 9-hour decline in medianhours worked but the QTE estimator suggests that the causaleffect of childbearing on median hours worked is only about ve hours Estimates at higher quantiles are similarly reducedwhen childbearing is treated as endogenous

6 SUMMARY AND CONCLUSIONS

Structural parameters may be of theoretical interest but mustultimately be converted into causal effects if they are to beof use for policy evaluation or determining whether a trendassociation is causal The problem of estimating causal effectsfor LDVrsquos does not differ fundamentally from the analogousproblem for continuously distributed outcomes The key dif-ferences seem to me to be the increased likelihood of inter-est in distributional outcomes and the inherent nonlinearity ofCEFrsquos for LDVrsquos in models with covariates Without covari-ates conventional 2SLS estimates capture both distributionaleffects and effects on means Simple IV strategies devel-oped by Mullahy (1997) and Abadie (2000b) can be used toestimate average effects in nonlinear models with covariateswhile IV strategies for probability models and quantile regres-sion can be used to estimate effects on distributions

These approaches are illustrated here using twin birthsto estimate the labor-supply consequences of childbearingAlternative nonstructural approaches to the estimation ofcausal effects using twins instruments generate similar aver-age effects whether or not the model is nonlinear Structuralestimates tend to be somewhat larger than nonstructural esti-mates when exogenous covariates are included even thoughthe covariates are not strongly related to the twins instru-ment Since the structural models impose additional distribu-tional and functional form assumptions I see no reason toprefer them

Finally the various IV estimates of the effect of childbear-ing on the distribution of hours worked show that the impactof childbearing is characterized by substantially increased non-participation and by an almost equally large shift away fromfulltime work Estimates that treat childbearing as exogenousexaggerate the causal effect of childbearing on changes in dis-tribution as well as on average hours worked These ndingsappear in results from both probability models and quantilemodels with endogenous regressors Both types of modelsprovide an interesting look at the impact of childbearing on

14 Journal of Business amp Economic Statistics January 2001

the distribution of hours worked without the conceptual prob-lems inherent in conditional-on-positive comparisons

ACKNOWLEDGMENTS

I thank the editor and the session discussants for theircomments Special thanks go to Melissa Schettini for out-standing research assistance and Alberto Abadie for exten-sive comments and shared computer programs Thanks also goto Daron Acemoglu Bill Evans Bill Greene Guido ImbensJohn Mullahy Steve Pischke and seminar participants at theUniversity of Texas for helpful discussions and comments onan earlier draft Data and computer programs used here areavailable on request

APPENDIX

A1 Derivation of Equation (12)

Drop the i subscripts Note that

Y0 D 14Y uuml0 gt 05Y uuml

0

Y1D 14Y uuml

0C gt 054Y uuml

0C 5 D 14Y uuml

1 gt 05Y uuml1 0

Since D is independent of Y0 and Y1D 14Y uuml

0C gt 054Y uuml

0C

51D is independent of Y1 Conditional effects on the treatedare therefore the same as conditional effects without condi-tioning on treatment status

E6Y1ƒ Y0

mdashY1 gt 01 D D 17 D E6Y1ƒ Y0

mdashY1 gt 07

E6Y1ƒ Y0

mdashY0 gt 01 D D 17 D E6Y1ƒ Y0

mdashY0 gt 070

The averages on the right side are the causal effects of interestThe conditional expectations on the right side are evaluated asfollows

E6Y1mdashY1 gt 07 D E6Y uuml

1mdashY uuml

1 gt 07 D E6Y uuml0mdashY uuml

1 gt 07C 1 (A1)

E6Y0mdashY1 gt 07 D E6Y uuml0 14Y uuml

0 gt 05mdashY uuml1 gt 07

D E6Y uuml0

mdashY uuml1 gt 01 Y uuml

0 gt 07P4Y uuml0 gt 0mdashY uuml

1 gt 050

If 4 gt 05 2 Y uuml0 gt 0 ) Y uuml

1 gt 01 this is E6Y uuml0

mdashY uuml0 gt 07

P4Y uuml0 gt 0mdashY uuml

1 gt 050 If 4 lt 05 2 Y uuml1 gt 0 ) Y uuml

0 gt 01 so this is

E6Y uuml0

mdashY uuml1 gt 070 (A2)

So lt 0 ) E6Y1 ƒ Y0mdashY1 gt 07 D Similarly

E6Y1mdashY0 gt 07 D E614Y uuml

0C gt 054Y uuml

0C 5mdashY uuml

0 gt 07

D E6Y uuml0

mdashY uuml1 gt 01 Y uuml

0 gt 07P4Y uuml1 gt 0mdashY uuml

0 gt 05

C P Y uuml1 gt 0mdashY uuml

0 gt 0 0

If 4 gt 051 Y uuml0 gt 0 ) Y uuml

1 gt 01 so

P4Y uuml1 gt 0mdashY uuml

0 gt 05 D 1 (A3)

and E6Y uuml0

mdashY uuml1 gt 01 Y uuml

0 gt 07 D E6Y uuml0mdashY uuml

0 gt 070 Finally

E6Y0mdashY0 gt 07 D E6Y uuml0mdashY uuml

0 gt 07 (A4)

so gt 0 ) E6Y1ƒ Y0

mdashY0 gt 07 D

A2 Derivation of Equation (21)

Drop the i subscripts Note that eƒD D 641 ƒ D5 C Deƒ7Let sbquo uuml D eƒsbquo and uuml D eƒ Z is binary and there are nocovariates so we can now write (20) as

sbquo uuml E641ƒ D5Y mdashZ D 17C sbquo uuml uuml E6DY mdashZ D 17 ƒ 1 D 0 (A5)

and

sbquo uuml E641ƒ D5Y mdashZ D 07C sbquo uuml uuml E6DY mdashZ D 07 ƒ 1 D 00 (A6)

Divided (A5) by (A6) to get rid of sbquo uuml then solve for uuml Subtract 1 and rearrange to get Equation (21)

A3 Average Treatment Effects and Standard Errors forNonlinear Models

Average treatment effects were calculated with the aid ofderivative approximations so that all reported effects have theform ldquocoef cient times scaling factorrdquo

Probit Note that ecirc6X 0i sbquo C 7 ƒ ecirc6X 0

isbquo7 6X 0isbquo C Di7 š

1 so the average effect on the treated can be approximatedas 841=N15

Pi Di6X 0

isbquo C Di79 š 1 where N1 D Pi Di This

turns out to be accurate to three decimal places for the esti-mates in Table 1 Standard errors were calculated treating thescaling factor as nonrandom This follows the convention forreporting marginal effects in programs like Stata in practiceany correction for estimation of the scaling factor is likely tobe minor A similar approach was used for ordered probit

Tobit Tobit average treatment effects were approximatedusing a derivative formula that can be found for exam-ple in the work of Greene (1999) E6Yi

mdashXi1DiD 17 ƒ

E6YimdashXi1 Di

D 07 iexclE6YimdashXi1Di7=iexclD D ecirc6X 0

isbquo C Di7 š 0

Average effects on the treated can therefore be approximatedusing 841=N15

Pi Diecirc6X 0

isbquo C Di79 š 0 Standard errors wereagain calculated treating the scaling factor as nonrandom

2PM Let the part-1 coef cient be 1 and the part-2 coef -cient be 2 Since the model multiplies parts and both parts arelinear the average effect is approximated using derivatives as1E6Yi

mdashDiD 11 Yi gt 07 C 2P6Di

D 1mdashY gt 070 Standard errorswere calculated treating the scaling factors E6Y mdashDi

D 11 Yi gt 07

and P6DiD 1mdashY gt 07 as nonrandom and using the fact that

the estimates of 1 and 2 are uncorrelatedMullahy Note that E6Yi

mdashXi1Di1mdash uumli 7 D mdash uuml

i exp6X 0i sbquoC Di7

where this CEF has a causal interpretation Again usingderivatives we have mdashuuml

i exp6X 0isbquo C 7 ƒ mdashuuml

i exp6X 0isbquo7

mdash uumli exp6X 0

i sbquo C Di7 š The model is such that E6mdash uumlimdashXi7

equals 1 but E6mdash uumlimdashXi1Di7 is unrestricted I ignore this prob-

lem and approximate the average effect on the treated as841=N15

Pi Di exp6X 0

i sbquo C Di79 š Standard errors were cal-culated treating the scaling factor as nonrandom

Bivariate Probit This is the same as probit using param-eters from the latent index equation for outcomes

Endogenous Tobit This is the same as tobit but usingcoef cients and predicted probability positive from the modelwith the compound Mills-ratio term included

Mills Ratio Standard errors were calculated treating thecompound Mills-ratio term as known

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 15

Causal-IV (nonlinear) Average effects were calculated asdescribed previously for the probit and Mullahy (exponential)functional form The rst-stage estimates of E6Zi

mdashXi7 neededto construct Ši were estimated using probit Standard errorsfor were calculated using the same bootstrap proceduredescribed for QTE Abadie (2000b) also gave analytic formu-las that take account of the rst-step estimation of E6Zi

mdashXi7An assumption implicit in this scheme for reporting Causal-IV results is that it makes sense to convert conditional-on-Xeffects for compliers into overall average effects on the treated

A4 Computation of Quantile Treatment Effectsand Standard Errors

QTErsquos were computed by plugging rst-step estimatesof QŠi

D E6ŠimdashXi1Di1 Yi7 into a weighted quantile regression

calculation performed by Stata Nonnegative estimates ofE6Ši

mdashXi1 Di1 Yi7 were constructed by separately estimatingE6Zi

mdashXi7 and E6ZimdashYi1Di1Xi7 using probit and then trimming

In principle standard errors should take account of this rst-step estimation An additional complication is that the analyticstandard errors for QTE involve a conditional error density Inthis case I sidestepped messy analytic calculations by usinga bootstrap procedure that repeats both the rst-stage estima-tion of QŠi and the second-step estimation of the parameters ofinterest in 100 replicate samples of 2500 observations eachThe 100 replicate samples were sampled without replacementusing the Stata command bsample The reported standarderrors were calculated as 4N =N uuml 51=2bseN where bseN is thestandard deviation of the 100 replicate estimates N D 21500and N uuml is the full sample size

[Received October 1999 Revised July 2000]

REFERENCES

Abadie A (2000a) ldquoBootstrap Tests for the Effect of a Treatment on the Dis-tributions of an Outcome Variablerdquo Technical Working Paper 261 NationalBureau of Economic Research Cambridge MA

mdashmdashmdash (2000b) ldquoSemiparametric Estimation of Instrumental Variable Mod-els for Causal Effectsrdquo Technical Working Paper 260 National Bureau ofEconomic Research Cambridge MA

Abadie A Angrist J and Imbens G (1998) ldquo Instrumental VariablesEstimation of Quantile Treatment Effectsrdquo Working Paper 229 NationalBureau of Economic Research Cambridge MA

Amemiya T (1978) ldquoThe Estimation of a Simultaneous Equation General-ized Probit Modelrdquo Econometrica 46 1193ndash1205

mdashmdashmdash (1979) ldquoThe Estimation of a Simultaneous Equation Tobit ModelrdquoInternational Economic Review 20 169ndash181

Angrist J (1998) ldquoEstimating the Labor Market Impact of Voluntary MilitaryService Using Social Security Data on Military Applicationsrdquo Economet-rica 66 249ndash288

Angrist J and Evans W (1998) ldquoChildren and Their Parentsrdquo Labor SupplyEvidence From Exogenous Variation in Family Sizerdquo American EconomicReview 88 450ndash477

Angrist J and Imbens G (1991) ldquoSources of Identifying Information inEvaluation Modelsrdquo Technical Working Paper 117 National Bureau ofEconomic Research Cambridge MA

Angrist J Imbens G and Rubin B (1996) ldquo Identi cation of CausalEffects Using Instrumental Variablesrdquo Journal of the American StatisticalAssociation 91 444ndash455

Angrist J and Krueger A (1999) ldquoEmpirical Strategies in Labor Eco-nomicsrdquo in The Handbook of Labor Economics Volume IIIA edsO Ashenfelter and D Card Amsterdam North-Holland pp 1277ndash1366

Blundell R W and Smith R J (1989) ldquoEstimation in a Class of Simulta-neous Equation Limited Dependent Variable Modelsrdquo Review of EconomicStudies 56 37ndash58

Blundell R W and Powell J L (1999) ldquoEndogeneity in Single Index Mod-elsrdquo mimeo University College London Dept of Economics

Bronars S and Grogger J (1994) ldquoThe Economic Consequences of UnwedMotherhood Using Twin Births as a Natural Experimentrdquo American Eco-nomic Review 84 1141ndash1156

Chamberlain G (1986) ldquoAsymptotic Ef ciency in Semi-Parametric ModelsWith Censoringrdquo Journal of Econometrics 32 189ndash218

Cragg J G (1971) ldquoSome Statistical Models for Limited Dependent Vari-ables With Application to the Demand for Durable Goodsrdquo Econometrica39 829ndash844

Duan N Manning W G Jr Morris C N and Newhouse J P(1984) ldquoChoosing Between the Sample-Selection Model and the Multi-partModelrdquo Journal of Business amp Economic Statistics 2 283ndash289

Eichner M J McClellan M B and Wise D A (1997) ldquoHealth Expendi-ture Persistence and Feasibility of Medical Savings Accountsrdquo in Tax Pol-icy and the Economy 11 ed J Poterba Cambridge MA National Bureauof Economic Research pp 91ndash128

Evans W Farrelly M C and Montgomery E (1999) ldquoDo Work-place Smoking Bans Reduce Smokingrdquo American Economic Review 89728ndash747

Gangadharan J and Rosenbloom J L (1996) ldquoThe Effects of Child-bearingon Married Womenrsquos Labor Supply and Earnings Using Twin Births as aNatural Experimentrdquo Working Paper 5647 National Bureau of EconomicResearch Cambridge MA

Goldberger A S (1991) A Course in Econometrics Cambridge MAHarvard University Press

Greene W (1999) ldquoMarginal Effects in the Censored Regression ModelrdquoEconomics Letters 64 43ndash49

Gronau R (1974) ldquoWage ComparisonsmdashA Selectivity Biasrdquo Journal ofPolitical Economy 82 1119ndash1143

Hay J W Leu R and Rohrer P (1987) ldquoOrdinary Least Squares andSample-Selection Models of Health-care Demandrdquo Journal of Business ampEconomic Statistics 5 499ndash506

Hay J W and Olsen R J (1984) ldquoLet Them Eat Cake A Note on Com-paring Alternative Models of the Demand for Medical Carerdquo Journal ofBusiness amp Economic Statistics 2 279ndash282

Heckman J J (1974) ldquoShadow Prices Market Wages and Labor SupplyrdquoEconometrica 42 679ndash694

mdashmdashmdash (1978) ldquoDummy Endogenous Variables in a Simultaneous EquationsSystemrdquo Econometrica 46 931ndash960

mdashmdashmdash (1990) ldquoVarieties of Selection Biasrdquo American Economic Review 80313ndash318

Heckman J J and Robb R Jr (1985) ldquoAlternative Methods for Evaluatingthe Impact of Interventionsrdquo in Longitudinal Analysis of Labor MarketData (Econometric Society Monographs Series No 10) eds J Heckmanand B Singer Cambridge UK Cambridge University Press pp 156ndash246

Imbens G and Angrist J D (1994) ldquo Identi cation and Estimation of LocalAverage Treatment Effectsrdquo Econometrica 62 467ndash475

Imbens G and Rubin D B (1997) ldquoEstimating Outcome Distributions forCompliers in Instrumental Variables Modelsrdquo Review of Economic Studies64 555ndash574

Kane T J Rouse C E and Staiger D (1999) ldquoEstimating Returnsto Schooling When Schooling Is Misreportedrdquo Working Paper W7235National Bureau of Economic Research Cambridge MA

Kelejian H H (1971) ldquoTwo-Stage Least Squares and Econometric SystemsLinear in Parameters but Nonlinear in the Endogenous Variablesrdquo Journalof the American Statistical Association 66 373ndash374

Koenker R and Bassett G (1978) ldquoRegression Quantilesrdquo Econometrica46 33ndash50

Killingsworth M (1983) Labor Supply Cambridge UK Cambridge Uni-versity Press

Keane M P and Wolpin K (1997) ldquo Introduction to the JBES Special Issueon Structural Estimation in Applied Microeconomicsrdquo Journal of Businessamp Economic Statistics 15 111ndash114

Lee M J (1995) ldquoSemiparametric Estimation of Simultaneous EquationsWith Limited Dependent Variables A Case Study of Female Labor Sup-plyrdquo Journal of Applied Econometrics 10 187ndash200

mdashmdashmdash (1996) ldquoNonparametric Two-Stage Estimation of Simultaneous Equa-tions With Limited Endogenous Regressorsrdquo Econometric Theory 12305ndash330

Leung S F and Yu S (1996) ldquoOn the Choice Between Sample Selectionand Two-Part Modelsrdquo Journal of Econometrics 72 197ndash229

Lin T-F and Schmidt P (1984) ldquoA Test of the Tobit Speci cation Againstan Alternative Suggested by Craggrdquo Review of Economics and Statistics66 174ndash177

Maddala G S (1985) ldquoA Survey of the Literature on Selectivity Bias AsIt Pertains to Health Care Marketsrdquo Advances in Health Economics andHealth Service Research 6 3ndash18

16 Journal of Business amp Economic Statistics January 2001

REFERENCES

Manning W G Duan N and Rogers W H (1987) ldquoMonte Carlo Evidenceon the Choice Between Sample Selection and Two-part Modelsrdquo Journalof Econometrics 35 59ndash82

Manski C F (1975) ldquoMaximum Score Estimation of the Stochastic UtilityModelrdquo Journal of Econometrics 3 205ndash228

mdashmdashmdash (1996) ldquoLearning About Treatment Effects From Experiments WithRandom Assignment of Treatmentsrdquo Journal of Human Resources 31709ndash733

McDonald J F and Mof tt R A (1980) ldquoThe Uses of Tobit AnalysisrdquoThe Review of Economics and Statistics 62 318ndash321

Mof t R A (1999) ldquoNew Developments in Econometric Methods forLabor Market Analysisrdquo in The Handbook of Labor Economics Vol-ume IIIA eds O Ashenfelter and D Card Amsterdam North-Hollandpp 1367ndash1398

Mullahy J (1997) ldquo Instrumental-Variable Estimation of Count Data ModelsApplications to Models of Cigarette Smoking Behaviourrdquo Review of Eco-nomics and Statistics 11 586ndash593

mdashmdashmdash (1998) ldquoMuch Ado About Two Reconsidering Retransformation andthe Two-Part Model in Health Econometricsrdquo Journal of Health Eco-nomics 17 247ndash281

Newey W K (1985) ldquoSemiparametric Estimation of Limited DependentVariable Models With Endogenous Explanatory Variablesrdquo Annales deLrsquo Insee 5960 219ndash237

mdashmdashmdash (1986) ldquoLinear Instrumental Variable Estimation of Limited Depen-dent Variable Models With Endogenous Explanatory Variablesrdquo Journal ofEconometrics 32 127ndash141

mdashmdashmdash (1987) ldquoEf cient Estimation of Limited Dependent Variable ModelsWith Endogenous Explanatory Variablesrdquo Journal of Econometrics 36231ndash250

Powell J L (1986a) ldquoSymmetrically Trimmed Least Squares Estimation forTobit Modelsrdquo Econometrica 54 1435ndash1460

mdashmdashmdash (1986b) ldquoCensored Regression Quantilesrdquo Journal of Econometrics32 143ndash155

Rosenzweig M R and Wolpin K (1980) ldquoLife-Cycle Labor Supply andFertility Causal Inferences From Household Modelsrdquo Journal of PoliticalEconomy 88 328ndash348

Rubin D B (1974) ldquoEstimating Causal Effects of Treatments in Random-ized and Non-randomized Studiesrdquo Journal of Educational Psychology 66688ndash701

mdashmdashmdash (1977) ldquoAssignment to a Treatment Group on the Basis of a Covari-aterdquo Journal of Educational Statistics 2 1ndash26

Wooldridge J (1999) ldquoDistribution-Free Estimation of Some Nonlinear PanelData Modelsrdquo Journal of Econometrics 90 77ndash97

Comment Binary Regressors in NonlinearPanel-Data Models With Fixed EffectsJinyong Hahn

Department of Economics Brown University Providence RI 02912 ( jinyong hahnbrownedu)

Angrist notes in his abstract that ldquomuch of the dif cultywith limited dependent variables comes from a focus on struc-tural parameters such as index coef cients instead of causaleffects Once the object of estimation is taken to be the causaleffect of treatment several simple strategies are availablerdquo(p 2) I examine the consequence of such a perspective forinference in nonlinear panel-data models with xed effectsI argue that (a) the ldquodif cultyrdquo indeed disappears sometimesand (b) structure of treatment assignment plays a crucial rolefor his strategy to be successful

It is instructive to begin with a dif cult nonlinear panel-datamodel with xed effects Consider a panel probit model with xed effects

Pr6yitD 1mdashci1 xi11 xi27 D ecirc4ci

C ˆ xit51

i D 11 1 n3 t D 1120 (1)

It is assumed that yi1 and yi2 are independent of each othergiven 4ci1 xi11 xi25 Here ci denotes the unobserved xedeffects and xit denotes a binary treatment variable We willassume that 4xi11 xi25 D 40115 for all i The index coef cientof interest is ˆ Therefore traditional econometric analysiswould focus on whether ˆ is identi ed and or

pn-consistently

estimableEstimation of the index coef cient ˆ is dif cult for a num-

ber of reasons

1 So far no consistent estimator for ˆ has been developedfor probit with a nonparametric speci cation of the conditionaldistribution of ci given 4xi11 xi25

2 It is not even clear whether ˆ is identi ed or notManskirsquos (1987) identi cation result requires in nite supportfor xit which cannot be satis ed due to the binary nature ofxit

3 It is not clear whether the semiparametric informationbound for ˆ is positive or not It is quite possible that theinformation is actually 0 For example Chamberlain (1992)showed that the information for ˆ is 0 for models with timedummies

4 Chamberlainrsquos (1984) conditional maximum likelihoodestimator is applicable only to logit models

Now to assess whether changing the object of estimationsimpli es statistical analysis consider the average treatmenteffects which in this particular case can be easily shown tobe equal to

sbquo D E6yi2ƒ yi170 (2)

It is not dif cult to see that a simple estimator Osbquo D1=n

PniD14yi2

ƒ yi15 isp

n-consistent As Angrist argues afocus on average causal effects dramatically reduces the dif -culty of estimation The secret is that 4xi11 xi25 D 40115 for alli which effectively ensures that ci is independent of 4xi11 xi25Dif culty of the index estimation listed previously is because

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

Imbens Comment 17

the independence between 4xi11 xi25 and ci cannot be exploitedwithin the xed-effects framework Unfortunately the addi-tional information that 4xi11 xi25 and ci are independent of eachother does not reduce dif culty in estimating the index ˆ It isnot yet clear whether ˆ can be

pn-consistently estimated even

with the random-effects assumption when the random effectsare nonparametrically speci ed On the other hand estimatingsbquo becomes easier as a result of the constancy of the general-ized propensity score Pr 6xi11 xi2mdashci7 Presence of unobservedci in the model was rendered irrelevant due to the constancyof the generalized propensity score [See Imbens (1999) for adiscussion on the generalized propensity score]

It is interesting to note that in the panel probit model (1)estimation of sbquo is not necessarily simple unless the indexstructure is discarded altogether It is useful to note that thenew target parameter sbquo could be estimated consistently usingindex structure if consistent estimators of ˆ and not are givenHere not denotes the distribution of ci We may alternativelywrite (2) as

Z4ecirc4c C ˆ5 ƒ ecirc4c55 dnot4c51 (3)

which can in principle be estimated by using consistent esti-mators of ˆ and not Estimation of sbquo using the alternativecharacterization (3) requires consistent estimation of an addi-tional parameter not a parameter that was not given too muchattention in the past The problem is that not many consis-tent estimators of not are available It is not yet clear whether

the model satis es the primitive conditions for consistency ofthe nonparametric maximum likelihood estimator (NPMLE)as discussed by Heckman and Singer (1984) The dif culty inestimating the target parameter using the expression (3) whichis based on the index structure is in sharp contrast to the easeof the estimation strategy using the expression (2) for whichthe index structure is irrelevant

The preceding discussion suggests that the success ofAngristrsquos perspective critically hinges on the structure of treat-ment assignment and careful reexpression of the new targetparameter If the joint distribution of ci and 4xi11 xi25 is com-pletely unknown it is clear that changing the target parameterdoes not ease the dif culty of estimation Angristrsquos perspectivetherefore requires substantial effort in modeling such jointdistribution Whether such a modeling effort will be successfulin dealing with nonlinear panel problems remains to be seen

ADDITIONAL REFERENCES

Chamberlain G (1984) ldquoPanel Datardquo in Handbook of Econometricseds Z Griliches and M D Intriligator Amsterdam North-Hollandpp 1247ndash1318

mdashmdashmdash (1992) ldquoBinary Response Models for Panel Data Identi cation andInformationrdquo unpublished manuscript

Heckman J and Singer B (1984) ldquoA Method for Minimizing the Impactof Distributional Assumptions in Econometric Models for Duration DatardquoEconometrica 52 271ndash320

Imbens G (1999) ldquoThe Role of Propensity Score in Estimating Dose-Response Functionsrdquo Technical Working Paper 237 National Bureau ofEconomic Research Cambridge MA

Manski C (1987) ldquoSemiparametric Analysis of Random Effects Linear Mod-els from Binary Panel Datardquo Econometrica 55 357ndash362

CommentGuido W Imbens

Department of Economics University of California at Los Angeles Los Angeles CA 90095(imbenseconuclaedu)

It is a pleasure to comment on this article by JoshuaAngrist whose applications of instrumental-variables methods(Angrist 1989 Angrist and Krueger 1991) have been a sourceof inspiration for my own work in this area As with Angristrsquosprevious work on instrumental variables the current articleraises some controversial issues and makes a number of impor-tant points Here I offer some comments on three of themFirst I shall discuss the issues raised in Section 1 ldquoCausalEffects and Structural Parametersrdquo concerning the goals ofstatistical inference Angrist argues that many questions ofinterest are most easily formulated in terms of comparisonsbetween realized and potential outcomes the latter de nedas outcomes that would have been observed under alternativestates of nature I shall explore some of the implications of thisview for empirical practice and econometric theory SecondI shall offer some remarks on the role of economic theory inspeci cation and identi cation of econometric models againreinforcing Angristrsquos point regarding the importance of for-mulating the key assumptions in terms of potential outcomesThird I shall discuss some of the issues related to the limited

dependent nature of outcome variables for empirical practicein particular in the presence of covariates Partly motivatedby the widespread perception of fundamental dif culties inapplying instrumental-variables methods to data with limiteddependent outcome variables Angrist argues that standard lin-ear model techniques are generally applicable I agree withAngristrsquos position that most of these perceived problems areexaggerated but suggest that principled inference should nev-ertheless take account of the limited dependent nature of theoutcome variables and use nonlinear models

1 CAUSAL ESTIMANDS

In his textbook discussion of the difference betweenstructural and reduced-form estimates Goldberger (1997)wrote following Marshak (1953) that the ultimate goal of

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

18 Journal of Business amp Economic Statistics January 2001

econometrics is to provide predictions More speci cally inmy view the goal is to provide predictions of policy interven-tions Using both economic theory and data economists wishto inform policy discussions by providing predictions of statesof the world under different policy choices Based on com-parisons of such predictions policy makers can then chooseamong the different policies using some social welfare mea-sure as objective function (eg Heckman and Smith 1997)Angrist argues that such questions are most easily formulatedin terms of potential outcomes Here I want to elaborate onthat view

Consider as an example the problem faced by a policymaker contemplating a new tax in a market To evaluatethis policy the policy maker wishes to take into account theeffect of the tax on the quantity traded Economic theory sug-gests that this effect depends on the slope of the supply anddemand functions The rst step is therefore the estimation ofthese slopes and in the remainder of this discussion I shallfocus on this component of the policy-evaluation problem Inprinciple the policy maker may be interested in the entire dis-tribution of the quantity traded under various taxes Let usassume however that for purposes of evaluation of the poli-cies it is suf cient to know the average effect of the policyon the quantity traded If there are only two values for thepolicymdashfor example no tax or a taxmdashthe difference betweenthese two averages is the key quantity of interest FollowingRubin (1974) I will refer to this as the estimand

Note that the choice of estimand is distinct from the sta-tistical question of the speci cation of the model Often thestatistical model is speci ed in such a way that a single param-eter corresponds to the estimand For example in a structuralinterpretation of the linear regression model the coef cientscorrespond to the effect of changing the covariates by a singleunit Such one-to-one correspondence however is the excep-tion rather than the rule Wooldridge (1992) made this point inthe context of BoxndashCox regression models Such models areoften used when a linear representation for E6Y mdashX7 is inappro-priate The BoxndashCox regression model generalizes this linearform to E6Y 4lsaquo5mdashX7 D X 0sbquo where

Y 4lsaquo5 D(

4Y lsaquo ƒ 15=lsaquo lsaquo 6D 01

ln Y lsaquo D 00

Although consistent estimators for sbquo exist under these assump-tions Wooldridge stressed that because (a) the interpretationof sbquo changes with the value of lsaquo and (b) knowledge of sbquo andlsaquo is not suf cient for recovering E6Y mdashX7 there is no reasonfor economists to be interested in estimates of sbquo under theseassumptions In other words sbquo cannot be the sole focus ofthe researcher because the question it answers changes withthe value of nuisance parameters Wooldridge then suggestedan alternative speci cation that always allows the researcherto recover the conditional expectation E6Y mdashX7

In empirical work this distinction between the estimandand the parameters of the statistical model is consistent withthe now common practice of reporting estimates of averagederivatives in binary response models rather than reportingestimates of the logit or probit coef cients Unlike a linear

regression model there is no direct link from one of the coef- cients in the logit or probit model to average causal effectsand thus there is no intrinsic interest in such coef cients

This view is at odds however with a large part of the semi-parametric literature An exception is the work by Stoker (egStoker 1986) who focused on estimation of index coef cientsin settings where these are proportional to average derivativesand thus directly linked to changes in predictions Considerfor example the work on semiparametric estimation of binaryresponse models In this literature such models are esti-mated without making logistic or probit assumptions insteadonly making conditional mean or median assumptions in alatent index interpretation (eg Manski 1985) This literaturehowever has begged the question of why economists shouldbe interested in the coef cient estimates in these models inthe absence of a direct link between these coef cients andthe choice probabilities or their derivatives Similarly someof the models with xed effects in panel data with limiteddependent variables have focused on estimation of parametersthat in themselves do not allow for estimation of conditionalexpectations or their derivatives and thus do not allow for esti-mation of causal effects See Arellano and Honoreacute (in press)for a survey of many of these methods

2 IDENTIFICATION

After deciding on the estimand the next step is to makesubstantive assumptions on the process that generated the dataThis is where economic as opposed to statistical theory playsa key role Theoretical considerations may suggest that certainvariables have no direct causal effect on others because theydo not enter into agentsrsquo utility function nor do they affectthe constraints these agents face For example in some mar-kets it may be reasonable to postulate the existence of demandand supply function and assume that their intersection deter-mines observed prices and quantities In that case it may beargued that certain variablesmdashfor example weather conditionsin agricultural marketsmdashaffect supply at xed prices but notdemand because weather conditions do not affect utility ofthe buyers nor do they constrain their choices given pricesSimilarly theoretical considerations may suggest which vari-ables determine agentsrsquo fertility choices and which variablesare excluded from such choices as in the structural modelsdescribed in Section 12 of Angrist

For the purpose of considering such exclusion restrictionsas well as other assumptions it is important to formulate themin a way that economic theory can be brought to bear on themThis makes the formulation in terms of counterfactuals orpotential outcomes that Angrist advocates particularly appro-priate The potential outcomes describe outcomes in differ-ent environments and as such are the primitives of economicanalyses as well as choices under different sets of constraintswhich are the result of agents solving constrained optimizationproblems Since economic theory studies such optimizationproblems it is therefore well equipped to assess assumptionsformulated directly in terms of these potential outcomes Anexample of the formulation of the critical assumptions in termsof such potential outcomes is Angrist Imbens and Rubin(1996 AIR from here on) In contrast latent index modelsalthough under some conditions mathematically equivalent to

Imbens Comment 19

the potential outcome framework (eg Vytlacil 1999) formu-late the critical assumptions in terms of associations betweenobserved variables and unobserved residuals which appearsmore dif cult to contemplate [see Imbens (1997) for a dis-cussion of the confusion such formulations have caused in thestatistics literature]

It is rare that economic theory is speci c enough to deter-mine the exact value of the estimand More typical is thatthe theory is consistent with a range of values for the esti-mand Observations on agentsrsquo choices and outcomes may behelpful in narrowing down this range The econometricianrsquostask is to link the data to the estimand Typically a num-ber of additional assumptions are made at this stage Almostalways it is assumed that there is only limited dependenceor no dependence at all between choices made by differentagents and identi cation focuses on the link between the jointdistribution of the observables estimable in large samplesand the estimand Two possibilities arise at this stage Some-times the estimand can be expressed as a functional of thejoint distribution of the observables in which case the esti-mand is identi ed A leading example is where the estimandis the average treatment effect and theory suggests that assign-ment to treatment is random or at least random conditionalon a set of observed covariates (unconfounded assignmentselection on observables) Alternatively the assumptions sug-gested by economic theory do not allow for the direct linkbetween the distribution of observables and the estimand Inthat case the researcher faces some choices One option advo-cated in a series of papers by Manski (see for a general dis-cussion Manski 1995) is to estimate the range of values ofthe estimand consistent with the data given the substantiveassumptions Another option followed in the current article byAngrist is the local average-treatment-effect approach devel-oped by Imbens and Angrist (1994) to consider what aspectsof the estimand are identi ed given data and assumptions Ininstrumental-variables settings the population average treat-ment effect is often not identi ed but the average effect for aspeci c subpopulation may be In that case one may chooseto estimate the average treatment effect for this subpopula-tion and leave the extrapolation to the principal estimand tothe researcher possibly aided by theoretical considerations AsHeckman wrote ldquo It is a great virtue of the LATE parameterthat it makes the investigator stick to the data at hand and sep-arate out the aspects of an estimation that require out of sam-ple extrapolation or theorizing from aspects of an estimationthat are based on observable datardquo (Heckman 1999 p 832)

Let us consider the case studied by Angrist with its focuson the effect of having more than two children on labor sup-ply Angrist argues that the second birth being a multiple birth(eg twins) is a valid instrument for this effect In terms ofthe AIR formulation this requires a multiple birth to be asgood as randomly assigned and the absence of a systematicdirect effect on labor supply other than through its effect onthe number of children Such assumptions may be contro-versial For example fertility treatments may lead to a sys-tematic association between multiple births and choices madeby couples violating the rst assumption Even if we acceptthese assumptions however they only imply that the aver-age causal effect of more kids on labor supply is identi ed

for women who had a third child solely because their sec-ond birth was a multiple birth (compliers in the AIR termi-nology) In my view it is unlikely that this is the populationof primary interest Nevertheless it is the only subpopulationthe data are informative about in the sense of point identi ca-tion under the substantive assumptions and it would appear tooffer some guidance regarding the population average causaleffect to policy makers similar to the way in the medical worldresults from clinical trials in homogenous subpopulations areregarded as useful because they are viewed as indicative ofpopulation average causal effects

3 LIMITED DEPENDENT VARIABLES

Typically economic theory offers some guidance concern-ing the determinants of certain outcomes without specifyingthe exact form or strength of their relationship In that casestatistical modeling is required to complete the speci cationConsider the example Angrist studies with binary outcomebinary endogenous regressor a binary instrument and covari-ates Angrist suggests as one possible approach estimating theaverage treatment effect through a linear probability modelwith instrumenting for an endogenous regressor The bene tsof the linear probability approach stemming from the linear-ity and robustness against misspeci cation of the rst stageappear to me largely illusory At this point the statistical mod-eling is only intended to provide exible approximations to theunderlying conditional distributions This is a fundamentallydifferent role from that played by the substantive assumptionsthat are essential for identi cation Appeals to consistencyunder speci c parameterizations therefore appear irrelevantmdashin a larger sample one may well wish to use a more exiblespeci cation because less smoothing is required In additionto nding the alleged bene ts of the linear probability modelunpersuasive I nd its disadvantages troubling Within smallsubpopulations characterized by extreme values of the covari-ates the smoothing implicit in linear probability models islikely to lead to unattractive predictions compared to pre-dictions based on nonlinear models that respect the limited-dependent-variable nature of the outcomes

An alternative approach is followed in the study of the effectof u shots on hospitalization rates using randomized incen-tives for vaccination by Hirano Imbens Rubin and Zhou(2000 HIRZ from here on) Given their assumptions exten-sions of those made by AIR to the case with exogenous covari-ates there are three subpopulationsmdashcompliers (units whochange treatment status in response to a change in the valueof the instrument) always-takers (who always take the treat-ment irrespective of the value of the instrument) and never-takers (who never take the treatment irrespective of the valueof the instrument) H IRZ modeled the conditional distributionof these three ldquotypesrdquo conditional on covariates as a trinomialdistribution

Pr4TypeiD cmdashXi

D x5 D exp4x0ndashc5

1C exp4x0ndashc5C exp4x0ndasha51

Pr4TypeiD amdashXi

D x5 D exp4x0ndasha5

1C exp4x0ndashc5 C exp4x0ndasha51

20 Journal of Business amp Economic Statistics January 2001

and

Pr4TypeiD nmdashXi

D x5

D 1 ƒ Pr4TypeiD cmdashXi

D x5 ƒ Pr4TypeiD amdashXi

D x50

Now compare this setup to the selection models Angristdescribes in Section 3 In the selection models the equationdescribing the endogenous regressor is Di

D 18ƒ0C ƒ1Zi

Cƒ 0

2Xi gt Daggeri9 Suppose that the instrument is binary and that ƒ1 ispositive Then the two models are very similar with units withƒ0 Cƒ1 Cƒ 0

2Xi gt Daggeri in the selection model classi ed as always-takers in the potential outcome framework (because irrespec-tive of the value of the instrument Di

D 1 for such units) unitswith ƒ0

C ƒ 02Xi lt Daggeri classi ed as never-takers (because irre-

spective of the value of the instrument DiD 0 for such units)

and the units with ƒ0C ƒ 0

2Xi lt Daggeri lt ƒ0C ƒ1

C ƒ 02Xi classi ed

as compliersOne advantage of the trinomial model is that it easily gen-

eralizes to provide an arbitrarily good t to any conditionaltrinomial distribution by including higher-order terms andinteractions in the covariates If there are no substantive rea-sons to impose additional restrictions one should not imposethem implicitly in the speci cation of the statistical modelIn particular in the selection model it is not suf cient to addhigher-order terms to the covariate vector to provide an arbi-trarily good t to the trinomial distribution Such an approxi-mation would have to involve heteroscedasticity and other dis-tributional extensions that are not straightforward to implementin the selection model

Conditional on the individualrsquos type H IRZ speci ed theoutcome distributions given covariates as logistic regressionmodels Again the aim is to provide a exible approxima-tion to the conditional distribution in a manner that does notimpose any implicit restrictions Given that for a binomialdistribution the logistic regression model can be thought ofas providing a linear approximation to the log odds ratiothis choice is again an appealing one An alternative is theprobit model which also provides a good approximation Less

attractive here is the linear probability model since it requiresinequality restrictions on the parameters if the implicit esti-mates of the probabilities are to be bounded between 0 and 1

In cases with other limited dependent variables alternativenonlinear models may be appropriate For example if the out-comes are durations subject to censoring models speci ed interms of hazard functions (eg Lancaster 1979) may be con-venient for dealing with such data

ADDITIONAL REFERENCES

Angrist J (1989) ldquoLifetime Earnings and the Vietnam Era Draft LotteryEvidence From Social Security Administrative Recordsrdquo American Eco-nomic Review 80 313ndash335

Angrist J and Krueger A (1991) ldquoDoes Compulsory School AttendanceAffect Schooling and Earningsrdquo Quarterly Journal of Ecomonics 106979ndash1014

Arellano M and Honoreacute B (in press) ldquoPanel Datardquo in Handbook of Econo-metrics eds J Heckman and E Leamer Amsterdam Elsevier North-Holland

Goldberger A (1991) A Course in Econometrics Cambridge MACambridge University Press

Heckman J (1999) ldquo Instrumental Variables Response to Angrist andImbensrdquo Journal of Human Resources 34 828ndash837

Heckman J and Smith J (1997) ldquoEvaluating the Welfare Staterdquo in Econo-metrics and Economics in the 20th Century The Ragnar Frisch Centenaryed S Strom New York Cambridge University Press pp 214ndash318

Hirano K Imbens G Rubin D and Zhou A (2000) ldquoEstimating theEffect of Flu Shots in a Randomized Encouragement Designrdquo Biostatistics1 69ndash88

Imbens G (1997) Book Review of The Foundations of Econometric Analy-sis by D Hendry and M Morgan Journal of Applied Econometrtics 1291ndash94

Lancaster T (1979) ldquoEconometric Methods for the Analysis of DurationDatardquo Econometrica

Manski C (1985) ldquoSemiparametric Analysis of Discrete Responserdquo Journalof Econometrics 27 313ndash333

mdashmdashmdash (1995) Identi cation Problems in the Social Sciences CambridgeMA Harvard University Press

Marshak J (1953) ldquoEconomic Measurements for Policy and Predictionrdquoin Studies in Econometric Method eds W Hood and T KoopmansNew York Wiley pp 1ndash26

Stoker T (1986) ldquoConsistent Estimation of Scaled Coef cientsrdquo Economet-rica 54 1461ndash1481

Vytlacil E (1999) ldquo Independence Monotonicity and Latent Index ModelsAn Equivalency Resultrdquo unpublished manuscript University of ChicagoDept of Economics

Wooldridge J (1992) ldquoSome Alternatives to the BoxndashCox RegressionModelrdquo International Economic Review 33 935ndash955

CommentRobert A Mof tt

Department of Economics Johns Hopkins University Baltimore MD 21218 and National Bureau of EconomicResearch Cambridge MA (mofttjhuedu)

The Problem Although the article by Angrist rangesacross a number of issues much of the discussion and thearticle title suggests that the problem of concern is that instru-mental variables ( IV) cannot be used in one of three commonmodels Let the rst model be y D Csbquod CxbdquoChellip where y isan absolutely continuous variable but d is binary and wherex is independent of hellip but d is not Then sbquo can be consistentlyestimated with IV (Heckman and Robb 1985) Let the secondmodel be y uuml D C sbquod uuml C xbdquo C hellip where y uuml and d uuml are contin-

uous and where y D 14y uuml gt 05 and d D d uuml are the observedvariables The parameters of this model can likewise be esti-mated by IV with some auxiliary assumptions (Newey 1986see Blundell and Smith 1993 for a review of alternative meth-ods) But let the third model be y uuml D C sbquod C xbdquo C hellip where

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

Moftt Comment 21

again y D 14y uuml gt 05 but now d is binary The parameters ofthis model cannot be estimated by standard IV Many of theexamples in the Angrist article involve the censored regres-sion model rather than the binary choice model but this andthe other main points in the article apply to both

The Angrist Solution The Angrist solution to the problemposed by the third of these models where IV is not applica-ble is to declare sbquo an uninteresting parameter and not worthyof estimation [or in his words to ldquopuntrdquo (p 8)] This is remi-niscent of the solution to the Vietnam War suggested by some1960s commentators The United States should have simplydeclared victory quickly withdrawn and hoped that no onenoticed that they had in fact lost

If sbquo is uninteresting what is interesting according toAngrist His answer is that interest should center on leastsquares approximations with y as the dependent variable andd and x as regressors If endogeneity of d is a problem linearIV should be applied with linear IV given a LATE interpre-tation He provides an empirical illustration

I will comment on three questions raised by the Angristarticle (1) Is sbquo an interesting parameter (2) Is the Linear IVestimator using observed y and d which Imbens and Angrist(1994) relabeled the LATE an interesting quantity (3) Is theillustrative application given in the Angrist article interesting

Is sbquo an Interesting Parameter As Angrist notes manyof the issues raised by his solution have nothing to do withthe endogeneity of d Whether sbquo is an interesting parame-ter is one of these at least in part Given Angristrsquos prefer-ence for linear projections his position necessarily implies thatall nonlinear models of which the latent index model (L IM)is just one example are uninteresting (Angrist is not com-pletely consistent in his position because he does propose anonlinear model with an exponential conditional mean func-tion in one section of his article that model is subject to thesame objections that he subjects the LIM tomdasharbitrary non-linearities etc) Thus his position boils down to a preferencein the binary dependent-variable case for the linear proba-bility model (LPM) over the LIM and other nonlinear mod-els This issue has been discussed for many years even in asimultaneous-equations context [eg Heckman and MaCurdy(1985) which Angrist does not reference]

The dif culty with Angristrsquos polar position on this issuemdashthat the LPM is the only model of interestmdashis that it is funda-mentally untenable The LPM is attractive because it is easyto interpret providing parameter estimates that do not requiretransformation to learn the effects of a regressor on the meanof the dependent variable It has a role to play in empiricalwork in summarizing the data as regards the conditional meanfunction and for initial explorations of the data and virtu-ally all practitioners use it for this purpose But beyond thisit has no defense Either it is equivalent to the latent-variablemodel if the model is saturated or if the model is not satu-rated imposes functional form restrictions that may not holdand may t the data worse than the latent-variable model

Thus if the model is y D 14 C sbquod C hellip gt 05 where d is asingle dummy variable regressor independent of hellip the probit(say) estimates of and sbquo map one-to-one onto the leastsquares intercept and coef cient on d in the LPM and hencethere is no gain to estimating the model either way because the

estimators are equivalent The same point extends to the caseof a multinomial d If the model is y D 14 C sbquod C xbdquo C hellip gt

05 where x and bdquo are vectors of exogenous covariates andcoef cients respectively the LPM estimate of the effect ofd on E4y5 conditional on x may be demonstrably worsethan the probit estimator of the same effect even for theobject of estimating E4ymdashd1 x5 at points 8d1 x9 observed inthe data (Angrist falls back on the weak principle that leastsquares estimates provide best linear approximations but theissue is that the true model might be nonlinear) Nothingin Angristrsquos article provides evidence that the probitor anyother latent index formulation provides an inferior estimate of6E4ymdashd11 x5 ƒ E4ymdashd01 x57 compared to the LPM for the twoestimators smooth the joint response function with respect tod and x in different ways

Although the latent-variable model has no necessary claimfor superiority in the class of all nonlinear models its pop-ularity rests on its ability to generate a wide variety of non-linearities with a relatively parsimonious speci cation arisingfrom the convolution of the latent index with the cdf of hellipExpansion of the LPM to incorporate equivalent nonlinearitiesis cumbersome and inef cient

The possible inferiority of the LPM in capturing nonlinear-ities in nonsaturated models is also important for interpolationand extrapolation to points not in the observed data Gettingthe nonlinearities right in the observed data is important ininterpolation and extrapolation if the true model is nonlinearas the binary choice model necessarily is (since y is boundedby 0 and 1) Although Angrist at one point does mentionpredictionmdashstating that the linearized models he proposes area good ldquojumping-off point for any prediction exerciserdquo a state-ment without supportmdashhe is for the most part not interestedin prediction so much as summarizing the observed data Thisis ne but unfortunately prediction outside the observed his-torical data is the pervasive concern of the policy makers whoare the ultimate consumers of applied economic research forthey must make predictions of new policies in new environ-ments on a daily basis as part of their jobs The methods pro-posed in Angristrsquos article are therefore not very useful for theformulation of new policy

It should be noted that some make the argument that theoryand economic models must play a role in guiding the formu-lation of empirical relationships used for prediction outsidethe range of the observed data That position is not taken herebecause it is not necessary to establish the potential superior-ity of the L IM over the LPM in nonsaturated models and forprediction Occamrsquos razor makes it unnecessary

It should also be noted that these issues have nothing todo with causal effects because d is assumed exogenous Thusthe opposition that Angrist poses between L IMrsquos and ldquocausaleffectsrdquo models is a false one at least in general If a ldquocausaleffectrdquo is de ned as the true effect of a variable d on onlythe rst moment of a variable y (a rather restricted de nitionof causal effect) that causal effect can be derived from thelatent-variable model just as well The issue is instead merelynonlinearity of the function E4ymdashd1 x5

Angrist also states that the L IM requires ldquoconstant coef- cientrdquo and ldquodistributional assumptionsrdquo for identi cation

22 Journal of Business amp Economic Statistics January 2001

unlike the LPM Angrist does not make suf cient quali ca-tions to these assertions and they are indeed false for theL IM A random coef cient sbquo in the LIM is permissible with-out distributional assumption ( Ichimura and Thompson 1998)and the coef cients in the standard constant-coef cient L IMare identi ed semiparametrically (ie under unknown distri-bution of hellip) under the restriction that d and hellip are independentan assumption generally made for causal interpretation of theLPM model as well (Manski 1988 Horowitz 1993 Powell1994) In addition as already noted in a fully saturated modelthe LPM and the L IM are equivalent in any case

Is the LATE an Interesting Statistic The Angrist assertionthat LATE is the only interesting statistic if d is endogenousor perhaps the only statistic worth bothering about because itis the only one identi ed by the data has the same partialvalidity as his preference for the LPM over the LIM IV isone of the most popular techniques in applied econometricsand has a natural intuition indeed one that does not requirethe LATE interpretation per se It is one of the most usefultools in the applied economistrsquos kit The simple IV estimatordiscussed by Angrist makes minimal assumptions and givesminimal information back to the analyst as a result But tosay that it produces the only statistic of interest does not havedefense both because it is equivalent to an LIM that is fullysaturated and may t the data worse than an LIM if not satu-rated and because it implies that there is no value to makingadditional assumptions to obtain additional information

The limiting nature of the LATE statistic is again in itsinattention to nonlinearities interpolation and extrapolationThat nonlinearities can be important is as true in this caseas it was in the exogenous case just discussed when thereare additional x covariates and when the model is not satu-rated As for interpolation and extrapolation the LATE statis-tic 6 Ny4z D z15ƒ Ny4z D z057=6 Nd4z D z15ƒ Nd4z D z057 denotes theeffect on a change in z from value z0 to z1 on E4y5 scaledby the change in the E4d5 It does not have implications forthe effect on E4y5 of any other change in z or for a changein any other policy variable Thus it is not particularly usefulfor policy changes other than a change of z0 to z1 in the sameenvironment (ie conditioned on the same x) This stands incontrast to an L IM for E4dmdashx1 z5mdashor any parametric modelfor d for that mattermdashthat allows for the change from z0 toz1 to inform policy makers and others of the likely changesof other values of z and of other variables x In fact theLATE statistic which is not a parameter in the usual senseof the word can always be expressed as a nonlinear combi-nation of parameters of a latent-variable model but not viceversa hence the latter model is more general than the former[See Heckman and Vytlacil (1999) for a discussion of how toexpress the LATE in terms of an L IM see also Angrist andImbens (1999) and Heckman (1997 1999) for a discussion ofsome of these issues]

In addition Angristrsquos focus on the advantages of the LATEfor model-free estimation of the effect of a change in z fromz0 to z1 on y reveals a philosophical weakness in his posi-tion This is because the effect of a change in z from z0 toz1 on y can be estimated from the reduced form the struc-tural form is not needed It is not explained why the statistic

6E4ymdashz D z15 ƒ E4ymdashz D z057 is not the only quantity of inter-est The answer that most economists would give is to say thatstructural coef cients are of interest because they can be usedto extrapolate the effects of changes in d on y to other changesin z than from z0 to z1 in the same environment indeedthis is the basic rationale for structural estimation given inthe famous essay by Marschak [1953 see also Christ (1994)and Heckman (2000) for discussions Marschakrsquos argumentwas more subtle than this arguing that restrictions on struc-tural parameters are needed for out-of-sample prediction butthis translates into restrictions on the reduced form as well]However by ruling out the possibility of learning about anyof the effects of changes in z or d other than those inducedby a change in z from z0 to z1 Angrist removes the need todo structural estimation in the rst place Angrist implicitlyassumes that a structural coef cient of interest exists on thevariable d in the y equation and that that coef cient has mean-ing independent of a particular change in z but the rest of hisdiscussion contravenes that interest

A reduced-form research program which is where theAngrist position leads is of considerable value There is noth-ing wrong with a research program to collect a large body ofinformation on the effects of a wide variety of changes in dif-ferent policy variables 4z5 from particular values (z0) to otherparticular values (z1) each taking place in particular environ-ments (x) But if the research program stops there very littleuseful has been learned other than a collection of facts aboutthe effects of particular policies in particular environments

Is the Application in the Angrist Article Interesting TheAngrist article uses a recently popular exclusion restriction ornatural experiment to identify the effect of fertility on laborsupply and to illustrate his preferred methodsmdashnamely theuse of twins The use of twins as an exclusion restrictionstands in contrast to variations in a government policy or lawwhich are often used as exclusion restrictions both in recentwork on natural experiments as well as a much older litera-ture that uses cross-sectional and overtime variation in state-or country-speci c taxes and transfer rules to identify modelparameters However although it can be of interest to estimatereduced-form policy effects 6E4ymdashz D z15 ƒ E4ymdashz D z057 asjust discussed it is not so obvious that there is any interest inestimating the effects of having twins on the expected valueof y Creating twins is not a variable directly subject to policymanipulation

The dif culty in the use of twins arises for the same reasonalready discussedmdashnamely that the preference for model-freeestimation of policy effects leads to a lack of interest in thefunction E4dmdashz1 x5 and hence to a lack of interest in what canbe learned from a study of twins about the effects of someother more relevant policy variable that might be manipulatedA model for the E4dmdashz1 x5 is needed to make that connectionand hence to make a study of twins of any interest yet that isintentionally eschewed in the approach proposed by AngristOnce again this leads to an uninteresting and quite limitingset of exercises

Conclusions The set of methods laid out in the Angristarticle at least if a more relevant instrument were used yieldsa set of model-free statistics that are suitable for exploratorywork on a research question prior to the (possibly nonlinear)

Mullahy Comment 23

structural estimation and prediction that should be the ultimateobject of applied economics research

ACKNOWLEDGMENTS

This comment is a revision of remarks delivered at theJoint Statistical Meetings Baltimore August 1999 I thankCarl Christ James Heckman Joel Horowitz Michael KeaneThomas Mroz Geert Ridder and Edward Vyltacil for com-ments with the usual disclaimer that the views expressed hereshould not be taken as representing those of any of theseindividuals

ADDITIONAL REFERENCES

Angrist J and Imbens G (1999) Comment on ldquo Instrumental VariablesA Study of Implicit Behavioral Assumptions Used in Making Pro-gram Evaluationsrdquo by J J Heckman Journal of Human Resources 34283ndash827

Blundell R and Smith R (1993) ldquoSimultaneous Microeconometric ModelsWith Censored or Qualitative Dependent Variablesrdquo in Handbook of Statis-tics (Vol 1) eds G S Maddala C R Rao and H D Vinod AmsterdamNorth-Holland pp 117ndash141

Christ C (1994) ldquoThe Cowles Commissionrsquos Contributions to Econometricsat Chicago 1939ndash1955rdquo Journal of Economic Literature 32 30ndash59

Heckman J (1997) ldquoInstrumental Variables A Study of Implicit BehavioralAssumptions Used in Making Program Evaluationsrdquo Journal of HumanResources 32 441ndash462

mdashmdashmdash (1999) ldquo Instrumental Variables A Reply to Angrist and ImbensrdquoJournal of Human Resources 34 828ndash837

mdashmdashmdash (2000) ldquoCausal Parameters and Policy Analysis in Economics ATwentieth Century Retrospectiverdquo Quarterly Journal of Economics 11545ndash97

Heckman J and MaCurdy T (1985) ldquoA Simultaneous Equations LinearProbability Modelrdquo International Economic Review 18 21ndash37

Heckman J and Vytlacil E (1999) ldquoLocal Instrumental Variables andLatent Variable Models for Identifying and Bounding Treatment EffectsrdquoProceedings of the National Academy of Sciences 96 4730ndash4734

Horowitz J (1993) ldquoSemiparametric and Nonparametric Estimation of Quan-tal Response Modelsrdquo in Handbook of Statistics (Vol 11) eds G SMaddala C R Rao and H D Vinod Amsterdam North-Holland pp45ndash72

Ichimura H and Thompson T (1998) ldquoMaximum Likelihood Estimation ofa Binary Choice Model With Random Coef cients of Unknown Distribu-tionrdquo Journal of Econometrics 86 269ndash295

Manski C (1988) ldquoIdenti cation of Binary Choice Modelsrdquo Journal of theAmerican Statistical Association 83 729ndash738

Marschak J (1953) ldquoEconomic Measurements for Policy and Predictionrdquoin Studies in Econometric Method eds W Hood and T KoopmansNew York Wiley pp 1ndash26

Powell J (1994) ldquoEstimation of Semiparametric Modelsrdquo in Handbook ofEconometrics (Vol 4) eds R Engle and D McFadden Amsterdam North-Holland pp 2443ndash2521

CommentJohn Mullahy

Departments of Preventive Medicine and Economics University of Wisconsin Madison WI 53705 andNational Bureau of Economic Research Cambridge MA (jmullahyfacstaffwiscedu)

Articles bearing titles containing phrases like ldquoSimpleStrategies for Empirical Practicerdquo are often wolves masquerad-ing as sheep the methodologies they devise being far fromldquosimplerdquo and far removed from what most humble practition-ers perceive in the realm of ldquoempirical practicerdquo Not so hereJoshua Angrist has written a tight comprehensive article thatis stimulating and important yet also eminently useful

As typi es much of Angristrsquos work the main concern in thisarticle is on how to elicit interesting characterizations of causaleffects from microdata with the particular twist here being afocus on outcomes measured as ldquolimited dependent variablesrdquo(LDVrsquos) The main take-away message I glean from this arti-cle is that applied analysts working on causal effect or struc-tural analyses in LDV contextsmdashtraditionally vexing contextsinsofar as consistent estimation and inference are concernedmdashhave considerable grounds for optimism Angrist lays out andinterprets systematically a set of issues and methods that pro-vide practitioners with a variety of implementable strategiesthat might be brought to bear on such empirical problemsA corollary take-away message is that in some respects ldquothisstuff is not really as hard as wersquove tended to make itrdquo withAngrist demonstrating for instance the potential merits ofsimple linear instrumental variable ( IV) methods for estimat-ing causal effects in a variety of LDV contexts In no eventcan applied analysts escape the requirement of nding theo-retically sound instruments but Angrist exposits compellingly

a broad description of how such instruments might be used toelicit interesting causal inferences in LDV contexts

I have no real quibbles with any of the substance ofAngristrsquos arguments Rather I will devote my commentarymainly to amplifying and expanding several of the themes hedevelops throughout the article

1 FOCUS ON CAUSAL OR PARTIAL EFFECTSVERSUS FOCUS ON CONDITIONAL

EXPECTATIONS FUNCTIONS

It seems fair to suggest that much of applied microecono-metrics is concerned primarily with understanding the signsand magnitudes of quantities like iexclE6Y mdashX1 D7=iexcl4X1 D5 oriexclE6Y mdashX7=iexclX0 Yet much of the actual dirty work in under-taking causal analysis in LDV contexts seems to result fromdecisions to undertake analyses in settings where E6Y mdashX1D7

or E6Y mdashX7 are restricted to be positive without a priori restric-tions on parameter values [thus the tradition of using tobit-class conditional expectations functions (CEFrsquos) two-stepselection models exponential CEFrsquos etc]

As practitioners however we should pause to assesswhether speci cations akin to E6Y mdashX1D7 D exp4Xsbquo C D5 gt

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

24 Journal of Business amp Economic Statistics January 2001

0 are ultimately buying anything important with respect tothe rst-order questions being explored It is only com-mon sense that analysts should spend relatively more energyworking on understanding quantities of primary interest(eg iexclE6Y mdashX1D7=iexcl4X1 D5) and worrying relatively lessabout the formulation and implications of assumptions (egE6Y mdashX1 D7 gt 0) that may have little or no ultimate bearingon the particular questions being addressed This is espe-cially so when the latter effort (restricting via functional formchoice E6Y mdashX1 D7 gt 0) will tend to complicate rather thanfacilitate the former effort (understanding the partial or causaleffects) This is not to suggest that quantities like E6Y mdashX7 andE6Y mdashX1 D7 may not themselves be interesting to estimate (egmodeling conditional mean health care expenditures for use inforecasting future expenditure levels) but whether E6Y mdashX1D7

is parametrically and parsimoniously better approximated byexp4XsbquoC D5 or by XsbquoC D or by some other g4Xsbquo CD5

or h4X1D1 ƒ5 ismdasheven in LDV settingsmdashultimately an empir-ical matter to be assessed by goodness of t (eg via condi-tional moment tests)

I note this important angle somewhat sheepishly sincein both articles of mine that Angrist cites (Mullahy 19971998) there is considerable emphasis on the use of exponen-tial CEF speci cations as enforcers of positive conditionalmean ldquorequirementsrdquo In fact the main motivation for theearlier article was to nd a general approach to obtainingconsistent estimators of structural parameters in parametricsettings less distribution-bound than tobit-class models whenthe key requirement or side constraint is E6Y mdashX1 D7 gt 0 withthe resulting strategy a more or less brute-force method foraccomplishing this in a nonlinear IV setting Angristrsquos arti-cle takes this idea quite a bit further and demonstrates how(and with the addition of a further normalization) the struc-tural parameter estimation approach I discussed can be devel-oped into a model to analyze causal effects when the latter arecharacterized as ldquoproportional treatment effectsrdquo

Whether a proportional as opposed to an additive treatmenteffect is interesting (it may or may not be) remains to be seenin any particular application Wooldridgersquos recent work onaverage causal effects (ACErsquos Wooldridge 1999) would seempertinent in contexts in which the role of the X covariates ismore ldquointrusiverdquo than in the simple additivelinear treatment-effect setup As a general matter the extent to which infer-ences about causal effectsmdashhowever formulatedmdashhinge onthe inclusion of particular covariates as conditioning regres-sors is an interesting and potentially important angle thatAngristrsquos article begins to explore and is one that likely tomerit additional research For the particular issues at handthe very nature of proportional treatment effects brings intofocus explicitly the role of the X rsquos so how their correlationwith the treatment indicators 4D5 in uences inferences aboutthe causal impacts of the latter would seem to be a rst-orderconsideration

2 TWO-PART MODELS

I found Angristrsquos analysis of the interpretation of causaleffects in two-part models to be extremely illuminatingIn a reduced-form setting I argued (Mullahy 1998) that amid

all the concerns about zeros robustness outliers transforma-tion retransformation and such that have typically attendedmodeling efforts involving two-part models it would be sur-prising in a well-structured empirical investigation if therewere not stated or lurking concerns about partial effectsiexclE6Y mdashX7=iexclX andor CEFrsquos E6Y mdashX7 themselves It may be insome cases that such concerns are not articulated and it mayeven be in less well-structured problems that they are not obvi-ous to analysts themselves Yet for instance unless robustlyestimated sbquorsquos from a log-linear part-2 speci cation of a two-part model could inform a rst-order question about a par-tial effect or a CEF of what practical use were they likelyto be My rather modest and rather obvious argument wasjust that the concerns about the iexclE6Y mdashX7=iexclX or the E6Y mdashX7mdashif they are the reason the analysis is being conducted in the rst placemdashshould enjoy rst-order prominence in the estima-tion exercise and that concerns about zeros outliers and suchshould be relegated in some sense to second-order status Insome cases two-part models will serve nicely to address such rst-order concernsmdashat least in reduced-form settingsmdashbut inothers they will not

Angrist appears to have some sympathy with such argu-ments but more importantly he provides a valuable serviceto users of two-part models by unearthing one major limi-tation in the analysis of causal effects The problem is notwith part 1 (ie the logit probit or linear probability com-ponent) because part 1 of the two-part model falls within themain lines of Angristrsquos LDV analysis Rather the complicationarises with part 2 of the two-part model in which the additionalldquoY gt 0rdquo conditioning arises In the standard (reduced-form)two-part model quantities like E6Y mdashX1Y gt 07 are prominentand identi ed readily But in the counterfactual settings inwhich causal effects are manifested wherein the realized Y

arises from self-selection into or out of treatment such selec-tion effects introduce an ambiguity [Eq (12)] and therebyconfound the ability to glean causal effects from part-2 esti-mates (unless as Angrist notes censored regression methodsare used but Angrist also offers some compelling argumentsagainst blind reliance on censored regression approaches) Assuch if estimation of and inference about causal effects arethe primary analytical concerns in data settings with Y para 0and Pr4Y D 05 gt 01 analysts may be well advised to avoidtwo-part modeling strategies and pursue some of the moredirect linear and nonlinear estimation approaches discussedby Angrist in which one-part estimation approachesmdashzerosand allmdashyield estimates of causal effects that will be directlyinterpretable

3 BEYOND CEFrsquoS

One noteworthy feature of the methods Angrist discussesmdashwith the main results attributed to Abadie (1999)mdashare theirapplicability to estimation of causal effects for characteristicsof the conditional distribution rdquo4Y mdashX1D5 beyond just CEFrsquosEstimation of causal or treatment effects for conditional quan-tiles (QTErsquos) and conditional distribution function ordinates(ldquodistilesrdquo) Pr4Y lt cmdashX1D5 is demonstrated to t properlywithin the weighting strategies advanced by Abadie Since

Todd Comment 25

quantile and distile analyses may be more relevant in address-ing particular analytical questions or policy issues than CEFrsquosthe ability to extend the mechanisms of causal-effect esti-mation strategies in such directions should be welcomed byapplied researchers

I might suggest that Angristrsquos claim about an advantage ofquantile causal analysis over distile causal analysis being thatthe latter relies on application-specic values of ordinates (c)is perhaps oversold or underdeveloped In many interestingand policy-relevant applications focus on the application-speci c ordinates is fundamental What factors cause employ-ers to offer more than one health insurance plan to employeesWhat factors cause consumers of alcoholic beverages to drinkmore than two drinks per day What factors cause elderlyindividuals to utilize more than one prescription drug prod-uct What factors cause second earners to work more than17 hours per week Each of these concerns an importantreal-world causal question for which the application-specicordinatemdashwhether for institutional legal physiological orother reasonsmdashis a fundamental ldquopivot pointrdquo for the analysisAnalysis of such distile relationshipsmdashwhether in causal or inreduced-form settingsmdashis a potentially powerful method forinforming decision makers and it seems to me that its mer-its relative to quantile analyses would have to be judged on acase-by-case basis

4 CONCLUSIONS

Practitioners of applied microeconometrics will nd thatthe time invested in a careful reading of Angristrsquos article hassizable returns not merely because of the analytical and con-ceptual insights it offers but ultimately also because it sug-gests a variety of important practical innovations that appliedresearchers can exploit empirically to more clearly understandthe nature of causal relationships in their data Applied microresearchers in areas like health labor development public nance and such work commonly with data wherein (a) out-come measures are limitedmdashoften most importantly becausethey are nonnegative with mass points at 0mdashand (b) the rolesof conditioning covariates (X rsquos) are most conveniently summa-rized via regression methods Angristrsquos article provides appliedresearchers so endowed with a set of powerful tools for elic-iting causal inferences from such data Of course the punchline is familiar and inevitable All bets are off without theo-retically solid instruments But with such instruments in handthe strategies exposited by Angrist offer analysts a rich set ofperspectives on estimation

ADDITIONAL REFERENCE

Wooldridge J M (1999) ldquoEstimating Average Partial Effects Under Con-ditional Moment Independence Assumptionsrdquo mimeo Michigan StateUniversity Dept of Economics

CommentPetra Todd

Department of Economics University of Pennsylvania Philadelphia PA 19104 (petraathonasasupennedu)

This article considers ways of estimating the effect ofbinary endogenous regressors in models with limited depen-dent variables It questions the usefulness of conventionalestimation strategies aimed at recovering structural modelparameters and advocates the use of simple instrumental vari-ables ( IV) estimators as an alternative on the grounds thatthese estimators invoke weaker assumptions and often suf ceto answer questions of interest in empirical studies In theauthorrsquos view the main challenge facing empirical researchersis the problem of identi cation of ldquotreatment effectsrdquo throughIV Here I expand the discussion by considering two ques-tions (1) What are the limitations of the causal model asa paradigm for policy analysis (2) When simple estimatorsare suitable for answering a question of interest what are thetrade-offs that need to be considered in using them

1 WHEN IS THE ldquoCAUSAL MODELldquoA USEFULPARADIGM FOR POLICY ANALYSIS

The article gives the impression that most interesting ques-tions in economics can be answered within the context ofthe ldquocausal modelrdquo [variously attributed to Neyman (1923)Fisher (1935) Cox (1958) Roy (1951) and Rubin (1978)]The causal model is a very general framework that assumesthat there are potential outcome states associated with having

received some treatment and having received no treatment (orpossibly a different treatment) For each individual only oneof the potential outcome states is observed which leads to amissing-data problem in attempting to draw inferences aboutaspects of the treatment effect distribution

The language of potential outcomes is very general soalmost any economic problem could be formulated in theseterms However this does not mean that estimators proposedin the literature for the causal model are useful for all or evenmost economic problems A major limitation of the modelis that it assumes that the state ldquowith treatmentrdquo has beenobserved for at least a subset of people In medical trials orbiological experiments this assumption is probably reason-able but in economics we often are interested in evaluatingeffects of treatments that have not yet been implemented Forexample we might be interested in predicting the effect ofraising the age receiving Social Security bene ts to a new ageor the effect of introducing new term limits on welfare par-ticipation An implicit assumption needed to apply any of theidenti cation results described in this article is that both Y1

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

26 Journal of Business amp Economic Statistics January 2001

and Y0 are observed which would not be satis ed in manycases Thus the causal model and the estimators developedfor it are inadequate when it comes to any economic questioninvolving a state of the world that has not been observed

Despite these limitations treatment effect estimates aresometimes used to predict consequences of new treatments orof existing treatments for new populations When the policychange being considered is very close to changes observedin the data and is to be applied to similar individuals thenextrapolation may be reasonable But if the policy change is(a) of a magnitude outside the range of experience (b) alonga new dimension or (c) intended for a new population then itis an open question as to how reliable the estimates would beTreatment impact estimates are not intended to capture any-thing invariant to changing conditions The assumptions thatwould be required to justify extrapolations or generalizationsare often quite strong To some extent it is an illusion thatthese estimators allow for general kinds of inference underweak assumptions because the assumptions required to justifytheir application for particular purposes are often not madeexplicit

Angrist argues that the identi cation results for the localaverage treatment effect (LATE) estimator provide a ldquofounda-tion for credible causal inferencerdquo (p 5) and a ldquominimum con-troversy jumping-off point for any prediction exerciserdquo (p 5)But as discussed previously the causal model maintains sucha high level of generality that it offers little guidance for manykinds of problems It is important to recognize when and whennot the LATE estimator is likely to be useful This articleclaims that ldquoin practice estimates of LATE differ little fromestimates based on the stronger assumptions invoked to iden-tify effects on the entire treated populationrdquo (p 5) But suchan extrapolation is not justi ed under the theory and is surelyunlikely to hold in all circumstances As shown in a num-ber of works by various authors LATE provides the averageeffect of treatment for the group of ldquocompliersrdquo who are peo-ple induced by the instrument to receive treatment This groupdoes not except under very special circumstances correspondto the group of treated persons and may in fact representonly a small fraction of persons receiving treatment So notonly does LATE not suf ce for answering problems of the sortdescribed by (a) (b) and (c) it also does not generally pro-vide the average effect of treatment on the treatedmdashthe mostcommon parameter of interest in analyzing social programsAnother not-so-attractive feature of the estimator is that thepopulation of ldquocompliersrdquo is usually not identi able so wecannot say exactly whose treatment impact is being estimatedApplying the LATE parameter estimate to the full populationof treated persons would generally require ruling out certaintypes of heterogeneity in individual responses to treatmentas discussed by Heckman (1997) The explicit assumptionsthat could justify this type of extrapolation were shown byHeckman and Vytlacil (2000 in press)

2 ESTIMATORS SHOULD BE TAILOREDTO THEIR APPLICATION

A general theme throughout the article is that wheneverpossible researchers should restrain from imposing paramet-ric restrictions in estimation because the structure they impose

may be wrong resulting in biased estimates This view isexpressed for example in the discussion of the two-partmodel The article advocates the use of a two-part approachinstead of a conventional selection-model approach maintain-ing that the two-part approach is less likely to lead to biasedparameter estimates because it does not impose cross-equationrestrictions as a conventional tobit estimator would

How do researchers decide whether or not to impose restric-tions across estimating equations that are structurally linkedThe most agnostic approach would be to estimate both equa-tions completely nonparametrically but this approach is rarelypractical Nonparametric estimators are consistent under themost general conditions but in conventional size samples thestandard errors would be so large that the estimates would forpractical purposes be useless Researchers impose structurebecause they are willing to restrict the class of models underwhich their estimators are consistent in exchange for greaterprecision These ef ciency considerations would also apply inthe deciding whether or not to impose cross-equation restric-tions in estimating simultaneous-equations models If we areright about the structure there is an ef ciency gain fromimposing the restrictions If we are wrong imposing them maylead to bias When there are overidentifying restrictions it isat least possible to develop tests of the model speci cation thatcan serve as a guide in choosing a structural model that is sup-ported by the data This article presents a dichotomy betweenthe two-part model and a fully parametric traditional selec-tion speci cation but there are many other modeling choicesranging from fully nonparametric to semiparametric to fullyparametric In modest size samples ef ciency is an impor-tant consideration and a parametric model may be the mostappropriate one For very large samples fewer parametricapproaches are feasible Both the probability-of-working andthe hours-worked equations could for example be estimatedby a semiparametric method such as semiparametric leastsquares ( Ichimura 1993) Which estimation method is mostappropriate ultimately depends on how much data is avail-able and how many parameters are to be estimated It doesnot make sense to criticize the use of structured approaches infavor of more exible modeling approaches without regard tothe context in which they are being applied

3 ON THE GAP BETWEEN THEORY ANDEMPIRICAL PRACTICE

Finally another recurring emphasis in the article is on thevalue of shortcuts in empirical practice In the authorrsquos viewthere is a gap between the econometric theory and what isfeasible in practice and shortcuts and approximations help tobridge this gap To this end the article extols for examplethe virtues of ordinary least squares (OLS) as a way of approx-imating any unknown conditional expectation function with-out much concern as to whether a best linear approximation isa good approximation If the model has discrete regressors andis fully saturated it is well known that OLS is equivalent to anonparametric estimator But in other cases a linear approxi-mation could be highly biased and the bias will not go awayas the sample size gets large There are a variety of alterna-tive asymptotically unbiased estimators Along similar lines

Angrist Reply 27

the article suggests that in implementing a ldquoselection-on-observablesrdquo estimator it may be all right simply to ignore theweighting scheme Such claims seem to me fairly outrageousbecause failure to take into account weighting could lead tovery different estimates and would not provide a consistentestimator of the parameter of interest Heckman Ichimura andTodd (1998) discussed alternative estimators that are not dif -cult to implement and take into account the weighting scheme

Finally the appendix presents a variety of shortcuts forobtaining estimates and standard errorsmdashsome of which leadto inconsistent estimates When shortcuts lead to very crudeapproximations or inconsistent estimates I question their use-fulness I do think it unreasonable to expect that estimatesbased on simple IV estimators be accompanied by correctstandard errors that correctly take into account the in uence ofestimated parameters of estimated regressors I see little valuein reporting incorrect standard-errors estimates on the groundsthat it saves time in computation especially when there areseveral works long available in the literature that develop sim-ple ways of solving these kinds of problems (eg see Newey1984) Although there is surely a gap between the estimatorsdeveloped in the theoretic econometric literature and the esti-mators used in the empirical literature the gap is not as greatas this article supposes

ADDITIONAL REFERENCES

Cox D R (1958) The Planning of Experiments New York WileyFisher R A (1935) Design of Experiments London Oliver and BoydHeckman J J (1997) ldquo Instrumental Variables Evidence From Evaluating a

Job Training Programmerdquo Journal of Human Resources 32 441ndash462Heckman J Ichimura H and Todd P (1997) ldquoMatching as an Econometric

Evaluation Estimator Theory and Evidence on its Performance Appliedto the JTPA Program Part I Theory and Methodsrdquo Review of EconomicStudies 64 605ndash654

mdashmdashmdash (1998) ldquoMatching as an Econometric Evaluationrdquo Review of Eco-nomic Studies 65 261ndash294

Heckman J and Vytlacil E (2000) ldquoCausal Parameters Structural Equa-tions Treatment Effects and Randomized Evaluations of Social Programsrdquounpublished manuscript University of Chicago Dept of Economics

mdashmdashmdash (in press) ldquoLocal Instrumental Variablesrdquo in Nonlinear StatisticalInference Essays in Honor of Takeshi Amemiya eds C Hsiao KMorimune and J Powell Cambridge UK Cambridge University Press

Ichimura H (1993) ldquoSemiparametric Least Squares (SLS) and Weighted SLSEstimation of Single Index Modelsrdquo Journal of Econometrics 58 71ndash120

Neyman J (1923) ldquoStatistical Problems in Agricultural Experimentsrdquo Sup-plement to the Journal of the Royal Statistical Society 2 107ndash180

Newey W (1984) ldquoA Method of Moments Interpretation of Sequential Esti-matorsrdquo Economic Letters 14 201ndash206

Roy A (1951) ldquoSome Thoughts on the Distribution of Earningsrdquo OxfordEconomic Papers 3 135ndash146

Rubin D (1978) ldquoBaysian Inference for Causal Effects The Role of Ran-domizationrdquo The Annals of Statistics 6 34ndash58

ReplyJoshua D Angrist

I thank the discussants and the session organizer outgoingJBES editor Jeff Wooldridge who will be missed by associateeditors and JBES authors alike The comments in this sessionoffer a variety of viewpoints some sympathetic others morecritical I am pleased to be part of this stimulating and con-structive exchange and look forward to more interaction ofthis type in the future

Hahn My article starts by suggesting that a focus ontreatment effects simpli es the problem of causal inferencein limited dependent variable (LDV) models with endoge-nous regressors Endogeneityrsquos half-sister is the assumptionof unobserved individual effects since the omitted-variablesmotivation for xed effects in panel data is similar Hahnasks whether a shift in focus from index parameters to aver-age causal effects might also simplify the notoriously dif -cult problem of working with nonlinear models for binarypanel data (eg as in Card and Sullivan 1988) Hahn showsthat here too the causal-effects framework pays off demon-strating that average causal effects are easily estimated in a xed-effects probit model with a binary regressor even thoughthe probit coef cient is not A second noteworthy featureof Hahnrsquos comment is his observation that identi cation inthis framework turns on careful modeling of the relationshipbetween the assignment variable and the xed effects Tradi-tional econometric models are primarily concerned with thestochastic process generating outcomes A shift of attentiontoward modeling the assignment mechanism is consistent with

the notion that observational studies should try to mimic thesort of controlled comparisons generated by experiments

Imbens Imbens begins by endorsing the view that the sub-stantive goals of empirical research are better served by thepotential outcomescausal framework than by structural mod-eling Going one step further Imbens notes that the problemswith ldquomodel- rst econometricsrdquo are not limited to LDVrsquos Inthe same spirit he sensibly wonders (as did Wooldridge 1992)why so much attention has been devoted to estimating theBoxndashCox transformation model which does not actually iden-tify E6Y mdashX7 without a distributional assumption Imbens alsonicely articulates the rationale for looking at the twins exam-ple and notes that this is similar to the rationale for extrap-olating from clinical trials in narrow subpopulations And infact I believe the ldquotwins experimentrdquo is likely to have pre-dictive value in a variety of situations though any extrapo-lation will probably be improved by control for demographiccharacteristics

In spite of (or perhaps because of) our years of discussionand collaboration Imbens and I still do not agree on certainthings He feels strongly that inference with discrete outcomesshould use models and procedures that respect inherent non-

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

28 Journal of Business amp Economic Statistics January 2001

linearities The HIRZ procedure discussed in his comment isan intriguing way to do this and my article discusses othersI remain unconvinced however that nonlinearity is of majorsubstantive importance in empirical work with binary regres-sors The proof of the pudding I suppose would be to showthat H IRZ or some of the methods I discuss in the article leadto better predictions or decisions than the linear workhorse

Mof tt Mof t provides a valuable counterpointHe beginsby arguing that latent-index models have more predictivepower than reduced-form causal models Such claims seemhard to establish a priori though I do not doubt that in somecases this may be true But empirical strategies in which thebulk of the research effort goes into estimating index coef -cients may fail to answer the most basic question in empiricalwork What happened here Causal models and local aver-age treatment effect (LATE) answer this question while indexcoef cients alone are not suf cient for causal inference Iagree however that structural modeling can be useful forgoing beyond basic questions and I have used index and otherstructural models for this purpose Mof t is right howeverin suggesting that I nd the model-free analysis he refers toas ldquoexploratoryrdquo more interesting than structural extrapolationThis seems to me to be where the potential for real discoverylies

Mof tt asks tough but fair questions about the twins appli-cation He is correct to point out that twinning is not a manip-ulable policy instrument That is one reason I do not limit theanalysis to reduced-form effects of multiple births Yet to mymind the best evidence we have on the controversial questionof the consequences of out-of-wedlock childbearing comesfrom the twins experiment (Bronars and Grogger 1994) Andthe economic consequences of childbearing are of interest inmany other contexts As noted in the introduction to my arti-cle the twins experiment can help determine whether fertilityreductions are responsible for trend increases in female laborsupply Beyond this sort of question the twins results seemuseful for assessing the economic consequences of increasedabortion access in the United States the dissemination of con-traceptive implants in developing countries (eg see KaneFarr and Janowitz 1990) and various programs to encour-age or require implant use by welfare mothers (Haveman andWolfe 1993)

Finally it is fair to question the external validity of twinsestimates as Mof t does toward the end of his commentAngrist and Evans (1998) showed that the labor-supply con-sequences of twinning can be reconciled with estimates ofthe labor-supply consequences of childbearing resulting froma very different natural experiment induced by preferences formixed-sex sibling pairs This suggests that results from bothexperiments may be quite robust and may even be ldquostructuralrdquoin the sense that they tell us something general about the eco-nomic response to childbearing

Mullahy Mullahy brings a welcome health economistrsquosperspective to the discussion Health economists have longgrappled with the issues discussed in my article and asI hope I made clear I drew ideas and inspiration fromMullahyrsquos work on models with nonnegative dependent vari-ables Mullahyrsquos comment further highlights exactly the sort

of utilitarian trade-off that underlies the choice of empiricalstrategy On one hand we would like our models to be con-sistent with any functional form restrictions known to be trueOn the other attempts at this may complicate estimation andinference for little payoff Worst of all the technical complex-ity of nonlinear models seems to cause authors to ldquowax econo-metricrdquo an effort that may come at the expense of attention tosubstantive issues of importance (an example here is the longseries of papers discussing exactly how data from the RANDhealth-insurance experiment should be analyzed) Mullahyrsquosheretical discussions of ldquoone-part modelsrdquo here and in earlierwork represent a refreshing effort to keep the empirical trainon track Mullahyrsquos point that application-specic distributionordinates can provide more useful summary information thanquantiles is also clearly worth thinking about

Todd Todd provides a useful and clear overview of thecausal framework which she suggests is better suited formedical questions than economic questions The thrust of herargument is that there is too much heterogeneity in economicinterventions and outcomes for the causal approach to bevery useful for economists But my perception is that medicalresearch is even more explicitly concerned with heterogene-ity than economics This is evidenced by the many quali ca-tions regarding patient characteristics conditions and inter-actions in medical treatment protocols A difference betweenthe prevailing medical research paradigm and the traditionaleconometric perspective however is whether the question ofheterogeneous response is to be addressed by better theoryor more evidence Of course theory helps us decide whereto look for heterogeneity but as Rosenbaum (1999 p 301)put it ldquo If a treatment has substantially different effects in dif-ferent types of people then we need to discover this (ital-ics mine) Rosenbaum also noted that ldquorandomized clinicaltrials routinely look for heterogeneous treatment effects andthey do this without representative samples from nationalpopulationsrdquo

On the technical side Todd suggests I advocated use ofthe two-part model because it exibly approximates nonlinearfunctional forms for nonnegative outcomes Although I notedthe exibility the upshot of my discussion was that the two-part model is poorly suited to causal inference because of thedif culty of interpreting part 2 Finally Todd points out thefailure to allow for sampling variance in the scaling factorsused to generate effects for nonlinear models This is indeed ahard-to-defend shortcut though I shall take a stab at defenseby noting that randomness of the scaling factor which typi-cally estimates a sample average is likely to be a minor sourceof sampling uncertainty

ADDITIONAL REFERENCES

Card D E and Sullivan D (1988) ldquoEffects of Subsidized Training onMovements In and Out of Employmentrdquo Econometrica 56 497ndash530

Haveman R and Wolfe B (1993) ldquoChildrenrsquos Prospects and ChildrenrsquosPolicyrdquo Journal of Economic Perspectives 7 153ndash174

Kane T Farr G and Janowitz B (1990) ldquo Initial Acceptability of Con-traceptive Implants in Four Developing Countriesrdquo International FamilyPlanning Perspectives 16 49ndash54

Rosenbaum P R (1999) ldquoChoice as an Alternative to Control in Observa-tional Studiesrdquo Statistical Science 14 259ndash304

Wooldridge J (1992) ldquoSome Alternatives to the BoxndashCox RegressionModelrdquo International Economic Review 33 935ndash955

Page 3: New Estimation of Limited Dependent Variable Models With Dummy …piketty.pse.ens.fr/…ecoineg/articl/Angrist2001b.pdf · 2011. 12. 5. · sequences of changes in contraceptive technology

4 Journal of Business amp Economic Statistics January 2001

(eg Rubin 1977 Heckman and Robb 1985 Angrist 1998)In the context of social-program evaluation the effect of treat-ment on the treated tells us whether the program was bene cialfor participants This is not the only average effect of interestwe might also care about the unconditional average effect orthe effect in some subpopulation de ned by covariates (ieE6Y1i

ƒY0imdashX7 for covariates X) Ultimately of course we are

also likely to want to extrapolate from the experiences of thetreated to as-yet-untreated groups Such extrapolation makeslittle sense however unless average causal effects in existingpopulations can be reliably assessed

Simple comparisons of outcomes by Di generally fail toidentify causal effects Rather a comparison between treatedand untreated individuals equals the effect of treatment on thetreated plus a bias term

E6YimdashDi

D 17ƒ E6YimdashDi

D 17

D E6Y1imdashDi

D 17 ƒ E6Y0imdashDi

D 07

D E6Y1iƒ Y0i

mdashDiD 17

C 8E6Y0imdashDi

D 17ƒ E6Y0imdashDi

D 0790 (2)

The bias term disappears in the childbearing example if child-bearing is determined in a manner independent of a womanrsquospotential labor-market behavior if she does not have childrenIn that case 8E6Y0i

mdashDiD 07 ƒ E6Y0i

mdashDiD 179 D 0 and simple

comparisons identify the effect of treatment on the treatedBut this independence assumption seems unrealistic becausechildbearing decisions are made in light of information aboutearnings potential and career plans

12 Structural Models

What connects the causal parameters discussed in the pre-vious section with the parameters in structural economet-ric models Suppose that instead of potential outcomes webegin with a labor-supply model for hours worked alongthe lines of many second-generation labor-supply studies (seeKillingsworth 1983 for a survey) In this setting childbearingis determined by comparing the utility of having a child andnot having a child We can model this process as

DiD 14X 0

iƒ gt Daggeri51 (3a)

where Xi is a K 1 vector of observed characteristics thatdetermine utility and Daggeri is an unobserved variable re ecting aperson-speci c utility contrast

In a simple static model labor supply is given by the com-bination of the participation decision and hours determinationfor workers Workers chose their latent hours 4yi5 by equat-ing offered wages wi with the marginal rate of substitutionof goods for leisure mi4yi5 Participation is determined bythe relationship between wi and the marginal rate of substitu-tion at zero hours mi405 Since offered wages are unobservedfor nonworkers and reservation wages are never observed wedecompose these variables into a linear function of observablecharacteristics and regression error terms (denoted vwi1 vmi) asin the article by Heckman (1974) and many others

wiD X 0

ibdquowC wDi

C vwi (3b)

and

mi4yi5 D X 0ibdquom

C ndashyiC mDi

C vmi0 (3c)

Equating (3b) and (3c) and relabeling parameters and the errorterm we can solve for observed hours

YiD X 0

ibdquo C DiC ˜i if wi gt mi405

YiD 0 otherwise0

Equivalently

YiD 14X 0

ibdquo C Di gt ƒ˜i54X0ibdquo C Di

C ˜i50 (4)

Childbearing is said to be endogenous if the unobserved errordetermining Di depends on the unobserved error in the partic-ipation and hours equations

Since the structural equations tell us what a woman woulddo under alternative values of Di they describe the same sortof potential outcomes referred to in the previous model Theexplicit link is

YiD Y0i41ƒ Di5 C Y1iDi1 (5)

where

Y0iD 14X 0

ibdquo gt ƒ˜i54X0ibdquo C ˜i5 (6a)

and

Y1iD 14X 0

ibdquo C gt ƒ˜i54X0ibdquo C C ˜i50 (6b)

Once the structural parameters are known we can use theserelationships to write down expressions for causal effects Forexample the effect of treatment on the treated is

E6Y1iƒ Y0i

mdashDiD 17

D E 14X 0ibdquo C gt ƒ˜i54X

0ibdquo C C ˜i5

ƒ 14X 0i bdquo gt ƒ˜i54X

0ibdquo C ˜i5mdashX 0

iƒ gt Daggeri90 (7)

Note however that knowledge of the parameters on the rightside of (7) is still not enough to evaluate this expression Thefollowing lemma outlines the identi cation possibilities in thiscontext

Lemma Assume that the covariates 4Xi5 are independentof continuously distributed latent errors 4Daggeri1 ˜i5 Then

1 If Daggeri is not independent of ˜i and the probability oftreatment is always nonzero the effect of treatment on thetreated is not identi ed without further assumptions (Heckman1990)

2 If Daggeri is independent of ˜i the effect of treatment on thetreated is identi ed (exogenous treatment)

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 5

3 Suppose there is a covariate denoted Zi with coef cientƒ1 in (3a) which is excluded from (3b) and (3c) Without lossof generality assume ƒ1 gt 0 Then the local average treatmenteffect (LATE) given by E6Y1i

ƒ Y0imdashX 0

iƒ C ƒ1 gt Daggeri gt X 0iƒ7 is

identi ed ( Imbens and Angrist 1994)4 Suppose that LATE is identi ed as in 3 and that P6Di

D1mdashZi

D 07 D 0 Then LATE equals the effect of treatment on thetreated E6Y1i

ƒY0imdashDi

D 17 Similarly if P6DiD 1mdashZi

D 17 D 1LATE equals E6Y1i

ƒ Y0imdashDi

D 07 (Angrist and Imbens 1991)

This set of results can easily be summarized using non-technical language First without additional assumptions theeffect of treatment on the treated is not identi ed in latent-index models Second the three positive identi cation resultsin the lemma for exogeneous treatment the LATE resultand the specialization of LATE to effects on the treated ornontreated require no information about the structural modelother than the distribution of D and Z On the other hand theLATE result for endogenous treatments in part 3 does not gen-erally refer to the effect of treatment on the treated so herethe role played by the structural model in identifying causaleffects merits further discussion

The treatment effect captures the effect of treatment on thetreated for those whose treatment status is changed by theinstrument Zi The data are informative about the effect oftreatment on these people because the instrument changes theirbehavior Thus an exclusion restriction is enough to identifycausal effects for a group directly affected by the ldquoexperi-mentrdquo at hand (Angrist Imbens and Rubin 1996 called thesepeople compliers) In some cases this is the set of all treatedindividuals while in other cases this is only a subset In anycase however this result provides a foundation for crediblecausal inference because the assumptions needed for this nar-row ldquoidenti cation-in-principlerdquo can be separated from mod-eling assumptions required for smoothing and extrapolation toother groups of interest The quality of this extrapolation is ofcourse an open question and undoubtedly differs across appli-cations In my experience however often estimates of LATEdiffer little from estimates based on the stronger assumptionsinvoked to identify effects on the entire treated population (Ofcourse I have to admit that when simple and sophisticatedestimation strategies do differ I invariably prefer the simpleFor an illustration of why this is see the example that fol-lows)

Although extrapolation is an important part of what empir-ical researchers do parameters like LATE and the effectof treatment on the treated provide a minimum-controversyjumping-off point for inference and prediction Structuralparameters are sometimes more closely linked to economictheory than average causal effects But the ultimate goal oftheory-motivated structural estimation seems to differ littlefrom the causal agenda For example Keane and Wolpin(1997 p 111) used structural models to ldquoforecast the behaviorof agents given any change in the state of the world that canbe characterized as a change in their constraintsrdquo A prereq-uisite for this is credible assessment of causal effects of pastchanges Structural parameters that are not linked to causaleffects are not useful for this basic purpose [Mullahy (1998)made a similar point] The rest of my discussion is therefore

limited to models in which the effects of interest are de neddirectly in terms of potential outcomes

2 CAUSAL EFFECTS ON LDVrsquos

21 Average Effects in Experimental Data

Does the fact that a dependent variable is binary or non-negative have any implications for empirical causal analysisA useful starting point for this discussion is the analysis ofrandomized experiments since some of the issues raised bythe presence of LDVrsquos have nothing to do with endogeneitySuppose that Di was randomly assigned or at least assignedby some mechanism that ensures independence between Di

and Y0i In this case a simple difference in means betweenthose with Di

D 1 and with DiD 0 identi es the effect of treat-

ment on the treated

E6YimdashDi

D 17 ƒ E6YimdashDi

D 07

D E6Y1imdashDi

D 17ƒ E6Y0imdashDi

D 07

D E6Y1imdashDi

D 17ƒ E6Y0imdashDi

D 17

4by independence of Y0i and Di5

D E6Y1iƒ Y0i

mdashDiD 17

4by linearity of conditional means50 (8)

If Di is also independent of Y1i as would be likely in an exper-iment then E6Y1i

ƒ Y0imdashDi

D 17 D E6Y1iƒ Y0i7 the uncondi-

tional average treatment effect (Usually this ldquounconditionalrdquoaverage still refers to a subpopulation eligible to participate inthe experiment)

Equation (8) shows that the estimation of causal effectsin experiments presents no special challenges whether Yi isbinary nonnegative or continuously distributed If Yi is binarythen the difference in means on the left side of (8) cor-responds to a difference in probabilities while if Yi has amass point at 0 the difference in means is the difference inE6Yi

mdashYi gt 01Di7P6Yi gt 0mdashDi7 But these facts have no bearingon the causal interpretation of estimates or in the absence offurther assumptions or restrictions the choice of estimators

22 Conditional-on-Positive Effects

In many studies with nonnegative dependent variablesresearchers are interested in effects in a subset of the popula-tion with positive outcomes Interest in conditional-on-positiveeffects is sometimes motivated by the following decompo-sition of differences in means discussed by McDonald andMof t (1980)

E6YimdashDi

D 17ƒ E6YimdashDi

D 07

D 8P6Yi gt 0mdashDiD 17ƒ P6Y gt 0mdashDi

D 079

E6YimdashYi gt 01Di

D 17

C 8E6YimdashYi gt 01Di

D 17ƒ E6YimdashYi gt 01 Di

D 079

P6Y gt 0mdashDiD 070 (9)

6 Journal of Business amp Economic Statistics January 2001

This decomposition describes how much of the overalltreatment-control difference is due to participation effects (iethe impact on 16Yi gt 07) and how much is due to an increasein intensity for those with Yi gt 0 For a recent example ofthis distinction see Evans Farrelly and Montgomery (1999)who analyzed the impact of workplace smoking restrictionson smoking participation and intensity

In an experimental setting the interpretation of the rst partof (9) as giving the causal effect of treatment on participationis straightforward Does the conditional-on-positive differencein the second part also have a straightforward interpretationThe large literature constrasting two-part and sample-selectionmodels for LDVrsquos suggests not (See for example DuanManning Morris and Newhouse 1984 Hay and Olsen 1984Hay Leu and Roher 1987 Leung and Yu 1996 Maddala1985 Manning Duan and Rogers 1987 Mullahy 1998)

To analyze the conditional-on-positive comparison furtherit is useful to write the mean difference by treatment status asfollows (still assuming Y0i and Di are independent)

E6YimdashYi gt 01Di

D 17ƒ E6YimdashYi gt 01 Di

D 07

D E6Y1imdashY1i gt 01Di

D 17ƒ E6Y0imdashY0i gt 01Di

D 07

D E6Y1imdashY1i gt 01Di

D 17ƒ E6Y0imdashY0i gt 01Di

D 17

D E6Y1iƒ Y0i

mdashY1i gt 01DiD 17 (10a)

C 8E6Y0imdashY1i gt 01Di

D 17ƒ E6Y0i gt 01DiD 1790

(10b)

On one hand (10a) suggests that the conditional contrastestimates a potentially interesting effect since this clearlyamounts to a statement about the impact of treatment on thedistribution of potential outcomes (in fact this is somethinglike a comparison of hazard rates) On the other hand from(10b) it is clear that a conditional-on-working comparisondoes not tell us how much of the overall treatment effect isdue to an increase in work among treated workers The prob-lem is that the conditional contrast involves different groups ofpeoplemdashthose with Y1i gt 0 and those with Y0i gt 0 Supposefor example that the treatment effect is a positive constantsay Y1i

D Y0iC Since the second term in (10b) must then be

negative the observed difference E6YimdashYi gt 07ƒ E6Yi

mdashYi gt 07is clearly less than the causal effect on treated workers whichis in the constant-effects model This is the selectivity-biasproblem rst noted by Gronau (1974) and Heckman (1974)

In principle tobit and sample-selection models can be usedto eliminate selectivity bias in conditional-on-positive com-parisons These models depict Yi as the censored observationof an underlying continuously distributed latent variable Sup-pose for example that

YiD 16Yi gt 07Y uuml

i 1 where

Y uumli

D Y uuml01

C 4Y uuml1i

ƒ Y uuml0i5Di

D Y uuml0i

C Di0 (11)

Recent studies with this type of censoring in a female labor-supply model are those of Blundell and Smith (1989) and Lee(1995) both of which include endogeneous regressors Notethat in this context the constant-effects causal model is appliedto the latent variable not the observed outcome

Under a variety of distributional assumptions [eg normal-ity as in Heckman (1974) or weaker assumptions like sym-metry as in Powell (1986a)] the parameter is identi edBut what is the interpretation of in a causal model Onepossible answer is that is the causal effect of Di on Y uuml

i 1

though Y uumli is not observed so this is not usually of intrinsic

interest However a direct calculation using (11) shows that

is also a conditional-on-positive causal effect (for details seethe appendix) In particular

D E6Y1iƒ Y0i

mdashY1i gt 07 if lt 0 (12a)

and

D E6Y1iƒ Y0i

mdashY1i gt 07 if gt 00 (12b)

Thus the censored-regression model does succeed in sepa-rating causal effects from selection effects in conditional-on-positive comparisons [We do not know a priori whether isthe effect in (12a) or (12b) but it seems reasonable to use thesign of the estimated to decide]

Although (11) provides an elegant resolution of theselection-bias dilemma in practice I nd the use of censored-regression models to accomplish this unattractive One prob-lem is conceptual Although a censored-regression modelseems natural for arti cially censored data (eg topcodedvariables in the Current Population Survey) the notion of alatent labor-supply equation that can take on negative values isless clear cut The censoring in this case comes about becausesome people choose to work zero hours and not because ofmeasurement problems [Maddala (1985) made a similar com-ment regarding Tobinrsquos original application] Here an under-lying structural model seems essential to the interpretationof empirical results For example in the labor-supply modelfrom Section 1 the censored latent variable is the differencebetween unobserved offered wages and marginal rates of sub-stitution But even assuming the effect of regressors on thisdifference is of interest the latent index coef cients alone haveno predictive value for observable quantities

Second even if we adopt a theoretical framework thatmakes the latent structure meaningful identi cation of acensored-regression model requires assumption beyond thoseneeded for indenti cation of unconditional causal effects ofthe type described by (8) Semiparametric estimators that donot rely on distributional assumptions fail here because theregressor is discrete and there are no exclusion restrictions onthe selection equation (see Chamberlain 1986) Moreover inaddition to requiring distributional assumptions identi cationof in (12) turns heavily on the additive constant-effectsmodel in (11) This is because E6Y1i

ƒY0imdashY1i gt 07 involves the

joint distribution of Y1 and Y0 The constant-effects assump-tion nails this joint distribution down but the data containinformation on marginal distributions only (which is why evenrandomized trials fail to answer the causal question that moti-vates samples-selection models)

These concerns echoed by many applied researchers stim-ulated investigation of alternative strategies for the analysisof nonnegative outcomes (eg see Mof trsquos 1999 survey)

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 7

The two-part model (2PM) introduced by Cragg (1971) andwidely used in health economics seems to provide a lessdemanding framework for the analysis of LDVrsquos than sample-selection models The two parts of the 2PM are P6Yi gt 0mdashDi7

and E6YimdashYi gt 01 Di7 Researchers using this model simply

pick a functional form for each part For example probit or alinear probability model might be used for the rst part anda linear or log-linear model might be used for the secondpart Log-linearity for the second part may be desirable sincethis imposed nonnegativity of tted values (eg see Mullahy1998)

One attraction of the 2PM is that it ts a nonlinear func-tional form to the conditional expectation function (CEF) forLDVrsquos even if both parts are linear On the other hand tobit-type sample-selection models t a nonlinear CEF as well pro-vided there are covariates other than Di For example the CEFimplied by (11) with latent-index Y uuml

0iD X

0

i Œ C Di C ˜i and aNormal homoscedastic error is

E6YimdashXi1Di7 D ecirc64X

0

iΠC Di5=lsquo76X0

i ΠC Di7

C lsquo64X0

i ΠC Di5=lsquo71 (13)

where 4iexcl5 and ecirc4iexcl5 are the standard Normal density anddistribution functions and lsquo is the standard deviation of ˜i

(eg see McDonald and Mof t 1980) The derivative of thisCEF with respect to Di is ecirc64X

0

iΠC Di5=lsquo7 The nonlinearity of (13) notwithstanding at rst blush the

2PM seems to provide a more exible nonlinear speci cationthan tobit or other sample-selection models The latter imposesrestrictions tied to the latent-index structure while the twoparts of the 2PM can be speci ed in whatever form seemsconvenient and ts the data well (a point made by Lin andSchmidt 1984) A signal feature of the 2PM however andthe main point of contrast with sample-selection models isthat the 2PM does not attempt to solve the sample-selectionproblem in (10b) Thus the second part of the 2PM does nothave a clear-cut causal interpretation even if Di is randomlyassigned Similarly and of particular relevance here is thepoint that instrumental variables that are valid for estimatingthe effect of Di on Yi are not valid for estimating the effect ofDi on Yi conditional on Yi gt 0 The 2PM therefore seems illsuited for causal inference

23 Effects on Distributions

Conditional-on-positive effects and sample-selection modelsare sometimes motivated by interest in the consequence of Di

beyond the impact on average outcomes [eg see EichnerMcClellan and Wisersquos (1997) analysis of insurance effects onhealth expenditure] Are there schemes for estimating effectson distributions that are less demanding than sample-selectionmodels Once the basic problem of identifying causal effectsis resolved the impact of Di on the distribution of outcomesis identi ed and can be easily estimated

To see this for the experimental (exogenous Di) case notethat given the assumed independence of Di of Y0i the follow-

ing relationship holds for any point c in the support of Yi

E614Yi micro c5mdashDiD 17 ƒ E614Yi micro c5mdashDi

D 07

D P6Y1i micro cmdashDiD 17 ƒ P6Y0i micro cmdashDi

D 170

In fact the entire marginal distributions of Y1i and Y0i are iden-ti ed for those with Di

D 1 So it is easy to check whether Di

has an impact on the probability that YiD 0 as in the rst part

of the 2PM or whether there is a change in the distribution ofoutcomes at any positive value This information is enough tomake social-welfare comparisons as long as the comparisonsof interest involve marginal distributions only

24 Covariates and Nonlinearity

The conditional expectation of Yi given Di is inherentlylinear as are other conditional relationships involving Di

alone Suppose however that identi cation is based on aldquoselection-on-observablesrdquo assumption instead of presumedrandom assignment This means that causal inference is basedon the presumption that Y0i

q DimdashXi1 and causal effects must

be estimated after conditioning on Xi For example the effectof treatment on the treated can be expressed as

E6Y1iƒ Y0i

mdashDiD 17

D E8E6Y1imdashXi1 Di

D 17 ƒ E6Y0imdashXi1Di

D 17mdashDiD 19

DZ

8E6Y1imdashXi1Di

D 17ƒ E6Y0imdashXi1Di

D 079

P4XiD xmdashD D 15dx1 (14)

where P4X D xmdashD D 15 is a distribution of density Estima-tion using the sample analog of (14) is straightforward if Xi

has discrete support with many observations per cell (eg seeAngrist 1998) Otherwise some sort of smoothing (model-ing) is required to estimate the CEFrsquos E6Y1i

mdashXi1 DiD 17 and

E6Y0imdashXi1 Di

D 07Regression provides a exible and computationally attrac-

tive smoothing device A conceptual justi cation for regres-sion smoothing is that population regression coef cientsprovide the best (minimum mean squared error) linear approx-imation to E6Yi

mdashXi1Di7 (eg see Goldberger 1991) Thisldquoapproximation propertyrdquo holds regardless of the distributionof Yi

Separate regressions can be used to approximateE6Y1i

mdashXi1DiD 17 and E6Y0i

mdashXi1DiD 17 though this leaves

the problem of estimating P4XiD xmdashD D 15 to compute the

average difference in CEFrsquos using (14) On the other handa simple additive modelmdashsay E6Yi

mdashXi1 Di7 D X0

i sbquorC r Dimdash

sometimes works well in the sense that r mdashthe ldquoregressionestimandrdquomdashis close to average effects derived from modelsthat allow for nonlinearity and interactions between Di andXi With discrete covariates and a saturated model for Xi theadditive model can be thought of as implicitly producing aweighted average of covariate-speci c contrasts Although theregression weighting scheme differs from that in (14) in prac-tice treatment-effect heterogeneity may be limited enough thatalternative weighting schemes have little impact on the over-all estimate (see Angrist and Krueger 1999 for more on thispoint)

8 Journal of Business amp Economic Statistics January 2001

3 ENDOGENOUS REGRESSORSTRADITIONAL SOLUTIONS

LDV models with endogenous regressors were rst esti-mated using distributional assumptions and maximum likeli-hood (ML) An early and in uential work in this mold isthat of Heckman (1978) This approach is not wedded toML Heckman (1978) Amemiya (1978 1979) Newey (19861987) and Blundell and Smith (1989) discussed two-stepprocedures minimum-distance estimators generalized leastsquares (GLS) estimators and other variations on the tradi-tional ML framework Semiparametric estimators based onweaker distributional assumptions were discussed by amongothers Newey (1985) and Lee (1996)

The basic idea behind these strategies can be described asfollows Take the censored-regression model from (11) andadd a latent rst stage with instrumental variables Zi So thecomplete model is

YiD 16Yi gt 07Y uuml

i

Y uumli

D Y uuml0i

C 4Y1iƒ Y uuml

0i5DiD Y uuml

0iC Di D Œ C Di C ˜i

DiD 16ƒ0 C ƒ1Zi gt Daggeri70 (15)

The principal identifying assumption here is

4Y0i1 Daggeri5 q Zi0 (16)

Parametric schemes use two-step estimators or ML to esti-mate Semiparametric estimators typically work by substi-tuting an estimated conditional expectation sup1E6Di

mdashZi7 for Di

and then using a nonparametric or semiparametric procedureto estimate the pseueo reduced-form [eg Manskirsquos (1975)maximum score estimator for binary outcomes]

The problems I see with this approach are the same as listedfor censored regression models with exogenous regressorsFirst latent index coef cients are not causal effects If the out-come is binary semiparametric methods estimate scaled indexcoef cients and not average causal effects Similarly censoredregression parameters alone are not enough to determine thecausal effect of Di on the observed Yi [ I should note that thiscriticism does not apply to parametric estimators where dis-tributional assumptions can be used to recover causal effectsor to a recently developed semiparametric method by Blundelland Powell (1999) for continuous endogenous variables] Sec-ond this approach turns heavily on the latent-index setup Wecan add to these points the fact that even weak distributionalassumptions like conditional symmetry fail for the reduced-form error term 4Di

ƒ sup1E6DimdashZi75 ƒ ˜i since Di is binary (a

point made by Lee 1996)A nal observation is that given assumption (16) this

whole setup is unnecessary for causal inference The effectof treatment on the observed Yi is identi ed for those womenwhose childbearing behavior is affected by the instrument Thetwins instrument for example identi es the effect of Di onmothers who would not have had a third child without a mul-tiple second birth (see result 4 in the earlier lemma) It maybe of interest to extrapolate from this grouprsquos experiences tothose of other women but the extrapolation problem is distinctfrom the problem of identifying the causal effect of childbear-ing in the ldquotwins experimentrdquo

4 NEW ECONOMETRIC METHODS

Conditional moments and other probability statementsinvolving Di alone are necessarily linear but causal relation-ships involving covariates are likely to be nonlinear unless thecovariates are discrete and the model is saturated LDV mod-els like probit and tobit are often used because of a concernthat unless the model is saturated LDVrsquos lead to nonlinearCEFrsquos The 2PM is sometimes also motivated this way (egsee Duan et al 1984)

The headaches induced by nonlinearity notwithstandingthere are simple schemes for estimating causal effects in LDVmodels with endogenous regressors and covariates In thissection I discuss three strategies for estimating effects onmeans and two for estimating effects on distributions All butthe rst are based on new models and methods None are tiedto an underlying structural model

The simplest option for estimating effects on means isundoubtedly to ldquopuntrdquo by using a linear constant-effectsmodel to describe the relationship of interest

E6Y0imdashXi7 D X 0

isbquo (17a)

and

Y1iD Y0i

C 0 (17b)

The assumptions lead a linear causal model

YiD X 0

isbquo C DiC ˜i1 (18)

easily estimated by 2SLSAlthough the constant-effects assumption is clearly unreal-

istic in practice more general estimation strategies often leadto similar average effects A second issue that arises in thissetting is that because Di is binary a nonlinear rst-stage suchas probit or logit may seem appropriate for 2SLS estimationof (18) But the resulting second-stage estimates are inconsis-tent unless the model for the rst-stage CEF is actually cor-rect On the other hand conventional 2SLS estimates usinga linear probability model are consistent whether or not the rst-stage CEF is linear So it is generally safer to use a linear rst-stage Alternatively consistent estimates can be obtainedby using a linear or nonlinear estimate of E6Di

mdashXi1 Zi7 asan instrument (This is the same as the plug-in- tted-valuesmethod when the rst-stage is linear) See Kelejian (1971) orHeckman (1978 pp 946ndash947) for a discussion of this pointand additional references ( It is also worth noting that a probit rst-stage cannot even be estimated for the twins instrumentbecause P6Di

D 1mdashXi1 ZiD 17 D 1 for twins)

41 IV for an Exponential Conditional Mean

A linear model like (18) is obviously unrealistic for binaryoutcomes and fails to incorporate natural restrictions on theCEF for nonnegative LDVrsquos This motivated Mullahy (1997)to estimate causal effects on nonnegative LDVrsquos using a multi-plicative model similar to that used by Wooldridge (1999) forpanel data The Mullahy (1997) model can be written in mynotation as follows Let Xi be a vector of observed covariates

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 9

as before and let mdashi be an unobserved covariate correlatedwith Di and Y0i The fact that this covariate is unobserved isthe reason we need to instrument

Let Zi be a candidate instrument Conditional on observedand unobserved covariates both treatment status and theinstrument are assumed to be independent of potential out-comes

Y0iq 4Di1Zi5 mdash Xi1mdashi (19a)

Moreover conditional on observed covariates the candidateinstrument is independent of the unobserved covariate

Ziq mdashi

mdash Xi1 (19b)

though mdashi and Di are not conditionally independent The CEFfor Y0i is constrained to be nonnegative using an exponentialmodel and (19a)

E6Y0imdashDi1 Zi1 Xi1mdashi7 D exp4X 0

isbquo C mdashi5

D mdashuumli exp4X 0

isbquo51 (19c)

where we also assume that the unobservable covariate hasbeen de ned so that E6mdash uuml

imdashXi7 D 1 (this is a normalization

because we can de ne mdash D ƒ16ldquo ƒ ln4E6eldquo mdashX757 where ldquo isunrestricted)

Finally the conditional-on-X-and-mdash average treatmenteffect is assumed to be proportionally constant again using anexponential model that ensures nonnegative tted values

E6Y1imdashDi1Zi1Xi1mdashi7 D e E6Y0i

mdashDi1Zi1Xi1mdashi7

D e E6Y0imdashXi1 mdashi70 (19d)

Combining (19c) and (19d) we can write YiD exp4X 0

isbquo CDi

C mdashi5 C ˜i where E6˜imdashDi1 Zi1 Xi1 mdashi7 sup2 0 These

assumptions imply

E8exp4ƒX 0isbquo ƒ Di5Yi

ƒ 1mdashXi1Zi9 D 01 (20)

so (20) can be used for estimation provided Zi has an impacton Di The proportional average treatment effect in this modelis e ƒ 1 or approximately for small values of

Estimation based on (20) guarantees nonnegative ttedvalues without dropping zeros as a traditional log-linearregression model would The price for this is a constant-proportional-effects setup and the need for nonlinear estima-tion It is interesting to note however that with a binaryinstrument and no covariates (20) generates a simple closed-form solution for In the appendix I show that with-out covariates the proportional treatment effect in Mullahyrsquosmodel can be written

e ƒ1D E6YimdashZi

D17ƒE6YimdashZi

D07

ƒ8E641ƒDi5YimdashZi

D17ƒE641ƒDi5YimdashZi

D0790

(21)

In light of this simpli cation it seems worth asking if theright side of (21) has an interpretation that is not tied tothe constant-proportional-effects model To develop this inter-pretation let D0i and D1i denote potential treatment assign-ments indexed against the binary instrument For example an

assignment mechanism such as (15) determines D0i as fol-lows D0i

D 16ƒ0 gt Daggeri7 and D1iD 16ƒ0 C ƒ1 gt Daggeri7 Using this

notation Imbens and Angrist (1994) showed that

E6YimdashZi

D 17 ƒ E6YimdashZi

D 07 D E6Y1iƒ Y0i

mdashD1i gt D0i7

8E6DimdashZi

D 17ƒ E6DimdashZi

D 0790 (22)

The term E6Y1iƒ Y0i

mdashD1i gt D0i7 is the LATE parameter men-tioned in Lemma 1

The same argument used to establish (22) can also be usedto show a similar result for the average of Y0i [ie insteadof the average of Y1i

ƒ Y0i see Abadie (2000a) for details] Inparticular

E641 ƒ Di5YimdashZi

D 17 ƒ E641 ƒ Di5YimdashZi

D 07

D ƒE6Y0imdashD1i gt D0i7

8E6DimdashZi

D 17 ƒ E6DimdashZi

D 0790 (23)

Substituting (22) and (23) for the numerator and denominatorin (21) we have

e ƒ 1 D E6Y1iƒ Y0i

mdashD1i gt D0i7=E6Y0imdashD1i gt D0i70 (24)

Thus Mullahyrsquos procedure estimates a proportional LATEparameter in models with no covariates The resulting esti-mates therefore have a causal interpretation under muchweaker assumptions than (19a)ndash(19d) Moreover the exponen-tial model used for covariates in (19c) seems natural for non-negative dependent variables and has a semiparametric avorsimilar to proportional hazard models for duration data

42 Approximating Causal Models

Suppose that the additive constant-effects assump-tions (17a) and (17b) do not hold but we estimate (18) by2SLS anyway It seems reasonable to imagine that the result-ing 2SLS estimates can be interpreted as providing some sortof ldquobest linear approximationrdquo to an underlying nonlinearcausal relationship just as regression provides the best linearpredictor (BLP) for any CEF Perhaps surprisingly however2SLS does not provide this sort of linear approximation ingeneral On the other hand in a recent article Abadie (2000b)introduced a Causal-IV estimator that does have this property

Causal-IV is based on the assumptions used by Imbens andAngrist (1994) to estimate average treatment effects Underthese assumptions it can be shown that treatment is indepen-dent of potential outcomes conditional on being in the groupwhose treatment status is affected by the instrument (ie thosewith D1i gt D0i the group of ldquocompliersrdquo mentioned earlier)This independence can be expressed as

Y0i1 Y1iq Di

mdashXi1D1i gt D0i0 (25)

A consequence of (25) is that for compliers comparisons bytreatment status have a causal interpretation

E6YimdashXi1Di

D 11D1i gt D0i7 ƒ E6YimdashXi1Di

D 01 D1i gt D0i7

D E6Y1iƒ Y0i

mdashXi1D1i gt D0i70

10 Journal of Business amp Economic Statistics January 2001

For this reason Abadie (2000b) called E6YimdashXi1 Di1D1i gt D0i7

the Complier Causal Response Function (CCRF)Now consider choosing parameters b and a to minimize

E64E6YimdashXi1Di1 D1i gt D0i7ƒX 0

i b ƒ aDi52mdashD1i gt D07 or equiv-

alently E64YiƒX 0

ib ƒ aDi52mdashD1i gt D07 This choice of b and a

provides the minimum mean squared error (MMSE) approx-imation to the CCRF Since the set of compliers is not iden-ti ed this minimization problem is not feasible as writtenHowever it can be shown that

E6Ši4E6YimdashXi1 Di1D1i gt D0i7 ƒ X 0

ib ƒ aDi527=P6D1i gt D0i7

D E64E6YimdashXi1Di1 D1i gt D0i7ƒ X 0

ib ƒ aDi52mdashD1i gt D071

where ŠiD 1 ƒ Di41 ƒ Zi5=41 ƒ E6Zi

mdashXi75 ƒ 41 ƒ Di5 šZi=E6Zi

mdashXi7 Since Ši can be estimated the MMSE linearapproximation to the CCRF can also be estimated The resultis a weighted least squares estimation problem with weightsgiven by the estimated Ši

It is also worth noting that although the preceding discus-sion focuses on linear approximation of the CCRF any func-tion can be used for the approximation For binary outcomesfor example we might use ecirc6X 0

ib C aDi7 and choose param-eters to minimize E6Ši4Yi

ƒ ecirc6X 0ib ƒ aDi75

27 Similarly fornonnegative outcomes it seems sensible to use an exponentialmodel exp6X 0

ib C aDi7 and choose parameters to minimizeE6Ši4Yi

ƒexp6Xi0b ƒaDi75

27 Abadiersquos framework allows ex-ible approximation of the CCRF using any functional formthe researcher nds appealing and convenient The resultingestimates have a robust causal interpretation regardless of theshape of the actual CEF for potential outcomes

43 Distribution and Quantile Treatment Effects

If Yi has a mass point at 0 the conditional mean pro-vides an incomplete picture of the causal impact of Di on Yi We might like to know for example how much of the aver-age effect is due to changes in participation and how muchinvolves changes elsewhere in the distribution This sometimesmotivates separate analyses of participation and conditional-on-positive effects In Section 23 I argued that questionsregarding the effect of treatment on the distribution of out-comes are better addressed by comparing distributions Simplycomparing distributions is ne for the analysis of experimentaldata but what if covariates are involved As with the analy-sis of mean outcomes the simplest strategy is 2SLS in thiscase using linear probability models for distribution ordinates16Yi micro c7 D Xi

0sbquocC cDi

C ˜ci Of course the linear model isnot literally correct for conditional distribution except in spe-cial cases (eg a saturated regression parameterization)

Here too the Abadie (2000b) weighting scheme can be usedto generate estimates that provide an MMSE error approxima-tion to the underlying distribution function (see Imbens andRubin 1997 for a related approach to this problem) The esti-mator in this case chooses bc and ac to minimize the sampleanalog of the following population minimand

E6Ši416Yi micro c7 ƒ X 0ibc

ƒ acDi5270 (26)

The resulting estimates provide the BLP for P6Yi microcmdashXi1Di1D1i gt D0i7 The latter quantity has a causal interpre-tation because

P6Yi micro cmdashXi1DiD 11D1i gt D0i7

ƒ P6Yi micro cmdashXi1 DiD 01 D1i gt D0i7

D P6Y1i micro cmdashXi1D1i gt D0i7

ƒ P6Y0i micro cmdashXi1D1i gt D0i70

Since the outcome is binary nonlinear models such asprobit or logit might also be used to approximate P6Yi microcmdashXi1Di1D1i gt D0i7 Likewise it is equally straightforward touse Abadiersquos weighting scheme to approximate the probabilitythat the outcome falls into an interval instead of the cumula-tive distribution function

An alternative to estimation based on (26) postulates a linearmodel for quantiles instead of distribution ordinates Conven-tional quantile regression (QR) models for exogenous regres-sors begin with a linear speci cation Qˆ6Yi

mdashXi1Di7 D X 0iŒˆ0

CŒˆ1Di The parameters 4Œˆ01Œˆ15 can be shown to minimizeE6ˆ4Yi

ƒ X 0im0 ƒ m1Di057 where ˆ4ui5 D ˆuC

iC 41ƒˆ5uƒ

i iscalled the ldquocheck functionrdquo (see Koenker and Basset 1978)This minimization is computationally straightforward since itcan be written as a linear programming problem

The analysis of quantiles has two advantages Firstquantiles like the median quartiles and deciles providebenchmarks that can be used to summarize and compareconditional distributions for different outcomes In contrastthe choice of c for the analysis of distribution ordinatesis application-specic Second since nonnegative LDVrsquos areoften (virtually) continuously distributed away from any masspoints linear models are likely to be more accurate for condi-tional quantiles than for conditional probabilities [For quan-tiles close to the censoring point Powellrsquos (1986b) censoredQR model may be more appropriate]

Abadie Angrist and Imbens (1998) developed a QRestimator for models with a binary endogeneous regres-sor Their quantile treatment effects (QTE) procedure beginswith a linear model for conditional quantiles for compliersQˆ6Yi

mdashXi1Di1 D1i gt D0i7 D X 0isbquoˆ

CˆDi The coef cient ˆ hasa causal interpretation because Di is independent of potentialoutcomes conditional on Xi and the event D1i gt D0i There-fore ˆ

D Qˆ6Y1imdashXi1 D1i gt D0i7 ƒ Qˆ6Y0i

mdashXi1 D1i gt D0i7 Inother words ˆ is the difference in ˆ quantiles for compliers

QTE parameters are estimated by minimizing a sampleanalog of the following weighted check-function minimandE6Šiˆ4Yi

ƒ X 0ib ƒ aDi57 As with the Causal-IV estimators

weighting by Ši transforms the conventional QR minimandinto a problem for compliers only For computational rea-sons however it is useful to rewrite this as E6 QŠiˆ4Yi

ƒX 0ib ƒ

aDi057 where QŠiD E6Ši

mdashXi1 Di1 Yi7 It is possible to show thatE6Ši

mdashXi1Di1 Yi7 D P6D1i gt D0imdashXi1Di1 Yi7 gt 0 This modi ed

estimation problem has a linear programming representationsimilar to conventional quantile regression since the weightsare positive Thus QTE estimates can be computed usingexisting QR software though this approach requires rst-step

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 11

estimation of QŠi In the following example I use the fact that

QŠiD E6Ši

mdashXi1 Di1 Yi7 D 1 ƒ Di41ƒ E6ZimdashYi1Di1Xi75

41ƒ E6ZimdashXi75

ƒ 41 ƒ Di5E6ZimdashYi Di1Xi7

E6ZimdashXi7

(27)

and estimate E6ZimdashYi1Di1Xi7 and E6Zi

mdashXi7 with a probit rststep Since QŠi is theoretically supposed to be positive negativeestimates of QŠi are set to 0

5 APPLICATION LABORndashSUPPLYCONSEQUENCES OF A THIRD CHILD

The estimation uses a sample of roughly 250000 marriedwomen aged 21ndash35 with at least two children drawn from the1980 Census 5 le About 53 of the women in this sampleworked in 1979 Overall (ie including zeros) women in thesample worked about 17 hours per week This can be seen inthe rst column of Table 1 which reports descriptive statis-tics and repeats some of the OLS and 2SLS estimates fromAngrist and Evans (1998) Roughly 38 of women in thissample had a third child an event indicated by the variableMorekids The OLS estimates in column (2) show that womenwith Morekids D 1 were about 17 percentage points less likelyto have worked in 1979 and worked about 6 hours fewer perweek than women with Morekids D 0 The covariates in thisregression are age age at rst birth a dummy for male rst-born a dummy for male secondborn and Black Hispanic andother race indicators

Table 1 also reports estimates of average effects computedusing nonlinear models still treating Morekids as exogenousThese average effects are approximations to effects of treat-ment on the treated evaluated using derivatives to simplifycomputations A detailed description of the average-effectscalculations appears in the appendix Probit estimates of theaverage impact on employment shown in column (3) arealmost identical to the OLS estimates Similarly the tobit esti-mate of the average effect of Di on hours worked shown incolumn (4) is ƒ6001 remarkably close to the OLS estimateof ƒ6002 (Note however that the tobit coefcient is ƒ1107)Column (5) of Table 1 reports estimated average effects froma two-part model in which both parts of the model are linear

Table 1 Descriptive Statistics and Baseline Results

Morekids exogenous Morekids endogenous

Nonlinear models Reduced forms2SLS

Dependent Mean OLS Probit Tobit 2PM (Morekids) (Depvar) effectvariable (1) (2) (3) (4) (5) (6) (7) (8)

Employment 0528 ƒ0167 ƒ0166 mdash mdash 0627 ƒ0055 ƒ0088(0499) (0002) (0002) (0003) (0011) (0017)

Hours worked 1607 ƒ6002 mdash ƒ6001 ƒ5097 0627 ƒ2023 ƒ3055(1803) (0074) (0073) (0073) (0003) (0371) (0592)

NOTE The sample includes 254654 observations and is the same as that of Angrist and Evans (1998) The instrument is an indicator for multiple births The mean of the endogenousregressor is 381 The probability of a multiple birth is 008 The model includes as covariates age age at rst birth boy rst boy second and race indicators Standard deviations are shownin parentheses in column 1 Standard errors are shown in parentheses in other columns

The 2PM estimate is virtually identical to the tobit and OLSestimates

Roughly 810 of 1 of women in the extract had a twinsecond birth (multiple births are identi ed in the 1980 Cen-sus using age and quarter of birth) Reduced-form estimates ofthe effect of a twin birth are reported in columns (6) and (7)The reduced forms show that women who had a multiple birthwere 63 percentage points more likely to have had a third childthan women who had a singleton second birth Mothers oftwins were also 55 percentage points less likely to be work-ing (standard error D 001) and worked 22 fewer hours perweek (standard error D 037) The 2SLS estimates derived fromthese reduced forms reported in column 8 show an impactof about ƒ009 (standard error D 002) on employment rates andƒ306 (standard error D 06) on weekly hours These estimatesare just over half as large as the OLS estimates suggestingthat the latter exaggerate the causal effects of childbearingOf course the twins instrument is not perfect and the 2SLSestimates may also be biased For example twinning probabil-ities are slightly higher for certain demographic groups ButAngrist and Evans (1998) found that 2SLS estimates usingtwins instruments are largely insensitive to the inclusion ofcontrols for mothersrsquo personal characteristics

Two variations on linear models generate estimates iden-tical or almost identical to the conventional 2SLS estimatesThis can be seen in columns (2) and (3) of Table 2 whichreport 2SLS estimates of a 2PM for hours worked and Abadie(2000b) Causal-IV estimates of linear models for employmentand hours The Causal-IV estimates use probit to estimateE[Z|X] and plug this into the formula for Ši The second stepin Causal-IV estimation is a weighted least squares problemwith some negative weights Since use of negative weights isnonstandard for statistical packages (eg Stata does not cur-rently allow this) I used a MATLAB program available fromAlberto Abadie to compute the estimates The 2PM estimatesin Table 3 were constructed from 2SLS estimates of a linearprobability model for participation and 2SLS estimates of alinear model for hours worked conditional on working In prin-ciple the 2PM estimates do not have a causal interpretationbecause the instruments are not valid conditional on workingIn practice however 2SLS estimates of the 2PM differ littlefrom conventional 2SLS estimates

The estimates of nonlinear=nonstructural models for meaneffects are mostly similar to each other and to conventional

12 Journal of Business amp Economic Statistics January 2001

Table 2 Impact on Mean Outcomes (Morekids endogenous)

Linear models Nonlinear models Structural models

Causal-IV Causal-IV Causal-IV Bivar Endog Mills 2SLSDependent 2SLS 2PM linear Mullahy probit expon probit tobit ratio benchmarkvariable (1) (2) (3) (4) (5) (6) (7) (8) (9) (10)

A With covariates

Employment ƒ0088 mdash ƒ0089 mdash ƒ0088 mdash ƒ0124 mdash mdash ƒ0089(0017) (0017) (0016) (0016) (0017)

Hours worked ƒ3055 ƒ3054 ƒ3055 ƒ3082 mdash ƒ3021 mdash ƒ3081 ƒ4051 ƒ3060(0592) (0598) (0592) (0598) (0694) (0580) (0549) (0599)

B No covariates

Employment ƒ0084 mdash ƒ0084 mdash ƒ0084 mdash ƒ0086 mdash mdash ƒ0084(0017) (0017) (0017) (0017) (0018)

Hours worked ƒ3047 ƒ3037 ƒ3047 ƒ3010 mdash ƒ3012 mdash ƒ3035 ƒ3048 ƒ3052(0617) (0614) (0617) (0561) (0616) (0642) (0641) (0624)

NOTE Sample and covariates are the same as in Table 1 Results for nonlinear models are derivative-based approximations to effect on the treated Causal-IV estimates are based on aprocedure discussed by Abadie (2000b) Standard errors are shown in parentheses

2SLS estimates Results from nonlinear models are againreported as marginal effects that approximate average effectson the treated For example a causal probit model for employ-ment status generates an average effect of ƒ0088 identical (upto the reported accuracy) to the 2SLS estimate Causal-IV esti-mation of an exponential model for hours worked the resultof a procedure that minimizes E6 QŠi4Yi

ƒ exp6X0

i b ƒ aDi7527

generates an estimate of ƒ3021 This too differs little from theconventional 2SLS estimate of ƒ3055 Similarly the Mullahyestimate of ƒ3082 in column (4) is less than 8 larger thanconventional 2SLS in absolute value It is noteworthy how-ever that the Mullahy model generates results that changemarkedly (falling by about 20) when the covariates aredropped Since the covariates are not highly correlated withthe twins instrument this lack of robustness to the inclusionof covariates seems undesirable On the other hand with-

Table 3 Impact on the Distribution of Hours Worked

Distribution treatment effects

Exogenous Morekids Endogenous Morekids Quantile treatment effects

OLS Ordered Causal CausalRange (LPM) Probit probit 2SLS LPM probit Quantile QR QTE(mean) (1) (2) (3) (4) (5) (6) (value) (7) (8)

0 0167 0166 0147 0088 0089 0088 5 ƒ8092 ƒ5024(472) (0002) (0002) (0002) (0017) (0017) (0016) (8) (0186) (0686)1ndash10 0001 0001 0002 ƒ0001 ƒ0001 ƒ0002 6 ƒ1207 ƒ7098(046) (0001) (0001) (000003) (0007) (0007) (0006) (20) (0172) (0860)11ndash20 ƒ0015 ƒ0014 ƒ0011 0002 0002 0002 7 ƒ9054 ƒ6019(093) (0001) (0001) (00001) (0010) (0010) (0010) (35) (0184) (1007)21ndash30 ƒ0024 ƒ0022 ƒ0014 ƒ0004 ƒ0005 ƒ0006 75 ƒ6045 ƒ3060(075) (0001) (0001) (00002) (0009) (0009) (0010) (40) (0156) (1017)31ndash40 ƒ0119 ƒ0110 ƒ0097 ƒ0072 ƒ0072 ƒ0071 8 ƒ1000 0000(277) (0002) (0002) (0001) (0014) (0014) (0016) (40) (0286) (1009)41+ ƒ0009 ƒ0008 ƒ0023 ƒ0009 ƒ0009 ƒ0006 9 mdash mdash

(027) (0001) (0001) (00003) (0005) (0005) (0007) (40)

NOTE The table reports probability-model and (QTE) estimates of the impact of childbearing on the distribution of hours worked The sample and covariates are the same as in Panel A ofTable 2 Causal-IV estimates are based on a procedure discussed by Abadie (1999) Quantile treatments are based on a procedure discussed by Abadie et al (1998) Standard errors areshown in parentheses The standard errors in columns (7) and (8) are bootstrapped

out covariates the Mullahy estimates are close to those fromCausal-IV with an exponential model

The bivariate probit estimate of the effect of childbear-ing on employment status reported in column 7 is ƒ012roughly a third larger in absolute value than the conventional2SLS estimate Interestingly in another application Abadie(2000b) also found that bivariate probit estimates are largerthan Causal-IV The gap between bivariate probit and Causal-IV estimates of effects on labor supply appears to be a conse-quence of the probit model for exogenous covariates Withoutcovariates bivariate probit generates estimates that are veryclose to the results from the other estimators It should benoted however that bivariate probit is not really appropriatefor twins instruments because the probability Morekids D 1 isequal to 1 for twins The probit ML estimator does not existin this case Therefore to compute all of the estimates using a

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 13

probit rst-stage (bivariate probit endogenous tobit and Millsratio) I randomly recoded 1 of Di observations to 0 Inprinciple measurement error in a binary endogenous regressorbiases 2SLS estimates (see Kane Rouse and Staiger 1999)In this case however column (10) shows that 2SLS estimateswith the randomly recoded data differ little from 2SLS esti-mates using the original data

Column (8) of Table 2 reports estimates of a structuraltobit model with endogeneous regressors These estimateswere computed using a two-step estimator that approximatesthe MLE again with recoded data The two-step procedureadds a Mills-ratio type endogeneity correction to a censoredregression and then applies tobit to the censored regres-sion model with the correction term [The correction term islsquo˜6Di4ƒ i=ecirci5C 41ƒDi5 i=41ƒecirci57 where is the corre-lation between the latent error determining treatment assign-ment and the outcome residual lsquo˜ is the standard deviation ofthe outcome residual and i and ecirci are Normal density anddistribution functions evaluated at the probit rst-stage ttedvalues see Heckman and Robb (1985)]

As with the bivariate probit estimate the endogenous tobitestimate is somewhat larger in magnitude than conventional2SLS This may be because of the Mills-ratio procedure forcontrolling for the endogeneity of Di more than the tobitcorrection for nonnegative outcomes To see this note thatthe Mills-ratio estimate ignoring censoring reported in col-umn (9) is considerably larger than the corresponding conven-tional 2SLS estimates Interestingly both the probit and tobitstructural estimators generate results that are more sensitiveto the inclusion of covariates than any of the other estimatorsexcept Mullahyrsquos In fact Panel B of the table shows thatwithout covariates all estimation techniques give very simi-lar results This is not surprising since without covariates theparametric assumptions used in these models are weaker

The last set of results shows that childbearing is associ-ated with marked changes in the distribution of hours workedbeyond the changes in participation already seen This isapparent in the rst columns of Table 3 which report thedistribution of hours worked by interval along with linearprobability estimates of the relationship between childbearingand the probability of falling into each interval (these mod-els include the same covariates used for Table 2) The largestentry in column (1) is for the probability of working zerohours There is also a large negative effect on the probabilityof working 31ndash40 hours per week which shows that womenwho have a third child are much less likely to work full-time Once again probit average effects reported in column2 are almost indistinguishable from the corresponding OLSestimates Estimates from an ordered probit model reportedin column (3) differ from OLS somewhat more but still gen-erate a very similar pattern

Like the 2SLS estimates for average outcomes 2SLS esti-mates of linear probability models for the probability of fallinginto each interval show that models that treat childbearing asexogenous exaggerate the negative impact on labor supply2SLS estimates for the probability of working zero hours areidentical (by construction) to those for employment in Table 1The 2SLS estimates of the impact of childbearing on fulltimework are also considerably less than the corresponding OLSestimates

Nonstructural Causal-IV models treating Morekids asendogenous generate estimates very close to 2SLS estimatesfor effects on the probabilities of hours falling into each inter-val Columns (5) and (6) show that the results are also remark-ably insensitive to whether a linear or probit model is usedto approximate the distribution function The estimates againindicate that childbearing changes the distribution of hours byraising the probability of nonparticipation and by reducing theprobability of fulltime work

Finally the QTE estimator provides useful summary statis-tics for causal effects of childbearing on changes in the distri-bution of hours worked These estimates were computed as thesolution to a weighted quantile regression problem using (27)to construct weights and the reported standard errors are froma bootstrap Quantile regression estimates treating childbear-ing as exogenous show an estimated 9-hour decline in medianhours worked but the QTE estimator suggests that the causaleffect of childbearing on median hours worked is only about ve hours Estimates at higher quantiles are similarly reducedwhen childbearing is treated as endogenous

6 SUMMARY AND CONCLUSIONS

Structural parameters may be of theoretical interest but mustultimately be converted into causal effects if they are to beof use for policy evaluation or determining whether a trendassociation is causal The problem of estimating causal effectsfor LDVrsquos does not differ fundamentally from the analogousproblem for continuously distributed outcomes The key dif-ferences seem to me to be the increased likelihood of inter-est in distributional outcomes and the inherent nonlinearity ofCEFrsquos for LDVrsquos in models with covariates Without covari-ates conventional 2SLS estimates capture both distributionaleffects and effects on means Simple IV strategies devel-oped by Mullahy (1997) and Abadie (2000b) can be used toestimate average effects in nonlinear models with covariateswhile IV strategies for probability models and quantile regres-sion can be used to estimate effects on distributions

These approaches are illustrated here using twin birthsto estimate the labor-supply consequences of childbearingAlternative nonstructural approaches to the estimation ofcausal effects using twins instruments generate similar aver-age effects whether or not the model is nonlinear Structuralestimates tend to be somewhat larger than nonstructural esti-mates when exogenous covariates are included even thoughthe covariates are not strongly related to the twins instru-ment Since the structural models impose additional distribu-tional and functional form assumptions I see no reason toprefer them

Finally the various IV estimates of the effect of childbear-ing on the distribution of hours worked show that the impactof childbearing is characterized by substantially increased non-participation and by an almost equally large shift away fromfulltime work Estimates that treat childbearing as exogenousexaggerate the causal effect of childbearing on changes in dis-tribution as well as on average hours worked These ndingsappear in results from both probability models and quantilemodels with endogenous regressors Both types of modelsprovide an interesting look at the impact of childbearing on

14 Journal of Business amp Economic Statistics January 2001

the distribution of hours worked without the conceptual prob-lems inherent in conditional-on-positive comparisons

ACKNOWLEDGMENTS

I thank the editor and the session discussants for theircomments Special thanks go to Melissa Schettini for out-standing research assistance and Alberto Abadie for exten-sive comments and shared computer programs Thanks also goto Daron Acemoglu Bill Evans Bill Greene Guido ImbensJohn Mullahy Steve Pischke and seminar participants at theUniversity of Texas for helpful discussions and comments onan earlier draft Data and computer programs used here areavailable on request

APPENDIX

A1 Derivation of Equation (12)

Drop the i subscripts Note that

Y0 D 14Y uuml0 gt 05Y uuml

0

Y1D 14Y uuml

0C gt 054Y uuml

0C 5 D 14Y uuml

1 gt 05Y uuml1 0

Since D is independent of Y0 and Y1D 14Y uuml

0C gt 054Y uuml

0C

51D is independent of Y1 Conditional effects on the treatedare therefore the same as conditional effects without condi-tioning on treatment status

E6Y1ƒ Y0

mdashY1 gt 01 D D 17 D E6Y1ƒ Y0

mdashY1 gt 07

E6Y1ƒ Y0

mdashY0 gt 01 D D 17 D E6Y1ƒ Y0

mdashY0 gt 070

The averages on the right side are the causal effects of interestThe conditional expectations on the right side are evaluated asfollows

E6Y1mdashY1 gt 07 D E6Y uuml

1mdashY uuml

1 gt 07 D E6Y uuml0mdashY uuml

1 gt 07C 1 (A1)

E6Y0mdashY1 gt 07 D E6Y uuml0 14Y uuml

0 gt 05mdashY uuml1 gt 07

D E6Y uuml0

mdashY uuml1 gt 01 Y uuml

0 gt 07P4Y uuml0 gt 0mdashY uuml

1 gt 050

If 4 gt 05 2 Y uuml0 gt 0 ) Y uuml

1 gt 01 this is E6Y uuml0

mdashY uuml0 gt 07

P4Y uuml0 gt 0mdashY uuml

1 gt 050 If 4 lt 05 2 Y uuml1 gt 0 ) Y uuml

0 gt 01 so this is

E6Y uuml0

mdashY uuml1 gt 070 (A2)

So lt 0 ) E6Y1 ƒ Y0mdashY1 gt 07 D Similarly

E6Y1mdashY0 gt 07 D E614Y uuml

0C gt 054Y uuml

0C 5mdashY uuml

0 gt 07

D E6Y uuml0

mdashY uuml1 gt 01 Y uuml

0 gt 07P4Y uuml1 gt 0mdashY uuml

0 gt 05

C P Y uuml1 gt 0mdashY uuml

0 gt 0 0

If 4 gt 051 Y uuml0 gt 0 ) Y uuml

1 gt 01 so

P4Y uuml1 gt 0mdashY uuml

0 gt 05 D 1 (A3)

and E6Y uuml0

mdashY uuml1 gt 01 Y uuml

0 gt 07 D E6Y uuml0mdashY uuml

0 gt 070 Finally

E6Y0mdashY0 gt 07 D E6Y uuml0mdashY uuml

0 gt 07 (A4)

so gt 0 ) E6Y1ƒ Y0

mdashY0 gt 07 D

A2 Derivation of Equation (21)

Drop the i subscripts Note that eƒD D 641 ƒ D5 C Deƒ7Let sbquo uuml D eƒsbquo and uuml D eƒ Z is binary and there are nocovariates so we can now write (20) as

sbquo uuml E641ƒ D5Y mdashZ D 17C sbquo uuml uuml E6DY mdashZ D 17 ƒ 1 D 0 (A5)

and

sbquo uuml E641ƒ D5Y mdashZ D 07C sbquo uuml uuml E6DY mdashZ D 07 ƒ 1 D 00 (A6)

Divided (A5) by (A6) to get rid of sbquo uuml then solve for uuml Subtract 1 and rearrange to get Equation (21)

A3 Average Treatment Effects and Standard Errors forNonlinear Models

Average treatment effects were calculated with the aid ofderivative approximations so that all reported effects have theform ldquocoef cient times scaling factorrdquo

Probit Note that ecirc6X 0i sbquo C 7 ƒ ecirc6X 0

isbquo7 6X 0isbquo C Di7 š

1 so the average effect on the treated can be approximatedas 841=N15

Pi Di6X 0

isbquo C Di79 š 1 where N1 D Pi Di This

turns out to be accurate to three decimal places for the esti-mates in Table 1 Standard errors were calculated treating thescaling factor as nonrandom This follows the convention forreporting marginal effects in programs like Stata in practiceany correction for estimation of the scaling factor is likely tobe minor A similar approach was used for ordered probit

Tobit Tobit average treatment effects were approximatedusing a derivative formula that can be found for exam-ple in the work of Greene (1999) E6Yi

mdashXi1DiD 17 ƒ

E6YimdashXi1 Di

D 07 iexclE6YimdashXi1Di7=iexclD D ecirc6X 0

isbquo C Di7 š 0

Average effects on the treated can therefore be approximatedusing 841=N15

Pi Diecirc6X 0

isbquo C Di79 š 0 Standard errors wereagain calculated treating the scaling factor as nonrandom

2PM Let the part-1 coef cient be 1 and the part-2 coef -cient be 2 Since the model multiplies parts and both parts arelinear the average effect is approximated using derivatives as1E6Yi

mdashDiD 11 Yi gt 07 C 2P6Di

D 1mdashY gt 070 Standard errorswere calculated treating the scaling factors E6Y mdashDi

D 11 Yi gt 07

and P6DiD 1mdashY gt 07 as nonrandom and using the fact that

the estimates of 1 and 2 are uncorrelatedMullahy Note that E6Yi

mdashXi1Di1mdash uumli 7 D mdash uuml

i exp6X 0i sbquoC Di7

where this CEF has a causal interpretation Again usingderivatives we have mdashuuml

i exp6X 0isbquo C 7 ƒ mdashuuml

i exp6X 0isbquo7

mdash uumli exp6X 0

i sbquo C Di7 š The model is such that E6mdash uumlimdashXi7

equals 1 but E6mdash uumlimdashXi1Di7 is unrestricted I ignore this prob-

lem and approximate the average effect on the treated as841=N15

Pi Di exp6X 0

i sbquo C Di79 š Standard errors were cal-culated treating the scaling factor as nonrandom

Bivariate Probit This is the same as probit using param-eters from the latent index equation for outcomes

Endogenous Tobit This is the same as tobit but usingcoef cients and predicted probability positive from the modelwith the compound Mills-ratio term included

Mills Ratio Standard errors were calculated treating thecompound Mills-ratio term as known

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 15

Causal-IV (nonlinear) Average effects were calculated asdescribed previously for the probit and Mullahy (exponential)functional form The rst-stage estimates of E6Zi

mdashXi7 neededto construct Ši were estimated using probit Standard errorsfor were calculated using the same bootstrap proceduredescribed for QTE Abadie (2000b) also gave analytic formu-las that take account of the rst-step estimation of E6Zi

mdashXi7An assumption implicit in this scheme for reporting Causal-IV results is that it makes sense to convert conditional-on-Xeffects for compliers into overall average effects on the treated

A4 Computation of Quantile Treatment Effectsand Standard Errors

QTErsquos were computed by plugging rst-step estimatesof QŠi

D E6ŠimdashXi1Di1 Yi7 into a weighted quantile regression

calculation performed by Stata Nonnegative estimates ofE6Ši

mdashXi1 Di1 Yi7 were constructed by separately estimatingE6Zi

mdashXi7 and E6ZimdashYi1Di1Xi7 using probit and then trimming

In principle standard errors should take account of this rst-step estimation An additional complication is that the analyticstandard errors for QTE involve a conditional error density Inthis case I sidestepped messy analytic calculations by usinga bootstrap procedure that repeats both the rst-stage estima-tion of QŠi and the second-step estimation of the parameters ofinterest in 100 replicate samples of 2500 observations eachThe 100 replicate samples were sampled without replacementusing the Stata command bsample The reported standarderrors were calculated as 4N =N uuml 51=2bseN where bseN is thestandard deviation of the 100 replicate estimates N D 21500and N uuml is the full sample size

[Received October 1999 Revised July 2000]

REFERENCES

Abadie A (2000a) ldquoBootstrap Tests for the Effect of a Treatment on the Dis-tributions of an Outcome Variablerdquo Technical Working Paper 261 NationalBureau of Economic Research Cambridge MA

mdashmdashmdash (2000b) ldquoSemiparametric Estimation of Instrumental Variable Mod-els for Causal Effectsrdquo Technical Working Paper 260 National Bureau ofEconomic Research Cambridge MA

Abadie A Angrist J and Imbens G (1998) ldquo Instrumental VariablesEstimation of Quantile Treatment Effectsrdquo Working Paper 229 NationalBureau of Economic Research Cambridge MA

Amemiya T (1978) ldquoThe Estimation of a Simultaneous Equation General-ized Probit Modelrdquo Econometrica 46 1193ndash1205

mdashmdashmdash (1979) ldquoThe Estimation of a Simultaneous Equation Tobit ModelrdquoInternational Economic Review 20 169ndash181

Angrist J (1998) ldquoEstimating the Labor Market Impact of Voluntary MilitaryService Using Social Security Data on Military Applicationsrdquo Economet-rica 66 249ndash288

Angrist J and Evans W (1998) ldquoChildren and Their Parentsrdquo Labor SupplyEvidence From Exogenous Variation in Family Sizerdquo American EconomicReview 88 450ndash477

Angrist J and Imbens G (1991) ldquoSources of Identifying Information inEvaluation Modelsrdquo Technical Working Paper 117 National Bureau ofEconomic Research Cambridge MA

Angrist J Imbens G and Rubin B (1996) ldquo Identi cation of CausalEffects Using Instrumental Variablesrdquo Journal of the American StatisticalAssociation 91 444ndash455

Angrist J and Krueger A (1999) ldquoEmpirical Strategies in Labor Eco-nomicsrdquo in The Handbook of Labor Economics Volume IIIA edsO Ashenfelter and D Card Amsterdam North-Holland pp 1277ndash1366

Blundell R W and Smith R J (1989) ldquoEstimation in a Class of Simulta-neous Equation Limited Dependent Variable Modelsrdquo Review of EconomicStudies 56 37ndash58

Blundell R W and Powell J L (1999) ldquoEndogeneity in Single Index Mod-elsrdquo mimeo University College London Dept of Economics

Bronars S and Grogger J (1994) ldquoThe Economic Consequences of UnwedMotherhood Using Twin Births as a Natural Experimentrdquo American Eco-nomic Review 84 1141ndash1156

Chamberlain G (1986) ldquoAsymptotic Ef ciency in Semi-Parametric ModelsWith Censoringrdquo Journal of Econometrics 32 189ndash218

Cragg J G (1971) ldquoSome Statistical Models for Limited Dependent Vari-ables With Application to the Demand for Durable Goodsrdquo Econometrica39 829ndash844

Duan N Manning W G Jr Morris C N and Newhouse J P(1984) ldquoChoosing Between the Sample-Selection Model and the Multi-partModelrdquo Journal of Business amp Economic Statistics 2 283ndash289

Eichner M J McClellan M B and Wise D A (1997) ldquoHealth Expendi-ture Persistence and Feasibility of Medical Savings Accountsrdquo in Tax Pol-icy and the Economy 11 ed J Poterba Cambridge MA National Bureauof Economic Research pp 91ndash128

Evans W Farrelly M C and Montgomery E (1999) ldquoDo Work-place Smoking Bans Reduce Smokingrdquo American Economic Review 89728ndash747

Gangadharan J and Rosenbloom J L (1996) ldquoThe Effects of Child-bearingon Married Womenrsquos Labor Supply and Earnings Using Twin Births as aNatural Experimentrdquo Working Paper 5647 National Bureau of EconomicResearch Cambridge MA

Goldberger A S (1991) A Course in Econometrics Cambridge MAHarvard University Press

Greene W (1999) ldquoMarginal Effects in the Censored Regression ModelrdquoEconomics Letters 64 43ndash49

Gronau R (1974) ldquoWage ComparisonsmdashA Selectivity Biasrdquo Journal ofPolitical Economy 82 1119ndash1143

Hay J W Leu R and Rohrer P (1987) ldquoOrdinary Least Squares andSample-Selection Models of Health-care Demandrdquo Journal of Business ampEconomic Statistics 5 499ndash506

Hay J W and Olsen R J (1984) ldquoLet Them Eat Cake A Note on Com-paring Alternative Models of the Demand for Medical Carerdquo Journal ofBusiness amp Economic Statistics 2 279ndash282

Heckman J J (1974) ldquoShadow Prices Market Wages and Labor SupplyrdquoEconometrica 42 679ndash694

mdashmdashmdash (1978) ldquoDummy Endogenous Variables in a Simultaneous EquationsSystemrdquo Econometrica 46 931ndash960

mdashmdashmdash (1990) ldquoVarieties of Selection Biasrdquo American Economic Review 80313ndash318

Heckman J J and Robb R Jr (1985) ldquoAlternative Methods for Evaluatingthe Impact of Interventionsrdquo in Longitudinal Analysis of Labor MarketData (Econometric Society Monographs Series No 10) eds J Heckmanand B Singer Cambridge UK Cambridge University Press pp 156ndash246

Imbens G and Angrist J D (1994) ldquo Identi cation and Estimation of LocalAverage Treatment Effectsrdquo Econometrica 62 467ndash475

Imbens G and Rubin D B (1997) ldquoEstimating Outcome Distributions forCompliers in Instrumental Variables Modelsrdquo Review of Economic Studies64 555ndash574

Kane T J Rouse C E and Staiger D (1999) ldquoEstimating Returnsto Schooling When Schooling Is Misreportedrdquo Working Paper W7235National Bureau of Economic Research Cambridge MA

Kelejian H H (1971) ldquoTwo-Stage Least Squares and Econometric SystemsLinear in Parameters but Nonlinear in the Endogenous Variablesrdquo Journalof the American Statistical Association 66 373ndash374

Koenker R and Bassett G (1978) ldquoRegression Quantilesrdquo Econometrica46 33ndash50

Killingsworth M (1983) Labor Supply Cambridge UK Cambridge Uni-versity Press

Keane M P and Wolpin K (1997) ldquo Introduction to the JBES Special Issueon Structural Estimation in Applied Microeconomicsrdquo Journal of Businessamp Economic Statistics 15 111ndash114

Lee M J (1995) ldquoSemiparametric Estimation of Simultaneous EquationsWith Limited Dependent Variables A Case Study of Female Labor Sup-plyrdquo Journal of Applied Econometrics 10 187ndash200

mdashmdashmdash (1996) ldquoNonparametric Two-Stage Estimation of Simultaneous Equa-tions With Limited Endogenous Regressorsrdquo Econometric Theory 12305ndash330

Leung S F and Yu S (1996) ldquoOn the Choice Between Sample Selectionand Two-Part Modelsrdquo Journal of Econometrics 72 197ndash229

Lin T-F and Schmidt P (1984) ldquoA Test of the Tobit Speci cation Againstan Alternative Suggested by Craggrdquo Review of Economics and Statistics66 174ndash177

Maddala G S (1985) ldquoA Survey of the Literature on Selectivity Bias AsIt Pertains to Health Care Marketsrdquo Advances in Health Economics andHealth Service Research 6 3ndash18

16 Journal of Business amp Economic Statistics January 2001

REFERENCES

Manning W G Duan N and Rogers W H (1987) ldquoMonte Carlo Evidenceon the Choice Between Sample Selection and Two-part Modelsrdquo Journalof Econometrics 35 59ndash82

Manski C F (1975) ldquoMaximum Score Estimation of the Stochastic UtilityModelrdquo Journal of Econometrics 3 205ndash228

mdashmdashmdash (1996) ldquoLearning About Treatment Effects From Experiments WithRandom Assignment of Treatmentsrdquo Journal of Human Resources 31709ndash733

McDonald J F and Mof tt R A (1980) ldquoThe Uses of Tobit AnalysisrdquoThe Review of Economics and Statistics 62 318ndash321

Mof t R A (1999) ldquoNew Developments in Econometric Methods forLabor Market Analysisrdquo in The Handbook of Labor Economics Vol-ume IIIA eds O Ashenfelter and D Card Amsterdam North-Hollandpp 1367ndash1398

Mullahy J (1997) ldquo Instrumental-Variable Estimation of Count Data ModelsApplications to Models of Cigarette Smoking Behaviourrdquo Review of Eco-nomics and Statistics 11 586ndash593

mdashmdashmdash (1998) ldquoMuch Ado About Two Reconsidering Retransformation andthe Two-Part Model in Health Econometricsrdquo Journal of Health Eco-nomics 17 247ndash281

Newey W K (1985) ldquoSemiparametric Estimation of Limited DependentVariable Models With Endogenous Explanatory Variablesrdquo Annales deLrsquo Insee 5960 219ndash237

mdashmdashmdash (1986) ldquoLinear Instrumental Variable Estimation of Limited Depen-dent Variable Models With Endogenous Explanatory Variablesrdquo Journal ofEconometrics 32 127ndash141

mdashmdashmdash (1987) ldquoEf cient Estimation of Limited Dependent Variable ModelsWith Endogenous Explanatory Variablesrdquo Journal of Econometrics 36231ndash250

Powell J L (1986a) ldquoSymmetrically Trimmed Least Squares Estimation forTobit Modelsrdquo Econometrica 54 1435ndash1460

mdashmdashmdash (1986b) ldquoCensored Regression Quantilesrdquo Journal of Econometrics32 143ndash155

Rosenzweig M R and Wolpin K (1980) ldquoLife-Cycle Labor Supply andFertility Causal Inferences From Household Modelsrdquo Journal of PoliticalEconomy 88 328ndash348

Rubin D B (1974) ldquoEstimating Causal Effects of Treatments in Random-ized and Non-randomized Studiesrdquo Journal of Educational Psychology 66688ndash701

mdashmdashmdash (1977) ldquoAssignment to a Treatment Group on the Basis of a Covari-aterdquo Journal of Educational Statistics 2 1ndash26

Wooldridge J (1999) ldquoDistribution-Free Estimation of Some Nonlinear PanelData Modelsrdquo Journal of Econometrics 90 77ndash97

Comment Binary Regressors in NonlinearPanel-Data Models With Fixed EffectsJinyong Hahn

Department of Economics Brown University Providence RI 02912 ( jinyong hahnbrownedu)

Angrist notes in his abstract that ldquomuch of the dif cultywith limited dependent variables comes from a focus on struc-tural parameters such as index coef cients instead of causaleffects Once the object of estimation is taken to be the causaleffect of treatment several simple strategies are availablerdquo(p 2) I examine the consequence of such a perspective forinference in nonlinear panel-data models with xed effectsI argue that (a) the ldquodif cultyrdquo indeed disappears sometimesand (b) structure of treatment assignment plays a crucial rolefor his strategy to be successful

It is instructive to begin with a dif cult nonlinear panel-datamodel with xed effects Consider a panel probit model with xed effects

Pr6yitD 1mdashci1 xi11 xi27 D ecirc4ci

C ˆ xit51

i D 11 1 n3 t D 1120 (1)

It is assumed that yi1 and yi2 are independent of each othergiven 4ci1 xi11 xi25 Here ci denotes the unobserved xedeffects and xit denotes a binary treatment variable We willassume that 4xi11 xi25 D 40115 for all i The index coef cientof interest is ˆ Therefore traditional econometric analysiswould focus on whether ˆ is identi ed and or

pn-consistently

estimableEstimation of the index coef cient ˆ is dif cult for a num-

ber of reasons

1 So far no consistent estimator for ˆ has been developedfor probit with a nonparametric speci cation of the conditionaldistribution of ci given 4xi11 xi25

2 It is not even clear whether ˆ is identi ed or notManskirsquos (1987) identi cation result requires in nite supportfor xit which cannot be satis ed due to the binary nature ofxit

3 It is not clear whether the semiparametric informationbound for ˆ is positive or not It is quite possible that theinformation is actually 0 For example Chamberlain (1992)showed that the information for ˆ is 0 for models with timedummies

4 Chamberlainrsquos (1984) conditional maximum likelihoodestimator is applicable only to logit models

Now to assess whether changing the object of estimationsimpli es statistical analysis consider the average treatmenteffects which in this particular case can be easily shown tobe equal to

sbquo D E6yi2ƒ yi170 (2)

It is not dif cult to see that a simple estimator Osbquo D1=n

PniD14yi2

ƒ yi15 isp

n-consistent As Angrist argues afocus on average causal effects dramatically reduces the dif -culty of estimation The secret is that 4xi11 xi25 D 40115 for alli which effectively ensures that ci is independent of 4xi11 xi25Dif culty of the index estimation listed previously is because

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

Imbens Comment 17

the independence between 4xi11 xi25 and ci cannot be exploitedwithin the xed-effects framework Unfortunately the addi-tional information that 4xi11 xi25 and ci are independent of eachother does not reduce dif culty in estimating the index ˆ It isnot yet clear whether ˆ can be

pn-consistently estimated even

with the random-effects assumption when the random effectsare nonparametrically speci ed On the other hand estimatingsbquo becomes easier as a result of the constancy of the general-ized propensity score Pr 6xi11 xi2mdashci7 Presence of unobservedci in the model was rendered irrelevant due to the constancyof the generalized propensity score [See Imbens (1999) for adiscussion on the generalized propensity score]

It is interesting to note that in the panel probit model (1)estimation of sbquo is not necessarily simple unless the indexstructure is discarded altogether It is useful to note that thenew target parameter sbquo could be estimated consistently usingindex structure if consistent estimators of ˆ and not are givenHere not denotes the distribution of ci We may alternativelywrite (2) as

Z4ecirc4c C ˆ5 ƒ ecirc4c55 dnot4c51 (3)

which can in principle be estimated by using consistent esti-mators of ˆ and not Estimation of sbquo using the alternativecharacterization (3) requires consistent estimation of an addi-tional parameter not a parameter that was not given too muchattention in the past The problem is that not many consis-tent estimators of not are available It is not yet clear whether

the model satis es the primitive conditions for consistency ofthe nonparametric maximum likelihood estimator (NPMLE)as discussed by Heckman and Singer (1984) The dif culty inestimating the target parameter using the expression (3) whichis based on the index structure is in sharp contrast to the easeof the estimation strategy using the expression (2) for whichthe index structure is irrelevant

The preceding discussion suggests that the success ofAngristrsquos perspective critically hinges on the structure of treat-ment assignment and careful reexpression of the new targetparameter If the joint distribution of ci and 4xi11 xi25 is com-pletely unknown it is clear that changing the target parameterdoes not ease the dif culty of estimation Angristrsquos perspectivetherefore requires substantial effort in modeling such jointdistribution Whether such a modeling effort will be successfulin dealing with nonlinear panel problems remains to be seen

ADDITIONAL REFERENCES

Chamberlain G (1984) ldquoPanel Datardquo in Handbook of Econometricseds Z Griliches and M D Intriligator Amsterdam North-Hollandpp 1247ndash1318

mdashmdashmdash (1992) ldquoBinary Response Models for Panel Data Identi cation andInformationrdquo unpublished manuscript

Heckman J and Singer B (1984) ldquoA Method for Minimizing the Impactof Distributional Assumptions in Econometric Models for Duration DatardquoEconometrica 52 271ndash320

Imbens G (1999) ldquoThe Role of Propensity Score in Estimating Dose-Response Functionsrdquo Technical Working Paper 237 National Bureau ofEconomic Research Cambridge MA

Manski C (1987) ldquoSemiparametric Analysis of Random Effects Linear Mod-els from Binary Panel Datardquo Econometrica 55 357ndash362

CommentGuido W Imbens

Department of Economics University of California at Los Angeles Los Angeles CA 90095(imbenseconuclaedu)

It is a pleasure to comment on this article by JoshuaAngrist whose applications of instrumental-variables methods(Angrist 1989 Angrist and Krueger 1991) have been a sourceof inspiration for my own work in this area As with Angristrsquosprevious work on instrumental variables the current articleraises some controversial issues and makes a number of impor-tant points Here I offer some comments on three of themFirst I shall discuss the issues raised in Section 1 ldquoCausalEffects and Structural Parametersrdquo concerning the goals ofstatistical inference Angrist argues that many questions ofinterest are most easily formulated in terms of comparisonsbetween realized and potential outcomes the latter de nedas outcomes that would have been observed under alternativestates of nature I shall explore some of the implications of thisview for empirical practice and econometric theory SecondI shall offer some remarks on the role of economic theory inspeci cation and identi cation of econometric models againreinforcing Angristrsquos point regarding the importance of for-mulating the key assumptions in terms of potential outcomesThird I shall discuss some of the issues related to the limited

dependent nature of outcome variables for empirical practicein particular in the presence of covariates Partly motivatedby the widespread perception of fundamental dif culties inapplying instrumental-variables methods to data with limiteddependent outcome variables Angrist argues that standard lin-ear model techniques are generally applicable I agree withAngristrsquos position that most of these perceived problems areexaggerated but suggest that principled inference should nev-ertheless take account of the limited dependent nature of theoutcome variables and use nonlinear models

1 CAUSAL ESTIMANDS

In his textbook discussion of the difference betweenstructural and reduced-form estimates Goldberger (1997)wrote following Marshak (1953) that the ultimate goal of

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

18 Journal of Business amp Economic Statistics January 2001

econometrics is to provide predictions More speci cally inmy view the goal is to provide predictions of policy interven-tions Using both economic theory and data economists wishto inform policy discussions by providing predictions of statesof the world under different policy choices Based on com-parisons of such predictions policy makers can then chooseamong the different policies using some social welfare mea-sure as objective function (eg Heckman and Smith 1997)Angrist argues that such questions are most easily formulatedin terms of potential outcomes Here I want to elaborate onthat view

Consider as an example the problem faced by a policymaker contemplating a new tax in a market To evaluatethis policy the policy maker wishes to take into account theeffect of the tax on the quantity traded Economic theory sug-gests that this effect depends on the slope of the supply anddemand functions The rst step is therefore the estimation ofthese slopes and in the remainder of this discussion I shallfocus on this component of the policy-evaluation problem Inprinciple the policy maker may be interested in the entire dis-tribution of the quantity traded under various taxes Let usassume however that for purposes of evaluation of the poli-cies it is suf cient to know the average effect of the policyon the quantity traded If there are only two values for thepolicymdashfor example no tax or a taxmdashthe difference betweenthese two averages is the key quantity of interest FollowingRubin (1974) I will refer to this as the estimand

Note that the choice of estimand is distinct from the sta-tistical question of the speci cation of the model Often thestatistical model is speci ed in such a way that a single param-eter corresponds to the estimand For example in a structuralinterpretation of the linear regression model the coef cientscorrespond to the effect of changing the covariates by a singleunit Such one-to-one correspondence however is the excep-tion rather than the rule Wooldridge (1992) made this point inthe context of BoxndashCox regression models Such models areoften used when a linear representation for E6Y mdashX7 is inappro-priate The BoxndashCox regression model generalizes this linearform to E6Y 4lsaquo5mdashX7 D X 0sbquo where

Y 4lsaquo5 D(

4Y lsaquo ƒ 15=lsaquo lsaquo 6D 01

ln Y lsaquo D 00

Although consistent estimators for sbquo exist under these assump-tions Wooldridge stressed that because (a) the interpretationof sbquo changes with the value of lsaquo and (b) knowledge of sbquo andlsaquo is not suf cient for recovering E6Y mdashX7 there is no reasonfor economists to be interested in estimates of sbquo under theseassumptions In other words sbquo cannot be the sole focus ofthe researcher because the question it answers changes withthe value of nuisance parameters Wooldridge then suggestedan alternative speci cation that always allows the researcherto recover the conditional expectation E6Y mdashX7

In empirical work this distinction between the estimandand the parameters of the statistical model is consistent withthe now common practice of reporting estimates of averagederivatives in binary response models rather than reportingestimates of the logit or probit coef cients Unlike a linear

regression model there is no direct link from one of the coef- cients in the logit or probit model to average causal effectsand thus there is no intrinsic interest in such coef cients

This view is at odds however with a large part of the semi-parametric literature An exception is the work by Stoker (egStoker 1986) who focused on estimation of index coef cientsin settings where these are proportional to average derivativesand thus directly linked to changes in predictions Considerfor example the work on semiparametric estimation of binaryresponse models In this literature such models are esti-mated without making logistic or probit assumptions insteadonly making conditional mean or median assumptions in alatent index interpretation (eg Manski 1985) This literaturehowever has begged the question of why economists shouldbe interested in the coef cient estimates in these models inthe absence of a direct link between these coef cients andthe choice probabilities or their derivatives Similarly someof the models with xed effects in panel data with limiteddependent variables have focused on estimation of parametersthat in themselves do not allow for estimation of conditionalexpectations or their derivatives and thus do not allow for esti-mation of causal effects See Arellano and Honoreacute (in press)for a survey of many of these methods

2 IDENTIFICATION

After deciding on the estimand the next step is to makesubstantive assumptions on the process that generated the dataThis is where economic as opposed to statistical theory playsa key role Theoretical considerations may suggest that certainvariables have no direct causal effect on others because theydo not enter into agentsrsquo utility function nor do they affectthe constraints these agents face For example in some mar-kets it may be reasonable to postulate the existence of demandand supply function and assume that their intersection deter-mines observed prices and quantities In that case it may beargued that certain variablesmdashfor example weather conditionsin agricultural marketsmdashaffect supply at xed prices but notdemand because weather conditions do not affect utility ofthe buyers nor do they constrain their choices given pricesSimilarly theoretical considerations may suggest which vari-ables determine agentsrsquo fertility choices and which variablesare excluded from such choices as in the structural modelsdescribed in Section 12 of Angrist

For the purpose of considering such exclusion restrictionsas well as other assumptions it is important to formulate themin a way that economic theory can be brought to bear on themThis makes the formulation in terms of counterfactuals orpotential outcomes that Angrist advocates particularly appro-priate The potential outcomes describe outcomes in differ-ent environments and as such are the primitives of economicanalyses as well as choices under different sets of constraintswhich are the result of agents solving constrained optimizationproblems Since economic theory studies such optimizationproblems it is therefore well equipped to assess assumptionsformulated directly in terms of these potential outcomes Anexample of the formulation of the critical assumptions in termsof such potential outcomes is Angrist Imbens and Rubin(1996 AIR from here on) In contrast latent index modelsalthough under some conditions mathematically equivalent to

Imbens Comment 19

the potential outcome framework (eg Vytlacil 1999) formu-late the critical assumptions in terms of associations betweenobserved variables and unobserved residuals which appearsmore dif cult to contemplate [see Imbens (1997) for a dis-cussion of the confusion such formulations have caused in thestatistics literature]

It is rare that economic theory is speci c enough to deter-mine the exact value of the estimand More typical is thatthe theory is consistent with a range of values for the esti-mand Observations on agentsrsquo choices and outcomes may behelpful in narrowing down this range The econometricianrsquostask is to link the data to the estimand Typically a num-ber of additional assumptions are made at this stage Almostalways it is assumed that there is only limited dependenceor no dependence at all between choices made by differentagents and identi cation focuses on the link between the jointdistribution of the observables estimable in large samplesand the estimand Two possibilities arise at this stage Some-times the estimand can be expressed as a functional of thejoint distribution of the observables in which case the esti-mand is identi ed A leading example is where the estimandis the average treatment effect and theory suggests that assign-ment to treatment is random or at least random conditionalon a set of observed covariates (unconfounded assignmentselection on observables) Alternatively the assumptions sug-gested by economic theory do not allow for the direct linkbetween the distribution of observables and the estimand Inthat case the researcher faces some choices One option advo-cated in a series of papers by Manski (see for a general dis-cussion Manski 1995) is to estimate the range of values ofthe estimand consistent with the data given the substantiveassumptions Another option followed in the current article byAngrist is the local average-treatment-effect approach devel-oped by Imbens and Angrist (1994) to consider what aspectsof the estimand are identi ed given data and assumptions Ininstrumental-variables settings the population average treat-ment effect is often not identi ed but the average effect for aspeci c subpopulation may be In that case one may chooseto estimate the average treatment effect for this subpopula-tion and leave the extrapolation to the principal estimand tothe researcher possibly aided by theoretical considerations AsHeckman wrote ldquo It is a great virtue of the LATE parameterthat it makes the investigator stick to the data at hand and sep-arate out the aspects of an estimation that require out of sam-ple extrapolation or theorizing from aspects of an estimationthat are based on observable datardquo (Heckman 1999 p 832)

Let us consider the case studied by Angrist with its focuson the effect of having more than two children on labor sup-ply Angrist argues that the second birth being a multiple birth(eg twins) is a valid instrument for this effect In terms ofthe AIR formulation this requires a multiple birth to be asgood as randomly assigned and the absence of a systematicdirect effect on labor supply other than through its effect onthe number of children Such assumptions may be contro-versial For example fertility treatments may lead to a sys-tematic association between multiple births and choices madeby couples violating the rst assumption Even if we acceptthese assumptions however they only imply that the aver-age causal effect of more kids on labor supply is identi ed

for women who had a third child solely because their sec-ond birth was a multiple birth (compliers in the AIR termi-nology) In my view it is unlikely that this is the populationof primary interest Nevertheless it is the only subpopulationthe data are informative about in the sense of point identi ca-tion under the substantive assumptions and it would appear tooffer some guidance regarding the population average causaleffect to policy makers similar to the way in the medical worldresults from clinical trials in homogenous subpopulations areregarded as useful because they are viewed as indicative ofpopulation average causal effects

3 LIMITED DEPENDENT VARIABLES

Typically economic theory offers some guidance concern-ing the determinants of certain outcomes without specifyingthe exact form or strength of their relationship In that casestatistical modeling is required to complete the speci cationConsider the example Angrist studies with binary outcomebinary endogenous regressor a binary instrument and covari-ates Angrist suggests as one possible approach estimating theaverage treatment effect through a linear probability modelwith instrumenting for an endogenous regressor The bene tsof the linear probability approach stemming from the linear-ity and robustness against misspeci cation of the rst stageappear to me largely illusory At this point the statistical mod-eling is only intended to provide exible approximations to theunderlying conditional distributions This is a fundamentallydifferent role from that played by the substantive assumptionsthat are essential for identi cation Appeals to consistencyunder speci c parameterizations therefore appear irrelevantmdashin a larger sample one may well wish to use a more exiblespeci cation because less smoothing is required In additionto nding the alleged bene ts of the linear probability modelunpersuasive I nd its disadvantages troubling Within smallsubpopulations characterized by extreme values of the covari-ates the smoothing implicit in linear probability models islikely to lead to unattractive predictions compared to pre-dictions based on nonlinear models that respect the limited-dependent-variable nature of the outcomes

An alternative approach is followed in the study of the effectof u shots on hospitalization rates using randomized incen-tives for vaccination by Hirano Imbens Rubin and Zhou(2000 HIRZ from here on) Given their assumptions exten-sions of those made by AIR to the case with exogenous covari-ates there are three subpopulationsmdashcompliers (units whochange treatment status in response to a change in the valueof the instrument) always-takers (who always take the treat-ment irrespective of the value of the instrument) and never-takers (who never take the treatment irrespective of the valueof the instrument) H IRZ modeled the conditional distributionof these three ldquotypesrdquo conditional on covariates as a trinomialdistribution

Pr4TypeiD cmdashXi

D x5 D exp4x0ndashc5

1C exp4x0ndashc5C exp4x0ndasha51

Pr4TypeiD amdashXi

D x5 D exp4x0ndasha5

1C exp4x0ndashc5 C exp4x0ndasha51

20 Journal of Business amp Economic Statistics January 2001

and

Pr4TypeiD nmdashXi

D x5

D 1 ƒ Pr4TypeiD cmdashXi

D x5 ƒ Pr4TypeiD amdashXi

D x50

Now compare this setup to the selection models Angristdescribes in Section 3 In the selection models the equationdescribing the endogenous regressor is Di

D 18ƒ0C ƒ1Zi

Cƒ 0

2Xi gt Daggeri9 Suppose that the instrument is binary and that ƒ1 ispositive Then the two models are very similar with units withƒ0 Cƒ1 Cƒ 0

2Xi gt Daggeri in the selection model classi ed as always-takers in the potential outcome framework (because irrespec-tive of the value of the instrument Di

D 1 for such units) unitswith ƒ0

C ƒ 02Xi lt Daggeri classi ed as never-takers (because irre-

spective of the value of the instrument DiD 0 for such units)

and the units with ƒ0C ƒ 0

2Xi lt Daggeri lt ƒ0C ƒ1

C ƒ 02Xi classi ed

as compliersOne advantage of the trinomial model is that it easily gen-

eralizes to provide an arbitrarily good t to any conditionaltrinomial distribution by including higher-order terms andinteractions in the covariates If there are no substantive rea-sons to impose additional restrictions one should not imposethem implicitly in the speci cation of the statistical modelIn particular in the selection model it is not suf cient to addhigher-order terms to the covariate vector to provide an arbi-trarily good t to the trinomial distribution Such an approxi-mation would have to involve heteroscedasticity and other dis-tributional extensions that are not straightforward to implementin the selection model

Conditional on the individualrsquos type H IRZ speci ed theoutcome distributions given covariates as logistic regressionmodels Again the aim is to provide a exible approxima-tion to the conditional distribution in a manner that does notimpose any implicit restrictions Given that for a binomialdistribution the logistic regression model can be thought ofas providing a linear approximation to the log odds ratiothis choice is again an appealing one An alternative is theprobit model which also provides a good approximation Less

attractive here is the linear probability model since it requiresinequality restrictions on the parameters if the implicit esti-mates of the probabilities are to be bounded between 0 and 1

In cases with other limited dependent variables alternativenonlinear models may be appropriate For example if the out-comes are durations subject to censoring models speci ed interms of hazard functions (eg Lancaster 1979) may be con-venient for dealing with such data

ADDITIONAL REFERENCES

Angrist J (1989) ldquoLifetime Earnings and the Vietnam Era Draft LotteryEvidence From Social Security Administrative Recordsrdquo American Eco-nomic Review 80 313ndash335

Angrist J and Krueger A (1991) ldquoDoes Compulsory School AttendanceAffect Schooling and Earningsrdquo Quarterly Journal of Ecomonics 106979ndash1014

Arellano M and Honoreacute B (in press) ldquoPanel Datardquo in Handbook of Econo-metrics eds J Heckman and E Leamer Amsterdam Elsevier North-Holland

Goldberger A (1991) A Course in Econometrics Cambridge MACambridge University Press

Heckman J (1999) ldquo Instrumental Variables Response to Angrist andImbensrdquo Journal of Human Resources 34 828ndash837

Heckman J and Smith J (1997) ldquoEvaluating the Welfare Staterdquo in Econo-metrics and Economics in the 20th Century The Ragnar Frisch Centenaryed S Strom New York Cambridge University Press pp 214ndash318

Hirano K Imbens G Rubin D and Zhou A (2000) ldquoEstimating theEffect of Flu Shots in a Randomized Encouragement Designrdquo Biostatistics1 69ndash88

Imbens G (1997) Book Review of The Foundations of Econometric Analy-sis by D Hendry and M Morgan Journal of Applied Econometrtics 1291ndash94

Lancaster T (1979) ldquoEconometric Methods for the Analysis of DurationDatardquo Econometrica

Manski C (1985) ldquoSemiparametric Analysis of Discrete Responserdquo Journalof Econometrics 27 313ndash333

mdashmdashmdash (1995) Identi cation Problems in the Social Sciences CambridgeMA Harvard University Press

Marshak J (1953) ldquoEconomic Measurements for Policy and Predictionrdquoin Studies in Econometric Method eds W Hood and T KoopmansNew York Wiley pp 1ndash26

Stoker T (1986) ldquoConsistent Estimation of Scaled Coef cientsrdquo Economet-rica 54 1461ndash1481

Vytlacil E (1999) ldquo Independence Monotonicity and Latent Index ModelsAn Equivalency Resultrdquo unpublished manuscript University of ChicagoDept of Economics

Wooldridge J (1992) ldquoSome Alternatives to the BoxndashCox RegressionModelrdquo International Economic Review 33 935ndash955

CommentRobert A Mof tt

Department of Economics Johns Hopkins University Baltimore MD 21218 and National Bureau of EconomicResearch Cambridge MA (mofttjhuedu)

The Problem Although the article by Angrist rangesacross a number of issues much of the discussion and thearticle title suggests that the problem of concern is that instru-mental variables ( IV) cannot be used in one of three commonmodels Let the rst model be y D Csbquod CxbdquoChellip where y isan absolutely continuous variable but d is binary and wherex is independent of hellip but d is not Then sbquo can be consistentlyestimated with IV (Heckman and Robb 1985) Let the secondmodel be y uuml D C sbquod uuml C xbdquo C hellip where y uuml and d uuml are contin-

uous and where y D 14y uuml gt 05 and d D d uuml are the observedvariables The parameters of this model can likewise be esti-mated by IV with some auxiliary assumptions (Newey 1986see Blundell and Smith 1993 for a review of alternative meth-ods) But let the third model be y uuml D C sbquod C xbdquo C hellip where

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

Moftt Comment 21

again y D 14y uuml gt 05 but now d is binary The parameters ofthis model cannot be estimated by standard IV Many of theexamples in the Angrist article involve the censored regres-sion model rather than the binary choice model but this andthe other main points in the article apply to both

The Angrist Solution The Angrist solution to the problemposed by the third of these models where IV is not applica-ble is to declare sbquo an uninteresting parameter and not worthyof estimation [or in his words to ldquopuntrdquo (p 8)] This is remi-niscent of the solution to the Vietnam War suggested by some1960s commentators The United States should have simplydeclared victory quickly withdrawn and hoped that no onenoticed that they had in fact lost

If sbquo is uninteresting what is interesting according toAngrist His answer is that interest should center on leastsquares approximations with y as the dependent variable andd and x as regressors If endogeneity of d is a problem linearIV should be applied with linear IV given a LATE interpre-tation He provides an empirical illustration

I will comment on three questions raised by the Angristarticle (1) Is sbquo an interesting parameter (2) Is the Linear IVestimator using observed y and d which Imbens and Angrist(1994) relabeled the LATE an interesting quantity (3) Is theillustrative application given in the Angrist article interesting

Is sbquo an Interesting Parameter As Angrist notes manyof the issues raised by his solution have nothing to do withthe endogeneity of d Whether sbquo is an interesting parame-ter is one of these at least in part Given Angristrsquos prefer-ence for linear projections his position necessarily implies thatall nonlinear models of which the latent index model (L IM)is just one example are uninteresting (Angrist is not com-pletely consistent in his position because he does propose anonlinear model with an exponential conditional mean func-tion in one section of his article that model is subject to thesame objections that he subjects the LIM tomdasharbitrary non-linearities etc) Thus his position boils down to a preferencein the binary dependent-variable case for the linear proba-bility model (LPM) over the LIM and other nonlinear mod-els This issue has been discussed for many years even in asimultaneous-equations context [eg Heckman and MaCurdy(1985) which Angrist does not reference]

The dif culty with Angristrsquos polar position on this issuemdashthat the LPM is the only model of interestmdashis that it is funda-mentally untenable The LPM is attractive because it is easyto interpret providing parameter estimates that do not requiretransformation to learn the effects of a regressor on the meanof the dependent variable It has a role to play in empiricalwork in summarizing the data as regards the conditional meanfunction and for initial explorations of the data and virtu-ally all practitioners use it for this purpose But beyond thisit has no defense Either it is equivalent to the latent-variablemodel if the model is saturated or if the model is not satu-rated imposes functional form restrictions that may not holdand may t the data worse than the latent-variable model

Thus if the model is y D 14 C sbquod C hellip gt 05 where d is asingle dummy variable regressor independent of hellip the probit(say) estimates of and sbquo map one-to-one onto the leastsquares intercept and coef cient on d in the LPM and hencethere is no gain to estimating the model either way because the

estimators are equivalent The same point extends to the caseof a multinomial d If the model is y D 14 C sbquod C xbdquo C hellip gt

05 where x and bdquo are vectors of exogenous covariates andcoef cients respectively the LPM estimate of the effect ofd on E4y5 conditional on x may be demonstrably worsethan the probit estimator of the same effect even for theobject of estimating E4ymdashd1 x5 at points 8d1 x9 observed inthe data (Angrist falls back on the weak principle that leastsquares estimates provide best linear approximations but theissue is that the true model might be nonlinear) Nothingin Angristrsquos article provides evidence that the probitor anyother latent index formulation provides an inferior estimate of6E4ymdashd11 x5 ƒ E4ymdashd01 x57 compared to the LPM for the twoestimators smooth the joint response function with respect tod and x in different ways

Although the latent-variable model has no necessary claimfor superiority in the class of all nonlinear models its pop-ularity rests on its ability to generate a wide variety of non-linearities with a relatively parsimonious speci cation arisingfrom the convolution of the latent index with the cdf of hellipExpansion of the LPM to incorporate equivalent nonlinearitiesis cumbersome and inef cient

The possible inferiority of the LPM in capturing nonlinear-ities in nonsaturated models is also important for interpolationand extrapolation to points not in the observed data Gettingthe nonlinearities right in the observed data is important ininterpolation and extrapolation if the true model is nonlinearas the binary choice model necessarily is (since y is boundedby 0 and 1) Although Angrist at one point does mentionpredictionmdashstating that the linearized models he proposes area good ldquojumping-off point for any prediction exerciserdquo a state-ment without supportmdashhe is for the most part not interestedin prediction so much as summarizing the observed data Thisis ne but unfortunately prediction outside the observed his-torical data is the pervasive concern of the policy makers whoare the ultimate consumers of applied economic research forthey must make predictions of new policies in new environ-ments on a daily basis as part of their jobs The methods pro-posed in Angristrsquos article are therefore not very useful for theformulation of new policy

It should be noted that some make the argument that theoryand economic models must play a role in guiding the formu-lation of empirical relationships used for prediction outsidethe range of the observed data That position is not taken herebecause it is not necessary to establish the potential superior-ity of the L IM over the LPM in nonsaturated models and forprediction Occamrsquos razor makes it unnecessary

It should also be noted that these issues have nothing todo with causal effects because d is assumed exogenous Thusthe opposition that Angrist poses between L IMrsquos and ldquocausaleffectsrdquo models is a false one at least in general If a ldquocausaleffectrdquo is de ned as the true effect of a variable d on onlythe rst moment of a variable y (a rather restricted de nitionof causal effect) that causal effect can be derived from thelatent-variable model just as well The issue is instead merelynonlinearity of the function E4ymdashd1 x5

Angrist also states that the L IM requires ldquoconstant coef- cientrdquo and ldquodistributional assumptionsrdquo for identi cation

22 Journal of Business amp Economic Statistics January 2001

unlike the LPM Angrist does not make suf cient quali ca-tions to these assertions and they are indeed false for theL IM A random coef cient sbquo in the LIM is permissible with-out distributional assumption ( Ichimura and Thompson 1998)and the coef cients in the standard constant-coef cient L IMare identi ed semiparametrically (ie under unknown distri-bution of hellip) under the restriction that d and hellip are independentan assumption generally made for causal interpretation of theLPM model as well (Manski 1988 Horowitz 1993 Powell1994) In addition as already noted in a fully saturated modelthe LPM and the L IM are equivalent in any case

Is the LATE an Interesting Statistic The Angrist assertionthat LATE is the only interesting statistic if d is endogenousor perhaps the only statistic worth bothering about because itis the only one identi ed by the data has the same partialvalidity as his preference for the LPM over the LIM IV isone of the most popular techniques in applied econometricsand has a natural intuition indeed one that does not requirethe LATE interpretation per se It is one of the most usefultools in the applied economistrsquos kit The simple IV estimatordiscussed by Angrist makes minimal assumptions and givesminimal information back to the analyst as a result But tosay that it produces the only statistic of interest does not havedefense both because it is equivalent to an LIM that is fullysaturated and may t the data worse than an LIM if not satu-rated and because it implies that there is no value to makingadditional assumptions to obtain additional information

The limiting nature of the LATE statistic is again in itsinattention to nonlinearities interpolation and extrapolationThat nonlinearities can be important is as true in this caseas it was in the exogenous case just discussed when thereare additional x covariates and when the model is not satu-rated As for interpolation and extrapolation the LATE statis-tic 6 Ny4z D z15ƒ Ny4z D z057=6 Nd4z D z15ƒ Nd4z D z057 denotes theeffect on a change in z from value z0 to z1 on E4y5 scaledby the change in the E4d5 It does not have implications forthe effect on E4y5 of any other change in z or for a changein any other policy variable Thus it is not particularly usefulfor policy changes other than a change of z0 to z1 in the sameenvironment (ie conditioned on the same x) This stands incontrast to an L IM for E4dmdashx1 z5mdashor any parametric modelfor d for that mattermdashthat allows for the change from z0 toz1 to inform policy makers and others of the likely changesof other values of z and of other variables x In fact theLATE statistic which is not a parameter in the usual senseof the word can always be expressed as a nonlinear combi-nation of parameters of a latent-variable model but not viceversa hence the latter model is more general than the former[See Heckman and Vytlacil (1999) for a discussion of how toexpress the LATE in terms of an L IM see also Angrist andImbens (1999) and Heckman (1997 1999) for a discussion ofsome of these issues]

In addition Angristrsquos focus on the advantages of the LATEfor model-free estimation of the effect of a change in z fromz0 to z1 on y reveals a philosophical weakness in his posi-tion This is because the effect of a change in z from z0 toz1 on y can be estimated from the reduced form the struc-tural form is not needed It is not explained why the statistic

6E4ymdashz D z15 ƒ E4ymdashz D z057 is not the only quantity of inter-est The answer that most economists would give is to say thatstructural coef cients are of interest because they can be usedto extrapolate the effects of changes in d on y to other changesin z than from z0 to z1 in the same environment indeedthis is the basic rationale for structural estimation given inthe famous essay by Marschak [1953 see also Christ (1994)and Heckman (2000) for discussions Marschakrsquos argumentwas more subtle than this arguing that restrictions on struc-tural parameters are needed for out-of-sample prediction butthis translates into restrictions on the reduced form as well]However by ruling out the possibility of learning about anyof the effects of changes in z or d other than those inducedby a change in z from z0 to z1 Angrist removes the need todo structural estimation in the rst place Angrist implicitlyassumes that a structural coef cient of interest exists on thevariable d in the y equation and that that coef cient has mean-ing independent of a particular change in z but the rest of hisdiscussion contravenes that interest

A reduced-form research program which is where theAngrist position leads is of considerable value There is noth-ing wrong with a research program to collect a large body ofinformation on the effects of a wide variety of changes in dif-ferent policy variables 4z5 from particular values (z0) to otherparticular values (z1) each taking place in particular environ-ments (x) But if the research program stops there very littleuseful has been learned other than a collection of facts aboutthe effects of particular policies in particular environments

Is the Application in the Angrist Article Interesting TheAngrist article uses a recently popular exclusion restriction ornatural experiment to identify the effect of fertility on laborsupply and to illustrate his preferred methodsmdashnamely theuse of twins The use of twins as an exclusion restrictionstands in contrast to variations in a government policy or lawwhich are often used as exclusion restrictions both in recentwork on natural experiments as well as a much older litera-ture that uses cross-sectional and overtime variation in state-or country-speci c taxes and transfer rules to identify modelparameters However although it can be of interest to estimatereduced-form policy effects 6E4ymdashz D z15 ƒ E4ymdashz D z057 asjust discussed it is not so obvious that there is any interest inestimating the effects of having twins on the expected valueof y Creating twins is not a variable directly subject to policymanipulation

The dif culty in the use of twins arises for the same reasonalready discussedmdashnamely that the preference for model-freeestimation of policy effects leads to a lack of interest in thefunction E4dmdashz1 x5 and hence to a lack of interest in what canbe learned from a study of twins about the effects of someother more relevant policy variable that might be manipulatedA model for the E4dmdashz1 x5 is needed to make that connectionand hence to make a study of twins of any interest yet that isintentionally eschewed in the approach proposed by AngristOnce again this leads to an uninteresting and quite limitingset of exercises

Conclusions The set of methods laid out in the Angristarticle at least if a more relevant instrument were used yieldsa set of model-free statistics that are suitable for exploratorywork on a research question prior to the (possibly nonlinear)

Mullahy Comment 23

structural estimation and prediction that should be the ultimateobject of applied economics research

ACKNOWLEDGMENTS

This comment is a revision of remarks delivered at theJoint Statistical Meetings Baltimore August 1999 I thankCarl Christ James Heckman Joel Horowitz Michael KeaneThomas Mroz Geert Ridder and Edward Vyltacil for com-ments with the usual disclaimer that the views expressed hereshould not be taken as representing those of any of theseindividuals

ADDITIONAL REFERENCES

Angrist J and Imbens G (1999) Comment on ldquo Instrumental VariablesA Study of Implicit Behavioral Assumptions Used in Making Pro-gram Evaluationsrdquo by J J Heckman Journal of Human Resources 34283ndash827

Blundell R and Smith R (1993) ldquoSimultaneous Microeconometric ModelsWith Censored or Qualitative Dependent Variablesrdquo in Handbook of Statis-tics (Vol 1) eds G S Maddala C R Rao and H D Vinod AmsterdamNorth-Holland pp 117ndash141

Christ C (1994) ldquoThe Cowles Commissionrsquos Contributions to Econometricsat Chicago 1939ndash1955rdquo Journal of Economic Literature 32 30ndash59

Heckman J (1997) ldquoInstrumental Variables A Study of Implicit BehavioralAssumptions Used in Making Program Evaluationsrdquo Journal of HumanResources 32 441ndash462

mdashmdashmdash (1999) ldquo Instrumental Variables A Reply to Angrist and ImbensrdquoJournal of Human Resources 34 828ndash837

mdashmdashmdash (2000) ldquoCausal Parameters and Policy Analysis in Economics ATwentieth Century Retrospectiverdquo Quarterly Journal of Economics 11545ndash97

Heckman J and MaCurdy T (1985) ldquoA Simultaneous Equations LinearProbability Modelrdquo International Economic Review 18 21ndash37

Heckman J and Vytlacil E (1999) ldquoLocal Instrumental Variables andLatent Variable Models for Identifying and Bounding Treatment EffectsrdquoProceedings of the National Academy of Sciences 96 4730ndash4734

Horowitz J (1993) ldquoSemiparametric and Nonparametric Estimation of Quan-tal Response Modelsrdquo in Handbook of Statistics (Vol 11) eds G SMaddala C R Rao and H D Vinod Amsterdam North-Holland pp45ndash72

Ichimura H and Thompson T (1998) ldquoMaximum Likelihood Estimation ofa Binary Choice Model With Random Coef cients of Unknown Distribu-tionrdquo Journal of Econometrics 86 269ndash295

Manski C (1988) ldquoIdenti cation of Binary Choice Modelsrdquo Journal of theAmerican Statistical Association 83 729ndash738

Marschak J (1953) ldquoEconomic Measurements for Policy and Predictionrdquoin Studies in Econometric Method eds W Hood and T KoopmansNew York Wiley pp 1ndash26

Powell J (1994) ldquoEstimation of Semiparametric Modelsrdquo in Handbook ofEconometrics (Vol 4) eds R Engle and D McFadden Amsterdam North-Holland pp 2443ndash2521

CommentJohn Mullahy

Departments of Preventive Medicine and Economics University of Wisconsin Madison WI 53705 andNational Bureau of Economic Research Cambridge MA (jmullahyfacstaffwiscedu)

Articles bearing titles containing phrases like ldquoSimpleStrategies for Empirical Practicerdquo are often wolves masquerad-ing as sheep the methodologies they devise being far fromldquosimplerdquo and far removed from what most humble practition-ers perceive in the realm of ldquoempirical practicerdquo Not so hereJoshua Angrist has written a tight comprehensive article thatis stimulating and important yet also eminently useful

As typi es much of Angristrsquos work the main concern in thisarticle is on how to elicit interesting characterizations of causaleffects from microdata with the particular twist here being afocus on outcomes measured as ldquolimited dependent variablesrdquo(LDVrsquos) The main take-away message I glean from this arti-cle is that applied analysts working on causal effect or struc-tural analyses in LDV contextsmdashtraditionally vexing contextsinsofar as consistent estimation and inference are concernedmdashhave considerable grounds for optimism Angrist lays out andinterprets systematically a set of issues and methods that pro-vide practitioners with a variety of implementable strategiesthat might be brought to bear on such empirical problemsA corollary take-away message is that in some respects ldquothisstuff is not really as hard as wersquove tended to make itrdquo withAngrist demonstrating for instance the potential merits ofsimple linear instrumental variable ( IV) methods for estimat-ing causal effects in a variety of LDV contexts In no eventcan applied analysts escape the requirement of nding theo-retically sound instruments but Angrist exposits compellingly

a broad description of how such instruments might be used toelicit interesting causal inferences in LDV contexts

I have no real quibbles with any of the substance ofAngristrsquos arguments Rather I will devote my commentarymainly to amplifying and expanding several of the themes hedevelops throughout the article

1 FOCUS ON CAUSAL OR PARTIAL EFFECTSVERSUS FOCUS ON CONDITIONAL

EXPECTATIONS FUNCTIONS

It seems fair to suggest that much of applied microecono-metrics is concerned primarily with understanding the signsand magnitudes of quantities like iexclE6Y mdashX1 D7=iexcl4X1 D5 oriexclE6Y mdashX7=iexclX0 Yet much of the actual dirty work in under-taking causal analysis in LDV contexts seems to result fromdecisions to undertake analyses in settings where E6Y mdashX1D7

or E6Y mdashX7 are restricted to be positive without a priori restric-tions on parameter values [thus the tradition of using tobit-class conditional expectations functions (CEFrsquos) two-stepselection models exponential CEFrsquos etc]

As practitioners however we should pause to assesswhether speci cations akin to E6Y mdashX1D7 D exp4Xsbquo C D5 gt

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

24 Journal of Business amp Economic Statistics January 2001

0 are ultimately buying anything important with respect tothe rst-order questions being explored It is only com-mon sense that analysts should spend relatively more energyworking on understanding quantities of primary interest(eg iexclE6Y mdashX1D7=iexcl4X1 D5) and worrying relatively lessabout the formulation and implications of assumptions (egE6Y mdashX1 D7 gt 0) that may have little or no ultimate bearingon the particular questions being addressed This is espe-cially so when the latter effort (restricting via functional formchoice E6Y mdashX1 D7 gt 0) will tend to complicate rather thanfacilitate the former effort (understanding the partial or causaleffects) This is not to suggest that quantities like E6Y mdashX7 andE6Y mdashX1 D7 may not themselves be interesting to estimate (egmodeling conditional mean health care expenditures for use inforecasting future expenditure levels) but whether E6Y mdashX1D7

is parametrically and parsimoniously better approximated byexp4XsbquoC D5 or by XsbquoC D or by some other g4Xsbquo CD5

or h4X1D1 ƒ5 ismdasheven in LDV settingsmdashultimately an empir-ical matter to be assessed by goodness of t (eg via condi-tional moment tests)

I note this important angle somewhat sheepishly sincein both articles of mine that Angrist cites (Mullahy 19971998) there is considerable emphasis on the use of exponen-tial CEF speci cations as enforcers of positive conditionalmean ldquorequirementsrdquo In fact the main motivation for theearlier article was to nd a general approach to obtainingconsistent estimators of structural parameters in parametricsettings less distribution-bound than tobit-class models whenthe key requirement or side constraint is E6Y mdashX1 D7 gt 0 withthe resulting strategy a more or less brute-force method foraccomplishing this in a nonlinear IV setting Angristrsquos arti-cle takes this idea quite a bit further and demonstrates how(and with the addition of a further normalization) the struc-tural parameter estimation approach I discussed can be devel-oped into a model to analyze causal effects when the latter arecharacterized as ldquoproportional treatment effectsrdquo

Whether a proportional as opposed to an additive treatmenteffect is interesting (it may or may not be) remains to be seenin any particular application Wooldridgersquos recent work onaverage causal effects (ACErsquos Wooldridge 1999) would seempertinent in contexts in which the role of the X covariates ismore ldquointrusiverdquo than in the simple additivelinear treatment-effect setup As a general matter the extent to which infer-ences about causal effectsmdashhowever formulatedmdashhinge onthe inclusion of particular covariates as conditioning regres-sors is an interesting and potentially important angle thatAngristrsquos article begins to explore and is one that likely tomerit additional research For the particular issues at handthe very nature of proportional treatment effects brings intofocus explicitly the role of the X rsquos so how their correlationwith the treatment indicators 4D5 in uences inferences aboutthe causal impacts of the latter would seem to be a rst-orderconsideration

2 TWO-PART MODELS

I found Angristrsquos analysis of the interpretation of causaleffects in two-part models to be extremely illuminatingIn a reduced-form setting I argued (Mullahy 1998) that amid

all the concerns about zeros robustness outliers transforma-tion retransformation and such that have typically attendedmodeling efforts involving two-part models it would be sur-prising in a well-structured empirical investigation if therewere not stated or lurking concerns about partial effectsiexclE6Y mdashX7=iexclX andor CEFrsquos E6Y mdashX7 themselves It may be insome cases that such concerns are not articulated and it mayeven be in less well-structured problems that they are not obvi-ous to analysts themselves Yet for instance unless robustlyestimated sbquorsquos from a log-linear part-2 speci cation of a two-part model could inform a rst-order question about a par-tial effect or a CEF of what practical use were they likelyto be My rather modest and rather obvious argument wasjust that the concerns about the iexclE6Y mdashX7=iexclX or the E6Y mdashX7mdashif they are the reason the analysis is being conducted in the rst placemdashshould enjoy rst-order prominence in the estima-tion exercise and that concerns about zeros outliers and suchshould be relegated in some sense to second-order status Insome cases two-part models will serve nicely to address such rst-order concernsmdashat least in reduced-form settingsmdashbut inothers they will not

Angrist appears to have some sympathy with such argu-ments but more importantly he provides a valuable serviceto users of two-part models by unearthing one major limi-tation in the analysis of causal effects The problem is notwith part 1 (ie the logit probit or linear probability com-ponent) because part 1 of the two-part model falls within themain lines of Angristrsquos LDV analysis Rather the complicationarises with part 2 of the two-part model in which the additionalldquoY gt 0rdquo conditioning arises In the standard (reduced-form)two-part model quantities like E6Y mdashX1Y gt 07 are prominentand identi ed readily But in the counterfactual settings inwhich causal effects are manifested wherein the realized Y

arises from self-selection into or out of treatment such selec-tion effects introduce an ambiguity [Eq (12)] and therebyconfound the ability to glean causal effects from part-2 esti-mates (unless as Angrist notes censored regression methodsare used but Angrist also offers some compelling argumentsagainst blind reliance on censored regression approaches) Assuch if estimation of and inference about causal effects arethe primary analytical concerns in data settings with Y para 0and Pr4Y D 05 gt 01 analysts may be well advised to avoidtwo-part modeling strategies and pursue some of the moredirect linear and nonlinear estimation approaches discussedby Angrist in which one-part estimation approachesmdashzerosand allmdashyield estimates of causal effects that will be directlyinterpretable

3 BEYOND CEFrsquoS

One noteworthy feature of the methods Angrist discussesmdashwith the main results attributed to Abadie (1999)mdashare theirapplicability to estimation of causal effects for characteristicsof the conditional distribution rdquo4Y mdashX1D5 beyond just CEFrsquosEstimation of causal or treatment effects for conditional quan-tiles (QTErsquos) and conditional distribution function ordinates(ldquodistilesrdquo) Pr4Y lt cmdashX1D5 is demonstrated to t properlywithin the weighting strategies advanced by Abadie Since

Todd Comment 25

quantile and distile analyses may be more relevant in address-ing particular analytical questions or policy issues than CEFrsquosthe ability to extend the mechanisms of causal-effect esti-mation strategies in such directions should be welcomed byapplied researchers

I might suggest that Angristrsquos claim about an advantage ofquantile causal analysis over distile causal analysis being thatthe latter relies on application-specic values of ordinates (c)is perhaps oversold or underdeveloped In many interestingand policy-relevant applications focus on the application-speci c ordinates is fundamental What factors cause employ-ers to offer more than one health insurance plan to employeesWhat factors cause consumers of alcoholic beverages to drinkmore than two drinks per day What factors cause elderlyindividuals to utilize more than one prescription drug prod-uct What factors cause second earners to work more than17 hours per week Each of these concerns an importantreal-world causal question for which the application-specicordinatemdashwhether for institutional legal physiological orother reasonsmdashis a fundamental ldquopivot pointrdquo for the analysisAnalysis of such distile relationshipsmdashwhether in causal or inreduced-form settingsmdashis a potentially powerful method forinforming decision makers and it seems to me that its mer-its relative to quantile analyses would have to be judged on acase-by-case basis

4 CONCLUSIONS

Practitioners of applied microeconometrics will nd thatthe time invested in a careful reading of Angristrsquos article hassizable returns not merely because of the analytical and con-ceptual insights it offers but ultimately also because it sug-gests a variety of important practical innovations that appliedresearchers can exploit empirically to more clearly understandthe nature of causal relationships in their data Applied microresearchers in areas like health labor development public nance and such work commonly with data wherein (a) out-come measures are limitedmdashoften most importantly becausethey are nonnegative with mass points at 0mdashand (b) the rolesof conditioning covariates (X rsquos) are most conveniently summa-rized via regression methods Angristrsquos article provides appliedresearchers so endowed with a set of powerful tools for elic-iting causal inferences from such data Of course the punchline is familiar and inevitable All bets are off without theo-retically solid instruments But with such instruments in handthe strategies exposited by Angrist offer analysts a rich set ofperspectives on estimation

ADDITIONAL REFERENCE

Wooldridge J M (1999) ldquoEstimating Average Partial Effects Under Con-ditional Moment Independence Assumptionsrdquo mimeo Michigan StateUniversity Dept of Economics

CommentPetra Todd

Department of Economics University of Pennsylvania Philadelphia PA 19104 (petraathonasasupennedu)

This article considers ways of estimating the effect ofbinary endogenous regressors in models with limited depen-dent variables It questions the usefulness of conventionalestimation strategies aimed at recovering structural modelparameters and advocates the use of simple instrumental vari-ables ( IV) estimators as an alternative on the grounds thatthese estimators invoke weaker assumptions and often suf ceto answer questions of interest in empirical studies In theauthorrsquos view the main challenge facing empirical researchersis the problem of identi cation of ldquotreatment effectsrdquo throughIV Here I expand the discussion by considering two ques-tions (1) What are the limitations of the causal model asa paradigm for policy analysis (2) When simple estimatorsare suitable for answering a question of interest what are thetrade-offs that need to be considered in using them

1 WHEN IS THE ldquoCAUSAL MODELldquoA USEFULPARADIGM FOR POLICY ANALYSIS

The article gives the impression that most interesting ques-tions in economics can be answered within the context ofthe ldquocausal modelrdquo [variously attributed to Neyman (1923)Fisher (1935) Cox (1958) Roy (1951) and Rubin (1978)]The causal model is a very general framework that assumesthat there are potential outcome states associated with having

received some treatment and having received no treatment (orpossibly a different treatment) For each individual only oneof the potential outcome states is observed which leads to amissing-data problem in attempting to draw inferences aboutaspects of the treatment effect distribution

The language of potential outcomes is very general soalmost any economic problem could be formulated in theseterms However this does not mean that estimators proposedin the literature for the causal model are useful for all or evenmost economic problems A major limitation of the modelis that it assumes that the state ldquowith treatmentrdquo has beenobserved for at least a subset of people In medical trials orbiological experiments this assumption is probably reason-able but in economics we often are interested in evaluatingeffects of treatments that have not yet been implemented Forexample we might be interested in predicting the effect ofraising the age receiving Social Security bene ts to a new ageor the effect of introducing new term limits on welfare par-ticipation An implicit assumption needed to apply any of theidenti cation results described in this article is that both Y1

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

26 Journal of Business amp Economic Statistics January 2001

and Y0 are observed which would not be satis ed in manycases Thus the causal model and the estimators developedfor it are inadequate when it comes to any economic questioninvolving a state of the world that has not been observed

Despite these limitations treatment effect estimates aresometimes used to predict consequences of new treatments orof existing treatments for new populations When the policychange being considered is very close to changes observedin the data and is to be applied to similar individuals thenextrapolation may be reasonable But if the policy change is(a) of a magnitude outside the range of experience (b) alonga new dimension or (c) intended for a new population then itis an open question as to how reliable the estimates would beTreatment impact estimates are not intended to capture any-thing invariant to changing conditions The assumptions thatwould be required to justify extrapolations or generalizationsare often quite strong To some extent it is an illusion thatthese estimators allow for general kinds of inference underweak assumptions because the assumptions required to justifytheir application for particular purposes are often not madeexplicit

Angrist argues that the identi cation results for the localaverage treatment effect (LATE) estimator provide a ldquofounda-tion for credible causal inferencerdquo (p 5) and a ldquominimum con-troversy jumping-off point for any prediction exerciserdquo (p 5)But as discussed previously the causal model maintains sucha high level of generality that it offers little guidance for manykinds of problems It is important to recognize when and whennot the LATE estimator is likely to be useful This articleclaims that ldquoin practice estimates of LATE differ little fromestimates based on the stronger assumptions invoked to iden-tify effects on the entire treated populationrdquo (p 5) But suchan extrapolation is not justi ed under the theory and is surelyunlikely to hold in all circumstances As shown in a num-ber of works by various authors LATE provides the averageeffect of treatment for the group of ldquocompliersrdquo who are peo-ple induced by the instrument to receive treatment This groupdoes not except under very special circumstances correspondto the group of treated persons and may in fact representonly a small fraction of persons receiving treatment So notonly does LATE not suf ce for answering problems of the sortdescribed by (a) (b) and (c) it also does not generally pro-vide the average effect of treatment on the treatedmdashthe mostcommon parameter of interest in analyzing social programsAnother not-so-attractive feature of the estimator is that thepopulation of ldquocompliersrdquo is usually not identi able so wecannot say exactly whose treatment impact is being estimatedApplying the LATE parameter estimate to the full populationof treated persons would generally require ruling out certaintypes of heterogeneity in individual responses to treatmentas discussed by Heckman (1997) The explicit assumptionsthat could justify this type of extrapolation were shown byHeckman and Vytlacil (2000 in press)

2 ESTIMATORS SHOULD BE TAILOREDTO THEIR APPLICATION

A general theme throughout the article is that wheneverpossible researchers should restrain from imposing paramet-ric restrictions in estimation because the structure they impose

may be wrong resulting in biased estimates This view isexpressed for example in the discussion of the two-partmodel The article advocates the use of a two-part approachinstead of a conventional selection-model approach maintain-ing that the two-part approach is less likely to lead to biasedparameter estimates because it does not impose cross-equationrestrictions as a conventional tobit estimator would

How do researchers decide whether or not to impose restric-tions across estimating equations that are structurally linkedThe most agnostic approach would be to estimate both equa-tions completely nonparametrically but this approach is rarelypractical Nonparametric estimators are consistent under themost general conditions but in conventional size samples thestandard errors would be so large that the estimates would forpractical purposes be useless Researchers impose structurebecause they are willing to restrict the class of models underwhich their estimators are consistent in exchange for greaterprecision These ef ciency considerations would also apply inthe deciding whether or not to impose cross-equation restric-tions in estimating simultaneous-equations models If we areright about the structure there is an ef ciency gain fromimposing the restrictions If we are wrong imposing them maylead to bias When there are overidentifying restrictions it isat least possible to develop tests of the model speci cation thatcan serve as a guide in choosing a structural model that is sup-ported by the data This article presents a dichotomy betweenthe two-part model and a fully parametric traditional selec-tion speci cation but there are many other modeling choicesranging from fully nonparametric to semiparametric to fullyparametric In modest size samples ef ciency is an impor-tant consideration and a parametric model may be the mostappropriate one For very large samples fewer parametricapproaches are feasible Both the probability-of-working andthe hours-worked equations could for example be estimatedby a semiparametric method such as semiparametric leastsquares ( Ichimura 1993) Which estimation method is mostappropriate ultimately depends on how much data is avail-able and how many parameters are to be estimated It doesnot make sense to criticize the use of structured approaches infavor of more exible modeling approaches without regard tothe context in which they are being applied

3 ON THE GAP BETWEEN THEORY ANDEMPIRICAL PRACTICE

Finally another recurring emphasis in the article is on thevalue of shortcuts in empirical practice In the authorrsquos viewthere is a gap between the econometric theory and what isfeasible in practice and shortcuts and approximations help tobridge this gap To this end the article extols for examplethe virtues of ordinary least squares (OLS) as a way of approx-imating any unknown conditional expectation function with-out much concern as to whether a best linear approximation isa good approximation If the model has discrete regressors andis fully saturated it is well known that OLS is equivalent to anonparametric estimator But in other cases a linear approxi-mation could be highly biased and the bias will not go awayas the sample size gets large There are a variety of alterna-tive asymptotically unbiased estimators Along similar lines

Angrist Reply 27

the article suggests that in implementing a ldquoselection-on-observablesrdquo estimator it may be all right simply to ignore theweighting scheme Such claims seem to me fairly outrageousbecause failure to take into account weighting could lead tovery different estimates and would not provide a consistentestimator of the parameter of interest Heckman Ichimura andTodd (1998) discussed alternative estimators that are not dif -cult to implement and take into account the weighting scheme

Finally the appendix presents a variety of shortcuts forobtaining estimates and standard errorsmdashsome of which leadto inconsistent estimates When shortcuts lead to very crudeapproximations or inconsistent estimates I question their use-fulness I do think it unreasonable to expect that estimatesbased on simple IV estimators be accompanied by correctstandard errors that correctly take into account the in uence ofestimated parameters of estimated regressors I see little valuein reporting incorrect standard-errors estimates on the groundsthat it saves time in computation especially when there areseveral works long available in the literature that develop sim-ple ways of solving these kinds of problems (eg see Newey1984) Although there is surely a gap between the estimatorsdeveloped in the theoretic econometric literature and the esti-mators used in the empirical literature the gap is not as greatas this article supposes

ADDITIONAL REFERENCES

Cox D R (1958) The Planning of Experiments New York WileyFisher R A (1935) Design of Experiments London Oliver and BoydHeckman J J (1997) ldquo Instrumental Variables Evidence From Evaluating a

Job Training Programmerdquo Journal of Human Resources 32 441ndash462Heckman J Ichimura H and Todd P (1997) ldquoMatching as an Econometric

Evaluation Estimator Theory and Evidence on its Performance Appliedto the JTPA Program Part I Theory and Methodsrdquo Review of EconomicStudies 64 605ndash654

mdashmdashmdash (1998) ldquoMatching as an Econometric Evaluationrdquo Review of Eco-nomic Studies 65 261ndash294

Heckman J and Vytlacil E (2000) ldquoCausal Parameters Structural Equa-tions Treatment Effects and Randomized Evaluations of Social Programsrdquounpublished manuscript University of Chicago Dept of Economics

mdashmdashmdash (in press) ldquoLocal Instrumental Variablesrdquo in Nonlinear StatisticalInference Essays in Honor of Takeshi Amemiya eds C Hsiao KMorimune and J Powell Cambridge UK Cambridge University Press

Ichimura H (1993) ldquoSemiparametric Least Squares (SLS) and Weighted SLSEstimation of Single Index Modelsrdquo Journal of Econometrics 58 71ndash120

Neyman J (1923) ldquoStatistical Problems in Agricultural Experimentsrdquo Sup-plement to the Journal of the Royal Statistical Society 2 107ndash180

Newey W (1984) ldquoA Method of Moments Interpretation of Sequential Esti-matorsrdquo Economic Letters 14 201ndash206

Roy A (1951) ldquoSome Thoughts on the Distribution of Earningsrdquo OxfordEconomic Papers 3 135ndash146

Rubin D (1978) ldquoBaysian Inference for Causal Effects The Role of Ran-domizationrdquo The Annals of Statistics 6 34ndash58

ReplyJoshua D Angrist

I thank the discussants and the session organizer outgoingJBES editor Jeff Wooldridge who will be missed by associateeditors and JBES authors alike The comments in this sessionoffer a variety of viewpoints some sympathetic others morecritical I am pleased to be part of this stimulating and con-structive exchange and look forward to more interaction ofthis type in the future

Hahn My article starts by suggesting that a focus ontreatment effects simpli es the problem of causal inferencein limited dependent variable (LDV) models with endoge-nous regressors Endogeneityrsquos half-sister is the assumptionof unobserved individual effects since the omitted-variablesmotivation for xed effects in panel data is similar Hahnasks whether a shift in focus from index parameters to aver-age causal effects might also simplify the notoriously dif -cult problem of working with nonlinear models for binarypanel data (eg as in Card and Sullivan 1988) Hahn showsthat here too the causal-effects framework pays off demon-strating that average causal effects are easily estimated in a xed-effects probit model with a binary regressor even thoughthe probit coef cient is not A second noteworthy featureof Hahnrsquos comment is his observation that identi cation inthis framework turns on careful modeling of the relationshipbetween the assignment variable and the xed effects Tradi-tional econometric models are primarily concerned with thestochastic process generating outcomes A shift of attentiontoward modeling the assignment mechanism is consistent with

the notion that observational studies should try to mimic thesort of controlled comparisons generated by experiments

Imbens Imbens begins by endorsing the view that the sub-stantive goals of empirical research are better served by thepotential outcomescausal framework than by structural mod-eling Going one step further Imbens notes that the problemswith ldquomodel- rst econometricsrdquo are not limited to LDVrsquos Inthe same spirit he sensibly wonders (as did Wooldridge 1992)why so much attention has been devoted to estimating theBoxndashCox transformation model which does not actually iden-tify E6Y mdashX7 without a distributional assumption Imbens alsonicely articulates the rationale for looking at the twins exam-ple and notes that this is similar to the rationale for extrap-olating from clinical trials in narrow subpopulations And infact I believe the ldquotwins experimentrdquo is likely to have pre-dictive value in a variety of situations though any extrapo-lation will probably be improved by control for demographiccharacteristics

In spite of (or perhaps because of) our years of discussionand collaboration Imbens and I still do not agree on certainthings He feels strongly that inference with discrete outcomesshould use models and procedures that respect inherent non-

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

28 Journal of Business amp Economic Statistics January 2001

linearities The HIRZ procedure discussed in his comment isan intriguing way to do this and my article discusses othersI remain unconvinced however that nonlinearity is of majorsubstantive importance in empirical work with binary regres-sors The proof of the pudding I suppose would be to showthat H IRZ or some of the methods I discuss in the article leadto better predictions or decisions than the linear workhorse

Mof tt Mof t provides a valuable counterpointHe beginsby arguing that latent-index models have more predictivepower than reduced-form causal models Such claims seemhard to establish a priori though I do not doubt that in somecases this may be true But empirical strategies in which thebulk of the research effort goes into estimating index coef -cients may fail to answer the most basic question in empiricalwork What happened here Causal models and local aver-age treatment effect (LATE) answer this question while indexcoef cients alone are not suf cient for causal inference Iagree however that structural modeling can be useful forgoing beyond basic questions and I have used index and otherstructural models for this purpose Mof t is right howeverin suggesting that I nd the model-free analysis he refers toas ldquoexploratoryrdquo more interesting than structural extrapolationThis seems to me to be where the potential for real discoverylies

Mof tt asks tough but fair questions about the twins appli-cation He is correct to point out that twinning is not a manip-ulable policy instrument That is one reason I do not limit theanalysis to reduced-form effects of multiple births Yet to mymind the best evidence we have on the controversial questionof the consequences of out-of-wedlock childbearing comesfrom the twins experiment (Bronars and Grogger 1994) Andthe economic consequences of childbearing are of interest inmany other contexts As noted in the introduction to my arti-cle the twins experiment can help determine whether fertilityreductions are responsible for trend increases in female laborsupply Beyond this sort of question the twins results seemuseful for assessing the economic consequences of increasedabortion access in the United States the dissemination of con-traceptive implants in developing countries (eg see KaneFarr and Janowitz 1990) and various programs to encour-age or require implant use by welfare mothers (Haveman andWolfe 1993)

Finally it is fair to question the external validity of twinsestimates as Mof t does toward the end of his commentAngrist and Evans (1998) showed that the labor-supply con-sequences of twinning can be reconciled with estimates ofthe labor-supply consequences of childbearing resulting froma very different natural experiment induced by preferences formixed-sex sibling pairs This suggests that results from bothexperiments may be quite robust and may even be ldquostructuralrdquoin the sense that they tell us something general about the eco-nomic response to childbearing

Mullahy Mullahy brings a welcome health economistrsquosperspective to the discussion Health economists have longgrappled with the issues discussed in my article and asI hope I made clear I drew ideas and inspiration fromMullahyrsquos work on models with nonnegative dependent vari-ables Mullahyrsquos comment further highlights exactly the sort

of utilitarian trade-off that underlies the choice of empiricalstrategy On one hand we would like our models to be con-sistent with any functional form restrictions known to be trueOn the other attempts at this may complicate estimation andinference for little payoff Worst of all the technical complex-ity of nonlinear models seems to cause authors to ldquowax econo-metricrdquo an effort that may come at the expense of attention tosubstantive issues of importance (an example here is the longseries of papers discussing exactly how data from the RANDhealth-insurance experiment should be analyzed) Mullahyrsquosheretical discussions of ldquoone-part modelsrdquo here and in earlierwork represent a refreshing effort to keep the empirical trainon track Mullahyrsquos point that application-specic distributionordinates can provide more useful summary information thanquantiles is also clearly worth thinking about

Todd Todd provides a useful and clear overview of thecausal framework which she suggests is better suited formedical questions than economic questions The thrust of herargument is that there is too much heterogeneity in economicinterventions and outcomes for the causal approach to bevery useful for economists But my perception is that medicalresearch is even more explicitly concerned with heterogene-ity than economics This is evidenced by the many quali ca-tions regarding patient characteristics conditions and inter-actions in medical treatment protocols A difference betweenthe prevailing medical research paradigm and the traditionaleconometric perspective however is whether the question ofheterogeneous response is to be addressed by better theoryor more evidence Of course theory helps us decide whereto look for heterogeneity but as Rosenbaum (1999 p 301)put it ldquo If a treatment has substantially different effects in dif-ferent types of people then we need to discover this (ital-ics mine) Rosenbaum also noted that ldquorandomized clinicaltrials routinely look for heterogeneous treatment effects andthey do this without representative samples from nationalpopulationsrdquo

On the technical side Todd suggests I advocated use ofthe two-part model because it exibly approximates nonlinearfunctional forms for nonnegative outcomes Although I notedthe exibility the upshot of my discussion was that the two-part model is poorly suited to causal inference because of thedif culty of interpreting part 2 Finally Todd points out thefailure to allow for sampling variance in the scaling factorsused to generate effects for nonlinear models This is indeed ahard-to-defend shortcut though I shall take a stab at defenseby noting that randomness of the scaling factor which typi-cally estimates a sample average is likely to be a minor sourceof sampling uncertainty

ADDITIONAL REFERENCES

Card D E and Sullivan D (1988) ldquoEffects of Subsidized Training onMovements In and Out of Employmentrdquo Econometrica 56 497ndash530

Haveman R and Wolfe B (1993) ldquoChildrenrsquos Prospects and ChildrenrsquosPolicyrdquo Journal of Economic Perspectives 7 153ndash174

Kane T Farr G and Janowitz B (1990) ldquo Initial Acceptability of Con-traceptive Implants in Four Developing Countriesrdquo International FamilyPlanning Perspectives 16 49ndash54

Rosenbaum P R (1999) ldquoChoice as an Alternative to Control in Observa-tional Studiesrdquo Statistical Science 14 259ndash304

Wooldridge J (1992) ldquoSome Alternatives to the BoxndashCox RegressionModelrdquo International Economic Review 33 935ndash955

Page 4: New Estimation of Limited Dependent Variable Models With Dummy …piketty.pse.ens.fr/…ecoineg/articl/Angrist2001b.pdf · 2011. 12. 5. · sequences of changes in contraceptive technology

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 5

3 Suppose there is a covariate denoted Zi with coef cientƒ1 in (3a) which is excluded from (3b) and (3c) Without lossof generality assume ƒ1 gt 0 Then the local average treatmenteffect (LATE) given by E6Y1i

ƒ Y0imdashX 0

iƒ C ƒ1 gt Daggeri gt X 0iƒ7 is

identi ed ( Imbens and Angrist 1994)4 Suppose that LATE is identi ed as in 3 and that P6Di

D1mdashZi

D 07 D 0 Then LATE equals the effect of treatment on thetreated E6Y1i

ƒY0imdashDi

D 17 Similarly if P6DiD 1mdashZi

D 17 D 1LATE equals E6Y1i

ƒ Y0imdashDi

D 07 (Angrist and Imbens 1991)

This set of results can easily be summarized using non-technical language First without additional assumptions theeffect of treatment on the treated is not identi ed in latent-index models Second the three positive identi cation resultsin the lemma for exogeneous treatment the LATE resultand the specialization of LATE to effects on the treated ornontreated require no information about the structural modelother than the distribution of D and Z On the other hand theLATE result for endogenous treatments in part 3 does not gen-erally refer to the effect of treatment on the treated so herethe role played by the structural model in identifying causaleffects merits further discussion

The treatment effect captures the effect of treatment on thetreated for those whose treatment status is changed by theinstrument Zi The data are informative about the effect oftreatment on these people because the instrument changes theirbehavior Thus an exclusion restriction is enough to identifycausal effects for a group directly affected by the ldquoexperi-mentrdquo at hand (Angrist Imbens and Rubin 1996 called thesepeople compliers) In some cases this is the set of all treatedindividuals while in other cases this is only a subset In anycase however this result provides a foundation for crediblecausal inference because the assumptions needed for this nar-row ldquoidenti cation-in-principlerdquo can be separated from mod-eling assumptions required for smoothing and extrapolation toother groups of interest The quality of this extrapolation is ofcourse an open question and undoubtedly differs across appli-cations In my experience however often estimates of LATEdiffer little from estimates based on the stronger assumptionsinvoked to identify effects on the entire treated population (Ofcourse I have to admit that when simple and sophisticatedestimation strategies do differ I invariably prefer the simpleFor an illustration of why this is see the example that fol-lows)

Although extrapolation is an important part of what empir-ical researchers do parameters like LATE and the effectof treatment on the treated provide a minimum-controversyjumping-off point for inference and prediction Structuralparameters are sometimes more closely linked to economictheory than average causal effects But the ultimate goal oftheory-motivated structural estimation seems to differ littlefrom the causal agenda For example Keane and Wolpin(1997 p 111) used structural models to ldquoforecast the behaviorof agents given any change in the state of the world that canbe characterized as a change in their constraintsrdquo A prereq-uisite for this is credible assessment of causal effects of pastchanges Structural parameters that are not linked to causaleffects are not useful for this basic purpose [Mullahy (1998)made a similar point] The rest of my discussion is therefore

limited to models in which the effects of interest are de neddirectly in terms of potential outcomes

2 CAUSAL EFFECTS ON LDVrsquos

21 Average Effects in Experimental Data

Does the fact that a dependent variable is binary or non-negative have any implications for empirical causal analysisA useful starting point for this discussion is the analysis ofrandomized experiments since some of the issues raised bythe presence of LDVrsquos have nothing to do with endogeneitySuppose that Di was randomly assigned or at least assignedby some mechanism that ensures independence between Di

and Y0i In this case a simple difference in means betweenthose with Di

D 1 and with DiD 0 identi es the effect of treat-

ment on the treated

E6YimdashDi

D 17 ƒ E6YimdashDi

D 07

D E6Y1imdashDi

D 17ƒ E6Y0imdashDi

D 07

D E6Y1imdashDi

D 17ƒ E6Y0imdashDi

D 17

4by independence of Y0i and Di5

D E6Y1iƒ Y0i

mdashDiD 17

4by linearity of conditional means50 (8)

If Di is also independent of Y1i as would be likely in an exper-iment then E6Y1i

ƒ Y0imdashDi

D 17 D E6Y1iƒ Y0i7 the uncondi-

tional average treatment effect (Usually this ldquounconditionalrdquoaverage still refers to a subpopulation eligible to participate inthe experiment)

Equation (8) shows that the estimation of causal effectsin experiments presents no special challenges whether Yi isbinary nonnegative or continuously distributed If Yi is binarythen the difference in means on the left side of (8) cor-responds to a difference in probabilities while if Yi has amass point at 0 the difference in means is the difference inE6Yi

mdashYi gt 01Di7P6Yi gt 0mdashDi7 But these facts have no bearingon the causal interpretation of estimates or in the absence offurther assumptions or restrictions the choice of estimators

22 Conditional-on-Positive Effects

In many studies with nonnegative dependent variablesresearchers are interested in effects in a subset of the popula-tion with positive outcomes Interest in conditional-on-positiveeffects is sometimes motivated by the following decompo-sition of differences in means discussed by McDonald andMof t (1980)

E6YimdashDi

D 17ƒ E6YimdashDi

D 07

D 8P6Yi gt 0mdashDiD 17ƒ P6Y gt 0mdashDi

D 079

E6YimdashYi gt 01Di

D 17

C 8E6YimdashYi gt 01Di

D 17ƒ E6YimdashYi gt 01 Di

D 079

P6Y gt 0mdashDiD 070 (9)

6 Journal of Business amp Economic Statistics January 2001

This decomposition describes how much of the overalltreatment-control difference is due to participation effects (iethe impact on 16Yi gt 07) and how much is due to an increasein intensity for those with Yi gt 0 For a recent example ofthis distinction see Evans Farrelly and Montgomery (1999)who analyzed the impact of workplace smoking restrictionson smoking participation and intensity

In an experimental setting the interpretation of the rst partof (9) as giving the causal effect of treatment on participationis straightforward Does the conditional-on-positive differencein the second part also have a straightforward interpretationThe large literature constrasting two-part and sample-selectionmodels for LDVrsquos suggests not (See for example DuanManning Morris and Newhouse 1984 Hay and Olsen 1984Hay Leu and Roher 1987 Leung and Yu 1996 Maddala1985 Manning Duan and Rogers 1987 Mullahy 1998)

To analyze the conditional-on-positive comparison furtherit is useful to write the mean difference by treatment status asfollows (still assuming Y0i and Di are independent)

E6YimdashYi gt 01Di

D 17ƒ E6YimdashYi gt 01 Di

D 07

D E6Y1imdashY1i gt 01Di

D 17ƒ E6Y0imdashY0i gt 01Di

D 07

D E6Y1imdashY1i gt 01Di

D 17ƒ E6Y0imdashY0i gt 01Di

D 17

D E6Y1iƒ Y0i

mdashY1i gt 01DiD 17 (10a)

C 8E6Y0imdashY1i gt 01Di

D 17ƒ E6Y0i gt 01DiD 1790

(10b)

On one hand (10a) suggests that the conditional contrastestimates a potentially interesting effect since this clearlyamounts to a statement about the impact of treatment on thedistribution of potential outcomes (in fact this is somethinglike a comparison of hazard rates) On the other hand from(10b) it is clear that a conditional-on-working comparisondoes not tell us how much of the overall treatment effect isdue to an increase in work among treated workers The prob-lem is that the conditional contrast involves different groups ofpeoplemdashthose with Y1i gt 0 and those with Y0i gt 0 Supposefor example that the treatment effect is a positive constantsay Y1i

D Y0iC Since the second term in (10b) must then be

negative the observed difference E6YimdashYi gt 07ƒ E6Yi

mdashYi gt 07is clearly less than the causal effect on treated workers whichis in the constant-effects model This is the selectivity-biasproblem rst noted by Gronau (1974) and Heckman (1974)

In principle tobit and sample-selection models can be usedto eliminate selectivity bias in conditional-on-positive com-parisons These models depict Yi as the censored observationof an underlying continuously distributed latent variable Sup-pose for example that

YiD 16Yi gt 07Y uuml

i 1 where

Y uumli

D Y uuml01

C 4Y uuml1i

ƒ Y uuml0i5Di

D Y uuml0i

C Di0 (11)

Recent studies with this type of censoring in a female labor-supply model are those of Blundell and Smith (1989) and Lee(1995) both of which include endogeneous regressors Notethat in this context the constant-effects causal model is appliedto the latent variable not the observed outcome

Under a variety of distributional assumptions [eg normal-ity as in Heckman (1974) or weaker assumptions like sym-metry as in Powell (1986a)] the parameter is identi edBut what is the interpretation of in a causal model Onepossible answer is that is the causal effect of Di on Y uuml

i 1

though Y uumli is not observed so this is not usually of intrinsic

interest However a direct calculation using (11) shows that

is also a conditional-on-positive causal effect (for details seethe appendix) In particular

D E6Y1iƒ Y0i

mdashY1i gt 07 if lt 0 (12a)

and

D E6Y1iƒ Y0i

mdashY1i gt 07 if gt 00 (12b)

Thus the censored-regression model does succeed in sepa-rating causal effects from selection effects in conditional-on-positive comparisons [We do not know a priori whether isthe effect in (12a) or (12b) but it seems reasonable to use thesign of the estimated to decide]

Although (11) provides an elegant resolution of theselection-bias dilemma in practice I nd the use of censored-regression models to accomplish this unattractive One prob-lem is conceptual Although a censored-regression modelseems natural for arti cially censored data (eg topcodedvariables in the Current Population Survey) the notion of alatent labor-supply equation that can take on negative values isless clear cut The censoring in this case comes about becausesome people choose to work zero hours and not because ofmeasurement problems [Maddala (1985) made a similar com-ment regarding Tobinrsquos original application] Here an under-lying structural model seems essential to the interpretationof empirical results For example in the labor-supply modelfrom Section 1 the censored latent variable is the differencebetween unobserved offered wages and marginal rates of sub-stitution But even assuming the effect of regressors on thisdifference is of interest the latent index coef cients alone haveno predictive value for observable quantities

Second even if we adopt a theoretical framework thatmakes the latent structure meaningful identi cation of acensored-regression model requires assumption beyond thoseneeded for indenti cation of unconditional causal effects ofthe type described by (8) Semiparametric estimators that donot rely on distributional assumptions fail here because theregressor is discrete and there are no exclusion restrictions onthe selection equation (see Chamberlain 1986) Moreover inaddition to requiring distributional assumptions identi cationof in (12) turns heavily on the additive constant-effectsmodel in (11) This is because E6Y1i

ƒY0imdashY1i gt 07 involves the

joint distribution of Y1 and Y0 The constant-effects assump-tion nails this joint distribution down but the data containinformation on marginal distributions only (which is why evenrandomized trials fail to answer the causal question that moti-vates samples-selection models)

These concerns echoed by many applied researchers stim-ulated investigation of alternative strategies for the analysisof nonnegative outcomes (eg see Mof trsquos 1999 survey)

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 7

The two-part model (2PM) introduced by Cragg (1971) andwidely used in health economics seems to provide a lessdemanding framework for the analysis of LDVrsquos than sample-selection models The two parts of the 2PM are P6Yi gt 0mdashDi7

and E6YimdashYi gt 01 Di7 Researchers using this model simply

pick a functional form for each part For example probit or alinear probability model might be used for the rst part anda linear or log-linear model might be used for the secondpart Log-linearity for the second part may be desirable sincethis imposed nonnegativity of tted values (eg see Mullahy1998)

One attraction of the 2PM is that it ts a nonlinear func-tional form to the conditional expectation function (CEF) forLDVrsquos even if both parts are linear On the other hand tobit-type sample-selection models t a nonlinear CEF as well pro-vided there are covariates other than Di For example the CEFimplied by (11) with latent-index Y uuml

0iD X

0

i Œ C Di C ˜i and aNormal homoscedastic error is

E6YimdashXi1Di7 D ecirc64X

0

iΠC Di5=lsquo76X0

i ΠC Di7

C lsquo64X0

i ΠC Di5=lsquo71 (13)

where 4iexcl5 and ecirc4iexcl5 are the standard Normal density anddistribution functions and lsquo is the standard deviation of ˜i

(eg see McDonald and Mof t 1980) The derivative of thisCEF with respect to Di is ecirc64X

0

iΠC Di5=lsquo7 The nonlinearity of (13) notwithstanding at rst blush the

2PM seems to provide a more exible nonlinear speci cationthan tobit or other sample-selection models The latter imposesrestrictions tied to the latent-index structure while the twoparts of the 2PM can be speci ed in whatever form seemsconvenient and ts the data well (a point made by Lin andSchmidt 1984) A signal feature of the 2PM however andthe main point of contrast with sample-selection models isthat the 2PM does not attempt to solve the sample-selectionproblem in (10b) Thus the second part of the 2PM does nothave a clear-cut causal interpretation even if Di is randomlyassigned Similarly and of particular relevance here is thepoint that instrumental variables that are valid for estimatingthe effect of Di on Yi are not valid for estimating the effect ofDi on Yi conditional on Yi gt 0 The 2PM therefore seems illsuited for causal inference

23 Effects on Distributions

Conditional-on-positive effects and sample-selection modelsare sometimes motivated by interest in the consequence of Di

beyond the impact on average outcomes [eg see EichnerMcClellan and Wisersquos (1997) analysis of insurance effects onhealth expenditure] Are there schemes for estimating effectson distributions that are less demanding than sample-selectionmodels Once the basic problem of identifying causal effectsis resolved the impact of Di on the distribution of outcomesis identi ed and can be easily estimated

To see this for the experimental (exogenous Di) case notethat given the assumed independence of Di of Y0i the follow-

ing relationship holds for any point c in the support of Yi

E614Yi micro c5mdashDiD 17 ƒ E614Yi micro c5mdashDi

D 07

D P6Y1i micro cmdashDiD 17 ƒ P6Y0i micro cmdashDi

D 170

In fact the entire marginal distributions of Y1i and Y0i are iden-ti ed for those with Di

D 1 So it is easy to check whether Di

has an impact on the probability that YiD 0 as in the rst part

of the 2PM or whether there is a change in the distribution ofoutcomes at any positive value This information is enough tomake social-welfare comparisons as long as the comparisonsof interest involve marginal distributions only

24 Covariates and Nonlinearity

The conditional expectation of Yi given Di is inherentlylinear as are other conditional relationships involving Di

alone Suppose however that identi cation is based on aldquoselection-on-observablesrdquo assumption instead of presumedrandom assignment This means that causal inference is basedon the presumption that Y0i

q DimdashXi1 and causal effects must

be estimated after conditioning on Xi For example the effectof treatment on the treated can be expressed as

E6Y1iƒ Y0i

mdashDiD 17

D E8E6Y1imdashXi1 Di

D 17 ƒ E6Y0imdashXi1Di

D 17mdashDiD 19

DZ

8E6Y1imdashXi1Di

D 17ƒ E6Y0imdashXi1Di

D 079

P4XiD xmdashD D 15dx1 (14)

where P4X D xmdashD D 15 is a distribution of density Estima-tion using the sample analog of (14) is straightforward if Xi

has discrete support with many observations per cell (eg seeAngrist 1998) Otherwise some sort of smoothing (model-ing) is required to estimate the CEFrsquos E6Y1i

mdashXi1 DiD 17 and

E6Y0imdashXi1 Di

D 07Regression provides a exible and computationally attrac-

tive smoothing device A conceptual justi cation for regres-sion smoothing is that population regression coef cientsprovide the best (minimum mean squared error) linear approx-imation to E6Yi

mdashXi1Di7 (eg see Goldberger 1991) Thisldquoapproximation propertyrdquo holds regardless of the distributionof Yi

Separate regressions can be used to approximateE6Y1i

mdashXi1DiD 17 and E6Y0i

mdashXi1DiD 17 though this leaves

the problem of estimating P4XiD xmdashD D 15 to compute the

average difference in CEFrsquos using (14) On the other handa simple additive modelmdashsay E6Yi

mdashXi1 Di7 D X0

i sbquorC r Dimdash

sometimes works well in the sense that r mdashthe ldquoregressionestimandrdquomdashis close to average effects derived from modelsthat allow for nonlinearity and interactions between Di andXi With discrete covariates and a saturated model for Xi theadditive model can be thought of as implicitly producing aweighted average of covariate-speci c contrasts Although theregression weighting scheme differs from that in (14) in prac-tice treatment-effect heterogeneity may be limited enough thatalternative weighting schemes have little impact on the over-all estimate (see Angrist and Krueger 1999 for more on thispoint)

8 Journal of Business amp Economic Statistics January 2001

3 ENDOGENOUS REGRESSORSTRADITIONAL SOLUTIONS

LDV models with endogenous regressors were rst esti-mated using distributional assumptions and maximum likeli-hood (ML) An early and in uential work in this mold isthat of Heckman (1978) This approach is not wedded toML Heckman (1978) Amemiya (1978 1979) Newey (19861987) and Blundell and Smith (1989) discussed two-stepprocedures minimum-distance estimators generalized leastsquares (GLS) estimators and other variations on the tradi-tional ML framework Semiparametric estimators based onweaker distributional assumptions were discussed by amongothers Newey (1985) and Lee (1996)

The basic idea behind these strategies can be described asfollows Take the censored-regression model from (11) andadd a latent rst stage with instrumental variables Zi So thecomplete model is

YiD 16Yi gt 07Y uuml

i

Y uumli

D Y uuml0i

C 4Y1iƒ Y uuml

0i5DiD Y uuml

0iC Di D Œ C Di C ˜i

DiD 16ƒ0 C ƒ1Zi gt Daggeri70 (15)

The principal identifying assumption here is

4Y0i1 Daggeri5 q Zi0 (16)

Parametric schemes use two-step estimators or ML to esti-mate Semiparametric estimators typically work by substi-tuting an estimated conditional expectation sup1E6Di

mdashZi7 for Di

and then using a nonparametric or semiparametric procedureto estimate the pseueo reduced-form [eg Manskirsquos (1975)maximum score estimator for binary outcomes]

The problems I see with this approach are the same as listedfor censored regression models with exogenous regressorsFirst latent index coef cients are not causal effects If the out-come is binary semiparametric methods estimate scaled indexcoef cients and not average causal effects Similarly censoredregression parameters alone are not enough to determine thecausal effect of Di on the observed Yi [ I should note that thiscriticism does not apply to parametric estimators where dis-tributional assumptions can be used to recover causal effectsor to a recently developed semiparametric method by Blundelland Powell (1999) for continuous endogenous variables] Sec-ond this approach turns heavily on the latent-index setup Wecan add to these points the fact that even weak distributionalassumptions like conditional symmetry fail for the reduced-form error term 4Di

ƒ sup1E6DimdashZi75 ƒ ˜i since Di is binary (a

point made by Lee 1996)A nal observation is that given assumption (16) this

whole setup is unnecessary for causal inference The effectof treatment on the observed Yi is identi ed for those womenwhose childbearing behavior is affected by the instrument Thetwins instrument for example identi es the effect of Di onmothers who would not have had a third child without a mul-tiple second birth (see result 4 in the earlier lemma) It maybe of interest to extrapolate from this grouprsquos experiences tothose of other women but the extrapolation problem is distinctfrom the problem of identifying the causal effect of childbear-ing in the ldquotwins experimentrdquo

4 NEW ECONOMETRIC METHODS

Conditional moments and other probability statementsinvolving Di alone are necessarily linear but causal relation-ships involving covariates are likely to be nonlinear unless thecovariates are discrete and the model is saturated LDV mod-els like probit and tobit are often used because of a concernthat unless the model is saturated LDVrsquos lead to nonlinearCEFrsquos The 2PM is sometimes also motivated this way (egsee Duan et al 1984)

The headaches induced by nonlinearity notwithstandingthere are simple schemes for estimating causal effects in LDVmodels with endogenous regressors and covariates In thissection I discuss three strategies for estimating effects onmeans and two for estimating effects on distributions All butthe rst are based on new models and methods None are tiedto an underlying structural model

The simplest option for estimating effects on means isundoubtedly to ldquopuntrdquo by using a linear constant-effectsmodel to describe the relationship of interest

E6Y0imdashXi7 D X 0

isbquo (17a)

and

Y1iD Y0i

C 0 (17b)

The assumptions lead a linear causal model

YiD X 0

isbquo C DiC ˜i1 (18)

easily estimated by 2SLSAlthough the constant-effects assumption is clearly unreal-

istic in practice more general estimation strategies often leadto similar average effects A second issue that arises in thissetting is that because Di is binary a nonlinear rst-stage suchas probit or logit may seem appropriate for 2SLS estimationof (18) But the resulting second-stage estimates are inconsis-tent unless the model for the rst-stage CEF is actually cor-rect On the other hand conventional 2SLS estimates usinga linear probability model are consistent whether or not the rst-stage CEF is linear So it is generally safer to use a linear rst-stage Alternatively consistent estimates can be obtainedby using a linear or nonlinear estimate of E6Di

mdashXi1 Zi7 asan instrument (This is the same as the plug-in- tted-valuesmethod when the rst-stage is linear) See Kelejian (1971) orHeckman (1978 pp 946ndash947) for a discussion of this pointand additional references ( It is also worth noting that a probit rst-stage cannot even be estimated for the twins instrumentbecause P6Di

D 1mdashXi1 ZiD 17 D 1 for twins)

41 IV for an Exponential Conditional Mean

A linear model like (18) is obviously unrealistic for binaryoutcomes and fails to incorporate natural restrictions on theCEF for nonnegative LDVrsquos This motivated Mullahy (1997)to estimate causal effects on nonnegative LDVrsquos using a multi-plicative model similar to that used by Wooldridge (1999) forpanel data The Mullahy (1997) model can be written in mynotation as follows Let Xi be a vector of observed covariates

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 9

as before and let mdashi be an unobserved covariate correlatedwith Di and Y0i The fact that this covariate is unobserved isthe reason we need to instrument

Let Zi be a candidate instrument Conditional on observedand unobserved covariates both treatment status and theinstrument are assumed to be independent of potential out-comes

Y0iq 4Di1Zi5 mdash Xi1mdashi (19a)

Moreover conditional on observed covariates the candidateinstrument is independent of the unobserved covariate

Ziq mdashi

mdash Xi1 (19b)

though mdashi and Di are not conditionally independent The CEFfor Y0i is constrained to be nonnegative using an exponentialmodel and (19a)

E6Y0imdashDi1 Zi1 Xi1mdashi7 D exp4X 0

isbquo C mdashi5

D mdashuumli exp4X 0

isbquo51 (19c)

where we also assume that the unobservable covariate hasbeen de ned so that E6mdash uuml

imdashXi7 D 1 (this is a normalization

because we can de ne mdash D ƒ16ldquo ƒ ln4E6eldquo mdashX757 where ldquo isunrestricted)

Finally the conditional-on-X-and-mdash average treatmenteffect is assumed to be proportionally constant again using anexponential model that ensures nonnegative tted values

E6Y1imdashDi1Zi1Xi1mdashi7 D e E6Y0i

mdashDi1Zi1Xi1mdashi7

D e E6Y0imdashXi1 mdashi70 (19d)

Combining (19c) and (19d) we can write YiD exp4X 0

isbquo CDi

C mdashi5 C ˜i where E6˜imdashDi1 Zi1 Xi1 mdashi7 sup2 0 These

assumptions imply

E8exp4ƒX 0isbquo ƒ Di5Yi

ƒ 1mdashXi1Zi9 D 01 (20)

so (20) can be used for estimation provided Zi has an impacton Di The proportional average treatment effect in this modelis e ƒ 1 or approximately for small values of

Estimation based on (20) guarantees nonnegative ttedvalues without dropping zeros as a traditional log-linearregression model would The price for this is a constant-proportional-effects setup and the need for nonlinear estima-tion It is interesting to note however that with a binaryinstrument and no covariates (20) generates a simple closed-form solution for In the appendix I show that with-out covariates the proportional treatment effect in Mullahyrsquosmodel can be written

e ƒ1D E6YimdashZi

D17ƒE6YimdashZi

D07

ƒ8E641ƒDi5YimdashZi

D17ƒE641ƒDi5YimdashZi

D0790

(21)

In light of this simpli cation it seems worth asking if theright side of (21) has an interpretation that is not tied tothe constant-proportional-effects model To develop this inter-pretation let D0i and D1i denote potential treatment assign-ments indexed against the binary instrument For example an

assignment mechanism such as (15) determines D0i as fol-lows D0i

D 16ƒ0 gt Daggeri7 and D1iD 16ƒ0 C ƒ1 gt Daggeri7 Using this

notation Imbens and Angrist (1994) showed that

E6YimdashZi

D 17 ƒ E6YimdashZi

D 07 D E6Y1iƒ Y0i

mdashD1i gt D0i7

8E6DimdashZi

D 17ƒ E6DimdashZi

D 0790 (22)

The term E6Y1iƒ Y0i

mdashD1i gt D0i7 is the LATE parameter men-tioned in Lemma 1

The same argument used to establish (22) can also be usedto show a similar result for the average of Y0i [ie insteadof the average of Y1i

ƒ Y0i see Abadie (2000a) for details] Inparticular

E641 ƒ Di5YimdashZi

D 17 ƒ E641 ƒ Di5YimdashZi

D 07

D ƒE6Y0imdashD1i gt D0i7

8E6DimdashZi

D 17 ƒ E6DimdashZi

D 0790 (23)

Substituting (22) and (23) for the numerator and denominatorin (21) we have

e ƒ 1 D E6Y1iƒ Y0i

mdashD1i gt D0i7=E6Y0imdashD1i gt D0i70 (24)

Thus Mullahyrsquos procedure estimates a proportional LATEparameter in models with no covariates The resulting esti-mates therefore have a causal interpretation under muchweaker assumptions than (19a)ndash(19d) Moreover the exponen-tial model used for covariates in (19c) seems natural for non-negative dependent variables and has a semiparametric avorsimilar to proportional hazard models for duration data

42 Approximating Causal Models

Suppose that the additive constant-effects assump-tions (17a) and (17b) do not hold but we estimate (18) by2SLS anyway It seems reasonable to imagine that the result-ing 2SLS estimates can be interpreted as providing some sortof ldquobest linear approximationrdquo to an underlying nonlinearcausal relationship just as regression provides the best linearpredictor (BLP) for any CEF Perhaps surprisingly however2SLS does not provide this sort of linear approximation ingeneral On the other hand in a recent article Abadie (2000b)introduced a Causal-IV estimator that does have this property

Causal-IV is based on the assumptions used by Imbens andAngrist (1994) to estimate average treatment effects Underthese assumptions it can be shown that treatment is indepen-dent of potential outcomes conditional on being in the groupwhose treatment status is affected by the instrument (ie thosewith D1i gt D0i the group of ldquocompliersrdquo mentioned earlier)This independence can be expressed as

Y0i1 Y1iq Di

mdashXi1D1i gt D0i0 (25)

A consequence of (25) is that for compliers comparisons bytreatment status have a causal interpretation

E6YimdashXi1Di

D 11D1i gt D0i7 ƒ E6YimdashXi1Di

D 01 D1i gt D0i7

D E6Y1iƒ Y0i

mdashXi1D1i gt D0i70

10 Journal of Business amp Economic Statistics January 2001

For this reason Abadie (2000b) called E6YimdashXi1 Di1D1i gt D0i7

the Complier Causal Response Function (CCRF)Now consider choosing parameters b and a to minimize

E64E6YimdashXi1Di1 D1i gt D0i7ƒX 0

i b ƒ aDi52mdashD1i gt D07 or equiv-

alently E64YiƒX 0

ib ƒ aDi52mdashD1i gt D07 This choice of b and a

provides the minimum mean squared error (MMSE) approx-imation to the CCRF Since the set of compliers is not iden-ti ed this minimization problem is not feasible as writtenHowever it can be shown that

E6Ši4E6YimdashXi1 Di1D1i gt D0i7 ƒ X 0

ib ƒ aDi527=P6D1i gt D0i7

D E64E6YimdashXi1Di1 D1i gt D0i7ƒ X 0

ib ƒ aDi52mdashD1i gt D071

where ŠiD 1 ƒ Di41 ƒ Zi5=41 ƒ E6Zi

mdashXi75 ƒ 41 ƒ Di5 šZi=E6Zi

mdashXi7 Since Ši can be estimated the MMSE linearapproximation to the CCRF can also be estimated The resultis a weighted least squares estimation problem with weightsgiven by the estimated Ši

It is also worth noting that although the preceding discus-sion focuses on linear approximation of the CCRF any func-tion can be used for the approximation For binary outcomesfor example we might use ecirc6X 0

ib C aDi7 and choose param-eters to minimize E6Ši4Yi

ƒ ecirc6X 0ib ƒ aDi75

27 Similarly fornonnegative outcomes it seems sensible to use an exponentialmodel exp6X 0

ib C aDi7 and choose parameters to minimizeE6Ši4Yi

ƒexp6Xi0b ƒaDi75

27 Abadiersquos framework allows ex-ible approximation of the CCRF using any functional formthe researcher nds appealing and convenient The resultingestimates have a robust causal interpretation regardless of theshape of the actual CEF for potential outcomes

43 Distribution and Quantile Treatment Effects

If Yi has a mass point at 0 the conditional mean pro-vides an incomplete picture of the causal impact of Di on Yi We might like to know for example how much of the aver-age effect is due to changes in participation and how muchinvolves changes elsewhere in the distribution This sometimesmotivates separate analyses of participation and conditional-on-positive effects In Section 23 I argued that questionsregarding the effect of treatment on the distribution of out-comes are better addressed by comparing distributions Simplycomparing distributions is ne for the analysis of experimentaldata but what if covariates are involved As with the analy-sis of mean outcomes the simplest strategy is 2SLS in thiscase using linear probability models for distribution ordinates16Yi micro c7 D Xi

0sbquocC cDi

C ˜ci Of course the linear model isnot literally correct for conditional distribution except in spe-cial cases (eg a saturated regression parameterization)

Here too the Abadie (2000b) weighting scheme can be usedto generate estimates that provide an MMSE error approxima-tion to the underlying distribution function (see Imbens andRubin 1997 for a related approach to this problem) The esti-mator in this case chooses bc and ac to minimize the sampleanalog of the following population minimand

E6Ši416Yi micro c7 ƒ X 0ibc

ƒ acDi5270 (26)

The resulting estimates provide the BLP for P6Yi microcmdashXi1Di1D1i gt D0i7 The latter quantity has a causal interpre-tation because

P6Yi micro cmdashXi1DiD 11D1i gt D0i7

ƒ P6Yi micro cmdashXi1 DiD 01 D1i gt D0i7

D P6Y1i micro cmdashXi1D1i gt D0i7

ƒ P6Y0i micro cmdashXi1D1i gt D0i70

Since the outcome is binary nonlinear models such asprobit or logit might also be used to approximate P6Yi microcmdashXi1Di1D1i gt D0i7 Likewise it is equally straightforward touse Abadiersquos weighting scheme to approximate the probabilitythat the outcome falls into an interval instead of the cumula-tive distribution function

An alternative to estimation based on (26) postulates a linearmodel for quantiles instead of distribution ordinates Conven-tional quantile regression (QR) models for exogenous regres-sors begin with a linear speci cation Qˆ6Yi

mdashXi1Di7 D X 0iŒˆ0

CŒˆ1Di The parameters 4Œˆ01Œˆ15 can be shown to minimizeE6ˆ4Yi

ƒ X 0im0 ƒ m1Di057 where ˆ4ui5 D ˆuC

iC 41ƒˆ5uƒ

i iscalled the ldquocheck functionrdquo (see Koenker and Basset 1978)This minimization is computationally straightforward since itcan be written as a linear programming problem

The analysis of quantiles has two advantages Firstquantiles like the median quartiles and deciles providebenchmarks that can be used to summarize and compareconditional distributions for different outcomes In contrastthe choice of c for the analysis of distribution ordinatesis application-specic Second since nonnegative LDVrsquos areoften (virtually) continuously distributed away from any masspoints linear models are likely to be more accurate for condi-tional quantiles than for conditional probabilities [For quan-tiles close to the censoring point Powellrsquos (1986b) censoredQR model may be more appropriate]

Abadie Angrist and Imbens (1998) developed a QRestimator for models with a binary endogeneous regres-sor Their quantile treatment effects (QTE) procedure beginswith a linear model for conditional quantiles for compliersQˆ6Yi

mdashXi1Di1 D1i gt D0i7 D X 0isbquoˆ

CˆDi The coef cient ˆ hasa causal interpretation because Di is independent of potentialoutcomes conditional on Xi and the event D1i gt D0i There-fore ˆ

D Qˆ6Y1imdashXi1 D1i gt D0i7 ƒ Qˆ6Y0i

mdashXi1 D1i gt D0i7 Inother words ˆ is the difference in ˆ quantiles for compliers

QTE parameters are estimated by minimizing a sampleanalog of the following weighted check-function minimandE6Šiˆ4Yi

ƒ X 0ib ƒ aDi57 As with the Causal-IV estimators

weighting by Ši transforms the conventional QR minimandinto a problem for compliers only For computational rea-sons however it is useful to rewrite this as E6 QŠiˆ4Yi

ƒX 0ib ƒ

aDi057 where QŠiD E6Ši

mdashXi1 Di1 Yi7 It is possible to show thatE6Ši

mdashXi1Di1 Yi7 D P6D1i gt D0imdashXi1Di1 Yi7 gt 0 This modi ed

estimation problem has a linear programming representationsimilar to conventional quantile regression since the weightsare positive Thus QTE estimates can be computed usingexisting QR software though this approach requires rst-step

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 11

estimation of QŠi In the following example I use the fact that

QŠiD E6Ši

mdashXi1 Di1 Yi7 D 1 ƒ Di41ƒ E6ZimdashYi1Di1Xi75

41ƒ E6ZimdashXi75

ƒ 41 ƒ Di5E6ZimdashYi Di1Xi7

E6ZimdashXi7

(27)

and estimate E6ZimdashYi1Di1Xi7 and E6Zi

mdashXi7 with a probit rststep Since QŠi is theoretically supposed to be positive negativeestimates of QŠi are set to 0

5 APPLICATION LABORndashSUPPLYCONSEQUENCES OF A THIRD CHILD

The estimation uses a sample of roughly 250000 marriedwomen aged 21ndash35 with at least two children drawn from the1980 Census 5 le About 53 of the women in this sampleworked in 1979 Overall (ie including zeros) women in thesample worked about 17 hours per week This can be seen inthe rst column of Table 1 which reports descriptive statis-tics and repeats some of the OLS and 2SLS estimates fromAngrist and Evans (1998) Roughly 38 of women in thissample had a third child an event indicated by the variableMorekids The OLS estimates in column (2) show that womenwith Morekids D 1 were about 17 percentage points less likelyto have worked in 1979 and worked about 6 hours fewer perweek than women with Morekids D 0 The covariates in thisregression are age age at rst birth a dummy for male rst-born a dummy for male secondborn and Black Hispanic andother race indicators

Table 1 also reports estimates of average effects computedusing nonlinear models still treating Morekids as exogenousThese average effects are approximations to effects of treat-ment on the treated evaluated using derivatives to simplifycomputations A detailed description of the average-effectscalculations appears in the appendix Probit estimates of theaverage impact on employment shown in column (3) arealmost identical to the OLS estimates Similarly the tobit esti-mate of the average effect of Di on hours worked shown incolumn (4) is ƒ6001 remarkably close to the OLS estimateof ƒ6002 (Note however that the tobit coefcient is ƒ1107)Column (5) of Table 1 reports estimated average effects froma two-part model in which both parts of the model are linear

Table 1 Descriptive Statistics and Baseline Results

Morekids exogenous Morekids endogenous

Nonlinear models Reduced forms2SLS

Dependent Mean OLS Probit Tobit 2PM (Morekids) (Depvar) effectvariable (1) (2) (3) (4) (5) (6) (7) (8)

Employment 0528 ƒ0167 ƒ0166 mdash mdash 0627 ƒ0055 ƒ0088(0499) (0002) (0002) (0003) (0011) (0017)

Hours worked 1607 ƒ6002 mdash ƒ6001 ƒ5097 0627 ƒ2023 ƒ3055(1803) (0074) (0073) (0073) (0003) (0371) (0592)

NOTE The sample includes 254654 observations and is the same as that of Angrist and Evans (1998) The instrument is an indicator for multiple births The mean of the endogenousregressor is 381 The probability of a multiple birth is 008 The model includes as covariates age age at rst birth boy rst boy second and race indicators Standard deviations are shownin parentheses in column 1 Standard errors are shown in parentheses in other columns

The 2PM estimate is virtually identical to the tobit and OLSestimates

Roughly 810 of 1 of women in the extract had a twinsecond birth (multiple births are identi ed in the 1980 Cen-sus using age and quarter of birth) Reduced-form estimates ofthe effect of a twin birth are reported in columns (6) and (7)The reduced forms show that women who had a multiple birthwere 63 percentage points more likely to have had a third childthan women who had a singleton second birth Mothers oftwins were also 55 percentage points less likely to be work-ing (standard error D 001) and worked 22 fewer hours perweek (standard error D 037) The 2SLS estimates derived fromthese reduced forms reported in column 8 show an impactof about ƒ009 (standard error D 002) on employment rates andƒ306 (standard error D 06) on weekly hours These estimatesare just over half as large as the OLS estimates suggestingthat the latter exaggerate the causal effects of childbearingOf course the twins instrument is not perfect and the 2SLSestimates may also be biased For example twinning probabil-ities are slightly higher for certain demographic groups ButAngrist and Evans (1998) found that 2SLS estimates usingtwins instruments are largely insensitive to the inclusion ofcontrols for mothersrsquo personal characteristics

Two variations on linear models generate estimates iden-tical or almost identical to the conventional 2SLS estimatesThis can be seen in columns (2) and (3) of Table 2 whichreport 2SLS estimates of a 2PM for hours worked and Abadie(2000b) Causal-IV estimates of linear models for employmentand hours The Causal-IV estimates use probit to estimateE[Z|X] and plug this into the formula for Ši The second stepin Causal-IV estimation is a weighted least squares problemwith some negative weights Since use of negative weights isnonstandard for statistical packages (eg Stata does not cur-rently allow this) I used a MATLAB program available fromAlberto Abadie to compute the estimates The 2PM estimatesin Table 3 were constructed from 2SLS estimates of a linearprobability model for participation and 2SLS estimates of alinear model for hours worked conditional on working In prin-ciple the 2PM estimates do not have a causal interpretationbecause the instruments are not valid conditional on workingIn practice however 2SLS estimates of the 2PM differ littlefrom conventional 2SLS estimates

The estimates of nonlinear=nonstructural models for meaneffects are mostly similar to each other and to conventional

12 Journal of Business amp Economic Statistics January 2001

Table 2 Impact on Mean Outcomes (Morekids endogenous)

Linear models Nonlinear models Structural models

Causal-IV Causal-IV Causal-IV Bivar Endog Mills 2SLSDependent 2SLS 2PM linear Mullahy probit expon probit tobit ratio benchmarkvariable (1) (2) (3) (4) (5) (6) (7) (8) (9) (10)

A With covariates

Employment ƒ0088 mdash ƒ0089 mdash ƒ0088 mdash ƒ0124 mdash mdash ƒ0089(0017) (0017) (0016) (0016) (0017)

Hours worked ƒ3055 ƒ3054 ƒ3055 ƒ3082 mdash ƒ3021 mdash ƒ3081 ƒ4051 ƒ3060(0592) (0598) (0592) (0598) (0694) (0580) (0549) (0599)

B No covariates

Employment ƒ0084 mdash ƒ0084 mdash ƒ0084 mdash ƒ0086 mdash mdash ƒ0084(0017) (0017) (0017) (0017) (0018)

Hours worked ƒ3047 ƒ3037 ƒ3047 ƒ3010 mdash ƒ3012 mdash ƒ3035 ƒ3048 ƒ3052(0617) (0614) (0617) (0561) (0616) (0642) (0641) (0624)

NOTE Sample and covariates are the same as in Table 1 Results for nonlinear models are derivative-based approximations to effect on the treated Causal-IV estimates are based on aprocedure discussed by Abadie (2000b) Standard errors are shown in parentheses

2SLS estimates Results from nonlinear models are againreported as marginal effects that approximate average effectson the treated For example a causal probit model for employ-ment status generates an average effect of ƒ0088 identical (upto the reported accuracy) to the 2SLS estimate Causal-IV esti-mation of an exponential model for hours worked the resultof a procedure that minimizes E6 QŠi4Yi

ƒ exp6X0

i b ƒ aDi7527

generates an estimate of ƒ3021 This too differs little from theconventional 2SLS estimate of ƒ3055 Similarly the Mullahyestimate of ƒ3082 in column (4) is less than 8 larger thanconventional 2SLS in absolute value It is noteworthy how-ever that the Mullahy model generates results that changemarkedly (falling by about 20) when the covariates aredropped Since the covariates are not highly correlated withthe twins instrument this lack of robustness to the inclusionof covariates seems undesirable On the other hand with-

Table 3 Impact on the Distribution of Hours Worked

Distribution treatment effects

Exogenous Morekids Endogenous Morekids Quantile treatment effects

OLS Ordered Causal CausalRange (LPM) Probit probit 2SLS LPM probit Quantile QR QTE(mean) (1) (2) (3) (4) (5) (6) (value) (7) (8)

0 0167 0166 0147 0088 0089 0088 5 ƒ8092 ƒ5024(472) (0002) (0002) (0002) (0017) (0017) (0016) (8) (0186) (0686)1ndash10 0001 0001 0002 ƒ0001 ƒ0001 ƒ0002 6 ƒ1207 ƒ7098(046) (0001) (0001) (000003) (0007) (0007) (0006) (20) (0172) (0860)11ndash20 ƒ0015 ƒ0014 ƒ0011 0002 0002 0002 7 ƒ9054 ƒ6019(093) (0001) (0001) (00001) (0010) (0010) (0010) (35) (0184) (1007)21ndash30 ƒ0024 ƒ0022 ƒ0014 ƒ0004 ƒ0005 ƒ0006 75 ƒ6045 ƒ3060(075) (0001) (0001) (00002) (0009) (0009) (0010) (40) (0156) (1017)31ndash40 ƒ0119 ƒ0110 ƒ0097 ƒ0072 ƒ0072 ƒ0071 8 ƒ1000 0000(277) (0002) (0002) (0001) (0014) (0014) (0016) (40) (0286) (1009)41+ ƒ0009 ƒ0008 ƒ0023 ƒ0009 ƒ0009 ƒ0006 9 mdash mdash

(027) (0001) (0001) (00003) (0005) (0005) (0007) (40)

NOTE The table reports probability-model and (QTE) estimates of the impact of childbearing on the distribution of hours worked The sample and covariates are the same as in Panel A ofTable 2 Causal-IV estimates are based on a procedure discussed by Abadie (1999) Quantile treatments are based on a procedure discussed by Abadie et al (1998) Standard errors areshown in parentheses The standard errors in columns (7) and (8) are bootstrapped

out covariates the Mullahy estimates are close to those fromCausal-IV with an exponential model

The bivariate probit estimate of the effect of childbear-ing on employment status reported in column 7 is ƒ012roughly a third larger in absolute value than the conventional2SLS estimate Interestingly in another application Abadie(2000b) also found that bivariate probit estimates are largerthan Causal-IV The gap between bivariate probit and Causal-IV estimates of effects on labor supply appears to be a conse-quence of the probit model for exogenous covariates Withoutcovariates bivariate probit generates estimates that are veryclose to the results from the other estimators It should benoted however that bivariate probit is not really appropriatefor twins instruments because the probability Morekids D 1 isequal to 1 for twins The probit ML estimator does not existin this case Therefore to compute all of the estimates using a

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 13

probit rst-stage (bivariate probit endogenous tobit and Millsratio) I randomly recoded 1 of Di observations to 0 Inprinciple measurement error in a binary endogenous regressorbiases 2SLS estimates (see Kane Rouse and Staiger 1999)In this case however column (10) shows that 2SLS estimateswith the randomly recoded data differ little from 2SLS esti-mates using the original data

Column (8) of Table 2 reports estimates of a structuraltobit model with endogeneous regressors These estimateswere computed using a two-step estimator that approximatesthe MLE again with recoded data The two-step procedureadds a Mills-ratio type endogeneity correction to a censoredregression and then applies tobit to the censored regres-sion model with the correction term [The correction term islsquo˜6Di4ƒ i=ecirci5C 41ƒDi5 i=41ƒecirci57 where is the corre-lation between the latent error determining treatment assign-ment and the outcome residual lsquo˜ is the standard deviation ofthe outcome residual and i and ecirci are Normal density anddistribution functions evaluated at the probit rst-stage ttedvalues see Heckman and Robb (1985)]

As with the bivariate probit estimate the endogenous tobitestimate is somewhat larger in magnitude than conventional2SLS This may be because of the Mills-ratio procedure forcontrolling for the endogeneity of Di more than the tobitcorrection for nonnegative outcomes To see this note thatthe Mills-ratio estimate ignoring censoring reported in col-umn (9) is considerably larger than the corresponding conven-tional 2SLS estimates Interestingly both the probit and tobitstructural estimators generate results that are more sensitiveto the inclusion of covariates than any of the other estimatorsexcept Mullahyrsquos In fact Panel B of the table shows thatwithout covariates all estimation techniques give very simi-lar results This is not surprising since without covariates theparametric assumptions used in these models are weaker

The last set of results shows that childbearing is associ-ated with marked changes in the distribution of hours workedbeyond the changes in participation already seen This isapparent in the rst columns of Table 3 which report thedistribution of hours worked by interval along with linearprobability estimates of the relationship between childbearingand the probability of falling into each interval (these mod-els include the same covariates used for Table 2) The largestentry in column (1) is for the probability of working zerohours There is also a large negative effect on the probabilityof working 31ndash40 hours per week which shows that womenwho have a third child are much less likely to work full-time Once again probit average effects reported in column2 are almost indistinguishable from the corresponding OLSestimates Estimates from an ordered probit model reportedin column (3) differ from OLS somewhat more but still gen-erate a very similar pattern

Like the 2SLS estimates for average outcomes 2SLS esti-mates of linear probability models for the probability of fallinginto each interval show that models that treat childbearing asexogenous exaggerate the negative impact on labor supply2SLS estimates for the probability of working zero hours areidentical (by construction) to those for employment in Table 1The 2SLS estimates of the impact of childbearing on fulltimework are also considerably less than the corresponding OLSestimates

Nonstructural Causal-IV models treating Morekids asendogenous generate estimates very close to 2SLS estimatesfor effects on the probabilities of hours falling into each inter-val Columns (5) and (6) show that the results are also remark-ably insensitive to whether a linear or probit model is usedto approximate the distribution function The estimates againindicate that childbearing changes the distribution of hours byraising the probability of nonparticipation and by reducing theprobability of fulltime work

Finally the QTE estimator provides useful summary statis-tics for causal effects of childbearing on changes in the distri-bution of hours worked These estimates were computed as thesolution to a weighted quantile regression problem using (27)to construct weights and the reported standard errors are froma bootstrap Quantile regression estimates treating childbear-ing as exogenous show an estimated 9-hour decline in medianhours worked but the QTE estimator suggests that the causaleffect of childbearing on median hours worked is only about ve hours Estimates at higher quantiles are similarly reducedwhen childbearing is treated as endogenous

6 SUMMARY AND CONCLUSIONS

Structural parameters may be of theoretical interest but mustultimately be converted into causal effects if they are to beof use for policy evaluation or determining whether a trendassociation is causal The problem of estimating causal effectsfor LDVrsquos does not differ fundamentally from the analogousproblem for continuously distributed outcomes The key dif-ferences seem to me to be the increased likelihood of inter-est in distributional outcomes and the inherent nonlinearity ofCEFrsquos for LDVrsquos in models with covariates Without covari-ates conventional 2SLS estimates capture both distributionaleffects and effects on means Simple IV strategies devel-oped by Mullahy (1997) and Abadie (2000b) can be used toestimate average effects in nonlinear models with covariateswhile IV strategies for probability models and quantile regres-sion can be used to estimate effects on distributions

These approaches are illustrated here using twin birthsto estimate the labor-supply consequences of childbearingAlternative nonstructural approaches to the estimation ofcausal effects using twins instruments generate similar aver-age effects whether or not the model is nonlinear Structuralestimates tend to be somewhat larger than nonstructural esti-mates when exogenous covariates are included even thoughthe covariates are not strongly related to the twins instru-ment Since the structural models impose additional distribu-tional and functional form assumptions I see no reason toprefer them

Finally the various IV estimates of the effect of childbear-ing on the distribution of hours worked show that the impactof childbearing is characterized by substantially increased non-participation and by an almost equally large shift away fromfulltime work Estimates that treat childbearing as exogenousexaggerate the causal effect of childbearing on changes in dis-tribution as well as on average hours worked These ndingsappear in results from both probability models and quantilemodels with endogenous regressors Both types of modelsprovide an interesting look at the impact of childbearing on

14 Journal of Business amp Economic Statistics January 2001

the distribution of hours worked without the conceptual prob-lems inherent in conditional-on-positive comparisons

ACKNOWLEDGMENTS

I thank the editor and the session discussants for theircomments Special thanks go to Melissa Schettini for out-standing research assistance and Alberto Abadie for exten-sive comments and shared computer programs Thanks also goto Daron Acemoglu Bill Evans Bill Greene Guido ImbensJohn Mullahy Steve Pischke and seminar participants at theUniversity of Texas for helpful discussions and comments onan earlier draft Data and computer programs used here areavailable on request

APPENDIX

A1 Derivation of Equation (12)

Drop the i subscripts Note that

Y0 D 14Y uuml0 gt 05Y uuml

0

Y1D 14Y uuml

0C gt 054Y uuml

0C 5 D 14Y uuml

1 gt 05Y uuml1 0

Since D is independent of Y0 and Y1D 14Y uuml

0C gt 054Y uuml

0C

51D is independent of Y1 Conditional effects on the treatedare therefore the same as conditional effects without condi-tioning on treatment status

E6Y1ƒ Y0

mdashY1 gt 01 D D 17 D E6Y1ƒ Y0

mdashY1 gt 07

E6Y1ƒ Y0

mdashY0 gt 01 D D 17 D E6Y1ƒ Y0

mdashY0 gt 070

The averages on the right side are the causal effects of interestThe conditional expectations on the right side are evaluated asfollows

E6Y1mdashY1 gt 07 D E6Y uuml

1mdashY uuml

1 gt 07 D E6Y uuml0mdashY uuml

1 gt 07C 1 (A1)

E6Y0mdashY1 gt 07 D E6Y uuml0 14Y uuml

0 gt 05mdashY uuml1 gt 07

D E6Y uuml0

mdashY uuml1 gt 01 Y uuml

0 gt 07P4Y uuml0 gt 0mdashY uuml

1 gt 050

If 4 gt 05 2 Y uuml0 gt 0 ) Y uuml

1 gt 01 this is E6Y uuml0

mdashY uuml0 gt 07

P4Y uuml0 gt 0mdashY uuml

1 gt 050 If 4 lt 05 2 Y uuml1 gt 0 ) Y uuml

0 gt 01 so this is

E6Y uuml0

mdashY uuml1 gt 070 (A2)

So lt 0 ) E6Y1 ƒ Y0mdashY1 gt 07 D Similarly

E6Y1mdashY0 gt 07 D E614Y uuml

0C gt 054Y uuml

0C 5mdashY uuml

0 gt 07

D E6Y uuml0

mdashY uuml1 gt 01 Y uuml

0 gt 07P4Y uuml1 gt 0mdashY uuml

0 gt 05

C P Y uuml1 gt 0mdashY uuml

0 gt 0 0

If 4 gt 051 Y uuml0 gt 0 ) Y uuml

1 gt 01 so

P4Y uuml1 gt 0mdashY uuml

0 gt 05 D 1 (A3)

and E6Y uuml0

mdashY uuml1 gt 01 Y uuml

0 gt 07 D E6Y uuml0mdashY uuml

0 gt 070 Finally

E6Y0mdashY0 gt 07 D E6Y uuml0mdashY uuml

0 gt 07 (A4)

so gt 0 ) E6Y1ƒ Y0

mdashY0 gt 07 D

A2 Derivation of Equation (21)

Drop the i subscripts Note that eƒD D 641 ƒ D5 C Deƒ7Let sbquo uuml D eƒsbquo and uuml D eƒ Z is binary and there are nocovariates so we can now write (20) as

sbquo uuml E641ƒ D5Y mdashZ D 17C sbquo uuml uuml E6DY mdashZ D 17 ƒ 1 D 0 (A5)

and

sbquo uuml E641ƒ D5Y mdashZ D 07C sbquo uuml uuml E6DY mdashZ D 07 ƒ 1 D 00 (A6)

Divided (A5) by (A6) to get rid of sbquo uuml then solve for uuml Subtract 1 and rearrange to get Equation (21)

A3 Average Treatment Effects and Standard Errors forNonlinear Models

Average treatment effects were calculated with the aid ofderivative approximations so that all reported effects have theform ldquocoef cient times scaling factorrdquo

Probit Note that ecirc6X 0i sbquo C 7 ƒ ecirc6X 0

isbquo7 6X 0isbquo C Di7 š

1 so the average effect on the treated can be approximatedas 841=N15

Pi Di6X 0

isbquo C Di79 š 1 where N1 D Pi Di This

turns out to be accurate to three decimal places for the esti-mates in Table 1 Standard errors were calculated treating thescaling factor as nonrandom This follows the convention forreporting marginal effects in programs like Stata in practiceany correction for estimation of the scaling factor is likely tobe minor A similar approach was used for ordered probit

Tobit Tobit average treatment effects were approximatedusing a derivative formula that can be found for exam-ple in the work of Greene (1999) E6Yi

mdashXi1DiD 17 ƒ

E6YimdashXi1 Di

D 07 iexclE6YimdashXi1Di7=iexclD D ecirc6X 0

isbquo C Di7 š 0

Average effects on the treated can therefore be approximatedusing 841=N15

Pi Diecirc6X 0

isbquo C Di79 š 0 Standard errors wereagain calculated treating the scaling factor as nonrandom

2PM Let the part-1 coef cient be 1 and the part-2 coef -cient be 2 Since the model multiplies parts and both parts arelinear the average effect is approximated using derivatives as1E6Yi

mdashDiD 11 Yi gt 07 C 2P6Di

D 1mdashY gt 070 Standard errorswere calculated treating the scaling factors E6Y mdashDi

D 11 Yi gt 07

and P6DiD 1mdashY gt 07 as nonrandom and using the fact that

the estimates of 1 and 2 are uncorrelatedMullahy Note that E6Yi

mdashXi1Di1mdash uumli 7 D mdash uuml

i exp6X 0i sbquoC Di7

where this CEF has a causal interpretation Again usingderivatives we have mdashuuml

i exp6X 0isbquo C 7 ƒ mdashuuml

i exp6X 0isbquo7

mdash uumli exp6X 0

i sbquo C Di7 š The model is such that E6mdash uumlimdashXi7

equals 1 but E6mdash uumlimdashXi1Di7 is unrestricted I ignore this prob-

lem and approximate the average effect on the treated as841=N15

Pi Di exp6X 0

i sbquo C Di79 š Standard errors were cal-culated treating the scaling factor as nonrandom

Bivariate Probit This is the same as probit using param-eters from the latent index equation for outcomes

Endogenous Tobit This is the same as tobit but usingcoef cients and predicted probability positive from the modelwith the compound Mills-ratio term included

Mills Ratio Standard errors were calculated treating thecompound Mills-ratio term as known

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 15

Causal-IV (nonlinear) Average effects were calculated asdescribed previously for the probit and Mullahy (exponential)functional form The rst-stage estimates of E6Zi

mdashXi7 neededto construct Ši were estimated using probit Standard errorsfor were calculated using the same bootstrap proceduredescribed for QTE Abadie (2000b) also gave analytic formu-las that take account of the rst-step estimation of E6Zi

mdashXi7An assumption implicit in this scheme for reporting Causal-IV results is that it makes sense to convert conditional-on-Xeffects for compliers into overall average effects on the treated

A4 Computation of Quantile Treatment Effectsand Standard Errors

QTErsquos were computed by plugging rst-step estimatesof QŠi

D E6ŠimdashXi1Di1 Yi7 into a weighted quantile regression

calculation performed by Stata Nonnegative estimates ofE6Ši

mdashXi1 Di1 Yi7 were constructed by separately estimatingE6Zi

mdashXi7 and E6ZimdashYi1Di1Xi7 using probit and then trimming

In principle standard errors should take account of this rst-step estimation An additional complication is that the analyticstandard errors for QTE involve a conditional error density Inthis case I sidestepped messy analytic calculations by usinga bootstrap procedure that repeats both the rst-stage estima-tion of QŠi and the second-step estimation of the parameters ofinterest in 100 replicate samples of 2500 observations eachThe 100 replicate samples were sampled without replacementusing the Stata command bsample The reported standarderrors were calculated as 4N =N uuml 51=2bseN where bseN is thestandard deviation of the 100 replicate estimates N D 21500and N uuml is the full sample size

[Received October 1999 Revised July 2000]

REFERENCES

Abadie A (2000a) ldquoBootstrap Tests for the Effect of a Treatment on the Dis-tributions of an Outcome Variablerdquo Technical Working Paper 261 NationalBureau of Economic Research Cambridge MA

mdashmdashmdash (2000b) ldquoSemiparametric Estimation of Instrumental Variable Mod-els for Causal Effectsrdquo Technical Working Paper 260 National Bureau ofEconomic Research Cambridge MA

Abadie A Angrist J and Imbens G (1998) ldquo Instrumental VariablesEstimation of Quantile Treatment Effectsrdquo Working Paper 229 NationalBureau of Economic Research Cambridge MA

Amemiya T (1978) ldquoThe Estimation of a Simultaneous Equation General-ized Probit Modelrdquo Econometrica 46 1193ndash1205

mdashmdashmdash (1979) ldquoThe Estimation of a Simultaneous Equation Tobit ModelrdquoInternational Economic Review 20 169ndash181

Angrist J (1998) ldquoEstimating the Labor Market Impact of Voluntary MilitaryService Using Social Security Data on Military Applicationsrdquo Economet-rica 66 249ndash288

Angrist J and Evans W (1998) ldquoChildren and Their Parentsrdquo Labor SupplyEvidence From Exogenous Variation in Family Sizerdquo American EconomicReview 88 450ndash477

Angrist J and Imbens G (1991) ldquoSources of Identifying Information inEvaluation Modelsrdquo Technical Working Paper 117 National Bureau ofEconomic Research Cambridge MA

Angrist J Imbens G and Rubin B (1996) ldquo Identi cation of CausalEffects Using Instrumental Variablesrdquo Journal of the American StatisticalAssociation 91 444ndash455

Angrist J and Krueger A (1999) ldquoEmpirical Strategies in Labor Eco-nomicsrdquo in The Handbook of Labor Economics Volume IIIA edsO Ashenfelter and D Card Amsterdam North-Holland pp 1277ndash1366

Blundell R W and Smith R J (1989) ldquoEstimation in a Class of Simulta-neous Equation Limited Dependent Variable Modelsrdquo Review of EconomicStudies 56 37ndash58

Blundell R W and Powell J L (1999) ldquoEndogeneity in Single Index Mod-elsrdquo mimeo University College London Dept of Economics

Bronars S and Grogger J (1994) ldquoThe Economic Consequences of UnwedMotherhood Using Twin Births as a Natural Experimentrdquo American Eco-nomic Review 84 1141ndash1156

Chamberlain G (1986) ldquoAsymptotic Ef ciency in Semi-Parametric ModelsWith Censoringrdquo Journal of Econometrics 32 189ndash218

Cragg J G (1971) ldquoSome Statistical Models for Limited Dependent Vari-ables With Application to the Demand for Durable Goodsrdquo Econometrica39 829ndash844

Duan N Manning W G Jr Morris C N and Newhouse J P(1984) ldquoChoosing Between the Sample-Selection Model and the Multi-partModelrdquo Journal of Business amp Economic Statistics 2 283ndash289

Eichner M J McClellan M B and Wise D A (1997) ldquoHealth Expendi-ture Persistence and Feasibility of Medical Savings Accountsrdquo in Tax Pol-icy and the Economy 11 ed J Poterba Cambridge MA National Bureauof Economic Research pp 91ndash128

Evans W Farrelly M C and Montgomery E (1999) ldquoDo Work-place Smoking Bans Reduce Smokingrdquo American Economic Review 89728ndash747

Gangadharan J and Rosenbloom J L (1996) ldquoThe Effects of Child-bearingon Married Womenrsquos Labor Supply and Earnings Using Twin Births as aNatural Experimentrdquo Working Paper 5647 National Bureau of EconomicResearch Cambridge MA

Goldberger A S (1991) A Course in Econometrics Cambridge MAHarvard University Press

Greene W (1999) ldquoMarginal Effects in the Censored Regression ModelrdquoEconomics Letters 64 43ndash49

Gronau R (1974) ldquoWage ComparisonsmdashA Selectivity Biasrdquo Journal ofPolitical Economy 82 1119ndash1143

Hay J W Leu R and Rohrer P (1987) ldquoOrdinary Least Squares andSample-Selection Models of Health-care Demandrdquo Journal of Business ampEconomic Statistics 5 499ndash506

Hay J W and Olsen R J (1984) ldquoLet Them Eat Cake A Note on Com-paring Alternative Models of the Demand for Medical Carerdquo Journal ofBusiness amp Economic Statistics 2 279ndash282

Heckman J J (1974) ldquoShadow Prices Market Wages and Labor SupplyrdquoEconometrica 42 679ndash694

mdashmdashmdash (1978) ldquoDummy Endogenous Variables in a Simultaneous EquationsSystemrdquo Econometrica 46 931ndash960

mdashmdashmdash (1990) ldquoVarieties of Selection Biasrdquo American Economic Review 80313ndash318

Heckman J J and Robb R Jr (1985) ldquoAlternative Methods for Evaluatingthe Impact of Interventionsrdquo in Longitudinal Analysis of Labor MarketData (Econometric Society Monographs Series No 10) eds J Heckmanand B Singer Cambridge UK Cambridge University Press pp 156ndash246

Imbens G and Angrist J D (1994) ldquo Identi cation and Estimation of LocalAverage Treatment Effectsrdquo Econometrica 62 467ndash475

Imbens G and Rubin D B (1997) ldquoEstimating Outcome Distributions forCompliers in Instrumental Variables Modelsrdquo Review of Economic Studies64 555ndash574

Kane T J Rouse C E and Staiger D (1999) ldquoEstimating Returnsto Schooling When Schooling Is Misreportedrdquo Working Paper W7235National Bureau of Economic Research Cambridge MA

Kelejian H H (1971) ldquoTwo-Stage Least Squares and Econometric SystemsLinear in Parameters but Nonlinear in the Endogenous Variablesrdquo Journalof the American Statistical Association 66 373ndash374

Koenker R and Bassett G (1978) ldquoRegression Quantilesrdquo Econometrica46 33ndash50

Killingsworth M (1983) Labor Supply Cambridge UK Cambridge Uni-versity Press

Keane M P and Wolpin K (1997) ldquo Introduction to the JBES Special Issueon Structural Estimation in Applied Microeconomicsrdquo Journal of Businessamp Economic Statistics 15 111ndash114

Lee M J (1995) ldquoSemiparametric Estimation of Simultaneous EquationsWith Limited Dependent Variables A Case Study of Female Labor Sup-plyrdquo Journal of Applied Econometrics 10 187ndash200

mdashmdashmdash (1996) ldquoNonparametric Two-Stage Estimation of Simultaneous Equa-tions With Limited Endogenous Regressorsrdquo Econometric Theory 12305ndash330

Leung S F and Yu S (1996) ldquoOn the Choice Between Sample Selectionand Two-Part Modelsrdquo Journal of Econometrics 72 197ndash229

Lin T-F and Schmidt P (1984) ldquoA Test of the Tobit Speci cation Againstan Alternative Suggested by Craggrdquo Review of Economics and Statistics66 174ndash177

Maddala G S (1985) ldquoA Survey of the Literature on Selectivity Bias AsIt Pertains to Health Care Marketsrdquo Advances in Health Economics andHealth Service Research 6 3ndash18

16 Journal of Business amp Economic Statistics January 2001

REFERENCES

Manning W G Duan N and Rogers W H (1987) ldquoMonte Carlo Evidenceon the Choice Between Sample Selection and Two-part Modelsrdquo Journalof Econometrics 35 59ndash82

Manski C F (1975) ldquoMaximum Score Estimation of the Stochastic UtilityModelrdquo Journal of Econometrics 3 205ndash228

mdashmdashmdash (1996) ldquoLearning About Treatment Effects From Experiments WithRandom Assignment of Treatmentsrdquo Journal of Human Resources 31709ndash733

McDonald J F and Mof tt R A (1980) ldquoThe Uses of Tobit AnalysisrdquoThe Review of Economics and Statistics 62 318ndash321

Mof t R A (1999) ldquoNew Developments in Econometric Methods forLabor Market Analysisrdquo in The Handbook of Labor Economics Vol-ume IIIA eds O Ashenfelter and D Card Amsterdam North-Hollandpp 1367ndash1398

Mullahy J (1997) ldquo Instrumental-Variable Estimation of Count Data ModelsApplications to Models of Cigarette Smoking Behaviourrdquo Review of Eco-nomics and Statistics 11 586ndash593

mdashmdashmdash (1998) ldquoMuch Ado About Two Reconsidering Retransformation andthe Two-Part Model in Health Econometricsrdquo Journal of Health Eco-nomics 17 247ndash281

Newey W K (1985) ldquoSemiparametric Estimation of Limited DependentVariable Models With Endogenous Explanatory Variablesrdquo Annales deLrsquo Insee 5960 219ndash237

mdashmdashmdash (1986) ldquoLinear Instrumental Variable Estimation of Limited Depen-dent Variable Models With Endogenous Explanatory Variablesrdquo Journal ofEconometrics 32 127ndash141

mdashmdashmdash (1987) ldquoEf cient Estimation of Limited Dependent Variable ModelsWith Endogenous Explanatory Variablesrdquo Journal of Econometrics 36231ndash250

Powell J L (1986a) ldquoSymmetrically Trimmed Least Squares Estimation forTobit Modelsrdquo Econometrica 54 1435ndash1460

mdashmdashmdash (1986b) ldquoCensored Regression Quantilesrdquo Journal of Econometrics32 143ndash155

Rosenzweig M R and Wolpin K (1980) ldquoLife-Cycle Labor Supply andFertility Causal Inferences From Household Modelsrdquo Journal of PoliticalEconomy 88 328ndash348

Rubin D B (1974) ldquoEstimating Causal Effects of Treatments in Random-ized and Non-randomized Studiesrdquo Journal of Educational Psychology 66688ndash701

mdashmdashmdash (1977) ldquoAssignment to a Treatment Group on the Basis of a Covari-aterdquo Journal of Educational Statistics 2 1ndash26

Wooldridge J (1999) ldquoDistribution-Free Estimation of Some Nonlinear PanelData Modelsrdquo Journal of Econometrics 90 77ndash97

Comment Binary Regressors in NonlinearPanel-Data Models With Fixed EffectsJinyong Hahn

Department of Economics Brown University Providence RI 02912 ( jinyong hahnbrownedu)

Angrist notes in his abstract that ldquomuch of the dif cultywith limited dependent variables comes from a focus on struc-tural parameters such as index coef cients instead of causaleffects Once the object of estimation is taken to be the causaleffect of treatment several simple strategies are availablerdquo(p 2) I examine the consequence of such a perspective forinference in nonlinear panel-data models with xed effectsI argue that (a) the ldquodif cultyrdquo indeed disappears sometimesand (b) structure of treatment assignment plays a crucial rolefor his strategy to be successful

It is instructive to begin with a dif cult nonlinear panel-datamodel with xed effects Consider a panel probit model with xed effects

Pr6yitD 1mdashci1 xi11 xi27 D ecirc4ci

C ˆ xit51

i D 11 1 n3 t D 1120 (1)

It is assumed that yi1 and yi2 are independent of each othergiven 4ci1 xi11 xi25 Here ci denotes the unobserved xedeffects and xit denotes a binary treatment variable We willassume that 4xi11 xi25 D 40115 for all i The index coef cientof interest is ˆ Therefore traditional econometric analysiswould focus on whether ˆ is identi ed and or

pn-consistently

estimableEstimation of the index coef cient ˆ is dif cult for a num-

ber of reasons

1 So far no consistent estimator for ˆ has been developedfor probit with a nonparametric speci cation of the conditionaldistribution of ci given 4xi11 xi25

2 It is not even clear whether ˆ is identi ed or notManskirsquos (1987) identi cation result requires in nite supportfor xit which cannot be satis ed due to the binary nature ofxit

3 It is not clear whether the semiparametric informationbound for ˆ is positive or not It is quite possible that theinformation is actually 0 For example Chamberlain (1992)showed that the information for ˆ is 0 for models with timedummies

4 Chamberlainrsquos (1984) conditional maximum likelihoodestimator is applicable only to logit models

Now to assess whether changing the object of estimationsimpli es statistical analysis consider the average treatmenteffects which in this particular case can be easily shown tobe equal to

sbquo D E6yi2ƒ yi170 (2)

It is not dif cult to see that a simple estimator Osbquo D1=n

PniD14yi2

ƒ yi15 isp

n-consistent As Angrist argues afocus on average causal effects dramatically reduces the dif -culty of estimation The secret is that 4xi11 xi25 D 40115 for alli which effectively ensures that ci is independent of 4xi11 xi25Dif culty of the index estimation listed previously is because

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

Imbens Comment 17

the independence between 4xi11 xi25 and ci cannot be exploitedwithin the xed-effects framework Unfortunately the addi-tional information that 4xi11 xi25 and ci are independent of eachother does not reduce dif culty in estimating the index ˆ It isnot yet clear whether ˆ can be

pn-consistently estimated even

with the random-effects assumption when the random effectsare nonparametrically speci ed On the other hand estimatingsbquo becomes easier as a result of the constancy of the general-ized propensity score Pr 6xi11 xi2mdashci7 Presence of unobservedci in the model was rendered irrelevant due to the constancyof the generalized propensity score [See Imbens (1999) for adiscussion on the generalized propensity score]

It is interesting to note that in the panel probit model (1)estimation of sbquo is not necessarily simple unless the indexstructure is discarded altogether It is useful to note that thenew target parameter sbquo could be estimated consistently usingindex structure if consistent estimators of ˆ and not are givenHere not denotes the distribution of ci We may alternativelywrite (2) as

Z4ecirc4c C ˆ5 ƒ ecirc4c55 dnot4c51 (3)

which can in principle be estimated by using consistent esti-mators of ˆ and not Estimation of sbquo using the alternativecharacterization (3) requires consistent estimation of an addi-tional parameter not a parameter that was not given too muchattention in the past The problem is that not many consis-tent estimators of not are available It is not yet clear whether

the model satis es the primitive conditions for consistency ofthe nonparametric maximum likelihood estimator (NPMLE)as discussed by Heckman and Singer (1984) The dif culty inestimating the target parameter using the expression (3) whichis based on the index structure is in sharp contrast to the easeof the estimation strategy using the expression (2) for whichthe index structure is irrelevant

The preceding discussion suggests that the success ofAngristrsquos perspective critically hinges on the structure of treat-ment assignment and careful reexpression of the new targetparameter If the joint distribution of ci and 4xi11 xi25 is com-pletely unknown it is clear that changing the target parameterdoes not ease the dif culty of estimation Angristrsquos perspectivetherefore requires substantial effort in modeling such jointdistribution Whether such a modeling effort will be successfulin dealing with nonlinear panel problems remains to be seen

ADDITIONAL REFERENCES

Chamberlain G (1984) ldquoPanel Datardquo in Handbook of Econometricseds Z Griliches and M D Intriligator Amsterdam North-Hollandpp 1247ndash1318

mdashmdashmdash (1992) ldquoBinary Response Models for Panel Data Identi cation andInformationrdquo unpublished manuscript

Heckman J and Singer B (1984) ldquoA Method for Minimizing the Impactof Distributional Assumptions in Econometric Models for Duration DatardquoEconometrica 52 271ndash320

Imbens G (1999) ldquoThe Role of Propensity Score in Estimating Dose-Response Functionsrdquo Technical Working Paper 237 National Bureau ofEconomic Research Cambridge MA

Manski C (1987) ldquoSemiparametric Analysis of Random Effects Linear Mod-els from Binary Panel Datardquo Econometrica 55 357ndash362

CommentGuido W Imbens

Department of Economics University of California at Los Angeles Los Angeles CA 90095(imbenseconuclaedu)

It is a pleasure to comment on this article by JoshuaAngrist whose applications of instrumental-variables methods(Angrist 1989 Angrist and Krueger 1991) have been a sourceof inspiration for my own work in this area As with Angristrsquosprevious work on instrumental variables the current articleraises some controversial issues and makes a number of impor-tant points Here I offer some comments on three of themFirst I shall discuss the issues raised in Section 1 ldquoCausalEffects and Structural Parametersrdquo concerning the goals ofstatistical inference Angrist argues that many questions ofinterest are most easily formulated in terms of comparisonsbetween realized and potential outcomes the latter de nedas outcomes that would have been observed under alternativestates of nature I shall explore some of the implications of thisview for empirical practice and econometric theory SecondI shall offer some remarks on the role of economic theory inspeci cation and identi cation of econometric models againreinforcing Angristrsquos point regarding the importance of for-mulating the key assumptions in terms of potential outcomesThird I shall discuss some of the issues related to the limited

dependent nature of outcome variables for empirical practicein particular in the presence of covariates Partly motivatedby the widespread perception of fundamental dif culties inapplying instrumental-variables methods to data with limiteddependent outcome variables Angrist argues that standard lin-ear model techniques are generally applicable I agree withAngristrsquos position that most of these perceived problems areexaggerated but suggest that principled inference should nev-ertheless take account of the limited dependent nature of theoutcome variables and use nonlinear models

1 CAUSAL ESTIMANDS

In his textbook discussion of the difference betweenstructural and reduced-form estimates Goldberger (1997)wrote following Marshak (1953) that the ultimate goal of

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

18 Journal of Business amp Economic Statistics January 2001

econometrics is to provide predictions More speci cally inmy view the goal is to provide predictions of policy interven-tions Using both economic theory and data economists wishto inform policy discussions by providing predictions of statesof the world under different policy choices Based on com-parisons of such predictions policy makers can then chooseamong the different policies using some social welfare mea-sure as objective function (eg Heckman and Smith 1997)Angrist argues that such questions are most easily formulatedin terms of potential outcomes Here I want to elaborate onthat view

Consider as an example the problem faced by a policymaker contemplating a new tax in a market To evaluatethis policy the policy maker wishes to take into account theeffect of the tax on the quantity traded Economic theory sug-gests that this effect depends on the slope of the supply anddemand functions The rst step is therefore the estimation ofthese slopes and in the remainder of this discussion I shallfocus on this component of the policy-evaluation problem Inprinciple the policy maker may be interested in the entire dis-tribution of the quantity traded under various taxes Let usassume however that for purposes of evaluation of the poli-cies it is suf cient to know the average effect of the policyon the quantity traded If there are only two values for thepolicymdashfor example no tax or a taxmdashthe difference betweenthese two averages is the key quantity of interest FollowingRubin (1974) I will refer to this as the estimand

Note that the choice of estimand is distinct from the sta-tistical question of the speci cation of the model Often thestatistical model is speci ed in such a way that a single param-eter corresponds to the estimand For example in a structuralinterpretation of the linear regression model the coef cientscorrespond to the effect of changing the covariates by a singleunit Such one-to-one correspondence however is the excep-tion rather than the rule Wooldridge (1992) made this point inthe context of BoxndashCox regression models Such models areoften used when a linear representation for E6Y mdashX7 is inappro-priate The BoxndashCox regression model generalizes this linearform to E6Y 4lsaquo5mdashX7 D X 0sbquo where

Y 4lsaquo5 D(

4Y lsaquo ƒ 15=lsaquo lsaquo 6D 01

ln Y lsaquo D 00

Although consistent estimators for sbquo exist under these assump-tions Wooldridge stressed that because (a) the interpretationof sbquo changes with the value of lsaquo and (b) knowledge of sbquo andlsaquo is not suf cient for recovering E6Y mdashX7 there is no reasonfor economists to be interested in estimates of sbquo under theseassumptions In other words sbquo cannot be the sole focus ofthe researcher because the question it answers changes withthe value of nuisance parameters Wooldridge then suggestedan alternative speci cation that always allows the researcherto recover the conditional expectation E6Y mdashX7

In empirical work this distinction between the estimandand the parameters of the statistical model is consistent withthe now common practice of reporting estimates of averagederivatives in binary response models rather than reportingestimates of the logit or probit coef cients Unlike a linear

regression model there is no direct link from one of the coef- cients in the logit or probit model to average causal effectsand thus there is no intrinsic interest in such coef cients

This view is at odds however with a large part of the semi-parametric literature An exception is the work by Stoker (egStoker 1986) who focused on estimation of index coef cientsin settings where these are proportional to average derivativesand thus directly linked to changes in predictions Considerfor example the work on semiparametric estimation of binaryresponse models In this literature such models are esti-mated without making logistic or probit assumptions insteadonly making conditional mean or median assumptions in alatent index interpretation (eg Manski 1985) This literaturehowever has begged the question of why economists shouldbe interested in the coef cient estimates in these models inthe absence of a direct link between these coef cients andthe choice probabilities or their derivatives Similarly someof the models with xed effects in panel data with limiteddependent variables have focused on estimation of parametersthat in themselves do not allow for estimation of conditionalexpectations or their derivatives and thus do not allow for esti-mation of causal effects See Arellano and Honoreacute (in press)for a survey of many of these methods

2 IDENTIFICATION

After deciding on the estimand the next step is to makesubstantive assumptions on the process that generated the dataThis is where economic as opposed to statistical theory playsa key role Theoretical considerations may suggest that certainvariables have no direct causal effect on others because theydo not enter into agentsrsquo utility function nor do they affectthe constraints these agents face For example in some mar-kets it may be reasonable to postulate the existence of demandand supply function and assume that their intersection deter-mines observed prices and quantities In that case it may beargued that certain variablesmdashfor example weather conditionsin agricultural marketsmdashaffect supply at xed prices but notdemand because weather conditions do not affect utility ofthe buyers nor do they constrain their choices given pricesSimilarly theoretical considerations may suggest which vari-ables determine agentsrsquo fertility choices and which variablesare excluded from such choices as in the structural modelsdescribed in Section 12 of Angrist

For the purpose of considering such exclusion restrictionsas well as other assumptions it is important to formulate themin a way that economic theory can be brought to bear on themThis makes the formulation in terms of counterfactuals orpotential outcomes that Angrist advocates particularly appro-priate The potential outcomes describe outcomes in differ-ent environments and as such are the primitives of economicanalyses as well as choices under different sets of constraintswhich are the result of agents solving constrained optimizationproblems Since economic theory studies such optimizationproblems it is therefore well equipped to assess assumptionsformulated directly in terms of these potential outcomes Anexample of the formulation of the critical assumptions in termsof such potential outcomes is Angrist Imbens and Rubin(1996 AIR from here on) In contrast latent index modelsalthough under some conditions mathematically equivalent to

Imbens Comment 19

the potential outcome framework (eg Vytlacil 1999) formu-late the critical assumptions in terms of associations betweenobserved variables and unobserved residuals which appearsmore dif cult to contemplate [see Imbens (1997) for a dis-cussion of the confusion such formulations have caused in thestatistics literature]

It is rare that economic theory is speci c enough to deter-mine the exact value of the estimand More typical is thatthe theory is consistent with a range of values for the esti-mand Observations on agentsrsquo choices and outcomes may behelpful in narrowing down this range The econometricianrsquostask is to link the data to the estimand Typically a num-ber of additional assumptions are made at this stage Almostalways it is assumed that there is only limited dependenceor no dependence at all between choices made by differentagents and identi cation focuses on the link between the jointdistribution of the observables estimable in large samplesand the estimand Two possibilities arise at this stage Some-times the estimand can be expressed as a functional of thejoint distribution of the observables in which case the esti-mand is identi ed A leading example is where the estimandis the average treatment effect and theory suggests that assign-ment to treatment is random or at least random conditionalon a set of observed covariates (unconfounded assignmentselection on observables) Alternatively the assumptions sug-gested by economic theory do not allow for the direct linkbetween the distribution of observables and the estimand Inthat case the researcher faces some choices One option advo-cated in a series of papers by Manski (see for a general dis-cussion Manski 1995) is to estimate the range of values ofthe estimand consistent with the data given the substantiveassumptions Another option followed in the current article byAngrist is the local average-treatment-effect approach devel-oped by Imbens and Angrist (1994) to consider what aspectsof the estimand are identi ed given data and assumptions Ininstrumental-variables settings the population average treat-ment effect is often not identi ed but the average effect for aspeci c subpopulation may be In that case one may chooseto estimate the average treatment effect for this subpopula-tion and leave the extrapolation to the principal estimand tothe researcher possibly aided by theoretical considerations AsHeckman wrote ldquo It is a great virtue of the LATE parameterthat it makes the investigator stick to the data at hand and sep-arate out the aspects of an estimation that require out of sam-ple extrapolation or theorizing from aspects of an estimationthat are based on observable datardquo (Heckman 1999 p 832)

Let us consider the case studied by Angrist with its focuson the effect of having more than two children on labor sup-ply Angrist argues that the second birth being a multiple birth(eg twins) is a valid instrument for this effect In terms ofthe AIR formulation this requires a multiple birth to be asgood as randomly assigned and the absence of a systematicdirect effect on labor supply other than through its effect onthe number of children Such assumptions may be contro-versial For example fertility treatments may lead to a sys-tematic association between multiple births and choices madeby couples violating the rst assumption Even if we acceptthese assumptions however they only imply that the aver-age causal effect of more kids on labor supply is identi ed

for women who had a third child solely because their sec-ond birth was a multiple birth (compliers in the AIR termi-nology) In my view it is unlikely that this is the populationof primary interest Nevertheless it is the only subpopulationthe data are informative about in the sense of point identi ca-tion under the substantive assumptions and it would appear tooffer some guidance regarding the population average causaleffect to policy makers similar to the way in the medical worldresults from clinical trials in homogenous subpopulations areregarded as useful because they are viewed as indicative ofpopulation average causal effects

3 LIMITED DEPENDENT VARIABLES

Typically economic theory offers some guidance concern-ing the determinants of certain outcomes without specifyingthe exact form or strength of their relationship In that casestatistical modeling is required to complete the speci cationConsider the example Angrist studies with binary outcomebinary endogenous regressor a binary instrument and covari-ates Angrist suggests as one possible approach estimating theaverage treatment effect through a linear probability modelwith instrumenting for an endogenous regressor The bene tsof the linear probability approach stemming from the linear-ity and robustness against misspeci cation of the rst stageappear to me largely illusory At this point the statistical mod-eling is only intended to provide exible approximations to theunderlying conditional distributions This is a fundamentallydifferent role from that played by the substantive assumptionsthat are essential for identi cation Appeals to consistencyunder speci c parameterizations therefore appear irrelevantmdashin a larger sample one may well wish to use a more exiblespeci cation because less smoothing is required In additionto nding the alleged bene ts of the linear probability modelunpersuasive I nd its disadvantages troubling Within smallsubpopulations characterized by extreme values of the covari-ates the smoothing implicit in linear probability models islikely to lead to unattractive predictions compared to pre-dictions based on nonlinear models that respect the limited-dependent-variable nature of the outcomes

An alternative approach is followed in the study of the effectof u shots on hospitalization rates using randomized incen-tives for vaccination by Hirano Imbens Rubin and Zhou(2000 HIRZ from here on) Given their assumptions exten-sions of those made by AIR to the case with exogenous covari-ates there are three subpopulationsmdashcompliers (units whochange treatment status in response to a change in the valueof the instrument) always-takers (who always take the treat-ment irrespective of the value of the instrument) and never-takers (who never take the treatment irrespective of the valueof the instrument) H IRZ modeled the conditional distributionof these three ldquotypesrdquo conditional on covariates as a trinomialdistribution

Pr4TypeiD cmdashXi

D x5 D exp4x0ndashc5

1C exp4x0ndashc5C exp4x0ndasha51

Pr4TypeiD amdashXi

D x5 D exp4x0ndasha5

1C exp4x0ndashc5 C exp4x0ndasha51

20 Journal of Business amp Economic Statistics January 2001

and

Pr4TypeiD nmdashXi

D x5

D 1 ƒ Pr4TypeiD cmdashXi

D x5 ƒ Pr4TypeiD amdashXi

D x50

Now compare this setup to the selection models Angristdescribes in Section 3 In the selection models the equationdescribing the endogenous regressor is Di

D 18ƒ0C ƒ1Zi

Cƒ 0

2Xi gt Daggeri9 Suppose that the instrument is binary and that ƒ1 ispositive Then the two models are very similar with units withƒ0 Cƒ1 Cƒ 0

2Xi gt Daggeri in the selection model classi ed as always-takers in the potential outcome framework (because irrespec-tive of the value of the instrument Di

D 1 for such units) unitswith ƒ0

C ƒ 02Xi lt Daggeri classi ed as never-takers (because irre-

spective of the value of the instrument DiD 0 for such units)

and the units with ƒ0C ƒ 0

2Xi lt Daggeri lt ƒ0C ƒ1

C ƒ 02Xi classi ed

as compliersOne advantage of the trinomial model is that it easily gen-

eralizes to provide an arbitrarily good t to any conditionaltrinomial distribution by including higher-order terms andinteractions in the covariates If there are no substantive rea-sons to impose additional restrictions one should not imposethem implicitly in the speci cation of the statistical modelIn particular in the selection model it is not suf cient to addhigher-order terms to the covariate vector to provide an arbi-trarily good t to the trinomial distribution Such an approxi-mation would have to involve heteroscedasticity and other dis-tributional extensions that are not straightforward to implementin the selection model

Conditional on the individualrsquos type H IRZ speci ed theoutcome distributions given covariates as logistic regressionmodels Again the aim is to provide a exible approxima-tion to the conditional distribution in a manner that does notimpose any implicit restrictions Given that for a binomialdistribution the logistic regression model can be thought ofas providing a linear approximation to the log odds ratiothis choice is again an appealing one An alternative is theprobit model which also provides a good approximation Less

attractive here is the linear probability model since it requiresinequality restrictions on the parameters if the implicit esti-mates of the probabilities are to be bounded between 0 and 1

In cases with other limited dependent variables alternativenonlinear models may be appropriate For example if the out-comes are durations subject to censoring models speci ed interms of hazard functions (eg Lancaster 1979) may be con-venient for dealing with such data

ADDITIONAL REFERENCES

Angrist J (1989) ldquoLifetime Earnings and the Vietnam Era Draft LotteryEvidence From Social Security Administrative Recordsrdquo American Eco-nomic Review 80 313ndash335

Angrist J and Krueger A (1991) ldquoDoes Compulsory School AttendanceAffect Schooling and Earningsrdquo Quarterly Journal of Ecomonics 106979ndash1014

Arellano M and Honoreacute B (in press) ldquoPanel Datardquo in Handbook of Econo-metrics eds J Heckman and E Leamer Amsterdam Elsevier North-Holland

Goldberger A (1991) A Course in Econometrics Cambridge MACambridge University Press

Heckman J (1999) ldquo Instrumental Variables Response to Angrist andImbensrdquo Journal of Human Resources 34 828ndash837

Heckman J and Smith J (1997) ldquoEvaluating the Welfare Staterdquo in Econo-metrics and Economics in the 20th Century The Ragnar Frisch Centenaryed S Strom New York Cambridge University Press pp 214ndash318

Hirano K Imbens G Rubin D and Zhou A (2000) ldquoEstimating theEffect of Flu Shots in a Randomized Encouragement Designrdquo Biostatistics1 69ndash88

Imbens G (1997) Book Review of The Foundations of Econometric Analy-sis by D Hendry and M Morgan Journal of Applied Econometrtics 1291ndash94

Lancaster T (1979) ldquoEconometric Methods for the Analysis of DurationDatardquo Econometrica

Manski C (1985) ldquoSemiparametric Analysis of Discrete Responserdquo Journalof Econometrics 27 313ndash333

mdashmdashmdash (1995) Identi cation Problems in the Social Sciences CambridgeMA Harvard University Press

Marshak J (1953) ldquoEconomic Measurements for Policy and Predictionrdquoin Studies in Econometric Method eds W Hood and T KoopmansNew York Wiley pp 1ndash26

Stoker T (1986) ldquoConsistent Estimation of Scaled Coef cientsrdquo Economet-rica 54 1461ndash1481

Vytlacil E (1999) ldquo Independence Monotonicity and Latent Index ModelsAn Equivalency Resultrdquo unpublished manuscript University of ChicagoDept of Economics

Wooldridge J (1992) ldquoSome Alternatives to the BoxndashCox RegressionModelrdquo International Economic Review 33 935ndash955

CommentRobert A Mof tt

Department of Economics Johns Hopkins University Baltimore MD 21218 and National Bureau of EconomicResearch Cambridge MA (mofttjhuedu)

The Problem Although the article by Angrist rangesacross a number of issues much of the discussion and thearticle title suggests that the problem of concern is that instru-mental variables ( IV) cannot be used in one of three commonmodels Let the rst model be y D Csbquod CxbdquoChellip where y isan absolutely continuous variable but d is binary and wherex is independent of hellip but d is not Then sbquo can be consistentlyestimated with IV (Heckman and Robb 1985) Let the secondmodel be y uuml D C sbquod uuml C xbdquo C hellip where y uuml and d uuml are contin-

uous and where y D 14y uuml gt 05 and d D d uuml are the observedvariables The parameters of this model can likewise be esti-mated by IV with some auxiliary assumptions (Newey 1986see Blundell and Smith 1993 for a review of alternative meth-ods) But let the third model be y uuml D C sbquod C xbdquo C hellip where

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

Moftt Comment 21

again y D 14y uuml gt 05 but now d is binary The parameters ofthis model cannot be estimated by standard IV Many of theexamples in the Angrist article involve the censored regres-sion model rather than the binary choice model but this andthe other main points in the article apply to both

The Angrist Solution The Angrist solution to the problemposed by the third of these models where IV is not applica-ble is to declare sbquo an uninteresting parameter and not worthyof estimation [or in his words to ldquopuntrdquo (p 8)] This is remi-niscent of the solution to the Vietnam War suggested by some1960s commentators The United States should have simplydeclared victory quickly withdrawn and hoped that no onenoticed that they had in fact lost

If sbquo is uninteresting what is interesting according toAngrist His answer is that interest should center on leastsquares approximations with y as the dependent variable andd and x as regressors If endogeneity of d is a problem linearIV should be applied with linear IV given a LATE interpre-tation He provides an empirical illustration

I will comment on three questions raised by the Angristarticle (1) Is sbquo an interesting parameter (2) Is the Linear IVestimator using observed y and d which Imbens and Angrist(1994) relabeled the LATE an interesting quantity (3) Is theillustrative application given in the Angrist article interesting

Is sbquo an Interesting Parameter As Angrist notes manyof the issues raised by his solution have nothing to do withthe endogeneity of d Whether sbquo is an interesting parame-ter is one of these at least in part Given Angristrsquos prefer-ence for linear projections his position necessarily implies thatall nonlinear models of which the latent index model (L IM)is just one example are uninteresting (Angrist is not com-pletely consistent in his position because he does propose anonlinear model with an exponential conditional mean func-tion in one section of his article that model is subject to thesame objections that he subjects the LIM tomdasharbitrary non-linearities etc) Thus his position boils down to a preferencein the binary dependent-variable case for the linear proba-bility model (LPM) over the LIM and other nonlinear mod-els This issue has been discussed for many years even in asimultaneous-equations context [eg Heckman and MaCurdy(1985) which Angrist does not reference]

The dif culty with Angristrsquos polar position on this issuemdashthat the LPM is the only model of interestmdashis that it is funda-mentally untenable The LPM is attractive because it is easyto interpret providing parameter estimates that do not requiretransformation to learn the effects of a regressor on the meanof the dependent variable It has a role to play in empiricalwork in summarizing the data as regards the conditional meanfunction and for initial explorations of the data and virtu-ally all practitioners use it for this purpose But beyond thisit has no defense Either it is equivalent to the latent-variablemodel if the model is saturated or if the model is not satu-rated imposes functional form restrictions that may not holdand may t the data worse than the latent-variable model

Thus if the model is y D 14 C sbquod C hellip gt 05 where d is asingle dummy variable regressor independent of hellip the probit(say) estimates of and sbquo map one-to-one onto the leastsquares intercept and coef cient on d in the LPM and hencethere is no gain to estimating the model either way because the

estimators are equivalent The same point extends to the caseof a multinomial d If the model is y D 14 C sbquod C xbdquo C hellip gt

05 where x and bdquo are vectors of exogenous covariates andcoef cients respectively the LPM estimate of the effect ofd on E4y5 conditional on x may be demonstrably worsethan the probit estimator of the same effect even for theobject of estimating E4ymdashd1 x5 at points 8d1 x9 observed inthe data (Angrist falls back on the weak principle that leastsquares estimates provide best linear approximations but theissue is that the true model might be nonlinear) Nothingin Angristrsquos article provides evidence that the probitor anyother latent index formulation provides an inferior estimate of6E4ymdashd11 x5 ƒ E4ymdashd01 x57 compared to the LPM for the twoestimators smooth the joint response function with respect tod and x in different ways

Although the latent-variable model has no necessary claimfor superiority in the class of all nonlinear models its pop-ularity rests on its ability to generate a wide variety of non-linearities with a relatively parsimonious speci cation arisingfrom the convolution of the latent index with the cdf of hellipExpansion of the LPM to incorporate equivalent nonlinearitiesis cumbersome and inef cient

The possible inferiority of the LPM in capturing nonlinear-ities in nonsaturated models is also important for interpolationand extrapolation to points not in the observed data Gettingthe nonlinearities right in the observed data is important ininterpolation and extrapolation if the true model is nonlinearas the binary choice model necessarily is (since y is boundedby 0 and 1) Although Angrist at one point does mentionpredictionmdashstating that the linearized models he proposes area good ldquojumping-off point for any prediction exerciserdquo a state-ment without supportmdashhe is for the most part not interestedin prediction so much as summarizing the observed data Thisis ne but unfortunately prediction outside the observed his-torical data is the pervasive concern of the policy makers whoare the ultimate consumers of applied economic research forthey must make predictions of new policies in new environ-ments on a daily basis as part of their jobs The methods pro-posed in Angristrsquos article are therefore not very useful for theformulation of new policy

It should be noted that some make the argument that theoryand economic models must play a role in guiding the formu-lation of empirical relationships used for prediction outsidethe range of the observed data That position is not taken herebecause it is not necessary to establish the potential superior-ity of the L IM over the LPM in nonsaturated models and forprediction Occamrsquos razor makes it unnecessary

It should also be noted that these issues have nothing todo with causal effects because d is assumed exogenous Thusthe opposition that Angrist poses between L IMrsquos and ldquocausaleffectsrdquo models is a false one at least in general If a ldquocausaleffectrdquo is de ned as the true effect of a variable d on onlythe rst moment of a variable y (a rather restricted de nitionof causal effect) that causal effect can be derived from thelatent-variable model just as well The issue is instead merelynonlinearity of the function E4ymdashd1 x5

Angrist also states that the L IM requires ldquoconstant coef- cientrdquo and ldquodistributional assumptionsrdquo for identi cation

22 Journal of Business amp Economic Statistics January 2001

unlike the LPM Angrist does not make suf cient quali ca-tions to these assertions and they are indeed false for theL IM A random coef cient sbquo in the LIM is permissible with-out distributional assumption ( Ichimura and Thompson 1998)and the coef cients in the standard constant-coef cient L IMare identi ed semiparametrically (ie under unknown distri-bution of hellip) under the restriction that d and hellip are independentan assumption generally made for causal interpretation of theLPM model as well (Manski 1988 Horowitz 1993 Powell1994) In addition as already noted in a fully saturated modelthe LPM and the L IM are equivalent in any case

Is the LATE an Interesting Statistic The Angrist assertionthat LATE is the only interesting statistic if d is endogenousor perhaps the only statistic worth bothering about because itis the only one identi ed by the data has the same partialvalidity as his preference for the LPM over the LIM IV isone of the most popular techniques in applied econometricsand has a natural intuition indeed one that does not requirethe LATE interpretation per se It is one of the most usefultools in the applied economistrsquos kit The simple IV estimatordiscussed by Angrist makes minimal assumptions and givesminimal information back to the analyst as a result But tosay that it produces the only statistic of interest does not havedefense both because it is equivalent to an LIM that is fullysaturated and may t the data worse than an LIM if not satu-rated and because it implies that there is no value to makingadditional assumptions to obtain additional information

The limiting nature of the LATE statistic is again in itsinattention to nonlinearities interpolation and extrapolationThat nonlinearities can be important is as true in this caseas it was in the exogenous case just discussed when thereare additional x covariates and when the model is not satu-rated As for interpolation and extrapolation the LATE statis-tic 6 Ny4z D z15ƒ Ny4z D z057=6 Nd4z D z15ƒ Nd4z D z057 denotes theeffect on a change in z from value z0 to z1 on E4y5 scaledby the change in the E4d5 It does not have implications forthe effect on E4y5 of any other change in z or for a changein any other policy variable Thus it is not particularly usefulfor policy changes other than a change of z0 to z1 in the sameenvironment (ie conditioned on the same x) This stands incontrast to an L IM for E4dmdashx1 z5mdashor any parametric modelfor d for that mattermdashthat allows for the change from z0 toz1 to inform policy makers and others of the likely changesof other values of z and of other variables x In fact theLATE statistic which is not a parameter in the usual senseof the word can always be expressed as a nonlinear combi-nation of parameters of a latent-variable model but not viceversa hence the latter model is more general than the former[See Heckman and Vytlacil (1999) for a discussion of how toexpress the LATE in terms of an L IM see also Angrist andImbens (1999) and Heckman (1997 1999) for a discussion ofsome of these issues]

In addition Angristrsquos focus on the advantages of the LATEfor model-free estimation of the effect of a change in z fromz0 to z1 on y reveals a philosophical weakness in his posi-tion This is because the effect of a change in z from z0 toz1 on y can be estimated from the reduced form the struc-tural form is not needed It is not explained why the statistic

6E4ymdashz D z15 ƒ E4ymdashz D z057 is not the only quantity of inter-est The answer that most economists would give is to say thatstructural coef cients are of interest because they can be usedto extrapolate the effects of changes in d on y to other changesin z than from z0 to z1 in the same environment indeedthis is the basic rationale for structural estimation given inthe famous essay by Marschak [1953 see also Christ (1994)and Heckman (2000) for discussions Marschakrsquos argumentwas more subtle than this arguing that restrictions on struc-tural parameters are needed for out-of-sample prediction butthis translates into restrictions on the reduced form as well]However by ruling out the possibility of learning about anyof the effects of changes in z or d other than those inducedby a change in z from z0 to z1 Angrist removes the need todo structural estimation in the rst place Angrist implicitlyassumes that a structural coef cient of interest exists on thevariable d in the y equation and that that coef cient has mean-ing independent of a particular change in z but the rest of hisdiscussion contravenes that interest

A reduced-form research program which is where theAngrist position leads is of considerable value There is noth-ing wrong with a research program to collect a large body ofinformation on the effects of a wide variety of changes in dif-ferent policy variables 4z5 from particular values (z0) to otherparticular values (z1) each taking place in particular environ-ments (x) But if the research program stops there very littleuseful has been learned other than a collection of facts aboutthe effects of particular policies in particular environments

Is the Application in the Angrist Article Interesting TheAngrist article uses a recently popular exclusion restriction ornatural experiment to identify the effect of fertility on laborsupply and to illustrate his preferred methodsmdashnamely theuse of twins The use of twins as an exclusion restrictionstands in contrast to variations in a government policy or lawwhich are often used as exclusion restrictions both in recentwork on natural experiments as well as a much older litera-ture that uses cross-sectional and overtime variation in state-or country-speci c taxes and transfer rules to identify modelparameters However although it can be of interest to estimatereduced-form policy effects 6E4ymdashz D z15 ƒ E4ymdashz D z057 asjust discussed it is not so obvious that there is any interest inestimating the effects of having twins on the expected valueof y Creating twins is not a variable directly subject to policymanipulation

The dif culty in the use of twins arises for the same reasonalready discussedmdashnamely that the preference for model-freeestimation of policy effects leads to a lack of interest in thefunction E4dmdashz1 x5 and hence to a lack of interest in what canbe learned from a study of twins about the effects of someother more relevant policy variable that might be manipulatedA model for the E4dmdashz1 x5 is needed to make that connectionand hence to make a study of twins of any interest yet that isintentionally eschewed in the approach proposed by AngristOnce again this leads to an uninteresting and quite limitingset of exercises

Conclusions The set of methods laid out in the Angristarticle at least if a more relevant instrument were used yieldsa set of model-free statistics that are suitable for exploratorywork on a research question prior to the (possibly nonlinear)

Mullahy Comment 23

structural estimation and prediction that should be the ultimateobject of applied economics research

ACKNOWLEDGMENTS

This comment is a revision of remarks delivered at theJoint Statistical Meetings Baltimore August 1999 I thankCarl Christ James Heckman Joel Horowitz Michael KeaneThomas Mroz Geert Ridder and Edward Vyltacil for com-ments with the usual disclaimer that the views expressed hereshould not be taken as representing those of any of theseindividuals

ADDITIONAL REFERENCES

Angrist J and Imbens G (1999) Comment on ldquo Instrumental VariablesA Study of Implicit Behavioral Assumptions Used in Making Pro-gram Evaluationsrdquo by J J Heckman Journal of Human Resources 34283ndash827

Blundell R and Smith R (1993) ldquoSimultaneous Microeconometric ModelsWith Censored or Qualitative Dependent Variablesrdquo in Handbook of Statis-tics (Vol 1) eds G S Maddala C R Rao and H D Vinod AmsterdamNorth-Holland pp 117ndash141

Christ C (1994) ldquoThe Cowles Commissionrsquos Contributions to Econometricsat Chicago 1939ndash1955rdquo Journal of Economic Literature 32 30ndash59

Heckman J (1997) ldquoInstrumental Variables A Study of Implicit BehavioralAssumptions Used in Making Program Evaluationsrdquo Journal of HumanResources 32 441ndash462

mdashmdashmdash (1999) ldquo Instrumental Variables A Reply to Angrist and ImbensrdquoJournal of Human Resources 34 828ndash837

mdashmdashmdash (2000) ldquoCausal Parameters and Policy Analysis in Economics ATwentieth Century Retrospectiverdquo Quarterly Journal of Economics 11545ndash97

Heckman J and MaCurdy T (1985) ldquoA Simultaneous Equations LinearProbability Modelrdquo International Economic Review 18 21ndash37

Heckman J and Vytlacil E (1999) ldquoLocal Instrumental Variables andLatent Variable Models for Identifying and Bounding Treatment EffectsrdquoProceedings of the National Academy of Sciences 96 4730ndash4734

Horowitz J (1993) ldquoSemiparametric and Nonparametric Estimation of Quan-tal Response Modelsrdquo in Handbook of Statistics (Vol 11) eds G SMaddala C R Rao and H D Vinod Amsterdam North-Holland pp45ndash72

Ichimura H and Thompson T (1998) ldquoMaximum Likelihood Estimation ofa Binary Choice Model With Random Coef cients of Unknown Distribu-tionrdquo Journal of Econometrics 86 269ndash295

Manski C (1988) ldquoIdenti cation of Binary Choice Modelsrdquo Journal of theAmerican Statistical Association 83 729ndash738

Marschak J (1953) ldquoEconomic Measurements for Policy and Predictionrdquoin Studies in Econometric Method eds W Hood and T KoopmansNew York Wiley pp 1ndash26

Powell J (1994) ldquoEstimation of Semiparametric Modelsrdquo in Handbook ofEconometrics (Vol 4) eds R Engle and D McFadden Amsterdam North-Holland pp 2443ndash2521

CommentJohn Mullahy

Departments of Preventive Medicine and Economics University of Wisconsin Madison WI 53705 andNational Bureau of Economic Research Cambridge MA (jmullahyfacstaffwiscedu)

Articles bearing titles containing phrases like ldquoSimpleStrategies for Empirical Practicerdquo are often wolves masquerad-ing as sheep the methodologies they devise being far fromldquosimplerdquo and far removed from what most humble practition-ers perceive in the realm of ldquoempirical practicerdquo Not so hereJoshua Angrist has written a tight comprehensive article thatis stimulating and important yet also eminently useful

As typi es much of Angristrsquos work the main concern in thisarticle is on how to elicit interesting characterizations of causaleffects from microdata with the particular twist here being afocus on outcomes measured as ldquolimited dependent variablesrdquo(LDVrsquos) The main take-away message I glean from this arti-cle is that applied analysts working on causal effect or struc-tural analyses in LDV contextsmdashtraditionally vexing contextsinsofar as consistent estimation and inference are concernedmdashhave considerable grounds for optimism Angrist lays out andinterprets systematically a set of issues and methods that pro-vide practitioners with a variety of implementable strategiesthat might be brought to bear on such empirical problemsA corollary take-away message is that in some respects ldquothisstuff is not really as hard as wersquove tended to make itrdquo withAngrist demonstrating for instance the potential merits ofsimple linear instrumental variable ( IV) methods for estimat-ing causal effects in a variety of LDV contexts In no eventcan applied analysts escape the requirement of nding theo-retically sound instruments but Angrist exposits compellingly

a broad description of how such instruments might be used toelicit interesting causal inferences in LDV contexts

I have no real quibbles with any of the substance ofAngristrsquos arguments Rather I will devote my commentarymainly to amplifying and expanding several of the themes hedevelops throughout the article

1 FOCUS ON CAUSAL OR PARTIAL EFFECTSVERSUS FOCUS ON CONDITIONAL

EXPECTATIONS FUNCTIONS

It seems fair to suggest that much of applied microecono-metrics is concerned primarily with understanding the signsand magnitudes of quantities like iexclE6Y mdashX1 D7=iexcl4X1 D5 oriexclE6Y mdashX7=iexclX0 Yet much of the actual dirty work in under-taking causal analysis in LDV contexts seems to result fromdecisions to undertake analyses in settings where E6Y mdashX1D7

or E6Y mdashX7 are restricted to be positive without a priori restric-tions on parameter values [thus the tradition of using tobit-class conditional expectations functions (CEFrsquos) two-stepselection models exponential CEFrsquos etc]

As practitioners however we should pause to assesswhether speci cations akin to E6Y mdashX1D7 D exp4Xsbquo C D5 gt

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

24 Journal of Business amp Economic Statistics January 2001

0 are ultimately buying anything important with respect tothe rst-order questions being explored It is only com-mon sense that analysts should spend relatively more energyworking on understanding quantities of primary interest(eg iexclE6Y mdashX1D7=iexcl4X1 D5) and worrying relatively lessabout the formulation and implications of assumptions (egE6Y mdashX1 D7 gt 0) that may have little or no ultimate bearingon the particular questions being addressed This is espe-cially so when the latter effort (restricting via functional formchoice E6Y mdashX1 D7 gt 0) will tend to complicate rather thanfacilitate the former effort (understanding the partial or causaleffects) This is not to suggest that quantities like E6Y mdashX7 andE6Y mdashX1 D7 may not themselves be interesting to estimate (egmodeling conditional mean health care expenditures for use inforecasting future expenditure levels) but whether E6Y mdashX1D7

is parametrically and parsimoniously better approximated byexp4XsbquoC D5 or by XsbquoC D or by some other g4Xsbquo CD5

or h4X1D1 ƒ5 ismdasheven in LDV settingsmdashultimately an empir-ical matter to be assessed by goodness of t (eg via condi-tional moment tests)

I note this important angle somewhat sheepishly sincein both articles of mine that Angrist cites (Mullahy 19971998) there is considerable emphasis on the use of exponen-tial CEF speci cations as enforcers of positive conditionalmean ldquorequirementsrdquo In fact the main motivation for theearlier article was to nd a general approach to obtainingconsistent estimators of structural parameters in parametricsettings less distribution-bound than tobit-class models whenthe key requirement or side constraint is E6Y mdashX1 D7 gt 0 withthe resulting strategy a more or less brute-force method foraccomplishing this in a nonlinear IV setting Angristrsquos arti-cle takes this idea quite a bit further and demonstrates how(and with the addition of a further normalization) the struc-tural parameter estimation approach I discussed can be devel-oped into a model to analyze causal effects when the latter arecharacterized as ldquoproportional treatment effectsrdquo

Whether a proportional as opposed to an additive treatmenteffect is interesting (it may or may not be) remains to be seenin any particular application Wooldridgersquos recent work onaverage causal effects (ACErsquos Wooldridge 1999) would seempertinent in contexts in which the role of the X covariates ismore ldquointrusiverdquo than in the simple additivelinear treatment-effect setup As a general matter the extent to which infer-ences about causal effectsmdashhowever formulatedmdashhinge onthe inclusion of particular covariates as conditioning regres-sors is an interesting and potentially important angle thatAngristrsquos article begins to explore and is one that likely tomerit additional research For the particular issues at handthe very nature of proportional treatment effects brings intofocus explicitly the role of the X rsquos so how their correlationwith the treatment indicators 4D5 in uences inferences aboutthe causal impacts of the latter would seem to be a rst-orderconsideration

2 TWO-PART MODELS

I found Angristrsquos analysis of the interpretation of causaleffects in two-part models to be extremely illuminatingIn a reduced-form setting I argued (Mullahy 1998) that amid

all the concerns about zeros robustness outliers transforma-tion retransformation and such that have typically attendedmodeling efforts involving two-part models it would be sur-prising in a well-structured empirical investigation if therewere not stated or lurking concerns about partial effectsiexclE6Y mdashX7=iexclX andor CEFrsquos E6Y mdashX7 themselves It may be insome cases that such concerns are not articulated and it mayeven be in less well-structured problems that they are not obvi-ous to analysts themselves Yet for instance unless robustlyestimated sbquorsquos from a log-linear part-2 speci cation of a two-part model could inform a rst-order question about a par-tial effect or a CEF of what practical use were they likelyto be My rather modest and rather obvious argument wasjust that the concerns about the iexclE6Y mdashX7=iexclX or the E6Y mdashX7mdashif they are the reason the analysis is being conducted in the rst placemdashshould enjoy rst-order prominence in the estima-tion exercise and that concerns about zeros outliers and suchshould be relegated in some sense to second-order status Insome cases two-part models will serve nicely to address such rst-order concernsmdashat least in reduced-form settingsmdashbut inothers they will not

Angrist appears to have some sympathy with such argu-ments but more importantly he provides a valuable serviceto users of two-part models by unearthing one major limi-tation in the analysis of causal effects The problem is notwith part 1 (ie the logit probit or linear probability com-ponent) because part 1 of the two-part model falls within themain lines of Angristrsquos LDV analysis Rather the complicationarises with part 2 of the two-part model in which the additionalldquoY gt 0rdquo conditioning arises In the standard (reduced-form)two-part model quantities like E6Y mdashX1Y gt 07 are prominentand identi ed readily But in the counterfactual settings inwhich causal effects are manifested wherein the realized Y

arises from self-selection into or out of treatment such selec-tion effects introduce an ambiguity [Eq (12)] and therebyconfound the ability to glean causal effects from part-2 esti-mates (unless as Angrist notes censored regression methodsare used but Angrist also offers some compelling argumentsagainst blind reliance on censored regression approaches) Assuch if estimation of and inference about causal effects arethe primary analytical concerns in data settings with Y para 0and Pr4Y D 05 gt 01 analysts may be well advised to avoidtwo-part modeling strategies and pursue some of the moredirect linear and nonlinear estimation approaches discussedby Angrist in which one-part estimation approachesmdashzerosand allmdashyield estimates of causal effects that will be directlyinterpretable

3 BEYOND CEFrsquoS

One noteworthy feature of the methods Angrist discussesmdashwith the main results attributed to Abadie (1999)mdashare theirapplicability to estimation of causal effects for characteristicsof the conditional distribution rdquo4Y mdashX1D5 beyond just CEFrsquosEstimation of causal or treatment effects for conditional quan-tiles (QTErsquos) and conditional distribution function ordinates(ldquodistilesrdquo) Pr4Y lt cmdashX1D5 is demonstrated to t properlywithin the weighting strategies advanced by Abadie Since

Todd Comment 25

quantile and distile analyses may be more relevant in address-ing particular analytical questions or policy issues than CEFrsquosthe ability to extend the mechanisms of causal-effect esti-mation strategies in such directions should be welcomed byapplied researchers

I might suggest that Angristrsquos claim about an advantage ofquantile causal analysis over distile causal analysis being thatthe latter relies on application-specic values of ordinates (c)is perhaps oversold or underdeveloped In many interestingand policy-relevant applications focus on the application-speci c ordinates is fundamental What factors cause employ-ers to offer more than one health insurance plan to employeesWhat factors cause consumers of alcoholic beverages to drinkmore than two drinks per day What factors cause elderlyindividuals to utilize more than one prescription drug prod-uct What factors cause second earners to work more than17 hours per week Each of these concerns an importantreal-world causal question for which the application-specicordinatemdashwhether for institutional legal physiological orother reasonsmdashis a fundamental ldquopivot pointrdquo for the analysisAnalysis of such distile relationshipsmdashwhether in causal or inreduced-form settingsmdashis a potentially powerful method forinforming decision makers and it seems to me that its mer-its relative to quantile analyses would have to be judged on acase-by-case basis

4 CONCLUSIONS

Practitioners of applied microeconometrics will nd thatthe time invested in a careful reading of Angristrsquos article hassizable returns not merely because of the analytical and con-ceptual insights it offers but ultimately also because it sug-gests a variety of important practical innovations that appliedresearchers can exploit empirically to more clearly understandthe nature of causal relationships in their data Applied microresearchers in areas like health labor development public nance and such work commonly with data wherein (a) out-come measures are limitedmdashoften most importantly becausethey are nonnegative with mass points at 0mdashand (b) the rolesof conditioning covariates (X rsquos) are most conveniently summa-rized via regression methods Angristrsquos article provides appliedresearchers so endowed with a set of powerful tools for elic-iting causal inferences from such data Of course the punchline is familiar and inevitable All bets are off without theo-retically solid instruments But with such instruments in handthe strategies exposited by Angrist offer analysts a rich set ofperspectives on estimation

ADDITIONAL REFERENCE

Wooldridge J M (1999) ldquoEstimating Average Partial Effects Under Con-ditional Moment Independence Assumptionsrdquo mimeo Michigan StateUniversity Dept of Economics

CommentPetra Todd

Department of Economics University of Pennsylvania Philadelphia PA 19104 (petraathonasasupennedu)

This article considers ways of estimating the effect ofbinary endogenous regressors in models with limited depen-dent variables It questions the usefulness of conventionalestimation strategies aimed at recovering structural modelparameters and advocates the use of simple instrumental vari-ables ( IV) estimators as an alternative on the grounds thatthese estimators invoke weaker assumptions and often suf ceto answer questions of interest in empirical studies In theauthorrsquos view the main challenge facing empirical researchersis the problem of identi cation of ldquotreatment effectsrdquo throughIV Here I expand the discussion by considering two ques-tions (1) What are the limitations of the causal model asa paradigm for policy analysis (2) When simple estimatorsare suitable for answering a question of interest what are thetrade-offs that need to be considered in using them

1 WHEN IS THE ldquoCAUSAL MODELldquoA USEFULPARADIGM FOR POLICY ANALYSIS

The article gives the impression that most interesting ques-tions in economics can be answered within the context ofthe ldquocausal modelrdquo [variously attributed to Neyman (1923)Fisher (1935) Cox (1958) Roy (1951) and Rubin (1978)]The causal model is a very general framework that assumesthat there are potential outcome states associated with having

received some treatment and having received no treatment (orpossibly a different treatment) For each individual only oneof the potential outcome states is observed which leads to amissing-data problem in attempting to draw inferences aboutaspects of the treatment effect distribution

The language of potential outcomes is very general soalmost any economic problem could be formulated in theseterms However this does not mean that estimators proposedin the literature for the causal model are useful for all or evenmost economic problems A major limitation of the modelis that it assumes that the state ldquowith treatmentrdquo has beenobserved for at least a subset of people In medical trials orbiological experiments this assumption is probably reason-able but in economics we often are interested in evaluatingeffects of treatments that have not yet been implemented Forexample we might be interested in predicting the effect ofraising the age receiving Social Security bene ts to a new ageor the effect of introducing new term limits on welfare par-ticipation An implicit assumption needed to apply any of theidenti cation results described in this article is that both Y1

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

26 Journal of Business amp Economic Statistics January 2001

and Y0 are observed which would not be satis ed in manycases Thus the causal model and the estimators developedfor it are inadequate when it comes to any economic questioninvolving a state of the world that has not been observed

Despite these limitations treatment effect estimates aresometimes used to predict consequences of new treatments orof existing treatments for new populations When the policychange being considered is very close to changes observedin the data and is to be applied to similar individuals thenextrapolation may be reasonable But if the policy change is(a) of a magnitude outside the range of experience (b) alonga new dimension or (c) intended for a new population then itis an open question as to how reliable the estimates would beTreatment impact estimates are not intended to capture any-thing invariant to changing conditions The assumptions thatwould be required to justify extrapolations or generalizationsare often quite strong To some extent it is an illusion thatthese estimators allow for general kinds of inference underweak assumptions because the assumptions required to justifytheir application for particular purposes are often not madeexplicit

Angrist argues that the identi cation results for the localaverage treatment effect (LATE) estimator provide a ldquofounda-tion for credible causal inferencerdquo (p 5) and a ldquominimum con-troversy jumping-off point for any prediction exerciserdquo (p 5)But as discussed previously the causal model maintains sucha high level of generality that it offers little guidance for manykinds of problems It is important to recognize when and whennot the LATE estimator is likely to be useful This articleclaims that ldquoin practice estimates of LATE differ little fromestimates based on the stronger assumptions invoked to iden-tify effects on the entire treated populationrdquo (p 5) But suchan extrapolation is not justi ed under the theory and is surelyunlikely to hold in all circumstances As shown in a num-ber of works by various authors LATE provides the averageeffect of treatment for the group of ldquocompliersrdquo who are peo-ple induced by the instrument to receive treatment This groupdoes not except under very special circumstances correspondto the group of treated persons and may in fact representonly a small fraction of persons receiving treatment So notonly does LATE not suf ce for answering problems of the sortdescribed by (a) (b) and (c) it also does not generally pro-vide the average effect of treatment on the treatedmdashthe mostcommon parameter of interest in analyzing social programsAnother not-so-attractive feature of the estimator is that thepopulation of ldquocompliersrdquo is usually not identi able so wecannot say exactly whose treatment impact is being estimatedApplying the LATE parameter estimate to the full populationof treated persons would generally require ruling out certaintypes of heterogeneity in individual responses to treatmentas discussed by Heckman (1997) The explicit assumptionsthat could justify this type of extrapolation were shown byHeckman and Vytlacil (2000 in press)

2 ESTIMATORS SHOULD BE TAILOREDTO THEIR APPLICATION

A general theme throughout the article is that wheneverpossible researchers should restrain from imposing paramet-ric restrictions in estimation because the structure they impose

may be wrong resulting in biased estimates This view isexpressed for example in the discussion of the two-partmodel The article advocates the use of a two-part approachinstead of a conventional selection-model approach maintain-ing that the two-part approach is less likely to lead to biasedparameter estimates because it does not impose cross-equationrestrictions as a conventional tobit estimator would

How do researchers decide whether or not to impose restric-tions across estimating equations that are structurally linkedThe most agnostic approach would be to estimate both equa-tions completely nonparametrically but this approach is rarelypractical Nonparametric estimators are consistent under themost general conditions but in conventional size samples thestandard errors would be so large that the estimates would forpractical purposes be useless Researchers impose structurebecause they are willing to restrict the class of models underwhich their estimators are consistent in exchange for greaterprecision These ef ciency considerations would also apply inthe deciding whether or not to impose cross-equation restric-tions in estimating simultaneous-equations models If we areright about the structure there is an ef ciency gain fromimposing the restrictions If we are wrong imposing them maylead to bias When there are overidentifying restrictions it isat least possible to develop tests of the model speci cation thatcan serve as a guide in choosing a structural model that is sup-ported by the data This article presents a dichotomy betweenthe two-part model and a fully parametric traditional selec-tion speci cation but there are many other modeling choicesranging from fully nonparametric to semiparametric to fullyparametric In modest size samples ef ciency is an impor-tant consideration and a parametric model may be the mostappropriate one For very large samples fewer parametricapproaches are feasible Both the probability-of-working andthe hours-worked equations could for example be estimatedby a semiparametric method such as semiparametric leastsquares ( Ichimura 1993) Which estimation method is mostappropriate ultimately depends on how much data is avail-able and how many parameters are to be estimated It doesnot make sense to criticize the use of structured approaches infavor of more exible modeling approaches without regard tothe context in which they are being applied

3 ON THE GAP BETWEEN THEORY ANDEMPIRICAL PRACTICE

Finally another recurring emphasis in the article is on thevalue of shortcuts in empirical practice In the authorrsquos viewthere is a gap between the econometric theory and what isfeasible in practice and shortcuts and approximations help tobridge this gap To this end the article extols for examplethe virtues of ordinary least squares (OLS) as a way of approx-imating any unknown conditional expectation function with-out much concern as to whether a best linear approximation isa good approximation If the model has discrete regressors andis fully saturated it is well known that OLS is equivalent to anonparametric estimator But in other cases a linear approxi-mation could be highly biased and the bias will not go awayas the sample size gets large There are a variety of alterna-tive asymptotically unbiased estimators Along similar lines

Angrist Reply 27

the article suggests that in implementing a ldquoselection-on-observablesrdquo estimator it may be all right simply to ignore theweighting scheme Such claims seem to me fairly outrageousbecause failure to take into account weighting could lead tovery different estimates and would not provide a consistentestimator of the parameter of interest Heckman Ichimura andTodd (1998) discussed alternative estimators that are not dif -cult to implement and take into account the weighting scheme

Finally the appendix presents a variety of shortcuts forobtaining estimates and standard errorsmdashsome of which leadto inconsistent estimates When shortcuts lead to very crudeapproximations or inconsistent estimates I question their use-fulness I do think it unreasonable to expect that estimatesbased on simple IV estimators be accompanied by correctstandard errors that correctly take into account the in uence ofestimated parameters of estimated regressors I see little valuein reporting incorrect standard-errors estimates on the groundsthat it saves time in computation especially when there areseveral works long available in the literature that develop sim-ple ways of solving these kinds of problems (eg see Newey1984) Although there is surely a gap between the estimatorsdeveloped in the theoretic econometric literature and the esti-mators used in the empirical literature the gap is not as greatas this article supposes

ADDITIONAL REFERENCES

Cox D R (1958) The Planning of Experiments New York WileyFisher R A (1935) Design of Experiments London Oliver and BoydHeckman J J (1997) ldquo Instrumental Variables Evidence From Evaluating a

Job Training Programmerdquo Journal of Human Resources 32 441ndash462Heckman J Ichimura H and Todd P (1997) ldquoMatching as an Econometric

Evaluation Estimator Theory and Evidence on its Performance Appliedto the JTPA Program Part I Theory and Methodsrdquo Review of EconomicStudies 64 605ndash654

mdashmdashmdash (1998) ldquoMatching as an Econometric Evaluationrdquo Review of Eco-nomic Studies 65 261ndash294

Heckman J and Vytlacil E (2000) ldquoCausal Parameters Structural Equa-tions Treatment Effects and Randomized Evaluations of Social Programsrdquounpublished manuscript University of Chicago Dept of Economics

mdashmdashmdash (in press) ldquoLocal Instrumental Variablesrdquo in Nonlinear StatisticalInference Essays in Honor of Takeshi Amemiya eds C Hsiao KMorimune and J Powell Cambridge UK Cambridge University Press

Ichimura H (1993) ldquoSemiparametric Least Squares (SLS) and Weighted SLSEstimation of Single Index Modelsrdquo Journal of Econometrics 58 71ndash120

Neyman J (1923) ldquoStatistical Problems in Agricultural Experimentsrdquo Sup-plement to the Journal of the Royal Statistical Society 2 107ndash180

Newey W (1984) ldquoA Method of Moments Interpretation of Sequential Esti-matorsrdquo Economic Letters 14 201ndash206

Roy A (1951) ldquoSome Thoughts on the Distribution of Earningsrdquo OxfordEconomic Papers 3 135ndash146

Rubin D (1978) ldquoBaysian Inference for Causal Effects The Role of Ran-domizationrdquo The Annals of Statistics 6 34ndash58

ReplyJoshua D Angrist

I thank the discussants and the session organizer outgoingJBES editor Jeff Wooldridge who will be missed by associateeditors and JBES authors alike The comments in this sessionoffer a variety of viewpoints some sympathetic others morecritical I am pleased to be part of this stimulating and con-structive exchange and look forward to more interaction ofthis type in the future

Hahn My article starts by suggesting that a focus ontreatment effects simpli es the problem of causal inferencein limited dependent variable (LDV) models with endoge-nous regressors Endogeneityrsquos half-sister is the assumptionof unobserved individual effects since the omitted-variablesmotivation for xed effects in panel data is similar Hahnasks whether a shift in focus from index parameters to aver-age causal effects might also simplify the notoriously dif -cult problem of working with nonlinear models for binarypanel data (eg as in Card and Sullivan 1988) Hahn showsthat here too the causal-effects framework pays off demon-strating that average causal effects are easily estimated in a xed-effects probit model with a binary regressor even thoughthe probit coef cient is not A second noteworthy featureof Hahnrsquos comment is his observation that identi cation inthis framework turns on careful modeling of the relationshipbetween the assignment variable and the xed effects Tradi-tional econometric models are primarily concerned with thestochastic process generating outcomes A shift of attentiontoward modeling the assignment mechanism is consistent with

the notion that observational studies should try to mimic thesort of controlled comparisons generated by experiments

Imbens Imbens begins by endorsing the view that the sub-stantive goals of empirical research are better served by thepotential outcomescausal framework than by structural mod-eling Going one step further Imbens notes that the problemswith ldquomodel- rst econometricsrdquo are not limited to LDVrsquos Inthe same spirit he sensibly wonders (as did Wooldridge 1992)why so much attention has been devoted to estimating theBoxndashCox transformation model which does not actually iden-tify E6Y mdashX7 without a distributional assumption Imbens alsonicely articulates the rationale for looking at the twins exam-ple and notes that this is similar to the rationale for extrap-olating from clinical trials in narrow subpopulations And infact I believe the ldquotwins experimentrdquo is likely to have pre-dictive value in a variety of situations though any extrapo-lation will probably be improved by control for demographiccharacteristics

In spite of (or perhaps because of) our years of discussionand collaboration Imbens and I still do not agree on certainthings He feels strongly that inference with discrete outcomesshould use models and procedures that respect inherent non-

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

28 Journal of Business amp Economic Statistics January 2001

linearities The HIRZ procedure discussed in his comment isan intriguing way to do this and my article discusses othersI remain unconvinced however that nonlinearity is of majorsubstantive importance in empirical work with binary regres-sors The proof of the pudding I suppose would be to showthat H IRZ or some of the methods I discuss in the article leadto better predictions or decisions than the linear workhorse

Mof tt Mof t provides a valuable counterpointHe beginsby arguing that latent-index models have more predictivepower than reduced-form causal models Such claims seemhard to establish a priori though I do not doubt that in somecases this may be true But empirical strategies in which thebulk of the research effort goes into estimating index coef -cients may fail to answer the most basic question in empiricalwork What happened here Causal models and local aver-age treatment effect (LATE) answer this question while indexcoef cients alone are not suf cient for causal inference Iagree however that structural modeling can be useful forgoing beyond basic questions and I have used index and otherstructural models for this purpose Mof t is right howeverin suggesting that I nd the model-free analysis he refers toas ldquoexploratoryrdquo more interesting than structural extrapolationThis seems to me to be where the potential for real discoverylies

Mof tt asks tough but fair questions about the twins appli-cation He is correct to point out that twinning is not a manip-ulable policy instrument That is one reason I do not limit theanalysis to reduced-form effects of multiple births Yet to mymind the best evidence we have on the controversial questionof the consequences of out-of-wedlock childbearing comesfrom the twins experiment (Bronars and Grogger 1994) Andthe economic consequences of childbearing are of interest inmany other contexts As noted in the introduction to my arti-cle the twins experiment can help determine whether fertilityreductions are responsible for trend increases in female laborsupply Beyond this sort of question the twins results seemuseful for assessing the economic consequences of increasedabortion access in the United States the dissemination of con-traceptive implants in developing countries (eg see KaneFarr and Janowitz 1990) and various programs to encour-age or require implant use by welfare mothers (Haveman andWolfe 1993)

Finally it is fair to question the external validity of twinsestimates as Mof t does toward the end of his commentAngrist and Evans (1998) showed that the labor-supply con-sequences of twinning can be reconciled with estimates ofthe labor-supply consequences of childbearing resulting froma very different natural experiment induced by preferences formixed-sex sibling pairs This suggests that results from bothexperiments may be quite robust and may even be ldquostructuralrdquoin the sense that they tell us something general about the eco-nomic response to childbearing

Mullahy Mullahy brings a welcome health economistrsquosperspective to the discussion Health economists have longgrappled with the issues discussed in my article and asI hope I made clear I drew ideas and inspiration fromMullahyrsquos work on models with nonnegative dependent vari-ables Mullahyrsquos comment further highlights exactly the sort

of utilitarian trade-off that underlies the choice of empiricalstrategy On one hand we would like our models to be con-sistent with any functional form restrictions known to be trueOn the other attempts at this may complicate estimation andinference for little payoff Worst of all the technical complex-ity of nonlinear models seems to cause authors to ldquowax econo-metricrdquo an effort that may come at the expense of attention tosubstantive issues of importance (an example here is the longseries of papers discussing exactly how data from the RANDhealth-insurance experiment should be analyzed) Mullahyrsquosheretical discussions of ldquoone-part modelsrdquo here and in earlierwork represent a refreshing effort to keep the empirical trainon track Mullahyrsquos point that application-specic distributionordinates can provide more useful summary information thanquantiles is also clearly worth thinking about

Todd Todd provides a useful and clear overview of thecausal framework which she suggests is better suited formedical questions than economic questions The thrust of herargument is that there is too much heterogeneity in economicinterventions and outcomes for the causal approach to bevery useful for economists But my perception is that medicalresearch is even more explicitly concerned with heterogene-ity than economics This is evidenced by the many quali ca-tions regarding patient characteristics conditions and inter-actions in medical treatment protocols A difference betweenthe prevailing medical research paradigm and the traditionaleconometric perspective however is whether the question ofheterogeneous response is to be addressed by better theoryor more evidence Of course theory helps us decide whereto look for heterogeneity but as Rosenbaum (1999 p 301)put it ldquo If a treatment has substantially different effects in dif-ferent types of people then we need to discover this (ital-ics mine) Rosenbaum also noted that ldquorandomized clinicaltrials routinely look for heterogeneous treatment effects andthey do this without representative samples from nationalpopulationsrdquo

On the technical side Todd suggests I advocated use ofthe two-part model because it exibly approximates nonlinearfunctional forms for nonnegative outcomes Although I notedthe exibility the upshot of my discussion was that the two-part model is poorly suited to causal inference because of thedif culty of interpreting part 2 Finally Todd points out thefailure to allow for sampling variance in the scaling factorsused to generate effects for nonlinear models This is indeed ahard-to-defend shortcut though I shall take a stab at defenseby noting that randomness of the scaling factor which typi-cally estimates a sample average is likely to be a minor sourceof sampling uncertainty

ADDITIONAL REFERENCES

Card D E and Sullivan D (1988) ldquoEffects of Subsidized Training onMovements In and Out of Employmentrdquo Econometrica 56 497ndash530

Haveman R and Wolfe B (1993) ldquoChildrenrsquos Prospects and ChildrenrsquosPolicyrdquo Journal of Economic Perspectives 7 153ndash174

Kane T Farr G and Janowitz B (1990) ldquo Initial Acceptability of Con-traceptive Implants in Four Developing Countriesrdquo International FamilyPlanning Perspectives 16 49ndash54

Rosenbaum P R (1999) ldquoChoice as an Alternative to Control in Observa-tional Studiesrdquo Statistical Science 14 259ndash304

Wooldridge J (1992) ldquoSome Alternatives to the BoxndashCox RegressionModelrdquo International Economic Review 33 935ndash955

Page 5: New Estimation of Limited Dependent Variable Models With Dummy …piketty.pse.ens.fr/…ecoineg/articl/Angrist2001b.pdf · 2011. 12. 5. · sequences of changes in contraceptive technology

6 Journal of Business amp Economic Statistics January 2001

This decomposition describes how much of the overalltreatment-control difference is due to participation effects (iethe impact on 16Yi gt 07) and how much is due to an increasein intensity for those with Yi gt 0 For a recent example ofthis distinction see Evans Farrelly and Montgomery (1999)who analyzed the impact of workplace smoking restrictionson smoking participation and intensity

In an experimental setting the interpretation of the rst partof (9) as giving the causal effect of treatment on participationis straightforward Does the conditional-on-positive differencein the second part also have a straightforward interpretationThe large literature constrasting two-part and sample-selectionmodels for LDVrsquos suggests not (See for example DuanManning Morris and Newhouse 1984 Hay and Olsen 1984Hay Leu and Roher 1987 Leung and Yu 1996 Maddala1985 Manning Duan and Rogers 1987 Mullahy 1998)

To analyze the conditional-on-positive comparison furtherit is useful to write the mean difference by treatment status asfollows (still assuming Y0i and Di are independent)

E6YimdashYi gt 01Di

D 17ƒ E6YimdashYi gt 01 Di

D 07

D E6Y1imdashY1i gt 01Di

D 17ƒ E6Y0imdashY0i gt 01Di

D 07

D E6Y1imdashY1i gt 01Di

D 17ƒ E6Y0imdashY0i gt 01Di

D 17

D E6Y1iƒ Y0i

mdashY1i gt 01DiD 17 (10a)

C 8E6Y0imdashY1i gt 01Di

D 17ƒ E6Y0i gt 01DiD 1790

(10b)

On one hand (10a) suggests that the conditional contrastestimates a potentially interesting effect since this clearlyamounts to a statement about the impact of treatment on thedistribution of potential outcomes (in fact this is somethinglike a comparison of hazard rates) On the other hand from(10b) it is clear that a conditional-on-working comparisondoes not tell us how much of the overall treatment effect isdue to an increase in work among treated workers The prob-lem is that the conditional contrast involves different groups ofpeoplemdashthose with Y1i gt 0 and those with Y0i gt 0 Supposefor example that the treatment effect is a positive constantsay Y1i

D Y0iC Since the second term in (10b) must then be

negative the observed difference E6YimdashYi gt 07ƒ E6Yi

mdashYi gt 07is clearly less than the causal effect on treated workers whichis in the constant-effects model This is the selectivity-biasproblem rst noted by Gronau (1974) and Heckman (1974)

In principle tobit and sample-selection models can be usedto eliminate selectivity bias in conditional-on-positive com-parisons These models depict Yi as the censored observationof an underlying continuously distributed latent variable Sup-pose for example that

YiD 16Yi gt 07Y uuml

i 1 where

Y uumli

D Y uuml01

C 4Y uuml1i

ƒ Y uuml0i5Di

D Y uuml0i

C Di0 (11)

Recent studies with this type of censoring in a female labor-supply model are those of Blundell and Smith (1989) and Lee(1995) both of which include endogeneous regressors Notethat in this context the constant-effects causal model is appliedto the latent variable not the observed outcome

Under a variety of distributional assumptions [eg normal-ity as in Heckman (1974) or weaker assumptions like sym-metry as in Powell (1986a)] the parameter is identi edBut what is the interpretation of in a causal model Onepossible answer is that is the causal effect of Di on Y uuml

i 1

though Y uumli is not observed so this is not usually of intrinsic

interest However a direct calculation using (11) shows that

is also a conditional-on-positive causal effect (for details seethe appendix) In particular

D E6Y1iƒ Y0i

mdashY1i gt 07 if lt 0 (12a)

and

D E6Y1iƒ Y0i

mdashY1i gt 07 if gt 00 (12b)

Thus the censored-regression model does succeed in sepa-rating causal effects from selection effects in conditional-on-positive comparisons [We do not know a priori whether isthe effect in (12a) or (12b) but it seems reasonable to use thesign of the estimated to decide]

Although (11) provides an elegant resolution of theselection-bias dilemma in practice I nd the use of censored-regression models to accomplish this unattractive One prob-lem is conceptual Although a censored-regression modelseems natural for arti cially censored data (eg topcodedvariables in the Current Population Survey) the notion of alatent labor-supply equation that can take on negative values isless clear cut The censoring in this case comes about becausesome people choose to work zero hours and not because ofmeasurement problems [Maddala (1985) made a similar com-ment regarding Tobinrsquos original application] Here an under-lying structural model seems essential to the interpretationof empirical results For example in the labor-supply modelfrom Section 1 the censored latent variable is the differencebetween unobserved offered wages and marginal rates of sub-stitution But even assuming the effect of regressors on thisdifference is of interest the latent index coef cients alone haveno predictive value for observable quantities

Second even if we adopt a theoretical framework thatmakes the latent structure meaningful identi cation of acensored-regression model requires assumption beyond thoseneeded for indenti cation of unconditional causal effects ofthe type described by (8) Semiparametric estimators that donot rely on distributional assumptions fail here because theregressor is discrete and there are no exclusion restrictions onthe selection equation (see Chamberlain 1986) Moreover inaddition to requiring distributional assumptions identi cationof in (12) turns heavily on the additive constant-effectsmodel in (11) This is because E6Y1i

ƒY0imdashY1i gt 07 involves the

joint distribution of Y1 and Y0 The constant-effects assump-tion nails this joint distribution down but the data containinformation on marginal distributions only (which is why evenrandomized trials fail to answer the causal question that moti-vates samples-selection models)

These concerns echoed by many applied researchers stim-ulated investigation of alternative strategies for the analysisof nonnegative outcomes (eg see Mof trsquos 1999 survey)

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 7

The two-part model (2PM) introduced by Cragg (1971) andwidely used in health economics seems to provide a lessdemanding framework for the analysis of LDVrsquos than sample-selection models The two parts of the 2PM are P6Yi gt 0mdashDi7

and E6YimdashYi gt 01 Di7 Researchers using this model simply

pick a functional form for each part For example probit or alinear probability model might be used for the rst part anda linear or log-linear model might be used for the secondpart Log-linearity for the second part may be desirable sincethis imposed nonnegativity of tted values (eg see Mullahy1998)

One attraction of the 2PM is that it ts a nonlinear func-tional form to the conditional expectation function (CEF) forLDVrsquos even if both parts are linear On the other hand tobit-type sample-selection models t a nonlinear CEF as well pro-vided there are covariates other than Di For example the CEFimplied by (11) with latent-index Y uuml

0iD X

0

i Œ C Di C ˜i and aNormal homoscedastic error is

E6YimdashXi1Di7 D ecirc64X

0

iΠC Di5=lsquo76X0

i ΠC Di7

C lsquo64X0

i ΠC Di5=lsquo71 (13)

where 4iexcl5 and ecirc4iexcl5 are the standard Normal density anddistribution functions and lsquo is the standard deviation of ˜i

(eg see McDonald and Mof t 1980) The derivative of thisCEF with respect to Di is ecirc64X

0

iΠC Di5=lsquo7 The nonlinearity of (13) notwithstanding at rst blush the

2PM seems to provide a more exible nonlinear speci cationthan tobit or other sample-selection models The latter imposesrestrictions tied to the latent-index structure while the twoparts of the 2PM can be speci ed in whatever form seemsconvenient and ts the data well (a point made by Lin andSchmidt 1984) A signal feature of the 2PM however andthe main point of contrast with sample-selection models isthat the 2PM does not attempt to solve the sample-selectionproblem in (10b) Thus the second part of the 2PM does nothave a clear-cut causal interpretation even if Di is randomlyassigned Similarly and of particular relevance here is thepoint that instrumental variables that are valid for estimatingthe effect of Di on Yi are not valid for estimating the effect ofDi on Yi conditional on Yi gt 0 The 2PM therefore seems illsuited for causal inference

23 Effects on Distributions

Conditional-on-positive effects and sample-selection modelsare sometimes motivated by interest in the consequence of Di

beyond the impact on average outcomes [eg see EichnerMcClellan and Wisersquos (1997) analysis of insurance effects onhealth expenditure] Are there schemes for estimating effectson distributions that are less demanding than sample-selectionmodels Once the basic problem of identifying causal effectsis resolved the impact of Di on the distribution of outcomesis identi ed and can be easily estimated

To see this for the experimental (exogenous Di) case notethat given the assumed independence of Di of Y0i the follow-

ing relationship holds for any point c in the support of Yi

E614Yi micro c5mdashDiD 17 ƒ E614Yi micro c5mdashDi

D 07

D P6Y1i micro cmdashDiD 17 ƒ P6Y0i micro cmdashDi

D 170

In fact the entire marginal distributions of Y1i and Y0i are iden-ti ed for those with Di

D 1 So it is easy to check whether Di

has an impact on the probability that YiD 0 as in the rst part

of the 2PM or whether there is a change in the distribution ofoutcomes at any positive value This information is enough tomake social-welfare comparisons as long as the comparisonsof interest involve marginal distributions only

24 Covariates and Nonlinearity

The conditional expectation of Yi given Di is inherentlylinear as are other conditional relationships involving Di

alone Suppose however that identi cation is based on aldquoselection-on-observablesrdquo assumption instead of presumedrandom assignment This means that causal inference is basedon the presumption that Y0i

q DimdashXi1 and causal effects must

be estimated after conditioning on Xi For example the effectof treatment on the treated can be expressed as

E6Y1iƒ Y0i

mdashDiD 17

D E8E6Y1imdashXi1 Di

D 17 ƒ E6Y0imdashXi1Di

D 17mdashDiD 19

DZ

8E6Y1imdashXi1Di

D 17ƒ E6Y0imdashXi1Di

D 079

P4XiD xmdashD D 15dx1 (14)

where P4X D xmdashD D 15 is a distribution of density Estima-tion using the sample analog of (14) is straightforward if Xi

has discrete support with many observations per cell (eg seeAngrist 1998) Otherwise some sort of smoothing (model-ing) is required to estimate the CEFrsquos E6Y1i

mdashXi1 DiD 17 and

E6Y0imdashXi1 Di

D 07Regression provides a exible and computationally attrac-

tive smoothing device A conceptual justi cation for regres-sion smoothing is that population regression coef cientsprovide the best (minimum mean squared error) linear approx-imation to E6Yi

mdashXi1Di7 (eg see Goldberger 1991) Thisldquoapproximation propertyrdquo holds regardless of the distributionof Yi

Separate regressions can be used to approximateE6Y1i

mdashXi1DiD 17 and E6Y0i

mdashXi1DiD 17 though this leaves

the problem of estimating P4XiD xmdashD D 15 to compute the

average difference in CEFrsquos using (14) On the other handa simple additive modelmdashsay E6Yi

mdashXi1 Di7 D X0

i sbquorC r Dimdash

sometimes works well in the sense that r mdashthe ldquoregressionestimandrdquomdashis close to average effects derived from modelsthat allow for nonlinearity and interactions between Di andXi With discrete covariates and a saturated model for Xi theadditive model can be thought of as implicitly producing aweighted average of covariate-speci c contrasts Although theregression weighting scheme differs from that in (14) in prac-tice treatment-effect heterogeneity may be limited enough thatalternative weighting schemes have little impact on the over-all estimate (see Angrist and Krueger 1999 for more on thispoint)

8 Journal of Business amp Economic Statistics January 2001

3 ENDOGENOUS REGRESSORSTRADITIONAL SOLUTIONS

LDV models with endogenous regressors were rst esti-mated using distributional assumptions and maximum likeli-hood (ML) An early and in uential work in this mold isthat of Heckman (1978) This approach is not wedded toML Heckman (1978) Amemiya (1978 1979) Newey (19861987) and Blundell and Smith (1989) discussed two-stepprocedures minimum-distance estimators generalized leastsquares (GLS) estimators and other variations on the tradi-tional ML framework Semiparametric estimators based onweaker distributional assumptions were discussed by amongothers Newey (1985) and Lee (1996)

The basic idea behind these strategies can be described asfollows Take the censored-regression model from (11) andadd a latent rst stage with instrumental variables Zi So thecomplete model is

YiD 16Yi gt 07Y uuml

i

Y uumli

D Y uuml0i

C 4Y1iƒ Y uuml

0i5DiD Y uuml

0iC Di D Œ C Di C ˜i

DiD 16ƒ0 C ƒ1Zi gt Daggeri70 (15)

The principal identifying assumption here is

4Y0i1 Daggeri5 q Zi0 (16)

Parametric schemes use two-step estimators or ML to esti-mate Semiparametric estimators typically work by substi-tuting an estimated conditional expectation sup1E6Di

mdashZi7 for Di

and then using a nonparametric or semiparametric procedureto estimate the pseueo reduced-form [eg Manskirsquos (1975)maximum score estimator for binary outcomes]

The problems I see with this approach are the same as listedfor censored regression models with exogenous regressorsFirst latent index coef cients are not causal effects If the out-come is binary semiparametric methods estimate scaled indexcoef cients and not average causal effects Similarly censoredregression parameters alone are not enough to determine thecausal effect of Di on the observed Yi [ I should note that thiscriticism does not apply to parametric estimators where dis-tributional assumptions can be used to recover causal effectsor to a recently developed semiparametric method by Blundelland Powell (1999) for continuous endogenous variables] Sec-ond this approach turns heavily on the latent-index setup Wecan add to these points the fact that even weak distributionalassumptions like conditional symmetry fail for the reduced-form error term 4Di

ƒ sup1E6DimdashZi75 ƒ ˜i since Di is binary (a

point made by Lee 1996)A nal observation is that given assumption (16) this

whole setup is unnecessary for causal inference The effectof treatment on the observed Yi is identi ed for those womenwhose childbearing behavior is affected by the instrument Thetwins instrument for example identi es the effect of Di onmothers who would not have had a third child without a mul-tiple second birth (see result 4 in the earlier lemma) It maybe of interest to extrapolate from this grouprsquos experiences tothose of other women but the extrapolation problem is distinctfrom the problem of identifying the causal effect of childbear-ing in the ldquotwins experimentrdquo

4 NEW ECONOMETRIC METHODS

Conditional moments and other probability statementsinvolving Di alone are necessarily linear but causal relation-ships involving covariates are likely to be nonlinear unless thecovariates are discrete and the model is saturated LDV mod-els like probit and tobit are often used because of a concernthat unless the model is saturated LDVrsquos lead to nonlinearCEFrsquos The 2PM is sometimes also motivated this way (egsee Duan et al 1984)

The headaches induced by nonlinearity notwithstandingthere are simple schemes for estimating causal effects in LDVmodels with endogenous regressors and covariates In thissection I discuss three strategies for estimating effects onmeans and two for estimating effects on distributions All butthe rst are based on new models and methods None are tiedto an underlying structural model

The simplest option for estimating effects on means isundoubtedly to ldquopuntrdquo by using a linear constant-effectsmodel to describe the relationship of interest

E6Y0imdashXi7 D X 0

isbquo (17a)

and

Y1iD Y0i

C 0 (17b)

The assumptions lead a linear causal model

YiD X 0

isbquo C DiC ˜i1 (18)

easily estimated by 2SLSAlthough the constant-effects assumption is clearly unreal-

istic in practice more general estimation strategies often leadto similar average effects A second issue that arises in thissetting is that because Di is binary a nonlinear rst-stage suchas probit or logit may seem appropriate for 2SLS estimationof (18) But the resulting second-stage estimates are inconsis-tent unless the model for the rst-stage CEF is actually cor-rect On the other hand conventional 2SLS estimates usinga linear probability model are consistent whether or not the rst-stage CEF is linear So it is generally safer to use a linear rst-stage Alternatively consistent estimates can be obtainedby using a linear or nonlinear estimate of E6Di

mdashXi1 Zi7 asan instrument (This is the same as the plug-in- tted-valuesmethod when the rst-stage is linear) See Kelejian (1971) orHeckman (1978 pp 946ndash947) for a discussion of this pointand additional references ( It is also worth noting that a probit rst-stage cannot even be estimated for the twins instrumentbecause P6Di

D 1mdashXi1 ZiD 17 D 1 for twins)

41 IV for an Exponential Conditional Mean

A linear model like (18) is obviously unrealistic for binaryoutcomes and fails to incorporate natural restrictions on theCEF for nonnegative LDVrsquos This motivated Mullahy (1997)to estimate causal effects on nonnegative LDVrsquos using a multi-plicative model similar to that used by Wooldridge (1999) forpanel data The Mullahy (1997) model can be written in mynotation as follows Let Xi be a vector of observed covariates

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 9

as before and let mdashi be an unobserved covariate correlatedwith Di and Y0i The fact that this covariate is unobserved isthe reason we need to instrument

Let Zi be a candidate instrument Conditional on observedand unobserved covariates both treatment status and theinstrument are assumed to be independent of potential out-comes

Y0iq 4Di1Zi5 mdash Xi1mdashi (19a)

Moreover conditional on observed covariates the candidateinstrument is independent of the unobserved covariate

Ziq mdashi

mdash Xi1 (19b)

though mdashi and Di are not conditionally independent The CEFfor Y0i is constrained to be nonnegative using an exponentialmodel and (19a)

E6Y0imdashDi1 Zi1 Xi1mdashi7 D exp4X 0

isbquo C mdashi5

D mdashuumli exp4X 0

isbquo51 (19c)

where we also assume that the unobservable covariate hasbeen de ned so that E6mdash uuml

imdashXi7 D 1 (this is a normalization

because we can de ne mdash D ƒ16ldquo ƒ ln4E6eldquo mdashX757 where ldquo isunrestricted)

Finally the conditional-on-X-and-mdash average treatmenteffect is assumed to be proportionally constant again using anexponential model that ensures nonnegative tted values

E6Y1imdashDi1Zi1Xi1mdashi7 D e E6Y0i

mdashDi1Zi1Xi1mdashi7

D e E6Y0imdashXi1 mdashi70 (19d)

Combining (19c) and (19d) we can write YiD exp4X 0

isbquo CDi

C mdashi5 C ˜i where E6˜imdashDi1 Zi1 Xi1 mdashi7 sup2 0 These

assumptions imply

E8exp4ƒX 0isbquo ƒ Di5Yi

ƒ 1mdashXi1Zi9 D 01 (20)

so (20) can be used for estimation provided Zi has an impacton Di The proportional average treatment effect in this modelis e ƒ 1 or approximately for small values of

Estimation based on (20) guarantees nonnegative ttedvalues without dropping zeros as a traditional log-linearregression model would The price for this is a constant-proportional-effects setup and the need for nonlinear estima-tion It is interesting to note however that with a binaryinstrument and no covariates (20) generates a simple closed-form solution for In the appendix I show that with-out covariates the proportional treatment effect in Mullahyrsquosmodel can be written

e ƒ1D E6YimdashZi

D17ƒE6YimdashZi

D07

ƒ8E641ƒDi5YimdashZi

D17ƒE641ƒDi5YimdashZi

D0790

(21)

In light of this simpli cation it seems worth asking if theright side of (21) has an interpretation that is not tied tothe constant-proportional-effects model To develop this inter-pretation let D0i and D1i denote potential treatment assign-ments indexed against the binary instrument For example an

assignment mechanism such as (15) determines D0i as fol-lows D0i

D 16ƒ0 gt Daggeri7 and D1iD 16ƒ0 C ƒ1 gt Daggeri7 Using this

notation Imbens and Angrist (1994) showed that

E6YimdashZi

D 17 ƒ E6YimdashZi

D 07 D E6Y1iƒ Y0i

mdashD1i gt D0i7

8E6DimdashZi

D 17ƒ E6DimdashZi

D 0790 (22)

The term E6Y1iƒ Y0i

mdashD1i gt D0i7 is the LATE parameter men-tioned in Lemma 1

The same argument used to establish (22) can also be usedto show a similar result for the average of Y0i [ie insteadof the average of Y1i

ƒ Y0i see Abadie (2000a) for details] Inparticular

E641 ƒ Di5YimdashZi

D 17 ƒ E641 ƒ Di5YimdashZi

D 07

D ƒE6Y0imdashD1i gt D0i7

8E6DimdashZi

D 17 ƒ E6DimdashZi

D 0790 (23)

Substituting (22) and (23) for the numerator and denominatorin (21) we have

e ƒ 1 D E6Y1iƒ Y0i

mdashD1i gt D0i7=E6Y0imdashD1i gt D0i70 (24)

Thus Mullahyrsquos procedure estimates a proportional LATEparameter in models with no covariates The resulting esti-mates therefore have a causal interpretation under muchweaker assumptions than (19a)ndash(19d) Moreover the exponen-tial model used for covariates in (19c) seems natural for non-negative dependent variables and has a semiparametric avorsimilar to proportional hazard models for duration data

42 Approximating Causal Models

Suppose that the additive constant-effects assump-tions (17a) and (17b) do not hold but we estimate (18) by2SLS anyway It seems reasonable to imagine that the result-ing 2SLS estimates can be interpreted as providing some sortof ldquobest linear approximationrdquo to an underlying nonlinearcausal relationship just as regression provides the best linearpredictor (BLP) for any CEF Perhaps surprisingly however2SLS does not provide this sort of linear approximation ingeneral On the other hand in a recent article Abadie (2000b)introduced a Causal-IV estimator that does have this property

Causal-IV is based on the assumptions used by Imbens andAngrist (1994) to estimate average treatment effects Underthese assumptions it can be shown that treatment is indepen-dent of potential outcomes conditional on being in the groupwhose treatment status is affected by the instrument (ie thosewith D1i gt D0i the group of ldquocompliersrdquo mentioned earlier)This independence can be expressed as

Y0i1 Y1iq Di

mdashXi1D1i gt D0i0 (25)

A consequence of (25) is that for compliers comparisons bytreatment status have a causal interpretation

E6YimdashXi1Di

D 11D1i gt D0i7 ƒ E6YimdashXi1Di

D 01 D1i gt D0i7

D E6Y1iƒ Y0i

mdashXi1D1i gt D0i70

10 Journal of Business amp Economic Statistics January 2001

For this reason Abadie (2000b) called E6YimdashXi1 Di1D1i gt D0i7

the Complier Causal Response Function (CCRF)Now consider choosing parameters b and a to minimize

E64E6YimdashXi1Di1 D1i gt D0i7ƒX 0

i b ƒ aDi52mdashD1i gt D07 or equiv-

alently E64YiƒX 0

ib ƒ aDi52mdashD1i gt D07 This choice of b and a

provides the minimum mean squared error (MMSE) approx-imation to the CCRF Since the set of compliers is not iden-ti ed this minimization problem is not feasible as writtenHowever it can be shown that

E6Ši4E6YimdashXi1 Di1D1i gt D0i7 ƒ X 0

ib ƒ aDi527=P6D1i gt D0i7

D E64E6YimdashXi1Di1 D1i gt D0i7ƒ X 0

ib ƒ aDi52mdashD1i gt D071

where ŠiD 1 ƒ Di41 ƒ Zi5=41 ƒ E6Zi

mdashXi75 ƒ 41 ƒ Di5 šZi=E6Zi

mdashXi7 Since Ši can be estimated the MMSE linearapproximation to the CCRF can also be estimated The resultis a weighted least squares estimation problem with weightsgiven by the estimated Ši

It is also worth noting that although the preceding discus-sion focuses on linear approximation of the CCRF any func-tion can be used for the approximation For binary outcomesfor example we might use ecirc6X 0

ib C aDi7 and choose param-eters to minimize E6Ši4Yi

ƒ ecirc6X 0ib ƒ aDi75

27 Similarly fornonnegative outcomes it seems sensible to use an exponentialmodel exp6X 0

ib C aDi7 and choose parameters to minimizeE6Ši4Yi

ƒexp6Xi0b ƒaDi75

27 Abadiersquos framework allows ex-ible approximation of the CCRF using any functional formthe researcher nds appealing and convenient The resultingestimates have a robust causal interpretation regardless of theshape of the actual CEF for potential outcomes

43 Distribution and Quantile Treatment Effects

If Yi has a mass point at 0 the conditional mean pro-vides an incomplete picture of the causal impact of Di on Yi We might like to know for example how much of the aver-age effect is due to changes in participation and how muchinvolves changes elsewhere in the distribution This sometimesmotivates separate analyses of participation and conditional-on-positive effects In Section 23 I argued that questionsregarding the effect of treatment on the distribution of out-comes are better addressed by comparing distributions Simplycomparing distributions is ne for the analysis of experimentaldata but what if covariates are involved As with the analy-sis of mean outcomes the simplest strategy is 2SLS in thiscase using linear probability models for distribution ordinates16Yi micro c7 D Xi

0sbquocC cDi

C ˜ci Of course the linear model isnot literally correct for conditional distribution except in spe-cial cases (eg a saturated regression parameterization)

Here too the Abadie (2000b) weighting scheme can be usedto generate estimates that provide an MMSE error approxima-tion to the underlying distribution function (see Imbens andRubin 1997 for a related approach to this problem) The esti-mator in this case chooses bc and ac to minimize the sampleanalog of the following population minimand

E6Ši416Yi micro c7 ƒ X 0ibc

ƒ acDi5270 (26)

The resulting estimates provide the BLP for P6Yi microcmdashXi1Di1D1i gt D0i7 The latter quantity has a causal interpre-tation because

P6Yi micro cmdashXi1DiD 11D1i gt D0i7

ƒ P6Yi micro cmdashXi1 DiD 01 D1i gt D0i7

D P6Y1i micro cmdashXi1D1i gt D0i7

ƒ P6Y0i micro cmdashXi1D1i gt D0i70

Since the outcome is binary nonlinear models such asprobit or logit might also be used to approximate P6Yi microcmdashXi1Di1D1i gt D0i7 Likewise it is equally straightforward touse Abadiersquos weighting scheme to approximate the probabilitythat the outcome falls into an interval instead of the cumula-tive distribution function

An alternative to estimation based on (26) postulates a linearmodel for quantiles instead of distribution ordinates Conven-tional quantile regression (QR) models for exogenous regres-sors begin with a linear speci cation Qˆ6Yi

mdashXi1Di7 D X 0iŒˆ0

CŒˆ1Di The parameters 4Œˆ01Œˆ15 can be shown to minimizeE6ˆ4Yi

ƒ X 0im0 ƒ m1Di057 where ˆ4ui5 D ˆuC

iC 41ƒˆ5uƒ

i iscalled the ldquocheck functionrdquo (see Koenker and Basset 1978)This minimization is computationally straightforward since itcan be written as a linear programming problem

The analysis of quantiles has two advantages Firstquantiles like the median quartiles and deciles providebenchmarks that can be used to summarize and compareconditional distributions for different outcomes In contrastthe choice of c for the analysis of distribution ordinatesis application-specic Second since nonnegative LDVrsquos areoften (virtually) continuously distributed away from any masspoints linear models are likely to be more accurate for condi-tional quantiles than for conditional probabilities [For quan-tiles close to the censoring point Powellrsquos (1986b) censoredQR model may be more appropriate]

Abadie Angrist and Imbens (1998) developed a QRestimator for models with a binary endogeneous regres-sor Their quantile treatment effects (QTE) procedure beginswith a linear model for conditional quantiles for compliersQˆ6Yi

mdashXi1Di1 D1i gt D0i7 D X 0isbquoˆ

CˆDi The coef cient ˆ hasa causal interpretation because Di is independent of potentialoutcomes conditional on Xi and the event D1i gt D0i There-fore ˆ

D Qˆ6Y1imdashXi1 D1i gt D0i7 ƒ Qˆ6Y0i

mdashXi1 D1i gt D0i7 Inother words ˆ is the difference in ˆ quantiles for compliers

QTE parameters are estimated by minimizing a sampleanalog of the following weighted check-function minimandE6Šiˆ4Yi

ƒ X 0ib ƒ aDi57 As with the Causal-IV estimators

weighting by Ši transforms the conventional QR minimandinto a problem for compliers only For computational rea-sons however it is useful to rewrite this as E6 QŠiˆ4Yi

ƒX 0ib ƒ

aDi057 where QŠiD E6Ši

mdashXi1 Di1 Yi7 It is possible to show thatE6Ši

mdashXi1Di1 Yi7 D P6D1i gt D0imdashXi1Di1 Yi7 gt 0 This modi ed

estimation problem has a linear programming representationsimilar to conventional quantile regression since the weightsare positive Thus QTE estimates can be computed usingexisting QR software though this approach requires rst-step

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 11

estimation of QŠi In the following example I use the fact that

QŠiD E6Ši

mdashXi1 Di1 Yi7 D 1 ƒ Di41ƒ E6ZimdashYi1Di1Xi75

41ƒ E6ZimdashXi75

ƒ 41 ƒ Di5E6ZimdashYi Di1Xi7

E6ZimdashXi7

(27)

and estimate E6ZimdashYi1Di1Xi7 and E6Zi

mdashXi7 with a probit rststep Since QŠi is theoretically supposed to be positive negativeestimates of QŠi are set to 0

5 APPLICATION LABORndashSUPPLYCONSEQUENCES OF A THIRD CHILD

The estimation uses a sample of roughly 250000 marriedwomen aged 21ndash35 with at least two children drawn from the1980 Census 5 le About 53 of the women in this sampleworked in 1979 Overall (ie including zeros) women in thesample worked about 17 hours per week This can be seen inthe rst column of Table 1 which reports descriptive statis-tics and repeats some of the OLS and 2SLS estimates fromAngrist and Evans (1998) Roughly 38 of women in thissample had a third child an event indicated by the variableMorekids The OLS estimates in column (2) show that womenwith Morekids D 1 were about 17 percentage points less likelyto have worked in 1979 and worked about 6 hours fewer perweek than women with Morekids D 0 The covariates in thisregression are age age at rst birth a dummy for male rst-born a dummy for male secondborn and Black Hispanic andother race indicators

Table 1 also reports estimates of average effects computedusing nonlinear models still treating Morekids as exogenousThese average effects are approximations to effects of treat-ment on the treated evaluated using derivatives to simplifycomputations A detailed description of the average-effectscalculations appears in the appendix Probit estimates of theaverage impact on employment shown in column (3) arealmost identical to the OLS estimates Similarly the tobit esti-mate of the average effect of Di on hours worked shown incolumn (4) is ƒ6001 remarkably close to the OLS estimateof ƒ6002 (Note however that the tobit coefcient is ƒ1107)Column (5) of Table 1 reports estimated average effects froma two-part model in which both parts of the model are linear

Table 1 Descriptive Statistics and Baseline Results

Morekids exogenous Morekids endogenous

Nonlinear models Reduced forms2SLS

Dependent Mean OLS Probit Tobit 2PM (Morekids) (Depvar) effectvariable (1) (2) (3) (4) (5) (6) (7) (8)

Employment 0528 ƒ0167 ƒ0166 mdash mdash 0627 ƒ0055 ƒ0088(0499) (0002) (0002) (0003) (0011) (0017)

Hours worked 1607 ƒ6002 mdash ƒ6001 ƒ5097 0627 ƒ2023 ƒ3055(1803) (0074) (0073) (0073) (0003) (0371) (0592)

NOTE The sample includes 254654 observations and is the same as that of Angrist and Evans (1998) The instrument is an indicator for multiple births The mean of the endogenousregressor is 381 The probability of a multiple birth is 008 The model includes as covariates age age at rst birth boy rst boy second and race indicators Standard deviations are shownin parentheses in column 1 Standard errors are shown in parentheses in other columns

The 2PM estimate is virtually identical to the tobit and OLSestimates

Roughly 810 of 1 of women in the extract had a twinsecond birth (multiple births are identi ed in the 1980 Cen-sus using age and quarter of birth) Reduced-form estimates ofthe effect of a twin birth are reported in columns (6) and (7)The reduced forms show that women who had a multiple birthwere 63 percentage points more likely to have had a third childthan women who had a singleton second birth Mothers oftwins were also 55 percentage points less likely to be work-ing (standard error D 001) and worked 22 fewer hours perweek (standard error D 037) The 2SLS estimates derived fromthese reduced forms reported in column 8 show an impactof about ƒ009 (standard error D 002) on employment rates andƒ306 (standard error D 06) on weekly hours These estimatesare just over half as large as the OLS estimates suggestingthat the latter exaggerate the causal effects of childbearingOf course the twins instrument is not perfect and the 2SLSestimates may also be biased For example twinning probabil-ities are slightly higher for certain demographic groups ButAngrist and Evans (1998) found that 2SLS estimates usingtwins instruments are largely insensitive to the inclusion ofcontrols for mothersrsquo personal characteristics

Two variations on linear models generate estimates iden-tical or almost identical to the conventional 2SLS estimatesThis can be seen in columns (2) and (3) of Table 2 whichreport 2SLS estimates of a 2PM for hours worked and Abadie(2000b) Causal-IV estimates of linear models for employmentand hours The Causal-IV estimates use probit to estimateE[Z|X] and plug this into the formula for Ši The second stepin Causal-IV estimation is a weighted least squares problemwith some negative weights Since use of negative weights isnonstandard for statistical packages (eg Stata does not cur-rently allow this) I used a MATLAB program available fromAlberto Abadie to compute the estimates The 2PM estimatesin Table 3 were constructed from 2SLS estimates of a linearprobability model for participation and 2SLS estimates of alinear model for hours worked conditional on working In prin-ciple the 2PM estimates do not have a causal interpretationbecause the instruments are not valid conditional on workingIn practice however 2SLS estimates of the 2PM differ littlefrom conventional 2SLS estimates

The estimates of nonlinear=nonstructural models for meaneffects are mostly similar to each other and to conventional

12 Journal of Business amp Economic Statistics January 2001

Table 2 Impact on Mean Outcomes (Morekids endogenous)

Linear models Nonlinear models Structural models

Causal-IV Causal-IV Causal-IV Bivar Endog Mills 2SLSDependent 2SLS 2PM linear Mullahy probit expon probit tobit ratio benchmarkvariable (1) (2) (3) (4) (5) (6) (7) (8) (9) (10)

A With covariates

Employment ƒ0088 mdash ƒ0089 mdash ƒ0088 mdash ƒ0124 mdash mdash ƒ0089(0017) (0017) (0016) (0016) (0017)

Hours worked ƒ3055 ƒ3054 ƒ3055 ƒ3082 mdash ƒ3021 mdash ƒ3081 ƒ4051 ƒ3060(0592) (0598) (0592) (0598) (0694) (0580) (0549) (0599)

B No covariates

Employment ƒ0084 mdash ƒ0084 mdash ƒ0084 mdash ƒ0086 mdash mdash ƒ0084(0017) (0017) (0017) (0017) (0018)

Hours worked ƒ3047 ƒ3037 ƒ3047 ƒ3010 mdash ƒ3012 mdash ƒ3035 ƒ3048 ƒ3052(0617) (0614) (0617) (0561) (0616) (0642) (0641) (0624)

NOTE Sample and covariates are the same as in Table 1 Results for nonlinear models are derivative-based approximations to effect on the treated Causal-IV estimates are based on aprocedure discussed by Abadie (2000b) Standard errors are shown in parentheses

2SLS estimates Results from nonlinear models are againreported as marginal effects that approximate average effectson the treated For example a causal probit model for employ-ment status generates an average effect of ƒ0088 identical (upto the reported accuracy) to the 2SLS estimate Causal-IV esti-mation of an exponential model for hours worked the resultof a procedure that minimizes E6 QŠi4Yi

ƒ exp6X0

i b ƒ aDi7527

generates an estimate of ƒ3021 This too differs little from theconventional 2SLS estimate of ƒ3055 Similarly the Mullahyestimate of ƒ3082 in column (4) is less than 8 larger thanconventional 2SLS in absolute value It is noteworthy how-ever that the Mullahy model generates results that changemarkedly (falling by about 20) when the covariates aredropped Since the covariates are not highly correlated withthe twins instrument this lack of robustness to the inclusionof covariates seems undesirable On the other hand with-

Table 3 Impact on the Distribution of Hours Worked

Distribution treatment effects

Exogenous Morekids Endogenous Morekids Quantile treatment effects

OLS Ordered Causal CausalRange (LPM) Probit probit 2SLS LPM probit Quantile QR QTE(mean) (1) (2) (3) (4) (5) (6) (value) (7) (8)

0 0167 0166 0147 0088 0089 0088 5 ƒ8092 ƒ5024(472) (0002) (0002) (0002) (0017) (0017) (0016) (8) (0186) (0686)1ndash10 0001 0001 0002 ƒ0001 ƒ0001 ƒ0002 6 ƒ1207 ƒ7098(046) (0001) (0001) (000003) (0007) (0007) (0006) (20) (0172) (0860)11ndash20 ƒ0015 ƒ0014 ƒ0011 0002 0002 0002 7 ƒ9054 ƒ6019(093) (0001) (0001) (00001) (0010) (0010) (0010) (35) (0184) (1007)21ndash30 ƒ0024 ƒ0022 ƒ0014 ƒ0004 ƒ0005 ƒ0006 75 ƒ6045 ƒ3060(075) (0001) (0001) (00002) (0009) (0009) (0010) (40) (0156) (1017)31ndash40 ƒ0119 ƒ0110 ƒ0097 ƒ0072 ƒ0072 ƒ0071 8 ƒ1000 0000(277) (0002) (0002) (0001) (0014) (0014) (0016) (40) (0286) (1009)41+ ƒ0009 ƒ0008 ƒ0023 ƒ0009 ƒ0009 ƒ0006 9 mdash mdash

(027) (0001) (0001) (00003) (0005) (0005) (0007) (40)

NOTE The table reports probability-model and (QTE) estimates of the impact of childbearing on the distribution of hours worked The sample and covariates are the same as in Panel A ofTable 2 Causal-IV estimates are based on a procedure discussed by Abadie (1999) Quantile treatments are based on a procedure discussed by Abadie et al (1998) Standard errors areshown in parentheses The standard errors in columns (7) and (8) are bootstrapped

out covariates the Mullahy estimates are close to those fromCausal-IV with an exponential model

The bivariate probit estimate of the effect of childbear-ing on employment status reported in column 7 is ƒ012roughly a third larger in absolute value than the conventional2SLS estimate Interestingly in another application Abadie(2000b) also found that bivariate probit estimates are largerthan Causal-IV The gap between bivariate probit and Causal-IV estimates of effects on labor supply appears to be a conse-quence of the probit model for exogenous covariates Withoutcovariates bivariate probit generates estimates that are veryclose to the results from the other estimators It should benoted however that bivariate probit is not really appropriatefor twins instruments because the probability Morekids D 1 isequal to 1 for twins The probit ML estimator does not existin this case Therefore to compute all of the estimates using a

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 13

probit rst-stage (bivariate probit endogenous tobit and Millsratio) I randomly recoded 1 of Di observations to 0 Inprinciple measurement error in a binary endogenous regressorbiases 2SLS estimates (see Kane Rouse and Staiger 1999)In this case however column (10) shows that 2SLS estimateswith the randomly recoded data differ little from 2SLS esti-mates using the original data

Column (8) of Table 2 reports estimates of a structuraltobit model with endogeneous regressors These estimateswere computed using a two-step estimator that approximatesthe MLE again with recoded data The two-step procedureadds a Mills-ratio type endogeneity correction to a censoredregression and then applies tobit to the censored regres-sion model with the correction term [The correction term islsquo˜6Di4ƒ i=ecirci5C 41ƒDi5 i=41ƒecirci57 where is the corre-lation between the latent error determining treatment assign-ment and the outcome residual lsquo˜ is the standard deviation ofthe outcome residual and i and ecirci are Normal density anddistribution functions evaluated at the probit rst-stage ttedvalues see Heckman and Robb (1985)]

As with the bivariate probit estimate the endogenous tobitestimate is somewhat larger in magnitude than conventional2SLS This may be because of the Mills-ratio procedure forcontrolling for the endogeneity of Di more than the tobitcorrection for nonnegative outcomes To see this note thatthe Mills-ratio estimate ignoring censoring reported in col-umn (9) is considerably larger than the corresponding conven-tional 2SLS estimates Interestingly both the probit and tobitstructural estimators generate results that are more sensitiveto the inclusion of covariates than any of the other estimatorsexcept Mullahyrsquos In fact Panel B of the table shows thatwithout covariates all estimation techniques give very simi-lar results This is not surprising since without covariates theparametric assumptions used in these models are weaker

The last set of results shows that childbearing is associ-ated with marked changes in the distribution of hours workedbeyond the changes in participation already seen This isapparent in the rst columns of Table 3 which report thedistribution of hours worked by interval along with linearprobability estimates of the relationship between childbearingand the probability of falling into each interval (these mod-els include the same covariates used for Table 2) The largestentry in column (1) is for the probability of working zerohours There is also a large negative effect on the probabilityof working 31ndash40 hours per week which shows that womenwho have a third child are much less likely to work full-time Once again probit average effects reported in column2 are almost indistinguishable from the corresponding OLSestimates Estimates from an ordered probit model reportedin column (3) differ from OLS somewhat more but still gen-erate a very similar pattern

Like the 2SLS estimates for average outcomes 2SLS esti-mates of linear probability models for the probability of fallinginto each interval show that models that treat childbearing asexogenous exaggerate the negative impact on labor supply2SLS estimates for the probability of working zero hours areidentical (by construction) to those for employment in Table 1The 2SLS estimates of the impact of childbearing on fulltimework are also considerably less than the corresponding OLSestimates

Nonstructural Causal-IV models treating Morekids asendogenous generate estimates very close to 2SLS estimatesfor effects on the probabilities of hours falling into each inter-val Columns (5) and (6) show that the results are also remark-ably insensitive to whether a linear or probit model is usedto approximate the distribution function The estimates againindicate that childbearing changes the distribution of hours byraising the probability of nonparticipation and by reducing theprobability of fulltime work

Finally the QTE estimator provides useful summary statis-tics for causal effects of childbearing on changes in the distri-bution of hours worked These estimates were computed as thesolution to a weighted quantile regression problem using (27)to construct weights and the reported standard errors are froma bootstrap Quantile regression estimates treating childbear-ing as exogenous show an estimated 9-hour decline in medianhours worked but the QTE estimator suggests that the causaleffect of childbearing on median hours worked is only about ve hours Estimates at higher quantiles are similarly reducedwhen childbearing is treated as endogenous

6 SUMMARY AND CONCLUSIONS

Structural parameters may be of theoretical interest but mustultimately be converted into causal effects if they are to beof use for policy evaluation or determining whether a trendassociation is causal The problem of estimating causal effectsfor LDVrsquos does not differ fundamentally from the analogousproblem for continuously distributed outcomes The key dif-ferences seem to me to be the increased likelihood of inter-est in distributional outcomes and the inherent nonlinearity ofCEFrsquos for LDVrsquos in models with covariates Without covari-ates conventional 2SLS estimates capture both distributionaleffects and effects on means Simple IV strategies devel-oped by Mullahy (1997) and Abadie (2000b) can be used toestimate average effects in nonlinear models with covariateswhile IV strategies for probability models and quantile regres-sion can be used to estimate effects on distributions

These approaches are illustrated here using twin birthsto estimate the labor-supply consequences of childbearingAlternative nonstructural approaches to the estimation ofcausal effects using twins instruments generate similar aver-age effects whether or not the model is nonlinear Structuralestimates tend to be somewhat larger than nonstructural esti-mates when exogenous covariates are included even thoughthe covariates are not strongly related to the twins instru-ment Since the structural models impose additional distribu-tional and functional form assumptions I see no reason toprefer them

Finally the various IV estimates of the effect of childbear-ing on the distribution of hours worked show that the impactof childbearing is characterized by substantially increased non-participation and by an almost equally large shift away fromfulltime work Estimates that treat childbearing as exogenousexaggerate the causal effect of childbearing on changes in dis-tribution as well as on average hours worked These ndingsappear in results from both probability models and quantilemodels with endogenous regressors Both types of modelsprovide an interesting look at the impact of childbearing on

14 Journal of Business amp Economic Statistics January 2001

the distribution of hours worked without the conceptual prob-lems inherent in conditional-on-positive comparisons

ACKNOWLEDGMENTS

I thank the editor and the session discussants for theircomments Special thanks go to Melissa Schettini for out-standing research assistance and Alberto Abadie for exten-sive comments and shared computer programs Thanks also goto Daron Acemoglu Bill Evans Bill Greene Guido ImbensJohn Mullahy Steve Pischke and seminar participants at theUniversity of Texas for helpful discussions and comments onan earlier draft Data and computer programs used here areavailable on request

APPENDIX

A1 Derivation of Equation (12)

Drop the i subscripts Note that

Y0 D 14Y uuml0 gt 05Y uuml

0

Y1D 14Y uuml

0C gt 054Y uuml

0C 5 D 14Y uuml

1 gt 05Y uuml1 0

Since D is independent of Y0 and Y1D 14Y uuml

0C gt 054Y uuml

0C

51D is independent of Y1 Conditional effects on the treatedare therefore the same as conditional effects without condi-tioning on treatment status

E6Y1ƒ Y0

mdashY1 gt 01 D D 17 D E6Y1ƒ Y0

mdashY1 gt 07

E6Y1ƒ Y0

mdashY0 gt 01 D D 17 D E6Y1ƒ Y0

mdashY0 gt 070

The averages on the right side are the causal effects of interestThe conditional expectations on the right side are evaluated asfollows

E6Y1mdashY1 gt 07 D E6Y uuml

1mdashY uuml

1 gt 07 D E6Y uuml0mdashY uuml

1 gt 07C 1 (A1)

E6Y0mdashY1 gt 07 D E6Y uuml0 14Y uuml

0 gt 05mdashY uuml1 gt 07

D E6Y uuml0

mdashY uuml1 gt 01 Y uuml

0 gt 07P4Y uuml0 gt 0mdashY uuml

1 gt 050

If 4 gt 05 2 Y uuml0 gt 0 ) Y uuml

1 gt 01 this is E6Y uuml0

mdashY uuml0 gt 07

P4Y uuml0 gt 0mdashY uuml

1 gt 050 If 4 lt 05 2 Y uuml1 gt 0 ) Y uuml

0 gt 01 so this is

E6Y uuml0

mdashY uuml1 gt 070 (A2)

So lt 0 ) E6Y1 ƒ Y0mdashY1 gt 07 D Similarly

E6Y1mdashY0 gt 07 D E614Y uuml

0C gt 054Y uuml

0C 5mdashY uuml

0 gt 07

D E6Y uuml0

mdashY uuml1 gt 01 Y uuml

0 gt 07P4Y uuml1 gt 0mdashY uuml

0 gt 05

C P Y uuml1 gt 0mdashY uuml

0 gt 0 0

If 4 gt 051 Y uuml0 gt 0 ) Y uuml

1 gt 01 so

P4Y uuml1 gt 0mdashY uuml

0 gt 05 D 1 (A3)

and E6Y uuml0

mdashY uuml1 gt 01 Y uuml

0 gt 07 D E6Y uuml0mdashY uuml

0 gt 070 Finally

E6Y0mdashY0 gt 07 D E6Y uuml0mdashY uuml

0 gt 07 (A4)

so gt 0 ) E6Y1ƒ Y0

mdashY0 gt 07 D

A2 Derivation of Equation (21)

Drop the i subscripts Note that eƒD D 641 ƒ D5 C Deƒ7Let sbquo uuml D eƒsbquo and uuml D eƒ Z is binary and there are nocovariates so we can now write (20) as

sbquo uuml E641ƒ D5Y mdashZ D 17C sbquo uuml uuml E6DY mdashZ D 17 ƒ 1 D 0 (A5)

and

sbquo uuml E641ƒ D5Y mdashZ D 07C sbquo uuml uuml E6DY mdashZ D 07 ƒ 1 D 00 (A6)

Divided (A5) by (A6) to get rid of sbquo uuml then solve for uuml Subtract 1 and rearrange to get Equation (21)

A3 Average Treatment Effects and Standard Errors forNonlinear Models

Average treatment effects were calculated with the aid ofderivative approximations so that all reported effects have theform ldquocoef cient times scaling factorrdquo

Probit Note that ecirc6X 0i sbquo C 7 ƒ ecirc6X 0

isbquo7 6X 0isbquo C Di7 š

1 so the average effect on the treated can be approximatedas 841=N15

Pi Di6X 0

isbquo C Di79 š 1 where N1 D Pi Di This

turns out to be accurate to three decimal places for the esti-mates in Table 1 Standard errors were calculated treating thescaling factor as nonrandom This follows the convention forreporting marginal effects in programs like Stata in practiceany correction for estimation of the scaling factor is likely tobe minor A similar approach was used for ordered probit

Tobit Tobit average treatment effects were approximatedusing a derivative formula that can be found for exam-ple in the work of Greene (1999) E6Yi

mdashXi1DiD 17 ƒ

E6YimdashXi1 Di

D 07 iexclE6YimdashXi1Di7=iexclD D ecirc6X 0

isbquo C Di7 š 0

Average effects on the treated can therefore be approximatedusing 841=N15

Pi Diecirc6X 0

isbquo C Di79 š 0 Standard errors wereagain calculated treating the scaling factor as nonrandom

2PM Let the part-1 coef cient be 1 and the part-2 coef -cient be 2 Since the model multiplies parts and both parts arelinear the average effect is approximated using derivatives as1E6Yi

mdashDiD 11 Yi gt 07 C 2P6Di

D 1mdashY gt 070 Standard errorswere calculated treating the scaling factors E6Y mdashDi

D 11 Yi gt 07

and P6DiD 1mdashY gt 07 as nonrandom and using the fact that

the estimates of 1 and 2 are uncorrelatedMullahy Note that E6Yi

mdashXi1Di1mdash uumli 7 D mdash uuml

i exp6X 0i sbquoC Di7

where this CEF has a causal interpretation Again usingderivatives we have mdashuuml

i exp6X 0isbquo C 7 ƒ mdashuuml

i exp6X 0isbquo7

mdash uumli exp6X 0

i sbquo C Di7 š The model is such that E6mdash uumlimdashXi7

equals 1 but E6mdash uumlimdashXi1Di7 is unrestricted I ignore this prob-

lem and approximate the average effect on the treated as841=N15

Pi Di exp6X 0

i sbquo C Di79 š Standard errors were cal-culated treating the scaling factor as nonrandom

Bivariate Probit This is the same as probit using param-eters from the latent index equation for outcomes

Endogenous Tobit This is the same as tobit but usingcoef cients and predicted probability positive from the modelwith the compound Mills-ratio term included

Mills Ratio Standard errors were calculated treating thecompound Mills-ratio term as known

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 15

Causal-IV (nonlinear) Average effects were calculated asdescribed previously for the probit and Mullahy (exponential)functional form The rst-stage estimates of E6Zi

mdashXi7 neededto construct Ši were estimated using probit Standard errorsfor were calculated using the same bootstrap proceduredescribed for QTE Abadie (2000b) also gave analytic formu-las that take account of the rst-step estimation of E6Zi

mdashXi7An assumption implicit in this scheme for reporting Causal-IV results is that it makes sense to convert conditional-on-Xeffects for compliers into overall average effects on the treated

A4 Computation of Quantile Treatment Effectsand Standard Errors

QTErsquos were computed by plugging rst-step estimatesof QŠi

D E6ŠimdashXi1Di1 Yi7 into a weighted quantile regression

calculation performed by Stata Nonnegative estimates ofE6Ši

mdashXi1 Di1 Yi7 were constructed by separately estimatingE6Zi

mdashXi7 and E6ZimdashYi1Di1Xi7 using probit and then trimming

In principle standard errors should take account of this rst-step estimation An additional complication is that the analyticstandard errors for QTE involve a conditional error density Inthis case I sidestepped messy analytic calculations by usinga bootstrap procedure that repeats both the rst-stage estima-tion of QŠi and the second-step estimation of the parameters ofinterest in 100 replicate samples of 2500 observations eachThe 100 replicate samples were sampled without replacementusing the Stata command bsample The reported standarderrors were calculated as 4N =N uuml 51=2bseN where bseN is thestandard deviation of the 100 replicate estimates N D 21500and N uuml is the full sample size

[Received October 1999 Revised July 2000]

REFERENCES

Abadie A (2000a) ldquoBootstrap Tests for the Effect of a Treatment on the Dis-tributions of an Outcome Variablerdquo Technical Working Paper 261 NationalBureau of Economic Research Cambridge MA

mdashmdashmdash (2000b) ldquoSemiparametric Estimation of Instrumental Variable Mod-els for Causal Effectsrdquo Technical Working Paper 260 National Bureau ofEconomic Research Cambridge MA

Abadie A Angrist J and Imbens G (1998) ldquo Instrumental VariablesEstimation of Quantile Treatment Effectsrdquo Working Paper 229 NationalBureau of Economic Research Cambridge MA

Amemiya T (1978) ldquoThe Estimation of a Simultaneous Equation General-ized Probit Modelrdquo Econometrica 46 1193ndash1205

mdashmdashmdash (1979) ldquoThe Estimation of a Simultaneous Equation Tobit ModelrdquoInternational Economic Review 20 169ndash181

Angrist J (1998) ldquoEstimating the Labor Market Impact of Voluntary MilitaryService Using Social Security Data on Military Applicationsrdquo Economet-rica 66 249ndash288

Angrist J and Evans W (1998) ldquoChildren and Their Parentsrdquo Labor SupplyEvidence From Exogenous Variation in Family Sizerdquo American EconomicReview 88 450ndash477

Angrist J and Imbens G (1991) ldquoSources of Identifying Information inEvaluation Modelsrdquo Technical Working Paper 117 National Bureau ofEconomic Research Cambridge MA

Angrist J Imbens G and Rubin B (1996) ldquo Identi cation of CausalEffects Using Instrumental Variablesrdquo Journal of the American StatisticalAssociation 91 444ndash455

Angrist J and Krueger A (1999) ldquoEmpirical Strategies in Labor Eco-nomicsrdquo in The Handbook of Labor Economics Volume IIIA edsO Ashenfelter and D Card Amsterdam North-Holland pp 1277ndash1366

Blundell R W and Smith R J (1989) ldquoEstimation in a Class of Simulta-neous Equation Limited Dependent Variable Modelsrdquo Review of EconomicStudies 56 37ndash58

Blundell R W and Powell J L (1999) ldquoEndogeneity in Single Index Mod-elsrdquo mimeo University College London Dept of Economics

Bronars S and Grogger J (1994) ldquoThe Economic Consequences of UnwedMotherhood Using Twin Births as a Natural Experimentrdquo American Eco-nomic Review 84 1141ndash1156

Chamberlain G (1986) ldquoAsymptotic Ef ciency in Semi-Parametric ModelsWith Censoringrdquo Journal of Econometrics 32 189ndash218

Cragg J G (1971) ldquoSome Statistical Models for Limited Dependent Vari-ables With Application to the Demand for Durable Goodsrdquo Econometrica39 829ndash844

Duan N Manning W G Jr Morris C N and Newhouse J P(1984) ldquoChoosing Between the Sample-Selection Model and the Multi-partModelrdquo Journal of Business amp Economic Statistics 2 283ndash289

Eichner M J McClellan M B and Wise D A (1997) ldquoHealth Expendi-ture Persistence and Feasibility of Medical Savings Accountsrdquo in Tax Pol-icy and the Economy 11 ed J Poterba Cambridge MA National Bureauof Economic Research pp 91ndash128

Evans W Farrelly M C and Montgomery E (1999) ldquoDo Work-place Smoking Bans Reduce Smokingrdquo American Economic Review 89728ndash747

Gangadharan J and Rosenbloom J L (1996) ldquoThe Effects of Child-bearingon Married Womenrsquos Labor Supply and Earnings Using Twin Births as aNatural Experimentrdquo Working Paper 5647 National Bureau of EconomicResearch Cambridge MA

Goldberger A S (1991) A Course in Econometrics Cambridge MAHarvard University Press

Greene W (1999) ldquoMarginal Effects in the Censored Regression ModelrdquoEconomics Letters 64 43ndash49

Gronau R (1974) ldquoWage ComparisonsmdashA Selectivity Biasrdquo Journal ofPolitical Economy 82 1119ndash1143

Hay J W Leu R and Rohrer P (1987) ldquoOrdinary Least Squares andSample-Selection Models of Health-care Demandrdquo Journal of Business ampEconomic Statistics 5 499ndash506

Hay J W and Olsen R J (1984) ldquoLet Them Eat Cake A Note on Com-paring Alternative Models of the Demand for Medical Carerdquo Journal ofBusiness amp Economic Statistics 2 279ndash282

Heckman J J (1974) ldquoShadow Prices Market Wages and Labor SupplyrdquoEconometrica 42 679ndash694

mdashmdashmdash (1978) ldquoDummy Endogenous Variables in a Simultaneous EquationsSystemrdquo Econometrica 46 931ndash960

mdashmdashmdash (1990) ldquoVarieties of Selection Biasrdquo American Economic Review 80313ndash318

Heckman J J and Robb R Jr (1985) ldquoAlternative Methods for Evaluatingthe Impact of Interventionsrdquo in Longitudinal Analysis of Labor MarketData (Econometric Society Monographs Series No 10) eds J Heckmanand B Singer Cambridge UK Cambridge University Press pp 156ndash246

Imbens G and Angrist J D (1994) ldquo Identi cation and Estimation of LocalAverage Treatment Effectsrdquo Econometrica 62 467ndash475

Imbens G and Rubin D B (1997) ldquoEstimating Outcome Distributions forCompliers in Instrumental Variables Modelsrdquo Review of Economic Studies64 555ndash574

Kane T J Rouse C E and Staiger D (1999) ldquoEstimating Returnsto Schooling When Schooling Is Misreportedrdquo Working Paper W7235National Bureau of Economic Research Cambridge MA

Kelejian H H (1971) ldquoTwo-Stage Least Squares and Econometric SystemsLinear in Parameters but Nonlinear in the Endogenous Variablesrdquo Journalof the American Statistical Association 66 373ndash374

Koenker R and Bassett G (1978) ldquoRegression Quantilesrdquo Econometrica46 33ndash50

Killingsworth M (1983) Labor Supply Cambridge UK Cambridge Uni-versity Press

Keane M P and Wolpin K (1997) ldquo Introduction to the JBES Special Issueon Structural Estimation in Applied Microeconomicsrdquo Journal of Businessamp Economic Statistics 15 111ndash114

Lee M J (1995) ldquoSemiparametric Estimation of Simultaneous EquationsWith Limited Dependent Variables A Case Study of Female Labor Sup-plyrdquo Journal of Applied Econometrics 10 187ndash200

mdashmdashmdash (1996) ldquoNonparametric Two-Stage Estimation of Simultaneous Equa-tions With Limited Endogenous Regressorsrdquo Econometric Theory 12305ndash330

Leung S F and Yu S (1996) ldquoOn the Choice Between Sample Selectionand Two-Part Modelsrdquo Journal of Econometrics 72 197ndash229

Lin T-F and Schmidt P (1984) ldquoA Test of the Tobit Speci cation Againstan Alternative Suggested by Craggrdquo Review of Economics and Statistics66 174ndash177

Maddala G S (1985) ldquoA Survey of the Literature on Selectivity Bias AsIt Pertains to Health Care Marketsrdquo Advances in Health Economics andHealth Service Research 6 3ndash18

16 Journal of Business amp Economic Statistics January 2001

REFERENCES

Manning W G Duan N and Rogers W H (1987) ldquoMonte Carlo Evidenceon the Choice Between Sample Selection and Two-part Modelsrdquo Journalof Econometrics 35 59ndash82

Manski C F (1975) ldquoMaximum Score Estimation of the Stochastic UtilityModelrdquo Journal of Econometrics 3 205ndash228

mdashmdashmdash (1996) ldquoLearning About Treatment Effects From Experiments WithRandom Assignment of Treatmentsrdquo Journal of Human Resources 31709ndash733

McDonald J F and Mof tt R A (1980) ldquoThe Uses of Tobit AnalysisrdquoThe Review of Economics and Statistics 62 318ndash321

Mof t R A (1999) ldquoNew Developments in Econometric Methods forLabor Market Analysisrdquo in The Handbook of Labor Economics Vol-ume IIIA eds O Ashenfelter and D Card Amsterdam North-Hollandpp 1367ndash1398

Mullahy J (1997) ldquo Instrumental-Variable Estimation of Count Data ModelsApplications to Models of Cigarette Smoking Behaviourrdquo Review of Eco-nomics and Statistics 11 586ndash593

mdashmdashmdash (1998) ldquoMuch Ado About Two Reconsidering Retransformation andthe Two-Part Model in Health Econometricsrdquo Journal of Health Eco-nomics 17 247ndash281

Newey W K (1985) ldquoSemiparametric Estimation of Limited DependentVariable Models With Endogenous Explanatory Variablesrdquo Annales deLrsquo Insee 5960 219ndash237

mdashmdashmdash (1986) ldquoLinear Instrumental Variable Estimation of Limited Depen-dent Variable Models With Endogenous Explanatory Variablesrdquo Journal ofEconometrics 32 127ndash141

mdashmdashmdash (1987) ldquoEf cient Estimation of Limited Dependent Variable ModelsWith Endogenous Explanatory Variablesrdquo Journal of Econometrics 36231ndash250

Powell J L (1986a) ldquoSymmetrically Trimmed Least Squares Estimation forTobit Modelsrdquo Econometrica 54 1435ndash1460

mdashmdashmdash (1986b) ldquoCensored Regression Quantilesrdquo Journal of Econometrics32 143ndash155

Rosenzweig M R and Wolpin K (1980) ldquoLife-Cycle Labor Supply andFertility Causal Inferences From Household Modelsrdquo Journal of PoliticalEconomy 88 328ndash348

Rubin D B (1974) ldquoEstimating Causal Effects of Treatments in Random-ized and Non-randomized Studiesrdquo Journal of Educational Psychology 66688ndash701

mdashmdashmdash (1977) ldquoAssignment to a Treatment Group on the Basis of a Covari-aterdquo Journal of Educational Statistics 2 1ndash26

Wooldridge J (1999) ldquoDistribution-Free Estimation of Some Nonlinear PanelData Modelsrdquo Journal of Econometrics 90 77ndash97

Comment Binary Regressors in NonlinearPanel-Data Models With Fixed EffectsJinyong Hahn

Department of Economics Brown University Providence RI 02912 ( jinyong hahnbrownedu)

Angrist notes in his abstract that ldquomuch of the dif cultywith limited dependent variables comes from a focus on struc-tural parameters such as index coef cients instead of causaleffects Once the object of estimation is taken to be the causaleffect of treatment several simple strategies are availablerdquo(p 2) I examine the consequence of such a perspective forinference in nonlinear panel-data models with xed effectsI argue that (a) the ldquodif cultyrdquo indeed disappears sometimesand (b) structure of treatment assignment plays a crucial rolefor his strategy to be successful

It is instructive to begin with a dif cult nonlinear panel-datamodel with xed effects Consider a panel probit model with xed effects

Pr6yitD 1mdashci1 xi11 xi27 D ecirc4ci

C ˆ xit51

i D 11 1 n3 t D 1120 (1)

It is assumed that yi1 and yi2 are independent of each othergiven 4ci1 xi11 xi25 Here ci denotes the unobserved xedeffects and xit denotes a binary treatment variable We willassume that 4xi11 xi25 D 40115 for all i The index coef cientof interest is ˆ Therefore traditional econometric analysiswould focus on whether ˆ is identi ed and or

pn-consistently

estimableEstimation of the index coef cient ˆ is dif cult for a num-

ber of reasons

1 So far no consistent estimator for ˆ has been developedfor probit with a nonparametric speci cation of the conditionaldistribution of ci given 4xi11 xi25

2 It is not even clear whether ˆ is identi ed or notManskirsquos (1987) identi cation result requires in nite supportfor xit which cannot be satis ed due to the binary nature ofxit

3 It is not clear whether the semiparametric informationbound for ˆ is positive or not It is quite possible that theinformation is actually 0 For example Chamberlain (1992)showed that the information for ˆ is 0 for models with timedummies

4 Chamberlainrsquos (1984) conditional maximum likelihoodestimator is applicable only to logit models

Now to assess whether changing the object of estimationsimpli es statistical analysis consider the average treatmenteffects which in this particular case can be easily shown tobe equal to

sbquo D E6yi2ƒ yi170 (2)

It is not dif cult to see that a simple estimator Osbquo D1=n

PniD14yi2

ƒ yi15 isp

n-consistent As Angrist argues afocus on average causal effects dramatically reduces the dif -culty of estimation The secret is that 4xi11 xi25 D 40115 for alli which effectively ensures that ci is independent of 4xi11 xi25Dif culty of the index estimation listed previously is because

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

Imbens Comment 17

the independence between 4xi11 xi25 and ci cannot be exploitedwithin the xed-effects framework Unfortunately the addi-tional information that 4xi11 xi25 and ci are independent of eachother does not reduce dif culty in estimating the index ˆ It isnot yet clear whether ˆ can be

pn-consistently estimated even

with the random-effects assumption when the random effectsare nonparametrically speci ed On the other hand estimatingsbquo becomes easier as a result of the constancy of the general-ized propensity score Pr 6xi11 xi2mdashci7 Presence of unobservedci in the model was rendered irrelevant due to the constancyof the generalized propensity score [See Imbens (1999) for adiscussion on the generalized propensity score]

It is interesting to note that in the panel probit model (1)estimation of sbquo is not necessarily simple unless the indexstructure is discarded altogether It is useful to note that thenew target parameter sbquo could be estimated consistently usingindex structure if consistent estimators of ˆ and not are givenHere not denotes the distribution of ci We may alternativelywrite (2) as

Z4ecirc4c C ˆ5 ƒ ecirc4c55 dnot4c51 (3)

which can in principle be estimated by using consistent esti-mators of ˆ and not Estimation of sbquo using the alternativecharacterization (3) requires consistent estimation of an addi-tional parameter not a parameter that was not given too muchattention in the past The problem is that not many consis-tent estimators of not are available It is not yet clear whether

the model satis es the primitive conditions for consistency ofthe nonparametric maximum likelihood estimator (NPMLE)as discussed by Heckman and Singer (1984) The dif culty inestimating the target parameter using the expression (3) whichis based on the index structure is in sharp contrast to the easeof the estimation strategy using the expression (2) for whichthe index structure is irrelevant

The preceding discussion suggests that the success ofAngristrsquos perspective critically hinges on the structure of treat-ment assignment and careful reexpression of the new targetparameter If the joint distribution of ci and 4xi11 xi25 is com-pletely unknown it is clear that changing the target parameterdoes not ease the dif culty of estimation Angristrsquos perspectivetherefore requires substantial effort in modeling such jointdistribution Whether such a modeling effort will be successfulin dealing with nonlinear panel problems remains to be seen

ADDITIONAL REFERENCES

Chamberlain G (1984) ldquoPanel Datardquo in Handbook of Econometricseds Z Griliches and M D Intriligator Amsterdam North-Hollandpp 1247ndash1318

mdashmdashmdash (1992) ldquoBinary Response Models for Panel Data Identi cation andInformationrdquo unpublished manuscript

Heckman J and Singer B (1984) ldquoA Method for Minimizing the Impactof Distributional Assumptions in Econometric Models for Duration DatardquoEconometrica 52 271ndash320

Imbens G (1999) ldquoThe Role of Propensity Score in Estimating Dose-Response Functionsrdquo Technical Working Paper 237 National Bureau ofEconomic Research Cambridge MA

Manski C (1987) ldquoSemiparametric Analysis of Random Effects Linear Mod-els from Binary Panel Datardquo Econometrica 55 357ndash362

CommentGuido W Imbens

Department of Economics University of California at Los Angeles Los Angeles CA 90095(imbenseconuclaedu)

It is a pleasure to comment on this article by JoshuaAngrist whose applications of instrumental-variables methods(Angrist 1989 Angrist and Krueger 1991) have been a sourceof inspiration for my own work in this area As with Angristrsquosprevious work on instrumental variables the current articleraises some controversial issues and makes a number of impor-tant points Here I offer some comments on three of themFirst I shall discuss the issues raised in Section 1 ldquoCausalEffects and Structural Parametersrdquo concerning the goals ofstatistical inference Angrist argues that many questions ofinterest are most easily formulated in terms of comparisonsbetween realized and potential outcomes the latter de nedas outcomes that would have been observed under alternativestates of nature I shall explore some of the implications of thisview for empirical practice and econometric theory SecondI shall offer some remarks on the role of economic theory inspeci cation and identi cation of econometric models againreinforcing Angristrsquos point regarding the importance of for-mulating the key assumptions in terms of potential outcomesThird I shall discuss some of the issues related to the limited

dependent nature of outcome variables for empirical practicein particular in the presence of covariates Partly motivatedby the widespread perception of fundamental dif culties inapplying instrumental-variables methods to data with limiteddependent outcome variables Angrist argues that standard lin-ear model techniques are generally applicable I agree withAngristrsquos position that most of these perceived problems areexaggerated but suggest that principled inference should nev-ertheless take account of the limited dependent nature of theoutcome variables and use nonlinear models

1 CAUSAL ESTIMANDS

In his textbook discussion of the difference betweenstructural and reduced-form estimates Goldberger (1997)wrote following Marshak (1953) that the ultimate goal of

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

18 Journal of Business amp Economic Statistics January 2001

econometrics is to provide predictions More speci cally inmy view the goal is to provide predictions of policy interven-tions Using both economic theory and data economists wishto inform policy discussions by providing predictions of statesof the world under different policy choices Based on com-parisons of such predictions policy makers can then chooseamong the different policies using some social welfare mea-sure as objective function (eg Heckman and Smith 1997)Angrist argues that such questions are most easily formulatedin terms of potential outcomes Here I want to elaborate onthat view

Consider as an example the problem faced by a policymaker contemplating a new tax in a market To evaluatethis policy the policy maker wishes to take into account theeffect of the tax on the quantity traded Economic theory sug-gests that this effect depends on the slope of the supply anddemand functions The rst step is therefore the estimation ofthese slopes and in the remainder of this discussion I shallfocus on this component of the policy-evaluation problem Inprinciple the policy maker may be interested in the entire dis-tribution of the quantity traded under various taxes Let usassume however that for purposes of evaluation of the poli-cies it is suf cient to know the average effect of the policyon the quantity traded If there are only two values for thepolicymdashfor example no tax or a taxmdashthe difference betweenthese two averages is the key quantity of interest FollowingRubin (1974) I will refer to this as the estimand

Note that the choice of estimand is distinct from the sta-tistical question of the speci cation of the model Often thestatistical model is speci ed in such a way that a single param-eter corresponds to the estimand For example in a structuralinterpretation of the linear regression model the coef cientscorrespond to the effect of changing the covariates by a singleunit Such one-to-one correspondence however is the excep-tion rather than the rule Wooldridge (1992) made this point inthe context of BoxndashCox regression models Such models areoften used when a linear representation for E6Y mdashX7 is inappro-priate The BoxndashCox regression model generalizes this linearform to E6Y 4lsaquo5mdashX7 D X 0sbquo where

Y 4lsaquo5 D(

4Y lsaquo ƒ 15=lsaquo lsaquo 6D 01

ln Y lsaquo D 00

Although consistent estimators for sbquo exist under these assump-tions Wooldridge stressed that because (a) the interpretationof sbquo changes with the value of lsaquo and (b) knowledge of sbquo andlsaquo is not suf cient for recovering E6Y mdashX7 there is no reasonfor economists to be interested in estimates of sbquo under theseassumptions In other words sbquo cannot be the sole focus ofthe researcher because the question it answers changes withthe value of nuisance parameters Wooldridge then suggestedan alternative speci cation that always allows the researcherto recover the conditional expectation E6Y mdashX7

In empirical work this distinction between the estimandand the parameters of the statistical model is consistent withthe now common practice of reporting estimates of averagederivatives in binary response models rather than reportingestimates of the logit or probit coef cients Unlike a linear

regression model there is no direct link from one of the coef- cients in the logit or probit model to average causal effectsand thus there is no intrinsic interest in such coef cients

This view is at odds however with a large part of the semi-parametric literature An exception is the work by Stoker (egStoker 1986) who focused on estimation of index coef cientsin settings where these are proportional to average derivativesand thus directly linked to changes in predictions Considerfor example the work on semiparametric estimation of binaryresponse models In this literature such models are esti-mated without making logistic or probit assumptions insteadonly making conditional mean or median assumptions in alatent index interpretation (eg Manski 1985) This literaturehowever has begged the question of why economists shouldbe interested in the coef cient estimates in these models inthe absence of a direct link between these coef cients andthe choice probabilities or their derivatives Similarly someof the models with xed effects in panel data with limiteddependent variables have focused on estimation of parametersthat in themselves do not allow for estimation of conditionalexpectations or their derivatives and thus do not allow for esti-mation of causal effects See Arellano and Honoreacute (in press)for a survey of many of these methods

2 IDENTIFICATION

After deciding on the estimand the next step is to makesubstantive assumptions on the process that generated the dataThis is where economic as opposed to statistical theory playsa key role Theoretical considerations may suggest that certainvariables have no direct causal effect on others because theydo not enter into agentsrsquo utility function nor do they affectthe constraints these agents face For example in some mar-kets it may be reasonable to postulate the existence of demandand supply function and assume that their intersection deter-mines observed prices and quantities In that case it may beargued that certain variablesmdashfor example weather conditionsin agricultural marketsmdashaffect supply at xed prices but notdemand because weather conditions do not affect utility ofthe buyers nor do they constrain their choices given pricesSimilarly theoretical considerations may suggest which vari-ables determine agentsrsquo fertility choices and which variablesare excluded from such choices as in the structural modelsdescribed in Section 12 of Angrist

For the purpose of considering such exclusion restrictionsas well as other assumptions it is important to formulate themin a way that economic theory can be brought to bear on themThis makes the formulation in terms of counterfactuals orpotential outcomes that Angrist advocates particularly appro-priate The potential outcomes describe outcomes in differ-ent environments and as such are the primitives of economicanalyses as well as choices under different sets of constraintswhich are the result of agents solving constrained optimizationproblems Since economic theory studies such optimizationproblems it is therefore well equipped to assess assumptionsformulated directly in terms of these potential outcomes Anexample of the formulation of the critical assumptions in termsof such potential outcomes is Angrist Imbens and Rubin(1996 AIR from here on) In contrast latent index modelsalthough under some conditions mathematically equivalent to

Imbens Comment 19

the potential outcome framework (eg Vytlacil 1999) formu-late the critical assumptions in terms of associations betweenobserved variables and unobserved residuals which appearsmore dif cult to contemplate [see Imbens (1997) for a dis-cussion of the confusion such formulations have caused in thestatistics literature]

It is rare that economic theory is speci c enough to deter-mine the exact value of the estimand More typical is thatthe theory is consistent with a range of values for the esti-mand Observations on agentsrsquo choices and outcomes may behelpful in narrowing down this range The econometricianrsquostask is to link the data to the estimand Typically a num-ber of additional assumptions are made at this stage Almostalways it is assumed that there is only limited dependenceor no dependence at all between choices made by differentagents and identi cation focuses on the link between the jointdistribution of the observables estimable in large samplesand the estimand Two possibilities arise at this stage Some-times the estimand can be expressed as a functional of thejoint distribution of the observables in which case the esti-mand is identi ed A leading example is where the estimandis the average treatment effect and theory suggests that assign-ment to treatment is random or at least random conditionalon a set of observed covariates (unconfounded assignmentselection on observables) Alternatively the assumptions sug-gested by economic theory do not allow for the direct linkbetween the distribution of observables and the estimand Inthat case the researcher faces some choices One option advo-cated in a series of papers by Manski (see for a general dis-cussion Manski 1995) is to estimate the range of values ofthe estimand consistent with the data given the substantiveassumptions Another option followed in the current article byAngrist is the local average-treatment-effect approach devel-oped by Imbens and Angrist (1994) to consider what aspectsof the estimand are identi ed given data and assumptions Ininstrumental-variables settings the population average treat-ment effect is often not identi ed but the average effect for aspeci c subpopulation may be In that case one may chooseto estimate the average treatment effect for this subpopula-tion and leave the extrapolation to the principal estimand tothe researcher possibly aided by theoretical considerations AsHeckman wrote ldquo It is a great virtue of the LATE parameterthat it makes the investigator stick to the data at hand and sep-arate out the aspects of an estimation that require out of sam-ple extrapolation or theorizing from aspects of an estimationthat are based on observable datardquo (Heckman 1999 p 832)

Let us consider the case studied by Angrist with its focuson the effect of having more than two children on labor sup-ply Angrist argues that the second birth being a multiple birth(eg twins) is a valid instrument for this effect In terms ofthe AIR formulation this requires a multiple birth to be asgood as randomly assigned and the absence of a systematicdirect effect on labor supply other than through its effect onthe number of children Such assumptions may be contro-versial For example fertility treatments may lead to a sys-tematic association between multiple births and choices madeby couples violating the rst assumption Even if we acceptthese assumptions however they only imply that the aver-age causal effect of more kids on labor supply is identi ed

for women who had a third child solely because their sec-ond birth was a multiple birth (compliers in the AIR termi-nology) In my view it is unlikely that this is the populationof primary interest Nevertheless it is the only subpopulationthe data are informative about in the sense of point identi ca-tion under the substantive assumptions and it would appear tooffer some guidance regarding the population average causaleffect to policy makers similar to the way in the medical worldresults from clinical trials in homogenous subpopulations areregarded as useful because they are viewed as indicative ofpopulation average causal effects

3 LIMITED DEPENDENT VARIABLES

Typically economic theory offers some guidance concern-ing the determinants of certain outcomes without specifyingthe exact form or strength of their relationship In that casestatistical modeling is required to complete the speci cationConsider the example Angrist studies with binary outcomebinary endogenous regressor a binary instrument and covari-ates Angrist suggests as one possible approach estimating theaverage treatment effect through a linear probability modelwith instrumenting for an endogenous regressor The bene tsof the linear probability approach stemming from the linear-ity and robustness against misspeci cation of the rst stageappear to me largely illusory At this point the statistical mod-eling is only intended to provide exible approximations to theunderlying conditional distributions This is a fundamentallydifferent role from that played by the substantive assumptionsthat are essential for identi cation Appeals to consistencyunder speci c parameterizations therefore appear irrelevantmdashin a larger sample one may well wish to use a more exiblespeci cation because less smoothing is required In additionto nding the alleged bene ts of the linear probability modelunpersuasive I nd its disadvantages troubling Within smallsubpopulations characterized by extreme values of the covari-ates the smoothing implicit in linear probability models islikely to lead to unattractive predictions compared to pre-dictions based on nonlinear models that respect the limited-dependent-variable nature of the outcomes

An alternative approach is followed in the study of the effectof u shots on hospitalization rates using randomized incen-tives for vaccination by Hirano Imbens Rubin and Zhou(2000 HIRZ from here on) Given their assumptions exten-sions of those made by AIR to the case with exogenous covari-ates there are three subpopulationsmdashcompliers (units whochange treatment status in response to a change in the valueof the instrument) always-takers (who always take the treat-ment irrespective of the value of the instrument) and never-takers (who never take the treatment irrespective of the valueof the instrument) H IRZ modeled the conditional distributionof these three ldquotypesrdquo conditional on covariates as a trinomialdistribution

Pr4TypeiD cmdashXi

D x5 D exp4x0ndashc5

1C exp4x0ndashc5C exp4x0ndasha51

Pr4TypeiD amdashXi

D x5 D exp4x0ndasha5

1C exp4x0ndashc5 C exp4x0ndasha51

20 Journal of Business amp Economic Statistics January 2001

and

Pr4TypeiD nmdashXi

D x5

D 1 ƒ Pr4TypeiD cmdashXi

D x5 ƒ Pr4TypeiD amdashXi

D x50

Now compare this setup to the selection models Angristdescribes in Section 3 In the selection models the equationdescribing the endogenous regressor is Di

D 18ƒ0C ƒ1Zi

Cƒ 0

2Xi gt Daggeri9 Suppose that the instrument is binary and that ƒ1 ispositive Then the two models are very similar with units withƒ0 Cƒ1 Cƒ 0

2Xi gt Daggeri in the selection model classi ed as always-takers in the potential outcome framework (because irrespec-tive of the value of the instrument Di

D 1 for such units) unitswith ƒ0

C ƒ 02Xi lt Daggeri classi ed as never-takers (because irre-

spective of the value of the instrument DiD 0 for such units)

and the units with ƒ0C ƒ 0

2Xi lt Daggeri lt ƒ0C ƒ1

C ƒ 02Xi classi ed

as compliersOne advantage of the trinomial model is that it easily gen-

eralizes to provide an arbitrarily good t to any conditionaltrinomial distribution by including higher-order terms andinteractions in the covariates If there are no substantive rea-sons to impose additional restrictions one should not imposethem implicitly in the speci cation of the statistical modelIn particular in the selection model it is not suf cient to addhigher-order terms to the covariate vector to provide an arbi-trarily good t to the trinomial distribution Such an approxi-mation would have to involve heteroscedasticity and other dis-tributional extensions that are not straightforward to implementin the selection model

Conditional on the individualrsquos type H IRZ speci ed theoutcome distributions given covariates as logistic regressionmodels Again the aim is to provide a exible approxima-tion to the conditional distribution in a manner that does notimpose any implicit restrictions Given that for a binomialdistribution the logistic regression model can be thought ofas providing a linear approximation to the log odds ratiothis choice is again an appealing one An alternative is theprobit model which also provides a good approximation Less

attractive here is the linear probability model since it requiresinequality restrictions on the parameters if the implicit esti-mates of the probabilities are to be bounded between 0 and 1

In cases with other limited dependent variables alternativenonlinear models may be appropriate For example if the out-comes are durations subject to censoring models speci ed interms of hazard functions (eg Lancaster 1979) may be con-venient for dealing with such data

ADDITIONAL REFERENCES

Angrist J (1989) ldquoLifetime Earnings and the Vietnam Era Draft LotteryEvidence From Social Security Administrative Recordsrdquo American Eco-nomic Review 80 313ndash335

Angrist J and Krueger A (1991) ldquoDoes Compulsory School AttendanceAffect Schooling and Earningsrdquo Quarterly Journal of Ecomonics 106979ndash1014

Arellano M and Honoreacute B (in press) ldquoPanel Datardquo in Handbook of Econo-metrics eds J Heckman and E Leamer Amsterdam Elsevier North-Holland

Goldberger A (1991) A Course in Econometrics Cambridge MACambridge University Press

Heckman J (1999) ldquo Instrumental Variables Response to Angrist andImbensrdquo Journal of Human Resources 34 828ndash837

Heckman J and Smith J (1997) ldquoEvaluating the Welfare Staterdquo in Econo-metrics and Economics in the 20th Century The Ragnar Frisch Centenaryed S Strom New York Cambridge University Press pp 214ndash318

Hirano K Imbens G Rubin D and Zhou A (2000) ldquoEstimating theEffect of Flu Shots in a Randomized Encouragement Designrdquo Biostatistics1 69ndash88

Imbens G (1997) Book Review of The Foundations of Econometric Analy-sis by D Hendry and M Morgan Journal of Applied Econometrtics 1291ndash94

Lancaster T (1979) ldquoEconometric Methods for the Analysis of DurationDatardquo Econometrica

Manski C (1985) ldquoSemiparametric Analysis of Discrete Responserdquo Journalof Econometrics 27 313ndash333

mdashmdashmdash (1995) Identi cation Problems in the Social Sciences CambridgeMA Harvard University Press

Marshak J (1953) ldquoEconomic Measurements for Policy and Predictionrdquoin Studies in Econometric Method eds W Hood and T KoopmansNew York Wiley pp 1ndash26

Stoker T (1986) ldquoConsistent Estimation of Scaled Coef cientsrdquo Economet-rica 54 1461ndash1481

Vytlacil E (1999) ldquo Independence Monotonicity and Latent Index ModelsAn Equivalency Resultrdquo unpublished manuscript University of ChicagoDept of Economics

Wooldridge J (1992) ldquoSome Alternatives to the BoxndashCox RegressionModelrdquo International Economic Review 33 935ndash955

CommentRobert A Mof tt

Department of Economics Johns Hopkins University Baltimore MD 21218 and National Bureau of EconomicResearch Cambridge MA (mofttjhuedu)

The Problem Although the article by Angrist rangesacross a number of issues much of the discussion and thearticle title suggests that the problem of concern is that instru-mental variables ( IV) cannot be used in one of three commonmodels Let the rst model be y D Csbquod CxbdquoChellip where y isan absolutely continuous variable but d is binary and wherex is independent of hellip but d is not Then sbquo can be consistentlyestimated with IV (Heckman and Robb 1985) Let the secondmodel be y uuml D C sbquod uuml C xbdquo C hellip where y uuml and d uuml are contin-

uous and where y D 14y uuml gt 05 and d D d uuml are the observedvariables The parameters of this model can likewise be esti-mated by IV with some auxiliary assumptions (Newey 1986see Blundell and Smith 1993 for a review of alternative meth-ods) But let the third model be y uuml D C sbquod C xbdquo C hellip where

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

Moftt Comment 21

again y D 14y uuml gt 05 but now d is binary The parameters ofthis model cannot be estimated by standard IV Many of theexamples in the Angrist article involve the censored regres-sion model rather than the binary choice model but this andthe other main points in the article apply to both

The Angrist Solution The Angrist solution to the problemposed by the third of these models where IV is not applica-ble is to declare sbquo an uninteresting parameter and not worthyof estimation [or in his words to ldquopuntrdquo (p 8)] This is remi-niscent of the solution to the Vietnam War suggested by some1960s commentators The United States should have simplydeclared victory quickly withdrawn and hoped that no onenoticed that they had in fact lost

If sbquo is uninteresting what is interesting according toAngrist His answer is that interest should center on leastsquares approximations with y as the dependent variable andd and x as regressors If endogeneity of d is a problem linearIV should be applied with linear IV given a LATE interpre-tation He provides an empirical illustration

I will comment on three questions raised by the Angristarticle (1) Is sbquo an interesting parameter (2) Is the Linear IVestimator using observed y and d which Imbens and Angrist(1994) relabeled the LATE an interesting quantity (3) Is theillustrative application given in the Angrist article interesting

Is sbquo an Interesting Parameter As Angrist notes manyof the issues raised by his solution have nothing to do withthe endogeneity of d Whether sbquo is an interesting parame-ter is one of these at least in part Given Angristrsquos prefer-ence for linear projections his position necessarily implies thatall nonlinear models of which the latent index model (L IM)is just one example are uninteresting (Angrist is not com-pletely consistent in his position because he does propose anonlinear model with an exponential conditional mean func-tion in one section of his article that model is subject to thesame objections that he subjects the LIM tomdasharbitrary non-linearities etc) Thus his position boils down to a preferencein the binary dependent-variable case for the linear proba-bility model (LPM) over the LIM and other nonlinear mod-els This issue has been discussed for many years even in asimultaneous-equations context [eg Heckman and MaCurdy(1985) which Angrist does not reference]

The dif culty with Angristrsquos polar position on this issuemdashthat the LPM is the only model of interestmdashis that it is funda-mentally untenable The LPM is attractive because it is easyto interpret providing parameter estimates that do not requiretransformation to learn the effects of a regressor on the meanof the dependent variable It has a role to play in empiricalwork in summarizing the data as regards the conditional meanfunction and for initial explorations of the data and virtu-ally all practitioners use it for this purpose But beyond thisit has no defense Either it is equivalent to the latent-variablemodel if the model is saturated or if the model is not satu-rated imposes functional form restrictions that may not holdand may t the data worse than the latent-variable model

Thus if the model is y D 14 C sbquod C hellip gt 05 where d is asingle dummy variable regressor independent of hellip the probit(say) estimates of and sbquo map one-to-one onto the leastsquares intercept and coef cient on d in the LPM and hencethere is no gain to estimating the model either way because the

estimators are equivalent The same point extends to the caseof a multinomial d If the model is y D 14 C sbquod C xbdquo C hellip gt

05 where x and bdquo are vectors of exogenous covariates andcoef cients respectively the LPM estimate of the effect ofd on E4y5 conditional on x may be demonstrably worsethan the probit estimator of the same effect even for theobject of estimating E4ymdashd1 x5 at points 8d1 x9 observed inthe data (Angrist falls back on the weak principle that leastsquares estimates provide best linear approximations but theissue is that the true model might be nonlinear) Nothingin Angristrsquos article provides evidence that the probitor anyother latent index formulation provides an inferior estimate of6E4ymdashd11 x5 ƒ E4ymdashd01 x57 compared to the LPM for the twoestimators smooth the joint response function with respect tod and x in different ways

Although the latent-variable model has no necessary claimfor superiority in the class of all nonlinear models its pop-ularity rests on its ability to generate a wide variety of non-linearities with a relatively parsimonious speci cation arisingfrom the convolution of the latent index with the cdf of hellipExpansion of the LPM to incorporate equivalent nonlinearitiesis cumbersome and inef cient

The possible inferiority of the LPM in capturing nonlinear-ities in nonsaturated models is also important for interpolationand extrapolation to points not in the observed data Gettingthe nonlinearities right in the observed data is important ininterpolation and extrapolation if the true model is nonlinearas the binary choice model necessarily is (since y is boundedby 0 and 1) Although Angrist at one point does mentionpredictionmdashstating that the linearized models he proposes area good ldquojumping-off point for any prediction exerciserdquo a state-ment without supportmdashhe is for the most part not interestedin prediction so much as summarizing the observed data Thisis ne but unfortunately prediction outside the observed his-torical data is the pervasive concern of the policy makers whoare the ultimate consumers of applied economic research forthey must make predictions of new policies in new environ-ments on a daily basis as part of their jobs The methods pro-posed in Angristrsquos article are therefore not very useful for theformulation of new policy

It should be noted that some make the argument that theoryand economic models must play a role in guiding the formu-lation of empirical relationships used for prediction outsidethe range of the observed data That position is not taken herebecause it is not necessary to establish the potential superior-ity of the L IM over the LPM in nonsaturated models and forprediction Occamrsquos razor makes it unnecessary

It should also be noted that these issues have nothing todo with causal effects because d is assumed exogenous Thusthe opposition that Angrist poses between L IMrsquos and ldquocausaleffectsrdquo models is a false one at least in general If a ldquocausaleffectrdquo is de ned as the true effect of a variable d on onlythe rst moment of a variable y (a rather restricted de nitionof causal effect) that causal effect can be derived from thelatent-variable model just as well The issue is instead merelynonlinearity of the function E4ymdashd1 x5

Angrist also states that the L IM requires ldquoconstant coef- cientrdquo and ldquodistributional assumptionsrdquo for identi cation

22 Journal of Business amp Economic Statistics January 2001

unlike the LPM Angrist does not make suf cient quali ca-tions to these assertions and they are indeed false for theL IM A random coef cient sbquo in the LIM is permissible with-out distributional assumption ( Ichimura and Thompson 1998)and the coef cients in the standard constant-coef cient L IMare identi ed semiparametrically (ie under unknown distri-bution of hellip) under the restriction that d and hellip are independentan assumption generally made for causal interpretation of theLPM model as well (Manski 1988 Horowitz 1993 Powell1994) In addition as already noted in a fully saturated modelthe LPM and the L IM are equivalent in any case

Is the LATE an Interesting Statistic The Angrist assertionthat LATE is the only interesting statistic if d is endogenousor perhaps the only statistic worth bothering about because itis the only one identi ed by the data has the same partialvalidity as his preference for the LPM over the LIM IV isone of the most popular techniques in applied econometricsand has a natural intuition indeed one that does not requirethe LATE interpretation per se It is one of the most usefultools in the applied economistrsquos kit The simple IV estimatordiscussed by Angrist makes minimal assumptions and givesminimal information back to the analyst as a result But tosay that it produces the only statistic of interest does not havedefense both because it is equivalent to an LIM that is fullysaturated and may t the data worse than an LIM if not satu-rated and because it implies that there is no value to makingadditional assumptions to obtain additional information

The limiting nature of the LATE statistic is again in itsinattention to nonlinearities interpolation and extrapolationThat nonlinearities can be important is as true in this caseas it was in the exogenous case just discussed when thereare additional x covariates and when the model is not satu-rated As for interpolation and extrapolation the LATE statis-tic 6 Ny4z D z15ƒ Ny4z D z057=6 Nd4z D z15ƒ Nd4z D z057 denotes theeffect on a change in z from value z0 to z1 on E4y5 scaledby the change in the E4d5 It does not have implications forthe effect on E4y5 of any other change in z or for a changein any other policy variable Thus it is not particularly usefulfor policy changes other than a change of z0 to z1 in the sameenvironment (ie conditioned on the same x) This stands incontrast to an L IM for E4dmdashx1 z5mdashor any parametric modelfor d for that mattermdashthat allows for the change from z0 toz1 to inform policy makers and others of the likely changesof other values of z and of other variables x In fact theLATE statistic which is not a parameter in the usual senseof the word can always be expressed as a nonlinear combi-nation of parameters of a latent-variable model but not viceversa hence the latter model is more general than the former[See Heckman and Vytlacil (1999) for a discussion of how toexpress the LATE in terms of an L IM see also Angrist andImbens (1999) and Heckman (1997 1999) for a discussion ofsome of these issues]

In addition Angristrsquos focus on the advantages of the LATEfor model-free estimation of the effect of a change in z fromz0 to z1 on y reveals a philosophical weakness in his posi-tion This is because the effect of a change in z from z0 toz1 on y can be estimated from the reduced form the struc-tural form is not needed It is not explained why the statistic

6E4ymdashz D z15 ƒ E4ymdashz D z057 is not the only quantity of inter-est The answer that most economists would give is to say thatstructural coef cients are of interest because they can be usedto extrapolate the effects of changes in d on y to other changesin z than from z0 to z1 in the same environment indeedthis is the basic rationale for structural estimation given inthe famous essay by Marschak [1953 see also Christ (1994)and Heckman (2000) for discussions Marschakrsquos argumentwas more subtle than this arguing that restrictions on struc-tural parameters are needed for out-of-sample prediction butthis translates into restrictions on the reduced form as well]However by ruling out the possibility of learning about anyof the effects of changes in z or d other than those inducedby a change in z from z0 to z1 Angrist removes the need todo structural estimation in the rst place Angrist implicitlyassumes that a structural coef cient of interest exists on thevariable d in the y equation and that that coef cient has mean-ing independent of a particular change in z but the rest of hisdiscussion contravenes that interest

A reduced-form research program which is where theAngrist position leads is of considerable value There is noth-ing wrong with a research program to collect a large body ofinformation on the effects of a wide variety of changes in dif-ferent policy variables 4z5 from particular values (z0) to otherparticular values (z1) each taking place in particular environ-ments (x) But if the research program stops there very littleuseful has been learned other than a collection of facts aboutthe effects of particular policies in particular environments

Is the Application in the Angrist Article Interesting TheAngrist article uses a recently popular exclusion restriction ornatural experiment to identify the effect of fertility on laborsupply and to illustrate his preferred methodsmdashnamely theuse of twins The use of twins as an exclusion restrictionstands in contrast to variations in a government policy or lawwhich are often used as exclusion restrictions both in recentwork on natural experiments as well as a much older litera-ture that uses cross-sectional and overtime variation in state-or country-speci c taxes and transfer rules to identify modelparameters However although it can be of interest to estimatereduced-form policy effects 6E4ymdashz D z15 ƒ E4ymdashz D z057 asjust discussed it is not so obvious that there is any interest inestimating the effects of having twins on the expected valueof y Creating twins is not a variable directly subject to policymanipulation

The dif culty in the use of twins arises for the same reasonalready discussedmdashnamely that the preference for model-freeestimation of policy effects leads to a lack of interest in thefunction E4dmdashz1 x5 and hence to a lack of interest in what canbe learned from a study of twins about the effects of someother more relevant policy variable that might be manipulatedA model for the E4dmdashz1 x5 is needed to make that connectionand hence to make a study of twins of any interest yet that isintentionally eschewed in the approach proposed by AngristOnce again this leads to an uninteresting and quite limitingset of exercises

Conclusions The set of methods laid out in the Angristarticle at least if a more relevant instrument were used yieldsa set of model-free statistics that are suitable for exploratorywork on a research question prior to the (possibly nonlinear)

Mullahy Comment 23

structural estimation and prediction that should be the ultimateobject of applied economics research

ACKNOWLEDGMENTS

This comment is a revision of remarks delivered at theJoint Statistical Meetings Baltimore August 1999 I thankCarl Christ James Heckman Joel Horowitz Michael KeaneThomas Mroz Geert Ridder and Edward Vyltacil for com-ments with the usual disclaimer that the views expressed hereshould not be taken as representing those of any of theseindividuals

ADDITIONAL REFERENCES

Angrist J and Imbens G (1999) Comment on ldquo Instrumental VariablesA Study of Implicit Behavioral Assumptions Used in Making Pro-gram Evaluationsrdquo by J J Heckman Journal of Human Resources 34283ndash827

Blundell R and Smith R (1993) ldquoSimultaneous Microeconometric ModelsWith Censored or Qualitative Dependent Variablesrdquo in Handbook of Statis-tics (Vol 1) eds G S Maddala C R Rao and H D Vinod AmsterdamNorth-Holland pp 117ndash141

Christ C (1994) ldquoThe Cowles Commissionrsquos Contributions to Econometricsat Chicago 1939ndash1955rdquo Journal of Economic Literature 32 30ndash59

Heckman J (1997) ldquoInstrumental Variables A Study of Implicit BehavioralAssumptions Used in Making Program Evaluationsrdquo Journal of HumanResources 32 441ndash462

mdashmdashmdash (1999) ldquo Instrumental Variables A Reply to Angrist and ImbensrdquoJournal of Human Resources 34 828ndash837

mdashmdashmdash (2000) ldquoCausal Parameters and Policy Analysis in Economics ATwentieth Century Retrospectiverdquo Quarterly Journal of Economics 11545ndash97

Heckman J and MaCurdy T (1985) ldquoA Simultaneous Equations LinearProbability Modelrdquo International Economic Review 18 21ndash37

Heckman J and Vytlacil E (1999) ldquoLocal Instrumental Variables andLatent Variable Models for Identifying and Bounding Treatment EffectsrdquoProceedings of the National Academy of Sciences 96 4730ndash4734

Horowitz J (1993) ldquoSemiparametric and Nonparametric Estimation of Quan-tal Response Modelsrdquo in Handbook of Statistics (Vol 11) eds G SMaddala C R Rao and H D Vinod Amsterdam North-Holland pp45ndash72

Ichimura H and Thompson T (1998) ldquoMaximum Likelihood Estimation ofa Binary Choice Model With Random Coef cients of Unknown Distribu-tionrdquo Journal of Econometrics 86 269ndash295

Manski C (1988) ldquoIdenti cation of Binary Choice Modelsrdquo Journal of theAmerican Statistical Association 83 729ndash738

Marschak J (1953) ldquoEconomic Measurements for Policy and Predictionrdquoin Studies in Econometric Method eds W Hood and T KoopmansNew York Wiley pp 1ndash26

Powell J (1994) ldquoEstimation of Semiparametric Modelsrdquo in Handbook ofEconometrics (Vol 4) eds R Engle and D McFadden Amsterdam North-Holland pp 2443ndash2521

CommentJohn Mullahy

Departments of Preventive Medicine and Economics University of Wisconsin Madison WI 53705 andNational Bureau of Economic Research Cambridge MA (jmullahyfacstaffwiscedu)

Articles bearing titles containing phrases like ldquoSimpleStrategies for Empirical Practicerdquo are often wolves masquerad-ing as sheep the methodologies they devise being far fromldquosimplerdquo and far removed from what most humble practition-ers perceive in the realm of ldquoempirical practicerdquo Not so hereJoshua Angrist has written a tight comprehensive article thatis stimulating and important yet also eminently useful

As typi es much of Angristrsquos work the main concern in thisarticle is on how to elicit interesting characterizations of causaleffects from microdata with the particular twist here being afocus on outcomes measured as ldquolimited dependent variablesrdquo(LDVrsquos) The main take-away message I glean from this arti-cle is that applied analysts working on causal effect or struc-tural analyses in LDV contextsmdashtraditionally vexing contextsinsofar as consistent estimation and inference are concernedmdashhave considerable grounds for optimism Angrist lays out andinterprets systematically a set of issues and methods that pro-vide practitioners with a variety of implementable strategiesthat might be brought to bear on such empirical problemsA corollary take-away message is that in some respects ldquothisstuff is not really as hard as wersquove tended to make itrdquo withAngrist demonstrating for instance the potential merits ofsimple linear instrumental variable ( IV) methods for estimat-ing causal effects in a variety of LDV contexts In no eventcan applied analysts escape the requirement of nding theo-retically sound instruments but Angrist exposits compellingly

a broad description of how such instruments might be used toelicit interesting causal inferences in LDV contexts

I have no real quibbles with any of the substance ofAngristrsquos arguments Rather I will devote my commentarymainly to amplifying and expanding several of the themes hedevelops throughout the article

1 FOCUS ON CAUSAL OR PARTIAL EFFECTSVERSUS FOCUS ON CONDITIONAL

EXPECTATIONS FUNCTIONS

It seems fair to suggest that much of applied microecono-metrics is concerned primarily with understanding the signsand magnitudes of quantities like iexclE6Y mdashX1 D7=iexcl4X1 D5 oriexclE6Y mdashX7=iexclX0 Yet much of the actual dirty work in under-taking causal analysis in LDV contexts seems to result fromdecisions to undertake analyses in settings where E6Y mdashX1D7

or E6Y mdashX7 are restricted to be positive without a priori restric-tions on parameter values [thus the tradition of using tobit-class conditional expectations functions (CEFrsquos) two-stepselection models exponential CEFrsquos etc]

As practitioners however we should pause to assesswhether speci cations akin to E6Y mdashX1D7 D exp4Xsbquo C D5 gt

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

24 Journal of Business amp Economic Statistics January 2001

0 are ultimately buying anything important with respect tothe rst-order questions being explored It is only com-mon sense that analysts should spend relatively more energyworking on understanding quantities of primary interest(eg iexclE6Y mdashX1D7=iexcl4X1 D5) and worrying relatively lessabout the formulation and implications of assumptions (egE6Y mdashX1 D7 gt 0) that may have little or no ultimate bearingon the particular questions being addressed This is espe-cially so when the latter effort (restricting via functional formchoice E6Y mdashX1 D7 gt 0) will tend to complicate rather thanfacilitate the former effort (understanding the partial or causaleffects) This is not to suggest that quantities like E6Y mdashX7 andE6Y mdashX1 D7 may not themselves be interesting to estimate (egmodeling conditional mean health care expenditures for use inforecasting future expenditure levels) but whether E6Y mdashX1D7

is parametrically and parsimoniously better approximated byexp4XsbquoC D5 or by XsbquoC D or by some other g4Xsbquo CD5

or h4X1D1 ƒ5 ismdasheven in LDV settingsmdashultimately an empir-ical matter to be assessed by goodness of t (eg via condi-tional moment tests)

I note this important angle somewhat sheepishly sincein both articles of mine that Angrist cites (Mullahy 19971998) there is considerable emphasis on the use of exponen-tial CEF speci cations as enforcers of positive conditionalmean ldquorequirementsrdquo In fact the main motivation for theearlier article was to nd a general approach to obtainingconsistent estimators of structural parameters in parametricsettings less distribution-bound than tobit-class models whenthe key requirement or side constraint is E6Y mdashX1 D7 gt 0 withthe resulting strategy a more or less brute-force method foraccomplishing this in a nonlinear IV setting Angristrsquos arti-cle takes this idea quite a bit further and demonstrates how(and with the addition of a further normalization) the struc-tural parameter estimation approach I discussed can be devel-oped into a model to analyze causal effects when the latter arecharacterized as ldquoproportional treatment effectsrdquo

Whether a proportional as opposed to an additive treatmenteffect is interesting (it may or may not be) remains to be seenin any particular application Wooldridgersquos recent work onaverage causal effects (ACErsquos Wooldridge 1999) would seempertinent in contexts in which the role of the X covariates ismore ldquointrusiverdquo than in the simple additivelinear treatment-effect setup As a general matter the extent to which infer-ences about causal effectsmdashhowever formulatedmdashhinge onthe inclusion of particular covariates as conditioning regres-sors is an interesting and potentially important angle thatAngristrsquos article begins to explore and is one that likely tomerit additional research For the particular issues at handthe very nature of proportional treatment effects brings intofocus explicitly the role of the X rsquos so how their correlationwith the treatment indicators 4D5 in uences inferences aboutthe causal impacts of the latter would seem to be a rst-orderconsideration

2 TWO-PART MODELS

I found Angristrsquos analysis of the interpretation of causaleffects in two-part models to be extremely illuminatingIn a reduced-form setting I argued (Mullahy 1998) that amid

all the concerns about zeros robustness outliers transforma-tion retransformation and such that have typically attendedmodeling efforts involving two-part models it would be sur-prising in a well-structured empirical investigation if therewere not stated or lurking concerns about partial effectsiexclE6Y mdashX7=iexclX andor CEFrsquos E6Y mdashX7 themselves It may be insome cases that such concerns are not articulated and it mayeven be in less well-structured problems that they are not obvi-ous to analysts themselves Yet for instance unless robustlyestimated sbquorsquos from a log-linear part-2 speci cation of a two-part model could inform a rst-order question about a par-tial effect or a CEF of what practical use were they likelyto be My rather modest and rather obvious argument wasjust that the concerns about the iexclE6Y mdashX7=iexclX or the E6Y mdashX7mdashif they are the reason the analysis is being conducted in the rst placemdashshould enjoy rst-order prominence in the estima-tion exercise and that concerns about zeros outliers and suchshould be relegated in some sense to second-order status Insome cases two-part models will serve nicely to address such rst-order concernsmdashat least in reduced-form settingsmdashbut inothers they will not

Angrist appears to have some sympathy with such argu-ments but more importantly he provides a valuable serviceto users of two-part models by unearthing one major limi-tation in the analysis of causal effects The problem is notwith part 1 (ie the logit probit or linear probability com-ponent) because part 1 of the two-part model falls within themain lines of Angristrsquos LDV analysis Rather the complicationarises with part 2 of the two-part model in which the additionalldquoY gt 0rdquo conditioning arises In the standard (reduced-form)two-part model quantities like E6Y mdashX1Y gt 07 are prominentand identi ed readily But in the counterfactual settings inwhich causal effects are manifested wherein the realized Y

arises from self-selection into or out of treatment such selec-tion effects introduce an ambiguity [Eq (12)] and therebyconfound the ability to glean causal effects from part-2 esti-mates (unless as Angrist notes censored regression methodsare used but Angrist also offers some compelling argumentsagainst blind reliance on censored regression approaches) Assuch if estimation of and inference about causal effects arethe primary analytical concerns in data settings with Y para 0and Pr4Y D 05 gt 01 analysts may be well advised to avoidtwo-part modeling strategies and pursue some of the moredirect linear and nonlinear estimation approaches discussedby Angrist in which one-part estimation approachesmdashzerosand allmdashyield estimates of causal effects that will be directlyinterpretable

3 BEYOND CEFrsquoS

One noteworthy feature of the methods Angrist discussesmdashwith the main results attributed to Abadie (1999)mdashare theirapplicability to estimation of causal effects for characteristicsof the conditional distribution rdquo4Y mdashX1D5 beyond just CEFrsquosEstimation of causal or treatment effects for conditional quan-tiles (QTErsquos) and conditional distribution function ordinates(ldquodistilesrdquo) Pr4Y lt cmdashX1D5 is demonstrated to t properlywithin the weighting strategies advanced by Abadie Since

Todd Comment 25

quantile and distile analyses may be more relevant in address-ing particular analytical questions or policy issues than CEFrsquosthe ability to extend the mechanisms of causal-effect esti-mation strategies in such directions should be welcomed byapplied researchers

I might suggest that Angristrsquos claim about an advantage ofquantile causal analysis over distile causal analysis being thatthe latter relies on application-specic values of ordinates (c)is perhaps oversold or underdeveloped In many interestingand policy-relevant applications focus on the application-speci c ordinates is fundamental What factors cause employ-ers to offer more than one health insurance plan to employeesWhat factors cause consumers of alcoholic beverages to drinkmore than two drinks per day What factors cause elderlyindividuals to utilize more than one prescription drug prod-uct What factors cause second earners to work more than17 hours per week Each of these concerns an importantreal-world causal question for which the application-specicordinatemdashwhether for institutional legal physiological orother reasonsmdashis a fundamental ldquopivot pointrdquo for the analysisAnalysis of such distile relationshipsmdashwhether in causal or inreduced-form settingsmdashis a potentially powerful method forinforming decision makers and it seems to me that its mer-its relative to quantile analyses would have to be judged on acase-by-case basis

4 CONCLUSIONS

Practitioners of applied microeconometrics will nd thatthe time invested in a careful reading of Angristrsquos article hassizable returns not merely because of the analytical and con-ceptual insights it offers but ultimately also because it sug-gests a variety of important practical innovations that appliedresearchers can exploit empirically to more clearly understandthe nature of causal relationships in their data Applied microresearchers in areas like health labor development public nance and such work commonly with data wherein (a) out-come measures are limitedmdashoften most importantly becausethey are nonnegative with mass points at 0mdashand (b) the rolesof conditioning covariates (X rsquos) are most conveniently summa-rized via regression methods Angristrsquos article provides appliedresearchers so endowed with a set of powerful tools for elic-iting causal inferences from such data Of course the punchline is familiar and inevitable All bets are off without theo-retically solid instruments But with such instruments in handthe strategies exposited by Angrist offer analysts a rich set ofperspectives on estimation

ADDITIONAL REFERENCE

Wooldridge J M (1999) ldquoEstimating Average Partial Effects Under Con-ditional Moment Independence Assumptionsrdquo mimeo Michigan StateUniversity Dept of Economics

CommentPetra Todd

Department of Economics University of Pennsylvania Philadelphia PA 19104 (petraathonasasupennedu)

This article considers ways of estimating the effect ofbinary endogenous regressors in models with limited depen-dent variables It questions the usefulness of conventionalestimation strategies aimed at recovering structural modelparameters and advocates the use of simple instrumental vari-ables ( IV) estimators as an alternative on the grounds thatthese estimators invoke weaker assumptions and often suf ceto answer questions of interest in empirical studies In theauthorrsquos view the main challenge facing empirical researchersis the problem of identi cation of ldquotreatment effectsrdquo throughIV Here I expand the discussion by considering two ques-tions (1) What are the limitations of the causal model asa paradigm for policy analysis (2) When simple estimatorsare suitable for answering a question of interest what are thetrade-offs that need to be considered in using them

1 WHEN IS THE ldquoCAUSAL MODELldquoA USEFULPARADIGM FOR POLICY ANALYSIS

The article gives the impression that most interesting ques-tions in economics can be answered within the context ofthe ldquocausal modelrdquo [variously attributed to Neyman (1923)Fisher (1935) Cox (1958) Roy (1951) and Rubin (1978)]The causal model is a very general framework that assumesthat there are potential outcome states associated with having

received some treatment and having received no treatment (orpossibly a different treatment) For each individual only oneof the potential outcome states is observed which leads to amissing-data problem in attempting to draw inferences aboutaspects of the treatment effect distribution

The language of potential outcomes is very general soalmost any economic problem could be formulated in theseterms However this does not mean that estimators proposedin the literature for the causal model are useful for all or evenmost economic problems A major limitation of the modelis that it assumes that the state ldquowith treatmentrdquo has beenobserved for at least a subset of people In medical trials orbiological experiments this assumption is probably reason-able but in economics we often are interested in evaluatingeffects of treatments that have not yet been implemented Forexample we might be interested in predicting the effect ofraising the age receiving Social Security bene ts to a new ageor the effect of introducing new term limits on welfare par-ticipation An implicit assumption needed to apply any of theidenti cation results described in this article is that both Y1

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

26 Journal of Business amp Economic Statistics January 2001

and Y0 are observed which would not be satis ed in manycases Thus the causal model and the estimators developedfor it are inadequate when it comes to any economic questioninvolving a state of the world that has not been observed

Despite these limitations treatment effect estimates aresometimes used to predict consequences of new treatments orof existing treatments for new populations When the policychange being considered is very close to changes observedin the data and is to be applied to similar individuals thenextrapolation may be reasonable But if the policy change is(a) of a magnitude outside the range of experience (b) alonga new dimension or (c) intended for a new population then itis an open question as to how reliable the estimates would beTreatment impact estimates are not intended to capture any-thing invariant to changing conditions The assumptions thatwould be required to justify extrapolations or generalizationsare often quite strong To some extent it is an illusion thatthese estimators allow for general kinds of inference underweak assumptions because the assumptions required to justifytheir application for particular purposes are often not madeexplicit

Angrist argues that the identi cation results for the localaverage treatment effect (LATE) estimator provide a ldquofounda-tion for credible causal inferencerdquo (p 5) and a ldquominimum con-troversy jumping-off point for any prediction exerciserdquo (p 5)But as discussed previously the causal model maintains sucha high level of generality that it offers little guidance for manykinds of problems It is important to recognize when and whennot the LATE estimator is likely to be useful This articleclaims that ldquoin practice estimates of LATE differ little fromestimates based on the stronger assumptions invoked to iden-tify effects on the entire treated populationrdquo (p 5) But suchan extrapolation is not justi ed under the theory and is surelyunlikely to hold in all circumstances As shown in a num-ber of works by various authors LATE provides the averageeffect of treatment for the group of ldquocompliersrdquo who are peo-ple induced by the instrument to receive treatment This groupdoes not except under very special circumstances correspondto the group of treated persons and may in fact representonly a small fraction of persons receiving treatment So notonly does LATE not suf ce for answering problems of the sortdescribed by (a) (b) and (c) it also does not generally pro-vide the average effect of treatment on the treatedmdashthe mostcommon parameter of interest in analyzing social programsAnother not-so-attractive feature of the estimator is that thepopulation of ldquocompliersrdquo is usually not identi able so wecannot say exactly whose treatment impact is being estimatedApplying the LATE parameter estimate to the full populationof treated persons would generally require ruling out certaintypes of heterogeneity in individual responses to treatmentas discussed by Heckman (1997) The explicit assumptionsthat could justify this type of extrapolation were shown byHeckman and Vytlacil (2000 in press)

2 ESTIMATORS SHOULD BE TAILOREDTO THEIR APPLICATION

A general theme throughout the article is that wheneverpossible researchers should restrain from imposing paramet-ric restrictions in estimation because the structure they impose

may be wrong resulting in biased estimates This view isexpressed for example in the discussion of the two-partmodel The article advocates the use of a two-part approachinstead of a conventional selection-model approach maintain-ing that the two-part approach is less likely to lead to biasedparameter estimates because it does not impose cross-equationrestrictions as a conventional tobit estimator would

How do researchers decide whether or not to impose restric-tions across estimating equations that are structurally linkedThe most agnostic approach would be to estimate both equa-tions completely nonparametrically but this approach is rarelypractical Nonparametric estimators are consistent under themost general conditions but in conventional size samples thestandard errors would be so large that the estimates would forpractical purposes be useless Researchers impose structurebecause they are willing to restrict the class of models underwhich their estimators are consistent in exchange for greaterprecision These ef ciency considerations would also apply inthe deciding whether or not to impose cross-equation restric-tions in estimating simultaneous-equations models If we areright about the structure there is an ef ciency gain fromimposing the restrictions If we are wrong imposing them maylead to bias When there are overidentifying restrictions it isat least possible to develop tests of the model speci cation thatcan serve as a guide in choosing a structural model that is sup-ported by the data This article presents a dichotomy betweenthe two-part model and a fully parametric traditional selec-tion speci cation but there are many other modeling choicesranging from fully nonparametric to semiparametric to fullyparametric In modest size samples ef ciency is an impor-tant consideration and a parametric model may be the mostappropriate one For very large samples fewer parametricapproaches are feasible Both the probability-of-working andthe hours-worked equations could for example be estimatedby a semiparametric method such as semiparametric leastsquares ( Ichimura 1993) Which estimation method is mostappropriate ultimately depends on how much data is avail-able and how many parameters are to be estimated It doesnot make sense to criticize the use of structured approaches infavor of more exible modeling approaches without regard tothe context in which they are being applied

3 ON THE GAP BETWEEN THEORY ANDEMPIRICAL PRACTICE

Finally another recurring emphasis in the article is on thevalue of shortcuts in empirical practice In the authorrsquos viewthere is a gap between the econometric theory and what isfeasible in practice and shortcuts and approximations help tobridge this gap To this end the article extols for examplethe virtues of ordinary least squares (OLS) as a way of approx-imating any unknown conditional expectation function with-out much concern as to whether a best linear approximation isa good approximation If the model has discrete regressors andis fully saturated it is well known that OLS is equivalent to anonparametric estimator But in other cases a linear approxi-mation could be highly biased and the bias will not go awayas the sample size gets large There are a variety of alterna-tive asymptotically unbiased estimators Along similar lines

Angrist Reply 27

the article suggests that in implementing a ldquoselection-on-observablesrdquo estimator it may be all right simply to ignore theweighting scheme Such claims seem to me fairly outrageousbecause failure to take into account weighting could lead tovery different estimates and would not provide a consistentestimator of the parameter of interest Heckman Ichimura andTodd (1998) discussed alternative estimators that are not dif -cult to implement and take into account the weighting scheme

Finally the appendix presents a variety of shortcuts forobtaining estimates and standard errorsmdashsome of which leadto inconsistent estimates When shortcuts lead to very crudeapproximations or inconsistent estimates I question their use-fulness I do think it unreasonable to expect that estimatesbased on simple IV estimators be accompanied by correctstandard errors that correctly take into account the in uence ofestimated parameters of estimated regressors I see little valuein reporting incorrect standard-errors estimates on the groundsthat it saves time in computation especially when there areseveral works long available in the literature that develop sim-ple ways of solving these kinds of problems (eg see Newey1984) Although there is surely a gap between the estimatorsdeveloped in the theoretic econometric literature and the esti-mators used in the empirical literature the gap is not as greatas this article supposes

ADDITIONAL REFERENCES

Cox D R (1958) The Planning of Experiments New York WileyFisher R A (1935) Design of Experiments London Oliver and BoydHeckman J J (1997) ldquo Instrumental Variables Evidence From Evaluating a

Job Training Programmerdquo Journal of Human Resources 32 441ndash462Heckman J Ichimura H and Todd P (1997) ldquoMatching as an Econometric

Evaluation Estimator Theory and Evidence on its Performance Appliedto the JTPA Program Part I Theory and Methodsrdquo Review of EconomicStudies 64 605ndash654

mdashmdashmdash (1998) ldquoMatching as an Econometric Evaluationrdquo Review of Eco-nomic Studies 65 261ndash294

Heckman J and Vytlacil E (2000) ldquoCausal Parameters Structural Equa-tions Treatment Effects and Randomized Evaluations of Social Programsrdquounpublished manuscript University of Chicago Dept of Economics

mdashmdashmdash (in press) ldquoLocal Instrumental Variablesrdquo in Nonlinear StatisticalInference Essays in Honor of Takeshi Amemiya eds C Hsiao KMorimune and J Powell Cambridge UK Cambridge University Press

Ichimura H (1993) ldquoSemiparametric Least Squares (SLS) and Weighted SLSEstimation of Single Index Modelsrdquo Journal of Econometrics 58 71ndash120

Neyman J (1923) ldquoStatistical Problems in Agricultural Experimentsrdquo Sup-plement to the Journal of the Royal Statistical Society 2 107ndash180

Newey W (1984) ldquoA Method of Moments Interpretation of Sequential Esti-matorsrdquo Economic Letters 14 201ndash206

Roy A (1951) ldquoSome Thoughts on the Distribution of Earningsrdquo OxfordEconomic Papers 3 135ndash146

Rubin D (1978) ldquoBaysian Inference for Causal Effects The Role of Ran-domizationrdquo The Annals of Statistics 6 34ndash58

ReplyJoshua D Angrist

I thank the discussants and the session organizer outgoingJBES editor Jeff Wooldridge who will be missed by associateeditors and JBES authors alike The comments in this sessionoffer a variety of viewpoints some sympathetic others morecritical I am pleased to be part of this stimulating and con-structive exchange and look forward to more interaction ofthis type in the future

Hahn My article starts by suggesting that a focus ontreatment effects simpli es the problem of causal inferencein limited dependent variable (LDV) models with endoge-nous regressors Endogeneityrsquos half-sister is the assumptionof unobserved individual effects since the omitted-variablesmotivation for xed effects in panel data is similar Hahnasks whether a shift in focus from index parameters to aver-age causal effects might also simplify the notoriously dif -cult problem of working with nonlinear models for binarypanel data (eg as in Card and Sullivan 1988) Hahn showsthat here too the causal-effects framework pays off demon-strating that average causal effects are easily estimated in a xed-effects probit model with a binary regressor even thoughthe probit coef cient is not A second noteworthy featureof Hahnrsquos comment is his observation that identi cation inthis framework turns on careful modeling of the relationshipbetween the assignment variable and the xed effects Tradi-tional econometric models are primarily concerned with thestochastic process generating outcomes A shift of attentiontoward modeling the assignment mechanism is consistent with

the notion that observational studies should try to mimic thesort of controlled comparisons generated by experiments

Imbens Imbens begins by endorsing the view that the sub-stantive goals of empirical research are better served by thepotential outcomescausal framework than by structural mod-eling Going one step further Imbens notes that the problemswith ldquomodel- rst econometricsrdquo are not limited to LDVrsquos Inthe same spirit he sensibly wonders (as did Wooldridge 1992)why so much attention has been devoted to estimating theBoxndashCox transformation model which does not actually iden-tify E6Y mdashX7 without a distributional assumption Imbens alsonicely articulates the rationale for looking at the twins exam-ple and notes that this is similar to the rationale for extrap-olating from clinical trials in narrow subpopulations And infact I believe the ldquotwins experimentrdquo is likely to have pre-dictive value in a variety of situations though any extrapo-lation will probably be improved by control for demographiccharacteristics

In spite of (or perhaps because of) our years of discussionand collaboration Imbens and I still do not agree on certainthings He feels strongly that inference with discrete outcomesshould use models and procedures that respect inherent non-

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

28 Journal of Business amp Economic Statistics January 2001

linearities The HIRZ procedure discussed in his comment isan intriguing way to do this and my article discusses othersI remain unconvinced however that nonlinearity is of majorsubstantive importance in empirical work with binary regres-sors The proof of the pudding I suppose would be to showthat H IRZ or some of the methods I discuss in the article leadto better predictions or decisions than the linear workhorse

Mof tt Mof t provides a valuable counterpointHe beginsby arguing that latent-index models have more predictivepower than reduced-form causal models Such claims seemhard to establish a priori though I do not doubt that in somecases this may be true But empirical strategies in which thebulk of the research effort goes into estimating index coef -cients may fail to answer the most basic question in empiricalwork What happened here Causal models and local aver-age treatment effect (LATE) answer this question while indexcoef cients alone are not suf cient for causal inference Iagree however that structural modeling can be useful forgoing beyond basic questions and I have used index and otherstructural models for this purpose Mof t is right howeverin suggesting that I nd the model-free analysis he refers toas ldquoexploratoryrdquo more interesting than structural extrapolationThis seems to me to be where the potential for real discoverylies

Mof tt asks tough but fair questions about the twins appli-cation He is correct to point out that twinning is not a manip-ulable policy instrument That is one reason I do not limit theanalysis to reduced-form effects of multiple births Yet to mymind the best evidence we have on the controversial questionof the consequences of out-of-wedlock childbearing comesfrom the twins experiment (Bronars and Grogger 1994) Andthe economic consequences of childbearing are of interest inmany other contexts As noted in the introduction to my arti-cle the twins experiment can help determine whether fertilityreductions are responsible for trend increases in female laborsupply Beyond this sort of question the twins results seemuseful for assessing the economic consequences of increasedabortion access in the United States the dissemination of con-traceptive implants in developing countries (eg see KaneFarr and Janowitz 1990) and various programs to encour-age or require implant use by welfare mothers (Haveman andWolfe 1993)

Finally it is fair to question the external validity of twinsestimates as Mof t does toward the end of his commentAngrist and Evans (1998) showed that the labor-supply con-sequences of twinning can be reconciled with estimates ofthe labor-supply consequences of childbearing resulting froma very different natural experiment induced by preferences formixed-sex sibling pairs This suggests that results from bothexperiments may be quite robust and may even be ldquostructuralrdquoin the sense that they tell us something general about the eco-nomic response to childbearing

Mullahy Mullahy brings a welcome health economistrsquosperspective to the discussion Health economists have longgrappled with the issues discussed in my article and asI hope I made clear I drew ideas and inspiration fromMullahyrsquos work on models with nonnegative dependent vari-ables Mullahyrsquos comment further highlights exactly the sort

of utilitarian trade-off that underlies the choice of empiricalstrategy On one hand we would like our models to be con-sistent with any functional form restrictions known to be trueOn the other attempts at this may complicate estimation andinference for little payoff Worst of all the technical complex-ity of nonlinear models seems to cause authors to ldquowax econo-metricrdquo an effort that may come at the expense of attention tosubstantive issues of importance (an example here is the longseries of papers discussing exactly how data from the RANDhealth-insurance experiment should be analyzed) Mullahyrsquosheretical discussions of ldquoone-part modelsrdquo here and in earlierwork represent a refreshing effort to keep the empirical trainon track Mullahyrsquos point that application-specic distributionordinates can provide more useful summary information thanquantiles is also clearly worth thinking about

Todd Todd provides a useful and clear overview of thecausal framework which she suggests is better suited formedical questions than economic questions The thrust of herargument is that there is too much heterogeneity in economicinterventions and outcomes for the causal approach to bevery useful for economists But my perception is that medicalresearch is even more explicitly concerned with heterogene-ity than economics This is evidenced by the many quali ca-tions regarding patient characteristics conditions and inter-actions in medical treatment protocols A difference betweenthe prevailing medical research paradigm and the traditionaleconometric perspective however is whether the question ofheterogeneous response is to be addressed by better theoryor more evidence Of course theory helps us decide whereto look for heterogeneity but as Rosenbaum (1999 p 301)put it ldquo If a treatment has substantially different effects in dif-ferent types of people then we need to discover this (ital-ics mine) Rosenbaum also noted that ldquorandomized clinicaltrials routinely look for heterogeneous treatment effects andthey do this without representative samples from nationalpopulationsrdquo

On the technical side Todd suggests I advocated use ofthe two-part model because it exibly approximates nonlinearfunctional forms for nonnegative outcomes Although I notedthe exibility the upshot of my discussion was that the two-part model is poorly suited to causal inference because of thedif culty of interpreting part 2 Finally Todd points out thefailure to allow for sampling variance in the scaling factorsused to generate effects for nonlinear models This is indeed ahard-to-defend shortcut though I shall take a stab at defenseby noting that randomness of the scaling factor which typi-cally estimates a sample average is likely to be a minor sourceof sampling uncertainty

ADDITIONAL REFERENCES

Card D E and Sullivan D (1988) ldquoEffects of Subsidized Training onMovements In and Out of Employmentrdquo Econometrica 56 497ndash530

Haveman R and Wolfe B (1993) ldquoChildrenrsquos Prospects and ChildrenrsquosPolicyrdquo Journal of Economic Perspectives 7 153ndash174

Kane T Farr G and Janowitz B (1990) ldquo Initial Acceptability of Con-traceptive Implants in Four Developing Countriesrdquo International FamilyPlanning Perspectives 16 49ndash54

Rosenbaum P R (1999) ldquoChoice as an Alternative to Control in Observa-tional Studiesrdquo Statistical Science 14 259ndash304

Wooldridge J (1992) ldquoSome Alternatives to the BoxndashCox RegressionModelrdquo International Economic Review 33 935ndash955

Page 6: New Estimation of Limited Dependent Variable Models With Dummy …piketty.pse.ens.fr/…ecoineg/articl/Angrist2001b.pdf · 2011. 12. 5. · sequences of changes in contraceptive technology

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 7

The two-part model (2PM) introduced by Cragg (1971) andwidely used in health economics seems to provide a lessdemanding framework for the analysis of LDVrsquos than sample-selection models The two parts of the 2PM are P6Yi gt 0mdashDi7

and E6YimdashYi gt 01 Di7 Researchers using this model simply

pick a functional form for each part For example probit or alinear probability model might be used for the rst part anda linear or log-linear model might be used for the secondpart Log-linearity for the second part may be desirable sincethis imposed nonnegativity of tted values (eg see Mullahy1998)

One attraction of the 2PM is that it ts a nonlinear func-tional form to the conditional expectation function (CEF) forLDVrsquos even if both parts are linear On the other hand tobit-type sample-selection models t a nonlinear CEF as well pro-vided there are covariates other than Di For example the CEFimplied by (11) with latent-index Y uuml

0iD X

0

i Œ C Di C ˜i and aNormal homoscedastic error is

E6YimdashXi1Di7 D ecirc64X

0

iΠC Di5=lsquo76X0

i ΠC Di7

C lsquo64X0

i ΠC Di5=lsquo71 (13)

where 4iexcl5 and ecirc4iexcl5 are the standard Normal density anddistribution functions and lsquo is the standard deviation of ˜i

(eg see McDonald and Mof t 1980) The derivative of thisCEF with respect to Di is ecirc64X

0

iΠC Di5=lsquo7 The nonlinearity of (13) notwithstanding at rst blush the

2PM seems to provide a more exible nonlinear speci cationthan tobit or other sample-selection models The latter imposesrestrictions tied to the latent-index structure while the twoparts of the 2PM can be speci ed in whatever form seemsconvenient and ts the data well (a point made by Lin andSchmidt 1984) A signal feature of the 2PM however andthe main point of contrast with sample-selection models isthat the 2PM does not attempt to solve the sample-selectionproblem in (10b) Thus the second part of the 2PM does nothave a clear-cut causal interpretation even if Di is randomlyassigned Similarly and of particular relevance here is thepoint that instrumental variables that are valid for estimatingthe effect of Di on Yi are not valid for estimating the effect ofDi on Yi conditional on Yi gt 0 The 2PM therefore seems illsuited for causal inference

23 Effects on Distributions

Conditional-on-positive effects and sample-selection modelsare sometimes motivated by interest in the consequence of Di

beyond the impact on average outcomes [eg see EichnerMcClellan and Wisersquos (1997) analysis of insurance effects onhealth expenditure] Are there schemes for estimating effectson distributions that are less demanding than sample-selectionmodels Once the basic problem of identifying causal effectsis resolved the impact of Di on the distribution of outcomesis identi ed and can be easily estimated

To see this for the experimental (exogenous Di) case notethat given the assumed independence of Di of Y0i the follow-

ing relationship holds for any point c in the support of Yi

E614Yi micro c5mdashDiD 17 ƒ E614Yi micro c5mdashDi

D 07

D P6Y1i micro cmdashDiD 17 ƒ P6Y0i micro cmdashDi

D 170

In fact the entire marginal distributions of Y1i and Y0i are iden-ti ed for those with Di

D 1 So it is easy to check whether Di

has an impact on the probability that YiD 0 as in the rst part

of the 2PM or whether there is a change in the distribution ofoutcomes at any positive value This information is enough tomake social-welfare comparisons as long as the comparisonsof interest involve marginal distributions only

24 Covariates and Nonlinearity

The conditional expectation of Yi given Di is inherentlylinear as are other conditional relationships involving Di

alone Suppose however that identi cation is based on aldquoselection-on-observablesrdquo assumption instead of presumedrandom assignment This means that causal inference is basedon the presumption that Y0i

q DimdashXi1 and causal effects must

be estimated after conditioning on Xi For example the effectof treatment on the treated can be expressed as

E6Y1iƒ Y0i

mdashDiD 17

D E8E6Y1imdashXi1 Di

D 17 ƒ E6Y0imdashXi1Di

D 17mdashDiD 19

DZ

8E6Y1imdashXi1Di

D 17ƒ E6Y0imdashXi1Di

D 079

P4XiD xmdashD D 15dx1 (14)

where P4X D xmdashD D 15 is a distribution of density Estima-tion using the sample analog of (14) is straightforward if Xi

has discrete support with many observations per cell (eg seeAngrist 1998) Otherwise some sort of smoothing (model-ing) is required to estimate the CEFrsquos E6Y1i

mdashXi1 DiD 17 and

E6Y0imdashXi1 Di

D 07Regression provides a exible and computationally attrac-

tive smoothing device A conceptual justi cation for regres-sion smoothing is that population regression coef cientsprovide the best (minimum mean squared error) linear approx-imation to E6Yi

mdashXi1Di7 (eg see Goldberger 1991) Thisldquoapproximation propertyrdquo holds regardless of the distributionof Yi

Separate regressions can be used to approximateE6Y1i

mdashXi1DiD 17 and E6Y0i

mdashXi1DiD 17 though this leaves

the problem of estimating P4XiD xmdashD D 15 to compute the

average difference in CEFrsquos using (14) On the other handa simple additive modelmdashsay E6Yi

mdashXi1 Di7 D X0

i sbquorC r Dimdash

sometimes works well in the sense that r mdashthe ldquoregressionestimandrdquomdashis close to average effects derived from modelsthat allow for nonlinearity and interactions between Di andXi With discrete covariates and a saturated model for Xi theadditive model can be thought of as implicitly producing aweighted average of covariate-speci c contrasts Although theregression weighting scheme differs from that in (14) in prac-tice treatment-effect heterogeneity may be limited enough thatalternative weighting schemes have little impact on the over-all estimate (see Angrist and Krueger 1999 for more on thispoint)

8 Journal of Business amp Economic Statistics January 2001

3 ENDOGENOUS REGRESSORSTRADITIONAL SOLUTIONS

LDV models with endogenous regressors were rst esti-mated using distributional assumptions and maximum likeli-hood (ML) An early and in uential work in this mold isthat of Heckman (1978) This approach is not wedded toML Heckman (1978) Amemiya (1978 1979) Newey (19861987) and Blundell and Smith (1989) discussed two-stepprocedures minimum-distance estimators generalized leastsquares (GLS) estimators and other variations on the tradi-tional ML framework Semiparametric estimators based onweaker distributional assumptions were discussed by amongothers Newey (1985) and Lee (1996)

The basic idea behind these strategies can be described asfollows Take the censored-regression model from (11) andadd a latent rst stage with instrumental variables Zi So thecomplete model is

YiD 16Yi gt 07Y uuml

i

Y uumli

D Y uuml0i

C 4Y1iƒ Y uuml

0i5DiD Y uuml

0iC Di D Œ C Di C ˜i

DiD 16ƒ0 C ƒ1Zi gt Daggeri70 (15)

The principal identifying assumption here is

4Y0i1 Daggeri5 q Zi0 (16)

Parametric schemes use two-step estimators or ML to esti-mate Semiparametric estimators typically work by substi-tuting an estimated conditional expectation sup1E6Di

mdashZi7 for Di

and then using a nonparametric or semiparametric procedureto estimate the pseueo reduced-form [eg Manskirsquos (1975)maximum score estimator for binary outcomes]

The problems I see with this approach are the same as listedfor censored regression models with exogenous regressorsFirst latent index coef cients are not causal effects If the out-come is binary semiparametric methods estimate scaled indexcoef cients and not average causal effects Similarly censoredregression parameters alone are not enough to determine thecausal effect of Di on the observed Yi [ I should note that thiscriticism does not apply to parametric estimators where dis-tributional assumptions can be used to recover causal effectsor to a recently developed semiparametric method by Blundelland Powell (1999) for continuous endogenous variables] Sec-ond this approach turns heavily on the latent-index setup Wecan add to these points the fact that even weak distributionalassumptions like conditional symmetry fail for the reduced-form error term 4Di

ƒ sup1E6DimdashZi75 ƒ ˜i since Di is binary (a

point made by Lee 1996)A nal observation is that given assumption (16) this

whole setup is unnecessary for causal inference The effectof treatment on the observed Yi is identi ed for those womenwhose childbearing behavior is affected by the instrument Thetwins instrument for example identi es the effect of Di onmothers who would not have had a third child without a mul-tiple second birth (see result 4 in the earlier lemma) It maybe of interest to extrapolate from this grouprsquos experiences tothose of other women but the extrapolation problem is distinctfrom the problem of identifying the causal effect of childbear-ing in the ldquotwins experimentrdquo

4 NEW ECONOMETRIC METHODS

Conditional moments and other probability statementsinvolving Di alone are necessarily linear but causal relation-ships involving covariates are likely to be nonlinear unless thecovariates are discrete and the model is saturated LDV mod-els like probit and tobit are often used because of a concernthat unless the model is saturated LDVrsquos lead to nonlinearCEFrsquos The 2PM is sometimes also motivated this way (egsee Duan et al 1984)

The headaches induced by nonlinearity notwithstandingthere are simple schemes for estimating causal effects in LDVmodels with endogenous regressors and covariates In thissection I discuss three strategies for estimating effects onmeans and two for estimating effects on distributions All butthe rst are based on new models and methods None are tiedto an underlying structural model

The simplest option for estimating effects on means isundoubtedly to ldquopuntrdquo by using a linear constant-effectsmodel to describe the relationship of interest

E6Y0imdashXi7 D X 0

isbquo (17a)

and

Y1iD Y0i

C 0 (17b)

The assumptions lead a linear causal model

YiD X 0

isbquo C DiC ˜i1 (18)

easily estimated by 2SLSAlthough the constant-effects assumption is clearly unreal-

istic in practice more general estimation strategies often leadto similar average effects A second issue that arises in thissetting is that because Di is binary a nonlinear rst-stage suchas probit or logit may seem appropriate for 2SLS estimationof (18) But the resulting second-stage estimates are inconsis-tent unless the model for the rst-stage CEF is actually cor-rect On the other hand conventional 2SLS estimates usinga linear probability model are consistent whether or not the rst-stage CEF is linear So it is generally safer to use a linear rst-stage Alternatively consistent estimates can be obtainedby using a linear or nonlinear estimate of E6Di

mdashXi1 Zi7 asan instrument (This is the same as the plug-in- tted-valuesmethod when the rst-stage is linear) See Kelejian (1971) orHeckman (1978 pp 946ndash947) for a discussion of this pointand additional references ( It is also worth noting that a probit rst-stage cannot even be estimated for the twins instrumentbecause P6Di

D 1mdashXi1 ZiD 17 D 1 for twins)

41 IV for an Exponential Conditional Mean

A linear model like (18) is obviously unrealistic for binaryoutcomes and fails to incorporate natural restrictions on theCEF for nonnegative LDVrsquos This motivated Mullahy (1997)to estimate causal effects on nonnegative LDVrsquos using a multi-plicative model similar to that used by Wooldridge (1999) forpanel data The Mullahy (1997) model can be written in mynotation as follows Let Xi be a vector of observed covariates

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 9

as before and let mdashi be an unobserved covariate correlatedwith Di and Y0i The fact that this covariate is unobserved isthe reason we need to instrument

Let Zi be a candidate instrument Conditional on observedand unobserved covariates both treatment status and theinstrument are assumed to be independent of potential out-comes

Y0iq 4Di1Zi5 mdash Xi1mdashi (19a)

Moreover conditional on observed covariates the candidateinstrument is independent of the unobserved covariate

Ziq mdashi

mdash Xi1 (19b)

though mdashi and Di are not conditionally independent The CEFfor Y0i is constrained to be nonnegative using an exponentialmodel and (19a)

E6Y0imdashDi1 Zi1 Xi1mdashi7 D exp4X 0

isbquo C mdashi5

D mdashuumli exp4X 0

isbquo51 (19c)

where we also assume that the unobservable covariate hasbeen de ned so that E6mdash uuml

imdashXi7 D 1 (this is a normalization

because we can de ne mdash D ƒ16ldquo ƒ ln4E6eldquo mdashX757 where ldquo isunrestricted)

Finally the conditional-on-X-and-mdash average treatmenteffect is assumed to be proportionally constant again using anexponential model that ensures nonnegative tted values

E6Y1imdashDi1Zi1Xi1mdashi7 D e E6Y0i

mdashDi1Zi1Xi1mdashi7

D e E6Y0imdashXi1 mdashi70 (19d)

Combining (19c) and (19d) we can write YiD exp4X 0

isbquo CDi

C mdashi5 C ˜i where E6˜imdashDi1 Zi1 Xi1 mdashi7 sup2 0 These

assumptions imply

E8exp4ƒX 0isbquo ƒ Di5Yi

ƒ 1mdashXi1Zi9 D 01 (20)

so (20) can be used for estimation provided Zi has an impacton Di The proportional average treatment effect in this modelis e ƒ 1 or approximately for small values of

Estimation based on (20) guarantees nonnegative ttedvalues without dropping zeros as a traditional log-linearregression model would The price for this is a constant-proportional-effects setup and the need for nonlinear estima-tion It is interesting to note however that with a binaryinstrument and no covariates (20) generates a simple closed-form solution for In the appendix I show that with-out covariates the proportional treatment effect in Mullahyrsquosmodel can be written

e ƒ1D E6YimdashZi

D17ƒE6YimdashZi

D07

ƒ8E641ƒDi5YimdashZi

D17ƒE641ƒDi5YimdashZi

D0790

(21)

In light of this simpli cation it seems worth asking if theright side of (21) has an interpretation that is not tied tothe constant-proportional-effects model To develop this inter-pretation let D0i and D1i denote potential treatment assign-ments indexed against the binary instrument For example an

assignment mechanism such as (15) determines D0i as fol-lows D0i

D 16ƒ0 gt Daggeri7 and D1iD 16ƒ0 C ƒ1 gt Daggeri7 Using this

notation Imbens and Angrist (1994) showed that

E6YimdashZi

D 17 ƒ E6YimdashZi

D 07 D E6Y1iƒ Y0i

mdashD1i gt D0i7

8E6DimdashZi

D 17ƒ E6DimdashZi

D 0790 (22)

The term E6Y1iƒ Y0i

mdashD1i gt D0i7 is the LATE parameter men-tioned in Lemma 1

The same argument used to establish (22) can also be usedto show a similar result for the average of Y0i [ie insteadof the average of Y1i

ƒ Y0i see Abadie (2000a) for details] Inparticular

E641 ƒ Di5YimdashZi

D 17 ƒ E641 ƒ Di5YimdashZi

D 07

D ƒE6Y0imdashD1i gt D0i7

8E6DimdashZi

D 17 ƒ E6DimdashZi

D 0790 (23)

Substituting (22) and (23) for the numerator and denominatorin (21) we have

e ƒ 1 D E6Y1iƒ Y0i

mdashD1i gt D0i7=E6Y0imdashD1i gt D0i70 (24)

Thus Mullahyrsquos procedure estimates a proportional LATEparameter in models with no covariates The resulting esti-mates therefore have a causal interpretation under muchweaker assumptions than (19a)ndash(19d) Moreover the exponen-tial model used for covariates in (19c) seems natural for non-negative dependent variables and has a semiparametric avorsimilar to proportional hazard models for duration data

42 Approximating Causal Models

Suppose that the additive constant-effects assump-tions (17a) and (17b) do not hold but we estimate (18) by2SLS anyway It seems reasonable to imagine that the result-ing 2SLS estimates can be interpreted as providing some sortof ldquobest linear approximationrdquo to an underlying nonlinearcausal relationship just as regression provides the best linearpredictor (BLP) for any CEF Perhaps surprisingly however2SLS does not provide this sort of linear approximation ingeneral On the other hand in a recent article Abadie (2000b)introduced a Causal-IV estimator that does have this property

Causal-IV is based on the assumptions used by Imbens andAngrist (1994) to estimate average treatment effects Underthese assumptions it can be shown that treatment is indepen-dent of potential outcomes conditional on being in the groupwhose treatment status is affected by the instrument (ie thosewith D1i gt D0i the group of ldquocompliersrdquo mentioned earlier)This independence can be expressed as

Y0i1 Y1iq Di

mdashXi1D1i gt D0i0 (25)

A consequence of (25) is that for compliers comparisons bytreatment status have a causal interpretation

E6YimdashXi1Di

D 11D1i gt D0i7 ƒ E6YimdashXi1Di

D 01 D1i gt D0i7

D E6Y1iƒ Y0i

mdashXi1D1i gt D0i70

10 Journal of Business amp Economic Statistics January 2001

For this reason Abadie (2000b) called E6YimdashXi1 Di1D1i gt D0i7

the Complier Causal Response Function (CCRF)Now consider choosing parameters b and a to minimize

E64E6YimdashXi1Di1 D1i gt D0i7ƒX 0

i b ƒ aDi52mdashD1i gt D07 or equiv-

alently E64YiƒX 0

ib ƒ aDi52mdashD1i gt D07 This choice of b and a

provides the minimum mean squared error (MMSE) approx-imation to the CCRF Since the set of compliers is not iden-ti ed this minimization problem is not feasible as writtenHowever it can be shown that

E6Ši4E6YimdashXi1 Di1D1i gt D0i7 ƒ X 0

ib ƒ aDi527=P6D1i gt D0i7

D E64E6YimdashXi1Di1 D1i gt D0i7ƒ X 0

ib ƒ aDi52mdashD1i gt D071

where ŠiD 1 ƒ Di41 ƒ Zi5=41 ƒ E6Zi

mdashXi75 ƒ 41 ƒ Di5 šZi=E6Zi

mdashXi7 Since Ši can be estimated the MMSE linearapproximation to the CCRF can also be estimated The resultis a weighted least squares estimation problem with weightsgiven by the estimated Ši

It is also worth noting that although the preceding discus-sion focuses on linear approximation of the CCRF any func-tion can be used for the approximation For binary outcomesfor example we might use ecirc6X 0

ib C aDi7 and choose param-eters to minimize E6Ši4Yi

ƒ ecirc6X 0ib ƒ aDi75

27 Similarly fornonnegative outcomes it seems sensible to use an exponentialmodel exp6X 0

ib C aDi7 and choose parameters to minimizeE6Ši4Yi

ƒexp6Xi0b ƒaDi75

27 Abadiersquos framework allows ex-ible approximation of the CCRF using any functional formthe researcher nds appealing and convenient The resultingestimates have a robust causal interpretation regardless of theshape of the actual CEF for potential outcomes

43 Distribution and Quantile Treatment Effects

If Yi has a mass point at 0 the conditional mean pro-vides an incomplete picture of the causal impact of Di on Yi We might like to know for example how much of the aver-age effect is due to changes in participation and how muchinvolves changes elsewhere in the distribution This sometimesmotivates separate analyses of participation and conditional-on-positive effects In Section 23 I argued that questionsregarding the effect of treatment on the distribution of out-comes are better addressed by comparing distributions Simplycomparing distributions is ne for the analysis of experimentaldata but what if covariates are involved As with the analy-sis of mean outcomes the simplest strategy is 2SLS in thiscase using linear probability models for distribution ordinates16Yi micro c7 D Xi

0sbquocC cDi

C ˜ci Of course the linear model isnot literally correct for conditional distribution except in spe-cial cases (eg a saturated regression parameterization)

Here too the Abadie (2000b) weighting scheme can be usedto generate estimates that provide an MMSE error approxima-tion to the underlying distribution function (see Imbens andRubin 1997 for a related approach to this problem) The esti-mator in this case chooses bc and ac to minimize the sampleanalog of the following population minimand

E6Ši416Yi micro c7 ƒ X 0ibc

ƒ acDi5270 (26)

The resulting estimates provide the BLP for P6Yi microcmdashXi1Di1D1i gt D0i7 The latter quantity has a causal interpre-tation because

P6Yi micro cmdashXi1DiD 11D1i gt D0i7

ƒ P6Yi micro cmdashXi1 DiD 01 D1i gt D0i7

D P6Y1i micro cmdashXi1D1i gt D0i7

ƒ P6Y0i micro cmdashXi1D1i gt D0i70

Since the outcome is binary nonlinear models such asprobit or logit might also be used to approximate P6Yi microcmdashXi1Di1D1i gt D0i7 Likewise it is equally straightforward touse Abadiersquos weighting scheme to approximate the probabilitythat the outcome falls into an interval instead of the cumula-tive distribution function

An alternative to estimation based on (26) postulates a linearmodel for quantiles instead of distribution ordinates Conven-tional quantile regression (QR) models for exogenous regres-sors begin with a linear speci cation Qˆ6Yi

mdashXi1Di7 D X 0iŒˆ0

CŒˆ1Di The parameters 4Œˆ01Œˆ15 can be shown to minimizeE6ˆ4Yi

ƒ X 0im0 ƒ m1Di057 where ˆ4ui5 D ˆuC

iC 41ƒˆ5uƒ

i iscalled the ldquocheck functionrdquo (see Koenker and Basset 1978)This minimization is computationally straightforward since itcan be written as a linear programming problem

The analysis of quantiles has two advantages Firstquantiles like the median quartiles and deciles providebenchmarks that can be used to summarize and compareconditional distributions for different outcomes In contrastthe choice of c for the analysis of distribution ordinatesis application-specic Second since nonnegative LDVrsquos areoften (virtually) continuously distributed away from any masspoints linear models are likely to be more accurate for condi-tional quantiles than for conditional probabilities [For quan-tiles close to the censoring point Powellrsquos (1986b) censoredQR model may be more appropriate]

Abadie Angrist and Imbens (1998) developed a QRestimator for models with a binary endogeneous regres-sor Their quantile treatment effects (QTE) procedure beginswith a linear model for conditional quantiles for compliersQˆ6Yi

mdashXi1Di1 D1i gt D0i7 D X 0isbquoˆ

CˆDi The coef cient ˆ hasa causal interpretation because Di is independent of potentialoutcomes conditional on Xi and the event D1i gt D0i There-fore ˆ

D Qˆ6Y1imdashXi1 D1i gt D0i7 ƒ Qˆ6Y0i

mdashXi1 D1i gt D0i7 Inother words ˆ is the difference in ˆ quantiles for compliers

QTE parameters are estimated by minimizing a sampleanalog of the following weighted check-function minimandE6Šiˆ4Yi

ƒ X 0ib ƒ aDi57 As with the Causal-IV estimators

weighting by Ši transforms the conventional QR minimandinto a problem for compliers only For computational rea-sons however it is useful to rewrite this as E6 QŠiˆ4Yi

ƒX 0ib ƒ

aDi057 where QŠiD E6Ši

mdashXi1 Di1 Yi7 It is possible to show thatE6Ši

mdashXi1Di1 Yi7 D P6D1i gt D0imdashXi1Di1 Yi7 gt 0 This modi ed

estimation problem has a linear programming representationsimilar to conventional quantile regression since the weightsare positive Thus QTE estimates can be computed usingexisting QR software though this approach requires rst-step

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 11

estimation of QŠi In the following example I use the fact that

QŠiD E6Ši

mdashXi1 Di1 Yi7 D 1 ƒ Di41ƒ E6ZimdashYi1Di1Xi75

41ƒ E6ZimdashXi75

ƒ 41 ƒ Di5E6ZimdashYi Di1Xi7

E6ZimdashXi7

(27)

and estimate E6ZimdashYi1Di1Xi7 and E6Zi

mdashXi7 with a probit rststep Since QŠi is theoretically supposed to be positive negativeestimates of QŠi are set to 0

5 APPLICATION LABORndashSUPPLYCONSEQUENCES OF A THIRD CHILD

The estimation uses a sample of roughly 250000 marriedwomen aged 21ndash35 with at least two children drawn from the1980 Census 5 le About 53 of the women in this sampleworked in 1979 Overall (ie including zeros) women in thesample worked about 17 hours per week This can be seen inthe rst column of Table 1 which reports descriptive statis-tics and repeats some of the OLS and 2SLS estimates fromAngrist and Evans (1998) Roughly 38 of women in thissample had a third child an event indicated by the variableMorekids The OLS estimates in column (2) show that womenwith Morekids D 1 were about 17 percentage points less likelyto have worked in 1979 and worked about 6 hours fewer perweek than women with Morekids D 0 The covariates in thisregression are age age at rst birth a dummy for male rst-born a dummy for male secondborn and Black Hispanic andother race indicators

Table 1 also reports estimates of average effects computedusing nonlinear models still treating Morekids as exogenousThese average effects are approximations to effects of treat-ment on the treated evaluated using derivatives to simplifycomputations A detailed description of the average-effectscalculations appears in the appendix Probit estimates of theaverage impact on employment shown in column (3) arealmost identical to the OLS estimates Similarly the tobit esti-mate of the average effect of Di on hours worked shown incolumn (4) is ƒ6001 remarkably close to the OLS estimateof ƒ6002 (Note however that the tobit coefcient is ƒ1107)Column (5) of Table 1 reports estimated average effects froma two-part model in which both parts of the model are linear

Table 1 Descriptive Statistics and Baseline Results

Morekids exogenous Morekids endogenous

Nonlinear models Reduced forms2SLS

Dependent Mean OLS Probit Tobit 2PM (Morekids) (Depvar) effectvariable (1) (2) (3) (4) (5) (6) (7) (8)

Employment 0528 ƒ0167 ƒ0166 mdash mdash 0627 ƒ0055 ƒ0088(0499) (0002) (0002) (0003) (0011) (0017)

Hours worked 1607 ƒ6002 mdash ƒ6001 ƒ5097 0627 ƒ2023 ƒ3055(1803) (0074) (0073) (0073) (0003) (0371) (0592)

NOTE The sample includes 254654 observations and is the same as that of Angrist and Evans (1998) The instrument is an indicator for multiple births The mean of the endogenousregressor is 381 The probability of a multiple birth is 008 The model includes as covariates age age at rst birth boy rst boy second and race indicators Standard deviations are shownin parentheses in column 1 Standard errors are shown in parentheses in other columns

The 2PM estimate is virtually identical to the tobit and OLSestimates

Roughly 810 of 1 of women in the extract had a twinsecond birth (multiple births are identi ed in the 1980 Cen-sus using age and quarter of birth) Reduced-form estimates ofthe effect of a twin birth are reported in columns (6) and (7)The reduced forms show that women who had a multiple birthwere 63 percentage points more likely to have had a third childthan women who had a singleton second birth Mothers oftwins were also 55 percentage points less likely to be work-ing (standard error D 001) and worked 22 fewer hours perweek (standard error D 037) The 2SLS estimates derived fromthese reduced forms reported in column 8 show an impactof about ƒ009 (standard error D 002) on employment rates andƒ306 (standard error D 06) on weekly hours These estimatesare just over half as large as the OLS estimates suggestingthat the latter exaggerate the causal effects of childbearingOf course the twins instrument is not perfect and the 2SLSestimates may also be biased For example twinning probabil-ities are slightly higher for certain demographic groups ButAngrist and Evans (1998) found that 2SLS estimates usingtwins instruments are largely insensitive to the inclusion ofcontrols for mothersrsquo personal characteristics

Two variations on linear models generate estimates iden-tical or almost identical to the conventional 2SLS estimatesThis can be seen in columns (2) and (3) of Table 2 whichreport 2SLS estimates of a 2PM for hours worked and Abadie(2000b) Causal-IV estimates of linear models for employmentand hours The Causal-IV estimates use probit to estimateE[Z|X] and plug this into the formula for Ši The second stepin Causal-IV estimation is a weighted least squares problemwith some negative weights Since use of negative weights isnonstandard for statistical packages (eg Stata does not cur-rently allow this) I used a MATLAB program available fromAlberto Abadie to compute the estimates The 2PM estimatesin Table 3 were constructed from 2SLS estimates of a linearprobability model for participation and 2SLS estimates of alinear model for hours worked conditional on working In prin-ciple the 2PM estimates do not have a causal interpretationbecause the instruments are not valid conditional on workingIn practice however 2SLS estimates of the 2PM differ littlefrom conventional 2SLS estimates

The estimates of nonlinear=nonstructural models for meaneffects are mostly similar to each other and to conventional

12 Journal of Business amp Economic Statistics January 2001

Table 2 Impact on Mean Outcomes (Morekids endogenous)

Linear models Nonlinear models Structural models

Causal-IV Causal-IV Causal-IV Bivar Endog Mills 2SLSDependent 2SLS 2PM linear Mullahy probit expon probit tobit ratio benchmarkvariable (1) (2) (3) (4) (5) (6) (7) (8) (9) (10)

A With covariates

Employment ƒ0088 mdash ƒ0089 mdash ƒ0088 mdash ƒ0124 mdash mdash ƒ0089(0017) (0017) (0016) (0016) (0017)

Hours worked ƒ3055 ƒ3054 ƒ3055 ƒ3082 mdash ƒ3021 mdash ƒ3081 ƒ4051 ƒ3060(0592) (0598) (0592) (0598) (0694) (0580) (0549) (0599)

B No covariates

Employment ƒ0084 mdash ƒ0084 mdash ƒ0084 mdash ƒ0086 mdash mdash ƒ0084(0017) (0017) (0017) (0017) (0018)

Hours worked ƒ3047 ƒ3037 ƒ3047 ƒ3010 mdash ƒ3012 mdash ƒ3035 ƒ3048 ƒ3052(0617) (0614) (0617) (0561) (0616) (0642) (0641) (0624)

NOTE Sample and covariates are the same as in Table 1 Results for nonlinear models are derivative-based approximations to effect on the treated Causal-IV estimates are based on aprocedure discussed by Abadie (2000b) Standard errors are shown in parentheses

2SLS estimates Results from nonlinear models are againreported as marginal effects that approximate average effectson the treated For example a causal probit model for employ-ment status generates an average effect of ƒ0088 identical (upto the reported accuracy) to the 2SLS estimate Causal-IV esti-mation of an exponential model for hours worked the resultof a procedure that minimizes E6 QŠi4Yi

ƒ exp6X0

i b ƒ aDi7527

generates an estimate of ƒ3021 This too differs little from theconventional 2SLS estimate of ƒ3055 Similarly the Mullahyestimate of ƒ3082 in column (4) is less than 8 larger thanconventional 2SLS in absolute value It is noteworthy how-ever that the Mullahy model generates results that changemarkedly (falling by about 20) when the covariates aredropped Since the covariates are not highly correlated withthe twins instrument this lack of robustness to the inclusionof covariates seems undesirable On the other hand with-

Table 3 Impact on the Distribution of Hours Worked

Distribution treatment effects

Exogenous Morekids Endogenous Morekids Quantile treatment effects

OLS Ordered Causal CausalRange (LPM) Probit probit 2SLS LPM probit Quantile QR QTE(mean) (1) (2) (3) (4) (5) (6) (value) (7) (8)

0 0167 0166 0147 0088 0089 0088 5 ƒ8092 ƒ5024(472) (0002) (0002) (0002) (0017) (0017) (0016) (8) (0186) (0686)1ndash10 0001 0001 0002 ƒ0001 ƒ0001 ƒ0002 6 ƒ1207 ƒ7098(046) (0001) (0001) (000003) (0007) (0007) (0006) (20) (0172) (0860)11ndash20 ƒ0015 ƒ0014 ƒ0011 0002 0002 0002 7 ƒ9054 ƒ6019(093) (0001) (0001) (00001) (0010) (0010) (0010) (35) (0184) (1007)21ndash30 ƒ0024 ƒ0022 ƒ0014 ƒ0004 ƒ0005 ƒ0006 75 ƒ6045 ƒ3060(075) (0001) (0001) (00002) (0009) (0009) (0010) (40) (0156) (1017)31ndash40 ƒ0119 ƒ0110 ƒ0097 ƒ0072 ƒ0072 ƒ0071 8 ƒ1000 0000(277) (0002) (0002) (0001) (0014) (0014) (0016) (40) (0286) (1009)41+ ƒ0009 ƒ0008 ƒ0023 ƒ0009 ƒ0009 ƒ0006 9 mdash mdash

(027) (0001) (0001) (00003) (0005) (0005) (0007) (40)

NOTE The table reports probability-model and (QTE) estimates of the impact of childbearing on the distribution of hours worked The sample and covariates are the same as in Panel A ofTable 2 Causal-IV estimates are based on a procedure discussed by Abadie (1999) Quantile treatments are based on a procedure discussed by Abadie et al (1998) Standard errors areshown in parentheses The standard errors in columns (7) and (8) are bootstrapped

out covariates the Mullahy estimates are close to those fromCausal-IV with an exponential model

The bivariate probit estimate of the effect of childbear-ing on employment status reported in column 7 is ƒ012roughly a third larger in absolute value than the conventional2SLS estimate Interestingly in another application Abadie(2000b) also found that bivariate probit estimates are largerthan Causal-IV The gap between bivariate probit and Causal-IV estimates of effects on labor supply appears to be a conse-quence of the probit model for exogenous covariates Withoutcovariates bivariate probit generates estimates that are veryclose to the results from the other estimators It should benoted however that bivariate probit is not really appropriatefor twins instruments because the probability Morekids D 1 isequal to 1 for twins The probit ML estimator does not existin this case Therefore to compute all of the estimates using a

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 13

probit rst-stage (bivariate probit endogenous tobit and Millsratio) I randomly recoded 1 of Di observations to 0 Inprinciple measurement error in a binary endogenous regressorbiases 2SLS estimates (see Kane Rouse and Staiger 1999)In this case however column (10) shows that 2SLS estimateswith the randomly recoded data differ little from 2SLS esti-mates using the original data

Column (8) of Table 2 reports estimates of a structuraltobit model with endogeneous regressors These estimateswere computed using a two-step estimator that approximatesthe MLE again with recoded data The two-step procedureadds a Mills-ratio type endogeneity correction to a censoredregression and then applies tobit to the censored regres-sion model with the correction term [The correction term islsquo˜6Di4ƒ i=ecirci5C 41ƒDi5 i=41ƒecirci57 where is the corre-lation between the latent error determining treatment assign-ment and the outcome residual lsquo˜ is the standard deviation ofthe outcome residual and i and ecirci are Normal density anddistribution functions evaluated at the probit rst-stage ttedvalues see Heckman and Robb (1985)]

As with the bivariate probit estimate the endogenous tobitestimate is somewhat larger in magnitude than conventional2SLS This may be because of the Mills-ratio procedure forcontrolling for the endogeneity of Di more than the tobitcorrection for nonnegative outcomes To see this note thatthe Mills-ratio estimate ignoring censoring reported in col-umn (9) is considerably larger than the corresponding conven-tional 2SLS estimates Interestingly both the probit and tobitstructural estimators generate results that are more sensitiveto the inclusion of covariates than any of the other estimatorsexcept Mullahyrsquos In fact Panel B of the table shows thatwithout covariates all estimation techniques give very simi-lar results This is not surprising since without covariates theparametric assumptions used in these models are weaker

The last set of results shows that childbearing is associ-ated with marked changes in the distribution of hours workedbeyond the changes in participation already seen This isapparent in the rst columns of Table 3 which report thedistribution of hours worked by interval along with linearprobability estimates of the relationship between childbearingand the probability of falling into each interval (these mod-els include the same covariates used for Table 2) The largestentry in column (1) is for the probability of working zerohours There is also a large negative effect on the probabilityof working 31ndash40 hours per week which shows that womenwho have a third child are much less likely to work full-time Once again probit average effects reported in column2 are almost indistinguishable from the corresponding OLSestimates Estimates from an ordered probit model reportedin column (3) differ from OLS somewhat more but still gen-erate a very similar pattern

Like the 2SLS estimates for average outcomes 2SLS esti-mates of linear probability models for the probability of fallinginto each interval show that models that treat childbearing asexogenous exaggerate the negative impact on labor supply2SLS estimates for the probability of working zero hours areidentical (by construction) to those for employment in Table 1The 2SLS estimates of the impact of childbearing on fulltimework are also considerably less than the corresponding OLSestimates

Nonstructural Causal-IV models treating Morekids asendogenous generate estimates very close to 2SLS estimatesfor effects on the probabilities of hours falling into each inter-val Columns (5) and (6) show that the results are also remark-ably insensitive to whether a linear or probit model is usedto approximate the distribution function The estimates againindicate that childbearing changes the distribution of hours byraising the probability of nonparticipation and by reducing theprobability of fulltime work

Finally the QTE estimator provides useful summary statis-tics for causal effects of childbearing on changes in the distri-bution of hours worked These estimates were computed as thesolution to a weighted quantile regression problem using (27)to construct weights and the reported standard errors are froma bootstrap Quantile regression estimates treating childbear-ing as exogenous show an estimated 9-hour decline in medianhours worked but the QTE estimator suggests that the causaleffect of childbearing on median hours worked is only about ve hours Estimates at higher quantiles are similarly reducedwhen childbearing is treated as endogenous

6 SUMMARY AND CONCLUSIONS

Structural parameters may be of theoretical interest but mustultimately be converted into causal effects if they are to beof use for policy evaluation or determining whether a trendassociation is causal The problem of estimating causal effectsfor LDVrsquos does not differ fundamentally from the analogousproblem for continuously distributed outcomes The key dif-ferences seem to me to be the increased likelihood of inter-est in distributional outcomes and the inherent nonlinearity ofCEFrsquos for LDVrsquos in models with covariates Without covari-ates conventional 2SLS estimates capture both distributionaleffects and effects on means Simple IV strategies devel-oped by Mullahy (1997) and Abadie (2000b) can be used toestimate average effects in nonlinear models with covariateswhile IV strategies for probability models and quantile regres-sion can be used to estimate effects on distributions

These approaches are illustrated here using twin birthsto estimate the labor-supply consequences of childbearingAlternative nonstructural approaches to the estimation ofcausal effects using twins instruments generate similar aver-age effects whether or not the model is nonlinear Structuralestimates tend to be somewhat larger than nonstructural esti-mates when exogenous covariates are included even thoughthe covariates are not strongly related to the twins instru-ment Since the structural models impose additional distribu-tional and functional form assumptions I see no reason toprefer them

Finally the various IV estimates of the effect of childbear-ing on the distribution of hours worked show that the impactof childbearing is characterized by substantially increased non-participation and by an almost equally large shift away fromfulltime work Estimates that treat childbearing as exogenousexaggerate the causal effect of childbearing on changes in dis-tribution as well as on average hours worked These ndingsappear in results from both probability models and quantilemodels with endogenous regressors Both types of modelsprovide an interesting look at the impact of childbearing on

14 Journal of Business amp Economic Statistics January 2001

the distribution of hours worked without the conceptual prob-lems inherent in conditional-on-positive comparisons

ACKNOWLEDGMENTS

I thank the editor and the session discussants for theircomments Special thanks go to Melissa Schettini for out-standing research assistance and Alberto Abadie for exten-sive comments and shared computer programs Thanks also goto Daron Acemoglu Bill Evans Bill Greene Guido ImbensJohn Mullahy Steve Pischke and seminar participants at theUniversity of Texas for helpful discussions and comments onan earlier draft Data and computer programs used here areavailable on request

APPENDIX

A1 Derivation of Equation (12)

Drop the i subscripts Note that

Y0 D 14Y uuml0 gt 05Y uuml

0

Y1D 14Y uuml

0C gt 054Y uuml

0C 5 D 14Y uuml

1 gt 05Y uuml1 0

Since D is independent of Y0 and Y1D 14Y uuml

0C gt 054Y uuml

0C

51D is independent of Y1 Conditional effects on the treatedare therefore the same as conditional effects without condi-tioning on treatment status

E6Y1ƒ Y0

mdashY1 gt 01 D D 17 D E6Y1ƒ Y0

mdashY1 gt 07

E6Y1ƒ Y0

mdashY0 gt 01 D D 17 D E6Y1ƒ Y0

mdashY0 gt 070

The averages on the right side are the causal effects of interestThe conditional expectations on the right side are evaluated asfollows

E6Y1mdashY1 gt 07 D E6Y uuml

1mdashY uuml

1 gt 07 D E6Y uuml0mdashY uuml

1 gt 07C 1 (A1)

E6Y0mdashY1 gt 07 D E6Y uuml0 14Y uuml

0 gt 05mdashY uuml1 gt 07

D E6Y uuml0

mdashY uuml1 gt 01 Y uuml

0 gt 07P4Y uuml0 gt 0mdashY uuml

1 gt 050

If 4 gt 05 2 Y uuml0 gt 0 ) Y uuml

1 gt 01 this is E6Y uuml0

mdashY uuml0 gt 07

P4Y uuml0 gt 0mdashY uuml

1 gt 050 If 4 lt 05 2 Y uuml1 gt 0 ) Y uuml

0 gt 01 so this is

E6Y uuml0

mdashY uuml1 gt 070 (A2)

So lt 0 ) E6Y1 ƒ Y0mdashY1 gt 07 D Similarly

E6Y1mdashY0 gt 07 D E614Y uuml

0C gt 054Y uuml

0C 5mdashY uuml

0 gt 07

D E6Y uuml0

mdashY uuml1 gt 01 Y uuml

0 gt 07P4Y uuml1 gt 0mdashY uuml

0 gt 05

C P Y uuml1 gt 0mdashY uuml

0 gt 0 0

If 4 gt 051 Y uuml0 gt 0 ) Y uuml

1 gt 01 so

P4Y uuml1 gt 0mdashY uuml

0 gt 05 D 1 (A3)

and E6Y uuml0

mdashY uuml1 gt 01 Y uuml

0 gt 07 D E6Y uuml0mdashY uuml

0 gt 070 Finally

E6Y0mdashY0 gt 07 D E6Y uuml0mdashY uuml

0 gt 07 (A4)

so gt 0 ) E6Y1ƒ Y0

mdashY0 gt 07 D

A2 Derivation of Equation (21)

Drop the i subscripts Note that eƒD D 641 ƒ D5 C Deƒ7Let sbquo uuml D eƒsbquo and uuml D eƒ Z is binary and there are nocovariates so we can now write (20) as

sbquo uuml E641ƒ D5Y mdashZ D 17C sbquo uuml uuml E6DY mdashZ D 17 ƒ 1 D 0 (A5)

and

sbquo uuml E641ƒ D5Y mdashZ D 07C sbquo uuml uuml E6DY mdashZ D 07 ƒ 1 D 00 (A6)

Divided (A5) by (A6) to get rid of sbquo uuml then solve for uuml Subtract 1 and rearrange to get Equation (21)

A3 Average Treatment Effects and Standard Errors forNonlinear Models

Average treatment effects were calculated with the aid ofderivative approximations so that all reported effects have theform ldquocoef cient times scaling factorrdquo

Probit Note that ecirc6X 0i sbquo C 7 ƒ ecirc6X 0

isbquo7 6X 0isbquo C Di7 š

1 so the average effect on the treated can be approximatedas 841=N15

Pi Di6X 0

isbquo C Di79 š 1 where N1 D Pi Di This

turns out to be accurate to three decimal places for the esti-mates in Table 1 Standard errors were calculated treating thescaling factor as nonrandom This follows the convention forreporting marginal effects in programs like Stata in practiceany correction for estimation of the scaling factor is likely tobe minor A similar approach was used for ordered probit

Tobit Tobit average treatment effects were approximatedusing a derivative formula that can be found for exam-ple in the work of Greene (1999) E6Yi

mdashXi1DiD 17 ƒ

E6YimdashXi1 Di

D 07 iexclE6YimdashXi1Di7=iexclD D ecirc6X 0

isbquo C Di7 š 0

Average effects on the treated can therefore be approximatedusing 841=N15

Pi Diecirc6X 0

isbquo C Di79 š 0 Standard errors wereagain calculated treating the scaling factor as nonrandom

2PM Let the part-1 coef cient be 1 and the part-2 coef -cient be 2 Since the model multiplies parts and both parts arelinear the average effect is approximated using derivatives as1E6Yi

mdashDiD 11 Yi gt 07 C 2P6Di

D 1mdashY gt 070 Standard errorswere calculated treating the scaling factors E6Y mdashDi

D 11 Yi gt 07

and P6DiD 1mdashY gt 07 as nonrandom and using the fact that

the estimates of 1 and 2 are uncorrelatedMullahy Note that E6Yi

mdashXi1Di1mdash uumli 7 D mdash uuml

i exp6X 0i sbquoC Di7

where this CEF has a causal interpretation Again usingderivatives we have mdashuuml

i exp6X 0isbquo C 7 ƒ mdashuuml

i exp6X 0isbquo7

mdash uumli exp6X 0

i sbquo C Di7 š The model is such that E6mdash uumlimdashXi7

equals 1 but E6mdash uumlimdashXi1Di7 is unrestricted I ignore this prob-

lem and approximate the average effect on the treated as841=N15

Pi Di exp6X 0

i sbquo C Di79 š Standard errors were cal-culated treating the scaling factor as nonrandom

Bivariate Probit This is the same as probit using param-eters from the latent index equation for outcomes

Endogenous Tobit This is the same as tobit but usingcoef cients and predicted probability positive from the modelwith the compound Mills-ratio term included

Mills Ratio Standard errors were calculated treating thecompound Mills-ratio term as known

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 15

Causal-IV (nonlinear) Average effects were calculated asdescribed previously for the probit and Mullahy (exponential)functional form The rst-stage estimates of E6Zi

mdashXi7 neededto construct Ši were estimated using probit Standard errorsfor were calculated using the same bootstrap proceduredescribed for QTE Abadie (2000b) also gave analytic formu-las that take account of the rst-step estimation of E6Zi

mdashXi7An assumption implicit in this scheme for reporting Causal-IV results is that it makes sense to convert conditional-on-Xeffects for compliers into overall average effects on the treated

A4 Computation of Quantile Treatment Effectsand Standard Errors

QTErsquos were computed by plugging rst-step estimatesof QŠi

D E6ŠimdashXi1Di1 Yi7 into a weighted quantile regression

calculation performed by Stata Nonnegative estimates ofE6Ši

mdashXi1 Di1 Yi7 were constructed by separately estimatingE6Zi

mdashXi7 and E6ZimdashYi1Di1Xi7 using probit and then trimming

In principle standard errors should take account of this rst-step estimation An additional complication is that the analyticstandard errors for QTE involve a conditional error density Inthis case I sidestepped messy analytic calculations by usinga bootstrap procedure that repeats both the rst-stage estima-tion of QŠi and the second-step estimation of the parameters ofinterest in 100 replicate samples of 2500 observations eachThe 100 replicate samples were sampled without replacementusing the Stata command bsample The reported standarderrors were calculated as 4N =N uuml 51=2bseN where bseN is thestandard deviation of the 100 replicate estimates N D 21500and N uuml is the full sample size

[Received October 1999 Revised July 2000]

REFERENCES

Abadie A (2000a) ldquoBootstrap Tests for the Effect of a Treatment on the Dis-tributions of an Outcome Variablerdquo Technical Working Paper 261 NationalBureau of Economic Research Cambridge MA

mdashmdashmdash (2000b) ldquoSemiparametric Estimation of Instrumental Variable Mod-els for Causal Effectsrdquo Technical Working Paper 260 National Bureau ofEconomic Research Cambridge MA

Abadie A Angrist J and Imbens G (1998) ldquo Instrumental VariablesEstimation of Quantile Treatment Effectsrdquo Working Paper 229 NationalBureau of Economic Research Cambridge MA

Amemiya T (1978) ldquoThe Estimation of a Simultaneous Equation General-ized Probit Modelrdquo Econometrica 46 1193ndash1205

mdashmdashmdash (1979) ldquoThe Estimation of a Simultaneous Equation Tobit ModelrdquoInternational Economic Review 20 169ndash181

Angrist J (1998) ldquoEstimating the Labor Market Impact of Voluntary MilitaryService Using Social Security Data on Military Applicationsrdquo Economet-rica 66 249ndash288

Angrist J and Evans W (1998) ldquoChildren and Their Parentsrdquo Labor SupplyEvidence From Exogenous Variation in Family Sizerdquo American EconomicReview 88 450ndash477

Angrist J and Imbens G (1991) ldquoSources of Identifying Information inEvaluation Modelsrdquo Technical Working Paper 117 National Bureau ofEconomic Research Cambridge MA

Angrist J Imbens G and Rubin B (1996) ldquo Identi cation of CausalEffects Using Instrumental Variablesrdquo Journal of the American StatisticalAssociation 91 444ndash455

Angrist J and Krueger A (1999) ldquoEmpirical Strategies in Labor Eco-nomicsrdquo in The Handbook of Labor Economics Volume IIIA edsO Ashenfelter and D Card Amsterdam North-Holland pp 1277ndash1366

Blundell R W and Smith R J (1989) ldquoEstimation in a Class of Simulta-neous Equation Limited Dependent Variable Modelsrdquo Review of EconomicStudies 56 37ndash58

Blundell R W and Powell J L (1999) ldquoEndogeneity in Single Index Mod-elsrdquo mimeo University College London Dept of Economics

Bronars S and Grogger J (1994) ldquoThe Economic Consequences of UnwedMotherhood Using Twin Births as a Natural Experimentrdquo American Eco-nomic Review 84 1141ndash1156

Chamberlain G (1986) ldquoAsymptotic Ef ciency in Semi-Parametric ModelsWith Censoringrdquo Journal of Econometrics 32 189ndash218

Cragg J G (1971) ldquoSome Statistical Models for Limited Dependent Vari-ables With Application to the Demand for Durable Goodsrdquo Econometrica39 829ndash844

Duan N Manning W G Jr Morris C N and Newhouse J P(1984) ldquoChoosing Between the Sample-Selection Model and the Multi-partModelrdquo Journal of Business amp Economic Statistics 2 283ndash289

Eichner M J McClellan M B and Wise D A (1997) ldquoHealth Expendi-ture Persistence and Feasibility of Medical Savings Accountsrdquo in Tax Pol-icy and the Economy 11 ed J Poterba Cambridge MA National Bureauof Economic Research pp 91ndash128

Evans W Farrelly M C and Montgomery E (1999) ldquoDo Work-place Smoking Bans Reduce Smokingrdquo American Economic Review 89728ndash747

Gangadharan J and Rosenbloom J L (1996) ldquoThe Effects of Child-bearingon Married Womenrsquos Labor Supply and Earnings Using Twin Births as aNatural Experimentrdquo Working Paper 5647 National Bureau of EconomicResearch Cambridge MA

Goldberger A S (1991) A Course in Econometrics Cambridge MAHarvard University Press

Greene W (1999) ldquoMarginal Effects in the Censored Regression ModelrdquoEconomics Letters 64 43ndash49

Gronau R (1974) ldquoWage ComparisonsmdashA Selectivity Biasrdquo Journal ofPolitical Economy 82 1119ndash1143

Hay J W Leu R and Rohrer P (1987) ldquoOrdinary Least Squares andSample-Selection Models of Health-care Demandrdquo Journal of Business ampEconomic Statistics 5 499ndash506

Hay J W and Olsen R J (1984) ldquoLet Them Eat Cake A Note on Com-paring Alternative Models of the Demand for Medical Carerdquo Journal ofBusiness amp Economic Statistics 2 279ndash282

Heckman J J (1974) ldquoShadow Prices Market Wages and Labor SupplyrdquoEconometrica 42 679ndash694

mdashmdashmdash (1978) ldquoDummy Endogenous Variables in a Simultaneous EquationsSystemrdquo Econometrica 46 931ndash960

mdashmdashmdash (1990) ldquoVarieties of Selection Biasrdquo American Economic Review 80313ndash318

Heckman J J and Robb R Jr (1985) ldquoAlternative Methods for Evaluatingthe Impact of Interventionsrdquo in Longitudinal Analysis of Labor MarketData (Econometric Society Monographs Series No 10) eds J Heckmanand B Singer Cambridge UK Cambridge University Press pp 156ndash246

Imbens G and Angrist J D (1994) ldquo Identi cation and Estimation of LocalAverage Treatment Effectsrdquo Econometrica 62 467ndash475

Imbens G and Rubin D B (1997) ldquoEstimating Outcome Distributions forCompliers in Instrumental Variables Modelsrdquo Review of Economic Studies64 555ndash574

Kane T J Rouse C E and Staiger D (1999) ldquoEstimating Returnsto Schooling When Schooling Is Misreportedrdquo Working Paper W7235National Bureau of Economic Research Cambridge MA

Kelejian H H (1971) ldquoTwo-Stage Least Squares and Econometric SystemsLinear in Parameters but Nonlinear in the Endogenous Variablesrdquo Journalof the American Statistical Association 66 373ndash374

Koenker R and Bassett G (1978) ldquoRegression Quantilesrdquo Econometrica46 33ndash50

Killingsworth M (1983) Labor Supply Cambridge UK Cambridge Uni-versity Press

Keane M P and Wolpin K (1997) ldquo Introduction to the JBES Special Issueon Structural Estimation in Applied Microeconomicsrdquo Journal of Businessamp Economic Statistics 15 111ndash114

Lee M J (1995) ldquoSemiparametric Estimation of Simultaneous EquationsWith Limited Dependent Variables A Case Study of Female Labor Sup-plyrdquo Journal of Applied Econometrics 10 187ndash200

mdashmdashmdash (1996) ldquoNonparametric Two-Stage Estimation of Simultaneous Equa-tions With Limited Endogenous Regressorsrdquo Econometric Theory 12305ndash330

Leung S F and Yu S (1996) ldquoOn the Choice Between Sample Selectionand Two-Part Modelsrdquo Journal of Econometrics 72 197ndash229

Lin T-F and Schmidt P (1984) ldquoA Test of the Tobit Speci cation Againstan Alternative Suggested by Craggrdquo Review of Economics and Statistics66 174ndash177

Maddala G S (1985) ldquoA Survey of the Literature on Selectivity Bias AsIt Pertains to Health Care Marketsrdquo Advances in Health Economics andHealth Service Research 6 3ndash18

16 Journal of Business amp Economic Statistics January 2001

REFERENCES

Manning W G Duan N and Rogers W H (1987) ldquoMonte Carlo Evidenceon the Choice Between Sample Selection and Two-part Modelsrdquo Journalof Econometrics 35 59ndash82

Manski C F (1975) ldquoMaximum Score Estimation of the Stochastic UtilityModelrdquo Journal of Econometrics 3 205ndash228

mdashmdashmdash (1996) ldquoLearning About Treatment Effects From Experiments WithRandom Assignment of Treatmentsrdquo Journal of Human Resources 31709ndash733

McDonald J F and Mof tt R A (1980) ldquoThe Uses of Tobit AnalysisrdquoThe Review of Economics and Statistics 62 318ndash321

Mof t R A (1999) ldquoNew Developments in Econometric Methods forLabor Market Analysisrdquo in The Handbook of Labor Economics Vol-ume IIIA eds O Ashenfelter and D Card Amsterdam North-Hollandpp 1367ndash1398

Mullahy J (1997) ldquo Instrumental-Variable Estimation of Count Data ModelsApplications to Models of Cigarette Smoking Behaviourrdquo Review of Eco-nomics and Statistics 11 586ndash593

mdashmdashmdash (1998) ldquoMuch Ado About Two Reconsidering Retransformation andthe Two-Part Model in Health Econometricsrdquo Journal of Health Eco-nomics 17 247ndash281

Newey W K (1985) ldquoSemiparametric Estimation of Limited DependentVariable Models With Endogenous Explanatory Variablesrdquo Annales deLrsquo Insee 5960 219ndash237

mdashmdashmdash (1986) ldquoLinear Instrumental Variable Estimation of Limited Depen-dent Variable Models With Endogenous Explanatory Variablesrdquo Journal ofEconometrics 32 127ndash141

mdashmdashmdash (1987) ldquoEf cient Estimation of Limited Dependent Variable ModelsWith Endogenous Explanatory Variablesrdquo Journal of Econometrics 36231ndash250

Powell J L (1986a) ldquoSymmetrically Trimmed Least Squares Estimation forTobit Modelsrdquo Econometrica 54 1435ndash1460

mdashmdashmdash (1986b) ldquoCensored Regression Quantilesrdquo Journal of Econometrics32 143ndash155

Rosenzweig M R and Wolpin K (1980) ldquoLife-Cycle Labor Supply andFertility Causal Inferences From Household Modelsrdquo Journal of PoliticalEconomy 88 328ndash348

Rubin D B (1974) ldquoEstimating Causal Effects of Treatments in Random-ized and Non-randomized Studiesrdquo Journal of Educational Psychology 66688ndash701

mdashmdashmdash (1977) ldquoAssignment to a Treatment Group on the Basis of a Covari-aterdquo Journal of Educational Statistics 2 1ndash26

Wooldridge J (1999) ldquoDistribution-Free Estimation of Some Nonlinear PanelData Modelsrdquo Journal of Econometrics 90 77ndash97

Comment Binary Regressors in NonlinearPanel-Data Models With Fixed EffectsJinyong Hahn

Department of Economics Brown University Providence RI 02912 ( jinyong hahnbrownedu)

Angrist notes in his abstract that ldquomuch of the dif cultywith limited dependent variables comes from a focus on struc-tural parameters such as index coef cients instead of causaleffects Once the object of estimation is taken to be the causaleffect of treatment several simple strategies are availablerdquo(p 2) I examine the consequence of such a perspective forinference in nonlinear panel-data models with xed effectsI argue that (a) the ldquodif cultyrdquo indeed disappears sometimesand (b) structure of treatment assignment plays a crucial rolefor his strategy to be successful

It is instructive to begin with a dif cult nonlinear panel-datamodel with xed effects Consider a panel probit model with xed effects

Pr6yitD 1mdashci1 xi11 xi27 D ecirc4ci

C ˆ xit51

i D 11 1 n3 t D 1120 (1)

It is assumed that yi1 and yi2 are independent of each othergiven 4ci1 xi11 xi25 Here ci denotes the unobserved xedeffects and xit denotes a binary treatment variable We willassume that 4xi11 xi25 D 40115 for all i The index coef cientof interest is ˆ Therefore traditional econometric analysiswould focus on whether ˆ is identi ed and or

pn-consistently

estimableEstimation of the index coef cient ˆ is dif cult for a num-

ber of reasons

1 So far no consistent estimator for ˆ has been developedfor probit with a nonparametric speci cation of the conditionaldistribution of ci given 4xi11 xi25

2 It is not even clear whether ˆ is identi ed or notManskirsquos (1987) identi cation result requires in nite supportfor xit which cannot be satis ed due to the binary nature ofxit

3 It is not clear whether the semiparametric informationbound for ˆ is positive or not It is quite possible that theinformation is actually 0 For example Chamberlain (1992)showed that the information for ˆ is 0 for models with timedummies

4 Chamberlainrsquos (1984) conditional maximum likelihoodestimator is applicable only to logit models

Now to assess whether changing the object of estimationsimpli es statistical analysis consider the average treatmenteffects which in this particular case can be easily shown tobe equal to

sbquo D E6yi2ƒ yi170 (2)

It is not dif cult to see that a simple estimator Osbquo D1=n

PniD14yi2

ƒ yi15 isp

n-consistent As Angrist argues afocus on average causal effects dramatically reduces the dif -culty of estimation The secret is that 4xi11 xi25 D 40115 for alli which effectively ensures that ci is independent of 4xi11 xi25Dif culty of the index estimation listed previously is because

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

Imbens Comment 17

the independence between 4xi11 xi25 and ci cannot be exploitedwithin the xed-effects framework Unfortunately the addi-tional information that 4xi11 xi25 and ci are independent of eachother does not reduce dif culty in estimating the index ˆ It isnot yet clear whether ˆ can be

pn-consistently estimated even

with the random-effects assumption when the random effectsare nonparametrically speci ed On the other hand estimatingsbquo becomes easier as a result of the constancy of the general-ized propensity score Pr 6xi11 xi2mdashci7 Presence of unobservedci in the model was rendered irrelevant due to the constancyof the generalized propensity score [See Imbens (1999) for adiscussion on the generalized propensity score]

It is interesting to note that in the panel probit model (1)estimation of sbquo is not necessarily simple unless the indexstructure is discarded altogether It is useful to note that thenew target parameter sbquo could be estimated consistently usingindex structure if consistent estimators of ˆ and not are givenHere not denotes the distribution of ci We may alternativelywrite (2) as

Z4ecirc4c C ˆ5 ƒ ecirc4c55 dnot4c51 (3)

which can in principle be estimated by using consistent esti-mators of ˆ and not Estimation of sbquo using the alternativecharacterization (3) requires consistent estimation of an addi-tional parameter not a parameter that was not given too muchattention in the past The problem is that not many consis-tent estimators of not are available It is not yet clear whether

the model satis es the primitive conditions for consistency ofthe nonparametric maximum likelihood estimator (NPMLE)as discussed by Heckman and Singer (1984) The dif culty inestimating the target parameter using the expression (3) whichis based on the index structure is in sharp contrast to the easeof the estimation strategy using the expression (2) for whichthe index structure is irrelevant

The preceding discussion suggests that the success ofAngristrsquos perspective critically hinges on the structure of treat-ment assignment and careful reexpression of the new targetparameter If the joint distribution of ci and 4xi11 xi25 is com-pletely unknown it is clear that changing the target parameterdoes not ease the dif culty of estimation Angristrsquos perspectivetherefore requires substantial effort in modeling such jointdistribution Whether such a modeling effort will be successfulin dealing with nonlinear panel problems remains to be seen

ADDITIONAL REFERENCES

Chamberlain G (1984) ldquoPanel Datardquo in Handbook of Econometricseds Z Griliches and M D Intriligator Amsterdam North-Hollandpp 1247ndash1318

mdashmdashmdash (1992) ldquoBinary Response Models for Panel Data Identi cation andInformationrdquo unpublished manuscript

Heckman J and Singer B (1984) ldquoA Method for Minimizing the Impactof Distributional Assumptions in Econometric Models for Duration DatardquoEconometrica 52 271ndash320

Imbens G (1999) ldquoThe Role of Propensity Score in Estimating Dose-Response Functionsrdquo Technical Working Paper 237 National Bureau ofEconomic Research Cambridge MA

Manski C (1987) ldquoSemiparametric Analysis of Random Effects Linear Mod-els from Binary Panel Datardquo Econometrica 55 357ndash362

CommentGuido W Imbens

Department of Economics University of California at Los Angeles Los Angeles CA 90095(imbenseconuclaedu)

It is a pleasure to comment on this article by JoshuaAngrist whose applications of instrumental-variables methods(Angrist 1989 Angrist and Krueger 1991) have been a sourceof inspiration for my own work in this area As with Angristrsquosprevious work on instrumental variables the current articleraises some controversial issues and makes a number of impor-tant points Here I offer some comments on three of themFirst I shall discuss the issues raised in Section 1 ldquoCausalEffects and Structural Parametersrdquo concerning the goals ofstatistical inference Angrist argues that many questions ofinterest are most easily formulated in terms of comparisonsbetween realized and potential outcomes the latter de nedas outcomes that would have been observed under alternativestates of nature I shall explore some of the implications of thisview for empirical practice and econometric theory SecondI shall offer some remarks on the role of economic theory inspeci cation and identi cation of econometric models againreinforcing Angristrsquos point regarding the importance of for-mulating the key assumptions in terms of potential outcomesThird I shall discuss some of the issues related to the limited

dependent nature of outcome variables for empirical practicein particular in the presence of covariates Partly motivatedby the widespread perception of fundamental dif culties inapplying instrumental-variables methods to data with limiteddependent outcome variables Angrist argues that standard lin-ear model techniques are generally applicable I agree withAngristrsquos position that most of these perceived problems areexaggerated but suggest that principled inference should nev-ertheless take account of the limited dependent nature of theoutcome variables and use nonlinear models

1 CAUSAL ESTIMANDS

In his textbook discussion of the difference betweenstructural and reduced-form estimates Goldberger (1997)wrote following Marshak (1953) that the ultimate goal of

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

18 Journal of Business amp Economic Statistics January 2001

econometrics is to provide predictions More speci cally inmy view the goal is to provide predictions of policy interven-tions Using both economic theory and data economists wishto inform policy discussions by providing predictions of statesof the world under different policy choices Based on com-parisons of such predictions policy makers can then chooseamong the different policies using some social welfare mea-sure as objective function (eg Heckman and Smith 1997)Angrist argues that such questions are most easily formulatedin terms of potential outcomes Here I want to elaborate onthat view

Consider as an example the problem faced by a policymaker contemplating a new tax in a market To evaluatethis policy the policy maker wishes to take into account theeffect of the tax on the quantity traded Economic theory sug-gests that this effect depends on the slope of the supply anddemand functions The rst step is therefore the estimation ofthese slopes and in the remainder of this discussion I shallfocus on this component of the policy-evaluation problem Inprinciple the policy maker may be interested in the entire dis-tribution of the quantity traded under various taxes Let usassume however that for purposes of evaluation of the poli-cies it is suf cient to know the average effect of the policyon the quantity traded If there are only two values for thepolicymdashfor example no tax or a taxmdashthe difference betweenthese two averages is the key quantity of interest FollowingRubin (1974) I will refer to this as the estimand

Note that the choice of estimand is distinct from the sta-tistical question of the speci cation of the model Often thestatistical model is speci ed in such a way that a single param-eter corresponds to the estimand For example in a structuralinterpretation of the linear regression model the coef cientscorrespond to the effect of changing the covariates by a singleunit Such one-to-one correspondence however is the excep-tion rather than the rule Wooldridge (1992) made this point inthe context of BoxndashCox regression models Such models areoften used when a linear representation for E6Y mdashX7 is inappro-priate The BoxndashCox regression model generalizes this linearform to E6Y 4lsaquo5mdashX7 D X 0sbquo where

Y 4lsaquo5 D(

4Y lsaquo ƒ 15=lsaquo lsaquo 6D 01

ln Y lsaquo D 00

Although consistent estimators for sbquo exist under these assump-tions Wooldridge stressed that because (a) the interpretationof sbquo changes with the value of lsaquo and (b) knowledge of sbquo andlsaquo is not suf cient for recovering E6Y mdashX7 there is no reasonfor economists to be interested in estimates of sbquo under theseassumptions In other words sbquo cannot be the sole focus ofthe researcher because the question it answers changes withthe value of nuisance parameters Wooldridge then suggestedan alternative speci cation that always allows the researcherto recover the conditional expectation E6Y mdashX7

In empirical work this distinction between the estimandand the parameters of the statistical model is consistent withthe now common practice of reporting estimates of averagederivatives in binary response models rather than reportingestimates of the logit or probit coef cients Unlike a linear

regression model there is no direct link from one of the coef- cients in the logit or probit model to average causal effectsand thus there is no intrinsic interest in such coef cients

This view is at odds however with a large part of the semi-parametric literature An exception is the work by Stoker (egStoker 1986) who focused on estimation of index coef cientsin settings where these are proportional to average derivativesand thus directly linked to changes in predictions Considerfor example the work on semiparametric estimation of binaryresponse models In this literature such models are esti-mated without making logistic or probit assumptions insteadonly making conditional mean or median assumptions in alatent index interpretation (eg Manski 1985) This literaturehowever has begged the question of why economists shouldbe interested in the coef cient estimates in these models inthe absence of a direct link between these coef cients andthe choice probabilities or their derivatives Similarly someof the models with xed effects in panel data with limiteddependent variables have focused on estimation of parametersthat in themselves do not allow for estimation of conditionalexpectations or their derivatives and thus do not allow for esti-mation of causal effects See Arellano and Honoreacute (in press)for a survey of many of these methods

2 IDENTIFICATION

After deciding on the estimand the next step is to makesubstantive assumptions on the process that generated the dataThis is where economic as opposed to statistical theory playsa key role Theoretical considerations may suggest that certainvariables have no direct causal effect on others because theydo not enter into agentsrsquo utility function nor do they affectthe constraints these agents face For example in some mar-kets it may be reasonable to postulate the existence of demandand supply function and assume that their intersection deter-mines observed prices and quantities In that case it may beargued that certain variablesmdashfor example weather conditionsin agricultural marketsmdashaffect supply at xed prices but notdemand because weather conditions do not affect utility ofthe buyers nor do they constrain their choices given pricesSimilarly theoretical considerations may suggest which vari-ables determine agentsrsquo fertility choices and which variablesare excluded from such choices as in the structural modelsdescribed in Section 12 of Angrist

For the purpose of considering such exclusion restrictionsas well as other assumptions it is important to formulate themin a way that economic theory can be brought to bear on themThis makes the formulation in terms of counterfactuals orpotential outcomes that Angrist advocates particularly appro-priate The potential outcomes describe outcomes in differ-ent environments and as such are the primitives of economicanalyses as well as choices under different sets of constraintswhich are the result of agents solving constrained optimizationproblems Since economic theory studies such optimizationproblems it is therefore well equipped to assess assumptionsformulated directly in terms of these potential outcomes Anexample of the formulation of the critical assumptions in termsof such potential outcomes is Angrist Imbens and Rubin(1996 AIR from here on) In contrast latent index modelsalthough under some conditions mathematically equivalent to

Imbens Comment 19

the potential outcome framework (eg Vytlacil 1999) formu-late the critical assumptions in terms of associations betweenobserved variables and unobserved residuals which appearsmore dif cult to contemplate [see Imbens (1997) for a dis-cussion of the confusion such formulations have caused in thestatistics literature]

It is rare that economic theory is speci c enough to deter-mine the exact value of the estimand More typical is thatthe theory is consistent with a range of values for the esti-mand Observations on agentsrsquo choices and outcomes may behelpful in narrowing down this range The econometricianrsquostask is to link the data to the estimand Typically a num-ber of additional assumptions are made at this stage Almostalways it is assumed that there is only limited dependenceor no dependence at all between choices made by differentagents and identi cation focuses on the link between the jointdistribution of the observables estimable in large samplesand the estimand Two possibilities arise at this stage Some-times the estimand can be expressed as a functional of thejoint distribution of the observables in which case the esti-mand is identi ed A leading example is where the estimandis the average treatment effect and theory suggests that assign-ment to treatment is random or at least random conditionalon a set of observed covariates (unconfounded assignmentselection on observables) Alternatively the assumptions sug-gested by economic theory do not allow for the direct linkbetween the distribution of observables and the estimand Inthat case the researcher faces some choices One option advo-cated in a series of papers by Manski (see for a general dis-cussion Manski 1995) is to estimate the range of values ofthe estimand consistent with the data given the substantiveassumptions Another option followed in the current article byAngrist is the local average-treatment-effect approach devel-oped by Imbens and Angrist (1994) to consider what aspectsof the estimand are identi ed given data and assumptions Ininstrumental-variables settings the population average treat-ment effect is often not identi ed but the average effect for aspeci c subpopulation may be In that case one may chooseto estimate the average treatment effect for this subpopula-tion and leave the extrapolation to the principal estimand tothe researcher possibly aided by theoretical considerations AsHeckman wrote ldquo It is a great virtue of the LATE parameterthat it makes the investigator stick to the data at hand and sep-arate out the aspects of an estimation that require out of sam-ple extrapolation or theorizing from aspects of an estimationthat are based on observable datardquo (Heckman 1999 p 832)

Let us consider the case studied by Angrist with its focuson the effect of having more than two children on labor sup-ply Angrist argues that the second birth being a multiple birth(eg twins) is a valid instrument for this effect In terms ofthe AIR formulation this requires a multiple birth to be asgood as randomly assigned and the absence of a systematicdirect effect on labor supply other than through its effect onthe number of children Such assumptions may be contro-versial For example fertility treatments may lead to a sys-tematic association between multiple births and choices madeby couples violating the rst assumption Even if we acceptthese assumptions however they only imply that the aver-age causal effect of more kids on labor supply is identi ed

for women who had a third child solely because their sec-ond birth was a multiple birth (compliers in the AIR termi-nology) In my view it is unlikely that this is the populationof primary interest Nevertheless it is the only subpopulationthe data are informative about in the sense of point identi ca-tion under the substantive assumptions and it would appear tooffer some guidance regarding the population average causaleffect to policy makers similar to the way in the medical worldresults from clinical trials in homogenous subpopulations areregarded as useful because they are viewed as indicative ofpopulation average causal effects

3 LIMITED DEPENDENT VARIABLES

Typically economic theory offers some guidance concern-ing the determinants of certain outcomes without specifyingthe exact form or strength of their relationship In that casestatistical modeling is required to complete the speci cationConsider the example Angrist studies with binary outcomebinary endogenous regressor a binary instrument and covari-ates Angrist suggests as one possible approach estimating theaverage treatment effect through a linear probability modelwith instrumenting for an endogenous regressor The bene tsof the linear probability approach stemming from the linear-ity and robustness against misspeci cation of the rst stageappear to me largely illusory At this point the statistical mod-eling is only intended to provide exible approximations to theunderlying conditional distributions This is a fundamentallydifferent role from that played by the substantive assumptionsthat are essential for identi cation Appeals to consistencyunder speci c parameterizations therefore appear irrelevantmdashin a larger sample one may well wish to use a more exiblespeci cation because less smoothing is required In additionto nding the alleged bene ts of the linear probability modelunpersuasive I nd its disadvantages troubling Within smallsubpopulations characterized by extreme values of the covari-ates the smoothing implicit in linear probability models islikely to lead to unattractive predictions compared to pre-dictions based on nonlinear models that respect the limited-dependent-variable nature of the outcomes

An alternative approach is followed in the study of the effectof u shots on hospitalization rates using randomized incen-tives for vaccination by Hirano Imbens Rubin and Zhou(2000 HIRZ from here on) Given their assumptions exten-sions of those made by AIR to the case with exogenous covari-ates there are three subpopulationsmdashcompliers (units whochange treatment status in response to a change in the valueof the instrument) always-takers (who always take the treat-ment irrespective of the value of the instrument) and never-takers (who never take the treatment irrespective of the valueof the instrument) H IRZ modeled the conditional distributionof these three ldquotypesrdquo conditional on covariates as a trinomialdistribution

Pr4TypeiD cmdashXi

D x5 D exp4x0ndashc5

1C exp4x0ndashc5C exp4x0ndasha51

Pr4TypeiD amdashXi

D x5 D exp4x0ndasha5

1C exp4x0ndashc5 C exp4x0ndasha51

20 Journal of Business amp Economic Statistics January 2001

and

Pr4TypeiD nmdashXi

D x5

D 1 ƒ Pr4TypeiD cmdashXi

D x5 ƒ Pr4TypeiD amdashXi

D x50

Now compare this setup to the selection models Angristdescribes in Section 3 In the selection models the equationdescribing the endogenous regressor is Di

D 18ƒ0C ƒ1Zi

Cƒ 0

2Xi gt Daggeri9 Suppose that the instrument is binary and that ƒ1 ispositive Then the two models are very similar with units withƒ0 Cƒ1 Cƒ 0

2Xi gt Daggeri in the selection model classi ed as always-takers in the potential outcome framework (because irrespec-tive of the value of the instrument Di

D 1 for such units) unitswith ƒ0

C ƒ 02Xi lt Daggeri classi ed as never-takers (because irre-

spective of the value of the instrument DiD 0 for such units)

and the units with ƒ0C ƒ 0

2Xi lt Daggeri lt ƒ0C ƒ1

C ƒ 02Xi classi ed

as compliersOne advantage of the trinomial model is that it easily gen-

eralizes to provide an arbitrarily good t to any conditionaltrinomial distribution by including higher-order terms andinteractions in the covariates If there are no substantive rea-sons to impose additional restrictions one should not imposethem implicitly in the speci cation of the statistical modelIn particular in the selection model it is not suf cient to addhigher-order terms to the covariate vector to provide an arbi-trarily good t to the trinomial distribution Such an approxi-mation would have to involve heteroscedasticity and other dis-tributional extensions that are not straightforward to implementin the selection model

Conditional on the individualrsquos type H IRZ speci ed theoutcome distributions given covariates as logistic regressionmodels Again the aim is to provide a exible approxima-tion to the conditional distribution in a manner that does notimpose any implicit restrictions Given that for a binomialdistribution the logistic regression model can be thought ofas providing a linear approximation to the log odds ratiothis choice is again an appealing one An alternative is theprobit model which also provides a good approximation Less

attractive here is the linear probability model since it requiresinequality restrictions on the parameters if the implicit esti-mates of the probabilities are to be bounded between 0 and 1

In cases with other limited dependent variables alternativenonlinear models may be appropriate For example if the out-comes are durations subject to censoring models speci ed interms of hazard functions (eg Lancaster 1979) may be con-venient for dealing with such data

ADDITIONAL REFERENCES

Angrist J (1989) ldquoLifetime Earnings and the Vietnam Era Draft LotteryEvidence From Social Security Administrative Recordsrdquo American Eco-nomic Review 80 313ndash335

Angrist J and Krueger A (1991) ldquoDoes Compulsory School AttendanceAffect Schooling and Earningsrdquo Quarterly Journal of Ecomonics 106979ndash1014

Arellano M and Honoreacute B (in press) ldquoPanel Datardquo in Handbook of Econo-metrics eds J Heckman and E Leamer Amsterdam Elsevier North-Holland

Goldberger A (1991) A Course in Econometrics Cambridge MACambridge University Press

Heckman J (1999) ldquo Instrumental Variables Response to Angrist andImbensrdquo Journal of Human Resources 34 828ndash837

Heckman J and Smith J (1997) ldquoEvaluating the Welfare Staterdquo in Econo-metrics and Economics in the 20th Century The Ragnar Frisch Centenaryed S Strom New York Cambridge University Press pp 214ndash318

Hirano K Imbens G Rubin D and Zhou A (2000) ldquoEstimating theEffect of Flu Shots in a Randomized Encouragement Designrdquo Biostatistics1 69ndash88

Imbens G (1997) Book Review of The Foundations of Econometric Analy-sis by D Hendry and M Morgan Journal of Applied Econometrtics 1291ndash94

Lancaster T (1979) ldquoEconometric Methods for the Analysis of DurationDatardquo Econometrica

Manski C (1985) ldquoSemiparametric Analysis of Discrete Responserdquo Journalof Econometrics 27 313ndash333

mdashmdashmdash (1995) Identi cation Problems in the Social Sciences CambridgeMA Harvard University Press

Marshak J (1953) ldquoEconomic Measurements for Policy and Predictionrdquoin Studies in Econometric Method eds W Hood and T KoopmansNew York Wiley pp 1ndash26

Stoker T (1986) ldquoConsistent Estimation of Scaled Coef cientsrdquo Economet-rica 54 1461ndash1481

Vytlacil E (1999) ldquo Independence Monotonicity and Latent Index ModelsAn Equivalency Resultrdquo unpublished manuscript University of ChicagoDept of Economics

Wooldridge J (1992) ldquoSome Alternatives to the BoxndashCox RegressionModelrdquo International Economic Review 33 935ndash955

CommentRobert A Mof tt

Department of Economics Johns Hopkins University Baltimore MD 21218 and National Bureau of EconomicResearch Cambridge MA (mofttjhuedu)

The Problem Although the article by Angrist rangesacross a number of issues much of the discussion and thearticle title suggests that the problem of concern is that instru-mental variables ( IV) cannot be used in one of three commonmodels Let the rst model be y D Csbquod CxbdquoChellip where y isan absolutely continuous variable but d is binary and wherex is independent of hellip but d is not Then sbquo can be consistentlyestimated with IV (Heckman and Robb 1985) Let the secondmodel be y uuml D C sbquod uuml C xbdquo C hellip where y uuml and d uuml are contin-

uous and where y D 14y uuml gt 05 and d D d uuml are the observedvariables The parameters of this model can likewise be esti-mated by IV with some auxiliary assumptions (Newey 1986see Blundell and Smith 1993 for a review of alternative meth-ods) But let the third model be y uuml D C sbquod C xbdquo C hellip where

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

Moftt Comment 21

again y D 14y uuml gt 05 but now d is binary The parameters ofthis model cannot be estimated by standard IV Many of theexamples in the Angrist article involve the censored regres-sion model rather than the binary choice model but this andthe other main points in the article apply to both

The Angrist Solution The Angrist solution to the problemposed by the third of these models where IV is not applica-ble is to declare sbquo an uninteresting parameter and not worthyof estimation [or in his words to ldquopuntrdquo (p 8)] This is remi-niscent of the solution to the Vietnam War suggested by some1960s commentators The United States should have simplydeclared victory quickly withdrawn and hoped that no onenoticed that they had in fact lost

If sbquo is uninteresting what is interesting according toAngrist His answer is that interest should center on leastsquares approximations with y as the dependent variable andd and x as regressors If endogeneity of d is a problem linearIV should be applied with linear IV given a LATE interpre-tation He provides an empirical illustration

I will comment on three questions raised by the Angristarticle (1) Is sbquo an interesting parameter (2) Is the Linear IVestimator using observed y and d which Imbens and Angrist(1994) relabeled the LATE an interesting quantity (3) Is theillustrative application given in the Angrist article interesting

Is sbquo an Interesting Parameter As Angrist notes manyof the issues raised by his solution have nothing to do withthe endogeneity of d Whether sbquo is an interesting parame-ter is one of these at least in part Given Angristrsquos prefer-ence for linear projections his position necessarily implies thatall nonlinear models of which the latent index model (L IM)is just one example are uninteresting (Angrist is not com-pletely consistent in his position because he does propose anonlinear model with an exponential conditional mean func-tion in one section of his article that model is subject to thesame objections that he subjects the LIM tomdasharbitrary non-linearities etc) Thus his position boils down to a preferencein the binary dependent-variable case for the linear proba-bility model (LPM) over the LIM and other nonlinear mod-els This issue has been discussed for many years even in asimultaneous-equations context [eg Heckman and MaCurdy(1985) which Angrist does not reference]

The dif culty with Angristrsquos polar position on this issuemdashthat the LPM is the only model of interestmdashis that it is funda-mentally untenable The LPM is attractive because it is easyto interpret providing parameter estimates that do not requiretransformation to learn the effects of a regressor on the meanof the dependent variable It has a role to play in empiricalwork in summarizing the data as regards the conditional meanfunction and for initial explorations of the data and virtu-ally all practitioners use it for this purpose But beyond thisit has no defense Either it is equivalent to the latent-variablemodel if the model is saturated or if the model is not satu-rated imposes functional form restrictions that may not holdand may t the data worse than the latent-variable model

Thus if the model is y D 14 C sbquod C hellip gt 05 where d is asingle dummy variable regressor independent of hellip the probit(say) estimates of and sbquo map one-to-one onto the leastsquares intercept and coef cient on d in the LPM and hencethere is no gain to estimating the model either way because the

estimators are equivalent The same point extends to the caseof a multinomial d If the model is y D 14 C sbquod C xbdquo C hellip gt

05 where x and bdquo are vectors of exogenous covariates andcoef cients respectively the LPM estimate of the effect ofd on E4y5 conditional on x may be demonstrably worsethan the probit estimator of the same effect even for theobject of estimating E4ymdashd1 x5 at points 8d1 x9 observed inthe data (Angrist falls back on the weak principle that leastsquares estimates provide best linear approximations but theissue is that the true model might be nonlinear) Nothingin Angristrsquos article provides evidence that the probitor anyother latent index formulation provides an inferior estimate of6E4ymdashd11 x5 ƒ E4ymdashd01 x57 compared to the LPM for the twoestimators smooth the joint response function with respect tod and x in different ways

Although the latent-variable model has no necessary claimfor superiority in the class of all nonlinear models its pop-ularity rests on its ability to generate a wide variety of non-linearities with a relatively parsimonious speci cation arisingfrom the convolution of the latent index with the cdf of hellipExpansion of the LPM to incorporate equivalent nonlinearitiesis cumbersome and inef cient

The possible inferiority of the LPM in capturing nonlinear-ities in nonsaturated models is also important for interpolationand extrapolation to points not in the observed data Gettingthe nonlinearities right in the observed data is important ininterpolation and extrapolation if the true model is nonlinearas the binary choice model necessarily is (since y is boundedby 0 and 1) Although Angrist at one point does mentionpredictionmdashstating that the linearized models he proposes area good ldquojumping-off point for any prediction exerciserdquo a state-ment without supportmdashhe is for the most part not interestedin prediction so much as summarizing the observed data Thisis ne but unfortunately prediction outside the observed his-torical data is the pervasive concern of the policy makers whoare the ultimate consumers of applied economic research forthey must make predictions of new policies in new environ-ments on a daily basis as part of their jobs The methods pro-posed in Angristrsquos article are therefore not very useful for theformulation of new policy

It should be noted that some make the argument that theoryand economic models must play a role in guiding the formu-lation of empirical relationships used for prediction outsidethe range of the observed data That position is not taken herebecause it is not necessary to establish the potential superior-ity of the L IM over the LPM in nonsaturated models and forprediction Occamrsquos razor makes it unnecessary

It should also be noted that these issues have nothing todo with causal effects because d is assumed exogenous Thusthe opposition that Angrist poses between L IMrsquos and ldquocausaleffectsrdquo models is a false one at least in general If a ldquocausaleffectrdquo is de ned as the true effect of a variable d on onlythe rst moment of a variable y (a rather restricted de nitionof causal effect) that causal effect can be derived from thelatent-variable model just as well The issue is instead merelynonlinearity of the function E4ymdashd1 x5

Angrist also states that the L IM requires ldquoconstant coef- cientrdquo and ldquodistributional assumptionsrdquo for identi cation

22 Journal of Business amp Economic Statistics January 2001

unlike the LPM Angrist does not make suf cient quali ca-tions to these assertions and they are indeed false for theL IM A random coef cient sbquo in the LIM is permissible with-out distributional assumption ( Ichimura and Thompson 1998)and the coef cients in the standard constant-coef cient L IMare identi ed semiparametrically (ie under unknown distri-bution of hellip) under the restriction that d and hellip are independentan assumption generally made for causal interpretation of theLPM model as well (Manski 1988 Horowitz 1993 Powell1994) In addition as already noted in a fully saturated modelthe LPM and the L IM are equivalent in any case

Is the LATE an Interesting Statistic The Angrist assertionthat LATE is the only interesting statistic if d is endogenousor perhaps the only statistic worth bothering about because itis the only one identi ed by the data has the same partialvalidity as his preference for the LPM over the LIM IV isone of the most popular techniques in applied econometricsand has a natural intuition indeed one that does not requirethe LATE interpretation per se It is one of the most usefultools in the applied economistrsquos kit The simple IV estimatordiscussed by Angrist makes minimal assumptions and givesminimal information back to the analyst as a result But tosay that it produces the only statistic of interest does not havedefense both because it is equivalent to an LIM that is fullysaturated and may t the data worse than an LIM if not satu-rated and because it implies that there is no value to makingadditional assumptions to obtain additional information

The limiting nature of the LATE statistic is again in itsinattention to nonlinearities interpolation and extrapolationThat nonlinearities can be important is as true in this caseas it was in the exogenous case just discussed when thereare additional x covariates and when the model is not satu-rated As for interpolation and extrapolation the LATE statis-tic 6 Ny4z D z15ƒ Ny4z D z057=6 Nd4z D z15ƒ Nd4z D z057 denotes theeffect on a change in z from value z0 to z1 on E4y5 scaledby the change in the E4d5 It does not have implications forthe effect on E4y5 of any other change in z or for a changein any other policy variable Thus it is not particularly usefulfor policy changes other than a change of z0 to z1 in the sameenvironment (ie conditioned on the same x) This stands incontrast to an L IM for E4dmdashx1 z5mdashor any parametric modelfor d for that mattermdashthat allows for the change from z0 toz1 to inform policy makers and others of the likely changesof other values of z and of other variables x In fact theLATE statistic which is not a parameter in the usual senseof the word can always be expressed as a nonlinear combi-nation of parameters of a latent-variable model but not viceversa hence the latter model is more general than the former[See Heckman and Vytlacil (1999) for a discussion of how toexpress the LATE in terms of an L IM see also Angrist andImbens (1999) and Heckman (1997 1999) for a discussion ofsome of these issues]

In addition Angristrsquos focus on the advantages of the LATEfor model-free estimation of the effect of a change in z fromz0 to z1 on y reveals a philosophical weakness in his posi-tion This is because the effect of a change in z from z0 toz1 on y can be estimated from the reduced form the struc-tural form is not needed It is not explained why the statistic

6E4ymdashz D z15 ƒ E4ymdashz D z057 is not the only quantity of inter-est The answer that most economists would give is to say thatstructural coef cients are of interest because they can be usedto extrapolate the effects of changes in d on y to other changesin z than from z0 to z1 in the same environment indeedthis is the basic rationale for structural estimation given inthe famous essay by Marschak [1953 see also Christ (1994)and Heckman (2000) for discussions Marschakrsquos argumentwas more subtle than this arguing that restrictions on struc-tural parameters are needed for out-of-sample prediction butthis translates into restrictions on the reduced form as well]However by ruling out the possibility of learning about anyof the effects of changes in z or d other than those inducedby a change in z from z0 to z1 Angrist removes the need todo structural estimation in the rst place Angrist implicitlyassumes that a structural coef cient of interest exists on thevariable d in the y equation and that that coef cient has mean-ing independent of a particular change in z but the rest of hisdiscussion contravenes that interest

A reduced-form research program which is where theAngrist position leads is of considerable value There is noth-ing wrong with a research program to collect a large body ofinformation on the effects of a wide variety of changes in dif-ferent policy variables 4z5 from particular values (z0) to otherparticular values (z1) each taking place in particular environ-ments (x) But if the research program stops there very littleuseful has been learned other than a collection of facts aboutthe effects of particular policies in particular environments

Is the Application in the Angrist Article Interesting TheAngrist article uses a recently popular exclusion restriction ornatural experiment to identify the effect of fertility on laborsupply and to illustrate his preferred methodsmdashnamely theuse of twins The use of twins as an exclusion restrictionstands in contrast to variations in a government policy or lawwhich are often used as exclusion restrictions both in recentwork on natural experiments as well as a much older litera-ture that uses cross-sectional and overtime variation in state-or country-speci c taxes and transfer rules to identify modelparameters However although it can be of interest to estimatereduced-form policy effects 6E4ymdashz D z15 ƒ E4ymdashz D z057 asjust discussed it is not so obvious that there is any interest inestimating the effects of having twins on the expected valueof y Creating twins is not a variable directly subject to policymanipulation

The dif culty in the use of twins arises for the same reasonalready discussedmdashnamely that the preference for model-freeestimation of policy effects leads to a lack of interest in thefunction E4dmdashz1 x5 and hence to a lack of interest in what canbe learned from a study of twins about the effects of someother more relevant policy variable that might be manipulatedA model for the E4dmdashz1 x5 is needed to make that connectionand hence to make a study of twins of any interest yet that isintentionally eschewed in the approach proposed by AngristOnce again this leads to an uninteresting and quite limitingset of exercises

Conclusions The set of methods laid out in the Angristarticle at least if a more relevant instrument were used yieldsa set of model-free statistics that are suitable for exploratorywork on a research question prior to the (possibly nonlinear)

Mullahy Comment 23

structural estimation and prediction that should be the ultimateobject of applied economics research

ACKNOWLEDGMENTS

This comment is a revision of remarks delivered at theJoint Statistical Meetings Baltimore August 1999 I thankCarl Christ James Heckman Joel Horowitz Michael KeaneThomas Mroz Geert Ridder and Edward Vyltacil for com-ments with the usual disclaimer that the views expressed hereshould not be taken as representing those of any of theseindividuals

ADDITIONAL REFERENCES

Angrist J and Imbens G (1999) Comment on ldquo Instrumental VariablesA Study of Implicit Behavioral Assumptions Used in Making Pro-gram Evaluationsrdquo by J J Heckman Journal of Human Resources 34283ndash827

Blundell R and Smith R (1993) ldquoSimultaneous Microeconometric ModelsWith Censored or Qualitative Dependent Variablesrdquo in Handbook of Statis-tics (Vol 1) eds G S Maddala C R Rao and H D Vinod AmsterdamNorth-Holland pp 117ndash141

Christ C (1994) ldquoThe Cowles Commissionrsquos Contributions to Econometricsat Chicago 1939ndash1955rdquo Journal of Economic Literature 32 30ndash59

Heckman J (1997) ldquoInstrumental Variables A Study of Implicit BehavioralAssumptions Used in Making Program Evaluationsrdquo Journal of HumanResources 32 441ndash462

mdashmdashmdash (1999) ldquo Instrumental Variables A Reply to Angrist and ImbensrdquoJournal of Human Resources 34 828ndash837

mdashmdashmdash (2000) ldquoCausal Parameters and Policy Analysis in Economics ATwentieth Century Retrospectiverdquo Quarterly Journal of Economics 11545ndash97

Heckman J and MaCurdy T (1985) ldquoA Simultaneous Equations LinearProbability Modelrdquo International Economic Review 18 21ndash37

Heckman J and Vytlacil E (1999) ldquoLocal Instrumental Variables andLatent Variable Models for Identifying and Bounding Treatment EffectsrdquoProceedings of the National Academy of Sciences 96 4730ndash4734

Horowitz J (1993) ldquoSemiparametric and Nonparametric Estimation of Quan-tal Response Modelsrdquo in Handbook of Statistics (Vol 11) eds G SMaddala C R Rao and H D Vinod Amsterdam North-Holland pp45ndash72

Ichimura H and Thompson T (1998) ldquoMaximum Likelihood Estimation ofa Binary Choice Model With Random Coef cients of Unknown Distribu-tionrdquo Journal of Econometrics 86 269ndash295

Manski C (1988) ldquoIdenti cation of Binary Choice Modelsrdquo Journal of theAmerican Statistical Association 83 729ndash738

Marschak J (1953) ldquoEconomic Measurements for Policy and Predictionrdquoin Studies in Econometric Method eds W Hood and T KoopmansNew York Wiley pp 1ndash26

Powell J (1994) ldquoEstimation of Semiparametric Modelsrdquo in Handbook ofEconometrics (Vol 4) eds R Engle and D McFadden Amsterdam North-Holland pp 2443ndash2521

CommentJohn Mullahy

Departments of Preventive Medicine and Economics University of Wisconsin Madison WI 53705 andNational Bureau of Economic Research Cambridge MA (jmullahyfacstaffwiscedu)

Articles bearing titles containing phrases like ldquoSimpleStrategies for Empirical Practicerdquo are often wolves masquerad-ing as sheep the methodologies they devise being far fromldquosimplerdquo and far removed from what most humble practition-ers perceive in the realm of ldquoempirical practicerdquo Not so hereJoshua Angrist has written a tight comprehensive article thatis stimulating and important yet also eminently useful

As typi es much of Angristrsquos work the main concern in thisarticle is on how to elicit interesting characterizations of causaleffects from microdata with the particular twist here being afocus on outcomes measured as ldquolimited dependent variablesrdquo(LDVrsquos) The main take-away message I glean from this arti-cle is that applied analysts working on causal effect or struc-tural analyses in LDV contextsmdashtraditionally vexing contextsinsofar as consistent estimation and inference are concernedmdashhave considerable grounds for optimism Angrist lays out andinterprets systematically a set of issues and methods that pro-vide practitioners with a variety of implementable strategiesthat might be brought to bear on such empirical problemsA corollary take-away message is that in some respects ldquothisstuff is not really as hard as wersquove tended to make itrdquo withAngrist demonstrating for instance the potential merits ofsimple linear instrumental variable ( IV) methods for estimat-ing causal effects in a variety of LDV contexts In no eventcan applied analysts escape the requirement of nding theo-retically sound instruments but Angrist exposits compellingly

a broad description of how such instruments might be used toelicit interesting causal inferences in LDV contexts

I have no real quibbles with any of the substance ofAngristrsquos arguments Rather I will devote my commentarymainly to amplifying and expanding several of the themes hedevelops throughout the article

1 FOCUS ON CAUSAL OR PARTIAL EFFECTSVERSUS FOCUS ON CONDITIONAL

EXPECTATIONS FUNCTIONS

It seems fair to suggest that much of applied microecono-metrics is concerned primarily with understanding the signsand magnitudes of quantities like iexclE6Y mdashX1 D7=iexcl4X1 D5 oriexclE6Y mdashX7=iexclX0 Yet much of the actual dirty work in under-taking causal analysis in LDV contexts seems to result fromdecisions to undertake analyses in settings where E6Y mdashX1D7

or E6Y mdashX7 are restricted to be positive without a priori restric-tions on parameter values [thus the tradition of using tobit-class conditional expectations functions (CEFrsquos) two-stepselection models exponential CEFrsquos etc]

As practitioners however we should pause to assesswhether speci cations akin to E6Y mdashX1D7 D exp4Xsbquo C D5 gt

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

24 Journal of Business amp Economic Statistics January 2001

0 are ultimately buying anything important with respect tothe rst-order questions being explored It is only com-mon sense that analysts should spend relatively more energyworking on understanding quantities of primary interest(eg iexclE6Y mdashX1D7=iexcl4X1 D5) and worrying relatively lessabout the formulation and implications of assumptions (egE6Y mdashX1 D7 gt 0) that may have little or no ultimate bearingon the particular questions being addressed This is espe-cially so when the latter effort (restricting via functional formchoice E6Y mdashX1 D7 gt 0) will tend to complicate rather thanfacilitate the former effort (understanding the partial or causaleffects) This is not to suggest that quantities like E6Y mdashX7 andE6Y mdashX1 D7 may not themselves be interesting to estimate (egmodeling conditional mean health care expenditures for use inforecasting future expenditure levels) but whether E6Y mdashX1D7

is parametrically and parsimoniously better approximated byexp4XsbquoC D5 or by XsbquoC D or by some other g4Xsbquo CD5

or h4X1D1 ƒ5 ismdasheven in LDV settingsmdashultimately an empir-ical matter to be assessed by goodness of t (eg via condi-tional moment tests)

I note this important angle somewhat sheepishly sincein both articles of mine that Angrist cites (Mullahy 19971998) there is considerable emphasis on the use of exponen-tial CEF speci cations as enforcers of positive conditionalmean ldquorequirementsrdquo In fact the main motivation for theearlier article was to nd a general approach to obtainingconsistent estimators of structural parameters in parametricsettings less distribution-bound than tobit-class models whenthe key requirement or side constraint is E6Y mdashX1 D7 gt 0 withthe resulting strategy a more or less brute-force method foraccomplishing this in a nonlinear IV setting Angristrsquos arti-cle takes this idea quite a bit further and demonstrates how(and with the addition of a further normalization) the struc-tural parameter estimation approach I discussed can be devel-oped into a model to analyze causal effects when the latter arecharacterized as ldquoproportional treatment effectsrdquo

Whether a proportional as opposed to an additive treatmenteffect is interesting (it may or may not be) remains to be seenin any particular application Wooldridgersquos recent work onaverage causal effects (ACErsquos Wooldridge 1999) would seempertinent in contexts in which the role of the X covariates ismore ldquointrusiverdquo than in the simple additivelinear treatment-effect setup As a general matter the extent to which infer-ences about causal effectsmdashhowever formulatedmdashhinge onthe inclusion of particular covariates as conditioning regres-sors is an interesting and potentially important angle thatAngristrsquos article begins to explore and is one that likely tomerit additional research For the particular issues at handthe very nature of proportional treatment effects brings intofocus explicitly the role of the X rsquos so how their correlationwith the treatment indicators 4D5 in uences inferences aboutthe causal impacts of the latter would seem to be a rst-orderconsideration

2 TWO-PART MODELS

I found Angristrsquos analysis of the interpretation of causaleffects in two-part models to be extremely illuminatingIn a reduced-form setting I argued (Mullahy 1998) that amid

all the concerns about zeros robustness outliers transforma-tion retransformation and such that have typically attendedmodeling efforts involving two-part models it would be sur-prising in a well-structured empirical investigation if therewere not stated or lurking concerns about partial effectsiexclE6Y mdashX7=iexclX andor CEFrsquos E6Y mdashX7 themselves It may be insome cases that such concerns are not articulated and it mayeven be in less well-structured problems that they are not obvi-ous to analysts themselves Yet for instance unless robustlyestimated sbquorsquos from a log-linear part-2 speci cation of a two-part model could inform a rst-order question about a par-tial effect or a CEF of what practical use were they likelyto be My rather modest and rather obvious argument wasjust that the concerns about the iexclE6Y mdashX7=iexclX or the E6Y mdashX7mdashif they are the reason the analysis is being conducted in the rst placemdashshould enjoy rst-order prominence in the estima-tion exercise and that concerns about zeros outliers and suchshould be relegated in some sense to second-order status Insome cases two-part models will serve nicely to address such rst-order concernsmdashat least in reduced-form settingsmdashbut inothers they will not

Angrist appears to have some sympathy with such argu-ments but more importantly he provides a valuable serviceto users of two-part models by unearthing one major limi-tation in the analysis of causal effects The problem is notwith part 1 (ie the logit probit or linear probability com-ponent) because part 1 of the two-part model falls within themain lines of Angristrsquos LDV analysis Rather the complicationarises with part 2 of the two-part model in which the additionalldquoY gt 0rdquo conditioning arises In the standard (reduced-form)two-part model quantities like E6Y mdashX1Y gt 07 are prominentand identi ed readily But in the counterfactual settings inwhich causal effects are manifested wherein the realized Y

arises from self-selection into or out of treatment such selec-tion effects introduce an ambiguity [Eq (12)] and therebyconfound the ability to glean causal effects from part-2 esti-mates (unless as Angrist notes censored regression methodsare used but Angrist also offers some compelling argumentsagainst blind reliance on censored regression approaches) Assuch if estimation of and inference about causal effects arethe primary analytical concerns in data settings with Y para 0and Pr4Y D 05 gt 01 analysts may be well advised to avoidtwo-part modeling strategies and pursue some of the moredirect linear and nonlinear estimation approaches discussedby Angrist in which one-part estimation approachesmdashzerosand allmdashyield estimates of causal effects that will be directlyinterpretable

3 BEYOND CEFrsquoS

One noteworthy feature of the methods Angrist discussesmdashwith the main results attributed to Abadie (1999)mdashare theirapplicability to estimation of causal effects for characteristicsof the conditional distribution rdquo4Y mdashX1D5 beyond just CEFrsquosEstimation of causal or treatment effects for conditional quan-tiles (QTErsquos) and conditional distribution function ordinates(ldquodistilesrdquo) Pr4Y lt cmdashX1D5 is demonstrated to t properlywithin the weighting strategies advanced by Abadie Since

Todd Comment 25

quantile and distile analyses may be more relevant in address-ing particular analytical questions or policy issues than CEFrsquosthe ability to extend the mechanisms of causal-effect esti-mation strategies in such directions should be welcomed byapplied researchers

I might suggest that Angristrsquos claim about an advantage ofquantile causal analysis over distile causal analysis being thatthe latter relies on application-specic values of ordinates (c)is perhaps oversold or underdeveloped In many interestingand policy-relevant applications focus on the application-speci c ordinates is fundamental What factors cause employ-ers to offer more than one health insurance plan to employeesWhat factors cause consumers of alcoholic beverages to drinkmore than two drinks per day What factors cause elderlyindividuals to utilize more than one prescription drug prod-uct What factors cause second earners to work more than17 hours per week Each of these concerns an importantreal-world causal question for which the application-specicordinatemdashwhether for institutional legal physiological orother reasonsmdashis a fundamental ldquopivot pointrdquo for the analysisAnalysis of such distile relationshipsmdashwhether in causal or inreduced-form settingsmdashis a potentially powerful method forinforming decision makers and it seems to me that its mer-its relative to quantile analyses would have to be judged on acase-by-case basis

4 CONCLUSIONS

Practitioners of applied microeconometrics will nd thatthe time invested in a careful reading of Angristrsquos article hassizable returns not merely because of the analytical and con-ceptual insights it offers but ultimately also because it sug-gests a variety of important practical innovations that appliedresearchers can exploit empirically to more clearly understandthe nature of causal relationships in their data Applied microresearchers in areas like health labor development public nance and such work commonly with data wherein (a) out-come measures are limitedmdashoften most importantly becausethey are nonnegative with mass points at 0mdashand (b) the rolesof conditioning covariates (X rsquos) are most conveniently summa-rized via regression methods Angristrsquos article provides appliedresearchers so endowed with a set of powerful tools for elic-iting causal inferences from such data Of course the punchline is familiar and inevitable All bets are off without theo-retically solid instruments But with such instruments in handthe strategies exposited by Angrist offer analysts a rich set ofperspectives on estimation

ADDITIONAL REFERENCE

Wooldridge J M (1999) ldquoEstimating Average Partial Effects Under Con-ditional Moment Independence Assumptionsrdquo mimeo Michigan StateUniversity Dept of Economics

CommentPetra Todd

Department of Economics University of Pennsylvania Philadelphia PA 19104 (petraathonasasupennedu)

This article considers ways of estimating the effect ofbinary endogenous regressors in models with limited depen-dent variables It questions the usefulness of conventionalestimation strategies aimed at recovering structural modelparameters and advocates the use of simple instrumental vari-ables ( IV) estimators as an alternative on the grounds thatthese estimators invoke weaker assumptions and often suf ceto answer questions of interest in empirical studies In theauthorrsquos view the main challenge facing empirical researchersis the problem of identi cation of ldquotreatment effectsrdquo throughIV Here I expand the discussion by considering two ques-tions (1) What are the limitations of the causal model asa paradigm for policy analysis (2) When simple estimatorsare suitable for answering a question of interest what are thetrade-offs that need to be considered in using them

1 WHEN IS THE ldquoCAUSAL MODELldquoA USEFULPARADIGM FOR POLICY ANALYSIS

The article gives the impression that most interesting ques-tions in economics can be answered within the context ofthe ldquocausal modelrdquo [variously attributed to Neyman (1923)Fisher (1935) Cox (1958) Roy (1951) and Rubin (1978)]The causal model is a very general framework that assumesthat there are potential outcome states associated with having

received some treatment and having received no treatment (orpossibly a different treatment) For each individual only oneof the potential outcome states is observed which leads to amissing-data problem in attempting to draw inferences aboutaspects of the treatment effect distribution

The language of potential outcomes is very general soalmost any economic problem could be formulated in theseterms However this does not mean that estimators proposedin the literature for the causal model are useful for all or evenmost economic problems A major limitation of the modelis that it assumes that the state ldquowith treatmentrdquo has beenobserved for at least a subset of people In medical trials orbiological experiments this assumption is probably reason-able but in economics we often are interested in evaluatingeffects of treatments that have not yet been implemented Forexample we might be interested in predicting the effect ofraising the age receiving Social Security bene ts to a new ageor the effect of introducing new term limits on welfare par-ticipation An implicit assumption needed to apply any of theidenti cation results described in this article is that both Y1

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

26 Journal of Business amp Economic Statistics January 2001

and Y0 are observed which would not be satis ed in manycases Thus the causal model and the estimators developedfor it are inadequate when it comes to any economic questioninvolving a state of the world that has not been observed

Despite these limitations treatment effect estimates aresometimes used to predict consequences of new treatments orof existing treatments for new populations When the policychange being considered is very close to changes observedin the data and is to be applied to similar individuals thenextrapolation may be reasonable But if the policy change is(a) of a magnitude outside the range of experience (b) alonga new dimension or (c) intended for a new population then itis an open question as to how reliable the estimates would beTreatment impact estimates are not intended to capture any-thing invariant to changing conditions The assumptions thatwould be required to justify extrapolations or generalizationsare often quite strong To some extent it is an illusion thatthese estimators allow for general kinds of inference underweak assumptions because the assumptions required to justifytheir application for particular purposes are often not madeexplicit

Angrist argues that the identi cation results for the localaverage treatment effect (LATE) estimator provide a ldquofounda-tion for credible causal inferencerdquo (p 5) and a ldquominimum con-troversy jumping-off point for any prediction exerciserdquo (p 5)But as discussed previously the causal model maintains sucha high level of generality that it offers little guidance for manykinds of problems It is important to recognize when and whennot the LATE estimator is likely to be useful This articleclaims that ldquoin practice estimates of LATE differ little fromestimates based on the stronger assumptions invoked to iden-tify effects on the entire treated populationrdquo (p 5) But suchan extrapolation is not justi ed under the theory and is surelyunlikely to hold in all circumstances As shown in a num-ber of works by various authors LATE provides the averageeffect of treatment for the group of ldquocompliersrdquo who are peo-ple induced by the instrument to receive treatment This groupdoes not except under very special circumstances correspondto the group of treated persons and may in fact representonly a small fraction of persons receiving treatment So notonly does LATE not suf ce for answering problems of the sortdescribed by (a) (b) and (c) it also does not generally pro-vide the average effect of treatment on the treatedmdashthe mostcommon parameter of interest in analyzing social programsAnother not-so-attractive feature of the estimator is that thepopulation of ldquocompliersrdquo is usually not identi able so wecannot say exactly whose treatment impact is being estimatedApplying the LATE parameter estimate to the full populationof treated persons would generally require ruling out certaintypes of heterogeneity in individual responses to treatmentas discussed by Heckman (1997) The explicit assumptionsthat could justify this type of extrapolation were shown byHeckman and Vytlacil (2000 in press)

2 ESTIMATORS SHOULD BE TAILOREDTO THEIR APPLICATION

A general theme throughout the article is that wheneverpossible researchers should restrain from imposing paramet-ric restrictions in estimation because the structure they impose

may be wrong resulting in biased estimates This view isexpressed for example in the discussion of the two-partmodel The article advocates the use of a two-part approachinstead of a conventional selection-model approach maintain-ing that the two-part approach is less likely to lead to biasedparameter estimates because it does not impose cross-equationrestrictions as a conventional tobit estimator would

How do researchers decide whether or not to impose restric-tions across estimating equations that are structurally linkedThe most agnostic approach would be to estimate both equa-tions completely nonparametrically but this approach is rarelypractical Nonparametric estimators are consistent under themost general conditions but in conventional size samples thestandard errors would be so large that the estimates would forpractical purposes be useless Researchers impose structurebecause they are willing to restrict the class of models underwhich their estimators are consistent in exchange for greaterprecision These ef ciency considerations would also apply inthe deciding whether or not to impose cross-equation restric-tions in estimating simultaneous-equations models If we areright about the structure there is an ef ciency gain fromimposing the restrictions If we are wrong imposing them maylead to bias When there are overidentifying restrictions it isat least possible to develop tests of the model speci cation thatcan serve as a guide in choosing a structural model that is sup-ported by the data This article presents a dichotomy betweenthe two-part model and a fully parametric traditional selec-tion speci cation but there are many other modeling choicesranging from fully nonparametric to semiparametric to fullyparametric In modest size samples ef ciency is an impor-tant consideration and a parametric model may be the mostappropriate one For very large samples fewer parametricapproaches are feasible Both the probability-of-working andthe hours-worked equations could for example be estimatedby a semiparametric method such as semiparametric leastsquares ( Ichimura 1993) Which estimation method is mostappropriate ultimately depends on how much data is avail-able and how many parameters are to be estimated It doesnot make sense to criticize the use of structured approaches infavor of more exible modeling approaches without regard tothe context in which they are being applied

3 ON THE GAP BETWEEN THEORY ANDEMPIRICAL PRACTICE

Finally another recurring emphasis in the article is on thevalue of shortcuts in empirical practice In the authorrsquos viewthere is a gap between the econometric theory and what isfeasible in practice and shortcuts and approximations help tobridge this gap To this end the article extols for examplethe virtues of ordinary least squares (OLS) as a way of approx-imating any unknown conditional expectation function with-out much concern as to whether a best linear approximation isa good approximation If the model has discrete regressors andis fully saturated it is well known that OLS is equivalent to anonparametric estimator But in other cases a linear approxi-mation could be highly biased and the bias will not go awayas the sample size gets large There are a variety of alterna-tive asymptotically unbiased estimators Along similar lines

Angrist Reply 27

the article suggests that in implementing a ldquoselection-on-observablesrdquo estimator it may be all right simply to ignore theweighting scheme Such claims seem to me fairly outrageousbecause failure to take into account weighting could lead tovery different estimates and would not provide a consistentestimator of the parameter of interest Heckman Ichimura andTodd (1998) discussed alternative estimators that are not dif -cult to implement and take into account the weighting scheme

Finally the appendix presents a variety of shortcuts forobtaining estimates and standard errorsmdashsome of which leadto inconsistent estimates When shortcuts lead to very crudeapproximations or inconsistent estimates I question their use-fulness I do think it unreasonable to expect that estimatesbased on simple IV estimators be accompanied by correctstandard errors that correctly take into account the in uence ofestimated parameters of estimated regressors I see little valuein reporting incorrect standard-errors estimates on the groundsthat it saves time in computation especially when there areseveral works long available in the literature that develop sim-ple ways of solving these kinds of problems (eg see Newey1984) Although there is surely a gap between the estimatorsdeveloped in the theoretic econometric literature and the esti-mators used in the empirical literature the gap is not as greatas this article supposes

ADDITIONAL REFERENCES

Cox D R (1958) The Planning of Experiments New York WileyFisher R A (1935) Design of Experiments London Oliver and BoydHeckman J J (1997) ldquo Instrumental Variables Evidence From Evaluating a

Job Training Programmerdquo Journal of Human Resources 32 441ndash462Heckman J Ichimura H and Todd P (1997) ldquoMatching as an Econometric

Evaluation Estimator Theory and Evidence on its Performance Appliedto the JTPA Program Part I Theory and Methodsrdquo Review of EconomicStudies 64 605ndash654

mdashmdashmdash (1998) ldquoMatching as an Econometric Evaluationrdquo Review of Eco-nomic Studies 65 261ndash294

Heckman J and Vytlacil E (2000) ldquoCausal Parameters Structural Equa-tions Treatment Effects and Randomized Evaluations of Social Programsrdquounpublished manuscript University of Chicago Dept of Economics

mdashmdashmdash (in press) ldquoLocal Instrumental Variablesrdquo in Nonlinear StatisticalInference Essays in Honor of Takeshi Amemiya eds C Hsiao KMorimune and J Powell Cambridge UK Cambridge University Press

Ichimura H (1993) ldquoSemiparametric Least Squares (SLS) and Weighted SLSEstimation of Single Index Modelsrdquo Journal of Econometrics 58 71ndash120

Neyman J (1923) ldquoStatistical Problems in Agricultural Experimentsrdquo Sup-plement to the Journal of the Royal Statistical Society 2 107ndash180

Newey W (1984) ldquoA Method of Moments Interpretation of Sequential Esti-matorsrdquo Economic Letters 14 201ndash206

Roy A (1951) ldquoSome Thoughts on the Distribution of Earningsrdquo OxfordEconomic Papers 3 135ndash146

Rubin D (1978) ldquoBaysian Inference for Causal Effects The Role of Ran-domizationrdquo The Annals of Statistics 6 34ndash58

ReplyJoshua D Angrist

I thank the discussants and the session organizer outgoingJBES editor Jeff Wooldridge who will be missed by associateeditors and JBES authors alike The comments in this sessionoffer a variety of viewpoints some sympathetic others morecritical I am pleased to be part of this stimulating and con-structive exchange and look forward to more interaction ofthis type in the future

Hahn My article starts by suggesting that a focus ontreatment effects simpli es the problem of causal inferencein limited dependent variable (LDV) models with endoge-nous regressors Endogeneityrsquos half-sister is the assumptionof unobserved individual effects since the omitted-variablesmotivation for xed effects in panel data is similar Hahnasks whether a shift in focus from index parameters to aver-age causal effects might also simplify the notoriously dif -cult problem of working with nonlinear models for binarypanel data (eg as in Card and Sullivan 1988) Hahn showsthat here too the causal-effects framework pays off demon-strating that average causal effects are easily estimated in a xed-effects probit model with a binary regressor even thoughthe probit coef cient is not A second noteworthy featureof Hahnrsquos comment is his observation that identi cation inthis framework turns on careful modeling of the relationshipbetween the assignment variable and the xed effects Tradi-tional econometric models are primarily concerned with thestochastic process generating outcomes A shift of attentiontoward modeling the assignment mechanism is consistent with

the notion that observational studies should try to mimic thesort of controlled comparisons generated by experiments

Imbens Imbens begins by endorsing the view that the sub-stantive goals of empirical research are better served by thepotential outcomescausal framework than by structural mod-eling Going one step further Imbens notes that the problemswith ldquomodel- rst econometricsrdquo are not limited to LDVrsquos Inthe same spirit he sensibly wonders (as did Wooldridge 1992)why so much attention has been devoted to estimating theBoxndashCox transformation model which does not actually iden-tify E6Y mdashX7 without a distributional assumption Imbens alsonicely articulates the rationale for looking at the twins exam-ple and notes that this is similar to the rationale for extrap-olating from clinical trials in narrow subpopulations And infact I believe the ldquotwins experimentrdquo is likely to have pre-dictive value in a variety of situations though any extrapo-lation will probably be improved by control for demographiccharacteristics

In spite of (or perhaps because of) our years of discussionand collaboration Imbens and I still do not agree on certainthings He feels strongly that inference with discrete outcomesshould use models and procedures that respect inherent non-

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

28 Journal of Business amp Economic Statistics January 2001

linearities The HIRZ procedure discussed in his comment isan intriguing way to do this and my article discusses othersI remain unconvinced however that nonlinearity is of majorsubstantive importance in empirical work with binary regres-sors The proof of the pudding I suppose would be to showthat H IRZ or some of the methods I discuss in the article leadto better predictions or decisions than the linear workhorse

Mof tt Mof t provides a valuable counterpointHe beginsby arguing that latent-index models have more predictivepower than reduced-form causal models Such claims seemhard to establish a priori though I do not doubt that in somecases this may be true But empirical strategies in which thebulk of the research effort goes into estimating index coef -cients may fail to answer the most basic question in empiricalwork What happened here Causal models and local aver-age treatment effect (LATE) answer this question while indexcoef cients alone are not suf cient for causal inference Iagree however that structural modeling can be useful forgoing beyond basic questions and I have used index and otherstructural models for this purpose Mof t is right howeverin suggesting that I nd the model-free analysis he refers toas ldquoexploratoryrdquo more interesting than structural extrapolationThis seems to me to be where the potential for real discoverylies

Mof tt asks tough but fair questions about the twins appli-cation He is correct to point out that twinning is not a manip-ulable policy instrument That is one reason I do not limit theanalysis to reduced-form effects of multiple births Yet to mymind the best evidence we have on the controversial questionof the consequences of out-of-wedlock childbearing comesfrom the twins experiment (Bronars and Grogger 1994) Andthe economic consequences of childbearing are of interest inmany other contexts As noted in the introduction to my arti-cle the twins experiment can help determine whether fertilityreductions are responsible for trend increases in female laborsupply Beyond this sort of question the twins results seemuseful for assessing the economic consequences of increasedabortion access in the United States the dissemination of con-traceptive implants in developing countries (eg see KaneFarr and Janowitz 1990) and various programs to encour-age or require implant use by welfare mothers (Haveman andWolfe 1993)

Finally it is fair to question the external validity of twinsestimates as Mof t does toward the end of his commentAngrist and Evans (1998) showed that the labor-supply con-sequences of twinning can be reconciled with estimates ofthe labor-supply consequences of childbearing resulting froma very different natural experiment induced by preferences formixed-sex sibling pairs This suggests that results from bothexperiments may be quite robust and may even be ldquostructuralrdquoin the sense that they tell us something general about the eco-nomic response to childbearing

Mullahy Mullahy brings a welcome health economistrsquosperspective to the discussion Health economists have longgrappled with the issues discussed in my article and asI hope I made clear I drew ideas and inspiration fromMullahyrsquos work on models with nonnegative dependent vari-ables Mullahyrsquos comment further highlights exactly the sort

of utilitarian trade-off that underlies the choice of empiricalstrategy On one hand we would like our models to be con-sistent with any functional form restrictions known to be trueOn the other attempts at this may complicate estimation andinference for little payoff Worst of all the technical complex-ity of nonlinear models seems to cause authors to ldquowax econo-metricrdquo an effort that may come at the expense of attention tosubstantive issues of importance (an example here is the longseries of papers discussing exactly how data from the RANDhealth-insurance experiment should be analyzed) Mullahyrsquosheretical discussions of ldquoone-part modelsrdquo here and in earlierwork represent a refreshing effort to keep the empirical trainon track Mullahyrsquos point that application-specic distributionordinates can provide more useful summary information thanquantiles is also clearly worth thinking about

Todd Todd provides a useful and clear overview of thecausal framework which she suggests is better suited formedical questions than economic questions The thrust of herargument is that there is too much heterogeneity in economicinterventions and outcomes for the causal approach to bevery useful for economists But my perception is that medicalresearch is even more explicitly concerned with heterogene-ity than economics This is evidenced by the many quali ca-tions regarding patient characteristics conditions and inter-actions in medical treatment protocols A difference betweenthe prevailing medical research paradigm and the traditionaleconometric perspective however is whether the question ofheterogeneous response is to be addressed by better theoryor more evidence Of course theory helps us decide whereto look for heterogeneity but as Rosenbaum (1999 p 301)put it ldquo If a treatment has substantially different effects in dif-ferent types of people then we need to discover this (ital-ics mine) Rosenbaum also noted that ldquorandomized clinicaltrials routinely look for heterogeneous treatment effects andthey do this without representative samples from nationalpopulationsrdquo

On the technical side Todd suggests I advocated use ofthe two-part model because it exibly approximates nonlinearfunctional forms for nonnegative outcomes Although I notedthe exibility the upshot of my discussion was that the two-part model is poorly suited to causal inference because of thedif culty of interpreting part 2 Finally Todd points out thefailure to allow for sampling variance in the scaling factorsused to generate effects for nonlinear models This is indeed ahard-to-defend shortcut though I shall take a stab at defenseby noting that randomness of the scaling factor which typi-cally estimates a sample average is likely to be a minor sourceof sampling uncertainty

ADDITIONAL REFERENCES

Card D E and Sullivan D (1988) ldquoEffects of Subsidized Training onMovements In and Out of Employmentrdquo Econometrica 56 497ndash530

Haveman R and Wolfe B (1993) ldquoChildrenrsquos Prospects and ChildrenrsquosPolicyrdquo Journal of Economic Perspectives 7 153ndash174

Kane T Farr G and Janowitz B (1990) ldquo Initial Acceptability of Con-traceptive Implants in Four Developing Countriesrdquo International FamilyPlanning Perspectives 16 49ndash54

Rosenbaum P R (1999) ldquoChoice as an Alternative to Control in Observa-tional Studiesrdquo Statistical Science 14 259ndash304

Wooldridge J (1992) ldquoSome Alternatives to the BoxndashCox RegressionModelrdquo International Economic Review 33 935ndash955

Page 7: New Estimation of Limited Dependent Variable Models With Dummy …piketty.pse.ens.fr/…ecoineg/articl/Angrist2001b.pdf · 2011. 12. 5. · sequences of changes in contraceptive technology

8 Journal of Business amp Economic Statistics January 2001

3 ENDOGENOUS REGRESSORSTRADITIONAL SOLUTIONS

LDV models with endogenous regressors were rst esti-mated using distributional assumptions and maximum likeli-hood (ML) An early and in uential work in this mold isthat of Heckman (1978) This approach is not wedded toML Heckman (1978) Amemiya (1978 1979) Newey (19861987) and Blundell and Smith (1989) discussed two-stepprocedures minimum-distance estimators generalized leastsquares (GLS) estimators and other variations on the tradi-tional ML framework Semiparametric estimators based onweaker distributional assumptions were discussed by amongothers Newey (1985) and Lee (1996)

The basic idea behind these strategies can be described asfollows Take the censored-regression model from (11) andadd a latent rst stage with instrumental variables Zi So thecomplete model is

YiD 16Yi gt 07Y uuml

i

Y uumli

D Y uuml0i

C 4Y1iƒ Y uuml

0i5DiD Y uuml

0iC Di D Œ C Di C ˜i

DiD 16ƒ0 C ƒ1Zi gt Daggeri70 (15)

The principal identifying assumption here is

4Y0i1 Daggeri5 q Zi0 (16)

Parametric schemes use two-step estimators or ML to esti-mate Semiparametric estimators typically work by substi-tuting an estimated conditional expectation sup1E6Di

mdashZi7 for Di

and then using a nonparametric or semiparametric procedureto estimate the pseueo reduced-form [eg Manskirsquos (1975)maximum score estimator for binary outcomes]

The problems I see with this approach are the same as listedfor censored regression models with exogenous regressorsFirst latent index coef cients are not causal effects If the out-come is binary semiparametric methods estimate scaled indexcoef cients and not average causal effects Similarly censoredregression parameters alone are not enough to determine thecausal effect of Di on the observed Yi [ I should note that thiscriticism does not apply to parametric estimators where dis-tributional assumptions can be used to recover causal effectsor to a recently developed semiparametric method by Blundelland Powell (1999) for continuous endogenous variables] Sec-ond this approach turns heavily on the latent-index setup Wecan add to these points the fact that even weak distributionalassumptions like conditional symmetry fail for the reduced-form error term 4Di

ƒ sup1E6DimdashZi75 ƒ ˜i since Di is binary (a

point made by Lee 1996)A nal observation is that given assumption (16) this

whole setup is unnecessary for causal inference The effectof treatment on the observed Yi is identi ed for those womenwhose childbearing behavior is affected by the instrument Thetwins instrument for example identi es the effect of Di onmothers who would not have had a third child without a mul-tiple second birth (see result 4 in the earlier lemma) It maybe of interest to extrapolate from this grouprsquos experiences tothose of other women but the extrapolation problem is distinctfrom the problem of identifying the causal effect of childbear-ing in the ldquotwins experimentrdquo

4 NEW ECONOMETRIC METHODS

Conditional moments and other probability statementsinvolving Di alone are necessarily linear but causal relation-ships involving covariates are likely to be nonlinear unless thecovariates are discrete and the model is saturated LDV mod-els like probit and tobit are often used because of a concernthat unless the model is saturated LDVrsquos lead to nonlinearCEFrsquos The 2PM is sometimes also motivated this way (egsee Duan et al 1984)

The headaches induced by nonlinearity notwithstandingthere are simple schemes for estimating causal effects in LDVmodels with endogenous regressors and covariates In thissection I discuss three strategies for estimating effects onmeans and two for estimating effects on distributions All butthe rst are based on new models and methods None are tiedto an underlying structural model

The simplest option for estimating effects on means isundoubtedly to ldquopuntrdquo by using a linear constant-effectsmodel to describe the relationship of interest

E6Y0imdashXi7 D X 0

isbquo (17a)

and

Y1iD Y0i

C 0 (17b)

The assumptions lead a linear causal model

YiD X 0

isbquo C DiC ˜i1 (18)

easily estimated by 2SLSAlthough the constant-effects assumption is clearly unreal-

istic in practice more general estimation strategies often leadto similar average effects A second issue that arises in thissetting is that because Di is binary a nonlinear rst-stage suchas probit or logit may seem appropriate for 2SLS estimationof (18) But the resulting second-stage estimates are inconsis-tent unless the model for the rst-stage CEF is actually cor-rect On the other hand conventional 2SLS estimates usinga linear probability model are consistent whether or not the rst-stage CEF is linear So it is generally safer to use a linear rst-stage Alternatively consistent estimates can be obtainedby using a linear or nonlinear estimate of E6Di

mdashXi1 Zi7 asan instrument (This is the same as the plug-in- tted-valuesmethod when the rst-stage is linear) See Kelejian (1971) orHeckman (1978 pp 946ndash947) for a discussion of this pointand additional references ( It is also worth noting that a probit rst-stage cannot even be estimated for the twins instrumentbecause P6Di

D 1mdashXi1 ZiD 17 D 1 for twins)

41 IV for an Exponential Conditional Mean

A linear model like (18) is obviously unrealistic for binaryoutcomes and fails to incorporate natural restrictions on theCEF for nonnegative LDVrsquos This motivated Mullahy (1997)to estimate causal effects on nonnegative LDVrsquos using a multi-plicative model similar to that used by Wooldridge (1999) forpanel data The Mullahy (1997) model can be written in mynotation as follows Let Xi be a vector of observed covariates

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 9

as before and let mdashi be an unobserved covariate correlatedwith Di and Y0i The fact that this covariate is unobserved isthe reason we need to instrument

Let Zi be a candidate instrument Conditional on observedand unobserved covariates both treatment status and theinstrument are assumed to be independent of potential out-comes

Y0iq 4Di1Zi5 mdash Xi1mdashi (19a)

Moreover conditional on observed covariates the candidateinstrument is independent of the unobserved covariate

Ziq mdashi

mdash Xi1 (19b)

though mdashi and Di are not conditionally independent The CEFfor Y0i is constrained to be nonnegative using an exponentialmodel and (19a)

E6Y0imdashDi1 Zi1 Xi1mdashi7 D exp4X 0

isbquo C mdashi5

D mdashuumli exp4X 0

isbquo51 (19c)

where we also assume that the unobservable covariate hasbeen de ned so that E6mdash uuml

imdashXi7 D 1 (this is a normalization

because we can de ne mdash D ƒ16ldquo ƒ ln4E6eldquo mdashX757 where ldquo isunrestricted)

Finally the conditional-on-X-and-mdash average treatmenteffect is assumed to be proportionally constant again using anexponential model that ensures nonnegative tted values

E6Y1imdashDi1Zi1Xi1mdashi7 D e E6Y0i

mdashDi1Zi1Xi1mdashi7

D e E6Y0imdashXi1 mdashi70 (19d)

Combining (19c) and (19d) we can write YiD exp4X 0

isbquo CDi

C mdashi5 C ˜i where E6˜imdashDi1 Zi1 Xi1 mdashi7 sup2 0 These

assumptions imply

E8exp4ƒX 0isbquo ƒ Di5Yi

ƒ 1mdashXi1Zi9 D 01 (20)

so (20) can be used for estimation provided Zi has an impacton Di The proportional average treatment effect in this modelis e ƒ 1 or approximately for small values of

Estimation based on (20) guarantees nonnegative ttedvalues without dropping zeros as a traditional log-linearregression model would The price for this is a constant-proportional-effects setup and the need for nonlinear estima-tion It is interesting to note however that with a binaryinstrument and no covariates (20) generates a simple closed-form solution for In the appendix I show that with-out covariates the proportional treatment effect in Mullahyrsquosmodel can be written

e ƒ1D E6YimdashZi

D17ƒE6YimdashZi

D07

ƒ8E641ƒDi5YimdashZi

D17ƒE641ƒDi5YimdashZi

D0790

(21)

In light of this simpli cation it seems worth asking if theright side of (21) has an interpretation that is not tied tothe constant-proportional-effects model To develop this inter-pretation let D0i and D1i denote potential treatment assign-ments indexed against the binary instrument For example an

assignment mechanism such as (15) determines D0i as fol-lows D0i

D 16ƒ0 gt Daggeri7 and D1iD 16ƒ0 C ƒ1 gt Daggeri7 Using this

notation Imbens and Angrist (1994) showed that

E6YimdashZi

D 17 ƒ E6YimdashZi

D 07 D E6Y1iƒ Y0i

mdashD1i gt D0i7

8E6DimdashZi

D 17ƒ E6DimdashZi

D 0790 (22)

The term E6Y1iƒ Y0i

mdashD1i gt D0i7 is the LATE parameter men-tioned in Lemma 1

The same argument used to establish (22) can also be usedto show a similar result for the average of Y0i [ie insteadof the average of Y1i

ƒ Y0i see Abadie (2000a) for details] Inparticular

E641 ƒ Di5YimdashZi

D 17 ƒ E641 ƒ Di5YimdashZi

D 07

D ƒE6Y0imdashD1i gt D0i7

8E6DimdashZi

D 17 ƒ E6DimdashZi

D 0790 (23)

Substituting (22) and (23) for the numerator and denominatorin (21) we have

e ƒ 1 D E6Y1iƒ Y0i

mdashD1i gt D0i7=E6Y0imdashD1i gt D0i70 (24)

Thus Mullahyrsquos procedure estimates a proportional LATEparameter in models with no covariates The resulting esti-mates therefore have a causal interpretation under muchweaker assumptions than (19a)ndash(19d) Moreover the exponen-tial model used for covariates in (19c) seems natural for non-negative dependent variables and has a semiparametric avorsimilar to proportional hazard models for duration data

42 Approximating Causal Models

Suppose that the additive constant-effects assump-tions (17a) and (17b) do not hold but we estimate (18) by2SLS anyway It seems reasonable to imagine that the result-ing 2SLS estimates can be interpreted as providing some sortof ldquobest linear approximationrdquo to an underlying nonlinearcausal relationship just as regression provides the best linearpredictor (BLP) for any CEF Perhaps surprisingly however2SLS does not provide this sort of linear approximation ingeneral On the other hand in a recent article Abadie (2000b)introduced a Causal-IV estimator that does have this property

Causal-IV is based on the assumptions used by Imbens andAngrist (1994) to estimate average treatment effects Underthese assumptions it can be shown that treatment is indepen-dent of potential outcomes conditional on being in the groupwhose treatment status is affected by the instrument (ie thosewith D1i gt D0i the group of ldquocompliersrdquo mentioned earlier)This independence can be expressed as

Y0i1 Y1iq Di

mdashXi1D1i gt D0i0 (25)

A consequence of (25) is that for compliers comparisons bytreatment status have a causal interpretation

E6YimdashXi1Di

D 11D1i gt D0i7 ƒ E6YimdashXi1Di

D 01 D1i gt D0i7

D E6Y1iƒ Y0i

mdashXi1D1i gt D0i70

10 Journal of Business amp Economic Statistics January 2001

For this reason Abadie (2000b) called E6YimdashXi1 Di1D1i gt D0i7

the Complier Causal Response Function (CCRF)Now consider choosing parameters b and a to minimize

E64E6YimdashXi1Di1 D1i gt D0i7ƒX 0

i b ƒ aDi52mdashD1i gt D07 or equiv-

alently E64YiƒX 0

ib ƒ aDi52mdashD1i gt D07 This choice of b and a

provides the minimum mean squared error (MMSE) approx-imation to the CCRF Since the set of compliers is not iden-ti ed this minimization problem is not feasible as writtenHowever it can be shown that

E6Ši4E6YimdashXi1 Di1D1i gt D0i7 ƒ X 0

ib ƒ aDi527=P6D1i gt D0i7

D E64E6YimdashXi1Di1 D1i gt D0i7ƒ X 0

ib ƒ aDi52mdashD1i gt D071

where ŠiD 1 ƒ Di41 ƒ Zi5=41 ƒ E6Zi

mdashXi75 ƒ 41 ƒ Di5 šZi=E6Zi

mdashXi7 Since Ši can be estimated the MMSE linearapproximation to the CCRF can also be estimated The resultis a weighted least squares estimation problem with weightsgiven by the estimated Ši

It is also worth noting that although the preceding discus-sion focuses on linear approximation of the CCRF any func-tion can be used for the approximation For binary outcomesfor example we might use ecirc6X 0

ib C aDi7 and choose param-eters to minimize E6Ši4Yi

ƒ ecirc6X 0ib ƒ aDi75

27 Similarly fornonnegative outcomes it seems sensible to use an exponentialmodel exp6X 0

ib C aDi7 and choose parameters to minimizeE6Ši4Yi

ƒexp6Xi0b ƒaDi75

27 Abadiersquos framework allows ex-ible approximation of the CCRF using any functional formthe researcher nds appealing and convenient The resultingestimates have a robust causal interpretation regardless of theshape of the actual CEF for potential outcomes

43 Distribution and Quantile Treatment Effects

If Yi has a mass point at 0 the conditional mean pro-vides an incomplete picture of the causal impact of Di on Yi We might like to know for example how much of the aver-age effect is due to changes in participation and how muchinvolves changes elsewhere in the distribution This sometimesmotivates separate analyses of participation and conditional-on-positive effects In Section 23 I argued that questionsregarding the effect of treatment on the distribution of out-comes are better addressed by comparing distributions Simplycomparing distributions is ne for the analysis of experimentaldata but what if covariates are involved As with the analy-sis of mean outcomes the simplest strategy is 2SLS in thiscase using linear probability models for distribution ordinates16Yi micro c7 D Xi

0sbquocC cDi

C ˜ci Of course the linear model isnot literally correct for conditional distribution except in spe-cial cases (eg a saturated regression parameterization)

Here too the Abadie (2000b) weighting scheme can be usedto generate estimates that provide an MMSE error approxima-tion to the underlying distribution function (see Imbens andRubin 1997 for a related approach to this problem) The esti-mator in this case chooses bc and ac to minimize the sampleanalog of the following population minimand

E6Ši416Yi micro c7 ƒ X 0ibc

ƒ acDi5270 (26)

The resulting estimates provide the BLP for P6Yi microcmdashXi1Di1D1i gt D0i7 The latter quantity has a causal interpre-tation because

P6Yi micro cmdashXi1DiD 11D1i gt D0i7

ƒ P6Yi micro cmdashXi1 DiD 01 D1i gt D0i7

D P6Y1i micro cmdashXi1D1i gt D0i7

ƒ P6Y0i micro cmdashXi1D1i gt D0i70

Since the outcome is binary nonlinear models such asprobit or logit might also be used to approximate P6Yi microcmdashXi1Di1D1i gt D0i7 Likewise it is equally straightforward touse Abadiersquos weighting scheme to approximate the probabilitythat the outcome falls into an interval instead of the cumula-tive distribution function

An alternative to estimation based on (26) postulates a linearmodel for quantiles instead of distribution ordinates Conven-tional quantile regression (QR) models for exogenous regres-sors begin with a linear speci cation Qˆ6Yi

mdashXi1Di7 D X 0iŒˆ0

CŒˆ1Di The parameters 4Œˆ01Œˆ15 can be shown to minimizeE6ˆ4Yi

ƒ X 0im0 ƒ m1Di057 where ˆ4ui5 D ˆuC

iC 41ƒˆ5uƒ

i iscalled the ldquocheck functionrdquo (see Koenker and Basset 1978)This minimization is computationally straightforward since itcan be written as a linear programming problem

The analysis of quantiles has two advantages Firstquantiles like the median quartiles and deciles providebenchmarks that can be used to summarize and compareconditional distributions for different outcomes In contrastthe choice of c for the analysis of distribution ordinatesis application-specic Second since nonnegative LDVrsquos areoften (virtually) continuously distributed away from any masspoints linear models are likely to be more accurate for condi-tional quantiles than for conditional probabilities [For quan-tiles close to the censoring point Powellrsquos (1986b) censoredQR model may be more appropriate]

Abadie Angrist and Imbens (1998) developed a QRestimator for models with a binary endogeneous regres-sor Their quantile treatment effects (QTE) procedure beginswith a linear model for conditional quantiles for compliersQˆ6Yi

mdashXi1Di1 D1i gt D0i7 D X 0isbquoˆ

CˆDi The coef cient ˆ hasa causal interpretation because Di is independent of potentialoutcomes conditional on Xi and the event D1i gt D0i There-fore ˆ

D Qˆ6Y1imdashXi1 D1i gt D0i7 ƒ Qˆ6Y0i

mdashXi1 D1i gt D0i7 Inother words ˆ is the difference in ˆ quantiles for compliers

QTE parameters are estimated by minimizing a sampleanalog of the following weighted check-function minimandE6Šiˆ4Yi

ƒ X 0ib ƒ aDi57 As with the Causal-IV estimators

weighting by Ši transforms the conventional QR minimandinto a problem for compliers only For computational rea-sons however it is useful to rewrite this as E6 QŠiˆ4Yi

ƒX 0ib ƒ

aDi057 where QŠiD E6Ši

mdashXi1 Di1 Yi7 It is possible to show thatE6Ši

mdashXi1Di1 Yi7 D P6D1i gt D0imdashXi1Di1 Yi7 gt 0 This modi ed

estimation problem has a linear programming representationsimilar to conventional quantile regression since the weightsare positive Thus QTE estimates can be computed usingexisting QR software though this approach requires rst-step

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 11

estimation of QŠi In the following example I use the fact that

QŠiD E6Ši

mdashXi1 Di1 Yi7 D 1 ƒ Di41ƒ E6ZimdashYi1Di1Xi75

41ƒ E6ZimdashXi75

ƒ 41 ƒ Di5E6ZimdashYi Di1Xi7

E6ZimdashXi7

(27)

and estimate E6ZimdashYi1Di1Xi7 and E6Zi

mdashXi7 with a probit rststep Since QŠi is theoretically supposed to be positive negativeestimates of QŠi are set to 0

5 APPLICATION LABORndashSUPPLYCONSEQUENCES OF A THIRD CHILD

The estimation uses a sample of roughly 250000 marriedwomen aged 21ndash35 with at least two children drawn from the1980 Census 5 le About 53 of the women in this sampleworked in 1979 Overall (ie including zeros) women in thesample worked about 17 hours per week This can be seen inthe rst column of Table 1 which reports descriptive statis-tics and repeats some of the OLS and 2SLS estimates fromAngrist and Evans (1998) Roughly 38 of women in thissample had a third child an event indicated by the variableMorekids The OLS estimates in column (2) show that womenwith Morekids D 1 were about 17 percentage points less likelyto have worked in 1979 and worked about 6 hours fewer perweek than women with Morekids D 0 The covariates in thisregression are age age at rst birth a dummy for male rst-born a dummy for male secondborn and Black Hispanic andother race indicators

Table 1 also reports estimates of average effects computedusing nonlinear models still treating Morekids as exogenousThese average effects are approximations to effects of treat-ment on the treated evaluated using derivatives to simplifycomputations A detailed description of the average-effectscalculations appears in the appendix Probit estimates of theaverage impact on employment shown in column (3) arealmost identical to the OLS estimates Similarly the tobit esti-mate of the average effect of Di on hours worked shown incolumn (4) is ƒ6001 remarkably close to the OLS estimateof ƒ6002 (Note however that the tobit coefcient is ƒ1107)Column (5) of Table 1 reports estimated average effects froma two-part model in which both parts of the model are linear

Table 1 Descriptive Statistics and Baseline Results

Morekids exogenous Morekids endogenous

Nonlinear models Reduced forms2SLS

Dependent Mean OLS Probit Tobit 2PM (Morekids) (Depvar) effectvariable (1) (2) (3) (4) (5) (6) (7) (8)

Employment 0528 ƒ0167 ƒ0166 mdash mdash 0627 ƒ0055 ƒ0088(0499) (0002) (0002) (0003) (0011) (0017)

Hours worked 1607 ƒ6002 mdash ƒ6001 ƒ5097 0627 ƒ2023 ƒ3055(1803) (0074) (0073) (0073) (0003) (0371) (0592)

NOTE The sample includes 254654 observations and is the same as that of Angrist and Evans (1998) The instrument is an indicator for multiple births The mean of the endogenousregressor is 381 The probability of a multiple birth is 008 The model includes as covariates age age at rst birth boy rst boy second and race indicators Standard deviations are shownin parentheses in column 1 Standard errors are shown in parentheses in other columns

The 2PM estimate is virtually identical to the tobit and OLSestimates

Roughly 810 of 1 of women in the extract had a twinsecond birth (multiple births are identi ed in the 1980 Cen-sus using age and quarter of birth) Reduced-form estimates ofthe effect of a twin birth are reported in columns (6) and (7)The reduced forms show that women who had a multiple birthwere 63 percentage points more likely to have had a third childthan women who had a singleton second birth Mothers oftwins were also 55 percentage points less likely to be work-ing (standard error D 001) and worked 22 fewer hours perweek (standard error D 037) The 2SLS estimates derived fromthese reduced forms reported in column 8 show an impactof about ƒ009 (standard error D 002) on employment rates andƒ306 (standard error D 06) on weekly hours These estimatesare just over half as large as the OLS estimates suggestingthat the latter exaggerate the causal effects of childbearingOf course the twins instrument is not perfect and the 2SLSestimates may also be biased For example twinning probabil-ities are slightly higher for certain demographic groups ButAngrist and Evans (1998) found that 2SLS estimates usingtwins instruments are largely insensitive to the inclusion ofcontrols for mothersrsquo personal characteristics

Two variations on linear models generate estimates iden-tical or almost identical to the conventional 2SLS estimatesThis can be seen in columns (2) and (3) of Table 2 whichreport 2SLS estimates of a 2PM for hours worked and Abadie(2000b) Causal-IV estimates of linear models for employmentand hours The Causal-IV estimates use probit to estimateE[Z|X] and plug this into the formula for Ši The second stepin Causal-IV estimation is a weighted least squares problemwith some negative weights Since use of negative weights isnonstandard for statistical packages (eg Stata does not cur-rently allow this) I used a MATLAB program available fromAlberto Abadie to compute the estimates The 2PM estimatesin Table 3 were constructed from 2SLS estimates of a linearprobability model for participation and 2SLS estimates of alinear model for hours worked conditional on working In prin-ciple the 2PM estimates do not have a causal interpretationbecause the instruments are not valid conditional on workingIn practice however 2SLS estimates of the 2PM differ littlefrom conventional 2SLS estimates

The estimates of nonlinear=nonstructural models for meaneffects are mostly similar to each other and to conventional

12 Journal of Business amp Economic Statistics January 2001

Table 2 Impact on Mean Outcomes (Morekids endogenous)

Linear models Nonlinear models Structural models

Causal-IV Causal-IV Causal-IV Bivar Endog Mills 2SLSDependent 2SLS 2PM linear Mullahy probit expon probit tobit ratio benchmarkvariable (1) (2) (3) (4) (5) (6) (7) (8) (9) (10)

A With covariates

Employment ƒ0088 mdash ƒ0089 mdash ƒ0088 mdash ƒ0124 mdash mdash ƒ0089(0017) (0017) (0016) (0016) (0017)

Hours worked ƒ3055 ƒ3054 ƒ3055 ƒ3082 mdash ƒ3021 mdash ƒ3081 ƒ4051 ƒ3060(0592) (0598) (0592) (0598) (0694) (0580) (0549) (0599)

B No covariates

Employment ƒ0084 mdash ƒ0084 mdash ƒ0084 mdash ƒ0086 mdash mdash ƒ0084(0017) (0017) (0017) (0017) (0018)

Hours worked ƒ3047 ƒ3037 ƒ3047 ƒ3010 mdash ƒ3012 mdash ƒ3035 ƒ3048 ƒ3052(0617) (0614) (0617) (0561) (0616) (0642) (0641) (0624)

NOTE Sample and covariates are the same as in Table 1 Results for nonlinear models are derivative-based approximations to effect on the treated Causal-IV estimates are based on aprocedure discussed by Abadie (2000b) Standard errors are shown in parentheses

2SLS estimates Results from nonlinear models are againreported as marginal effects that approximate average effectson the treated For example a causal probit model for employ-ment status generates an average effect of ƒ0088 identical (upto the reported accuracy) to the 2SLS estimate Causal-IV esti-mation of an exponential model for hours worked the resultof a procedure that minimizes E6 QŠi4Yi

ƒ exp6X0

i b ƒ aDi7527

generates an estimate of ƒ3021 This too differs little from theconventional 2SLS estimate of ƒ3055 Similarly the Mullahyestimate of ƒ3082 in column (4) is less than 8 larger thanconventional 2SLS in absolute value It is noteworthy how-ever that the Mullahy model generates results that changemarkedly (falling by about 20) when the covariates aredropped Since the covariates are not highly correlated withthe twins instrument this lack of robustness to the inclusionof covariates seems undesirable On the other hand with-

Table 3 Impact on the Distribution of Hours Worked

Distribution treatment effects

Exogenous Morekids Endogenous Morekids Quantile treatment effects

OLS Ordered Causal CausalRange (LPM) Probit probit 2SLS LPM probit Quantile QR QTE(mean) (1) (2) (3) (4) (5) (6) (value) (7) (8)

0 0167 0166 0147 0088 0089 0088 5 ƒ8092 ƒ5024(472) (0002) (0002) (0002) (0017) (0017) (0016) (8) (0186) (0686)1ndash10 0001 0001 0002 ƒ0001 ƒ0001 ƒ0002 6 ƒ1207 ƒ7098(046) (0001) (0001) (000003) (0007) (0007) (0006) (20) (0172) (0860)11ndash20 ƒ0015 ƒ0014 ƒ0011 0002 0002 0002 7 ƒ9054 ƒ6019(093) (0001) (0001) (00001) (0010) (0010) (0010) (35) (0184) (1007)21ndash30 ƒ0024 ƒ0022 ƒ0014 ƒ0004 ƒ0005 ƒ0006 75 ƒ6045 ƒ3060(075) (0001) (0001) (00002) (0009) (0009) (0010) (40) (0156) (1017)31ndash40 ƒ0119 ƒ0110 ƒ0097 ƒ0072 ƒ0072 ƒ0071 8 ƒ1000 0000(277) (0002) (0002) (0001) (0014) (0014) (0016) (40) (0286) (1009)41+ ƒ0009 ƒ0008 ƒ0023 ƒ0009 ƒ0009 ƒ0006 9 mdash mdash

(027) (0001) (0001) (00003) (0005) (0005) (0007) (40)

NOTE The table reports probability-model and (QTE) estimates of the impact of childbearing on the distribution of hours worked The sample and covariates are the same as in Panel A ofTable 2 Causal-IV estimates are based on a procedure discussed by Abadie (1999) Quantile treatments are based on a procedure discussed by Abadie et al (1998) Standard errors areshown in parentheses The standard errors in columns (7) and (8) are bootstrapped

out covariates the Mullahy estimates are close to those fromCausal-IV with an exponential model

The bivariate probit estimate of the effect of childbear-ing on employment status reported in column 7 is ƒ012roughly a third larger in absolute value than the conventional2SLS estimate Interestingly in another application Abadie(2000b) also found that bivariate probit estimates are largerthan Causal-IV The gap between bivariate probit and Causal-IV estimates of effects on labor supply appears to be a conse-quence of the probit model for exogenous covariates Withoutcovariates bivariate probit generates estimates that are veryclose to the results from the other estimators It should benoted however that bivariate probit is not really appropriatefor twins instruments because the probability Morekids D 1 isequal to 1 for twins The probit ML estimator does not existin this case Therefore to compute all of the estimates using a

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 13

probit rst-stage (bivariate probit endogenous tobit and Millsratio) I randomly recoded 1 of Di observations to 0 Inprinciple measurement error in a binary endogenous regressorbiases 2SLS estimates (see Kane Rouse and Staiger 1999)In this case however column (10) shows that 2SLS estimateswith the randomly recoded data differ little from 2SLS esti-mates using the original data

Column (8) of Table 2 reports estimates of a structuraltobit model with endogeneous regressors These estimateswere computed using a two-step estimator that approximatesthe MLE again with recoded data The two-step procedureadds a Mills-ratio type endogeneity correction to a censoredregression and then applies tobit to the censored regres-sion model with the correction term [The correction term islsquo˜6Di4ƒ i=ecirci5C 41ƒDi5 i=41ƒecirci57 where is the corre-lation between the latent error determining treatment assign-ment and the outcome residual lsquo˜ is the standard deviation ofthe outcome residual and i and ecirci are Normal density anddistribution functions evaluated at the probit rst-stage ttedvalues see Heckman and Robb (1985)]

As with the bivariate probit estimate the endogenous tobitestimate is somewhat larger in magnitude than conventional2SLS This may be because of the Mills-ratio procedure forcontrolling for the endogeneity of Di more than the tobitcorrection for nonnegative outcomes To see this note thatthe Mills-ratio estimate ignoring censoring reported in col-umn (9) is considerably larger than the corresponding conven-tional 2SLS estimates Interestingly both the probit and tobitstructural estimators generate results that are more sensitiveto the inclusion of covariates than any of the other estimatorsexcept Mullahyrsquos In fact Panel B of the table shows thatwithout covariates all estimation techniques give very simi-lar results This is not surprising since without covariates theparametric assumptions used in these models are weaker

The last set of results shows that childbearing is associ-ated with marked changes in the distribution of hours workedbeyond the changes in participation already seen This isapparent in the rst columns of Table 3 which report thedistribution of hours worked by interval along with linearprobability estimates of the relationship between childbearingand the probability of falling into each interval (these mod-els include the same covariates used for Table 2) The largestentry in column (1) is for the probability of working zerohours There is also a large negative effect on the probabilityof working 31ndash40 hours per week which shows that womenwho have a third child are much less likely to work full-time Once again probit average effects reported in column2 are almost indistinguishable from the corresponding OLSestimates Estimates from an ordered probit model reportedin column (3) differ from OLS somewhat more but still gen-erate a very similar pattern

Like the 2SLS estimates for average outcomes 2SLS esti-mates of linear probability models for the probability of fallinginto each interval show that models that treat childbearing asexogenous exaggerate the negative impact on labor supply2SLS estimates for the probability of working zero hours areidentical (by construction) to those for employment in Table 1The 2SLS estimates of the impact of childbearing on fulltimework are also considerably less than the corresponding OLSestimates

Nonstructural Causal-IV models treating Morekids asendogenous generate estimates very close to 2SLS estimatesfor effects on the probabilities of hours falling into each inter-val Columns (5) and (6) show that the results are also remark-ably insensitive to whether a linear or probit model is usedto approximate the distribution function The estimates againindicate that childbearing changes the distribution of hours byraising the probability of nonparticipation and by reducing theprobability of fulltime work

Finally the QTE estimator provides useful summary statis-tics for causal effects of childbearing on changes in the distri-bution of hours worked These estimates were computed as thesolution to a weighted quantile regression problem using (27)to construct weights and the reported standard errors are froma bootstrap Quantile regression estimates treating childbear-ing as exogenous show an estimated 9-hour decline in medianhours worked but the QTE estimator suggests that the causaleffect of childbearing on median hours worked is only about ve hours Estimates at higher quantiles are similarly reducedwhen childbearing is treated as endogenous

6 SUMMARY AND CONCLUSIONS

Structural parameters may be of theoretical interest but mustultimately be converted into causal effects if they are to beof use for policy evaluation or determining whether a trendassociation is causal The problem of estimating causal effectsfor LDVrsquos does not differ fundamentally from the analogousproblem for continuously distributed outcomes The key dif-ferences seem to me to be the increased likelihood of inter-est in distributional outcomes and the inherent nonlinearity ofCEFrsquos for LDVrsquos in models with covariates Without covari-ates conventional 2SLS estimates capture both distributionaleffects and effects on means Simple IV strategies devel-oped by Mullahy (1997) and Abadie (2000b) can be used toestimate average effects in nonlinear models with covariateswhile IV strategies for probability models and quantile regres-sion can be used to estimate effects on distributions

These approaches are illustrated here using twin birthsto estimate the labor-supply consequences of childbearingAlternative nonstructural approaches to the estimation ofcausal effects using twins instruments generate similar aver-age effects whether or not the model is nonlinear Structuralestimates tend to be somewhat larger than nonstructural esti-mates when exogenous covariates are included even thoughthe covariates are not strongly related to the twins instru-ment Since the structural models impose additional distribu-tional and functional form assumptions I see no reason toprefer them

Finally the various IV estimates of the effect of childbear-ing on the distribution of hours worked show that the impactof childbearing is characterized by substantially increased non-participation and by an almost equally large shift away fromfulltime work Estimates that treat childbearing as exogenousexaggerate the causal effect of childbearing on changes in dis-tribution as well as on average hours worked These ndingsappear in results from both probability models and quantilemodels with endogenous regressors Both types of modelsprovide an interesting look at the impact of childbearing on

14 Journal of Business amp Economic Statistics January 2001

the distribution of hours worked without the conceptual prob-lems inherent in conditional-on-positive comparisons

ACKNOWLEDGMENTS

I thank the editor and the session discussants for theircomments Special thanks go to Melissa Schettini for out-standing research assistance and Alberto Abadie for exten-sive comments and shared computer programs Thanks also goto Daron Acemoglu Bill Evans Bill Greene Guido ImbensJohn Mullahy Steve Pischke and seminar participants at theUniversity of Texas for helpful discussions and comments onan earlier draft Data and computer programs used here areavailable on request

APPENDIX

A1 Derivation of Equation (12)

Drop the i subscripts Note that

Y0 D 14Y uuml0 gt 05Y uuml

0

Y1D 14Y uuml

0C gt 054Y uuml

0C 5 D 14Y uuml

1 gt 05Y uuml1 0

Since D is independent of Y0 and Y1D 14Y uuml

0C gt 054Y uuml

0C

51D is independent of Y1 Conditional effects on the treatedare therefore the same as conditional effects without condi-tioning on treatment status

E6Y1ƒ Y0

mdashY1 gt 01 D D 17 D E6Y1ƒ Y0

mdashY1 gt 07

E6Y1ƒ Y0

mdashY0 gt 01 D D 17 D E6Y1ƒ Y0

mdashY0 gt 070

The averages on the right side are the causal effects of interestThe conditional expectations on the right side are evaluated asfollows

E6Y1mdashY1 gt 07 D E6Y uuml

1mdashY uuml

1 gt 07 D E6Y uuml0mdashY uuml

1 gt 07C 1 (A1)

E6Y0mdashY1 gt 07 D E6Y uuml0 14Y uuml

0 gt 05mdashY uuml1 gt 07

D E6Y uuml0

mdashY uuml1 gt 01 Y uuml

0 gt 07P4Y uuml0 gt 0mdashY uuml

1 gt 050

If 4 gt 05 2 Y uuml0 gt 0 ) Y uuml

1 gt 01 this is E6Y uuml0

mdashY uuml0 gt 07

P4Y uuml0 gt 0mdashY uuml

1 gt 050 If 4 lt 05 2 Y uuml1 gt 0 ) Y uuml

0 gt 01 so this is

E6Y uuml0

mdashY uuml1 gt 070 (A2)

So lt 0 ) E6Y1 ƒ Y0mdashY1 gt 07 D Similarly

E6Y1mdashY0 gt 07 D E614Y uuml

0C gt 054Y uuml

0C 5mdashY uuml

0 gt 07

D E6Y uuml0

mdashY uuml1 gt 01 Y uuml

0 gt 07P4Y uuml1 gt 0mdashY uuml

0 gt 05

C P Y uuml1 gt 0mdashY uuml

0 gt 0 0

If 4 gt 051 Y uuml0 gt 0 ) Y uuml

1 gt 01 so

P4Y uuml1 gt 0mdashY uuml

0 gt 05 D 1 (A3)

and E6Y uuml0

mdashY uuml1 gt 01 Y uuml

0 gt 07 D E6Y uuml0mdashY uuml

0 gt 070 Finally

E6Y0mdashY0 gt 07 D E6Y uuml0mdashY uuml

0 gt 07 (A4)

so gt 0 ) E6Y1ƒ Y0

mdashY0 gt 07 D

A2 Derivation of Equation (21)

Drop the i subscripts Note that eƒD D 641 ƒ D5 C Deƒ7Let sbquo uuml D eƒsbquo and uuml D eƒ Z is binary and there are nocovariates so we can now write (20) as

sbquo uuml E641ƒ D5Y mdashZ D 17C sbquo uuml uuml E6DY mdashZ D 17 ƒ 1 D 0 (A5)

and

sbquo uuml E641ƒ D5Y mdashZ D 07C sbquo uuml uuml E6DY mdashZ D 07 ƒ 1 D 00 (A6)

Divided (A5) by (A6) to get rid of sbquo uuml then solve for uuml Subtract 1 and rearrange to get Equation (21)

A3 Average Treatment Effects and Standard Errors forNonlinear Models

Average treatment effects were calculated with the aid ofderivative approximations so that all reported effects have theform ldquocoef cient times scaling factorrdquo

Probit Note that ecirc6X 0i sbquo C 7 ƒ ecirc6X 0

isbquo7 6X 0isbquo C Di7 š

1 so the average effect on the treated can be approximatedas 841=N15

Pi Di6X 0

isbquo C Di79 š 1 where N1 D Pi Di This

turns out to be accurate to three decimal places for the esti-mates in Table 1 Standard errors were calculated treating thescaling factor as nonrandom This follows the convention forreporting marginal effects in programs like Stata in practiceany correction for estimation of the scaling factor is likely tobe minor A similar approach was used for ordered probit

Tobit Tobit average treatment effects were approximatedusing a derivative formula that can be found for exam-ple in the work of Greene (1999) E6Yi

mdashXi1DiD 17 ƒ

E6YimdashXi1 Di

D 07 iexclE6YimdashXi1Di7=iexclD D ecirc6X 0

isbquo C Di7 š 0

Average effects on the treated can therefore be approximatedusing 841=N15

Pi Diecirc6X 0

isbquo C Di79 š 0 Standard errors wereagain calculated treating the scaling factor as nonrandom

2PM Let the part-1 coef cient be 1 and the part-2 coef -cient be 2 Since the model multiplies parts and both parts arelinear the average effect is approximated using derivatives as1E6Yi

mdashDiD 11 Yi gt 07 C 2P6Di

D 1mdashY gt 070 Standard errorswere calculated treating the scaling factors E6Y mdashDi

D 11 Yi gt 07

and P6DiD 1mdashY gt 07 as nonrandom and using the fact that

the estimates of 1 and 2 are uncorrelatedMullahy Note that E6Yi

mdashXi1Di1mdash uumli 7 D mdash uuml

i exp6X 0i sbquoC Di7

where this CEF has a causal interpretation Again usingderivatives we have mdashuuml

i exp6X 0isbquo C 7 ƒ mdashuuml

i exp6X 0isbquo7

mdash uumli exp6X 0

i sbquo C Di7 š The model is such that E6mdash uumlimdashXi7

equals 1 but E6mdash uumlimdashXi1Di7 is unrestricted I ignore this prob-

lem and approximate the average effect on the treated as841=N15

Pi Di exp6X 0

i sbquo C Di79 š Standard errors were cal-culated treating the scaling factor as nonrandom

Bivariate Probit This is the same as probit using param-eters from the latent index equation for outcomes

Endogenous Tobit This is the same as tobit but usingcoef cients and predicted probability positive from the modelwith the compound Mills-ratio term included

Mills Ratio Standard errors were calculated treating thecompound Mills-ratio term as known

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 15

Causal-IV (nonlinear) Average effects were calculated asdescribed previously for the probit and Mullahy (exponential)functional form The rst-stage estimates of E6Zi

mdashXi7 neededto construct Ši were estimated using probit Standard errorsfor were calculated using the same bootstrap proceduredescribed for QTE Abadie (2000b) also gave analytic formu-las that take account of the rst-step estimation of E6Zi

mdashXi7An assumption implicit in this scheme for reporting Causal-IV results is that it makes sense to convert conditional-on-Xeffects for compliers into overall average effects on the treated

A4 Computation of Quantile Treatment Effectsand Standard Errors

QTErsquos were computed by plugging rst-step estimatesof QŠi

D E6ŠimdashXi1Di1 Yi7 into a weighted quantile regression

calculation performed by Stata Nonnegative estimates ofE6Ši

mdashXi1 Di1 Yi7 were constructed by separately estimatingE6Zi

mdashXi7 and E6ZimdashYi1Di1Xi7 using probit and then trimming

In principle standard errors should take account of this rst-step estimation An additional complication is that the analyticstandard errors for QTE involve a conditional error density Inthis case I sidestepped messy analytic calculations by usinga bootstrap procedure that repeats both the rst-stage estima-tion of QŠi and the second-step estimation of the parameters ofinterest in 100 replicate samples of 2500 observations eachThe 100 replicate samples were sampled without replacementusing the Stata command bsample The reported standarderrors were calculated as 4N =N uuml 51=2bseN where bseN is thestandard deviation of the 100 replicate estimates N D 21500and N uuml is the full sample size

[Received October 1999 Revised July 2000]

REFERENCES

Abadie A (2000a) ldquoBootstrap Tests for the Effect of a Treatment on the Dis-tributions of an Outcome Variablerdquo Technical Working Paper 261 NationalBureau of Economic Research Cambridge MA

mdashmdashmdash (2000b) ldquoSemiparametric Estimation of Instrumental Variable Mod-els for Causal Effectsrdquo Technical Working Paper 260 National Bureau ofEconomic Research Cambridge MA

Abadie A Angrist J and Imbens G (1998) ldquo Instrumental VariablesEstimation of Quantile Treatment Effectsrdquo Working Paper 229 NationalBureau of Economic Research Cambridge MA

Amemiya T (1978) ldquoThe Estimation of a Simultaneous Equation General-ized Probit Modelrdquo Econometrica 46 1193ndash1205

mdashmdashmdash (1979) ldquoThe Estimation of a Simultaneous Equation Tobit ModelrdquoInternational Economic Review 20 169ndash181

Angrist J (1998) ldquoEstimating the Labor Market Impact of Voluntary MilitaryService Using Social Security Data on Military Applicationsrdquo Economet-rica 66 249ndash288

Angrist J and Evans W (1998) ldquoChildren and Their Parentsrdquo Labor SupplyEvidence From Exogenous Variation in Family Sizerdquo American EconomicReview 88 450ndash477

Angrist J and Imbens G (1991) ldquoSources of Identifying Information inEvaluation Modelsrdquo Technical Working Paper 117 National Bureau ofEconomic Research Cambridge MA

Angrist J Imbens G and Rubin B (1996) ldquo Identi cation of CausalEffects Using Instrumental Variablesrdquo Journal of the American StatisticalAssociation 91 444ndash455

Angrist J and Krueger A (1999) ldquoEmpirical Strategies in Labor Eco-nomicsrdquo in The Handbook of Labor Economics Volume IIIA edsO Ashenfelter and D Card Amsterdam North-Holland pp 1277ndash1366

Blundell R W and Smith R J (1989) ldquoEstimation in a Class of Simulta-neous Equation Limited Dependent Variable Modelsrdquo Review of EconomicStudies 56 37ndash58

Blundell R W and Powell J L (1999) ldquoEndogeneity in Single Index Mod-elsrdquo mimeo University College London Dept of Economics

Bronars S and Grogger J (1994) ldquoThe Economic Consequences of UnwedMotherhood Using Twin Births as a Natural Experimentrdquo American Eco-nomic Review 84 1141ndash1156

Chamberlain G (1986) ldquoAsymptotic Ef ciency in Semi-Parametric ModelsWith Censoringrdquo Journal of Econometrics 32 189ndash218

Cragg J G (1971) ldquoSome Statistical Models for Limited Dependent Vari-ables With Application to the Demand for Durable Goodsrdquo Econometrica39 829ndash844

Duan N Manning W G Jr Morris C N and Newhouse J P(1984) ldquoChoosing Between the Sample-Selection Model and the Multi-partModelrdquo Journal of Business amp Economic Statistics 2 283ndash289

Eichner M J McClellan M B and Wise D A (1997) ldquoHealth Expendi-ture Persistence and Feasibility of Medical Savings Accountsrdquo in Tax Pol-icy and the Economy 11 ed J Poterba Cambridge MA National Bureauof Economic Research pp 91ndash128

Evans W Farrelly M C and Montgomery E (1999) ldquoDo Work-place Smoking Bans Reduce Smokingrdquo American Economic Review 89728ndash747

Gangadharan J and Rosenbloom J L (1996) ldquoThe Effects of Child-bearingon Married Womenrsquos Labor Supply and Earnings Using Twin Births as aNatural Experimentrdquo Working Paper 5647 National Bureau of EconomicResearch Cambridge MA

Goldberger A S (1991) A Course in Econometrics Cambridge MAHarvard University Press

Greene W (1999) ldquoMarginal Effects in the Censored Regression ModelrdquoEconomics Letters 64 43ndash49

Gronau R (1974) ldquoWage ComparisonsmdashA Selectivity Biasrdquo Journal ofPolitical Economy 82 1119ndash1143

Hay J W Leu R and Rohrer P (1987) ldquoOrdinary Least Squares andSample-Selection Models of Health-care Demandrdquo Journal of Business ampEconomic Statistics 5 499ndash506

Hay J W and Olsen R J (1984) ldquoLet Them Eat Cake A Note on Com-paring Alternative Models of the Demand for Medical Carerdquo Journal ofBusiness amp Economic Statistics 2 279ndash282

Heckman J J (1974) ldquoShadow Prices Market Wages and Labor SupplyrdquoEconometrica 42 679ndash694

mdashmdashmdash (1978) ldquoDummy Endogenous Variables in a Simultaneous EquationsSystemrdquo Econometrica 46 931ndash960

mdashmdashmdash (1990) ldquoVarieties of Selection Biasrdquo American Economic Review 80313ndash318

Heckman J J and Robb R Jr (1985) ldquoAlternative Methods for Evaluatingthe Impact of Interventionsrdquo in Longitudinal Analysis of Labor MarketData (Econometric Society Monographs Series No 10) eds J Heckmanand B Singer Cambridge UK Cambridge University Press pp 156ndash246

Imbens G and Angrist J D (1994) ldquo Identi cation and Estimation of LocalAverage Treatment Effectsrdquo Econometrica 62 467ndash475

Imbens G and Rubin D B (1997) ldquoEstimating Outcome Distributions forCompliers in Instrumental Variables Modelsrdquo Review of Economic Studies64 555ndash574

Kane T J Rouse C E and Staiger D (1999) ldquoEstimating Returnsto Schooling When Schooling Is Misreportedrdquo Working Paper W7235National Bureau of Economic Research Cambridge MA

Kelejian H H (1971) ldquoTwo-Stage Least Squares and Econometric SystemsLinear in Parameters but Nonlinear in the Endogenous Variablesrdquo Journalof the American Statistical Association 66 373ndash374

Koenker R and Bassett G (1978) ldquoRegression Quantilesrdquo Econometrica46 33ndash50

Killingsworth M (1983) Labor Supply Cambridge UK Cambridge Uni-versity Press

Keane M P and Wolpin K (1997) ldquo Introduction to the JBES Special Issueon Structural Estimation in Applied Microeconomicsrdquo Journal of Businessamp Economic Statistics 15 111ndash114

Lee M J (1995) ldquoSemiparametric Estimation of Simultaneous EquationsWith Limited Dependent Variables A Case Study of Female Labor Sup-plyrdquo Journal of Applied Econometrics 10 187ndash200

mdashmdashmdash (1996) ldquoNonparametric Two-Stage Estimation of Simultaneous Equa-tions With Limited Endogenous Regressorsrdquo Econometric Theory 12305ndash330

Leung S F and Yu S (1996) ldquoOn the Choice Between Sample Selectionand Two-Part Modelsrdquo Journal of Econometrics 72 197ndash229

Lin T-F and Schmidt P (1984) ldquoA Test of the Tobit Speci cation Againstan Alternative Suggested by Craggrdquo Review of Economics and Statistics66 174ndash177

Maddala G S (1985) ldquoA Survey of the Literature on Selectivity Bias AsIt Pertains to Health Care Marketsrdquo Advances in Health Economics andHealth Service Research 6 3ndash18

16 Journal of Business amp Economic Statistics January 2001

REFERENCES

Manning W G Duan N and Rogers W H (1987) ldquoMonte Carlo Evidenceon the Choice Between Sample Selection and Two-part Modelsrdquo Journalof Econometrics 35 59ndash82

Manski C F (1975) ldquoMaximum Score Estimation of the Stochastic UtilityModelrdquo Journal of Econometrics 3 205ndash228

mdashmdashmdash (1996) ldquoLearning About Treatment Effects From Experiments WithRandom Assignment of Treatmentsrdquo Journal of Human Resources 31709ndash733

McDonald J F and Mof tt R A (1980) ldquoThe Uses of Tobit AnalysisrdquoThe Review of Economics and Statistics 62 318ndash321

Mof t R A (1999) ldquoNew Developments in Econometric Methods forLabor Market Analysisrdquo in The Handbook of Labor Economics Vol-ume IIIA eds O Ashenfelter and D Card Amsterdam North-Hollandpp 1367ndash1398

Mullahy J (1997) ldquo Instrumental-Variable Estimation of Count Data ModelsApplications to Models of Cigarette Smoking Behaviourrdquo Review of Eco-nomics and Statistics 11 586ndash593

mdashmdashmdash (1998) ldquoMuch Ado About Two Reconsidering Retransformation andthe Two-Part Model in Health Econometricsrdquo Journal of Health Eco-nomics 17 247ndash281

Newey W K (1985) ldquoSemiparametric Estimation of Limited DependentVariable Models With Endogenous Explanatory Variablesrdquo Annales deLrsquo Insee 5960 219ndash237

mdashmdashmdash (1986) ldquoLinear Instrumental Variable Estimation of Limited Depen-dent Variable Models With Endogenous Explanatory Variablesrdquo Journal ofEconometrics 32 127ndash141

mdashmdashmdash (1987) ldquoEf cient Estimation of Limited Dependent Variable ModelsWith Endogenous Explanatory Variablesrdquo Journal of Econometrics 36231ndash250

Powell J L (1986a) ldquoSymmetrically Trimmed Least Squares Estimation forTobit Modelsrdquo Econometrica 54 1435ndash1460

mdashmdashmdash (1986b) ldquoCensored Regression Quantilesrdquo Journal of Econometrics32 143ndash155

Rosenzweig M R and Wolpin K (1980) ldquoLife-Cycle Labor Supply andFertility Causal Inferences From Household Modelsrdquo Journal of PoliticalEconomy 88 328ndash348

Rubin D B (1974) ldquoEstimating Causal Effects of Treatments in Random-ized and Non-randomized Studiesrdquo Journal of Educational Psychology 66688ndash701

mdashmdashmdash (1977) ldquoAssignment to a Treatment Group on the Basis of a Covari-aterdquo Journal of Educational Statistics 2 1ndash26

Wooldridge J (1999) ldquoDistribution-Free Estimation of Some Nonlinear PanelData Modelsrdquo Journal of Econometrics 90 77ndash97

Comment Binary Regressors in NonlinearPanel-Data Models With Fixed EffectsJinyong Hahn

Department of Economics Brown University Providence RI 02912 ( jinyong hahnbrownedu)

Angrist notes in his abstract that ldquomuch of the dif cultywith limited dependent variables comes from a focus on struc-tural parameters such as index coef cients instead of causaleffects Once the object of estimation is taken to be the causaleffect of treatment several simple strategies are availablerdquo(p 2) I examine the consequence of such a perspective forinference in nonlinear panel-data models with xed effectsI argue that (a) the ldquodif cultyrdquo indeed disappears sometimesand (b) structure of treatment assignment plays a crucial rolefor his strategy to be successful

It is instructive to begin with a dif cult nonlinear panel-datamodel with xed effects Consider a panel probit model with xed effects

Pr6yitD 1mdashci1 xi11 xi27 D ecirc4ci

C ˆ xit51

i D 11 1 n3 t D 1120 (1)

It is assumed that yi1 and yi2 are independent of each othergiven 4ci1 xi11 xi25 Here ci denotes the unobserved xedeffects and xit denotes a binary treatment variable We willassume that 4xi11 xi25 D 40115 for all i The index coef cientof interest is ˆ Therefore traditional econometric analysiswould focus on whether ˆ is identi ed and or

pn-consistently

estimableEstimation of the index coef cient ˆ is dif cult for a num-

ber of reasons

1 So far no consistent estimator for ˆ has been developedfor probit with a nonparametric speci cation of the conditionaldistribution of ci given 4xi11 xi25

2 It is not even clear whether ˆ is identi ed or notManskirsquos (1987) identi cation result requires in nite supportfor xit which cannot be satis ed due to the binary nature ofxit

3 It is not clear whether the semiparametric informationbound for ˆ is positive or not It is quite possible that theinformation is actually 0 For example Chamberlain (1992)showed that the information for ˆ is 0 for models with timedummies

4 Chamberlainrsquos (1984) conditional maximum likelihoodestimator is applicable only to logit models

Now to assess whether changing the object of estimationsimpli es statistical analysis consider the average treatmenteffects which in this particular case can be easily shown tobe equal to

sbquo D E6yi2ƒ yi170 (2)

It is not dif cult to see that a simple estimator Osbquo D1=n

PniD14yi2

ƒ yi15 isp

n-consistent As Angrist argues afocus on average causal effects dramatically reduces the dif -culty of estimation The secret is that 4xi11 xi25 D 40115 for alli which effectively ensures that ci is independent of 4xi11 xi25Dif culty of the index estimation listed previously is because

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

Imbens Comment 17

the independence between 4xi11 xi25 and ci cannot be exploitedwithin the xed-effects framework Unfortunately the addi-tional information that 4xi11 xi25 and ci are independent of eachother does not reduce dif culty in estimating the index ˆ It isnot yet clear whether ˆ can be

pn-consistently estimated even

with the random-effects assumption when the random effectsare nonparametrically speci ed On the other hand estimatingsbquo becomes easier as a result of the constancy of the general-ized propensity score Pr 6xi11 xi2mdashci7 Presence of unobservedci in the model was rendered irrelevant due to the constancyof the generalized propensity score [See Imbens (1999) for adiscussion on the generalized propensity score]

It is interesting to note that in the panel probit model (1)estimation of sbquo is not necessarily simple unless the indexstructure is discarded altogether It is useful to note that thenew target parameter sbquo could be estimated consistently usingindex structure if consistent estimators of ˆ and not are givenHere not denotes the distribution of ci We may alternativelywrite (2) as

Z4ecirc4c C ˆ5 ƒ ecirc4c55 dnot4c51 (3)

which can in principle be estimated by using consistent esti-mators of ˆ and not Estimation of sbquo using the alternativecharacterization (3) requires consistent estimation of an addi-tional parameter not a parameter that was not given too muchattention in the past The problem is that not many consis-tent estimators of not are available It is not yet clear whether

the model satis es the primitive conditions for consistency ofthe nonparametric maximum likelihood estimator (NPMLE)as discussed by Heckman and Singer (1984) The dif culty inestimating the target parameter using the expression (3) whichis based on the index structure is in sharp contrast to the easeof the estimation strategy using the expression (2) for whichthe index structure is irrelevant

The preceding discussion suggests that the success ofAngristrsquos perspective critically hinges on the structure of treat-ment assignment and careful reexpression of the new targetparameter If the joint distribution of ci and 4xi11 xi25 is com-pletely unknown it is clear that changing the target parameterdoes not ease the dif culty of estimation Angristrsquos perspectivetherefore requires substantial effort in modeling such jointdistribution Whether such a modeling effort will be successfulin dealing with nonlinear panel problems remains to be seen

ADDITIONAL REFERENCES

Chamberlain G (1984) ldquoPanel Datardquo in Handbook of Econometricseds Z Griliches and M D Intriligator Amsterdam North-Hollandpp 1247ndash1318

mdashmdashmdash (1992) ldquoBinary Response Models for Panel Data Identi cation andInformationrdquo unpublished manuscript

Heckman J and Singer B (1984) ldquoA Method for Minimizing the Impactof Distributional Assumptions in Econometric Models for Duration DatardquoEconometrica 52 271ndash320

Imbens G (1999) ldquoThe Role of Propensity Score in Estimating Dose-Response Functionsrdquo Technical Working Paper 237 National Bureau ofEconomic Research Cambridge MA

Manski C (1987) ldquoSemiparametric Analysis of Random Effects Linear Mod-els from Binary Panel Datardquo Econometrica 55 357ndash362

CommentGuido W Imbens

Department of Economics University of California at Los Angeles Los Angeles CA 90095(imbenseconuclaedu)

It is a pleasure to comment on this article by JoshuaAngrist whose applications of instrumental-variables methods(Angrist 1989 Angrist and Krueger 1991) have been a sourceof inspiration for my own work in this area As with Angristrsquosprevious work on instrumental variables the current articleraises some controversial issues and makes a number of impor-tant points Here I offer some comments on three of themFirst I shall discuss the issues raised in Section 1 ldquoCausalEffects and Structural Parametersrdquo concerning the goals ofstatistical inference Angrist argues that many questions ofinterest are most easily formulated in terms of comparisonsbetween realized and potential outcomes the latter de nedas outcomes that would have been observed under alternativestates of nature I shall explore some of the implications of thisview for empirical practice and econometric theory SecondI shall offer some remarks on the role of economic theory inspeci cation and identi cation of econometric models againreinforcing Angristrsquos point regarding the importance of for-mulating the key assumptions in terms of potential outcomesThird I shall discuss some of the issues related to the limited

dependent nature of outcome variables for empirical practicein particular in the presence of covariates Partly motivatedby the widespread perception of fundamental dif culties inapplying instrumental-variables methods to data with limiteddependent outcome variables Angrist argues that standard lin-ear model techniques are generally applicable I agree withAngristrsquos position that most of these perceived problems areexaggerated but suggest that principled inference should nev-ertheless take account of the limited dependent nature of theoutcome variables and use nonlinear models

1 CAUSAL ESTIMANDS

In his textbook discussion of the difference betweenstructural and reduced-form estimates Goldberger (1997)wrote following Marshak (1953) that the ultimate goal of

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

18 Journal of Business amp Economic Statistics January 2001

econometrics is to provide predictions More speci cally inmy view the goal is to provide predictions of policy interven-tions Using both economic theory and data economists wishto inform policy discussions by providing predictions of statesof the world under different policy choices Based on com-parisons of such predictions policy makers can then chooseamong the different policies using some social welfare mea-sure as objective function (eg Heckman and Smith 1997)Angrist argues that such questions are most easily formulatedin terms of potential outcomes Here I want to elaborate onthat view

Consider as an example the problem faced by a policymaker contemplating a new tax in a market To evaluatethis policy the policy maker wishes to take into account theeffect of the tax on the quantity traded Economic theory sug-gests that this effect depends on the slope of the supply anddemand functions The rst step is therefore the estimation ofthese slopes and in the remainder of this discussion I shallfocus on this component of the policy-evaluation problem Inprinciple the policy maker may be interested in the entire dis-tribution of the quantity traded under various taxes Let usassume however that for purposes of evaluation of the poli-cies it is suf cient to know the average effect of the policyon the quantity traded If there are only two values for thepolicymdashfor example no tax or a taxmdashthe difference betweenthese two averages is the key quantity of interest FollowingRubin (1974) I will refer to this as the estimand

Note that the choice of estimand is distinct from the sta-tistical question of the speci cation of the model Often thestatistical model is speci ed in such a way that a single param-eter corresponds to the estimand For example in a structuralinterpretation of the linear regression model the coef cientscorrespond to the effect of changing the covariates by a singleunit Such one-to-one correspondence however is the excep-tion rather than the rule Wooldridge (1992) made this point inthe context of BoxndashCox regression models Such models areoften used when a linear representation for E6Y mdashX7 is inappro-priate The BoxndashCox regression model generalizes this linearform to E6Y 4lsaquo5mdashX7 D X 0sbquo where

Y 4lsaquo5 D(

4Y lsaquo ƒ 15=lsaquo lsaquo 6D 01

ln Y lsaquo D 00

Although consistent estimators for sbquo exist under these assump-tions Wooldridge stressed that because (a) the interpretationof sbquo changes with the value of lsaquo and (b) knowledge of sbquo andlsaquo is not suf cient for recovering E6Y mdashX7 there is no reasonfor economists to be interested in estimates of sbquo under theseassumptions In other words sbquo cannot be the sole focus ofthe researcher because the question it answers changes withthe value of nuisance parameters Wooldridge then suggestedan alternative speci cation that always allows the researcherto recover the conditional expectation E6Y mdashX7

In empirical work this distinction between the estimandand the parameters of the statistical model is consistent withthe now common practice of reporting estimates of averagederivatives in binary response models rather than reportingestimates of the logit or probit coef cients Unlike a linear

regression model there is no direct link from one of the coef- cients in the logit or probit model to average causal effectsand thus there is no intrinsic interest in such coef cients

This view is at odds however with a large part of the semi-parametric literature An exception is the work by Stoker (egStoker 1986) who focused on estimation of index coef cientsin settings where these are proportional to average derivativesand thus directly linked to changes in predictions Considerfor example the work on semiparametric estimation of binaryresponse models In this literature such models are esti-mated without making logistic or probit assumptions insteadonly making conditional mean or median assumptions in alatent index interpretation (eg Manski 1985) This literaturehowever has begged the question of why economists shouldbe interested in the coef cient estimates in these models inthe absence of a direct link between these coef cients andthe choice probabilities or their derivatives Similarly someof the models with xed effects in panel data with limiteddependent variables have focused on estimation of parametersthat in themselves do not allow for estimation of conditionalexpectations or their derivatives and thus do not allow for esti-mation of causal effects See Arellano and Honoreacute (in press)for a survey of many of these methods

2 IDENTIFICATION

After deciding on the estimand the next step is to makesubstantive assumptions on the process that generated the dataThis is where economic as opposed to statistical theory playsa key role Theoretical considerations may suggest that certainvariables have no direct causal effect on others because theydo not enter into agentsrsquo utility function nor do they affectthe constraints these agents face For example in some mar-kets it may be reasonable to postulate the existence of demandand supply function and assume that their intersection deter-mines observed prices and quantities In that case it may beargued that certain variablesmdashfor example weather conditionsin agricultural marketsmdashaffect supply at xed prices but notdemand because weather conditions do not affect utility ofthe buyers nor do they constrain their choices given pricesSimilarly theoretical considerations may suggest which vari-ables determine agentsrsquo fertility choices and which variablesare excluded from such choices as in the structural modelsdescribed in Section 12 of Angrist

For the purpose of considering such exclusion restrictionsas well as other assumptions it is important to formulate themin a way that economic theory can be brought to bear on themThis makes the formulation in terms of counterfactuals orpotential outcomes that Angrist advocates particularly appro-priate The potential outcomes describe outcomes in differ-ent environments and as such are the primitives of economicanalyses as well as choices under different sets of constraintswhich are the result of agents solving constrained optimizationproblems Since economic theory studies such optimizationproblems it is therefore well equipped to assess assumptionsformulated directly in terms of these potential outcomes Anexample of the formulation of the critical assumptions in termsof such potential outcomes is Angrist Imbens and Rubin(1996 AIR from here on) In contrast latent index modelsalthough under some conditions mathematically equivalent to

Imbens Comment 19

the potential outcome framework (eg Vytlacil 1999) formu-late the critical assumptions in terms of associations betweenobserved variables and unobserved residuals which appearsmore dif cult to contemplate [see Imbens (1997) for a dis-cussion of the confusion such formulations have caused in thestatistics literature]

It is rare that economic theory is speci c enough to deter-mine the exact value of the estimand More typical is thatthe theory is consistent with a range of values for the esti-mand Observations on agentsrsquo choices and outcomes may behelpful in narrowing down this range The econometricianrsquostask is to link the data to the estimand Typically a num-ber of additional assumptions are made at this stage Almostalways it is assumed that there is only limited dependenceor no dependence at all between choices made by differentagents and identi cation focuses on the link between the jointdistribution of the observables estimable in large samplesand the estimand Two possibilities arise at this stage Some-times the estimand can be expressed as a functional of thejoint distribution of the observables in which case the esti-mand is identi ed A leading example is where the estimandis the average treatment effect and theory suggests that assign-ment to treatment is random or at least random conditionalon a set of observed covariates (unconfounded assignmentselection on observables) Alternatively the assumptions sug-gested by economic theory do not allow for the direct linkbetween the distribution of observables and the estimand Inthat case the researcher faces some choices One option advo-cated in a series of papers by Manski (see for a general dis-cussion Manski 1995) is to estimate the range of values ofthe estimand consistent with the data given the substantiveassumptions Another option followed in the current article byAngrist is the local average-treatment-effect approach devel-oped by Imbens and Angrist (1994) to consider what aspectsof the estimand are identi ed given data and assumptions Ininstrumental-variables settings the population average treat-ment effect is often not identi ed but the average effect for aspeci c subpopulation may be In that case one may chooseto estimate the average treatment effect for this subpopula-tion and leave the extrapolation to the principal estimand tothe researcher possibly aided by theoretical considerations AsHeckman wrote ldquo It is a great virtue of the LATE parameterthat it makes the investigator stick to the data at hand and sep-arate out the aspects of an estimation that require out of sam-ple extrapolation or theorizing from aspects of an estimationthat are based on observable datardquo (Heckman 1999 p 832)

Let us consider the case studied by Angrist with its focuson the effect of having more than two children on labor sup-ply Angrist argues that the second birth being a multiple birth(eg twins) is a valid instrument for this effect In terms ofthe AIR formulation this requires a multiple birth to be asgood as randomly assigned and the absence of a systematicdirect effect on labor supply other than through its effect onthe number of children Such assumptions may be contro-versial For example fertility treatments may lead to a sys-tematic association between multiple births and choices madeby couples violating the rst assumption Even if we acceptthese assumptions however they only imply that the aver-age causal effect of more kids on labor supply is identi ed

for women who had a third child solely because their sec-ond birth was a multiple birth (compliers in the AIR termi-nology) In my view it is unlikely that this is the populationof primary interest Nevertheless it is the only subpopulationthe data are informative about in the sense of point identi ca-tion under the substantive assumptions and it would appear tooffer some guidance regarding the population average causaleffect to policy makers similar to the way in the medical worldresults from clinical trials in homogenous subpopulations areregarded as useful because they are viewed as indicative ofpopulation average causal effects

3 LIMITED DEPENDENT VARIABLES

Typically economic theory offers some guidance concern-ing the determinants of certain outcomes without specifyingthe exact form or strength of their relationship In that casestatistical modeling is required to complete the speci cationConsider the example Angrist studies with binary outcomebinary endogenous regressor a binary instrument and covari-ates Angrist suggests as one possible approach estimating theaverage treatment effect through a linear probability modelwith instrumenting for an endogenous regressor The bene tsof the linear probability approach stemming from the linear-ity and robustness against misspeci cation of the rst stageappear to me largely illusory At this point the statistical mod-eling is only intended to provide exible approximations to theunderlying conditional distributions This is a fundamentallydifferent role from that played by the substantive assumptionsthat are essential for identi cation Appeals to consistencyunder speci c parameterizations therefore appear irrelevantmdashin a larger sample one may well wish to use a more exiblespeci cation because less smoothing is required In additionto nding the alleged bene ts of the linear probability modelunpersuasive I nd its disadvantages troubling Within smallsubpopulations characterized by extreme values of the covari-ates the smoothing implicit in linear probability models islikely to lead to unattractive predictions compared to pre-dictions based on nonlinear models that respect the limited-dependent-variable nature of the outcomes

An alternative approach is followed in the study of the effectof u shots on hospitalization rates using randomized incen-tives for vaccination by Hirano Imbens Rubin and Zhou(2000 HIRZ from here on) Given their assumptions exten-sions of those made by AIR to the case with exogenous covari-ates there are three subpopulationsmdashcompliers (units whochange treatment status in response to a change in the valueof the instrument) always-takers (who always take the treat-ment irrespective of the value of the instrument) and never-takers (who never take the treatment irrespective of the valueof the instrument) H IRZ modeled the conditional distributionof these three ldquotypesrdquo conditional on covariates as a trinomialdistribution

Pr4TypeiD cmdashXi

D x5 D exp4x0ndashc5

1C exp4x0ndashc5C exp4x0ndasha51

Pr4TypeiD amdashXi

D x5 D exp4x0ndasha5

1C exp4x0ndashc5 C exp4x0ndasha51

20 Journal of Business amp Economic Statistics January 2001

and

Pr4TypeiD nmdashXi

D x5

D 1 ƒ Pr4TypeiD cmdashXi

D x5 ƒ Pr4TypeiD amdashXi

D x50

Now compare this setup to the selection models Angristdescribes in Section 3 In the selection models the equationdescribing the endogenous regressor is Di

D 18ƒ0C ƒ1Zi

Cƒ 0

2Xi gt Daggeri9 Suppose that the instrument is binary and that ƒ1 ispositive Then the two models are very similar with units withƒ0 Cƒ1 Cƒ 0

2Xi gt Daggeri in the selection model classi ed as always-takers in the potential outcome framework (because irrespec-tive of the value of the instrument Di

D 1 for such units) unitswith ƒ0

C ƒ 02Xi lt Daggeri classi ed as never-takers (because irre-

spective of the value of the instrument DiD 0 for such units)

and the units with ƒ0C ƒ 0

2Xi lt Daggeri lt ƒ0C ƒ1

C ƒ 02Xi classi ed

as compliersOne advantage of the trinomial model is that it easily gen-

eralizes to provide an arbitrarily good t to any conditionaltrinomial distribution by including higher-order terms andinteractions in the covariates If there are no substantive rea-sons to impose additional restrictions one should not imposethem implicitly in the speci cation of the statistical modelIn particular in the selection model it is not suf cient to addhigher-order terms to the covariate vector to provide an arbi-trarily good t to the trinomial distribution Such an approxi-mation would have to involve heteroscedasticity and other dis-tributional extensions that are not straightforward to implementin the selection model

Conditional on the individualrsquos type H IRZ speci ed theoutcome distributions given covariates as logistic regressionmodels Again the aim is to provide a exible approxima-tion to the conditional distribution in a manner that does notimpose any implicit restrictions Given that for a binomialdistribution the logistic regression model can be thought ofas providing a linear approximation to the log odds ratiothis choice is again an appealing one An alternative is theprobit model which also provides a good approximation Less

attractive here is the linear probability model since it requiresinequality restrictions on the parameters if the implicit esti-mates of the probabilities are to be bounded between 0 and 1

In cases with other limited dependent variables alternativenonlinear models may be appropriate For example if the out-comes are durations subject to censoring models speci ed interms of hazard functions (eg Lancaster 1979) may be con-venient for dealing with such data

ADDITIONAL REFERENCES

Angrist J (1989) ldquoLifetime Earnings and the Vietnam Era Draft LotteryEvidence From Social Security Administrative Recordsrdquo American Eco-nomic Review 80 313ndash335

Angrist J and Krueger A (1991) ldquoDoes Compulsory School AttendanceAffect Schooling and Earningsrdquo Quarterly Journal of Ecomonics 106979ndash1014

Arellano M and Honoreacute B (in press) ldquoPanel Datardquo in Handbook of Econo-metrics eds J Heckman and E Leamer Amsterdam Elsevier North-Holland

Goldberger A (1991) A Course in Econometrics Cambridge MACambridge University Press

Heckman J (1999) ldquo Instrumental Variables Response to Angrist andImbensrdquo Journal of Human Resources 34 828ndash837

Heckman J and Smith J (1997) ldquoEvaluating the Welfare Staterdquo in Econo-metrics and Economics in the 20th Century The Ragnar Frisch Centenaryed S Strom New York Cambridge University Press pp 214ndash318

Hirano K Imbens G Rubin D and Zhou A (2000) ldquoEstimating theEffect of Flu Shots in a Randomized Encouragement Designrdquo Biostatistics1 69ndash88

Imbens G (1997) Book Review of The Foundations of Econometric Analy-sis by D Hendry and M Morgan Journal of Applied Econometrtics 1291ndash94

Lancaster T (1979) ldquoEconometric Methods for the Analysis of DurationDatardquo Econometrica

Manski C (1985) ldquoSemiparametric Analysis of Discrete Responserdquo Journalof Econometrics 27 313ndash333

mdashmdashmdash (1995) Identi cation Problems in the Social Sciences CambridgeMA Harvard University Press

Marshak J (1953) ldquoEconomic Measurements for Policy and Predictionrdquoin Studies in Econometric Method eds W Hood and T KoopmansNew York Wiley pp 1ndash26

Stoker T (1986) ldquoConsistent Estimation of Scaled Coef cientsrdquo Economet-rica 54 1461ndash1481

Vytlacil E (1999) ldquo Independence Monotonicity and Latent Index ModelsAn Equivalency Resultrdquo unpublished manuscript University of ChicagoDept of Economics

Wooldridge J (1992) ldquoSome Alternatives to the BoxndashCox RegressionModelrdquo International Economic Review 33 935ndash955

CommentRobert A Mof tt

Department of Economics Johns Hopkins University Baltimore MD 21218 and National Bureau of EconomicResearch Cambridge MA (mofttjhuedu)

The Problem Although the article by Angrist rangesacross a number of issues much of the discussion and thearticle title suggests that the problem of concern is that instru-mental variables ( IV) cannot be used in one of three commonmodels Let the rst model be y D Csbquod CxbdquoChellip where y isan absolutely continuous variable but d is binary and wherex is independent of hellip but d is not Then sbquo can be consistentlyestimated with IV (Heckman and Robb 1985) Let the secondmodel be y uuml D C sbquod uuml C xbdquo C hellip where y uuml and d uuml are contin-

uous and where y D 14y uuml gt 05 and d D d uuml are the observedvariables The parameters of this model can likewise be esti-mated by IV with some auxiliary assumptions (Newey 1986see Blundell and Smith 1993 for a review of alternative meth-ods) But let the third model be y uuml D C sbquod C xbdquo C hellip where

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

Moftt Comment 21

again y D 14y uuml gt 05 but now d is binary The parameters ofthis model cannot be estimated by standard IV Many of theexamples in the Angrist article involve the censored regres-sion model rather than the binary choice model but this andthe other main points in the article apply to both

The Angrist Solution The Angrist solution to the problemposed by the third of these models where IV is not applica-ble is to declare sbquo an uninteresting parameter and not worthyof estimation [or in his words to ldquopuntrdquo (p 8)] This is remi-niscent of the solution to the Vietnam War suggested by some1960s commentators The United States should have simplydeclared victory quickly withdrawn and hoped that no onenoticed that they had in fact lost

If sbquo is uninteresting what is interesting according toAngrist His answer is that interest should center on leastsquares approximations with y as the dependent variable andd and x as regressors If endogeneity of d is a problem linearIV should be applied with linear IV given a LATE interpre-tation He provides an empirical illustration

I will comment on three questions raised by the Angristarticle (1) Is sbquo an interesting parameter (2) Is the Linear IVestimator using observed y and d which Imbens and Angrist(1994) relabeled the LATE an interesting quantity (3) Is theillustrative application given in the Angrist article interesting

Is sbquo an Interesting Parameter As Angrist notes manyof the issues raised by his solution have nothing to do withthe endogeneity of d Whether sbquo is an interesting parame-ter is one of these at least in part Given Angristrsquos prefer-ence for linear projections his position necessarily implies thatall nonlinear models of which the latent index model (L IM)is just one example are uninteresting (Angrist is not com-pletely consistent in his position because he does propose anonlinear model with an exponential conditional mean func-tion in one section of his article that model is subject to thesame objections that he subjects the LIM tomdasharbitrary non-linearities etc) Thus his position boils down to a preferencein the binary dependent-variable case for the linear proba-bility model (LPM) over the LIM and other nonlinear mod-els This issue has been discussed for many years even in asimultaneous-equations context [eg Heckman and MaCurdy(1985) which Angrist does not reference]

The dif culty with Angristrsquos polar position on this issuemdashthat the LPM is the only model of interestmdashis that it is funda-mentally untenable The LPM is attractive because it is easyto interpret providing parameter estimates that do not requiretransformation to learn the effects of a regressor on the meanof the dependent variable It has a role to play in empiricalwork in summarizing the data as regards the conditional meanfunction and for initial explorations of the data and virtu-ally all practitioners use it for this purpose But beyond thisit has no defense Either it is equivalent to the latent-variablemodel if the model is saturated or if the model is not satu-rated imposes functional form restrictions that may not holdand may t the data worse than the latent-variable model

Thus if the model is y D 14 C sbquod C hellip gt 05 where d is asingle dummy variable regressor independent of hellip the probit(say) estimates of and sbquo map one-to-one onto the leastsquares intercept and coef cient on d in the LPM and hencethere is no gain to estimating the model either way because the

estimators are equivalent The same point extends to the caseof a multinomial d If the model is y D 14 C sbquod C xbdquo C hellip gt

05 where x and bdquo are vectors of exogenous covariates andcoef cients respectively the LPM estimate of the effect ofd on E4y5 conditional on x may be demonstrably worsethan the probit estimator of the same effect even for theobject of estimating E4ymdashd1 x5 at points 8d1 x9 observed inthe data (Angrist falls back on the weak principle that leastsquares estimates provide best linear approximations but theissue is that the true model might be nonlinear) Nothingin Angristrsquos article provides evidence that the probitor anyother latent index formulation provides an inferior estimate of6E4ymdashd11 x5 ƒ E4ymdashd01 x57 compared to the LPM for the twoestimators smooth the joint response function with respect tod and x in different ways

Although the latent-variable model has no necessary claimfor superiority in the class of all nonlinear models its pop-ularity rests on its ability to generate a wide variety of non-linearities with a relatively parsimonious speci cation arisingfrom the convolution of the latent index with the cdf of hellipExpansion of the LPM to incorporate equivalent nonlinearitiesis cumbersome and inef cient

The possible inferiority of the LPM in capturing nonlinear-ities in nonsaturated models is also important for interpolationand extrapolation to points not in the observed data Gettingthe nonlinearities right in the observed data is important ininterpolation and extrapolation if the true model is nonlinearas the binary choice model necessarily is (since y is boundedby 0 and 1) Although Angrist at one point does mentionpredictionmdashstating that the linearized models he proposes area good ldquojumping-off point for any prediction exerciserdquo a state-ment without supportmdashhe is for the most part not interestedin prediction so much as summarizing the observed data Thisis ne but unfortunately prediction outside the observed his-torical data is the pervasive concern of the policy makers whoare the ultimate consumers of applied economic research forthey must make predictions of new policies in new environ-ments on a daily basis as part of their jobs The methods pro-posed in Angristrsquos article are therefore not very useful for theformulation of new policy

It should be noted that some make the argument that theoryand economic models must play a role in guiding the formu-lation of empirical relationships used for prediction outsidethe range of the observed data That position is not taken herebecause it is not necessary to establish the potential superior-ity of the L IM over the LPM in nonsaturated models and forprediction Occamrsquos razor makes it unnecessary

It should also be noted that these issues have nothing todo with causal effects because d is assumed exogenous Thusthe opposition that Angrist poses between L IMrsquos and ldquocausaleffectsrdquo models is a false one at least in general If a ldquocausaleffectrdquo is de ned as the true effect of a variable d on onlythe rst moment of a variable y (a rather restricted de nitionof causal effect) that causal effect can be derived from thelatent-variable model just as well The issue is instead merelynonlinearity of the function E4ymdashd1 x5

Angrist also states that the L IM requires ldquoconstant coef- cientrdquo and ldquodistributional assumptionsrdquo for identi cation

22 Journal of Business amp Economic Statistics January 2001

unlike the LPM Angrist does not make suf cient quali ca-tions to these assertions and they are indeed false for theL IM A random coef cient sbquo in the LIM is permissible with-out distributional assumption ( Ichimura and Thompson 1998)and the coef cients in the standard constant-coef cient L IMare identi ed semiparametrically (ie under unknown distri-bution of hellip) under the restriction that d and hellip are independentan assumption generally made for causal interpretation of theLPM model as well (Manski 1988 Horowitz 1993 Powell1994) In addition as already noted in a fully saturated modelthe LPM and the L IM are equivalent in any case

Is the LATE an Interesting Statistic The Angrist assertionthat LATE is the only interesting statistic if d is endogenousor perhaps the only statistic worth bothering about because itis the only one identi ed by the data has the same partialvalidity as his preference for the LPM over the LIM IV isone of the most popular techniques in applied econometricsand has a natural intuition indeed one that does not requirethe LATE interpretation per se It is one of the most usefultools in the applied economistrsquos kit The simple IV estimatordiscussed by Angrist makes minimal assumptions and givesminimal information back to the analyst as a result But tosay that it produces the only statistic of interest does not havedefense both because it is equivalent to an LIM that is fullysaturated and may t the data worse than an LIM if not satu-rated and because it implies that there is no value to makingadditional assumptions to obtain additional information

The limiting nature of the LATE statistic is again in itsinattention to nonlinearities interpolation and extrapolationThat nonlinearities can be important is as true in this caseas it was in the exogenous case just discussed when thereare additional x covariates and when the model is not satu-rated As for interpolation and extrapolation the LATE statis-tic 6 Ny4z D z15ƒ Ny4z D z057=6 Nd4z D z15ƒ Nd4z D z057 denotes theeffect on a change in z from value z0 to z1 on E4y5 scaledby the change in the E4d5 It does not have implications forthe effect on E4y5 of any other change in z or for a changein any other policy variable Thus it is not particularly usefulfor policy changes other than a change of z0 to z1 in the sameenvironment (ie conditioned on the same x) This stands incontrast to an L IM for E4dmdashx1 z5mdashor any parametric modelfor d for that mattermdashthat allows for the change from z0 toz1 to inform policy makers and others of the likely changesof other values of z and of other variables x In fact theLATE statistic which is not a parameter in the usual senseof the word can always be expressed as a nonlinear combi-nation of parameters of a latent-variable model but not viceversa hence the latter model is more general than the former[See Heckman and Vytlacil (1999) for a discussion of how toexpress the LATE in terms of an L IM see also Angrist andImbens (1999) and Heckman (1997 1999) for a discussion ofsome of these issues]

In addition Angristrsquos focus on the advantages of the LATEfor model-free estimation of the effect of a change in z fromz0 to z1 on y reveals a philosophical weakness in his posi-tion This is because the effect of a change in z from z0 toz1 on y can be estimated from the reduced form the struc-tural form is not needed It is not explained why the statistic

6E4ymdashz D z15 ƒ E4ymdashz D z057 is not the only quantity of inter-est The answer that most economists would give is to say thatstructural coef cients are of interest because they can be usedto extrapolate the effects of changes in d on y to other changesin z than from z0 to z1 in the same environment indeedthis is the basic rationale for structural estimation given inthe famous essay by Marschak [1953 see also Christ (1994)and Heckman (2000) for discussions Marschakrsquos argumentwas more subtle than this arguing that restrictions on struc-tural parameters are needed for out-of-sample prediction butthis translates into restrictions on the reduced form as well]However by ruling out the possibility of learning about anyof the effects of changes in z or d other than those inducedby a change in z from z0 to z1 Angrist removes the need todo structural estimation in the rst place Angrist implicitlyassumes that a structural coef cient of interest exists on thevariable d in the y equation and that that coef cient has mean-ing independent of a particular change in z but the rest of hisdiscussion contravenes that interest

A reduced-form research program which is where theAngrist position leads is of considerable value There is noth-ing wrong with a research program to collect a large body ofinformation on the effects of a wide variety of changes in dif-ferent policy variables 4z5 from particular values (z0) to otherparticular values (z1) each taking place in particular environ-ments (x) But if the research program stops there very littleuseful has been learned other than a collection of facts aboutthe effects of particular policies in particular environments

Is the Application in the Angrist Article Interesting TheAngrist article uses a recently popular exclusion restriction ornatural experiment to identify the effect of fertility on laborsupply and to illustrate his preferred methodsmdashnamely theuse of twins The use of twins as an exclusion restrictionstands in contrast to variations in a government policy or lawwhich are often used as exclusion restrictions both in recentwork on natural experiments as well as a much older litera-ture that uses cross-sectional and overtime variation in state-or country-speci c taxes and transfer rules to identify modelparameters However although it can be of interest to estimatereduced-form policy effects 6E4ymdashz D z15 ƒ E4ymdashz D z057 asjust discussed it is not so obvious that there is any interest inestimating the effects of having twins on the expected valueof y Creating twins is not a variable directly subject to policymanipulation

The dif culty in the use of twins arises for the same reasonalready discussedmdashnamely that the preference for model-freeestimation of policy effects leads to a lack of interest in thefunction E4dmdashz1 x5 and hence to a lack of interest in what canbe learned from a study of twins about the effects of someother more relevant policy variable that might be manipulatedA model for the E4dmdashz1 x5 is needed to make that connectionand hence to make a study of twins of any interest yet that isintentionally eschewed in the approach proposed by AngristOnce again this leads to an uninteresting and quite limitingset of exercises

Conclusions The set of methods laid out in the Angristarticle at least if a more relevant instrument were used yieldsa set of model-free statistics that are suitable for exploratorywork on a research question prior to the (possibly nonlinear)

Mullahy Comment 23

structural estimation and prediction that should be the ultimateobject of applied economics research

ACKNOWLEDGMENTS

This comment is a revision of remarks delivered at theJoint Statistical Meetings Baltimore August 1999 I thankCarl Christ James Heckman Joel Horowitz Michael KeaneThomas Mroz Geert Ridder and Edward Vyltacil for com-ments with the usual disclaimer that the views expressed hereshould not be taken as representing those of any of theseindividuals

ADDITIONAL REFERENCES

Angrist J and Imbens G (1999) Comment on ldquo Instrumental VariablesA Study of Implicit Behavioral Assumptions Used in Making Pro-gram Evaluationsrdquo by J J Heckman Journal of Human Resources 34283ndash827

Blundell R and Smith R (1993) ldquoSimultaneous Microeconometric ModelsWith Censored or Qualitative Dependent Variablesrdquo in Handbook of Statis-tics (Vol 1) eds G S Maddala C R Rao and H D Vinod AmsterdamNorth-Holland pp 117ndash141

Christ C (1994) ldquoThe Cowles Commissionrsquos Contributions to Econometricsat Chicago 1939ndash1955rdquo Journal of Economic Literature 32 30ndash59

Heckman J (1997) ldquoInstrumental Variables A Study of Implicit BehavioralAssumptions Used in Making Program Evaluationsrdquo Journal of HumanResources 32 441ndash462

mdashmdashmdash (1999) ldquo Instrumental Variables A Reply to Angrist and ImbensrdquoJournal of Human Resources 34 828ndash837

mdashmdashmdash (2000) ldquoCausal Parameters and Policy Analysis in Economics ATwentieth Century Retrospectiverdquo Quarterly Journal of Economics 11545ndash97

Heckman J and MaCurdy T (1985) ldquoA Simultaneous Equations LinearProbability Modelrdquo International Economic Review 18 21ndash37

Heckman J and Vytlacil E (1999) ldquoLocal Instrumental Variables andLatent Variable Models for Identifying and Bounding Treatment EffectsrdquoProceedings of the National Academy of Sciences 96 4730ndash4734

Horowitz J (1993) ldquoSemiparametric and Nonparametric Estimation of Quan-tal Response Modelsrdquo in Handbook of Statistics (Vol 11) eds G SMaddala C R Rao and H D Vinod Amsterdam North-Holland pp45ndash72

Ichimura H and Thompson T (1998) ldquoMaximum Likelihood Estimation ofa Binary Choice Model With Random Coef cients of Unknown Distribu-tionrdquo Journal of Econometrics 86 269ndash295

Manski C (1988) ldquoIdenti cation of Binary Choice Modelsrdquo Journal of theAmerican Statistical Association 83 729ndash738

Marschak J (1953) ldquoEconomic Measurements for Policy and Predictionrdquoin Studies in Econometric Method eds W Hood and T KoopmansNew York Wiley pp 1ndash26

Powell J (1994) ldquoEstimation of Semiparametric Modelsrdquo in Handbook ofEconometrics (Vol 4) eds R Engle and D McFadden Amsterdam North-Holland pp 2443ndash2521

CommentJohn Mullahy

Departments of Preventive Medicine and Economics University of Wisconsin Madison WI 53705 andNational Bureau of Economic Research Cambridge MA (jmullahyfacstaffwiscedu)

Articles bearing titles containing phrases like ldquoSimpleStrategies for Empirical Practicerdquo are often wolves masquerad-ing as sheep the methodologies they devise being far fromldquosimplerdquo and far removed from what most humble practition-ers perceive in the realm of ldquoempirical practicerdquo Not so hereJoshua Angrist has written a tight comprehensive article thatis stimulating and important yet also eminently useful

As typi es much of Angristrsquos work the main concern in thisarticle is on how to elicit interesting characterizations of causaleffects from microdata with the particular twist here being afocus on outcomes measured as ldquolimited dependent variablesrdquo(LDVrsquos) The main take-away message I glean from this arti-cle is that applied analysts working on causal effect or struc-tural analyses in LDV contextsmdashtraditionally vexing contextsinsofar as consistent estimation and inference are concernedmdashhave considerable grounds for optimism Angrist lays out andinterprets systematically a set of issues and methods that pro-vide practitioners with a variety of implementable strategiesthat might be brought to bear on such empirical problemsA corollary take-away message is that in some respects ldquothisstuff is not really as hard as wersquove tended to make itrdquo withAngrist demonstrating for instance the potential merits ofsimple linear instrumental variable ( IV) methods for estimat-ing causal effects in a variety of LDV contexts In no eventcan applied analysts escape the requirement of nding theo-retically sound instruments but Angrist exposits compellingly

a broad description of how such instruments might be used toelicit interesting causal inferences in LDV contexts

I have no real quibbles with any of the substance ofAngristrsquos arguments Rather I will devote my commentarymainly to amplifying and expanding several of the themes hedevelops throughout the article

1 FOCUS ON CAUSAL OR PARTIAL EFFECTSVERSUS FOCUS ON CONDITIONAL

EXPECTATIONS FUNCTIONS

It seems fair to suggest that much of applied microecono-metrics is concerned primarily with understanding the signsand magnitudes of quantities like iexclE6Y mdashX1 D7=iexcl4X1 D5 oriexclE6Y mdashX7=iexclX0 Yet much of the actual dirty work in under-taking causal analysis in LDV contexts seems to result fromdecisions to undertake analyses in settings where E6Y mdashX1D7

or E6Y mdashX7 are restricted to be positive without a priori restric-tions on parameter values [thus the tradition of using tobit-class conditional expectations functions (CEFrsquos) two-stepselection models exponential CEFrsquos etc]

As practitioners however we should pause to assesswhether speci cations akin to E6Y mdashX1D7 D exp4Xsbquo C D5 gt

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

24 Journal of Business amp Economic Statistics January 2001

0 are ultimately buying anything important with respect tothe rst-order questions being explored It is only com-mon sense that analysts should spend relatively more energyworking on understanding quantities of primary interest(eg iexclE6Y mdashX1D7=iexcl4X1 D5) and worrying relatively lessabout the formulation and implications of assumptions (egE6Y mdashX1 D7 gt 0) that may have little or no ultimate bearingon the particular questions being addressed This is espe-cially so when the latter effort (restricting via functional formchoice E6Y mdashX1 D7 gt 0) will tend to complicate rather thanfacilitate the former effort (understanding the partial or causaleffects) This is not to suggest that quantities like E6Y mdashX7 andE6Y mdashX1 D7 may not themselves be interesting to estimate (egmodeling conditional mean health care expenditures for use inforecasting future expenditure levels) but whether E6Y mdashX1D7

is parametrically and parsimoniously better approximated byexp4XsbquoC D5 or by XsbquoC D or by some other g4Xsbquo CD5

or h4X1D1 ƒ5 ismdasheven in LDV settingsmdashultimately an empir-ical matter to be assessed by goodness of t (eg via condi-tional moment tests)

I note this important angle somewhat sheepishly sincein both articles of mine that Angrist cites (Mullahy 19971998) there is considerable emphasis on the use of exponen-tial CEF speci cations as enforcers of positive conditionalmean ldquorequirementsrdquo In fact the main motivation for theearlier article was to nd a general approach to obtainingconsistent estimators of structural parameters in parametricsettings less distribution-bound than tobit-class models whenthe key requirement or side constraint is E6Y mdashX1 D7 gt 0 withthe resulting strategy a more or less brute-force method foraccomplishing this in a nonlinear IV setting Angristrsquos arti-cle takes this idea quite a bit further and demonstrates how(and with the addition of a further normalization) the struc-tural parameter estimation approach I discussed can be devel-oped into a model to analyze causal effects when the latter arecharacterized as ldquoproportional treatment effectsrdquo

Whether a proportional as opposed to an additive treatmenteffect is interesting (it may or may not be) remains to be seenin any particular application Wooldridgersquos recent work onaverage causal effects (ACErsquos Wooldridge 1999) would seempertinent in contexts in which the role of the X covariates ismore ldquointrusiverdquo than in the simple additivelinear treatment-effect setup As a general matter the extent to which infer-ences about causal effectsmdashhowever formulatedmdashhinge onthe inclusion of particular covariates as conditioning regres-sors is an interesting and potentially important angle thatAngristrsquos article begins to explore and is one that likely tomerit additional research For the particular issues at handthe very nature of proportional treatment effects brings intofocus explicitly the role of the X rsquos so how their correlationwith the treatment indicators 4D5 in uences inferences aboutthe causal impacts of the latter would seem to be a rst-orderconsideration

2 TWO-PART MODELS

I found Angristrsquos analysis of the interpretation of causaleffects in two-part models to be extremely illuminatingIn a reduced-form setting I argued (Mullahy 1998) that amid

all the concerns about zeros robustness outliers transforma-tion retransformation and such that have typically attendedmodeling efforts involving two-part models it would be sur-prising in a well-structured empirical investigation if therewere not stated or lurking concerns about partial effectsiexclE6Y mdashX7=iexclX andor CEFrsquos E6Y mdashX7 themselves It may be insome cases that such concerns are not articulated and it mayeven be in less well-structured problems that they are not obvi-ous to analysts themselves Yet for instance unless robustlyestimated sbquorsquos from a log-linear part-2 speci cation of a two-part model could inform a rst-order question about a par-tial effect or a CEF of what practical use were they likelyto be My rather modest and rather obvious argument wasjust that the concerns about the iexclE6Y mdashX7=iexclX or the E6Y mdashX7mdashif they are the reason the analysis is being conducted in the rst placemdashshould enjoy rst-order prominence in the estima-tion exercise and that concerns about zeros outliers and suchshould be relegated in some sense to second-order status Insome cases two-part models will serve nicely to address such rst-order concernsmdashat least in reduced-form settingsmdashbut inothers they will not

Angrist appears to have some sympathy with such argu-ments but more importantly he provides a valuable serviceto users of two-part models by unearthing one major limi-tation in the analysis of causal effects The problem is notwith part 1 (ie the logit probit or linear probability com-ponent) because part 1 of the two-part model falls within themain lines of Angristrsquos LDV analysis Rather the complicationarises with part 2 of the two-part model in which the additionalldquoY gt 0rdquo conditioning arises In the standard (reduced-form)two-part model quantities like E6Y mdashX1Y gt 07 are prominentand identi ed readily But in the counterfactual settings inwhich causal effects are manifested wherein the realized Y

arises from self-selection into or out of treatment such selec-tion effects introduce an ambiguity [Eq (12)] and therebyconfound the ability to glean causal effects from part-2 esti-mates (unless as Angrist notes censored regression methodsare used but Angrist also offers some compelling argumentsagainst blind reliance on censored regression approaches) Assuch if estimation of and inference about causal effects arethe primary analytical concerns in data settings with Y para 0and Pr4Y D 05 gt 01 analysts may be well advised to avoidtwo-part modeling strategies and pursue some of the moredirect linear and nonlinear estimation approaches discussedby Angrist in which one-part estimation approachesmdashzerosand allmdashyield estimates of causal effects that will be directlyinterpretable

3 BEYOND CEFrsquoS

One noteworthy feature of the methods Angrist discussesmdashwith the main results attributed to Abadie (1999)mdashare theirapplicability to estimation of causal effects for characteristicsof the conditional distribution rdquo4Y mdashX1D5 beyond just CEFrsquosEstimation of causal or treatment effects for conditional quan-tiles (QTErsquos) and conditional distribution function ordinates(ldquodistilesrdquo) Pr4Y lt cmdashX1D5 is demonstrated to t properlywithin the weighting strategies advanced by Abadie Since

Todd Comment 25

quantile and distile analyses may be more relevant in address-ing particular analytical questions or policy issues than CEFrsquosthe ability to extend the mechanisms of causal-effect esti-mation strategies in such directions should be welcomed byapplied researchers

I might suggest that Angristrsquos claim about an advantage ofquantile causal analysis over distile causal analysis being thatthe latter relies on application-specic values of ordinates (c)is perhaps oversold or underdeveloped In many interestingand policy-relevant applications focus on the application-speci c ordinates is fundamental What factors cause employ-ers to offer more than one health insurance plan to employeesWhat factors cause consumers of alcoholic beverages to drinkmore than two drinks per day What factors cause elderlyindividuals to utilize more than one prescription drug prod-uct What factors cause second earners to work more than17 hours per week Each of these concerns an importantreal-world causal question for which the application-specicordinatemdashwhether for institutional legal physiological orother reasonsmdashis a fundamental ldquopivot pointrdquo for the analysisAnalysis of such distile relationshipsmdashwhether in causal or inreduced-form settingsmdashis a potentially powerful method forinforming decision makers and it seems to me that its mer-its relative to quantile analyses would have to be judged on acase-by-case basis

4 CONCLUSIONS

Practitioners of applied microeconometrics will nd thatthe time invested in a careful reading of Angristrsquos article hassizable returns not merely because of the analytical and con-ceptual insights it offers but ultimately also because it sug-gests a variety of important practical innovations that appliedresearchers can exploit empirically to more clearly understandthe nature of causal relationships in their data Applied microresearchers in areas like health labor development public nance and such work commonly with data wherein (a) out-come measures are limitedmdashoften most importantly becausethey are nonnegative with mass points at 0mdashand (b) the rolesof conditioning covariates (X rsquos) are most conveniently summa-rized via regression methods Angristrsquos article provides appliedresearchers so endowed with a set of powerful tools for elic-iting causal inferences from such data Of course the punchline is familiar and inevitable All bets are off without theo-retically solid instruments But with such instruments in handthe strategies exposited by Angrist offer analysts a rich set ofperspectives on estimation

ADDITIONAL REFERENCE

Wooldridge J M (1999) ldquoEstimating Average Partial Effects Under Con-ditional Moment Independence Assumptionsrdquo mimeo Michigan StateUniversity Dept of Economics

CommentPetra Todd

Department of Economics University of Pennsylvania Philadelphia PA 19104 (petraathonasasupennedu)

This article considers ways of estimating the effect ofbinary endogenous regressors in models with limited depen-dent variables It questions the usefulness of conventionalestimation strategies aimed at recovering structural modelparameters and advocates the use of simple instrumental vari-ables ( IV) estimators as an alternative on the grounds thatthese estimators invoke weaker assumptions and often suf ceto answer questions of interest in empirical studies In theauthorrsquos view the main challenge facing empirical researchersis the problem of identi cation of ldquotreatment effectsrdquo throughIV Here I expand the discussion by considering two ques-tions (1) What are the limitations of the causal model asa paradigm for policy analysis (2) When simple estimatorsare suitable for answering a question of interest what are thetrade-offs that need to be considered in using them

1 WHEN IS THE ldquoCAUSAL MODELldquoA USEFULPARADIGM FOR POLICY ANALYSIS

The article gives the impression that most interesting ques-tions in economics can be answered within the context ofthe ldquocausal modelrdquo [variously attributed to Neyman (1923)Fisher (1935) Cox (1958) Roy (1951) and Rubin (1978)]The causal model is a very general framework that assumesthat there are potential outcome states associated with having

received some treatment and having received no treatment (orpossibly a different treatment) For each individual only oneof the potential outcome states is observed which leads to amissing-data problem in attempting to draw inferences aboutaspects of the treatment effect distribution

The language of potential outcomes is very general soalmost any economic problem could be formulated in theseterms However this does not mean that estimators proposedin the literature for the causal model are useful for all or evenmost economic problems A major limitation of the modelis that it assumes that the state ldquowith treatmentrdquo has beenobserved for at least a subset of people In medical trials orbiological experiments this assumption is probably reason-able but in economics we often are interested in evaluatingeffects of treatments that have not yet been implemented Forexample we might be interested in predicting the effect ofraising the age receiving Social Security bene ts to a new ageor the effect of introducing new term limits on welfare par-ticipation An implicit assumption needed to apply any of theidenti cation results described in this article is that both Y1

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

26 Journal of Business amp Economic Statistics January 2001

and Y0 are observed which would not be satis ed in manycases Thus the causal model and the estimators developedfor it are inadequate when it comes to any economic questioninvolving a state of the world that has not been observed

Despite these limitations treatment effect estimates aresometimes used to predict consequences of new treatments orof existing treatments for new populations When the policychange being considered is very close to changes observedin the data and is to be applied to similar individuals thenextrapolation may be reasonable But if the policy change is(a) of a magnitude outside the range of experience (b) alonga new dimension or (c) intended for a new population then itis an open question as to how reliable the estimates would beTreatment impact estimates are not intended to capture any-thing invariant to changing conditions The assumptions thatwould be required to justify extrapolations or generalizationsare often quite strong To some extent it is an illusion thatthese estimators allow for general kinds of inference underweak assumptions because the assumptions required to justifytheir application for particular purposes are often not madeexplicit

Angrist argues that the identi cation results for the localaverage treatment effect (LATE) estimator provide a ldquofounda-tion for credible causal inferencerdquo (p 5) and a ldquominimum con-troversy jumping-off point for any prediction exerciserdquo (p 5)But as discussed previously the causal model maintains sucha high level of generality that it offers little guidance for manykinds of problems It is important to recognize when and whennot the LATE estimator is likely to be useful This articleclaims that ldquoin practice estimates of LATE differ little fromestimates based on the stronger assumptions invoked to iden-tify effects on the entire treated populationrdquo (p 5) But suchan extrapolation is not justi ed under the theory and is surelyunlikely to hold in all circumstances As shown in a num-ber of works by various authors LATE provides the averageeffect of treatment for the group of ldquocompliersrdquo who are peo-ple induced by the instrument to receive treatment This groupdoes not except under very special circumstances correspondto the group of treated persons and may in fact representonly a small fraction of persons receiving treatment So notonly does LATE not suf ce for answering problems of the sortdescribed by (a) (b) and (c) it also does not generally pro-vide the average effect of treatment on the treatedmdashthe mostcommon parameter of interest in analyzing social programsAnother not-so-attractive feature of the estimator is that thepopulation of ldquocompliersrdquo is usually not identi able so wecannot say exactly whose treatment impact is being estimatedApplying the LATE parameter estimate to the full populationof treated persons would generally require ruling out certaintypes of heterogeneity in individual responses to treatmentas discussed by Heckman (1997) The explicit assumptionsthat could justify this type of extrapolation were shown byHeckman and Vytlacil (2000 in press)

2 ESTIMATORS SHOULD BE TAILOREDTO THEIR APPLICATION

A general theme throughout the article is that wheneverpossible researchers should restrain from imposing paramet-ric restrictions in estimation because the structure they impose

may be wrong resulting in biased estimates This view isexpressed for example in the discussion of the two-partmodel The article advocates the use of a two-part approachinstead of a conventional selection-model approach maintain-ing that the two-part approach is less likely to lead to biasedparameter estimates because it does not impose cross-equationrestrictions as a conventional tobit estimator would

How do researchers decide whether or not to impose restric-tions across estimating equations that are structurally linkedThe most agnostic approach would be to estimate both equa-tions completely nonparametrically but this approach is rarelypractical Nonparametric estimators are consistent under themost general conditions but in conventional size samples thestandard errors would be so large that the estimates would forpractical purposes be useless Researchers impose structurebecause they are willing to restrict the class of models underwhich their estimators are consistent in exchange for greaterprecision These ef ciency considerations would also apply inthe deciding whether or not to impose cross-equation restric-tions in estimating simultaneous-equations models If we areright about the structure there is an ef ciency gain fromimposing the restrictions If we are wrong imposing them maylead to bias When there are overidentifying restrictions it isat least possible to develop tests of the model speci cation thatcan serve as a guide in choosing a structural model that is sup-ported by the data This article presents a dichotomy betweenthe two-part model and a fully parametric traditional selec-tion speci cation but there are many other modeling choicesranging from fully nonparametric to semiparametric to fullyparametric In modest size samples ef ciency is an impor-tant consideration and a parametric model may be the mostappropriate one For very large samples fewer parametricapproaches are feasible Both the probability-of-working andthe hours-worked equations could for example be estimatedby a semiparametric method such as semiparametric leastsquares ( Ichimura 1993) Which estimation method is mostappropriate ultimately depends on how much data is avail-able and how many parameters are to be estimated It doesnot make sense to criticize the use of structured approaches infavor of more exible modeling approaches without regard tothe context in which they are being applied

3 ON THE GAP BETWEEN THEORY ANDEMPIRICAL PRACTICE

Finally another recurring emphasis in the article is on thevalue of shortcuts in empirical practice In the authorrsquos viewthere is a gap between the econometric theory and what isfeasible in practice and shortcuts and approximations help tobridge this gap To this end the article extols for examplethe virtues of ordinary least squares (OLS) as a way of approx-imating any unknown conditional expectation function with-out much concern as to whether a best linear approximation isa good approximation If the model has discrete regressors andis fully saturated it is well known that OLS is equivalent to anonparametric estimator But in other cases a linear approxi-mation could be highly biased and the bias will not go awayas the sample size gets large There are a variety of alterna-tive asymptotically unbiased estimators Along similar lines

Angrist Reply 27

the article suggests that in implementing a ldquoselection-on-observablesrdquo estimator it may be all right simply to ignore theweighting scheme Such claims seem to me fairly outrageousbecause failure to take into account weighting could lead tovery different estimates and would not provide a consistentestimator of the parameter of interest Heckman Ichimura andTodd (1998) discussed alternative estimators that are not dif -cult to implement and take into account the weighting scheme

Finally the appendix presents a variety of shortcuts forobtaining estimates and standard errorsmdashsome of which leadto inconsistent estimates When shortcuts lead to very crudeapproximations or inconsistent estimates I question their use-fulness I do think it unreasonable to expect that estimatesbased on simple IV estimators be accompanied by correctstandard errors that correctly take into account the in uence ofestimated parameters of estimated regressors I see little valuein reporting incorrect standard-errors estimates on the groundsthat it saves time in computation especially when there areseveral works long available in the literature that develop sim-ple ways of solving these kinds of problems (eg see Newey1984) Although there is surely a gap between the estimatorsdeveloped in the theoretic econometric literature and the esti-mators used in the empirical literature the gap is not as greatas this article supposes

ADDITIONAL REFERENCES

Cox D R (1958) The Planning of Experiments New York WileyFisher R A (1935) Design of Experiments London Oliver and BoydHeckman J J (1997) ldquo Instrumental Variables Evidence From Evaluating a

Job Training Programmerdquo Journal of Human Resources 32 441ndash462Heckman J Ichimura H and Todd P (1997) ldquoMatching as an Econometric

Evaluation Estimator Theory and Evidence on its Performance Appliedto the JTPA Program Part I Theory and Methodsrdquo Review of EconomicStudies 64 605ndash654

mdashmdashmdash (1998) ldquoMatching as an Econometric Evaluationrdquo Review of Eco-nomic Studies 65 261ndash294

Heckman J and Vytlacil E (2000) ldquoCausal Parameters Structural Equa-tions Treatment Effects and Randomized Evaluations of Social Programsrdquounpublished manuscript University of Chicago Dept of Economics

mdashmdashmdash (in press) ldquoLocal Instrumental Variablesrdquo in Nonlinear StatisticalInference Essays in Honor of Takeshi Amemiya eds C Hsiao KMorimune and J Powell Cambridge UK Cambridge University Press

Ichimura H (1993) ldquoSemiparametric Least Squares (SLS) and Weighted SLSEstimation of Single Index Modelsrdquo Journal of Econometrics 58 71ndash120

Neyman J (1923) ldquoStatistical Problems in Agricultural Experimentsrdquo Sup-plement to the Journal of the Royal Statistical Society 2 107ndash180

Newey W (1984) ldquoA Method of Moments Interpretation of Sequential Esti-matorsrdquo Economic Letters 14 201ndash206

Roy A (1951) ldquoSome Thoughts on the Distribution of Earningsrdquo OxfordEconomic Papers 3 135ndash146

Rubin D (1978) ldquoBaysian Inference for Causal Effects The Role of Ran-domizationrdquo The Annals of Statistics 6 34ndash58

ReplyJoshua D Angrist

I thank the discussants and the session organizer outgoingJBES editor Jeff Wooldridge who will be missed by associateeditors and JBES authors alike The comments in this sessionoffer a variety of viewpoints some sympathetic others morecritical I am pleased to be part of this stimulating and con-structive exchange and look forward to more interaction ofthis type in the future

Hahn My article starts by suggesting that a focus ontreatment effects simpli es the problem of causal inferencein limited dependent variable (LDV) models with endoge-nous regressors Endogeneityrsquos half-sister is the assumptionof unobserved individual effects since the omitted-variablesmotivation for xed effects in panel data is similar Hahnasks whether a shift in focus from index parameters to aver-age causal effects might also simplify the notoriously dif -cult problem of working with nonlinear models for binarypanel data (eg as in Card and Sullivan 1988) Hahn showsthat here too the causal-effects framework pays off demon-strating that average causal effects are easily estimated in a xed-effects probit model with a binary regressor even thoughthe probit coef cient is not A second noteworthy featureof Hahnrsquos comment is his observation that identi cation inthis framework turns on careful modeling of the relationshipbetween the assignment variable and the xed effects Tradi-tional econometric models are primarily concerned with thestochastic process generating outcomes A shift of attentiontoward modeling the assignment mechanism is consistent with

the notion that observational studies should try to mimic thesort of controlled comparisons generated by experiments

Imbens Imbens begins by endorsing the view that the sub-stantive goals of empirical research are better served by thepotential outcomescausal framework than by structural mod-eling Going one step further Imbens notes that the problemswith ldquomodel- rst econometricsrdquo are not limited to LDVrsquos Inthe same spirit he sensibly wonders (as did Wooldridge 1992)why so much attention has been devoted to estimating theBoxndashCox transformation model which does not actually iden-tify E6Y mdashX7 without a distributional assumption Imbens alsonicely articulates the rationale for looking at the twins exam-ple and notes that this is similar to the rationale for extrap-olating from clinical trials in narrow subpopulations And infact I believe the ldquotwins experimentrdquo is likely to have pre-dictive value in a variety of situations though any extrapo-lation will probably be improved by control for demographiccharacteristics

In spite of (or perhaps because of) our years of discussionand collaboration Imbens and I still do not agree on certainthings He feels strongly that inference with discrete outcomesshould use models and procedures that respect inherent non-

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

28 Journal of Business amp Economic Statistics January 2001

linearities The HIRZ procedure discussed in his comment isan intriguing way to do this and my article discusses othersI remain unconvinced however that nonlinearity is of majorsubstantive importance in empirical work with binary regres-sors The proof of the pudding I suppose would be to showthat H IRZ or some of the methods I discuss in the article leadto better predictions or decisions than the linear workhorse

Mof tt Mof t provides a valuable counterpointHe beginsby arguing that latent-index models have more predictivepower than reduced-form causal models Such claims seemhard to establish a priori though I do not doubt that in somecases this may be true But empirical strategies in which thebulk of the research effort goes into estimating index coef -cients may fail to answer the most basic question in empiricalwork What happened here Causal models and local aver-age treatment effect (LATE) answer this question while indexcoef cients alone are not suf cient for causal inference Iagree however that structural modeling can be useful forgoing beyond basic questions and I have used index and otherstructural models for this purpose Mof t is right howeverin suggesting that I nd the model-free analysis he refers toas ldquoexploratoryrdquo more interesting than structural extrapolationThis seems to me to be where the potential for real discoverylies

Mof tt asks tough but fair questions about the twins appli-cation He is correct to point out that twinning is not a manip-ulable policy instrument That is one reason I do not limit theanalysis to reduced-form effects of multiple births Yet to mymind the best evidence we have on the controversial questionof the consequences of out-of-wedlock childbearing comesfrom the twins experiment (Bronars and Grogger 1994) Andthe economic consequences of childbearing are of interest inmany other contexts As noted in the introduction to my arti-cle the twins experiment can help determine whether fertilityreductions are responsible for trend increases in female laborsupply Beyond this sort of question the twins results seemuseful for assessing the economic consequences of increasedabortion access in the United States the dissemination of con-traceptive implants in developing countries (eg see KaneFarr and Janowitz 1990) and various programs to encour-age or require implant use by welfare mothers (Haveman andWolfe 1993)

Finally it is fair to question the external validity of twinsestimates as Mof t does toward the end of his commentAngrist and Evans (1998) showed that the labor-supply con-sequences of twinning can be reconciled with estimates ofthe labor-supply consequences of childbearing resulting froma very different natural experiment induced by preferences formixed-sex sibling pairs This suggests that results from bothexperiments may be quite robust and may even be ldquostructuralrdquoin the sense that they tell us something general about the eco-nomic response to childbearing

Mullahy Mullahy brings a welcome health economistrsquosperspective to the discussion Health economists have longgrappled with the issues discussed in my article and asI hope I made clear I drew ideas and inspiration fromMullahyrsquos work on models with nonnegative dependent vari-ables Mullahyrsquos comment further highlights exactly the sort

of utilitarian trade-off that underlies the choice of empiricalstrategy On one hand we would like our models to be con-sistent with any functional form restrictions known to be trueOn the other attempts at this may complicate estimation andinference for little payoff Worst of all the technical complex-ity of nonlinear models seems to cause authors to ldquowax econo-metricrdquo an effort that may come at the expense of attention tosubstantive issues of importance (an example here is the longseries of papers discussing exactly how data from the RANDhealth-insurance experiment should be analyzed) Mullahyrsquosheretical discussions of ldquoone-part modelsrdquo here and in earlierwork represent a refreshing effort to keep the empirical trainon track Mullahyrsquos point that application-specic distributionordinates can provide more useful summary information thanquantiles is also clearly worth thinking about

Todd Todd provides a useful and clear overview of thecausal framework which she suggests is better suited formedical questions than economic questions The thrust of herargument is that there is too much heterogeneity in economicinterventions and outcomes for the causal approach to bevery useful for economists But my perception is that medicalresearch is even more explicitly concerned with heterogene-ity than economics This is evidenced by the many quali ca-tions regarding patient characteristics conditions and inter-actions in medical treatment protocols A difference betweenthe prevailing medical research paradigm and the traditionaleconometric perspective however is whether the question ofheterogeneous response is to be addressed by better theoryor more evidence Of course theory helps us decide whereto look for heterogeneity but as Rosenbaum (1999 p 301)put it ldquo If a treatment has substantially different effects in dif-ferent types of people then we need to discover this (ital-ics mine) Rosenbaum also noted that ldquorandomized clinicaltrials routinely look for heterogeneous treatment effects andthey do this without representative samples from nationalpopulationsrdquo

On the technical side Todd suggests I advocated use ofthe two-part model because it exibly approximates nonlinearfunctional forms for nonnegative outcomes Although I notedthe exibility the upshot of my discussion was that the two-part model is poorly suited to causal inference because of thedif culty of interpreting part 2 Finally Todd points out thefailure to allow for sampling variance in the scaling factorsused to generate effects for nonlinear models This is indeed ahard-to-defend shortcut though I shall take a stab at defenseby noting that randomness of the scaling factor which typi-cally estimates a sample average is likely to be a minor sourceof sampling uncertainty

ADDITIONAL REFERENCES

Card D E and Sullivan D (1988) ldquoEffects of Subsidized Training onMovements In and Out of Employmentrdquo Econometrica 56 497ndash530

Haveman R and Wolfe B (1993) ldquoChildrenrsquos Prospects and ChildrenrsquosPolicyrdquo Journal of Economic Perspectives 7 153ndash174

Kane T Farr G and Janowitz B (1990) ldquo Initial Acceptability of Con-traceptive Implants in Four Developing Countriesrdquo International FamilyPlanning Perspectives 16 49ndash54

Rosenbaum P R (1999) ldquoChoice as an Alternative to Control in Observa-tional Studiesrdquo Statistical Science 14 259ndash304

Wooldridge J (1992) ldquoSome Alternatives to the BoxndashCox RegressionModelrdquo International Economic Review 33 935ndash955

Page 8: New Estimation of Limited Dependent Variable Models With Dummy …piketty.pse.ens.fr/…ecoineg/articl/Angrist2001b.pdf · 2011. 12. 5. · sequences of changes in contraceptive technology

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 9

as before and let mdashi be an unobserved covariate correlatedwith Di and Y0i The fact that this covariate is unobserved isthe reason we need to instrument

Let Zi be a candidate instrument Conditional on observedand unobserved covariates both treatment status and theinstrument are assumed to be independent of potential out-comes

Y0iq 4Di1Zi5 mdash Xi1mdashi (19a)

Moreover conditional on observed covariates the candidateinstrument is independent of the unobserved covariate

Ziq mdashi

mdash Xi1 (19b)

though mdashi and Di are not conditionally independent The CEFfor Y0i is constrained to be nonnegative using an exponentialmodel and (19a)

E6Y0imdashDi1 Zi1 Xi1mdashi7 D exp4X 0

isbquo C mdashi5

D mdashuumli exp4X 0

isbquo51 (19c)

where we also assume that the unobservable covariate hasbeen de ned so that E6mdash uuml

imdashXi7 D 1 (this is a normalization

because we can de ne mdash D ƒ16ldquo ƒ ln4E6eldquo mdashX757 where ldquo isunrestricted)

Finally the conditional-on-X-and-mdash average treatmenteffect is assumed to be proportionally constant again using anexponential model that ensures nonnegative tted values

E6Y1imdashDi1Zi1Xi1mdashi7 D e E6Y0i

mdashDi1Zi1Xi1mdashi7

D e E6Y0imdashXi1 mdashi70 (19d)

Combining (19c) and (19d) we can write YiD exp4X 0

isbquo CDi

C mdashi5 C ˜i where E6˜imdashDi1 Zi1 Xi1 mdashi7 sup2 0 These

assumptions imply

E8exp4ƒX 0isbquo ƒ Di5Yi

ƒ 1mdashXi1Zi9 D 01 (20)

so (20) can be used for estimation provided Zi has an impacton Di The proportional average treatment effect in this modelis e ƒ 1 or approximately for small values of

Estimation based on (20) guarantees nonnegative ttedvalues without dropping zeros as a traditional log-linearregression model would The price for this is a constant-proportional-effects setup and the need for nonlinear estima-tion It is interesting to note however that with a binaryinstrument and no covariates (20) generates a simple closed-form solution for In the appendix I show that with-out covariates the proportional treatment effect in Mullahyrsquosmodel can be written

e ƒ1D E6YimdashZi

D17ƒE6YimdashZi

D07

ƒ8E641ƒDi5YimdashZi

D17ƒE641ƒDi5YimdashZi

D0790

(21)

In light of this simpli cation it seems worth asking if theright side of (21) has an interpretation that is not tied tothe constant-proportional-effects model To develop this inter-pretation let D0i and D1i denote potential treatment assign-ments indexed against the binary instrument For example an

assignment mechanism such as (15) determines D0i as fol-lows D0i

D 16ƒ0 gt Daggeri7 and D1iD 16ƒ0 C ƒ1 gt Daggeri7 Using this

notation Imbens and Angrist (1994) showed that

E6YimdashZi

D 17 ƒ E6YimdashZi

D 07 D E6Y1iƒ Y0i

mdashD1i gt D0i7

8E6DimdashZi

D 17ƒ E6DimdashZi

D 0790 (22)

The term E6Y1iƒ Y0i

mdashD1i gt D0i7 is the LATE parameter men-tioned in Lemma 1

The same argument used to establish (22) can also be usedto show a similar result for the average of Y0i [ie insteadof the average of Y1i

ƒ Y0i see Abadie (2000a) for details] Inparticular

E641 ƒ Di5YimdashZi

D 17 ƒ E641 ƒ Di5YimdashZi

D 07

D ƒE6Y0imdashD1i gt D0i7

8E6DimdashZi

D 17 ƒ E6DimdashZi

D 0790 (23)

Substituting (22) and (23) for the numerator and denominatorin (21) we have

e ƒ 1 D E6Y1iƒ Y0i

mdashD1i gt D0i7=E6Y0imdashD1i gt D0i70 (24)

Thus Mullahyrsquos procedure estimates a proportional LATEparameter in models with no covariates The resulting esti-mates therefore have a causal interpretation under muchweaker assumptions than (19a)ndash(19d) Moreover the exponen-tial model used for covariates in (19c) seems natural for non-negative dependent variables and has a semiparametric avorsimilar to proportional hazard models for duration data

42 Approximating Causal Models

Suppose that the additive constant-effects assump-tions (17a) and (17b) do not hold but we estimate (18) by2SLS anyway It seems reasonable to imagine that the result-ing 2SLS estimates can be interpreted as providing some sortof ldquobest linear approximationrdquo to an underlying nonlinearcausal relationship just as regression provides the best linearpredictor (BLP) for any CEF Perhaps surprisingly however2SLS does not provide this sort of linear approximation ingeneral On the other hand in a recent article Abadie (2000b)introduced a Causal-IV estimator that does have this property

Causal-IV is based on the assumptions used by Imbens andAngrist (1994) to estimate average treatment effects Underthese assumptions it can be shown that treatment is indepen-dent of potential outcomes conditional on being in the groupwhose treatment status is affected by the instrument (ie thosewith D1i gt D0i the group of ldquocompliersrdquo mentioned earlier)This independence can be expressed as

Y0i1 Y1iq Di

mdashXi1D1i gt D0i0 (25)

A consequence of (25) is that for compliers comparisons bytreatment status have a causal interpretation

E6YimdashXi1Di

D 11D1i gt D0i7 ƒ E6YimdashXi1Di

D 01 D1i gt D0i7

D E6Y1iƒ Y0i

mdashXi1D1i gt D0i70

10 Journal of Business amp Economic Statistics January 2001

For this reason Abadie (2000b) called E6YimdashXi1 Di1D1i gt D0i7

the Complier Causal Response Function (CCRF)Now consider choosing parameters b and a to minimize

E64E6YimdashXi1Di1 D1i gt D0i7ƒX 0

i b ƒ aDi52mdashD1i gt D07 or equiv-

alently E64YiƒX 0

ib ƒ aDi52mdashD1i gt D07 This choice of b and a

provides the minimum mean squared error (MMSE) approx-imation to the CCRF Since the set of compliers is not iden-ti ed this minimization problem is not feasible as writtenHowever it can be shown that

E6Ši4E6YimdashXi1 Di1D1i gt D0i7 ƒ X 0

ib ƒ aDi527=P6D1i gt D0i7

D E64E6YimdashXi1Di1 D1i gt D0i7ƒ X 0

ib ƒ aDi52mdashD1i gt D071

where ŠiD 1 ƒ Di41 ƒ Zi5=41 ƒ E6Zi

mdashXi75 ƒ 41 ƒ Di5 šZi=E6Zi

mdashXi7 Since Ši can be estimated the MMSE linearapproximation to the CCRF can also be estimated The resultis a weighted least squares estimation problem with weightsgiven by the estimated Ši

It is also worth noting that although the preceding discus-sion focuses on linear approximation of the CCRF any func-tion can be used for the approximation For binary outcomesfor example we might use ecirc6X 0

ib C aDi7 and choose param-eters to minimize E6Ši4Yi

ƒ ecirc6X 0ib ƒ aDi75

27 Similarly fornonnegative outcomes it seems sensible to use an exponentialmodel exp6X 0

ib C aDi7 and choose parameters to minimizeE6Ši4Yi

ƒexp6Xi0b ƒaDi75

27 Abadiersquos framework allows ex-ible approximation of the CCRF using any functional formthe researcher nds appealing and convenient The resultingestimates have a robust causal interpretation regardless of theshape of the actual CEF for potential outcomes

43 Distribution and Quantile Treatment Effects

If Yi has a mass point at 0 the conditional mean pro-vides an incomplete picture of the causal impact of Di on Yi We might like to know for example how much of the aver-age effect is due to changes in participation and how muchinvolves changes elsewhere in the distribution This sometimesmotivates separate analyses of participation and conditional-on-positive effects In Section 23 I argued that questionsregarding the effect of treatment on the distribution of out-comes are better addressed by comparing distributions Simplycomparing distributions is ne for the analysis of experimentaldata but what if covariates are involved As with the analy-sis of mean outcomes the simplest strategy is 2SLS in thiscase using linear probability models for distribution ordinates16Yi micro c7 D Xi

0sbquocC cDi

C ˜ci Of course the linear model isnot literally correct for conditional distribution except in spe-cial cases (eg a saturated regression parameterization)

Here too the Abadie (2000b) weighting scheme can be usedto generate estimates that provide an MMSE error approxima-tion to the underlying distribution function (see Imbens andRubin 1997 for a related approach to this problem) The esti-mator in this case chooses bc and ac to minimize the sampleanalog of the following population minimand

E6Ši416Yi micro c7 ƒ X 0ibc

ƒ acDi5270 (26)

The resulting estimates provide the BLP for P6Yi microcmdashXi1Di1D1i gt D0i7 The latter quantity has a causal interpre-tation because

P6Yi micro cmdashXi1DiD 11D1i gt D0i7

ƒ P6Yi micro cmdashXi1 DiD 01 D1i gt D0i7

D P6Y1i micro cmdashXi1D1i gt D0i7

ƒ P6Y0i micro cmdashXi1D1i gt D0i70

Since the outcome is binary nonlinear models such asprobit or logit might also be used to approximate P6Yi microcmdashXi1Di1D1i gt D0i7 Likewise it is equally straightforward touse Abadiersquos weighting scheme to approximate the probabilitythat the outcome falls into an interval instead of the cumula-tive distribution function

An alternative to estimation based on (26) postulates a linearmodel for quantiles instead of distribution ordinates Conven-tional quantile regression (QR) models for exogenous regres-sors begin with a linear speci cation Qˆ6Yi

mdashXi1Di7 D X 0iŒˆ0

CŒˆ1Di The parameters 4Œˆ01Œˆ15 can be shown to minimizeE6ˆ4Yi

ƒ X 0im0 ƒ m1Di057 where ˆ4ui5 D ˆuC

iC 41ƒˆ5uƒ

i iscalled the ldquocheck functionrdquo (see Koenker and Basset 1978)This minimization is computationally straightforward since itcan be written as a linear programming problem

The analysis of quantiles has two advantages Firstquantiles like the median quartiles and deciles providebenchmarks that can be used to summarize and compareconditional distributions for different outcomes In contrastthe choice of c for the analysis of distribution ordinatesis application-specic Second since nonnegative LDVrsquos areoften (virtually) continuously distributed away from any masspoints linear models are likely to be more accurate for condi-tional quantiles than for conditional probabilities [For quan-tiles close to the censoring point Powellrsquos (1986b) censoredQR model may be more appropriate]

Abadie Angrist and Imbens (1998) developed a QRestimator for models with a binary endogeneous regres-sor Their quantile treatment effects (QTE) procedure beginswith a linear model for conditional quantiles for compliersQˆ6Yi

mdashXi1Di1 D1i gt D0i7 D X 0isbquoˆ

CˆDi The coef cient ˆ hasa causal interpretation because Di is independent of potentialoutcomes conditional on Xi and the event D1i gt D0i There-fore ˆ

D Qˆ6Y1imdashXi1 D1i gt D0i7 ƒ Qˆ6Y0i

mdashXi1 D1i gt D0i7 Inother words ˆ is the difference in ˆ quantiles for compliers

QTE parameters are estimated by minimizing a sampleanalog of the following weighted check-function minimandE6Šiˆ4Yi

ƒ X 0ib ƒ aDi57 As with the Causal-IV estimators

weighting by Ši transforms the conventional QR minimandinto a problem for compliers only For computational rea-sons however it is useful to rewrite this as E6 QŠiˆ4Yi

ƒX 0ib ƒ

aDi057 where QŠiD E6Ši

mdashXi1 Di1 Yi7 It is possible to show thatE6Ši

mdashXi1Di1 Yi7 D P6D1i gt D0imdashXi1Di1 Yi7 gt 0 This modi ed

estimation problem has a linear programming representationsimilar to conventional quantile regression since the weightsare positive Thus QTE estimates can be computed usingexisting QR software though this approach requires rst-step

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 11

estimation of QŠi In the following example I use the fact that

QŠiD E6Ši

mdashXi1 Di1 Yi7 D 1 ƒ Di41ƒ E6ZimdashYi1Di1Xi75

41ƒ E6ZimdashXi75

ƒ 41 ƒ Di5E6ZimdashYi Di1Xi7

E6ZimdashXi7

(27)

and estimate E6ZimdashYi1Di1Xi7 and E6Zi

mdashXi7 with a probit rststep Since QŠi is theoretically supposed to be positive negativeestimates of QŠi are set to 0

5 APPLICATION LABORndashSUPPLYCONSEQUENCES OF A THIRD CHILD

The estimation uses a sample of roughly 250000 marriedwomen aged 21ndash35 with at least two children drawn from the1980 Census 5 le About 53 of the women in this sampleworked in 1979 Overall (ie including zeros) women in thesample worked about 17 hours per week This can be seen inthe rst column of Table 1 which reports descriptive statis-tics and repeats some of the OLS and 2SLS estimates fromAngrist and Evans (1998) Roughly 38 of women in thissample had a third child an event indicated by the variableMorekids The OLS estimates in column (2) show that womenwith Morekids D 1 were about 17 percentage points less likelyto have worked in 1979 and worked about 6 hours fewer perweek than women with Morekids D 0 The covariates in thisregression are age age at rst birth a dummy for male rst-born a dummy for male secondborn and Black Hispanic andother race indicators

Table 1 also reports estimates of average effects computedusing nonlinear models still treating Morekids as exogenousThese average effects are approximations to effects of treat-ment on the treated evaluated using derivatives to simplifycomputations A detailed description of the average-effectscalculations appears in the appendix Probit estimates of theaverage impact on employment shown in column (3) arealmost identical to the OLS estimates Similarly the tobit esti-mate of the average effect of Di on hours worked shown incolumn (4) is ƒ6001 remarkably close to the OLS estimateof ƒ6002 (Note however that the tobit coefcient is ƒ1107)Column (5) of Table 1 reports estimated average effects froma two-part model in which both parts of the model are linear

Table 1 Descriptive Statistics and Baseline Results

Morekids exogenous Morekids endogenous

Nonlinear models Reduced forms2SLS

Dependent Mean OLS Probit Tobit 2PM (Morekids) (Depvar) effectvariable (1) (2) (3) (4) (5) (6) (7) (8)

Employment 0528 ƒ0167 ƒ0166 mdash mdash 0627 ƒ0055 ƒ0088(0499) (0002) (0002) (0003) (0011) (0017)

Hours worked 1607 ƒ6002 mdash ƒ6001 ƒ5097 0627 ƒ2023 ƒ3055(1803) (0074) (0073) (0073) (0003) (0371) (0592)

NOTE The sample includes 254654 observations and is the same as that of Angrist and Evans (1998) The instrument is an indicator for multiple births The mean of the endogenousregressor is 381 The probability of a multiple birth is 008 The model includes as covariates age age at rst birth boy rst boy second and race indicators Standard deviations are shownin parentheses in column 1 Standard errors are shown in parentheses in other columns

The 2PM estimate is virtually identical to the tobit and OLSestimates

Roughly 810 of 1 of women in the extract had a twinsecond birth (multiple births are identi ed in the 1980 Cen-sus using age and quarter of birth) Reduced-form estimates ofthe effect of a twin birth are reported in columns (6) and (7)The reduced forms show that women who had a multiple birthwere 63 percentage points more likely to have had a third childthan women who had a singleton second birth Mothers oftwins were also 55 percentage points less likely to be work-ing (standard error D 001) and worked 22 fewer hours perweek (standard error D 037) The 2SLS estimates derived fromthese reduced forms reported in column 8 show an impactof about ƒ009 (standard error D 002) on employment rates andƒ306 (standard error D 06) on weekly hours These estimatesare just over half as large as the OLS estimates suggestingthat the latter exaggerate the causal effects of childbearingOf course the twins instrument is not perfect and the 2SLSestimates may also be biased For example twinning probabil-ities are slightly higher for certain demographic groups ButAngrist and Evans (1998) found that 2SLS estimates usingtwins instruments are largely insensitive to the inclusion ofcontrols for mothersrsquo personal characteristics

Two variations on linear models generate estimates iden-tical or almost identical to the conventional 2SLS estimatesThis can be seen in columns (2) and (3) of Table 2 whichreport 2SLS estimates of a 2PM for hours worked and Abadie(2000b) Causal-IV estimates of linear models for employmentand hours The Causal-IV estimates use probit to estimateE[Z|X] and plug this into the formula for Ši The second stepin Causal-IV estimation is a weighted least squares problemwith some negative weights Since use of negative weights isnonstandard for statistical packages (eg Stata does not cur-rently allow this) I used a MATLAB program available fromAlberto Abadie to compute the estimates The 2PM estimatesin Table 3 were constructed from 2SLS estimates of a linearprobability model for participation and 2SLS estimates of alinear model for hours worked conditional on working In prin-ciple the 2PM estimates do not have a causal interpretationbecause the instruments are not valid conditional on workingIn practice however 2SLS estimates of the 2PM differ littlefrom conventional 2SLS estimates

The estimates of nonlinear=nonstructural models for meaneffects are mostly similar to each other and to conventional

12 Journal of Business amp Economic Statistics January 2001

Table 2 Impact on Mean Outcomes (Morekids endogenous)

Linear models Nonlinear models Structural models

Causal-IV Causal-IV Causal-IV Bivar Endog Mills 2SLSDependent 2SLS 2PM linear Mullahy probit expon probit tobit ratio benchmarkvariable (1) (2) (3) (4) (5) (6) (7) (8) (9) (10)

A With covariates

Employment ƒ0088 mdash ƒ0089 mdash ƒ0088 mdash ƒ0124 mdash mdash ƒ0089(0017) (0017) (0016) (0016) (0017)

Hours worked ƒ3055 ƒ3054 ƒ3055 ƒ3082 mdash ƒ3021 mdash ƒ3081 ƒ4051 ƒ3060(0592) (0598) (0592) (0598) (0694) (0580) (0549) (0599)

B No covariates

Employment ƒ0084 mdash ƒ0084 mdash ƒ0084 mdash ƒ0086 mdash mdash ƒ0084(0017) (0017) (0017) (0017) (0018)

Hours worked ƒ3047 ƒ3037 ƒ3047 ƒ3010 mdash ƒ3012 mdash ƒ3035 ƒ3048 ƒ3052(0617) (0614) (0617) (0561) (0616) (0642) (0641) (0624)

NOTE Sample and covariates are the same as in Table 1 Results for nonlinear models are derivative-based approximations to effect on the treated Causal-IV estimates are based on aprocedure discussed by Abadie (2000b) Standard errors are shown in parentheses

2SLS estimates Results from nonlinear models are againreported as marginal effects that approximate average effectson the treated For example a causal probit model for employ-ment status generates an average effect of ƒ0088 identical (upto the reported accuracy) to the 2SLS estimate Causal-IV esti-mation of an exponential model for hours worked the resultof a procedure that minimizes E6 QŠi4Yi

ƒ exp6X0

i b ƒ aDi7527

generates an estimate of ƒ3021 This too differs little from theconventional 2SLS estimate of ƒ3055 Similarly the Mullahyestimate of ƒ3082 in column (4) is less than 8 larger thanconventional 2SLS in absolute value It is noteworthy how-ever that the Mullahy model generates results that changemarkedly (falling by about 20) when the covariates aredropped Since the covariates are not highly correlated withthe twins instrument this lack of robustness to the inclusionof covariates seems undesirable On the other hand with-

Table 3 Impact on the Distribution of Hours Worked

Distribution treatment effects

Exogenous Morekids Endogenous Morekids Quantile treatment effects

OLS Ordered Causal CausalRange (LPM) Probit probit 2SLS LPM probit Quantile QR QTE(mean) (1) (2) (3) (4) (5) (6) (value) (7) (8)

0 0167 0166 0147 0088 0089 0088 5 ƒ8092 ƒ5024(472) (0002) (0002) (0002) (0017) (0017) (0016) (8) (0186) (0686)1ndash10 0001 0001 0002 ƒ0001 ƒ0001 ƒ0002 6 ƒ1207 ƒ7098(046) (0001) (0001) (000003) (0007) (0007) (0006) (20) (0172) (0860)11ndash20 ƒ0015 ƒ0014 ƒ0011 0002 0002 0002 7 ƒ9054 ƒ6019(093) (0001) (0001) (00001) (0010) (0010) (0010) (35) (0184) (1007)21ndash30 ƒ0024 ƒ0022 ƒ0014 ƒ0004 ƒ0005 ƒ0006 75 ƒ6045 ƒ3060(075) (0001) (0001) (00002) (0009) (0009) (0010) (40) (0156) (1017)31ndash40 ƒ0119 ƒ0110 ƒ0097 ƒ0072 ƒ0072 ƒ0071 8 ƒ1000 0000(277) (0002) (0002) (0001) (0014) (0014) (0016) (40) (0286) (1009)41+ ƒ0009 ƒ0008 ƒ0023 ƒ0009 ƒ0009 ƒ0006 9 mdash mdash

(027) (0001) (0001) (00003) (0005) (0005) (0007) (40)

NOTE The table reports probability-model and (QTE) estimates of the impact of childbearing on the distribution of hours worked The sample and covariates are the same as in Panel A ofTable 2 Causal-IV estimates are based on a procedure discussed by Abadie (1999) Quantile treatments are based on a procedure discussed by Abadie et al (1998) Standard errors areshown in parentheses The standard errors in columns (7) and (8) are bootstrapped

out covariates the Mullahy estimates are close to those fromCausal-IV with an exponential model

The bivariate probit estimate of the effect of childbear-ing on employment status reported in column 7 is ƒ012roughly a third larger in absolute value than the conventional2SLS estimate Interestingly in another application Abadie(2000b) also found that bivariate probit estimates are largerthan Causal-IV The gap between bivariate probit and Causal-IV estimates of effects on labor supply appears to be a conse-quence of the probit model for exogenous covariates Withoutcovariates bivariate probit generates estimates that are veryclose to the results from the other estimators It should benoted however that bivariate probit is not really appropriatefor twins instruments because the probability Morekids D 1 isequal to 1 for twins The probit ML estimator does not existin this case Therefore to compute all of the estimates using a

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 13

probit rst-stage (bivariate probit endogenous tobit and Millsratio) I randomly recoded 1 of Di observations to 0 Inprinciple measurement error in a binary endogenous regressorbiases 2SLS estimates (see Kane Rouse and Staiger 1999)In this case however column (10) shows that 2SLS estimateswith the randomly recoded data differ little from 2SLS esti-mates using the original data

Column (8) of Table 2 reports estimates of a structuraltobit model with endogeneous regressors These estimateswere computed using a two-step estimator that approximatesthe MLE again with recoded data The two-step procedureadds a Mills-ratio type endogeneity correction to a censoredregression and then applies tobit to the censored regres-sion model with the correction term [The correction term islsquo˜6Di4ƒ i=ecirci5C 41ƒDi5 i=41ƒecirci57 where is the corre-lation between the latent error determining treatment assign-ment and the outcome residual lsquo˜ is the standard deviation ofthe outcome residual and i and ecirci are Normal density anddistribution functions evaluated at the probit rst-stage ttedvalues see Heckman and Robb (1985)]

As with the bivariate probit estimate the endogenous tobitestimate is somewhat larger in magnitude than conventional2SLS This may be because of the Mills-ratio procedure forcontrolling for the endogeneity of Di more than the tobitcorrection for nonnegative outcomes To see this note thatthe Mills-ratio estimate ignoring censoring reported in col-umn (9) is considerably larger than the corresponding conven-tional 2SLS estimates Interestingly both the probit and tobitstructural estimators generate results that are more sensitiveto the inclusion of covariates than any of the other estimatorsexcept Mullahyrsquos In fact Panel B of the table shows thatwithout covariates all estimation techniques give very simi-lar results This is not surprising since without covariates theparametric assumptions used in these models are weaker

The last set of results shows that childbearing is associ-ated with marked changes in the distribution of hours workedbeyond the changes in participation already seen This isapparent in the rst columns of Table 3 which report thedistribution of hours worked by interval along with linearprobability estimates of the relationship between childbearingand the probability of falling into each interval (these mod-els include the same covariates used for Table 2) The largestentry in column (1) is for the probability of working zerohours There is also a large negative effect on the probabilityof working 31ndash40 hours per week which shows that womenwho have a third child are much less likely to work full-time Once again probit average effects reported in column2 are almost indistinguishable from the corresponding OLSestimates Estimates from an ordered probit model reportedin column (3) differ from OLS somewhat more but still gen-erate a very similar pattern

Like the 2SLS estimates for average outcomes 2SLS esti-mates of linear probability models for the probability of fallinginto each interval show that models that treat childbearing asexogenous exaggerate the negative impact on labor supply2SLS estimates for the probability of working zero hours areidentical (by construction) to those for employment in Table 1The 2SLS estimates of the impact of childbearing on fulltimework are also considerably less than the corresponding OLSestimates

Nonstructural Causal-IV models treating Morekids asendogenous generate estimates very close to 2SLS estimatesfor effects on the probabilities of hours falling into each inter-val Columns (5) and (6) show that the results are also remark-ably insensitive to whether a linear or probit model is usedto approximate the distribution function The estimates againindicate that childbearing changes the distribution of hours byraising the probability of nonparticipation and by reducing theprobability of fulltime work

Finally the QTE estimator provides useful summary statis-tics for causal effects of childbearing on changes in the distri-bution of hours worked These estimates were computed as thesolution to a weighted quantile regression problem using (27)to construct weights and the reported standard errors are froma bootstrap Quantile regression estimates treating childbear-ing as exogenous show an estimated 9-hour decline in medianhours worked but the QTE estimator suggests that the causaleffect of childbearing on median hours worked is only about ve hours Estimates at higher quantiles are similarly reducedwhen childbearing is treated as endogenous

6 SUMMARY AND CONCLUSIONS

Structural parameters may be of theoretical interest but mustultimately be converted into causal effects if they are to beof use for policy evaluation or determining whether a trendassociation is causal The problem of estimating causal effectsfor LDVrsquos does not differ fundamentally from the analogousproblem for continuously distributed outcomes The key dif-ferences seem to me to be the increased likelihood of inter-est in distributional outcomes and the inherent nonlinearity ofCEFrsquos for LDVrsquos in models with covariates Without covari-ates conventional 2SLS estimates capture both distributionaleffects and effects on means Simple IV strategies devel-oped by Mullahy (1997) and Abadie (2000b) can be used toestimate average effects in nonlinear models with covariateswhile IV strategies for probability models and quantile regres-sion can be used to estimate effects on distributions

These approaches are illustrated here using twin birthsto estimate the labor-supply consequences of childbearingAlternative nonstructural approaches to the estimation ofcausal effects using twins instruments generate similar aver-age effects whether or not the model is nonlinear Structuralestimates tend to be somewhat larger than nonstructural esti-mates when exogenous covariates are included even thoughthe covariates are not strongly related to the twins instru-ment Since the structural models impose additional distribu-tional and functional form assumptions I see no reason toprefer them

Finally the various IV estimates of the effect of childbear-ing on the distribution of hours worked show that the impactof childbearing is characterized by substantially increased non-participation and by an almost equally large shift away fromfulltime work Estimates that treat childbearing as exogenousexaggerate the causal effect of childbearing on changes in dis-tribution as well as on average hours worked These ndingsappear in results from both probability models and quantilemodels with endogenous regressors Both types of modelsprovide an interesting look at the impact of childbearing on

14 Journal of Business amp Economic Statistics January 2001

the distribution of hours worked without the conceptual prob-lems inherent in conditional-on-positive comparisons

ACKNOWLEDGMENTS

I thank the editor and the session discussants for theircomments Special thanks go to Melissa Schettini for out-standing research assistance and Alberto Abadie for exten-sive comments and shared computer programs Thanks also goto Daron Acemoglu Bill Evans Bill Greene Guido ImbensJohn Mullahy Steve Pischke and seminar participants at theUniversity of Texas for helpful discussions and comments onan earlier draft Data and computer programs used here areavailable on request

APPENDIX

A1 Derivation of Equation (12)

Drop the i subscripts Note that

Y0 D 14Y uuml0 gt 05Y uuml

0

Y1D 14Y uuml

0C gt 054Y uuml

0C 5 D 14Y uuml

1 gt 05Y uuml1 0

Since D is independent of Y0 and Y1D 14Y uuml

0C gt 054Y uuml

0C

51D is independent of Y1 Conditional effects on the treatedare therefore the same as conditional effects without condi-tioning on treatment status

E6Y1ƒ Y0

mdashY1 gt 01 D D 17 D E6Y1ƒ Y0

mdashY1 gt 07

E6Y1ƒ Y0

mdashY0 gt 01 D D 17 D E6Y1ƒ Y0

mdashY0 gt 070

The averages on the right side are the causal effects of interestThe conditional expectations on the right side are evaluated asfollows

E6Y1mdashY1 gt 07 D E6Y uuml

1mdashY uuml

1 gt 07 D E6Y uuml0mdashY uuml

1 gt 07C 1 (A1)

E6Y0mdashY1 gt 07 D E6Y uuml0 14Y uuml

0 gt 05mdashY uuml1 gt 07

D E6Y uuml0

mdashY uuml1 gt 01 Y uuml

0 gt 07P4Y uuml0 gt 0mdashY uuml

1 gt 050

If 4 gt 05 2 Y uuml0 gt 0 ) Y uuml

1 gt 01 this is E6Y uuml0

mdashY uuml0 gt 07

P4Y uuml0 gt 0mdashY uuml

1 gt 050 If 4 lt 05 2 Y uuml1 gt 0 ) Y uuml

0 gt 01 so this is

E6Y uuml0

mdashY uuml1 gt 070 (A2)

So lt 0 ) E6Y1 ƒ Y0mdashY1 gt 07 D Similarly

E6Y1mdashY0 gt 07 D E614Y uuml

0C gt 054Y uuml

0C 5mdashY uuml

0 gt 07

D E6Y uuml0

mdashY uuml1 gt 01 Y uuml

0 gt 07P4Y uuml1 gt 0mdashY uuml

0 gt 05

C P Y uuml1 gt 0mdashY uuml

0 gt 0 0

If 4 gt 051 Y uuml0 gt 0 ) Y uuml

1 gt 01 so

P4Y uuml1 gt 0mdashY uuml

0 gt 05 D 1 (A3)

and E6Y uuml0

mdashY uuml1 gt 01 Y uuml

0 gt 07 D E6Y uuml0mdashY uuml

0 gt 070 Finally

E6Y0mdashY0 gt 07 D E6Y uuml0mdashY uuml

0 gt 07 (A4)

so gt 0 ) E6Y1ƒ Y0

mdashY0 gt 07 D

A2 Derivation of Equation (21)

Drop the i subscripts Note that eƒD D 641 ƒ D5 C Deƒ7Let sbquo uuml D eƒsbquo and uuml D eƒ Z is binary and there are nocovariates so we can now write (20) as

sbquo uuml E641ƒ D5Y mdashZ D 17C sbquo uuml uuml E6DY mdashZ D 17 ƒ 1 D 0 (A5)

and

sbquo uuml E641ƒ D5Y mdashZ D 07C sbquo uuml uuml E6DY mdashZ D 07 ƒ 1 D 00 (A6)

Divided (A5) by (A6) to get rid of sbquo uuml then solve for uuml Subtract 1 and rearrange to get Equation (21)

A3 Average Treatment Effects and Standard Errors forNonlinear Models

Average treatment effects were calculated with the aid ofderivative approximations so that all reported effects have theform ldquocoef cient times scaling factorrdquo

Probit Note that ecirc6X 0i sbquo C 7 ƒ ecirc6X 0

isbquo7 6X 0isbquo C Di7 š

1 so the average effect on the treated can be approximatedas 841=N15

Pi Di6X 0

isbquo C Di79 š 1 where N1 D Pi Di This

turns out to be accurate to three decimal places for the esti-mates in Table 1 Standard errors were calculated treating thescaling factor as nonrandom This follows the convention forreporting marginal effects in programs like Stata in practiceany correction for estimation of the scaling factor is likely tobe minor A similar approach was used for ordered probit

Tobit Tobit average treatment effects were approximatedusing a derivative formula that can be found for exam-ple in the work of Greene (1999) E6Yi

mdashXi1DiD 17 ƒ

E6YimdashXi1 Di

D 07 iexclE6YimdashXi1Di7=iexclD D ecirc6X 0

isbquo C Di7 š 0

Average effects on the treated can therefore be approximatedusing 841=N15

Pi Diecirc6X 0

isbquo C Di79 š 0 Standard errors wereagain calculated treating the scaling factor as nonrandom

2PM Let the part-1 coef cient be 1 and the part-2 coef -cient be 2 Since the model multiplies parts and both parts arelinear the average effect is approximated using derivatives as1E6Yi

mdashDiD 11 Yi gt 07 C 2P6Di

D 1mdashY gt 070 Standard errorswere calculated treating the scaling factors E6Y mdashDi

D 11 Yi gt 07

and P6DiD 1mdashY gt 07 as nonrandom and using the fact that

the estimates of 1 and 2 are uncorrelatedMullahy Note that E6Yi

mdashXi1Di1mdash uumli 7 D mdash uuml

i exp6X 0i sbquoC Di7

where this CEF has a causal interpretation Again usingderivatives we have mdashuuml

i exp6X 0isbquo C 7 ƒ mdashuuml

i exp6X 0isbquo7

mdash uumli exp6X 0

i sbquo C Di7 š The model is such that E6mdash uumlimdashXi7

equals 1 but E6mdash uumlimdashXi1Di7 is unrestricted I ignore this prob-

lem and approximate the average effect on the treated as841=N15

Pi Di exp6X 0

i sbquo C Di79 š Standard errors were cal-culated treating the scaling factor as nonrandom

Bivariate Probit This is the same as probit using param-eters from the latent index equation for outcomes

Endogenous Tobit This is the same as tobit but usingcoef cients and predicted probability positive from the modelwith the compound Mills-ratio term included

Mills Ratio Standard errors were calculated treating thecompound Mills-ratio term as known

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 15

Causal-IV (nonlinear) Average effects were calculated asdescribed previously for the probit and Mullahy (exponential)functional form The rst-stage estimates of E6Zi

mdashXi7 neededto construct Ši were estimated using probit Standard errorsfor were calculated using the same bootstrap proceduredescribed for QTE Abadie (2000b) also gave analytic formu-las that take account of the rst-step estimation of E6Zi

mdashXi7An assumption implicit in this scheme for reporting Causal-IV results is that it makes sense to convert conditional-on-Xeffects for compliers into overall average effects on the treated

A4 Computation of Quantile Treatment Effectsand Standard Errors

QTErsquos were computed by plugging rst-step estimatesof QŠi

D E6ŠimdashXi1Di1 Yi7 into a weighted quantile regression

calculation performed by Stata Nonnegative estimates ofE6Ši

mdashXi1 Di1 Yi7 were constructed by separately estimatingE6Zi

mdashXi7 and E6ZimdashYi1Di1Xi7 using probit and then trimming

In principle standard errors should take account of this rst-step estimation An additional complication is that the analyticstandard errors for QTE involve a conditional error density Inthis case I sidestepped messy analytic calculations by usinga bootstrap procedure that repeats both the rst-stage estima-tion of QŠi and the second-step estimation of the parameters ofinterest in 100 replicate samples of 2500 observations eachThe 100 replicate samples were sampled without replacementusing the Stata command bsample The reported standarderrors were calculated as 4N =N uuml 51=2bseN where bseN is thestandard deviation of the 100 replicate estimates N D 21500and N uuml is the full sample size

[Received October 1999 Revised July 2000]

REFERENCES

Abadie A (2000a) ldquoBootstrap Tests for the Effect of a Treatment on the Dis-tributions of an Outcome Variablerdquo Technical Working Paper 261 NationalBureau of Economic Research Cambridge MA

mdashmdashmdash (2000b) ldquoSemiparametric Estimation of Instrumental Variable Mod-els for Causal Effectsrdquo Technical Working Paper 260 National Bureau ofEconomic Research Cambridge MA

Abadie A Angrist J and Imbens G (1998) ldquo Instrumental VariablesEstimation of Quantile Treatment Effectsrdquo Working Paper 229 NationalBureau of Economic Research Cambridge MA

Amemiya T (1978) ldquoThe Estimation of a Simultaneous Equation General-ized Probit Modelrdquo Econometrica 46 1193ndash1205

mdashmdashmdash (1979) ldquoThe Estimation of a Simultaneous Equation Tobit ModelrdquoInternational Economic Review 20 169ndash181

Angrist J (1998) ldquoEstimating the Labor Market Impact of Voluntary MilitaryService Using Social Security Data on Military Applicationsrdquo Economet-rica 66 249ndash288

Angrist J and Evans W (1998) ldquoChildren and Their Parentsrdquo Labor SupplyEvidence From Exogenous Variation in Family Sizerdquo American EconomicReview 88 450ndash477

Angrist J and Imbens G (1991) ldquoSources of Identifying Information inEvaluation Modelsrdquo Technical Working Paper 117 National Bureau ofEconomic Research Cambridge MA

Angrist J Imbens G and Rubin B (1996) ldquo Identi cation of CausalEffects Using Instrumental Variablesrdquo Journal of the American StatisticalAssociation 91 444ndash455

Angrist J and Krueger A (1999) ldquoEmpirical Strategies in Labor Eco-nomicsrdquo in The Handbook of Labor Economics Volume IIIA edsO Ashenfelter and D Card Amsterdam North-Holland pp 1277ndash1366

Blundell R W and Smith R J (1989) ldquoEstimation in a Class of Simulta-neous Equation Limited Dependent Variable Modelsrdquo Review of EconomicStudies 56 37ndash58

Blundell R W and Powell J L (1999) ldquoEndogeneity in Single Index Mod-elsrdquo mimeo University College London Dept of Economics

Bronars S and Grogger J (1994) ldquoThe Economic Consequences of UnwedMotherhood Using Twin Births as a Natural Experimentrdquo American Eco-nomic Review 84 1141ndash1156

Chamberlain G (1986) ldquoAsymptotic Ef ciency in Semi-Parametric ModelsWith Censoringrdquo Journal of Econometrics 32 189ndash218

Cragg J G (1971) ldquoSome Statistical Models for Limited Dependent Vari-ables With Application to the Demand for Durable Goodsrdquo Econometrica39 829ndash844

Duan N Manning W G Jr Morris C N and Newhouse J P(1984) ldquoChoosing Between the Sample-Selection Model and the Multi-partModelrdquo Journal of Business amp Economic Statistics 2 283ndash289

Eichner M J McClellan M B and Wise D A (1997) ldquoHealth Expendi-ture Persistence and Feasibility of Medical Savings Accountsrdquo in Tax Pol-icy and the Economy 11 ed J Poterba Cambridge MA National Bureauof Economic Research pp 91ndash128

Evans W Farrelly M C and Montgomery E (1999) ldquoDo Work-place Smoking Bans Reduce Smokingrdquo American Economic Review 89728ndash747

Gangadharan J and Rosenbloom J L (1996) ldquoThe Effects of Child-bearingon Married Womenrsquos Labor Supply and Earnings Using Twin Births as aNatural Experimentrdquo Working Paper 5647 National Bureau of EconomicResearch Cambridge MA

Goldberger A S (1991) A Course in Econometrics Cambridge MAHarvard University Press

Greene W (1999) ldquoMarginal Effects in the Censored Regression ModelrdquoEconomics Letters 64 43ndash49

Gronau R (1974) ldquoWage ComparisonsmdashA Selectivity Biasrdquo Journal ofPolitical Economy 82 1119ndash1143

Hay J W Leu R and Rohrer P (1987) ldquoOrdinary Least Squares andSample-Selection Models of Health-care Demandrdquo Journal of Business ampEconomic Statistics 5 499ndash506

Hay J W and Olsen R J (1984) ldquoLet Them Eat Cake A Note on Com-paring Alternative Models of the Demand for Medical Carerdquo Journal ofBusiness amp Economic Statistics 2 279ndash282

Heckman J J (1974) ldquoShadow Prices Market Wages and Labor SupplyrdquoEconometrica 42 679ndash694

mdashmdashmdash (1978) ldquoDummy Endogenous Variables in a Simultaneous EquationsSystemrdquo Econometrica 46 931ndash960

mdashmdashmdash (1990) ldquoVarieties of Selection Biasrdquo American Economic Review 80313ndash318

Heckman J J and Robb R Jr (1985) ldquoAlternative Methods for Evaluatingthe Impact of Interventionsrdquo in Longitudinal Analysis of Labor MarketData (Econometric Society Monographs Series No 10) eds J Heckmanand B Singer Cambridge UK Cambridge University Press pp 156ndash246

Imbens G and Angrist J D (1994) ldquo Identi cation and Estimation of LocalAverage Treatment Effectsrdquo Econometrica 62 467ndash475

Imbens G and Rubin D B (1997) ldquoEstimating Outcome Distributions forCompliers in Instrumental Variables Modelsrdquo Review of Economic Studies64 555ndash574

Kane T J Rouse C E and Staiger D (1999) ldquoEstimating Returnsto Schooling When Schooling Is Misreportedrdquo Working Paper W7235National Bureau of Economic Research Cambridge MA

Kelejian H H (1971) ldquoTwo-Stage Least Squares and Econometric SystemsLinear in Parameters but Nonlinear in the Endogenous Variablesrdquo Journalof the American Statistical Association 66 373ndash374

Koenker R and Bassett G (1978) ldquoRegression Quantilesrdquo Econometrica46 33ndash50

Killingsworth M (1983) Labor Supply Cambridge UK Cambridge Uni-versity Press

Keane M P and Wolpin K (1997) ldquo Introduction to the JBES Special Issueon Structural Estimation in Applied Microeconomicsrdquo Journal of Businessamp Economic Statistics 15 111ndash114

Lee M J (1995) ldquoSemiparametric Estimation of Simultaneous EquationsWith Limited Dependent Variables A Case Study of Female Labor Sup-plyrdquo Journal of Applied Econometrics 10 187ndash200

mdashmdashmdash (1996) ldquoNonparametric Two-Stage Estimation of Simultaneous Equa-tions With Limited Endogenous Regressorsrdquo Econometric Theory 12305ndash330

Leung S F and Yu S (1996) ldquoOn the Choice Between Sample Selectionand Two-Part Modelsrdquo Journal of Econometrics 72 197ndash229

Lin T-F and Schmidt P (1984) ldquoA Test of the Tobit Speci cation Againstan Alternative Suggested by Craggrdquo Review of Economics and Statistics66 174ndash177

Maddala G S (1985) ldquoA Survey of the Literature on Selectivity Bias AsIt Pertains to Health Care Marketsrdquo Advances in Health Economics andHealth Service Research 6 3ndash18

16 Journal of Business amp Economic Statistics January 2001

REFERENCES

Manning W G Duan N and Rogers W H (1987) ldquoMonte Carlo Evidenceon the Choice Between Sample Selection and Two-part Modelsrdquo Journalof Econometrics 35 59ndash82

Manski C F (1975) ldquoMaximum Score Estimation of the Stochastic UtilityModelrdquo Journal of Econometrics 3 205ndash228

mdashmdashmdash (1996) ldquoLearning About Treatment Effects From Experiments WithRandom Assignment of Treatmentsrdquo Journal of Human Resources 31709ndash733

McDonald J F and Mof tt R A (1980) ldquoThe Uses of Tobit AnalysisrdquoThe Review of Economics and Statistics 62 318ndash321

Mof t R A (1999) ldquoNew Developments in Econometric Methods forLabor Market Analysisrdquo in The Handbook of Labor Economics Vol-ume IIIA eds O Ashenfelter and D Card Amsterdam North-Hollandpp 1367ndash1398

Mullahy J (1997) ldquo Instrumental-Variable Estimation of Count Data ModelsApplications to Models of Cigarette Smoking Behaviourrdquo Review of Eco-nomics and Statistics 11 586ndash593

mdashmdashmdash (1998) ldquoMuch Ado About Two Reconsidering Retransformation andthe Two-Part Model in Health Econometricsrdquo Journal of Health Eco-nomics 17 247ndash281

Newey W K (1985) ldquoSemiparametric Estimation of Limited DependentVariable Models With Endogenous Explanatory Variablesrdquo Annales deLrsquo Insee 5960 219ndash237

mdashmdashmdash (1986) ldquoLinear Instrumental Variable Estimation of Limited Depen-dent Variable Models With Endogenous Explanatory Variablesrdquo Journal ofEconometrics 32 127ndash141

mdashmdashmdash (1987) ldquoEf cient Estimation of Limited Dependent Variable ModelsWith Endogenous Explanatory Variablesrdquo Journal of Econometrics 36231ndash250

Powell J L (1986a) ldquoSymmetrically Trimmed Least Squares Estimation forTobit Modelsrdquo Econometrica 54 1435ndash1460

mdashmdashmdash (1986b) ldquoCensored Regression Quantilesrdquo Journal of Econometrics32 143ndash155

Rosenzweig M R and Wolpin K (1980) ldquoLife-Cycle Labor Supply andFertility Causal Inferences From Household Modelsrdquo Journal of PoliticalEconomy 88 328ndash348

Rubin D B (1974) ldquoEstimating Causal Effects of Treatments in Random-ized and Non-randomized Studiesrdquo Journal of Educational Psychology 66688ndash701

mdashmdashmdash (1977) ldquoAssignment to a Treatment Group on the Basis of a Covari-aterdquo Journal of Educational Statistics 2 1ndash26

Wooldridge J (1999) ldquoDistribution-Free Estimation of Some Nonlinear PanelData Modelsrdquo Journal of Econometrics 90 77ndash97

Comment Binary Regressors in NonlinearPanel-Data Models With Fixed EffectsJinyong Hahn

Department of Economics Brown University Providence RI 02912 ( jinyong hahnbrownedu)

Angrist notes in his abstract that ldquomuch of the dif cultywith limited dependent variables comes from a focus on struc-tural parameters such as index coef cients instead of causaleffects Once the object of estimation is taken to be the causaleffect of treatment several simple strategies are availablerdquo(p 2) I examine the consequence of such a perspective forinference in nonlinear panel-data models with xed effectsI argue that (a) the ldquodif cultyrdquo indeed disappears sometimesand (b) structure of treatment assignment plays a crucial rolefor his strategy to be successful

It is instructive to begin with a dif cult nonlinear panel-datamodel with xed effects Consider a panel probit model with xed effects

Pr6yitD 1mdashci1 xi11 xi27 D ecirc4ci

C ˆ xit51

i D 11 1 n3 t D 1120 (1)

It is assumed that yi1 and yi2 are independent of each othergiven 4ci1 xi11 xi25 Here ci denotes the unobserved xedeffects and xit denotes a binary treatment variable We willassume that 4xi11 xi25 D 40115 for all i The index coef cientof interest is ˆ Therefore traditional econometric analysiswould focus on whether ˆ is identi ed and or

pn-consistently

estimableEstimation of the index coef cient ˆ is dif cult for a num-

ber of reasons

1 So far no consistent estimator for ˆ has been developedfor probit with a nonparametric speci cation of the conditionaldistribution of ci given 4xi11 xi25

2 It is not even clear whether ˆ is identi ed or notManskirsquos (1987) identi cation result requires in nite supportfor xit which cannot be satis ed due to the binary nature ofxit

3 It is not clear whether the semiparametric informationbound for ˆ is positive or not It is quite possible that theinformation is actually 0 For example Chamberlain (1992)showed that the information for ˆ is 0 for models with timedummies

4 Chamberlainrsquos (1984) conditional maximum likelihoodestimator is applicable only to logit models

Now to assess whether changing the object of estimationsimpli es statistical analysis consider the average treatmenteffects which in this particular case can be easily shown tobe equal to

sbquo D E6yi2ƒ yi170 (2)

It is not dif cult to see that a simple estimator Osbquo D1=n

PniD14yi2

ƒ yi15 isp

n-consistent As Angrist argues afocus on average causal effects dramatically reduces the dif -culty of estimation The secret is that 4xi11 xi25 D 40115 for alli which effectively ensures that ci is independent of 4xi11 xi25Dif culty of the index estimation listed previously is because

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

Imbens Comment 17

the independence between 4xi11 xi25 and ci cannot be exploitedwithin the xed-effects framework Unfortunately the addi-tional information that 4xi11 xi25 and ci are independent of eachother does not reduce dif culty in estimating the index ˆ It isnot yet clear whether ˆ can be

pn-consistently estimated even

with the random-effects assumption when the random effectsare nonparametrically speci ed On the other hand estimatingsbquo becomes easier as a result of the constancy of the general-ized propensity score Pr 6xi11 xi2mdashci7 Presence of unobservedci in the model was rendered irrelevant due to the constancyof the generalized propensity score [See Imbens (1999) for adiscussion on the generalized propensity score]

It is interesting to note that in the panel probit model (1)estimation of sbquo is not necessarily simple unless the indexstructure is discarded altogether It is useful to note that thenew target parameter sbquo could be estimated consistently usingindex structure if consistent estimators of ˆ and not are givenHere not denotes the distribution of ci We may alternativelywrite (2) as

Z4ecirc4c C ˆ5 ƒ ecirc4c55 dnot4c51 (3)

which can in principle be estimated by using consistent esti-mators of ˆ and not Estimation of sbquo using the alternativecharacterization (3) requires consistent estimation of an addi-tional parameter not a parameter that was not given too muchattention in the past The problem is that not many consis-tent estimators of not are available It is not yet clear whether

the model satis es the primitive conditions for consistency ofthe nonparametric maximum likelihood estimator (NPMLE)as discussed by Heckman and Singer (1984) The dif culty inestimating the target parameter using the expression (3) whichis based on the index structure is in sharp contrast to the easeof the estimation strategy using the expression (2) for whichthe index structure is irrelevant

The preceding discussion suggests that the success ofAngristrsquos perspective critically hinges on the structure of treat-ment assignment and careful reexpression of the new targetparameter If the joint distribution of ci and 4xi11 xi25 is com-pletely unknown it is clear that changing the target parameterdoes not ease the dif culty of estimation Angristrsquos perspectivetherefore requires substantial effort in modeling such jointdistribution Whether such a modeling effort will be successfulin dealing with nonlinear panel problems remains to be seen

ADDITIONAL REFERENCES

Chamberlain G (1984) ldquoPanel Datardquo in Handbook of Econometricseds Z Griliches and M D Intriligator Amsterdam North-Hollandpp 1247ndash1318

mdashmdashmdash (1992) ldquoBinary Response Models for Panel Data Identi cation andInformationrdquo unpublished manuscript

Heckman J and Singer B (1984) ldquoA Method for Minimizing the Impactof Distributional Assumptions in Econometric Models for Duration DatardquoEconometrica 52 271ndash320

Imbens G (1999) ldquoThe Role of Propensity Score in Estimating Dose-Response Functionsrdquo Technical Working Paper 237 National Bureau ofEconomic Research Cambridge MA

Manski C (1987) ldquoSemiparametric Analysis of Random Effects Linear Mod-els from Binary Panel Datardquo Econometrica 55 357ndash362

CommentGuido W Imbens

Department of Economics University of California at Los Angeles Los Angeles CA 90095(imbenseconuclaedu)

It is a pleasure to comment on this article by JoshuaAngrist whose applications of instrumental-variables methods(Angrist 1989 Angrist and Krueger 1991) have been a sourceof inspiration for my own work in this area As with Angristrsquosprevious work on instrumental variables the current articleraises some controversial issues and makes a number of impor-tant points Here I offer some comments on three of themFirst I shall discuss the issues raised in Section 1 ldquoCausalEffects and Structural Parametersrdquo concerning the goals ofstatistical inference Angrist argues that many questions ofinterest are most easily formulated in terms of comparisonsbetween realized and potential outcomes the latter de nedas outcomes that would have been observed under alternativestates of nature I shall explore some of the implications of thisview for empirical practice and econometric theory SecondI shall offer some remarks on the role of economic theory inspeci cation and identi cation of econometric models againreinforcing Angristrsquos point regarding the importance of for-mulating the key assumptions in terms of potential outcomesThird I shall discuss some of the issues related to the limited

dependent nature of outcome variables for empirical practicein particular in the presence of covariates Partly motivatedby the widespread perception of fundamental dif culties inapplying instrumental-variables methods to data with limiteddependent outcome variables Angrist argues that standard lin-ear model techniques are generally applicable I agree withAngristrsquos position that most of these perceived problems areexaggerated but suggest that principled inference should nev-ertheless take account of the limited dependent nature of theoutcome variables and use nonlinear models

1 CAUSAL ESTIMANDS

In his textbook discussion of the difference betweenstructural and reduced-form estimates Goldberger (1997)wrote following Marshak (1953) that the ultimate goal of

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

18 Journal of Business amp Economic Statistics January 2001

econometrics is to provide predictions More speci cally inmy view the goal is to provide predictions of policy interven-tions Using both economic theory and data economists wishto inform policy discussions by providing predictions of statesof the world under different policy choices Based on com-parisons of such predictions policy makers can then chooseamong the different policies using some social welfare mea-sure as objective function (eg Heckman and Smith 1997)Angrist argues that such questions are most easily formulatedin terms of potential outcomes Here I want to elaborate onthat view

Consider as an example the problem faced by a policymaker contemplating a new tax in a market To evaluatethis policy the policy maker wishes to take into account theeffect of the tax on the quantity traded Economic theory sug-gests that this effect depends on the slope of the supply anddemand functions The rst step is therefore the estimation ofthese slopes and in the remainder of this discussion I shallfocus on this component of the policy-evaluation problem Inprinciple the policy maker may be interested in the entire dis-tribution of the quantity traded under various taxes Let usassume however that for purposes of evaluation of the poli-cies it is suf cient to know the average effect of the policyon the quantity traded If there are only two values for thepolicymdashfor example no tax or a taxmdashthe difference betweenthese two averages is the key quantity of interest FollowingRubin (1974) I will refer to this as the estimand

Note that the choice of estimand is distinct from the sta-tistical question of the speci cation of the model Often thestatistical model is speci ed in such a way that a single param-eter corresponds to the estimand For example in a structuralinterpretation of the linear regression model the coef cientscorrespond to the effect of changing the covariates by a singleunit Such one-to-one correspondence however is the excep-tion rather than the rule Wooldridge (1992) made this point inthe context of BoxndashCox regression models Such models areoften used when a linear representation for E6Y mdashX7 is inappro-priate The BoxndashCox regression model generalizes this linearform to E6Y 4lsaquo5mdashX7 D X 0sbquo where

Y 4lsaquo5 D(

4Y lsaquo ƒ 15=lsaquo lsaquo 6D 01

ln Y lsaquo D 00

Although consistent estimators for sbquo exist under these assump-tions Wooldridge stressed that because (a) the interpretationof sbquo changes with the value of lsaquo and (b) knowledge of sbquo andlsaquo is not suf cient for recovering E6Y mdashX7 there is no reasonfor economists to be interested in estimates of sbquo under theseassumptions In other words sbquo cannot be the sole focus ofthe researcher because the question it answers changes withthe value of nuisance parameters Wooldridge then suggestedan alternative speci cation that always allows the researcherto recover the conditional expectation E6Y mdashX7

In empirical work this distinction between the estimandand the parameters of the statistical model is consistent withthe now common practice of reporting estimates of averagederivatives in binary response models rather than reportingestimates of the logit or probit coef cients Unlike a linear

regression model there is no direct link from one of the coef- cients in the logit or probit model to average causal effectsand thus there is no intrinsic interest in such coef cients

This view is at odds however with a large part of the semi-parametric literature An exception is the work by Stoker (egStoker 1986) who focused on estimation of index coef cientsin settings where these are proportional to average derivativesand thus directly linked to changes in predictions Considerfor example the work on semiparametric estimation of binaryresponse models In this literature such models are esti-mated without making logistic or probit assumptions insteadonly making conditional mean or median assumptions in alatent index interpretation (eg Manski 1985) This literaturehowever has begged the question of why economists shouldbe interested in the coef cient estimates in these models inthe absence of a direct link between these coef cients andthe choice probabilities or their derivatives Similarly someof the models with xed effects in panel data with limiteddependent variables have focused on estimation of parametersthat in themselves do not allow for estimation of conditionalexpectations or their derivatives and thus do not allow for esti-mation of causal effects See Arellano and Honoreacute (in press)for a survey of many of these methods

2 IDENTIFICATION

After deciding on the estimand the next step is to makesubstantive assumptions on the process that generated the dataThis is where economic as opposed to statistical theory playsa key role Theoretical considerations may suggest that certainvariables have no direct causal effect on others because theydo not enter into agentsrsquo utility function nor do they affectthe constraints these agents face For example in some mar-kets it may be reasonable to postulate the existence of demandand supply function and assume that their intersection deter-mines observed prices and quantities In that case it may beargued that certain variablesmdashfor example weather conditionsin agricultural marketsmdashaffect supply at xed prices but notdemand because weather conditions do not affect utility ofthe buyers nor do they constrain their choices given pricesSimilarly theoretical considerations may suggest which vari-ables determine agentsrsquo fertility choices and which variablesare excluded from such choices as in the structural modelsdescribed in Section 12 of Angrist

For the purpose of considering such exclusion restrictionsas well as other assumptions it is important to formulate themin a way that economic theory can be brought to bear on themThis makes the formulation in terms of counterfactuals orpotential outcomes that Angrist advocates particularly appro-priate The potential outcomes describe outcomes in differ-ent environments and as such are the primitives of economicanalyses as well as choices under different sets of constraintswhich are the result of agents solving constrained optimizationproblems Since economic theory studies such optimizationproblems it is therefore well equipped to assess assumptionsformulated directly in terms of these potential outcomes Anexample of the formulation of the critical assumptions in termsof such potential outcomes is Angrist Imbens and Rubin(1996 AIR from here on) In contrast latent index modelsalthough under some conditions mathematically equivalent to

Imbens Comment 19

the potential outcome framework (eg Vytlacil 1999) formu-late the critical assumptions in terms of associations betweenobserved variables and unobserved residuals which appearsmore dif cult to contemplate [see Imbens (1997) for a dis-cussion of the confusion such formulations have caused in thestatistics literature]

It is rare that economic theory is speci c enough to deter-mine the exact value of the estimand More typical is thatthe theory is consistent with a range of values for the esti-mand Observations on agentsrsquo choices and outcomes may behelpful in narrowing down this range The econometricianrsquostask is to link the data to the estimand Typically a num-ber of additional assumptions are made at this stage Almostalways it is assumed that there is only limited dependenceor no dependence at all between choices made by differentagents and identi cation focuses on the link between the jointdistribution of the observables estimable in large samplesand the estimand Two possibilities arise at this stage Some-times the estimand can be expressed as a functional of thejoint distribution of the observables in which case the esti-mand is identi ed A leading example is where the estimandis the average treatment effect and theory suggests that assign-ment to treatment is random or at least random conditionalon a set of observed covariates (unconfounded assignmentselection on observables) Alternatively the assumptions sug-gested by economic theory do not allow for the direct linkbetween the distribution of observables and the estimand Inthat case the researcher faces some choices One option advo-cated in a series of papers by Manski (see for a general dis-cussion Manski 1995) is to estimate the range of values ofthe estimand consistent with the data given the substantiveassumptions Another option followed in the current article byAngrist is the local average-treatment-effect approach devel-oped by Imbens and Angrist (1994) to consider what aspectsof the estimand are identi ed given data and assumptions Ininstrumental-variables settings the population average treat-ment effect is often not identi ed but the average effect for aspeci c subpopulation may be In that case one may chooseto estimate the average treatment effect for this subpopula-tion and leave the extrapolation to the principal estimand tothe researcher possibly aided by theoretical considerations AsHeckman wrote ldquo It is a great virtue of the LATE parameterthat it makes the investigator stick to the data at hand and sep-arate out the aspects of an estimation that require out of sam-ple extrapolation or theorizing from aspects of an estimationthat are based on observable datardquo (Heckman 1999 p 832)

Let us consider the case studied by Angrist with its focuson the effect of having more than two children on labor sup-ply Angrist argues that the second birth being a multiple birth(eg twins) is a valid instrument for this effect In terms ofthe AIR formulation this requires a multiple birth to be asgood as randomly assigned and the absence of a systematicdirect effect on labor supply other than through its effect onthe number of children Such assumptions may be contro-versial For example fertility treatments may lead to a sys-tematic association between multiple births and choices madeby couples violating the rst assumption Even if we acceptthese assumptions however they only imply that the aver-age causal effect of more kids on labor supply is identi ed

for women who had a third child solely because their sec-ond birth was a multiple birth (compliers in the AIR termi-nology) In my view it is unlikely that this is the populationof primary interest Nevertheless it is the only subpopulationthe data are informative about in the sense of point identi ca-tion under the substantive assumptions and it would appear tooffer some guidance regarding the population average causaleffect to policy makers similar to the way in the medical worldresults from clinical trials in homogenous subpopulations areregarded as useful because they are viewed as indicative ofpopulation average causal effects

3 LIMITED DEPENDENT VARIABLES

Typically economic theory offers some guidance concern-ing the determinants of certain outcomes without specifyingthe exact form or strength of their relationship In that casestatistical modeling is required to complete the speci cationConsider the example Angrist studies with binary outcomebinary endogenous regressor a binary instrument and covari-ates Angrist suggests as one possible approach estimating theaverage treatment effect through a linear probability modelwith instrumenting for an endogenous regressor The bene tsof the linear probability approach stemming from the linear-ity and robustness against misspeci cation of the rst stageappear to me largely illusory At this point the statistical mod-eling is only intended to provide exible approximations to theunderlying conditional distributions This is a fundamentallydifferent role from that played by the substantive assumptionsthat are essential for identi cation Appeals to consistencyunder speci c parameterizations therefore appear irrelevantmdashin a larger sample one may well wish to use a more exiblespeci cation because less smoothing is required In additionto nding the alleged bene ts of the linear probability modelunpersuasive I nd its disadvantages troubling Within smallsubpopulations characterized by extreme values of the covari-ates the smoothing implicit in linear probability models islikely to lead to unattractive predictions compared to pre-dictions based on nonlinear models that respect the limited-dependent-variable nature of the outcomes

An alternative approach is followed in the study of the effectof u shots on hospitalization rates using randomized incen-tives for vaccination by Hirano Imbens Rubin and Zhou(2000 HIRZ from here on) Given their assumptions exten-sions of those made by AIR to the case with exogenous covari-ates there are three subpopulationsmdashcompliers (units whochange treatment status in response to a change in the valueof the instrument) always-takers (who always take the treat-ment irrespective of the value of the instrument) and never-takers (who never take the treatment irrespective of the valueof the instrument) H IRZ modeled the conditional distributionof these three ldquotypesrdquo conditional on covariates as a trinomialdistribution

Pr4TypeiD cmdashXi

D x5 D exp4x0ndashc5

1C exp4x0ndashc5C exp4x0ndasha51

Pr4TypeiD amdashXi

D x5 D exp4x0ndasha5

1C exp4x0ndashc5 C exp4x0ndasha51

20 Journal of Business amp Economic Statistics January 2001

and

Pr4TypeiD nmdashXi

D x5

D 1 ƒ Pr4TypeiD cmdashXi

D x5 ƒ Pr4TypeiD amdashXi

D x50

Now compare this setup to the selection models Angristdescribes in Section 3 In the selection models the equationdescribing the endogenous regressor is Di

D 18ƒ0C ƒ1Zi

Cƒ 0

2Xi gt Daggeri9 Suppose that the instrument is binary and that ƒ1 ispositive Then the two models are very similar with units withƒ0 Cƒ1 Cƒ 0

2Xi gt Daggeri in the selection model classi ed as always-takers in the potential outcome framework (because irrespec-tive of the value of the instrument Di

D 1 for such units) unitswith ƒ0

C ƒ 02Xi lt Daggeri classi ed as never-takers (because irre-

spective of the value of the instrument DiD 0 for such units)

and the units with ƒ0C ƒ 0

2Xi lt Daggeri lt ƒ0C ƒ1

C ƒ 02Xi classi ed

as compliersOne advantage of the trinomial model is that it easily gen-

eralizes to provide an arbitrarily good t to any conditionaltrinomial distribution by including higher-order terms andinteractions in the covariates If there are no substantive rea-sons to impose additional restrictions one should not imposethem implicitly in the speci cation of the statistical modelIn particular in the selection model it is not suf cient to addhigher-order terms to the covariate vector to provide an arbi-trarily good t to the trinomial distribution Such an approxi-mation would have to involve heteroscedasticity and other dis-tributional extensions that are not straightforward to implementin the selection model

Conditional on the individualrsquos type H IRZ speci ed theoutcome distributions given covariates as logistic regressionmodels Again the aim is to provide a exible approxima-tion to the conditional distribution in a manner that does notimpose any implicit restrictions Given that for a binomialdistribution the logistic regression model can be thought ofas providing a linear approximation to the log odds ratiothis choice is again an appealing one An alternative is theprobit model which also provides a good approximation Less

attractive here is the linear probability model since it requiresinequality restrictions on the parameters if the implicit esti-mates of the probabilities are to be bounded between 0 and 1

In cases with other limited dependent variables alternativenonlinear models may be appropriate For example if the out-comes are durations subject to censoring models speci ed interms of hazard functions (eg Lancaster 1979) may be con-venient for dealing with such data

ADDITIONAL REFERENCES

Angrist J (1989) ldquoLifetime Earnings and the Vietnam Era Draft LotteryEvidence From Social Security Administrative Recordsrdquo American Eco-nomic Review 80 313ndash335

Angrist J and Krueger A (1991) ldquoDoes Compulsory School AttendanceAffect Schooling and Earningsrdquo Quarterly Journal of Ecomonics 106979ndash1014

Arellano M and Honoreacute B (in press) ldquoPanel Datardquo in Handbook of Econo-metrics eds J Heckman and E Leamer Amsterdam Elsevier North-Holland

Goldberger A (1991) A Course in Econometrics Cambridge MACambridge University Press

Heckman J (1999) ldquo Instrumental Variables Response to Angrist andImbensrdquo Journal of Human Resources 34 828ndash837

Heckman J and Smith J (1997) ldquoEvaluating the Welfare Staterdquo in Econo-metrics and Economics in the 20th Century The Ragnar Frisch Centenaryed S Strom New York Cambridge University Press pp 214ndash318

Hirano K Imbens G Rubin D and Zhou A (2000) ldquoEstimating theEffect of Flu Shots in a Randomized Encouragement Designrdquo Biostatistics1 69ndash88

Imbens G (1997) Book Review of The Foundations of Econometric Analy-sis by D Hendry and M Morgan Journal of Applied Econometrtics 1291ndash94

Lancaster T (1979) ldquoEconometric Methods for the Analysis of DurationDatardquo Econometrica

Manski C (1985) ldquoSemiparametric Analysis of Discrete Responserdquo Journalof Econometrics 27 313ndash333

mdashmdashmdash (1995) Identi cation Problems in the Social Sciences CambridgeMA Harvard University Press

Marshak J (1953) ldquoEconomic Measurements for Policy and Predictionrdquoin Studies in Econometric Method eds W Hood and T KoopmansNew York Wiley pp 1ndash26

Stoker T (1986) ldquoConsistent Estimation of Scaled Coef cientsrdquo Economet-rica 54 1461ndash1481

Vytlacil E (1999) ldquo Independence Monotonicity and Latent Index ModelsAn Equivalency Resultrdquo unpublished manuscript University of ChicagoDept of Economics

Wooldridge J (1992) ldquoSome Alternatives to the BoxndashCox RegressionModelrdquo International Economic Review 33 935ndash955

CommentRobert A Mof tt

Department of Economics Johns Hopkins University Baltimore MD 21218 and National Bureau of EconomicResearch Cambridge MA (mofttjhuedu)

The Problem Although the article by Angrist rangesacross a number of issues much of the discussion and thearticle title suggests that the problem of concern is that instru-mental variables ( IV) cannot be used in one of three commonmodels Let the rst model be y D Csbquod CxbdquoChellip where y isan absolutely continuous variable but d is binary and wherex is independent of hellip but d is not Then sbquo can be consistentlyestimated with IV (Heckman and Robb 1985) Let the secondmodel be y uuml D C sbquod uuml C xbdquo C hellip where y uuml and d uuml are contin-

uous and where y D 14y uuml gt 05 and d D d uuml are the observedvariables The parameters of this model can likewise be esti-mated by IV with some auxiliary assumptions (Newey 1986see Blundell and Smith 1993 for a review of alternative meth-ods) But let the third model be y uuml D C sbquod C xbdquo C hellip where

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

Moftt Comment 21

again y D 14y uuml gt 05 but now d is binary The parameters ofthis model cannot be estimated by standard IV Many of theexamples in the Angrist article involve the censored regres-sion model rather than the binary choice model but this andthe other main points in the article apply to both

The Angrist Solution The Angrist solution to the problemposed by the third of these models where IV is not applica-ble is to declare sbquo an uninteresting parameter and not worthyof estimation [or in his words to ldquopuntrdquo (p 8)] This is remi-niscent of the solution to the Vietnam War suggested by some1960s commentators The United States should have simplydeclared victory quickly withdrawn and hoped that no onenoticed that they had in fact lost

If sbquo is uninteresting what is interesting according toAngrist His answer is that interest should center on leastsquares approximations with y as the dependent variable andd and x as regressors If endogeneity of d is a problem linearIV should be applied with linear IV given a LATE interpre-tation He provides an empirical illustration

I will comment on three questions raised by the Angristarticle (1) Is sbquo an interesting parameter (2) Is the Linear IVestimator using observed y and d which Imbens and Angrist(1994) relabeled the LATE an interesting quantity (3) Is theillustrative application given in the Angrist article interesting

Is sbquo an Interesting Parameter As Angrist notes manyof the issues raised by his solution have nothing to do withthe endogeneity of d Whether sbquo is an interesting parame-ter is one of these at least in part Given Angristrsquos prefer-ence for linear projections his position necessarily implies thatall nonlinear models of which the latent index model (L IM)is just one example are uninteresting (Angrist is not com-pletely consistent in his position because he does propose anonlinear model with an exponential conditional mean func-tion in one section of his article that model is subject to thesame objections that he subjects the LIM tomdasharbitrary non-linearities etc) Thus his position boils down to a preferencein the binary dependent-variable case for the linear proba-bility model (LPM) over the LIM and other nonlinear mod-els This issue has been discussed for many years even in asimultaneous-equations context [eg Heckman and MaCurdy(1985) which Angrist does not reference]

The dif culty with Angristrsquos polar position on this issuemdashthat the LPM is the only model of interestmdashis that it is funda-mentally untenable The LPM is attractive because it is easyto interpret providing parameter estimates that do not requiretransformation to learn the effects of a regressor on the meanof the dependent variable It has a role to play in empiricalwork in summarizing the data as regards the conditional meanfunction and for initial explorations of the data and virtu-ally all practitioners use it for this purpose But beyond thisit has no defense Either it is equivalent to the latent-variablemodel if the model is saturated or if the model is not satu-rated imposes functional form restrictions that may not holdand may t the data worse than the latent-variable model

Thus if the model is y D 14 C sbquod C hellip gt 05 where d is asingle dummy variable regressor independent of hellip the probit(say) estimates of and sbquo map one-to-one onto the leastsquares intercept and coef cient on d in the LPM and hencethere is no gain to estimating the model either way because the

estimators are equivalent The same point extends to the caseof a multinomial d If the model is y D 14 C sbquod C xbdquo C hellip gt

05 where x and bdquo are vectors of exogenous covariates andcoef cients respectively the LPM estimate of the effect ofd on E4y5 conditional on x may be demonstrably worsethan the probit estimator of the same effect even for theobject of estimating E4ymdashd1 x5 at points 8d1 x9 observed inthe data (Angrist falls back on the weak principle that leastsquares estimates provide best linear approximations but theissue is that the true model might be nonlinear) Nothingin Angristrsquos article provides evidence that the probitor anyother latent index formulation provides an inferior estimate of6E4ymdashd11 x5 ƒ E4ymdashd01 x57 compared to the LPM for the twoestimators smooth the joint response function with respect tod and x in different ways

Although the latent-variable model has no necessary claimfor superiority in the class of all nonlinear models its pop-ularity rests on its ability to generate a wide variety of non-linearities with a relatively parsimonious speci cation arisingfrom the convolution of the latent index with the cdf of hellipExpansion of the LPM to incorporate equivalent nonlinearitiesis cumbersome and inef cient

The possible inferiority of the LPM in capturing nonlinear-ities in nonsaturated models is also important for interpolationand extrapolation to points not in the observed data Gettingthe nonlinearities right in the observed data is important ininterpolation and extrapolation if the true model is nonlinearas the binary choice model necessarily is (since y is boundedby 0 and 1) Although Angrist at one point does mentionpredictionmdashstating that the linearized models he proposes area good ldquojumping-off point for any prediction exerciserdquo a state-ment without supportmdashhe is for the most part not interestedin prediction so much as summarizing the observed data Thisis ne but unfortunately prediction outside the observed his-torical data is the pervasive concern of the policy makers whoare the ultimate consumers of applied economic research forthey must make predictions of new policies in new environ-ments on a daily basis as part of their jobs The methods pro-posed in Angristrsquos article are therefore not very useful for theformulation of new policy

It should be noted that some make the argument that theoryand economic models must play a role in guiding the formu-lation of empirical relationships used for prediction outsidethe range of the observed data That position is not taken herebecause it is not necessary to establish the potential superior-ity of the L IM over the LPM in nonsaturated models and forprediction Occamrsquos razor makes it unnecessary

It should also be noted that these issues have nothing todo with causal effects because d is assumed exogenous Thusthe opposition that Angrist poses between L IMrsquos and ldquocausaleffectsrdquo models is a false one at least in general If a ldquocausaleffectrdquo is de ned as the true effect of a variable d on onlythe rst moment of a variable y (a rather restricted de nitionof causal effect) that causal effect can be derived from thelatent-variable model just as well The issue is instead merelynonlinearity of the function E4ymdashd1 x5

Angrist also states that the L IM requires ldquoconstant coef- cientrdquo and ldquodistributional assumptionsrdquo for identi cation

22 Journal of Business amp Economic Statistics January 2001

unlike the LPM Angrist does not make suf cient quali ca-tions to these assertions and they are indeed false for theL IM A random coef cient sbquo in the LIM is permissible with-out distributional assumption ( Ichimura and Thompson 1998)and the coef cients in the standard constant-coef cient L IMare identi ed semiparametrically (ie under unknown distri-bution of hellip) under the restriction that d and hellip are independentan assumption generally made for causal interpretation of theLPM model as well (Manski 1988 Horowitz 1993 Powell1994) In addition as already noted in a fully saturated modelthe LPM and the L IM are equivalent in any case

Is the LATE an Interesting Statistic The Angrist assertionthat LATE is the only interesting statistic if d is endogenousor perhaps the only statistic worth bothering about because itis the only one identi ed by the data has the same partialvalidity as his preference for the LPM over the LIM IV isone of the most popular techniques in applied econometricsand has a natural intuition indeed one that does not requirethe LATE interpretation per se It is one of the most usefultools in the applied economistrsquos kit The simple IV estimatordiscussed by Angrist makes minimal assumptions and givesminimal information back to the analyst as a result But tosay that it produces the only statistic of interest does not havedefense both because it is equivalent to an LIM that is fullysaturated and may t the data worse than an LIM if not satu-rated and because it implies that there is no value to makingadditional assumptions to obtain additional information

The limiting nature of the LATE statistic is again in itsinattention to nonlinearities interpolation and extrapolationThat nonlinearities can be important is as true in this caseas it was in the exogenous case just discussed when thereare additional x covariates and when the model is not satu-rated As for interpolation and extrapolation the LATE statis-tic 6 Ny4z D z15ƒ Ny4z D z057=6 Nd4z D z15ƒ Nd4z D z057 denotes theeffect on a change in z from value z0 to z1 on E4y5 scaledby the change in the E4d5 It does not have implications forthe effect on E4y5 of any other change in z or for a changein any other policy variable Thus it is not particularly usefulfor policy changes other than a change of z0 to z1 in the sameenvironment (ie conditioned on the same x) This stands incontrast to an L IM for E4dmdashx1 z5mdashor any parametric modelfor d for that mattermdashthat allows for the change from z0 toz1 to inform policy makers and others of the likely changesof other values of z and of other variables x In fact theLATE statistic which is not a parameter in the usual senseof the word can always be expressed as a nonlinear combi-nation of parameters of a latent-variable model but not viceversa hence the latter model is more general than the former[See Heckman and Vytlacil (1999) for a discussion of how toexpress the LATE in terms of an L IM see also Angrist andImbens (1999) and Heckman (1997 1999) for a discussion ofsome of these issues]

In addition Angristrsquos focus on the advantages of the LATEfor model-free estimation of the effect of a change in z fromz0 to z1 on y reveals a philosophical weakness in his posi-tion This is because the effect of a change in z from z0 toz1 on y can be estimated from the reduced form the struc-tural form is not needed It is not explained why the statistic

6E4ymdashz D z15 ƒ E4ymdashz D z057 is not the only quantity of inter-est The answer that most economists would give is to say thatstructural coef cients are of interest because they can be usedto extrapolate the effects of changes in d on y to other changesin z than from z0 to z1 in the same environment indeedthis is the basic rationale for structural estimation given inthe famous essay by Marschak [1953 see also Christ (1994)and Heckman (2000) for discussions Marschakrsquos argumentwas more subtle than this arguing that restrictions on struc-tural parameters are needed for out-of-sample prediction butthis translates into restrictions on the reduced form as well]However by ruling out the possibility of learning about anyof the effects of changes in z or d other than those inducedby a change in z from z0 to z1 Angrist removes the need todo structural estimation in the rst place Angrist implicitlyassumes that a structural coef cient of interest exists on thevariable d in the y equation and that that coef cient has mean-ing independent of a particular change in z but the rest of hisdiscussion contravenes that interest

A reduced-form research program which is where theAngrist position leads is of considerable value There is noth-ing wrong with a research program to collect a large body ofinformation on the effects of a wide variety of changes in dif-ferent policy variables 4z5 from particular values (z0) to otherparticular values (z1) each taking place in particular environ-ments (x) But if the research program stops there very littleuseful has been learned other than a collection of facts aboutthe effects of particular policies in particular environments

Is the Application in the Angrist Article Interesting TheAngrist article uses a recently popular exclusion restriction ornatural experiment to identify the effect of fertility on laborsupply and to illustrate his preferred methodsmdashnamely theuse of twins The use of twins as an exclusion restrictionstands in contrast to variations in a government policy or lawwhich are often used as exclusion restrictions both in recentwork on natural experiments as well as a much older litera-ture that uses cross-sectional and overtime variation in state-or country-speci c taxes and transfer rules to identify modelparameters However although it can be of interest to estimatereduced-form policy effects 6E4ymdashz D z15 ƒ E4ymdashz D z057 asjust discussed it is not so obvious that there is any interest inestimating the effects of having twins on the expected valueof y Creating twins is not a variable directly subject to policymanipulation

The dif culty in the use of twins arises for the same reasonalready discussedmdashnamely that the preference for model-freeestimation of policy effects leads to a lack of interest in thefunction E4dmdashz1 x5 and hence to a lack of interest in what canbe learned from a study of twins about the effects of someother more relevant policy variable that might be manipulatedA model for the E4dmdashz1 x5 is needed to make that connectionand hence to make a study of twins of any interest yet that isintentionally eschewed in the approach proposed by AngristOnce again this leads to an uninteresting and quite limitingset of exercises

Conclusions The set of methods laid out in the Angristarticle at least if a more relevant instrument were used yieldsa set of model-free statistics that are suitable for exploratorywork on a research question prior to the (possibly nonlinear)

Mullahy Comment 23

structural estimation and prediction that should be the ultimateobject of applied economics research

ACKNOWLEDGMENTS

This comment is a revision of remarks delivered at theJoint Statistical Meetings Baltimore August 1999 I thankCarl Christ James Heckman Joel Horowitz Michael KeaneThomas Mroz Geert Ridder and Edward Vyltacil for com-ments with the usual disclaimer that the views expressed hereshould not be taken as representing those of any of theseindividuals

ADDITIONAL REFERENCES

Angrist J and Imbens G (1999) Comment on ldquo Instrumental VariablesA Study of Implicit Behavioral Assumptions Used in Making Pro-gram Evaluationsrdquo by J J Heckman Journal of Human Resources 34283ndash827

Blundell R and Smith R (1993) ldquoSimultaneous Microeconometric ModelsWith Censored or Qualitative Dependent Variablesrdquo in Handbook of Statis-tics (Vol 1) eds G S Maddala C R Rao and H D Vinod AmsterdamNorth-Holland pp 117ndash141

Christ C (1994) ldquoThe Cowles Commissionrsquos Contributions to Econometricsat Chicago 1939ndash1955rdquo Journal of Economic Literature 32 30ndash59

Heckman J (1997) ldquoInstrumental Variables A Study of Implicit BehavioralAssumptions Used in Making Program Evaluationsrdquo Journal of HumanResources 32 441ndash462

mdashmdashmdash (1999) ldquo Instrumental Variables A Reply to Angrist and ImbensrdquoJournal of Human Resources 34 828ndash837

mdashmdashmdash (2000) ldquoCausal Parameters and Policy Analysis in Economics ATwentieth Century Retrospectiverdquo Quarterly Journal of Economics 11545ndash97

Heckman J and MaCurdy T (1985) ldquoA Simultaneous Equations LinearProbability Modelrdquo International Economic Review 18 21ndash37

Heckman J and Vytlacil E (1999) ldquoLocal Instrumental Variables andLatent Variable Models for Identifying and Bounding Treatment EffectsrdquoProceedings of the National Academy of Sciences 96 4730ndash4734

Horowitz J (1993) ldquoSemiparametric and Nonparametric Estimation of Quan-tal Response Modelsrdquo in Handbook of Statistics (Vol 11) eds G SMaddala C R Rao and H D Vinod Amsterdam North-Holland pp45ndash72

Ichimura H and Thompson T (1998) ldquoMaximum Likelihood Estimation ofa Binary Choice Model With Random Coef cients of Unknown Distribu-tionrdquo Journal of Econometrics 86 269ndash295

Manski C (1988) ldquoIdenti cation of Binary Choice Modelsrdquo Journal of theAmerican Statistical Association 83 729ndash738

Marschak J (1953) ldquoEconomic Measurements for Policy and Predictionrdquoin Studies in Econometric Method eds W Hood and T KoopmansNew York Wiley pp 1ndash26

Powell J (1994) ldquoEstimation of Semiparametric Modelsrdquo in Handbook ofEconometrics (Vol 4) eds R Engle and D McFadden Amsterdam North-Holland pp 2443ndash2521

CommentJohn Mullahy

Departments of Preventive Medicine and Economics University of Wisconsin Madison WI 53705 andNational Bureau of Economic Research Cambridge MA (jmullahyfacstaffwiscedu)

Articles bearing titles containing phrases like ldquoSimpleStrategies for Empirical Practicerdquo are often wolves masquerad-ing as sheep the methodologies they devise being far fromldquosimplerdquo and far removed from what most humble practition-ers perceive in the realm of ldquoempirical practicerdquo Not so hereJoshua Angrist has written a tight comprehensive article thatis stimulating and important yet also eminently useful

As typi es much of Angristrsquos work the main concern in thisarticle is on how to elicit interesting characterizations of causaleffects from microdata with the particular twist here being afocus on outcomes measured as ldquolimited dependent variablesrdquo(LDVrsquos) The main take-away message I glean from this arti-cle is that applied analysts working on causal effect or struc-tural analyses in LDV contextsmdashtraditionally vexing contextsinsofar as consistent estimation and inference are concernedmdashhave considerable grounds for optimism Angrist lays out andinterprets systematically a set of issues and methods that pro-vide practitioners with a variety of implementable strategiesthat might be brought to bear on such empirical problemsA corollary take-away message is that in some respects ldquothisstuff is not really as hard as wersquove tended to make itrdquo withAngrist demonstrating for instance the potential merits ofsimple linear instrumental variable ( IV) methods for estimat-ing causal effects in a variety of LDV contexts In no eventcan applied analysts escape the requirement of nding theo-retically sound instruments but Angrist exposits compellingly

a broad description of how such instruments might be used toelicit interesting causal inferences in LDV contexts

I have no real quibbles with any of the substance ofAngristrsquos arguments Rather I will devote my commentarymainly to amplifying and expanding several of the themes hedevelops throughout the article

1 FOCUS ON CAUSAL OR PARTIAL EFFECTSVERSUS FOCUS ON CONDITIONAL

EXPECTATIONS FUNCTIONS

It seems fair to suggest that much of applied microecono-metrics is concerned primarily with understanding the signsand magnitudes of quantities like iexclE6Y mdashX1 D7=iexcl4X1 D5 oriexclE6Y mdashX7=iexclX0 Yet much of the actual dirty work in under-taking causal analysis in LDV contexts seems to result fromdecisions to undertake analyses in settings where E6Y mdashX1D7

or E6Y mdashX7 are restricted to be positive without a priori restric-tions on parameter values [thus the tradition of using tobit-class conditional expectations functions (CEFrsquos) two-stepselection models exponential CEFrsquos etc]

As practitioners however we should pause to assesswhether speci cations akin to E6Y mdashX1D7 D exp4Xsbquo C D5 gt

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

24 Journal of Business amp Economic Statistics January 2001

0 are ultimately buying anything important with respect tothe rst-order questions being explored It is only com-mon sense that analysts should spend relatively more energyworking on understanding quantities of primary interest(eg iexclE6Y mdashX1D7=iexcl4X1 D5) and worrying relatively lessabout the formulation and implications of assumptions (egE6Y mdashX1 D7 gt 0) that may have little or no ultimate bearingon the particular questions being addressed This is espe-cially so when the latter effort (restricting via functional formchoice E6Y mdashX1 D7 gt 0) will tend to complicate rather thanfacilitate the former effort (understanding the partial or causaleffects) This is not to suggest that quantities like E6Y mdashX7 andE6Y mdashX1 D7 may not themselves be interesting to estimate (egmodeling conditional mean health care expenditures for use inforecasting future expenditure levels) but whether E6Y mdashX1D7

is parametrically and parsimoniously better approximated byexp4XsbquoC D5 or by XsbquoC D or by some other g4Xsbquo CD5

or h4X1D1 ƒ5 ismdasheven in LDV settingsmdashultimately an empir-ical matter to be assessed by goodness of t (eg via condi-tional moment tests)

I note this important angle somewhat sheepishly sincein both articles of mine that Angrist cites (Mullahy 19971998) there is considerable emphasis on the use of exponen-tial CEF speci cations as enforcers of positive conditionalmean ldquorequirementsrdquo In fact the main motivation for theearlier article was to nd a general approach to obtainingconsistent estimators of structural parameters in parametricsettings less distribution-bound than tobit-class models whenthe key requirement or side constraint is E6Y mdashX1 D7 gt 0 withthe resulting strategy a more or less brute-force method foraccomplishing this in a nonlinear IV setting Angristrsquos arti-cle takes this idea quite a bit further and demonstrates how(and with the addition of a further normalization) the struc-tural parameter estimation approach I discussed can be devel-oped into a model to analyze causal effects when the latter arecharacterized as ldquoproportional treatment effectsrdquo

Whether a proportional as opposed to an additive treatmenteffect is interesting (it may or may not be) remains to be seenin any particular application Wooldridgersquos recent work onaverage causal effects (ACErsquos Wooldridge 1999) would seempertinent in contexts in which the role of the X covariates ismore ldquointrusiverdquo than in the simple additivelinear treatment-effect setup As a general matter the extent to which infer-ences about causal effectsmdashhowever formulatedmdashhinge onthe inclusion of particular covariates as conditioning regres-sors is an interesting and potentially important angle thatAngristrsquos article begins to explore and is one that likely tomerit additional research For the particular issues at handthe very nature of proportional treatment effects brings intofocus explicitly the role of the X rsquos so how their correlationwith the treatment indicators 4D5 in uences inferences aboutthe causal impacts of the latter would seem to be a rst-orderconsideration

2 TWO-PART MODELS

I found Angristrsquos analysis of the interpretation of causaleffects in two-part models to be extremely illuminatingIn a reduced-form setting I argued (Mullahy 1998) that amid

all the concerns about zeros robustness outliers transforma-tion retransformation and such that have typically attendedmodeling efforts involving two-part models it would be sur-prising in a well-structured empirical investigation if therewere not stated or lurking concerns about partial effectsiexclE6Y mdashX7=iexclX andor CEFrsquos E6Y mdashX7 themselves It may be insome cases that such concerns are not articulated and it mayeven be in less well-structured problems that they are not obvi-ous to analysts themselves Yet for instance unless robustlyestimated sbquorsquos from a log-linear part-2 speci cation of a two-part model could inform a rst-order question about a par-tial effect or a CEF of what practical use were they likelyto be My rather modest and rather obvious argument wasjust that the concerns about the iexclE6Y mdashX7=iexclX or the E6Y mdashX7mdashif they are the reason the analysis is being conducted in the rst placemdashshould enjoy rst-order prominence in the estima-tion exercise and that concerns about zeros outliers and suchshould be relegated in some sense to second-order status Insome cases two-part models will serve nicely to address such rst-order concernsmdashat least in reduced-form settingsmdashbut inothers they will not

Angrist appears to have some sympathy with such argu-ments but more importantly he provides a valuable serviceto users of two-part models by unearthing one major limi-tation in the analysis of causal effects The problem is notwith part 1 (ie the logit probit or linear probability com-ponent) because part 1 of the two-part model falls within themain lines of Angristrsquos LDV analysis Rather the complicationarises with part 2 of the two-part model in which the additionalldquoY gt 0rdquo conditioning arises In the standard (reduced-form)two-part model quantities like E6Y mdashX1Y gt 07 are prominentand identi ed readily But in the counterfactual settings inwhich causal effects are manifested wherein the realized Y

arises from self-selection into or out of treatment such selec-tion effects introduce an ambiguity [Eq (12)] and therebyconfound the ability to glean causal effects from part-2 esti-mates (unless as Angrist notes censored regression methodsare used but Angrist also offers some compelling argumentsagainst blind reliance on censored regression approaches) Assuch if estimation of and inference about causal effects arethe primary analytical concerns in data settings with Y para 0and Pr4Y D 05 gt 01 analysts may be well advised to avoidtwo-part modeling strategies and pursue some of the moredirect linear and nonlinear estimation approaches discussedby Angrist in which one-part estimation approachesmdashzerosand allmdashyield estimates of causal effects that will be directlyinterpretable

3 BEYOND CEFrsquoS

One noteworthy feature of the methods Angrist discussesmdashwith the main results attributed to Abadie (1999)mdashare theirapplicability to estimation of causal effects for characteristicsof the conditional distribution rdquo4Y mdashX1D5 beyond just CEFrsquosEstimation of causal or treatment effects for conditional quan-tiles (QTErsquos) and conditional distribution function ordinates(ldquodistilesrdquo) Pr4Y lt cmdashX1D5 is demonstrated to t properlywithin the weighting strategies advanced by Abadie Since

Todd Comment 25

quantile and distile analyses may be more relevant in address-ing particular analytical questions or policy issues than CEFrsquosthe ability to extend the mechanisms of causal-effect esti-mation strategies in such directions should be welcomed byapplied researchers

I might suggest that Angristrsquos claim about an advantage ofquantile causal analysis over distile causal analysis being thatthe latter relies on application-specic values of ordinates (c)is perhaps oversold or underdeveloped In many interestingand policy-relevant applications focus on the application-speci c ordinates is fundamental What factors cause employ-ers to offer more than one health insurance plan to employeesWhat factors cause consumers of alcoholic beverages to drinkmore than two drinks per day What factors cause elderlyindividuals to utilize more than one prescription drug prod-uct What factors cause second earners to work more than17 hours per week Each of these concerns an importantreal-world causal question for which the application-specicordinatemdashwhether for institutional legal physiological orother reasonsmdashis a fundamental ldquopivot pointrdquo for the analysisAnalysis of such distile relationshipsmdashwhether in causal or inreduced-form settingsmdashis a potentially powerful method forinforming decision makers and it seems to me that its mer-its relative to quantile analyses would have to be judged on acase-by-case basis

4 CONCLUSIONS

Practitioners of applied microeconometrics will nd thatthe time invested in a careful reading of Angristrsquos article hassizable returns not merely because of the analytical and con-ceptual insights it offers but ultimately also because it sug-gests a variety of important practical innovations that appliedresearchers can exploit empirically to more clearly understandthe nature of causal relationships in their data Applied microresearchers in areas like health labor development public nance and such work commonly with data wherein (a) out-come measures are limitedmdashoften most importantly becausethey are nonnegative with mass points at 0mdashand (b) the rolesof conditioning covariates (X rsquos) are most conveniently summa-rized via regression methods Angristrsquos article provides appliedresearchers so endowed with a set of powerful tools for elic-iting causal inferences from such data Of course the punchline is familiar and inevitable All bets are off without theo-retically solid instruments But with such instruments in handthe strategies exposited by Angrist offer analysts a rich set ofperspectives on estimation

ADDITIONAL REFERENCE

Wooldridge J M (1999) ldquoEstimating Average Partial Effects Under Con-ditional Moment Independence Assumptionsrdquo mimeo Michigan StateUniversity Dept of Economics

CommentPetra Todd

Department of Economics University of Pennsylvania Philadelphia PA 19104 (petraathonasasupennedu)

This article considers ways of estimating the effect ofbinary endogenous regressors in models with limited depen-dent variables It questions the usefulness of conventionalestimation strategies aimed at recovering structural modelparameters and advocates the use of simple instrumental vari-ables ( IV) estimators as an alternative on the grounds thatthese estimators invoke weaker assumptions and often suf ceto answer questions of interest in empirical studies In theauthorrsquos view the main challenge facing empirical researchersis the problem of identi cation of ldquotreatment effectsrdquo throughIV Here I expand the discussion by considering two ques-tions (1) What are the limitations of the causal model asa paradigm for policy analysis (2) When simple estimatorsare suitable for answering a question of interest what are thetrade-offs that need to be considered in using them

1 WHEN IS THE ldquoCAUSAL MODELldquoA USEFULPARADIGM FOR POLICY ANALYSIS

The article gives the impression that most interesting ques-tions in economics can be answered within the context ofthe ldquocausal modelrdquo [variously attributed to Neyman (1923)Fisher (1935) Cox (1958) Roy (1951) and Rubin (1978)]The causal model is a very general framework that assumesthat there are potential outcome states associated with having

received some treatment and having received no treatment (orpossibly a different treatment) For each individual only oneof the potential outcome states is observed which leads to amissing-data problem in attempting to draw inferences aboutaspects of the treatment effect distribution

The language of potential outcomes is very general soalmost any economic problem could be formulated in theseterms However this does not mean that estimators proposedin the literature for the causal model are useful for all or evenmost economic problems A major limitation of the modelis that it assumes that the state ldquowith treatmentrdquo has beenobserved for at least a subset of people In medical trials orbiological experiments this assumption is probably reason-able but in economics we often are interested in evaluatingeffects of treatments that have not yet been implemented Forexample we might be interested in predicting the effect ofraising the age receiving Social Security bene ts to a new ageor the effect of introducing new term limits on welfare par-ticipation An implicit assumption needed to apply any of theidenti cation results described in this article is that both Y1

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

26 Journal of Business amp Economic Statistics January 2001

and Y0 are observed which would not be satis ed in manycases Thus the causal model and the estimators developedfor it are inadequate when it comes to any economic questioninvolving a state of the world that has not been observed

Despite these limitations treatment effect estimates aresometimes used to predict consequences of new treatments orof existing treatments for new populations When the policychange being considered is very close to changes observedin the data and is to be applied to similar individuals thenextrapolation may be reasonable But if the policy change is(a) of a magnitude outside the range of experience (b) alonga new dimension or (c) intended for a new population then itis an open question as to how reliable the estimates would beTreatment impact estimates are not intended to capture any-thing invariant to changing conditions The assumptions thatwould be required to justify extrapolations or generalizationsare often quite strong To some extent it is an illusion thatthese estimators allow for general kinds of inference underweak assumptions because the assumptions required to justifytheir application for particular purposes are often not madeexplicit

Angrist argues that the identi cation results for the localaverage treatment effect (LATE) estimator provide a ldquofounda-tion for credible causal inferencerdquo (p 5) and a ldquominimum con-troversy jumping-off point for any prediction exerciserdquo (p 5)But as discussed previously the causal model maintains sucha high level of generality that it offers little guidance for manykinds of problems It is important to recognize when and whennot the LATE estimator is likely to be useful This articleclaims that ldquoin practice estimates of LATE differ little fromestimates based on the stronger assumptions invoked to iden-tify effects on the entire treated populationrdquo (p 5) But suchan extrapolation is not justi ed under the theory and is surelyunlikely to hold in all circumstances As shown in a num-ber of works by various authors LATE provides the averageeffect of treatment for the group of ldquocompliersrdquo who are peo-ple induced by the instrument to receive treatment This groupdoes not except under very special circumstances correspondto the group of treated persons and may in fact representonly a small fraction of persons receiving treatment So notonly does LATE not suf ce for answering problems of the sortdescribed by (a) (b) and (c) it also does not generally pro-vide the average effect of treatment on the treatedmdashthe mostcommon parameter of interest in analyzing social programsAnother not-so-attractive feature of the estimator is that thepopulation of ldquocompliersrdquo is usually not identi able so wecannot say exactly whose treatment impact is being estimatedApplying the LATE parameter estimate to the full populationof treated persons would generally require ruling out certaintypes of heterogeneity in individual responses to treatmentas discussed by Heckman (1997) The explicit assumptionsthat could justify this type of extrapolation were shown byHeckman and Vytlacil (2000 in press)

2 ESTIMATORS SHOULD BE TAILOREDTO THEIR APPLICATION

A general theme throughout the article is that wheneverpossible researchers should restrain from imposing paramet-ric restrictions in estimation because the structure they impose

may be wrong resulting in biased estimates This view isexpressed for example in the discussion of the two-partmodel The article advocates the use of a two-part approachinstead of a conventional selection-model approach maintain-ing that the two-part approach is less likely to lead to biasedparameter estimates because it does not impose cross-equationrestrictions as a conventional tobit estimator would

How do researchers decide whether or not to impose restric-tions across estimating equations that are structurally linkedThe most agnostic approach would be to estimate both equa-tions completely nonparametrically but this approach is rarelypractical Nonparametric estimators are consistent under themost general conditions but in conventional size samples thestandard errors would be so large that the estimates would forpractical purposes be useless Researchers impose structurebecause they are willing to restrict the class of models underwhich their estimators are consistent in exchange for greaterprecision These ef ciency considerations would also apply inthe deciding whether or not to impose cross-equation restric-tions in estimating simultaneous-equations models If we areright about the structure there is an ef ciency gain fromimposing the restrictions If we are wrong imposing them maylead to bias When there are overidentifying restrictions it isat least possible to develop tests of the model speci cation thatcan serve as a guide in choosing a structural model that is sup-ported by the data This article presents a dichotomy betweenthe two-part model and a fully parametric traditional selec-tion speci cation but there are many other modeling choicesranging from fully nonparametric to semiparametric to fullyparametric In modest size samples ef ciency is an impor-tant consideration and a parametric model may be the mostappropriate one For very large samples fewer parametricapproaches are feasible Both the probability-of-working andthe hours-worked equations could for example be estimatedby a semiparametric method such as semiparametric leastsquares ( Ichimura 1993) Which estimation method is mostappropriate ultimately depends on how much data is avail-able and how many parameters are to be estimated It doesnot make sense to criticize the use of structured approaches infavor of more exible modeling approaches without regard tothe context in which they are being applied

3 ON THE GAP BETWEEN THEORY ANDEMPIRICAL PRACTICE

Finally another recurring emphasis in the article is on thevalue of shortcuts in empirical practice In the authorrsquos viewthere is a gap between the econometric theory and what isfeasible in practice and shortcuts and approximations help tobridge this gap To this end the article extols for examplethe virtues of ordinary least squares (OLS) as a way of approx-imating any unknown conditional expectation function with-out much concern as to whether a best linear approximation isa good approximation If the model has discrete regressors andis fully saturated it is well known that OLS is equivalent to anonparametric estimator But in other cases a linear approxi-mation could be highly biased and the bias will not go awayas the sample size gets large There are a variety of alterna-tive asymptotically unbiased estimators Along similar lines

Angrist Reply 27

the article suggests that in implementing a ldquoselection-on-observablesrdquo estimator it may be all right simply to ignore theweighting scheme Such claims seem to me fairly outrageousbecause failure to take into account weighting could lead tovery different estimates and would not provide a consistentestimator of the parameter of interest Heckman Ichimura andTodd (1998) discussed alternative estimators that are not dif -cult to implement and take into account the weighting scheme

Finally the appendix presents a variety of shortcuts forobtaining estimates and standard errorsmdashsome of which leadto inconsistent estimates When shortcuts lead to very crudeapproximations or inconsistent estimates I question their use-fulness I do think it unreasonable to expect that estimatesbased on simple IV estimators be accompanied by correctstandard errors that correctly take into account the in uence ofestimated parameters of estimated regressors I see little valuein reporting incorrect standard-errors estimates on the groundsthat it saves time in computation especially when there areseveral works long available in the literature that develop sim-ple ways of solving these kinds of problems (eg see Newey1984) Although there is surely a gap between the estimatorsdeveloped in the theoretic econometric literature and the esti-mators used in the empirical literature the gap is not as greatas this article supposes

ADDITIONAL REFERENCES

Cox D R (1958) The Planning of Experiments New York WileyFisher R A (1935) Design of Experiments London Oliver and BoydHeckman J J (1997) ldquo Instrumental Variables Evidence From Evaluating a

Job Training Programmerdquo Journal of Human Resources 32 441ndash462Heckman J Ichimura H and Todd P (1997) ldquoMatching as an Econometric

Evaluation Estimator Theory and Evidence on its Performance Appliedto the JTPA Program Part I Theory and Methodsrdquo Review of EconomicStudies 64 605ndash654

mdashmdashmdash (1998) ldquoMatching as an Econometric Evaluationrdquo Review of Eco-nomic Studies 65 261ndash294

Heckman J and Vytlacil E (2000) ldquoCausal Parameters Structural Equa-tions Treatment Effects and Randomized Evaluations of Social Programsrdquounpublished manuscript University of Chicago Dept of Economics

mdashmdashmdash (in press) ldquoLocal Instrumental Variablesrdquo in Nonlinear StatisticalInference Essays in Honor of Takeshi Amemiya eds C Hsiao KMorimune and J Powell Cambridge UK Cambridge University Press

Ichimura H (1993) ldquoSemiparametric Least Squares (SLS) and Weighted SLSEstimation of Single Index Modelsrdquo Journal of Econometrics 58 71ndash120

Neyman J (1923) ldquoStatistical Problems in Agricultural Experimentsrdquo Sup-plement to the Journal of the Royal Statistical Society 2 107ndash180

Newey W (1984) ldquoA Method of Moments Interpretation of Sequential Esti-matorsrdquo Economic Letters 14 201ndash206

Roy A (1951) ldquoSome Thoughts on the Distribution of Earningsrdquo OxfordEconomic Papers 3 135ndash146

Rubin D (1978) ldquoBaysian Inference for Causal Effects The Role of Ran-domizationrdquo The Annals of Statistics 6 34ndash58

ReplyJoshua D Angrist

I thank the discussants and the session organizer outgoingJBES editor Jeff Wooldridge who will be missed by associateeditors and JBES authors alike The comments in this sessionoffer a variety of viewpoints some sympathetic others morecritical I am pleased to be part of this stimulating and con-structive exchange and look forward to more interaction ofthis type in the future

Hahn My article starts by suggesting that a focus ontreatment effects simpli es the problem of causal inferencein limited dependent variable (LDV) models with endoge-nous regressors Endogeneityrsquos half-sister is the assumptionof unobserved individual effects since the omitted-variablesmotivation for xed effects in panel data is similar Hahnasks whether a shift in focus from index parameters to aver-age causal effects might also simplify the notoriously dif -cult problem of working with nonlinear models for binarypanel data (eg as in Card and Sullivan 1988) Hahn showsthat here too the causal-effects framework pays off demon-strating that average causal effects are easily estimated in a xed-effects probit model with a binary regressor even thoughthe probit coef cient is not A second noteworthy featureof Hahnrsquos comment is his observation that identi cation inthis framework turns on careful modeling of the relationshipbetween the assignment variable and the xed effects Tradi-tional econometric models are primarily concerned with thestochastic process generating outcomes A shift of attentiontoward modeling the assignment mechanism is consistent with

the notion that observational studies should try to mimic thesort of controlled comparisons generated by experiments

Imbens Imbens begins by endorsing the view that the sub-stantive goals of empirical research are better served by thepotential outcomescausal framework than by structural mod-eling Going one step further Imbens notes that the problemswith ldquomodel- rst econometricsrdquo are not limited to LDVrsquos Inthe same spirit he sensibly wonders (as did Wooldridge 1992)why so much attention has been devoted to estimating theBoxndashCox transformation model which does not actually iden-tify E6Y mdashX7 without a distributional assumption Imbens alsonicely articulates the rationale for looking at the twins exam-ple and notes that this is similar to the rationale for extrap-olating from clinical trials in narrow subpopulations And infact I believe the ldquotwins experimentrdquo is likely to have pre-dictive value in a variety of situations though any extrapo-lation will probably be improved by control for demographiccharacteristics

In spite of (or perhaps because of) our years of discussionand collaboration Imbens and I still do not agree on certainthings He feels strongly that inference with discrete outcomesshould use models and procedures that respect inherent non-

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

28 Journal of Business amp Economic Statistics January 2001

linearities The HIRZ procedure discussed in his comment isan intriguing way to do this and my article discusses othersI remain unconvinced however that nonlinearity is of majorsubstantive importance in empirical work with binary regres-sors The proof of the pudding I suppose would be to showthat H IRZ or some of the methods I discuss in the article leadto better predictions or decisions than the linear workhorse

Mof tt Mof t provides a valuable counterpointHe beginsby arguing that latent-index models have more predictivepower than reduced-form causal models Such claims seemhard to establish a priori though I do not doubt that in somecases this may be true But empirical strategies in which thebulk of the research effort goes into estimating index coef -cients may fail to answer the most basic question in empiricalwork What happened here Causal models and local aver-age treatment effect (LATE) answer this question while indexcoef cients alone are not suf cient for causal inference Iagree however that structural modeling can be useful forgoing beyond basic questions and I have used index and otherstructural models for this purpose Mof t is right howeverin suggesting that I nd the model-free analysis he refers toas ldquoexploratoryrdquo more interesting than structural extrapolationThis seems to me to be where the potential for real discoverylies

Mof tt asks tough but fair questions about the twins appli-cation He is correct to point out that twinning is not a manip-ulable policy instrument That is one reason I do not limit theanalysis to reduced-form effects of multiple births Yet to mymind the best evidence we have on the controversial questionof the consequences of out-of-wedlock childbearing comesfrom the twins experiment (Bronars and Grogger 1994) Andthe economic consequences of childbearing are of interest inmany other contexts As noted in the introduction to my arti-cle the twins experiment can help determine whether fertilityreductions are responsible for trend increases in female laborsupply Beyond this sort of question the twins results seemuseful for assessing the economic consequences of increasedabortion access in the United States the dissemination of con-traceptive implants in developing countries (eg see KaneFarr and Janowitz 1990) and various programs to encour-age or require implant use by welfare mothers (Haveman andWolfe 1993)

Finally it is fair to question the external validity of twinsestimates as Mof t does toward the end of his commentAngrist and Evans (1998) showed that the labor-supply con-sequences of twinning can be reconciled with estimates ofthe labor-supply consequences of childbearing resulting froma very different natural experiment induced by preferences formixed-sex sibling pairs This suggests that results from bothexperiments may be quite robust and may even be ldquostructuralrdquoin the sense that they tell us something general about the eco-nomic response to childbearing

Mullahy Mullahy brings a welcome health economistrsquosperspective to the discussion Health economists have longgrappled with the issues discussed in my article and asI hope I made clear I drew ideas and inspiration fromMullahyrsquos work on models with nonnegative dependent vari-ables Mullahyrsquos comment further highlights exactly the sort

of utilitarian trade-off that underlies the choice of empiricalstrategy On one hand we would like our models to be con-sistent with any functional form restrictions known to be trueOn the other attempts at this may complicate estimation andinference for little payoff Worst of all the technical complex-ity of nonlinear models seems to cause authors to ldquowax econo-metricrdquo an effort that may come at the expense of attention tosubstantive issues of importance (an example here is the longseries of papers discussing exactly how data from the RANDhealth-insurance experiment should be analyzed) Mullahyrsquosheretical discussions of ldquoone-part modelsrdquo here and in earlierwork represent a refreshing effort to keep the empirical trainon track Mullahyrsquos point that application-specic distributionordinates can provide more useful summary information thanquantiles is also clearly worth thinking about

Todd Todd provides a useful and clear overview of thecausal framework which she suggests is better suited formedical questions than economic questions The thrust of herargument is that there is too much heterogeneity in economicinterventions and outcomes for the causal approach to bevery useful for economists But my perception is that medicalresearch is even more explicitly concerned with heterogene-ity than economics This is evidenced by the many quali ca-tions regarding patient characteristics conditions and inter-actions in medical treatment protocols A difference betweenthe prevailing medical research paradigm and the traditionaleconometric perspective however is whether the question ofheterogeneous response is to be addressed by better theoryor more evidence Of course theory helps us decide whereto look for heterogeneity but as Rosenbaum (1999 p 301)put it ldquo If a treatment has substantially different effects in dif-ferent types of people then we need to discover this (ital-ics mine) Rosenbaum also noted that ldquorandomized clinicaltrials routinely look for heterogeneous treatment effects andthey do this without representative samples from nationalpopulationsrdquo

On the technical side Todd suggests I advocated use ofthe two-part model because it exibly approximates nonlinearfunctional forms for nonnegative outcomes Although I notedthe exibility the upshot of my discussion was that the two-part model is poorly suited to causal inference because of thedif culty of interpreting part 2 Finally Todd points out thefailure to allow for sampling variance in the scaling factorsused to generate effects for nonlinear models This is indeed ahard-to-defend shortcut though I shall take a stab at defenseby noting that randomness of the scaling factor which typi-cally estimates a sample average is likely to be a minor sourceof sampling uncertainty

ADDITIONAL REFERENCES

Card D E and Sullivan D (1988) ldquoEffects of Subsidized Training onMovements In and Out of Employmentrdquo Econometrica 56 497ndash530

Haveman R and Wolfe B (1993) ldquoChildrenrsquos Prospects and ChildrenrsquosPolicyrdquo Journal of Economic Perspectives 7 153ndash174

Kane T Farr G and Janowitz B (1990) ldquo Initial Acceptability of Con-traceptive Implants in Four Developing Countriesrdquo International FamilyPlanning Perspectives 16 49ndash54

Rosenbaum P R (1999) ldquoChoice as an Alternative to Control in Observa-tional Studiesrdquo Statistical Science 14 259ndash304

Wooldridge J (1992) ldquoSome Alternatives to the BoxndashCox RegressionModelrdquo International Economic Review 33 935ndash955

Page 9: New Estimation of Limited Dependent Variable Models With Dummy …piketty.pse.ens.fr/…ecoineg/articl/Angrist2001b.pdf · 2011. 12. 5. · sequences of changes in contraceptive technology

10 Journal of Business amp Economic Statistics January 2001

For this reason Abadie (2000b) called E6YimdashXi1 Di1D1i gt D0i7

the Complier Causal Response Function (CCRF)Now consider choosing parameters b and a to minimize

E64E6YimdashXi1Di1 D1i gt D0i7ƒX 0

i b ƒ aDi52mdashD1i gt D07 or equiv-

alently E64YiƒX 0

ib ƒ aDi52mdashD1i gt D07 This choice of b and a

provides the minimum mean squared error (MMSE) approx-imation to the CCRF Since the set of compliers is not iden-ti ed this minimization problem is not feasible as writtenHowever it can be shown that

E6Ši4E6YimdashXi1 Di1D1i gt D0i7 ƒ X 0

ib ƒ aDi527=P6D1i gt D0i7

D E64E6YimdashXi1Di1 D1i gt D0i7ƒ X 0

ib ƒ aDi52mdashD1i gt D071

where ŠiD 1 ƒ Di41 ƒ Zi5=41 ƒ E6Zi

mdashXi75 ƒ 41 ƒ Di5 šZi=E6Zi

mdashXi7 Since Ši can be estimated the MMSE linearapproximation to the CCRF can also be estimated The resultis a weighted least squares estimation problem with weightsgiven by the estimated Ši

It is also worth noting that although the preceding discus-sion focuses on linear approximation of the CCRF any func-tion can be used for the approximation For binary outcomesfor example we might use ecirc6X 0

ib C aDi7 and choose param-eters to minimize E6Ši4Yi

ƒ ecirc6X 0ib ƒ aDi75

27 Similarly fornonnegative outcomes it seems sensible to use an exponentialmodel exp6X 0

ib C aDi7 and choose parameters to minimizeE6Ši4Yi

ƒexp6Xi0b ƒaDi75

27 Abadiersquos framework allows ex-ible approximation of the CCRF using any functional formthe researcher nds appealing and convenient The resultingestimates have a robust causal interpretation regardless of theshape of the actual CEF for potential outcomes

43 Distribution and Quantile Treatment Effects

If Yi has a mass point at 0 the conditional mean pro-vides an incomplete picture of the causal impact of Di on Yi We might like to know for example how much of the aver-age effect is due to changes in participation and how muchinvolves changes elsewhere in the distribution This sometimesmotivates separate analyses of participation and conditional-on-positive effects In Section 23 I argued that questionsregarding the effect of treatment on the distribution of out-comes are better addressed by comparing distributions Simplycomparing distributions is ne for the analysis of experimentaldata but what if covariates are involved As with the analy-sis of mean outcomes the simplest strategy is 2SLS in thiscase using linear probability models for distribution ordinates16Yi micro c7 D Xi

0sbquocC cDi

C ˜ci Of course the linear model isnot literally correct for conditional distribution except in spe-cial cases (eg a saturated regression parameterization)

Here too the Abadie (2000b) weighting scheme can be usedto generate estimates that provide an MMSE error approxima-tion to the underlying distribution function (see Imbens andRubin 1997 for a related approach to this problem) The esti-mator in this case chooses bc and ac to minimize the sampleanalog of the following population minimand

E6Ši416Yi micro c7 ƒ X 0ibc

ƒ acDi5270 (26)

The resulting estimates provide the BLP for P6Yi microcmdashXi1Di1D1i gt D0i7 The latter quantity has a causal interpre-tation because

P6Yi micro cmdashXi1DiD 11D1i gt D0i7

ƒ P6Yi micro cmdashXi1 DiD 01 D1i gt D0i7

D P6Y1i micro cmdashXi1D1i gt D0i7

ƒ P6Y0i micro cmdashXi1D1i gt D0i70

Since the outcome is binary nonlinear models such asprobit or logit might also be used to approximate P6Yi microcmdashXi1Di1D1i gt D0i7 Likewise it is equally straightforward touse Abadiersquos weighting scheme to approximate the probabilitythat the outcome falls into an interval instead of the cumula-tive distribution function

An alternative to estimation based on (26) postulates a linearmodel for quantiles instead of distribution ordinates Conven-tional quantile regression (QR) models for exogenous regres-sors begin with a linear speci cation Qˆ6Yi

mdashXi1Di7 D X 0iŒˆ0

CŒˆ1Di The parameters 4Œˆ01Œˆ15 can be shown to minimizeE6ˆ4Yi

ƒ X 0im0 ƒ m1Di057 where ˆ4ui5 D ˆuC

iC 41ƒˆ5uƒ

i iscalled the ldquocheck functionrdquo (see Koenker and Basset 1978)This minimization is computationally straightforward since itcan be written as a linear programming problem

The analysis of quantiles has two advantages Firstquantiles like the median quartiles and deciles providebenchmarks that can be used to summarize and compareconditional distributions for different outcomes In contrastthe choice of c for the analysis of distribution ordinatesis application-specic Second since nonnegative LDVrsquos areoften (virtually) continuously distributed away from any masspoints linear models are likely to be more accurate for condi-tional quantiles than for conditional probabilities [For quan-tiles close to the censoring point Powellrsquos (1986b) censoredQR model may be more appropriate]

Abadie Angrist and Imbens (1998) developed a QRestimator for models with a binary endogeneous regres-sor Their quantile treatment effects (QTE) procedure beginswith a linear model for conditional quantiles for compliersQˆ6Yi

mdashXi1Di1 D1i gt D0i7 D X 0isbquoˆ

CˆDi The coef cient ˆ hasa causal interpretation because Di is independent of potentialoutcomes conditional on Xi and the event D1i gt D0i There-fore ˆ

D Qˆ6Y1imdashXi1 D1i gt D0i7 ƒ Qˆ6Y0i

mdashXi1 D1i gt D0i7 Inother words ˆ is the difference in ˆ quantiles for compliers

QTE parameters are estimated by minimizing a sampleanalog of the following weighted check-function minimandE6Šiˆ4Yi

ƒ X 0ib ƒ aDi57 As with the Causal-IV estimators

weighting by Ši transforms the conventional QR minimandinto a problem for compliers only For computational rea-sons however it is useful to rewrite this as E6 QŠiˆ4Yi

ƒX 0ib ƒ

aDi057 where QŠiD E6Ši

mdashXi1 Di1 Yi7 It is possible to show thatE6Ši

mdashXi1Di1 Yi7 D P6D1i gt D0imdashXi1Di1 Yi7 gt 0 This modi ed

estimation problem has a linear programming representationsimilar to conventional quantile regression since the weightsare positive Thus QTE estimates can be computed usingexisting QR software though this approach requires rst-step

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 11

estimation of QŠi In the following example I use the fact that

QŠiD E6Ši

mdashXi1 Di1 Yi7 D 1 ƒ Di41ƒ E6ZimdashYi1Di1Xi75

41ƒ E6ZimdashXi75

ƒ 41 ƒ Di5E6ZimdashYi Di1Xi7

E6ZimdashXi7

(27)

and estimate E6ZimdashYi1Di1Xi7 and E6Zi

mdashXi7 with a probit rststep Since QŠi is theoretically supposed to be positive negativeestimates of QŠi are set to 0

5 APPLICATION LABORndashSUPPLYCONSEQUENCES OF A THIRD CHILD

The estimation uses a sample of roughly 250000 marriedwomen aged 21ndash35 with at least two children drawn from the1980 Census 5 le About 53 of the women in this sampleworked in 1979 Overall (ie including zeros) women in thesample worked about 17 hours per week This can be seen inthe rst column of Table 1 which reports descriptive statis-tics and repeats some of the OLS and 2SLS estimates fromAngrist and Evans (1998) Roughly 38 of women in thissample had a third child an event indicated by the variableMorekids The OLS estimates in column (2) show that womenwith Morekids D 1 were about 17 percentage points less likelyto have worked in 1979 and worked about 6 hours fewer perweek than women with Morekids D 0 The covariates in thisregression are age age at rst birth a dummy for male rst-born a dummy for male secondborn and Black Hispanic andother race indicators

Table 1 also reports estimates of average effects computedusing nonlinear models still treating Morekids as exogenousThese average effects are approximations to effects of treat-ment on the treated evaluated using derivatives to simplifycomputations A detailed description of the average-effectscalculations appears in the appendix Probit estimates of theaverage impact on employment shown in column (3) arealmost identical to the OLS estimates Similarly the tobit esti-mate of the average effect of Di on hours worked shown incolumn (4) is ƒ6001 remarkably close to the OLS estimateof ƒ6002 (Note however that the tobit coefcient is ƒ1107)Column (5) of Table 1 reports estimated average effects froma two-part model in which both parts of the model are linear

Table 1 Descriptive Statistics and Baseline Results

Morekids exogenous Morekids endogenous

Nonlinear models Reduced forms2SLS

Dependent Mean OLS Probit Tobit 2PM (Morekids) (Depvar) effectvariable (1) (2) (3) (4) (5) (6) (7) (8)

Employment 0528 ƒ0167 ƒ0166 mdash mdash 0627 ƒ0055 ƒ0088(0499) (0002) (0002) (0003) (0011) (0017)

Hours worked 1607 ƒ6002 mdash ƒ6001 ƒ5097 0627 ƒ2023 ƒ3055(1803) (0074) (0073) (0073) (0003) (0371) (0592)

NOTE The sample includes 254654 observations and is the same as that of Angrist and Evans (1998) The instrument is an indicator for multiple births The mean of the endogenousregressor is 381 The probability of a multiple birth is 008 The model includes as covariates age age at rst birth boy rst boy second and race indicators Standard deviations are shownin parentheses in column 1 Standard errors are shown in parentheses in other columns

The 2PM estimate is virtually identical to the tobit and OLSestimates

Roughly 810 of 1 of women in the extract had a twinsecond birth (multiple births are identi ed in the 1980 Cen-sus using age and quarter of birth) Reduced-form estimates ofthe effect of a twin birth are reported in columns (6) and (7)The reduced forms show that women who had a multiple birthwere 63 percentage points more likely to have had a third childthan women who had a singleton second birth Mothers oftwins were also 55 percentage points less likely to be work-ing (standard error D 001) and worked 22 fewer hours perweek (standard error D 037) The 2SLS estimates derived fromthese reduced forms reported in column 8 show an impactof about ƒ009 (standard error D 002) on employment rates andƒ306 (standard error D 06) on weekly hours These estimatesare just over half as large as the OLS estimates suggestingthat the latter exaggerate the causal effects of childbearingOf course the twins instrument is not perfect and the 2SLSestimates may also be biased For example twinning probabil-ities are slightly higher for certain demographic groups ButAngrist and Evans (1998) found that 2SLS estimates usingtwins instruments are largely insensitive to the inclusion ofcontrols for mothersrsquo personal characteristics

Two variations on linear models generate estimates iden-tical or almost identical to the conventional 2SLS estimatesThis can be seen in columns (2) and (3) of Table 2 whichreport 2SLS estimates of a 2PM for hours worked and Abadie(2000b) Causal-IV estimates of linear models for employmentand hours The Causal-IV estimates use probit to estimateE[Z|X] and plug this into the formula for Ši The second stepin Causal-IV estimation is a weighted least squares problemwith some negative weights Since use of negative weights isnonstandard for statistical packages (eg Stata does not cur-rently allow this) I used a MATLAB program available fromAlberto Abadie to compute the estimates The 2PM estimatesin Table 3 were constructed from 2SLS estimates of a linearprobability model for participation and 2SLS estimates of alinear model for hours worked conditional on working In prin-ciple the 2PM estimates do not have a causal interpretationbecause the instruments are not valid conditional on workingIn practice however 2SLS estimates of the 2PM differ littlefrom conventional 2SLS estimates

The estimates of nonlinear=nonstructural models for meaneffects are mostly similar to each other and to conventional

12 Journal of Business amp Economic Statistics January 2001

Table 2 Impact on Mean Outcomes (Morekids endogenous)

Linear models Nonlinear models Structural models

Causal-IV Causal-IV Causal-IV Bivar Endog Mills 2SLSDependent 2SLS 2PM linear Mullahy probit expon probit tobit ratio benchmarkvariable (1) (2) (3) (4) (5) (6) (7) (8) (9) (10)

A With covariates

Employment ƒ0088 mdash ƒ0089 mdash ƒ0088 mdash ƒ0124 mdash mdash ƒ0089(0017) (0017) (0016) (0016) (0017)

Hours worked ƒ3055 ƒ3054 ƒ3055 ƒ3082 mdash ƒ3021 mdash ƒ3081 ƒ4051 ƒ3060(0592) (0598) (0592) (0598) (0694) (0580) (0549) (0599)

B No covariates

Employment ƒ0084 mdash ƒ0084 mdash ƒ0084 mdash ƒ0086 mdash mdash ƒ0084(0017) (0017) (0017) (0017) (0018)

Hours worked ƒ3047 ƒ3037 ƒ3047 ƒ3010 mdash ƒ3012 mdash ƒ3035 ƒ3048 ƒ3052(0617) (0614) (0617) (0561) (0616) (0642) (0641) (0624)

NOTE Sample and covariates are the same as in Table 1 Results for nonlinear models are derivative-based approximations to effect on the treated Causal-IV estimates are based on aprocedure discussed by Abadie (2000b) Standard errors are shown in parentheses

2SLS estimates Results from nonlinear models are againreported as marginal effects that approximate average effectson the treated For example a causal probit model for employ-ment status generates an average effect of ƒ0088 identical (upto the reported accuracy) to the 2SLS estimate Causal-IV esti-mation of an exponential model for hours worked the resultof a procedure that minimizes E6 QŠi4Yi

ƒ exp6X0

i b ƒ aDi7527

generates an estimate of ƒ3021 This too differs little from theconventional 2SLS estimate of ƒ3055 Similarly the Mullahyestimate of ƒ3082 in column (4) is less than 8 larger thanconventional 2SLS in absolute value It is noteworthy how-ever that the Mullahy model generates results that changemarkedly (falling by about 20) when the covariates aredropped Since the covariates are not highly correlated withthe twins instrument this lack of robustness to the inclusionof covariates seems undesirable On the other hand with-

Table 3 Impact on the Distribution of Hours Worked

Distribution treatment effects

Exogenous Morekids Endogenous Morekids Quantile treatment effects

OLS Ordered Causal CausalRange (LPM) Probit probit 2SLS LPM probit Quantile QR QTE(mean) (1) (2) (3) (4) (5) (6) (value) (7) (8)

0 0167 0166 0147 0088 0089 0088 5 ƒ8092 ƒ5024(472) (0002) (0002) (0002) (0017) (0017) (0016) (8) (0186) (0686)1ndash10 0001 0001 0002 ƒ0001 ƒ0001 ƒ0002 6 ƒ1207 ƒ7098(046) (0001) (0001) (000003) (0007) (0007) (0006) (20) (0172) (0860)11ndash20 ƒ0015 ƒ0014 ƒ0011 0002 0002 0002 7 ƒ9054 ƒ6019(093) (0001) (0001) (00001) (0010) (0010) (0010) (35) (0184) (1007)21ndash30 ƒ0024 ƒ0022 ƒ0014 ƒ0004 ƒ0005 ƒ0006 75 ƒ6045 ƒ3060(075) (0001) (0001) (00002) (0009) (0009) (0010) (40) (0156) (1017)31ndash40 ƒ0119 ƒ0110 ƒ0097 ƒ0072 ƒ0072 ƒ0071 8 ƒ1000 0000(277) (0002) (0002) (0001) (0014) (0014) (0016) (40) (0286) (1009)41+ ƒ0009 ƒ0008 ƒ0023 ƒ0009 ƒ0009 ƒ0006 9 mdash mdash

(027) (0001) (0001) (00003) (0005) (0005) (0007) (40)

NOTE The table reports probability-model and (QTE) estimates of the impact of childbearing on the distribution of hours worked The sample and covariates are the same as in Panel A ofTable 2 Causal-IV estimates are based on a procedure discussed by Abadie (1999) Quantile treatments are based on a procedure discussed by Abadie et al (1998) Standard errors areshown in parentheses The standard errors in columns (7) and (8) are bootstrapped

out covariates the Mullahy estimates are close to those fromCausal-IV with an exponential model

The bivariate probit estimate of the effect of childbear-ing on employment status reported in column 7 is ƒ012roughly a third larger in absolute value than the conventional2SLS estimate Interestingly in another application Abadie(2000b) also found that bivariate probit estimates are largerthan Causal-IV The gap between bivariate probit and Causal-IV estimates of effects on labor supply appears to be a conse-quence of the probit model for exogenous covariates Withoutcovariates bivariate probit generates estimates that are veryclose to the results from the other estimators It should benoted however that bivariate probit is not really appropriatefor twins instruments because the probability Morekids D 1 isequal to 1 for twins The probit ML estimator does not existin this case Therefore to compute all of the estimates using a

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 13

probit rst-stage (bivariate probit endogenous tobit and Millsratio) I randomly recoded 1 of Di observations to 0 Inprinciple measurement error in a binary endogenous regressorbiases 2SLS estimates (see Kane Rouse and Staiger 1999)In this case however column (10) shows that 2SLS estimateswith the randomly recoded data differ little from 2SLS esti-mates using the original data

Column (8) of Table 2 reports estimates of a structuraltobit model with endogeneous regressors These estimateswere computed using a two-step estimator that approximatesthe MLE again with recoded data The two-step procedureadds a Mills-ratio type endogeneity correction to a censoredregression and then applies tobit to the censored regres-sion model with the correction term [The correction term islsquo˜6Di4ƒ i=ecirci5C 41ƒDi5 i=41ƒecirci57 where is the corre-lation between the latent error determining treatment assign-ment and the outcome residual lsquo˜ is the standard deviation ofthe outcome residual and i and ecirci are Normal density anddistribution functions evaluated at the probit rst-stage ttedvalues see Heckman and Robb (1985)]

As with the bivariate probit estimate the endogenous tobitestimate is somewhat larger in magnitude than conventional2SLS This may be because of the Mills-ratio procedure forcontrolling for the endogeneity of Di more than the tobitcorrection for nonnegative outcomes To see this note thatthe Mills-ratio estimate ignoring censoring reported in col-umn (9) is considerably larger than the corresponding conven-tional 2SLS estimates Interestingly both the probit and tobitstructural estimators generate results that are more sensitiveto the inclusion of covariates than any of the other estimatorsexcept Mullahyrsquos In fact Panel B of the table shows thatwithout covariates all estimation techniques give very simi-lar results This is not surprising since without covariates theparametric assumptions used in these models are weaker

The last set of results shows that childbearing is associ-ated with marked changes in the distribution of hours workedbeyond the changes in participation already seen This isapparent in the rst columns of Table 3 which report thedistribution of hours worked by interval along with linearprobability estimates of the relationship between childbearingand the probability of falling into each interval (these mod-els include the same covariates used for Table 2) The largestentry in column (1) is for the probability of working zerohours There is also a large negative effect on the probabilityof working 31ndash40 hours per week which shows that womenwho have a third child are much less likely to work full-time Once again probit average effects reported in column2 are almost indistinguishable from the corresponding OLSestimates Estimates from an ordered probit model reportedin column (3) differ from OLS somewhat more but still gen-erate a very similar pattern

Like the 2SLS estimates for average outcomes 2SLS esti-mates of linear probability models for the probability of fallinginto each interval show that models that treat childbearing asexogenous exaggerate the negative impact on labor supply2SLS estimates for the probability of working zero hours areidentical (by construction) to those for employment in Table 1The 2SLS estimates of the impact of childbearing on fulltimework are also considerably less than the corresponding OLSestimates

Nonstructural Causal-IV models treating Morekids asendogenous generate estimates very close to 2SLS estimatesfor effects on the probabilities of hours falling into each inter-val Columns (5) and (6) show that the results are also remark-ably insensitive to whether a linear or probit model is usedto approximate the distribution function The estimates againindicate that childbearing changes the distribution of hours byraising the probability of nonparticipation and by reducing theprobability of fulltime work

Finally the QTE estimator provides useful summary statis-tics for causal effects of childbearing on changes in the distri-bution of hours worked These estimates were computed as thesolution to a weighted quantile regression problem using (27)to construct weights and the reported standard errors are froma bootstrap Quantile regression estimates treating childbear-ing as exogenous show an estimated 9-hour decline in medianhours worked but the QTE estimator suggests that the causaleffect of childbearing on median hours worked is only about ve hours Estimates at higher quantiles are similarly reducedwhen childbearing is treated as endogenous

6 SUMMARY AND CONCLUSIONS

Structural parameters may be of theoretical interest but mustultimately be converted into causal effects if they are to beof use for policy evaluation or determining whether a trendassociation is causal The problem of estimating causal effectsfor LDVrsquos does not differ fundamentally from the analogousproblem for continuously distributed outcomes The key dif-ferences seem to me to be the increased likelihood of inter-est in distributional outcomes and the inherent nonlinearity ofCEFrsquos for LDVrsquos in models with covariates Without covari-ates conventional 2SLS estimates capture both distributionaleffects and effects on means Simple IV strategies devel-oped by Mullahy (1997) and Abadie (2000b) can be used toestimate average effects in nonlinear models with covariateswhile IV strategies for probability models and quantile regres-sion can be used to estimate effects on distributions

These approaches are illustrated here using twin birthsto estimate the labor-supply consequences of childbearingAlternative nonstructural approaches to the estimation ofcausal effects using twins instruments generate similar aver-age effects whether or not the model is nonlinear Structuralestimates tend to be somewhat larger than nonstructural esti-mates when exogenous covariates are included even thoughthe covariates are not strongly related to the twins instru-ment Since the structural models impose additional distribu-tional and functional form assumptions I see no reason toprefer them

Finally the various IV estimates of the effect of childbear-ing on the distribution of hours worked show that the impactof childbearing is characterized by substantially increased non-participation and by an almost equally large shift away fromfulltime work Estimates that treat childbearing as exogenousexaggerate the causal effect of childbearing on changes in dis-tribution as well as on average hours worked These ndingsappear in results from both probability models and quantilemodels with endogenous regressors Both types of modelsprovide an interesting look at the impact of childbearing on

14 Journal of Business amp Economic Statistics January 2001

the distribution of hours worked without the conceptual prob-lems inherent in conditional-on-positive comparisons

ACKNOWLEDGMENTS

I thank the editor and the session discussants for theircomments Special thanks go to Melissa Schettini for out-standing research assistance and Alberto Abadie for exten-sive comments and shared computer programs Thanks also goto Daron Acemoglu Bill Evans Bill Greene Guido ImbensJohn Mullahy Steve Pischke and seminar participants at theUniversity of Texas for helpful discussions and comments onan earlier draft Data and computer programs used here areavailable on request

APPENDIX

A1 Derivation of Equation (12)

Drop the i subscripts Note that

Y0 D 14Y uuml0 gt 05Y uuml

0

Y1D 14Y uuml

0C gt 054Y uuml

0C 5 D 14Y uuml

1 gt 05Y uuml1 0

Since D is independent of Y0 and Y1D 14Y uuml

0C gt 054Y uuml

0C

51D is independent of Y1 Conditional effects on the treatedare therefore the same as conditional effects without condi-tioning on treatment status

E6Y1ƒ Y0

mdashY1 gt 01 D D 17 D E6Y1ƒ Y0

mdashY1 gt 07

E6Y1ƒ Y0

mdashY0 gt 01 D D 17 D E6Y1ƒ Y0

mdashY0 gt 070

The averages on the right side are the causal effects of interestThe conditional expectations on the right side are evaluated asfollows

E6Y1mdashY1 gt 07 D E6Y uuml

1mdashY uuml

1 gt 07 D E6Y uuml0mdashY uuml

1 gt 07C 1 (A1)

E6Y0mdashY1 gt 07 D E6Y uuml0 14Y uuml

0 gt 05mdashY uuml1 gt 07

D E6Y uuml0

mdashY uuml1 gt 01 Y uuml

0 gt 07P4Y uuml0 gt 0mdashY uuml

1 gt 050

If 4 gt 05 2 Y uuml0 gt 0 ) Y uuml

1 gt 01 this is E6Y uuml0

mdashY uuml0 gt 07

P4Y uuml0 gt 0mdashY uuml

1 gt 050 If 4 lt 05 2 Y uuml1 gt 0 ) Y uuml

0 gt 01 so this is

E6Y uuml0

mdashY uuml1 gt 070 (A2)

So lt 0 ) E6Y1 ƒ Y0mdashY1 gt 07 D Similarly

E6Y1mdashY0 gt 07 D E614Y uuml

0C gt 054Y uuml

0C 5mdashY uuml

0 gt 07

D E6Y uuml0

mdashY uuml1 gt 01 Y uuml

0 gt 07P4Y uuml1 gt 0mdashY uuml

0 gt 05

C P Y uuml1 gt 0mdashY uuml

0 gt 0 0

If 4 gt 051 Y uuml0 gt 0 ) Y uuml

1 gt 01 so

P4Y uuml1 gt 0mdashY uuml

0 gt 05 D 1 (A3)

and E6Y uuml0

mdashY uuml1 gt 01 Y uuml

0 gt 07 D E6Y uuml0mdashY uuml

0 gt 070 Finally

E6Y0mdashY0 gt 07 D E6Y uuml0mdashY uuml

0 gt 07 (A4)

so gt 0 ) E6Y1ƒ Y0

mdashY0 gt 07 D

A2 Derivation of Equation (21)

Drop the i subscripts Note that eƒD D 641 ƒ D5 C Deƒ7Let sbquo uuml D eƒsbquo and uuml D eƒ Z is binary and there are nocovariates so we can now write (20) as

sbquo uuml E641ƒ D5Y mdashZ D 17C sbquo uuml uuml E6DY mdashZ D 17 ƒ 1 D 0 (A5)

and

sbquo uuml E641ƒ D5Y mdashZ D 07C sbquo uuml uuml E6DY mdashZ D 07 ƒ 1 D 00 (A6)

Divided (A5) by (A6) to get rid of sbquo uuml then solve for uuml Subtract 1 and rearrange to get Equation (21)

A3 Average Treatment Effects and Standard Errors forNonlinear Models

Average treatment effects were calculated with the aid ofderivative approximations so that all reported effects have theform ldquocoef cient times scaling factorrdquo

Probit Note that ecirc6X 0i sbquo C 7 ƒ ecirc6X 0

isbquo7 6X 0isbquo C Di7 š

1 so the average effect on the treated can be approximatedas 841=N15

Pi Di6X 0

isbquo C Di79 š 1 where N1 D Pi Di This

turns out to be accurate to three decimal places for the esti-mates in Table 1 Standard errors were calculated treating thescaling factor as nonrandom This follows the convention forreporting marginal effects in programs like Stata in practiceany correction for estimation of the scaling factor is likely tobe minor A similar approach was used for ordered probit

Tobit Tobit average treatment effects were approximatedusing a derivative formula that can be found for exam-ple in the work of Greene (1999) E6Yi

mdashXi1DiD 17 ƒ

E6YimdashXi1 Di

D 07 iexclE6YimdashXi1Di7=iexclD D ecirc6X 0

isbquo C Di7 š 0

Average effects on the treated can therefore be approximatedusing 841=N15

Pi Diecirc6X 0

isbquo C Di79 š 0 Standard errors wereagain calculated treating the scaling factor as nonrandom

2PM Let the part-1 coef cient be 1 and the part-2 coef -cient be 2 Since the model multiplies parts and both parts arelinear the average effect is approximated using derivatives as1E6Yi

mdashDiD 11 Yi gt 07 C 2P6Di

D 1mdashY gt 070 Standard errorswere calculated treating the scaling factors E6Y mdashDi

D 11 Yi gt 07

and P6DiD 1mdashY gt 07 as nonrandom and using the fact that

the estimates of 1 and 2 are uncorrelatedMullahy Note that E6Yi

mdashXi1Di1mdash uumli 7 D mdash uuml

i exp6X 0i sbquoC Di7

where this CEF has a causal interpretation Again usingderivatives we have mdashuuml

i exp6X 0isbquo C 7 ƒ mdashuuml

i exp6X 0isbquo7

mdash uumli exp6X 0

i sbquo C Di7 š The model is such that E6mdash uumlimdashXi7

equals 1 but E6mdash uumlimdashXi1Di7 is unrestricted I ignore this prob-

lem and approximate the average effect on the treated as841=N15

Pi Di exp6X 0

i sbquo C Di79 š Standard errors were cal-culated treating the scaling factor as nonrandom

Bivariate Probit This is the same as probit using param-eters from the latent index equation for outcomes

Endogenous Tobit This is the same as tobit but usingcoef cients and predicted probability positive from the modelwith the compound Mills-ratio term included

Mills Ratio Standard errors were calculated treating thecompound Mills-ratio term as known

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 15

Causal-IV (nonlinear) Average effects were calculated asdescribed previously for the probit and Mullahy (exponential)functional form The rst-stage estimates of E6Zi

mdashXi7 neededto construct Ši were estimated using probit Standard errorsfor were calculated using the same bootstrap proceduredescribed for QTE Abadie (2000b) also gave analytic formu-las that take account of the rst-step estimation of E6Zi

mdashXi7An assumption implicit in this scheme for reporting Causal-IV results is that it makes sense to convert conditional-on-Xeffects for compliers into overall average effects on the treated

A4 Computation of Quantile Treatment Effectsand Standard Errors

QTErsquos were computed by plugging rst-step estimatesof QŠi

D E6ŠimdashXi1Di1 Yi7 into a weighted quantile regression

calculation performed by Stata Nonnegative estimates ofE6Ši

mdashXi1 Di1 Yi7 were constructed by separately estimatingE6Zi

mdashXi7 and E6ZimdashYi1Di1Xi7 using probit and then trimming

In principle standard errors should take account of this rst-step estimation An additional complication is that the analyticstandard errors for QTE involve a conditional error density Inthis case I sidestepped messy analytic calculations by usinga bootstrap procedure that repeats both the rst-stage estima-tion of QŠi and the second-step estimation of the parameters ofinterest in 100 replicate samples of 2500 observations eachThe 100 replicate samples were sampled without replacementusing the Stata command bsample The reported standarderrors were calculated as 4N =N uuml 51=2bseN where bseN is thestandard deviation of the 100 replicate estimates N D 21500and N uuml is the full sample size

[Received October 1999 Revised July 2000]

REFERENCES

Abadie A (2000a) ldquoBootstrap Tests for the Effect of a Treatment on the Dis-tributions of an Outcome Variablerdquo Technical Working Paper 261 NationalBureau of Economic Research Cambridge MA

mdashmdashmdash (2000b) ldquoSemiparametric Estimation of Instrumental Variable Mod-els for Causal Effectsrdquo Technical Working Paper 260 National Bureau ofEconomic Research Cambridge MA

Abadie A Angrist J and Imbens G (1998) ldquo Instrumental VariablesEstimation of Quantile Treatment Effectsrdquo Working Paper 229 NationalBureau of Economic Research Cambridge MA

Amemiya T (1978) ldquoThe Estimation of a Simultaneous Equation General-ized Probit Modelrdquo Econometrica 46 1193ndash1205

mdashmdashmdash (1979) ldquoThe Estimation of a Simultaneous Equation Tobit ModelrdquoInternational Economic Review 20 169ndash181

Angrist J (1998) ldquoEstimating the Labor Market Impact of Voluntary MilitaryService Using Social Security Data on Military Applicationsrdquo Economet-rica 66 249ndash288

Angrist J and Evans W (1998) ldquoChildren and Their Parentsrdquo Labor SupplyEvidence From Exogenous Variation in Family Sizerdquo American EconomicReview 88 450ndash477

Angrist J and Imbens G (1991) ldquoSources of Identifying Information inEvaluation Modelsrdquo Technical Working Paper 117 National Bureau ofEconomic Research Cambridge MA

Angrist J Imbens G and Rubin B (1996) ldquo Identi cation of CausalEffects Using Instrumental Variablesrdquo Journal of the American StatisticalAssociation 91 444ndash455

Angrist J and Krueger A (1999) ldquoEmpirical Strategies in Labor Eco-nomicsrdquo in The Handbook of Labor Economics Volume IIIA edsO Ashenfelter and D Card Amsterdam North-Holland pp 1277ndash1366

Blundell R W and Smith R J (1989) ldquoEstimation in a Class of Simulta-neous Equation Limited Dependent Variable Modelsrdquo Review of EconomicStudies 56 37ndash58

Blundell R W and Powell J L (1999) ldquoEndogeneity in Single Index Mod-elsrdquo mimeo University College London Dept of Economics

Bronars S and Grogger J (1994) ldquoThe Economic Consequences of UnwedMotherhood Using Twin Births as a Natural Experimentrdquo American Eco-nomic Review 84 1141ndash1156

Chamberlain G (1986) ldquoAsymptotic Ef ciency in Semi-Parametric ModelsWith Censoringrdquo Journal of Econometrics 32 189ndash218

Cragg J G (1971) ldquoSome Statistical Models for Limited Dependent Vari-ables With Application to the Demand for Durable Goodsrdquo Econometrica39 829ndash844

Duan N Manning W G Jr Morris C N and Newhouse J P(1984) ldquoChoosing Between the Sample-Selection Model and the Multi-partModelrdquo Journal of Business amp Economic Statistics 2 283ndash289

Eichner M J McClellan M B and Wise D A (1997) ldquoHealth Expendi-ture Persistence and Feasibility of Medical Savings Accountsrdquo in Tax Pol-icy and the Economy 11 ed J Poterba Cambridge MA National Bureauof Economic Research pp 91ndash128

Evans W Farrelly M C and Montgomery E (1999) ldquoDo Work-place Smoking Bans Reduce Smokingrdquo American Economic Review 89728ndash747

Gangadharan J and Rosenbloom J L (1996) ldquoThe Effects of Child-bearingon Married Womenrsquos Labor Supply and Earnings Using Twin Births as aNatural Experimentrdquo Working Paper 5647 National Bureau of EconomicResearch Cambridge MA

Goldberger A S (1991) A Course in Econometrics Cambridge MAHarvard University Press

Greene W (1999) ldquoMarginal Effects in the Censored Regression ModelrdquoEconomics Letters 64 43ndash49

Gronau R (1974) ldquoWage ComparisonsmdashA Selectivity Biasrdquo Journal ofPolitical Economy 82 1119ndash1143

Hay J W Leu R and Rohrer P (1987) ldquoOrdinary Least Squares andSample-Selection Models of Health-care Demandrdquo Journal of Business ampEconomic Statistics 5 499ndash506

Hay J W and Olsen R J (1984) ldquoLet Them Eat Cake A Note on Com-paring Alternative Models of the Demand for Medical Carerdquo Journal ofBusiness amp Economic Statistics 2 279ndash282

Heckman J J (1974) ldquoShadow Prices Market Wages and Labor SupplyrdquoEconometrica 42 679ndash694

mdashmdashmdash (1978) ldquoDummy Endogenous Variables in a Simultaneous EquationsSystemrdquo Econometrica 46 931ndash960

mdashmdashmdash (1990) ldquoVarieties of Selection Biasrdquo American Economic Review 80313ndash318

Heckman J J and Robb R Jr (1985) ldquoAlternative Methods for Evaluatingthe Impact of Interventionsrdquo in Longitudinal Analysis of Labor MarketData (Econometric Society Monographs Series No 10) eds J Heckmanand B Singer Cambridge UK Cambridge University Press pp 156ndash246

Imbens G and Angrist J D (1994) ldquo Identi cation and Estimation of LocalAverage Treatment Effectsrdquo Econometrica 62 467ndash475

Imbens G and Rubin D B (1997) ldquoEstimating Outcome Distributions forCompliers in Instrumental Variables Modelsrdquo Review of Economic Studies64 555ndash574

Kane T J Rouse C E and Staiger D (1999) ldquoEstimating Returnsto Schooling When Schooling Is Misreportedrdquo Working Paper W7235National Bureau of Economic Research Cambridge MA

Kelejian H H (1971) ldquoTwo-Stage Least Squares and Econometric SystemsLinear in Parameters but Nonlinear in the Endogenous Variablesrdquo Journalof the American Statistical Association 66 373ndash374

Koenker R and Bassett G (1978) ldquoRegression Quantilesrdquo Econometrica46 33ndash50

Killingsworth M (1983) Labor Supply Cambridge UK Cambridge Uni-versity Press

Keane M P and Wolpin K (1997) ldquo Introduction to the JBES Special Issueon Structural Estimation in Applied Microeconomicsrdquo Journal of Businessamp Economic Statistics 15 111ndash114

Lee M J (1995) ldquoSemiparametric Estimation of Simultaneous EquationsWith Limited Dependent Variables A Case Study of Female Labor Sup-plyrdquo Journal of Applied Econometrics 10 187ndash200

mdashmdashmdash (1996) ldquoNonparametric Two-Stage Estimation of Simultaneous Equa-tions With Limited Endogenous Regressorsrdquo Econometric Theory 12305ndash330

Leung S F and Yu S (1996) ldquoOn the Choice Between Sample Selectionand Two-Part Modelsrdquo Journal of Econometrics 72 197ndash229

Lin T-F and Schmidt P (1984) ldquoA Test of the Tobit Speci cation Againstan Alternative Suggested by Craggrdquo Review of Economics and Statistics66 174ndash177

Maddala G S (1985) ldquoA Survey of the Literature on Selectivity Bias AsIt Pertains to Health Care Marketsrdquo Advances in Health Economics andHealth Service Research 6 3ndash18

16 Journal of Business amp Economic Statistics January 2001

REFERENCES

Manning W G Duan N and Rogers W H (1987) ldquoMonte Carlo Evidenceon the Choice Between Sample Selection and Two-part Modelsrdquo Journalof Econometrics 35 59ndash82

Manski C F (1975) ldquoMaximum Score Estimation of the Stochastic UtilityModelrdquo Journal of Econometrics 3 205ndash228

mdashmdashmdash (1996) ldquoLearning About Treatment Effects From Experiments WithRandom Assignment of Treatmentsrdquo Journal of Human Resources 31709ndash733

McDonald J F and Mof tt R A (1980) ldquoThe Uses of Tobit AnalysisrdquoThe Review of Economics and Statistics 62 318ndash321

Mof t R A (1999) ldquoNew Developments in Econometric Methods forLabor Market Analysisrdquo in The Handbook of Labor Economics Vol-ume IIIA eds O Ashenfelter and D Card Amsterdam North-Hollandpp 1367ndash1398

Mullahy J (1997) ldquo Instrumental-Variable Estimation of Count Data ModelsApplications to Models of Cigarette Smoking Behaviourrdquo Review of Eco-nomics and Statistics 11 586ndash593

mdashmdashmdash (1998) ldquoMuch Ado About Two Reconsidering Retransformation andthe Two-Part Model in Health Econometricsrdquo Journal of Health Eco-nomics 17 247ndash281

Newey W K (1985) ldquoSemiparametric Estimation of Limited DependentVariable Models With Endogenous Explanatory Variablesrdquo Annales deLrsquo Insee 5960 219ndash237

mdashmdashmdash (1986) ldquoLinear Instrumental Variable Estimation of Limited Depen-dent Variable Models With Endogenous Explanatory Variablesrdquo Journal ofEconometrics 32 127ndash141

mdashmdashmdash (1987) ldquoEf cient Estimation of Limited Dependent Variable ModelsWith Endogenous Explanatory Variablesrdquo Journal of Econometrics 36231ndash250

Powell J L (1986a) ldquoSymmetrically Trimmed Least Squares Estimation forTobit Modelsrdquo Econometrica 54 1435ndash1460

mdashmdashmdash (1986b) ldquoCensored Regression Quantilesrdquo Journal of Econometrics32 143ndash155

Rosenzweig M R and Wolpin K (1980) ldquoLife-Cycle Labor Supply andFertility Causal Inferences From Household Modelsrdquo Journal of PoliticalEconomy 88 328ndash348

Rubin D B (1974) ldquoEstimating Causal Effects of Treatments in Random-ized and Non-randomized Studiesrdquo Journal of Educational Psychology 66688ndash701

mdashmdashmdash (1977) ldquoAssignment to a Treatment Group on the Basis of a Covari-aterdquo Journal of Educational Statistics 2 1ndash26

Wooldridge J (1999) ldquoDistribution-Free Estimation of Some Nonlinear PanelData Modelsrdquo Journal of Econometrics 90 77ndash97

Comment Binary Regressors in NonlinearPanel-Data Models With Fixed EffectsJinyong Hahn

Department of Economics Brown University Providence RI 02912 ( jinyong hahnbrownedu)

Angrist notes in his abstract that ldquomuch of the dif cultywith limited dependent variables comes from a focus on struc-tural parameters such as index coef cients instead of causaleffects Once the object of estimation is taken to be the causaleffect of treatment several simple strategies are availablerdquo(p 2) I examine the consequence of such a perspective forinference in nonlinear panel-data models with xed effectsI argue that (a) the ldquodif cultyrdquo indeed disappears sometimesand (b) structure of treatment assignment plays a crucial rolefor his strategy to be successful

It is instructive to begin with a dif cult nonlinear panel-datamodel with xed effects Consider a panel probit model with xed effects

Pr6yitD 1mdashci1 xi11 xi27 D ecirc4ci

C ˆ xit51

i D 11 1 n3 t D 1120 (1)

It is assumed that yi1 and yi2 are independent of each othergiven 4ci1 xi11 xi25 Here ci denotes the unobserved xedeffects and xit denotes a binary treatment variable We willassume that 4xi11 xi25 D 40115 for all i The index coef cientof interest is ˆ Therefore traditional econometric analysiswould focus on whether ˆ is identi ed and or

pn-consistently

estimableEstimation of the index coef cient ˆ is dif cult for a num-

ber of reasons

1 So far no consistent estimator for ˆ has been developedfor probit with a nonparametric speci cation of the conditionaldistribution of ci given 4xi11 xi25

2 It is not even clear whether ˆ is identi ed or notManskirsquos (1987) identi cation result requires in nite supportfor xit which cannot be satis ed due to the binary nature ofxit

3 It is not clear whether the semiparametric informationbound for ˆ is positive or not It is quite possible that theinformation is actually 0 For example Chamberlain (1992)showed that the information for ˆ is 0 for models with timedummies

4 Chamberlainrsquos (1984) conditional maximum likelihoodestimator is applicable only to logit models

Now to assess whether changing the object of estimationsimpli es statistical analysis consider the average treatmenteffects which in this particular case can be easily shown tobe equal to

sbquo D E6yi2ƒ yi170 (2)

It is not dif cult to see that a simple estimator Osbquo D1=n

PniD14yi2

ƒ yi15 isp

n-consistent As Angrist argues afocus on average causal effects dramatically reduces the dif -culty of estimation The secret is that 4xi11 xi25 D 40115 for alli which effectively ensures that ci is independent of 4xi11 xi25Dif culty of the index estimation listed previously is because

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

Imbens Comment 17

the independence between 4xi11 xi25 and ci cannot be exploitedwithin the xed-effects framework Unfortunately the addi-tional information that 4xi11 xi25 and ci are independent of eachother does not reduce dif culty in estimating the index ˆ It isnot yet clear whether ˆ can be

pn-consistently estimated even

with the random-effects assumption when the random effectsare nonparametrically speci ed On the other hand estimatingsbquo becomes easier as a result of the constancy of the general-ized propensity score Pr 6xi11 xi2mdashci7 Presence of unobservedci in the model was rendered irrelevant due to the constancyof the generalized propensity score [See Imbens (1999) for adiscussion on the generalized propensity score]

It is interesting to note that in the panel probit model (1)estimation of sbquo is not necessarily simple unless the indexstructure is discarded altogether It is useful to note that thenew target parameter sbquo could be estimated consistently usingindex structure if consistent estimators of ˆ and not are givenHere not denotes the distribution of ci We may alternativelywrite (2) as

Z4ecirc4c C ˆ5 ƒ ecirc4c55 dnot4c51 (3)

which can in principle be estimated by using consistent esti-mators of ˆ and not Estimation of sbquo using the alternativecharacterization (3) requires consistent estimation of an addi-tional parameter not a parameter that was not given too muchattention in the past The problem is that not many consis-tent estimators of not are available It is not yet clear whether

the model satis es the primitive conditions for consistency ofthe nonparametric maximum likelihood estimator (NPMLE)as discussed by Heckman and Singer (1984) The dif culty inestimating the target parameter using the expression (3) whichis based on the index structure is in sharp contrast to the easeof the estimation strategy using the expression (2) for whichthe index structure is irrelevant

The preceding discussion suggests that the success ofAngristrsquos perspective critically hinges on the structure of treat-ment assignment and careful reexpression of the new targetparameter If the joint distribution of ci and 4xi11 xi25 is com-pletely unknown it is clear that changing the target parameterdoes not ease the dif culty of estimation Angristrsquos perspectivetherefore requires substantial effort in modeling such jointdistribution Whether such a modeling effort will be successfulin dealing with nonlinear panel problems remains to be seen

ADDITIONAL REFERENCES

Chamberlain G (1984) ldquoPanel Datardquo in Handbook of Econometricseds Z Griliches and M D Intriligator Amsterdam North-Hollandpp 1247ndash1318

mdashmdashmdash (1992) ldquoBinary Response Models for Panel Data Identi cation andInformationrdquo unpublished manuscript

Heckman J and Singer B (1984) ldquoA Method for Minimizing the Impactof Distributional Assumptions in Econometric Models for Duration DatardquoEconometrica 52 271ndash320

Imbens G (1999) ldquoThe Role of Propensity Score in Estimating Dose-Response Functionsrdquo Technical Working Paper 237 National Bureau ofEconomic Research Cambridge MA

Manski C (1987) ldquoSemiparametric Analysis of Random Effects Linear Mod-els from Binary Panel Datardquo Econometrica 55 357ndash362

CommentGuido W Imbens

Department of Economics University of California at Los Angeles Los Angeles CA 90095(imbenseconuclaedu)

It is a pleasure to comment on this article by JoshuaAngrist whose applications of instrumental-variables methods(Angrist 1989 Angrist and Krueger 1991) have been a sourceof inspiration for my own work in this area As with Angristrsquosprevious work on instrumental variables the current articleraises some controversial issues and makes a number of impor-tant points Here I offer some comments on three of themFirst I shall discuss the issues raised in Section 1 ldquoCausalEffects and Structural Parametersrdquo concerning the goals ofstatistical inference Angrist argues that many questions ofinterest are most easily formulated in terms of comparisonsbetween realized and potential outcomes the latter de nedas outcomes that would have been observed under alternativestates of nature I shall explore some of the implications of thisview for empirical practice and econometric theory SecondI shall offer some remarks on the role of economic theory inspeci cation and identi cation of econometric models againreinforcing Angristrsquos point regarding the importance of for-mulating the key assumptions in terms of potential outcomesThird I shall discuss some of the issues related to the limited

dependent nature of outcome variables for empirical practicein particular in the presence of covariates Partly motivatedby the widespread perception of fundamental dif culties inapplying instrumental-variables methods to data with limiteddependent outcome variables Angrist argues that standard lin-ear model techniques are generally applicable I agree withAngristrsquos position that most of these perceived problems areexaggerated but suggest that principled inference should nev-ertheless take account of the limited dependent nature of theoutcome variables and use nonlinear models

1 CAUSAL ESTIMANDS

In his textbook discussion of the difference betweenstructural and reduced-form estimates Goldberger (1997)wrote following Marshak (1953) that the ultimate goal of

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

18 Journal of Business amp Economic Statistics January 2001

econometrics is to provide predictions More speci cally inmy view the goal is to provide predictions of policy interven-tions Using both economic theory and data economists wishto inform policy discussions by providing predictions of statesof the world under different policy choices Based on com-parisons of such predictions policy makers can then chooseamong the different policies using some social welfare mea-sure as objective function (eg Heckman and Smith 1997)Angrist argues that such questions are most easily formulatedin terms of potential outcomes Here I want to elaborate onthat view

Consider as an example the problem faced by a policymaker contemplating a new tax in a market To evaluatethis policy the policy maker wishes to take into account theeffect of the tax on the quantity traded Economic theory sug-gests that this effect depends on the slope of the supply anddemand functions The rst step is therefore the estimation ofthese slopes and in the remainder of this discussion I shallfocus on this component of the policy-evaluation problem Inprinciple the policy maker may be interested in the entire dis-tribution of the quantity traded under various taxes Let usassume however that for purposes of evaluation of the poli-cies it is suf cient to know the average effect of the policyon the quantity traded If there are only two values for thepolicymdashfor example no tax or a taxmdashthe difference betweenthese two averages is the key quantity of interest FollowingRubin (1974) I will refer to this as the estimand

Note that the choice of estimand is distinct from the sta-tistical question of the speci cation of the model Often thestatistical model is speci ed in such a way that a single param-eter corresponds to the estimand For example in a structuralinterpretation of the linear regression model the coef cientscorrespond to the effect of changing the covariates by a singleunit Such one-to-one correspondence however is the excep-tion rather than the rule Wooldridge (1992) made this point inthe context of BoxndashCox regression models Such models areoften used when a linear representation for E6Y mdashX7 is inappro-priate The BoxndashCox regression model generalizes this linearform to E6Y 4lsaquo5mdashX7 D X 0sbquo where

Y 4lsaquo5 D(

4Y lsaquo ƒ 15=lsaquo lsaquo 6D 01

ln Y lsaquo D 00

Although consistent estimators for sbquo exist under these assump-tions Wooldridge stressed that because (a) the interpretationof sbquo changes with the value of lsaquo and (b) knowledge of sbquo andlsaquo is not suf cient for recovering E6Y mdashX7 there is no reasonfor economists to be interested in estimates of sbquo under theseassumptions In other words sbquo cannot be the sole focus ofthe researcher because the question it answers changes withthe value of nuisance parameters Wooldridge then suggestedan alternative speci cation that always allows the researcherto recover the conditional expectation E6Y mdashX7

In empirical work this distinction between the estimandand the parameters of the statistical model is consistent withthe now common practice of reporting estimates of averagederivatives in binary response models rather than reportingestimates of the logit or probit coef cients Unlike a linear

regression model there is no direct link from one of the coef- cients in the logit or probit model to average causal effectsand thus there is no intrinsic interest in such coef cients

This view is at odds however with a large part of the semi-parametric literature An exception is the work by Stoker (egStoker 1986) who focused on estimation of index coef cientsin settings where these are proportional to average derivativesand thus directly linked to changes in predictions Considerfor example the work on semiparametric estimation of binaryresponse models In this literature such models are esti-mated without making logistic or probit assumptions insteadonly making conditional mean or median assumptions in alatent index interpretation (eg Manski 1985) This literaturehowever has begged the question of why economists shouldbe interested in the coef cient estimates in these models inthe absence of a direct link between these coef cients andthe choice probabilities or their derivatives Similarly someof the models with xed effects in panel data with limiteddependent variables have focused on estimation of parametersthat in themselves do not allow for estimation of conditionalexpectations or their derivatives and thus do not allow for esti-mation of causal effects See Arellano and Honoreacute (in press)for a survey of many of these methods

2 IDENTIFICATION

After deciding on the estimand the next step is to makesubstantive assumptions on the process that generated the dataThis is where economic as opposed to statistical theory playsa key role Theoretical considerations may suggest that certainvariables have no direct causal effect on others because theydo not enter into agentsrsquo utility function nor do they affectthe constraints these agents face For example in some mar-kets it may be reasonable to postulate the existence of demandand supply function and assume that their intersection deter-mines observed prices and quantities In that case it may beargued that certain variablesmdashfor example weather conditionsin agricultural marketsmdashaffect supply at xed prices but notdemand because weather conditions do not affect utility ofthe buyers nor do they constrain their choices given pricesSimilarly theoretical considerations may suggest which vari-ables determine agentsrsquo fertility choices and which variablesare excluded from such choices as in the structural modelsdescribed in Section 12 of Angrist

For the purpose of considering such exclusion restrictionsas well as other assumptions it is important to formulate themin a way that economic theory can be brought to bear on themThis makes the formulation in terms of counterfactuals orpotential outcomes that Angrist advocates particularly appro-priate The potential outcomes describe outcomes in differ-ent environments and as such are the primitives of economicanalyses as well as choices under different sets of constraintswhich are the result of agents solving constrained optimizationproblems Since economic theory studies such optimizationproblems it is therefore well equipped to assess assumptionsformulated directly in terms of these potential outcomes Anexample of the formulation of the critical assumptions in termsof such potential outcomes is Angrist Imbens and Rubin(1996 AIR from here on) In contrast latent index modelsalthough under some conditions mathematically equivalent to

Imbens Comment 19

the potential outcome framework (eg Vytlacil 1999) formu-late the critical assumptions in terms of associations betweenobserved variables and unobserved residuals which appearsmore dif cult to contemplate [see Imbens (1997) for a dis-cussion of the confusion such formulations have caused in thestatistics literature]

It is rare that economic theory is speci c enough to deter-mine the exact value of the estimand More typical is thatthe theory is consistent with a range of values for the esti-mand Observations on agentsrsquo choices and outcomes may behelpful in narrowing down this range The econometricianrsquostask is to link the data to the estimand Typically a num-ber of additional assumptions are made at this stage Almostalways it is assumed that there is only limited dependenceor no dependence at all between choices made by differentagents and identi cation focuses on the link between the jointdistribution of the observables estimable in large samplesand the estimand Two possibilities arise at this stage Some-times the estimand can be expressed as a functional of thejoint distribution of the observables in which case the esti-mand is identi ed A leading example is where the estimandis the average treatment effect and theory suggests that assign-ment to treatment is random or at least random conditionalon a set of observed covariates (unconfounded assignmentselection on observables) Alternatively the assumptions sug-gested by economic theory do not allow for the direct linkbetween the distribution of observables and the estimand Inthat case the researcher faces some choices One option advo-cated in a series of papers by Manski (see for a general dis-cussion Manski 1995) is to estimate the range of values ofthe estimand consistent with the data given the substantiveassumptions Another option followed in the current article byAngrist is the local average-treatment-effect approach devel-oped by Imbens and Angrist (1994) to consider what aspectsof the estimand are identi ed given data and assumptions Ininstrumental-variables settings the population average treat-ment effect is often not identi ed but the average effect for aspeci c subpopulation may be In that case one may chooseto estimate the average treatment effect for this subpopula-tion and leave the extrapolation to the principal estimand tothe researcher possibly aided by theoretical considerations AsHeckman wrote ldquo It is a great virtue of the LATE parameterthat it makes the investigator stick to the data at hand and sep-arate out the aspects of an estimation that require out of sam-ple extrapolation or theorizing from aspects of an estimationthat are based on observable datardquo (Heckman 1999 p 832)

Let us consider the case studied by Angrist with its focuson the effect of having more than two children on labor sup-ply Angrist argues that the second birth being a multiple birth(eg twins) is a valid instrument for this effect In terms ofthe AIR formulation this requires a multiple birth to be asgood as randomly assigned and the absence of a systematicdirect effect on labor supply other than through its effect onthe number of children Such assumptions may be contro-versial For example fertility treatments may lead to a sys-tematic association between multiple births and choices madeby couples violating the rst assumption Even if we acceptthese assumptions however they only imply that the aver-age causal effect of more kids on labor supply is identi ed

for women who had a third child solely because their sec-ond birth was a multiple birth (compliers in the AIR termi-nology) In my view it is unlikely that this is the populationof primary interest Nevertheless it is the only subpopulationthe data are informative about in the sense of point identi ca-tion under the substantive assumptions and it would appear tooffer some guidance regarding the population average causaleffect to policy makers similar to the way in the medical worldresults from clinical trials in homogenous subpopulations areregarded as useful because they are viewed as indicative ofpopulation average causal effects

3 LIMITED DEPENDENT VARIABLES

Typically economic theory offers some guidance concern-ing the determinants of certain outcomes without specifyingthe exact form or strength of their relationship In that casestatistical modeling is required to complete the speci cationConsider the example Angrist studies with binary outcomebinary endogenous regressor a binary instrument and covari-ates Angrist suggests as one possible approach estimating theaverage treatment effect through a linear probability modelwith instrumenting for an endogenous regressor The bene tsof the linear probability approach stemming from the linear-ity and robustness against misspeci cation of the rst stageappear to me largely illusory At this point the statistical mod-eling is only intended to provide exible approximations to theunderlying conditional distributions This is a fundamentallydifferent role from that played by the substantive assumptionsthat are essential for identi cation Appeals to consistencyunder speci c parameterizations therefore appear irrelevantmdashin a larger sample one may well wish to use a more exiblespeci cation because less smoothing is required In additionto nding the alleged bene ts of the linear probability modelunpersuasive I nd its disadvantages troubling Within smallsubpopulations characterized by extreme values of the covari-ates the smoothing implicit in linear probability models islikely to lead to unattractive predictions compared to pre-dictions based on nonlinear models that respect the limited-dependent-variable nature of the outcomes

An alternative approach is followed in the study of the effectof u shots on hospitalization rates using randomized incen-tives for vaccination by Hirano Imbens Rubin and Zhou(2000 HIRZ from here on) Given their assumptions exten-sions of those made by AIR to the case with exogenous covari-ates there are three subpopulationsmdashcompliers (units whochange treatment status in response to a change in the valueof the instrument) always-takers (who always take the treat-ment irrespective of the value of the instrument) and never-takers (who never take the treatment irrespective of the valueof the instrument) H IRZ modeled the conditional distributionof these three ldquotypesrdquo conditional on covariates as a trinomialdistribution

Pr4TypeiD cmdashXi

D x5 D exp4x0ndashc5

1C exp4x0ndashc5C exp4x0ndasha51

Pr4TypeiD amdashXi

D x5 D exp4x0ndasha5

1C exp4x0ndashc5 C exp4x0ndasha51

20 Journal of Business amp Economic Statistics January 2001

and

Pr4TypeiD nmdashXi

D x5

D 1 ƒ Pr4TypeiD cmdashXi

D x5 ƒ Pr4TypeiD amdashXi

D x50

Now compare this setup to the selection models Angristdescribes in Section 3 In the selection models the equationdescribing the endogenous regressor is Di

D 18ƒ0C ƒ1Zi

Cƒ 0

2Xi gt Daggeri9 Suppose that the instrument is binary and that ƒ1 ispositive Then the two models are very similar with units withƒ0 Cƒ1 Cƒ 0

2Xi gt Daggeri in the selection model classi ed as always-takers in the potential outcome framework (because irrespec-tive of the value of the instrument Di

D 1 for such units) unitswith ƒ0

C ƒ 02Xi lt Daggeri classi ed as never-takers (because irre-

spective of the value of the instrument DiD 0 for such units)

and the units with ƒ0C ƒ 0

2Xi lt Daggeri lt ƒ0C ƒ1

C ƒ 02Xi classi ed

as compliersOne advantage of the trinomial model is that it easily gen-

eralizes to provide an arbitrarily good t to any conditionaltrinomial distribution by including higher-order terms andinteractions in the covariates If there are no substantive rea-sons to impose additional restrictions one should not imposethem implicitly in the speci cation of the statistical modelIn particular in the selection model it is not suf cient to addhigher-order terms to the covariate vector to provide an arbi-trarily good t to the trinomial distribution Such an approxi-mation would have to involve heteroscedasticity and other dis-tributional extensions that are not straightforward to implementin the selection model

Conditional on the individualrsquos type H IRZ speci ed theoutcome distributions given covariates as logistic regressionmodels Again the aim is to provide a exible approxima-tion to the conditional distribution in a manner that does notimpose any implicit restrictions Given that for a binomialdistribution the logistic regression model can be thought ofas providing a linear approximation to the log odds ratiothis choice is again an appealing one An alternative is theprobit model which also provides a good approximation Less

attractive here is the linear probability model since it requiresinequality restrictions on the parameters if the implicit esti-mates of the probabilities are to be bounded between 0 and 1

In cases with other limited dependent variables alternativenonlinear models may be appropriate For example if the out-comes are durations subject to censoring models speci ed interms of hazard functions (eg Lancaster 1979) may be con-venient for dealing with such data

ADDITIONAL REFERENCES

Angrist J (1989) ldquoLifetime Earnings and the Vietnam Era Draft LotteryEvidence From Social Security Administrative Recordsrdquo American Eco-nomic Review 80 313ndash335

Angrist J and Krueger A (1991) ldquoDoes Compulsory School AttendanceAffect Schooling and Earningsrdquo Quarterly Journal of Ecomonics 106979ndash1014

Arellano M and Honoreacute B (in press) ldquoPanel Datardquo in Handbook of Econo-metrics eds J Heckman and E Leamer Amsterdam Elsevier North-Holland

Goldberger A (1991) A Course in Econometrics Cambridge MACambridge University Press

Heckman J (1999) ldquo Instrumental Variables Response to Angrist andImbensrdquo Journal of Human Resources 34 828ndash837

Heckman J and Smith J (1997) ldquoEvaluating the Welfare Staterdquo in Econo-metrics and Economics in the 20th Century The Ragnar Frisch Centenaryed S Strom New York Cambridge University Press pp 214ndash318

Hirano K Imbens G Rubin D and Zhou A (2000) ldquoEstimating theEffect of Flu Shots in a Randomized Encouragement Designrdquo Biostatistics1 69ndash88

Imbens G (1997) Book Review of The Foundations of Econometric Analy-sis by D Hendry and M Morgan Journal of Applied Econometrtics 1291ndash94

Lancaster T (1979) ldquoEconometric Methods for the Analysis of DurationDatardquo Econometrica

Manski C (1985) ldquoSemiparametric Analysis of Discrete Responserdquo Journalof Econometrics 27 313ndash333

mdashmdashmdash (1995) Identi cation Problems in the Social Sciences CambridgeMA Harvard University Press

Marshak J (1953) ldquoEconomic Measurements for Policy and Predictionrdquoin Studies in Econometric Method eds W Hood and T KoopmansNew York Wiley pp 1ndash26

Stoker T (1986) ldquoConsistent Estimation of Scaled Coef cientsrdquo Economet-rica 54 1461ndash1481

Vytlacil E (1999) ldquo Independence Monotonicity and Latent Index ModelsAn Equivalency Resultrdquo unpublished manuscript University of ChicagoDept of Economics

Wooldridge J (1992) ldquoSome Alternatives to the BoxndashCox RegressionModelrdquo International Economic Review 33 935ndash955

CommentRobert A Mof tt

Department of Economics Johns Hopkins University Baltimore MD 21218 and National Bureau of EconomicResearch Cambridge MA (mofttjhuedu)

The Problem Although the article by Angrist rangesacross a number of issues much of the discussion and thearticle title suggests that the problem of concern is that instru-mental variables ( IV) cannot be used in one of three commonmodels Let the rst model be y D Csbquod CxbdquoChellip where y isan absolutely continuous variable but d is binary and wherex is independent of hellip but d is not Then sbquo can be consistentlyestimated with IV (Heckman and Robb 1985) Let the secondmodel be y uuml D C sbquod uuml C xbdquo C hellip where y uuml and d uuml are contin-

uous and where y D 14y uuml gt 05 and d D d uuml are the observedvariables The parameters of this model can likewise be esti-mated by IV with some auxiliary assumptions (Newey 1986see Blundell and Smith 1993 for a review of alternative meth-ods) But let the third model be y uuml D C sbquod C xbdquo C hellip where

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

Moftt Comment 21

again y D 14y uuml gt 05 but now d is binary The parameters ofthis model cannot be estimated by standard IV Many of theexamples in the Angrist article involve the censored regres-sion model rather than the binary choice model but this andthe other main points in the article apply to both

The Angrist Solution The Angrist solution to the problemposed by the third of these models where IV is not applica-ble is to declare sbquo an uninteresting parameter and not worthyof estimation [or in his words to ldquopuntrdquo (p 8)] This is remi-niscent of the solution to the Vietnam War suggested by some1960s commentators The United States should have simplydeclared victory quickly withdrawn and hoped that no onenoticed that they had in fact lost

If sbquo is uninteresting what is interesting according toAngrist His answer is that interest should center on leastsquares approximations with y as the dependent variable andd and x as regressors If endogeneity of d is a problem linearIV should be applied with linear IV given a LATE interpre-tation He provides an empirical illustration

I will comment on three questions raised by the Angristarticle (1) Is sbquo an interesting parameter (2) Is the Linear IVestimator using observed y and d which Imbens and Angrist(1994) relabeled the LATE an interesting quantity (3) Is theillustrative application given in the Angrist article interesting

Is sbquo an Interesting Parameter As Angrist notes manyof the issues raised by his solution have nothing to do withthe endogeneity of d Whether sbquo is an interesting parame-ter is one of these at least in part Given Angristrsquos prefer-ence for linear projections his position necessarily implies thatall nonlinear models of which the latent index model (L IM)is just one example are uninteresting (Angrist is not com-pletely consistent in his position because he does propose anonlinear model with an exponential conditional mean func-tion in one section of his article that model is subject to thesame objections that he subjects the LIM tomdasharbitrary non-linearities etc) Thus his position boils down to a preferencein the binary dependent-variable case for the linear proba-bility model (LPM) over the LIM and other nonlinear mod-els This issue has been discussed for many years even in asimultaneous-equations context [eg Heckman and MaCurdy(1985) which Angrist does not reference]

The dif culty with Angristrsquos polar position on this issuemdashthat the LPM is the only model of interestmdashis that it is funda-mentally untenable The LPM is attractive because it is easyto interpret providing parameter estimates that do not requiretransformation to learn the effects of a regressor on the meanof the dependent variable It has a role to play in empiricalwork in summarizing the data as regards the conditional meanfunction and for initial explorations of the data and virtu-ally all practitioners use it for this purpose But beyond thisit has no defense Either it is equivalent to the latent-variablemodel if the model is saturated or if the model is not satu-rated imposes functional form restrictions that may not holdand may t the data worse than the latent-variable model

Thus if the model is y D 14 C sbquod C hellip gt 05 where d is asingle dummy variable regressor independent of hellip the probit(say) estimates of and sbquo map one-to-one onto the leastsquares intercept and coef cient on d in the LPM and hencethere is no gain to estimating the model either way because the

estimators are equivalent The same point extends to the caseof a multinomial d If the model is y D 14 C sbquod C xbdquo C hellip gt

05 where x and bdquo are vectors of exogenous covariates andcoef cients respectively the LPM estimate of the effect ofd on E4y5 conditional on x may be demonstrably worsethan the probit estimator of the same effect even for theobject of estimating E4ymdashd1 x5 at points 8d1 x9 observed inthe data (Angrist falls back on the weak principle that leastsquares estimates provide best linear approximations but theissue is that the true model might be nonlinear) Nothingin Angristrsquos article provides evidence that the probitor anyother latent index formulation provides an inferior estimate of6E4ymdashd11 x5 ƒ E4ymdashd01 x57 compared to the LPM for the twoestimators smooth the joint response function with respect tod and x in different ways

Although the latent-variable model has no necessary claimfor superiority in the class of all nonlinear models its pop-ularity rests on its ability to generate a wide variety of non-linearities with a relatively parsimonious speci cation arisingfrom the convolution of the latent index with the cdf of hellipExpansion of the LPM to incorporate equivalent nonlinearitiesis cumbersome and inef cient

The possible inferiority of the LPM in capturing nonlinear-ities in nonsaturated models is also important for interpolationand extrapolation to points not in the observed data Gettingthe nonlinearities right in the observed data is important ininterpolation and extrapolation if the true model is nonlinearas the binary choice model necessarily is (since y is boundedby 0 and 1) Although Angrist at one point does mentionpredictionmdashstating that the linearized models he proposes area good ldquojumping-off point for any prediction exerciserdquo a state-ment without supportmdashhe is for the most part not interestedin prediction so much as summarizing the observed data Thisis ne but unfortunately prediction outside the observed his-torical data is the pervasive concern of the policy makers whoare the ultimate consumers of applied economic research forthey must make predictions of new policies in new environ-ments on a daily basis as part of their jobs The methods pro-posed in Angristrsquos article are therefore not very useful for theformulation of new policy

It should be noted that some make the argument that theoryand economic models must play a role in guiding the formu-lation of empirical relationships used for prediction outsidethe range of the observed data That position is not taken herebecause it is not necessary to establish the potential superior-ity of the L IM over the LPM in nonsaturated models and forprediction Occamrsquos razor makes it unnecessary

It should also be noted that these issues have nothing todo with causal effects because d is assumed exogenous Thusthe opposition that Angrist poses between L IMrsquos and ldquocausaleffectsrdquo models is a false one at least in general If a ldquocausaleffectrdquo is de ned as the true effect of a variable d on onlythe rst moment of a variable y (a rather restricted de nitionof causal effect) that causal effect can be derived from thelatent-variable model just as well The issue is instead merelynonlinearity of the function E4ymdashd1 x5

Angrist also states that the L IM requires ldquoconstant coef- cientrdquo and ldquodistributional assumptionsrdquo for identi cation

22 Journal of Business amp Economic Statistics January 2001

unlike the LPM Angrist does not make suf cient quali ca-tions to these assertions and they are indeed false for theL IM A random coef cient sbquo in the LIM is permissible with-out distributional assumption ( Ichimura and Thompson 1998)and the coef cients in the standard constant-coef cient L IMare identi ed semiparametrically (ie under unknown distri-bution of hellip) under the restriction that d and hellip are independentan assumption generally made for causal interpretation of theLPM model as well (Manski 1988 Horowitz 1993 Powell1994) In addition as already noted in a fully saturated modelthe LPM and the L IM are equivalent in any case

Is the LATE an Interesting Statistic The Angrist assertionthat LATE is the only interesting statistic if d is endogenousor perhaps the only statistic worth bothering about because itis the only one identi ed by the data has the same partialvalidity as his preference for the LPM over the LIM IV isone of the most popular techniques in applied econometricsand has a natural intuition indeed one that does not requirethe LATE interpretation per se It is one of the most usefultools in the applied economistrsquos kit The simple IV estimatordiscussed by Angrist makes minimal assumptions and givesminimal information back to the analyst as a result But tosay that it produces the only statistic of interest does not havedefense both because it is equivalent to an LIM that is fullysaturated and may t the data worse than an LIM if not satu-rated and because it implies that there is no value to makingadditional assumptions to obtain additional information

The limiting nature of the LATE statistic is again in itsinattention to nonlinearities interpolation and extrapolationThat nonlinearities can be important is as true in this caseas it was in the exogenous case just discussed when thereare additional x covariates and when the model is not satu-rated As for interpolation and extrapolation the LATE statis-tic 6 Ny4z D z15ƒ Ny4z D z057=6 Nd4z D z15ƒ Nd4z D z057 denotes theeffect on a change in z from value z0 to z1 on E4y5 scaledby the change in the E4d5 It does not have implications forthe effect on E4y5 of any other change in z or for a changein any other policy variable Thus it is not particularly usefulfor policy changes other than a change of z0 to z1 in the sameenvironment (ie conditioned on the same x) This stands incontrast to an L IM for E4dmdashx1 z5mdashor any parametric modelfor d for that mattermdashthat allows for the change from z0 toz1 to inform policy makers and others of the likely changesof other values of z and of other variables x In fact theLATE statistic which is not a parameter in the usual senseof the word can always be expressed as a nonlinear combi-nation of parameters of a latent-variable model but not viceversa hence the latter model is more general than the former[See Heckman and Vytlacil (1999) for a discussion of how toexpress the LATE in terms of an L IM see also Angrist andImbens (1999) and Heckman (1997 1999) for a discussion ofsome of these issues]

In addition Angristrsquos focus on the advantages of the LATEfor model-free estimation of the effect of a change in z fromz0 to z1 on y reveals a philosophical weakness in his posi-tion This is because the effect of a change in z from z0 toz1 on y can be estimated from the reduced form the struc-tural form is not needed It is not explained why the statistic

6E4ymdashz D z15 ƒ E4ymdashz D z057 is not the only quantity of inter-est The answer that most economists would give is to say thatstructural coef cients are of interest because they can be usedto extrapolate the effects of changes in d on y to other changesin z than from z0 to z1 in the same environment indeedthis is the basic rationale for structural estimation given inthe famous essay by Marschak [1953 see also Christ (1994)and Heckman (2000) for discussions Marschakrsquos argumentwas more subtle than this arguing that restrictions on struc-tural parameters are needed for out-of-sample prediction butthis translates into restrictions on the reduced form as well]However by ruling out the possibility of learning about anyof the effects of changes in z or d other than those inducedby a change in z from z0 to z1 Angrist removes the need todo structural estimation in the rst place Angrist implicitlyassumes that a structural coef cient of interest exists on thevariable d in the y equation and that that coef cient has mean-ing independent of a particular change in z but the rest of hisdiscussion contravenes that interest

A reduced-form research program which is where theAngrist position leads is of considerable value There is noth-ing wrong with a research program to collect a large body ofinformation on the effects of a wide variety of changes in dif-ferent policy variables 4z5 from particular values (z0) to otherparticular values (z1) each taking place in particular environ-ments (x) But if the research program stops there very littleuseful has been learned other than a collection of facts aboutthe effects of particular policies in particular environments

Is the Application in the Angrist Article Interesting TheAngrist article uses a recently popular exclusion restriction ornatural experiment to identify the effect of fertility on laborsupply and to illustrate his preferred methodsmdashnamely theuse of twins The use of twins as an exclusion restrictionstands in contrast to variations in a government policy or lawwhich are often used as exclusion restrictions both in recentwork on natural experiments as well as a much older litera-ture that uses cross-sectional and overtime variation in state-or country-speci c taxes and transfer rules to identify modelparameters However although it can be of interest to estimatereduced-form policy effects 6E4ymdashz D z15 ƒ E4ymdashz D z057 asjust discussed it is not so obvious that there is any interest inestimating the effects of having twins on the expected valueof y Creating twins is not a variable directly subject to policymanipulation

The dif culty in the use of twins arises for the same reasonalready discussedmdashnamely that the preference for model-freeestimation of policy effects leads to a lack of interest in thefunction E4dmdashz1 x5 and hence to a lack of interest in what canbe learned from a study of twins about the effects of someother more relevant policy variable that might be manipulatedA model for the E4dmdashz1 x5 is needed to make that connectionand hence to make a study of twins of any interest yet that isintentionally eschewed in the approach proposed by AngristOnce again this leads to an uninteresting and quite limitingset of exercises

Conclusions The set of methods laid out in the Angristarticle at least if a more relevant instrument were used yieldsa set of model-free statistics that are suitable for exploratorywork on a research question prior to the (possibly nonlinear)

Mullahy Comment 23

structural estimation and prediction that should be the ultimateobject of applied economics research

ACKNOWLEDGMENTS

This comment is a revision of remarks delivered at theJoint Statistical Meetings Baltimore August 1999 I thankCarl Christ James Heckman Joel Horowitz Michael KeaneThomas Mroz Geert Ridder and Edward Vyltacil for com-ments with the usual disclaimer that the views expressed hereshould not be taken as representing those of any of theseindividuals

ADDITIONAL REFERENCES

Angrist J and Imbens G (1999) Comment on ldquo Instrumental VariablesA Study of Implicit Behavioral Assumptions Used in Making Pro-gram Evaluationsrdquo by J J Heckman Journal of Human Resources 34283ndash827

Blundell R and Smith R (1993) ldquoSimultaneous Microeconometric ModelsWith Censored or Qualitative Dependent Variablesrdquo in Handbook of Statis-tics (Vol 1) eds G S Maddala C R Rao and H D Vinod AmsterdamNorth-Holland pp 117ndash141

Christ C (1994) ldquoThe Cowles Commissionrsquos Contributions to Econometricsat Chicago 1939ndash1955rdquo Journal of Economic Literature 32 30ndash59

Heckman J (1997) ldquoInstrumental Variables A Study of Implicit BehavioralAssumptions Used in Making Program Evaluationsrdquo Journal of HumanResources 32 441ndash462

mdashmdashmdash (1999) ldquo Instrumental Variables A Reply to Angrist and ImbensrdquoJournal of Human Resources 34 828ndash837

mdashmdashmdash (2000) ldquoCausal Parameters and Policy Analysis in Economics ATwentieth Century Retrospectiverdquo Quarterly Journal of Economics 11545ndash97

Heckman J and MaCurdy T (1985) ldquoA Simultaneous Equations LinearProbability Modelrdquo International Economic Review 18 21ndash37

Heckman J and Vytlacil E (1999) ldquoLocal Instrumental Variables andLatent Variable Models for Identifying and Bounding Treatment EffectsrdquoProceedings of the National Academy of Sciences 96 4730ndash4734

Horowitz J (1993) ldquoSemiparametric and Nonparametric Estimation of Quan-tal Response Modelsrdquo in Handbook of Statistics (Vol 11) eds G SMaddala C R Rao and H D Vinod Amsterdam North-Holland pp45ndash72

Ichimura H and Thompson T (1998) ldquoMaximum Likelihood Estimation ofa Binary Choice Model With Random Coef cients of Unknown Distribu-tionrdquo Journal of Econometrics 86 269ndash295

Manski C (1988) ldquoIdenti cation of Binary Choice Modelsrdquo Journal of theAmerican Statistical Association 83 729ndash738

Marschak J (1953) ldquoEconomic Measurements for Policy and Predictionrdquoin Studies in Econometric Method eds W Hood and T KoopmansNew York Wiley pp 1ndash26

Powell J (1994) ldquoEstimation of Semiparametric Modelsrdquo in Handbook ofEconometrics (Vol 4) eds R Engle and D McFadden Amsterdam North-Holland pp 2443ndash2521

CommentJohn Mullahy

Departments of Preventive Medicine and Economics University of Wisconsin Madison WI 53705 andNational Bureau of Economic Research Cambridge MA (jmullahyfacstaffwiscedu)

Articles bearing titles containing phrases like ldquoSimpleStrategies for Empirical Practicerdquo are often wolves masquerad-ing as sheep the methodologies they devise being far fromldquosimplerdquo and far removed from what most humble practition-ers perceive in the realm of ldquoempirical practicerdquo Not so hereJoshua Angrist has written a tight comprehensive article thatis stimulating and important yet also eminently useful

As typi es much of Angristrsquos work the main concern in thisarticle is on how to elicit interesting characterizations of causaleffects from microdata with the particular twist here being afocus on outcomes measured as ldquolimited dependent variablesrdquo(LDVrsquos) The main take-away message I glean from this arti-cle is that applied analysts working on causal effect or struc-tural analyses in LDV contextsmdashtraditionally vexing contextsinsofar as consistent estimation and inference are concernedmdashhave considerable grounds for optimism Angrist lays out andinterprets systematically a set of issues and methods that pro-vide practitioners with a variety of implementable strategiesthat might be brought to bear on such empirical problemsA corollary take-away message is that in some respects ldquothisstuff is not really as hard as wersquove tended to make itrdquo withAngrist demonstrating for instance the potential merits ofsimple linear instrumental variable ( IV) methods for estimat-ing causal effects in a variety of LDV contexts In no eventcan applied analysts escape the requirement of nding theo-retically sound instruments but Angrist exposits compellingly

a broad description of how such instruments might be used toelicit interesting causal inferences in LDV contexts

I have no real quibbles with any of the substance ofAngristrsquos arguments Rather I will devote my commentarymainly to amplifying and expanding several of the themes hedevelops throughout the article

1 FOCUS ON CAUSAL OR PARTIAL EFFECTSVERSUS FOCUS ON CONDITIONAL

EXPECTATIONS FUNCTIONS

It seems fair to suggest that much of applied microecono-metrics is concerned primarily with understanding the signsand magnitudes of quantities like iexclE6Y mdashX1 D7=iexcl4X1 D5 oriexclE6Y mdashX7=iexclX0 Yet much of the actual dirty work in under-taking causal analysis in LDV contexts seems to result fromdecisions to undertake analyses in settings where E6Y mdashX1D7

or E6Y mdashX7 are restricted to be positive without a priori restric-tions on parameter values [thus the tradition of using tobit-class conditional expectations functions (CEFrsquos) two-stepselection models exponential CEFrsquos etc]

As practitioners however we should pause to assesswhether speci cations akin to E6Y mdashX1D7 D exp4Xsbquo C D5 gt

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

24 Journal of Business amp Economic Statistics January 2001

0 are ultimately buying anything important with respect tothe rst-order questions being explored It is only com-mon sense that analysts should spend relatively more energyworking on understanding quantities of primary interest(eg iexclE6Y mdashX1D7=iexcl4X1 D5) and worrying relatively lessabout the formulation and implications of assumptions (egE6Y mdashX1 D7 gt 0) that may have little or no ultimate bearingon the particular questions being addressed This is espe-cially so when the latter effort (restricting via functional formchoice E6Y mdashX1 D7 gt 0) will tend to complicate rather thanfacilitate the former effort (understanding the partial or causaleffects) This is not to suggest that quantities like E6Y mdashX7 andE6Y mdashX1 D7 may not themselves be interesting to estimate (egmodeling conditional mean health care expenditures for use inforecasting future expenditure levels) but whether E6Y mdashX1D7

is parametrically and parsimoniously better approximated byexp4XsbquoC D5 or by XsbquoC D or by some other g4Xsbquo CD5

or h4X1D1 ƒ5 ismdasheven in LDV settingsmdashultimately an empir-ical matter to be assessed by goodness of t (eg via condi-tional moment tests)

I note this important angle somewhat sheepishly sincein both articles of mine that Angrist cites (Mullahy 19971998) there is considerable emphasis on the use of exponen-tial CEF speci cations as enforcers of positive conditionalmean ldquorequirementsrdquo In fact the main motivation for theearlier article was to nd a general approach to obtainingconsistent estimators of structural parameters in parametricsettings less distribution-bound than tobit-class models whenthe key requirement or side constraint is E6Y mdashX1 D7 gt 0 withthe resulting strategy a more or less brute-force method foraccomplishing this in a nonlinear IV setting Angristrsquos arti-cle takes this idea quite a bit further and demonstrates how(and with the addition of a further normalization) the struc-tural parameter estimation approach I discussed can be devel-oped into a model to analyze causal effects when the latter arecharacterized as ldquoproportional treatment effectsrdquo

Whether a proportional as opposed to an additive treatmenteffect is interesting (it may or may not be) remains to be seenin any particular application Wooldridgersquos recent work onaverage causal effects (ACErsquos Wooldridge 1999) would seempertinent in contexts in which the role of the X covariates ismore ldquointrusiverdquo than in the simple additivelinear treatment-effect setup As a general matter the extent to which infer-ences about causal effectsmdashhowever formulatedmdashhinge onthe inclusion of particular covariates as conditioning regres-sors is an interesting and potentially important angle thatAngristrsquos article begins to explore and is one that likely tomerit additional research For the particular issues at handthe very nature of proportional treatment effects brings intofocus explicitly the role of the X rsquos so how their correlationwith the treatment indicators 4D5 in uences inferences aboutthe causal impacts of the latter would seem to be a rst-orderconsideration

2 TWO-PART MODELS

I found Angristrsquos analysis of the interpretation of causaleffects in two-part models to be extremely illuminatingIn a reduced-form setting I argued (Mullahy 1998) that amid

all the concerns about zeros robustness outliers transforma-tion retransformation and such that have typically attendedmodeling efforts involving two-part models it would be sur-prising in a well-structured empirical investigation if therewere not stated or lurking concerns about partial effectsiexclE6Y mdashX7=iexclX andor CEFrsquos E6Y mdashX7 themselves It may be insome cases that such concerns are not articulated and it mayeven be in less well-structured problems that they are not obvi-ous to analysts themselves Yet for instance unless robustlyestimated sbquorsquos from a log-linear part-2 speci cation of a two-part model could inform a rst-order question about a par-tial effect or a CEF of what practical use were they likelyto be My rather modest and rather obvious argument wasjust that the concerns about the iexclE6Y mdashX7=iexclX or the E6Y mdashX7mdashif they are the reason the analysis is being conducted in the rst placemdashshould enjoy rst-order prominence in the estima-tion exercise and that concerns about zeros outliers and suchshould be relegated in some sense to second-order status Insome cases two-part models will serve nicely to address such rst-order concernsmdashat least in reduced-form settingsmdashbut inothers they will not

Angrist appears to have some sympathy with such argu-ments but more importantly he provides a valuable serviceto users of two-part models by unearthing one major limi-tation in the analysis of causal effects The problem is notwith part 1 (ie the logit probit or linear probability com-ponent) because part 1 of the two-part model falls within themain lines of Angristrsquos LDV analysis Rather the complicationarises with part 2 of the two-part model in which the additionalldquoY gt 0rdquo conditioning arises In the standard (reduced-form)two-part model quantities like E6Y mdashX1Y gt 07 are prominentand identi ed readily But in the counterfactual settings inwhich causal effects are manifested wherein the realized Y

arises from self-selection into or out of treatment such selec-tion effects introduce an ambiguity [Eq (12)] and therebyconfound the ability to glean causal effects from part-2 esti-mates (unless as Angrist notes censored regression methodsare used but Angrist also offers some compelling argumentsagainst blind reliance on censored regression approaches) Assuch if estimation of and inference about causal effects arethe primary analytical concerns in data settings with Y para 0and Pr4Y D 05 gt 01 analysts may be well advised to avoidtwo-part modeling strategies and pursue some of the moredirect linear and nonlinear estimation approaches discussedby Angrist in which one-part estimation approachesmdashzerosand allmdashyield estimates of causal effects that will be directlyinterpretable

3 BEYOND CEFrsquoS

One noteworthy feature of the methods Angrist discussesmdashwith the main results attributed to Abadie (1999)mdashare theirapplicability to estimation of causal effects for characteristicsof the conditional distribution rdquo4Y mdashX1D5 beyond just CEFrsquosEstimation of causal or treatment effects for conditional quan-tiles (QTErsquos) and conditional distribution function ordinates(ldquodistilesrdquo) Pr4Y lt cmdashX1D5 is demonstrated to t properlywithin the weighting strategies advanced by Abadie Since

Todd Comment 25

quantile and distile analyses may be more relevant in address-ing particular analytical questions or policy issues than CEFrsquosthe ability to extend the mechanisms of causal-effect esti-mation strategies in such directions should be welcomed byapplied researchers

I might suggest that Angristrsquos claim about an advantage ofquantile causal analysis over distile causal analysis being thatthe latter relies on application-specic values of ordinates (c)is perhaps oversold or underdeveloped In many interestingand policy-relevant applications focus on the application-speci c ordinates is fundamental What factors cause employ-ers to offer more than one health insurance plan to employeesWhat factors cause consumers of alcoholic beverages to drinkmore than two drinks per day What factors cause elderlyindividuals to utilize more than one prescription drug prod-uct What factors cause second earners to work more than17 hours per week Each of these concerns an importantreal-world causal question for which the application-specicordinatemdashwhether for institutional legal physiological orother reasonsmdashis a fundamental ldquopivot pointrdquo for the analysisAnalysis of such distile relationshipsmdashwhether in causal or inreduced-form settingsmdashis a potentially powerful method forinforming decision makers and it seems to me that its mer-its relative to quantile analyses would have to be judged on acase-by-case basis

4 CONCLUSIONS

Practitioners of applied microeconometrics will nd thatthe time invested in a careful reading of Angristrsquos article hassizable returns not merely because of the analytical and con-ceptual insights it offers but ultimately also because it sug-gests a variety of important practical innovations that appliedresearchers can exploit empirically to more clearly understandthe nature of causal relationships in their data Applied microresearchers in areas like health labor development public nance and such work commonly with data wherein (a) out-come measures are limitedmdashoften most importantly becausethey are nonnegative with mass points at 0mdashand (b) the rolesof conditioning covariates (X rsquos) are most conveniently summa-rized via regression methods Angristrsquos article provides appliedresearchers so endowed with a set of powerful tools for elic-iting causal inferences from such data Of course the punchline is familiar and inevitable All bets are off without theo-retically solid instruments But with such instruments in handthe strategies exposited by Angrist offer analysts a rich set ofperspectives on estimation

ADDITIONAL REFERENCE

Wooldridge J M (1999) ldquoEstimating Average Partial Effects Under Con-ditional Moment Independence Assumptionsrdquo mimeo Michigan StateUniversity Dept of Economics

CommentPetra Todd

Department of Economics University of Pennsylvania Philadelphia PA 19104 (petraathonasasupennedu)

This article considers ways of estimating the effect ofbinary endogenous regressors in models with limited depen-dent variables It questions the usefulness of conventionalestimation strategies aimed at recovering structural modelparameters and advocates the use of simple instrumental vari-ables ( IV) estimators as an alternative on the grounds thatthese estimators invoke weaker assumptions and often suf ceto answer questions of interest in empirical studies In theauthorrsquos view the main challenge facing empirical researchersis the problem of identi cation of ldquotreatment effectsrdquo throughIV Here I expand the discussion by considering two ques-tions (1) What are the limitations of the causal model asa paradigm for policy analysis (2) When simple estimatorsare suitable for answering a question of interest what are thetrade-offs that need to be considered in using them

1 WHEN IS THE ldquoCAUSAL MODELldquoA USEFULPARADIGM FOR POLICY ANALYSIS

The article gives the impression that most interesting ques-tions in economics can be answered within the context ofthe ldquocausal modelrdquo [variously attributed to Neyman (1923)Fisher (1935) Cox (1958) Roy (1951) and Rubin (1978)]The causal model is a very general framework that assumesthat there are potential outcome states associated with having

received some treatment and having received no treatment (orpossibly a different treatment) For each individual only oneof the potential outcome states is observed which leads to amissing-data problem in attempting to draw inferences aboutaspects of the treatment effect distribution

The language of potential outcomes is very general soalmost any economic problem could be formulated in theseterms However this does not mean that estimators proposedin the literature for the causal model are useful for all or evenmost economic problems A major limitation of the modelis that it assumes that the state ldquowith treatmentrdquo has beenobserved for at least a subset of people In medical trials orbiological experiments this assumption is probably reason-able but in economics we often are interested in evaluatingeffects of treatments that have not yet been implemented Forexample we might be interested in predicting the effect ofraising the age receiving Social Security bene ts to a new ageor the effect of introducing new term limits on welfare par-ticipation An implicit assumption needed to apply any of theidenti cation results described in this article is that both Y1

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

26 Journal of Business amp Economic Statistics January 2001

and Y0 are observed which would not be satis ed in manycases Thus the causal model and the estimators developedfor it are inadequate when it comes to any economic questioninvolving a state of the world that has not been observed

Despite these limitations treatment effect estimates aresometimes used to predict consequences of new treatments orof existing treatments for new populations When the policychange being considered is very close to changes observedin the data and is to be applied to similar individuals thenextrapolation may be reasonable But if the policy change is(a) of a magnitude outside the range of experience (b) alonga new dimension or (c) intended for a new population then itis an open question as to how reliable the estimates would beTreatment impact estimates are not intended to capture any-thing invariant to changing conditions The assumptions thatwould be required to justify extrapolations or generalizationsare often quite strong To some extent it is an illusion thatthese estimators allow for general kinds of inference underweak assumptions because the assumptions required to justifytheir application for particular purposes are often not madeexplicit

Angrist argues that the identi cation results for the localaverage treatment effect (LATE) estimator provide a ldquofounda-tion for credible causal inferencerdquo (p 5) and a ldquominimum con-troversy jumping-off point for any prediction exerciserdquo (p 5)But as discussed previously the causal model maintains sucha high level of generality that it offers little guidance for manykinds of problems It is important to recognize when and whennot the LATE estimator is likely to be useful This articleclaims that ldquoin practice estimates of LATE differ little fromestimates based on the stronger assumptions invoked to iden-tify effects on the entire treated populationrdquo (p 5) But suchan extrapolation is not justi ed under the theory and is surelyunlikely to hold in all circumstances As shown in a num-ber of works by various authors LATE provides the averageeffect of treatment for the group of ldquocompliersrdquo who are peo-ple induced by the instrument to receive treatment This groupdoes not except under very special circumstances correspondto the group of treated persons and may in fact representonly a small fraction of persons receiving treatment So notonly does LATE not suf ce for answering problems of the sortdescribed by (a) (b) and (c) it also does not generally pro-vide the average effect of treatment on the treatedmdashthe mostcommon parameter of interest in analyzing social programsAnother not-so-attractive feature of the estimator is that thepopulation of ldquocompliersrdquo is usually not identi able so wecannot say exactly whose treatment impact is being estimatedApplying the LATE parameter estimate to the full populationof treated persons would generally require ruling out certaintypes of heterogeneity in individual responses to treatmentas discussed by Heckman (1997) The explicit assumptionsthat could justify this type of extrapolation were shown byHeckman and Vytlacil (2000 in press)

2 ESTIMATORS SHOULD BE TAILOREDTO THEIR APPLICATION

A general theme throughout the article is that wheneverpossible researchers should restrain from imposing paramet-ric restrictions in estimation because the structure they impose

may be wrong resulting in biased estimates This view isexpressed for example in the discussion of the two-partmodel The article advocates the use of a two-part approachinstead of a conventional selection-model approach maintain-ing that the two-part approach is less likely to lead to biasedparameter estimates because it does not impose cross-equationrestrictions as a conventional tobit estimator would

How do researchers decide whether or not to impose restric-tions across estimating equations that are structurally linkedThe most agnostic approach would be to estimate both equa-tions completely nonparametrically but this approach is rarelypractical Nonparametric estimators are consistent under themost general conditions but in conventional size samples thestandard errors would be so large that the estimates would forpractical purposes be useless Researchers impose structurebecause they are willing to restrict the class of models underwhich their estimators are consistent in exchange for greaterprecision These ef ciency considerations would also apply inthe deciding whether or not to impose cross-equation restric-tions in estimating simultaneous-equations models If we areright about the structure there is an ef ciency gain fromimposing the restrictions If we are wrong imposing them maylead to bias When there are overidentifying restrictions it isat least possible to develop tests of the model speci cation thatcan serve as a guide in choosing a structural model that is sup-ported by the data This article presents a dichotomy betweenthe two-part model and a fully parametric traditional selec-tion speci cation but there are many other modeling choicesranging from fully nonparametric to semiparametric to fullyparametric In modest size samples ef ciency is an impor-tant consideration and a parametric model may be the mostappropriate one For very large samples fewer parametricapproaches are feasible Both the probability-of-working andthe hours-worked equations could for example be estimatedby a semiparametric method such as semiparametric leastsquares ( Ichimura 1993) Which estimation method is mostappropriate ultimately depends on how much data is avail-able and how many parameters are to be estimated It doesnot make sense to criticize the use of structured approaches infavor of more exible modeling approaches without regard tothe context in which they are being applied

3 ON THE GAP BETWEEN THEORY ANDEMPIRICAL PRACTICE

Finally another recurring emphasis in the article is on thevalue of shortcuts in empirical practice In the authorrsquos viewthere is a gap between the econometric theory and what isfeasible in practice and shortcuts and approximations help tobridge this gap To this end the article extols for examplethe virtues of ordinary least squares (OLS) as a way of approx-imating any unknown conditional expectation function with-out much concern as to whether a best linear approximation isa good approximation If the model has discrete regressors andis fully saturated it is well known that OLS is equivalent to anonparametric estimator But in other cases a linear approxi-mation could be highly biased and the bias will not go awayas the sample size gets large There are a variety of alterna-tive asymptotically unbiased estimators Along similar lines

Angrist Reply 27

the article suggests that in implementing a ldquoselection-on-observablesrdquo estimator it may be all right simply to ignore theweighting scheme Such claims seem to me fairly outrageousbecause failure to take into account weighting could lead tovery different estimates and would not provide a consistentestimator of the parameter of interest Heckman Ichimura andTodd (1998) discussed alternative estimators that are not dif -cult to implement and take into account the weighting scheme

Finally the appendix presents a variety of shortcuts forobtaining estimates and standard errorsmdashsome of which leadto inconsistent estimates When shortcuts lead to very crudeapproximations or inconsistent estimates I question their use-fulness I do think it unreasonable to expect that estimatesbased on simple IV estimators be accompanied by correctstandard errors that correctly take into account the in uence ofestimated parameters of estimated regressors I see little valuein reporting incorrect standard-errors estimates on the groundsthat it saves time in computation especially when there areseveral works long available in the literature that develop sim-ple ways of solving these kinds of problems (eg see Newey1984) Although there is surely a gap between the estimatorsdeveloped in the theoretic econometric literature and the esti-mators used in the empirical literature the gap is not as greatas this article supposes

ADDITIONAL REFERENCES

Cox D R (1958) The Planning of Experiments New York WileyFisher R A (1935) Design of Experiments London Oliver and BoydHeckman J J (1997) ldquo Instrumental Variables Evidence From Evaluating a

Job Training Programmerdquo Journal of Human Resources 32 441ndash462Heckman J Ichimura H and Todd P (1997) ldquoMatching as an Econometric

Evaluation Estimator Theory and Evidence on its Performance Appliedto the JTPA Program Part I Theory and Methodsrdquo Review of EconomicStudies 64 605ndash654

mdashmdashmdash (1998) ldquoMatching as an Econometric Evaluationrdquo Review of Eco-nomic Studies 65 261ndash294

Heckman J and Vytlacil E (2000) ldquoCausal Parameters Structural Equa-tions Treatment Effects and Randomized Evaluations of Social Programsrdquounpublished manuscript University of Chicago Dept of Economics

mdashmdashmdash (in press) ldquoLocal Instrumental Variablesrdquo in Nonlinear StatisticalInference Essays in Honor of Takeshi Amemiya eds C Hsiao KMorimune and J Powell Cambridge UK Cambridge University Press

Ichimura H (1993) ldquoSemiparametric Least Squares (SLS) and Weighted SLSEstimation of Single Index Modelsrdquo Journal of Econometrics 58 71ndash120

Neyman J (1923) ldquoStatistical Problems in Agricultural Experimentsrdquo Sup-plement to the Journal of the Royal Statistical Society 2 107ndash180

Newey W (1984) ldquoA Method of Moments Interpretation of Sequential Esti-matorsrdquo Economic Letters 14 201ndash206

Roy A (1951) ldquoSome Thoughts on the Distribution of Earningsrdquo OxfordEconomic Papers 3 135ndash146

Rubin D (1978) ldquoBaysian Inference for Causal Effects The Role of Ran-domizationrdquo The Annals of Statistics 6 34ndash58

ReplyJoshua D Angrist

I thank the discussants and the session organizer outgoingJBES editor Jeff Wooldridge who will be missed by associateeditors and JBES authors alike The comments in this sessionoffer a variety of viewpoints some sympathetic others morecritical I am pleased to be part of this stimulating and con-structive exchange and look forward to more interaction ofthis type in the future

Hahn My article starts by suggesting that a focus ontreatment effects simpli es the problem of causal inferencein limited dependent variable (LDV) models with endoge-nous regressors Endogeneityrsquos half-sister is the assumptionof unobserved individual effects since the omitted-variablesmotivation for xed effects in panel data is similar Hahnasks whether a shift in focus from index parameters to aver-age causal effects might also simplify the notoriously dif -cult problem of working with nonlinear models for binarypanel data (eg as in Card and Sullivan 1988) Hahn showsthat here too the causal-effects framework pays off demon-strating that average causal effects are easily estimated in a xed-effects probit model with a binary regressor even thoughthe probit coef cient is not A second noteworthy featureof Hahnrsquos comment is his observation that identi cation inthis framework turns on careful modeling of the relationshipbetween the assignment variable and the xed effects Tradi-tional econometric models are primarily concerned with thestochastic process generating outcomes A shift of attentiontoward modeling the assignment mechanism is consistent with

the notion that observational studies should try to mimic thesort of controlled comparisons generated by experiments

Imbens Imbens begins by endorsing the view that the sub-stantive goals of empirical research are better served by thepotential outcomescausal framework than by structural mod-eling Going one step further Imbens notes that the problemswith ldquomodel- rst econometricsrdquo are not limited to LDVrsquos Inthe same spirit he sensibly wonders (as did Wooldridge 1992)why so much attention has been devoted to estimating theBoxndashCox transformation model which does not actually iden-tify E6Y mdashX7 without a distributional assumption Imbens alsonicely articulates the rationale for looking at the twins exam-ple and notes that this is similar to the rationale for extrap-olating from clinical trials in narrow subpopulations And infact I believe the ldquotwins experimentrdquo is likely to have pre-dictive value in a variety of situations though any extrapo-lation will probably be improved by control for demographiccharacteristics

In spite of (or perhaps because of) our years of discussionand collaboration Imbens and I still do not agree on certainthings He feels strongly that inference with discrete outcomesshould use models and procedures that respect inherent non-

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

28 Journal of Business amp Economic Statistics January 2001

linearities The HIRZ procedure discussed in his comment isan intriguing way to do this and my article discusses othersI remain unconvinced however that nonlinearity is of majorsubstantive importance in empirical work with binary regres-sors The proof of the pudding I suppose would be to showthat H IRZ or some of the methods I discuss in the article leadto better predictions or decisions than the linear workhorse

Mof tt Mof t provides a valuable counterpointHe beginsby arguing that latent-index models have more predictivepower than reduced-form causal models Such claims seemhard to establish a priori though I do not doubt that in somecases this may be true But empirical strategies in which thebulk of the research effort goes into estimating index coef -cients may fail to answer the most basic question in empiricalwork What happened here Causal models and local aver-age treatment effect (LATE) answer this question while indexcoef cients alone are not suf cient for causal inference Iagree however that structural modeling can be useful forgoing beyond basic questions and I have used index and otherstructural models for this purpose Mof t is right howeverin suggesting that I nd the model-free analysis he refers toas ldquoexploratoryrdquo more interesting than structural extrapolationThis seems to me to be where the potential for real discoverylies

Mof tt asks tough but fair questions about the twins appli-cation He is correct to point out that twinning is not a manip-ulable policy instrument That is one reason I do not limit theanalysis to reduced-form effects of multiple births Yet to mymind the best evidence we have on the controversial questionof the consequences of out-of-wedlock childbearing comesfrom the twins experiment (Bronars and Grogger 1994) Andthe economic consequences of childbearing are of interest inmany other contexts As noted in the introduction to my arti-cle the twins experiment can help determine whether fertilityreductions are responsible for trend increases in female laborsupply Beyond this sort of question the twins results seemuseful for assessing the economic consequences of increasedabortion access in the United States the dissemination of con-traceptive implants in developing countries (eg see KaneFarr and Janowitz 1990) and various programs to encour-age or require implant use by welfare mothers (Haveman andWolfe 1993)

Finally it is fair to question the external validity of twinsestimates as Mof t does toward the end of his commentAngrist and Evans (1998) showed that the labor-supply con-sequences of twinning can be reconciled with estimates ofthe labor-supply consequences of childbearing resulting froma very different natural experiment induced by preferences formixed-sex sibling pairs This suggests that results from bothexperiments may be quite robust and may even be ldquostructuralrdquoin the sense that they tell us something general about the eco-nomic response to childbearing

Mullahy Mullahy brings a welcome health economistrsquosperspective to the discussion Health economists have longgrappled with the issues discussed in my article and asI hope I made clear I drew ideas and inspiration fromMullahyrsquos work on models with nonnegative dependent vari-ables Mullahyrsquos comment further highlights exactly the sort

of utilitarian trade-off that underlies the choice of empiricalstrategy On one hand we would like our models to be con-sistent with any functional form restrictions known to be trueOn the other attempts at this may complicate estimation andinference for little payoff Worst of all the technical complex-ity of nonlinear models seems to cause authors to ldquowax econo-metricrdquo an effort that may come at the expense of attention tosubstantive issues of importance (an example here is the longseries of papers discussing exactly how data from the RANDhealth-insurance experiment should be analyzed) Mullahyrsquosheretical discussions of ldquoone-part modelsrdquo here and in earlierwork represent a refreshing effort to keep the empirical trainon track Mullahyrsquos point that application-specic distributionordinates can provide more useful summary information thanquantiles is also clearly worth thinking about

Todd Todd provides a useful and clear overview of thecausal framework which she suggests is better suited formedical questions than economic questions The thrust of herargument is that there is too much heterogeneity in economicinterventions and outcomes for the causal approach to bevery useful for economists But my perception is that medicalresearch is even more explicitly concerned with heterogene-ity than economics This is evidenced by the many quali ca-tions regarding patient characteristics conditions and inter-actions in medical treatment protocols A difference betweenthe prevailing medical research paradigm and the traditionaleconometric perspective however is whether the question ofheterogeneous response is to be addressed by better theoryor more evidence Of course theory helps us decide whereto look for heterogeneity but as Rosenbaum (1999 p 301)put it ldquo If a treatment has substantially different effects in dif-ferent types of people then we need to discover this (ital-ics mine) Rosenbaum also noted that ldquorandomized clinicaltrials routinely look for heterogeneous treatment effects andthey do this without representative samples from nationalpopulationsrdquo

On the technical side Todd suggests I advocated use ofthe two-part model because it exibly approximates nonlinearfunctional forms for nonnegative outcomes Although I notedthe exibility the upshot of my discussion was that the two-part model is poorly suited to causal inference because of thedif culty of interpreting part 2 Finally Todd points out thefailure to allow for sampling variance in the scaling factorsused to generate effects for nonlinear models This is indeed ahard-to-defend shortcut though I shall take a stab at defenseby noting that randomness of the scaling factor which typi-cally estimates a sample average is likely to be a minor sourceof sampling uncertainty

ADDITIONAL REFERENCES

Card D E and Sullivan D (1988) ldquoEffects of Subsidized Training onMovements In and Out of Employmentrdquo Econometrica 56 497ndash530

Haveman R and Wolfe B (1993) ldquoChildrenrsquos Prospects and ChildrenrsquosPolicyrdquo Journal of Economic Perspectives 7 153ndash174

Kane T Farr G and Janowitz B (1990) ldquo Initial Acceptability of Con-traceptive Implants in Four Developing Countriesrdquo International FamilyPlanning Perspectives 16 49ndash54

Rosenbaum P R (1999) ldquoChoice as an Alternative to Control in Observa-tional Studiesrdquo Statistical Science 14 259ndash304

Wooldridge J (1992) ldquoSome Alternatives to the BoxndashCox RegressionModelrdquo International Economic Review 33 935ndash955

Page 10: New Estimation of Limited Dependent Variable Models With Dummy …piketty.pse.ens.fr/…ecoineg/articl/Angrist2001b.pdf · 2011. 12. 5. · sequences of changes in contraceptive technology

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 11

estimation of QŠi In the following example I use the fact that

QŠiD E6Ši

mdashXi1 Di1 Yi7 D 1 ƒ Di41ƒ E6ZimdashYi1Di1Xi75

41ƒ E6ZimdashXi75

ƒ 41 ƒ Di5E6ZimdashYi Di1Xi7

E6ZimdashXi7

(27)

and estimate E6ZimdashYi1Di1Xi7 and E6Zi

mdashXi7 with a probit rststep Since QŠi is theoretically supposed to be positive negativeestimates of QŠi are set to 0

5 APPLICATION LABORndashSUPPLYCONSEQUENCES OF A THIRD CHILD

The estimation uses a sample of roughly 250000 marriedwomen aged 21ndash35 with at least two children drawn from the1980 Census 5 le About 53 of the women in this sampleworked in 1979 Overall (ie including zeros) women in thesample worked about 17 hours per week This can be seen inthe rst column of Table 1 which reports descriptive statis-tics and repeats some of the OLS and 2SLS estimates fromAngrist and Evans (1998) Roughly 38 of women in thissample had a third child an event indicated by the variableMorekids The OLS estimates in column (2) show that womenwith Morekids D 1 were about 17 percentage points less likelyto have worked in 1979 and worked about 6 hours fewer perweek than women with Morekids D 0 The covariates in thisregression are age age at rst birth a dummy for male rst-born a dummy for male secondborn and Black Hispanic andother race indicators

Table 1 also reports estimates of average effects computedusing nonlinear models still treating Morekids as exogenousThese average effects are approximations to effects of treat-ment on the treated evaluated using derivatives to simplifycomputations A detailed description of the average-effectscalculations appears in the appendix Probit estimates of theaverage impact on employment shown in column (3) arealmost identical to the OLS estimates Similarly the tobit esti-mate of the average effect of Di on hours worked shown incolumn (4) is ƒ6001 remarkably close to the OLS estimateof ƒ6002 (Note however that the tobit coefcient is ƒ1107)Column (5) of Table 1 reports estimated average effects froma two-part model in which both parts of the model are linear

Table 1 Descriptive Statistics and Baseline Results

Morekids exogenous Morekids endogenous

Nonlinear models Reduced forms2SLS

Dependent Mean OLS Probit Tobit 2PM (Morekids) (Depvar) effectvariable (1) (2) (3) (4) (5) (6) (7) (8)

Employment 0528 ƒ0167 ƒ0166 mdash mdash 0627 ƒ0055 ƒ0088(0499) (0002) (0002) (0003) (0011) (0017)

Hours worked 1607 ƒ6002 mdash ƒ6001 ƒ5097 0627 ƒ2023 ƒ3055(1803) (0074) (0073) (0073) (0003) (0371) (0592)

NOTE The sample includes 254654 observations and is the same as that of Angrist and Evans (1998) The instrument is an indicator for multiple births The mean of the endogenousregressor is 381 The probability of a multiple birth is 008 The model includes as covariates age age at rst birth boy rst boy second and race indicators Standard deviations are shownin parentheses in column 1 Standard errors are shown in parentheses in other columns

The 2PM estimate is virtually identical to the tobit and OLSestimates

Roughly 810 of 1 of women in the extract had a twinsecond birth (multiple births are identi ed in the 1980 Cen-sus using age and quarter of birth) Reduced-form estimates ofthe effect of a twin birth are reported in columns (6) and (7)The reduced forms show that women who had a multiple birthwere 63 percentage points more likely to have had a third childthan women who had a singleton second birth Mothers oftwins were also 55 percentage points less likely to be work-ing (standard error D 001) and worked 22 fewer hours perweek (standard error D 037) The 2SLS estimates derived fromthese reduced forms reported in column 8 show an impactof about ƒ009 (standard error D 002) on employment rates andƒ306 (standard error D 06) on weekly hours These estimatesare just over half as large as the OLS estimates suggestingthat the latter exaggerate the causal effects of childbearingOf course the twins instrument is not perfect and the 2SLSestimates may also be biased For example twinning probabil-ities are slightly higher for certain demographic groups ButAngrist and Evans (1998) found that 2SLS estimates usingtwins instruments are largely insensitive to the inclusion ofcontrols for mothersrsquo personal characteristics

Two variations on linear models generate estimates iden-tical or almost identical to the conventional 2SLS estimatesThis can be seen in columns (2) and (3) of Table 2 whichreport 2SLS estimates of a 2PM for hours worked and Abadie(2000b) Causal-IV estimates of linear models for employmentand hours The Causal-IV estimates use probit to estimateE[Z|X] and plug this into the formula for Ši The second stepin Causal-IV estimation is a weighted least squares problemwith some negative weights Since use of negative weights isnonstandard for statistical packages (eg Stata does not cur-rently allow this) I used a MATLAB program available fromAlberto Abadie to compute the estimates The 2PM estimatesin Table 3 were constructed from 2SLS estimates of a linearprobability model for participation and 2SLS estimates of alinear model for hours worked conditional on working In prin-ciple the 2PM estimates do not have a causal interpretationbecause the instruments are not valid conditional on workingIn practice however 2SLS estimates of the 2PM differ littlefrom conventional 2SLS estimates

The estimates of nonlinear=nonstructural models for meaneffects are mostly similar to each other and to conventional

12 Journal of Business amp Economic Statistics January 2001

Table 2 Impact on Mean Outcomes (Morekids endogenous)

Linear models Nonlinear models Structural models

Causal-IV Causal-IV Causal-IV Bivar Endog Mills 2SLSDependent 2SLS 2PM linear Mullahy probit expon probit tobit ratio benchmarkvariable (1) (2) (3) (4) (5) (6) (7) (8) (9) (10)

A With covariates

Employment ƒ0088 mdash ƒ0089 mdash ƒ0088 mdash ƒ0124 mdash mdash ƒ0089(0017) (0017) (0016) (0016) (0017)

Hours worked ƒ3055 ƒ3054 ƒ3055 ƒ3082 mdash ƒ3021 mdash ƒ3081 ƒ4051 ƒ3060(0592) (0598) (0592) (0598) (0694) (0580) (0549) (0599)

B No covariates

Employment ƒ0084 mdash ƒ0084 mdash ƒ0084 mdash ƒ0086 mdash mdash ƒ0084(0017) (0017) (0017) (0017) (0018)

Hours worked ƒ3047 ƒ3037 ƒ3047 ƒ3010 mdash ƒ3012 mdash ƒ3035 ƒ3048 ƒ3052(0617) (0614) (0617) (0561) (0616) (0642) (0641) (0624)

NOTE Sample and covariates are the same as in Table 1 Results for nonlinear models are derivative-based approximations to effect on the treated Causal-IV estimates are based on aprocedure discussed by Abadie (2000b) Standard errors are shown in parentheses

2SLS estimates Results from nonlinear models are againreported as marginal effects that approximate average effectson the treated For example a causal probit model for employ-ment status generates an average effect of ƒ0088 identical (upto the reported accuracy) to the 2SLS estimate Causal-IV esti-mation of an exponential model for hours worked the resultof a procedure that minimizes E6 QŠi4Yi

ƒ exp6X0

i b ƒ aDi7527

generates an estimate of ƒ3021 This too differs little from theconventional 2SLS estimate of ƒ3055 Similarly the Mullahyestimate of ƒ3082 in column (4) is less than 8 larger thanconventional 2SLS in absolute value It is noteworthy how-ever that the Mullahy model generates results that changemarkedly (falling by about 20) when the covariates aredropped Since the covariates are not highly correlated withthe twins instrument this lack of robustness to the inclusionof covariates seems undesirable On the other hand with-

Table 3 Impact on the Distribution of Hours Worked

Distribution treatment effects

Exogenous Morekids Endogenous Morekids Quantile treatment effects

OLS Ordered Causal CausalRange (LPM) Probit probit 2SLS LPM probit Quantile QR QTE(mean) (1) (2) (3) (4) (5) (6) (value) (7) (8)

0 0167 0166 0147 0088 0089 0088 5 ƒ8092 ƒ5024(472) (0002) (0002) (0002) (0017) (0017) (0016) (8) (0186) (0686)1ndash10 0001 0001 0002 ƒ0001 ƒ0001 ƒ0002 6 ƒ1207 ƒ7098(046) (0001) (0001) (000003) (0007) (0007) (0006) (20) (0172) (0860)11ndash20 ƒ0015 ƒ0014 ƒ0011 0002 0002 0002 7 ƒ9054 ƒ6019(093) (0001) (0001) (00001) (0010) (0010) (0010) (35) (0184) (1007)21ndash30 ƒ0024 ƒ0022 ƒ0014 ƒ0004 ƒ0005 ƒ0006 75 ƒ6045 ƒ3060(075) (0001) (0001) (00002) (0009) (0009) (0010) (40) (0156) (1017)31ndash40 ƒ0119 ƒ0110 ƒ0097 ƒ0072 ƒ0072 ƒ0071 8 ƒ1000 0000(277) (0002) (0002) (0001) (0014) (0014) (0016) (40) (0286) (1009)41+ ƒ0009 ƒ0008 ƒ0023 ƒ0009 ƒ0009 ƒ0006 9 mdash mdash

(027) (0001) (0001) (00003) (0005) (0005) (0007) (40)

NOTE The table reports probability-model and (QTE) estimates of the impact of childbearing on the distribution of hours worked The sample and covariates are the same as in Panel A ofTable 2 Causal-IV estimates are based on a procedure discussed by Abadie (1999) Quantile treatments are based on a procedure discussed by Abadie et al (1998) Standard errors areshown in parentheses The standard errors in columns (7) and (8) are bootstrapped

out covariates the Mullahy estimates are close to those fromCausal-IV with an exponential model

The bivariate probit estimate of the effect of childbear-ing on employment status reported in column 7 is ƒ012roughly a third larger in absolute value than the conventional2SLS estimate Interestingly in another application Abadie(2000b) also found that bivariate probit estimates are largerthan Causal-IV The gap between bivariate probit and Causal-IV estimates of effects on labor supply appears to be a conse-quence of the probit model for exogenous covariates Withoutcovariates bivariate probit generates estimates that are veryclose to the results from the other estimators It should benoted however that bivariate probit is not really appropriatefor twins instruments because the probability Morekids D 1 isequal to 1 for twins The probit ML estimator does not existin this case Therefore to compute all of the estimates using a

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 13

probit rst-stage (bivariate probit endogenous tobit and Millsratio) I randomly recoded 1 of Di observations to 0 Inprinciple measurement error in a binary endogenous regressorbiases 2SLS estimates (see Kane Rouse and Staiger 1999)In this case however column (10) shows that 2SLS estimateswith the randomly recoded data differ little from 2SLS esti-mates using the original data

Column (8) of Table 2 reports estimates of a structuraltobit model with endogeneous regressors These estimateswere computed using a two-step estimator that approximatesthe MLE again with recoded data The two-step procedureadds a Mills-ratio type endogeneity correction to a censoredregression and then applies tobit to the censored regres-sion model with the correction term [The correction term islsquo˜6Di4ƒ i=ecirci5C 41ƒDi5 i=41ƒecirci57 where is the corre-lation between the latent error determining treatment assign-ment and the outcome residual lsquo˜ is the standard deviation ofthe outcome residual and i and ecirci are Normal density anddistribution functions evaluated at the probit rst-stage ttedvalues see Heckman and Robb (1985)]

As with the bivariate probit estimate the endogenous tobitestimate is somewhat larger in magnitude than conventional2SLS This may be because of the Mills-ratio procedure forcontrolling for the endogeneity of Di more than the tobitcorrection for nonnegative outcomes To see this note thatthe Mills-ratio estimate ignoring censoring reported in col-umn (9) is considerably larger than the corresponding conven-tional 2SLS estimates Interestingly both the probit and tobitstructural estimators generate results that are more sensitiveto the inclusion of covariates than any of the other estimatorsexcept Mullahyrsquos In fact Panel B of the table shows thatwithout covariates all estimation techniques give very simi-lar results This is not surprising since without covariates theparametric assumptions used in these models are weaker

The last set of results shows that childbearing is associ-ated with marked changes in the distribution of hours workedbeyond the changes in participation already seen This isapparent in the rst columns of Table 3 which report thedistribution of hours worked by interval along with linearprobability estimates of the relationship between childbearingand the probability of falling into each interval (these mod-els include the same covariates used for Table 2) The largestentry in column (1) is for the probability of working zerohours There is also a large negative effect on the probabilityof working 31ndash40 hours per week which shows that womenwho have a third child are much less likely to work full-time Once again probit average effects reported in column2 are almost indistinguishable from the corresponding OLSestimates Estimates from an ordered probit model reportedin column (3) differ from OLS somewhat more but still gen-erate a very similar pattern

Like the 2SLS estimates for average outcomes 2SLS esti-mates of linear probability models for the probability of fallinginto each interval show that models that treat childbearing asexogenous exaggerate the negative impact on labor supply2SLS estimates for the probability of working zero hours areidentical (by construction) to those for employment in Table 1The 2SLS estimates of the impact of childbearing on fulltimework are also considerably less than the corresponding OLSestimates

Nonstructural Causal-IV models treating Morekids asendogenous generate estimates very close to 2SLS estimatesfor effects on the probabilities of hours falling into each inter-val Columns (5) and (6) show that the results are also remark-ably insensitive to whether a linear or probit model is usedto approximate the distribution function The estimates againindicate that childbearing changes the distribution of hours byraising the probability of nonparticipation and by reducing theprobability of fulltime work

Finally the QTE estimator provides useful summary statis-tics for causal effects of childbearing on changes in the distri-bution of hours worked These estimates were computed as thesolution to a weighted quantile regression problem using (27)to construct weights and the reported standard errors are froma bootstrap Quantile regression estimates treating childbear-ing as exogenous show an estimated 9-hour decline in medianhours worked but the QTE estimator suggests that the causaleffect of childbearing on median hours worked is only about ve hours Estimates at higher quantiles are similarly reducedwhen childbearing is treated as endogenous

6 SUMMARY AND CONCLUSIONS

Structural parameters may be of theoretical interest but mustultimately be converted into causal effects if they are to beof use for policy evaluation or determining whether a trendassociation is causal The problem of estimating causal effectsfor LDVrsquos does not differ fundamentally from the analogousproblem for continuously distributed outcomes The key dif-ferences seem to me to be the increased likelihood of inter-est in distributional outcomes and the inherent nonlinearity ofCEFrsquos for LDVrsquos in models with covariates Without covari-ates conventional 2SLS estimates capture both distributionaleffects and effects on means Simple IV strategies devel-oped by Mullahy (1997) and Abadie (2000b) can be used toestimate average effects in nonlinear models with covariateswhile IV strategies for probability models and quantile regres-sion can be used to estimate effects on distributions

These approaches are illustrated here using twin birthsto estimate the labor-supply consequences of childbearingAlternative nonstructural approaches to the estimation ofcausal effects using twins instruments generate similar aver-age effects whether or not the model is nonlinear Structuralestimates tend to be somewhat larger than nonstructural esti-mates when exogenous covariates are included even thoughthe covariates are not strongly related to the twins instru-ment Since the structural models impose additional distribu-tional and functional form assumptions I see no reason toprefer them

Finally the various IV estimates of the effect of childbear-ing on the distribution of hours worked show that the impactof childbearing is characterized by substantially increased non-participation and by an almost equally large shift away fromfulltime work Estimates that treat childbearing as exogenousexaggerate the causal effect of childbearing on changes in dis-tribution as well as on average hours worked These ndingsappear in results from both probability models and quantilemodels with endogenous regressors Both types of modelsprovide an interesting look at the impact of childbearing on

14 Journal of Business amp Economic Statistics January 2001

the distribution of hours worked without the conceptual prob-lems inherent in conditional-on-positive comparisons

ACKNOWLEDGMENTS

I thank the editor and the session discussants for theircomments Special thanks go to Melissa Schettini for out-standing research assistance and Alberto Abadie for exten-sive comments and shared computer programs Thanks also goto Daron Acemoglu Bill Evans Bill Greene Guido ImbensJohn Mullahy Steve Pischke and seminar participants at theUniversity of Texas for helpful discussions and comments onan earlier draft Data and computer programs used here areavailable on request

APPENDIX

A1 Derivation of Equation (12)

Drop the i subscripts Note that

Y0 D 14Y uuml0 gt 05Y uuml

0

Y1D 14Y uuml

0C gt 054Y uuml

0C 5 D 14Y uuml

1 gt 05Y uuml1 0

Since D is independent of Y0 and Y1D 14Y uuml

0C gt 054Y uuml

0C

51D is independent of Y1 Conditional effects on the treatedare therefore the same as conditional effects without condi-tioning on treatment status

E6Y1ƒ Y0

mdashY1 gt 01 D D 17 D E6Y1ƒ Y0

mdashY1 gt 07

E6Y1ƒ Y0

mdashY0 gt 01 D D 17 D E6Y1ƒ Y0

mdashY0 gt 070

The averages on the right side are the causal effects of interestThe conditional expectations on the right side are evaluated asfollows

E6Y1mdashY1 gt 07 D E6Y uuml

1mdashY uuml

1 gt 07 D E6Y uuml0mdashY uuml

1 gt 07C 1 (A1)

E6Y0mdashY1 gt 07 D E6Y uuml0 14Y uuml

0 gt 05mdashY uuml1 gt 07

D E6Y uuml0

mdashY uuml1 gt 01 Y uuml

0 gt 07P4Y uuml0 gt 0mdashY uuml

1 gt 050

If 4 gt 05 2 Y uuml0 gt 0 ) Y uuml

1 gt 01 this is E6Y uuml0

mdashY uuml0 gt 07

P4Y uuml0 gt 0mdashY uuml

1 gt 050 If 4 lt 05 2 Y uuml1 gt 0 ) Y uuml

0 gt 01 so this is

E6Y uuml0

mdashY uuml1 gt 070 (A2)

So lt 0 ) E6Y1 ƒ Y0mdashY1 gt 07 D Similarly

E6Y1mdashY0 gt 07 D E614Y uuml

0C gt 054Y uuml

0C 5mdashY uuml

0 gt 07

D E6Y uuml0

mdashY uuml1 gt 01 Y uuml

0 gt 07P4Y uuml1 gt 0mdashY uuml

0 gt 05

C P Y uuml1 gt 0mdashY uuml

0 gt 0 0

If 4 gt 051 Y uuml0 gt 0 ) Y uuml

1 gt 01 so

P4Y uuml1 gt 0mdashY uuml

0 gt 05 D 1 (A3)

and E6Y uuml0

mdashY uuml1 gt 01 Y uuml

0 gt 07 D E6Y uuml0mdashY uuml

0 gt 070 Finally

E6Y0mdashY0 gt 07 D E6Y uuml0mdashY uuml

0 gt 07 (A4)

so gt 0 ) E6Y1ƒ Y0

mdashY0 gt 07 D

A2 Derivation of Equation (21)

Drop the i subscripts Note that eƒD D 641 ƒ D5 C Deƒ7Let sbquo uuml D eƒsbquo and uuml D eƒ Z is binary and there are nocovariates so we can now write (20) as

sbquo uuml E641ƒ D5Y mdashZ D 17C sbquo uuml uuml E6DY mdashZ D 17 ƒ 1 D 0 (A5)

and

sbquo uuml E641ƒ D5Y mdashZ D 07C sbquo uuml uuml E6DY mdashZ D 07 ƒ 1 D 00 (A6)

Divided (A5) by (A6) to get rid of sbquo uuml then solve for uuml Subtract 1 and rearrange to get Equation (21)

A3 Average Treatment Effects and Standard Errors forNonlinear Models

Average treatment effects were calculated with the aid ofderivative approximations so that all reported effects have theform ldquocoef cient times scaling factorrdquo

Probit Note that ecirc6X 0i sbquo C 7 ƒ ecirc6X 0

isbquo7 6X 0isbquo C Di7 š

1 so the average effect on the treated can be approximatedas 841=N15

Pi Di6X 0

isbquo C Di79 š 1 where N1 D Pi Di This

turns out to be accurate to three decimal places for the esti-mates in Table 1 Standard errors were calculated treating thescaling factor as nonrandom This follows the convention forreporting marginal effects in programs like Stata in practiceany correction for estimation of the scaling factor is likely tobe minor A similar approach was used for ordered probit

Tobit Tobit average treatment effects were approximatedusing a derivative formula that can be found for exam-ple in the work of Greene (1999) E6Yi

mdashXi1DiD 17 ƒ

E6YimdashXi1 Di

D 07 iexclE6YimdashXi1Di7=iexclD D ecirc6X 0

isbquo C Di7 š 0

Average effects on the treated can therefore be approximatedusing 841=N15

Pi Diecirc6X 0

isbquo C Di79 š 0 Standard errors wereagain calculated treating the scaling factor as nonrandom

2PM Let the part-1 coef cient be 1 and the part-2 coef -cient be 2 Since the model multiplies parts and both parts arelinear the average effect is approximated using derivatives as1E6Yi

mdashDiD 11 Yi gt 07 C 2P6Di

D 1mdashY gt 070 Standard errorswere calculated treating the scaling factors E6Y mdashDi

D 11 Yi gt 07

and P6DiD 1mdashY gt 07 as nonrandom and using the fact that

the estimates of 1 and 2 are uncorrelatedMullahy Note that E6Yi

mdashXi1Di1mdash uumli 7 D mdash uuml

i exp6X 0i sbquoC Di7

where this CEF has a causal interpretation Again usingderivatives we have mdashuuml

i exp6X 0isbquo C 7 ƒ mdashuuml

i exp6X 0isbquo7

mdash uumli exp6X 0

i sbquo C Di7 š The model is such that E6mdash uumlimdashXi7

equals 1 but E6mdash uumlimdashXi1Di7 is unrestricted I ignore this prob-

lem and approximate the average effect on the treated as841=N15

Pi Di exp6X 0

i sbquo C Di79 š Standard errors were cal-culated treating the scaling factor as nonrandom

Bivariate Probit This is the same as probit using param-eters from the latent index equation for outcomes

Endogenous Tobit This is the same as tobit but usingcoef cients and predicted probability positive from the modelwith the compound Mills-ratio term included

Mills Ratio Standard errors were calculated treating thecompound Mills-ratio term as known

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 15

Causal-IV (nonlinear) Average effects were calculated asdescribed previously for the probit and Mullahy (exponential)functional form The rst-stage estimates of E6Zi

mdashXi7 neededto construct Ši were estimated using probit Standard errorsfor were calculated using the same bootstrap proceduredescribed for QTE Abadie (2000b) also gave analytic formu-las that take account of the rst-step estimation of E6Zi

mdashXi7An assumption implicit in this scheme for reporting Causal-IV results is that it makes sense to convert conditional-on-Xeffects for compliers into overall average effects on the treated

A4 Computation of Quantile Treatment Effectsand Standard Errors

QTErsquos were computed by plugging rst-step estimatesof QŠi

D E6ŠimdashXi1Di1 Yi7 into a weighted quantile regression

calculation performed by Stata Nonnegative estimates ofE6Ši

mdashXi1 Di1 Yi7 were constructed by separately estimatingE6Zi

mdashXi7 and E6ZimdashYi1Di1Xi7 using probit and then trimming

In principle standard errors should take account of this rst-step estimation An additional complication is that the analyticstandard errors for QTE involve a conditional error density Inthis case I sidestepped messy analytic calculations by usinga bootstrap procedure that repeats both the rst-stage estima-tion of QŠi and the second-step estimation of the parameters ofinterest in 100 replicate samples of 2500 observations eachThe 100 replicate samples were sampled without replacementusing the Stata command bsample The reported standarderrors were calculated as 4N =N uuml 51=2bseN where bseN is thestandard deviation of the 100 replicate estimates N D 21500and N uuml is the full sample size

[Received October 1999 Revised July 2000]

REFERENCES

Abadie A (2000a) ldquoBootstrap Tests for the Effect of a Treatment on the Dis-tributions of an Outcome Variablerdquo Technical Working Paper 261 NationalBureau of Economic Research Cambridge MA

mdashmdashmdash (2000b) ldquoSemiparametric Estimation of Instrumental Variable Mod-els for Causal Effectsrdquo Technical Working Paper 260 National Bureau ofEconomic Research Cambridge MA

Abadie A Angrist J and Imbens G (1998) ldquo Instrumental VariablesEstimation of Quantile Treatment Effectsrdquo Working Paper 229 NationalBureau of Economic Research Cambridge MA

Amemiya T (1978) ldquoThe Estimation of a Simultaneous Equation General-ized Probit Modelrdquo Econometrica 46 1193ndash1205

mdashmdashmdash (1979) ldquoThe Estimation of a Simultaneous Equation Tobit ModelrdquoInternational Economic Review 20 169ndash181

Angrist J (1998) ldquoEstimating the Labor Market Impact of Voluntary MilitaryService Using Social Security Data on Military Applicationsrdquo Economet-rica 66 249ndash288

Angrist J and Evans W (1998) ldquoChildren and Their Parentsrdquo Labor SupplyEvidence From Exogenous Variation in Family Sizerdquo American EconomicReview 88 450ndash477

Angrist J and Imbens G (1991) ldquoSources of Identifying Information inEvaluation Modelsrdquo Technical Working Paper 117 National Bureau ofEconomic Research Cambridge MA

Angrist J Imbens G and Rubin B (1996) ldquo Identi cation of CausalEffects Using Instrumental Variablesrdquo Journal of the American StatisticalAssociation 91 444ndash455

Angrist J and Krueger A (1999) ldquoEmpirical Strategies in Labor Eco-nomicsrdquo in The Handbook of Labor Economics Volume IIIA edsO Ashenfelter and D Card Amsterdam North-Holland pp 1277ndash1366

Blundell R W and Smith R J (1989) ldquoEstimation in a Class of Simulta-neous Equation Limited Dependent Variable Modelsrdquo Review of EconomicStudies 56 37ndash58

Blundell R W and Powell J L (1999) ldquoEndogeneity in Single Index Mod-elsrdquo mimeo University College London Dept of Economics

Bronars S and Grogger J (1994) ldquoThe Economic Consequences of UnwedMotherhood Using Twin Births as a Natural Experimentrdquo American Eco-nomic Review 84 1141ndash1156

Chamberlain G (1986) ldquoAsymptotic Ef ciency in Semi-Parametric ModelsWith Censoringrdquo Journal of Econometrics 32 189ndash218

Cragg J G (1971) ldquoSome Statistical Models for Limited Dependent Vari-ables With Application to the Demand for Durable Goodsrdquo Econometrica39 829ndash844

Duan N Manning W G Jr Morris C N and Newhouse J P(1984) ldquoChoosing Between the Sample-Selection Model and the Multi-partModelrdquo Journal of Business amp Economic Statistics 2 283ndash289

Eichner M J McClellan M B and Wise D A (1997) ldquoHealth Expendi-ture Persistence and Feasibility of Medical Savings Accountsrdquo in Tax Pol-icy and the Economy 11 ed J Poterba Cambridge MA National Bureauof Economic Research pp 91ndash128

Evans W Farrelly M C and Montgomery E (1999) ldquoDo Work-place Smoking Bans Reduce Smokingrdquo American Economic Review 89728ndash747

Gangadharan J and Rosenbloom J L (1996) ldquoThe Effects of Child-bearingon Married Womenrsquos Labor Supply and Earnings Using Twin Births as aNatural Experimentrdquo Working Paper 5647 National Bureau of EconomicResearch Cambridge MA

Goldberger A S (1991) A Course in Econometrics Cambridge MAHarvard University Press

Greene W (1999) ldquoMarginal Effects in the Censored Regression ModelrdquoEconomics Letters 64 43ndash49

Gronau R (1974) ldquoWage ComparisonsmdashA Selectivity Biasrdquo Journal ofPolitical Economy 82 1119ndash1143

Hay J W Leu R and Rohrer P (1987) ldquoOrdinary Least Squares andSample-Selection Models of Health-care Demandrdquo Journal of Business ampEconomic Statistics 5 499ndash506

Hay J W and Olsen R J (1984) ldquoLet Them Eat Cake A Note on Com-paring Alternative Models of the Demand for Medical Carerdquo Journal ofBusiness amp Economic Statistics 2 279ndash282

Heckman J J (1974) ldquoShadow Prices Market Wages and Labor SupplyrdquoEconometrica 42 679ndash694

mdashmdashmdash (1978) ldquoDummy Endogenous Variables in a Simultaneous EquationsSystemrdquo Econometrica 46 931ndash960

mdashmdashmdash (1990) ldquoVarieties of Selection Biasrdquo American Economic Review 80313ndash318

Heckman J J and Robb R Jr (1985) ldquoAlternative Methods for Evaluatingthe Impact of Interventionsrdquo in Longitudinal Analysis of Labor MarketData (Econometric Society Monographs Series No 10) eds J Heckmanand B Singer Cambridge UK Cambridge University Press pp 156ndash246

Imbens G and Angrist J D (1994) ldquo Identi cation and Estimation of LocalAverage Treatment Effectsrdquo Econometrica 62 467ndash475

Imbens G and Rubin D B (1997) ldquoEstimating Outcome Distributions forCompliers in Instrumental Variables Modelsrdquo Review of Economic Studies64 555ndash574

Kane T J Rouse C E and Staiger D (1999) ldquoEstimating Returnsto Schooling When Schooling Is Misreportedrdquo Working Paper W7235National Bureau of Economic Research Cambridge MA

Kelejian H H (1971) ldquoTwo-Stage Least Squares and Econometric SystemsLinear in Parameters but Nonlinear in the Endogenous Variablesrdquo Journalof the American Statistical Association 66 373ndash374

Koenker R and Bassett G (1978) ldquoRegression Quantilesrdquo Econometrica46 33ndash50

Killingsworth M (1983) Labor Supply Cambridge UK Cambridge Uni-versity Press

Keane M P and Wolpin K (1997) ldquo Introduction to the JBES Special Issueon Structural Estimation in Applied Microeconomicsrdquo Journal of Businessamp Economic Statistics 15 111ndash114

Lee M J (1995) ldquoSemiparametric Estimation of Simultaneous EquationsWith Limited Dependent Variables A Case Study of Female Labor Sup-plyrdquo Journal of Applied Econometrics 10 187ndash200

mdashmdashmdash (1996) ldquoNonparametric Two-Stage Estimation of Simultaneous Equa-tions With Limited Endogenous Regressorsrdquo Econometric Theory 12305ndash330

Leung S F and Yu S (1996) ldquoOn the Choice Between Sample Selectionand Two-Part Modelsrdquo Journal of Econometrics 72 197ndash229

Lin T-F and Schmidt P (1984) ldquoA Test of the Tobit Speci cation Againstan Alternative Suggested by Craggrdquo Review of Economics and Statistics66 174ndash177

Maddala G S (1985) ldquoA Survey of the Literature on Selectivity Bias AsIt Pertains to Health Care Marketsrdquo Advances in Health Economics andHealth Service Research 6 3ndash18

16 Journal of Business amp Economic Statistics January 2001

REFERENCES

Manning W G Duan N and Rogers W H (1987) ldquoMonte Carlo Evidenceon the Choice Between Sample Selection and Two-part Modelsrdquo Journalof Econometrics 35 59ndash82

Manski C F (1975) ldquoMaximum Score Estimation of the Stochastic UtilityModelrdquo Journal of Econometrics 3 205ndash228

mdashmdashmdash (1996) ldquoLearning About Treatment Effects From Experiments WithRandom Assignment of Treatmentsrdquo Journal of Human Resources 31709ndash733

McDonald J F and Mof tt R A (1980) ldquoThe Uses of Tobit AnalysisrdquoThe Review of Economics and Statistics 62 318ndash321

Mof t R A (1999) ldquoNew Developments in Econometric Methods forLabor Market Analysisrdquo in The Handbook of Labor Economics Vol-ume IIIA eds O Ashenfelter and D Card Amsterdam North-Hollandpp 1367ndash1398

Mullahy J (1997) ldquo Instrumental-Variable Estimation of Count Data ModelsApplications to Models of Cigarette Smoking Behaviourrdquo Review of Eco-nomics and Statistics 11 586ndash593

mdashmdashmdash (1998) ldquoMuch Ado About Two Reconsidering Retransformation andthe Two-Part Model in Health Econometricsrdquo Journal of Health Eco-nomics 17 247ndash281

Newey W K (1985) ldquoSemiparametric Estimation of Limited DependentVariable Models With Endogenous Explanatory Variablesrdquo Annales deLrsquo Insee 5960 219ndash237

mdashmdashmdash (1986) ldquoLinear Instrumental Variable Estimation of Limited Depen-dent Variable Models With Endogenous Explanatory Variablesrdquo Journal ofEconometrics 32 127ndash141

mdashmdashmdash (1987) ldquoEf cient Estimation of Limited Dependent Variable ModelsWith Endogenous Explanatory Variablesrdquo Journal of Econometrics 36231ndash250

Powell J L (1986a) ldquoSymmetrically Trimmed Least Squares Estimation forTobit Modelsrdquo Econometrica 54 1435ndash1460

mdashmdashmdash (1986b) ldquoCensored Regression Quantilesrdquo Journal of Econometrics32 143ndash155

Rosenzweig M R and Wolpin K (1980) ldquoLife-Cycle Labor Supply andFertility Causal Inferences From Household Modelsrdquo Journal of PoliticalEconomy 88 328ndash348

Rubin D B (1974) ldquoEstimating Causal Effects of Treatments in Random-ized and Non-randomized Studiesrdquo Journal of Educational Psychology 66688ndash701

mdashmdashmdash (1977) ldquoAssignment to a Treatment Group on the Basis of a Covari-aterdquo Journal of Educational Statistics 2 1ndash26

Wooldridge J (1999) ldquoDistribution-Free Estimation of Some Nonlinear PanelData Modelsrdquo Journal of Econometrics 90 77ndash97

Comment Binary Regressors in NonlinearPanel-Data Models With Fixed EffectsJinyong Hahn

Department of Economics Brown University Providence RI 02912 ( jinyong hahnbrownedu)

Angrist notes in his abstract that ldquomuch of the dif cultywith limited dependent variables comes from a focus on struc-tural parameters such as index coef cients instead of causaleffects Once the object of estimation is taken to be the causaleffect of treatment several simple strategies are availablerdquo(p 2) I examine the consequence of such a perspective forinference in nonlinear panel-data models with xed effectsI argue that (a) the ldquodif cultyrdquo indeed disappears sometimesand (b) structure of treatment assignment plays a crucial rolefor his strategy to be successful

It is instructive to begin with a dif cult nonlinear panel-datamodel with xed effects Consider a panel probit model with xed effects

Pr6yitD 1mdashci1 xi11 xi27 D ecirc4ci

C ˆ xit51

i D 11 1 n3 t D 1120 (1)

It is assumed that yi1 and yi2 are independent of each othergiven 4ci1 xi11 xi25 Here ci denotes the unobserved xedeffects and xit denotes a binary treatment variable We willassume that 4xi11 xi25 D 40115 for all i The index coef cientof interest is ˆ Therefore traditional econometric analysiswould focus on whether ˆ is identi ed and or

pn-consistently

estimableEstimation of the index coef cient ˆ is dif cult for a num-

ber of reasons

1 So far no consistent estimator for ˆ has been developedfor probit with a nonparametric speci cation of the conditionaldistribution of ci given 4xi11 xi25

2 It is not even clear whether ˆ is identi ed or notManskirsquos (1987) identi cation result requires in nite supportfor xit which cannot be satis ed due to the binary nature ofxit

3 It is not clear whether the semiparametric informationbound for ˆ is positive or not It is quite possible that theinformation is actually 0 For example Chamberlain (1992)showed that the information for ˆ is 0 for models with timedummies

4 Chamberlainrsquos (1984) conditional maximum likelihoodestimator is applicable only to logit models

Now to assess whether changing the object of estimationsimpli es statistical analysis consider the average treatmenteffects which in this particular case can be easily shown tobe equal to

sbquo D E6yi2ƒ yi170 (2)

It is not dif cult to see that a simple estimator Osbquo D1=n

PniD14yi2

ƒ yi15 isp

n-consistent As Angrist argues afocus on average causal effects dramatically reduces the dif -culty of estimation The secret is that 4xi11 xi25 D 40115 for alli which effectively ensures that ci is independent of 4xi11 xi25Dif culty of the index estimation listed previously is because

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

Imbens Comment 17

the independence between 4xi11 xi25 and ci cannot be exploitedwithin the xed-effects framework Unfortunately the addi-tional information that 4xi11 xi25 and ci are independent of eachother does not reduce dif culty in estimating the index ˆ It isnot yet clear whether ˆ can be

pn-consistently estimated even

with the random-effects assumption when the random effectsare nonparametrically speci ed On the other hand estimatingsbquo becomes easier as a result of the constancy of the general-ized propensity score Pr 6xi11 xi2mdashci7 Presence of unobservedci in the model was rendered irrelevant due to the constancyof the generalized propensity score [See Imbens (1999) for adiscussion on the generalized propensity score]

It is interesting to note that in the panel probit model (1)estimation of sbquo is not necessarily simple unless the indexstructure is discarded altogether It is useful to note that thenew target parameter sbquo could be estimated consistently usingindex structure if consistent estimators of ˆ and not are givenHere not denotes the distribution of ci We may alternativelywrite (2) as

Z4ecirc4c C ˆ5 ƒ ecirc4c55 dnot4c51 (3)

which can in principle be estimated by using consistent esti-mators of ˆ and not Estimation of sbquo using the alternativecharacterization (3) requires consistent estimation of an addi-tional parameter not a parameter that was not given too muchattention in the past The problem is that not many consis-tent estimators of not are available It is not yet clear whether

the model satis es the primitive conditions for consistency ofthe nonparametric maximum likelihood estimator (NPMLE)as discussed by Heckman and Singer (1984) The dif culty inestimating the target parameter using the expression (3) whichis based on the index structure is in sharp contrast to the easeof the estimation strategy using the expression (2) for whichthe index structure is irrelevant

The preceding discussion suggests that the success ofAngristrsquos perspective critically hinges on the structure of treat-ment assignment and careful reexpression of the new targetparameter If the joint distribution of ci and 4xi11 xi25 is com-pletely unknown it is clear that changing the target parameterdoes not ease the dif culty of estimation Angristrsquos perspectivetherefore requires substantial effort in modeling such jointdistribution Whether such a modeling effort will be successfulin dealing with nonlinear panel problems remains to be seen

ADDITIONAL REFERENCES

Chamberlain G (1984) ldquoPanel Datardquo in Handbook of Econometricseds Z Griliches and M D Intriligator Amsterdam North-Hollandpp 1247ndash1318

mdashmdashmdash (1992) ldquoBinary Response Models for Panel Data Identi cation andInformationrdquo unpublished manuscript

Heckman J and Singer B (1984) ldquoA Method for Minimizing the Impactof Distributional Assumptions in Econometric Models for Duration DatardquoEconometrica 52 271ndash320

Imbens G (1999) ldquoThe Role of Propensity Score in Estimating Dose-Response Functionsrdquo Technical Working Paper 237 National Bureau ofEconomic Research Cambridge MA

Manski C (1987) ldquoSemiparametric Analysis of Random Effects Linear Mod-els from Binary Panel Datardquo Econometrica 55 357ndash362

CommentGuido W Imbens

Department of Economics University of California at Los Angeles Los Angeles CA 90095(imbenseconuclaedu)

It is a pleasure to comment on this article by JoshuaAngrist whose applications of instrumental-variables methods(Angrist 1989 Angrist and Krueger 1991) have been a sourceof inspiration for my own work in this area As with Angristrsquosprevious work on instrumental variables the current articleraises some controversial issues and makes a number of impor-tant points Here I offer some comments on three of themFirst I shall discuss the issues raised in Section 1 ldquoCausalEffects and Structural Parametersrdquo concerning the goals ofstatistical inference Angrist argues that many questions ofinterest are most easily formulated in terms of comparisonsbetween realized and potential outcomes the latter de nedas outcomes that would have been observed under alternativestates of nature I shall explore some of the implications of thisview for empirical practice and econometric theory SecondI shall offer some remarks on the role of economic theory inspeci cation and identi cation of econometric models againreinforcing Angristrsquos point regarding the importance of for-mulating the key assumptions in terms of potential outcomesThird I shall discuss some of the issues related to the limited

dependent nature of outcome variables for empirical practicein particular in the presence of covariates Partly motivatedby the widespread perception of fundamental dif culties inapplying instrumental-variables methods to data with limiteddependent outcome variables Angrist argues that standard lin-ear model techniques are generally applicable I agree withAngristrsquos position that most of these perceived problems areexaggerated but suggest that principled inference should nev-ertheless take account of the limited dependent nature of theoutcome variables and use nonlinear models

1 CAUSAL ESTIMANDS

In his textbook discussion of the difference betweenstructural and reduced-form estimates Goldberger (1997)wrote following Marshak (1953) that the ultimate goal of

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

18 Journal of Business amp Economic Statistics January 2001

econometrics is to provide predictions More speci cally inmy view the goal is to provide predictions of policy interven-tions Using both economic theory and data economists wishto inform policy discussions by providing predictions of statesof the world under different policy choices Based on com-parisons of such predictions policy makers can then chooseamong the different policies using some social welfare mea-sure as objective function (eg Heckman and Smith 1997)Angrist argues that such questions are most easily formulatedin terms of potential outcomes Here I want to elaborate onthat view

Consider as an example the problem faced by a policymaker contemplating a new tax in a market To evaluatethis policy the policy maker wishes to take into account theeffect of the tax on the quantity traded Economic theory sug-gests that this effect depends on the slope of the supply anddemand functions The rst step is therefore the estimation ofthese slopes and in the remainder of this discussion I shallfocus on this component of the policy-evaluation problem Inprinciple the policy maker may be interested in the entire dis-tribution of the quantity traded under various taxes Let usassume however that for purposes of evaluation of the poli-cies it is suf cient to know the average effect of the policyon the quantity traded If there are only two values for thepolicymdashfor example no tax or a taxmdashthe difference betweenthese two averages is the key quantity of interest FollowingRubin (1974) I will refer to this as the estimand

Note that the choice of estimand is distinct from the sta-tistical question of the speci cation of the model Often thestatistical model is speci ed in such a way that a single param-eter corresponds to the estimand For example in a structuralinterpretation of the linear regression model the coef cientscorrespond to the effect of changing the covariates by a singleunit Such one-to-one correspondence however is the excep-tion rather than the rule Wooldridge (1992) made this point inthe context of BoxndashCox regression models Such models areoften used when a linear representation for E6Y mdashX7 is inappro-priate The BoxndashCox regression model generalizes this linearform to E6Y 4lsaquo5mdashX7 D X 0sbquo where

Y 4lsaquo5 D(

4Y lsaquo ƒ 15=lsaquo lsaquo 6D 01

ln Y lsaquo D 00

Although consistent estimators for sbquo exist under these assump-tions Wooldridge stressed that because (a) the interpretationof sbquo changes with the value of lsaquo and (b) knowledge of sbquo andlsaquo is not suf cient for recovering E6Y mdashX7 there is no reasonfor economists to be interested in estimates of sbquo under theseassumptions In other words sbquo cannot be the sole focus ofthe researcher because the question it answers changes withthe value of nuisance parameters Wooldridge then suggestedan alternative speci cation that always allows the researcherto recover the conditional expectation E6Y mdashX7

In empirical work this distinction between the estimandand the parameters of the statistical model is consistent withthe now common practice of reporting estimates of averagederivatives in binary response models rather than reportingestimates of the logit or probit coef cients Unlike a linear

regression model there is no direct link from one of the coef- cients in the logit or probit model to average causal effectsand thus there is no intrinsic interest in such coef cients

This view is at odds however with a large part of the semi-parametric literature An exception is the work by Stoker (egStoker 1986) who focused on estimation of index coef cientsin settings where these are proportional to average derivativesand thus directly linked to changes in predictions Considerfor example the work on semiparametric estimation of binaryresponse models In this literature such models are esti-mated without making logistic or probit assumptions insteadonly making conditional mean or median assumptions in alatent index interpretation (eg Manski 1985) This literaturehowever has begged the question of why economists shouldbe interested in the coef cient estimates in these models inthe absence of a direct link between these coef cients andthe choice probabilities or their derivatives Similarly someof the models with xed effects in panel data with limiteddependent variables have focused on estimation of parametersthat in themselves do not allow for estimation of conditionalexpectations or their derivatives and thus do not allow for esti-mation of causal effects See Arellano and Honoreacute (in press)for a survey of many of these methods

2 IDENTIFICATION

After deciding on the estimand the next step is to makesubstantive assumptions on the process that generated the dataThis is where economic as opposed to statistical theory playsa key role Theoretical considerations may suggest that certainvariables have no direct causal effect on others because theydo not enter into agentsrsquo utility function nor do they affectthe constraints these agents face For example in some mar-kets it may be reasonable to postulate the existence of demandand supply function and assume that their intersection deter-mines observed prices and quantities In that case it may beargued that certain variablesmdashfor example weather conditionsin agricultural marketsmdashaffect supply at xed prices but notdemand because weather conditions do not affect utility ofthe buyers nor do they constrain their choices given pricesSimilarly theoretical considerations may suggest which vari-ables determine agentsrsquo fertility choices and which variablesare excluded from such choices as in the structural modelsdescribed in Section 12 of Angrist

For the purpose of considering such exclusion restrictionsas well as other assumptions it is important to formulate themin a way that economic theory can be brought to bear on themThis makes the formulation in terms of counterfactuals orpotential outcomes that Angrist advocates particularly appro-priate The potential outcomes describe outcomes in differ-ent environments and as such are the primitives of economicanalyses as well as choices under different sets of constraintswhich are the result of agents solving constrained optimizationproblems Since economic theory studies such optimizationproblems it is therefore well equipped to assess assumptionsformulated directly in terms of these potential outcomes Anexample of the formulation of the critical assumptions in termsof such potential outcomes is Angrist Imbens and Rubin(1996 AIR from here on) In contrast latent index modelsalthough under some conditions mathematically equivalent to

Imbens Comment 19

the potential outcome framework (eg Vytlacil 1999) formu-late the critical assumptions in terms of associations betweenobserved variables and unobserved residuals which appearsmore dif cult to contemplate [see Imbens (1997) for a dis-cussion of the confusion such formulations have caused in thestatistics literature]

It is rare that economic theory is speci c enough to deter-mine the exact value of the estimand More typical is thatthe theory is consistent with a range of values for the esti-mand Observations on agentsrsquo choices and outcomes may behelpful in narrowing down this range The econometricianrsquostask is to link the data to the estimand Typically a num-ber of additional assumptions are made at this stage Almostalways it is assumed that there is only limited dependenceor no dependence at all between choices made by differentagents and identi cation focuses on the link between the jointdistribution of the observables estimable in large samplesand the estimand Two possibilities arise at this stage Some-times the estimand can be expressed as a functional of thejoint distribution of the observables in which case the esti-mand is identi ed A leading example is where the estimandis the average treatment effect and theory suggests that assign-ment to treatment is random or at least random conditionalon a set of observed covariates (unconfounded assignmentselection on observables) Alternatively the assumptions sug-gested by economic theory do not allow for the direct linkbetween the distribution of observables and the estimand Inthat case the researcher faces some choices One option advo-cated in a series of papers by Manski (see for a general dis-cussion Manski 1995) is to estimate the range of values ofthe estimand consistent with the data given the substantiveassumptions Another option followed in the current article byAngrist is the local average-treatment-effect approach devel-oped by Imbens and Angrist (1994) to consider what aspectsof the estimand are identi ed given data and assumptions Ininstrumental-variables settings the population average treat-ment effect is often not identi ed but the average effect for aspeci c subpopulation may be In that case one may chooseto estimate the average treatment effect for this subpopula-tion and leave the extrapolation to the principal estimand tothe researcher possibly aided by theoretical considerations AsHeckman wrote ldquo It is a great virtue of the LATE parameterthat it makes the investigator stick to the data at hand and sep-arate out the aspects of an estimation that require out of sam-ple extrapolation or theorizing from aspects of an estimationthat are based on observable datardquo (Heckman 1999 p 832)

Let us consider the case studied by Angrist with its focuson the effect of having more than two children on labor sup-ply Angrist argues that the second birth being a multiple birth(eg twins) is a valid instrument for this effect In terms ofthe AIR formulation this requires a multiple birth to be asgood as randomly assigned and the absence of a systematicdirect effect on labor supply other than through its effect onthe number of children Such assumptions may be contro-versial For example fertility treatments may lead to a sys-tematic association between multiple births and choices madeby couples violating the rst assumption Even if we acceptthese assumptions however they only imply that the aver-age causal effect of more kids on labor supply is identi ed

for women who had a third child solely because their sec-ond birth was a multiple birth (compliers in the AIR termi-nology) In my view it is unlikely that this is the populationof primary interest Nevertheless it is the only subpopulationthe data are informative about in the sense of point identi ca-tion under the substantive assumptions and it would appear tooffer some guidance regarding the population average causaleffect to policy makers similar to the way in the medical worldresults from clinical trials in homogenous subpopulations areregarded as useful because they are viewed as indicative ofpopulation average causal effects

3 LIMITED DEPENDENT VARIABLES

Typically economic theory offers some guidance concern-ing the determinants of certain outcomes without specifyingthe exact form or strength of their relationship In that casestatistical modeling is required to complete the speci cationConsider the example Angrist studies with binary outcomebinary endogenous regressor a binary instrument and covari-ates Angrist suggests as one possible approach estimating theaverage treatment effect through a linear probability modelwith instrumenting for an endogenous regressor The bene tsof the linear probability approach stemming from the linear-ity and robustness against misspeci cation of the rst stageappear to me largely illusory At this point the statistical mod-eling is only intended to provide exible approximations to theunderlying conditional distributions This is a fundamentallydifferent role from that played by the substantive assumptionsthat are essential for identi cation Appeals to consistencyunder speci c parameterizations therefore appear irrelevantmdashin a larger sample one may well wish to use a more exiblespeci cation because less smoothing is required In additionto nding the alleged bene ts of the linear probability modelunpersuasive I nd its disadvantages troubling Within smallsubpopulations characterized by extreme values of the covari-ates the smoothing implicit in linear probability models islikely to lead to unattractive predictions compared to pre-dictions based on nonlinear models that respect the limited-dependent-variable nature of the outcomes

An alternative approach is followed in the study of the effectof u shots on hospitalization rates using randomized incen-tives for vaccination by Hirano Imbens Rubin and Zhou(2000 HIRZ from here on) Given their assumptions exten-sions of those made by AIR to the case with exogenous covari-ates there are three subpopulationsmdashcompliers (units whochange treatment status in response to a change in the valueof the instrument) always-takers (who always take the treat-ment irrespective of the value of the instrument) and never-takers (who never take the treatment irrespective of the valueof the instrument) H IRZ modeled the conditional distributionof these three ldquotypesrdquo conditional on covariates as a trinomialdistribution

Pr4TypeiD cmdashXi

D x5 D exp4x0ndashc5

1C exp4x0ndashc5C exp4x0ndasha51

Pr4TypeiD amdashXi

D x5 D exp4x0ndasha5

1C exp4x0ndashc5 C exp4x0ndasha51

20 Journal of Business amp Economic Statistics January 2001

and

Pr4TypeiD nmdashXi

D x5

D 1 ƒ Pr4TypeiD cmdashXi

D x5 ƒ Pr4TypeiD amdashXi

D x50

Now compare this setup to the selection models Angristdescribes in Section 3 In the selection models the equationdescribing the endogenous regressor is Di

D 18ƒ0C ƒ1Zi

Cƒ 0

2Xi gt Daggeri9 Suppose that the instrument is binary and that ƒ1 ispositive Then the two models are very similar with units withƒ0 Cƒ1 Cƒ 0

2Xi gt Daggeri in the selection model classi ed as always-takers in the potential outcome framework (because irrespec-tive of the value of the instrument Di

D 1 for such units) unitswith ƒ0

C ƒ 02Xi lt Daggeri classi ed as never-takers (because irre-

spective of the value of the instrument DiD 0 for such units)

and the units with ƒ0C ƒ 0

2Xi lt Daggeri lt ƒ0C ƒ1

C ƒ 02Xi classi ed

as compliersOne advantage of the trinomial model is that it easily gen-

eralizes to provide an arbitrarily good t to any conditionaltrinomial distribution by including higher-order terms andinteractions in the covariates If there are no substantive rea-sons to impose additional restrictions one should not imposethem implicitly in the speci cation of the statistical modelIn particular in the selection model it is not suf cient to addhigher-order terms to the covariate vector to provide an arbi-trarily good t to the trinomial distribution Such an approxi-mation would have to involve heteroscedasticity and other dis-tributional extensions that are not straightforward to implementin the selection model

Conditional on the individualrsquos type H IRZ speci ed theoutcome distributions given covariates as logistic regressionmodels Again the aim is to provide a exible approxima-tion to the conditional distribution in a manner that does notimpose any implicit restrictions Given that for a binomialdistribution the logistic regression model can be thought ofas providing a linear approximation to the log odds ratiothis choice is again an appealing one An alternative is theprobit model which also provides a good approximation Less

attractive here is the linear probability model since it requiresinequality restrictions on the parameters if the implicit esti-mates of the probabilities are to be bounded between 0 and 1

In cases with other limited dependent variables alternativenonlinear models may be appropriate For example if the out-comes are durations subject to censoring models speci ed interms of hazard functions (eg Lancaster 1979) may be con-venient for dealing with such data

ADDITIONAL REFERENCES

Angrist J (1989) ldquoLifetime Earnings and the Vietnam Era Draft LotteryEvidence From Social Security Administrative Recordsrdquo American Eco-nomic Review 80 313ndash335

Angrist J and Krueger A (1991) ldquoDoes Compulsory School AttendanceAffect Schooling and Earningsrdquo Quarterly Journal of Ecomonics 106979ndash1014

Arellano M and Honoreacute B (in press) ldquoPanel Datardquo in Handbook of Econo-metrics eds J Heckman and E Leamer Amsterdam Elsevier North-Holland

Goldberger A (1991) A Course in Econometrics Cambridge MACambridge University Press

Heckman J (1999) ldquo Instrumental Variables Response to Angrist andImbensrdquo Journal of Human Resources 34 828ndash837

Heckman J and Smith J (1997) ldquoEvaluating the Welfare Staterdquo in Econo-metrics and Economics in the 20th Century The Ragnar Frisch Centenaryed S Strom New York Cambridge University Press pp 214ndash318

Hirano K Imbens G Rubin D and Zhou A (2000) ldquoEstimating theEffect of Flu Shots in a Randomized Encouragement Designrdquo Biostatistics1 69ndash88

Imbens G (1997) Book Review of The Foundations of Econometric Analy-sis by D Hendry and M Morgan Journal of Applied Econometrtics 1291ndash94

Lancaster T (1979) ldquoEconometric Methods for the Analysis of DurationDatardquo Econometrica

Manski C (1985) ldquoSemiparametric Analysis of Discrete Responserdquo Journalof Econometrics 27 313ndash333

mdashmdashmdash (1995) Identi cation Problems in the Social Sciences CambridgeMA Harvard University Press

Marshak J (1953) ldquoEconomic Measurements for Policy and Predictionrdquoin Studies in Econometric Method eds W Hood and T KoopmansNew York Wiley pp 1ndash26

Stoker T (1986) ldquoConsistent Estimation of Scaled Coef cientsrdquo Economet-rica 54 1461ndash1481

Vytlacil E (1999) ldquo Independence Monotonicity and Latent Index ModelsAn Equivalency Resultrdquo unpublished manuscript University of ChicagoDept of Economics

Wooldridge J (1992) ldquoSome Alternatives to the BoxndashCox RegressionModelrdquo International Economic Review 33 935ndash955

CommentRobert A Mof tt

Department of Economics Johns Hopkins University Baltimore MD 21218 and National Bureau of EconomicResearch Cambridge MA (mofttjhuedu)

The Problem Although the article by Angrist rangesacross a number of issues much of the discussion and thearticle title suggests that the problem of concern is that instru-mental variables ( IV) cannot be used in one of three commonmodels Let the rst model be y D Csbquod CxbdquoChellip where y isan absolutely continuous variable but d is binary and wherex is independent of hellip but d is not Then sbquo can be consistentlyestimated with IV (Heckman and Robb 1985) Let the secondmodel be y uuml D C sbquod uuml C xbdquo C hellip where y uuml and d uuml are contin-

uous and where y D 14y uuml gt 05 and d D d uuml are the observedvariables The parameters of this model can likewise be esti-mated by IV with some auxiliary assumptions (Newey 1986see Blundell and Smith 1993 for a review of alternative meth-ods) But let the third model be y uuml D C sbquod C xbdquo C hellip where

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

Moftt Comment 21

again y D 14y uuml gt 05 but now d is binary The parameters ofthis model cannot be estimated by standard IV Many of theexamples in the Angrist article involve the censored regres-sion model rather than the binary choice model but this andthe other main points in the article apply to both

The Angrist Solution The Angrist solution to the problemposed by the third of these models where IV is not applica-ble is to declare sbquo an uninteresting parameter and not worthyof estimation [or in his words to ldquopuntrdquo (p 8)] This is remi-niscent of the solution to the Vietnam War suggested by some1960s commentators The United States should have simplydeclared victory quickly withdrawn and hoped that no onenoticed that they had in fact lost

If sbquo is uninteresting what is interesting according toAngrist His answer is that interest should center on leastsquares approximations with y as the dependent variable andd and x as regressors If endogeneity of d is a problem linearIV should be applied with linear IV given a LATE interpre-tation He provides an empirical illustration

I will comment on three questions raised by the Angristarticle (1) Is sbquo an interesting parameter (2) Is the Linear IVestimator using observed y and d which Imbens and Angrist(1994) relabeled the LATE an interesting quantity (3) Is theillustrative application given in the Angrist article interesting

Is sbquo an Interesting Parameter As Angrist notes manyof the issues raised by his solution have nothing to do withthe endogeneity of d Whether sbquo is an interesting parame-ter is one of these at least in part Given Angristrsquos prefer-ence for linear projections his position necessarily implies thatall nonlinear models of which the latent index model (L IM)is just one example are uninteresting (Angrist is not com-pletely consistent in his position because he does propose anonlinear model with an exponential conditional mean func-tion in one section of his article that model is subject to thesame objections that he subjects the LIM tomdasharbitrary non-linearities etc) Thus his position boils down to a preferencein the binary dependent-variable case for the linear proba-bility model (LPM) over the LIM and other nonlinear mod-els This issue has been discussed for many years even in asimultaneous-equations context [eg Heckman and MaCurdy(1985) which Angrist does not reference]

The dif culty with Angristrsquos polar position on this issuemdashthat the LPM is the only model of interestmdashis that it is funda-mentally untenable The LPM is attractive because it is easyto interpret providing parameter estimates that do not requiretransformation to learn the effects of a regressor on the meanof the dependent variable It has a role to play in empiricalwork in summarizing the data as regards the conditional meanfunction and for initial explorations of the data and virtu-ally all practitioners use it for this purpose But beyond thisit has no defense Either it is equivalent to the latent-variablemodel if the model is saturated or if the model is not satu-rated imposes functional form restrictions that may not holdand may t the data worse than the latent-variable model

Thus if the model is y D 14 C sbquod C hellip gt 05 where d is asingle dummy variable regressor independent of hellip the probit(say) estimates of and sbquo map one-to-one onto the leastsquares intercept and coef cient on d in the LPM and hencethere is no gain to estimating the model either way because the

estimators are equivalent The same point extends to the caseof a multinomial d If the model is y D 14 C sbquod C xbdquo C hellip gt

05 where x and bdquo are vectors of exogenous covariates andcoef cients respectively the LPM estimate of the effect ofd on E4y5 conditional on x may be demonstrably worsethan the probit estimator of the same effect even for theobject of estimating E4ymdashd1 x5 at points 8d1 x9 observed inthe data (Angrist falls back on the weak principle that leastsquares estimates provide best linear approximations but theissue is that the true model might be nonlinear) Nothingin Angristrsquos article provides evidence that the probitor anyother latent index formulation provides an inferior estimate of6E4ymdashd11 x5 ƒ E4ymdashd01 x57 compared to the LPM for the twoestimators smooth the joint response function with respect tod and x in different ways

Although the latent-variable model has no necessary claimfor superiority in the class of all nonlinear models its pop-ularity rests on its ability to generate a wide variety of non-linearities with a relatively parsimonious speci cation arisingfrom the convolution of the latent index with the cdf of hellipExpansion of the LPM to incorporate equivalent nonlinearitiesis cumbersome and inef cient

The possible inferiority of the LPM in capturing nonlinear-ities in nonsaturated models is also important for interpolationand extrapolation to points not in the observed data Gettingthe nonlinearities right in the observed data is important ininterpolation and extrapolation if the true model is nonlinearas the binary choice model necessarily is (since y is boundedby 0 and 1) Although Angrist at one point does mentionpredictionmdashstating that the linearized models he proposes area good ldquojumping-off point for any prediction exerciserdquo a state-ment without supportmdashhe is for the most part not interestedin prediction so much as summarizing the observed data Thisis ne but unfortunately prediction outside the observed his-torical data is the pervasive concern of the policy makers whoare the ultimate consumers of applied economic research forthey must make predictions of new policies in new environ-ments on a daily basis as part of their jobs The methods pro-posed in Angristrsquos article are therefore not very useful for theformulation of new policy

It should be noted that some make the argument that theoryand economic models must play a role in guiding the formu-lation of empirical relationships used for prediction outsidethe range of the observed data That position is not taken herebecause it is not necessary to establish the potential superior-ity of the L IM over the LPM in nonsaturated models and forprediction Occamrsquos razor makes it unnecessary

It should also be noted that these issues have nothing todo with causal effects because d is assumed exogenous Thusthe opposition that Angrist poses between L IMrsquos and ldquocausaleffectsrdquo models is a false one at least in general If a ldquocausaleffectrdquo is de ned as the true effect of a variable d on onlythe rst moment of a variable y (a rather restricted de nitionof causal effect) that causal effect can be derived from thelatent-variable model just as well The issue is instead merelynonlinearity of the function E4ymdashd1 x5

Angrist also states that the L IM requires ldquoconstant coef- cientrdquo and ldquodistributional assumptionsrdquo for identi cation

22 Journal of Business amp Economic Statistics January 2001

unlike the LPM Angrist does not make suf cient quali ca-tions to these assertions and they are indeed false for theL IM A random coef cient sbquo in the LIM is permissible with-out distributional assumption ( Ichimura and Thompson 1998)and the coef cients in the standard constant-coef cient L IMare identi ed semiparametrically (ie under unknown distri-bution of hellip) under the restriction that d and hellip are independentan assumption generally made for causal interpretation of theLPM model as well (Manski 1988 Horowitz 1993 Powell1994) In addition as already noted in a fully saturated modelthe LPM and the L IM are equivalent in any case

Is the LATE an Interesting Statistic The Angrist assertionthat LATE is the only interesting statistic if d is endogenousor perhaps the only statistic worth bothering about because itis the only one identi ed by the data has the same partialvalidity as his preference for the LPM over the LIM IV isone of the most popular techniques in applied econometricsand has a natural intuition indeed one that does not requirethe LATE interpretation per se It is one of the most usefultools in the applied economistrsquos kit The simple IV estimatordiscussed by Angrist makes minimal assumptions and givesminimal information back to the analyst as a result But tosay that it produces the only statistic of interest does not havedefense both because it is equivalent to an LIM that is fullysaturated and may t the data worse than an LIM if not satu-rated and because it implies that there is no value to makingadditional assumptions to obtain additional information

The limiting nature of the LATE statistic is again in itsinattention to nonlinearities interpolation and extrapolationThat nonlinearities can be important is as true in this caseas it was in the exogenous case just discussed when thereare additional x covariates and when the model is not satu-rated As for interpolation and extrapolation the LATE statis-tic 6 Ny4z D z15ƒ Ny4z D z057=6 Nd4z D z15ƒ Nd4z D z057 denotes theeffect on a change in z from value z0 to z1 on E4y5 scaledby the change in the E4d5 It does not have implications forthe effect on E4y5 of any other change in z or for a changein any other policy variable Thus it is not particularly usefulfor policy changes other than a change of z0 to z1 in the sameenvironment (ie conditioned on the same x) This stands incontrast to an L IM for E4dmdashx1 z5mdashor any parametric modelfor d for that mattermdashthat allows for the change from z0 toz1 to inform policy makers and others of the likely changesof other values of z and of other variables x In fact theLATE statistic which is not a parameter in the usual senseof the word can always be expressed as a nonlinear combi-nation of parameters of a latent-variable model but not viceversa hence the latter model is more general than the former[See Heckman and Vytlacil (1999) for a discussion of how toexpress the LATE in terms of an L IM see also Angrist andImbens (1999) and Heckman (1997 1999) for a discussion ofsome of these issues]

In addition Angristrsquos focus on the advantages of the LATEfor model-free estimation of the effect of a change in z fromz0 to z1 on y reveals a philosophical weakness in his posi-tion This is because the effect of a change in z from z0 toz1 on y can be estimated from the reduced form the struc-tural form is not needed It is not explained why the statistic

6E4ymdashz D z15 ƒ E4ymdashz D z057 is not the only quantity of inter-est The answer that most economists would give is to say thatstructural coef cients are of interest because they can be usedto extrapolate the effects of changes in d on y to other changesin z than from z0 to z1 in the same environment indeedthis is the basic rationale for structural estimation given inthe famous essay by Marschak [1953 see also Christ (1994)and Heckman (2000) for discussions Marschakrsquos argumentwas more subtle than this arguing that restrictions on struc-tural parameters are needed for out-of-sample prediction butthis translates into restrictions on the reduced form as well]However by ruling out the possibility of learning about anyof the effects of changes in z or d other than those inducedby a change in z from z0 to z1 Angrist removes the need todo structural estimation in the rst place Angrist implicitlyassumes that a structural coef cient of interest exists on thevariable d in the y equation and that that coef cient has mean-ing independent of a particular change in z but the rest of hisdiscussion contravenes that interest

A reduced-form research program which is where theAngrist position leads is of considerable value There is noth-ing wrong with a research program to collect a large body ofinformation on the effects of a wide variety of changes in dif-ferent policy variables 4z5 from particular values (z0) to otherparticular values (z1) each taking place in particular environ-ments (x) But if the research program stops there very littleuseful has been learned other than a collection of facts aboutthe effects of particular policies in particular environments

Is the Application in the Angrist Article Interesting TheAngrist article uses a recently popular exclusion restriction ornatural experiment to identify the effect of fertility on laborsupply and to illustrate his preferred methodsmdashnamely theuse of twins The use of twins as an exclusion restrictionstands in contrast to variations in a government policy or lawwhich are often used as exclusion restrictions both in recentwork on natural experiments as well as a much older litera-ture that uses cross-sectional and overtime variation in state-or country-speci c taxes and transfer rules to identify modelparameters However although it can be of interest to estimatereduced-form policy effects 6E4ymdashz D z15 ƒ E4ymdashz D z057 asjust discussed it is not so obvious that there is any interest inestimating the effects of having twins on the expected valueof y Creating twins is not a variable directly subject to policymanipulation

The dif culty in the use of twins arises for the same reasonalready discussedmdashnamely that the preference for model-freeestimation of policy effects leads to a lack of interest in thefunction E4dmdashz1 x5 and hence to a lack of interest in what canbe learned from a study of twins about the effects of someother more relevant policy variable that might be manipulatedA model for the E4dmdashz1 x5 is needed to make that connectionand hence to make a study of twins of any interest yet that isintentionally eschewed in the approach proposed by AngristOnce again this leads to an uninteresting and quite limitingset of exercises

Conclusions The set of methods laid out in the Angristarticle at least if a more relevant instrument were used yieldsa set of model-free statistics that are suitable for exploratorywork on a research question prior to the (possibly nonlinear)

Mullahy Comment 23

structural estimation and prediction that should be the ultimateobject of applied economics research

ACKNOWLEDGMENTS

This comment is a revision of remarks delivered at theJoint Statistical Meetings Baltimore August 1999 I thankCarl Christ James Heckman Joel Horowitz Michael KeaneThomas Mroz Geert Ridder and Edward Vyltacil for com-ments with the usual disclaimer that the views expressed hereshould not be taken as representing those of any of theseindividuals

ADDITIONAL REFERENCES

Angrist J and Imbens G (1999) Comment on ldquo Instrumental VariablesA Study of Implicit Behavioral Assumptions Used in Making Pro-gram Evaluationsrdquo by J J Heckman Journal of Human Resources 34283ndash827

Blundell R and Smith R (1993) ldquoSimultaneous Microeconometric ModelsWith Censored or Qualitative Dependent Variablesrdquo in Handbook of Statis-tics (Vol 1) eds G S Maddala C R Rao and H D Vinod AmsterdamNorth-Holland pp 117ndash141

Christ C (1994) ldquoThe Cowles Commissionrsquos Contributions to Econometricsat Chicago 1939ndash1955rdquo Journal of Economic Literature 32 30ndash59

Heckman J (1997) ldquoInstrumental Variables A Study of Implicit BehavioralAssumptions Used in Making Program Evaluationsrdquo Journal of HumanResources 32 441ndash462

mdashmdashmdash (1999) ldquo Instrumental Variables A Reply to Angrist and ImbensrdquoJournal of Human Resources 34 828ndash837

mdashmdashmdash (2000) ldquoCausal Parameters and Policy Analysis in Economics ATwentieth Century Retrospectiverdquo Quarterly Journal of Economics 11545ndash97

Heckman J and MaCurdy T (1985) ldquoA Simultaneous Equations LinearProbability Modelrdquo International Economic Review 18 21ndash37

Heckman J and Vytlacil E (1999) ldquoLocal Instrumental Variables andLatent Variable Models for Identifying and Bounding Treatment EffectsrdquoProceedings of the National Academy of Sciences 96 4730ndash4734

Horowitz J (1993) ldquoSemiparametric and Nonparametric Estimation of Quan-tal Response Modelsrdquo in Handbook of Statistics (Vol 11) eds G SMaddala C R Rao and H D Vinod Amsterdam North-Holland pp45ndash72

Ichimura H and Thompson T (1998) ldquoMaximum Likelihood Estimation ofa Binary Choice Model With Random Coef cients of Unknown Distribu-tionrdquo Journal of Econometrics 86 269ndash295

Manski C (1988) ldquoIdenti cation of Binary Choice Modelsrdquo Journal of theAmerican Statistical Association 83 729ndash738

Marschak J (1953) ldquoEconomic Measurements for Policy and Predictionrdquoin Studies in Econometric Method eds W Hood and T KoopmansNew York Wiley pp 1ndash26

Powell J (1994) ldquoEstimation of Semiparametric Modelsrdquo in Handbook ofEconometrics (Vol 4) eds R Engle and D McFadden Amsterdam North-Holland pp 2443ndash2521

CommentJohn Mullahy

Departments of Preventive Medicine and Economics University of Wisconsin Madison WI 53705 andNational Bureau of Economic Research Cambridge MA (jmullahyfacstaffwiscedu)

Articles bearing titles containing phrases like ldquoSimpleStrategies for Empirical Practicerdquo are often wolves masquerad-ing as sheep the methodologies they devise being far fromldquosimplerdquo and far removed from what most humble practition-ers perceive in the realm of ldquoempirical practicerdquo Not so hereJoshua Angrist has written a tight comprehensive article thatis stimulating and important yet also eminently useful

As typi es much of Angristrsquos work the main concern in thisarticle is on how to elicit interesting characterizations of causaleffects from microdata with the particular twist here being afocus on outcomes measured as ldquolimited dependent variablesrdquo(LDVrsquos) The main take-away message I glean from this arti-cle is that applied analysts working on causal effect or struc-tural analyses in LDV contextsmdashtraditionally vexing contextsinsofar as consistent estimation and inference are concernedmdashhave considerable grounds for optimism Angrist lays out andinterprets systematically a set of issues and methods that pro-vide practitioners with a variety of implementable strategiesthat might be brought to bear on such empirical problemsA corollary take-away message is that in some respects ldquothisstuff is not really as hard as wersquove tended to make itrdquo withAngrist demonstrating for instance the potential merits ofsimple linear instrumental variable ( IV) methods for estimat-ing causal effects in a variety of LDV contexts In no eventcan applied analysts escape the requirement of nding theo-retically sound instruments but Angrist exposits compellingly

a broad description of how such instruments might be used toelicit interesting causal inferences in LDV contexts

I have no real quibbles with any of the substance ofAngristrsquos arguments Rather I will devote my commentarymainly to amplifying and expanding several of the themes hedevelops throughout the article

1 FOCUS ON CAUSAL OR PARTIAL EFFECTSVERSUS FOCUS ON CONDITIONAL

EXPECTATIONS FUNCTIONS

It seems fair to suggest that much of applied microecono-metrics is concerned primarily with understanding the signsand magnitudes of quantities like iexclE6Y mdashX1 D7=iexcl4X1 D5 oriexclE6Y mdashX7=iexclX0 Yet much of the actual dirty work in under-taking causal analysis in LDV contexts seems to result fromdecisions to undertake analyses in settings where E6Y mdashX1D7

or E6Y mdashX7 are restricted to be positive without a priori restric-tions on parameter values [thus the tradition of using tobit-class conditional expectations functions (CEFrsquos) two-stepselection models exponential CEFrsquos etc]

As practitioners however we should pause to assesswhether speci cations akin to E6Y mdashX1D7 D exp4Xsbquo C D5 gt

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

24 Journal of Business amp Economic Statistics January 2001

0 are ultimately buying anything important with respect tothe rst-order questions being explored It is only com-mon sense that analysts should spend relatively more energyworking on understanding quantities of primary interest(eg iexclE6Y mdashX1D7=iexcl4X1 D5) and worrying relatively lessabout the formulation and implications of assumptions (egE6Y mdashX1 D7 gt 0) that may have little or no ultimate bearingon the particular questions being addressed This is espe-cially so when the latter effort (restricting via functional formchoice E6Y mdashX1 D7 gt 0) will tend to complicate rather thanfacilitate the former effort (understanding the partial or causaleffects) This is not to suggest that quantities like E6Y mdashX7 andE6Y mdashX1 D7 may not themselves be interesting to estimate (egmodeling conditional mean health care expenditures for use inforecasting future expenditure levels) but whether E6Y mdashX1D7

is parametrically and parsimoniously better approximated byexp4XsbquoC D5 or by XsbquoC D or by some other g4Xsbquo CD5

or h4X1D1 ƒ5 ismdasheven in LDV settingsmdashultimately an empir-ical matter to be assessed by goodness of t (eg via condi-tional moment tests)

I note this important angle somewhat sheepishly sincein both articles of mine that Angrist cites (Mullahy 19971998) there is considerable emphasis on the use of exponen-tial CEF speci cations as enforcers of positive conditionalmean ldquorequirementsrdquo In fact the main motivation for theearlier article was to nd a general approach to obtainingconsistent estimators of structural parameters in parametricsettings less distribution-bound than tobit-class models whenthe key requirement or side constraint is E6Y mdashX1 D7 gt 0 withthe resulting strategy a more or less brute-force method foraccomplishing this in a nonlinear IV setting Angristrsquos arti-cle takes this idea quite a bit further and demonstrates how(and with the addition of a further normalization) the struc-tural parameter estimation approach I discussed can be devel-oped into a model to analyze causal effects when the latter arecharacterized as ldquoproportional treatment effectsrdquo

Whether a proportional as opposed to an additive treatmenteffect is interesting (it may or may not be) remains to be seenin any particular application Wooldridgersquos recent work onaverage causal effects (ACErsquos Wooldridge 1999) would seempertinent in contexts in which the role of the X covariates ismore ldquointrusiverdquo than in the simple additivelinear treatment-effect setup As a general matter the extent to which infer-ences about causal effectsmdashhowever formulatedmdashhinge onthe inclusion of particular covariates as conditioning regres-sors is an interesting and potentially important angle thatAngristrsquos article begins to explore and is one that likely tomerit additional research For the particular issues at handthe very nature of proportional treatment effects brings intofocus explicitly the role of the X rsquos so how their correlationwith the treatment indicators 4D5 in uences inferences aboutthe causal impacts of the latter would seem to be a rst-orderconsideration

2 TWO-PART MODELS

I found Angristrsquos analysis of the interpretation of causaleffects in two-part models to be extremely illuminatingIn a reduced-form setting I argued (Mullahy 1998) that amid

all the concerns about zeros robustness outliers transforma-tion retransformation and such that have typically attendedmodeling efforts involving two-part models it would be sur-prising in a well-structured empirical investigation if therewere not stated or lurking concerns about partial effectsiexclE6Y mdashX7=iexclX andor CEFrsquos E6Y mdashX7 themselves It may be insome cases that such concerns are not articulated and it mayeven be in less well-structured problems that they are not obvi-ous to analysts themselves Yet for instance unless robustlyestimated sbquorsquos from a log-linear part-2 speci cation of a two-part model could inform a rst-order question about a par-tial effect or a CEF of what practical use were they likelyto be My rather modest and rather obvious argument wasjust that the concerns about the iexclE6Y mdashX7=iexclX or the E6Y mdashX7mdashif they are the reason the analysis is being conducted in the rst placemdashshould enjoy rst-order prominence in the estima-tion exercise and that concerns about zeros outliers and suchshould be relegated in some sense to second-order status Insome cases two-part models will serve nicely to address such rst-order concernsmdashat least in reduced-form settingsmdashbut inothers they will not

Angrist appears to have some sympathy with such argu-ments but more importantly he provides a valuable serviceto users of two-part models by unearthing one major limi-tation in the analysis of causal effects The problem is notwith part 1 (ie the logit probit or linear probability com-ponent) because part 1 of the two-part model falls within themain lines of Angristrsquos LDV analysis Rather the complicationarises with part 2 of the two-part model in which the additionalldquoY gt 0rdquo conditioning arises In the standard (reduced-form)two-part model quantities like E6Y mdashX1Y gt 07 are prominentand identi ed readily But in the counterfactual settings inwhich causal effects are manifested wherein the realized Y

arises from self-selection into or out of treatment such selec-tion effects introduce an ambiguity [Eq (12)] and therebyconfound the ability to glean causal effects from part-2 esti-mates (unless as Angrist notes censored regression methodsare used but Angrist also offers some compelling argumentsagainst blind reliance on censored regression approaches) Assuch if estimation of and inference about causal effects arethe primary analytical concerns in data settings with Y para 0and Pr4Y D 05 gt 01 analysts may be well advised to avoidtwo-part modeling strategies and pursue some of the moredirect linear and nonlinear estimation approaches discussedby Angrist in which one-part estimation approachesmdashzerosand allmdashyield estimates of causal effects that will be directlyinterpretable

3 BEYOND CEFrsquoS

One noteworthy feature of the methods Angrist discussesmdashwith the main results attributed to Abadie (1999)mdashare theirapplicability to estimation of causal effects for characteristicsof the conditional distribution rdquo4Y mdashX1D5 beyond just CEFrsquosEstimation of causal or treatment effects for conditional quan-tiles (QTErsquos) and conditional distribution function ordinates(ldquodistilesrdquo) Pr4Y lt cmdashX1D5 is demonstrated to t properlywithin the weighting strategies advanced by Abadie Since

Todd Comment 25

quantile and distile analyses may be more relevant in address-ing particular analytical questions or policy issues than CEFrsquosthe ability to extend the mechanisms of causal-effect esti-mation strategies in such directions should be welcomed byapplied researchers

I might suggest that Angristrsquos claim about an advantage ofquantile causal analysis over distile causal analysis being thatthe latter relies on application-specic values of ordinates (c)is perhaps oversold or underdeveloped In many interestingand policy-relevant applications focus on the application-speci c ordinates is fundamental What factors cause employ-ers to offer more than one health insurance plan to employeesWhat factors cause consumers of alcoholic beverages to drinkmore than two drinks per day What factors cause elderlyindividuals to utilize more than one prescription drug prod-uct What factors cause second earners to work more than17 hours per week Each of these concerns an importantreal-world causal question for which the application-specicordinatemdashwhether for institutional legal physiological orother reasonsmdashis a fundamental ldquopivot pointrdquo for the analysisAnalysis of such distile relationshipsmdashwhether in causal or inreduced-form settingsmdashis a potentially powerful method forinforming decision makers and it seems to me that its mer-its relative to quantile analyses would have to be judged on acase-by-case basis

4 CONCLUSIONS

Practitioners of applied microeconometrics will nd thatthe time invested in a careful reading of Angristrsquos article hassizable returns not merely because of the analytical and con-ceptual insights it offers but ultimately also because it sug-gests a variety of important practical innovations that appliedresearchers can exploit empirically to more clearly understandthe nature of causal relationships in their data Applied microresearchers in areas like health labor development public nance and such work commonly with data wherein (a) out-come measures are limitedmdashoften most importantly becausethey are nonnegative with mass points at 0mdashand (b) the rolesof conditioning covariates (X rsquos) are most conveniently summa-rized via regression methods Angristrsquos article provides appliedresearchers so endowed with a set of powerful tools for elic-iting causal inferences from such data Of course the punchline is familiar and inevitable All bets are off without theo-retically solid instruments But with such instruments in handthe strategies exposited by Angrist offer analysts a rich set ofperspectives on estimation

ADDITIONAL REFERENCE

Wooldridge J M (1999) ldquoEstimating Average Partial Effects Under Con-ditional Moment Independence Assumptionsrdquo mimeo Michigan StateUniversity Dept of Economics

CommentPetra Todd

Department of Economics University of Pennsylvania Philadelphia PA 19104 (petraathonasasupennedu)

This article considers ways of estimating the effect ofbinary endogenous regressors in models with limited depen-dent variables It questions the usefulness of conventionalestimation strategies aimed at recovering structural modelparameters and advocates the use of simple instrumental vari-ables ( IV) estimators as an alternative on the grounds thatthese estimators invoke weaker assumptions and often suf ceto answer questions of interest in empirical studies In theauthorrsquos view the main challenge facing empirical researchersis the problem of identi cation of ldquotreatment effectsrdquo throughIV Here I expand the discussion by considering two ques-tions (1) What are the limitations of the causal model asa paradigm for policy analysis (2) When simple estimatorsare suitable for answering a question of interest what are thetrade-offs that need to be considered in using them

1 WHEN IS THE ldquoCAUSAL MODELldquoA USEFULPARADIGM FOR POLICY ANALYSIS

The article gives the impression that most interesting ques-tions in economics can be answered within the context ofthe ldquocausal modelrdquo [variously attributed to Neyman (1923)Fisher (1935) Cox (1958) Roy (1951) and Rubin (1978)]The causal model is a very general framework that assumesthat there are potential outcome states associated with having

received some treatment and having received no treatment (orpossibly a different treatment) For each individual only oneof the potential outcome states is observed which leads to amissing-data problem in attempting to draw inferences aboutaspects of the treatment effect distribution

The language of potential outcomes is very general soalmost any economic problem could be formulated in theseterms However this does not mean that estimators proposedin the literature for the causal model are useful for all or evenmost economic problems A major limitation of the modelis that it assumes that the state ldquowith treatmentrdquo has beenobserved for at least a subset of people In medical trials orbiological experiments this assumption is probably reason-able but in economics we often are interested in evaluatingeffects of treatments that have not yet been implemented Forexample we might be interested in predicting the effect ofraising the age receiving Social Security bene ts to a new ageor the effect of introducing new term limits on welfare par-ticipation An implicit assumption needed to apply any of theidenti cation results described in this article is that both Y1

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

26 Journal of Business amp Economic Statistics January 2001

and Y0 are observed which would not be satis ed in manycases Thus the causal model and the estimators developedfor it are inadequate when it comes to any economic questioninvolving a state of the world that has not been observed

Despite these limitations treatment effect estimates aresometimes used to predict consequences of new treatments orof existing treatments for new populations When the policychange being considered is very close to changes observedin the data and is to be applied to similar individuals thenextrapolation may be reasonable But if the policy change is(a) of a magnitude outside the range of experience (b) alonga new dimension or (c) intended for a new population then itis an open question as to how reliable the estimates would beTreatment impact estimates are not intended to capture any-thing invariant to changing conditions The assumptions thatwould be required to justify extrapolations or generalizationsare often quite strong To some extent it is an illusion thatthese estimators allow for general kinds of inference underweak assumptions because the assumptions required to justifytheir application for particular purposes are often not madeexplicit

Angrist argues that the identi cation results for the localaverage treatment effect (LATE) estimator provide a ldquofounda-tion for credible causal inferencerdquo (p 5) and a ldquominimum con-troversy jumping-off point for any prediction exerciserdquo (p 5)But as discussed previously the causal model maintains sucha high level of generality that it offers little guidance for manykinds of problems It is important to recognize when and whennot the LATE estimator is likely to be useful This articleclaims that ldquoin practice estimates of LATE differ little fromestimates based on the stronger assumptions invoked to iden-tify effects on the entire treated populationrdquo (p 5) But suchan extrapolation is not justi ed under the theory and is surelyunlikely to hold in all circumstances As shown in a num-ber of works by various authors LATE provides the averageeffect of treatment for the group of ldquocompliersrdquo who are peo-ple induced by the instrument to receive treatment This groupdoes not except under very special circumstances correspondto the group of treated persons and may in fact representonly a small fraction of persons receiving treatment So notonly does LATE not suf ce for answering problems of the sortdescribed by (a) (b) and (c) it also does not generally pro-vide the average effect of treatment on the treatedmdashthe mostcommon parameter of interest in analyzing social programsAnother not-so-attractive feature of the estimator is that thepopulation of ldquocompliersrdquo is usually not identi able so wecannot say exactly whose treatment impact is being estimatedApplying the LATE parameter estimate to the full populationof treated persons would generally require ruling out certaintypes of heterogeneity in individual responses to treatmentas discussed by Heckman (1997) The explicit assumptionsthat could justify this type of extrapolation were shown byHeckman and Vytlacil (2000 in press)

2 ESTIMATORS SHOULD BE TAILOREDTO THEIR APPLICATION

A general theme throughout the article is that wheneverpossible researchers should restrain from imposing paramet-ric restrictions in estimation because the structure they impose

may be wrong resulting in biased estimates This view isexpressed for example in the discussion of the two-partmodel The article advocates the use of a two-part approachinstead of a conventional selection-model approach maintain-ing that the two-part approach is less likely to lead to biasedparameter estimates because it does not impose cross-equationrestrictions as a conventional tobit estimator would

How do researchers decide whether or not to impose restric-tions across estimating equations that are structurally linkedThe most agnostic approach would be to estimate both equa-tions completely nonparametrically but this approach is rarelypractical Nonparametric estimators are consistent under themost general conditions but in conventional size samples thestandard errors would be so large that the estimates would forpractical purposes be useless Researchers impose structurebecause they are willing to restrict the class of models underwhich their estimators are consistent in exchange for greaterprecision These ef ciency considerations would also apply inthe deciding whether or not to impose cross-equation restric-tions in estimating simultaneous-equations models If we areright about the structure there is an ef ciency gain fromimposing the restrictions If we are wrong imposing them maylead to bias When there are overidentifying restrictions it isat least possible to develop tests of the model speci cation thatcan serve as a guide in choosing a structural model that is sup-ported by the data This article presents a dichotomy betweenthe two-part model and a fully parametric traditional selec-tion speci cation but there are many other modeling choicesranging from fully nonparametric to semiparametric to fullyparametric In modest size samples ef ciency is an impor-tant consideration and a parametric model may be the mostappropriate one For very large samples fewer parametricapproaches are feasible Both the probability-of-working andthe hours-worked equations could for example be estimatedby a semiparametric method such as semiparametric leastsquares ( Ichimura 1993) Which estimation method is mostappropriate ultimately depends on how much data is avail-able and how many parameters are to be estimated It doesnot make sense to criticize the use of structured approaches infavor of more exible modeling approaches without regard tothe context in which they are being applied

3 ON THE GAP BETWEEN THEORY ANDEMPIRICAL PRACTICE

Finally another recurring emphasis in the article is on thevalue of shortcuts in empirical practice In the authorrsquos viewthere is a gap between the econometric theory and what isfeasible in practice and shortcuts and approximations help tobridge this gap To this end the article extols for examplethe virtues of ordinary least squares (OLS) as a way of approx-imating any unknown conditional expectation function with-out much concern as to whether a best linear approximation isa good approximation If the model has discrete regressors andis fully saturated it is well known that OLS is equivalent to anonparametric estimator But in other cases a linear approxi-mation could be highly biased and the bias will not go awayas the sample size gets large There are a variety of alterna-tive asymptotically unbiased estimators Along similar lines

Angrist Reply 27

the article suggests that in implementing a ldquoselection-on-observablesrdquo estimator it may be all right simply to ignore theweighting scheme Such claims seem to me fairly outrageousbecause failure to take into account weighting could lead tovery different estimates and would not provide a consistentestimator of the parameter of interest Heckman Ichimura andTodd (1998) discussed alternative estimators that are not dif -cult to implement and take into account the weighting scheme

Finally the appendix presents a variety of shortcuts forobtaining estimates and standard errorsmdashsome of which leadto inconsistent estimates When shortcuts lead to very crudeapproximations or inconsistent estimates I question their use-fulness I do think it unreasonable to expect that estimatesbased on simple IV estimators be accompanied by correctstandard errors that correctly take into account the in uence ofestimated parameters of estimated regressors I see little valuein reporting incorrect standard-errors estimates on the groundsthat it saves time in computation especially when there areseveral works long available in the literature that develop sim-ple ways of solving these kinds of problems (eg see Newey1984) Although there is surely a gap between the estimatorsdeveloped in the theoretic econometric literature and the esti-mators used in the empirical literature the gap is not as greatas this article supposes

ADDITIONAL REFERENCES

Cox D R (1958) The Planning of Experiments New York WileyFisher R A (1935) Design of Experiments London Oliver and BoydHeckman J J (1997) ldquo Instrumental Variables Evidence From Evaluating a

Job Training Programmerdquo Journal of Human Resources 32 441ndash462Heckman J Ichimura H and Todd P (1997) ldquoMatching as an Econometric

Evaluation Estimator Theory and Evidence on its Performance Appliedto the JTPA Program Part I Theory and Methodsrdquo Review of EconomicStudies 64 605ndash654

mdashmdashmdash (1998) ldquoMatching as an Econometric Evaluationrdquo Review of Eco-nomic Studies 65 261ndash294

Heckman J and Vytlacil E (2000) ldquoCausal Parameters Structural Equa-tions Treatment Effects and Randomized Evaluations of Social Programsrdquounpublished manuscript University of Chicago Dept of Economics

mdashmdashmdash (in press) ldquoLocal Instrumental Variablesrdquo in Nonlinear StatisticalInference Essays in Honor of Takeshi Amemiya eds C Hsiao KMorimune and J Powell Cambridge UK Cambridge University Press

Ichimura H (1993) ldquoSemiparametric Least Squares (SLS) and Weighted SLSEstimation of Single Index Modelsrdquo Journal of Econometrics 58 71ndash120

Neyman J (1923) ldquoStatistical Problems in Agricultural Experimentsrdquo Sup-plement to the Journal of the Royal Statistical Society 2 107ndash180

Newey W (1984) ldquoA Method of Moments Interpretation of Sequential Esti-matorsrdquo Economic Letters 14 201ndash206

Roy A (1951) ldquoSome Thoughts on the Distribution of Earningsrdquo OxfordEconomic Papers 3 135ndash146

Rubin D (1978) ldquoBaysian Inference for Causal Effects The Role of Ran-domizationrdquo The Annals of Statistics 6 34ndash58

ReplyJoshua D Angrist

I thank the discussants and the session organizer outgoingJBES editor Jeff Wooldridge who will be missed by associateeditors and JBES authors alike The comments in this sessionoffer a variety of viewpoints some sympathetic others morecritical I am pleased to be part of this stimulating and con-structive exchange and look forward to more interaction ofthis type in the future

Hahn My article starts by suggesting that a focus ontreatment effects simpli es the problem of causal inferencein limited dependent variable (LDV) models with endoge-nous regressors Endogeneityrsquos half-sister is the assumptionof unobserved individual effects since the omitted-variablesmotivation for xed effects in panel data is similar Hahnasks whether a shift in focus from index parameters to aver-age causal effects might also simplify the notoriously dif -cult problem of working with nonlinear models for binarypanel data (eg as in Card and Sullivan 1988) Hahn showsthat here too the causal-effects framework pays off demon-strating that average causal effects are easily estimated in a xed-effects probit model with a binary regressor even thoughthe probit coef cient is not A second noteworthy featureof Hahnrsquos comment is his observation that identi cation inthis framework turns on careful modeling of the relationshipbetween the assignment variable and the xed effects Tradi-tional econometric models are primarily concerned with thestochastic process generating outcomes A shift of attentiontoward modeling the assignment mechanism is consistent with

the notion that observational studies should try to mimic thesort of controlled comparisons generated by experiments

Imbens Imbens begins by endorsing the view that the sub-stantive goals of empirical research are better served by thepotential outcomescausal framework than by structural mod-eling Going one step further Imbens notes that the problemswith ldquomodel- rst econometricsrdquo are not limited to LDVrsquos Inthe same spirit he sensibly wonders (as did Wooldridge 1992)why so much attention has been devoted to estimating theBoxndashCox transformation model which does not actually iden-tify E6Y mdashX7 without a distributional assumption Imbens alsonicely articulates the rationale for looking at the twins exam-ple and notes that this is similar to the rationale for extrap-olating from clinical trials in narrow subpopulations And infact I believe the ldquotwins experimentrdquo is likely to have pre-dictive value in a variety of situations though any extrapo-lation will probably be improved by control for demographiccharacteristics

In spite of (or perhaps because of) our years of discussionand collaboration Imbens and I still do not agree on certainthings He feels strongly that inference with discrete outcomesshould use models and procedures that respect inherent non-

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

28 Journal of Business amp Economic Statistics January 2001

linearities The HIRZ procedure discussed in his comment isan intriguing way to do this and my article discusses othersI remain unconvinced however that nonlinearity is of majorsubstantive importance in empirical work with binary regres-sors The proof of the pudding I suppose would be to showthat H IRZ or some of the methods I discuss in the article leadto better predictions or decisions than the linear workhorse

Mof tt Mof t provides a valuable counterpointHe beginsby arguing that latent-index models have more predictivepower than reduced-form causal models Such claims seemhard to establish a priori though I do not doubt that in somecases this may be true But empirical strategies in which thebulk of the research effort goes into estimating index coef -cients may fail to answer the most basic question in empiricalwork What happened here Causal models and local aver-age treatment effect (LATE) answer this question while indexcoef cients alone are not suf cient for causal inference Iagree however that structural modeling can be useful forgoing beyond basic questions and I have used index and otherstructural models for this purpose Mof t is right howeverin suggesting that I nd the model-free analysis he refers toas ldquoexploratoryrdquo more interesting than structural extrapolationThis seems to me to be where the potential for real discoverylies

Mof tt asks tough but fair questions about the twins appli-cation He is correct to point out that twinning is not a manip-ulable policy instrument That is one reason I do not limit theanalysis to reduced-form effects of multiple births Yet to mymind the best evidence we have on the controversial questionof the consequences of out-of-wedlock childbearing comesfrom the twins experiment (Bronars and Grogger 1994) Andthe economic consequences of childbearing are of interest inmany other contexts As noted in the introduction to my arti-cle the twins experiment can help determine whether fertilityreductions are responsible for trend increases in female laborsupply Beyond this sort of question the twins results seemuseful for assessing the economic consequences of increasedabortion access in the United States the dissemination of con-traceptive implants in developing countries (eg see KaneFarr and Janowitz 1990) and various programs to encour-age or require implant use by welfare mothers (Haveman andWolfe 1993)

Finally it is fair to question the external validity of twinsestimates as Mof t does toward the end of his commentAngrist and Evans (1998) showed that the labor-supply con-sequences of twinning can be reconciled with estimates ofthe labor-supply consequences of childbearing resulting froma very different natural experiment induced by preferences formixed-sex sibling pairs This suggests that results from bothexperiments may be quite robust and may even be ldquostructuralrdquoin the sense that they tell us something general about the eco-nomic response to childbearing

Mullahy Mullahy brings a welcome health economistrsquosperspective to the discussion Health economists have longgrappled with the issues discussed in my article and asI hope I made clear I drew ideas and inspiration fromMullahyrsquos work on models with nonnegative dependent vari-ables Mullahyrsquos comment further highlights exactly the sort

of utilitarian trade-off that underlies the choice of empiricalstrategy On one hand we would like our models to be con-sistent with any functional form restrictions known to be trueOn the other attempts at this may complicate estimation andinference for little payoff Worst of all the technical complex-ity of nonlinear models seems to cause authors to ldquowax econo-metricrdquo an effort that may come at the expense of attention tosubstantive issues of importance (an example here is the longseries of papers discussing exactly how data from the RANDhealth-insurance experiment should be analyzed) Mullahyrsquosheretical discussions of ldquoone-part modelsrdquo here and in earlierwork represent a refreshing effort to keep the empirical trainon track Mullahyrsquos point that application-specic distributionordinates can provide more useful summary information thanquantiles is also clearly worth thinking about

Todd Todd provides a useful and clear overview of thecausal framework which she suggests is better suited formedical questions than economic questions The thrust of herargument is that there is too much heterogeneity in economicinterventions and outcomes for the causal approach to bevery useful for economists But my perception is that medicalresearch is even more explicitly concerned with heterogene-ity than economics This is evidenced by the many quali ca-tions regarding patient characteristics conditions and inter-actions in medical treatment protocols A difference betweenthe prevailing medical research paradigm and the traditionaleconometric perspective however is whether the question ofheterogeneous response is to be addressed by better theoryor more evidence Of course theory helps us decide whereto look for heterogeneity but as Rosenbaum (1999 p 301)put it ldquo If a treatment has substantially different effects in dif-ferent types of people then we need to discover this (ital-ics mine) Rosenbaum also noted that ldquorandomized clinicaltrials routinely look for heterogeneous treatment effects andthey do this without representative samples from nationalpopulationsrdquo

On the technical side Todd suggests I advocated use ofthe two-part model because it exibly approximates nonlinearfunctional forms for nonnegative outcomes Although I notedthe exibility the upshot of my discussion was that the two-part model is poorly suited to causal inference because of thedif culty of interpreting part 2 Finally Todd points out thefailure to allow for sampling variance in the scaling factorsused to generate effects for nonlinear models This is indeed ahard-to-defend shortcut though I shall take a stab at defenseby noting that randomness of the scaling factor which typi-cally estimates a sample average is likely to be a minor sourceof sampling uncertainty

ADDITIONAL REFERENCES

Card D E and Sullivan D (1988) ldquoEffects of Subsidized Training onMovements In and Out of Employmentrdquo Econometrica 56 497ndash530

Haveman R and Wolfe B (1993) ldquoChildrenrsquos Prospects and ChildrenrsquosPolicyrdquo Journal of Economic Perspectives 7 153ndash174

Kane T Farr G and Janowitz B (1990) ldquo Initial Acceptability of Con-traceptive Implants in Four Developing Countriesrdquo International FamilyPlanning Perspectives 16 49ndash54

Rosenbaum P R (1999) ldquoChoice as an Alternative to Control in Observa-tional Studiesrdquo Statistical Science 14 259ndash304

Wooldridge J (1992) ldquoSome Alternatives to the BoxndashCox RegressionModelrdquo International Economic Review 33 935ndash955

Page 11: New Estimation of Limited Dependent Variable Models With Dummy …piketty.pse.ens.fr/…ecoineg/articl/Angrist2001b.pdf · 2011. 12. 5. · sequences of changes in contraceptive technology

12 Journal of Business amp Economic Statistics January 2001

Table 2 Impact on Mean Outcomes (Morekids endogenous)

Linear models Nonlinear models Structural models

Causal-IV Causal-IV Causal-IV Bivar Endog Mills 2SLSDependent 2SLS 2PM linear Mullahy probit expon probit tobit ratio benchmarkvariable (1) (2) (3) (4) (5) (6) (7) (8) (9) (10)

A With covariates

Employment ƒ0088 mdash ƒ0089 mdash ƒ0088 mdash ƒ0124 mdash mdash ƒ0089(0017) (0017) (0016) (0016) (0017)

Hours worked ƒ3055 ƒ3054 ƒ3055 ƒ3082 mdash ƒ3021 mdash ƒ3081 ƒ4051 ƒ3060(0592) (0598) (0592) (0598) (0694) (0580) (0549) (0599)

B No covariates

Employment ƒ0084 mdash ƒ0084 mdash ƒ0084 mdash ƒ0086 mdash mdash ƒ0084(0017) (0017) (0017) (0017) (0018)

Hours worked ƒ3047 ƒ3037 ƒ3047 ƒ3010 mdash ƒ3012 mdash ƒ3035 ƒ3048 ƒ3052(0617) (0614) (0617) (0561) (0616) (0642) (0641) (0624)

NOTE Sample and covariates are the same as in Table 1 Results for nonlinear models are derivative-based approximations to effect on the treated Causal-IV estimates are based on aprocedure discussed by Abadie (2000b) Standard errors are shown in parentheses

2SLS estimates Results from nonlinear models are againreported as marginal effects that approximate average effectson the treated For example a causal probit model for employ-ment status generates an average effect of ƒ0088 identical (upto the reported accuracy) to the 2SLS estimate Causal-IV esti-mation of an exponential model for hours worked the resultof a procedure that minimizes E6 QŠi4Yi

ƒ exp6X0

i b ƒ aDi7527

generates an estimate of ƒ3021 This too differs little from theconventional 2SLS estimate of ƒ3055 Similarly the Mullahyestimate of ƒ3082 in column (4) is less than 8 larger thanconventional 2SLS in absolute value It is noteworthy how-ever that the Mullahy model generates results that changemarkedly (falling by about 20) when the covariates aredropped Since the covariates are not highly correlated withthe twins instrument this lack of robustness to the inclusionof covariates seems undesirable On the other hand with-

Table 3 Impact on the Distribution of Hours Worked

Distribution treatment effects

Exogenous Morekids Endogenous Morekids Quantile treatment effects

OLS Ordered Causal CausalRange (LPM) Probit probit 2SLS LPM probit Quantile QR QTE(mean) (1) (2) (3) (4) (5) (6) (value) (7) (8)

0 0167 0166 0147 0088 0089 0088 5 ƒ8092 ƒ5024(472) (0002) (0002) (0002) (0017) (0017) (0016) (8) (0186) (0686)1ndash10 0001 0001 0002 ƒ0001 ƒ0001 ƒ0002 6 ƒ1207 ƒ7098(046) (0001) (0001) (000003) (0007) (0007) (0006) (20) (0172) (0860)11ndash20 ƒ0015 ƒ0014 ƒ0011 0002 0002 0002 7 ƒ9054 ƒ6019(093) (0001) (0001) (00001) (0010) (0010) (0010) (35) (0184) (1007)21ndash30 ƒ0024 ƒ0022 ƒ0014 ƒ0004 ƒ0005 ƒ0006 75 ƒ6045 ƒ3060(075) (0001) (0001) (00002) (0009) (0009) (0010) (40) (0156) (1017)31ndash40 ƒ0119 ƒ0110 ƒ0097 ƒ0072 ƒ0072 ƒ0071 8 ƒ1000 0000(277) (0002) (0002) (0001) (0014) (0014) (0016) (40) (0286) (1009)41+ ƒ0009 ƒ0008 ƒ0023 ƒ0009 ƒ0009 ƒ0006 9 mdash mdash

(027) (0001) (0001) (00003) (0005) (0005) (0007) (40)

NOTE The table reports probability-model and (QTE) estimates of the impact of childbearing on the distribution of hours worked The sample and covariates are the same as in Panel A ofTable 2 Causal-IV estimates are based on a procedure discussed by Abadie (1999) Quantile treatments are based on a procedure discussed by Abadie et al (1998) Standard errors areshown in parentheses The standard errors in columns (7) and (8) are bootstrapped

out covariates the Mullahy estimates are close to those fromCausal-IV with an exponential model

The bivariate probit estimate of the effect of childbear-ing on employment status reported in column 7 is ƒ012roughly a third larger in absolute value than the conventional2SLS estimate Interestingly in another application Abadie(2000b) also found that bivariate probit estimates are largerthan Causal-IV The gap between bivariate probit and Causal-IV estimates of effects on labor supply appears to be a conse-quence of the probit model for exogenous covariates Withoutcovariates bivariate probit generates estimates that are veryclose to the results from the other estimators It should benoted however that bivariate probit is not really appropriatefor twins instruments because the probability Morekids D 1 isequal to 1 for twins The probit ML estimator does not existin this case Therefore to compute all of the estimates using a

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 13

probit rst-stage (bivariate probit endogenous tobit and Millsratio) I randomly recoded 1 of Di observations to 0 Inprinciple measurement error in a binary endogenous regressorbiases 2SLS estimates (see Kane Rouse and Staiger 1999)In this case however column (10) shows that 2SLS estimateswith the randomly recoded data differ little from 2SLS esti-mates using the original data

Column (8) of Table 2 reports estimates of a structuraltobit model with endogeneous regressors These estimateswere computed using a two-step estimator that approximatesthe MLE again with recoded data The two-step procedureadds a Mills-ratio type endogeneity correction to a censoredregression and then applies tobit to the censored regres-sion model with the correction term [The correction term islsquo˜6Di4ƒ i=ecirci5C 41ƒDi5 i=41ƒecirci57 where is the corre-lation between the latent error determining treatment assign-ment and the outcome residual lsquo˜ is the standard deviation ofthe outcome residual and i and ecirci are Normal density anddistribution functions evaluated at the probit rst-stage ttedvalues see Heckman and Robb (1985)]

As with the bivariate probit estimate the endogenous tobitestimate is somewhat larger in magnitude than conventional2SLS This may be because of the Mills-ratio procedure forcontrolling for the endogeneity of Di more than the tobitcorrection for nonnegative outcomes To see this note thatthe Mills-ratio estimate ignoring censoring reported in col-umn (9) is considerably larger than the corresponding conven-tional 2SLS estimates Interestingly both the probit and tobitstructural estimators generate results that are more sensitiveto the inclusion of covariates than any of the other estimatorsexcept Mullahyrsquos In fact Panel B of the table shows thatwithout covariates all estimation techniques give very simi-lar results This is not surprising since without covariates theparametric assumptions used in these models are weaker

The last set of results shows that childbearing is associ-ated with marked changes in the distribution of hours workedbeyond the changes in participation already seen This isapparent in the rst columns of Table 3 which report thedistribution of hours worked by interval along with linearprobability estimates of the relationship between childbearingand the probability of falling into each interval (these mod-els include the same covariates used for Table 2) The largestentry in column (1) is for the probability of working zerohours There is also a large negative effect on the probabilityof working 31ndash40 hours per week which shows that womenwho have a third child are much less likely to work full-time Once again probit average effects reported in column2 are almost indistinguishable from the corresponding OLSestimates Estimates from an ordered probit model reportedin column (3) differ from OLS somewhat more but still gen-erate a very similar pattern

Like the 2SLS estimates for average outcomes 2SLS esti-mates of linear probability models for the probability of fallinginto each interval show that models that treat childbearing asexogenous exaggerate the negative impact on labor supply2SLS estimates for the probability of working zero hours areidentical (by construction) to those for employment in Table 1The 2SLS estimates of the impact of childbearing on fulltimework are also considerably less than the corresponding OLSestimates

Nonstructural Causal-IV models treating Morekids asendogenous generate estimates very close to 2SLS estimatesfor effects on the probabilities of hours falling into each inter-val Columns (5) and (6) show that the results are also remark-ably insensitive to whether a linear or probit model is usedto approximate the distribution function The estimates againindicate that childbearing changes the distribution of hours byraising the probability of nonparticipation and by reducing theprobability of fulltime work

Finally the QTE estimator provides useful summary statis-tics for causal effects of childbearing on changes in the distri-bution of hours worked These estimates were computed as thesolution to a weighted quantile regression problem using (27)to construct weights and the reported standard errors are froma bootstrap Quantile regression estimates treating childbear-ing as exogenous show an estimated 9-hour decline in medianhours worked but the QTE estimator suggests that the causaleffect of childbearing on median hours worked is only about ve hours Estimates at higher quantiles are similarly reducedwhen childbearing is treated as endogenous

6 SUMMARY AND CONCLUSIONS

Structural parameters may be of theoretical interest but mustultimately be converted into causal effects if they are to beof use for policy evaluation or determining whether a trendassociation is causal The problem of estimating causal effectsfor LDVrsquos does not differ fundamentally from the analogousproblem for continuously distributed outcomes The key dif-ferences seem to me to be the increased likelihood of inter-est in distributional outcomes and the inherent nonlinearity ofCEFrsquos for LDVrsquos in models with covariates Without covari-ates conventional 2SLS estimates capture both distributionaleffects and effects on means Simple IV strategies devel-oped by Mullahy (1997) and Abadie (2000b) can be used toestimate average effects in nonlinear models with covariateswhile IV strategies for probability models and quantile regres-sion can be used to estimate effects on distributions

These approaches are illustrated here using twin birthsto estimate the labor-supply consequences of childbearingAlternative nonstructural approaches to the estimation ofcausal effects using twins instruments generate similar aver-age effects whether or not the model is nonlinear Structuralestimates tend to be somewhat larger than nonstructural esti-mates when exogenous covariates are included even thoughthe covariates are not strongly related to the twins instru-ment Since the structural models impose additional distribu-tional and functional form assumptions I see no reason toprefer them

Finally the various IV estimates of the effect of childbear-ing on the distribution of hours worked show that the impactof childbearing is characterized by substantially increased non-participation and by an almost equally large shift away fromfulltime work Estimates that treat childbearing as exogenousexaggerate the causal effect of childbearing on changes in dis-tribution as well as on average hours worked These ndingsappear in results from both probability models and quantilemodels with endogenous regressors Both types of modelsprovide an interesting look at the impact of childbearing on

14 Journal of Business amp Economic Statistics January 2001

the distribution of hours worked without the conceptual prob-lems inherent in conditional-on-positive comparisons

ACKNOWLEDGMENTS

I thank the editor and the session discussants for theircomments Special thanks go to Melissa Schettini for out-standing research assistance and Alberto Abadie for exten-sive comments and shared computer programs Thanks also goto Daron Acemoglu Bill Evans Bill Greene Guido ImbensJohn Mullahy Steve Pischke and seminar participants at theUniversity of Texas for helpful discussions and comments onan earlier draft Data and computer programs used here areavailable on request

APPENDIX

A1 Derivation of Equation (12)

Drop the i subscripts Note that

Y0 D 14Y uuml0 gt 05Y uuml

0

Y1D 14Y uuml

0C gt 054Y uuml

0C 5 D 14Y uuml

1 gt 05Y uuml1 0

Since D is independent of Y0 and Y1D 14Y uuml

0C gt 054Y uuml

0C

51D is independent of Y1 Conditional effects on the treatedare therefore the same as conditional effects without condi-tioning on treatment status

E6Y1ƒ Y0

mdashY1 gt 01 D D 17 D E6Y1ƒ Y0

mdashY1 gt 07

E6Y1ƒ Y0

mdashY0 gt 01 D D 17 D E6Y1ƒ Y0

mdashY0 gt 070

The averages on the right side are the causal effects of interestThe conditional expectations on the right side are evaluated asfollows

E6Y1mdashY1 gt 07 D E6Y uuml

1mdashY uuml

1 gt 07 D E6Y uuml0mdashY uuml

1 gt 07C 1 (A1)

E6Y0mdashY1 gt 07 D E6Y uuml0 14Y uuml

0 gt 05mdashY uuml1 gt 07

D E6Y uuml0

mdashY uuml1 gt 01 Y uuml

0 gt 07P4Y uuml0 gt 0mdashY uuml

1 gt 050

If 4 gt 05 2 Y uuml0 gt 0 ) Y uuml

1 gt 01 this is E6Y uuml0

mdashY uuml0 gt 07

P4Y uuml0 gt 0mdashY uuml

1 gt 050 If 4 lt 05 2 Y uuml1 gt 0 ) Y uuml

0 gt 01 so this is

E6Y uuml0

mdashY uuml1 gt 070 (A2)

So lt 0 ) E6Y1 ƒ Y0mdashY1 gt 07 D Similarly

E6Y1mdashY0 gt 07 D E614Y uuml

0C gt 054Y uuml

0C 5mdashY uuml

0 gt 07

D E6Y uuml0

mdashY uuml1 gt 01 Y uuml

0 gt 07P4Y uuml1 gt 0mdashY uuml

0 gt 05

C P Y uuml1 gt 0mdashY uuml

0 gt 0 0

If 4 gt 051 Y uuml0 gt 0 ) Y uuml

1 gt 01 so

P4Y uuml1 gt 0mdashY uuml

0 gt 05 D 1 (A3)

and E6Y uuml0

mdashY uuml1 gt 01 Y uuml

0 gt 07 D E6Y uuml0mdashY uuml

0 gt 070 Finally

E6Y0mdashY0 gt 07 D E6Y uuml0mdashY uuml

0 gt 07 (A4)

so gt 0 ) E6Y1ƒ Y0

mdashY0 gt 07 D

A2 Derivation of Equation (21)

Drop the i subscripts Note that eƒD D 641 ƒ D5 C Deƒ7Let sbquo uuml D eƒsbquo and uuml D eƒ Z is binary and there are nocovariates so we can now write (20) as

sbquo uuml E641ƒ D5Y mdashZ D 17C sbquo uuml uuml E6DY mdashZ D 17 ƒ 1 D 0 (A5)

and

sbquo uuml E641ƒ D5Y mdashZ D 07C sbquo uuml uuml E6DY mdashZ D 07 ƒ 1 D 00 (A6)

Divided (A5) by (A6) to get rid of sbquo uuml then solve for uuml Subtract 1 and rearrange to get Equation (21)

A3 Average Treatment Effects and Standard Errors forNonlinear Models

Average treatment effects were calculated with the aid ofderivative approximations so that all reported effects have theform ldquocoef cient times scaling factorrdquo

Probit Note that ecirc6X 0i sbquo C 7 ƒ ecirc6X 0

isbquo7 6X 0isbquo C Di7 š

1 so the average effect on the treated can be approximatedas 841=N15

Pi Di6X 0

isbquo C Di79 š 1 where N1 D Pi Di This

turns out to be accurate to three decimal places for the esti-mates in Table 1 Standard errors were calculated treating thescaling factor as nonrandom This follows the convention forreporting marginal effects in programs like Stata in practiceany correction for estimation of the scaling factor is likely tobe minor A similar approach was used for ordered probit

Tobit Tobit average treatment effects were approximatedusing a derivative formula that can be found for exam-ple in the work of Greene (1999) E6Yi

mdashXi1DiD 17 ƒ

E6YimdashXi1 Di

D 07 iexclE6YimdashXi1Di7=iexclD D ecirc6X 0

isbquo C Di7 š 0

Average effects on the treated can therefore be approximatedusing 841=N15

Pi Diecirc6X 0

isbquo C Di79 š 0 Standard errors wereagain calculated treating the scaling factor as nonrandom

2PM Let the part-1 coef cient be 1 and the part-2 coef -cient be 2 Since the model multiplies parts and both parts arelinear the average effect is approximated using derivatives as1E6Yi

mdashDiD 11 Yi gt 07 C 2P6Di

D 1mdashY gt 070 Standard errorswere calculated treating the scaling factors E6Y mdashDi

D 11 Yi gt 07

and P6DiD 1mdashY gt 07 as nonrandom and using the fact that

the estimates of 1 and 2 are uncorrelatedMullahy Note that E6Yi

mdashXi1Di1mdash uumli 7 D mdash uuml

i exp6X 0i sbquoC Di7

where this CEF has a causal interpretation Again usingderivatives we have mdashuuml

i exp6X 0isbquo C 7 ƒ mdashuuml

i exp6X 0isbquo7

mdash uumli exp6X 0

i sbquo C Di7 š The model is such that E6mdash uumlimdashXi7

equals 1 but E6mdash uumlimdashXi1Di7 is unrestricted I ignore this prob-

lem and approximate the average effect on the treated as841=N15

Pi Di exp6X 0

i sbquo C Di79 š Standard errors were cal-culated treating the scaling factor as nonrandom

Bivariate Probit This is the same as probit using param-eters from the latent index equation for outcomes

Endogenous Tobit This is the same as tobit but usingcoef cients and predicted probability positive from the modelwith the compound Mills-ratio term included

Mills Ratio Standard errors were calculated treating thecompound Mills-ratio term as known

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 15

Causal-IV (nonlinear) Average effects were calculated asdescribed previously for the probit and Mullahy (exponential)functional form The rst-stage estimates of E6Zi

mdashXi7 neededto construct Ši were estimated using probit Standard errorsfor were calculated using the same bootstrap proceduredescribed for QTE Abadie (2000b) also gave analytic formu-las that take account of the rst-step estimation of E6Zi

mdashXi7An assumption implicit in this scheme for reporting Causal-IV results is that it makes sense to convert conditional-on-Xeffects for compliers into overall average effects on the treated

A4 Computation of Quantile Treatment Effectsand Standard Errors

QTErsquos were computed by plugging rst-step estimatesof QŠi

D E6ŠimdashXi1Di1 Yi7 into a weighted quantile regression

calculation performed by Stata Nonnegative estimates ofE6Ši

mdashXi1 Di1 Yi7 were constructed by separately estimatingE6Zi

mdashXi7 and E6ZimdashYi1Di1Xi7 using probit and then trimming

In principle standard errors should take account of this rst-step estimation An additional complication is that the analyticstandard errors for QTE involve a conditional error density Inthis case I sidestepped messy analytic calculations by usinga bootstrap procedure that repeats both the rst-stage estima-tion of QŠi and the second-step estimation of the parameters ofinterest in 100 replicate samples of 2500 observations eachThe 100 replicate samples were sampled without replacementusing the Stata command bsample The reported standarderrors were calculated as 4N =N uuml 51=2bseN where bseN is thestandard deviation of the 100 replicate estimates N D 21500and N uuml is the full sample size

[Received October 1999 Revised July 2000]

REFERENCES

Abadie A (2000a) ldquoBootstrap Tests for the Effect of a Treatment on the Dis-tributions of an Outcome Variablerdquo Technical Working Paper 261 NationalBureau of Economic Research Cambridge MA

mdashmdashmdash (2000b) ldquoSemiparametric Estimation of Instrumental Variable Mod-els for Causal Effectsrdquo Technical Working Paper 260 National Bureau ofEconomic Research Cambridge MA

Abadie A Angrist J and Imbens G (1998) ldquo Instrumental VariablesEstimation of Quantile Treatment Effectsrdquo Working Paper 229 NationalBureau of Economic Research Cambridge MA

Amemiya T (1978) ldquoThe Estimation of a Simultaneous Equation General-ized Probit Modelrdquo Econometrica 46 1193ndash1205

mdashmdashmdash (1979) ldquoThe Estimation of a Simultaneous Equation Tobit ModelrdquoInternational Economic Review 20 169ndash181

Angrist J (1998) ldquoEstimating the Labor Market Impact of Voluntary MilitaryService Using Social Security Data on Military Applicationsrdquo Economet-rica 66 249ndash288

Angrist J and Evans W (1998) ldquoChildren and Their Parentsrdquo Labor SupplyEvidence From Exogenous Variation in Family Sizerdquo American EconomicReview 88 450ndash477

Angrist J and Imbens G (1991) ldquoSources of Identifying Information inEvaluation Modelsrdquo Technical Working Paper 117 National Bureau ofEconomic Research Cambridge MA

Angrist J Imbens G and Rubin B (1996) ldquo Identi cation of CausalEffects Using Instrumental Variablesrdquo Journal of the American StatisticalAssociation 91 444ndash455

Angrist J and Krueger A (1999) ldquoEmpirical Strategies in Labor Eco-nomicsrdquo in The Handbook of Labor Economics Volume IIIA edsO Ashenfelter and D Card Amsterdam North-Holland pp 1277ndash1366

Blundell R W and Smith R J (1989) ldquoEstimation in a Class of Simulta-neous Equation Limited Dependent Variable Modelsrdquo Review of EconomicStudies 56 37ndash58

Blundell R W and Powell J L (1999) ldquoEndogeneity in Single Index Mod-elsrdquo mimeo University College London Dept of Economics

Bronars S and Grogger J (1994) ldquoThe Economic Consequences of UnwedMotherhood Using Twin Births as a Natural Experimentrdquo American Eco-nomic Review 84 1141ndash1156

Chamberlain G (1986) ldquoAsymptotic Ef ciency in Semi-Parametric ModelsWith Censoringrdquo Journal of Econometrics 32 189ndash218

Cragg J G (1971) ldquoSome Statistical Models for Limited Dependent Vari-ables With Application to the Demand for Durable Goodsrdquo Econometrica39 829ndash844

Duan N Manning W G Jr Morris C N and Newhouse J P(1984) ldquoChoosing Between the Sample-Selection Model and the Multi-partModelrdquo Journal of Business amp Economic Statistics 2 283ndash289

Eichner M J McClellan M B and Wise D A (1997) ldquoHealth Expendi-ture Persistence and Feasibility of Medical Savings Accountsrdquo in Tax Pol-icy and the Economy 11 ed J Poterba Cambridge MA National Bureauof Economic Research pp 91ndash128

Evans W Farrelly M C and Montgomery E (1999) ldquoDo Work-place Smoking Bans Reduce Smokingrdquo American Economic Review 89728ndash747

Gangadharan J and Rosenbloom J L (1996) ldquoThe Effects of Child-bearingon Married Womenrsquos Labor Supply and Earnings Using Twin Births as aNatural Experimentrdquo Working Paper 5647 National Bureau of EconomicResearch Cambridge MA

Goldberger A S (1991) A Course in Econometrics Cambridge MAHarvard University Press

Greene W (1999) ldquoMarginal Effects in the Censored Regression ModelrdquoEconomics Letters 64 43ndash49

Gronau R (1974) ldquoWage ComparisonsmdashA Selectivity Biasrdquo Journal ofPolitical Economy 82 1119ndash1143

Hay J W Leu R and Rohrer P (1987) ldquoOrdinary Least Squares andSample-Selection Models of Health-care Demandrdquo Journal of Business ampEconomic Statistics 5 499ndash506

Hay J W and Olsen R J (1984) ldquoLet Them Eat Cake A Note on Com-paring Alternative Models of the Demand for Medical Carerdquo Journal ofBusiness amp Economic Statistics 2 279ndash282

Heckman J J (1974) ldquoShadow Prices Market Wages and Labor SupplyrdquoEconometrica 42 679ndash694

mdashmdashmdash (1978) ldquoDummy Endogenous Variables in a Simultaneous EquationsSystemrdquo Econometrica 46 931ndash960

mdashmdashmdash (1990) ldquoVarieties of Selection Biasrdquo American Economic Review 80313ndash318

Heckman J J and Robb R Jr (1985) ldquoAlternative Methods for Evaluatingthe Impact of Interventionsrdquo in Longitudinal Analysis of Labor MarketData (Econometric Society Monographs Series No 10) eds J Heckmanand B Singer Cambridge UK Cambridge University Press pp 156ndash246

Imbens G and Angrist J D (1994) ldquo Identi cation and Estimation of LocalAverage Treatment Effectsrdquo Econometrica 62 467ndash475

Imbens G and Rubin D B (1997) ldquoEstimating Outcome Distributions forCompliers in Instrumental Variables Modelsrdquo Review of Economic Studies64 555ndash574

Kane T J Rouse C E and Staiger D (1999) ldquoEstimating Returnsto Schooling When Schooling Is Misreportedrdquo Working Paper W7235National Bureau of Economic Research Cambridge MA

Kelejian H H (1971) ldquoTwo-Stage Least Squares and Econometric SystemsLinear in Parameters but Nonlinear in the Endogenous Variablesrdquo Journalof the American Statistical Association 66 373ndash374

Koenker R and Bassett G (1978) ldquoRegression Quantilesrdquo Econometrica46 33ndash50

Killingsworth M (1983) Labor Supply Cambridge UK Cambridge Uni-versity Press

Keane M P and Wolpin K (1997) ldquo Introduction to the JBES Special Issueon Structural Estimation in Applied Microeconomicsrdquo Journal of Businessamp Economic Statistics 15 111ndash114

Lee M J (1995) ldquoSemiparametric Estimation of Simultaneous EquationsWith Limited Dependent Variables A Case Study of Female Labor Sup-plyrdquo Journal of Applied Econometrics 10 187ndash200

mdashmdashmdash (1996) ldquoNonparametric Two-Stage Estimation of Simultaneous Equa-tions With Limited Endogenous Regressorsrdquo Econometric Theory 12305ndash330

Leung S F and Yu S (1996) ldquoOn the Choice Between Sample Selectionand Two-Part Modelsrdquo Journal of Econometrics 72 197ndash229

Lin T-F and Schmidt P (1984) ldquoA Test of the Tobit Speci cation Againstan Alternative Suggested by Craggrdquo Review of Economics and Statistics66 174ndash177

Maddala G S (1985) ldquoA Survey of the Literature on Selectivity Bias AsIt Pertains to Health Care Marketsrdquo Advances in Health Economics andHealth Service Research 6 3ndash18

16 Journal of Business amp Economic Statistics January 2001

REFERENCES

Manning W G Duan N and Rogers W H (1987) ldquoMonte Carlo Evidenceon the Choice Between Sample Selection and Two-part Modelsrdquo Journalof Econometrics 35 59ndash82

Manski C F (1975) ldquoMaximum Score Estimation of the Stochastic UtilityModelrdquo Journal of Econometrics 3 205ndash228

mdashmdashmdash (1996) ldquoLearning About Treatment Effects From Experiments WithRandom Assignment of Treatmentsrdquo Journal of Human Resources 31709ndash733

McDonald J F and Mof tt R A (1980) ldquoThe Uses of Tobit AnalysisrdquoThe Review of Economics and Statistics 62 318ndash321

Mof t R A (1999) ldquoNew Developments in Econometric Methods forLabor Market Analysisrdquo in The Handbook of Labor Economics Vol-ume IIIA eds O Ashenfelter and D Card Amsterdam North-Hollandpp 1367ndash1398

Mullahy J (1997) ldquo Instrumental-Variable Estimation of Count Data ModelsApplications to Models of Cigarette Smoking Behaviourrdquo Review of Eco-nomics and Statistics 11 586ndash593

mdashmdashmdash (1998) ldquoMuch Ado About Two Reconsidering Retransformation andthe Two-Part Model in Health Econometricsrdquo Journal of Health Eco-nomics 17 247ndash281

Newey W K (1985) ldquoSemiparametric Estimation of Limited DependentVariable Models With Endogenous Explanatory Variablesrdquo Annales deLrsquo Insee 5960 219ndash237

mdashmdashmdash (1986) ldquoLinear Instrumental Variable Estimation of Limited Depen-dent Variable Models With Endogenous Explanatory Variablesrdquo Journal ofEconometrics 32 127ndash141

mdashmdashmdash (1987) ldquoEf cient Estimation of Limited Dependent Variable ModelsWith Endogenous Explanatory Variablesrdquo Journal of Econometrics 36231ndash250

Powell J L (1986a) ldquoSymmetrically Trimmed Least Squares Estimation forTobit Modelsrdquo Econometrica 54 1435ndash1460

mdashmdashmdash (1986b) ldquoCensored Regression Quantilesrdquo Journal of Econometrics32 143ndash155

Rosenzweig M R and Wolpin K (1980) ldquoLife-Cycle Labor Supply andFertility Causal Inferences From Household Modelsrdquo Journal of PoliticalEconomy 88 328ndash348

Rubin D B (1974) ldquoEstimating Causal Effects of Treatments in Random-ized and Non-randomized Studiesrdquo Journal of Educational Psychology 66688ndash701

mdashmdashmdash (1977) ldquoAssignment to a Treatment Group on the Basis of a Covari-aterdquo Journal of Educational Statistics 2 1ndash26

Wooldridge J (1999) ldquoDistribution-Free Estimation of Some Nonlinear PanelData Modelsrdquo Journal of Econometrics 90 77ndash97

Comment Binary Regressors in NonlinearPanel-Data Models With Fixed EffectsJinyong Hahn

Department of Economics Brown University Providence RI 02912 ( jinyong hahnbrownedu)

Angrist notes in his abstract that ldquomuch of the dif cultywith limited dependent variables comes from a focus on struc-tural parameters such as index coef cients instead of causaleffects Once the object of estimation is taken to be the causaleffect of treatment several simple strategies are availablerdquo(p 2) I examine the consequence of such a perspective forinference in nonlinear panel-data models with xed effectsI argue that (a) the ldquodif cultyrdquo indeed disappears sometimesand (b) structure of treatment assignment plays a crucial rolefor his strategy to be successful

It is instructive to begin with a dif cult nonlinear panel-datamodel with xed effects Consider a panel probit model with xed effects

Pr6yitD 1mdashci1 xi11 xi27 D ecirc4ci

C ˆ xit51

i D 11 1 n3 t D 1120 (1)

It is assumed that yi1 and yi2 are independent of each othergiven 4ci1 xi11 xi25 Here ci denotes the unobserved xedeffects and xit denotes a binary treatment variable We willassume that 4xi11 xi25 D 40115 for all i The index coef cientof interest is ˆ Therefore traditional econometric analysiswould focus on whether ˆ is identi ed and or

pn-consistently

estimableEstimation of the index coef cient ˆ is dif cult for a num-

ber of reasons

1 So far no consistent estimator for ˆ has been developedfor probit with a nonparametric speci cation of the conditionaldistribution of ci given 4xi11 xi25

2 It is not even clear whether ˆ is identi ed or notManskirsquos (1987) identi cation result requires in nite supportfor xit which cannot be satis ed due to the binary nature ofxit

3 It is not clear whether the semiparametric informationbound for ˆ is positive or not It is quite possible that theinformation is actually 0 For example Chamberlain (1992)showed that the information for ˆ is 0 for models with timedummies

4 Chamberlainrsquos (1984) conditional maximum likelihoodestimator is applicable only to logit models

Now to assess whether changing the object of estimationsimpli es statistical analysis consider the average treatmenteffects which in this particular case can be easily shown tobe equal to

sbquo D E6yi2ƒ yi170 (2)

It is not dif cult to see that a simple estimator Osbquo D1=n

PniD14yi2

ƒ yi15 isp

n-consistent As Angrist argues afocus on average causal effects dramatically reduces the dif -culty of estimation The secret is that 4xi11 xi25 D 40115 for alli which effectively ensures that ci is independent of 4xi11 xi25Dif culty of the index estimation listed previously is because

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

Imbens Comment 17

the independence between 4xi11 xi25 and ci cannot be exploitedwithin the xed-effects framework Unfortunately the addi-tional information that 4xi11 xi25 and ci are independent of eachother does not reduce dif culty in estimating the index ˆ It isnot yet clear whether ˆ can be

pn-consistently estimated even

with the random-effects assumption when the random effectsare nonparametrically speci ed On the other hand estimatingsbquo becomes easier as a result of the constancy of the general-ized propensity score Pr 6xi11 xi2mdashci7 Presence of unobservedci in the model was rendered irrelevant due to the constancyof the generalized propensity score [See Imbens (1999) for adiscussion on the generalized propensity score]

It is interesting to note that in the panel probit model (1)estimation of sbquo is not necessarily simple unless the indexstructure is discarded altogether It is useful to note that thenew target parameter sbquo could be estimated consistently usingindex structure if consistent estimators of ˆ and not are givenHere not denotes the distribution of ci We may alternativelywrite (2) as

Z4ecirc4c C ˆ5 ƒ ecirc4c55 dnot4c51 (3)

which can in principle be estimated by using consistent esti-mators of ˆ and not Estimation of sbquo using the alternativecharacterization (3) requires consistent estimation of an addi-tional parameter not a parameter that was not given too muchattention in the past The problem is that not many consis-tent estimators of not are available It is not yet clear whether

the model satis es the primitive conditions for consistency ofthe nonparametric maximum likelihood estimator (NPMLE)as discussed by Heckman and Singer (1984) The dif culty inestimating the target parameter using the expression (3) whichis based on the index structure is in sharp contrast to the easeof the estimation strategy using the expression (2) for whichthe index structure is irrelevant

The preceding discussion suggests that the success ofAngristrsquos perspective critically hinges on the structure of treat-ment assignment and careful reexpression of the new targetparameter If the joint distribution of ci and 4xi11 xi25 is com-pletely unknown it is clear that changing the target parameterdoes not ease the dif culty of estimation Angristrsquos perspectivetherefore requires substantial effort in modeling such jointdistribution Whether such a modeling effort will be successfulin dealing with nonlinear panel problems remains to be seen

ADDITIONAL REFERENCES

Chamberlain G (1984) ldquoPanel Datardquo in Handbook of Econometricseds Z Griliches and M D Intriligator Amsterdam North-Hollandpp 1247ndash1318

mdashmdashmdash (1992) ldquoBinary Response Models for Panel Data Identi cation andInformationrdquo unpublished manuscript

Heckman J and Singer B (1984) ldquoA Method for Minimizing the Impactof Distributional Assumptions in Econometric Models for Duration DatardquoEconometrica 52 271ndash320

Imbens G (1999) ldquoThe Role of Propensity Score in Estimating Dose-Response Functionsrdquo Technical Working Paper 237 National Bureau ofEconomic Research Cambridge MA

Manski C (1987) ldquoSemiparametric Analysis of Random Effects Linear Mod-els from Binary Panel Datardquo Econometrica 55 357ndash362

CommentGuido W Imbens

Department of Economics University of California at Los Angeles Los Angeles CA 90095(imbenseconuclaedu)

It is a pleasure to comment on this article by JoshuaAngrist whose applications of instrumental-variables methods(Angrist 1989 Angrist and Krueger 1991) have been a sourceof inspiration for my own work in this area As with Angristrsquosprevious work on instrumental variables the current articleraises some controversial issues and makes a number of impor-tant points Here I offer some comments on three of themFirst I shall discuss the issues raised in Section 1 ldquoCausalEffects and Structural Parametersrdquo concerning the goals ofstatistical inference Angrist argues that many questions ofinterest are most easily formulated in terms of comparisonsbetween realized and potential outcomes the latter de nedas outcomes that would have been observed under alternativestates of nature I shall explore some of the implications of thisview for empirical practice and econometric theory SecondI shall offer some remarks on the role of economic theory inspeci cation and identi cation of econometric models againreinforcing Angristrsquos point regarding the importance of for-mulating the key assumptions in terms of potential outcomesThird I shall discuss some of the issues related to the limited

dependent nature of outcome variables for empirical practicein particular in the presence of covariates Partly motivatedby the widespread perception of fundamental dif culties inapplying instrumental-variables methods to data with limiteddependent outcome variables Angrist argues that standard lin-ear model techniques are generally applicable I agree withAngristrsquos position that most of these perceived problems areexaggerated but suggest that principled inference should nev-ertheless take account of the limited dependent nature of theoutcome variables and use nonlinear models

1 CAUSAL ESTIMANDS

In his textbook discussion of the difference betweenstructural and reduced-form estimates Goldberger (1997)wrote following Marshak (1953) that the ultimate goal of

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

18 Journal of Business amp Economic Statistics January 2001

econometrics is to provide predictions More speci cally inmy view the goal is to provide predictions of policy interven-tions Using both economic theory and data economists wishto inform policy discussions by providing predictions of statesof the world under different policy choices Based on com-parisons of such predictions policy makers can then chooseamong the different policies using some social welfare mea-sure as objective function (eg Heckman and Smith 1997)Angrist argues that such questions are most easily formulatedin terms of potential outcomes Here I want to elaborate onthat view

Consider as an example the problem faced by a policymaker contemplating a new tax in a market To evaluatethis policy the policy maker wishes to take into account theeffect of the tax on the quantity traded Economic theory sug-gests that this effect depends on the slope of the supply anddemand functions The rst step is therefore the estimation ofthese slopes and in the remainder of this discussion I shallfocus on this component of the policy-evaluation problem Inprinciple the policy maker may be interested in the entire dis-tribution of the quantity traded under various taxes Let usassume however that for purposes of evaluation of the poli-cies it is suf cient to know the average effect of the policyon the quantity traded If there are only two values for thepolicymdashfor example no tax or a taxmdashthe difference betweenthese two averages is the key quantity of interest FollowingRubin (1974) I will refer to this as the estimand

Note that the choice of estimand is distinct from the sta-tistical question of the speci cation of the model Often thestatistical model is speci ed in such a way that a single param-eter corresponds to the estimand For example in a structuralinterpretation of the linear regression model the coef cientscorrespond to the effect of changing the covariates by a singleunit Such one-to-one correspondence however is the excep-tion rather than the rule Wooldridge (1992) made this point inthe context of BoxndashCox regression models Such models areoften used when a linear representation for E6Y mdashX7 is inappro-priate The BoxndashCox regression model generalizes this linearform to E6Y 4lsaquo5mdashX7 D X 0sbquo where

Y 4lsaquo5 D(

4Y lsaquo ƒ 15=lsaquo lsaquo 6D 01

ln Y lsaquo D 00

Although consistent estimators for sbquo exist under these assump-tions Wooldridge stressed that because (a) the interpretationof sbquo changes with the value of lsaquo and (b) knowledge of sbquo andlsaquo is not suf cient for recovering E6Y mdashX7 there is no reasonfor economists to be interested in estimates of sbquo under theseassumptions In other words sbquo cannot be the sole focus ofthe researcher because the question it answers changes withthe value of nuisance parameters Wooldridge then suggestedan alternative speci cation that always allows the researcherto recover the conditional expectation E6Y mdashX7

In empirical work this distinction between the estimandand the parameters of the statistical model is consistent withthe now common practice of reporting estimates of averagederivatives in binary response models rather than reportingestimates of the logit or probit coef cients Unlike a linear

regression model there is no direct link from one of the coef- cients in the logit or probit model to average causal effectsand thus there is no intrinsic interest in such coef cients

This view is at odds however with a large part of the semi-parametric literature An exception is the work by Stoker (egStoker 1986) who focused on estimation of index coef cientsin settings where these are proportional to average derivativesand thus directly linked to changes in predictions Considerfor example the work on semiparametric estimation of binaryresponse models In this literature such models are esti-mated without making logistic or probit assumptions insteadonly making conditional mean or median assumptions in alatent index interpretation (eg Manski 1985) This literaturehowever has begged the question of why economists shouldbe interested in the coef cient estimates in these models inthe absence of a direct link between these coef cients andthe choice probabilities or their derivatives Similarly someof the models with xed effects in panel data with limiteddependent variables have focused on estimation of parametersthat in themselves do not allow for estimation of conditionalexpectations or their derivatives and thus do not allow for esti-mation of causal effects See Arellano and Honoreacute (in press)for a survey of many of these methods

2 IDENTIFICATION

After deciding on the estimand the next step is to makesubstantive assumptions on the process that generated the dataThis is where economic as opposed to statistical theory playsa key role Theoretical considerations may suggest that certainvariables have no direct causal effect on others because theydo not enter into agentsrsquo utility function nor do they affectthe constraints these agents face For example in some mar-kets it may be reasonable to postulate the existence of demandand supply function and assume that their intersection deter-mines observed prices and quantities In that case it may beargued that certain variablesmdashfor example weather conditionsin agricultural marketsmdashaffect supply at xed prices but notdemand because weather conditions do not affect utility ofthe buyers nor do they constrain their choices given pricesSimilarly theoretical considerations may suggest which vari-ables determine agentsrsquo fertility choices and which variablesare excluded from such choices as in the structural modelsdescribed in Section 12 of Angrist

For the purpose of considering such exclusion restrictionsas well as other assumptions it is important to formulate themin a way that economic theory can be brought to bear on themThis makes the formulation in terms of counterfactuals orpotential outcomes that Angrist advocates particularly appro-priate The potential outcomes describe outcomes in differ-ent environments and as such are the primitives of economicanalyses as well as choices under different sets of constraintswhich are the result of agents solving constrained optimizationproblems Since economic theory studies such optimizationproblems it is therefore well equipped to assess assumptionsformulated directly in terms of these potential outcomes Anexample of the formulation of the critical assumptions in termsof such potential outcomes is Angrist Imbens and Rubin(1996 AIR from here on) In contrast latent index modelsalthough under some conditions mathematically equivalent to

Imbens Comment 19

the potential outcome framework (eg Vytlacil 1999) formu-late the critical assumptions in terms of associations betweenobserved variables and unobserved residuals which appearsmore dif cult to contemplate [see Imbens (1997) for a dis-cussion of the confusion such formulations have caused in thestatistics literature]

It is rare that economic theory is speci c enough to deter-mine the exact value of the estimand More typical is thatthe theory is consistent with a range of values for the esti-mand Observations on agentsrsquo choices and outcomes may behelpful in narrowing down this range The econometricianrsquostask is to link the data to the estimand Typically a num-ber of additional assumptions are made at this stage Almostalways it is assumed that there is only limited dependenceor no dependence at all between choices made by differentagents and identi cation focuses on the link between the jointdistribution of the observables estimable in large samplesand the estimand Two possibilities arise at this stage Some-times the estimand can be expressed as a functional of thejoint distribution of the observables in which case the esti-mand is identi ed A leading example is where the estimandis the average treatment effect and theory suggests that assign-ment to treatment is random or at least random conditionalon a set of observed covariates (unconfounded assignmentselection on observables) Alternatively the assumptions sug-gested by economic theory do not allow for the direct linkbetween the distribution of observables and the estimand Inthat case the researcher faces some choices One option advo-cated in a series of papers by Manski (see for a general dis-cussion Manski 1995) is to estimate the range of values ofthe estimand consistent with the data given the substantiveassumptions Another option followed in the current article byAngrist is the local average-treatment-effect approach devel-oped by Imbens and Angrist (1994) to consider what aspectsof the estimand are identi ed given data and assumptions Ininstrumental-variables settings the population average treat-ment effect is often not identi ed but the average effect for aspeci c subpopulation may be In that case one may chooseto estimate the average treatment effect for this subpopula-tion and leave the extrapolation to the principal estimand tothe researcher possibly aided by theoretical considerations AsHeckman wrote ldquo It is a great virtue of the LATE parameterthat it makes the investigator stick to the data at hand and sep-arate out the aspects of an estimation that require out of sam-ple extrapolation or theorizing from aspects of an estimationthat are based on observable datardquo (Heckman 1999 p 832)

Let us consider the case studied by Angrist with its focuson the effect of having more than two children on labor sup-ply Angrist argues that the second birth being a multiple birth(eg twins) is a valid instrument for this effect In terms ofthe AIR formulation this requires a multiple birth to be asgood as randomly assigned and the absence of a systematicdirect effect on labor supply other than through its effect onthe number of children Such assumptions may be contro-versial For example fertility treatments may lead to a sys-tematic association between multiple births and choices madeby couples violating the rst assumption Even if we acceptthese assumptions however they only imply that the aver-age causal effect of more kids on labor supply is identi ed

for women who had a third child solely because their sec-ond birth was a multiple birth (compliers in the AIR termi-nology) In my view it is unlikely that this is the populationof primary interest Nevertheless it is the only subpopulationthe data are informative about in the sense of point identi ca-tion under the substantive assumptions and it would appear tooffer some guidance regarding the population average causaleffect to policy makers similar to the way in the medical worldresults from clinical trials in homogenous subpopulations areregarded as useful because they are viewed as indicative ofpopulation average causal effects

3 LIMITED DEPENDENT VARIABLES

Typically economic theory offers some guidance concern-ing the determinants of certain outcomes without specifyingthe exact form or strength of their relationship In that casestatistical modeling is required to complete the speci cationConsider the example Angrist studies with binary outcomebinary endogenous regressor a binary instrument and covari-ates Angrist suggests as one possible approach estimating theaverage treatment effect through a linear probability modelwith instrumenting for an endogenous regressor The bene tsof the linear probability approach stemming from the linear-ity and robustness against misspeci cation of the rst stageappear to me largely illusory At this point the statistical mod-eling is only intended to provide exible approximations to theunderlying conditional distributions This is a fundamentallydifferent role from that played by the substantive assumptionsthat are essential for identi cation Appeals to consistencyunder speci c parameterizations therefore appear irrelevantmdashin a larger sample one may well wish to use a more exiblespeci cation because less smoothing is required In additionto nding the alleged bene ts of the linear probability modelunpersuasive I nd its disadvantages troubling Within smallsubpopulations characterized by extreme values of the covari-ates the smoothing implicit in linear probability models islikely to lead to unattractive predictions compared to pre-dictions based on nonlinear models that respect the limited-dependent-variable nature of the outcomes

An alternative approach is followed in the study of the effectof u shots on hospitalization rates using randomized incen-tives for vaccination by Hirano Imbens Rubin and Zhou(2000 HIRZ from here on) Given their assumptions exten-sions of those made by AIR to the case with exogenous covari-ates there are three subpopulationsmdashcompliers (units whochange treatment status in response to a change in the valueof the instrument) always-takers (who always take the treat-ment irrespective of the value of the instrument) and never-takers (who never take the treatment irrespective of the valueof the instrument) H IRZ modeled the conditional distributionof these three ldquotypesrdquo conditional on covariates as a trinomialdistribution

Pr4TypeiD cmdashXi

D x5 D exp4x0ndashc5

1C exp4x0ndashc5C exp4x0ndasha51

Pr4TypeiD amdashXi

D x5 D exp4x0ndasha5

1C exp4x0ndashc5 C exp4x0ndasha51

20 Journal of Business amp Economic Statistics January 2001

and

Pr4TypeiD nmdashXi

D x5

D 1 ƒ Pr4TypeiD cmdashXi

D x5 ƒ Pr4TypeiD amdashXi

D x50

Now compare this setup to the selection models Angristdescribes in Section 3 In the selection models the equationdescribing the endogenous regressor is Di

D 18ƒ0C ƒ1Zi

Cƒ 0

2Xi gt Daggeri9 Suppose that the instrument is binary and that ƒ1 ispositive Then the two models are very similar with units withƒ0 Cƒ1 Cƒ 0

2Xi gt Daggeri in the selection model classi ed as always-takers in the potential outcome framework (because irrespec-tive of the value of the instrument Di

D 1 for such units) unitswith ƒ0

C ƒ 02Xi lt Daggeri classi ed as never-takers (because irre-

spective of the value of the instrument DiD 0 for such units)

and the units with ƒ0C ƒ 0

2Xi lt Daggeri lt ƒ0C ƒ1

C ƒ 02Xi classi ed

as compliersOne advantage of the trinomial model is that it easily gen-

eralizes to provide an arbitrarily good t to any conditionaltrinomial distribution by including higher-order terms andinteractions in the covariates If there are no substantive rea-sons to impose additional restrictions one should not imposethem implicitly in the speci cation of the statistical modelIn particular in the selection model it is not suf cient to addhigher-order terms to the covariate vector to provide an arbi-trarily good t to the trinomial distribution Such an approxi-mation would have to involve heteroscedasticity and other dis-tributional extensions that are not straightforward to implementin the selection model

Conditional on the individualrsquos type H IRZ speci ed theoutcome distributions given covariates as logistic regressionmodels Again the aim is to provide a exible approxima-tion to the conditional distribution in a manner that does notimpose any implicit restrictions Given that for a binomialdistribution the logistic regression model can be thought ofas providing a linear approximation to the log odds ratiothis choice is again an appealing one An alternative is theprobit model which also provides a good approximation Less

attractive here is the linear probability model since it requiresinequality restrictions on the parameters if the implicit esti-mates of the probabilities are to be bounded between 0 and 1

In cases with other limited dependent variables alternativenonlinear models may be appropriate For example if the out-comes are durations subject to censoring models speci ed interms of hazard functions (eg Lancaster 1979) may be con-venient for dealing with such data

ADDITIONAL REFERENCES

Angrist J (1989) ldquoLifetime Earnings and the Vietnam Era Draft LotteryEvidence From Social Security Administrative Recordsrdquo American Eco-nomic Review 80 313ndash335

Angrist J and Krueger A (1991) ldquoDoes Compulsory School AttendanceAffect Schooling and Earningsrdquo Quarterly Journal of Ecomonics 106979ndash1014

Arellano M and Honoreacute B (in press) ldquoPanel Datardquo in Handbook of Econo-metrics eds J Heckman and E Leamer Amsterdam Elsevier North-Holland

Goldberger A (1991) A Course in Econometrics Cambridge MACambridge University Press

Heckman J (1999) ldquo Instrumental Variables Response to Angrist andImbensrdquo Journal of Human Resources 34 828ndash837

Heckman J and Smith J (1997) ldquoEvaluating the Welfare Staterdquo in Econo-metrics and Economics in the 20th Century The Ragnar Frisch Centenaryed S Strom New York Cambridge University Press pp 214ndash318

Hirano K Imbens G Rubin D and Zhou A (2000) ldquoEstimating theEffect of Flu Shots in a Randomized Encouragement Designrdquo Biostatistics1 69ndash88

Imbens G (1997) Book Review of The Foundations of Econometric Analy-sis by D Hendry and M Morgan Journal of Applied Econometrtics 1291ndash94

Lancaster T (1979) ldquoEconometric Methods for the Analysis of DurationDatardquo Econometrica

Manski C (1985) ldquoSemiparametric Analysis of Discrete Responserdquo Journalof Econometrics 27 313ndash333

mdashmdashmdash (1995) Identi cation Problems in the Social Sciences CambridgeMA Harvard University Press

Marshak J (1953) ldquoEconomic Measurements for Policy and Predictionrdquoin Studies in Econometric Method eds W Hood and T KoopmansNew York Wiley pp 1ndash26

Stoker T (1986) ldquoConsistent Estimation of Scaled Coef cientsrdquo Economet-rica 54 1461ndash1481

Vytlacil E (1999) ldquo Independence Monotonicity and Latent Index ModelsAn Equivalency Resultrdquo unpublished manuscript University of ChicagoDept of Economics

Wooldridge J (1992) ldquoSome Alternatives to the BoxndashCox RegressionModelrdquo International Economic Review 33 935ndash955

CommentRobert A Mof tt

Department of Economics Johns Hopkins University Baltimore MD 21218 and National Bureau of EconomicResearch Cambridge MA (mofttjhuedu)

The Problem Although the article by Angrist rangesacross a number of issues much of the discussion and thearticle title suggests that the problem of concern is that instru-mental variables ( IV) cannot be used in one of three commonmodels Let the rst model be y D Csbquod CxbdquoChellip where y isan absolutely continuous variable but d is binary and wherex is independent of hellip but d is not Then sbquo can be consistentlyestimated with IV (Heckman and Robb 1985) Let the secondmodel be y uuml D C sbquod uuml C xbdquo C hellip where y uuml and d uuml are contin-

uous and where y D 14y uuml gt 05 and d D d uuml are the observedvariables The parameters of this model can likewise be esti-mated by IV with some auxiliary assumptions (Newey 1986see Blundell and Smith 1993 for a review of alternative meth-ods) But let the third model be y uuml D C sbquod C xbdquo C hellip where

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

Moftt Comment 21

again y D 14y uuml gt 05 but now d is binary The parameters ofthis model cannot be estimated by standard IV Many of theexamples in the Angrist article involve the censored regres-sion model rather than the binary choice model but this andthe other main points in the article apply to both

The Angrist Solution The Angrist solution to the problemposed by the third of these models where IV is not applica-ble is to declare sbquo an uninteresting parameter and not worthyof estimation [or in his words to ldquopuntrdquo (p 8)] This is remi-niscent of the solution to the Vietnam War suggested by some1960s commentators The United States should have simplydeclared victory quickly withdrawn and hoped that no onenoticed that they had in fact lost

If sbquo is uninteresting what is interesting according toAngrist His answer is that interest should center on leastsquares approximations with y as the dependent variable andd and x as regressors If endogeneity of d is a problem linearIV should be applied with linear IV given a LATE interpre-tation He provides an empirical illustration

I will comment on three questions raised by the Angristarticle (1) Is sbquo an interesting parameter (2) Is the Linear IVestimator using observed y and d which Imbens and Angrist(1994) relabeled the LATE an interesting quantity (3) Is theillustrative application given in the Angrist article interesting

Is sbquo an Interesting Parameter As Angrist notes manyof the issues raised by his solution have nothing to do withthe endogeneity of d Whether sbquo is an interesting parame-ter is one of these at least in part Given Angristrsquos prefer-ence for linear projections his position necessarily implies thatall nonlinear models of which the latent index model (L IM)is just one example are uninteresting (Angrist is not com-pletely consistent in his position because he does propose anonlinear model with an exponential conditional mean func-tion in one section of his article that model is subject to thesame objections that he subjects the LIM tomdasharbitrary non-linearities etc) Thus his position boils down to a preferencein the binary dependent-variable case for the linear proba-bility model (LPM) over the LIM and other nonlinear mod-els This issue has been discussed for many years even in asimultaneous-equations context [eg Heckman and MaCurdy(1985) which Angrist does not reference]

The dif culty with Angristrsquos polar position on this issuemdashthat the LPM is the only model of interestmdashis that it is funda-mentally untenable The LPM is attractive because it is easyto interpret providing parameter estimates that do not requiretransformation to learn the effects of a regressor on the meanof the dependent variable It has a role to play in empiricalwork in summarizing the data as regards the conditional meanfunction and for initial explorations of the data and virtu-ally all practitioners use it for this purpose But beyond thisit has no defense Either it is equivalent to the latent-variablemodel if the model is saturated or if the model is not satu-rated imposes functional form restrictions that may not holdand may t the data worse than the latent-variable model

Thus if the model is y D 14 C sbquod C hellip gt 05 where d is asingle dummy variable regressor independent of hellip the probit(say) estimates of and sbquo map one-to-one onto the leastsquares intercept and coef cient on d in the LPM and hencethere is no gain to estimating the model either way because the

estimators are equivalent The same point extends to the caseof a multinomial d If the model is y D 14 C sbquod C xbdquo C hellip gt

05 where x and bdquo are vectors of exogenous covariates andcoef cients respectively the LPM estimate of the effect ofd on E4y5 conditional on x may be demonstrably worsethan the probit estimator of the same effect even for theobject of estimating E4ymdashd1 x5 at points 8d1 x9 observed inthe data (Angrist falls back on the weak principle that leastsquares estimates provide best linear approximations but theissue is that the true model might be nonlinear) Nothingin Angristrsquos article provides evidence that the probitor anyother latent index formulation provides an inferior estimate of6E4ymdashd11 x5 ƒ E4ymdashd01 x57 compared to the LPM for the twoestimators smooth the joint response function with respect tod and x in different ways

Although the latent-variable model has no necessary claimfor superiority in the class of all nonlinear models its pop-ularity rests on its ability to generate a wide variety of non-linearities with a relatively parsimonious speci cation arisingfrom the convolution of the latent index with the cdf of hellipExpansion of the LPM to incorporate equivalent nonlinearitiesis cumbersome and inef cient

The possible inferiority of the LPM in capturing nonlinear-ities in nonsaturated models is also important for interpolationand extrapolation to points not in the observed data Gettingthe nonlinearities right in the observed data is important ininterpolation and extrapolation if the true model is nonlinearas the binary choice model necessarily is (since y is boundedby 0 and 1) Although Angrist at one point does mentionpredictionmdashstating that the linearized models he proposes area good ldquojumping-off point for any prediction exerciserdquo a state-ment without supportmdashhe is for the most part not interestedin prediction so much as summarizing the observed data Thisis ne but unfortunately prediction outside the observed his-torical data is the pervasive concern of the policy makers whoare the ultimate consumers of applied economic research forthey must make predictions of new policies in new environ-ments on a daily basis as part of their jobs The methods pro-posed in Angristrsquos article are therefore not very useful for theformulation of new policy

It should be noted that some make the argument that theoryand economic models must play a role in guiding the formu-lation of empirical relationships used for prediction outsidethe range of the observed data That position is not taken herebecause it is not necessary to establish the potential superior-ity of the L IM over the LPM in nonsaturated models and forprediction Occamrsquos razor makes it unnecessary

It should also be noted that these issues have nothing todo with causal effects because d is assumed exogenous Thusthe opposition that Angrist poses between L IMrsquos and ldquocausaleffectsrdquo models is a false one at least in general If a ldquocausaleffectrdquo is de ned as the true effect of a variable d on onlythe rst moment of a variable y (a rather restricted de nitionof causal effect) that causal effect can be derived from thelatent-variable model just as well The issue is instead merelynonlinearity of the function E4ymdashd1 x5

Angrist also states that the L IM requires ldquoconstant coef- cientrdquo and ldquodistributional assumptionsrdquo for identi cation

22 Journal of Business amp Economic Statistics January 2001

unlike the LPM Angrist does not make suf cient quali ca-tions to these assertions and they are indeed false for theL IM A random coef cient sbquo in the LIM is permissible with-out distributional assumption ( Ichimura and Thompson 1998)and the coef cients in the standard constant-coef cient L IMare identi ed semiparametrically (ie under unknown distri-bution of hellip) under the restriction that d and hellip are independentan assumption generally made for causal interpretation of theLPM model as well (Manski 1988 Horowitz 1993 Powell1994) In addition as already noted in a fully saturated modelthe LPM and the L IM are equivalent in any case

Is the LATE an Interesting Statistic The Angrist assertionthat LATE is the only interesting statistic if d is endogenousor perhaps the only statistic worth bothering about because itis the only one identi ed by the data has the same partialvalidity as his preference for the LPM over the LIM IV isone of the most popular techniques in applied econometricsand has a natural intuition indeed one that does not requirethe LATE interpretation per se It is one of the most usefultools in the applied economistrsquos kit The simple IV estimatordiscussed by Angrist makes minimal assumptions and givesminimal information back to the analyst as a result But tosay that it produces the only statistic of interest does not havedefense both because it is equivalent to an LIM that is fullysaturated and may t the data worse than an LIM if not satu-rated and because it implies that there is no value to makingadditional assumptions to obtain additional information

The limiting nature of the LATE statistic is again in itsinattention to nonlinearities interpolation and extrapolationThat nonlinearities can be important is as true in this caseas it was in the exogenous case just discussed when thereare additional x covariates and when the model is not satu-rated As for interpolation and extrapolation the LATE statis-tic 6 Ny4z D z15ƒ Ny4z D z057=6 Nd4z D z15ƒ Nd4z D z057 denotes theeffect on a change in z from value z0 to z1 on E4y5 scaledby the change in the E4d5 It does not have implications forthe effect on E4y5 of any other change in z or for a changein any other policy variable Thus it is not particularly usefulfor policy changes other than a change of z0 to z1 in the sameenvironment (ie conditioned on the same x) This stands incontrast to an L IM for E4dmdashx1 z5mdashor any parametric modelfor d for that mattermdashthat allows for the change from z0 toz1 to inform policy makers and others of the likely changesof other values of z and of other variables x In fact theLATE statistic which is not a parameter in the usual senseof the word can always be expressed as a nonlinear combi-nation of parameters of a latent-variable model but not viceversa hence the latter model is more general than the former[See Heckman and Vytlacil (1999) for a discussion of how toexpress the LATE in terms of an L IM see also Angrist andImbens (1999) and Heckman (1997 1999) for a discussion ofsome of these issues]

In addition Angristrsquos focus on the advantages of the LATEfor model-free estimation of the effect of a change in z fromz0 to z1 on y reveals a philosophical weakness in his posi-tion This is because the effect of a change in z from z0 toz1 on y can be estimated from the reduced form the struc-tural form is not needed It is not explained why the statistic

6E4ymdashz D z15 ƒ E4ymdashz D z057 is not the only quantity of inter-est The answer that most economists would give is to say thatstructural coef cients are of interest because they can be usedto extrapolate the effects of changes in d on y to other changesin z than from z0 to z1 in the same environment indeedthis is the basic rationale for structural estimation given inthe famous essay by Marschak [1953 see also Christ (1994)and Heckman (2000) for discussions Marschakrsquos argumentwas more subtle than this arguing that restrictions on struc-tural parameters are needed for out-of-sample prediction butthis translates into restrictions on the reduced form as well]However by ruling out the possibility of learning about anyof the effects of changes in z or d other than those inducedby a change in z from z0 to z1 Angrist removes the need todo structural estimation in the rst place Angrist implicitlyassumes that a structural coef cient of interest exists on thevariable d in the y equation and that that coef cient has mean-ing independent of a particular change in z but the rest of hisdiscussion contravenes that interest

A reduced-form research program which is where theAngrist position leads is of considerable value There is noth-ing wrong with a research program to collect a large body ofinformation on the effects of a wide variety of changes in dif-ferent policy variables 4z5 from particular values (z0) to otherparticular values (z1) each taking place in particular environ-ments (x) But if the research program stops there very littleuseful has been learned other than a collection of facts aboutthe effects of particular policies in particular environments

Is the Application in the Angrist Article Interesting TheAngrist article uses a recently popular exclusion restriction ornatural experiment to identify the effect of fertility on laborsupply and to illustrate his preferred methodsmdashnamely theuse of twins The use of twins as an exclusion restrictionstands in contrast to variations in a government policy or lawwhich are often used as exclusion restrictions both in recentwork on natural experiments as well as a much older litera-ture that uses cross-sectional and overtime variation in state-or country-speci c taxes and transfer rules to identify modelparameters However although it can be of interest to estimatereduced-form policy effects 6E4ymdashz D z15 ƒ E4ymdashz D z057 asjust discussed it is not so obvious that there is any interest inestimating the effects of having twins on the expected valueof y Creating twins is not a variable directly subject to policymanipulation

The dif culty in the use of twins arises for the same reasonalready discussedmdashnamely that the preference for model-freeestimation of policy effects leads to a lack of interest in thefunction E4dmdashz1 x5 and hence to a lack of interest in what canbe learned from a study of twins about the effects of someother more relevant policy variable that might be manipulatedA model for the E4dmdashz1 x5 is needed to make that connectionand hence to make a study of twins of any interest yet that isintentionally eschewed in the approach proposed by AngristOnce again this leads to an uninteresting and quite limitingset of exercises

Conclusions The set of methods laid out in the Angristarticle at least if a more relevant instrument were used yieldsa set of model-free statistics that are suitable for exploratorywork on a research question prior to the (possibly nonlinear)

Mullahy Comment 23

structural estimation and prediction that should be the ultimateobject of applied economics research

ACKNOWLEDGMENTS

This comment is a revision of remarks delivered at theJoint Statistical Meetings Baltimore August 1999 I thankCarl Christ James Heckman Joel Horowitz Michael KeaneThomas Mroz Geert Ridder and Edward Vyltacil for com-ments with the usual disclaimer that the views expressed hereshould not be taken as representing those of any of theseindividuals

ADDITIONAL REFERENCES

Angrist J and Imbens G (1999) Comment on ldquo Instrumental VariablesA Study of Implicit Behavioral Assumptions Used in Making Pro-gram Evaluationsrdquo by J J Heckman Journal of Human Resources 34283ndash827

Blundell R and Smith R (1993) ldquoSimultaneous Microeconometric ModelsWith Censored or Qualitative Dependent Variablesrdquo in Handbook of Statis-tics (Vol 1) eds G S Maddala C R Rao and H D Vinod AmsterdamNorth-Holland pp 117ndash141

Christ C (1994) ldquoThe Cowles Commissionrsquos Contributions to Econometricsat Chicago 1939ndash1955rdquo Journal of Economic Literature 32 30ndash59

Heckman J (1997) ldquoInstrumental Variables A Study of Implicit BehavioralAssumptions Used in Making Program Evaluationsrdquo Journal of HumanResources 32 441ndash462

mdashmdashmdash (1999) ldquo Instrumental Variables A Reply to Angrist and ImbensrdquoJournal of Human Resources 34 828ndash837

mdashmdashmdash (2000) ldquoCausal Parameters and Policy Analysis in Economics ATwentieth Century Retrospectiverdquo Quarterly Journal of Economics 11545ndash97

Heckman J and MaCurdy T (1985) ldquoA Simultaneous Equations LinearProbability Modelrdquo International Economic Review 18 21ndash37

Heckman J and Vytlacil E (1999) ldquoLocal Instrumental Variables andLatent Variable Models for Identifying and Bounding Treatment EffectsrdquoProceedings of the National Academy of Sciences 96 4730ndash4734

Horowitz J (1993) ldquoSemiparametric and Nonparametric Estimation of Quan-tal Response Modelsrdquo in Handbook of Statistics (Vol 11) eds G SMaddala C R Rao and H D Vinod Amsterdam North-Holland pp45ndash72

Ichimura H and Thompson T (1998) ldquoMaximum Likelihood Estimation ofa Binary Choice Model With Random Coef cients of Unknown Distribu-tionrdquo Journal of Econometrics 86 269ndash295

Manski C (1988) ldquoIdenti cation of Binary Choice Modelsrdquo Journal of theAmerican Statistical Association 83 729ndash738

Marschak J (1953) ldquoEconomic Measurements for Policy and Predictionrdquoin Studies in Econometric Method eds W Hood and T KoopmansNew York Wiley pp 1ndash26

Powell J (1994) ldquoEstimation of Semiparametric Modelsrdquo in Handbook ofEconometrics (Vol 4) eds R Engle and D McFadden Amsterdam North-Holland pp 2443ndash2521

CommentJohn Mullahy

Departments of Preventive Medicine and Economics University of Wisconsin Madison WI 53705 andNational Bureau of Economic Research Cambridge MA (jmullahyfacstaffwiscedu)

Articles bearing titles containing phrases like ldquoSimpleStrategies for Empirical Practicerdquo are often wolves masquerad-ing as sheep the methodologies they devise being far fromldquosimplerdquo and far removed from what most humble practition-ers perceive in the realm of ldquoempirical practicerdquo Not so hereJoshua Angrist has written a tight comprehensive article thatis stimulating and important yet also eminently useful

As typi es much of Angristrsquos work the main concern in thisarticle is on how to elicit interesting characterizations of causaleffects from microdata with the particular twist here being afocus on outcomes measured as ldquolimited dependent variablesrdquo(LDVrsquos) The main take-away message I glean from this arti-cle is that applied analysts working on causal effect or struc-tural analyses in LDV contextsmdashtraditionally vexing contextsinsofar as consistent estimation and inference are concernedmdashhave considerable grounds for optimism Angrist lays out andinterprets systematically a set of issues and methods that pro-vide practitioners with a variety of implementable strategiesthat might be brought to bear on such empirical problemsA corollary take-away message is that in some respects ldquothisstuff is not really as hard as wersquove tended to make itrdquo withAngrist demonstrating for instance the potential merits ofsimple linear instrumental variable ( IV) methods for estimat-ing causal effects in a variety of LDV contexts In no eventcan applied analysts escape the requirement of nding theo-retically sound instruments but Angrist exposits compellingly

a broad description of how such instruments might be used toelicit interesting causal inferences in LDV contexts

I have no real quibbles with any of the substance ofAngristrsquos arguments Rather I will devote my commentarymainly to amplifying and expanding several of the themes hedevelops throughout the article

1 FOCUS ON CAUSAL OR PARTIAL EFFECTSVERSUS FOCUS ON CONDITIONAL

EXPECTATIONS FUNCTIONS

It seems fair to suggest that much of applied microecono-metrics is concerned primarily with understanding the signsand magnitudes of quantities like iexclE6Y mdashX1 D7=iexcl4X1 D5 oriexclE6Y mdashX7=iexclX0 Yet much of the actual dirty work in under-taking causal analysis in LDV contexts seems to result fromdecisions to undertake analyses in settings where E6Y mdashX1D7

or E6Y mdashX7 are restricted to be positive without a priori restric-tions on parameter values [thus the tradition of using tobit-class conditional expectations functions (CEFrsquos) two-stepselection models exponential CEFrsquos etc]

As practitioners however we should pause to assesswhether speci cations akin to E6Y mdashX1D7 D exp4Xsbquo C D5 gt

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

24 Journal of Business amp Economic Statistics January 2001

0 are ultimately buying anything important with respect tothe rst-order questions being explored It is only com-mon sense that analysts should spend relatively more energyworking on understanding quantities of primary interest(eg iexclE6Y mdashX1D7=iexcl4X1 D5) and worrying relatively lessabout the formulation and implications of assumptions (egE6Y mdashX1 D7 gt 0) that may have little or no ultimate bearingon the particular questions being addressed This is espe-cially so when the latter effort (restricting via functional formchoice E6Y mdashX1 D7 gt 0) will tend to complicate rather thanfacilitate the former effort (understanding the partial or causaleffects) This is not to suggest that quantities like E6Y mdashX7 andE6Y mdashX1 D7 may not themselves be interesting to estimate (egmodeling conditional mean health care expenditures for use inforecasting future expenditure levels) but whether E6Y mdashX1D7

is parametrically and parsimoniously better approximated byexp4XsbquoC D5 or by XsbquoC D or by some other g4Xsbquo CD5

or h4X1D1 ƒ5 ismdasheven in LDV settingsmdashultimately an empir-ical matter to be assessed by goodness of t (eg via condi-tional moment tests)

I note this important angle somewhat sheepishly sincein both articles of mine that Angrist cites (Mullahy 19971998) there is considerable emphasis on the use of exponen-tial CEF speci cations as enforcers of positive conditionalmean ldquorequirementsrdquo In fact the main motivation for theearlier article was to nd a general approach to obtainingconsistent estimators of structural parameters in parametricsettings less distribution-bound than tobit-class models whenthe key requirement or side constraint is E6Y mdashX1 D7 gt 0 withthe resulting strategy a more or less brute-force method foraccomplishing this in a nonlinear IV setting Angristrsquos arti-cle takes this idea quite a bit further and demonstrates how(and with the addition of a further normalization) the struc-tural parameter estimation approach I discussed can be devel-oped into a model to analyze causal effects when the latter arecharacterized as ldquoproportional treatment effectsrdquo

Whether a proportional as opposed to an additive treatmenteffect is interesting (it may or may not be) remains to be seenin any particular application Wooldridgersquos recent work onaverage causal effects (ACErsquos Wooldridge 1999) would seempertinent in contexts in which the role of the X covariates ismore ldquointrusiverdquo than in the simple additivelinear treatment-effect setup As a general matter the extent to which infer-ences about causal effectsmdashhowever formulatedmdashhinge onthe inclusion of particular covariates as conditioning regres-sors is an interesting and potentially important angle thatAngristrsquos article begins to explore and is one that likely tomerit additional research For the particular issues at handthe very nature of proportional treatment effects brings intofocus explicitly the role of the X rsquos so how their correlationwith the treatment indicators 4D5 in uences inferences aboutthe causal impacts of the latter would seem to be a rst-orderconsideration

2 TWO-PART MODELS

I found Angristrsquos analysis of the interpretation of causaleffects in two-part models to be extremely illuminatingIn a reduced-form setting I argued (Mullahy 1998) that amid

all the concerns about zeros robustness outliers transforma-tion retransformation and such that have typically attendedmodeling efforts involving two-part models it would be sur-prising in a well-structured empirical investigation if therewere not stated or lurking concerns about partial effectsiexclE6Y mdashX7=iexclX andor CEFrsquos E6Y mdashX7 themselves It may be insome cases that such concerns are not articulated and it mayeven be in less well-structured problems that they are not obvi-ous to analysts themselves Yet for instance unless robustlyestimated sbquorsquos from a log-linear part-2 speci cation of a two-part model could inform a rst-order question about a par-tial effect or a CEF of what practical use were they likelyto be My rather modest and rather obvious argument wasjust that the concerns about the iexclE6Y mdashX7=iexclX or the E6Y mdashX7mdashif they are the reason the analysis is being conducted in the rst placemdashshould enjoy rst-order prominence in the estima-tion exercise and that concerns about zeros outliers and suchshould be relegated in some sense to second-order status Insome cases two-part models will serve nicely to address such rst-order concernsmdashat least in reduced-form settingsmdashbut inothers they will not

Angrist appears to have some sympathy with such argu-ments but more importantly he provides a valuable serviceto users of two-part models by unearthing one major limi-tation in the analysis of causal effects The problem is notwith part 1 (ie the logit probit or linear probability com-ponent) because part 1 of the two-part model falls within themain lines of Angristrsquos LDV analysis Rather the complicationarises with part 2 of the two-part model in which the additionalldquoY gt 0rdquo conditioning arises In the standard (reduced-form)two-part model quantities like E6Y mdashX1Y gt 07 are prominentand identi ed readily But in the counterfactual settings inwhich causal effects are manifested wherein the realized Y

arises from self-selection into or out of treatment such selec-tion effects introduce an ambiguity [Eq (12)] and therebyconfound the ability to glean causal effects from part-2 esti-mates (unless as Angrist notes censored regression methodsare used but Angrist also offers some compelling argumentsagainst blind reliance on censored regression approaches) Assuch if estimation of and inference about causal effects arethe primary analytical concerns in data settings with Y para 0and Pr4Y D 05 gt 01 analysts may be well advised to avoidtwo-part modeling strategies and pursue some of the moredirect linear and nonlinear estimation approaches discussedby Angrist in which one-part estimation approachesmdashzerosand allmdashyield estimates of causal effects that will be directlyinterpretable

3 BEYOND CEFrsquoS

One noteworthy feature of the methods Angrist discussesmdashwith the main results attributed to Abadie (1999)mdashare theirapplicability to estimation of causal effects for characteristicsof the conditional distribution rdquo4Y mdashX1D5 beyond just CEFrsquosEstimation of causal or treatment effects for conditional quan-tiles (QTErsquos) and conditional distribution function ordinates(ldquodistilesrdquo) Pr4Y lt cmdashX1D5 is demonstrated to t properlywithin the weighting strategies advanced by Abadie Since

Todd Comment 25

quantile and distile analyses may be more relevant in address-ing particular analytical questions or policy issues than CEFrsquosthe ability to extend the mechanisms of causal-effect esti-mation strategies in such directions should be welcomed byapplied researchers

I might suggest that Angristrsquos claim about an advantage ofquantile causal analysis over distile causal analysis being thatthe latter relies on application-specic values of ordinates (c)is perhaps oversold or underdeveloped In many interestingand policy-relevant applications focus on the application-speci c ordinates is fundamental What factors cause employ-ers to offer more than one health insurance plan to employeesWhat factors cause consumers of alcoholic beverages to drinkmore than two drinks per day What factors cause elderlyindividuals to utilize more than one prescription drug prod-uct What factors cause second earners to work more than17 hours per week Each of these concerns an importantreal-world causal question for which the application-specicordinatemdashwhether for institutional legal physiological orother reasonsmdashis a fundamental ldquopivot pointrdquo for the analysisAnalysis of such distile relationshipsmdashwhether in causal or inreduced-form settingsmdashis a potentially powerful method forinforming decision makers and it seems to me that its mer-its relative to quantile analyses would have to be judged on acase-by-case basis

4 CONCLUSIONS

Practitioners of applied microeconometrics will nd thatthe time invested in a careful reading of Angristrsquos article hassizable returns not merely because of the analytical and con-ceptual insights it offers but ultimately also because it sug-gests a variety of important practical innovations that appliedresearchers can exploit empirically to more clearly understandthe nature of causal relationships in their data Applied microresearchers in areas like health labor development public nance and such work commonly with data wherein (a) out-come measures are limitedmdashoften most importantly becausethey are nonnegative with mass points at 0mdashand (b) the rolesof conditioning covariates (X rsquos) are most conveniently summa-rized via regression methods Angristrsquos article provides appliedresearchers so endowed with a set of powerful tools for elic-iting causal inferences from such data Of course the punchline is familiar and inevitable All bets are off without theo-retically solid instruments But with such instruments in handthe strategies exposited by Angrist offer analysts a rich set ofperspectives on estimation

ADDITIONAL REFERENCE

Wooldridge J M (1999) ldquoEstimating Average Partial Effects Under Con-ditional Moment Independence Assumptionsrdquo mimeo Michigan StateUniversity Dept of Economics

CommentPetra Todd

Department of Economics University of Pennsylvania Philadelphia PA 19104 (petraathonasasupennedu)

This article considers ways of estimating the effect ofbinary endogenous regressors in models with limited depen-dent variables It questions the usefulness of conventionalestimation strategies aimed at recovering structural modelparameters and advocates the use of simple instrumental vari-ables ( IV) estimators as an alternative on the grounds thatthese estimators invoke weaker assumptions and often suf ceto answer questions of interest in empirical studies In theauthorrsquos view the main challenge facing empirical researchersis the problem of identi cation of ldquotreatment effectsrdquo throughIV Here I expand the discussion by considering two ques-tions (1) What are the limitations of the causal model asa paradigm for policy analysis (2) When simple estimatorsare suitable for answering a question of interest what are thetrade-offs that need to be considered in using them

1 WHEN IS THE ldquoCAUSAL MODELldquoA USEFULPARADIGM FOR POLICY ANALYSIS

The article gives the impression that most interesting ques-tions in economics can be answered within the context ofthe ldquocausal modelrdquo [variously attributed to Neyman (1923)Fisher (1935) Cox (1958) Roy (1951) and Rubin (1978)]The causal model is a very general framework that assumesthat there are potential outcome states associated with having

received some treatment and having received no treatment (orpossibly a different treatment) For each individual only oneof the potential outcome states is observed which leads to amissing-data problem in attempting to draw inferences aboutaspects of the treatment effect distribution

The language of potential outcomes is very general soalmost any economic problem could be formulated in theseterms However this does not mean that estimators proposedin the literature for the causal model are useful for all or evenmost economic problems A major limitation of the modelis that it assumes that the state ldquowith treatmentrdquo has beenobserved for at least a subset of people In medical trials orbiological experiments this assumption is probably reason-able but in economics we often are interested in evaluatingeffects of treatments that have not yet been implemented Forexample we might be interested in predicting the effect ofraising the age receiving Social Security bene ts to a new ageor the effect of introducing new term limits on welfare par-ticipation An implicit assumption needed to apply any of theidenti cation results described in this article is that both Y1

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

26 Journal of Business amp Economic Statistics January 2001

and Y0 are observed which would not be satis ed in manycases Thus the causal model and the estimators developedfor it are inadequate when it comes to any economic questioninvolving a state of the world that has not been observed

Despite these limitations treatment effect estimates aresometimes used to predict consequences of new treatments orof existing treatments for new populations When the policychange being considered is very close to changes observedin the data and is to be applied to similar individuals thenextrapolation may be reasonable But if the policy change is(a) of a magnitude outside the range of experience (b) alonga new dimension or (c) intended for a new population then itis an open question as to how reliable the estimates would beTreatment impact estimates are not intended to capture any-thing invariant to changing conditions The assumptions thatwould be required to justify extrapolations or generalizationsare often quite strong To some extent it is an illusion thatthese estimators allow for general kinds of inference underweak assumptions because the assumptions required to justifytheir application for particular purposes are often not madeexplicit

Angrist argues that the identi cation results for the localaverage treatment effect (LATE) estimator provide a ldquofounda-tion for credible causal inferencerdquo (p 5) and a ldquominimum con-troversy jumping-off point for any prediction exerciserdquo (p 5)But as discussed previously the causal model maintains sucha high level of generality that it offers little guidance for manykinds of problems It is important to recognize when and whennot the LATE estimator is likely to be useful This articleclaims that ldquoin practice estimates of LATE differ little fromestimates based on the stronger assumptions invoked to iden-tify effects on the entire treated populationrdquo (p 5) But suchan extrapolation is not justi ed under the theory and is surelyunlikely to hold in all circumstances As shown in a num-ber of works by various authors LATE provides the averageeffect of treatment for the group of ldquocompliersrdquo who are peo-ple induced by the instrument to receive treatment This groupdoes not except under very special circumstances correspondto the group of treated persons and may in fact representonly a small fraction of persons receiving treatment So notonly does LATE not suf ce for answering problems of the sortdescribed by (a) (b) and (c) it also does not generally pro-vide the average effect of treatment on the treatedmdashthe mostcommon parameter of interest in analyzing social programsAnother not-so-attractive feature of the estimator is that thepopulation of ldquocompliersrdquo is usually not identi able so wecannot say exactly whose treatment impact is being estimatedApplying the LATE parameter estimate to the full populationof treated persons would generally require ruling out certaintypes of heterogeneity in individual responses to treatmentas discussed by Heckman (1997) The explicit assumptionsthat could justify this type of extrapolation were shown byHeckman and Vytlacil (2000 in press)

2 ESTIMATORS SHOULD BE TAILOREDTO THEIR APPLICATION

A general theme throughout the article is that wheneverpossible researchers should restrain from imposing paramet-ric restrictions in estimation because the structure they impose

may be wrong resulting in biased estimates This view isexpressed for example in the discussion of the two-partmodel The article advocates the use of a two-part approachinstead of a conventional selection-model approach maintain-ing that the two-part approach is less likely to lead to biasedparameter estimates because it does not impose cross-equationrestrictions as a conventional tobit estimator would

How do researchers decide whether or not to impose restric-tions across estimating equations that are structurally linkedThe most agnostic approach would be to estimate both equa-tions completely nonparametrically but this approach is rarelypractical Nonparametric estimators are consistent under themost general conditions but in conventional size samples thestandard errors would be so large that the estimates would forpractical purposes be useless Researchers impose structurebecause they are willing to restrict the class of models underwhich their estimators are consistent in exchange for greaterprecision These ef ciency considerations would also apply inthe deciding whether or not to impose cross-equation restric-tions in estimating simultaneous-equations models If we areright about the structure there is an ef ciency gain fromimposing the restrictions If we are wrong imposing them maylead to bias When there are overidentifying restrictions it isat least possible to develop tests of the model speci cation thatcan serve as a guide in choosing a structural model that is sup-ported by the data This article presents a dichotomy betweenthe two-part model and a fully parametric traditional selec-tion speci cation but there are many other modeling choicesranging from fully nonparametric to semiparametric to fullyparametric In modest size samples ef ciency is an impor-tant consideration and a parametric model may be the mostappropriate one For very large samples fewer parametricapproaches are feasible Both the probability-of-working andthe hours-worked equations could for example be estimatedby a semiparametric method such as semiparametric leastsquares ( Ichimura 1993) Which estimation method is mostappropriate ultimately depends on how much data is avail-able and how many parameters are to be estimated It doesnot make sense to criticize the use of structured approaches infavor of more exible modeling approaches without regard tothe context in which they are being applied

3 ON THE GAP BETWEEN THEORY ANDEMPIRICAL PRACTICE

Finally another recurring emphasis in the article is on thevalue of shortcuts in empirical practice In the authorrsquos viewthere is a gap between the econometric theory and what isfeasible in practice and shortcuts and approximations help tobridge this gap To this end the article extols for examplethe virtues of ordinary least squares (OLS) as a way of approx-imating any unknown conditional expectation function with-out much concern as to whether a best linear approximation isa good approximation If the model has discrete regressors andis fully saturated it is well known that OLS is equivalent to anonparametric estimator But in other cases a linear approxi-mation could be highly biased and the bias will not go awayas the sample size gets large There are a variety of alterna-tive asymptotically unbiased estimators Along similar lines

Angrist Reply 27

the article suggests that in implementing a ldquoselection-on-observablesrdquo estimator it may be all right simply to ignore theweighting scheme Such claims seem to me fairly outrageousbecause failure to take into account weighting could lead tovery different estimates and would not provide a consistentestimator of the parameter of interest Heckman Ichimura andTodd (1998) discussed alternative estimators that are not dif -cult to implement and take into account the weighting scheme

Finally the appendix presents a variety of shortcuts forobtaining estimates and standard errorsmdashsome of which leadto inconsistent estimates When shortcuts lead to very crudeapproximations or inconsistent estimates I question their use-fulness I do think it unreasonable to expect that estimatesbased on simple IV estimators be accompanied by correctstandard errors that correctly take into account the in uence ofestimated parameters of estimated regressors I see little valuein reporting incorrect standard-errors estimates on the groundsthat it saves time in computation especially when there areseveral works long available in the literature that develop sim-ple ways of solving these kinds of problems (eg see Newey1984) Although there is surely a gap between the estimatorsdeveloped in the theoretic econometric literature and the esti-mators used in the empirical literature the gap is not as greatas this article supposes

ADDITIONAL REFERENCES

Cox D R (1958) The Planning of Experiments New York WileyFisher R A (1935) Design of Experiments London Oliver and BoydHeckman J J (1997) ldquo Instrumental Variables Evidence From Evaluating a

Job Training Programmerdquo Journal of Human Resources 32 441ndash462Heckman J Ichimura H and Todd P (1997) ldquoMatching as an Econometric

Evaluation Estimator Theory and Evidence on its Performance Appliedto the JTPA Program Part I Theory and Methodsrdquo Review of EconomicStudies 64 605ndash654

mdashmdashmdash (1998) ldquoMatching as an Econometric Evaluationrdquo Review of Eco-nomic Studies 65 261ndash294

Heckman J and Vytlacil E (2000) ldquoCausal Parameters Structural Equa-tions Treatment Effects and Randomized Evaluations of Social Programsrdquounpublished manuscript University of Chicago Dept of Economics

mdashmdashmdash (in press) ldquoLocal Instrumental Variablesrdquo in Nonlinear StatisticalInference Essays in Honor of Takeshi Amemiya eds C Hsiao KMorimune and J Powell Cambridge UK Cambridge University Press

Ichimura H (1993) ldquoSemiparametric Least Squares (SLS) and Weighted SLSEstimation of Single Index Modelsrdquo Journal of Econometrics 58 71ndash120

Neyman J (1923) ldquoStatistical Problems in Agricultural Experimentsrdquo Sup-plement to the Journal of the Royal Statistical Society 2 107ndash180

Newey W (1984) ldquoA Method of Moments Interpretation of Sequential Esti-matorsrdquo Economic Letters 14 201ndash206

Roy A (1951) ldquoSome Thoughts on the Distribution of Earningsrdquo OxfordEconomic Papers 3 135ndash146

Rubin D (1978) ldquoBaysian Inference for Causal Effects The Role of Ran-domizationrdquo The Annals of Statistics 6 34ndash58

ReplyJoshua D Angrist

I thank the discussants and the session organizer outgoingJBES editor Jeff Wooldridge who will be missed by associateeditors and JBES authors alike The comments in this sessionoffer a variety of viewpoints some sympathetic others morecritical I am pleased to be part of this stimulating and con-structive exchange and look forward to more interaction ofthis type in the future

Hahn My article starts by suggesting that a focus ontreatment effects simpli es the problem of causal inferencein limited dependent variable (LDV) models with endoge-nous regressors Endogeneityrsquos half-sister is the assumptionof unobserved individual effects since the omitted-variablesmotivation for xed effects in panel data is similar Hahnasks whether a shift in focus from index parameters to aver-age causal effects might also simplify the notoriously dif -cult problem of working with nonlinear models for binarypanel data (eg as in Card and Sullivan 1988) Hahn showsthat here too the causal-effects framework pays off demon-strating that average causal effects are easily estimated in a xed-effects probit model with a binary regressor even thoughthe probit coef cient is not A second noteworthy featureof Hahnrsquos comment is his observation that identi cation inthis framework turns on careful modeling of the relationshipbetween the assignment variable and the xed effects Tradi-tional econometric models are primarily concerned with thestochastic process generating outcomes A shift of attentiontoward modeling the assignment mechanism is consistent with

the notion that observational studies should try to mimic thesort of controlled comparisons generated by experiments

Imbens Imbens begins by endorsing the view that the sub-stantive goals of empirical research are better served by thepotential outcomescausal framework than by structural mod-eling Going one step further Imbens notes that the problemswith ldquomodel- rst econometricsrdquo are not limited to LDVrsquos Inthe same spirit he sensibly wonders (as did Wooldridge 1992)why so much attention has been devoted to estimating theBoxndashCox transformation model which does not actually iden-tify E6Y mdashX7 without a distributional assumption Imbens alsonicely articulates the rationale for looking at the twins exam-ple and notes that this is similar to the rationale for extrap-olating from clinical trials in narrow subpopulations And infact I believe the ldquotwins experimentrdquo is likely to have pre-dictive value in a variety of situations though any extrapo-lation will probably be improved by control for demographiccharacteristics

In spite of (or perhaps because of) our years of discussionand collaboration Imbens and I still do not agree on certainthings He feels strongly that inference with discrete outcomesshould use models and procedures that respect inherent non-

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

28 Journal of Business amp Economic Statistics January 2001

linearities The HIRZ procedure discussed in his comment isan intriguing way to do this and my article discusses othersI remain unconvinced however that nonlinearity is of majorsubstantive importance in empirical work with binary regres-sors The proof of the pudding I suppose would be to showthat H IRZ or some of the methods I discuss in the article leadto better predictions or decisions than the linear workhorse

Mof tt Mof t provides a valuable counterpointHe beginsby arguing that latent-index models have more predictivepower than reduced-form causal models Such claims seemhard to establish a priori though I do not doubt that in somecases this may be true But empirical strategies in which thebulk of the research effort goes into estimating index coef -cients may fail to answer the most basic question in empiricalwork What happened here Causal models and local aver-age treatment effect (LATE) answer this question while indexcoef cients alone are not suf cient for causal inference Iagree however that structural modeling can be useful forgoing beyond basic questions and I have used index and otherstructural models for this purpose Mof t is right howeverin suggesting that I nd the model-free analysis he refers toas ldquoexploratoryrdquo more interesting than structural extrapolationThis seems to me to be where the potential for real discoverylies

Mof tt asks tough but fair questions about the twins appli-cation He is correct to point out that twinning is not a manip-ulable policy instrument That is one reason I do not limit theanalysis to reduced-form effects of multiple births Yet to mymind the best evidence we have on the controversial questionof the consequences of out-of-wedlock childbearing comesfrom the twins experiment (Bronars and Grogger 1994) Andthe economic consequences of childbearing are of interest inmany other contexts As noted in the introduction to my arti-cle the twins experiment can help determine whether fertilityreductions are responsible for trend increases in female laborsupply Beyond this sort of question the twins results seemuseful for assessing the economic consequences of increasedabortion access in the United States the dissemination of con-traceptive implants in developing countries (eg see KaneFarr and Janowitz 1990) and various programs to encour-age or require implant use by welfare mothers (Haveman andWolfe 1993)

Finally it is fair to question the external validity of twinsestimates as Mof t does toward the end of his commentAngrist and Evans (1998) showed that the labor-supply con-sequences of twinning can be reconciled with estimates ofthe labor-supply consequences of childbearing resulting froma very different natural experiment induced by preferences formixed-sex sibling pairs This suggests that results from bothexperiments may be quite robust and may even be ldquostructuralrdquoin the sense that they tell us something general about the eco-nomic response to childbearing

Mullahy Mullahy brings a welcome health economistrsquosperspective to the discussion Health economists have longgrappled with the issues discussed in my article and asI hope I made clear I drew ideas and inspiration fromMullahyrsquos work on models with nonnegative dependent vari-ables Mullahyrsquos comment further highlights exactly the sort

of utilitarian trade-off that underlies the choice of empiricalstrategy On one hand we would like our models to be con-sistent with any functional form restrictions known to be trueOn the other attempts at this may complicate estimation andinference for little payoff Worst of all the technical complex-ity of nonlinear models seems to cause authors to ldquowax econo-metricrdquo an effort that may come at the expense of attention tosubstantive issues of importance (an example here is the longseries of papers discussing exactly how data from the RANDhealth-insurance experiment should be analyzed) Mullahyrsquosheretical discussions of ldquoone-part modelsrdquo here and in earlierwork represent a refreshing effort to keep the empirical trainon track Mullahyrsquos point that application-specic distributionordinates can provide more useful summary information thanquantiles is also clearly worth thinking about

Todd Todd provides a useful and clear overview of thecausal framework which she suggests is better suited formedical questions than economic questions The thrust of herargument is that there is too much heterogeneity in economicinterventions and outcomes for the causal approach to bevery useful for economists But my perception is that medicalresearch is even more explicitly concerned with heterogene-ity than economics This is evidenced by the many quali ca-tions regarding patient characteristics conditions and inter-actions in medical treatment protocols A difference betweenthe prevailing medical research paradigm and the traditionaleconometric perspective however is whether the question ofheterogeneous response is to be addressed by better theoryor more evidence Of course theory helps us decide whereto look for heterogeneity but as Rosenbaum (1999 p 301)put it ldquo If a treatment has substantially different effects in dif-ferent types of people then we need to discover this (ital-ics mine) Rosenbaum also noted that ldquorandomized clinicaltrials routinely look for heterogeneous treatment effects andthey do this without representative samples from nationalpopulationsrdquo

On the technical side Todd suggests I advocated use ofthe two-part model because it exibly approximates nonlinearfunctional forms for nonnegative outcomes Although I notedthe exibility the upshot of my discussion was that the two-part model is poorly suited to causal inference because of thedif culty of interpreting part 2 Finally Todd points out thefailure to allow for sampling variance in the scaling factorsused to generate effects for nonlinear models This is indeed ahard-to-defend shortcut though I shall take a stab at defenseby noting that randomness of the scaling factor which typi-cally estimates a sample average is likely to be a minor sourceof sampling uncertainty

ADDITIONAL REFERENCES

Card D E and Sullivan D (1988) ldquoEffects of Subsidized Training onMovements In and Out of Employmentrdquo Econometrica 56 497ndash530

Haveman R and Wolfe B (1993) ldquoChildrenrsquos Prospects and ChildrenrsquosPolicyrdquo Journal of Economic Perspectives 7 153ndash174

Kane T Farr G and Janowitz B (1990) ldquo Initial Acceptability of Con-traceptive Implants in Four Developing Countriesrdquo International FamilyPlanning Perspectives 16 49ndash54

Rosenbaum P R (1999) ldquoChoice as an Alternative to Control in Observa-tional Studiesrdquo Statistical Science 14 259ndash304

Wooldridge J (1992) ldquoSome Alternatives to the BoxndashCox RegressionModelrdquo International Economic Review 33 935ndash955

Page 12: New Estimation of Limited Dependent Variable Models With Dummy …piketty.pse.ens.fr/…ecoineg/articl/Angrist2001b.pdf · 2011. 12. 5. · sequences of changes in contraceptive technology

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 13

probit rst-stage (bivariate probit endogenous tobit and Millsratio) I randomly recoded 1 of Di observations to 0 Inprinciple measurement error in a binary endogenous regressorbiases 2SLS estimates (see Kane Rouse and Staiger 1999)In this case however column (10) shows that 2SLS estimateswith the randomly recoded data differ little from 2SLS esti-mates using the original data

Column (8) of Table 2 reports estimates of a structuraltobit model with endogeneous regressors These estimateswere computed using a two-step estimator that approximatesthe MLE again with recoded data The two-step procedureadds a Mills-ratio type endogeneity correction to a censoredregression and then applies tobit to the censored regres-sion model with the correction term [The correction term islsquo˜6Di4ƒ i=ecirci5C 41ƒDi5 i=41ƒecirci57 where is the corre-lation between the latent error determining treatment assign-ment and the outcome residual lsquo˜ is the standard deviation ofthe outcome residual and i and ecirci are Normal density anddistribution functions evaluated at the probit rst-stage ttedvalues see Heckman and Robb (1985)]

As with the bivariate probit estimate the endogenous tobitestimate is somewhat larger in magnitude than conventional2SLS This may be because of the Mills-ratio procedure forcontrolling for the endogeneity of Di more than the tobitcorrection for nonnegative outcomes To see this note thatthe Mills-ratio estimate ignoring censoring reported in col-umn (9) is considerably larger than the corresponding conven-tional 2SLS estimates Interestingly both the probit and tobitstructural estimators generate results that are more sensitiveto the inclusion of covariates than any of the other estimatorsexcept Mullahyrsquos In fact Panel B of the table shows thatwithout covariates all estimation techniques give very simi-lar results This is not surprising since without covariates theparametric assumptions used in these models are weaker

The last set of results shows that childbearing is associ-ated with marked changes in the distribution of hours workedbeyond the changes in participation already seen This isapparent in the rst columns of Table 3 which report thedistribution of hours worked by interval along with linearprobability estimates of the relationship between childbearingand the probability of falling into each interval (these mod-els include the same covariates used for Table 2) The largestentry in column (1) is for the probability of working zerohours There is also a large negative effect on the probabilityof working 31ndash40 hours per week which shows that womenwho have a third child are much less likely to work full-time Once again probit average effects reported in column2 are almost indistinguishable from the corresponding OLSestimates Estimates from an ordered probit model reportedin column (3) differ from OLS somewhat more but still gen-erate a very similar pattern

Like the 2SLS estimates for average outcomes 2SLS esti-mates of linear probability models for the probability of fallinginto each interval show that models that treat childbearing asexogenous exaggerate the negative impact on labor supply2SLS estimates for the probability of working zero hours areidentical (by construction) to those for employment in Table 1The 2SLS estimates of the impact of childbearing on fulltimework are also considerably less than the corresponding OLSestimates

Nonstructural Causal-IV models treating Morekids asendogenous generate estimates very close to 2SLS estimatesfor effects on the probabilities of hours falling into each inter-val Columns (5) and (6) show that the results are also remark-ably insensitive to whether a linear or probit model is usedto approximate the distribution function The estimates againindicate that childbearing changes the distribution of hours byraising the probability of nonparticipation and by reducing theprobability of fulltime work

Finally the QTE estimator provides useful summary statis-tics for causal effects of childbearing on changes in the distri-bution of hours worked These estimates were computed as thesolution to a weighted quantile regression problem using (27)to construct weights and the reported standard errors are froma bootstrap Quantile regression estimates treating childbear-ing as exogenous show an estimated 9-hour decline in medianhours worked but the QTE estimator suggests that the causaleffect of childbearing on median hours worked is only about ve hours Estimates at higher quantiles are similarly reducedwhen childbearing is treated as endogenous

6 SUMMARY AND CONCLUSIONS

Structural parameters may be of theoretical interest but mustultimately be converted into causal effects if they are to beof use for policy evaluation or determining whether a trendassociation is causal The problem of estimating causal effectsfor LDVrsquos does not differ fundamentally from the analogousproblem for continuously distributed outcomes The key dif-ferences seem to me to be the increased likelihood of inter-est in distributional outcomes and the inherent nonlinearity ofCEFrsquos for LDVrsquos in models with covariates Without covari-ates conventional 2SLS estimates capture both distributionaleffects and effects on means Simple IV strategies devel-oped by Mullahy (1997) and Abadie (2000b) can be used toestimate average effects in nonlinear models with covariateswhile IV strategies for probability models and quantile regres-sion can be used to estimate effects on distributions

These approaches are illustrated here using twin birthsto estimate the labor-supply consequences of childbearingAlternative nonstructural approaches to the estimation ofcausal effects using twins instruments generate similar aver-age effects whether or not the model is nonlinear Structuralestimates tend to be somewhat larger than nonstructural esti-mates when exogenous covariates are included even thoughthe covariates are not strongly related to the twins instru-ment Since the structural models impose additional distribu-tional and functional form assumptions I see no reason toprefer them

Finally the various IV estimates of the effect of childbear-ing on the distribution of hours worked show that the impactof childbearing is characterized by substantially increased non-participation and by an almost equally large shift away fromfulltime work Estimates that treat childbearing as exogenousexaggerate the causal effect of childbearing on changes in dis-tribution as well as on average hours worked These ndingsappear in results from both probability models and quantilemodels with endogenous regressors Both types of modelsprovide an interesting look at the impact of childbearing on

14 Journal of Business amp Economic Statistics January 2001

the distribution of hours worked without the conceptual prob-lems inherent in conditional-on-positive comparisons

ACKNOWLEDGMENTS

I thank the editor and the session discussants for theircomments Special thanks go to Melissa Schettini for out-standing research assistance and Alberto Abadie for exten-sive comments and shared computer programs Thanks also goto Daron Acemoglu Bill Evans Bill Greene Guido ImbensJohn Mullahy Steve Pischke and seminar participants at theUniversity of Texas for helpful discussions and comments onan earlier draft Data and computer programs used here areavailable on request

APPENDIX

A1 Derivation of Equation (12)

Drop the i subscripts Note that

Y0 D 14Y uuml0 gt 05Y uuml

0

Y1D 14Y uuml

0C gt 054Y uuml

0C 5 D 14Y uuml

1 gt 05Y uuml1 0

Since D is independent of Y0 and Y1D 14Y uuml

0C gt 054Y uuml

0C

51D is independent of Y1 Conditional effects on the treatedare therefore the same as conditional effects without condi-tioning on treatment status

E6Y1ƒ Y0

mdashY1 gt 01 D D 17 D E6Y1ƒ Y0

mdashY1 gt 07

E6Y1ƒ Y0

mdashY0 gt 01 D D 17 D E6Y1ƒ Y0

mdashY0 gt 070

The averages on the right side are the causal effects of interestThe conditional expectations on the right side are evaluated asfollows

E6Y1mdashY1 gt 07 D E6Y uuml

1mdashY uuml

1 gt 07 D E6Y uuml0mdashY uuml

1 gt 07C 1 (A1)

E6Y0mdashY1 gt 07 D E6Y uuml0 14Y uuml

0 gt 05mdashY uuml1 gt 07

D E6Y uuml0

mdashY uuml1 gt 01 Y uuml

0 gt 07P4Y uuml0 gt 0mdashY uuml

1 gt 050

If 4 gt 05 2 Y uuml0 gt 0 ) Y uuml

1 gt 01 this is E6Y uuml0

mdashY uuml0 gt 07

P4Y uuml0 gt 0mdashY uuml

1 gt 050 If 4 lt 05 2 Y uuml1 gt 0 ) Y uuml

0 gt 01 so this is

E6Y uuml0

mdashY uuml1 gt 070 (A2)

So lt 0 ) E6Y1 ƒ Y0mdashY1 gt 07 D Similarly

E6Y1mdashY0 gt 07 D E614Y uuml

0C gt 054Y uuml

0C 5mdashY uuml

0 gt 07

D E6Y uuml0

mdashY uuml1 gt 01 Y uuml

0 gt 07P4Y uuml1 gt 0mdashY uuml

0 gt 05

C P Y uuml1 gt 0mdashY uuml

0 gt 0 0

If 4 gt 051 Y uuml0 gt 0 ) Y uuml

1 gt 01 so

P4Y uuml1 gt 0mdashY uuml

0 gt 05 D 1 (A3)

and E6Y uuml0

mdashY uuml1 gt 01 Y uuml

0 gt 07 D E6Y uuml0mdashY uuml

0 gt 070 Finally

E6Y0mdashY0 gt 07 D E6Y uuml0mdashY uuml

0 gt 07 (A4)

so gt 0 ) E6Y1ƒ Y0

mdashY0 gt 07 D

A2 Derivation of Equation (21)

Drop the i subscripts Note that eƒD D 641 ƒ D5 C Deƒ7Let sbquo uuml D eƒsbquo and uuml D eƒ Z is binary and there are nocovariates so we can now write (20) as

sbquo uuml E641ƒ D5Y mdashZ D 17C sbquo uuml uuml E6DY mdashZ D 17 ƒ 1 D 0 (A5)

and

sbquo uuml E641ƒ D5Y mdashZ D 07C sbquo uuml uuml E6DY mdashZ D 07 ƒ 1 D 00 (A6)

Divided (A5) by (A6) to get rid of sbquo uuml then solve for uuml Subtract 1 and rearrange to get Equation (21)

A3 Average Treatment Effects and Standard Errors forNonlinear Models

Average treatment effects were calculated with the aid ofderivative approximations so that all reported effects have theform ldquocoef cient times scaling factorrdquo

Probit Note that ecirc6X 0i sbquo C 7 ƒ ecirc6X 0

isbquo7 6X 0isbquo C Di7 š

1 so the average effect on the treated can be approximatedas 841=N15

Pi Di6X 0

isbquo C Di79 š 1 where N1 D Pi Di This

turns out to be accurate to three decimal places for the esti-mates in Table 1 Standard errors were calculated treating thescaling factor as nonrandom This follows the convention forreporting marginal effects in programs like Stata in practiceany correction for estimation of the scaling factor is likely tobe minor A similar approach was used for ordered probit

Tobit Tobit average treatment effects were approximatedusing a derivative formula that can be found for exam-ple in the work of Greene (1999) E6Yi

mdashXi1DiD 17 ƒ

E6YimdashXi1 Di

D 07 iexclE6YimdashXi1Di7=iexclD D ecirc6X 0

isbquo C Di7 š 0

Average effects on the treated can therefore be approximatedusing 841=N15

Pi Diecirc6X 0

isbquo C Di79 š 0 Standard errors wereagain calculated treating the scaling factor as nonrandom

2PM Let the part-1 coef cient be 1 and the part-2 coef -cient be 2 Since the model multiplies parts and both parts arelinear the average effect is approximated using derivatives as1E6Yi

mdashDiD 11 Yi gt 07 C 2P6Di

D 1mdashY gt 070 Standard errorswere calculated treating the scaling factors E6Y mdashDi

D 11 Yi gt 07

and P6DiD 1mdashY gt 07 as nonrandom and using the fact that

the estimates of 1 and 2 are uncorrelatedMullahy Note that E6Yi

mdashXi1Di1mdash uumli 7 D mdash uuml

i exp6X 0i sbquoC Di7

where this CEF has a causal interpretation Again usingderivatives we have mdashuuml

i exp6X 0isbquo C 7 ƒ mdashuuml

i exp6X 0isbquo7

mdash uumli exp6X 0

i sbquo C Di7 š The model is such that E6mdash uumlimdashXi7

equals 1 but E6mdash uumlimdashXi1Di7 is unrestricted I ignore this prob-

lem and approximate the average effect on the treated as841=N15

Pi Di exp6X 0

i sbquo C Di79 š Standard errors were cal-culated treating the scaling factor as nonrandom

Bivariate Probit This is the same as probit using param-eters from the latent index equation for outcomes

Endogenous Tobit This is the same as tobit but usingcoef cients and predicted probability positive from the modelwith the compound Mills-ratio term included

Mills Ratio Standard errors were calculated treating thecompound Mills-ratio term as known

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 15

Causal-IV (nonlinear) Average effects were calculated asdescribed previously for the probit and Mullahy (exponential)functional form The rst-stage estimates of E6Zi

mdashXi7 neededto construct Ši were estimated using probit Standard errorsfor were calculated using the same bootstrap proceduredescribed for QTE Abadie (2000b) also gave analytic formu-las that take account of the rst-step estimation of E6Zi

mdashXi7An assumption implicit in this scheme for reporting Causal-IV results is that it makes sense to convert conditional-on-Xeffects for compliers into overall average effects on the treated

A4 Computation of Quantile Treatment Effectsand Standard Errors

QTErsquos were computed by plugging rst-step estimatesof QŠi

D E6ŠimdashXi1Di1 Yi7 into a weighted quantile regression

calculation performed by Stata Nonnegative estimates ofE6Ši

mdashXi1 Di1 Yi7 were constructed by separately estimatingE6Zi

mdashXi7 and E6ZimdashYi1Di1Xi7 using probit and then trimming

In principle standard errors should take account of this rst-step estimation An additional complication is that the analyticstandard errors for QTE involve a conditional error density Inthis case I sidestepped messy analytic calculations by usinga bootstrap procedure that repeats both the rst-stage estima-tion of QŠi and the second-step estimation of the parameters ofinterest in 100 replicate samples of 2500 observations eachThe 100 replicate samples were sampled without replacementusing the Stata command bsample The reported standarderrors were calculated as 4N =N uuml 51=2bseN where bseN is thestandard deviation of the 100 replicate estimates N D 21500and N uuml is the full sample size

[Received October 1999 Revised July 2000]

REFERENCES

Abadie A (2000a) ldquoBootstrap Tests for the Effect of a Treatment on the Dis-tributions of an Outcome Variablerdquo Technical Working Paper 261 NationalBureau of Economic Research Cambridge MA

mdashmdashmdash (2000b) ldquoSemiparametric Estimation of Instrumental Variable Mod-els for Causal Effectsrdquo Technical Working Paper 260 National Bureau ofEconomic Research Cambridge MA

Abadie A Angrist J and Imbens G (1998) ldquo Instrumental VariablesEstimation of Quantile Treatment Effectsrdquo Working Paper 229 NationalBureau of Economic Research Cambridge MA

Amemiya T (1978) ldquoThe Estimation of a Simultaneous Equation General-ized Probit Modelrdquo Econometrica 46 1193ndash1205

mdashmdashmdash (1979) ldquoThe Estimation of a Simultaneous Equation Tobit ModelrdquoInternational Economic Review 20 169ndash181

Angrist J (1998) ldquoEstimating the Labor Market Impact of Voluntary MilitaryService Using Social Security Data on Military Applicationsrdquo Economet-rica 66 249ndash288

Angrist J and Evans W (1998) ldquoChildren and Their Parentsrdquo Labor SupplyEvidence From Exogenous Variation in Family Sizerdquo American EconomicReview 88 450ndash477

Angrist J and Imbens G (1991) ldquoSources of Identifying Information inEvaluation Modelsrdquo Technical Working Paper 117 National Bureau ofEconomic Research Cambridge MA

Angrist J Imbens G and Rubin B (1996) ldquo Identi cation of CausalEffects Using Instrumental Variablesrdquo Journal of the American StatisticalAssociation 91 444ndash455

Angrist J and Krueger A (1999) ldquoEmpirical Strategies in Labor Eco-nomicsrdquo in The Handbook of Labor Economics Volume IIIA edsO Ashenfelter and D Card Amsterdam North-Holland pp 1277ndash1366

Blundell R W and Smith R J (1989) ldquoEstimation in a Class of Simulta-neous Equation Limited Dependent Variable Modelsrdquo Review of EconomicStudies 56 37ndash58

Blundell R W and Powell J L (1999) ldquoEndogeneity in Single Index Mod-elsrdquo mimeo University College London Dept of Economics

Bronars S and Grogger J (1994) ldquoThe Economic Consequences of UnwedMotherhood Using Twin Births as a Natural Experimentrdquo American Eco-nomic Review 84 1141ndash1156

Chamberlain G (1986) ldquoAsymptotic Ef ciency in Semi-Parametric ModelsWith Censoringrdquo Journal of Econometrics 32 189ndash218

Cragg J G (1971) ldquoSome Statistical Models for Limited Dependent Vari-ables With Application to the Demand for Durable Goodsrdquo Econometrica39 829ndash844

Duan N Manning W G Jr Morris C N and Newhouse J P(1984) ldquoChoosing Between the Sample-Selection Model and the Multi-partModelrdquo Journal of Business amp Economic Statistics 2 283ndash289

Eichner M J McClellan M B and Wise D A (1997) ldquoHealth Expendi-ture Persistence and Feasibility of Medical Savings Accountsrdquo in Tax Pol-icy and the Economy 11 ed J Poterba Cambridge MA National Bureauof Economic Research pp 91ndash128

Evans W Farrelly M C and Montgomery E (1999) ldquoDo Work-place Smoking Bans Reduce Smokingrdquo American Economic Review 89728ndash747

Gangadharan J and Rosenbloom J L (1996) ldquoThe Effects of Child-bearingon Married Womenrsquos Labor Supply and Earnings Using Twin Births as aNatural Experimentrdquo Working Paper 5647 National Bureau of EconomicResearch Cambridge MA

Goldberger A S (1991) A Course in Econometrics Cambridge MAHarvard University Press

Greene W (1999) ldquoMarginal Effects in the Censored Regression ModelrdquoEconomics Letters 64 43ndash49

Gronau R (1974) ldquoWage ComparisonsmdashA Selectivity Biasrdquo Journal ofPolitical Economy 82 1119ndash1143

Hay J W Leu R and Rohrer P (1987) ldquoOrdinary Least Squares andSample-Selection Models of Health-care Demandrdquo Journal of Business ampEconomic Statistics 5 499ndash506

Hay J W and Olsen R J (1984) ldquoLet Them Eat Cake A Note on Com-paring Alternative Models of the Demand for Medical Carerdquo Journal ofBusiness amp Economic Statistics 2 279ndash282

Heckman J J (1974) ldquoShadow Prices Market Wages and Labor SupplyrdquoEconometrica 42 679ndash694

mdashmdashmdash (1978) ldquoDummy Endogenous Variables in a Simultaneous EquationsSystemrdquo Econometrica 46 931ndash960

mdashmdashmdash (1990) ldquoVarieties of Selection Biasrdquo American Economic Review 80313ndash318

Heckman J J and Robb R Jr (1985) ldquoAlternative Methods for Evaluatingthe Impact of Interventionsrdquo in Longitudinal Analysis of Labor MarketData (Econometric Society Monographs Series No 10) eds J Heckmanand B Singer Cambridge UK Cambridge University Press pp 156ndash246

Imbens G and Angrist J D (1994) ldquo Identi cation and Estimation of LocalAverage Treatment Effectsrdquo Econometrica 62 467ndash475

Imbens G and Rubin D B (1997) ldquoEstimating Outcome Distributions forCompliers in Instrumental Variables Modelsrdquo Review of Economic Studies64 555ndash574

Kane T J Rouse C E and Staiger D (1999) ldquoEstimating Returnsto Schooling When Schooling Is Misreportedrdquo Working Paper W7235National Bureau of Economic Research Cambridge MA

Kelejian H H (1971) ldquoTwo-Stage Least Squares and Econometric SystemsLinear in Parameters but Nonlinear in the Endogenous Variablesrdquo Journalof the American Statistical Association 66 373ndash374

Koenker R and Bassett G (1978) ldquoRegression Quantilesrdquo Econometrica46 33ndash50

Killingsworth M (1983) Labor Supply Cambridge UK Cambridge Uni-versity Press

Keane M P and Wolpin K (1997) ldquo Introduction to the JBES Special Issueon Structural Estimation in Applied Microeconomicsrdquo Journal of Businessamp Economic Statistics 15 111ndash114

Lee M J (1995) ldquoSemiparametric Estimation of Simultaneous EquationsWith Limited Dependent Variables A Case Study of Female Labor Sup-plyrdquo Journal of Applied Econometrics 10 187ndash200

mdashmdashmdash (1996) ldquoNonparametric Two-Stage Estimation of Simultaneous Equa-tions With Limited Endogenous Regressorsrdquo Econometric Theory 12305ndash330

Leung S F and Yu S (1996) ldquoOn the Choice Between Sample Selectionand Two-Part Modelsrdquo Journal of Econometrics 72 197ndash229

Lin T-F and Schmidt P (1984) ldquoA Test of the Tobit Speci cation Againstan Alternative Suggested by Craggrdquo Review of Economics and Statistics66 174ndash177

Maddala G S (1985) ldquoA Survey of the Literature on Selectivity Bias AsIt Pertains to Health Care Marketsrdquo Advances in Health Economics andHealth Service Research 6 3ndash18

16 Journal of Business amp Economic Statistics January 2001

REFERENCES

Manning W G Duan N and Rogers W H (1987) ldquoMonte Carlo Evidenceon the Choice Between Sample Selection and Two-part Modelsrdquo Journalof Econometrics 35 59ndash82

Manski C F (1975) ldquoMaximum Score Estimation of the Stochastic UtilityModelrdquo Journal of Econometrics 3 205ndash228

mdashmdashmdash (1996) ldquoLearning About Treatment Effects From Experiments WithRandom Assignment of Treatmentsrdquo Journal of Human Resources 31709ndash733

McDonald J F and Mof tt R A (1980) ldquoThe Uses of Tobit AnalysisrdquoThe Review of Economics and Statistics 62 318ndash321

Mof t R A (1999) ldquoNew Developments in Econometric Methods forLabor Market Analysisrdquo in The Handbook of Labor Economics Vol-ume IIIA eds O Ashenfelter and D Card Amsterdam North-Hollandpp 1367ndash1398

Mullahy J (1997) ldquo Instrumental-Variable Estimation of Count Data ModelsApplications to Models of Cigarette Smoking Behaviourrdquo Review of Eco-nomics and Statistics 11 586ndash593

mdashmdashmdash (1998) ldquoMuch Ado About Two Reconsidering Retransformation andthe Two-Part Model in Health Econometricsrdquo Journal of Health Eco-nomics 17 247ndash281

Newey W K (1985) ldquoSemiparametric Estimation of Limited DependentVariable Models With Endogenous Explanatory Variablesrdquo Annales deLrsquo Insee 5960 219ndash237

mdashmdashmdash (1986) ldquoLinear Instrumental Variable Estimation of Limited Depen-dent Variable Models With Endogenous Explanatory Variablesrdquo Journal ofEconometrics 32 127ndash141

mdashmdashmdash (1987) ldquoEf cient Estimation of Limited Dependent Variable ModelsWith Endogenous Explanatory Variablesrdquo Journal of Econometrics 36231ndash250

Powell J L (1986a) ldquoSymmetrically Trimmed Least Squares Estimation forTobit Modelsrdquo Econometrica 54 1435ndash1460

mdashmdashmdash (1986b) ldquoCensored Regression Quantilesrdquo Journal of Econometrics32 143ndash155

Rosenzweig M R and Wolpin K (1980) ldquoLife-Cycle Labor Supply andFertility Causal Inferences From Household Modelsrdquo Journal of PoliticalEconomy 88 328ndash348

Rubin D B (1974) ldquoEstimating Causal Effects of Treatments in Random-ized and Non-randomized Studiesrdquo Journal of Educational Psychology 66688ndash701

mdashmdashmdash (1977) ldquoAssignment to a Treatment Group on the Basis of a Covari-aterdquo Journal of Educational Statistics 2 1ndash26

Wooldridge J (1999) ldquoDistribution-Free Estimation of Some Nonlinear PanelData Modelsrdquo Journal of Econometrics 90 77ndash97

Comment Binary Regressors in NonlinearPanel-Data Models With Fixed EffectsJinyong Hahn

Department of Economics Brown University Providence RI 02912 ( jinyong hahnbrownedu)

Angrist notes in his abstract that ldquomuch of the dif cultywith limited dependent variables comes from a focus on struc-tural parameters such as index coef cients instead of causaleffects Once the object of estimation is taken to be the causaleffect of treatment several simple strategies are availablerdquo(p 2) I examine the consequence of such a perspective forinference in nonlinear panel-data models with xed effectsI argue that (a) the ldquodif cultyrdquo indeed disappears sometimesand (b) structure of treatment assignment plays a crucial rolefor his strategy to be successful

It is instructive to begin with a dif cult nonlinear panel-datamodel with xed effects Consider a panel probit model with xed effects

Pr6yitD 1mdashci1 xi11 xi27 D ecirc4ci

C ˆ xit51

i D 11 1 n3 t D 1120 (1)

It is assumed that yi1 and yi2 are independent of each othergiven 4ci1 xi11 xi25 Here ci denotes the unobserved xedeffects and xit denotes a binary treatment variable We willassume that 4xi11 xi25 D 40115 for all i The index coef cientof interest is ˆ Therefore traditional econometric analysiswould focus on whether ˆ is identi ed and or

pn-consistently

estimableEstimation of the index coef cient ˆ is dif cult for a num-

ber of reasons

1 So far no consistent estimator for ˆ has been developedfor probit with a nonparametric speci cation of the conditionaldistribution of ci given 4xi11 xi25

2 It is not even clear whether ˆ is identi ed or notManskirsquos (1987) identi cation result requires in nite supportfor xit which cannot be satis ed due to the binary nature ofxit

3 It is not clear whether the semiparametric informationbound for ˆ is positive or not It is quite possible that theinformation is actually 0 For example Chamberlain (1992)showed that the information for ˆ is 0 for models with timedummies

4 Chamberlainrsquos (1984) conditional maximum likelihoodestimator is applicable only to logit models

Now to assess whether changing the object of estimationsimpli es statistical analysis consider the average treatmenteffects which in this particular case can be easily shown tobe equal to

sbquo D E6yi2ƒ yi170 (2)

It is not dif cult to see that a simple estimator Osbquo D1=n

PniD14yi2

ƒ yi15 isp

n-consistent As Angrist argues afocus on average causal effects dramatically reduces the dif -culty of estimation The secret is that 4xi11 xi25 D 40115 for alli which effectively ensures that ci is independent of 4xi11 xi25Dif culty of the index estimation listed previously is because

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

Imbens Comment 17

the independence between 4xi11 xi25 and ci cannot be exploitedwithin the xed-effects framework Unfortunately the addi-tional information that 4xi11 xi25 and ci are independent of eachother does not reduce dif culty in estimating the index ˆ It isnot yet clear whether ˆ can be

pn-consistently estimated even

with the random-effects assumption when the random effectsare nonparametrically speci ed On the other hand estimatingsbquo becomes easier as a result of the constancy of the general-ized propensity score Pr 6xi11 xi2mdashci7 Presence of unobservedci in the model was rendered irrelevant due to the constancyof the generalized propensity score [See Imbens (1999) for adiscussion on the generalized propensity score]

It is interesting to note that in the panel probit model (1)estimation of sbquo is not necessarily simple unless the indexstructure is discarded altogether It is useful to note that thenew target parameter sbquo could be estimated consistently usingindex structure if consistent estimators of ˆ and not are givenHere not denotes the distribution of ci We may alternativelywrite (2) as

Z4ecirc4c C ˆ5 ƒ ecirc4c55 dnot4c51 (3)

which can in principle be estimated by using consistent esti-mators of ˆ and not Estimation of sbquo using the alternativecharacterization (3) requires consistent estimation of an addi-tional parameter not a parameter that was not given too muchattention in the past The problem is that not many consis-tent estimators of not are available It is not yet clear whether

the model satis es the primitive conditions for consistency ofthe nonparametric maximum likelihood estimator (NPMLE)as discussed by Heckman and Singer (1984) The dif culty inestimating the target parameter using the expression (3) whichis based on the index structure is in sharp contrast to the easeof the estimation strategy using the expression (2) for whichthe index structure is irrelevant

The preceding discussion suggests that the success ofAngristrsquos perspective critically hinges on the structure of treat-ment assignment and careful reexpression of the new targetparameter If the joint distribution of ci and 4xi11 xi25 is com-pletely unknown it is clear that changing the target parameterdoes not ease the dif culty of estimation Angristrsquos perspectivetherefore requires substantial effort in modeling such jointdistribution Whether such a modeling effort will be successfulin dealing with nonlinear panel problems remains to be seen

ADDITIONAL REFERENCES

Chamberlain G (1984) ldquoPanel Datardquo in Handbook of Econometricseds Z Griliches and M D Intriligator Amsterdam North-Hollandpp 1247ndash1318

mdashmdashmdash (1992) ldquoBinary Response Models for Panel Data Identi cation andInformationrdquo unpublished manuscript

Heckman J and Singer B (1984) ldquoA Method for Minimizing the Impactof Distributional Assumptions in Econometric Models for Duration DatardquoEconometrica 52 271ndash320

Imbens G (1999) ldquoThe Role of Propensity Score in Estimating Dose-Response Functionsrdquo Technical Working Paper 237 National Bureau ofEconomic Research Cambridge MA

Manski C (1987) ldquoSemiparametric Analysis of Random Effects Linear Mod-els from Binary Panel Datardquo Econometrica 55 357ndash362

CommentGuido W Imbens

Department of Economics University of California at Los Angeles Los Angeles CA 90095(imbenseconuclaedu)

It is a pleasure to comment on this article by JoshuaAngrist whose applications of instrumental-variables methods(Angrist 1989 Angrist and Krueger 1991) have been a sourceof inspiration for my own work in this area As with Angristrsquosprevious work on instrumental variables the current articleraises some controversial issues and makes a number of impor-tant points Here I offer some comments on three of themFirst I shall discuss the issues raised in Section 1 ldquoCausalEffects and Structural Parametersrdquo concerning the goals ofstatistical inference Angrist argues that many questions ofinterest are most easily formulated in terms of comparisonsbetween realized and potential outcomes the latter de nedas outcomes that would have been observed under alternativestates of nature I shall explore some of the implications of thisview for empirical practice and econometric theory SecondI shall offer some remarks on the role of economic theory inspeci cation and identi cation of econometric models againreinforcing Angristrsquos point regarding the importance of for-mulating the key assumptions in terms of potential outcomesThird I shall discuss some of the issues related to the limited

dependent nature of outcome variables for empirical practicein particular in the presence of covariates Partly motivatedby the widespread perception of fundamental dif culties inapplying instrumental-variables methods to data with limiteddependent outcome variables Angrist argues that standard lin-ear model techniques are generally applicable I agree withAngristrsquos position that most of these perceived problems areexaggerated but suggest that principled inference should nev-ertheless take account of the limited dependent nature of theoutcome variables and use nonlinear models

1 CAUSAL ESTIMANDS

In his textbook discussion of the difference betweenstructural and reduced-form estimates Goldberger (1997)wrote following Marshak (1953) that the ultimate goal of

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

18 Journal of Business amp Economic Statistics January 2001

econometrics is to provide predictions More speci cally inmy view the goal is to provide predictions of policy interven-tions Using both economic theory and data economists wishto inform policy discussions by providing predictions of statesof the world under different policy choices Based on com-parisons of such predictions policy makers can then chooseamong the different policies using some social welfare mea-sure as objective function (eg Heckman and Smith 1997)Angrist argues that such questions are most easily formulatedin terms of potential outcomes Here I want to elaborate onthat view

Consider as an example the problem faced by a policymaker contemplating a new tax in a market To evaluatethis policy the policy maker wishes to take into account theeffect of the tax on the quantity traded Economic theory sug-gests that this effect depends on the slope of the supply anddemand functions The rst step is therefore the estimation ofthese slopes and in the remainder of this discussion I shallfocus on this component of the policy-evaluation problem Inprinciple the policy maker may be interested in the entire dis-tribution of the quantity traded under various taxes Let usassume however that for purposes of evaluation of the poli-cies it is suf cient to know the average effect of the policyon the quantity traded If there are only two values for thepolicymdashfor example no tax or a taxmdashthe difference betweenthese two averages is the key quantity of interest FollowingRubin (1974) I will refer to this as the estimand

Note that the choice of estimand is distinct from the sta-tistical question of the speci cation of the model Often thestatistical model is speci ed in such a way that a single param-eter corresponds to the estimand For example in a structuralinterpretation of the linear regression model the coef cientscorrespond to the effect of changing the covariates by a singleunit Such one-to-one correspondence however is the excep-tion rather than the rule Wooldridge (1992) made this point inthe context of BoxndashCox regression models Such models areoften used when a linear representation for E6Y mdashX7 is inappro-priate The BoxndashCox regression model generalizes this linearform to E6Y 4lsaquo5mdashX7 D X 0sbquo where

Y 4lsaquo5 D(

4Y lsaquo ƒ 15=lsaquo lsaquo 6D 01

ln Y lsaquo D 00

Although consistent estimators for sbquo exist under these assump-tions Wooldridge stressed that because (a) the interpretationof sbquo changes with the value of lsaquo and (b) knowledge of sbquo andlsaquo is not suf cient for recovering E6Y mdashX7 there is no reasonfor economists to be interested in estimates of sbquo under theseassumptions In other words sbquo cannot be the sole focus ofthe researcher because the question it answers changes withthe value of nuisance parameters Wooldridge then suggestedan alternative speci cation that always allows the researcherto recover the conditional expectation E6Y mdashX7

In empirical work this distinction between the estimandand the parameters of the statistical model is consistent withthe now common practice of reporting estimates of averagederivatives in binary response models rather than reportingestimates of the logit or probit coef cients Unlike a linear

regression model there is no direct link from one of the coef- cients in the logit or probit model to average causal effectsand thus there is no intrinsic interest in such coef cients

This view is at odds however with a large part of the semi-parametric literature An exception is the work by Stoker (egStoker 1986) who focused on estimation of index coef cientsin settings where these are proportional to average derivativesand thus directly linked to changes in predictions Considerfor example the work on semiparametric estimation of binaryresponse models In this literature such models are esti-mated without making logistic or probit assumptions insteadonly making conditional mean or median assumptions in alatent index interpretation (eg Manski 1985) This literaturehowever has begged the question of why economists shouldbe interested in the coef cient estimates in these models inthe absence of a direct link between these coef cients andthe choice probabilities or their derivatives Similarly someof the models with xed effects in panel data with limiteddependent variables have focused on estimation of parametersthat in themselves do not allow for estimation of conditionalexpectations or their derivatives and thus do not allow for esti-mation of causal effects See Arellano and Honoreacute (in press)for a survey of many of these methods

2 IDENTIFICATION

After deciding on the estimand the next step is to makesubstantive assumptions on the process that generated the dataThis is where economic as opposed to statistical theory playsa key role Theoretical considerations may suggest that certainvariables have no direct causal effect on others because theydo not enter into agentsrsquo utility function nor do they affectthe constraints these agents face For example in some mar-kets it may be reasonable to postulate the existence of demandand supply function and assume that their intersection deter-mines observed prices and quantities In that case it may beargued that certain variablesmdashfor example weather conditionsin agricultural marketsmdashaffect supply at xed prices but notdemand because weather conditions do not affect utility ofthe buyers nor do they constrain their choices given pricesSimilarly theoretical considerations may suggest which vari-ables determine agentsrsquo fertility choices and which variablesare excluded from such choices as in the structural modelsdescribed in Section 12 of Angrist

For the purpose of considering such exclusion restrictionsas well as other assumptions it is important to formulate themin a way that economic theory can be brought to bear on themThis makes the formulation in terms of counterfactuals orpotential outcomes that Angrist advocates particularly appro-priate The potential outcomes describe outcomes in differ-ent environments and as such are the primitives of economicanalyses as well as choices under different sets of constraintswhich are the result of agents solving constrained optimizationproblems Since economic theory studies such optimizationproblems it is therefore well equipped to assess assumptionsformulated directly in terms of these potential outcomes Anexample of the formulation of the critical assumptions in termsof such potential outcomes is Angrist Imbens and Rubin(1996 AIR from here on) In contrast latent index modelsalthough under some conditions mathematically equivalent to

Imbens Comment 19

the potential outcome framework (eg Vytlacil 1999) formu-late the critical assumptions in terms of associations betweenobserved variables and unobserved residuals which appearsmore dif cult to contemplate [see Imbens (1997) for a dis-cussion of the confusion such formulations have caused in thestatistics literature]

It is rare that economic theory is speci c enough to deter-mine the exact value of the estimand More typical is thatthe theory is consistent with a range of values for the esti-mand Observations on agentsrsquo choices and outcomes may behelpful in narrowing down this range The econometricianrsquostask is to link the data to the estimand Typically a num-ber of additional assumptions are made at this stage Almostalways it is assumed that there is only limited dependenceor no dependence at all between choices made by differentagents and identi cation focuses on the link between the jointdistribution of the observables estimable in large samplesand the estimand Two possibilities arise at this stage Some-times the estimand can be expressed as a functional of thejoint distribution of the observables in which case the esti-mand is identi ed A leading example is where the estimandis the average treatment effect and theory suggests that assign-ment to treatment is random or at least random conditionalon a set of observed covariates (unconfounded assignmentselection on observables) Alternatively the assumptions sug-gested by economic theory do not allow for the direct linkbetween the distribution of observables and the estimand Inthat case the researcher faces some choices One option advo-cated in a series of papers by Manski (see for a general dis-cussion Manski 1995) is to estimate the range of values ofthe estimand consistent with the data given the substantiveassumptions Another option followed in the current article byAngrist is the local average-treatment-effect approach devel-oped by Imbens and Angrist (1994) to consider what aspectsof the estimand are identi ed given data and assumptions Ininstrumental-variables settings the population average treat-ment effect is often not identi ed but the average effect for aspeci c subpopulation may be In that case one may chooseto estimate the average treatment effect for this subpopula-tion and leave the extrapolation to the principal estimand tothe researcher possibly aided by theoretical considerations AsHeckman wrote ldquo It is a great virtue of the LATE parameterthat it makes the investigator stick to the data at hand and sep-arate out the aspects of an estimation that require out of sam-ple extrapolation or theorizing from aspects of an estimationthat are based on observable datardquo (Heckman 1999 p 832)

Let us consider the case studied by Angrist with its focuson the effect of having more than two children on labor sup-ply Angrist argues that the second birth being a multiple birth(eg twins) is a valid instrument for this effect In terms ofthe AIR formulation this requires a multiple birth to be asgood as randomly assigned and the absence of a systematicdirect effect on labor supply other than through its effect onthe number of children Such assumptions may be contro-versial For example fertility treatments may lead to a sys-tematic association between multiple births and choices madeby couples violating the rst assumption Even if we acceptthese assumptions however they only imply that the aver-age causal effect of more kids on labor supply is identi ed

for women who had a third child solely because their sec-ond birth was a multiple birth (compliers in the AIR termi-nology) In my view it is unlikely that this is the populationof primary interest Nevertheless it is the only subpopulationthe data are informative about in the sense of point identi ca-tion under the substantive assumptions and it would appear tooffer some guidance regarding the population average causaleffect to policy makers similar to the way in the medical worldresults from clinical trials in homogenous subpopulations areregarded as useful because they are viewed as indicative ofpopulation average causal effects

3 LIMITED DEPENDENT VARIABLES

Typically economic theory offers some guidance concern-ing the determinants of certain outcomes without specifyingthe exact form or strength of their relationship In that casestatistical modeling is required to complete the speci cationConsider the example Angrist studies with binary outcomebinary endogenous regressor a binary instrument and covari-ates Angrist suggests as one possible approach estimating theaverage treatment effect through a linear probability modelwith instrumenting for an endogenous regressor The bene tsof the linear probability approach stemming from the linear-ity and robustness against misspeci cation of the rst stageappear to me largely illusory At this point the statistical mod-eling is only intended to provide exible approximations to theunderlying conditional distributions This is a fundamentallydifferent role from that played by the substantive assumptionsthat are essential for identi cation Appeals to consistencyunder speci c parameterizations therefore appear irrelevantmdashin a larger sample one may well wish to use a more exiblespeci cation because less smoothing is required In additionto nding the alleged bene ts of the linear probability modelunpersuasive I nd its disadvantages troubling Within smallsubpopulations characterized by extreme values of the covari-ates the smoothing implicit in linear probability models islikely to lead to unattractive predictions compared to pre-dictions based on nonlinear models that respect the limited-dependent-variable nature of the outcomes

An alternative approach is followed in the study of the effectof u shots on hospitalization rates using randomized incen-tives for vaccination by Hirano Imbens Rubin and Zhou(2000 HIRZ from here on) Given their assumptions exten-sions of those made by AIR to the case with exogenous covari-ates there are three subpopulationsmdashcompliers (units whochange treatment status in response to a change in the valueof the instrument) always-takers (who always take the treat-ment irrespective of the value of the instrument) and never-takers (who never take the treatment irrespective of the valueof the instrument) H IRZ modeled the conditional distributionof these three ldquotypesrdquo conditional on covariates as a trinomialdistribution

Pr4TypeiD cmdashXi

D x5 D exp4x0ndashc5

1C exp4x0ndashc5C exp4x0ndasha51

Pr4TypeiD amdashXi

D x5 D exp4x0ndasha5

1C exp4x0ndashc5 C exp4x0ndasha51

20 Journal of Business amp Economic Statistics January 2001

and

Pr4TypeiD nmdashXi

D x5

D 1 ƒ Pr4TypeiD cmdashXi

D x5 ƒ Pr4TypeiD amdashXi

D x50

Now compare this setup to the selection models Angristdescribes in Section 3 In the selection models the equationdescribing the endogenous regressor is Di

D 18ƒ0C ƒ1Zi

Cƒ 0

2Xi gt Daggeri9 Suppose that the instrument is binary and that ƒ1 ispositive Then the two models are very similar with units withƒ0 Cƒ1 Cƒ 0

2Xi gt Daggeri in the selection model classi ed as always-takers in the potential outcome framework (because irrespec-tive of the value of the instrument Di

D 1 for such units) unitswith ƒ0

C ƒ 02Xi lt Daggeri classi ed as never-takers (because irre-

spective of the value of the instrument DiD 0 for such units)

and the units with ƒ0C ƒ 0

2Xi lt Daggeri lt ƒ0C ƒ1

C ƒ 02Xi classi ed

as compliersOne advantage of the trinomial model is that it easily gen-

eralizes to provide an arbitrarily good t to any conditionaltrinomial distribution by including higher-order terms andinteractions in the covariates If there are no substantive rea-sons to impose additional restrictions one should not imposethem implicitly in the speci cation of the statistical modelIn particular in the selection model it is not suf cient to addhigher-order terms to the covariate vector to provide an arbi-trarily good t to the trinomial distribution Such an approxi-mation would have to involve heteroscedasticity and other dis-tributional extensions that are not straightforward to implementin the selection model

Conditional on the individualrsquos type H IRZ speci ed theoutcome distributions given covariates as logistic regressionmodels Again the aim is to provide a exible approxima-tion to the conditional distribution in a manner that does notimpose any implicit restrictions Given that for a binomialdistribution the logistic regression model can be thought ofas providing a linear approximation to the log odds ratiothis choice is again an appealing one An alternative is theprobit model which also provides a good approximation Less

attractive here is the linear probability model since it requiresinequality restrictions on the parameters if the implicit esti-mates of the probabilities are to be bounded between 0 and 1

In cases with other limited dependent variables alternativenonlinear models may be appropriate For example if the out-comes are durations subject to censoring models speci ed interms of hazard functions (eg Lancaster 1979) may be con-venient for dealing with such data

ADDITIONAL REFERENCES

Angrist J (1989) ldquoLifetime Earnings and the Vietnam Era Draft LotteryEvidence From Social Security Administrative Recordsrdquo American Eco-nomic Review 80 313ndash335

Angrist J and Krueger A (1991) ldquoDoes Compulsory School AttendanceAffect Schooling and Earningsrdquo Quarterly Journal of Ecomonics 106979ndash1014

Arellano M and Honoreacute B (in press) ldquoPanel Datardquo in Handbook of Econo-metrics eds J Heckman and E Leamer Amsterdam Elsevier North-Holland

Goldberger A (1991) A Course in Econometrics Cambridge MACambridge University Press

Heckman J (1999) ldquo Instrumental Variables Response to Angrist andImbensrdquo Journal of Human Resources 34 828ndash837

Heckman J and Smith J (1997) ldquoEvaluating the Welfare Staterdquo in Econo-metrics and Economics in the 20th Century The Ragnar Frisch Centenaryed S Strom New York Cambridge University Press pp 214ndash318

Hirano K Imbens G Rubin D and Zhou A (2000) ldquoEstimating theEffect of Flu Shots in a Randomized Encouragement Designrdquo Biostatistics1 69ndash88

Imbens G (1997) Book Review of The Foundations of Econometric Analy-sis by D Hendry and M Morgan Journal of Applied Econometrtics 1291ndash94

Lancaster T (1979) ldquoEconometric Methods for the Analysis of DurationDatardquo Econometrica

Manski C (1985) ldquoSemiparametric Analysis of Discrete Responserdquo Journalof Econometrics 27 313ndash333

mdashmdashmdash (1995) Identi cation Problems in the Social Sciences CambridgeMA Harvard University Press

Marshak J (1953) ldquoEconomic Measurements for Policy and Predictionrdquoin Studies in Econometric Method eds W Hood and T KoopmansNew York Wiley pp 1ndash26

Stoker T (1986) ldquoConsistent Estimation of Scaled Coef cientsrdquo Economet-rica 54 1461ndash1481

Vytlacil E (1999) ldquo Independence Monotonicity and Latent Index ModelsAn Equivalency Resultrdquo unpublished manuscript University of ChicagoDept of Economics

Wooldridge J (1992) ldquoSome Alternatives to the BoxndashCox RegressionModelrdquo International Economic Review 33 935ndash955

CommentRobert A Mof tt

Department of Economics Johns Hopkins University Baltimore MD 21218 and National Bureau of EconomicResearch Cambridge MA (mofttjhuedu)

The Problem Although the article by Angrist rangesacross a number of issues much of the discussion and thearticle title suggests that the problem of concern is that instru-mental variables ( IV) cannot be used in one of three commonmodels Let the rst model be y D Csbquod CxbdquoChellip where y isan absolutely continuous variable but d is binary and wherex is independent of hellip but d is not Then sbquo can be consistentlyestimated with IV (Heckman and Robb 1985) Let the secondmodel be y uuml D C sbquod uuml C xbdquo C hellip where y uuml and d uuml are contin-

uous and where y D 14y uuml gt 05 and d D d uuml are the observedvariables The parameters of this model can likewise be esti-mated by IV with some auxiliary assumptions (Newey 1986see Blundell and Smith 1993 for a review of alternative meth-ods) But let the third model be y uuml D C sbquod C xbdquo C hellip where

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

Moftt Comment 21

again y D 14y uuml gt 05 but now d is binary The parameters ofthis model cannot be estimated by standard IV Many of theexamples in the Angrist article involve the censored regres-sion model rather than the binary choice model but this andthe other main points in the article apply to both

The Angrist Solution The Angrist solution to the problemposed by the third of these models where IV is not applica-ble is to declare sbquo an uninteresting parameter and not worthyof estimation [or in his words to ldquopuntrdquo (p 8)] This is remi-niscent of the solution to the Vietnam War suggested by some1960s commentators The United States should have simplydeclared victory quickly withdrawn and hoped that no onenoticed that they had in fact lost

If sbquo is uninteresting what is interesting according toAngrist His answer is that interest should center on leastsquares approximations with y as the dependent variable andd and x as regressors If endogeneity of d is a problem linearIV should be applied with linear IV given a LATE interpre-tation He provides an empirical illustration

I will comment on three questions raised by the Angristarticle (1) Is sbquo an interesting parameter (2) Is the Linear IVestimator using observed y and d which Imbens and Angrist(1994) relabeled the LATE an interesting quantity (3) Is theillustrative application given in the Angrist article interesting

Is sbquo an Interesting Parameter As Angrist notes manyof the issues raised by his solution have nothing to do withthe endogeneity of d Whether sbquo is an interesting parame-ter is one of these at least in part Given Angristrsquos prefer-ence for linear projections his position necessarily implies thatall nonlinear models of which the latent index model (L IM)is just one example are uninteresting (Angrist is not com-pletely consistent in his position because he does propose anonlinear model with an exponential conditional mean func-tion in one section of his article that model is subject to thesame objections that he subjects the LIM tomdasharbitrary non-linearities etc) Thus his position boils down to a preferencein the binary dependent-variable case for the linear proba-bility model (LPM) over the LIM and other nonlinear mod-els This issue has been discussed for many years even in asimultaneous-equations context [eg Heckman and MaCurdy(1985) which Angrist does not reference]

The dif culty with Angristrsquos polar position on this issuemdashthat the LPM is the only model of interestmdashis that it is funda-mentally untenable The LPM is attractive because it is easyto interpret providing parameter estimates that do not requiretransformation to learn the effects of a regressor on the meanof the dependent variable It has a role to play in empiricalwork in summarizing the data as regards the conditional meanfunction and for initial explorations of the data and virtu-ally all practitioners use it for this purpose But beyond thisit has no defense Either it is equivalent to the latent-variablemodel if the model is saturated or if the model is not satu-rated imposes functional form restrictions that may not holdand may t the data worse than the latent-variable model

Thus if the model is y D 14 C sbquod C hellip gt 05 where d is asingle dummy variable regressor independent of hellip the probit(say) estimates of and sbquo map one-to-one onto the leastsquares intercept and coef cient on d in the LPM and hencethere is no gain to estimating the model either way because the

estimators are equivalent The same point extends to the caseof a multinomial d If the model is y D 14 C sbquod C xbdquo C hellip gt

05 where x and bdquo are vectors of exogenous covariates andcoef cients respectively the LPM estimate of the effect ofd on E4y5 conditional on x may be demonstrably worsethan the probit estimator of the same effect even for theobject of estimating E4ymdashd1 x5 at points 8d1 x9 observed inthe data (Angrist falls back on the weak principle that leastsquares estimates provide best linear approximations but theissue is that the true model might be nonlinear) Nothingin Angristrsquos article provides evidence that the probitor anyother latent index formulation provides an inferior estimate of6E4ymdashd11 x5 ƒ E4ymdashd01 x57 compared to the LPM for the twoestimators smooth the joint response function with respect tod and x in different ways

Although the latent-variable model has no necessary claimfor superiority in the class of all nonlinear models its pop-ularity rests on its ability to generate a wide variety of non-linearities with a relatively parsimonious speci cation arisingfrom the convolution of the latent index with the cdf of hellipExpansion of the LPM to incorporate equivalent nonlinearitiesis cumbersome and inef cient

The possible inferiority of the LPM in capturing nonlinear-ities in nonsaturated models is also important for interpolationand extrapolation to points not in the observed data Gettingthe nonlinearities right in the observed data is important ininterpolation and extrapolation if the true model is nonlinearas the binary choice model necessarily is (since y is boundedby 0 and 1) Although Angrist at one point does mentionpredictionmdashstating that the linearized models he proposes area good ldquojumping-off point for any prediction exerciserdquo a state-ment without supportmdashhe is for the most part not interestedin prediction so much as summarizing the observed data Thisis ne but unfortunately prediction outside the observed his-torical data is the pervasive concern of the policy makers whoare the ultimate consumers of applied economic research forthey must make predictions of new policies in new environ-ments on a daily basis as part of their jobs The methods pro-posed in Angristrsquos article are therefore not very useful for theformulation of new policy

It should be noted that some make the argument that theoryand economic models must play a role in guiding the formu-lation of empirical relationships used for prediction outsidethe range of the observed data That position is not taken herebecause it is not necessary to establish the potential superior-ity of the L IM over the LPM in nonsaturated models and forprediction Occamrsquos razor makes it unnecessary

It should also be noted that these issues have nothing todo with causal effects because d is assumed exogenous Thusthe opposition that Angrist poses between L IMrsquos and ldquocausaleffectsrdquo models is a false one at least in general If a ldquocausaleffectrdquo is de ned as the true effect of a variable d on onlythe rst moment of a variable y (a rather restricted de nitionof causal effect) that causal effect can be derived from thelatent-variable model just as well The issue is instead merelynonlinearity of the function E4ymdashd1 x5

Angrist also states that the L IM requires ldquoconstant coef- cientrdquo and ldquodistributional assumptionsrdquo for identi cation

22 Journal of Business amp Economic Statistics January 2001

unlike the LPM Angrist does not make suf cient quali ca-tions to these assertions and they are indeed false for theL IM A random coef cient sbquo in the LIM is permissible with-out distributional assumption ( Ichimura and Thompson 1998)and the coef cients in the standard constant-coef cient L IMare identi ed semiparametrically (ie under unknown distri-bution of hellip) under the restriction that d and hellip are independentan assumption generally made for causal interpretation of theLPM model as well (Manski 1988 Horowitz 1993 Powell1994) In addition as already noted in a fully saturated modelthe LPM and the L IM are equivalent in any case

Is the LATE an Interesting Statistic The Angrist assertionthat LATE is the only interesting statistic if d is endogenousor perhaps the only statistic worth bothering about because itis the only one identi ed by the data has the same partialvalidity as his preference for the LPM over the LIM IV isone of the most popular techniques in applied econometricsand has a natural intuition indeed one that does not requirethe LATE interpretation per se It is one of the most usefultools in the applied economistrsquos kit The simple IV estimatordiscussed by Angrist makes minimal assumptions and givesminimal information back to the analyst as a result But tosay that it produces the only statistic of interest does not havedefense both because it is equivalent to an LIM that is fullysaturated and may t the data worse than an LIM if not satu-rated and because it implies that there is no value to makingadditional assumptions to obtain additional information

The limiting nature of the LATE statistic is again in itsinattention to nonlinearities interpolation and extrapolationThat nonlinearities can be important is as true in this caseas it was in the exogenous case just discussed when thereare additional x covariates and when the model is not satu-rated As for interpolation and extrapolation the LATE statis-tic 6 Ny4z D z15ƒ Ny4z D z057=6 Nd4z D z15ƒ Nd4z D z057 denotes theeffect on a change in z from value z0 to z1 on E4y5 scaledby the change in the E4d5 It does not have implications forthe effect on E4y5 of any other change in z or for a changein any other policy variable Thus it is not particularly usefulfor policy changes other than a change of z0 to z1 in the sameenvironment (ie conditioned on the same x) This stands incontrast to an L IM for E4dmdashx1 z5mdashor any parametric modelfor d for that mattermdashthat allows for the change from z0 toz1 to inform policy makers and others of the likely changesof other values of z and of other variables x In fact theLATE statistic which is not a parameter in the usual senseof the word can always be expressed as a nonlinear combi-nation of parameters of a latent-variable model but not viceversa hence the latter model is more general than the former[See Heckman and Vytlacil (1999) for a discussion of how toexpress the LATE in terms of an L IM see also Angrist andImbens (1999) and Heckman (1997 1999) for a discussion ofsome of these issues]

In addition Angristrsquos focus on the advantages of the LATEfor model-free estimation of the effect of a change in z fromz0 to z1 on y reveals a philosophical weakness in his posi-tion This is because the effect of a change in z from z0 toz1 on y can be estimated from the reduced form the struc-tural form is not needed It is not explained why the statistic

6E4ymdashz D z15 ƒ E4ymdashz D z057 is not the only quantity of inter-est The answer that most economists would give is to say thatstructural coef cients are of interest because they can be usedto extrapolate the effects of changes in d on y to other changesin z than from z0 to z1 in the same environment indeedthis is the basic rationale for structural estimation given inthe famous essay by Marschak [1953 see also Christ (1994)and Heckman (2000) for discussions Marschakrsquos argumentwas more subtle than this arguing that restrictions on struc-tural parameters are needed for out-of-sample prediction butthis translates into restrictions on the reduced form as well]However by ruling out the possibility of learning about anyof the effects of changes in z or d other than those inducedby a change in z from z0 to z1 Angrist removes the need todo structural estimation in the rst place Angrist implicitlyassumes that a structural coef cient of interest exists on thevariable d in the y equation and that that coef cient has mean-ing independent of a particular change in z but the rest of hisdiscussion contravenes that interest

A reduced-form research program which is where theAngrist position leads is of considerable value There is noth-ing wrong with a research program to collect a large body ofinformation on the effects of a wide variety of changes in dif-ferent policy variables 4z5 from particular values (z0) to otherparticular values (z1) each taking place in particular environ-ments (x) But if the research program stops there very littleuseful has been learned other than a collection of facts aboutthe effects of particular policies in particular environments

Is the Application in the Angrist Article Interesting TheAngrist article uses a recently popular exclusion restriction ornatural experiment to identify the effect of fertility on laborsupply and to illustrate his preferred methodsmdashnamely theuse of twins The use of twins as an exclusion restrictionstands in contrast to variations in a government policy or lawwhich are often used as exclusion restrictions both in recentwork on natural experiments as well as a much older litera-ture that uses cross-sectional and overtime variation in state-or country-speci c taxes and transfer rules to identify modelparameters However although it can be of interest to estimatereduced-form policy effects 6E4ymdashz D z15 ƒ E4ymdashz D z057 asjust discussed it is not so obvious that there is any interest inestimating the effects of having twins on the expected valueof y Creating twins is not a variable directly subject to policymanipulation

The dif culty in the use of twins arises for the same reasonalready discussedmdashnamely that the preference for model-freeestimation of policy effects leads to a lack of interest in thefunction E4dmdashz1 x5 and hence to a lack of interest in what canbe learned from a study of twins about the effects of someother more relevant policy variable that might be manipulatedA model for the E4dmdashz1 x5 is needed to make that connectionand hence to make a study of twins of any interest yet that isintentionally eschewed in the approach proposed by AngristOnce again this leads to an uninteresting and quite limitingset of exercises

Conclusions The set of methods laid out in the Angristarticle at least if a more relevant instrument were used yieldsa set of model-free statistics that are suitable for exploratorywork on a research question prior to the (possibly nonlinear)

Mullahy Comment 23

structural estimation and prediction that should be the ultimateobject of applied economics research

ACKNOWLEDGMENTS

This comment is a revision of remarks delivered at theJoint Statistical Meetings Baltimore August 1999 I thankCarl Christ James Heckman Joel Horowitz Michael KeaneThomas Mroz Geert Ridder and Edward Vyltacil for com-ments with the usual disclaimer that the views expressed hereshould not be taken as representing those of any of theseindividuals

ADDITIONAL REFERENCES

Angrist J and Imbens G (1999) Comment on ldquo Instrumental VariablesA Study of Implicit Behavioral Assumptions Used in Making Pro-gram Evaluationsrdquo by J J Heckman Journal of Human Resources 34283ndash827

Blundell R and Smith R (1993) ldquoSimultaneous Microeconometric ModelsWith Censored or Qualitative Dependent Variablesrdquo in Handbook of Statis-tics (Vol 1) eds G S Maddala C R Rao and H D Vinod AmsterdamNorth-Holland pp 117ndash141

Christ C (1994) ldquoThe Cowles Commissionrsquos Contributions to Econometricsat Chicago 1939ndash1955rdquo Journal of Economic Literature 32 30ndash59

Heckman J (1997) ldquoInstrumental Variables A Study of Implicit BehavioralAssumptions Used in Making Program Evaluationsrdquo Journal of HumanResources 32 441ndash462

mdashmdashmdash (1999) ldquo Instrumental Variables A Reply to Angrist and ImbensrdquoJournal of Human Resources 34 828ndash837

mdashmdashmdash (2000) ldquoCausal Parameters and Policy Analysis in Economics ATwentieth Century Retrospectiverdquo Quarterly Journal of Economics 11545ndash97

Heckman J and MaCurdy T (1985) ldquoA Simultaneous Equations LinearProbability Modelrdquo International Economic Review 18 21ndash37

Heckman J and Vytlacil E (1999) ldquoLocal Instrumental Variables andLatent Variable Models for Identifying and Bounding Treatment EffectsrdquoProceedings of the National Academy of Sciences 96 4730ndash4734

Horowitz J (1993) ldquoSemiparametric and Nonparametric Estimation of Quan-tal Response Modelsrdquo in Handbook of Statistics (Vol 11) eds G SMaddala C R Rao and H D Vinod Amsterdam North-Holland pp45ndash72

Ichimura H and Thompson T (1998) ldquoMaximum Likelihood Estimation ofa Binary Choice Model With Random Coef cients of Unknown Distribu-tionrdquo Journal of Econometrics 86 269ndash295

Manski C (1988) ldquoIdenti cation of Binary Choice Modelsrdquo Journal of theAmerican Statistical Association 83 729ndash738

Marschak J (1953) ldquoEconomic Measurements for Policy and Predictionrdquoin Studies in Econometric Method eds W Hood and T KoopmansNew York Wiley pp 1ndash26

Powell J (1994) ldquoEstimation of Semiparametric Modelsrdquo in Handbook ofEconometrics (Vol 4) eds R Engle and D McFadden Amsterdam North-Holland pp 2443ndash2521

CommentJohn Mullahy

Departments of Preventive Medicine and Economics University of Wisconsin Madison WI 53705 andNational Bureau of Economic Research Cambridge MA (jmullahyfacstaffwiscedu)

Articles bearing titles containing phrases like ldquoSimpleStrategies for Empirical Practicerdquo are often wolves masquerad-ing as sheep the methodologies they devise being far fromldquosimplerdquo and far removed from what most humble practition-ers perceive in the realm of ldquoempirical practicerdquo Not so hereJoshua Angrist has written a tight comprehensive article thatis stimulating and important yet also eminently useful

As typi es much of Angristrsquos work the main concern in thisarticle is on how to elicit interesting characterizations of causaleffects from microdata with the particular twist here being afocus on outcomes measured as ldquolimited dependent variablesrdquo(LDVrsquos) The main take-away message I glean from this arti-cle is that applied analysts working on causal effect or struc-tural analyses in LDV contextsmdashtraditionally vexing contextsinsofar as consistent estimation and inference are concernedmdashhave considerable grounds for optimism Angrist lays out andinterprets systematically a set of issues and methods that pro-vide practitioners with a variety of implementable strategiesthat might be brought to bear on such empirical problemsA corollary take-away message is that in some respects ldquothisstuff is not really as hard as wersquove tended to make itrdquo withAngrist demonstrating for instance the potential merits ofsimple linear instrumental variable ( IV) methods for estimat-ing causal effects in a variety of LDV contexts In no eventcan applied analysts escape the requirement of nding theo-retically sound instruments but Angrist exposits compellingly

a broad description of how such instruments might be used toelicit interesting causal inferences in LDV contexts

I have no real quibbles with any of the substance ofAngristrsquos arguments Rather I will devote my commentarymainly to amplifying and expanding several of the themes hedevelops throughout the article

1 FOCUS ON CAUSAL OR PARTIAL EFFECTSVERSUS FOCUS ON CONDITIONAL

EXPECTATIONS FUNCTIONS

It seems fair to suggest that much of applied microecono-metrics is concerned primarily with understanding the signsand magnitudes of quantities like iexclE6Y mdashX1 D7=iexcl4X1 D5 oriexclE6Y mdashX7=iexclX0 Yet much of the actual dirty work in under-taking causal analysis in LDV contexts seems to result fromdecisions to undertake analyses in settings where E6Y mdashX1D7

or E6Y mdashX7 are restricted to be positive without a priori restric-tions on parameter values [thus the tradition of using tobit-class conditional expectations functions (CEFrsquos) two-stepselection models exponential CEFrsquos etc]

As practitioners however we should pause to assesswhether speci cations akin to E6Y mdashX1D7 D exp4Xsbquo C D5 gt

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

24 Journal of Business amp Economic Statistics January 2001

0 are ultimately buying anything important with respect tothe rst-order questions being explored It is only com-mon sense that analysts should spend relatively more energyworking on understanding quantities of primary interest(eg iexclE6Y mdashX1D7=iexcl4X1 D5) and worrying relatively lessabout the formulation and implications of assumptions (egE6Y mdashX1 D7 gt 0) that may have little or no ultimate bearingon the particular questions being addressed This is espe-cially so when the latter effort (restricting via functional formchoice E6Y mdashX1 D7 gt 0) will tend to complicate rather thanfacilitate the former effort (understanding the partial or causaleffects) This is not to suggest that quantities like E6Y mdashX7 andE6Y mdashX1 D7 may not themselves be interesting to estimate (egmodeling conditional mean health care expenditures for use inforecasting future expenditure levels) but whether E6Y mdashX1D7

is parametrically and parsimoniously better approximated byexp4XsbquoC D5 or by XsbquoC D or by some other g4Xsbquo CD5

or h4X1D1 ƒ5 ismdasheven in LDV settingsmdashultimately an empir-ical matter to be assessed by goodness of t (eg via condi-tional moment tests)

I note this important angle somewhat sheepishly sincein both articles of mine that Angrist cites (Mullahy 19971998) there is considerable emphasis on the use of exponen-tial CEF speci cations as enforcers of positive conditionalmean ldquorequirementsrdquo In fact the main motivation for theearlier article was to nd a general approach to obtainingconsistent estimators of structural parameters in parametricsettings less distribution-bound than tobit-class models whenthe key requirement or side constraint is E6Y mdashX1 D7 gt 0 withthe resulting strategy a more or less brute-force method foraccomplishing this in a nonlinear IV setting Angristrsquos arti-cle takes this idea quite a bit further and demonstrates how(and with the addition of a further normalization) the struc-tural parameter estimation approach I discussed can be devel-oped into a model to analyze causal effects when the latter arecharacterized as ldquoproportional treatment effectsrdquo

Whether a proportional as opposed to an additive treatmenteffect is interesting (it may or may not be) remains to be seenin any particular application Wooldridgersquos recent work onaverage causal effects (ACErsquos Wooldridge 1999) would seempertinent in contexts in which the role of the X covariates ismore ldquointrusiverdquo than in the simple additivelinear treatment-effect setup As a general matter the extent to which infer-ences about causal effectsmdashhowever formulatedmdashhinge onthe inclusion of particular covariates as conditioning regres-sors is an interesting and potentially important angle thatAngristrsquos article begins to explore and is one that likely tomerit additional research For the particular issues at handthe very nature of proportional treatment effects brings intofocus explicitly the role of the X rsquos so how their correlationwith the treatment indicators 4D5 in uences inferences aboutthe causal impacts of the latter would seem to be a rst-orderconsideration

2 TWO-PART MODELS

I found Angristrsquos analysis of the interpretation of causaleffects in two-part models to be extremely illuminatingIn a reduced-form setting I argued (Mullahy 1998) that amid

all the concerns about zeros robustness outliers transforma-tion retransformation and such that have typically attendedmodeling efforts involving two-part models it would be sur-prising in a well-structured empirical investigation if therewere not stated or lurking concerns about partial effectsiexclE6Y mdashX7=iexclX andor CEFrsquos E6Y mdashX7 themselves It may be insome cases that such concerns are not articulated and it mayeven be in less well-structured problems that they are not obvi-ous to analysts themselves Yet for instance unless robustlyestimated sbquorsquos from a log-linear part-2 speci cation of a two-part model could inform a rst-order question about a par-tial effect or a CEF of what practical use were they likelyto be My rather modest and rather obvious argument wasjust that the concerns about the iexclE6Y mdashX7=iexclX or the E6Y mdashX7mdashif they are the reason the analysis is being conducted in the rst placemdashshould enjoy rst-order prominence in the estima-tion exercise and that concerns about zeros outliers and suchshould be relegated in some sense to second-order status Insome cases two-part models will serve nicely to address such rst-order concernsmdashat least in reduced-form settingsmdashbut inothers they will not

Angrist appears to have some sympathy with such argu-ments but more importantly he provides a valuable serviceto users of two-part models by unearthing one major limi-tation in the analysis of causal effects The problem is notwith part 1 (ie the logit probit or linear probability com-ponent) because part 1 of the two-part model falls within themain lines of Angristrsquos LDV analysis Rather the complicationarises with part 2 of the two-part model in which the additionalldquoY gt 0rdquo conditioning arises In the standard (reduced-form)two-part model quantities like E6Y mdashX1Y gt 07 are prominentand identi ed readily But in the counterfactual settings inwhich causal effects are manifested wherein the realized Y

arises from self-selection into or out of treatment such selec-tion effects introduce an ambiguity [Eq (12)] and therebyconfound the ability to glean causal effects from part-2 esti-mates (unless as Angrist notes censored regression methodsare used but Angrist also offers some compelling argumentsagainst blind reliance on censored regression approaches) Assuch if estimation of and inference about causal effects arethe primary analytical concerns in data settings with Y para 0and Pr4Y D 05 gt 01 analysts may be well advised to avoidtwo-part modeling strategies and pursue some of the moredirect linear and nonlinear estimation approaches discussedby Angrist in which one-part estimation approachesmdashzerosand allmdashyield estimates of causal effects that will be directlyinterpretable

3 BEYOND CEFrsquoS

One noteworthy feature of the methods Angrist discussesmdashwith the main results attributed to Abadie (1999)mdashare theirapplicability to estimation of causal effects for characteristicsof the conditional distribution rdquo4Y mdashX1D5 beyond just CEFrsquosEstimation of causal or treatment effects for conditional quan-tiles (QTErsquos) and conditional distribution function ordinates(ldquodistilesrdquo) Pr4Y lt cmdashX1D5 is demonstrated to t properlywithin the weighting strategies advanced by Abadie Since

Todd Comment 25

quantile and distile analyses may be more relevant in address-ing particular analytical questions or policy issues than CEFrsquosthe ability to extend the mechanisms of causal-effect esti-mation strategies in such directions should be welcomed byapplied researchers

I might suggest that Angristrsquos claim about an advantage ofquantile causal analysis over distile causal analysis being thatthe latter relies on application-specic values of ordinates (c)is perhaps oversold or underdeveloped In many interestingand policy-relevant applications focus on the application-speci c ordinates is fundamental What factors cause employ-ers to offer more than one health insurance plan to employeesWhat factors cause consumers of alcoholic beverages to drinkmore than two drinks per day What factors cause elderlyindividuals to utilize more than one prescription drug prod-uct What factors cause second earners to work more than17 hours per week Each of these concerns an importantreal-world causal question for which the application-specicordinatemdashwhether for institutional legal physiological orother reasonsmdashis a fundamental ldquopivot pointrdquo for the analysisAnalysis of such distile relationshipsmdashwhether in causal or inreduced-form settingsmdashis a potentially powerful method forinforming decision makers and it seems to me that its mer-its relative to quantile analyses would have to be judged on acase-by-case basis

4 CONCLUSIONS

Practitioners of applied microeconometrics will nd thatthe time invested in a careful reading of Angristrsquos article hassizable returns not merely because of the analytical and con-ceptual insights it offers but ultimately also because it sug-gests a variety of important practical innovations that appliedresearchers can exploit empirically to more clearly understandthe nature of causal relationships in their data Applied microresearchers in areas like health labor development public nance and such work commonly with data wherein (a) out-come measures are limitedmdashoften most importantly becausethey are nonnegative with mass points at 0mdashand (b) the rolesof conditioning covariates (X rsquos) are most conveniently summa-rized via regression methods Angristrsquos article provides appliedresearchers so endowed with a set of powerful tools for elic-iting causal inferences from such data Of course the punchline is familiar and inevitable All bets are off without theo-retically solid instruments But with such instruments in handthe strategies exposited by Angrist offer analysts a rich set ofperspectives on estimation

ADDITIONAL REFERENCE

Wooldridge J M (1999) ldquoEstimating Average Partial Effects Under Con-ditional Moment Independence Assumptionsrdquo mimeo Michigan StateUniversity Dept of Economics

CommentPetra Todd

Department of Economics University of Pennsylvania Philadelphia PA 19104 (petraathonasasupennedu)

This article considers ways of estimating the effect ofbinary endogenous regressors in models with limited depen-dent variables It questions the usefulness of conventionalestimation strategies aimed at recovering structural modelparameters and advocates the use of simple instrumental vari-ables ( IV) estimators as an alternative on the grounds thatthese estimators invoke weaker assumptions and often suf ceto answer questions of interest in empirical studies In theauthorrsquos view the main challenge facing empirical researchersis the problem of identi cation of ldquotreatment effectsrdquo throughIV Here I expand the discussion by considering two ques-tions (1) What are the limitations of the causal model asa paradigm for policy analysis (2) When simple estimatorsare suitable for answering a question of interest what are thetrade-offs that need to be considered in using them

1 WHEN IS THE ldquoCAUSAL MODELldquoA USEFULPARADIGM FOR POLICY ANALYSIS

The article gives the impression that most interesting ques-tions in economics can be answered within the context ofthe ldquocausal modelrdquo [variously attributed to Neyman (1923)Fisher (1935) Cox (1958) Roy (1951) and Rubin (1978)]The causal model is a very general framework that assumesthat there are potential outcome states associated with having

received some treatment and having received no treatment (orpossibly a different treatment) For each individual only oneof the potential outcome states is observed which leads to amissing-data problem in attempting to draw inferences aboutaspects of the treatment effect distribution

The language of potential outcomes is very general soalmost any economic problem could be formulated in theseterms However this does not mean that estimators proposedin the literature for the causal model are useful for all or evenmost economic problems A major limitation of the modelis that it assumes that the state ldquowith treatmentrdquo has beenobserved for at least a subset of people In medical trials orbiological experiments this assumption is probably reason-able but in economics we often are interested in evaluatingeffects of treatments that have not yet been implemented Forexample we might be interested in predicting the effect ofraising the age receiving Social Security bene ts to a new ageor the effect of introducing new term limits on welfare par-ticipation An implicit assumption needed to apply any of theidenti cation results described in this article is that both Y1

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

26 Journal of Business amp Economic Statistics January 2001

and Y0 are observed which would not be satis ed in manycases Thus the causal model and the estimators developedfor it are inadequate when it comes to any economic questioninvolving a state of the world that has not been observed

Despite these limitations treatment effect estimates aresometimes used to predict consequences of new treatments orof existing treatments for new populations When the policychange being considered is very close to changes observedin the data and is to be applied to similar individuals thenextrapolation may be reasonable But if the policy change is(a) of a magnitude outside the range of experience (b) alonga new dimension or (c) intended for a new population then itis an open question as to how reliable the estimates would beTreatment impact estimates are not intended to capture any-thing invariant to changing conditions The assumptions thatwould be required to justify extrapolations or generalizationsare often quite strong To some extent it is an illusion thatthese estimators allow for general kinds of inference underweak assumptions because the assumptions required to justifytheir application for particular purposes are often not madeexplicit

Angrist argues that the identi cation results for the localaverage treatment effect (LATE) estimator provide a ldquofounda-tion for credible causal inferencerdquo (p 5) and a ldquominimum con-troversy jumping-off point for any prediction exerciserdquo (p 5)But as discussed previously the causal model maintains sucha high level of generality that it offers little guidance for manykinds of problems It is important to recognize when and whennot the LATE estimator is likely to be useful This articleclaims that ldquoin practice estimates of LATE differ little fromestimates based on the stronger assumptions invoked to iden-tify effects on the entire treated populationrdquo (p 5) But suchan extrapolation is not justi ed under the theory and is surelyunlikely to hold in all circumstances As shown in a num-ber of works by various authors LATE provides the averageeffect of treatment for the group of ldquocompliersrdquo who are peo-ple induced by the instrument to receive treatment This groupdoes not except under very special circumstances correspondto the group of treated persons and may in fact representonly a small fraction of persons receiving treatment So notonly does LATE not suf ce for answering problems of the sortdescribed by (a) (b) and (c) it also does not generally pro-vide the average effect of treatment on the treatedmdashthe mostcommon parameter of interest in analyzing social programsAnother not-so-attractive feature of the estimator is that thepopulation of ldquocompliersrdquo is usually not identi able so wecannot say exactly whose treatment impact is being estimatedApplying the LATE parameter estimate to the full populationof treated persons would generally require ruling out certaintypes of heterogeneity in individual responses to treatmentas discussed by Heckman (1997) The explicit assumptionsthat could justify this type of extrapolation were shown byHeckman and Vytlacil (2000 in press)

2 ESTIMATORS SHOULD BE TAILOREDTO THEIR APPLICATION

A general theme throughout the article is that wheneverpossible researchers should restrain from imposing paramet-ric restrictions in estimation because the structure they impose

may be wrong resulting in biased estimates This view isexpressed for example in the discussion of the two-partmodel The article advocates the use of a two-part approachinstead of a conventional selection-model approach maintain-ing that the two-part approach is less likely to lead to biasedparameter estimates because it does not impose cross-equationrestrictions as a conventional tobit estimator would

How do researchers decide whether or not to impose restric-tions across estimating equations that are structurally linkedThe most agnostic approach would be to estimate both equa-tions completely nonparametrically but this approach is rarelypractical Nonparametric estimators are consistent under themost general conditions but in conventional size samples thestandard errors would be so large that the estimates would forpractical purposes be useless Researchers impose structurebecause they are willing to restrict the class of models underwhich their estimators are consistent in exchange for greaterprecision These ef ciency considerations would also apply inthe deciding whether or not to impose cross-equation restric-tions in estimating simultaneous-equations models If we areright about the structure there is an ef ciency gain fromimposing the restrictions If we are wrong imposing them maylead to bias When there are overidentifying restrictions it isat least possible to develop tests of the model speci cation thatcan serve as a guide in choosing a structural model that is sup-ported by the data This article presents a dichotomy betweenthe two-part model and a fully parametric traditional selec-tion speci cation but there are many other modeling choicesranging from fully nonparametric to semiparametric to fullyparametric In modest size samples ef ciency is an impor-tant consideration and a parametric model may be the mostappropriate one For very large samples fewer parametricapproaches are feasible Both the probability-of-working andthe hours-worked equations could for example be estimatedby a semiparametric method such as semiparametric leastsquares ( Ichimura 1993) Which estimation method is mostappropriate ultimately depends on how much data is avail-able and how many parameters are to be estimated It doesnot make sense to criticize the use of structured approaches infavor of more exible modeling approaches without regard tothe context in which they are being applied

3 ON THE GAP BETWEEN THEORY ANDEMPIRICAL PRACTICE

Finally another recurring emphasis in the article is on thevalue of shortcuts in empirical practice In the authorrsquos viewthere is a gap between the econometric theory and what isfeasible in practice and shortcuts and approximations help tobridge this gap To this end the article extols for examplethe virtues of ordinary least squares (OLS) as a way of approx-imating any unknown conditional expectation function with-out much concern as to whether a best linear approximation isa good approximation If the model has discrete regressors andis fully saturated it is well known that OLS is equivalent to anonparametric estimator But in other cases a linear approxi-mation could be highly biased and the bias will not go awayas the sample size gets large There are a variety of alterna-tive asymptotically unbiased estimators Along similar lines

Angrist Reply 27

the article suggests that in implementing a ldquoselection-on-observablesrdquo estimator it may be all right simply to ignore theweighting scheme Such claims seem to me fairly outrageousbecause failure to take into account weighting could lead tovery different estimates and would not provide a consistentestimator of the parameter of interest Heckman Ichimura andTodd (1998) discussed alternative estimators that are not dif -cult to implement and take into account the weighting scheme

Finally the appendix presents a variety of shortcuts forobtaining estimates and standard errorsmdashsome of which leadto inconsistent estimates When shortcuts lead to very crudeapproximations or inconsistent estimates I question their use-fulness I do think it unreasonable to expect that estimatesbased on simple IV estimators be accompanied by correctstandard errors that correctly take into account the in uence ofestimated parameters of estimated regressors I see little valuein reporting incorrect standard-errors estimates on the groundsthat it saves time in computation especially when there areseveral works long available in the literature that develop sim-ple ways of solving these kinds of problems (eg see Newey1984) Although there is surely a gap between the estimatorsdeveloped in the theoretic econometric literature and the esti-mators used in the empirical literature the gap is not as greatas this article supposes

ADDITIONAL REFERENCES

Cox D R (1958) The Planning of Experiments New York WileyFisher R A (1935) Design of Experiments London Oliver and BoydHeckman J J (1997) ldquo Instrumental Variables Evidence From Evaluating a

Job Training Programmerdquo Journal of Human Resources 32 441ndash462Heckman J Ichimura H and Todd P (1997) ldquoMatching as an Econometric

Evaluation Estimator Theory and Evidence on its Performance Appliedto the JTPA Program Part I Theory and Methodsrdquo Review of EconomicStudies 64 605ndash654

mdashmdashmdash (1998) ldquoMatching as an Econometric Evaluationrdquo Review of Eco-nomic Studies 65 261ndash294

Heckman J and Vytlacil E (2000) ldquoCausal Parameters Structural Equa-tions Treatment Effects and Randomized Evaluations of Social Programsrdquounpublished manuscript University of Chicago Dept of Economics

mdashmdashmdash (in press) ldquoLocal Instrumental Variablesrdquo in Nonlinear StatisticalInference Essays in Honor of Takeshi Amemiya eds C Hsiao KMorimune and J Powell Cambridge UK Cambridge University Press

Ichimura H (1993) ldquoSemiparametric Least Squares (SLS) and Weighted SLSEstimation of Single Index Modelsrdquo Journal of Econometrics 58 71ndash120

Neyman J (1923) ldquoStatistical Problems in Agricultural Experimentsrdquo Sup-plement to the Journal of the Royal Statistical Society 2 107ndash180

Newey W (1984) ldquoA Method of Moments Interpretation of Sequential Esti-matorsrdquo Economic Letters 14 201ndash206

Roy A (1951) ldquoSome Thoughts on the Distribution of Earningsrdquo OxfordEconomic Papers 3 135ndash146

Rubin D (1978) ldquoBaysian Inference for Causal Effects The Role of Ran-domizationrdquo The Annals of Statistics 6 34ndash58

ReplyJoshua D Angrist

I thank the discussants and the session organizer outgoingJBES editor Jeff Wooldridge who will be missed by associateeditors and JBES authors alike The comments in this sessionoffer a variety of viewpoints some sympathetic others morecritical I am pleased to be part of this stimulating and con-structive exchange and look forward to more interaction ofthis type in the future

Hahn My article starts by suggesting that a focus ontreatment effects simpli es the problem of causal inferencein limited dependent variable (LDV) models with endoge-nous regressors Endogeneityrsquos half-sister is the assumptionof unobserved individual effects since the omitted-variablesmotivation for xed effects in panel data is similar Hahnasks whether a shift in focus from index parameters to aver-age causal effects might also simplify the notoriously dif -cult problem of working with nonlinear models for binarypanel data (eg as in Card and Sullivan 1988) Hahn showsthat here too the causal-effects framework pays off demon-strating that average causal effects are easily estimated in a xed-effects probit model with a binary regressor even thoughthe probit coef cient is not A second noteworthy featureof Hahnrsquos comment is his observation that identi cation inthis framework turns on careful modeling of the relationshipbetween the assignment variable and the xed effects Tradi-tional econometric models are primarily concerned with thestochastic process generating outcomes A shift of attentiontoward modeling the assignment mechanism is consistent with

the notion that observational studies should try to mimic thesort of controlled comparisons generated by experiments

Imbens Imbens begins by endorsing the view that the sub-stantive goals of empirical research are better served by thepotential outcomescausal framework than by structural mod-eling Going one step further Imbens notes that the problemswith ldquomodel- rst econometricsrdquo are not limited to LDVrsquos Inthe same spirit he sensibly wonders (as did Wooldridge 1992)why so much attention has been devoted to estimating theBoxndashCox transformation model which does not actually iden-tify E6Y mdashX7 without a distributional assumption Imbens alsonicely articulates the rationale for looking at the twins exam-ple and notes that this is similar to the rationale for extrap-olating from clinical trials in narrow subpopulations And infact I believe the ldquotwins experimentrdquo is likely to have pre-dictive value in a variety of situations though any extrapo-lation will probably be improved by control for demographiccharacteristics

In spite of (or perhaps because of) our years of discussionand collaboration Imbens and I still do not agree on certainthings He feels strongly that inference with discrete outcomesshould use models and procedures that respect inherent non-

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

28 Journal of Business amp Economic Statistics January 2001

linearities The HIRZ procedure discussed in his comment isan intriguing way to do this and my article discusses othersI remain unconvinced however that nonlinearity is of majorsubstantive importance in empirical work with binary regres-sors The proof of the pudding I suppose would be to showthat H IRZ or some of the methods I discuss in the article leadto better predictions or decisions than the linear workhorse

Mof tt Mof t provides a valuable counterpointHe beginsby arguing that latent-index models have more predictivepower than reduced-form causal models Such claims seemhard to establish a priori though I do not doubt that in somecases this may be true But empirical strategies in which thebulk of the research effort goes into estimating index coef -cients may fail to answer the most basic question in empiricalwork What happened here Causal models and local aver-age treatment effect (LATE) answer this question while indexcoef cients alone are not suf cient for causal inference Iagree however that structural modeling can be useful forgoing beyond basic questions and I have used index and otherstructural models for this purpose Mof t is right howeverin suggesting that I nd the model-free analysis he refers toas ldquoexploratoryrdquo more interesting than structural extrapolationThis seems to me to be where the potential for real discoverylies

Mof tt asks tough but fair questions about the twins appli-cation He is correct to point out that twinning is not a manip-ulable policy instrument That is one reason I do not limit theanalysis to reduced-form effects of multiple births Yet to mymind the best evidence we have on the controversial questionof the consequences of out-of-wedlock childbearing comesfrom the twins experiment (Bronars and Grogger 1994) Andthe economic consequences of childbearing are of interest inmany other contexts As noted in the introduction to my arti-cle the twins experiment can help determine whether fertilityreductions are responsible for trend increases in female laborsupply Beyond this sort of question the twins results seemuseful for assessing the economic consequences of increasedabortion access in the United States the dissemination of con-traceptive implants in developing countries (eg see KaneFarr and Janowitz 1990) and various programs to encour-age or require implant use by welfare mothers (Haveman andWolfe 1993)

Finally it is fair to question the external validity of twinsestimates as Mof t does toward the end of his commentAngrist and Evans (1998) showed that the labor-supply con-sequences of twinning can be reconciled with estimates ofthe labor-supply consequences of childbearing resulting froma very different natural experiment induced by preferences formixed-sex sibling pairs This suggests that results from bothexperiments may be quite robust and may even be ldquostructuralrdquoin the sense that they tell us something general about the eco-nomic response to childbearing

Mullahy Mullahy brings a welcome health economistrsquosperspective to the discussion Health economists have longgrappled with the issues discussed in my article and asI hope I made clear I drew ideas and inspiration fromMullahyrsquos work on models with nonnegative dependent vari-ables Mullahyrsquos comment further highlights exactly the sort

of utilitarian trade-off that underlies the choice of empiricalstrategy On one hand we would like our models to be con-sistent with any functional form restrictions known to be trueOn the other attempts at this may complicate estimation andinference for little payoff Worst of all the technical complex-ity of nonlinear models seems to cause authors to ldquowax econo-metricrdquo an effort that may come at the expense of attention tosubstantive issues of importance (an example here is the longseries of papers discussing exactly how data from the RANDhealth-insurance experiment should be analyzed) Mullahyrsquosheretical discussions of ldquoone-part modelsrdquo here and in earlierwork represent a refreshing effort to keep the empirical trainon track Mullahyrsquos point that application-specic distributionordinates can provide more useful summary information thanquantiles is also clearly worth thinking about

Todd Todd provides a useful and clear overview of thecausal framework which she suggests is better suited formedical questions than economic questions The thrust of herargument is that there is too much heterogeneity in economicinterventions and outcomes for the causal approach to bevery useful for economists But my perception is that medicalresearch is even more explicitly concerned with heterogene-ity than economics This is evidenced by the many quali ca-tions regarding patient characteristics conditions and inter-actions in medical treatment protocols A difference betweenthe prevailing medical research paradigm and the traditionaleconometric perspective however is whether the question ofheterogeneous response is to be addressed by better theoryor more evidence Of course theory helps us decide whereto look for heterogeneity but as Rosenbaum (1999 p 301)put it ldquo If a treatment has substantially different effects in dif-ferent types of people then we need to discover this (ital-ics mine) Rosenbaum also noted that ldquorandomized clinicaltrials routinely look for heterogeneous treatment effects andthey do this without representative samples from nationalpopulationsrdquo

On the technical side Todd suggests I advocated use ofthe two-part model because it exibly approximates nonlinearfunctional forms for nonnegative outcomes Although I notedthe exibility the upshot of my discussion was that the two-part model is poorly suited to causal inference because of thedif culty of interpreting part 2 Finally Todd points out thefailure to allow for sampling variance in the scaling factorsused to generate effects for nonlinear models This is indeed ahard-to-defend shortcut though I shall take a stab at defenseby noting that randomness of the scaling factor which typi-cally estimates a sample average is likely to be a minor sourceof sampling uncertainty

ADDITIONAL REFERENCES

Card D E and Sullivan D (1988) ldquoEffects of Subsidized Training onMovements In and Out of Employmentrdquo Econometrica 56 497ndash530

Haveman R and Wolfe B (1993) ldquoChildrenrsquos Prospects and ChildrenrsquosPolicyrdquo Journal of Economic Perspectives 7 153ndash174

Kane T Farr G and Janowitz B (1990) ldquo Initial Acceptability of Con-traceptive Implants in Four Developing Countriesrdquo International FamilyPlanning Perspectives 16 49ndash54

Rosenbaum P R (1999) ldquoChoice as an Alternative to Control in Observa-tional Studiesrdquo Statistical Science 14 259ndash304

Wooldridge J (1992) ldquoSome Alternatives to the BoxndashCox RegressionModelrdquo International Economic Review 33 935ndash955

Page 13: New Estimation of Limited Dependent Variable Models With Dummy …piketty.pse.ens.fr/…ecoineg/articl/Angrist2001b.pdf · 2011. 12. 5. · sequences of changes in contraceptive technology

14 Journal of Business amp Economic Statistics January 2001

the distribution of hours worked without the conceptual prob-lems inherent in conditional-on-positive comparisons

ACKNOWLEDGMENTS

I thank the editor and the session discussants for theircomments Special thanks go to Melissa Schettini for out-standing research assistance and Alberto Abadie for exten-sive comments and shared computer programs Thanks also goto Daron Acemoglu Bill Evans Bill Greene Guido ImbensJohn Mullahy Steve Pischke and seminar participants at theUniversity of Texas for helpful discussions and comments onan earlier draft Data and computer programs used here areavailable on request

APPENDIX

A1 Derivation of Equation (12)

Drop the i subscripts Note that

Y0 D 14Y uuml0 gt 05Y uuml

0

Y1D 14Y uuml

0C gt 054Y uuml

0C 5 D 14Y uuml

1 gt 05Y uuml1 0

Since D is independent of Y0 and Y1D 14Y uuml

0C gt 054Y uuml

0C

51D is independent of Y1 Conditional effects on the treatedare therefore the same as conditional effects without condi-tioning on treatment status

E6Y1ƒ Y0

mdashY1 gt 01 D D 17 D E6Y1ƒ Y0

mdashY1 gt 07

E6Y1ƒ Y0

mdashY0 gt 01 D D 17 D E6Y1ƒ Y0

mdashY0 gt 070

The averages on the right side are the causal effects of interestThe conditional expectations on the right side are evaluated asfollows

E6Y1mdashY1 gt 07 D E6Y uuml

1mdashY uuml

1 gt 07 D E6Y uuml0mdashY uuml

1 gt 07C 1 (A1)

E6Y0mdashY1 gt 07 D E6Y uuml0 14Y uuml

0 gt 05mdashY uuml1 gt 07

D E6Y uuml0

mdashY uuml1 gt 01 Y uuml

0 gt 07P4Y uuml0 gt 0mdashY uuml

1 gt 050

If 4 gt 05 2 Y uuml0 gt 0 ) Y uuml

1 gt 01 this is E6Y uuml0

mdashY uuml0 gt 07

P4Y uuml0 gt 0mdashY uuml

1 gt 050 If 4 lt 05 2 Y uuml1 gt 0 ) Y uuml

0 gt 01 so this is

E6Y uuml0

mdashY uuml1 gt 070 (A2)

So lt 0 ) E6Y1 ƒ Y0mdashY1 gt 07 D Similarly

E6Y1mdashY0 gt 07 D E614Y uuml

0C gt 054Y uuml

0C 5mdashY uuml

0 gt 07

D E6Y uuml0

mdashY uuml1 gt 01 Y uuml

0 gt 07P4Y uuml1 gt 0mdashY uuml

0 gt 05

C P Y uuml1 gt 0mdashY uuml

0 gt 0 0

If 4 gt 051 Y uuml0 gt 0 ) Y uuml

1 gt 01 so

P4Y uuml1 gt 0mdashY uuml

0 gt 05 D 1 (A3)

and E6Y uuml0

mdashY uuml1 gt 01 Y uuml

0 gt 07 D E6Y uuml0mdashY uuml

0 gt 070 Finally

E6Y0mdashY0 gt 07 D E6Y uuml0mdashY uuml

0 gt 07 (A4)

so gt 0 ) E6Y1ƒ Y0

mdashY0 gt 07 D

A2 Derivation of Equation (21)

Drop the i subscripts Note that eƒD D 641 ƒ D5 C Deƒ7Let sbquo uuml D eƒsbquo and uuml D eƒ Z is binary and there are nocovariates so we can now write (20) as

sbquo uuml E641ƒ D5Y mdashZ D 17C sbquo uuml uuml E6DY mdashZ D 17 ƒ 1 D 0 (A5)

and

sbquo uuml E641ƒ D5Y mdashZ D 07C sbquo uuml uuml E6DY mdashZ D 07 ƒ 1 D 00 (A6)

Divided (A5) by (A6) to get rid of sbquo uuml then solve for uuml Subtract 1 and rearrange to get Equation (21)

A3 Average Treatment Effects and Standard Errors forNonlinear Models

Average treatment effects were calculated with the aid ofderivative approximations so that all reported effects have theform ldquocoef cient times scaling factorrdquo

Probit Note that ecirc6X 0i sbquo C 7 ƒ ecirc6X 0

isbquo7 6X 0isbquo C Di7 š

1 so the average effect on the treated can be approximatedas 841=N15

Pi Di6X 0

isbquo C Di79 š 1 where N1 D Pi Di This

turns out to be accurate to three decimal places for the esti-mates in Table 1 Standard errors were calculated treating thescaling factor as nonrandom This follows the convention forreporting marginal effects in programs like Stata in practiceany correction for estimation of the scaling factor is likely tobe minor A similar approach was used for ordered probit

Tobit Tobit average treatment effects were approximatedusing a derivative formula that can be found for exam-ple in the work of Greene (1999) E6Yi

mdashXi1DiD 17 ƒ

E6YimdashXi1 Di

D 07 iexclE6YimdashXi1Di7=iexclD D ecirc6X 0

isbquo C Di7 š 0

Average effects on the treated can therefore be approximatedusing 841=N15

Pi Diecirc6X 0

isbquo C Di79 š 0 Standard errors wereagain calculated treating the scaling factor as nonrandom

2PM Let the part-1 coef cient be 1 and the part-2 coef -cient be 2 Since the model multiplies parts and both parts arelinear the average effect is approximated using derivatives as1E6Yi

mdashDiD 11 Yi gt 07 C 2P6Di

D 1mdashY gt 070 Standard errorswere calculated treating the scaling factors E6Y mdashDi

D 11 Yi gt 07

and P6DiD 1mdashY gt 07 as nonrandom and using the fact that

the estimates of 1 and 2 are uncorrelatedMullahy Note that E6Yi

mdashXi1Di1mdash uumli 7 D mdash uuml

i exp6X 0i sbquoC Di7

where this CEF has a causal interpretation Again usingderivatives we have mdashuuml

i exp6X 0isbquo C 7 ƒ mdashuuml

i exp6X 0isbquo7

mdash uumli exp6X 0

i sbquo C Di7 š The model is such that E6mdash uumlimdashXi7

equals 1 but E6mdash uumlimdashXi1Di7 is unrestricted I ignore this prob-

lem and approximate the average effect on the treated as841=N15

Pi Di exp6X 0

i sbquo C Di79 š Standard errors were cal-culated treating the scaling factor as nonrandom

Bivariate Probit This is the same as probit using param-eters from the latent index equation for outcomes

Endogenous Tobit This is the same as tobit but usingcoef cients and predicted probability positive from the modelwith the compound Mills-ratio term included

Mills Ratio Standard errors were calculated treating thecompound Mills-ratio term as known

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 15

Causal-IV (nonlinear) Average effects were calculated asdescribed previously for the probit and Mullahy (exponential)functional form The rst-stage estimates of E6Zi

mdashXi7 neededto construct Ši were estimated using probit Standard errorsfor were calculated using the same bootstrap proceduredescribed for QTE Abadie (2000b) also gave analytic formu-las that take account of the rst-step estimation of E6Zi

mdashXi7An assumption implicit in this scheme for reporting Causal-IV results is that it makes sense to convert conditional-on-Xeffects for compliers into overall average effects on the treated

A4 Computation of Quantile Treatment Effectsand Standard Errors

QTErsquos were computed by plugging rst-step estimatesof QŠi

D E6ŠimdashXi1Di1 Yi7 into a weighted quantile regression

calculation performed by Stata Nonnegative estimates ofE6Ši

mdashXi1 Di1 Yi7 were constructed by separately estimatingE6Zi

mdashXi7 and E6ZimdashYi1Di1Xi7 using probit and then trimming

In principle standard errors should take account of this rst-step estimation An additional complication is that the analyticstandard errors for QTE involve a conditional error density Inthis case I sidestepped messy analytic calculations by usinga bootstrap procedure that repeats both the rst-stage estima-tion of QŠi and the second-step estimation of the parameters ofinterest in 100 replicate samples of 2500 observations eachThe 100 replicate samples were sampled without replacementusing the Stata command bsample The reported standarderrors were calculated as 4N =N uuml 51=2bseN where bseN is thestandard deviation of the 100 replicate estimates N D 21500and N uuml is the full sample size

[Received October 1999 Revised July 2000]

REFERENCES

Abadie A (2000a) ldquoBootstrap Tests for the Effect of a Treatment on the Dis-tributions of an Outcome Variablerdquo Technical Working Paper 261 NationalBureau of Economic Research Cambridge MA

mdashmdashmdash (2000b) ldquoSemiparametric Estimation of Instrumental Variable Mod-els for Causal Effectsrdquo Technical Working Paper 260 National Bureau ofEconomic Research Cambridge MA

Abadie A Angrist J and Imbens G (1998) ldquo Instrumental VariablesEstimation of Quantile Treatment Effectsrdquo Working Paper 229 NationalBureau of Economic Research Cambridge MA

Amemiya T (1978) ldquoThe Estimation of a Simultaneous Equation General-ized Probit Modelrdquo Econometrica 46 1193ndash1205

mdashmdashmdash (1979) ldquoThe Estimation of a Simultaneous Equation Tobit ModelrdquoInternational Economic Review 20 169ndash181

Angrist J (1998) ldquoEstimating the Labor Market Impact of Voluntary MilitaryService Using Social Security Data on Military Applicationsrdquo Economet-rica 66 249ndash288

Angrist J and Evans W (1998) ldquoChildren and Their Parentsrdquo Labor SupplyEvidence From Exogenous Variation in Family Sizerdquo American EconomicReview 88 450ndash477

Angrist J and Imbens G (1991) ldquoSources of Identifying Information inEvaluation Modelsrdquo Technical Working Paper 117 National Bureau ofEconomic Research Cambridge MA

Angrist J Imbens G and Rubin B (1996) ldquo Identi cation of CausalEffects Using Instrumental Variablesrdquo Journal of the American StatisticalAssociation 91 444ndash455

Angrist J and Krueger A (1999) ldquoEmpirical Strategies in Labor Eco-nomicsrdquo in The Handbook of Labor Economics Volume IIIA edsO Ashenfelter and D Card Amsterdam North-Holland pp 1277ndash1366

Blundell R W and Smith R J (1989) ldquoEstimation in a Class of Simulta-neous Equation Limited Dependent Variable Modelsrdquo Review of EconomicStudies 56 37ndash58

Blundell R W and Powell J L (1999) ldquoEndogeneity in Single Index Mod-elsrdquo mimeo University College London Dept of Economics

Bronars S and Grogger J (1994) ldquoThe Economic Consequences of UnwedMotherhood Using Twin Births as a Natural Experimentrdquo American Eco-nomic Review 84 1141ndash1156

Chamberlain G (1986) ldquoAsymptotic Ef ciency in Semi-Parametric ModelsWith Censoringrdquo Journal of Econometrics 32 189ndash218

Cragg J G (1971) ldquoSome Statistical Models for Limited Dependent Vari-ables With Application to the Demand for Durable Goodsrdquo Econometrica39 829ndash844

Duan N Manning W G Jr Morris C N and Newhouse J P(1984) ldquoChoosing Between the Sample-Selection Model and the Multi-partModelrdquo Journal of Business amp Economic Statistics 2 283ndash289

Eichner M J McClellan M B and Wise D A (1997) ldquoHealth Expendi-ture Persistence and Feasibility of Medical Savings Accountsrdquo in Tax Pol-icy and the Economy 11 ed J Poterba Cambridge MA National Bureauof Economic Research pp 91ndash128

Evans W Farrelly M C and Montgomery E (1999) ldquoDo Work-place Smoking Bans Reduce Smokingrdquo American Economic Review 89728ndash747

Gangadharan J and Rosenbloom J L (1996) ldquoThe Effects of Child-bearingon Married Womenrsquos Labor Supply and Earnings Using Twin Births as aNatural Experimentrdquo Working Paper 5647 National Bureau of EconomicResearch Cambridge MA

Goldberger A S (1991) A Course in Econometrics Cambridge MAHarvard University Press

Greene W (1999) ldquoMarginal Effects in the Censored Regression ModelrdquoEconomics Letters 64 43ndash49

Gronau R (1974) ldquoWage ComparisonsmdashA Selectivity Biasrdquo Journal ofPolitical Economy 82 1119ndash1143

Hay J W Leu R and Rohrer P (1987) ldquoOrdinary Least Squares andSample-Selection Models of Health-care Demandrdquo Journal of Business ampEconomic Statistics 5 499ndash506

Hay J W and Olsen R J (1984) ldquoLet Them Eat Cake A Note on Com-paring Alternative Models of the Demand for Medical Carerdquo Journal ofBusiness amp Economic Statistics 2 279ndash282

Heckman J J (1974) ldquoShadow Prices Market Wages and Labor SupplyrdquoEconometrica 42 679ndash694

mdashmdashmdash (1978) ldquoDummy Endogenous Variables in a Simultaneous EquationsSystemrdquo Econometrica 46 931ndash960

mdashmdashmdash (1990) ldquoVarieties of Selection Biasrdquo American Economic Review 80313ndash318

Heckman J J and Robb R Jr (1985) ldquoAlternative Methods for Evaluatingthe Impact of Interventionsrdquo in Longitudinal Analysis of Labor MarketData (Econometric Society Monographs Series No 10) eds J Heckmanand B Singer Cambridge UK Cambridge University Press pp 156ndash246

Imbens G and Angrist J D (1994) ldquo Identi cation and Estimation of LocalAverage Treatment Effectsrdquo Econometrica 62 467ndash475

Imbens G and Rubin D B (1997) ldquoEstimating Outcome Distributions forCompliers in Instrumental Variables Modelsrdquo Review of Economic Studies64 555ndash574

Kane T J Rouse C E and Staiger D (1999) ldquoEstimating Returnsto Schooling When Schooling Is Misreportedrdquo Working Paper W7235National Bureau of Economic Research Cambridge MA

Kelejian H H (1971) ldquoTwo-Stage Least Squares and Econometric SystemsLinear in Parameters but Nonlinear in the Endogenous Variablesrdquo Journalof the American Statistical Association 66 373ndash374

Koenker R and Bassett G (1978) ldquoRegression Quantilesrdquo Econometrica46 33ndash50

Killingsworth M (1983) Labor Supply Cambridge UK Cambridge Uni-versity Press

Keane M P and Wolpin K (1997) ldquo Introduction to the JBES Special Issueon Structural Estimation in Applied Microeconomicsrdquo Journal of Businessamp Economic Statistics 15 111ndash114

Lee M J (1995) ldquoSemiparametric Estimation of Simultaneous EquationsWith Limited Dependent Variables A Case Study of Female Labor Sup-plyrdquo Journal of Applied Econometrics 10 187ndash200

mdashmdashmdash (1996) ldquoNonparametric Two-Stage Estimation of Simultaneous Equa-tions With Limited Endogenous Regressorsrdquo Econometric Theory 12305ndash330

Leung S F and Yu S (1996) ldquoOn the Choice Between Sample Selectionand Two-Part Modelsrdquo Journal of Econometrics 72 197ndash229

Lin T-F and Schmidt P (1984) ldquoA Test of the Tobit Speci cation Againstan Alternative Suggested by Craggrdquo Review of Economics and Statistics66 174ndash177

Maddala G S (1985) ldquoA Survey of the Literature on Selectivity Bias AsIt Pertains to Health Care Marketsrdquo Advances in Health Economics andHealth Service Research 6 3ndash18

16 Journal of Business amp Economic Statistics January 2001

REFERENCES

Manning W G Duan N and Rogers W H (1987) ldquoMonte Carlo Evidenceon the Choice Between Sample Selection and Two-part Modelsrdquo Journalof Econometrics 35 59ndash82

Manski C F (1975) ldquoMaximum Score Estimation of the Stochastic UtilityModelrdquo Journal of Econometrics 3 205ndash228

mdashmdashmdash (1996) ldquoLearning About Treatment Effects From Experiments WithRandom Assignment of Treatmentsrdquo Journal of Human Resources 31709ndash733

McDonald J F and Mof tt R A (1980) ldquoThe Uses of Tobit AnalysisrdquoThe Review of Economics and Statistics 62 318ndash321

Mof t R A (1999) ldquoNew Developments in Econometric Methods forLabor Market Analysisrdquo in The Handbook of Labor Economics Vol-ume IIIA eds O Ashenfelter and D Card Amsterdam North-Hollandpp 1367ndash1398

Mullahy J (1997) ldquo Instrumental-Variable Estimation of Count Data ModelsApplications to Models of Cigarette Smoking Behaviourrdquo Review of Eco-nomics and Statistics 11 586ndash593

mdashmdashmdash (1998) ldquoMuch Ado About Two Reconsidering Retransformation andthe Two-Part Model in Health Econometricsrdquo Journal of Health Eco-nomics 17 247ndash281

Newey W K (1985) ldquoSemiparametric Estimation of Limited DependentVariable Models With Endogenous Explanatory Variablesrdquo Annales deLrsquo Insee 5960 219ndash237

mdashmdashmdash (1986) ldquoLinear Instrumental Variable Estimation of Limited Depen-dent Variable Models With Endogenous Explanatory Variablesrdquo Journal ofEconometrics 32 127ndash141

mdashmdashmdash (1987) ldquoEf cient Estimation of Limited Dependent Variable ModelsWith Endogenous Explanatory Variablesrdquo Journal of Econometrics 36231ndash250

Powell J L (1986a) ldquoSymmetrically Trimmed Least Squares Estimation forTobit Modelsrdquo Econometrica 54 1435ndash1460

mdashmdashmdash (1986b) ldquoCensored Regression Quantilesrdquo Journal of Econometrics32 143ndash155

Rosenzweig M R and Wolpin K (1980) ldquoLife-Cycle Labor Supply andFertility Causal Inferences From Household Modelsrdquo Journal of PoliticalEconomy 88 328ndash348

Rubin D B (1974) ldquoEstimating Causal Effects of Treatments in Random-ized and Non-randomized Studiesrdquo Journal of Educational Psychology 66688ndash701

mdashmdashmdash (1977) ldquoAssignment to a Treatment Group on the Basis of a Covari-aterdquo Journal of Educational Statistics 2 1ndash26

Wooldridge J (1999) ldquoDistribution-Free Estimation of Some Nonlinear PanelData Modelsrdquo Journal of Econometrics 90 77ndash97

Comment Binary Regressors in NonlinearPanel-Data Models With Fixed EffectsJinyong Hahn

Department of Economics Brown University Providence RI 02912 ( jinyong hahnbrownedu)

Angrist notes in his abstract that ldquomuch of the dif cultywith limited dependent variables comes from a focus on struc-tural parameters such as index coef cients instead of causaleffects Once the object of estimation is taken to be the causaleffect of treatment several simple strategies are availablerdquo(p 2) I examine the consequence of such a perspective forinference in nonlinear panel-data models with xed effectsI argue that (a) the ldquodif cultyrdquo indeed disappears sometimesand (b) structure of treatment assignment plays a crucial rolefor his strategy to be successful

It is instructive to begin with a dif cult nonlinear panel-datamodel with xed effects Consider a panel probit model with xed effects

Pr6yitD 1mdashci1 xi11 xi27 D ecirc4ci

C ˆ xit51

i D 11 1 n3 t D 1120 (1)

It is assumed that yi1 and yi2 are independent of each othergiven 4ci1 xi11 xi25 Here ci denotes the unobserved xedeffects and xit denotes a binary treatment variable We willassume that 4xi11 xi25 D 40115 for all i The index coef cientof interest is ˆ Therefore traditional econometric analysiswould focus on whether ˆ is identi ed and or

pn-consistently

estimableEstimation of the index coef cient ˆ is dif cult for a num-

ber of reasons

1 So far no consistent estimator for ˆ has been developedfor probit with a nonparametric speci cation of the conditionaldistribution of ci given 4xi11 xi25

2 It is not even clear whether ˆ is identi ed or notManskirsquos (1987) identi cation result requires in nite supportfor xit which cannot be satis ed due to the binary nature ofxit

3 It is not clear whether the semiparametric informationbound for ˆ is positive or not It is quite possible that theinformation is actually 0 For example Chamberlain (1992)showed that the information for ˆ is 0 for models with timedummies

4 Chamberlainrsquos (1984) conditional maximum likelihoodestimator is applicable only to logit models

Now to assess whether changing the object of estimationsimpli es statistical analysis consider the average treatmenteffects which in this particular case can be easily shown tobe equal to

sbquo D E6yi2ƒ yi170 (2)

It is not dif cult to see that a simple estimator Osbquo D1=n

PniD14yi2

ƒ yi15 isp

n-consistent As Angrist argues afocus on average causal effects dramatically reduces the dif -culty of estimation The secret is that 4xi11 xi25 D 40115 for alli which effectively ensures that ci is independent of 4xi11 xi25Dif culty of the index estimation listed previously is because

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

Imbens Comment 17

the independence between 4xi11 xi25 and ci cannot be exploitedwithin the xed-effects framework Unfortunately the addi-tional information that 4xi11 xi25 and ci are independent of eachother does not reduce dif culty in estimating the index ˆ It isnot yet clear whether ˆ can be

pn-consistently estimated even

with the random-effects assumption when the random effectsare nonparametrically speci ed On the other hand estimatingsbquo becomes easier as a result of the constancy of the general-ized propensity score Pr 6xi11 xi2mdashci7 Presence of unobservedci in the model was rendered irrelevant due to the constancyof the generalized propensity score [See Imbens (1999) for adiscussion on the generalized propensity score]

It is interesting to note that in the panel probit model (1)estimation of sbquo is not necessarily simple unless the indexstructure is discarded altogether It is useful to note that thenew target parameter sbquo could be estimated consistently usingindex structure if consistent estimators of ˆ and not are givenHere not denotes the distribution of ci We may alternativelywrite (2) as

Z4ecirc4c C ˆ5 ƒ ecirc4c55 dnot4c51 (3)

which can in principle be estimated by using consistent esti-mators of ˆ and not Estimation of sbquo using the alternativecharacterization (3) requires consistent estimation of an addi-tional parameter not a parameter that was not given too muchattention in the past The problem is that not many consis-tent estimators of not are available It is not yet clear whether

the model satis es the primitive conditions for consistency ofthe nonparametric maximum likelihood estimator (NPMLE)as discussed by Heckman and Singer (1984) The dif culty inestimating the target parameter using the expression (3) whichis based on the index structure is in sharp contrast to the easeof the estimation strategy using the expression (2) for whichthe index structure is irrelevant

The preceding discussion suggests that the success ofAngristrsquos perspective critically hinges on the structure of treat-ment assignment and careful reexpression of the new targetparameter If the joint distribution of ci and 4xi11 xi25 is com-pletely unknown it is clear that changing the target parameterdoes not ease the dif culty of estimation Angristrsquos perspectivetherefore requires substantial effort in modeling such jointdistribution Whether such a modeling effort will be successfulin dealing with nonlinear panel problems remains to be seen

ADDITIONAL REFERENCES

Chamberlain G (1984) ldquoPanel Datardquo in Handbook of Econometricseds Z Griliches and M D Intriligator Amsterdam North-Hollandpp 1247ndash1318

mdashmdashmdash (1992) ldquoBinary Response Models for Panel Data Identi cation andInformationrdquo unpublished manuscript

Heckman J and Singer B (1984) ldquoA Method for Minimizing the Impactof Distributional Assumptions in Econometric Models for Duration DatardquoEconometrica 52 271ndash320

Imbens G (1999) ldquoThe Role of Propensity Score in Estimating Dose-Response Functionsrdquo Technical Working Paper 237 National Bureau ofEconomic Research Cambridge MA

Manski C (1987) ldquoSemiparametric Analysis of Random Effects Linear Mod-els from Binary Panel Datardquo Econometrica 55 357ndash362

CommentGuido W Imbens

Department of Economics University of California at Los Angeles Los Angeles CA 90095(imbenseconuclaedu)

It is a pleasure to comment on this article by JoshuaAngrist whose applications of instrumental-variables methods(Angrist 1989 Angrist and Krueger 1991) have been a sourceof inspiration for my own work in this area As with Angristrsquosprevious work on instrumental variables the current articleraises some controversial issues and makes a number of impor-tant points Here I offer some comments on three of themFirst I shall discuss the issues raised in Section 1 ldquoCausalEffects and Structural Parametersrdquo concerning the goals ofstatistical inference Angrist argues that many questions ofinterest are most easily formulated in terms of comparisonsbetween realized and potential outcomes the latter de nedas outcomes that would have been observed under alternativestates of nature I shall explore some of the implications of thisview for empirical practice and econometric theory SecondI shall offer some remarks on the role of economic theory inspeci cation and identi cation of econometric models againreinforcing Angristrsquos point regarding the importance of for-mulating the key assumptions in terms of potential outcomesThird I shall discuss some of the issues related to the limited

dependent nature of outcome variables for empirical practicein particular in the presence of covariates Partly motivatedby the widespread perception of fundamental dif culties inapplying instrumental-variables methods to data with limiteddependent outcome variables Angrist argues that standard lin-ear model techniques are generally applicable I agree withAngristrsquos position that most of these perceived problems areexaggerated but suggest that principled inference should nev-ertheless take account of the limited dependent nature of theoutcome variables and use nonlinear models

1 CAUSAL ESTIMANDS

In his textbook discussion of the difference betweenstructural and reduced-form estimates Goldberger (1997)wrote following Marshak (1953) that the ultimate goal of

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

18 Journal of Business amp Economic Statistics January 2001

econometrics is to provide predictions More speci cally inmy view the goal is to provide predictions of policy interven-tions Using both economic theory and data economists wishto inform policy discussions by providing predictions of statesof the world under different policy choices Based on com-parisons of such predictions policy makers can then chooseamong the different policies using some social welfare mea-sure as objective function (eg Heckman and Smith 1997)Angrist argues that such questions are most easily formulatedin terms of potential outcomes Here I want to elaborate onthat view

Consider as an example the problem faced by a policymaker contemplating a new tax in a market To evaluatethis policy the policy maker wishes to take into account theeffect of the tax on the quantity traded Economic theory sug-gests that this effect depends on the slope of the supply anddemand functions The rst step is therefore the estimation ofthese slopes and in the remainder of this discussion I shallfocus on this component of the policy-evaluation problem Inprinciple the policy maker may be interested in the entire dis-tribution of the quantity traded under various taxes Let usassume however that for purposes of evaluation of the poli-cies it is suf cient to know the average effect of the policyon the quantity traded If there are only two values for thepolicymdashfor example no tax or a taxmdashthe difference betweenthese two averages is the key quantity of interest FollowingRubin (1974) I will refer to this as the estimand

Note that the choice of estimand is distinct from the sta-tistical question of the speci cation of the model Often thestatistical model is speci ed in such a way that a single param-eter corresponds to the estimand For example in a structuralinterpretation of the linear regression model the coef cientscorrespond to the effect of changing the covariates by a singleunit Such one-to-one correspondence however is the excep-tion rather than the rule Wooldridge (1992) made this point inthe context of BoxndashCox regression models Such models areoften used when a linear representation for E6Y mdashX7 is inappro-priate The BoxndashCox regression model generalizes this linearform to E6Y 4lsaquo5mdashX7 D X 0sbquo where

Y 4lsaquo5 D(

4Y lsaquo ƒ 15=lsaquo lsaquo 6D 01

ln Y lsaquo D 00

Although consistent estimators for sbquo exist under these assump-tions Wooldridge stressed that because (a) the interpretationof sbquo changes with the value of lsaquo and (b) knowledge of sbquo andlsaquo is not suf cient for recovering E6Y mdashX7 there is no reasonfor economists to be interested in estimates of sbquo under theseassumptions In other words sbquo cannot be the sole focus ofthe researcher because the question it answers changes withthe value of nuisance parameters Wooldridge then suggestedan alternative speci cation that always allows the researcherto recover the conditional expectation E6Y mdashX7

In empirical work this distinction between the estimandand the parameters of the statistical model is consistent withthe now common practice of reporting estimates of averagederivatives in binary response models rather than reportingestimates of the logit or probit coef cients Unlike a linear

regression model there is no direct link from one of the coef- cients in the logit or probit model to average causal effectsand thus there is no intrinsic interest in such coef cients

This view is at odds however with a large part of the semi-parametric literature An exception is the work by Stoker (egStoker 1986) who focused on estimation of index coef cientsin settings where these are proportional to average derivativesand thus directly linked to changes in predictions Considerfor example the work on semiparametric estimation of binaryresponse models In this literature such models are esti-mated without making logistic or probit assumptions insteadonly making conditional mean or median assumptions in alatent index interpretation (eg Manski 1985) This literaturehowever has begged the question of why economists shouldbe interested in the coef cient estimates in these models inthe absence of a direct link between these coef cients andthe choice probabilities or their derivatives Similarly someof the models with xed effects in panel data with limiteddependent variables have focused on estimation of parametersthat in themselves do not allow for estimation of conditionalexpectations or their derivatives and thus do not allow for esti-mation of causal effects See Arellano and Honoreacute (in press)for a survey of many of these methods

2 IDENTIFICATION

After deciding on the estimand the next step is to makesubstantive assumptions on the process that generated the dataThis is where economic as opposed to statistical theory playsa key role Theoretical considerations may suggest that certainvariables have no direct causal effect on others because theydo not enter into agentsrsquo utility function nor do they affectthe constraints these agents face For example in some mar-kets it may be reasonable to postulate the existence of demandand supply function and assume that their intersection deter-mines observed prices and quantities In that case it may beargued that certain variablesmdashfor example weather conditionsin agricultural marketsmdashaffect supply at xed prices but notdemand because weather conditions do not affect utility ofthe buyers nor do they constrain their choices given pricesSimilarly theoretical considerations may suggest which vari-ables determine agentsrsquo fertility choices and which variablesare excluded from such choices as in the structural modelsdescribed in Section 12 of Angrist

For the purpose of considering such exclusion restrictionsas well as other assumptions it is important to formulate themin a way that economic theory can be brought to bear on themThis makes the formulation in terms of counterfactuals orpotential outcomes that Angrist advocates particularly appro-priate The potential outcomes describe outcomes in differ-ent environments and as such are the primitives of economicanalyses as well as choices under different sets of constraintswhich are the result of agents solving constrained optimizationproblems Since economic theory studies such optimizationproblems it is therefore well equipped to assess assumptionsformulated directly in terms of these potential outcomes Anexample of the formulation of the critical assumptions in termsof such potential outcomes is Angrist Imbens and Rubin(1996 AIR from here on) In contrast latent index modelsalthough under some conditions mathematically equivalent to

Imbens Comment 19

the potential outcome framework (eg Vytlacil 1999) formu-late the critical assumptions in terms of associations betweenobserved variables and unobserved residuals which appearsmore dif cult to contemplate [see Imbens (1997) for a dis-cussion of the confusion such formulations have caused in thestatistics literature]

It is rare that economic theory is speci c enough to deter-mine the exact value of the estimand More typical is thatthe theory is consistent with a range of values for the esti-mand Observations on agentsrsquo choices and outcomes may behelpful in narrowing down this range The econometricianrsquostask is to link the data to the estimand Typically a num-ber of additional assumptions are made at this stage Almostalways it is assumed that there is only limited dependenceor no dependence at all between choices made by differentagents and identi cation focuses on the link between the jointdistribution of the observables estimable in large samplesand the estimand Two possibilities arise at this stage Some-times the estimand can be expressed as a functional of thejoint distribution of the observables in which case the esti-mand is identi ed A leading example is where the estimandis the average treatment effect and theory suggests that assign-ment to treatment is random or at least random conditionalon a set of observed covariates (unconfounded assignmentselection on observables) Alternatively the assumptions sug-gested by economic theory do not allow for the direct linkbetween the distribution of observables and the estimand Inthat case the researcher faces some choices One option advo-cated in a series of papers by Manski (see for a general dis-cussion Manski 1995) is to estimate the range of values ofthe estimand consistent with the data given the substantiveassumptions Another option followed in the current article byAngrist is the local average-treatment-effect approach devel-oped by Imbens and Angrist (1994) to consider what aspectsof the estimand are identi ed given data and assumptions Ininstrumental-variables settings the population average treat-ment effect is often not identi ed but the average effect for aspeci c subpopulation may be In that case one may chooseto estimate the average treatment effect for this subpopula-tion and leave the extrapolation to the principal estimand tothe researcher possibly aided by theoretical considerations AsHeckman wrote ldquo It is a great virtue of the LATE parameterthat it makes the investigator stick to the data at hand and sep-arate out the aspects of an estimation that require out of sam-ple extrapolation or theorizing from aspects of an estimationthat are based on observable datardquo (Heckman 1999 p 832)

Let us consider the case studied by Angrist with its focuson the effect of having more than two children on labor sup-ply Angrist argues that the second birth being a multiple birth(eg twins) is a valid instrument for this effect In terms ofthe AIR formulation this requires a multiple birth to be asgood as randomly assigned and the absence of a systematicdirect effect on labor supply other than through its effect onthe number of children Such assumptions may be contro-versial For example fertility treatments may lead to a sys-tematic association between multiple births and choices madeby couples violating the rst assumption Even if we acceptthese assumptions however they only imply that the aver-age causal effect of more kids on labor supply is identi ed

for women who had a third child solely because their sec-ond birth was a multiple birth (compliers in the AIR termi-nology) In my view it is unlikely that this is the populationof primary interest Nevertheless it is the only subpopulationthe data are informative about in the sense of point identi ca-tion under the substantive assumptions and it would appear tooffer some guidance regarding the population average causaleffect to policy makers similar to the way in the medical worldresults from clinical trials in homogenous subpopulations areregarded as useful because they are viewed as indicative ofpopulation average causal effects

3 LIMITED DEPENDENT VARIABLES

Typically economic theory offers some guidance concern-ing the determinants of certain outcomes without specifyingthe exact form or strength of their relationship In that casestatistical modeling is required to complete the speci cationConsider the example Angrist studies with binary outcomebinary endogenous regressor a binary instrument and covari-ates Angrist suggests as one possible approach estimating theaverage treatment effect through a linear probability modelwith instrumenting for an endogenous regressor The bene tsof the linear probability approach stemming from the linear-ity and robustness against misspeci cation of the rst stageappear to me largely illusory At this point the statistical mod-eling is only intended to provide exible approximations to theunderlying conditional distributions This is a fundamentallydifferent role from that played by the substantive assumptionsthat are essential for identi cation Appeals to consistencyunder speci c parameterizations therefore appear irrelevantmdashin a larger sample one may well wish to use a more exiblespeci cation because less smoothing is required In additionto nding the alleged bene ts of the linear probability modelunpersuasive I nd its disadvantages troubling Within smallsubpopulations characterized by extreme values of the covari-ates the smoothing implicit in linear probability models islikely to lead to unattractive predictions compared to pre-dictions based on nonlinear models that respect the limited-dependent-variable nature of the outcomes

An alternative approach is followed in the study of the effectof u shots on hospitalization rates using randomized incen-tives for vaccination by Hirano Imbens Rubin and Zhou(2000 HIRZ from here on) Given their assumptions exten-sions of those made by AIR to the case with exogenous covari-ates there are three subpopulationsmdashcompliers (units whochange treatment status in response to a change in the valueof the instrument) always-takers (who always take the treat-ment irrespective of the value of the instrument) and never-takers (who never take the treatment irrespective of the valueof the instrument) H IRZ modeled the conditional distributionof these three ldquotypesrdquo conditional on covariates as a trinomialdistribution

Pr4TypeiD cmdashXi

D x5 D exp4x0ndashc5

1C exp4x0ndashc5C exp4x0ndasha51

Pr4TypeiD amdashXi

D x5 D exp4x0ndasha5

1C exp4x0ndashc5 C exp4x0ndasha51

20 Journal of Business amp Economic Statistics January 2001

and

Pr4TypeiD nmdashXi

D x5

D 1 ƒ Pr4TypeiD cmdashXi

D x5 ƒ Pr4TypeiD amdashXi

D x50

Now compare this setup to the selection models Angristdescribes in Section 3 In the selection models the equationdescribing the endogenous regressor is Di

D 18ƒ0C ƒ1Zi

Cƒ 0

2Xi gt Daggeri9 Suppose that the instrument is binary and that ƒ1 ispositive Then the two models are very similar with units withƒ0 Cƒ1 Cƒ 0

2Xi gt Daggeri in the selection model classi ed as always-takers in the potential outcome framework (because irrespec-tive of the value of the instrument Di

D 1 for such units) unitswith ƒ0

C ƒ 02Xi lt Daggeri classi ed as never-takers (because irre-

spective of the value of the instrument DiD 0 for such units)

and the units with ƒ0C ƒ 0

2Xi lt Daggeri lt ƒ0C ƒ1

C ƒ 02Xi classi ed

as compliersOne advantage of the trinomial model is that it easily gen-

eralizes to provide an arbitrarily good t to any conditionaltrinomial distribution by including higher-order terms andinteractions in the covariates If there are no substantive rea-sons to impose additional restrictions one should not imposethem implicitly in the speci cation of the statistical modelIn particular in the selection model it is not suf cient to addhigher-order terms to the covariate vector to provide an arbi-trarily good t to the trinomial distribution Such an approxi-mation would have to involve heteroscedasticity and other dis-tributional extensions that are not straightforward to implementin the selection model

Conditional on the individualrsquos type H IRZ speci ed theoutcome distributions given covariates as logistic regressionmodels Again the aim is to provide a exible approxima-tion to the conditional distribution in a manner that does notimpose any implicit restrictions Given that for a binomialdistribution the logistic regression model can be thought ofas providing a linear approximation to the log odds ratiothis choice is again an appealing one An alternative is theprobit model which also provides a good approximation Less

attractive here is the linear probability model since it requiresinequality restrictions on the parameters if the implicit esti-mates of the probabilities are to be bounded between 0 and 1

In cases with other limited dependent variables alternativenonlinear models may be appropriate For example if the out-comes are durations subject to censoring models speci ed interms of hazard functions (eg Lancaster 1979) may be con-venient for dealing with such data

ADDITIONAL REFERENCES

Angrist J (1989) ldquoLifetime Earnings and the Vietnam Era Draft LotteryEvidence From Social Security Administrative Recordsrdquo American Eco-nomic Review 80 313ndash335

Angrist J and Krueger A (1991) ldquoDoes Compulsory School AttendanceAffect Schooling and Earningsrdquo Quarterly Journal of Ecomonics 106979ndash1014

Arellano M and Honoreacute B (in press) ldquoPanel Datardquo in Handbook of Econo-metrics eds J Heckman and E Leamer Amsterdam Elsevier North-Holland

Goldberger A (1991) A Course in Econometrics Cambridge MACambridge University Press

Heckman J (1999) ldquo Instrumental Variables Response to Angrist andImbensrdquo Journal of Human Resources 34 828ndash837

Heckman J and Smith J (1997) ldquoEvaluating the Welfare Staterdquo in Econo-metrics and Economics in the 20th Century The Ragnar Frisch Centenaryed S Strom New York Cambridge University Press pp 214ndash318

Hirano K Imbens G Rubin D and Zhou A (2000) ldquoEstimating theEffect of Flu Shots in a Randomized Encouragement Designrdquo Biostatistics1 69ndash88

Imbens G (1997) Book Review of The Foundations of Econometric Analy-sis by D Hendry and M Morgan Journal of Applied Econometrtics 1291ndash94

Lancaster T (1979) ldquoEconometric Methods for the Analysis of DurationDatardquo Econometrica

Manski C (1985) ldquoSemiparametric Analysis of Discrete Responserdquo Journalof Econometrics 27 313ndash333

mdashmdashmdash (1995) Identi cation Problems in the Social Sciences CambridgeMA Harvard University Press

Marshak J (1953) ldquoEconomic Measurements for Policy and Predictionrdquoin Studies in Econometric Method eds W Hood and T KoopmansNew York Wiley pp 1ndash26

Stoker T (1986) ldquoConsistent Estimation of Scaled Coef cientsrdquo Economet-rica 54 1461ndash1481

Vytlacil E (1999) ldquo Independence Monotonicity and Latent Index ModelsAn Equivalency Resultrdquo unpublished manuscript University of ChicagoDept of Economics

Wooldridge J (1992) ldquoSome Alternatives to the BoxndashCox RegressionModelrdquo International Economic Review 33 935ndash955

CommentRobert A Mof tt

Department of Economics Johns Hopkins University Baltimore MD 21218 and National Bureau of EconomicResearch Cambridge MA (mofttjhuedu)

The Problem Although the article by Angrist rangesacross a number of issues much of the discussion and thearticle title suggests that the problem of concern is that instru-mental variables ( IV) cannot be used in one of three commonmodels Let the rst model be y D Csbquod CxbdquoChellip where y isan absolutely continuous variable but d is binary and wherex is independent of hellip but d is not Then sbquo can be consistentlyestimated with IV (Heckman and Robb 1985) Let the secondmodel be y uuml D C sbquod uuml C xbdquo C hellip where y uuml and d uuml are contin-

uous and where y D 14y uuml gt 05 and d D d uuml are the observedvariables The parameters of this model can likewise be esti-mated by IV with some auxiliary assumptions (Newey 1986see Blundell and Smith 1993 for a review of alternative meth-ods) But let the third model be y uuml D C sbquod C xbdquo C hellip where

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

Moftt Comment 21

again y D 14y uuml gt 05 but now d is binary The parameters ofthis model cannot be estimated by standard IV Many of theexamples in the Angrist article involve the censored regres-sion model rather than the binary choice model but this andthe other main points in the article apply to both

The Angrist Solution The Angrist solution to the problemposed by the third of these models where IV is not applica-ble is to declare sbquo an uninteresting parameter and not worthyof estimation [or in his words to ldquopuntrdquo (p 8)] This is remi-niscent of the solution to the Vietnam War suggested by some1960s commentators The United States should have simplydeclared victory quickly withdrawn and hoped that no onenoticed that they had in fact lost

If sbquo is uninteresting what is interesting according toAngrist His answer is that interest should center on leastsquares approximations with y as the dependent variable andd and x as regressors If endogeneity of d is a problem linearIV should be applied with linear IV given a LATE interpre-tation He provides an empirical illustration

I will comment on three questions raised by the Angristarticle (1) Is sbquo an interesting parameter (2) Is the Linear IVestimator using observed y and d which Imbens and Angrist(1994) relabeled the LATE an interesting quantity (3) Is theillustrative application given in the Angrist article interesting

Is sbquo an Interesting Parameter As Angrist notes manyof the issues raised by his solution have nothing to do withthe endogeneity of d Whether sbquo is an interesting parame-ter is one of these at least in part Given Angristrsquos prefer-ence for linear projections his position necessarily implies thatall nonlinear models of which the latent index model (L IM)is just one example are uninteresting (Angrist is not com-pletely consistent in his position because he does propose anonlinear model with an exponential conditional mean func-tion in one section of his article that model is subject to thesame objections that he subjects the LIM tomdasharbitrary non-linearities etc) Thus his position boils down to a preferencein the binary dependent-variable case for the linear proba-bility model (LPM) over the LIM and other nonlinear mod-els This issue has been discussed for many years even in asimultaneous-equations context [eg Heckman and MaCurdy(1985) which Angrist does not reference]

The dif culty with Angristrsquos polar position on this issuemdashthat the LPM is the only model of interestmdashis that it is funda-mentally untenable The LPM is attractive because it is easyto interpret providing parameter estimates that do not requiretransformation to learn the effects of a regressor on the meanof the dependent variable It has a role to play in empiricalwork in summarizing the data as regards the conditional meanfunction and for initial explorations of the data and virtu-ally all practitioners use it for this purpose But beyond thisit has no defense Either it is equivalent to the latent-variablemodel if the model is saturated or if the model is not satu-rated imposes functional form restrictions that may not holdand may t the data worse than the latent-variable model

Thus if the model is y D 14 C sbquod C hellip gt 05 where d is asingle dummy variable regressor independent of hellip the probit(say) estimates of and sbquo map one-to-one onto the leastsquares intercept and coef cient on d in the LPM and hencethere is no gain to estimating the model either way because the

estimators are equivalent The same point extends to the caseof a multinomial d If the model is y D 14 C sbquod C xbdquo C hellip gt

05 where x and bdquo are vectors of exogenous covariates andcoef cients respectively the LPM estimate of the effect ofd on E4y5 conditional on x may be demonstrably worsethan the probit estimator of the same effect even for theobject of estimating E4ymdashd1 x5 at points 8d1 x9 observed inthe data (Angrist falls back on the weak principle that leastsquares estimates provide best linear approximations but theissue is that the true model might be nonlinear) Nothingin Angristrsquos article provides evidence that the probitor anyother latent index formulation provides an inferior estimate of6E4ymdashd11 x5 ƒ E4ymdashd01 x57 compared to the LPM for the twoestimators smooth the joint response function with respect tod and x in different ways

Although the latent-variable model has no necessary claimfor superiority in the class of all nonlinear models its pop-ularity rests on its ability to generate a wide variety of non-linearities with a relatively parsimonious speci cation arisingfrom the convolution of the latent index with the cdf of hellipExpansion of the LPM to incorporate equivalent nonlinearitiesis cumbersome and inef cient

The possible inferiority of the LPM in capturing nonlinear-ities in nonsaturated models is also important for interpolationand extrapolation to points not in the observed data Gettingthe nonlinearities right in the observed data is important ininterpolation and extrapolation if the true model is nonlinearas the binary choice model necessarily is (since y is boundedby 0 and 1) Although Angrist at one point does mentionpredictionmdashstating that the linearized models he proposes area good ldquojumping-off point for any prediction exerciserdquo a state-ment without supportmdashhe is for the most part not interestedin prediction so much as summarizing the observed data Thisis ne but unfortunately prediction outside the observed his-torical data is the pervasive concern of the policy makers whoare the ultimate consumers of applied economic research forthey must make predictions of new policies in new environ-ments on a daily basis as part of their jobs The methods pro-posed in Angristrsquos article are therefore not very useful for theformulation of new policy

It should be noted that some make the argument that theoryand economic models must play a role in guiding the formu-lation of empirical relationships used for prediction outsidethe range of the observed data That position is not taken herebecause it is not necessary to establish the potential superior-ity of the L IM over the LPM in nonsaturated models and forprediction Occamrsquos razor makes it unnecessary

It should also be noted that these issues have nothing todo with causal effects because d is assumed exogenous Thusthe opposition that Angrist poses between L IMrsquos and ldquocausaleffectsrdquo models is a false one at least in general If a ldquocausaleffectrdquo is de ned as the true effect of a variable d on onlythe rst moment of a variable y (a rather restricted de nitionof causal effect) that causal effect can be derived from thelatent-variable model just as well The issue is instead merelynonlinearity of the function E4ymdashd1 x5

Angrist also states that the L IM requires ldquoconstant coef- cientrdquo and ldquodistributional assumptionsrdquo for identi cation

22 Journal of Business amp Economic Statistics January 2001

unlike the LPM Angrist does not make suf cient quali ca-tions to these assertions and they are indeed false for theL IM A random coef cient sbquo in the LIM is permissible with-out distributional assumption ( Ichimura and Thompson 1998)and the coef cients in the standard constant-coef cient L IMare identi ed semiparametrically (ie under unknown distri-bution of hellip) under the restriction that d and hellip are independentan assumption generally made for causal interpretation of theLPM model as well (Manski 1988 Horowitz 1993 Powell1994) In addition as already noted in a fully saturated modelthe LPM and the L IM are equivalent in any case

Is the LATE an Interesting Statistic The Angrist assertionthat LATE is the only interesting statistic if d is endogenousor perhaps the only statistic worth bothering about because itis the only one identi ed by the data has the same partialvalidity as his preference for the LPM over the LIM IV isone of the most popular techniques in applied econometricsand has a natural intuition indeed one that does not requirethe LATE interpretation per se It is one of the most usefultools in the applied economistrsquos kit The simple IV estimatordiscussed by Angrist makes minimal assumptions and givesminimal information back to the analyst as a result But tosay that it produces the only statistic of interest does not havedefense both because it is equivalent to an LIM that is fullysaturated and may t the data worse than an LIM if not satu-rated and because it implies that there is no value to makingadditional assumptions to obtain additional information

The limiting nature of the LATE statistic is again in itsinattention to nonlinearities interpolation and extrapolationThat nonlinearities can be important is as true in this caseas it was in the exogenous case just discussed when thereare additional x covariates and when the model is not satu-rated As for interpolation and extrapolation the LATE statis-tic 6 Ny4z D z15ƒ Ny4z D z057=6 Nd4z D z15ƒ Nd4z D z057 denotes theeffect on a change in z from value z0 to z1 on E4y5 scaledby the change in the E4d5 It does not have implications forthe effect on E4y5 of any other change in z or for a changein any other policy variable Thus it is not particularly usefulfor policy changes other than a change of z0 to z1 in the sameenvironment (ie conditioned on the same x) This stands incontrast to an L IM for E4dmdashx1 z5mdashor any parametric modelfor d for that mattermdashthat allows for the change from z0 toz1 to inform policy makers and others of the likely changesof other values of z and of other variables x In fact theLATE statistic which is not a parameter in the usual senseof the word can always be expressed as a nonlinear combi-nation of parameters of a latent-variable model but not viceversa hence the latter model is more general than the former[See Heckman and Vytlacil (1999) for a discussion of how toexpress the LATE in terms of an L IM see also Angrist andImbens (1999) and Heckman (1997 1999) for a discussion ofsome of these issues]

In addition Angristrsquos focus on the advantages of the LATEfor model-free estimation of the effect of a change in z fromz0 to z1 on y reveals a philosophical weakness in his posi-tion This is because the effect of a change in z from z0 toz1 on y can be estimated from the reduced form the struc-tural form is not needed It is not explained why the statistic

6E4ymdashz D z15 ƒ E4ymdashz D z057 is not the only quantity of inter-est The answer that most economists would give is to say thatstructural coef cients are of interest because they can be usedto extrapolate the effects of changes in d on y to other changesin z than from z0 to z1 in the same environment indeedthis is the basic rationale for structural estimation given inthe famous essay by Marschak [1953 see also Christ (1994)and Heckman (2000) for discussions Marschakrsquos argumentwas more subtle than this arguing that restrictions on struc-tural parameters are needed for out-of-sample prediction butthis translates into restrictions on the reduced form as well]However by ruling out the possibility of learning about anyof the effects of changes in z or d other than those inducedby a change in z from z0 to z1 Angrist removes the need todo structural estimation in the rst place Angrist implicitlyassumes that a structural coef cient of interest exists on thevariable d in the y equation and that that coef cient has mean-ing independent of a particular change in z but the rest of hisdiscussion contravenes that interest

A reduced-form research program which is where theAngrist position leads is of considerable value There is noth-ing wrong with a research program to collect a large body ofinformation on the effects of a wide variety of changes in dif-ferent policy variables 4z5 from particular values (z0) to otherparticular values (z1) each taking place in particular environ-ments (x) But if the research program stops there very littleuseful has been learned other than a collection of facts aboutthe effects of particular policies in particular environments

Is the Application in the Angrist Article Interesting TheAngrist article uses a recently popular exclusion restriction ornatural experiment to identify the effect of fertility on laborsupply and to illustrate his preferred methodsmdashnamely theuse of twins The use of twins as an exclusion restrictionstands in contrast to variations in a government policy or lawwhich are often used as exclusion restrictions both in recentwork on natural experiments as well as a much older litera-ture that uses cross-sectional and overtime variation in state-or country-speci c taxes and transfer rules to identify modelparameters However although it can be of interest to estimatereduced-form policy effects 6E4ymdashz D z15 ƒ E4ymdashz D z057 asjust discussed it is not so obvious that there is any interest inestimating the effects of having twins on the expected valueof y Creating twins is not a variable directly subject to policymanipulation

The dif culty in the use of twins arises for the same reasonalready discussedmdashnamely that the preference for model-freeestimation of policy effects leads to a lack of interest in thefunction E4dmdashz1 x5 and hence to a lack of interest in what canbe learned from a study of twins about the effects of someother more relevant policy variable that might be manipulatedA model for the E4dmdashz1 x5 is needed to make that connectionand hence to make a study of twins of any interest yet that isintentionally eschewed in the approach proposed by AngristOnce again this leads to an uninteresting and quite limitingset of exercises

Conclusions The set of methods laid out in the Angristarticle at least if a more relevant instrument were used yieldsa set of model-free statistics that are suitable for exploratorywork on a research question prior to the (possibly nonlinear)

Mullahy Comment 23

structural estimation and prediction that should be the ultimateobject of applied economics research

ACKNOWLEDGMENTS

This comment is a revision of remarks delivered at theJoint Statistical Meetings Baltimore August 1999 I thankCarl Christ James Heckman Joel Horowitz Michael KeaneThomas Mroz Geert Ridder and Edward Vyltacil for com-ments with the usual disclaimer that the views expressed hereshould not be taken as representing those of any of theseindividuals

ADDITIONAL REFERENCES

Angrist J and Imbens G (1999) Comment on ldquo Instrumental VariablesA Study of Implicit Behavioral Assumptions Used in Making Pro-gram Evaluationsrdquo by J J Heckman Journal of Human Resources 34283ndash827

Blundell R and Smith R (1993) ldquoSimultaneous Microeconometric ModelsWith Censored or Qualitative Dependent Variablesrdquo in Handbook of Statis-tics (Vol 1) eds G S Maddala C R Rao and H D Vinod AmsterdamNorth-Holland pp 117ndash141

Christ C (1994) ldquoThe Cowles Commissionrsquos Contributions to Econometricsat Chicago 1939ndash1955rdquo Journal of Economic Literature 32 30ndash59

Heckman J (1997) ldquoInstrumental Variables A Study of Implicit BehavioralAssumptions Used in Making Program Evaluationsrdquo Journal of HumanResources 32 441ndash462

mdashmdashmdash (1999) ldquo Instrumental Variables A Reply to Angrist and ImbensrdquoJournal of Human Resources 34 828ndash837

mdashmdashmdash (2000) ldquoCausal Parameters and Policy Analysis in Economics ATwentieth Century Retrospectiverdquo Quarterly Journal of Economics 11545ndash97

Heckman J and MaCurdy T (1985) ldquoA Simultaneous Equations LinearProbability Modelrdquo International Economic Review 18 21ndash37

Heckman J and Vytlacil E (1999) ldquoLocal Instrumental Variables andLatent Variable Models for Identifying and Bounding Treatment EffectsrdquoProceedings of the National Academy of Sciences 96 4730ndash4734

Horowitz J (1993) ldquoSemiparametric and Nonparametric Estimation of Quan-tal Response Modelsrdquo in Handbook of Statistics (Vol 11) eds G SMaddala C R Rao and H D Vinod Amsterdam North-Holland pp45ndash72

Ichimura H and Thompson T (1998) ldquoMaximum Likelihood Estimation ofa Binary Choice Model With Random Coef cients of Unknown Distribu-tionrdquo Journal of Econometrics 86 269ndash295

Manski C (1988) ldquoIdenti cation of Binary Choice Modelsrdquo Journal of theAmerican Statistical Association 83 729ndash738

Marschak J (1953) ldquoEconomic Measurements for Policy and Predictionrdquoin Studies in Econometric Method eds W Hood and T KoopmansNew York Wiley pp 1ndash26

Powell J (1994) ldquoEstimation of Semiparametric Modelsrdquo in Handbook ofEconometrics (Vol 4) eds R Engle and D McFadden Amsterdam North-Holland pp 2443ndash2521

CommentJohn Mullahy

Departments of Preventive Medicine and Economics University of Wisconsin Madison WI 53705 andNational Bureau of Economic Research Cambridge MA (jmullahyfacstaffwiscedu)

Articles bearing titles containing phrases like ldquoSimpleStrategies for Empirical Practicerdquo are often wolves masquerad-ing as sheep the methodologies they devise being far fromldquosimplerdquo and far removed from what most humble practition-ers perceive in the realm of ldquoempirical practicerdquo Not so hereJoshua Angrist has written a tight comprehensive article thatis stimulating and important yet also eminently useful

As typi es much of Angristrsquos work the main concern in thisarticle is on how to elicit interesting characterizations of causaleffects from microdata with the particular twist here being afocus on outcomes measured as ldquolimited dependent variablesrdquo(LDVrsquos) The main take-away message I glean from this arti-cle is that applied analysts working on causal effect or struc-tural analyses in LDV contextsmdashtraditionally vexing contextsinsofar as consistent estimation and inference are concernedmdashhave considerable grounds for optimism Angrist lays out andinterprets systematically a set of issues and methods that pro-vide practitioners with a variety of implementable strategiesthat might be brought to bear on such empirical problemsA corollary take-away message is that in some respects ldquothisstuff is not really as hard as wersquove tended to make itrdquo withAngrist demonstrating for instance the potential merits ofsimple linear instrumental variable ( IV) methods for estimat-ing causal effects in a variety of LDV contexts In no eventcan applied analysts escape the requirement of nding theo-retically sound instruments but Angrist exposits compellingly

a broad description of how such instruments might be used toelicit interesting causal inferences in LDV contexts

I have no real quibbles with any of the substance ofAngristrsquos arguments Rather I will devote my commentarymainly to amplifying and expanding several of the themes hedevelops throughout the article

1 FOCUS ON CAUSAL OR PARTIAL EFFECTSVERSUS FOCUS ON CONDITIONAL

EXPECTATIONS FUNCTIONS

It seems fair to suggest that much of applied microecono-metrics is concerned primarily with understanding the signsand magnitudes of quantities like iexclE6Y mdashX1 D7=iexcl4X1 D5 oriexclE6Y mdashX7=iexclX0 Yet much of the actual dirty work in under-taking causal analysis in LDV contexts seems to result fromdecisions to undertake analyses in settings where E6Y mdashX1D7

or E6Y mdashX7 are restricted to be positive without a priori restric-tions on parameter values [thus the tradition of using tobit-class conditional expectations functions (CEFrsquos) two-stepselection models exponential CEFrsquos etc]

As practitioners however we should pause to assesswhether speci cations akin to E6Y mdashX1D7 D exp4Xsbquo C D5 gt

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

24 Journal of Business amp Economic Statistics January 2001

0 are ultimately buying anything important with respect tothe rst-order questions being explored It is only com-mon sense that analysts should spend relatively more energyworking on understanding quantities of primary interest(eg iexclE6Y mdashX1D7=iexcl4X1 D5) and worrying relatively lessabout the formulation and implications of assumptions (egE6Y mdashX1 D7 gt 0) that may have little or no ultimate bearingon the particular questions being addressed This is espe-cially so when the latter effort (restricting via functional formchoice E6Y mdashX1 D7 gt 0) will tend to complicate rather thanfacilitate the former effort (understanding the partial or causaleffects) This is not to suggest that quantities like E6Y mdashX7 andE6Y mdashX1 D7 may not themselves be interesting to estimate (egmodeling conditional mean health care expenditures for use inforecasting future expenditure levels) but whether E6Y mdashX1D7

is parametrically and parsimoniously better approximated byexp4XsbquoC D5 or by XsbquoC D or by some other g4Xsbquo CD5

or h4X1D1 ƒ5 ismdasheven in LDV settingsmdashultimately an empir-ical matter to be assessed by goodness of t (eg via condi-tional moment tests)

I note this important angle somewhat sheepishly sincein both articles of mine that Angrist cites (Mullahy 19971998) there is considerable emphasis on the use of exponen-tial CEF speci cations as enforcers of positive conditionalmean ldquorequirementsrdquo In fact the main motivation for theearlier article was to nd a general approach to obtainingconsistent estimators of structural parameters in parametricsettings less distribution-bound than tobit-class models whenthe key requirement or side constraint is E6Y mdashX1 D7 gt 0 withthe resulting strategy a more or less brute-force method foraccomplishing this in a nonlinear IV setting Angristrsquos arti-cle takes this idea quite a bit further and demonstrates how(and with the addition of a further normalization) the struc-tural parameter estimation approach I discussed can be devel-oped into a model to analyze causal effects when the latter arecharacterized as ldquoproportional treatment effectsrdquo

Whether a proportional as opposed to an additive treatmenteffect is interesting (it may or may not be) remains to be seenin any particular application Wooldridgersquos recent work onaverage causal effects (ACErsquos Wooldridge 1999) would seempertinent in contexts in which the role of the X covariates ismore ldquointrusiverdquo than in the simple additivelinear treatment-effect setup As a general matter the extent to which infer-ences about causal effectsmdashhowever formulatedmdashhinge onthe inclusion of particular covariates as conditioning regres-sors is an interesting and potentially important angle thatAngristrsquos article begins to explore and is one that likely tomerit additional research For the particular issues at handthe very nature of proportional treatment effects brings intofocus explicitly the role of the X rsquos so how their correlationwith the treatment indicators 4D5 in uences inferences aboutthe causal impacts of the latter would seem to be a rst-orderconsideration

2 TWO-PART MODELS

I found Angristrsquos analysis of the interpretation of causaleffects in two-part models to be extremely illuminatingIn a reduced-form setting I argued (Mullahy 1998) that amid

all the concerns about zeros robustness outliers transforma-tion retransformation and such that have typically attendedmodeling efforts involving two-part models it would be sur-prising in a well-structured empirical investigation if therewere not stated or lurking concerns about partial effectsiexclE6Y mdashX7=iexclX andor CEFrsquos E6Y mdashX7 themselves It may be insome cases that such concerns are not articulated and it mayeven be in less well-structured problems that they are not obvi-ous to analysts themselves Yet for instance unless robustlyestimated sbquorsquos from a log-linear part-2 speci cation of a two-part model could inform a rst-order question about a par-tial effect or a CEF of what practical use were they likelyto be My rather modest and rather obvious argument wasjust that the concerns about the iexclE6Y mdashX7=iexclX or the E6Y mdashX7mdashif they are the reason the analysis is being conducted in the rst placemdashshould enjoy rst-order prominence in the estima-tion exercise and that concerns about zeros outliers and suchshould be relegated in some sense to second-order status Insome cases two-part models will serve nicely to address such rst-order concernsmdashat least in reduced-form settingsmdashbut inothers they will not

Angrist appears to have some sympathy with such argu-ments but more importantly he provides a valuable serviceto users of two-part models by unearthing one major limi-tation in the analysis of causal effects The problem is notwith part 1 (ie the logit probit or linear probability com-ponent) because part 1 of the two-part model falls within themain lines of Angristrsquos LDV analysis Rather the complicationarises with part 2 of the two-part model in which the additionalldquoY gt 0rdquo conditioning arises In the standard (reduced-form)two-part model quantities like E6Y mdashX1Y gt 07 are prominentand identi ed readily But in the counterfactual settings inwhich causal effects are manifested wherein the realized Y

arises from self-selection into or out of treatment such selec-tion effects introduce an ambiguity [Eq (12)] and therebyconfound the ability to glean causal effects from part-2 esti-mates (unless as Angrist notes censored regression methodsare used but Angrist also offers some compelling argumentsagainst blind reliance on censored regression approaches) Assuch if estimation of and inference about causal effects arethe primary analytical concerns in data settings with Y para 0and Pr4Y D 05 gt 01 analysts may be well advised to avoidtwo-part modeling strategies and pursue some of the moredirect linear and nonlinear estimation approaches discussedby Angrist in which one-part estimation approachesmdashzerosand allmdashyield estimates of causal effects that will be directlyinterpretable

3 BEYOND CEFrsquoS

One noteworthy feature of the methods Angrist discussesmdashwith the main results attributed to Abadie (1999)mdashare theirapplicability to estimation of causal effects for characteristicsof the conditional distribution rdquo4Y mdashX1D5 beyond just CEFrsquosEstimation of causal or treatment effects for conditional quan-tiles (QTErsquos) and conditional distribution function ordinates(ldquodistilesrdquo) Pr4Y lt cmdashX1D5 is demonstrated to t properlywithin the weighting strategies advanced by Abadie Since

Todd Comment 25

quantile and distile analyses may be more relevant in address-ing particular analytical questions or policy issues than CEFrsquosthe ability to extend the mechanisms of causal-effect esti-mation strategies in such directions should be welcomed byapplied researchers

I might suggest that Angristrsquos claim about an advantage ofquantile causal analysis over distile causal analysis being thatthe latter relies on application-specic values of ordinates (c)is perhaps oversold or underdeveloped In many interestingand policy-relevant applications focus on the application-speci c ordinates is fundamental What factors cause employ-ers to offer more than one health insurance plan to employeesWhat factors cause consumers of alcoholic beverages to drinkmore than two drinks per day What factors cause elderlyindividuals to utilize more than one prescription drug prod-uct What factors cause second earners to work more than17 hours per week Each of these concerns an importantreal-world causal question for which the application-specicordinatemdashwhether for institutional legal physiological orother reasonsmdashis a fundamental ldquopivot pointrdquo for the analysisAnalysis of such distile relationshipsmdashwhether in causal or inreduced-form settingsmdashis a potentially powerful method forinforming decision makers and it seems to me that its mer-its relative to quantile analyses would have to be judged on acase-by-case basis

4 CONCLUSIONS

Practitioners of applied microeconometrics will nd thatthe time invested in a careful reading of Angristrsquos article hassizable returns not merely because of the analytical and con-ceptual insights it offers but ultimately also because it sug-gests a variety of important practical innovations that appliedresearchers can exploit empirically to more clearly understandthe nature of causal relationships in their data Applied microresearchers in areas like health labor development public nance and such work commonly with data wherein (a) out-come measures are limitedmdashoften most importantly becausethey are nonnegative with mass points at 0mdashand (b) the rolesof conditioning covariates (X rsquos) are most conveniently summa-rized via regression methods Angristrsquos article provides appliedresearchers so endowed with a set of powerful tools for elic-iting causal inferences from such data Of course the punchline is familiar and inevitable All bets are off without theo-retically solid instruments But with such instruments in handthe strategies exposited by Angrist offer analysts a rich set ofperspectives on estimation

ADDITIONAL REFERENCE

Wooldridge J M (1999) ldquoEstimating Average Partial Effects Under Con-ditional Moment Independence Assumptionsrdquo mimeo Michigan StateUniversity Dept of Economics

CommentPetra Todd

Department of Economics University of Pennsylvania Philadelphia PA 19104 (petraathonasasupennedu)

This article considers ways of estimating the effect ofbinary endogenous regressors in models with limited depen-dent variables It questions the usefulness of conventionalestimation strategies aimed at recovering structural modelparameters and advocates the use of simple instrumental vari-ables ( IV) estimators as an alternative on the grounds thatthese estimators invoke weaker assumptions and often suf ceto answer questions of interest in empirical studies In theauthorrsquos view the main challenge facing empirical researchersis the problem of identi cation of ldquotreatment effectsrdquo throughIV Here I expand the discussion by considering two ques-tions (1) What are the limitations of the causal model asa paradigm for policy analysis (2) When simple estimatorsare suitable for answering a question of interest what are thetrade-offs that need to be considered in using them

1 WHEN IS THE ldquoCAUSAL MODELldquoA USEFULPARADIGM FOR POLICY ANALYSIS

The article gives the impression that most interesting ques-tions in economics can be answered within the context ofthe ldquocausal modelrdquo [variously attributed to Neyman (1923)Fisher (1935) Cox (1958) Roy (1951) and Rubin (1978)]The causal model is a very general framework that assumesthat there are potential outcome states associated with having

received some treatment and having received no treatment (orpossibly a different treatment) For each individual only oneof the potential outcome states is observed which leads to amissing-data problem in attempting to draw inferences aboutaspects of the treatment effect distribution

The language of potential outcomes is very general soalmost any economic problem could be formulated in theseterms However this does not mean that estimators proposedin the literature for the causal model are useful for all or evenmost economic problems A major limitation of the modelis that it assumes that the state ldquowith treatmentrdquo has beenobserved for at least a subset of people In medical trials orbiological experiments this assumption is probably reason-able but in economics we often are interested in evaluatingeffects of treatments that have not yet been implemented Forexample we might be interested in predicting the effect ofraising the age receiving Social Security bene ts to a new ageor the effect of introducing new term limits on welfare par-ticipation An implicit assumption needed to apply any of theidenti cation results described in this article is that both Y1

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

26 Journal of Business amp Economic Statistics January 2001

and Y0 are observed which would not be satis ed in manycases Thus the causal model and the estimators developedfor it are inadequate when it comes to any economic questioninvolving a state of the world that has not been observed

Despite these limitations treatment effect estimates aresometimes used to predict consequences of new treatments orof existing treatments for new populations When the policychange being considered is very close to changes observedin the data and is to be applied to similar individuals thenextrapolation may be reasonable But if the policy change is(a) of a magnitude outside the range of experience (b) alonga new dimension or (c) intended for a new population then itis an open question as to how reliable the estimates would beTreatment impact estimates are not intended to capture any-thing invariant to changing conditions The assumptions thatwould be required to justify extrapolations or generalizationsare often quite strong To some extent it is an illusion thatthese estimators allow for general kinds of inference underweak assumptions because the assumptions required to justifytheir application for particular purposes are often not madeexplicit

Angrist argues that the identi cation results for the localaverage treatment effect (LATE) estimator provide a ldquofounda-tion for credible causal inferencerdquo (p 5) and a ldquominimum con-troversy jumping-off point for any prediction exerciserdquo (p 5)But as discussed previously the causal model maintains sucha high level of generality that it offers little guidance for manykinds of problems It is important to recognize when and whennot the LATE estimator is likely to be useful This articleclaims that ldquoin practice estimates of LATE differ little fromestimates based on the stronger assumptions invoked to iden-tify effects on the entire treated populationrdquo (p 5) But suchan extrapolation is not justi ed under the theory and is surelyunlikely to hold in all circumstances As shown in a num-ber of works by various authors LATE provides the averageeffect of treatment for the group of ldquocompliersrdquo who are peo-ple induced by the instrument to receive treatment This groupdoes not except under very special circumstances correspondto the group of treated persons and may in fact representonly a small fraction of persons receiving treatment So notonly does LATE not suf ce for answering problems of the sortdescribed by (a) (b) and (c) it also does not generally pro-vide the average effect of treatment on the treatedmdashthe mostcommon parameter of interest in analyzing social programsAnother not-so-attractive feature of the estimator is that thepopulation of ldquocompliersrdquo is usually not identi able so wecannot say exactly whose treatment impact is being estimatedApplying the LATE parameter estimate to the full populationof treated persons would generally require ruling out certaintypes of heterogeneity in individual responses to treatmentas discussed by Heckman (1997) The explicit assumptionsthat could justify this type of extrapolation were shown byHeckman and Vytlacil (2000 in press)

2 ESTIMATORS SHOULD BE TAILOREDTO THEIR APPLICATION

A general theme throughout the article is that wheneverpossible researchers should restrain from imposing paramet-ric restrictions in estimation because the structure they impose

may be wrong resulting in biased estimates This view isexpressed for example in the discussion of the two-partmodel The article advocates the use of a two-part approachinstead of a conventional selection-model approach maintain-ing that the two-part approach is less likely to lead to biasedparameter estimates because it does not impose cross-equationrestrictions as a conventional tobit estimator would

How do researchers decide whether or not to impose restric-tions across estimating equations that are structurally linkedThe most agnostic approach would be to estimate both equa-tions completely nonparametrically but this approach is rarelypractical Nonparametric estimators are consistent under themost general conditions but in conventional size samples thestandard errors would be so large that the estimates would forpractical purposes be useless Researchers impose structurebecause they are willing to restrict the class of models underwhich their estimators are consistent in exchange for greaterprecision These ef ciency considerations would also apply inthe deciding whether or not to impose cross-equation restric-tions in estimating simultaneous-equations models If we areright about the structure there is an ef ciency gain fromimposing the restrictions If we are wrong imposing them maylead to bias When there are overidentifying restrictions it isat least possible to develop tests of the model speci cation thatcan serve as a guide in choosing a structural model that is sup-ported by the data This article presents a dichotomy betweenthe two-part model and a fully parametric traditional selec-tion speci cation but there are many other modeling choicesranging from fully nonparametric to semiparametric to fullyparametric In modest size samples ef ciency is an impor-tant consideration and a parametric model may be the mostappropriate one For very large samples fewer parametricapproaches are feasible Both the probability-of-working andthe hours-worked equations could for example be estimatedby a semiparametric method such as semiparametric leastsquares ( Ichimura 1993) Which estimation method is mostappropriate ultimately depends on how much data is avail-able and how many parameters are to be estimated It doesnot make sense to criticize the use of structured approaches infavor of more exible modeling approaches without regard tothe context in which they are being applied

3 ON THE GAP BETWEEN THEORY ANDEMPIRICAL PRACTICE

Finally another recurring emphasis in the article is on thevalue of shortcuts in empirical practice In the authorrsquos viewthere is a gap between the econometric theory and what isfeasible in practice and shortcuts and approximations help tobridge this gap To this end the article extols for examplethe virtues of ordinary least squares (OLS) as a way of approx-imating any unknown conditional expectation function with-out much concern as to whether a best linear approximation isa good approximation If the model has discrete regressors andis fully saturated it is well known that OLS is equivalent to anonparametric estimator But in other cases a linear approxi-mation could be highly biased and the bias will not go awayas the sample size gets large There are a variety of alterna-tive asymptotically unbiased estimators Along similar lines

Angrist Reply 27

the article suggests that in implementing a ldquoselection-on-observablesrdquo estimator it may be all right simply to ignore theweighting scheme Such claims seem to me fairly outrageousbecause failure to take into account weighting could lead tovery different estimates and would not provide a consistentestimator of the parameter of interest Heckman Ichimura andTodd (1998) discussed alternative estimators that are not dif -cult to implement and take into account the weighting scheme

Finally the appendix presents a variety of shortcuts forobtaining estimates and standard errorsmdashsome of which leadto inconsistent estimates When shortcuts lead to very crudeapproximations or inconsistent estimates I question their use-fulness I do think it unreasonable to expect that estimatesbased on simple IV estimators be accompanied by correctstandard errors that correctly take into account the in uence ofestimated parameters of estimated regressors I see little valuein reporting incorrect standard-errors estimates on the groundsthat it saves time in computation especially when there areseveral works long available in the literature that develop sim-ple ways of solving these kinds of problems (eg see Newey1984) Although there is surely a gap between the estimatorsdeveloped in the theoretic econometric literature and the esti-mators used in the empirical literature the gap is not as greatas this article supposes

ADDITIONAL REFERENCES

Cox D R (1958) The Planning of Experiments New York WileyFisher R A (1935) Design of Experiments London Oliver and BoydHeckman J J (1997) ldquo Instrumental Variables Evidence From Evaluating a

Job Training Programmerdquo Journal of Human Resources 32 441ndash462Heckman J Ichimura H and Todd P (1997) ldquoMatching as an Econometric

Evaluation Estimator Theory and Evidence on its Performance Appliedto the JTPA Program Part I Theory and Methodsrdquo Review of EconomicStudies 64 605ndash654

mdashmdashmdash (1998) ldquoMatching as an Econometric Evaluationrdquo Review of Eco-nomic Studies 65 261ndash294

Heckman J and Vytlacil E (2000) ldquoCausal Parameters Structural Equa-tions Treatment Effects and Randomized Evaluations of Social Programsrdquounpublished manuscript University of Chicago Dept of Economics

mdashmdashmdash (in press) ldquoLocal Instrumental Variablesrdquo in Nonlinear StatisticalInference Essays in Honor of Takeshi Amemiya eds C Hsiao KMorimune and J Powell Cambridge UK Cambridge University Press

Ichimura H (1993) ldquoSemiparametric Least Squares (SLS) and Weighted SLSEstimation of Single Index Modelsrdquo Journal of Econometrics 58 71ndash120

Neyman J (1923) ldquoStatistical Problems in Agricultural Experimentsrdquo Sup-plement to the Journal of the Royal Statistical Society 2 107ndash180

Newey W (1984) ldquoA Method of Moments Interpretation of Sequential Esti-matorsrdquo Economic Letters 14 201ndash206

Roy A (1951) ldquoSome Thoughts on the Distribution of Earningsrdquo OxfordEconomic Papers 3 135ndash146

Rubin D (1978) ldquoBaysian Inference for Causal Effects The Role of Ran-domizationrdquo The Annals of Statistics 6 34ndash58

ReplyJoshua D Angrist

I thank the discussants and the session organizer outgoingJBES editor Jeff Wooldridge who will be missed by associateeditors and JBES authors alike The comments in this sessionoffer a variety of viewpoints some sympathetic others morecritical I am pleased to be part of this stimulating and con-structive exchange and look forward to more interaction ofthis type in the future

Hahn My article starts by suggesting that a focus ontreatment effects simpli es the problem of causal inferencein limited dependent variable (LDV) models with endoge-nous regressors Endogeneityrsquos half-sister is the assumptionof unobserved individual effects since the omitted-variablesmotivation for xed effects in panel data is similar Hahnasks whether a shift in focus from index parameters to aver-age causal effects might also simplify the notoriously dif -cult problem of working with nonlinear models for binarypanel data (eg as in Card and Sullivan 1988) Hahn showsthat here too the causal-effects framework pays off demon-strating that average causal effects are easily estimated in a xed-effects probit model with a binary regressor even thoughthe probit coef cient is not A second noteworthy featureof Hahnrsquos comment is his observation that identi cation inthis framework turns on careful modeling of the relationshipbetween the assignment variable and the xed effects Tradi-tional econometric models are primarily concerned with thestochastic process generating outcomes A shift of attentiontoward modeling the assignment mechanism is consistent with

the notion that observational studies should try to mimic thesort of controlled comparisons generated by experiments

Imbens Imbens begins by endorsing the view that the sub-stantive goals of empirical research are better served by thepotential outcomescausal framework than by structural mod-eling Going one step further Imbens notes that the problemswith ldquomodel- rst econometricsrdquo are not limited to LDVrsquos Inthe same spirit he sensibly wonders (as did Wooldridge 1992)why so much attention has been devoted to estimating theBoxndashCox transformation model which does not actually iden-tify E6Y mdashX7 without a distributional assumption Imbens alsonicely articulates the rationale for looking at the twins exam-ple and notes that this is similar to the rationale for extrap-olating from clinical trials in narrow subpopulations And infact I believe the ldquotwins experimentrdquo is likely to have pre-dictive value in a variety of situations though any extrapo-lation will probably be improved by control for demographiccharacteristics

In spite of (or perhaps because of) our years of discussionand collaboration Imbens and I still do not agree on certainthings He feels strongly that inference with discrete outcomesshould use models and procedures that respect inherent non-

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

28 Journal of Business amp Economic Statistics January 2001

linearities The HIRZ procedure discussed in his comment isan intriguing way to do this and my article discusses othersI remain unconvinced however that nonlinearity is of majorsubstantive importance in empirical work with binary regres-sors The proof of the pudding I suppose would be to showthat H IRZ or some of the methods I discuss in the article leadto better predictions or decisions than the linear workhorse

Mof tt Mof t provides a valuable counterpointHe beginsby arguing that latent-index models have more predictivepower than reduced-form causal models Such claims seemhard to establish a priori though I do not doubt that in somecases this may be true But empirical strategies in which thebulk of the research effort goes into estimating index coef -cients may fail to answer the most basic question in empiricalwork What happened here Causal models and local aver-age treatment effect (LATE) answer this question while indexcoef cients alone are not suf cient for causal inference Iagree however that structural modeling can be useful forgoing beyond basic questions and I have used index and otherstructural models for this purpose Mof t is right howeverin suggesting that I nd the model-free analysis he refers toas ldquoexploratoryrdquo more interesting than structural extrapolationThis seems to me to be where the potential for real discoverylies

Mof tt asks tough but fair questions about the twins appli-cation He is correct to point out that twinning is not a manip-ulable policy instrument That is one reason I do not limit theanalysis to reduced-form effects of multiple births Yet to mymind the best evidence we have on the controversial questionof the consequences of out-of-wedlock childbearing comesfrom the twins experiment (Bronars and Grogger 1994) Andthe economic consequences of childbearing are of interest inmany other contexts As noted in the introduction to my arti-cle the twins experiment can help determine whether fertilityreductions are responsible for trend increases in female laborsupply Beyond this sort of question the twins results seemuseful for assessing the economic consequences of increasedabortion access in the United States the dissemination of con-traceptive implants in developing countries (eg see KaneFarr and Janowitz 1990) and various programs to encour-age or require implant use by welfare mothers (Haveman andWolfe 1993)

Finally it is fair to question the external validity of twinsestimates as Mof t does toward the end of his commentAngrist and Evans (1998) showed that the labor-supply con-sequences of twinning can be reconciled with estimates ofthe labor-supply consequences of childbearing resulting froma very different natural experiment induced by preferences formixed-sex sibling pairs This suggests that results from bothexperiments may be quite robust and may even be ldquostructuralrdquoin the sense that they tell us something general about the eco-nomic response to childbearing

Mullahy Mullahy brings a welcome health economistrsquosperspective to the discussion Health economists have longgrappled with the issues discussed in my article and asI hope I made clear I drew ideas and inspiration fromMullahyrsquos work on models with nonnegative dependent vari-ables Mullahyrsquos comment further highlights exactly the sort

of utilitarian trade-off that underlies the choice of empiricalstrategy On one hand we would like our models to be con-sistent with any functional form restrictions known to be trueOn the other attempts at this may complicate estimation andinference for little payoff Worst of all the technical complex-ity of nonlinear models seems to cause authors to ldquowax econo-metricrdquo an effort that may come at the expense of attention tosubstantive issues of importance (an example here is the longseries of papers discussing exactly how data from the RANDhealth-insurance experiment should be analyzed) Mullahyrsquosheretical discussions of ldquoone-part modelsrdquo here and in earlierwork represent a refreshing effort to keep the empirical trainon track Mullahyrsquos point that application-specic distributionordinates can provide more useful summary information thanquantiles is also clearly worth thinking about

Todd Todd provides a useful and clear overview of thecausal framework which she suggests is better suited formedical questions than economic questions The thrust of herargument is that there is too much heterogeneity in economicinterventions and outcomes for the causal approach to bevery useful for economists But my perception is that medicalresearch is even more explicitly concerned with heterogene-ity than economics This is evidenced by the many quali ca-tions regarding patient characteristics conditions and inter-actions in medical treatment protocols A difference betweenthe prevailing medical research paradigm and the traditionaleconometric perspective however is whether the question ofheterogeneous response is to be addressed by better theoryor more evidence Of course theory helps us decide whereto look for heterogeneity but as Rosenbaum (1999 p 301)put it ldquo If a treatment has substantially different effects in dif-ferent types of people then we need to discover this (ital-ics mine) Rosenbaum also noted that ldquorandomized clinicaltrials routinely look for heterogeneous treatment effects andthey do this without representative samples from nationalpopulationsrdquo

On the technical side Todd suggests I advocated use ofthe two-part model because it exibly approximates nonlinearfunctional forms for nonnegative outcomes Although I notedthe exibility the upshot of my discussion was that the two-part model is poorly suited to causal inference because of thedif culty of interpreting part 2 Finally Todd points out thefailure to allow for sampling variance in the scaling factorsused to generate effects for nonlinear models This is indeed ahard-to-defend shortcut though I shall take a stab at defenseby noting that randomness of the scaling factor which typi-cally estimates a sample average is likely to be a minor sourceof sampling uncertainty

ADDITIONAL REFERENCES

Card D E and Sullivan D (1988) ldquoEffects of Subsidized Training onMovements In and Out of Employmentrdquo Econometrica 56 497ndash530

Haveman R and Wolfe B (1993) ldquoChildrenrsquos Prospects and ChildrenrsquosPolicyrdquo Journal of Economic Perspectives 7 153ndash174

Kane T Farr G and Janowitz B (1990) ldquo Initial Acceptability of Con-traceptive Implants in Four Developing Countriesrdquo International FamilyPlanning Perspectives 16 49ndash54

Rosenbaum P R (1999) ldquoChoice as an Alternative to Control in Observa-tional Studiesrdquo Statistical Science 14 259ndash304

Wooldridge J (1992) ldquoSome Alternatives to the BoxndashCox RegressionModelrdquo International Economic Review 33 935ndash955

Page 14: New Estimation of Limited Dependent Variable Models With Dummy …piketty.pse.ens.fr/…ecoineg/articl/Angrist2001b.pdf · 2011. 12. 5. · sequences of changes in contraceptive technology

Angrist Estimation of Limited Dependent Variables Models With Dummy Endogenous Regressors 15

Causal-IV (nonlinear) Average effects were calculated asdescribed previously for the probit and Mullahy (exponential)functional form The rst-stage estimates of E6Zi

mdashXi7 neededto construct Ši were estimated using probit Standard errorsfor were calculated using the same bootstrap proceduredescribed for QTE Abadie (2000b) also gave analytic formu-las that take account of the rst-step estimation of E6Zi

mdashXi7An assumption implicit in this scheme for reporting Causal-IV results is that it makes sense to convert conditional-on-Xeffects for compliers into overall average effects on the treated

A4 Computation of Quantile Treatment Effectsand Standard Errors

QTErsquos were computed by plugging rst-step estimatesof QŠi

D E6ŠimdashXi1Di1 Yi7 into a weighted quantile regression

calculation performed by Stata Nonnegative estimates ofE6Ši

mdashXi1 Di1 Yi7 were constructed by separately estimatingE6Zi

mdashXi7 and E6ZimdashYi1Di1Xi7 using probit and then trimming

In principle standard errors should take account of this rst-step estimation An additional complication is that the analyticstandard errors for QTE involve a conditional error density Inthis case I sidestepped messy analytic calculations by usinga bootstrap procedure that repeats both the rst-stage estima-tion of QŠi and the second-step estimation of the parameters ofinterest in 100 replicate samples of 2500 observations eachThe 100 replicate samples were sampled without replacementusing the Stata command bsample The reported standarderrors were calculated as 4N =N uuml 51=2bseN where bseN is thestandard deviation of the 100 replicate estimates N D 21500and N uuml is the full sample size

[Received October 1999 Revised July 2000]

REFERENCES

Abadie A (2000a) ldquoBootstrap Tests for the Effect of a Treatment on the Dis-tributions of an Outcome Variablerdquo Technical Working Paper 261 NationalBureau of Economic Research Cambridge MA

mdashmdashmdash (2000b) ldquoSemiparametric Estimation of Instrumental Variable Mod-els for Causal Effectsrdquo Technical Working Paper 260 National Bureau ofEconomic Research Cambridge MA

Abadie A Angrist J and Imbens G (1998) ldquo Instrumental VariablesEstimation of Quantile Treatment Effectsrdquo Working Paper 229 NationalBureau of Economic Research Cambridge MA

Amemiya T (1978) ldquoThe Estimation of a Simultaneous Equation General-ized Probit Modelrdquo Econometrica 46 1193ndash1205

mdashmdashmdash (1979) ldquoThe Estimation of a Simultaneous Equation Tobit ModelrdquoInternational Economic Review 20 169ndash181

Angrist J (1998) ldquoEstimating the Labor Market Impact of Voluntary MilitaryService Using Social Security Data on Military Applicationsrdquo Economet-rica 66 249ndash288

Angrist J and Evans W (1998) ldquoChildren and Their Parentsrdquo Labor SupplyEvidence From Exogenous Variation in Family Sizerdquo American EconomicReview 88 450ndash477

Angrist J and Imbens G (1991) ldquoSources of Identifying Information inEvaluation Modelsrdquo Technical Working Paper 117 National Bureau ofEconomic Research Cambridge MA

Angrist J Imbens G and Rubin B (1996) ldquo Identi cation of CausalEffects Using Instrumental Variablesrdquo Journal of the American StatisticalAssociation 91 444ndash455

Angrist J and Krueger A (1999) ldquoEmpirical Strategies in Labor Eco-nomicsrdquo in The Handbook of Labor Economics Volume IIIA edsO Ashenfelter and D Card Amsterdam North-Holland pp 1277ndash1366

Blundell R W and Smith R J (1989) ldquoEstimation in a Class of Simulta-neous Equation Limited Dependent Variable Modelsrdquo Review of EconomicStudies 56 37ndash58

Blundell R W and Powell J L (1999) ldquoEndogeneity in Single Index Mod-elsrdquo mimeo University College London Dept of Economics

Bronars S and Grogger J (1994) ldquoThe Economic Consequences of UnwedMotherhood Using Twin Births as a Natural Experimentrdquo American Eco-nomic Review 84 1141ndash1156

Chamberlain G (1986) ldquoAsymptotic Ef ciency in Semi-Parametric ModelsWith Censoringrdquo Journal of Econometrics 32 189ndash218

Cragg J G (1971) ldquoSome Statistical Models for Limited Dependent Vari-ables With Application to the Demand for Durable Goodsrdquo Econometrica39 829ndash844

Duan N Manning W G Jr Morris C N and Newhouse J P(1984) ldquoChoosing Between the Sample-Selection Model and the Multi-partModelrdquo Journal of Business amp Economic Statistics 2 283ndash289

Eichner M J McClellan M B and Wise D A (1997) ldquoHealth Expendi-ture Persistence and Feasibility of Medical Savings Accountsrdquo in Tax Pol-icy and the Economy 11 ed J Poterba Cambridge MA National Bureauof Economic Research pp 91ndash128

Evans W Farrelly M C and Montgomery E (1999) ldquoDo Work-place Smoking Bans Reduce Smokingrdquo American Economic Review 89728ndash747

Gangadharan J and Rosenbloom J L (1996) ldquoThe Effects of Child-bearingon Married Womenrsquos Labor Supply and Earnings Using Twin Births as aNatural Experimentrdquo Working Paper 5647 National Bureau of EconomicResearch Cambridge MA

Goldberger A S (1991) A Course in Econometrics Cambridge MAHarvard University Press

Greene W (1999) ldquoMarginal Effects in the Censored Regression ModelrdquoEconomics Letters 64 43ndash49

Gronau R (1974) ldquoWage ComparisonsmdashA Selectivity Biasrdquo Journal ofPolitical Economy 82 1119ndash1143

Hay J W Leu R and Rohrer P (1987) ldquoOrdinary Least Squares andSample-Selection Models of Health-care Demandrdquo Journal of Business ampEconomic Statistics 5 499ndash506

Hay J W and Olsen R J (1984) ldquoLet Them Eat Cake A Note on Com-paring Alternative Models of the Demand for Medical Carerdquo Journal ofBusiness amp Economic Statistics 2 279ndash282

Heckman J J (1974) ldquoShadow Prices Market Wages and Labor SupplyrdquoEconometrica 42 679ndash694

mdashmdashmdash (1978) ldquoDummy Endogenous Variables in a Simultaneous EquationsSystemrdquo Econometrica 46 931ndash960

mdashmdashmdash (1990) ldquoVarieties of Selection Biasrdquo American Economic Review 80313ndash318

Heckman J J and Robb R Jr (1985) ldquoAlternative Methods for Evaluatingthe Impact of Interventionsrdquo in Longitudinal Analysis of Labor MarketData (Econometric Society Monographs Series No 10) eds J Heckmanand B Singer Cambridge UK Cambridge University Press pp 156ndash246

Imbens G and Angrist J D (1994) ldquo Identi cation and Estimation of LocalAverage Treatment Effectsrdquo Econometrica 62 467ndash475

Imbens G and Rubin D B (1997) ldquoEstimating Outcome Distributions forCompliers in Instrumental Variables Modelsrdquo Review of Economic Studies64 555ndash574

Kane T J Rouse C E and Staiger D (1999) ldquoEstimating Returnsto Schooling When Schooling Is Misreportedrdquo Working Paper W7235National Bureau of Economic Research Cambridge MA

Kelejian H H (1971) ldquoTwo-Stage Least Squares and Econometric SystemsLinear in Parameters but Nonlinear in the Endogenous Variablesrdquo Journalof the American Statistical Association 66 373ndash374

Koenker R and Bassett G (1978) ldquoRegression Quantilesrdquo Econometrica46 33ndash50

Killingsworth M (1983) Labor Supply Cambridge UK Cambridge Uni-versity Press

Keane M P and Wolpin K (1997) ldquo Introduction to the JBES Special Issueon Structural Estimation in Applied Microeconomicsrdquo Journal of Businessamp Economic Statistics 15 111ndash114

Lee M J (1995) ldquoSemiparametric Estimation of Simultaneous EquationsWith Limited Dependent Variables A Case Study of Female Labor Sup-plyrdquo Journal of Applied Econometrics 10 187ndash200

mdashmdashmdash (1996) ldquoNonparametric Two-Stage Estimation of Simultaneous Equa-tions With Limited Endogenous Regressorsrdquo Econometric Theory 12305ndash330

Leung S F and Yu S (1996) ldquoOn the Choice Between Sample Selectionand Two-Part Modelsrdquo Journal of Econometrics 72 197ndash229

Lin T-F and Schmidt P (1984) ldquoA Test of the Tobit Speci cation Againstan Alternative Suggested by Craggrdquo Review of Economics and Statistics66 174ndash177

Maddala G S (1985) ldquoA Survey of the Literature on Selectivity Bias AsIt Pertains to Health Care Marketsrdquo Advances in Health Economics andHealth Service Research 6 3ndash18

16 Journal of Business amp Economic Statistics January 2001

REFERENCES

Manning W G Duan N and Rogers W H (1987) ldquoMonte Carlo Evidenceon the Choice Between Sample Selection and Two-part Modelsrdquo Journalof Econometrics 35 59ndash82

Manski C F (1975) ldquoMaximum Score Estimation of the Stochastic UtilityModelrdquo Journal of Econometrics 3 205ndash228

mdashmdashmdash (1996) ldquoLearning About Treatment Effects From Experiments WithRandom Assignment of Treatmentsrdquo Journal of Human Resources 31709ndash733

McDonald J F and Mof tt R A (1980) ldquoThe Uses of Tobit AnalysisrdquoThe Review of Economics and Statistics 62 318ndash321

Mof t R A (1999) ldquoNew Developments in Econometric Methods forLabor Market Analysisrdquo in The Handbook of Labor Economics Vol-ume IIIA eds O Ashenfelter and D Card Amsterdam North-Hollandpp 1367ndash1398

Mullahy J (1997) ldquo Instrumental-Variable Estimation of Count Data ModelsApplications to Models of Cigarette Smoking Behaviourrdquo Review of Eco-nomics and Statistics 11 586ndash593

mdashmdashmdash (1998) ldquoMuch Ado About Two Reconsidering Retransformation andthe Two-Part Model in Health Econometricsrdquo Journal of Health Eco-nomics 17 247ndash281

Newey W K (1985) ldquoSemiparametric Estimation of Limited DependentVariable Models With Endogenous Explanatory Variablesrdquo Annales deLrsquo Insee 5960 219ndash237

mdashmdashmdash (1986) ldquoLinear Instrumental Variable Estimation of Limited Depen-dent Variable Models With Endogenous Explanatory Variablesrdquo Journal ofEconometrics 32 127ndash141

mdashmdashmdash (1987) ldquoEf cient Estimation of Limited Dependent Variable ModelsWith Endogenous Explanatory Variablesrdquo Journal of Econometrics 36231ndash250

Powell J L (1986a) ldquoSymmetrically Trimmed Least Squares Estimation forTobit Modelsrdquo Econometrica 54 1435ndash1460

mdashmdashmdash (1986b) ldquoCensored Regression Quantilesrdquo Journal of Econometrics32 143ndash155

Rosenzweig M R and Wolpin K (1980) ldquoLife-Cycle Labor Supply andFertility Causal Inferences From Household Modelsrdquo Journal of PoliticalEconomy 88 328ndash348

Rubin D B (1974) ldquoEstimating Causal Effects of Treatments in Random-ized and Non-randomized Studiesrdquo Journal of Educational Psychology 66688ndash701

mdashmdashmdash (1977) ldquoAssignment to a Treatment Group on the Basis of a Covari-aterdquo Journal of Educational Statistics 2 1ndash26

Wooldridge J (1999) ldquoDistribution-Free Estimation of Some Nonlinear PanelData Modelsrdquo Journal of Econometrics 90 77ndash97

Comment Binary Regressors in NonlinearPanel-Data Models With Fixed EffectsJinyong Hahn

Department of Economics Brown University Providence RI 02912 ( jinyong hahnbrownedu)

Angrist notes in his abstract that ldquomuch of the dif cultywith limited dependent variables comes from a focus on struc-tural parameters such as index coef cients instead of causaleffects Once the object of estimation is taken to be the causaleffect of treatment several simple strategies are availablerdquo(p 2) I examine the consequence of such a perspective forinference in nonlinear panel-data models with xed effectsI argue that (a) the ldquodif cultyrdquo indeed disappears sometimesand (b) structure of treatment assignment plays a crucial rolefor his strategy to be successful

It is instructive to begin with a dif cult nonlinear panel-datamodel with xed effects Consider a panel probit model with xed effects

Pr6yitD 1mdashci1 xi11 xi27 D ecirc4ci

C ˆ xit51

i D 11 1 n3 t D 1120 (1)

It is assumed that yi1 and yi2 are independent of each othergiven 4ci1 xi11 xi25 Here ci denotes the unobserved xedeffects and xit denotes a binary treatment variable We willassume that 4xi11 xi25 D 40115 for all i The index coef cientof interest is ˆ Therefore traditional econometric analysiswould focus on whether ˆ is identi ed and or

pn-consistently

estimableEstimation of the index coef cient ˆ is dif cult for a num-

ber of reasons

1 So far no consistent estimator for ˆ has been developedfor probit with a nonparametric speci cation of the conditionaldistribution of ci given 4xi11 xi25

2 It is not even clear whether ˆ is identi ed or notManskirsquos (1987) identi cation result requires in nite supportfor xit which cannot be satis ed due to the binary nature ofxit

3 It is not clear whether the semiparametric informationbound for ˆ is positive or not It is quite possible that theinformation is actually 0 For example Chamberlain (1992)showed that the information for ˆ is 0 for models with timedummies

4 Chamberlainrsquos (1984) conditional maximum likelihoodestimator is applicable only to logit models

Now to assess whether changing the object of estimationsimpli es statistical analysis consider the average treatmenteffects which in this particular case can be easily shown tobe equal to

sbquo D E6yi2ƒ yi170 (2)

It is not dif cult to see that a simple estimator Osbquo D1=n

PniD14yi2

ƒ yi15 isp

n-consistent As Angrist argues afocus on average causal effects dramatically reduces the dif -culty of estimation The secret is that 4xi11 xi25 D 40115 for alli which effectively ensures that ci is independent of 4xi11 xi25Dif culty of the index estimation listed previously is because

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

Imbens Comment 17

the independence between 4xi11 xi25 and ci cannot be exploitedwithin the xed-effects framework Unfortunately the addi-tional information that 4xi11 xi25 and ci are independent of eachother does not reduce dif culty in estimating the index ˆ It isnot yet clear whether ˆ can be

pn-consistently estimated even

with the random-effects assumption when the random effectsare nonparametrically speci ed On the other hand estimatingsbquo becomes easier as a result of the constancy of the general-ized propensity score Pr 6xi11 xi2mdashci7 Presence of unobservedci in the model was rendered irrelevant due to the constancyof the generalized propensity score [See Imbens (1999) for adiscussion on the generalized propensity score]

It is interesting to note that in the panel probit model (1)estimation of sbquo is not necessarily simple unless the indexstructure is discarded altogether It is useful to note that thenew target parameter sbquo could be estimated consistently usingindex structure if consistent estimators of ˆ and not are givenHere not denotes the distribution of ci We may alternativelywrite (2) as

Z4ecirc4c C ˆ5 ƒ ecirc4c55 dnot4c51 (3)

which can in principle be estimated by using consistent esti-mators of ˆ and not Estimation of sbquo using the alternativecharacterization (3) requires consistent estimation of an addi-tional parameter not a parameter that was not given too muchattention in the past The problem is that not many consis-tent estimators of not are available It is not yet clear whether

the model satis es the primitive conditions for consistency ofthe nonparametric maximum likelihood estimator (NPMLE)as discussed by Heckman and Singer (1984) The dif culty inestimating the target parameter using the expression (3) whichis based on the index structure is in sharp contrast to the easeof the estimation strategy using the expression (2) for whichthe index structure is irrelevant

The preceding discussion suggests that the success ofAngristrsquos perspective critically hinges on the structure of treat-ment assignment and careful reexpression of the new targetparameter If the joint distribution of ci and 4xi11 xi25 is com-pletely unknown it is clear that changing the target parameterdoes not ease the dif culty of estimation Angristrsquos perspectivetherefore requires substantial effort in modeling such jointdistribution Whether such a modeling effort will be successfulin dealing with nonlinear panel problems remains to be seen

ADDITIONAL REFERENCES

Chamberlain G (1984) ldquoPanel Datardquo in Handbook of Econometricseds Z Griliches and M D Intriligator Amsterdam North-Hollandpp 1247ndash1318

mdashmdashmdash (1992) ldquoBinary Response Models for Panel Data Identi cation andInformationrdquo unpublished manuscript

Heckman J and Singer B (1984) ldquoA Method for Minimizing the Impactof Distributional Assumptions in Econometric Models for Duration DatardquoEconometrica 52 271ndash320

Imbens G (1999) ldquoThe Role of Propensity Score in Estimating Dose-Response Functionsrdquo Technical Working Paper 237 National Bureau ofEconomic Research Cambridge MA

Manski C (1987) ldquoSemiparametric Analysis of Random Effects Linear Mod-els from Binary Panel Datardquo Econometrica 55 357ndash362

CommentGuido W Imbens

Department of Economics University of California at Los Angeles Los Angeles CA 90095(imbenseconuclaedu)

It is a pleasure to comment on this article by JoshuaAngrist whose applications of instrumental-variables methods(Angrist 1989 Angrist and Krueger 1991) have been a sourceof inspiration for my own work in this area As with Angristrsquosprevious work on instrumental variables the current articleraises some controversial issues and makes a number of impor-tant points Here I offer some comments on three of themFirst I shall discuss the issues raised in Section 1 ldquoCausalEffects and Structural Parametersrdquo concerning the goals ofstatistical inference Angrist argues that many questions ofinterest are most easily formulated in terms of comparisonsbetween realized and potential outcomes the latter de nedas outcomes that would have been observed under alternativestates of nature I shall explore some of the implications of thisview for empirical practice and econometric theory SecondI shall offer some remarks on the role of economic theory inspeci cation and identi cation of econometric models againreinforcing Angristrsquos point regarding the importance of for-mulating the key assumptions in terms of potential outcomesThird I shall discuss some of the issues related to the limited

dependent nature of outcome variables for empirical practicein particular in the presence of covariates Partly motivatedby the widespread perception of fundamental dif culties inapplying instrumental-variables methods to data with limiteddependent outcome variables Angrist argues that standard lin-ear model techniques are generally applicable I agree withAngristrsquos position that most of these perceived problems areexaggerated but suggest that principled inference should nev-ertheless take account of the limited dependent nature of theoutcome variables and use nonlinear models

1 CAUSAL ESTIMANDS

In his textbook discussion of the difference betweenstructural and reduced-form estimates Goldberger (1997)wrote following Marshak (1953) that the ultimate goal of

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

18 Journal of Business amp Economic Statistics January 2001

econometrics is to provide predictions More speci cally inmy view the goal is to provide predictions of policy interven-tions Using both economic theory and data economists wishto inform policy discussions by providing predictions of statesof the world under different policy choices Based on com-parisons of such predictions policy makers can then chooseamong the different policies using some social welfare mea-sure as objective function (eg Heckman and Smith 1997)Angrist argues that such questions are most easily formulatedin terms of potential outcomes Here I want to elaborate onthat view

Consider as an example the problem faced by a policymaker contemplating a new tax in a market To evaluatethis policy the policy maker wishes to take into account theeffect of the tax on the quantity traded Economic theory sug-gests that this effect depends on the slope of the supply anddemand functions The rst step is therefore the estimation ofthese slopes and in the remainder of this discussion I shallfocus on this component of the policy-evaluation problem Inprinciple the policy maker may be interested in the entire dis-tribution of the quantity traded under various taxes Let usassume however that for purposes of evaluation of the poli-cies it is suf cient to know the average effect of the policyon the quantity traded If there are only two values for thepolicymdashfor example no tax or a taxmdashthe difference betweenthese two averages is the key quantity of interest FollowingRubin (1974) I will refer to this as the estimand

Note that the choice of estimand is distinct from the sta-tistical question of the speci cation of the model Often thestatistical model is speci ed in such a way that a single param-eter corresponds to the estimand For example in a structuralinterpretation of the linear regression model the coef cientscorrespond to the effect of changing the covariates by a singleunit Such one-to-one correspondence however is the excep-tion rather than the rule Wooldridge (1992) made this point inthe context of BoxndashCox regression models Such models areoften used when a linear representation for E6Y mdashX7 is inappro-priate The BoxndashCox regression model generalizes this linearform to E6Y 4lsaquo5mdashX7 D X 0sbquo where

Y 4lsaquo5 D(

4Y lsaquo ƒ 15=lsaquo lsaquo 6D 01

ln Y lsaquo D 00

Although consistent estimators for sbquo exist under these assump-tions Wooldridge stressed that because (a) the interpretationof sbquo changes with the value of lsaquo and (b) knowledge of sbquo andlsaquo is not suf cient for recovering E6Y mdashX7 there is no reasonfor economists to be interested in estimates of sbquo under theseassumptions In other words sbquo cannot be the sole focus ofthe researcher because the question it answers changes withthe value of nuisance parameters Wooldridge then suggestedan alternative speci cation that always allows the researcherto recover the conditional expectation E6Y mdashX7

In empirical work this distinction between the estimandand the parameters of the statistical model is consistent withthe now common practice of reporting estimates of averagederivatives in binary response models rather than reportingestimates of the logit or probit coef cients Unlike a linear

regression model there is no direct link from one of the coef- cients in the logit or probit model to average causal effectsand thus there is no intrinsic interest in such coef cients

This view is at odds however with a large part of the semi-parametric literature An exception is the work by Stoker (egStoker 1986) who focused on estimation of index coef cientsin settings where these are proportional to average derivativesand thus directly linked to changes in predictions Considerfor example the work on semiparametric estimation of binaryresponse models In this literature such models are esti-mated without making logistic or probit assumptions insteadonly making conditional mean or median assumptions in alatent index interpretation (eg Manski 1985) This literaturehowever has begged the question of why economists shouldbe interested in the coef cient estimates in these models inthe absence of a direct link between these coef cients andthe choice probabilities or their derivatives Similarly someof the models with xed effects in panel data with limiteddependent variables have focused on estimation of parametersthat in themselves do not allow for estimation of conditionalexpectations or their derivatives and thus do not allow for esti-mation of causal effects See Arellano and Honoreacute (in press)for a survey of many of these methods

2 IDENTIFICATION

After deciding on the estimand the next step is to makesubstantive assumptions on the process that generated the dataThis is where economic as opposed to statistical theory playsa key role Theoretical considerations may suggest that certainvariables have no direct causal effect on others because theydo not enter into agentsrsquo utility function nor do they affectthe constraints these agents face For example in some mar-kets it may be reasonable to postulate the existence of demandand supply function and assume that their intersection deter-mines observed prices and quantities In that case it may beargued that certain variablesmdashfor example weather conditionsin agricultural marketsmdashaffect supply at xed prices but notdemand because weather conditions do not affect utility ofthe buyers nor do they constrain their choices given pricesSimilarly theoretical considerations may suggest which vari-ables determine agentsrsquo fertility choices and which variablesare excluded from such choices as in the structural modelsdescribed in Section 12 of Angrist

For the purpose of considering such exclusion restrictionsas well as other assumptions it is important to formulate themin a way that economic theory can be brought to bear on themThis makes the formulation in terms of counterfactuals orpotential outcomes that Angrist advocates particularly appro-priate The potential outcomes describe outcomes in differ-ent environments and as such are the primitives of economicanalyses as well as choices under different sets of constraintswhich are the result of agents solving constrained optimizationproblems Since economic theory studies such optimizationproblems it is therefore well equipped to assess assumptionsformulated directly in terms of these potential outcomes Anexample of the formulation of the critical assumptions in termsof such potential outcomes is Angrist Imbens and Rubin(1996 AIR from here on) In contrast latent index modelsalthough under some conditions mathematically equivalent to

Imbens Comment 19

the potential outcome framework (eg Vytlacil 1999) formu-late the critical assumptions in terms of associations betweenobserved variables and unobserved residuals which appearsmore dif cult to contemplate [see Imbens (1997) for a dis-cussion of the confusion such formulations have caused in thestatistics literature]

It is rare that economic theory is speci c enough to deter-mine the exact value of the estimand More typical is thatthe theory is consistent with a range of values for the esti-mand Observations on agentsrsquo choices and outcomes may behelpful in narrowing down this range The econometricianrsquostask is to link the data to the estimand Typically a num-ber of additional assumptions are made at this stage Almostalways it is assumed that there is only limited dependenceor no dependence at all between choices made by differentagents and identi cation focuses on the link between the jointdistribution of the observables estimable in large samplesand the estimand Two possibilities arise at this stage Some-times the estimand can be expressed as a functional of thejoint distribution of the observables in which case the esti-mand is identi ed A leading example is where the estimandis the average treatment effect and theory suggests that assign-ment to treatment is random or at least random conditionalon a set of observed covariates (unconfounded assignmentselection on observables) Alternatively the assumptions sug-gested by economic theory do not allow for the direct linkbetween the distribution of observables and the estimand Inthat case the researcher faces some choices One option advo-cated in a series of papers by Manski (see for a general dis-cussion Manski 1995) is to estimate the range of values ofthe estimand consistent with the data given the substantiveassumptions Another option followed in the current article byAngrist is the local average-treatment-effect approach devel-oped by Imbens and Angrist (1994) to consider what aspectsof the estimand are identi ed given data and assumptions Ininstrumental-variables settings the population average treat-ment effect is often not identi ed but the average effect for aspeci c subpopulation may be In that case one may chooseto estimate the average treatment effect for this subpopula-tion and leave the extrapolation to the principal estimand tothe researcher possibly aided by theoretical considerations AsHeckman wrote ldquo It is a great virtue of the LATE parameterthat it makes the investigator stick to the data at hand and sep-arate out the aspects of an estimation that require out of sam-ple extrapolation or theorizing from aspects of an estimationthat are based on observable datardquo (Heckman 1999 p 832)

Let us consider the case studied by Angrist with its focuson the effect of having more than two children on labor sup-ply Angrist argues that the second birth being a multiple birth(eg twins) is a valid instrument for this effect In terms ofthe AIR formulation this requires a multiple birth to be asgood as randomly assigned and the absence of a systematicdirect effect on labor supply other than through its effect onthe number of children Such assumptions may be contro-versial For example fertility treatments may lead to a sys-tematic association between multiple births and choices madeby couples violating the rst assumption Even if we acceptthese assumptions however they only imply that the aver-age causal effect of more kids on labor supply is identi ed

for women who had a third child solely because their sec-ond birth was a multiple birth (compliers in the AIR termi-nology) In my view it is unlikely that this is the populationof primary interest Nevertheless it is the only subpopulationthe data are informative about in the sense of point identi ca-tion under the substantive assumptions and it would appear tooffer some guidance regarding the population average causaleffect to policy makers similar to the way in the medical worldresults from clinical trials in homogenous subpopulations areregarded as useful because they are viewed as indicative ofpopulation average causal effects

3 LIMITED DEPENDENT VARIABLES

Typically economic theory offers some guidance concern-ing the determinants of certain outcomes without specifyingthe exact form or strength of their relationship In that casestatistical modeling is required to complete the speci cationConsider the example Angrist studies with binary outcomebinary endogenous regressor a binary instrument and covari-ates Angrist suggests as one possible approach estimating theaverage treatment effect through a linear probability modelwith instrumenting for an endogenous regressor The bene tsof the linear probability approach stemming from the linear-ity and robustness against misspeci cation of the rst stageappear to me largely illusory At this point the statistical mod-eling is only intended to provide exible approximations to theunderlying conditional distributions This is a fundamentallydifferent role from that played by the substantive assumptionsthat are essential for identi cation Appeals to consistencyunder speci c parameterizations therefore appear irrelevantmdashin a larger sample one may well wish to use a more exiblespeci cation because less smoothing is required In additionto nding the alleged bene ts of the linear probability modelunpersuasive I nd its disadvantages troubling Within smallsubpopulations characterized by extreme values of the covari-ates the smoothing implicit in linear probability models islikely to lead to unattractive predictions compared to pre-dictions based on nonlinear models that respect the limited-dependent-variable nature of the outcomes

An alternative approach is followed in the study of the effectof u shots on hospitalization rates using randomized incen-tives for vaccination by Hirano Imbens Rubin and Zhou(2000 HIRZ from here on) Given their assumptions exten-sions of those made by AIR to the case with exogenous covari-ates there are three subpopulationsmdashcompliers (units whochange treatment status in response to a change in the valueof the instrument) always-takers (who always take the treat-ment irrespective of the value of the instrument) and never-takers (who never take the treatment irrespective of the valueof the instrument) H IRZ modeled the conditional distributionof these three ldquotypesrdquo conditional on covariates as a trinomialdistribution

Pr4TypeiD cmdashXi

D x5 D exp4x0ndashc5

1C exp4x0ndashc5C exp4x0ndasha51

Pr4TypeiD amdashXi

D x5 D exp4x0ndasha5

1C exp4x0ndashc5 C exp4x0ndasha51

20 Journal of Business amp Economic Statistics January 2001

and

Pr4TypeiD nmdashXi

D x5

D 1 ƒ Pr4TypeiD cmdashXi

D x5 ƒ Pr4TypeiD amdashXi

D x50

Now compare this setup to the selection models Angristdescribes in Section 3 In the selection models the equationdescribing the endogenous regressor is Di

D 18ƒ0C ƒ1Zi

Cƒ 0

2Xi gt Daggeri9 Suppose that the instrument is binary and that ƒ1 ispositive Then the two models are very similar with units withƒ0 Cƒ1 Cƒ 0

2Xi gt Daggeri in the selection model classi ed as always-takers in the potential outcome framework (because irrespec-tive of the value of the instrument Di

D 1 for such units) unitswith ƒ0

C ƒ 02Xi lt Daggeri classi ed as never-takers (because irre-

spective of the value of the instrument DiD 0 for such units)

and the units with ƒ0C ƒ 0

2Xi lt Daggeri lt ƒ0C ƒ1

C ƒ 02Xi classi ed

as compliersOne advantage of the trinomial model is that it easily gen-

eralizes to provide an arbitrarily good t to any conditionaltrinomial distribution by including higher-order terms andinteractions in the covariates If there are no substantive rea-sons to impose additional restrictions one should not imposethem implicitly in the speci cation of the statistical modelIn particular in the selection model it is not suf cient to addhigher-order terms to the covariate vector to provide an arbi-trarily good t to the trinomial distribution Such an approxi-mation would have to involve heteroscedasticity and other dis-tributional extensions that are not straightforward to implementin the selection model

Conditional on the individualrsquos type H IRZ speci ed theoutcome distributions given covariates as logistic regressionmodels Again the aim is to provide a exible approxima-tion to the conditional distribution in a manner that does notimpose any implicit restrictions Given that for a binomialdistribution the logistic regression model can be thought ofas providing a linear approximation to the log odds ratiothis choice is again an appealing one An alternative is theprobit model which also provides a good approximation Less

attractive here is the linear probability model since it requiresinequality restrictions on the parameters if the implicit esti-mates of the probabilities are to be bounded between 0 and 1

In cases with other limited dependent variables alternativenonlinear models may be appropriate For example if the out-comes are durations subject to censoring models speci ed interms of hazard functions (eg Lancaster 1979) may be con-venient for dealing with such data

ADDITIONAL REFERENCES

Angrist J (1989) ldquoLifetime Earnings and the Vietnam Era Draft LotteryEvidence From Social Security Administrative Recordsrdquo American Eco-nomic Review 80 313ndash335

Angrist J and Krueger A (1991) ldquoDoes Compulsory School AttendanceAffect Schooling and Earningsrdquo Quarterly Journal of Ecomonics 106979ndash1014

Arellano M and Honoreacute B (in press) ldquoPanel Datardquo in Handbook of Econo-metrics eds J Heckman and E Leamer Amsterdam Elsevier North-Holland

Goldberger A (1991) A Course in Econometrics Cambridge MACambridge University Press

Heckman J (1999) ldquo Instrumental Variables Response to Angrist andImbensrdquo Journal of Human Resources 34 828ndash837

Heckman J and Smith J (1997) ldquoEvaluating the Welfare Staterdquo in Econo-metrics and Economics in the 20th Century The Ragnar Frisch Centenaryed S Strom New York Cambridge University Press pp 214ndash318

Hirano K Imbens G Rubin D and Zhou A (2000) ldquoEstimating theEffect of Flu Shots in a Randomized Encouragement Designrdquo Biostatistics1 69ndash88

Imbens G (1997) Book Review of The Foundations of Econometric Analy-sis by D Hendry and M Morgan Journal of Applied Econometrtics 1291ndash94

Lancaster T (1979) ldquoEconometric Methods for the Analysis of DurationDatardquo Econometrica

Manski C (1985) ldquoSemiparametric Analysis of Discrete Responserdquo Journalof Econometrics 27 313ndash333

mdashmdashmdash (1995) Identi cation Problems in the Social Sciences CambridgeMA Harvard University Press

Marshak J (1953) ldquoEconomic Measurements for Policy and Predictionrdquoin Studies in Econometric Method eds W Hood and T KoopmansNew York Wiley pp 1ndash26

Stoker T (1986) ldquoConsistent Estimation of Scaled Coef cientsrdquo Economet-rica 54 1461ndash1481

Vytlacil E (1999) ldquo Independence Monotonicity and Latent Index ModelsAn Equivalency Resultrdquo unpublished manuscript University of ChicagoDept of Economics

Wooldridge J (1992) ldquoSome Alternatives to the BoxndashCox RegressionModelrdquo International Economic Review 33 935ndash955

CommentRobert A Mof tt

Department of Economics Johns Hopkins University Baltimore MD 21218 and National Bureau of EconomicResearch Cambridge MA (mofttjhuedu)

The Problem Although the article by Angrist rangesacross a number of issues much of the discussion and thearticle title suggests that the problem of concern is that instru-mental variables ( IV) cannot be used in one of three commonmodels Let the rst model be y D Csbquod CxbdquoChellip where y isan absolutely continuous variable but d is binary and wherex is independent of hellip but d is not Then sbquo can be consistentlyestimated with IV (Heckman and Robb 1985) Let the secondmodel be y uuml D C sbquod uuml C xbdquo C hellip where y uuml and d uuml are contin-

uous and where y D 14y uuml gt 05 and d D d uuml are the observedvariables The parameters of this model can likewise be esti-mated by IV with some auxiliary assumptions (Newey 1986see Blundell and Smith 1993 for a review of alternative meth-ods) But let the third model be y uuml D C sbquod C xbdquo C hellip where

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

Moftt Comment 21

again y D 14y uuml gt 05 but now d is binary The parameters ofthis model cannot be estimated by standard IV Many of theexamples in the Angrist article involve the censored regres-sion model rather than the binary choice model but this andthe other main points in the article apply to both

The Angrist Solution The Angrist solution to the problemposed by the third of these models where IV is not applica-ble is to declare sbquo an uninteresting parameter and not worthyof estimation [or in his words to ldquopuntrdquo (p 8)] This is remi-niscent of the solution to the Vietnam War suggested by some1960s commentators The United States should have simplydeclared victory quickly withdrawn and hoped that no onenoticed that they had in fact lost

If sbquo is uninteresting what is interesting according toAngrist His answer is that interest should center on leastsquares approximations with y as the dependent variable andd and x as regressors If endogeneity of d is a problem linearIV should be applied with linear IV given a LATE interpre-tation He provides an empirical illustration

I will comment on three questions raised by the Angristarticle (1) Is sbquo an interesting parameter (2) Is the Linear IVestimator using observed y and d which Imbens and Angrist(1994) relabeled the LATE an interesting quantity (3) Is theillustrative application given in the Angrist article interesting

Is sbquo an Interesting Parameter As Angrist notes manyof the issues raised by his solution have nothing to do withthe endogeneity of d Whether sbquo is an interesting parame-ter is one of these at least in part Given Angristrsquos prefer-ence for linear projections his position necessarily implies thatall nonlinear models of which the latent index model (L IM)is just one example are uninteresting (Angrist is not com-pletely consistent in his position because he does propose anonlinear model with an exponential conditional mean func-tion in one section of his article that model is subject to thesame objections that he subjects the LIM tomdasharbitrary non-linearities etc) Thus his position boils down to a preferencein the binary dependent-variable case for the linear proba-bility model (LPM) over the LIM and other nonlinear mod-els This issue has been discussed for many years even in asimultaneous-equations context [eg Heckman and MaCurdy(1985) which Angrist does not reference]

The dif culty with Angristrsquos polar position on this issuemdashthat the LPM is the only model of interestmdashis that it is funda-mentally untenable The LPM is attractive because it is easyto interpret providing parameter estimates that do not requiretransformation to learn the effects of a regressor on the meanof the dependent variable It has a role to play in empiricalwork in summarizing the data as regards the conditional meanfunction and for initial explorations of the data and virtu-ally all practitioners use it for this purpose But beyond thisit has no defense Either it is equivalent to the latent-variablemodel if the model is saturated or if the model is not satu-rated imposes functional form restrictions that may not holdand may t the data worse than the latent-variable model

Thus if the model is y D 14 C sbquod C hellip gt 05 where d is asingle dummy variable regressor independent of hellip the probit(say) estimates of and sbquo map one-to-one onto the leastsquares intercept and coef cient on d in the LPM and hencethere is no gain to estimating the model either way because the

estimators are equivalent The same point extends to the caseof a multinomial d If the model is y D 14 C sbquod C xbdquo C hellip gt

05 where x and bdquo are vectors of exogenous covariates andcoef cients respectively the LPM estimate of the effect ofd on E4y5 conditional on x may be demonstrably worsethan the probit estimator of the same effect even for theobject of estimating E4ymdashd1 x5 at points 8d1 x9 observed inthe data (Angrist falls back on the weak principle that leastsquares estimates provide best linear approximations but theissue is that the true model might be nonlinear) Nothingin Angristrsquos article provides evidence that the probitor anyother latent index formulation provides an inferior estimate of6E4ymdashd11 x5 ƒ E4ymdashd01 x57 compared to the LPM for the twoestimators smooth the joint response function with respect tod and x in different ways

Although the latent-variable model has no necessary claimfor superiority in the class of all nonlinear models its pop-ularity rests on its ability to generate a wide variety of non-linearities with a relatively parsimonious speci cation arisingfrom the convolution of the latent index with the cdf of hellipExpansion of the LPM to incorporate equivalent nonlinearitiesis cumbersome and inef cient

The possible inferiority of the LPM in capturing nonlinear-ities in nonsaturated models is also important for interpolationand extrapolation to points not in the observed data Gettingthe nonlinearities right in the observed data is important ininterpolation and extrapolation if the true model is nonlinearas the binary choice model necessarily is (since y is boundedby 0 and 1) Although Angrist at one point does mentionpredictionmdashstating that the linearized models he proposes area good ldquojumping-off point for any prediction exerciserdquo a state-ment without supportmdashhe is for the most part not interestedin prediction so much as summarizing the observed data Thisis ne but unfortunately prediction outside the observed his-torical data is the pervasive concern of the policy makers whoare the ultimate consumers of applied economic research forthey must make predictions of new policies in new environ-ments on a daily basis as part of their jobs The methods pro-posed in Angristrsquos article are therefore not very useful for theformulation of new policy

It should be noted that some make the argument that theoryand economic models must play a role in guiding the formu-lation of empirical relationships used for prediction outsidethe range of the observed data That position is not taken herebecause it is not necessary to establish the potential superior-ity of the L IM over the LPM in nonsaturated models and forprediction Occamrsquos razor makes it unnecessary

It should also be noted that these issues have nothing todo with causal effects because d is assumed exogenous Thusthe opposition that Angrist poses between L IMrsquos and ldquocausaleffectsrdquo models is a false one at least in general If a ldquocausaleffectrdquo is de ned as the true effect of a variable d on onlythe rst moment of a variable y (a rather restricted de nitionof causal effect) that causal effect can be derived from thelatent-variable model just as well The issue is instead merelynonlinearity of the function E4ymdashd1 x5

Angrist also states that the L IM requires ldquoconstant coef- cientrdquo and ldquodistributional assumptionsrdquo for identi cation

22 Journal of Business amp Economic Statistics January 2001

unlike the LPM Angrist does not make suf cient quali ca-tions to these assertions and they are indeed false for theL IM A random coef cient sbquo in the LIM is permissible with-out distributional assumption ( Ichimura and Thompson 1998)and the coef cients in the standard constant-coef cient L IMare identi ed semiparametrically (ie under unknown distri-bution of hellip) under the restriction that d and hellip are independentan assumption generally made for causal interpretation of theLPM model as well (Manski 1988 Horowitz 1993 Powell1994) In addition as already noted in a fully saturated modelthe LPM and the L IM are equivalent in any case

Is the LATE an Interesting Statistic The Angrist assertionthat LATE is the only interesting statistic if d is endogenousor perhaps the only statistic worth bothering about because itis the only one identi ed by the data has the same partialvalidity as his preference for the LPM over the LIM IV isone of the most popular techniques in applied econometricsand has a natural intuition indeed one that does not requirethe LATE interpretation per se It is one of the most usefultools in the applied economistrsquos kit The simple IV estimatordiscussed by Angrist makes minimal assumptions and givesminimal information back to the analyst as a result But tosay that it produces the only statistic of interest does not havedefense both because it is equivalent to an LIM that is fullysaturated and may t the data worse than an LIM if not satu-rated and because it implies that there is no value to makingadditional assumptions to obtain additional information

The limiting nature of the LATE statistic is again in itsinattention to nonlinearities interpolation and extrapolationThat nonlinearities can be important is as true in this caseas it was in the exogenous case just discussed when thereare additional x covariates and when the model is not satu-rated As for interpolation and extrapolation the LATE statis-tic 6 Ny4z D z15ƒ Ny4z D z057=6 Nd4z D z15ƒ Nd4z D z057 denotes theeffect on a change in z from value z0 to z1 on E4y5 scaledby the change in the E4d5 It does not have implications forthe effect on E4y5 of any other change in z or for a changein any other policy variable Thus it is not particularly usefulfor policy changes other than a change of z0 to z1 in the sameenvironment (ie conditioned on the same x) This stands incontrast to an L IM for E4dmdashx1 z5mdashor any parametric modelfor d for that mattermdashthat allows for the change from z0 toz1 to inform policy makers and others of the likely changesof other values of z and of other variables x In fact theLATE statistic which is not a parameter in the usual senseof the word can always be expressed as a nonlinear combi-nation of parameters of a latent-variable model but not viceversa hence the latter model is more general than the former[See Heckman and Vytlacil (1999) for a discussion of how toexpress the LATE in terms of an L IM see also Angrist andImbens (1999) and Heckman (1997 1999) for a discussion ofsome of these issues]

In addition Angristrsquos focus on the advantages of the LATEfor model-free estimation of the effect of a change in z fromz0 to z1 on y reveals a philosophical weakness in his posi-tion This is because the effect of a change in z from z0 toz1 on y can be estimated from the reduced form the struc-tural form is not needed It is not explained why the statistic

6E4ymdashz D z15 ƒ E4ymdashz D z057 is not the only quantity of inter-est The answer that most economists would give is to say thatstructural coef cients are of interest because they can be usedto extrapolate the effects of changes in d on y to other changesin z than from z0 to z1 in the same environment indeedthis is the basic rationale for structural estimation given inthe famous essay by Marschak [1953 see also Christ (1994)and Heckman (2000) for discussions Marschakrsquos argumentwas more subtle than this arguing that restrictions on struc-tural parameters are needed for out-of-sample prediction butthis translates into restrictions on the reduced form as well]However by ruling out the possibility of learning about anyof the effects of changes in z or d other than those inducedby a change in z from z0 to z1 Angrist removes the need todo structural estimation in the rst place Angrist implicitlyassumes that a structural coef cient of interest exists on thevariable d in the y equation and that that coef cient has mean-ing independent of a particular change in z but the rest of hisdiscussion contravenes that interest

A reduced-form research program which is where theAngrist position leads is of considerable value There is noth-ing wrong with a research program to collect a large body ofinformation on the effects of a wide variety of changes in dif-ferent policy variables 4z5 from particular values (z0) to otherparticular values (z1) each taking place in particular environ-ments (x) But if the research program stops there very littleuseful has been learned other than a collection of facts aboutthe effects of particular policies in particular environments

Is the Application in the Angrist Article Interesting TheAngrist article uses a recently popular exclusion restriction ornatural experiment to identify the effect of fertility on laborsupply and to illustrate his preferred methodsmdashnamely theuse of twins The use of twins as an exclusion restrictionstands in contrast to variations in a government policy or lawwhich are often used as exclusion restrictions both in recentwork on natural experiments as well as a much older litera-ture that uses cross-sectional and overtime variation in state-or country-speci c taxes and transfer rules to identify modelparameters However although it can be of interest to estimatereduced-form policy effects 6E4ymdashz D z15 ƒ E4ymdashz D z057 asjust discussed it is not so obvious that there is any interest inestimating the effects of having twins on the expected valueof y Creating twins is not a variable directly subject to policymanipulation

The dif culty in the use of twins arises for the same reasonalready discussedmdashnamely that the preference for model-freeestimation of policy effects leads to a lack of interest in thefunction E4dmdashz1 x5 and hence to a lack of interest in what canbe learned from a study of twins about the effects of someother more relevant policy variable that might be manipulatedA model for the E4dmdashz1 x5 is needed to make that connectionand hence to make a study of twins of any interest yet that isintentionally eschewed in the approach proposed by AngristOnce again this leads to an uninteresting and quite limitingset of exercises

Conclusions The set of methods laid out in the Angristarticle at least if a more relevant instrument were used yieldsa set of model-free statistics that are suitable for exploratorywork on a research question prior to the (possibly nonlinear)

Mullahy Comment 23

structural estimation and prediction that should be the ultimateobject of applied economics research

ACKNOWLEDGMENTS

This comment is a revision of remarks delivered at theJoint Statistical Meetings Baltimore August 1999 I thankCarl Christ James Heckman Joel Horowitz Michael KeaneThomas Mroz Geert Ridder and Edward Vyltacil for com-ments with the usual disclaimer that the views expressed hereshould not be taken as representing those of any of theseindividuals

ADDITIONAL REFERENCES

Angrist J and Imbens G (1999) Comment on ldquo Instrumental VariablesA Study of Implicit Behavioral Assumptions Used in Making Pro-gram Evaluationsrdquo by J J Heckman Journal of Human Resources 34283ndash827

Blundell R and Smith R (1993) ldquoSimultaneous Microeconometric ModelsWith Censored or Qualitative Dependent Variablesrdquo in Handbook of Statis-tics (Vol 1) eds G S Maddala C R Rao and H D Vinod AmsterdamNorth-Holland pp 117ndash141

Christ C (1994) ldquoThe Cowles Commissionrsquos Contributions to Econometricsat Chicago 1939ndash1955rdquo Journal of Economic Literature 32 30ndash59

Heckman J (1997) ldquoInstrumental Variables A Study of Implicit BehavioralAssumptions Used in Making Program Evaluationsrdquo Journal of HumanResources 32 441ndash462

mdashmdashmdash (1999) ldquo Instrumental Variables A Reply to Angrist and ImbensrdquoJournal of Human Resources 34 828ndash837

mdashmdashmdash (2000) ldquoCausal Parameters and Policy Analysis in Economics ATwentieth Century Retrospectiverdquo Quarterly Journal of Economics 11545ndash97

Heckman J and MaCurdy T (1985) ldquoA Simultaneous Equations LinearProbability Modelrdquo International Economic Review 18 21ndash37

Heckman J and Vytlacil E (1999) ldquoLocal Instrumental Variables andLatent Variable Models for Identifying and Bounding Treatment EffectsrdquoProceedings of the National Academy of Sciences 96 4730ndash4734

Horowitz J (1993) ldquoSemiparametric and Nonparametric Estimation of Quan-tal Response Modelsrdquo in Handbook of Statistics (Vol 11) eds G SMaddala C R Rao and H D Vinod Amsterdam North-Holland pp45ndash72

Ichimura H and Thompson T (1998) ldquoMaximum Likelihood Estimation ofa Binary Choice Model With Random Coef cients of Unknown Distribu-tionrdquo Journal of Econometrics 86 269ndash295

Manski C (1988) ldquoIdenti cation of Binary Choice Modelsrdquo Journal of theAmerican Statistical Association 83 729ndash738

Marschak J (1953) ldquoEconomic Measurements for Policy and Predictionrdquoin Studies in Econometric Method eds W Hood and T KoopmansNew York Wiley pp 1ndash26

Powell J (1994) ldquoEstimation of Semiparametric Modelsrdquo in Handbook ofEconometrics (Vol 4) eds R Engle and D McFadden Amsterdam North-Holland pp 2443ndash2521

CommentJohn Mullahy

Departments of Preventive Medicine and Economics University of Wisconsin Madison WI 53705 andNational Bureau of Economic Research Cambridge MA (jmullahyfacstaffwiscedu)

Articles bearing titles containing phrases like ldquoSimpleStrategies for Empirical Practicerdquo are often wolves masquerad-ing as sheep the methodologies they devise being far fromldquosimplerdquo and far removed from what most humble practition-ers perceive in the realm of ldquoempirical practicerdquo Not so hereJoshua Angrist has written a tight comprehensive article thatis stimulating and important yet also eminently useful

As typi es much of Angristrsquos work the main concern in thisarticle is on how to elicit interesting characterizations of causaleffects from microdata with the particular twist here being afocus on outcomes measured as ldquolimited dependent variablesrdquo(LDVrsquos) The main take-away message I glean from this arti-cle is that applied analysts working on causal effect or struc-tural analyses in LDV contextsmdashtraditionally vexing contextsinsofar as consistent estimation and inference are concernedmdashhave considerable grounds for optimism Angrist lays out andinterprets systematically a set of issues and methods that pro-vide practitioners with a variety of implementable strategiesthat might be brought to bear on such empirical problemsA corollary take-away message is that in some respects ldquothisstuff is not really as hard as wersquove tended to make itrdquo withAngrist demonstrating for instance the potential merits ofsimple linear instrumental variable ( IV) methods for estimat-ing causal effects in a variety of LDV contexts In no eventcan applied analysts escape the requirement of nding theo-retically sound instruments but Angrist exposits compellingly

a broad description of how such instruments might be used toelicit interesting causal inferences in LDV contexts

I have no real quibbles with any of the substance ofAngristrsquos arguments Rather I will devote my commentarymainly to amplifying and expanding several of the themes hedevelops throughout the article

1 FOCUS ON CAUSAL OR PARTIAL EFFECTSVERSUS FOCUS ON CONDITIONAL

EXPECTATIONS FUNCTIONS

It seems fair to suggest that much of applied microecono-metrics is concerned primarily with understanding the signsand magnitudes of quantities like iexclE6Y mdashX1 D7=iexcl4X1 D5 oriexclE6Y mdashX7=iexclX0 Yet much of the actual dirty work in under-taking causal analysis in LDV contexts seems to result fromdecisions to undertake analyses in settings where E6Y mdashX1D7

or E6Y mdashX7 are restricted to be positive without a priori restric-tions on parameter values [thus the tradition of using tobit-class conditional expectations functions (CEFrsquos) two-stepselection models exponential CEFrsquos etc]

As practitioners however we should pause to assesswhether speci cations akin to E6Y mdashX1D7 D exp4Xsbquo C D5 gt

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

24 Journal of Business amp Economic Statistics January 2001

0 are ultimately buying anything important with respect tothe rst-order questions being explored It is only com-mon sense that analysts should spend relatively more energyworking on understanding quantities of primary interest(eg iexclE6Y mdashX1D7=iexcl4X1 D5) and worrying relatively lessabout the formulation and implications of assumptions (egE6Y mdashX1 D7 gt 0) that may have little or no ultimate bearingon the particular questions being addressed This is espe-cially so when the latter effort (restricting via functional formchoice E6Y mdashX1 D7 gt 0) will tend to complicate rather thanfacilitate the former effort (understanding the partial or causaleffects) This is not to suggest that quantities like E6Y mdashX7 andE6Y mdashX1 D7 may not themselves be interesting to estimate (egmodeling conditional mean health care expenditures for use inforecasting future expenditure levels) but whether E6Y mdashX1D7

is parametrically and parsimoniously better approximated byexp4XsbquoC D5 or by XsbquoC D or by some other g4Xsbquo CD5

or h4X1D1 ƒ5 ismdasheven in LDV settingsmdashultimately an empir-ical matter to be assessed by goodness of t (eg via condi-tional moment tests)

I note this important angle somewhat sheepishly sincein both articles of mine that Angrist cites (Mullahy 19971998) there is considerable emphasis on the use of exponen-tial CEF speci cations as enforcers of positive conditionalmean ldquorequirementsrdquo In fact the main motivation for theearlier article was to nd a general approach to obtainingconsistent estimators of structural parameters in parametricsettings less distribution-bound than tobit-class models whenthe key requirement or side constraint is E6Y mdashX1 D7 gt 0 withthe resulting strategy a more or less brute-force method foraccomplishing this in a nonlinear IV setting Angristrsquos arti-cle takes this idea quite a bit further and demonstrates how(and with the addition of a further normalization) the struc-tural parameter estimation approach I discussed can be devel-oped into a model to analyze causal effects when the latter arecharacterized as ldquoproportional treatment effectsrdquo

Whether a proportional as opposed to an additive treatmenteffect is interesting (it may or may not be) remains to be seenin any particular application Wooldridgersquos recent work onaverage causal effects (ACErsquos Wooldridge 1999) would seempertinent in contexts in which the role of the X covariates ismore ldquointrusiverdquo than in the simple additivelinear treatment-effect setup As a general matter the extent to which infer-ences about causal effectsmdashhowever formulatedmdashhinge onthe inclusion of particular covariates as conditioning regres-sors is an interesting and potentially important angle thatAngristrsquos article begins to explore and is one that likely tomerit additional research For the particular issues at handthe very nature of proportional treatment effects brings intofocus explicitly the role of the X rsquos so how their correlationwith the treatment indicators 4D5 in uences inferences aboutthe causal impacts of the latter would seem to be a rst-orderconsideration

2 TWO-PART MODELS

I found Angristrsquos analysis of the interpretation of causaleffects in two-part models to be extremely illuminatingIn a reduced-form setting I argued (Mullahy 1998) that amid

all the concerns about zeros robustness outliers transforma-tion retransformation and such that have typically attendedmodeling efforts involving two-part models it would be sur-prising in a well-structured empirical investigation if therewere not stated or lurking concerns about partial effectsiexclE6Y mdashX7=iexclX andor CEFrsquos E6Y mdashX7 themselves It may be insome cases that such concerns are not articulated and it mayeven be in less well-structured problems that they are not obvi-ous to analysts themselves Yet for instance unless robustlyestimated sbquorsquos from a log-linear part-2 speci cation of a two-part model could inform a rst-order question about a par-tial effect or a CEF of what practical use were they likelyto be My rather modest and rather obvious argument wasjust that the concerns about the iexclE6Y mdashX7=iexclX or the E6Y mdashX7mdashif they are the reason the analysis is being conducted in the rst placemdashshould enjoy rst-order prominence in the estima-tion exercise and that concerns about zeros outliers and suchshould be relegated in some sense to second-order status Insome cases two-part models will serve nicely to address such rst-order concernsmdashat least in reduced-form settingsmdashbut inothers they will not

Angrist appears to have some sympathy with such argu-ments but more importantly he provides a valuable serviceto users of two-part models by unearthing one major limi-tation in the analysis of causal effects The problem is notwith part 1 (ie the logit probit or linear probability com-ponent) because part 1 of the two-part model falls within themain lines of Angristrsquos LDV analysis Rather the complicationarises with part 2 of the two-part model in which the additionalldquoY gt 0rdquo conditioning arises In the standard (reduced-form)two-part model quantities like E6Y mdashX1Y gt 07 are prominentand identi ed readily But in the counterfactual settings inwhich causal effects are manifested wherein the realized Y

arises from self-selection into or out of treatment such selec-tion effects introduce an ambiguity [Eq (12)] and therebyconfound the ability to glean causal effects from part-2 esti-mates (unless as Angrist notes censored regression methodsare used but Angrist also offers some compelling argumentsagainst blind reliance on censored regression approaches) Assuch if estimation of and inference about causal effects arethe primary analytical concerns in data settings with Y para 0and Pr4Y D 05 gt 01 analysts may be well advised to avoidtwo-part modeling strategies and pursue some of the moredirect linear and nonlinear estimation approaches discussedby Angrist in which one-part estimation approachesmdashzerosand allmdashyield estimates of causal effects that will be directlyinterpretable

3 BEYOND CEFrsquoS

One noteworthy feature of the methods Angrist discussesmdashwith the main results attributed to Abadie (1999)mdashare theirapplicability to estimation of causal effects for characteristicsof the conditional distribution rdquo4Y mdashX1D5 beyond just CEFrsquosEstimation of causal or treatment effects for conditional quan-tiles (QTErsquos) and conditional distribution function ordinates(ldquodistilesrdquo) Pr4Y lt cmdashX1D5 is demonstrated to t properlywithin the weighting strategies advanced by Abadie Since

Todd Comment 25

quantile and distile analyses may be more relevant in address-ing particular analytical questions or policy issues than CEFrsquosthe ability to extend the mechanisms of causal-effect esti-mation strategies in such directions should be welcomed byapplied researchers

I might suggest that Angristrsquos claim about an advantage ofquantile causal analysis over distile causal analysis being thatthe latter relies on application-specic values of ordinates (c)is perhaps oversold or underdeveloped In many interestingand policy-relevant applications focus on the application-speci c ordinates is fundamental What factors cause employ-ers to offer more than one health insurance plan to employeesWhat factors cause consumers of alcoholic beverages to drinkmore than two drinks per day What factors cause elderlyindividuals to utilize more than one prescription drug prod-uct What factors cause second earners to work more than17 hours per week Each of these concerns an importantreal-world causal question for which the application-specicordinatemdashwhether for institutional legal physiological orother reasonsmdashis a fundamental ldquopivot pointrdquo for the analysisAnalysis of such distile relationshipsmdashwhether in causal or inreduced-form settingsmdashis a potentially powerful method forinforming decision makers and it seems to me that its mer-its relative to quantile analyses would have to be judged on acase-by-case basis

4 CONCLUSIONS

Practitioners of applied microeconometrics will nd thatthe time invested in a careful reading of Angristrsquos article hassizable returns not merely because of the analytical and con-ceptual insights it offers but ultimately also because it sug-gests a variety of important practical innovations that appliedresearchers can exploit empirically to more clearly understandthe nature of causal relationships in their data Applied microresearchers in areas like health labor development public nance and such work commonly with data wherein (a) out-come measures are limitedmdashoften most importantly becausethey are nonnegative with mass points at 0mdashand (b) the rolesof conditioning covariates (X rsquos) are most conveniently summa-rized via regression methods Angristrsquos article provides appliedresearchers so endowed with a set of powerful tools for elic-iting causal inferences from such data Of course the punchline is familiar and inevitable All bets are off without theo-retically solid instruments But with such instruments in handthe strategies exposited by Angrist offer analysts a rich set ofperspectives on estimation

ADDITIONAL REFERENCE

Wooldridge J M (1999) ldquoEstimating Average Partial Effects Under Con-ditional Moment Independence Assumptionsrdquo mimeo Michigan StateUniversity Dept of Economics

CommentPetra Todd

Department of Economics University of Pennsylvania Philadelphia PA 19104 (petraathonasasupennedu)

This article considers ways of estimating the effect ofbinary endogenous regressors in models with limited depen-dent variables It questions the usefulness of conventionalestimation strategies aimed at recovering structural modelparameters and advocates the use of simple instrumental vari-ables ( IV) estimators as an alternative on the grounds thatthese estimators invoke weaker assumptions and often suf ceto answer questions of interest in empirical studies In theauthorrsquos view the main challenge facing empirical researchersis the problem of identi cation of ldquotreatment effectsrdquo throughIV Here I expand the discussion by considering two ques-tions (1) What are the limitations of the causal model asa paradigm for policy analysis (2) When simple estimatorsare suitable for answering a question of interest what are thetrade-offs that need to be considered in using them

1 WHEN IS THE ldquoCAUSAL MODELldquoA USEFULPARADIGM FOR POLICY ANALYSIS

The article gives the impression that most interesting ques-tions in economics can be answered within the context ofthe ldquocausal modelrdquo [variously attributed to Neyman (1923)Fisher (1935) Cox (1958) Roy (1951) and Rubin (1978)]The causal model is a very general framework that assumesthat there are potential outcome states associated with having

received some treatment and having received no treatment (orpossibly a different treatment) For each individual only oneof the potential outcome states is observed which leads to amissing-data problem in attempting to draw inferences aboutaspects of the treatment effect distribution

The language of potential outcomes is very general soalmost any economic problem could be formulated in theseterms However this does not mean that estimators proposedin the literature for the causal model are useful for all or evenmost economic problems A major limitation of the modelis that it assumes that the state ldquowith treatmentrdquo has beenobserved for at least a subset of people In medical trials orbiological experiments this assumption is probably reason-able but in economics we often are interested in evaluatingeffects of treatments that have not yet been implemented Forexample we might be interested in predicting the effect ofraising the age receiving Social Security bene ts to a new ageor the effect of introducing new term limits on welfare par-ticipation An implicit assumption needed to apply any of theidenti cation results described in this article is that both Y1

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

26 Journal of Business amp Economic Statistics January 2001

and Y0 are observed which would not be satis ed in manycases Thus the causal model and the estimators developedfor it are inadequate when it comes to any economic questioninvolving a state of the world that has not been observed

Despite these limitations treatment effect estimates aresometimes used to predict consequences of new treatments orof existing treatments for new populations When the policychange being considered is very close to changes observedin the data and is to be applied to similar individuals thenextrapolation may be reasonable But if the policy change is(a) of a magnitude outside the range of experience (b) alonga new dimension or (c) intended for a new population then itis an open question as to how reliable the estimates would beTreatment impact estimates are not intended to capture any-thing invariant to changing conditions The assumptions thatwould be required to justify extrapolations or generalizationsare often quite strong To some extent it is an illusion thatthese estimators allow for general kinds of inference underweak assumptions because the assumptions required to justifytheir application for particular purposes are often not madeexplicit

Angrist argues that the identi cation results for the localaverage treatment effect (LATE) estimator provide a ldquofounda-tion for credible causal inferencerdquo (p 5) and a ldquominimum con-troversy jumping-off point for any prediction exerciserdquo (p 5)But as discussed previously the causal model maintains sucha high level of generality that it offers little guidance for manykinds of problems It is important to recognize when and whennot the LATE estimator is likely to be useful This articleclaims that ldquoin practice estimates of LATE differ little fromestimates based on the stronger assumptions invoked to iden-tify effects on the entire treated populationrdquo (p 5) But suchan extrapolation is not justi ed under the theory and is surelyunlikely to hold in all circumstances As shown in a num-ber of works by various authors LATE provides the averageeffect of treatment for the group of ldquocompliersrdquo who are peo-ple induced by the instrument to receive treatment This groupdoes not except under very special circumstances correspondto the group of treated persons and may in fact representonly a small fraction of persons receiving treatment So notonly does LATE not suf ce for answering problems of the sortdescribed by (a) (b) and (c) it also does not generally pro-vide the average effect of treatment on the treatedmdashthe mostcommon parameter of interest in analyzing social programsAnother not-so-attractive feature of the estimator is that thepopulation of ldquocompliersrdquo is usually not identi able so wecannot say exactly whose treatment impact is being estimatedApplying the LATE parameter estimate to the full populationof treated persons would generally require ruling out certaintypes of heterogeneity in individual responses to treatmentas discussed by Heckman (1997) The explicit assumptionsthat could justify this type of extrapolation were shown byHeckman and Vytlacil (2000 in press)

2 ESTIMATORS SHOULD BE TAILOREDTO THEIR APPLICATION

A general theme throughout the article is that wheneverpossible researchers should restrain from imposing paramet-ric restrictions in estimation because the structure they impose

may be wrong resulting in biased estimates This view isexpressed for example in the discussion of the two-partmodel The article advocates the use of a two-part approachinstead of a conventional selection-model approach maintain-ing that the two-part approach is less likely to lead to biasedparameter estimates because it does not impose cross-equationrestrictions as a conventional tobit estimator would

How do researchers decide whether or not to impose restric-tions across estimating equations that are structurally linkedThe most agnostic approach would be to estimate both equa-tions completely nonparametrically but this approach is rarelypractical Nonparametric estimators are consistent under themost general conditions but in conventional size samples thestandard errors would be so large that the estimates would forpractical purposes be useless Researchers impose structurebecause they are willing to restrict the class of models underwhich their estimators are consistent in exchange for greaterprecision These ef ciency considerations would also apply inthe deciding whether or not to impose cross-equation restric-tions in estimating simultaneous-equations models If we areright about the structure there is an ef ciency gain fromimposing the restrictions If we are wrong imposing them maylead to bias When there are overidentifying restrictions it isat least possible to develop tests of the model speci cation thatcan serve as a guide in choosing a structural model that is sup-ported by the data This article presents a dichotomy betweenthe two-part model and a fully parametric traditional selec-tion speci cation but there are many other modeling choicesranging from fully nonparametric to semiparametric to fullyparametric In modest size samples ef ciency is an impor-tant consideration and a parametric model may be the mostappropriate one For very large samples fewer parametricapproaches are feasible Both the probability-of-working andthe hours-worked equations could for example be estimatedby a semiparametric method such as semiparametric leastsquares ( Ichimura 1993) Which estimation method is mostappropriate ultimately depends on how much data is avail-able and how many parameters are to be estimated It doesnot make sense to criticize the use of structured approaches infavor of more exible modeling approaches without regard tothe context in which they are being applied

3 ON THE GAP BETWEEN THEORY ANDEMPIRICAL PRACTICE

Finally another recurring emphasis in the article is on thevalue of shortcuts in empirical practice In the authorrsquos viewthere is a gap between the econometric theory and what isfeasible in practice and shortcuts and approximations help tobridge this gap To this end the article extols for examplethe virtues of ordinary least squares (OLS) as a way of approx-imating any unknown conditional expectation function with-out much concern as to whether a best linear approximation isa good approximation If the model has discrete regressors andis fully saturated it is well known that OLS is equivalent to anonparametric estimator But in other cases a linear approxi-mation could be highly biased and the bias will not go awayas the sample size gets large There are a variety of alterna-tive asymptotically unbiased estimators Along similar lines

Angrist Reply 27

the article suggests that in implementing a ldquoselection-on-observablesrdquo estimator it may be all right simply to ignore theweighting scheme Such claims seem to me fairly outrageousbecause failure to take into account weighting could lead tovery different estimates and would not provide a consistentestimator of the parameter of interest Heckman Ichimura andTodd (1998) discussed alternative estimators that are not dif -cult to implement and take into account the weighting scheme

Finally the appendix presents a variety of shortcuts forobtaining estimates and standard errorsmdashsome of which leadto inconsistent estimates When shortcuts lead to very crudeapproximations or inconsistent estimates I question their use-fulness I do think it unreasonable to expect that estimatesbased on simple IV estimators be accompanied by correctstandard errors that correctly take into account the in uence ofestimated parameters of estimated regressors I see little valuein reporting incorrect standard-errors estimates on the groundsthat it saves time in computation especially when there areseveral works long available in the literature that develop sim-ple ways of solving these kinds of problems (eg see Newey1984) Although there is surely a gap between the estimatorsdeveloped in the theoretic econometric literature and the esti-mators used in the empirical literature the gap is not as greatas this article supposes

ADDITIONAL REFERENCES

Cox D R (1958) The Planning of Experiments New York WileyFisher R A (1935) Design of Experiments London Oliver and BoydHeckman J J (1997) ldquo Instrumental Variables Evidence From Evaluating a

Job Training Programmerdquo Journal of Human Resources 32 441ndash462Heckman J Ichimura H and Todd P (1997) ldquoMatching as an Econometric

Evaluation Estimator Theory and Evidence on its Performance Appliedto the JTPA Program Part I Theory and Methodsrdquo Review of EconomicStudies 64 605ndash654

mdashmdashmdash (1998) ldquoMatching as an Econometric Evaluationrdquo Review of Eco-nomic Studies 65 261ndash294

Heckman J and Vytlacil E (2000) ldquoCausal Parameters Structural Equa-tions Treatment Effects and Randomized Evaluations of Social Programsrdquounpublished manuscript University of Chicago Dept of Economics

mdashmdashmdash (in press) ldquoLocal Instrumental Variablesrdquo in Nonlinear StatisticalInference Essays in Honor of Takeshi Amemiya eds C Hsiao KMorimune and J Powell Cambridge UK Cambridge University Press

Ichimura H (1993) ldquoSemiparametric Least Squares (SLS) and Weighted SLSEstimation of Single Index Modelsrdquo Journal of Econometrics 58 71ndash120

Neyman J (1923) ldquoStatistical Problems in Agricultural Experimentsrdquo Sup-plement to the Journal of the Royal Statistical Society 2 107ndash180

Newey W (1984) ldquoA Method of Moments Interpretation of Sequential Esti-matorsrdquo Economic Letters 14 201ndash206

Roy A (1951) ldquoSome Thoughts on the Distribution of Earningsrdquo OxfordEconomic Papers 3 135ndash146

Rubin D (1978) ldquoBaysian Inference for Causal Effects The Role of Ran-domizationrdquo The Annals of Statistics 6 34ndash58

ReplyJoshua D Angrist

I thank the discussants and the session organizer outgoingJBES editor Jeff Wooldridge who will be missed by associateeditors and JBES authors alike The comments in this sessionoffer a variety of viewpoints some sympathetic others morecritical I am pleased to be part of this stimulating and con-structive exchange and look forward to more interaction ofthis type in the future

Hahn My article starts by suggesting that a focus ontreatment effects simpli es the problem of causal inferencein limited dependent variable (LDV) models with endoge-nous regressors Endogeneityrsquos half-sister is the assumptionof unobserved individual effects since the omitted-variablesmotivation for xed effects in panel data is similar Hahnasks whether a shift in focus from index parameters to aver-age causal effects might also simplify the notoriously dif -cult problem of working with nonlinear models for binarypanel data (eg as in Card and Sullivan 1988) Hahn showsthat here too the causal-effects framework pays off demon-strating that average causal effects are easily estimated in a xed-effects probit model with a binary regressor even thoughthe probit coef cient is not A second noteworthy featureof Hahnrsquos comment is his observation that identi cation inthis framework turns on careful modeling of the relationshipbetween the assignment variable and the xed effects Tradi-tional econometric models are primarily concerned with thestochastic process generating outcomes A shift of attentiontoward modeling the assignment mechanism is consistent with

the notion that observational studies should try to mimic thesort of controlled comparisons generated by experiments

Imbens Imbens begins by endorsing the view that the sub-stantive goals of empirical research are better served by thepotential outcomescausal framework than by structural mod-eling Going one step further Imbens notes that the problemswith ldquomodel- rst econometricsrdquo are not limited to LDVrsquos Inthe same spirit he sensibly wonders (as did Wooldridge 1992)why so much attention has been devoted to estimating theBoxndashCox transformation model which does not actually iden-tify E6Y mdashX7 without a distributional assumption Imbens alsonicely articulates the rationale for looking at the twins exam-ple and notes that this is similar to the rationale for extrap-olating from clinical trials in narrow subpopulations And infact I believe the ldquotwins experimentrdquo is likely to have pre-dictive value in a variety of situations though any extrapo-lation will probably be improved by control for demographiccharacteristics

In spite of (or perhaps because of) our years of discussionand collaboration Imbens and I still do not agree on certainthings He feels strongly that inference with discrete outcomesshould use models and procedures that respect inherent non-

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

28 Journal of Business amp Economic Statistics January 2001

linearities The HIRZ procedure discussed in his comment isan intriguing way to do this and my article discusses othersI remain unconvinced however that nonlinearity is of majorsubstantive importance in empirical work with binary regres-sors The proof of the pudding I suppose would be to showthat H IRZ or some of the methods I discuss in the article leadto better predictions or decisions than the linear workhorse

Mof tt Mof t provides a valuable counterpointHe beginsby arguing that latent-index models have more predictivepower than reduced-form causal models Such claims seemhard to establish a priori though I do not doubt that in somecases this may be true But empirical strategies in which thebulk of the research effort goes into estimating index coef -cients may fail to answer the most basic question in empiricalwork What happened here Causal models and local aver-age treatment effect (LATE) answer this question while indexcoef cients alone are not suf cient for causal inference Iagree however that structural modeling can be useful forgoing beyond basic questions and I have used index and otherstructural models for this purpose Mof t is right howeverin suggesting that I nd the model-free analysis he refers toas ldquoexploratoryrdquo more interesting than structural extrapolationThis seems to me to be where the potential for real discoverylies

Mof tt asks tough but fair questions about the twins appli-cation He is correct to point out that twinning is not a manip-ulable policy instrument That is one reason I do not limit theanalysis to reduced-form effects of multiple births Yet to mymind the best evidence we have on the controversial questionof the consequences of out-of-wedlock childbearing comesfrom the twins experiment (Bronars and Grogger 1994) Andthe economic consequences of childbearing are of interest inmany other contexts As noted in the introduction to my arti-cle the twins experiment can help determine whether fertilityreductions are responsible for trend increases in female laborsupply Beyond this sort of question the twins results seemuseful for assessing the economic consequences of increasedabortion access in the United States the dissemination of con-traceptive implants in developing countries (eg see KaneFarr and Janowitz 1990) and various programs to encour-age or require implant use by welfare mothers (Haveman andWolfe 1993)

Finally it is fair to question the external validity of twinsestimates as Mof t does toward the end of his commentAngrist and Evans (1998) showed that the labor-supply con-sequences of twinning can be reconciled with estimates ofthe labor-supply consequences of childbearing resulting froma very different natural experiment induced by preferences formixed-sex sibling pairs This suggests that results from bothexperiments may be quite robust and may even be ldquostructuralrdquoin the sense that they tell us something general about the eco-nomic response to childbearing

Mullahy Mullahy brings a welcome health economistrsquosperspective to the discussion Health economists have longgrappled with the issues discussed in my article and asI hope I made clear I drew ideas and inspiration fromMullahyrsquos work on models with nonnegative dependent vari-ables Mullahyrsquos comment further highlights exactly the sort

of utilitarian trade-off that underlies the choice of empiricalstrategy On one hand we would like our models to be con-sistent with any functional form restrictions known to be trueOn the other attempts at this may complicate estimation andinference for little payoff Worst of all the technical complex-ity of nonlinear models seems to cause authors to ldquowax econo-metricrdquo an effort that may come at the expense of attention tosubstantive issues of importance (an example here is the longseries of papers discussing exactly how data from the RANDhealth-insurance experiment should be analyzed) Mullahyrsquosheretical discussions of ldquoone-part modelsrdquo here and in earlierwork represent a refreshing effort to keep the empirical trainon track Mullahyrsquos point that application-specic distributionordinates can provide more useful summary information thanquantiles is also clearly worth thinking about

Todd Todd provides a useful and clear overview of thecausal framework which she suggests is better suited formedical questions than economic questions The thrust of herargument is that there is too much heterogeneity in economicinterventions and outcomes for the causal approach to bevery useful for economists But my perception is that medicalresearch is even more explicitly concerned with heterogene-ity than economics This is evidenced by the many quali ca-tions regarding patient characteristics conditions and inter-actions in medical treatment protocols A difference betweenthe prevailing medical research paradigm and the traditionaleconometric perspective however is whether the question ofheterogeneous response is to be addressed by better theoryor more evidence Of course theory helps us decide whereto look for heterogeneity but as Rosenbaum (1999 p 301)put it ldquo If a treatment has substantially different effects in dif-ferent types of people then we need to discover this (ital-ics mine) Rosenbaum also noted that ldquorandomized clinicaltrials routinely look for heterogeneous treatment effects andthey do this without representative samples from nationalpopulationsrdquo

On the technical side Todd suggests I advocated use ofthe two-part model because it exibly approximates nonlinearfunctional forms for nonnegative outcomes Although I notedthe exibility the upshot of my discussion was that the two-part model is poorly suited to causal inference because of thedif culty of interpreting part 2 Finally Todd points out thefailure to allow for sampling variance in the scaling factorsused to generate effects for nonlinear models This is indeed ahard-to-defend shortcut though I shall take a stab at defenseby noting that randomness of the scaling factor which typi-cally estimates a sample average is likely to be a minor sourceof sampling uncertainty

ADDITIONAL REFERENCES

Card D E and Sullivan D (1988) ldquoEffects of Subsidized Training onMovements In and Out of Employmentrdquo Econometrica 56 497ndash530

Haveman R and Wolfe B (1993) ldquoChildrenrsquos Prospects and ChildrenrsquosPolicyrdquo Journal of Economic Perspectives 7 153ndash174

Kane T Farr G and Janowitz B (1990) ldquo Initial Acceptability of Con-traceptive Implants in Four Developing Countriesrdquo International FamilyPlanning Perspectives 16 49ndash54

Rosenbaum P R (1999) ldquoChoice as an Alternative to Control in Observa-tional Studiesrdquo Statistical Science 14 259ndash304

Wooldridge J (1992) ldquoSome Alternatives to the BoxndashCox RegressionModelrdquo International Economic Review 33 935ndash955

Page 15: New Estimation of Limited Dependent Variable Models With Dummy …piketty.pse.ens.fr/…ecoineg/articl/Angrist2001b.pdf · 2011. 12. 5. · sequences of changes in contraceptive technology

16 Journal of Business amp Economic Statistics January 2001

REFERENCES

Manning W G Duan N and Rogers W H (1987) ldquoMonte Carlo Evidenceon the Choice Between Sample Selection and Two-part Modelsrdquo Journalof Econometrics 35 59ndash82

Manski C F (1975) ldquoMaximum Score Estimation of the Stochastic UtilityModelrdquo Journal of Econometrics 3 205ndash228

mdashmdashmdash (1996) ldquoLearning About Treatment Effects From Experiments WithRandom Assignment of Treatmentsrdquo Journal of Human Resources 31709ndash733

McDonald J F and Mof tt R A (1980) ldquoThe Uses of Tobit AnalysisrdquoThe Review of Economics and Statistics 62 318ndash321

Mof t R A (1999) ldquoNew Developments in Econometric Methods forLabor Market Analysisrdquo in The Handbook of Labor Economics Vol-ume IIIA eds O Ashenfelter and D Card Amsterdam North-Hollandpp 1367ndash1398

Mullahy J (1997) ldquo Instrumental-Variable Estimation of Count Data ModelsApplications to Models of Cigarette Smoking Behaviourrdquo Review of Eco-nomics and Statistics 11 586ndash593

mdashmdashmdash (1998) ldquoMuch Ado About Two Reconsidering Retransformation andthe Two-Part Model in Health Econometricsrdquo Journal of Health Eco-nomics 17 247ndash281

Newey W K (1985) ldquoSemiparametric Estimation of Limited DependentVariable Models With Endogenous Explanatory Variablesrdquo Annales deLrsquo Insee 5960 219ndash237

mdashmdashmdash (1986) ldquoLinear Instrumental Variable Estimation of Limited Depen-dent Variable Models With Endogenous Explanatory Variablesrdquo Journal ofEconometrics 32 127ndash141

mdashmdashmdash (1987) ldquoEf cient Estimation of Limited Dependent Variable ModelsWith Endogenous Explanatory Variablesrdquo Journal of Econometrics 36231ndash250

Powell J L (1986a) ldquoSymmetrically Trimmed Least Squares Estimation forTobit Modelsrdquo Econometrica 54 1435ndash1460

mdashmdashmdash (1986b) ldquoCensored Regression Quantilesrdquo Journal of Econometrics32 143ndash155

Rosenzweig M R and Wolpin K (1980) ldquoLife-Cycle Labor Supply andFertility Causal Inferences From Household Modelsrdquo Journal of PoliticalEconomy 88 328ndash348

Rubin D B (1974) ldquoEstimating Causal Effects of Treatments in Random-ized and Non-randomized Studiesrdquo Journal of Educational Psychology 66688ndash701

mdashmdashmdash (1977) ldquoAssignment to a Treatment Group on the Basis of a Covari-aterdquo Journal of Educational Statistics 2 1ndash26

Wooldridge J (1999) ldquoDistribution-Free Estimation of Some Nonlinear PanelData Modelsrdquo Journal of Econometrics 90 77ndash97

Comment Binary Regressors in NonlinearPanel-Data Models With Fixed EffectsJinyong Hahn

Department of Economics Brown University Providence RI 02912 ( jinyong hahnbrownedu)

Angrist notes in his abstract that ldquomuch of the dif cultywith limited dependent variables comes from a focus on struc-tural parameters such as index coef cients instead of causaleffects Once the object of estimation is taken to be the causaleffect of treatment several simple strategies are availablerdquo(p 2) I examine the consequence of such a perspective forinference in nonlinear panel-data models with xed effectsI argue that (a) the ldquodif cultyrdquo indeed disappears sometimesand (b) structure of treatment assignment plays a crucial rolefor his strategy to be successful

It is instructive to begin with a dif cult nonlinear panel-datamodel with xed effects Consider a panel probit model with xed effects

Pr6yitD 1mdashci1 xi11 xi27 D ecirc4ci

C ˆ xit51

i D 11 1 n3 t D 1120 (1)

It is assumed that yi1 and yi2 are independent of each othergiven 4ci1 xi11 xi25 Here ci denotes the unobserved xedeffects and xit denotes a binary treatment variable We willassume that 4xi11 xi25 D 40115 for all i The index coef cientof interest is ˆ Therefore traditional econometric analysiswould focus on whether ˆ is identi ed and or

pn-consistently

estimableEstimation of the index coef cient ˆ is dif cult for a num-

ber of reasons

1 So far no consistent estimator for ˆ has been developedfor probit with a nonparametric speci cation of the conditionaldistribution of ci given 4xi11 xi25

2 It is not even clear whether ˆ is identi ed or notManskirsquos (1987) identi cation result requires in nite supportfor xit which cannot be satis ed due to the binary nature ofxit

3 It is not clear whether the semiparametric informationbound for ˆ is positive or not It is quite possible that theinformation is actually 0 For example Chamberlain (1992)showed that the information for ˆ is 0 for models with timedummies

4 Chamberlainrsquos (1984) conditional maximum likelihoodestimator is applicable only to logit models

Now to assess whether changing the object of estimationsimpli es statistical analysis consider the average treatmenteffects which in this particular case can be easily shown tobe equal to

sbquo D E6yi2ƒ yi170 (2)

It is not dif cult to see that a simple estimator Osbquo D1=n

PniD14yi2

ƒ yi15 isp

n-consistent As Angrist argues afocus on average causal effects dramatically reduces the dif -culty of estimation The secret is that 4xi11 xi25 D 40115 for alli which effectively ensures that ci is independent of 4xi11 xi25Dif culty of the index estimation listed previously is because

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

Imbens Comment 17

the independence between 4xi11 xi25 and ci cannot be exploitedwithin the xed-effects framework Unfortunately the addi-tional information that 4xi11 xi25 and ci are independent of eachother does not reduce dif culty in estimating the index ˆ It isnot yet clear whether ˆ can be

pn-consistently estimated even

with the random-effects assumption when the random effectsare nonparametrically speci ed On the other hand estimatingsbquo becomes easier as a result of the constancy of the general-ized propensity score Pr 6xi11 xi2mdashci7 Presence of unobservedci in the model was rendered irrelevant due to the constancyof the generalized propensity score [See Imbens (1999) for adiscussion on the generalized propensity score]

It is interesting to note that in the panel probit model (1)estimation of sbquo is not necessarily simple unless the indexstructure is discarded altogether It is useful to note that thenew target parameter sbquo could be estimated consistently usingindex structure if consistent estimators of ˆ and not are givenHere not denotes the distribution of ci We may alternativelywrite (2) as

Z4ecirc4c C ˆ5 ƒ ecirc4c55 dnot4c51 (3)

which can in principle be estimated by using consistent esti-mators of ˆ and not Estimation of sbquo using the alternativecharacterization (3) requires consistent estimation of an addi-tional parameter not a parameter that was not given too muchattention in the past The problem is that not many consis-tent estimators of not are available It is not yet clear whether

the model satis es the primitive conditions for consistency ofthe nonparametric maximum likelihood estimator (NPMLE)as discussed by Heckman and Singer (1984) The dif culty inestimating the target parameter using the expression (3) whichis based on the index structure is in sharp contrast to the easeof the estimation strategy using the expression (2) for whichthe index structure is irrelevant

The preceding discussion suggests that the success ofAngristrsquos perspective critically hinges on the structure of treat-ment assignment and careful reexpression of the new targetparameter If the joint distribution of ci and 4xi11 xi25 is com-pletely unknown it is clear that changing the target parameterdoes not ease the dif culty of estimation Angristrsquos perspectivetherefore requires substantial effort in modeling such jointdistribution Whether such a modeling effort will be successfulin dealing with nonlinear panel problems remains to be seen

ADDITIONAL REFERENCES

Chamberlain G (1984) ldquoPanel Datardquo in Handbook of Econometricseds Z Griliches and M D Intriligator Amsterdam North-Hollandpp 1247ndash1318

mdashmdashmdash (1992) ldquoBinary Response Models for Panel Data Identi cation andInformationrdquo unpublished manuscript

Heckman J and Singer B (1984) ldquoA Method for Minimizing the Impactof Distributional Assumptions in Econometric Models for Duration DatardquoEconometrica 52 271ndash320

Imbens G (1999) ldquoThe Role of Propensity Score in Estimating Dose-Response Functionsrdquo Technical Working Paper 237 National Bureau ofEconomic Research Cambridge MA

Manski C (1987) ldquoSemiparametric Analysis of Random Effects Linear Mod-els from Binary Panel Datardquo Econometrica 55 357ndash362

CommentGuido W Imbens

Department of Economics University of California at Los Angeles Los Angeles CA 90095(imbenseconuclaedu)

It is a pleasure to comment on this article by JoshuaAngrist whose applications of instrumental-variables methods(Angrist 1989 Angrist and Krueger 1991) have been a sourceof inspiration for my own work in this area As with Angristrsquosprevious work on instrumental variables the current articleraises some controversial issues and makes a number of impor-tant points Here I offer some comments on three of themFirst I shall discuss the issues raised in Section 1 ldquoCausalEffects and Structural Parametersrdquo concerning the goals ofstatistical inference Angrist argues that many questions ofinterest are most easily formulated in terms of comparisonsbetween realized and potential outcomes the latter de nedas outcomes that would have been observed under alternativestates of nature I shall explore some of the implications of thisview for empirical practice and econometric theory SecondI shall offer some remarks on the role of economic theory inspeci cation and identi cation of econometric models againreinforcing Angristrsquos point regarding the importance of for-mulating the key assumptions in terms of potential outcomesThird I shall discuss some of the issues related to the limited

dependent nature of outcome variables for empirical practicein particular in the presence of covariates Partly motivatedby the widespread perception of fundamental dif culties inapplying instrumental-variables methods to data with limiteddependent outcome variables Angrist argues that standard lin-ear model techniques are generally applicable I agree withAngristrsquos position that most of these perceived problems areexaggerated but suggest that principled inference should nev-ertheless take account of the limited dependent nature of theoutcome variables and use nonlinear models

1 CAUSAL ESTIMANDS

In his textbook discussion of the difference betweenstructural and reduced-form estimates Goldberger (1997)wrote following Marshak (1953) that the ultimate goal of

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

18 Journal of Business amp Economic Statistics January 2001

econometrics is to provide predictions More speci cally inmy view the goal is to provide predictions of policy interven-tions Using both economic theory and data economists wishto inform policy discussions by providing predictions of statesof the world under different policy choices Based on com-parisons of such predictions policy makers can then chooseamong the different policies using some social welfare mea-sure as objective function (eg Heckman and Smith 1997)Angrist argues that such questions are most easily formulatedin terms of potential outcomes Here I want to elaborate onthat view

Consider as an example the problem faced by a policymaker contemplating a new tax in a market To evaluatethis policy the policy maker wishes to take into account theeffect of the tax on the quantity traded Economic theory sug-gests that this effect depends on the slope of the supply anddemand functions The rst step is therefore the estimation ofthese slopes and in the remainder of this discussion I shallfocus on this component of the policy-evaluation problem Inprinciple the policy maker may be interested in the entire dis-tribution of the quantity traded under various taxes Let usassume however that for purposes of evaluation of the poli-cies it is suf cient to know the average effect of the policyon the quantity traded If there are only two values for thepolicymdashfor example no tax or a taxmdashthe difference betweenthese two averages is the key quantity of interest FollowingRubin (1974) I will refer to this as the estimand

Note that the choice of estimand is distinct from the sta-tistical question of the speci cation of the model Often thestatistical model is speci ed in such a way that a single param-eter corresponds to the estimand For example in a structuralinterpretation of the linear regression model the coef cientscorrespond to the effect of changing the covariates by a singleunit Such one-to-one correspondence however is the excep-tion rather than the rule Wooldridge (1992) made this point inthe context of BoxndashCox regression models Such models areoften used when a linear representation for E6Y mdashX7 is inappro-priate The BoxndashCox regression model generalizes this linearform to E6Y 4lsaquo5mdashX7 D X 0sbquo where

Y 4lsaquo5 D(

4Y lsaquo ƒ 15=lsaquo lsaquo 6D 01

ln Y lsaquo D 00

Although consistent estimators for sbquo exist under these assump-tions Wooldridge stressed that because (a) the interpretationof sbquo changes with the value of lsaquo and (b) knowledge of sbquo andlsaquo is not suf cient for recovering E6Y mdashX7 there is no reasonfor economists to be interested in estimates of sbquo under theseassumptions In other words sbquo cannot be the sole focus ofthe researcher because the question it answers changes withthe value of nuisance parameters Wooldridge then suggestedan alternative speci cation that always allows the researcherto recover the conditional expectation E6Y mdashX7

In empirical work this distinction between the estimandand the parameters of the statistical model is consistent withthe now common practice of reporting estimates of averagederivatives in binary response models rather than reportingestimates of the logit or probit coef cients Unlike a linear

regression model there is no direct link from one of the coef- cients in the logit or probit model to average causal effectsand thus there is no intrinsic interest in such coef cients

This view is at odds however with a large part of the semi-parametric literature An exception is the work by Stoker (egStoker 1986) who focused on estimation of index coef cientsin settings where these are proportional to average derivativesand thus directly linked to changes in predictions Considerfor example the work on semiparametric estimation of binaryresponse models In this literature such models are esti-mated without making logistic or probit assumptions insteadonly making conditional mean or median assumptions in alatent index interpretation (eg Manski 1985) This literaturehowever has begged the question of why economists shouldbe interested in the coef cient estimates in these models inthe absence of a direct link between these coef cients andthe choice probabilities or their derivatives Similarly someof the models with xed effects in panel data with limiteddependent variables have focused on estimation of parametersthat in themselves do not allow for estimation of conditionalexpectations or their derivatives and thus do not allow for esti-mation of causal effects See Arellano and Honoreacute (in press)for a survey of many of these methods

2 IDENTIFICATION

After deciding on the estimand the next step is to makesubstantive assumptions on the process that generated the dataThis is where economic as opposed to statistical theory playsa key role Theoretical considerations may suggest that certainvariables have no direct causal effect on others because theydo not enter into agentsrsquo utility function nor do they affectthe constraints these agents face For example in some mar-kets it may be reasonable to postulate the existence of demandand supply function and assume that their intersection deter-mines observed prices and quantities In that case it may beargued that certain variablesmdashfor example weather conditionsin agricultural marketsmdashaffect supply at xed prices but notdemand because weather conditions do not affect utility ofthe buyers nor do they constrain their choices given pricesSimilarly theoretical considerations may suggest which vari-ables determine agentsrsquo fertility choices and which variablesare excluded from such choices as in the structural modelsdescribed in Section 12 of Angrist

For the purpose of considering such exclusion restrictionsas well as other assumptions it is important to formulate themin a way that economic theory can be brought to bear on themThis makes the formulation in terms of counterfactuals orpotential outcomes that Angrist advocates particularly appro-priate The potential outcomes describe outcomes in differ-ent environments and as such are the primitives of economicanalyses as well as choices under different sets of constraintswhich are the result of agents solving constrained optimizationproblems Since economic theory studies such optimizationproblems it is therefore well equipped to assess assumptionsformulated directly in terms of these potential outcomes Anexample of the formulation of the critical assumptions in termsof such potential outcomes is Angrist Imbens and Rubin(1996 AIR from here on) In contrast latent index modelsalthough under some conditions mathematically equivalent to

Imbens Comment 19

the potential outcome framework (eg Vytlacil 1999) formu-late the critical assumptions in terms of associations betweenobserved variables and unobserved residuals which appearsmore dif cult to contemplate [see Imbens (1997) for a dis-cussion of the confusion such formulations have caused in thestatistics literature]

It is rare that economic theory is speci c enough to deter-mine the exact value of the estimand More typical is thatthe theory is consistent with a range of values for the esti-mand Observations on agentsrsquo choices and outcomes may behelpful in narrowing down this range The econometricianrsquostask is to link the data to the estimand Typically a num-ber of additional assumptions are made at this stage Almostalways it is assumed that there is only limited dependenceor no dependence at all between choices made by differentagents and identi cation focuses on the link between the jointdistribution of the observables estimable in large samplesand the estimand Two possibilities arise at this stage Some-times the estimand can be expressed as a functional of thejoint distribution of the observables in which case the esti-mand is identi ed A leading example is where the estimandis the average treatment effect and theory suggests that assign-ment to treatment is random or at least random conditionalon a set of observed covariates (unconfounded assignmentselection on observables) Alternatively the assumptions sug-gested by economic theory do not allow for the direct linkbetween the distribution of observables and the estimand Inthat case the researcher faces some choices One option advo-cated in a series of papers by Manski (see for a general dis-cussion Manski 1995) is to estimate the range of values ofthe estimand consistent with the data given the substantiveassumptions Another option followed in the current article byAngrist is the local average-treatment-effect approach devel-oped by Imbens and Angrist (1994) to consider what aspectsof the estimand are identi ed given data and assumptions Ininstrumental-variables settings the population average treat-ment effect is often not identi ed but the average effect for aspeci c subpopulation may be In that case one may chooseto estimate the average treatment effect for this subpopula-tion and leave the extrapolation to the principal estimand tothe researcher possibly aided by theoretical considerations AsHeckman wrote ldquo It is a great virtue of the LATE parameterthat it makes the investigator stick to the data at hand and sep-arate out the aspects of an estimation that require out of sam-ple extrapolation or theorizing from aspects of an estimationthat are based on observable datardquo (Heckman 1999 p 832)

Let us consider the case studied by Angrist with its focuson the effect of having more than two children on labor sup-ply Angrist argues that the second birth being a multiple birth(eg twins) is a valid instrument for this effect In terms ofthe AIR formulation this requires a multiple birth to be asgood as randomly assigned and the absence of a systematicdirect effect on labor supply other than through its effect onthe number of children Such assumptions may be contro-versial For example fertility treatments may lead to a sys-tematic association between multiple births and choices madeby couples violating the rst assumption Even if we acceptthese assumptions however they only imply that the aver-age causal effect of more kids on labor supply is identi ed

for women who had a third child solely because their sec-ond birth was a multiple birth (compliers in the AIR termi-nology) In my view it is unlikely that this is the populationof primary interest Nevertheless it is the only subpopulationthe data are informative about in the sense of point identi ca-tion under the substantive assumptions and it would appear tooffer some guidance regarding the population average causaleffect to policy makers similar to the way in the medical worldresults from clinical trials in homogenous subpopulations areregarded as useful because they are viewed as indicative ofpopulation average causal effects

3 LIMITED DEPENDENT VARIABLES

Typically economic theory offers some guidance concern-ing the determinants of certain outcomes without specifyingthe exact form or strength of their relationship In that casestatistical modeling is required to complete the speci cationConsider the example Angrist studies with binary outcomebinary endogenous regressor a binary instrument and covari-ates Angrist suggests as one possible approach estimating theaverage treatment effect through a linear probability modelwith instrumenting for an endogenous regressor The bene tsof the linear probability approach stemming from the linear-ity and robustness against misspeci cation of the rst stageappear to me largely illusory At this point the statistical mod-eling is only intended to provide exible approximations to theunderlying conditional distributions This is a fundamentallydifferent role from that played by the substantive assumptionsthat are essential for identi cation Appeals to consistencyunder speci c parameterizations therefore appear irrelevantmdashin a larger sample one may well wish to use a more exiblespeci cation because less smoothing is required In additionto nding the alleged bene ts of the linear probability modelunpersuasive I nd its disadvantages troubling Within smallsubpopulations characterized by extreme values of the covari-ates the smoothing implicit in linear probability models islikely to lead to unattractive predictions compared to pre-dictions based on nonlinear models that respect the limited-dependent-variable nature of the outcomes

An alternative approach is followed in the study of the effectof u shots on hospitalization rates using randomized incen-tives for vaccination by Hirano Imbens Rubin and Zhou(2000 HIRZ from here on) Given their assumptions exten-sions of those made by AIR to the case with exogenous covari-ates there are three subpopulationsmdashcompliers (units whochange treatment status in response to a change in the valueof the instrument) always-takers (who always take the treat-ment irrespective of the value of the instrument) and never-takers (who never take the treatment irrespective of the valueof the instrument) H IRZ modeled the conditional distributionof these three ldquotypesrdquo conditional on covariates as a trinomialdistribution

Pr4TypeiD cmdashXi

D x5 D exp4x0ndashc5

1C exp4x0ndashc5C exp4x0ndasha51

Pr4TypeiD amdashXi

D x5 D exp4x0ndasha5

1C exp4x0ndashc5 C exp4x0ndasha51

20 Journal of Business amp Economic Statistics January 2001

and

Pr4TypeiD nmdashXi

D x5

D 1 ƒ Pr4TypeiD cmdashXi

D x5 ƒ Pr4TypeiD amdashXi

D x50

Now compare this setup to the selection models Angristdescribes in Section 3 In the selection models the equationdescribing the endogenous regressor is Di

D 18ƒ0C ƒ1Zi

Cƒ 0

2Xi gt Daggeri9 Suppose that the instrument is binary and that ƒ1 ispositive Then the two models are very similar with units withƒ0 Cƒ1 Cƒ 0

2Xi gt Daggeri in the selection model classi ed as always-takers in the potential outcome framework (because irrespec-tive of the value of the instrument Di

D 1 for such units) unitswith ƒ0

C ƒ 02Xi lt Daggeri classi ed as never-takers (because irre-

spective of the value of the instrument DiD 0 for such units)

and the units with ƒ0C ƒ 0

2Xi lt Daggeri lt ƒ0C ƒ1

C ƒ 02Xi classi ed

as compliersOne advantage of the trinomial model is that it easily gen-

eralizes to provide an arbitrarily good t to any conditionaltrinomial distribution by including higher-order terms andinteractions in the covariates If there are no substantive rea-sons to impose additional restrictions one should not imposethem implicitly in the speci cation of the statistical modelIn particular in the selection model it is not suf cient to addhigher-order terms to the covariate vector to provide an arbi-trarily good t to the trinomial distribution Such an approxi-mation would have to involve heteroscedasticity and other dis-tributional extensions that are not straightforward to implementin the selection model

Conditional on the individualrsquos type H IRZ speci ed theoutcome distributions given covariates as logistic regressionmodels Again the aim is to provide a exible approxima-tion to the conditional distribution in a manner that does notimpose any implicit restrictions Given that for a binomialdistribution the logistic regression model can be thought ofas providing a linear approximation to the log odds ratiothis choice is again an appealing one An alternative is theprobit model which also provides a good approximation Less

attractive here is the linear probability model since it requiresinequality restrictions on the parameters if the implicit esti-mates of the probabilities are to be bounded between 0 and 1

In cases with other limited dependent variables alternativenonlinear models may be appropriate For example if the out-comes are durations subject to censoring models speci ed interms of hazard functions (eg Lancaster 1979) may be con-venient for dealing with such data

ADDITIONAL REFERENCES

Angrist J (1989) ldquoLifetime Earnings and the Vietnam Era Draft LotteryEvidence From Social Security Administrative Recordsrdquo American Eco-nomic Review 80 313ndash335

Angrist J and Krueger A (1991) ldquoDoes Compulsory School AttendanceAffect Schooling and Earningsrdquo Quarterly Journal of Ecomonics 106979ndash1014

Arellano M and Honoreacute B (in press) ldquoPanel Datardquo in Handbook of Econo-metrics eds J Heckman and E Leamer Amsterdam Elsevier North-Holland

Goldberger A (1991) A Course in Econometrics Cambridge MACambridge University Press

Heckman J (1999) ldquo Instrumental Variables Response to Angrist andImbensrdquo Journal of Human Resources 34 828ndash837

Heckman J and Smith J (1997) ldquoEvaluating the Welfare Staterdquo in Econo-metrics and Economics in the 20th Century The Ragnar Frisch Centenaryed S Strom New York Cambridge University Press pp 214ndash318

Hirano K Imbens G Rubin D and Zhou A (2000) ldquoEstimating theEffect of Flu Shots in a Randomized Encouragement Designrdquo Biostatistics1 69ndash88

Imbens G (1997) Book Review of The Foundations of Econometric Analy-sis by D Hendry and M Morgan Journal of Applied Econometrtics 1291ndash94

Lancaster T (1979) ldquoEconometric Methods for the Analysis of DurationDatardquo Econometrica

Manski C (1985) ldquoSemiparametric Analysis of Discrete Responserdquo Journalof Econometrics 27 313ndash333

mdashmdashmdash (1995) Identi cation Problems in the Social Sciences CambridgeMA Harvard University Press

Marshak J (1953) ldquoEconomic Measurements for Policy and Predictionrdquoin Studies in Econometric Method eds W Hood and T KoopmansNew York Wiley pp 1ndash26

Stoker T (1986) ldquoConsistent Estimation of Scaled Coef cientsrdquo Economet-rica 54 1461ndash1481

Vytlacil E (1999) ldquo Independence Monotonicity and Latent Index ModelsAn Equivalency Resultrdquo unpublished manuscript University of ChicagoDept of Economics

Wooldridge J (1992) ldquoSome Alternatives to the BoxndashCox RegressionModelrdquo International Economic Review 33 935ndash955

CommentRobert A Mof tt

Department of Economics Johns Hopkins University Baltimore MD 21218 and National Bureau of EconomicResearch Cambridge MA (mofttjhuedu)

The Problem Although the article by Angrist rangesacross a number of issues much of the discussion and thearticle title suggests that the problem of concern is that instru-mental variables ( IV) cannot be used in one of three commonmodels Let the rst model be y D Csbquod CxbdquoChellip where y isan absolutely continuous variable but d is binary and wherex is independent of hellip but d is not Then sbquo can be consistentlyestimated with IV (Heckman and Robb 1985) Let the secondmodel be y uuml D C sbquod uuml C xbdquo C hellip where y uuml and d uuml are contin-

uous and where y D 14y uuml gt 05 and d D d uuml are the observedvariables The parameters of this model can likewise be esti-mated by IV with some auxiliary assumptions (Newey 1986see Blundell and Smith 1993 for a review of alternative meth-ods) But let the third model be y uuml D C sbquod C xbdquo C hellip where

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

Moftt Comment 21

again y D 14y uuml gt 05 but now d is binary The parameters ofthis model cannot be estimated by standard IV Many of theexamples in the Angrist article involve the censored regres-sion model rather than the binary choice model but this andthe other main points in the article apply to both

The Angrist Solution The Angrist solution to the problemposed by the third of these models where IV is not applica-ble is to declare sbquo an uninteresting parameter and not worthyof estimation [or in his words to ldquopuntrdquo (p 8)] This is remi-niscent of the solution to the Vietnam War suggested by some1960s commentators The United States should have simplydeclared victory quickly withdrawn and hoped that no onenoticed that they had in fact lost

If sbquo is uninteresting what is interesting according toAngrist His answer is that interest should center on leastsquares approximations with y as the dependent variable andd and x as regressors If endogeneity of d is a problem linearIV should be applied with linear IV given a LATE interpre-tation He provides an empirical illustration

I will comment on three questions raised by the Angristarticle (1) Is sbquo an interesting parameter (2) Is the Linear IVestimator using observed y and d which Imbens and Angrist(1994) relabeled the LATE an interesting quantity (3) Is theillustrative application given in the Angrist article interesting

Is sbquo an Interesting Parameter As Angrist notes manyof the issues raised by his solution have nothing to do withthe endogeneity of d Whether sbquo is an interesting parame-ter is one of these at least in part Given Angristrsquos prefer-ence for linear projections his position necessarily implies thatall nonlinear models of which the latent index model (L IM)is just one example are uninteresting (Angrist is not com-pletely consistent in his position because he does propose anonlinear model with an exponential conditional mean func-tion in one section of his article that model is subject to thesame objections that he subjects the LIM tomdasharbitrary non-linearities etc) Thus his position boils down to a preferencein the binary dependent-variable case for the linear proba-bility model (LPM) over the LIM and other nonlinear mod-els This issue has been discussed for many years even in asimultaneous-equations context [eg Heckman and MaCurdy(1985) which Angrist does not reference]

The dif culty with Angristrsquos polar position on this issuemdashthat the LPM is the only model of interestmdashis that it is funda-mentally untenable The LPM is attractive because it is easyto interpret providing parameter estimates that do not requiretransformation to learn the effects of a regressor on the meanof the dependent variable It has a role to play in empiricalwork in summarizing the data as regards the conditional meanfunction and for initial explorations of the data and virtu-ally all practitioners use it for this purpose But beyond thisit has no defense Either it is equivalent to the latent-variablemodel if the model is saturated or if the model is not satu-rated imposes functional form restrictions that may not holdand may t the data worse than the latent-variable model

Thus if the model is y D 14 C sbquod C hellip gt 05 where d is asingle dummy variable regressor independent of hellip the probit(say) estimates of and sbquo map one-to-one onto the leastsquares intercept and coef cient on d in the LPM and hencethere is no gain to estimating the model either way because the

estimators are equivalent The same point extends to the caseof a multinomial d If the model is y D 14 C sbquod C xbdquo C hellip gt

05 where x and bdquo are vectors of exogenous covariates andcoef cients respectively the LPM estimate of the effect ofd on E4y5 conditional on x may be demonstrably worsethan the probit estimator of the same effect even for theobject of estimating E4ymdashd1 x5 at points 8d1 x9 observed inthe data (Angrist falls back on the weak principle that leastsquares estimates provide best linear approximations but theissue is that the true model might be nonlinear) Nothingin Angristrsquos article provides evidence that the probitor anyother latent index formulation provides an inferior estimate of6E4ymdashd11 x5 ƒ E4ymdashd01 x57 compared to the LPM for the twoestimators smooth the joint response function with respect tod and x in different ways

Although the latent-variable model has no necessary claimfor superiority in the class of all nonlinear models its pop-ularity rests on its ability to generate a wide variety of non-linearities with a relatively parsimonious speci cation arisingfrom the convolution of the latent index with the cdf of hellipExpansion of the LPM to incorporate equivalent nonlinearitiesis cumbersome and inef cient

The possible inferiority of the LPM in capturing nonlinear-ities in nonsaturated models is also important for interpolationand extrapolation to points not in the observed data Gettingthe nonlinearities right in the observed data is important ininterpolation and extrapolation if the true model is nonlinearas the binary choice model necessarily is (since y is boundedby 0 and 1) Although Angrist at one point does mentionpredictionmdashstating that the linearized models he proposes area good ldquojumping-off point for any prediction exerciserdquo a state-ment without supportmdashhe is for the most part not interestedin prediction so much as summarizing the observed data Thisis ne but unfortunately prediction outside the observed his-torical data is the pervasive concern of the policy makers whoare the ultimate consumers of applied economic research forthey must make predictions of new policies in new environ-ments on a daily basis as part of their jobs The methods pro-posed in Angristrsquos article are therefore not very useful for theformulation of new policy

It should be noted that some make the argument that theoryand economic models must play a role in guiding the formu-lation of empirical relationships used for prediction outsidethe range of the observed data That position is not taken herebecause it is not necessary to establish the potential superior-ity of the L IM over the LPM in nonsaturated models and forprediction Occamrsquos razor makes it unnecessary

It should also be noted that these issues have nothing todo with causal effects because d is assumed exogenous Thusthe opposition that Angrist poses between L IMrsquos and ldquocausaleffectsrdquo models is a false one at least in general If a ldquocausaleffectrdquo is de ned as the true effect of a variable d on onlythe rst moment of a variable y (a rather restricted de nitionof causal effect) that causal effect can be derived from thelatent-variable model just as well The issue is instead merelynonlinearity of the function E4ymdashd1 x5

Angrist also states that the L IM requires ldquoconstant coef- cientrdquo and ldquodistributional assumptionsrdquo for identi cation

22 Journal of Business amp Economic Statistics January 2001

unlike the LPM Angrist does not make suf cient quali ca-tions to these assertions and they are indeed false for theL IM A random coef cient sbquo in the LIM is permissible with-out distributional assumption ( Ichimura and Thompson 1998)and the coef cients in the standard constant-coef cient L IMare identi ed semiparametrically (ie under unknown distri-bution of hellip) under the restriction that d and hellip are independentan assumption generally made for causal interpretation of theLPM model as well (Manski 1988 Horowitz 1993 Powell1994) In addition as already noted in a fully saturated modelthe LPM and the L IM are equivalent in any case

Is the LATE an Interesting Statistic The Angrist assertionthat LATE is the only interesting statistic if d is endogenousor perhaps the only statistic worth bothering about because itis the only one identi ed by the data has the same partialvalidity as his preference for the LPM over the LIM IV isone of the most popular techniques in applied econometricsand has a natural intuition indeed one that does not requirethe LATE interpretation per se It is one of the most usefultools in the applied economistrsquos kit The simple IV estimatordiscussed by Angrist makes minimal assumptions and givesminimal information back to the analyst as a result But tosay that it produces the only statistic of interest does not havedefense both because it is equivalent to an LIM that is fullysaturated and may t the data worse than an LIM if not satu-rated and because it implies that there is no value to makingadditional assumptions to obtain additional information

The limiting nature of the LATE statistic is again in itsinattention to nonlinearities interpolation and extrapolationThat nonlinearities can be important is as true in this caseas it was in the exogenous case just discussed when thereare additional x covariates and when the model is not satu-rated As for interpolation and extrapolation the LATE statis-tic 6 Ny4z D z15ƒ Ny4z D z057=6 Nd4z D z15ƒ Nd4z D z057 denotes theeffect on a change in z from value z0 to z1 on E4y5 scaledby the change in the E4d5 It does not have implications forthe effect on E4y5 of any other change in z or for a changein any other policy variable Thus it is not particularly usefulfor policy changes other than a change of z0 to z1 in the sameenvironment (ie conditioned on the same x) This stands incontrast to an L IM for E4dmdashx1 z5mdashor any parametric modelfor d for that mattermdashthat allows for the change from z0 toz1 to inform policy makers and others of the likely changesof other values of z and of other variables x In fact theLATE statistic which is not a parameter in the usual senseof the word can always be expressed as a nonlinear combi-nation of parameters of a latent-variable model but not viceversa hence the latter model is more general than the former[See Heckman and Vytlacil (1999) for a discussion of how toexpress the LATE in terms of an L IM see also Angrist andImbens (1999) and Heckman (1997 1999) for a discussion ofsome of these issues]

In addition Angristrsquos focus on the advantages of the LATEfor model-free estimation of the effect of a change in z fromz0 to z1 on y reveals a philosophical weakness in his posi-tion This is because the effect of a change in z from z0 toz1 on y can be estimated from the reduced form the struc-tural form is not needed It is not explained why the statistic

6E4ymdashz D z15 ƒ E4ymdashz D z057 is not the only quantity of inter-est The answer that most economists would give is to say thatstructural coef cients are of interest because they can be usedto extrapolate the effects of changes in d on y to other changesin z than from z0 to z1 in the same environment indeedthis is the basic rationale for structural estimation given inthe famous essay by Marschak [1953 see also Christ (1994)and Heckman (2000) for discussions Marschakrsquos argumentwas more subtle than this arguing that restrictions on struc-tural parameters are needed for out-of-sample prediction butthis translates into restrictions on the reduced form as well]However by ruling out the possibility of learning about anyof the effects of changes in z or d other than those inducedby a change in z from z0 to z1 Angrist removes the need todo structural estimation in the rst place Angrist implicitlyassumes that a structural coef cient of interest exists on thevariable d in the y equation and that that coef cient has mean-ing independent of a particular change in z but the rest of hisdiscussion contravenes that interest

A reduced-form research program which is where theAngrist position leads is of considerable value There is noth-ing wrong with a research program to collect a large body ofinformation on the effects of a wide variety of changes in dif-ferent policy variables 4z5 from particular values (z0) to otherparticular values (z1) each taking place in particular environ-ments (x) But if the research program stops there very littleuseful has been learned other than a collection of facts aboutthe effects of particular policies in particular environments

Is the Application in the Angrist Article Interesting TheAngrist article uses a recently popular exclusion restriction ornatural experiment to identify the effect of fertility on laborsupply and to illustrate his preferred methodsmdashnamely theuse of twins The use of twins as an exclusion restrictionstands in contrast to variations in a government policy or lawwhich are often used as exclusion restrictions both in recentwork on natural experiments as well as a much older litera-ture that uses cross-sectional and overtime variation in state-or country-speci c taxes and transfer rules to identify modelparameters However although it can be of interest to estimatereduced-form policy effects 6E4ymdashz D z15 ƒ E4ymdashz D z057 asjust discussed it is not so obvious that there is any interest inestimating the effects of having twins on the expected valueof y Creating twins is not a variable directly subject to policymanipulation

The dif culty in the use of twins arises for the same reasonalready discussedmdashnamely that the preference for model-freeestimation of policy effects leads to a lack of interest in thefunction E4dmdashz1 x5 and hence to a lack of interest in what canbe learned from a study of twins about the effects of someother more relevant policy variable that might be manipulatedA model for the E4dmdashz1 x5 is needed to make that connectionand hence to make a study of twins of any interest yet that isintentionally eschewed in the approach proposed by AngristOnce again this leads to an uninteresting and quite limitingset of exercises

Conclusions The set of methods laid out in the Angristarticle at least if a more relevant instrument were used yieldsa set of model-free statistics that are suitable for exploratorywork on a research question prior to the (possibly nonlinear)

Mullahy Comment 23

structural estimation and prediction that should be the ultimateobject of applied economics research

ACKNOWLEDGMENTS

This comment is a revision of remarks delivered at theJoint Statistical Meetings Baltimore August 1999 I thankCarl Christ James Heckman Joel Horowitz Michael KeaneThomas Mroz Geert Ridder and Edward Vyltacil for com-ments with the usual disclaimer that the views expressed hereshould not be taken as representing those of any of theseindividuals

ADDITIONAL REFERENCES

Angrist J and Imbens G (1999) Comment on ldquo Instrumental VariablesA Study of Implicit Behavioral Assumptions Used in Making Pro-gram Evaluationsrdquo by J J Heckman Journal of Human Resources 34283ndash827

Blundell R and Smith R (1993) ldquoSimultaneous Microeconometric ModelsWith Censored or Qualitative Dependent Variablesrdquo in Handbook of Statis-tics (Vol 1) eds G S Maddala C R Rao and H D Vinod AmsterdamNorth-Holland pp 117ndash141

Christ C (1994) ldquoThe Cowles Commissionrsquos Contributions to Econometricsat Chicago 1939ndash1955rdquo Journal of Economic Literature 32 30ndash59

Heckman J (1997) ldquoInstrumental Variables A Study of Implicit BehavioralAssumptions Used in Making Program Evaluationsrdquo Journal of HumanResources 32 441ndash462

mdashmdashmdash (1999) ldquo Instrumental Variables A Reply to Angrist and ImbensrdquoJournal of Human Resources 34 828ndash837

mdashmdashmdash (2000) ldquoCausal Parameters and Policy Analysis in Economics ATwentieth Century Retrospectiverdquo Quarterly Journal of Economics 11545ndash97

Heckman J and MaCurdy T (1985) ldquoA Simultaneous Equations LinearProbability Modelrdquo International Economic Review 18 21ndash37

Heckman J and Vytlacil E (1999) ldquoLocal Instrumental Variables andLatent Variable Models for Identifying and Bounding Treatment EffectsrdquoProceedings of the National Academy of Sciences 96 4730ndash4734

Horowitz J (1993) ldquoSemiparametric and Nonparametric Estimation of Quan-tal Response Modelsrdquo in Handbook of Statistics (Vol 11) eds G SMaddala C R Rao and H D Vinod Amsterdam North-Holland pp45ndash72

Ichimura H and Thompson T (1998) ldquoMaximum Likelihood Estimation ofa Binary Choice Model With Random Coef cients of Unknown Distribu-tionrdquo Journal of Econometrics 86 269ndash295

Manski C (1988) ldquoIdenti cation of Binary Choice Modelsrdquo Journal of theAmerican Statistical Association 83 729ndash738

Marschak J (1953) ldquoEconomic Measurements for Policy and Predictionrdquoin Studies in Econometric Method eds W Hood and T KoopmansNew York Wiley pp 1ndash26

Powell J (1994) ldquoEstimation of Semiparametric Modelsrdquo in Handbook ofEconometrics (Vol 4) eds R Engle and D McFadden Amsterdam North-Holland pp 2443ndash2521

CommentJohn Mullahy

Departments of Preventive Medicine and Economics University of Wisconsin Madison WI 53705 andNational Bureau of Economic Research Cambridge MA (jmullahyfacstaffwiscedu)

Articles bearing titles containing phrases like ldquoSimpleStrategies for Empirical Practicerdquo are often wolves masquerad-ing as sheep the methodologies they devise being far fromldquosimplerdquo and far removed from what most humble practition-ers perceive in the realm of ldquoempirical practicerdquo Not so hereJoshua Angrist has written a tight comprehensive article thatis stimulating and important yet also eminently useful

As typi es much of Angristrsquos work the main concern in thisarticle is on how to elicit interesting characterizations of causaleffects from microdata with the particular twist here being afocus on outcomes measured as ldquolimited dependent variablesrdquo(LDVrsquos) The main take-away message I glean from this arti-cle is that applied analysts working on causal effect or struc-tural analyses in LDV contextsmdashtraditionally vexing contextsinsofar as consistent estimation and inference are concernedmdashhave considerable grounds for optimism Angrist lays out andinterprets systematically a set of issues and methods that pro-vide practitioners with a variety of implementable strategiesthat might be brought to bear on such empirical problemsA corollary take-away message is that in some respects ldquothisstuff is not really as hard as wersquove tended to make itrdquo withAngrist demonstrating for instance the potential merits ofsimple linear instrumental variable ( IV) methods for estimat-ing causal effects in a variety of LDV contexts In no eventcan applied analysts escape the requirement of nding theo-retically sound instruments but Angrist exposits compellingly

a broad description of how such instruments might be used toelicit interesting causal inferences in LDV contexts

I have no real quibbles with any of the substance ofAngristrsquos arguments Rather I will devote my commentarymainly to amplifying and expanding several of the themes hedevelops throughout the article

1 FOCUS ON CAUSAL OR PARTIAL EFFECTSVERSUS FOCUS ON CONDITIONAL

EXPECTATIONS FUNCTIONS

It seems fair to suggest that much of applied microecono-metrics is concerned primarily with understanding the signsand magnitudes of quantities like iexclE6Y mdashX1 D7=iexcl4X1 D5 oriexclE6Y mdashX7=iexclX0 Yet much of the actual dirty work in under-taking causal analysis in LDV contexts seems to result fromdecisions to undertake analyses in settings where E6Y mdashX1D7

or E6Y mdashX7 are restricted to be positive without a priori restric-tions on parameter values [thus the tradition of using tobit-class conditional expectations functions (CEFrsquos) two-stepselection models exponential CEFrsquos etc]

As practitioners however we should pause to assesswhether speci cations akin to E6Y mdashX1D7 D exp4Xsbquo C D5 gt

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

24 Journal of Business amp Economic Statistics January 2001

0 are ultimately buying anything important with respect tothe rst-order questions being explored It is only com-mon sense that analysts should spend relatively more energyworking on understanding quantities of primary interest(eg iexclE6Y mdashX1D7=iexcl4X1 D5) and worrying relatively lessabout the formulation and implications of assumptions (egE6Y mdashX1 D7 gt 0) that may have little or no ultimate bearingon the particular questions being addressed This is espe-cially so when the latter effort (restricting via functional formchoice E6Y mdashX1 D7 gt 0) will tend to complicate rather thanfacilitate the former effort (understanding the partial or causaleffects) This is not to suggest that quantities like E6Y mdashX7 andE6Y mdashX1 D7 may not themselves be interesting to estimate (egmodeling conditional mean health care expenditures for use inforecasting future expenditure levels) but whether E6Y mdashX1D7

is parametrically and parsimoniously better approximated byexp4XsbquoC D5 or by XsbquoC D or by some other g4Xsbquo CD5

or h4X1D1 ƒ5 ismdasheven in LDV settingsmdashultimately an empir-ical matter to be assessed by goodness of t (eg via condi-tional moment tests)

I note this important angle somewhat sheepishly sincein both articles of mine that Angrist cites (Mullahy 19971998) there is considerable emphasis on the use of exponen-tial CEF speci cations as enforcers of positive conditionalmean ldquorequirementsrdquo In fact the main motivation for theearlier article was to nd a general approach to obtainingconsistent estimators of structural parameters in parametricsettings less distribution-bound than tobit-class models whenthe key requirement or side constraint is E6Y mdashX1 D7 gt 0 withthe resulting strategy a more or less brute-force method foraccomplishing this in a nonlinear IV setting Angristrsquos arti-cle takes this idea quite a bit further and demonstrates how(and with the addition of a further normalization) the struc-tural parameter estimation approach I discussed can be devel-oped into a model to analyze causal effects when the latter arecharacterized as ldquoproportional treatment effectsrdquo

Whether a proportional as opposed to an additive treatmenteffect is interesting (it may or may not be) remains to be seenin any particular application Wooldridgersquos recent work onaverage causal effects (ACErsquos Wooldridge 1999) would seempertinent in contexts in which the role of the X covariates ismore ldquointrusiverdquo than in the simple additivelinear treatment-effect setup As a general matter the extent to which infer-ences about causal effectsmdashhowever formulatedmdashhinge onthe inclusion of particular covariates as conditioning regres-sors is an interesting and potentially important angle thatAngristrsquos article begins to explore and is one that likely tomerit additional research For the particular issues at handthe very nature of proportional treatment effects brings intofocus explicitly the role of the X rsquos so how their correlationwith the treatment indicators 4D5 in uences inferences aboutthe causal impacts of the latter would seem to be a rst-orderconsideration

2 TWO-PART MODELS

I found Angristrsquos analysis of the interpretation of causaleffects in two-part models to be extremely illuminatingIn a reduced-form setting I argued (Mullahy 1998) that amid

all the concerns about zeros robustness outliers transforma-tion retransformation and such that have typically attendedmodeling efforts involving two-part models it would be sur-prising in a well-structured empirical investigation if therewere not stated or lurking concerns about partial effectsiexclE6Y mdashX7=iexclX andor CEFrsquos E6Y mdashX7 themselves It may be insome cases that such concerns are not articulated and it mayeven be in less well-structured problems that they are not obvi-ous to analysts themselves Yet for instance unless robustlyestimated sbquorsquos from a log-linear part-2 speci cation of a two-part model could inform a rst-order question about a par-tial effect or a CEF of what practical use were they likelyto be My rather modest and rather obvious argument wasjust that the concerns about the iexclE6Y mdashX7=iexclX or the E6Y mdashX7mdashif they are the reason the analysis is being conducted in the rst placemdashshould enjoy rst-order prominence in the estima-tion exercise and that concerns about zeros outliers and suchshould be relegated in some sense to second-order status Insome cases two-part models will serve nicely to address such rst-order concernsmdashat least in reduced-form settingsmdashbut inothers they will not

Angrist appears to have some sympathy with such argu-ments but more importantly he provides a valuable serviceto users of two-part models by unearthing one major limi-tation in the analysis of causal effects The problem is notwith part 1 (ie the logit probit or linear probability com-ponent) because part 1 of the two-part model falls within themain lines of Angristrsquos LDV analysis Rather the complicationarises with part 2 of the two-part model in which the additionalldquoY gt 0rdquo conditioning arises In the standard (reduced-form)two-part model quantities like E6Y mdashX1Y gt 07 are prominentand identi ed readily But in the counterfactual settings inwhich causal effects are manifested wherein the realized Y

arises from self-selection into or out of treatment such selec-tion effects introduce an ambiguity [Eq (12)] and therebyconfound the ability to glean causal effects from part-2 esti-mates (unless as Angrist notes censored regression methodsare used but Angrist also offers some compelling argumentsagainst blind reliance on censored regression approaches) Assuch if estimation of and inference about causal effects arethe primary analytical concerns in data settings with Y para 0and Pr4Y D 05 gt 01 analysts may be well advised to avoidtwo-part modeling strategies and pursue some of the moredirect linear and nonlinear estimation approaches discussedby Angrist in which one-part estimation approachesmdashzerosand allmdashyield estimates of causal effects that will be directlyinterpretable

3 BEYOND CEFrsquoS

One noteworthy feature of the methods Angrist discussesmdashwith the main results attributed to Abadie (1999)mdashare theirapplicability to estimation of causal effects for characteristicsof the conditional distribution rdquo4Y mdashX1D5 beyond just CEFrsquosEstimation of causal or treatment effects for conditional quan-tiles (QTErsquos) and conditional distribution function ordinates(ldquodistilesrdquo) Pr4Y lt cmdashX1D5 is demonstrated to t properlywithin the weighting strategies advanced by Abadie Since

Todd Comment 25

quantile and distile analyses may be more relevant in address-ing particular analytical questions or policy issues than CEFrsquosthe ability to extend the mechanisms of causal-effect esti-mation strategies in such directions should be welcomed byapplied researchers

I might suggest that Angristrsquos claim about an advantage ofquantile causal analysis over distile causal analysis being thatthe latter relies on application-specic values of ordinates (c)is perhaps oversold or underdeveloped In many interestingand policy-relevant applications focus on the application-speci c ordinates is fundamental What factors cause employ-ers to offer more than one health insurance plan to employeesWhat factors cause consumers of alcoholic beverages to drinkmore than two drinks per day What factors cause elderlyindividuals to utilize more than one prescription drug prod-uct What factors cause second earners to work more than17 hours per week Each of these concerns an importantreal-world causal question for which the application-specicordinatemdashwhether for institutional legal physiological orother reasonsmdashis a fundamental ldquopivot pointrdquo for the analysisAnalysis of such distile relationshipsmdashwhether in causal or inreduced-form settingsmdashis a potentially powerful method forinforming decision makers and it seems to me that its mer-its relative to quantile analyses would have to be judged on acase-by-case basis

4 CONCLUSIONS

Practitioners of applied microeconometrics will nd thatthe time invested in a careful reading of Angristrsquos article hassizable returns not merely because of the analytical and con-ceptual insights it offers but ultimately also because it sug-gests a variety of important practical innovations that appliedresearchers can exploit empirically to more clearly understandthe nature of causal relationships in their data Applied microresearchers in areas like health labor development public nance and such work commonly with data wherein (a) out-come measures are limitedmdashoften most importantly becausethey are nonnegative with mass points at 0mdashand (b) the rolesof conditioning covariates (X rsquos) are most conveniently summa-rized via regression methods Angristrsquos article provides appliedresearchers so endowed with a set of powerful tools for elic-iting causal inferences from such data Of course the punchline is familiar and inevitable All bets are off without theo-retically solid instruments But with such instruments in handthe strategies exposited by Angrist offer analysts a rich set ofperspectives on estimation

ADDITIONAL REFERENCE

Wooldridge J M (1999) ldquoEstimating Average Partial Effects Under Con-ditional Moment Independence Assumptionsrdquo mimeo Michigan StateUniversity Dept of Economics

CommentPetra Todd

Department of Economics University of Pennsylvania Philadelphia PA 19104 (petraathonasasupennedu)

This article considers ways of estimating the effect ofbinary endogenous regressors in models with limited depen-dent variables It questions the usefulness of conventionalestimation strategies aimed at recovering structural modelparameters and advocates the use of simple instrumental vari-ables ( IV) estimators as an alternative on the grounds thatthese estimators invoke weaker assumptions and often suf ceto answer questions of interest in empirical studies In theauthorrsquos view the main challenge facing empirical researchersis the problem of identi cation of ldquotreatment effectsrdquo throughIV Here I expand the discussion by considering two ques-tions (1) What are the limitations of the causal model asa paradigm for policy analysis (2) When simple estimatorsare suitable for answering a question of interest what are thetrade-offs that need to be considered in using them

1 WHEN IS THE ldquoCAUSAL MODELldquoA USEFULPARADIGM FOR POLICY ANALYSIS

The article gives the impression that most interesting ques-tions in economics can be answered within the context ofthe ldquocausal modelrdquo [variously attributed to Neyman (1923)Fisher (1935) Cox (1958) Roy (1951) and Rubin (1978)]The causal model is a very general framework that assumesthat there are potential outcome states associated with having

received some treatment and having received no treatment (orpossibly a different treatment) For each individual only oneof the potential outcome states is observed which leads to amissing-data problem in attempting to draw inferences aboutaspects of the treatment effect distribution

The language of potential outcomes is very general soalmost any economic problem could be formulated in theseterms However this does not mean that estimators proposedin the literature for the causal model are useful for all or evenmost economic problems A major limitation of the modelis that it assumes that the state ldquowith treatmentrdquo has beenobserved for at least a subset of people In medical trials orbiological experiments this assumption is probably reason-able but in economics we often are interested in evaluatingeffects of treatments that have not yet been implemented Forexample we might be interested in predicting the effect ofraising the age receiving Social Security bene ts to a new ageor the effect of introducing new term limits on welfare par-ticipation An implicit assumption needed to apply any of theidenti cation results described in this article is that both Y1

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

26 Journal of Business amp Economic Statistics January 2001

and Y0 are observed which would not be satis ed in manycases Thus the causal model and the estimators developedfor it are inadequate when it comes to any economic questioninvolving a state of the world that has not been observed

Despite these limitations treatment effect estimates aresometimes used to predict consequences of new treatments orof existing treatments for new populations When the policychange being considered is very close to changes observedin the data and is to be applied to similar individuals thenextrapolation may be reasonable But if the policy change is(a) of a magnitude outside the range of experience (b) alonga new dimension or (c) intended for a new population then itis an open question as to how reliable the estimates would beTreatment impact estimates are not intended to capture any-thing invariant to changing conditions The assumptions thatwould be required to justify extrapolations or generalizationsare often quite strong To some extent it is an illusion thatthese estimators allow for general kinds of inference underweak assumptions because the assumptions required to justifytheir application for particular purposes are often not madeexplicit

Angrist argues that the identi cation results for the localaverage treatment effect (LATE) estimator provide a ldquofounda-tion for credible causal inferencerdquo (p 5) and a ldquominimum con-troversy jumping-off point for any prediction exerciserdquo (p 5)But as discussed previously the causal model maintains sucha high level of generality that it offers little guidance for manykinds of problems It is important to recognize when and whennot the LATE estimator is likely to be useful This articleclaims that ldquoin practice estimates of LATE differ little fromestimates based on the stronger assumptions invoked to iden-tify effects on the entire treated populationrdquo (p 5) But suchan extrapolation is not justi ed under the theory and is surelyunlikely to hold in all circumstances As shown in a num-ber of works by various authors LATE provides the averageeffect of treatment for the group of ldquocompliersrdquo who are peo-ple induced by the instrument to receive treatment This groupdoes not except under very special circumstances correspondto the group of treated persons and may in fact representonly a small fraction of persons receiving treatment So notonly does LATE not suf ce for answering problems of the sortdescribed by (a) (b) and (c) it also does not generally pro-vide the average effect of treatment on the treatedmdashthe mostcommon parameter of interest in analyzing social programsAnother not-so-attractive feature of the estimator is that thepopulation of ldquocompliersrdquo is usually not identi able so wecannot say exactly whose treatment impact is being estimatedApplying the LATE parameter estimate to the full populationof treated persons would generally require ruling out certaintypes of heterogeneity in individual responses to treatmentas discussed by Heckman (1997) The explicit assumptionsthat could justify this type of extrapolation were shown byHeckman and Vytlacil (2000 in press)

2 ESTIMATORS SHOULD BE TAILOREDTO THEIR APPLICATION

A general theme throughout the article is that wheneverpossible researchers should restrain from imposing paramet-ric restrictions in estimation because the structure they impose

may be wrong resulting in biased estimates This view isexpressed for example in the discussion of the two-partmodel The article advocates the use of a two-part approachinstead of a conventional selection-model approach maintain-ing that the two-part approach is less likely to lead to biasedparameter estimates because it does not impose cross-equationrestrictions as a conventional tobit estimator would

How do researchers decide whether or not to impose restric-tions across estimating equations that are structurally linkedThe most agnostic approach would be to estimate both equa-tions completely nonparametrically but this approach is rarelypractical Nonparametric estimators are consistent under themost general conditions but in conventional size samples thestandard errors would be so large that the estimates would forpractical purposes be useless Researchers impose structurebecause they are willing to restrict the class of models underwhich their estimators are consistent in exchange for greaterprecision These ef ciency considerations would also apply inthe deciding whether or not to impose cross-equation restric-tions in estimating simultaneous-equations models If we areright about the structure there is an ef ciency gain fromimposing the restrictions If we are wrong imposing them maylead to bias When there are overidentifying restrictions it isat least possible to develop tests of the model speci cation thatcan serve as a guide in choosing a structural model that is sup-ported by the data This article presents a dichotomy betweenthe two-part model and a fully parametric traditional selec-tion speci cation but there are many other modeling choicesranging from fully nonparametric to semiparametric to fullyparametric In modest size samples ef ciency is an impor-tant consideration and a parametric model may be the mostappropriate one For very large samples fewer parametricapproaches are feasible Both the probability-of-working andthe hours-worked equations could for example be estimatedby a semiparametric method such as semiparametric leastsquares ( Ichimura 1993) Which estimation method is mostappropriate ultimately depends on how much data is avail-able and how many parameters are to be estimated It doesnot make sense to criticize the use of structured approaches infavor of more exible modeling approaches without regard tothe context in which they are being applied

3 ON THE GAP BETWEEN THEORY ANDEMPIRICAL PRACTICE

Finally another recurring emphasis in the article is on thevalue of shortcuts in empirical practice In the authorrsquos viewthere is a gap between the econometric theory and what isfeasible in practice and shortcuts and approximations help tobridge this gap To this end the article extols for examplethe virtues of ordinary least squares (OLS) as a way of approx-imating any unknown conditional expectation function with-out much concern as to whether a best linear approximation isa good approximation If the model has discrete regressors andis fully saturated it is well known that OLS is equivalent to anonparametric estimator But in other cases a linear approxi-mation could be highly biased and the bias will not go awayas the sample size gets large There are a variety of alterna-tive asymptotically unbiased estimators Along similar lines

Angrist Reply 27

the article suggests that in implementing a ldquoselection-on-observablesrdquo estimator it may be all right simply to ignore theweighting scheme Such claims seem to me fairly outrageousbecause failure to take into account weighting could lead tovery different estimates and would not provide a consistentestimator of the parameter of interest Heckman Ichimura andTodd (1998) discussed alternative estimators that are not dif -cult to implement and take into account the weighting scheme

Finally the appendix presents a variety of shortcuts forobtaining estimates and standard errorsmdashsome of which leadto inconsistent estimates When shortcuts lead to very crudeapproximations or inconsistent estimates I question their use-fulness I do think it unreasonable to expect that estimatesbased on simple IV estimators be accompanied by correctstandard errors that correctly take into account the in uence ofestimated parameters of estimated regressors I see little valuein reporting incorrect standard-errors estimates on the groundsthat it saves time in computation especially when there areseveral works long available in the literature that develop sim-ple ways of solving these kinds of problems (eg see Newey1984) Although there is surely a gap between the estimatorsdeveloped in the theoretic econometric literature and the esti-mators used in the empirical literature the gap is not as greatas this article supposes

ADDITIONAL REFERENCES

Cox D R (1958) The Planning of Experiments New York WileyFisher R A (1935) Design of Experiments London Oliver and BoydHeckman J J (1997) ldquo Instrumental Variables Evidence From Evaluating a

Job Training Programmerdquo Journal of Human Resources 32 441ndash462Heckman J Ichimura H and Todd P (1997) ldquoMatching as an Econometric

Evaluation Estimator Theory and Evidence on its Performance Appliedto the JTPA Program Part I Theory and Methodsrdquo Review of EconomicStudies 64 605ndash654

mdashmdashmdash (1998) ldquoMatching as an Econometric Evaluationrdquo Review of Eco-nomic Studies 65 261ndash294

Heckman J and Vytlacil E (2000) ldquoCausal Parameters Structural Equa-tions Treatment Effects and Randomized Evaluations of Social Programsrdquounpublished manuscript University of Chicago Dept of Economics

mdashmdashmdash (in press) ldquoLocal Instrumental Variablesrdquo in Nonlinear StatisticalInference Essays in Honor of Takeshi Amemiya eds C Hsiao KMorimune and J Powell Cambridge UK Cambridge University Press

Ichimura H (1993) ldquoSemiparametric Least Squares (SLS) and Weighted SLSEstimation of Single Index Modelsrdquo Journal of Econometrics 58 71ndash120

Neyman J (1923) ldquoStatistical Problems in Agricultural Experimentsrdquo Sup-plement to the Journal of the Royal Statistical Society 2 107ndash180

Newey W (1984) ldquoA Method of Moments Interpretation of Sequential Esti-matorsrdquo Economic Letters 14 201ndash206

Roy A (1951) ldquoSome Thoughts on the Distribution of Earningsrdquo OxfordEconomic Papers 3 135ndash146

Rubin D (1978) ldquoBaysian Inference for Causal Effects The Role of Ran-domizationrdquo The Annals of Statistics 6 34ndash58

ReplyJoshua D Angrist

I thank the discussants and the session organizer outgoingJBES editor Jeff Wooldridge who will be missed by associateeditors and JBES authors alike The comments in this sessionoffer a variety of viewpoints some sympathetic others morecritical I am pleased to be part of this stimulating and con-structive exchange and look forward to more interaction ofthis type in the future

Hahn My article starts by suggesting that a focus ontreatment effects simpli es the problem of causal inferencein limited dependent variable (LDV) models with endoge-nous regressors Endogeneityrsquos half-sister is the assumptionof unobserved individual effects since the omitted-variablesmotivation for xed effects in panel data is similar Hahnasks whether a shift in focus from index parameters to aver-age causal effects might also simplify the notoriously dif -cult problem of working with nonlinear models for binarypanel data (eg as in Card and Sullivan 1988) Hahn showsthat here too the causal-effects framework pays off demon-strating that average causal effects are easily estimated in a xed-effects probit model with a binary regressor even thoughthe probit coef cient is not A second noteworthy featureof Hahnrsquos comment is his observation that identi cation inthis framework turns on careful modeling of the relationshipbetween the assignment variable and the xed effects Tradi-tional econometric models are primarily concerned with thestochastic process generating outcomes A shift of attentiontoward modeling the assignment mechanism is consistent with

the notion that observational studies should try to mimic thesort of controlled comparisons generated by experiments

Imbens Imbens begins by endorsing the view that the sub-stantive goals of empirical research are better served by thepotential outcomescausal framework than by structural mod-eling Going one step further Imbens notes that the problemswith ldquomodel- rst econometricsrdquo are not limited to LDVrsquos Inthe same spirit he sensibly wonders (as did Wooldridge 1992)why so much attention has been devoted to estimating theBoxndashCox transformation model which does not actually iden-tify E6Y mdashX7 without a distributional assumption Imbens alsonicely articulates the rationale for looking at the twins exam-ple and notes that this is similar to the rationale for extrap-olating from clinical trials in narrow subpopulations And infact I believe the ldquotwins experimentrdquo is likely to have pre-dictive value in a variety of situations though any extrapo-lation will probably be improved by control for demographiccharacteristics

In spite of (or perhaps because of) our years of discussionand collaboration Imbens and I still do not agree on certainthings He feels strongly that inference with discrete outcomesshould use models and procedures that respect inherent non-

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

28 Journal of Business amp Economic Statistics January 2001

linearities The HIRZ procedure discussed in his comment isan intriguing way to do this and my article discusses othersI remain unconvinced however that nonlinearity is of majorsubstantive importance in empirical work with binary regres-sors The proof of the pudding I suppose would be to showthat H IRZ or some of the methods I discuss in the article leadto better predictions or decisions than the linear workhorse

Mof tt Mof t provides a valuable counterpointHe beginsby arguing that latent-index models have more predictivepower than reduced-form causal models Such claims seemhard to establish a priori though I do not doubt that in somecases this may be true But empirical strategies in which thebulk of the research effort goes into estimating index coef -cients may fail to answer the most basic question in empiricalwork What happened here Causal models and local aver-age treatment effect (LATE) answer this question while indexcoef cients alone are not suf cient for causal inference Iagree however that structural modeling can be useful forgoing beyond basic questions and I have used index and otherstructural models for this purpose Mof t is right howeverin suggesting that I nd the model-free analysis he refers toas ldquoexploratoryrdquo more interesting than structural extrapolationThis seems to me to be where the potential for real discoverylies

Mof tt asks tough but fair questions about the twins appli-cation He is correct to point out that twinning is not a manip-ulable policy instrument That is one reason I do not limit theanalysis to reduced-form effects of multiple births Yet to mymind the best evidence we have on the controversial questionof the consequences of out-of-wedlock childbearing comesfrom the twins experiment (Bronars and Grogger 1994) Andthe economic consequences of childbearing are of interest inmany other contexts As noted in the introduction to my arti-cle the twins experiment can help determine whether fertilityreductions are responsible for trend increases in female laborsupply Beyond this sort of question the twins results seemuseful for assessing the economic consequences of increasedabortion access in the United States the dissemination of con-traceptive implants in developing countries (eg see KaneFarr and Janowitz 1990) and various programs to encour-age or require implant use by welfare mothers (Haveman andWolfe 1993)

Finally it is fair to question the external validity of twinsestimates as Mof t does toward the end of his commentAngrist and Evans (1998) showed that the labor-supply con-sequences of twinning can be reconciled with estimates ofthe labor-supply consequences of childbearing resulting froma very different natural experiment induced by preferences formixed-sex sibling pairs This suggests that results from bothexperiments may be quite robust and may even be ldquostructuralrdquoin the sense that they tell us something general about the eco-nomic response to childbearing

Mullahy Mullahy brings a welcome health economistrsquosperspective to the discussion Health economists have longgrappled with the issues discussed in my article and asI hope I made clear I drew ideas and inspiration fromMullahyrsquos work on models with nonnegative dependent vari-ables Mullahyrsquos comment further highlights exactly the sort

of utilitarian trade-off that underlies the choice of empiricalstrategy On one hand we would like our models to be con-sistent with any functional form restrictions known to be trueOn the other attempts at this may complicate estimation andinference for little payoff Worst of all the technical complex-ity of nonlinear models seems to cause authors to ldquowax econo-metricrdquo an effort that may come at the expense of attention tosubstantive issues of importance (an example here is the longseries of papers discussing exactly how data from the RANDhealth-insurance experiment should be analyzed) Mullahyrsquosheretical discussions of ldquoone-part modelsrdquo here and in earlierwork represent a refreshing effort to keep the empirical trainon track Mullahyrsquos point that application-specic distributionordinates can provide more useful summary information thanquantiles is also clearly worth thinking about

Todd Todd provides a useful and clear overview of thecausal framework which she suggests is better suited formedical questions than economic questions The thrust of herargument is that there is too much heterogeneity in economicinterventions and outcomes for the causal approach to bevery useful for economists But my perception is that medicalresearch is even more explicitly concerned with heterogene-ity than economics This is evidenced by the many quali ca-tions regarding patient characteristics conditions and inter-actions in medical treatment protocols A difference betweenthe prevailing medical research paradigm and the traditionaleconometric perspective however is whether the question ofheterogeneous response is to be addressed by better theoryor more evidence Of course theory helps us decide whereto look for heterogeneity but as Rosenbaum (1999 p 301)put it ldquo If a treatment has substantially different effects in dif-ferent types of people then we need to discover this (ital-ics mine) Rosenbaum also noted that ldquorandomized clinicaltrials routinely look for heterogeneous treatment effects andthey do this without representative samples from nationalpopulationsrdquo

On the technical side Todd suggests I advocated use ofthe two-part model because it exibly approximates nonlinearfunctional forms for nonnegative outcomes Although I notedthe exibility the upshot of my discussion was that the two-part model is poorly suited to causal inference because of thedif culty of interpreting part 2 Finally Todd points out thefailure to allow for sampling variance in the scaling factorsused to generate effects for nonlinear models This is indeed ahard-to-defend shortcut though I shall take a stab at defenseby noting that randomness of the scaling factor which typi-cally estimates a sample average is likely to be a minor sourceof sampling uncertainty

ADDITIONAL REFERENCES

Card D E and Sullivan D (1988) ldquoEffects of Subsidized Training onMovements In and Out of Employmentrdquo Econometrica 56 497ndash530

Haveman R and Wolfe B (1993) ldquoChildrenrsquos Prospects and ChildrenrsquosPolicyrdquo Journal of Economic Perspectives 7 153ndash174

Kane T Farr G and Janowitz B (1990) ldquo Initial Acceptability of Con-traceptive Implants in Four Developing Countriesrdquo International FamilyPlanning Perspectives 16 49ndash54

Rosenbaum P R (1999) ldquoChoice as an Alternative to Control in Observa-tional Studiesrdquo Statistical Science 14 259ndash304

Wooldridge J (1992) ldquoSome Alternatives to the BoxndashCox RegressionModelrdquo International Economic Review 33 935ndash955

Page 16: New Estimation of Limited Dependent Variable Models With Dummy …piketty.pse.ens.fr/…ecoineg/articl/Angrist2001b.pdf · 2011. 12. 5. · sequences of changes in contraceptive technology

Imbens Comment 17

the independence between 4xi11 xi25 and ci cannot be exploitedwithin the xed-effects framework Unfortunately the addi-tional information that 4xi11 xi25 and ci are independent of eachother does not reduce dif culty in estimating the index ˆ It isnot yet clear whether ˆ can be

pn-consistently estimated even

with the random-effects assumption when the random effectsare nonparametrically speci ed On the other hand estimatingsbquo becomes easier as a result of the constancy of the general-ized propensity score Pr 6xi11 xi2mdashci7 Presence of unobservedci in the model was rendered irrelevant due to the constancyof the generalized propensity score [See Imbens (1999) for adiscussion on the generalized propensity score]

It is interesting to note that in the panel probit model (1)estimation of sbquo is not necessarily simple unless the indexstructure is discarded altogether It is useful to note that thenew target parameter sbquo could be estimated consistently usingindex structure if consistent estimators of ˆ and not are givenHere not denotes the distribution of ci We may alternativelywrite (2) as

Z4ecirc4c C ˆ5 ƒ ecirc4c55 dnot4c51 (3)

which can in principle be estimated by using consistent esti-mators of ˆ and not Estimation of sbquo using the alternativecharacterization (3) requires consistent estimation of an addi-tional parameter not a parameter that was not given too muchattention in the past The problem is that not many consis-tent estimators of not are available It is not yet clear whether

the model satis es the primitive conditions for consistency ofthe nonparametric maximum likelihood estimator (NPMLE)as discussed by Heckman and Singer (1984) The dif culty inestimating the target parameter using the expression (3) whichis based on the index structure is in sharp contrast to the easeof the estimation strategy using the expression (2) for whichthe index structure is irrelevant

The preceding discussion suggests that the success ofAngristrsquos perspective critically hinges on the structure of treat-ment assignment and careful reexpression of the new targetparameter If the joint distribution of ci and 4xi11 xi25 is com-pletely unknown it is clear that changing the target parameterdoes not ease the dif culty of estimation Angristrsquos perspectivetherefore requires substantial effort in modeling such jointdistribution Whether such a modeling effort will be successfulin dealing with nonlinear panel problems remains to be seen

ADDITIONAL REFERENCES

Chamberlain G (1984) ldquoPanel Datardquo in Handbook of Econometricseds Z Griliches and M D Intriligator Amsterdam North-Hollandpp 1247ndash1318

mdashmdashmdash (1992) ldquoBinary Response Models for Panel Data Identi cation andInformationrdquo unpublished manuscript

Heckman J and Singer B (1984) ldquoA Method for Minimizing the Impactof Distributional Assumptions in Econometric Models for Duration DatardquoEconometrica 52 271ndash320

Imbens G (1999) ldquoThe Role of Propensity Score in Estimating Dose-Response Functionsrdquo Technical Working Paper 237 National Bureau ofEconomic Research Cambridge MA

Manski C (1987) ldquoSemiparametric Analysis of Random Effects Linear Mod-els from Binary Panel Datardquo Econometrica 55 357ndash362

CommentGuido W Imbens

Department of Economics University of California at Los Angeles Los Angeles CA 90095(imbenseconuclaedu)

It is a pleasure to comment on this article by JoshuaAngrist whose applications of instrumental-variables methods(Angrist 1989 Angrist and Krueger 1991) have been a sourceof inspiration for my own work in this area As with Angristrsquosprevious work on instrumental variables the current articleraises some controversial issues and makes a number of impor-tant points Here I offer some comments on three of themFirst I shall discuss the issues raised in Section 1 ldquoCausalEffects and Structural Parametersrdquo concerning the goals ofstatistical inference Angrist argues that many questions ofinterest are most easily formulated in terms of comparisonsbetween realized and potential outcomes the latter de nedas outcomes that would have been observed under alternativestates of nature I shall explore some of the implications of thisview for empirical practice and econometric theory SecondI shall offer some remarks on the role of economic theory inspeci cation and identi cation of econometric models againreinforcing Angristrsquos point regarding the importance of for-mulating the key assumptions in terms of potential outcomesThird I shall discuss some of the issues related to the limited

dependent nature of outcome variables for empirical practicein particular in the presence of covariates Partly motivatedby the widespread perception of fundamental dif culties inapplying instrumental-variables methods to data with limiteddependent outcome variables Angrist argues that standard lin-ear model techniques are generally applicable I agree withAngristrsquos position that most of these perceived problems areexaggerated but suggest that principled inference should nev-ertheless take account of the limited dependent nature of theoutcome variables and use nonlinear models

1 CAUSAL ESTIMANDS

In his textbook discussion of the difference betweenstructural and reduced-form estimates Goldberger (1997)wrote following Marshak (1953) that the ultimate goal of

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

18 Journal of Business amp Economic Statistics January 2001

econometrics is to provide predictions More speci cally inmy view the goal is to provide predictions of policy interven-tions Using both economic theory and data economists wishto inform policy discussions by providing predictions of statesof the world under different policy choices Based on com-parisons of such predictions policy makers can then chooseamong the different policies using some social welfare mea-sure as objective function (eg Heckman and Smith 1997)Angrist argues that such questions are most easily formulatedin terms of potential outcomes Here I want to elaborate onthat view

Consider as an example the problem faced by a policymaker contemplating a new tax in a market To evaluatethis policy the policy maker wishes to take into account theeffect of the tax on the quantity traded Economic theory sug-gests that this effect depends on the slope of the supply anddemand functions The rst step is therefore the estimation ofthese slopes and in the remainder of this discussion I shallfocus on this component of the policy-evaluation problem Inprinciple the policy maker may be interested in the entire dis-tribution of the quantity traded under various taxes Let usassume however that for purposes of evaluation of the poli-cies it is suf cient to know the average effect of the policyon the quantity traded If there are only two values for thepolicymdashfor example no tax or a taxmdashthe difference betweenthese two averages is the key quantity of interest FollowingRubin (1974) I will refer to this as the estimand

Note that the choice of estimand is distinct from the sta-tistical question of the speci cation of the model Often thestatistical model is speci ed in such a way that a single param-eter corresponds to the estimand For example in a structuralinterpretation of the linear regression model the coef cientscorrespond to the effect of changing the covariates by a singleunit Such one-to-one correspondence however is the excep-tion rather than the rule Wooldridge (1992) made this point inthe context of BoxndashCox regression models Such models areoften used when a linear representation for E6Y mdashX7 is inappro-priate The BoxndashCox regression model generalizes this linearform to E6Y 4lsaquo5mdashX7 D X 0sbquo where

Y 4lsaquo5 D(

4Y lsaquo ƒ 15=lsaquo lsaquo 6D 01

ln Y lsaquo D 00

Although consistent estimators for sbquo exist under these assump-tions Wooldridge stressed that because (a) the interpretationof sbquo changes with the value of lsaquo and (b) knowledge of sbquo andlsaquo is not suf cient for recovering E6Y mdashX7 there is no reasonfor economists to be interested in estimates of sbquo under theseassumptions In other words sbquo cannot be the sole focus ofthe researcher because the question it answers changes withthe value of nuisance parameters Wooldridge then suggestedan alternative speci cation that always allows the researcherto recover the conditional expectation E6Y mdashX7

In empirical work this distinction between the estimandand the parameters of the statistical model is consistent withthe now common practice of reporting estimates of averagederivatives in binary response models rather than reportingestimates of the logit or probit coef cients Unlike a linear

regression model there is no direct link from one of the coef- cients in the logit or probit model to average causal effectsand thus there is no intrinsic interest in such coef cients

This view is at odds however with a large part of the semi-parametric literature An exception is the work by Stoker (egStoker 1986) who focused on estimation of index coef cientsin settings where these are proportional to average derivativesand thus directly linked to changes in predictions Considerfor example the work on semiparametric estimation of binaryresponse models In this literature such models are esti-mated without making logistic or probit assumptions insteadonly making conditional mean or median assumptions in alatent index interpretation (eg Manski 1985) This literaturehowever has begged the question of why economists shouldbe interested in the coef cient estimates in these models inthe absence of a direct link between these coef cients andthe choice probabilities or their derivatives Similarly someof the models with xed effects in panel data with limiteddependent variables have focused on estimation of parametersthat in themselves do not allow for estimation of conditionalexpectations or their derivatives and thus do not allow for esti-mation of causal effects See Arellano and Honoreacute (in press)for a survey of many of these methods

2 IDENTIFICATION

After deciding on the estimand the next step is to makesubstantive assumptions on the process that generated the dataThis is where economic as opposed to statistical theory playsa key role Theoretical considerations may suggest that certainvariables have no direct causal effect on others because theydo not enter into agentsrsquo utility function nor do they affectthe constraints these agents face For example in some mar-kets it may be reasonable to postulate the existence of demandand supply function and assume that their intersection deter-mines observed prices and quantities In that case it may beargued that certain variablesmdashfor example weather conditionsin agricultural marketsmdashaffect supply at xed prices but notdemand because weather conditions do not affect utility ofthe buyers nor do they constrain their choices given pricesSimilarly theoretical considerations may suggest which vari-ables determine agentsrsquo fertility choices and which variablesare excluded from such choices as in the structural modelsdescribed in Section 12 of Angrist

For the purpose of considering such exclusion restrictionsas well as other assumptions it is important to formulate themin a way that economic theory can be brought to bear on themThis makes the formulation in terms of counterfactuals orpotential outcomes that Angrist advocates particularly appro-priate The potential outcomes describe outcomes in differ-ent environments and as such are the primitives of economicanalyses as well as choices under different sets of constraintswhich are the result of agents solving constrained optimizationproblems Since economic theory studies such optimizationproblems it is therefore well equipped to assess assumptionsformulated directly in terms of these potential outcomes Anexample of the formulation of the critical assumptions in termsof such potential outcomes is Angrist Imbens and Rubin(1996 AIR from here on) In contrast latent index modelsalthough under some conditions mathematically equivalent to

Imbens Comment 19

the potential outcome framework (eg Vytlacil 1999) formu-late the critical assumptions in terms of associations betweenobserved variables and unobserved residuals which appearsmore dif cult to contemplate [see Imbens (1997) for a dis-cussion of the confusion such formulations have caused in thestatistics literature]

It is rare that economic theory is speci c enough to deter-mine the exact value of the estimand More typical is thatthe theory is consistent with a range of values for the esti-mand Observations on agentsrsquo choices and outcomes may behelpful in narrowing down this range The econometricianrsquostask is to link the data to the estimand Typically a num-ber of additional assumptions are made at this stage Almostalways it is assumed that there is only limited dependenceor no dependence at all between choices made by differentagents and identi cation focuses on the link between the jointdistribution of the observables estimable in large samplesand the estimand Two possibilities arise at this stage Some-times the estimand can be expressed as a functional of thejoint distribution of the observables in which case the esti-mand is identi ed A leading example is where the estimandis the average treatment effect and theory suggests that assign-ment to treatment is random or at least random conditionalon a set of observed covariates (unconfounded assignmentselection on observables) Alternatively the assumptions sug-gested by economic theory do not allow for the direct linkbetween the distribution of observables and the estimand Inthat case the researcher faces some choices One option advo-cated in a series of papers by Manski (see for a general dis-cussion Manski 1995) is to estimate the range of values ofthe estimand consistent with the data given the substantiveassumptions Another option followed in the current article byAngrist is the local average-treatment-effect approach devel-oped by Imbens and Angrist (1994) to consider what aspectsof the estimand are identi ed given data and assumptions Ininstrumental-variables settings the population average treat-ment effect is often not identi ed but the average effect for aspeci c subpopulation may be In that case one may chooseto estimate the average treatment effect for this subpopula-tion and leave the extrapolation to the principal estimand tothe researcher possibly aided by theoretical considerations AsHeckman wrote ldquo It is a great virtue of the LATE parameterthat it makes the investigator stick to the data at hand and sep-arate out the aspects of an estimation that require out of sam-ple extrapolation or theorizing from aspects of an estimationthat are based on observable datardquo (Heckman 1999 p 832)

Let us consider the case studied by Angrist with its focuson the effect of having more than two children on labor sup-ply Angrist argues that the second birth being a multiple birth(eg twins) is a valid instrument for this effect In terms ofthe AIR formulation this requires a multiple birth to be asgood as randomly assigned and the absence of a systematicdirect effect on labor supply other than through its effect onthe number of children Such assumptions may be contro-versial For example fertility treatments may lead to a sys-tematic association between multiple births and choices madeby couples violating the rst assumption Even if we acceptthese assumptions however they only imply that the aver-age causal effect of more kids on labor supply is identi ed

for women who had a third child solely because their sec-ond birth was a multiple birth (compliers in the AIR termi-nology) In my view it is unlikely that this is the populationof primary interest Nevertheless it is the only subpopulationthe data are informative about in the sense of point identi ca-tion under the substantive assumptions and it would appear tooffer some guidance regarding the population average causaleffect to policy makers similar to the way in the medical worldresults from clinical trials in homogenous subpopulations areregarded as useful because they are viewed as indicative ofpopulation average causal effects

3 LIMITED DEPENDENT VARIABLES

Typically economic theory offers some guidance concern-ing the determinants of certain outcomes without specifyingthe exact form or strength of their relationship In that casestatistical modeling is required to complete the speci cationConsider the example Angrist studies with binary outcomebinary endogenous regressor a binary instrument and covari-ates Angrist suggests as one possible approach estimating theaverage treatment effect through a linear probability modelwith instrumenting for an endogenous regressor The bene tsof the linear probability approach stemming from the linear-ity and robustness against misspeci cation of the rst stageappear to me largely illusory At this point the statistical mod-eling is only intended to provide exible approximations to theunderlying conditional distributions This is a fundamentallydifferent role from that played by the substantive assumptionsthat are essential for identi cation Appeals to consistencyunder speci c parameterizations therefore appear irrelevantmdashin a larger sample one may well wish to use a more exiblespeci cation because less smoothing is required In additionto nding the alleged bene ts of the linear probability modelunpersuasive I nd its disadvantages troubling Within smallsubpopulations characterized by extreme values of the covari-ates the smoothing implicit in linear probability models islikely to lead to unattractive predictions compared to pre-dictions based on nonlinear models that respect the limited-dependent-variable nature of the outcomes

An alternative approach is followed in the study of the effectof u shots on hospitalization rates using randomized incen-tives for vaccination by Hirano Imbens Rubin and Zhou(2000 HIRZ from here on) Given their assumptions exten-sions of those made by AIR to the case with exogenous covari-ates there are three subpopulationsmdashcompliers (units whochange treatment status in response to a change in the valueof the instrument) always-takers (who always take the treat-ment irrespective of the value of the instrument) and never-takers (who never take the treatment irrespective of the valueof the instrument) H IRZ modeled the conditional distributionof these three ldquotypesrdquo conditional on covariates as a trinomialdistribution

Pr4TypeiD cmdashXi

D x5 D exp4x0ndashc5

1C exp4x0ndashc5C exp4x0ndasha51

Pr4TypeiD amdashXi

D x5 D exp4x0ndasha5

1C exp4x0ndashc5 C exp4x0ndasha51

20 Journal of Business amp Economic Statistics January 2001

and

Pr4TypeiD nmdashXi

D x5

D 1 ƒ Pr4TypeiD cmdashXi

D x5 ƒ Pr4TypeiD amdashXi

D x50

Now compare this setup to the selection models Angristdescribes in Section 3 In the selection models the equationdescribing the endogenous regressor is Di

D 18ƒ0C ƒ1Zi

Cƒ 0

2Xi gt Daggeri9 Suppose that the instrument is binary and that ƒ1 ispositive Then the two models are very similar with units withƒ0 Cƒ1 Cƒ 0

2Xi gt Daggeri in the selection model classi ed as always-takers in the potential outcome framework (because irrespec-tive of the value of the instrument Di

D 1 for such units) unitswith ƒ0

C ƒ 02Xi lt Daggeri classi ed as never-takers (because irre-

spective of the value of the instrument DiD 0 for such units)

and the units with ƒ0C ƒ 0

2Xi lt Daggeri lt ƒ0C ƒ1

C ƒ 02Xi classi ed

as compliersOne advantage of the trinomial model is that it easily gen-

eralizes to provide an arbitrarily good t to any conditionaltrinomial distribution by including higher-order terms andinteractions in the covariates If there are no substantive rea-sons to impose additional restrictions one should not imposethem implicitly in the speci cation of the statistical modelIn particular in the selection model it is not suf cient to addhigher-order terms to the covariate vector to provide an arbi-trarily good t to the trinomial distribution Such an approxi-mation would have to involve heteroscedasticity and other dis-tributional extensions that are not straightforward to implementin the selection model

Conditional on the individualrsquos type H IRZ speci ed theoutcome distributions given covariates as logistic regressionmodels Again the aim is to provide a exible approxima-tion to the conditional distribution in a manner that does notimpose any implicit restrictions Given that for a binomialdistribution the logistic regression model can be thought ofas providing a linear approximation to the log odds ratiothis choice is again an appealing one An alternative is theprobit model which also provides a good approximation Less

attractive here is the linear probability model since it requiresinequality restrictions on the parameters if the implicit esti-mates of the probabilities are to be bounded between 0 and 1

In cases with other limited dependent variables alternativenonlinear models may be appropriate For example if the out-comes are durations subject to censoring models speci ed interms of hazard functions (eg Lancaster 1979) may be con-venient for dealing with such data

ADDITIONAL REFERENCES

Angrist J (1989) ldquoLifetime Earnings and the Vietnam Era Draft LotteryEvidence From Social Security Administrative Recordsrdquo American Eco-nomic Review 80 313ndash335

Angrist J and Krueger A (1991) ldquoDoes Compulsory School AttendanceAffect Schooling and Earningsrdquo Quarterly Journal of Ecomonics 106979ndash1014

Arellano M and Honoreacute B (in press) ldquoPanel Datardquo in Handbook of Econo-metrics eds J Heckman and E Leamer Amsterdam Elsevier North-Holland

Goldberger A (1991) A Course in Econometrics Cambridge MACambridge University Press

Heckman J (1999) ldquo Instrumental Variables Response to Angrist andImbensrdquo Journal of Human Resources 34 828ndash837

Heckman J and Smith J (1997) ldquoEvaluating the Welfare Staterdquo in Econo-metrics and Economics in the 20th Century The Ragnar Frisch Centenaryed S Strom New York Cambridge University Press pp 214ndash318

Hirano K Imbens G Rubin D and Zhou A (2000) ldquoEstimating theEffect of Flu Shots in a Randomized Encouragement Designrdquo Biostatistics1 69ndash88

Imbens G (1997) Book Review of The Foundations of Econometric Analy-sis by D Hendry and M Morgan Journal of Applied Econometrtics 1291ndash94

Lancaster T (1979) ldquoEconometric Methods for the Analysis of DurationDatardquo Econometrica

Manski C (1985) ldquoSemiparametric Analysis of Discrete Responserdquo Journalof Econometrics 27 313ndash333

mdashmdashmdash (1995) Identi cation Problems in the Social Sciences CambridgeMA Harvard University Press

Marshak J (1953) ldquoEconomic Measurements for Policy and Predictionrdquoin Studies in Econometric Method eds W Hood and T KoopmansNew York Wiley pp 1ndash26

Stoker T (1986) ldquoConsistent Estimation of Scaled Coef cientsrdquo Economet-rica 54 1461ndash1481

Vytlacil E (1999) ldquo Independence Monotonicity and Latent Index ModelsAn Equivalency Resultrdquo unpublished manuscript University of ChicagoDept of Economics

Wooldridge J (1992) ldquoSome Alternatives to the BoxndashCox RegressionModelrdquo International Economic Review 33 935ndash955

CommentRobert A Mof tt

Department of Economics Johns Hopkins University Baltimore MD 21218 and National Bureau of EconomicResearch Cambridge MA (mofttjhuedu)

The Problem Although the article by Angrist rangesacross a number of issues much of the discussion and thearticle title suggests that the problem of concern is that instru-mental variables ( IV) cannot be used in one of three commonmodels Let the rst model be y D Csbquod CxbdquoChellip where y isan absolutely continuous variable but d is binary and wherex is independent of hellip but d is not Then sbquo can be consistentlyestimated with IV (Heckman and Robb 1985) Let the secondmodel be y uuml D C sbquod uuml C xbdquo C hellip where y uuml and d uuml are contin-

uous and where y D 14y uuml gt 05 and d D d uuml are the observedvariables The parameters of this model can likewise be esti-mated by IV with some auxiliary assumptions (Newey 1986see Blundell and Smith 1993 for a review of alternative meth-ods) But let the third model be y uuml D C sbquod C xbdquo C hellip where

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

Moftt Comment 21

again y D 14y uuml gt 05 but now d is binary The parameters ofthis model cannot be estimated by standard IV Many of theexamples in the Angrist article involve the censored regres-sion model rather than the binary choice model but this andthe other main points in the article apply to both

The Angrist Solution The Angrist solution to the problemposed by the third of these models where IV is not applica-ble is to declare sbquo an uninteresting parameter and not worthyof estimation [or in his words to ldquopuntrdquo (p 8)] This is remi-niscent of the solution to the Vietnam War suggested by some1960s commentators The United States should have simplydeclared victory quickly withdrawn and hoped that no onenoticed that they had in fact lost

If sbquo is uninteresting what is interesting according toAngrist His answer is that interest should center on leastsquares approximations with y as the dependent variable andd and x as regressors If endogeneity of d is a problem linearIV should be applied with linear IV given a LATE interpre-tation He provides an empirical illustration

I will comment on three questions raised by the Angristarticle (1) Is sbquo an interesting parameter (2) Is the Linear IVestimator using observed y and d which Imbens and Angrist(1994) relabeled the LATE an interesting quantity (3) Is theillustrative application given in the Angrist article interesting

Is sbquo an Interesting Parameter As Angrist notes manyof the issues raised by his solution have nothing to do withthe endogeneity of d Whether sbquo is an interesting parame-ter is one of these at least in part Given Angristrsquos prefer-ence for linear projections his position necessarily implies thatall nonlinear models of which the latent index model (L IM)is just one example are uninteresting (Angrist is not com-pletely consistent in his position because he does propose anonlinear model with an exponential conditional mean func-tion in one section of his article that model is subject to thesame objections that he subjects the LIM tomdasharbitrary non-linearities etc) Thus his position boils down to a preferencein the binary dependent-variable case for the linear proba-bility model (LPM) over the LIM and other nonlinear mod-els This issue has been discussed for many years even in asimultaneous-equations context [eg Heckman and MaCurdy(1985) which Angrist does not reference]

The dif culty with Angristrsquos polar position on this issuemdashthat the LPM is the only model of interestmdashis that it is funda-mentally untenable The LPM is attractive because it is easyto interpret providing parameter estimates that do not requiretransformation to learn the effects of a regressor on the meanof the dependent variable It has a role to play in empiricalwork in summarizing the data as regards the conditional meanfunction and for initial explorations of the data and virtu-ally all practitioners use it for this purpose But beyond thisit has no defense Either it is equivalent to the latent-variablemodel if the model is saturated or if the model is not satu-rated imposes functional form restrictions that may not holdand may t the data worse than the latent-variable model

Thus if the model is y D 14 C sbquod C hellip gt 05 where d is asingle dummy variable regressor independent of hellip the probit(say) estimates of and sbquo map one-to-one onto the leastsquares intercept and coef cient on d in the LPM and hencethere is no gain to estimating the model either way because the

estimators are equivalent The same point extends to the caseof a multinomial d If the model is y D 14 C sbquod C xbdquo C hellip gt

05 where x and bdquo are vectors of exogenous covariates andcoef cients respectively the LPM estimate of the effect ofd on E4y5 conditional on x may be demonstrably worsethan the probit estimator of the same effect even for theobject of estimating E4ymdashd1 x5 at points 8d1 x9 observed inthe data (Angrist falls back on the weak principle that leastsquares estimates provide best linear approximations but theissue is that the true model might be nonlinear) Nothingin Angristrsquos article provides evidence that the probitor anyother latent index formulation provides an inferior estimate of6E4ymdashd11 x5 ƒ E4ymdashd01 x57 compared to the LPM for the twoestimators smooth the joint response function with respect tod and x in different ways

Although the latent-variable model has no necessary claimfor superiority in the class of all nonlinear models its pop-ularity rests on its ability to generate a wide variety of non-linearities with a relatively parsimonious speci cation arisingfrom the convolution of the latent index with the cdf of hellipExpansion of the LPM to incorporate equivalent nonlinearitiesis cumbersome and inef cient

The possible inferiority of the LPM in capturing nonlinear-ities in nonsaturated models is also important for interpolationand extrapolation to points not in the observed data Gettingthe nonlinearities right in the observed data is important ininterpolation and extrapolation if the true model is nonlinearas the binary choice model necessarily is (since y is boundedby 0 and 1) Although Angrist at one point does mentionpredictionmdashstating that the linearized models he proposes area good ldquojumping-off point for any prediction exerciserdquo a state-ment without supportmdashhe is for the most part not interestedin prediction so much as summarizing the observed data Thisis ne but unfortunately prediction outside the observed his-torical data is the pervasive concern of the policy makers whoare the ultimate consumers of applied economic research forthey must make predictions of new policies in new environ-ments on a daily basis as part of their jobs The methods pro-posed in Angristrsquos article are therefore not very useful for theformulation of new policy

It should be noted that some make the argument that theoryand economic models must play a role in guiding the formu-lation of empirical relationships used for prediction outsidethe range of the observed data That position is not taken herebecause it is not necessary to establish the potential superior-ity of the L IM over the LPM in nonsaturated models and forprediction Occamrsquos razor makes it unnecessary

It should also be noted that these issues have nothing todo with causal effects because d is assumed exogenous Thusthe opposition that Angrist poses between L IMrsquos and ldquocausaleffectsrdquo models is a false one at least in general If a ldquocausaleffectrdquo is de ned as the true effect of a variable d on onlythe rst moment of a variable y (a rather restricted de nitionof causal effect) that causal effect can be derived from thelatent-variable model just as well The issue is instead merelynonlinearity of the function E4ymdashd1 x5

Angrist also states that the L IM requires ldquoconstant coef- cientrdquo and ldquodistributional assumptionsrdquo for identi cation

22 Journal of Business amp Economic Statistics January 2001

unlike the LPM Angrist does not make suf cient quali ca-tions to these assertions and they are indeed false for theL IM A random coef cient sbquo in the LIM is permissible with-out distributional assumption ( Ichimura and Thompson 1998)and the coef cients in the standard constant-coef cient L IMare identi ed semiparametrically (ie under unknown distri-bution of hellip) under the restriction that d and hellip are independentan assumption generally made for causal interpretation of theLPM model as well (Manski 1988 Horowitz 1993 Powell1994) In addition as already noted in a fully saturated modelthe LPM and the L IM are equivalent in any case

Is the LATE an Interesting Statistic The Angrist assertionthat LATE is the only interesting statistic if d is endogenousor perhaps the only statistic worth bothering about because itis the only one identi ed by the data has the same partialvalidity as his preference for the LPM over the LIM IV isone of the most popular techniques in applied econometricsand has a natural intuition indeed one that does not requirethe LATE interpretation per se It is one of the most usefultools in the applied economistrsquos kit The simple IV estimatordiscussed by Angrist makes minimal assumptions and givesminimal information back to the analyst as a result But tosay that it produces the only statistic of interest does not havedefense both because it is equivalent to an LIM that is fullysaturated and may t the data worse than an LIM if not satu-rated and because it implies that there is no value to makingadditional assumptions to obtain additional information

The limiting nature of the LATE statistic is again in itsinattention to nonlinearities interpolation and extrapolationThat nonlinearities can be important is as true in this caseas it was in the exogenous case just discussed when thereare additional x covariates and when the model is not satu-rated As for interpolation and extrapolation the LATE statis-tic 6 Ny4z D z15ƒ Ny4z D z057=6 Nd4z D z15ƒ Nd4z D z057 denotes theeffect on a change in z from value z0 to z1 on E4y5 scaledby the change in the E4d5 It does not have implications forthe effect on E4y5 of any other change in z or for a changein any other policy variable Thus it is not particularly usefulfor policy changes other than a change of z0 to z1 in the sameenvironment (ie conditioned on the same x) This stands incontrast to an L IM for E4dmdashx1 z5mdashor any parametric modelfor d for that mattermdashthat allows for the change from z0 toz1 to inform policy makers and others of the likely changesof other values of z and of other variables x In fact theLATE statistic which is not a parameter in the usual senseof the word can always be expressed as a nonlinear combi-nation of parameters of a latent-variable model but not viceversa hence the latter model is more general than the former[See Heckman and Vytlacil (1999) for a discussion of how toexpress the LATE in terms of an L IM see also Angrist andImbens (1999) and Heckman (1997 1999) for a discussion ofsome of these issues]

In addition Angristrsquos focus on the advantages of the LATEfor model-free estimation of the effect of a change in z fromz0 to z1 on y reveals a philosophical weakness in his posi-tion This is because the effect of a change in z from z0 toz1 on y can be estimated from the reduced form the struc-tural form is not needed It is not explained why the statistic

6E4ymdashz D z15 ƒ E4ymdashz D z057 is not the only quantity of inter-est The answer that most economists would give is to say thatstructural coef cients are of interest because they can be usedto extrapolate the effects of changes in d on y to other changesin z than from z0 to z1 in the same environment indeedthis is the basic rationale for structural estimation given inthe famous essay by Marschak [1953 see also Christ (1994)and Heckman (2000) for discussions Marschakrsquos argumentwas more subtle than this arguing that restrictions on struc-tural parameters are needed for out-of-sample prediction butthis translates into restrictions on the reduced form as well]However by ruling out the possibility of learning about anyof the effects of changes in z or d other than those inducedby a change in z from z0 to z1 Angrist removes the need todo structural estimation in the rst place Angrist implicitlyassumes that a structural coef cient of interest exists on thevariable d in the y equation and that that coef cient has mean-ing independent of a particular change in z but the rest of hisdiscussion contravenes that interest

A reduced-form research program which is where theAngrist position leads is of considerable value There is noth-ing wrong with a research program to collect a large body ofinformation on the effects of a wide variety of changes in dif-ferent policy variables 4z5 from particular values (z0) to otherparticular values (z1) each taking place in particular environ-ments (x) But if the research program stops there very littleuseful has been learned other than a collection of facts aboutthe effects of particular policies in particular environments

Is the Application in the Angrist Article Interesting TheAngrist article uses a recently popular exclusion restriction ornatural experiment to identify the effect of fertility on laborsupply and to illustrate his preferred methodsmdashnamely theuse of twins The use of twins as an exclusion restrictionstands in contrast to variations in a government policy or lawwhich are often used as exclusion restrictions both in recentwork on natural experiments as well as a much older litera-ture that uses cross-sectional and overtime variation in state-or country-speci c taxes and transfer rules to identify modelparameters However although it can be of interest to estimatereduced-form policy effects 6E4ymdashz D z15 ƒ E4ymdashz D z057 asjust discussed it is not so obvious that there is any interest inestimating the effects of having twins on the expected valueof y Creating twins is not a variable directly subject to policymanipulation

The dif culty in the use of twins arises for the same reasonalready discussedmdashnamely that the preference for model-freeestimation of policy effects leads to a lack of interest in thefunction E4dmdashz1 x5 and hence to a lack of interest in what canbe learned from a study of twins about the effects of someother more relevant policy variable that might be manipulatedA model for the E4dmdashz1 x5 is needed to make that connectionand hence to make a study of twins of any interest yet that isintentionally eschewed in the approach proposed by AngristOnce again this leads to an uninteresting and quite limitingset of exercises

Conclusions The set of methods laid out in the Angristarticle at least if a more relevant instrument were used yieldsa set of model-free statistics that are suitable for exploratorywork on a research question prior to the (possibly nonlinear)

Mullahy Comment 23

structural estimation and prediction that should be the ultimateobject of applied economics research

ACKNOWLEDGMENTS

This comment is a revision of remarks delivered at theJoint Statistical Meetings Baltimore August 1999 I thankCarl Christ James Heckman Joel Horowitz Michael KeaneThomas Mroz Geert Ridder and Edward Vyltacil for com-ments with the usual disclaimer that the views expressed hereshould not be taken as representing those of any of theseindividuals

ADDITIONAL REFERENCES

Angrist J and Imbens G (1999) Comment on ldquo Instrumental VariablesA Study of Implicit Behavioral Assumptions Used in Making Pro-gram Evaluationsrdquo by J J Heckman Journal of Human Resources 34283ndash827

Blundell R and Smith R (1993) ldquoSimultaneous Microeconometric ModelsWith Censored or Qualitative Dependent Variablesrdquo in Handbook of Statis-tics (Vol 1) eds G S Maddala C R Rao and H D Vinod AmsterdamNorth-Holland pp 117ndash141

Christ C (1994) ldquoThe Cowles Commissionrsquos Contributions to Econometricsat Chicago 1939ndash1955rdquo Journal of Economic Literature 32 30ndash59

Heckman J (1997) ldquoInstrumental Variables A Study of Implicit BehavioralAssumptions Used in Making Program Evaluationsrdquo Journal of HumanResources 32 441ndash462

mdashmdashmdash (1999) ldquo Instrumental Variables A Reply to Angrist and ImbensrdquoJournal of Human Resources 34 828ndash837

mdashmdashmdash (2000) ldquoCausal Parameters and Policy Analysis in Economics ATwentieth Century Retrospectiverdquo Quarterly Journal of Economics 11545ndash97

Heckman J and MaCurdy T (1985) ldquoA Simultaneous Equations LinearProbability Modelrdquo International Economic Review 18 21ndash37

Heckman J and Vytlacil E (1999) ldquoLocal Instrumental Variables andLatent Variable Models for Identifying and Bounding Treatment EffectsrdquoProceedings of the National Academy of Sciences 96 4730ndash4734

Horowitz J (1993) ldquoSemiparametric and Nonparametric Estimation of Quan-tal Response Modelsrdquo in Handbook of Statistics (Vol 11) eds G SMaddala C R Rao and H D Vinod Amsterdam North-Holland pp45ndash72

Ichimura H and Thompson T (1998) ldquoMaximum Likelihood Estimation ofa Binary Choice Model With Random Coef cients of Unknown Distribu-tionrdquo Journal of Econometrics 86 269ndash295

Manski C (1988) ldquoIdenti cation of Binary Choice Modelsrdquo Journal of theAmerican Statistical Association 83 729ndash738

Marschak J (1953) ldquoEconomic Measurements for Policy and Predictionrdquoin Studies in Econometric Method eds W Hood and T KoopmansNew York Wiley pp 1ndash26

Powell J (1994) ldquoEstimation of Semiparametric Modelsrdquo in Handbook ofEconometrics (Vol 4) eds R Engle and D McFadden Amsterdam North-Holland pp 2443ndash2521

CommentJohn Mullahy

Departments of Preventive Medicine and Economics University of Wisconsin Madison WI 53705 andNational Bureau of Economic Research Cambridge MA (jmullahyfacstaffwiscedu)

Articles bearing titles containing phrases like ldquoSimpleStrategies for Empirical Practicerdquo are often wolves masquerad-ing as sheep the methodologies they devise being far fromldquosimplerdquo and far removed from what most humble practition-ers perceive in the realm of ldquoempirical practicerdquo Not so hereJoshua Angrist has written a tight comprehensive article thatis stimulating and important yet also eminently useful

As typi es much of Angristrsquos work the main concern in thisarticle is on how to elicit interesting characterizations of causaleffects from microdata with the particular twist here being afocus on outcomes measured as ldquolimited dependent variablesrdquo(LDVrsquos) The main take-away message I glean from this arti-cle is that applied analysts working on causal effect or struc-tural analyses in LDV contextsmdashtraditionally vexing contextsinsofar as consistent estimation and inference are concernedmdashhave considerable grounds for optimism Angrist lays out andinterprets systematically a set of issues and methods that pro-vide practitioners with a variety of implementable strategiesthat might be brought to bear on such empirical problemsA corollary take-away message is that in some respects ldquothisstuff is not really as hard as wersquove tended to make itrdquo withAngrist demonstrating for instance the potential merits ofsimple linear instrumental variable ( IV) methods for estimat-ing causal effects in a variety of LDV contexts In no eventcan applied analysts escape the requirement of nding theo-retically sound instruments but Angrist exposits compellingly

a broad description of how such instruments might be used toelicit interesting causal inferences in LDV contexts

I have no real quibbles with any of the substance ofAngristrsquos arguments Rather I will devote my commentarymainly to amplifying and expanding several of the themes hedevelops throughout the article

1 FOCUS ON CAUSAL OR PARTIAL EFFECTSVERSUS FOCUS ON CONDITIONAL

EXPECTATIONS FUNCTIONS

It seems fair to suggest that much of applied microecono-metrics is concerned primarily with understanding the signsand magnitudes of quantities like iexclE6Y mdashX1 D7=iexcl4X1 D5 oriexclE6Y mdashX7=iexclX0 Yet much of the actual dirty work in under-taking causal analysis in LDV contexts seems to result fromdecisions to undertake analyses in settings where E6Y mdashX1D7

or E6Y mdashX7 are restricted to be positive without a priori restric-tions on parameter values [thus the tradition of using tobit-class conditional expectations functions (CEFrsquos) two-stepselection models exponential CEFrsquos etc]

As practitioners however we should pause to assesswhether speci cations akin to E6Y mdashX1D7 D exp4Xsbquo C D5 gt

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

24 Journal of Business amp Economic Statistics January 2001

0 are ultimately buying anything important with respect tothe rst-order questions being explored It is only com-mon sense that analysts should spend relatively more energyworking on understanding quantities of primary interest(eg iexclE6Y mdashX1D7=iexcl4X1 D5) and worrying relatively lessabout the formulation and implications of assumptions (egE6Y mdashX1 D7 gt 0) that may have little or no ultimate bearingon the particular questions being addressed This is espe-cially so when the latter effort (restricting via functional formchoice E6Y mdashX1 D7 gt 0) will tend to complicate rather thanfacilitate the former effort (understanding the partial or causaleffects) This is not to suggest that quantities like E6Y mdashX7 andE6Y mdashX1 D7 may not themselves be interesting to estimate (egmodeling conditional mean health care expenditures for use inforecasting future expenditure levels) but whether E6Y mdashX1D7

is parametrically and parsimoniously better approximated byexp4XsbquoC D5 or by XsbquoC D or by some other g4Xsbquo CD5

or h4X1D1 ƒ5 ismdasheven in LDV settingsmdashultimately an empir-ical matter to be assessed by goodness of t (eg via condi-tional moment tests)

I note this important angle somewhat sheepishly sincein both articles of mine that Angrist cites (Mullahy 19971998) there is considerable emphasis on the use of exponen-tial CEF speci cations as enforcers of positive conditionalmean ldquorequirementsrdquo In fact the main motivation for theearlier article was to nd a general approach to obtainingconsistent estimators of structural parameters in parametricsettings less distribution-bound than tobit-class models whenthe key requirement or side constraint is E6Y mdashX1 D7 gt 0 withthe resulting strategy a more or less brute-force method foraccomplishing this in a nonlinear IV setting Angristrsquos arti-cle takes this idea quite a bit further and demonstrates how(and with the addition of a further normalization) the struc-tural parameter estimation approach I discussed can be devel-oped into a model to analyze causal effects when the latter arecharacterized as ldquoproportional treatment effectsrdquo

Whether a proportional as opposed to an additive treatmenteffect is interesting (it may or may not be) remains to be seenin any particular application Wooldridgersquos recent work onaverage causal effects (ACErsquos Wooldridge 1999) would seempertinent in contexts in which the role of the X covariates ismore ldquointrusiverdquo than in the simple additivelinear treatment-effect setup As a general matter the extent to which infer-ences about causal effectsmdashhowever formulatedmdashhinge onthe inclusion of particular covariates as conditioning regres-sors is an interesting and potentially important angle thatAngristrsquos article begins to explore and is one that likely tomerit additional research For the particular issues at handthe very nature of proportional treatment effects brings intofocus explicitly the role of the X rsquos so how their correlationwith the treatment indicators 4D5 in uences inferences aboutthe causal impacts of the latter would seem to be a rst-orderconsideration

2 TWO-PART MODELS

I found Angristrsquos analysis of the interpretation of causaleffects in two-part models to be extremely illuminatingIn a reduced-form setting I argued (Mullahy 1998) that amid

all the concerns about zeros robustness outliers transforma-tion retransformation and such that have typically attendedmodeling efforts involving two-part models it would be sur-prising in a well-structured empirical investigation if therewere not stated or lurking concerns about partial effectsiexclE6Y mdashX7=iexclX andor CEFrsquos E6Y mdashX7 themselves It may be insome cases that such concerns are not articulated and it mayeven be in less well-structured problems that they are not obvi-ous to analysts themselves Yet for instance unless robustlyestimated sbquorsquos from a log-linear part-2 speci cation of a two-part model could inform a rst-order question about a par-tial effect or a CEF of what practical use were they likelyto be My rather modest and rather obvious argument wasjust that the concerns about the iexclE6Y mdashX7=iexclX or the E6Y mdashX7mdashif they are the reason the analysis is being conducted in the rst placemdashshould enjoy rst-order prominence in the estima-tion exercise and that concerns about zeros outliers and suchshould be relegated in some sense to second-order status Insome cases two-part models will serve nicely to address such rst-order concernsmdashat least in reduced-form settingsmdashbut inothers they will not

Angrist appears to have some sympathy with such argu-ments but more importantly he provides a valuable serviceto users of two-part models by unearthing one major limi-tation in the analysis of causal effects The problem is notwith part 1 (ie the logit probit or linear probability com-ponent) because part 1 of the two-part model falls within themain lines of Angristrsquos LDV analysis Rather the complicationarises with part 2 of the two-part model in which the additionalldquoY gt 0rdquo conditioning arises In the standard (reduced-form)two-part model quantities like E6Y mdashX1Y gt 07 are prominentand identi ed readily But in the counterfactual settings inwhich causal effects are manifested wherein the realized Y

arises from self-selection into or out of treatment such selec-tion effects introduce an ambiguity [Eq (12)] and therebyconfound the ability to glean causal effects from part-2 esti-mates (unless as Angrist notes censored regression methodsare used but Angrist also offers some compelling argumentsagainst blind reliance on censored regression approaches) Assuch if estimation of and inference about causal effects arethe primary analytical concerns in data settings with Y para 0and Pr4Y D 05 gt 01 analysts may be well advised to avoidtwo-part modeling strategies and pursue some of the moredirect linear and nonlinear estimation approaches discussedby Angrist in which one-part estimation approachesmdashzerosand allmdashyield estimates of causal effects that will be directlyinterpretable

3 BEYOND CEFrsquoS

One noteworthy feature of the methods Angrist discussesmdashwith the main results attributed to Abadie (1999)mdashare theirapplicability to estimation of causal effects for characteristicsof the conditional distribution rdquo4Y mdashX1D5 beyond just CEFrsquosEstimation of causal or treatment effects for conditional quan-tiles (QTErsquos) and conditional distribution function ordinates(ldquodistilesrdquo) Pr4Y lt cmdashX1D5 is demonstrated to t properlywithin the weighting strategies advanced by Abadie Since

Todd Comment 25

quantile and distile analyses may be more relevant in address-ing particular analytical questions or policy issues than CEFrsquosthe ability to extend the mechanisms of causal-effect esti-mation strategies in such directions should be welcomed byapplied researchers

I might suggest that Angristrsquos claim about an advantage ofquantile causal analysis over distile causal analysis being thatthe latter relies on application-specic values of ordinates (c)is perhaps oversold or underdeveloped In many interestingand policy-relevant applications focus on the application-speci c ordinates is fundamental What factors cause employ-ers to offer more than one health insurance plan to employeesWhat factors cause consumers of alcoholic beverages to drinkmore than two drinks per day What factors cause elderlyindividuals to utilize more than one prescription drug prod-uct What factors cause second earners to work more than17 hours per week Each of these concerns an importantreal-world causal question for which the application-specicordinatemdashwhether for institutional legal physiological orother reasonsmdashis a fundamental ldquopivot pointrdquo for the analysisAnalysis of such distile relationshipsmdashwhether in causal or inreduced-form settingsmdashis a potentially powerful method forinforming decision makers and it seems to me that its mer-its relative to quantile analyses would have to be judged on acase-by-case basis

4 CONCLUSIONS

Practitioners of applied microeconometrics will nd thatthe time invested in a careful reading of Angristrsquos article hassizable returns not merely because of the analytical and con-ceptual insights it offers but ultimately also because it sug-gests a variety of important practical innovations that appliedresearchers can exploit empirically to more clearly understandthe nature of causal relationships in their data Applied microresearchers in areas like health labor development public nance and such work commonly with data wherein (a) out-come measures are limitedmdashoften most importantly becausethey are nonnegative with mass points at 0mdashand (b) the rolesof conditioning covariates (X rsquos) are most conveniently summa-rized via regression methods Angristrsquos article provides appliedresearchers so endowed with a set of powerful tools for elic-iting causal inferences from such data Of course the punchline is familiar and inevitable All bets are off without theo-retically solid instruments But with such instruments in handthe strategies exposited by Angrist offer analysts a rich set ofperspectives on estimation

ADDITIONAL REFERENCE

Wooldridge J M (1999) ldquoEstimating Average Partial Effects Under Con-ditional Moment Independence Assumptionsrdquo mimeo Michigan StateUniversity Dept of Economics

CommentPetra Todd

Department of Economics University of Pennsylvania Philadelphia PA 19104 (petraathonasasupennedu)

This article considers ways of estimating the effect ofbinary endogenous regressors in models with limited depen-dent variables It questions the usefulness of conventionalestimation strategies aimed at recovering structural modelparameters and advocates the use of simple instrumental vari-ables ( IV) estimators as an alternative on the grounds thatthese estimators invoke weaker assumptions and often suf ceto answer questions of interest in empirical studies In theauthorrsquos view the main challenge facing empirical researchersis the problem of identi cation of ldquotreatment effectsrdquo throughIV Here I expand the discussion by considering two ques-tions (1) What are the limitations of the causal model asa paradigm for policy analysis (2) When simple estimatorsare suitable for answering a question of interest what are thetrade-offs that need to be considered in using them

1 WHEN IS THE ldquoCAUSAL MODELldquoA USEFULPARADIGM FOR POLICY ANALYSIS

The article gives the impression that most interesting ques-tions in economics can be answered within the context ofthe ldquocausal modelrdquo [variously attributed to Neyman (1923)Fisher (1935) Cox (1958) Roy (1951) and Rubin (1978)]The causal model is a very general framework that assumesthat there are potential outcome states associated with having

received some treatment and having received no treatment (orpossibly a different treatment) For each individual only oneof the potential outcome states is observed which leads to amissing-data problem in attempting to draw inferences aboutaspects of the treatment effect distribution

The language of potential outcomes is very general soalmost any economic problem could be formulated in theseterms However this does not mean that estimators proposedin the literature for the causal model are useful for all or evenmost economic problems A major limitation of the modelis that it assumes that the state ldquowith treatmentrdquo has beenobserved for at least a subset of people In medical trials orbiological experiments this assumption is probably reason-able but in economics we often are interested in evaluatingeffects of treatments that have not yet been implemented Forexample we might be interested in predicting the effect ofraising the age receiving Social Security bene ts to a new ageor the effect of introducing new term limits on welfare par-ticipation An implicit assumption needed to apply any of theidenti cation results described in this article is that both Y1

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

26 Journal of Business amp Economic Statistics January 2001

and Y0 are observed which would not be satis ed in manycases Thus the causal model and the estimators developedfor it are inadequate when it comes to any economic questioninvolving a state of the world that has not been observed

Despite these limitations treatment effect estimates aresometimes used to predict consequences of new treatments orof existing treatments for new populations When the policychange being considered is very close to changes observedin the data and is to be applied to similar individuals thenextrapolation may be reasonable But if the policy change is(a) of a magnitude outside the range of experience (b) alonga new dimension or (c) intended for a new population then itis an open question as to how reliable the estimates would beTreatment impact estimates are not intended to capture any-thing invariant to changing conditions The assumptions thatwould be required to justify extrapolations or generalizationsare often quite strong To some extent it is an illusion thatthese estimators allow for general kinds of inference underweak assumptions because the assumptions required to justifytheir application for particular purposes are often not madeexplicit

Angrist argues that the identi cation results for the localaverage treatment effect (LATE) estimator provide a ldquofounda-tion for credible causal inferencerdquo (p 5) and a ldquominimum con-troversy jumping-off point for any prediction exerciserdquo (p 5)But as discussed previously the causal model maintains sucha high level of generality that it offers little guidance for manykinds of problems It is important to recognize when and whennot the LATE estimator is likely to be useful This articleclaims that ldquoin practice estimates of LATE differ little fromestimates based on the stronger assumptions invoked to iden-tify effects on the entire treated populationrdquo (p 5) But suchan extrapolation is not justi ed under the theory and is surelyunlikely to hold in all circumstances As shown in a num-ber of works by various authors LATE provides the averageeffect of treatment for the group of ldquocompliersrdquo who are peo-ple induced by the instrument to receive treatment This groupdoes not except under very special circumstances correspondto the group of treated persons and may in fact representonly a small fraction of persons receiving treatment So notonly does LATE not suf ce for answering problems of the sortdescribed by (a) (b) and (c) it also does not generally pro-vide the average effect of treatment on the treatedmdashthe mostcommon parameter of interest in analyzing social programsAnother not-so-attractive feature of the estimator is that thepopulation of ldquocompliersrdquo is usually not identi able so wecannot say exactly whose treatment impact is being estimatedApplying the LATE parameter estimate to the full populationof treated persons would generally require ruling out certaintypes of heterogeneity in individual responses to treatmentas discussed by Heckman (1997) The explicit assumptionsthat could justify this type of extrapolation were shown byHeckman and Vytlacil (2000 in press)

2 ESTIMATORS SHOULD BE TAILOREDTO THEIR APPLICATION

A general theme throughout the article is that wheneverpossible researchers should restrain from imposing paramet-ric restrictions in estimation because the structure they impose

may be wrong resulting in biased estimates This view isexpressed for example in the discussion of the two-partmodel The article advocates the use of a two-part approachinstead of a conventional selection-model approach maintain-ing that the two-part approach is less likely to lead to biasedparameter estimates because it does not impose cross-equationrestrictions as a conventional tobit estimator would

How do researchers decide whether or not to impose restric-tions across estimating equations that are structurally linkedThe most agnostic approach would be to estimate both equa-tions completely nonparametrically but this approach is rarelypractical Nonparametric estimators are consistent under themost general conditions but in conventional size samples thestandard errors would be so large that the estimates would forpractical purposes be useless Researchers impose structurebecause they are willing to restrict the class of models underwhich their estimators are consistent in exchange for greaterprecision These ef ciency considerations would also apply inthe deciding whether or not to impose cross-equation restric-tions in estimating simultaneous-equations models If we areright about the structure there is an ef ciency gain fromimposing the restrictions If we are wrong imposing them maylead to bias When there are overidentifying restrictions it isat least possible to develop tests of the model speci cation thatcan serve as a guide in choosing a structural model that is sup-ported by the data This article presents a dichotomy betweenthe two-part model and a fully parametric traditional selec-tion speci cation but there are many other modeling choicesranging from fully nonparametric to semiparametric to fullyparametric In modest size samples ef ciency is an impor-tant consideration and a parametric model may be the mostappropriate one For very large samples fewer parametricapproaches are feasible Both the probability-of-working andthe hours-worked equations could for example be estimatedby a semiparametric method such as semiparametric leastsquares ( Ichimura 1993) Which estimation method is mostappropriate ultimately depends on how much data is avail-able and how many parameters are to be estimated It doesnot make sense to criticize the use of structured approaches infavor of more exible modeling approaches without regard tothe context in which they are being applied

3 ON THE GAP BETWEEN THEORY ANDEMPIRICAL PRACTICE

Finally another recurring emphasis in the article is on thevalue of shortcuts in empirical practice In the authorrsquos viewthere is a gap between the econometric theory and what isfeasible in practice and shortcuts and approximations help tobridge this gap To this end the article extols for examplethe virtues of ordinary least squares (OLS) as a way of approx-imating any unknown conditional expectation function with-out much concern as to whether a best linear approximation isa good approximation If the model has discrete regressors andis fully saturated it is well known that OLS is equivalent to anonparametric estimator But in other cases a linear approxi-mation could be highly biased and the bias will not go awayas the sample size gets large There are a variety of alterna-tive asymptotically unbiased estimators Along similar lines

Angrist Reply 27

the article suggests that in implementing a ldquoselection-on-observablesrdquo estimator it may be all right simply to ignore theweighting scheme Such claims seem to me fairly outrageousbecause failure to take into account weighting could lead tovery different estimates and would not provide a consistentestimator of the parameter of interest Heckman Ichimura andTodd (1998) discussed alternative estimators that are not dif -cult to implement and take into account the weighting scheme

Finally the appendix presents a variety of shortcuts forobtaining estimates and standard errorsmdashsome of which leadto inconsistent estimates When shortcuts lead to very crudeapproximations or inconsistent estimates I question their use-fulness I do think it unreasonable to expect that estimatesbased on simple IV estimators be accompanied by correctstandard errors that correctly take into account the in uence ofestimated parameters of estimated regressors I see little valuein reporting incorrect standard-errors estimates on the groundsthat it saves time in computation especially when there areseveral works long available in the literature that develop sim-ple ways of solving these kinds of problems (eg see Newey1984) Although there is surely a gap between the estimatorsdeveloped in the theoretic econometric literature and the esti-mators used in the empirical literature the gap is not as greatas this article supposes

ADDITIONAL REFERENCES

Cox D R (1958) The Planning of Experiments New York WileyFisher R A (1935) Design of Experiments London Oliver and BoydHeckman J J (1997) ldquo Instrumental Variables Evidence From Evaluating a

Job Training Programmerdquo Journal of Human Resources 32 441ndash462Heckman J Ichimura H and Todd P (1997) ldquoMatching as an Econometric

Evaluation Estimator Theory and Evidence on its Performance Appliedto the JTPA Program Part I Theory and Methodsrdquo Review of EconomicStudies 64 605ndash654

mdashmdashmdash (1998) ldquoMatching as an Econometric Evaluationrdquo Review of Eco-nomic Studies 65 261ndash294

Heckman J and Vytlacil E (2000) ldquoCausal Parameters Structural Equa-tions Treatment Effects and Randomized Evaluations of Social Programsrdquounpublished manuscript University of Chicago Dept of Economics

mdashmdashmdash (in press) ldquoLocal Instrumental Variablesrdquo in Nonlinear StatisticalInference Essays in Honor of Takeshi Amemiya eds C Hsiao KMorimune and J Powell Cambridge UK Cambridge University Press

Ichimura H (1993) ldquoSemiparametric Least Squares (SLS) and Weighted SLSEstimation of Single Index Modelsrdquo Journal of Econometrics 58 71ndash120

Neyman J (1923) ldquoStatistical Problems in Agricultural Experimentsrdquo Sup-plement to the Journal of the Royal Statistical Society 2 107ndash180

Newey W (1984) ldquoA Method of Moments Interpretation of Sequential Esti-matorsrdquo Economic Letters 14 201ndash206

Roy A (1951) ldquoSome Thoughts on the Distribution of Earningsrdquo OxfordEconomic Papers 3 135ndash146

Rubin D (1978) ldquoBaysian Inference for Causal Effects The Role of Ran-domizationrdquo The Annals of Statistics 6 34ndash58

ReplyJoshua D Angrist

I thank the discussants and the session organizer outgoingJBES editor Jeff Wooldridge who will be missed by associateeditors and JBES authors alike The comments in this sessionoffer a variety of viewpoints some sympathetic others morecritical I am pleased to be part of this stimulating and con-structive exchange and look forward to more interaction ofthis type in the future

Hahn My article starts by suggesting that a focus ontreatment effects simpli es the problem of causal inferencein limited dependent variable (LDV) models with endoge-nous regressors Endogeneityrsquos half-sister is the assumptionof unobserved individual effects since the omitted-variablesmotivation for xed effects in panel data is similar Hahnasks whether a shift in focus from index parameters to aver-age causal effects might also simplify the notoriously dif -cult problem of working with nonlinear models for binarypanel data (eg as in Card and Sullivan 1988) Hahn showsthat here too the causal-effects framework pays off demon-strating that average causal effects are easily estimated in a xed-effects probit model with a binary regressor even thoughthe probit coef cient is not A second noteworthy featureof Hahnrsquos comment is his observation that identi cation inthis framework turns on careful modeling of the relationshipbetween the assignment variable and the xed effects Tradi-tional econometric models are primarily concerned with thestochastic process generating outcomes A shift of attentiontoward modeling the assignment mechanism is consistent with

the notion that observational studies should try to mimic thesort of controlled comparisons generated by experiments

Imbens Imbens begins by endorsing the view that the sub-stantive goals of empirical research are better served by thepotential outcomescausal framework than by structural mod-eling Going one step further Imbens notes that the problemswith ldquomodel- rst econometricsrdquo are not limited to LDVrsquos Inthe same spirit he sensibly wonders (as did Wooldridge 1992)why so much attention has been devoted to estimating theBoxndashCox transformation model which does not actually iden-tify E6Y mdashX7 without a distributional assumption Imbens alsonicely articulates the rationale for looking at the twins exam-ple and notes that this is similar to the rationale for extrap-olating from clinical trials in narrow subpopulations And infact I believe the ldquotwins experimentrdquo is likely to have pre-dictive value in a variety of situations though any extrapo-lation will probably be improved by control for demographiccharacteristics

In spite of (or perhaps because of) our years of discussionand collaboration Imbens and I still do not agree on certainthings He feels strongly that inference with discrete outcomesshould use models and procedures that respect inherent non-

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

28 Journal of Business amp Economic Statistics January 2001

linearities The HIRZ procedure discussed in his comment isan intriguing way to do this and my article discusses othersI remain unconvinced however that nonlinearity is of majorsubstantive importance in empirical work with binary regres-sors The proof of the pudding I suppose would be to showthat H IRZ or some of the methods I discuss in the article leadto better predictions or decisions than the linear workhorse

Mof tt Mof t provides a valuable counterpointHe beginsby arguing that latent-index models have more predictivepower than reduced-form causal models Such claims seemhard to establish a priori though I do not doubt that in somecases this may be true But empirical strategies in which thebulk of the research effort goes into estimating index coef -cients may fail to answer the most basic question in empiricalwork What happened here Causal models and local aver-age treatment effect (LATE) answer this question while indexcoef cients alone are not suf cient for causal inference Iagree however that structural modeling can be useful forgoing beyond basic questions and I have used index and otherstructural models for this purpose Mof t is right howeverin suggesting that I nd the model-free analysis he refers toas ldquoexploratoryrdquo more interesting than structural extrapolationThis seems to me to be where the potential for real discoverylies

Mof tt asks tough but fair questions about the twins appli-cation He is correct to point out that twinning is not a manip-ulable policy instrument That is one reason I do not limit theanalysis to reduced-form effects of multiple births Yet to mymind the best evidence we have on the controversial questionof the consequences of out-of-wedlock childbearing comesfrom the twins experiment (Bronars and Grogger 1994) Andthe economic consequences of childbearing are of interest inmany other contexts As noted in the introduction to my arti-cle the twins experiment can help determine whether fertilityreductions are responsible for trend increases in female laborsupply Beyond this sort of question the twins results seemuseful for assessing the economic consequences of increasedabortion access in the United States the dissemination of con-traceptive implants in developing countries (eg see KaneFarr and Janowitz 1990) and various programs to encour-age or require implant use by welfare mothers (Haveman andWolfe 1993)

Finally it is fair to question the external validity of twinsestimates as Mof t does toward the end of his commentAngrist and Evans (1998) showed that the labor-supply con-sequences of twinning can be reconciled with estimates ofthe labor-supply consequences of childbearing resulting froma very different natural experiment induced by preferences formixed-sex sibling pairs This suggests that results from bothexperiments may be quite robust and may even be ldquostructuralrdquoin the sense that they tell us something general about the eco-nomic response to childbearing

Mullahy Mullahy brings a welcome health economistrsquosperspective to the discussion Health economists have longgrappled with the issues discussed in my article and asI hope I made clear I drew ideas and inspiration fromMullahyrsquos work on models with nonnegative dependent vari-ables Mullahyrsquos comment further highlights exactly the sort

of utilitarian trade-off that underlies the choice of empiricalstrategy On one hand we would like our models to be con-sistent with any functional form restrictions known to be trueOn the other attempts at this may complicate estimation andinference for little payoff Worst of all the technical complex-ity of nonlinear models seems to cause authors to ldquowax econo-metricrdquo an effort that may come at the expense of attention tosubstantive issues of importance (an example here is the longseries of papers discussing exactly how data from the RANDhealth-insurance experiment should be analyzed) Mullahyrsquosheretical discussions of ldquoone-part modelsrdquo here and in earlierwork represent a refreshing effort to keep the empirical trainon track Mullahyrsquos point that application-specic distributionordinates can provide more useful summary information thanquantiles is also clearly worth thinking about

Todd Todd provides a useful and clear overview of thecausal framework which she suggests is better suited formedical questions than economic questions The thrust of herargument is that there is too much heterogeneity in economicinterventions and outcomes for the causal approach to bevery useful for economists But my perception is that medicalresearch is even more explicitly concerned with heterogene-ity than economics This is evidenced by the many quali ca-tions regarding patient characteristics conditions and inter-actions in medical treatment protocols A difference betweenthe prevailing medical research paradigm and the traditionaleconometric perspective however is whether the question ofheterogeneous response is to be addressed by better theoryor more evidence Of course theory helps us decide whereto look for heterogeneity but as Rosenbaum (1999 p 301)put it ldquo If a treatment has substantially different effects in dif-ferent types of people then we need to discover this (ital-ics mine) Rosenbaum also noted that ldquorandomized clinicaltrials routinely look for heterogeneous treatment effects andthey do this without representative samples from nationalpopulationsrdquo

On the technical side Todd suggests I advocated use ofthe two-part model because it exibly approximates nonlinearfunctional forms for nonnegative outcomes Although I notedthe exibility the upshot of my discussion was that the two-part model is poorly suited to causal inference because of thedif culty of interpreting part 2 Finally Todd points out thefailure to allow for sampling variance in the scaling factorsused to generate effects for nonlinear models This is indeed ahard-to-defend shortcut though I shall take a stab at defenseby noting that randomness of the scaling factor which typi-cally estimates a sample average is likely to be a minor sourceof sampling uncertainty

ADDITIONAL REFERENCES

Card D E and Sullivan D (1988) ldquoEffects of Subsidized Training onMovements In and Out of Employmentrdquo Econometrica 56 497ndash530

Haveman R and Wolfe B (1993) ldquoChildrenrsquos Prospects and ChildrenrsquosPolicyrdquo Journal of Economic Perspectives 7 153ndash174

Kane T Farr G and Janowitz B (1990) ldquo Initial Acceptability of Con-traceptive Implants in Four Developing Countriesrdquo International FamilyPlanning Perspectives 16 49ndash54

Rosenbaum P R (1999) ldquoChoice as an Alternative to Control in Observa-tional Studiesrdquo Statistical Science 14 259ndash304

Wooldridge J (1992) ldquoSome Alternatives to the BoxndashCox RegressionModelrdquo International Economic Review 33 935ndash955

Page 17: New Estimation of Limited Dependent Variable Models With Dummy …piketty.pse.ens.fr/…ecoineg/articl/Angrist2001b.pdf · 2011. 12. 5. · sequences of changes in contraceptive technology

18 Journal of Business amp Economic Statistics January 2001

econometrics is to provide predictions More speci cally inmy view the goal is to provide predictions of policy interven-tions Using both economic theory and data economists wishto inform policy discussions by providing predictions of statesof the world under different policy choices Based on com-parisons of such predictions policy makers can then chooseamong the different policies using some social welfare mea-sure as objective function (eg Heckman and Smith 1997)Angrist argues that such questions are most easily formulatedin terms of potential outcomes Here I want to elaborate onthat view

Consider as an example the problem faced by a policymaker contemplating a new tax in a market To evaluatethis policy the policy maker wishes to take into account theeffect of the tax on the quantity traded Economic theory sug-gests that this effect depends on the slope of the supply anddemand functions The rst step is therefore the estimation ofthese slopes and in the remainder of this discussion I shallfocus on this component of the policy-evaluation problem Inprinciple the policy maker may be interested in the entire dis-tribution of the quantity traded under various taxes Let usassume however that for purposes of evaluation of the poli-cies it is suf cient to know the average effect of the policyon the quantity traded If there are only two values for thepolicymdashfor example no tax or a taxmdashthe difference betweenthese two averages is the key quantity of interest FollowingRubin (1974) I will refer to this as the estimand

Note that the choice of estimand is distinct from the sta-tistical question of the speci cation of the model Often thestatistical model is speci ed in such a way that a single param-eter corresponds to the estimand For example in a structuralinterpretation of the linear regression model the coef cientscorrespond to the effect of changing the covariates by a singleunit Such one-to-one correspondence however is the excep-tion rather than the rule Wooldridge (1992) made this point inthe context of BoxndashCox regression models Such models areoften used when a linear representation for E6Y mdashX7 is inappro-priate The BoxndashCox regression model generalizes this linearform to E6Y 4lsaquo5mdashX7 D X 0sbquo where

Y 4lsaquo5 D(

4Y lsaquo ƒ 15=lsaquo lsaquo 6D 01

ln Y lsaquo D 00

Although consistent estimators for sbquo exist under these assump-tions Wooldridge stressed that because (a) the interpretationof sbquo changes with the value of lsaquo and (b) knowledge of sbquo andlsaquo is not suf cient for recovering E6Y mdashX7 there is no reasonfor economists to be interested in estimates of sbquo under theseassumptions In other words sbquo cannot be the sole focus ofthe researcher because the question it answers changes withthe value of nuisance parameters Wooldridge then suggestedan alternative speci cation that always allows the researcherto recover the conditional expectation E6Y mdashX7

In empirical work this distinction between the estimandand the parameters of the statistical model is consistent withthe now common practice of reporting estimates of averagederivatives in binary response models rather than reportingestimates of the logit or probit coef cients Unlike a linear

regression model there is no direct link from one of the coef- cients in the logit or probit model to average causal effectsand thus there is no intrinsic interest in such coef cients

This view is at odds however with a large part of the semi-parametric literature An exception is the work by Stoker (egStoker 1986) who focused on estimation of index coef cientsin settings where these are proportional to average derivativesand thus directly linked to changes in predictions Considerfor example the work on semiparametric estimation of binaryresponse models In this literature such models are esti-mated without making logistic or probit assumptions insteadonly making conditional mean or median assumptions in alatent index interpretation (eg Manski 1985) This literaturehowever has begged the question of why economists shouldbe interested in the coef cient estimates in these models inthe absence of a direct link between these coef cients andthe choice probabilities or their derivatives Similarly someof the models with xed effects in panel data with limiteddependent variables have focused on estimation of parametersthat in themselves do not allow for estimation of conditionalexpectations or their derivatives and thus do not allow for esti-mation of causal effects See Arellano and Honoreacute (in press)for a survey of many of these methods

2 IDENTIFICATION

After deciding on the estimand the next step is to makesubstantive assumptions on the process that generated the dataThis is where economic as opposed to statistical theory playsa key role Theoretical considerations may suggest that certainvariables have no direct causal effect on others because theydo not enter into agentsrsquo utility function nor do they affectthe constraints these agents face For example in some mar-kets it may be reasonable to postulate the existence of demandand supply function and assume that their intersection deter-mines observed prices and quantities In that case it may beargued that certain variablesmdashfor example weather conditionsin agricultural marketsmdashaffect supply at xed prices but notdemand because weather conditions do not affect utility ofthe buyers nor do they constrain their choices given pricesSimilarly theoretical considerations may suggest which vari-ables determine agentsrsquo fertility choices and which variablesare excluded from such choices as in the structural modelsdescribed in Section 12 of Angrist

For the purpose of considering such exclusion restrictionsas well as other assumptions it is important to formulate themin a way that economic theory can be brought to bear on themThis makes the formulation in terms of counterfactuals orpotential outcomes that Angrist advocates particularly appro-priate The potential outcomes describe outcomes in differ-ent environments and as such are the primitives of economicanalyses as well as choices under different sets of constraintswhich are the result of agents solving constrained optimizationproblems Since economic theory studies such optimizationproblems it is therefore well equipped to assess assumptionsformulated directly in terms of these potential outcomes Anexample of the formulation of the critical assumptions in termsof such potential outcomes is Angrist Imbens and Rubin(1996 AIR from here on) In contrast latent index modelsalthough under some conditions mathematically equivalent to

Imbens Comment 19

the potential outcome framework (eg Vytlacil 1999) formu-late the critical assumptions in terms of associations betweenobserved variables and unobserved residuals which appearsmore dif cult to contemplate [see Imbens (1997) for a dis-cussion of the confusion such formulations have caused in thestatistics literature]

It is rare that economic theory is speci c enough to deter-mine the exact value of the estimand More typical is thatthe theory is consistent with a range of values for the esti-mand Observations on agentsrsquo choices and outcomes may behelpful in narrowing down this range The econometricianrsquostask is to link the data to the estimand Typically a num-ber of additional assumptions are made at this stage Almostalways it is assumed that there is only limited dependenceor no dependence at all between choices made by differentagents and identi cation focuses on the link between the jointdistribution of the observables estimable in large samplesand the estimand Two possibilities arise at this stage Some-times the estimand can be expressed as a functional of thejoint distribution of the observables in which case the esti-mand is identi ed A leading example is where the estimandis the average treatment effect and theory suggests that assign-ment to treatment is random or at least random conditionalon a set of observed covariates (unconfounded assignmentselection on observables) Alternatively the assumptions sug-gested by economic theory do not allow for the direct linkbetween the distribution of observables and the estimand Inthat case the researcher faces some choices One option advo-cated in a series of papers by Manski (see for a general dis-cussion Manski 1995) is to estimate the range of values ofthe estimand consistent with the data given the substantiveassumptions Another option followed in the current article byAngrist is the local average-treatment-effect approach devel-oped by Imbens and Angrist (1994) to consider what aspectsof the estimand are identi ed given data and assumptions Ininstrumental-variables settings the population average treat-ment effect is often not identi ed but the average effect for aspeci c subpopulation may be In that case one may chooseto estimate the average treatment effect for this subpopula-tion and leave the extrapolation to the principal estimand tothe researcher possibly aided by theoretical considerations AsHeckman wrote ldquo It is a great virtue of the LATE parameterthat it makes the investigator stick to the data at hand and sep-arate out the aspects of an estimation that require out of sam-ple extrapolation or theorizing from aspects of an estimationthat are based on observable datardquo (Heckman 1999 p 832)

Let us consider the case studied by Angrist with its focuson the effect of having more than two children on labor sup-ply Angrist argues that the second birth being a multiple birth(eg twins) is a valid instrument for this effect In terms ofthe AIR formulation this requires a multiple birth to be asgood as randomly assigned and the absence of a systematicdirect effect on labor supply other than through its effect onthe number of children Such assumptions may be contro-versial For example fertility treatments may lead to a sys-tematic association between multiple births and choices madeby couples violating the rst assumption Even if we acceptthese assumptions however they only imply that the aver-age causal effect of more kids on labor supply is identi ed

for women who had a third child solely because their sec-ond birth was a multiple birth (compliers in the AIR termi-nology) In my view it is unlikely that this is the populationof primary interest Nevertheless it is the only subpopulationthe data are informative about in the sense of point identi ca-tion under the substantive assumptions and it would appear tooffer some guidance regarding the population average causaleffect to policy makers similar to the way in the medical worldresults from clinical trials in homogenous subpopulations areregarded as useful because they are viewed as indicative ofpopulation average causal effects

3 LIMITED DEPENDENT VARIABLES

Typically economic theory offers some guidance concern-ing the determinants of certain outcomes without specifyingthe exact form or strength of their relationship In that casestatistical modeling is required to complete the speci cationConsider the example Angrist studies with binary outcomebinary endogenous regressor a binary instrument and covari-ates Angrist suggests as one possible approach estimating theaverage treatment effect through a linear probability modelwith instrumenting for an endogenous regressor The bene tsof the linear probability approach stemming from the linear-ity and robustness against misspeci cation of the rst stageappear to me largely illusory At this point the statistical mod-eling is only intended to provide exible approximations to theunderlying conditional distributions This is a fundamentallydifferent role from that played by the substantive assumptionsthat are essential for identi cation Appeals to consistencyunder speci c parameterizations therefore appear irrelevantmdashin a larger sample one may well wish to use a more exiblespeci cation because less smoothing is required In additionto nding the alleged bene ts of the linear probability modelunpersuasive I nd its disadvantages troubling Within smallsubpopulations characterized by extreme values of the covari-ates the smoothing implicit in linear probability models islikely to lead to unattractive predictions compared to pre-dictions based on nonlinear models that respect the limited-dependent-variable nature of the outcomes

An alternative approach is followed in the study of the effectof u shots on hospitalization rates using randomized incen-tives for vaccination by Hirano Imbens Rubin and Zhou(2000 HIRZ from here on) Given their assumptions exten-sions of those made by AIR to the case with exogenous covari-ates there are three subpopulationsmdashcompliers (units whochange treatment status in response to a change in the valueof the instrument) always-takers (who always take the treat-ment irrespective of the value of the instrument) and never-takers (who never take the treatment irrespective of the valueof the instrument) H IRZ modeled the conditional distributionof these three ldquotypesrdquo conditional on covariates as a trinomialdistribution

Pr4TypeiD cmdashXi

D x5 D exp4x0ndashc5

1C exp4x0ndashc5C exp4x0ndasha51

Pr4TypeiD amdashXi

D x5 D exp4x0ndasha5

1C exp4x0ndashc5 C exp4x0ndasha51

20 Journal of Business amp Economic Statistics January 2001

and

Pr4TypeiD nmdashXi

D x5

D 1 ƒ Pr4TypeiD cmdashXi

D x5 ƒ Pr4TypeiD amdashXi

D x50

Now compare this setup to the selection models Angristdescribes in Section 3 In the selection models the equationdescribing the endogenous regressor is Di

D 18ƒ0C ƒ1Zi

Cƒ 0

2Xi gt Daggeri9 Suppose that the instrument is binary and that ƒ1 ispositive Then the two models are very similar with units withƒ0 Cƒ1 Cƒ 0

2Xi gt Daggeri in the selection model classi ed as always-takers in the potential outcome framework (because irrespec-tive of the value of the instrument Di

D 1 for such units) unitswith ƒ0

C ƒ 02Xi lt Daggeri classi ed as never-takers (because irre-

spective of the value of the instrument DiD 0 for such units)

and the units with ƒ0C ƒ 0

2Xi lt Daggeri lt ƒ0C ƒ1

C ƒ 02Xi classi ed

as compliersOne advantage of the trinomial model is that it easily gen-

eralizes to provide an arbitrarily good t to any conditionaltrinomial distribution by including higher-order terms andinteractions in the covariates If there are no substantive rea-sons to impose additional restrictions one should not imposethem implicitly in the speci cation of the statistical modelIn particular in the selection model it is not suf cient to addhigher-order terms to the covariate vector to provide an arbi-trarily good t to the trinomial distribution Such an approxi-mation would have to involve heteroscedasticity and other dis-tributional extensions that are not straightforward to implementin the selection model

Conditional on the individualrsquos type H IRZ speci ed theoutcome distributions given covariates as logistic regressionmodels Again the aim is to provide a exible approxima-tion to the conditional distribution in a manner that does notimpose any implicit restrictions Given that for a binomialdistribution the logistic regression model can be thought ofas providing a linear approximation to the log odds ratiothis choice is again an appealing one An alternative is theprobit model which also provides a good approximation Less

attractive here is the linear probability model since it requiresinequality restrictions on the parameters if the implicit esti-mates of the probabilities are to be bounded between 0 and 1

In cases with other limited dependent variables alternativenonlinear models may be appropriate For example if the out-comes are durations subject to censoring models speci ed interms of hazard functions (eg Lancaster 1979) may be con-venient for dealing with such data

ADDITIONAL REFERENCES

Angrist J (1989) ldquoLifetime Earnings and the Vietnam Era Draft LotteryEvidence From Social Security Administrative Recordsrdquo American Eco-nomic Review 80 313ndash335

Angrist J and Krueger A (1991) ldquoDoes Compulsory School AttendanceAffect Schooling and Earningsrdquo Quarterly Journal of Ecomonics 106979ndash1014

Arellano M and Honoreacute B (in press) ldquoPanel Datardquo in Handbook of Econo-metrics eds J Heckman and E Leamer Amsterdam Elsevier North-Holland

Goldberger A (1991) A Course in Econometrics Cambridge MACambridge University Press

Heckman J (1999) ldquo Instrumental Variables Response to Angrist andImbensrdquo Journal of Human Resources 34 828ndash837

Heckman J and Smith J (1997) ldquoEvaluating the Welfare Staterdquo in Econo-metrics and Economics in the 20th Century The Ragnar Frisch Centenaryed S Strom New York Cambridge University Press pp 214ndash318

Hirano K Imbens G Rubin D and Zhou A (2000) ldquoEstimating theEffect of Flu Shots in a Randomized Encouragement Designrdquo Biostatistics1 69ndash88

Imbens G (1997) Book Review of The Foundations of Econometric Analy-sis by D Hendry and M Morgan Journal of Applied Econometrtics 1291ndash94

Lancaster T (1979) ldquoEconometric Methods for the Analysis of DurationDatardquo Econometrica

Manski C (1985) ldquoSemiparametric Analysis of Discrete Responserdquo Journalof Econometrics 27 313ndash333

mdashmdashmdash (1995) Identi cation Problems in the Social Sciences CambridgeMA Harvard University Press

Marshak J (1953) ldquoEconomic Measurements for Policy and Predictionrdquoin Studies in Econometric Method eds W Hood and T KoopmansNew York Wiley pp 1ndash26

Stoker T (1986) ldquoConsistent Estimation of Scaled Coef cientsrdquo Economet-rica 54 1461ndash1481

Vytlacil E (1999) ldquo Independence Monotonicity and Latent Index ModelsAn Equivalency Resultrdquo unpublished manuscript University of ChicagoDept of Economics

Wooldridge J (1992) ldquoSome Alternatives to the BoxndashCox RegressionModelrdquo International Economic Review 33 935ndash955

CommentRobert A Mof tt

Department of Economics Johns Hopkins University Baltimore MD 21218 and National Bureau of EconomicResearch Cambridge MA (mofttjhuedu)

The Problem Although the article by Angrist rangesacross a number of issues much of the discussion and thearticle title suggests that the problem of concern is that instru-mental variables ( IV) cannot be used in one of three commonmodels Let the rst model be y D Csbquod CxbdquoChellip where y isan absolutely continuous variable but d is binary and wherex is independent of hellip but d is not Then sbquo can be consistentlyestimated with IV (Heckman and Robb 1985) Let the secondmodel be y uuml D C sbquod uuml C xbdquo C hellip where y uuml and d uuml are contin-

uous and where y D 14y uuml gt 05 and d D d uuml are the observedvariables The parameters of this model can likewise be esti-mated by IV with some auxiliary assumptions (Newey 1986see Blundell and Smith 1993 for a review of alternative meth-ods) But let the third model be y uuml D C sbquod C xbdquo C hellip where

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

Moftt Comment 21

again y D 14y uuml gt 05 but now d is binary The parameters ofthis model cannot be estimated by standard IV Many of theexamples in the Angrist article involve the censored regres-sion model rather than the binary choice model but this andthe other main points in the article apply to both

The Angrist Solution The Angrist solution to the problemposed by the third of these models where IV is not applica-ble is to declare sbquo an uninteresting parameter and not worthyof estimation [or in his words to ldquopuntrdquo (p 8)] This is remi-niscent of the solution to the Vietnam War suggested by some1960s commentators The United States should have simplydeclared victory quickly withdrawn and hoped that no onenoticed that they had in fact lost

If sbquo is uninteresting what is interesting according toAngrist His answer is that interest should center on leastsquares approximations with y as the dependent variable andd and x as regressors If endogeneity of d is a problem linearIV should be applied with linear IV given a LATE interpre-tation He provides an empirical illustration

I will comment on three questions raised by the Angristarticle (1) Is sbquo an interesting parameter (2) Is the Linear IVestimator using observed y and d which Imbens and Angrist(1994) relabeled the LATE an interesting quantity (3) Is theillustrative application given in the Angrist article interesting

Is sbquo an Interesting Parameter As Angrist notes manyof the issues raised by his solution have nothing to do withthe endogeneity of d Whether sbquo is an interesting parame-ter is one of these at least in part Given Angristrsquos prefer-ence for linear projections his position necessarily implies thatall nonlinear models of which the latent index model (L IM)is just one example are uninteresting (Angrist is not com-pletely consistent in his position because he does propose anonlinear model with an exponential conditional mean func-tion in one section of his article that model is subject to thesame objections that he subjects the LIM tomdasharbitrary non-linearities etc) Thus his position boils down to a preferencein the binary dependent-variable case for the linear proba-bility model (LPM) over the LIM and other nonlinear mod-els This issue has been discussed for many years even in asimultaneous-equations context [eg Heckman and MaCurdy(1985) which Angrist does not reference]

The dif culty with Angristrsquos polar position on this issuemdashthat the LPM is the only model of interestmdashis that it is funda-mentally untenable The LPM is attractive because it is easyto interpret providing parameter estimates that do not requiretransformation to learn the effects of a regressor on the meanof the dependent variable It has a role to play in empiricalwork in summarizing the data as regards the conditional meanfunction and for initial explorations of the data and virtu-ally all practitioners use it for this purpose But beyond thisit has no defense Either it is equivalent to the latent-variablemodel if the model is saturated or if the model is not satu-rated imposes functional form restrictions that may not holdand may t the data worse than the latent-variable model

Thus if the model is y D 14 C sbquod C hellip gt 05 where d is asingle dummy variable regressor independent of hellip the probit(say) estimates of and sbquo map one-to-one onto the leastsquares intercept and coef cient on d in the LPM and hencethere is no gain to estimating the model either way because the

estimators are equivalent The same point extends to the caseof a multinomial d If the model is y D 14 C sbquod C xbdquo C hellip gt

05 where x and bdquo are vectors of exogenous covariates andcoef cients respectively the LPM estimate of the effect ofd on E4y5 conditional on x may be demonstrably worsethan the probit estimator of the same effect even for theobject of estimating E4ymdashd1 x5 at points 8d1 x9 observed inthe data (Angrist falls back on the weak principle that leastsquares estimates provide best linear approximations but theissue is that the true model might be nonlinear) Nothingin Angristrsquos article provides evidence that the probitor anyother latent index formulation provides an inferior estimate of6E4ymdashd11 x5 ƒ E4ymdashd01 x57 compared to the LPM for the twoestimators smooth the joint response function with respect tod and x in different ways

Although the latent-variable model has no necessary claimfor superiority in the class of all nonlinear models its pop-ularity rests on its ability to generate a wide variety of non-linearities with a relatively parsimonious speci cation arisingfrom the convolution of the latent index with the cdf of hellipExpansion of the LPM to incorporate equivalent nonlinearitiesis cumbersome and inef cient

The possible inferiority of the LPM in capturing nonlinear-ities in nonsaturated models is also important for interpolationand extrapolation to points not in the observed data Gettingthe nonlinearities right in the observed data is important ininterpolation and extrapolation if the true model is nonlinearas the binary choice model necessarily is (since y is boundedby 0 and 1) Although Angrist at one point does mentionpredictionmdashstating that the linearized models he proposes area good ldquojumping-off point for any prediction exerciserdquo a state-ment without supportmdashhe is for the most part not interestedin prediction so much as summarizing the observed data Thisis ne but unfortunately prediction outside the observed his-torical data is the pervasive concern of the policy makers whoare the ultimate consumers of applied economic research forthey must make predictions of new policies in new environ-ments on a daily basis as part of their jobs The methods pro-posed in Angristrsquos article are therefore not very useful for theformulation of new policy

It should be noted that some make the argument that theoryand economic models must play a role in guiding the formu-lation of empirical relationships used for prediction outsidethe range of the observed data That position is not taken herebecause it is not necessary to establish the potential superior-ity of the L IM over the LPM in nonsaturated models and forprediction Occamrsquos razor makes it unnecessary

It should also be noted that these issues have nothing todo with causal effects because d is assumed exogenous Thusthe opposition that Angrist poses between L IMrsquos and ldquocausaleffectsrdquo models is a false one at least in general If a ldquocausaleffectrdquo is de ned as the true effect of a variable d on onlythe rst moment of a variable y (a rather restricted de nitionof causal effect) that causal effect can be derived from thelatent-variable model just as well The issue is instead merelynonlinearity of the function E4ymdashd1 x5

Angrist also states that the L IM requires ldquoconstant coef- cientrdquo and ldquodistributional assumptionsrdquo for identi cation

22 Journal of Business amp Economic Statistics January 2001

unlike the LPM Angrist does not make suf cient quali ca-tions to these assertions and they are indeed false for theL IM A random coef cient sbquo in the LIM is permissible with-out distributional assumption ( Ichimura and Thompson 1998)and the coef cients in the standard constant-coef cient L IMare identi ed semiparametrically (ie under unknown distri-bution of hellip) under the restriction that d and hellip are independentan assumption generally made for causal interpretation of theLPM model as well (Manski 1988 Horowitz 1993 Powell1994) In addition as already noted in a fully saturated modelthe LPM and the L IM are equivalent in any case

Is the LATE an Interesting Statistic The Angrist assertionthat LATE is the only interesting statistic if d is endogenousor perhaps the only statistic worth bothering about because itis the only one identi ed by the data has the same partialvalidity as his preference for the LPM over the LIM IV isone of the most popular techniques in applied econometricsand has a natural intuition indeed one that does not requirethe LATE interpretation per se It is one of the most usefultools in the applied economistrsquos kit The simple IV estimatordiscussed by Angrist makes minimal assumptions and givesminimal information back to the analyst as a result But tosay that it produces the only statistic of interest does not havedefense both because it is equivalent to an LIM that is fullysaturated and may t the data worse than an LIM if not satu-rated and because it implies that there is no value to makingadditional assumptions to obtain additional information

The limiting nature of the LATE statistic is again in itsinattention to nonlinearities interpolation and extrapolationThat nonlinearities can be important is as true in this caseas it was in the exogenous case just discussed when thereare additional x covariates and when the model is not satu-rated As for interpolation and extrapolation the LATE statis-tic 6 Ny4z D z15ƒ Ny4z D z057=6 Nd4z D z15ƒ Nd4z D z057 denotes theeffect on a change in z from value z0 to z1 on E4y5 scaledby the change in the E4d5 It does not have implications forthe effect on E4y5 of any other change in z or for a changein any other policy variable Thus it is not particularly usefulfor policy changes other than a change of z0 to z1 in the sameenvironment (ie conditioned on the same x) This stands incontrast to an L IM for E4dmdashx1 z5mdashor any parametric modelfor d for that mattermdashthat allows for the change from z0 toz1 to inform policy makers and others of the likely changesof other values of z and of other variables x In fact theLATE statistic which is not a parameter in the usual senseof the word can always be expressed as a nonlinear combi-nation of parameters of a latent-variable model but not viceversa hence the latter model is more general than the former[See Heckman and Vytlacil (1999) for a discussion of how toexpress the LATE in terms of an L IM see also Angrist andImbens (1999) and Heckman (1997 1999) for a discussion ofsome of these issues]

In addition Angristrsquos focus on the advantages of the LATEfor model-free estimation of the effect of a change in z fromz0 to z1 on y reveals a philosophical weakness in his posi-tion This is because the effect of a change in z from z0 toz1 on y can be estimated from the reduced form the struc-tural form is not needed It is not explained why the statistic

6E4ymdashz D z15 ƒ E4ymdashz D z057 is not the only quantity of inter-est The answer that most economists would give is to say thatstructural coef cients are of interest because they can be usedto extrapolate the effects of changes in d on y to other changesin z than from z0 to z1 in the same environment indeedthis is the basic rationale for structural estimation given inthe famous essay by Marschak [1953 see also Christ (1994)and Heckman (2000) for discussions Marschakrsquos argumentwas more subtle than this arguing that restrictions on struc-tural parameters are needed for out-of-sample prediction butthis translates into restrictions on the reduced form as well]However by ruling out the possibility of learning about anyof the effects of changes in z or d other than those inducedby a change in z from z0 to z1 Angrist removes the need todo structural estimation in the rst place Angrist implicitlyassumes that a structural coef cient of interest exists on thevariable d in the y equation and that that coef cient has mean-ing independent of a particular change in z but the rest of hisdiscussion contravenes that interest

A reduced-form research program which is where theAngrist position leads is of considerable value There is noth-ing wrong with a research program to collect a large body ofinformation on the effects of a wide variety of changes in dif-ferent policy variables 4z5 from particular values (z0) to otherparticular values (z1) each taking place in particular environ-ments (x) But if the research program stops there very littleuseful has been learned other than a collection of facts aboutthe effects of particular policies in particular environments

Is the Application in the Angrist Article Interesting TheAngrist article uses a recently popular exclusion restriction ornatural experiment to identify the effect of fertility on laborsupply and to illustrate his preferred methodsmdashnamely theuse of twins The use of twins as an exclusion restrictionstands in contrast to variations in a government policy or lawwhich are often used as exclusion restrictions both in recentwork on natural experiments as well as a much older litera-ture that uses cross-sectional and overtime variation in state-or country-speci c taxes and transfer rules to identify modelparameters However although it can be of interest to estimatereduced-form policy effects 6E4ymdashz D z15 ƒ E4ymdashz D z057 asjust discussed it is not so obvious that there is any interest inestimating the effects of having twins on the expected valueof y Creating twins is not a variable directly subject to policymanipulation

The dif culty in the use of twins arises for the same reasonalready discussedmdashnamely that the preference for model-freeestimation of policy effects leads to a lack of interest in thefunction E4dmdashz1 x5 and hence to a lack of interest in what canbe learned from a study of twins about the effects of someother more relevant policy variable that might be manipulatedA model for the E4dmdashz1 x5 is needed to make that connectionand hence to make a study of twins of any interest yet that isintentionally eschewed in the approach proposed by AngristOnce again this leads to an uninteresting and quite limitingset of exercises

Conclusions The set of methods laid out in the Angristarticle at least if a more relevant instrument were used yieldsa set of model-free statistics that are suitable for exploratorywork on a research question prior to the (possibly nonlinear)

Mullahy Comment 23

structural estimation and prediction that should be the ultimateobject of applied economics research

ACKNOWLEDGMENTS

This comment is a revision of remarks delivered at theJoint Statistical Meetings Baltimore August 1999 I thankCarl Christ James Heckman Joel Horowitz Michael KeaneThomas Mroz Geert Ridder and Edward Vyltacil for com-ments with the usual disclaimer that the views expressed hereshould not be taken as representing those of any of theseindividuals

ADDITIONAL REFERENCES

Angrist J and Imbens G (1999) Comment on ldquo Instrumental VariablesA Study of Implicit Behavioral Assumptions Used in Making Pro-gram Evaluationsrdquo by J J Heckman Journal of Human Resources 34283ndash827

Blundell R and Smith R (1993) ldquoSimultaneous Microeconometric ModelsWith Censored or Qualitative Dependent Variablesrdquo in Handbook of Statis-tics (Vol 1) eds G S Maddala C R Rao and H D Vinod AmsterdamNorth-Holland pp 117ndash141

Christ C (1994) ldquoThe Cowles Commissionrsquos Contributions to Econometricsat Chicago 1939ndash1955rdquo Journal of Economic Literature 32 30ndash59

Heckman J (1997) ldquoInstrumental Variables A Study of Implicit BehavioralAssumptions Used in Making Program Evaluationsrdquo Journal of HumanResources 32 441ndash462

mdashmdashmdash (1999) ldquo Instrumental Variables A Reply to Angrist and ImbensrdquoJournal of Human Resources 34 828ndash837

mdashmdashmdash (2000) ldquoCausal Parameters and Policy Analysis in Economics ATwentieth Century Retrospectiverdquo Quarterly Journal of Economics 11545ndash97

Heckman J and MaCurdy T (1985) ldquoA Simultaneous Equations LinearProbability Modelrdquo International Economic Review 18 21ndash37

Heckman J and Vytlacil E (1999) ldquoLocal Instrumental Variables andLatent Variable Models for Identifying and Bounding Treatment EffectsrdquoProceedings of the National Academy of Sciences 96 4730ndash4734

Horowitz J (1993) ldquoSemiparametric and Nonparametric Estimation of Quan-tal Response Modelsrdquo in Handbook of Statistics (Vol 11) eds G SMaddala C R Rao and H D Vinod Amsterdam North-Holland pp45ndash72

Ichimura H and Thompson T (1998) ldquoMaximum Likelihood Estimation ofa Binary Choice Model With Random Coef cients of Unknown Distribu-tionrdquo Journal of Econometrics 86 269ndash295

Manski C (1988) ldquoIdenti cation of Binary Choice Modelsrdquo Journal of theAmerican Statistical Association 83 729ndash738

Marschak J (1953) ldquoEconomic Measurements for Policy and Predictionrdquoin Studies in Econometric Method eds W Hood and T KoopmansNew York Wiley pp 1ndash26

Powell J (1994) ldquoEstimation of Semiparametric Modelsrdquo in Handbook ofEconometrics (Vol 4) eds R Engle and D McFadden Amsterdam North-Holland pp 2443ndash2521

CommentJohn Mullahy

Departments of Preventive Medicine and Economics University of Wisconsin Madison WI 53705 andNational Bureau of Economic Research Cambridge MA (jmullahyfacstaffwiscedu)

Articles bearing titles containing phrases like ldquoSimpleStrategies for Empirical Practicerdquo are often wolves masquerad-ing as sheep the methodologies they devise being far fromldquosimplerdquo and far removed from what most humble practition-ers perceive in the realm of ldquoempirical practicerdquo Not so hereJoshua Angrist has written a tight comprehensive article thatis stimulating and important yet also eminently useful

As typi es much of Angristrsquos work the main concern in thisarticle is on how to elicit interesting characterizations of causaleffects from microdata with the particular twist here being afocus on outcomes measured as ldquolimited dependent variablesrdquo(LDVrsquos) The main take-away message I glean from this arti-cle is that applied analysts working on causal effect or struc-tural analyses in LDV contextsmdashtraditionally vexing contextsinsofar as consistent estimation and inference are concernedmdashhave considerable grounds for optimism Angrist lays out andinterprets systematically a set of issues and methods that pro-vide practitioners with a variety of implementable strategiesthat might be brought to bear on such empirical problemsA corollary take-away message is that in some respects ldquothisstuff is not really as hard as wersquove tended to make itrdquo withAngrist demonstrating for instance the potential merits ofsimple linear instrumental variable ( IV) methods for estimat-ing causal effects in a variety of LDV contexts In no eventcan applied analysts escape the requirement of nding theo-retically sound instruments but Angrist exposits compellingly

a broad description of how such instruments might be used toelicit interesting causal inferences in LDV contexts

I have no real quibbles with any of the substance ofAngristrsquos arguments Rather I will devote my commentarymainly to amplifying and expanding several of the themes hedevelops throughout the article

1 FOCUS ON CAUSAL OR PARTIAL EFFECTSVERSUS FOCUS ON CONDITIONAL

EXPECTATIONS FUNCTIONS

It seems fair to suggest that much of applied microecono-metrics is concerned primarily with understanding the signsand magnitudes of quantities like iexclE6Y mdashX1 D7=iexcl4X1 D5 oriexclE6Y mdashX7=iexclX0 Yet much of the actual dirty work in under-taking causal analysis in LDV contexts seems to result fromdecisions to undertake analyses in settings where E6Y mdashX1D7

or E6Y mdashX7 are restricted to be positive without a priori restric-tions on parameter values [thus the tradition of using tobit-class conditional expectations functions (CEFrsquos) two-stepselection models exponential CEFrsquos etc]

As practitioners however we should pause to assesswhether speci cations akin to E6Y mdashX1D7 D exp4Xsbquo C D5 gt

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

24 Journal of Business amp Economic Statistics January 2001

0 are ultimately buying anything important with respect tothe rst-order questions being explored It is only com-mon sense that analysts should spend relatively more energyworking on understanding quantities of primary interest(eg iexclE6Y mdashX1D7=iexcl4X1 D5) and worrying relatively lessabout the formulation and implications of assumptions (egE6Y mdashX1 D7 gt 0) that may have little or no ultimate bearingon the particular questions being addressed This is espe-cially so when the latter effort (restricting via functional formchoice E6Y mdashX1 D7 gt 0) will tend to complicate rather thanfacilitate the former effort (understanding the partial or causaleffects) This is not to suggest that quantities like E6Y mdashX7 andE6Y mdashX1 D7 may not themselves be interesting to estimate (egmodeling conditional mean health care expenditures for use inforecasting future expenditure levels) but whether E6Y mdashX1D7

is parametrically and parsimoniously better approximated byexp4XsbquoC D5 or by XsbquoC D or by some other g4Xsbquo CD5

or h4X1D1 ƒ5 ismdasheven in LDV settingsmdashultimately an empir-ical matter to be assessed by goodness of t (eg via condi-tional moment tests)

I note this important angle somewhat sheepishly sincein both articles of mine that Angrist cites (Mullahy 19971998) there is considerable emphasis on the use of exponen-tial CEF speci cations as enforcers of positive conditionalmean ldquorequirementsrdquo In fact the main motivation for theearlier article was to nd a general approach to obtainingconsistent estimators of structural parameters in parametricsettings less distribution-bound than tobit-class models whenthe key requirement or side constraint is E6Y mdashX1 D7 gt 0 withthe resulting strategy a more or less brute-force method foraccomplishing this in a nonlinear IV setting Angristrsquos arti-cle takes this idea quite a bit further and demonstrates how(and with the addition of a further normalization) the struc-tural parameter estimation approach I discussed can be devel-oped into a model to analyze causal effects when the latter arecharacterized as ldquoproportional treatment effectsrdquo

Whether a proportional as opposed to an additive treatmenteffect is interesting (it may or may not be) remains to be seenin any particular application Wooldridgersquos recent work onaverage causal effects (ACErsquos Wooldridge 1999) would seempertinent in contexts in which the role of the X covariates ismore ldquointrusiverdquo than in the simple additivelinear treatment-effect setup As a general matter the extent to which infer-ences about causal effectsmdashhowever formulatedmdashhinge onthe inclusion of particular covariates as conditioning regres-sors is an interesting and potentially important angle thatAngristrsquos article begins to explore and is one that likely tomerit additional research For the particular issues at handthe very nature of proportional treatment effects brings intofocus explicitly the role of the X rsquos so how their correlationwith the treatment indicators 4D5 in uences inferences aboutthe causal impacts of the latter would seem to be a rst-orderconsideration

2 TWO-PART MODELS

I found Angristrsquos analysis of the interpretation of causaleffects in two-part models to be extremely illuminatingIn a reduced-form setting I argued (Mullahy 1998) that amid

all the concerns about zeros robustness outliers transforma-tion retransformation and such that have typically attendedmodeling efforts involving two-part models it would be sur-prising in a well-structured empirical investigation if therewere not stated or lurking concerns about partial effectsiexclE6Y mdashX7=iexclX andor CEFrsquos E6Y mdashX7 themselves It may be insome cases that such concerns are not articulated and it mayeven be in less well-structured problems that they are not obvi-ous to analysts themselves Yet for instance unless robustlyestimated sbquorsquos from a log-linear part-2 speci cation of a two-part model could inform a rst-order question about a par-tial effect or a CEF of what practical use were they likelyto be My rather modest and rather obvious argument wasjust that the concerns about the iexclE6Y mdashX7=iexclX or the E6Y mdashX7mdashif they are the reason the analysis is being conducted in the rst placemdashshould enjoy rst-order prominence in the estima-tion exercise and that concerns about zeros outliers and suchshould be relegated in some sense to second-order status Insome cases two-part models will serve nicely to address such rst-order concernsmdashat least in reduced-form settingsmdashbut inothers they will not

Angrist appears to have some sympathy with such argu-ments but more importantly he provides a valuable serviceto users of two-part models by unearthing one major limi-tation in the analysis of causal effects The problem is notwith part 1 (ie the logit probit or linear probability com-ponent) because part 1 of the two-part model falls within themain lines of Angristrsquos LDV analysis Rather the complicationarises with part 2 of the two-part model in which the additionalldquoY gt 0rdquo conditioning arises In the standard (reduced-form)two-part model quantities like E6Y mdashX1Y gt 07 are prominentand identi ed readily But in the counterfactual settings inwhich causal effects are manifested wherein the realized Y

arises from self-selection into or out of treatment such selec-tion effects introduce an ambiguity [Eq (12)] and therebyconfound the ability to glean causal effects from part-2 esti-mates (unless as Angrist notes censored regression methodsare used but Angrist also offers some compelling argumentsagainst blind reliance on censored regression approaches) Assuch if estimation of and inference about causal effects arethe primary analytical concerns in data settings with Y para 0and Pr4Y D 05 gt 01 analysts may be well advised to avoidtwo-part modeling strategies and pursue some of the moredirect linear and nonlinear estimation approaches discussedby Angrist in which one-part estimation approachesmdashzerosand allmdashyield estimates of causal effects that will be directlyinterpretable

3 BEYOND CEFrsquoS

One noteworthy feature of the methods Angrist discussesmdashwith the main results attributed to Abadie (1999)mdashare theirapplicability to estimation of causal effects for characteristicsof the conditional distribution rdquo4Y mdashX1D5 beyond just CEFrsquosEstimation of causal or treatment effects for conditional quan-tiles (QTErsquos) and conditional distribution function ordinates(ldquodistilesrdquo) Pr4Y lt cmdashX1D5 is demonstrated to t properlywithin the weighting strategies advanced by Abadie Since

Todd Comment 25

quantile and distile analyses may be more relevant in address-ing particular analytical questions or policy issues than CEFrsquosthe ability to extend the mechanisms of causal-effect esti-mation strategies in such directions should be welcomed byapplied researchers

I might suggest that Angristrsquos claim about an advantage ofquantile causal analysis over distile causal analysis being thatthe latter relies on application-specic values of ordinates (c)is perhaps oversold or underdeveloped In many interestingand policy-relevant applications focus on the application-speci c ordinates is fundamental What factors cause employ-ers to offer more than one health insurance plan to employeesWhat factors cause consumers of alcoholic beverages to drinkmore than two drinks per day What factors cause elderlyindividuals to utilize more than one prescription drug prod-uct What factors cause second earners to work more than17 hours per week Each of these concerns an importantreal-world causal question for which the application-specicordinatemdashwhether for institutional legal physiological orother reasonsmdashis a fundamental ldquopivot pointrdquo for the analysisAnalysis of such distile relationshipsmdashwhether in causal or inreduced-form settingsmdashis a potentially powerful method forinforming decision makers and it seems to me that its mer-its relative to quantile analyses would have to be judged on acase-by-case basis

4 CONCLUSIONS

Practitioners of applied microeconometrics will nd thatthe time invested in a careful reading of Angristrsquos article hassizable returns not merely because of the analytical and con-ceptual insights it offers but ultimately also because it sug-gests a variety of important practical innovations that appliedresearchers can exploit empirically to more clearly understandthe nature of causal relationships in their data Applied microresearchers in areas like health labor development public nance and such work commonly with data wherein (a) out-come measures are limitedmdashoften most importantly becausethey are nonnegative with mass points at 0mdashand (b) the rolesof conditioning covariates (X rsquos) are most conveniently summa-rized via regression methods Angristrsquos article provides appliedresearchers so endowed with a set of powerful tools for elic-iting causal inferences from such data Of course the punchline is familiar and inevitable All bets are off without theo-retically solid instruments But with such instruments in handthe strategies exposited by Angrist offer analysts a rich set ofperspectives on estimation

ADDITIONAL REFERENCE

Wooldridge J M (1999) ldquoEstimating Average Partial Effects Under Con-ditional Moment Independence Assumptionsrdquo mimeo Michigan StateUniversity Dept of Economics

CommentPetra Todd

Department of Economics University of Pennsylvania Philadelphia PA 19104 (petraathonasasupennedu)

This article considers ways of estimating the effect ofbinary endogenous regressors in models with limited depen-dent variables It questions the usefulness of conventionalestimation strategies aimed at recovering structural modelparameters and advocates the use of simple instrumental vari-ables ( IV) estimators as an alternative on the grounds thatthese estimators invoke weaker assumptions and often suf ceto answer questions of interest in empirical studies In theauthorrsquos view the main challenge facing empirical researchersis the problem of identi cation of ldquotreatment effectsrdquo throughIV Here I expand the discussion by considering two ques-tions (1) What are the limitations of the causal model asa paradigm for policy analysis (2) When simple estimatorsare suitable for answering a question of interest what are thetrade-offs that need to be considered in using them

1 WHEN IS THE ldquoCAUSAL MODELldquoA USEFULPARADIGM FOR POLICY ANALYSIS

The article gives the impression that most interesting ques-tions in economics can be answered within the context ofthe ldquocausal modelrdquo [variously attributed to Neyman (1923)Fisher (1935) Cox (1958) Roy (1951) and Rubin (1978)]The causal model is a very general framework that assumesthat there are potential outcome states associated with having

received some treatment and having received no treatment (orpossibly a different treatment) For each individual only oneof the potential outcome states is observed which leads to amissing-data problem in attempting to draw inferences aboutaspects of the treatment effect distribution

The language of potential outcomes is very general soalmost any economic problem could be formulated in theseterms However this does not mean that estimators proposedin the literature for the causal model are useful for all or evenmost economic problems A major limitation of the modelis that it assumes that the state ldquowith treatmentrdquo has beenobserved for at least a subset of people In medical trials orbiological experiments this assumption is probably reason-able but in economics we often are interested in evaluatingeffects of treatments that have not yet been implemented Forexample we might be interested in predicting the effect ofraising the age receiving Social Security bene ts to a new ageor the effect of introducing new term limits on welfare par-ticipation An implicit assumption needed to apply any of theidenti cation results described in this article is that both Y1

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

26 Journal of Business amp Economic Statistics January 2001

and Y0 are observed which would not be satis ed in manycases Thus the causal model and the estimators developedfor it are inadequate when it comes to any economic questioninvolving a state of the world that has not been observed

Despite these limitations treatment effect estimates aresometimes used to predict consequences of new treatments orof existing treatments for new populations When the policychange being considered is very close to changes observedin the data and is to be applied to similar individuals thenextrapolation may be reasonable But if the policy change is(a) of a magnitude outside the range of experience (b) alonga new dimension or (c) intended for a new population then itis an open question as to how reliable the estimates would beTreatment impact estimates are not intended to capture any-thing invariant to changing conditions The assumptions thatwould be required to justify extrapolations or generalizationsare often quite strong To some extent it is an illusion thatthese estimators allow for general kinds of inference underweak assumptions because the assumptions required to justifytheir application for particular purposes are often not madeexplicit

Angrist argues that the identi cation results for the localaverage treatment effect (LATE) estimator provide a ldquofounda-tion for credible causal inferencerdquo (p 5) and a ldquominimum con-troversy jumping-off point for any prediction exerciserdquo (p 5)But as discussed previously the causal model maintains sucha high level of generality that it offers little guidance for manykinds of problems It is important to recognize when and whennot the LATE estimator is likely to be useful This articleclaims that ldquoin practice estimates of LATE differ little fromestimates based on the stronger assumptions invoked to iden-tify effects on the entire treated populationrdquo (p 5) But suchan extrapolation is not justi ed under the theory and is surelyunlikely to hold in all circumstances As shown in a num-ber of works by various authors LATE provides the averageeffect of treatment for the group of ldquocompliersrdquo who are peo-ple induced by the instrument to receive treatment This groupdoes not except under very special circumstances correspondto the group of treated persons and may in fact representonly a small fraction of persons receiving treatment So notonly does LATE not suf ce for answering problems of the sortdescribed by (a) (b) and (c) it also does not generally pro-vide the average effect of treatment on the treatedmdashthe mostcommon parameter of interest in analyzing social programsAnother not-so-attractive feature of the estimator is that thepopulation of ldquocompliersrdquo is usually not identi able so wecannot say exactly whose treatment impact is being estimatedApplying the LATE parameter estimate to the full populationof treated persons would generally require ruling out certaintypes of heterogeneity in individual responses to treatmentas discussed by Heckman (1997) The explicit assumptionsthat could justify this type of extrapolation were shown byHeckman and Vytlacil (2000 in press)

2 ESTIMATORS SHOULD BE TAILOREDTO THEIR APPLICATION

A general theme throughout the article is that wheneverpossible researchers should restrain from imposing paramet-ric restrictions in estimation because the structure they impose

may be wrong resulting in biased estimates This view isexpressed for example in the discussion of the two-partmodel The article advocates the use of a two-part approachinstead of a conventional selection-model approach maintain-ing that the two-part approach is less likely to lead to biasedparameter estimates because it does not impose cross-equationrestrictions as a conventional tobit estimator would

How do researchers decide whether or not to impose restric-tions across estimating equations that are structurally linkedThe most agnostic approach would be to estimate both equa-tions completely nonparametrically but this approach is rarelypractical Nonparametric estimators are consistent under themost general conditions but in conventional size samples thestandard errors would be so large that the estimates would forpractical purposes be useless Researchers impose structurebecause they are willing to restrict the class of models underwhich their estimators are consistent in exchange for greaterprecision These ef ciency considerations would also apply inthe deciding whether or not to impose cross-equation restric-tions in estimating simultaneous-equations models If we areright about the structure there is an ef ciency gain fromimposing the restrictions If we are wrong imposing them maylead to bias When there are overidentifying restrictions it isat least possible to develop tests of the model speci cation thatcan serve as a guide in choosing a structural model that is sup-ported by the data This article presents a dichotomy betweenthe two-part model and a fully parametric traditional selec-tion speci cation but there are many other modeling choicesranging from fully nonparametric to semiparametric to fullyparametric In modest size samples ef ciency is an impor-tant consideration and a parametric model may be the mostappropriate one For very large samples fewer parametricapproaches are feasible Both the probability-of-working andthe hours-worked equations could for example be estimatedby a semiparametric method such as semiparametric leastsquares ( Ichimura 1993) Which estimation method is mostappropriate ultimately depends on how much data is avail-able and how many parameters are to be estimated It doesnot make sense to criticize the use of structured approaches infavor of more exible modeling approaches without regard tothe context in which they are being applied

3 ON THE GAP BETWEEN THEORY ANDEMPIRICAL PRACTICE

Finally another recurring emphasis in the article is on thevalue of shortcuts in empirical practice In the authorrsquos viewthere is a gap between the econometric theory and what isfeasible in practice and shortcuts and approximations help tobridge this gap To this end the article extols for examplethe virtues of ordinary least squares (OLS) as a way of approx-imating any unknown conditional expectation function with-out much concern as to whether a best linear approximation isa good approximation If the model has discrete regressors andis fully saturated it is well known that OLS is equivalent to anonparametric estimator But in other cases a linear approxi-mation could be highly biased and the bias will not go awayas the sample size gets large There are a variety of alterna-tive asymptotically unbiased estimators Along similar lines

Angrist Reply 27

the article suggests that in implementing a ldquoselection-on-observablesrdquo estimator it may be all right simply to ignore theweighting scheme Such claims seem to me fairly outrageousbecause failure to take into account weighting could lead tovery different estimates and would not provide a consistentestimator of the parameter of interest Heckman Ichimura andTodd (1998) discussed alternative estimators that are not dif -cult to implement and take into account the weighting scheme

Finally the appendix presents a variety of shortcuts forobtaining estimates and standard errorsmdashsome of which leadto inconsistent estimates When shortcuts lead to very crudeapproximations or inconsistent estimates I question their use-fulness I do think it unreasonable to expect that estimatesbased on simple IV estimators be accompanied by correctstandard errors that correctly take into account the in uence ofestimated parameters of estimated regressors I see little valuein reporting incorrect standard-errors estimates on the groundsthat it saves time in computation especially when there areseveral works long available in the literature that develop sim-ple ways of solving these kinds of problems (eg see Newey1984) Although there is surely a gap between the estimatorsdeveloped in the theoretic econometric literature and the esti-mators used in the empirical literature the gap is not as greatas this article supposes

ADDITIONAL REFERENCES

Cox D R (1958) The Planning of Experiments New York WileyFisher R A (1935) Design of Experiments London Oliver and BoydHeckman J J (1997) ldquo Instrumental Variables Evidence From Evaluating a

Job Training Programmerdquo Journal of Human Resources 32 441ndash462Heckman J Ichimura H and Todd P (1997) ldquoMatching as an Econometric

Evaluation Estimator Theory and Evidence on its Performance Appliedto the JTPA Program Part I Theory and Methodsrdquo Review of EconomicStudies 64 605ndash654

mdashmdashmdash (1998) ldquoMatching as an Econometric Evaluationrdquo Review of Eco-nomic Studies 65 261ndash294

Heckman J and Vytlacil E (2000) ldquoCausal Parameters Structural Equa-tions Treatment Effects and Randomized Evaluations of Social Programsrdquounpublished manuscript University of Chicago Dept of Economics

mdashmdashmdash (in press) ldquoLocal Instrumental Variablesrdquo in Nonlinear StatisticalInference Essays in Honor of Takeshi Amemiya eds C Hsiao KMorimune and J Powell Cambridge UK Cambridge University Press

Ichimura H (1993) ldquoSemiparametric Least Squares (SLS) and Weighted SLSEstimation of Single Index Modelsrdquo Journal of Econometrics 58 71ndash120

Neyman J (1923) ldquoStatistical Problems in Agricultural Experimentsrdquo Sup-plement to the Journal of the Royal Statistical Society 2 107ndash180

Newey W (1984) ldquoA Method of Moments Interpretation of Sequential Esti-matorsrdquo Economic Letters 14 201ndash206

Roy A (1951) ldquoSome Thoughts on the Distribution of Earningsrdquo OxfordEconomic Papers 3 135ndash146

Rubin D (1978) ldquoBaysian Inference for Causal Effects The Role of Ran-domizationrdquo The Annals of Statistics 6 34ndash58

ReplyJoshua D Angrist

I thank the discussants and the session organizer outgoingJBES editor Jeff Wooldridge who will be missed by associateeditors and JBES authors alike The comments in this sessionoffer a variety of viewpoints some sympathetic others morecritical I am pleased to be part of this stimulating and con-structive exchange and look forward to more interaction ofthis type in the future

Hahn My article starts by suggesting that a focus ontreatment effects simpli es the problem of causal inferencein limited dependent variable (LDV) models with endoge-nous regressors Endogeneityrsquos half-sister is the assumptionof unobserved individual effects since the omitted-variablesmotivation for xed effects in panel data is similar Hahnasks whether a shift in focus from index parameters to aver-age causal effects might also simplify the notoriously dif -cult problem of working with nonlinear models for binarypanel data (eg as in Card and Sullivan 1988) Hahn showsthat here too the causal-effects framework pays off demon-strating that average causal effects are easily estimated in a xed-effects probit model with a binary regressor even thoughthe probit coef cient is not A second noteworthy featureof Hahnrsquos comment is his observation that identi cation inthis framework turns on careful modeling of the relationshipbetween the assignment variable and the xed effects Tradi-tional econometric models are primarily concerned with thestochastic process generating outcomes A shift of attentiontoward modeling the assignment mechanism is consistent with

the notion that observational studies should try to mimic thesort of controlled comparisons generated by experiments

Imbens Imbens begins by endorsing the view that the sub-stantive goals of empirical research are better served by thepotential outcomescausal framework than by structural mod-eling Going one step further Imbens notes that the problemswith ldquomodel- rst econometricsrdquo are not limited to LDVrsquos Inthe same spirit he sensibly wonders (as did Wooldridge 1992)why so much attention has been devoted to estimating theBoxndashCox transformation model which does not actually iden-tify E6Y mdashX7 without a distributional assumption Imbens alsonicely articulates the rationale for looking at the twins exam-ple and notes that this is similar to the rationale for extrap-olating from clinical trials in narrow subpopulations And infact I believe the ldquotwins experimentrdquo is likely to have pre-dictive value in a variety of situations though any extrapo-lation will probably be improved by control for demographiccharacteristics

In spite of (or perhaps because of) our years of discussionand collaboration Imbens and I still do not agree on certainthings He feels strongly that inference with discrete outcomesshould use models and procedures that respect inherent non-

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

28 Journal of Business amp Economic Statistics January 2001

linearities The HIRZ procedure discussed in his comment isan intriguing way to do this and my article discusses othersI remain unconvinced however that nonlinearity is of majorsubstantive importance in empirical work with binary regres-sors The proof of the pudding I suppose would be to showthat H IRZ or some of the methods I discuss in the article leadto better predictions or decisions than the linear workhorse

Mof tt Mof t provides a valuable counterpointHe beginsby arguing that latent-index models have more predictivepower than reduced-form causal models Such claims seemhard to establish a priori though I do not doubt that in somecases this may be true But empirical strategies in which thebulk of the research effort goes into estimating index coef -cients may fail to answer the most basic question in empiricalwork What happened here Causal models and local aver-age treatment effect (LATE) answer this question while indexcoef cients alone are not suf cient for causal inference Iagree however that structural modeling can be useful forgoing beyond basic questions and I have used index and otherstructural models for this purpose Mof t is right howeverin suggesting that I nd the model-free analysis he refers toas ldquoexploratoryrdquo more interesting than structural extrapolationThis seems to me to be where the potential for real discoverylies

Mof tt asks tough but fair questions about the twins appli-cation He is correct to point out that twinning is not a manip-ulable policy instrument That is one reason I do not limit theanalysis to reduced-form effects of multiple births Yet to mymind the best evidence we have on the controversial questionof the consequences of out-of-wedlock childbearing comesfrom the twins experiment (Bronars and Grogger 1994) Andthe economic consequences of childbearing are of interest inmany other contexts As noted in the introduction to my arti-cle the twins experiment can help determine whether fertilityreductions are responsible for trend increases in female laborsupply Beyond this sort of question the twins results seemuseful for assessing the economic consequences of increasedabortion access in the United States the dissemination of con-traceptive implants in developing countries (eg see KaneFarr and Janowitz 1990) and various programs to encour-age or require implant use by welfare mothers (Haveman andWolfe 1993)

Finally it is fair to question the external validity of twinsestimates as Mof t does toward the end of his commentAngrist and Evans (1998) showed that the labor-supply con-sequences of twinning can be reconciled with estimates ofthe labor-supply consequences of childbearing resulting froma very different natural experiment induced by preferences formixed-sex sibling pairs This suggests that results from bothexperiments may be quite robust and may even be ldquostructuralrdquoin the sense that they tell us something general about the eco-nomic response to childbearing

Mullahy Mullahy brings a welcome health economistrsquosperspective to the discussion Health economists have longgrappled with the issues discussed in my article and asI hope I made clear I drew ideas and inspiration fromMullahyrsquos work on models with nonnegative dependent vari-ables Mullahyrsquos comment further highlights exactly the sort

of utilitarian trade-off that underlies the choice of empiricalstrategy On one hand we would like our models to be con-sistent with any functional form restrictions known to be trueOn the other attempts at this may complicate estimation andinference for little payoff Worst of all the technical complex-ity of nonlinear models seems to cause authors to ldquowax econo-metricrdquo an effort that may come at the expense of attention tosubstantive issues of importance (an example here is the longseries of papers discussing exactly how data from the RANDhealth-insurance experiment should be analyzed) Mullahyrsquosheretical discussions of ldquoone-part modelsrdquo here and in earlierwork represent a refreshing effort to keep the empirical trainon track Mullahyrsquos point that application-specic distributionordinates can provide more useful summary information thanquantiles is also clearly worth thinking about

Todd Todd provides a useful and clear overview of thecausal framework which she suggests is better suited formedical questions than economic questions The thrust of herargument is that there is too much heterogeneity in economicinterventions and outcomes for the causal approach to bevery useful for economists But my perception is that medicalresearch is even more explicitly concerned with heterogene-ity than economics This is evidenced by the many quali ca-tions regarding patient characteristics conditions and inter-actions in medical treatment protocols A difference betweenthe prevailing medical research paradigm and the traditionaleconometric perspective however is whether the question ofheterogeneous response is to be addressed by better theoryor more evidence Of course theory helps us decide whereto look for heterogeneity but as Rosenbaum (1999 p 301)put it ldquo If a treatment has substantially different effects in dif-ferent types of people then we need to discover this (ital-ics mine) Rosenbaum also noted that ldquorandomized clinicaltrials routinely look for heterogeneous treatment effects andthey do this without representative samples from nationalpopulationsrdquo

On the technical side Todd suggests I advocated use ofthe two-part model because it exibly approximates nonlinearfunctional forms for nonnegative outcomes Although I notedthe exibility the upshot of my discussion was that the two-part model is poorly suited to causal inference because of thedif culty of interpreting part 2 Finally Todd points out thefailure to allow for sampling variance in the scaling factorsused to generate effects for nonlinear models This is indeed ahard-to-defend shortcut though I shall take a stab at defenseby noting that randomness of the scaling factor which typi-cally estimates a sample average is likely to be a minor sourceof sampling uncertainty

ADDITIONAL REFERENCES

Card D E and Sullivan D (1988) ldquoEffects of Subsidized Training onMovements In and Out of Employmentrdquo Econometrica 56 497ndash530

Haveman R and Wolfe B (1993) ldquoChildrenrsquos Prospects and ChildrenrsquosPolicyrdquo Journal of Economic Perspectives 7 153ndash174

Kane T Farr G and Janowitz B (1990) ldquo Initial Acceptability of Con-traceptive Implants in Four Developing Countriesrdquo International FamilyPlanning Perspectives 16 49ndash54

Rosenbaum P R (1999) ldquoChoice as an Alternative to Control in Observa-tional Studiesrdquo Statistical Science 14 259ndash304

Wooldridge J (1992) ldquoSome Alternatives to the BoxndashCox RegressionModelrdquo International Economic Review 33 935ndash955

Page 18: New Estimation of Limited Dependent Variable Models With Dummy …piketty.pse.ens.fr/…ecoineg/articl/Angrist2001b.pdf · 2011. 12. 5. · sequences of changes in contraceptive technology

Imbens Comment 19

the potential outcome framework (eg Vytlacil 1999) formu-late the critical assumptions in terms of associations betweenobserved variables and unobserved residuals which appearsmore dif cult to contemplate [see Imbens (1997) for a dis-cussion of the confusion such formulations have caused in thestatistics literature]

It is rare that economic theory is speci c enough to deter-mine the exact value of the estimand More typical is thatthe theory is consistent with a range of values for the esti-mand Observations on agentsrsquo choices and outcomes may behelpful in narrowing down this range The econometricianrsquostask is to link the data to the estimand Typically a num-ber of additional assumptions are made at this stage Almostalways it is assumed that there is only limited dependenceor no dependence at all between choices made by differentagents and identi cation focuses on the link between the jointdistribution of the observables estimable in large samplesand the estimand Two possibilities arise at this stage Some-times the estimand can be expressed as a functional of thejoint distribution of the observables in which case the esti-mand is identi ed A leading example is where the estimandis the average treatment effect and theory suggests that assign-ment to treatment is random or at least random conditionalon a set of observed covariates (unconfounded assignmentselection on observables) Alternatively the assumptions sug-gested by economic theory do not allow for the direct linkbetween the distribution of observables and the estimand Inthat case the researcher faces some choices One option advo-cated in a series of papers by Manski (see for a general dis-cussion Manski 1995) is to estimate the range of values ofthe estimand consistent with the data given the substantiveassumptions Another option followed in the current article byAngrist is the local average-treatment-effect approach devel-oped by Imbens and Angrist (1994) to consider what aspectsof the estimand are identi ed given data and assumptions Ininstrumental-variables settings the population average treat-ment effect is often not identi ed but the average effect for aspeci c subpopulation may be In that case one may chooseto estimate the average treatment effect for this subpopula-tion and leave the extrapolation to the principal estimand tothe researcher possibly aided by theoretical considerations AsHeckman wrote ldquo It is a great virtue of the LATE parameterthat it makes the investigator stick to the data at hand and sep-arate out the aspects of an estimation that require out of sam-ple extrapolation or theorizing from aspects of an estimationthat are based on observable datardquo (Heckman 1999 p 832)

Let us consider the case studied by Angrist with its focuson the effect of having more than two children on labor sup-ply Angrist argues that the second birth being a multiple birth(eg twins) is a valid instrument for this effect In terms ofthe AIR formulation this requires a multiple birth to be asgood as randomly assigned and the absence of a systematicdirect effect on labor supply other than through its effect onthe number of children Such assumptions may be contro-versial For example fertility treatments may lead to a sys-tematic association between multiple births and choices madeby couples violating the rst assumption Even if we acceptthese assumptions however they only imply that the aver-age causal effect of more kids on labor supply is identi ed

for women who had a third child solely because their sec-ond birth was a multiple birth (compliers in the AIR termi-nology) In my view it is unlikely that this is the populationof primary interest Nevertheless it is the only subpopulationthe data are informative about in the sense of point identi ca-tion under the substantive assumptions and it would appear tooffer some guidance regarding the population average causaleffect to policy makers similar to the way in the medical worldresults from clinical trials in homogenous subpopulations areregarded as useful because they are viewed as indicative ofpopulation average causal effects

3 LIMITED DEPENDENT VARIABLES

Typically economic theory offers some guidance concern-ing the determinants of certain outcomes without specifyingthe exact form or strength of their relationship In that casestatistical modeling is required to complete the speci cationConsider the example Angrist studies with binary outcomebinary endogenous regressor a binary instrument and covari-ates Angrist suggests as one possible approach estimating theaverage treatment effect through a linear probability modelwith instrumenting for an endogenous regressor The bene tsof the linear probability approach stemming from the linear-ity and robustness against misspeci cation of the rst stageappear to me largely illusory At this point the statistical mod-eling is only intended to provide exible approximations to theunderlying conditional distributions This is a fundamentallydifferent role from that played by the substantive assumptionsthat are essential for identi cation Appeals to consistencyunder speci c parameterizations therefore appear irrelevantmdashin a larger sample one may well wish to use a more exiblespeci cation because less smoothing is required In additionto nding the alleged bene ts of the linear probability modelunpersuasive I nd its disadvantages troubling Within smallsubpopulations characterized by extreme values of the covari-ates the smoothing implicit in linear probability models islikely to lead to unattractive predictions compared to pre-dictions based on nonlinear models that respect the limited-dependent-variable nature of the outcomes

An alternative approach is followed in the study of the effectof u shots on hospitalization rates using randomized incen-tives for vaccination by Hirano Imbens Rubin and Zhou(2000 HIRZ from here on) Given their assumptions exten-sions of those made by AIR to the case with exogenous covari-ates there are three subpopulationsmdashcompliers (units whochange treatment status in response to a change in the valueof the instrument) always-takers (who always take the treat-ment irrespective of the value of the instrument) and never-takers (who never take the treatment irrespective of the valueof the instrument) H IRZ modeled the conditional distributionof these three ldquotypesrdquo conditional on covariates as a trinomialdistribution

Pr4TypeiD cmdashXi

D x5 D exp4x0ndashc5

1C exp4x0ndashc5C exp4x0ndasha51

Pr4TypeiD amdashXi

D x5 D exp4x0ndasha5

1C exp4x0ndashc5 C exp4x0ndasha51

20 Journal of Business amp Economic Statistics January 2001

and

Pr4TypeiD nmdashXi

D x5

D 1 ƒ Pr4TypeiD cmdashXi

D x5 ƒ Pr4TypeiD amdashXi

D x50

Now compare this setup to the selection models Angristdescribes in Section 3 In the selection models the equationdescribing the endogenous regressor is Di

D 18ƒ0C ƒ1Zi

Cƒ 0

2Xi gt Daggeri9 Suppose that the instrument is binary and that ƒ1 ispositive Then the two models are very similar with units withƒ0 Cƒ1 Cƒ 0

2Xi gt Daggeri in the selection model classi ed as always-takers in the potential outcome framework (because irrespec-tive of the value of the instrument Di

D 1 for such units) unitswith ƒ0

C ƒ 02Xi lt Daggeri classi ed as never-takers (because irre-

spective of the value of the instrument DiD 0 for such units)

and the units with ƒ0C ƒ 0

2Xi lt Daggeri lt ƒ0C ƒ1

C ƒ 02Xi classi ed

as compliersOne advantage of the trinomial model is that it easily gen-

eralizes to provide an arbitrarily good t to any conditionaltrinomial distribution by including higher-order terms andinteractions in the covariates If there are no substantive rea-sons to impose additional restrictions one should not imposethem implicitly in the speci cation of the statistical modelIn particular in the selection model it is not suf cient to addhigher-order terms to the covariate vector to provide an arbi-trarily good t to the trinomial distribution Such an approxi-mation would have to involve heteroscedasticity and other dis-tributional extensions that are not straightforward to implementin the selection model

Conditional on the individualrsquos type H IRZ speci ed theoutcome distributions given covariates as logistic regressionmodels Again the aim is to provide a exible approxima-tion to the conditional distribution in a manner that does notimpose any implicit restrictions Given that for a binomialdistribution the logistic regression model can be thought ofas providing a linear approximation to the log odds ratiothis choice is again an appealing one An alternative is theprobit model which also provides a good approximation Less

attractive here is the linear probability model since it requiresinequality restrictions on the parameters if the implicit esti-mates of the probabilities are to be bounded between 0 and 1

In cases with other limited dependent variables alternativenonlinear models may be appropriate For example if the out-comes are durations subject to censoring models speci ed interms of hazard functions (eg Lancaster 1979) may be con-venient for dealing with such data

ADDITIONAL REFERENCES

Angrist J (1989) ldquoLifetime Earnings and the Vietnam Era Draft LotteryEvidence From Social Security Administrative Recordsrdquo American Eco-nomic Review 80 313ndash335

Angrist J and Krueger A (1991) ldquoDoes Compulsory School AttendanceAffect Schooling and Earningsrdquo Quarterly Journal of Ecomonics 106979ndash1014

Arellano M and Honoreacute B (in press) ldquoPanel Datardquo in Handbook of Econo-metrics eds J Heckman and E Leamer Amsterdam Elsevier North-Holland

Goldberger A (1991) A Course in Econometrics Cambridge MACambridge University Press

Heckman J (1999) ldquo Instrumental Variables Response to Angrist andImbensrdquo Journal of Human Resources 34 828ndash837

Heckman J and Smith J (1997) ldquoEvaluating the Welfare Staterdquo in Econo-metrics and Economics in the 20th Century The Ragnar Frisch Centenaryed S Strom New York Cambridge University Press pp 214ndash318

Hirano K Imbens G Rubin D and Zhou A (2000) ldquoEstimating theEffect of Flu Shots in a Randomized Encouragement Designrdquo Biostatistics1 69ndash88

Imbens G (1997) Book Review of The Foundations of Econometric Analy-sis by D Hendry and M Morgan Journal of Applied Econometrtics 1291ndash94

Lancaster T (1979) ldquoEconometric Methods for the Analysis of DurationDatardquo Econometrica

Manski C (1985) ldquoSemiparametric Analysis of Discrete Responserdquo Journalof Econometrics 27 313ndash333

mdashmdashmdash (1995) Identi cation Problems in the Social Sciences CambridgeMA Harvard University Press

Marshak J (1953) ldquoEconomic Measurements for Policy and Predictionrdquoin Studies in Econometric Method eds W Hood and T KoopmansNew York Wiley pp 1ndash26

Stoker T (1986) ldquoConsistent Estimation of Scaled Coef cientsrdquo Economet-rica 54 1461ndash1481

Vytlacil E (1999) ldquo Independence Monotonicity and Latent Index ModelsAn Equivalency Resultrdquo unpublished manuscript University of ChicagoDept of Economics

Wooldridge J (1992) ldquoSome Alternatives to the BoxndashCox RegressionModelrdquo International Economic Review 33 935ndash955

CommentRobert A Mof tt

Department of Economics Johns Hopkins University Baltimore MD 21218 and National Bureau of EconomicResearch Cambridge MA (mofttjhuedu)

The Problem Although the article by Angrist rangesacross a number of issues much of the discussion and thearticle title suggests that the problem of concern is that instru-mental variables ( IV) cannot be used in one of three commonmodels Let the rst model be y D Csbquod CxbdquoChellip where y isan absolutely continuous variable but d is binary and wherex is independent of hellip but d is not Then sbquo can be consistentlyestimated with IV (Heckman and Robb 1985) Let the secondmodel be y uuml D C sbquod uuml C xbdquo C hellip where y uuml and d uuml are contin-

uous and where y D 14y uuml gt 05 and d D d uuml are the observedvariables The parameters of this model can likewise be esti-mated by IV with some auxiliary assumptions (Newey 1986see Blundell and Smith 1993 for a review of alternative meth-ods) But let the third model be y uuml D C sbquod C xbdquo C hellip where

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

Moftt Comment 21

again y D 14y uuml gt 05 but now d is binary The parameters ofthis model cannot be estimated by standard IV Many of theexamples in the Angrist article involve the censored regres-sion model rather than the binary choice model but this andthe other main points in the article apply to both

The Angrist Solution The Angrist solution to the problemposed by the third of these models where IV is not applica-ble is to declare sbquo an uninteresting parameter and not worthyof estimation [or in his words to ldquopuntrdquo (p 8)] This is remi-niscent of the solution to the Vietnam War suggested by some1960s commentators The United States should have simplydeclared victory quickly withdrawn and hoped that no onenoticed that they had in fact lost

If sbquo is uninteresting what is interesting according toAngrist His answer is that interest should center on leastsquares approximations with y as the dependent variable andd and x as regressors If endogeneity of d is a problem linearIV should be applied with linear IV given a LATE interpre-tation He provides an empirical illustration

I will comment on three questions raised by the Angristarticle (1) Is sbquo an interesting parameter (2) Is the Linear IVestimator using observed y and d which Imbens and Angrist(1994) relabeled the LATE an interesting quantity (3) Is theillustrative application given in the Angrist article interesting

Is sbquo an Interesting Parameter As Angrist notes manyof the issues raised by his solution have nothing to do withthe endogeneity of d Whether sbquo is an interesting parame-ter is one of these at least in part Given Angristrsquos prefer-ence for linear projections his position necessarily implies thatall nonlinear models of which the latent index model (L IM)is just one example are uninteresting (Angrist is not com-pletely consistent in his position because he does propose anonlinear model with an exponential conditional mean func-tion in one section of his article that model is subject to thesame objections that he subjects the LIM tomdasharbitrary non-linearities etc) Thus his position boils down to a preferencein the binary dependent-variable case for the linear proba-bility model (LPM) over the LIM and other nonlinear mod-els This issue has been discussed for many years even in asimultaneous-equations context [eg Heckman and MaCurdy(1985) which Angrist does not reference]

The dif culty with Angristrsquos polar position on this issuemdashthat the LPM is the only model of interestmdashis that it is funda-mentally untenable The LPM is attractive because it is easyto interpret providing parameter estimates that do not requiretransformation to learn the effects of a regressor on the meanof the dependent variable It has a role to play in empiricalwork in summarizing the data as regards the conditional meanfunction and for initial explorations of the data and virtu-ally all practitioners use it for this purpose But beyond thisit has no defense Either it is equivalent to the latent-variablemodel if the model is saturated or if the model is not satu-rated imposes functional form restrictions that may not holdand may t the data worse than the latent-variable model

Thus if the model is y D 14 C sbquod C hellip gt 05 where d is asingle dummy variable regressor independent of hellip the probit(say) estimates of and sbquo map one-to-one onto the leastsquares intercept and coef cient on d in the LPM and hencethere is no gain to estimating the model either way because the

estimators are equivalent The same point extends to the caseof a multinomial d If the model is y D 14 C sbquod C xbdquo C hellip gt

05 where x and bdquo are vectors of exogenous covariates andcoef cients respectively the LPM estimate of the effect ofd on E4y5 conditional on x may be demonstrably worsethan the probit estimator of the same effect even for theobject of estimating E4ymdashd1 x5 at points 8d1 x9 observed inthe data (Angrist falls back on the weak principle that leastsquares estimates provide best linear approximations but theissue is that the true model might be nonlinear) Nothingin Angristrsquos article provides evidence that the probitor anyother latent index formulation provides an inferior estimate of6E4ymdashd11 x5 ƒ E4ymdashd01 x57 compared to the LPM for the twoestimators smooth the joint response function with respect tod and x in different ways

Although the latent-variable model has no necessary claimfor superiority in the class of all nonlinear models its pop-ularity rests on its ability to generate a wide variety of non-linearities with a relatively parsimonious speci cation arisingfrom the convolution of the latent index with the cdf of hellipExpansion of the LPM to incorporate equivalent nonlinearitiesis cumbersome and inef cient

The possible inferiority of the LPM in capturing nonlinear-ities in nonsaturated models is also important for interpolationand extrapolation to points not in the observed data Gettingthe nonlinearities right in the observed data is important ininterpolation and extrapolation if the true model is nonlinearas the binary choice model necessarily is (since y is boundedby 0 and 1) Although Angrist at one point does mentionpredictionmdashstating that the linearized models he proposes area good ldquojumping-off point for any prediction exerciserdquo a state-ment without supportmdashhe is for the most part not interestedin prediction so much as summarizing the observed data Thisis ne but unfortunately prediction outside the observed his-torical data is the pervasive concern of the policy makers whoare the ultimate consumers of applied economic research forthey must make predictions of new policies in new environ-ments on a daily basis as part of their jobs The methods pro-posed in Angristrsquos article are therefore not very useful for theformulation of new policy

It should be noted that some make the argument that theoryand economic models must play a role in guiding the formu-lation of empirical relationships used for prediction outsidethe range of the observed data That position is not taken herebecause it is not necessary to establish the potential superior-ity of the L IM over the LPM in nonsaturated models and forprediction Occamrsquos razor makes it unnecessary

It should also be noted that these issues have nothing todo with causal effects because d is assumed exogenous Thusthe opposition that Angrist poses between L IMrsquos and ldquocausaleffectsrdquo models is a false one at least in general If a ldquocausaleffectrdquo is de ned as the true effect of a variable d on onlythe rst moment of a variable y (a rather restricted de nitionof causal effect) that causal effect can be derived from thelatent-variable model just as well The issue is instead merelynonlinearity of the function E4ymdashd1 x5

Angrist also states that the L IM requires ldquoconstant coef- cientrdquo and ldquodistributional assumptionsrdquo for identi cation

22 Journal of Business amp Economic Statistics January 2001

unlike the LPM Angrist does not make suf cient quali ca-tions to these assertions and they are indeed false for theL IM A random coef cient sbquo in the LIM is permissible with-out distributional assumption ( Ichimura and Thompson 1998)and the coef cients in the standard constant-coef cient L IMare identi ed semiparametrically (ie under unknown distri-bution of hellip) under the restriction that d and hellip are independentan assumption generally made for causal interpretation of theLPM model as well (Manski 1988 Horowitz 1993 Powell1994) In addition as already noted in a fully saturated modelthe LPM and the L IM are equivalent in any case

Is the LATE an Interesting Statistic The Angrist assertionthat LATE is the only interesting statistic if d is endogenousor perhaps the only statistic worth bothering about because itis the only one identi ed by the data has the same partialvalidity as his preference for the LPM over the LIM IV isone of the most popular techniques in applied econometricsand has a natural intuition indeed one that does not requirethe LATE interpretation per se It is one of the most usefultools in the applied economistrsquos kit The simple IV estimatordiscussed by Angrist makes minimal assumptions and givesminimal information back to the analyst as a result But tosay that it produces the only statistic of interest does not havedefense both because it is equivalent to an LIM that is fullysaturated and may t the data worse than an LIM if not satu-rated and because it implies that there is no value to makingadditional assumptions to obtain additional information

The limiting nature of the LATE statistic is again in itsinattention to nonlinearities interpolation and extrapolationThat nonlinearities can be important is as true in this caseas it was in the exogenous case just discussed when thereare additional x covariates and when the model is not satu-rated As for interpolation and extrapolation the LATE statis-tic 6 Ny4z D z15ƒ Ny4z D z057=6 Nd4z D z15ƒ Nd4z D z057 denotes theeffect on a change in z from value z0 to z1 on E4y5 scaledby the change in the E4d5 It does not have implications forthe effect on E4y5 of any other change in z or for a changein any other policy variable Thus it is not particularly usefulfor policy changes other than a change of z0 to z1 in the sameenvironment (ie conditioned on the same x) This stands incontrast to an L IM for E4dmdashx1 z5mdashor any parametric modelfor d for that mattermdashthat allows for the change from z0 toz1 to inform policy makers and others of the likely changesof other values of z and of other variables x In fact theLATE statistic which is not a parameter in the usual senseof the word can always be expressed as a nonlinear combi-nation of parameters of a latent-variable model but not viceversa hence the latter model is more general than the former[See Heckman and Vytlacil (1999) for a discussion of how toexpress the LATE in terms of an L IM see also Angrist andImbens (1999) and Heckman (1997 1999) for a discussion ofsome of these issues]

In addition Angristrsquos focus on the advantages of the LATEfor model-free estimation of the effect of a change in z fromz0 to z1 on y reveals a philosophical weakness in his posi-tion This is because the effect of a change in z from z0 toz1 on y can be estimated from the reduced form the struc-tural form is not needed It is not explained why the statistic

6E4ymdashz D z15 ƒ E4ymdashz D z057 is not the only quantity of inter-est The answer that most economists would give is to say thatstructural coef cients are of interest because they can be usedto extrapolate the effects of changes in d on y to other changesin z than from z0 to z1 in the same environment indeedthis is the basic rationale for structural estimation given inthe famous essay by Marschak [1953 see also Christ (1994)and Heckman (2000) for discussions Marschakrsquos argumentwas more subtle than this arguing that restrictions on struc-tural parameters are needed for out-of-sample prediction butthis translates into restrictions on the reduced form as well]However by ruling out the possibility of learning about anyof the effects of changes in z or d other than those inducedby a change in z from z0 to z1 Angrist removes the need todo structural estimation in the rst place Angrist implicitlyassumes that a structural coef cient of interest exists on thevariable d in the y equation and that that coef cient has mean-ing independent of a particular change in z but the rest of hisdiscussion contravenes that interest

A reduced-form research program which is where theAngrist position leads is of considerable value There is noth-ing wrong with a research program to collect a large body ofinformation on the effects of a wide variety of changes in dif-ferent policy variables 4z5 from particular values (z0) to otherparticular values (z1) each taking place in particular environ-ments (x) But if the research program stops there very littleuseful has been learned other than a collection of facts aboutthe effects of particular policies in particular environments

Is the Application in the Angrist Article Interesting TheAngrist article uses a recently popular exclusion restriction ornatural experiment to identify the effect of fertility on laborsupply and to illustrate his preferred methodsmdashnamely theuse of twins The use of twins as an exclusion restrictionstands in contrast to variations in a government policy or lawwhich are often used as exclusion restrictions both in recentwork on natural experiments as well as a much older litera-ture that uses cross-sectional and overtime variation in state-or country-speci c taxes and transfer rules to identify modelparameters However although it can be of interest to estimatereduced-form policy effects 6E4ymdashz D z15 ƒ E4ymdashz D z057 asjust discussed it is not so obvious that there is any interest inestimating the effects of having twins on the expected valueof y Creating twins is not a variable directly subject to policymanipulation

The dif culty in the use of twins arises for the same reasonalready discussedmdashnamely that the preference for model-freeestimation of policy effects leads to a lack of interest in thefunction E4dmdashz1 x5 and hence to a lack of interest in what canbe learned from a study of twins about the effects of someother more relevant policy variable that might be manipulatedA model for the E4dmdashz1 x5 is needed to make that connectionand hence to make a study of twins of any interest yet that isintentionally eschewed in the approach proposed by AngristOnce again this leads to an uninteresting and quite limitingset of exercises

Conclusions The set of methods laid out in the Angristarticle at least if a more relevant instrument were used yieldsa set of model-free statistics that are suitable for exploratorywork on a research question prior to the (possibly nonlinear)

Mullahy Comment 23

structural estimation and prediction that should be the ultimateobject of applied economics research

ACKNOWLEDGMENTS

This comment is a revision of remarks delivered at theJoint Statistical Meetings Baltimore August 1999 I thankCarl Christ James Heckman Joel Horowitz Michael KeaneThomas Mroz Geert Ridder and Edward Vyltacil for com-ments with the usual disclaimer that the views expressed hereshould not be taken as representing those of any of theseindividuals

ADDITIONAL REFERENCES

Angrist J and Imbens G (1999) Comment on ldquo Instrumental VariablesA Study of Implicit Behavioral Assumptions Used in Making Pro-gram Evaluationsrdquo by J J Heckman Journal of Human Resources 34283ndash827

Blundell R and Smith R (1993) ldquoSimultaneous Microeconometric ModelsWith Censored or Qualitative Dependent Variablesrdquo in Handbook of Statis-tics (Vol 1) eds G S Maddala C R Rao and H D Vinod AmsterdamNorth-Holland pp 117ndash141

Christ C (1994) ldquoThe Cowles Commissionrsquos Contributions to Econometricsat Chicago 1939ndash1955rdquo Journal of Economic Literature 32 30ndash59

Heckman J (1997) ldquoInstrumental Variables A Study of Implicit BehavioralAssumptions Used in Making Program Evaluationsrdquo Journal of HumanResources 32 441ndash462

mdashmdashmdash (1999) ldquo Instrumental Variables A Reply to Angrist and ImbensrdquoJournal of Human Resources 34 828ndash837

mdashmdashmdash (2000) ldquoCausal Parameters and Policy Analysis in Economics ATwentieth Century Retrospectiverdquo Quarterly Journal of Economics 11545ndash97

Heckman J and MaCurdy T (1985) ldquoA Simultaneous Equations LinearProbability Modelrdquo International Economic Review 18 21ndash37

Heckman J and Vytlacil E (1999) ldquoLocal Instrumental Variables andLatent Variable Models for Identifying and Bounding Treatment EffectsrdquoProceedings of the National Academy of Sciences 96 4730ndash4734

Horowitz J (1993) ldquoSemiparametric and Nonparametric Estimation of Quan-tal Response Modelsrdquo in Handbook of Statistics (Vol 11) eds G SMaddala C R Rao and H D Vinod Amsterdam North-Holland pp45ndash72

Ichimura H and Thompson T (1998) ldquoMaximum Likelihood Estimation ofa Binary Choice Model With Random Coef cients of Unknown Distribu-tionrdquo Journal of Econometrics 86 269ndash295

Manski C (1988) ldquoIdenti cation of Binary Choice Modelsrdquo Journal of theAmerican Statistical Association 83 729ndash738

Marschak J (1953) ldquoEconomic Measurements for Policy and Predictionrdquoin Studies in Econometric Method eds W Hood and T KoopmansNew York Wiley pp 1ndash26

Powell J (1994) ldquoEstimation of Semiparametric Modelsrdquo in Handbook ofEconometrics (Vol 4) eds R Engle and D McFadden Amsterdam North-Holland pp 2443ndash2521

CommentJohn Mullahy

Departments of Preventive Medicine and Economics University of Wisconsin Madison WI 53705 andNational Bureau of Economic Research Cambridge MA (jmullahyfacstaffwiscedu)

Articles bearing titles containing phrases like ldquoSimpleStrategies for Empirical Practicerdquo are often wolves masquerad-ing as sheep the methodologies they devise being far fromldquosimplerdquo and far removed from what most humble practition-ers perceive in the realm of ldquoempirical practicerdquo Not so hereJoshua Angrist has written a tight comprehensive article thatis stimulating and important yet also eminently useful

As typi es much of Angristrsquos work the main concern in thisarticle is on how to elicit interesting characterizations of causaleffects from microdata with the particular twist here being afocus on outcomes measured as ldquolimited dependent variablesrdquo(LDVrsquos) The main take-away message I glean from this arti-cle is that applied analysts working on causal effect or struc-tural analyses in LDV contextsmdashtraditionally vexing contextsinsofar as consistent estimation and inference are concernedmdashhave considerable grounds for optimism Angrist lays out andinterprets systematically a set of issues and methods that pro-vide practitioners with a variety of implementable strategiesthat might be brought to bear on such empirical problemsA corollary take-away message is that in some respects ldquothisstuff is not really as hard as wersquove tended to make itrdquo withAngrist demonstrating for instance the potential merits ofsimple linear instrumental variable ( IV) methods for estimat-ing causal effects in a variety of LDV contexts In no eventcan applied analysts escape the requirement of nding theo-retically sound instruments but Angrist exposits compellingly

a broad description of how such instruments might be used toelicit interesting causal inferences in LDV contexts

I have no real quibbles with any of the substance ofAngristrsquos arguments Rather I will devote my commentarymainly to amplifying and expanding several of the themes hedevelops throughout the article

1 FOCUS ON CAUSAL OR PARTIAL EFFECTSVERSUS FOCUS ON CONDITIONAL

EXPECTATIONS FUNCTIONS

It seems fair to suggest that much of applied microecono-metrics is concerned primarily with understanding the signsand magnitudes of quantities like iexclE6Y mdashX1 D7=iexcl4X1 D5 oriexclE6Y mdashX7=iexclX0 Yet much of the actual dirty work in under-taking causal analysis in LDV contexts seems to result fromdecisions to undertake analyses in settings where E6Y mdashX1D7

or E6Y mdashX7 are restricted to be positive without a priori restric-tions on parameter values [thus the tradition of using tobit-class conditional expectations functions (CEFrsquos) two-stepselection models exponential CEFrsquos etc]

As practitioners however we should pause to assesswhether speci cations akin to E6Y mdashX1D7 D exp4Xsbquo C D5 gt

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

24 Journal of Business amp Economic Statistics January 2001

0 are ultimately buying anything important with respect tothe rst-order questions being explored It is only com-mon sense that analysts should spend relatively more energyworking on understanding quantities of primary interest(eg iexclE6Y mdashX1D7=iexcl4X1 D5) and worrying relatively lessabout the formulation and implications of assumptions (egE6Y mdashX1 D7 gt 0) that may have little or no ultimate bearingon the particular questions being addressed This is espe-cially so when the latter effort (restricting via functional formchoice E6Y mdashX1 D7 gt 0) will tend to complicate rather thanfacilitate the former effort (understanding the partial or causaleffects) This is not to suggest that quantities like E6Y mdashX7 andE6Y mdashX1 D7 may not themselves be interesting to estimate (egmodeling conditional mean health care expenditures for use inforecasting future expenditure levels) but whether E6Y mdashX1D7

is parametrically and parsimoniously better approximated byexp4XsbquoC D5 or by XsbquoC D or by some other g4Xsbquo CD5

or h4X1D1 ƒ5 ismdasheven in LDV settingsmdashultimately an empir-ical matter to be assessed by goodness of t (eg via condi-tional moment tests)

I note this important angle somewhat sheepishly sincein both articles of mine that Angrist cites (Mullahy 19971998) there is considerable emphasis on the use of exponen-tial CEF speci cations as enforcers of positive conditionalmean ldquorequirementsrdquo In fact the main motivation for theearlier article was to nd a general approach to obtainingconsistent estimators of structural parameters in parametricsettings less distribution-bound than tobit-class models whenthe key requirement or side constraint is E6Y mdashX1 D7 gt 0 withthe resulting strategy a more or less brute-force method foraccomplishing this in a nonlinear IV setting Angristrsquos arti-cle takes this idea quite a bit further and demonstrates how(and with the addition of a further normalization) the struc-tural parameter estimation approach I discussed can be devel-oped into a model to analyze causal effects when the latter arecharacterized as ldquoproportional treatment effectsrdquo

Whether a proportional as opposed to an additive treatmenteffect is interesting (it may or may not be) remains to be seenin any particular application Wooldridgersquos recent work onaverage causal effects (ACErsquos Wooldridge 1999) would seempertinent in contexts in which the role of the X covariates ismore ldquointrusiverdquo than in the simple additivelinear treatment-effect setup As a general matter the extent to which infer-ences about causal effectsmdashhowever formulatedmdashhinge onthe inclusion of particular covariates as conditioning regres-sors is an interesting and potentially important angle thatAngristrsquos article begins to explore and is one that likely tomerit additional research For the particular issues at handthe very nature of proportional treatment effects brings intofocus explicitly the role of the X rsquos so how their correlationwith the treatment indicators 4D5 in uences inferences aboutthe causal impacts of the latter would seem to be a rst-orderconsideration

2 TWO-PART MODELS

I found Angristrsquos analysis of the interpretation of causaleffects in two-part models to be extremely illuminatingIn a reduced-form setting I argued (Mullahy 1998) that amid

all the concerns about zeros robustness outliers transforma-tion retransformation and such that have typically attendedmodeling efforts involving two-part models it would be sur-prising in a well-structured empirical investigation if therewere not stated or lurking concerns about partial effectsiexclE6Y mdashX7=iexclX andor CEFrsquos E6Y mdashX7 themselves It may be insome cases that such concerns are not articulated and it mayeven be in less well-structured problems that they are not obvi-ous to analysts themselves Yet for instance unless robustlyestimated sbquorsquos from a log-linear part-2 speci cation of a two-part model could inform a rst-order question about a par-tial effect or a CEF of what practical use were they likelyto be My rather modest and rather obvious argument wasjust that the concerns about the iexclE6Y mdashX7=iexclX or the E6Y mdashX7mdashif they are the reason the analysis is being conducted in the rst placemdashshould enjoy rst-order prominence in the estima-tion exercise and that concerns about zeros outliers and suchshould be relegated in some sense to second-order status Insome cases two-part models will serve nicely to address such rst-order concernsmdashat least in reduced-form settingsmdashbut inothers they will not

Angrist appears to have some sympathy with such argu-ments but more importantly he provides a valuable serviceto users of two-part models by unearthing one major limi-tation in the analysis of causal effects The problem is notwith part 1 (ie the logit probit or linear probability com-ponent) because part 1 of the two-part model falls within themain lines of Angristrsquos LDV analysis Rather the complicationarises with part 2 of the two-part model in which the additionalldquoY gt 0rdquo conditioning arises In the standard (reduced-form)two-part model quantities like E6Y mdashX1Y gt 07 are prominentand identi ed readily But in the counterfactual settings inwhich causal effects are manifested wherein the realized Y

arises from self-selection into or out of treatment such selec-tion effects introduce an ambiguity [Eq (12)] and therebyconfound the ability to glean causal effects from part-2 esti-mates (unless as Angrist notes censored regression methodsare used but Angrist also offers some compelling argumentsagainst blind reliance on censored regression approaches) Assuch if estimation of and inference about causal effects arethe primary analytical concerns in data settings with Y para 0and Pr4Y D 05 gt 01 analysts may be well advised to avoidtwo-part modeling strategies and pursue some of the moredirect linear and nonlinear estimation approaches discussedby Angrist in which one-part estimation approachesmdashzerosand allmdashyield estimates of causal effects that will be directlyinterpretable

3 BEYOND CEFrsquoS

One noteworthy feature of the methods Angrist discussesmdashwith the main results attributed to Abadie (1999)mdashare theirapplicability to estimation of causal effects for characteristicsof the conditional distribution rdquo4Y mdashX1D5 beyond just CEFrsquosEstimation of causal or treatment effects for conditional quan-tiles (QTErsquos) and conditional distribution function ordinates(ldquodistilesrdquo) Pr4Y lt cmdashX1D5 is demonstrated to t properlywithin the weighting strategies advanced by Abadie Since

Todd Comment 25

quantile and distile analyses may be more relevant in address-ing particular analytical questions or policy issues than CEFrsquosthe ability to extend the mechanisms of causal-effect esti-mation strategies in such directions should be welcomed byapplied researchers

I might suggest that Angristrsquos claim about an advantage ofquantile causal analysis over distile causal analysis being thatthe latter relies on application-specic values of ordinates (c)is perhaps oversold or underdeveloped In many interestingand policy-relevant applications focus on the application-speci c ordinates is fundamental What factors cause employ-ers to offer more than one health insurance plan to employeesWhat factors cause consumers of alcoholic beverages to drinkmore than two drinks per day What factors cause elderlyindividuals to utilize more than one prescription drug prod-uct What factors cause second earners to work more than17 hours per week Each of these concerns an importantreal-world causal question for which the application-specicordinatemdashwhether for institutional legal physiological orother reasonsmdashis a fundamental ldquopivot pointrdquo for the analysisAnalysis of such distile relationshipsmdashwhether in causal or inreduced-form settingsmdashis a potentially powerful method forinforming decision makers and it seems to me that its mer-its relative to quantile analyses would have to be judged on acase-by-case basis

4 CONCLUSIONS

Practitioners of applied microeconometrics will nd thatthe time invested in a careful reading of Angristrsquos article hassizable returns not merely because of the analytical and con-ceptual insights it offers but ultimately also because it sug-gests a variety of important practical innovations that appliedresearchers can exploit empirically to more clearly understandthe nature of causal relationships in their data Applied microresearchers in areas like health labor development public nance and such work commonly with data wherein (a) out-come measures are limitedmdashoften most importantly becausethey are nonnegative with mass points at 0mdashand (b) the rolesof conditioning covariates (X rsquos) are most conveniently summa-rized via regression methods Angristrsquos article provides appliedresearchers so endowed with a set of powerful tools for elic-iting causal inferences from such data Of course the punchline is familiar and inevitable All bets are off without theo-retically solid instruments But with such instruments in handthe strategies exposited by Angrist offer analysts a rich set ofperspectives on estimation

ADDITIONAL REFERENCE

Wooldridge J M (1999) ldquoEstimating Average Partial Effects Under Con-ditional Moment Independence Assumptionsrdquo mimeo Michigan StateUniversity Dept of Economics

CommentPetra Todd

Department of Economics University of Pennsylvania Philadelphia PA 19104 (petraathonasasupennedu)

This article considers ways of estimating the effect ofbinary endogenous regressors in models with limited depen-dent variables It questions the usefulness of conventionalestimation strategies aimed at recovering structural modelparameters and advocates the use of simple instrumental vari-ables ( IV) estimators as an alternative on the grounds thatthese estimators invoke weaker assumptions and often suf ceto answer questions of interest in empirical studies In theauthorrsquos view the main challenge facing empirical researchersis the problem of identi cation of ldquotreatment effectsrdquo throughIV Here I expand the discussion by considering two ques-tions (1) What are the limitations of the causal model asa paradigm for policy analysis (2) When simple estimatorsare suitable for answering a question of interest what are thetrade-offs that need to be considered in using them

1 WHEN IS THE ldquoCAUSAL MODELldquoA USEFULPARADIGM FOR POLICY ANALYSIS

The article gives the impression that most interesting ques-tions in economics can be answered within the context ofthe ldquocausal modelrdquo [variously attributed to Neyman (1923)Fisher (1935) Cox (1958) Roy (1951) and Rubin (1978)]The causal model is a very general framework that assumesthat there are potential outcome states associated with having

received some treatment and having received no treatment (orpossibly a different treatment) For each individual only oneof the potential outcome states is observed which leads to amissing-data problem in attempting to draw inferences aboutaspects of the treatment effect distribution

The language of potential outcomes is very general soalmost any economic problem could be formulated in theseterms However this does not mean that estimators proposedin the literature for the causal model are useful for all or evenmost economic problems A major limitation of the modelis that it assumes that the state ldquowith treatmentrdquo has beenobserved for at least a subset of people In medical trials orbiological experiments this assumption is probably reason-able but in economics we often are interested in evaluatingeffects of treatments that have not yet been implemented Forexample we might be interested in predicting the effect ofraising the age receiving Social Security bene ts to a new ageor the effect of introducing new term limits on welfare par-ticipation An implicit assumption needed to apply any of theidenti cation results described in this article is that both Y1

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

26 Journal of Business amp Economic Statistics January 2001

and Y0 are observed which would not be satis ed in manycases Thus the causal model and the estimators developedfor it are inadequate when it comes to any economic questioninvolving a state of the world that has not been observed

Despite these limitations treatment effect estimates aresometimes used to predict consequences of new treatments orof existing treatments for new populations When the policychange being considered is very close to changes observedin the data and is to be applied to similar individuals thenextrapolation may be reasonable But if the policy change is(a) of a magnitude outside the range of experience (b) alonga new dimension or (c) intended for a new population then itis an open question as to how reliable the estimates would beTreatment impact estimates are not intended to capture any-thing invariant to changing conditions The assumptions thatwould be required to justify extrapolations or generalizationsare often quite strong To some extent it is an illusion thatthese estimators allow for general kinds of inference underweak assumptions because the assumptions required to justifytheir application for particular purposes are often not madeexplicit

Angrist argues that the identi cation results for the localaverage treatment effect (LATE) estimator provide a ldquofounda-tion for credible causal inferencerdquo (p 5) and a ldquominimum con-troversy jumping-off point for any prediction exerciserdquo (p 5)But as discussed previously the causal model maintains sucha high level of generality that it offers little guidance for manykinds of problems It is important to recognize when and whennot the LATE estimator is likely to be useful This articleclaims that ldquoin practice estimates of LATE differ little fromestimates based on the stronger assumptions invoked to iden-tify effects on the entire treated populationrdquo (p 5) But suchan extrapolation is not justi ed under the theory and is surelyunlikely to hold in all circumstances As shown in a num-ber of works by various authors LATE provides the averageeffect of treatment for the group of ldquocompliersrdquo who are peo-ple induced by the instrument to receive treatment This groupdoes not except under very special circumstances correspondto the group of treated persons and may in fact representonly a small fraction of persons receiving treatment So notonly does LATE not suf ce for answering problems of the sortdescribed by (a) (b) and (c) it also does not generally pro-vide the average effect of treatment on the treatedmdashthe mostcommon parameter of interest in analyzing social programsAnother not-so-attractive feature of the estimator is that thepopulation of ldquocompliersrdquo is usually not identi able so wecannot say exactly whose treatment impact is being estimatedApplying the LATE parameter estimate to the full populationof treated persons would generally require ruling out certaintypes of heterogeneity in individual responses to treatmentas discussed by Heckman (1997) The explicit assumptionsthat could justify this type of extrapolation were shown byHeckman and Vytlacil (2000 in press)

2 ESTIMATORS SHOULD BE TAILOREDTO THEIR APPLICATION

A general theme throughout the article is that wheneverpossible researchers should restrain from imposing paramet-ric restrictions in estimation because the structure they impose

may be wrong resulting in biased estimates This view isexpressed for example in the discussion of the two-partmodel The article advocates the use of a two-part approachinstead of a conventional selection-model approach maintain-ing that the two-part approach is less likely to lead to biasedparameter estimates because it does not impose cross-equationrestrictions as a conventional tobit estimator would

How do researchers decide whether or not to impose restric-tions across estimating equations that are structurally linkedThe most agnostic approach would be to estimate both equa-tions completely nonparametrically but this approach is rarelypractical Nonparametric estimators are consistent under themost general conditions but in conventional size samples thestandard errors would be so large that the estimates would forpractical purposes be useless Researchers impose structurebecause they are willing to restrict the class of models underwhich their estimators are consistent in exchange for greaterprecision These ef ciency considerations would also apply inthe deciding whether or not to impose cross-equation restric-tions in estimating simultaneous-equations models If we areright about the structure there is an ef ciency gain fromimposing the restrictions If we are wrong imposing them maylead to bias When there are overidentifying restrictions it isat least possible to develop tests of the model speci cation thatcan serve as a guide in choosing a structural model that is sup-ported by the data This article presents a dichotomy betweenthe two-part model and a fully parametric traditional selec-tion speci cation but there are many other modeling choicesranging from fully nonparametric to semiparametric to fullyparametric In modest size samples ef ciency is an impor-tant consideration and a parametric model may be the mostappropriate one For very large samples fewer parametricapproaches are feasible Both the probability-of-working andthe hours-worked equations could for example be estimatedby a semiparametric method such as semiparametric leastsquares ( Ichimura 1993) Which estimation method is mostappropriate ultimately depends on how much data is avail-able and how many parameters are to be estimated It doesnot make sense to criticize the use of structured approaches infavor of more exible modeling approaches without regard tothe context in which they are being applied

3 ON THE GAP BETWEEN THEORY ANDEMPIRICAL PRACTICE

Finally another recurring emphasis in the article is on thevalue of shortcuts in empirical practice In the authorrsquos viewthere is a gap between the econometric theory and what isfeasible in practice and shortcuts and approximations help tobridge this gap To this end the article extols for examplethe virtues of ordinary least squares (OLS) as a way of approx-imating any unknown conditional expectation function with-out much concern as to whether a best linear approximation isa good approximation If the model has discrete regressors andis fully saturated it is well known that OLS is equivalent to anonparametric estimator But in other cases a linear approxi-mation could be highly biased and the bias will not go awayas the sample size gets large There are a variety of alterna-tive asymptotically unbiased estimators Along similar lines

Angrist Reply 27

the article suggests that in implementing a ldquoselection-on-observablesrdquo estimator it may be all right simply to ignore theweighting scheme Such claims seem to me fairly outrageousbecause failure to take into account weighting could lead tovery different estimates and would not provide a consistentestimator of the parameter of interest Heckman Ichimura andTodd (1998) discussed alternative estimators that are not dif -cult to implement and take into account the weighting scheme

Finally the appendix presents a variety of shortcuts forobtaining estimates and standard errorsmdashsome of which leadto inconsistent estimates When shortcuts lead to very crudeapproximations or inconsistent estimates I question their use-fulness I do think it unreasonable to expect that estimatesbased on simple IV estimators be accompanied by correctstandard errors that correctly take into account the in uence ofestimated parameters of estimated regressors I see little valuein reporting incorrect standard-errors estimates on the groundsthat it saves time in computation especially when there areseveral works long available in the literature that develop sim-ple ways of solving these kinds of problems (eg see Newey1984) Although there is surely a gap between the estimatorsdeveloped in the theoretic econometric literature and the esti-mators used in the empirical literature the gap is not as greatas this article supposes

ADDITIONAL REFERENCES

Cox D R (1958) The Planning of Experiments New York WileyFisher R A (1935) Design of Experiments London Oliver and BoydHeckman J J (1997) ldquo Instrumental Variables Evidence From Evaluating a

Job Training Programmerdquo Journal of Human Resources 32 441ndash462Heckman J Ichimura H and Todd P (1997) ldquoMatching as an Econometric

Evaluation Estimator Theory and Evidence on its Performance Appliedto the JTPA Program Part I Theory and Methodsrdquo Review of EconomicStudies 64 605ndash654

mdashmdashmdash (1998) ldquoMatching as an Econometric Evaluationrdquo Review of Eco-nomic Studies 65 261ndash294

Heckman J and Vytlacil E (2000) ldquoCausal Parameters Structural Equa-tions Treatment Effects and Randomized Evaluations of Social Programsrdquounpublished manuscript University of Chicago Dept of Economics

mdashmdashmdash (in press) ldquoLocal Instrumental Variablesrdquo in Nonlinear StatisticalInference Essays in Honor of Takeshi Amemiya eds C Hsiao KMorimune and J Powell Cambridge UK Cambridge University Press

Ichimura H (1993) ldquoSemiparametric Least Squares (SLS) and Weighted SLSEstimation of Single Index Modelsrdquo Journal of Econometrics 58 71ndash120

Neyman J (1923) ldquoStatistical Problems in Agricultural Experimentsrdquo Sup-plement to the Journal of the Royal Statistical Society 2 107ndash180

Newey W (1984) ldquoA Method of Moments Interpretation of Sequential Esti-matorsrdquo Economic Letters 14 201ndash206

Roy A (1951) ldquoSome Thoughts on the Distribution of Earningsrdquo OxfordEconomic Papers 3 135ndash146

Rubin D (1978) ldquoBaysian Inference for Causal Effects The Role of Ran-domizationrdquo The Annals of Statistics 6 34ndash58

ReplyJoshua D Angrist

I thank the discussants and the session organizer outgoingJBES editor Jeff Wooldridge who will be missed by associateeditors and JBES authors alike The comments in this sessionoffer a variety of viewpoints some sympathetic others morecritical I am pleased to be part of this stimulating and con-structive exchange and look forward to more interaction ofthis type in the future

Hahn My article starts by suggesting that a focus ontreatment effects simpli es the problem of causal inferencein limited dependent variable (LDV) models with endoge-nous regressors Endogeneityrsquos half-sister is the assumptionof unobserved individual effects since the omitted-variablesmotivation for xed effects in panel data is similar Hahnasks whether a shift in focus from index parameters to aver-age causal effects might also simplify the notoriously dif -cult problem of working with nonlinear models for binarypanel data (eg as in Card and Sullivan 1988) Hahn showsthat here too the causal-effects framework pays off demon-strating that average causal effects are easily estimated in a xed-effects probit model with a binary regressor even thoughthe probit coef cient is not A second noteworthy featureof Hahnrsquos comment is his observation that identi cation inthis framework turns on careful modeling of the relationshipbetween the assignment variable and the xed effects Tradi-tional econometric models are primarily concerned with thestochastic process generating outcomes A shift of attentiontoward modeling the assignment mechanism is consistent with

the notion that observational studies should try to mimic thesort of controlled comparisons generated by experiments

Imbens Imbens begins by endorsing the view that the sub-stantive goals of empirical research are better served by thepotential outcomescausal framework than by structural mod-eling Going one step further Imbens notes that the problemswith ldquomodel- rst econometricsrdquo are not limited to LDVrsquos Inthe same spirit he sensibly wonders (as did Wooldridge 1992)why so much attention has been devoted to estimating theBoxndashCox transformation model which does not actually iden-tify E6Y mdashX7 without a distributional assumption Imbens alsonicely articulates the rationale for looking at the twins exam-ple and notes that this is similar to the rationale for extrap-olating from clinical trials in narrow subpopulations And infact I believe the ldquotwins experimentrdquo is likely to have pre-dictive value in a variety of situations though any extrapo-lation will probably be improved by control for demographiccharacteristics

In spite of (or perhaps because of) our years of discussionand collaboration Imbens and I still do not agree on certainthings He feels strongly that inference with discrete outcomesshould use models and procedures that respect inherent non-

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

28 Journal of Business amp Economic Statistics January 2001

linearities The HIRZ procedure discussed in his comment isan intriguing way to do this and my article discusses othersI remain unconvinced however that nonlinearity is of majorsubstantive importance in empirical work with binary regres-sors The proof of the pudding I suppose would be to showthat H IRZ or some of the methods I discuss in the article leadto better predictions or decisions than the linear workhorse

Mof tt Mof t provides a valuable counterpointHe beginsby arguing that latent-index models have more predictivepower than reduced-form causal models Such claims seemhard to establish a priori though I do not doubt that in somecases this may be true But empirical strategies in which thebulk of the research effort goes into estimating index coef -cients may fail to answer the most basic question in empiricalwork What happened here Causal models and local aver-age treatment effect (LATE) answer this question while indexcoef cients alone are not suf cient for causal inference Iagree however that structural modeling can be useful forgoing beyond basic questions and I have used index and otherstructural models for this purpose Mof t is right howeverin suggesting that I nd the model-free analysis he refers toas ldquoexploratoryrdquo more interesting than structural extrapolationThis seems to me to be where the potential for real discoverylies

Mof tt asks tough but fair questions about the twins appli-cation He is correct to point out that twinning is not a manip-ulable policy instrument That is one reason I do not limit theanalysis to reduced-form effects of multiple births Yet to mymind the best evidence we have on the controversial questionof the consequences of out-of-wedlock childbearing comesfrom the twins experiment (Bronars and Grogger 1994) Andthe economic consequences of childbearing are of interest inmany other contexts As noted in the introduction to my arti-cle the twins experiment can help determine whether fertilityreductions are responsible for trend increases in female laborsupply Beyond this sort of question the twins results seemuseful for assessing the economic consequences of increasedabortion access in the United States the dissemination of con-traceptive implants in developing countries (eg see KaneFarr and Janowitz 1990) and various programs to encour-age or require implant use by welfare mothers (Haveman andWolfe 1993)

Finally it is fair to question the external validity of twinsestimates as Mof t does toward the end of his commentAngrist and Evans (1998) showed that the labor-supply con-sequences of twinning can be reconciled with estimates ofthe labor-supply consequences of childbearing resulting froma very different natural experiment induced by preferences formixed-sex sibling pairs This suggests that results from bothexperiments may be quite robust and may even be ldquostructuralrdquoin the sense that they tell us something general about the eco-nomic response to childbearing

Mullahy Mullahy brings a welcome health economistrsquosperspective to the discussion Health economists have longgrappled with the issues discussed in my article and asI hope I made clear I drew ideas and inspiration fromMullahyrsquos work on models with nonnegative dependent vari-ables Mullahyrsquos comment further highlights exactly the sort

of utilitarian trade-off that underlies the choice of empiricalstrategy On one hand we would like our models to be con-sistent with any functional form restrictions known to be trueOn the other attempts at this may complicate estimation andinference for little payoff Worst of all the technical complex-ity of nonlinear models seems to cause authors to ldquowax econo-metricrdquo an effort that may come at the expense of attention tosubstantive issues of importance (an example here is the longseries of papers discussing exactly how data from the RANDhealth-insurance experiment should be analyzed) Mullahyrsquosheretical discussions of ldquoone-part modelsrdquo here and in earlierwork represent a refreshing effort to keep the empirical trainon track Mullahyrsquos point that application-specic distributionordinates can provide more useful summary information thanquantiles is also clearly worth thinking about

Todd Todd provides a useful and clear overview of thecausal framework which she suggests is better suited formedical questions than economic questions The thrust of herargument is that there is too much heterogeneity in economicinterventions and outcomes for the causal approach to bevery useful for economists But my perception is that medicalresearch is even more explicitly concerned with heterogene-ity than economics This is evidenced by the many quali ca-tions regarding patient characteristics conditions and inter-actions in medical treatment protocols A difference betweenthe prevailing medical research paradigm and the traditionaleconometric perspective however is whether the question ofheterogeneous response is to be addressed by better theoryor more evidence Of course theory helps us decide whereto look for heterogeneity but as Rosenbaum (1999 p 301)put it ldquo If a treatment has substantially different effects in dif-ferent types of people then we need to discover this (ital-ics mine) Rosenbaum also noted that ldquorandomized clinicaltrials routinely look for heterogeneous treatment effects andthey do this without representative samples from nationalpopulationsrdquo

On the technical side Todd suggests I advocated use ofthe two-part model because it exibly approximates nonlinearfunctional forms for nonnegative outcomes Although I notedthe exibility the upshot of my discussion was that the two-part model is poorly suited to causal inference because of thedif culty of interpreting part 2 Finally Todd points out thefailure to allow for sampling variance in the scaling factorsused to generate effects for nonlinear models This is indeed ahard-to-defend shortcut though I shall take a stab at defenseby noting that randomness of the scaling factor which typi-cally estimates a sample average is likely to be a minor sourceof sampling uncertainty

ADDITIONAL REFERENCES

Card D E and Sullivan D (1988) ldquoEffects of Subsidized Training onMovements In and Out of Employmentrdquo Econometrica 56 497ndash530

Haveman R and Wolfe B (1993) ldquoChildrenrsquos Prospects and ChildrenrsquosPolicyrdquo Journal of Economic Perspectives 7 153ndash174

Kane T Farr G and Janowitz B (1990) ldquo Initial Acceptability of Con-traceptive Implants in Four Developing Countriesrdquo International FamilyPlanning Perspectives 16 49ndash54

Rosenbaum P R (1999) ldquoChoice as an Alternative to Control in Observa-tional Studiesrdquo Statistical Science 14 259ndash304

Wooldridge J (1992) ldquoSome Alternatives to the BoxndashCox RegressionModelrdquo International Economic Review 33 935ndash955

Page 19: New Estimation of Limited Dependent Variable Models With Dummy …piketty.pse.ens.fr/…ecoineg/articl/Angrist2001b.pdf · 2011. 12. 5. · sequences of changes in contraceptive technology

20 Journal of Business amp Economic Statistics January 2001

and

Pr4TypeiD nmdashXi

D x5

D 1 ƒ Pr4TypeiD cmdashXi

D x5 ƒ Pr4TypeiD amdashXi

D x50

Now compare this setup to the selection models Angristdescribes in Section 3 In the selection models the equationdescribing the endogenous regressor is Di

D 18ƒ0C ƒ1Zi

Cƒ 0

2Xi gt Daggeri9 Suppose that the instrument is binary and that ƒ1 ispositive Then the two models are very similar with units withƒ0 Cƒ1 Cƒ 0

2Xi gt Daggeri in the selection model classi ed as always-takers in the potential outcome framework (because irrespec-tive of the value of the instrument Di

D 1 for such units) unitswith ƒ0

C ƒ 02Xi lt Daggeri classi ed as never-takers (because irre-

spective of the value of the instrument DiD 0 for such units)

and the units with ƒ0C ƒ 0

2Xi lt Daggeri lt ƒ0C ƒ1

C ƒ 02Xi classi ed

as compliersOne advantage of the trinomial model is that it easily gen-

eralizes to provide an arbitrarily good t to any conditionaltrinomial distribution by including higher-order terms andinteractions in the covariates If there are no substantive rea-sons to impose additional restrictions one should not imposethem implicitly in the speci cation of the statistical modelIn particular in the selection model it is not suf cient to addhigher-order terms to the covariate vector to provide an arbi-trarily good t to the trinomial distribution Such an approxi-mation would have to involve heteroscedasticity and other dis-tributional extensions that are not straightforward to implementin the selection model

Conditional on the individualrsquos type H IRZ speci ed theoutcome distributions given covariates as logistic regressionmodels Again the aim is to provide a exible approxima-tion to the conditional distribution in a manner that does notimpose any implicit restrictions Given that for a binomialdistribution the logistic regression model can be thought ofas providing a linear approximation to the log odds ratiothis choice is again an appealing one An alternative is theprobit model which also provides a good approximation Less

attractive here is the linear probability model since it requiresinequality restrictions on the parameters if the implicit esti-mates of the probabilities are to be bounded between 0 and 1

In cases with other limited dependent variables alternativenonlinear models may be appropriate For example if the out-comes are durations subject to censoring models speci ed interms of hazard functions (eg Lancaster 1979) may be con-venient for dealing with such data

ADDITIONAL REFERENCES

Angrist J (1989) ldquoLifetime Earnings and the Vietnam Era Draft LotteryEvidence From Social Security Administrative Recordsrdquo American Eco-nomic Review 80 313ndash335

Angrist J and Krueger A (1991) ldquoDoes Compulsory School AttendanceAffect Schooling and Earningsrdquo Quarterly Journal of Ecomonics 106979ndash1014

Arellano M and Honoreacute B (in press) ldquoPanel Datardquo in Handbook of Econo-metrics eds J Heckman and E Leamer Amsterdam Elsevier North-Holland

Goldberger A (1991) A Course in Econometrics Cambridge MACambridge University Press

Heckman J (1999) ldquo Instrumental Variables Response to Angrist andImbensrdquo Journal of Human Resources 34 828ndash837

Heckman J and Smith J (1997) ldquoEvaluating the Welfare Staterdquo in Econo-metrics and Economics in the 20th Century The Ragnar Frisch Centenaryed S Strom New York Cambridge University Press pp 214ndash318

Hirano K Imbens G Rubin D and Zhou A (2000) ldquoEstimating theEffect of Flu Shots in a Randomized Encouragement Designrdquo Biostatistics1 69ndash88

Imbens G (1997) Book Review of The Foundations of Econometric Analy-sis by D Hendry and M Morgan Journal of Applied Econometrtics 1291ndash94

Lancaster T (1979) ldquoEconometric Methods for the Analysis of DurationDatardquo Econometrica

Manski C (1985) ldquoSemiparametric Analysis of Discrete Responserdquo Journalof Econometrics 27 313ndash333

mdashmdashmdash (1995) Identi cation Problems in the Social Sciences CambridgeMA Harvard University Press

Marshak J (1953) ldquoEconomic Measurements for Policy and Predictionrdquoin Studies in Econometric Method eds W Hood and T KoopmansNew York Wiley pp 1ndash26

Stoker T (1986) ldquoConsistent Estimation of Scaled Coef cientsrdquo Economet-rica 54 1461ndash1481

Vytlacil E (1999) ldquo Independence Monotonicity and Latent Index ModelsAn Equivalency Resultrdquo unpublished manuscript University of ChicagoDept of Economics

Wooldridge J (1992) ldquoSome Alternatives to the BoxndashCox RegressionModelrdquo International Economic Review 33 935ndash955

CommentRobert A Mof tt

Department of Economics Johns Hopkins University Baltimore MD 21218 and National Bureau of EconomicResearch Cambridge MA (mofttjhuedu)

The Problem Although the article by Angrist rangesacross a number of issues much of the discussion and thearticle title suggests that the problem of concern is that instru-mental variables ( IV) cannot be used in one of three commonmodels Let the rst model be y D Csbquod CxbdquoChellip where y isan absolutely continuous variable but d is binary and wherex is independent of hellip but d is not Then sbquo can be consistentlyestimated with IV (Heckman and Robb 1985) Let the secondmodel be y uuml D C sbquod uuml C xbdquo C hellip where y uuml and d uuml are contin-

uous and where y D 14y uuml gt 05 and d D d uuml are the observedvariables The parameters of this model can likewise be esti-mated by IV with some auxiliary assumptions (Newey 1986see Blundell and Smith 1993 for a review of alternative meth-ods) But let the third model be y uuml D C sbquod C xbdquo C hellip where

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

Moftt Comment 21

again y D 14y uuml gt 05 but now d is binary The parameters ofthis model cannot be estimated by standard IV Many of theexamples in the Angrist article involve the censored regres-sion model rather than the binary choice model but this andthe other main points in the article apply to both

The Angrist Solution The Angrist solution to the problemposed by the third of these models where IV is not applica-ble is to declare sbquo an uninteresting parameter and not worthyof estimation [or in his words to ldquopuntrdquo (p 8)] This is remi-niscent of the solution to the Vietnam War suggested by some1960s commentators The United States should have simplydeclared victory quickly withdrawn and hoped that no onenoticed that they had in fact lost

If sbquo is uninteresting what is interesting according toAngrist His answer is that interest should center on leastsquares approximations with y as the dependent variable andd and x as regressors If endogeneity of d is a problem linearIV should be applied with linear IV given a LATE interpre-tation He provides an empirical illustration

I will comment on three questions raised by the Angristarticle (1) Is sbquo an interesting parameter (2) Is the Linear IVestimator using observed y and d which Imbens and Angrist(1994) relabeled the LATE an interesting quantity (3) Is theillustrative application given in the Angrist article interesting

Is sbquo an Interesting Parameter As Angrist notes manyof the issues raised by his solution have nothing to do withthe endogeneity of d Whether sbquo is an interesting parame-ter is one of these at least in part Given Angristrsquos prefer-ence for linear projections his position necessarily implies thatall nonlinear models of which the latent index model (L IM)is just one example are uninteresting (Angrist is not com-pletely consistent in his position because he does propose anonlinear model with an exponential conditional mean func-tion in one section of his article that model is subject to thesame objections that he subjects the LIM tomdasharbitrary non-linearities etc) Thus his position boils down to a preferencein the binary dependent-variable case for the linear proba-bility model (LPM) over the LIM and other nonlinear mod-els This issue has been discussed for many years even in asimultaneous-equations context [eg Heckman and MaCurdy(1985) which Angrist does not reference]

The dif culty with Angristrsquos polar position on this issuemdashthat the LPM is the only model of interestmdashis that it is funda-mentally untenable The LPM is attractive because it is easyto interpret providing parameter estimates that do not requiretransformation to learn the effects of a regressor on the meanof the dependent variable It has a role to play in empiricalwork in summarizing the data as regards the conditional meanfunction and for initial explorations of the data and virtu-ally all practitioners use it for this purpose But beyond thisit has no defense Either it is equivalent to the latent-variablemodel if the model is saturated or if the model is not satu-rated imposes functional form restrictions that may not holdand may t the data worse than the latent-variable model

Thus if the model is y D 14 C sbquod C hellip gt 05 where d is asingle dummy variable regressor independent of hellip the probit(say) estimates of and sbquo map one-to-one onto the leastsquares intercept and coef cient on d in the LPM and hencethere is no gain to estimating the model either way because the

estimators are equivalent The same point extends to the caseof a multinomial d If the model is y D 14 C sbquod C xbdquo C hellip gt

05 where x and bdquo are vectors of exogenous covariates andcoef cients respectively the LPM estimate of the effect ofd on E4y5 conditional on x may be demonstrably worsethan the probit estimator of the same effect even for theobject of estimating E4ymdashd1 x5 at points 8d1 x9 observed inthe data (Angrist falls back on the weak principle that leastsquares estimates provide best linear approximations but theissue is that the true model might be nonlinear) Nothingin Angristrsquos article provides evidence that the probitor anyother latent index formulation provides an inferior estimate of6E4ymdashd11 x5 ƒ E4ymdashd01 x57 compared to the LPM for the twoestimators smooth the joint response function with respect tod and x in different ways

Although the latent-variable model has no necessary claimfor superiority in the class of all nonlinear models its pop-ularity rests on its ability to generate a wide variety of non-linearities with a relatively parsimonious speci cation arisingfrom the convolution of the latent index with the cdf of hellipExpansion of the LPM to incorporate equivalent nonlinearitiesis cumbersome and inef cient

The possible inferiority of the LPM in capturing nonlinear-ities in nonsaturated models is also important for interpolationand extrapolation to points not in the observed data Gettingthe nonlinearities right in the observed data is important ininterpolation and extrapolation if the true model is nonlinearas the binary choice model necessarily is (since y is boundedby 0 and 1) Although Angrist at one point does mentionpredictionmdashstating that the linearized models he proposes area good ldquojumping-off point for any prediction exerciserdquo a state-ment without supportmdashhe is for the most part not interestedin prediction so much as summarizing the observed data Thisis ne but unfortunately prediction outside the observed his-torical data is the pervasive concern of the policy makers whoare the ultimate consumers of applied economic research forthey must make predictions of new policies in new environ-ments on a daily basis as part of their jobs The methods pro-posed in Angristrsquos article are therefore not very useful for theformulation of new policy

It should be noted that some make the argument that theoryand economic models must play a role in guiding the formu-lation of empirical relationships used for prediction outsidethe range of the observed data That position is not taken herebecause it is not necessary to establish the potential superior-ity of the L IM over the LPM in nonsaturated models and forprediction Occamrsquos razor makes it unnecessary

It should also be noted that these issues have nothing todo with causal effects because d is assumed exogenous Thusthe opposition that Angrist poses between L IMrsquos and ldquocausaleffectsrdquo models is a false one at least in general If a ldquocausaleffectrdquo is de ned as the true effect of a variable d on onlythe rst moment of a variable y (a rather restricted de nitionof causal effect) that causal effect can be derived from thelatent-variable model just as well The issue is instead merelynonlinearity of the function E4ymdashd1 x5

Angrist also states that the L IM requires ldquoconstant coef- cientrdquo and ldquodistributional assumptionsrdquo for identi cation

22 Journal of Business amp Economic Statistics January 2001

unlike the LPM Angrist does not make suf cient quali ca-tions to these assertions and they are indeed false for theL IM A random coef cient sbquo in the LIM is permissible with-out distributional assumption ( Ichimura and Thompson 1998)and the coef cients in the standard constant-coef cient L IMare identi ed semiparametrically (ie under unknown distri-bution of hellip) under the restriction that d and hellip are independentan assumption generally made for causal interpretation of theLPM model as well (Manski 1988 Horowitz 1993 Powell1994) In addition as already noted in a fully saturated modelthe LPM and the L IM are equivalent in any case

Is the LATE an Interesting Statistic The Angrist assertionthat LATE is the only interesting statistic if d is endogenousor perhaps the only statistic worth bothering about because itis the only one identi ed by the data has the same partialvalidity as his preference for the LPM over the LIM IV isone of the most popular techniques in applied econometricsand has a natural intuition indeed one that does not requirethe LATE interpretation per se It is one of the most usefultools in the applied economistrsquos kit The simple IV estimatordiscussed by Angrist makes minimal assumptions and givesminimal information back to the analyst as a result But tosay that it produces the only statistic of interest does not havedefense both because it is equivalent to an LIM that is fullysaturated and may t the data worse than an LIM if not satu-rated and because it implies that there is no value to makingadditional assumptions to obtain additional information

The limiting nature of the LATE statistic is again in itsinattention to nonlinearities interpolation and extrapolationThat nonlinearities can be important is as true in this caseas it was in the exogenous case just discussed when thereare additional x covariates and when the model is not satu-rated As for interpolation and extrapolation the LATE statis-tic 6 Ny4z D z15ƒ Ny4z D z057=6 Nd4z D z15ƒ Nd4z D z057 denotes theeffect on a change in z from value z0 to z1 on E4y5 scaledby the change in the E4d5 It does not have implications forthe effect on E4y5 of any other change in z or for a changein any other policy variable Thus it is not particularly usefulfor policy changes other than a change of z0 to z1 in the sameenvironment (ie conditioned on the same x) This stands incontrast to an L IM for E4dmdashx1 z5mdashor any parametric modelfor d for that mattermdashthat allows for the change from z0 toz1 to inform policy makers and others of the likely changesof other values of z and of other variables x In fact theLATE statistic which is not a parameter in the usual senseof the word can always be expressed as a nonlinear combi-nation of parameters of a latent-variable model but not viceversa hence the latter model is more general than the former[See Heckman and Vytlacil (1999) for a discussion of how toexpress the LATE in terms of an L IM see also Angrist andImbens (1999) and Heckman (1997 1999) for a discussion ofsome of these issues]

In addition Angristrsquos focus on the advantages of the LATEfor model-free estimation of the effect of a change in z fromz0 to z1 on y reveals a philosophical weakness in his posi-tion This is because the effect of a change in z from z0 toz1 on y can be estimated from the reduced form the struc-tural form is not needed It is not explained why the statistic

6E4ymdashz D z15 ƒ E4ymdashz D z057 is not the only quantity of inter-est The answer that most economists would give is to say thatstructural coef cients are of interest because they can be usedto extrapolate the effects of changes in d on y to other changesin z than from z0 to z1 in the same environment indeedthis is the basic rationale for structural estimation given inthe famous essay by Marschak [1953 see also Christ (1994)and Heckman (2000) for discussions Marschakrsquos argumentwas more subtle than this arguing that restrictions on struc-tural parameters are needed for out-of-sample prediction butthis translates into restrictions on the reduced form as well]However by ruling out the possibility of learning about anyof the effects of changes in z or d other than those inducedby a change in z from z0 to z1 Angrist removes the need todo structural estimation in the rst place Angrist implicitlyassumes that a structural coef cient of interest exists on thevariable d in the y equation and that that coef cient has mean-ing independent of a particular change in z but the rest of hisdiscussion contravenes that interest

A reduced-form research program which is where theAngrist position leads is of considerable value There is noth-ing wrong with a research program to collect a large body ofinformation on the effects of a wide variety of changes in dif-ferent policy variables 4z5 from particular values (z0) to otherparticular values (z1) each taking place in particular environ-ments (x) But if the research program stops there very littleuseful has been learned other than a collection of facts aboutthe effects of particular policies in particular environments

Is the Application in the Angrist Article Interesting TheAngrist article uses a recently popular exclusion restriction ornatural experiment to identify the effect of fertility on laborsupply and to illustrate his preferred methodsmdashnamely theuse of twins The use of twins as an exclusion restrictionstands in contrast to variations in a government policy or lawwhich are often used as exclusion restrictions both in recentwork on natural experiments as well as a much older litera-ture that uses cross-sectional and overtime variation in state-or country-speci c taxes and transfer rules to identify modelparameters However although it can be of interest to estimatereduced-form policy effects 6E4ymdashz D z15 ƒ E4ymdashz D z057 asjust discussed it is not so obvious that there is any interest inestimating the effects of having twins on the expected valueof y Creating twins is not a variable directly subject to policymanipulation

The dif culty in the use of twins arises for the same reasonalready discussedmdashnamely that the preference for model-freeestimation of policy effects leads to a lack of interest in thefunction E4dmdashz1 x5 and hence to a lack of interest in what canbe learned from a study of twins about the effects of someother more relevant policy variable that might be manipulatedA model for the E4dmdashz1 x5 is needed to make that connectionand hence to make a study of twins of any interest yet that isintentionally eschewed in the approach proposed by AngristOnce again this leads to an uninteresting and quite limitingset of exercises

Conclusions The set of methods laid out in the Angristarticle at least if a more relevant instrument were used yieldsa set of model-free statistics that are suitable for exploratorywork on a research question prior to the (possibly nonlinear)

Mullahy Comment 23

structural estimation and prediction that should be the ultimateobject of applied economics research

ACKNOWLEDGMENTS

This comment is a revision of remarks delivered at theJoint Statistical Meetings Baltimore August 1999 I thankCarl Christ James Heckman Joel Horowitz Michael KeaneThomas Mroz Geert Ridder and Edward Vyltacil for com-ments with the usual disclaimer that the views expressed hereshould not be taken as representing those of any of theseindividuals

ADDITIONAL REFERENCES

Angrist J and Imbens G (1999) Comment on ldquo Instrumental VariablesA Study of Implicit Behavioral Assumptions Used in Making Pro-gram Evaluationsrdquo by J J Heckman Journal of Human Resources 34283ndash827

Blundell R and Smith R (1993) ldquoSimultaneous Microeconometric ModelsWith Censored or Qualitative Dependent Variablesrdquo in Handbook of Statis-tics (Vol 1) eds G S Maddala C R Rao and H D Vinod AmsterdamNorth-Holland pp 117ndash141

Christ C (1994) ldquoThe Cowles Commissionrsquos Contributions to Econometricsat Chicago 1939ndash1955rdquo Journal of Economic Literature 32 30ndash59

Heckman J (1997) ldquoInstrumental Variables A Study of Implicit BehavioralAssumptions Used in Making Program Evaluationsrdquo Journal of HumanResources 32 441ndash462

mdashmdashmdash (1999) ldquo Instrumental Variables A Reply to Angrist and ImbensrdquoJournal of Human Resources 34 828ndash837

mdashmdashmdash (2000) ldquoCausal Parameters and Policy Analysis in Economics ATwentieth Century Retrospectiverdquo Quarterly Journal of Economics 11545ndash97

Heckman J and MaCurdy T (1985) ldquoA Simultaneous Equations LinearProbability Modelrdquo International Economic Review 18 21ndash37

Heckman J and Vytlacil E (1999) ldquoLocal Instrumental Variables andLatent Variable Models for Identifying and Bounding Treatment EffectsrdquoProceedings of the National Academy of Sciences 96 4730ndash4734

Horowitz J (1993) ldquoSemiparametric and Nonparametric Estimation of Quan-tal Response Modelsrdquo in Handbook of Statistics (Vol 11) eds G SMaddala C R Rao and H D Vinod Amsterdam North-Holland pp45ndash72

Ichimura H and Thompson T (1998) ldquoMaximum Likelihood Estimation ofa Binary Choice Model With Random Coef cients of Unknown Distribu-tionrdquo Journal of Econometrics 86 269ndash295

Manski C (1988) ldquoIdenti cation of Binary Choice Modelsrdquo Journal of theAmerican Statistical Association 83 729ndash738

Marschak J (1953) ldquoEconomic Measurements for Policy and Predictionrdquoin Studies in Econometric Method eds W Hood and T KoopmansNew York Wiley pp 1ndash26

Powell J (1994) ldquoEstimation of Semiparametric Modelsrdquo in Handbook ofEconometrics (Vol 4) eds R Engle and D McFadden Amsterdam North-Holland pp 2443ndash2521

CommentJohn Mullahy

Departments of Preventive Medicine and Economics University of Wisconsin Madison WI 53705 andNational Bureau of Economic Research Cambridge MA (jmullahyfacstaffwiscedu)

Articles bearing titles containing phrases like ldquoSimpleStrategies for Empirical Practicerdquo are often wolves masquerad-ing as sheep the methodologies they devise being far fromldquosimplerdquo and far removed from what most humble practition-ers perceive in the realm of ldquoempirical practicerdquo Not so hereJoshua Angrist has written a tight comprehensive article thatis stimulating and important yet also eminently useful

As typi es much of Angristrsquos work the main concern in thisarticle is on how to elicit interesting characterizations of causaleffects from microdata with the particular twist here being afocus on outcomes measured as ldquolimited dependent variablesrdquo(LDVrsquos) The main take-away message I glean from this arti-cle is that applied analysts working on causal effect or struc-tural analyses in LDV contextsmdashtraditionally vexing contextsinsofar as consistent estimation and inference are concernedmdashhave considerable grounds for optimism Angrist lays out andinterprets systematically a set of issues and methods that pro-vide practitioners with a variety of implementable strategiesthat might be brought to bear on such empirical problemsA corollary take-away message is that in some respects ldquothisstuff is not really as hard as wersquove tended to make itrdquo withAngrist demonstrating for instance the potential merits ofsimple linear instrumental variable ( IV) methods for estimat-ing causal effects in a variety of LDV contexts In no eventcan applied analysts escape the requirement of nding theo-retically sound instruments but Angrist exposits compellingly

a broad description of how such instruments might be used toelicit interesting causal inferences in LDV contexts

I have no real quibbles with any of the substance ofAngristrsquos arguments Rather I will devote my commentarymainly to amplifying and expanding several of the themes hedevelops throughout the article

1 FOCUS ON CAUSAL OR PARTIAL EFFECTSVERSUS FOCUS ON CONDITIONAL

EXPECTATIONS FUNCTIONS

It seems fair to suggest that much of applied microecono-metrics is concerned primarily with understanding the signsand magnitudes of quantities like iexclE6Y mdashX1 D7=iexcl4X1 D5 oriexclE6Y mdashX7=iexclX0 Yet much of the actual dirty work in under-taking causal analysis in LDV contexts seems to result fromdecisions to undertake analyses in settings where E6Y mdashX1D7

or E6Y mdashX7 are restricted to be positive without a priori restric-tions on parameter values [thus the tradition of using tobit-class conditional expectations functions (CEFrsquos) two-stepselection models exponential CEFrsquos etc]

As practitioners however we should pause to assesswhether speci cations akin to E6Y mdashX1D7 D exp4Xsbquo C D5 gt

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

24 Journal of Business amp Economic Statistics January 2001

0 are ultimately buying anything important with respect tothe rst-order questions being explored It is only com-mon sense that analysts should spend relatively more energyworking on understanding quantities of primary interest(eg iexclE6Y mdashX1D7=iexcl4X1 D5) and worrying relatively lessabout the formulation and implications of assumptions (egE6Y mdashX1 D7 gt 0) that may have little or no ultimate bearingon the particular questions being addressed This is espe-cially so when the latter effort (restricting via functional formchoice E6Y mdashX1 D7 gt 0) will tend to complicate rather thanfacilitate the former effort (understanding the partial or causaleffects) This is not to suggest that quantities like E6Y mdashX7 andE6Y mdashX1 D7 may not themselves be interesting to estimate (egmodeling conditional mean health care expenditures for use inforecasting future expenditure levels) but whether E6Y mdashX1D7

is parametrically and parsimoniously better approximated byexp4XsbquoC D5 or by XsbquoC D or by some other g4Xsbquo CD5

or h4X1D1 ƒ5 ismdasheven in LDV settingsmdashultimately an empir-ical matter to be assessed by goodness of t (eg via condi-tional moment tests)

I note this important angle somewhat sheepishly sincein both articles of mine that Angrist cites (Mullahy 19971998) there is considerable emphasis on the use of exponen-tial CEF speci cations as enforcers of positive conditionalmean ldquorequirementsrdquo In fact the main motivation for theearlier article was to nd a general approach to obtainingconsistent estimators of structural parameters in parametricsettings less distribution-bound than tobit-class models whenthe key requirement or side constraint is E6Y mdashX1 D7 gt 0 withthe resulting strategy a more or less brute-force method foraccomplishing this in a nonlinear IV setting Angristrsquos arti-cle takes this idea quite a bit further and demonstrates how(and with the addition of a further normalization) the struc-tural parameter estimation approach I discussed can be devel-oped into a model to analyze causal effects when the latter arecharacterized as ldquoproportional treatment effectsrdquo

Whether a proportional as opposed to an additive treatmenteffect is interesting (it may or may not be) remains to be seenin any particular application Wooldridgersquos recent work onaverage causal effects (ACErsquos Wooldridge 1999) would seempertinent in contexts in which the role of the X covariates ismore ldquointrusiverdquo than in the simple additivelinear treatment-effect setup As a general matter the extent to which infer-ences about causal effectsmdashhowever formulatedmdashhinge onthe inclusion of particular covariates as conditioning regres-sors is an interesting and potentially important angle thatAngristrsquos article begins to explore and is one that likely tomerit additional research For the particular issues at handthe very nature of proportional treatment effects brings intofocus explicitly the role of the X rsquos so how their correlationwith the treatment indicators 4D5 in uences inferences aboutthe causal impacts of the latter would seem to be a rst-orderconsideration

2 TWO-PART MODELS

I found Angristrsquos analysis of the interpretation of causaleffects in two-part models to be extremely illuminatingIn a reduced-form setting I argued (Mullahy 1998) that amid

all the concerns about zeros robustness outliers transforma-tion retransformation and such that have typically attendedmodeling efforts involving two-part models it would be sur-prising in a well-structured empirical investigation if therewere not stated or lurking concerns about partial effectsiexclE6Y mdashX7=iexclX andor CEFrsquos E6Y mdashX7 themselves It may be insome cases that such concerns are not articulated and it mayeven be in less well-structured problems that they are not obvi-ous to analysts themselves Yet for instance unless robustlyestimated sbquorsquos from a log-linear part-2 speci cation of a two-part model could inform a rst-order question about a par-tial effect or a CEF of what practical use were they likelyto be My rather modest and rather obvious argument wasjust that the concerns about the iexclE6Y mdashX7=iexclX or the E6Y mdashX7mdashif they are the reason the analysis is being conducted in the rst placemdashshould enjoy rst-order prominence in the estima-tion exercise and that concerns about zeros outliers and suchshould be relegated in some sense to second-order status Insome cases two-part models will serve nicely to address such rst-order concernsmdashat least in reduced-form settingsmdashbut inothers they will not

Angrist appears to have some sympathy with such argu-ments but more importantly he provides a valuable serviceto users of two-part models by unearthing one major limi-tation in the analysis of causal effects The problem is notwith part 1 (ie the logit probit or linear probability com-ponent) because part 1 of the two-part model falls within themain lines of Angristrsquos LDV analysis Rather the complicationarises with part 2 of the two-part model in which the additionalldquoY gt 0rdquo conditioning arises In the standard (reduced-form)two-part model quantities like E6Y mdashX1Y gt 07 are prominentand identi ed readily But in the counterfactual settings inwhich causal effects are manifested wherein the realized Y

arises from self-selection into or out of treatment such selec-tion effects introduce an ambiguity [Eq (12)] and therebyconfound the ability to glean causal effects from part-2 esti-mates (unless as Angrist notes censored regression methodsare used but Angrist also offers some compelling argumentsagainst blind reliance on censored regression approaches) Assuch if estimation of and inference about causal effects arethe primary analytical concerns in data settings with Y para 0and Pr4Y D 05 gt 01 analysts may be well advised to avoidtwo-part modeling strategies and pursue some of the moredirect linear and nonlinear estimation approaches discussedby Angrist in which one-part estimation approachesmdashzerosand allmdashyield estimates of causal effects that will be directlyinterpretable

3 BEYOND CEFrsquoS

One noteworthy feature of the methods Angrist discussesmdashwith the main results attributed to Abadie (1999)mdashare theirapplicability to estimation of causal effects for characteristicsof the conditional distribution rdquo4Y mdashX1D5 beyond just CEFrsquosEstimation of causal or treatment effects for conditional quan-tiles (QTErsquos) and conditional distribution function ordinates(ldquodistilesrdquo) Pr4Y lt cmdashX1D5 is demonstrated to t properlywithin the weighting strategies advanced by Abadie Since

Todd Comment 25

quantile and distile analyses may be more relevant in address-ing particular analytical questions or policy issues than CEFrsquosthe ability to extend the mechanisms of causal-effect esti-mation strategies in such directions should be welcomed byapplied researchers

I might suggest that Angristrsquos claim about an advantage ofquantile causal analysis over distile causal analysis being thatthe latter relies on application-specic values of ordinates (c)is perhaps oversold or underdeveloped In many interestingand policy-relevant applications focus on the application-speci c ordinates is fundamental What factors cause employ-ers to offer more than one health insurance plan to employeesWhat factors cause consumers of alcoholic beverages to drinkmore than two drinks per day What factors cause elderlyindividuals to utilize more than one prescription drug prod-uct What factors cause second earners to work more than17 hours per week Each of these concerns an importantreal-world causal question for which the application-specicordinatemdashwhether for institutional legal physiological orother reasonsmdashis a fundamental ldquopivot pointrdquo for the analysisAnalysis of such distile relationshipsmdashwhether in causal or inreduced-form settingsmdashis a potentially powerful method forinforming decision makers and it seems to me that its mer-its relative to quantile analyses would have to be judged on acase-by-case basis

4 CONCLUSIONS

Practitioners of applied microeconometrics will nd thatthe time invested in a careful reading of Angristrsquos article hassizable returns not merely because of the analytical and con-ceptual insights it offers but ultimately also because it sug-gests a variety of important practical innovations that appliedresearchers can exploit empirically to more clearly understandthe nature of causal relationships in their data Applied microresearchers in areas like health labor development public nance and such work commonly with data wherein (a) out-come measures are limitedmdashoften most importantly becausethey are nonnegative with mass points at 0mdashand (b) the rolesof conditioning covariates (X rsquos) are most conveniently summa-rized via regression methods Angristrsquos article provides appliedresearchers so endowed with a set of powerful tools for elic-iting causal inferences from such data Of course the punchline is familiar and inevitable All bets are off without theo-retically solid instruments But with such instruments in handthe strategies exposited by Angrist offer analysts a rich set ofperspectives on estimation

ADDITIONAL REFERENCE

Wooldridge J M (1999) ldquoEstimating Average Partial Effects Under Con-ditional Moment Independence Assumptionsrdquo mimeo Michigan StateUniversity Dept of Economics

CommentPetra Todd

Department of Economics University of Pennsylvania Philadelphia PA 19104 (petraathonasasupennedu)

This article considers ways of estimating the effect ofbinary endogenous regressors in models with limited depen-dent variables It questions the usefulness of conventionalestimation strategies aimed at recovering structural modelparameters and advocates the use of simple instrumental vari-ables ( IV) estimators as an alternative on the grounds thatthese estimators invoke weaker assumptions and often suf ceto answer questions of interest in empirical studies In theauthorrsquos view the main challenge facing empirical researchersis the problem of identi cation of ldquotreatment effectsrdquo throughIV Here I expand the discussion by considering two ques-tions (1) What are the limitations of the causal model asa paradigm for policy analysis (2) When simple estimatorsare suitable for answering a question of interest what are thetrade-offs that need to be considered in using them

1 WHEN IS THE ldquoCAUSAL MODELldquoA USEFULPARADIGM FOR POLICY ANALYSIS

The article gives the impression that most interesting ques-tions in economics can be answered within the context ofthe ldquocausal modelrdquo [variously attributed to Neyman (1923)Fisher (1935) Cox (1958) Roy (1951) and Rubin (1978)]The causal model is a very general framework that assumesthat there are potential outcome states associated with having

received some treatment and having received no treatment (orpossibly a different treatment) For each individual only oneof the potential outcome states is observed which leads to amissing-data problem in attempting to draw inferences aboutaspects of the treatment effect distribution

The language of potential outcomes is very general soalmost any economic problem could be formulated in theseterms However this does not mean that estimators proposedin the literature for the causal model are useful for all or evenmost economic problems A major limitation of the modelis that it assumes that the state ldquowith treatmentrdquo has beenobserved for at least a subset of people In medical trials orbiological experiments this assumption is probably reason-able but in economics we often are interested in evaluatingeffects of treatments that have not yet been implemented Forexample we might be interested in predicting the effect ofraising the age receiving Social Security bene ts to a new ageor the effect of introducing new term limits on welfare par-ticipation An implicit assumption needed to apply any of theidenti cation results described in this article is that both Y1

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

26 Journal of Business amp Economic Statistics January 2001

and Y0 are observed which would not be satis ed in manycases Thus the causal model and the estimators developedfor it are inadequate when it comes to any economic questioninvolving a state of the world that has not been observed

Despite these limitations treatment effect estimates aresometimes used to predict consequences of new treatments orof existing treatments for new populations When the policychange being considered is very close to changes observedin the data and is to be applied to similar individuals thenextrapolation may be reasonable But if the policy change is(a) of a magnitude outside the range of experience (b) alonga new dimension or (c) intended for a new population then itis an open question as to how reliable the estimates would beTreatment impact estimates are not intended to capture any-thing invariant to changing conditions The assumptions thatwould be required to justify extrapolations or generalizationsare often quite strong To some extent it is an illusion thatthese estimators allow for general kinds of inference underweak assumptions because the assumptions required to justifytheir application for particular purposes are often not madeexplicit

Angrist argues that the identi cation results for the localaverage treatment effect (LATE) estimator provide a ldquofounda-tion for credible causal inferencerdquo (p 5) and a ldquominimum con-troversy jumping-off point for any prediction exerciserdquo (p 5)But as discussed previously the causal model maintains sucha high level of generality that it offers little guidance for manykinds of problems It is important to recognize when and whennot the LATE estimator is likely to be useful This articleclaims that ldquoin practice estimates of LATE differ little fromestimates based on the stronger assumptions invoked to iden-tify effects on the entire treated populationrdquo (p 5) But suchan extrapolation is not justi ed under the theory and is surelyunlikely to hold in all circumstances As shown in a num-ber of works by various authors LATE provides the averageeffect of treatment for the group of ldquocompliersrdquo who are peo-ple induced by the instrument to receive treatment This groupdoes not except under very special circumstances correspondto the group of treated persons and may in fact representonly a small fraction of persons receiving treatment So notonly does LATE not suf ce for answering problems of the sortdescribed by (a) (b) and (c) it also does not generally pro-vide the average effect of treatment on the treatedmdashthe mostcommon parameter of interest in analyzing social programsAnother not-so-attractive feature of the estimator is that thepopulation of ldquocompliersrdquo is usually not identi able so wecannot say exactly whose treatment impact is being estimatedApplying the LATE parameter estimate to the full populationof treated persons would generally require ruling out certaintypes of heterogeneity in individual responses to treatmentas discussed by Heckman (1997) The explicit assumptionsthat could justify this type of extrapolation were shown byHeckman and Vytlacil (2000 in press)

2 ESTIMATORS SHOULD BE TAILOREDTO THEIR APPLICATION

A general theme throughout the article is that wheneverpossible researchers should restrain from imposing paramet-ric restrictions in estimation because the structure they impose

may be wrong resulting in biased estimates This view isexpressed for example in the discussion of the two-partmodel The article advocates the use of a two-part approachinstead of a conventional selection-model approach maintain-ing that the two-part approach is less likely to lead to biasedparameter estimates because it does not impose cross-equationrestrictions as a conventional tobit estimator would

How do researchers decide whether or not to impose restric-tions across estimating equations that are structurally linkedThe most agnostic approach would be to estimate both equa-tions completely nonparametrically but this approach is rarelypractical Nonparametric estimators are consistent under themost general conditions but in conventional size samples thestandard errors would be so large that the estimates would forpractical purposes be useless Researchers impose structurebecause they are willing to restrict the class of models underwhich their estimators are consistent in exchange for greaterprecision These ef ciency considerations would also apply inthe deciding whether or not to impose cross-equation restric-tions in estimating simultaneous-equations models If we areright about the structure there is an ef ciency gain fromimposing the restrictions If we are wrong imposing them maylead to bias When there are overidentifying restrictions it isat least possible to develop tests of the model speci cation thatcan serve as a guide in choosing a structural model that is sup-ported by the data This article presents a dichotomy betweenthe two-part model and a fully parametric traditional selec-tion speci cation but there are many other modeling choicesranging from fully nonparametric to semiparametric to fullyparametric In modest size samples ef ciency is an impor-tant consideration and a parametric model may be the mostappropriate one For very large samples fewer parametricapproaches are feasible Both the probability-of-working andthe hours-worked equations could for example be estimatedby a semiparametric method such as semiparametric leastsquares ( Ichimura 1993) Which estimation method is mostappropriate ultimately depends on how much data is avail-able and how many parameters are to be estimated It doesnot make sense to criticize the use of structured approaches infavor of more exible modeling approaches without regard tothe context in which they are being applied

3 ON THE GAP BETWEEN THEORY ANDEMPIRICAL PRACTICE

Finally another recurring emphasis in the article is on thevalue of shortcuts in empirical practice In the authorrsquos viewthere is a gap between the econometric theory and what isfeasible in practice and shortcuts and approximations help tobridge this gap To this end the article extols for examplethe virtues of ordinary least squares (OLS) as a way of approx-imating any unknown conditional expectation function with-out much concern as to whether a best linear approximation isa good approximation If the model has discrete regressors andis fully saturated it is well known that OLS is equivalent to anonparametric estimator But in other cases a linear approxi-mation could be highly biased and the bias will not go awayas the sample size gets large There are a variety of alterna-tive asymptotically unbiased estimators Along similar lines

Angrist Reply 27

the article suggests that in implementing a ldquoselection-on-observablesrdquo estimator it may be all right simply to ignore theweighting scheme Such claims seem to me fairly outrageousbecause failure to take into account weighting could lead tovery different estimates and would not provide a consistentestimator of the parameter of interest Heckman Ichimura andTodd (1998) discussed alternative estimators that are not dif -cult to implement and take into account the weighting scheme

Finally the appendix presents a variety of shortcuts forobtaining estimates and standard errorsmdashsome of which leadto inconsistent estimates When shortcuts lead to very crudeapproximations or inconsistent estimates I question their use-fulness I do think it unreasonable to expect that estimatesbased on simple IV estimators be accompanied by correctstandard errors that correctly take into account the in uence ofestimated parameters of estimated regressors I see little valuein reporting incorrect standard-errors estimates on the groundsthat it saves time in computation especially when there areseveral works long available in the literature that develop sim-ple ways of solving these kinds of problems (eg see Newey1984) Although there is surely a gap between the estimatorsdeveloped in the theoretic econometric literature and the esti-mators used in the empirical literature the gap is not as greatas this article supposes

ADDITIONAL REFERENCES

Cox D R (1958) The Planning of Experiments New York WileyFisher R A (1935) Design of Experiments London Oliver and BoydHeckman J J (1997) ldquo Instrumental Variables Evidence From Evaluating a

Job Training Programmerdquo Journal of Human Resources 32 441ndash462Heckman J Ichimura H and Todd P (1997) ldquoMatching as an Econometric

Evaluation Estimator Theory and Evidence on its Performance Appliedto the JTPA Program Part I Theory and Methodsrdquo Review of EconomicStudies 64 605ndash654

mdashmdashmdash (1998) ldquoMatching as an Econometric Evaluationrdquo Review of Eco-nomic Studies 65 261ndash294

Heckman J and Vytlacil E (2000) ldquoCausal Parameters Structural Equa-tions Treatment Effects and Randomized Evaluations of Social Programsrdquounpublished manuscript University of Chicago Dept of Economics

mdashmdashmdash (in press) ldquoLocal Instrumental Variablesrdquo in Nonlinear StatisticalInference Essays in Honor of Takeshi Amemiya eds C Hsiao KMorimune and J Powell Cambridge UK Cambridge University Press

Ichimura H (1993) ldquoSemiparametric Least Squares (SLS) and Weighted SLSEstimation of Single Index Modelsrdquo Journal of Econometrics 58 71ndash120

Neyman J (1923) ldquoStatistical Problems in Agricultural Experimentsrdquo Sup-plement to the Journal of the Royal Statistical Society 2 107ndash180

Newey W (1984) ldquoA Method of Moments Interpretation of Sequential Esti-matorsrdquo Economic Letters 14 201ndash206

Roy A (1951) ldquoSome Thoughts on the Distribution of Earningsrdquo OxfordEconomic Papers 3 135ndash146

Rubin D (1978) ldquoBaysian Inference for Causal Effects The Role of Ran-domizationrdquo The Annals of Statistics 6 34ndash58

ReplyJoshua D Angrist

I thank the discussants and the session organizer outgoingJBES editor Jeff Wooldridge who will be missed by associateeditors and JBES authors alike The comments in this sessionoffer a variety of viewpoints some sympathetic others morecritical I am pleased to be part of this stimulating and con-structive exchange and look forward to more interaction ofthis type in the future

Hahn My article starts by suggesting that a focus ontreatment effects simpli es the problem of causal inferencein limited dependent variable (LDV) models with endoge-nous regressors Endogeneityrsquos half-sister is the assumptionof unobserved individual effects since the omitted-variablesmotivation for xed effects in panel data is similar Hahnasks whether a shift in focus from index parameters to aver-age causal effects might also simplify the notoriously dif -cult problem of working with nonlinear models for binarypanel data (eg as in Card and Sullivan 1988) Hahn showsthat here too the causal-effects framework pays off demon-strating that average causal effects are easily estimated in a xed-effects probit model with a binary regressor even thoughthe probit coef cient is not A second noteworthy featureof Hahnrsquos comment is his observation that identi cation inthis framework turns on careful modeling of the relationshipbetween the assignment variable and the xed effects Tradi-tional econometric models are primarily concerned with thestochastic process generating outcomes A shift of attentiontoward modeling the assignment mechanism is consistent with

the notion that observational studies should try to mimic thesort of controlled comparisons generated by experiments

Imbens Imbens begins by endorsing the view that the sub-stantive goals of empirical research are better served by thepotential outcomescausal framework than by structural mod-eling Going one step further Imbens notes that the problemswith ldquomodel- rst econometricsrdquo are not limited to LDVrsquos Inthe same spirit he sensibly wonders (as did Wooldridge 1992)why so much attention has been devoted to estimating theBoxndashCox transformation model which does not actually iden-tify E6Y mdashX7 without a distributional assumption Imbens alsonicely articulates the rationale for looking at the twins exam-ple and notes that this is similar to the rationale for extrap-olating from clinical trials in narrow subpopulations And infact I believe the ldquotwins experimentrdquo is likely to have pre-dictive value in a variety of situations though any extrapo-lation will probably be improved by control for demographiccharacteristics

In spite of (or perhaps because of) our years of discussionand collaboration Imbens and I still do not agree on certainthings He feels strongly that inference with discrete outcomesshould use models and procedures that respect inherent non-

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

28 Journal of Business amp Economic Statistics January 2001

linearities The HIRZ procedure discussed in his comment isan intriguing way to do this and my article discusses othersI remain unconvinced however that nonlinearity is of majorsubstantive importance in empirical work with binary regres-sors The proof of the pudding I suppose would be to showthat H IRZ or some of the methods I discuss in the article leadto better predictions or decisions than the linear workhorse

Mof tt Mof t provides a valuable counterpointHe beginsby arguing that latent-index models have more predictivepower than reduced-form causal models Such claims seemhard to establish a priori though I do not doubt that in somecases this may be true But empirical strategies in which thebulk of the research effort goes into estimating index coef -cients may fail to answer the most basic question in empiricalwork What happened here Causal models and local aver-age treatment effect (LATE) answer this question while indexcoef cients alone are not suf cient for causal inference Iagree however that structural modeling can be useful forgoing beyond basic questions and I have used index and otherstructural models for this purpose Mof t is right howeverin suggesting that I nd the model-free analysis he refers toas ldquoexploratoryrdquo more interesting than structural extrapolationThis seems to me to be where the potential for real discoverylies

Mof tt asks tough but fair questions about the twins appli-cation He is correct to point out that twinning is not a manip-ulable policy instrument That is one reason I do not limit theanalysis to reduced-form effects of multiple births Yet to mymind the best evidence we have on the controversial questionof the consequences of out-of-wedlock childbearing comesfrom the twins experiment (Bronars and Grogger 1994) Andthe economic consequences of childbearing are of interest inmany other contexts As noted in the introduction to my arti-cle the twins experiment can help determine whether fertilityreductions are responsible for trend increases in female laborsupply Beyond this sort of question the twins results seemuseful for assessing the economic consequences of increasedabortion access in the United States the dissemination of con-traceptive implants in developing countries (eg see KaneFarr and Janowitz 1990) and various programs to encour-age or require implant use by welfare mothers (Haveman andWolfe 1993)

Finally it is fair to question the external validity of twinsestimates as Mof t does toward the end of his commentAngrist and Evans (1998) showed that the labor-supply con-sequences of twinning can be reconciled with estimates ofthe labor-supply consequences of childbearing resulting froma very different natural experiment induced by preferences formixed-sex sibling pairs This suggests that results from bothexperiments may be quite robust and may even be ldquostructuralrdquoin the sense that they tell us something general about the eco-nomic response to childbearing

Mullahy Mullahy brings a welcome health economistrsquosperspective to the discussion Health economists have longgrappled with the issues discussed in my article and asI hope I made clear I drew ideas and inspiration fromMullahyrsquos work on models with nonnegative dependent vari-ables Mullahyrsquos comment further highlights exactly the sort

of utilitarian trade-off that underlies the choice of empiricalstrategy On one hand we would like our models to be con-sistent with any functional form restrictions known to be trueOn the other attempts at this may complicate estimation andinference for little payoff Worst of all the technical complex-ity of nonlinear models seems to cause authors to ldquowax econo-metricrdquo an effort that may come at the expense of attention tosubstantive issues of importance (an example here is the longseries of papers discussing exactly how data from the RANDhealth-insurance experiment should be analyzed) Mullahyrsquosheretical discussions of ldquoone-part modelsrdquo here and in earlierwork represent a refreshing effort to keep the empirical trainon track Mullahyrsquos point that application-specic distributionordinates can provide more useful summary information thanquantiles is also clearly worth thinking about

Todd Todd provides a useful and clear overview of thecausal framework which she suggests is better suited formedical questions than economic questions The thrust of herargument is that there is too much heterogeneity in economicinterventions and outcomes for the causal approach to bevery useful for economists But my perception is that medicalresearch is even more explicitly concerned with heterogene-ity than economics This is evidenced by the many quali ca-tions regarding patient characteristics conditions and inter-actions in medical treatment protocols A difference betweenthe prevailing medical research paradigm and the traditionaleconometric perspective however is whether the question ofheterogeneous response is to be addressed by better theoryor more evidence Of course theory helps us decide whereto look for heterogeneity but as Rosenbaum (1999 p 301)put it ldquo If a treatment has substantially different effects in dif-ferent types of people then we need to discover this (ital-ics mine) Rosenbaum also noted that ldquorandomized clinicaltrials routinely look for heterogeneous treatment effects andthey do this without representative samples from nationalpopulationsrdquo

On the technical side Todd suggests I advocated use ofthe two-part model because it exibly approximates nonlinearfunctional forms for nonnegative outcomes Although I notedthe exibility the upshot of my discussion was that the two-part model is poorly suited to causal inference because of thedif culty of interpreting part 2 Finally Todd points out thefailure to allow for sampling variance in the scaling factorsused to generate effects for nonlinear models This is indeed ahard-to-defend shortcut though I shall take a stab at defenseby noting that randomness of the scaling factor which typi-cally estimates a sample average is likely to be a minor sourceof sampling uncertainty

ADDITIONAL REFERENCES

Card D E and Sullivan D (1988) ldquoEffects of Subsidized Training onMovements In and Out of Employmentrdquo Econometrica 56 497ndash530

Haveman R and Wolfe B (1993) ldquoChildrenrsquos Prospects and ChildrenrsquosPolicyrdquo Journal of Economic Perspectives 7 153ndash174

Kane T Farr G and Janowitz B (1990) ldquo Initial Acceptability of Con-traceptive Implants in Four Developing Countriesrdquo International FamilyPlanning Perspectives 16 49ndash54

Rosenbaum P R (1999) ldquoChoice as an Alternative to Control in Observa-tional Studiesrdquo Statistical Science 14 259ndash304

Wooldridge J (1992) ldquoSome Alternatives to the BoxndashCox RegressionModelrdquo International Economic Review 33 935ndash955

Page 20: New Estimation of Limited Dependent Variable Models With Dummy …piketty.pse.ens.fr/…ecoineg/articl/Angrist2001b.pdf · 2011. 12. 5. · sequences of changes in contraceptive technology

Moftt Comment 21

again y D 14y uuml gt 05 but now d is binary The parameters ofthis model cannot be estimated by standard IV Many of theexamples in the Angrist article involve the censored regres-sion model rather than the binary choice model but this andthe other main points in the article apply to both

The Angrist Solution The Angrist solution to the problemposed by the third of these models where IV is not applica-ble is to declare sbquo an uninteresting parameter and not worthyof estimation [or in his words to ldquopuntrdquo (p 8)] This is remi-niscent of the solution to the Vietnam War suggested by some1960s commentators The United States should have simplydeclared victory quickly withdrawn and hoped that no onenoticed that they had in fact lost

If sbquo is uninteresting what is interesting according toAngrist His answer is that interest should center on leastsquares approximations with y as the dependent variable andd and x as regressors If endogeneity of d is a problem linearIV should be applied with linear IV given a LATE interpre-tation He provides an empirical illustration

I will comment on three questions raised by the Angristarticle (1) Is sbquo an interesting parameter (2) Is the Linear IVestimator using observed y and d which Imbens and Angrist(1994) relabeled the LATE an interesting quantity (3) Is theillustrative application given in the Angrist article interesting

Is sbquo an Interesting Parameter As Angrist notes manyof the issues raised by his solution have nothing to do withthe endogeneity of d Whether sbquo is an interesting parame-ter is one of these at least in part Given Angristrsquos prefer-ence for linear projections his position necessarily implies thatall nonlinear models of which the latent index model (L IM)is just one example are uninteresting (Angrist is not com-pletely consistent in his position because he does propose anonlinear model with an exponential conditional mean func-tion in one section of his article that model is subject to thesame objections that he subjects the LIM tomdasharbitrary non-linearities etc) Thus his position boils down to a preferencein the binary dependent-variable case for the linear proba-bility model (LPM) over the LIM and other nonlinear mod-els This issue has been discussed for many years even in asimultaneous-equations context [eg Heckman and MaCurdy(1985) which Angrist does not reference]

The dif culty with Angristrsquos polar position on this issuemdashthat the LPM is the only model of interestmdashis that it is funda-mentally untenable The LPM is attractive because it is easyto interpret providing parameter estimates that do not requiretransformation to learn the effects of a regressor on the meanof the dependent variable It has a role to play in empiricalwork in summarizing the data as regards the conditional meanfunction and for initial explorations of the data and virtu-ally all practitioners use it for this purpose But beyond thisit has no defense Either it is equivalent to the latent-variablemodel if the model is saturated or if the model is not satu-rated imposes functional form restrictions that may not holdand may t the data worse than the latent-variable model

Thus if the model is y D 14 C sbquod C hellip gt 05 where d is asingle dummy variable regressor independent of hellip the probit(say) estimates of and sbquo map one-to-one onto the leastsquares intercept and coef cient on d in the LPM and hencethere is no gain to estimating the model either way because the

estimators are equivalent The same point extends to the caseof a multinomial d If the model is y D 14 C sbquod C xbdquo C hellip gt

05 where x and bdquo are vectors of exogenous covariates andcoef cients respectively the LPM estimate of the effect ofd on E4y5 conditional on x may be demonstrably worsethan the probit estimator of the same effect even for theobject of estimating E4ymdashd1 x5 at points 8d1 x9 observed inthe data (Angrist falls back on the weak principle that leastsquares estimates provide best linear approximations but theissue is that the true model might be nonlinear) Nothingin Angristrsquos article provides evidence that the probitor anyother latent index formulation provides an inferior estimate of6E4ymdashd11 x5 ƒ E4ymdashd01 x57 compared to the LPM for the twoestimators smooth the joint response function with respect tod and x in different ways

Although the latent-variable model has no necessary claimfor superiority in the class of all nonlinear models its pop-ularity rests on its ability to generate a wide variety of non-linearities with a relatively parsimonious speci cation arisingfrom the convolution of the latent index with the cdf of hellipExpansion of the LPM to incorporate equivalent nonlinearitiesis cumbersome and inef cient

The possible inferiority of the LPM in capturing nonlinear-ities in nonsaturated models is also important for interpolationand extrapolation to points not in the observed data Gettingthe nonlinearities right in the observed data is important ininterpolation and extrapolation if the true model is nonlinearas the binary choice model necessarily is (since y is boundedby 0 and 1) Although Angrist at one point does mentionpredictionmdashstating that the linearized models he proposes area good ldquojumping-off point for any prediction exerciserdquo a state-ment without supportmdashhe is for the most part not interestedin prediction so much as summarizing the observed data Thisis ne but unfortunately prediction outside the observed his-torical data is the pervasive concern of the policy makers whoare the ultimate consumers of applied economic research forthey must make predictions of new policies in new environ-ments on a daily basis as part of their jobs The methods pro-posed in Angristrsquos article are therefore not very useful for theformulation of new policy

It should be noted that some make the argument that theoryand economic models must play a role in guiding the formu-lation of empirical relationships used for prediction outsidethe range of the observed data That position is not taken herebecause it is not necessary to establish the potential superior-ity of the L IM over the LPM in nonsaturated models and forprediction Occamrsquos razor makes it unnecessary

It should also be noted that these issues have nothing todo with causal effects because d is assumed exogenous Thusthe opposition that Angrist poses between L IMrsquos and ldquocausaleffectsrdquo models is a false one at least in general If a ldquocausaleffectrdquo is de ned as the true effect of a variable d on onlythe rst moment of a variable y (a rather restricted de nitionof causal effect) that causal effect can be derived from thelatent-variable model just as well The issue is instead merelynonlinearity of the function E4ymdashd1 x5

Angrist also states that the L IM requires ldquoconstant coef- cientrdquo and ldquodistributional assumptionsrdquo for identi cation

22 Journal of Business amp Economic Statistics January 2001

unlike the LPM Angrist does not make suf cient quali ca-tions to these assertions and they are indeed false for theL IM A random coef cient sbquo in the LIM is permissible with-out distributional assumption ( Ichimura and Thompson 1998)and the coef cients in the standard constant-coef cient L IMare identi ed semiparametrically (ie under unknown distri-bution of hellip) under the restriction that d and hellip are independentan assumption generally made for causal interpretation of theLPM model as well (Manski 1988 Horowitz 1993 Powell1994) In addition as already noted in a fully saturated modelthe LPM and the L IM are equivalent in any case

Is the LATE an Interesting Statistic The Angrist assertionthat LATE is the only interesting statistic if d is endogenousor perhaps the only statistic worth bothering about because itis the only one identi ed by the data has the same partialvalidity as his preference for the LPM over the LIM IV isone of the most popular techniques in applied econometricsand has a natural intuition indeed one that does not requirethe LATE interpretation per se It is one of the most usefultools in the applied economistrsquos kit The simple IV estimatordiscussed by Angrist makes minimal assumptions and givesminimal information back to the analyst as a result But tosay that it produces the only statistic of interest does not havedefense both because it is equivalent to an LIM that is fullysaturated and may t the data worse than an LIM if not satu-rated and because it implies that there is no value to makingadditional assumptions to obtain additional information

The limiting nature of the LATE statistic is again in itsinattention to nonlinearities interpolation and extrapolationThat nonlinearities can be important is as true in this caseas it was in the exogenous case just discussed when thereare additional x covariates and when the model is not satu-rated As for interpolation and extrapolation the LATE statis-tic 6 Ny4z D z15ƒ Ny4z D z057=6 Nd4z D z15ƒ Nd4z D z057 denotes theeffect on a change in z from value z0 to z1 on E4y5 scaledby the change in the E4d5 It does not have implications forthe effect on E4y5 of any other change in z or for a changein any other policy variable Thus it is not particularly usefulfor policy changes other than a change of z0 to z1 in the sameenvironment (ie conditioned on the same x) This stands incontrast to an L IM for E4dmdashx1 z5mdashor any parametric modelfor d for that mattermdashthat allows for the change from z0 toz1 to inform policy makers and others of the likely changesof other values of z and of other variables x In fact theLATE statistic which is not a parameter in the usual senseof the word can always be expressed as a nonlinear combi-nation of parameters of a latent-variable model but not viceversa hence the latter model is more general than the former[See Heckman and Vytlacil (1999) for a discussion of how toexpress the LATE in terms of an L IM see also Angrist andImbens (1999) and Heckman (1997 1999) for a discussion ofsome of these issues]

In addition Angristrsquos focus on the advantages of the LATEfor model-free estimation of the effect of a change in z fromz0 to z1 on y reveals a philosophical weakness in his posi-tion This is because the effect of a change in z from z0 toz1 on y can be estimated from the reduced form the struc-tural form is not needed It is not explained why the statistic

6E4ymdashz D z15 ƒ E4ymdashz D z057 is not the only quantity of inter-est The answer that most economists would give is to say thatstructural coef cients are of interest because they can be usedto extrapolate the effects of changes in d on y to other changesin z than from z0 to z1 in the same environment indeedthis is the basic rationale for structural estimation given inthe famous essay by Marschak [1953 see also Christ (1994)and Heckman (2000) for discussions Marschakrsquos argumentwas more subtle than this arguing that restrictions on struc-tural parameters are needed for out-of-sample prediction butthis translates into restrictions on the reduced form as well]However by ruling out the possibility of learning about anyof the effects of changes in z or d other than those inducedby a change in z from z0 to z1 Angrist removes the need todo structural estimation in the rst place Angrist implicitlyassumes that a structural coef cient of interest exists on thevariable d in the y equation and that that coef cient has mean-ing independent of a particular change in z but the rest of hisdiscussion contravenes that interest

A reduced-form research program which is where theAngrist position leads is of considerable value There is noth-ing wrong with a research program to collect a large body ofinformation on the effects of a wide variety of changes in dif-ferent policy variables 4z5 from particular values (z0) to otherparticular values (z1) each taking place in particular environ-ments (x) But if the research program stops there very littleuseful has been learned other than a collection of facts aboutthe effects of particular policies in particular environments

Is the Application in the Angrist Article Interesting TheAngrist article uses a recently popular exclusion restriction ornatural experiment to identify the effect of fertility on laborsupply and to illustrate his preferred methodsmdashnamely theuse of twins The use of twins as an exclusion restrictionstands in contrast to variations in a government policy or lawwhich are often used as exclusion restrictions both in recentwork on natural experiments as well as a much older litera-ture that uses cross-sectional and overtime variation in state-or country-speci c taxes and transfer rules to identify modelparameters However although it can be of interest to estimatereduced-form policy effects 6E4ymdashz D z15 ƒ E4ymdashz D z057 asjust discussed it is not so obvious that there is any interest inestimating the effects of having twins on the expected valueof y Creating twins is not a variable directly subject to policymanipulation

The dif culty in the use of twins arises for the same reasonalready discussedmdashnamely that the preference for model-freeestimation of policy effects leads to a lack of interest in thefunction E4dmdashz1 x5 and hence to a lack of interest in what canbe learned from a study of twins about the effects of someother more relevant policy variable that might be manipulatedA model for the E4dmdashz1 x5 is needed to make that connectionand hence to make a study of twins of any interest yet that isintentionally eschewed in the approach proposed by AngristOnce again this leads to an uninteresting and quite limitingset of exercises

Conclusions The set of methods laid out in the Angristarticle at least if a more relevant instrument were used yieldsa set of model-free statistics that are suitable for exploratorywork on a research question prior to the (possibly nonlinear)

Mullahy Comment 23

structural estimation and prediction that should be the ultimateobject of applied economics research

ACKNOWLEDGMENTS

This comment is a revision of remarks delivered at theJoint Statistical Meetings Baltimore August 1999 I thankCarl Christ James Heckman Joel Horowitz Michael KeaneThomas Mroz Geert Ridder and Edward Vyltacil for com-ments with the usual disclaimer that the views expressed hereshould not be taken as representing those of any of theseindividuals

ADDITIONAL REFERENCES

Angrist J and Imbens G (1999) Comment on ldquo Instrumental VariablesA Study of Implicit Behavioral Assumptions Used in Making Pro-gram Evaluationsrdquo by J J Heckman Journal of Human Resources 34283ndash827

Blundell R and Smith R (1993) ldquoSimultaneous Microeconometric ModelsWith Censored or Qualitative Dependent Variablesrdquo in Handbook of Statis-tics (Vol 1) eds G S Maddala C R Rao and H D Vinod AmsterdamNorth-Holland pp 117ndash141

Christ C (1994) ldquoThe Cowles Commissionrsquos Contributions to Econometricsat Chicago 1939ndash1955rdquo Journal of Economic Literature 32 30ndash59

Heckman J (1997) ldquoInstrumental Variables A Study of Implicit BehavioralAssumptions Used in Making Program Evaluationsrdquo Journal of HumanResources 32 441ndash462

mdashmdashmdash (1999) ldquo Instrumental Variables A Reply to Angrist and ImbensrdquoJournal of Human Resources 34 828ndash837

mdashmdashmdash (2000) ldquoCausal Parameters and Policy Analysis in Economics ATwentieth Century Retrospectiverdquo Quarterly Journal of Economics 11545ndash97

Heckman J and MaCurdy T (1985) ldquoA Simultaneous Equations LinearProbability Modelrdquo International Economic Review 18 21ndash37

Heckman J and Vytlacil E (1999) ldquoLocal Instrumental Variables andLatent Variable Models for Identifying and Bounding Treatment EffectsrdquoProceedings of the National Academy of Sciences 96 4730ndash4734

Horowitz J (1993) ldquoSemiparametric and Nonparametric Estimation of Quan-tal Response Modelsrdquo in Handbook of Statistics (Vol 11) eds G SMaddala C R Rao and H D Vinod Amsterdam North-Holland pp45ndash72

Ichimura H and Thompson T (1998) ldquoMaximum Likelihood Estimation ofa Binary Choice Model With Random Coef cients of Unknown Distribu-tionrdquo Journal of Econometrics 86 269ndash295

Manski C (1988) ldquoIdenti cation of Binary Choice Modelsrdquo Journal of theAmerican Statistical Association 83 729ndash738

Marschak J (1953) ldquoEconomic Measurements for Policy and Predictionrdquoin Studies in Econometric Method eds W Hood and T KoopmansNew York Wiley pp 1ndash26

Powell J (1994) ldquoEstimation of Semiparametric Modelsrdquo in Handbook ofEconometrics (Vol 4) eds R Engle and D McFadden Amsterdam North-Holland pp 2443ndash2521

CommentJohn Mullahy

Departments of Preventive Medicine and Economics University of Wisconsin Madison WI 53705 andNational Bureau of Economic Research Cambridge MA (jmullahyfacstaffwiscedu)

Articles bearing titles containing phrases like ldquoSimpleStrategies for Empirical Practicerdquo are often wolves masquerad-ing as sheep the methodologies they devise being far fromldquosimplerdquo and far removed from what most humble practition-ers perceive in the realm of ldquoempirical practicerdquo Not so hereJoshua Angrist has written a tight comprehensive article thatis stimulating and important yet also eminently useful

As typi es much of Angristrsquos work the main concern in thisarticle is on how to elicit interesting characterizations of causaleffects from microdata with the particular twist here being afocus on outcomes measured as ldquolimited dependent variablesrdquo(LDVrsquos) The main take-away message I glean from this arti-cle is that applied analysts working on causal effect or struc-tural analyses in LDV contextsmdashtraditionally vexing contextsinsofar as consistent estimation and inference are concernedmdashhave considerable grounds for optimism Angrist lays out andinterprets systematically a set of issues and methods that pro-vide practitioners with a variety of implementable strategiesthat might be brought to bear on such empirical problemsA corollary take-away message is that in some respects ldquothisstuff is not really as hard as wersquove tended to make itrdquo withAngrist demonstrating for instance the potential merits ofsimple linear instrumental variable ( IV) methods for estimat-ing causal effects in a variety of LDV contexts In no eventcan applied analysts escape the requirement of nding theo-retically sound instruments but Angrist exposits compellingly

a broad description of how such instruments might be used toelicit interesting causal inferences in LDV contexts

I have no real quibbles with any of the substance ofAngristrsquos arguments Rather I will devote my commentarymainly to amplifying and expanding several of the themes hedevelops throughout the article

1 FOCUS ON CAUSAL OR PARTIAL EFFECTSVERSUS FOCUS ON CONDITIONAL

EXPECTATIONS FUNCTIONS

It seems fair to suggest that much of applied microecono-metrics is concerned primarily with understanding the signsand magnitudes of quantities like iexclE6Y mdashX1 D7=iexcl4X1 D5 oriexclE6Y mdashX7=iexclX0 Yet much of the actual dirty work in under-taking causal analysis in LDV contexts seems to result fromdecisions to undertake analyses in settings where E6Y mdashX1D7

or E6Y mdashX7 are restricted to be positive without a priori restric-tions on parameter values [thus the tradition of using tobit-class conditional expectations functions (CEFrsquos) two-stepselection models exponential CEFrsquos etc]

As practitioners however we should pause to assesswhether speci cations akin to E6Y mdashX1D7 D exp4Xsbquo C D5 gt

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

24 Journal of Business amp Economic Statistics January 2001

0 are ultimately buying anything important with respect tothe rst-order questions being explored It is only com-mon sense that analysts should spend relatively more energyworking on understanding quantities of primary interest(eg iexclE6Y mdashX1D7=iexcl4X1 D5) and worrying relatively lessabout the formulation and implications of assumptions (egE6Y mdashX1 D7 gt 0) that may have little or no ultimate bearingon the particular questions being addressed This is espe-cially so when the latter effort (restricting via functional formchoice E6Y mdashX1 D7 gt 0) will tend to complicate rather thanfacilitate the former effort (understanding the partial or causaleffects) This is not to suggest that quantities like E6Y mdashX7 andE6Y mdashX1 D7 may not themselves be interesting to estimate (egmodeling conditional mean health care expenditures for use inforecasting future expenditure levels) but whether E6Y mdashX1D7

is parametrically and parsimoniously better approximated byexp4XsbquoC D5 or by XsbquoC D or by some other g4Xsbquo CD5

or h4X1D1 ƒ5 ismdasheven in LDV settingsmdashultimately an empir-ical matter to be assessed by goodness of t (eg via condi-tional moment tests)

I note this important angle somewhat sheepishly sincein both articles of mine that Angrist cites (Mullahy 19971998) there is considerable emphasis on the use of exponen-tial CEF speci cations as enforcers of positive conditionalmean ldquorequirementsrdquo In fact the main motivation for theearlier article was to nd a general approach to obtainingconsistent estimators of structural parameters in parametricsettings less distribution-bound than tobit-class models whenthe key requirement or side constraint is E6Y mdashX1 D7 gt 0 withthe resulting strategy a more or less brute-force method foraccomplishing this in a nonlinear IV setting Angristrsquos arti-cle takes this idea quite a bit further and demonstrates how(and with the addition of a further normalization) the struc-tural parameter estimation approach I discussed can be devel-oped into a model to analyze causal effects when the latter arecharacterized as ldquoproportional treatment effectsrdquo

Whether a proportional as opposed to an additive treatmenteffect is interesting (it may or may not be) remains to be seenin any particular application Wooldridgersquos recent work onaverage causal effects (ACErsquos Wooldridge 1999) would seempertinent in contexts in which the role of the X covariates ismore ldquointrusiverdquo than in the simple additivelinear treatment-effect setup As a general matter the extent to which infer-ences about causal effectsmdashhowever formulatedmdashhinge onthe inclusion of particular covariates as conditioning regres-sors is an interesting and potentially important angle thatAngristrsquos article begins to explore and is one that likely tomerit additional research For the particular issues at handthe very nature of proportional treatment effects brings intofocus explicitly the role of the X rsquos so how their correlationwith the treatment indicators 4D5 in uences inferences aboutthe causal impacts of the latter would seem to be a rst-orderconsideration

2 TWO-PART MODELS

I found Angristrsquos analysis of the interpretation of causaleffects in two-part models to be extremely illuminatingIn a reduced-form setting I argued (Mullahy 1998) that amid

all the concerns about zeros robustness outliers transforma-tion retransformation and such that have typically attendedmodeling efforts involving two-part models it would be sur-prising in a well-structured empirical investigation if therewere not stated or lurking concerns about partial effectsiexclE6Y mdashX7=iexclX andor CEFrsquos E6Y mdashX7 themselves It may be insome cases that such concerns are not articulated and it mayeven be in less well-structured problems that they are not obvi-ous to analysts themselves Yet for instance unless robustlyestimated sbquorsquos from a log-linear part-2 speci cation of a two-part model could inform a rst-order question about a par-tial effect or a CEF of what practical use were they likelyto be My rather modest and rather obvious argument wasjust that the concerns about the iexclE6Y mdashX7=iexclX or the E6Y mdashX7mdashif they are the reason the analysis is being conducted in the rst placemdashshould enjoy rst-order prominence in the estima-tion exercise and that concerns about zeros outliers and suchshould be relegated in some sense to second-order status Insome cases two-part models will serve nicely to address such rst-order concernsmdashat least in reduced-form settingsmdashbut inothers they will not

Angrist appears to have some sympathy with such argu-ments but more importantly he provides a valuable serviceto users of two-part models by unearthing one major limi-tation in the analysis of causal effects The problem is notwith part 1 (ie the logit probit or linear probability com-ponent) because part 1 of the two-part model falls within themain lines of Angristrsquos LDV analysis Rather the complicationarises with part 2 of the two-part model in which the additionalldquoY gt 0rdquo conditioning arises In the standard (reduced-form)two-part model quantities like E6Y mdashX1Y gt 07 are prominentand identi ed readily But in the counterfactual settings inwhich causal effects are manifested wherein the realized Y

arises from self-selection into or out of treatment such selec-tion effects introduce an ambiguity [Eq (12)] and therebyconfound the ability to glean causal effects from part-2 esti-mates (unless as Angrist notes censored regression methodsare used but Angrist also offers some compelling argumentsagainst blind reliance on censored regression approaches) Assuch if estimation of and inference about causal effects arethe primary analytical concerns in data settings with Y para 0and Pr4Y D 05 gt 01 analysts may be well advised to avoidtwo-part modeling strategies and pursue some of the moredirect linear and nonlinear estimation approaches discussedby Angrist in which one-part estimation approachesmdashzerosand allmdashyield estimates of causal effects that will be directlyinterpretable

3 BEYOND CEFrsquoS

One noteworthy feature of the methods Angrist discussesmdashwith the main results attributed to Abadie (1999)mdashare theirapplicability to estimation of causal effects for characteristicsof the conditional distribution rdquo4Y mdashX1D5 beyond just CEFrsquosEstimation of causal or treatment effects for conditional quan-tiles (QTErsquos) and conditional distribution function ordinates(ldquodistilesrdquo) Pr4Y lt cmdashX1D5 is demonstrated to t properlywithin the weighting strategies advanced by Abadie Since

Todd Comment 25

quantile and distile analyses may be more relevant in address-ing particular analytical questions or policy issues than CEFrsquosthe ability to extend the mechanisms of causal-effect esti-mation strategies in such directions should be welcomed byapplied researchers

I might suggest that Angristrsquos claim about an advantage ofquantile causal analysis over distile causal analysis being thatthe latter relies on application-specic values of ordinates (c)is perhaps oversold or underdeveloped In many interestingand policy-relevant applications focus on the application-speci c ordinates is fundamental What factors cause employ-ers to offer more than one health insurance plan to employeesWhat factors cause consumers of alcoholic beverages to drinkmore than two drinks per day What factors cause elderlyindividuals to utilize more than one prescription drug prod-uct What factors cause second earners to work more than17 hours per week Each of these concerns an importantreal-world causal question for which the application-specicordinatemdashwhether for institutional legal physiological orother reasonsmdashis a fundamental ldquopivot pointrdquo for the analysisAnalysis of such distile relationshipsmdashwhether in causal or inreduced-form settingsmdashis a potentially powerful method forinforming decision makers and it seems to me that its mer-its relative to quantile analyses would have to be judged on acase-by-case basis

4 CONCLUSIONS

Practitioners of applied microeconometrics will nd thatthe time invested in a careful reading of Angristrsquos article hassizable returns not merely because of the analytical and con-ceptual insights it offers but ultimately also because it sug-gests a variety of important practical innovations that appliedresearchers can exploit empirically to more clearly understandthe nature of causal relationships in their data Applied microresearchers in areas like health labor development public nance and such work commonly with data wherein (a) out-come measures are limitedmdashoften most importantly becausethey are nonnegative with mass points at 0mdashand (b) the rolesof conditioning covariates (X rsquos) are most conveniently summa-rized via regression methods Angristrsquos article provides appliedresearchers so endowed with a set of powerful tools for elic-iting causal inferences from such data Of course the punchline is familiar and inevitable All bets are off without theo-retically solid instruments But with such instruments in handthe strategies exposited by Angrist offer analysts a rich set ofperspectives on estimation

ADDITIONAL REFERENCE

Wooldridge J M (1999) ldquoEstimating Average Partial Effects Under Con-ditional Moment Independence Assumptionsrdquo mimeo Michigan StateUniversity Dept of Economics

CommentPetra Todd

Department of Economics University of Pennsylvania Philadelphia PA 19104 (petraathonasasupennedu)

This article considers ways of estimating the effect ofbinary endogenous regressors in models with limited depen-dent variables It questions the usefulness of conventionalestimation strategies aimed at recovering structural modelparameters and advocates the use of simple instrumental vari-ables ( IV) estimators as an alternative on the grounds thatthese estimators invoke weaker assumptions and often suf ceto answer questions of interest in empirical studies In theauthorrsquos view the main challenge facing empirical researchersis the problem of identi cation of ldquotreatment effectsrdquo throughIV Here I expand the discussion by considering two ques-tions (1) What are the limitations of the causal model asa paradigm for policy analysis (2) When simple estimatorsare suitable for answering a question of interest what are thetrade-offs that need to be considered in using them

1 WHEN IS THE ldquoCAUSAL MODELldquoA USEFULPARADIGM FOR POLICY ANALYSIS

The article gives the impression that most interesting ques-tions in economics can be answered within the context ofthe ldquocausal modelrdquo [variously attributed to Neyman (1923)Fisher (1935) Cox (1958) Roy (1951) and Rubin (1978)]The causal model is a very general framework that assumesthat there are potential outcome states associated with having

received some treatment and having received no treatment (orpossibly a different treatment) For each individual only oneof the potential outcome states is observed which leads to amissing-data problem in attempting to draw inferences aboutaspects of the treatment effect distribution

The language of potential outcomes is very general soalmost any economic problem could be formulated in theseterms However this does not mean that estimators proposedin the literature for the causal model are useful for all or evenmost economic problems A major limitation of the modelis that it assumes that the state ldquowith treatmentrdquo has beenobserved for at least a subset of people In medical trials orbiological experiments this assumption is probably reason-able but in economics we often are interested in evaluatingeffects of treatments that have not yet been implemented Forexample we might be interested in predicting the effect ofraising the age receiving Social Security bene ts to a new ageor the effect of introducing new term limits on welfare par-ticipation An implicit assumption needed to apply any of theidenti cation results described in this article is that both Y1

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

26 Journal of Business amp Economic Statistics January 2001

and Y0 are observed which would not be satis ed in manycases Thus the causal model and the estimators developedfor it are inadequate when it comes to any economic questioninvolving a state of the world that has not been observed

Despite these limitations treatment effect estimates aresometimes used to predict consequences of new treatments orof existing treatments for new populations When the policychange being considered is very close to changes observedin the data and is to be applied to similar individuals thenextrapolation may be reasonable But if the policy change is(a) of a magnitude outside the range of experience (b) alonga new dimension or (c) intended for a new population then itis an open question as to how reliable the estimates would beTreatment impact estimates are not intended to capture any-thing invariant to changing conditions The assumptions thatwould be required to justify extrapolations or generalizationsare often quite strong To some extent it is an illusion thatthese estimators allow for general kinds of inference underweak assumptions because the assumptions required to justifytheir application for particular purposes are often not madeexplicit

Angrist argues that the identi cation results for the localaverage treatment effect (LATE) estimator provide a ldquofounda-tion for credible causal inferencerdquo (p 5) and a ldquominimum con-troversy jumping-off point for any prediction exerciserdquo (p 5)But as discussed previously the causal model maintains sucha high level of generality that it offers little guidance for manykinds of problems It is important to recognize when and whennot the LATE estimator is likely to be useful This articleclaims that ldquoin practice estimates of LATE differ little fromestimates based on the stronger assumptions invoked to iden-tify effects on the entire treated populationrdquo (p 5) But suchan extrapolation is not justi ed under the theory and is surelyunlikely to hold in all circumstances As shown in a num-ber of works by various authors LATE provides the averageeffect of treatment for the group of ldquocompliersrdquo who are peo-ple induced by the instrument to receive treatment This groupdoes not except under very special circumstances correspondto the group of treated persons and may in fact representonly a small fraction of persons receiving treatment So notonly does LATE not suf ce for answering problems of the sortdescribed by (a) (b) and (c) it also does not generally pro-vide the average effect of treatment on the treatedmdashthe mostcommon parameter of interest in analyzing social programsAnother not-so-attractive feature of the estimator is that thepopulation of ldquocompliersrdquo is usually not identi able so wecannot say exactly whose treatment impact is being estimatedApplying the LATE parameter estimate to the full populationof treated persons would generally require ruling out certaintypes of heterogeneity in individual responses to treatmentas discussed by Heckman (1997) The explicit assumptionsthat could justify this type of extrapolation were shown byHeckman and Vytlacil (2000 in press)

2 ESTIMATORS SHOULD BE TAILOREDTO THEIR APPLICATION

A general theme throughout the article is that wheneverpossible researchers should restrain from imposing paramet-ric restrictions in estimation because the structure they impose

may be wrong resulting in biased estimates This view isexpressed for example in the discussion of the two-partmodel The article advocates the use of a two-part approachinstead of a conventional selection-model approach maintain-ing that the two-part approach is less likely to lead to biasedparameter estimates because it does not impose cross-equationrestrictions as a conventional tobit estimator would

How do researchers decide whether or not to impose restric-tions across estimating equations that are structurally linkedThe most agnostic approach would be to estimate both equa-tions completely nonparametrically but this approach is rarelypractical Nonparametric estimators are consistent under themost general conditions but in conventional size samples thestandard errors would be so large that the estimates would forpractical purposes be useless Researchers impose structurebecause they are willing to restrict the class of models underwhich their estimators are consistent in exchange for greaterprecision These ef ciency considerations would also apply inthe deciding whether or not to impose cross-equation restric-tions in estimating simultaneous-equations models If we areright about the structure there is an ef ciency gain fromimposing the restrictions If we are wrong imposing them maylead to bias When there are overidentifying restrictions it isat least possible to develop tests of the model speci cation thatcan serve as a guide in choosing a structural model that is sup-ported by the data This article presents a dichotomy betweenthe two-part model and a fully parametric traditional selec-tion speci cation but there are many other modeling choicesranging from fully nonparametric to semiparametric to fullyparametric In modest size samples ef ciency is an impor-tant consideration and a parametric model may be the mostappropriate one For very large samples fewer parametricapproaches are feasible Both the probability-of-working andthe hours-worked equations could for example be estimatedby a semiparametric method such as semiparametric leastsquares ( Ichimura 1993) Which estimation method is mostappropriate ultimately depends on how much data is avail-able and how many parameters are to be estimated It doesnot make sense to criticize the use of structured approaches infavor of more exible modeling approaches without regard tothe context in which they are being applied

3 ON THE GAP BETWEEN THEORY ANDEMPIRICAL PRACTICE

Finally another recurring emphasis in the article is on thevalue of shortcuts in empirical practice In the authorrsquos viewthere is a gap between the econometric theory and what isfeasible in practice and shortcuts and approximations help tobridge this gap To this end the article extols for examplethe virtues of ordinary least squares (OLS) as a way of approx-imating any unknown conditional expectation function with-out much concern as to whether a best linear approximation isa good approximation If the model has discrete regressors andis fully saturated it is well known that OLS is equivalent to anonparametric estimator But in other cases a linear approxi-mation could be highly biased and the bias will not go awayas the sample size gets large There are a variety of alterna-tive asymptotically unbiased estimators Along similar lines

Angrist Reply 27

the article suggests that in implementing a ldquoselection-on-observablesrdquo estimator it may be all right simply to ignore theweighting scheme Such claims seem to me fairly outrageousbecause failure to take into account weighting could lead tovery different estimates and would not provide a consistentestimator of the parameter of interest Heckman Ichimura andTodd (1998) discussed alternative estimators that are not dif -cult to implement and take into account the weighting scheme

Finally the appendix presents a variety of shortcuts forobtaining estimates and standard errorsmdashsome of which leadto inconsistent estimates When shortcuts lead to very crudeapproximations or inconsistent estimates I question their use-fulness I do think it unreasonable to expect that estimatesbased on simple IV estimators be accompanied by correctstandard errors that correctly take into account the in uence ofestimated parameters of estimated regressors I see little valuein reporting incorrect standard-errors estimates on the groundsthat it saves time in computation especially when there areseveral works long available in the literature that develop sim-ple ways of solving these kinds of problems (eg see Newey1984) Although there is surely a gap between the estimatorsdeveloped in the theoretic econometric literature and the esti-mators used in the empirical literature the gap is not as greatas this article supposes

ADDITIONAL REFERENCES

Cox D R (1958) The Planning of Experiments New York WileyFisher R A (1935) Design of Experiments London Oliver and BoydHeckman J J (1997) ldquo Instrumental Variables Evidence From Evaluating a

Job Training Programmerdquo Journal of Human Resources 32 441ndash462Heckman J Ichimura H and Todd P (1997) ldquoMatching as an Econometric

Evaluation Estimator Theory and Evidence on its Performance Appliedto the JTPA Program Part I Theory and Methodsrdquo Review of EconomicStudies 64 605ndash654

mdashmdashmdash (1998) ldquoMatching as an Econometric Evaluationrdquo Review of Eco-nomic Studies 65 261ndash294

Heckman J and Vytlacil E (2000) ldquoCausal Parameters Structural Equa-tions Treatment Effects and Randomized Evaluations of Social Programsrdquounpublished manuscript University of Chicago Dept of Economics

mdashmdashmdash (in press) ldquoLocal Instrumental Variablesrdquo in Nonlinear StatisticalInference Essays in Honor of Takeshi Amemiya eds C Hsiao KMorimune and J Powell Cambridge UK Cambridge University Press

Ichimura H (1993) ldquoSemiparametric Least Squares (SLS) and Weighted SLSEstimation of Single Index Modelsrdquo Journal of Econometrics 58 71ndash120

Neyman J (1923) ldquoStatistical Problems in Agricultural Experimentsrdquo Sup-plement to the Journal of the Royal Statistical Society 2 107ndash180

Newey W (1984) ldquoA Method of Moments Interpretation of Sequential Esti-matorsrdquo Economic Letters 14 201ndash206

Roy A (1951) ldquoSome Thoughts on the Distribution of Earningsrdquo OxfordEconomic Papers 3 135ndash146

Rubin D (1978) ldquoBaysian Inference for Causal Effects The Role of Ran-domizationrdquo The Annals of Statistics 6 34ndash58

ReplyJoshua D Angrist

I thank the discussants and the session organizer outgoingJBES editor Jeff Wooldridge who will be missed by associateeditors and JBES authors alike The comments in this sessionoffer a variety of viewpoints some sympathetic others morecritical I am pleased to be part of this stimulating and con-structive exchange and look forward to more interaction ofthis type in the future

Hahn My article starts by suggesting that a focus ontreatment effects simpli es the problem of causal inferencein limited dependent variable (LDV) models with endoge-nous regressors Endogeneityrsquos half-sister is the assumptionof unobserved individual effects since the omitted-variablesmotivation for xed effects in panel data is similar Hahnasks whether a shift in focus from index parameters to aver-age causal effects might also simplify the notoriously dif -cult problem of working with nonlinear models for binarypanel data (eg as in Card and Sullivan 1988) Hahn showsthat here too the causal-effects framework pays off demon-strating that average causal effects are easily estimated in a xed-effects probit model with a binary regressor even thoughthe probit coef cient is not A second noteworthy featureof Hahnrsquos comment is his observation that identi cation inthis framework turns on careful modeling of the relationshipbetween the assignment variable and the xed effects Tradi-tional econometric models are primarily concerned with thestochastic process generating outcomes A shift of attentiontoward modeling the assignment mechanism is consistent with

the notion that observational studies should try to mimic thesort of controlled comparisons generated by experiments

Imbens Imbens begins by endorsing the view that the sub-stantive goals of empirical research are better served by thepotential outcomescausal framework than by structural mod-eling Going one step further Imbens notes that the problemswith ldquomodel- rst econometricsrdquo are not limited to LDVrsquos Inthe same spirit he sensibly wonders (as did Wooldridge 1992)why so much attention has been devoted to estimating theBoxndashCox transformation model which does not actually iden-tify E6Y mdashX7 without a distributional assumption Imbens alsonicely articulates the rationale for looking at the twins exam-ple and notes that this is similar to the rationale for extrap-olating from clinical trials in narrow subpopulations And infact I believe the ldquotwins experimentrdquo is likely to have pre-dictive value in a variety of situations though any extrapo-lation will probably be improved by control for demographiccharacteristics

In spite of (or perhaps because of) our years of discussionand collaboration Imbens and I still do not agree on certainthings He feels strongly that inference with discrete outcomesshould use models and procedures that respect inherent non-

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

28 Journal of Business amp Economic Statistics January 2001

linearities The HIRZ procedure discussed in his comment isan intriguing way to do this and my article discusses othersI remain unconvinced however that nonlinearity is of majorsubstantive importance in empirical work with binary regres-sors The proof of the pudding I suppose would be to showthat H IRZ or some of the methods I discuss in the article leadto better predictions or decisions than the linear workhorse

Mof tt Mof t provides a valuable counterpointHe beginsby arguing that latent-index models have more predictivepower than reduced-form causal models Such claims seemhard to establish a priori though I do not doubt that in somecases this may be true But empirical strategies in which thebulk of the research effort goes into estimating index coef -cients may fail to answer the most basic question in empiricalwork What happened here Causal models and local aver-age treatment effect (LATE) answer this question while indexcoef cients alone are not suf cient for causal inference Iagree however that structural modeling can be useful forgoing beyond basic questions and I have used index and otherstructural models for this purpose Mof t is right howeverin suggesting that I nd the model-free analysis he refers toas ldquoexploratoryrdquo more interesting than structural extrapolationThis seems to me to be where the potential for real discoverylies

Mof tt asks tough but fair questions about the twins appli-cation He is correct to point out that twinning is not a manip-ulable policy instrument That is one reason I do not limit theanalysis to reduced-form effects of multiple births Yet to mymind the best evidence we have on the controversial questionof the consequences of out-of-wedlock childbearing comesfrom the twins experiment (Bronars and Grogger 1994) Andthe economic consequences of childbearing are of interest inmany other contexts As noted in the introduction to my arti-cle the twins experiment can help determine whether fertilityreductions are responsible for trend increases in female laborsupply Beyond this sort of question the twins results seemuseful for assessing the economic consequences of increasedabortion access in the United States the dissemination of con-traceptive implants in developing countries (eg see KaneFarr and Janowitz 1990) and various programs to encour-age or require implant use by welfare mothers (Haveman andWolfe 1993)

Finally it is fair to question the external validity of twinsestimates as Mof t does toward the end of his commentAngrist and Evans (1998) showed that the labor-supply con-sequences of twinning can be reconciled with estimates ofthe labor-supply consequences of childbearing resulting froma very different natural experiment induced by preferences formixed-sex sibling pairs This suggests that results from bothexperiments may be quite robust and may even be ldquostructuralrdquoin the sense that they tell us something general about the eco-nomic response to childbearing

Mullahy Mullahy brings a welcome health economistrsquosperspective to the discussion Health economists have longgrappled with the issues discussed in my article and asI hope I made clear I drew ideas and inspiration fromMullahyrsquos work on models with nonnegative dependent vari-ables Mullahyrsquos comment further highlights exactly the sort

of utilitarian trade-off that underlies the choice of empiricalstrategy On one hand we would like our models to be con-sistent with any functional form restrictions known to be trueOn the other attempts at this may complicate estimation andinference for little payoff Worst of all the technical complex-ity of nonlinear models seems to cause authors to ldquowax econo-metricrdquo an effort that may come at the expense of attention tosubstantive issues of importance (an example here is the longseries of papers discussing exactly how data from the RANDhealth-insurance experiment should be analyzed) Mullahyrsquosheretical discussions of ldquoone-part modelsrdquo here and in earlierwork represent a refreshing effort to keep the empirical trainon track Mullahyrsquos point that application-specic distributionordinates can provide more useful summary information thanquantiles is also clearly worth thinking about

Todd Todd provides a useful and clear overview of thecausal framework which she suggests is better suited formedical questions than economic questions The thrust of herargument is that there is too much heterogeneity in economicinterventions and outcomes for the causal approach to bevery useful for economists But my perception is that medicalresearch is even more explicitly concerned with heterogene-ity than economics This is evidenced by the many quali ca-tions regarding patient characteristics conditions and inter-actions in medical treatment protocols A difference betweenthe prevailing medical research paradigm and the traditionaleconometric perspective however is whether the question ofheterogeneous response is to be addressed by better theoryor more evidence Of course theory helps us decide whereto look for heterogeneity but as Rosenbaum (1999 p 301)put it ldquo If a treatment has substantially different effects in dif-ferent types of people then we need to discover this (ital-ics mine) Rosenbaum also noted that ldquorandomized clinicaltrials routinely look for heterogeneous treatment effects andthey do this without representative samples from nationalpopulationsrdquo

On the technical side Todd suggests I advocated use ofthe two-part model because it exibly approximates nonlinearfunctional forms for nonnegative outcomes Although I notedthe exibility the upshot of my discussion was that the two-part model is poorly suited to causal inference because of thedif culty of interpreting part 2 Finally Todd points out thefailure to allow for sampling variance in the scaling factorsused to generate effects for nonlinear models This is indeed ahard-to-defend shortcut though I shall take a stab at defenseby noting that randomness of the scaling factor which typi-cally estimates a sample average is likely to be a minor sourceof sampling uncertainty

ADDITIONAL REFERENCES

Card D E and Sullivan D (1988) ldquoEffects of Subsidized Training onMovements In and Out of Employmentrdquo Econometrica 56 497ndash530

Haveman R and Wolfe B (1993) ldquoChildrenrsquos Prospects and ChildrenrsquosPolicyrdquo Journal of Economic Perspectives 7 153ndash174

Kane T Farr G and Janowitz B (1990) ldquo Initial Acceptability of Con-traceptive Implants in Four Developing Countriesrdquo International FamilyPlanning Perspectives 16 49ndash54

Rosenbaum P R (1999) ldquoChoice as an Alternative to Control in Observa-tional Studiesrdquo Statistical Science 14 259ndash304

Wooldridge J (1992) ldquoSome Alternatives to the BoxndashCox RegressionModelrdquo International Economic Review 33 935ndash955

Page 21: New Estimation of Limited Dependent Variable Models With Dummy …piketty.pse.ens.fr/…ecoineg/articl/Angrist2001b.pdf · 2011. 12. 5. · sequences of changes in contraceptive technology

22 Journal of Business amp Economic Statistics January 2001

unlike the LPM Angrist does not make suf cient quali ca-tions to these assertions and they are indeed false for theL IM A random coef cient sbquo in the LIM is permissible with-out distributional assumption ( Ichimura and Thompson 1998)and the coef cients in the standard constant-coef cient L IMare identi ed semiparametrically (ie under unknown distri-bution of hellip) under the restriction that d and hellip are independentan assumption generally made for causal interpretation of theLPM model as well (Manski 1988 Horowitz 1993 Powell1994) In addition as already noted in a fully saturated modelthe LPM and the L IM are equivalent in any case

Is the LATE an Interesting Statistic The Angrist assertionthat LATE is the only interesting statistic if d is endogenousor perhaps the only statistic worth bothering about because itis the only one identi ed by the data has the same partialvalidity as his preference for the LPM over the LIM IV isone of the most popular techniques in applied econometricsand has a natural intuition indeed one that does not requirethe LATE interpretation per se It is one of the most usefultools in the applied economistrsquos kit The simple IV estimatordiscussed by Angrist makes minimal assumptions and givesminimal information back to the analyst as a result But tosay that it produces the only statistic of interest does not havedefense both because it is equivalent to an LIM that is fullysaturated and may t the data worse than an LIM if not satu-rated and because it implies that there is no value to makingadditional assumptions to obtain additional information

The limiting nature of the LATE statistic is again in itsinattention to nonlinearities interpolation and extrapolationThat nonlinearities can be important is as true in this caseas it was in the exogenous case just discussed when thereare additional x covariates and when the model is not satu-rated As for interpolation and extrapolation the LATE statis-tic 6 Ny4z D z15ƒ Ny4z D z057=6 Nd4z D z15ƒ Nd4z D z057 denotes theeffect on a change in z from value z0 to z1 on E4y5 scaledby the change in the E4d5 It does not have implications forthe effect on E4y5 of any other change in z or for a changein any other policy variable Thus it is not particularly usefulfor policy changes other than a change of z0 to z1 in the sameenvironment (ie conditioned on the same x) This stands incontrast to an L IM for E4dmdashx1 z5mdashor any parametric modelfor d for that mattermdashthat allows for the change from z0 toz1 to inform policy makers and others of the likely changesof other values of z and of other variables x In fact theLATE statistic which is not a parameter in the usual senseof the word can always be expressed as a nonlinear combi-nation of parameters of a latent-variable model but not viceversa hence the latter model is more general than the former[See Heckman and Vytlacil (1999) for a discussion of how toexpress the LATE in terms of an L IM see also Angrist andImbens (1999) and Heckman (1997 1999) for a discussion ofsome of these issues]

In addition Angristrsquos focus on the advantages of the LATEfor model-free estimation of the effect of a change in z fromz0 to z1 on y reveals a philosophical weakness in his posi-tion This is because the effect of a change in z from z0 toz1 on y can be estimated from the reduced form the struc-tural form is not needed It is not explained why the statistic

6E4ymdashz D z15 ƒ E4ymdashz D z057 is not the only quantity of inter-est The answer that most economists would give is to say thatstructural coef cients are of interest because they can be usedto extrapolate the effects of changes in d on y to other changesin z than from z0 to z1 in the same environment indeedthis is the basic rationale for structural estimation given inthe famous essay by Marschak [1953 see also Christ (1994)and Heckman (2000) for discussions Marschakrsquos argumentwas more subtle than this arguing that restrictions on struc-tural parameters are needed for out-of-sample prediction butthis translates into restrictions on the reduced form as well]However by ruling out the possibility of learning about anyof the effects of changes in z or d other than those inducedby a change in z from z0 to z1 Angrist removes the need todo structural estimation in the rst place Angrist implicitlyassumes that a structural coef cient of interest exists on thevariable d in the y equation and that that coef cient has mean-ing independent of a particular change in z but the rest of hisdiscussion contravenes that interest

A reduced-form research program which is where theAngrist position leads is of considerable value There is noth-ing wrong with a research program to collect a large body ofinformation on the effects of a wide variety of changes in dif-ferent policy variables 4z5 from particular values (z0) to otherparticular values (z1) each taking place in particular environ-ments (x) But if the research program stops there very littleuseful has been learned other than a collection of facts aboutthe effects of particular policies in particular environments

Is the Application in the Angrist Article Interesting TheAngrist article uses a recently popular exclusion restriction ornatural experiment to identify the effect of fertility on laborsupply and to illustrate his preferred methodsmdashnamely theuse of twins The use of twins as an exclusion restrictionstands in contrast to variations in a government policy or lawwhich are often used as exclusion restrictions both in recentwork on natural experiments as well as a much older litera-ture that uses cross-sectional and overtime variation in state-or country-speci c taxes and transfer rules to identify modelparameters However although it can be of interest to estimatereduced-form policy effects 6E4ymdashz D z15 ƒ E4ymdashz D z057 asjust discussed it is not so obvious that there is any interest inestimating the effects of having twins on the expected valueof y Creating twins is not a variable directly subject to policymanipulation

The dif culty in the use of twins arises for the same reasonalready discussedmdashnamely that the preference for model-freeestimation of policy effects leads to a lack of interest in thefunction E4dmdashz1 x5 and hence to a lack of interest in what canbe learned from a study of twins about the effects of someother more relevant policy variable that might be manipulatedA model for the E4dmdashz1 x5 is needed to make that connectionand hence to make a study of twins of any interest yet that isintentionally eschewed in the approach proposed by AngristOnce again this leads to an uninteresting and quite limitingset of exercises

Conclusions The set of methods laid out in the Angristarticle at least if a more relevant instrument were used yieldsa set of model-free statistics that are suitable for exploratorywork on a research question prior to the (possibly nonlinear)

Mullahy Comment 23

structural estimation and prediction that should be the ultimateobject of applied economics research

ACKNOWLEDGMENTS

This comment is a revision of remarks delivered at theJoint Statistical Meetings Baltimore August 1999 I thankCarl Christ James Heckman Joel Horowitz Michael KeaneThomas Mroz Geert Ridder and Edward Vyltacil for com-ments with the usual disclaimer that the views expressed hereshould not be taken as representing those of any of theseindividuals

ADDITIONAL REFERENCES

Angrist J and Imbens G (1999) Comment on ldquo Instrumental VariablesA Study of Implicit Behavioral Assumptions Used in Making Pro-gram Evaluationsrdquo by J J Heckman Journal of Human Resources 34283ndash827

Blundell R and Smith R (1993) ldquoSimultaneous Microeconometric ModelsWith Censored or Qualitative Dependent Variablesrdquo in Handbook of Statis-tics (Vol 1) eds G S Maddala C R Rao and H D Vinod AmsterdamNorth-Holland pp 117ndash141

Christ C (1994) ldquoThe Cowles Commissionrsquos Contributions to Econometricsat Chicago 1939ndash1955rdquo Journal of Economic Literature 32 30ndash59

Heckman J (1997) ldquoInstrumental Variables A Study of Implicit BehavioralAssumptions Used in Making Program Evaluationsrdquo Journal of HumanResources 32 441ndash462

mdashmdashmdash (1999) ldquo Instrumental Variables A Reply to Angrist and ImbensrdquoJournal of Human Resources 34 828ndash837

mdashmdashmdash (2000) ldquoCausal Parameters and Policy Analysis in Economics ATwentieth Century Retrospectiverdquo Quarterly Journal of Economics 11545ndash97

Heckman J and MaCurdy T (1985) ldquoA Simultaneous Equations LinearProbability Modelrdquo International Economic Review 18 21ndash37

Heckman J and Vytlacil E (1999) ldquoLocal Instrumental Variables andLatent Variable Models for Identifying and Bounding Treatment EffectsrdquoProceedings of the National Academy of Sciences 96 4730ndash4734

Horowitz J (1993) ldquoSemiparametric and Nonparametric Estimation of Quan-tal Response Modelsrdquo in Handbook of Statistics (Vol 11) eds G SMaddala C R Rao and H D Vinod Amsterdam North-Holland pp45ndash72

Ichimura H and Thompson T (1998) ldquoMaximum Likelihood Estimation ofa Binary Choice Model With Random Coef cients of Unknown Distribu-tionrdquo Journal of Econometrics 86 269ndash295

Manski C (1988) ldquoIdenti cation of Binary Choice Modelsrdquo Journal of theAmerican Statistical Association 83 729ndash738

Marschak J (1953) ldquoEconomic Measurements for Policy and Predictionrdquoin Studies in Econometric Method eds W Hood and T KoopmansNew York Wiley pp 1ndash26

Powell J (1994) ldquoEstimation of Semiparametric Modelsrdquo in Handbook ofEconometrics (Vol 4) eds R Engle and D McFadden Amsterdam North-Holland pp 2443ndash2521

CommentJohn Mullahy

Departments of Preventive Medicine and Economics University of Wisconsin Madison WI 53705 andNational Bureau of Economic Research Cambridge MA (jmullahyfacstaffwiscedu)

Articles bearing titles containing phrases like ldquoSimpleStrategies for Empirical Practicerdquo are often wolves masquerad-ing as sheep the methodologies they devise being far fromldquosimplerdquo and far removed from what most humble practition-ers perceive in the realm of ldquoempirical practicerdquo Not so hereJoshua Angrist has written a tight comprehensive article thatis stimulating and important yet also eminently useful

As typi es much of Angristrsquos work the main concern in thisarticle is on how to elicit interesting characterizations of causaleffects from microdata with the particular twist here being afocus on outcomes measured as ldquolimited dependent variablesrdquo(LDVrsquos) The main take-away message I glean from this arti-cle is that applied analysts working on causal effect or struc-tural analyses in LDV contextsmdashtraditionally vexing contextsinsofar as consistent estimation and inference are concernedmdashhave considerable grounds for optimism Angrist lays out andinterprets systematically a set of issues and methods that pro-vide practitioners with a variety of implementable strategiesthat might be brought to bear on such empirical problemsA corollary take-away message is that in some respects ldquothisstuff is not really as hard as wersquove tended to make itrdquo withAngrist demonstrating for instance the potential merits ofsimple linear instrumental variable ( IV) methods for estimat-ing causal effects in a variety of LDV contexts In no eventcan applied analysts escape the requirement of nding theo-retically sound instruments but Angrist exposits compellingly

a broad description of how such instruments might be used toelicit interesting causal inferences in LDV contexts

I have no real quibbles with any of the substance ofAngristrsquos arguments Rather I will devote my commentarymainly to amplifying and expanding several of the themes hedevelops throughout the article

1 FOCUS ON CAUSAL OR PARTIAL EFFECTSVERSUS FOCUS ON CONDITIONAL

EXPECTATIONS FUNCTIONS

It seems fair to suggest that much of applied microecono-metrics is concerned primarily with understanding the signsand magnitudes of quantities like iexclE6Y mdashX1 D7=iexcl4X1 D5 oriexclE6Y mdashX7=iexclX0 Yet much of the actual dirty work in under-taking causal analysis in LDV contexts seems to result fromdecisions to undertake analyses in settings where E6Y mdashX1D7

or E6Y mdashX7 are restricted to be positive without a priori restric-tions on parameter values [thus the tradition of using tobit-class conditional expectations functions (CEFrsquos) two-stepselection models exponential CEFrsquos etc]

As practitioners however we should pause to assesswhether speci cations akin to E6Y mdashX1D7 D exp4Xsbquo C D5 gt

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

24 Journal of Business amp Economic Statistics January 2001

0 are ultimately buying anything important with respect tothe rst-order questions being explored It is only com-mon sense that analysts should spend relatively more energyworking on understanding quantities of primary interest(eg iexclE6Y mdashX1D7=iexcl4X1 D5) and worrying relatively lessabout the formulation and implications of assumptions (egE6Y mdashX1 D7 gt 0) that may have little or no ultimate bearingon the particular questions being addressed This is espe-cially so when the latter effort (restricting via functional formchoice E6Y mdashX1 D7 gt 0) will tend to complicate rather thanfacilitate the former effort (understanding the partial or causaleffects) This is not to suggest that quantities like E6Y mdashX7 andE6Y mdashX1 D7 may not themselves be interesting to estimate (egmodeling conditional mean health care expenditures for use inforecasting future expenditure levels) but whether E6Y mdashX1D7

is parametrically and parsimoniously better approximated byexp4XsbquoC D5 or by XsbquoC D or by some other g4Xsbquo CD5

or h4X1D1 ƒ5 ismdasheven in LDV settingsmdashultimately an empir-ical matter to be assessed by goodness of t (eg via condi-tional moment tests)

I note this important angle somewhat sheepishly sincein both articles of mine that Angrist cites (Mullahy 19971998) there is considerable emphasis on the use of exponen-tial CEF speci cations as enforcers of positive conditionalmean ldquorequirementsrdquo In fact the main motivation for theearlier article was to nd a general approach to obtainingconsistent estimators of structural parameters in parametricsettings less distribution-bound than tobit-class models whenthe key requirement or side constraint is E6Y mdashX1 D7 gt 0 withthe resulting strategy a more or less brute-force method foraccomplishing this in a nonlinear IV setting Angristrsquos arti-cle takes this idea quite a bit further and demonstrates how(and with the addition of a further normalization) the struc-tural parameter estimation approach I discussed can be devel-oped into a model to analyze causal effects when the latter arecharacterized as ldquoproportional treatment effectsrdquo

Whether a proportional as opposed to an additive treatmenteffect is interesting (it may or may not be) remains to be seenin any particular application Wooldridgersquos recent work onaverage causal effects (ACErsquos Wooldridge 1999) would seempertinent in contexts in which the role of the X covariates ismore ldquointrusiverdquo than in the simple additivelinear treatment-effect setup As a general matter the extent to which infer-ences about causal effectsmdashhowever formulatedmdashhinge onthe inclusion of particular covariates as conditioning regres-sors is an interesting and potentially important angle thatAngristrsquos article begins to explore and is one that likely tomerit additional research For the particular issues at handthe very nature of proportional treatment effects brings intofocus explicitly the role of the X rsquos so how their correlationwith the treatment indicators 4D5 in uences inferences aboutthe causal impacts of the latter would seem to be a rst-orderconsideration

2 TWO-PART MODELS

I found Angristrsquos analysis of the interpretation of causaleffects in two-part models to be extremely illuminatingIn a reduced-form setting I argued (Mullahy 1998) that amid

all the concerns about zeros robustness outliers transforma-tion retransformation and such that have typically attendedmodeling efforts involving two-part models it would be sur-prising in a well-structured empirical investigation if therewere not stated or lurking concerns about partial effectsiexclE6Y mdashX7=iexclX andor CEFrsquos E6Y mdashX7 themselves It may be insome cases that such concerns are not articulated and it mayeven be in less well-structured problems that they are not obvi-ous to analysts themselves Yet for instance unless robustlyestimated sbquorsquos from a log-linear part-2 speci cation of a two-part model could inform a rst-order question about a par-tial effect or a CEF of what practical use were they likelyto be My rather modest and rather obvious argument wasjust that the concerns about the iexclE6Y mdashX7=iexclX or the E6Y mdashX7mdashif they are the reason the analysis is being conducted in the rst placemdashshould enjoy rst-order prominence in the estima-tion exercise and that concerns about zeros outliers and suchshould be relegated in some sense to second-order status Insome cases two-part models will serve nicely to address such rst-order concernsmdashat least in reduced-form settingsmdashbut inothers they will not

Angrist appears to have some sympathy with such argu-ments but more importantly he provides a valuable serviceto users of two-part models by unearthing one major limi-tation in the analysis of causal effects The problem is notwith part 1 (ie the logit probit or linear probability com-ponent) because part 1 of the two-part model falls within themain lines of Angristrsquos LDV analysis Rather the complicationarises with part 2 of the two-part model in which the additionalldquoY gt 0rdquo conditioning arises In the standard (reduced-form)two-part model quantities like E6Y mdashX1Y gt 07 are prominentand identi ed readily But in the counterfactual settings inwhich causal effects are manifested wherein the realized Y

arises from self-selection into or out of treatment such selec-tion effects introduce an ambiguity [Eq (12)] and therebyconfound the ability to glean causal effects from part-2 esti-mates (unless as Angrist notes censored regression methodsare used but Angrist also offers some compelling argumentsagainst blind reliance on censored regression approaches) Assuch if estimation of and inference about causal effects arethe primary analytical concerns in data settings with Y para 0and Pr4Y D 05 gt 01 analysts may be well advised to avoidtwo-part modeling strategies and pursue some of the moredirect linear and nonlinear estimation approaches discussedby Angrist in which one-part estimation approachesmdashzerosand allmdashyield estimates of causal effects that will be directlyinterpretable

3 BEYOND CEFrsquoS

One noteworthy feature of the methods Angrist discussesmdashwith the main results attributed to Abadie (1999)mdashare theirapplicability to estimation of causal effects for characteristicsof the conditional distribution rdquo4Y mdashX1D5 beyond just CEFrsquosEstimation of causal or treatment effects for conditional quan-tiles (QTErsquos) and conditional distribution function ordinates(ldquodistilesrdquo) Pr4Y lt cmdashX1D5 is demonstrated to t properlywithin the weighting strategies advanced by Abadie Since

Todd Comment 25

quantile and distile analyses may be more relevant in address-ing particular analytical questions or policy issues than CEFrsquosthe ability to extend the mechanisms of causal-effect esti-mation strategies in such directions should be welcomed byapplied researchers

I might suggest that Angristrsquos claim about an advantage ofquantile causal analysis over distile causal analysis being thatthe latter relies on application-specic values of ordinates (c)is perhaps oversold or underdeveloped In many interestingand policy-relevant applications focus on the application-speci c ordinates is fundamental What factors cause employ-ers to offer more than one health insurance plan to employeesWhat factors cause consumers of alcoholic beverages to drinkmore than two drinks per day What factors cause elderlyindividuals to utilize more than one prescription drug prod-uct What factors cause second earners to work more than17 hours per week Each of these concerns an importantreal-world causal question for which the application-specicordinatemdashwhether for institutional legal physiological orother reasonsmdashis a fundamental ldquopivot pointrdquo for the analysisAnalysis of such distile relationshipsmdashwhether in causal or inreduced-form settingsmdashis a potentially powerful method forinforming decision makers and it seems to me that its mer-its relative to quantile analyses would have to be judged on acase-by-case basis

4 CONCLUSIONS

Practitioners of applied microeconometrics will nd thatthe time invested in a careful reading of Angristrsquos article hassizable returns not merely because of the analytical and con-ceptual insights it offers but ultimately also because it sug-gests a variety of important practical innovations that appliedresearchers can exploit empirically to more clearly understandthe nature of causal relationships in their data Applied microresearchers in areas like health labor development public nance and such work commonly with data wherein (a) out-come measures are limitedmdashoften most importantly becausethey are nonnegative with mass points at 0mdashand (b) the rolesof conditioning covariates (X rsquos) are most conveniently summa-rized via regression methods Angristrsquos article provides appliedresearchers so endowed with a set of powerful tools for elic-iting causal inferences from such data Of course the punchline is familiar and inevitable All bets are off without theo-retically solid instruments But with such instruments in handthe strategies exposited by Angrist offer analysts a rich set ofperspectives on estimation

ADDITIONAL REFERENCE

Wooldridge J M (1999) ldquoEstimating Average Partial Effects Under Con-ditional Moment Independence Assumptionsrdquo mimeo Michigan StateUniversity Dept of Economics

CommentPetra Todd

Department of Economics University of Pennsylvania Philadelphia PA 19104 (petraathonasasupennedu)

This article considers ways of estimating the effect ofbinary endogenous regressors in models with limited depen-dent variables It questions the usefulness of conventionalestimation strategies aimed at recovering structural modelparameters and advocates the use of simple instrumental vari-ables ( IV) estimators as an alternative on the grounds thatthese estimators invoke weaker assumptions and often suf ceto answer questions of interest in empirical studies In theauthorrsquos view the main challenge facing empirical researchersis the problem of identi cation of ldquotreatment effectsrdquo throughIV Here I expand the discussion by considering two ques-tions (1) What are the limitations of the causal model asa paradigm for policy analysis (2) When simple estimatorsare suitable for answering a question of interest what are thetrade-offs that need to be considered in using them

1 WHEN IS THE ldquoCAUSAL MODELldquoA USEFULPARADIGM FOR POLICY ANALYSIS

The article gives the impression that most interesting ques-tions in economics can be answered within the context ofthe ldquocausal modelrdquo [variously attributed to Neyman (1923)Fisher (1935) Cox (1958) Roy (1951) and Rubin (1978)]The causal model is a very general framework that assumesthat there are potential outcome states associated with having

received some treatment and having received no treatment (orpossibly a different treatment) For each individual only oneof the potential outcome states is observed which leads to amissing-data problem in attempting to draw inferences aboutaspects of the treatment effect distribution

The language of potential outcomes is very general soalmost any economic problem could be formulated in theseterms However this does not mean that estimators proposedin the literature for the causal model are useful for all or evenmost economic problems A major limitation of the modelis that it assumes that the state ldquowith treatmentrdquo has beenobserved for at least a subset of people In medical trials orbiological experiments this assumption is probably reason-able but in economics we often are interested in evaluatingeffects of treatments that have not yet been implemented Forexample we might be interested in predicting the effect ofraising the age receiving Social Security bene ts to a new ageor the effect of introducing new term limits on welfare par-ticipation An implicit assumption needed to apply any of theidenti cation results described in this article is that both Y1

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

26 Journal of Business amp Economic Statistics January 2001

and Y0 are observed which would not be satis ed in manycases Thus the causal model and the estimators developedfor it are inadequate when it comes to any economic questioninvolving a state of the world that has not been observed

Despite these limitations treatment effect estimates aresometimes used to predict consequences of new treatments orof existing treatments for new populations When the policychange being considered is very close to changes observedin the data and is to be applied to similar individuals thenextrapolation may be reasonable But if the policy change is(a) of a magnitude outside the range of experience (b) alonga new dimension or (c) intended for a new population then itis an open question as to how reliable the estimates would beTreatment impact estimates are not intended to capture any-thing invariant to changing conditions The assumptions thatwould be required to justify extrapolations or generalizationsare often quite strong To some extent it is an illusion thatthese estimators allow for general kinds of inference underweak assumptions because the assumptions required to justifytheir application for particular purposes are often not madeexplicit

Angrist argues that the identi cation results for the localaverage treatment effect (LATE) estimator provide a ldquofounda-tion for credible causal inferencerdquo (p 5) and a ldquominimum con-troversy jumping-off point for any prediction exerciserdquo (p 5)But as discussed previously the causal model maintains sucha high level of generality that it offers little guidance for manykinds of problems It is important to recognize when and whennot the LATE estimator is likely to be useful This articleclaims that ldquoin practice estimates of LATE differ little fromestimates based on the stronger assumptions invoked to iden-tify effects on the entire treated populationrdquo (p 5) But suchan extrapolation is not justi ed under the theory and is surelyunlikely to hold in all circumstances As shown in a num-ber of works by various authors LATE provides the averageeffect of treatment for the group of ldquocompliersrdquo who are peo-ple induced by the instrument to receive treatment This groupdoes not except under very special circumstances correspondto the group of treated persons and may in fact representonly a small fraction of persons receiving treatment So notonly does LATE not suf ce for answering problems of the sortdescribed by (a) (b) and (c) it also does not generally pro-vide the average effect of treatment on the treatedmdashthe mostcommon parameter of interest in analyzing social programsAnother not-so-attractive feature of the estimator is that thepopulation of ldquocompliersrdquo is usually not identi able so wecannot say exactly whose treatment impact is being estimatedApplying the LATE parameter estimate to the full populationof treated persons would generally require ruling out certaintypes of heterogeneity in individual responses to treatmentas discussed by Heckman (1997) The explicit assumptionsthat could justify this type of extrapolation were shown byHeckman and Vytlacil (2000 in press)

2 ESTIMATORS SHOULD BE TAILOREDTO THEIR APPLICATION

A general theme throughout the article is that wheneverpossible researchers should restrain from imposing paramet-ric restrictions in estimation because the structure they impose

may be wrong resulting in biased estimates This view isexpressed for example in the discussion of the two-partmodel The article advocates the use of a two-part approachinstead of a conventional selection-model approach maintain-ing that the two-part approach is less likely to lead to biasedparameter estimates because it does not impose cross-equationrestrictions as a conventional tobit estimator would

How do researchers decide whether or not to impose restric-tions across estimating equations that are structurally linkedThe most agnostic approach would be to estimate both equa-tions completely nonparametrically but this approach is rarelypractical Nonparametric estimators are consistent under themost general conditions but in conventional size samples thestandard errors would be so large that the estimates would forpractical purposes be useless Researchers impose structurebecause they are willing to restrict the class of models underwhich their estimators are consistent in exchange for greaterprecision These ef ciency considerations would also apply inthe deciding whether or not to impose cross-equation restric-tions in estimating simultaneous-equations models If we areright about the structure there is an ef ciency gain fromimposing the restrictions If we are wrong imposing them maylead to bias When there are overidentifying restrictions it isat least possible to develop tests of the model speci cation thatcan serve as a guide in choosing a structural model that is sup-ported by the data This article presents a dichotomy betweenthe two-part model and a fully parametric traditional selec-tion speci cation but there are many other modeling choicesranging from fully nonparametric to semiparametric to fullyparametric In modest size samples ef ciency is an impor-tant consideration and a parametric model may be the mostappropriate one For very large samples fewer parametricapproaches are feasible Both the probability-of-working andthe hours-worked equations could for example be estimatedby a semiparametric method such as semiparametric leastsquares ( Ichimura 1993) Which estimation method is mostappropriate ultimately depends on how much data is avail-able and how many parameters are to be estimated It doesnot make sense to criticize the use of structured approaches infavor of more exible modeling approaches without regard tothe context in which they are being applied

3 ON THE GAP BETWEEN THEORY ANDEMPIRICAL PRACTICE

Finally another recurring emphasis in the article is on thevalue of shortcuts in empirical practice In the authorrsquos viewthere is a gap between the econometric theory and what isfeasible in practice and shortcuts and approximations help tobridge this gap To this end the article extols for examplethe virtues of ordinary least squares (OLS) as a way of approx-imating any unknown conditional expectation function with-out much concern as to whether a best linear approximation isa good approximation If the model has discrete regressors andis fully saturated it is well known that OLS is equivalent to anonparametric estimator But in other cases a linear approxi-mation could be highly biased and the bias will not go awayas the sample size gets large There are a variety of alterna-tive asymptotically unbiased estimators Along similar lines

Angrist Reply 27

the article suggests that in implementing a ldquoselection-on-observablesrdquo estimator it may be all right simply to ignore theweighting scheme Such claims seem to me fairly outrageousbecause failure to take into account weighting could lead tovery different estimates and would not provide a consistentestimator of the parameter of interest Heckman Ichimura andTodd (1998) discussed alternative estimators that are not dif -cult to implement and take into account the weighting scheme

Finally the appendix presents a variety of shortcuts forobtaining estimates and standard errorsmdashsome of which leadto inconsistent estimates When shortcuts lead to very crudeapproximations or inconsistent estimates I question their use-fulness I do think it unreasonable to expect that estimatesbased on simple IV estimators be accompanied by correctstandard errors that correctly take into account the in uence ofestimated parameters of estimated regressors I see little valuein reporting incorrect standard-errors estimates on the groundsthat it saves time in computation especially when there areseveral works long available in the literature that develop sim-ple ways of solving these kinds of problems (eg see Newey1984) Although there is surely a gap between the estimatorsdeveloped in the theoretic econometric literature and the esti-mators used in the empirical literature the gap is not as greatas this article supposes

ADDITIONAL REFERENCES

Cox D R (1958) The Planning of Experiments New York WileyFisher R A (1935) Design of Experiments London Oliver and BoydHeckman J J (1997) ldquo Instrumental Variables Evidence From Evaluating a

Job Training Programmerdquo Journal of Human Resources 32 441ndash462Heckman J Ichimura H and Todd P (1997) ldquoMatching as an Econometric

Evaluation Estimator Theory and Evidence on its Performance Appliedto the JTPA Program Part I Theory and Methodsrdquo Review of EconomicStudies 64 605ndash654

mdashmdashmdash (1998) ldquoMatching as an Econometric Evaluationrdquo Review of Eco-nomic Studies 65 261ndash294

Heckman J and Vytlacil E (2000) ldquoCausal Parameters Structural Equa-tions Treatment Effects and Randomized Evaluations of Social Programsrdquounpublished manuscript University of Chicago Dept of Economics

mdashmdashmdash (in press) ldquoLocal Instrumental Variablesrdquo in Nonlinear StatisticalInference Essays in Honor of Takeshi Amemiya eds C Hsiao KMorimune and J Powell Cambridge UK Cambridge University Press

Ichimura H (1993) ldquoSemiparametric Least Squares (SLS) and Weighted SLSEstimation of Single Index Modelsrdquo Journal of Econometrics 58 71ndash120

Neyman J (1923) ldquoStatistical Problems in Agricultural Experimentsrdquo Sup-plement to the Journal of the Royal Statistical Society 2 107ndash180

Newey W (1984) ldquoA Method of Moments Interpretation of Sequential Esti-matorsrdquo Economic Letters 14 201ndash206

Roy A (1951) ldquoSome Thoughts on the Distribution of Earningsrdquo OxfordEconomic Papers 3 135ndash146

Rubin D (1978) ldquoBaysian Inference for Causal Effects The Role of Ran-domizationrdquo The Annals of Statistics 6 34ndash58

ReplyJoshua D Angrist

I thank the discussants and the session organizer outgoingJBES editor Jeff Wooldridge who will be missed by associateeditors and JBES authors alike The comments in this sessionoffer a variety of viewpoints some sympathetic others morecritical I am pleased to be part of this stimulating and con-structive exchange and look forward to more interaction ofthis type in the future

Hahn My article starts by suggesting that a focus ontreatment effects simpli es the problem of causal inferencein limited dependent variable (LDV) models with endoge-nous regressors Endogeneityrsquos half-sister is the assumptionof unobserved individual effects since the omitted-variablesmotivation for xed effects in panel data is similar Hahnasks whether a shift in focus from index parameters to aver-age causal effects might also simplify the notoriously dif -cult problem of working with nonlinear models for binarypanel data (eg as in Card and Sullivan 1988) Hahn showsthat here too the causal-effects framework pays off demon-strating that average causal effects are easily estimated in a xed-effects probit model with a binary regressor even thoughthe probit coef cient is not A second noteworthy featureof Hahnrsquos comment is his observation that identi cation inthis framework turns on careful modeling of the relationshipbetween the assignment variable and the xed effects Tradi-tional econometric models are primarily concerned with thestochastic process generating outcomes A shift of attentiontoward modeling the assignment mechanism is consistent with

the notion that observational studies should try to mimic thesort of controlled comparisons generated by experiments

Imbens Imbens begins by endorsing the view that the sub-stantive goals of empirical research are better served by thepotential outcomescausal framework than by structural mod-eling Going one step further Imbens notes that the problemswith ldquomodel- rst econometricsrdquo are not limited to LDVrsquos Inthe same spirit he sensibly wonders (as did Wooldridge 1992)why so much attention has been devoted to estimating theBoxndashCox transformation model which does not actually iden-tify E6Y mdashX7 without a distributional assumption Imbens alsonicely articulates the rationale for looking at the twins exam-ple and notes that this is similar to the rationale for extrap-olating from clinical trials in narrow subpopulations And infact I believe the ldquotwins experimentrdquo is likely to have pre-dictive value in a variety of situations though any extrapo-lation will probably be improved by control for demographiccharacteristics

In spite of (or perhaps because of) our years of discussionand collaboration Imbens and I still do not agree on certainthings He feels strongly that inference with discrete outcomesshould use models and procedures that respect inherent non-

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

28 Journal of Business amp Economic Statistics January 2001

linearities The HIRZ procedure discussed in his comment isan intriguing way to do this and my article discusses othersI remain unconvinced however that nonlinearity is of majorsubstantive importance in empirical work with binary regres-sors The proof of the pudding I suppose would be to showthat H IRZ or some of the methods I discuss in the article leadto better predictions or decisions than the linear workhorse

Mof tt Mof t provides a valuable counterpointHe beginsby arguing that latent-index models have more predictivepower than reduced-form causal models Such claims seemhard to establish a priori though I do not doubt that in somecases this may be true But empirical strategies in which thebulk of the research effort goes into estimating index coef -cients may fail to answer the most basic question in empiricalwork What happened here Causal models and local aver-age treatment effect (LATE) answer this question while indexcoef cients alone are not suf cient for causal inference Iagree however that structural modeling can be useful forgoing beyond basic questions and I have used index and otherstructural models for this purpose Mof t is right howeverin suggesting that I nd the model-free analysis he refers toas ldquoexploratoryrdquo more interesting than structural extrapolationThis seems to me to be where the potential for real discoverylies

Mof tt asks tough but fair questions about the twins appli-cation He is correct to point out that twinning is not a manip-ulable policy instrument That is one reason I do not limit theanalysis to reduced-form effects of multiple births Yet to mymind the best evidence we have on the controversial questionof the consequences of out-of-wedlock childbearing comesfrom the twins experiment (Bronars and Grogger 1994) Andthe economic consequences of childbearing are of interest inmany other contexts As noted in the introduction to my arti-cle the twins experiment can help determine whether fertilityreductions are responsible for trend increases in female laborsupply Beyond this sort of question the twins results seemuseful for assessing the economic consequences of increasedabortion access in the United States the dissemination of con-traceptive implants in developing countries (eg see KaneFarr and Janowitz 1990) and various programs to encour-age or require implant use by welfare mothers (Haveman andWolfe 1993)

Finally it is fair to question the external validity of twinsestimates as Mof t does toward the end of his commentAngrist and Evans (1998) showed that the labor-supply con-sequences of twinning can be reconciled with estimates ofthe labor-supply consequences of childbearing resulting froma very different natural experiment induced by preferences formixed-sex sibling pairs This suggests that results from bothexperiments may be quite robust and may even be ldquostructuralrdquoin the sense that they tell us something general about the eco-nomic response to childbearing

Mullahy Mullahy brings a welcome health economistrsquosperspective to the discussion Health economists have longgrappled with the issues discussed in my article and asI hope I made clear I drew ideas and inspiration fromMullahyrsquos work on models with nonnegative dependent vari-ables Mullahyrsquos comment further highlights exactly the sort

of utilitarian trade-off that underlies the choice of empiricalstrategy On one hand we would like our models to be con-sistent with any functional form restrictions known to be trueOn the other attempts at this may complicate estimation andinference for little payoff Worst of all the technical complex-ity of nonlinear models seems to cause authors to ldquowax econo-metricrdquo an effort that may come at the expense of attention tosubstantive issues of importance (an example here is the longseries of papers discussing exactly how data from the RANDhealth-insurance experiment should be analyzed) Mullahyrsquosheretical discussions of ldquoone-part modelsrdquo here and in earlierwork represent a refreshing effort to keep the empirical trainon track Mullahyrsquos point that application-specic distributionordinates can provide more useful summary information thanquantiles is also clearly worth thinking about

Todd Todd provides a useful and clear overview of thecausal framework which she suggests is better suited formedical questions than economic questions The thrust of herargument is that there is too much heterogeneity in economicinterventions and outcomes for the causal approach to bevery useful for economists But my perception is that medicalresearch is even more explicitly concerned with heterogene-ity than economics This is evidenced by the many quali ca-tions regarding patient characteristics conditions and inter-actions in medical treatment protocols A difference betweenthe prevailing medical research paradigm and the traditionaleconometric perspective however is whether the question ofheterogeneous response is to be addressed by better theoryor more evidence Of course theory helps us decide whereto look for heterogeneity but as Rosenbaum (1999 p 301)put it ldquo If a treatment has substantially different effects in dif-ferent types of people then we need to discover this (ital-ics mine) Rosenbaum also noted that ldquorandomized clinicaltrials routinely look for heterogeneous treatment effects andthey do this without representative samples from nationalpopulationsrdquo

On the technical side Todd suggests I advocated use ofthe two-part model because it exibly approximates nonlinearfunctional forms for nonnegative outcomes Although I notedthe exibility the upshot of my discussion was that the two-part model is poorly suited to causal inference because of thedif culty of interpreting part 2 Finally Todd points out thefailure to allow for sampling variance in the scaling factorsused to generate effects for nonlinear models This is indeed ahard-to-defend shortcut though I shall take a stab at defenseby noting that randomness of the scaling factor which typi-cally estimates a sample average is likely to be a minor sourceof sampling uncertainty

ADDITIONAL REFERENCES

Card D E and Sullivan D (1988) ldquoEffects of Subsidized Training onMovements In and Out of Employmentrdquo Econometrica 56 497ndash530

Haveman R and Wolfe B (1993) ldquoChildrenrsquos Prospects and ChildrenrsquosPolicyrdquo Journal of Economic Perspectives 7 153ndash174

Kane T Farr G and Janowitz B (1990) ldquo Initial Acceptability of Con-traceptive Implants in Four Developing Countriesrdquo International FamilyPlanning Perspectives 16 49ndash54

Rosenbaum P R (1999) ldquoChoice as an Alternative to Control in Observa-tional Studiesrdquo Statistical Science 14 259ndash304

Wooldridge J (1992) ldquoSome Alternatives to the BoxndashCox RegressionModelrdquo International Economic Review 33 935ndash955

Page 22: New Estimation of Limited Dependent Variable Models With Dummy …piketty.pse.ens.fr/…ecoineg/articl/Angrist2001b.pdf · 2011. 12. 5. · sequences of changes in contraceptive technology

Mullahy Comment 23

structural estimation and prediction that should be the ultimateobject of applied economics research

ACKNOWLEDGMENTS

This comment is a revision of remarks delivered at theJoint Statistical Meetings Baltimore August 1999 I thankCarl Christ James Heckman Joel Horowitz Michael KeaneThomas Mroz Geert Ridder and Edward Vyltacil for com-ments with the usual disclaimer that the views expressed hereshould not be taken as representing those of any of theseindividuals

ADDITIONAL REFERENCES

Angrist J and Imbens G (1999) Comment on ldquo Instrumental VariablesA Study of Implicit Behavioral Assumptions Used in Making Pro-gram Evaluationsrdquo by J J Heckman Journal of Human Resources 34283ndash827

Blundell R and Smith R (1993) ldquoSimultaneous Microeconometric ModelsWith Censored or Qualitative Dependent Variablesrdquo in Handbook of Statis-tics (Vol 1) eds G S Maddala C R Rao and H D Vinod AmsterdamNorth-Holland pp 117ndash141

Christ C (1994) ldquoThe Cowles Commissionrsquos Contributions to Econometricsat Chicago 1939ndash1955rdquo Journal of Economic Literature 32 30ndash59

Heckman J (1997) ldquoInstrumental Variables A Study of Implicit BehavioralAssumptions Used in Making Program Evaluationsrdquo Journal of HumanResources 32 441ndash462

mdashmdashmdash (1999) ldquo Instrumental Variables A Reply to Angrist and ImbensrdquoJournal of Human Resources 34 828ndash837

mdashmdashmdash (2000) ldquoCausal Parameters and Policy Analysis in Economics ATwentieth Century Retrospectiverdquo Quarterly Journal of Economics 11545ndash97

Heckman J and MaCurdy T (1985) ldquoA Simultaneous Equations LinearProbability Modelrdquo International Economic Review 18 21ndash37

Heckman J and Vytlacil E (1999) ldquoLocal Instrumental Variables andLatent Variable Models for Identifying and Bounding Treatment EffectsrdquoProceedings of the National Academy of Sciences 96 4730ndash4734

Horowitz J (1993) ldquoSemiparametric and Nonparametric Estimation of Quan-tal Response Modelsrdquo in Handbook of Statistics (Vol 11) eds G SMaddala C R Rao and H D Vinod Amsterdam North-Holland pp45ndash72

Ichimura H and Thompson T (1998) ldquoMaximum Likelihood Estimation ofa Binary Choice Model With Random Coef cients of Unknown Distribu-tionrdquo Journal of Econometrics 86 269ndash295

Manski C (1988) ldquoIdenti cation of Binary Choice Modelsrdquo Journal of theAmerican Statistical Association 83 729ndash738

Marschak J (1953) ldquoEconomic Measurements for Policy and Predictionrdquoin Studies in Econometric Method eds W Hood and T KoopmansNew York Wiley pp 1ndash26

Powell J (1994) ldquoEstimation of Semiparametric Modelsrdquo in Handbook ofEconometrics (Vol 4) eds R Engle and D McFadden Amsterdam North-Holland pp 2443ndash2521

CommentJohn Mullahy

Departments of Preventive Medicine and Economics University of Wisconsin Madison WI 53705 andNational Bureau of Economic Research Cambridge MA (jmullahyfacstaffwiscedu)

Articles bearing titles containing phrases like ldquoSimpleStrategies for Empirical Practicerdquo are often wolves masquerad-ing as sheep the methodologies they devise being far fromldquosimplerdquo and far removed from what most humble practition-ers perceive in the realm of ldquoempirical practicerdquo Not so hereJoshua Angrist has written a tight comprehensive article thatis stimulating and important yet also eminently useful

As typi es much of Angristrsquos work the main concern in thisarticle is on how to elicit interesting characterizations of causaleffects from microdata with the particular twist here being afocus on outcomes measured as ldquolimited dependent variablesrdquo(LDVrsquos) The main take-away message I glean from this arti-cle is that applied analysts working on causal effect or struc-tural analyses in LDV contextsmdashtraditionally vexing contextsinsofar as consistent estimation and inference are concernedmdashhave considerable grounds for optimism Angrist lays out andinterprets systematically a set of issues and methods that pro-vide practitioners with a variety of implementable strategiesthat might be brought to bear on such empirical problemsA corollary take-away message is that in some respects ldquothisstuff is not really as hard as wersquove tended to make itrdquo withAngrist demonstrating for instance the potential merits ofsimple linear instrumental variable ( IV) methods for estimat-ing causal effects in a variety of LDV contexts In no eventcan applied analysts escape the requirement of nding theo-retically sound instruments but Angrist exposits compellingly

a broad description of how such instruments might be used toelicit interesting causal inferences in LDV contexts

I have no real quibbles with any of the substance ofAngristrsquos arguments Rather I will devote my commentarymainly to amplifying and expanding several of the themes hedevelops throughout the article

1 FOCUS ON CAUSAL OR PARTIAL EFFECTSVERSUS FOCUS ON CONDITIONAL

EXPECTATIONS FUNCTIONS

It seems fair to suggest that much of applied microecono-metrics is concerned primarily with understanding the signsand magnitudes of quantities like iexclE6Y mdashX1 D7=iexcl4X1 D5 oriexclE6Y mdashX7=iexclX0 Yet much of the actual dirty work in under-taking causal analysis in LDV contexts seems to result fromdecisions to undertake analyses in settings where E6Y mdashX1D7

or E6Y mdashX7 are restricted to be positive without a priori restric-tions on parameter values [thus the tradition of using tobit-class conditional expectations functions (CEFrsquos) two-stepselection models exponential CEFrsquos etc]

As practitioners however we should pause to assesswhether speci cations akin to E6Y mdashX1D7 D exp4Xsbquo C D5 gt

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

24 Journal of Business amp Economic Statistics January 2001

0 are ultimately buying anything important with respect tothe rst-order questions being explored It is only com-mon sense that analysts should spend relatively more energyworking on understanding quantities of primary interest(eg iexclE6Y mdashX1D7=iexcl4X1 D5) and worrying relatively lessabout the formulation and implications of assumptions (egE6Y mdashX1 D7 gt 0) that may have little or no ultimate bearingon the particular questions being addressed This is espe-cially so when the latter effort (restricting via functional formchoice E6Y mdashX1 D7 gt 0) will tend to complicate rather thanfacilitate the former effort (understanding the partial or causaleffects) This is not to suggest that quantities like E6Y mdashX7 andE6Y mdashX1 D7 may not themselves be interesting to estimate (egmodeling conditional mean health care expenditures for use inforecasting future expenditure levels) but whether E6Y mdashX1D7

is parametrically and parsimoniously better approximated byexp4XsbquoC D5 or by XsbquoC D or by some other g4Xsbquo CD5

or h4X1D1 ƒ5 ismdasheven in LDV settingsmdashultimately an empir-ical matter to be assessed by goodness of t (eg via condi-tional moment tests)

I note this important angle somewhat sheepishly sincein both articles of mine that Angrist cites (Mullahy 19971998) there is considerable emphasis on the use of exponen-tial CEF speci cations as enforcers of positive conditionalmean ldquorequirementsrdquo In fact the main motivation for theearlier article was to nd a general approach to obtainingconsistent estimators of structural parameters in parametricsettings less distribution-bound than tobit-class models whenthe key requirement or side constraint is E6Y mdashX1 D7 gt 0 withthe resulting strategy a more or less brute-force method foraccomplishing this in a nonlinear IV setting Angristrsquos arti-cle takes this idea quite a bit further and demonstrates how(and with the addition of a further normalization) the struc-tural parameter estimation approach I discussed can be devel-oped into a model to analyze causal effects when the latter arecharacterized as ldquoproportional treatment effectsrdquo

Whether a proportional as opposed to an additive treatmenteffect is interesting (it may or may not be) remains to be seenin any particular application Wooldridgersquos recent work onaverage causal effects (ACErsquos Wooldridge 1999) would seempertinent in contexts in which the role of the X covariates ismore ldquointrusiverdquo than in the simple additivelinear treatment-effect setup As a general matter the extent to which infer-ences about causal effectsmdashhowever formulatedmdashhinge onthe inclusion of particular covariates as conditioning regres-sors is an interesting and potentially important angle thatAngristrsquos article begins to explore and is one that likely tomerit additional research For the particular issues at handthe very nature of proportional treatment effects brings intofocus explicitly the role of the X rsquos so how their correlationwith the treatment indicators 4D5 in uences inferences aboutthe causal impacts of the latter would seem to be a rst-orderconsideration

2 TWO-PART MODELS

I found Angristrsquos analysis of the interpretation of causaleffects in two-part models to be extremely illuminatingIn a reduced-form setting I argued (Mullahy 1998) that amid

all the concerns about zeros robustness outliers transforma-tion retransformation and such that have typically attendedmodeling efforts involving two-part models it would be sur-prising in a well-structured empirical investigation if therewere not stated or lurking concerns about partial effectsiexclE6Y mdashX7=iexclX andor CEFrsquos E6Y mdashX7 themselves It may be insome cases that such concerns are not articulated and it mayeven be in less well-structured problems that they are not obvi-ous to analysts themselves Yet for instance unless robustlyestimated sbquorsquos from a log-linear part-2 speci cation of a two-part model could inform a rst-order question about a par-tial effect or a CEF of what practical use were they likelyto be My rather modest and rather obvious argument wasjust that the concerns about the iexclE6Y mdashX7=iexclX or the E6Y mdashX7mdashif they are the reason the analysis is being conducted in the rst placemdashshould enjoy rst-order prominence in the estima-tion exercise and that concerns about zeros outliers and suchshould be relegated in some sense to second-order status Insome cases two-part models will serve nicely to address such rst-order concernsmdashat least in reduced-form settingsmdashbut inothers they will not

Angrist appears to have some sympathy with such argu-ments but more importantly he provides a valuable serviceto users of two-part models by unearthing one major limi-tation in the analysis of causal effects The problem is notwith part 1 (ie the logit probit or linear probability com-ponent) because part 1 of the two-part model falls within themain lines of Angristrsquos LDV analysis Rather the complicationarises with part 2 of the two-part model in which the additionalldquoY gt 0rdquo conditioning arises In the standard (reduced-form)two-part model quantities like E6Y mdashX1Y gt 07 are prominentand identi ed readily But in the counterfactual settings inwhich causal effects are manifested wherein the realized Y

arises from self-selection into or out of treatment such selec-tion effects introduce an ambiguity [Eq (12)] and therebyconfound the ability to glean causal effects from part-2 esti-mates (unless as Angrist notes censored regression methodsare used but Angrist also offers some compelling argumentsagainst blind reliance on censored regression approaches) Assuch if estimation of and inference about causal effects arethe primary analytical concerns in data settings with Y para 0and Pr4Y D 05 gt 01 analysts may be well advised to avoidtwo-part modeling strategies and pursue some of the moredirect linear and nonlinear estimation approaches discussedby Angrist in which one-part estimation approachesmdashzerosand allmdashyield estimates of causal effects that will be directlyinterpretable

3 BEYOND CEFrsquoS

One noteworthy feature of the methods Angrist discussesmdashwith the main results attributed to Abadie (1999)mdashare theirapplicability to estimation of causal effects for characteristicsof the conditional distribution rdquo4Y mdashX1D5 beyond just CEFrsquosEstimation of causal or treatment effects for conditional quan-tiles (QTErsquos) and conditional distribution function ordinates(ldquodistilesrdquo) Pr4Y lt cmdashX1D5 is demonstrated to t properlywithin the weighting strategies advanced by Abadie Since

Todd Comment 25

quantile and distile analyses may be more relevant in address-ing particular analytical questions or policy issues than CEFrsquosthe ability to extend the mechanisms of causal-effect esti-mation strategies in such directions should be welcomed byapplied researchers

I might suggest that Angristrsquos claim about an advantage ofquantile causal analysis over distile causal analysis being thatthe latter relies on application-specic values of ordinates (c)is perhaps oversold or underdeveloped In many interestingand policy-relevant applications focus on the application-speci c ordinates is fundamental What factors cause employ-ers to offer more than one health insurance plan to employeesWhat factors cause consumers of alcoholic beverages to drinkmore than two drinks per day What factors cause elderlyindividuals to utilize more than one prescription drug prod-uct What factors cause second earners to work more than17 hours per week Each of these concerns an importantreal-world causal question for which the application-specicordinatemdashwhether for institutional legal physiological orother reasonsmdashis a fundamental ldquopivot pointrdquo for the analysisAnalysis of such distile relationshipsmdashwhether in causal or inreduced-form settingsmdashis a potentially powerful method forinforming decision makers and it seems to me that its mer-its relative to quantile analyses would have to be judged on acase-by-case basis

4 CONCLUSIONS

Practitioners of applied microeconometrics will nd thatthe time invested in a careful reading of Angristrsquos article hassizable returns not merely because of the analytical and con-ceptual insights it offers but ultimately also because it sug-gests a variety of important practical innovations that appliedresearchers can exploit empirically to more clearly understandthe nature of causal relationships in their data Applied microresearchers in areas like health labor development public nance and such work commonly with data wherein (a) out-come measures are limitedmdashoften most importantly becausethey are nonnegative with mass points at 0mdashand (b) the rolesof conditioning covariates (X rsquos) are most conveniently summa-rized via regression methods Angristrsquos article provides appliedresearchers so endowed with a set of powerful tools for elic-iting causal inferences from such data Of course the punchline is familiar and inevitable All bets are off without theo-retically solid instruments But with such instruments in handthe strategies exposited by Angrist offer analysts a rich set ofperspectives on estimation

ADDITIONAL REFERENCE

Wooldridge J M (1999) ldquoEstimating Average Partial Effects Under Con-ditional Moment Independence Assumptionsrdquo mimeo Michigan StateUniversity Dept of Economics

CommentPetra Todd

Department of Economics University of Pennsylvania Philadelphia PA 19104 (petraathonasasupennedu)

This article considers ways of estimating the effect ofbinary endogenous regressors in models with limited depen-dent variables It questions the usefulness of conventionalestimation strategies aimed at recovering structural modelparameters and advocates the use of simple instrumental vari-ables ( IV) estimators as an alternative on the grounds thatthese estimators invoke weaker assumptions and often suf ceto answer questions of interest in empirical studies In theauthorrsquos view the main challenge facing empirical researchersis the problem of identi cation of ldquotreatment effectsrdquo throughIV Here I expand the discussion by considering two ques-tions (1) What are the limitations of the causal model asa paradigm for policy analysis (2) When simple estimatorsare suitable for answering a question of interest what are thetrade-offs that need to be considered in using them

1 WHEN IS THE ldquoCAUSAL MODELldquoA USEFULPARADIGM FOR POLICY ANALYSIS

The article gives the impression that most interesting ques-tions in economics can be answered within the context ofthe ldquocausal modelrdquo [variously attributed to Neyman (1923)Fisher (1935) Cox (1958) Roy (1951) and Rubin (1978)]The causal model is a very general framework that assumesthat there are potential outcome states associated with having

received some treatment and having received no treatment (orpossibly a different treatment) For each individual only oneof the potential outcome states is observed which leads to amissing-data problem in attempting to draw inferences aboutaspects of the treatment effect distribution

The language of potential outcomes is very general soalmost any economic problem could be formulated in theseterms However this does not mean that estimators proposedin the literature for the causal model are useful for all or evenmost economic problems A major limitation of the modelis that it assumes that the state ldquowith treatmentrdquo has beenobserved for at least a subset of people In medical trials orbiological experiments this assumption is probably reason-able but in economics we often are interested in evaluatingeffects of treatments that have not yet been implemented Forexample we might be interested in predicting the effect ofraising the age receiving Social Security bene ts to a new ageor the effect of introducing new term limits on welfare par-ticipation An implicit assumption needed to apply any of theidenti cation results described in this article is that both Y1

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

26 Journal of Business amp Economic Statistics January 2001

and Y0 are observed which would not be satis ed in manycases Thus the causal model and the estimators developedfor it are inadequate when it comes to any economic questioninvolving a state of the world that has not been observed

Despite these limitations treatment effect estimates aresometimes used to predict consequences of new treatments orof existing treatments for new populations When the policychange being considered is very close to changes observedin the data and is to be applied to similar individuals thenextrapolation may be reasonable But if the policy change is(a) of a magnitude outside the range of experience (b) alonga new dimension or (c) intended for a new population then itis an open question as to how reliable the estimates would beTreatment impact estimates are not intended to capture any-thing invariant to changing conditions The assumptions thatwould be required to justify extrapolations or generalizationsare often quite strong To some extent it is an illusion thatthese estimators allow for general kinds of inference underweak assumptions because the assumptions required to justifytheir application for particular purposes are often not madeexplicit

Angrist argues that the identi cation results for the localaverage treatment effect (LATE) estimator provide a ldquofounda-tion for credible causal inferencerdquo (p 5) and a ldquominimum con-troversy jumping-off point for any prediction exerciserdquo (p 5)But as discussed previously the causal model maintains sucha high level of generality that it offers little guidance for manykinds of problems It is important to recognize when and whennot the LATE estimator is likely to be useful This articleclaims that ldquoin practice estimates of LATE differ little fromestimates based on the stronger assumptions invoked to iden-tify effects on the entire treated populationrdquo (p 5) But suchan extrapolation is not justi ed under the theory and is surelyunlikely to hold in all circumstances As shown in a num-ber of works by various authors LATE provides the averageeffect of treatment for the group of ldquocompliersrdquo who are peo-ple induced by the instrument to receive treatment This groupdoes not except under very special circumstances correspondto the group of treated persons and may in fact representonly a small fraction of persons receiving treatment So notonly does LATE not suf ce for answering problems of the sortdescribed by (a) (b) and (c) it also does not generally pro-vide the average effect of treatment on the treatedmdashthe mostcommon parameter of interest in analyzing social programsAnother not-so-attractive feature of the estimator is that thepopulation of ldquocompliersrdquo is usually not identi able so wecannot say exactly whose treatment impact is being estimatedApplying the LATE parameter estimate to the full populationof treated persons would generally require ruling out certaintypes of heterogeneity in individual responses to treatmentas discussed by Heckman (1997) The explicit assumptionsthat could justify this type of extrapolation were shown byHeckman and Vytlacil (2000 in press)

2 ESTIMATORS SHOULD BE TAILOREDTO THEIR APPLICATION

A general theme throughout the article is that wheneverpossible researchers should restrain from imposing paramet-ric restrictions in estimation because the structure they impose

may be wrong resulting in biased estimates This view isexpressed for example in the discussion of the two-partmodel The article advocates the use of a two-part approachinstead of a conventional selection-model approach maintain-ing that the two-part approach is less likely to lead to biasedparameter estimates because it does not impose cross-equationrestrictions as a conventional tobit estimator would

How do researchers decide whether or not to impose restric-tions across estimating equations that are structurally linkedThe most agnostic approach would be to estimate both equa-tions completely nonparametrically but this approach is rarelypractical Nonparametric estimators are consistent under themost general conditions but in conventional size samples thestandard errors would be so large that the estimates would forpractical purposes be useless Researchers impose structurebecause they are willing to restrict the class of models underwhich their estimators are consistent in exchange for greaterprecision These ef ciency considerations would also apply inthe deciding whether or not to impose cross-equation restric-tions in estimating simultaneous-equations models If we areright about the structure there is an ef ciency gain fromimposing the restrictions If we are wrong imposing them maylead to bias When there are overidentifying restrictions it isat least possible to develop tests of the model speci cation thatcan serve as a guide in choosing a structural model that is sup-ported by the data This article presents a dichotomy betweenthe two-part model and a fully parametric traditional selec-tion speci cation but there are many other modeling choicesranging from fully nonparametric to semiparametric to fullyparametric In modest size samples ef ciency is an impor-tant consideration and a parametric model may be the mostappropriate one For very large samples fewer parametricapproaches are feasible Both the probability-of-working andthe hours-worked equations could for example be estimatedby a semiparametric method such as semiparametric leastsquares ( Ichimura 1993) Which estimation method is mostappropriate ultimately depends on how much data is avail-able and how many parameters are to be estimated It doesnot make sense to criticize the use of structured approaches infavor of more exible modeling approaches without regard tothe context in which they are being applied

3 ON THE GAP BETWEEN THEORY ANDEMPIRICAL PRACTICE

Finally another recurring emphasis in the article is on thevalue of shortcuts in empirical practice In the authorrsquos viewthere is a gap between the econometric theory and what isfeasible in practice and shortcuts and approximations help tobridge this gap To this end the article extols for examplethe virtues of ordinary least squares (OLS) as a way of approx-imating any unknown conditional expectation function with-out much concern as to whether a best linear approximation isa good approximation If the model has discrete regressors andis fully saturated it is well known that OLS is equivalent to anonparametric estimator But in other cases a linear approxi-mation could be highly biased and the bias will not go awayas the sample size gets large There are a variety of alterna-tive asymptotically unbiased estimators Along similar lines

Angrist Reply 27

the article suggests that in implementing a ldquoselection-on-observablesrdquo estimator it may be all right simply to ignore theweighting scheme Such claims seem to me fairly outrageousbecause failure to take into account weighting could lead tovery different estimates and would not provide a consistentestimator of the parameter of interest Heckman Ichimura andTodd (1998) discussed alternative estimators that are not dif -cult to implement and take into account the weighting scheme

Finally the appendix presents a variety of shortcuts forobtaining estimates and standard errorsmdashsome of which leadto inconsistent estimates When shortcuts lead to very crudeapproximations or inconsistent estimates I question their use-fulness I do think it unreasonable to expect that estimatesbased on simple IV estimators be accompanied by correctstandard errors that correctly take into account the in uence ofestimated parameters of estimated regressors I see little valuein reporting incorrect standard-errors estimates on the groundsthat it saves time in computation especially when there areseveral works long available in the literature that develop sim-ple ways of solving these kinds of problems (eg see Newey1984) Although there is surely a gap between the estimatorsdeveloped in the theoretic econometric literature and the esti-mators used in the empirical literature the gap is not as greatas this article supposes

ADDITIONAL REFERENCES

Cox D R (1958) The Planning of Experiments New York WileyFisher R A (1935) Design of Experiments London Oliver and BoydHeckman J J (1997) ldquo Instrumental Variables Evidence From Evaluating a

Job Training Programmerdquo Journal of Human Resources 32 441ndash462Heckman J Ichimura H and Todd P (1997) ldquoMatching as an Econometric

Evaluation Estimator Theory and Evidence on its Performance Appliedto the JTPA Program Part I Theory and Methodsrdquo Review of EconomicStudies 64 605ndash654

mdashmdashmdash (1998) ldquoMatching as an Econometric Evaluationrdquo Review of Eco-nomic Studies 65 261ndash294

Heckman J and Vytlacil E (2000) ldquoCausal Parameters Structural Equa-tions Treatment Effects and Randomized Evaluations of Social Programsrdquounpublished manuscript University of Chicago Dept of Economics

mdashmdashmdash (in press) ldquoLocal Instrumental Variablesrdquo in Nonlinear StatisticalInference Essays in Honor of Takeshi Amemiya eds C Hsiao KMorimune and J Powell Cambridge UK Cambridge University Press

Ichimura H (1993) ldquoSemiparametric Least Squares (SLS) and Weighted SLSEstimation of Single Index Modelsrdquo Journal of Econometrics 58 71ndash120

Neyman J (1923) ldquoStatistical Problems in Agricultural Experimentsrdquo Sup-plement to the Journal of the Royal Statistical Society 2 107ndash180

Newey W (1984) ldquoA Method of Moments Interpretation of Sequential Esti-matorsrdquo Economic Letters 14 201ndash206

Roy A (1951) ldquoSome Thoughts on the Distribution of Earningsrdquo OxfordEconomic Papers 3 135ndash146

Rubin D (1978) ldquoBaysian Inference for Causal Effects The Role of Ran-domizationrdquo The Annals of Statistics 6 34ndash58

ReplyJoshua D Angrist

I thank the discussants and the session organizer outgoingJBES editor Jeff Wooldridge who will be missed by associateeditors and JBES authors alike The comments in this sessionoffer a variety of viewpoints some sympathetic others morecritical I am pleased to be part of this stimulating and con-structive exchange and look forward to more interaction ofthis type in the future

Hahn My article starts by suggesting that a focus ontreatment effects simpli es the problem of causal inferencein limited dependent variable (LDV) models with endoge-nous regressors Endogeneityrsquos half-sister is the assumptionof unobserved individual effects since the omitted-variablesmotivation for xed effects in panel data is similar Hahnasks whether a shift in focus from index parameters to aver-age causal effects might also simplify the notoriously dif -cult problem of working with nonlinear models for binarypanel data (eg as in Card and Sullivan 1988) Hahn showsthat here too the causal-effects framework pays off demon-strating that average causal effects are easily estimated in a xed-effects probit model with a binary regressor even thoughthe probit coef cient is not A second noteworthy featureof Hahnrsquos comment is his observation that identi cation inthis framework turns on careful modeling of the relationshipbetween the assignment variable and the xed effects Tradi-tional econometric models are primarily concerned with thestochastic process generating outcomes A shift of attentiontoward modeling the assignment mechanism is consistent with

the notion that observational studies should try to mimic thesort of controlled comparisons generated by experiments

Imbens Imbens begins by endorsing the view that the sub-stantive goals of empirical research are better served by thepotential outcomescausal framework than by structural mod-eling Going one step further Imbens notes that the problemswith ldquomodel- rst econometricsrdquo are not limited to LDVrsquos Inthe same spirit he sensibly wonders (as did Wooldridge 1992)why so much attention has been devoted to estimating theBoxndashCox transformation model which does not actually iden-tify E6Y mdashX7 without a distributional assumption Imbens alsonicely articulates the rationale for looking at the twins exam-ple and notes that this is similar to the rationale for extrap-olating from clinical trials in narrow subpopulations And infact I believe the ldquotwins experimentrdquo is likely to have pre-dictive value in a variety of situations though any extrapo-lation will probably be improved by control for demographiccharacteristics

In spite of (or perhaps because of) our years of discussionand collaboration Imbens and I still do not agree on certainthings He feels strongly that inference with discrete outcomesshould use models and procedures that respect inherent non-

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

28 Journal of Business amp Economic Statistics January 2001

linearities The HIRZ procedure discussed in his comment isan intriguing way to do this and my article discusses othersI remain unconvinced however that nonlinearity is of majorsubstantive importance in empirical work with binary regres-sors The proof of the pudding I suppose would be to showthat H IRZ or some of the methods I discuss in the article leadto better predictions or decisions than the linear workhorse

Mof tt Mof t provides a valuable counterpointHe beginsby arguing that latent-index models have more predictivepower than reduced-form causal models Such claims seemhard to establish a priori though I do not doubt that in somecases this may be true But empirical strategies in which thebulk of the research effort goes into estimating index coef -cients may fail to answer the most basic question in empiricalwork What happened here Causal models and local aver-age treatment effect (LATE) answer this question while indexcoef cients alone are not suf cient for causal inference Iagree however that structural modeling can be useful forgoing beyond basic questions and I have used index and otherstructural models for this purpose Mof t is right howeverin suggesting that I nd the model-free analysis he refers toas ldquoexploratoryrdquo more interesting than structural extrapolationThis seems to me to be where the potential for real discoverylies

Mof tt asks tough but fair questions about the twins appli-cation He is correct to point out that twinning is not a manip-ulable policy instrument That is one reason I do not limit theanalysis to reduced-form effects of multiple births Yet to mymind the best evidence we have on the controversial questionof the consequences of out-of-wedlock childbearing comesfrom the twins experiment (Bronars and Grogger 1994) Andthe economic consequences of childbearing are of interest inmany other contexts As noted in the introduction to my arti-cle the twins experiment can help determine whether fertilityreductions are responsible for trend increases in female laborsupply Beyond this sort of question the twins results seemuseful for assessing the economic consequences of increasedabortion access in the United States the dissemination of con-traceptive implants in developing countries (eg see KaneFarr and Janowitz 1990) and various programs to encour-age or require implant use by welfare mothers (Haveman andWolfe 1993)

Finally it is fair to question the external validity of twinsestimates as Mof t does toward the end of his commentAngrist and Evans (1998) showed that the labor-supply con-sequences of twinning can be reconciled with estimates ofthe labor-supply consequences of childbearing resulting froma very different natural experiment induced by preferences formixed-sex sibling pairs This suggests that results from bothexperiments may be quite robust and may even be ldquostructuralrdquoin the sense that they tell us something general about the eco-nomic response to childbearing

Mullahy Mullahy brings a welcome health economistrsquosperspective to the discussion Health economists have longgrappled with the issues discussed in my article and asI hope I made clear I drew ideas and inspiration fromMullahyrsquos work on models with nonnegative dependent vari-ables Mullahyrsquos comment further highlights exactly the sort

of utilitarian trade-off that underlies the choice of empiricalstrategy On one hand we would like our models to be con-sistent with any functional form restrictions known to be trueOn the other attempts at this may complicate estimation andinference for little payoff Worst of all the technical complex-ity of nonlinear models seems to cause authors to ldquowax econo-metricrdquo an effort that may come at the expense of attention tosubstantive issues of importance (an example here is the longseries of papers discussing exactly how data from the RANDhealth-insurance experiment should be analyzed) Mullahyrsquosheretical discussions of ldquoone-part modelsrdquo here and in earlierwork represent a refreshing effort to keep the empirical trainon track Mullahyrsquos point that application-specic distributionordinates can provide more useful summary information thanquantiles is also clearly worth thinking about

Todd Todd provides a useful and clear overview of thecausal framework which she suggests is better suited formedical questions than economic questions The thrust of herargument is that there is too much heterogeneity in economicinterventions and outcomes for the causal approach to bevery useful for economists But my perception is that medicalresearch is even more explicitly concerned with heterogene-ity than economics This is evidenced by the many quali ca-tions regarding patient characteristics conditions and inter-actions in medical treatment protocols A difference betweenthe prevailing medical research paradigm and the traditionaleconometric perspective however is whether the question ofheterogeneous response is to be addressed by better theoryor more evidence Of course theory helps us decide whereto look for heterogeneity but as Rosenbaum (1999 p 301)put it ldquo If a treatment has substantially different effects in dif-ferent types of people then we need to discover this (ital-ics mine) Rosenbaum also noted that ldquorandomized clinicaltrials routinely look for heterogeneous treatment effects andthey do this without representative samples from nationalpopulationsrdquo

On the technical side Todd suggests I advocated use ofthe two-part model because it exibly approximates nonlinearfunctional forms for nonnegative outcomes Although I notedthe exibility the upshot of my discussion was that the two-part model is poorly suited to causal inference because of thedif culty of interpreting part 2 Finally Todd points out thefailure to allow for sampling variance in the scaling factorsused to generate effects for nonlinear models This is indeed ahard-to-defend shortcut though I shall take a stab at defenseby noting that randomness of the scaling factor which typi-cally estimates a sample average is likely to be a minor sourceof sampling uncertainty

ADDITIONAL REFERENCES

Card D E and Sullivan D (1988) ldquoEffects of Subsidized Training onMovements In and Out of Employmentrdquo Econometrica 56 497ndash530

Haveman R and Wolfe B (1993) ldquoChildrenrsquos Prospects and ChildrenrsquosPolicyrdquo Journal of Economic Perspectives 7 153ndash174

Kane T Farr G and Janowitz B (1990) ldquo Initial Acceptability of Con-traceptive Implants in Four Developing Countriesrdquo International FamilyPlanning Perspectives 16 49ndash54

Rosenbaum P R (1999) ldquoChoice as an Alternative to Control in Observa-tional Studiesrdquo Statistical Science 14 259ndash304

Wooldridge J (1992) ldquoSome Alternatives to the BoxndashCox RegressionModelrdquo International Economic Review 33 935ndash955

Page 23: New Estimation of Limited Dependent Variable Models With Dummy …piketty.pse.ens.fr/…ecoineg/articl/Angrist2001b.pdf · 2011. 12. 5. · sequences of changes in contraceptive technology

24 Journal of Business amp Economic Statistics January 2001

0 are ultimately buying anything important with respect tothe rst-order questions being explored It is only com-mon sense that analysts should spend relatively more energyworking on understanding quantities of primary interest(eg iexclE6Y mdashX1D7=iexcl4X1 D5) and worrying relatively lessabout the formulation and implications of assumptions (egE6Y mdashX1 D7 gt 0) that may have little or no ultimate bearingon the particular questions being addressed This is espe-cially so when the latter effort (restricting via functional formchoice E6Y mdashX1 D7 gt 0) will tend to complicate rather thanfacilitate the former effort (understanding the partial or causaleffects) This is not to suggest that quantities like E6Y mdashX7 andE6Y mdashX1 D7 may not themselves be interesting to estimate (egmodeling conditional mean health care expenditures for use inforecasting future expenditure levels) but whether E6Y mdashX1D7

is parametrically and parsimoniously better approximated byexp4XsbquoC D5 or by XsbquoC D or by some other g4Xsbquo CD5

or h4X1D1 ƒ5 ismdasheven in LDV settingsmdashultimately an empir-ical matter to be assessed by goodness of t (eg via condi-tional moment tests)

I note this important angle somewhat sheepishly sincein both articles of mine that Angrist cites (Mullahy 19971998) there is considerable emphasis on the use of exponen-tial CEF speci cations as enforcers of positive conditionalmean ldquorequirementsrdquo In fact the main motivation for theearlier article was to nd a general approach to obtainingconsistent estimators of structural parameters in parametricsettings less distribution-bound than tobit-class models whenthe key requirement or side constraint is E6Y mdashX1 D7 gt 0 withthe resulting strategy a more or less brute-force method foraccomplishing this in a nonlinear IV setting Angristrsquos arti-cle takes this idea quite a bit further and demonstrates how(and with the addition of a further normalization) the struc-tural parameter estimation approach I discussed can be devel-oped into a model to analyze causal effects when the latter arecharacterized as ldquoproportional treatment effectsrdquo

Whether a proportional as opposed to an additive treatmenteffect is interesting (it may or may not be) remains to be seenin any particular application Wooldridgersquos recent work onaverage causal effects (ACErsquos Wooldridge 1999) would seempertinent in contexts in which the role of the X covariates ismore ldquointrusiverdquo than in the simple additivelinear treatment-effect setup As a general matter the extent to which infer-ences about causal effectsmdashhowever formulatedmdashhinge onthe inclusion of particular covariates as conditioning regres-sors is an interesting and potentially important angle thatAngristrsquos article begins to explore and is one that likely tomerit additional research For the particular issues at handthe very nature of proportional treatment effects brings intofocus explicitly the role of the X rsquos so how their correlationwith the treatment indicators 4D5 in uences inferences aboutthe causal impacts of the latter would seem to be a rst-orderconsideration

2 TWO-PART MODELS

I found Angristrsquos analysis of the interpretation of causaleffects in two-part models to be extremely illuminatingIn a reduced-form setting I argued (Mullahy 1998) that amid

all the concerns about zeros robustness outliers transforma-tion retransformation and such that have typically attendedmodeling efforts involving two-part models it would be sur-prising in a well-structured empirical investigation if therewere not stated or lurking concerns about partial effectsiexclE6Y mdashX7=iexclX andor CEFrsquos E6Y mdashX7 themselves It may be insome cases that such concerns are not articulated and it mayeven be in less well-structured problems that they are not obvi-ous to analysts themselves Yet for instance unless robustlyestimated sbquorsquos from a log-linear part-2 speci cation of a two-part model could inform a rst-order question about a par-tial effect or a CEF of what practical use were they likelyto be My rather modest and rather obvious argument wasjust that the concerns about the iexclE6Y mdashX7=iexclX or the E6Y mdashX7mdashif they are the reason the analysis is being conducted in the rst placemdashshould enjoy rst-order prominence in the estima-tion exercise and that concerns about zeros outliers and suchshould be relegated in some sense to second-order status Insome cases two-part models will serve nicely to address such rst-order concernsmdashat least in reduced-form settingsmdashbut inothers they will not

Angrist appears to have some sympathy with such argu-ments but more importantly he provides a valuable serviceto users of two-part models by unearthing one major limi-tation in the analysis of causal effects The problem is notwith part 1 (ie the logit probit or linear probability com-ponent) because part 1 of the two-part model falls within themain lines of Angristrsquos LDV analysis Rather the complicationarises with part 2 of the two-part model in which the additionalldquoY gt 0rdquo conditioning arises In the standard (reduced-form)two-part model quantities like E6Y mdashX1Y gt 07 are prominentand identi ed readily But in the counterfactual settings inwhich causal effects are manifested wherein the realized Y

arises from self-selection into or out of treatment such selec-tion effects introduce an ambiguity [Eq (12)] and therebyconfound the ability to glean causal effects from part-2 esti-mates (unless as Angrist notes censored regression methodsare used but Angrist also offers some compelling argumentsagainst blind reliance on censored regression approaches) Assuch if estimation of and inference about causal effects arethe primary analytical concerns in data settings with Y para 0and Pr4Y D 05 gt 01 analysts may be well advised to avoidtwo-part modeling strategies and pursue some of the moredirect linear and nonlinear estimation approaches discussedby Angrist in which one-part estimation approachesmdashzerosand allmdashyield estimates of causal effects that will be directlyinterpretable

3 BEYOND CEFrsquoS

One noteworthy feature of the methods Angrist discussesmdashwith the main results attributed to Abadie (1999)mdashare theirapplicability to estimation of causal effects for characteristicsof the conditional distribution rdquo4Y mdashX1D5 beyond just CEFrsquosEstimation of causal or treatment effects for conditional quan-tiles (QTErsquos) and conditional distribution function ordinates(ldquodistilesrdquo) Pr4Y lt cmdashX1D5 is demonstrated to t properlywithin the weighting strategies advanced by Abadie Since

Todd Comment 25

quantile and distile analyses may be more relevant in address-ing particular analytical questions or policy issues than CEFrsquosthe ability to extend the mechanisms of causal-effect esti-mation strategies in such directions should be welcomed byapplied researchers

I might suggest that Angristrsquos claim about an advantage ofquantile causal analysis over distile causal analysis being thatthe latter relies on application-specic values of ordinates (c)is perhaps oversold or underdeveloped In many interestingand policy-relevant applications focus on the application-speci c ordinates is fundamental What factors cause employ-ers to offer more than one health insurance plan to employeesWhat factors cause consumers of alcoholic beverages to drinkmore than two drinks per day What factors cause elderlyindividuals to utilize more than one prescription drug prod-uct What factors cause second earners to work more than17 hours per week Each of these concerns an importantreal-world causal question for which the application-specicordinatemdashwhether for institutional legal physiological orother reasonsmdashis a fundamental ldquopivot pointrdquo for the analysisAnalysis of such distile relationshipsmdashwhether in causal or inreduced-form settingsmdashis a potentially powerful method forinforming decision makers and it seems to me that its mer-its relative to quantile analyses would have to be judged on acase-by-case basis

4 CONCLUSIONS

Practitioners of applied microeconometrics will nd thatthe time invested in a careful reading of Angristrsquos article hassizable returns not merely because of the analytical and con-ceptual insights it offers but ultimately also because it sug-gests a variety of important practical innovations that appliedresearchers can exploit empirically to more clearly understandthe nature of causal relationships in their data Applied microresearchers in areas like health labor development public nance and such work commonly with data wherein (a) out-come measures are limitedmdashoften most importantly becausethey are nonnegative with mass points at 0mdashand (b) the rolesof conditioning covariates (X rsquos) are most conveniently summa-rized via regression methods Angristrsquos article provides appliedresearchers so endowed with a set of powerful tools for elic-iting causal inferences from such data Of course the punchline is familiar and inevitable All bets are off without theo-retically solid instruments But with such instruments in handthe strategies exposited by Angrist offer analysts a rich set ofperspectives on estimation

ADDITIONAL REFERENCE

Wooldridge J M (1999) ldquoEstimating Average Partial Effects Under Con-ditional Moment Independence Assumptionsrdquo mimeo Michigan StateUniversity Dept of Economics

CommentPetra Todd

Department of Economics University of Pennsylvania Philadelphia PA 19104 (petraathonasasupennedu)

This article considers ways of estimating the effect ofbinary endogenous regressors in models with limited depen-dent variables It questions the usefulness of conventionalestimation strategies aimed at recovering structural modelparameters and advocates the use of simple instrumental vari-ables ( IV) estimators as an alternative on the grounds thatthese estimators invoke weaker assumptions and often suf ceto answer questions of interest in empirical studies In theauthorrsquos view the main challenge facing empirical researchersis the problem of identi cation of ldquotreatment effectsrdquo throughIV Here I expand the discussion by considering two ques-tions (1) What are the limitations of the causal model asa paradigm for policy analysis (2) When simple estimatorsare suitable for answering a question of interest what are thetrade-offs that need to be considered in using them

1 WHEN IS THE ldquoCAUSAL MODELldquoA USEFULPARADIGM FOR POLICY ANALYSIS

The article gives the impression that most interesting ques-tions in economics can be answered within the context ofthe ldquocausal modelrdquo [variously attributed to Neyman (1923)Fisher (1935) Cox (1958) Roy (1951) and Rubin (1978)]The causal model is a very general framework that assumesthat there are potential outcome states associated with having

received some treatment and having received no treatment (orpossibly a different treatment) For each individual only oneof the potential outcome states is observed which leads to amissing-data problem in attempting to draw inferences aboutaspects of the treatment effect distribution

The language of potential outcomes is very general soalmost any economic problem could be formulated in theseterms However this does not mean that estimators proposedin the literature for the causal model are useful for all or evenmost economic problems A major limitation of the modelis that it assumes that the state ldquowith treatmentrdquo has beenobserved for at least a subset of people In medical trials orbiological experiments this assumption is probably reason-able but in economics we often are interested in evaluatingeffects of treatments that have not yet been implemented Forexample we might be interested in predicting the effect ofraising the age receiving Social Security bene ts to a new ageor the effect of introducing new term limits on welfare par-ticipation An implicit assumption needed to apply any of theidenti cation results described in this article is that both Y1

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

26 Journal of Business amp Economic Statistics January 2001

and Y0 are observed which would not be satis ed in manycases Thus the causal model and the estimators developedfor it are inadequate when it comes to any economic questioninvolving a state of the world that has not been observed

Despite these limitations treatment effect estimates aresometimes used to predict consequences of new treatments orof existing treatments for new populations When the policychange being considered is very close to changes observedin the data and is to be applied to similar individuals thenextrapolation may be reasonable But if the policy change is(a) of a magnitude outside the range of experience (b) alonga new dimension or (c) intended for a new population then itis an open question as to how reliable the estimates would beTreatment impact estimates are not intended to capture any-thing invariant to changing conditions The assumptions thatwould be required to justify extrapolations or generalizationsare often quite strong To some extent it is an illusion thatthese estimators allow for general kinds of inference underweak assumptions because the assumptions required to justifytheir application for particular purposes are often not madeexplicit

Angrist argues that the identi cation results for the localaverage treatment effect (LATE) estimator provide a ldquofounda-tion for credible causal inferencerdquo (p 5) and a ldquominimum con-troversy jumping-off point for any prediction exerciserdquo (p 5)But as discussed previously the causal model maintains sucha high level of generality that it offers little guidance for manykinds of problems It is important to recognize when and whennot the LATE estimator is likely to be useful This articleclaims that ldquoin practice estimates of LATE differ little fromestimates based on the stronger assumptions invoked to iden-tify effects on the entire treated populationrdquo (p 5) But suchan extrapolation is not justi ed under the theory and is surelyunlikely to hold in all circumstances As shown in a num-ber of works by various authors LATE provides the averageeffect of treatment for the group of ldquocompliersrdquo who are peo-ple induced by the instrument to receive treatment This groupdoes not except under very special circumstances correspondto the group of treated persons and may in fact representonly a small fraction of persons receiving treatment So notonly does LATE not suf ce for answering problems of the sortdescribed by (a) (b) and (c) it also does not generally pro-vide the average effect of treatment on the treatedmdashthe mostcommon parameter of interest in analyzing social programsAnother not-so-attractive feature of the estimator is that thepopulation of ldquocompliersrdquo is usually not identi able so wecannot say exactly whose treatment impact is being estimatedApplying the LATE parameter estimate to the full populationof treated persons would generally require ruling out certaintypes of heterogeneity in individual responses to treatmentas discussed by Heckman (1997) The explicit assumptionsthat could justify this type of extrapolation were shown byHeckman and Vytlacil (2000 in press)

2 ESTIMATORS SHOULD BE TAILOREDTO THEIR APPLICATION

A general theme throughout the article is that wheneverpossible researchers should restrain from imposing paramet-ric restrictions in estimation because the structure they impose

may be wrong resulting in biased estimates This view isexpressed for example in the discussion of the two-partmodel The article advocates the use of a two-part approachinstead of a conventional selection-model approach maintain-ing that the two-part approach is less likely to lead to biasedparameter estimates because it does not impose cross-equationrestrictions as a conventional tobit estimator would

How do researchers decide whether or not to impose restric-tions across estimating equations that are structurally linkedThe most agnostic approach would be to estimate both equa-tions completely nonparametrically but this approach is rarelypractical Nonparametric estimators are consistent under themost general conditions but in conventional size samples thestandard errors would be so large that the estimates would forpractical purposes be useless Researchers impose structurebecause they are willing to restrict the class of models underwhich their estimators are consistent in exchange for greaterprecision These ef ciency considerations would also apply inthe deciding whether or not to impose cross-equation restric-tions in estimating simultaneous-equations models If we areright about the structure there is an ef ciency gain fromimposing the restrictions If we are wrong imposing them maylead to bias When there are overidentifying restrictions it isat least possible to develop tests of the model speci cation thatcan serve as a guide in choosing a structural model that is sup-ported by the data This article presents a dichotomy betweenthe two-part model and a fully parametric traditional selec-tion speci cation but there are many other modeling choicesranging from fully nonparametric to semiparametric to fullyparametric In modest size samples ef ciency is an impor-tant consideration and a parametric model may be the mostappropriate one For very large samples fewer parametricapproaches are feasible Both the probability-of-working andthe hours-worked equations could for example be estimatedby a semiparametric method such as semiparametric leastsquares ( Ichimura 1993) Which estimation method is mostappropriate ultimately depends on how much data is avail-able and how many parameters are to be estimated It doesnot make sense to criticize the use of structured approaches infavor of more exible modeling approaches without regard tothe context in which they are being applied

3 ON THE GAP BETWEEN THEORY ANDEMPIRICAL PRACTICE

Finally another recurring emphasis in the article is on thevalue of shortcuts in empirical practice In the authorrsquos viewthere is a gap between the econometric theory and what isfeasible in practice and shortcuts and approximations help tobridge this gap To this end the article extols for examplethe virtues of ordinary least squares (OLS) as a way of approx-imating any unknown conditional expectation function with-out much concern as to whether a best linear approximation isa good approximation If the model has discrete regressors andis fully saturated it is well known that OLS is equivalent to anonparametric estimator But in other cases a linear approxi-mation could be highly biased and the bias will not go awayas the sample size gets large There are a variety of alterna-tive asymptotically unbiased estimators Along similar lines

Angrist Reply 27

the article suggests that in implementing a ldquoselection-on-observablesrdquo estimator it may be all right simply to ignore theweighting scheme Such claims seem to me fairly outrageousbecause failure to take into account weighting could lead tovery different estimates and would not provide a consistentestimator of the parameter of interest Heckman Ichimura andTodd (1998) discussed alternative estimators that are not dif -cult to implement and take into account the weighting scheme

Finally the appendix presents a variety of shortcuts forobtaining estimates and standard errorsmdashsome of which leadto inconsistent estimates When shortcuts lead to very crudeapproximations or inconsistent estimates I question their use-fulness I do think it unreasonable to expect that estimatesbased on simple IV estimators be accompanied by correctstandard errors that correctly take into account the in uence ofestimated parameters of estimated regressors I see little valuein reporting incorrect standard-errors estimates on the groundsthat it saves time in computation especially when there areseveral works long available in the literature that develop sim-ple ways of solving these kinds of problems (eg see Newey1984) Although there is surely a gap between the estimatorsdeveloped in the theoretic econometric literature and the esti-mators used in the empirical literature the gap is not as greatas this article supposes

ADDITIONAL REFERENCES

Cox D R (1958) The Planning of Experiments New York WileyFisher R A (1935) Design of Experiments London Oliver and BoydHeckman J J (1997) ldquo Instrumental Variables Evidence From Evaluating a

Job Training Programmerdquo Journal of Human Resources 32 441ndash462Heckman J Ichimura H and Todd P (1997) ldquoMatching as an Econometric

Evaluation Estimator Theory and Evidence on its Performance Appliedto the JTPA Program Part I Theory and Methodsrdquo Review of EconomicStudies 64 605ndash654

mdashmdashmdash (1998) ldquoMatching as an Econometric Evaluationrdquo Review of Eco-nomic Studies 65 261ndash294

Heckman J and Vytlacil E (2000) ldquoCausal Parameters Structural Equa-tions Treatment Effects and Randomized Evaluations of Social Programsrdquounpublished manuscript University of Chicago Dept of Economics

mdashmdashmdash (in press) ldquoLocal Instrumental Variablesrdquo in Nonlinear StatisticalInference Essays in Honor of Takeshi Amemiya eds C Hsiao KMorimune and J Powell Cambridge UK Cambridge University Press

Ichimura H (1993) ldquoSemiparametric Least Squares (SLS) and Weighted SLSEstimation of Single Index Modelsrdquo Journal of Econometrics 58 71ndash120

Neyman J (1923) ldquoStatistical Problems in Agricultural Experimentsrdquo Sup-plement to the Journal of the Royal Statistical Society 2 107ndash180

Newey W (1984) ldquoA Method of Moments Interpretation of Sequential Esti-matorsrdquo Economic Letters 14 201ndash206

Roy A (1951) ldquoSome Thoughts on the Distribution of Earningsrdquo OxfordEconomic Papers 3 135ndash146

Rubin D (1978) ldquoBaysian Inference for Causal Effects The Role of Ran-domizationrdquo The Annals of Statistics 6 34ndash58

ReplyJoshua D Angrist

I thank the discussants and the session organizer outgoingJBES editor Jeff Wooldridge who will be missed by associateeditors and JBES authors alike The comments in this sessionoffer a variety of viewpoints some sympathetic others morecritical I am pleased to be part of this stimulating and con-structive exchange and look forward to more interaction ofthis type in the future

Hahn My article starts by suggesting that a focus ontreatment effects simpli es the problem of causal inferencein limited dependent variable (LDV) models with endoge-nous regressors Endogeneityrsquos half-sister is the assumptionof unobserved individual effects since the omitted-variablesmotivation for xed effects in panel data is similar Hahnasks whether a shift in focus from index parameters to aver-age causal effects might also simplify the notoriously dif -cult problem of working with nonlinear models for binarypanel data (eg as in Card and Sullivan 1988) Hahn showsthat here too the causal-effects framework pays off demon-strating that average causal effects are easily estimated in a xed-effects probit model with a binary regressor even thoughthe probit coef cient is not A second noteworthy featureof Hahnrsquos comment is his observation that identi cation inthis framework turns on careful modeling of the relationshipbetween the assignment variable and the xed effects Tradi-tional econometric models are primarily concerned with thestochastic process generating outcomes A shift of attentiontoward modeling the assignment mechanism is consistent with

the notion that observational studies should try to mimic thesort of controlled comparisons generated by experiments

Imbens Imbens begins by endorsing the view that the sub-stantive goals of empirical research are better served by thepotential outcomescausal framework than by structural mod-eling Going one step further Imbens notes that the problemswith ldquomodel- rst econometricsrdquo are not limited to LDVrsquos Inthe same spirit he sensibly wonders (as did Wooldridge 1992)why so much attention has been devoted to estimating theBoxndashCox transformation model which does not actually iden-tify E6Y mdashX7 without a distributional assumption Imbens alsonicely articulates the rationale for looking at the twins exam-ple and notes that this is similar to the rationale for extrap-olating from clinical trials in narrow subpopulations And infact I believe the ldquotwins experimentrdquo is likely to have pre-dictive value in a variety of situations though any extrapo-lation will probably be improved by control for demographiccharacteristics

In spite of (or perhaps because of) our years of discussionand collaboration Imbens and I still do not agree on certainthings He feels strongly that inference with discrete outcomesshould use models and procedures that respect inherent non-

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

28 Journal of Business amp Economic Statistics January 2001

linearities The HIRZ procedure discussed in his comment isan intriguing way to do this and my article discusses othersI remain unconvinced however that nonlinearity is of majorsubstantive importance in empirical work with binary regres-sors The proof of the pudding I suppose would be to showthat H IRZ or some of the methods I discuss in the article leadto better predictions or decisions than the linear workhorse

Mof tt Mof t provides a valuable counterpointHe beginsby arguing that latent-index models have more predictivepower than reduced-form causal models Such claims seemhard to establish a priori though I do not doubt that in somecases this may be true But empirical strategies in which thebulk of the research effort goes into estimating index coef -cients may fail to answer the most basic question in empiricalwork What happened here Causal models and local aver-age treatment effect (LATE) answer this question while indexcoef cients alone are not suf cient for causal inference Iagree however that structural modeling can be useful forgoing beyond basic questions and I have used index and otherstructural models for this purpose Mof t is right howeverin suggesting that I nd the model-free analysis he refers toas ldquoexploratoryrdquo more interesting than structural extrapolationThis seems to me to be where the potential for real discoverylies

Mof tt asks tough but fair questions about the twins appli-cation He is correct to point out that twinning is not a manip-ulable policy instrument That is one reason I do not limit theanalysis to reduced-form effects of multiple births Yet to mymind the best evidence we have on the controversial questionof the consequences of out-of-wedlock childbearing comesfrom the twins experiment (Bronars and Grogger 1994) Andthe economic consequences of childbearing are of interest inmany other contexts As noted in the introduction to my arti-cle the twins experiment can help determine whether fertilityreductions are responsible for trend increases in female laborsupply Beyond this sort of question the twins results seemuseful for assessing the economic consequences of increasedabortion access in the United States the dissemination of con-traceptive implants in developing countries (eg see KaneFarr and Janowitz 1990) and various programs to encour-age or require implant use by welfare mothers (Haveman andWolfe 1993)

Finally it is fair to question the external validity of twinsestimates as Mof t does toward the end of his commentAngrist and Evans (1998) showed that the labor-supply con-sequences of twinning can be reconciled with estimates ofthe labor-supply consequences of childbearing resulting froma very different natural experiment induced by preferences formixed-sex sibling pairs This suggests that results from bothexperiments may be quite robust and may even be ldquostructuralrdquoin the sense that they tell us something general about the eco-nomic response to childbearing

Mullahy Mullahy brings a welcome health economistrsquosperspective to the discussion Health economists have longgrappled with the issues discussed in my article and asI hope I made clear I drew ideas and inspiration fromMullahyrsquos work on models with nonnegative dependent vari-ables Mullahyrsquos comment further highlights exactly the sort

of utilitarian trade-off that underlies the choice of empiricalstrategy On one hand we would like our models to be con-sistent with any functional form restrictions known to be trueOn the other attempts at this may complicate estimation andinference for little payoff Worst of all the technical complex-ity of nonlinear models seems to cause authors to ldquowax econo-metricrdquo an effort that may come at the expense of attention tosubstantive issues of importance (an example here is the longseries of papers discussing exactly how data from the RANDhealth-insurance experiment should be analyzed) Mullahyrsquosheretical discussions of ldquoone-part modelsrdquo here and in earlierwork represent a refreshing effort to keep the empirical trainon track Mullahyrsquos point that application-specic distributionordinates can provide more useful summary information thanquantiles is also clearly worth thinking about

Todd Todd provides a useful and clear overview of thecausal framework which she suggests is better suited formedical questions than economic questions The thrust of herargument is that there is too much heterogeneity in economicinterventions and outcomes for the causal approach to bevery useful for economists But my perception is that medicalresearch is even more explicitly concerned with heterogene-ity than economics This is evidenced by the many quali ca-tions regarding patient characteristics conditions and inter-actions in medical treatment protocols A difference betweenthe prevailing medical research paradigm and the traditionaleconometric perspective however is whether the question ofheterogeneous response is to be addressed by better theoryor more evidence Of course theory helps us decide whereto look for heterogeneity but as Rosenbaum (1999 p 301)put it ldquo If a treatment has substantially different effects in dif-ferent types of people then we need to discover this (ital-ics mine) Rosenbaum also noted that ldquorandomized clinicaltrials routinely look for heterogeneous treatment effects andthey do this without representative samples from nationalpopulationsrdquo

On the technical side Todd suggests I advocated use ofthe two-part model because it exibly approximates nonlinearfunctional forms for nonnegative outcomes Although I notedthe exibility the upshot of my discussion was that the two-part model is poorly suited to causal inference because of thedif culty of interpreting part 2 Finally Todd points out thefailure to allow for sampling variance in the scaling factorsused to generate effects for nonlinear models This is indeed ahard-to-defend shortcut though I shall take a stab at defenseby noting that randomness of the scaling factor which typi-cally estimates a sample average is likely to be a minor sourceof sampling uncertainty

ADDITIONAL REFERENCES

Card D E and Sullivan D (1988) ldquoEffects of Subsidized Training onMovements In and Out of Employmentrdquo Econometrica 56 497ndash530

Haveman R and Wolfe B (1993) ldquoChildrenrsquos Prospects and ChildrenrsquosPolicyrdquo Journal of Economic Perspectives 7 153ndash174

Kane T Farr G and Janowitz B (1990) ldquo Initial Acceptability of Con-traceptive Implants in Four Developing Countriesrdquo International FamilyPlanning Perspectives 16 49ndash54

Rosenbaum P R (1999) ldquoChoice as an Alternative to Control in Observa-tional Studiesrdquo Statistical Science 14 259ndash304

Wooldridge J (1992) ldquoSome Alternatives to the BoxndashCox RegressionModelrdquo International Economic Review 33 935ndash955

Page 24: New Estimation of Limited Dependent Variable Models With Dummy …piketty.pse.ens.fr/…ecoineg/articl/Angrist2001b.pdf · 2011. 12. 5. · sequences of changes in contraceptive technology

Todd Comment 25

quantile and distile analyses may be more relevant in address-ing particular analytical questions or policy issues than CEFrsquosthe ability to extend the mechanisms of causal-effect esti-mation strategies in such directions should be welcomed byapplied researchers

I might suggest that Angristrsquos claim about an advantage ofquantile causal analysis over distile causal analysis being thatthe latter relies on application-specic values of ordinates (c)is perhaps oversold or underdeveloped In many interestingand policy-relevant applications focus on the application-speci c ordinates is fundamental What factors cause employ-ers to offer more than one health insurance plan to employeesWhat factors cause consumers of alcoholic beverages to drinkmore than two drinks per day What factors cause elderlyindividuals to utilize more than one prescription drug prod-uct What factors cause second earners to work more than17 hours per week Each of these concerns an importantreal-world causal question for which the application-specicordinatemdashwhether for institutional legal physiological orother reasonsmdashis a fundamental ldquopivot pointrdquo for the analysisAnalysis of such distile relationshipsmdashwhether in causal or inreduced-form settingsmdashis a potentially powerful method forinforming decision makers and it seems to me that its mer-its relative to quantile analyses would have to be judged on acase-by-case basis

4 CONCLUSIONS

Practitioners of applied microeconometrics will nd thatthe time invested in a careful reading of Angristrsquos article hassizable returns not merely because of the analytical and con-ceptual insights it offers but ultimately also because it sug-gests a variety of important practical innovations that appliedresearchers can exploit empirically to more clearly understandthe nature of causal relationships in their data Applied microresearchers in areas like health labor development public nance and such work commonly with data wherein (a) out-come measures are limitedmdashoften most importantly becausethey are nonnegative with mass points at 0mdashand (b) the rolesof conditioning covariates (X rsquos) are most conveniently summa-rized via regression methods Angristrsquos article provides appliedresearchers so endowed with a set of powerful tools for elic-iting causal inferences from such data Of course the punchline is familiar and inevitable All bets are off without theo-retically solid instruments But with such instruments in handthe strategies exposited by Angrist offer analysts a rich set ofperspectives on estimation

ADDITIONAL REFERENCE

Wooldridge J M (1999) ldquoEstimating Average Partial Effects Under Con-ditional Moment Independence Assumptionsrdquo mimeo Michigan StateUniversity Dept of Economics

CommentPetra Todd

Department of Economics University of Pennsylvania Philadelphia PA 19104 (petraathonasasupennedu)

This article considers ways of estimating the effect ofbinary endogenous regressors in models with limited depen-dent variables It questions the usefulness of conventionalestimation strategies aimed at recovering structural modelparameters and advocates the use of simple instrumental vari-ables ( IV) estimators as an alternative on the grounds thatthese estimators invoke weaker assumptions and often suf ceto answer questions of interest in empirical studies In theauthorrsquos view the main challenge facing empirical researchersis the problem of identi cation of ldquotreatment effectsrdquo throughIV Here I expand the discussion by considering two ques-tions (1) What are the limitations of the causal model asa paradigm for policy analysis (2) When simple estimatorsare suitable for answering a question of interest what are thetrade-offs that need to be considered in using them

1 WHEN IS THE ldquoCAUSAL MODELldquoA USEFULPARADIGM FOR POLICY ANALYSIS

The article gives the impression that most interesting ques-tions in economics can be answered within the context ofthe ldquocausal modelrdquo [variously attributed to Neyman (1923)Fisher (1935) Cox (1958) Roy (1951) and Rubin (1978)]The causal model is a very general framework that assumesthat there are potential outcome states associated with having

received some treatment and having received no treatment (orpossibly a different treatment) For each individual only oneof the potential outcome states is observed which leads to amissing-data problem in attempting to draw inferences aboutaspects of the treatment effect distribution

The language of potential outcomes is very general soalmost any economic problem could be formulated in theseterms However this does not mean that estimators proposedin the literature for the causal model are useful for all or evenmost economic problems A major limitation of the modelis that it assumes that the state ldquowith treatmentrdquo has beenobserved for at least a subset of people In medical trials orbiological experiments this assumption is probably reason-able but in economics we often are interested in evaluatingeffects of treatments that have not yet been implemented Forexample we might be interested in predicting the effect ofraising the age receiving Social Security bene ts to a new ageor the effect of introducing new term limits on welfare par-ticipation An implicit assumption needed to apply any of theidenti cation results described in this article is that both Y1

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

26 Journal of Business amp Economic Statistics January 2001

and Y0 are observed which would not be satis ed in manycases Thus the causal model and the estimators developedfor it are inadequate when it comes to any economic questioninvolving a state of the world that has not been observed

Despite these limitations treatment effect estimates aresometimes used to predict consequences of new treatments orof existing treatments for new populations When the policychange being considered is very close to changes observedin the data and is to be applied to similar individuals thenextrapolation may be reasonable But if the policy change is(a) of a magnitude outside the range of experience (b) alonga new dimension or (c) intended for a new population then itis an open question as to how reliable the estimates would beTreatment impact estimates are not intended to capture any-thing invariant to changing conditions The assumptions thatwould be required to justify extrapolations or generalizationsare often quite strong To some extent it is an illusion thatthese estimators allow for general kinds of inference underweak assumptions because the assumptions required to justifytheir application for particular purposes are often not madeexplicit

Angrist argues that the identi cation results for the localaverage treatment effect (LATE) estimator provide a ldquofounda-tion for credible causal inferencerdquo (p 5) and a ldquominimum con-troversy jumping-off point for any prediction exerciserdquo (p 5)But as discussed previously the causal model maintains sucha high level of generality that it offers little guidance for manykinds of problems It is important to recognize when and whennot the LATE estimator is likely to be useful This articleclaims that ldquoin practice estimates of LATE differ little fromestimates based on the stronger assumptions invoked to iden-tify effects on the entire treated populationrdquo (p 5) But suchan extrapolation is not justi ed under the theory and is surelyunlikely to hold in all circumstances As shown in a num-ber of works by various authors LATE provides the averageeffect of treatment for the group of ldquocompliersrdquo who are peo-ple induced by the instrument to receive treatment This groupdoes not except under very special circumstances correspondto the group of treated persons and may in fact representonly a small fraction of persons receiving treatment So notonly does LATE not suf ce for answering problems of the sortdescribed by (a) (b) and (c) it also does not generally pro-vide the average effect of treatment on the treatedmdashthe mostcommon parameter of interest in analyzing social programsAnother not-so-attractive feature of the estimator is that thepopulation of ldquocompliersrdquo is usually not identi able so wecannot say exactly whose treatment impact is being estimatedApplying the LATE parameter estimate to the full populationof treated persons would generally require ruling out certaintypes of heterogeneity in individual responses to treatmentas discussed by Heckman (1997) The explicit assumptionsthat could justify this type of extrapolation were shown byHeckman and Vytlacil (2000 in press)

2 ESTIMATORS SHOULD BE TAILOREDTO THEIR APPLICATION

A general theme throughout the article is that wheneverpossible researchers should restrain from imposing paramet-ric restrictions in estimation because the structure they impose

may be wrong resulting in biased estimates This view isexpressed for example in the discussion of the two-partmodel The article advocates the use of a two-part approachinstead of a conventional selection-model approach maintain-ing that the two-part approach is less likely to lead to biasedparameter estimates because it does not impose cross-equationrestrictions as a conventional tobit estimator would

How do researchers decide whether or not to impose restric-tions across estimating equations that are structurally linkedThe most agnostic approach would be to estimate both equa-tions completely nonparametrically but this approach is rarelypractical Nonparametric estimators are consistent under themost general conditions but in conventional size samples thestandard errors would be so large that the estimates would forpractical purposes be useless Researchers impose structurebecause they are willing to restrict the class of models underwhich their estimators are consistent in exchange for greaterprecision These ef ciency considerations would also apply inthe deciding whether or not to impose cross-equation restric-tions in estimating simultaneous-equations models If we areright about the structure there is an ef ciency gain fromimposing the restrictions If we are wrong imposing them maylead to bias When there are overidentifying restrictions it isat least possible to develop tests of the model speci cation thatcan serve as a guide in choosing a structural model that is sup-ported by the data This article presents a dichotomy betweenthe two-part model and a fully parametric traditional selec-tion speci cation but there are many other modeling choicesranging from fully nonparametric to semiparametric to fullyparametric In modest size samples ef ciency is an impor-tant consideration and a parametric model may be the mostappropriate one For very large samples fewer parametricapproaches are feasible Both the probability-of-working andthe hours-worked equations could for example be estimatedby a semiparametric method such as semiparametric leastsquares ( Ichimura 1993) Which estimation method is mostappropriate ultimately depends on how much data is avail-able and how many parameters are to be estimated It doesnot make sense to criticize the use of structured approaches infavor of more exible modeling approaches without regard tothe context in which they are being applied

3 ON THE GAP BETWEEN THEORY ANDEMPIRICAL PRACTICE

Finally another recurring emphasis in the article is on thevalue of shortcuts in empirical practice In the authorrsquos viewthere is a gap between the econometric theory and what isfeasible in practice and shortcuts and approximations help tobridge this gap To this end the article extols for examplethe virtues of ordinary least squares (OLS) as a way of approx-imating any unknown conditional expectation function with-out much concern as to whether a best linear approximation isa good approximation If the model has discrete regressors andis fully saturated it is well known that OLS is equivalent to anonparametric estimator But in other cases a linear approxi-mation could be highly biased and the bias will not go awayas the sample size gets large There are a variety of alterna-tive asymptotically unbiased estimators Along similar lines

Angrist Reply 27

the article suggests that in implementing a ldquoselection-on-observablesrdquo estimator it may be all right simply to ignore theweighting scheme Such claims seem to me fairly outrageousbecause failure to take into account weighting could lead tovery different estimates and would not provide a consistentestimator of the parameter of interest Heckman Ichimura andTodd (1998) discussed alternative estimators that are not dif -cult to implement and take into account the weighting scheme

Finally the appendix presents a variety of shortcuts forobtaining estimates and standard errorsmdashsome of which leadto inconsistent estimates When shortcuts lead to very crudeapproximations or inconsistent estimates I question their use-fulness I do think it unreasonable to expect that estimatesbased on simple IV estimators be accompanied by correctstandard errors that correctly take into account the in uence ofestimated parameters of estimated regressors I see little valuein reporting incorrect standard-errors estimates on the groundsthat it saves time in computation especially when there areseveral works long available in the literature that develop sim-ple ways of solving these kinds of problems (eg see Newey1984) Although there is surely a gap between the estimatorsdeveloped in the theoretic econometric literature and the esti-mators used in the empirical literature the gap is not as greatas this article supposes

ADDITIONAL REFERENCES

Cox D R (1958) The Planning of Experiments New York WileyFisher R A (1935) Design of Experiments London Oliver and BoydHeckman J J (1997) ldquo Instrumental Variables Evidence From Evaluating a

Job Training Programmerdquo Journal of Human Resources 32 441ndash462Heckman J Ichimura H and Todd P (1997) ldquoMatching as an Econometric

Evaluation Estimator Theory and Evidence on its Performance Appliedto the JTPA Program Part I Theory and Methodsrdquo Review of EconomicStudies 64 605ndash654

mdashmdashmdash (1998) ldquoMatching as an Econometric Evaluationrdquo Review of Eco-nomic Studies 65 261ndash294

Heckman J and Vytlacil E (2000) ldquoCausal Parameters Structural Equa-tions Treatment Effects and Randomized Evaluations of Social Programsrdquounpublished manuscript University of Chicago Dept of Economics

mdashmdashmdash (in press) ldquoLocal Instrumental Variablesrdquo in Nonlinear StatisticalInference Essays in Honor of Takeshi Amemiya eds C Hsiao KMorimune and J Powell Cambridge UK Cambridge University Press

Ichimura H (1993) ldquoSemiparametric Least Squares (SLS) and Weighted SLSEstimation of Single Index Modelsrdquo Journal of Econometrics 58 71ndash120

Neyman J (1923) ldquoStatistical Problems in Agricultural Experimentsrdquo Sup-plement to the Journal of the Royal Statistical Society 2 107ndash180

Newey W (1984) ldquoA Method of Moments Interpretation of Sequential Esti-matorsrdquo Economic Letters 14 201ndash206

Roy A (1951) ldquoSome Thoughts on the Distribution of Earningsrdquo OxfordEconomic Papers 3 135ndash146

Rubin D (1978) ldquoBaysian Inference for Causal Effects The Role of Ran-domizationrdquo The Annals of Statistics 6 34ndash58

ReplyJoshua D Angrist

I thank the discussants and the session organizer outgoingJBES editor Jeff Wooldridge who will be missed by associateeditors and JBES authors alike The comments in this sessionoffer a variety of viewpoints some sympathetic others morecritical I am pleased to be part of this stimulating and con-structive exchange and look forward to more interaction ofthis type in the future

Hahn My article starts by suggesting that a focus ontreatment effects simpli es the problem of causal inferencein limited dependent variable (LDV) models with endoge-nous regressors Endogeneityrsquos half-sister is the assumptionof unobserved individual effects since the omitted-variablesmotivation for xed effects in panel data is similar Hahnasks whether a shift in focus from index parameters to aver-age causal effects might also simplify the notoriously dif -cult problem of working with nonlinear models for binarypanel data (eg as in Card and Sullivan 1988) Hahn showsthat here too the causal-effects framework pays off demon-strating that average causal effects are easily estimated in a xed-effects probit model with a binary regressor even thoughthe probit coef cient is not A second noteworthy featureof Hahnrsquos comment is his observation that identi cation inthis framework turns on careful modeling of the relationshipbetween the assignment variable and the xed effects Tradi-tional econometric models are primarily concerned with thestochastic process generating outcomes A shift of attentiontoward modeling the assignment mechanism is consistent with

the notion that observational studies should try to mimic thesort of controlled comparisons generated by experiments

Imbens Imbens begins by endorsing the view that the sub-stantive goals of empirical research are better served by thepotential outcomescausal framework than by structural mod-eling Going one step further Imbens notes that the problemswith ldquomodel- rst econometricsrdquo are not limited to LDVrsquos Inthe same spirit he sensibly wonders (as did Wooldridge 1992)why so much attention has been devoted to estimating theBoxndashCox transformation model which does not actually iden-tify E6Y mdashX7 without a distributional assumption Imbens alsonicely articulates the rationale for looking at the twins exam-ple and notes that this is similar to the rationale for extrap-olating from clinical trials in narrow subpopulations And infact I believe the ldquotwins experimentrdquo is likely to have pre-dictive value in a variety of situations though any extrapo-lation will probably be improved by control for demographiccharacteristics

In spite of (or perhaps because of) our years of discussionand collaboration Imbens and I still do not agree on certainthings He feels strongly that inference with discrete outcomesshould use models and procedures that respect inherent non-

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

28 Journal of Business amp Economic Statistics January 2001

linearities The HIRZ procedure discussed in his comment isan intriguing way to do this and my article discusses othersI remain unconvinced however that nonlinearity is of majorsubstantive importance in empirical work with binary regres-sors The proof of the pudding I suppose would be to showthat H IRZ or some of the methods I discuss in the article leadto better predictions or decisions than the linear workhorse

Mof tt Mof t provides a valuable counterpointHe beginsby arguing that latent-index models have more predictivepower than reduced-form causal models Such claims seemhard to establish a priori though I do not doubt that in somecases this may be true But empirical strategies in which thebulk of the research effort goes into estimating index coef -cients may fail to answer the most basic question in empiricalwork What happened here Causal models and local aver-age treatment effect (LATE) answer this question while indexcoef cients alone are not suf cient for causal inference Iagree however that structural modeling can be useful forgoing beyond basic questions and I have used index and otherstructural models for this purpose Mof t is right howeverin suggesting that I nd the model-free analysis he refers toas ldquoexploratoryrdquo more interesting than structural extrapolationThis seems to me to be where the potential for real discoverylies

Mof tt asks tough but fair questions about the twins appli-cation He is correct to point out that twinning is not a manip-ulable policy instrument That is one reason I do not limit theanalysis to reduced-form effects of multiple births Yet to mymind the best evidence we have on the controversial questionof the consequences of out-of-wedlock childbearing comesfrom the twins experiment (Bronars and Grogger 1994) Andthe economic consequences of childbearing are of interest inmany other contexts As noted in the introduction to my arti-cle the twins experiment can help determine whether fertilityreductions are responsible for trend increases in female laborsupply Beyond this sort of question the twins results seemuseful for assessing the economic consequences of increasedabortion access in the United States the dissemination of con-traceptive implants in developing countries (eg see KaneFarr and Janowitz 1990) and various programs to encour-age or require implant use by welfare mothers (Haveman andWolfe 1993)

Finally it is fair to question the external validity of twinsestimates as Mof t does toward the end of his commentAngrist and Evans (1998) showed that the labor-supply con-sequences of twinning can be reconciled with estimates ofthe labor-supply consequences of childbearing resulting froma very different natural experiment induced by preferences formixed-sex sibling pairs This suggests that results from bothexperiments may be quite robust and may even be ldquostructuralrdquoin the sense that they tell us something general about the eco-nomic response to childbearing

Mullahy Mullahy brings a welcome health economistrsquosperspective to the discussion Health economists have longgrappled with the issues discussed in my article and asI hope I made clear I drew ideas and inspiration fromMullahyrsquos work on models with nonnegative dependent vari-ables Mullahyrsquos comment further highlights exactly the sort

of utilitarian trade-off that underlies the choice of empiricalstrategy On one hand we would like our models to be con-sistent with any functional form restrictions known to be trueOn the other attempts at this may complicate estimation andinference for little payoff Worst of all the technical complex-ity of nonlinear models seems to cause authors to ldquowax econo-metricrdquo an effort that may come at the expense of attention tosubstantive issues of importance (an example here is the longseries of papers discussing exactly how data from the RANDhealth-insurance experiment should be analyzed) Mullahyrsquosheretical discussions of ldquoone-part modelsrdquo here and in earlierwork represent a refreshing effort to keep the empirical trainon track Mullahyrsquos point that application-specic distributionordinates can provide more useful summary information thanquantiles is also clearly worth thinking about

Todd Todd provides a useful and clear overview of thecausal framework which she suggests is better suited formedical questions than economic questions The thrust of herargument is that there is too much heterogeneity in economicinterventions and outcomes for the causal approach to bevery useful for economists But my perception is that medicalresearch is even more explicitly concerned with heterogene-ity than economics This is evidenced by the many quali ca-tions regarding patient characteristics conditions and inter-actions in medical treatment protocols A difference betweenthe prevailing medical research paradigm and the traditionaleconometric perspective however is whether the question ofheterogeneous response is to be addressed by better theoryor more evidence Of course theory helps us decide whereto look for heterogeneity but as Rosenbaum (1999 p 301)put it ldquo If a treatment has substantially different effects in dif-ferent types of people then we need to discover this (ital-ics mine) Rosenbaum also noted that ldquorandomized clinicaltrials routinely look for heterogeneous treatment effects andthey do this without representative samples from nationalpopulationsrdquo

On the technical side Todd suggests I advocated use ofthe two-part model because it exibly approximates nonlinearfunctional forms for nonnegative outcomes Although I notedthe exibility the upshot of my discussion was that the two-part model is poorly suited to causal inference because of thedif culty of interpreting part 2 Finally Todd points out thefailure to allow for sampling variance in the scaling factorsused to generate effects for nonlinear models This is indeed ahard-to-defend shortcut though I shall take a stab at defenseby noting that randomness of the scaling factor which typi-cally estimates a sample average is likely to be a minor sourceof sampling uncertainty

ADDITIONAL REFERENCES

Card D E and Sullivan D (1988) ldquoEffects of Subsidized Training onMovements In and Out of Employmentrdquo Econometrica 56 497ndash530

Haveman R and Wolfe B (1993) ldquoChildrenrsquos Prospects and ChildrenrsquosPolicyrdquo Journal of Economic Perspectives 7 153ndash174

Kane T Farr G and Janowitz B (1990) ldquo Initial Acceptability of Con-traceptive Implants in Four Developing Countriesrdquo International FamilyPlanning Perspectives 16 49ndash54

Rosenbaum P R (1999) ldquoChoice as an Alternative to Control in Observa-tional Studiesrdquo Statistical Science 14 259ndash304

Wooldridge J (1992) ldquoSome Alternatives to the BoxndashCox RegressionModelrdquo International Economic Review 33 935ndash955

Page 25: New Estimation of Limited Dependent Variable Models With Dummy …piketty.pse.ens.fr/…ecoineg/articl/Angrist2001b.pdf · 2011. 12. 5. · sequences of changes in contraceptive technology

26 Journal of Business amp Economic Statistics January 2001

and Y0 are observed which would not be satis ed in manycases Thus the causal model and the estimators developedfor it are inadequate when it comes to any economic questioninvolving a state of the world that has not been observed

Despite these limitations treatment effect estimates aresometimes used to predict consequences of new treatments orof existing treatments for new populations When the policychange being considered is very close to changes observedin the data and is to be applied to similar individuals thenextrapolation may be reasonable But if the policy change is(a) of a magnitude outside the range of experience (b) alonga new dimension or (c) intended for a new population then itis an open question as to how reliable the estimates would beTreatment impact estimates are not intended to capture any-thing invariant to changing conditions The assumptions thatwould be required to justify extrapolations or generalizationsare often quite strong To some extent it is an illusion thatthese estimators allow for general kinds of inference underweak assumptions because the assumptions required to justifytheir application for particular purposes are often not madeexplicit

Angrist argues that the identi cation results for the localaverage treatment effect (LATE) estimator provide a ldquofounda-tion for credible causal inferencerdquo (p 5) and a ldquominimum con-troversy jumping-off point for any prediction exerciserdquo (p 5)But as discussed previously the causal model maintains sucha high level of generality that it offers little guidance for manykinds of problems It is important to recognize when and whennot the LATE estimator is likely to be useful This articleclaims that ldquoin practice estimates of LATE differ little fromestimates based on the stronger assumptions invoked to iden-tify effects on the entire treated populationrdquo (p 5) But suchan extrapolation is not justi ed under the theory and is surelyunlikely to hold in all circumstances As shown in a num-ber of works by various authors LATE provides the averageeffect of treatment for the group of ldquocompliersrdquo who are peo-ple induced by the instrument to receive treatment This groupdoes not except under very special circumstances correspondto the group of treated persons and may in fact representonly a small fraction of persons receiving treatment So notonly does LATE not suf ce for answering problems of the sortdescribed by (a) (b) and (c) it also does not generally pro-vide the average effect of treatment on the treatedmdashthe mostcommon parameter of interest in analyzing social programsAnother not-so-attractive feature of the estimator is that thepopulation of ldquocompliersrdquo is usually not identi able so wecannot say exactly whose treatment impact is being estimatedApplying the LATE parameter estimate to the full populationof treated persons would generally require ruling out certaintypes of heterogeneity in individual responses to treatmentas discussed by Heckman (1997) The explicit assumptionsthat could justify this type of extrapolation were shown byHeckman and Vytlacil (2000 in press)

2 ESTIMATORS SHOULD BE TAILOREDTO THEIR APPLICATION

A general theme throughout the article is that wheneverpossible researchers should restrain from imposing paramet-ric restrictions in estimation because the structure they impose

may be wrong resulting in biased estimates This view isexpressed for example in the discussion of the two-partmodel The article advocates the use of a two-part approachinstead of a conventional selection-model approach maintain-ing that the two-part approach is less likely to lead to biasedparameter estimates because it does not impose cross-equationrestrictions as a conventional tobit estimator would

How do researchers decide whether or not to impose restric-tions across estimating equations that are structurally linkedThe most agnostic approach would be to estimate both equa-tions completely nonparametrically but this approach is rarelypractical Nonparametric estimators are consistent under themost general conditions but in conventional size samples thestandard errors would be so large that the estimates would forpractical purposes be useless Researchers impose structurebecause they are willing to restrict the class of models underwhich their estimators are consistent in exchange for greaterprecision These ef ciency considerations would also apply inthe deciding whether or not to impose cross-equation restric-tions in estimating simultaneous-equations models If we areright about the structure there is an ef ciency gain fromimposing the restrictions If we are wrong imposing them maylead to bias When there are overidentifying restrictions it isat least possible to develop tests of the model speci cation thatcan serve as a guide in choosing a structural model that is sup-ported by the data This article presents a dichotomy betweenthe two-part model and a fully parametric traditional selec-tion speci cation but there are many other modeling choicesranging from fully nonparametric to semiparametric to fullyparametric In modest size samples ef ciency is an impor-tant consideration and a parametric model may be the mostappropriate one For very large samples fewer parametricapproaches are feasible Both the probability-of-working andthe hours-worked equations could for example be estimatedby a semiparametric method such as semiparametric leastsquares ( Ichimura 1993) Which estimation method is mostappropriate ultimately depends on how much data is avail-able and how many parameters are to be estimated It doesnot make sense to criticize the use of structured approaches infavor of more exible modeling approaches without regard tothe context in which they are being applied

3 ON THE GAP BETWEEN THEORY ANDEMPIRICAL PRACTICE

Finally another recurring emphasis in the article is on thevalue of shortcuts in empirical practice In the authorrsquos viewthere is a gap between the econometric theory and what isfeasible in practice and shortcuts and approximations help tobridge this gap To this end the article extols for examplethe virtues of ordinary least squares (OLS) as a way of approx-imating any unknown conditional expectation function with-out much concern as to whether a best linear approximation isa good approximation If the model has discrete regressors andis fully saturated it is well known that OLS is equivalent to anonparametric estimator But in other cases a linear approxi-mation could be highly biased and the bias will not go awayas the sample size gets large There are a variety of alterna-tive asymptotically unbiased estimators Along similar lines

Angrist Reply 27

the article suggests that in implementing a ldquoselection-on-observablesrdquo estimator it may be all right simply to ignore theweighting scheme Such claims seem to me fairly outrageousbecause failure to take into account weighting could lead tovery different estimates and would not provide a consistentestimator of the parameter of interest Heckman Ichimura andTodd (1998) discussed alternative estimators that are not dif -cult to implement and take into account the weighting scheme

Finally the appendix presents a variety of shortcuts forobtaining estimates and standard errorsmdashsome of which leadto inconsistent estimates When shortcuts lead to very crudeapproximations or inconsistent estimates I question their use-fulness I do think it unreasonable to expect that estimatesbased on simple IV estimators be accompanied by correctstandard errors that correctly take into account the in uence ofestimated parameters of estimated regressors I see little valuein reporting incorrect standard-errors estimates on the groundsthat it saves time in computation especially when there areseveral works long available in the literature that develop sim-ple ways of solving these kinds of problems (eg see Newey1984) Although there is surely a gap between the estimatorsdeveloped in the theoretic econometric literature and the esti-mators used in the empirical literature the gap is not as greatas this article supposes

ADDITIONAL REFERENCES

Cox D R (1958) The Planning of Experiments New York WileyFisher R A (1935) Design of Experiments London Oliver and BoydHeckman J J (1997) ldquo Instrumental Variables Evidence From Evaluating a

Job Training Programmerdquo Journal of Human Resources 32 441ndash462Heckman J Ichimura H and Todd P (1997) ldquoMatching as an Econometric

Evaluation Estimator Theory and Evidence on its Performance Appliedto the JTPA Program Part I Theory and Methodsrdquo Review of EconomicStudies 64 605ndash654

mdashmdashmdash (1998) ldquoMatching as an Econometric Evaluationrdquo Review of Eco-nomic Studies 65 261ndash294

Heckman J and Vytlacil E (2000) ldquoCausal Parameters Structural Equa-tions Treatment Effects and Randomized Evaluations of Social Programsrdquounpublished manuscript University of Chicago Dept of Economics

mdashmdashmdash (in press) ldquoLocal Instrumental Variablesrdquo in Nonlinear StatisticalInference Essays in Honor of Takeshi Amemiya eds C Hsiao KMorimune and J Powell Cambridge UK Cambridge University Press

Ichimura H (1993) ldquoSemiparametric Least Squares (SLS) and Weighted SLSEstimation of Single Index Modelsrdquo Journal of Econometrics 58 71ndash120

Neyman J (1923) ldquoStatistical Problems in Agricultural Experimentsrdquo Sup-plement to the Journal of the Royal Statistical Society 2 107ndash180

Newey W (1984) ldquoA Method of Moments Interpretation of Sequential Esti-matorsrdquo Economic Letters 14 201ndash206

Roy A (1951) ldquoSome Thoughts on the Distribution of Earningsrdquo OxfordEconomic Papers 3 135ndash146

Rubin D (1978) ldquoBaysian Inference for Causal Effects The Role of Ran-domizationrdquo The Annals of Statistics 6 34ndash58

ReplyJoshua D Angrist

I thank the discussants and the session organizer outgoingJBES editor Jeff Wooldridge who will be missed by associateeditors and JBES authors alike The comments in this sessionoffer a variety of viewpoints some sympathetic others morecritical I am pleased to be part of this stimulating and con-structive exchange and look forward to more interaction ofthis type in the future

Hahn My article starts by suggesting that a focus ontreatment effects simpli es the problem of causal inferencein limited dependent variable (LDV) models with endoge-nous regressors Endogeneityrsquos half-sister is the assumptionof unobserved individual effects since the omitted-variablesmotivation for xed effects in panel data is similar Hahnasks whether a shift in focus from index parameters to aver-age causal effects might also simplify the notoriously dif -cult problem of working with nonlinear models for binarypanel data (eg as in Card and Sullivan 1988) Hahn showsthat here too the causal-effects framework pays off demon-strating that average causal effects are easily estimated in a xed-effects probit model with a binary regressor even thoughthe probit coef cient is not A second noteworthy featureof Hahnrsquos comment is his observation that identi cation inthis framework turns on careful modeling of the relationshipbetween the assignment variable and the xed effects Tradi-tional econometric models are primarily concerned with thestochastic process generating outcomes A shift of attentiontoward modeling the assignment mechanism is consistent with

the notion that observational studies should try to mimic thesort of controlled comparisons generated by experiments

Imbens Imbens begins by endorsing the view that the sub-stantive goals of empirical research are better served by thepotential outcomescausal framework than by structural mod-eling Going one step further Imbens notes that the problemswith ldquomodel- rst econometricsrdquo are not limited to LDVrsquos Inthe same spirit he sensibly wonders (as did Wooldridge 1992)why so much attention has been devoted to estimating theBoxndashCox transformation model which does not actually iden-tify E6Y mdashX7 without a distributional assumption Imbens alsonicely articulates the rationale for looking at the twins exam-ple and notes that this is similar to the rationale for extrap-olating from clinical trials in narrow subpopulations And infact I believe the ldquotwins experimentrdquo is likely to have pre-dictive value in a variety of situations though any extrapo-lation will probably be improved by control for demographiccharacteristics

In spite of (or perhaps because of) our years of discussionand collaboration Imbens and I still do not agree on certainthings He feels strongly that inference with discrete outcomesshould use models and procedures that respect inherent non-

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

28 Journal of Business amp Economic Statistics January 2001

linearities The HIRZ procedure discussed in his comment isan intriguing way to do this and my article discusses othersI remain unconvinced however that nonlinearity is of majorsubstantive importance in empirical work with binary regres-sors The proof of the pudding I suppose would be to showthat H IRZ or some of the methods I discuss in the article leadto better predictions or decisions than the linear workhorse

Mof tt Mof t provides a valuable counterpointHe beginsby arguing that latent-index models have more predictivepower than reduced-form causal models Such claims seemhard to establish a priori though I do not doubt that in somecases this may be true But empirical strategies in which thebulk of the research effort goes into estimating index coef -cients may fail to answer the most basic question in empiricalwork What happened here Causal models and local aver-age treatment effect (LATE) answer this question while indexcoef cients alone are not suf cient for causal inference Iagree however that structural modeling can be useful forgoing beyond basic questions and I have used index and otherstructural models for this purpose Mof t is right howeverin suggesting that I nd the model-free analysis he refers toas ldquoexploratoryrdquo more interesting than structural extrapolationThis seems to me to be where the potential for real discoverylies

Mof tt asks tough but fair questions about the twins appli-cation He is correct to point out that twinning is not a manip-ulable policy instrument That is one reason I do not limit theanalysis to reduced-form effects of multiple births Yet to mymind the best evidence we have on the controversial questionof the consequences of out-of-wedlock childbearing comesfrom the twins experiment (Bronars and Grogger 1994) Andthe economic consequences of childbearing are of interest inmany other contexts As noted in the introduction to my arti-cle the twins experiment can help determine whether fertilityreductions are responsible for trend increases in female laborsupply Beyond this sort of question the twins results seemuseful for assessing the economic consequences of increasedabortion access in the United States the dissemination of con-traceptive implants in developing countries (eg see KaneFarr and Janowitz 1990) and various programs to encour-age or require implant use by welfare mothers (Haveman andWolfe 1993)

Finally it is fair to question the external validity of twinsestimates as Mof t does toward the end of his commentAngrist and Evans (1998) showed that the labor-supply con-sequences of twinning can be reconciled with estimates ofthe labor-supply consequences of childbearing resulting froma very different natural experiment induced by preferences formixed-sex sibling pairs This suggests that results from bothexperiments may be quite robust and may even be ldquostructuralrdquoin the sense that they tell us something general about the eco-nomic response to childbearing

Mullahy Mullahy brings a welcome health economistrsquosperspective to the discussion Health economists have longgrappled with the issues discussed in my article and asI hope I made clear I drew ideas and inspiration fromMullahyrsquos work on models with nonnegative dependent vari-ables Mullahyrsquos comment further highlights exactly the sort

of utilitarian trade-off that underlies the choice of empiricalstrategy On one hand we would like our models to be con-sistent with any functional form restrictions known to be trueOn the other attempts at this may complicate estimation andinference for little payoff Worst of all the technical complex-ity of nonlinear models seems to cause authors to ldquowax econo-metricrdquo an effort that may come at the expense of attention tosubstantive issues of importance (an example here is the longseries of papers discussing exactly how data from the RANDhealth-insurance experiment should be analyzed) Mullahyrsquosheretical discussions of ldquoone-part modelsrdquo here and in earlierwork represent a refreshing effort to keep the empirical trainon track Mullahyrsquos point that application-specic distributionordinates can provide more useful summary information thanquantiles is also clearly worth thinking about

Todd Todd provides a useful and clear overview of thecausal framework which she suggests is better suited formedical questions than economic questions The thrust of herargument is that there is too much heterogeneity in economicinterventions and outcomes for the causal approach to bevery useful for economists But my perception is that medicalresearch is even more explicitly concerned with heterogene-ity than economics This is evidenced by the many quali ca-tions regarding patient characteristics conditions and inter-actions in medical treatment protocols A difference betweenthe prevailing medical research paradigm and the traditionaleconometric perspective however is whether the question ofheterogeneous response is to be addressed by better theoryor more evidence Of course theory helps us decide whereto look for heterogeneity but as Rosenbaum (1999 p 301)put it ldquo If a treatment has substantially different effects in dif-ferent types of people then we need to discover this (ital-ics mine) Rosenbaum also noted that ldquorandomized clinicaltrials routinely look for heterogeneous treatment effects andthey do this without representative samples from nationalpopulationsrdquo

On the technical side Todd suggests I advocated use ofthe two-part model because it exibly approximates nonlinearfunctional forms for nonnegative outcomes Although I notedthe exibility the upshot of my discussion was that the two-part model is poorly suited to causal inference because of thedif culty of interpreting part 2 Finally Todd points out thefailure to allow for sampling variance in the scaling factorsused to generate effects for nonlinear models This is indeed ahard-to-defend shortcut though I shall take a stab at defenseby noting that randomness of the scaling factor which typi-cally estimates a sample average is likely to be a minor sourceof sampling uncertainty

ADDITIONAL REFERENCES

Card D E and Sullivan D (1988) ldquoEffects of Subsidized Training onMovements In and Out of Employmentrdquo Econometrica 56 497ndash530

Haveman R and Wolfe B (1993) ldquoChildrenrsquos Prospects and ChildrenrsquosPolicyrdquo Journal of Economic Perspectives 7 153ndash174

Kane T Farr G and Janowitz B (1990) ldquo Initial Acceptability of Con-traceptive Implants in Four Developing Countriesrdquo International FamilyPlanning Perspectives 16 49ndash54

Rosenbaum P R (1999) ldquoChoice as an Alternative to Control in Observa-tional Studiesrdquo Statistical Science 14 259ndash304

Wooldridge J (1992) ldquoSome Alternatives to the BoxndashCox RegressionModelrdquo International Economic Review 33 935ndash955

Page 26: New Estimation of Limited Dependent Variable Models With Dummy …piketty.pse.ens.fr/…ecoineg/articl/Angrist2001b.pdf · 2011. 12. 5. · sequences of changes in contraceptive technology

Angrist Reply 27

the article suggests that in implementing a ldquoselection-on-observablesrdquo estimator it may be all right simply to ignore theweighting scheme Such claims seem to me fairly outrageousbecause failure to take into account weighting could lead tovery different estimates and would not provide a consistentestimator of the parameter of interest Heckman Ichimura andTodd (1998) discussed alternative estimators that are not dif -cult to implement and take into account the weighting scheme

Finally the appendix presents a variety of shortcuts forobtaining estimates and standard errorsmdashsome of which leadto inconsistent estimates When shortcuts lead to very crudeapproximations or inconsistent estimates I question their use-fulness I do think it unreasonable to expect that estimatesbased on simple IV estimators be accompanied by correctstandard errors that correctly take into account the in uence ofestimated parameters of estimated regressors I see little valuein reporting incorrect standard-errors estimates on the groundsthat it saves time in computation especially when there areseveral works long available in the literature that develop sim-ple ways of solving these kinds of problems (eg see Newey1984) Although there is surely a gap between the estimatorsdeveloped in the theoretic econometric literature and the esti-mators used in the empirical literature the gap is not as greatas this article supposes

ADDITIONAL REFERENCES

Cox D R (1958) The Planning of Experiments New York WileyFisher R A (1935) Design of Experiments London Oliver and BoydHeckman J J (1997) ldquo Instrumental Variables Evidence From Evaluating a

Job Training Programmerdquo Journal of Human Resources 32 441ndash462Heckman J Ichimura H and Todd P (1997) ldquoMatching as an Econometric

Evaluation Estimator Theory and Evidence on its Performance Appliedto the JTPA Program Part I Theory and Methodsrdquo Review of EconomicStudies 64 605ndash654

mdashmdashmdash (1998) ldquoMatching as an Econometric Evaluationrdquo Review of Eco-nomic Studies 65 261ndash294

Heckman J and Vytlacil E (2000) ldquoCausal Parameters Structural Equa-tions Treatment Effects and Randomized Evaluations of Social Programsrdquounpublished manuscript University of Chicago Dept of Economics

mdashmdashmdash (in press) ldquoLocal Instrumental Variablesrdquo in Nonlinear StatisticalInference Essays in Honor of Takeshi Amemiya eds C Hsiao KMorimune and J Powell Cambridge UK Cambridge University Press

Ichimura H (1993) ldquoSemiparametric Least Squares (SLS) and Weighted SLSEstimation of Single Index Modelsrdquo Journal of Econometrics 58 71ndash120

Neyman J (1923) ldquoStatistical Problems in Agricultural Experimentsrdquo Sup-plement to the Journal of the Royal Statistical Society 2 107ndash180

Newey W (1984) ldquoA Method of Moments Interpretation of Sequential Esti-matorsrdquo Economic Letters 14 201ndash206

Roy A (1951) ldquoSome Thoughts on the Distribution of Earningsrdquo OxfordEconomic Papers 3 135ndash146

Rubin D (1978) ldquoBaysian Inference for Causal Effects The Role of Ran-domizationrdquo The Annals of Statistics 6 34ndash58

ReplyJoshua D Angrist

I thank the discussants and the session organizer outgoingJBES editor Jeff Wooldridge who will be missed by associateeditors and JBES authors alike The comments in this sessionoffer a variety of viewpoints some sympathetic others morecritical I am pleased to be part of this stimulating and con-structive exchange and look forward to more interaction ofthis type in the future

Hahn My article starts by suggesting that a focus ontreatment effects simpli es the problem of causal inferencein limited dependent variable (LDV) models with endoge-nous regressors Endogeneityrsquos half-sister is the assumptionof unobserved individual effects since the omitted-variablesmotivation for xed effects in panel data is similar Hahnasks whether a shift in focus from index parameters to aver-age causal effects might also simplify the notoriously dif -cult problem of working with nonlinear models for binarypanel data (eg as in Card and Sullivan 1988) Hahn showsthat here too the causal-effects framework pays off demon-strating that average causal effects are easily estimated in a xed-effects probit model with a binary regressor even thoughthe probit coef cient is not A second noteworthy featureof Hahnrsquos comment is his observation that identi cation inthis framework turns on careful modeling of the relationshipbetween the assignment variable and the xed effects Tradi-tional econometric models are primarily concerned with thestochastic process generating outcomes A shift of attentiontoward modeling the assignment mechanism is consistent with

the notion that observational studies should try to mimic thesort of controlled comparisons generated by experiments

Imbens Imbens begins by endorsing the view that the sub-stantive goals of empirical research are better served by thepotential outcomescausal framework than by structural mod-eling Going one step further Imbens notes that the problemswith ldquomodel- rst econometricsrdquo are not limited to LDVrsquos Inthe same spirit he sensibly wonders (as did Wooldridge 1992)why so much attention has been devoted to estimating theBoxndashCox transformation model which does not actually iden-tify E6Y mdashX7 without a distributional assumption Imbens alsonicely articulates the rationale for looking at the twins exam-ple and notes that this is similar to the rationale for extrap-olating from clinical trials in narrow subpopulations And infact I believe the ldquotwins experimentrdquo is likely to have pre-dictive value in a variety of situations though any extrapo-lation will probably be improved by control for demographiccharacteristics

In spite of (or perhaps because of) our years of discussionand collaboration Imbens and I still do not agree on certainthings He feels strongly that inference with discrete outcomesshould use models and procedures that respect inherent non-

copy 2001 American Statistical AssociationJournal of Business amp Economic Statistics

January 2001 Vol 19 No 1

28 Journal of Business amp Economic Statistics January 2001

linearities The HIRZ procedure discussed in his comment isan intriguing way to do this and my article discusses othersI remain unconvinced however that nonlinearity is of majorsubstantive importance in empirical work with binary regres-sors The proof of the pudding I suppose would be to showthat H IRZ or some of the methods I discuss in the article leadto better predictions or decisions than the linear workhorse

Mof tt Mof t provides a valuable counterpointHe beginsby arguing that latent-index models have more predictivepower than reduced-form causal models Such claims seemhard to establish a priori though I do not doubt that in somecases this may be true But empirical strategies in which thebulk of the research effort goes into estimating index coef -cients may fail to answer the most basic question in empiricalwork What happened here Causal models and local aver-age treatment effect (LATE) answer this question while indexcoef cients alone are not suf cient for causal inference Iagree however that structural modeling can be useful forgoing beyond basic questions and I have used index and otherstructural models for this purpose Mof t is right howeverin suggesting that I nd the model-free analysis he refers toas ldquoexploratoryrdquo more interesting than structural extrapolationThis seems to me to be where the potential for real discoverylies

Mof tt asks tough but fair questions about the twins appli-cation He is correct to point out that twinning is not a manip-ulable policy instrument That is one reason I do not limit theanalysis to reduced-form effects of multiple births Yet to mymind the best evidence we have on the controversial questionof the consequences of out-of-wedlock childbearing comesfrom the twins experiment (Bronars and Grogger 1994) Andthe economic consequences of childbearing are of interest inmany other contexts As noted in the introduction to my arti-cle the twins experiment can help determine whether fertilityreductions are responsible for trend increases in female laborsupply Beyond this sort of question the twins results seemuseful for assessing the economic consequences of increasedabortion access in the United States the dissemination of con-traceptive implants in developing countries (eg see KaneFarr and Janowitz 1990) and various programs to encour-age or require implant use by welfare mothers (Haveman andWolfe 1993)

Finally it is fair to question the external validity of twinsestimates as Mof t does toward the end of his commentAngrist and Evans (1998) showed that the labor-supply con-sequences of twinning can be reconciled with estimates ofthe labor-supply consequences of childbearing resulting froma very different natural experiment induced by preferences formixed-sex sibling pairs This suggests that results from bothexperiments may be quite robust and may even be ldquostructuralrdquoin the sense that they tell us something general about the eco-nomic response to childbearing

Mullahy Mullahy brings a welcome health economistrsquosperspective to the discussion Health economists have longgrappled with the issues discussed in my article and asI hope I made clear I drew ideas and inspiration fromMullahyrsquos work on models with nonnegative dependent vari-ables Mullahyrsquos comment further highlights exactly the sort

of utilitarian trade-off that underlies the choice of empiricalstrategy On one hand we would like our models to be con-sistent with any functional form restrictions known to be trueOn the other attempts at this may complicate estimation andinference for little payoff Worst of all the technical complex-ity of nonlinear models seems to cause authors to ldquowax econo-metricrdquo an effort that may come at the expense of attention tosubstantive issues of importance (an example here is the longseries of papers discussing exactly how data from the RANDhealth-insurance experiment should be analyzed) Mullahyrsquosheretical discussions of ldquoone-part modelsrdquo here and in earlierwork represent a refreshing effort to keep the empirical trainon track Mullahyrsquos point that application-specic distributionordinates can provide more useful summary information thanquantiles is also clearly worth thinking about

Todd Todd provides a useful and clear overview of thecausal framework which she suggests is better suited formedical questions than economic questions The thrust of herargument is that there is too much heterogeneity in economicinterventions and outcomes for the causal approach to bevery useful for economists But my perception is that medicalresearch is even more explicitly concerned with heterogene-ity than economics This is evidenced by the many quali ca-tions regarding patient characteristics conditions and inter-actions in medical treatment protocols A difference betweenthe prevailing medical research paradigm and the traditionaleconometric perspective however is whether the question ofheterogeneous response is to be addressed by better theoryor more evidence Of course theory helps us decide whereto look for heterogeneity but as Rosenbaum (1999 p 301)put it ldquo If a treatment has substantially different effects in dif-ferent types of people then we need to discover this (ital-ics mine) Rosenbaum also noted that ldquorandomized clinicaltrials routinely look for heterogeneous treatment effects andthey do this without representative samples from nationalpopulationsrdquo

On the technical side Todd suggests I advocated use ofthe two-part model because it exibly approximates nonlinearfunctional forms for nonnegative outcomes Although I notedthe exibility the upshot of my discussion was that the two-part model is poorly suited to causal inference because of thedif culty of interpreting part 2 Finally Todd points out thefailure to allow for sampling variance in the scaling factorsused to generate effects for nonlinear models This is indeed ahard-to-defend shortcut though I shall take a stab at defenseby noting that randomness of the scaling factor which typi-cally estimates a sample average is likely to be a minor sourceof sampling uncertainty

ADDITIONAL REFERENCES

Card D E and Sullivan D (1988) ldquoEffects of Subsidized Training onMovements In and Out of Employmentrdquo Econometrica 56 497ndash530

Haveman R and Wolfe B (1993) ldquoChildrenrsquos Prospects and ChildrenrsquosPolicyrdquo Journal of Economic Perspectives 7 153ndash174

Kane T Farr G and Janowitz B (1990) ldquo Initial Acceptability of Con-traceptive Implants in Four Developing Countriesrdquo International FamilyPlanning Perspectives 16 49ndash54

Rosenbaum P R (1999) ldquoChoice as an Alternative to Control in Observa-tional Studiesrdquo Statistical Science 14 259ndash304

Wooldridge J (1992) ldquoSome Alternatives to the BoxndashCox RegressionModelrdquo International Economic Review 33 935ndash955

Page 27: New Estimation of Limited Dependent Variable Models With Dummy …piketty.pse.ens.fr/…ecoineg/articl/Angrist2001b.pdf · 2011. 12. 5. · sequences of changes in contraceptive technology

28 Journal of Business amp Economic Statistics January 2001

linearities The HIRZ procedure discussed in his comment isan intriguing way to do this and my article discusses othersI remain unconvinced however that nonlinearity is of majorsubstantive importance in empirical work with binary regres-sors The proof of the pudding I suppose would be to showthat H IRZ or some of the methods I discuss in the article leadto better predictions or decisions than the linear workhorse

Mof tt Mof t provides a valuable counterpointHe beginsby arguing that latent-index models have more predictivepower than reduced-form causal models Such claims seemhard to establish a priori though I do not doubt that in somecases this may be true But empirical strategies in which thebulk of the research effort goes into estimating index coef -cients may fail to answer the most basic question in empiricalwork What happened here Causal models and local aver-age treatment effect (LATE) answer this question while indexcoef cients alone are not suf cient for causal inference Iagree however that structural modeling can be useful forgoing beyond basic questions and I have used index and otherstructural models for this purpose Mof t is right howeverin suggesting that I nd the model-free analysis he refers toas ldquoexploratoryrdquo more interesting than structural extrapolationThis seems to me to be where the potential for real discoverylies

Mof tt asks tough but fair questions about the twins appli-cation He is correct to point out that twinning is not a manip-ulable policy instrument That is one reason I do not limit theanalysis to reduced-form effects of multiple births Yet to mymind the best evidence we have on the controversial questionof the consequences of out-of-wedlock childbearing comesfrom the twins experiment (Bronars and Grogger 1994) Andthe economic consequences of childbearing are of interest inmany other contexts As noted in the introduction to my arti-cle the twins experiment can help determine whether fertilityreductions are responsible for trend increases in female laborsupply Beyond this sort of question the twins results seemuseful for assessing the economic consequences of increasedabortion access in the United States the dissemination of con-traceptive implants in developing countries (eg see KaneFarr and Janowitz 1990) and various programs to encour-age or require implant use by welfare mothers (Haveman andWolfe 1993)

Finally it is fair to question the external validity of twinsestimates as Mof t does toward the end of his commentAngrist and Evans (1998) showed that the labor-supply con-sequences of twinning can be reconciled with estimates ofthe labor-supply consequences of childbearing resulting froma very different natural experiment induced by preferences formixed-sex sibling pairs This suggests that results from bothexperiments may be quite robust and may even be ldquostructuralrdquoin the sense that they tell us something general about the eco-nomic response to childbearing

Mullahy Mullahy brings a welcome health economistrsquosperspective to the discussion Health economists have longgrappled with the issues discussed in my article and asI hope I made clear I drew ideas and inspiration fromMullahyrsquos work on models with nonnegative dependent vari-ables Mullahyrsquos comment further highlights exactly the sort

of utilitarian trade-off that underlies the choice of empiricalstrategy On one hand we would like our models to be con-sistent with any functional form restrictions known to be trueOn the other attempts at this may complicate estimation andinference for little payoff Worst of all the technical complex-ity of nonlinear models seems to cause authors to ldquowax econo-metricrdquo an effort that may come at the expense of attention tosubstantive issues of importance (an example here is the longseries of papers discussing exactly how data from the RANDhealth-insurance experiment should be analyzed) Mullahyrsquosheretical discussions of ldquoone-part modelsrdquo here and in earlierwork represent a refreshing effort to keep the empirical trainon track Mullahyrsquos point that application-specic distributionordinates can provide more useful summary information thanquantiles is also clearly worth thinking about

Todd Todd provides a useful and clear overview of thecausal framework which she suggests is better suited formedical questions than economic questions The thrust of herargument is that there is too much heterogeneity in economicinterventions and outcomes for the causal approach to bevery useful for economists But my perception is that medicalresearch is even more explicitly concerned with heterogene-ity than economics This is evidenced by the many quali ca-tions regarding patient characteristics conditions and inter-actions in medical treatment protocols A difference betweenthe prevailing medical research paradigm and the traditionaleconometric perspective however is whether the question ofheterogeneous response is to be addressed by better theoryor more evidence Of course theory helps us decide whereto look for heterogeneity but as Rosenbaum (1999 p 301)put it ldquo If a treatment has substantially different effects in dif-ferent types of people then we need to discover this (ital-ics mine) Rosenbaum also noted that ldquorandomized clinicaltrials routinely look for heterogeneous treatment effects andthey do this without representative samples from nationalpopulationsrdquo

On the technical side Todd suggests I advocated use ofthe two-part model because it exibly approximates nonlinearfunctional forms for nonnegative outcomes Although I notedthe exibility the upshot of my discussion was that the two-part model is poorly suited to causal inference because of thedif culty of interpreting part 2 Finally Todd points out thefailure to allow for sampling variance in the scaling factorsused to generate effects for nonlinear models This is indeed ahard-to-defend shortcut though I shall take a stab at defenseby noting that randomness of the scaling factor which typi-cally estimates a sample average is likely to be a minor sourceof sampling uncertainty

ADDITIONAL REFERENCES

Card D E and Sullivan D (1988) ldquoEffects of Subsidized Training onMovements In and Out of Employmentrdquo Econometrica 56 497ndash530

Haveman R and Wolfe B (1993) ldquoChildrenrsquos Prospects and ChildrenrsquosPolicyrdquo Journal of Economic Perspectives 7 153ndash174

Kane T Farr G and Janowitz B (1990) ldquo Initial Acceptability of Con-traceptive Implants in Four Developing Countriesrdquo International FamilyPlanning Perspectives 16 49ndash54

Rosenbaum P R (1999) ldquoChoice as an Alternative to Control in Observa-tional Studiesrdquo Statistical Science 14 259ndash304

Wooldridge J (1992) ldquoSome Alternatives to the BoxndashCox RegressionModelrdquo International Economic Review 33 935ndash955