Economics 7427: Econometric Methods II University of Florida

Department of Economics

Spring 2007

Professor Sarah Hamersma 304 Matherly Hall

846-1988 [email protected]

Class Time: Tuesdays and Thursdays, 11:45-1:40 (periods 5-6)
Classroom: MAT 51
Office Hours: Tuesdays and Thursdays 2:30 - 4:00 or by appointment
Credits: 3

Course description and objectives:
The purpose of this course is to help prepare you for empirical research by investigating the use of a variety of econometric tools and practicing their implementation. It is critically important that we learn to do this well, because when we “get it wrong” we may make flawed policy recommendations or, more generally, profess that we have found “truth” that is not true. This course will be different from your past econometrics courses in several key ways. We will:

1) spend little time discussing statistical properties of estimators (this is not because they are unimportant – never! – but because the necessary knowledge will be assumed)

2) read primarily academic articles that exemplify the use of techniques you have learned about in the past (your textbook will be used only as a reference)

3) practice several important aspects of being an empirical researcher, including estimation using publicly-available data, class discussion/critique of empirical research, writing a referee report on an empirical paper, and presenting an empirical paper

4) address some of the practical issues of dealing with imperfect data

5) discuss and practice some non-parametric and semi-parametric approaches that are becoming more common in applied work

6) look very carefully at the assumptions made using each method, recognizing that the appropriate use of a method is fundamentally based on the assumptions we must make to use it

If we think of econometric methods as tools to do applied research, the goal of this course is to help you learn to choose the right tools and use them appropriately. To stretch the analogy, I hope the course will help you avoid situations where you are trying to pound a nail into a wall using the back end of a screwdriver – because that’s hard to do and things don’t always turn out so well.


Prerequisites: The official prerequisites for this class are ECO 7424 (2nd semester of first year graduate econometrics) or AEB 6571. Those who have taken ECO 7426 will likely have an easier time in the course than others since they will be more familiar with the estimators we will discuss - we will spend only limited time going over the characteristics of various estimators. However, with the textbook as a reference, the class should be accessible to those without ECO 7426.

Textbook & Readings: Most of the reading for this course will be in the form of academic articles (listed at the end of the syllabus). However, you will probably want to have the econometrics textbooks “Econometric Analysis of Cross Section and Panel Data” by Jeffrey M. Wooldridge (MIT Press, 2002) and/or “Microeconometrics” by Cameron and Trivedi for reference throughout the course.

Evaluation: Your grade in this course will be determined by your performance on problem sets, your participation in a mock (practice) conference, a written referee report on a paper presented at the UF Economics Seminar, and your performance on a final exam.

Incompletes: No incomplete grades will be given except under extraordinary circumstances.

Calendar of Important Dates:

Tuesday                                     Thursday
January 9    first day of class             January 11
January 16                                  January 18
January 23                                  January 25
January 30                                  February 1
February 6                                  February 8
February 13                                 February 15   mock conference paper chosen
February 20                                 February 22
February 27                                 March 1
March 6                                     March 8
March 13     SPRING BREAK                   March 15      SPRING BREAK
March 20                                    March 22
March 27                                    March 29
April 3      mock conference - day 1        April 5       mock conference - day 2
April 10     dinner/makeup class @ 6:30     April 12      referee report due
April 17                                    April 19
April 24     final exam (in class)


Reading List for Applied Econometrics (ECO 7427)
(* indicates we will discuss in class)

A few must-reads:

Hamermesh, Daniel. "A Young Economist’s Guide to Professional Etiquette," Journal of Economic Perspectives, Winter 1992.

J. Angrist and A. Krueger, “Empirical Strategies in Labor Economics,” chapter in volume III of The Handbook of Labor Economics (1999).

Journal of Economic Perspectives, Vol 15, No 4, Fall 2001 (Symposium: Econometric Methods).

Fundamentals (1 class period)

*J. Bradford DeLong and Kevin Lang (1992), "Are All Economic Hypotheses False?" Journal of Political Economy 100:6 (December), pp. 1257-72.

Ziliak, Stephen T. and Deirdre N. McCloskey. “Size Matters: The Standard Error of Regressions in the American Economic Review,” Econ Journal Watch Vol. 1, No. 2, August 2004 (331-358).

*Glaeser, Edward L. “Researcher Incentives and Empirical Methods.” Harvard Institute of Economic Research, Discussion Paper 2122, October 2006.

Difference-in-Differences (& Pitfalls) (1 week)

*Gruber, Jonathan. “The Incidence of Mandated Maternity Benefits,” American Economic Review, 84(3), June 1994, 622-641.

Bitler, Marianne, Jonah Gelbach, and Hilary Hoynes. “Welfare Reform and Health,” NBER Working Paper # 10549.

Aizer, Anna and Jeffrey Grogger. “Parental Medicaid Expansions and Health Insurance Coverage,” NBER Working Paper # 9907.

*Reiss, Peter C. and Matthew W. White. “Demand and Pricing in Electricity Markets: Evidence from San Diego During California’s Energy Crisis,” NBER Working Paper # 9986.

Bertrand, Duflo, and Mullainathan. "How Much Should We Trust Differences-in-Differences Estimates?" Quarterly Journal of Economics, 2004, 119(1), pp. 249-75.

Helland, Eric and Alexander Tabarrok. “Using Placebo Laws to Test ‘More Guns, Less Crime’,” Advances in Economic Analysis and Policy: Vol. 4, No. 1, Article 1, 2004.

*Moulton, Brent R. "An Illustration of a Pitfall in Estimating the Effects of Aggregate Variables on Micro Units," The Review of Economics and Statistics, MIT Press, vol. 72(2), pages 334-38, May 1990.


Instrumental Variables (1-1.5 weeks)

Angrist, Joshua D. and Alan B. Krueger. "Instrumental Variables and the Search for Identification: From Supply and Demand to Natural Experiments." Journal of Economic Perspectives, Vol 15, No 4, Fall 2001.

*Angrist, Joshua D. and Alan B. Krueger. “Does Compulsory School Attendance Affect Schooling and Earnings?” Quarterly Journal of Economics, Vol. 106(4), November 1991.

*Bound, John, David A. Jaeger, and Regina M. Baker. 1995. “Problems with Instrumental Variables Estimation When the Correlation Between the Instruments and the Endogenous Explanatory Variable is Weak.” Journal of the American Statistical Association 90:443-450.

*William N. Evans and Diana Stech Lien, “Does Prenatal Care Improve Birth Outcomes? Evidence from the PAT Bus Strike,” Journal of Econometrics, forthcoming.

Lutz, Byron F. “Taxation with Representation: The Incidence of Intergovernmental Grants in A Plebiscite Democracy,” Working Paper, October 2004.

*Rosenzweig, Mark R. and Kenneth I. Wolpin. “Natural ‘Natural’ Experiments in Economics,” Journal of Economic Literature, Vol. 38, December 2000 (827-874).

Angrist, Joshua. “Estimating the Labor Market Impact of Voluntary Military Service Using Social Security Data on Military Applicants,” Econometrica, 1998, 66(2), 249-288.

Staiger, D. and Stock, J.H. (1997). "Instrumental variables regression with weak instruments," Econometrica, 65, 557-586.

Angrist, Joshua, Alberto Abadie and Guido Imbens. “Instrumental Variables Estimates of the Effect of Subsidized Training on the Quantiles of Trainee Earnings,” Econometrica, January 2002, pp. 91-117.

Regression Discontinuity (1 week)

Black, Sandra E. 1999. "Do Better Schools Matter? Parental Valuation of Elementary Education." Quarterly Journal of Economics 114: 577-599.

van der Klaauw. “Estimating the Effect of Financial Aid Offers on College Enrollment: A Regression-Discontinuity Approach,” International Economic Review, Vol. 43, pp. 1249-1287, 2002.

*Lemieux, Thomas and Kevin Milligan. “Incentive Effects of Social Assistance: A Regression Discontinuity Approach,” NBER Working Paper # 10541.

Angrist, Joshua and Victor Lavy. “Using Maimonides' Rule to Estimate the Effect of Class Size on Student Achievement,” Quarterly Journal of Economics, May 1999, 533-575.

Assessing the Performance of Nonexperimental Estimators (1 week)

*LaLonde, Robert J. "Evaluating the Econometric Evaluations of Training Programs with Experimental Data." The American Economic Review, 1986, 76(4), pp. 604-20.


Heckman, J., LaLonde, R., & Smith, J. (1999). "The Economics and Econometrics of Active Labor Market Programs," Handbook of Labor Economics, 3A, 1865-2097, North-Holland.

*Heckman, James J. and V. Joseph Hotz, 1989. “Choosing Among Alternative Nonexperimental Methods for Estimating the Impact of Social Programs: The Case of Manpower Training,” Journal of the American Statistical Association, 84(408):862-874.

Meyer, Bruce D. (1995). "Natural and Quasi-Experiments in Economics," Journal of Business & Economic Statistics 13, 151-162.

Angrist, J. and Guido Imbens. “Identification and Estimation of Local Average Treatment Effects,” Econometrica, 1994, Vol. 62, No. 2, 467-476.

Angrist, J., G. W. Imbens and D. Rubin (1996). “Identification of Causal Effects Using Instrumental Variables” (with discussion), Journal of the American Statistical Association, vol 91, no 434, 444-472.

Richard Blundell, Lorraine Dearden, and Barbara Sianesi, 2003. "Evaluating the impact of education on earnings in the UK: Models, methods and results from the NCDS," IFS Working Papers W03/20, Institute for Fiscal Studies.

Nonparametric and Semiparametric Methods (2-2.5 weeks)

Matching Estimation:

*J. Angrist and A. Krueger, “Empirical Strategies in Labor Economics,” chapter in volume III of The Handbook of Labor Economics.

*Dehejia, Rajeev H. and Sadek Wahba. 1999. “Causal Effects in Nonexperimental Studies: Re-Evaluating the Evaluation of Training Programs.” Journal of the American Statistical Association 94:1053-62.

*Smith, Jeffrey and Petra Todd. “Does Matching Overcome LaLonde’s Critique of Nonexperimental Estimators?” Journal of Econometrics, 2004. (Also read the response by Dehejia and the rejoinder by Smith and Todd.)

Abadie, A., and G. Imbens (2001). “A Simple and Bias-corrected Matching Estimator for Average Treatment Effects.”

Bootstrapping:

*Brownstone, David and Robert Valletta. “The Bootstrap and Multiple Imputations: Harnessing Increased Computing Power for Improved Statistical Tests.” Journal of Economic Perspectives, Fall 2001, 129-142.

Kernel Estimation:

*DiNardo, John and Justin L. Tobias. “Nonparametric Density and Regression Estimation,” Journal of Economic Perspectives, Fall 2001, 11-28.

*Levinsohn, James and Margaret McMillan. “Does Food Aid Harm the Poor? Household Evidence from Ethiopia,” Manuscript, December 2004.


Quantile Regression:

*Bitler, Marianne, Jonah Gelbach, and Hilary Hoynes. “What Mean Impacts Miss: Distributional Effects of Welfare Reform Experiments.” American Economic Review, 2006.

Buchinsky, M. (1994). “Changes in the US Wage Structure 1963-1987: Application of Quantile Regression,” Econometrica, 62, 405-458.

Data Issues (1-1.5 weeks)

Attrition:

Ziliak, James P., and Thomas J. Kniesner. 1998. "The Importance of Sample Attrition in Life Cycle Labor Supply Estimation." Journal of Human Resources 33(2):507-530.

Measurement Error:

*Hausman, J. “Mismeasured Variables in Econometric Analysis: Problems from the Right and Problems from the Left,” Journal of Economic Perspectives (Fall 2001).

Bollinger, C. and Amitabh Chandra. “Iatrogenic Specification Error: A Cautionary Tale of Cleaning Data,” forthcoming, Journal of Labor Economics.

*Ashenfelter, Orley and Alan Krueger. “Estimates of the Economic Return to Schooling from a New Sample of Twins.” American Economic Review 84 (Dec. 1994): 1157-1173.

Missing Data and Imputation:

*Bollinger, C. and Barry Hirsch. “Match Bias Due to Earnings Imputation: When Does It Matter?” Journal of Labor Economics, 24, 2006, pp. 483-519.

Duration Models (1 week)

Ham, John C. and LaLonde, Robert J., 1996. "The Effect of Sample Selection and Initial Conditions in Duration Models: Evidence from Experimental Data on Training," Econometrica, Econometric Society, vol. 64(1), pages 175-205, January.

*Meyer, B. D., W. K. Viscusi, and D. L. Durbin. 1995. “Workers' Compensation and Injury Duration: Evidence from a Natural Experiment,” American Economic Review 85: 322-340.

Meyer, Bruce D. "Unemployment Insurance and Unemployment Spells," Econometrica, 58, July 1990, 757-782.

*Kiefer, Nicholas M. “Economic Duration Data and Hazard Functions.” Journal of Economic Literature, 26, No. 2, pp. 646-679, June 1988.

Count Data (1 class period)

*Cameron, A. Colin. “Count Data Regression Made Simple.” http://www.econ.ucdavis.edu/faculty/cameron/racd/simplepoisson.pdf


Hausman, Jerry A., Bronwyn Hall, and Zvi Griliches. "Econometric Models for Count Data with an Application to the Patents-R&D Relationship." Econometrica, Vol. 52, No. 4, pp. 909-938, July 1984.

*Mullahy, John. “Instrumental-Variable Estimation of Count Data Models: Applications to Models of Cigarette Smoking Behavior,” The Review of Economics and Statistics, Vol. LXXIX, No. 4 (November 1997).

Basic Structural Estimation (w/ linearity assumptions) (1 class period)

*Mankiw, N. Gregory, David Romer, and David Weil (1992). "A Contribution to the Empirics of Economic Growth," Quarterly Journal of Economics, 107: 407-38.

*Elias Dinopoulos and Peter Thompson, 1999. "Reassessing the empirical validity of the human-capital augmented neoclassical growth model," Journal of Evolutionary Economics, vol. 9(1), pages 135-154.

Discrete Choice Models (1 week)

Michael B. Gordy, 1999. "Hedging Winner's Curse With Multiple Bids: Evidence From The Portuguese Treasury Bill Auction," The Review of Economics and Statistics, MIT Press, vol. 81(3), pages 448-465, August.

*Haan, Peter. “Much Ado About Nothing: Conditional Logit vs. Random Coefficient Models for Estimating Labor Supply Elasticities,” Applied Economics Letters, Volume 13, Number 4, 15 March 2006, pp. 251-256.

Nevo, A. “A Practitioner's Guide to Estimation of Random Coefficients Logit Models of Demand,” Journal of Economics & Management Strategy, 2000, 513-548.

*Bresnahan and Reiss. "Entry and Competition in Concentrated Markets," Journal of Political Economy, v99, n5 (Oct 1991): 977-1009.

Steven T. Berry, 1994. "Estimating Discrete-Choice Models of Product Differentiation," RAND Journal of Economics, RAND, vol. 25(2), pages 242-262, Summer.

*Imai, Susumu, Hajime Katayama and Kala Krishna. “Crime and Young Men: The Role of Arrest, Criminal Experience, and Heterogeneity,” NBER Working Paper No. 12221, May 2006.


Mock Conference

We will be creating a “conference” atmosphere for two days of class (April 3 and April 5) to give you a chance to practice presenting and critiquing academic studies orally with a very short time limit. This is how a lot of academic discourse occurs, so it is an important skill.

Preparation Step # 1: Choose an empirical working paper that you find interesting (perhaps from NBER, but other sources are fine too). You are welcome to use a paper you yourself are writing – this is an opportunity to present it and get feedback. However, you do not need to present your own work. The presentation will be 10-15 minutes (depending on the number of students we have). Send me an email with the paper attached so that I can organize the papers into “conference sessions”. You should send it by February 15.

Preparation Step # 2: I will organize the conference into sessions with (at least vaguely) related papers. I will then assign a “discussant” to every paper by early March. This means that along with presenting your chosen paper, you will give a critique of one of the papers chosen by a classmate. You should read the paper very carefully and prepare a concise set of helpful comments. This should include your thoughts on the main contributions of the paper and the things that should be improved. You will have 5-7 minutes.

Preparation Step # 3: You should make PowerPoint or overhead projector slides for your presentations (both presenter and discussant). You will turn in a printed copy of these on the day of your presentations. I am happy to help you during my office hours if you want to discuss your presentation, the number of slides, etc. I recommend practicing both presentations before the “conference” so that you have a sense of the timing. I will keep time very strictly during the mock conference.

Referee Report

As part of this course, you need to attend at least one empirical seminar in the Department of Economics. The seminars are typically at 10:30 on Friday mornings, but a detailed schedule is available on the department website. You will read the paper prior to the seminar, attend the seminar, and then write a referee report as if the paper had been submitted to a journal. I will give you a sample of a referee report and will provide the criteria I use for grading. The report is due April 12. This will allow me enough time to grade and return the reports before the final exam.


Sarah Hamersma 1/24/05 Notes/Summary of

“Does Compulsory School Attendance Affect Schooling and Earnings?” Angrist and Krueger, 1991 QJE

The point: We have trouble understanding the effects of education on earnings because we are concerned that in the typical wage equation (with education as a regressor) there are omitted variables correlated with both education and wages. One way to get around this problem is to identify an instrument that predicts schooling but does not (independently) affect wages. Angrist and Krueger argue that they have found such an instrument: the quarter of one’s birth is correlated with completed schooling due to the institutional rules of school start ages and compulsory school attendance rules; however, there is no reason to think it will affect wages on its own, since it is “random” enough to be considered exogenous to most employment-related outcomes.

What is the outcome of interest? Wages.
What is the parameter of interest? The coefficient on education in the structural equation.
What is the instrument? Quarter of birth (and various interactions with it).

Note that this doesn’t exactly seem to line up with the title – I’m not sure why. They do a DD estimation procedure in the middle of the paper that lines up with the title, but it is not the focus of the paper.

Structure of the Paper:

I. Season of Birth, Compulsory Schooling, and Years of Education
   A. Direct Evidence on the Effect of Compulsory Schooling Laws
   B. Why Do Compulsory Schooling Laws Work?

II. Estimating the Return to Education
   A. TSLS Estimates
   B. Allowing the Seasonal Pattern in Education to Vary by State of Birth
   C. Estimates for Black Men

III. Other Possible Effects of Season of Birth

IV. Conclusion

I think this paper is organized very nicely. Let’s look at one section at a time.

Season of Birth, Compulsory Schooling, and Years of Education

The authors first spend some time convincing us that season of birth seems to be related to education. I found their evidence quite convincing. They then try to explain why this would be the case, based on school-start-age rules and compulsory-schooling rules.


A. Direct Evidence on the Effect of Compulsory Schooling Laws

Why do they need to provide direct evidence on this? I think partly because they find it independently interesting (alongside the returns-to-schooling question). But also consider the following:

Suppose compulsory schooling didn’t affect anyone (no one wanted to drop out anyway). What will our season-of-birth indicator predict about schooling? Nothing.

Suppose compulsory schooling laws were like the following: one must remain in school until the completion of the 10th grade (instead of using an age requirement). How does this affect the predictive power of season-of-birth indicators? They would be ruined, because now the time between starting school and being allowed to drop out would be standard across all students, regardless of their season of birth.

This section of the paper is primarily about establishing that the instrument is correlated with the endogenous variable it is meant to stand in for. They go to great lengths here to convince us that the effect of season of birth on education is an artifact of the educational system’s unique rules. The argument for the instrument being uncorrelated with the error term in the structural equation comes later, although even at this point it seems that the season of one’s birth could hardly be expected to affect much of anything without these school rules.

The authors can show that compulsory schooling laws are binding, at least for earlier cohorts. This means they are not surprised that season of birth has explanatory power in predicting years of education. They also check their work by looking at effects on other measures of education – particularly those that should not be affected by compulsory schooling laws (such as college graduation). This is a good idea. See Table 1.

Estimating the Return to Education

They first plot a graph of season of birth against earnings (Figure V).

What do they find helpful about this graph, in terms of making their case?
- it looks like 1st-quarter people have lower earnings

What do they find difficult about this graph, and how do they solve the problem?
- life-cycle changes in earnings can confound things for them
- they just keep the people in the flat part of the age-earnings profile

Their first-cut Wald estimator (this is IV in its simplest form) is in Table III:
Wage difference (1st-quarter births vs. everyone else): .009 (0.9%) lower wages
Education difference (1st-quarter births vs. everyone else): .126 fewer years
Wald estimator: .009/.126 ≈ .072, so the return to education is about 7% per year.
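To make the ratio form of the Wald estimator concrete, here is a minimal Python sketch using the two rounded differences quoted above (the numbers come from these notes, not from re-estimating the paper; with these rounded inputs the ratio comes out near .071 rather than the paper's .072).

```python
# Wald/IV estimate as the ratio of two mean differences:
# (reduced-form wage gap) / (first-stage education gap).

wage_gap = -0.009   # log-wage difference, 1st-quarter births vs. everyone else
educ_gap = -0.126   # years-of-education difference, same comparison

wald = wage_gap / educ_gap
print(f"Implied return to a year of education: {wald:.3f}")   # about 0.071, i.e. ~7%
```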


To accommodate age-related trends, they go on to 2-stage-least-squares estimates.

A. TSLS Estimates

Their setup looks just like what we did in class (see equations 1 and 2 in their paper). Note that they are now using the variation across all 4 quarters (not just quarter 1 vs. every other quarter) to identify their estimates. See Table IV for results. They don’t find much distinction between OLS and IV estimates.

They also try controlling for state differences in the effect of season of birth on education by adding a bunch of season-state interactions to the first stage. This doesn’t make a huge difference, though now TSLS is statistically significantly different from OLS (8.3% instead of 6.3%). They also try some subsets to see if their findings are robust – they are.
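As a reminder of the two-stage mechanics (this is not the paper's actual specification), here is a small sketch on simulated data: quarter-of-birth dummies shift schooling in the first stage, and the second stage regresses log wages on the fitted schooling. All variable names and numbers are invented for illustration, and the second-stage standard errors would still need the usual 2SLS correction.

```python
import numpy as np
import statsmodels.api as sm

# Simulated data: an unobserved "ability" term raises both schooling and wages,
# so OLS overstates the return; quarter of birth shifts schooling but is
# unrelated to ability, so 2SLS recovers something close to the true 0.07.
rng = np.random.default_rng(0)
n = 50_000
qob = rng.integers(1, 5, size=n)                   # quarter of birth, 1-4
ability = rng.normal(size=n)
educ = 12 + 0.3 * (qob - 1) + ability + rng.normal(size=n)
logwage = 1.0 + 0.07 * educ + 0.5 * ability + rng.normal(size=n)

# OLS on actual schooling (biased upward by ability)
print("OLS: ", sm.OLS(logwage, sm.add_constant(educ)).fit().params[1])

# First stage: schooling on quarter-of-birth dummies (quarter 1 omitted)
Z = sm.add_constant(np.column_stack([(qob == q).astype(float) for q in (2, 3, 4)]))
educ_hat = sm.OLS(educ, Z).fit().fittedvalues

# Second stage: log wages on fitted schooling
print("2SLS:", sm.OLS(logwage, sm.add_constant(educ_hat)).fit().params[1])
```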

Other Possible Effects of Season of Birth

Now they get to their defense of the exogeneity of their instrument with respect to the structural equation.

Psychology: Kids who start later are more mature and perform better in school. Angrist and Krueger’s reply: this would imply better outcomes for them, but the effect of compulsory schooling (which lowers their average education relative to their peers) is negative. This means the A-K estimator could be biased downward (real returns to education are higher, because those with higher education were younger for their grade and thus face an extra hurdle to obtain higher earnings).

- note that Bound, Jaeger, and Baker are going to be more concerned about this kind of thing

Lam and Miron [1987] find evidence that quarter of birth is unrelated to parents’ socio-economic status (SES).

Specification tests: Finally, they also do a simple “exogeneity” test – they run OLS with education included and add dummies for season of birth. If those dummies do anything, we might worry that season of birth has independent explanatory power. They don’t.

They also run a spec test with only college graduates. They point out that season of birth should not matter for wages in this group, since their education was unaffected by compulsory schooling laws. They leave education out of the OLS regression and just use year and season-of-birth dummies. This is another way of seeing if season of birth has an independent effect. It doesn’t. “We take this as strong evidence that, in the absence of compulsory schooling, season of birth would have no effect on earnings.”

Issues/Concerns: We will discuss most of them in Bound, Jaeger, and Baker (1995). Let me also note, though, that this seems to be a case where we should think of the results as a local average treatment effect (LATE). We are looking at the effects of additional schooling on earnings for those who were affected by compulsory schooling laws (many students do not find them binding). Keep this in mind.


Sarah Hamersma 3/9/05

“Match Bias from Earnings Imputations in the CPS: The Case of Imperfect Matching” by Chris Bollinger and Barry Hirsch (draft 2/15/05)

The Point: A lot of survey data has significant levels of nonresponse for certain questions (like those related to income). The Bureau of the Census uses imputations, based on other observations with available data on income, to fill in the missing data. This paper explores the consequences of using these data (including the imputed values) for the dependent variable in a regression. The general result is that coefficients on variables that were not taken into account in assigning the imputed value will be severely underestimated (attenuated toward zero, as with measurement error). This paper suggests ways of correcting for this bias.

Outline:

1) Intro
2) Census Imputation Methods for CPS
3) Imputation Match Bias
   a) general
   b) non-match
   c) imperfect match on multiple categories
   d) imperfect match on ordinal variables
4) Additional Imputation Issues: Longitudinal Analysis and Dated Donors
5) Conclusion

Intro & Census Hot-Deck Methods:

1) The current level of imputation of earnings in the CPS has risen in recent years to about 30%. This should make us very, very uncomfortable! Who are these people? Are they randomly selected (probably not)? Is there any way for us to use them in an earnings regression, or do we need to throw them out? To try to help us, the Census Bureau creates imputed earnings values for this missing data. “The appeal of imputation is that it allows data users to retain the full sample of individuals which, with application of appropriate weights, blows up the sample into the full population.”

****************************
A sidebar on population weights:

Data from the CPS or Census (or many other survey data sets) require proper weights in order to be used properly. The concept of these population weights is simple: each observation should receive a weight equal to the number of people like them in the population we are studying. For instance, if our population contained 20 men and 10 women, and our sample contained 5 men and 5 women, the men would each get a population weight of 4 and the women would each get a population weight of 2. If we were to pull a perfectly representative sample from the population, all of the weights would be the same (Npop/Nsamp), but this is seldom the case. There are at least two reasons for this.

a) Samples are sometimes chosen to be intentionally non-representative to boost our ability to do inference on small groups in the population. For instance, blacks and Hispanics are often oversampled relative to their proportion in the population so that we can have more precision in estimates within these groups than we would otherwise have.

b) After a survey is completed, there is some analysis of the response rate and estimation of how the weight of each observation should be altered (for instance, if half of the economists in a CPS survey did not respond, we should count the ones that did respond as two each (roughly)). I believe the CPS also uses the most recent Census to try to see how to weight their sample to the population level.

Happily for us, Stata has the capability to take these weights into account. In the data, you just need to find the weight variable (it’s just like any other variable, with a value for each observation) and then add the weighting option to the end of your estimation command. Stata will take care of the weight for the estimates and will get the standard errors right. There are several types of weights supported by Stata – the ones I have described are just one of them (you can look at Stata help for “weights” for details).
**********************************

2) The Census uses a “cell hot-deck” method to impute earnings for people with missing earnings data. It makes sense on a basic level: find someone who looks like the problem observation on several other dimensions, and use them as a “donor,” plugging in their earnings for the person with missing earnings. See Table 1 for the “other dimensions” used for the CPS.

How is the donor chosen for the CPS? For each possible cell, there is always one donor filling the slot. As they go through the data file, which is sorted by geographic location, they replace the donor each time they get another person with the same characteristics (and without missing earnings data). Similarly, each time they encounter an observation with missing earnings data, they use the donor that is “in stock” to impute earnings. This means that the imputations are hopefully associated with donors of similar geographic background. However, given the number of different cells, there is often no one in the same cell who is very close in the data file (i.e. geographically). Moreover, there may be no one in the current time period who is in a given cell, so the last available person from a previous time period is used as a donor (83% of nonrespondents have donors in a different month). If the donor is from a previous month, there is no link between their location and that of the nonrespondent. (Ouch.)

Tradeoff: it is easier to find more nearby matches if there are fewer cells – but this lowers the quality of the matches.
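A stripped-down version of the sequential "cell hot-deck" just described, with a toy cell definition (sex by education group) standing in for the CPS cells in Table 1; everything here is simulated and simplified for illustration.

```python
import numpy as np

# Walk through the file in sort order, keep one donor "in stock" per cell,
# and plug that donor's earnings in whenever earnings are missing.

rng = np.random.default_rng(1)
records = []
for _ in range(20):
    records.append({
        "sex": int(rng.integers(0, 2)),
        "educ_group": int(rng.integers(0, 3)),
        # roughly 30% nonresponse on earnings
        "earnings": float(rng.lognormal(10, 0.5)) if rng.random() > 0.3 else None,
    })

donors = {}                                   # most recent respondent seen in each cell
for rec in records:                           # file assumed sorted by geography
    cell = (rec["sex"], rec["educ_group"])
    if rec["earnings"] is not None:
        donors[cell] = rec["earnings"]        # refresh the donor in stock
    elif cell in donors:
        rec["earnings"] = donors[cell]        # impute from the stored donor
        rec["imputed"] = True
    # if no donor has been seen yet, the real procedure reaches back to an
    # earlier month's donor; here the value is simply left missing

print(sum(r.get("imputed", False) for r in records), "of", len(records), "records imputed")
```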

2

Page 15: Economics 7427: Econometric Methods II - carlospitta.com Info/Econometric Methods 2... · Economics 7427: Econometric Methods II University of Florida Department of Economics Spring

Imputation Match Bias:

3a) The general approach to understanding match bias is to analytically derive an expression for the amount of bias that mismatches (or imperfect matches) will generate. Let y be earnings, Z be the regressors of interest, and X be the categories used for making the imputation matches. (Note that X and Z can overlap, or even be the same.) There are three assumptions of the method they develop:

1) there’s no selection into or out of the respondent group based on Y itself (though it can be based on other X’s – the assumption is just that conditional on a given set of X’s, the observed and unobserved groups have the same average Y).

2) we know the relationship between X and Z (i.e. we know which elements of Z were used to determine X, and how they were used – for example, educ may be in Z and may have been recoded into broader categories to generate one variable in X).

3) The conditional expectation of Y is linear in Z.

Note that if X = Z, there will be no bias (all relevant variables were taken into account in the hot-deck process). The authors present a general formula for the bias, but we’ll just look at it conceptually:

b = β – p[ Cov(Z, Z – E(Z|X)hat) / Var(Z) ]β

where p is the probability of having missing earnings (estimated by the fraction missing earnings data in the sample). The bracketed term can be thought of as “the variation in Z that is not accounted for by the match variables X.” If Z is highly correlated with some match variable X, then there might not be a lot of unexplained variance in Z, and thus not too much match bias. (There is brief mention of SEs here, but it’s not a focus.)

b) The authors then go on to provide more details on several particular cases of interest. For the case with just one regressor in Z2 that is not in X, they show that the amount of bias depends on the ratio of (1) the unexplained variation in Z2 (with X as explanatory variables) for the missing population to (2) the same unexplained variance in the full population.

In section 3.3.2, “Evidence,” Bollinger and Hirsch use the CPS to see how their bias correction plays out (i.e. whether it makes a difference). The difference is huge! See Table 2. Some features of these results:

1) about 25% bias for many of the coefficients if the full sample with imputations is used
2) this is quite similar to the rate of nonresponse (in other words, the ratio of covariance to variance is nearly 1, so that the bias is just p*β – illustrated in the sketch below)
3) the estimates with bias correction are remarkably similar to those in which only the responders are used in the sample (nonresponders are thrown out entirely). They are almost always a little larger, but differences are small.
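The toy simulation below (invented variables, not the authors' CPS data) illustrates the p*β attenuation from point 2: earnings depend on a "union" dummy Z that is not among the match variables X, 30% of earnings are imputed from donors matched only on X, and the estimated coefficient on Z shrinks by roughly the factor (1 − p). For simplicity the imputed value is the cell mean of respondents rather than a single donor's earnings, which gives the same expected attenuation.

```python
import numpy as np

# Match-bias illustration: y depends on Z (union status), but the hot-deck
# cells X (education groups) ignore Z. Imputing y for a fraction p of the
# sample from X-only donors attenuates the Z coefficient toward (1 - p) * beta.

rng = np.random.default_rng(2)
n, p, beta = 100_000, 0.30, 0.20
X = rng.integers(0, 4, size=n)                     # match cells (education groups)
Z = rng.integers(0, 2, size=n)                     # union dummy, NOT a match variable
y = 1.0 + 0.10 * X + beta * Z + rng.normal(0, 0.5, size=n)

missing = rng.random(n) < p                        # 30% "nonrespondents"
y_imp = y.copy()
for cell in range(4):                              # impute from respondents in the same cell
    y_imp[(X == cell) & missing] = y[(X == cell) & ~missing].mean()

def coef_on_Z(yvec, keep):
    # regression of y on a constant, X-cell dummies, and Z; return the Z coefficient
    D = np.column_stack([np.ones(n)] + [(X == c).astype(float) for c in (1, 2, 3)] + [Z])
    return np.linalg.lstsq(D[keep], yvec[keep], rcond=None)[0][-1]

everyone = np.ones(n, dtype=bool)
print("true beta:                    ", beta)
print("full sample with imputations: ", round(coef_on_Z(y_imp, everyone), 3))  # ~ (1 - p) * beta = 0.14
print("respondents only:             ", round(coef_on_Z(y, ~missing), 3))      # ~ 0.20
```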

3

Page 16: Economics 7427: Econometric Methods II - carlospitta.com Info/Econometric Methods 2... · Economics 7427: Econometric Methods II University of Florida Department of Economics Spring

What should we do? (p. 11 has a wonderful paragraph on this) We can use the bias correction, or just drop the nonrespondents. The WRONG thing to do is to use the imputed observations and fail to make a bias correction (column 1 of Table 2). The reason the other two are slightly different from each other could be because:

a) our assumption of conditional missing at random may be violated
b) different groups have different values of the parameter of interest (heterogeneous treatment effects – our equation is thus misspecified if it forces an average effect to be estimated)

Note: there is not a discussion of the different SEs that may result from each of the two good strategies. Perhaps this would help us decide which is better (?). The only mention of this issue seems to appear in the last part of the Appendix.

Imperfect Match Bias (3c and d):

Type 1: Categorical variables that are matched at a level more aggregated than the included regressors. If we put in the full set of education dummies when matches were based on aggregating into 2 categories, the coefficients on the dummies for the lower education levels (which were assigned a 0 for the match) will be underestimated, while the others will be biased in an uncertain direction. The example used is returns to schooling. See Tables 1A and 1B – fascinating! Within each large education category, returns to education are overestimated for the lowest education levels and underestimated for the highest – this makes the relationship overly flat within groups, and the differences between groups are then overestimated. It makes sense that this would happen, given that the imputed people are receiving, on average, the wages of a donor with the average education level in each group.

Type 2: Ordinal variables that are matched at a level more aggregated than the included regressors. The example here is age. The slope coefficient again becomes attenuated (flattened) relative to what it should be when age is used as a set of categories rather than as a continuous variable. The same type of problem is clear in Tables 2A and 2B.

Additional Imputation Issues: Longitudinal Analysis and Dated Donors (4)

Longitudinal analysis: Imputation causes a sort of hidden problem when the researcher uses multiple observations of the same respondent over time, where earnings have been imputed in one or more time periods. The essence of the problem is that when examining changes over time, one will NOT actually be “netting out” the person fixed effect as one would hope.


For example, consider the variable of interest (which must change over time in order to be identified); the fact that a given person is a “switcher” on that X is not taken into account when their value of Y is imputed (even if matching is done conditional on X!). This is very problematic. It will bias the results toward the cross-sectional results, where unobserved heterogeneity may exist. Fortunately, most researchers toss out the imputed people when using the data longitudinally, so that is good.

Dated donors: It’s problematic, but the effect isn’t too bad. It is not of first-order concern.

Conclusion:

1) Bias is of first-order concern when estimating wage gaps. Two options to fix it:
   a. Use bias-corrected estimates
   b. Use only the respondent sample
   - note that the bias is approximately equal to the proportion of earners missing data

2) Imperfect matching also causes problems, flattening estimated relationships between the imperfectly-matched variables and wage outcomes.

3) Longitudinal analysis can involve severe biases because fixed effects doesn’t really work. Dated donors, however, don’t seem to be much of a problem.

Two important issues left to the reader (or maybe their future work?): standard errors and sampling weights (see Appendix 1.7 and footnote 23).


Sarah Hamersma 2/23/05 Summary/Notes on

“The Bootstrap and Multiple Imputations: Harnessing Increased Computing Power for Improved Statistical Tests”

by David Brownstone and Robert Valletta

The Point

There are ways of using data in computer-intensive ways to better understand underlying features of our data and to avoid making assumptions about its distribution. Two important methods in this field are the bootstrap and multiple imputations. (I will focus on the bootstrap in the lecture, both because I am more familiar with it and because it seems to be more widely used.)

The Bootstrap: An introduction

The name “bootstrap” refers to an old idiom, “to pull yourself up by your own bootstraps,” that is, to succeed using only your own efforts and resources. In some ways, it doesn’t make sense (we can’t actually do it – one foot will always stay on the ground!), but part of the concept is that we can do more than we think with what we have.

Suppose we wanted to estimate the sampling distribution of the mean height in some country (e.g., standard errors on the mean). One way to do this is to just use the sample mean and the sample variance/N. Our accuracy will depend on our sample size relative to the population and upon the sampling distribution (which defines how the sample mean would vary across a large number of independent samples from the same population). This is not a bad idea! However, if our sample isn’t large enough, our usual use of the Central Limit Theorem (CLT) to defend these might be flawed (i.e. we assume the sampling distribution can be approximated by a normal distribution, which is true if samples are large enough but isn’t guaranteed for smaller samples).

An alternative way of estimating this sampling distribution is bootstrap resampling. Here is how it works:

a) put all of your N observations in a bucket
b) draw from the bucket N times, with replacement (this will give you a “new” sample, the same size as your original, which will have some observations from the old sample multiple times, and some none at all)
c) calculate the sample mean (and save it)
d) do this a whole bunch of times (say, 100 or 1000)
e) put all of the sample means together. This will allow you to look at the distribution of the sample means across the various resamplings of your data. You can take the mean of these means, and that’s an estimate of the population mean. Similarly, if you look at the distribution of the means, you will get some idea of the standard error. (A short code sketch of these steps appears below.)
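A bare-bones version of steps (a)-(e), using simulated placeholder data; the percentile interval at the end anticipates the "percentile method" discussed in the application below.

```python
import numpy as np

# Nonparametric bootstrap of a sample mean, following steps (a)-(e) above.

rng = np.random.default_rng(3)
data = rng.exponential(scale=2.0, size=60)        # a smallish, skewed sample

B = 1000
boot_means = np.empty(B)
for b in range(B):
    resample = rng.choice(data, size=data.size, replace=True)   # step (b)
    boot_means[b] = resample.mean()                              # step (c)

print("sample mean:        ", data.mean())
print("bootstrap SE:       ", boot_means.std(ddof=1))
# percentile-method 95% confidence interval (allows asymmetry)
print("95% CI (percentile):", np.percentile(boot_means, [2.5, 97.5]))
```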


When will this work well? When the original sample is large enough to accurately reflect the true population (because we are essentially treating it as if it were the population in developing estimates of the sample mean and its standard error).

What’s so great about this?
1) The sample-size threshold is lower than the CLT threshold, so it works more often.
2) Suppose we do the same thing in a regression framework. We’re interested in a particular coefficient. We can resample the data and run the regression over and over again; then we can average our coefficient estimates to get an estimate of the population coefficient. Similarly, we can look at the distribution to get standard errors. This will give us accurate estimates even in the presence of heteroskedasticity.
3) It can be used as a diagnostic tool. If results are a lot different with bootstrapping than with usual large-sample theory, one or both of the estimates is not reliable for the problem and data at hand. This is an area of continuing research.

Application

See Table 1. The authors demonstrate different methods of generating a confidence interval around regression coefficients. For the purpose of the exercise, the authors assume that the PSID data set is a population of interest, and that we draw a sample of 50 from that distribution. Given our assumption that the PSID is the population (not a sample), we can calculate the “true” confidence intervals by drawing lots (1000) of bootstrap samples of n=50 from the larger PSID data and calculating the means and standard errors for the coefficients. The remaining columns provide alternative estimates that do not allow use of anything more than the 50 observations we have.

Note: The bootstrap confidence interval here uses the “percentile method,” which allows the confidence interval to be asymmetric if the distribution is skewed.

Note that this has uses beyond OLS supplements. It can be used to calculate standard errors for quantile regression, for instance, or standard errors for matching estimates of treatment effects. The principle is always the same – resample from your own data and run the estimates, repeat this many times, and then look at the mean and spread of the estimates of interest. [Note that the reliability and accuracy of the bootstrap hasn’t been officially shown in the case of matching estimation yet, but it’s not clear that there’s any better option for now.]


Sarah Hamersma 1/25/05 Notes/Summary of:

Problems with Instrumental Variables Estimation When the Correlation Between the Instruments and the Endogenous Explanatory Variable is Weak

By Bound, Jaeger, and Baker (1995)

The Point: Using IVs with little correlation to the endogenous variable they are supposed to represent creates two big problems. The first is that it makes it very easy to get inconsistent estimates. The second is that it also makes it easy to get biased estimates, and the more biased they are, the closer to the (intentionally avoided) OLS results you will get.

# 1: Inconsistency

Recall that consistency of an estimator has to do with whether the estimator will gradually get closer to the true value as the sample size gets larger. Note the probability limits of the OLS estimator and the IV estimator, as shown in the paper. The reason we’re using IV is because we think it will do a better job here.

The problem is easiest to see if you think of the single-instrument case. We talked before about how the IV coefficient can be estimated by a fraction, with the numerator being the coefficient from an outcome regression that uses the instrument instead of the endogenous regressor and the denominator being the coefficient on the instrument in the first-stage regression. If that denominator is small and/or imprecisely estimated, there is risk of large effects on the IV coefficient. In this paper, the authors explain this in terms of R-squared, showing that a low first-stage R-squared has this effect. They show that even a small correlation between the instrument and the error term can create HUGE problems if the instrument is weak (i.e. has a low partial R-squared with the endogenous regressor). This can be even worse than OLS.

# 2: Finite-Sample Bias

Assume that there isn’t a correlation problem between the instrument and the error term. We still might have a serious problem with bias. The IV estimator is biased in finite samples, and this bias pushes it toward the OLS estimate. The bias is larger: (a) the smaller the sample and (b) the lower the R2 in the first stage. Why is there bias? We don’t know the exact parameter values in the first stage (we have to estimate them).


An example: Suppose our Z is actually unrelated to the endogenous X. We can still run a regression and we’ll get some coefficient anyway (though the R2 will be low). We can plug in the fitted values and compute an IV estimate as always. But the relationship between X and Xhat (the fitted value) is somewhat arbitrary, since there’s no actual relationship in the population. The fitted values will be somehow “close” to X, but whether they are a little bigger or smaller is basically random. Imagine doing this with a bunch of samples from this same population where Z isn’t correlated with X. On average across these samples, X will be equal to its fitted values (whether it’s above or below for a given sample is simply random). This means that the average IV estimates will look just like the average result for OLS, because IV is using fitted X-values that are, on average, just equal to the old endogenous X. (A small simulation of this is sketched at the end of these notes.)

Angrist and Krueger Revisited

Bound, Jaeger, and Baker argue that quarter of birth seems related to family income. Mean income is 2.38% lower for 1st-quarter kids. Also, a 1% increase in fathers’ earnings is associated with a .014-year rise in education. So the parents’ income can explain about (2.38*.014) = .03 years less education. Angrist and Krueger found a difference between the 1st quarter and the others of only about .10 years, so this would explain 1/3 of it. Eeek!

Talk through tables of results.
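The sketch below makes the "unrelated instrument" example concrete with a small simulation (illustrative numbers only): when the instrument has no true relationship to X, the just-identified 2SLS estimates cluster around the inconsistent OLS value rather than around the true coefficient.

```python
import numpy as np

# Simulation of the finite-sample pull toward OLS: Z is pure noise, so the
# first stage has no real content, and 2SLS ends up centered near the
# (inconsistent) OLS estimate instead of the true beta.

rng = np.random.default_rng(4)
true_beta, n, reps = 1.0, 200, 2000
ols_est, iv_est = [], []

for _ in range(reps):
    u = rng.normal(size=n)                       # error, built into X (endogeneity)
    x = rng.normal(size=n) + u                   # endogenous regressor
    z = rng.normal(size=n)                       # "instrument" unrelated to x
    y = true_beta * x + u

    ols_est.append(np.cov(x, y)[0, 1] / np.var(x, ddof=1))

    # just-identified 2SLS = reduced form / first stage
    first_stage = np.cov(z, x)[0, 1] / np.var(z, ddof=1)
    reduced_form = np.cov(z, y)[0, 1] / np.var(z, ddof=1)
    iv_est.append(reduced_form / first_stage)

print("true beta:            ", true_beta)
print("mean OLS estimate:    ", np.mean(ols_est))     # ~1.5, biased upward
print("median 2SLS estimate: ", np.median(iv_est))    # near the OLS value, not near 1.0
```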


Sarah Hamersma 3/21/05

Intro to Count Data Models
(based on Wooldridge Ch. 19 and Colin Cameron’s “Count Data Regression Made Simple”)

What is Count Data and Why is it Special?

Some features: always non-negative, typically some values are zero, always integers (in other words, they are counting something!), no obvious upper bound.

Why can’t we just use OLS when we have count data for our dependent variable?

- potential for negative predicted values (just like with binary dependent variables)
- the common solution of taking logs is not useful when there are zeros (people then use ad-hoc transformations like adding a small amount to every value, etc.) (more on this in a few minutes)

The Poisson Model

The idea is that we want to estimate the conditional mean, but OLS isn’t going to work here. We instead assume the data have the Poisson distribution from statistics – this is a model of the probability of various counts, and it is (at least eventually) decreasing in probability as counts get higher. This can be a convenient model, although it has some restrictions which we will mention.

The nice thing about the Poisson assumption is that it has only one parameter – a conditional mean λ (unlike, say, the normal distribution, which has two parameters). If we model the conditional mean in a simple way – most commonly λ = exp(xβ) – we get the regression equation:

E(yi|xi) = λi = exp(xi’β) = exp(β1 + β2x2i + β3x3i + …)

As noted above, we don’t use OLS here – we use maximum likelihood. Another way to write the model is as a log-linear model:

ln E(yi|xi) = ln λi = xi’β = β1 + β2x2i + β3x3i + …

An important note here: this is NOT the same thing as taking the log of the dependent variable and regressing it on the x’s. Why? Aside from the potential for negative predicted values, there is an important statistical reason that they are not the same thing:

ln E(yi|xi) ≠ E[ln(yi)|xi]


We are trying to model the first one, and we would instead be modeling the second one. Stata can do these regressions with the command “poisson”.

Interpreting coefficients: The interpretation of the coefficients is different from OLS. The coefficient reflects the % change in the mean caused by a one-unit change in the explanatory variable (i.e. a coefficient of .03 on x is interpreted as predicting a 3% increase in y per unit increase in x). If the RHS variable is in log form, the coefficient is an elasticity (a 1% change in x causes a (β*100)% change in y). After making appropriate variance adjustments (see next section), hypothesis testing works the same way as always, using t-statistics.

Limitations of this model: There is only one parameter of the assumed distribution, and it turns out (we discussed this with duration models as well) that we are assuming:

E(y|x) = Var(y|x)

This isn’t very appealing in most contexts. Often data are “overdispersed” (σ² > 1, below) or “underdispersed” (σ² < 1, below), so that a weaker assumption is more likely to hold:

σ²E(y|x) = Var(y|x)

Fortunately, if this is the case, we just need to adjust our standard errors (after estimating σ²). It’s sort of like a heteroskedasticity correction (but usually a bigger deal).

If we know our data are overdispersed, a better model is a negative binomial model. This model assumes that the data are generated by a negative binomial process rather than a Poisson process. This is a two-parameter distribution defined by a mean μ and variance μ + kμ², where k ≥ 0 is a dispersion parameter. It will be more efficient, and it allows one to predict probabilities as well as the mean. Stata has this command: “nbreg”.
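For readers working outside Stata, here is a rough Python/statsmodels analogue of "poisson" on simulated count data (sm.NegativeBinomial would play the role of "nbreg"); the data-generating numbers are arbitrary.

```python
import numpy as np
import statsmodels.api as sm

# Poisson regression by maximum likelihood: E(y|x) = exp(b0 + b1*x).
rng = np.random.default_rng(5)
n = 2000
x = rng.normal(size=n)
y = rng.poisson(np.exp(0.5 + 0.3 * x))     # counts with conditional mean exp(0.5 + 0.3x)

X = sm.add_constant(x)
pois = sm.Poisson(y, X).fit(disp=0)
print(pois.params)                         # roughly [0.5, 0.3]

# Interpretation: the 0.3 coefficient means a one-unit increase in x raises the
# expected count by roughly 30% (exactly, by a factor of exp(0.3) ≈ 1.35).

# With overdispersed data, a negative binomial model is one alternative:
# nb = sm.NegativeBinomial(y, X).fit(disp=0)
```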


Sarah Hamersma 1/5/04
Summary/Notes on: De Long and Lang, “Are All Economic Hypotheses False?” JPE 1992

This is a catchy title, though a little over the top (I’ll explain later).

The Point: This paper looks at those null hypotheses that are NOT rejected and estimates how many of them we would expect to be true. The authors’ point estimate is that none of the null hypotheses in their sample is true, and they can reject the hypothesis that more than 1/3 of them are true (at the 95% level). In a sense, they seem to be doing an inverted sort of hypothesis test, where we are trying to think about the significance level regarding evidence for the null hypothesis, rather than looking at the significance of deviations from it. (It’s almost as if the null is treated as an alternative to be rejected or not rejected.)

They conclude that most of the null hypotheses that go unrejected are unlikely to be true, due to publication bias: the only unrejected nulls that are published are those that are unexpected, i.e. where other work has suggested that the null SHOULD be rejected. These are cases in which, based on previous work, it is unlikely that the null is actually true, even though it was not rejected in one specific piece of work. It follows that most of these papers with a surprising non-rejection of the null do not reject it at high levels of significance (i.e. p is seldom very much higher than 0.1, and almost never more than 0.5).

Note: I thought the title was a bit much, because (as the authors even note) much of hypothesis testing is being done to see if we can show that an economic model’s prediction of some relationship is actually correct. In a sense, the researcher’s choice of the null is not her “economic hypothesis” in any real sense – it is a way of stacking the cards against her finding, so that if the finding can still be shown, it is more certain to be correct.

Review of marginal significance level:

“Marginal significance level” = p-value. If the p-value is very low (e.g., less than 5%) we typically reject the null; if it is higher, we fail to reject the null.

The p-value represents the probability of a TYPE I error – that is, the probability of accidentally rejecting the null even though it is true. In analysis, we get to set the level of the type I error in our estimation. We often choose 5%, although this is completely arbitrary (and should be adjusted for sample size). Our choice is called the “size of the test”.


The other side of the coin – type II error – is not as nice. This is typically because we don’t have a specific alternative hypothesis (e.g., H1: β = 5), and so thinking about the probability of accidentally failing to reject the null when the alternative is “true” is tricky – what does it mean that the alternative is “true”? The “power” of a test is a measure of how much it is able to avoid this type II error, but it’s not always clear how to estimate power. The authors conclude that oftentimes we fail to reject the null because we don’t have enough power, not because the null is actually true. (My work on employment effects is a good example of this issue.)

The Estimation Strategy:

Technical version: The authors use the distribution of the test statistic (the p-value) under the null and alternative hypotheses to come up with the probability that a reported p-value will exceed any chosen cutoff p:

P(f(a) ≥ p) = π(1 – p) + (1 – π)[1 – G(p)]

The first term is the distribution when the null is true (which happens π portion of the time) and the second term is when the alternative is true. This allows them to come up with a neat inequality:

π ≤ [P(f(a) ≥ p)] / [1 – p]

They are interested in knowing π (the portion of true nulls), and they can estimate an upper bound on it for any given choice of the cutoff p by
1) plugging p into the denominator, and
2) counting the fraction of reported p-values in the data that exceed p and plugging that into the numerator.

Intuitive version: Suppose there are N true null hypotheses in our sample (remember, we’re sampling a bunch of studies). It should be that 10% of the time, our p-value will be > 0.9. In other words, 10% of the time we should be very far from rejecting the null. This means we could say that the number of true nulls is about 10 times the number of estimates with p-values > 0.9. Given that a few false nulls could sneak into the > 0.9 category, this calculation will actually give us an extra-high upper bound on the number of potentially true nulls.

The Estimation Results:

Note: It is not a surprise that many null hypotheses are false. The authors only focus on unrejected null hypotheses, to see what portion of THEM they estimate are true. (The overall portion of true nulls would be much smaller, since so many are basically straw men designed to be rejected.) The authors therefore have to change the 10-times calculation to a 9-times calculation, because they are eliminating those with p-values between 0 and 0.1 (those are the ones where the null was rejected, and we aren’t using them in the sample). The data are from AER, Ecta, JPE, QJE, and REStat. They found 276 main hypothesis tests, 78 of which left the null unrejected. Their table is the following:

p-value              Number of hypothesis tests    Est. upper bound on % true nulls
1.0-.9               0                             0
.9-.8                4                             23
.8-.7                7                             42
.7-.6                7                             52
.6-.5                6                             54
.5-.4                11                            66
.4-.1 (condensed)    43                            (can't condense)
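A quick Stata check of the second row, under the paper’s count of 78 unrejected nulls (the hand calculation is spelled out just below):

    * .9-.8 row: 4 tests, scaled up by 9/2, expressed as a share of 78 unrejected nulls
    display "upper bound on the number of true nulls: " 4*(9/2)
    display "as a percent of the 78 unrejected nulls: " round(100*4*(9/2)/78)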

Details of calculation, using the example of the second line: 2/9 of true nulls should fall in the range .8 to 1, so multiply the number in this category (4) by 9/2 (= 4.5) to get the upper bound on the number of true nulls, which is 18. Divide by the total number of unrejected nulls in the sample (78) to get 23%.

They also do a different test, using a sort of flipped-around 95% significance level: they find that there is only a 5% chance of getting that zero at the top of the middle column above if about one-third or more of the unrejected nulls were actually true. So they can reject, at the 95% level, the hypothesis that more than one-third of the unrejected nulls are true.

Conclusions: This paper argues that authors face a catch-22 (a no-win, circular situation): if they are able to get a paper published with a non-rejected null, it is most likely because it is in contradiction with other work that rejected that null – which suggests that it’s pretty likely the null is false. The key question should not really be “is the null true or false” so much as “how closely does the null approximate reality, as far as we can judge?” This is an issue of magnitudes. A null may be rejected even though the magnitude of the effect of the variable is not much different from what was assumed under the null (recall the study of retirement age, where a positive wealth shock caused a statistically significant change in the time of retirement, but the effect was only to make it happen a few hours sooner). This is one reason that the authors (along with Ziliak and McCloskey, and others) argue that confidence intervals are more useful. Does the confidence interval eliminate anything interesting? That is a question worth knowing the answer to.


Sarah Hamersma 1/11/05

Difference-in-Differences Notes

1. Angrist and Krueger on Difference-in-Differences (in Handbook of Labor)

They use the Card (1990) Mariel Boatlift paper to describe most of the issues. The key assumption is that the interaction term is zero in the absence of the intervention. DD is one attempt to get things into a treatment and control framework:

- we want to know what the effects of a policy are, maybe in particular on those who are subject to it (avg. treatment on treated, or E(Y1-Y0| D = 1))

- we don’t have data on exactly what would have happened to people affected by the policy if it hadn’t happened (we can’t get E(Y0| D = 1) – it’s sort of a missing data problem)

- we try to find a way to approximate this by using other people somehow, whose outcomes are good estimates for the ones we can’t observe (though we’ll never know how good a match they are – this is where one must make an argument)

One way to at least check whether the DD assumption should be ruled out is to look at the previous trajectories of the populations you are comparing. (talk about Card’s paper)

DD is an assumption on the conditional mean. Card is looking at the probability of employment:

E[Y0i | c(ity), t(ime)] = βt + γc (time effect, city effect)

With the Mariel boatlift (immigration), he thinks the probability of empl. may change:

E[Y1i | c, t] = βt + γc + δ

We are interested in δ. A logical way to do this is just with comparisons of means across our four relevant groups.

First the Miami group:

E[Yi | City = Miami, time = after] - E[Yi | City = Miami, time = before]
= (βafter + γmiami + δ) – (βbefore + γmiami)
= βafter – βbefore + δ

Then the comparison group:

E[Yi | City = comparison, time = after] - E[Yi | City = comparison, time = before]
= (βafter + γcomparison) – (βbefore + γcomparison)
= βafter – βbefore

Taking the difference of these two differences gives us the δ we are looking for.


You might notice that this is the same as the following regression:

Yi = βt + γc + δ*Mi + εi,

where Mi = 1 only for people in Miami in the after period (the indicator for treatment) and E(εi | c, t) = 0. This can be expanded to include other variables that may determine employment – this gives us our usual regression equation with an interaction term. (A small Stata sketch of this regression appears at the end of these notes.)

2. Other comments:

Depending on the time structure, a DD estimate is like a fixed-effect estimator, where the fixed effect is a single level for each group (not for each individual). DD is good for use with repeated cross-sections. You can only do individual fixed-effects if you have a panel (which is generally what you’d prefer, if you have the option).
- in that case, you essentially have a dummy for every person

DD’s important assumption: time effects must be common across experimentals and controls
- in other words, both groups need to be on the same trajectory

If people switch between relevant treatment groups over time, you have a mess (ex. a paper trying to distinguish the effects of a tax cut on rich and poor people needs people to stay in the same category before and after the cut, but this isn’t always the case – you can identify where the bias will be).

Re: Gruber’s Maternity paper specifically:
If everyone has coverage prior to the law changes, it’s not interesting (he shows us that this isn’t true).
If the mandate isn’t costly, it’s also not interesting (he shows us that it is).
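The Stata sketch referenced above: a minimal two-group, two-period regression. The data are simulated, so the variable names and effect sizes are invented; with real data you would add covariates and think about how to cluster the standard errors.

    * two-by-two DD: "treated" is the Miami-after interaction (M_i in the notation above)
    clear
    set seed 123
    set obs 4000
    gen miami   = (_n <= 2000)
    gen after   = (mod(_n, 2) == 0)
    gen treated = miami*after
    gen emp = 0.60 + 0.02*miami - 0.05*after - 0.03*treated + rnormal(0, 0.10)
    regress emp miami after treated, robust    // the coefficient on treated estimates delta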


Sarah Hamersma 2/17/05 Summary of

“Nonparametric Density and Regression Estimation” by John DiNardo and Justin L. Tobias

The Point: Applied researchers would have a lot to gain by adding nonparametric techniques to their toolboxes. These methods are not really that complicated, and they offer some nice advantages over linear regression. The authors describe nonparametric density and regression estimation methods. Why don’t economists use these more?

a) they think the parametric models are good enough (they are, for some applications)
b) they don’t have the computing power (this is no longer an issue)

Density Estimation

The idea is to let the data tell us exactly what the distribution looks like. This could be very important if the data are not roughly “normal” in shape, because we often think in terms of a mean and standard deviation as sufficiently informative statistics about the distribution of a variable. A nonparametric density estimator might give us information about specific oddities in the distribution (see their example about wages, where the minimum wage influences the density).

The most basic nonparametric density estimator is the histogram. What makes the histogram so nice? It could pick up these oddities in the distribution. What makes it not so nice? Its success in doing so depends greatly on bin size and choice of midpoints at which to evaluate the density. What makes a kernel density estimator better?

a) bins are allowed to overlap
b) diminishing weight is put on points as they move further from the point of interest

Stata command: kdensity

Two decisions to be made with a kernel density estimator: bandwidth and kernel.
Bandwidth: determines the size of the neighborhood to be used around the point you are estimating.
Kernel: the weight-assigning function to be used in creating estimates.

General form of a kernel density estimator:

pdf(x0) = (1/(Nh)) Σ (from i = 1 to N) K( (xi – x0) / h )
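Before digging into the bandwidth discussion below, here is a hedged sketch of the kdensity command, using simulated data with a spike in the distribution (loosely in the spirit of the wage example; the variable and the bandwidths are made up):

    * one undersmoothed and one oversmoothed estimate, plus Stata's default bandwidth
    clear
    set seed 456
    set obs 2000
    gen lnwage = cond(runiform() < 0.15, rnormal(1.6, 0.05), rnormal(2.3, 0.5))
    kdensity lnwage, kernel(epanechnikov) bwidth(0.02)   // undersmoothed: jagged
    kdensity lnwage, kernel(epanechnikov) bwidth(0.50)   // oversmoothed: the spike disappears
    kdensity lnwage                                      // rule-of-thumb bandwidth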



The kernel will give the most weight to xi’s that are near the point of interest. If h is large, it will give a lot of weight to a lot of points (because the distance will seem smaller – if h = 5, then a distance of “1” is treated as “1/5”). That’s why we need to divide through by h afterward. Some sample kernels are given in the article (Table 1). The things they have in common are decreasing weights with distance, and the area under the function equals 1 (i.e. it is a legitimate pdf). Note that the choice of the kernel used does NOT impose anything on the shape of the pdf we are estimating – for example, a normal kernel will not restrict the resulting pdf to look normal. With all of these methods, the pdf will look like a smoothed histogram.

The important choice: bandwidth

Having a large bandwidth results in oversmoothed estimates that will have low variance but potentially high bias because we are “smoothing away” features of the data. Having a small bandwidth will be less biased (maintaining the features of the data) but will also have high variance. In the extreme, we are just jaggedly “connecting the dots” and not really summarizing our information at all. There are some “optimal” bandwidth calculations (ex. cross-validation) that impose a particular way of balancing the tradeoff between bias and variance. In practice, choosing bandwidths is often an art – choosing some starting point (based perhaps on a rule-of-thumb in the literature), you can try different bandwidths on each side until it’s clear that the one extreme is undersmoothed and the other is oversmoothed. If there isn’t anything very different going on between the two estimates (ex. oddities in the shape of the distribution that get smoothed away) then it won’t matter much which bandwidth you choose in that range. You can use the visual display of the data to decide which bandwidth you find most reasonable, and just report that any findings are robust to bandwidths in the range (x,y).

You can also use kernels to estimate joint density functions, as discussed in the article. We won’t get into that in the lecture.

Nonparametric Regression Estimation

Focus is on local linear regression. The econometric model is: yi = m(xi) + εi

We want to estimate the function “m” without assuming that y is simply linear in x. Estimation, just like with the density, is “pointwise” – that is, we will estimate the value of the function “m” at some particular x and then put all the estimates together to get a picture of the whole “m” function.



Procedure:

1) choose some point x0
2) choose a bandwidth around that x0
3) use weighted least squares, where the data points farther from x0 get less weight; the weighting of the points is done through a kernel function

Choosing the kernel doesn’t matter much (again), but the bandwidth can make a big difference. Different bandwidths will result in “oversmoothed” or “undersmoothed” estimates. (A Stata sketch of this procedure appears after the notes below.)

The example in the text uses two variables, education and aptitude test score, to predict log wages. You can see that nonparametric regression requires us to discretize any variable that is not already discrete, because we need to estimate the m(x) function at a finite number of points. With two variables, we can then plot the regression “surface.” Note:

- a regular regression surface would be a plane (flat) so this generalizes that technique to allow for other patterns (but if the real regression is linear, our surface will look linear).

- Estimation in the tails seems to be messy/imprecise. For instance, there are not many observations with very high education and very low test scores. This is the same old common support problem we discussed in matching – it has been found (footnote 7) that parametric models are the only ones that can get estimates in these sparse parts of the distribution, because they impose the structure needed to extrapolate.
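The sketch referenced above: local linear regression using Stata’s lpoly command, where degree(1) gives a local linear fit. Everything below – the data, the functional form, and the bandwidths – is invented for illustration.

    * local linear regression on simulated data with a nonlinear m(x)
    clear
    set seed 789
    set obs 500
    gen x = 10*runiform()
    gen y = sin(x) + 0.5*x + rnormal(0, 0.4)
    lpoly y x, degree(1) kernel(epanechnikov) bwidth(0.5)   // captures the curvature
    lpoly y x, degree(1) kernel(epanechnikov) bwidth(4)     // oversmoothed for comparison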

Limitations: this gets ugly fast! Trying to include a lot of variables could be a real mess. That’s why we sometimes go to a semi-parametric alternative called “partially linear” regression. We allow some variables “z” to enter linearly (in the usual parametric way) and then only certain ones “x” to enter through the m(x) nonparametric function. The procedure is described in the text – we won’t go through it here. See Fig. 6 for a demonstration of the power of this technique. Note: the “real” regression function there is not something we’d expect to see very often in reality, based on any usual kind of economic model.

Conclusion

We still need to come to a consensus on a good way to choose bandwidths and to calculate standard errors. But these estimation techniques are nice, and in particular can be very helpful in visualizing and describing data to help us avoid foolish inference that relies on parametric assumptions.



Sarah Hamersma 3/30/05

“Reassessing the empirical validity of the human-capital augmented neoclassical growth model”

by Elias Dinopoulos and Peter Thompson

The Point: This paper argues that the estimates presented in Mankiw, Romer, and Weil (1992) are fundamentally flawed. It introduces improved measures of human capital levels and argues that using these measures makes it clear that the MRW model is insufficient to explain the data, and that allowing technological differences is needed to make the model more realistic (and, notably, this brings it away from the neoclassical Solow growth model into the territory of the Schumpeterian endogenous growth model).

Assumptions in MRW: A key assumption in MRW is that technology can be described as

A(t) = a(t) + εi

and that (importantly!) the country-specific error is not correlated with the other regressors (human capital, capital/output ratio, effective labor growth rate). In other words, they assume no endogeneity. Recall that human capital is treated symmetrically with physical capital. Also, note that they calculate their “plausible” β using U.S. data only – a little concerning.

Measuring Human Capital: The authors argue that “secondary school enrollment” is a poor proxy for human capital’s share of output (i.e. the HC saving rate). It:

1) is more of a proxy for level than for savings rate
2) ignores other levels of schooling
3) ignores correlation between ability and education
4) ignores informal HC accumulation
5) assumes constant MP of formal educ; diminishing is more likely (there’s empirical evidence on this)

These cause MRW to overestimate the spread of HC savings rates – the countries without much secondary schooling get 0 and the countries with a lot of secondary schooling are assigned more HC than is reasonable (due to decreasing returns).


The authors develop two new measures of HC that they argue are better, and they rescale them to have the same means, under the MRW assumptions, as the MRW measures. See Table 2 for results: the MRW model is rejected when their variable is used as a proxy for h* rather than for sh. However, the other two models fall in line with the restriction that the ln(sk) and ln(n + g + δ) coefficients are equal (and opposite in sign). Note the size of the other coefficient (β) though – it suggests that the assumption α + β < 1, on which the model was based, is no good. Key result: in either case, we have to reject the neoclassical growth model. Why did the MRW results fall apart with the new HC measures?

Technology: Assume that a country’s relative technology level depends on its endowment of HC per effective worker. This is the endogenous growth idea:

A_R = φ·h^ψ

Note the change from earlier – this implies that A is not independent of h (endogeneity issue). New equation is:

ln[Y/L] = ln(φ) + (α(1+ψ) /(1-α-β)) ln(sk) + ((β+ (1-α)ψ) /(1-α-β)) ln(sh) - ((α+ β + ψ) /(1-α- β)) ln(n+g+δ) + ε

Or the other version:

ln[Y/L] = ln(φ) + (α/(1-α)) ln(sk) + ((β+ (1-α)ψ) /(1-α)) ln(h*) - (α /(1-α)) ln(n+g+δ) + ε

The new proxies are for h*, so we estimate the second equation with them. (Note: we’ve already done this, but we haven’t properly backed out ψ, so our earlier estimates may have been wrong too.) Since ψ isn’t separately identified, assume the “reasonable” restriction of MRW that α+β=.6. Results are in Table 3. When we make the “reasonable” assumption and use our new HC measures, the hypothesis that ψ = 0 is soundly rejected. Note: the reason for the good MRW fit is that their h is correlated with income.

How did MRW get such plausible income shares with a bad proxy and a misspecified model? The two effects exactly cancel out. Suppose the researcher believes that he is estimating:

ln[Y/L] = α0 + αln(k) + βln(h)


But the true model is the Schumpeterian one, where:

ln[Y/L] = φ + αln(k) + (β + ψ)ln(h)

If h is measured well, the researcher will get a value for βhat that is actually estimating (β + ψ). If we interpret it as an estimate of β, we will be overestimating β. But MRW do not seem to overestimate β – they get a plausible result. Why? Consider the issue with h. Suppose what they are observing for h is actually h′ = μ·h^v, so that

ln(h′) = ln(μ) + v·ln(h)

This means the “true” coefficient when we use ln(h′) is (β + ψ)/(1 + v) (although I get just (β + ψ)/v – but it doesn’t matter much anyway; an earlier draft had ln(h′) = (1 + v)ln(h)). The coefficient will only be a good estimator of β if it happens to be that β = (β + ψ)/(1 + v). So given a plausible level of β = .3, the estimator will work if v ≈ 3ψ. We got ψ = 2.6 in Table 3, column 2 (assuming DST is a good measure of h). We noted earlier that their measure of h seems to have a wider spread than the new ones presented here. Assuming the DST measure is more accurate, we can run that small log equation as a regression, and we get vhat = 8.5. So we see that the condition approximately holds – the misspecification and the particular form of measurement error cancel out, leaving MRW with a reasonable estimate of β!

Conclusion: This reassessment of the MRW model generates a lot of concerns about the model and about the variables used in the MRW paper. The reader is left questioning the value of the MRW model.


Sarah Hamersma 3/15/05

Intro to Economic Duration Data and Hazard Functions (based mostly on Kiefer 1988 JEL)

The Point: Economists are interested in some problems that involve durations (ex. unemployment, time between capital equipment replacement, lifetimes of firms, elected officials’ time in office, time remaining in school, etc.). These do not always fit nicely into a regression framework, where the central concept is the unconditional probability of an event (ex. probability of an unemployment spell lasting exactly 10 weeks). In duration models, the central concept is the conditional probability of a spell ending at time t conditional on it already lasting through t-1. We will discuss some econometric setups that are conducive to this approach.

Note: (from p. 649 of the article) “For any specification in terms of a hazard function there is a mathematically equivalent specification in terms of a probability distribution. The two specifications involve the same parameters and are simply two different ways of describing the same system of probabilities. Thus, the hazard function approach does not identify new parameters.” Why then do we bother with a new approach? Our usual probability distributions don’t have nice clean conditional probabilities (hazard functions). We would like to have a parameterization that allows more transparent estimation and which allows relevant special cases (like a constant hazard).

Duration Notation/Concepts:

Defining a duration:

1) time origin of event
2) time scale
3) time of event ending

Note that with almost any data, you will run into right-censoring, left-censoring, or both. This will come up a bit today, but it is something to think more about if you think you will want to use duration modeling in your empirical work. The maintained assumption regarding censoring is typically that those with censored spells of length c are representative of all those who have spell lengths of at least c. Let T denote the random variable indicating duration. Let t indicate a realization of that variable.


The following 4 concepts are 4 different ways of characterizing the same duration data – any one implies all 3 of the others.

1) F(t) = Pr(T < t), the cdf. The empirical version will tell us what proportion of spells have ended by any duration length t.

2) f(t) = dF(t)/dt, the pdf. This is roughly (because it’s continuous) the probability of a spell ending at duration length t.

3) S(t) = 1 – F(t) = Pr(T ≥ t), the survivor function. The proportion of spells that will continue past duration length t.

4) λ(t) = f(t) / S(t), the hazard function. This is roughly the rate at which spells end at duration length t, given that they have lasted up until length t.

Duration dependence: A useful conceptual tool to help choose how to characterize a given economic duration problem is to think about whether it would have some form of “duration dependence.”

Positive duration dependence at point t*: λ’(t) > 0 at t*. The probability of ending, conditional on still being there, gets higher as spells increase in length (ex. strike length, life length, a Valentine’s Day “longest kiss” contest).

Negative duration dependence: λ’(t) < 0 at t*. The probability of ending, conditional on still being there, gets lower as spells increase in length (ex. welfare spells, unemployment, firm lifetimes).

No duration dependence: λ’(t) = 0 implies that the probability of ending the spell doesn’t depend at all on how long the spell has been going on (ex. random, independent draws from a wage distribution – for a simple model).

Modeling Durations: Parametric Models

1) The exponential distribution
This distribution has just one parameter, γ > 0, that defines the distribution. The cdf is:

F(t) = 1 – e^(–γt)

This results in a hazard of just: λ(t) = γ


Notice that there is no duration dependence; this is sometimes called “memoryless.” Note that the exponential is the only distribution with this property (because, recall, the constant hazard implies the f(t), etc., uniquely). Note a limitation: E(T) = 1/γ and Var(T) = 1/γ², so the mean and variance are both pinned down by the single parameter.

2) The Weibull distribution
This is a broader version of the exponential model; the exponential model is a special case. This requires γ > 0 and α > 0. In this case:

F(t) = 1 – exp(–γt^α)

This results in a hazard of: λ(t) = γα·t^(α–1)

Note that if α = 1, this is just the exponential model. This model allows either positive duration dependence (if α > 1) or negative (if α < 1) but not both.

3) The log-logistic distribution
This one allows a non-monotonic hazard.

F(t) = 1 – [1/(1 + γt^α)]

λ(t) = γα·t^(α–1) / (1 + γt^α)

For α ≤ 1, the hazard decreases with duration (negative duration dependence). For α > 1, the hazard first increases with duration, then later decreases.

How do I choose a distribution? It’s good to start just by plotting the non-parametric survivor function (Kaplan-Meier) and the corresponding hazard to see what the patterns in the data look like in terms of duration dependence.

Relevant Stata commands (lengthunemp is each spell’s length; this crude hazard calculation ignores censoring):

    stset lengthunemp
    stsum
    sts graph                                   // Kaplan-Meier survivor function
    sort lengthunemp
    gen nj = _N - _n + 1                        // spells at least this long (risk set, taken at the first obs of each length)
    by lengthunemp: gen hj = _N                 // spells ending at exactly this length
    by lengthunemp: gen haz = hj/nj if _n == 1  // empirical hazard at each distinct length
    graph twoway line haz lengthunemp if haz < ., sort


Including Covariates:

What we have talked about so far are ways of thinking about the distribution of our duration data. None of these took other variables into account at all. Adding covariates will require us to model the hazard in a slightly different way. We’ll just talk about time-invariant covariates. A common specification is a “proportional hazard model”:

λ(t;X) = λ0(t)·φ(X, β)

The idea here is that the X’s shift the hazard proportionally relative to some “baseline” hazard rate (your various X’s either speed up or slow down your process of ending your spell). We often parameterize φ(X, β) as exp(xβ). When we do this, we can conveniently rewrite the model as:

ln λ(t;X) = xβ + ln(λ0(t))

Note something nice here: the β’s can be interpreted as “the constant proportional effect of X on the conditional probability of completing a spell.” This is parallel to a regression β.

If we add covariates to the Weibull model, we can get a proportional hazard model, though this isn’t true with every possible hazard model. Weibull: replace γ with exp(xβ). Then we get a proportional hazard model, where λ0(t) = α·t^(α–1). Note that making the same substitution in the log-logistic does not result in a proportional hazard.

Estimating a proportional hazard model: There are a couple of ways to use the linear formulation above and estimate β. We could use MLE. Below I discuss using OLS. (We’ll need to be careful about SEs if there is censoring.) Note that we can only identify the baseline hazard up to a constant if there is a constant in X. One example in the paper, for the Weibull model, is the specification:

-α*ln(duration) = xβ + ε

The error here is fully specified but not normal. If we drop the α, it’s exponential.


We can also run essentially the same thing (without α), but be more flexible on some assumptions and more stringent on some others, using an “accelerated lifetime” model. (These two approaches only really match up in the Weibull family of estimators.) Comparing the results of OLS (either one) with the MLE estimates will help us know if the exponential/Weibull distribution does a good job with our data.

Stata can estimate both the hazard and accelerated lifetime forms. Be sure you know which you are doing, so you know whether the coefficient is estimating β or β/α. That’s all we have time for, so you can get more details in the article, in Wooldridge, or in Jeff Smith’s lecture notes online.
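A sketch of the two Stata parameterizations, using simulated Weibull data so the two sets of coefficients can be compared directly (all names and parameter values below are invented):

    * Weibull durations with a proportional-hazard effect of x (alpha = 1.5, beta = 0.8)
    clear
    set seed 321
    set obs 1000
    gen x = rnormal()
    gen spell = (-ln(runiform())/exp(0.5 + 0.8*x))^(1/1.5)
    stset spell
    streg x, distribution(weibull) nohr     // hazard metric: coefficient estimates beta
    streg x, distribution(weibull) time     // accelerated-lifetime metric: estimates -beta/alpha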


Sarah Hamersma 2/14/05 (Happy Valentine’s Day!) Lecture on

“Causal Effects in Nonexperimental Studies: Reevaluating the Evaluation of Training Programs”

(Dehejia and Wahba)

AND

“Does Matching Overcome LaLonde’s Critique of Nonexperimental Estimators?” (Smith and Todd)

Both of these papers deal with re-estimating the NSW treatment effects (discussed in LaLonde 1986 and Heckman and Hotz 1989) using variations on the method of matching. In this lecture, we will talk about several ways of implementing the method of propensity score matching, and we will put them into the context of these two papers. We will also discuss the fundamental features of the data that are needed to have any hope of getting reasonable treatment effect estimates via matching.

General Matching Methodology

The Propensity Score

We noted last time that it is usually impractical to actually implement matching estimation for all of the relevant variables. First, it will typically require too many cells (many of which will be empty or small). Second, it presents a problem if some of the X’s are continuous rather than discrete. Fortunately, Rosenbaum and Rubin (1983) showed that if the conditional independence condition holds (selection on observables, X – which you will recall is required for matching on X), then it follows that the same holds for the function P(X), defined as:

P(X) = Pr(D = 1 | X)

In other words, we can match people on a single-dimensional index – P – rather than trying to match them on all of their various characteristics. The intuition for this (as stated by Jeff Smith, Professor at U of Maryland) is that two groups with the same probability of participation will show up in the treated and untreated samples in equal proportions. Thus, they can be combined for purposes of comparison. (So even if these two groups have different outcomes, those outcomes will be reflected in equal proportions among the treated and untreated.)



Think of it this way: it doesn’t matter if you add and then average, or average and then add. Suppose there are two different types (A and B) with the same propensity scores of .25. We could theoretically split these into their two groups, calculate an average treatment effect in each group, and average them (weighted by # of treated in each) to get a matching estimate of treatment on the treated for that level of P. We could also put them all together and calculate the average treatment effect on the treated. We will get the same answer.

We will need to estimate this P(X) in order to use it for matching. This introduces some error into our estimation of the treatment effect which we will need to account for in our standard errors. Fortunately, there is evidence that the parametric form (logit, probit, or semi-parametric model) does not matter too much in the performance of propensity-score matching estimators. Assume we use a logit. There is no clear method for choosing the specification for the logit. There is an idea of a “balancing test” that checks whether the X’s have any additional power to predict D when P(X) has already been conditioned upon. The idea is that they shouldn’t, since their influence should be embedded in P(X) already.

Common Support

Plotting a histogram of the propensity scores can help us see what the common support looks like. (Show a sample.) An interesting note: when there is no common support at all, this may be the place to use a regression discontinuity design. We noted last time that we need to trim down the sample and estimate only over the common support. In practice, some researchers simply trim off all the data beyond each end of the common support. Unfortunately, this may drop some potentially good matches for some of the treated near the edge. It also ignores any “holes” in the support. Heckman, Ichimura, Smith, and Todd (1998) suggest that researchers drop those in the sample (treated or untreated) that have particularly low estimated density at their P level. None of these methods actually get rid of the “problem” that we cannot estimate a treatment effect for certain parts of the distribution – they are just a way of coping.

Implementing Propensity Score Matching

The general form of the matching estimator is the following:

M(S) = (1/N1) Σ (over i ∈ I1 ∩ S) [ Y1i – Ê(Y0i | Xi, Di = 1) ],   for all S in the common support



We are replacing X with P when we use propensity scores. The way we estimate the E-hat quantity for each treated person is with some weighted average of untreated people. There are several ways of doing this – we will talk about three general methods:

1) nearest neighbor matching
2) stratification/interval matching
3) kernel/local linear matching

1) Nearest neighbor

This means exactly what it sounds like – for each treated person, the outcome of the untreated person with the most similar propensity score is deemed their counterfactual. This can be done with or without replacement – with replacement may re-use many untreated people, increasing the variance of the estimator; without replacement may result in lower-quality matches, and is dependent upon the order in which you implement the match. You can modify nearest neighbor to include multiple neighbors (either a certain number of neighbors for each person, or all of the neighbors within a certain distance – this is caliper or radius matching).

2) Stratification/Interval Matching

Partition the support of P into a set of intervals. Treat each interval as you would treat a single value of X in regular matching (i.e. act as if P divides into discrete groups, then find the average treatment effect in each group and average them according to the number of treated in each group). Dehejia and Wahba use this strategy.

3) Kernel/Local Linear Matching

We will be talking about this more in the next lecture (on kernel methods), so today will just be intuition. The counterfactual is determined by a weighted average of the untreated people surrounding a given treated person, where the weights are largest nearby and smallest (or zero) for those far away. (Draw a scatterplot to illustrate that this is a kind of “regression” estimator, where we are just using the untreated to estimate the nonlinear regression function.) Local linear is a slightly fancier version – instead of using the surrounding observations to estimate a mean only, they also estimate a slope, which can help us better estimate the counterfactuals near the endpoints. The difficulties in these nonparametric estimators are (a) choosing the weighting function – often we use normal – and (b) choosing the bandwidth, or (roughly) the spread of the weighting function. The first thing appears not to matter too much, but estimates can be very sensitive to bandwidth.
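A minimal sketch of propensity score matching in Stata – not either paper’s specification. The data are simulated, and the built-in teffects command stands in for the authors’ own estimators (psmatch2 is a common user-written alternative):

    * estimate a propensity score with a logit, eyeball the common support, then match
    clear
    set seed 99
    set obs 3000
    gen x1 = rnormal()
    gen x2 = rnormal()
    gen d  = runiform() < invlogit(-1 + 0.8*x1 - 0.5*x2)
    gen y  = 1 + 2*d + x1 + 0.5*x2 + rnormal()
    logit d x1 x2
    predict pscore, pr
    twoway (kdensity pscore if d==1) (kdensity pscore if d==0), ///
        legend(order(1 "treated" 2 "untreated"))                  // common support check
    teffects psmatch (y) (d x1 x2), atet nneighbor(1)             // ATT via nearest neighbor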



Dehejia and Wahba

Summary of Dehejia and Wahba’s setup: They use LaLonde’s sample to try a new estimator (matching) that they expect may be useful for re-creating the experimental results. First, though, they trim down LaLonde’s sample to those with an additional year of pre-treatment earnings data (1974) or those for whom it is known that they were unemployed prior to treatment. (More details on this are discussed in Smith and Todd.) They argue that this previous data is needed to properly specify outcomes, since prior employment information has been shown to be an important predictor of outcomes. They replicate LaLonde’s results, both with the original sample and their new subsample. They confirm that the methods tried by LaLonde do poorly in both cases.

Divide into 3 groups of 4. Each group has one task:

1) explain how Dehejia and Wahba deal with the common support issue
2) explain how they decide on their propensity score specification
3) explain the two types of matching they implement

What were their results? (Table 3) Essentially, they find that matching does a remarkably good job of estimating the correct treatment effect of job training. The estimates are also robust to changes in the propensity score specification, though they are sensitive to dropping the earlier-period earnings variable.

Smith and Todd

They are suspicious of the DW result for a few reasons: (list)

- DW still do not have a good comparison group (it is not geographically similar and the variables are not measured in the same way across the treatment and comparison groups)

- DW select a rather strange subsample to work with (see Table 2 of ST)

Modifications/Extensions: Regression-Adjustment and Diff-in-Diff

They start by going through the various matching methodologies, and presenting a couple of extensions that they use in this paper.

Regression Adjustment: This is conceptually not too hard. If we think there may be variables that affect outcomes that are (a) not related to participation in treatment or (b) distinctly related to the outcome in a way that goes beyond their relationship to participation in treatment (i.e. they are in the P-score), then we may want to regression-adjust our matching estimates to control for these differences. This involves estimating the β’s from a regression of Y on these “special” X’s along with a term representing the remaining error (due to selection). This second term needs to be attacked nonparametrically, so estimating the β’s is actually not that straightforward. However, once it is done, we implement regression-adjusted matching by replacing the outcome Y with the part of the outcome Y that is unexplained by Xβhat (i.e. with Y – Xβhat). The matching is done on the propensity scores as before.

Diff-in-diff: This is a really important issue. If we think that there may not be perfect selection on observables, this version of matching offers a more flexible framework. As long as we’re willing to believe that the treatment and control groups were on the same trajectory, and we have before and after data, we can allow for limited selection on unobservables (limited in the sense that the unobservables must be time-invariant). This assumption is the parallel to the regular DD assumption. The DD matching estimator uses the change in Y, rather than Y itself, as the dependent variable. Everything else is the same. If you want to also regression adjust, you can (it just requires some programming). In previous work, Smith and others found that the DD version of the estimator got closer to experimental results. As with usual DD, this variant of matching can be used on repeated cross-sections as well as panel data.

Estimation: Smith and Todd use the LaLonde, DW, and their own samples from the NSW to assess the robustness of matching estimators in this context (see Table 2). They use two propensity score specifications, based on LaLonde and DW. Their approach: estimate bias directly by comparing the control and comparison groups; match on the log-odds ratio (which makes estimates robust to choice-based sampling); deal with the common support issue by trimming (losing 5-10% of the sample).

Results: Table 5A (using CPS – also 5B, which uses PSID): there is uniquely little bias for the DW sample using the DW scores – all others are very biased. Table 6A (and B) uses the DD version, and finds lower bias overall, but still huge differences across perturbations of the sample and P specification. Table 7A (and B) finds that even using LaLonde’s non-matching estimators, the DW sample seems to do “better” – particularly with the CPS sample. For the CPS, it is not “matching” per se that makes the estimates better, but rather the sample itself that alleviates much of the selection problem from the start. For PSID, the regression versions are not as good, suggesting that the linearity assumption has an effect here.



They also do some specification tests like Heckman and Hotz – seems to do well in dropping the worst estimates. Three main conclusions:

1) their evidence leads them to question the DW claims of matching’s success with the NSW data; the results aren’t generalizable.

2) Diff-in-diff matching estimators perform substantially better than the corresponding cross-sectional estimators.

3) Though the choice between diff-in-diff and cross-sectional matching matters a lot, the other details of the matching procedure in general do not.

Dehejia: Reply

1) Concern that ST overstate DW’s enthusiasm for matching as a perfect estimation strategy

2) Concern about ST using the DW propensity score specifications on various samples; the specification should be optimized for each. The result is a different proper specification for each set of estimates; when these are used, the matching estimates match the experimental benchmarks. Q: How did they do this with LaLonde’s data, when they insisted in the ’99 paper that the additional previous year of data was absolutely essential?

Main issue: The Balancing Test. How does it work? See the handout from the Appendix of DW’s 2002 paper. Conclusion: The DW sample balances well and is robust to small changes in the P-score specification. The other samples are not, and they thus give bad estimates.

Smith and Todd: Rejoinder

1) Note that DW still don’t defend the choice of a strange treatment group.
2) Focus on only the DW sample, and the fragility of its estimates based on changes that should not matter (i.e. interchanging the experiment’s treatment and comparison groups in pre-treatment tests).
3) Sensitivity of DW results to the type of matching and even the seed.
4) Alternative balancing tests don’t accept the DW estimation.



Sarah Hamersma 1/26/05 Summary/Notes on:

The Benefits of Prenatal Care: Evidence from the PAT Bus Strike by William N. Evans and Diana S. Lien

The Point: This article attempts to estimate the effects of prenatal care on birth outcomes in a novel way. The authors note that the usual strategy (a regression of birth outcomes on amount of prenatal care) is subject to potentially a great deal of omitted variable bias. They argue that they have an instrument that can alleviate this problem: a month-long bus strike could plausibly have affected the number of prenatal doctor visits for a given woman, but the same bus strike is otherwise unlikely to affect birth outcomes.

They find that because such a small group is affected by the bus strike, they cannot estimate the effects very precisely. They are able to uncover some evidence that prenatal care reduces maternal smoking and that missing a prenatal visit early in pregnancy is associated with lower birth weight and reduced gestation (shorter pregnancy). More strongly, though, they do at least establish a strong link between prenatal care and the availability of public transportation for inner-city residents (particularly blacks).

Structure of Paper:

I. Introduction
II. Primer on Prenatal Care
III. Background
    A. the PAT bus strike
    B. bus strike’s potential impact on prenatal visits
IV. Research Strategy and Data
    A. econometric model
    B. census of U.S. births: the Natality Detail File
    C. choosing control counties
    D. descriptive statistics
V. Results
    A. first-stage estimates [convince us instrument isn’t weak]
    B. reduced-form estimates (these are with the instrument in the structural equation instead of the endogenous regressor)
    C. OLS and 2SLS estimates [present results]
    D. Late versus early prenatal care
    E. Are there other explanations for these results? [convince us instrument isn’t independently related to outcome of interest]
VI. Conclusion


Setup: What is the endogeneity problem? 2 possibilities:

1) health-conscious women (who are likely to have good birth outcomes) are more likely to engage in lots of prenatal care

2) women with problem pregnancies (likely to have bad birth outcomes) are more likely to engage in lots of prenatal care

Each of these will bias the estimates of the effects of prenatal care on outcomes (first will overstate, second will understate). There is a section early in the paper (IIIb) that tries to establish the possibility that a meaningful number of people might depend on public transportation for medical visits. What does this investigation show?

- it is hard to say for sure, but it appears that maybe 5%-11% of women in the area we are looking at (Pittsburgh) who are of childbearing age use public transportation for medical visits.

- MOST IMPORTANTLY, the authors find that race is a significant factor in determining this percentage. They are careful to account for race in a flexible way throughout the paper.

The Econometric Approach: The authors talk about needing a “control” group. Their most basic 1st-stage regression is:

Prenatal care = Xβ + county, month, year dummies (plus their interactions with race and inner-city dummies) + (strike*Allegheny)α + ε

α is the “treatment effect” in the first stage. We need to establish that it exists (otherwise the instrument fails the first test of being valid). Why don’t they use Allegheny data only (that’s where the strike was) and then define the treatment as just “strike”? Angrist and Krueger don’t need a control group for the returns-to-education paper that uses quarter of birth as an instrument. What’s different?

- we might just pick up some sort of time effects, since the strike variable is just =1 for certain months (The Angrist and Krueger instrument is time-invariant within each person, differing across people. The strike instrument is the opposite.)

- better idea to have a comparison geographic region, so that we can control for general changes over the time period of treatment (using the “strike” variable to mark that time period) and then just look for changes above and beyond that in the particular area affected by the strike (interacting strike with the Allegheny county indicator).

[Note that they also do an additional specification in which separate treatment effects are estimated for each of four groups (defined by race and inner-city residence) all at one time. This is an important feature of what they are doing.]

The authors choose a control group based on the similarity of the counties to Allegheny county both before and after the strike (not during – that would ruin the identification strategy; we need to allow Allegheny county to respond to the strike).

Results: The first goal of the authors is to convince us that they do not have a weak instrument. Are they successful? (Table 4) Yes, particularly with their 8-county control group. They have a strong F-test and their R-squared of .12 isn’t so bad (recall that Angrist and Krueger’s was .0001). They also make sure the order of magnitude is sensible by doing a back-of-the-envelope (naïve) estimate and comparing it to what they got in the econometric model. Nice.

Next, they go on to try the instrument in the outcome equation (they refer to this as the reduced form – it differs from the structural equation because it doesn’t have the endogenous regressor in it). How do you interpret these estimates? (Table 6) These are the estimated effects of the strike on birth outcomes; at this point, they are not dividing through by the first-stage estimate to calculate the expected effects of prenatal care on birth outcomes (which is ultimately of interest). You can do so yourself to try to preview results. They don’t find much statistically significant here. Why are their SEs so big?

- the population that is actually affected is small, so we don’t get a lot of precision even in this large sample

- the amount of variance in the predicted values of the endogenous regressor is an important factor as well (if they don’t vary much, our SEs will be large – you can see how this relates to having a weak instrument)
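A sketch of the mechanics only (first stage, reduced form, 2SLS). The data are simulated and the bare-bones specification below is invented – it leaves out the paper’s county/month/year dummies and interactions:

    * the strike-by-Allegheny interaction shifts visits, and visits shift birth weight
    clear
    set seed 2025
    set obs 5000
    gen allegheny = runiform() < 0.5
    gen strike    = runiform() < 0.2
    gen z         = strike*allegheny                  // the instrument
    gen visits    = 10 - 2*z + rnormal(0, 2)
    gen bweight   = 3000 + 15*visits + rnormal(0, 400)
    regress visits strike allegheny z, robust         // first stage: is z strong?
    regress bweight strike allegheny z, robust        // reduced form
    ivregress 2sls bweight strike allegheny (visits = z), vce(robust)
    estat firststage                                  // first-stage diagnostics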

They go ahead and calculate various 2SLS and OLS estimates (for comparison). Using the whole sample, they get pretty much nothing going on with 2SLS, making it different from OLS. BUT they point out that these two estimates aren’t all that comparable, since their IV estimation is estimating a LATE. They do the same thing again, but with a limited sample of only black women. The coefficients then look a lot like OLS with the same sample – or at least they’re statistically indistinguishable because of large SEs. (Table 8)

Toward the end of the paper, the authors also check whether it matters if a missed prenatal appointment is early or late in pregnancy. They find evidence that missing an early appointment appears to have stronger consequences, particularly on birth weight, but the result is only significant at the 15% level.

The other IV requirement: Before they conclude, the authors take a bit of time to try to convince us that the strike did not affect birth outcomes in any way besides through decreased prenatal care. Two possibilities they discuss:
1) women who gave birth in the 7 months during the strike could have been “different”
2) there could be a sample composition change – if women also couldn’t take the bus to get abortions, then birth outcomes could be worse but it wouldn’t have anything to do with prenatal care

Some other issues related to their 1st stage:
1) a shock to employment/earnings could have occurred at the same time (this seems to be an argument that “strike” is endogenous in the 1st stage – it shouldn’t matter…although if they want to push the result that the strike “caused” the lack of prenatal care, then they need to worry about this)
2) there might have been bad weather that influenced prenatal care visits (same type of argument as (1), I think)

Conclusion: They push their results from the 1st stage fairly strongly (it is not merely a means to an end in this paper). They also note that they find statistically significant evidence that more prenatal care reduces maternal smoking. They don’t push hard on their other coefficients because of imprecise estimates.

The authors include a helpful “lesson” in the second-to-last paragraph. They point out that you need 3 things to usefully engage in 2SLS to evaluate a causal mechanism:

1) a mechanism that mimics random assignment
2) a mechanism that impacts a large-enough population so the 2nd-stage effects can be measured with precision
3) an experiment where the LATE is an interesting outcome

They conclude that they got 1 and 3, but came up a bit short on 2.


Sarah Hamersma 1/10/07
Summary/Notes on: Edward Glaeser, “Researcher Incentives and Empirical Methods,” Harvard Institute of Economic Research, Discussion Paper 2122, October 2006

Glaeser begins by expressing the notion that econometricians assume that they themselves are not opportunistic maximizers, even when we assume that everyone else is. He suggests that “The solution to this problem is not to expect a mass renunciation of data mining, selective data cleaning or opportunistic methodology selection, but rather to follow Leamer’s lead in designing and using techniques that anticipate the behavior of optimizing researchers.” He defines “researcher initiative bias” as the bias created as econometricians make decisions at the various steps in the research process that tend toward them finding significant results.

Glaeser proposes a model of researchers whose objective is to “produce famous results that appear to change the way that we see the world.” The choice variable is the effort exerted to bias results toward further significance and away from their actual value. What do you think about this? He concludes, based on this model, that the importance of an issue will cause a researcher to work harder to get results that are perceived to be more significant than they actually are (this is not surprising given the assumptions of the model).

Here are his ten points about “a more economic approach to empirical methods.” Which do you think are most important/interesting? (My thoughts: 1, 2, 5, 8, 10)

1. Researchers will always respond to incentives and will be more skeptical than standard statistical techniques suggest.

Why the word “always”? And which incentives, exactly? Academia is a repeated game, so there is still an incentive to do work that is not subject to the criticism of being highly biased. I wonder if this new view of researchers is similar to the economics of crime – does it always assume that individuals are without conscience, or does it just assume that incentives can matter on the margin? What can we ethically expect out of econometricians, if anything? The second piece might make some sense, since we lack information on the decisions of other econometricians even if we have concluded (for whatever reason) that we will self-consciously avoid (to the extent possible) working in a way that generates maximal bias toward our desired results. The simple adjustments suggested here – which amount to applying stronger standards to regression results – seem quite sensible to me. The notion that correction techniques are under-developed and underutilized is an interesting point. The suggestion about adjusting coefficients seems to answer a problem with another problem: “A more thorough approach would require complete specification of the reader’s assumptions about what the researcher can do, and then use Bayesian inference.” It’s hard to imagine we will correctly characterize these assumptions.

2. The optimal amount of data mining is not zero.

Is choosing variables based on some sensible economic model "data mining"? If we don't do this, are we stuck with a simple process of elimination for determining economic relationships? (The example with T and q and z is interesting.)

3. Competition and replication can help, but their impact will differ significantly from setting to setting.

Glaeser argues (I think successfully) that competition can help, and its distribution might even be efficient (more effort expended on more important questions); e.g., Hoxby and Rothstein.

He also suggests that no one would be punished for classic data mining unless "norms about specifying hypotheses shifted enough". What might this look like?

* It is interesting that Glaeser notes that "competing researchers are also maximizing their own objectives which may or may not be the perfect presentation of truth." His model assumes that it is not! He seems to give a little more credit to researchers here.

4. Improvements in technology will increase both researcher initiative bias and the ability of competition to check that bias.

This section made sense to me.

5. Increasing methodological complexity will generally increase researcher initiative bias.

His point is that the researcher now has more options (increasing the scope for bias) and that it is harder for others to compete (increasing the possibility that he/she will get away with it).

* I thought an important point was left out here: some increasing methodological complexity is a direct result of trying to avoid simplifying assumptions and to allow the data to "speak for itself" more clearly. The fact that this might improve the accuracy of the estimates seems underappreciated here.

* Glaeser suggests in the last sentence that the profession needs to take researcher initiative seriously and "design corrections that are based on the degrees of freedom enjoyed by an enterprising researcher." Doesn't this create a disadvantage for those whose estimates are close to accurate because they carefully avoided bias where possible? This seems to change his description of econometricians' incentives into a prescription for how they should approach research. (See "Econometricians Behaving Badly" by James Reade, a Ph.D. student at Oxford; see also Peter Kennedy (2002), "Sinning in the Basement: What Are the Rules? The Ten Commandments of Applied Econometrics.")

6. Creating and cleaning data increases the expected amount of researcher initiative bias.

Creating data: note the story of Robert Lee, a name used in data gathering for a poli sci project. Surely the researcher should not gather data at random – does this automatically mean we should assume he/she is choosing data to create false precision in the final estimates? While I agree that this biasing activity can be subconscious, surely someone who is trying to avoid bias will be more successful in minimizing it.

Selective cleaning is an interesting issue – we will talk some about this in class when we get to the "data issue" section. There can be difficult situations in which good intentions can lead to flawed results.

7. Experimental methods restrict hypothesis shopping but increase researcher discretion in experimental design and data collection.

This section made sense to me, although experimental work continues to be a small share of economic research.

8. The search for causal inference may have increased the scope for researcher initiative bias.

Instrumental variables takes a beating in this section. There is clearly a sense that people are choosing instruments to get the results they want.

Consider the following situation: You have a regressor, X, which you are concerned – for theoretical reasons – is endogenous to the outcome Y. You have a prospective instrument Z that you believe – for theoretical reasons – is uncorrelated with the outcome of interest (except through the endogenous variable). For this instrument to be legitimate, you must demonstrate a strong relationship between X and Z. If this relationship does not exist, the instrument is not legitimate and should not be used. Is running the first stage of 2SLS to check this "data mining"? Is testing various instruments this way "data mining"? How would this differ from running the full 2SLS with every potential instrument, even those that turn out to be weak?
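For concreteness, a minimal Stata sketch of the first-stage check described above; the variable names (y, x, z, w1, w2) are generic placeholders, not taken from any particular paper:

  regress x z w1 w2              // first stage: how strong is the X-Z relationship?
  test z                         // F-statistic on the excluded instrument
  ivregress 2sls y w1 w2 (x = z) // 2SLS (ivreg in older versions of Stata)

The first two lines use only the first-stage relationship, which is exactly the step the questions above are asking about.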

9. Researcher initiative bias is compounded in the presence of other statistical errors.

It makes sense that "the existence of researcher initiative may make researchers less willing to take steps that would tighten up their standard errors and make significance less likely." We will be discussing a debate about the use of clustering in some estimates of the effect of the level of gun ownership on neighborhood crime.
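In Stata, the clustering correction at issue in that debate looks roughly like this (variable and cluster names are hypothetical):

  regress crime gunlevel x1 x2, vce(cluster neighborhood)   // cluster(neighborhood) in older versions

Clustered standard errors are usually larger than the default ones, so skipping this step makes significance easier to attain, which is precisely the incentive problem point 9 raises.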


10. The impact on researcher initiative bias of increasing the connection between theory and empirics is ambiguous.

This is an interesting insight. However, there still seems to be a lack of acknowledgement that even empiricists are (at least implicitly) using economic models (or common sense) to choose variables and specifications. This can help alleviate the issues from point 2.

Conclusion: Glaeser says "Researchers choose variables and methods to maximize significance and it is foolish to act as if this is either a great sin or to hope that this will somehow magically disappear." Are these really the two choices? What about working on, say, transparency in the reporting of results? Perhaps some norms would need to change, but I find it hard to believe that this description of what researchers do is a norm accepted by the profession. He suggests that we should "embrace human initiative in response to incentives." Does this mean all econometricians should subscribe to the method of "maximizing significance", since some of them already do? If they were subjected to an assumed weakening of their results (à la point 5), is there any other option for them?


Sarah Hamersma 1/10/05
Notes on "The Incidence of Mandated Maternity Benefits" by Jonathan Gruber

The Point: Policymakers sometimes try to induce behavior through mandates. This allows them to try to make things happen without directly having to pay for them. It can also be more efficient than government payment, given the deadweight loss of taxation, if the costs of the mandate shift to the appropriate consumers. In this paper, Gruber looks at whether the costs of mandates requiring insurance coverage to include maternity benefits are passed through to the affected workers (primarily married women of childbearing age and their spouses).

The main structure involves two comparison groups, making it a DDD analysis: married women of childbearing age are compared across "mandate" and "non-mandate" states over time (one DD), and then this difference is compared to the same calculation for groups that should not be affected by the mandate. This allows Gruber to take into account any differences in the trajectories of wages across treatment and control states (avoiding a common criticism of DD analysis, which assumes equal trajectories). He also tries estimation using individually-parameterized cost measures. In both cases, he concludes that approximately 100% of the costs of the mandates are reflected in decreased wages among the targeted group; in line with this, he finds that there is little to no employment effect.

The Model: Where are Gruber's predictions coming from? How does he know what to look for? He's really just drawing a supply and demand curve and thinking about the effects of a shift – it is that simple. This is what suggests that he should look at hours & employment along with wages, since we expect a trade-off between the two.

The Identification Strategy: There are two approaches to the estimation in the paper. What are the advantages and disadvantages of each, in particular given his data? (A sketch of the DDD regression appears after the lists below.) Big disadvantages of the data in general: no info on health coverage (prior to 1979), limited identification of individual U.S. states.

DDD:
Advantages:

- a fairly believable assumption (no contemporaneous shocks affecting the relative outcomes of the treatment group in the same state-years as the law)


- very straightforward interpretation
Disadvantages:

- rough cutting of the data; there may be many in the treatment group unaffected by the treatment, so estimated effects of treatment could be underestimated
- method ignores variation in the mandate's cost within the treatment group

Individual variation:
Advantages:
- can account for variation in the treatment's cost on an individual level
- can estimate the effect on wages per dollar of cost
Disadvantages:

- stronger parametric assumptions (adding estimated cost for each person as a term in the regression, instead of the treatment dummy)
- requires estimation of insurance coverage costs from outside the data set (which should increase standard errors)
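As promised above, a minimal Stata sketch of the DDD regression; the variable names are hypothetical and the paper's actual specification has more detail:

  * treat = 1 for the demographic group targeted by the mandate, mand = 1 in mandate states,
  * post = 1 after the law takes effect
  gen tm  = treat*mand
  gen tp  = treat*post
  gen mp  = mand*post
  gen ddd = treat*mand*post
  regress lnwage treat mand post tm tp mp ddd x1 x2

The coefficient on ddd is the DDD estimate: the extra wage change for the targeted group in mandate states after the law, relative to both comparison dimensions.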

The Results: Gruber finds, with a variety of specifications, that essentially all of the cost of the mandate appears to be transferred to the affected workers, and that employment levels and hours aren't affected much. In other words, the market for maternity health coverage operates efficiently.

This may or may not be exactly what policymakers had in mind, since they may have hoped women wouldn't be completely paying for their own insurance. However, it appears that women are willing to pay for it, so it isn't clear where the policy mandate comes in. It may be related to the typical adverse selection problem in insurance markets, causing them to be incomplete.


Sarah Hamersma

Lecture Notes on “Much Ado About Nothing: Conditional Logit vs. Random Coefficient Models for Estimating Labour Supply Elasticities”

by Peter Haan

Key Point: The conditional logit model has restrictive assumptions about the error terms in a discrete choice model. However, when compared with more flexible models (which are much harder to estimate), estimated labor supply elasticities are not very different, even though there is evidence that the restrictive assumptions of the conditional logit do not hold.

Reminder of these Discrete Choice Models: (see ECO 7424 lecture notes, section 15.4)
The conditional logit model allows alternative-varying regressors (e.g., cost of the alternative) but requires the regressor to enter the model in the same way for all alternatives (i.e., fixed β). The multinomial logit is the opposite, allowing varying coefficients on alternative-invariant x's. These models have the unfortunate "independence of irrelevant alternatives" (IIA) assumption, which prevents an added outcome option from affecting the relative probabilities of choosing any two formerly existing options. One can see that this may be a bad idea when thinking of labor options – for example, if "overtime" becomes a new option, do we think it will draw people away from "no work" and "full-time work" at the same rate? It must, for IIA to hold. Otherwise, the estimates will be inconsistent.

Technical Contribution of this Paper: This paper begins with a conditional logit model and then extends it BEYOND a multinomial logit to a "random parameters model" or "random coefficient model". Under this scenario, the coefficient can vary AND the variable(s) need not be alternative-invariant. In other words, it takes the conditional logit and simply adds individual-varying βi (= β + μi). The random piece, μi, is a person-level non-observable effect, such as taste. The random parameters model has the effect of inducing correlation across alternatives (i.e., permitting the utilities of each alternative to be correlated).

Benefit: we can drop the IIA assumption, which is good news!
Cost: we need to specify the distribution of βi – or equivalently, the distribution of μi. A typical distributional choice, and the one I think they used here, is μi ~ N(0, W). If the variance W is estimated to be near zero, then we have evidence that the varying βi's are unneeded.
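For reference, the choice probabilities in generic notation (my shorthand; not necessarily the notation or equation numbering in the paper):

  conditional logit:    P(person i chooses j) = exp(β'xij) / Σk exp(β'xik)
  random coefficients:  P(person i chooses j) = ∫ [ exp(βi'xij) / Σk exp(βi'xik) ] f(βi) dβi,  with βi = β + μi

The second line is just the logit formula averaged over the assumed density f(βi), which is the interpretation quoted below.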


See Equation 6 in the paper (page 6). The formula expresses the probability of choosing alternative k, and can be interpreted as "a weighted average of the logit formula evaluated at different values of βi with weights given by the density f(βi)."

The authors also contribute a third specification, which is a nonparametric random coefficients method allowing an arbitrary discrete probability distribution with a small number of mass points. "Mass points, and their probabilities, are jointly estimated with the parameters of the model using maximum likelihood." (p. 7)

Identification issues?
- only m-1 mass points and m-1 probabilities can be freely estimated – one must be derived via the assumptions of the model.
- the unobserved heterogeneity must be independent of the x's

Downsides to random coefficients models: These two random parameters models are really hard to estimate, with difficulties w/ convergence and robustness. How does he make it easier? Just choose one variable (HH income) for which βi can vary.

Empirical Analysis: The basic model is:
Uij(xij) = x'ij A xij + β'xij + εij

where x = log(income), log(husband's leisure time), log(wife's leisure time). The matrix A contains the coefficients on the nonlinear terms (quadratics and interactions), and β is the vector of coefficients (of length 3) for the linear terms. The specification of the error term (i.e., whether we have a flexible βi) distinguishes the models. There are 13 discrete household alternatives – see Table 1 from the old version of the paper. The data are a cross-section from Germany.

Results: The coefficients are not very useful in these models, particularly given all the interaction terms. What do we look at instead? (1) Estimated heterogeneity and (2) estimated elasticities.

(1) Estimated heterogeneity – suggests heterogeneity does exist
    parametric: bottom of Table 2, middle column, "Var(income)"
    nonparametric: bottom of Table 2, right-hand column, "c1"
(2) Estimated elasticities – suggest heterogeneity does not matter
    Table 3 shows the elasticity of labor force participation (top panel) and hours of work (bottom panel) in response to a 1% change in wage.

Conclusion: "For computational reasons the standard discrete choice model, attractive for its simple structure, provides an adequate model choice for the analysis of labour supply functions" (despite its inconsistent estimates).


Sarah Hamersma 4/11/05

“Estimating the Demand for Prescription Drug Benefits by Medicare Beneficiaries” by Anne E. Hall

The Point: The goal of this paper is to understand the demand for prescription drug benefits among Medicare beneficiaries. It uses Berry's (1994) estimation strategy for a discrete choice model of product differentiation. By looking at differences in prescription drug plans, Hall estimates the value of various plan features, such as covering brand-name drugs or changing co-pays. She estimates premium elasticities, beneficiaries' willingness-to-pay for various benefits, and HMO marginal costs for providing those benefits. This ability to perform "policy experiments" is a great strength of this approach to economic modeling (with the downside that there are assumptions along the way – see the last paragraph of the Berry paper – that are more restrictive than some other empirical approaches).

Background: The Medicare program, which covers the elderly, provides limited health insurance coverage. It does not currently include a prescription drug benefit. However, people can choose to join a Medicare HMO that may provide some additional services, including various levels of prescription drug coverage (for which they may pay a premium). The HMO is paid a flat per-patient rate by the government based on a few demographic characteristics.

The Model: Note the key thing here: we start by specifying a utility function.
Uijmt = αpjmt + Xjmtβ + ξjmt + ζigmt + (1 – σ)εijmt
p = premium paid for plan j in county m at time t
X = characteristics of plan j in county m at time t
ξ = unobserved quality of plan j
ε = indiv-specific shock
ζ = indiv-specific shock within the same group (HMO w/ drug benefit, HMO w/o, traditional Medicare) – this is one level "up" from the indiv-specific shock, which differs at the plan level
σ = the correlation of within-group utility

Assume that εijmt is extreme-value distributed and that the whole random component of utility, ζigmt + (1 – σ)εijmt, is also distributed extreme-value.


[For regular logit, we assume σ and ζigmt are zero. Nested logit allows us to take the selection of a group (or "nest") into account.]

Berry (1994) shows that this model can be estimated in a regression framework (all variables have "mt" subscripts):
ln(sj) – ln(s0) = αpj + Xjβ + σln(sj|g) + ξj
sj = plan j's total market share of demand
s0 = market share of the outside option (traditional Medicare)
sj|g = plan j's share within its nest
ξj = unobserved plan quality (= error term)

Note that this regression is not at the individual level, but at the plan level. This is Berry's result: every vector of observed market shares can be explained by one and only one vector of utility means. Pretty cool, because data are available on the things we need for this. We can look at how a plan's features (on the RHS) affect demand for the plan relative to the outside option.

Main areas of differentiation among plans: (see Table 2a)

- brand name vs. generic coverage (none, partial, or total, w/ various combinations)
- copayment for prescriptions (what the beneficiary must pay)
- formulary (list of drugs available at a discount to the HMO)

Estimation of Demand: Hall estimates the equation above.

Terminology issues: she refers to estimating "a logit" – this is actually what Wooldridge calls a "conditional logit" or "multinomial logit". It has to do with how the conditional probabilities of each choice are modeled/specified. It turns out that estimating Berry's equation using OLS is a way of estimating this model. Estimating the non-nested version means imposing σ = 0 (i.e., leaving the nest share out of the regression).

Concern: the error term is likely to be correlated with both the premium (p) and ln(nest share). Solution (a rough sketch of the estimation follows the two steps below):

(a) first use plan-county fixed effects to take out mean quality, leaving time-specific deviation in plan quality as the error term. This likely has fewer problems.

(b) Since the change in a plan over time may still be correlated w/ p and ln(nest share), use instruments for premiums and nest share.
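A rough Stata sketch of this two-part fix, with hypothetical variable and instrument names (the paper's actual implementation differs in its details):

  gen y      = ln(share) - ln(share0)            // ln(sj) - ln(s0)
  gen lnnest = ln(share_nest)                    // ln(sj|g)
  regress y premium x1 x2 lnnest                 // OLS version; premium and nest share likely endogenous
  xtset plancounty year
  xtivreg y x1 x2 (premium lnnest = z1 z2), fe   // plan-county fixed effects plus instruments

The fixed effects absorb mean plan quality, and z1, z2 stand in for whatever instruments are used for the premium and the nest share.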

Results: There are three main sets of results for the paper.


Demand Parameter Estimates: Table 5 (talk about columns 1-4) and Table 6 (elasticity version). Note that since the parameters we have estimated are the actual parameters of the utility function, we can now examine how welfare is affected by changes in several variables.

Benefits: Tables 7 and 8 (focus only on the discrete addition of a prescription drug benefit)
Costs: Tables 9 and 10 (again, focus on adding the drug benefit)

Conclusion: There is a strange asymmetry between people's willingness to give up coverage (only need to pay them $20/month) and their use of coverage (on average, $146/month). WHY?? Two possible explanations:

(1) Demand for drug benefits is heterogeneous, causing adverse selection into Medicare HMOs. The $20 figure was averaged across all beneficiaries in the county, but costs are actually paid only by those who enroll in plans offering drug benefits. Willingness to pay and use of coverage might line up better if we focus on those enrollees. There could still be positive welfare effects.

(2) Regulation mixed with moral hazard could cause the discrepancy. HMOs had to justify the size of their flat-rate reimbursements (otherwise they'd have to return the money to the gov't), so they may be willing to provide extra services at a cost larger than their value to beneficiaries. Moral hazard adds to this – once beneficiaries have paid little/nothing for an extra service, they then use the coverage in large amounts once it is available.


Sarah Hamersma 2/8/05 Choosing Among Alternative Nonexperimental Methods for Estimating the Impact

of Social Programs: The Case of Manpower Training

By James Heckman and V. Joseph Hotz

The Point: Heckman and Hotz are concerned that the literature on experimental vs. nonexperimental estimation has been too pessimistic about nonexperimental methods. Specifically, they argue that those nonexperimental methods that operate well for a given question can be distinguished from those that don't, so that the inference from the estimates is more likely to be correct than LaLonde (1986) would suggest. They provide several specification tests that can be used to determine which estimates to disregard, and they show that these tests eliminate the estimates that are furthest from the correct result.

Key issues: There are disadvantages to experiments as well (people not showing up for treatment, people participating in alternative similar treatments, lack of political popularity, cost).

There are two unstated premises that underlie the criticism of nonexp. methods:
1) alternative nonexperimental estimators should produce approximately the same program estimates

• In fact, nonexperimental estimators differ in the assumptions they invoke to justify various statistical adjustments, so as long as there are any systematic differences between the treatment and comparison groups we should not be surprised when different methods bring about different results (they will only all match if all of the assumptions of every method hold).

2) there is no objective way to choose among alternative nonexperimental estimators

• This is only true if we have no way to test our specifications; this paper presents specification tests that can be widely used to rule out certain estimators.

The Setup: Consider the same model as before:
Yit = Xitβ + diαt + Uit      (Assume E(Uit | Xi) = 0 for all i and t)
Our concern: if there is selection bias, then E(Uit | di, Xi) ≠ 0


The selection process can be generically described as:
INi = Ziγ + Vi      where di = 1 if INi > 0 and di = 0 otherwise
The Z's can include all X's. Alternative estimators augment the earnings function and selection rule with additional assumptions to undo the dependence between Uit and di. Two cases:

1) Selection on Observables
If the dependence is due to Z, then we have the case where:
E(Uit | di, Xi) ≠ 0 and E(Uit | di, Xi, Zi) ≠ 0 BUT E(Uit | di, Xi, Zi) = E(Uit | Xi, Zi)
So once both Z and X are controlled for, treatment is no longer correlated with the error. The simplest of these estimators is the linear control function estimator – just include the union of X and Z in the structural regression and you can get a consistent estimate of αt. You can also interact d with all of the X's and Z's to allow the training impact to depend on personal characteristics.

2) Selection on Unobservables
If selection depends on unobservables, then even with Z's we still have:
E(Uit | di, Xi, Zi) ≠ E(Uit | Xi, Zi)
Two potential types of estimators:

a) fixed-effect estimator
Assume that the regression error is of the form: Uit = φ1i + υit (there is a mean-zero person-level time-invariant effect and a zero-mean random component independent of all other values of υit' and of φ1i). Selection is assumed to occur on the permanent component, φ1i. Then we find that:
E(Uit - Uit' | di, Xi) = 0      (where t' is pre and t is post)
So if we use first differences, the dependence between the treatment and error will disappear:
Yit - Yit' = diαt + (Xit - Xit')β + (υit - υit')
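In Stata, with just one pre and one post period and hypothetical variable names, the first-difference version of this estimator is simply:

  gen dy = y_post - y_pre
  gen dx = x_post - x_pre
  regress dy d dx        // under the fixed-effect assumption, the coefficient on d estimates the program effect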


b) random growth estimator
Assume the regression error is of the form: Uit = φ1i + t·φ2i + υit
Each person has a time-invariant component and a person-specific rate of growth. Selection is assumed to occur on both components. This estimator requires two periods of "pre-treatment" data so that we can net out the person-specific growth rate. If you have that available, the estimator looks much like fixed-effects but with more pieces (see equation 3.15 in the paper).

Three Specification Tests:
1) Pre-program Test
Using only data prior to the treatment, check if being in the (future) treatment group appears to have any effect (it shouldn't, since no one has been treated yet).
2) Post-program Test
This one requires the experiment's control group. Using only post-treatment data, define the control group as "treated" and the comparison group as "untreated" – there should be no effect of treatment (if there is, it means the control group and comparison group aren't sufficiently similar for the comparison group to be used in estimation).
3) Test of Model Restrictions
This works for fixed-effects or random-growth estimators if you have more periods of data than you need (so if you only have 2, you can't use this). Values of the outcome (Y) in periods other than those being used in the estimates should not have any explanatory power in the regressions, so plug them in as regressors (once using only pre-program data, and once using only post-program data, which requires the control group again).

See Tables 3 & 4 for estimates with each method. See Table 5 and Table 8 for test results for the two groups (youth and AFDC women).

For youths: Both the linear control function and fixed-effects estimators are eliminated. The random growth estimator looks a bit better (note that we wouldn't usually be able to see the tests to the right of the line on the table, because they depend on having a control group). Note from Table 3 that this eliminates the worst estimates – the remaining ones all provide the inference that there was not a statistically distinguishable effect on earnings, which is also the inference drawn (although much more precisely) from the experimental estimates.

For AFDC women: Both the fixed-effects and random-growth models are eliminated based on the model-restriction tests. The linear control function looks a bit better. Note from Table 4 that the weighted average of the linear control function estimates looks good relative to the experimental estimates. Both indicate a strong, statistically significant positive effect of the NSW program on earnings.

Conclusion: Maybe our nonexperimental estimates aren't so bad after all….
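As an aside, the pre-program test is easy to run; a minimal Stata sketch with hypothetical variable names:

  * using only pre-program data, future treatment status should have no "effect" on earnings
  regress y_pre d x1 x2
  test d          // failing to reject d = 0 is a point in favor of the chosen comparison group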


Prof. Sarah Hamersma

Notes: Crime and Young Men: The Role of Arrest, Criminal Experience and Heterogeneity by Susumu Imai, Hajime Katayama, and Kala Krishna

Goal of Paper: identify the arrest effect and the criminal experience effect on future crime

Three aspects of the "arrest effect":

1) perceived probability of future arrests increases -> deters crime
2) stigma makes it hard to get legitimate work -> encourages crime
3) training in prison develops criminal skills -> encourages crime

Two aspects of the “crime effect”:

1) learning by doing -> encourages crime
2) perceived probability of arrest decreases w/ increased undiscovered crimes -> encourages crime

Approach: It is a "mixture model" using ordered probit.

1) Incorporate random effects on the constant term to allow unobserved heterogeneity
2) Allow slope coefficients to differ across two "unobserved types"
3) The ordered probit allows for the fact that most people don't commit any crimes (back to this later)

Note: "types" are not assigned, or even directly defined, but they are endogenously inferred via the model – "the behavioral parameters for the two types as well as the posterior probability (given characteristics) that an individual is of one type or of the other should be determined so as to best replicate the data."

Findings:
criminal experience effect – positive coefficient for all (both types, both before and after age 18) and statistically significant for all except criminal types after age 18
arrest effect – after age 18, the arrest effect is positive for the non-criminal type and negative for the criminal type

Model Specification: ordered probit with individual random effects (recall: random effects are assumed to be uncorrelated with observables) (write equation on the board)
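A generic version of the kind of equation being described (my notation, not copied from the paper): the latent crime propensity is

  y*_it = x_it'β + b_i + ε_it      (b_i = individual random effect)

and the observed crime category is j whenever τ_(j-1) < y*_it ≤ τ_j, where the thresholds τ are estimated along with the other parameters.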


key explanatory variables: past arrest records and past crimes committed, both coded as ln(x+1) (issues here? the coding assumes a logarithmic relationship, so it has to adjust for the zeros, of which there are many)
outcomes: 5 categories of crime participation (note that the thresholds of these categories will be estimated in the model)

The authors use random effects. Q: Why? A: Despite its less appealing assumption, because they do not know of any way to estimate a mixed model like theirs with fixed effects.

The authors want to include "lagged criminal experience", but they know initial criminal experience is likely to be correlated with the random intercept, causing lagged criminal experience to have a biased coefficient. They solve this following Wooldridge (2000), who breaks down the random intercept into pieces:

  bi = θ0 + θ1·CEi0 + θ2·avg(WEi) + θ3·avg(HEi) + xi'θ4 + ηi      (avg(·) = person-level average over time)

This is substituted in for bi in the original equation. By including a bunch of person-level averages, and "initial period criminal experience", the authors are trying to lay out the part of bi that they fear is correlated with the lagged criminal experience. The remainder, which is ηi, is the NEW random intercept, assumed to be purged of any relationship with initial criminal experience.

The final equation has a regular error term and a random-effect term; we also want to allow all coefficients to vary according to the "type" (which we want our model to also uncover). This is hard to estimate! One can use Simulated ML or Simulated MM – they use something a bit different (Markov Chain Monte Carlo, or MCMC).

Data
National Youth Survey (NYS)
- individuals aged 11-17 in the first year of the survey, 1976
- surveyed in 76, 77, 78, 79, 80, 83, 86
Sample size: there are 918 men in the sample, but only 612 have complete arrest and crime data
* Each survey year includes the # of crimes that year; for in-between years respondents answer only in categories (hence the ordered probit).
* Total arrests to date are given in 1980 and 1983, and arrests over the last 2 years are given in 1983 and 1986. They spread the incremental arrests evenly over the intervening years.


Did everyone in the sample commit a crime? It is hard to tell at first, but I have concluded that the answer is NO. There are many non-criminals in the data set. Moreover, the average amount of crime is driven almost completely by the top 3% of criminals. See Figure 1 vs. Figure 2. What fraction of the sample ever committed a crime? Can't tell – it is not reported! Figure 3 gives us a yearly participation rate, which I find hard to believe is over 35% for 11-year-olds.

Results
Non-criminal type: crime rate increases with age up to age 18 and falls thereafter
Criminal type: crime rate continues to rise, even after age 18 – however, the criminal type becomes better at avoiding arrest

Two "spec tests"

1) try without allowing “types”. The results are in between the two earlier results – but this is by construction! I think the argument for using types requires a stronger basis.

2) try choosing “types” based on total crime and then running estimation separately for each group – results are not strong (nothing significant) and authors argue that this is because crime is endogenous and this method fails to pick that up.


Sarah Hamersma 2/3/05

Evaluating the Econometric Evaluations of Training Programs with Experimental Data

by Robert J. LaLonde

The Point: (mostly the abstract directly from the paper) This paper compares (1) the effect on trainee earnings of an employment program that was run as a field experiment, where participants were randomly assigned to treatment and control groups, with (2) the estimates that would have been produced by an econometrician. This comparison shows that many of the econometric procedures do not replicate the experimentally determined results, and it suggests that researchers should be aware of the potential for specification errors in other nonexperimental evaluations.

This paper was very disturbing and launched a whole series of papers examining the LaLonde sample and, more generally, using experimental data to establish whether certain nonexperimental estimators can be trusted. We'll read a few of them.

The experiment: Eligible workers were randomly assigned the "treatment" of a job training/temporary employment program. Table 1 of the paper shows that this randomization was done well. LaLonde focuses on the "AFDC sample" (women on welfare) and the "Male sample."

Missing info that would be helpful: what are the eligibility standards, exactly? This would help us choose among comparison groups. From elsewhere, I found the standards – four target groups of people with longstanding employment problems: ex-offenders, former drug addicts, women who are long-term recipients of welfare benefits, and school dropouts, many with criminal records. Women receiving AFDC were eligible to participate in the program only if they had been on welfare for 30 of the previous 36 months, had no children under the age of six, were unemployed, and had little or no work experience.

A slight diversion – standard deviations and standard errors: Notice that Table 1 contains (for some entries) both the standard deviation and the standard error. What's the difference in interpretation of these two?
* Standard deviation: a feature of the data; measures spread from the sample mean
* Standard error: an estimate of the precision of our estimate of the population mean (where that estimate is the sample mean). If we have a lot of data, we have a lot of precision, so we can be fairly confident that our sample mean is close to the population mean. If we don't have much data, this won't be the case – we'll have big standard errors. [The sample mean is just an estimator of the population mean, and remember, "an estimator is a random variable; it has a distribution"]


Recall the formula: standard error = sqrt((std. deviation)^2 / sample size), i.e., the standard deviation divided by the square root of the sample size (e.g., with a standard deviation of 10 and n = 100, the standard error of the sample mean is 1). On HW 1, some of you did not match your DD std errors to the regression version.

The Procedure: First step: select a comparison group whose earnings can be compared to the treatment group. (We will try to do some econometrics to sift out other differences as needed.) Notice that the smaller the comparison group (as he makes them more and more like the treatment group), the closer the earnings outcomes. But none of his groups (for the AFDC analysis) condition on the specific things that made people eligible for the program in the first place (there are too few people in the survey data), so right away we might expect things to be problematic.

LaLonde presents a few different methodologies to try when attempting to replicate the experimental results with comparison groups drawn from alternative data. He notes that if he uses the comparison group provided by the experiment, all of these methods replicate the correct answer (this is important – if they didn't, we'd really have to question a particular method). The basis for the econometric models is that they are variants on the following:
1) yit = δDi + βXit + bi + nt + εit
2) εit - ρεit-1 = υit
3) dis = αyis + γZis + ηis
4) Di = 1 if dis > 0; else Di = 0

One-step estimators:
Column 4: compare the outcomes only of the trt and comp groups to get a measure of the effect of the program (this assumes that the two groups are so similar that no controls are needed – equivalent to estimating equation 1 with just a constant term and the D indicator)
Column 5: compare outcomes while controlling for some Xs (a little better) [spec test: should check if regression-adjusted pre-treatment earnings are the same – i.e., D should have a coefficient of 0 in that test]
Column 6: fixed effects (diff-in-diff with panel data) (this is the first one that starts sounding reasonable to me – allows individual fixed effects in earnings and allows that fixed effect to influence participation through equation 3)


Column 7: fixed effects with an age variable included (in level – the change has cancelled out)
Column 8: we might be concerned that the earnings trajectory would not be the same for treatment and comparison (often those treated will have recently had negative shocks). Control directly for pre-program earnings in the outcome equation, to allow pre-program earnings to affect earnings directly. (???)
Column 9: add demographics
Column 10: do the outcome regression with all observables possible, including pre-training earnings, but leaving out AFDC
Column 11: include AFDC

How do the results look? Terrible!!! At least for the AFDC women; the men don't look much better either (Tables 4 and 5). Even if you could convince yourself that 4-7 are no good, that still leaves 8-11, and there is no clear way to see how you would choose among them.

Two-step estimates
Recall the initial equation: yit = δDi + βXit + εit
We may be concerned that εit is correlated with Di (i.e., there is selection into treatment). We may more specifically be concerned that this selection may be based on unobservables (the errors in the participation and outcome equations are correlated). Take expectations to sort this out (this will describe the Heckman procedure):
E(yit) = δDi + βXit + E(εit | Di, Xit)
We're concerned that this last term is not zero, specifically because Di depends on the participation-equation error term, which we think may be correlated with εit. (The same unobservable things that affect your decision to participate will also affect your wages.) Add and subtract a predicted value for this expectation (call it Ehat), and rewrite as:
E(yit) = δDi + βXit + Ehat + [E(εit | Di, Xit) - Ehat]


The piece in brackets can be thought of as a new error term. As long as we have some unbiased estimate of the expectation, this thing will be mean zero – we can run OLS! The only thing left is that we need to generate that estimate, Ehat, of E(εit | Di, Xit). Heckman found that if one assumed the errors in the earnings and participation equations were jointly normally distributed, this conditional expectation is proportional to the conditional expectation of the error in the participation equation. The procedure to estimate it is to run the first stage (participation equation) with a probit, plug the predicted values into equation (6) in LaLonde (the Mills ratio), and then include that term in an OLS regression of yit on Di and Xit.

Note that you should have some variables in the participation equation that are not in the earnings equation, because otherwise the training effect is identified only by the normality assumption (you just have the same variables coming through in two functional forms). This is called an "exclusion restriction".

Table 6 shows this type of estimation – the estimates are better, though still quite imperfect. Plus, one must make an argument for any exclusion restrictions used.

Conclusion: "…this evidence suggests that policymakers should be aware that the available nonexperimental evaluations of employment and training programs may contain large and unknown biases resulting from specification errors."

Limitations:

- only certain methods were tried, based on data availability (could not control for much past employment history, which has been shown to be important; could not select a narrowly defined comparison group)
- since this study, it has been clearly shown that there are two characteristics that should define a good comparison group: (1) they should be from the same geographic area as the treatment sample, and (2) the data for both groups should be from the same source. These reduce a lot of the biases that LaLonde found in his study.

Note footnote 25 – he refers to upcoming JTPA experiments, which did then happen and have been the basis for many more studies like this one.
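For reference, a minimal two-step sketch of the control-function procedure described above, with hypothetical variable names (z1 and z2 stand in for the excluded variables; this is my shorthand, not LaLonde's equation (6) itself):

  probit d z1 z2 x1
  predict xb, xb
  gen lambda = cond(d==1, normalden(xb)/normal(xb), -normalden(xb)/(1-normal(xb)))   // Mills-ratio-type term
  regress y d x1 lambda
  * caveat: the second-stage standard errors ignore the fact that lambda is estimated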


Sarah Hamersma 3/28/05

“A Contribution to the Empirics of Economic Growth”

by Mankiw, Romer (David)*, and Weil
* note: Paul Romer is a different economist, on the other side of the debate

The Point: This paper questions the rejection of the Solow model of economic growth, which takes rates of saving and population growth as given and shows that they determine the steady-state level of income per capita. It has a couple of key implications: higher rates of saving lead to richer countries and higher rates of population growth lead to poorer countries. While the signs of these effects appear to be correct in the data, the model does not correctly predict their magnitudes. This paper adds a new feature to the Solow model (human capital) and argues that this modification is sufficient to match the data.

Our Reason for Studying this Paper: This paper is a nice example of a theoretical economic model that is directly estimable. Most of our discussions thus far have not been this "clean" in terms of integrating models and empirical work. The idea here is to present a specific model and see if it fits the data well. The assumptions are provided along the way – as you would guess, some of the modeling assumptions are likely made with empirical simplicity in mind.

Outline of Paper:

1) Review Solow model
2) Add Human-Capital Accumulation to Solow model
3) Endogenous Growth and Convergence
4) Interest Rate Differentials and Capital Movements
(we didn't do the last two in class)

1) Review Solow Model

The basic model is:
Y(t) = K(t)^α (A(t)L(t))^(1-α),      0 < α < 1
Y is output, K capital, L labor, A the level of technology. L and A grow exogenously at rates n and g:
L(t) = L(0)e^(nt)
A(t) = A(0)e^(gt)

Assumptions: Cobb-Douglas functional form, exogenous growth rates, constant fraction of output, s, is invested.


Steady-state k (= K/AL) is: k* = [s/(n+g+δ)]^(1/(1-α))

Substitute k* in for K/AL and take logs to get the central predictions of the Solow model:
ln[Y(t)/L(t)] = ln A(0) + gt + (α/(1-α)) ln(s) - (α/(1-α)) ln(n+g+δ)   ***

Predictions:
1) positive coefficient on ln(s) and negative coefficient on ln(n+g+δ)
2) the two coefficients should be of the same absolute magnitude
3) based on the empirical regularity α = 1/3, they should both have an absolute value of about .5

Assumptions: factors are paid their marginal products.

Specifications: This isn't quite a regression yet. But now assume that A(0) may differ by country according to a country-specific shock (linear in logs): ln A(0) = a + ε
Assumption: s and n are independent of ε (and g and δ are constant across countries). Starting at time 0 for simplicity (so gt = 0 and the term drops out), we then have the model:
ln[Y/L] = a + (α/(1-α)) ln(s) - (α/(1-α)) ln(n+g+δ) + ε
Now we can directly estimate the model with OLS.

Note footnote 1: if s and n are endogenous (i.e., influenced by the level of income or correlated with the error, which is a country-specific shock), then OLS will be inconsistent. We'd need an IV – but we don't have one. The authors give 3 reasons for their exogeneity assumption. They note at the end of the third one that: "If OLS yields coefficients that are substantially different from the prediction, then we can reject the joint hypothesis that the Solow model and our identifying assumption are correct." They can't both be correct, although one could be correct and the other could be responsible for the difference between the estimates and the prediction. This is important. It also means that if the coefficients are NOT substantially different, we won't know for sure if this is because both the model and regression are correct or if the two assumptions are incorrect but cancel each other out somehow.


Estimation: The authors estimate the equation via OLS for 3 different subsets of countries. They do an unrestricted version and a restricted version (with the coefficients required to be of the same absolute value and opposite sign).

Results: Signs are right, relative magnitudes are roughly equal, and the model has explanatory power.
Problem: magnitudes are wrong (much too large at 1.5-2.0). The implied α = .59.
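A rough Stata sketch of the unrestricted and restricted versions (variable names are hypothetical):

  gen lny   = ln(y_per_workingage)
  gen lns   = ln(inv_share)
  gen lnngd = ln(n + 0.05)           // MRW take g + delta = 0.05
  regress lny lns lnngd              // unrestricted
  gen lndiff = lns - lnngd
  regress lny lndiff                 // restricted: equal magnitude, opposite sign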

2) Add Human-Capital Accumulation to Solow model

The innovation of this paper: allow for a separate "human capital stock"
Y(t) = K(t)^α H(t)^β (A(t)L(t))^(1-α-β)

Assumptions: same production function applies to human capital, physical capital, and consumption. HC depreciates at the same rate as physical capital. There are decreasing returns to capital, i.e. α + β < 1. New model:

ln[Y/L] = ln(A(0)) + gt + (α/(1-α-β)) ln(sk) + (β/(1-α-β)) ln(sh) - ((α+β)/(1-α-β)) ln(n+g+δ) + ε

Note: sh represents the fraction of all income invested in human capital.

Hypotheses: α should still be 1/3. What about β? The authors argue for a prediction of 1/3 to 1/2 based on the portion of labor income that "represents" the return to human capital.

Econometric notes:
1) if sh was a missing variable in the first OLS equation, then we shouldn't be surprised that our coefficient on sk was too big.
2) if sh was a missing variable in the first OLS equation, then we shouldn't be surprised that the coefficient on ln(n+g+δ) was often bigger than that on sk.

They also write the equation in an alternative way, with the level of human capital (rather than the share of human capital):
ln[Y/L] = ln(A(0)) + gt + (α/(1-α)) ln(sk) + (β/(1-α)) ln(h*) - (α/(1-α)) ln(n+g+δ) + ε
This highlights more clearly the issue of omitted variable bias in the basic Solow model – it suggests that we can expect the coefficients on capital (savings) and population growth to follow the model's predictions IF we include the omitted variable of the human capital level.


So there are two potential tests of the model – a key is finding data on h* or sh.

Data: Focus on human capital as education. Note the importance of foregone earnings. Proxy for the rate of human capital accumulation (sh): % of working-age people in secondary school.

What are we looking for? α should still be 1/3. β should be (according to their argument) 1/3-1/2. The three coefficients should still sum to zero. Results (Table 2) look very supportive.

(There is an additional section on endogenous growth and convergence that we didn't cover in class, so I am not including notes on it here)

Conclusion: Augmenting the Solow growth model in a simple way, and the corresponding estimation, seems to make it match the data much better. But next time we'll talk about some important concerns with this paper.


Sarah Hamersma 2/9/05

Measuring Treatment Effects Non/Semi-Parametrically: Intro to the Method of Matching

Background: Consider the ATT parameter that we want to estimate:
δ = E(Y1i – Y0i | Xi, Di = 1)
and recall that the hard part to estimate is: E(Y0i | Xi, Di = 1)
We have discussed several ways to try to estimate this – different types of regressions, often fixed effects or instrumental variables – all of which involved some comparison with an untreated group combined with some statistical assumption about their relationship to the treated.

Matching: There are some concerns with the regression approaches. Two of these can be addressed by an alternative to regression called matching. They are:

1) linearity
2) common support (I'll explain later)

To keep it simple for now, we will just concern ourselves with the case where we assume there is selection on observables – i.e., for two workers with a given X (one treated and one not), we don't think there is any difference between the comparison person's outcome and the outcome the treated person would have had in the absence of treatment. Formally, we are maintaining the assumption:
E(Y0i | Xi, Di = 1) = E(Y0i | Xi, Di = 0)
The part on the left is the thing we need, and the thing on the right is the thing we can estimate with our data.

With OLS, we assumed the variable Y was determined in a linear way by the X's and the treatment; we just put the data for both groups together and run the regression to get an estimate of the average treatment effect. Here's some sample data – the person number is "id", the outcome is "y" and the treatment is "treat". Suppose we just need to control for level of education – "ed".


id   ed    y   treat
 1    1   10     0
 2    1   15     1
 3    1   20     1
 4    2   25     0
 5    2   30     1
 6    2   30     0
 7    3   25     1
 8    3   35     1
 9    4   50     0
10    5   55     0

If we estimate the treatment effect with OLS we will get:

. regress y ed treat

      Source |       SS       df       MS              Number of obs =      10
-------------+------------------------------           F(  2,     7) =   25.64
       Model |  1603.58108     2  801.790541           Prob > F      =  0.0006
    Residual |  218.918919     7  31.2741313           R-squared     =  0.8799
-------------+------------------------------           Adj R-squared =  0.8456
       Total |     1822.50      9     202.50           Root MSE      =  5.5923

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          ed |    9.72973   1.453656     6.69   0.000     6.292379    13.16708
       treat |  -1.216216   3.723177    -0.33   0.753    -10.02013    7.587699
       _cons |   6.756757   4.777202     1.41   0.200    -4.539532    18.05305
------------------------------------------------------------------------------

Looks like treatment does nothing. We will look back at these results momentarily.

Matching came about in an effort to estimate essentially the same ATT parameter without using the OLS methodology or linearity assumptions. The simplest form of matching can be thought of in the following way (propensity score matching is a variation on this):

1) Using the full sample of treatments and comparison group members, divide up the sample into cells according to their X’s (for instance, if the X’s are gender and race, the cells will be white females, white males, black females, etc.). [Note that to implement this with even a few X’s, you need a big sample].

2) Within each cell, the X’s are exactly the same, so the only difference is that some are treated and some are not. Note that since we assumed systematic selection was on observables, we believe that for any treated person in a given cell, the untreated people in that cell are a good proxy for what would have happened to them in the absence of treatment.


3) For each treated person, consider their “counterfactual” to be the average of the untreated people in their cell. Then define that person’s personal “treatment effect” as the difference between their actual outcome (with treatment) and their estimated counterfactual.

4) Average these across all of the treated people to get an estimate of the average effect of treatment on the treated.
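To see where the numbers in the estimate below come from, work cell by cell in the data above:

  ed = 1 cell: the untreated mean is 10, so the two treated effects are 15 - 10 = 5 and 20 - 10 = 10
  ed = 2 cell: the untreated mean is (25 + 30)/2 = 27.5, so the one treated effect is 30 - 27.5 = 2.5
  ed = 3, 4, 5: no cell contains both treated and untreated people, so these observations are not used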

The matching estimator for the data above will be: (5 + 10 + 2.50)/3 = 5.8

This is how the matching estimator works. Isn't this intuitive?

Comparison to Regression:

Notice that we didn't use any of the data on people with ed = 3, 4, or 5.
Q: Why not?
A: Without both treated and untreated people at those levels of ed, they can't provide us any information about the effect of treatment in that range. This is what is called the "common support" issue. We can only measure the effect of treatment over regions of X where there are both treatment and comparison observations. This means that when we report our estimate, we need to be clear that we are estimating the effect of treatment on people in the ed = 1-2 range.

How does the common support issue arise in regression? It is very sneaky. If you include ed linearly, the people with high levels of ed contribute to the estimate of the slope coefficient on ed, and therefore indirectly affect the estimate of the treatment effect. You can see now that the linearity assumption and the common support issue are tightly linked – it is because of the linearity assumption that we can estimate regressions beyond the region of common support.

What if we drop linearity from the regression, replacing it with dummies for each level of ed? Seems like a good idea (and it can't hurt). It turns out we still get different answers with the two methods because of the way that the data are weighted, but they are much closer. Look at the regression:

. xi: regress y i.ed treat
i.ed              _Ied_1-5            (naturally coded; _Ied_1 omitted)

      Source |       SS       df       MS              Number of obs =      10
-------------+------------------------------           F(  5,     4) =   16.70
       Model |  1739.16667     5  347.833333           Prob > F      =  0.0087
    Residual |  83.3333333     4  20.8333333           R-squared     =  0.9543
-------------+------------------------------           Adj R-squared =  0.8971
       Total |     1822.50     9     202.50            Root MSE      =  4.5644

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      _Ied_2 |         15   3.952847     3.79   0.019     4.025137    25.97486
      _Ied_3 |   13.33333   4.370037     3.05   0.038     1.200166     25.4665
      _Ied_4 |   38.33333   5.892557     6.51   0.003     21.97297    54.69369
      _Ied_5 |   43.33333   5.892557     7.35   0.002     26.97297    59.69369
       treat |          5   3.952847     1.26   0.275    -5.974863    15.97486
       _cons |   11.66667    3.72678     3.13   0.035     1.319467    22.01387
------------------------------------------------------------------------------

The estimated treatment effect is 5 (we'll ignore standard errors for now). Note the change from the linear version of the model (where the coefficient was negative and completely insignificant). Seems linearity was a pretty bad assumption. But even without linearity, why do we get 5 and not 5.8?

First note that the issue is not in the common support – now that the ed terms are dummies, the link from the higher-ed people to the treatment effect estimate is broken. The regression estimates, in this case, rely only on the ed = 1 & 2 people to identify the effect (though the regression doesn't indicate that anywhere), just as we do with matching. We can verify this by seeing that we get the same treatment effect with regression even if we throw the highly educated out of the sample.

. drop if ed >= 3
(4 observations deleted)

. xi: regress y i.ed treat
i.ed              _Ied_1-2            (naturally coded; _Ied_1 omitted)

      Source |       SS       df       MS              Number of obs =       6
-------------+------------------------------           F(  2,     3) =   13.50
       Model |      300.00     2     150.00            Prob > F      =  0.0316
    Residual |  33.3333333     3  11.1111111           R-squared     =  0.9000
-------------+------------------------------           Adj R-squared =  0.8333
       Total |  333.333333     5  66.6666667           Root MSE      =  3.3333

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      _Ied_2 |         15   2.886751     5.20   0.014     5.813069    24.18693
       treat |          5   2.886751     1.73   0.182    -4.186931    14.18693
       _cons |   11.66667   2.721655     4.29   0.023     3.005145    20.32819
------------------------------------------------------------------------------

The difference comes in the way each observation contributes to the estimates. In matching, each treated person has the same weight in the estimate. In OLS, even with dummy variables, the weighting of treated observations depends on the variance of treatment, rather than the number of treated, within their cell. Let δi be the average treatment effect in cell i. Following Angrist and Krueger, we can write the matching estimate as:

δ1*(2/3) + δ2*(1/3) = 7.5*(2/3) + 2.5*(1/3) = 5.8

while the regression estimate is:

δ1*(1/2) + δ2*(1/2) = 5
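For reference, the general version of this weighting result (stated here from memory of the Angrist-Krueger decomposition, written with cells indexed by x, δ_x the average effect in cell x, and the sums running over cells in the region of common support) is:

$$\hat{\delta}_{\text{matching}} = \frac{\sum_x \delta_x \, P(D=1 \mid X=x)\,P(X=x)}{\sum_x P(D=1 \mid X=x)\,P(X=x)}, \qquad \hat{\delta}_{\text{OLS}} = \frac{\sum_x \delta_x \, V_x \, P(X=x)}{\sum_x V_x \, P(X=x)}, \quad V_x \equiv P(D=1 \mid X=x)\,[1 - P(D=1 \mid X=x)].$$

In the example, P(D=1|X=1) = 2/3 and P(D=1|X=2) = 1/3, with equal cell sizes, which reproduces the matching weights (2/3, 1/3) and the equal OLS weights (1/2, 1/2).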


In the regression estimate, each cell gets the same weight because each has the same variance of treatment within the cell. In general, if there were other cells, those with large proportions of treated or untreated people would have low weights, and those with nearly equal proportions would have large weights. (This is a consequence of the least squares method itself – minimizing the squared error.) Matching, on the other hand, puts equal weight on treated individuals, which amounts to putting the most weight on cells with the most treated people.

Why does this matter? In cases where the effect of treatment varies across X, the matching estimate gives us a measure of the effect of treatment on those most likely to be treated. This may be the parameter of interest.

Limitations: Why doesn't everyone always just use this????? Isn't it the answer to all of our problems? Well, it does have its limitations. Matching breaks down very easily when your sample isn't large enough. Our exercise was actually a good example of this in some ways – some cells were too small to use (or empty). Assuming linearity "buys" you the opportunity to use all of the data you have rather than throwing some away – the "cost", of course, is the assumption. If the assumption is good, then you can get more precision out of OLS because you'll have much more data. [Standard errors for matching are ugly to compute; we'll talk about them when we discuss bootstrapping.]

In practice, matching is most easily implemented using the concept of propensity scores to avoid the "curse of dimensionality" created by the dozens of cells full of data that you usually need to do matching. Propensity scores are an index of a person's probability of being treated, conditional on their characteristics – P(Di = 1 | Xi). Instead of matching on a number of characteristics, we can match on a single index of those characteristics. We can treat Pi as if it were the single Xi in our example – differing at the person level, just like Xi – and we know matching on one variable isn't that hard because we just did it (though P is continuously distributed, which we will talk about later).

What's the catch? We must estimate P somehow. If we try to do it nonparametrically, our curse of dimensionality is back. So we typically estimate it with a parametric method – like a logit regression of Di on Xi. Don't despair, because there's some evidence that the specific functional form of P (logit, probit, semiparametric) is not that important to getting good matching estimates. So propensity score matching overall is still less parametric (and thus more flexible) than OLS. The real key is getting the right X's into the regression for P – there is no magic specification test, but we know one thing: since we are assuming selection on observables, we need everything that contributes to selection or outcomes to be incorporated into our estimate of P.

Fortunately, we can also modify matching to do diff-in-diff type estimation to deal with time-invariant individual fixed effects. We'll talk more about that next time.
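Before moving on, a purely illustrative Stata sketch of the propensity-score step, with made-up covariate names (the point is only that everything driving selection should be in the logit):

* estimate the propensity score with a logit of treatment on observables
logit treat age educ female black
predict pscore, pr
* eyeball the common support: compare the score distributions for treated and untreated
bysort treat: summarize pscore, detail
* the matching step itself (nearest neighbors, calipers, or strata on pscore) is
* automated by user-written routines such as psmatch2 (ssc install psmatch2)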


Sarah Hamersma 3/7/05

Intro to Measurement Error Issues

(based on Hausman (JEP 2001) and textbook info)

The Point: Sometimes the data we have do not do a perfect job of measuring the variable of interest, due to things like people's survey responses being imprecise. This may be true about variables on the LHS or RHS in the regression, but the implications of each are different. A number of ways have been proposed to deal with the problems created by measurement error.

Main issues: (see additional pages for the derivations we did in class)

1) Classical measurement error (in X) -- OLS will underestimate the true effect

- note that the coefficients on the other variables will be biased in an unknown direction
- note that if more than one variable has measurement error, we have a mess

2) Getting an upper bound on the correct coefficient in cases of classical measurement error

3) Measurement error in Y – results in unbiased coefficient, but loss of efficiency (big SEs)

If both (1) and (3) are true, the results are as expected (underestimated coeffs with large SEs).

IV as a Solution to Mismeasurement: Caution
The weak instruments problem can become ugly here, just as when IV is used for other sorts of endogeneity problems. Hahn and Hausman present a spec test – after estimating the model with 2SLS, do a 2SLS version of the reverse regression. Results should not differ much (otherwise the problem hasn't been solved).

Measurement Error in the LHS Variables: Probit and Logit
If we have a binary dependent variable w/ measurement error (i.e. some people are misclassified), the results will be biased and inconsistent. Hausman and others provide some suggestions.

Note: for the case where the LHS is mismeasured but continuous, we will have problems with quantile regression while OLS will remain unbiased.
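A quick simulation makes points (1) and (3) concrete; this is just a sketch with made-up parameters (true slope of 2, measurement error with the same variance as the regressor), assuming a version of Stata with rnormal():

clear
set obs 10000
set seed 12345
gen x  = rnormal()
gen y  = 1 + 2*x + rnormal()
gen xm = x + rnormal()          // classical measurement error in X
gen ym = y + 2*rnormal()        // measurement error in Y
regress y x                     // slope close to the true value of 2
regress y xm                    // attenuated toward 2*Var(x)/[Var(x)+Var(u)] = 1
regress ym x                    // slope still close to 2, but standard errors grow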


Sarah Hamersma 3/16/05

Workers’ Compensation and Injury Duration: Evidence from a Natural Experiment

by Meyer, Viscusi, and Durbin

The Point: When workers experience changes in the system of benefits available to those injured on the job, this may affect their (1) probability of injury, i.e. moral hazard, (2) willingness to report an injury, and (3) duration of the compensation claim. This paper focuses on the third issue (arguing that the other two are already well-studied). The authors use an increase in the maximum benefit level to try to examine whether workers tend to claim compensation for longer when they are better compensated. They conclude (using data on workers in Kentucky and Michigan) that a 10% increase in the maximum benefit is associated with a 3-4% increase in the duration of workers' compensation claims. This increase in duration does not appear to be correlated with any changes in injury severity or the composition of the workers receiving the maximum benefit.

Note: the first couple of paragraphs of this paper are really good – they are a nice model for clearly stating a) what the economic issue is and b) what you will be contributing.

The Identification Issue: It is hard to see how workers' compensation affects workers' duration of time out of work, because compensation is a direct function of previous earnings, which also affect people's decision to go back to work. The two cannot easily be disentangled. The key in this paper: if we look at two different formulas for benefits over time, we can separate the effects of the compensation from those of previous earnings. Because of the structure of the workers' compensation reforms, the authors are able to look at both people whose benefits were much higher under the new regime (high income) and people whose benefits did not change under the new regime (lower income). This results in a DD approach applied in the context of duration data. See Figure 1.

Data and Results: The authors have data from Kentucky and Michigan on workers' compensation claims and use their status up to 42 months after the initial claim to establish claim durations. Less than 0.5% of the sample is right-censored, and they assign a value of 42 to all of the censored observations. (It would be nice to know if these were evenly distributed pre- and post-reform, and across previous earnings levels.)


Note that they also had an issue with durations of length 0 (5% of their records), meaning a claim of less than half a week. They carefully tried a couple of different ways to deal with this and made sure it didn't matter (footnote 18).

Three methods of examining the issue:

1) DD regression
2) Compare CDFs of the duration distribution
3) A regression with ln(duration) as the dependent variable, which is a general form of a duration model
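Schematically, the DD regression in (1) and its ln(duration) version in (3) look like the following; these are hypothetical variable names, not the authors' exact specification. Here high marks the high-earner group whose maximum benefit rose, post marks claims filed after the increase, and the coefficient on the interaction is the DD estimate.

gen lndur = ln(duration)
gen highXpost = high*post
regress lndur high post highXpost, robust
* in (3), injury-type and demographic controls would be added on the right-hand side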

1) See Table 4. Their preferred statistic of interest is the change in the mean of ln(duration). They argue that this is a better measure than simply the change in the mean duration because it is less sensitive to outliers and can be measured with more precision given the skewed nature of the data. They find large, statistically significant effects of the policy change on ln(duration). They also demonstrate that this doesn't appear to be a function of actually getting more medical care (a proxy for severity of injury).

2) See Figures 2 and 3. They also report graphs and a table of the empirical CDF of claim durations in the data. (This is 1 – the survivor function.) We can see from these graphs that after the benefit increase, there was a fairly uniform move toward slightly longer durations. The table shows the statistical significance of the difference, week by week. They also use something called a Wilcoxon two-sample rank test to test semi-parametrically whether the duration distributions for high earners differ over time (they do). Those for low earners don't change. This supports the earlier finding.

3) See Table 6. The authors point out that their earlier methods did not control for other aspects of the observations, and in particular could not account for any compositional changes in terms of who decides to report injuries and claim benefits. They run a regression that is parallel to their earlier DD estimates, with the dependent variable ln(duration). This supports the earlier finding.

Important econometric note: This form of the regression is a broad way of estimating a duration model. In the absence of censoring and time-varying explanatory variables, this regression is a more general version of exponential, Weibull, and log-logistic hazard models. Each of those models has a specific assumed error structure, while the regression version allows it to be flexible.

Conclusion: The paper gets fairly tight bounds on the effect of increasing the maximum workers' compensation benefit on claim duration. They conclude that a 10% increase in the maximum benefit is associated with a 3-4% increase in the duration of workers' compensation claims.


Sarah Hamersma 3/22/05

Instrumental-Variable Estimation of Count Data Models: Applications to Models of Cigarette Smoking Behavior

By John Mullahy

The Point: The count data literature tends to make the assumption that the relevant variables are exogenous. In many economic applications, there are in fact endogeneity problems. There is a "natural" type of IV approach that one could use, but this turns out to be inconsistent because of the multiplicative relationship between the "unobserved heterogeneity" term η = exp(Θ) and the x's (which isn't problematic in the absence of endogeneity, or in the absence of a multiplicative relationship – but we have both). Mullahy develops a consistent way of using IV estimation by transforming the basic model so that it can be estimated via GMM. He demonstrates that his results can be very different from the other (inconsistent) IV estimates.

The Model: Let y be the count-valued outcome variable of interest, and let the model be:

E(y|x,η; α) = exp(xα)η = exp(α0 + x1α1)η

where η = exp(Θ) is unobserved and varies over the population. Think of it like an omitted variable. This implies a regression model, called (2) in the paper, that looks like:

y = exp(xα)η + ε

where E(ε|x, η) = 0 by construction. Assume E(η) = 1 without loss of generality. It's worth noting that we could rewrite this as:

y = exp(xα + Θ) + ε

Note that the unobserved heterogeneity comes in symmetrically with the other variables (linearly, inside the exp). Also note that we won't be able to distinguish it from the error term in estimation. Mullahy states "whether the equation is obtained from a Poisson, a negative binomial, or another underlying model is largely irrelevant. The key attribute of the equation is that y is a nonnegative, though not necessarily count, variable."


The Problem: We are worried that η might be correlated with some of our x's. This will create a problem because it won't be properly swallowed up in the (constant) intercept term. Recall that for consistency via MLE (or quasi-MLE, which he is using), we want E(xε) = 0. Writing this out for our current setup, we can see how it looks:

E(xε) = E[x*(y - exp(xα))]
      = E[x*(exp(xα)η + ε - exp(xα))]
      = E[x*(exp(xα)(η-1) + ε)]

If E(η|x) = 1 (i.e. = E(η)) then we can see this will be zero – the problem is that E[(η-1)|x] ≠ 0 in general if η depends on x.

Ideas for Instrumental Variables Estimators: Assume we have a valid instrument z for the variable in x that is endogenous. "Consistency of nonlinear IV estimators is typically established on the basis of a residual function satisfying the conditional moment restriction: E[u(y,x; α)|z] = 0." The "natural" IV estimator isn't going to work – see the residual:

u = y – exp(xα) = exp(xα)(η-1) + ε

Taking the expectation here, even conditional on z, is going to result in the same problem as before, so it won't be = 0. If instead we assumed the model:

y = exp(xα) + η + ε

then we could get consistency, because the model would be additively separable in unobservables (the multiplication problem would disappear). The problem with this is that it's hard to rationalize this kind of model because it doesn't treat observables and unobservables symmetrically (x's enter exponentially and η enters linearly).


A New and Improved IV Estimator

We want two things out of our estimator for it to work:

1) we want additive separability (to be able to get consistency)
2) we want the observables and unobservables to enter symmetrically

It turns out that with a transformation of the original model (equation 2, in which unobservables entered symmetrically), we can generate additive separability. Take equation 2 and divide through by exp(xα) to get:

y/exp(xα) = η + ε/exp(xα)

Now define a new variable υ = η – 1, to get:

[y/exp(xα)] - 1 = υ + ε/exp(xα)

Note that we have additive separability now. We can now show that a new moment restriction, using a residual equal to the RHS (or LHS – it doesn't matter) above, will hold:

E(residual|z) = E[υ + ε/exp(xα)|z] = E[υ|z] + E[ε/exp(xα)|z] = 0 !

We can estimate using GMM and this new moment condition.

Something important: We haven't made any assumptions about the form of the first-stage relationship between x and z. This is a nice feature of the method. He shows some alternative approaches, but they require more assumptions.

Conclusion: This new estimator is a good way to avoid bad assumptions and maintain a consistent estimator when there is endogeneity in a count data model.
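As a rough illustration of how this moment condition could be taken to data, the sketch below uses Stata's gmm command (available in more recent versions of Stata) with hypothetical variables: y is the count outcome, x1 is the endogenous regressor, x2 is exogenous, and z is the instrument for x1.

* moment condition: E[ (y/exp(xb) - 1) * instruments ] = 0
gmm (y / exp({b1}*x1 + {b2}*x2 + {b0}) - 1), instruments(x2 z) onestep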


Sarah Hamersma 1/31/05

Estimating Treatment Effects with a Regression-Discontinuity Design

The Simplest Version: Sharp RD

The simplest version of the RD estimator (called "sharp" RD) applies when there is a treatment that is applied in some discontinuous way based on otherwise smooth data. An example would be when treatment is assigned based on a test score (ex. Martorell's paper on graduation exams). If we focus on differences in outcomes that occur on either side of the treatment/no-treatment line, we can isolate the effects of the treatment (i.e. we can act as if treatment were randomly assigned).

The general estimation strategy looks like this:

[Figure: the outcome of interest (ex. future wages) plotted against the determinant of treatment (ex. test score); the vertical jump at the cutoff is the treatment effect.]

This is called “sharp” discontinuity because treatment exactly coincides with a discontinuity – i.e. your test score is not merely associated with your chances of treatment, it actually determines it directly. The estimator looks like the following:

θ = lim_{score_i → cutoff+} E(Y_i | score_i) − lim_{score_i → cutoff−} E(Y_i | score_i)

Note that the score is in the conditional mean function (i.e. the regression), so it is allowed to affect outcomes on its own – a dummy is added for the point of discontinuity.

Q: Why keep any data that is not right near the cutoff?
A: We are trying to estimate the gap for the population (not just our sample), so the rest of the data will add precision to our estimate of the size of the effect.

Adding One Modification: Fuzzy RD


A slight variation on sharp RD is used in Matsudaira's paper (job candidate who visited last week). In his case, the treatment (bilingual education) is not directly assigned based on the discontinuity (test score cutoff), but the discontinuity is highly correlated with the treatment (90% of those below the cutoff end up treated, while almost none above the cutoff do). Let T_i indicate actual treatment (as opposed to eligibility for treatment). He uses a slightly modified estimator:

θ = [lim_{score_i → cutoff+} E(Y_i | score_i) − lim_{score_i → cutoff−} E(Y_i | score_i)] / [lim_{score_i → cutoff+} E(T_i | score_i) − lim_{score_i → cutoff−} E(T_i | score_i)]

Make sure you see the intuition here – we take the difference between the mean outcomes of people on each side of the cutoff, and then divide through by the difference in their probability of treatment. Since in Matsudaira's case this difference is about 90%, this is like multiplying the usual estimate by 1.1. This makes sense because the estimate he is getting of a "treatment effect" (with the numerator alone) acts as if everyone is treated, when really about 10% of the people are not getting treated. The effects for these untreated types, which are presumably zero, are getting averaged into the usual estimated effect (the numerator), making it a little smaller than it should be. Dividing by .9 makes the effect a little bigger, basically adjusting for this.

Is this familiar? It should be. This approach is equivalent to using program eligibility as an instrument to find the treatment effect of bilingual education participation in the neighborhood of the cutoff.

Bilingual education (treatment): endogenous regressor (eligible people may select into treatment based on unobservables that also affect their outcomes)

Eligibility: instrument (predicts treatment very well (strong), does not itself predict outcomes right near the cutoff once we control smoothly for scores themselves)

Key point: with RD, we do not need "exogenous" instruments; we just need to control for smooth effects and identify the treatment effect using the "jump" (discontinuity) in the instrument.

How do we interpret the estimate? Like an IV, it is a Local Average Treatment Effect, so it measures the effect on those who became treated due to the instrument. It is also "local" in the sense that it applies to people with a certain range of values near the discontinuity (ex. near the test-score cutoff).
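To fix ideas, here is a small simulated Stata sketch (not tied to either paper, with made-up numbers): the sharp estimator is a regression of the outcome on eligibility plus a smooth function of the score, and the fuzzy estimator is the same thing with eligibility instrumenting for actual treatment. It assumes a version of Stata with runiform(), rnormal(), and ivregress.

clear
set obs 2000
set seed 1
gen score = 100*runiform()
gen elig  = score < 50                        // eligible below the cutoff
* sharp RD: eligibility IS treatment; control smoothly for the score
gen y1 = 10 + 0.3*score + 5*elig + rnormal()
regress y1 elig score                         // jump at the cutoff is about 5
* fuzzy RD: eligibility only shifts the probability of treatment (90% vs. 5%)
gen T  = runiform() < cond(elig, 0.90, 0.05)
gen y2 = 10 + 0.3*score + 5*T + rnormal()
ivregress 2sls y2 score (T = elig)            // scaled jump recovers about 5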


Sarah Hamersma 1/12/04 Notes/Summary of:

Demand and Pricing in Electricity Markets: Evidence from San Diego During California’s Energy Crisis

By Reiss and White

The Point: We have a very limited understanding of how elastic the demand for electricity is, and this paper makes a contribution to this understanding using data from San Diego during the California energy crisis in 2000-2001. The authors have utility-bill data that allow them to follow households over time. They identify price-induced changes in electricity consumption by predicting anticipated monthly consumption changes relative to the same month in the previous year (on a household basis) and then comparing these predictions to actual consumption changes during the crisis. They provide quite convincing evidence that their predictions are solid for use in this way. The authors find evidence that consumption fell by about 12% on average over about 60 days (when prices more than doubled during deregulation), and then that consumption returned to within 3% of previous levels once a price cap was introduced. They also found that public appeals for energy conservation and relatively small remunerative programs induced some short-term changes in demand, but these changes did not stick. The authors suggest that their results reveal stronger responsiveness to electricity prices than others have thought existed.

The Setup: Reiss and White describe the California energy crisis, and the particular situation in San Diego, very clearly. Their Figure 7 really tells the whole story in terms of the events they are following – first deregulation, then a price cap, and then two kinds of voluntary conservation programs (one of which provided monetary rewards).

Data Issues:

1) Attrition/entry caused by people moving in and out of their homes during the sample period. This has large effects – of a sample of 70,000 HHs, they can only use 46,800. They adjust for this using weights for each HH, where a person's weight depends on how many people "like them" (by some measures, listed in the Appendix) are missing from the data due to the attrition/entry problems. If half of the people "like you" are missing, you will get a weight of 2 – it is that simple. (Measuring characteristics is the hard part – they use census tract data to get some idea of who is missing.)

2) Timing issues and billing cohorts required them to deal very carefully with the data. People are on 21 different billing cycles (corresponding to the number of weekdays in a month, usually). This means that when we are thinking about consumption relative to the same month last year, we need to take the billing dates into account. The authors group and compare consumption data by billing cohort, and they merge in weather data according to the exact billing dates.

Identification Strategy: The strategy is fundamentally a DD strategy, but it takes into account household-specific patterns in electricity use in a flexible way. It also allows a DD strategy without finding a separate group of "untreated" people. In words, the strategy is to compare:

Treatment: the change in monthly electricity consumption, relative to last year in the same month, for a month POST-treatment (i.e. after the price increase)

Control: the change in monthly electricity consumption, relative to last year in the same month, that we would have expected during a month post-price-increase IF there had been no price increase

They use these differences to avoid problems with seasonal effects (such as annual vacations or Christmas light displays). The key assumption is that people's energy use would have remained on the same trajectory over time in the absence of this exogenous price shock. (All of these calculations are adjusted for weather as well, and we must assume weather affects consumption in the same way before and after the crisis.)

Something nice about this paper is that there is enough time-series data to estimate a separate "pre-treatment" trajectory for every household, so that there is then a HH-specific counterfactual for every month. The econometrics looks like the following:

1) Estimate household-specific energy demand changes over time, using only pre-crisis data, by regressing the monthly change in consumption (relative to the same month last year) on the change in weather relative to last year and a constant. The coefficients on weather and the constant are now our HH-specific parameters for estimating future electricity consumption.

2) Calculate HH estimates of the crisis-induced consumption change by taking the actual consumption change in a crisis month (relative to the same month in the previous year) and comparing it to the predicted change based on the parameters estimated in part 1 and the relevant weather difference. (see equation 3)

3) We have an estimate of the consumption effect for each HH – now we add these up and average them to get an estimate of the average effect of the crisis on consumption. (This averaging is done with the special weights mentioned above).
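A sketch of these three steps in Stata, with hypothetical dataset and variable names (a household panel maindata with hhid, dcons = year-over-year change in consumption, dweather = year-over-year change in weather, crisis = 1 for post-price-increase months, and w = the attrition/entry weight); the merge syntax assumes Stata 11 or later:

use maindata, clear
* 1) household-specific pre-crisis relationship between consumption and weather changes
statsby b0=_b[_cons] bw=_b[dweather], by(hhid) clear: regress dcons dweather if crisis==0
save hhparams, replace
* 2) actual minus predicted (counterfactual) change during the crisis months
use maindata, clear
merge m:1 hhid using hhparams, nogenerate
gen predicted = b0 + bw*dweather
gen effect    = dcons - predicted if crisis==1
* 3) weighted average across households
summarize effect [aweight=w]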


They seem to defend this very well – I can't see any problem with the approach. In particular, they show that weather is a VERY good predictor of electricity use pre-crisis, which is important to convincing the reader that it will generate plausible counterfactuals.

Results: They find a decrease in average consumption of 12-13%, which they very nicely put into context by describing the types of behavioral changes that would induce a change this large. They find that when price caps are implemented, consumption rebounds but remains about 3% below its predicted historical levels. (This is probably due to people investing in energy-efficient appliances or patterns of behavior.)

The authors do something very careful and honest on page 25 – they explicitly refuse to calculate an electricity price elasticity, because they do not believe the market had time to reach the equilibrium necessary to make such a calculation. They do, however, calculate marginal valuations – this suggests that the last 5% of electricity that households use has a low value to them (about $5 on average).

The authors also examine responses to "public appeal" and "compensatory" requests for voluntary conservation. They find that both have an effect, but it is transitory.

I think this is a really nice, clear paper. It is very well-written and convincing. My only complaint: where are the standard errors? They may not be easy to calculate (especially with the special weighting due to attrition/entry) but there must be some way to approximate them…maybe bootstrapping? Perhaps because their fit is so good pre-crisis, they assume the reader will be convinced that any variation from it must be attributable to the crisis. (This may be a reasonable argument.)


The Young Economist's Guide to Professional Etiquette

Daniel S. Hamermesh

The Journal of Economic Perspectives, Vol. 6, No. 1. (Winter, 1992), pp. 169-179.

Stable URL:

http://links.jstor.org/sici?sici=0895-3309%28199224%296%3A1%3C169%3ATYEGTP%3E2.0.CO%3B2-D
