Teacher Incentives and Student Achievement: Evidence from New York City Public Schools
Author: Roland G. Fryer
Source: Journal of Labor Economics, Vol. 31, No. 2 (April 2013), pp. 373-407
Published by: The University of Chicago Press on behalf of the Society of Labor Economists and the NORC at the University of Chicago
Stable URL: http://www.jstor.org/stable/10.1086/667757
Accessed: 03/05/2013 10:10
Teacher Incentives and Student Achievement: Evidence from New York City Public Schools
Roland G. Fryer, Harvard University and NBER
As global policy makers and school leaders look for ways to improve student performance, financial incentive programs for teachers have become increasingly popular. This article describes a school-based randomized trial in over 200 New York City public schools designed to better understand the impact of teacher incentives. I find no evidence that teacher incentives increase student performance, attendance, or graduation, nor do I find evidence that these incentives change student or teacher behavior. If anything, teacher incentives may decrease student achievement, especially in larger schools. The article concludes with a speculative discussion of theories to explain these stark results.
When I was in Chicago, our teachers designed a program for performance pay and secured a $27 million federal grant. . . . In Chicago's model—every adult in the building—teachers, clerks, janitors and cafeteria workers—all were rewarded when the school improved. It builds a sense of teamwork and gives the whole school a common mission. It can transform a school culture. (US Secretary of Education Arne Duncan, the National Press Club, July 27, 2010)
This project would not have been possible without the leadership and support of Joel Klein. I am also grateful to Jennifer Bell-Ellwanger, Joanna Cannon, and Dominique West for their cooperation in collecting the data necessary for this project and to my colleagues Edward Glaeser, Richard Holden, and Lawrence Katz for helpful comments and discussions. Vilsa E. Curto, Meghan L. Howard, Wonhee Park, Jorg Spenkuch, David Toniatti, Rucha Vankudre, and Martha Woerner provided excellent research assistance. Financial support from the Fisher
[Journal of Labor Economics, 2013, vol. 31, no. 2, pt. 1]
© 2013 by The University of Chicago. All rights reserved. 0734-306X/2013/3102-0001$10.00
I. Introduction
Human capital, especially teacher quality, is believed to be one of the most important inputs into education production. A 1 standard deviation increase in teacher quality raises math achievement by 0.15–0.24 standard deviations per year and reading achievement by 0.15–0.20 standard deviations per year (Rockoff 2004; Hanushek and Rivkin 2005; Aaronson, Barrow, and Sander 2007; Kane and Staiger 2008). The difficulty, however, is that one cannot identify ex ante the most productive teachers. Value-added measures are not strongly correlated with observable teacher characteristics (Rivkin, Hanushek, and Kain 2005; Aaronson et al. 2007; Kane and Staiger 2008; Rockoff et al. 2008). Some argue that this, coupled with the inherent challenges in removing low-performing teachers and increased job market opportunities for women, contributes to the fact that teacher quality and aptitude have declined significantly in the past 40 years (Corcoran, Evans, and Schwab 2004; Hoxby and Leigh 2004).1

One potential method to increase student achievement and improve the
quality of individuals selecting teaching as a profession is to provide teachers with financial incentives based on student achievement. Theoretically, teacher incentives could have one of three effects. If teachers lack motivation or incentive to put effort into important inputs to the education production function (e.g., lesson planning, parental engagement), financial incentives for student achievement may have a positive impact by motivating teachers to increase their effort. However, if teachers do not know how to increase student achievement, the production function has important complementarities outside their control, or the incentives are either confusing or too weak, teacher incentives may have no impact on achievement. Conversely, if teacher incentives have unintended consequences such as explicit cheating, teaching to the test, or focusing on specific tested objectives at the expense of more general learning, teacher incentives could have a negative impact on student performance (Holmstrom and Milgrom 1991; Jacob and Levitt 2003). Similarly, some argue that teacher incentives can decrease a teacher's intrinsic motivation or lead to harmful competition between teachers in what some believe to be a collaborative environment (Johnson 1984; Firestone and Pennell 1993).

Despite intense opposition, there has been growing enthusiasm among education reformers and policy makers around the world to link teacher
Foundation is gratefully acknowledged. Contact the author at [email protected]. The usual caveat applies.
1 Corcoran et al. (2004) find that in 1964–71, 20%–25% of new female teachers were ranked in the top 10% of their high school cohort, while in 2000, less than 13% were ranked in the top decile. Hoxby and Leigh (2004) similarly find that the share of teachers in the highest aptitude category fell from 5% in 1963 to 1% in 2000, and the share in the lowest aptitude category rose from 16% to 36% in the same period.
compensation to student achievement in myriad ways.2 This is due, in part, to the low correlation between a teacher's observables at the time of hiring and his value added and, in part, to policy makers' belief that a new payment scheme will attract more achievement-minded applicants. A number of states, including Colorado, Florida, Michigan, Minnesota, South Carolina, Tennessee, and Texas, as well as Washington, DC, have implemented statewide programs for districts and schools to provide individual and group incentives to teachers for student achievement and growth, and many more individual school districts have implemented similar policies. In 2010, the US Department of Education selected 62 programs in 27 states to receive over $1.2 billion over 5 years from the Teacher Incentive Fund. States applying for funds from "Race to the Top," the Obama Administration's $4.4 billion initiative to reform schools, are evaluated on plans to improve teacher and principal effectiveness by linking teacher evaluations to student growth and making decisions about raises, tenure, and promotions depending on student achievement. Similar initiatives are underway in the United Kingdom, Chile, Mexico, Israel, Australia, Portugal, and India.

The empirical evidence on the efficacy of teacher incentives is ambiguous. Data from field experiments in Kenya and India yield effect sizes of approximately 0.20 standard deviations in math and reading when teachers earned average incentives of 2% and 3% of their yearly salaries, respectively (Glewwe, Ilias, and Kremer 2010; Muralidharan and Sundararaman 2011). Data from a pilot initiative in Tennessee, where the average treatment teacher earned incentives totaling 8% of annual salary, suggest no effect of incentives on student achievement.3
From the 2007–8 through the 2009–10 school years, the United Federation of Teachers (UFT) and the New York City Department of Education (DOE) implemented a teacher incentive program in over 200 high-need
2 Merit pay faces opposition from the two major unions: the American Federation of Teachers (AFT) and the National Education Association (NEA). Although in favor of reforming teacher compensation systems, the AFT and the NEA officially object to programs that reward teachers on the basis of student test scores and principal evaluations, while favoring instead systems that reward teachers on the basis of additional roles and responsibilities they take within the school or certifications and qualifications they accrue. The AFT's official position cites the past underfunding of such programs, the confusing metrics by which teachers were evaluated, and the crude binary reward system in which there is no gradation of merit as the reasons for its objection. The NEA's official position maintains that any alterations in compensation should be bargained at the local level and that a singular salary scale and a strong base salary should be the standard for compensation.
3 Nonexperimental analyses of teacher incentive programs in the United States have also shown no measurable success, although one should interpret these data with caution due to the lack of credible causal estimates (Vigdor 2008; Glazerman, McKie, and Carey 2009).
schools, distributing a total of roughly $75 million to over 20,000 teachers.4
The experiment was a randomized school-based trial, with the randomization conducted by the author. Each participating school could earn $3,000 for every UFT-represented staff member, which the school could distribute at its own discretion, if the school met the annual performance target set by the DOE on the basis of the school report card scores. Each participating school was given $1,500 per UFT staff member if it met at least 75% of the target but not the full target. Note that the average New York City (NYC) public school has roughly 60 teachers; this implies a transfer of $180,000 to schools on average if they met their annual targets and a transfer of $90,000 if they met at least 75% of the target but not the full target. In elementary and middle schools, school report card scores hinge on student performance and progress on state assessments, student attendance, and learning environment survey results. High schools are evaluated similarly, with graduation rates, regents exams, and credits earned replacing state assessment results as proxies for performance and progress.

An important feature of the experiment is that schools had discretion
over their incentive plans. As mentioned above, if a participating school met 100% of the annual target, it received a lump sum equivalent to $3,000 per full-time unionized teacher. Each school had the power to decide whether all of the rewards would be given to a small subset of teachers with the highest value added, whether the winners of the rewards would be decided by lottery, or virtually anything in between. The only restriction was that schools were not allowed to distribute rewards on the basis of seniority. Theoretically, it is unclear how to design optimal teacher incentives when the objective is to improve student achievement. Much depends on the characteristics of the education production function. If, for instance, the production function is additively separable, then individual incentives may dominate group incentives, as the latter encourage free riding. If, however, the production function has important complementarities between teachers in the production of student achievement, group incentives may be more effective at increasing achievement (Baker 2002).

To my surprise, an overwhelming majority of the schools decided on a group incentive scheme that varied the individual bonus amount only by the position held in the school. This could be because teachers have superior knowledge of education production and believe the production function to have important complementarities, because they feared retribution from other teachers if they supported individual rewards, or simply because this was as close to pay based on seniority (the UFT's official view) as they could get.
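The payout rule described above maps staff counts and target attainment to dollar amounts mechanically. A minimal sketch of that rule (the function name and the 60-teacher default are illustrative, not from the program documents):

```python
def school_bonus(uft_staff: int, target_met_fraction: float) -> int:
    """Lump-sum bonus under the NYC scheme: $3,000 per UFT-represented
    staff member if the school meets its full annual target, $1,500 per
    staff member if it meets at least 75% of the target, else nothing."""
    if target_met_fraction >= 1.0:
        return 3000 * uft_staff
    if target_met_fraction >= 0.75:
        return 1500 * uft_staff
    return 0

# An average NYC school with roughly 60 UFT staff members:
full = school_bonus(60, 1.0)     # $180,000 transfer, as noted in the text
partial = school_bonus(60, 0.8)  # $90,000 transfer
```

How the lump sum was then split among individual teachers was left to each school's compensation committee, as discussed above.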
4 The details of the program were negotiated by Chancellor Joel Klein and Randi Weingarten, along with their staffs. At the time of the negotiation, I was serving as an advisor to Chancellor Klein and convinced both parties to agree to include random assignment to ensure a proper evaluation.
The results from this incentive experiment are informative. Providing incentives to teachers on the basis of the school's performance on metrics involving student achievement, improvement, and the learning environment did not increase student achievement in any statistically meaningful way. If anything, student achievement declined. Intent-to-treat estimates yield treatment effects of −0.018 (0.024) standard deviations (hereafter σ) in mathematics and −0.014σ (0.020) in reading for elementary schools and −0.046σ (0.018) in math and −0.030σ (0.011) in reading for middle schools, per year. Thus, if an elementary school student attended schools that implemented the teacher incentive program for 3 years, her test scores would decline by 0.054σ in math and by 0.042σ in reading, neither of which is statistically significant. For middle school students, however, the negative impacts are more sizable: −0.138σ in math and −0.090σ in reading over a 3-year period.

The impact of teacher incentives on student attendance, behavioral incidences, and alternative achievement outcomes such as predictive state assessments, course grades, regents exam scores, and high school graduation rates is negligible. Furthermore, I find no evidence that teacher incentives affect teacher behavior, as measured by retention in district or school, number of personal absences, and teacher responses to the learning environment survey, which partly determined whether a school received the performance bonus.

I also investigate the treatment effects across a range of subsamples—gender, race, previous-year achievement, previous-year teacher value added, previous-year teacher salary, and school size—and find that although some subgroups seem to be affected differently by the program, none of the estimates of the treatment effect are positive and significant if one adjusts for multiple hypothesis testing. The coefficients range from −0.264σ (0.073), in global history for white students who took that regents exam, to 0.114σ (0.091), in math state exam scores for white elementary school students.

The article concludes with a (necessarily) speculative discussion of possible explanations for the stark results, especially when one compares them with the growing evidence from developing countries. One explanation is that incentives are simply not effective in American public schools. This could be due to a variety of reasons, including differential teacher characteristics, teacher training, or effort. I argue that a more likely explanation is that, due in part to strong influence by teachers' unions, all incentive schemes piloted thus far in the United States have been unnecessarily complex, and thus teachers cannot always predict how their efforts translate into rewards; this provides teachers with less control than incentive experiments in developing countries and may lead teachers to underestimate the expected value of increased effort. This uncertainty and lack of control in American incentive schemes, relative to those attempted in developing countries, may explain my results. Other explanations suggesting that the incentives were not large enough, group-based incentives are ineffective, or teachers are
ignorant of the production function all contradict the data in important ways.
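The cumulative figures quoted in the introduction are simply the per-year intent-to-treat point estimates scaled by years of exposure, assuming linear accumulation; a sketch of that arithmetic (point estimates only, standard errors omitted):

```python
# Per-year intent-to-treat point estimates (in standard deviations, sigma)
# for elementary and middle schools, as reported in the introduction.
itt_per_year = {
    ("elementary", "math"): -0.018,
    ("elementary", "reading"): -0.014,
    ("middle", "math"): -0.046,
    ("middle", "reading"): -0.030,
}

def cumulative_effect(level: str, subject: str, years: int) -> float:
    """Cumulative effect after `years` of exposure, assuming the
    per-year effect accumulates linearly."""
    return round(itt_per_year[(level, subject)] * years, 3)

cumulative_effect("elementary", "math", 3)  # -0.054, as in the text
cumulative_effect("middle", "math", 3)      # -0.138, as in the text
```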
II. A Brief Literature Review
There is a nascent but growing body of literature on the role of teacher incentives in student performance (Lavy 2002, 2009; Vigdor 2008; Glazerman et al. 2009; Glewwe et al. 2010; Springer et al. 2010; Muralidharan and Sundararaman 2011), including an emerging literature on the optimal design of such incentives (Neal 2011). There are four papers, three of them outside the United States, that provide experimental estimates of the causal impact of teacher incentives on student achievement: Duflo and Hanna (2005), Glewwe et al. (2010), Springer et al. (2010), and Muralidharan and Sundararaman (2011).

Duflo and Hanna (2005) randomly sampled 60 schools in rural India and provided them with financial incentives to reduce absenteeism. The incentive scheme was simple; teachers' pay was linear in their attendance, at the rate of Rs 50 per day, after the first 10 days of each month. They found that the teacher absence rate was significantly lower in treatment schools (22%) than in control schools (42%) and that student achievement in treatment schools was 0.17σ higher than in control schools.

Glewwe et al. (2010) report results from a randomized evaluation that provided fourth- through eighth-grade teachers in Kenya with group incentives based on test scores and find that while test scores increased in program schools in the short run, students did not retain the gains after the incentive program ended. They interpret these results as consistent with teachers expending effort toward short-term increases in test scores but not toward long-term learning.

Muralidharan and Sundararaman (2011) investigate the effect of individual and group incentives in 300 schools in Andhra Pradesh, India, and find that both group and individual incentives increased student achievement by 0.12σ in language and 0.16σ in math in the first year, with the two schemes equally successful. In the second year, however, individual incentives proved more effective, with an average effect of 0.27σ across math and language performance, while group incentives had an average effect of 0.16σ.

Springer et al. (2010) evaluated a 3-year pilot initiative on teacher incentives conducted in the Metropolitan Nashville School System from the 2006–7 school year through the 2008–9 school year: 296 middle school mathematics teachers who volunteered to participate in the program were randomly assigned to the treatment or the control group, and those assigned to the treatment group could earn up to $15,000 as a bonus if their students made gains in state mathematics test scores equivalent to the 95th percentile in the district. They were awarded $5,000 and $10,000 if their students made gains equivalent to the 80th and the 90th percentiles, respectively. Springer et al. (2010) found no significant treatment effect on student achievement or on measures of teachers' response, such as teaching practices.5
The contribution of this article is threefold. First, the incentive scheme allows schools to choose how to allocate incentive payments. If schools have superior knowledge of their production function (relative to a social planner) or better knowledge about their staff, this design is optimal. Second, this experiment is the largest on teacher incentives in American public schools by orders of magnitude, and the incentive scheme is similar to those being implemented in school districts across the country. Third, the set of outcomes is expansive and includes information on student achievement, student behavior, teacher retention, and teacher effort.
III. Program Details
A. Overview
On October 17, 2007, NYC's mayor, schools chancellor, and the president of the UFT announced an initiative to provide teachers with financial incentives to improve student performance, attendance, and school culture. The initiative was conceived as a 2-year pilot program in roughly 400 of the lowest performing public schools in NYC.6 School performance was tied to metrics used to calculate NYC's school report card—a composite measure of school environment, student academic performance, and student academic progress. The design of the incentive scheme was left to the discretion of the school. There were three requirements: (1) incentives were not allowed to be distributed according to seniority; (2) schools had to create a compensation committee that consisted of the principal, a designee of the principal, and two UFT staff members; and (3) the committee's decision had to be unanimous. The committee had the responsibility of deciding how incentives would be distributed to each teacher and other staff. Below, I describe how schools were selected and the incentive scheme, and I provide an overview of the distribution of incentive rewards to schools.
5 There are several nonexperimental evaluations of teacher incentive programs in the United States, all of which report a nonsignificant impact of the program on student achievement. Glazerman et al. (2009) report a nonsignificant effect of −0.04 standard deviations on student test scores for the Teacher Advancement Program (TAP) in Chicago, and Vigdor (2008) reports a nonsignificant effect of the ABC School-Wide Bonus Program in North Carolina. Outside the United States, Lavy (2002, 2009) reports significant results for teacher incentive programs in Israel.
6 The pilot program did not expand to include more schools in the second and third years due to budget constraints, but all schools that completed the program in the first or second year were invited to participate again in the following years.
B. School Selection
Table 1 provides an accounting of how the experimental sample was selected. Eligible middle and high schools were selected on the basis of the average proficiency ratings on fourth and eighth grade state tests, respectively. Eligible elementary schools were selected on the basis of poverty rates and student demographic characteristics, such as the percentage of English language learners and special education students. The NYC DOE identified 438 schools that met these eligibility criteria. Of these schools, 34 were barred by the UFT for unknown reasons, and eight were District 75 (i.e., special education) schools. The remaining 396 comprise the experimental sample, among which 212 schools were randomly selected by the author and offered treatment. In November 2007, schools in the treatment group were invited to participate in the program. To formally accept the offer, schools were required to have at least 55% of their active
Table 1
Sample Construction

                                              Number of        Number of Observations
                                              Schools     Elementary    Middle      High
Met the eligibility criteria                     438          ...         ...        ...
Barred by the United Federation of Teachers       34          ...         ...        ...
Special district schools                           8          ...         ...        ...
Experimental sample                              396        64,373      50,413     70,826
  Treatment                                      233        37,791      29,857     38,861
  Control                                        163        26,582      20,556     31,965
Offered treatment in year 1                      233          ...         ...        ...
Treated in year 1                                198          ...         ...        ...
Offered treatment in year 2                      195          ...         ...        ...
Treated in year 2                                191          ...         ...        ...
Offered treatment in year 3                      191          ...         ...        ...
Treated in year 3                                189          ...         ...        ...
Valid state English language arts exam scores    ...        61,829      47,473       ...
Valid state math exam scores                     ...        62,418      48,044       ...
Valid regents English exam scores                ...          ...         ...      19,813
Valid regents math exam scores                   ...          ...         ...      11,219
Valid regents science exam scores                ...          ...         ...      18,370
Valid regents US history exam scores             ...          ...         ...      17,791
Valid regents global history exam scores         ...          ...         ...      21,711

NOTE.—All statistics are reported for the 2007–8 school year (year 1), unless otherwise specified.
full-time staff represented by the UFT at the school vote for the program.7 Schools forwarded voting results through e-mail to the DOE by late November. Of the 212 schools randomly chosen to receive treatment, 179 garnered enough votes to participate, and 33 declined treatment.8 To increase the number of schools eligible to participate, 21 schools were added off the wait list; 19 garnered the requisite votes. So, overall, 233 schools in the experimental sample were invited to participate in the program, and 198 actually participated. The final experimental sample in year 1 consists of the lottery sample, with 233 treatment schools and 163 control schools.

In the second year, 195 of the 198 schools that received treatment in the first year were invited to participate in the second-year pilot program (the other three schools were closed because of low performance). Of the 195 schools offered treatment, 191 voted to participate in the second year. In the third year of treatment, the 191 schools that received treatment in the second year were invited to participate; 189 schools voted to participate in the program.
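The sample-construction counts above are internally consistent, as a quick tally shows (variable names are mine; all figures come from the text and Table 1):

```python
# Eligibility and randomization flow for year 1.
eligible = 438
barred_by_uft = 34
district_75 = 8
experimental_sample = eligible - barred_by_uft - district_75  # 396 schools

offered_initially = 212
declined = 33
accepted = offered_initially - declined          # 179 schools voted yes
waitlist_accepting = 19                          # of 21 wait-list additions
participated_year1 = accepted + waitlist_accepting  # 198 participating schools

offered_total = offered_initially + 21           # 233 schools ever offered
control = experimental_sample - offered_total    # 163 control schools
```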
C. Incentive Scheme
Figure 1 shows how the progress report card score, which is the basis for awarding incentives, is calculated. Environment, which accounts for 15% of the progress report card score, is derived from the attendance rate (5% of the overall score) and learning environment surveys administered to students, teachers, and parents in the spring semester (10%). Attendance rate is a school's average daily attendance. While environment subscores are calculated identically for all schools, student performance (25%) and student progress (60%) are calculated differently for high schools versus elementary and middle schools. In elementary and middle schools, student performance depends on the percentage of students at grade level and the median proficiency rating in English language arts (ELA) and math state tests. In high schools, student performance is measured by 4-year and 6-year graduation rates and diploma-weighted graduation rates.9 Student progress depends on the average changes in proficiency ratings among students and the percentage of students making at least a year of progress in state tests for elementary and middle schools. Student progress in high schools is measured by the percentage of students earning more than 10 credits and regents exam pass rates in the core subjects—English, math, science, US history, and global history. Schools can also earn extra credit points through exemplary gains in proficiency ratings or credit accumulation and graduation rates among high-need students such as English language learners, special education students, or those in the lowest tercile in ELA and math test scores citywide.

7 Repeated attempts to convince the DOE and the UFT to allow schools to opt in to the experimental group before random assignment were unsuccessful.
8 Anecdotal evidence suggests that schools declined treatment for a variety of reasons, including fear that more work (not outlined in the agreement) would be required for bonuses. As one teacher in a focus group put it, "Money ain't free." Furthermore, some teachers in focus groups expressed resentment that anyone would believe that teachers would be motivated by money.
9 The DOE awards different levels of diplomas—local, regents, advanced regents, and advanced regents with honors—depending on the number of regents exams passed. Further details on graduation requirements and progress report card score calculation can be found on the DOE website (http://schools.nyc.gov/default.htm).

FIG. 1.—Progress report card metrics

In each of the three categories—learning environment, student performance, and student progress—schools were evaluated by their relative performance in each metric compared to their peer schools and all schools in the city, with performance relative to peer schools weighted three times the weight given to performance relative to all schools citywide. However, because it is calculated using many metrics and because scores in each metric are calculated relative to other schools, how much effort is needed to raise the progress report card score by, say, 1 point is not obvious.

Table 2 shows the number of points by which schools had to increase
their progress report card scores in the 2007–8 academic year in order to be considered to have met their goal and receive their incentive payment. The table illustrates that the target depends on the citywide ranking based on the previous year's progress report card score. If, for example, an elementary school was ranked at the 20th percentile in the 2006–7 academic year, it needed to increase its progress report card score by 15 points to meet the annual target.
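Table 2 is effectively a step function from a school's prior-year citywide percentile to its target. A sketch of that lookup (the function name and `school_type` labels are my own, not DOE terminology):

```python
def target_points(percentile: float, school_type: str) -> float:
    """Annual progress-report target from Table 2, keyed on the school's
    citywide percentile ranking based on the previous year's score.
    school_type: "elem_middle" or "high"."""
    bands = [          # (lower percentile bound, elem/middle target, high target)
        (85, 7.5, 2),
        (45, 12.5, 3),
        (15, 15, 4),
        (5, 17.5, 6),
        (0, 20, 8),
    ]
    for lower, elem_middle, high in bands:
        if percentile >= lower:
            return elem_middle if school_type == "elem_middle" else high
    raise ValueError("percentile must be non-negative")

target_points(20, "elem_middle")  # 15, the 20th-percentile example in the text
```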
1. An Example
Consider the following simplified example with an elementary school that ranks at about the 10th percentile citywide and at about the 25th percentile among its peer schools. This school would have to increase its total progress report card scores by 17.5 points to meet the annual target.
Table 2
Progress Report Target Points

Citywide Ranking Based on
the Previous Year (Percentile)    Elementary and Middle    High
≥85th                                     7.5                2
≥45th and <85th                          12.5                3
≥15th and <45th                          15                  4
≥5th and <15th                           17.5                6
<5th                                     20                  8

NOTE.—Calculated by the author.
Let's now assume that the school increases its attendance rate to about the 30th percentile citywide and the 75th percentile in its peer group. Then, holding everything else constant, the school will increase its overall score by 1 point. Similarly, if the school increased its student performance to the same level, it would increase its score by 5 points. If student progress increased to the same level, its progress report card score would increase by 12 points. Hence, if the peer group and district schools stay at the same level, a low-performing school would be able to meet the annual target only if it dramatically increased its performance in all of the subareas represented in the progress report. However, because all scores are calculated relative to other schools, some schools can reach their incentive targets if their achievement stays constant and their peer schools underperform in a given year.
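Tallying the point gains in this worked example shows the school just clearing its 17.5-point target, and only because every subarea improved (a back-of-envelope check, not an official DOE calculation):

```python
# Point gains from the example above, holding other schools fixed:
# attendance +1, student performance +5, student progress +12.
gains = {"attendance": 1, "performance": 5, "progress": 12}
target = 17.5   # target for a school ranked in the 5th-15th percentile

total_gain = sum(gains.values())     # 18 points, just above the target
meets_target = total_gain >= target  # True

# Dropping any single subarea's gain leaves the school short of the target,
# illustrating why improvement had to be across the board.
shortfalls = {area: total_gain - g < target for area, g in gains.items()}
```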
2. A Brief Comparison with Other School District Incentive Schemes

Most school districts that have implemented performance pay use metrics similar to NYC's to measure teacher performance. For example, TAP in Chicago—started by Arne Duncan and described in the quote at the beginning of this article—rewarded teachers on the basis of classroom observations (25%) and school-wide student growth on Illinois state exams (75%). Houston's ASPIRE program uses school value added and teacher value added on state exams to reward the top 25% and 50% of teachers. Alaska's Public School Performance Incentive Program divides student achievement into six categories and rewards teachers on the basis of average movement up to higher categories. Florida's S.T.A.R. used a similar approach.

A key difference between the incentive schemes piloted in America
thus far and those piloted in developing countries is that those in America
This content downloaded from 129.119.32.27 on Fri, 3 May 2013 10:10:44 AMAll use subject to JSTOR Terms and Conditions
compare teachers’ or schools’ performance to the distribution in the dis-trict. That is, teachers are not rewarded unless the entire school satisfies a
384 Fryer
criterion or their performance is in the top X percentile of their district,despite how well any individual or group of teachers performs. NYC’sdesign rewards teachers on the basis of the school’s overall performanceonly. A teacher participating in Houston’s ASPIRE program would berewarded the predetermined bonus amount only if his teacher value addedin one subject is in the top 25% of the district, regardless of how he orhis school performs. Chicago’s TAP program rewards teachers similarly.This ambiguity—the likelihood of receiving an incentive depends on myeffort and the effort of others—may have served to flatten the functionthat maps effort into output.
D. Incentive Distribution
The lump-sum performance bonus awarded to a school was distributed to teachers in whatever way the school's compensation committee decided. Recall that the compensation committee consisted of the principal, a designee of the principal, and two UFT staff members. The committee was not allowed to vary the bonus by seniority but could differentiate the compensation amount by the position held at the school or by the magnitude of the contribution made (e.g., teacher value added), or it could distribute the bonus amount equally. The committee was chosen by December of the first year and reported to the UFT and the DOE its decision on how to distribute the bonus.

School bonus results were announced in September of the following year for elementary, K–8, and middle schools and in November for high schools, shortly after the DOE released progress report cards. Rewards were distributed to teachers either by check or as an addition to their salary, in accordance with the distribution rule decided on by the compensation committee. In the first year, 104 of the 198 schools that participated met the improvement target and received the full bonus, while 18 schools met at least 75% of the target and received half of the maximum incentive payment. In total, the compensation received by participating schools totaled $22 million. In the second year, 154 of the 191 schools that participated received the full bonus, while seven schools received half the maximum compensation. The total compensation awarded to schools in the second year was $31 million. I do not have precise numbers for year 3, but the DOE claims that the total cost of the experiment was approximately $75 million.

Figure 2A shows the distribution of individual compensation in the experiment. Most teachers in the schools that received the full bonus of $3,000 per staff member were rewarded an amount close to $3,000. Figure 2B presents a histogram of the fraction of teachers receiving the same amount in each school in order to characterize how many schools decided on an egalitarian distribution rule. More than 80% of schools chose to award the same bonus amount to at least 85% of the teaching staff each year.

FIG. 2.—A, Distribution of individual bonus amount; B, distribution of school incentive schemes.
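The committee's choice between an equal split and a contribution-weighted split can be illustrated with a small sketch (`split_bonus` is a hypothetical helper; the actual rules were whatever each committee negotiated):

```python
# Sketch of a compensation committee's distribution rule: split a school's
# lump-sum bonus pool either equally or in proportion to some per-teacher
# contribution measure (e.g., teacher value added).

def split_bonus(pool, weights, n_teachers):
    """Return each teacher's share of the school's bonus pool.

    With weights=None, the pool is split equally (the rule chosen by more
    than 80% of schools); otherwise shares are proportional to the
    supplied nonnegative weights.
    """
    if weights is None:
        return [pool / n_teachers] * n_teachers
    total = sum(weights)
    return [pool * w / total for w in weights]

# A school with 40 union staff that met its full target receives
# $3,000 per staff member, i.e., a $120,000 pool.
pool = 3000 * 40
equal = split_bonus(pool, None, 40)          # every teacher gets $3,000
weighted = split_bonus(pool, [2, 1, 1], 3)   # hypothetical TVA-weighted split
```

The equal split reproduces the modal outcome in figure 2A: nearly every teacher in a winning school receives an amount close to $3,000.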
IV. Data and Research Design
A. Data
I combined data from two sources: student-level administrative data on approximately 1.1 million students across the five boroughs of the NYC metropolitan area from the 2006–7 to 2009–10 school years, and teacher-level human resources data on approximately 96,000 elementary and middle school teachers during the same period. The student-level data include information on student race, gender, free- and reduced-price lunch eligibility, behavior, attendance, matriculation with course grades, and state math and ELA test scores for students in grades 3–8. For high school students, the data contain regents exam scores and graduation rates. Data on attendance and behavioral incidences are available for all students. The main outcome variable is an achievement test unique to New York.
The state ELA and math tests, developed by McGraw-Hill, are high-stakes exams administered to students in the third through eighth grades. Students in the third, fifth, and seventh grades must score at level 2 or above (out of four) on both the math and ELA tests to advance to the next grade without attending summer school. Material covered in the math test is divided among five strands: (1) number sense and operations, (2) algebra, (3) geometry, (4) measurement, and (5) statistics and probability. The ELA test is designed to assess students on three learning standards: (1) information and understanding, (2) literary response and expression, and (3) critical analysis and evaluation. Both tests include multiple-choice, short-response, and extended-response questions. The ELA test also includes a brief editing task and is divided into reading and listening sections.

All public school students are required to take the math and ELA tests unless they are medically excused or have a severe disability. Students with moderate disabilities or limited English proficiency must take both tests, but they may be granted special accommodations (additional time, translation services, etc.) at the discretion of school or state administrators. In the analysis, test scores are normalized to have a mean of zero and a standard deviation of one for each grade and year across the entire NYC sample.

I construct measures of attendance, behavioral problems, and grade point average (GPA) using the NYC DOE data. Attendance is measured as the number of days present divided by the number of days present plus the number of days absent.10 Behavioral problems are measured as the total number of behavioral incidences on record each year. Attendance, behavioral problems, and GPA were normalized to have a mean of zero and a standard deviation of one by grade level each year in the full NYC sample.

I use a parsimonious set of controls to aid in precision and to correct for any potential imbalance between the treatment and control groups. The most important controls are achievement test scores from previous years, which I include in all regressions. The previous year's test scores are available for most students who were in the district in the previous year.11 I also include an indicator variable that takes on the value of one if a student is missing a test score from a previous year and zero otherwise. Other individual-level controls include a mutually exclusive and collectively exhaustive set of race dummies and indicators for free lunch eligibility, special education status, and English language learner status. See the appendix, available in the online version of the Journal of Labor Economics, for further details.

I also construct school-level controls. To do this, I assign each student who was present at the beginning of the year, that is, in September, to the first school attended. I construct the school-level variables on the basis of these school assignments by taking the mean value of student demographic variables and student test scores for each school. Variables constructed this way include the percentages of black, Hispanic, special education, limited English proficiency, and free-lunch-eligible students, as well as the total number of behavioral incidences in a school in the 2006–7 academic year.

I construct teacher-level variables from NYC Human Resources (HR) records and teacher value added data. Teacher gender and race are constructed by taking the most recent nonmissing records from the 2004–10 HR records. Teacher experience, or years of experience as a teacher, is taken from the October 2007 HR file. Teacher value added (TVA) data are available from the 2006–7 academic year through the 2008–9 academic year. I take the TVA measured in standard deviation units and standardize the number by grade level each year to have a mean of zero and a standard deviation of one in the full city sample. For teachers who taught more than one grade, I take the average of TVA across grade levels. In addition, I construct cumulative teacher absences using HR data on teacher attendance through May of each academic year.

Table 3 provides pretreatment descriptive statistics. Columns 1–4 show the mean and standard deviation of student and teacher characteristics in all schools in the NYC district, the experimental sample, the treatment group, and the control group. In addition, the last two columns show the p-value of the difference between the mean of the entire district and that of the experimental sample and the p-value of the difference between the treatment group and the control group. The table of summary statistics shows that most student and teacher characteristics are balanced between the treatment and control groups. The only exceptions are the percentage of white teachers, the percentage of Asian teachers, and TVA in math in the 2006–7 academic year.

10 The DOE does not collect absence data from schools after the first 180 days, so the attendance rate calculated is the rate over the first 180 days.
11 See table 1 for exact percentages of experimental-group students with valid test scores from previous years.
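The grade-by-year standardization applied throughout (to test scores, TVA, attendance, behavioral problems, and GPA) can be sketched with pandas; the column names here are assumptions for illustration, not the DOE's schema:

```python
# Sketch of z-scoring an outcome within each grade-by-year cell of the
# full city sample, as described in the text.
import pandas as pd

def standardize_within(df: pd.DataFrame, col: str, by: list) -> pd.Series:
    """z-score `col` within each group defined by `by` (e.g., grade and year)."""
    grouped = df.groupby(by)[col]
    return (df[col] - grouped.transform("mean")) / grouped.transform("std")

# Toy data: two grades in one year.
df = pd.DataFrame({
    "grade": [3, 3, 3, 4, 4, 4],
    "year":  [2008] * 6,
    "math":  [650.0, 660.0, 670.0, 655.0, 665.0, 675.0],
})
df["math_z"] = standardize_within(df, "math", ["grade", "year"])
# Within each grade cell the standardized scores have mean zero.
```

Standardizing within grade-by-year cells puts all outcomes on a common σ scale, which is why the treatment effects later in the article are reported in standard deviation units.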
B. Research Design
The simplest and most direct test of any teacher incentive program would be to examine the outcome of interest (e.g., test scores) regressed on an indicator for enrollment in the teacher incentive program for grade g in school s in year t (incentive_{i,g,s,t}) and controls for basic student and school characteristics, X_i and X_s, respectively:
outcome_{i,g,s,t} = α₁ + β₁X_i + γ₁X_s + δ_g + ζ_t + π₁ incentive_{i,g,s,t} + ε_{i,g,s,t}.
Yet, if schools select into teacher incentive programs because of important unobserved determinants of academic outcomes, estimates obtained using the above equation may be biased. To confidently identify the causal impact of incentive programs, I must compare participating and nonparticipating schools that would have had the same academic outcomes had they both participated in the program. By definition, this involves an unobservable counterfactual.

Table 3
Descriptive Statistics and Covariate Balance

| | New York City District (1) | Experimental Sample (2) | Treatment Group (3) | Control Group (4) | p-Value, (1) and (2) | p-Value, (3) and (4) |
|---|---|---|---|---|---|---|
| Percent white | .119 (.192) | .014 (.022) | .013 (.017) | .015 (.027) | .00 | .51 |
| Percent black | .365 (.291) | .411 (.265) | .407 (.266) | .415 (.265) | .00 | .76 |
| Percent Hispanic | .405 (.256) | .552 (.266) | .556 (.267) | .546 (.265) | .00 | .69 |
| Percent Asian | .105 (.162) | .018 (.031) | .018 (.025) | .019 (.039) | .00 | .72 |
| Percent other race | .006 (.007) | .005 (.005) | .005 (.005) | .005 (.005) | .01 | .98 |
| Percent male | .502 (.080) | .515 (.071) | .514 (.080) | .517 (.056) | .00 | .65 |
| Percent female | .498 (.080) | .485 (.071) | .486 (.080) | .483 (.056) | .00 | .65 |
| Percent free lunch eligible | .858 (.181) | .959 (.038) | .960 (.040) | .959 (.036) | .00 | .92 |
| Percent special education | .083 (.056) | .110 (.051) | .108 (.050) | .113 (.053) | .00 | .30 |
| Percent English language learner | .135 (.143) | .191 (.149) | .189 (.143) | .195 (.158) | .00 | .71 |
| 2006–7 ELA score | 652.952 (17.199) | 639.324 (9.068) | 639.614 (9.214) | 638.909 (8.874) | .00 | .49 |
| 2006–7 math score | 671.327 (20.413) | 657.193 (14.880) | 657.829 (15.025) | 656.283 (14.680) | .00 | .36 |
| Eighth grade ELA score | 661.061 (20.446) | 646.489 (12.854) | 647.425 (9.554) | 645.143 (16.574) | .00 | .44 |
| Eighth grade math score | 672.029 (21.222) | 655.689 (6.750) | 656.127 (6.314) | 655.078 (7.372) | .00 | .50 |
| School size / 100 | 6.832 (5.738) | 6.356 (4.118) | 6.257 (4.071) | 6.498 (4.192) | .05 | .57 |
| 2006–7 school progress report card score | 53.964 (14.377) | 51.680 (15.968) | 52.100 (16.160) | 51.084 (15.723) | .00 | .53 |
| 2006–7 individual behavioral incidences | .142 (.167) | .180 (.162) | .185 (.171) | .174 (.149) | .00 | .49 |
| 2006–7 school-wide behavioral incidences | 98.435 (147.081) | 117.699 (125.257) | 120.335 (126.853) | 113.933 (123.231) | .00 | .62 |
| Percent female teachers | .822 (.128) | .810 (.101) | .813 (.103) | .805 (.100) | .05 | .46 |
| Percent male teachers | .178 (.128) | .190 (.101) | .187 (.103) | .195 (.100) | .05 | .46 |
| Percent white teachers | .566 (.257) | .378 (.176) | .396 (.186) | .351 (.158) | .00 | .03 |
| Percent black teachers | .231 (.232) | .337 (.226) | .321 (.225) | .360 (.226) | .00 | .14 |
| Percent Hispanic teachers | .154 (.155) | .246 (.169) | .247 (.175) | .245 (.161) | .00 | .89 |
| Percent Asian teachers | .044 (.073) | .034 (.030) | .030 (.027) | .038 (.033) | .00 | .02 |
| Percent other race teachers | .002 (.007) | .003 (.007) | .003 (.008) | .003 (.007) | .13 | .92 |
| Teacher salary / 1,000 | 68.048 (7.397) | 66.512 (4.043) | 66.430 (4.035) | 66.628 (4.067) | .00 | .67 |
| Teacher experience | 8.300 (2.600) | 7.900 (2.109) | 7.919 (2.136) | 7.873 (2.078) | .00 | .85 |
| 2006–7 ELA TVA | .044 (.557) | .055 (.528) | .067 (.543) | .038 (.508) | .67 | .65 |
| 2006–7 math TVA | .024 (.560) | .041 (.590) | .097 (.602) | −.040 (.565) | .52 | .04 |
| Number of teachers in school | 57.276 (27.485) | 59.061 (21.618) | 59.130 (21.768) | 58.961 (21.486) | .17 | .95 |
| Percent missing 2006–7 ELA score | .506 (.262) | .501 (.258) | .503 (.259) | .498 (.256) | .70 | .86 |
| Percent missing 2006–7 math score | .496 (.267) | .489 (.265) | .491 (.266) | .486 (.263) | .58 | .86 |
| Percent missing eighth grade ELA score | .262 (.173) | .280 (.201) | .261 (.187) | .306 (.220) | .26 | .32 |
| Percent missing eighth grade math score | .218 (.147) | .223 (.183) | .211 (.163) | .240 (.208) | .67 | .49 |
| Missing 2006–7 individual behavioral incidences | .112 (.081) | .115 (.071) | .114 (.073) | .115 (.068) | .49 | .93 |
| Missing 2006–7 school-wide behavioral incidences | .030 (.170) | .000 (.000) | .000 (.000) | .000 (.000) | .00 | ... |
| Percent missing 2006–7 ELA TVA | .875 (.060) | .878 (.057) | .877 (.056) | .880 (.059) | .35 | .70 |
| Percent missing 2006–7 math TVA | .871 (.058) | .872 (.052) | .870 (.053) | .876 (.049) | .73 | .34 |
| N | 1,417 | 396 | 233 | 163 | ... | ... |

NOTE.—ELA = English language arts. Each column reports summary statistics from different samples. All variables are school-level measures or averages. Teacher value added (TVA) was standardized to have a mean of zero and a standard deviation of one by test grade level in the full city sample before the school average was taken. Means and standard deviations (in parentheses) are reported. The p-values of the differences between samples are reported in the last two columns.
In the forthcoming analysis, the counterfactual is constructed by exploiting the random assignment of schools into treatment and control groups. Restricting the analysis to schools that were selected (by the UFT and the DOE) to be included in the experimental sample, I can estimate the causal impact of being offered the chance to participate in a teacher incentive program by comparing the average outcomes of schools randomly selected for treatment with the average outcomes of schools randomly selected for control. Schools that were not chosen to participate form the control group corresponding to the counterfactual state that would have occurred in treatment schools had they not been offered the chance to participate.

Let T_s be an indicator for a treatment school. The mean difference in outcomes between treatment schools (T_s = 1) and control schools (T_s = 0) is known as the "intent-to-treat" (ITT) effect and is estimated by regressing student outcomes on T_s. In theory, predetermined student and school characteristics (X_i and X_s) should have the same distribution across treatment and control groups because they are statistically independent of treatment assignment. The specifications estimated are of the form
outcome_{i,g,s,t} = α₂ + β₂X_i + γ₂X_s + π₂T_s + δ_g + ζ_t + ε_{i,g,s,t},
where the vector of student-level controls, X_i, includes a mutually exclusive and collectively exhaustive set of race dummies, predetermined measures of the outcome variables when possible (i.e., preintervention test scores or the number of behavioral incidences), and indicators for gender, free lunch eligibility, special education status, and English language learner status. The set of school-level controls, X_s, includes the school's prelottery number of behavioral incidences and the percentages of students at a school who are black, Hispanic, free-lunch eligible, English language learners, and special education students. The ITT is an average of the causal effects for students enrolled in treatment schools compared to those enrolled in control schools at the time of random assignment. The ITT therefore captures the causal effect of being offered the chance to participate in the incentive program, not of actually participating.

Under several assumptions (that treatment group assignment is random, that control schools are not allowed to participate in the incentive program, and that treatment assignment affects outcomes only through program enrollment), I can also estimate the causal impact of actually participating in the incentive program. This parameter, commonly known as the "treatment-on-treated" (TOT) effect, measures the average effect of treatment on schools that chose to participate in the merit pay program. The TOT parameter can be estimated through a two-stage least-squares regression of student outcomes on participation, with original treatment assignment (T_s) as an instrumental variable (IV) for participation. I use the number of years a student spent in treated schools as the actual participation variable. The first-stage equations for IV estimation take the form
incentive_{i,g,s,t} = α₃ + β₃X_i + γ₃X_s + δ_g + ζ_t + π₃T_s + ε_{i,g,s,t},
where π₃ captures the effect of treatment assignment (T_s) on the average number of years a student spends in a treatment school. The TOT is the estimated difference in outcomes between students in schools that were induced into participating through treatment assignment and those in the control group who would have enrolled had they been offered the chance.
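The relationship between the first stage, the ITT, and the TOT can be illustrated on simulated data; every number below is simulated for illustration and is unrelated to the article's estimates:

```python
# Sketch of ITT and TOT (Wald/IV) estimation with random assignment T
# as the instrument for years of participation.
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
T = rng.integers(0, 2, n)              # random assignment (the instrument)
years = T * rng.choice([1, 2, 3], n)   # years in a treated school (zero for controls)
# Outcome with a small negative per-year effect of participation.
y = 0.5 * rng.standard_normal(n) - 0.02 * years

# ITT: simple difference in means between assigned-treatment and control.
itt = y[T == 1].mean() - y[T == 0].mean()

# First stage: effect of assignment on years of participation.
first_stage = years[T == 1].mean() - years[T == 0].mean()

# TOT (Wald/IV estimate): ITT scaled by the first stage.
tot = itt / first_stage
```

Because assigned students accumulate more than one year of exposure on average, the first-stage coefficient exceeds one and the TOT is smaller in magnitude than the ITT, the same pattern reported in the tables that follow.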
V. The Impact of Teacher Incentives
A. Student Achievement
Table 4 presents first-stage, ITT, and TOT estimates of the effect ofteacher incentives on state math and ELA test scores. Columns 1–3 reportestimates from the elementary school sample, columns 4–6 report esti-mates from middle schools, and columns 7–9 present results for a pooledsample of elementary and middle schools. I present both raw estimatesand those that contain the parsimonious set of controls described in theprevious section. Note that the coefficients in the table are normalized sothat they are in standard deviation units and represent 1-year impacts.Surprisingly, all estimates of the effect of teacher incentives on student
achievement are negative in both elementary and middle school; themiddle school results are significant at the 5% level. The ITT effect of theteacher incentive scheme is 20.014j ð0.020Þ in reading and 20.018jð0.024Þ in math for elementary schools and20.030j ð0.011Þ in reading and20.046j ð0.018Þ in math for middle schools. The effect sizes in middleschool are nontrivial—a student who attends a participating middle schoolfor 3 years of the experiment is expected to lose 0.090j in reading and0.138j in math. The TOT estimates are smaller than the ITT estimates, asthe first-stage coefficients are all larger than one.Table 5 presents results similar to table 4, but for high schools. High
school students do not take the New York state exams. Instead, they haveto take and score 55 or above in regents exams in five key subject areas tograduate with a local diploma. To graduate with a regents diploma, stu-dents had to score 65 or above in the same five required regents examareas. To graduate with an advanced regents diploma, students had tomeet the requirements for regents diploma and, in addition, score a 65or higher in the mathematics B, life science, physical science, and foreign
This content downloaded from 129.119.32.27 on Fri, 3 May 2013 10:10:44 AMAll use subject to JSTOR Terms and Conditions
Tab
le4
The
Impa
ctof
Teach
erIncentives
onStud
entAch
ievemen
t,Elemen
tary
andMiddleScho
ol
Elementary*
Middley
PooledSample
First
Stage
ð1Þ
ITT ð2Þ
TOT
ð3Þ
First
Stage
ð4Þ
ITT ð5Þ
TOT
ð6Þ
First
Stage
ð7Þ
ITT ð8Þ
TOT
ð9Þ
ELA:
Raw
1.319
2.013
2.010
1.137
2.031
2.027
1.236
2.020
2.017
ð.057Þ
ð.023Þ
ð.017
Þð.0
45Þ
ð.012Þ
ð.011Þ
ð.043Þ
ð.014Þ
ð.011
ÞControl
1.323
2.014
2.010
1.137
2.030
2.026
1.236
2.020
2.016
ð.055Þ
ð.020Þ
ð.015
Þð.0
46Þ
ð.011Þ
ð.010Þ
ð.043Þ
ð.013Þ
ð.010
ÞN
students
175,894
175,894
175,894
147,141
147,141
147,141
323,317
323,317
323,317
Nschools
230
230
230
136
136
136
323
323
323
Math:
Raw
1.318
2.020
2.015
1.136
2.051
2.045
1.235
2.033
2.027
ð.057Þ
ð.026Þ
ð.020
Þð.0
45Þ
ð.019Þ
ð.017Þ
ð.043Þ
ð.017Þ
ð.014
ÞControl
1.322
2.018
2.014
1.136
2.046
2.040
1.235
2.032
2.026
ð.055Þ
ð.024Þ
ð.018
Þð.0
46Þ
ð.018Þ
ð.016Þ
ð.042Þ
ð.016Þ
ð.013
ÞN
students
176,387
176,387
176,387
147,493
147,493
147,493
324,172
324,172
324,172
Nschools
230
230
230
136
136
136
323
323
323
NOTE.—
Eachcolumnreportsresultsfrom
separateregressions.DependentvariablesarethestateELA
ðEnglishlangu
ageartsÞa
ndmathscoresstandardized
tohaveameanof
zero
andastandarddeviationofonebygradeleveleach
academ
icyearin
thefullcity
sample.Scoresfrom
all3years
ofim
plementationareused.First-stage
regressionusesthe
years
receivingtreatm
entas
theoutcomevariable
andreportsthecoefficientonthedummyvariable
forbeingrandomized
into
thetreatm
entgroup.Theintent-to-treat
ðITTÞ
estimates
report
theeffect
ofbeingassign
edto
thetreatm
entgroupusingtheordinaryleast-squares
method.Thetreatm
ent-on-treated
ðTOTÞe
stim
ates
report
theeffect
of
spendingtimein
treatedschools,usingtherandom
assign
mentinto
thetreatm
entgroupas
theinstrument.Raw
regressionscontrolfor2006–7statetest
scores,test
gradelevel
dummies,andyearfixedeffects.Controlregressionsincludestudentdem
ographic
variablesandschoolcharacteristicsas
additional
controlvariables.Standarderrors,reported
inparentheses,areclustered
atschoollevel.Number
ofstudentobservationsisreported
asan
aggregatetotalofobservationsforall3years
ofim
plementation.Number
ofschool
observationsisreported
forthe2007
–8schoolyearðyear1Þ.
*Sample
within
schoolsdesignated
aselem
entary
intheNew
York
CityDepartm
entofEducationcity
progressreport
files,as
wellas
students
ingradelevels3–
5in
schools
designated
asK–8.
ySample
within
schoolsdesignated
asmiddle
schools,as
wellas
students
ingradelevels6–
8ofschoolsdesignated
asK–8.
This content downloaded from 129.119.32.27 on Fri, 3 May 2013 10:10:44 AMAll use subject to JSTOR Terms and Conditions
language regents exams.12 Table 5 presents first-stage, ITT, and TOT es-timates for the impact of teacher incentives on comprehensive English,
Teacher Incentives and Student Achievement 393
mathematics, science, US history, and global history regents exam scores.All exam scores were standardized to have a mean of zero and a standarddeviation of one in the full city sample each academic year.Similar to the analysis of elementary and middle schools, there is no
evidence that teacher incentives had a positive effect on achievement.Estimates of the effect of teacher incentives on high school achievementare all small and statistically insignificant. The ITT effect on the Englishregents exam score is20.003j ð0.046Þ, the effect on the integrated algebraexam score is 20.020j ð0.033Þ, and the effect on science scores is 20.019jð0.038Þ. The ITT effect on the US history exam score is 20.033j ð0.057Þ,and that on the global history exam score is 20.063j ð0.046Þ. The TOTeffect is of a comparable magnitude.Table 5 also reports treatment effects on 4-year graduation rates. The
dependent variables are a dummy for graduating in 4 years, which takesthe value one if the student graduated in 4 years and zero otherwise, and adummy for graduating in 4 years with a regents diploma, which takes thevalue one if the student graduated with a regents diploma and zero oth-erwise. Students enrolled in treatment schools were 4.4% less likely tograduate in 4 years ðwhich is statistically significant at the 5% levelÞ andwere 7.4% less likely to obtain a regent’s diploma ðstatistically significantat the 10% levelÞ. Note that during the period of the experiment, meangraduation rates fluctuated between 54% and 61%.Table 6 explores heterogeneity in treatment effects across a variety of
subsamples of the data: gender, race, free lunch eligibility, previous yearsstudent test scores, school size, TVA, and teacher salary. The coefficientsin the table are ITT estimates with a parsimonious set of controls. Allcategories are mutually exclusive and collectively exhaustive. The effect ofteacher incentives on achievement does not vary systematically across thesubsamples. For middle school students, the only exceptions are studentswho are free-lunch eligible, are attending larger schools, or are taught bymore experienced teachers ði.e., received higher salariesÞ. Among highschool students, the exceptions are students who are white or Asian,scored in the lowest tercile on eighth grade state tests, or are attending
12 Regents exams are offered in January, June, and August of each academic year
in the following subject areas: comprehensive English, algebra, geometry, trigonom-etry, chemistry, physics, biology, living environment, earth science, world history,US history, and foreign languages. In this article, I present results on comprehensiveEnglish, integrated algebra, living environment, US history, and global history regentsexam scores. Among mathematics and science exam areas, integrated algebra andliving environment were selected because the highest number of students took thoseexams. Using other exam scores gives qualitatively similar results.This content downloaded from 129.119.32.27 on Fri, 3 May 2013 10:10:44 AMAll use subject to JSTOR Terms and Conditions
Tab
le5
The
Impa
ctof
Teach
erIncentives
onStud
entAch
ievemen
t,HighScho
ol
Raw
Control
Coefficient
Standard
Error
Number
of
Students
Number
of
Schools
Coefficient
Standard
Error
Number
of
Students
Number
of
Schools
Regents
exam
scores:
English:
First
stage
1.082
ð.095
Þ55,791
101
1.069
ð.090
Þ55,791
101
ITT
.009
ð.052
Þ55,791
101
2.003
ð.046
Þ55,791
101
TOT
.009
ð.048
Þ55,791
101
2.003
ð.043
Þ55,791
101
Mathem
atics:
First
stage
1.111
ð.058
Þ55,785
134
1.101
ð.060
Þ55,785
134
ITT
2.019
ð.031
Þ55,785
134
2.020
ð.033
Þ55,785
134
TOT
2.017
ð.028
Þ55,785
134
2.018
ð.029
Þ55,785
134
Science:
First
Stage
1.042
ð.066
Þ57,364
109
1.028
ð.067
Þ57,364
109
ITT
2.021
ð.039
Þ57,364
109
2.019
ð.038
Þ57,364
109
TOT
2.020
ð.037
Þ57,364
109
2.018
ð.037
Þ57,364
109
UShistory:
First
stage
1.107
ð.094
Þ50,315
901.089
ð.090
Þ50,315
90IT
T2.028
ð.064
Þ50,315
902.033
ð.057
Þ50,315
90TOT
2.025
ð.058
Þ50,315
902.030
ð.052
Þ50,315
90Global
history:
First
stage
1.008
ð.083
Þ61,892
89.987
ð.084
Þ61,892
89IT
T2.057
ð.045
Þ61,892
892.063
ð.046
Þ61,892
89TOT
2.057
ð.045
Þ61,892
892.063
ð.046
Þ61,892
89
394
This content downloaded from 129.119.32.27 on Fri, 3 May 2013 10:10:44 AMAll use subject to JSTOR Terms and Conditions
Raw
Control
Coefficient
Standard
Error
Number
of
Students
Number
of
Schools
Coefficient
Standard
Error
Number
of
Students
Number
of
Schools
Four-year
graduation:
Graduated:
First
stage
.840
ð.078Þ
27,995
87.830
ð.073
Þ27,995
87IT
T2.056
ð.024Þ
27,995
872.044
ð.021
Þ27,995
87TOT
2.066
ð.029Þ
27,995
872.053
ð.026
Þ27,995
87Regents
diploma:
First
stage
.996
ð.084Þ
15,803
85.985
ð.079
Þ15,803
85IT
T2.077
ð.045Þ
15,803
852.074
ð.043
Þ15,803
85TOT
2.078
ð.046Þ
15,803
852.075
ð.044
Þ15,803
85
NOTE.—
Eachrowreportsresultsfrom
differentregressions.Thedependentvariablesareregentsexam
scoresin
comprehensive
English,m
athem
aticsðin
tegrated
algebraÞ,s
cience
ðlivingenvironmentÞ,
UShistory,andglobal
history
forstudents
ingrades
8–12
andhighschoolgraduationoutcomes.Regents
exam
scoresarestandardized
byexam
typeeach
academ
icyearto
haveameanofzero
andastandardofdeviationofonein
thefullcity
sample.Graduationoutcomeiscoded
asdummyvariablesthat
takethevalueoneifthe
studentgraduated
highschoolorifthestudentgraduated
witharegentsdiploma,respectively,andzero
otherwise.Firststageusestheyears
that
thestudentreceived
treatm
entas
theoutcomevariableandreportsthecoefficientonthedummyvariableforbeingin
thetreatm
entgroup.T
heintent-to-treat
ðITTÞe
stim
ates
reporttheeffect
ofbeingassign
edto
thetreatm
entgroupusingtheordinaryleast-squares
method.Thetreatm
ent-on-treated
ðTOTÞe
stim
ates
report
theeffect
ofspendingtimein
treatedschools,usingtherandom
assign
mentinto
thetreatm
entgroupas
theinstrument.Raw
regressionscontrolforeigh
thgradeELA
ðEnglishlangu
ageartsÞa
ndmathstatetestscoresforstudentsin
grades
9–12
andseventh
gradeELA
andmathstatetestscoresforstudentsin
eigh
thgrade.Su
bstitutingseventh
gradeprevioustestscoresforallgrades
does
notqualitativelyalterresults.Raw
regressionsalso
includegradefixedeffectsandyearfixedeffects.Controlregressionscontrolforstudentdem
ographicsandschoolcharacteristicsin
addition.Standarderrors
are
clustered
atschoollevel.Number
ofstudentobservationsisreported
asan
aggregatetotalofobservationsforall3years
ofim
plementation.Number
ofschoolobservationsis
reported
forthe2007
–8schoolyearðyear1Þ.
395
This content downloaded from 129.119.32.27 on Fri, 3 May 2013 10:10:44 AMAll use subject to JSTOR Terms and Conditions
Tab
le6
The
Impa
ctof
Teach
erIncentives
onAlterna
tive
Outcomes
Elementary*
Middley
Highz
                         Elementary School*               Middle School†                 High School‡
                     First Stage    ITT      TOT      First Stage    ITT      TOT      First Stage    ITT      TOT

Attendance rate          1.308     −.017    −.013        1.121     −.019    −.017         .915     −.014    −.016
                        (.054)    (.021)   (.016)       (.045)    (.025)   (.022)       (.068)    (.054)   (.059)
  N students           181,393   181,393  181,393      154,017   154,017  154,017      207,718   207,718  207,718
  N schools                230       230      230          136       136      136           88        88       88

Behavior problems        1.308     −.000    −.000        1.121      .010     .009         .915      .061     .067
                        (.054)    (.017)   (.013)       (.045)    (.022)   (.019)       (.068)    (.035)   (.039)
  N students           181,393   181,393  181,393      154,017   154,017  154,017      207,718   207,718  207,718
  N schools                230       230      230          136       136      136           88        88       88

GPA                      1.387      .005     .004        1.153     −.003    −.003        1.161     −.004    −.003
                        (.094)    (.041)   (.029)       (.050)    (.031)   (.027)       (.082)    (.033)   (.028)
  N students            45,410    45,410   45,410      128,764   128,764  128,764      123,434   123,434  123,434
  N schools                201       201      201          136       136      136          ...       ...      ...

Predictive ELA           1.069     −.022    −.021         .947     −.021    −.022          ...       ...      ...
                        (.042)    (.017)   (.016)       (.042)    (.019)   (.020)          ...       ...      ...
  N students           108,515   108,515  108,515       49,117    49,117   49,117          ...       ...      ...
  N schools                230       230      230          136       136      136          ...       ...      ...

Predictive math          1.063     −.026    −.025         .949     −.049    −.052          ...       ...      ...
                        (.042)    (.020)   (.019)       (.042)    (.022)   (.024)          ...       ...      ...
  N students           108,078   108,078  108,078       48,777    48,777   48,777          ...       ...      ...
  N schools                230       230      230          136       136      136          ...       ...      ...

NOTE.—ITT = intent to treat; TOT = treatment on treated; ELA = English language arts. Each column reports results from separate regressions. The dependent variables are attendance rate, the number of behavioral problems, annual GPA (grade point average), and spring predictive state exam scores from each academic year. All outcome variables are standardized by grade level each academic year to have a mean of zero and a standard deviation of one in the full city sample. The regression specification used is the same as in tables 4 and 5. Standard errors, reported in parentheses, are clustered at the school level. Number of student observations is reported as an aggregate total of observations for all 3 years of implementation. Number of school observations is reported for the 2007–8 school year (year 1).
* Sample within schools designated as elementary in the New York City Department of Education city progress report files, as well as students in grade levels 3–5 in schools designated as K–8.
† Sample within schools designated as middle schools, as well as students in grade levels 6–8 of schools designated as K–8.
‡ Students in grade levels 9–12.
This content downloaded from 129.119.32.27 on Fri, 3 May 2013 10:10:44 AMAll use subject to JSTOR Terms and Conditions
larger schools. Students in these subsamples seem to be affected more negatively by teacher incentives.
Teacher Incentives and Student Achievement 397
The estimates above use the sample of students for which I have achievement test scores. If students in treatment and control schools have different rates of selection into this sample, my results may be biased. A simple test for selection bias is to investigate the impact of the treatment offer on the probability of entering the sample. The results of this test, although not shown here in tabular form, demonstrate that the coefficient on treatment is small and statistically zero.13 This suggests that differential attrition is not likely to be a concern in interpreting the results.
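The selection test described above is simply a regression of an in-sample indicator on the treatment offer, with standard errors clustered at the school level. A minimal sketch on simulated data follows; the data-generating process and all variable names here are hypothetical illustrations, not the article's data or code.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical setup: 200 schools, half randomly offered treatment,
# 50 students per school; whether a student appears in the test-score
# sample is unrelated to treatment here, so the coefficient should be ~0.
n_schools, n_students = 200, 50
school_treated = rng.permutation(np.repeat([0, 1], n_schools // 2))
treat = np.repeat(school_treated, n_students)  # student-level offer
in_sample = (rng.random(n_schools * n_students) < 0.9).astype(float)

# With a single binary regressor, the OLS coefficient is a difference
# in means between offered and non-offered students.
coef = in_sample[treat == 1].mean() - in_sample[treat == 0].mean()

# Cluster-robust SE: treatment varies only at the school level, so
# collapsing to school means and comparing the two groups is equivalent.
school_means = in_sample.reshape(n_schools, n_students).mean(axis=1)
m_t = school_means[school_treated == 1]
m_c = school_means[school_treated == 0]
se = np.sqrt(m_t.var(ddof=1) / len(m_t) + m_c.var(ddof=1) / len(m_c))

print(f"coef = {coef:.4f}, clustered SE = {se:.4f}")
```

A coefficient close to zero relative to its clustered standard error is what "small and statistically zero" means in the text.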
B. Alternative Outcomes
Thus far, I have concentrated on student progress on state assessments, the most heavily weighted element of NYC's incentive scheme. Now, I introduce two additional measures of student performance and three measures of school environment: GPAs, predictive math and ELA exams, school environment surveys, attendance, and behavioral incidences. Many of these outcomes enter directly into the incentive scheme and may be affected by it.

Table 7 shows estimates of the impact of teacher incentives on this set of alternative outcomes. Predictive assessments are highly correlated with the state exams and are administered to all public school students in grades 3–8 in October and May. The DOE gives several different types of predictive exams, and schools can choose to use one of the options depending on their needs. In this article, I analyze math and ELA test scores from the spring Acuity Predictive Assessment.14 Each student's attendance rate is calculated as the total number of days present in any school divided by the total number of days enrolled in any school. Attendance rate was standardized by grade level to have a mean of zero and a standard deviation of one each academic year across the full city sample. Grades were extracted from files containing the transcripts of all students in the district.15 Elementary school students received letter grades, which were converted to a 4.0 scale, and middle and high school students received numeric grades that ranged from 1 to 100. Students' grades from each academic year were averaged to yield an annual GPA. As with test scores, GPAs were standardized to have a mean of zero and a standard deviation of one among students in the same grade with the same grade scale across the school district. The number of behavioral incidences was pulled from behavior data, which record the date, level, location, and a short description of all
13. Tabular results are available from the author upon request.
14. Eighth-grade students did not take the spring predictive tests because they did not have to take state exams in the following year.
15. Elementary school transcripts are not available for all schools each academic year. High school transcripts were not available until the 2008–9 academic year.
Table 7. The Impact of Teacher Incentives on Teacher Behavior

                         Elementary School                Middle School                     K–8
                     First Stage    ITT      TOT      First Stage    ITT      TOT      First Stage    ITT      TOT

Retention in district    1.224      .002     .002        1.233     −.006    −.005        1.290      .010     .007
                        (.052)    (.006)   (.005)       (.082)    (.011)   (.009)       (.091)    (.013)   (.010)
  N teachers            21,700    21,700   21,700        8,289     8,289    8,289        4,693     4,693    4,693
  N schools                187       187      187           85        85       85           40        40       40

Retention in school      1.224     −.007    −.005        1.233     −.027    −.022        1.290     −.000    −.000
                        (.052)    (.012)   (.010)       (.082)    (.017)   (.014)       (.091)    (.027)   (.021)
  N teachers            21,700    21,700   21,700        8,289     8,289    8,289        4,693     4,693    4,693
  N schools                187       187      187           85        85       85           40        40       40

Personal absences        1.220      .275     .225        1.225     −.440    −.359        1.290      .613     .475
                        (.053)    (.212)   (.174)       (.083)    (.403)   (.325)       (.091)    (.496)   (.382)
  N teachers            18,543    18,543   18,543        6,727     6,727    6,727        3,977     3,977    3,977
  N schools                187       187      187           85        85       85           40        40       40

NOTE.—Each column reports results from different regressions. The dependent variables are retention in district, retention in school, and personal absences in a year. Retention in district is coded as a dummy variable that takes the value one if the teacher stays in the New York City public school district in the next academic year and zero otherwise. Retention in school is coded similarly as a dummy variable that takes the value one if the teacher stayed in the same school the next academic year. Teacher absences is the number of days absent from school for personal reasons in an academic year. Outcome variables from the first 2 years of implementation are used. First stage uses the number of years receiving treatment as the outcome variable and reports the coefficient on the dummy variable for being in the treatment group. The intent-to-treat (ITT) estimates report the effect of being assigned to the treatment group using the ordinary least-squares method. The treatment-on-treated (TOT) estimates report the effect of teaching at treated schools, using the random assignment into the treatment group as the instrument. Teacher demographic variables and the 2006–7 teacher value are included as controls. Standard errors, reported in parentheses, are clustered at the school level. Number of teacher observations is reported as an aggregate total of observations for the first 2 years of implementation. Number of school observations is reported for the 2007–8 school year (year 1).
incidences. The total number of incidences attributed to a student in an academic year across all schools and grades he attended was calculated and standardized by grade level to have a mean of zero and a standard deviation of one each academic year across the full city sample.

Results from predictive assessments provide an identical portrait to that depicted by state test scores. The effect of the teacher incentive program on predictive ELA exams is negative and statistically insignificant, with the ITT effect equal to −0.022σ (0.017) in the elementary school sample and −0.021σ (0.019) in the middle school sample. The ITT effect on predictive math exams is −0.026σ (0.020) in the elementary school sample and −0.049σ (0.022) in the middle school sample. Note that the effect of teacher incentives on middle school students' predictive math exam scores is negative and statistically significant, consistent with the state test score findings.

Teacher incentives have a statistically insignificant effect on other alternative student outcomes. The ITT and TOT effects on attendance rate, which enters directly in the calculation of progress report card scores, are negative across all school levels. The ITT effect is estimated to be −0.017σ (0.021) in the elementary school sample, −0.019σ (0.025) in the middle school sample, and −0.014σ (0.054) in the high school sample. The effects on behavioral incidences and GPAs are similarly small and insignificant.
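All of these outcomes are standardized within grade-by-year cells to mean zero and standard deviation one. A minimal sketch of that kind of group-wise z-scoring, on synthetic data (the variable names and the raw-score scale are hypothetical, not taken from the article's data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical records: a grade level and a raw outcome for each student.
grade = rng.integers(3, 9, size=1_000)  # grades 3-8
raw = 600 + 10 * grade + 25 * rng.standard_normal(1_000)  # raw scale score

# Standardize within each grade: subtract the grade mean and divide by
# the grade standard deviation, so every grade has mean 0 and SD 1.
z = np.empty_like(raw)
for g in np.unique(grade):
    mask = grade == g
    z[mask] = (raw[mask] - raw[mask].mean()) / raw[mask].std()
```

In the article the cells are grade-by-academic-year (and, for GPAs, grade-by-grading-scale); the same transform just runs over finer groups.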
C. Teacher Behavior
In this section, I estimate the impact of the teacher incentive program on two important teacher behaviors: absences and retention. I assign teachers to treatment or control groups if they were assigned to a treatment or a control school, respectively, in October 2007. I only include teachers who were teaching at schools in the randomization sample in 2007 and ignore all who enter the system afterward.

I measure retention in two ways: in school and in district, both of which were constructed using HR data provided by the DOE. Retention in school was constructed as a dummy variable that takes the value one if a teacher was associated with the same school in the following academic year and zero otherwise. Retention in district is more complicated. Like the coding of retention in school, I construct a dummy variable that takes the value one if a teacher was found in the NYC school district's HR file in the following academic year and zero otherwise. But there are two important caveats. First, charter schools and high schools are not included in the NYC public school district's HR files, and, therefore, some teachers who left the district may have simply moved to teach at charter schools or high schools in the district. As the same types of teacher certificates qualify teachers to teach in both middle and high schools, it is possible that some teachers who left the district from middle schools went to teach at high schools. It is unlikely, however, that a significant number of elementary school teachers obtained new certificates to qualify for teaching in middle schools. Therefore, I divided the sample of teachers into elementary, middle, and K–8 school samples and estimated the treatment effects separately on each sample. To measure absences, I obtained the number of personal absences as of May for teachers who did not exit the system.

Table 8 presents results on the impact of teacher incentives on measures of teacher behavior. There is no evidence that teacher incentives affect teacher attendance or retention in either district or school. Elementary school teachers in treatment schools were 0.2% more likely to stay in the NYC school district, were 0.7% less likely to stay at the same school in the following academic year, and took 0.275 more days of personal absences. Middle school teachers exhibit similar patterns. None of these effects are statistically significant, nor are they economically meaningful.
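As the table notes describe, the ITT estimate is an OLS regression of the outcome on the randomized treatment offer, and the TOT estimate instruments treatment exposure (years in a treated school) with that offer. With no covariates, the TOT reduces to the Wald ratio, the ITT divided by the first stage. A minimal sketch on simulated data (the data-generating process and all names here are hypothetical, chosen only to illustrate the estimators):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Hypothetical experiment: z is the random offer; d is years of treatment
# actually received (imperfect compliance); y is the outcome.
z = rng.integers(0, 2, size=n).astype(float)
d = z * rng.binomial(2, 0.65, size=n)  # offered units receive 0-2 years
true_effect_per_year = -0.02           # chosen for the simulation
y = true_effect_per_year * d + rng.standard_normal(n)

def ols_slope(x, y):
    """Bivariate OLS slope of y on x."""
    return np.cov(x, y, ddof=1)[0, 1] / x.var(ddof=1)

first_stage = ols_slope(z, d)  # effect of the offer on years treated
itt = ols_slope(z, y)          # effect of the offer on the outcome
tot = itt / first_stage        # Wald / IV estimate per treated year

print(f"first stage {first_stage:.3f}, ITT {itt:.4f}, TOT {tot:.4f}")
```

This is why every TOT in the tables is the corresponding ITT scaled down by a first stage greater than one: the first stage counts years of exposure, not a 0/1 indicator. The article's actual regressions add controls and cluster standard errors at the school level, which this sketch omits.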
VI. Discussion
The previous sections demonstrate that the teacher incentive scheme piloted in 200 NYC public schools did not increase achievement. If anything, achievement may have declined as a result of the experiment. Yet, incentive schemes in developing countries have proven successful at increasing achievement.

In this section, I consider four explanations for these stark differences: (1) incentives may not have been large enough, (2) the incentive scheme was too complex, (3) group-based incentives may not be effective, and (4) teachers may not know how they can improve student performance. Using the analysis, along with data gleaned from other experiments, I argue that the most likely explanation is that the NYC incentive scheme, along with all other American pilot initiatives thus far, is too complex and provides teachers with too little control. It is important to note that I cannot rule out the possibility that other unobservable differences between the developing countries and America (e.g., teacher motivation) produce the differences.
A. Incentives Were Not Large Enough
One potential explanation for the stark results is that the incentives simply were not large enough. There are two reasons that the incentives to increase achievement in NYC may have been too small. First, although schools had discretion over how to distribute the incentives to teachers if they met their performance targets, an overwhelming majority of them chose to pay teachers equally. These types of egalitarian distribution methods can induce free riding and undercut individual incentives to put
This content downloaded from 129.119.32.27 on Fri, 3 May 2013 10:10:44 AMAll use subject to JSTOR Terms and Conditions
Table 8. The Impact of Teacher Incentives on Teacher Survey Results

                            Elementary                     Middle                         K–8                          High
                     First Stage   ITT     TOT     First Stage   ITT     TOT     First Stage   ITT     TOT     First Stage   ITT     TOT

Response rate            1.667     .018    .011        1.740     .019    .011        1.883    −.011   −.006        1.559     .037    .024
                        (.074)   (.023)  (.013)       (.103)   (.040)  (.022)       (.100)   (.061)  (.030)       (.136)   (.028)  (.017)
  N schools                187      187     187           96       96      96           40       40      40           87       87      87

Safety/respect           1.667    −.088   −.053        1.740     .210    .121        1.884    −.558   −.296        1.559     .095    .061
                        (.074)   (.126)  (.075)       (.103)   (.167)  (.091)       (.100)   (.263)  (.130)       (.135)   (.158)  (.096)
  N schools                187      187     187           96       96      96           39       39      39           86       86      86

Communication            1.667     .071    .042        1.740     .072    .041        1.884    −.583   −.309        1.559    −.000   −.000
                        (.074)   (.133)  (.078)       (.103)   (.188)  (.102)       (.100)   (.283)  (.138)       (.135)   (.177)  (.108)
  N schools                187      187     187           96       96      96           39       39      39           86       86      86

Engagement               1.667     .046    .027        1.740     .089    .051        1.884    −.403   −.214        1.559     .024    .016
                        (.074)   (.129)  (.076)       (.103)   (.177)  (.097)       (.100)   (.278)  (.134)       (.135)   (.161)  (.098)
  N schools                187      187     187           96       96      96           39       39      39           86       86      86

Academic expectations    1.667    −.006   −.004        1.740     .088    .050        1.884    −.506   −.268        1.559     .101    .065
                        (.074)   (.129)  (.076)       (.103)   (.181)  (.099)       (.100)   (.287)  (.139)       (.135)   (.163)  (.098)
  N schools                187      187     187           96       96      96           39       39      39           86       86      86

NOTE.—Each column reports results from separate regressions. The dependent variables are teacher survey response rates and subscores, standardized by school level (elementary, middle, and high) to have a mean of zero and a standard deviation of one. The intent-to-treat (ITT) estimates report the effect of being assigned to the treatment group using the ordinary least-squares method, and the treatment-on-treated (TOT) estimates report the effect of receiving treatment, with the random assignment as the instrument. Regressions control for student demographics and previous achievement measures. Standard errors are reported in parentheses. Number of school observations is reported for the 2007–8 school year (year 1).
in effort. Moreover, an overwhelming majority of teachers in schools that met the annual target earned an amount close to $3,000. This is less than 4.1% of the average annual teacher salary in the sample. One might think that the bonus was simply not large enough for teachers to put in more effort, although similar incentive schemes in India (3%) and Kenya (2%) were relatively smaller.

Second, the measures used to calculate the progress report card scores directly influence other accountability measures such as the AYP (adequate yearly progress) that determine whether a school will be subjected to regulations or even be closed, which results in all staff losing their jobs. Hence, all teachers in poor-performing schools, including all treatment and control schools in the experiment, have incentives to perform well on the precise measures that were being incentivized. Thus, it is not clear whether the teacher incentive program provides additional incentives, at the margin, for teachers to behave differently.

A brief look at the results of the Project on Incentives in Teaching (POINT), a pilot initiative in Nashville, Tennessee, suggests that a larger incentive in schools that are not under pressure by AYP was still not any more effective. Teachers in POINT treatment schools were selected from the entire school district and could earn up to $15,000 in a year solely on the basis of their students' test scores. Teachers whose performance was at lower thresholds could earn $5,000–$10,000. The maximum amount is roughly 31% of the average teacher salary in Nashville. Springer et al. (2010) find that even though about half of the participating teachers could have reached the lowest bonus threshold if their students answered on average two or three more out of 55 items correctly, student achievement did not increase significantly more in classrooms taught by treatment teachers. Moreover, they report that treatment teachers did not seem to change their instructional practices or effort level. Considering the New York results alongside those of the POINT pilot leaves open the possibility that if the magnitude of rewards is, in fact, too small to influence teacher behavior, then teachers' labor supply could be so inelastic that the investment required to influence behavior at the margin may be too large to be relevant to policy discussions.
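The bonus-to-salary ratios quoted above imply back-of-the-envelope average salaries. This is my own arithmetic as a quick consistency check, not figures reported in the article:

```python
# The article states that a ~$3,000 NYC bonus is less than 4.1% of the
# average annual salary, and that POINT's $15,000 maximum is roughly 31%
# of Nashville's average. Inverting each ratio gives the implied average
# salary in each district.
nyc_implied_salary = 3_000 / 0.041        # roughly $73,000
nashville_implied_salary = 15_000 / 0.31  # roughly $48,000

print(round(nyc_implied_salary), round(nashville_implied_salary))
```

The point of the comparison stands either way: the POINT maximum was nearly an order of magnitude larger relative to salary, yet still produced no detectable achievement gains.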
B. Incentive Scheme Was Too Complex
In the experiment it was difficult, if not impossible, for teachers to know how much effort they should exert or how that effort influences student achievement because of the complexity of the progress report card system used in NYC. For example, the performance score for elementary and middle schools is calculated using the percentage of students at the proficiency level and the median proficiency rating in state tests. Recall that the performance score depends on how a school performs compared to its peer schools that had a similar student achievement level in the previous year and compared to all schools in the district. But it is highly unlikely that teachers can predict at which percentile their school will be placed relative to the peer group and the district in these measures of performance if the school increased overall student achievement by, for example, 1 standard deviation.

Similarly, the POINT pilot in Tennessee, like the pilots in other American school districts, contained an incentive scheme that was dependent on the performance of others rather than simpler incentive schemes such as those in Duflo and Hanna (2005), Glewwe et al. (2010), and Muralidharan and Sundararaman (2011). Lack of experimentation with simple incentive schemes like those that have been successful in the developing world makes it impossible to draw definitive conclusions about their effectiveness in American public schools; however, this growing body of international evidence strongly suggests that teachers respond to simple, individualized incentives by modifying behavior in desirable ways. It is plausible that ambiguities in the incentive structures employed in New York and Tennessee may have served to flatten the function that maps effort into expected reward.
C. Group-Based Rewards Are Ineffective
Although schools were given flexibility to choose their own incentive schemes, the vast majority of them settled on a group-based scheme. Group-based incentive schemes introduce the potential for free riding and may be ineffective under certain conditions. Yet, in some contexts, they have been shown to be effective. For example, Muralidharan and Sundararaman (2011) found that the group incentive scheme in government-run schools in India had a positive and significant effect on student achievement. However, the authors stress that 92% of treatment schools had between two and five teachers, with an average of 3.28 teachers in each treated school. Similar results are obtained in Glewwe et al. (2010), where the average number of teachers per school was 12. Given that NYC public schools have 60 teachers on average, the applicability of the results from these analyses is suspect. When there are only three (or 12) teachers in a school, monitoring and penalizing those teachers who shirk their responsibility is less costly. Whatever the mechanism, the success of international incentives programs in smaller schools suggests that individual- or small-group-focused incentives may be effective modifiers of behavior.

However, Lavy (2002) also suggests that group-based incentives may be effective in larger schools. His nonexperimental evaluation of the teacher incentives intervention in Israel, in which teachers were incentivized on the average number of credit units per student, the proportion of students receiving a matriculation certificate, and the dropout rate, reveals that the program had a positive and significant impact on the average number of credits and test scores. The average number of teachers in the treatment schools in Israel is approximately 80, closer to the average number of teachers in a school in NYC.
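The free-riding intuition above can be made concrete with a stylized model. This is my own illustration, not a derivation from the article: suppose a school's N teachers share a group bonus B equally, school output depends on total effort through a concave function f, and each teacher bears a private convex effort cost c.

```latex
% Stylized free-rider sketch: teacher i chooses effort e_i to maximize
% her share of the group bonus net of her private effort cost.
\[
  \max_{e_i}\; \frac{B}{N}\, f\!\Big(\sum_{j=1}^{N} e_j\Big) - c(e_i)
  \quad\Longrightarrow\quad
  \frac{B}{N}\, f'\!\Big(\sum_{j=1}^{N} e_j\Big) = c'(e_i).
\]
```

The first-order condition shows that the private marginal return to effort is scaled down by 1/N, so (with f concave and c convex) equilibrium individual effort falls as N grows. This is consistent with group incentives working in schools with roughly 3 to 12 teachers but not in NYC schools averaging 60, though Lavy's (2002) Israeli results caution against treating school size as the whole story.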
D. Teachers Are Ignorant, Not Lazy
Evidence from student incentive experiments reveals that programs that directly incentivize inputs to the education production function are more likely to influence student behavior and improve student learning outcomes than those that incentivize outcome measures like test scores or grades (Fryer 2011). While economic theory predicts that outcome experiments should be more effective, this relies on the premise that students know the best way to improve their academic skills and increase effort efficiently. In reality, despite enthusiasm about the opportunity to profit from their achievement, students reveal ignorance about their own education production function that makes output incentive programs ineffective in terms of improving student achievement.

Teachers who are offered incentives for outcomes that are poorly understood and inherently high variance may face an analogous predicament. Since the teacher incentives that have been tested in American schools uniformly incentivize student learning outputs, teachers are required to know effective strategies for promoting student achievement, which, while it is reasonable to expect professional educators to hold this knowledge, is not necessarily the case. While Duflo and Hanna's (2005) input experiment revealed a profound influence of input incentives on teacher behavior, table 8 reveals no statistically significant changes in teacher behavior in NYC.

So, if teachers are indeed ignorant of the function that maps their own effort into student achievement, how might that affect their performance and, ultimately, student learning? There are two ways I might imagine a teacher to behave in the face of this type of uncertainty: (1) maintain the status quo and hope for the best or (2) invest time and effort into strategies for improving outcomes with unknown expected payoffs. On the one hand, if teachers are ignorant of the education production function, they may not see any value in expending extra effort into a production function whose output they cannot predict. If teachers choose (1) and do not respond to incentives, I would expect to see no statistically significant differences between the treatment and control groups, which is consistent with a wide range of the results.

On the other hand, there remain a surprising number of statistically significant and negative estimates in the middle school cohort that are difficult to explain. One explanation is that teachers are responding to the incentives, but in counterproductive ways. If a teacher invests excess time into practices or interventions that are less efficient in producing student achievement than their normal practices, I would expect especially motivated and misinformed teachers to overinvest time in ineffective practices at the expense of student learning. In addition, the locus of negative results at the middle school level may suggest that employing new strategies is riskier in a middle school setting.

The most striking evidence against the hypothesis that teachers' lack of knowledge of the production function is driving the results is presented in table 8, which displays treatment effects on five areas of the teacher survey that partly determined 10% of the school's overall progress report score. Even if they are ignorant of the education production function, survey response rates suggest that teachers are not exhibiting behavior that is consistent with a misguided but high-effort response to the incentives program.

As before, I present first-stage, ITT, and TOT estimates for each dependent variable. The first outcome is the teachers' response rate to the learning environment survey. The next four outcomes are the teachers' average responses to four areas of the survey questions: safety and respect, communication, engagement, and academic expectations. Questions in the safety and respect section ask whether a school provides a physically and emotionally secure learning environment. The communication area examines how well a school communicates its academic goals and requirements to the community. The engagement area measures the degree to which a school involves students, parents, and educators to promote learning. Questions in the academic expectations area measure how well a school develops rigorous academic goals for students. The scores were standardized to have a mean of zero and a standard deviation of one by school level in the full city sample.

One might predict that teachers in the incentive program would be more likely to fill out the survey and give higher scores to their schools given that they can increase the probability of receiving the performance bonus by doing so. This requires no knowledge of the production function, just an understanding of the incentive scheme. Table 8 reveals that treatment teachers were not significantly more likely to fill out school surveys. The mean response rate at treatment schools was 64% in the 2007–8 academic year and 76% in the 2008–9 academic year. This may indicate that teachers did not even put in the minimum effort of filling out teacher surveys in order to earn the bonus.
References

Aaronson, Daniel, Lisa Barrow, and William Sander. 2007. Teachers and student achievement in the Chicago public high schools. Journal of Labor Economics 25, no. 1:95–135.
Baker, George. 2002. Distortion and risk in optimal incentive contracts. Journal of Human Resources 37:728–51.
Corcoran, Sean P., William N. Evans, and Robert M. Schwab. 2004. Changing labor-market opportunities for women and the quality of teachers, 1957–2000. American Economic Review 94, no. 2:230–35.
Duflo, Esther, and Rema Hanna. 2005. Monitoring works: Getting teachers to come to school. Working Paper no. 11880, National Bureau of Economic Research, Cambridge, MA.
Firestone, William A., and James R. Pennell. 1993. Teacher commitment, working conditions, and differential incentive policies. Review of Educational Research 63, no. 4:489–525.
Fryer, Roland G. 2011. Financial incentives and student achievement: Evidence from randomized trials. Quarterly Journal of Economics 126, no. 4:1755–98.
Glazerman, Steven, Allison McKie, and Nancy Carey. 2009. An evaluation of the Teacher Advancement Program (TAP) in Chicago: Year one impact report. Mathematica Policy Research Report no. 6319-520, Washington, DC.
Glewwe, Paul, Nauman Ilias, and Michael Kremer. 2010. Teacher incentives. American Economic Journal: Applied Economics 2, no. 3:205–27.
Hanushek, Eric, and Steven Rivkin. 2005. Teachers, schools and academic achievement. Econometrica 73, no. 2:417–58.
Holmstrom, Bengt, and Paul Milgrom. 1991. Multitask principal-agent analyses: Incentive contracts, asset ownership, and job design. Journal of Law, Economics and Organization 7:24–52.
Hoxby, Caroline M., and Andrew Leigh. 2004. Pulled away or pushed out? Explaining the decline of teacher aptitude in the United States. American Economic Review 94, no. 2:236–40.
Jacob, Brian A., and Steven D. Levitt. 2003. Rotten apples: An investigation of the prevalence and predictors of teacher cheating. Quarterly Journal of Economics 118, no. 3:843–77.
Johnson, Susan M. 1984. Merit pay for teachers: A poor prescription for reform. Harvard Educational Review 54, no. 2:175–85.
Kane, Thomas J., and Douglas O. Staiger. 2008. Estimating teacher impacts on student achievement: An experimental validation. Working Paper no. 14607, National Bureau of Economic Research, Cambridge, MA.
Lavy, Victor. 2002. Evaluating the effect of teachers' group performance incentives on pupil achievement. Journal of Political Economy 110, no. 6:1286–1317.
———. 2009. Performance pay and teachers' effort, productivity, and grading ethics. American Economic Review 99, no. 5:1979–2021.
Muralidharan, Karthik, and Venkatesh Sundararaman. 2011. Teacher performance pay: Experimental evidence from India. Journal of Political Economy 119, no. 1:39–77.
Neal, Derek. 2011. The design of performance pay systems in education. Working Paper no. 16710, National Bureau of Economic Research, Cambridge, MA.
Rivkin, Steven G., Eric A. Hanushek, and John F. Kain. 2005. Teachers, schools, and academic achievement. Econometrica 73, no. 2:417–58.
Rockoff, Jonah E. 2004. The impact of individual teachers on student achievement: Evidence from panel data. American Economic Review 94, no. 2:247–52.
Rockoff, Jonah E., Brian A. Jacob, Thomas J. Kane, and Douglas O. Staiger. 2008. Can you recognize an effective teacher when you recruit one? Working Paper no. 14485, National Bureau of Economic Research, Cambridge, MA.
Springer, Matthew G., Dale Ballou, Laura S. Hamilton, Vi-Nhuan Le, J. R. Lockwood, Daniel F. McCaffrey, Matthew Pepper, and Brian M. Stecher. 2010. Teacher pay for performance: Experimental evidence from the project on incentives in teaching. Nashville: National Center on Performance Incentives.
Vigdor, Jacob L. 2008. Teacher salary bonuses in North Carolina. Working Paper no. 15, National Center for Analysis of Longitudinal Data in Education Research, Durham, NC.