Teacher Incentives and Student Achievement: Evidence from New York City Public Schools
Author: Roland G. Fryer
Source: Journal of Labor Economics, Vol. 31, No. 2 (April 2013), pp. 373-407
Published by: The University of Chicago Press on behalf of the Society of Labor Economists and the NORC at the University of Chicago
Stable URL: http://www.jstor.org/stable/10.1086/667757
Accessed: 03/05/2013 10:10
Teacher Incentives and Student Achievement: Evidence from New York City Public Schools
Roland G. Fryer, Harvard University and NBER
As global policy makers and school leaders look for ways to improve student performance, financial incentive programs for teachers have become increasingly popular. This article describes a school-based randomized trial in over 200 New York City public schools designed to better understand the impact of teacher incentives. I find no evidence that teacher incentives increase student performance, attendance, or graduation, nor do I find evidence that these incentives change student or teacher behavior. If anything, teacher incentives may decrease student achievement, especially in larger schools. The article concludes with a speculative discussion of theories to explain these stark results.
When I was in Chicago, our teachers designed a program for performance pay and secured a $27 million federal grant. . . . In Chicago's model—every adult in the building—teachers, clerks, janitors and cafeteria workers—all were rewarded when the school improved. It builds a sense of teamwork and gives the whole school a common mission. It can transform a school culture. (US Secretary of Education Arne Duncan, the National Press Club, July 27, 2010)
This project would not have been possible without the leadership and support of Joel Klein. I am also grateful to Jennifer Bell-Ellwanger, Joanna Cannon, and Dominique West for their cooperation in collecting the data necessary for this project and to my colleagues Edward Glaeser, Richard Holden, and Lawrence Katz for helpful comments and discussions. Vilsa E. Curto, Meghan L. Howard, Wonhee Park, Jorg Spenkuch, David Toniatti, Rucha Vankudre, and Martha Woerner provided excellent research assistance. Financial support from the Fisher
[Journal of Labor Economics, 2013, vol. 31, no. 2, pt. 1]
© 2013 by The University of Chicago. All rights reserved. 0734-306X/2013/3102-0001$10.00
I. Introduction
Human capital, especially teacher quality, is believed to be one of the most important inputs into education production. A 1 standard deviation increase in teacher quality raises math achievement by 0.15–0.24 standard deviations per year and reading achievement by 0.15–0.20 standard deviations per year (Rockoff 2004; Hanushek and Rivkin 2005; Aaronson, Barrow, and Sander 2007; Kane and Staiger 2008). The difficulty, however, is that one cannot identify ex ante the most productive teachers. Value-added measures are not strongly correlated with observable teacher characteristics (Rivkin, Hanushek, and Kain 2005; Aaronson et al. 2007; Kane and Staiger 2008; Rockoff et al. 2008). Some argue that this, coupled with the inherent challenges in removing low-performing teachers and increased job market opportunities for women, contributes to the fact that teacher quality and aptitude have declined significantly in the past 40 years (Corcoran, Evans, and Schwab 2004; Hoxby and Leigh 2004).1

One potential method to increase student achievement and improve the
quality of individuals selecting teaching as a profession is to provide teachers with financial incentives based on student achievement. Theoretically, teacher incentives could have one of three effects. If teachers lack motivation or incentive to put effort into important inputs to the education production function (e.g., lesson planning, parental engagement), financial incentives for student achievement may have a positive impact by motivating teachers to increase their effort. However, if teachers do not know how to increase student achievement, the production function has important complementarities outside their control, or the incentives are either confusing or too weak, teacher incentives may have no impact on achievement. Conversely, if teacher incentives have unintended consequences such as explicit cheating, teaching to the test, or focusing on specific tested objectives at the expense of more general learning, teacher incentives could have a negative impact on student performance (Holmstrom and Milgrom 1991; Jacob and Levitt 2003). Similarly, some argue that teacher incentives can decrease a teacher's intrinsic motivation or lead to harmful competition between teachers in what some believe to be a collaborative environment (Johnson 1984; Firestone and Pennell 1993).

Despite intense opposition, there has been growing enthusiasm among education reformers and policy makers around the world to link teacher
Foundation is gratefully acknowledged. Contact the author at [email protected]. The usual caveat applies.
1 Corcoran et al. (2004) find that in 1964–71, 20%–25% of new female teachers were ranked in the top 10% of their high school cohort, while in 2000, less than 13% were ranked in the top decile. Hoxby and Leigh (2004) similarly find that the share of teachers in the highest aptitude category fell from 5% in 1963 to 1% in 2000, and the share in the lowest aptitude category rose from 16% to 36% in the same period.
compensation to student achievement in myriad ways.2 This is due, in part, to the low correlation between a teacher's observables at the time of hiring and his value added and, in part, to policy makers' belief that a new payment scheme will attract more achievement-minded applicants. A number of states, including Colorado, Florida, Michigan, Minnesota, South Carolina, Tennessee, and Texas, as well as Washington, DC, have implemented statewide programs for districts and schools to provide individual and group incentives to teachers for student achievement and growth, and many more individual school districts have implemented similar policies. In 2010, the US Department of Education selected 62 programs in 27 states to receive over $1.2 billion over 5 years from the Teacher Incentive Fund. States applying for funds from "Race to the Top," the Obama Administration's $4.4 billion initiative to reform schools, are evaluated on plans to improve teacher and principal effectiveness by linking teacher evaluations to student growth and making decisions about raises, tenure, and promotions depending on student achievement. Similar initiatives are underway in the United Kingdom, Chile, Mexico, Israel, Australia, Portugal, and India.

The empirical evidence on the efficacy of teacher incentives is ambiguous. Data from field experiments in Kenya and India yield effect sizes of approximately 0.20 standard deviations in math and reading when teachers earned average incentives of 2% and 3% of their yearly salaries, respectively (Glewwe, Ilias, and Kremer 2010; Muralidharan and Sundararaman 2011). Data from a pilot initiative in Tennessee, where the average treatment teacher earned incentives totaling 8% of annual salary, suggest no effect of incentives on student achievement.3
From the 2007–8 through the 2009–10 school years, the United Federation of Teachers (UFT) and the New York City Department of Education (DOE) implemented a teacher incentive program in over 200 high-need
2 Merit pay faces opposition from the two major unions: the American Federation of Teachers (AFT) and the National Education Association (NEA). Although in favor of reforming teacher compensation systems, the AFT and the NEA officially object to programs that reward teachers on the basis of student test scores and principal evaluations, while favoring instead systems that reward teachers on the basis of additional roles and responsibilities they take within the school or certifications and qualifications they accrue. The AFT's official position cites the past underfunding of such programs, the confusing metrics by which teachers were evaluated, and the crude binary reward system in which there is no gradation of merit as the reasons for its objection. The NEA's official position maintains that any alterations in compensation should be bargained at the local level and that a singular salary scale and a strong base salary should be the standard for compensation.
3 Nonexperimental analyses of teacher incentive programs in the United States have also shown no measurable success, although one should interpret these data with caution due to the lack of credible causal estimates (Vigdor 2008; Glazerman, McKie, and Carey 2009).
schools, distributing a total of roughly $75 million to over 20,000 teachers.4
The experiment was a randomized school-based trial, with the randomization conducted by the author. Each participating school could earn $3,000 for every UFT-represented staff member, which the school could distribute at its own discretion, if the school met the annual performance target set by the DOE on the basis of the school report card scores. Each participating school was given $1,500 per UFT staff member if it met at least 75% of the target but not the full target. Note that the average New York City (NYC) public school has roughly 60 teachers; this implies a transfer of $180,000 to schools on average if they met their annual targets and a transfer of $90,000 if they met at least 75% of the target but not the full target. In elementary and middle schools, school report card scores hinge on student performance and progress on state assessments, student attendance, and learning environment survey results. High schools are evaluated similarly, with graduation rates, regents exams, and credits earned replacing state assessment results as proxies for performance and progress.

An important feature of the experiment is that schools had discretion
over their incentive plans. As mentioned above, if a participating school met 100% of the annual target, it received a lump sum equivalent to $3,000 per full-time unionized teacher. Each school had the power to decide whether all of the rewards would be given to a small subset of teachers with the highest value added, whether the winners of the rewards would be decided by lottery, or virtually anything in between. The only restriction was that schools were not allowed to distribute rewards on the basis of seniority. Theoretically, it is unclear how to design optimal teacher incentives when the objective is to improve student achievement. Much depends on the characteristics of the education production function. If, for instance, the production function is additively separable, then individual incentives may dominate group incentives, as the latter encourage free riding. If, however, the production function has important complementarities between teachers in the production of student achievement, group incentives may be more effective at increasing achievement (Baker 2002).

To my surprise, an overwhelming majority of the schools decided on a group incentive scheme that varied the individual bonus amount only by the position held in the school. This could be because teachers have superior knowledge of education production and believe the production function to have important complementarities, because they feared retribution from other teachers if they supported individual rewards, or simply because this was as close to pay based on seniority (the UFT's official view) as they could get.
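The payout rule described above maps staff counts and target attainment to dollar amounts mechanically. A minimal sketch of that rule (the function name and the 60-teacher default are illustrative, not from the program documents):

```python
def school_bonus(uft_staff: int, target_met_fraction: float) -> int:
    """Lump-sum bonus under the NYC scheme: $3,000 per UFT-represented
    staff member if the school meets its full annual target, $1,500 per
    staff member if it meets at least 75% of the target, else nothing."""
    if target_met_fraction >= 1.0:
        return 3000 * uft_staff
    if target_met_fraction >= 0.75:
        return 1500 * uft_staff
    return 0

# An average NYC school with roughly 60 UFT staff members:
full = school_bonus(60, 1.0)     # $180,000 transfer, as noted in the text
partial = school_bonus(60, 0.8)  # $90,000 transfer
```

How the lump sum was then split among individual teachers was left to each school's compensation committee, as discussed above.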
4 The details of the program were negotiated by Chancellor Joel Klein and Randi Weingarten, along with their staffs. At the time of the negotiation, I was serving as an advisor to Chancellor Klein and convinced both parties to agree to include random assignment to ensure a proper evaluation.
The results from this incentive experiment are informative. Providing incentives to teachers on the basis of the school's performance on metrics involving student achievement, improvement, and the learning environment did not increase student achievement in any statistically meaningful way. If anything, student achievement declined. Intent-to-treat estimates yield treatment effects of −0.018 (0.024) standard deviations (hereafter σ) in mathematics and −0.014σ (0.020) in reading for elementary schools and −0.046σ (0.018) in math and −0.030σ (0.011) in reading for middle schools, per year. Thus, if an elementary school student attended schools that implemented the teacher incentive program for 3 years, her test scores would decline by 0.054σ in math and by 0.042σ in reading, neither of which is statistically significant. For middle school students, however, the negative impacts are more sizable: −0.138σ in math and −0.090σ in reading over a 3-year period.

The impact of teacher incentives on student attendance, behavioral incidences, and alternative achievement outcomes such as predictive state assessments, course grades, regents exam scores, and high school graduation rates is negligible. Furthermore, I find no evidence that teacher incentives affect teacher behavior, as measured by retention in district or school, number of personal absences, and teacher responses to the learning environment survey, which partly determined whether a school received the performance bonus.

I also investigate the treatment effects across a range of subsamples—gender, race, previous-year achievement, previous-year teacher value added, previous-year teacher salary, and school size—and find that although some subgroups seem to be affected differently by the program, none of the estimates of the treatment effect are positive and significant if one adjusts for multiple hypothesis testing. The coefficients range from −0.264σ (0.073), in global history for white students who took that regents exam, to 0.114σ (0.091), in math state exam scores for white elementary school students.

The article concludes with a (necessarily) speculative discussion of possible explanations for the stark results, especially when one compares them with the growing evidence from developing countries. One explanation is that incentives are simply not effective in American public schools. This could be due to a variety of reasons, including differential teacher characteristics, teacher training, or effort. I argue that a more likely explanation is that, due in part to strong influence by teachers' unions, all incentive schemes piloted thus far in the United States have been unnecessarily complex, and thus teachers cannot always predict how their efforts translate into rewards; this provides teachers with less control than incentive experiments in developing countries and may lead teachers to underestimate the expected value of increased effort. This uncertainty and lack of control in American incentive schemes, relative to those attempted in developing countries, may explain my results. Other explanations suggesting that the incentives were not large enough, group-based incentives are ineffective, or teachers are
ignorant of the production function all contradict the data in important ways.
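The cumulative figures quoted in the introduction are simply the per-year intent-to-treat point estimates scaled by years of exposure, assuming linear accumulation; a sketch of that arithmetic (point estimates only, standard errors omitted):

```python
# Per-year intent-to-treat point estimates (in standard deviations, sigma)
# for elementary and middle schools, as reported in the introduction.
itt_per_year = {
    ("elementary", "math"): -0.018,
    ("elementary", "reading"): -0.014,
    ("middle", "math"): -0.046,
    ("middle", "reading"): -0.030,
}

def cumulative_effect(level: str, subject: str, years: int) -> float:
    """Cumulative effect after `years` of exposure, assuming the
    per-year effect accumulates linearly."""
    return round(itt_per_year[(level, subject)] * years, 3)

cumulative_effect("elementary", "math", 3)  # -0.054, as in the text
cumulative_effect("middle", "math", 3)      # -0.138, as in the text
```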
II. A Brief Literature Review
There is a nascent but growing body of literature on the role of teacher incentives in student performance (Lavy 2002, 2009; Vigdor 2008; Glazerman et al. 2009; Glewwe et al. 2010; Springer et al. 2010; Muralidharan and Sundararaman 2011), including an emerging literature on the optimal design of such incentives (Neal 2011). There are four papers, three of them outside the United States, that provide experimental estimates of the causal impact of teacher incentives on student achievement: Duflo and Hanna (2005), Glewwe et al. (2010), Springer et al. (2010), and Muralidharan and Sundararaman (2011).

Duflo and Hanna (2005) randomly sampled 60 schools in rural India and provided them with financial incentives to reduce absenteeism. The incentive scheme was simple; teachers' pay was linear in their attendance, at the rate of Rs 50 per day, after the first 10 days of each month. They found that the teacher absence rate was significantly lower in treatment schools (22%) than in control schools (42%) and that student achievement in treatment schools was 0.17σ higher than in control schools.

Glewwe et al. (2010) report results from a randomized evaluation that provided fourth- through eighth-grade teachers in Kenya with group incentives based on test scores and find that while test scores increased in program schools in the short run, students did not retain the gains after the incentive program ended. They interpret these results as consistent with teachers expending effort toward short-term increases in test scores but not toward long-term learning.

Muralidharan and Sundararaman (2011) investigate the effect of individual and group incentives in 300 schools in Andhra Pradesh, India, and find that both group and individual incentives increased student achievement by 0.12σ in language and 0.16σ in math in the first year, with the two schemes equally successful. In the second year, however, individual incentives proved more effective, with an average effect of 0.27σ across math and language performance, while group incentives had an average effect of 0.16σ.

Springer et al. (2010) evaluated a 3-year pilot initiative on teacher incentives conducted in the Metropolitan Nashville School System from the 2006–7 school year through the 2008–9 school year: 296 middle school mathematics teachers who volunteered to participate in the program were randomly assigned to the treatment or the control group, and those assigned to the treatment group could earn up to $15,000 as a bonus if their students made gains in state mathematics test scores equivalent to the 95th percentile in the district. They were awarded $5,000 and $10,000 if their students made gains equivalent to the 80th and the 90th percentiles, respectively. Springer et al. (2010) found no significant treatment effect on student achievement or on measures of teachers' response, such as teaching practices.5
The contribution of this article is threefold. First, the incentive scheme allows schools to choose how to allocate incentive payments. If schools have superior knowledge of their production function (relative to a social planner) or better knowledge about their staff, this design is optimal. Second, this experiment is the largest on teacher incentives in American public schools by orders of magnitude, and the incentive scheme is similar to those being implemented in school districts across the country. Third, the set of outcomes is expansive and includes information on student achievement, student behavior, teacher retention, and teacher effort.
III. Program Details
A. Overview
On October 17, 2007, NYC's mayor, schools chancellor, and the president of the UFT announced an initiative to provide teachers with financial incentives to improve student performance, attendance, and school culture. The initiative was conceived as a 2-year pilot program in roughly 400 of the lowest performing public schools in NYC.6 School performance was tied to metrics used to calculate NYC's school report card—a composite measure of school environment, student academic performance, and student academic progress. The design of the incentive scheme was left to the discretion of the school. There were three requirements: (1) incentives were not allowed to be distributed according to seniority; (2) schools had to create a compensation committee that consisted of the principal, a designee of the principal, and two UFT staff members; and (3) the committee's decision had to be unanimous. The committee had the responsibility of deciding how incentives would be distributed to each teacher and other staff. Below, I describe how schools were selected and the incentive scheme, and I provide an overview of the distribution of incentive rewards to schools.
5 There are several nonexperimental evaluations of teacher incentive programs in the United States, all of which report a nonsignificant impact of the program on student achievement. Glazerman et al. (2009) report a nonsignificant effect of −0.04 standard deviations on student test scores for the Teacher Advancement Program (TAP) in Chicago, and Vigdor (2008) reports a nonsignificant effect of the ABC School-Wide Bonus Program in North Carolina. Outside the United States, Lavy (2002, 2009) reports significant results for teacher incentive programs in Israel.
6 The pilot program did not expand to include more schools in the second and third years due to budget constraints, but all schools that completed the program in the first or second year were invited to participate again in the following years.
B. School Selection
Table 1 provides an accounting of how the experimental sample was selected. Eligible middle and high schools were selected on the basis of the average proficiency ratings on fourth and eighth grade state tests, respectively. Eligible elementary schools were selected on the basis of poverty rates and student demographic characteristics, such as the percentage of English language learners and special education students. The NYC DOE identified 438 schools that met these eligibility criteria. Of these schools, 34 were barred by the UFT for unknown reasons, and eight were District 75 (i.e., special education) schools. The remaining 396 comprise the experimental sample, among which 212 schools were randomly selected by the author and offered treatment. In November 2007, schools in the treatment group were invited to participate in the program. To formally accept the offer, schools were required to have at least 55% of their active
Table 1
Sample Construction

                                              Number of        Number of Observations
                                              Schools     Elementary    Middle      High
Met the eligibility criteria                     438          ...         ...        ...
Barred by the United Federation of Teachers       34          ...         ...        ...
Special district schools                           8          ...         ...        ...
Experimental sample                              396        64,373      50,413     70,826
  Treatment                                      233        37,791      29,857     38,861
  Control                                        163        26,582      20,556     31,965
Offered treatment in year 1                      233          ...         ...        ...
Treated in year 1                                198          ...         ...        ...
Offered treatment in year 2                      195          ...         ...        ...
Treated in year 2                                191          ...         ...        ...
Offered treatment in year 3                      191          ...         ...        ...
Treated in year 3                                189          ...         ...        ...
Valid state English language arts exam scores    ...        61,829      47,473       ...
Valid state math exam scores                     ...        62,418      48,044       ...
Valid regents English exam scores                ...          ...         ...      19,813
Valid regents math exam scores                   ...          ...         ...      11,219
Valid regents science exam scores                ...          ...         ...      18,370
Valid regents US history exam scores             ...          ...         ...      17,791
Valid regents global history exam scores         ...          ...         ...      21,711

NOTE.—All statistics are reported for the 2007–8 school year (year 1), unless otherwise specified.
full-time staff represented by the UFT at the school vote for the program.7 Schools forwarded voting results through e-mail to the DOE by late November. Of the 212 schools randomly chosen to receive treatment, 179 garnered enough votes to participate, and 33 declined treatment.8 To increase the number of schools eligible to participate, 21 schools were added off the wait list; 19 garnered the requisite votes. So, overall, 233 schools in the experimental sample were invited to participate in the program, and 198 actually participated. The final experimental sample in year 1 consists of the lottery sample, with 233 treatment schools and 163 control schools.

In the second year, 195 of the 198 schools that received treatment in the first year were invited to participate in the second-year pilot program (the other three schools were closed because of low performance). Of the 195 schools offered treatment, 191 voted to participate in the second year. In the third year of treatment, the 191 schools that received treatment in the second year were invited to participate; 189 schools voted to participate in the program.
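The sample-construction counts above are internally consistent, as a quick tally shows (variable names are mine; all figures come from the text and Table 1):

```python
# Eligibility and randomization flow for year 1.
eligible = 438
barred_by_uft = 34
district_75 = 8
experimental_sample = eligible - barred_by_uft - district_75  # 396 schools

offered_initially = 212
declined = 33
accepted = offered_initially - declined          # 179 schools voted yes
waitlist_accepting = 19                          # of 21 wait-list additions
participated_year1 = accepted + waitlist_accepting  # 198 participating schools

offered_total = offered_initially + 21           # 233 schools ever offered
control = experimental_sample - offered_total    # 163 control schools
```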
C. Incentive Scheme
Figure 1 shows how the progress report card score, which is the basis for awarding incentives, is calculated. Environment, which accounts for 15% of the progress report card score, is derived from the attendance rate (5% of the overall score) and learning environment surveys administered to students, teachers, and parents in the spring semester (10%). Attendance rate is a school's average daily attendance. While environment subscores are calculated identically for all schools, student performance (25%) and student progress (60%) are calculated differently for high schools versus elementary and middle schools. In elementary and middle schools, student performance depends on the percentage of students at grade level and the median proficiency rating in English language arts (ELA) and math state tests. In high schools, student performance is measured by 4-year and 6-year graduation rates and diploma-weighted graduation rates.9 Student progress depends on the average changes in proficiency ratings among students and the percentage of students making at least a year of progress in state tests for elementary and middle schools. Student progress in high schools is measured by the percentage of students earning more than 10 credits and regents exam pass rates in the core subjects—English, math, science, US history, and global history. Schools can also earn extra credit points through exemplary gains in proficiency ratings or credit accumulation and graduation rates among high-need students such as English language learners, special education students, or those in the lowest tercile in ELA and math test scores citywide.

7 Repeated attempts to convince the DOE and the UFT to allow schools to opt in to the experimental group before random assignment were unsuccessful.
8 Anecdotal evidence suggests that schools declined treatment for a variety of reasons, including fear that more work (not outlined in the agreement) would be required for bonuses. As one teacher in a focus group put it, "Money ain't free." Furthermore, some teachers in focus groups expressed resentment that anyone would believe that teachers would be motivated by money.
9 The DOE awards different levels of diplomas—local, regents, advanced regents, and advanced regents with honors—depending on the number of regents exams passed. Further details on graduation requirements and progress report card score calculation can be found on the DOE website (http://schools.nyc.gov/default.htm).

FIG. 1.—Progress report card metrics

In each of the three categories—learning environment, student performance, and student progress—schools were evaluated by their relative performance in each metric compared to their peer schools and all schools in the city, with performance relative to peer schools weighted three times the weight given to performance relative to all schools citywide. However, because it is calculated using many metrics and because scores in each metric are calculated relative to other schools, how much effort is needed to raise the progress report card score by, say, 1 point is not obvious.

Table 2 shows the number of points by which schools had to increase
their progress report card scores in the 2007–8 academic year in order to be considered to have met their goal and receive their incentive payment. The table illustrates that the target depends on the citywide ranking based on the previous year's progress report card score. If, for example, an elementary school was ranked at the 20th percentile in the 2006–7 academic year, it needed to increase its progress report card score by 15 points to meet the annual target.
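Table 2 is effectively a step function from a school's prior-year citywide percentile to its target. A sketch of that lookup (the function name and `school_type` labels are my own, not DOE terminology):

```python
def target_points(percentile: float, school_type: str) -> float:
    """Annual progress-report target from Table 2, keyed on the school's
    citywide percentile ranking based on the previous year's score.
    school_type: "elem_middle" or "high"."""
    bands = [          # (lower percentile bound, elem/middle target, high target)
        (85, 7.5, 2),
        (45, 12.5, 3),
        (15, 15, 4),
        (5, 17.5, 6),
        (0, 20, 8),
    ]
    for lower, elem_middle, high in bands:
        if percentile >= lower:
            return elem_middle if school_type == "elem_middle" else high
    raise ValueError("percentile must be non-negative")

target_points(20, "elem_middle")  # 15, the 20th-percentile example in the text
```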
1. An Example
Consider the following simplified example with an elementary school that ranks at about the 10th percentile citywide and at about the 25th percentile among its peer schools. This school would have to increase its total progress report card scores by 17.5 points to meet the annual target.
Table 2
Progress Report Target Points

Citywide Ranking Based on
the Previous Year (Percentile)    Elementary and Middle    High
≥85th                                     7.5                2
≥45th and <85th                          12.5                3
≥15th and <45th                          15                  4
≥5th and <15th                           17.5                6
<5th                                     20                  8

NOTE.—Calculated by the author.
Let's now assume that the school increases its attendance rate to about the 30th percentile citywide and the 75th percentile in its peer group. Then, holding everything else constant, the school will increase its overall score by 1 point. Similarly, if the school increased its student performance to the same level, it would increase its score by 5 points. If student progress increased to the same level, its progress report card score would increase by 12 points. Hence, if the peer group and district schools stay at the same level, a low-performing school would be able to meet the annual target only if it dramatically increased its performance in all of the subareas represented in the progress report. However, because all scores are calculated relative to other schools, some schools can reach their incentive targets if their achievement stays constant and their peer schools underperform in a given year.
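Tallying the point gains in this worked example shows the school just clearing its 17.5-point target, and only because every subarea improved (a back-of-envelope check, not an official DOE calculation):

```python
# Point gains from the example above, holding other schools fixed:
# attendance +1, student performance +5, student progress +12.
gains = {"attendance": 1, "performance": 5, "progress": 12}
target = 17.5   # target for a school ranked in the 5th-15th percentile

total_gain = sum(gains.values())     # 18 points, just above the target
meets_target = total_gain >= target  # True

# Dropping any single subarea's gain leaves the school short of the target,
# illustrating why improvement had to be across the board.
shortfalls = {area: total_gain - g < target for area, g in gains.items()}
```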
2. A Brief Comparison with Other School District Incentive Schemes

Most school districts that have implemented performance pay use metrics similar to NYC's to measure teacher performance. For example, TAP in Chicago—started by Arne Duncan and described in the quote at the beginning of this article—rewarded teachers on the basis of classroom observations (25%) and school-wide student growth on Illinois state exams (75%). Houston's ASPIRE program uses school value added and teacher value added on state exams to reward the top 25% and 50% of teachers. Alaska's Public School Performance Incentive Program divides student achievement into six categories and rewards teachers on the basis of average movement up to higher categories. Florida's S.T.A.R. used a similar approach.

A key difference between the incentive schemes piloted in America
thus far and those piloted in developing countries is that those in America
This content downloaded from 129.119.32.27 on Fri, 3 May 2013 10:10:44 AMAll use subject to JSTOR Terms and Conditions
compare teachers’ or schools’ performance to the distribution in the dis-trict. That is, teachers are not rewarded unless the entire school satisfies a
384 Fryer
criterion or their performance is in the top X percentile of their district,despite how well any individual or group of teachers performs. NYC’sdesign rewards teachers on the basis of the school’s overall performanceonly. A teacher participating in Houston’s ASPIRE program would berewarded the predetermined bonus amount only if his teacher value addedin one subject is in the top 25% of the district, regardless of how he orhis school performs. Chicago’s TAP program rewards teachers similarly.This ambiguity—the likelihood of receiving an incentive depends on myeffort and the effort of others—may have served to flatten the functionthat maps effort into output.
D. Incentive Distribution
The lump-sum performance bonus awarded to a school was distributed to teachers in whatever way the school's compensation committee decided. Recall that the compensation committee consisted of the principal, a designee of the principal, and two UFT staff members. The committee was not allowed to vary the bonus by seniority but could differentiate the compensation amount by the position held at the school or by the magnitude of the contribution made (e.g., teacher value added), or it could distribute the bonus amount equally. The committee was chosen by December of the first year and reported to the UFT and the DOE its decision on how to distribute the bonus.

School bonus results were announced in September of the following year for elementary, K–8, and middle schools and in November for high schools, shortly after the DOE released progress report cards. Rewards were distributed to teachers either by check or as an addition to their salary, in accordance with the distribution rule decided on by the compensation committee. In the first year, 104 of the 198 schools that participated met the improvement target and received the full bonus, while 18 schools met at least 75% of the target and received half of the maximum incentive payment. In total, the compensation received by participating schools totaled $22 million. In the second year, 154 of the 191 schools that participated received the full bonus, while seven schools received half the maximum compensation. The total compensation awarded to schools in the second year was $31 million. I do not have precise numbers for year 3, but the DOE claims that the total cost of the experiment was approximately $75 million.

Figure 2A shows the distribution of individual compensation in the experiment. Most teachers in the schools that received the full bonus of $3,000 per staff member were rewarded an amount close to $3,000. Figure 2B presents a histogram of the fraction of teachers receiving the same amount in each school in order to characterize how many schools decided on an egalitarian distribution rule. More than 80% of schools chose to award the same bonus amount to at least 85% of the teaching staff each year.

FIG. 2.—A, Distribution of individual bonus amount; B, distribution of school incentive schemes.
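The committee's choice between an equal split and a contribution-weighted split can be illustrated with a small sketch (`split_bonus` is a hypothetical helper; the actual rules were whatever each committee negotiated):

```python
# Sketch of a compensation committee's distribution rule: split a school's
# lump-sum bonus pool either equally or in proportion to some per-teacher
# contribution measure (e.g., teacher value added).

def split_bonus(pool, weights, n_teachers):
    """Return each teacher's share of the school's bonus pool.

    With weights=None, the pool is split equally (the rule chosen by more
    than 80% of schools); otherwise shares are proportional to the
    supplied nonnegative weights.
    """
    if weights is None:
        return [pool / n_teachers] * n_teachers
    total = sum(weights)
    return [pool * w / total for w in weights]

# A school with 40 union staff that met its full target receives
# $3,000 per staff member, i.e., a $120,000 pool.
pool = 3000 * 40
equal = split_bonus(pool, None, 40)          # every teacher gets $3,000
weighted = split_bonus(pool, [2, 1, 1], 3)   # hypothetical TVA-weighted split
```

The equal split reproduces the modal outcome in figure 2A: nearly every teacher in a winning school receives an amount close to $3,000.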
IV. Data and Research Design
A. Data
I combined data from two sources: student-level administrative data on approximately 1.1 million students across the five boroughs of the NYC metropolitan area from the 2006–7 to 2009–10 school years, and teacher-level human resources data on approximately 96,000 elementary and middle school teachers during the same period. The student-level data include information on student race, gender, free- and reduced-price lunch eligibility, behavior, attendance, matriculation with course grades, and state math and ELA test scores for students in grades 3–8. For high school students, the data contain regents exam scores and graduation rates. Data on attendance and behavioral incidences are available for all students. The main outcome variable is an achievement test unique to New York.
The state ELA and math tests, developed by McGraw-Hill, are high-stakes exams administered to students in the third through eighth grades. Students in the third, fifth, and seventh grades must score at level 2 or above (out of four) on both the math and ELA tests to advance to the next grade without attending summer school. Material covered in the math test is divided among five strands: (1) number sense and operations, (2) algebra, (3) geometry, (4) measurement, and (5) statistics and probability. The ELA test is designed to assess students on three learning standards: (1) information and understanding, (2) literary response and expression, and (3) critical analysis and evaluation. Both tests include multiple-choice, short-response, and extended-response questions. The ELA test also includes a brief editing task and is divided into reading and listening sections.

All public school students are required to take the math and ELA tests unless they are medically excused or have a severe disability. Students with moderate disabilities or limited English proficiency must take both tests, but they may be granted special accommodations (additional time, translation services, etc.) at the discretion of school or state administrators. In the analysis, test scores are normalized to have a mean of zero and a standard deviation of one for each grade and year across the entire NYC sample.

I construct measures of attendance, behavioral problems, and grade point average (GPA) using the NYC DOE data. Attendance is measured as the number of days present divided by the number of days present plus the number of days absent.10 Behavioral problems are measured as the total number of behavioral incidences on record each year. Attendance, behavioral problems, and GPA were normalized to have a mean of zero and a standard deviation of one by grade level each year in the full NYC sample.

I use a parsimonious set of controls to aid in precision and to correct for any potential imbalance between the treatment and control groups. The most important controls are achievement test scores from previous years, which I include in all regressions. The previous year's test scores are available for most students who were in the district in the previous year.11 I also include an indicator variable that takes on the value of one if a student is missing a test score from a previous year and zero otherwise. Other individual-level controls include a mutually exclusive and collectively exhaustive set of race dummies and indicators for free lunch eligibility, special education status, and English language learner status. See the appendix, available in the online version of the Journal of Labor Economics, for further details.

I also construct school-level controls. To do this, I assign each student who was present at the beginning of the year, that is, in September, to the first school attended. I construct the school-level variables on the basis of these school assignments by taking the mean value of student demographic variables and student test scores for each school. Variables constructed this way include the percentages of black, Hispanic, special education, limited English proficiency, and free-lunch-eligible students, as well as the total number of behavioral incidences in a school in the 2006–7 academic year.

I construct teacher-level variables from NYC Human Resources (HR) records and teacher value added data. Teacher gender and race are constructed by taking the most recent nonmissing records from the 2004–10 HR records. Teacher experience, or years of experience as a teacher, is taken from the October 2007 HR file. Teacher value added (TVA) data are available from the 2006–7 academic year through the 2008–9 academic year. I take the TVA measured in standard deviation units and standardize the number by grade level each year to have a mean of zero and a standard deviation of one in the full city sample. For teachers who taught more than one grade, I take the average of TVA across grade levels. In addition, I construct cumulative teacher absences using HR data on teacher attendance through May of each academic year.

Table 3 provides pretreatment descriptive statistics. Columns 1–4 show the mean and standard deviation of student and teacher characteristics in all schools in the NYC district, the experimental sample, the treatment group, and the control group. In addition, the last two columns show the p-value of the difference between the mean of the entire district and that of the experimental sample and the p-value of the difference between the treatment group and the control group. The table of summary statistics shows that most student and teacher characteristics are balanced between the treatment and control groups. The only exceptions are the percentage of white teachers, the percentage of Asian teachers, and TVA in math in the 2006–7 academic year.

10 The DOE does not collect absence data from schools after the first 180 days, so the attendance rate calculated is the rate over the first 180 days.
11 See table 1 for exact percentages of experimental-group students with valid test scores from previous years.
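The grade-by-year standardization applied throughout (to test scores, TVA, attendance, behavioral problems, and GPA) can be sketched with pandas; the column names here are assumptions for illustration, not the DOE's schema:

```python
# Sketch of z-scoring an outcome within each grade-by-year cell of the
# full city sample, as described in the text.
import pandas as pd

def standardize_within(df: pd.DataFrame, col: str, by: list) -> pd.Series:
    """z-score `col` within each group defined by `by` (e.g., grade and year)."""
    grouped = df.groupby(by)[col]
    return (df[col] - grouped.transform("mean")) / grouped.transform("std")

# Toy data: two grades in one year.
df = pd.DataFrame({
    "grade": [3, 3, 3, 4, 4, 4],
    "year":  [2008] * 6,
    "math":  [650.0, 660.0, 670.0, 655.0, 665.0, 675.0],
})
df["math_z"] = standardize_within(df, "math", ["grade", "year"])
# Within each grade cell the standardized scores have mean zero.
```

Standardizing within grade-by-year cells puts all outcomes on a common σ scale, which is why the treatment effects later in the article are reported in standard deviation units.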
B. Research Design
The simplest and most direct test of any teacher incentive program would be to examine the outcome of interest (e.g., test scores) regressed on an indicator for enrollment in the teacher incentive program for grade g in school s in year t (incentive_{i,g,s,t}) and controls for basic student and school characteristics, X_i and X_s, respectively:
outcome_{i,g,s,t} = α₁ + β₁X_i + γ₁X_s + δ_g + ζ_t + π₁ incentive_{i,g,s,t} + ε_{i,g,s,t}.
Yet, if schools select into teacher incentive programs because of important unobserved determinants of academic outcomes, estimates obtained using the above equation may be biased. To confidently identify the causal impact of incentive programs, I must compare participating and nonparticipating schools that would have had the same academic outcomes had they both participated in the program. By definition, this involves an unobservable counterfactual.

Table 3
Descriptive Statistics and Covariate Balance

| | New York City District (1) | Experimental Sample (2) | Treatment Group (3) | Control Group (4) | p-Value, (1) and (2) | p-Value, (3) and (4) |
|---|---|---|---|---|---|---|
| Percent white | .119 (.192) | .014 (.022) | .013 (.017) | .015 (.027) | .00 | .51 |
| Percent black | .365 (.291) | .411 (.265) | .407 (.266) | .415 (.265) | .00 | .76 |
| Percent Hispanic | .405 (.256) | .552 (.266) | .556 (.267) | .546 (.265) | .00 | .69 |
| Percent Asian | .105 (.162) | .018 (.031) | .018 (.025) | .019 (.039) | .00 | .72 |
| Percent other race | .006 (.007) | .005 (.005) | .005 (.005) | .005 (.005) | .01 | .98 |
| Percent male | .502 (.080) | .515 (.071) | .514 (.080) | .517 (.056) | .00 | .65 |
| Percent female | .498 (.080) | .485 (.071) | .486 (.080) | .483 (.056) | .00 | .65 |
| Percent free lunch eligible | .858 (.181) | .959 (.038) | .960 (.040) | .959 (.036) | .00 | .92 |
| Percent special education | .083 (.056) | .110 (.051) | .108 (.050) | .113 (.053) | .00 | .30 |
| Percent English language learner | .135 (.143) | .191 (.149) | .189 (.143) | .195 (.158) | .00 | .71 |
| 2006–7 ELA score | 652.952 (17.199) | 639.324 (9.068) | 639.614 (9.214) | 638.909 (8.874) | .00 | .49 |
| 2006–7 math score | 671.327 (20.413) | 657.193 (14.880) | 657.829 (15.025) | 656.283 (14.680) | .00 | .36 |
| Eighth grade ELA score | 661.061 (20.446) | 646.489 (12.854) | 647.425 (9.554) | 645.143 (16.574) | .00 | .44 |
| Eighth grade math score | 672.029 (21.222) | 655.689 (6.750) | 656.127 (6.314) | 655.078 (7.372) | .00 | .50 |
| School size / 100 | 6.832 (5.738) | 6.356 (4.118) | 6.257 (4.071) | 6.498 (4.192) | .05 | .57 |
| 2006–7 school progress report card score | 53.964 (14.377) | 51.680 (15.968) | 52.100 (16.160) | 51.084 (15.723) | .00 | .53 |
| 2006–7 individual behavioral incidences | .142 (.167) | .180 (.162) | .185 (.171) | .174 (.149) | .00 | .49 |
| 2006–7 school-wide behavioral incidences | 98.435 (147.081) | 117.699 (125.257) | 120.335 (126.853) | 113.933 (123.231) | .00 | .62 |
| Percent female teachers | .822 (.128) | .810 (.101) | .813 (.103) | .805 (.100) | .05 | .46 |
| Percent male teachers | .178 (.128) | .190 (.101) | .187 (.103) | .195 (.100) | .05 | .46 |
| Percent white teachers | .566 (.257) | .378 (.176) | .396 (.186) | .351 (.158) | .00 | .03 |
| Percent black teachers | .231 (.232) | .337 (.226) | .321 (.225) | .360 (.226) | .00 | .14 |
| Percent Hispanic teachers | .154 (.155) | .246 (.169) | .247 (.175) | .245 (.161) | .00 | .89 |
| Percent Asian teachers | .044 (.073) | .034 (.030) | .030 (.027) | .038 (.033) | .00 | .02 |
| Percent other race teachers | .002 (.007) | .003 (.007) | .003 (.008) | .003 (.007) | .13 | .92 |
| Teacher salary / 1,000 | 68.048 (7.397) | 66.512 (4.043) | 66.430 (4.035) | 66.628 (4.067) | .00 | .67 |
| Teacher experience | 8.300 (2.600) | 7.900 (2.109) | 7.919 (2.136) | 7.873 (2.078) | .00 | .85 |
| 2006–7 ELA TVA | .044 (.557) | .055 (.528) | .067 (.543) | .038 (.508) | .67 | .65 |
| 2006–7 math TVA | .024 (.560) | .041 (.590) | .097 (.602) | −.040 (.565) | .52 | .04 |
| Number of teachers in school | 57.276 (27.485) | 59.061 (21.618) | 59.130 (21.768) | 58.961 (21.486) | .17 | .95 |
| Percent missing 2006–7 ELA score | .506 (.262) | .501 (.258) | .503 (.259) | .498 (.256) | .70 | .86 |
| Percent missing 2006–7 math score | .496 (.267) | .489 (.265) | .491 (.266) | .486 (.263) | .58 | .86 |
| Percent missing eighth grade ELA score | .262 (.173) | .280 (.201) | .261 (.187) | .306 (.220) | .26 | .32 |
| Percent missing eighth grade math score | .218 (.147) | .223 (.183) | .211 (.163) | .240 (.208) | .67 | .49 |
| Missing 2006–7 individual behavioral incidences | .112 (.081) | .115 (.071) | .114 (.073) | .115 (.068) | .49 | .93 |
| Missing 2006–7 school-wide behavioral incidences | .030 (.170) | .000 (.000) | .000 (.000) | .000 (.000) | .00 | ... |
| Percent missing 2006–7 ELA TVA | .875 (.060) | .878 (.057) | .877 (.056) | .880 (.059) | .35 | .70 |
| Percent missing 2006–7 math TVA | .871 (.058) | .872 (.052) | .870 (.053) | .876 (.049) | .73 | .34 |
| N | 1,417 | 396 | 233 | 163 | ... | ... |

NOTE.—ELA = English language arts. Each column reports summary statistics from different samples. All variables are school-level measures or averages. Teacher value added (TVA) was standardized to have a mean of zero and a standard deviation of one by test grade level in the full city sample before the school average was taken. Means and standard deviations (in parentheses) are reported. The p-values of the differences between samples are reported in the last two columns.
In the forthcoming analysis, the counterfactual is constructed by exploiting the random assignment of schools into treatment and control groups. Restricting the analysis to schools that were selected (by the UFT and the DOE) to be included in the experimental sample, I can estimate the causal impact of being offered the chance to participate in a teacher incentive program by comparing the average outcomes of schools randomly selected for treatment with the average outcomes of schools randomly selected for control. Schools that were not chosen to participate form the control group corresponding to the counterfactual state that would have occurred in treatment schools had they not been offered the chance to participate.

Let T_s be an indicator for a treatment school. The mean difference in outcomes between treatment schools (T_s = 1) and control schools (T_s = 0) is known as the "intent-to-treat" (ITT) effect and is estimated by regressing student outcomes on T_s. In theory, predetermined student and school characteristics (X_i and X_s) should have the same distribution across treatment and control groups because they are statistically independent of treatment assignment. The specifications estimated are of the form
outcome_{i,g,s,t} = α₂ + β₂X_i + γ₂X_s + π₂T_s + δ_g + ζ_t + ε_{i,g,s,t},
where the vector of student-level controls, X_i, includes a mutually exclusive and collectively exhaustive set of race dummies, predetermined measures of the outcome variables when possible (i.e., preintervention test scores or the number of behavioral incidences), and indicators for gender, free lunch eligibility, special education status, and English language learner status. The set of school-level controls, X_s, includes the school's prelottery number of behavioral incidences and the percentages of students at a school who are black, Hispanic, free-lunch eligible, English language learners, and special education students. The ITT is an average of the causal effects for students enrolled in treatment schools compared to those enrolled in control schools at the time of random assignment. The ITT therefore captures the causal effect of being offered the chance to participate in the incentive program, not of actually participating.

Under several assumptions (that treatment group assignment is random, that control schools are not allowed to participate in the incentive program, and that treatment assignment affects outcomes only through program enrollment), I can also estimate the causal impact of actually participating in the incentive program. This parameter, commonly known as the "treatment-on-treated" (TOT) effect, measures the average effect of treatment on schools that chose to participate in the merit pay program. The TOT parameter can be estimated through a two-stage least-squares regression of student outcomes on participation, with original treatment assignment (T_s) as an instrumental variable (IV) for participation. I use the number of years a student spent in treated schools as the actual participation variable. The first-stage equations for IV estimation take the form
incentive_{i,g,s,t} = α₃ + β₃X_i + γ₃X_s + δ_g + ζ_t + π₃T_s + ε_{i,g,s,t},
where π₃ captures the effect of treatment assignment (T_s) on the average number of years a student spends in a treatment school. The TOT is the estimated difference in outcomes between students in schools that were induced into participating through treatment assignment and those in the control group who would have enrolled had they been offered the chance.
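The relationship between the first stage, the ITT, and the TOT can be illustrated on simulated data; every number below is simulated for illustration and is unrelated to the article's estimates:

```python
# Sketch of ITT and TOT (Wald/IV) estimation with random assignment T
# as the instrument for years of participation.
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
T = rng.integers(0, 2, n)              # random assignment (the instrument)
years = T * rng.choice([1, 2, 3], n)   # years in a treated school (zero for controls)
# Outcome with a small negative per-year effect of participation.
y = 0.5 * rng.standard_normal(n) - 0.02 * years

# ITT: simple difference in means between assigned-treatment and control.
itt = y[T == 1].mean() - y[T == 0].mean()

# First stage: effect of assignment on years of participation.
first_stage = years[T == 1].mean() - years[T == 0].mean()

# TOT (Wald/IV estimate): ITT scaled by the first stage.
tot = itt / first_stage
```

Because assigned students accumulate more than one year of exposure on average, the first-stage coefficient exceeds one and the TOT is smaller in magnitude than the ITT, the same pattern reported in the tables that follow.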
V. The Impact of Teacher Incentives
A. Student Achievement
Table 4 presents first-stage, ITT, and TOT estimates of the effect ofteacher incentives on state math and ELA test scores. Columns 1–3 reportestimates from the elementary school sample, columns 4–6 report esti-mates from middle schools, and columns 7–9 present results for a pooledsample of elementary and middle schools. I present both raw estimatesand those that contain the parsimonious set of controls described in theprevious section. Note that the coefficients in the table are normalized sothat they are in standard deviation units and represent 1-year impacts.Surprisingly, all estimates of the effect of teacher incentives on student
achievement are negative in both elementary and middle school; themiddle school results are significant at the 5% level. The ITT effect of theteacher incentive scheme is 20.014j ð0.020Þ in reading and 20.018jð0.024Þ in math for elementary schools and20.030j ð0.011Þ in reading and20.046j ð0.018Þ in math for middle schools. The effect sizes in middleschool are nontrivial—a student who attends a participating middle schoolfor 3 years of the experiment is expected to lose 0.090j in reading and0.138j in math. The TOT estimates are smaller than the ITT estimates, asthe first-stage coefficients are all larger than one.Table 5 presents results similar to table 4, but for high schools. High
school students do not take the New York state exams. Instead, they haveto take and score 55 or above in regents exams in five key subject areas tograduate with a local diploma. To graduate with a regents diploma, stu-dents had to score 65 or above in the same five required regents examareas. To graduate with an advanced regents diploma, students had tomeet the requirements for regents diploma and, in addition, score a 65or higher in the mathematics B, life science, physical science, and foreign
This content downloaded from 129.119.32.27 on Fri, 3 May 2013 10:10:44 AMAll use subject to JSTOR Terms and Conditions
Tab
le4
The
Impa
ctof
Teach
erIncentives
onStud
entAch
ievemen
t,Elemen
tary
andMiddleScho
ol
Elementary*
Middley
PooledSample
First
Stage
ð1Þ
ITT ð2Þ
TOT
ð3Þ
First
Stage
ð4Þ
ITT ð5Þ
TOT
ð6Þ
First
Stage
ð7Þ
ITT ð8Þ
TOT
ð9Þ
ELA:
Raw
1.319
2.013
2.010
1.137
2.031
2.027
1.236
2.020
2.017
ð.057Þ
ð.023Þ
ð.017
Þð.0
45Þ
ð.012Þ
ð.011Þ
ð.043Þ
ð.014Þ
ð.011
ÞControl
1.323
2.014
2.010
1.137
2.030
2.026
1.236
2.020
2.016
ð.055Þ
ð.020Þ
ð.015
Þð.0
46Þ
ð.011Þ
ð.010Þ
ð.043Þ
ð.013Þ
ð.010
ÞN
students
175,894
175,894
175,894
147,141
147,141
147,141
323,317
323,317
323,317
Nschools
230
230
230
136
136
136
323
323
323
Math:
Raw
1.318
2.020
2.015
1.136
2.051
2.045
1.235
2.033
2.027
ð.057Þ
ð.026Þ
ð.020
Þð.0
45Þ
ð.019Þ
ð.017Þ
ð.043Þ
ð.017Þ
ð.014
ÞControl
1.322
2.018
2.014
1.136
2.046
2.040
1.235
2.032
2.026
ð.055Þ
ð.024Þ
ð.018
Þð.0
46Þ
ð.018Þ
ð.016Þ
ð.042Þ
ð.016Þ
ð.013
ÞN
students
176,387
176,387
176,387
147,493
147,493
147,493
324,172
324,172
324,172
Nschools
230
230
230
136
136
136
323
323
323
NOTE.—
Eachcolumnreportsresultsfrom
separateregressions.DependentvariablesarethestateELA
ðEnglishlangu
ageartsÞa
ndmathscoresstandardized
tohaveameanof
zero
andastandarddeviationofonebygradeleveleach
academ
icyearin
thefullcity
sample.Scoresfrom
all3years
ofim
plementationareused.First-stage
regressionusesthe
years
receivingtreatm
entas
theoutcomevariable
andreportsthecoefficientonthedummyvariable
forbeingrandomized
into
thetreatm
entgroup.Theintent-to-treat
ðITTÞ
estimates
report
theeffect
ofbeingassign
edto
thetreatm
entgroupusingtheordinaryleast-squares
method.Thetreatm
ent-on-treated
ðTOTÞe
stim
ates
report
theeffect
of
spendingtimein
treatedschools,usingtherandom
assign
mentinto
thetreatm
entgroupas
theinstrument.Raw
regressionscontrolfor2006–7statetest
scores,test
gradelevel
dummies,andyearfixedeffects.Controlregressionsincludestudentdem
ographic
variablesandschoolcharacteristicsas
additional
controlvariables.Standarderrors,reported
inparentheses,areclustered
atschoollevel.Number
ofstudentobservationsisreported
asan
aggregatetotalofobservationsforall3years
ofim
plementation.Number
ofschool
observationsisreported
forthe2007
–8schoolyearðyear1Þ.
*Sample
within
schoolsdesignated
aselem
entary
intheNew
York
CityDepartm
entofEducationcity
progressreport
files,as
wellas
students
ingradelevels3–
5in
schools
designated
asK–8.
ySample
within
schoolsdesignated
asmiddle
schools,as
wellas
students
ingradelevels6–
8ofschoolsdesignated
asK–8.
This content downloaded from 129.119.32.27 on Fri, 3 May 2013 10:10:44 AMAll use subject to JSTOR Terms and Conditions
language regents exams.12 Table 5 presents first-stage, ITT, and TOT es-timates for the impact of teacher incentives on comprehensive English,
Teacher Incentives and Student Achievement 393
mathematics, science, US history, and global history regents exam scores.All exam scores were standardized to have a mean of zero and a standarddeviation of one in the full city sample each academic year.Similar to the analysis of elementary and middle schools, there is no
evidence that teacher incentives had a positive effect on achievement.Estimates of the effect of teacher incentives on high school achievementare all small and statistically insignificant. The ITT effect on the Englishregents exam score is20.003j ð0.046Þ, the effect on the integrated algebraexam score is 20.020j ð0.033Þ, and the effect on science scores is 20.019jð0.038Þ. The ITT effect on the US history exam score is 20.033j ð0.057Þ,and that on the global history exam score is 20.063j ð0.046Þ. The TOTeffect is of a comparable magnitude.Table 5 also reports treatment effects on 4-year graduation rates. The
dependent variables are a dummy for graduating in 4 years, which takesthe value one if the student graduated in 4 years and zero otherwise, and adummy for graduating in 4 years with a regents diploma, which takes thevalue one if the student graduated with a regents diploma and zero oth-erwise. Students enrolled in treatment schools were 4.4% less likely tograduate in 4 years ðwhich is statistically significant at the 5% levelÞ andwere 7.4% less likely to obtain a regent’s diploma ðstatistically significantat the 10% levelÞ. Note that during the period of the experiment, meangraduation rates fluctuated between 54% and 61%.Table 6 explores heterogeneity in treatment effects across a variety of
subsamples of the data: gender, race, free lunch eligibility, previous yearsstudent test scores, school size, TVA, and teacher salary. The coefficientsin the table are ITT estimates with a parsimonious set of controls. Allcategories are mutually exclusive and collectively exhaustive. The effect ofteacher incentives on achievement does not vary systematically across thesubsamples. For middle school students, the only exceptions are studentswho are free-lunch eligible, are attending larger schools, or are taught bymore experienced teachers ði.e., received higher salariesÞ. Among highschool students, the exceptions are students who are white or Asian,scored in the lowest tercile on eighth grade state tests, or are attending
12 Regents exams are offered in January, June, and August of each academic year
in the following subject areas: comprehensive English, algebra, geometry, trigonom-etry, chemistry, physics, biology, living environment, earth science, world history,US history, and foreign languages. In this article, I present results on comprehensiveEnglish, integrated algebra, living environment, US history, and global history regentsexam scores. Among mathematics and science exam areas, integrated algebra andliving environment were selected because the highest number of students took thoseexams. Using other exam scores gives qualitatively similar results.This content downloaded from 129.119.32.27 on Fri, 3 May 2013 10:10:44 AMAll use subject to JSTOR Terms and Conditions
Tab
le5
The
Impa
ctof
Teach
erIncentives
onStud
entAch
ievemen
t,HighScho
ol
Raw
Control
Coefficient
Standard
Error
Number
of
Students
Number
of
Schools
Coefficient
Standard
Error
Number
of
Students
Number
of
Schools
Regents
exam
scores:
English:
First
stage
1.082
ð.095
Þ55,791
101
1.069
ð.090
Þ55,791
101
ITT
.009
ð.052
Þ55,791
101
2.003
ð.046
Þ55,791
101
TOT
.009
ð.048
Þ55,791
101
2.003
ð.043
Þ55,791
101
Mathem
atics:
First
stage
1.111
ð.058
Þ55,785
134
1.101
ð.060
Þ55,785
134
ITT
2.019
ð.031
Þ55,785
134
2.020
ð.033
Þ55,785
134
TOT
2.017
ð.028
Þ55,785
134
2.018
ð.029
Þ55,785
134
Science:
First
Stage
1.042
ð.066
Þ57,364
109
1.028
ð.067
Þ57,364
109
ITT
2.021
ð.039
Þ57,364
109
2.019
ð.038
Þ57,364
109
TOT
2.020
ð.037
Þ57,364
109
2.018
ð.037
Þ57,364
109
UShistory:
First
stage
1.107
ð.094
Þ50,315
901.089
ð.090
Þ50,315
90IT
T2.028
ð.064
Þ50,315
902.033
ð.057
Þ50,315
90TOT
2.025
ð.058
Þ50,315
902.030
ð.052
Þ50,315
90Global
history:
First
stage
1.008
ð.083
Þ61,892
89.987
ð.084
Þ61,892
89IT
T2.057
ð.045
Þ61,892
892.063
ð.046
Þ61,892
89TOT
2.057
ð.045
Þ61,892
892.063
ð.046
Þ61,892
89
394
This content downloaded from 129.119.32.27 on Fri, 3 May 2013 10:10:44 AMAll use subject to JSTOR Terms and Conditions
Raw
Control
Coefficient
Standard
Error
Number
of
Students
Number
of
Schools
Coefficient
Standard
Error
Number
of
Students
Number
of
Schools
Four-year
graduation:
Graduated:
First
stage
.840
ð.078Þ
27,995
87.830
ð.073
Þ27,995
87IT
T2.056
ð.024Þ
27,995
872.044
ð.021
Þ27,995
87TOT
2.066
ð.029Þ
27,995
872.053
ð.026
Þ27,995
87Regents
diploma:
First
stage
.996
ð.084Þ
15,803
85.985
ð.079
Þ15,803
85IT
T2.077
ð.045Þ
15,803
852.074
ð.043
Þ15,803
85TOT
2.078
ð.046Þ
15,803
852.075
ð.044
Þ15,803
85
NOTE.—
Eachrowreportsresultsfrom
differentregressions.Thedependentvariablesareregentsexam
scoresin
comprehensive
English,m
athem
aticsðin
tegrated
algebraÞ,s
cience
ðlivingenvironmentÞ,
UShistory,andglobal
history
forstudents
ingrades
8–12
andhighschoolgraduationoutcomes.Regents
exam
scoresarestandardized
byexam
typeeach
academ
icyearto
haveameanofzero
andastandardofdeviationofonein
thefullcity
sample.Graduationoutcomeiscoded
asdummyvariablesthat
takethevalueoneifthe
studentgraduated
highschoolorifthestudentgraduated
witharegentsdiploma,respectively,andzero
otherwise.Firststageusestheyears
that
thestudentreceived
treatm
entas
theoutcomevariableandreportsthecoefficientonthedummyvariableforbeingin
thetreatm
entgroup.T
heintent-to-treat
ðITTÞe
stim
ates
reporttheeffect
ofbeingassign
edto
thetreatm
entgroupusingtheordinaryleast-squares
method.Thetreatm
ent-on-treated
ðTOTÞe
stim
ates
report
theeffect
ofspendingtimein
treatedschools,usingtherandom
assign
mentinto
thetreatm
entgroupas
theinstrument.Raw
regressionscontrolforeigh
thgradeELA
ðEnglishlangu
ageartsÞa
ndmathstatetestscoresforstudentsin
grades
9–12
andseventh
gradeELA
andmathstatetestscoresforstudentsin
eigh
thgrade.Su
bstitutingseventh
gradeprevioustestscoresforallgrades
does
notqualitativelyalterresults.Raw
regressionsalso
includegradefixedeffectsandyearfixedeffects.Controlregressionscontrolforstudentdem
ographicsandschoolcharacteristicsin
addition.Standarderrors
are
clustered
atschoollevel.Number
ofstudentobservationsisreported
asan
aggregatetotalofobservationsforall3years
ofim
plementation.Number
ofschoolobservationsis
reported
forthe2007
–8schoolyearðyear1Þ.
395
This content downloaded from 129.119.32.27 on Fri, 3 May 2013 10:10:44 AMAll use subject to JSTOR Terms and Conditions
Tab
le6
The
Impa
ctof
Teach
erIncentives
onAlterna
tive
Outcomes
Elementary*
Middley
Highz
                         Elementary School*               Middle School†                 High School‡
                     First Stage    ITT      TOT      First Stage    ITT      TOT      First Stage    ITT      TOT

Attendance rate          1.308     −.017    −.013        1.121     −.019    −.017         .915     −.014    −.016
                        (.054)    (.021)   (.016)       (.045)    (.025)   (.022)       (.068)    (.054)   (.059)
  N students           181,393   181,393  181,393      154,017   154,017  154,017      207,718   207,718  207,718
  N schools                230       230      230          136       136      136           88        88       88

Behavior problems        1.308     −.000    −.000        1.121      .010     .009         .915      .061     .067
                        (.054)    (.017)   (.013)       (.045)    (.022)   (.019)       (.068)    (.035)   (.039)
  N students           181,393   181,393  181,393      154,017   154,017  154,017      207,718   207,718  207,718
  N schools                230       230      230          136       136      136           88        88       88

GPA                      1.387      .005     .004        1.153     −.003    −.003        1.161     −.004    −.003
                        (.094)    (.041)   (.029)       (.050)    (.031)   (.027)       (.082)    (.033)   (.028)
  N students            45,410    45,410   45,410      128,764   128,764  128,764      123,434   123,434  123,434
  N schools                201       201      201          136       136      136          ...       ...      ...

Predictive ELA           1.069     −.022    −.021         .947     −.021    −.022          ...       ...      ...
                        (.042)    (.017)   (.016)       (.042)    (.019)   (.020)          ...       ...      ...
  N students           108,515   108,515  108,515       49,117    49,117   49,117          ...       ...      ...
  N schools                230       230      230          136       136      136          ...       ...      ...

Predictive math          1.063     −.026    −.025         .949     −.049    −.052          ...       ...      ...
                        (.042)    (.020)   (.019)       (.042)    (.022)   (.024)          ...       ...      ...
  N students           108,078   108,078  108,078       48,777    48,777   48,777          ...       ...      ...
  N schools                230       230      230          136       136      136          ...       ...      ...

NOTE.—ITT = intent to treat; TOT = treatment on treated; ELA = English language arts. Each column reports results from separate regressions. The dependent variables are attendance rate, the number of behavioral problems, annual GPA (grade point average), and spring predictive state exam scores from each academic year. All outcome variables are standardized by grade level each academic year to have a mean of zero and a standard deviation of one in the full city sample. The regression specification used is the same as in tables 4 and 5. Standard errors, reported in parentheses, are clustered at the school level. Number of student observations is reported as an aggregate total of observations for all 3 years of implementation. Number of school observations is reported for the 2007–8 school year (year 1).
* Sample within schools designated as elementary in the New York City Department of Education city progress report files, as well as students in grade levels 3–5 in schools designated as K–8.
† Sample within schools designated as middle schools, as well as students in grade levels 6–8 of schools designated as K–8.
‡ Students in grade levels 9–12.
This content downloaded from 129.119.32.27 on Fri, 3 May 2013 10:10:44 AMAll use subject to JSTOR Terms and Conditions
larger schools. Students in these subsamples seem to be affected more negatively by teacher incentives.
Teacher Incentives and Student Achievement 397
The estimates above use the sample of students for which I have achievement test scores. If students in treatment and control schools have different rates of selection into this sample, my results may be biased. A simple test for selection bias is to investigate the impact of the treatment offer on the probability of entering the sample. The results of this test, although not shown here in tabular form, demonstrate that the coefficient on treatment is small and statistically zero.13 This suggests that differential attrition is not likely to be a concern in interpreting the results.
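The selection test described above is simply a regression of an in-sample indicator on the treatment offer, with standard errors clustered at the school level. A minimal sketch on simulated data follows; the data-generating process and all variable names here are hypothetical illustrations, not the article's data or code.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical setup: 200 schools, half randomly offered treatment,
# 50 students per school; whether a student appears in the test-score
# sample is unrelated to treatment here, so the coefficient should be ~0.
n_schools, n_students = 200, 50
school_treated = rng.permutation(np.repeat([0, 1], n_schools // 2))
treat = np.repeat(school_treated, n_students)  # student-level offer
in_sample = (rng.random(n_schools * n_students) < 0.9).astype(float)

# With a single binary regressor, the OLS coefficient is a difference
# in means between offered and non-offered students.
coef = in_sample[treat == 1].mean() - in_sample[treat == 0].mean()

# Cluster-robust SE: treatment varies only at the school level, so
# collapsing to school means and comparing the two groups is equivalent.
school_means = in_sample.reshape(n_schools, n_students).mean(axis=1)
m_t = school_means[school_treated == 1]
m_c = school_means[school_treated == 0]
se = np.sqrt(m_t.var(ddof=1) / len(m_t) + m_c.var(ddof=1) / len(m_c))

print(f"coef = {coef:.4f}, clustered SE = {se:.4f}")
```

A coefficient close to zero relative to its clustered standard error is what "small and statistically zero" means in the text.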
B. Alternative Outcomes
Thus far, I have concentrated on student progress on state assessments, the most heavily weighted element of NYC's incentive scheme. Now, I introduce two additional measures of student performance and three measures of school environment: GPAs, predictive math and ELA exams, school environment surveys, attendance, and behavioral incidences. Many of these outcomes enter directly into the incentive scheme and may be affected by it.

Table 7 shows estimates of the impact of teacher incentives on this set of alternative outcomes. Predictive assessments are highly correlated with the state exams and are administered to all public school students in grades 3–8 in October and May. The DOE gives several different types of predictive exams, and schools can choose to use one of the options depending on their needs. In this article, I analyze math and ELA test scores from the spring Acuity Predictive Assessment.14 Each student's attendance rate is calculated as the total number of days present in any school divided by the total number of days enrolled in any school. Attendance rate was standardized by grade level to have a mean of zero and a standard deviation of one each academic year across the full city sample. Grades were extracted from files containing the transcripts of all students in the district.15 Elementary school students received letter grades, which were converted to a 4.0 scale, and middle and high school students received numeric grades that ranged from 1 to 100. Students' grades from each academic year were averaged to yield an annual GPA. As with test scores, GPAs were standardized to have a mean of zero and a standard deviation of one among students in the same grade with the same grade scale across the school district. The number of behavioral incidences was pulled from behavior data, which record the date, level, location, and a short description of all
13. Tabular results are available from the author upon request.
14. Eighth-grade students did not take the spring predictive tests because they did not have to take state exams in the following year.
15. Elementary school transcripts are not available for all schools each academic year. High school transcripts were not available until the 2008–9 academic year.
Table 7. The Impact of Teacher Incentives on Teacher Behavior

                         Elementary School                Middle School                     K–8
                     First Stage    ITT      TOT      First Stage    ITT      TOT      First Stage    ITT      TOT

Retention in district    1.224      .002     .002        1.233     −.006    −.005        1.290      .010     .007
                        (.052)    (.006)   (.005)       (.082)    (.011)   (.009)       (.091)    (.013)   (.010)
  N teachers            21,700    21,700   21,700        8,289     8,289    8,289        4,693     4,693    4,693
  N schools                187       187      187           85        85       85           40        40       40

Retention in school      1.224     −.007    −.005        1.233     −.027    −.022        1.290     −.000    −.000
                        (.052)    (.012)   (.010)       (.082)    (.017)   (.014)       (.091)    (.027)   (.021)
  N teachers            21,700    21,700   21,700        8,289     8,289    8,289        4,693     4,693    4,693
  N schools                187       187      187           85        85       85           40        40       40

Personal absences        1.220      .275     .225        1.225     −.440    −.359        1.290      .613     .475
                        (.053)    (.212)   (.174)       (.083)    (.403)   (.325)       (.091)    (.496)   (.382)
  N teachers            18,543    18,543   18,543        6,727     6,727    6,727        3,977     3,977    3,977
  N schools                187       187      187           85        85       85           40        40       40

NOTE.—Each column reports results from different regressions. The dependent variables are retention in district, retention in school, and personal absences in a year. Retention in district is coded as a dummy variable that takes the value one if the teacher stays in the New York City public school district in the next academic year and zero otherwise. Retention in school is coded similarly as a dummy variable that takes the value one if the teacher stayed in the same school the next academic year. Teacher absences is the number of days absent from school for personal reasons in an academic year. Outcome variables from the first 2 years of implementation are used. First stage uses the number of years receiving treatment as the outcome variable and reports the coefficient on the dummy variable for being in the treatment group. The intent-to-treat (ITT) estimates report the effect of being assigned to the treatment group using the ordinary least-squares method. The treatment-on-treated (TOT) estimates report the effect of teaching at treated schools, using the random assignment into the treatment group as the instrument. Teacher demographic variables and the 2006–7 teacher value are included as controls. Standard errors, reported in parentheses, are clustered at the school level. Number of teacher observations is reported as an aggregate total of observations for the first 2 years of implementation. Number of school observations is reported for the 2007–8 school year (year 1).
incidences. The total number of incidences attributed to a student in an academic year across all schools and grades he attended was calculated and standardized by grade level to have a mean of zero and a standard deviation of one each academic year across the full city sample.

Results from predictive assessments provide an identical portrait to that depicted by state test scores. The effect of the teacher incentive program on predictive ELA exams is negative and statistically insignificant, with the ITT effect equal to −0.022σ (0.017) in the elementary school sample and −0.021σ (0.019) in the middle school sample. The ITT effect on predictive math exams is −0.026σ (0.020) in the elementary school sample and −0.049σ (0.022) in the middle school sample. Note that the effect of teacher incentives on middle school students' predictive math exam scores is negative and statistically significant, consistent with the state test score findings.

Teacher incentives have a statistically insignificant effect on other alternative student outcomes. The ITT and TOT effects on attendance rate, which enters directly in the calculation of progress report card scores, are negative across all school levels. The ITT effect is estimated to be −0.017σ (0.021) in the elementary school sample, −0.019σ (0.025) in the middle school sample, and −0.014σ (0.054) in the high school sample. The effects on behavioral incidences and GPAs are similarly small and insignificant.
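All of these outcomes are standardized within grade-by-year cells to mean zero and standard deviation one. A minimal sketch of that kind of group-wise z-scoring, on synthetic data (the variable names and the raw-score scale are hypothetical, not taken from the article's data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical records: a grade level and a raw outcome for each student.
grade = rng.integers(3, 9, size=1_000)  # grades 3-8
raw = 600 + 10 * grade + 25 * rng.standard_normal(1_000)  # raw scale score

# Standardize within each grade: subtract the grade mean and divide by
# the grade standard deviation, so every grade has mean 0 and SD 1.
z = np.empty_like(raw)
for g in np.unique(grade):
    mask = grade == g
    z[mask] = (raw[mask] - raw[mask].mean()) / raw[mask].std()
```

In the article the cells are grade-by-academic-year (and, for GPAs, grade-by-grading-scale); the same transform just runs over finer groups.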
C. Teacher Behavior
In this section, I estimate the impact of the teacher incentive program on two important teacher behaviors: absences and retention. I assign teachers to treatment or control groups if they were assigned to a treatment or a control school, respectively, in October 2007. I only include teachers who were teaching at schools in the randomization sample in 2007 and ignore all who enter the system afterward.

I measure retention in two ways: in school and in district, both of which were constructed using HR data provided by the DOE. Retention in school was constructed as a dummy variable that takes the value one if a teacher was associated with the same school in the following academic year and zero otherwise. Retention in district is more complicated. Like the coding of retention in school, I construct a dummy variable that takes the value one if a teacher was found in the NYC school district's HR file in the following academic year and zero otherwise. But there are two important caveats. First, charter schools and high schools are not included in the NYC public school district's HR files, and, therefore, some teachers who left the district may have simply moved to teach at charter schools or high schools in the district. As the same types of teacher certificates qualify teachers to teach in both middle and high schools, it is possible that some teachers who left the district from middle schools went to teach at high schools. It is unlikely, however, that a significant number of elementary school teachers obtained new certificates to qualify for teaching in middle schools. Therefore, I divided the sample of teachers into elementary, middle, and K–8 school samples and estimated the treatment effects separately on each sample. To measure absences, I obtained the number of personal absences as of May for teachers who did not exit the system.

Table 8 presents results on the impact of teacher incentives on measures of teacher behavior. There is no evidence that teacher incentives affect teacher attendance or retention in either district or school. Elementary school teachers in treatment schools were 0.2% more likely to stay in the NYC school district, were 0.7% less likely to stay at the same school in the following academic year, and took 0.275 more days of personal absences. Middle school teachers exhibit similar patterns. None of these effects are statistically significant, nor are they economically meaningful.
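As the table notes describe, the ITT estimate is an OLS regression of the outcome on the randomized treatment offer, and the TOT estimate instruments treatment exposure (years in a treated school) with that offer. With no covariates, the TOT reduces to the Wald ratio, the ITT divided by the first stage. A minimal sketch on simulated data (the data-generating process and all names here are hypothetical, chosen only to illustrate the estimators):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Hypothetical experiment: z is the random offer; d is years of treatment
# actually received (imperfect compliance); y is the outcome.
z = rng.integers(0, 2, size=n).astype(float)
d = z * rng.binomial(2, 0.65, size=n)  # offered units receive 0-2 years
true_effect_per_year = -0.02           # chosen for the simulation
y = true_effect_per_year * d + rng.standard_normal(n)

def ols_slope(x, y):
    """Bivariate OLS slope of y on x."""
    return np.cov(x, y, ddof=1)[0, 1] / x.var(ddof=1)

first_stage = ols_slope(z, d)  # effect of the offer on years treated
itt = ols_slope(z, y)          # effect of the offer on the outcome
tot = itt / first_stage        # Wald / IV estimate per treated year

print(f"first stage {first_stage:.3f}, ITT {itt:.4f}, TOT {tot:.4f}")
```

This is why every TOT in the tables is the corresponding ITT scaled down by a first stage greater than one: the first stage counts years of exposure, not a 0/1 indicator. The article's actual regressions add controls and cluster standard errors at the school level, which this sketch omits.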
VI. Discussion
The previous sections demonstrate that the teacher incentive scheme piloted in 200 NYC public schools did not increase achievement. If anything, achievement may have declined as a result of the experiment. Yet, incentive schemes in developing countries have proven successful at increasing achievement.

In this section, I consider four explanations for these stark differences: (1) incentives may not have been large enough, (2) the incentive scheme was too complex, (3) group-based incentives may not be effective, and (4) teachers may not know how they can improve student performance. Using the analysis, along with data gleaned from other experiments, I argue that the most likely explanation is that the NYC incentive scheme, along with all other American pilot initiatives thus far, is too complex and provides teachers with too little control. It is important to note that I cannot rule out the possibility that other unobservable differences between the developing countries and America (e.g., teacher motivation) produce the differences.
A. Incentives Were Not Large Enough
One potential explanation for the stark results is that the incentives simply were not large enough. There are two reasons that the incentives to increase achievement in NYC may have been too small. First, although schools had discretion over how to distribute the incentives to teachers if they met their performance targets, an overwhelming majority of them chose to pay teachers equally. These types of egalitarian distribution methods can induce free riding and undercut individual incentives to put
This content downloaded from 129.119.32.27 on Fri, 3 May 2013 10:10:44 AMAll use subject to JSTOR Terms and Conditions
Table 8. The Impact of Teacher Incentives on Teacher Survey Results

                            Elementary                     Middle                         K–8                          High
                     First Stage   ITT     TOT     First Stage   ITT     TOT     First Stage   ITT     TOT     First Stage   ITT     TOT

Response rate            1.667     .018    .011        1.740     .019    .011        1.883    −.011   −.006        1.559     .037    .024
                        (.074)   (.023)  (.013)       (.103)   (.040)  (.022)       (.100)   (.061)  (.030)       (.136)   (.028)  (.017)
  N schools                187      187     187           96       96      96           40       40      40           87       87      87

Safety/respect           1.667    −.088   −.053        1.740     .210    .121        1.884    −.558   −.296        1.559     .095    .061
                        (.074)   (.126)  (.075)       (.103)   (.167)  (.091)       (.100)   (.263)  (.130)       (.135)   (.158)  (.096)
  N schools                187      187     187           96       96      96           39       39      39           86       86      86

Communication            1.667     .071    .042        1.740     .072    .041        1.884    −.583   −.309        1.559    −.000   −.000
                        (.074)   (.133)  (.078)       (.103)   (.188)  (.102)       (.100)   (.283)  (.138)       (.135)   (.177)  (.108)
  N schools                187      187     187           96       96      96           39       39      39           86       86      86

Engagement               1.667     .046    .027        1.740     .089    .051        1.884    −.403   −.214        1.559     .024    .016
                        (.074)   (.129)  (.076)       (.103)   (.177)  (.097)       (.100)   (.278)  (.134)       (.135)   (.161)  (.098)
  N schools                187      187     187           96       96      96           39       39      39           86       86      86

Academic expectations    1.667    −.006   −.004        1.740     .088    .050        1.884    −.506   −.268        1.559     .101    .065
                        (.074)   (.129)  (.076)       (.103)   (.181)  (.099)       (.100)   (.287)  (.139)       (.135)   (.163)  (.098)
  N schools                187      187     187           96       96      96           39       39      39           86       86      86

NOTE.—Each column reports results from separate regressions. The dependent variables are teacher survey response rates and subscores, standardized by school level (elementary, middle, and high) to have a mean of zero and a standard deviation of one. The intent-to-treat (ITT) estimates report the effect of being assigned to the treatment group using the ordinary least-squares method, and the treatment-on-treated (TOT) estimates report the effect of receiving treatment, with the random assignment as the instrument. Regressions control for student demographics and previous achievement measures. Standard errors are reported in parentheses. Number of school observations is reported for the 2007–8 school year (year 1).
in effort. Moreover, an overwhelming majority of teachers in schools that met the annual target earned an amount close to $3,000. This is less than 4.1% of the average annual teacher salary in the sample. One might think that the bonus was simply not large enough for teachers to put in more effort, although similar incentive schemes in India (3%) and Kenya (2%) were relatively smaller.

Second, the measures used to calculate the progress report card scores directly influence other accountability measures such as the AYP (adequate yearly progress) that determine whether a school will be subjected to regulations or even be closed, which results in all staff losing their jobs. Hence, all teachers in poor-performing schools, including all treatment and control schools in the experiment, have incentives to perform well on the precise measures that were being incentivized. Thus, it is not clear whether the teacher incentive program provides additional incentives, at the margin, for teachers to behave differently.

A brief look at the results of the Project on Incentives in Teaching (POINT), a pilot initiative in Nashville, Tennessee, suggests that a larger incentive in schools that are not under pressure by AYP was still not any more effective. Teachers in POINT treatment schools were selected from the entire school district and could earn up to $15,000 in a year solely on the basis of their students' test scores. Teachers whose performance was at lower thresholds could earn $5,000–$10,000. The maximum amount is roughly 31% of the average teacher salary in Nashville. Springer et al. (2010) find that even though about half of the participating teachers could have reached the lowest bonus threshold if their students answered on average two or three more out of 55 items correctly, student achievement did not increase significantly more in classrooms taught by treatment teachers. Moreover, they report that treatment teachers did not seem to change their instructional practices or effort level. Considering the New York results alongside those of the POINT pilot leaves open the possibility that if the magnitude of rewards is, in fact, too small to influence teacher behavior, then teachers' labor supply could be so inelastic that the investment required to influence behavior at the margin may be too large to be relevant to policy discussions.
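The bonus-to-salary ratios quoted above imply back-of-the-envelope average salaries. This is my own arithmetic as a quick consistency check, not figures reported in the article:

```python
# The article states that a ~$3,000 NYC bonus is less than 4.1% of the
# average annual salary, and that POINT's $15,000 maximum is roughly 31%
# of Nashville's average. Inverting each ratio gives the implied average
# salary in each district.
nyc_implied_salary = 3_000 / 0.041        # roughly $73,000
nashville_implied_salary = 15_000 / 0.31  # roughly $48,000

print(round(nyc_implied_salary), round(nashville_implied_salary))
```

The point of the comparison stands either way: the POINT maximum was nearly an order of magnitude larger relative to salary, yet still produced no detectable achievement gains.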
B. Incentive Scheme Was Too Complex
In the experiment it was difficult, if not impossible, for teachers to know how much effort they should exert or how that effort influences student achievement because of the complexity of the progress report card system used in NYC. For example, the performance score for elementary and middle schools is calculated using the percentage of students at the proficiency level and the median proficiency rating in state tests. Recall that the performance score depends on how a school performs compared to its peer schools that had a similar student achievement level in the previous year and compared to all schools in the district. But it is highly unlikely that teachers can predict at which percentile their school will be placed relative to the peer group and the district in these measures of performance if the school increased overall student achievement by, for example, 1 standard deviation.

Similarly, the POINT pilot in Tennessee, like the pilots in other American school districts, contained an incentive scheme that was dependent on the performance of others rather than simpler incentive schemes such as those in Duflo and Hanna (2005), Glewwe et al. (2010), and Muralidharan and Sundararaman (2011). Lack of experimentation with simple incentive schemes like those that have been successful in the developing world makes it impossible to draw definitive conclusions about their effectiveness in American public schools; however, this growing body of international evidence strongly suggests that teachers respond to simple, individualized incentives by modifying behavior in desirable ways. It is plausible that ambiguities in the incentive structures employed in New York and Tennessee may have served to flatten the function that maps effort into expected reward.
C. Group-Based Rewards Are Ineffective
Although schools were given flexibility to choose their own incentive schemes, the vast majority of them settled on a group-based scheme. Group-based incentive schemes introduce the potential for free riding and may be ineffective under certain conditions. Yet, in some contexts, they have been shown to be effective. For example, Muralidharan and Sundararaman (2011) found that the group incentive scheme in government-run schools in India had a positive and significant effect on student achievement. However, the authors stress that 92% of treatment schools had between two and five teachers, with an average of 3.28 teachers in each treated school. Similar results are obtained in Glewwe et al. (2010), where the average number of teachers per school was 12. Given that NYC public schools have 60 teachers on average, the applicability of the results from these analyses is suspect. When there are only three (or 12) teachers in a school, monitoring and penalizing those teachers who shirk their responsibility is less costly. Whatever the mechanism, the success of international incentives programs in smaller schools suggests that individual- or small-group-focused incentives may be effective modifiers of behavior.

However, Lavy (2002) also suggests that group-based incentives may be effective in larger schools. His nonexperimental evaluation of the teacher incentives intervention in Israel, in which teachers were incentivized on the average number of credit units per student, the proportion of students receiving a matriculation certificate, and the dropout rate, reveals that the program had a positive and significant impact on the average number of credits and test scores. The average number of teachers in the treatment schools in Israel is approximately 80, closer to the average number of teachers in a school in NYC.
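The free-riding intuition above can be made concrete with a stylized model. This is my own illustration, not a derivation from the article: suppose a school's N teachers share a group bonus B equally, school output depends on total effort through a concave function f, and each teacher bears a private convex effort cost c.

```latex
% Stylized free-rider sketch: teacher i chooses effort e_i to maximize
% her share of the group bonus net of her private effort cost.
\[
  \max_{e_i}\; \frac{B}{N}\, f\!\Big(\sum_{j=1}^{N} e_j\Big) - c(e_i)
  \quad\Longrightarrow\quad
  \frac{B}{N}\, f'\!\Big(\sum_{j=1}^{N} e_j\Big) = c'(e_i).
\]
```

The first-order condition shows that the private marginal return to effort is scaled down by 1/N, so (with f concave and c convex) equilibrium individual effort falls as N grows. This is consistent with group incentives working in schools with roughly 3 to 12 teachers but not in NYC schools averaging 60, though Lavy's (2002) Israeli results caution against treating school size as the whole story.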
D. Teachers Are Ignorant, Not Lazy
Evidence from student incentive experiments reveals that programs that directly incentivize inputs to the education production function are more likely to influence student behavior and improve student learning outcomes than those that incentivize outcome measures like test scores or grades (Fryer 2011). While economic theory predicts that outcome experiments should be more effective, this relies on the premise that students know the best way to improve their academic skills and increase effort efficiently. In reality, despite enthusiasm about the opportunity to profit from their achievement, students reveal ignorance about their own education production function that makes output incentive programs ineffective in terms of improving student achievement.

Teachers who are offered incentives for outcomes that are poorly understood and inherently high variance may face an analogous predicament. Since the teacher incentives that have been tested in American schools uniformly incentivize student learning outputs, teachers are required to know effective strategies for promoting student achievement, which, while it is reasonable to expect professional educators to hold this knowledge, is not necessarily the case. While Duflo and Hanna's (2005) input experiment revealed a profound influence of input incentives on teacher behavior, table 8 reveals no statistically significant changes in teacher behavior in NYC.

So, if teachers are indeed ignorant of the function that maps their own effort into student achievement, how might that affect their performance and, ultimately, student learning? There are two ways I might imagine a teacher to behave in the face of this type of uncertainty: (1) maintain the status quo and hope for the best or (2) invest time and effort into strategies for improving outcomes with unknown expected payoffs. On the one hand, if teachers are ignorant of the education production function, they may not see any value in expending extra effort into a production function whose output they cannot predict. If teachers choose (1) and do not respond to incentives, I would expect to see no statistically significant differences between the treatment and control groups, which is consistent with a wide range of the results.

On the other hand, there remain a surprising number of statistically significant and negative estimates in the middle school cohort that are difficult to explain. One explanation is that teachers are responding to the incentives, but in counterproductive ways. If a teacher invests excess time into practices or interventions that are less efficient in producing student achievement than their normal practices, I would expect especially motivated and misinformed teachers to overinvest time in ineffective practices at the expense of student learning. In addition, the locus of negative results at the middle school level may suggest that employing new strategies is riskier in a middle school setting.

The most striking evidence against the hypothesis that teachers' lack of knowledge of the production function is driving the results is presented in table 8, which displays treatment effects on five areas of the teacher survey that partly determined 10% of the school's overall progress report score. Even if they are ignorant of the education production function, survey response rates suggest that teachers are not exhibiting behavior that is consistent with a misguided but high-effort response to the incentives program.

As before, I present first-stage, ITT, and TOT estimates for each dependent variable. The first outcome is the teachers' response rate to the learning environment survey. The next four outcomes are the teachers' average responses to four areas of the survey questions: safety and respect, communication, engagement, and academic expectations. Questions in the safety and respect section ask whether a school provides a physically and emotionally secure learning environment. The communication area examines how well a school communicates its academic goals and requirements to the community. The engagement area measures the degree to which a school involves students, parents, and educators to promote learning. Questions in the academic expectations area measure how well a school develops rigorous academic goals for students. The scores were standardized to have a mean of zero and a standard deviation of one by school level in the full city sample.

One might predict that teachers in the incentive program would be more likely to fill out the survey and give higher scores to their schools given that they can increase the probability of receiving the performance bonus by doing so. This requires no knowledge of the production function, just an understanding of the incentive scheme. Table 8 reveals that treatment teachers were not significantly more likely to fill out school surveys. The mean response rate at treatment schools was 64% in the 2007–8 academic year and 76% in the 2008–9 academic year. This may indicate that teachers did not even put in the minimum effort of filling out teacher surveys in order to earn the bonus.
References

Aaronson, Daniel, Lisa Barrow, and William Sander. 2007. Teachers and student achievement in the Chicago public high schools. Journal of Labor Economics 25, no. 1:95–135.
Baker, George. 2002. Distortion and risk in optimal incentive contracts. Journal of Human Resources 37:728–51.
Corcoran, Sean P., William N. Evans, and Robert M. Schwab. 2004. Changing labor-market opportunities for women and the quality of teachers, 1957–2000. American Economic Review 94, no. 2:230–35.
Duflo, Esther, and Rema Hanna. 2005. Monitoring works: Getting teachers to come to school. Working Paper no. 11880, National Bureau of Economic Research, Cambridge, MA.
Firestone, William A., and James R. Pennell. 1993. Teacher commitment, working conditions, and differential incentive policies. Review of Educational Research 63, no. 4:489–525.
Fryer, Roland G. 2011. Financial incentives and student achievement: Evidence from randomized trials. Quarterly Journal of Economics 126, no. 4:1755–98.
Glazerman, Steven, Allison McKie, and Nancy Carey. 2009. An evaluation of the Teacher Advancement Program (TAP) in Chicago: Year one impact report. Mathematica Policy Research Report no. 6319-520, Washington, DC.
Glewwe, Paul, Nauman Ilias, and Michael Kremer. 2010. Teacher incentives. American Economic Journal: Applied Economics 2, no. 3:205–27.
Hanushek, Eric, and Steven Rivkin. 2005. Teachers, schools and academic achievement. Econometrica 73, no. 2:417–58.
Holmstrom, Bengt, and Paul Milgrom. 1991. Multitask principal-agent analyses: Incentive contracts, asset ownership, and job design. Journal of Law, Economics and Organization 7:24–52.
Hoxby, Caroline M., and Andrew Leigh. 2004. Pulled away or pushed out? Explaining the decline of teacher aptitude in the United States. American Economic Review 94, no. 2:236–40.
Jacob, Brian A., and Steven D. Levitt. 2003. Rotten apples: An investigation of the prevalence and predictors of teacher cheating. Quarterly Journal of Economics 118, no. 3:843–77.
Johnson, Susan M. 1984. Merit pay for teachers: A poor prescription for reform. Harvard Educational Review 54, no. 2:175–85.
Kane, Thomas J., and Douglas O. Staiger. 2008. Estimating teacher impacts on student achievement: An experimental validation. Working Paper no. 14607, National Bureau of Economic Research, Cambridge, MA.
Lavy, Victor. 2002. Evaluating the effect of teachers' group performance incentives on pupil achievement. Journal of Political Economy 110, no. 6:1286–1317.
———. 2009. Performance pay and teachers' effort, productivity, and grading ethics. American Economic Review 99, no. 5:1979–2021.
Muralidharan, Karthik, and Venkatesh Sundararaman. 2011. Teacher performance pay: Experimental evidence from India. Journal of Political Economy 119, no. 1:39–77.
Neal, Derek. 2011. The design of performance pay systems in education. Working Paper no. 16710, National Bureau of Economic Research, Cambridge, MA.
Rivkin, Steven G., Eric A. Hanushek, and John F. Kain. 2005. Teachers, schools, and academic achievement. Econometrica 73, no. 2:417–58.
Rockoff, Jonah E. 2004. The impact of individual teachers on student achievement: Evidence from panel data. American Economic Review 94, no. 2:247–52.
Rockoff, Jonah E., Brian A. Jacob, Thomas J. Kane, and Douglas O. Staiger. 2008. Can you recognize an effective teacher when you recruit one? Working Paper no. 14485, National Bureau of Economic Research, Cambridge, MA.
Springer, Matthew G., Dale Ballou, Laura S. Hamilton, Vi-Nhuan Le, J. R. Lockwood, Daniel F. McCaffrey, Matthew Pepper, and Brian M. Stecher. 2010. Teacher pay for performance: Experimental evidence from the project on incentives in teaching. Nashville: National Center on Performance Incentives.
Vigdor, Jacob L. 2008. Teacher salary bonuses in North Carolina. Working Paper no. 15, National Center for Analysis of Longitudinal Data in Education Research, Durham, NC.