
PROPOSAL FOR THE EVALUATION OF SÃO PAULO SCHOOL EMPLOYEES PERFORMANCE PAY REFORM

Barbara Bruns, Claudio Ferraz, and Marcos Rangel *

October 2008

BACKGROUND

Context for the reform in Brazil. Despite spending significant resources in education and increasing school attendance at all levels during the late 1990s and early 2000s, Brazil’s education performance is significantly lower than countries with similar income per capita. In the 2006 Program for International Student Assessment (PISA) test of math, science and literacy among 15 year old students, for example, Brazil ranked 54th among 57 countries in mathematics, scoring lower than Argentina, Indonesia, Mexico, Chile, Thailand and Uruguay. While reading performance was better, with Brazil ranking 49th out of 56 countries, Mexico, Thailand, Uruguay and Chile all scored significantly higher. These internationally benchmarked results as well as evidence from national and state-level learning assessments showing very low average proficiency levels have led to an overall agreement among policy makers that Brazil’s education challenge lies in improving the quality of public schools.

The large decentralization process that has taken place since 1988 and the introduction of several government programs aimed at increasing the amount of resources going to public schools have not improved students' test scores accordingly. Moreover, the distribution of test scores, even within small regions, shows great disparity (controlling for student and family characteristics), suggesting that teacher quality and school management play an important role (Menezes-Filho, 2007).

Teachers face weak incentives in Brazil. Salaries are relatively low, and there are no incentives linked to performance, as salaries are mostly determined by tenure (Holanda-Filho and Pessoa 2007). Low teacher motivation and high rates of absenteeism from the classroom directly affect students' performance. About 30,000 teachers, or 12.8% of the total teaching force, are reported absent each day in the São Paulo state education system.

At the end of 2007, the São Paulo State Secretary of Education launched a program aimed at improving the quality of its 5,000 primary and secondary schools and 250,000 teachers. The program consists of several actions, among them the introduction of a new curriculum with clear guidelines on the material and competencies to be taught in each grade, a strong focus on universal literacy for young children, and the introduction of supervisors to help directors and teachers improve school management. A central feature of the educational reform in São Paulo is the introduction of an innovative “teacher bonus” to link pay more closely to performance for all state school employees.

* Barbara Bruns is Lead Economist in the Office of the HD Chief Economist at the World Bank, E-mail: [email protected]. Claudio Ferraz is an assistant professor at the Catholic University of Rio de Janeiro (PUC-Rio), Brazil. E-mail: [email protected]. Marcos Rangel is an assistant professor at the Harris School of Public Policy, University of Chicago and a visiting professor at the University of São Paulo (USP). E-mail: [email protected].

Context for research interest in evaluating this program. Despite the central relevance of teacher contracting and pay policies for education system performance, the evidence base on “what works” is weak. In both developing and developed countries, teacher pay is overwhelmingly based on educational attainment, training and experience, rather than performance. Yet variations in teacher performance, even within a single grade in the same school, are substantial (Rivkin et al. 2001). Hanushek (2004) has estimated that the “good teacher effect” on student learning outcomes is roughly equivalent to the effect of a 50% decrease in average class size in the US, a much costlier reform. Studies also indicate a weak correlation between teachers' actual effectiveness and the most common proxies for teacher quality, namely education and experience. Most of the evaluated experience with bonus or merit pay has been in the US. The early experiments were not effective (Cohen and Murnane 1986), but they may have been too limited in the magnitude of the reward and the character of the performance evaluation (Hanushek 1994).

The most carefully evaluated programs outside of the US are a cash bonus program for secondary school teachers in Israel (Lavy 2004), a program awarding prizes to teachers in grades 4-8 in rural Kenya (Glewwe, Ilias and Kremer, 2008), and a study in Andhra Pradesh, India that is currently in its second year. The Israeli results showed significant effects on student performance in the subject areas rewarded, which the researchers attributed to changes in teaching methods, after-school teaching, and increased responsiveness to students' needs. The researchers concluded that the cash bonuses for individual teachers were more cost-effective than alternative programs which offered cash bonuses for schools as a group or added instructional time to all schools. The Kenya study found relatively modest effects on student learning in the treatment schools, but these gains disappeared after a year. There was little evidence of teacher effort aimed at increasing long-run learning: teacher attendance did not improve, homework assignments did not increase, and pedagogy did not change. The only observed change was that teachers conducted more test preparation. The Andhra Pradesh study (Muralidharan and Sundararaman, 2007) is a larger-scale, longer-duration study, which after one year found significant impacts on student learning (0.19 SD in math, 0.12 SD in language) from both group-based incentives (the whole school rewarded for average learning gains) and individual incentives (teachers rewarded differentially based on the gains registered by their own class).

The proposed evaluation of São Paulo's bonus program would be the first rigorous evaluation of such a reform in a middle-income developing country, in a program at scale. The study would have high marginal value as a complement to the existing research base in developing countries, which consists of pilot programs in low-income settings that have produced somewhat inconsistent results. Our proposed study will evaluate how merit pay affects teachers' effort, training uptake, skills and classroom practice, and student learning outcomes, and whether it promotes significant adverse behaviors (diverting curriculum time from non-tested subjects or manipulation of test results). Deeper understanding of these issues is needed for effective policy in this area.


THE INTERVENTION

The performance pay system designed for the São Paulo state schools is an annual bonus paid to schools based on how well they meet individual school level targets. Thus, school progress is measured and rewarded on a value-added basis, and implicitly takes into account schools’ differing socioeconomic contexts and specific educational challenges. The incentive is a strong one: for schools that meet 100% of their target, all employees will receive a bonus equivalent to three monthly salaries. For schools that do not meet their targets, the bonus will be paid proportionally to the percentage of the target met (i.e. schools that meet 50% of their target will receive a bonus equivalent to one and a half monthly wages).

The target is calculated for each school based on two sets of indicators. First, 70 percent of the target is calculated based on SARESP (Sao Paulo state annual achievement test) test scores and average school level promotion rates.1 The remaining 30 percent is based on teacher attendance and school management indicators.

The SARESP is a standardized test applied annually in all schools in the state of São Paulo. All students in grades 1, 2, 4, 6 and 8 of primary school (ensino fundamental) and in the 3rd (final) year of high school (ensino médio) are tested on their knowledge of Mathematics and Portuguese. The scale of the exams ranges from 0 to 500. Instead of defining school-level targets based on average scores, which could create incentives for schools to disregard students at the bottom of the learning distribution, the Secretary of Education decided to use information from the whole distribution of test scores. Four levels of proficiency were created in order to facilitate teachers' interpretation of the scores: Below Basic (Not Meeting Learning Standards), Basic (Partially Meeting Learning Standards), Proficient (Meeting Learning Standards), and Advanced (Meeting Learning Standards with Distinction).2 The cut-offs for each category are different for each grade. For Mathematics, for example, the 4th grade cut-offs are 175, 225 and 275; for 8th grade, they are 225, 300, and 350.

In order to aggregate the percentages of students in each category into an index, the Secretary of Education assigned weights that penalize schools linearly for students below the Advanced category. The index assigns penalties of 3, 2, 1 and 0 to students in the four categories (3 to students Below Basic, 2 to students at the Basic level, 1 to students at the Proficient level, and 0 to Advanced students). The grade-discrepancy indicator is then calculated as:

D = 3 × (% Below Basic) + 2 × (% Basic) + 1 × (% Proficient) + 0 × (% Advanced)

This indicator is then converted into an index that varies between 0 and 10 using the following formula:

I = (1 − D/3) × 10
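As a quick sketch, the two formulas above can be written directly (the function names are ours, not the Secretariat's; category shares are fractions that sum to 1):

```python
def grade_discrepancy(below_basic, basic, proficient, advanced):
    """Penalty-weighted discrepancy D, from the shares of students in each category."""
    return 3 * below_basic + 2 * basic + 1 * proficient + 0 * advanced

def performance_index(d):
    """Convert the discrepancy D into the 0-10 performance index I."""
    return (1 - d / 3) * 10

# Extremes: all students Advanced gives D = 0 and I = 10 (best);
# all students Below Basic gives D = 3 and I = 0 (worst).
```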

1 The SARESP is a standardized test of Portuguese and Mathematics. From grade 6 on, it also includes sections on Sciences and Social Sciences.

2 These groups were created using the statistical distribution of previous SARESP results. The first group includes the bottom 25th percentile, the second group …


In addition to this index, the indicator used by the Secretary of Education takes into account approval rates. For each school, the primary school years are divided into two groups: the first from 1st to 4th grade and the second from 5th to 8th grade. The average time it takes students to complete a group of grades is then estimated, using the fact that the sum of the inverses of the approval rates for each grade provides an estimate of the average number of years needed to complete the group. This flow measure of average completion time, F, is normalized to vary between zero and one by dividing the number of years it should take students to complete the group of grades by the time it actually takes.
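This flow calculation can be sketched as follows (a minimal illustration; we assume, as the text implies, that the expected time is one year per grade):

```python
def estimated_completion_time(approval_rates):
    """Estimated years to complete a group of grades:
    the sum of the inverses of the per-grade approval rates."""
    return sum(1.0 / r for r in approval_rates)

def flow_indicator(approval_rates):
    """F in (0, 1]: expected years (one per grade) over estimated actual years."""
    expected_years = len(approval_rates)
    return expected_years / estimated_completion_time(approval_rates)

# With universal promotion F = 1; lower approval rates push F below 1.
```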

The performance indicator based on test scores, I, is then combined with the flow indicator F to create a measure of school quality for São Paulo, the IDESP:3

IDESP = I × F

Because F varies between 0 and 1, it penalizes the schools for taking longer than expected to complete a series of grades and thus creates incentives in the direction of automatic promotion. But because performance also depends on test scores, there is a countervailing incentive which penalizes schools if students do not learn adequately.

Using the IDESP as the indicator of quality for each school, the Secretary of Education applied the same methodology as the IDEB, implemented by the Ministry of Education: all schools are assumed to converge by 2030 to a maximum grade of 9. A logistic function is then estimated, and the predicted value for each school and year from 2008 to 2030 provides the target that the school has to attain in that specific year.

The bonus will be paid to all schools according to the percentage of the target achieved by the end of the year. All employees of schools that meet the target will receive a 100% bonus (equivalent to three monthly salaries), while schools that do not meet the target will receive a bonus proportional to the percentage of the target attained (e.g. for a school that meets 80% of the target, all employees will receive a bonus of 0.8 × 3 = 2.4 monthly salaries).
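The payout rule amounts to a simple proportional formula (a sketch; capping the payout at 100% of the target is our assumption, since the text does not address overshooting):

```python
def bonus_in_monthly_salaries(target, achieved, max_salaries=3.0):
    """Bonus, in monthly salaries, proportional to the share of the target met.
    The cap at 100% is our assumption; the rule for overshooting is not stated."""
    share_met = min(achieved / target, 1.0)
    return max_salaries * share_met
```

For example, a school meeting 80% of its target pays out 0.8 × 3 = 2.4 monthly salaries per employee.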

PRIMARY RESEARCH QUESTIONS AND OUTCOME INDICATORS

This evaluation aims to answer 10 research questions:

1) Does linking teachers’ pay to indicators of school performance via a bonus result in improved student learning?

2) Does linking teachers’ pay to indicators of school performance via a bonus reduce teacher absence?

3) Does linking teachers’ pay to student test scores via a bonus result in positive behaviors such as increased teacher effort (hours worked, quantity of homework assigned and graded), more effective teaching strategies, or reassignments of school personnel in favor of tested grades and subjects?

3 This formula is equivalent to the formula used by the Ministry of Education to calculate the national IDEB (Index of Basic Education Development), which is applied to all Brazilian public (state and municipal) primary and secondary schools.


4) Does linking teachers’ pay to student test scores via a bonus result in undesirable behaviors such as manipulation of test results or reduced class time spent on non-tested subjects?4

5) Does giving schools information about the rules of the game for the bonus significantly improve their chances of earning it?

6) Are schools that are unsuccessful in the first year of the bonus program more or less likely to put effort into competing for the bonus in subsequent years?

7) What strategies do schools use to try to improve performance under the bonus program?

8) To what extent do the levels of trust, teamwork and cooperation within schools explain their success in accessing the bonus?

9) How does success -- or lack of success -- in the first year of the bonus program affect levels of trust, teamwork and cooperation within schools?

10) What strategies do schools employ to build trust, teamwork and cooperation?

Outcome indicators will include: student test scores (SARESP); student enrollment, promotion and completion rates; and teacher/school personnel absence rates. For all schools, the Secretary of Education collects rich socioeconomic and other background data on school directors, teachers, supervisors and students via an annual school survey, and the state also surveys parents online. São Paulo also has good budget data, which we will use to estimate changes in school-level spending and the cost-effectiveness of the reform in producing student learning improvements.

For a sample of schools, we will also try to deepen the analysis in three areas. First, we will collect data on teachers’ instructional strategies, use of time and classroom resources, through direct observation using a standardized classroom observation instrument. Second, we will collect qualitative data from directors, teachers, supervisors, students and parents about perceptions of the bonus program and school-level changes both prior to and after the first round of bonus payments. Finally, through the application of an innovative set of new instruments, we will develop direct measures of the levels of trust, teamwork and social capital within schools.

EVALUATION DESIGN/ IDENTIFICATION STRATEGY

1. Estimating the impacts of the target-based scheme

The core evaluation question for any performance pay scheme is whether its introduction improves performance. In the context of education, performance is measured by the acquisition of cognitive skills. Thus, this evaluation aims to estimate whether teachers under a target-based scheme put more effort into teaching and consequently improve students’ cognitive skills. The effects of introducing such a scheme can be divided in two. First, the announcement that a bonus will be paid based on a school target scheme might induce teachers to increase their effort in teaching and thereby improve students’ test scores. This can be evaluated using pre- and post-bonus data on test scores if there is variation across the schools that introduce the bonus (ideally chosen randomly, as in Muralidharan and Sundararaman, 2007). Second, the payment of the bonus might affect subsequent teacher behavior: teachers who receive a large bonus may be encouraged, while those who receive a small one may be discouraged. This evaluation proposal aims to measure both the short- and medium-term effects of the target-based bonus scheme.

4 Possible methods for measuring cheating on standardized exams were suggested by Jacob and Levitt (2003) in their work in the Chicago public schools. http://pricetheory.uchicago.edu/levitt/papers/JacobLevittCatchingCheating2003.pdf

São Paulo’s target scheme has three characteristics that make it unique and allow for a quasi-experimental evaluation strategy. First, schools at the bottom of the distribution of test scores have to gain more relative to their initial test scores in order to attain their targets. Second, small differences in the distribution of test scores can induce large differences in school targets, because thresholds of test scores divide the categories. Third, schools with the same target face differential incentives, because one school may have more students near a threshold while another’s students are far from the cut-offs.

2. Measuring the short-term effects of the target-based bonus

The announcement of the new performance pay scheme is expected to create new incentives for school employees. One way to credibly estimate the effects of the bonus announcement would be to randomize the introduction of a bonus system for a sub-group of schools and keep another group of schools under the current scheme. This strategy is followed by Muralidharan and Sundararaman (2007) to study the effects of group versus individual bonus schemes in rural India.

For the current program, which is being implemented at scale in São Paulo, randomization is not possible. We propose a quasi-experimental approach that exploits discontinuities in targets across schools created by the initial performance indicator, together with differences in incentives created by the distribution of SARESP grades.

The first idea exploits the fact that similar schools might end up with significantly different targets because of small differences in the initial SARESP distribution. The second idea exploits the fact that schools with the same target face differential incentives to meet it depending on how far their students are from the cut-offs between proficiency levels. Of two schools that have to meet the same target, the one with more students near the thresholds faces stronger incentives.

3. Exploiting Differential Incentives

Despite the fact that the bonus rule is the same for all schools, the effect of the bonus will depend on how easy it is for a school to meet its target. Schools that face the same target will have differential incentives to meet it depending on how far their students’ scores are from the thresholds that define the quality categories. A simple example helps to illustrate this point. Suppose there are two schools with 5 students each. School A has 3 students with a SARESP math score of 110, 1 student with 151, and another with 280. School B has 3 students with scores of 149, 1 student with 151, and another with 280. Because both schools have 60% of their students below the basic level (score less than 150), 20% of students at the basic level, and 20% at the advanced level, the two schools will have the same performance indicator of 2.67. Supposing for simplicity that they have the same approval rates, their targets for 2008 will be exactly the same. Nonetheless, the effort the two schools must exert to meet their targets is significantly different: while school B needs little effort to move its students across the 150 threshold, school A’s effort will have to be considerably larger. Hence, the incentive for extra effort is stronger for schools with a larger share of students near an upper threshold. Conditional on the target, the incentives faced by schools should decrease as the average distance of students’ scores from the cut-offs increases.

For simplicity, suppose there were only two categories defining school quality (not proficient and proficient), and students were considered proficient if they scored at least 150. For a single school, the average distance of students’ scores from the 150 cut-off is:

Distance = Σ_i (150 − score_i) / N,

where N is the number of students below the cut-off 150. With a single cut-off, we could estimate the effect of stronger incentives on an outcome y (test scores or the % of the target met by the school) by:

y_sm = α + β·Distance_sm + δ·Target_sm + γ·X_sm + η_m + ε_sm,

where Distance is the average distance of students’ test scores to the next cut-off, Target is the school’s target based on initial test scores, X is a vector of student, parent, and school characteristics, η_m is a municipality fixed effect, and ε_sm is an error term. Controlling for the initial Target, the incentive faced by school employees becomes stronger as Distance decreases, so we would expect the coefficient β to be negative.

Because in our setting there are multiple cut-offs defining four categories, we can measure the average distance of the students in each category to each of the three cut-offs. We can then estimate a model in which the coefficient for each distance enters separately:

y_sm = α + β1·Distance1_sm + β2·Distance2_sm + β3·Distance3_sm + δ·Target_sm + γ·X_sm + η_m + ε_sm.
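The estimating equation can be illustrated on simulated data (a single cut-off sketch only: fixed effects and covariates are omitted, and all numbers are invented):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Invented school-level data: average distance to the next cut-off and the target.
distance = rng.uniform(5.0, 50.0, n)
target = rng.uniform(2.0, 6.0, n)

# Simulate the outcome with a negative incentive effect of distance (beta = -0.02).
beta_true = -0.02
y = 1.0 + beta_true * distance + 0.30 * target + rng.normal(0.0, 0.1, n)

# OLS: y = alpha + beta * Distance + delta * Target + epsilon
X = np.column_stack([np.ones(n), distance, target])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_hat = coef[1]  # should recover a negative coefficient close to -0.02
```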

4. Using Discontinuous Differences in Targets

A second source of variation that emerges from the target rules is that small differences in initial test scores can generate significant differences in targets across schools. Another simple example illustrates this point. Suppose, in addition to schools A and B above, there is a third school with 5 students. In school C, 4 students scored 151 and 1 student scored 280 on the math test. This school has 80% of its students in the basic category and hence will have a performance indicator of 4.67. Even though its distribution of scores is very similar to school B’s, its target for 2008 will be significantly different because of the initial test scores. Hence this school will have to attain a smaller increase in order to meet its target, creating stronger incentives for effort.
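The three example schools can be checked against the index formulas (only the 150 cut-off is given in the text; the upper cut-offs of 200 and 275 are our illustrative choice, consistent with 151 being classified Basic and 280 Advanced):

```python
def penalty(score, cutoffs=(150, 200, 275)):
    """3 = Below Basic, 2 = Basic, 1 = Proficient, 0 = Advanced.
    The upper cut-offs are illustrative; only 150 appears in the example."""
    return sum(score < c for c in cutoffs)

def performance_index(scores, cutoffs=(150, 200, 275)):
    """Average penalty D converted into the 0-10 index I."""
    d = sum(penalty(s, cutoffs) for s in scores) / len(scores)
    return (1 - d / 3) * 10

school_a = [110, 110, 110, 151, 280]   # three students far below the 150 threshold
school_b = [149, 149, 149, 151, 280]   # three students just below the 150 threshold
school_c = [151, 151, 151, 151, 280]   # four students just above the 150 threshold
# A and B share the same index (about 2.67) despite needing very different effort;
# C's index (about 4.67) differs sharply from B's although the scores barely differ.
```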

Despite not being able to implement an experimental approach, the rules of the target-based mechanism allow for a credible quasi-experimental design in the spirit of fuzzy regression discontinuity designs. Because two schools with similar average test scores can end up with different targets when one has more students slightly below a cut-off and the other has more students slightly above it, the target varies discontinuously with the initial distribution of test scores. If we control flexibly for the initial distribution of test scores, we can use a two-stage least squares estimator to credibly estimate the Local Average Treatment Effect, employing the cut-off indicators as instruments for the Target (see Angrist and Lavy 1999 and van der Klaauw 2002 for similar identification strategies). This strategy will allow us to estimate the effect of the Target on outcome y (teacher effort and student test scores).
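The logic of the instrument can be sketched on simulated data (a manual two-stage least squares illustration; all numbers are invented and standard errors are ignored):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

def ols(X, y):
    """OLS coefficients via least squares."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Instrument: an indicator for having many students just below a cut-off,
# which shifts the school's target discontinuously.
z = rng.integers(0, 2, n).astype(float)
ability = rng.normal(0.0, 1.0, n)          # unobserved confounder
target = 3.0 + 1.0 * z + 0.5 * ability + rng.normal(0.0, 0.2, n)
y = 1.0 + 0.4 * target + 0.8 * ability + rng.normal(0.0, 0.2, n)  # true effect 0.4

# Stage 1: predict Target from the instrument.
Z = np.column_stack([np.ones(n), z])
target_hat = Z @ ols(Z, target)
# Stage 2: regress the outcome on the predicted Target.
delta_2sls = ols(np.column_stack([np.ones(n), target_hat]), y)[1]
# Naive OLS is biased upward by the confounder; 2SLS recovers roughly 0.4.
delta_ols = ols(np.column_stack([np.ones(n), target]), y)[1]
```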

5. Measuring Medium Term Effects

While the announcement of the bonus is likely to affect effort during the first year of the program, the payment of the bonus, once made, can also be expected to affect teacher effort and subsequent student test scores.

Let y_sm2 be the performance of school s, in municipality m, in year 2, and let Bonus_sm be the average bonus paid to employees in school s. We can estimate the effect of the bonus on performance in year 2, controlling for performance in year 1 and other school characteristics, by estimating:

y_sm2 = α + β·Bonus_sm + δ·y_sm1 + γ·X_sm + η_m + ε_sm,

where Bonus_sm = %Target_sm, the percentage of the target met. In order to estimate the effect of the bonus, β, consistently, we need exogenous variation in the bonus across schools. Unfortunately, we cannot randomly allocate targets across schools, and unobservable characteristics that determine test scores in year 2 might be correlated with a school’s initial target.

Again, we can use the variation induced by the initial Target to estimate the effect of the % target achieved on subsequent test scores.

6. “Social Capital” and the Effects of the Bonus

While predicting the effects of introducing an individual bonus scheme is relatively straightforward, the effects of a collective bonus scheme such as São Paulo’s are likely to depend on the role of social capital in terms of trust, cooperation and teamwork. Because the bonus will be paid to all employees of a school based on the performance of 4th and 8th grade students, it is very likely that, in order to be successful, schools will need a work environment in which free-riding is minimized and peer monitoring is effective.

In order to address this issue, we propose to measure teachers’ trust and cooperation and to examine whether the effects of the bonus depend on these measures. We propose two alternative measurements. The first is based on a well-known set of attitudinal survey measures of trust adapted to the school context; the Secretary of Education has agreed to integrate several of these questions into its annual school survey of teachers. The second measure will be implemented via laboratory exercises with teams of co-working teachers, using well-developed lab techniques in which individuals’ types are inferred from the actions they take within a simulated environment. We focus on two experiments: trust and public-goods provision (or, in other words, free-riding behavior).

6.1. Survey-based measures


Attitudinal survey questions previously used in the experimental-economics literature are adapted to the school context. Trust and trustworthiness are measured by adding specifically scaled questions to the socioeconomic questionnaire applied annually to teachers and other school personnel as part of the standard data collection by the state Secretary of Education. The scale of responses includes: Agree completely, Agree, Disagree, Disagree completely, Don’t know. The questions we introduced are:

Teachers and staff in this school are always willing to help co-workers.

Teachers and staff in this school have common opinions regarding what is right and wrong.

Teachers and staff in this school can be trusted.

If I were in need of help, I could ask a co-worker.

I am a person my co-workers can trust.

If a teacher or staff member in this school were in need of help in an emergency, they could ask for my assistance.

I can trust in the majority of people in my community.

I can trust the majority of teachers and staff members in this school.

The majority of teachers and staff members in this school would try to take advantage of you if they had a chance.

Most of the time, people in this school are solely worried about themselves.

I can succeed on my own, and do not need a large group of people supporting me and each other.

People who put a lot of effort into their work are generally more successful than those who do not put a lot of effort into their tasks.

6.2. Lab-based measures

Lab-based measures will be implemented following the protocols of two well-known lab experiments measuring trust and voluntary public-goods provision.

Trust:

Individuals, assigned roles A and B, are each credited with ten “experiment dollars” (E$), which will be converted to “reais” at the rate E$1 = R$R (to be determined) at the end of the session. Subject A is asked to choose a whole number of experiment dollars, E$Xa in {0, 1, ..., 10}, to send to subject B, knowing that B will receive triple the amount sent and can send back a proportion (restricted to sixths of the amount received, e.g., 0, 1/6, 1/3, etc.) of that amount, including nothing. Thus, A keeps any part of the E$10 not sent and can earn an additional amount between E$0 and E$30, depending on B’s choice. B earns E$(10 + 3Xa − Xb), where Xb ≤ 3Xa is the amount B sends back to A, so B’s payoff ranges between E$10 and E$(10 + 3Xa). The version of the trust game used involves multiple rounds, which differ slightly from each other. Decisions are made on a grid where rows indicate A’s possible choices of how much to send and columns indicate B’s potential choices of how much to return.
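The payoffs described above can be written out directly (a sketch; the function name and parameters are ours, and the E$-to-R$ conversion is left aside):

```python
def trust_game_payoffs(x_a, return_fraction, endowment=10, multiplier=3):
    """Experiment-dollar payoffs (A, B) for one round of the trust game.
    A sends x_a; B receives multiplier * x_a and returns return_fraction of it
    (in the experiment, return_fraction is restricted to sixths)."""
    received = multiplier * x_a
    x_b = return_fraction * received
    payoff_a = endowment - x_a + x_b
    payoff_b = endowment + received - x_b
    return payoff_a, payoff_b

# If A sends nothing, both players simply keep their E$10 endowments.
```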

In the first round, subjects play the standard game. In the second round, A can propose a course of action for A and B by highlighting a row and a column. As pre-announced, if B agrees to A's proposal by highlighting the same row and column, the two are offered separately (and without knowing the other's response) the chance to enter into a costly contract. If either or both subjects say no, there is no contract, and they proceed to make their decisions as in the standard interaction. If both subjects say yes to the first question, they are asked if they want a contract with penalties. If both say yes, they enter into such a contract; if at least one says no, they remain with a contract without penalties. Contracts with penalties cost each subject E$1, whereas contracts without penalties cost E$2. In some experimental sessions, the price of contracts can be doubled.

From these experiments we can derive two measures of trust: the amount sent by A in the first round, and whether A requested a contract in the second round conditional on B agreeing to his or her proposal (regardless of whether the request was accepted or rejected). The amount sent in the first round is the traditional operationalization of trust derived from the trust game and follows the same approach as Glaeser et al. (2000). Trust is measured as the number of experimental dollars, ranging from 0 to 10, sent by participants in role A (trustors) in the first round of the trust game. In the first round, participants have no prior experience with the game and no information about the person with whom they are paired. Participants therefore have no game-related information on which to base their choices, and so are likely to base their choice of how much to send on their propensity to trust others. Sending a larger amount is commonly interpreted as reflecting a greater level of trust. Requesting a contract in the second round is an operationalization that allows us to check whether participants in role A were not satisfied with mere agreement (concurrence) by subject B with their proposal, but also required additional assurance in the form of a costly contract. Requesting such a contract is then an indicator of distrust.

Public-Goods provision:

We propose a game that follows the structure described in the instructions below:

This is a simple card game. Each participant will be given four cards: two red (hearts or diamonds) and two black (clubs or spades). All cards will show the same number. The exercise consists of a number of rounds. When a round begins, the organizer will come to each participant in order, and each participant will play two of their four cards by placing them face down on top of the stack in the organizer's hand. Individual earnings in experimental dollars are determined by what players do with their red cards. In each of the first five rounds, players earn four experimental dollars for each red card kept and nothing for each black card kept. Red cards placed on the stack affect everyone's earnings in the following manner: they are counted, and the total number of red cards in the stack determines the number of dollars everyone receives. Black cards placed on the stack have no effect on the count. When the cards are counted, individual decisions will not be revealed (i.e., who made which decisions remains unknown). Each player receives their own cards back at the end of the round. To summarize, earnings for the round are calculated as:

earnings = $4 × (# of red cards kept) + $1 × (total # of red cards collected).

After round 5, a change in the earnings for each red card kept will be announced. Even though the value of red cards kept will change, red cards placed on the stack will always earn one dollar for each person. Another change will be announced after round 10, and five more rounds will be completed after that.
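For concreteness, the per-round earnings rule can be expressed as a small function. This is our own illustrative sketch: in rounds 1-5 the keep value is E$4, and the announced changes after rounds 5 and 10 amount to changing `keep_value`:

```python
# Illustrative payoff calculator for the public-goods card game above.
def round_earnings(red_kept, total_red_on_stack, keep_value=4):
    """Per-player earnings for one round, in experimental dollars.

    Each red card kept earns `keep_value`; every red card on the common
    stack earns E$1 for every player, regardless of who contributed it.
    """
    return keep_value * red_kept + total_red_on_stack
```

Keeping a red card pays its holder E$4 while contributing it pays E$1 to each player, so with more than four players contributing raises total group earnings even though keeping remains individually more profitable.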

7. Estimating the Role of Information in a Target-Based Bonus Scheme

Up to this point we have assumed that school employees know the rules of the target-based bonus exactly and will respond rationally to the new incentives. Nonetheless, because of the complexity of the way targets are calculated, differences in information on how to attain the school target may affect school responses. In this sense, the incentives of the bonus will be stronger, all else constant, in schools that better understand the rules of the game and are able to design a strategy for attaining their target.

This section proposes a complementary evaluation of the target-based bonus scheme to measure whether providing information to teachers, directors, and school employees about the bonus rules and strategies to meet the school target affects their effort and, consequently, students' test scores. The experimental design will be based on a sub-sample of randomly selected schools that will receive intensive training through a workshop. This is an encouragement design similar, for example, to Duflo and Saez (2003). We will provide a randomized encouragement in which we supply information on how close schools are to their targets and how they can meet these targets, while a control group of schools does not receive such information.


7.1 Information in Target-Based Bonus Schemes

Let Info_sm be a treatment indicator variable that equals one for schools that receive the informational training. Let the performance of the school be measured by the percentage of the target attained (%Target_sm). In the first stage, we aim to estimate whether receiving information affects the percentage of the target attained by the school:

%Target_sm = α + β·Info_sm + γ·X_sm + η_m + ε_sm,

The coefficient β measures the average impact of receiving information on school performance, measured as the percentage of the target attained. Alternatively, we can also estimate the effect of information on test scores in the first year of the bonus, controlling for pre-bonus test scores (y_sm0):

y_sm1 = α + β·Info_sm + δ·y_sm0 + γ·X_sm + η_m + ε_sm,

Finally, we can use the experimental design as a source of variation in the bonus that employees receive after the first year of the program, to measure how different bonus levels affect subsequent performance. This can be done by using the information treatment indicator as an instrument for %Target_sm. In the first stage, we will estimate:

%Target_sm = α + β·Info_sm + γ·X_sm + η_m + ε_sm,

In the second stage, we will estimate the effect of the average bonus:

y_sm2 = α + β·%Target̂_sm + δ·y_sm1 + γ·X_sm + η_m + ε_sm,

where %Target̂_sm is the fitted value of %Target_sm from the first stage.
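As a sketch of this two-stage procedure, the point estimate can be obtained by regressing the outcome on the first-stage fitted values. The simulation below uses made-up variable names and data-generating parameters purely for illustration, and omits the covariates X and the fixed effects η:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
info = rng.integers(0, 2, n).astype(float)             # randomized Info indicator
pct_target = 0.4 + 0.2 * info + rng.normal(0, 0.1, n)  # first-stage relation
y2 = 1.0 + 0.5 * pct_target + rng.normal(0, 0.1, n)    # true beta = 0.5

# First stage: regress %Target on Info
X1 = np.column_stack([np.ones(n), info])
b1, *_ = np.linalg.lstsq(X1, pct_target, rcond=None)
pct_target_hat = X1 @ b1                               # fitted values

# Second stage: regress y on the fitted %Target
X2 = np.column_stack([np.ones(n), pct_target_hat])
b2, *_ = np.linalg.lstsq(X2, y2, rcond=None)
print(b2[1])                                           # IV estimate, close to 0.5
```

In practice the second-stage standard errors must be corrected for the generated regressor; standard IV/2SLS routines do this automatically.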

It is important to note that we will need to measure teachers' level of understanding and information about the bonus program before and after the workshops, in order to measure the gains in information. Moreover, among the selected treatment schools, we need variation in how close schools are to the cut-offs and in the initial distribution of test scores (good as well as bad schools) in order to estimate the complementarity between the incentives and the provision of information.

8. Additional Information Needed

While the approaches described above use secondary test-score data to estimate the effects of the strength of the incentives created by the bonus, a full understanding of the program's impacts requires examining other effects as well.

First, it is important to understand what types of actions teachers and school administrators take in order to improve student test performance. Teachers may change their teaching practices and/or their level of effort (reducing absences, assigning and grading more homework, etc.). Administrators may change teachers' assignments to prioritize the grades and subjects that are tested. Second, it is important to test for potential free-riding by comparing effort across teachers of students who will be tested and those who will not. Third, it is important to examine the informal monitoring that occurs among teachers and try to uncover monitoring schemes that might help alleviate the potential moral hazard in a group-based incentive.

In order to do so, we will need to follow teacher activities during the first two years of the implementation of the bonus scheme. A sample of schools will be drawn and enumerators will collect data through classroom observation and teacher, administrator, and student surveys on the types of activities executed by teachers, administrators and students in response to the introduction of the bonus program, and schools’ first year results. These data will allow for an estimation of peer pressure.

DATA COLLECTION AND POWER CALCULATIONS

Data on pre- and post-program test scores and teacher presence will be available from the São Paulo Secretary of Education's administrative records. Prior to the SARESP test, the Secretary of Education will implement an online questionnaire for school directors, teachers, and supervisors that covers socio-economic characteristics, tenure, and perceptions. The data collection effort will focus on:

a) Following teacher activities during the first two years of the implementation of the bonus. A sample of schools will be drawn, and enumerators will collect data through classroom observation and teacher, administrator, and student surveys on the types of activities undertaken by teachers, administrators, and students in response to the introduction of the bonus program. The sample will be stratified to contain both schools that face stronger incentives because of their initial distance to the target and schools that face weaker incentives because their original distribution of test scores was far from the target.

b) Implementing the randomized information package in a sample of schools and collecting information about teacher knowledge of the program before and after such information intervention. The sample of treatment and control schools will also be stratified in order to contain both schools that faced stronger incentives because of their initial distance to their target as well as schools that faced weaker incentives.

In order to calculate the number of schools that will be chosen to be part of the information campaign evaluation, we will assume that we want to detect a minimum standardized effect of 0.15 (standard deviations of test score). Let us assume a significance level of 5% and an average class size of 25. The number of clusters (schools) needed to detect the effect will also depend on the intra-class correlation of student test scores. We take the results from the 2005 Prova Brasil


for 8th grade, which yields an intra-class correlation of 0.10 for language test scores and 0.12 for mathematics.5 We can then estimate the sample size needed to detect a standardized effect of 0.15 (standard deviations of test score), assuming a statistical power of 0.80 or 0.90.

If we were assigning only one treatment and wanted power of 0.90 for the comparison between treatment and control, we could use a sample of 66 treatment and 66 control schools. For a desired significance level of 1%, this sample would increase to 188 schools. In the context of this evaluation, however, we have to take into account that we need to be able to compare the effect of the information campaign for schools above and below a specific target (i.e., with weaker and stronger incentives). One way to think about this is as a two-by-two matrix with four groups. This will generate 376 schools divided into four groups of 95 schools each.
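A back-of-the-envelope version of this calculation, using a standard textbook sample-size formula for cluster-randomized designs, is sketched below. The figures reported in the text depend on the particular assumptions and software used, so this approximation may not reproduce them exactly; the function name and defaults are ours:

```python
# Hedged sketch of a standard cluster-randomized sample-size formula.
from math import ceil
from statistics import NormalDist

def clusters_per_arm(delta=0.15, alpha=0.05, power=0.90, m=25, icc=0.10):
    """Clusters (schools) per arm needed to detect a standardized effect
    `delta` with `m` students per school and intra-class correlation `icc`."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    design_effect = 1 + (m - 1) * icc    # variance inflation from clustering
    return ceil(2 * z**2 * design_effect / (m * delta**2))
```

Raising the intra-class correlation or tightening the significance level increases the required number of schools, which is why the text reports sample sizes under alternative intra-class correlation coefficients.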

The data collection on teacher activities will take place in a random subset of the 376 schools assigned to these four groups.

EVALUATION TEAM

Responsibility for the evaluation will rest with co-Principal Investigators Claudio Ferraz (PUC-Rio), Marcos Rangel (University of Chicago and USP), and Barbara Bruns (HDNCE). Ferraz, Rangel, and Bruns will be responsible for the design and implementation of the evaluation, and will be the lead authors of the final impact evaluation report. Vitor Pereira (consultant) will be the field coordinator for the evaluation, and Katie Conn (HDNCE) will provide research support. Paul Gertler (UC Berkeley) has been retained to provide advice on the design and help assure the overall quality of the work. Sebastian Martinez has also provided advice to the team. Vitor Pereira will be responsible for the planning and quality control of all field work. Our team is also working closely with the team of researchers contracted by the São Paulo State Secretary of Education to evaluate other aspects of the reform program.

Operational advisor: Madalena Dos Santos, Acting HD Sector Leader for Brazil, has been deeply involved in the dialogue with the Secretary of Education on the proposed evaluation and has advised the team on all operational and political aspects of the work.

Local capacity building: Technical capacity in Brazil, especially in the State of São Paulo, is strong. Although the team in the Secretariat of Education does not have experience carrying out the relatively sophisticated research design proposed here, they are expected to gain skills quickly through collaboration with the international research team on the design and implementation of this evaluation. Indeed, the State Secretary of Education is explicitly interested in building the skills of her team through "learning by doing". To ensure the quality of the data collection effort, especially vis-à-vis the innovative experimental instruments we propose to apply, a full-time field advisor has been recruited. To ensure broad dissemination of this work and its results, a series of seminars on impact evaluation is also envisaged over the course of our collaboration. Finally, several team members from the São Paulo state education secretariat will be invited to attend the 2009 SIEF regional training course tentatively planned for Brazil.

5 Note that the Prova Brasil only includes urban schools, so these correlations might be slightly different once rural and smaller schools are introduced. We calculate the sample sizes needed to detect the effects of the evaluation taking alternative intra-class correlation coefficients into account.

PRIMARY RISKS IN COMPLETING THE EVALUATION AS DESIGNED

Risk: Randomized provision of the proposed information package to a sample of schools is deemed politically unacceptable.
Rating: M
Contingency plan: If the Secretary does not wish to proceed with this component, the evaluation will have to rely on the RD identification strategy.

Risk: The bonus program is delayed or fails to be implemented.
Rating: N
Contingency plan: The Secretary of Education is politically popular and has been actively campaigning and laying the groundwork for this reform for the past year. Key elements have already begun to be implemented. We see little risk that the reform will not be implemented, at least for the first year.

Risk: The bonus program is stopped or radically revised after the first year.
Rating: M
Contingency plan: Teacher bonus programs are inherently politically risky, and Brazilian teachers' unions are politically strong and vocal. The Secretary has been successful in getting union buy-in for the initial implementation of the program, but depending on the first-year results, political support could erode. We are confident that the very dynamic Secretary will actively manage this risk by highlighting whatever positive benefits emerge from the program and hammering the serious failures of the system prior to the reforms. While this risk cannot be ignored, even if the program were to be suspended after the first year, we have a robust design for evaluating its short-term effects, and that would be a contribution to the very sparse empirical evidence on this important type of reform.

Risk ratings: H (high risk), S (substantial risk), M (modest risk), N (negligible risk).

TIMELINE

São Paulo Teacher Incentive IE Timeline (as of 9/30/2008)

Activities are scheduled month by month from September 2008 through December 2010; the main workstreams and tasks are:

Start-up: evaluation design/concept note review; MOU with the Secretary of Education; contracting of the research firm (FIPE).

Social capital and trust experiments: contracting of enumerators (USP); development of protocols and enumerator training; piloting of games in 4 schools; training of the full cohort of enumerators; application of trust experiments in 376 schools; data capture/processing; data analysis; draft report.

Classroom observations: contracting of classroom observation trainers; training of trainers in classroom observation; training of SEE supervisors/coordinators; classroom observations; data capture/processing; data analysis; draft report.

Information treatment: design of the information package; training of SE teams to deliver the information workshops; information workshops; data capture (pre- and post-workshop tests); data analysis; draft report; final report.

In-depth school surveys: teacher, student, parent, director, and supervisor questionnaires; data capture/processing; draft report; final report.

BUDGET

(See attached spreadsheet)


REFERENCES

Angrist, Joshua D., and Victor Lavy (1999). "Using Maimonides' Rule to Estimate the Effect of Class Size on Scholastic Achievement." Quarterly Journal of Economics 114: 533-575.

Duflo, E. and E. Saez (2003). "The Role of Information and Social Interactions in Retirement Plan Decisions: Evidence from a Randomized Experiment." Quarterly Journal of Economics 118(3): 815-842.

Glewwe, Paul, Nauman Ilias, and Michael Kremer (2008). "Teacher Incentives." Mimeo.

Gordon, Nora and Emiliana Vegas (2005). "Educational Finance Equalization, Spending, Teacher Quality, and Student Outcomes: The Case of Brazil's FUNDEF." In E. Vegas (Ed.), Incentives to Improve Teaching: Lessons from Latin America. Washington, D.C.: The World Bank.

Menezes-Filho, N. (2007). “Determinants of School Performance in Brazil”. Instituto Futuro Brasil, Mimeo.

Muralidharan, K. and V. Sundararaman (2007). "Teacher Incentives in Developing Countries: Experimental Evidence from India." Mimeo, World Bank.

Van Der Klaauw, W. (2002). “Estimating the Effect of Financial Aid Offers on College Enrollment: A Regression Discontinuity Approach.” International Economic Review 43:1249–1287.