
What are the limitations of survey data versus

administrative data in impact evaluation?

The case of an SMS campaign in Peru

Preliminary Version 13/06/2016 (do not cite or reproduce)

César Huaroto (Universidad Nacional de la Plata, Argentina)1

Andrea Cornejo (Columbia University)

Luis Baiocchi (Pontifical Catholic University of Peru)2

Abstract

This paper studies the effect that measurement error of survey data has on the precision

of results of a rigorous impact evaluation, also known as attenuation bias. This case study

compares the results of an RCT that evaluates the impact of an SMS campaign to improve

maintenance budget expenditure, when the outcome variable is measured using

administrative data and survey self-reported data.

Four important conditions are met that allow this study to take place: 1) a randomized control trial (RCT); 2) administrative and survey data that collect the same indicator; 3) a treatment impact different from zero; and 4) comparable timing across both sources of data.

The comparison between self-reported and administrative data for the outcome variable

for the same schools reveals over-reporting of the outcome variable on the former, where

an activity is reported as being completed when in fact it has not been, according to the

administrative data. This attenuation bias leads to an underestimation of the impact of the

intervention tested by the RCT, an SMS campaign.

These results are consistent with those of previous studies on the subject, and suggest

that impact evaluation results can be negatively affected by attenuation bias. Furthermore,

they indicate that the reason behind this is not related merely to sampling bias, but also to

the quality of the self-reported data collected using surveys. In a context where the vast

majority of impact evaluations rely on primary data collection, these results highlight the

benefits of using administrative data for impact evaluation analysis.

1 Corresponding author: [email protected].

2 The authors are part of a new initiative within the Ministry of Education of Perú (MINEDU) called MineduLab, a laboratory dedicated to performing impact evaluations of policy innovations using Randomized Control Trials (RCTs) and relying on mainly administrative data. We would like to thank the Principal Investigators of the original RCT impact evaluation of the PRONIED SMS campaign, Juan Manuel Hernandez-Agramonte, Stanislao Maldonado, and Andrew Dustan, for sharing the details of the RCT and for their feedback on this document. We also thank all the different actors within MINEDU involved in different stages of the experiment who shared their data and knowledge about how data was generated (both for Wasichay system and Semaforo Escuela survey) and how the PRONIED maintenance program worked. The opinions presented in this paper are entirely those of the authors, and are not endorsed by MINEDU.


1. INTRODUCTION

An important limitation of self-reported data is that researchers cannot identify or

control for all possible sources of measurement error that can be introduced during

field work that can affect precision and, therefore, accuracy of the impact

evaluation results (see Stecklov and Weinreb, 2010). Moreover, research tends to

assume that this error is distributed randomly between treatment and control

groups or that this issue is easily solved only by increasing the number of

observations (Millimet, 2010, Bertrand and Mullainathan, 2001).

Nevertheless, in the past decade rigorous impact evaluations have come to rely heavily on primary data collection, which often overlooks non-sampling sources of error (e.g., respondent bias, cognitive errors). During this time, it has become common practice for impact evaluations in development studies to carry out their own data collection and analysis in many developing countries (McKenzie, 2012, Duflo, Kremer & Glennerster, 2007).

This case study allows the comparison between the use of self-reported3 and

administrative data to calculate the outcome variable for the same schools. The

analysis reveals that self-reported surveys tend to over-report compliance of the

outcome variable. In other words, we see that an activity is reported as being

completed when in fact it has not been. Consequently, we observe an

underestimation of the impact of the SMS campaign.

The results presented in this paper are consistent with those of previous studies on

the subject, and suggest that impact evaluation results could be negatively

affected by attenuation bias. Furthermore, they indicate that the reason behind this

is not merely related to sampling bias, but also to the quality of the self-reported

data collected using surveys.

The organization of this paper is as follows: Section 2 will first review contributions

of recent literature to the discussion of measurement error and attenuation bias,

identifying key pitfalls of self-reported data for estimating effects; Section 3 will

describe the case study, the administrative and self-reported databases used for

analysis, and the original design of the RCT; Section 4 details the methodology of

the comparison analysis of measurement error; Section 5 summarizes the findings

and consolidates key conclusions.

3 For purposes of this paper, we consider the terms survey data and self-reported as interchangeable concepts


2. MOTIVATION

There is a scant but growing body of literature that has begun to explore the effects of measurement error on impact evaluation analysis (Taubman et al. 2014, Millimet, 2010, Zwane et al. 2011). For instance, Bertrand and Mullainathan (2001) discussed the potential of self-reported data to introduce measurement error that correlates with important characteristics and behaviors, a concern that has become more prominent over the past few years.

Measurement error itself can be broken down into several types and can be found

at all levels of the data collection process. In 2010, Stecklov and Weinreb

performed an extensive review of the major sources of measurement error beyond

sampling and coverage error, including error due to respondents and interviewers,

comparability effects and post-survey errors.

These sources of attenuation bias deeply affect the precision of impact evaluation results in the context of public policy, as demonstrated by Barrera-Osorio et al. (2011) and Baird and Özler (2011), who identify an overstatement of school attendance rates. More recently, Taubman et al. (2014) also find statistically significant discrepancies in the estimated impact of Medicaid coverage on emergency care usage, noting that using administrative data for RCTs, as opposed to traditionally collected self-reported data, can be crucial for the precision of impact estimates4.

However, there have been few case studies on the effect that measurement error

of survey data has on the precision of results of a rigorous impact evaluation

(Taubman et al. 2014, Beegle et al. 2010, Millimet, 2010). This is largely because

performing this study requires that at least four important conditions are met: 1) A

completed randomized control trial (the gold standard of impact evaluations), 2)

Administrative and survey data available for the same indicator, 3) Impacts of the

RCT that are different from zero, 4) Comparable timing for both sources of data.

This study meets all of these requirements, allowing us to compare the main

outcomes when both data sources are used.

In large part, the motivation behind this paper was to elucidate the advantages of

using administrative data, most of which are associated with a reduction in risk of

measurement error in primary data collection in several ways:

4 Self-reported data collection is usually conducted by an enumerator or surveyor (Baird and Özler, 2010, Barrera-Osorio et al. 2011). In the academic social sciences, it is performed internally, through a team of surveyors recruited specifically for the evaluation, or by subcontracting a local data collection firm. Surveyors receive training prior to initiating data collection, accompanied by systematic monitoring and supervision throughout the process to assure quality data collection. These methods aim to secure high-quality data collection by following a series of good practices; however, they do not preclude the presence of attenuation bias.


a) Systematic: The fact that administrative databases are systematic by nature increases the likelihood that procedural issues of data collection have been identified and resolved. Data collected digitally can also help avoid post-survey errors associated with manual data entry.

b) Non-personal: Databases that electronically register and upload indicators over

time do not require an interviewed person to self-report results to a surveyor,

reducing both respondent and interviewer bias; additionally, recurring visits by

surveyors may also alter the behavior being measured and risk underestimating

effects.

c) Periodical: Continuous collection of the same indicator over a prolonged period of time permits a time-sensitive analysis, which in turn allows for better identification of several sources of attenuation bias, such as recall bias (Das et al, 2011).

These comparative benefits to utilizing administrative data to calculate and validate

effects of rigorous impact evaluations are important considerations that can

improve precision of results that shape policy design (Feeney et al. 2015).

Furthermore, as it becomes clear that administrative data is both less costly and

more reliable to evaluate impact, more and more countries are strengthening

systematic and reliable data systems (Meyer and Mittag, 2015, Feeney et al. 2015,

Finkelstein and Taubman, 2015).

In parallel, institutions dedicated to generating evidence to shape policy, such as the Behavioral Insights Team and the Global Insights Initiative at the World Bank, have grown in number and visibility. These institutions are pushing the boundaries of traditional rigorous impact evaluations toward low-cost, cost-effective evaluations that center on behavioral economics and use administrative data to measure impacts.

Several governments have followed suit, including the Social and Behavioral Sciences Team at the White House, as well as emerging-nation initiatives such as MineduLab, the Peruvian Ministry of Education initiative to generate evidence on cost-effective education policy innovations. These initiatives seem to have unlocked critical advantages of using administrative data beyond a reduction in cost, which translates into larger sample sizes and more robust results.


3. THE CASE STUDY SETTING

a. Antecedents

This paper takes advantage of a unique case study that allows a comparison between an analysis based on administrative data and one based on primary data collection, for exactly the same indicators over the same window of time. We perform a quantitative comparison in which the primary outcome indicator is calculated using administrative data as well as self-reported survey data, in the context of an RCT-designed impact evaluation of an SMS campaign. The SMS campaign consisted of reminder messages sent to almost 30,000 school infrastructure maintenance managers5 (one per school) in Peru to improve infrastructure budget management indicators.

The administrative data system, called Wasichay, allows each school infrastructure

maintenance manager, hereafter simply called maintenance manager, to

electronically submit two required management documents: 1) An infrastructure

budget expenditure planning sheet, submitted early in the school year that details

how resources will be spent; 2) An expenditure declaration sheet, submitted near

the end of the school year that details how much and how assigned resources

were spent.

The second source of data is a large scale monitoring system launched by the

Ministry of Education of Peru, hereafter called Semáforo Escuela (SE). Each

month, the SE program sends a team of over 300 surveyors to visit a national

representative sample of approximately four thousand different schools. The

monitoring system gathers information on many indicators, a few of which address

administrative operations of the maintenance program.

The SE monitoring system has all the common features of normal surveys: they

are unannounced, guided by the interviewer, correspond to self-reported

information from school principals, and represent a very expensive sample design

(a large sample for which indicators are collected monthly throughout the school

year). In other words, it represents a much more effort-intensive counterfactual of

the type of primary data that would be used to measure the impact of the treatment

had the administrative database of Wasichay not been available.

The results tell an interesting story about the downfalls of using self-reported data

in contrast to administrative data. Comparing outcomes for the same schools, we

find that maintenance managers tend to over-report the timeliness with which they

5 The Peruvian National Program for Educational Infrastructure Maintenance designates one faculty member per public school to serve as the school infrastructure maintenance manager. Typically, he or she tends to be the school principal, though often it may also be a school teacher.


report their expenditure activities. There is also a non-trivial percentage of

maintenance managers that report not completing the activity when in fact they

have done so. More importantly, the findings of this study reveal that while there

are no statistically significant results of the SMS campaign when using the self-

reported SE data, the same analysis using administrative data from the Wasichay

database yields statistically significant results. These findings illustrate the costs of

attenuation bias in the case of a public policy program.
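One way to see why misreporting leads to underestimation is the following sketch. It is an illustration only, not part of the original analysis: it assumes a binary outcome and misreporting rates that are identical in the treatment and control groups, and the symbols $a$, $b$, $p_T$ and $p_C$ are introduced here purely for exposition.

```latex
% Y: true compliance (administrative record); R: self-reported compliance
% a = P(R=1 | Y=0) is the over-reporting rate, b = P(R=0 | Y=1) the under-reporting rate
% p_g = P(Y=1 | g) is the true compliance rate in group g
\begin{align*}
P(R=1 \mid g) &= (1-b)\,p_g + a\,(1-p_g) = a + (1-a-b)\,p_g, \qquad g \in \{T, C\},\\
P(R=1 \mid T) - P(R=1 \mid C) &= (1-a-b)\,(p_T - p_C).
\end{align*}
```

Under these assumptions, whenever $0 < a + b < 1$ the difference measured with self-reported data is smaller in absolute value than the true difference $p_T - p_C$.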

b. The Maintenance Program and Wasichay system

The Maintenance Program of the National Program for Educational Infrastructure (PRONIED) is an important intervention aimed at closing the gap in educational infrastructure of public schools in Peru. PRONIED's Maintenance Program transferred S/ 571 million (US$ 197 million) in 2014 and S/ 350 million (US$ 100 million) in 2015 to more than 50,000 schools around the country.6 However, of these assigned funds, only about 90% is withdrawn from official accounts, and only about 85% of the withdrawn budget is declared in a timely manner.7

The PRONIED Maintenance program works directly with the maintenance

manager of each school8, who is responsible for the appropriate spending of the

allocated maintenance budget assigned to his/her school, within the given

timelines established by the PRONIED program. The manager is responsible for

organizing oversight committees, for planning the maintenance related activities,

withdrawing and making use of the allocated maintenance funds, and finally

declaring all the expenditures to PRONIED. A brief description of activities involved

in the Maintenance Program can be seen in Figure 1.

To manage and track the information regarding withdrawals and declaration of

expenditures, the PRONIED program manages a large database called Wasichay.

This database was designed so that maintenance managers can log in to their

account when connected to the internet, and update their progress regarding the

planning of maintenance activities throughout the school year as well as the

withdrawal, spending and declaration of the maintenance funds assigned to the

school. The Wasichay database therefore is a live database, with central servers at

the Ministry of Education in Lima, Peru.

6 Infrastructure maintenance funds are transferred to each school manager; managers are responsible for planning, executing and declaring expenses. Funds are monitored through the Wasichay database.

7 The activities allowed under this budget are: 1) roof repairs, 2) repair of floors, 3) sanitary bathroom repairs, 4) repair of walls, 5) repair of doors, 6) repair of windows, 7) repair of electrical installations, 8) repair of school furniture, 9) replacement of school furniture, 10) painting of walls, 11) school supplies and equipment, and materials for educational use.

8 Each public school in the education system has one official school maintenance manager who is in charge of receiving, appropriately executing and declaring the expenditures for these funds.


Figure 1. Timeframe for PRONIED Maintenance Program9

Stages shown in the figure: Stage 1: Preparation; Stage 2: Execution by maintenance manager; Stage 3: Evaluation

PRONIED PREPARATION
- Selection of schools
- Selection of school maintenance managers
- Budget allocation
- Preparation of expenditure guidelines (items and deadlines)

MAINTENANCE MANAGER PREPARATION
- Forming the school maintenance committee
- Forming the oversight committee
- Submission of Expenditure Planning Sheet

EDUCATIONAL DISTRICT (UGEL) AND PRONIED VALIDATION
- Approval of Expenditure Planning Sheet by PRONIED program specialists

MAINTENANCE MANAGER BUDGET EXECUTION
- Withdrawal of funds assigned to the school
- Expenditure of funds assigned to the school

MAINTENANCE MANAGER EXPENDITURE DECLARATION
- Submission of Expenditure Declaration Sheet
- Re-depositing of funds that weren't used
- Registering final oversight report

EDUCATIONAL DISTRICT (UGEL) VALIDATION
- Approval of Expenditure Declarations

PRONIED FINAL VALIDATION
- Approval of Expenditure Declarations
- Revision of total funds assigned to the Maintenance Program

9 UGELs are the Local Units of Education Management, the lowest level of autonomous intermediary agencies that receive central government education funding for distribution and spending. UGELs on average have 200 schools under their jurisdiction throughout 1-5 districts. In total there are 223 UGELs in the country.


c. Semáforo Escuela

The Ministry of Education has created Semáforo Escuela (SE), or School traffic

light in its English translation, a management tool designed to improve the

management of education resources at the decentralized levels of the education

sector, such as the regional and district-level intermediary agencies by increasing

the quality of information used to generate public policies10.

SE aims to reach this objective by generating trustworthy and prioritized monitoring

indicators on the continuous progress of the educational services provided to

schools with a representative sample at the UGEL level11 and with a monthly

frequency. Table 1 presents the sample of schools visited during the year.

Table 1. Quantity of surveys collected by Semáforo Escuela by month

Month          N        %
March          243      1%
April          6,306    13%
May            6,309    13%
June           5,493    12%
July           6,314    13%
August         5,148    11%
September      5,758    12%
October        5,631    12%
November       6,260    13%
Total          47,462   100%

A total of 27,385 public schools across the country were visited by SE

monitors at least once a year, for a total of 47,462 collected surveys12. The SE data

collection obtains detailed information about the quality of education by

administering a survey to each school principal and also collecting non self-

reported data through monitoring at the school premises.

The information that SE collects is used to generate monthly reports on the status

of these indicators and is shared with the intermediary agencies so that they have

10 The SE model is one of many initiatives designed on the basis of the Prime Minister's Delivery Unit system implemented in the UK, launched in 2001 to improve monitoring of public service targets. In 2015, the Pakistani government also followed suit and launched the Prime Minister's Delivery Unit.

11 Peru has a decentralized educational system and there are 223 UGELs, Local Units of Education Management, the lowest level of autonomous intermediary agencies that receive central government education funding for distribution and spending. UGELs on average have 200 schools under their jurisdiction throughout 1-5 districts.

12 Some schools were visited more than once because of their size or because UGELs had too few schools in total. In Table A.1 we present the number of visits to each school by SE. Nearly half of the sample was visited only once.


the capacity to identify shortcomings and present proposals to improve programs

and/or projects.

Semáforo Escuela collects information on:

a) The internal management of the school.

b) Minimum inputs for the development of educational services.

c) Attendance of the school principal, teachers and students.

Procedures and Data Flow:

- The surveyors have a tablet with Internet access where they record

information from visited schools.

- The information is processed in real time in the central database of the

Ministry of Education.

- The information obtained during each visit is used to generate monthly

reports and sent to all Regional Education and UGELs in the country.

One of the many variables collected by SE is the adequate fulfilment of the

administrative processes of the Maintenance Program of PRONIED. In this

particular case, each surveyor asks school maintenance managers to report which

processes of the program (listed in Figure 1) have been completed to date. At the

moment of the survey, the SE surveyor has no way to verify these self-reported

responses. Additionally, there are no tangible consequences for the school

manager should he or she lie, a common feature for primary collected and self-

reported survey data.

d. Explaining the SMS PRONIED experiment

The PRONIED program, complete with a database of over 30,000 cell phone

numbers for school maintenance managers nationwide as well as the smoothly-

operating Wasichay database system, is an ideal implementing office for rigorously

testing effects of innovations to policy. As such, PRONIED agreed to cooperate

with the impact evaluation of its SMS campaign to maintenance managers.

In order to do so, PRONIED allowed the design of the SMS campaign to include a pure control group as well as five treatment groups, each delivering a different message. Considering the evidence produced by similar interventions (see: Karlan, 2011, 2012, Fink et al 2014, Chong et al, 2013, Castleman and Lindsay, 2014), the treatment consisted of five rounds of SMS: the first two were sent biweekly and the last three weekly. The last SMS was delivered the week before the submission date for the declaration of expenditures. All SMS were personalized with the maintenance manager's first name.


In 2015, the PRONIED Maintenance program distributed maintenance funds for

59,700 schools nationwide in Peru. All of these schools were assigned a budget

and a school maintenance manager. The sample of schools considered for the

original RCT evaluation of the SMS Campaign excludes a fraction of these schools

from the experiment for two reasons. First, for 18,598 of these maintenance

managers, there was no registered cellphone. Additionally, 11,224 schools that

had already completed their expenditure declaration at the beginning of the

experiment were also excluded. Therefore, the final sample of the RCT experiment

to evaluate the impact of the SMS campaign consisted of 29,878 schools, nearly

half of the total coverage of the program.
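The sample construction described above can be checked with a few lines of arithmetic. The following is a minimal sketch that only uses the counts reported in this paragraph; the variable names are illustrative.

```python
# Sample construction for the SMS experiment, using the counts reported in the text.
total_schools = 59_700      # schools that received maintenance funds in 2015
no_cellphone = 18_598       # excluded: no registered cellphone for the manager
already_declared = 11_224   # excluded: declaration completed before the experiment

experiment_sample = total_schools - no_cellphone - already_declared
share_of_program = experiment_sample / total_schools

print(experiment_sample)           # 29878
print(f"{share_of_program:.1%}")   # share of all program schools in the experiment
```

The resulting share confirms that the experiment covers roughly half of the schools in the program.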

In Table A.2 we present the means of these three groups in order to understand

the distinction between the final experiment sample and the group of schools

excluded from the experiment for the aforementioned reasons. Schools excluded

due to unavailable cellphone numbers for maintenance managers have a lower

level of compliance for maintenance activities than the experiment sample. Schools

excluded because expenditure declarations were complete prior to the experiment

start have higher levels of compliance.13

In Graphs A.1A and A.1B, in the Annex, we can see how these groups compare in

their fulfilment of the two most important program activities: submission of the

expenditure planning sheet and submission of the expenditure declaration. In the

graph, the two vertical red dashed lines identify the beginning and the end of the

experiment. For submission of the expenditure planning sheet, the average

compliance rate for the RCT experiment sample is between that of the two

excluded groups. In the case of the submission of expenditure declaration, the

experiment sample had 0% before the start of the experiment and the sample

excluded due to early completion of expenditure declaration had 100% compliance

rate during the entire duration of the experiment.

The SMS message itself has an informational component (e.g. the deadline for performing an activity) and content applying principles of behavioral economics. In order to evaluate the impact of the SMS messages, all maintenance managers in the final sample were randomly assigned to one of six groups:

- Group 1: Alert
- Group 2: Social Norm
- Group 3: Monitoring with detailed information
- Group 4: Notice of public reporting of non-compliant schools
- Group 5: Reminder of potential audit visits
- Control: Did not receive SMS
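As an illustration of what random assignment to six groups involves, the sketch below shuffles a list of school identifiers and deals them into groups. It is not the procedure used in the original RCT: equal group sizes, the seed, the shortened group labels and the integer school identifiers are all assumptions made for simplicity, and the actual allocation shares are not reported here.

```python
import random
from collections import Counter

# Shortened labels for the five treatment arms plus the control group.
GROUPS = ["Alert", "Social Norm", "Monitoring", "Public reporting",
          "Audit reminder", "Control"]

def assign_groups(school_ids, seed=2015):
    """Shuffle schools, then deal them round-robin into equally sized groups.

    Equal allocation is an assumption of this sketch, not a reported design feature.
    """
    rng = random.Random(seed)
    shuffled = list(school_ids)
    rng.shuffle(shuffled)
    return {school: GROUPS[i % len(GROUPS)] for i, school in enumerate(shuffled)}

# Hypothetical identifiers 0..29,877, one per school in the experiment sample.
assignment = assign_groups(range(29_878))
counts = Counter(assignment.values())
print(counts)
```

Fixing the seed makes the assignment reproducible, which is useful when the list of treated schools has to be audited later.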

13 This could be because managers without cellphone could be poorer or be living or working in rural areas where there is lower access to mobile technology. In contrast, the main reason for the difference in the second group might be that these managers are likely to be more motivated.


We briefly summarize the experiment sample and design in Figure 2.

Figure 2. Brief description of PRONIED SMS Experiment

e. Description of the Experiment Sample:

For our analysis, we combine the survey information collected by Semáforo Escuela with the administrative data of the maintenance program collected in the Wasichay system. Both datasets contain the same indicator: compliance with the submission of the expenditure planning sheet, a document that maintenance managers are required to submit detailing the budget planning for maintenance activities in the school.

We matched schools across the Wasichay and Semáforo Escuela databases: Wasichay contains information for 26,544 of the 27,385 schools visited at least once by SE (about 97% of the total sample). The 26,544 schools matched in both databases represent about 45% of all schools participating in the PRONIED maintenance program.

Of the total number of schools included in the maintenance program and present in the Wasichay database, about half were excluded from the experiment for the reasons mentioned in subsection 3.d, which details the original RCT. As a result, of the 26,544 matched schools, only 14,931 belonged to the SMS experiment sample. We briefly summarize our analysis sample in Figure 3.

Figure 3. Description of sample of analysis

Total schools participating in the maintenance national program (N=59,704)
  - Included in Semaforo Escuela sample: N=26,544 (45% from total)
      - Included in SMS PRONIED campaign: N=14,931 (55% from sample on Semaforo and Wasichay)
          - Treated (T): N=10,803, 5 arms of treatment
          - Control (C): N=4,128
      - Not included in SMS PRONIED campaign: N=11,613 (45% from sample on Semaforo and Wasichay)
  - Not included in Semaforo Escuela sample: N=33,160 (55% from total)

In Table A3, we present descriptive school-level characteristics for our analysis sample, including: average number of students, average number of classrooms, gender of maintenance manager, geographic location of school (rural/urban), school educational level (primary or secondary), and native language spoken at school, among others. We can see there are few differences between the full matched sample of Semaforo and Wasichay (1) and the RCT subset (2).14

We compare the measurement for this indicator in Wasichay (our objective

benchmark of the accurate information) on the same day when Semáforo Escuela

visited the school and collected the data. In Table 2 we can see how this indicator

varies depending on the source of the data:15

Table 2. Comparison between Wasichay and Semaforo Escuela answers on "Registration of Technical Sheet"

Panel A: Complete match between Semaforo Escuela and Wasichay

Frequencies (rows: Wasichay System; columns: Semaforo Escuela Survey)

Wasichay    No        Yes       Missing   Doesn't know   Total
No          11,144    4,078     189       176            15,587
Yes         2,319     16,381    150       94             18,944
Total       13,463    20,459    339       270            34,531

Percentages (rows: Wasichay System; columns: Semaforo Escuela Survey)

Wasichay    No        Yes       Missing   Doesn't know   Total
No          32.27     11.81     0.55      0.51           45.14
Yes         6.72      47.44     0.43      0.27           54.86
Total       38.99     59.25     0.98      0.78           100.00

Panel B: Only matched sample included on the SMS experiment

Frequencies (rows: Wasichay System; columns: Semaforo Escuela Survey)

Wasichay    No        Yes       Missing   Doesn't know   Total
No          6,342     2,338     102       92             8,874
Yes         1,397     9,494     89        49             11,029
Total       7,739     11,832    191       141            19,903

Percentages (rows: Wasichay System; columns: Semaforo Escuela Survey)

Wasichay    No        Yes       Missing   Doesn't know   Total
No          31.86     11.75     0.51      0.46           44.59
Yes         7.02      47.70     0.45      0.25           55.41
Total       38.88     59.45     0.96      0.71           100.00

For this comparison, our sample includes every visit SE made to each school. This maximizes the number of observations in the analysis, as schools were often visited more than once in the period considered. Additionally, as explained in the following section, our econometric model treats repeated visits as separate observations.
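To make the size of the discrepancy in Table 2 concrete, the following sketch computes misreporting rates from the Panel B frequencies. The definitions used here are an assumption of this illustration: rates are calculated only among valid "Yes"/"No" survey answers, with Wasichay taken as the benchmark, and the function name is hypothetical.

```python
# Frequencies from Table 2, Panel B.
# Outer key: Wasichay record; inner key: Semaforo Escuela survey answer.
panel_b = {
    "No":  {"No": 6_342, "Yes": 2_338, "Missing": 102, "Doesn't know": 92},
    "Yes": {"No": 1_397, "Yes": 9_494, "Missing": 89, "Doesn't know": 49},
}

def misreport_rates(table):
    """Return (over-reporting, under-reporting) rates among valid Yes/No answers."""
    not_done, done = table["No"], table["Yes"]
    over = not_done["Yes"] / (not_done["Yes"] + not_done["No"])
    under = done["No"] / (done["No"] + done["Yes"])
    return over, under

over, under = misreport_rates(panel_b)

# Share of all visits where the survey answer matches the administrative record.
total_visits = sum(sum(row.values()) for row in panel_b.values())
agreement = (panel_b["No"]["No"] + panel_b["Yes"]["Yes"]) / total_visits

print(f"over-reporting:  {over:.1%}")
print(f"under-reporting: {under:.1%}")
print(f"agreement:       {agreement:.1%}")
```

The agreement share equals the sum of the two diagonal percentages in Panel B, which provides a quick consistency check on the table.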

14 These differences can be misleading since, as seen in sub-section 3.d, the two samples excluded from the experiment appear to be statistically different from it; however, since average compliance was higher for one excluded group and lower for the other, the mean of the experiment sample appears similar to that of the original sample although it is not.

15 This comparison is robust to changes in the specification of the variable in Wasichay. For example, we create an additional indicator of compliance with the submission of the expenditure planning sheet within a two-week window around the visit of Semaforo Escuela. In other words, we consider that a school gave consistent answers if the answers agreed in both data sources and the Wasichay data fell within a two-week window around the SE visit.


4. METHODOLOGY

We estimate the effect of the SMS campaign using a Difference-in-Differences (DD) estimation. In other words, we compare the change in average compliance with the submission of the expenditure planning sheet in treated schools, before and after the SMS campaign started, with the corresponding change in control schools.

Equation 1: $y_{it} = \beta_0 + \beta_1 SMS_i + \beta_2 Post_t + \beta_{DD} (SMS_i \times Post_t) + X_{it} \beta_X + \varepsilon_{it}$

Where i denotes a school and t the day when the surveyor visited the school.

SMS is an indicator variable for whether the school was selected for the SMS

campaign. Post is a binary variable that takes the value of zero before the SMS

campaign launch and one after the campaign launch. The interaction between

SMS and Post is an indicator of the schools selected for SMS campaign after its

beginning. The coefficient βDD is the main coefficient of interest, and gives the

average difference of the difference before and after the SMS campaign start

between treatment and control groups.
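To make the role of the DD coefficient concrete, the following minimal sketch computes a difference-in-differences estimate from the four cell means of simulated data. It is an illustration only: the compliance rates, effect size, sample sizes and function names are invented, and the covariates and fixed effects of the full model are omitted. In that simplified case, the OLS coefficient on the interaction term equals this difference of cell means.

```python
import random
from statistics import mean

def simulate(n_per_cell=20_000, effect=0.05, seed=1):
    """Simulate binary compliance for each (SMS, Post) cell; all rates are made up."""
    rng = random.Random(seed)
    rates = {(0, 0): 0.40, (1, 0): 0.40, (0, 1): 0.55, (1, 1): 0.55 + effect}
    return [
        (sms, post, 1 if rng.random() < p else 0)
        for (sms, post), p in rates.items()
        for _ in range(n_per_cell)
    ]

def dd_estimate(rows):
    """Difference-in-differences from the four cell means (no covariates)."""
    def cell_mean(s, p):
        return mean(y for sms, post, y in rows if sms == s and post == p)

    change_treated = cell_mean(1, 1) - cell_mean(1, 0)
    change_control = cell_mean(0, 1) - cell_mean(0, 0)
    return change_treated - change_control

rows = simulate()
beta_dd = dd_estimate(rows)
print(round(beta_dd, 3))  # close to the simulated effect of 0.05, up to sampling noise
```

The common upward trend between periods (0.40 to 0.55 in this simulation) cancels out in the subtraction, which is why the pre-campaign observations add information.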

We denote Xit as the set of covariates that are correlated with the outcome or with

the treatment selection, school district fixed effects, and week fixed effects. The

complete list of control variables and the main descriptive statistics can be found in

Table A3.

We consider this approach to be the most appropriate since Semáforo Escuela is

an unbalanced data panel for which we have information from schools during the

entire year. This method allows a more efficient use of the data since it includes

the observations prior to the beginning of the campaign.16

We define the survey data sample as the set of schools for which there is data in the Wasichay administrative database and for which SE collected data at least once throughout the 2015 school year. Using the survey data sample, we perform a multivariate regression analysis that compares the estimated average effect between treatment and control group schools under four data set models.

The four models of data sets are:

i) Full information: Uses the data of the entire sample of 29,878 schools

represented in the Wasichay database included in the experiment.

ii) Full information for survey data sample: For the 14,931 schools that have been surveyed at least once in Semaforo Escuela and are in the SMS experiment, we will use Wasichay data.

16 We use a missing indicator to handle the small number of missing observations on covariates.


iii) Corrected information on survey data sample: For each school in the survey

data sample and in the experiment, we will use the Wasichay administrative data

registered for the day that Semaforo visited the school. In other words, we will

simulate what the data would look like had Semaforo Escuela collected the

administrative data of Wasichay.

iv) Self-reported data sample: This model is limited to Semaforo Escuela, and

does not use the administrative data at all. In other words, this scenario reflects the

analysis methods that RCTs traditionally follow.
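The "corrected information" scenario (iii) can be sketched as an as-of lookup: for each Semaforo Escuela visit, the outcome is what the administrative record already showed on the visit date. The function name, argument names, and dates below are hypothetical and only illustrate the idea; the actual structure of the Wasichay data is not described here.

```python
from datetime import date
from typing import Optional

def admin_outcome_on_visit(submission_date: Optional[date], visit_date: date) -> int:
    """Return 1 if the administrative system already shows a submission
    on the day of the survey visit, else 0 (None means never submitted)."""
    return int(submission_date is not None and submission_date <= visit_date)

visit = date(2015, 9, 3)  # hypothetical visit date
print(admin_outcome_on_visit(date(2015, 8, 20), visit))  # 1: submitted before the visit
print(admin_outcome_on_visit(date(2015, 9, 10), visit))  # 0: submitted after the visit
print(admin_outcome_on_visit(None, visit))               # 0: never submitted
```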

We summarize the most important features of the four datasets in Table 3. As can

be seen in this table, moving from Scenarios 1 to 4 signifies a decrease in quantity

of data as well as a decrease in precision of measurement of the indicator. In the

table, we see that for the first Scenario, we have complete information for the total

of schools, for all school days.

For example, when we go from Scenario 1 to Scenario 2, we change from using the whole universe to using a survey sample (in this case, a large survey sample); therefore, any change in our estimation results is due only to having chosen a sample for the survey.

Moving from Scenario 2 to Scenario 3, we change the timing of the measurement: as data for schools is collected on different days, the timeframe becomes more ambiguous and comparability between treated and control groups is weakened.

Finally, moving from Scenario 3 to Scenario 4, we add the most interesting feature of survey-collected data, which is that the outcome information obtained is self-reported. Therefore, a risk of misreported information (errors in the data collection process, respondent-level biases such as social desirability, etc.) is introduced, which may produce attenuation bias in the estimation.
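To see why misreporting can attenuate the estimate, suppose (as a simplifying assumption for illustration, not something established by the data) that a non-complying school reports compliance with probability a, that a complying school reports non-compliance with probability b, and that these rates are the same in both groups and both periods. If the true compliance rate is p, the reported rate is a + (1 - a - b)p, so the constant a cancels in the double difference and the DD estimate is scaled by (1 - a - b). All rates in the sketch below are hypothetical.

```python
def reported_rate(true_rate, false_yes, false_no):
    """Expected self-reported compliance rate given the true rate."""
    return true_rate * (1 - false_no) + (1 - true_rate) * false_yes

def dd(rates):
    """rates: dict keyed by (group, period) holding compliance rates."""
    return ((rates["treat", "post"] - rates["treat", "pre"])
            - (rates["control", "post"] - rates["control", "pre"]))

# Hypothetical true rates, chosen so that the true DD effect is 0.025
true_rates = {("treat", "pre"): 0.70, ("treat", "post"): 0.875,
              ("control", "pre"): 0.70, ("control", "post"): 0.85}

# Hypothetical misreporting: 40% of non-compliers over-report, nobody under-reports
reported = {k: reported_rate(v, false_yes=0.4, false_no=0.0)
            for k, v in true_rates.items()}

print(round(dd(true_rates), 4))  # 0.025
print(round(dd(reported), 4))    # 0.015 = (1 - 0.4) * 0.025
```

Under these assumptions the estimate shrinks toward zero while the sampling noise does not, which is one way an effect can lose statistical significance.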


Table 3. Description of dataset scenarios

| Scenario | Sample | Time accuracy | Data source | Description |
| --- | --- | --- | --- | --- |
| Full information | All the sample included on the experiment | All schools have complete data for established timeframes | Admin. Data | We use all schools included in the original RCT experiment using the data from prior to the beginning of the experiment and from the day after the deadline for completing the maintenance activities. |
| Full information for survey data sample | Only a sample (45%) | All schools have complete data for established timeframes | Admin. Data | We use all schools included in the experiment that were at least surveyed once on the Semaforo Survey using the data prior to the beginning of the experiment and from the day after the deadline for completing the maintenance activities. |
| Corrected information on survey data sample | Only a sample (45%) | Data for each school available for different days | Admin. Data | The scenario of a "quality survey". We use the information on Wasichay that would have been collected on the day of the visit of Semaforo, some before the experiment began and some after its start. This means that we are in the survey scenario but only changing the data source for the outcome variable. |
| Self-reported data sample | Only a sample (45%) | Data for each school available for different days | Self-reported | The traditional scenario. Only self-reported data on surveys collected on different days, some before the experiment begins and some after the experiment begins. |


5. RESULTS

Graph 1 illustrates the key results of this study. Here, we present the point estimates of the same Difference-in-Difference estimation, presented in the previous section, for the four data scenarios. As mentioned above, in Scenario 1 we have full information and the advantages of administrative data, whereas in Scenario 4 we only have survey data (the traditional scenario).

Graph 1. Results of the DD impact estimator depending on the data source

Note: See Table 4 for full results. 90% Confidence intervals.

In the first three scenarios, the point estimate of the impact of the SMS campaign changes very little, and remains at approximately a 2.5 percentage point increase in the probability of compliance with the submission date for the expenditure planning sheet.

As can be seen in Graph 1, from Scenario 1 to 2 the confidence intervals of the point estimates widen, accompanied by minor changes in the point estimates. Similarly, from Scenario 2 to Scenario 3, we see a slight increase in the point estimate and almost no change in the width of the confidence intervals.

[Graph 1 plot. Vertical axis: Estimated impact of SMS Campaign (ticks from -1 to 4). Horizontal axis: Scenario 1, Scenario 2, Scenario 3, Scenario 4.]

The most interesting result is in the progression from Scenario 3 to Scenario 4, where the point estimate drops in magnitude and is no longer statistically different from zero. The only difference between these scenarios is the source of the data. In Scenario 3, we have all the characteristics of a survey data collection, but the outcome variable is based on the administrative data from the date of the SE visit. One possible explanation for the change from Scenario 3 to 4 is the measurement error that can be attributed to the survey data collection.
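As a rough check of this result, the sketch below recomputes 90% intervals from the Table 4 estimates with all controls (columns 3, 6, 9 and 12). It assumes that the figures in parentheses in Table 4 are standard errors and uses a normal approximation (critical value 1.645); both are assumptions of this sketch, so the intervals need not coincide exactly with those drawn in Graph 1.

```python
Z_90 = 1.645  # two-sided 90% critical value of the standard normal (assumed)

# (estimate, assumed standard error) in percentage points,
# taken from Table 4 columns (3), (6), (9) and (12)
table4 = {
    "Scenario 1": (2.58, 0.70),
    "Scenario 2": (2.39, 0.91),
    "Scenario 3": (2.65, 0.96),
    "Scenario 4": (1.16, 1.07),
}

def ci90(estimate, se):
    """Normal-approximation 90% confidence interval."""
    return (estimate - Z_90 * se, estimate + Z_90 * se)

for name, (est, se) in table4.items():
    low, high = ci90(est, se)
    covers_zero = low <= 0 <= high
    print(f"{name}: [{low:.2f}, {high:.2f}] covers zero: {covers_zero}")
```

Under these assumptions only the Scenario 4 interval includes zero.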

In Graph 2, we present the coefficient of variation of the point estimates presented

in Graph 1. Graph 2 shows that the coefficient for Scenario 4 is significantly larger

than for the previous three Scenarios, indicating that this estimator is very likely to

be imprecise.
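A minimal sketch of this calculation follows, assuming that the coefficient of variation is the standard error divided by the absolute value of the point estimate, and that the figures in parentheses in Table 4 are standard errors. The definition is an assumption of this sketch, so the values need not match Graph 2 exactly. The inputs are the Table 4 estimates with all controls (columns 3, 6, 9 and 12).

```python
# (estimate, assumed standard error) from Table 4 columns (3), (6), (9), (12)
table4 = {
    "Scenario 1": (2.58, 0.70),
    "Scenario 2": (2.39, 0.91),
    "Scenario 3": (2.65, 0.96),
    "Scenario 4": (1.16, 1.07),
}

def coefficient_of_variation(estimate, se):
    """Assumed definition: standard error relative to the size of the estimate."""
    return se / abs(estimate)

cv = {name: coefficient_of_variation(est, se) for name, (est, se) in table4.items()}

for name, value in cv.items():
    print(f"{name}: {value:.2f}")
```

Under this definition the ratio for Scenario 4 is the largest of the four.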

Graph 2. Coefficient of Variation of the DD impact estimator depending on

the data source

Source: Estimations by the case study research team

Finally, Table 4 presents our four estimations of the impact of the SMS campaign, each under three models with different sets of covariates. The first column of results for each scenario presents the DD model without any control variables, while the second column presents the results including only UGEL-level controls. The third column presents the results including all control variables presented in Table A3. Our point estimates in the four scenarios are almost unchanged between models, indicating that our results are robust to the model specification.

[Graph 2 plot. Vertical axis: Coefficient of Variation.]

Table 4. Difference in Difference estimation for the four dataset scenarios

| | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) | (9) | (10) | (11) | (12) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Scenario | 1 | 1 | 1 | 2 | 2 | 2 | 3 | 3 | 3 | 4 | 4 | 4 |
| DD Impact estimator | 2.58*** (0.73) | 2.58*** (0.7) | 2.58*** (0.7) | 2.39** (0.99) | 2.39*** (0.91) | 2.39*** (0.91) | 2.46** (1.16) | 2.64*** (0.96) | 2.65*** (0.96) | 1.58 (1.4) | 1.09 (1.07) | 1.16 (1.07) |
| Average increase after treatment starts | 14.42*** (0.62) | 14.42*** (0.59) | 14.42*** (0.59) | 14.32*** (0.85) | 14.32*** (0.77) | 14.32*** (0.77) | 39.29*** (0.99) | 2.36 (1.9) | 2.34 (1.91) | 4.56*** (1.2) | -1.36 (1.81) | -1.17 (1.82) |
| Control mean before treatment starts | 72.24*** (0.49) | | | 73.98*** (0.68) | | | 47.01*** (0.79) | | | 51.6*** (0.8) | | |
| N | 59,756 | | | 29,864 | | | 26,354 | | | 25,486 | | |
| Educational Districts fixed effects | No | Yes | Yes | No | Yes | Yes | No | Yes | Yes | No | Yes | Yes |
| Control variables | No | No | Yes | No | No | Yes | No | No | Yes | No | No | Yes |

Note:


6. CONCLUSIONS

This case study illustrates a case where measurement error affects the results of an RCT impact evaluation. As can be seen in Table 4, the impact estimator drops from Scenario 1 to 4 and ceases to be statistically different from zero. In addition, this is accompanied by an increase of 100 percent in the coefficient of variation of the point estimate. We can interpret the differences in the resulting effects on the expenditure declaration submission dates between Scenarios 3 and 4 as being due to the measurement error present in the survey data collection. This is most likely due to several sources of measurement error associated with self-reported data collection methods, which result in attenuation bias. This unique comparison was possible because the results of an RCT were available, as well as both administrative and self-reported data for the same primary outcomes with comparable timing.

The policy implications of these results are not negligible. The gains in precision when calculating outcome indicators highlight the importance of using administrative data wherever possible. Additionally, being able to identify how measurement error affects effect estimates may also affect assessments of the cost-effectiveness of policies, and ultimately the direction and design of development policy.

Finally, it is important to clarify that this study should not be interpreted as a call to replace or reduce primary collection of self-reported data. We are aware that there are several important areas of research where administrative data is not available. For these cases, we hope this paper at least provides guidance on, and awareness of, measurement error risk and how to mitigate it. For areas where administrative data is available, using it to perform a complementary analysis of results is a highly informative and inexpensive practice.


7. REFERENCES

Baird, S., Chirwa, E., McIntosh, C., and Özler, B. (2012). Examining the reliability of self-reported data on school participation. Journal of Development Economics, 89-93.

Barrera-Osorio, F., Bertrand, M., Linden, L., and Perez-Calle, F. (2011). Improving the design of conditional transfer programs: Evidence from a randomized education experiment in Colombia. American Economic Journal: Applied Economics, 167-195.

Beegle, K., De Weerdt, J., Friedman, J., and Gibson, J. (2012). Methods of household consumption measurement through surveys: Experimental results from Tanzania. Journal of Development Economics, 98(1), 3-18.

Bertrand, M., and Mullainathan, S. (2001). Do people mean what they say? Implications for subjective survey data. The American Economic Review, 91(2), 67-72.

Castleman, B., and Lindsay, P. (2014) Working Paper, Summer Nudging: Can Personalized Text Message and Peer Mentor Outreach Increase College Going Among Low income High School Graduates? EdPolicyWorks.

Chong A, Karlan D, Shapiro J, and Zinman J. (2013) (Ineffective) Messages to Encourage Recycling: Evidence from a Randomized Evaluation in Peru. World Bank Working Paper #6548, July 2013.

Das, J., Hammer, J., and Sánchez-Paramo, C. (2012). The impact of recall periods on reported morbidity and health seeking behavior. Journal of Development Economics, 98(1), 76-88.

Duflo, E., Glennerster, R., and Kremer, M. (2007). Using randomization in development economics research: A toolkit. Handbook of Development Economics, 4, 3895-3962.

Feeney, L., Bauman, J. and Chabrier, J. (2015). Using administrative data for randomized evaluations. J-PAL North America, Cambridge, MA.

Fink, G., Lanthorn, H., Raifman, J., and Rokicki, S. (2014) The Impact of Text Message Reminders on Adherence to Antimalarial Treatment in Northern Ghana: A Randomized Trial. Published: October 28, 2014

Finkelstein, A., and Taubman, S. (2014). Using randomized evaluations to improve the efficiency of US healthcare delivery. J-PAL North America, Cambridge, MA.

Karlan, D., McConnell, M., Mullainathan, S., and Zinman J. (2011). “Getting on the Top of Mind: How Reminders Increase Savings,” Working Paper. January 2011.

Karlan, D., Morton, M., and Zinman, J. (2012) “A Personal Touch: Text Messaging for Loan Repayment.” Working Paper, February 2012.

McKenzie, D. (2012). Beyond baseline and follow-up: The case for more T in experiments. Journal of Development Economics, 99(2), 210-221.

Meyer, B., and Mittag, N. (2015). Using linked survey and administrative data to better measure income: Implications for poverty, program effectiveness and holes in the safety net (No. w21676). National Bureau of Economic Research.

Millimet, D. (2010). The Elephant in the corner: A cautionary tale about Measurement Error in Treatment Effect Models. (No. 5140). IZA Discussion Paper Series.

Stecklov, G., and Weinreb, A. (2010). Improving the quality of data and impact-evaluation studies in developing countries. Inter-American Development Bank.

Taubman, S., Allen, H., Wright, B., Baicker, K., and Finkelstein, A. (2014). Medicaid increases emergency-department use: evidence from Oregon's Health Insurance Experiment. Science, 343(6168), 263-268.

Zwane, A. P., Zinman, J., Van Dusen, E., Pariente, W., Null, C., Miguel, E., Kremer, M., Karlan, D.S., Hornbeck, R., Giné, X. and Duflo, E. (2011). Being surveyed can change later behavior and related parameter estimates. Proceedings of the National Academy of Sciences, 108(5), 1821-1826.


8. ANNEX

Table A1. Number of schools surveyed in Semaforo Escuela by number of visits during the year

| Number of visits | N | % |
| --- | --- | --- |
| Only once | 14,021 | 51% |
| Two | 6,756 | 25% |
| Three | 6,506 | 24% |
| Four or more | 102 | 0% |
| Total | 24,061 | 100% |


Table A2. Descriptive stats on Wasichay System for the sample included and excluded from SMS PRONIED experiment

Column groups: A = Full sample included on maintenance program; B = Sample excluded because maintenance manager did not have cellphone; C = Sample excluded because they finished all maintenance activities before 13/08; D = Sample included on the SMS PRONIED experiment.

| Variable | A: N | A: Mean | A: S.D. | A: Min | A: Max | B: N | B: Mean | B: S.D. | B: Min | B: Max | C: N | C: Mean | C: S.D. | C: Min | C: Max | D: N | D: Mean | D: S.D. | D: Min | D: Max |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Number of classrooms in the school | 59,700 | 6.5 | 7.6 | 0 | 86 | 18,598 | 5.9 | 7.9 | 0 | 86 | 11,224 | 5.9 | 7.0 | 0 | 69 | 29,878 | 7.1 | 7.6 | 0 | 83 |
| Amount assigned for maintenance (Nuevos Soles) | 59,700 | 8,596 | 8,872 | 0 | 30,000 | 18,598 | 7,903 | 8,737 | 4 | 30,000 | 11,224 | 7,700 | 8,301 | 2,000 | 30,000 | 29,878 | 9,364 | 9,094 | 0 | 30,000 |
| % Form maintenance committee | 59,700 | 0.95 | 0.21 | 0 | 1 | 18,598 | 0.91 | 0.29 | 0 | 1 | 11,224 | 1.00 | 0.00 | 1 | 1 | 29,878 | 0.97 | 0.18 | 0 | 1 |
| % Form oversight committee | 59,700 | 0.95 | 0.21 | 0 | 1 | 18,598 | 0.91 | 0.29 | 0 | 1 | 11,224 | 1.00 | 0.00 | 1 | 1 | 29,878 | 0.97 | 0.18 | 0 | 1 |
| % Submission Expenditure Planning | 59,700 | 0.94 | 0.23 | 0 | 1 | 18,598 | 0.89 | 0.31 | 0 | 1 | 11,224 | 1.00 | 0.06 | 0 | 1 | 29,878 | 0.96 | 0.20 | 0 | 1 |
| % Register commitment act | 59,700 | 0.91 | 0.28 | 0 | 1 | 18,598 | 0.86 | 0.35 | 0 | 1 | 11,224 | 0.97 | 0.17 | 0 | 1 | 29,878 | 0.93 | 0.26 | 0 | 1 |
| % Register expenditure declaration | 59,700 | 0.94 | 0.24 | 0 | 1 | 18,598 | 0.88 | 0.33 | 0 | 1 | 11,224 | 1.00 | 0.00 | 1 | 1 | 29,878 | 0.95 | 0.22 | 0 | 1 |
| % Register final oversight report | 59,700 | 0.55 | 0.50 | 0 | 1 | 18,598 | 0.52 | 0.50 | 0 | 1 | 11,224 | 0.60 | 0.49 | 0 | 1 | 29,878 | 0.55 | 0.50 | 0 | 1 |
| % Form maintenance committee on time | 59,700 | 0.93 | 0.25 | 0 | 1 | 18,598 | 0.86 | 0.35 | 0 | 1 | 11,224 | 1.00 | 0.00 | 1 | 1 | 29,878 | 0.95 | 0.22 | 0 | 1 |
| % Form oversight committee on time | 59,700 | 0.93 | 0.26 | 0 | 1 | 18,598 | 0.85 | 0.35 | 0 | 1 | 11,224 | 1.00 | 0.00 | 1 | 1 | 29,878 | 0.95 | 0.22 | 0 | 1 |
| % Timely Submission Expenditure Planning | 59,700 | 0.86 | 0.35 | 0 | 1 | 18,598 | 0.77 | 0.42 | 0 | 1 | 11,224 | 0.95 | 0.21 | 0 | 1 | 29,878 | 0.87 | 0.33 | 0 | 1 |
| % Register commitment act on time | 59,700 | 0.82 | 0.38 | 0 | 1 | 18,598 | 0.73 | 0.44 | 0 | 1 | 11,224 | 0.93 | 0.26 | 0 | 1 | 29,878 | 0.84 | 0.37 | 0 | 1 |
| % Register expenditure declaration on time | 59,700 | 0.77 | 0.42 | 0 | 1 | 18,598 | 0.64 | 0.48 | 0 | 1 | 11,224 | 1.00 | 0.00 | 1 | 1 | 29,878 | 0.77 | 0.42 | 0 | 1 |
| % Register final oversight report on time | 59,700 | 0.12 | 0.32 | 0 | 1 | 18,598 | 0.11 | 0.31 | 0 | 1 | 11,224 | 0.20 | 0.40 | 0 | 1 | 29,878 | 0.09 | 0.28 | 0 | 1 |


Table A3. Descriptive Statistics

Column groups: A = Full Sample Semaforo Escuela matched with Wasichay; B = Full Sample Semaforo Escuela matched with Wasichay that were included on the SMS experiment; C = Control Group; D = Treatment Group.

| Variable | A: N | A: Mean | A: S.D. | A: Min | A: Max | B: N | B: Mean | B: S.D. | B: Min | B: Max | C: N | C: Mean | C: S.D. | C: Min | C: Max | D: N | D: Mean | D: S.D. | D: Min | D: Max |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Maintenance Program manager characteristics | | | | | | | | | | | | | | | | | | | | |
| % Male | 43,655 | 0.6 | 0.5 | 0 | 1 | 25,400 | 0.7 | 0.5 | 0 | 1 | 6,975 | 0.7 | 0.5 | 0 | 1 | 18,425 | 0.7 | 0.5 | 0 | 1 |
| % Hired with temporary contract | 45,827 | 0.0 | 0.2 | 0 | 1 | 26,354 | 0.0 | 0.2 | 0 | 1 | 7,275 | 0.0 | 0.2 | 0 | 1 | 19,079 | 0.0 | 0.2 | 0 | 1 |
| % Hired with permanent contract | 45,827 | 0.9 | 0.3 | 0 | 1 | 26,354 | 0.9 | 0.2 | 0 | 1 | 7,275 | 0.9 | 0.3 | 0 | 1 | 19,079 | 0.9 | 0.2 | 0 | 1 |
| % Hired with other type of contracts | 45,827 | 0.0 | 0.1 | 0 | 1 | 26,354 | 0.0 | 0.1 | 0 | 1 | 7,275 | 0.0 | 0.1 | 0 | 1 | 19,079 | 0.0 | 0.1 | 0 | 1 |
| Amount assigned to maintenance (Nuevos Soles) | 45,827 | 15,844 | 10,055 | 0 | 30,000 | 26,354 | 16,049 | 9,891 | 0 | 30,000 | 7,275 | 15,942 | 9,962 | 3,507 | 30,000 | 19,079 | 16,089 | 9,864 | 0 | 30,000 |
| School Characteristics from Semaforo Escuela | | | | | | | | | | | | | | | | | | | | |
| % where surveyed report being the school principal | 45,827 | 0.9 | 0.3 | 0 | 1 | 26,354 | 0.9 | 0.3 | 0 | 1 | 7,275 | 0.9 | 0.4 | 0 | 1 | 19,079 | 0.9 | 0.3 | 0 | 1 |
| % that report that students speak native languages | 45,755 | 0.1 | 0.3 | 0 | 1 | 26,317 | 0.1 | 0.3 | 0 | 1 | 7,267 | 0.1 | 0.3 | 0 | 1 | 19,050 | 0.1 | 0.3 | 0 | 1 |
| % that report that students speak Spanish only | 45,755 | 0.9 | 0.3 | 0 | 1 | 26,317 | 0.9 | 0.3 | 0 | 1 | 7,267 | 0.9 | 0.3 | 0 | 1 | 19,050 | 0.9 | 0.3 | 0 | 1 |
| % multigrade | 45,827 | 0.1 | 0.3 | 0 | 1 | 26,354 | 0.1 | 0.3 | 0 | 1 | 7,275 | 0.1 | 0.3 | 0 | 1 | 19,079 | 0.1 | 0.3 | 0 | 1 |
| % that have only one teacher for all the grades on the school | 45,680 | 0.2 | 0.4 | 0 | 1 | 26,274 | 0.2 | 0.4 | 0 | 1 | 7,256 | 0.2 | 0.4 | 0 | 1 | 19,018 | 0.2 | 0.4 | 0 | 1 |
| % that have at least a teacher for each grade (regular school) | 45,680 | 0.7 | 0.5 | 0 | 1 | 26,274 | 0.7 | 0.5 | 0 | 1 | 7,256 | 0.7 | 0.5 | 0 | 1 | 19,018 | 0.7 | 0.5 | 0 | 1 |
| % that are only for girls | 45,680 | 0.1 | 0.3 | 0 | 1 | 26,274 | 0.1 | 0.3 | 0 | 1 | 7,256 | 0.1 | 0.3 | 0 | 1 | 19,018 | 0.1 | 0.3 | 0 | 1 |
| % that are only for boys | 45,808 | 0.0 | 0.1 | 0 | 1 | 26,344 | 0.0 | 0.1 | 0 | 1 | 7,274 | 0.0 | 0.1 | 0 | 1 | 19,070 | 0.0 | 0.1 | 0 | 1 |
| % that allows both girls and boys | 45,808 | 1.0 | 0.1 | 0 | 1 | 26,344 | 1.0 | 0.1 | 0 | 1 | 7,274 | 1.0 | 0.1 | 0 | 1 | 19,070 | 1.0 | 0.1 | 0 | 1 |
| % managed with some degree of collaboration with the private sector | 45,827 | 0.0 | 0.2 | 0 | 1 | 26,354 | 0.0 | 0.1 | 0 | 1 | 7,275 | 0.0 | 0.2 | 0 | 1 | 19,079 | 0.0 | 0.1 | 0 | 1 |
| % managed solely by the public sector | 45,827 | 1.0 | 0.2 | 0 | 1 | 26,354 | 1.0 | 0.1 | 0 | 1 | 7,275 | 1.0 | 0.2 | 0 | 1 | 19,079 | 1.0 | 0.1 | 0 | 1 |
| % on geographic region Coast | 45,818 | 0.3 | 0.5 | 0 | 1 | 26,353 | 0.3 | 0.5 | 0 | 1 | 7,275 | 0.3 | 0.5 | 0 | 1 | 19,078 | 0.3 | 0.5 | 0 | 1 |
| % on geographic region Highlands | 45,818 | 0.2 | 0.4 | 0 | 1 | 26,353 | 0.2 | 0.4 | 0 | 1 | 7,275 | 0.2 | 0.4 | 0 | 1 | 19,078 | 0.2 | 0.4 | 0 | 1 |
| % on geographic region Jungle | 45,818 | 0.5 | 0.5 | 0 | 1 | 26,353 | 0.5 | 0.5 | 0 | 1 | 7,275 | 0.5 | 0.5 | 0 | 1 | 19,078 | 0.5 | 0.5 | 0 | 1 |
| % on rural areas | 45,827 | 0.5 | 0.5 | 0 | 1 | 26,354 | 0.5 | 0.5 | 0 | 1 | 7,275 | 0.5 | 0.5 | 0 | 1 | 19,079 | 0.5 | 0.5 | 0 | 1 |
| % of initial educational level | 45,827 | 0.1 | 0.3 | 0 | 1 | 26,354 | 0.1 | 0.3 | 0 | 1 | 7,275 | 0.1 | 0.3 | 0 | 1 | 19,079 | 0.1 | 0.3 | 0 | 1 |
| % of primary educational level | 45,827 | 0.6 | 0.5 | 0 | 1 | 26,354 | 0.6 | 0.5 | 0 | 1 | 7,275 | 0.6 | 0.5 | 0 | 1 | 19,079 | 0.6 | 0.5 | 0 | 1 |
| % of secondary educational level | 45,827 | 0.3 | 0.5 | 0 | 1 | 26,354 | 0.3 | 0.5 | 0 | 1 | 7,275 | 0.3 | 0.5 | 0 | 1 | 19,079 | 0.3 | 0.5 | 0 | 1 |
| Size of the School | | | | | | | | | | | | | | | | | | | | |
| Number of students | 45,827 | 186.6 | 242.5 | 0 | 2,932 | 26,354 | 186.3 | 239.6 | 1 | 2,932 | 7,275 | 182.4 | 228.2 | 1 | 2,285 | 19,079 | 187.9 | 243.8 | 1 | 2,932 |
| Number of teachers | 45,827 | 10.6 | 12.3 | 0 | 170 | 26,354 | 10.7 | 12.1 | 1 | 170 | 7,275 | 10.6 | 12.0 | 1 | 120 | 19,079 | 10.7 | 12.2 | 1 | 170 |
| Number of sections | 45,827 | 8.9 | 7.5 | 0 | 96 | 26,354 | 8.9 | 7.5 | 1 | 96 | 7,275 | 8.8 | 7.3 | 1 | 70 | 19,079 | 8.9 | 7.6 | 1 | 96 |
| Number of classrooms | 45,827 | 12.1 | 9.1 | 0 | 86 | 26,354 | 12 | 9 | 0 | 83 | 7,275 | 12 | 9 | 0 | 62 | 19,079 | 12.1 | 8.6 | 0 | 83 |
| Important Educational or Social Programs | | | | | | | | | | | | | | | | | | | | |
| Percentage of schools beneficiary of Educational Program "Jornada Escolar Completa" | 45,827 | 0.0 | 0.2 | 0 | 1 | 26,354 | 0.1 | 0.2 | 0 | 1 | 7,275 | 0.1 | 0.2 | 0 | 1 | 19,079 | 0.0 | 0.2 | 0 | 1 |
| % of schools on VRAEM region | 45,827 | 0.0 | 0.2 | 0 | 1 | 26,354 | 0.0 | 0.2 | 0 | 1 | 7,275 | 0.0 | 0.2 | 0 | 1 | 19,079 | 0.0 | 0.2 | 0 | 1 |
| % of schools on the frontier | 45,827 | 0.0 | 0.2 | 0 | 1 | 26,354 | 0.0 | 0.2 | 0 | 1 | 7,275 | 0.0 | 0.2 | 0 | 1 | 19,079 | 0.0 | 0.2 | 0 | 1 |
| % of schools beneficiary of Educational Program "Soporte Docente" | 45,827 | 0.1 | 0.3 | 0 | 1 | 26,354 | 0.1 | 0.3 | 0 | 1 | 7,275 | 0 | 0 | 0 | 1 | 19,079 | 0.1 | 0.3 | 0 | 1 |


Graph A1.A. Evolution on the % of maintenance managers who submitted their Expenditure Planning on Wasichay system during year 2015 for the SMS experiment sample and excluded groups

[Graph A1.A plot. Vertical axis: Evolution of maintenance managers who delivered their Expenditure Planning (0 to 100). Horizontal axis: Week (April 6 to Nov 26). Series: Excluded - No cellphone; Excluded - Already completed ED; SMS Experiment sample.]

Graph A1.B. Evolution on the % of maintenance managers who submitted their Expenditure Declaration on Wasichay system during year 2015 for the SMS experiment sample and excluded groups

[Graph A1.B plot. Vertical axis: Evolution of responsibles who delivered their Expenditure Declaration (0 to 100). Horizontal axis: Week (April 6 to Nov 26). Series: Excluded - No cellphone; Excluded - Already complete ED; SMS experiment sample.]
