What are the limitations of survey data versus
administrative data in impact evaluation?
The case of an SMS campaign in Peru
Preliminary Version 13/06/2016 (do not cite or reproduce)
César Huaroto (Universidad Nacional de la Plata, Argentina)1
Andrea Cornejo (Columbia University)
Luis Baiocchi (Pontifical Catholic University of Peru)2
Abstract
This paper studies the effect that measurement error of survey data has on the precision
of results of a rigorous impact evaluation, also known as attenuation bias. This case study
compares the results of an RCT that evaluates the impact of an SMS campaign to improve
maintenance budget expenditure, when the outcome variable is measured using
administrative data and survey self-reported data.
Four important conditions are met that allow this study to take place: 1) a completed
randomized control trial (RCT), 2) administrative and survey data that measure the same
indicator, 3) a treatment impact different from zero, and 4) comparable timing for both
sources of data.
The comparison between self-reported and administrative data for the outcome variable
for the same schools reveals over-reporting of the outcome variable on the former, where
an activity is reported as being completed when in fact it has not been, according to the
administrative data. This attenuation bias leads to an underestimation of the impact of the
intervention tested by the RCT, an SMS campaign.
These results are consistent with those of previous studies on the subject, and suggest
that impact evaluation results can be negatively affected by attenuation bias. Furthermore,
they indicate that the reason behind this is not related merely to sampling bias, but also to
the quality of the self-reported data collected using surveys. In a context where the vast
majority of impact evaluations rely on primary data collection, these results highlight the
benefits of using administrative data for impact evaluation analysis.
1 Corresponding author: [email protected].
2 The authors are part of a new initiative within the Ministry of Education of Perú (MINEDU) called MineduLab, a laboratory dedicated to performing impact evaluations of policy innovations using Randomized Control Trials (RCTs) and relying mainly on administrative data. We would like to thank the Principal Investigators of the original RCT impact evaluation of the PRONIED SMS campaign, Juan Manuel Hernandez-Agramonte, Stanislao Maldonado, and Andrew Dustan, for sharing the details of the RCT and for their feedback on this document. We also thank all the different actors within MINEDU involved in different stages of the experiment who shared their data and knowledge about how data was generated (both for the Wasichay system and the Semaforo Escuela survey) and how the PRONIED maintenance program worked. The opinions presented in this paper are entirely those of the authors, and are not endorsed by MINEDU.
1. INTRODUCTION
An important limitation of self-reported data is that researchers cannot identify or
control for all possible sources of measurement error that can be introduced during
field work that can affect precision and, therefore, accuracy of the impact
evaluation results (see Stecklov and Weinreb, 2010). Moreover, research tends to
assume that this error is distributed randomly between treatment and control
groups or that this issue is easily solved only by increasing the number of
observations (Millimet, 2010, Bertrand and Mullainathan, 2001).
Nevertheless, in the past decade rigorous impact evaluations have come to rely
heavily on primary data collection, which often overlooks non-sampling sources of
error (e.g., respondent bias, cognitive errors). During this time, collecting and
analyzing primary data has become common practice for impact evaluations in
development studies across many developing countries (McKenzie, 2012, Duflo,
Kremer & Glennerster, 2007).
This case study allows the comparison between the use of self-reported3 and
administrative data to calculate the outcome variable for the same schools. The
analysis reveals that self-reported surveys tend to over-report compliance of the
outcome variable. In other words, we see that an activity is reported as being
completed when in fact it has not been. Consequently, we observe an
underestimation of the impact of the SMS campaign.
The results presented in this paper are consistent with those of previous studies on
the subject, and suggest that impact evaluation results could be negatively
affected by attenuation bias. Furthermore, they indicate that the reason behind this
is not merely related to sampling bias, but also to the quality of the self-reported
data collected using surveys.
The organization of this paper is as follows: Section 2 will first review contributions
of recent literature to the discussion of measurement error and attenuation bias,
identifying key pitfalls of self-reported data for estimating effects; Section 3 will
describe the case study, the administrative and self-reported databases used for
analysis, and the original design of the RCT; Section 4 details the methodology of
the comparison analysis of measurement error; Section 5 summarizes the findings
and consolidates key conclusions.
3 For purposes of this paper, we consider the terms survey data and self-reported as interchangeable concepts
2. MOTIVATION
There is a scant but growing body of literature that has begun to explore the effects
of measurement error on impact evaluation analysis (Taubman et al. 2014, Millimet,
2010, Zwane et al. 2011). For instance, in 2001 Bertrand and Mullainathan
discussed the potential of self-reported results to introduce measurement error that
correlates with important characteristics and behaviors, a concern that has become
more prominent over the past few years.
Measurement error itself can be broken down into several types and can be found
at all levels of the data collection process. In 2010, Stecklov and Weinreb
performed an extensive review of the major sources of measurement error beyond
sampling and coverage error, including error due to respondents and interviewers,
comparability effects and post-survey errors.
These sources of attenuation bias deeply affect the precision of impact evaluation
results in the context of public policy, as demonstrated by Barrera-Osorio et al.
(2011) and Baird and Ozler (2011), who identify an overstatement of school
attendance rates. More recently, Taubman et al. (2014) also find that there are
statistically significant discrepancies for the impact of Medicaid coverage on
emergency care usage, noting that using administrative data for RCTs, as
opposed to traditionally collected self-reported data, can be crucial for the precision
of impacts4.
However, there have been few case studies on the effect that measurement error
of survey data has on the precision of results of a rigorous impact evaluation
(Taubman et al. 2014, Beegle et al. 2010, Millimet, 2010). This is largely because
performing this study requires that at least four important conditions are met: 1) A
completed randomized control trial (the gold standard of impact evaluations), 2)
Administrative and survey data available for the same indicator, 3) Impacts of the
RCT that are different from zero, 4) Comparable timing for both sources of data.
This study meets all of these requirements, allowing us to compare the main
outcomes when both data sources are used.
In large part, the motivation behind this paper was to elucidate the advantages of
using administrative data, most of which are associated with a reduction in risk of
measurement error in primary data collection in several ways:
4 Self-reported data collection is usually conducted by an enumerator or surveyor (Baird and Özler, 2010, Barrera-Osorio et al. 2011). In the academic social sciences, it is performed internally - through a team of surveyors recruited specifically for the evaluation - or by subcontracting a local data collection firm. Surveyors receive training prior to initiating data collection, accompanied by systematic monitoring and supervision throughout the process to assure quality data collection. These methods aim to secure high quality data collection by following a series of good practices - however, they do not preclude the presence of attenuation bias.
a) Systematic: Administrative databases are systematic by nature, which
increases the likelihood that procedural issues of data collection have been
identified and resolved. Data collected digitally can also help avoid post-survey
errors that are associated with manual data entry.
b) Non-personal: Databases that electronically register and upload indicators over
time do not require an interviewed person to self-report results to a surveyor,
reducing both respondent and interviewer bias; additionally, recurring visits by
surveyors may also alter the behavior being measured and risk underestimating
effects.
c) Periodical: A continuous collection of the same indicator over a prolonged
period of time permits a time-sensitive analysis, which in turn allows for better
identification of several sources of attenuation bias, such as recall bias (Das et
al, 2011).
These comparative benefits to utilizing administrative data to calculate and validate
effects of rigorous impact evaluations are important considerations that can
improve precision of results that shape policy design (Feeney et al. 2015).
Furthermore, as it becomes clear that administrative data is both less costly and
more reliable to evaluate impact, more and more countries are strengthening
systematic and reliable data systems (Meyer and Mittag, 2015, Feeney et al. 2015,
Finkelstein and Taubman, 2015).
In parallel, institutions dedicated to generating evidence to shape policy, such as the
Behavioral Insights Team and the Global Insights Initiative at the World Bank, have
grown in number and visibility. These institutions are pushing the boundaries of
traditional rigorous impact evaluations to focus on low-cost and cost-effective
evaluations that center on behavioral economics and administrative
data to measure impacts.
Several governments have followed suit, including the Social and Behavioral Sciences
Team at the White House, as well as emerging nation initiatives such as
MineduLab, the Peruvian Ministry of Education initiative to generate evidence on
cost-effective education policy innovations. These initiatives seem to have
unlocked critical advantages of using administrative data beyond a reduction in
cost that translates into a larger sample size and robust results.
3. THE CASE STUDY SETTING
a. Antecedents
This paper takes advantage of a unique case study that allows a comparison
between an analysis based on administrative data and one based on primary data
collection, for exactly the same indicators over the same window of time. The case
study calculates the primary outcome indicator using administrative data as well as
self-reported survey data, in the context of an RCT-designed impact evaluation of
an SMS campaign. The SMS campaign consisted of reminder messages that were
sent to almost 30,000 school infrastructure maintenance managers5 (one per school)
in Peru to improve the infrastructure budget management indicators.
The administrative data system, called Wasichay, allows each school infrastructure
maintenance manager, hereafter simply called maintenance manager, to
electronically submit two required management documents: 1) An infrastructure
budget expenditure planning sheet, submitted early in the school year that details
how resources will be spent; 2) An expenditure declaration sheet, submitted near
the end of the school year that details how much and how assigned resources
were spent.
The second source of data is a large scale monitoring system launched by the
Ministry of Education of Peru, hereafter called Semáforo Escuela (SE). Each
month, the SE program sends a team of over 300 surveyors to visit a nationally
representative sample of approximately four thousand different schools. The
monitoring system gathers information on many indicators, a few of which address
administrative operations of the maintenance program.
The SE monitoring system has all the common features of normal surveys: they
are unannounced, guided by the interviewer, correspond to self-reported
information from school principals, and represent a very expensive sample design
(a large sample for which indicators are collected monthly throughout the school
year). In other words, it represents a much more effort-intensive counterfactual of
the type of primary data that would be used to measure the impact of the treatment
had the administrative database of Wasichay not been available.
The results tell an interesting story about the downfalls of using self-reported data
in contrast to administrative data. Comparing outcomes for the same schools, we
find that maintenance managers tend to over-report the timeliness with which they
5 The Peruvian National Program for Educational Infrastructure Maintenance designates one faculty member per public school to serve as the school infrastructure maintenance manager. Typically, he or she tends to be the school principal, though often it may also be a school teacher.
report their expenditure activities. There is also a non-trivial percentage of
maintenance managers that report not completing the activity when in fact they
have done so. More importantly, the findings of this study reveal that while there
are no statistically significant results of the SMS campaign when using the self-
reported SE data, the same analysis using administrative data from the Wasichay
database yields statistically significant results. These findings illustrate the costs of
attenuation bias in the case of a public policy program.
b. The Maintenance Program and Wasichay system
The Maintenance Program of the National Program for Educational Infrastructure
(PRONIED) is an important intervention aimed at closing the gap in educational
infrastructure of public schools in Peru. PRONIED's Maintenance Program
transferred S/ 571 million (US$ 197 million) in 2014 and S/ 350 million (US$ 100
million) in 2015 to more than 50,000 schools around the country.6 However, of these
assigned funds, only about 90% is withdrawn from official accounts and about 85%
of the withdrawn budget is submitted in a timely manner.7
The PRONIED Maintenance program works directly with the maintenance
manager of each school8, who is responsible for the appropriate spending of the
maintenance budget allocated to his/her school, within the
timelines established by the PRONIED program. The manager is responsible for
organizing oversight committees, for planning the maintenance related activities,
withdrawing and making use of the allocated maintenance funds, and finally
declaring all the expenditures to PRONIED. A brief description of activities involved
in the Maintenance Program can be seen in Figure 1.
To manage and track the information regarding withdrawals and declaration of
expenditures, the PRONIED program manages a large database called Wasichay.
This database was designed so that maintenance managers can log in to their
account when connected to the internet, and update their progress regarding the
planning of maintenance activities throughout the school year as well as the
withdrawal, spending and declaration of the maintenance funds assigned to the
school. The Wasichay database therefore is a live database, with central servers at
the Ministry of Education in Lima, Peru.
6 Infrastructure maintenance funds are transferred to each school manager; managers are responsible for planning, executing and declaring expenses. Funds are monitored through the Wasichay database.
7 The activities allowed under this budget are: 1) Roof repairs, 2) Repair of floors, 3) Sanitary bathroom repairs, 4) Repair of walls, 5) Repair of doors, 6) Repair of windows, 7) Repair of electrical installations, 8) Repair of school furniture, 9) Replacement of school furniture, 10) Painting of walls, 11) School supplies and equipment, and materials for educational use.
8 Each public school in the education system has one official school maintenance manager who is in charge of receiving, appropriately executing and declaring the expenditures for these funds.
Figure 1. Timeframe for PRONIED Maintenance Program9
PRONIED PREPARATION
- Selection of schools
- Selection of school maintenance managers
- Budget allocation
- Preparation of expenditure guidelines (items and deadlines)
MAINTENANCE MANAGER PREPARATION
- Forming the school maintenance committee
- Forming the oversight committee
- Submission of Expenditure Planning Sheet
EDUCATIONAL DISTRICT (UGEL) AND PRONIED VALIDATION
- Approval of Expenditure Planning Sheet by PRONIED program specialists
MAINTENANCE MANAGER BUDGET EXECUTION
- Withdrawal of funds assigned to the school
- Expenditure of funds assigned to the school
MAINTENANCE MANAGER EXPENDITURE DECLARATION
- Submission of Expenditure Declaration Sheet
- Re-depositing of funds that weren't used
- Registering final oversight report
EDUCATIONAL DISTRICT (UGEL) VALIDATION
- Approval of Expenditure Declarations
PRONIED FINAL VALIDATION
- Approval of Expenditure Declarations
- Revision of total funds assigned to the Maintenance Program
Stages shown in the figure: Stage 1: Preparation; Stage 2: Execution by maintenance manager; Stage 3: Evaluation
9 UGELs are the Local Units of Education Management, the lowest level of autonomous intermediary agencies that receive central government education funding for distribution and spending. UGELs on average have 200 schools under their jurisdiction throughout 1-5 districts. In total there are 223 UGELs in the country.
c. Semáforo Escuela
The Ministry of Education has created Semáforo Escuela (SE), or School traffic
light in its English translation, a management tool designed to improve the
management of education resources at the decentralized levels of the education
sector, such as the regional and district-level intermediary agencies by increasing
the quality of information used to generate public policies10.
SE aims to reach this objective by generating trustworthy and prioritized monitoring
indicators on the continuous progress of the educational services provided to
schools with a representative sample at the UGEL level11 and with a monthly
frequency. Table 1 presents the sample of schools visited during the year.
Table 1. Quantity of surveys collected by Semáforo Escuela by month
Month N %
March 243 1%
April 6,306 13%
May 6,309 13%
June 5,493 12%
July 6,314 13%
August 5,148 11%
September 5,758 12%
October 5,631 12%
November 6,260 13%
Total 47,462 100%
A total of 27,385 public schools across the country were visited by SE
monitors at least once a year, for a total of 47,462 collected surveys12. The SE data
collection obtains detailed information about the quality of education by
administering a survey to each school principal and also collecting non self-
reported data through monitoring at the school premises.
The information that SE collects is used to generate monthly reports on the status
of these indicators and is shared with the intermediary agencies so that they have
10 The SE model is one of many initiatives that were designed based on the Prime Minister's Delivery Unit system implemented in the UK, launched in 2001 to improve monitoring of public service targets. In 2015, the Pakistani government also followed suit and launched the Prime Minister's Delivery Unit.
11 Peru has a decentralized educational system and there are 223 UGELs, Local Units of Education Management, the lowest level of autonomous intermediary agencies that receive central government education funding for distribution and spending. UGELs on average have 200 schools under their jurisdiction throughout 1-5 districts.
12 Some schools were visited more than once because of their size or because UGELs had too few schools in total. In Table A.1 we present the number of visits to each school by SE. Nearly half of the sample was visited only once.
the capacity to identify shortcomings and present proposals to improve programs
and/or projects.
Semáforo Escuela collects information on:
a) The internal management of the school.
b) Minimum inputs for the development of educational services.
c) Attendance of the school principal, teachers and students.
Procedures and Data Flow:
- The surveyors have a tablet with Internet access where they record
information from visited schools.
- The information is processed in real time in the central database of the
Ministry of Education.
- The information obtained during each visit is used to generate monthly
reports and sent to all Regional Education and UGELs in the country.
One of the many variables collected by SE is the adequate fulfilment of the
administrative processes of the Maintenance Program of PRONIED. In this
particular case, each surveyor asks school maintenance managers to report which
processes of the program (listed in Figure 1) have been completed to date. At the
moment of the survey, the SE surveyor has no way to verify these self-reported
responses. Additionally, there are no tangible consequences for the school
manager should he or she lie, a common feature for primary collected and self-
reported survey data.
d. Explaining the SMS PRONIED experiment
The PRONIED program, complete with a database of over 30,000 cell phone
numbers for school maintenance managers nationwide as well as the smoothly-
operating Wasichay database system, is an ideal implementing office for rigorously
testing effects of innovations to policy. As such, PRONIED agreed to cooperate
with the impact evaluation of its SMS campaign to maintenance managers.
In order to do so, PRONIED allowed the design of the SMS campaign to include a
pure control group as well as five treatment groups, each delivering a different
message. Considering the evidence produced by similar interventions (see: Karlan,
2011, 2012, Fink et al 2014, Chong et al, 2013, Castleman and Lindsay, 2014), the
treatment consisted of five rounds of SMS, the first two sent biweekly
and the last three weekly. The last SMS was delivered the week before the
submission date for the declaration of expenditures. All SMS were personalized with
the maintenance manager's first name.
In 2015, the PRONIED Maintenance program distributed maintenance funds for
59,700 schools nationwide in Peru. All of these schools were assigned a budget
and a school maintenance manager. The sample of schools considered for the
original RCT evaluation of the SMS Campaign excludes a fraction of these schools
from the experiment for two reasons. First, for 18,598 of these maintenance
managers, there was no registered cellphone. Additionally, 11,224 schools that
had already completed their expenditure declaration at the beginning of the
experiment were also excluded. Therefore, the final sample of the RCT experiment
to evaluate the impact of the SMS campaign consisted of 29,878 schools, nearly
half of the total coverage of the program.
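The sample restriction described above can be checked with simple arithmetic. The sketch below only restates the counts reported in this paragraph; the variable names are ours and purely illustrative.

```python
# Counts as reported in the text (variable names are illustrative, not from the RCT files).
total_schools = 59_700       # schools that received maintenance funds in 2015
no_cellphone = 18_598        # maintenance managers with no registered cellphone
already_declared = 11_224    # expenditure declaration completed before the experiment

# Schools remaining after both exclusions.
experiment_sample = total_schools - no_cellphone - already_declared
share_of_program = experiment_sample / total_schools

print(experiment_sample)            # 29878
print(round(share_of_program, 3))   # 0.5
```

The result matches the 29,878 schools reported for the experiment, about half of the program's coverage.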
In Table A.2 we present the means of these three groups in order to understand
the distinction between the final experiment sample and the group of schools
excluded from the experiment for the aforementioned reasons. Schools excluded
due to unavailable cellphone numbers for maintenance managers have a lower
level of compliance for maintenance activities than the experiment sample. Schools
excluded because expenditure declarations were complete prior to the experiment
start have higher levels of compliance.13
In Graphs A.1A and A.1B, in the Annex, we can see how these groups compare in
their fulfilment of the two most important program activities: submission of the
expenditure planning sheet and submission of the expenditure declaration. In the
graph, the two vertical red dashed lines identify the beginning and the end of the
experiment. For submission of the expenditure planning sheet, the average
compliance rate for the RCT experiment sample is between that of the two
excluded groups. In the case of the submission of expenditure declaration, the
experiment sample had 0% before the start of the experiment and the sample
excluded due to early completion of expenditure declaration had 100% compliance
rate during the entire duration of the experiment.
The SMS message itself has an informational component (e.g. the deadline for
performing an activity) and content applying principles of behavioral economics. In
order to evaluate the impact of the SMS messages, all maintenance
managers in the final sample were randomly assigned to six different groups:
- Group 1: Alert
- Group 2: Social Norm
- Group 3: Monitoring with detailed information
- Group 4: Notice of public reporting of non-compliant schools
- Group 5: Reminder of potential audit visits
- Control: Did not receive SMS.
13 This could be because managers without a cellphone may be poorer or may live or work in rural areas where there is lower access to mobile technology. In contrast, the main reason for the difference in the second group might be that these managers are likely to be more motivated.
We briefly summarize the experiment sample and design in Figure 2.
Figure 2. Brief description of PRONIED SMS Experiment
e. Description of the Experiment Sample:
For our analysis, we combine the survey information collected by Semaforo
Escuela with the administrative data of the maintenance program collected in the
Wasichay system. Both datasets contain the same indicator: compliance with
the submission of the expenditure planning sheet, a document that maintenance
managers are required to submit detailing the budget planning for maintenance
activities in the school.
We matched schools across the Wasichay and Semáforo Escuela databases: the
Wasichay system has information for 26,544 of the 27,385 schools visited at
least once by SE (about 97% of the total sample). The 26,544 schools matched in
both databases represent about 45% of all schools participating in the
PRONIED maintenance program.
Of the schools included in the maintenance program and present in the Wasichay
database, about half were excluded from the experiment for the reasons already
described in sub-section d, which details the original RCT experiment. As a result,
of the 26,544 matched schools, only 14,931 belonged to the SMS
experiment sample. We briefly summarize our sample of analysis in Figure 3.
Figure 3. Description of sample of analysis
Total schools participating in the national maintenance program (N=59,704)
- Included in Semaforo Escuela sample: N=26,544 (45% of total)
  - Included in SMS PRONIED campaign: N=14,931 (55% of sample in Semaforo and Wasichay)
    - Treated (T): N=10,803, 5 arms of treatment
    - Control (C): N=4,128
  - Not included in SMS PRONIED campaign: N=11,613 (45% of sample in Semaforo and Wasichay)
- Not included in Semaforo Escuela sample: N=33,160 (55% of total)
In Table A3, we present descriptive school-level characteristics for our analysis
sample, including: average number of students, average number of classrooms,
gender of maintenance manager, geographic location of school (rural/urban),
school educational level (primary or secondary), and native language spoken at
school, among others. We can see there are few differences between the full sample
matched between Semaforo and Wasichay (1) and the RCT subset (2).14
We compare the measurement for this indicator in Wasichay (our objective
benchmark of the accurate information) on the same day when Semáforo Escuela
visited the school and collected the data. In Table 2 we can see how this indicator
varies depending on the source of the data:15
Table 2. Comparison between Wasichay and Semaforo Escuela answers on "Registration of
Technical Sheet"
(Rows: Wasichay System; columns: Semaforo Escuela Survey)
Panel A: Complete match between Semaforo Escuela and Wasichay
Frequencies
Wasichay   No       Yes      Missing   Doesn't know   Total
No         11,144   4,078    189       176            15,587
Yes        2,319    16,381   150       94             18,944
Total      13,463   20,459   339       270            34,531
Percentages
Wasichay   No       Yes      Missing   Doesn't know   Total
No         32.27    11.81    0.55      0.51           45.14
Yes        6.72     47.44    0.43      0.27           54.86
Total      38.99    59.25    0.98      0.78           100.00
Panel B: Only matched sample included on the SMS experiment
Frequencies
Wasichay   No       Yes      Missing   Doesn't know   Total
No         6,342    2,338    102       92             8,874
Yes        1,397    9,494    89        49             11,029
Total      7,739    11,832   191       141            19,903
Percentages
Wasichay   No       Yes      Missing   Doesn't know   Total
No         31.86    11.75    0.51      0.46           44.59
Yes        7.02     47.70    0.45      0.25           55.41
Total      38.88    59.45    0.96      0.71           100.00
For this comparison, our sample includes all visits that SE performed for each
school. This maximizes the number of observations in the analysis, as schools were
often visited more than once in the period. Additionally, as will be
explained in the following section, our econometric model treats them as different
observations.
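As a reading aid, the misreporting rates implied by Panel A of Table 2 can be computed directly from its frequencies. This is a minimal sketch rather than the authors' procedure: it keeps the "Missing" and "Doesn't know" answers in the denominators, which is our assumption, and all names are illustrative.

```python
# Panel A of Table 2: outer keys are the Wasichay status, inner keys the Semaforo Escuela answer.
panel_a = {
    "No":  {"No": 11_144, "Yes": 4_078,  "Missing": 189, "Doesn't know": 176},
    "Yes": {"No": 2_319,  "Yes": 16_381, "Missing": 150, "Doesn't know": 94},
}

def misreport_rates(table):
    """Return (over-report rate, under-report rate, overall agreement rate)."""
    row_no, row_yes = table["No"], table["Yes"]
    n_no, n_yes = sum(row_no.values()), sum(row_yes.values())
    over = row_no["Yes"] / n_no      # survey says Yes while Wasichay says No
    under = row_yes["No"] / n_yes    # survey says No while Wasichay says Yes
    agree = (row_no["No"] + row_yes["Yes"]) / (n_no + n_yes)
    return over, under, agree

over, under, agree = misreport_rates(panel_a)
print(f"over-report: {over:.1%}, under-report: {under:.1%}, agreement: {agree:.1%}")
```

Under these assumptions, roughly a quarter of the visits with no sheet registered in Wasichay are reported as completed in the survey, about one in eight visits with a registered sheet are reported as not completed, and the two sources agree in about 80% of visits.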
14 These differences can be misleading since, as seen in sub-section 3.d, two groups that appear to be statistically different were excluded from the experiment; however, since average compliance was higher for one excluded group and lower for the other, the mean of the experiment sample appears similar to that of the original sample although it is not.
15 This comparison is robust to changes in the specification of the variable in Wasichay. For example, we create an additional indicator of compliance with the submission of the expenditure planning sheet within a two-week window around the visit of Semaforo Escuela. In other words, we considered that a school had similar answers if the answers were similar in both data sources and if the Wasichay data was within a two-week window around the visit of SE.
4. METHODOLOGY
We estimate the effect of the SMS campaign treatment using a Difference-in-Difference
(DD) estimation. In other words, we compare the change in average compliance with
the submission of the expenditure planning sheet after the SMS campaign started
in treated schools with the corresponding change in control schools.
Equation 1: $y_{it} = \beta_0 + \beta_1 SMS_i + \beta_2 Post_t + \beta_{DD} (SMS_i \times Post_t) + \beta_X X_{it} + \varepsilon_{it}$
Where i denotes a school and t the day when the surveyor visited the school.
SMS is an indicator variable for whether the school was selected to the SMS
campaign. Post is a binary variable that takes the value of zero before the SMS
campaign launch and one after the campaign launch. The interaction between
SMS and Post is an indicator of the schools selected for SMS campaign after its
beginning. The coefficient βDD is the main coefficient of interest, and gives the
average difference of the difference before and after the SMS campaign start
between treatment and control groups.
We denote Xit as the set of covariates that are correlated with the outcome or with
the treatment selection, school district fixed effects, and week fixed effects. The
complete list of control variables and the main descriptive statistics can be found in
Table A3.
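For intuition, when the covariates Xit and the fixed effects are left out, the coefficient βDD in Equation 1 equals a difference of four cell means: the before/after change for treated schools minus the same change for control schools. The sketch below illustrates this on made-up toy data; the function name, the data and the compliance rates are hypothetical and do not come from the experiment.

```python
from statistics import mean

def dd_estimate(rows):
    """Difference-in-differences from (sms, post, y) rows, with sms and post in {0, 1}."""
    cells = {(s, p): [] for s in (0, 1) for p in (0, 1)}
    for sms, post, y in rows:
        cells[(sms, post)].append(y)
    m = {key: mean(values) for key, values in cells.items()}
    change_treated = m[(1, 1)] - m[(1, 0)]   # after minus before, SMS group
    change_control = m[(0, 1)] - m[(0, 0)]   # after minus before, control group
    return change_treated - change_control

# Toy data (hypothetical): y = 1 if the planning sheet is registered at the visit.
toy = (
    [(0, 0, 0)] * 6 + [(0, 0, 1)] * 4      # control, before: 40% compliant
    + [(0, 1, 0)] * 4 + [(0, 1, 1)] * 6    # control, after: 60% compliant
    + [(1, 0, 0)] * 6 + [(1, 0, 1)] * 4    # treated, before: 40% compliant
    + [(1, 1, 0)] * 3 + [(1, 1, 1)] * 7    # treated, after: 70% compliant
)
print(round(dd_estimate(toy), 2))  # 0.1
```

In the toy data both groups improve over time, and the estimate keeps only the extra 10 percentage points gained by the treated group. The regression form used in the paper additionally adjusts for covariates and fixed effects.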
We consider this approach to be the most appropriate since Semáforo Escuela is
an unbalanced data panel for which we have information from schools during the
entire year. This method allows a more efficient use of the data since it includes
the observations prior to the beginning of the campaign.16
We define the survey data sample as the set of schools for which there is data in
the Wasichay administrative database and for which SE collected data at least
once throughout the 2015 school year. Using the survey data sample, we perform
a multivariate regression analysis comparing the results on the average effect for
treatment and control group schools using four models of data sets.
The four models of data sets are:
i) Full information: Uses the data of the entire sample of 29,878 schools
represented in the Wasichay database included in the experiment.
ii) Full information for survey data sample: For the 14,931 schools that have
been surveyed at least once in Semaforo Escuela and are in the SMS experiment,
we will use Wasichay data.
16 We use a missing indicator to handle the small number of missing observations on covariates.
iii) Corrected information on survey data sample: For each school in the survey
data sample and in the experiment, we will use the Wasichay administrative data
registered for the day that Semaforo visited the school. In other words, we will
simulate what the data would look like had Semaforo Escuela collected the
administrative data of Wasichay.
iv) Self-reported data sample: This model is limited to Semaforo Escuela, and
does not use the administrative data at all. In other words, this scenario reflects the
analysis methods that RCTs traditionally follow.
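The difference between scenarios iii) and iv) can be sketched as follows: for each survey visit, scenario iii) looks up what the administrative record showed on the visit day, while scenario iv) keeps the respondent's answer. The record layout below (one submission date per school, or None if never submitted), the school identifiers, and all values are hypothetical; the paper does not describe how the Wasichay or Semaforo Escuela files are actually organized.

```python
from datetime import date

# Hypothetical administrative record: date the planning sheet was submitted.
admin_submission_date = {"A": date(2015, 8, 20), "B": None}

# Hypothetical survey visits with the self-reported outcome.
survey_visits = [
    {"school": "A", "visit": date(2015, 8, 10), "self_reported": 1},
    {"school": "A", "visit": date(2015, 9, 1), "self_reported": 1},
    {"school": "B", "visit": date(2015, 9, 5), "self_reported": 1},
]


def admin_status_on(school, day):
    """1 if the administrative record shows a submission on or before `day`."""
    submitted = admin_submission_date.get(school)
    return int(submitted is not None and submitted <= day)


# Scenario iii): administrative outcome as of the visit day.
scenario3 = [admin_status_on(v["school"], v["visit"]) for v in survey_visits]
# Scenario iv): self-reported outcome only.
scenario4 = [v["self_reported"] for v in survey_visits]
# Visits where completion is reported but the administrative record disagrees.
over_reports = sum(s4 == 1 and s3 == 0 for s3, s4 in zip(scenario3, scenario4))
print(scenario3, scenario4, over_reports)
```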
We summarize the most important features of the four datasets in Table 3. As can
be seen in this table, moving from Scenarios 1 to 4 signifies a decrease in quantity
of data as well as a decrease in precision of measurement of the indicator. In the
table, we see that for the first Scenario, we have complete information for the total
of schools, for all school days.
For example, when we go from Scenario 1 to Scenario 2, we change from using
the whole universe to using a survey sample (in this case, a large survey sample);
therefore, any change in our estimation results is due only to sampling.
Moving from Scenario 2 to Scenario 3, we change the timing of measurement: as
data for schools is collected on different days, the timeframe becomes more
ambiguous and comparability between treated and control groups is weakened.
Finally, moving from Scenario 3 to Scenario 4, we add the most interesting feature
of survey-collected data, which is that the outcome information obtained is
self-reported. Therefore, a risk of misreported information (errors in the data
collection process, respondent-level bias such as social desirability, etc.) is
introduced, which may produce attenuation bias in the estimation.
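To see why over-reporting can attenuate an estimated effect, consider a minimal sketch in which a fixed share of non-compliers wrongly report compliance and compliers report truthfully. Under the assumption that this share is the same in every group and period, the observed difference shrinks by the factor (1 - over_report). The function names and all numbers below are made up for illustration and are not estimates from the study.

```python
def observed_rate(true_rate, over_report):
    """Reported compliance when a share `over_report` of non-compliers say 'yes'."""
    return true_rate + over_report * (1.0 - true_rate)


def observed_effect(true_control, true_treated, over_report):
    """Treatment-control gap as it would appear in self-reported data."""
    return observed_rate(true_treated, over_report) - observed_rate(true_control, over_report)


true_effect = 0.525 - 0.50  # hypothetical true gap of 2.5 percentage points
seen = observed_effect(0.50, 0.525, over_report=0.40)
print(true_effect, seen)  # the observed gap equals (1 - 0.40) * true_effect
```

If misreporting differed between treatment and control schools, the bias would no longer be a simple proportional shrinkage and could go in either direction.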
Table 3. Description of datasets scenarios

| Scenario | Sample | Time accuracy | Data source | Description |
|---|---|---|---|---|
| 1. Full information | All the sample included on the experiment | All schools have complete data for established timeframes | Admin. Data | We use all schools included in the original RCT experiment using the data from prior to the beginning of the experiment and from the day after the deadline for completing the maintenance activities. |
| 2. Full information for survey data sample | Only a sample (45%) | All schools have complete data for established timeframes | Admin. Data | We use all schools included in the experiment that were at least surveyed once on the Semaforo Survey using the data prior to the beginning of the experiment and from the day after the deadline for completing the maintenance activities. |
| 3. Corrected information on survey data sample | Only a sample (45%) | Data for each school available for different days | Admin. Data | The scenario of a "quality survey". We use the information on Wasichay that would have been collected on the day of the visit of Semaforo, some before the experiment began and some after its start. This means that we are in the survey scenario but only changing the data source for the outcome variable. |
| 4. Self-reported data sample | Only a sample (45%) | Data for each school available for different days | Self-reported | The traditional scenario. Only self-reported data on surveys collected on different days, some before the experiment begins and some after the experiment begins. |
5. RESULTS
Graph 1 illustrates the key results of this study. Here, we present the point
estimates of the same Difference in Difference estimation, presented in the
previous section, for the four data scenarios. As mentioned above, in Scenario
1 we have full information and the advantages of administrative data, whereas in
Scenario 4 we only have survey data (the traditional scenario).
Graph 1. Results of the DD impact estimator depending on the data source
Note: See Table 4 for full results. 90% Confidence intervals.
In the first three scenarios, the point estimate of the impact of the SMS campaign
changes very little, and remains at an increase of approximately 2.5 percentage
points in the probability of compliance with the submission date for the
expenditure planning sheet.
As can be seen in Graph 1, from Scenario 1 to 2 the confidence intervals of the
point estimates widen, accompanied by minor changes in the point estimates.
Similarly, from Scenario 2 to Scenario 3, we see a slight increase in the point
estimate and almost no change in the size of the confidence intervals.
The most interesting result is in the progression from Scenario 3 to Scenario 4,
where the point estimate drops in magnitude and is no longer statistically
different from zero. The only difference between these scenarios is the source of
the data. In Scenario 3, we have all the characteristics of a survey data collection,
but the outcome variable is based on the administrative data from the date of the
SE visit. One possible explanation for the change from Scenario 3 to 4 is the
measurement error that can be attributed to the survey data collection.
[Graph 1 axes: estimated impact of the SMS campaign (vertical) by scenario, Scenario 1 to Scenario 4 (horizontal).]
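The statement about statistical significance can be checked approximately from the point estimates and standard errors reported in Table 4. The sketch below builds 90% confidence intervals under a normal approximation, using the full-controls columns (3), (6), (9) and (12). Both choices are assumptions: the paper does not state how the intervals in Graph 1 were computed or which specification the graph plots.

```python
from statistics import NormalDist

# Two-sided 90% interval: 5% in each tail (critical value about 1.645).
Z90 = NormalDist().inv_cdf(0.95)

# (point estimate, standard error) from Table 4, columns (3), (6), (9), (12).
table4 = {
    "Scenario 1": (2.58, 0.70),
    "Scenario 2": (2.39, 0.91),
    "Scenario 3": (2.65, 0.96),
    "Scenario 4": (1.16, 1.07),
}


def ci90(estimate, std_error):
    """Normal-approximation 90% confidence interval."""
    half_width = Z90 * std_error
    return estimate - half_width, estimate + half_width


intervals = {name: ci90(*values) for name, values in table4.items()}
excludes_zero = {name: low > 0 or high < 0 for name, (low, high) in intervals.items()}

for name, (low, high) in intervals.items():
    print(f"{name}: [{low:.2f}, {high:.2f}] excludes zero: {excludes_zero[name]}")
```

Under these assumptions only the Scenario 4 interval includes zero, which is in line with the description of Graph 1.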
In Graph 2, we present the coefficient of variation of the point estimates presented
in Graph 1. Graph 2 shows that the coefficient for Scenario 4 is significantly larger
than for the previous three Scenarios, indicating that this estimator is very likely to
be imprecise.
Graph 2. Coefficient of Variation of the DD impact estimator depending on
the data source
Source: Estimations by the case study research team
Finally, in Table 4 we present our four estimations of the impact of the SMS
campaign, each in three different models with different sets of covariates. The
first column of results for each scenario presents the DD model without any control
variables, while the second column presents the results including only UGEL
(educational district) fixed effects. Finally, the third column presents the results
including all control variables presented in Table A3. Our point estimates in the
four scenarios are almost unchanged between models, indicating that our results are
robust to the model specification.
[Graph 2 axes: coefficient of variation (vertical) by scenario, Scenario 1 to Scenario 4 (horizontal).]
Table 4. Difference in Difference estimation for the four datasets scenarios

Columns (1)-(3): Scenario 1. Columns (4)-(6): Scenario 2. Columns (7)-(9): Scenario 3. Columns (10)-(12): Scenario 4.

| | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) | (9) | (10) | (11) | (12) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DD Impact estimator | 2.58*** (0.73) | 2.58*** (0.7) | 2.58*** (0.7) | 2.39** (0.99) | 2.39*** (0.91) | 2.39*** (0.91) | 2.46** (1.16) | 2.64*** (0.96) | 2.65*** (0.96) | 1.58 (1.4) | 1.09 (1.07) | 1.16 (1.07) |
| Average increase after treatment starts | 14.42*** (0.62) | 14.42*** (0.59) | 14.42*** (0.59) | 14.32*** (0.85) | 14.32*** (0.77) | 14.32*** (0.77) | 39.29*** (0.99) | 2.36 (1.9) | 2.34 (1.91) | 4.56*** (1.2) | -1.36 (1.81) | -1.17 (1.82) |
| Educational Districts fixed effects | No | Yes | Yes | No | Yes | Yes | No | Yes | Yes | No | Yes | Yes |
| Control variables | No | No | Yes | No | No | Yes | No | No | Yes | No | No | Yes |

| | Scenario 1 | Scenario 2 | Scenario 3 | Scenario 4 |
|---|---|---|---|---|
| Control mean before treatment starts | 72.24*** (0.49) | 73.98*** (0.68) | 47.01*** (0.79) | 51.6*** (0.8) |
| N | 59,756 | 29,864 | 26,354 | 25,486 |
Note:
6. CONCLUSIONS
This case study clearly illustrates a case where measurement error is present in
the results of an RCT impact evaluation. As can be seen in Table 4, the impact
estimator drops from Scenario 1 to 4 and ceases to be statistically different from
zero. In addition, this is accompanied by an increase of 100 percent in the
coefficient of variation of the point estimate. We can interpret the differences in the
resulting effects on the expenditure declaration submission dates between
Scenarios 3 and 4 as being due to the measurement error present in the survey
data collection. This is most likely due to several sources of attenuation bias
associated with the self-reported data collection methods. This unique comparison
was possible because the results of an RCT were available, as well as both
administrative and self-reported data for the same primary outcomes with
comparable timing.
The policy implications of these results are not negligible. The gains in
precision when measuring outcome indicators highlight the importance of using
administrative data wherever possible. Additionally, being able to identify how
measurement error affects effect estimates may also affect the cost-effectiveness
assessment of policies, and ultimately the direction and design of development policy.
Finally, it is important to clarify that this study should not be interpreted as a call to
replace or reduce primary data collection of self-reported data. We are aware that
there are several important areas of research where administrative data is not
available. For these cases, we hope this paper at least provides guidance and
awareness of how to mitigate measurement error risk. For areas where
administrative data is available, it is a highly informative and inexpensive practice
to perform a complementary analysis of results.
21
7. REFERENCES
Baird, S., Chirwa, E., McIntosh, C., and Özler, B. (2012). Examining the reliability of self-reported data on school participation. Journal of Development Economics, 89-93.
Barrera-Osorio, F., Bertrand, M., Linden, L., and Perez-Calle, F. (2011). Improving the design of conditional transfer programs: Evidence from a randomized education experiment in Colombia. American Economic Journal: Applied Economics, 167-195. Beegle, K., De Weerdt, J., Friedman, J., and Gibson, J. (2012). Methods of household consumption measurement through surveys: Experimental results from Tanzania. Journal of Development Economics, 98(1), 3-18. Bertrand, M., and Mullainathan, S. (2001). Do people mean what they say? Implications for subjective survey data. The American Economic Review, 91(2), 67-72. Castleman, B., and Lindsay, P. (2014) Working Paper, Summer Nudging: Can Personalized Text Message and Peer Mentor Outreach Increase College Going Among Low income High School Graduates? EdPolicyWorks. Chong A, Karlan D, Shapiro J, and Zinman J. (2013) (Ineffective) Messages to Encourage Recycling: Evidence from a Randomized Evaluation in Peru. World Bank Working Paper #6548, July 2013. Das, J., Hammer, J., and Sánchez-Paramo, C. (2012). The impact of recall periods on reported morbidity and health seeking behavior. Journal of Development Economics, 98(1), 76-88. Duflo, E., Glennerster, R., and Kremer, M. (2007). Using randomization in
development economics research: A toolkit. Handbook of Development
Economics, 4, 3895-3962.
Feeney, L., Bauman, J. and Chabrier, J. (2015).Using administrative data for
randomized evaluations. J-PAL North America, Cambridge, MA.
Fink, G., Lanthorn, H., Raifman, J., and Rokicki, S. (2014) The Impact of Text
Message Reminders on Adherence to Antimalarial Treatment in Northern Ghana:
A Randomized Trial. Published: October 28, 2014
Finkelstein, A., and Taubman, S. (2014). Using randomized evaluations to improve
the efficiency of US healthcare delivery. J-PAL North America, Cambridge, MA.
Karlan, D., McConnell, M., Mullainathan, S., and Zinman J. (2011). “Getting on the
Top of Mind: How Reminders Increase Savings," Working Paper. January 2011.
22
Karlan, D., Morton, M., and Zinman, J. (2012) “A Personal Touch: Text Messaging
for Loan Repayment.” Working Paper, February 2012.
McKenzie, D. (2012). Beyond baseline and follow-up: The case for more T in
experiments. Journal of Development Economics, 99(2), 210-221.
Meyer, B., and Mittag, N. (2015). Using linked survey and administrative data to
better measure income: Implications for poverty, program effectiveness and holes
in the safety net (No. w21676). National Bureau of Economic Research.
Millimet, D. (2010). The Elephant in the corner: A cautionary tale about Measurement Error in Treatment Effect Models. (No. 5140). IZA Discussion Paper Series. Stecklov, G., and Weinreb, A. (2010). Improving the quality of data and impact-
evaluation studies in developing countries. Inter-American Development Bank.
Taubman, S., Allen, H., Wright, B., Baicker, K., and Finkelstein, A. (2014). Medicaid increases emergency-department use: evidence from Oregon's Health Insurance Experiment. Science, 343(6168), 263-268. Zwane, A. P., Zinman, J., Van Dusen, E., Pariente, W., Null, C., Miguel, E., Kremer, M., Karlan, D.S., Hornbeck, R., Giné, X. and Duflo, E. (2011). Being surveyed can change later behavior and related parameter estimates. Proceedings of the National Academy of Sciences, 108(5), 1821-1826.
23
8. ANNEX
Table A1. Number of schools surveyed in Semaforo Escuela by number of visits during the year
Number of visits N %
Only once 14,021 51%
Two 6,756 25%
Three 6,506 24%
Four or more 102 0%
Total 24,061 100%
Table A2. Descriptive stats on Wasichay System for the sample included and excluded from SMS PRONIED experiment

Groups: (A) Full sample included on maintenance program. (B) Sample excluded because maintenance manager did not have cellphone. (C) Sample excluded because they finished all maintenance activities before 13/08. (D) Sample included on the SMS PRONIED experiment. Each group cell lists N, Mean, S.D., Min, Max.

| Variable | (A) N Mean S.D. Min Max | (B) N Mean S.D. Min Max | (C) N Mean S.D. Min Max | (D) N Mean S.D. Min Max |
|---|---|---|---|---|
| Number of classrooms in the school | 59,700 6.5 7.6 0 86 | 18,598 5.9 7.9 0 86 | 11,224 5.9 7.0 0 69 | 29,878 7.1 7.6 0 83 |
| Amount assigned for maintenance (Nuevos Soles) | 59,700 8,596 8,872 0 30,000 | 18,598 7,903 8,737 4 30,000 | 11,224 7,700 8,301 2,000 30,000 | 29,878 9,364 9,094 0 30,000 |
| % Form maintenance committee | 59,700 0.95 0.21 0 1 | 18,598 0.91 0.29 0 1 | 11,224 1.00 0.00 1 1 | 29,878 0.97 0.18 0 1 |
| % Form oversight committee | 59,700 0.95 0.21 0 1 | 18,598 0.91 0.29 0 1 | 11,224 1.00 0.00 1 1 | 29,878 0.97 0.18 0 1 |
| % Submission Expenditure Planning | 59,700 0.94 0.23 0 1 | 18,598 0.89 0.31 0 1 | 11,224 1.00 0.06 0 1 | 29,878 0.96 0.20 0 1 |
| % Register commitment act | 59,700 0.91 0.28 0 1 | 18,598 0.86 0.35 0 1 | 11,224 0.97 0.17 0 1 | 29,878 0.93 0.26 0 1 |
| % Register expenditure declaration | 59,700 0.94 0.24 0 1 | 18,598 0.88 0.33 0 1 | 11,224 1.00 0.00 1 1 | 29,878 0.95 0.22 0 1 |
| % Register final oversight report | 59,700 0.55 0.50 0 1 | 18,598 0.52 0.50 0 1 | 11,224 0.60 0.49 0 1 | 29,878 0.55 0.50 0 1 |
| % Form maintenance committee on time | 59,700 0.93 0.25 0 1 | 18,598 0.86 0.35 0 1 | 11,224 1.00 0.00 1 1 | 29,878 0.95 0.22 0 1 |
| % Form oversight committee on time | 59,700 0.93 0.26 0 1 | 18,598 0.85 0.35 0 1 | 11,224 1.00 0.00 1 1 | 29,878 0.95 0.22 0 1 |
| % Timely Submission Expenditure Planning | 59,700 0.86 0.35 0 1 | 18,598 0.77 0.42 0 1 | 11,224 0.95 0.21 0 1 | 29,878 0.87 0.33 0 1 |
| % Register commitment act on time | 59,700 0.82 0.38 0 1 | 18,598 0.73 0.44 0 1 | 11,224 0.93 0.26 0 1 | 29,878 0.84 0.37 0 1 |
| % Register expenditure declaration on time | 59,700 0.77 0.42 0 1 | 18,598 0.64 0.48 0 1 | 11,224 1.00 0.00 1 1 | 29,878 0.77 0.42 0 1 |
| % Register final oversight report on time | 59,700 0.12 0.32 0 1 | 18,598 0.11 0.31 0 1 | 11,224 0.20 0.40 0 1 | 29,878 0.09 0.28 0 1 |
Table A3. Descriptive Statistics

Groups: (A) Full sample Semaforo Escuela matched with Wasichay. (B) Full sample Semaforo Escuela matched with Wasichay that were included on the SMS experiment. (C) Control group. (D) Treatment group. Each group cell lists N, Mean, S.D., Min, Max.

| Variable | (A) N Mean S.D. Min Max | (B) N Mean S.D. Min Max | (C) N Mean S.D. Min Max | (D) N Mean S.D. Min Max |
|---|---|---|---|---|
| Maintenance Program manager characteristics | | | | |
| % Male | 43,655 0.6 0.5 0 1 | 25,400 0.7 0.5 0 1 | 6,975 0.7 0.5 0 1 | 18,425 0.7 0.5 0 1 |
| % Hired with temporary contract | 45,827 0.0 0.2 0 1 | 26,354 0.0 0.2 0 1 | 7,275 0.0 0.2 0 1 | 19,079 0.0 0.2 0 1 |
| % Hired with permanent contract | 45,827 0.9 0.3 0 1 | 26,354 0.9 0.2 0 1 | 7,275 0.9 0.3 0 1 | 19,079 0.9 0.2 0 1 |
| % Hired with other type of contracts | 45,827 0.0 0.1 0 1 | 26,354 0.0 0.1 0 1 | 7,275 0.0 0.1 0 1 | 19,079 0.0 0.1 0 1 |
| Amount assigned to maintenance (Nuevos Soles) | 45,827 15,844 10,055 0 30,000 | 26,354 16,049 9,891 0 30,000 | 7,275 15,942 9,962 3,507 30,000 | 19,079 16,089 9,864 0 30,000 |
| School Characteristics from Semaforo Escuela | | | | |
| % where surveyed report being the school principal | 45,827 0.9 0.3 0 1 | 26,354 0.9 0.3 0 1 | 7,275 0.9 0.4 0 1 | 19,079 0.9 0.3 0 1 |
| % that report that students speak native languages | 45,755 0.1 0.3 0 1 | 26,317 0.1 0.3 0 1 | 7,267 0.1 0.3 0 1 | 19,050 0.1 0.3 0 1 |
| % that report that students speak Spanish only | 45,755 0.9 0.3 0 1 | 26,317 0.9 0.3 0 1 | 7,267 0.9 0.3 0 1 | 19,050 0.9 0.3 0 1 |
| % multigrade | 45,827 0.1 0.3 0 1 | 26,354 0.1 0.3 0 1 | 7,275 0.1 0.3 0 1 | 19,079 0.1 0.3 0 1 |
| % that have only one teacher for all the grades on the school | 45,680 0.2 0.4 0 1 | 26,274 0.2 0.4 0 1 | 7,256 0.2 0.4 0 1 | 19,018 0.2 0.4 0 1 |
| % that have at least a teacher for each grade (regular school) | 45,680 0.7 0.5 0 1 | 26,274 0.7 0.5 0 1 | 7,256 0.7 0.5 0 1 | 19,018 0.7 0.5 0 1 |
| % that are only for girls | 45,680 0.1 0.3 0 1 | 26,274 0.1 0.3 0 1 | 7,256 0.1 0.3 0 1 | 19,018 0.1 0.3 0 1 |
| % that are only for boys | 45,808 0.0 0.1 0 1 | 26,344 0.0 0.1 0 1 | 7,274 0.0 0.1 0 1 | 19,070 0.0 0.1 0 1 |
| % that allows both girls and boys | 45,808 1.0 0.1 0 1 | 26,344 1.0 0.1 0 1 | 7,274 1.0 0.1 0 1 | 19,070 1.0 0.1 0 1 |
| % managed with some degree of collaboration with the private sector | 45,827 0.0 0.2 0 1 | 26,354 0.0 0.1 0 1 | 7,275 0.0 0.2 0 1 | 19,079 0.0 0.1 0 1 |
| % managed solely by the public sector | 45,827 1.0 0.2 0 1 | 26,354 1.0 0.1 0 1 | 7,275 1.0 0.2 0 1 | 19,079 1.0 0.1 0 1 |
| % on geographic region Coast | 45,818 0.3 0.5 0 1 | 26,353 0.3 0.5 0 1 | 7,275 0.3 0.5 0 1 | 19,078 0.3 0.5 0 1 |
| % on geographic region Highlands | 45,818 0.2 0.4 0 1 | 26,353 0.2 0.4 0 1 | 7,275 0.2 0.4 0 1 | 19,078 0.2 0.4 0 1 |
| % on geographic region Jungle | 45,818 0.5 0.5 0 1 | 26,353 0.5 0.5 0 1 | 7,275 0.5 0.5 0 1 | 19,078 0.5 0.5 0 1 |
| % on rural areas | 45,827 0.5 0.5 0 1 | 26,354 0.5 0.5 0 1 | 7,275 0.5 0.5 0 1 | 19,079 0.5 0.5 0 1 |
| % of initial educational level | 45,827 0.1 0.3 0 1 | 26,354 0.1 0.3 0 1 | 7,275 0.1 0.3 0 1 | 19,079 0.1 0.3 0 1 |
| % of primary educational level | 45,827 0.6 0.5 0 1 | 26,354 0.6 0.5 0 1 | 7,275 0.6 0.5 0 1 | 19,079 0.6 0.5 0 1 |
| % of secondary educational level | 45,827 0.3 0.5 0 1 | 26,354 0.3 0.5 0 1 | 7,275 0.3 0.5 0 1 | 19,079 0.3 0.5 0 1 |
| Size of the School | | | | |
| Number of students | 45,827 186.6 242.5 0 2,932 | 26,354 186.3 239.6 1 2,932 | 7,275 182.4 228.2 1 2,285 | 19,079 187.9 243.8 1 2,932 |
| Number of teachers | 45,827 10.6 12.3 0 170 | 26,354 10.7 12.1 1 170 | 7,275 10.6 12.0 1 120 | 19,079 10.7 12.2 1 170 |
| Number of sections | 45,827 8.9 7.5 0 96 | 26,354 8.9 7.5 1 96 | 7,275 8.8 7.3 1 70 | 19,079 8.9 7.6 1 96 |
| Number of classrooms | 45,827 12.1 9.1 0 86 | 26,354 12 9 0 83 | 7,275 12 9 0 62 | 19,079 12.1 8.6 0 83 |
| Important Educational or Social Programs | | | | |
| Percentage of schools beneficiary of Educational Program "Jornada Escolar Completa" | 45,827 0.0 0.2 0 1 | 26,354 0.1 0.2 0 1 | 7,275 0.1 0.2 0 1 | 19,079 0.0 0.2 0 1 |
| % of schools on VRAEM region | 45,827 0.0 0.2 0 1 | 26,354 0.0 0.2 0 1 | 7,275 0.0 0.2 0 1 | 19,079 0.0 0.2 0 1 |
| % of schools on the frontier | 45,827 0.0 0.2 0 1 | 26,354 0.0 0.2 0 1 | 7,275 0.0 0.2 0 1 | 19,079 0.0 0.2 0 1 |
| % of schools beneficiary of Educational Program "Soporte Docente" | 45,827 0.1 0.3 0 1 | 26,354 0.1 0.3 0 1 | 7,275 0 0 0 1 | 19,079 0.1 0.3 0 1 |
Graph A1.A. Evolution on the % of maintenance managers who submitted their
Expenditure Planning on Wasichay system during year 2015 for the SMS experiment
sample and excluded groups
[Graph A1.A axes: evolution of maintenance managers who delivered their Expenditure Planning, 0 to 100 (vertical), by week, April 6 to Nov 26 (horizontal). Series: Excluded - No cellphone; Excluded - Already completed ED; SMS experiment sample.]
Graph A1.B. Evolution on the % of maintenance managers who submitted their
Expenditure Declaration on Wasichay system during year 2015 for the SMS
experiment sample and excluded groups
[Graph A1.B axes: evolution of responsibles who delivered their Expenditure Declaration, 0 to 100 (vertical), by week, April 6 to Nov 26 (horizontal). Series: Excluded - No cellphone; Excluded - Already completed ED; SMS experiment sample.]