j pal courseexecutivesocialcourse 16participants 19 ...€pal lecturers .. 9 list of participants...
TRANSCRIPT
1
J‐PAL
ExecutiveEducationCourseinEvaluatingSocialProgrammesCourseMaterialParticipants Accra16–19May2012
2
3
TableofContents
Program ……………………………………………………………………………………………………. 5
CourseObjectives …………………………………………………………………………………….. 7
J‐PALLecturers ……………………………………………………………………………………….. 9
ListofParticipants …………………………………………………………………………………… 11
GroupAssignment …………………………………………………………………………………… 13
Vocabulary ……………………………………………………………………………………………… 14
CaseStudy1 ……………………………………………………………………………………………… 16
CaseStudy2 ……………………………………………………………………………………………… 22
CaseStudy3 ……………………………………………………………………………………………… 32
ExerciseA ……………………………………………………………………………………………… 36
ExerciseB ……………………………………………………………………………………………… 37
PrimeronPowerCalculation ………………………………………………………………….. 45
ExerciseC …………………………………………………………………………………………………. 51
CaseStudy4 ……………………………………………………………………………………………… 57
GroupPresentation …………………………………………………………………………………. 65
4
5
PROGRAMMEExecutiveEducationCourseinEvaluatingSocialProgrammes,16–19May2012
Accra,Ghana
Wednesday
16May2012
Thursday
17May2012
Friday
18May2012
Saturday
19May2012
8:30 –10:00
WelcomingRemarks
Lecture1:WhatisEvaluation
RebeccaThornton(UniversityofMichigan)
Lecture3:WhyRandomise
KehindeAjayi(BostonUniversity)
Lecture5:SamplingandSampleSize
PaulGlewwe(UniversityofMinnesota)
Lecture7:ProjectfromStarttoFinish
KehindeAjayi(BostonUniversity)
10:30 –12:30
GroupWorkSession1
IntroductiontogroupmembersGroupworkoncasestudy1:SchoolMonitoringReformin
MadagascarDecisionongroupproject(45min)
GroupWorkSession3
Groupworkoncasestudy3:ExtraTeacherProgramme
(60min)
Groupworkonpresentation(60min)
GroupWorkSession5
GroupexerciseConsamplesizeestimation(60min)
Groupworkoncasestudy4:EvaluationDesign(60min)
GroupWorkSession7
Groupworktofinalisepresentations
Lunch Lunch Lunch Lunch
13:30 –15:00
Lecture2:MeasuringImpacts
IsaacMbiti(SouthernMethodistUniversity)
Lecture4:HowtoRandomise
PaulGlewwe(UniversityofMinnesota)
Lecture6:ThreatsandAnalysis
TBD
Grouppresentations(eachgroup:
15minpresentation,15mindiscussion)
15:30 –17:00
GroupWorkSession2
Groupworkoncasestudy2:LearntoRead(45min)
Groupworkonpresentation(45min)
GroupWorkSession4
GroupexerciseAonrandomsampling(30min)andexerciseBonmechanismsofrandomisation
(30min)StatsPrimer(30min)
GroupWorkSession6
Groupworkonpresentation(90min)
6
J-PAL Executive Education Course in Evaluating Social Programmes
7
CourseObjectives
Our executive trainingprogramme is designed forpeople froma variety of backgrounds:managersand researchers from international development organisations, foundations, governments and non‐governmentalorganisationsfromaroundtheworld,aswellastrainedeconomistslookingtoretool.
The course is a full‐timecourse. It is important forparticipants toattendall lecturesandgroupworkinordertosuccessfullycompletethecourseandreceivethecertificateofcompletion.
CourseCoverage
Thefollowingkeyquestionsandconceptswillbecovered:
Whyandwhenisarigorousevaluationofsocialimpactneeded? Thecommonpitfallsofevaluations,andhowrandomisationcanhelp. Thekeycomponentsofagoodrandomisedevaluationdesign. Alternativetechniquesforincorporatingrandomisationintoprojectdesign. Howdoyoudeterminetheappropriatesamplesize, identifyoutcomemeasures,andmanage
data? Guardingagainstthreatsthatmayunderminetheintegrityoftheresults. Techniquesfortheanalysisandinterpretationofresults. Howtomaximisepolicyimpactandtestexternalvalidity.
Theprogrammewillachievethesegoalsthroughadiversesetofintegratedteachingmethods.Expertresearchers will provide both theoretical and example‐based classes complemented by workshopswhereparticipantscanapplykeyconceptstorealworldexamples.Byexaminingbothsuccessfulandproblematicevaluations,participantswillbetterunderstandthesignificanceofvariousspecificdetailsof randomisedevaluations.Furthermore, theprogrammewillofferextensiveopportunities toapplythese ideas ensuring that participants will leave with the knowledge, experience, and confidencenecessarytoconducttheirownrandomizedevaluations.
8
J-PAL Executive Education Course
9
J‐PALLecturersKehindeAjayiAssistantProfessorBostonUniversity Kehinde Ajayi is an Assistant Professor at Boston University. Herresearch interests are in the areas of economic development and theeconomics of education, with a particular focus on school choice,educational mobility, and the effects of school quality on students'academic performance. She is currently conducting a set of studiesrelatingtosecondaryeducationinGhana. PaulGlewweProfessorofAppliedEconomicsUniversityofUniversityofMinnesota Paul Glewwe is a Professor of Applied Economics at the University ofMinnesotaandthecurrentDirectoroftheCenterforInternationalFoodand Agricultural Policy. His interests are economics of education,poverty and inequality in developing countries, and appliedeconometrics.HisrecentpublicationshaveappearedintheHandbookofthe Economics of Education, Economic Development and CulturalChange, Journal of Development Economics, Journal of EconomicLiterature, Journal of Human Resources, Journal of Public EconomicsandWorldBankEconomicReview.IsaacMbitiAssistantProfessorSouthernMethodistUniversityIsaacMbitiisanassistantprofessorintheDepartmentofEconomicsofSouthernMethodistUniversity.Hisresearchinterestsareineconomicdevelopment,laboureconomicsanddemography.
J-PAL Executive Education Course J-PAL Lecturers
10
RebeccaThorntonAssistantProfessorofEconomicsUniversityofMichiganRebeccaThorntonbeganherappointmentasanassistantProfessorattheUniversity of Michigan economics department in 2008. Her researchfocuses on education and health as well as how individuals respond tofinancial incentives in these areas. She has worked on a randomizedevaluationofamerit‐basedscholarshipinKenya.Sheisalsoworkingonrandomized evaluations examining HIV testing and prevention andmenstruationandeducationinNepal.
J-PAL Executive Education Course
11
J‐PALExecutiveEducationCourse
12
ListofParticipants
# LastName FirstName Organization Country
1 Adamu‐Issah Madeez UNICEF Ghana
2 Adeola Aminat FATEFoundation Nigeria
3 AmaniampongKwabenaAdu
CouncilforTechnicalandVocationalEducationandTraining Ghana
4 Asiegbor Issac AssessmentUnit,CRDD Ghana
5 Banashek Sarah USAID Ghana
6 Brikorang Fred GES,BasicEducation Ghana
7 Brown Victoria MangoTree Uganda
8 Cohen Lee USAID Mali
9 Coulibaly Lazare InstituteforPopularEducation Mali
10 Ekuri EmmanuelAfricanPopulationandHealthResearchCentre Kenya
11 Hounkpodote HilaireLaboratoryforResearchonSocialTransformations Senegal
12 Ilomu Jessica USAID Uganda
13 Imoka Chizoba UnveilingAfricaFoundation Nigeria
14 Ka Awa ARED Senegal
15 Kabaka Stewart MoPHS Kenya
16 Kalanda McKnight MinistryofEducation,ScienceandTechnology Malawi
17 Korr Jacob GES,CurriculumandResearchDevelopment Ghana
18 Kotin Veronica EMIS Ghana
19 Kwame Agyeapong MinistryofEducation Ghana
20 Lamptey Elliot MinistryofEducation Ghana
21 Lowe Zev Worldreader Spain
22 Maiga Eugenie AfricanCentreforEconomicTransformation Ghana
23 Mugerwa Fred OfficeofthePrimeMinister Uganda
24Mukonka Remmy MinistryofEducation,ScienceandVocational
TrainingZambia
25 Muvunyi Emmanuel EducationBoard Rwanda26 Oduro Evelyn GES,TeacherEducation Ghana
27 Ogle Muktar NationalAssessmentCentre Kenya
28 Oldmeadow Emily DFID Nigeria
29 Otieno Mary CentreforUniversalEducationatBrookings USA
30 Otim Daniel MangoTree Uganda
31 Otoo Ernest MinistryofEducation Ghana
32 Pealore Dominic MinistryofEducation Ghana
33 Perez Marisol USAID Ghana
34 Rotich Leah MinistryofEducation Kenya
35 Ruto Sara UwezoEastAfrica Kenya
36 SalieuKamara Mohamed MinistryofEducation Sierra
J‐PALExecutiveEducationCourse
13
Leone
37 Sarpong Anthony GES,Mathematicscurriculum Ghana
38 Thompson SamuelCouncilforTechnicalandVocationalEducationandTraining Ghana
39 Traore Bréhima OeuvreMalienned'Aideàl'EnfanceduSahel Mali
40 Umubyeyi Marcienne Selfemployed Rwanda
41 Uneze EberechukwuCentrefortheStudiesoftheEconomiesofAfrica Nigeria
42 Weisenhorn Nina IRC Congo
J‐PALExecutiveEducationCourse
14
GroupAssignment
Group1TA:CarolinaCorral
Group2TA:ClareHofmeyr
1 AwaKa 1 EberechukwuUneze
2 BréhimaTraore 2 EmmanuelEkuri
3 EmmanuelMuvunyi 3 EugenieMaiga
4 HilaireHounkpodote 4 FredMugerwa
5 LazareCoulibaly 5 MaryOtieno
6 MarcienneUmubyeyi 6 StewartKabakaGroup3TA:ElizabethSchultz
Group4TA:AmaraKallon
1 EmilyOldmeadow 1 AminatAdeola
2 JessicaIlomu 2 AnthonySarpong
3 LeeCohen 3 FredBrikorang
4 MarisolPerez 4 IssacAsiegbor
5 NinaWeisenhorn 5 SamuelThompson
6 SarahBanashek 6 MohamedSalieuKamaraGroup5TA:CaitlinTulloch
Group6TA:MichaelPolansky
1 DominicPealore 1 ChizobaImoka
2 ErnestOtoo 2 JacobKorr
3 KwabenaAduAmaniampong 3 EvelynOduro
4 MadeezAdamu‐Issah 4 McKnightKalanda
5 VeronicaKotin 5 MuktarOgle
6 ZevLowe 6 RemmyMukonkaGroup7TA:PacePhillips
1 AgyeapongKwame
2 DanielOtim
3 ElliotLamptey
4 LeahRotich
5 SaraRuto
6 VictoriaBrown
J‐PALExecutiveEducationCourse
15
Vocabulary
1. Attrition: the process of individuals joining in or dropping out of either the treatment orcomparisongroupoverthecourseofthestudy.
2. AttritionBias:statisticalbiaswhichoccurswhenindividualssystematicallyjoininordropoutof either the treatment or the comparison group for reasons related to the treatment oroutcomes.
3. Baseline:datadescribingthecharacteristicsofparticipantsmeasuredacrossbothtreatment
andcomparisongroupspriortoimplementationofintervention.
4. Cluster:thelevelofobservationatwhichasamplesizeismeasured.Generally,observationswhicharehighlycorrelatedwitheachothershouldbeclusteredandthesamplesizeshouldbemeasuredatthisclusteredlevel.
5. ComparisonGroup: in an experimental design, a randomly assigned group from the same
populationas the treatmentgroupthatdoesnotreceive the intervention.Participants in thecomparisongroupareusedasastandardforcomparisonagainstthetreatedsubjectsinordertovalidatetheresultsoftheintervention.
6. Counterfactual: whatwould have happened to the participants in a program had they notreceivedtheintervention.Thecounterfactualcannotbeobservedfromthetreatmentgroup,itcanonlybeinferredfromthecomparisongroup.
7. Endline: datadescribing the characteristics of participantsmeasuredacrossboth treatment
andcomparisongroupsafterimplementationofintervention.
8. Equivalence: groups are identical on all baseline characteristics, both observable andunobservable.Ensuredbyrandomization.
9. Externality:anindirectcostorbenefitincurredbyindividualswhodidnotdirectlyreceivethe
treatment.Alsotermed"spillover."
10. Hypothesis:aproposedexplanationofandfortheeffectsofagivenintervention.Hypothesesareintendedtobemadeex‐ante,orpriortotheimplementationoftheintervention.
11. IntentiontoTreat:themeasuredimpactofaprogramthatincludesalldatafromparticipants
inthegroupstowhichtheywererandomized,regardlessofwhethertheyactuallyreceivedthetreatment.Intention‐to‐treatanalysispreventsbiascausedbythelossofparticipants,whichmaydisruptthebaselineequivalenceestablishedbyrandomizationandwhichmayreflectnon‐adherencetotheprotocol.
12. Intra‐clustercorrelationcoefficient:ameasureofthecorrelationbetweenobservations
withinacluster;i.e.thelevelofcorrelationindrinkingwatersourceforindividualsinahousehold.
13. LevelofRandomization:thelevelofobservation(ex.individual,household,school,village)at
whichtreatmentandcomparisongroupsarerandomlyassigned.
J‐PALExecutiveEducationCourse
16
14. PartialCompliance:individualsdonotcomplywiththeirassignment(totreatmentorcomparison).Alsotermed"diffusion"or"contamination."
15. Phase‐inDesign:astudydesigninwhichgroupsareindividuallyphasedintotreatmentover
aperiodoftime;groupswhicharescheduledtoreceivetreatmentlateractasthecomparisongroupsinearlierrounds.
16. Power:thelikelihoodthat,whentheprogramhasaneffect,onewillbeabletodistinguishthe
effectfromzerogiventhesamplesize.
17. ProgrammeImpact:estimatedbymeasuringthedifferenceinoutcomesbetweencomparisonandtreatmentgroups.Thetrueimpactoftheprogramisthedifferenceinoutcomesbetweenthetreatmentgroupanditscounterfactual.
18. SelectionBias:statisticalbiasbetweencomparisonandtreatmentgroupsinwhichindividuals
inonegrouparesystematicallydifferentfromthoseintheother.Thesecanoccurwhenthetreatmentandcomparisongroupsarechoseninanon‐randomfashionsothattheydifferfromeachotherbyoneormorefactorsthatmayaffecttheoutcomeofthestudy.
19. Significance:thelikelihoodthatthemeasuredeffectdidnotoccurbychance.Statisticaltests
areperformedtodeterminewhetheronegroup(e.g.theexperimentalgroup)isdifferentfromanothergroup(e.g.comparisongroup)onthemeasurableoutcomevariablesusedintheevaluation.
20. StandardDeviation:astandardizedmeasureofthevariationofasamplepopulationfromits
meanonagivencharacteristic/outcome.Mathematically,thesquarerootofthevariance.
21. StandardizedEffectSize:astandardizedmeasureofthe[expected]magnitudeoftheeffectofaprogram.
22. TheoryofChange(ToC):describesastrategyorblueprintforachievingagivenlong‐term
goal.Itidentifiesthepreconditions,pathwaysandinterventionsnecessaryforaninitiative'ssuccess.
23. TreatmentGroup:inanexperimentaldesign,arandomlyassignedgroupfromthepopulation
thatreceivestheinterventionthatisthesubjectofevaluation.
24. TreatmentontheTreated:themeasuredimpactofaprogramthatincludesonlythedataforparticipantswhoactuallyreceivedthetreatment.
J‐PALExecutiveEducationCourse
17
CASESTUDY1:REFORMINGSCHOOLMONITORING
ThiscasestudyisbasedontheJ‐PALStudy“PrimaryEducationManagementinMadagascar”byEstherDuflo,GerardLassibille,andTrangvanNguyen.
J‐PALthankstheauthorsforallowingustousetheirstudy.
Case 3: Women as Policymakers
Thinking about measurement and outcomes
Case1:ReformingSchoolMonitoringThinkingaboutmeasurementandoutcomes
TRANSLATING RESEARCH INTO ACTION
J‐PALExecutiveEducationCourse
18
KeyVocabulary
BackgroundOver the last 10 years, low‐income countries in Africa have made striking progress in expandingcoverageofprimaryeducation.However,inmanyofthesecountriestheeducationsystemcontinuestodeliverpoorresults,puttingthegoalofuniversalprimaryschoolcompletionatrisk. Incompetentadministration,inadequatefocusonlearningoutcomes,andweakgovernancestructuresarethoughttobesomeofthereasonsforthepoorresults.Thiscasestudywilllookataprogramwhichaimedtoimprovetheperformanceandefficiencyofeducationsystemsbyintroducingtoolsandamonitoringsystemateachlevelalongtheservicedeliverychain.
MadagascarSchoolSystemReforms:“ImprovingOutputsnotOutcomes”Madagascar’s public primary school system has been making progress in expanding coverage inprimaryeducationthanksinpartduetoincreasesinpublicspendingsincethelate1990s.Aspartofitspoverty reductionstrategy,publicexpenditureoneducationrose from2.2 to3.3percentofGDPbetween 2001 and 2007. In addition to increased funding, the government introduced importantreformssuchastheeliminationofschoolfeesforprimaryeducation,freetextbookstoprimaryschoolstudents,publicsubsidiestosupplementthewagesofnon–civilserviceteachersinpublicschools(inthepasttheywerehiredandpaidentirelybyparentassociations),andnewpedagogicalapproaches.Themostvisiblesignofprogresswas the large increase incoverage inprimaryeducation inrecentyears. In 2007, the education systemenrolled some3.8million students in bothpublic andprivateschools—morethantwicetheenrolmentin1996.Duringthelast10years,morethan4000newpublicprimaryschoolshavebeencreated,and thenumberofprimaryschool teachers in thepublic sectormorethandoubled.Whilethisprogressisimpressive,enormouschallengesremain.Entryratesintograde1arehigh,butless than half of each cohort reaches the end of the five‐year primary cycle. Despite government
1. Hypothesis: a proposed explanation of and for the effects of a given intervention. Hypotheses are
intended to be made ex‐ante, or prior to the implementation of the intervention.
2. Indicators: metrics used to quantify and measure the needs that a program aims to address (needs
assessment), how a program is implemented (process evaluation) and whether it affects specific short‐
term and long‐term goals (impact evaluation).
3. Logical Framework (LogFrame): a management tool used to facilitate the design, execution, and
evaluation of an intervention. It involves identifying strategic elements (inputs, outputs, outcomes and
impact) and their causal relationships, indicators, and the assumptions and risks that may influence
success and failure.
4. Theory of Change (ToC): describes a strategy or blueprint for achieving a given long‐term goal. It
identifies the preconditions, pathways and interventions necessary for an initiative's success.
J‐PALExecutiveEducationCourse
19
interventions, grade repetition rates are still uniformly high throughout the primarycycle,averagingabout18percent.Furthermore,testscoresrevealpoorperformance:studentsscoredanaverageof30percentonFrenchand50percentonMalagasyandmathematics.DiscussionTopic11. Wouldyouregardthereformsassuccessful?Whyorwhynot?
2. Whatare someof thepotential reasons forwhy the reformsdidnot translate intobetter
learningoutcomes?
Problemsremain....Asthestartingpointofthestudy,researchersworkedwiththeMinistryofEducationto identifytheremainingconstraintsintheschoolingsystem.Asurveyconductedin2005revealedthefollowingkeyproblems:
1. Teacher absenteeism: At 10 percent, teacher absenteeism remains a significant problem.Only 8 percent of school directors monitor teacher attendance (either by taking dailyattendance or tracking and posting a monthly summary of attendance), and more than 80percentfailtoreportteacherabsencestosub‐districtanddistrictadministrators.
2. Communicationwith parents: Communication between teachers and parents on studentlearningisoftenperfunctory,andstudentabsenteeismisrarelycommunicatedtoparents.
3. Teacherperformance: Essential pedagogical tasks are often neglected: only 15 percent ofteachers consistently prepare daily and biweekly lessons plans while 20 percent do notpreparelessonplansatall.Studentacademicprogressisalsopoorlymonitored:resultsoftestsandquizzesarerarelyrecordedand25percentofteachersdonotprepareindividualstudentreportcards.
Overall,manyofproblems seem tobe result of a lackof organization, control andaccountability ateverystageof thesystem,allofwhichare likely tocompromise theperformanceof thesystemandlowerthechanceofthereformsbeingsuccessful.
J‐PALExecutiveEducationCourse
20
InterventionIn order to address these issues, theMadagascarMinistry of Educationseekstotightenthemanagementandaccountabilityateachpointalongthe service delivery chain (see Figure 1) by making explicit to thevarious administrators and teachers what their responsibilities are,supportingthemwithteachingtools,andincreasingmonitoring.Theministryisconsideringtwoapproachestoevaluate1:
1. Top‐Down:Operational tools and guidebooks which outline their responsibilitiesare given to the relevant administrators. During a meeting,administrators are trained on how to carry out their tasks, and theirperformance criteria are clarified. This is followed up by regularmonitoringoftheirperformance,whichiscommunicatedthrough(sub‐)districtreportcardstohigherlevels.
2. Bottom‐Up:
Thisprogrampromotestheabilityofparentstomonitortheirschoolsandholdteachersaccountablewhen they perform below expectation. Report cards with easy‐to‐understand content are given toparentsandmembersofpoorruralcommunities.Theycontainasmallsetofperformanceindicators,informationonenrolmentsandschoolresources,aswellasdatathatallowaschool’sperformancetobe compared that of other schools. In addition, greater community participation in school‐basedmanagementisencouragedthroughstructuredschoolmeetingsinwhichstaffoftheschool,parents,andcommunitymembersreviewthereportcardanddiscusstheirschoolimprovementplan.
DiscussionTopic2
1 The actual evaluation included further interventions such as training of teachers. For more details, please refer to the paper. For pedagogical reasons, we focus only on two approaches in this case study.
1. BeforesettinguptheRCT,researcherscarefullyanalyzedtheexistingproblem.Whydoyouthinkthisisimportantasastartingpointofanevaluation?
2. Whataretheintermediateandultimategoalsthatthisprogramhopestoachieve?
3. Whatisthekeyhypothesisbeingtestedthroughthisimpactevaluation?
Figure 1: Education System
J‐PALExecutiveEducationCourse
21
TheoryofChange
A theoryof change (ToC) identifies the causal linkbetween theintervention and the final outcome. Figure 2 shows oneway inwhichaToCcanbestructured.For example, a program or intervention is implemented toaddress a specific problem identified in the needs assessment(e.g. low literacy levels). The intervention (e.g. text books)mayleadtooutputs(e.g.studentsusageoftextbooks)throughwhichintermediary outcomes (e.g. reading skills) could be affected.These may lead to longer‐term outcomes (e.g. drop‐out rates,employmentoutcomes).AnunderlyingassumptionofthisToCisthatstudentsdonotalreadyhavetextbooks.DiscussionTopic3
Whatdatatocollect?DatacollectionandmeasurementBeforedecidingwhichdatatocollect,youneedtobeveryclearontheoutcomeyouaretargetingandinwhatway the intervention is theorized to impact thisoutcome. Inotherwords, identifyingakeyhypothesis and theory of change at the beginning of an evaluation helps you to decide whatinformationtocollect.For each step of the theory of change, we need to identify indicators (what to measure) andinstruments(howtocollectdata).Continuingwiththeexampleofthetextbookprogram,anindicatorcouldbereadinglevelofstudentsandtheinstrumentcouldbestandardizedreadingtests.Inaddition,weneedtocollectdataonourassumptionstoseewhetherornottheyholdtrue.DiscussionTopic4
1. WhichindicatorswouldyoumeasureateachstepintheToCsyoudrewup?
2. Howwouldyoucollectdatafortheseindicators?Inotherwords,whatinstrumentswouldyouuse?Doyouforeseechallengeswiththeseformsofdatacollection?
1. DrawoutthecausalchainusingtheformatinFigure2foreachofthebottom‐upandtop‐downinterventions(useaseparateToCforeach).
2. Whatarethenecessaryconditions/assumptionsunderlyingtheseToCs?
Figure 2: Theory of Change
J‐PALExecutiveEducationCourse
22
HowtointerprettheresultsTheevaluationfoundthatthebottom‐upapproachledtosuccessfulresults.Attendanceatmeetingsbetweenteachersandcommunitymemberswashigh,andalthoughcommunicationbetweenteachersand parents did not change, teachers improved the quality of teaching as shown by an increase inlessonplansandtestscores.However,thefindingsofthetop‐downinterventionwerequitedifferent:DiscussionTopic5
1. Howdoyouinterprettheresultsofthetop‐downintervention?
2. Whyisitimportanttointerprettheresultsinthecontextofaprogramtheoryofchange?
3. Whatarethepolicyimplications?Howmightyourespondtothesefindings?
J‐PALExecutiveEducationCourse
23
CASESTUDY2:LEARNTOREADEVALUATIONS
Thiscasestudyisbasedon“PitfallsofParticipatoryPrograms:Evidence
fromaRandomizedEvaluationinIndia,”byAbhijitBanerjee(MIT),RukminiBanerjee(Pratham),EstherDuflo(MIT),RachelGlennerster(J‐PAL),and
StutiKhemani(TheWorldBank)
J‐PALthankstheauthorsforallowingustousetheirpaper
Case 2: Remedial Education in IndiaEvaluating the Balsakhi Program
Incorporating random assignment into the program
Case2:LearntoReadEvaluationsDifferentmethodsofevaluation
J‐PALExecutiveEducationCourse
24
KeyVocabulary1. Counterfactual: what would have happened to the participants in a program had they not received the
intervention. The counterfactual cannot be observed from the treatment group, it can only be inferred
from the comparison group.
2. Treatment Group: in an experimental design, a randomly assigned group from the population that
receives the intervention that is the subject of evaluation.
3. Comparison Group: in an experimental design, a randomly assigned group from the same population
as the treatment group that does not receive the intervention. Participants in the comparison group are
used as a standard for comparison against the treated subjects in order to validate the results of the
intervention.
4. Program Impact: estimated by measuring the difference in outcomes between comparison and
treatment groups. The true impact of the program is the difference in outcomes between the treatment
group and its counterfactual.
5. Baseline: data describing the characteristics of participants measured across both treatment and
comparison groups prior to implementation of intervention.
6. Endline: data describing the characteristics of participants measured across both treatment and
comparison groups after implementation of intervention.
7. Selection Bias: statistical bias between comparison and treatment groups in which individuals in one
group are systematically different from those in the other. These can occur when the treatment and
comparison groups are chosen in a non‐random fashion so that they differ from each other by one or
more factors that may affect the outcome of the study.
8. Omitted Variable Bias: statistical bias that occurs when certain variables/characteristics (often
unobservable), which affect the measured outcome, are omitted from a regression analysis. Because
they are not included as controls in the regression, one incorrectly attributes the measured impact solely
to the program.
WhyLearntoRead(L2R)?
In a large‐scale survey conducted in 2004, the Indian NGO, Pratham, discovered that only 39% ofchildren(aged7‐14)inruralUttarPradeshcouldreadandunderstandasimplestory,andnearly15%couldnotrecognizeevenaletter.During this period, Pratham was developing the “Learn‐to‐Read” (L2R) module of its Read Indiacampaign.L2Rincludedauniquepedagogyteachingbasicliteracyskills,combinedwithagrassrootsorganizingefforttorecruitvolunteerswillingtoteach.
J‐PALExecutiveEducationCourse
25
This program allowed the community to get involved in children’s education moredirectlythroughvillagemeetingswherePrathamstaffsharedinformationonthestatusofliteracyinthevillageandtherightsofchildrentoeducation. In thesemeetings,Prathamidentifiedcommunitymemberswhowerewilling to teach. Volunteers attended a training session on the pedagogy, afterwhichtheycouldholdafter‐schoolreadingclassesforchildren,usingmaterialsdesignedandprovidedbyPratham.Prathamstaffpaidoccasionalvisitstothesecampstoensurethattheclasseswerebeingheldandtoprovideadditionaltrainingasnecessary.
DidtheLearntoReadprojectwork?
DidPratham’s“LearntoRead”programwork?Whatisrequiredinorderforustomeasurewhetheraprogramworkedor,inotherwords,whetherithadimpact?In general, to ask if a programworks is to ask if the program achieves its goal of changing certainoutcomesforitsparticipants,andensurethatthosechangesarenotcausedbysomeotherfactorsoreventshappeningatthesametime.Toshowthattheprogramcausestheobservedchanges,weneedtosimultaneouslyshowthatiftheprogramhadnotbeenimplemented,theobservedchangeswouldnothaveoccurred (orwouldbedifferent).Buthowdoweknowwhatwouldhavehappened? If theprogramhappened,ithappened.Measuringwhatwouldhavehappenedrequiresenteringanimaginaryworld in which the program was never given to these participants. The outcomes of the sameparticipantsinthisimaginaryworldarereferredtoasthecounterfactual.Sincewecannotobservethetruecounterfactual,thebestwecandoistoestimateitbymimickingit.The key challenge of program impact evaluation is constructing or mimicking thecounterfactual.Wetypicallydothisbyselectingagroupofpeoplethatresembletheparticipantsasmuch as possible butwho did not participate in the program. This group is called the comparisongroup. Becausewewant to be able to say that it was the program and not some other factor thatcaused the changes in outcomes, it is important that the only difference between the comparisongroupandtheparticipantsisthatthecomparisongroupdidnotparticipateintheprogram.Wethenestimate“impact”asthedifferenceobservedattheendoftheprogrambetweentheoutcomesofthecomparisongroupandtheoutcomesoftheprogramparticipants.The impact estimate is only as accurate as the comparison group is successful at mimicking thecounterfactual. If thecomparisongrouppoorlyrepresents thecounterfactual, the impact is (inmostcircumstances)poorlyestimated.Thereforethemethodusedtoselectthecomparisongroupisakeydecisioninthedesignofanyimpactevaluation.
Thatbringsusback to ourquestions:Did theLearn toReadprojectwork?Whatwas its impactonchildren’sreadinglevels?Inthiscase,theintentionoftheprogramisto“improvechildren’sreadinglevels”andthereadinglevelis the outcomemeasure. So,whenwe ask if the Learn to Read projectworked,we are asking if itimproved children’s reading levels. The impact is the difference between reading levels after thechildrenhavetakenthereadingclassesandwhattheirreadinglevelwouldhavebeenif thereadingclasseshadneverexisted.
J‐PALExecutiveEducationCourse
26
Forreference,ReadingLevelisanindicatorvariablethattakesvalue0ifthechildcanreadnothing,1ifheknowsthealphabet,2ifhecanrecognizewords,3ifhecanreadaparagraph,and4ifhecanreadafullstory.Whatcomparisongroupscanweuse?Thefollowingexpertsillustratedifferentmethodsofevaluatingimpact.(Refertothetableonthelastpageofthecaseforalistofdifferentevaluationmethods).
EstimatingtheimpactoftheLearntoReadprojectMethod1NewsRelease:ReadIndiahelpschildrenLearntoRead.Prathamcelebratesthesuccessofits“LearntoRead”program—partoftheReadIndiaInitiative.Ithasmade significant progress in its goal of improving children’s literacy rates through better learningmaterials,pedagogicalmethods,andmostimportantly,committedvolunteers.Theachievementofthe“Learn to Read” (L2R) program demonstrates that a revised curriculum, galvanized by communitymobilization,canproducesignificantgains.Massivegovernmentexpenditures inmid‐daymealsandschool construction have failed to achieve similar results. In less than a year, the reading levels ofchildrenwhoenrolledintheL2Rcampsimprovedconsiderably.
Justbeforetheprogramstarted,halfthesechildrencouldnotrecognizeHindiwords—manynothingatall.ButafterspendingjustafewmonthsinPrathamreadingclasses,morethanhalfimprovedbyatleast one reading level,with a significantnumber capable of recognizingwords and several able toreadfullparagraphsandstories!Onaverage,theliteracymeasureofthesestudentsimprovedbynearlyonefullreadinglevelduringthisperiod.
Figures: Showendline levels forL2Rparticipants –organizedbybaseline levelsof reading levels (left:zerolevel,right:letterlevel)
J‐PALExecutiveEducationCourse
27
DiscussionTopic1
1. Whattypeofevaluationdoesthisnewsreleaseimply?
2. Whatrepresentsthecounterfactual?
3. Whataretheproblemswiththistypeofevaluation?
Method2Opinion:The“ReadIndia”projectnotuptothemarkPrathamhasraisedmillionsofdollars,expandingrapidlytocoverallofIndiawithitsso‐called“Learn‐to‐Read”program,butdo itsstudentsactually learntoread?Recentevidencesuggestsotherwise.AteamofevaluatorsfromEducationforAllfoundthatchildrenwhotookthereadingclassesendedupwith literacy levelssignificantlybelowthoseof theirvillagecounterparts.AfteroneyearofPrathamreading classes, Pratham students could only recognizewordswhereas thosewho steered clear ofPratham programs were able to read full paragraphs. If you have a dime to spare, and want tocontributetotheeducationof India’s illiteratechildren,youmaythinktwicebeforethrowing it intothefountainofPratham’spromises.Notestothegraph:ReadingLevelisanindicatorvariablethattakesvalue0ifthechildcanreadnothing,1ifheknowsthealphabet,2ifhecanrecognizewords,3ifhecanreadaparagraphand4ifhecanreadafullstory.DiscussionTopic2
1. Whattypeofevaluationisthisopinionpieceemploying?
2. Whatrepresentsthecounterfactual?
3. Whataretheproblemswiththistypeofevaluation?
Comparison of reading levels of children who took reading classes Vs. reading levels of children who did
not take them
0
0.5
1
1.5
2
2.5
3
Did not take reading classes/ Took reading classes
Rea
din
g L
evel
Mean reading level for children who did not take reading classesMean reading level for children who took reading classes
J‐PALExecutiveEducationCourse
28
Method3LettertotheEditor:EFAshouldconsiderEvaluatingFairlyandAccuratelyTherehavebeen severalunfair reports in thepress concerningprograms implementedby theNGOPratham. A recent article by a former Education for All bureaucrat claims that Pratham is actuallyhurting the children it recruits into its ‘Learn‐to‐Read’ camps. However, the EFA analysis uses thewrong metric to measure impact. It compares the reading levels of Pratham students with otherchildren in the village—not taking into account the fact that Pratham targets thosewhose literacylevels areparticularlypoor at thebeginning. If Prathamsimply recruited themost literate childreninto their programs, and compared them to their poorer counterparts, they could claim successwithoutconductingasingleclass.ButPrathamdoesnotdo this.Andrealistically,Prathamdoesnotexpectitsilliteratechildrentoovertakethestrongerstudentsinthevillage.Itsimplytriestoinitiateimprovementoverthecurrentstate.Thereforethemetricshouldbeimprovementinreadinglevels—not the final level.Whenwe repeatedEFA’sanalysisusing themore‐appropriateoutcomemeasure,the Pratham kids improved at twice the rate of the non‐Pratham kids (0.6 reading level increasecomparedto0.3).Thisdifferenceisstatisticallyverysignificant.HadtheEFAevaluatorsthoughtto lookat themoreappropriateoutcome, theywouldrecognizetheincrediblesuccessofReadIndia.PerhapstheyshouldenrollinsomePrathamclassesthemselves.DiscussionTopic3
1. Whattypeofevaluationisthisletterusing?
2. Whatrepresentsthecounterfactual?
3. Whataretheproblemswiththistypeofevaluation?
J‐PALExecutiveEducationCourse
29
Method4Thenumbersdon’tlie,unlessyourstatisticiansareasleepPratham celebrates victory, opponents cry foul. A closer look shows that, as usual, the truth issomewhereinbetween.There has been awar in the press between Pratham’s supporters and detractors. Pratham and itsadvocatesassertthattheReadIndiacampaignhasresultedinlargeincreasesinchildliteracy.Severaldetractors claim that Pratham programs, by pulling attention away from the schools, are in factcausingsignificantharmtothestudents.Unfortunately,thisbattleisbeingwagedusinginstrumentsofanalysisthatareseriouslyflawed.Theultimatevictimisthepublicwhoislookingforananswertothequestion:isPrathamhelpingitsintendedbeneficiaries?Thisreportusessophisticatedstatisticalmethodstomeasure thetrue impactofPrathamprograms.Wewere concerned about other variables confounding previous results.We therefore conducted asurveyinthesevillagestocollect informationonchildage,grade‐level,andparents’educationlevel,andusedthosetopredictchildtestscores.
Table 1: Reading outcomes
Level Improvement
(1) (2) (3) (4)
Reading Classes -0.68 ** 0.04 0.24 ** 0.11(0.0829) (0.1031) (0.0628) (0.1081)
Previous reading level 0.71 **(0.0215)
Age 0.00 -0.01(0.0182) (0.0194)
Sex -0.01 0.05(0.0469) (0.0514)
Standard 0.02 -0.08 **(0.0174) (0.0171)
Parents Literate 0.04 0.13 **(0.0457) (0.0506)
Constant 2.82 0.36 0.37 0.75(0.0239) (0.2648) (0.0157) (0.3293)
School-type controls No Yes No 0.37
Notes: The omitted category for school type is "Did not go to school". Reading Level is an indicator variable that
takes value 0 if the child can read nothing, 1 if he knows the alphabet, 2 if he can recognize words, 3 if he can read a
paragraph and 4 if he can read a full story
NOTE:Datausedinthiscasearereal.“Articles”onthedebatewereartificiallyproducedforthepurposeofthecase.EducationforAll(EFA)nevermadeanyoftheclaimsdescribedherein
Looking at Table 1, we find some positive results, some negative results and some “no‐results”,depending onwhich variables we control for. The results from column (1) suggest that Pratham’sprogramhurtthechildren.ThereisanegativecorrelationbetweenreceivingPrathamclassesandfinal
Control variables: (independent) variables other than the reading classes that may influence children’s reading outcomes
Key independent variable: reading classes are the treatment; the analysis tests the effect of these classes on reading outcomes
Statistical significance: the corresponding result is unlikely to have occurred by chance, and thus is statistically significant (credible)
Dependent variables: reading level and improvement in reading level are the primary outcomes in this analysis.
J‐PALExecutiveEducationCourse
30
reading outcomes (‐0.68). Column (3), which evaluates improvement, suggestsimpressiveresults(0.24).Butlookingatchildoutcomes(eitherlevelorimprovement)controllingforinitial reading levels, age, gender, standard and parent’s education level – all determinants of childreadinglevels–wefoundnoimpactofPrathamprograms.Therefore,controllingfortherightvariables,wehavediscoveredthatononehand,Prathamhasnotcausedtheharmclaimedbycertainopponents,butontheotherhand,ithasnothelpedchildrenlearn.Prathamhasthereforefailedinitsefforttoconvinceusthatitcanspenddonormoneyeffectively.DiscussionTopic4
1. Whattypeofevaluationisthisreportutilizing?
2. Whatrepresentsthecounterfactual?
3. Whataretheproblemswiththistypeofevaluation?Thetableonthenexttwopagesreviewsthedifferentimpactevaluationmethodologies.
J-PAL Executive Education Course
31
Methodology Description Who is in the comparison group? Required Assumptions Required Data
Quasi‐Experimental M
ethods
Pre‐Post Measure how program participants improved (or changed) over time.
Program participants themselves—before participating in the program.
The program was the only factor influencing any changes in the measured outcome over time.
Before and after data for program participants.
Simple Difference
Measure difference between program participants and non‐participants after the program is completed.
Individuals who didn’t participate in the program (for any reason), but for whom data were collected after the program.
Non‐participants are identical to participants except for program participation, and were equally likely to enter program before it started.
After data for program participants and non‐participants.
Differences in Differences
Measure improvement (change) over time of program participants relative to the improvement (change) of non‐participants.
Individuals who didn’t participate in the program (for any reason), but for whom data were collected both before and after the program.
If the program didn’t exist, the two groups would have had identical trajectories over this period.
Before and after data for both participants and non‐participants.
Multivariate Regression
Individuals who received treatment are compared with those who did not, and other factors that might explain differences in the outcomes are “controlled” for.
Individuals who didn’t participate in the program (for any reason), but for whom data were collected both before and after the program. In this case data is not comprised of just indicators of outcomes, but other “explanatory” variables as well.
The factors that were excluded (because they are unobservable and/or have been not been measured) do not bias results because they are either uncorrelated with the outcome or do not differ between participants and non‐participants.
Outcomes as well as “control variables” for both participants and non‐participants.
Statistical Matching
Individuals in control group are compared to similar individuals in experimental group.
Exact matching: For each participant, at least one non‐participant who is identical on selected characteristics. Propensity score matching: non‐participants who have a mix of characteristics which predict that they would be as likely to participate as participants.
The factors that were excluded (because they are unobservable and/or have been not been measured) do not bias results because they are either uncorrelated with the outcome or do not differ between participants and non‐participants.
Outcomes as well as “variables for matching” for both participants and non‐participants.
Regression Discontinuity Design
Individuals are ranked based on specific, measureable criteria. There is some cutoff that determines whether an individual is eligible to participate. Participants are then compared to non‐
Individuals who are close to the cutoff, but fall on the “wrong” side of that cutoff, and therefore do not get the program.
After controlling for the criteria (and other measures of choice), the remaining differences between individuals directly below and directly above the cut‐off score are not statistically significant and will not bias the results. A necessary but
Outcomes as well as measures on criteria (and any other controls).
J‐PALExecutiveEducationCourse
32
Methodology Description Who is in the comparison group? Required Assumptions Required Data
participants and the eligibility criterion is controlled for.
sufficient requirement for this to hold is that the cut‐off criteria are strictly adhered to.
Instrumental Variables
Participation can be predicted by an incidental (almost random) factor, or “instrumental” variable, that is uncorrelated with the outcome, other than the fact that it predicts participation (and participation affects the outcome).
Individuals who, because of this close to random factor, are predicted not to participate and (possibly as a result) did not participate.
If it weren’t for the instrumental variable’s ability to predict participation, this “instrument” would otherwise have no effect on or be uncorrelated with the outcome.
Outcomes, the “instrument,” and other control variables.
Experimen
tal M
ethod
Randomized Evaluation
Experimental method for measuring a causal relationship between two variables.
Participants are randomly assigned to the control groups.
Randomization “worked.” That is, the two groups are statistically identical (on observed and unobserved factors).
Outcome data for control and experimental groups. Control variables can help absorb variance and improve “power”.
J-PAL Executive Education Course
33
CASESTUDY3:EXTRATEACHERPROGRAM
Thiscasestudyisbasedonthepaper“PeerEffectsandtheImpactof
Tracking:EvidencefromaRandomizedEvaluationinKenya,”byEstherDuflo(MIT),PascalineDupas(Stanford),andMichaelKremer(Harvard)
J‐PALthankstheauthorsforallowingustousetheirpaper
Case 2: Remedial Education in IndiaEvaluating the Balsakhi Program
Incorporating random assignment into the program
Case3:Extra TeacherProgramDesigninganevaluationtoanswerthreekeyeducationpolicyquestions
J‐PALExecutiveEducationCourse
34
KeyVocabulary1. Level of Randomization: the level of observation (ex. individual, household, school, village) at
which treatment and comparison groups are randomly assigned.
Over‐crowdedschools
Likemanyotherdevelopingcountries,KenyahasrecentlymaderapidprogresstowardtheMillenniumDevelopment Goal of universal primary education. Largely due to the elimination of school fees in2003,primaryschoolenrollmentrosenearly30percent,from5.9millionto7.6millionbetween2002and2005.2Without accompanying government funding, however, this progress has created its own set of newchallengesinKenya:
1. Large class size: Due to budget constraints, the rise in primary school enrollment has notbeenmatchedbyproportionalincreasesinthenumberofteachers.(Teachersalariesalreadyaccount for the largest component of educational spending.) The result has been very largeclasssizes,particularlyinlowergrades.InasampleofschoolsinWesternKenya,forexample,theaveragefirstgradeclassin2005was83students.Thisisconcerningbecauseitisbelievedthat small classesaremost important for theyoungest students,whoarestill acclimating tothe school environment. TheKenyanNationalUnion of Teachers estimates that the countryneeds an additional 60,000 primary school teachers in addition to the existing 175,000 inordertoreachallprimarystudentsanddecreaseclasssizes.
2. Teacher absenteeism: Further exacerbating the problem of pupil‐teacher ratios, teacher
absenteeismremainshigh,reachingnearly20%insomeareasofKenya.Therearetypicallynosubstitutesforabsentteachers,sostudentssimplymillaround,gohomeorjoinanotherclass,often of a different grade. Small schools, which are prevalent in rural areas of developingcountries, may be closed entirely as a result of teacher absence. Families have to considerwhether school will even be open when deciding whether or not to send their children toschool.Anobviousresultislowstudentattendance—evenondayswhentheschoolisopen.
3. Heterogeneousclasses:ClassesinKenyaarealsoveryheterogeneouswithstudentsvarying
widely in terms of school preparedness and support from home. Grouping students intoclassessortedbyability(tracking,orstreaming)iscontroversialamongacademicsandpolicymakers.Ononehand,ifteachersfinditeasiertoteachahomogeneousgroupofstudents,trackingcouldimproveschooleffectivenessandtestscores.Manyargue,onthe other hand, that if students learn in part from their peers, tracking could
2 UNESCO. (2006). United Nations Education, Scientific and Cultural Organization. Fact Book on Education for All. Nairobi: UNESCO Publishing, 2006.
J‐PALExecutiveEducationCourse
35
disadvantage low achieving students while benefiting high achievingstudents,therebyexacerbatinginequality.
4. Scarce schoolmaterials: Because of the high costs of educational inputs and the rising
numberofstudents,educationalresourcesotherthantheteacherarestretched,andinsomecases up to four students must share one textbook. And an already over‐burdenedinfrastructuredeterioratesfasterwhenforcedtoservemorechildren.
5. Lowcompletionrates:Asa resultof these factors, completion rates arevery low inKenya
withonly45.1%ofboysand43.3%ofgirlscompletingthefirstgrade.All in all, these issues pose new challenges to communities: how to ensure minimum quality ofeducationgivenKenya’sbudgetconstraints.
ContractTeachers:Apossiblesolution?
Governmentsinseveraldevelopingcountrieshaverespondedtosimilarchallengesbystaffingunfilledteachingpositionswithlocally‐hiredcontractteacherswhoarenotcivilserviceemployees.Thefourmaincharacteristicsofcontractteachersarethattheyare:
1. Appointed on annual renewable contracts, with no guarantee of renewed employment(unlikeregularcivilserviceteachers)
2. Often lessqualified than regular teachers andmuch less likely tohave a formal teachertrainingcertificateordegree
3. Paidlowersalariesthanthoseofregularteachers(typicallylessthanafifthofthesalariespaidtoregularteachers)
4. Morelikelytobefromthelocalareawheretheschoolislocated.
AreContractTeachersEffective?
The increasing use of contract teachers has been one of themost significant policy innovations inproviding primary education in developing countries, but it has also been highly controversial.Supporters say that using contract teachers is an efficient way of expanding education access andqualitytoalargenumberoffirst‐generationlearners.Knowingthattheschoolcommittee’sdecisionofwhetheror not to rehire them the following yearmayhingeonperformance, contract teachers aremotivatedtotryharderthantheirtenuredgovernmentcounterparts.Contractteachersarealsooftenmoresimilartotheirstudents,geographically,culturally,andsocioeconomically.Opponentsarguethatusingunder‐qualifiedanduntrainedteachersmaystaffclassrooms,butwillnotproduce learning outcomes. Furthermore the use of contract teachers de‐professionalizes teaching,reducestheprestigeoftheentireprofession,andreducesmotivationofallteachers.Evenifithelpsintheshortterm,itmayhurteffortstorecruithighlyqualifiedteachersinthefuture.While the use of contract teachers has generated much controversy, there is very little rigorousevidenceregardingtheeffectivenessofcontractteachersinimprovingstudentlearningoutcomes.
J‐PALExecutiveEducationCourse
36
TheExtraTeacherProgramRandomizedEvaluation
InJanuary2005,InternationalChildSupport(ICS)Africainitiatedatwoyearprogramtoexaminetheeffect of contract teachers on education in Kenya. Under the program, ICS gave funds to 140 localschool committees to hire one extra contract teacher to teach an additional first grade class. ICSexpectedthisprogramtoimprovestudentlearningby,amongotherthings,decreasingclasssizeandusingteacherswhoaremoredirectlyaccountable to thecommunities theyserve.However,becausecontractteacherstendtohavelesstrainingandreceivealowermonthlysalarythantheircivilservantcounterparts,therewasconcernaboutwhethertheseteachersweresufficientlymotivated,giventheircompensation,orqualifiedgiventheircredentials.The purpose of this intervention was to address the first three challenges: class size, teacheraccountability, and heterogeneity of ability. The evaluationwas designed tomeasure the impact ofclass‐sizereductions,therelativeeffectivenessofcontractteachers,andhowtrackingbyabilitywouldimpactbothlowandhigh‐achievingstudents.Whatexperimentaldesignscouldtesttheimpactofthisinterventiononeducationalachievement?Whichofthesechangesintheschoollandscapeisprimarilyresponsibleforimprovedstudentperformance?
AddressingMultipleResearchQuestionsthroughExperimentalDesign
Differentrandomizationstrategiesmaybeusedtoanswerdifferentquestions.Whatstrategiescouldbeusedtoevaluatethefollowingquestions?Howwouldyoudesignthestudy?Whowouldbeinthetreatmentandcontrolgroups,andhowwouldtheyberandomlyassignedtothesegroups?DiscussionTopic1:Testingtheeffectivenessofcontractteachers
1. WhatistherelativeeffectivenessofcontractteachersversusregulargovernmentteachersDiscussionTopic2:Lookingatmoregeneralapproachesofimprovingeducation
1. Whatistheeffectofgroupingstudentsbyabilityonstudentperformance?
2. Whatistheeffectofsmallerclasssizesonstudentperformance?DiscussionTopic3:Addressingallquestionswithasingleevaluation
1. Couldasingleevaluationexplorealloftheseissuesatonce?2. Whatrandomizationstrategycoulddoso?
J‐PALExecutiveEducationCourse
37
ExerciseA:RandomSampling&TheLawofLargeNumbers(30min)‐Excel
Inthisexercise,wewillvisuallyexplorerandomsamplesofdifferentsizesfromagivenpopulation.Inparticular,wewilltrytodemonstratethatlargersamplesizestendtobemorereflectiveoftheunderlyingpopulation.YourGroupLeaderhasthedataforthisexercise.
1. Openthefile“ExerciseA_SamplingDistributions_NEW.xlsm”.
2. Ifprompted,select“EnableMacros”.
3. Navigatetothe“Randomize”worksheet,whichallowsyoutochoosearandomsampleofsize“SampleSize”fromthedatacontainedinthe“control”worksheet.
4. Enter“10”for“SampleSizeandclickthe“Randomize”button.Observethedistributionof
thevariouscharacteristicsbetweenTreatment,ControlandExpected.Withasamplesizethissmall,thepercentagedifferencefromtheexpectedaverageisquitehighforreadingscores.Click“Randomize”multipletimesandobservehowthedistributionchanges.
5. Now,try“50”forthesamplesize.Whathappenstothedistributions?Randomizeafew
timesandobservethepercentagedifferenceforthereadingscores.
6. Increasethesamplesizeto“500”,“2000”and“10000”,andrepeattheobservationsfromstep5.Whatcanwesayaboutlargersamplesizes?HowdotheyaffectourTreatmentandControlsamples?ShouldthepercentagedifferencebetweenTreatment,ControlandExpectedalwaysgodownasweincreasesamplesize?
J‐PALExecutiveEducationCourse
38
ExerciseB:MechanicsofRandomization(90min)Part1:SimpleRandomizationLikemostspreadsheetprograms,MSExcelhasarandomnumbergeneratorfunction.Saywehadalistofschoolsandwantedtoassignhalftotreatmentandhalftocontrol.(1) Wehaveourlistofallschools.
(2)
J‐PALExecutiveEducationCourse
39
(2) Assignarandomnumbertoeachschool:The function RAND() is Excel’s random number generator. To use it, in Column C, type in thefollowing= RAND()ineachcelladjacenttoeveryname.Oryoucantypethisfunctioninthetoprow(row2)andsimplycopyandpastetotheentirecolumn,orclickanddrag.Typing“=RAND()”putsa15‐digitrandomnumberbetween0and1inthecell.(3) CopythecellsinColumC,thenpastethevaluesoverthesamecellsThefunction=RAND()willre‐randomizeeachtimeyoumakeanychangestoanyotherpartofthespreadsheet.Exceldoesthisbecauseitrecalculatesallvalueswithanychangetoanycell.(Youcanalsoinducerecalculation,andhencere‐randomization,bypressingthekeyF9.)
J‐PALExecutiveEducationCourse
40
Thiscanbeconfusing,however.Oncewe’vegeneratedourcolumnofrandomnumbers,wedonotneed to re‐randomize. We already have a clean column of random values. To stop excel fromrecalculating,youcanreplacethe“functions”inthiscolumnwiththe“values”.Todothis,highlightallvaluesinColumnC.Thenright‐clickanywhereinthehighlightedcolumn,andchooseCopy.ThenrightclickanywhereinthatcolumnandchosePasteSpecial.The“PasteSpecialwindowwillappear.Clickon“Values”.
(4)
J‐PALExecutiveEducationCourse
41
(4) SortthecolumnsineitherdescendingorascendingorderofcolumnC:HighlightcolumnsA,B,andC.Inthedatatab,presstheSortbutton:
ASortboxwillpopup.
IntheSortbycolumn,select“random#”.ClickOK.Doingthissortsthelistbytherandomnumberinascendingordescendingorder,whicheveryouchose.
J‐PALExecutiveEducationCourse
42
There!Youhavearandomlysortedlist.
(5) SortthecolumnsineitherdescendingorascendingorderofcolumnC:Becauseyourlistisrandomlysorted,itiscompletelyrandomwhetherschoolsareinthetophalfofthe list, or thebottomhalf. Therefore, if you assign the tophalf to the treatment groupand thebottomhalftothecontrolgroup,yourschoolshavebeen“randomlyassigned”.In columnD, type “T” for the firsthalfof the rows (rows2‐61). For the secondhalf of the rows(rows62‐123),type“C”
J‐PALExecutiveEducationCourse
43
Re‐sort your list back in order of school id. You’ll see that your schools have been randomlyassignedtotreatmentandcontrolgroups.
J‐PALExecutiveEducationCourse
44
Part2:StratifiedRandomizationStratification is the process of dividing a sample into groups, and then randomly assigningindividualswithineachgrouptothetreatmentandcontrol.Thereasonsfordoingthisarerathertechnical.Onereasonforstratifyingisthatitensuressubgroupsarebalanced,makingiteasiertoperform certain subgroup analyses. For example, if youwant to test the effectiveness of a neweducationprogramseparatelyforschoolswherechildrenaretaughtinHindiversusschoolswherechildrenaretaughtinGujarati,youcanstratifyby“languageofinstruction”andensurethatthereareanequalnumberschoolsofeachlanguagetypeinthetreatmentandcontrolgroups.(1) Wehaveourlistofschoolsandpotential“strata”.Mechanically,theonlydifferenceinrandomsortingisthatinsteadofsimplysortingbytherandomnumber,youwouldfirstsortbylanguage,andthentherandomnumber.Obviously,thefirststepistoensureyouhavethevariablesbywhichyouhopetostratify.(2) Sortbystrataandthenbyrandomnumber.Assumingyouhaveall thevariablesyouneed: in thedatatab,click“Sort”.TheSortwindowwillpopup.Sortby“Language”.Pressthebutton,“AddLevel”.Thenselect,“Random#”.
J‐PALExecutiveEducationCourse
45
(3) AssignTreatment–ControlStatusforeachgroup.Withineachgroupoflanguages,type“T”forthefirsthalfoftherows,and“C”forthesecondhalf.
J‐PALExecutiveEducationCourse
46
PrimeronPowerCalculation(30min)
Samplemeansandthenormaldistribution
J‐PALExecutiveEducationCourse
47
J‐PALExecutiveEducationCourse
48
J‐PALExecutiveEducationCourse
49
J‐PALExecutiveEducationCourse
50
J‐PALExecutiveEducationCourse
51
J‐PALExecutiveEducationCourse
52
GroupExerciseC:PowerCalculation(30min)
KeyVocabulary1. Power: the likelihood that, when the program has an effect, one will be able to distinguish the
effect from zero given the sample size.
2. Significance: the likelihood that the measured effect did not occur by chance. Statistical tests are
performed to determine whether one group (e.g. the experimental group) is different from another
group (e.g. comparison group) on the measurable outcome variables used in the evaluation.
3. Standard deviation: a standardized measure of the variation of a sample population from its
mean on a given characteristic/outcome. Mathematically, the square root of the variance.
4. Standardized effect size: a standardized measure of the [expected] magnitude of the effect of a
program.
5. Cluster: the level of observation at which a sample size is measured. Generally, observations
which are highly correlated with each other should be clustered and the sample size should be
measured at this clustered level.
6. Intra‐cluster correlation coefficient: a measure of the correlation between observations within a
cluster; i.e. the level of correlation in drinking water source for individuals in a household.
Clusterrandomizedtrials
TheExtraTeacherProgram(ETP)casestudydiscussedtheconceptofclusterrandomizedtrials.Itcouldbethatouroutcomeofinterestiscorrelatedforstudentsinthesameclassroom,forreasonsthathavenothingtodowiththeextrateacher.Forexample,allthestudentsinaclassroomwillbeaffectedbytheiroriginalteacher,bywhethertheirclassroomisunusuallydark,oriftheyhaveachalkboard;these factorsmeanthatwhenonestudent intheclassdoesparticularlywell for thisreason,allthestudentsinthatclassroomprobablyalsodobetter—whichmighthavenothingtodowithanextrayear.
Therefore, if we sample 100 kids from 10 randomly selected schools, that sample is lessrepresentativeofthepopulationofschoolsinthecitythanifweselected100randomkidsfromthewhole population of schools, and therefore absorbs less variance. In effect, we have a smallersamplesizethanwethink.Thiswillleadtomorenoiseinoursample,andhencealargerstandarderrorthanintheusualcaseofindependentsampling.Whenplanningboththesamplesizeandthebestwaytosampleclassrooms,weneedtotakethisintoaccount.
Thisexercisewillhelpyouunderstandhowtodothat.Shouldyousampleeverystudentinjustafewschools?Shouldyousampleafewstudentsfrommanyschools?Howdoyoudecide?
Wewillwork throughthesequestionsbydetermining thesamplesize thatallowsus todetectaspecific effectwith at least 80%power. Remember that power is the likelihood thatwhen the
J‐PALExecutiveEducationCourse
53
treatmenthasaneffectyouwillbeabletodistinguishitfromzeroinyoursample.
In this example, “clusters” refer to “clusters of children”—in other words, “classrooms” or“schools”. This exercise shows you how the power of your sample changeswith the number ofclusters, the size of the clusters, the size of the treatment effect and the Intraclass CorrelationCoefficient.WewilluseasoftwareprogramcalledOptimalDesigndevelopedbySteveRaudebushwithfundingfromtheWilliamT.GrantFoundation.Youcanfindadditionalresourcesonclustereddesignsontheirwebsite.
Section1:UsingtheODSoftwareFirstdownloadtheODsoftwarefromthewebsite(asoftwaremanualisalsoavailable): http://sitemaker.umich.edu/group‐based/optimal_design_softwareNotesonOptimalDesign
Menu PersonalLevelRandomization(students) Clusterwithindividualoutcomes(schoolsrandomized,lookatindividualtest
scores) Clusterwithclusteroutcomes(schoolrandomized,schoolleveloutcomes)
Clusterwithindividualoutcomes ClusterRandomizedTrialassignclusters(schools,districts,etc)toCandT BlockedTrialstratification:ex.Stratifyschoolsintosimilarpairsandrandomize
whichareTandC ClusterRandomizedTrial,TreatmentatLevel2
Explain:fivevariables:rho,numberofclusters,power,clustersize,andR2…setthreefixed(oratdifferentlevelsfordifferentlines)togetrelationshipbetweentheothertwoonaxis.
Whenyouopen it,youwillseeascreenwhichlooks liketheonebelow. Select themenuoption“Design”toseetheprimarymenu.Selecttheoption“ClusterRandomizedTrialswithperson‐leveloutcomes,”“ClusterRandomizedTrials,”andthen“Treatmentatlevel2.”You’llseeseveraloptionstogenerategraphs;choose“Powervs.Totalnumberofclusters(J).”
J‐PALExecutiveEducationCourse
54
Anewwindowwillappear:
Selectα(alpha).You’llseeitisalreadysetto0.050fora95%significancelevel.Firstlet’sassumewewanttotestonly40studentsperschool.Howmanyschoolsdoyouneedtogotoinordertohaveastatisticallysignificantanswer?Click onn, which represents the number of students per school. Since we are testing only 40studentsperschool,fillinn(1)with40andclickOK.Now we have to determine δ (delta), the standard effect size (the effect size divided by thestandard deviation of the variable of interest). Assumewe are interested in detectingwhetherthereisanincreaseof10%intestscores.(Ormoreaccurately,areuninterestedinanincreaseofless than10%.)Ourbaseline survey indicated that the average test score is26,with a standarddeviationof20.Wewanttodetectaneffectsizeof10%of26,whichis2.6.Wedivide2.6bythestandarddeviationtogetδequalto2.6/20,or0.13.Selectδfromthemenu.Inthedialogueboxthatappearsthereisaprefilledvalueof0.20fordelta(1).Changethevalueto0.13,andchangethevalueofdelta(2)toempty.SelectOK.Finallyweneedtochooseρ(rho),whichistheintra‐clustercorrelation.ρtellsushowstronglytheoutcomesarecorrelatedforunitswithinthesamecluster.Ifstudentsfromthesameschoolwereclones (novariation) andall scored the sameon the test, thenρwould equal1. If, on theotherhand, students from the same schools are in fact independent—and there were no differencesbetweenschools‐thenρwouldequal0.Youhavedeterminedinyourpilotstudythatρis0.17.Fillinrho(1)to0.17,andsetrho(2)tobeempty.Youshouldseeagraphsimilartotheonebelow.
J‐PALExecutiveEducationCourse
55
You’llnoticethatyourxaxisisn’tlongenoughtoallowyoutoseewhatnumberofclusterswould
giveyou80%power. Clickonthe buttontosetyourxaxismaximumto400.Thenyoucanclickonthegraphwithyourmousetoseetheexactpowerandnumberofclustersforaparticularpoint.
Exercise11. Howmanyschoolsareneededtoachieve80%power?90%power?
J‐PALExecutiveEducationCourse
56
Nowyouhaveseenhowmanyclustersyouneedfor80%powerwhensampling40studentsperschool. Suppose instead that you only have the ability to go to 124 schools (this is the actualnumberthatwassampledintheBalsakhiprogram).Exercise2
1. Howmanychildrenareneededperschooltoachieve80%power?90%power?2. Choosedifferentvaluesforntoseehowyourgraphchanges.
Finally,let’sseehowtheIntraclassCorrelationCoefficient(ρ)changesthepowerofagivensample.Leaverho(1)tobe0.17butforcomparisonchangerho(2)to0.0.Youshouldseeagraphliketheonebelow.Thesolidbluecurveistheonewiththeparametersyou’veset.Thebluedashedcurveisthereforcomparison–toseehowmuchpoweryouwouldgetfromyoursampleifρwerezero.Lookcarefullyatthegraph.Exercise3
1. Howdoesthepowerofthesamplechangewiththeintraclasscorrelationcoefficient(ρ)?Totakealookatsomeoftheothermenuoptions,closethegraphbyclickingonthe inthetoprighthandcorneroftheinnerwindow.SelecttheClusterRandomizedTrialmenuagain.
J‐PALExecutiveEducationCourse
57
Exercise4
1. Try generating graphs for how power changes with the cluster size (n), intra‐classcorrelation(rho)andeffectsize(delta).Youwillhavetore‐enteryourpre‐testparameterseachtimeyouopenanewgraph.
J‐PALExecutiveEducationCourse
58
CASESTUDY4:DEWORMINGINKENYA
ThiscasestudyisbasedonEdwardMiguelandMichaelKremer,“Worms:IdentifyingImpactsonEducationandHealthinthePresenceofTreatment
Externalities,”Econometrica72(1):159‐217,2004
J‐PALthankstheauthorsforallowingustousetheirpaper
Case 4: Deworming in KenyaManaging threats to experimental integrity
Case4:DeworminginKenyaManagingthreatstoexperimentalintegrity
J‐PALExecutiveEducationCourse
59
KeyVocabulary1. Phase‐in Design: a study design in which groups are individually phased into treatment over a
period of time; groups which are scheduled to receive treatment later act as the comparison groups
in earlier rounds.
2. Equivalence: groups are identical on all baseline characteristics, both observable and
unobservable. Ensured by randomization.
3. Attrition: the process of individuals joining in or dropping out of either the treatment or
comparison group over the course of the study.
4. Attrition Bias: statistical bias which occurs when individuals systematically join in or drop out of
either the treatment or the comparison group for reasons related to the treatment or outcomes.
5. Partial Compliance: individuals do not comply with their assignment (to treatment or
comparison). Also termed "diffusion" or "contamination."
6. Intention to Treat: the measured impact of a program that includes all data from participants in
the groups to which they were randomized, regardless of whether they actually received the
treatment. Intention‐to‐treat analysis prevents bias caused by the loss of participants, which may
disrupt the baseline equivalence established by randomization and which may reflect non‐
adherence to the protocol.
7. Treatment on the Treated: the measured impact of a program that includes only the data for
participants who actually received the treatment.
8. Externality: an indirect cost or benefit incurred by individuals who did not directly receive the treatment. Also termed "spillover."
Worms‐Acommonproblemwithacheapsolution
Worminfectionsaccountforover40percentoftheglobaltropicaldiseaseburden.Infectionsarecommon in areas with poor sanitation. More than 2 billion people are affected. Children, stilllearning good sanitary habits, are particularly vulnerable: 400 million school‐age children arechronicallyinfectedwithintestinalworms.Thesymptomsassociatedwithworminfectionsincludelistlessness,diarrhea,abdominalpain,andanemia.Beyondtheireffectsonhealthandnutrition,heavyworminfectionscanimpairchildren’sphysicalandmentaldevelopmentandreducetheirattendanceandperformanceinschool.Poorsanitationandpersonalhygienehabitsfacilitatetransmission.Infectedpeopleexcretewormeggsintheirfecesandurine.Inareaswithpoorsanitation,theeggscontaminatethesoilorwater.Otherpeopleareinfectedwhentheyingestcontaminatedfoodorsoil(hookworm,whipworm,androundworm),orwhenhatchedwormlarvaepenetratetheirskinuponcontactwithcontaminatedsoil (hookworm) or fresh water (schistosome). School‐age children are more likely to spreadwormsbecause theyhaveriskierhygienepractices (more likely toswim incontaminatedwater,
J‐PALExecutiveEducationCourse
60
morelikelytonotusethelatrine,lesslikelytowashhandsbeforeeating).Sotreatingachildnotonly reduces her ownworm load; itmay also reduce disease transmission—and so benefit thecommunityatlarge.Treatmentkillswormsinthebody,butdoesnotpreventre‐infection.Oralmedicationthatcankill99percentofwormsinthebodyisavailable:albendazoleormebendazolefortreatinghookworm,roundworm,andwhipworminfections;andpraziquantelfortreatingschistosomiasis.Thesedrugsarecheapandsafe.Adoseofalbendazoleormebendazolecostslessthan3UScentswhileonedoseofpraziquantelcostslessthan20UScents.Thedrugshaveveryfewandminorsideeffects.Wormscolonizetheintestinesandtheurinarytract,buttheydonotreproduceinthebody;theirnumbers build up only through repeated contact with contaminated soil or water. The WHOrecommendspresumptive school‐basedmassdeworming in areaswithhighprevalence. Schoolswithhookworm,whipworm,androundwormprevalenceover50percentshouldbemasstreatedwith albendazole every 6months, and schoolswith schistosomiasis prevalence over 30 percentshouldbemasstreatedwithpraziquantelonceayear.
PrimarySchoolDewormingProgram
International Child Support Africa (ICS) implemented the Primary School Deworming Program(PSDP)intheBusiaDistrictinwesternKenya,adensely‐settledregionwithhighwormprevalence.TreatmentfollowedWHOguidelines.ThemedicinewasadministeredbypublichealthnursesfromtheMinistryofHealthinthepresenceofhealthofficersfromICS.ThePSDPwasexpectedtoaffecthealth,nutrition,andeducation.Tomeasureimpact,ICScollecteddata on a series of outcomes: prevalence of worm infection, worm loads (severity of worminfection);self‐reportedillness;andschoolparticipationratesandtestscores.
Evaluationdesign—theexperimentasplanned
Because of administrative and financial constraints the PSDP could not be implemented in allschools immediately. Instead, the75schoolswererandomlydivided into3groupsof25schoolsandphased‐inover3years.Group1schoolsweretreatedstartinginboth1998and1999,Group2schoolsin1999,andGroup3startingin2001.Group1schoolswerethetreatmentgroupin1998,whileschoolsGroup2andGroup3were thecomparison. In1999Group1andGroup2schoolswerethetreatmentandGroup3schoolsthecomparison.
1998 1999 2001Group1 Treatment Treatment Treatment
Group2 Comparison Treatment Treatment
Group3 Comparison Comparison Treatment
J‐PALExecutiveEducationCourse
61
ThreatstointegrityoftheplannedexperimentDiscussionTopic1:Threatstoexperimentalintegrity
Randomizationensuresthatthegroupsareequivalent,andthereforecomparable,atthebeginningof the program. The impact is then estimated as the difference in the average outcome of thetreatment group and the average outcome of the comparison group, both at the end of theprogram.Tobeabletosaythattheprogramcausedtheimpact,youneedtobeabletosaythattheprogramwastheonlydifferencebetweenthetreatmentandcomparisongroupsoverthecourseoftheevaluation.
1. Whatdoesitmeantosaythatthegroupsareequivalentatthestartoftheprogram?2. Canyoucheckifthegroupsareequivalentatthebeginningoftheprogram?How?3. Otherthantheprogram’sdirectandindirectimpacts,whatcanhappenoverthecourseofthe
evaluation(afterconductingtherandomassignment)tomakethegroupsnon‐equivalent?4. Howdoesnon‐equivalenceattheendthreatentheintegrityoftheexperiment?
Managingattrition‐Whenthegroupsdonotremainequivalent
Attritioniswhenpeoplejoinordropoutofthesample—bothtreatmentandcomparisongroups—overthecourseof theexperiment.Onecommonexample inclinical trials iswhenpeopledie;socommonindeedthatattritionissometimescalledexperimentalmortality.DiscussionTopic2:Managingattrition
Youarelookingatthehealtheffectsofdeworming.Inparticularyouarelookingatthewormload(severityofworminfection).Wormloadsarescaledasfollows:
‐ Heavyworminfections=scoreof3‐ Mediumworminfections=scoreof2‐ Lightinfections=scoreof1
Thereare30,000children:15,000 in treatmentschoolsand15,000 incomparisonschools.Afteryourandomize,thetreatmentandcomparisongroupsareequivalent,meaningchildrenfromeachofthethreecategoriesareequallyrepresentedinbothgroups.Supposeprotocolcomplianceis100percent:allchildrenwhoareinthetreatmentgettreatedandnoneofthechildreninthecomparisonaretreated.Childrenthatweredewormedatthebeginningoftheschoolyear(thatis,childreninthetreatmentgroup)endupwithawormloadof1attheendoftheyearbecauseofre‐infection.Childrenwhohaveawormloadof3onlyattendhalfthetimeanddropoutofschooliftheyarenottreated.Thenumberofchildrenineachworm‐loadcategoryisshownforboththepretestandposttest.
J‐PALExecutiveEducationCourse
62
Pretest PosttestWormLoad Treatment Comparison Treatment Comparison3 5,000 5,000 0 Droppedout2 5,000 5,000 0 5,0001 5,000 5,000 15,000 5,000Total childrentestedatschool 15,000 15,000 15,000 10,000
1. a) Atposttest,whatistheaveragewormloadforthetreatmentgroup
b) Atposttest,whatistheaveragewormloadforthecomparisongroup?c) Whatisthedifference?d) Isthisoutcomedifferenceanaccurateestimateoftheimpactoftheprogram?Whyor
whynot?e) Ifitisnotaccurate,doesitoverestimateorunderestimatetheimpact?f) Howcanwegetabetterestimateoftheprogram’simpact?
2. Besides worm load, the PSDP also looked at outcomemeasures such as school attendanceratesandtestscores.
a) Would differential attrition (i.e. differences in drop‐outs between treatment andcomparisongroups)biaseitheroftheseoutcomes?How?
b) Would the impacts on these final outcome measures be underestimated oroverestimated?
3. In Case 1, you learned about othermethods to estimate program impact, such as pre‐post,
simpledifference,differencesindifferences,andmultivariateregression.a) Doesthethreatofattritiononlypresentitselfinrandomizedevaluations?
Managingpartialcompliance‐whenthetreatmentgroupdoesnotgettreatedorthecomparisongroupdoesgettreatedSomepeopleassignedtothetreatmentmayintheendnotactuallygettreated.Inanafter‐schooltutoringprogram,forexample,somechildrenassignedtoreceivetutoringmaysimplynotshowupfortutoring.Andtheothersassignedtothecomparisonmayobtainaccesstothetreatment,eitherfrom the program or from another provider. Or comparison group childrenmay get extra helpfromtheteachersoracquireprogrammaterialsandmethodsfromtheirclassmates.Inanyofthesescenarios, people are not complying with their assignment in the planned experiment. This iscalled “partial compliance” or “diffusion” or, less benignly, “contamination.” In contrast tocarefully‐controlledlabexperiments,diffusionisubiquitousinsocialprograms.Afterall,lifegoeson,peoplewillbepeople,andyouhavenocontroloverwhattheydecidetodooverthecourseoftheexperiment.Allyoucandoisplanyourexperimentandofferthemtreatments.How,then,canyoudealwiththecomplicationsthatarisefrompartialcompliance?
J‐PALExecutiveEducationCourse
63
DiscussionTopic3:Managingpartialcompliance
Supposenoneofthechildrenfromthepoorestfamilieshaveshoesandsotheyhavewormloadsof3.Thoughtheirparentshadnotpaidtheschool fees, thechildrenwereallowedtostay inschoolduringtheyear.Parentalconsentwasrequiredfortreatment,andtogiveconsent,theparentshadtocometotheschoolandsignaconsentformintheheadmaster’soffice.However,becausetheyhadnotpaidschoolfees,thepoorestparentswerereluctanttocometotheschool.Consequently,none of the children with worm loads of 3 were actually dewormed. Their worm load scoresremained3attheendoftheyear.Nooneassignedtocomparisonwastreated.Allthechildreninthesampleatthebeginningoftheyearwerefollowedup,ifnotatschoolthenathome. Pretest PosttestWormLoad Treatment Comparison Treatment Comparison3 5,000 5,000 5000 5,0002 5,000 5,000 0 5,0001 5,000 5,000 10,000 5,000Total childrentestedatschool
15,000 15,000 15,000 15,000
1. a) Calculatetheimpactestimatebasedontheoriginalgroupassignments.
b) Thisisanunbiasedmeasureoftheeffectoftheprogram,butinwhatwaysisitusefulandinwhatwaysisitnotasuseful?
2. Youareinterestedinlearningtheeffectoftreatmentonthoseactuallytreated(“treatmenton
thetreated”orToTestimate).Fiveofyourcolleaguesarepassingbyyourdesk;theyallagreethatyoushouldcalculatetheeffectofthetreatmentusingonlythe10,000childrenwhoweretreated.
a) Isthisadvicesound?Whyorwhynot?
3. Anothercolleaguesaysthatit’snotagoodideatodroptheuntreatedentirely;youshouldusethembutconsiderthemaspartofthecomparison.
a) Isthisadvicesound?Whyorwhynot?
4. Another colleague suggests that you use the compliance rates, the proportion of people ineachgroupthatdidordidnotcomplywiththeirtreatmentassignment.Youshoulddividethe“intentiontotreat”estimatebythedifferenceinthetreatmentratios(i.e.proportionsofeachexperimentalgroupthatreceivedthetreatment).
a) Isthisadvicesound?Whyorwhynot?
J‐PALExecutiveEducationCourse
64
Managingspillovers—whenthecomparison,itselfuntreated,benefitsfromthetreatmentgroupbeingtreated
People assigned to the control groupmaybenefit indirectly from those receiving treatment. Forexample, aprogramthatdistributes insecticide‐treatednetsmayreducemalaria transmission inthecommunity,indirectlybenefitingthosewhothemselvesdonotsleepunderanet.Sucheffectsarecalledexternalitiesorspillovers.DiscussionTopic4:Managingspillovers
In thedewormingprogram, randomizationwasat the school level.However,while all boysat agiventreatmentschoolweretreated,onlygirlsyoungerthanthirteenreceivedthedewormingpill.ThiswasduetothefactthattheWorldHealthOrganization(WHO)hadnottested(andthusnotyetapproved)thedewormingpillforpregnantwomen.Becauseitwasdifficulttodeterminewhichgirlswereatriskofgettingpregnant,theprogramdecidedtonotadministerthemedicationtoanygirlthirteenorolder.(Postscript:sincethedewormingevaluationwasimplemented,theWHOhasapprovedthedewormingmedicationforpregnantwomen).Thusatagiventreatmentschool,therewasadistinctgroupofstudentsthatwasnevertreatedbutlivedinverycloseproximitytoagroupthatwastreated.Supposeprotocolcomplianceis100percent:allboysandgirlsunderthirteenintreatmentschoolsgettreatedandallgirlsthirteenandoverintreatmentschoolsaswellasallchildrenincomparisonschoolsdonotgettreated.Youcanassumethatduetoproperrandomization,thedistributionofwormloadacrossthethreegroupsofstudentsisequivalentbetweentreatmentandcontrolschoolspriortotheintervention.
PosttestTreatment Comparison
WormLoad Allboys Girls<13yrs Girls≥13yrs Allboys Girls<13yrs Girls≥13yrs
3 0 0 0 5000 2000 20002 0 0 2000 5000 3000 30001 10000 5000 3000 0 0 0
Totalchildrentestedatschool
20000 20000
1. a) Ifthereareanyspillovers,wherewouldyouexpectthemtoshowup?
b) Isitpossibleforyoutocapturethesepotentialspillovereffects?How?
2. a) Whatisthetreatmenteffectforboysintreatmentv.comparisonschools?b) Whatisthetreatmenteffectforgirlsunderthirteenintreatmentv.comparisonschools?c) Whatisthedirecttreatmenteffectamongthosewhoweretreated?d) What is the treatment effect for girls thirteen and older in treatment v. comparison
schools?
J‐PALExecutiveEducationCourse
65
e) Whatistheindirecttreatmenteffectduetospillovers?f) Whatisthetotalprogrameffect?
References:Crompton, D.W.T. 1999. “How Much Helminthiasis Is There in the World?” Journal ofParasitology85:397–403.Kremer,MichaelandEdwardMiguel.2007.“TheIllusionofSustainability,”QuarterlyJournalofEconomics122(3)Miguel, Edward, and Michael Kremer. 2004. “Worms: Identifying Impacts on Education andHealthinthePresenceofTreatmentExternalities,”Econometrica72(1):159‐217.Shadish,William R, Thomas D. Cook, and Donald T. Campbell. 2002.Experimental andQuasi‐ExperimentalDesignsforGeneralizedCausalInference.Boston,MA:HoughtonMifflinCompanyWorld Bank. 2003. “School Deworming at a Glance,” Public Health at a Glance Series.http://www.worldbank.org/hnpWHO.1999.“TheWorldHealthReport1999,”WorldHealthOrganization,Geneva.WHO.2004.“ActionAgainstWorms”PartnersforParasiteControlNewsletter,Issue#1,January2004,www.who.int/wormcontrol/en/action_against_worms.pdf
J‐PALExecutiveEducationCourse
66
GroupPresentation
Participants will form 4‐6 person groups which will work through the design process for arandomisedevaluationofadevelopmentproject.Groupswillbeaidedinthisprojectbyboththefacultyandteachingassistants,withtheworkculminatinginpresentationsattheendoftheweek.
Thegoalofthegrouppresentationistoconsolidateandapplytheknowledgeofthelecturesandtherebyensurethatparticipantsleavewiththeknowledge,experience,andconfidencenecessarytoconducttheirownrandomisedevaluations.Weencouragegroupstoworkonprojectsthatarerelevanttoparticipants’organisations.
All groups will present on Friday. The 15‐minute presentation is followed by a 15‐minutediscussion led by J‐PAL affiliates and staff. We provide groups with template slides for theirpresentation(seenextpage).Whilethegroupsdonotneedtofollowthisexactly,thepresentationshouldhavenomore than9slides (including title slide,excludingappendix)andshould includethefollowingtopics:
Briefprojectbackground Theoryofchange Evaluationquestion Outcomes Evaluationdesign Dataandsamplesize Potentialvaliditythreatsandhowtomanagethem Disseminationstrategyofresults
Pleasetimeyourselfanddonotexceedtheallottedtime.Wehaveonlyalimitedamountoftimeforthesepresentationsandfollowastricttimelinetobefairtoallgroups.
J‐PALExecutiveEducationCourse
67
GroupPresentationTemplate
J‐PALExecutiveEducationCourse
68
J‐PALExecutiveEducationCourse
69