j pal courseexecutivesocialcourse 16participants 19 ...€pal lecturers .. 9 list of participants...

1

J‐PAL

ExecutiveEducationCourseinEvaluatingSocialProgrammesCourseMaterialParticipants Accra16–19May2012

3

TableofContents

Program ……………………………………………………………………………………………………. 5

CourseObjectives …………………………………………………………………………………….. 7

J‐PALLecturers ……………………………………………………………………………………….. 9

ListofParticipants …………………………………………………………………………………… 11

GroupAssignment …………………………………………………………………………………… 13

Vocabulary ……………………………………………………………………………………………… 14

CaseStudy1 ……………………………………………………………………………………………… 16

CaseStudy2 ……………………………………………………………………………………………… 22

CaseStudy3 ……………………………………………………………………………………………… 32

ExerciseA ……………………………………………………………………………………………… 36

ExerciseB ……………………………………………………………………………………………… 37

PrimeronPowerCalculation ………………………………………………………………….. 45

ExerciseC …………………………………………………………………………………………………. 51

CaseStudy4 ……………………………………………………………………………………………… 57

GroupPresentation …………………………………………………………………………………. 65

5

PROGRAMMEExecutiveEducationCourseinEvaluatingSocialProgrammes,16–19May2012

Accra,Ghana

Wednesday

16May2012

Thursday

17May2012

Friday

18May2012

Saturday

19May2012

8:30 –10:00

WelcomingRemarks

Lecture1:WhatisEvaluation

RebeccaThornton(UniversityofMichigan)

Lecture3:WhyRandomise

KehindeAjayi(BostonUniversity)

Lecture5:SamplingandSampleSize

PaulGlewwe(UniversityofMinnesota)

Lecture7:ProjectfromStarttoFinish

KehindeAjayi(BostonUniversity)

10:30 –12:30

GroupWorkSession1

IntroductiontogroupmembersGroupworkoncasestudy1:SchoolMonitoringReformin

MadagascarDecisionongroupproject(45min)

GroupWorkSession3

Groupworkoncasestudy3:ExtraTeacherProgramme

(60min)

Groupworkonpresentation(60min)

GroupWorkSession5

GroupexerciseConsamplesizeestimation(60min)

Groupworkoncasestudy4:EvaluationDesign(60min)

GroupWorkSession7

Groupworktofinalisepresentations

Lunch Lunch Lunch Lunch

13:30 –15:00

Lecture2:MeasuringImpacts

IsaacMbiti(SouthernMethodistUniversity)

Lecture4:HowtoRandomise

PaulGlewwe(UniversityofMinnesota)

Lecture6:ThreatsandAnalysis

TBD

Grouppresentations(eachgroup:

15minpresentation,15mindiscussion)

15:30 –17:00

GroupWorkSession2

Groupworkoncasestudy2:LearntoRead(45min)


GroupWorkSession4

GroupexerciseAonrandomsampling(30min)andexerciseBonmechanismsofrandomisation

(30min)StatsPrimer(30min)

GroupWorkSession6


J-PAL Executive Education Course in Evaluating Social Programmes

7

CourseObjectives

Our executive trainingprogramme is designed forpeople froma variety of backgrounds:managersand researchers from international development organisations, foundations, governments and non‐governmentalorganisationsfromaroundtheworld,aswellastrainedeconomistslookingtoretool.

The course is a full‐timecourse. It is important forparticipants toattendall lecturesandgroupworkinordertosuccessfullycompletethecourseandreceivethecertificateofcompletion.

CourseCoverage

Thefollowingkeyquestionsandconceptswillbecovered:

Whyandwhenisarigorousevaluationofsocialimpactneeded? Thecommonpitfallsofevaluations,andhowrandomisationcanhelp. Thekeycomponentsofagoodrandomisedevaluationdesign. Alternativetechniquesforincorporatingrandomisationintoprojectdesign. Howdoyoudeterminetheappropriatesamplesize, identifyoutcomemeasures,andmanage

data? Guardingagainstthreatsthatmayunderminetheintegrityoftheresults. Techniquesfortheanalysisandinterpretationofresults. Howtomaximisepolicyimpactandtestexternalvalidity.

Theprogrammewillachievethesegoalsthroughadiversesetofintegratedteachingmethods.Expertresearchers will provide both theoretical and example‐based classes complemented by workshopswhereparticipantscanapplykeyconceptstorealworldexamples.Byexaminingbothsuccessfulandproblematicevaluations,participantswillbetterunderstandthesignificanceofvariousspecificdetailsof randomisedevaluations.Furthermore, theprogrammewillofferextensiveopportunities toapplythese ideas ensuring that participants will leave with the knowledge, experience, and confidencenecessarytoconducttheirownrandomizedevaluations.

J-PAL Executive Education Course

9

J‐PALLecturersKehindeAjayiAssistantProfessorBostonUniversity Kehinde Ajayi is an Assistant Professor at Boston University. Herresearch interests are in the areas of economic development and theeconomics of education, with a particular focus on school choice,educational mobility, and the effects of school quality on students'academic performance. She is currently conducting a set of studiesrelatingtosecondaryeducationinGhana. PaulGlewweProfessorofAppliedEconomicsUniversityofUniversityofMinnesota Paul Glewwe is a Professor of Applied Economics at the University ofMinnesotaandthecurrentDirectoroftheCenterforInternationalFoodand Agricultural Policy. His interests are economics of education,poverty and inequality in developing countries, and appliedeconometrics.HisrecentpublicationshaveappearedintheHandbookofthe Economics of Education, Economic Development and CulturalChange, Journal of Development Economics, Journal of EconomicLiterature, Journal of Human Resources, Journal of Public EconomicsandWorldBankEconomicReview.IsaacMbitiAssistantProfessorSouthernMethodistUniversityIsaacMbitiisanassistantprofessorintheDepartmentofEconomicsofSouthernMethodistUniversity.Hisresearchinterestsareineconomicdevelopment,laboureconomicsanddemography.

J-PAL Executive Education Course J-PAL Lecturers

10

RebeccaThorntonAssistantProfessorofEconomicsUniversityofMichiganRebeccaThorntonbeganherappointmentasanassistantProfessorattheUniversity of Michigan economics department in 2008. Her researchfocuses on education and health as well as how individuals respond tofinancial incentives in these areas. She has worked on a randomizedevaluationofamerit‐basedscholarshipinKenya.Sheisalsoworkingonrandomized evaluations examining HIV testing and prevention andmenstruationandeducationinNepal.


11

J‐PALExecutiveEducationCourse

12

ListofParticipants

# LastName FirstName Organization Country

1 Adamu‐Issah Madeez UNICEF Ghana

2 Adeola Aminat FATEFoundation Nigeria

3 AmaniampongKwabenaAdu

CouncilforTechnicalandVocationalEducationandTraining Ghana

4 Asiegbor Issac AssessmentUnit,CRDD Ghana

5 Banashek Sarah USAID Ghana

6 Brikorang Fred GES,BasicEducation Ghana

7 Brown Victoria MangoTree Uganda

8 Cohen Lee USAID Mali

9 Coulibaly Lazare InstituteforPopularEducation Mali

10 Ekuri EmmanuelAfricanPopulationandHealthResearchCentre Kenya

11 Hounkpodote HilaireLaboratoryforResearchonSocialTransformations Senegal

12 Ilomu Jessica USAID Uganda

13 Imoka Chizoba UnveilingAfricaFoundation Nigeria

14 Ka Awa ARED Senegal

15 Kabaka Stewart MoPHS Kenya

16 Kalanda McKnight MinistryofEducation,ScienceandTechnology Malawi

17 Korr Jacob GES,CurriculumandResearchDevelopment Ghana

18 Kotin Veronica EMIS Ghana

19 Kwame Agyeapong MinistryofEducation Ghana

20 Lamptey Elliot MinistryofEducation Ghana

21 Lowe Zev Worldreader Spain

22 Maiga Eugenie AfricanCentreforEconomicTransformation Ghana

23 Mugerwa Fred OfficeofthePrimeMinister Uganda

24Mukonka Remmy MinistryofEducation,ScienceandVocational

TrainingZambia

25 Muvunyi Emmanuel EducationBoard Rwanda26 Oduro Evelyn GES,TeacherEducation Ghana

27 Ogle Muktar NationalAssessmentCentre Kenya

28 Oldmeadow Emily DFID Nigeria

29 Otieno Mary CentreforUniversalEducationatBrookings USA

30 Otim Daniel MangoTree Uganda

31 Otoo Ernest MinistryofEducation Ghana

32 Pealore Dominic MinistryofEducation Ghana

33 Perez Marisol USAID Ghana

34 Rotich Leah MinistryofEducation Kenya

35 Ruto Sara UwezoEastAfrica Kenya

36 SalieuKamara Mohamed MinistryofEducation Sierra


13

Leone

37 Sarpong Anthony GES,Mathematicscurriculum Ghana

38 Thompson SamuelCouncilforTechnicalandVocationalEducationandTraining Ghana

39 Traore Bréhima OeuvreMalienned'Aideàl'EnfanceduSahel Mali

40 Umubyeyi Marcienne Selfemployed Rwanda

41 Uneze EberechukwuCentrefortheStudiesoftheEconomiesofAfrica Nigeria

42 Weisenhorn Nina IRC Congo


14

GroupAssignment

Group1TA:CarolinaCorral

Group2TA:ClareHofmeyr

1 AwaKa 1 EberechukwuUneze

2 BréhimaTraore 2 EmmanuelEkuri

3 EmmanuelMuvunyi 3 EugenieMaiga

4 HilaireHounkpodote 4 FredMugerwa

5 LazareCoulibaly 5 MaryOtieno

6 MarcienneUmubyeyi 6 StewartKabakaGroup3TA:ElizabethSchultz

Group4TA:AmaraKallon

1 EmilyOldmeadow 1 AminatAdeola

2 JessicaIlomu 2 AnthonySarpong

3 LeeCohen 3 FredBrikorang

4 MarisolPerez 4 IssacAsiegbor

5 NinaWeisenhorn 5 SamuelThompson

6 SarahBanashek 6 MohamedSalieuKamaraGroup5TA:CaitlinTulloch

Group6TA:MichaelPolansky

1 DominicPealore 1 ChizobaImoka

2 ErnestOtoo 2 JacobKorr

3 KwabenaAduAmaniampong 3 EvelynOduro

4 MadeezAdamu‐Issah 4 McKnightKalanda

5 VeronicaKotin 5 MuktarOgle

6 ZevLowe 6 RemmyMukonkaGroup7TA:PacePhillips

1 AgyeapongKwame

2 DanielOtim

3 ElliotLamptey

4 LeahRotich

5 SaraRuto

6 VictoriaBrown


15

Vocabulary

1. Attrition: the process of individuals joining in or dropping out of either the treatment orcomparisongroupoverthecourseofthestudy.

2. AttritionBias:statisticalbiaswhichoccurswhenindividualssystematicallyjoininordropoutof either the treatment or the comparison group for reasons related to the treatment oroutcomes.

3. Baseline:datadescribingthecharacteristicsofparticipantsmeasuredacrossbothtreatment

andcomparisongroupspriortoimplementationofintervention.

4. Cluster:thelevelofobservationatwhichasamplesizeismeasured.Generally,observationswhicharehighlycorrelatedwitheachothershouldbeclusteredandthesamplesizeshouldbemeasuredatthisclusteredlevel.

5. ComparisonGroup: in an experimental design, a randomly assigned group from the same

populationas the treatmentgroupthatdoesnotreceive the intervention.Participants in thecomparisongroupareusedasastandardforcomparisonagainstthetreatedsubjectsinordertovalidatetheresultsoftheintervention.

6. Counterfactual: whatwould have happened to the participants in a program had they notreceivedtheintervention.Thecounterfactualcannotbeobservedfromthetreatmentgroup,itcanonlybeinferredfromthecomparisongroup.

7. Endline: datadescribing the characteristics of participantsmeasuredacrossboth treatment

andcomparisongroupsafterimplementationofintervention.

8. Equivalence: groups are identical on all baseline characteristics, both observable andunobservable.Ensuredbyrandomization.

9. Externality:anindirectcostorbenefitincurredbyindividualswhodidnotdirectlyreceivethe

treatment.Alsotermed"spillover."

10. Hypothesis:aproposedexplanationofandfortheeffectsofagivenintervention.Hypothesesareintendedtobemadeex‐ante,orpriortotheimplementationoftheintervention.

11. IntentiontoTreat:themeasuredimpactofaprogramthatincludesalldatafromparticipants

inthegroupstowhichtheywererandomized,regardlessofwhethertheyactuallyreceivedthetreatment.Intention‐to‐treatanalysispreventsbiascausedbythelossofparticipants,whichmaydisruptthebaselineequivalenceestablishedbyrandomizationandwhichmayreflectnon‐adherencetotheprotocol.

12. Intra‐clustercorrelationcoefficient:ameasureofthecorrelationbetweenobservations

withinacluster;i.e.thelevelofcorrelationindrinkingwatersourceforindividualsinahousehold.

13. LevelofRandomization:thelevelofobservation(ex.individual,household,school,village)at

whichtreatmentandcomparisongroupsarerandomlyassigned.


16

14. PartialCompliance:individualsdonotcomplywiththeirassignment(totreatmentorcomparison).Alsotermed"diffusion"or"contamination."

15. Phase‐inDesign:astudydesigninwhichgroupsareindividuallyphasedintotreatmentover

aperiodoftime;groupswhicharescheduledtoreceivetreatmentlateractasthecomparisongroupsinearlierrounds.

16. Power:thelikelihoodthat,whentheprogramhasaneffect,onewillbeabletodistinguishthe

effectfromzerogiventhesamplesize.

17. ProgrammeImpact:estimatedbymeasuringthedifferenceinoutcomesbetweencomparisonandtreatmentgroups.Thetrueimpactoftheprogramisthedifferenceinoutcomesbetweenthetreatmentgroupanditscounterfactual.

18. SelectionBias:statisticalbiasbetweencomparisonandtreatmentgroupsinwhichindividuals

inonegrouparesystematicallydifferentfromthoseintheother.Thesecanoccurwhenthetreatmentandcomparisongroupsarechoseninanon‐randomfashionsothattheydifferfromeachotherbyoneormorefactorsthatmayaffecttheoutcomeofthestudy.

19. Significance:thelikelihoodthatthemeasuredeffectdidnotoccurbychance.Statisticaltests

areperformedtodeterminewhetheronegroup(e.g.theexperimentalgroup)isdifferentfromanothergroup(e.g.comparisongroup)onthemeasurableoutcomevariablesusedintheevaluation.

20. StandardDeviation:astandardizedmeasureofthevariationofasamplepopulationfromits

meanonagivencharacteristic/outcome.Mathematically,thesquarerootofthevariance.

21. StandardizedEffectSize:astandardizedmeasureofthe[expected]magnitudeoftheeffectofaprogram.

22. TheoryofChange(ToC):describesastrategyorblueprintforachievingagivenlong‐term

goal.Itidentifiesthepreconditions,pathwaysandinterventionsnecessaryforaninitiative'ssuccess.

23. TreatmentGroup:inanexperimentaldesign,arandomlyassignedgroupfromthepopulation

thatreceivestheinterventionthatisthesubjectofevaluation.

24. TreatmentontheTreated:themeasuredimpactofaprogramthatincludesonlythedataforparticipantswhoactuallyreceivedthetreatment.


17

CASESTUDY1:REFORMINGSCHOOLMONITORING

ThiscasestudyisbasedontheJ‐PALStudy“PrimaryEducationManagementinMadagascar”byEstherDuflo,GerardLassibille,andTrangvanNguyen.

J‐PALthankstheauthorsforallowingustousetheirstudy.

Case 3: Women as Policymakers

Thinking about measurement and outcomes

Case1:ReformingSchoolMonitoringThinkingaboutmeasurementandoutcomes

TRANSLATING RESEARCH INTO ACTION


18

KeyVocabulary

BackgroundOver the last 10 years, low‐income countries in Africa have made striking progress in expandingcoverageofprimaryeducation.However,inmanyofthesecountriestheeducationsystemcontinuestodeliverpoorresults,puttingthegoalofuniversalprimaryschoolcompletionatrisk. Incompetentadministration,inadequatefocusonlearningoutcomes,andweakgovernancestructuresarethoughttobesomeofthereasonsforthepoorresults.Thiscasestudywilllookataprogramwhichaimedtoimprovetheperformanceandefficiencyofeducationsystemsbyintroducingtoolsandamonitoringsystemateachlevelalongtheservicedeliverychain.

MadagascarSchoolSystemReforms:“ImprovingOutputsnotOutcomes”Madagascar’s public primary school system has been making progress in expanding coverage inprimaryeducationthanksinpartduetoincreasesinpublicspendingsincethelate1990s.Aspartofitspoverty reductionstrategy,publicexpenditureoneducationrose from2.2 to3.3percentofGDPbetween 2001 and 2007. In addition to increased funding, the government introduced importantreformssuchastheeliminationofschoolfeesforprimaryeducation,freetextbookstoprimaryschoolstudents,publicsubsidiestosupplementthewagesofnon–civilserviceteachersinpublicschools(inthepasttheywerehiredandpaidentirelybyparentassociations),andnewpedagogicalapproaches.Themostvisiblesignofprogresswas the large increase incoverage inprimaryeducation inrecentyears. In 2007, the education systemenrolled some3.8million students in bothpublic andprivateschools—morethantwicetheenrolmentin1996.Duringthelast10years,morethan4000newpublicprimaryschoolshavebeencreated,and thenumberofprimaryschool teachers in thepublic sectormorethandoubled.Whilethisprogressisimpressive,enormouschallengesremain.Entryratesintograde1arehigh,butless than half of each cohort reaches the end of the five‐year primary cycle. Despite government

1. Hypothesis: a proposed explanation of and for the effects of a given intervention. Hypotheses are

intended to be made ex‐ante, or prior to the implementation of the intervention.

2. Indicators: metrics used to quantify and measure the needs that a program aims to address (needs

assessment), how a program is implemented (process evaluation) and whether it affects specific short‐

term and long‐term goals (impact evaluation).

3. Logical Framework (LogFrame): a management tool used to facilitate the design, execution, and

evaluation of an intervention. It involves identifying strategic elements (inputs, outputs, outcomes and

impact) and their causal relationships, indicators, and the assumptions and risks that may influence

success and failure.

4. Theory of Change (ToC): describes a strategy or blueprint for achieving a given long‐term goal. It

identifies the preconditions, pathways and interventions necessary for an initiative's success.


19

interventions, grade repetition rates are still uniformly high throughout the primarycycle,averagingabout18percent.Furthermore,testscoresrevealpoorperformance:studentsscoredanaverageof30percentonFrenchand50percentonMalagasyandmathematics.DiscussionTopic11. Wouldyouregardthereformsassuccessful?Whyorwhynot?

2. Whatare someof thepotential reasons forwhy the reformsdidnot translate intobetter

learningoutcomes?

Problemsremain....Asthestartingpointofthestudy,researchersworkedwiththeMinistryofEducationto identifytheremainingconstraintsintheschoolingsystem.Asurveyconductedin2005revealedthefollowingkeyproblems:

1. Teacher absenteeism: At 10 percent, teacher absenteeism remains a significant problem.Only 8 percent of school directors monitor teacher attendance (either by taking dailyattendance or tracking and posting a monthly summary of attendance), and more than 80percentfailtoreportteacherabsencestosub‐districtanddistrictadministrators.

2. Communicationwith parents: Communication between teachers and parents on studentlearningisoftenperfunctory,andstudentabsenteeismisrarelycommunicatedtoparents.

3. Teacherperformance: Essential pedagogical tasks are often neglected: only 15 percent ofteachers consistently prepare daily and biweekly lessons plans while 20 percent do notpreparelessonplansatall.Studentacademicprogressisalsopoorlymonitored:resultsoftestsandquizzesarerarelyrecordedand25percentofteachersdonotprepareindividualstudentreportcards.

Overall,manyofproblems seem tobe result of a lackof organization, control andaccountability ateverystageof thesystem,allofwhichare likely tocompromise theperformanceof thesystemandlowerthechanceofthereformsbeingsuccessful.


20

InterventionIn order to address these issues, theMadagascarMinistry of Educationseekstotightenthemanagementandaccountabilityateachpointalongthe service delivery chain (see Figure 1) by making explicit to thevarious administrators and teachers what their responsibilities are,supportingthemwithteachingtools,andincreasingmonitoring.Theministryisconsideringtwoapproachestoevaluate1:

1. Top‐Down:Operational tools and guidebooks which outline their responsibilitiesare given to the relevant administrators. During a meeting,administrators are trained on how to carry out their tasks, and theirperformance criteria are clarified. This is followed up by regularmonitoringoftheirperformance,whichiscommunicatedthrough(sub‐)districtreportcardstohigherlevels.

2. Bottom‐Up:

Thisprogrampromotestheabilityofparentstomonitortheirschoolsandholdteachersaccountablewhen they perform below expectation. Report cards with easy‐to‐understand content are given toparentsandmembersofpoorruralcommunities.Theycontainasmallsetofperformanceindicators,informationonenrolmentsandschoolresources,aswellasdatathatallowaschool’sperformancetobe compared that of other schools. In addition, greater community participation in school‐basedmanagementisencouragedthroughstructuredschoolmeetingsinwhichstaffoftheschool,parents,andcommunitymembersreviewthereportcardanddiscusstheirschoolimprovementplan.

DiscussionTopic2

1 The actual evaluation included further interventions such as training of teachers. For more details, please refer to the paper. For pedagogical reasons, we focus only on two approaches in this case study.

1. BeforesettinguptheRCT,researcherscarefullyanalyzedtheexistingproblem.Whydoyouthinkthisisimportantasastartingpointofanevaluation?

2. Whataretheintermediateandultimategoalsthatthisprogramhopestoachieve?

3. Whatisthekeyhypothesisbeingtestedthroughthisimpactevaluation?

Figure 1: Education System


21

TheoryofChange

A theoryof change (ToC) identifies the causal linkbetween theintervention and the final outcome. Figure 2 shows oneway inwhichaToCcanbestructured.For example, a program or intervention is implemented toaddress a specific problem identified in the needs assessment(e.g. low literacy levels). The intervention (e.g. text books)mayleadtooutputs(e.g.studentsusageoftextbooks)throughwhichintermediary outcomes (e.g. reading skills) could be affected.These may lead to longer‐term outcomes (e.g. drop‐out rates,employmentoutcomes).AnunderlyingassumptionofthisToCisthatstudentsdonotalreadyhavetextbooks.DiscussionTopic3

Whatdatatocollect?DatacollectionandmeasurementBeforedecidingwhichdatatocollect,youneedtobeveryclearontheoutcomeyouaretargetingandinwhatway the intervention is theorized to impact thisoutcome. Inotherwords, identifyingakeyhypothesis and theory of change at the beginning of an evaluation helps you to decide whatinformationtocollect.For each step of the theory of change, we need to identify indicators (what to measure) andinstruments(howtocollectdata).Continuingwiththeexampleofthetextbookprogram,anindicatorcouldbereadinglevelofstudentsandtheinstrumentcouldbestandardizedreadingtests.Inaddition,weneedtocollectdataonourassumptionstoseewhetherornottheyholdtrue.DiscussionTopic4

1. WhichindicatorswouldyoumeasureateachstepintheToCsyoudrewup?

2. Howwouldyoucollectdatafortheseindicators?Inotherwords,whatinstrumentswouldyouuse?Doyouforeseechallengeswiththeseformsofdatacollection?

1. DrawoutthecausalchainusingtheformatinFigure2foreachofthebottom‐upandtop‐downinterventions(useaseparateToCforeach).

2. Whatarethenecessaryconditions/assumptionsunderlyingtheseToCs?

Figure 2: Theory of Change


22

HowtointerprettheresultsTheevaluationfoundthatthebottom‐upapproachledtosuccessfulresults.Attendanceatmeetingsbetweenteachersandcommunitymemberswashigh,andalthoughcommunicationbetweenteachersand parents did not change, teachers improved the quality of teaching as shown by an increase inlessonplansandtestscores.However,thefindingsofthetop‐downinterventionwerequitedifferent:DiscussionTopic5

1. Howdoyouinterprettheresultsofthetop‐downintervention?

2. Whyisitimportanttointerprettheresultsinthecontextofaprogramtheoryofchange?

3. Whatarethepolicyimplications?Howmightyourespondtothesefindings?


23

CASESTUDY2:LEARNTOREADEVALUATIONS

Thiscasestudyisbasedon“PitfallsofParticipatoryPrograms:Evidence

fromaRandomizedEvaluationinIndia,”byAbhijitBanerjee(MIT),RukminiBanerjee(Pratham),EstherDuflo(MIT),RachelGlennerster(J‐PAL),and

StutiKhemani(TheWorldBank)

J‐PALthankstheauthorsforallowingustousetheirpaper

Case 2: Remedial Education in IndiaEvaluating the Balsakhi Program

Incorporating random assignment into the program

Case2:LearntoReadEvaluationsDifferentmethodsofevaluation


24

KeyVocabulary1. Counterfactual: what would have happened to the participants in a program had they not received the

intervention. The counterfactual cannot be observed from the treatment group, it can only be inferred

from the comparison group.

2. Treatment Group: in an experimental design, a randomly assigned group from the population that

receives the intervention that is the subject of evaluation.

3. Comparison Group: in an experimental design, a randomly assigned group from the same population

as the treatment group that does not receive the intervention. Participants in the comparison group are

used as a standard for comparison against the treated subjects in order to validate the results of the

intervention.

4. Program Impact: estimated by measuring the difference in outcomes between comparison and

treatment groups. The true impact of the program is the difference in outcomes between the treatment

group and its counterfactual.

5. Baseline: data describing the characteristics of participants measured across both treatment and

comparison groups prior to implementation of intervention.

6. Endline: data describing the characteristics of participants measured across both treatment and

comparison groups after implementation of intervention.

7. Selection Bias: statistical bias between comparison and treatment groups in which individuals in one

group are systematically different from those in the other. These can occur when the treatment and

comparison groups are chosen in a non‐random fashion so that they differ from each other by one or

more factors that may affect the outcome of the study.

8. Omitted Variable Bias: statistical bias that occurs when certain variables/characteristics (often

unobservable), which affect the measured outcome, are omitted from a regression analysis. Because

they are not included as controls in the regression, one incorrectly attributes the measured impact solely

to the program.

WhyLearntoRead(L2R)?

In a large‐scale survey conducted in 2004, the Indian NGO, Pratham, discovered that only 39% ofchildren(aged7‐14)inruralUttarPradeshcouldreadandunderstandasimplestory,andnearly15%couldnotrecognizeevenaletter.During this period, Pratham was developing the “Learn‐to‐Read” (L2R) module of its Read Indiacampaign.L2Rincludedauniquepedagogyteachingbasicliteracyskills,combinedwithagrassrootsorganizingefforttorecruitvolunteerswillingtoteach.


25

This program allowed the community to get involved in children’s education moredirectlythroughvillagemeetingswherePrathamstaffsharedinformationonthestatusofliteracyinthevillageandtherightsofchildrentoeducation. In thesemeetings,Prathamidentifiedcommunitymemberswhowerewilling to teach. Volunteers attended a training session on the pedagogy, afterwhichtheycouldholdafter‐schoolreadingclassesforchildren,usingmaterialsdesignedandprovidedbyPratham.Prathamstaffpaidoccasionalvisitstothesecampstoensurethattheclasseswerebeingheldandtoprovideadditionaltrainingasnecessary.

DidtheLearntoReadprojectwork?

DidPratham’s“LearntoRead”programwork?Whatisrequiredinorderforustomeasurewhetheraprogramworkedor,inotherwords,whetherithadimpact?In general, to ask if a programworks is to ask if the program achieves its goal of changing certainoutcomesforitsparticipants,andensurethatthosechangesarenotcausedbysomeotherfactorsoreventshappeningatthesametime.Toshowthattheprogramcausestheobservedchanges,weneedtosimultaneouslyshowthatiftheprogramhadnotbeenimplemented,theobservedchangeswouldnothaveoccurred (orwouldbedifferent).Buthowdoweknowwhatwouldhavehappened? If theprogramhappened,ithappened.Measuringwhatwouldhavehappenedrequiresenteringanimaginaryworld in which the program was never given to these participants. The outcomes of the sameparticipantsinthisimaginaryworldarereferredtoasthecounterfactual.Sincewecannotobservethetruecounterfactual,thebestwecandoistoestimateitbymimickingit.The key challenge of program impact evaluation is constructing or mimicking thecounterfactual.Wetypicallydothisbyselectingagroupofpeoplethatresembletheparticipantsasmuch as possible butwho did not participate in the program. This group is called the comparisongroup. Becausewewant to be able to say that it was the program and not some other factor thatcaused the changes in outcomes, it is important that the only difference between the comparisongroupandtheparticipantsisthatthecomparisongroupdidnotparticipateintheprogram.Wethenestimate“impact”asthedifferenceobservedattheendoftheprogrambetweentheoutcomesofthecomparisongroupandtheoutcomesoftheprogramparticipants.The impact estimate is only as accurate as the comparison group is successful at mimicking thecounterfactual. If thecomparisongrouppoorlyrepresents thecounterfactual, the impact is (inmostcircumstances)poorlyestimated.Thereforethemethodusedtoselectthecomparisongroupisakeydecisioninthedesignofanyimpactevaluation.

Thatbringsusback to ourquestions:Did theLearn toReadprojectwork?Whatwas its impactonchildren’sreadinglevels?Inthiscase,theintentionoftheprogramisto“improvechildren’sreadinglevels”andthereadinglevelis the outcomemeasure. So,whenwe ask if the Learn to Read projectworked,we are asking if itimproved children’s reading levels. The impact is the difference between reading levels after thechildrenhavetakenthereadingclassesandwhattheirreadinglevelwouldhavebeenif thereadingclasseshadneverexisted.


26

Forreference,ReadingLevelisanindicatorvariablethattakesvalue0ifthechildcanreadnothing,1ifheknowsthealphabet,2ifhecanrecognizewords,3ifhecanreadaparagraph,and4ifhecanreadafullstory.Whatcomparisongroupscanweuse?Thefollowingexpertsillustratedifferentmethodsofevaluatingimpact.(Refertothetableonthelastpageofthecaseforalistofdifferentevaluationmethods).

EstimatingtheimpactoftheLearntoReadprojectMethod1NewsRelease:ReadIndiahelpschildrenLearntoRead.Prathamcelebratesthesuccessofits“LearntoRead”program—partoftheReadIndiaInitiative.Ithasmade significant progress in its goal of improving children’s literacy rates through better learningmaterials,pedagogicalmethods,andmostimportantly,committedvolunteers.Theachievementofthe“Learn to Read” (L2R) program demonstrates that a revised curriculum, galvanized by communitymobilization,canproducesignificantgains.Massivegovernmentexpenditures inmid‐daymealsandschool construction have failed to achieve similar results. In less than a year, the reading levels ofchildrenwhoenrolledintheL2Rcampsimprovedconsiderably.

Justbeforetheprogramstarted,halfthesechildrencouldnotrecognizeHindiwords—manynothingatall.ButafterspendingjustafewmonthsinPrathamreadingclasses,morethanhalfimprovedbyatleast one reading level,with a significantnumber capable of recognizingwords and several able toreadfullparagraphsandstories!Onaverage,theliteracymeasureofthesestudentsimprovedbynearlyonefullreadinglevelduringthisperiod.

Figures: Showendline levels forL2Rparticipants –organizedbybaseline levelsof reading levels (left:zerolevel,right:letterlevel)


27

DiscussionTopic1

1. Whattypeofevaluationdoesthisnewsreleaseimply?

2. Whatrepresentsthecounterfactual?

3. Whataretheproblemswiththistypeofevaluation?

Method2Opinion:The“ReadIndia”projectnotuptothemarkPrathamhasraisedmillionsofdollars,expandingrapidlytocoverallofIndiawithitsso‐called“Learn‐to‐Read”program,butdo itsstudentsactually learntoread?Recentevidencesuggestsotherwise.AteamofevaluatorsfromEducationforAllfoundthatchildrenwhotookthereadingclassesendedupwith literacy levelssignificantlybelowthoseof theirvillagecounterparts.AfteroneyearofPrathamreading classes, Pratham students could only recognizewordswhereas thosewho steered clear ofPratham programs were able to read full paragraphs. If you have a dime to spare, and want tocontributetotheeducationof India’s illiteratechildren,youmaythinktwicebeforethrowing it intothefountainofPratham’spromises.Notestothegraph:ReadingLevelisanindicatorvariablethattakesvalue0ifthechildcanreadnothing,1ifheknowsthealphabet,2ifhecanrecognizewords,3ifhecanreadaparagraphand4ifhecanreadafullstory.DiscussionTopic2

1. Whattypeofevaluationisthisopinionpieceemploying?



Comparison of reading levels of children who took reading classes Vs. reading levels of children who did

not take them

0

0.5

1

1.5

2

2.5

3

Did not take reading classes/ Took reading classes

Rea

din

g L

evel

Mean reading level for children who did not take reading classesMean reading level for children who took reading classes


28

Method3LettertotheEditor:EFAshouldconsiderEvaluatingFairlyandAccuratelyTherehavebeen severalunfair reports in thepress concerningprograms implementedby theNGOPratham. A recent article by a former Education for All bureaucrat claims that Pratham is actuallyhurting the children it recruits into its ‘Learn‐to‐Read’ camps. However, the EFA analysis uses thewrong metric to measure impact. It compares the reading levels of Pratham students with otherchildren in the village—not taking into account the fact that Pratham targets thosewhose literacylevels areparticularlypoor at thebeginning. If Prathamsimply recruited themost literate childreninto their programs, and compared them to their poorer counterparts, they could claim successwithoutconductingasingleclass.ButPrathamdoesnotdo this.Andrealistically,Prathamdoesnotexpectitsilliteratechildrentoovertakethestrongerstudentsinthevillage.Itsimplytriestoinitiateimprovementoverthecurrentstate.Thereforethemetricshouldbeimprovementinreadinglevels—not the final level.Whenwe repeatedEFA’sanalysisusing themore‐appropriateoutcomemeasure,the Pratham kids improved at twice the rate of the non‐Pratham kids (0.6 reading level increasecomparedto0.3).Thisdifferenceisstatisticallyverysignificant.HadtheEFAevaluatorsthoughtto lookat themoreappropriateoutcome, theywouldrecognizetheincrediblesuccessofReadIndia.PerhapstheyshouldenrollinsomePrathamclassesthemselves.DiscussionTopic3

1. Whattypeofevaluationisthisletterusing?




29

Method4Thenumbersdon’tlie,unlessyourstatisticiansareasleepPratham celebrates victory, opponents cry foul. A closer look shows that, as usual, the truth issomewhereinbetween.There has been awar in the press between Pratham’s supporters and detractors. Pratham and itsadvocatesassertthattheReadIndiacampaignhasresultedinlargeincreasesinchildliteracy.Severaldetractors claim that Pratham programs, by pulling attention away from the schools, are in factcausingsignificantharmtothestudents.Unfortunately,thisbattleisbeingwagedusinginstrumentsofanalysisthatareseriouslyflawed.Theultimatevictimisthepublicwhoislookingforananswertothequestion:isPrathamhelpingitsintendedbeneficiaries?Thisreportusessophisticatedstatisticalmethodstomeasure thetrue impactofPrathamprograms.Wewere concerned about other variables confounding previous results.We therefore conducted asurveyinthesevillagestocollect informationonchildage,grade‐level,andparents’educationlevel,andusedthosetopredictchildtestscores.

Table 1: Reading outcomes

Level Improvement

(1) (2) (3) (4)

Reading Classes -0.68 ** 0.04 0.24 ** 0.11(0.0829) (0.1031) (0.0628) (0.1081)

Previous reading level 0.71 **(0.0215)

Age 0.00 -0.01(0.0182) (0.0194)

Sex -0.01 0.05(0.0469) (0.0514)

Standard 0.02 -0.08 **(0.0174) (0.0171)

Parents Literate 0.04 0.13 **(0.0457) (0.0506)

Constant 2.82 0.36 0.37 0.75(0.0239) (0.2648) (0.0157) (0.3293)

School-type controls No Yes No 0.37

Notes: The omitted category for school type is "Did not go to school". Reading Level is an indicator variable that

takes value 0 if the child can read nothing, 1 if he knows the alphabet, 2 if he can recognize words, 3 if he can read a

paragraph and 4 if he can read a full story

NOTE:Datausedinthiscasearereal.“Articles”onthedebatewereartificiallyproducedforthepurposeofthecase.EducationforAll(EFA)nevermadeanyoftheclaimsdescribedherein

Looking at Table 1, we find some positive results, some negative results and some “no‐results”,depending onwhich variables we control for. The results from column (1) suggest that Pratham’sprogramhurtthechildren.ThereisanegativecorrelationbetweenreceivingPrathamclassesandfinal

Control variables: (independent) variables other than the reading classes that may influence children’s reading outcomes

Key independent variable: reading classes are the treatment; the analysis tests the effect of these classes on reading outcomes

Statistical significance: the corresponding result is unlikely to have occurred by chance, and thus is statistically significant (credible)

Dependent variables: reading level and improvement in reading level are the primary outcomes in this analysis.


30

reading outcomes (‐0.68). Column (3), which evaluates improvement, suggestsimpressiveresults(0.24).Butlookingatchildoutcomes(eitherlevelorimprovement)controllingforinitial reading levels, age, gender, standard and parent’s education level – all determinants of childreadinglevels–wefoundnoimpactofPrathamprograms.Therefore,controllingfortherightvariables,wehavediscoveredthatononehand,Prathamhasnotcausedtheharmclaimedbycertainopponents,butontheotherhand,ithasnothelpedchildrenlearn.Prathamhasthereforefailedinitsefforttoconvinceusthatitcanspenddonormoneyeffectively.DiscussionTopic4

1. Whattypeofevaluationisthisreportutilizing?


3. Whataretheproblemswiththistypeofevaluation?Thetableonthenexttwopagesreviewsthedifferentimpactevaluationmethodologies.


31

Methodology Description Who is in the comparison group? Required Assumptions Required Data

Quasi‐Experimental M

ethods

Pre‐Post Measure how program participants improved (or changed) over time.

Program participants themselves—before participating in the program.

The program was the only factor influencing any changes in the measured outcome over time.

Before and after data for program participants.

Simple Difference

Measure difference between program participants and non‐participants after the program is completed.

Individuals who didn’t participate in the program (for any reason), but for whom data were collected after the program.

Non‐participants are identical to participants except for program participation, and were equally likely to enter program before it started.

After data for program participants and non‐participants.

Differences in Differences

Measure improvement (change) over time of program participants relative to the improvement (change) of non‐participants.

Individuals who didn’t participate in the program (for any reason), but for whom data were collected both before and after the program.

If the program didn’t exist, the two groups would have had identical trajectories over this period.

Before and after data for both participants and non‐participants.

Multivariate Regression

Individuals who received treatment are compared with those who did not, and other factors that might explain differences in the outcomes are “controlled” for.

Individuals who didn’t participate in the program (for any reason), but for whom data were collected both before and after the program. In this case data is not comprised of just indicators of outcomes, but other “explanatory” variables as well.

The factors that were excluded (because they are unobservable and/or have been not been measured) do not bias results because they are either uncorrelated with the outcome or do not differ between participants and non‐participants.

Outcomes as well as “control variables” for both participants and non‐participants.

Statistical Matching

Individuals in control group are compared to similar individuals in experimental group.

Exact matching: For each participant, at least one non‐participant who is identical on selected characteristics. Propensity score matching: non‐participants who have a mix of characteristics which predict that they would be as likely to participate as participants.

The factors that were excluded (because they are unobservable and/or have been not been measured) do not bias results because they are either uncorrelated with the outcome or do not differ between participants and non‐participants.

Outcomes as well as “variables for matching” for both participants and non‐participants.

Regression Discontinuity Design

Individuals are ranked based on specific, measureable criteria. There is some cutoff that determines whether an individual is eligible to participate. Participants are then compared to non‐

Individuals who are close to the cutoff, but fall on the “wrong” side of that cutoff, and therefore do not get the program.

After controlling for the criteria (and other measures of choice), the remaining differences between individuals directly below and directly above the cut‐off score are not statistically significant and will not bias the results. A necessary but

Outcomes as well as measures on criteria (and any other controls).


32

Methodology Description Who is in the comparison group? Required Assumptions Required Data

participants and the eligibility criterion is controlled for.

sufficient requirement for this to hold is that the cut‐off criteria are strictly adhered to.

Instrumental Variables

Participation can be predicted by an incidental (almost random) factor, or “instrumental” variable, that is uncorrelated with the outcome, other than the fact that it predicts participation (and participation affects the outcome).

Individuals who, because of this close to random factor, are predicted not to participate and (possibly as a result) did not participate.

If it weren’t for the instrumental variable’s ability to predict participation, this “instrument” would otherwise have no effect on or be uncorrelated with the outcome.

Outcomes, the “instrument,” and other control variables.

Experimen

tal M

ethod

Randomized Evaluation

Experimental method for measuring a causal relationship between two variables.

Participants are randomly assigned to the control groups.

Randomization “worked.” That is, the two groups are statistically identical (on observed and unobserved factors).

Outcome data for control and experimental groups. Control variables can help absorb variance and improve “power”.


33

CASESTUDY3:EXTRATEACHERPROGRAM

Thiscasestudyisbasedonthepaper“PeerEffectsandtheImpactof

Tracking:EvidencefromaRandomizedEvaluationinKenya,”byEstherDuflo(MIT),PascalineDupas(Stanford),andMichaelKremer(Harvard)


Case 2: Remedial Education in IndiaEvaluating the Balsakhi Program

Incorporating random assignment into the program

Case3:Extra TeacherProgramDesigninganevaluationtoanswerthreekeyeducationpolicyquestions


34

KeyVocabulary1. Level of Randomization: the level of observation (ex. individual, household, school, village) at

which treatment and comparison groups are randomly assigned.

Over‐crowdedschools

Likemanyotherdevelopingcountries,KenyahasrecentlymaderapidprogresstowardtheMillenniumDevelopment Goal of universal primary education. Largely due to the elimination of school fees in2003,primaryschoolenrollmentrosenearly30percent,from5.9millionto7.6millionbetween2002and2005.2Without accompanying government funding, however, this progress has created its own set of newchallengesinKenya:

1. Large class size: Due to budget constraints, the rise in primary school enrollment has notbeenmatchedbyproportionalincreasesinthenumberofteachers.(Teachersalariesalreadyaccount for the largest component of educational spending.) The result has been very largeclasssizes,particularlyinlowergrades.InasampleofschoolsinWesternKenya,forexample,theaveragefirstgradeclassin2005was83students.Thisisconcerningbecauseitisbelievedthat small classesaremost important for theyoungest students,whoarestill acclimating tothe school environment. TheKenyanNationalUnion of Teachers estimates that the countryneeds an additional 60,000 primary school teachers in addition to the existing 175,000 inordertoreachallprimarystudentsanddecreaseclasssizes.

2. Teacher absenteeism: Further exacerbating the problem of pupil‐teacher ratios, teacher

absenteeismremainshigh,reachingnearly20%insomeareasofKenya.Therearetypicallynosubstitutesforabsentteachers,sostudentssimplymillaround,gohomeorjoinanotherclass,often of a different grade. Small schools, which are prevalent in rural areas of developingcountries, may be closed entirely as a result of teacher absence. Families have to considerwhether school will even be open when deciding whether or not to send their children toschool.Anobviousresultislowstudentattendance—evenondayswhentheschoolisopen.

3. Heterogeneousclasses:ClassesinKenyaarealsoveryheterogeneouswithstudentsvarying

widely in terms of school preparedness and support from home. Grouping students intoclassessortedbyability(tracking,orstreaming)iscontroversialamongacademicsandpolicymakers.Ononehand,ifteachersfinditeasiertoteachahomogeneousgroupofstudents,trackingcouldimproveschooleffectivenessandtestscores.Manyargue,onthe other hand, that if students learn in part from their peers, tracking could

2 UNESCO. (2006). United Nations Education, Scientific and Cultural Organization. Fact Book on Education for All. Nairobi: UNESCO Publishing, 2006.


35

disadvantage low achieving students while benefiting high achievingstudents,therebyexacerbatinginequality.

4. Scarce schoolmaterials: Because of the high costs of educational inputs and the rising

numberofstudents,educationalresourcesotherthantheteacherarestretched,andinsomecases up to four students must share one textbook. And an already over‐burdenedinfrastructuredeterioratesfasterwhenforcedtoservemorechildren.

5. Lowcompletionrates:Asa resultof these factors, completion rates arevery low inKenya

withonly45.1%ofboysand43.3%ofgirlscompletingthefirstgrade.All in all, these issues pose new challenges to communities: how to ensure minimum quality ofeducationgivenKenya’sbudgetconstraints.

ContractTeachers:Apossiblesolution?

Governmentsinseveraldevelopingcountrieshaverespondedtosimilarchallengesbystaffingunfilledteachingpositionswithlocally‐hiredcontractteacherswhoarenotcivilserviceemployees.Thefourmaincharacteristicsofcontractteachersarethattheyare:

1. Appointed on annual renewable contracts, with no guarantee of renewed employment(unlikeregularcivilserviceteachers)

2. Often lessqualified than regular teachers andmuch less likely tohave a formal teachertrainingcertificateordegree

3. Paidlowersalariesthanthoseofregularteachers(typicallylessthanafifthofthesalariespaidtoregularteachers)

4. Morelikelytobefromthelocalareawheretheschoolislocated.

AreContractTeachersEffective?

The increasing use of contract teachers has been one of themost significant policy innovations inproviding primary education in developing countries, but it has also been highly controversial.Supporters say that using contract teachers is an efficient way of expanding education access andqualitytoalargenumberoffirst‐generationlearners.Knowingthattheschoolcommittee’sdecisionofwhetheror not to rehire them the following yearmayhingeonperformance, contract teachers aremotivatedtotryharderthantheirtenuredgovernmentcounterparts.Contractteachersarealsooftenmoresimilartotheirstudents,geographically,culturally,andsocioeconomically.Opponentsarguethatusingunder‐qualifiedanduntrainedteachersmaystaffclassrooms,butwillnotproduce learning outcomes. Furthermore the use of contract teachers de‐professionalizes teaching,reducestheprestigeoftheentireprofession,andreducesmotivationofallteachers.Evenifithelpsintheshortterm,itmayhurteffortstorecruithighlyqualifiedteachersinthefuture.While the use of contract teachers has generated much controversy, there is very little rigorousevidenceregardingtheeffectivenessofcontractteachersinimprovingstudentlearningoutcomes.


36

TheExtraTeacherProgramRandomizedEvaluation

InJanuary2005,InternationalChildSupport(ICS)Africainitiatedatwoyearprogramtoexaminetheeffect of contract teachers on education in Kenya. Under the program, ICS gave funds to 140 localschool committees to hire one extra contract teacher to teach an additional first grade class. ICSexpectedthisprogramtoimprovestudentlearningby,amongotherthings,decreasingclasssizeandusingteacherswhoaremoredirectlyaccountable to thecommunities theyserve.However,becausecontractteacherstendtohavelesstrainingandreceivealowermonthlysalarythantheircivilservantcounterparts,therewasconcernaboutwhethertheseteachersweresufficientlymotivated,giventheircompensation,orqualifiedgiventheircredentials.The purpose of this intervention was to address the first three challenges: class size, teacheraccountability, and heterogeneity of ability. The evaluationwas designed tomeasure the impact ofclass‐sizereductions,therelativeeffectivenessofcontractteachers,andhowtrackingbyabilitywouldimpactbothlowandhigh‐achievingstudents.Whatexperimentaldesignscouldtesttheimpactofthisinterventiononeducationalachievement?Whichofthesechangesintheschoollandscapeisprimarilyresponsibleforimprovedstudentperformance?

AddressingMultipleResearchQuestionsthroughExperimentalDesign

Differentrandomizationstrategiesmaybeusedtoanswerdifferentquestions.Whatstrategiescouldbeusedtoevaluatethefollowingquestions?Howwouldyoudesignthestudy?Whowouldbeinthetreatmentandcontrolgroups,andhowwouldtheyberandomlyassignedtothesegroups?DiscussionTopic1:Testingtheeffectivenessofcontractteachers

1. WhatistherelativeeffectivenessofcontractteachersversusregulargovernmentteachersDiscussionTopic2:Lookingatmoregeneralapproachesofimprovingeducation

1. Whatistheeffectofgroupingstudentsbyabilityonstudentperformance?

2. Whatistheeffectofsmallerclasssizesonstudentperformance?DiscussionTopic3:Addressingallquestionswithasingleevaluation

1. Couldasingleevaluationexplorealloftheseissuesatonce?2. Whatrandomizationstrategycoulddoso?


37

ExerciseA:RandomSampling&TheLawofLargeNumbers(30min)‐Excel

Inthisexercise,wewillvisuallyexplorerandomsamplesofdifferentsizesfromagivenpopulation.Inparticular,wewilltrytodemonstratethatlargersamplesizestendtobemorereflectiveoftheunderlyingpopulation.YourGroupLeaderhasthedataforthisexercise.

1. Openthefile“ExerciseA_SamplingDistributions_NEW.xlsm”.

2. Ifprompted,select“EnableMacros”.

3. Navigatetothe“Randomize”worksheet,whichallowsyoutochoosearandomsampleofsize“SampleSize”fromthedatacontainedinthe“control”worksheet.

4. Enter“10”for“SampleSizeandclickthe“Randomize”button.Observethedistributionof

thevariouscharacteristicsbetweenTreatment,ControlandExpected.Withasamplesizethissmall,thepercentagedifferencefromtheexpectedaverageisquitehighforreadingscores.Click“Randomize”multipletimesandobservehowthedistributionchanges.

5. Now,try“50”forthesamplesize.Whathappenstothedistributions?Randomizeafew

timesandobservethepercentagedifferenceforthereadingscores.

6. Increasethesamplesizeto“500”,“2000”and“10000”,andrepeattheobservationsfromstep5.Whatcanwesayaboutlargersamplesizes?HowdotheyaffectourTreatmentandControlsamples?ShouldthepercentagedifferencebetweenTreatment,ControlandExpectedalwaysgodownasweincreasesamplesize?


38

ExerciseB:MechanicsofRandomization(90min)Part1:SimpleRandomizationLikemostspreadsheetprograms,MSExcelhasarandomnumbergeneratorfunction.Saywehadalistofschoolsandwantedtoassignhalftotreatmentandhalftocontrol.(1) Wehaveourlistofallschools.

(2)


39

(2) Assignarandomnumbertoeachschool:The function RAND() is Excel’s random number generator. To use it, in Column C, type in thefollowing= RAND()ineachcelladjacenttoeveryname.Oryoucantypethisfunctioninthetoprow(row2)andsimplycopyandpastetotheentirecolumn,orclickanddrag.Typing“=RAND()”putsa15‐digitrandomnumberbetween0and1inthecell.(3) CopythecellsinColumC,thenpastethevaluesoverthesamecellsThefunction=RAND()willre‐randomizeeachtimeyoumakeanychangestoanyotherpartofthespreadsheet.Exceldoesthisbecauseitrecalculatesallvalueswithanychangetoanycell.(Youcanalsoinducerecalculation,andhencere‐randomization,bypressingthekeyF9.)


40

Thiscanbeconfusing,however.Oncewe’vegeneratedourcolumnofrandomnumbers,wedonotneed to re‐randomize. We already have a clean column of random values. To stop excel fromrecalculating,youcanreplacethe“functions”inthiscolumnwiththe“values”.Todothis,highlightallvaluesinColumnC.Thenright‐clickanywhereinthehighlightedcolumn,andchooseCopy.ThenrightclickanywhereinthatcolumnandchosePasteSpecial.The“PasteSpecialwindowwillappear.Clickon“Values”.

(4)


41

(4) SortthecolumnsineitherdescendingorascendingorderofcolumnC:HighlightcolumnsA,B,andC.Inthedatatab,presstheSortbutton:

ASortboxwillpopup.

IntheSortbycolumn,select“random#”.ClickOK.Doingthissortsthelistbytherandomnumberinascendingordescendingorder,whicheveryouchose.


42

There!Youhavearandomlysortedlist.

(5) SortthecolumnsineitherdescendingorascendingorderofcolumnC:Becauseyourlistisrandomlysorted,itiscompletelyrandomwhetherschoolsareinthetophalfofthe list, or thebottomhalf. Therefore, if you assign the tophalf to the treatment groupand thebottomhalftothecontrolgroup,yourschoolshavebeen“randomlyassigned”.In columnD, type “T” for the firsthalfof the rows (rows2‐61). For the secondhalf of the rows(rows62‐123),type“C”


43

Re‐sort your list back in order of school id. You’ll see that your schools have been randomlyassignedtotreatmentandcontrolgroups.


44

Part2:StratifiedRandomizationStratification is the process of dividing a sample into groups, and then randomly assigningindividualswithineachgrouptothetreatmentandcontrol.Thereasonsfordoingthisarerathertechnical.Onereasonforstratifyingisthatitensuressubgroupsarebalanced,makingiteasiertoperform certain subgroup analyses. For example, if youwant to test the effectiveness of a neweducationprogramseparatelyforschoolswherechildrenaretaughtinHindiversusschoolswherechildrenaretaughtinGujarati,youcanstratifyby“languageofinstruction”andensurethatthereareanequalnumberschoolsofeachlanguagetypeinthetreatmentandcontrolgroups.(1) Wehaveourlistofschoolsandpotential“strata”.Mechanically,theonlydifferenceinrandomsortingisthatinsteadofsimplysortingbytherandomnumber,youwouldfirstsortbylanguage,andthentherandomnumber.Obviously,thefirststepistoensureyouhavethevariablesbywhichyouhopetostratify.(2) Sortbystrataandthenbyrandomnumber.Assumingyouhaveall thevariablesyouneed: in thedatatab,click“Sort”.TheSortwindowwillpopup.Sortby“Language”.Pressthebutton,“AddLevel”.Thenselect,“Random#”.


45

(3) AssignTreatment–ControlStatusforeachgroup.Withineachgroupoflanguages,type“T”forthefirsthalfoftherows,and“C”forthesecondhalf.


46

PrimeronPowerCalculation(30min)

Samplemeansandthenormaldistribution


47


48


49


50


51


52

GroupExerciseC:PowerCalculation(30min)

KeyVocabulary1. Power: the likelihood that, when the program has an effect, one will be able to distinguish the

effect from zero given the sample size.

2. Significance: the likelihood that the measured effect did not occur by chance. Statistical tests are

performed to determine whether one group (e.g. the experimental group) is different from another

group (e.g. comparison group) on the measurable outcome variables used in the evaluation.

3. Standard deviation: a standardized measure of the variation of a sample population from its

mean on a given characteristic/outcome. Mathematically, the square root of the variance.

4. Standardized effect size: a standardized measure of the [expected] magnitude of the effect of a

program.

5. Cluster: the level of observation at which a sample size is measured. Generally, observations

which are highly correlated with each other should be clustered and the sample size should be

measured at this clustered level.

6. Intra‐cluster correlation coefficient: a measure of the correlation between observations within a

cluster; i.e. the level of correlation in drinking water source for individuals in a household.

Clusterrandomizedtrials

TheExtraTeacherProgram(ETP)casestudydiscussedtheconceptofclusterrandomizedtrials.Itcouldbethatouroutcomeofinterestiscorrelatedforstudentsinthesameclassroom,forreasonsthathavenothingtodowiththeextrateacher.Forexample,allthestudentsinaclassroomwillbeaffectedbytheiroriginalteacher,bywhethertheirclassroomisunusuallydark,oriftheyhaveachalkboard;these factorsmeanthatwhenonestudent intheclassdoesparticularlywell for thisreason,allthestudentsinthatclassroomprobablyalsodobetter—whichmighthavenothingtodowithanextrayear.

Therefore, if we sample 100 kids from 10 randomly selected schools, that sample is lessrepresentativeofthepopulationofschoolsinthecitythanifweselected100randomkidsfromthewhole population of schools, and therefore absorbs less variance. In effect, we have a smallersamplesizethanwethink.Thiswillleadtomorenoiseinoursample,andhencealargerstandarderrorthanintheusualcaseofindependentsampling.Whenplanningboththesamplesizeandthebestwaytosampleclassrooms,weneedtotakethisintoaccount.

Thisexercisewillhelpyouunderstandhowtodothat.Shouldyousampleeverystudentinjustafewschools?Shouldyousampleafewstudentsfrommanyschools?Howdoyoudecide?

Wewillwork throughthesequestionsbydetermining thesamplesize thatallowsus todetectaspecific effectwith at least 80%power. Remember that power is the likelihood thatwhen the


53

treatmenthasaneffectyouwillbeabletodistinguishitfromzeroinyoursample.

In this example, “clusters” refer to “clusters of children”—in other words, “classrooms” or“schools”. This exercise shows you how the power of your sample changeswith the number ofclusters, the size of the clusters, the size of the treatment effect and the Intraclass CorrelationCoefficient.WewilluseasoftwareprogramcalledOptimalDesigndevelopedbySteveRaudebushwithfundingfromtheWilliamT.GrantFoundation.Youcanfindadditionalresourcesonclustereddesignsontheirwebsite.

Section1:UsingtheODSoftwareFirstdownloadtheODsoftwarefromthewebsite(asoftwaremanualisalsoavailable): http://sitemaker.umich.edu/group‐based/optimal_design_softwareNotesonOptimalDesign

Menu PersonalLevelRandomization(students) Clusterwithindividualoutcomes(schoolsrandomized,lookatindividualtest

scores) Clusterwithclusteroutcomes(schoolrandomized,schoolleveloutcomes)

Clusterwithindividualoutcomes ClusterRandomizedTrialassignclusters(schools,districts,etc)toCandT BlockedTrialstratification:ex.Stratifyschoolsintosimilarpairsandrandomize

whichareTandC ClusterRandomizedTrial,TreatmentatLevel2

Explain:fivevariables:rho,numberofclusters,power,clustersize,andR2…setthreefixed(oratdifferentlevelsfordifferentlines)togetrelationshipbetweentheothertwoonaxis.

Whenyouopen it,youwillseeascreenwhichlooks liketheonebelow. Select themenuoption“Design”toseetheprimarymenu.Selecttheoption“ClusterRandomizedTrialswithperson‐leveloutcomes,”“ClusterRandomizedTrials,”andthen“Treatmentatlevel2.”You’llseeseveraloptionstogenerategraphs;choose“Powervs.Totalnumberofclusters(J).”


54

Anewwindowwillappear:

Selectα(alpha).You’llseeitisalreadysetto0.050fora95%significancelevel.Firstlet’sassumewewanttotestonly40studentsperschool.Howmanyschoolsdoyouneedtogotoinordertohaveastatisticallysignificantanswer?Click onn, which represents the number of students per school. Since we are testing only 40studentsperschool,fillinn(1)with40andclickOK.Now we have to determine δ (delta), the standard effect size (the effect size divided by thestandard deviation of the variable of interest). Assumewe are interested in detectingwhetherthereisanincreaseof10%intestscores.(Ormoreaccurately,areuninterestedinanincreaseofless than10%.)Ourbaseline survey indicated that the average test score is26,with a standarddeviationof20.Wewanttodetectaneffectsizeof10%of26,whichis2.6.Wedivide2.6bythestandarddeviationtogetδequalto2.6/20,or0.13.Selectδfromthemenu.Inthedialogueboxthatappearsthereisaprefilledvalueof0.20fordelta(1).Changethevalueto0.13,andchangethevalueofdelta(2)toempty.SelectOK.Finallyweneedtochooseρ(rho),whichistheintra‐clustercorrelation.ρtellsushowstronglytheoutcomesarecorrelatedforunitswithinthesamecluster.Ifstudentsfromthesameschoolwereclones (novariation) andall scored the sameon the test, thenρwould equal1. If, on theotherhand, students from the same schools are in fact independent—and there were no differencesbetweenschools‐thenρwouldequal0.Youhavedeterminedinyourpilotstudythatρis0.17.Fillinrho(1)to0.17,andsetrho(2)tobeempty.Youshouldseeagraphsimilartotheonebelow.


55

You’llnoticethatyourxaxisisn’tlongenoughtoallowyoutoseewhatnumberofclusterswould

giveyou80%power. Clickonthe buttontosetyourxaxismaximumto400.Thenyoucanclickonthegraphwithyourmousetoseetheexactpowerandnumberofclustersforaparticularpoint.

Exercise11. Howmanyschoolsareneededtoachieve80%power?90%power?


56

Nowyouhaveseenhowmanyclustersyouneedfor80%powerwhensampling40studentsperschool. Suppose instead that you only have the ability to go to 124 schools (this is the actualnumberthatwassampledintheBalsakhiprogram).Exercise2

1. Howmanychildrenareneededperschooltoachieve80%power?90%power?2. Choosedifferentvaluesforntoseehowyourgraphchanges.

Finally,let’sseehowtheIntraclassCorrelationCoefficient(ρ)changesthepowerofagivensample.Leaverho(1)tobe0.17butforcomparisonchangerho(2)to0.0.Youshouldseeagraphliketheonebelow.Thesolidbluecurveistheonewiththeparametersyou’veset.Thebluedashedcurveisthereforcomparison–toseehowmuchpoweryouwouldgetfromyoursampleifρwerezero.Lookcarefullyatthegraph.Exercise3

1. Howdoesthepowerofthesamplechangewiththeintraclasscorrelationcoefficient(ρ)?Totakealookatsomeoftheothermenuoptions,closethegraphbyclickingonthe inthetoprighthandcorneroftheinnerwindow.SelecttheClusterRandomizedTrialmenuagain.


57

Exercise4

1. Try generating graphs for how power changes with the cluster size (n), intra‐classcorrelation(rho)andeffectsize(delta).Youwillhavetore‐enteryourpre‐testparameterseachtimeyouopenanewgraph.


58

CASESTUDY4:DEWORMINGINKENYA

ThiscasestudyisbasedonEdwardMiguelandMichaelKremer,“Worms:IdentifyingImpactsonEducationandHealthinthePresenceofTreatment

Externalities,”Econometrica72(1):159‐217,2004


Case 4: Deworming in KenyaManaging threats to experimental integrity

Case4:DeworminginKenyaManagingthreatstoexperimentalintegrity


59

KeyVocabulary1. Phase‐in Design: a study design in which groups are individually phased into treatment over a

period of time; groups which are scheduled to receive treatment later act as the comparison groups

in earlier rounds.

2. Equivalence: groups are identical on all baseline characteristics, both observable and

unobservable. Ensured by randomization.

3. Attrition: the process of individuals joining in or dropping out of either the treatment or

comparison group over the course of the study.

4. Attrition Bias: statistical bias which occurs when individuals systematically join in or drop out of

either the treatment or the comparison group for reasons related to the treatment or outcomes.

5. Partial Compliance: individuals do not comply with their assignment (to treatment or

comparison). Also termed "diffusion" or "contamination."

6. Intention to Treat: the measured impact of a program that includes all data from participants in

the groups to which they were randomized, regardless of whether they actually received the

treatment. Intention‐to‐treat analysis prevents bias caused by the loss of participants, which may

disrupt the baseline equivalence established by randomization and which may reflect non‐

adherence to the protocol.

7. Treatment on the Treated: the measured impact of a program that includes only the data for

participants who actually received the treatment.

8. Externality: an indirect cost or benefit incurred by individuals who did not directly receive the treatment. Also termed "spillover."

Worms‐Acommonproblemwithacheapsolution

Worminfectionsaccountforover40percentoftheglobaltropicaldiseaseburden.Infectionsarecommon in areas with poor sanitation. More than 2 billion people are affected. Children, stilllearning good sanitary habits, are particularly vulnerable: 400 million school‐age children arechronicallyinfectedwithintestinalworms.Thesymptomsassociatedwithworminfectionsincludelistlessness,diarrhea,abdominalpain,andanemia.Beyondtheireffectsonhealthandnutrition,heavyworminfectionscanimpairchildren’sphysicalandmentaldevelopmentandreducetheirattendanceandperformanceinschool.Poorsanitationandpersonalhygienehabitsfacilitatetransmission.Infectedpeopleexcretewormeggsintheirfecesandurine.Inareaswithpoorsanitation,theeggscontaminatethesoilorwater.Otherpeopleareinfectedwhentheyingestcontaminatedfoodorsoil(hookworm,whipworm,androundworm),orwhenhatchedwormlarvaepenetratetheirskinuponcontactwithcontaminatedsoil (hookworm) or fresh water (schistosome). School‐age children are more likely to spreadwormsbecause theyhaveriskierhygienepractices (more likely toswim incontaminatedwater,


60

morelikelytonotusethelatrine,lesslikelytowashhandsbeforeeating).Sotreatingachildnotonly reduces her ownworm load; itmay also reduce disease transmission—and so benefit thecommunityatlarge.Treatmentkillswormsinthebody,butdoesnotpreventre‐infection.Oralmedicationthatcankill99percentofwormsinthebodyisavailable:albendazoleormebendazolefortreatinghookworm,roundworm,andwhipworminfections;andpraziquantelfortreatingschistosomiasis.Thesedrugsarecheapandsafe.Adoseofalbendazoleormebendazolecostslessthan3UScentswhileonedoseofpraziquantelcostslessthan20UScents.Thedrugshaveveryfewandminorsideeffects.Wormscolonizetheintestinesandtheurinarytract,buttheydonotreproduceinthebody;theirnumbers build up only through repeated contact with contaminated soil or water. The WHOrecommendspresumptive school‐basedmassdeworming in areaswithhighprevalence. Schoolswithhookworm,whipworm,androundwormprevalenceover50percentshouldbemasstreatedwith albendazole every 6months, and schoolswith schistosomiasis prevalence over 30 percentshouldbemasstreatedwithpraziquantelonceayear.

PrimarySchoolDewormingProgram

International Child Support Africa (ICS) implemented the Primary School Deworming Program(PSDP)intheBusiaDistrictinwesternKenya,adensely‐settledregionwithhighwormprevalence.TreatmentfollowedWHOguidelines.ThemedicinewasadministeredbypublichealthnursesfromtheMinistryofHealthinthepresenceofhealthofficersfromICS.ThePSDPwasexpectedtoaffecthealth,nutrition,andeducation.Tomeasureimpact,ICScollecteddata on a series of outcomes: prevalence of worm infection, worm loads (severity of worminfection);self‐reportedillness;andschoolparticipationratesandtestscores.

Evaluationdesign—theexperimentasplanned

Because of administrative and financial constraints the PSDP could not be implemented in allschools immediately. Instead, the75schoolswererandomlydivided into3groupsof25schoolsandphased‐inover3years.Group1schoolsweretreatedstartinginboth1998and1999,Group2schoolsin1999,andGroup3startingin2001.Group1schoolswerethetreatmentgroupin1998,whileschoolsGroup2andGroup3were thecomparison. In1999Group1andGroup2schoolswerethetreatmentandGroup3schoolsthecomparison.

1998 1999 2001Group1 Treatment Treatment Treatment

Group2 Comparison Treatment Treatment

Group3 Comparison Comparison Treatment


61

ThreatstointegrityoftheplannedexperimentDiscussionTopic1:Threatstoexperimentalintegrity

Randomizationensuresthatthegroupsareequivalent,andthereforecomparable,atthebeginningof the program. The impact is then estimated as the difference in the average outcome of thetreatment group and the average outcome of the comparison group, both at the end of theprogram.Tobeabletosaythattheprogramcausedtheimpact,youneedtobeabletosaythattheprogramwastheonlydifferencebetweenthetreatmentandcomparisongroupsoverthecourseoftheevaluation.

1. Whatdoesitmeantosaythatthegroupsareequivalentatthestartoftheprogram?2. Canyoucheckifthegroupsareequivalentatthebeginningoftheprogram?How?3. Otherthantheprogram’sdirectandindirectimpacts,whatcanhappenoverthecourseofthe

evaluation(afterconductingtherandomassignment)tomakethegroupsnon‐equivalent?4. Howdoesnon‐equivalenceattheendthreatentheintegrityoftheexperiment?

Managingattrition‐Whenthegroupsdonotremainequivalent

Attritioniswhenpeoplejoinordropoutofthesample—bothtreatmentandcomparisongroups—overthecourseof theexperiment.Onecommonexample inclinical trials iswhenpeopledie;socommonindeedthatattritionissometimescalledexperimentalmortality.DiscussionTopic2:Managingattrition

Youarelookingatthehealtheffectsofdeworming.Inparticularyouarelookingatthewormload(severityofworminfection).Wormloadsarescaledasfollows:

‐ Heavyworminfections=scoreof3‐ Mediumworminfections=scoreof2‐ Lightinfections=scoreof1

Thereare30,000children:15,000 in treatmentschoolsand15,000 incomparisonschools.Afteryourandomize,thetreatmentandcomparisongroupsareequivalent,meaningchildrenfromeachofthethreecategoriesareequallyrepresentedinbothgroups.Supposeprotocolcomplianceis100percent:allchildrenwhoareinthetreatmentgettreatedandnoneofthechildreninthecomparisonaretreated.Childrenthatweredewormedatthebeginningoftheschoolyear(thatis,childreninthetreatmentgroup)endupwithawormloadof1attheendoftheyearbecauseofre‐infection.Childrenwhohaveawormloadof3onlyattendhalfthetimeanddropoutofschooliftheyarenottreated.Thenumberofchildrenineachworm‐loadcategoryisshownforboththepretestandposttest.


62

Pretest PosttestWormLoad Treatment Comparison Treatment Comparison3 5,000 5,000 0 Droppedout2 5,000 5,000 0 5,0001 5,000 5,000 15,000 5,000Total childrentestedatschool 15,000 15,000 15,000 10,000

1. a) Atposttest,whatistheaveragewormloadforthetreatmentgroup

b) Atposttest,whatistheaveragewormloadforthecomparisongroup?c) Whatisthedifference?d) Isthisoutcomedifferenceanaccurateestimateoftheimpactoftheprogram?Whyor

whynot?e) Ifitisnotaccurate,doesitoverestimateorunderestimatetheimpact?f) Howcanwegetabetterestimateoftheprogram’simpact?

2. Besides worm load, the PSDP also looked at outcomemeasures such as school attendanceratesandtestscores.

a) Would differential attrition (i.e. differences in drop‐outs between treatment andcomparisongroups)biaseitheroftheseoutcomes?How?

b) Would the impacts on these final outcome measures be underestimated oroverestimated?

3. In Case 1, you learned about othermethods to estimate program impact, such as pre‐post,

simpledifference,differencesindifferences,andmultivariateregression.a) Doesthethreatofattritiononlypresentitselfinrandomizedevaluations?

Managingpartialcompliance‐whenthetreatmentgroupdoesnotgettreatedorthecomparisongroupdoesgettreatedSomepeopleassignedtothetreatmentmayintheendnotactuallygettreated.Inanafter‐schooltutoringprogram,forexample,somechildrenassignedtoreceivetutoringmaysimplynotshowupfortutoring.Andtheothersassignedtothecomparisonmayobtainaccesstothetreatment,eitherfrom the program or from another provider. Or comparison group childrenmay get extra helpfromtheteachersoracquireprogrammaterialsandmethodsfromtheirclassmates.Inanyofthesescenarios, people are not complying with their assignment in the planned experiment. This iscalled “partial compliance” or “diffusion” or, less benignly, “contamination.” In contrast tocarefully‐controlledlabexperiments,diffusionisubiquitousinsocialprograms.Afterall,lifegoeson,peoplewillbepeople,andyouhavenocontroloverwhattheydecidetodooverthecourseoftheexperiment.Allyoucandoisplanyourexperimentandofferthemtreatments.How,then,canyoudealwiththecomplicationsthatarisefrompartialcompliance?


63

DiscussionTopic3:Managingpartialcompliance

Supposenoneofthechildrenfromthepoorestfamilieshaveshoesandsotheyhavewormloadsof3.Thoughtheirparentshadnotpaidtheschool fees, thechildrenwereallowedtostay inschoolduringtheyear.Parentalconsentwasrequiredfortreatment,andtogiveconsent,theparentshadtocometotheschoolandsignaconsentformintheheadmaster’soffice.However,becausetheyhadnotpaidschoolfees,thepoorestparentswerereluctanttocometotheschool.Consequently,none of the children with worm loads of 3 were actually dewormed. Their worm load scoresremained3attheendoftheyear.Nooneassignedtocomparisonwastreated.Allthechildreninthesampleatthebeginningoftheyearwerefollowedup,ifnotatschoolthenathome. Pretest PosttestWormLoad Treatment Comparison Treatment Comparison3 5,000 5,000 5000 5,0002 5,000 5,000 0 5,0001 5,000 5,000 10,000 5,000Total childrentestedatschool

15,000 15,000 15,000 15,000

1. a) Calculatetheimpactestimatebasedontheoriginalgroupassignments.

b) Thisisanunbiasedmeasureoftheeffectoftheprogram,butinwhatwaysisitusefulandinwhatwaysisitnotasuseful?

2. Youareinterestedinlearningtheeffectoftreatmentonthoseactuallytreated(“treatmenton

thetreated”orToTestimate).Fiveofyourcolleaguesarepassingbyyourdesk;theyallagreethatyoushouldcalculatetheeffectofthetreatmentusingonlythe10,000childrenwhoweretreated.

a) Isthisadvicesound?Whyorwhynot?

3. Anothercolleaguesaysthatit’snotagoodideatodroptheuntreatedentirely;youshouldusethembutconsiderthemaspartofthecomparison.


4. Another colleague suggests that you use the compliance rates, the proportion of people ineachgroupthatdidordidnotcomplywiththeirtreatmentassignment.Youshoulddividethe“intentiontotreat”estimatebythedifferenceinthetreatmentratios(i.e.proportionsofeachexperimentalgroupthatreceivedthetreatment).



64

Managingspillovers—whenthecomparison,itselfuntreated,benefitsfromthetreatmentgroupbeingtreated

People assigned to the control groupmaybenefit indirectly from those receiving treatment. Forexample, aprogramthatdistributes insecticide‐treatednetsmayreducemalaria transmission inthecommunity,indirectlybenefitingthosewhothemselvesdonotsleepunderanet.Sucheffectsarecalledexternalitiesorspillovers.DiscussionTopic4:Managingspillovers

In thedewormingprogram, randomizationwasat the school level.However,while all boysat agiventreatmentschoolweretreated,onlygirlsyoungerthanthirteenreceivedthedewormingpill.ThiswasduetothefactthattheWorldHealthOrganization(WHO)hadnottested(andthusnotyetapproved)thedewormingpillforpregnantwomen.Becauseitwasdifficulttodeterminewhichgirlswereatriskofgettingpregnant,theprogramdecidedtonotadministerthemedicationtoanygirlthirteenorolder.(Postscript:sincethedewormingevaluationwasimplemented,theWHOhasapprovedthedewormingmedicationforpregnantwomen).Thusatagiventreatmentschool,therewasadistinctgroupofstudentsthatwasnevertreatedbutlivedinverycloseproximitytoagroupthatwastreated.Supposeprotocolcomplianceis100percent:allboysandgirlsunderthirteenintreatmentschoolsgettreatedandallgirlsthirteenandoverintreatmentschoolsaswellasallchildrenincomparisonschoolsdonotgettreated.Youcanassumethatduetoproperrandomization,thedistributionofwormloadacrossthethreegroupsofstudentsisequivalentbetweentreatmentandcontrolschoolspriortotheintervention.

PosttestTreatment Comparison

WormLoad Allboys Girls<13yrs Girls≥13yrs Allboys Girls<13yrs Girls≥13yrs

3 0 0 0 5000 2000 20002 0 0 2000 5000 3000 30001 10000 5000 3000 0 0 0

Totalchildrentestedatschool

20000 20000

1. a) Ifthereareanyspillovers,wherewouldyouexpectthemtoshowup?

b) Isitpossibleforyoutocapturethesepotentialspillovereffects?How?

2. a) Whatisthetreatmenteffectforboysintreatmentv.comparisonschools?b) Whatisthetreatmenteffectforgirlsunderthirteenintreatmentv.comparisonschools?c) Whatisthedirecttreatmenteffectamongthosewhoweretreated?d) What is the treatment effect for girls thirteen and older in treatment v. comparison

schools?


65

e) Whatistheindirecttreatmenteffectduetospillovers?f) Whatisthetotalprogrameffect?

References:Crompton, D.W.T. 1999. “How Much Helminthiasis Is There in the World?” Journal ofParasitology85:397–403.Kremer,MichaelandEdwardMiguel.2007.“TheIllusionofSustainability,”QuarterlyJournalofEconomics122(3)Miguel, Edward, and Michael Kremer. 2004. “Worms: Identifying Impacts on Education andHealthinthePresenceofTreatmentExternalities,”Econometrica72(1):159‐217.Shadish,William R, Thomas D. Cook, and Donald T. Campbell. 2002.Experimental andQuasi‐ExperimentalDesignsforGeneralizedCausalInference.Boston,MA:HoughtonMifflinCompanyWorld Bank. 2003. “School Deworming at a Glance,” Public Health at a Glance Series.http://www.worldbank.org/hnpWHO.1999.“TheWorldHealthReport1999,”WorldHealthOrganization,Geneva.WHO.2004.“ActionAgainstWorms”PartnersforParasiteControlNewsletter,Issue#1,January2004,www.who.int/wormcontrol/en/action_against_worms.pdf


66

GroupPresentation

Participants will form 4‐6 person groups which will work through the design process for arandomisedevaluationofadevelopmentproject.Groupswillbeaidedinthisprojectbyboththefacultyandteachingassistants,withtheworkculminatinginpresentationsattheendoftheweek.

Thegoalofthegrouppresentationistoconsolidateandapplytheknowledgeofthelecturesandtherebyensurethatparticipantsleavewiththeknowledge,experience,andconfidencenecessarytoconducttheirownrandomisedevaluations.Weencouragegroupstoworkonprojectsthatarerelevanttoparticipants’organisations.

All groups will present on Friday. The 15‐minute presentation is followed by a 15‐minutediscussion led by J‐PAL affiliates and staff. We provide groups with template slides for theirpresentation(seenextpage).Whilethegroupsdonotneedtofollowthisexactly,thepresentationshouldhavenomore than9slides (including title slide,excludingappendix)andshould includethefollowingtopics:

Briefprojectbackground Theoryofchange Evaluationquestion Outcomes Evaluationdesign Dataandsamplesize Potentialvaliditythreatsandhowtomanagethem Disseminationstrategyofresults

Pleasetimeyourselfanddonotexceedtheallottedtime.Wehaveonlyalimitedamountoftimeforthesepresentationsandfollowastricttimelinetobefairtoallgroups.


67

GroupPresentationTemplate


68


69

j pal courseexecutivesocialcourse 16participants 19 ...€pal lecturers .. 9 list of participants...

Documents