Evaluation methods for measuring the impact of social protection programs

Joost de Laat, Menahem Prywes, Shafique Jamal
The World Bank

Objectives:

Understand:

o Principles of the difference-in-differences method of project evaluation, and the weaknesses of the method.
o Principles of the randomized controlled trial (RCT) method, and the limits and weaknesses of the method.
o Principles of the regression discontinuity design (RDD) method.

How donors evaluated development projects

Donors often couldn’t evaluate development projects, and especially health projects, convincingly because:
o No one bothered to collect baseline data.
o No one tracked the treatment (beneficiary) group over time.

Sometimes donors collected this information and then measured changes in the treatment group over time. However, it remained unclear whether this performance was better or worse than that of a comparator (control) group.

Sometimes donors applied the difference-in-differences method. This compares the change in results for the treatment group with the change in results for a control group. But it’s often unclear whether the comparator group really is comparable to the treatment group.

Parliaments and donors increasingly demand credible evaluations!

Difference in differences methodology-1
Source: Prashant Bharadwaj

Difference in differences methodology-2
Source: Prashant Bharadwaj

Difference in differences methodology-3
Source: Prashant Bharadwaj

Difference in the differences methodology: simple numerical example
Source: Prashant Bharadwaj
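The difference-in-differences calculation can be sketched in a few lines of Python. The numbers below are hypothetical and are not taken from the slides; the function name is illustrative only.

```python
def difference_in_differences(treat_before, treat_after, control_before, control_after):
    """Return the difference-in-differences estimate of program impact.

    Each argument is the average outcome for one group in one period.
    """
    change_treatment = treat_after - treat_before    # change in the treatment group
    change_control = control_after - control_before  # change in the control group
    return change_treatment - change_control


# Hypothetical average outcomes (for illustration only).
estimate = difference_in_differences(
    treat_before=100.0, treat_after=130.0,
    control_before=90.0, control_after=105.0,
)
print(estimate)  # 15.0
```

The estimate is only credible if the control group would have followed the same trend as the treatment group in the absence of the program. This is where the method is weak when the comparator group is not really comparable.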

In contrast to the difference-in-differences method, which relies on a comparator group that may not be comparable, randomized controlled trials build the control group by random assignment so that comparisons between outcomes for treatment and control groups are valid.

Randomization establishes a control group that is, on average, statistically identical to the intervention group. This produces unbiased estimates of the average impact.

Randomization reduces selection bias, for example:
o Undercoverage: some parts of the population are under-represented in the sample.
o Self-selection: people who agree to participate in the trial have special characteristics, e.g. strong opinions on an issue.
o Nonresponse bias: participants who do not respond may have particular views or other characteristics.

Unit of Randomization

Choose according to type of program
o Individual/Household
o School/Health Clinic/catchment area
o Block/Village/Community
o Ward/District/Region

Keep in mind
o Need “sufficiently large” number of units to detect minimum desired impact: Power.
o Spillovers/contamination
o Operational and survey costs

As a rule of thumb, randomize at the smallest viable unit of implementation.
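To make the "sufficiently large" requirement concrete, here is a minimal sketch of a standard sample-size approximation for comparing two means, with the usual design-effect inflation for randomizing clusters rather than individuals. All inputs (minimum impact, standard deviation, cluster size, intra-cluster correlation) are hypothetical assumptions, and the function names are illustrative.

```python
import math
from statistics import NormalDist


def units_per_arm(min_impact, outcome_sd, alpha=0.05, power=0.80):
    """Approximate units needed per arm for a two-sided test of a difference in means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_power = NormalDist().inv_cdf(power)          # value corresponding to the desired power
    return math.ceil(2 * ((z_alpha + z_power) * outcome_sd / min_impact) ** 2)


def design_effect(units_per_cluster, icc):
    """Inflation factor when whole clusters (e.g. villages) are randomized."""
    return 1 + (units_per_cluster - 1) * icc


# Hypothetical inputs: detect an impact of 10 when the outcome's standard deviation is 50,
# with 20 households surveyed per village and an intra-cluster correlation of 0.05.
n_individual = units_per_arm(min_impact=10.0, outcome_sd=50.0)
deff = design_effect(units_per_cluster=20, icc=0.05)
n_clustered = math.ceil(n_individual * deff)
clusters_per_arm = math.ceil(n_clustered / 20)
print(n_individual, deff, n_clustered, clusters_per_arm)
```

The design effect shows the trade-off behind the rule of thumb: randomizing at a larger unit limits spillovers but requires more households in total to reach the same power.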

Example: Randomized Assignment

Mexico Progresa Conditional Cash Transfer program

Unit of randomization: Community

o 320 treatment communities (14,446 households): First transfers in April 1998.

o 186 comparison communities (9,630 households): First transfers in November 1999.

506 communities in the evaluation sample
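Randomized assignment at the community level could be sketched as below. This is an illustration only, not the procedure actually used for Progresa: the community identifiers, the seed, and the function name are hypothetical, and only the group sizes (320 and 186 out of 506) come from the example.

```python
import random


def assign_communities(community_ids, n_treatment, seed):
    """Randomly split communities into a treatment group and a comparison group."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible and auditable
    shuffled = list(community_ids)
    rng.shuffle(shuffled)
    treatment = set(shuffled[:n_treatment])
    comparison = set(shuffled[n_treatment:])
    return treatment, comparison


# Hypothetical community identifiers 1..506.
treatment, comparison = assign_communities(range(1, 507), n_treatment=320, seed=1998)
print(len(treatment), len(comparison))  # 320 186
```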

Randomized phase-in

Figure: timeline running from T=0 to T=1, the comparison period, showing the 320 treatment communities and the 186 comparison communities.


How do we know we have good clones?

In the absence of Progresa, the treatment and comparison groups should be identical on average.

Let’s compare their characteristics at baseline (T=0)

Example: Balance at Baseline

Case 3: Randomized Assignment

                                     Treatment   Comparison   T-stat
Consumption ($ monthly per capita)   233.4       233.47       -0.39
Head’s age (years)                   41.6        42.3         -1.2
Spouse’s age (years)                 36.8        36.8         -0.38
Head’s education (years)             2.9         2.8          2.16**
Spouse’s education (years)           2.7         2.6          0.006

Note: If a difference or estimated impact is statistically significant at the 5% significance level, we label it with 2 stars (**).

Example: Balance at Baseline

Case 3: Randomized Assignment

                              Treatment   Comparison   T-stat
Head is female=1              0.07        0.07         -0.66
Indigenous=1                  0.42        0.42         -0.21
Number of household members   5.7         5.7          1.21
Bathroom=1                    0.57        0.56         1.04
Hectares of Land              1.67        1.71         -1.35
Distance to Hospital (km)     109         106          1.02
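The T-stat column in a balance table can be understood through a simple two-sample comparison. The sketch below computes a Welch t-statistic on hypothetical household data (the slides do not include the underlying records). It ignores clustering, which a real analysis of a community-randomized program would need to account for.

```python
import math
from statistics import mean, variance


def balance_t_stat(treatment_values, comparison_values):
    """Welch t-statistic for the difference in means between two groups."""
    diff = mean(treatment_values) - mean(comparison_values)
    standard_error = math.sqrt(
        variance(treatment_values) / len(treatment_values)
        + variance(comparison_values) / len(comparison_values)
    )
    return diff / standard_error


# Hypothetical ages of household heads at baseline (illustration only).
treated_ages = [40, 42, 44, 41, 43]
comparison_ages = [41, 43, 45, 42, 44]
t_stat = balance_t_stat(treated_ages, comparison_ages)
print(round(t_stat, 2))  # -1.0
```

A t-statistic smaller than about 1.96 in absolute value means the baseline difference is not statistically significant at the 5% level, which is what one hopes to see for most characteristics after randomization.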


Example: Randomized Assignment

                                  Treatment Group             Counterfactual               Impact
                                  (Randomized to treatment)   (Randomized to Comparison)   (Y | P=1) - (Y | P=0)
Baseline (T=0) Consumption (Y)    233.47                      233.40                       0.07
Follow-up (T=1) Consumption (Y)   268.75                      239.5                        29.25**

Estimated Impact on Consumption (Y)
Linear Regression                 29.25**
Multivariate Linear Regression    29.75**
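The impact figures can be reproduced from the group averages reported in the example. The sketch below uses only those four numbers. The difference-in-differences line is an added illustration, not a figure from the slides.

```python
# Average monthly consumption per capita, taken from the randomized assignment example.
treatment = {"baseline": 233.47, "follow_up": 268.75}
comparison = {"baseline": 233.40, "follow_up": 239.5}

# Simple difference in means at follow-up: (Y | P=1) - (Y | P=0).
impact = treatment["follow_up"] - comparison["follow_up"]

# The baseline gap shows how similar the two groups were before the program.
baseline_gap = treatment["baseline"] - comparison["baseline"]

# Difference-in-differences nets out the (small) baseline gap.
did = impact - baseline_gap

print(round(impact, 2), round(baseline_gap, 2), round(did, 2))  # 29.25 0.07 29.18
```

Because randomization made the baseline gap negligible, the simple difference at follow-up and the difference-in-differences estimate are almost the same.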


Keep in Mind: Randomized Assignment

Randomized assignment, with large enough samples, produces 2 statistically equivalent groups.

We have identified the perfect clone.

Randomized beneficiary

Randomized comparison

Feasible for prospective evaluations with over-subscription/excess demand.

Most pilots and new programs fall into this category.


Limits on randomized controlled trials

Out-of-sample generalization: Results from these trials are internally valid but cannot, by themselves, be generalized (extrapolated) out of sample. An inference of general validity of a result would require an internally consistent theory of causation and repeated randomized controlled trials in different countries, demographic rules, and natural environments.

Results are comparisons of averages. Therefore the results of a randomized controlled trial may not be valid for making policies for sub-groups or for individual households and people, especially if the policymaker has additional information.
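A small numerical illustration of why an average result may not apply to sub-groups. The sub-groups, shares, and effects below are entirely hypothetical.

```python
# Hypothetical sub-groups: (share of the population, average treatment effect).
subgroups = {
    "households with young children": (0.5, 20.0),
    "households without young children": (0.5, 0.0),
}

# The trial's headline result is the population-weighted average effect.
average_effect = sum(share * effect for share, effect in subgroups.values())
print(average_effect)  # 10.0
```

Here the trial would report an average impact of 10, a value that describes neither sub-group: one gains 20 and the other gains nothing.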

Risks of bias in the randomized controlled trial methodology

Self-selection out of the control group. Randomized controlled trials in the social sciences are not double blind, unlike pharmaceutical trials. The people who are not receiving the treatment (for example, tutoring or nutritional supplements) may decide to obtain it on their own, biasing the results.

Replacement of drop-outs may lead to bias.

Limits to use of randomized controlled trials

Randomized controlled trials are expensive. They can cost anywhere from $150,000 to several million dollars. A $500,000 cost is typical. This means the method cannot be applied to all development projects.

Many development projects do not address units that can be randomized. For instance, states or provinces/oblasts cannot be meaningfully randomized.

Ethical rules are unclear. In medical research, participation in a randomized controlled trial requires informed consent. There are no general rules for economic development projects. US universities and some developing countries have ethical rules.

Subtle conflicts of interest and biases can prejudice all evaluation studies, whatever the methodology.

Sponsors’ conflict of interest. Donors, governments, project units, and NGOs prefer to report positive findings because this helps to sustain their business and jobs. Sometimes, project units resist or even refuse payment to contractors who deliver negative evaluation reports.

Contractors’ conflict of interest. The contractors who carry out the evaluation studies may be influenced by their clients’ preferences.

Confirmation bias. Donors, governments, project units, and NGOs often believe that outcomes are positive and tend to perceive positive outcomes. Also, officials, managers, and development experts come to identify personally with the projects. Their psychological bias is, ‘I mean well, therefore the project is successful.’

Publication bias. Scholarly journals prefer to publish positive results and generally neglect negative results (‘no effect’ is not newsworthy). This may induce bias in academic work.

Economic and ethical questions

When should donors and governments insist on application of the randomized controlled trial methodology and when is this inappropriate?

When is it unethical to use the randomized controlled trials methodology in a development context?

Regression Discontinuity Design

Many social programs select beneficiaries using an index or score:
o Anti-poverty Programs: targeted to households below a given poverty index/income
o Pensions: targeted to population above a certain age
o Education: scholarships targeted to students with high scores on standardized tests
o Agriculture: fertilizer program targeted to small farms (less than a given number of hectares)

Regression Discontinuity Design
Example: Effect of social assistance program on nutrition

Goal: Reduce vulnerability and improve nutrition of poor families

Method:
o Households with a poverty score ≤50 are poor
o Households with a poverty score >50 are not poor

Intervention: Poor households receive social assistance transfers

Regression Discontinuity Design-Baseline

Figure: households labeled "Eligible" and "Not eligible".

Regression Discontinuity Design-Post Intervention

Figure: the "IMPACT" is marked.

Regression Discontinuity Design

We have a continuous eligibility index with a defined cut-off
o Households with a score ≤ cutoff are eligible
o Households with a score > cutoff are not eligible
o Or vice-versa

Intuitive explanation of the method:
o Units just above the cut-off point are very similar to units just below it, so they make a good comparison.
o Compare outcomes Y for units just above and below the cut-off point.

For a discontinuity design, you need:
1) Continuous eligibility index
2) Clearly defined eligibility cut-off.
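The comparison of units just above and just below the cut-off can be sketched as follows. This is a minimal illustration: it fits a separate straight line on each side of the cut-off within a chosen bandwidth and measures the jump at the cut-off. The data are hypothetical and noise-free, the cut-off of 50 follows the social assistance example, and the bandwidth and function names are assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rdd_estimate(scores, outcomes, cutoff, bandwidth):
    """Jump in the outcome at the cut-off, from separate linear fits on each side."""
    pairs = list(zip(scores, outcomes))
    below = [(s, y) for s, y in pairs if cutoff - bandwidth <= s <= cutoff]
    above = [(s, y) for s, y in pairs if cutoff < s <= cutoff + bandwidth]
    a_lo, b_lo = fit_line(*zip(*below))
    a_hi, b_hi = fit_line(*zip(*above))
    # Eligible units are at or below the cut-off: impact = eligible minus not eligible.
    return (a_lo + b_lo * cutoff) - (a_hi + b_hi * cutoff)


# Hypothetical data: the outcome rises with the score, and eligible
# households (score <= 50) gain 5 units from the program.
scores = list(range(30, 71))
outcomes = [10 + 0.2 * s + (5 if s <= 50 else 0) for s in scores]
print(round(rdd_estimate(scores, outcomes, cutoff=50, bandwidth=10), 2))  # 5.0
```

With real, noisy data the estimate depends on the bandwidth: a narrower window keeps the two groups more similar but leaves fewer observations.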

THANK YOU!

Questions?

Next: Tajikistan Example
