Angus Deaton, Princeton University
India International Center, October 15th, 2012

Evidence for policy

Everyone agrees that policies should be based on evidence
Much less agreement about the nature of the evidence
  What methods should be used?
  Is there a hierarchy of evidence?
  Are some kinds of evidence better than others?
  Are randomized controlled trials the gold standard?
How do we move from evidence to policy?
  Rigorous evidence is of limited value if the step to policy is not well-justified
  Two steps: developing evidence, adapting to policy, and outcome depends on weakest link


Running examples

Building dams
  Do dams lead to poverty reduction?
Sanitation
  Total Sanitation Campaign (TSC) and its effects on child mortality and child health
  How should such schemes be implemented?
Microfinance
  Is MF an effective tool for poverty reduction?
Food subsidies
  In kind versus cash? PDS versus CCTs
In general: “finding out what works”
  “Rigorous evaluation of CCTs has shown that they work”
  Is this true, and if so, what does it mean for India? Or anywhere else?

Background

The “failure” of development economics and the whole development project
Cycling fashions at the World Bank
  Infrastructure, structural adjustment, education, health, women, political economy, governance . . . infrastructure
  Not just the Bank, but the development community (or at least the community of “developers”)
Unconstrained by evidence
  Bank unable to document its contribution, if any
  Deep skepticism about its own internal evaluations
Many argued that there had been little or no progress
  Much less so now, though it remains unclear whether the development effort by rich countries was positive


Diagnosing the problem

Many possible stories for this state of affairs
One story is a failure to learn from experience
  No systematic, “rigorous” evaluation procedure for projects
  Casual empirical evaluation does not give credible answers
  We need “rigorous” and “credible” evidence on what works
  If the Bank had done this on all of its projects in the past, we would know what works by now, and poverty would be history
Is this just the latest turn of the wheel of fashion, or is there some truth to this?


Better empirical analysis

Certainly true that the quality of empirical analysis was often weak
  Correlations that were obviously not causation
  Chinese railways
Randomized controlled trials seem to offer solutions to these issues
  They establish causality
  Solution to the statistical problems of bias, selection, omitted variables (confounding) etc.
These arguments have been very successful
  In World Bank, among foundations
  J-PAL and others doing many experiments


Chorus of approval

“The World Bank is finally embracing science” Lancet editorial, 2004

“Creating a culture in which rigorous randomized evaluations are promoted, encouraged, and financed has the potential to revolutionize social policy during the 21st century, just as randomized trials revolutionized medicine during the 20th.” Esther Duflo, 2004
  Did RCTs revolutionize medicine?

“Britain has given the world Shakespeare, Newtonian physics, the theory of evolution, parliamentary democracy—and the randomized trial” BMJ editorial, 2001.


What is an RCT?

Trial population is randomly divided into two groups, experimentals and controls
  Experimentals get treatment
  Controls get none
  Average outcome in experimental group minus average outcome in control group tells us if the treatment works, and by how much on average
An RCT estimates an average treatment effect
  In general, each person (unit) will have a different treatment effect
  We cannot observe these for each individual
  But RCT gives us the average for the group, which is a lot!
Minimal assumptions, absence of bias, establishing causality are big advantages
But is this really the only “rigorous” evaluation?
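
The difference-in-means calculation described on this slide can be sketched in a few lines of Python. Everything in the sketch is an assumption made for illustration: the population, its baseline outcomes, the individual treatment effects, and the function name estimate_ate are invented, not taken from any study. The individual effects are visible here only because the data are simulated; in a real trial they are never observed.

```python
import random
from statistics import mean

def estimate_ate(baseline, effects, seed=0):
    """Randomly assign half the units to treatment and return the
    difference in average outcomes (experimentals minus controls)."""
    rng = random.Random(seed)
    units = list(range(len(baseline)))
    rng.shuffle(units)                      # random assignment to groups
    half = len(units) // 2
    experimentals, controls = units[:half], units[half:]
    treated_outcomes = [baseline[i] + effects[i] for i in experimentals]
    control_outcomes = [baseline[i] for i in controls]
    return mean(treated_outcomes) - mean(control_outcomes)

# Hypothetical population: the treatment raises the outcome by 2.0 for
# roughly half the units and does nothing for the rest.
pop_rng = random.Random(1)
baseline = [pop_rng.gauss(10.0, 2.0) for _ in range(10_000)]
effects = [pop_rng.choice([0.0, 2.0]) for _ in range(10_000)]

true_ate = mean(effects)   # knowable only because this is a simulation
estimated_ate = estimate_ate(baseline, effects)
print(f"true average effect: {true_ate:.2f}, RCT estimate: {estimated_ate:.2f}")
```

The estimate tracks the average effect, but nothing in the output says which units gained and which did not.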


Examples again

CCTs in Mexico (Progresa): some villages got CCTs, some did not
  Better average outcomes for treatment villages
  Random selection means it must have been the CCT, not something else
What do we learn?
  Will it work in India? External validity.
  Will it work for a specific village in Mexico?
  Why did it work? If we knew, we could answer these two questions
  Controls knew they were going to get CCTs later? Does that matter?
  Mexico had a system of clinics: hard to take kids to a non-existent clinic
  Big issue today for Santiago Levy at IADB
Dams: not possible to do randomized dam construction!!
  So RCTs cannot be done in all cases
  Some have argued that policies should not be implemented in these cases
  We routinely do many things for which there have been no RCTs!


Alternative methods

Rohini Pande and Esther Duflo’s work on dams used placement of dams and NSS data on poverty

Dean Spears’ work on TSC uses NFHS and other survey data on health in conjunction with administrative data

Alternative methods of estimating average treatment effects
Weaker than RCTs in some respects
  Causality, selection, bias are not automatic and must be argued
  More assumptions
Stronger in other respects
  Access distribution of treatment effects, not just the average
  Usually much larger samples
  Triangulation helps to pin down mechanisms at work
  RCTs good at saying what happened, not good at saying why
    Ex post fairy stories (just-so stories) without evidence


Small RCTs

Are often not large enough to be reliable
  Expensive to do, so this is not a matter that is easily fixed
  In a small trial, a few outliers can wreak havoc
  Example might be microfinance, where one or two women might be able to do really well, and the rest not at all
  Get lots of weird and counterintuitive results
  No idea if they are real, or method is just broken
  Doubt one can learn anything from a trial of 10 experimental villages and 10 control villages in CCT experiment
Experiment is often conducted on a convenience sample
  Not easy to get cooperation from all relevant units: e.g. in looking at CCT, those opposed to the idea might be less willing to cooperate
  Results are correct only for the convenience population
  Not for population that would be affected by the policy
Gold standard rhetoric protects results from questioning
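
The outlier problem in small trials can be illustrated with a simulation. The sketch below rests on invented assumptions: 2% of treated units are "big winners" who gain 100 while everyone else gains nothing, and outcomes otherwise have unit-variance noise. None of these numbers comes from a microfinance study, and the function names are hypothetical. It repeats the same trial many times and measures how much the estimated effect moves from one trial to the next.

```python
import random
from statistics import mean, pstdev

def draw_gain(rng):
    # Hypothetical skewed effect: a rare big winner, zero for everyone else.
    return 100.0 if rng.random() < 0.02 else 0.0

def one_trial(n_per_arm, rng):
    """Run one simulated trial and return the difference in means."""
    treated = [rng.gauss(0.0, 1.0) + draw_gain(rng) for _ in range(n_per_arm)]
    controls = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
    return mean(treated) - mean(controls)

def spread_of_estimates(n_per_arm, n_trials=200, seed=0):
    """Standard deviation of the estimate across repeated trials."""
    rng = random.Random(seed)
    estimates = [one_trial(n_per_arm, rng) for _ in range(n_trials)]
    return pstdev(estimates)

small_spread = spread_of_estimates(10)     # 10 units per arm
large_spread = spread_of_estimates(1000)   # 1000 units per arm
print(f"spread with 10 per arm: {small_spread:.2f}")
print(f"spread with 1000 per arm: {large_spread:.2f}")
```

Under these assumptions the true average effect is 2, yet a 10-per-arm trial usually contains no winner at all and occasionally contains one, so its estimate jumps between roughly 0 and roughly 10.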


Large scale RCTs

Use all of the units in a country
  PDS/CCT experiment for all of rural India
Comparable to large social experiments in the US in the 70s
  NJ income tax experiment, SIME/DIME
  RAND Health experiment
  RAND experiment is an important part of the debate today, others not
Ex post data mining
  Null result is never acceptable to the sponsors
  Enormous pressure on investigators to find something
  Usually by subgroup analysis, or looking for other outcomes
  MTO has now examined thousands of outcomes
  Some of the statistically significant ones are spurious
  And we are back to the small sample problem again
Large experiments not decisive either
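
The arithmetic behind spurious significance can be sketched as follows. The sketch assumes independent tests, a 5% significance level, and no true effect on any outcome; the outcome counts are illustrative and are not the actual number examined in MTO or any other experiment. The function names are hypothetical.

```python
def expected_spurious(n_outcomes, alpha=0.05):
    """Expected number of 'significant' results when no effect is real."""
    return n_outcomes * alpha

def prob_at_least_one_spurious(n_outcomes, alpha=0.05):
    """Chance that at least one test is 'significant' by luck alone,
    assuming the tests are independent."""
    return 1.0 - (1.0 - alpha) ** n_outcomes

for n in (1, 20, 1000):
    print(n, expected_spurious(n), round(prob_at_least_one_spurious(n), 4))
```

With 20 outcomes the chance of at least one spurious finding is already about 64%; with 1,000 outcomes about 50 would be expected by luck alone under these assumptions.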


Dynamic effects

Many policies take time to work out
  Lots of things work as intended in the short run, fail later
  People learn to “work the system”
Food rationing in Britain during the war:
  Excellent at first, big nutritional benefits, solidarity
  Crooks (“spivs”) learned to exploit it and create a black market
  Support eventually vanished, when it was continued too long
Old age pensions in South Africa: cash transfer
  Burial insurers were allowed on site to get first access to recipients
  Higher level corruption: banks?
Procurement and supply effects in food policy
What would an RCT show?
  It works!
  Expensive and unethical to continue the experiment
  We get the wrong answer, or only part of the answer
  Issue in medicine too


Using a perfect evaluation

Suppose we have a result, e.g.
  On average, CCTs make people happier than PDS
  On average, dams increase poverty
  On average, reducing open defecation improves child health and reduces mortality
Suppose also that these were all done perfectly, so there is no dispute about the conclusions
  Which, of course, never happens!
What use can we make of those results in policy?
  Should the Planning Commission ban new dams?
  Should MRD encourage better sanitation?
  Should we replace PDS by CCTs?
That dams don’t work on average tells us little about any individual dam
  It is an individual dam that comes up for approval, not all dams!
  We need to know more: why dams cause poverty, under what circumstances, none of which comes from an RCT


What should a village do?

Or any local authority that decides
Given an RCT about CCT v PDS
  Again, the average is useful but not decisive
  Will it have the same effect for us?
  We are not the average village
  Again, we need to know why it works, not whether it works
Neighboring village tried and is happy with the outcome
  Perhaps this is just an anecdote (“your uncle likes his new TV”)
  But for the village, the average outcome is an anecdote too
  Perhaps the authorities should visit their neighbors and see what is going on, see if it would work for them
Average is more useful for a public health policy that will be applied to the whole country
  Sanitation?


Finding out what works?

A trial and error process
  But T & E is NOT the same as an RCT
T & E, endless tinkering, is a good description of the Industrial Revolution
  How to invent a steam engine, or a toaster
How medical science works, on procedures and devices
  For which trials are close to irrelevant, and in many cases have never been done
T & E using knowledge and intelligence can solve the dimensionality problem

Seeing into the machine

Allows a village, the ministry, or the Planning Commission to make a better choice
  It may be able to see whether it would work for them
  It may be able to see places where they could adapt it and make it better
  Hope to understand the process & how it would work in context
Trial and error, plus local knowledge, hard thought
  Experimentation but not necessarily RCTs
What are the “helping factors” that made a trial work?
  E.g. clinics in Mexico!
Can teach us why things work, which is generalizable knowledge


Causality & helping factors

Do not RCTs reveal causality?
  It was the treatment that did it! Not something else
  Is this not particularly helpful in policy?
Yes and no. Causality, by itself, is not always useful
  The house burned down because the TV was left on
  Causal, but not general: TVs do not usually burn down houses
  RCT would show this causal effect
  But TVs need “helping factors” like bad wiring, or inflammable material left nearby
We have to think about what are the helping factors, how they work, and whether they will work for us
  Will a CCT work in a particular village, or during food price inflation, or in a competent v a corrupt state?
  Does it need banks, or clinics to make it work?
  Does it matter who gets it? Men and women: gender issues in India v Latin America
Replication of an RCT is not useful, because we get different results in different contexts with or without helping factors
  Causality is “local”
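
A toy calculation shows why an average effect need not carry over to a new context. The sketch assumes, purely for illustration, that a CCT raises an outcome by 1.0 in villages that have a clinic (the helping factor) and by 0.0 in villages that do not; the clinic shares and the function name are hypothetical and do not describe Mexico, India, or any real trial.

```python
def average_effect(share_with_clinic,
                   effect_with_clinic=1.0,
                   effect_without_clinic=0.0):
    """Average treatment effect in a place where only some villages
    have the helping factor (here, a clinic)."""
    return (share_with_clinic * effect_with_clinic
            + (1.0 - share_with_clinic) * effect_without_clinic)

trial_site_effect = average_effect(0.9)   # hypothetical: 90% of villages have clinics
new_site_effect = average_effect(0.2)     # hypothetical: 20% of villages have clinics
print(f"trial site: {trial_site_effect:.2f}, new site: {new_site_effect:.2f}")
```

The same causal mechanism yields an average effect of 0.9 in one setting and 0.2 in the other, so a repeat trial in the second setting would appear to contradict the first even though nothing about the mechanism has changed.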


Cartwright: Local causality

Open window A, and fly kite B. String C opens door D, which allows moths E to escape and eat shirt F. Lighter shirt lowers shoe G on to switch H which heats iron I which burns pants J. Smoke K enters tree L and smokes out possum M into basket N, pulling rope O, and lifting cage P, allowing woodpecker Q to chew pencil R. (Emergency knife S in case woodpecker or possum gets sick and can’t work.)

Expanding literature

We now have enough RCT papers to judge their quality and the evidence that they claim
  Some excellent, some terrible
  Just like other empirical papers in development
But they must be judged case by case, like all other empirical work
  There is no free pass, just because they are RCTs
  Using the phrase “rigorous evaluation” as a code word for RCT is without justification
Right now, in economics, and aid literature, they are being given a free pass
  Sometimes absurd generalizations based on small special RCTs
RCTs have no monopoly on rigour, there is no gold standard

