Visualizing the Quality of Impact Evaluation Data
Josef L. Loening, World Bank
The views expressed in this presentation are entirely those of the author; they do not necessarily represent the views of the World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.
OECD Seminar on Innovative Approaches to Turn Statistics into Knowledge, 8-10 December 2010, Cape Town
Outline
• Impact evaluation and community-driven development (CDD) projects
• Detecting unusual data with Benford’s law using Stata and Wordle
• Project example from Africa
– Household production of eggs
– Household production of maize
– Beneficiary financial contribution
• Conclusions
Impact Evaluation and CDD Projects
• “Seeking the truth from facts”
• Many practical and logistical difficulties in monitoring and evaluation
• Analytical and data demands for valid inferences can be quite daunting
• Distortions in the market for knowledge about development effectiveness
• CDD projects: according to the Bank’s independent evaluations, monitoring and evaluation often lacks rigor
• But: rising interest and support for evaluations
We look at three issues
1. Data quality underpinning monitoring and evaluation: rarely analyzed
2. Asymmetric information: development practitioners cannot easily assess the quality of information
3. Project’s own monitoring databases: important, but typically underutilized
Distribution Anomalies and Benford’s Law
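$P(D_1 = d) = \log_{10}\left(1 + \frac{1}{d}\right), \quad d = 1, 2, \ldots, 9$
where $D_1$ is the first significant digit of a number: about 30% of numbers begin with 1, and only about 5% begin with 9.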
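The formula can be tabulated directly. A minimal Python sketch that reproduces the expected first-digit shares; the function name is illustrative, not something used in the presentation:

```python
import math


def benford_first_digit_probs():
    """Expected share of each leading digit d = 1..9 under Benford's law."""
    # P(D1 = d) = log10(1 + 1/d); the nine terms telescope to log10(10) = 1.
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


if __name__ == "__main__":
    for digit, share in benford_first_digit_probs().items():
        print(f"{digit}: {share:.1%}")
```

The printed shares fall from about 30.1% for digit 1 to about 4.6% for digit 9, matching the rule of thumb above.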
Short Review
• Observation that the first pages of logarithmic tables were more worn out than the last pages
• Benford (1938) rediscovered the first digit phenomenon, using 20 different data sets
• Mathematical proof of the “random samples from random distributions theorem”, and the finding that Benford’s law is base- and scale-invariant
• The literature formulates rules on which data can be expected to follow Benford’s law, and under which conditions, along with statistical tests of conformity
• Nigrini (1996) was very influential in establishing Benford’s law as an indicator of fraud in finance and taxation
• Judge and Schechter (2009) find that Benford’s distribution can be used to detect unusual household survey data in both developed and developing countries
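To make the idea of a conformity test concrete, here is a minimal Python sketch of two checks that are common in this literature: Pearson's chi-square goodness-of-fit statistic and the mean absolute deviation (MAD) between observed and expected first-digit shares. The presentation does not say which test, if any, was applied, so both the choice of tests and the helper names are assumptions; negative values are classified by their absolute value.

```python
import math
from collections import Counter

# Benford's expected first-digit shares, P(D1 = d) = log10(1 + 1/d).
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(x):
    """Leading nonzero digit of |x|, or None for zero and non-finite values."""
    if x == 0 or not math.isfinite(x):
        return None
    # Scientific notation puts the leading nonzero digit first: 0.042 -> '4.2...e-02'.
    return int(f"{abs(x):.14e}"[0])


def benford_tests(values):
    """Return (n, Pearson chi-square, mean absolute deviation) for first digits."""
    digits = [d for d in map(first_digit, values) if d is not None]
    n = len(digits)
    if n == 0:
        raise ValueError("no nonzero finite values to test")
    counts = Counter(digits)
    chi2 = sum((counts[d] - n * p) ** 2 / (n * p) for d, p in BENFORD.items())
    mad = sum(abs(counts[d] / n - p) for d, p in BENFORD.items()) / len(BENFORD)
    return n, chi2, mad


if __name__ == "__main__":
    # Hypothetical check: log-uniform values follow Benford closely; a heaped series does not.
    print(benford_tests([10 ** (k / 1000) for k in range(1000)]))
    print(benford_tests([5] * 100))
```

With nine digit categories the chi-square statistic has 8 degrees of freedom, so values above about 15.51 reject conformity at the 5% level. Because the statistic grows with the number of observations, very large files can fail on substantively small deviations, which is one reason a scale-free measure such as the MAD is often reported alongside it.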
Project Example
• Objective is to raise the production of food, incomes, and assets of participating households
• Implementation of small sub-projects, planned and managed by communities themselves
• Very large number of sub-projects, covering the entire range of rural productive activities
• Unique database, which captures information for about 49.9 percent of sub-projects
“We had a very long but productive meeting, chaired by the Permanent Secretary. I mentioned about the cleaning of the monitoring and evaluation database, which generated a long discussion. In the end, it was agreed that the Project Coordination Unit completes the gap-filling data-entry exercise.”
“I came to realize that it required a lot of consultation with district officers as well as referring to district reports. In cases where there were different crop measurements, I had to agree with district officers on the equivalent weights to get the correct figures.”
“When data clerks were entering raw data they had to choose from a list of pre-coded information like ‘1’ or ‘2’ and when done in a hurry, they made some mistakes. This was more common in livestock, which is why in enterprises like ‘cattle’ products appeared to be ‘eggs’.”
Digits for Egg Production
[Figure: first-digit distribution (percent) for digits 1 to 9; panels: egg production at baseline, egg production after CDD project]
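Each panel in this kind of figure is the percentage of observations with each leading digit, set against Benford's expected share. A minimal sketch that builds the same comparison as a text table, assuming each panel's data arrive as a plain list of numbers (non-positive values are skipped); the function names and the toy values are hypothetical, not the project's egg data:

```python
import math
from collections import Counter


def first_digit_shares(values):
    """Percent of positive values whose leading digit is 1..9."""
    # Scientific notation puts the leading nonzero digit first, e.g. 240 -> '2.4...e+02'.
    digits = [int(f"{v:.14e}"[0]) for v in values if v > 0]
    if not digits:
        raise ValueError("no positive values to classify")
    counts = Counter(digits)
    return {d: 100 * counts[d] / len(digits) for d in range(1, 10)}


def digit_table(panels):
    """Print observed first-digit percentages for each panel beside Benford's law."""
    shares = {name: first_digit_shares(vals) for name, vals in panels.items()}
    print("digit  benford  " + "  ".join(f"{name:>10}" for name in panels))
    for d in range(1, 10):
        expected = 100 * math.log10(1 + 1 / d)
        observed = "  ".join(f"{shares[name][d]:10.1f}" for name in panels)
        print(f"{d:>5}  {expected:7.1f}  {observed}")
    return shares


if __name__ == "__main__":
    # Hypothetical toy values for illustration only.
    digit_table({
        "baseline": [12, 150, 30, 18, 240, 95, 11, 60],
        "after": [20, 20, 30, 40, 25, 100, 200, 30],
    })
```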
Number Clouds for Egg Production (per laying cycle)
[Figure: number clouds; panels: at baseline, after CDD project]
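A number cloud sizes each reported value by how often it occurs, so heaping on round figures stands out visually. The presentation used Wordle for the graphics; the Python sketch below is an assumed stand-in rather than the author's workflow. It only computes the underlying frequencies, a space-separated text version of the data, and a simple round-number share. All function names are hypothetical.

```python
from collections import Counter


def number_cloud_weights(values, top=20):
    """Most frequent reported values with their counts (the sizes in a number cloud)."""
    return Counter(values).most_common(top)


def cloud_text(values):
    """Space-separated text in which each value appears once per observation,
    suitable for a word-cloud tool that sizes tokens by frequency."""
    return " ".join(str(v) for v in values)


def round_number_share(values, base=10):
    """Share of nonzero values that are exact multiples of `base` (a simple heaping check)."""
    nonzero = [v for v in values if v != 0]
    if not nonzero:
        return 0.0
    return sum(1 for v in nonzero if v % base == 0) / len(nonzero)


if __name__ == "__main__":
    eggs = [30, 30, 30, 12, 50, 50, 7]  # hypothetical reports, not project data
    print(number_cloud_weights(eggs, top=2))  # [(30, 3), (50, 2)]
    print(round_number_share(eggs))
```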
Revised Digits for Egg Production
[Figure: first-digit distribution (percent) for digits 1 to 9 after revision; panels: egg production at baseline, egg production after CDD project]
Number Clouds for Household Maize Production (in kg)
[Figure: number clouds; panels: at baseline, after CDD project]
Digits for Maize Production
[Figure: first-digit distribution (percent) for digits 1 to 9; panels: maize production at baseline, maize production after CDD project]
Number Clouds for Beneficiary’s Own Financial Contribution
[Figure: number clouds; panels: original, revised]
Digits of Beneficiary’s Own Financial Contribution
[Figure: digit distributions (percent); panels: first digits (1 to 9), second digits (0 to 9)]
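A second-digit panel needs its own benchmark. Under Benford's law the second significant digit $D_2$ takes values 0 to 9 with $P(D_2 = d) = \sum_{k=1}^{9} \log_{10}\left(1 + \frac{1}{10k + d}\right)$, a much flatter distribution running from about 12.0% for 0 down to about 8.5% for 9. A minimal Python sketch of that benchmark and of extracting second digits; the helper names are hypothetical, and values below 10 are dropped on the assumption that the data are whole currency amounts, which have no second digit below 10.

```python
import math
from collections import Counter


def benford_second_digit_probs():
    """Expected share of each second significant digit d = 0..9 under Benford's law."""
    # Sum over every possible first digit k of P(first two digits = 10k + d).
    return {
        d: sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))
        for d in range(10)
    }


def second_digit(x):
    """Second significant digit of |x|, e.g. 4500 -> 5."""
    # In scientific notation the second significant digit follows the decimal point.
    return int(f"{abs(x):.14e}"[2])


def second_digit_shares(values, min_value=10):
    """Percent of values (>= min_value) by second digit, for comparison with the benchmark."""
    digits = [second_digit(v) for v in values if v >= min_value]
    if not digits:
        raise ValueError("no values at or above min_value")
    counts = Counter(digits)
    return {d: 100 * counts[d] / len(digits) for d in range(10)}
```

Amounts rounded to round figures push the observed share of 0 (and often of 5) well above the benchmark values of roughly 12.0% and 9.7%.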
Quantiles of CDD Project Financial Contribution to Beneficiaries
Note that thresholds and assigned numbers make Benford’s law not applicable!
[Figure: quantile plot of CDD financial contribution per household (local currency, axis 0 to 30,000) against fraction of the data (0 to 1)]
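A quantile plot sorts the observations and plots each value against its position in the data, so administratively set amounts show up as flat steps and a funding cap as a ceiling: the thresholds and assigned numbers that rule out Benford's law here. A minimal Python sketch, assuming the common plotting position (i - 0.5)/n for the i-th smallest of n values; the function names and the 5% cutoff for a "flat step" are hypothetical choices, not taken from the presentation.

```python
from collections import Counter


def quantile_points(values):
    """(fraction of data, value) pairs: the i-th smallest of n values sits at (i - 0.5) / n."""
    ordered = sorted(values)
    n = len(ordered)
    return [((i - 0.5) / n, v) for i, v in enumerate(ordered, start=1)]


def flat_steps(values, min_share=0.05):
    """Values shared by at least `min_share` of observations; they appear as flat steps.

    Heavy concentration on a few assigned amounts or a cap signals censored or
    administratively set data, for which a first-digit test is not meaningful.
    """
    n = len(values)
    if n == 0:
        return []
    counts = Counter(values)
    return sorted((v, c / n) for v, c in counts.items() if c / n >= min_share)
```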
Conclusions
• Simple, objective, and effective tool to screen the quality of monitoring and evaluation survey data:
– Livestock production is usable after outlier detection
– Crop production is not reliable due to poor measurement and/or other problems
– In our case, method not applicable to project’s financial data because of heavy censoring
• Graphical distribution analysis of continuously measured project performance indicators can enhance quality of results-based M&E systems
• Wide range of other applications with significant forward-looking potential to enhance quality of statistics and contribute to informed decision-making