
Which future for impact evaluation in the Belgian NGO sector?

Lessons from four case studies

Synthesis


KINGDOM OF BELGIUM

Federal Public Service

Foreign Affairs, Foreign Trade and Development Cooperation

Office of the Special Evaluator for Belgian Development Cooperation



Editor: Dirk Achten, President of the Executive

Graphic Production: www.mediaprocess.be

Egmont • Rue des Petits Carmes 15, B-1000 Brussels • + 32 2 (0)2 501 38 34 • www.diplomatie.belgium.be • www.dg-d.be • [email protected]

Legal deposit No. 0218/2016/016


Photo credits: Peru, Project IDP © Sven Krug; Tanzania, Project TRIAS © ADE; Indonesia © Jacqueline Liénard; Philippines © ADE

Federal Public Service Foreign Affairs, Foreign Trade and Development Cooperation

Special Evaluation Office of the Belgian Development Cooperation

What does the future hold for impact evaluation in the Belgian NGO sector?

The lessons from four case studies

Synthesis Report

Volume I

January 2016

This evaluation has been produced by ADE (www.ade.eu) in collaboration with Focus Up, supported by a steering committee.

The opinions expressed in this document represent the authors' views and do not necessarily reflect those of the FPS Foreign Affairs, Foreign Trade and Development Cooperation.

© FPS Foreign Affairs, Foreign Trade and Development Cooperation.

January 2016

Graphic design: www.mediaprocess.be

Printing: Printing Service FPS

Evaluation No. S4/2014/02

Legal deposit: 0218/2016/015

This document is available in French, Dutch and English on the website: http://diplomatie.belgium.be/en/policy/development_cooperation/how_we_work/special_evaluation_office/reports and www.dgcd.be, or from the Special Evaluation Office. Annexes and management answers are available on the aforementioned websites as well.

This report should be cited as follows: Special Evaluation Office/SEO (2016), What does the future hold for impact evaluation in the Belgian NGO sector? The lessons from four case studies. FPS Foreign Affairs, Foreign Trade and Development Cooperation, Brussels.



Contents

List of Boxes and Figures

List of Appendices

Abbreviations

Executive summary

1. Introduction

2. Elements of the legal context of the NGO sector in Belgium

2.1 The current legal framework and the partnership approach

2.2 "NGO programme" accreditation and the integrated policy

2.3 The new legal framework and monitoring and evaluation

3. Clarification of the concepts used to evaluate the 4 interventions

3.1 Defining impact in the 4 case studies

3.2 Type of methodological approach implemented in the case studies

4. The selection procedure for the 4 interventions

4.1 The methodological approach for selection

4.2 Selection criteria

4.2.1 Criteria for the use of rigorous mixed methods

4.2.2 Criteria related to the context

4.2.3 Criteria related to added value

4.3 Results of the selection process

4.4 Lessons to be learned from the selection process in terms of ex post impact evaluability

5. Responses to the summative evaluation questions for the 4 selected interventions

5.1 Access to drinking water in Peru (Iles de Paix)

5.1.1 Brief presentation of the intervention

5.1.2 Methodological approach

5.1.3 Responses to the 7 summative evaluation questions

5.2 Supporting the rice value chain in Indonesia (VECO)

5.2.3 Responses to the summative evaluation questions

5.3 Supporting the chicken and sunflower value chains in Tanzania (TRIAS)

5.4 The right to health in the Philippines (TWHA)

6. Responses to the formative evaluation questions

6.1 EQ1 - Definitions of impact

6.2 EQ2 - Rigorous mixed methods

7. Conclusions and lessons learned

7.1 General conclusions

7.2 Conclusions on the summative evaluation questions

7.2.1 Relevance

7.2.2 Impact evaluability

7.2.3 Achievement of outcomes

7.2.4 Achievement of the impact

7.2.5 Sustainability

7.2.6 Gender and environment

7.2.7 The added value of a rigorous mixed approach

7.3 Conclusions on the formative evaluation questions

7.3.1 The definition of impact

7.3.2 The use of rigorous mixed methods: feasible, relevant and affordable

8. Recommendations

8.1 Recommendation to the Federal Parliament and the Minister of Development Cooperation

8.2 Recommendations for all stakeholders

8.3 Recommendations for the DGD

8.4 Recommendations for the NGO

8.5 Recommendations for the SEO

9. Bibliography



List of Boxes and Figures

List of Boxes

Box 1: Focus on the impact evaluation's objective and the methodological approach

Box 2: Non-representative sample of the interventions of Belgian NGOs

Box 3: Summary of the responses to the first formative EQ - The debate around the definitions of impact

Box 4: The learning process central to the evaluation

Box 5: Summary of the response to the second formative EQ - Rigorous mixed methods

Box 6: Impact evaluation and ex post evaluation

Box 7: The complex case of 'policy influence' type interventions

Box 8: The development in the overall development assistance context

Box 9: The impact evaluation and the programme approach

Box 10: What is meant by "rigorous mixed methods"?

List of Figures

Figure 1: Definitions of the logical framework concepts used for the 4 case studies

Figure 2: Causal sequence of the effects at the level of outcomes and impact

Figure 3: Impact evaluation by mixed methods for this study

Figure 4: The selection procedure for the 4 interventions

Figure 5: Diagram facilitating the selection of the interventions



List of Appendices

VOLUME II: APPENDICES TO THE MAIN REPORT

Appendix A: Timetable

Appendix B: Comparison of the old and new legal framework

Appendix C: List of the 22 interventions initially proposed

Appendix D: Tools for selecting the 4 interventions

Appendix E: Glossary

Appendix F: Special specifications

VOLUME III: EVALUATION REPORTS FOR THE 4 SELECTED INTERVENTIONS

Intervention 1: Rice project in Boyolali district in Indonesia (VECO)

Intervention 2: Value chain of chicken and sunflower project in two districts in Northern Tanzania (TRIAS)

Intervention 3: Access to drinking water in two municipalities of Huánuco in Peru (Iles de Paix)

Intervention 4: Right to health intervention in the Philippines (TWHA)



Abbreviations

CCA Common Contextual Analysis

NGCS Non-Governmental Cooperation Stakeholders

MD Ministerial Decree

RD Royal Decree

SC Steering Committee

DAC Development Assistance Committee (of OECD)

JSF Joint Strategic Framework

DGD Directorate General for Development Cooperation and Humanitarian Aid

IE Impact Evaluation

ICS Internal Control System

IdP Iles de Paix

JASS Juntas Administradoras de Sistemas de Saneamiento or "Administration Committees for Sanitation Systems"

LSMS Living Standards Measurement Study

MBO Member Based Organisation

OECD Organisation for Economic Co-operation and Development

NGO Non-Governmental Organisation

ME Monitoring-Evaluation

FPS Federal Public Service

SEO Special Evaluation Office of the Belgian Development Cooperation (Federal Public Service Foreign Affairs, Foreign Trade and Development Cooperation)

SMART Specific, Measurable, Achievable, Realistic (relevant) and Time-based Indicators

SNI Indonesian National Standard on Organic Rice

ToC Theory of change

TWHA Third World Health Aid

VECO Vredeseilanden



Executive summary

This report summarises the ex post impact evaluation of four Belgian non-governmental cooperation interventions. The interventions selected for this study were (i) support to the organic rice value chain in Boyolali district in Indonesia (VECO), (ii) access to drinking water in the Huánuco region in Peru (Iles de Paix), (iii) support to the sunflower and chicken production chains in Tanzania (TRIAS) and (iv) the right to health in the Philippines (TWHA).

In this evaluation, "impact" refers to the effects at a general level to which the project contributed. The term "outcome" refers to the direct and indirect effects that can be attributed to the intervention for the different types of beneficiaries: direct beneficiaries (partners), intermediate beneficiaries and final beneficiaries.

The sample of NGOs participating in this study and that of the 4 selected interventions are not representative of all the development actions conducted in the NGO sector in Belgium, or of the evaluation practices of such stakeholders. There are strong indications that the participating NGOs considered the 22 voluntarily proposed interventions to be relatively successful. The 4 case studies were selected based on well-defined criteria so as to ensure the impact evaluation's feasibility within the time and budget constraints, while maximising the learning process. However, since the 12 NGOs were involved in the evaluation process, along with the NGO federations (Ngo-federatie and ACODEV) and DGD representatives, the considerations in this study are, at least partially, relevant to all development assistance stakeholders in Belgium.

A changing legal context

Non-governmental development cooperation in Belgium is undergoing substantial reform. Current legislation stipulates that the capacity building of local stakeholders (partnership approach) must be an objective of the interventions. The programme approach is preferred, along with the creation of a joint strategic framework in order to promote consistency in Belgian development cooperation interventions in the field. Furthermore, results-based management is encouraged in the "Development Results" Strategy Paper (DGD-2014) and the Special Evaluation Office is working to standardise and certify the evaluation systems.

Methodological approach

For each of the four selected interventions, the evaluators developed a mixed methodological approach for a rigorous evaluation of the development action's effects on the beneficiaries, especially the final beneficiaries (outcomes), and at a general level (impact), specifically the effects at sector level and the geographical extent of the effects.

The four case studies were selected using three criteria: (i) the technical feasibility of using mixed methods; (ii) the context (during the intervention and the evaluation); (iii) the added value of the evaluation in terms of the cost/benefit ratio and learning for the NGO sector in Belgium. There were several steps in the selection process to determine whether or not these criteria were met by the 22 interventions put forward for this study by 12 NGOs: an analysis table of the projects based on documentary evidence, an interview with representatives from each NGO and a participatory workshop.



This participatory selection process identified three main obstacles to conducting an ex post impact evaluation using rigorous mixed methods: (i) the difficulty in accessing the final beneficiaries (absence of an exhaustive list of beneficiaries); (ii) the lack of resources (costs for collecting primary data from populations because NGOs often only have limited and unreliable data at their disposal); (iii) the difficult security situation (many NGOs work in fragile states).

Rigorous mixed methods

The evaluation favoured the use of mixed methods, i.e. a combination of quantitative and qualitative methods, structured around the theory of change.

The methodological approaches proposed for each of the selected interventions were not the same, but varied depending on the type of project and the data already available within the NGO. They were developed taking into account the time and financial constraints of this study. Although they were not equivalent in terms of scientific rigour, the proposed methodological approaches followed four rigorous practices:

(i) Relevant information collection tools, previously tested in the field and implemented by a competent local team, trained in the content of the tools and the use of tablets (to ensure relevant, reliable data collection and avoid reporting biases, i.e. the exaggeration of reality by respondents);

(ii) A random selection of areas to be visited and respondents (in order to prevent selection biases);

(iii) A counterfactual approach, to demonstrate the attribution of the observed changes to the intervention;

(iv) Reconstruction of the pre-intervention (baseline) situation based on questions drawing on people's memories (recall data) and based on secondary data.

Furthermore, a rigorous method of processing the information collected was respected. Appropriate econometric and statistical tools were used. The limits of the methodological approach and potential biases were clearly expressed and taken into account in the study's findings.
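As an illustration only, two of the practices described above (random selection of the areas to be visited and a counterfactual comparison against a recalled baseline) can be sketched in a few lines of Python. The village names, the figures and the choice of a difference-in-differences estimator are assumptions made for this example; the report does not state which estimator was applied in each case study.

```python
import random
from statistics import mean


def draw_sample(units, k, seed=0):
    """Draw a simple random sample of k units without replacement.

    A fixed seed makes the draw reproducible and auditable.
    """
    rng = random.Random(seed)
    return rng.sample(units, k)


def diff_in_diff(treated, control):
    """Difference-in-differences on (baseline, endline) pairs.

    The change observed in the control group stands in for what would
    have happened to the treated group without the intervention.
    """
    change_treated = mean([end - base for base, end in treated])
    change_control = mean([end - base for base, end in control])
    return change_treated - change_control


# Hypothetical sampling frame: 20 villages, 5 of which are visited.
villages = [f"village_{i:02d}" for i in range(1, 21)]
visited = draw_sample(villages, 5, seed=42)

# Hypothetical household outcomes as (recalled baseline, endline) pairs.
treated = [(10.0, 16.0), (12.0, 17.0), (8.0, 15.0)]
control = [(11.0, 13.0), (9.0, 12.0), (10.0, 11.0)]

effect = diff_in_diff(treated, control)
print(visited)
print(effect)  # 4.0
```

In this invented example the treated households improve by 6 units on average and the control households by 2, so 4 units are attributed to the intervention. The estimate is only as reliable as the recalled baseline and the comparability of the two groups, which is why the limits and potential biases need to be stated alongside the findings.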

The quantitative approach used varied depending on the interventions. In two of the four case studies, primary data was collected using a household survey (in Indonesia and Peru). One of the case studies (in Tanzania) used secondary data available from the NGO and the World Bank. For the last case study (in the Philippines), there was no quantitative analysis due to a lack of available, reliable data.

The qualitative approach involved reconstructing, ex post, the theory of change for the selected interventions. Semi-structured interviews and focus group discussions took place with key stakeholders in Belgium and in the field. The counterfactual approach was also implemented in the qualitative approach and interviews and discussion groups were organised within a control population.

The use of rigorous mixed methods therefore aimed to ensure that the information collected was as objective as possible in order to draw credible conclusions about the effects that could be attributed to an intervention, to measure these effects where the data allowed, while explaining the underlying mechanisms behind the changes observed (or not).



Main findings of the evaluation

For each case study, the evaluation team answered seven summative evaluation questions: the intervention's relevance, its evaluability, achievement of the outcomes, achievement of the impact, the sustainability of the effects, the effects on cross-cutting themes (gender and environment) and finally, the added value of this type of rigorous methodological approach in terms of findings and methodology for the selected NGO. The findings from these summative questions are summarised in this report and explained in detail in the evaluation report for each intervention (available in the appendices).

Two formative evaluation questions were also used in this study: the first focused on the definition of the term 'impact' and the second on the feasibility, relevance and cost of applying rigorous mixed methods.

In general, this study shows that it is both feasible and necessary to account for the results of development assistance for target populations. Rigorous impact evaluation is a relevant tool for accounting for aid effectiveness. In addition, it is the level of rigour with which the methodology is implemented, rather than the type of methodological approach (qualitative or quantitative), that guarantees the credibility of an evaluation's findings.

The selection process

In absolute terms, the rigorous methodological approach was feasible for all 22 interventions presented by the NGOs. However, once the cost/benefit ratio for deploying such a system (ex post) and the security situation in the intervention area were taken into account, just 5 interventions were eligible for conducting a rigorous ex post impact evaluation. Given that the study planned to conduct 4 case studies and that two interventions were from the same NGO, it was easy to select 4 interventions that satisfied all 3 selection criteria (explained below).

Relevance

The 4 selected interventions satisfied the priority needs of the beneficiary populations. The choice of partners and the intervention procedures proved to be relevant for two interventions, but for the other two the conclusion was more complex (questions remain as to the viability of the local structures without the NGO's financial support).

Impact evaluability

The monitoring and evaluation systems of the 12 participating NGOs are relatively sophisticated for assessing outputs and outcomes for the direct beneficiaries, i.e. the partners. However, they are weak when it comes to assessing the results for the final beneficiaries. Incentives to assess the effect of the interventions on the target populations are almost non-existent (from donors). The NGOs feel more accountable for the effects on their partners' capacity building than the effects on the beneficiary populations.

Qualitative and quantitative evaluation tools are used by some of the NGOs, but a lack of rigour is often observed in their application.

Achievement of outcomes

Overall, in the 4 case studies, the partners assessed the support received positively. The partnership approach raises several challenges in the field: the type of partnership that is possible depends on the intervention area, and questions arise about the duration of the collaboration and the financial relationship between the NGO and the partners. Two types of partnership are identified as interesting (and effective): public partnerships and partnerships with civil society groups.



Intermediate beneficiaries play an important role in achieving the effects on the populations. However, a positive assessment of the effects at partner level does not necessarily guarantee that the expected effects on the final beneficiaries will be achieved.

For two of the four interventions, the effects on the final beneficiaries were identified, measured and attributed, at least in part, to the intervention. In the two remaining cases, the results are less tangible.

Achievement of impact

The proportion of individuals affected by the intervention in the intervention area could only be calculated in a single case (due to the lack of available data). The spillover effect was clearly identified for two of the four interventions. The effect on local policies is weak but present in three of the four interventions.

Sustainability

The sustainability of the effects on the final beneficiaries often depends on the sustainability of the effects on the intermediate beneficiaries (for example, the operation of producers' groups or water management committees). The sustainability of the effects of field actions seems more likely when the partners are civil society organisations (active in the field concerned) or public institutions. Such organisations are present in the intervention area before and after the project, making the spillover effect more probable. In addition, they are less financially dependent.

Gender and environment

Three of the four evaluated projects clearly target women as the beneficiaries. None of the interventions explicitly refer to environmental protection.

The added value of a rigorous mixed approach

The NGOs recognise a certain added value in implementing a rigorous mixed approach to assess the effects of development actions.

Definition of the term 'impact'

There is confusion regarding the definition of the term 'impact' in the NGO sector. A shared vision defining the different levels of results of a development intervention (direct, intermediate, final beneficiaries) seems essential for determining the required level in terms of results-based management (in order to obtain them) and consequently in terms of accountability and aid effectiveness.

The use of rigorous mixed methods

The use of such methods is feasible, even ex post. However, the development of an ex ante rigorous mixed approach would significantly reduce any implementation difficulties.

The NGOs recognise the contribution that the rigorous mixed approach makes, both to the findings about the results and in methodological terms. This approach therefore seems relevant, though its relevance for 'policy influence' type interventions can be questioned.

The rigorous impact evaluation is affordable in many cases. In addition, the benefits created by using a relevant and rigorous monitoring and evaluation system from the start of an intervention may be greater than the additional costs incurred by more rigorous practices, which makes the approach profitable. Furthermore, this approach will be more profitable if there are incentives for rigorously accounting for the effects on the final populations in the Belgian development cooperation sector.



Recommendations

Generally, greater discussion is needed between development cooperation stakeholders, in order to develop a common understanding of the issues, challenges, constraints and opportunities of evaluation, particularly impact evaluation.

The use of communication methods other than paper, such as video, is recommended for sharing a study's results with the key stakeholders (including the beneficiaries).1

Recommendation to the Federal Parliament and the Minister of Development Cooperation

It is recommended that politicians generally endorse the rigorous assessment of the changes to the target populations generated by the interventions. In order to do so, they must allocate resources and expertise and provide the necessary incentives, not only to the State structures (SEO, DGD) but also the stakeholders in the field (like the NGOs) and those involved in the evaluation.

Recommendations for the DGD

1. It is recommended that the DGD creates incentives for accounting for the results of non-governmental cooperation for the populations it is targeting. This is also set out in the "Development Results" Strategy Paper (2014, p.4).

- The DGD should clarify its objectives in terms of evaluation. There are three different types of objectives: (i) accountability, (ii) learning for decision making and (iii) learning for knowledge sharing.

- It is recommended that the DGD promotes a shared vision of the definitions of the different levels of results, especially for outcomes and impact.

- It is recommended that the DGD specifies the level of results on which the NGOs must report, as well as the required level (rigour) of the reporting.

- It is recommended that the NGOs are given incentives to implement effective and rigorous monitoring and evaluation systems to report the results for the target populations that can be attributed to their actions. Incentives are needed to produce relevant and usable baselines (which is rarely the case currently) and to adopt a counterfactual approach (even in the case of a qualitative approach).

- This evaluation recommends that funding focuses on (rigorously evaluated) results, but also on stakeholders' ability to learn the lessons of the past. Incentives therefore need to be created so that these monitoring and evaluation systems can also be used to report on failures, without this leading to consequences for future funding, if lessons have been learned.

2. It is recommended that the notion of partners' capacity building be questioned, especially for long-standing relationships.

3. It is also recommended that the NGOs be encouraged to collaborate with different types of partners. Partnerships with civil society organisations in their field of competence or even with decentralised, local public institutions may improve the sustainability of the results compared to partnerships with local NGOs that are highly dependent on the funding provided by Belgian NGOs in order to act among populations.

Recommendations for the NGO

1. It is recommended that monitoring and evaluation systems be developed that can account for results for the final beneficiaries and at a more general level.

1 This has been successfully tested within the framework of this study, based on the project in Peru. The videos are available on the following webpage: http://www.ade.eu/news-detail.php?news_id=56



- The theory of change should be used as a basis for constructing the monitoring and evaluation system so as to identify the specific causal relationships and the underlying hypotheses behind the changes.

- The partners' capacities for monitoring and evaluation must be strengthened. They must be given incentives to be more rigorous in their data collection practices.

- It is also recommended that the effects of an intervention at a more general level be assessed (especially if the period covers an intervention cycle and not just a funding cycle).

2. It is recommended that, within the monitoring and evaluation system and in the evaluations, a more rigorous methodological approach be adopted (random selection of respondents, counterfactual approach), while focusing on mixed methods in order to produce credible findings, demonstrate the attribution of the results and explain the underlying mechanisms behind the changes.

3. An intervention's complexity or the cost of a rigorous mixed approach are not valid reasons to avoid developing a more effective monitoring and evaluation system and producing more rigorous evaluations.

- Consideration should be given to how these costs can sometimes be shared.

- Within all Belgian development cooperation, work should also be carried out on the constraints (expertise, budgets, etc.) and the incentives (financial, support, etc.) to account for the results of the development actions for the final beneficiaries.

4. It is recommended that collaboration be planned with different types of partners such as civil society organisations in their field of competence or even with decentralised, local public institutions, in addition to local NGOs.

5. It is recommended that the NGOs systematically explain where and how the intervention will have effects on women (and young people) and the environment.

Recommendations for the SEO

1. It is recommended that part of the annual SEO budget be dedicated to producing strategic impact evaluations to investigate the results of development assistance for the target populations and/or more generally (extended to the effects on a region, a sector, a stakeholder).

2. It is recommended that the SEO designs and supports the creation of a methodological framework and support system to develop the (impact) evaluation capacities of all Belgian development cooperation stakeholders.

3. It is recommended that the SEO plays a decisive role in assessing the quality of the NGOs' monitoring and evaluation systems. This type of certification, in addition to accreditation after screening (in progress), would be two elements that would promote the relationship of trust between the DGD and these NGOs, in such a way that the latter would no longer have to report on activities and outputs and could focus on outcomes and impact.



1. Introduction

This document is the synthesis report of the ex post impact evaluation of four Belgian non-governmental cooperation interventions. This evaluation's main objective is to learn lessons about impact evaluation, about both the methodological approach to be implemented and the added value of this type of evaluation for non-governmental cooperation, based on four case studies.

Twenty-two interventions were voluntarily put forward for this study by twelve Belgian NGOs. These interventions were all co-funded by the DGD over the 2008-2013 period. Following a participatory selection process, four interventions were selected: support to the sunflower and chicken production chains in Tanzania (TRIAS), support to the organic rice value chain in Boyolali district in Indonesia (VECO), access to drinking water in the Huánuco region in Peru (Iles de Paix) and the right to health in the Philippines (TWHA). It should be noted that these interventions are components of broader programmes implemented by these NGOs, not only in these countries but also in other countries.

For these four interventions, a rigorous methodological approach was developed to account for the intervention's impact. More specifically, the evaluation focused on the effects on the final beneficiaries and the effects at a more general level2. The approach favoured mixed methods, combining quantitative and qualitative analysis tools to varying degrees. The objective of such an approach was to measure quantifiable effects, assess other effects, demonstrate the attribution of these effects to the intervention and/or to make the case for the intervention's contribution to the more general changes and finally, to explain the underlying mechanisms behind the changes observed (or not).

This evaluation took place over an 18-month period. A steering committee composed of members of the French-speaking and Flemish NGO federations, the DGD and the Special Evaluation Office (SEO) closely monitored the process. All the participating NGOs were brought together twice, at the start and end of the process. The four selected NGOs also attended the steering committee meetings during the presentation of the final report relating to them. The sample of interventions proposed and selected is not in any way representative of all the actions by Belgian NGOs. However, due to the many discussions with the NGO sector and the steering committee, the lessons from this study are of interest to all those involved in development cooperation in Belgium.

This evaluation was also possible due to the involvement and motivation of the Belgian and local teams of the selected NGOs, competent local experts and enumerators and a team of experienced evaluators. We would like to thank all those who were involved, in whatever capacity, in producing this impact evaluation.

This report summarises the entire evaluation process for this study. It summarises and draws lessons from the selection process and the responses to the seven summative evaluation questions analysed in the evaluation report for each of the 4 selected interventions (these reports are available in the appendices). It also answers two formative evaluation questions. Finally, this report draws lessons from this ex post impact evaluation of 4 non-governmental cooperation interventions as a whole and presents not only general recommendations, but also specific advice for the NGO sector, the DGD and the SEO.

2 The effects at a more general level are defined as the proportion of the population affected in a geographic area, the influence at sectoral level and the legitimacy of the NGO and its partners in the intervention area.



This study is relatively complex because it has a summative aim based on four case studies and a formative aim of drawing lessons from this entire evaluative process. Specifically, this study illustrates three themes: (i) the definition of the purpose, the impact and the overall objective of impact evaluations, to improve aid effectiveness; (ii) the methodological approach combining qualitative and quantitative analysis tools to produce impact evaluations; and (iii) the rigour necessary to guarantee the credibility of the findings of evaluations in general and therefore also of impact evaluations.

This document is organised into 8 chapters. After a short introduction, several elements of the legal context of the NGO sector in Belgium, particularly regarding the requirements in terms of monitoring and evaluation, are presented. Chapter 3 specifies the key concepts as used in this evaluation for the 4 case studies and gives the definition of outcomes and impact, as well as that of the mixed approach. The next chapter summarises the selection process used to choose the four interventions. Chapter 5 presents the four case studies and a summary of the responses to the seven summative questions for each of the interventions subject to an impact evaluation (Q1 - relevance; Q2 - evaluability; Q3 - achievement of outcomes; Q4 - achievement of impact; Q5 - sustainability; Q6 - cross-cutting themes; and Q7 - the added value of a rigorous mixed approach). Chapter 6 covers the two formative evaluation questions. The first provides the context for the discussion about the definition of the term 'impact' and indicates how the definition proposed for this exercise is interesting for participating NGOs and for evaluating development actions. The second formative evaluation question focuses on the rigorous mixed methodological approach. It differentiates between the notion of mixed methods and rigorous practices and then questions the feasibility, relevance and cost/benefit ratio of these methods for those involved in the NGO sector in Belgium. Chapter 7 concludes and chapter 8 puts forward the recommendations for the various stakeholders: the DGD, NGOs and SEO.

Finally, we inform the reader that the team has made two videos based on the field mission in Peru, in order to provide a visual communication, in addition to the written report: (1) a 4-minute film illustrating the rigorous mixed methodological approach used for this ex post impact evaluation; and (2) a film showing and explaining the results of the intervention in Peru. These films are pilot initiatives in terms of communicating about an evaluation. They are available on the ADE website (www.ade.eu), in the News section.



2. Elements of the legal context of the NGO sector in Belgium

Development cooperation in Belgium is undergoing substantial reform. Strategic discussions are in progress between NGO umbrella organisations and the DGD. One of the objectives of this reform is to simplify administrative management within both the DGD and the NGOs, while trying to move towards greater aid effectiveness and better account for the changes instigated due to Belgian development assistance.

All the interventions that are subject to this evaluation are governed by the Law of 25 May 1999. The accreditation of NGOs and the subsidies for programmes and projects presented by NGOs were covered, respectively, by the Royal Decree (RD) of 14 December 2005 and that of 24 September 2006 (the execution of which is specified in the Ministerial Decree of 30 May 2007).

The evaluation therefore focuses on four interventions implemented in the South by non-governmental organisations with Belgian development cooperation, subsidised by the Belgian State under the Royal Decree of 24 September 2006. This ex post impact study provides some answers about the results of these 4 interventions, especially about measuring the changes at the level of outcomes and impact and their attribution to the development assistance received.

A new law relating to Development Cooperation was published on 19 March 2013 and then amended by the Law of 9 January 2014. Other legal texts govern the work of NGOs, including the RD of 25 April 2014 regarding Subsidies for non-governmental cooperation stakeholders (NGCS) and the RD of 10 April 2014, which establishes the list of partner countries for NGCS.

The analysis of some elements of the legal context can provide a frame of reference for considering impact evaluation in the development of policies and strategies for Belgian development cooperation. Exploration of this legal framework focuses specifically on the partnership approach, the programme approach and integrated policy and on the rules in terms of monitoring and evaluation.

Appendix B presents a summary comparison of the old (Law of 25 May 1999) and new (Law of 9 January 2014) legal frameworks.

2.1 The current legal framework and the partnership approach

The new law of 2014 pays particular attention to the capacity building of local stakeholders. This becomes an explicit objective which is not present in the law of 1999. This move towards capacity building corresponds to the change in the concept of development assistance, after the Paris Declaration on aid effectiveness.

The partnership approach is not explicitly mentioned in the law. However, it has become a de facto compulsory criterion for obtaining funding given that capacity building must be present in the explicit objective of the programmes. Capacity building of the local partner is no longer just a means of impacting the populations, but becomes one of the objectives of the interventions. That said, the ultimate objective of development assistance is still to bring about sustainable change in the lives of people (the final beneficiaries).



2.2 "NGO programme" accreditation and the integrated policy

The new law is part of a process which favours the programme approach3, although individual projects can still be funded by the DGD. The co-funding system can therefore be used to support the initiatives of Belgian NGOs under an NGO project or NGO programme in developing countries.

Article 2 of the Royal Decree of 25 April 2014 regarding Subsidies for NGCS sets out the conditions for accreditation as an NGO.4 The programme approach would in principle be justified by stronger internal consistency of the NGO, which would increase the effectiveness of interventions and allow better achievement of the results.

'NGCS programmes' will have access to greater and more sustainable funding over time. This should therefore, among other things, provide:

- A more stable presence in the countries covered by the programme and therefore a better understanding of the context, especially in the sector affected;

- A more careful consideration of the added value of the NGCS in each context/country/sub-region;

- A greater stability of the partnerships encouraging NGCS to invest more in the capacity building of its partners (although results in terms of capacity, knowledge and changes in behaviour are only realised over the medium term);

- Opportunities for 'cross-learning' based on the experiences of different partners in the same sector/field;

- An opportunity to build on the experiences of different partners and invest in the capacity building of a network of partners;

- The opportunity to develop working relationships with other stakeholders operating in the same sectors/countries; etc.

This programme approach goes hand in hand with the importance given to the consistency of Belgian development cooperation interventions (synergy, coordination and standardisation of all stakeholders) in the new law. The objective of the resulting integrated policy is "to ensure better coordination of the efforts of the various channels of Belgian development cooperation and thus promote complementarities and synergies between these channels". This "is achieved by defining a shared vision5 between those involved in Belgian development cooperation on the challenges and opportunities of a country and the strategies or development approaches to be favoured [...] which will be the basis for identifying opportunities for synergies and complementarities6 within Belgian development cooperation".7

This need for standardisation is reinforced by the DGD's reform into geographical departments (2012)8 and by the need to create common contextual analyses (CCA) for Belgian NGCS that operate in the same country/sector. The effort started through the CCAs will be intensified by the development of thematic or geographical joint strategic frameworks (JSF). These JSF are living tools where the objectives are:

- to promote consultation and coordination of strategies between the NGCS active in a country or on a theme;

- to implement the effective complementarities or synergies between NGCS encouraging structured or ad hoc collaborations;

- to ensure collective learning about strategies and risks;

- to engage in strategic dialogue with the administration.

3 It is important to emphasise that the "programme approach" as envisaged by the DGD contrasts sharply with the programme approach of the DAC/OECD. The DAC/OECD programme approach is a means of development cooperation based on the principle of coordinated support for a development programme with strong local roots, for example a national development strategy, a sector programme, a thematic programme or the programme of a specific organisation. This is not the case for NGO programmes.

4 'NGO programmes' must have an effective system for managing the organisation (strategic management, process management, results-based management and partnership management). The additional accreditation to become an 'NGO programme' requires, among other things, to "comply with a results-based logical approach" and "contribute, through transparent and fair partnerships, to the capacity building of local partners" (Art 16).

5 "Shared" does not mean "joint".

6 Complementarity does not necessarily imply collaboration.

7 "Communication aux ONG membres - Un accord-cadre entre le Ministre et les représentants des acteurs de la coopération non gouvernementale" 11-09-2015, p.5.

8 The 'Civil Society Directorate' was divided into geographical departments.

2.3 The new legal framework and monitoring and evaluation

Since the RD of 2006, NGOs have been forced to show more professionalism in order to receive subsidies. One of the criteria is the establishment of results-based management, during the intervention period. This means that for each expected result, the programme specifies the timetable of achievements, the baseline situation and the indicators used to assess the achievement of the defined result. This also means producing a list of hypotheses and a risk analysis related to the intervention's success. This approach is completed by formulating a logical framework which is used as both a formulation tool (to demonstrate the resources implemented in order to achieve the required results) and a monitoring and evaluation tool.
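As a rough illustration of what this requirement implies for a monitoring system, the sketch below records one expected result together with its timetable, baseline, indicators, hypotheses and risks. It is not taken from the Royal Decree or from any NGO's actual system: the class names, fields and figures are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Indicator:
    """One indicator attached to an expected result (hypothetical structure)."""
    name: str
    baseline: float                 # value measured before the intervention
    target: float                   # value expected at the end of the period
    latest: Optional[float] = None  # most recent monitored value, if any

    def progress(self) -> Optional[float]:
        """Share of the baseline-to-target distance covered so far."""
        if self.latest is None or self.target == self.baseline:
            return None
        return (self.latest - self.baseline) / (self.target - self.baseline)


@dataclass
class ExpectedResult:
    """One row of a logical framework (hypothetical structure)."""
    statement: str
    level: str                      # "output", "outcome" or "impact"
    due_year: int                   # timetable of achievement
    indicators: List[Indicator] = field(default_factory=list)
    hypotheses: List[str] = field(default_factory=list)
    risks: List[str] = field(default_factory=list)


# Invented example values, for illustration only.
water_access = Indicator(
    name="households with access to safe water (%)",
    baseline=20.0, target=60.0, latest=50.0,
)
result = ExpectedResult(
    statement="Households in the target area use safe drinking water",
    level="outcome",
    due_year=2013,
    indicators=[water_access],
    hypotheses=["Local authorities maintain the infrastructure"],
    risks=["Drought reduces the yield of the water sources"],
)
print(water_access.progress())  # 0.75
```

Recording the baseline and target next to each indicator is what later allows the same structure to serve as a monitoring and evaluation tool, not only as a formulation tool.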

The new legal framework9 and the 'Development Results' Strategy Paper10 stress the importance of standardising the evaluation systems and their certification.

"The DGD will set up a performance measurement system to assess the effectiveness of the interventions funded, learn lessons and ensure transparent communication.11 (…) A consistent approach will be developed for this purpose, in order to allow reporting of the results and results-based management. A standardised reporting system will also allow the systematic monitoring of whether or not the results have been obtained.12 (...) The Special Evaluation Office will be responsible for certifying the evaluation systems proposed by the different NGS.10"

The Royal Decree of 25 April 2014 stipulates that the SEO will certify such systems without specifying how they will be standardised. This is a relatively complex exercise given the diversity of the actions and implementation methods.

Furthermore, the new legal framework still talks about results-based management without providing more details about the definition of the word 'result' (output, outcomes, impact) and without specifying whether it would be interesting to demonstrate the intervention's attribution or at least its contribution, to the changes observed.

The Strategy Paper, although it indicates that one of the challenges of development cooperation is to demonstrate the results achieved and stresses the importance of accounting for the effectiveness of interventions, does not define the criteria and procedures for certifying evaluation systems allowing the collection of reliable information.13 Furthermore, it is not clear whether NGOs should explain the underlying mechanisms behind the changes ("why" and "how" the effects were achieved or not) in order to learn lessons and improve aid effectiveness.

It is important to realise that too great a focus on measuring results could lead the DGD to concentrate its funding on interventions with easily measurable short-term results at the expense of actions that are more difficult to measure and/or where the effects are mostly felt only in the medium- or long-term.

9 Law of 9 January 2014.

10 'Résultats du développement' Strategy Paper, Directorate General for Development Cooperation and Humanitarian Aid, July 2014.

11 'Résultats du développement' Strategy Paper, p. 4-5.

12 Law of 19 March 2013.

13 The Law of 9 January 2014 on Development Cooperation states the following elements in relation to the requirements in terms of monitoring and evaluation.

- "With a view to achieving Belgian Cooperation Development goals..., the results are evaluated on the basis of the criteria defined by the OECD’s DAC, namely relevance, effectiveness, efficiency, feasibility, and impact, as well as on the basis of sustainability. To that end, a coherent approach will be developed with a view to enabling the reporting of results and results-based management. A standardised reporting system should also enable the systematic monitoring of the results obtained or not...." (Art.32)

- "Belgian Development Cooperation Stakeholders... are responsible for the internal evaluation and monitoring of their interventions. The King determines the means of standardising and certifying these evaluation systems." (Art 33).

- "The King determines the instruments needed to guarantee the external evaluation of all Belgian development cooperation interventions..." (Art. 34)

- "The costs for the evaluation are included in the operating expenses and must amount to between 1% and 3% of these" (Art 41 of the RD of 25 April 2014).

- "For 5-year subsidy applications, an intermediate evaluation will be arranged. This will determine: (1) whether the required results are being achieved; (2) where necessary, the appropriate measures that the NGCS should implement in order to achieve the results; and (3) the lessons learned that the NGCS should take into account when preparing new subsidy applications. A final evaluation will assess the achievement of specific objectives and results at the end of each subsidy application. The methodology, process and final report of the evaluation will have to satisfy the DAC's quality criteria in terms of evaluating development cooperation. The final evaluations, together with the intermediate evaluations for 5-year subsidy applications will be carried out by external evaluators. Intermediate evaluations for applications under 5 years can be carried out internally." (Art. 43 of the RD of 25 April 2014).



3. Clarification of the concepts used to evaluate the 4 interventions

3.1 Defining impact in the 4 case studies

Neither the literature, nor discussions with stakeholders from the NGO sector have provided an exact definition of the term 'impact'. However, a definition of this term was needed in order to structure this exercise and demarcate everyone's expectations.14

In the 4 case studies, the logical framework was used as a tool to structure the analysis. Figure 1 presents the definitions of the logical framework elements that were used for this study. The definitions proposed are based on the definitions used for the ex post evaluation of 4 governmental cooperation projects (2013). They were then slightly amended to reflect the partnership approach specific to the NGO sector. At the start of this study, these definitions did not elicit any special remarks from the 12 participating NGOs. However, at the end of the process, there was a debate about the relevance of using this definition and standardising the definition of the term 'impact' for NGOs. These findings are the subject of the formative evaluation question (see later).

We suggest a clear distinction between outcomes and impact: outcomes concern all the direct and indirect effects resulting from the use of the outputs for the different types of beneficiaries; and impact refers to the effects generated by the project at a general level (see Figure 1 below).

Figure 1: Definitions of the logical framework concepts used for the 4 case studies

14 It should be noted that in the report "Broadening the range of designs and methods for impact evaluation" (DFID, 2012), the authors also begin this study by clarifying certain definitions, including that of 'impact'.



The notion of temporality does not appear directly in the concept definitions because it is included in the notion of the causal sequence of the effects at the level of the outcomes (see later).

The notion of attribution of the effects is present in the definition of outcomes ('resulting effects') and that of contribution is found in the definition of impact ('generated effects').

Figure 2 below illustrates the concept of the causal sequence for one type of beneficiary and between types of beneficiaries at the level of the outcomes. Given the partnership approach developed by all NGOs, it is important to put the effects on the partner(s) and the other types of beneficiaries into context. These effects are linked to each other by different causal relationships.

There are three notions underlying the causal sequence of an intervention's effects.

(1) The notion of observability of effects. Some results can only be observed after a certain amount of time. Others may no longer be observable because they are not sustainable. The nature of the results and the moment at which the evaluation takes place are decisive factors in this regard.

(2) The notion of sustainability of effects. Some results may be sustainable over time and able to generate further results.

(3) The notion of indirect effects. In addition to the results directly related to the intervention, consideration should be given to its indirect effects, including the development action's positive and/or negative externalities.

Figure 2: Causal sequence of the effects at the level of outcomes and impact

Besides the effects at the level of the different types of beneficiaries, it is also interesting to consider whether the intervention has had a significant effect at a more general level (see figure 2). The intervention can have significant results on the beneficiaries, but this does not necessarily mean that the project has had an important effect more generally. Three types of effect at a more general level are selected.

(1) The impact on the sector and/or a geographic area15. This means assessing whether the intervention has an influence at the level of the sector affected. This dimension is certainly important in interventions that aim to influence local policies, such as lobbying or civil society capacity building type interventions (‘policy influence interventions’ or even ‘empowerment interventions’). It also involves assessing the extent of the effect, i.e. estimating the proportion of the population that has been positively affected by this intervention. The geographic area to be considered varies depending on the project's scale (municipal, regional, national).

15 Although the intervention is not always intended to bring about changes at a more general level, it is still interesting to report on the impact of an intervention/programme on a sector and geographic area. Indeed, a project may achieve 100% of its objectives but only affect 1% of the population; this intervention's impact is therefore still low overall for the intervention area. This observation is not necessarily accompanied by a value judgement; it is simply interesting information to be collected.

It is important here to assess this overall effect in line with the intervention's characteristics (duration, scope, budget). It should be noted that having an ex ante discussion about this matter may potentially lead to a resizing of the intervention's ambitions.16
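The distinction between achieving an intervention's objectives and affecting a meaningful share of the population can be made concrete with simple arithmetic. The sketch below reproduces the case mentioned in the footnote (all objectives met, 1% of the population reached); the function names and the household figures are invented for illustration and do not come from the case studies.

```python
def achievement_rate(reached: int, targeted: int) -> float:
    """Share of the intervention's own target that was reached."""
    if targeted <= 0:
        raise ValueError("targeted must be positive")
    return reached / targeted


def coverage_share(reached: int, area_population: int) -> float:
    """Share of the population of the geographic area that was reached."""
    if area_population <= 0:
        raise ValueError("area_population must be positive")
    return reached / area_population


# Hypothetical figures: 500 households targeted and reached,
# in an intervention area of 50,000 households.
households_targeted = 500
households_reached = 500
households_in_area = 50_000

achievement = achievement_rate(households_reached, households_targeted)
coverage = coverage_share(households_reached, households_in_area)

print(f"objectives achieved: {achievement:.0%}")    # objectives achieved: 100%
print(f"population coverage: {coverage:.0%}")       # population coverage: 1%
```

The denominator of the coverage share depends on the geographic area chosen (municipal, regional or national), so the same intervention can show very different coverage depending on the scale at which it is assessed.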

(2) The impact at programme level.17 This study evaluates the impact of a development intervention, not the impact of an entire programme. However, it is interesting to assess how an intervention's results can help to achieve the programme's impact. To do so, it is worth considering where and how the results of each of the programme's components contribute to the overall impact of the whole programme. The objective is accountability on the one hand, and learning on the other.

- Accountability can have different required levels. (i) In relation to all planned interventions, how many of these interventions have achieved the expected effects for the different types of beneficiaries? (ii) How many beneficiaries have been positively impacted by all of a programme's interventions? (iii) What effects can be measured and attributed to each of a programme's components for each type of beneficiary?

- Learning lies at another level. Learning involves understanding how the results (outcomes) have been achieved for each type of beneficiary. The lessons learned about the impact evaluation for a single intervention can be discussed with various key stakeholders in order to understand how these can be extrapolated (at least in part) for all a programme's actions, giving strong and relevant arguments justifying such extrapolations. This goes back to judging the external validity, i.e. understanding where and how the same type of intervention is likely (or not) to achieve its results in other contexts. This also means understanding the links between the evaluated intervention and other components of the programme implemented among the same beneficiaries (where appropriate).

It should be noted that NGOs do not currently (in practice) evaluate this type of effect. Assessing this level of impact is relatively complex. A programme's impact is often formulated in a relatively abstract fashion and is far removed from the other effects in the causal sequence, thus creating a 'missing middle' (an observation also made in the Study of the practical evaluability of the interventions co-funded by Belgian development cooperation, 2015).

(3) The impact of the intervention on the NGO and its partners. This aspect consists in assessing how achieving the results (or not) of an intervention has a positive effect (or not) on the reputation and performance of the NGO and/or on that of its partners in the field. This is important when the implementation of an intervention or programme is viewed over a longer time frame, which is often the case. This gives credit to the legitimacy of the stakeholders. The elements of this analysis are also a learning opportunity for the NGO and its partners.

For this ex post impact evaluation, the team reconstructed the logical framework of each selected intervention using the definitions given above. For this study, the "impact evaluation" therefore involved evaluating the effects for the different types of beneficiaries (partners, intermediate, final) with greater attention paid to the final beneficiaries (the population) (outcomes) and the effects at a more general level (impact).

16 Indeed, it is important to define the required overall effect in line with the intervention conducted, because it sometimes/often seems that this effect is disproportionate. In this case, the risk is that the expected effect is not found.

17 In this study, a programme is defined as a set of coherent interventions that aim to achieve an overall objective: this either consists of reproducing the same type of intervention implemented in several countries or intervention areas or creating different types of interventions in the same geographic area.

In terms of evaluating the more general effects, this study focuses primarily on analysing the impact on the sector and the geographic area. The impact at programme level has not been analysed because this was beyond the objectives of this evaluation (see box 9).

3.2 Type of methodological approach implemented in the case studies

The methodological approach selected for this ex post impact evaluation has two main characteristics: it is mixed and meant to be rigorous (in the scientific sense).

Firstly, the logical framework was used to structure the theory of change (theory based evaluation). This tool can be used to structure the analysis by specifying the different effects expected on the different types of beneficiaries for each level of results (outputs, outcomes and impact) (see figures 1 and 2). It is also used to explain the causal relationships between the different levels of results of an intervention and the underlying hypotheses.

Given the ex post situation of this evaluation, the effects were defined according to the actual situation and no longer as expected in the initial logical framework. Consequently, the effects may differ from those initially expected. Furthermore, some interventions covered two funding cycles; for these, a single logical framework was reconstructed for both periods. The theory of change (ToC) was therefore reconstructed based on the available documents, working closely with the NGO's key people in the field and at head office as well as with partners and taking into account the changing context.

A rigorous mixed methodological approach was then drawn up for each of the four case studies18, taking account of the context, data and information available from the NGOs and their partners, the type of effects to be assessed and the budget and time constraints of this evaluation.19

The selected methodological approach was therefore a combination of qualitative and quantitative methods structured around the theory of change. These two approaches are complementary and used simultaneously or sequentially depending on the scenario.

A good qualitative analysis provides an in-depth understanding of the context, the intervention and the key stakeholders. It identifies all the socio-economic, political and cultural aspects together with the challenges and the power relationships at different levels that may promote or inhibit the achievement of certain effects. It explains the underlying mechanisms behind the changes (in the theory of change). This type of qualitative approach provides a reasoned judgement on the achievement of effects that cannot be measured at the level of outcomes and impact (as defined in section 3.1). It also makes it possible to analyse the intervention's contribution to the achievement of these effects (see figure 3). The resulting knowledge also helps to interpret the results of the quantitative analyses.

18 Ultimately, only a rigorous qualitative approach could be implemented to evaluate the M3M intervention in the Philippines; in the other three cases, the mixed approach could be used (see Chapter 5).

19 A summary of the methodological approach followed for each of the case studies is presented in Chapter 5. For more detail, please refer to the evaluation reports for each case study, available in the appendices.

The qualitative analyses conducted as part of this study were based on a documentary analysis and information collected through individual or group interviews (focus group type) and direct field observations. The qualitative approach followed had two specific features: (i) the selection of areas to be visited and people to be invited to the group discussions was made randomly; and (ii) a counterfactual approach was implemented. The aim of this was to collect information that was more credible than if the choice of the visits was made by the stakeholders themselves and if the respondents were self-selected (see below for a discussion about the rigorous aspect of the methodological approach).
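The random selection of areas and participants described above could be documented with a reproducible draw, so that stakeholders can verify that the evaluators did not hand-pick the sites. The sketch below is only one possible way of doing this, not the procedure actually used in the case studies; the village names, sample size and seed are hypothetical.

```python
import random
from typing import List, Sequence


def draw_sample(candidates: Sequence[str], k: int, seed: int) -> List[str]:
    """Draw k distinct items at random, reproducibly for a given seed.

    The candidates are sorted first so that the result does not depend
    on the order in which the list was supplied.
    """
    rng = random.Random(seed)          # private generator, fixed seed
    return rng.sample(sorted(candidates), k)


# Hypothetical list of villages covered by an intervention.
villages = ["Village A", "Village B", "Village C", "Village D",
            "Village E", "Village F", "Village G", "Village H"]

visited = draw_sample(villages, k=3, seed=2016)
print(visited)
```

Publishing the candidate list and the seed alongside the report would let anyone rerun the draw and obtain the same selection.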

Figure 3: Impact evaluation by mixed methods for this study

A good quantitative analysis is based on the use of econometric and statistical tools to measure and demonstrate the attribution20 of the effects to an intervention with a certain degree of precision (see figure 3). In order to be able to quantify the effects, these must be measurable21 and sufficient reliable data22 is needed. In order to demonstrate attribution such data must be available for two periods, before and after the intervention, as well as for two groups, a group of beneficiaries and a control group. The control group consists of individuals who are relatively similar to the group of beneficiaries before the intervention. The hypothesis is that without the intervention, the group of beneficiaries would have developed in a similar way to the control group.
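A data layout with two periods and two groups, together with the hypothesis that both groups would have developed similarly without the intervention, corresponds to what is commonly called a difference-in-differences comparison. The sketch below shows that calculation in its simplest form, as an illustration only: the source does not state that this exact estimator was used in the case studies, and the function name and income figures are invented.

```python
from statistics import mean
from typing import Sequence


def diff_in_diff(
    beneficiaries_before: Sequence[float],
    beneficiaries_after: Sequence[float],
    control_before: Sequence[float],
    control_after: Sequence[float],
) -> float:
    """Change among beneficiaries minus change in the control group.

    Under the hypothesis that both groups would have followed the same
    trend without the intervention, this difference estimates the
    effect attributable to the intervention.
    """
    change_beneficiaries = mean(beneficiaries_after) - mean(beneficiaries_before)
    change_control = mean(control_after) - mean(control_before)
    return change_beneficiaries - change_control


# Invented outcome values (e.g. household income in arbitrary units).
effect = diff_in_diff(
    beneficiaries_before=[10.0, 12.0, 14.0],   # mean 12
    beneficiaries_after=[16.0, 18.0, 20.0],    # mean 18, change +6
    control_before=[11.0, 12.0, 13.0],         # mean 12
    control_after=[13.0, 14.0, 15.0],          # mean 14, change +2
)
print(effect)  # 4.0
```

A simple before/after comparison among beneficiaries would report a change of 6 here; subtracting the control group's change of 2 removes the part of the change that would have occurred anyway. A real analysis would also need standard errors and, as the text notes, a sufficient number of reliable observations.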

Data can be generated directly by a programme's monitoring and evaluation system, be publicly available23 or even be collected using surveys conducted during the evaluation.

20 Demonstrating attribution means showing the net effect of the intervention, i.e. demonstrating how the intervention has made the difference in terms of the well-being of the target populations.

21 Qualitative variables may be subject to statistical processing if they are systematically recorded for a large number of individuals (for example, a level for assessing an effect by the beneficiaries on a scale of 1 to 5).

22 100 observations is a minimum but this varies depending on the reliability of the data and the type of analysis to be performed.

23 The World Bank, but also the FAO and other institutions, publish data series that are worth consulting because they are comparable year on year and often across several countries.

One of the specific characteristics of NGOs is that they work with partners. The use of statistical tools to quantify outcomes for the direct beneficiaries (generally the partners) or even for the intermediate beneficiaries is irrelevant in view of the limited number of possible observations.24 Indeed, in order to use these tools, it is important to have access to data about a large number of individuals. It is therefore more common for only the effects on the final beneficiaries (the target population) that can be attributed to the intervention to be measured using quantitative methods.

However, whether or not the effects on the direct and intermediate beneficiaries have been achieved can be considered as a factor that promotes or inhibits the effects on the final beneficiaries. In other words, the effects on the direct and/or intermediate beneficiaries become explanatory factors when analysing the results for the final beneficiaries.25

Furthermore, the quantitative results at the level of the outcomes on the final beneficiaries can provide strong arguments for measuring the effects at a more general level and the intervention's contribution at this level (sectoral / geographic / programme / NGO and partners).

The data used in this study comes from several sources. In two of the four case studies, primary data was collected using a household survey (Indonesia, VECO and Peru, Iles de Paix). In one of the studies (Tanzania, TRIAS), secondary data was used: a database created by the NGO at the end of the intervention using a credible household survey and public26 LSMS data from the World Bank. For the final study (Philippines, M3M), the plan was to use data collected by one of the partners, but in the end this was not the case (see below) and only a qualitative approach was followed for this case study.

A counterfactual approach was used in the household surveys and respondents were chosen randomly. In addition, the pre-intervention (baseline) data was reconstructed based on questions using people's memories (recall data). For the intervention in Tanzania, the baseline data was available in the NGO database (along with recall data). However, no information was available about non-beneficiaries. An analysis of the World Bank's data (LSMS) was used to establish an imperfect yet informative control group (see Chapter 5).

"These approaches - qualitative and quantitative -, although very different, need each other to provide quality evaluations. Perfect understanding of quantitative techniques can never be a substitute for a rigorous analysis of an intervention's causal sequence, or a relevant argument made by experts with local knowledge (Deaton, 2012). Conversely, a convincing explanation for an intervention's effects cannot replace the measurements and scientific evidence of the attribution of a project's effects through the use of quantitative methods." (Ex post impact evaluation of 4 governmental cooperation projects, Synthesis Report, 2013).

The developed methodology is intended to be rigorous in both the quantitative and qualitative approach. Rigour is often associated with the quantitative approach, due to the requirements of statistical tools, but also because it refers to the impact evaluation as practised in academia. However, the qualitative approach considered for this study was also intended to be rigorous. As previously stated, it favoured a random selection of areas to be visited and respondents. It also used a counterfactual approach designed to present an argument on the attribution of the effects (or the intervention's contribution to the effects) without however demonstrating this quantitatively.

In addition to reconstructing the logical framework27, four rigorous practices were systematically implemented in the methodological approach for this study, whether quantitative or qualitative. There was a twofold objective: ensure the greatest possible reliability of the data to be processed and allow an analysis of the attribution/contribution.

24 Nevertheless, note that NGOs have developed tools to measure the effects in terms of the capacity building of their partners. However, these measurements are still based on a small number of observations.

25 In a survey (household type survey), in addition to the outcome variables, context and behaviour variables promoting or inhibiting the expected effects are also collected.

26 The methodology satisfies the necessary rigorous criteria (random selection of respondents, relevant collection tool, trained and supervised enumerators, cleaned database).

27 Having established clear causal relationships between all levels of results and all types of beneficiaries. The use of the logical framework in this study is therefore comparable to the use of the theory of change.

(1) Relevant collection tools. Collection tools were developed to obtain reliable data on the relevant indicators identified following the reconstruction of the logical framework. In the qualitative approach, the interview guides were tested and corrected. The interviews were conducted in a similar way in the different areas with the help of a local expert. The household survey questionnaires were also tested, corrected and implemented on tablets28. A field-based team of enumerators was trained on the evaluation's objectives, the questionnaire's content and using the tablets. The development processes for the tools and their implementation were therefore designed to minimise 'reporting' biases, where respondents can tend to exaggerate reality in one way or another by pursuing undisclosed objectives. When this bias was identified, the findings were adapted according to these limits.

(2) The random selection of the areas to be visited and respondents. This practice ensures greater objectivity and neutrality in the information collected. If the partners and the NGO suggest the areas and/or people to be questioned, or even if the respondents are self-selected, there is a greater risk of biased information (this is called a selection bias). The availability of lists of beneficiaries greatly facilitates this selection process. Recreating these lists can take time, but it is sometimes possible to select respondents by visiting them and performing a random walk. If a random selection is not feasible, the potential bias in the responses due to the type of selection and/or the characteristics of the respondents must be identified.
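Where lists of beneficiaries exist, the random selection of areas and respondents can be sketched as a two-stage draw. This is a minimal illustration, not the procedure used by the evaluators: the function name, the village labels, the sample sizes and the seed are all hypothetical. Fixing the seed simply makes the draw reproducible and auditable.

```python
import random


def select_areas_and_respondents(lists_by_area, n_areas, n_per_area, seed):
    """Draw areas at random, then respondents at random within each drawn area."""
    rng = random.Random(seed)  # seeded so that the draw can be reproduced
    areas = rng.sample(sorted(lists_by_area), n_areas)
    selection = {}
    for area in areas:
        candidates = lists_by_area[area]
        size = min(n_per_area, len(candidates))
        selection[area] = rng.sample(candidates, size)
    return selection


# Hypothetical beneficiary lists: 6 areas with 40 households each.
lists_by_area = {
    f"village_{v}": [f"village_{v}_household_{h:02d}" for h in range(1, 41)]
    for v in "ABCDEF"
}

selection = select_areas_and_respondents(lists_by_area, n_areas=3,
                                         n_per_area=10, seed=2015)
for area, respondents in selection.items():
    print(area, len(respondents))
```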

(3) The counterfactual approach. The objective is to argue/demonstrate how the changes observed can be attributed to the development action. This is important in order to justify the purpose of the intervention. Identifying a control group was not easy for two reasons: the ex post situation and because one of the features of NGOs is that they choose their beneficiaries based on very specific characteristics which may directly influence the project's results. Furthermore, there may be a contagion bias, whereby the intervention produces effects on non-beneficiaries or where non-beneficiary populations can be influenced by other development actions29. These elements can potentially reduce the credibility of the counterfactual. If the identified counterfactual was not perfectly credible, i.e. exactly comparable to the beneficiary population, the findings were adapted according to the limits identified by this approach.

(4) Reconstruction of the pre-intervention situation. Knowing the value of the indicators before the intervention is implemented is necessary in order to assess the change in the indicators over the intervention period, but this alone is not enough to attribute the changes to the intervention. Only the counterfactual approach permits this, preferably with knowledge of the pre-intervention situation for both groups (beneficiary and control). The baseline situation was reconstructed where feasible by asking questions that relied on respondents' memories for the answers.

In addition to a rigorous methodology for information collection, a rigorous processing method was implemented. Rigorous processing relies on a command of the information and expertise in the analytical tools (particularly quantitative ones), but also on stating the limits of the methodology and taking them into account in the findings. When biases were identified, hypotheses were put forward about the direction of these biases and consequently the influence that they could have on the findings (under- or overestimation of the results).

28 In addition to being a faster way to collect information, using tablets also largely helps to avoid encoding errors.

29 Actions implemented by other stakeholders working in the same area, sector and/or related sectors.



Box 1: Focus on the impact evaluation's objective and the methodological approach

For this study, the ex post impact evaluation consisted in assessing the effects of a development action on the beneficiaries, especially the final beneficiaries (outcomes) and at a general level (impact), specifically the effects on the sector and the extent of the effects geographically.

To do so, a mixed methodological approach was developed. A combination of quantitative and qualitative analysis methods was used to assess the same type of effect (outcomes and impact). The objective of the quantitative approach was to measure the effects that could be attributed to the intervention. The qualitative approach was used to assess the non-quantifiable results and explain the underlying mechanisms behind the changes. However, because of constraints due to the lack of reliable data as well as time constraints, the evaluators were unable to use the quantitative analysis tools for one of the case studies.

This methodological approach was intended to be rigorous (in the scientific sense) in both its quantitative and qualitative approach, through the ex post reconstruction of the theory of change, the random selection of areas to be visited and respondents, by favouring a counterfactual approach, reconstructing the baseline data and the rigorous processing of the information taking into account any possible biases.

The aim of using rigorous mixed methods was therefore to ensure that the most objective information possible was collected in order to establish credible findings about the effects that could be attributed to an intervention, to measure these effects where the data allowed, while explaining the underlying mechanisms behind the changes observed (or not).



4. The selection procedure for the 4 interventions

Twelve Belgian NGOs voluntarily proposed one or two interventions to be evaluated for this study. These interventions had to have benefited from DGD funding over the 2008-2010 and/or 2011-2013 periods. The budget for this evaluation would only allow 4 interventions to be analysed. The evaluators therefore had to select 4 interventions from the 22 put forward by the NGOs. The list of the 22 interventions is available in Appendix C.

4.1 The methodological approach for selection

A four-stage selection process was used. It should be noted that team meetings, involving the academic and NGO experts, were organised between each of these stages in order to take advantage of the specific experience of each member and ensure consistency in the work produced. This process is illustrated in figure 4 and explained below.

Figure 4: The selection procedure for the 4 interventions

(1) A documentary analysis based on the documents received from the NGOs.30

(2) Interviews with NGO representatives. The objective was firstly to present the team of evaluators to the 12 NGOs and hear their expectations and questions in relation to this study. This was the opportunity for the team to ask questions to clarify the document review that had been conducted. Finally, it was important to meet the NGO representatives in order to get an idea of their willingness and availability, as well as that of their partner(s), to take part in this study.

(3) A meeting with the Steering Committee (SC). This meeting was held on 4 December 2014. The objective was to present the selection process and share its results. However, the final choice of the 4 interventions to be considered was only made following the workshop with the NGOs.

(4) A workshop involving the SC, all the participating NGOs and the managers of these NGOs at the DGD. This took place on 16 December 2014. Holding a workshop at this stage of the evaluation was not only important for the formative aspect of the exercise, but also to validate the choice of interventions with the various stakeholders of this evaluation.

30 The list of documents consulted is available on request.

Three tools were used to select these 4 interventions (these are available in appendix D of this report).

(1) Firstly, a reading grid was drawn up for each intervention based on the documents received from the NGOs. This grid contained the intervention's characteristics and the information needed to get an idea of the impact evaluability.

(2) Based on the information collected in these reading grids, an interview guide was developed by the team in order to outline the bilateral exchanges with the NGO representatives.

(3) Finally, one datasheet per intervention was produced. This datasheet contained three types of information: (i) elements for or against the use of mixed methods, especially quantitative tools, for evaluating the impact as defined in this report (see section 3.1); (ii) factors related to the favourable context, or otherwise, for conducting an ex post evaluation; and lastly, (iii) factors related to the added value of conducting an impact evaluation. These elements are explained in the next section.

4.2 Selection Criteria

Three types of selection criteria were used:

(1) Criteria related to the technical feasibility of using mixed methods and particularly quantitative tools and rigorous practices;

(2) Criteria related to the context, including the context in which the intervention took place and the current context of the region where the intervention took place; and

(3) Criteria related to the added value of the impact evaluation exercise in terms of the cost/benefit ratio: benefits in terms of learning for the NGO (and partners) and for the Belgian NGO sector (specific features of NGOs) and costs depending on the resources to be implemented to complete the impact evaluation.

These three criteria categories were used to define the conditions favourable to the use of mixed methods to evaluate impact (as defined in this study - see section 3.1).

Based on each of these three types of criteria (detailed below), each intervention was judged as satisfying one, two or three types of criteria. If one of the conditions described in a criteria category was not satisfied, the intervention was considered as not satisfying this type of criteria (= OUT for this criteria category). In other words, an intervention satisfied a criteria category if and only if all the conditions described in this type of criteria were met (=IN). This simple method of an IN/OUT judgement by type of criteria was deliberately chosen, along with a limited number of conditions, without adding weighting. The goal was to get a quick idea of the interventions that satisfied all three criteria categories. If the criteria had not been sufficient to select just 4 interventions, the plan was to consider other types of criteria such as the diversity of the general context (fragile state/low or middle income country) or even sectoral or geographic diversity.
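The IN/OUT rule described above (a criteria category is satisfied if and only if all of its conditions are met, with no weighting) can be written down in a few lines. The sketch below is an illustration only: the condition labels are shortened paraphrases and the datasheet values are hypothetical, not taken from any of the 22 interventions.

```python
def judge_category(conditions):
    """Return 'IN' only if every condition of the category is met, else 'OUT'."""
    return "IN" if all(conditions.values()) else "OUT"


def judge_intervention(datasheet):
    """Apply the unweighted IN/OUT rule to each criteria category."""
    return {category: judge_category(conditions)
            for category, conditions in datasheet.items()}


# Hypothetical datasheet for one intervention (labels are illustrative).
datasheet = {
    "mixed_methods": {
        "more_than_500_final_beneficiaries": True,
        "quantifiable_outcome_indicators": True,
        "credible_counterfactual": True,
    },
    "context": {
        "partner_available": True,
        "field_visits_safe": False,  # one unmet condition puts the category OUT
    },
    "added_value": {
        "learning_for_ngo": True,
        "learning_for_sector": True,
        "reasonable_costs": True,
    },
}

verdict = judge_intervention(datasheet)
satisfies_all = all(result == "IN" for result in verdict.values())
print(verdict, satisfies_all)
```

Because no weights are used, a single unmet condition is enough to exclude an intervention from a category, which keeps the screening quick and transparent.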

4.2.1 Criteria for the use of rigorous mixed methods

The criteria below define the favourable conditions under which it would be beneficial to implement an impact evaluation using rigorous mixed methods.



1) Interventions with a large number of beneficiaries (>500), to provide a meaningful sample. For the interventions selected for this study, this was the number of final beneficiaries.

2) Interventions where most of the outcome indicators for the final beneficiaries are quantifiable and can therefore be measured using reliable data available through the monitoring and evaluation system, and/or through the collection of primary data as part of this study (or reliable and relevant public data).

3) Interventions where a substantial number of the outputs and outcomes at partner level have been achieved. Indeed, if the outputs have not been achieved and/or the effects on the partners are contested, the likelihood of observing a change in the living conditions of the final beneficiaries is poor or even non-existent.

4) Access to the final beneficiaries is relatively easy. If lists of final beneficiaries are not available and/or the beneficiaries are spread out over a wide area or are in a relatively isolated area, this considerably increases the time spent in the field identifying and/or meeting them and thus considerably increases the budget needed for this study.31

5) Interventions where reliable data is available about the baseline situation through the monitoring and evaluation system, and/or through primary data collection by applying the recall32 technique or using secondary data.33

6) Interventions where a credible counterfactual can be identified and for which it is possible to collect data. It will be difficult to find a credible counterfactual if the intervention concerns 100% of the population, if such an approach raises ethical issues (for the NGO, its partner or the evaluation team) or if there is a significant contagion bias.

4.2.2 Criteria related to the context

The criteria given below present elements of the context favourable to implementing an ex post evaluation. There are two types of context considered: the context at the time of the intervention and the current context. In terms of criteria concerning partner(s), the information was obtained based on interviews in Belgium. This information was then verified through a direct discussion with the field.

At the time of the intervention

1) Relations with the partner worked well (no significant conflict, breach of contract, etc.)

2) No serious security, health, economic, political or environmental event took place during the intervention or after its end that may have substantially altered the achievement of the effects on the final beneficiaries.

At the time of the evaluation

3) The partner(s) is (are) available and cooperating with the evaluation's implementation.

4) The security, health, political, environmental context is conducive to field visits by members of the evaluation team.

5) At the time of the evaluation, most of the effects on the beneficiaries can be observed.

31 This condition could appear in the "added value" type criteria under the 'cost' condition (see later). However, it was decided to put it at this level, because quantitative analyses focus mainly on the final beneficiaries and access to a large number is essential. Furthermore, lists of beneficiaries are also useful for identifying the sample (random walks can be conducted but it is then important that the beneficiaries are not too spread out). For a qualitative analysis, this criterion becomes less important because only a few need to be found and questioned (focus group), which then has just a small impact on the time spent in the field (and therefore on cost).

32 The questions asked rely on people's memories.

33 Many NGOs have baselines, but the relevance and/or reliability of the data collected at this stage often raises questions.



4.2.3 Criteria related to added value

An evaluation's added value relates to its benefits and costs. The benefits to be taken from an impact evaluation are assessed on two levels: (i) lessons learned about achieving the results for the NGO and (ii) lessons learned in methodological terms for the entire NGO sector. Even if there are significant benefits in terms of learning, it is important not to lose sight of the implications of the choice on the financial resources to be deployed in the field. This is why certain costs related to implementing the evaluation system are considered under the final condition in this criteria category.

The criteria that maximise benefits and minimise costs are presented below.

1) Benefits in terms of learning about the effects on the final beneficiaries and the effects at a global level for the NGO (and its partners) are important. This raises the question of what the evaluation contributes beyond the findings of existing evaluations, and of how the NGO will use the findings of this evaluation internally.

2) The benefits in terms of methodological learning for the NGO sector are important. This means that the intervention sector (for example, access to water/environment...) or the type of intervention selected (for example, service provision/lobbying/strengthening civil society or a combination of these types of actions) is relevant for a large number of Belgian NGOs.

3) The costs are reasonable in terms of travel for the evaluation team (for example, the intervention area is not too isolated) and in relation to the costs for translation/cultural differences (for example as with China).34

4.3 Results of the selection process

As a reminder, one datasheet per intervention was created based on the documentary evidence, interviews with NGO representatives and team discussions with the NGO and academic experts. For each of the interventions, this datasheet presented the arguments for or against the conditions under the three criteria categories (criteria related to the use of mixed methods, the context and added value) as described in the previous section.

Based on these datasheets, the team divided the interventions into three sets and their overlaps. Each of the sets represented a satisfied criteria category. In practical terms, an intervention can satisfy the criteria of three, two or a single category. In order to satisfy a criteria category, all the conditions described under that type of criteria had to be satisfied (see section 4.2).
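The division into three sets and their overlaps amounts to grouping interventions by the exact combination of criteria categories they satisfy. A minimal sketch follows, assuming hypothetical intervention labels and results (the actual placement of the 22 interventions is given in the report's diagram, not here).

```python
CATEGORIES = ("mixed_methods", "context", "added_value")


def group_by_region(satisfied_by_intervention):
    """Group intervention ids by the exact set of categories they satisfy."""
    regions = {}
    for intervention_id, satisfied in sorted(satisfied_by_intervention.items()):
        region = frozenset(satisfied) & frozenset(CATEGORIES)
        regions.setdefault(region, []).append(intervention_id)
    return regions


# Hypothetical results: which categories each intervention satisfies.
satisfied_by_intervention = {
    "A": {"mixed_methods", "context", "added_value"},
    "B": {"context", "added_value"},
    "C": {"added_value"},
    "D": {"mixed_methods", "context", "added_value"},
    "E": set(),
}

regions = group_by_region(satisfied_by_intervention)
# Interventions in the central overlap are the candidates for selection.
candidates = regions.get(frozenset(CATEGORIES), [])
print(candidates)
```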

Following this process, a proposal containing the choice of interventions was presented at the SC meeting on 4 December 2014. Then, a workshop involving all the key stakeholders of this study was organised on 16 December 2014. As previously stated, the aim of this workshop was to present the methodological framework and the evaluation process. It was also used to validate the relevance and credibility of the selection process and its result by the NGOs and the DGD managers themselves.

The team suggested that the members present (NGOs and DGD managers) together complete one datasheet for each intervention proposed for this impact evaluation. The team then asked them to place a post-it with their intervention in one of the diagram's sets or overlaps. This exercise was very informative and conclusive: the diagram obtained was exactly the same as the one produced by the evaluators.

34 These costs for translation/cultural differences must be taken into account for all proposed interventions. It is important to be aware that costs can vary greatly for a similar learning process. Given the tight budget constraints, a rational choice should be made in this regard.



This diagram is presented in figure 5. It reveals that 5 of the 22 interventions initially proposed satisfied all three criteria categories. Given that two of them (numbers 15 and 16) were from the same NGO (Vredeseilanden, also called VECO) and in the same country (Indonesia), the suggestion was to select one of the two. VECO chose the rice industry (no. 16).

Figure 5: Diagram facilitating the selection of the interventions

Although these 4 interventions are not a representative sample of the entire NGO sector, their selection offered several advantages, lending credibility to this evaluation for the Belgian NGO sector.

These interventions reflect the actual conditions in which the NGOs work:

- Different intervention durations: Iles de Paix between 2011 and 2013; the other interventions selected were implemented between 2008 and 2013.

- Completed interventions (Iles de Paix, M3M) and interventions extended to the present day (VECO, TRIAS in part).

- Interventions where there are few or no stakeholders in the sector (Iles de Paix, VECO); and other interventions where other donors/NGOs have actions in the same sector (TRIAS in microfinance).

- Minor climate events that are part of the reality of life for the beneficiaries (M3M: typhoon in the Philippines).

They cover the specific features of NGOs:

- NGO project (Iles de Paix, M3M) versus NGO programme (TRIAS, VECO).

- Modestly-sized NGOs (Iles de Paix, M3M) versus NGOs with a higher volume of activity (VECO, TRIAS).

- Different types of partners: municipal authorities (Iles de Paix), farming cooperatives/organisations (VECO, TRIAS), civil society organisations (M3M, TRIAS), private sector (VECO), local NGOs (TRIAS).

- Interventions with different numbers of partners: 2 for Iles de Paix and VECO, 8 for TRIAS and 4 for M3M.

- Affecting various sectors: health (M3M), agriculture (VECO and TRIAS), small animal farming (TRIAS), microfinance (TRIAS), access to water (Iles de Paix).

- Various types of interventions: access to goods/services (Iles de Paix, VECO, TRIAS), partner capacity building (all), lobbying (VECO and M3M, to a lesser extent for TRIAS and Iles de Paix), civil society capacity building (M3M).

- A certain balance in terms of geography and socio-economic context: one intervention in Latin America (Iles de Paix), two in Asia (VECO, M3M), one in Africa (TRIAS). Tanzania is a low-income country, while Peru, Indonesia and the Philippines are middle-income countries, although the intervention areas are generally considered to be low-income.

They provide a sample of the evaluation methods developed within the NGOs:

- Use of qualitative tools to assess the effects on direct beneficiaries (partners) and to a lesser extent to assess certain effects on final beneficiaries: "Most Significant Change" (M3M), "outcome mapping" (VECO), "spider diagram" (VECO and TRIAS).

- Data collected by the NGO about the final beneficiaries (TRIAS and Iles de Paix).

Box 2: Non-representative sample of the interventions of Belgian NGOs

It is important to state that the 4 interventions selected for this impact evaluation are not representative of all the interventions of the Belgian NGO sector. Firstly, only 12 NGOs proposed interventions for this study. Secondly, these NGOs probably submitted interventions where they were relatively confident about the effects produced. And finally, they were selected using relatively strict criteria favouring interventions that went well in the field, and where the monitoring and evaluation system was relatively well developed beyond the outputs (which is often not the case - see Study on the practical evaluability of the interventions co-funded by Belgian development cooperation, 2015).

Furthermore, the civil society capacity building aspect is primarily present in just one of the 4 selected interventions. However, this type of intervention is becoming increasingly common in NGO actions.

4.4 Lessons to be learned from the selection process in terms of ex post impact evaluability

An analysis of figure 5 (see previously) and the datasheets per intervention reveals that:

- Half the proposed interventions satisfied the 'rigorous mixed methods' criteria (11/22). The other half did not meet this criterion for 3 main reasons.

Almost none of the NGOs had a reliable database with enough observations on the final beneficiaries to conduct the impact evaluation as envisaged in this study (see Box 1). The evaluators would therefore have needed to consult a certain number of randomly chosen beneficiaries. However, access to beneficiaries proved complex for most interventions that did not fulfil this criterion (either due to the lack of a list and/or because they were in a relatively isolated area or spread out over a vast region). This in turn made the use of the mixed methodological approach very/too complex.

To a lesser extent, elimination under this criterion was also due to the fact that the expected effects at the level of outputs and outcomes on the partners were only partially achieved (because of implementation problems and/or exogenous events related to the context). This is problematic because non-achievement of the results at partner level reduces the chances of seeing an effect on the final beneficiaries and therefore reduces the benefit of conducting an impact evaluation as defined in this study (see Box 1).35

35 Developing an evaluation system with quantitative analyses is only relevant if there are compelling reasons

to think that there is an effect (hence the importance of developing a good theory of change). Otherwise, in order to prove scientifically that there is no effect or to measure the potentially small effect the number of



The difficulty in identifying a credible counterfactual was only cited twice as an exclusion criterion for the use of rigorous mixed methods. Nevertheless, it should be noted that for all the interventions, the identification of a counterfactual was not easy, mainly due to the ex post situation of the evaluation.

- Just over half the interventions passed the 'context' criteria (12/22). This

finding shows that many NGOs operate in fragile areas. Conducting a rigorous impact evaluation in unstable areas is not infeasible, but it is more complex and potentially dangerous (10/22). The decision was therefore made not to choose such cases for this evaluation.

- Most of the proposed interventions satisfied the 'added value' criteria (18/22). The 4 interventions outside this set presented completely unreasonable costs for this ex post evaluation to have an added value. Only one of these 4 interventions was also outside the quantitative methods set (because it was not easy to find a credible counterfactual). These findings do not however mean that a planned ex ante impact evaluation would not have had added value.

This analysis highlighted three main obstacles to conducting an ex post impact evaluation using rigorous mixed methods. Before presenting these 3 major constraints, it should be emphasised that this evaluation comes ex post with definitions appropriate to what it seeks to evaluate and how it intends to evaluate this. Thus, the NGOs could not anticipate anything.

(1) The first obstacle is the difficulty in accessing the final beneficiaries. This access is vital in achieving the objectives of this evaluation, especially since almost no NGO has reliable data on the number of final beneficiaries. This observation makes random selection complex, although possible with significant resources to reconstruct a list of beneficiaries and/or find them directly in the field. This observation gives rise to another: NGOs concentrate mainly on their M&E system at partner level because monitoring the target populations is relatively complex. There is also another possible reason: NGOs are only partially concerned by the change in the target populations, because the partner implementing the intervention is responsible for such changes.

(2) The second obstacle is related to the first and lies in the lack of resources. For a rigorous qualitative or quantitative approach, it is important to go and question the final beneficiaries and a control population (counterfactual). This is normally almost always possible but not at any price. For a qualitative approach, it is often still affordable. But once household surveys are planned for interventions where there are no lists of beneficiaries and such beneficiaries are spread out over a vast area or in very isolated areas, the cost/benefit ratio then becomes negative. This observation is true in an ex post situation. There is a high probability that this would not be the case in an ex ante situation because the NGO would be able to anticipate this risk.

(3) The third obstacle identified is the risky context. Many stakeholders work in fragile states. Implementing such an evaluation system without exposing the international experts to excessive risks would require working with local experts who have experience in this type of evaluation. This option was not considered and therefore these difficult security situations were excluded from the possible case studies.

observations to be collected increases significantly. If in doubt, it would be better to set up a rigorous qualitative evaluation system, rather than get involved in a quantitative approach, ex post.



5. Responses to the summative evaluation questions for the 4 selected interventions

The budget proposed in the tender provided for two evaluations with primary data collection (household survey type) and two evaluations using secondary data. Given the budget and time constraints of this evaluation and the internal data of the selected NGOs, it was decided to conduct a household survey in Peru (Iles de Paix) and another in Indonesia (VECO).36 In Tanzania, TRIAS had a reliable database containing a significant number of final beneficiaries (200). Furthermore, the evaluators found LSMS data (World Bank)37 available for the pre-intervention, mid-intervention and end intervention periods. In the Philippines, M3M works with a research institute that collects data related to the right to health and strengthening civil society, from which it frequently publishes analyses. Unfortunately, the team could not access this raw data and had to give up on using quantitative data for this case study. The methodological approach for this last case study is therefore based solely on rigorously-applied qualitative tools.

This chapter aims to summarise the evaluation reports written for each of the aforementioned interventions. A summary of the principal findings for the 7 summative evaluation questions is presented for each case study, successively. These questions are described in the following table. A second part was added to EQ2 in order to learn the lessons of the methodological approach implemented for this evaluation.

For more detailed information, it is possible to refer to the evaluation reports for each of the interventions, available in the appendices of this report. It should be noted that at the results presentation workshop on 19 November 2015 the elements of responses presented below also led to discussions between the evaluation team, the NGOs concerned and members of the DGD and SEO.

Finally, it is important to remember that every intervention evaluated as part of this study is one component of one of the NGO's programmes.38 However, the analysis is limited to evaluating the impact of the interventions and not of the programmes.

36 Iles de Paix has a database of final beneficiaries. However, the variables that the evaluators deemed relevant were missing. VECO also has a database of final beneficiaries. However, there are not enough observations (20) to make relevant analyses. In addition, the selection method for the respondents is not random, challenging the data's representativeness and therefore its credibility.

37 This data came from representative household surveys at country level. Data is therefore available for the intervention areas, which can be used to see how the households have changed on average over the project duration. This is used as a counterfactual - although imperfect, it is informative (see below).

38 For Iles de Paix, the intervention evaluated is the only component of the programme financed by the DGD, the other components of the programme are financed using its own funds.



Table 1: Presentation of the 7 summative evaluation questions

EQ1 - Relevance: To what extent is the intervention relevant to the local populations? Is the choice of partner also relevant in providing the best response to the needs of the beneficiary populations?

EQ2 - Evaluability: Is the monitoring and evaluation system set up by the NGO and its partners sufficient in assessing the effects on the different types of beneficiary (outcomes) and the intervention's impact at sectoral and regional level? What lessons are learned from the methodological approach implemented for this evaluation?

EQ3 - Findings on outcomes: To what extent has the intervention achieved the expected and unexpected outcomes? Can these be attributed to the intervention?

EQ4 - Findings on impact: To what extent has the intervention had an impact at sectoral and/or regional level?

EQ5 - Sustainability: To what extent are the intervention's results for the different types of beneficiaries sustainable over time?

EQ6 - Cross-cutting themes: To what extent has the intervention taken into account and had an effect on cross-cutting themes, especially gender and the environment?

EQ7 - Added value: What added value has this impact evaluation using rigorous mixed methods had in terms of findings and methodology?

5.1 Access to drinking water in Peru (Iles de Paix)

5.1.1 Brief presentation of the intervention

▪ This drinking water supply project was implemented in the rural communities in the Santa Maria del Valle and Molino districts in the Huánuco region of Peru over the 2011-2013 period.

▪ Iles de Paix (IdP) directly supported two municipalities, Santa Maria del Valle and Molino. These were IdP's partners for this intervention.

▪ The intervention had two different components:

o In 12 communities, IdP and the town halls jointly financed the construction of new systems to supply drinking water to homes. Households were offered access to a sink with tap, a shower and a hydraulic WC with a drainage system for waste water. These new water systems were managed by a management committee (called JASS). Such a committee was therefore created in each of the beneficiary communities.

o In 14 other communities, there was already a water supply system. However, the management committees were ineffective. With the help of the municipalities, IdP developed the capacities of these management committees so that they could collect the monthly fees, maintain the distribution network and deliver safe drinking water.

▪ This intervention's objectives were to improve living conditions for people and reduce the incidence of gastro-intestinal diseases, particularly among children under 5.



▪ There were 3 types of beneficiaries:

o Direct beneficiaries: the 2 municipalities.

o Intermediate beneficiaries: 26 JASS (including 12 new JASS).

o Final beneficiaries: the residents of the beneficiary municipalities (estimated at 4,750 users).

5.1.2 Methodological approach

▪ Individual interviews and focus group type discussions with:

- Teams from the 2 partner municipalities (old and new team because of elections in November 2014).

- The team from a control municipality, i.e. one that did not receive support from IdP, but which would have been eligible for the intervention and is thus relatively similar to the municipalities supported.

- Members of the management committees of 11 beneficiary JASS and 5 non-beneficiary JASS (also called control JASS).

- Workers (doctors and nurses) at the health centres and facilities in the intervention areas.

- Various representatives from the local governments in Huánuco.

▪ 326 household surveys:

- Approach with counterfactual:
o 172 beneficiary households;
o 154 non-beneficiary households.

- Random selection of communities to be visited and respondents in the villages.

- Reconstruction of baseline indicator values (situation in 2011) where possible. Collection of the indicator values in May 2015, at the time of the evaluation.

- Difficulty in reconstructing a baseline for the control group because its members could not refer to the situation before the intervention since they had not benefited from this intervention. Thus the hypothesis was that the water supply sources for households in the control group did not change significantly over the intervention period. This hypothesis was confirmed by those in the field.

▪ The identified counterfactual proved to be relevant and credible.

- No systematic difference between households from the beneficiary group and the control group, regarding exogenous variables (not influenced by the intervention).

- No systematic difference between these two groups in the pre-intervention (baseline) situation.

- The hypothesis whereby these two groups would have evolved in a similar way over time without the intervention was confirmed by those on the ground.

▪ Analytical tools:

- Difference of means tests between beneficiary households and non-beneficiary households to measure the effects and demonstrate that they can be attributed to the intervention.



- Analysis of the content of the interviews and discussions in relation to the quantitative results found.

- Systematic analysis grids of the management performances of the water management committees.
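As an illustration of the difference of means tests listed among the analytical tools, the sketch below computes a Welch two-sample t statistic between a beneficiary group and a control group. It is a minimal sketch under stated assumptions: the report does not specify which variant of the test the evaluators used, the function name `welch_t` is hypothetical, and the figures are invented for demonstration only.

```python
import math
from statistics import mean, variance


def welch_t(treated, control):
    """Return (difference in means, Welch t statistic, approximate degrees of freedom).

    Welch's variant is an assumption made for this sketch; it does not
    require equal variances in the two groups.
    """
    n1, n2 = len(treated), len(control)
    m1, m2 = mean(treated), mean(control)
    v1, v2 = variance(treated), variance(control)  # sample variances
    se = math.sqrt(v1 / n1 + v2 / n2)  # standard error of the difference
    t = (m1 - m2) / se
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (v1 / n1 + v2 / n2) ** 2 / (
        (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
    )
    return m1 - m2, t, df


# Invented data: hours per month spent fetching water in each group
beneficiaries = [4, 6, 5, 3, 7, 5, 4, 6]
controls = [22, 25, 19, 24, 27, 21, 23, 26]

diff, t_stat, df = welch_t(beneficiaries, controls)
print(f"difference in means: {diff:.2f}, t = {t_stat:.2f}, df = {df:.1f}")
```

A large absolute t value suggests that the gap between the groups is unlikely to be due to sampling variation alone. Attributing the gap to the intervention still relies on the credibility of the counterfactual discussed above.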

5.1.3 Responses to the 7 summative evaluation questions

EQ1 - Relevance

IdP's drinking water supply project was a relevant choice insofar as, in the two intervention districts, a very small proportion of the population had access to drinking water and sanitary facilities. These two districts mentioned access to drinking water as one of the most important needs for the local populations.

In the Peruvian context, making the municipalities partners was also a relevant choice by IdP. These institutions are sustainable and are responsible for building and maintaining drinking water supply systems. Although the elections led to a change in the teams, the institutional strengthening continued, at least partially. Furthermore, having their own financing, the municipalities were able to contribute a large share of the intervention's budget (53.2% for Molino and 54% for Santa Maria del Valle). They are not therefore dependent on external funding to operate and reproduce this type of action in other communities.

EQ2 – Evaluability

Based on the NGO's M&E system

The M&E system is effective insofar as the indicators are relevant, measurable and unambiguous and the teams use it both as a reporting tool and as a way of drawing lessons. The monitoring relates to the intervention's results for the partners and the JASS (intermediate beneficiaries). Information about the final beneficiaries is also collected through the JASS and a household survey conducted by a consultant at the end of the intervention. The indicators relate to outputs and outcomes but not impact. For example, there is no indicator for the change in people's health even though this is an intended impact.

The monitoring is relatively precise but not always systematic, particularly when carried out by the JASS. The use of all the data collected is therefore problematic.

IdP conducted a detailed risk analysis. It notably revealed the possible loss of technical capacity following the change in the municipal teams after the elections. Despite this risk, the municipalities still seem to be the most relevant partners for delivering this type of service to the populations in Huánuco.

Lessons learned from the methodological approach followed for this evaluation

▪ The counterfactual approach was feasible for the municipalities as well as the JASS and the final beneficiaries.

▪ Random selection was relatively easy due to the data files from the NGO's monitoring and evaluation system.

▪ The survey had an almost 100% response rate. Two factors contributed to this: the team of enumerators was made up of locals who were well acquainted with the region; and the training placed a specific focus on how the enumerators should present the study.

▪ The methodology could not be used for a rigorous assessment of the effect of safe drinking water on health. This is partly due to the fact that there was no baseline data at this level and that relying on people's memories for this type of information was not relevant. Additionally, many elements other than access to drinking water influence health in both groups (beneficiaries and non-beneficiaries), such as boiling drinking water. Also, although secondary data is available from the health centres or the Ministry of Health, it is unreliable and not easy to use (coding of the answers, mix of several diseases per code, etc.).

▪ The methodological approach was unable to make a quantitative assessment of the project's effect on people's well-being or their self-esteem; two effects often mentioned in the interviews. Neither did it allow an understanding of how the time saved due to access to the tap and sanitary facilities was used. It would have been possible to develop this type of indicator but this would have significantly increased the preparation and response time for the questionnaire, time that the team did not have. The team had to make a compromise between the questionnaire's length and the quality of the responses received.39

▪ The assessment of the effects on the JASS would have been more rigorous if the team had questioned all the beneficiary JASS and systematically analysed their accounts. This would have allowed an even more precise quantification of the collection rate of the monthly fees from users, a crucial element for the sustainability of drinking water systems.

▪ Finally, the effects on hygiene, cleanliness and the quality of living conditions, often mentioned in the qualitative interviews could have been assessed through the use of photos taken during each survey (using the tablet). However, to do so, the enumerators needed systematic instructions on the elements to be included in a photo, which they didn’t have.
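The random selection of communities and then of respondents within the villages, as described in the methodological approach, can be sketched as a two-stage draw. This is a minimal sketch assuming that a list of households per community is available (here an invented sampling frame); the function name `two_stage_sample` and all parameter values are hypothetical and do not reproduce the evaluators' actual procedure.

```python
import random


def two_stage_sample(households_by_community, n_communities, n_per_community, seed=0):
    """Draw communities at random, then households at random within each one."""
    rng = random.Random(seed)  # fixed seed so that the draw can be reproduced
    # Stage 1: random selection of the communities to be visited
    communities = rng.sample(sorted(households_by_community), n_communities)
    sample = {}
    for community in communities:
        households = households_by_community[community]
        # Stage 2: random selection of respondents within the community
        k = min(n_per_community, len(households))
        sample[community] = rng.sample(households, k)
    return sample


# Invented sampling frame: 12 communities with 30 listed households each
frame = {
    f"community_{i}": [f"hh_{i}_{j}" for j in range(30)] for i in range(12)
}

selected = two_stage_sample(frame, n_communities=6, n_per_community=10)
for community, households in selected.items():
    print(community, len(households))
```

Fixing the seed makes the selection auditable after the fact, which supports the credibility argument made for random selection.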

EQ3 – Findings on outcomes

▪ With regard to the direct beneficiaries (the 2 municipalities), from 2012 to 2014, both municipalities exceeded the objectives set. The skill level achieved by the municipal Social Development services enabled them not only to be effective in supporting the JASS, but also to extend the drinking water network to other villages, well beyond what had been anticipated. They improved their capacities to organise, train and supervise the local water management committees and their capacities to build drinking water supply systems.

▪ Regarding the intermediate beneficiaries (26 JASS).

- A first mixed result is that many JASS fail to collect a high proportion of the fees due. However, our 3 sources of information give different results: (1) a check made by the municipality of Molino in 4 villages shows that in 2014 the JASS only collected 57% of the fees due; (2) a survey of 11 IdP's beneficiary JASS shows that 5 of them face significant challenges with late or non-payment; (3) more optimistically the household surveys reveal that 74% and 83% of IdP's beneficiary users are "good payers", i.e. they had paid the fees for at least 3 of the first 4 months of 2015.

- In terms of the organisation of the JASS: an index constructed from 5 organisation level indicators and applied to 16 JASS (of which 11 are beneficiaries of IdP and 5 are control JASS) indicates that some have become efficient while others are struggling, but all are more efficient than the JASS that have not received support. This analysis shows the importance of the progress made in a short space of time and what remains to be done to strengthen these institutions. It is also noted that the JASS that received support from other institutions perform just as well as those supported by IdP.

- With regard to the water system users' opinion, 34% believe that there is no problem with the community's JASS. The issues most frequently mentioned relate to the insufficient payment of fees (and to penalties that lack credibility).

39 On average a survey lasted 1 hour and 30 minutes.



▪ Regarding the final beneficiaries.

- The household surveys show that the share of the population who said that they had access to more abundant and better quality water increased significantly. The share of the population reporting that they were sometimes without water dropped from 56% to 42% and the number of months during which they were without water fell from 2.6 to 1.7 months. The share of the population reporting that they had access to drinking water rose from 21% to 91%.

- However, water quality in IdP's beneficiary villages is still problematic. There was a marked change in water potability in these villages over the intervention period: 38% of the water samples analysed were judged suitable for consumption in 2012, and this figure reached 71% in 2015. But the fact remains that, on average, in more than one in three cases the water is declared "unsafe" by these analyses. Moreover, even if the water is not always drinkable at source, it is more likely to be so at the point of consumption because there are fewer sources of water contamination: water is less often transported and stored; and it is more often boiled (a rise from 43% to 65% in the beneficiary households).

- The household surveys show that both the hygiene and cleanliness of the houses have increased significantly with IdP's intervention. This is due in part to the use of WCs and showers, especially by the young.40

- We did not find any statistical proof of a causal relationship between safe drinking water and the users' health (especially for children under 5). However, all the interviews conducted, with users, municipalities and health centres confirm an important and short term effect of safe drinking water on health.

- The household surveys reveal that the installation of residential drinking water supplies saves time, particularly for women: 17 hours a month during the wet season and 24 hours a month in the dry season, due to a reduction in the time needed to fetch water. On average, this amounts to over 10 full days a year. We do not know how this saved time is used but, according to the qualitative interviews and field observations, it contributes to an improved quality of life for the beneficiaries.

- The study shows that the results given above can, for the most part, be attributed to the IdP intervention, while remaining cautious about the possible synergies with other stakeholders in the field of hygiene, health and safe drinking water.
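The time savings reported above (17 hours a month in the wet season and 24 hours a month in the dry season) can be converted into an annual figure with a short calculation. The sketch below assumes six wet months and six dry months, which the report does not state; with a different split the total changes slightly.

```python
# Monthly time savings reported by the household surveys (hours per month)
WET_HOURS_PER_MONTH = 17
DRY_HOURS_PER_MONTH = 24

# Assumption for this sketch only: the year splits evenly between seasons
WET_MONTHS = 6
DRY_MONTHS = 6

hours_per_year = (
    WET_HOURS_PER_MONTH * WET_MONTHS + DRY_HOURS_PER_MONTH * DRY_MONTHS
)
full_days_per_year = hours_per_year / 24  # full 24-hour days

print(hours_per_year)      # 246
print(full_days_per_year)  # 10.25
```

Under this assumption the saving is 246 hours, or slightly more than ten full 24-hour days, over a year.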

[Photos: new water system; old water system; without water system]

EQ4 - Findings on impact

▪ There does not seem to be an impact on policies in terms of access to water at Regional Government level. However, there is an impact on policies at Ministry of Health level: IdP helped to obtain practical results on the public sector's commitment to sanitation.

40 It should be noted that this result contrasts with the findings of IdP's external evaluation at the end of the intervention. This may be explained by the time needed for the population to adapt to the more modern facilities or simply by a different way of asking the questions. We are relatively confident in the quality of the responses collected, given the importance placed on training the enumerators and the testing of the questionnaire.



▪ There is also an impact on municipalities, which goes beyond the two partner municipalities. IdP and the two beneficiary municipalities have demonstrated that poor rural municipalities with few resources and means, could do a lot for sanitation and improving the living conditions of their population.

▪ In terms of the extent of the effects, the number of people who now have access to a safe drinking water supply is 15,000 in the two municipalities (which is 43% of the population) including 5,000 who have been impacted directly by IdP.

▪ The project also had an important impact on IdP's legitimacy among the beneficiary populations, the municipalities, the public sector and other NGOs, due to the expertise it acquired in the field of water. IdP therefore enjoys greater capacity for action.

EQ5 - Sustainability

▪ The elements that could have a positive influence on the project's sustainability are:

- Its alignment with the institutional framework of national and local policies (especially the water and sanitation policy);

- The importance given to local resources, both human and financial, in order to prevent external dependency.

▪ The elements that could negatively affect the project's sustainability are:

- The change in the municipal teams every 4 years reduces the technical and social capacities of these teams and therefore their capacities to monitor and support the JASS. The study showed that this support was very necessary. However, all is not lost. The example remains and the trained individuals are still in the region.

- The collection of the fees for water is still inadequate and is largely related to the poor application of the penalties provided for in the JASS regulations. This point is important because the non-collection of fees beyond a critical threshold can quickly lead to a negative dynamic and endanger the maintenance of the supply systems and therefore the quantity and quality of the water available for the population. The JASS are the weak links.
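Since fee collection is identified as critical for sustainability, the two indicators used in this chapter (the share of "good payers", i.e. users who paid at least 3 of the first 4 months, and the overall collection rate) can be computed from simple payment records. The sketch below is illustrative only: the record layout, the function names and the data are hypothetical, and it assumes that the same fee is due from every household every month.

```python
def is_good_payer(months_paid, threshold=3):
    """A household is a 'good payer' if it paid at least `threshold` months."""
    return sum(months_paid) >= threshold


def good_payer_share(payment_records):
    """Share of households meeting the good-payer definition."""
    flags = [is_good_payer(record) for record in payment_records]
    return sum(flags) / len(flags)


def collection_rate(payment_records):
    """Share of all monthly fees due that were actually paid.

    Assumes an identical fee for every household and month, so that
    counting payments is equivalent to summing amounts.
    """
    fees_due = sum(len(record) for record in payment_records)
    fees_paid = sum(sum(record) for record in payment_records)
    return fees_paid / fees_due


# Invented records: one list per household, one boolean per month (4 months)
records = [
    [True, True, True, True],
    [True, True, True, False],
    [True, False, False, False],
    [False, False, True, True],
]

print(good_payer_share(records))  # 0.5
print(collection_rate(records))   # 0.625
```

The two indicators can diverge, as in this invented example, which is one possible reason why different information sources give different pictures of fee collection.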

EQ6 - Cross-cutting themes

Women are the big winners of this intervention, although men and children also benefit from it. It was women who ensured the supply of water, so they have benefited from significant time savings. The qualitative interviews seem to indicate that they feel more valued because of the sanitary facilities (they feel "like city women").

However, women are poorly represented on the JASS. This is partly explained by their lower level of education compared with men.

There is a certain effect on the environment since the water systems must be carefully protected from pollution, especially where the water is collected.

The intervention also had a direct effect on good governance both within the municipalities through training, inspections and monitoring the construction sites and within the villages through greater transparency in the management of the JASS.



EQ7 - Added value

Added value in terms of findings

Iles de Paix, due to its presence in the field and its active involvement in the project's implementation has a very good understanding of what worked well or not so well. This evaluation has not therefore made any great revelation in terms of the nature of its findings. Nevertheless, here are the elements that demonstrate the evaluation's added value for the findings:

▪ The measurement of certain indicators clarified the importance of the change. This includes the measurement of the time saving, the proportion of households who boil their water, the proportion of households who complain about a lack of credibility in the application of penalties by the JASS in the event of late or non-payment, etc.

▪ The counterfactual approach at the level of the JASS has helped recognise that the support given to the water management committees by other stakeholders had allowed the same type of improvement. Thus, IdP has not necessarily done any better than the other stakeholders.

▪ This study shows the importance of specifying certain indicators. This includes an indicator for assessing the collection rate for water fees which is essential in ensuring the sustainability of the water systems. It also seemed to be worth specifying indicators to be collected to judge water quality at the point of consumption (contamination sources, preventative behaviours, etc.). This is particularly important because indicators for measuring the effects on health are complex and secondary data from health centres cannot be used. Therefore, it can sometimes be more relevant to collect indicators on users' behaviours in relation to health (hygiene, quality of the water consumed, food, etc.) in order to deduce the effects on health, instead of directly collecting health indicators which are irrelevant and unreliable and therefore unusable.

▪ Furthermore, the study focuses on the fact that water quality is only one of the factors influencing health and that IdP is not the only NGO working in this field. It is therefore important to put IdP's contribution into perspective regarding improvements in people's health.

▪ According to IdP, the study also further improved the understanding of certain underlying mechanisms in achieving the results.

▪ And finally, the debate about the definition of impact (effects at a more general level) drew the attention of the IdP team. It seemed relevant to them to assess this level of result because it would permit a view of the effects at a more macro level, even if these are not necessarily mentioned when a project is formulated.

Added value in terms of the evaluation methodology

The methodological approach applied for this impact evaluation helped to recognise several interesting practices for monitoring and evaluation.

▪ Firstly, it demystified the counterfactual approach, because it could be applied to all types of beneficiaries. In addition, IdP recognises its benefits because it intends to apply this approach to the municipalities in the monitoring and evaluation system for their new programme. Indeed, IdP finds that this approach provides a better understanding of the mechanisms and allows greater precision in measuring and attributing the effects.

▪ IdP realised the importance of training the enumerators and testing the collection tool when deciding to implement primary data collection.

▪ IdP also commented on the benefit of having a mixed group of enumerators to avoid gender biased responses.

▪ They were also interested in the use of tablets to collect information. This is an interesting technological innovation that IdP will seriously consider.



[Photo: training the enumerators in the content of the questionnaire and the use of tablets]

▪ IdP, although already applying the random selection of respondents to surveys, is even more convinced of the importance of this practice in guaranteeing the credibility of the information.

▪ Following this study, IdP intends to carry out an internal discussion regarding the sample size to be able to measure certain effects with greater precision. This will take place alongside a discussion about combining the two approaches, qualitative and quantitative. Nevertheless, IdP still fears the additional workload that this type of practice may generate for the field teams and the budget that this would represent.

▪ The video tool was also interesting to IdP as a means of communicating the methodology and results; IdP could consider it for providing feedback in the field and thus learn more.

5.2 Supporting the rice value chain in Indonesia (VECO)

5.2.1 Brief presentation of the intervention


▪ The intervention involved providing support to develop a sustainable value chain for marketing rice, particularly organic rice, in the Boyolali district in Indonesia.

▪ The evaluated intervention was implemented over two DGD funding periods, 2008-2010 and 2011-2013. It is still ongoing in this region.

▪ The NGO directly supports private partners: LSKBB (1st period) and then APPOLI. These partners support farmers' groups (intermediate beneficiaries) who in turn coach the farmers (final beneficiaries).

▪ The NGO also lobbies the local governments and, through its partners, conducts organic farming awareness campaigns among consumers.

▪ This intervention aims to influence trade relations for rice growing (price, resources, negotiating power) and improve income, as well as the well-being of rice farmers.


5.2.2 Methodological approach

▪ It was decided at the outset, by joint agreement with the NGO, that this study would focus on measuring the intervention's effect on the farmers' income.

▪ Interviews and group discussions were conducted with the principal stakeholders during the preparatory mission as well as during household surveys.

▪ The household surveys were organised with a random selection of respondents and a counterfactual approach. They concerned:

- 302 beneficiary households (3 types of beneficiaries: farmers converted into organic farmers supported by APPOLI, traditional farmers who are part of producers' groups supported by APPOLI that include organic farmers, traditional farmers who are part of groups supported by APPOLI that do not include organic farmers)



- 188 non-beneficiaries - traditional farmers (i.e. living in the same region, relatively similar to the beneficiary households, but not part of the producers' groups supported by APPOLI and consequently by VECO)

- Reconstruction of pre-intervention (baseline) data for 2008 and collection of post-intervention data for the 2014 agricultural campaign.

[Photo: surveys in Indonesia]

▪ The counterfactual, although imperfect, was relevant. The econometric techniques used allowed the systematic differences between the two groups to be controlled for, at least in part.

▪ Statistical and econometric tools were used to measure the change in the farmers' net income due to the intervention.
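With a reconstructed 2008 baseline and 2014 post-intervention data for both beneficiaries and non-beneficiaries, one common way to estimate a change in income attributable to an intervention is a difference-in-differences comparison. The report does not state which estimator the evaluators used, so the sketch below is an illustrative technique chosen for this example rather than their method; the function name and all income figures are invented.

```python
from statistics import mean


def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Change in the treated group's mean minus change in the control group's mean.

    The estimate is only meaningful if both groups would have followed
    similar trends without the intervention (parallel trends assumption).
    """
    treated_change = mean(treated_after) - mean(treated_before)
    control_change = mean(control_after) - mean(control_before)
    return treated_change - control_change


# Invented net incomes (arbitrary units) for 2008 and 2014
beneficiaries_2008 = [100, 120, 110]
beneficiaries_2014 = [150, 160, 155]
controls_2008 = [105, 115, 110]
controls_2014 = [140, 150, 145]

effect = diff_in_diff(
    beneficiaries_2008, beneficiaries_2014, controls_2008, controls_2014
)
print(effect)
```

Subtracting the control group's change removes trends common to both groups, which is one way of partly handling systematic differences between an imperfect counterfactual and the beneficiary group.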

5.2.3 Responses to the summative evaluation questions

EQ1 - Relevance

Indonesia is a major consumer of rice, but is also a net importer of it. VECO made a relevant choice in focusing on this grain by proposing more efficient and sustainable production techniques. The approach is also relevant because VECO Indonesia uses an integrated approach. In addition to supporting production techniques, actions are also conducted in terms of marketing, influencing policies and raising consumer awareness of organic rice.

However, the choice of partner is questioned in this study. The partner, APPOLI, does not seem capable of providing the best prices or purchasing all the organic rice production. Furthermore, it seems to overestimate the number of beneficiaries or, at any rate, not all the beneficiaries that it mentions are aware of it. APPOLI also seems highly dependent on external funding.



EQ2 – Evaluability

Based on the NGO's M&E system

VECO's logical framework is well designed and most of its indicators are SMART. A relatively detailed risk analysis is also available and several external evaluations have been produced.

However, the indicators are collected based mainly on self-evaluations and self-declarations from direct partners. Indicators for the final beneficiaries are collected through the partners based on surveys with a sample of 20 self-selected beneficiary farmers.

Although most of the indicators are relevant and measurable, this evaluation challenges the rigour of the collection method itself. Indeed, self-evaluation, the self-selection of respondents and the small number of observations can lead to biases in the M&E system and especially on the effects produced for the final beneficiaries. One of the initial findings was that there were errors in the list of the partner's members and therefore in the list of the intervention's beneficiaries.
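The concern about the small number of observations can be made concrete with the standard margin of error for a proportion estimated from a simple random sample. The sketch below compares a sample of 20 (the size of the partner's monitoring sample) with a sample of 302 (the number of beneficiary households surveyed in this evaluation). It assumes simple random sampling and a 95% confidence level, and the function name is hypothetical. A larger sample reduces sampling error only; it does not correct the bias introduced by self-selection.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion from a simple random sample.

    p = 0.5 is the most conservative choice (widest interval).
    """
    return z * math.sqrt(p * (1 - p) / n)


moe_20 = margin_of_error(20)
moe_302 = margin_of_error(302)

print(f"n = 20:  +/- {moe_20 * 100:.1f} percentage points")
print(f"n = 302: +/- {moe_302 * 100:.1f} percentage points")
```

With 20 respondents the uncertainty is around 22 percentage points either way, against roughly 6 points with 302 respondents.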




Lessons learned from the methodological approach followed for this evaluation

▪ The methodological approach developed was highly focused on implementing the survey among producers. This required an enormous amount of time and commitment from the field teams: mobilisation of enumerators, training, translation of the questionnaire into the local language, the difficulty in identifying respondents based on the partner's unreliable lists, etc.

▪ It would have been interesting to spend more time understanding the agricultural issues of rice in the region in order to refine the wording of some questions and therefore give a more insightful interpretation of the results. The contribution of a thematic expert would undoubtedly have allowed this.

▪ The fact that it was predominantly focused on the farmers' income prevented the effects of the programme being studied in terms of the well-being of the rice producers.

▪ The information collected about income could have been more comprehensive (existence of non-farming income, complementarity/substitution between income from different types of rice, calculation of production costs, etc.) but this would have significantly increased the length of the questionnaire and therefore the response time. Strategic choices therefore had to be made.

▪ Although included in the appendices of this main report, the content of the interviews was too infrequently used to explain the results of the statistical analyses.


▪ Effect on farmers' income

- The data analysis does not allow a significant conclusion to be drawn about the effect of the intervention on farmers' income. Although primary data was collected in the field, some monitoring data that would have allowed robust conclusions about income to be drawn was lacking.

- Some analyses show that organic rice producers have a slightly higher income than those who grow traditional rice (weak result). However, this slight economic benefit is not due to a higher sales price for organic rice. The SNI certification does not seem to allow producers to sell their organic rice at a higher price. Furthermore, the production costs for organic rice are lower than those for traditional rice production: the cost reduction is estimated at 2.7% (VECO data).

- Analyses show that there does not seem to be an economic benefit to being part of a producers' group supported by APPOLI when growing traditional rice, but there is when growing organic or "healthy" rice.41

- The methodology developed for this study cannot be used to make a statement about the intervention's effects on the change in the farmers' well-being.

▪ Conversion to organic rice

- The project seems to have influenced a good number of current rice producers to try organic. Indeed, 86% of these producers began to produce organically after 2008. However, the quantity of organic rice produced per farmer did not increase over the period. Although the project had succeeded in slightly reducing traditional rice production, producers continue to produce it, even if they also grow organic rice. The conversion to organic rice is therefore still marginal.

- Furthermore, there is a real risk of the organic rice becoming contaminated. Indeed, it is commonplace for farmers to produce several types of rice on the same plot or on plots close to one another. The monitoring system based on a relationship of trust between APPOLI and the producers' groups is not well-developed enough to prevent contamination and identify with a greater degree of certainty the organic quality of the rice produced.

41 There is a category of rice certified organic and another called "healthy rice". This "healthy rice" category is not certified organic but the production processes are similar to organic processes, at least in part.

▪ Is there a market for organic rice?

- Approximately 50% of organic rice producers do not currently sell their organic rice and do not intend to do so. They produce it for their own consumption. They are convinced that it is better for their health.

- Furthermore, APPOLI does not seem to have the necessary financial resources to purchase all the organic rice production from its members.

- Two producers' groups received an IMO certification (from a Swiss company) and exported organic rice to Belgium. This procedure has not been repeated because it is too costly: neither APPOLI nor the producers' groups have the means to finance it and the authorities are not willing to continue paying for it.

- The projects to export organic rice were finally abandoned when an international inspection revealed that a small amount of the organic rice production was contaminated and therefore unsuitable for export.

▪ APPOLI and its members

- It would appear that the files on APPOLI's members are unreliable. Indeed, for the three sub-districts Nogosari, Sambi and Cimo, APPOLI (and VECO) present a file of 265 SNI producers and 440 ICS producers.42 When selecting the sample, the teams were only able to identify 38 SNI producers and 66 ICS producers in total. The actual number of beneficiaries therefore seems to be significantly smaller (14.3% and 15% of the number indicated for the SNI and ICS producers respectively).

- Most of the farmers questioned were not aware of their affiliation with APPOLI. An explanation given by APPOLI is that only the head of the group is in contact with APPOLI, thus the other group members are not necessarily aware of their affiliation.

- APPOLI is significantly more active with organic rice producers and the beneficiaries are generally satisfied with the support received.

- It should also be noted that the government is involved as a priority, not only with organic farmers but with all producers. The observed effects are therefore partly due to APPOLI's support but not exclusively.
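The shares quoted above for the three sub-districts can be re-derived from the counts given in the text. The short Python sketch below is purely illustrative: the function name `identified_share` is an assumption made for this example and is not part of the evaluation's own tooling.

```python
def identified_share(identified: int, listed: int) -> float:
    """Percentage of listed producers that the field teams could identify."""
    return 100.0 * identified / listed


# Counts taken from the text: producers on file versus producers identified
# when the sample was selected in the three sub-districts.
sni_share = identified_share(identified=38, listed=265)
ics_share = identified_share(identified=66, listed=440)

print(f"SNI producers identified: {sni_share:.1f}% of the file")
print(f"ICS producers identified: {ics_share:.1f}% of the file")
```

Both shares round to the figures reported for the SNI and ICS producers respectively.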

APPOLI Members


▪ Relatively weak impact at regional level. This was not an intended effect when the project was being formulated given the project's limited scope in the region. The 'spillover' effect has not therefore been observed.

▪ Due to the methodology followed, no comment can be made on the effect of the intervention in Boyolali on the entire programme conducted by VECO in Indonesia.

▪ The sectoral impact is more tangible. VECO and its partners have played an important lobbying role and influenced the local authorities in Boyolali. Organic farming (and "healthy rice") is today considered and even supported in various respects by the local authorities.

42 The SNI producers correspond to farmers producing certified organic rice by Indonesian standards. The ICS producers correspond to farmers of 'healthy' rice based on internal inspections ensuring that at least part of the production process satisfies organic standards.



▪ Extension of the programme to 2 other organisations using the same type of model set up with APPOLI. We have reservations about the legitimacy of this extension while the APPOLI model still presents serious shortcomings.


The sustainability of the value chain developed by VECO in the Boyolali district is mixed. The project succeeded in convincing farmers to grow a certain proportion of their rice organically and in raising awareness of its importance at sectoral level. But questions remain about the market for organic rice and APPOLI's role as a key intermediary for marketing this rice. Indeed, APPOLI does not seem to be able to offer the best prices, or even guarantee the organic quality of the product. Furthermore, there is no evidence to date that without VECO's financial support, APPOLI could continue to play its part in the region's rice value chain. The economic model developed remains to be demonstrated.


VECO Indonesia's intervention has a strong environmental objective since organic and "healthy" rice production reduces the use of chemicals and pesticides. This effect is partially achieved given the increased number of producers growing organic rice.

In addition, organic or "healthy" rice producers are convinced of the positive effects of this rice on their families' health. Most organic rice is consumed by the producers themselves.

However, the intervention did not have an impact on the number of female rice growers, which remains just as low. Women are also under-represented within APPOLI, where they hold just 5 positions out of 35.

EQ7 - Added value


▪ The deliberate choice to focus on income, one dimension of the change in people's living conditions, greatly reduced the type of conclusion that can be drawn about all the possible effects of such an intervention. The impact is obviously broader than this economic dimension; it encompasses several dimensions of the beneficiaries' living conditions. The added value of the findings on the total impact of the intervention is therefore low, even though some interesting results have been presented.

▪ Although imperfect, measuring the development in the producers' income over the intervention duration shows that the expected economic effect is not robust.

▪ The difference between the sales price of 'healthy' or organic rice and traditional rice is not proven by the estimates made.

▪ The random selection of respondents allowed the overestimation of the number of members to be detected.

▪ The economic model developed with APPOLI raises several questions.

Added value in terms of evaluation methodologies

▪ VECO is interested in the definitions proposed for outcomes and impact. However, VECO does not use these same definitions and is currently satisfied with the definitions that it uses consistently across its various programmes. It defines impact according to 3 priorities:

- A change in the living conditions of farmers' families (where income is one dimension);

- The capacity building of partners, producers' groups and the farmers who are members of these groups;

- The possibility of extending the programme, the programme's effects on non-beneficiaries.



Based on VECO's definition, this study has only been able to evaluate a single dimension of the impact.

▪ The rigorous approach developed in this study (the random selection of respondents, together with the counterfactual approach and the large amount of data collected and analysed) was able to highlight certain shortcomings with the partner concerning the number of beneficiaries. It also questioned the expected economic effects. However, the effects in terms of the well-being of the beneficiaries were not studied. Although VECO perceives a certain added value in implementing more rigorous evaluation practices, there are still concerns as to the resources required and therefore the cost/benefit ratio of this type of approach.

5.3 Supporting the chicken and sunflower value chains in Tanzania (TRIAS)


▪ TRIAS supported the chicken and sunflower value chains in two districts, Babati and Monduli, in northern Tanzania.

▪ The evaluated intervention covered two DGD funding periods, 2008-2010 and 2011-2013.

▪ TRIAS worked closely with local partners including local NGOs and civil society organisations ("member based organisations") - WEDAC, ACIST, FIDE and Faida Mali. The partners in turn supported farmers' groups by offering several services (various training courses, access to credit, etc.) to local people and the provision of inputs.

▪ The objectives were to improve animal and food production and increase the assets and consumption of poor rural families.


The methodological approach comprises two distinct main steps with a counterfactual approach43.

▪ Two types of secondary data (existing data) were analysed.

- The NGO's database of 200 beneficiary farmers dating from 2013 (end of the intervention). This database contains information about the reconstructed pre-intervention situation and data about the situation at the end of the intervention.

- Public data (LSMS data - World Bank) about households over three separate periods: 2008, 2010 and 2013. By chance, these waves collecting similar data are available for before, during and after the intervention's implementation. The evaluation uses the national population or that of a region (or sub-region) as the control group. By considering the households in the intervention area, the comparison becomes more relevant but statistical accuracy is lost because this greatly reduces the number of observations.

43 The first uses a counterfactual approach with generic controls and the other has a qualitative counterfactual with 'shadow' controls (see Étude de l’évaluabilité pratique des interventions cofinancées par la coopération belge, SEO, 2015).



▪ Various statistical and econometric techniques were used in order to assess the intervention's effect on the households' income, assets and consumption.

Focus group discussions with beneficiaries and non-beneficiaries.

▪ Field visits were conducted following a rigorous qualitative approach. This means that the areas were chosen randomly, but also that the respondents were selected randomly within the beneficiary and non-beneficiary (counterfactual) groups.

- In each region (Babati and Monduli), interviews took place with the local authorities.

- In each of the regions affected by the intervention, two areas were visited. And in each area, four group discussions were organised: two with beneficiaries and two with non-beneficiaries relatively similar to the beneficiaries but who had not benefited from the intervention. Eight group discussions therefore took place (4 treated groups and 4 control groups).


EQ1 - Relevance

Focusing on chicken and sunflowers was a relevant choice. Indeed, chickens are easy to keep, feed and especially to sell. They can therefore be used to provide an income regardless of the time of year and to produce eggs and meat for personal consumption. In addition, chickens require very little water and can therefore resist drought, which is an important benefit in the region in question. Moreover, this type of activity is mainly reserved for women, potentially enabling them to benefit from greater financial independence.

Similarly, sunflower seeds are easy to store. They can be eaten but also processed and then sold for a higher price. Growing sunflowers can also provide a year-round income. Additionally, improved seeds or intercropping can be used to give better yields of sunflowers but also other food crops.

The decision to combine two types of partners (NGO and MBO) for this intervention also seems to be relatively pertinent. The choice to use organisations well-established in civil society (MBO) that support farmers' groups is a way of ensuring the sustainability and 'spillover' effect of the results, because such structures enjoy a certain legitimacy with a large proportion of the population and are therefore trusted by farmers, permitting the effective distribution of good practices. The financial support is beneficial, but they do not need it to exist and continue their work. Local NGOs are effective partners for technical support, but without a financial contribution, they cannot continue this support.



In addition to a relatively sophisticated M&E system regarding the partners, TRIAS has a database with a large number of reliable observations (sample of 200 beneficiary farmers) about the pre- and post-intervention situation. This information can be used to quantify the effects on the final beneficiaries. However, the attribution of these effects cannot be rigorously demonstrated because no counterfactual approach has been implemented.


▪ The re-analysis of the data collected by the NGO identified several small errors and qualified the effect on the income, when correctly differentiating between income and assets.

▪ The counterfactual approach somewhat qualifies the attribution of the effects on income. Indeed, the quantitative analyses show a positive effect on income but the qualitative interviews also indicate an increase in income in the group of non-beneficiaries. Owing to a lack of data, we cannot verify whether this increase is significantly greater for the project's beneficiaries.

▪ The analysis of secondary, public data (LSMS) also provided interesting comparisons between the beneficiaries and the control group over the intervention period, during which there was a severe drought.

▪ Field visits were used to illustrate the empirical results with practical examples.

▪ The counterfactual approach, even in qualitative form (FGD), was possible and provided a wealth of information. In fact, it was able to put certain accounts from the beneficiaries into perspective because the non-beneficiaries confirmed the same changes in terms of income, productive activities and even changes in eating habits. However, good practices only seem to be observed among the project's beneficiary families, hence the likely 'spillover' effect. Thus, even if TRIAS and its partners are not the only ones promoting chicken/hen farming, they seem to do so in a more sustainable way and one that favours the poorest more than other parties involved (often State initiatives).


▪ TRIAS' partners confirm that they have learned much from their collaboration with TRIAS. In addition to the training they have received to develop their capacities, it is working with TRIAS on an almost daily basis that has been beneficial to them and still is today.

▪ The income of the final beneficiaries includes the sale of livestock during the year but not the value of the livestock, which is an asset like land. According to the proposed definition of income, the analysis of the NGO's database indicates that income increased by 23.7% and not 42% as presented by the consultant who collected and analysed the data. Despite everything this is still a good result given the droughts of 2009-10.

▪ The income44 of the control households, calculated using public data, fell in the area affected by the programme, but this was mainly due to the severe drought in 2009-10. According to the people met, without the intervention, this fall in income would have undoubtedly been even greater.

▪ The intervention seems pro-poor. Indeed, the poorest group experienced a greater and more significant increase in income than the least poor farmers over the intervention period. This increase seems to have been possible due to the sale of milk, eggs and chickens and the growth in the sunflower cultivated area. Nevertheless, without a control group in the data collected by the NGO, it is not possible to demonstrate that this effect is due to the intervention. Only the interviews conducted during the field visits can be used to say that the intervention seems to have strongly contributed to the effect observed.

▪ Chicken/hen farming is traditionally an activity for women. This intervention therefore provided an increase in women's income which, according to the statements from men and women, helped to strengthen women's position in the households. Even today such farming is still conducted mostly by women, but it is attracting increasing numbers of men.

44 Income does not include the value of the household's assets. The value of livestock and other animals is not therefore included in this measurement, unlike the value of the derivative products, such as eggs and milk.



▪ Furthermore, it is important to point out that the consumption of eggs and chicken has significantly increased in the beneficiary households, which is having a positive effect on the nutritional value of the food of all family members, especially children.

▪ The trade in chickens/hens also has several benefits because they can be sold locally and at any time of the year. This provides an easy way of obtaining cash to buy consumer goods and pay for healthcare and school fees. This cash is generated predominantly by women, which leads to greater financial independence for them.

▪ It should be noted that these last two effects can also be seen in non-beneficiary households that farm chickens.

▪ Analyses indicate that families who have access to credit have significantly more livestock (measured in tropical livestock units). This effect can be attributed to the intervention, with a good degree of confidence.

▪ The intervention had indirect effects on the hygiene in houses45 and on good practices in terms of animal care (the benefits of vaccinating chickens encouraging the vaccination of cows, the search for more profitable hybrid breeds, etc.). Furthermore, the farmers who succeeded in the beneficiary areas seem to be used as an example by farmers in neighbouring areas. A 'spillover' effect is therefore undoubtedly possible and can be attributed to the intervention.

▪ The project also had the effect of encouraging farmers to increase the areas of their land cultivated with sunflowers. This has led to increased income but only for small business owners (>500 kg of sunflowers). They also process more seeds, which allows easier storage and access to markets at better prices.

▪ According to the analyses, the chickens/hens intervention generally had more positive effects than that relating to sunflowers. TRIAS had already observed this and this undoubtedly goes some way to explaining why subsequent programmes focus on the chickens/hens value chain.


▪ The intervention was proposed in a third of villages in the beneficiary area and was followed directly by more or less 10% of the population. This does not mean that 10% of the population experienced a significant increase in income due to this project, since there were dropouts.

▪ However, there were also effects on the non-beneficiary population (farmers who were very poor and are today successful serve as an example and provide hope for the poorest). This 'spillover' effect was mentioned several times during the field interviews. An accurate calculation of the percentage of the population impacted by this project is not therefore possible.

▪ It is impossible to assess whether the intervention has had effects in the agricultural sector. It is undoubtedly too early to say whether a trust relationship between TRIAS and the chamber of commerce and the local government of Arusha will one day lead to more favourable policies for small farmers. However, it is interesting to point out that the local governments of Babati and Monduli were not really aware of and/or interested in the intervention.


▪ Working closely with MBOs increased the possibility of a 'spillover' effect. Indeed, the MBOs existed before the intervention and will continue after it. They also have a certain legitimacy with a large population and thus promote the exchange of good practices.

▪ Good practices for farming and growing sunflowers can be learned and transferred by the beneficiary population which suggests that the effects will be long lasting.

45 The fact of having learned to build chicken coops means that chickens no longer share the everyday life of the members of the household. This has also reduced the theft of chickens.



This is undoubtedly less true for the 'entrepreneur' aspect of the intervention: it is not certain that many farmers will grow to the size of small companies, whether in chicken farming or in growing sunflowers. There are still many obstacles to accessing credit, as well as other cultural and market constraints.


Since chicken farming does not interfere with livestock farming, it is an ideal activity for women who then gain a certain financial independence. Evidence suggests that the intervention has had an important effect on women: their position within their home has been strengthened due to the fact that they have access to significant income of their own. The situation for children has also improved because of more abundant food which is of better quality. Women also report that they are now more able to pay for their children's school fees and healthcare.

The intervention has had a low environmental impact given that, unlike livestock, chickens need little water and food. Chicken droppings can also be used as fertilizer.

EQ7 - Added value


▪ TRIAS is very active in discussing the assessment of the effects of its interventions. Its members already had a good understanding of the effects of this intervention, through their presence in the field, their monitoring and evaluation system and the study commissioned from a local consultant.

▪ However, the re-analysis of the data interested them in several respects, together with the counterfactual approach regarding the attribution of the effects.

▪ This study concentrated on the importance of distinguishing between income, the value of assets and consumption. In order to learn about income, it is important not to include the value of assets. This partly explains why this study found a smaller increase in income than that found by TRIAS' consultant (23.7% versus 42%).

▪ In addition, this study emphasises that income and the value of assets are only an intermediate step towards growth in consumption and people's well-being. It is therefore interesting that TRIAS is able to capture (and measure) the effects of its interventions in terms of both financial and physical assets and in terms of the consumption, nutrition and well-being of the members of rural families (men, women and children).
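The distinction between income and assets can be illustrated with a small numerical sketch. The Python example below uses invented figures for a single hypothetical household; they are not drawn from the NGO's database and do not reproduce the 23.7% or 42% results. It only shows how adding the value of livestock (an asset) to income can inflate the measured growth.

```python
def growth_pct(before: float, after: float) -> float:
    """Percentage change between a baseline value and an endline value."""
    return 100.0 * (after - before) / before


# Hypothetical household, arbitrary currency units (assumption, not survey data).
# "sales_and_products" covers livestock sold during the year plus eggs and milk;
# "livestock_value" is the value of animals still owned, which is an asset.
baseline = {"sales_and_products": 1000.0, "livestock_value": 500.0}
endline = {"sales_and_products": 1200.0, "livestock_value": 900.0}

# Income only, following the definition used in this study.
income_growth = growth_pct(baseline["sales_and_products"],
                           endline["sales_and_products"])

# Income and asset value mixed together.
mixed_growth = growth_pct(sum(baseline.values()), sum(endline.values()))

print(f"Growth in income only: {income_growth:.1f}%")
print(f"Growth when asset value is added to income: {mixed_growth:.1f}%")
```

In this invented case the mixed measure shows twice the growth of the income-only measure, simply because the herd grew in value over the period.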


▪ TRIAS and in particular, TRIAS Tanzania through its local coordinator, is researching rigorous methods to assess the effects of its programmes at various levels (partners and final beneficiaries). This study strengthens TRIAS Tanzania's view of the importance of the quantitative aspect of an evaluation. But this view is not necessarily shared by the entire institution which instead advocates a qualitative approach based on storytelling and using quantitative approaches for just a few well-defined cases. TRIAS is developing an approach with ‘GDP growth’ and ‘PP Index’ (Progress out of Poverty index). Although interesting, both these approaches somewhat neglect the changes in the less economic components of well-being, which are often less easy to measure, and overlook the counterfactual approach.

▪ The Tanzania-based team asked many questions about the rigorous approach using mixed methods with a counterfactual that it could develop in its new programme. The discussions were interesting, comparing scientific rigour and reality in the field.

▪ Using the NGO's data probably enabled TRIAS to learn lessons about another way to use the data that it has. The exercise also gave the team ideas about other indicators that could be collected (consumption, nutrition, etc.). Secondary public data is also interesting and is worth considering more systematically.

▪ Although in favour of more rigour, TRIAS remains sceptical about the additional workload that such an approach would involve. The team raised the issue of the cost/benefit ratio of such a methodological approach. It is convinced of the benefit of this approach, but only when applied to a minority of projects.

▪ TRIAS insists that NGOs are not research centres and so techniques need to be developed that respond to evaluation requirements without being scientifically rigorous. Although TRIAS finds the counterfactual approach interesting, the NGO is not yet convinced by this approach.

5.4 The right to health in the Philippines (TWHA)


▪ The intervention's aim was to raise awareness among the Filipino people of their rights in terms of health care.

▪ The intervention period evaluated in this study covers two DGD funding periods, 2008-2010 and 2011-2013, and several islands and regions of the Philippines (some areas were beneficiaries during one period, others during both periods) - the project is still ongoing in some areas.

▪ M3M provides direct support to four local partners (Gabriela, Advocates, CHD, IBON). The first three train and support health teams and committees through their local branches on the various islands. These volunteer health workers follow training courses to provide basic care to the population and help them to mobilise and fight for their rights, particularly those related to health.

▪ The objective is therefore to raise awareness and mobilise people so that they are better organised to claim their rights in terms of health. The objective is also to improve access to basic health care (and therefore indirectly improve people's state of health).


▪ Of the four selected interventions, this is the one that most seeks to influence public policies (policy influence) by mobilising society. The expected effects are greater awareness and a change in the attitudes and behaviour of a large part of the population spread over a vast territory. Under these conditions, the effects are difficult to assess with a small-scale household survey (several hundred observations). An insignificant result could indicate either an effect that cannot be detected with the number of observations considered, or no effect. It does not necessarily mean that the effect is not actually happening, because the causal sequence of such an intervention is complex and is certainly not linear. Changes in mentality and behaviour can take many years or even a generation to take place.

▪ It is also difficult to demonstrate changes in the population's state of health without baseline surveys. In addition, data that relies on people's memories ("recall data") is rarely very reliable in this field.
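The point that an insignificant result may reflect too few observations rather than the absence of an effect can be illustrated with an approximate power calculation. The Python sketch below is a minimal illustration only. It assumes a two-sided comparison between two equally sized groups under a normal approximation; the standardised effect sizes and the figure of 150 households per group are hypothetical and are not taken from this evaluation.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-group comparison of means.

    effect_size is the difference in means divided by the common standard
    deviation (a standardised effect). Normal approximation, equal group sizes.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)            # critical value of the test
    shift = effect_size * sqrt(n_per_group / 2)  # expected test statistic
    # Probability that the statistic falls beyond either critical value.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Hypothetical scenarios: a diffuse effect versus a marked effect,
# each with 150 households per group (assumed figures).
small_effect_power = approx_power(effect_size=0.1, n_per_group=150)
large_effect_power = approx_power(effect_size=0.5, n_per_group=150)

print(f"Chance of detecting a small effect: {small_effect_power:.2f}")
print(f"Chance of detecting a large effect: {large_effect_power:.2f}")
```

Under these assumptions a diffuse effect would usually go undetected with a few hundred observations, whereas a marked effect would almost always be detected, which is consistent with the caution expressed above.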



Group discussion with healthcare workers trained by the project

▪ Since the team failed to obtain reliable and usable databases for this study (through IBON), given the complex and intangible nature of the effects of this intervention and the budget constraints of this study (maximum two studies with primary data collection, see previously), it was proposed that a purely qualitative rigorous methodological approach with counterfactual was implemented46. The objective here was to gain the most precise picture possible about the situation of the populations if the intervention had not taken place, through interviews with key stakeholders in the sector, local experts, etc.

▪ The methodological approach comprised:

- Interviews with the M3M office in Belgium and in the field, as well as with representatives of all the partners in Manila.

- Limited interviews with local authorities. In fact, only one mayor was available for questioning. As for the national authorities responsible for health, it was impossible to meet them. Since they were unaware of the intervention and were requesting an arrangement that would lead to costs to meet with them, the team decided to give up.

- A choice that was as random as possible of the 3 areas to be visited for a field mission:

- According to the intervention period, one area supported in 2008-2010, one in 2011-2013 and one over both periods.

- Two mainly rural areas, one urban area.

- One area that had suffered a relatively large typhoon since the intervention.

- In each area, the random choice of two 'Barangay' (municipalities).

- In each of the beneficiary 'Barangay', the organisation of Focus Groups (FGD) with health workers and representatives of the population selected as randomly as possible. Where possible, the team met local authorities and visited public health centres and hospitals.

- In non-beneficiary 'Barangay' (counterfactual approach), the organisation of FGD similar to those organised in the beneficiary 'Barangay'.


EQ1 - Relevance

The intervention is relevant in the context of the Philippines where the health care service, weak and inadequate, advances slowly and is highly unequal.

Supporting people through local partners and giving them the capacities to demand their rights to access health is certainly a more sustainable and more effective action than directly financing the ministry of health. This method of implementation can be used to train motivated volunteer health workers.

46 This counterfactual approach, also called a counterfactual with shadow controls, is the least rigorous scientifically, but even so it provides an argument for or against the attribution of the effects to the intervention.



However, the creation of health facilities alongside state structures that already exist, with few or no links between them, does not seem to be the most relevant decision for achieving the required sustainable effects.



The M&E system is very well defined and structured. It provides for clear reporting from partners and relatively regular field visits. The partners themselves report on the effects at partner level (self-evaluation). The effects on the final beneficiaries are assessed using the Most Significant Change tool.

This tool, although interesting for understanding some of people's concerns and/or potentially identifying certain underlying mechanisms behind the changes, is not rigorous enough to decide on an intervention's effects. The findings are anecdotal, collected by members involved in the intervention and related by respondents chosen on a voluntary basis. In addition, it cannot be used to present the causal relationships.

Furthermore, although the M&E system is well-established, the indicators present problems in terms of both the partners and the final beneficiaries. Many of the indicators are ambiguous and cannot be objectively verified. Indeed, these indicators seem to be measured by the partners themselves without any real monitoring from the NGO.


▪ The rigour of the qualitative approach with counterfactual is only partially convincing on two points (random choice and relevance of the chosen counterfactual).

- Random selection was not easy in relation to the 'Barangay' or the individuals invited to the group discussions.

- The non-beneficiary 'Barangay' were not always perfectly comparable to the beneficiary 'Barangay'.

▪ The geographic coverage of the field visits was limited in comparison to the geographic coverage of the intervention. However, the random selection of areas is an interesting argument for addressing this bias, at least in part. Indeed, why think that the effects are more or less important elsewhere?

▪ Comparisons between rural and urban areas or between interventions of different durations (over one or two funding periods) were not exploited, in the absence of any convincing evidence to be put forward.

▪ The team underestimated the time needed to implement a rigorous qualitative approach (random selection with counterfactual) in areas as spread out as they are in the Philippines, for an intervention that aimed to change attitudes and behaviours and, moreover, worked with several partners. These partners, undoubtedly lacking an understanding of the methodological approach, were perhaps also suspicious of us.

▪ Despite these difficulties related to the relevance of the counterfactual, the geographic coverage, the random nature of the sample and the partners' mistrust, it was possible to implement the proposed methodology, which allowed some interesting information to be collected.


▪ M3M's partners (direct beneficiaries) greatly appreciate the different training courses received under the intervention. However, this study is unable to measure with precision the progress made in terms of management and other types of capacities. Although these organisations train huge numbers of people, the team did not really see any proof of greater technical or financial independence. M3M has worked with (most of) these partners for many years. They particularly appreciate the networking that M3M provides.

▪ For the health professionals (intermediate beneficiaries):

- A relatively large number of people were trained in primary health care as part of this intervention. They seem to be more aware of their rights in terms of health, and therefore better able to undertake initiatives and conduct actions to defend their rights by participating in public demonstrations (observed particularly among the volunteers trained by Gabriela and CHD in the areas visited for this study).

- Progress has been noted in terms of organisation, establishing synergies and networking with the health facilities set up in Palawan (where we saw an intervention conducted by Advocates).

▪ For the final beneficiaries:

- Due to the large number of volunteers trained in primary health care, the population has better access to such care in the beneficiary areas.

- It was also observed that those people trained by the programme have a better knowledge of preventative health care than those living in the non-beneficiary 'Barangay'. They also have a better capacity to organise themselves and defend their rights than people living in the non-beneficiary areas. Women are a priority target of this intervention. These results would be at least partially due to the intervention.

- However, the study was unable to highlight, either during the discussions with the beneficiary groups or the discussions with non-beneficiary groups, a significant improvement in people's state of health, nor a change in attitude with regard to women who are victims of violence.


▪ A large number of health professionals were trained and new health committees set up. This makes a difference in terms of offering basic health care in the beneficiary 'Barangay'. However, the number of these volunteers and committees that do not last or who stop because of their advancing years must not be overlooked. Thus the effect is somewhat put into perspective given the volunteer nature of the activity in a very vulnerable population.

▪ Despite the observation that the populations in these beneficiary areas are better organised to defend their rights to health, the potential influence at policy level is marginal and even non-existent at the moment. Indeed, their actions are quite far removed from any decision-making structure in the field of health. The intervention creates health facilities alongside the state health system (with few synergies) and does not act at regional or national government level. No reform in this field, even locally, can therefore be directly related to their actions.

▪ However, this does not mean that this intervention is without purpose. In fact, raising awareness, changing attitudes and grassroots movements take time to establish. It is through continuous action with the population that there is a chance that one day this will change. Nevertheless, it would be interesting to consider implementing actions with the decision-making bodies for health in order to act on several levels and create greater synergy.


The capacity building of health volunteers trained through the programme is a learned and sustainable element. However, such training is only possible due to external funding, which raises issues in terms of organising new training courses to replace volunteer staff. Given the lack of contact with state health structures, it is unlikely that these structures will provide this training.




The intervention played an important role in women's participation. In fact, most of the trained health workers are women. In addition to knowledge in the field of health, they have been able to acquire good organisational and mobilisation skills (petitions, demonstrations, etc.) normally reserved for men. Furthermore, since women are traditionally responsible for the health of their family, this certainly strengthens the effects in terms of improving health for their families and friends.

EQ7 - Added value


▪ M3M's expectation was to learn how to objectify the effects of an "empowerment" type intervention. The findings are interesting because they indicate a lack of precision and objectivity in the indicators chosen to assess the intervention's effects, but the study only gives a few areas for improvement (this was not the objective of this study).

▪ The objective is to change Filipino society in terms of access to health care for the most disadvantaged groups in the population. M3M and its partners work with these populations but strangely, in parallel to existing state structures. This raises a question about sustainability if external funding stops.

▪ Even if the effects of this type of intervention are difficult to quantify, it does not mean that it has no purpose.


▪ The discussion groups (FGD) were used to collect many accounts from both beneficiaries and non-beneficiaries of the intervention. These respondents were chosen as randomly as possible and listened to by individuals external to the programme. This provided a slightly more considered view of the intervention's effects. M3M recognised the possible biases in the positive findings of the changes revealed by the Most Significant Change tool as currently implemented. The NGO concluded that this is a learning tool, rather than an evaluation one.

▪ The way in which the team reconstructed the intervention's logical framework was also interesting for M3M. The reconstruction of this logical framework highlighted several problems with the indicators (not relevant and objectively verifiable enough). This step therefore enabled them to realise the importance of providing a better explanation of (i) the causal sequence of their intervention; and (ii) the expected impact.

▪ M3M considers that the methodological approach was implemented on too small a scale to be convincing. And although the counterfactual approach is interesting, M3M finds it biased for this reason.

- We admit that the geographical coverage is small and that the counterfactual is not perfect. Nevertheless, by chance, the chosen areas have not proved to be very different from the other intervention areas. The findings would probably not have been very different if the decision had been made to visit other areas. However, the findings would have undoubtedly been richer and easier to generalise with a larger sample.

- Furthermore, even if the counterfactual is not perfect, it can be used to qualify certain findings.

▪ A rigorous qualitative approach with a better prepared counterfactual in the field would certainly have been more convincing. It should also be noted that only a large survey of households could objectify the effects of such an intervention. This type of approach is outside the framework of this evaluation and enters the realms of academic research.

▪ M3M also raises questions as to the sufficiently ex post nature of the evaluation for this type of intervention ("empowerment"). It is not easy to judge the sufficiently ex post nature of the evaluation. But it is clear that the effects of this type of intervention are not linear. The theory of change is undoubtedly a better tool than the logical framework for accounting for the complexity of this intervention and better explaining the underlying hypotheses behind the expected results.



6. Responses to the formative evaluation questions

6.1. EQ1 - Definitions of impact

Box 3: Summary of the responses to the first formative EQ - The debate around the definitions of impact

EQ1 - Definition

There is a semantic debate surrounding the definition of the term 'impact'. The team proposed a definition (see section 3.1) in order to define the scope and subject of the evaluation. However, at the end of the exercise, it is interesting to summarise how the participating NGOs define the impact of their intervention and consider the following two questions:

▪ Do the participating NGOs agree that there is a debate around the definition of impact? What do the NGOs think of the definition proposed as part of this study?

▪ How does the lack of a clear definition of impact influence the evaluation of development actions?

Key findings

There are several definitions of the impact of an intervention in the literature. The Belgian legal framework does not solve this semantic debate as it does not provide a clear definition of the impact of development cooperation interventions in Belgium.

Thus each stakeholder, including the NGOs, uses its own definition of impact. Several aspects of this level of results are mentioned: sustainable effects, long-term effects, effects on the final beneficiaries, 'spillover' effects and effects on a sector.

As a reminder, this study proposes a clear distinction between outcomes and impact:

- Outcomes relate to all the effects of an intervention on the beneficiaries (from partners to final beneficiaries);

- Impact concerns an intervention's effects at a more general level (proportion of the population affected in a certain geographic area, influence at sectoral level, recognition/legitimacy of the NGO and its partners, consequences at programme level).

Although development stakeholders in Belgium deem this definition interesting, there is no consensus about it. They do however perceive two advantages in it: (i) the definition of outcomes encourages them to be more explicit about the expected effects on the final beneficiaries and the causal relationships with the development action; and (ii) the definition of impact highlights the more general effects, not necessarily sought by the NGOs, but which are important in terms of the longer-term view of the intervention's effects.



On the one hand, there is a group of development stakeholders who do not have a shared vision about the definition of the impact of their actions. On the other hand, these same stakeholders do not feel concerned about evaluating this level of results. They consider that this goes back to their partners, responsible for implementing their actions and that there are multiple factors, external to the intervention, influencing this level of result. Furthermore, there are no (or few) incentives from donors to account for this level of results. Hence, they do not systematically evaluate the effects of their actions for the populations (final beneficiaries) and more generally. Yet, this is precisely where the ultimate objective of development assistance lies: the improvement of people's well-being.

We believe that these two observations are partly related. The lack of a shared vision about the definition of impact and therefore about what changes are pursued leads to confusion as to the evaluation's purpose and the methods to be used to account for aid effectiveness. In other words, a shared vision of the definitions of the different levels of results, outcome and impact in particular, is essential for defining the requirements in terms of results-based management and consequently in terms of accountability and aid effectiveness. A precise and consistent definition of the results to be evaluated would enable stakeholders (NGOs) to formulate more tangible and measurable objectives relating to the beneficiaries and at a general level. This would also encourage the development of relevant methodological approaches to evaluate them.

The response to this first formative question begins by setting out (i) the different definitions of impact. Secondly, it (ii) presents the various understandings of impact of the participating NGOs. Then, it (iii) recounts the opinion of the participating NGOs about the definition proposed as part of this study. And finally, it (iv) presents how the absence of a common definition of impact is a problem when evaluating development actions.

It is interesting to note that many development cooperation and evaluation practitioners use the term 'impact' without giving it the same meaning. This debate began during the impact evaluation of 4 Belgian development cooperation projects (2013). This semantic question arises again here, and below we cover the main elements of the literature and the content of the discussions that took place with the NGO sector at the beginning and end of this study.

Notions from the literature

In the synthesis report of the ex post evaluation of 4 governmental cooperation projects (2013, p.22), the authors present definitions for the concepts of the logical framework according to several stakeholders (the BTC, the SEO, the DAC and the academic world) and give an account of the issue concerning bilateral cooperation projects. Three notions arise from these definitions:

1) The notion of the temporality of the effects. The DAC defines impact as the long-term effects, while other stakeholders do not refer to this temporality.

2) The notion of the level of results ("output-outcome-impact"). According to the 'logical framework' approach, impact represents the overall objective (for a population, a region, a country, a sector), while outcomes correspond to the specific objective, the effects of using the outputs.

3) The notion of the intervention's contribution to the changes or the attribution of the changes to the intervention. The notion of contribution is present in the definition of impact, while the notion of attribution appears in the definition of outcomes - "effects resulting from the use of the outputs". According to impact evaluation practitioners in the academic world, impact evaluation means measuring the effects on the direct beneficiaries that can be attributed to the intervention.

The recent article entitled "Realist Impact Evaluation" (September 2014)47 gives the following definitions for the terms "outcome" and "impact":

Outcome is defined as short, medium and long term changes for different beneficiaries (government, organisations, institutions, workers, population, etc.), intended and unintended, resulting from an intervention;

Impact is defined as short, medium or long term changes for the final beneficiaries who are the local populations, intended and unintended, resulting from an intervention.

This article highlights the notion of different types of beneficiaries, while retaining the notion of attribution.

1) The notion of different types of beneficiaries: according to this article, impact must be measured in terms of changes in the lives of the local populations regarded as the final beneficiaries.

2) The notion of attribution of the effects is present because the authors talk about changes for the beneficiaries resulting from the intervention.

This article is interesting insofar as it addresses the notion of an intervention's effects (outcomes and impact) in a setting that reflects the reality for non-governmental stakeholders who target different types of beneficiaries. However, it ignores the existence of the intervention's possible effects at a more general level, that is, at sector, regional or even country level, although these effects are not necessarily explicitly sought by the NGOs.

This article also focuses on two important objectives of impact evaluation:

(i) Highlighting the mechanisms of change in the context of the intervention;

(ii) Demonstrating the causal relationships between the changes observed and the intervention (attribution).

In short, an impact evaluation should respond to "what works for whom, in what context and how" ("Realist Impact Evaluation", 2014, p.4).

Notions from discussions with the NGO sector

It is not surprising to note that the confusion about the interpretation of the term 'impact' existing in bilateral cooperation is found in the NGO sector. Indeed, from the documents supplied by the NGOs involved in this study and the interviews with these 12 NGOs, it emerges that the term 'impact' does not mean the same thing to all these stakeholders.

Before categorising the definitions proposed by the NGOs, we must present three elements reflecting the complexity of the context.

The institutional set-up is the first element that influences the definition of the term 'impact'. For NGOs with a 'project' approach, impact theoretically corresponds to the overall objective of their intervention. However, for those with a programme approach, the impact of an intervention X in a country Y corresponds to one element of a specific objective. There is another scenario when an NGO receives funding from various donors for the same intervention. Consequently, this means that the same effect of an intervention may be classified under the term 'impact' in the logical framework for one donor and under the term 'outcome' in that of another donor (often the case for interventions that have DGD and European funding).

47 This article was written by Gill Westhorp and is published by Method Lab, “an action-learning collaboration between ODI (Overseas Development Institute) and the Australian DFAT (Department of Foreign Affairs and Trade)”: www.ODI.org/methodlab

The second notable element arising from discussions with the NGOs is the lack of incentives to evaluate impact, or an institutional 'belief' that impact is a level of effect whose measurement falls outside the scope of the NGOs' evaluations. Since they do not have the resources and/or reasons to evaluate the impact of their interventions, the NGOs present effects at the 'impact' level of result in a way that pleases the donors, while being aware that these are sometimes exaggerated, often vague and cannot be measured. The finding is that they do not have the incentives to define the impact precisely, with SMART indicators adapted to the different types of targeted beneficiaries. This was also one of the observations of the Meta-evaluation of the Programmes of Non-governmental Stakeholders (2013).

The difference between the funding cycle and the intervention cycle is the third element: it may have consequences for the way in which NGOs define impact. One NGO may have a definition of the impact that it wants to achieve after its presence in the field but may have a different definition for the impact as specified in its funding dossier (often much shorter).

To the question: "evaluating the impact of intervention X that you have proposed as part of this study, what does that mean for you exactly?" the NGO representatives often answered "mmm, good question!". Based on the answers received, four definition categories for impact emerge.

1. Impact corresponds to the sustainable effects of the intervention for different types of beneficiaries.48

▪ Some NGOs specify that impact must be assessed at the level of the sustainable effects for the capacity building of their partner(s) (direct beneficiary), without necessarily concerning themselves with the effects on the living conditions of the populations.

▪ Others state that it is, primarily, the sustainable effects on the living conditions and/or change in mentality of the final beneficiaries (the population) and/or the intermediate beneficiaries49 (the service structures used by the final beneficiaries) that must be assessed, without necessarily paying attention to the effects on the partner.

▪ A minority consider impact as being the sustainable effects of their intervention at both these beneficiary levels (partners and population).

2. Some declare that impact corresponds to the effects on the final beneficiaries. They say that these effects can be expected or unexpected, directly or indirectly related to the intervention, but must be tangible and have meaning for the final beneficiaries. They can therefore sometimes be different from those specified in the initial logical framework for the project.

3. Others focus on the existence of external factors related to the intervention's effects. Some call this 'the spillover effect', i.e. to what extent the intervention's effects also have repercussions on the project's non-beneficiary populations and/or inspire certain stakeholders in society. For some, this 'spillover effect' must therefore be considered when analysing the impact.

4. Impact corresponds to the overall change observed in a sector in which the NGO is one stakeholder among many. Some suggest that in addition to assessing the changes in the sector, the impact evaluation should consider the NGOs' contribution to the changes observed, since they are rarely the only stakeholders in a sector.

48 Several NGOs admit that for them, the notion of impact is intimately related to the ex post notion; for some NGOs "to assess impact is to assess the sustainability of the outcomes".

49 For example, farmers' groups, microcredit institutions, water management committees, etc.

Five notions therefore arise from these discussions with the NGOs:

1) The notion of temporality of the effects;

2) The notion of different levels of beneficiaries;

3) The notion of 'spillover' effects;

4) The notion of effect at sectoral level; and

5) The notion of contribution.

It should however be noted that the notion of measurement is not clear at the level of impact, and even less so the notion of the attribution of the effects to the intervention. Several stakeholders admit that they lack confidence in the data available and therefore the accuracy of any measurement. As for demonstrating attribution, few people met were aware of the existing techniques for this type of analysis.

Similarly, NGOs seem to have low expectations of an impact evaluation's capacity to rigorously identify the factors and mechanisms promoting or inhibiting an intervention's effects on the final beneficiaries. One element that may explain this finding is that the NGOs, working in the field and/or closely with their partners, understand the context and mechanisms at work within the framework of their interventions.

Nevertheless, a rigorous impact evaluation provides additional precision to the NGOs' knowledge acquired through their presence on site. It may quantify the explanatory power of certain factors and mechanisms promoting or inhibiting the effects of an intervention.

Box 4: The learning process central to the evaluation

Regardless of an evaluation's objective, the notion of learning is central for the NGOs.

In this impact evaluation, which concentrates on an intervention's effects on the final beneficiaries and its effects at a more general level (proportion of beneficiaries in the intervention area, influence at sector level, strengthening of the legitimacy of the NGO and its partners, etc.), the notion of learning focuses on (i) measuring the effects and mechanisms that have or have not led to the expected/unexpected effects, and (ii) the rigorous methodologies to be developed to achieve this.

This type of evaluation can therefore be used to measure and/or assess an intervention's effects with greater precision, while highlighting the process of change. It therefore allows the questions "What works? Why? How?" to be answered.

Do the participating NGOs agree that there is a debate around the definition of impact? What do the NGOs think of the definition proposed as part of this study?

Generally, the participating NGOs admit that there is a semantic issue surrounding the term 'impact' although this does not cause them any practical problems when formulating their interventions. Indeed, the expected impact is often formulated in a relatively vague but attractive way, in line with the type of intervention implemented.

The proposed definition (see section 3.1), which clearly distinguishes between the effects on all types of beneficiaries (outcomes) and the effects at a more general level (impact) in the intervention logic, corresponds somewhat with the vision of the theory of change50. Indeed, the logical framework, reconstructed ex post using these definitions, highlights the causal sequence between the different stages of the process of change and includes the hypotheses and elements of the context which have positively or negatively influenced the achievement of the intervention's results.

Although the majority of participating NGOs find the proposed definition interesting, they do not see any particular benefit in using it. Nevertheless, two observations clearly emerge from the discussions with the NGO sector:

1. With regard to the proposed definition of the outcomes (all the intervention's effects on all types of beneficiaries), they recognise that this type of definition could help to present the causal sequence of their actions more explicitly. Indeed, they do not currently focus sufficiently on the explicit links between the expected effects for the partners and those expected for the intermediate and final beneficiaries. This is also true for the causal relationships between the expected effects for the same type of beneficiary. This finding is undoubtedly related to the current reality of the type of monitoring and evaluation that the NGOs can set up. Greater attention is paid to the effects on the partners. The assessment of the effects on the final beneficiaries is less rigorous and is often done through information sent by their partners, with different degrees of precision and reliability. Although the NGOs agree that the ultimate objective of interventions is to have a positive effect on people's well-being, relatively few resources are dedicated to measuring these effects. These findings must be linked to the legal framework which strongly emphasises the notion of partnership and the capacity building of local partners, without explicitly mentioning the final beneficiaries.

2. In terms of the proposed definition of impact (overall effect of the intervention at geographic, sectoral and NGO programme level), all the participating NGOs indicate that the overall effect, as defined, is not or is rarely explicitly sought by their interventions. According to them, the interventions are implemented on too small a scale or over too short a period to have this type of influence (at least over a funding cycle). Some recognise that it would however be interesting to consider this type of effect, while others do not see its benefit.

How does the lack of a clear definition of impact influence the evaluation of development actions?

The notion of definition is pivotal to any evaluation. In order to produce a quality evaluation, firstly its objectives must be defined and then the evaluation's purpose must be determined in line with those objectives. The impact evaluation of development actions is used to account for aid effectiveness, and aid effectiveness is judged by evaluating the effects of the interventions on the beneficiary populations as stated in an OECD declaration, "...the true test of aid effectiveness is improvement in people's lives." (2006 Survey on Monitoring the Paris Declaration: Overview of the Results, OECD, 2007).

However, in practice, both the objective of an impact evaluation and the definition of impact are subject to different interpretations depending on the development cooperation stakeholders. On the one hand, we have the Belgian development stakeholders51 without a shared vision of the definition of impact. Impact is also often an ambitious objective, but relatively vague and difficult to measure.52 On the other hand, these same stakeholders only feel slightly concerned about assessing and measuring impact. Evaluating impact is not therefore common practice in the development assistance landscape in Belgium.

50 http://www.theoryofchange.org

51 This is not unique to the NGO sector.

52 In terms of a programme's impact, it is even more ambitious and abstract than the impact of a component of the programme, making any rigorous evaluation almost infeasible.

The participating NGOs raised several factors that could explain this finding.

- NGOs work closely with partners, responsible for implementing the interventions. The NGOs therefore feel responsible for the actions that they conduct for their partners, but relatively less responsible for the actions that their partners conduct for the final beneficiaries.

- They believe that many parameters that are external to the intervention influence this level of result.

- They find that they do not have the means (resources and expertise) to evaluate this level of result.

- There are no incentives for the NGOs to evaluate impact. They have very little guidance in terms of evaluation from the DGD. This was highlighted in the Meta-evaluation of the Programmes of Non-governmental Stakeholders (2013). "Great freedom is given to the NGOs in terms of evaluation with practically no involvement of the DGD. Indeed, the DGD's regulatory framework leaves it to the NGO to define its own approach for evaluating its interventions. The DGD's regulatory framework does not therefore provide any special guidance regarding the types of reports expected, the methodologies to follow, whether there is a need to report on the achievement of its expected results, the involvement of the DGD, sending reports, etc. It implicitly refers to external evaluation."

Further to this study, we believe that accounting for aid effectiveness at country level is only feasible if the stakeholders share a common vision of the definition of the different levels of results. This common vision is vital for then defining the requirements in terms of evaluating these different levels of results. A consensus on the definition of the results to be evaluated would encourage the NGOs to formulate more tangible and specific objectives for the final beneficiaries and at a more general level. Evaluation methods could then be developed to satisfy the requirements in terms of evaluating these results.

Indeed, although it is important to evaluate the results at partner level, it is just as important, and even more important in terms of aid effectiveness, to assess whether the intervention has been able to improve the lives of the targeted populations. In order to account for aid effectiveness, the implicit hypothesis must be verified, namely that if the partner has been well chosen and the intervention was relevant and went as planned, the effects on the final beneficiaries will be achieved.



6.2. EQ2 - Rigorous mixed methods

Box 5: Summary of the response to the second formative EQ – Rigorous mixed methods

EQ2 - Rigorous mixed methods

How feasible, relevant and affordable is it to apply rigorous mixed methods for assessing the effects (outcomes and impact) of the interventions of Belgian NGOs?

▪ What are the NGOs' practices for evaluating the effects of their interventions?

▪ Is it feasible to apply rigorous mixed methods?

▪ What is the added value of evaluating an intervention's effects with rigorous mixed methods? Are these methods relevant for the different types of intervention supported by the NGOs?

▪ What is the cost/benefit ratio for these rigorous evaluation methods?

Key findings

Before developing this question, it is important to state that the use of the word rigorous is not associated with a value judgement. It refers to its use in the field of scientific research, whereby a rigorous approach means an analysis framework which can be used to formulate credible findings and conclusions.

The NGOs selected for this study apply qualitative and quantitative approaches in their M&E system by combining them to various degrees. The mixed methodological approach is therefore a reality for these NGOs. However, the application of these tools often lacks the rigour to be able to measure, with some degree of precision, the results that can be attributed to the interventions and explain their underlying mechanisms.

As much as the tool (qualitative or quantitative), it is the level of rigour in the system for collecting and analysing the information that influences the level of reliability and relevance of the findings.

The mixed methodological approach implemented for the ex post evaluation of the 4 selected interventions is rigorous in several respects.

(1) It defines the concept of outcomes and impact and focuses on evaluating the effects on the final beneficiaries and more generally.

(2) It reconstructs, ex post, the underlying theory of change behind the logical framework. The causal sequence is therefore highlighted from the inputs to the effects on the final beneficiaries, including the effects at partner level. The context is also considered. Relevant indicators are defined at the different levels of results and for all the intervention's beneficiaries.

(3) Within the time and budget constraints of this study, it develops a mixed approach combining complementary qualitative and quantitative methods. In order to ensure the reliability of the data collected, these approaches were rigorously implemented. The quality of the system for collecting and analysing data was guaranteed by a counterfactual approach, through the random selection of respondents, the development of collection tools tested and implemented by a competent and motivated field team as well as through the necessary quantitative expertise.

The implementation of the rigorous mixed approaches considered within the framework of the 4 case studies did not raise any major problem. Although it was not always easy to identify a credible ex post counterfactual and randomly select the respondents, this was possible.

The selected NGOs identify several benefits to applying rigorous mixed methods (see EQ7 in Chapter 5). This has helped to demonstrate the importance of evaluating the results for the final beneficiaries (and not solely at partner level) and, more generally, the usefulness of clarifying the causal sequence between the different levels of results and between the types of beneficiaries; to provide precision when measuring certain effects; to demonstrate the attribution (or not) of the effects; to highlight certain unexpected effects, whether at partner level or on the final beneficiaries; and to provide a better understanding of the underlying mechanisms behind the changes. These benefits were identified even in sub-optimal situations for a rigorous ex post evaluation, namely: an imperfect counterfactual, baseline data using recall data, a relatively small sample size, etc.

Therefore, this rigorous mixed methodological approach seems relevant for the selected interventions. For interventions that aim to bring about changes in policy by developing the capacities of civil society ('policy influence' type interventions), it is certainly still relevant, provided that the evaluation's temporal and spatial dimensions are adapted to take account of the specific features of this type of intervention (non-linear effects, the observability of the effects, etc.).

Although such a methodological approach is feasible and relevant, the participating NGOs often raise the issue of its cost/benefit ratio.

This ex post evaluation certainly mobilised considerable resources, probably more than if the methodological approach had been considered and implemented from the start of the intervention. Given the ex post nature of the evaluation, some shortcomings in the existing monitoring and evaluation systems had to be overcome in order to be able to measure the effects that could be attributed to the intervention on the final beneficiaries, and understand their underlying mechanisms.

A more rigorous M&E system would increase costs but probably not significantly. This could lead to a positive cost/benefit ratio because any additional costs due to a more rigorous approach in the M&E system would probably be covered by the benefits generated due to improved aid effectiveness. This is true if there are incentives to develop a rigorous and joint evaluation framework for the results on the final populations. The resulting evidence would promote the accumulation of knowledge on the processes of change and therefore allow improved aid effectiveness.

After this study, the rigorous mixed methodological approach seems affordable. However, evaluations using a rigorous mixed approach are much more profitable if they are designed at the time of implementation, if all stakeholders take ownership of the concepts of rigour and if they are incentivised to conduct a rigorous evaluation of the effects on populations.

This second formative evaluation question (i) summarises the practices of the participating NGOs in terms of monitoring and evaluation, (ii) illustrates the NGOs' perceptions of the added value of the impact evaluation and questions (iii) the feasibility and (iv) the cost/benefit ratio of the use of rigorous mixed methods in evaluating the actions of stakeholders in the Belgian NGO sector.

The objective of this study was to verify how the application of mixed methods would help to improve the evaluation of interventions by Belgian NGOs. However, from this evaluation, it appears that it is not the application of one or another method which seems to raise questions but the level of rigour in the applied methodology. Much more than the qualitative or quantitative tool, it is the rigour in the application of the data collection and processing methods that influences the level of reliability and credibility of an evaluation's findings.

Before going further, it is important to state that the use of the word rigorous is not associated with a value judgement. It refers to its use in the field of scientific research, whereby a rigorous approach means an analysis framework which can be used to formulate credible findings and conclusions.

What are the NGOs' practices for evaluating the effects of their interventions?

The analysis of the documents received from the volunteer NGOs for this study and the interviews with representatives from the 12 NGOs concerned reveal different evaluation cultures within these organisations. It was also noted that they are all constantly discussing ways to improve the monitoring and evaluation practices for their actions.

The 12 NGOs participating in this study have relatively well-developed and effective M&E systems for assessing the outputs of their interventions. At the level of outcomes, this is frequently limited to the expected results at partner level. The results for the final beneficiaries are often monitored and assessed by the partners, with a relatively small involvement from the Belgian NGO. With regard to the impact, few NGOs try to assess this53.

Furthermore, the results indicators, whether for specific objectives or the overall objective, are rarely well-defined. They are often highly qualitative and little information is given on how they have been collected and measured. This information is important in judging the rigour of the applied methodology and, ultimately, the reliability of the data supplied. In addition, the selected indicators are not always relevant for assessing the level of result to which they refer (for example, confusion between output and outcome indicators).

The tools developed by the participating NGOs are mostly qualitative. Two of the selected NGOs use "Most Significant Change"54 and "Outcome Mapping"55 as tools for evaluating the effects of their interventions. Although these tools are interesting in terms of providing a better understanding of the context and the possible effects of the intervention, the results generated are not very objective.

Indeed, in the case of the Most Significant Change, the accounts are collected from beneficiaries selected on a voluntary basis and the findings remain relatively anecdotal. As for Outcome Mapping, this mainly affects direct stakeholders (local partner NGO, local governments, farming organisations, etc.) and not the final beneficiaries of the development action. Furthermore, the definition of outcome is often still vague and does not allow the expected end results for the beneficiaries to be outlined. Using this tool, large amounts of information are collected without however being analysed.

53 The reasons explaining such a finding were given in EQ1.

54 http://www.mande.co.uk/docs/MSCGuide.pdf

55 http://www.outcomemapping.ca/download/OM_English_final.pdf

The other qualitative tools used are group interviews with key stakeholders. The NGOs appreciate the participatory nature of the evaluations and therefore often ask the partners to contribute, although the final beneficiaries are rarely asked to take part in the evaluation process. Once more, participation in these discussion groups is mainly on a voluntary basis.

Fewer quantitative tools are developed by the participating NGOs but they do exist. Large numbers of NGOs have quantitative data about their interventions but this data remains relatively unused in the evaluation reports for various reasons such as a lack of confidence in the data collected, the lack of capacity internally and/or among the consultants recruited to exploit this data, the fact that the data remains at partner level, a lack of financial resources and time to exploit it, etc.

Three of the four NGOs selected for this study have quantitative data from surveys with final beneficiaries that they use to evaluate the effects of their interventions. The evaluation team made three main criticisms about the databases that it could see:

(i) The methodology for collecting this data is not explained clearly enough (how was the questionnaire developed? How were the respondents selected? Who conducted the surveys? Is the methodology the same throughout the intervention? etc.). There is a doubt as to the reliability of the available data.

(ii) There is not always enough data to produce statistics (at least one hundred observations are needed for this to become possible).

(iii) There is also a real concern about the quality of the data about the pre-intervention (baseline) situation. This data is not very detailed and does not necessarily correspond with the data (indicators) collected at the end of the intervention. This makes the before/after comparison difficult, although this limitation is not explicitly stated.
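The order of magnitude behind criticism (ii) can be illustrated with a short calculation. The sketch below is not taken from the evaluation: it assumes simple random sampling and a yes/no indicator, and uses the standard normal approximation for a proportion. The function name and the sample sizes other than one hundred are hypothetical choices made for the illustration.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate half-width of a 95% confidence interval for a proportion.

    Assumes simple random sampling; p=0.5 is the most conservative case.
    """
    return z * math.sqrt(p * (1 - p) / n)

# Illustrative sample sizes (only n=100 echoes the threshold quoted above).
for n in (30, 100, 400):
    print(n, "observations: about +/-",
          round(100 * margin_of_error(n), 1), "percentage points")
```

Under these assumptions, one hundred observations give a margin of roughly ten percentage points, which suggests why smaller databases can only support very coarse statistics.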

Furthermore, some databases are in the local language which makes their analysis relatively cumbersome.56

The participating NGOs also use tools that can be described as quantitative to assess the results of their intervention at the level of the capacity building of their partners. Note in particular the use of the "spider diagram" which provides a graphic representation of several aspects of capacities on the same graph; or even questions where the answers are encoded and weighted to give a relatively objective index.
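As an illustration of how encoded and weighted answers can be turned into an index, the sketch below computes a weighted average rescaled to 0-100. It is a hypothetical example, not the scoring system of any participating NGO: the capacity dimensions, the 1-4 answer scale and the weights are all assumptions.

```python
def capacity_index(scores, weights, scale_min=1, scale_max=4):
    """Weighted average of encoded answers, rescaled to a 0-100 index.

    scores and weights are dicts keyed by capacity dimension (hypothetical).
    """
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same dimensions")
    total_weight = sum(weights.values())
    weighted_sum = sum(
        weights[dim] * (scores[dim] - scale_min) / (scale_max - scale_min)
        for dim in scores
    )
    return 100 * weighted_sum / total_weight

# Hypothetical partner assessment on a 1-4 scale.
scores = {"governance": 3, "financial_management": 2, "monitoring": 4}
weights = {"governance": 2, "financial_management": 1, "monitoring": 1}
index = capacity_index(scores, weights)
print(round(index, 1))
```

The same per-dimension scores are what a spider diagram would plot, one axis per dimension. Such an index is only as objective as the encoding and weighting rules, which should therefore be documented alongside the results.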

The participating NGOs evaluate their interventions/programme at the end of the funding cycle, but also commission thematic and strategic external evaluations. The results of these evaluations are used for various purposes including accountability, help in making strategic decisions, building on good practices for intervention methods, project type or methodology. However, few NGOs have already produced an impact evaluation, which they often confuse with an ex post evaluation (see Box 6).

56 The Etude de l’évaluabilité pratique des interventions cofinancées par la coopération belge (2015) makes the same observations.



Box 6: Impact evaluation and ex post evaluation

An impact evaluation is not necessarily ex post. Conversely, an ex post evaluation is not necessarily an impact evaluation.

An impact evaluation can be ex post and can therefore come several years after the end of an intervention. However, it is also possible to conduct an impact evaluation from the start of an intervention. This means developing the methodology for the impact evaluation with precision even before the intervention is implemented. The advantage of such a practice is that it encourages development stakeholders to come up with an accurate definition of the expected effects at all beneficiary levels (final beneficiaries included) as well as the more general scope of the intervention. It also requires specific indicators to be defined for measuring these effects and the practical collection methods for these indicators to be described.

An ex post evaluation may have a purpose other than assessing the effects of the intervention on the final beneficiaries and more generally. It should nevertheless be noted that an ex post evaluation at the level of the direct beneficiaries (partners) can be a meaningless exercise (conversely, it is always meaningful for the final beneficiaries). Several elements are evidence of this:

- Capacity building may have effects on the institution and the individuals within it. Given that there is a high turnover of staff in these institutions, how can this be assessed several years after the project has ended?

- The partners may no longer work with the NGO in question, but with other donors/associates. They no longer necessarily apply the good practices learned from the Belgian NGO. This does not mean that the effects are no longer there, it may simply mean that they are applying other methods in accordance with the requirements of their new donors/associates.

The NGOs have an interest in ex post evaluations because they want to find out whether the effects of their interventions are sustainable over time. However, they do not have the means to budget for this type of evaluation in their funding.

NGOs' preconceptions of the impact evaluation

Although there is a certain interest in discovering impact evaluation, most of the participating NGOs expressed doubts regarding the added value of this type of evaluation. In particular, these NGOs raised the question of the cost/benefit ratio, namely the relationship between the costs of implementing a rigorous mixed methodology and the benefits that it provides: learning about the measurement of the results of their interventions and the mechanisms that explain them, and therefore lessons for improving aid effectiveness.

Another question concerns the feasibility of demonstrating the attribution of the effects to the intervention. NGOs work predominantly in a complex 'multi-stakeholder/multi-actions' environment, where there are synergies between their intervention and those of other stakeholders in the same sector or in related sectors.

The organisations also question whether evaluating the impact of interventions for final beneficiaries is really their responsibility. Indeed, they work with partners who are responsible for implementing the actions in the field and believe that many factors influence this level of result (see section 6.1).



Furthermore, the notion of counterfactual is relatively unknown in the world of NGOs. Once the concept is explained57, several NGOs fear that this approach raises expectations among these non-beneficiary populations. They also make four other comments on this subject:

- They have doubts as to the reliability of questioning a non-beneficiary group;

- They also expressed reservations about finding a credible counterfactual. Indeed, the choice of beneficiaries is often determined by the characteristics of the population that foster the achievement of the results;

- They see an additional workload in this approach;

- Finally, some of them queried the need to demonstrate the attribution scientifically. They feel that their knowledge of the overall context of the intervention area enables them to find arguments to confirm that the observed effects are indeed related to their intervention. They do not therefore understand the added value of using a counterfactual approach.

Finally, they question the relevance of conducting impact evaluations for interventions that are examined over a short period corresponding to one or two funding cycles. Indeed, the interventions subjected to this exercise had to correspond with DGD funding over the selected period. However, these interventions sometimes began before and/or ended after this period due to further funding or self-funding. It should also be noted that certain NGOs sometimes work in the same area and in the same sector for years. They believe that the impact evaluation's time scale is therefore an important dimension in observing an impact.

The last points discussed above are interesting because they compare the reality of the context in which the NGOs work with the methodological rigour of an impact evaluation.

Is it feasible to apply rigorous mixed methods?

The four case studies conducted as part of this ex post impact evaluation demonstrate that it is feasible to apply more rigour in the methodological approach, whether qualitative and/or quantitative. The methodological approach applied in each case study is explained in the first section of this report (Chapter 5, and more details are also provided in the evaluation reports for each selected intervention, available in the appendices).

A counterfactual approach was implemented in the 4 case studies. The NGOs' negative perceptions of this were addressed. There were no major difficulties in approaching the non-beneficiary population and having discussions with the randomly-chosen respondents. This does not seem to have created expectations.

However, it was not easy to find a credible counterfactual58 in the field. There are several reasons for this finding:

- The evaluation is ex post: it is therefore more difficult to identify this (control) group after the intervention ends than if this is done from the start of an action.

- The NGO representatives in the field and their partners are not familiar with this approach (and some of them are somewhat mistrustful of it). The identification of this control group has not always been properly understood and has therefore not always been done well.

- Evaluators do not always have the necessary time in the field to improve the credibility of the counterfactual.

57 A counterfactual is a group of non-beneficiaries relatively similar to the group of beneficiaries. The non-beneficiaries would have been eligible for the intervention and the two groups would have developed in the same way if the intervention had not taken place.

58 A group of non-beneficiaries similar to the target population before the intervention, with good arguments to believe that the beneficiaries would have developed in the same way as the non-beneficiaries without the intervention.



Generally, if the counterfactual approach had been considered from the start of the intervention in agreement with the entire team (NGO and partners), identifying a credible counterfactual would have created fewer problems.

Even if the counterfactual was not always perfect and the target and control groups were not comparable on all points, the lessons learned from this approach are still interesting. Even an imperfect counterfactual approach can lead to arguments for the attribution of the effects and/or put the attribution of certain effects into context, while providing information about any biases in the comparisons. This is true whether a qualitative (e.g. focus group discussion) or quantitative (e.g. data analysis from household surveys) approach is used. Several examples demonstrate this in the case studies presented in Chapter 5 of this report.

The random selection of respondents was also feasible in all the case studies. Here again, the approach surprised some partners, because they are not accustomed to this type of practice. It is therefore possible that some participants in the group discussions were not chosen truly at random. This practice also requires preparation time so that the field teams can be made aware and convinced of the benefits of random selection.

Random selection is easy when there are lists that can be used to draw names at random and such lists provide contact information for the individuals to be questioned. These lists were available for the beneficiaries of the selected interventions, but not for the non-beneficiaries. Information therefore had to be sought from the local authorities and the NGO's partners.
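When such a list exists, the draw itself can be made reproducible and auditable. The sketch below is only an illustration under stated assumptions: the household identifiers, the sample size and the seed are hypothetical, and the function is not a description of the procedure used in the case studies.

```python
import random

def draw_respondents(names, k, seed):
    """Draw k distinct respondents at random from a list of names.

    A fixed seed makes the draw reproducible, so that a third party
    can check that the field team did not hand-pick participants.
    """
    unique_names = sorted(set(names))  # remove duplicates, fix the order
    if k > len(unique_names):
        raise ValueError("sample size exceeds the number of listed people")
    rng = random.Random(seed)
    return rng.sample(unique_names, k)

# Hypothetical beneficiary list and draw.
names = [f"household_{i:03d}" for i in range(1, 51)]
selected = draw_respondents(names, k=10, seed=2016)
print(selected)
```

Recording the list, the seed and the resulting sample in the evaluation file is one simple way of documenting how the respondents were selected.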

The quality of the procedure for collecting and processing data was also generally ensured for the 4 case studies. This was possible due to the involvement of experienced evaluators, national experts, competent and motivated teams of enumerators and local NGO teams. Although many resources were used for the preparation, the field work and the analysis, certain deficiencies in the data collection and processing procedure are worth mentioning. As already stated, the counterfactual approach and the practice of random selection were not always fully observed. Furthermore, the absence of reliable data about the pre-intervention situation meant that the pre-intervention data used was mainly reconstructed using recall data, so there could be reporting biases. Some indicators were not assessed as effectively as possible due to a lack of time and/or experience. This was the case, for example, for the income indicator for a producer or the indicator for improved self-esteem. In some studies the contextual analysis could have been better used to explain the process of change (or lack of change).

Further to this study, it is found that adding rigour to the methodology for evaluating an intervention's effects is feasible, although the support of various types of expertise is required to achieve it. This is especially true if the rigorous methodological approach is developed from the start of an intervention with ownership of the concepts by all stakeholders.

What is the added value of evaluating the effects of an intervention with rigorous mixed methods? Are these methods relevant for the different types of intervention supported by the NGOs?

In Chapter 5 of this report, the added value of this type of evaluation in terms of findings and methodology is discussed for each of the selected interventions (EQ7 in each of the case studies).

Overall, it seems that the use of a rigorous mixed methodology has some added value for the 4 selected NGOs, even in sub-optimal situations caused by an imperfect counterfactual, baseline data using recall data, a relatively small sample, etc. This approach was able to:

- demonstrate the importance of evaluating the results for the final beneficiaries (and not solely for the partners) and more generally;

- show the usefulness of explaining the causal sequence between an intervention's different levels of results;

- provide precision in the measurement of certain effects;

- prove/argue in favour of the attribution (or not) of the effects;

- highlight certain unexpected effects, whether these were for the partners or the final beneficiaries;

- provide an understanding of the underlying mechanisms behind the changes.

Thus, this study shows that it is relevant to apply more rigorous mixed methods for the monitoring and evaluation of interventions. However, this type of methodological approach raises questions for interventions that are intended to influence policies by strengthening the capacities of civil society ('policy influence' type intervention). It is certainly still relevant but the temporal and spatial dimensions of the evaluation would probably have to be adapted to take account of the specific features of these 'policy influence' interventions (see box 7). A study concentrating on this type of intervention would be needed to make a more confident statement about the relevance of implementing a mixed rigorous approach for evaluating the effects of such interventions on the beneficiaries.

Box 7: The complex case of ‘policy influence’ type interventions

Some NGOs, through their partners, conduct actions to strengthen civil society and encourage local authorities, the business world (multinationals) and international organisations to adopt and implement policies/provisions/actions in aid of the most disadvantaged populations and/or the environment.

There are five aspects related to this type of intervention that make it difficult to evaluate the attributable effects on the final beneficiaries and the sector concerned59:

1. The observability and non-linear nature of the effects. These actions aim to bring about a change in understanding that will lead to a change in attitudes/policies then in behaviour, in order to see ultimately a change in people's lives. Although it is possible to assess a change in understanding after several years, changes in attitudes/policies and behaviours that will in turn have an effect on people's well-being are likely to take more time. These effects are not linear, but happen suddenly, when a set of conditions are met for this to take place. It is therefore possible that changes at these levels are not observed because the evaluation takes place too early in the process.

2. The possible obsolescence of the effects. The activities conducted for this type of intervention are greatly influenced by the context (the opportunities or obstacles can change unexpectedly at any time). The objectives also change with the context if they want to remain relevant. It is therefore possible that the expected effects defined at the start of the intervention no longer match the expected effects at the end of the intervention.

3. The influence of many other exogenous factors. In addition to taking time, changes in attitudes/policies are influenced by many factors external to the intervention such as political and economic stability, natural disasters, health crises, elections, etc. It is therefore important to take account of these external events before deciding on whether or not the achieved effects can be attributed to the actions undertaken as part of the intervention.

59 Coates & David, Oxfam GB, 2002.



4. A multitude of stakeholders addressing these issues. In addition to country-specific events, actions financed by other donors to strengthen civil society and lobby can also influence the achievement of an intervention's effects. In an environment with many stakeholders, it is difficult to identify the effects that can be attributed to the intervention of the NGO in question and assess its general contribution.

5. The difficulty of finding a credible counterfactual. These actions can be taken so they reach the entire population or carried out in a defined geographical area. In the first scenario, it is impossible to find a control population. In the second case, there is a risk that the control population could be influenced by the actions, because communication methods (radio, local newspapers, and large gatherings) potentially encourage a wider distribution of the effects. It therefore becomes problematic to attribute the effects.

These findings reveal that the logical framework is certainly not the most suitable tool for accounting for the complexity and non-linear nature of the effects of this type of intervention. It is unlikely that the effects will be achieved as imagined at the start, in a short and defined time period. The theory of change (ToC) seems better suited to developing a relevant monitoring and evaluation system for this type of intervention. Results indicators for the beneficiaries must be defined together with indicators that track how the context evolves.

Box 8: The evolution of the overall development assistance context

The overall development assistance context is evolving. Partnership relationships are changing because there is increasingly more expertise available in the South. The diversity and number of partners is growing. There are local funding sources, etc. The interventions supported by Belgian NGOs are implemented by increasingly independent partners, who sometimes have objectives that differ from those of the Northern stakeholders. Today's world is increasingly a world of networks that pursue a set of objectives in different fields and different countries.

NGOs formulate and evaluate their interventions in this changing context. Besides the fact that this change challenges the relevance of developing the capacities of Southern stakeholders, it highlights the complexity of the context. This complexity can have consequences in terms of feasibility and relevance in evaluating interventions for the final beneficiaries using rigorous mixed methods. Indeed, it may make it more difficult to demonstrate attribution, etc.

This evolution leads to changes in the way 'development is done', which is likely to affect the way development actions are evaluated. However, accounting for the results of a development action for populations is no less relevant in this context. It is also a way of involving all stakeholders and pooling learning.

What is the cost/benefit ratio for these rigorous evaluation methods?

The stakeholders are genuinely concerned about the additional workload and budget involved with such an approach and question its cost/benefit ratio. Indeed, although there is added value in terms of findings, is it worth developing more rigorous methodological approaches that consume resources for these benefits?

In order to answer this question, we need to look at both sides of the equation, the costs but also the benefits. In terms of costs, it is true that this evaluation used significant financial and human resources, but it is important to remember that these are ex post evaluations.



Therefore, a certain number of problems60 had to be addressed so that the evaluation could, based on reliable data, measure the effects on the final beneficiaries, demonstrate their attribution and explain their underlying mechanisms. These types of problems would not have been present if the NGOs had developed more rigorous M&E systems.

A more rigorous monitoring and evaluation system should not cost significantly more than the systems currently in place. It is mainly a question of revising the current system by adding greater rigour based on three priority areas.

1. Whether a qualitative or quantitative approach is favoured, it is important to identify a control group at the level of the target population from the start of the intervention. If the control group later becomes a group of beneficiaries, it is then normal to include it in the monitoring and evaluation system; this will allow the attribution of the effects to be demonstrated. Otherwise, it is at least important to verify the changes in this group throughout the intervention. Depending on the evaluation's objective and the available resources, focus group discussions (FGD) and household surveys must be conducted in order to measure the intervention's net effects.

2. Once more, whether the approach is primarily qualitative or quantitative, it is important to select the participants as randomly as possible, for meetings, discussion groups or FGD. If this is not possible, it is then essential to identify any biases in the findings resulting from non-compliance with this principle.

3. A mixed approach must be favoured, using both qualitative and quantitative methods, and where the information collection process produces reliable and sufficient data. Regardless of resources, quality is more important than quantity.

a. Targeting the necessary information to prove the effects of the development action and understanding the mechanisms of change. This means having carefully considered the indicators and the information collection and processing methods.

b. A large number of observations will allow the effects to be measured while a small sample can only assess them. However, there must be the necessary expertise for the quantitative processing.

c. Consider looking for any existing databases that could be used. The relevance of this data must be verified for the intervention in question, as well as for its reliability (be sceptical of national statistical data and favour data from research centres and international institutions).

d. Implementing household surveys assumes that they are conducted correctly. Otherwise, it is sometimes better to use secondary, publicly available data or even be limited to a rigorous qualitative approach.
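As an illustration of the first priority area, the net effect of an intervention can be estimated by comparing the change observed among beneficiaries with the change observed in the control group over the same period. The sketch below uses a simple difference-in-differences calculation, which is one common way of doing this and is not a method prescribed by this report. The function name and all figures are hypothetical and serve only to show the arithmetic.

```python
from statistics import mean


def difference_in_differences(treated_before, treated_after,
                              control_before, control_after):
    """Net effect = change among beneficiaries minus change in the control group."""
    treated_change = mean(treated_after) - mean(treated_before)
    control_change = mean(control_after) - mean(control_before)
    return treated_change - control_change


# Hypothetical monthly household incomes (illustrative values, not study data).
beneficiaries_before = [100, 120, 110, 90]
beneficiaries_after = [150, 160, 140, 130]
control_before = [105, 115, 95, 105]
control_after = [120, 130, 110, 120]

net_effect = difference_in_differences(
    beneficiaries_before, beneficiaries_after,
    control_before, control_after,
)
print(f"Estimated net effect: {net_effect}")
```

In this hypothetical case, beneficiaries' average income rises by 40 and the control group's by 15, so only 25 can plausibly be attributed to the intervention. Without the control group, the whole increase of 40 would have been credited to the intervention.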

In terms of benefits, as previously stated, the rigorous mixed methodological approach offers some added value regarding learning and improving aid effectiveness. Indeed, it allows the results that can be attributed to development assistance to be measured with a certain precision for the populations and provides an understanding of the mechanisms at work. However, these benefits would be more valued if a rigorous evaluation framework for this level of results was required for all stakeholders. This would produce a benefit for all development cooperation stakeholders61 due to the large amount of evidence supplied. This evidence shows how Belgian interventions have brought about a significant change in people's lives.

60 Such as the reconstruction of the pre-intervention situation, the creation of updated lists of beneficiaries, identification of a counterfactual, the preparation of questionnaires, etc.

61 Thus creating a 'public good' which, if of good quality, could extend beyond Belgian borders. English development cooperation in particular, but also Dutch, German and French development cooperation produce rigorous impact evaluations where the lessons learned are used by a wide range of development stakeholders.

A rigorous mixed methodological approach in the M&E system is affordable, particularly if there are incentives to report the effects as far as the final beneficiaries and with a certain degree of precision and reliability. However, this does not mean that such an approach is easy to implement. It requires all stakeholders to commit to and take ownership of the concepts of rigour.

Box 9: The impact evaluation and the programme approach

Some of the evaluated interventions are part of a funding 'programme' and others are part of a funding 'project'.

This study's objective is to review the definition of impact in terms of a non-governmental development cooperation intervention and not in terms of a programme as a whole. Similarly, its aim is to assess the effects on the final beneficiaries and more generally (see section 3.1), for an intervention and not for the programme of which it is part.

According to the NGOs, a programme can consist of several different interventions in the same area or the same type of intervention in several regions of the world.

In the first scenario, these interventions can have synergies between each other. This must therefore be taken into account when analysing the attribution of the effects. The analysis must also consider these potential synergies when explaining 'why' and 'how' the intervention did or did not achieve the expected effects.

In the second case, it may be interesting to use the findings made when evaluating an intervention in one region in order to draw conclusions about interventions implemented in other regions of the world. It may even be interesting to assess an intervention's contribution to achieving the overall impact of the programme of which it is part.

Evaluating the programme's impact is outside the scope of this study.

There are evaluations that focus on all the components in a programme. For information, the NGO BRAC in Bangladesh has conducted coordinated impact evaluations in several countries. These rigorous evaluations (using the Randomized Control Trial62 methodological approach) were conducted with considerable budgets and over several years with the collaboration of many highly regarded academics (including Esther Duflo).63 Only such an approach can be used to assess the impact of a programme in its entirety. However, this is beyond the monitoring and evaluation framework of a programme set up by Belgian NGOs. An evaluation on such a scale must be conducted in partnership with companies specialising in evaluation and with specific budgets.

62 "Randomized control trial".

63 http://www.econ.yale.edu/~cru2/pdf/Science-2015-TUP.pdf



7. Conclusions and lessons learned

Firstly, we must remember that the sample of NGOs participating in this study and the 4 selected interventions are not representative of all development actions conducted in the NGO sector in Belgium, or the evaluation practices of such stakeholders. There are strong indications that the 22 voluntarily proposed interventions were considered by the participating NGOs to be interventions that had been relatively successful. The 4 case studies were selected based on well-defined criteria, so as to ensure the impact evaluation's feasibility within the time and budget constraints, while maximising the learning process. However, since the 12 NGOs had been involved in the evaluation process, along with the NGO federations (Ngo-federatie and ACODEV) and DGD representatives, the considerations in this study are, at least partially, relevant to all development assistance stakeholders in our country.

This chapter is designed to draw the principal conclusions and lessons learned from this relatively dense and complex study. These are based on the entire evaluation process, from selecting the 4 interventions to writing the synthesis report, taking account of the various discussions with the steering committee (SEO, NGO federations, DGD) and the 12 participating NGOs.

7.1 General conclusions

7.1 This study presents the feasibility and the need to account for development assistance results for populations. The rigorous impact evaluation is a relevant tool for reporting on aid effectiveness.

7.2 It is the level of rigour with which the methodology is implemented that guarantees the credibility of the findings of an evaluation (not solely impact evaluations) rather than the type of methodological approach (qualitative or quantitative).

7.3 Despite good intentions, NGOs are still not sufficiently aware, incentivised and equipped to assess the impact of their interventions.

7.4 Evaluating impact differs from evaluating long-term effects. An impact evaluation can be designed ex post (as is the case for this evaluation), in which case it can also evaluate the long-term effects, but it is more relevant to design it ex ante (i.e. when formulating and implementing a development action).

7.2 Conclusions on the summative evaluation questions

7.2.1 Relevance

The 4 interventions satisfy the priority needs of the beneficiary populations. The choice of partners and intervention procedures proved to be relevant for two interventions. For the other two, the judgement is more complex because questions remain as to the viability of the local structures without the financial support of the NGO. Furthermore, the courses of action raise questions as to whether this is the most appropriate way to achieve the required effects on local populations.



7.2.2 Impact evaluability

(i) For most of the 12 participating NGOs, the monitoring and evaluation system is relatively sophisticated for assessing the outputs and outcomes, but only at partner level. In terms of the results for the final beneficiaries (outcomes and impact), this system is much weaker. It is not designed to measure the effects on the beneficiary populations, much less to demonstrate the attribution of the effects to the intervention or to explain the underlying mechanisms behind the changes. An intervention's impact is also often loosely formulated and is relatively distant from the other effects in the causal sequence, making it difficult to evaluate. This is especially true for a programme's impact.

Incentives for a rigorous assessment of the impact of the actions on the target populations are almost non-existent. On the one hand, NGOs work closely with partners who are responsible for implementing the interventions so they feel more accountable for the effects on their partners' capacity building, than for the intervention's effects on the target population. The NGOs are also faced with a lack of resources, time and expertise for assessing these effects. On the other hand, the donor does not encourage the NGOs to account for this level of results. In the evaluation criteria, effectiveness and efficiency still take precedence over impact. This is also one of the findings of the Meta-evaluation of the Programmes of Non-governmental Stakeholders (SEO, 2013).

This conclusion leads us to a paradox: all the stakeholders agree that the ultimate objective of development actions is to improve people's well-being, but this level of change is not widely evaluated by development stakeholders in Belgium. In other words, it is not possible to account for aid effectiveness in our country because this should be based on the (rigorous) assessment of the effects on the final beneficiaries and the effects at a more general level, yet this is not done. Accounting for the results for the partners is necessary but not sufficient to infer the results for the beneficiary populations. This evaluation contradicts the implicit hypothesis that if the capacity building action is relevant and well done, it will necessarily lead to effects on the final beneficiaries.

(ii) Qualitative and quantitative evaluation tools are used by most of the 12 participating NGOs but a lack of rigour is often observed in their application.

Although the NGOs' evaluation tools are mainly qualitative, the quantitative approach is not overlooked. Large numbers of NGOs have quantitative data about the results of their interventions. Nevertheless, this data remains relatively unused in the evaluation reports for various reasons such as the lack of confidence in the data collected, the lack of capacity internally and/or among the consultants recruited to exploit this data, the lack of financial resources and time to exploit it, the fact that the data remains at partner level, etc.

Furthermore, the pre-intervention (baseline) data is often of poor quality. This is secondary data, the reliability of which has not always been verified, or data collected by partners, on a small scale, without any real methodology and not necessarily consistent with the monitoring indicators. It cannot therefore be used to assess the change in the situation of the populations over the intervention's duration.

In addition, there is no rigorous methodology for selecting respondents in the monitoring and evaluation systems of the 12 participating NGOs. Self-selection is often preferred but without presenting the biases related to such a practice. It is not therefore uncommon for evaluation reports to draw conclusions for all beneficiaries based on a very small, self-selected sample, without necessarily mentioning this fact.

Finally, the counterfactual approach is non-existent, making it impossible to demonstrate the attribution of the results. None of the monitoring and evaluation systems for the 22 interventions put forward for this study were designed using a counterfactual approach. It is not therefore possible to demonstrate the attribution of the effects observed.

7.2.3 Achievement of outcomes

As a reminder, for this study, outcomes are defined as the effects for the different types of beneficiaries, principally partners, final beneficiaries and, where appropriate, intermediate beneficiaries.

(i) The 4 selected interventions feature different types of partnership, not only in terms of the nature of the partners - local NGOs (in two interventions), municipalities (in one case), civil society groups (in two cases) and a private partner (in one case) - but also in their number and the involvement of the Belgian NGO in the field. Some are long-standing partners, while others are more recent. The four NGOs have a local expatriate representative, but have different relationships in the field with both the partners and the population. It is not possible to draw conclusions on the partnership approach based on these 4 experiences. However, it is interesting to highlight several of the lessons learned:

- Overall, in the 4 case studies, the partners appreciated the support received.

- For the two NGOs with a strong field presence and frequent interactions with the partners, the latter admit that they have learned a lot working with the Belgian NGO. They say that they have improved their management practices, not only to satisfy the donor's requirements but because they saw its benefits for the workers and their actions.

- In the other two interventions, the findings are more tempered. In the first case, the partners have been working with the Belgian NGO for years. It is not easy to identify how the capacity building is today still necessary, or even effective. This relationship is more like a funding line. In the second case, the partner is recent. It was not easy to distinguish whether the shortcomings observed were due to everyday start-up difficulties but with a promising future or, more fundamentally, the partner's incompetence.

- The partnership with a public institution (the municipalities) was interesting since, while it was temporary (due to the cycle of municipal elections), it proved to be effective. Despite the fact that the current teams had just changed, the intervention sowed the seeds that led to greater capacity to grow under this new legislature. This is due to the fact that the intervention had tangible benefits for the municipalities and populations. The municipalities had understood how to mobilise and organise themselves in order to establish this type of project and the population knew that it could demand this type of support from its leaders.

- The partnership with civil society groups is also interesting. These groups are well-established in society and therefore have a certain legitimacy, since their actions for the target populations precede the intervention and go on after it. Civil society organisations sometimes have fewer technical skills to mobilise than local NGOs. It is therefore interesting to involve them in a partnership.

(ii) There are different types of intermediate beneficiaries for the 4 interventions. Whether water management committees, producers' groups or even health committees, these stakeholders are often decisive for the achievement of the effects on the populations.

(iii) The conclusions on the achievement of the effects for the final beneficiaries are also difficult to summarise given that the 4 selected interventions had different objectives in various regions of the world. However, a few lessons did emerge:

- Firstly, the rigorous evaluation of an intervention focusing on the effects on the final beneficiaries yields interesting information about the design and implementation of the actions. It highlights shortcomings in terms of the theory of change. In particular, it reveals the 'missing middle'64 between the results for the partners and those for the populations. In addition, it challenges the monitoring and evaluation system, from the formulation of indicators to their analysis, including their collection method. And finally, it holds the NGO and the partners to account for their responsibilities, namely to identify what works, for whom and how.

- In two of the four interventions, the effects on the final beneficiaries were identified, measured and could be attributed, at least in part, to the intervention. In the other two cases, the results are less tangible, but that is not to say that there were no effects.

7.2.4 Achievement of impact

As a reminder, in this study, impact is defined as the effects at a more general level, namely (i) the extent of the effects (proportion of the population affected and 'spillover effect'), (ii) the effects on local policies, (iii) the effects on the legitimacy of the NGO and its partners in the field of action and, where appropriate, (iv) the contribution of the intervention's effect to the programme's impact. For the four selected NGOs, the analysis of this level of results had never been explicitly considered.

- The proportion of individuals/families affected by the intervention could only be calculated in a single case, thanks to the availability of data. For the other interventions, these calculations would have been very approximate estimates, without any real lessons given the poor reliability/availability of the data. The 'spillover effect' was clearly identified for two of the four interventions but this could not be measured with precision.

- The effect on local policies is weak but present in three of the four interventions. This may also be due, in part, to the duration of the interventions. However, the intervention with the longest implementation period and aiming for the greatest 'policy influence' is the intervention where this effect is least visible. This is also the intervention that has the least interaction with the local authorities. However, strong conclusions cannot be drawn about this due to the small sample size.

- The legitimacy of the NGO and its partners in the sector of activity/region was explicitly assessed in one case study (that in Peru). This level of analysis can be used to consider the action in a longer and more consistent time frame.

- The contribution that achieving the impact of a programme's component makes to that programme's overall impact is beyond the scope of this study. This is complex, but not uninteresting. It should be noted that the 'missing middle' appears even larger in the causal sequence between the impact of the interventions and the overall impact of the programme.

7.2.5 Sustainability

The sustainability of the effects at partner level is questionable when the funding from the Belgian NGOs ceases. Nevertheless, this seems less problematic with civil society organisations or with a partner from the public sector.

The sustainability of the effects on the beneficiaries is often challenged by the sustainability of the effects on the intermediate beneficiaries. Here are two examples to illustrate this. If the water management committees do not collect the monthly fees for the consumption of drinking water, they no longer have the resources to maintain the network and people's access to drinking water and the resulting effects are therefore challenged. If the producers' organisations are no longer well organised enough to access the market (or the inputs), the sale of products is no longer guaranteed (or crop productivity becomes a problem). It therefore becomes difficult to improve producers' income.

64 'Missing middle' is the expression used when the causal sequence of effects is often interrupted between the results for the partners and the results for the final beneficiaries (the population). I.e. the causal relationships between these two levels of results are unclear and therefore difficult to assess.

7.2.6 Gender and environment

Three of the four evaluated projects clearly target women as the beneficiaries.

None of the interventions explicitly refer to environmental protection. However, the components of each intervention may help with nature preservation.

7.2.7 The added value of a rigorous mixed approach

The four selected NGOs, but also the other participating NGOs, recognise a certain added value in implementing a rigorous mixed approach to assess the effects of development actions. This study confirms the non-governmental stakeholders' interest in improving the effectiveness of development assistance. However, it also highlights the institutional obstacles and fears, which imply that applying these methods is still problematic in practice (see section 7.3.2 and section 7.4.2 for more details).

7.3 Conclusions on the formative evaluation questions

7.3.1 The definition of impact

Confusion about the term 'impact', a problem that affects the development cooperation community as a whole, also affects the NGO sector (and the DGD) in Belgium. The lack of a common normative framework for Belgian NGOs to define impact does not however seem to lead to problems when implementing or evaluating the development actions of the NGOs involved in this study.

The NGOs are accountable for their results, but the level of results for each type of beneficiary is not clearly specified (output/outcome/impact). Furthermore, the requirements in terms of quality for assessing the results are not defined. Both stakeholders and the donor also seem to be confused between results-based management to obtain results and management by results.

The lack of a shared vision about the definition of impact and therefore about what changes are pursued, creates confusion as to an impact evaluation's purpose and the methods to be used to account for aid effectiveness. In conclusion, a shared vision of the definitions of the different levels of results, outcome and impact in particular, is essential for defining the requirements in terms of results-based management (for obtaining the results) and consequently in terms of accountability and aid effectiveness.65

65 See also Évaluation du rapportage des résultats de la DGD (SEO, 2012).



7.3.2 The use of rigorous mixed methods: feasible, relevant and affordable

Conclusions based on the selection process

Twelve Belgian NGOs voluntarily submitted a total of 22 interventions (projects in their own right or components of a programme), all co-funded by the DGD between 2008 and 2013. These participating NGOs knew that only 4 of these interventions would be selected, but they did not know the selection criteria in advance.

In absolute terms, a rigorous methodological approach was feasible for all the interventions. However, once the context and the cost of accessing the beneficiaries and their random selection were taken into account, the cost of evaluating 11 of these interventions became too great in relation to the benefits in terms of learning about the effects on the target populations.

After the selection process, only 5 interventions satisfied the three criteria categories (-1- the feasibility of implementing rigorous mixed methods; -2- the context at the time of the intervention's implementation and at the time of the evaluation; -3- the cost/benefit ratio of the exercise for an NGO and the entire sector).

This does not mean that the other 17 interventions could not have been the subject of an impact evaluation using rigorous mixed methods. Indeed, there was only a problem applying rigorous practices for half of the interventions and the two main reasons were as follows: (i) the difficulty in obtaining a list of final beneficiaries and/or in accessing them easily; and (ii) a relatively high probability that the expected effects on the final beneficiaries could not be observed because the intervention presented shortcomings at the partners' level. The difficulty in 'easily' finding a credible counterfactual only ruled out 2 of the 22 proposed interventions.

General conclusions on feasibility, relevance and affordability

(i) For both qualitative and quantitative methods, the evaluation implemented a rigorous information collection process following a counterfactual approach (to demonstrate attribution) and randomly selecting respondents for the household surveys and the discussion groups (in order to obtain the most objective and representative responses). This dispelled some of the negative perceptions regarding a more rigorous approach: it is feasible, even ex post, to identify a credible (although imperfect) counterfactual and select respondents randomly.

The ex post context, the time constraint, and/or a lack of knowledge about these rigorous practices by local experts and partners led to several sub-optimal situations for applying the rigorous mixed approach in the strictest sense. However, even with an imperfect counterfactual, baseline data relying on respondents' memories, a relatively small sample, etc. the approach proved to be feasible and the lessons were relatively important.
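The random selection of respondents described above can be sketched in a few lines. The example assumes that an up-to-date list of beneficiary households exists, which the evaluation found was often not the case. The identifiers, list size, sample size and function name are hypothetical.

```python
import random


def select_respondents(beneficiary_ids, sample_size, seed=None):
    """Draw a simple random sample, without replacement, from a beneficiary list.

    A fixed seed makes the draw reproducible, so that the selection
    can be documented and checked afterwards.
    """
    ids = list(beneficiary_ids)
    if sample_size > len(ids):
        raise ValueError("sample_size exceeds the number of listed beneficiaries")
    rng = random.Random(seed)
    return rng.sample(ids, sample_size)


# Hypothetical list of 200 beneficiary households.
beneficiary_list = [f"HH-{i:03d}" for i in range(1, 201)]

# Select 20 households for a survey or discussion group.
respondents = select_respondents(beneficiary_list, sample_size=20, seed=2016)
print(respondents)
```

Because every listed household has the same chance of being drawn, the sample avoids the self-selection bias that arises when only volunteers or easily reached beneficiaries take part.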

(ii) The four selected NGOs identified several benefits in applying rigorous mixed methods, both in terms of findings about the results and in methodological terms. This approach therefore seems relevant for assessing the results of a development action with precision, using certain adaptations for "policy influence" type interventions.

The four selected NGOs recognise the value in clarifying the causal sequence between an intervention's results and making it explicit up to the final beneficiaries. This approach was able to provide precision when measuring certain effects, due to the reliable data collected. The question of whether (or not) the effects could be attributed to the intervention was systematically tackled and put the importance given to certain effects into perspective. Some unexpected effects could be presented not only for the partners, but for the intermediate and final beneficiaries and more generally. The four NGOs also recognised that this approach provided a better understanding of the underlying mechanisms behind the changes.

Furthermore, the qualitative and quantitative analysis tools are complementary and their combination (qualitative or quantitative dominant) can be adapted to the available resources. This study shows that even if the number of observations adds precision to the findings, the quality of the information collected should be preferred over its quantity. It also shows that the use of existing and relevant secondary data emerged as a viable option for the quantitative analysis of some results.
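The link between the number of observations and precision can be illustrated with the standard margin-of-error formula for a proportion estimated from a simple random sample. This is a textbook approximation and not a calculation taken from this study. The function name is hypothetical, the 95% confidence level (z = 1.96) is assumed, and the proportion of 0.5 is the most conservative case.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate margin of error for a proportion under simple random sampling."""
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


# Compare the precision obtained with small and larger samples.
for n in (30, 100, 400):
    print(f"n = {n}: +/- {margin_of_error(n):.3f}")
```

Quadrupling the sample only halves the margin of error, and the formula says nothing about the quality of the answers collected. A large but poorly conducted survey is therefore not more informative than a smaller, carefully executed one.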

A frequently heard objection is that North-South relations are developing towards greater expertise in the South, greater autonomy for partners and more numerous and more diversified partners, which would reduce the interest and indeed the feasibility of an impact evaluation. We do not share this view. It is still important to (dis)prove and understand the results that the development actions have been able to generate for the populations. An intervention's complexity is no excuse for failing to (dis)prove this level of result. This type of evaluation generates knowledge about the process of change and can help to improve the effectiveness and efficiency of interventions and thus obtain more and better results with the same funding. The summative conclusions of this evaluation show the added value of the exercise.

(iii) In many cases, the impact evaluation is affordable. Although the overall cost of using a rigorous system to evaluate the impact is difficult to put a figure on (that would have to be done on a case-by-case basis), the idea that the impact evaluation is unaffordable does not hold up.

The cost/benefit ratio of such a methodological approach is difficult to judge ex post. This ex post evaluation certainly mobilised more resources than if a rigorous approach had been implemented from the start of the intervention. Given its ex post nature, certain shortcomings in the existing monitoring and evaluation systems had to be overcome in order to be able to measure the effects that could be attributed to the intervention for the final beneficiaries, and understand their underlying mechanisms.

The potential additional costs of a more rigorous approach in terms of the monitoring and evaluation systems of development stakeholders are therefore probably mostly covered by its resulting benefits. This approach will be much more profitable if it is designed ex ante, if all stakeholders take ownership of the concepts of rigour and especially if the stakeholders are given incentives to evaluate the impact of their actions.



8. Recommendations

This chapter presents the recommendations arising from the conclusions of this impact evaluation. They are structured according to whom they are addressed: firstly those directed at the politicians, followed by those for all the stakeholders, then those for the DGD, the NGOs and finally the SEO.

8.1 Recommendation to the Federal Parliament and the Minister of Development Cooperation

Chapter 9 of the Law of 9 January 2014 stipulates that the results, particularly the impact, are evaluated based on the DAC criteria (Art. 32). This impact evaluation of four non-governmental cooperation projects, similar to the one focusing on four governmental cooperation projects (2013) and other recent studies shows that there is still not enough awareness and professionalism in the Belgian development cooperation sector in the rigorous evaluation of the effects of development cooperation on final beneficiaries.

It is recommended that politicians generally endorse the assessment of the changes to the target populations. In order to do so, they must allocate resources and expertise and provide the necessary incentives, not only to the State structures (SEO, DGD) but also the stakeholders in the field (like the NGOs) and those involved in the evaluation.

Furthermore, this type of evaluation must be part of the entire project cycle, from its formulation to the end of the intervention (even ex post) so that there is enough reliable data to provide relevant indicators for the target population, and in order to assess the project's effects, with rigour.

8.2 Recommendations for all stakeholders

8.2.1. Enhance training and experience sharing between all development cooperation stakeholders.

Discussions between, on the one hand, the evaluation team comprised of people who have a range of experience in evaluation and in the NGO sector and, on the other hand, Belgian development cooperation stakeholders such as NGOs, federations, the DGD and the SEO, have provided some interesting considerations. It is therefore recommended that these types of discussions continue since they promote a shared understanding of the issues, challenges, constraints and opportunities. It certainly makes sense in the context of the reform of the development cooperation sector and the certification of the NGOs' evaluation systems.

8.2.2. Encourage the use of communication methods other than paper for reporting a study's results.

The video tool should be considered for communicating a study's results to the key stakeholders, including the beneficiaries. This could speed up the learning process for the development actions and for the beneficiary populations. This has been successfully tested within the framework of this evaluation, based on the evaluation of the intervention in Peru.66

8.3 Recommendations for the DGD

8.3.1. It is recommended that the DGD creates incentives for accounting for the results of non-governmental cooperation for the populations it is targeting. This is also set out in the "Development Results" Strategy Paper (2014, p.4).

This requires several changes, as detailed below.

The DGD should clarify its objectives in terms of evaluation. There are three different types of objectives: (i) accountability, (ii) learning for decision making and (iii) learning for knowledge sharing. In order to satisfy the objective of accountability for non-governmental cooperation, the DGD must clarify how it means to satisfy this objective and also to what extent and how it expects to fulfil the other two objectives. In order to achieve the first objective, it is important to demonstrate that the effects observed, in particular those for the final beneficiaries, can be attributed to the development actions. For the two others, it is essential to understand 'why' and 'how' the interventions have contributed to the changes observed (or not). There are also other considerations and constraints specific to each type of objective, which require priorities to be established and decisions made.67

It is recommended that the DGD promotes a shared vision of the definitions of the different levels of results, especially for outcomes and impact. The "Development Results" Strategy Paper (2014) proposes a relatively similar definition to the one proposed in this evaluation. However, these terms (outcomes and impact) are not defined and are therefore not consistently understood by all stakeholders. A precise and standardised definition is essential for defining the requirements in terms of results-based management and consequently in terms of accountability and aid effectiveness. Such a definition would enable the stakeholders (NGOs) to formulate more realistic, tangible and measurable objectives for the beneficiaries and more generally. This would also encourage the development of relevant methodological approaches to evaluate such objectives.

It is recommended that the DGD specifies the level of the results on which the NGOs must report.

o Must they report the results at programme level and/or the results at a more disaggregated level, i.e. for the different components of a programme?

o Must they report the results for the partners, intermediate beneficiaries and/or the population targeted by the intervention?

o Must they report on the outcomes and/or the impact?

o Must they report at the level of the Joint Strategic Frameworks (JSF)?

It is also recommended that the DGD defines the required level of reporting for the different levels of results and for the different types of beneficiaries. The required level is closely related to the type of methodological approach, qualitative and/or quantitative, and the degree of rigour in implementing the methodological approach in the monitoring and evaluation system.

66 The two short videos produced (one on the methodology applied, the other on the project's results) can be viewed on the following webpage - they each run for 4 minutes: http://www.ade.eu/news-detail.php?news_id=56

67 See the Meta-evaluation (SEO, 2013).



The type of reporting and the level of requirement should be discussed with the NGO federations, in order to arrive at a feasible, affordable and above all useful practice for both the NGOs and the DGD.

o The idea is not to increase the administrative burden for either the NGOs or the DGD but to consider a reporting system that can be used to learn lessons for the sector and thus improve aid effectiveness.

- In order to satisfy its accountability objective, the DGD does not necessarily need to evaluate every programme and every intervention individually. What is important is that it covers a large enough share of the development cooperation through a series of evaluations so that it is credible as a whole.

- It is recommended that the evaluations focus on the achievement of the outcomes and impact for the final beneficiaries. The study shows that these levels of results are very rarely evaluated, yet "...the true test of aid effectiveness is improvement in people's lives." (2006 Survey on Monitoring the Paris Declaration: Overview of the Results, OECD, 2007).

- Furthermore, it would be worthwhile to require the stakeholders to evaluate the effects of the components of their programme more generally, even if these effects were not initially explicitly sought. This means assessing the effects at sectoral level, the geographical scope of the effects (proportion of the population affected and 'spillover effect'), the effects in terms of the NGO's reputation (performance-related) and that of its partners in the region and finally the contribution of the intervention's results to the programme's overall objective.

o The idea is not to increase the cost significantly, but to do better with the resources available for the evaluations and the monitoring and evaluation systems implemented to date. Box 10 details what is meant by "rigorous mixed methods".

- Firstly, it is recommended that rigorous practices be encouraged in the implementation of monitoring and evaluation systems and evaluations. This means demanding a description of the methodological approach followed, where the limits are clearly set out. Such a practice can be used to judge the credibility of the results, whether a qualitative and/or quantitative approach is applied.

- Secondly, it is also recommended that the use of mixed methods be encouraged. Qualitative and quantitative analysis tools are complementary: they are used to measure quantifiable effects and assess other effects, while demonstrating or arguing the attribution of the results and giving explanations about the underlying mechanisms behind the changes.

o Here is a list of questions that could be used to define the type of reporting required for the beneficiary populations68:

- What types of changes can be observed after the introduction of the intervention?

- To what extent can the identified changes be attributed to the intervention? In other words, what would have happened if the intervention had not taken place? Are there elements independent of the intervention that could have fostered these changes? Were the contextual factors alone, without the intervention, enough to encourage these changes?

- How has the intervention encouraged this change?

- Would the intervention have had the same effects elsewhere? (if external validity is of interest).

68 "Broadening the range of designs and methods for impact evaluations", DFID 2012, p.37.
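The attribution question in the list above (what would have happened without the intervention?) can be illustrated with a simple difference-in-differences calculation. This is only a minimal sketch, assuming that the same indicator has been collected at baseline and at endline for both beneficiaries and a comparison group. The function name and all figures are hypothetical and are not taken from the four case studies.

```python
from statistics import mean


def difference_in_differences(treated_before, treated_after,
                              control_before, control_after):
    """Change among beneficiaries minus change in the comparison group."""
    treated_change = mean(treated_after) - mean(treated_before)
    control_change = mean(control_after) - mean(control_before)
    return treated_change - control_change


# Hypothetical indicator values (e.g. monthly household income), for illustration only.
beneficiaries_baseline = [100, 120, 110, 90]
beneficiaries_endline = [140, 150, 145, 125]
comparison_baseline = [105, 115, 100, 100]
comparison_endline = [120, 130, 115, 115]

# Naive before/after change among beneficiaries only.
naive_change = mean(beneficiaries_endline) - mean(beneficiaries_baseline)

# Net effect once the change in the comparison group is subtracted.
net_effect = difference_in_differences(
    beneficiaries_baseline, beneficiaries_endline,
    comparison_baseline, comparison_endline,
)

print("Before/after change among beneficiaries:", naive_change)
print("Net effect (difference-in-differences):", net_effect)
```

In this made-up example the before/after change overstates the effect, because the comparison group also improved over the period. The sketch ignores sampling error and assumes that both groups would have followed the same trend without the intervention, which has to be argued case by case.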

It is recommended that development stakeholders use the theory of change (ToC) as a basis for constructing the monitoring and evaluation system, and that the theory of change be reconstructed, even ex post, for each evaluation. This encourages the NGOs to develop a theory of change for each component of their programme but also for the programme as a whole. The objective is to highlight the causal relationships for each intervention, from the activities/resources to the final beneficiaries, while placing the action in its context and setting out the hypotheses upon which the envisaged theory of change is based. Such a practice has the result of encouraging them to be more realistic when defining the objectives for an intervention or a programme (see Box 10). Furthermore, due to its flexibility and its iterative process, the theory of change is more suitable than the logical framework for complex and/or lobbying type interventions.

Establishing a theory of change (ToC) at programme level helps to encourage the NGOs to make the connections between the specific objectives of a programme's components and the programme's overall objectives more explicit, in order to assess how each component is involved in the achievement of the overall objective.

It is recommended that the NGOs be given incentives to implement effective and rigorous monitoring and evaluation systems to report the results for the target populations that can be attributed to their actions. Incentives are needed to produce relevant and usable baselines (which is rarely the case currently) and to adopt a counterfactual (even qualitative) approach.

The creation of incentives is also recommended so that these monitoring and evaluation systems can be used to report on failures, without this leading to consequences for future funding, if lessons have been learned. One of the five fundamental principles of the vision of the "Development Results" Strategy Paper (2014) is a step in this direction since it consists of "learning lessons about what works and what does not work". Nevertheless, reporting problems or failures is not encouraged by the second fundamental principle which provides that "funding is based on past results". This evaluation therefore recommends that funding be based on (rigorously evaluated) results, but also on stakeholders' ability to learn the lessons of the past.

8.3.2. It is recommended that the notion of capacity building of partners be challenged. It is certainly important to allow partners to take advantage of training in order to improve their management practices, but it is just as important to develop their capacity to conduct strategic analyses or even to implement an effective monitoring and evaluation system to account for the results for the target populations.

8.3.3. It is also recommended that the NGOs be encouraged to collaborate with different types of partners. Partnerships with civil society organisations in their field of expertise or even with decentralised, local public institutions may improve the sustainability of the results compared with partnerships with local NGOs that are highly dependent on the funding provided by Belgian NGOs in order to act among populations targeted by the intervention. The local authorities and civil society organisations have relationships with the beneficiary populations that pre-exist the intervention and will continue beyond it. Furthermore, working with such bodies is a way to establish good governance practices.

Civil society stakeholders (in each field of competence) should be identified at the time of the common contextual analysis. This analysis should argue where and how this type of stakeholder could increase the chances of the interventions' success and/or their sustainability. It would also be worth identifying the contextual factors and elements that could discourage civil society organisations from springing up everywhere in response to the proliferation of external funding. This analysis should also address the issue of legitimacy for a Belgian NGO working with civil society groups while ensuring the consistency of the interventions of the different Belgian stakeholders in the field (bilateral and non-governmental aid).

8.4 Recommendations for the NGOs

8.4.1. It is recommended that monitoring and evaluation systems are developed that allow accounting for results for the final beneficiaries and more general results.

Such a system can be used to account for actual aid effectiveness and generate knowledge about the development for the populations.

Several elements must be taken into account in order to achieve it:

The theory of change should be used as a basis for constructing the monitoring and evaluation system. This tool is interesting because it can present the causal relationships starting with the activities/resources and ending with the final beneficiaries while putting the action into context and stipulating the underlying hypotheses behind the changes. Such a practice can help, among other things, in being more realistic when defining the objectives for an intervention or a programme (see Box 10). It should be defined for each of a programme's components but also for the programme as a whole (see details in the recommendations for the DGD).

The partners' capacities for monitoring and evaluation must be strengthened. The partners, responsible for implementing the interventions, are the front line for collecting information about the intermediate beneficiaries (e.g. farmers' groups or water management committees) and the populations. They must be given incentives to be more rigorous in their data collection practices. They should therefore take ownership of the concepts of mixed methods and rigorous practices (see Box 10). This requires that sufficient attention is paid to feedback about the monitoring and the evaluations given to all relevant parties. This then constantly fuels the awareness of the importance of (good) data collection and contributes to the sustainability of the system - this is also recommended in the Study of the practical evaluability of the interventions co-funded by Belgian development cooperation (SEO, 2015).

It is also recommended that the effects of an intervention at a more general level be assessed (especially if the period covers an intervention cycle and not just a funding cycle). For more details, see the recommendations for the DGD. In this regard, it is important to assess these general effects in line with the intervention's characteristics (duration, scope, budget). An ex ante discussion about this would also be required because it may potentially lead to a resizing of the intervention's ambitions.

8.4.2. It is recommended that, within the monitoring and evaluation system and in the evaluations, a more rigorous methodological approach be adopted, while focusing on mixed methods in order to produce credible findings, demonstrate the attribution of the results and explain the underlying mechanisms behind the changes.

Box 10 details what is meant by "rigorous mixed methods".

In practical terms, it is recommended that the monitoring and evaluation systems be revised by adding greater rigour according to three priority areas. Furthermore, it is essential to ensure that all stakeholders take ownership of these new 'rigorous' concepts, that they learn their benefits, limits and challenges, in terms of data reliability and consequently the credibility of the findings.

1. Whether a qualitative or quantitative approach is favoured, it is important to identify a control group at the level of the population targeted from the start of the intervention. If the control group later becomes a group of beneficiaries, it is then normal to include it in the monitoring and evaluation system; this will allow the attribution of the effects to be demonstrated. Otherwise, it is at least important to verify the changes in this group throughout the intervention. Depending on the evaluation's objective and the available resources, discussion groups and/or household surveys must be conducted in order to measure the intervention's net effects.

2. Once more, whether the approach is qualitative and/or quantitative, it is important to select the participants as randomly as possible, for meetings or discussion groups. If this is not possible, it is then essential to identify and describe any biases.

3. Favour a mixed approach, using both qualitative and quantitative methods, for which the information collection process produces enough reliable data (see Box 10). These two approaches are complementary. In short, the advantage of the quantitative approach is that it allows the results to be measured with a certain degree of precision and thus demonstrates the attribution of the effects. It can also be used to distinguish, with precision, between the importance of the effects for different beneficiaries (heterogeneity of effects: different effects on women, the poorest, etc.). It is advisable to use an external expert where necessary in order to ensure the quality of the data collection process, particularly when implementing a household survey. Regardless of the resources, quality must be favoured over quantity and there must be transparency about the limits in terms of the monitoring and evaluation of the interventions.

- Identifying the necessary information to prove the effects of the development action and understanding the mechanisms of change. This means that the indicators and the information collection and processing methods have been carefully considered.

- Creating a usable baseline, which means that the indicators collected must be useful in assessing the results at the end of the intervention. This also means that the collection method is rigorous and the limits are clearly expressed.

- A large number of observations will allow the effects to be measured while a small sample can only assess them. However, there must be the necessary expertise for the quantitative processing.

- Consider identifying any existing databases that could be used. Both the relevance of this data to the intervention in question and its reliability must be verified (be careful about national statistical data and favour data from research centres and international institutions). Using household surveys is one option, but this assumes that it is done correctly. Otherwise, it is sometimes preferable to use secondary, publicly available data or even to limit oneself to a rigorous qualitative approach.

Revising the monitoring and evaluation system using these three priority areas would enable the NGOs and their partners to account for the results of their actions right down to the target populations, while learning about the mechanisms at work. Such evidence would allow relevant new interventions to be defined in line with the realities of the key stakeholders: the disadvantaged Southern populations.
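The second priority area (selecting participants as randomly as possible) can be illustrated with a short sketch. It assumes that a complete list of eligible households exists, which will not always be the case in the field. The roster, the group size and the seed are hypothetical values chosen for illustration.

```python
import random


def draw_participants(roster, group_size, seed):
    """Draw participants at random, without replacement, from a full roster.

    Recording the seed makes the draw reproducible, so the selection
    procedure can be documented and checked afterwards.
    """
    rng = random.Random(seed)
    return rng.sample(roster, group_size)


# Hypothetical roster of 200 eligible households in the intervention area.
roster = [f"household_{i:03d}" for i in range(1, 201)]

# Draw 12 households for a discussion group.
selected = draw_participants(roster, group_size=12, seed=2016)
print(selected)
```

If some selected households cannot or will not take part, this should be noted, since replacing them with more accessible households is exactly the kind of bias that the text asks to identify and describe.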

Generally, the NGOs should be able to answer the following questions:



- What types of changes can be observed after the introduction of the intervention?

- To what extent can the identified changes be attributed to the intervention? In other words, what would have happened if the intervention had not taken place? Are there elements independent of the intervention that could have fostered these changes? Were the contextual factors alone, without the intervention, enough to encourage these changes?

- How has the intervention encouraged this change?

- Would the intervention have had the same effects elsewhere? (if external validity is of interest).

8.4.3. An intervention's complexity or the cost of a rigorous mixed approach are not valid reasons to avoid developing a more effective monitoring and evaluation system and producing more rigorous evaluations.

The context of development assistance is changing, due to growing expertise in the South, an increasingly large number and wide range of stakeholders and partners and 'lobbying' type approaches to aid, making some interventions highly complex. However, this is not a valid excuse not to evaluate these actions for the beneficiary populations. Furthermore, the Study on the practical evaluability of the interventions co-funded by Belgian development cooperation (2015) points out that the probability of having an efficient monitoring and evaluation system is positively correlated to the intervention's complexity. Thus, even if the context of aid is becoming more complex, reporting a development action's results for populations is no less relevant.

Furthermore, the lack of resources is often presented in order to avoid evaluation practices being challenged. More than the cost, we believe that it is a lack of knowledge, expertise and/or incentives that prevents the development of more rigorous and quantitative practices.

As a minimum, resources must not be wasted in an inefficient monitoring and evaluation system. A realistic budget must also be defined for a quality impact evaluation that satisfies the requirements and takes account of the constraints. Finally, the need to use quasi-experimental quantitative methods must be assessed on a case-by-case basis. The following costs should be identified:

- The cost of the monitoring and evaluation system (definition of a baseline and indicators, monitoring these indicators, etc.): this is normally borne by the project and should allow the outcomes and (potential) impact to be monitored and measured. In contrast, resources put into a system that is of little use are largely wasted. The ex post impact evaluation can sometimes overcome this shortcoming, but this is not supposed to be the rule.

- The cost of a rigorous qualitative approach (solid theory of change, counterfactual, random sample, quality analysis, etc.): as is clear from the Meta-Evaluation (SEO 2015), while it is difficult to produce a quality evaluation with a very small budget69, the budget is not, however, the main quality factor; this is primarily related to the quality of the requirement (which must be specific and realistic), the quality of the service providers and the quality of the management of the evaluation process. The use of a counterfactual or a random sample (not examined in the meta-evaluation) does not mean a significant cost overrun, especially if these practices are developed ex ante and owned by all stakeholders.

- The cost of a rigorous quantitative approach (primarily quasi-experimental methods): such methods involve a cost overrun, but this mainly depends on the quality and quantity of the existing information. If there is a quality baseline and relevant indicators monitored throughout the project duration for enough final beneficiaries, its cost is limited. Otherwise, it will be larger.70 Furthermore, the counterfactual approach may also generate additional costs. This is particularly the case if a credible control group has not been identified ex ante and the baseline data has not been collected. It should be noted that the use of secondary data can be considered for creating a counterfactual.

69 Of the 66 evaluations reviewed under the Meta-evaluation of the programmes of non-governmental cooperation stakeholders, 83% of the final reports judged to be of "very poor" quality were carried out with budgets of less than €20,000. However, small budgets have not prevented some quality evaluations: 17 of the 38 evaluation reports where the budget was less than €20,000 were judged to be of 'good' quality.

Moreover, it is important to reflect on how these costs can sometimes be shared, whether across several programmes, several NGOs or the sector as a whole (for example through joint strategic frameworks).

Within all Belgian development cooperation, work must also be carried out on the constraints (expertise, budgets, etc.) and the incentives (financial, support, etc.) for accounting for the results of development actions on the final beneficiaries.

8.4.4. Collaboration with different types of partners is recommended. Partnerships with civil society organisations in their field of competence or even with decentralised, local public institutions may improve the sustainability of the results compared with partnerships with local NGOs that are highly dependent on external funding (for more details, see the recommendations for the DGD).

8.4.5. It is recommended to systematically explain where and how the intervention will have effects on women, (young people) and the environment.

8.5 Recommendations for the SEO

8.5.1. It is recommended that part of the annual SEO budget be dedicated to producing strategic impact evaluations investigating the results of development assistance for the target populations and/or more generally (extended to the effects on a region, a sector, a stakeholder). In doing so, the following aspects must be considered:

- Decide on the themes of these strategic evaluations in consultation with Belgian development cooperation stakeholders;

- Precisely define the priority objective of these evaluations;

- Favour an evaluation designed ex ante and one where an external team will be present throughout the process;

- Think about an appropriate 'timing' so that the lessons are useful and come at the right time (depending on the main objective of the evaluation);

- Consider involving as many stakeholders as possible (as far as the final beneficiaries);

- Provide a communication plan adapted to the target audiences, including a budget for feedback with entertaining media for the beneficiary populations.

8.5.2. It is recommended that the SEO designs and supports the creation of a methodological framework and support system to develop the (impact) evaluation capacities of all Belgian development cooperation stakeholders. This should be done in consultation with the sector's key stakeholders. Other recent studies have also pointed to the creation of such a system - see for example recommendation no. 8 of the Meta-evaluation (SEO, 2013) and recommendation no. 6 of the Evaluability study (SEO, 2015). Resources (equal to its defined ambitions) will have to be provided for this - these could be found, for example by pooling certain services and costs and avoiding the current waste of resources dedicated to low quality monitoring and evaluation systems and evaluations. Many countries around us have for many years been ambitiously creating networks, systems and institutions specifically dedicated to the capacity building of the various stakeholders in the development cooperation sector: F3E in France, the IOB system in the Netherlands71 and more recently DEVAL in Germany, to name just our immediate neighbours. Currently, nothing similar exists in Belgium.

70 In comparison, the budget for this evaluation, which proved to be realistic, was approximately 370,000 euros, to cover a general exercise of learning and synthesis, a selection process, and evaluations of four interventions in four different sectors and four different countries, 3 of which were evaluated based on an approach including quantitative analysis methods, with two collections of primary data.

8.5.3. It is recommended that the SEO plays a decisive role in assessing the quality of the NGOs' monitoring and evaluation systems. This type of certification and the accreditation after screening (in progress) would be two elements that would promote the relationship of trust that the DGD needs to have with these NGOs, meaning they would no longer have to report on activities and outputs and could focus on outcomes and impact.

Such certification practices have already been developed by our Dutch and English neighbours to provide better reporting on aid effectiveness.72 This has led to evaluation becoming more professional (see above).

The criteria used to judge the quality of a monitoring and evaluation system can be based on the elements presented in Box 10. We think that one DFID study is particularly interesting in this regard: "Broadening the range of designs and methods for impact evaluations" (2012, p.74-76). Below, we summarise some interesting concepts for defining criteria for certifying the quality of a monitoring and evaluation system. It is worth considering, broadening and adapting these to the Belgian context.

- Reliability - choice of methods, approaches and methodological tools: Do these methods make it possible to assess and even measure the results, account for the attribution and explain how the development action has been used to improve the lives of the final beneficiaries? Can they be used to identify shortcomings, failures?

- Robustness - the proper application of methodological approaches: Is there an explanation of how the methodologies have been applied? Has the previously defined protocol been followed? Do the team/stakeholders have the skills to apply them correctly?

- Transparency - drawing legitimate conclusions: Are the conclusions clearly based on the findings? Are they argued? Do they summarise the opinions of the different stakeholders? Are the limits clearly presented?

- Validity and rigour:

o Contribution analysis: is it possible to identify multiple causal factors and their interdependence? Has this been done?

o The underlying explanation behind the mechanisms of change: has there been a clear explanation of how and why the changes are expected and/or have taken place? Are these explanations supported by a theory of change? Are these explanations argued and illustrated correctly?

o Long term effects and on what type of beneficiaries: is there a consideration as to the sustainability and each type of beneficiary?

- Criteria appropriate to the intervention country: have local stakeholders been consulted/involved or is this planned; is the secondary data available in the countries being used; has the contextual analysis been properly conducted?

- Ethical criteria: have the evaluation's objectives been clearly defined with the parties concerned? Have steps been taken to ensure confidentiality? Is feedback on the findings planned so that they can be validated by the key stakeholders?

- Institutional criteria: are local experts/partners involved? Has the evaluation been able to provoke discussions among policy makers in the donor country and the intervention country?

71 http://f3e.asso.fr and www.deval.org

72 Ministry of Foreign Affairs, 2012. Regarding IOB report “De methodische kwaliteit van programma-evaluaties in het Medefinancieringsstelsel-I (2007-2010)”. Quoted in the NGO Meta-evaluation (2013) - recommendation 6.



Box 10: What is meant by "rigorous mixed methods"?

1. The first element is the definition of an impact evaluation's purpose and objective.

Regardless of the type of intervention, lobbying73, capacity building and/or supply of goods and services, the ultimate objective of any development action must be to improve the well-being of poor people. An impact evaluation therefore focuses in particular on measuring the results that can be attributed to an intervention at the level of the final beneficiaries (target population) and attempts to explain the underlying mechanisms behind the change process. Here, partner capacity building and the achievement of the effects on intermediate beneficiaries (e.g. farmers' groups, water management committees) are considered to be factors that encourage (or inhibit) the achievement of the effects on the final beneficiaries. The impact evaluation also focuses on the more general changes, i.e. at sector level, for a geographic area, the reputation of the NGO and its partners or even the contribution the impact makes to a programme's component and thus its overall impact. This global dimension, although not always required by the NGOs is important, because it can be used to judge the extent of the action's effects on the populations, the sector, or even on the NGO and its partners.

2. The second element is the construction of a theory of change (ToC).

The objective is to establish the explicit causal sequence from the resources/activities to the impact. All causal relationships between each level of results (from outputs to impact) and for each type of beneficiary (from partners to final beneficiaries) must be explicit and substantiated according to a theory (based on the literature, past experiences and/or contextual elements). A good theory of change requires a good contextual analysis, identifying all stakeholders present, while specifying the underlying hypotheses behind the causal relationships (constraints and opportunities). The ToC is a flexible, iterative tool which should be modified when necessary so that it remains as close as possible to the reality in the field. A ToC can also be used to provide a better account of the complexity of the world in which the stakeholders are involved.

The ToC should be used as a basis for the monitoring and evaluation system. It can be used to identify relevant indicators for the different types of beneficiaries and levels of results (up to the impact). It is also interesting to identify contextual indicators. These indicators should be monitored from the baseline to the end of the intervention and must be defined with precision. Their collection method should be specified along with the analyses for which they will be used. It is important to provide good incentives to the various stakeholders so that they collect quality information (see above) and also report on what is not working.

3. The third element is the rigorous mixed methodological approach.

The qualitative approach is based primarily on documentary and contextual analyses and on the information content collected in Belgium and in the field using qualitative tools. This study focused on semi-structured interviews with key stakeholders and group discussions (Focus Group Discussions, FGD) with different types of beneficiaries (from partners to final beneficiaries). The objective of such an approach is to understand the intervention as a whole and define the effects sought for the final beneficiaries with precision and even assess them qualitatively and argue the attribution of the effects. It also provides an understanding of the mechanisms of change. This qualitative approach is essential for preparing the quantitative approach.

73 Or other "policy influence" type interventions: "advocacy, lobbying, activism, advising".

The quantitative approach aims to measure the effects and demonstrate that these can be attributed to the intervention (the quasi-experimental approach is most commonly used). It therefore requires the use of enough disaggregated data for the final beneficiaries. This data may exist or can be collected during the evaluation using household surveys.74 The data may exist within the NGO because it has been collected using the M&E system or during an external evaluation. There are also public databases within the intervention countries or international institutions (for example the World Bank's LSMS data) or even other databases collected by research centres. These are generally available free of charge. The data is then analysed using statistical and econometric tools. The greater the amount of data and the more reliable it is, the simpler the techniques. The choice of one or other quantitative approach depends on the context (ex ante; ex post; intervention type, duration, etc.). A description of the principal analysis tools is available in the synthesis report of the ex post evaluation of four governmental cooperation interventions (2013).
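One widely used quasi-experimental technique is difference-in-differences, which compares the change observed among beneficiaries with the change observed in a comparison group over the same period. The sketch below is a minimal illustration with invented figures, not the analysis carried out in the case studies. It assumes that, without the intervention, both groups would have followed the same trend. The function and variable names are hypothetical.

```python
from statistics import mean


def difference_in_differences(target_before, target_after,
                              control_before, control_after):
    """Change in the target group minus change in the control group."""
    target_change = mean(target_after) - mean(target_before)
    control_change = mean(control_after) - mean(control_before)
    return target_change - control_change


# Invented household incomes (arbitrary units), for illustration only.
target_before = [100, 110, 90, 100]
target_after = [130, 140, 120, 130]
control_before = [100, 105, 95, 100]
control_after = [110, 115, 105, 110]

# A simple before/after comparison ignores what happened to non-beneficiaries.
naive_change = mean(target_after) - mean(target_before)

# The difference-in-differences estimate removes the change shared by both groups.
effect = difference_in_differences(target_before, target_after,
                                   control_before, control_after)

print(naive_change, effect)
```

With these invented figures, the before/after comparison gives a change of 30, of which 10 is also observed in the control group, leaving an estimated effect of 20. In practice the estimate would be accompanied by a measure of statistical uncertainty, and its credibility would depend on how comparable the two groups are.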

These two approaches are complementary, although one or the other may dominate.75 However, in order for them to provide reliable and relevant information for a rigorous analysis and credible findings about an intervention, they must be implemented rigorously. This means ensuring the quality of the procedure used for collecting and processing the data.

"Good data is an investment; Bad data is just an expense"76. This is true when following a qualitative and/or quantitative approach, and when few or many observations are collected.

The quality of the information collection process (for a monitoring and evaluation system or for an evaluation) is guaranteed through several elements.

1. Following a counterfactual approach. It is important to identify the effects that can be attributed to the intervention. Indeed, a change cannot be attributed to an intervention simply because a change has been identified in the target population between the start and end of an intervention. The non-beneficiary population also changes over the same period, so part of the observed change may have occurred without the intervention.

It should be noted that if a counterfactual approach is planned from the start of a development action and if this intervention is implemented sequentially (which is often the case), the control group can be a future beneficiary group. It should also be noted that the random selection of beneficiaries can also be considered in such a scenario (see below).

2. Randomly selecting respondents. If the respondents present themselves voluntarily or because they have been selected using certain criteria, this may influence the type of response that they give. Random selection can be used to obtain more objective responses which are therefore more representative of the reality for all target and non-beneficiary populations. The random selection of respondents can even be applied for a purely qualitative approach (for example, for the Most Significant Change or even for Focus Group Discussions, FGD). Only a random selection of respondents can justify generalising some of the findings of the analysis, provided that the number of observations taken into account is stated.

If contact lists are not available, villages or municipalities should be selected randomly and the list of individuals should be sought from the mayors and/or village chiefs. It is also possible to make the selection directly in the field using the "random walk": a procedure adapted to the field is defined in advance in order to identify whom to question (e.g. take the first right on arriving in the village and question the 3rd house, etc.).

74 The first is frequently called secondary data and the second, primary data.

75 See Chapter 5, which explains the methodological approach implemented for each of the selected interventions. For example, in the Philippines, it is primarily a qualitative approach, while in Indonesia, the quantitative approach dominates.

76 www.surveyCTO.com.



It should be noted that the most scientifically rigorous approach is to select the beneficiaries randomly from the start of the intervention (randomized control trial77). This means that two groups, targets and controls, are created purely at random, which ensures the credibility of the counterfactual and, if done properly, allows the effects that can be attributed to the intervention to be measured with great precision. This approach is only possible if the methodology is developed before the intervention is implemented.

3. Having relevant collection tools. Interview guides (for example for FGD) or questionnaires (for household surveys) must be produced in which the questions are clearly formulated and are relevant to the desired objective and the type of contact (translating into the local language, adapting the questions to the local culture, ensuring that the responses received allow the construction of measurement indicators for the effects, etc.). These collection tools should be tested in the field and then reviewed in order to check their quality. The use of tablets to collect data is recommended, especially for household surveys.78 The investment is small and offers significant savings in terms of encoding and cleaning data.

4. A well-trained and motivated team. The people responsible for collecting such information must be trained together in the use of the guides and/or questionnaires. They also need to be trained in the collection method (use of a paper questionnaire or a tablet/smartphone, how to respect the random sample selected, etc.). These enumerators or the people responsible for the monitoring and evaluation must also be tested in the field, at the same time as the collection tool (to check whether they do indeed have a neutral position with regard to the respondents). They must have good incentives to produce quality work (team spirit, suitable remuneration, reporting of issues, etc.).
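The random selection of respondents and the random creation of target and control groups described in the points above can be sketched as follows. This is a minimal illustration under simple assumptions (a complete contact list, groups of equal size, no stratification). The function names, the household identifiers and the seed value are hypothetical.

```python
import random


def sample_respondents(contact_list, n, seed):
    """Draw n respondents at random, without replacement."""
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced and audited
    return rng.sample(contact_list, n)


def assign_groups(units, seed):
    """Randomly split units into a target group and a control group."""
    rng = random.Random(seed)
    shuffled = list(units)  # copy, so the original list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical contact list of 200 households.
households = [f"household_{i:03d}" for i in range(200)]

# Respondents for a survey or a focus group discussion.
respondents = sample_respondents(households, n=30, seed=2016)

# Target and control groups fixed before the intervention starts.
target, control = assign_groups(households, seed=2016)

print(len(respondents), len(target), len(control))  # 30 100 100
```

Recording the seed and the list used for the draw makes it possible to check afterwards that enumerators respected the selected sample.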

The information must be processed by individuals who know at least the context and the intervention and who have the necessary skills to use quantitative tools. The findings of the qualitative analysis can be tested using quantitative analyses of the data collected and, conversely, the results of quantitative analyses (particularly the most surprising ones) can be explained by the qualitative information collected (complementarity of the approaches). The greater the amount of data, and the more reliable and relevant it is to the analysis, the simpler the techniques can be. In some cases, statistical processing requires sophisticated techniques, but sophistication does not in itself guarantee the quality of the processing. A good analysis also states its limits and identifies potential biases that could lead to under- or overestimation of the effects (biases related to the tool, the people responsible for data collection, the information collection period, a problem with the credibility of the counterfactual, etc.). Furthermore, the readability of the results and the explanation given for the process that has or has not led to changes are two essential elements for learning from the evaluation (learning process).

Note that publicly available data is usually reliable. Nevertheless, it is important to read the accompanying documents that clarify the context of the collection and any possible biases.

77 "Randomized control trial" Shahidur R. et al, 2010.

78 www.surveyCTO.com is a good software package for this type of survey.



9. Bibliography

AFD, Renforcer la pertinence, l'utilité et la fiabilité des évaluations d'impact à l'AFD [Strengthening the relevance usefulness and reliability of impact evaluations at AFD] (Framework Paper), 23 June 2014.

Amsden J., VanWynsberghe R. Community mapping as a research tool with youth, 2005.

ATOL, L'AURA: L'Auto-Renforcement Accompagné, February 2003.

Bamberger, M. Introduction to mixed methods in impact evaluation, in Impact Evaluation Notes, N°3, August 2012.

Bédécarrats, F., Guérin, I., Roubaud, F. The gold standard for randomised evaluations: from discussion of method to political economics, CEB Working Paper N°15/009, March 2015.

CARE International UK. Defining Theories of Change, January 2012.

Church C., Rogers M. M. Designing for Results: Integrating Monitoring and Evaluation in Conflict Transformation Programs: Part 1 and 2. Search for Common Ground, 2006.

Coates, B., David, R. Learning for Change: The Art of Assessing the Impact of Advocacy Work, Development in practice, Vol. 12, No 3/4, August 2002.

Coffey and DFID. Evaluation Manager PPA and GPAF: Evaluation Strategy, February 2012.

Delta. Methodologie inzake de financiële controle op toegekende subsidies aan NGO’s. 2003.

DFID, Broadening the range of designs and methods for impact evaluations, Working paper 38, April 2012

Directorate General for Development Cooperation and Humanitarian Aid 'Résultats de développement' [Development Results] Strategy Paper. July 2014.

Equal Access Participatory Monitoring and Evaluation toolkit Module 2: Setting objectives and indicators.

Evaluation Gap Working Group. When will we ever learn? Improving Lives through Impact Evaluation, May 2006.

NGO Federation. "Communication aux ONG membres - Un accord-cadre entre le Ministre et les représentants des acteurs de la coopération non gouvernementale" [Communication to member NGOs - A framework agreement between the Ministry and representatives of non-governmental cooperation stakeholders], 11-09-2015.

Friedman J., Involving local non-state capacity to improve service delivery: it can be more difficult than it appears, World Bank blog, 26/06/2014.

Goüet, C. Farmers Advocacy Consultation Tool - FACT, January 2014.

Hughes, K., Hutchings, C. Can we obtain the required rigour without randomisation? Oxfam GB's non-experimental Global Performance Framework in International Initiative for Impact Evaluation, Working paper 13, August 2011.



IBM. Évaluation du processus de financement des ONG belges par programmes quinquennaux [Evaluation of the funding process for Belgian NGOs by five-year programmes], 2003.

IOB (Terms of references). "Dialogue and Dissent", Strategic Partnerships for Lobbying and Advocacy.

James C., Theory of Change review, Comic Relief, Sept. 2011.

Jane Goodall Institute, Community Mapping 101 Assessing Community Needs & Potential Resources.

Jones, H. A guide to monitoring and evaluating policy influence, ODI, Feb. 2011.

Keystone. Developing a theory of change: A guide to developing a theory of change as a framework for inclusive dialogue, learning and accountability for social impact, 2008.

Merchant-Vega, N. Practical challenges of rigorous impact evaluation in International governance NGOs: Experience and Lessons from the Asia foundation. November 2011.

Morra Imas L., Rist, R. C. The Road to Results: Designing and Conducting Effective Development Evaluations, Understanding the Evaluation Context and the Program Theory of Change, World Bank, 2009.

NONIE. Carefully articulate the theories linking interventions to outcomes. Impact Evaluations and Development: Nonie Guidance on Impact Evaluation (pp. 15-19), 2003.

NORAD. Evaluation of Norwegian Development Cooperation, Annual report 2014/2015

ODI, The Impact of NGO development projects, Briefing paper, May 1996.

OECD Development Assistance Committee, Guidance on Evaluation Conflict Prevention and Peacebuilding Activities, 2008.

Outcome Mapping Learning Community, When is Outcome Mapping not suitable? Demonstrating a diagnostic tool.

Puri Jyotsna, Embracing complexity: Some experiences from impact evaluations of development programs. A quarterly knowledge publication from Independent Development Evaluation at the African Development Bank Group, eVALUation Matters, 2015.

PWC consultant. Analyse des capacités d’organisations non gouvernementales de coopération au développement belges en vue de leur agrément ‘programme’, Final Report, March 2007.

Retolaza, I. Theory of Change: A thinking and action approach to navigate in the complexity of social change processes, Hivos/UNDP/Democratic Dialogue, 2011.

Rick Davis, Qualitative comparative analysis, Better Evaluation, 2014.

Savedoff, W. Impact Evaluations Everywhere: What's a Small NGO to do? 9/17/2012.

SEO. Évaluation du rapportage des résultats de la DGD [Evaluation of the reporting of the DGD's results], 2012.

SEO. Méta-évaluation des programmes des acteurs non-gouvernementaux [Meta-evaluation of the Programmes of Non-Governmental Stakeholders], Special Evaluation Office of International Cooperation, Final Report, July 2013.

SEO. Evaluation ex post de 4 projets de la coopération gouvernementale [Ex post evaluation of 4 governmental cooperation projects], Synthesis Report, Special Evaluation Office of International Cooperation, October 2013.



SEO. Etude de l’évaluabilité pratique des interventions cofinancées par la coopération belge, [Study on the practical evaluability of interventions co-funded by Belgian development cooperation], Final Report, November 2015.

Shahidur R., Khandker, Gayatri B., Koolwal, Hussain A., Samad. Handbook on Impact Evaluation: Quantitative Methods and Practices. World Bank, 2010.

Shapiro I., Extending the Framework of Inquiry: Theories of Change in Conflict Interventions, Berghof Handbook, 2006.

Shapiro, J. Monitoring and Evaluation, CIVICUS.

Smutylo, T. Outcome Mapping: A method for tracking behavioral changes in development programs, ILAC Brief 7. August 2005.

Stein D., Valters C., Understanding theory of change in international development, The Asia Foundation, August 2012.

Stern E., Stame N., Mayne J., Forss K., Davies R. and Befani B., Broadening the range of designs and methods for impact evaluation, DFID, Working paper 38, April 2012.

UNFPA, Programme Manager's Planning, Monitoring & Evaluation Toolkit, Tool 5: Planning and Managing an Evaluation, Part III: The Data Collection Process, August 2004.

Vanclay, F. Guidance for the design of qualitative case study evaluation, February 2012.

Vogel I., Review of the use of "Theory of change" in international development, DFID, April 2012.

Westhorp, G. Realist Impact Evaluation. Method Lab (www.ODI.org/methodlab).

Weyrauch, V., D'Agostino, J., Richards, C. Learners, practitioners and teachers: handbook on monitoring, evaluating and managing knowledge for policy influence, 2011.

Weyrauch, V., Echt, L. Toolkit N°1: What is an influence plan? Why should we plan, at How to design a policy influence plan?, CIPPEC, 2012.

Websites:

http://www.actknowledge.org/theory-of-change/

http://www.theoryofchange.nl/

http://betterevaluation.org/evaluation-options/mapping_stakeholders

http://www.mande.co.uk/docs/MSCGuide.pdf

http://www.econ.yale.edu/~cru2/pdf/Science-2015-TUP.pdf
